Revision 1 as of 2007-07-27 23:53:16
A relatively simple explanation of weeding
|Deletions are marked like this.||Additions are marked like this.|
|Line 160:||Line 160:|
|=== Resume LEase Expiration ===||=== Resume Lease Expiration ===|
This page gives a brief overview of how the weeding process works
Output from the weeder (VestaWeed) will be shown as if it were run with full debugging out ("-debug All" on the command line).
Check for Partially Completed Weed
Since weeding is not an instantaneous process, the weeder might be terminated after doing only some of its work. When it starts up, it must check whether another weeder exited part-way through doing its work.
The weeder stores some of state on disk to make sure that it can recover and continue a partially completed weed:
02:55:03.702 07/27/2007 -- Recovering `weeded' weeded = <<Empty>>
(If there were a partially performed weeder run that needs to be completed, the on-disk weeded set wouldn't be empty.)
Check for Another Running Weeder
Sicne Vesta is a multi-process, multi-machine system there could be more than one weeder trying to work at the same time. The cache daemon helps the weeder avoid this by remembering when a weeder is running. The weeder calls the cache to tell it when its starting up, and that's when the check is performed:
02:55:03.702 07/27/2007 -- CALLING -- WeederRecovering doneMarking = false 02:55:03.702 07/27/2007 -- RETURNED -- WeederRecovering err = false
If there was another weeder running, the weeder would exit with an error at this point.
The primary job of the weeder is to decide which function calls stored in the cache daemon should be kept. This is essentially a mark-sweep [http://en.wikipedia.org/wiki/Garbage_collection_(computer_science) garbage collection] operation. Deciding which entries to keep is called the mark pahse by the weeder:
02:55:03.702 07/27/2007 -- Starting weeder mark phase
The weeder needs to know what cache entries currently exist when it starts. To determine this, it calls the cache daemon's StartMark procedure:
02:55:03.702 07/27/2007 -- CALLING -- StartMark 02:55:03.845 07/27/2007 -- RETURNED -- StartMark initCIs = 0-13666, 13668, 13671, 13673, 13675, 13682, 13684-13685, ... 388880-388881, 388884-388885, 388904 (180411 total) newLogVer = 176
Each cache entry is uniquely identified by a cache index (or "CI") which is a small integer. The StartMark procedure returns the set of currently allocated cache index values.
StartMark also returns the next graph log version number.
Mark Entries From Instructions
Based on [http://www.vestasys.org/doc/man/html/VestaWeed.1.html#InputSect models that the weeder is told to keep with a root file] and any time window specified with [http://www.vestasys.org/doc/man/html/VestaWeed.1.html#KeepOption the -keep flag], the weeder picks builds to keep. It then marks all cache entries that are part of that build. It does this by reading the cache daemon's graph log which stores the complete function call graph of all builds in the cache.
02:55:03.991 07/27/2007 -- Starting to mark cache entries reachable from roots 02:55:03.991 07/27/2007 -- Starting scan of graph log 02:55:20.064 07/27/2007 -- Finished scan of graph log roots read = 1399 nodes read = 180411 entries marked = 16024 nodes written = 180411
Because the graph log is very large, the weeder doesn't keep it all in memory. Instead, it repeatedly scans the entries from it marking more cache entries to keep each time:
02:55:20.064 07/27/2007 -- Starting scan of pending graph log 02:55:28.989 07/27/2007 -- Finished scan of pending graph log nodes read = 180411 entries marked = 95696 nodes written = 66501 ... 02:55:30.413 07/27/2007 -- Starting scan of pending graph log 02:55:30.707 07/27/2007 -- Finished scan of pending graph log nodes read = 27328 entries marked = 165 nodes written = 27036 02:55:30.707 07/27/2007 -- Starting scan of pending graph log 02:55:31.098 07/27/2007 -- Finished scan of pending graph log nodes read = 27036 entries marked = 0 nodes written = 27036
It stops when it find no more to be marked.
Initial Set to be Weeded
At this point the weeder has an initial guess at which entries will be kept:
02:55:31.099 07/27/2007 -- Finished marking cache entries reachable from roots entries marked = 143375
(Note that the entries marked are the ones being kept, not the ones being deleted.)
The weeder then tells the cache which entries it thinks will be deleted. It asks the cache to stop letting builds get cache hits on these so that a build won't start using something that's about to be deleted by calling the cache's SetHitFilter procedure:
02:55:31.099 07/27/2007 -- CALLING -- SetHitFilter cis = 5-6, 86, 732, 743, 811, 1083-1084, 1405, 1429-1430, 1432-1434, ... 208550-208566, 215289-215302, 215318-215321, 215324-215329 (37036 total) 02:55:32.489 07/27/2007 -- RETURNED -- SetHitFilter
(Note that this set is the initial set of CIs minus those which have been marked to be kept.)
Get Leased Entries
Next the weeder must find out which cache entries currently or recently running builds have been using:
02:55:32.489 07/27/2007 -- CALLING -- GetLeases 02:55:32.490 07/27/2007 -- RETURNED -- GetLeases cis = 14, 107, 120, 242, 245, 269, 273, 286-288, 301, 323, 334, 336, ... 197850, 203841, 203855, 211079, 211144, 215160, 220513 (5126 total)
It's not safe to delete these, so the weeder must mark them to be kept as well, plus their entire call tree of other cache entries:
02:55:32.496 07/27/2007 -- Started marking entries reachable from leased entries
Resume Lease Expiration
Normally cache entry leases only last for a little while. (See the secription of the [CacheServer]LeaseTimeoutHrs configuration setting on [http://www.vestasys.org/doc/man/html/VCache.1.html the cache daemon man page].) However, during the initial part of the weeder's work leases don't expire. Now that the weeder has the set of cache entry leases it is safe for them to expire again so it tells the cache server that by calling ResumeLeaseExp:
02:55:32.496 07/27/2007 -- CALLED -- ResumeLeaseExp 02:55:32.496 07/27/2007 -- RETURNED -- ResumeLeaseExp
Mark Leased Entries
Next the weeder must mark all the entries that were leased to be kept:
02:55:32.496 07/27/2007 -- Started marking leased cache entries 02:55:32.529 07/27/2007 -- Finished marking leased cache entries entries marked = 51
Then to find the child cache entires of the leased entries (i.e. functions they called) it must again scan the graph log:
02:55:32.529 07/27/2007 -- Starting scan of pending graph log 02:55:32.826 07/27/2007 -- Finished scan of pending graph log nodes read = 27036 entries marked = 14 nodes written = 26985 ...
End of Mark Phase
Finally the weeder reaches the end of its mark phse and knows all the cache entries to be kept out of the original set:
02:55:33.945 07/27/2007 -- Finished weeder mark phase total entries marked = 143426 total entries to be weeded = 36985
Store State on Disk
To make sure that the weeder can recover if it is terminated before it can complete its work, it writes out its state after it finishes its mark phase:
02:55:33.945 07/27/2007 -- Logging miscellaneous variables startTime = 1185519303 disShortId = 0x91cb6300 markedRoots = size = 74 content = pkgDirFP = f385219c4c4631a1 45e528383948c414; modelShortId = 0x9a02f474 [fresh] ... pkgDirFP = bc034f1c19229a1d be382c6c63c87a49; modelShortId = 0x9a02f474 [fresh] 02:55:33.978 07/27/2007 -- Logging `weeded' weeded = 5-6, 86, 732, 743, 811, 1083-1084, 1405, 1429-1430, 1432-1434, ... 215324-215329 (36985 total)
Now that the weeder has determined exactly which entries will be kept and which will be deleted, it begins the deletion phase:
02:55:34.033 07/27/2007 -- Starting weeder deletion phase
Earlier the weeder gave the cache server a guess at which entries would be deleted by calling the HitFilter procedure. Now that it has marked the leased entries and all entries reachable from the leased entries, it can give the cache the full list to be deleted. It does this by calling the EndMark procedure:
02:55:34.646 07/27/2007 -- CALLING -- EndMark cis = 5-6, 86, 732, 743, 811, 1083-1084, 1405, 1429-1430, 1432-1434, ... 215324-215329 (36985 total) prefixes = 17525 total 02:55:34.848 07/27/2007 -- RETURNED -- EndMark chkptVer = 176
After this call the cache will begin deleting cache entries. Of course it continues to keep clients from getting cache hits on any entries which are about to be deleted.
Note that "chkptVer" here is the same as "newLogVer" returned by StartMark.
While it was marking entries, the weeder also gathered a list of derived files to keep. Since all files (both sources and deriveds) are stored by the repository, the weeder must next ask the repository to delete files and reclaim disk space:
02:55:34.848 07/27/2007 -- Started deleting dead repository files 03:51:25.584 07/27/2007 -- Finished deleting dead repository files
See [#find_sources_to_keep "Finding Source Files To Keep" below] for a look inside the what the repository daemon does during this step.
Pruning the Graph Log
The cache daemon takes care of removing cache entries from the MultiPKFiles in which they are actually stored. However it doesn't remove them from the graph log, which is a data structure used primarily during weeding. Instead, the weeder does that:
03:51:25.585 07/27/2007 -- Started pruning graph log
Essentially the weeder write a new .ckp file to go in the graph log suib-directory of the cache. Once it has done so, it does a little sanity check to make sure that it wrote out all the graph log roots which it marked during the mark phase:
03:51:32.478 07/27/2007 -- Checking marked vs. written roots
(Graph log roots represent whole builds. Graph log nodes represent individual cache entries.)
Next it prints information on how much it wrote into the new graph log checkpoint:
03:51:32.478 07/27/2007 -- Finished pruning graph log roots written = 1006 nodes written = 143426
Finally it asks the cache daemon to accept the new .ckp file it has written:
03:51:32.478 07/27/2007 -- CALLED -- CommitChkpt chkptFileName = 176.ckp;46a9a3fd 03:51:34.211 07/27/2007 -- RETURNED -- CommitChkpt result: checkpoint accepted
Clear State on Disk
Now that the weeder is almost done, it's time to clear the information stored on disk which another process would use to recover and continue a partially completed weed:
03:51:34.211 07/27/2007 -- Logging `weeded' weeded = <<Empty>> (0 total)
Checkpoint the Repository
Since we have just cleaned up, now is a good time to have the repository write a new .ckp file of its own state:
03:51:34.236 07/27/2007 -- Started checkpointing the repository 03:55:14.743 07/27/2007 -- Finished checkpointing the repository
Right after weeding the checkpoint will be a little more compact than at other times. Also, writing a new one periodically will help the repository daemon start up faster when it restarts (e.g. in the event of a power outage or other fault).
End of Deletion Phase
Finally we reach the end of the weeding process:
03:55:14.744 07/27/2007 -- Finished weeder deletion phase
[http://www.vestasys.org/doc/man/html/QuickWeed.1.html QuickWeed] does the same thing as [http://www.vestasys.org/doc/man/html/VestaWeed.1.html the normal weeder] if you gave it instructions to keep every entry in the cache (i.e. the equivalent of "-keep infinity"). It does much less work than the normal weeder, but doesn't delete much unless you've just done something like erase your cache.
It starts by reading the graph log and gathering a list of every derived file:
Writing ShortIds of deriveds to keep to file 95925900 19:11:51.911 07/27/2007 -- Started marking derived files to keep 19:11:51.911 07/27/2007 -- Started graph log scan Configuration file: /etc/vesta.cfg Hostname:port: vesta.example.com:21773 Graph log: /var/lib/vesta/cache/graphLog 19:12:00.380 07/27/2007 -- Finished graph log scan Graph log entries processed = 123020 Derived files found = 23504083 19:12:00.381 07/27/2007 -- Finished marking derived files to keep
Then it calls the repository daemon to ask it to mark sources to keep and delete files to reclaim space:
19:12:00.381 07/27/2007 -- Started marking sources and deleting unreachable files 19:13:17.235 07/27/2007 -- Finished marking sources and deleting unreachable files
Finally it checkpoints the repository:
19:13:17.235 07/27/2007 -- Started checkpointing the repository 19:13:18.592 07/27/2007 -- Finished checkpointing the repository
When Does the Repository Daemon Block?
There are two points during the weeding process when the repository must stop responding to incoming network requests while it does book-keeping related to weeding. For most users these delays are very short, however for large repositories they can become noticeable delays.
Finding Source Files To Keep
When the weeder calls the repository to delete files, it passes a couple arguments:
A list of derived files to be kept (as 32-bit shortids stored in another shortid file)
- A time. Any derived files modified after that point in time will be kept
Here's how the repository prints these arguments to its standard output (which is normally logged for administrators):
07/27/2007 02:55:34 keep derived: keepDerivedTime 1185519303, sid 0x91cb6300
Then the repository must build a list of source files to keep. It picks another shortid file to hold the list of a sources, and another time. Any source files with a modification time after this will be kept.
07/27/2007 02:55:34 source weed: keepSourceTime 1185519304, sid 0xb5e0249b
After this it must scan the entire directory structure of the repository to gather the files to keep. While it is doing this it cannot respond to any incoming network requests. If it allowed any changes during this period there is a possibility that it might miss some files which should be kept.
When it finishes writing the list of sources to keep it prints another message:
07/27/2007 03:28:41 source weed done
Once this message is printed, the repository begins handling incoming network requests again.
After the repository has determined the list of source files to keep, it combines it with the list of derived files to keep and then deletes all other files.
07/27/2007 03:35:29 wrote combined keepSidFile 0xb5e0249c 07/27/2007 03:51:25 deletions done
When the repository daemon writes a new .ckp file containing its entire state (directory structure, attributes, mastership flags, etc.) it must also stop responding to netwokr requests until it has written it out. It always prints these messages while it is doing this:
07/27/2007 03:51:34 starting checkpoint 07/27/2007 03:55:14 checkpoint done
Other pages in this wiki:
Source code packages containing the implementation: