Revision 2 as of 2007-07-27 23:59:50

Clear message

This page gives a brief overview of how the weeding process works


Normal Process

Output from the weeder (VestaWeed) will be shown as if it were run with full debugging out ("-debug All" on the command line).

Check for Partially Completed Weed

Since weeding is not an instantaneous process, the weeder might be terminated after doing only some of its work. When it starts up, it must check whether another weeder exited part-way through doing its work.

The weeder stores some of state on disk to make sure that it can recover and continue a partially completed weed:

02:55:03.702 07/27/2007 -- Recovering `weeded'
  weeded = <<Empty>>

(If there were a partially performed weeder run that needs to be completed, the on-disk weeded set wouldn't be empty.)

Check for Another Running Weeder

Sicne Vesta is a multi-process, multi-machine system there could be more than one weeder trying to work at the same time. The cache daemon helps the weeder avoid this by remembering when a weeder is running. The weeder calls the cache to tell it when its starting up, and that's when the check is performed:

02:55:03.702 07/27/2007 -- CALLING -- WeederRecovering
  doneMarking = false

02:55:03.702 07/27/2007 -- RETURNED -- WeederRecovering
  err = false

If there was another weeder running, the weeder would exit with an error at this point.

Mark Phase

The primary job of the weeder is to decide which function calls stored in the cache daemon should be kept. This is essentially a mark-sweep [ garbage collection] operation. Deciding which entries to keep is called the mark pahse by the weeder:

02:55:03.702 07/27/2007 -- Starting weeder mark phase


The weeder needs to know what cache entries currently exist when it starts. To determine this, it calls the cache daemon's StartMark procedure:

02:55:03.702 07/27/2007 -- CALLING -- StartMark

02:55:03.845 07/27/2007 -- RETURNED -- StartMark
  initCIs = 
      0-13666, 13668, 13671, 13673, 13675, 13682, 13684-13685, 
      388880-388881, 388884-388885, 388904 
    (180411 total)
  newLogVer = 176

Each cache entry is uniquely identified by a cache index (or "CI") which is a small integer. The StartMark procedure returns the set of currently allocated cache index values.

StartMark also returns the next graph log version number.

Mark Entries From Instructions

Based on [ models that the weeder is told to keep with a root file] and any time window specified with [ the -keep flag], the weeder picks builds to keep. It then marks all cache entries that are part of that build. It does this by reading the cache daemon's graph log which stores the complete function call graph of all builds in the cache.

02:55:03.991 07/27/2007 -- Starting to mark cache entries reachable from roots

02:55:03.991 07/27/2007 -- Starting scan of graph log

02:55:20.064 07/27/2007 -- Finished scan of graph log
  roots read = 1399
  nodes read = 180411
  entries marked = 16024
  nodes written = 180411

Because the graph log is very large, the weeder doesn't keep it all in memory. Instead, it repeatedly scans the entries from it marking more cache entries to keep each time:

02:55:20.064 07/27/2007 -- Starting scan of pending graph log

02:55:28.989 07/27/2007 -- Finished scan of pending graph log
  nodes read = 180411
  entries marked = 95696
  nodes written = 66501
02:55:30.413 07/27/2007 -- Starting scan of pending graph log

02:55:30.707 07/27/2007 -- Finished scan of pending graph log
  nodes read = 27328
  entries marked = 165
  nodes written = 27036

02:55:30.707 07/27/2007 -- Starting scan of pending graph log

02:55:31.098 07/27/2007 -- Finished scan of pending graph log
  nodes read = 27036
  entries marked = 0
  nodes written = 27036

It stops when it find no more to be marked.

Initial Set to be Weeded

At this point the weeder has an initial guess at which entries will be kept:

02:55:31.099 07/27/2007 -- Finished marking cache entries reachable from roots
  entries marked = 143375

(Note that the entries marked are the ones being kept, not the ones being deleted.)

The weeder then tells the cache which entries it thinks will be deleted. It asks the cache to stop letting builds get cache hits on these so that a build won't start using something that's about to be deleted by calling the cache's SetHitFilter procedure:

02:55:31.099 07/27/2007 -- CALLING -- SetHitFilter
  cis = 
      5-6, 86, 732, 743, 811, 1083-1084, 1405, 1429-1430, 1432-1434, 
      208550-208566, 215289-215302, 215318-215321, 215324-215329 
    (37036 total)

02:55:32.489 07/27/2007 -- RETURNED -- SetHitFilter

(Note that this set is the initial set of CIs minus those which have been marked to be kept.)

Get Leased Entries

Next the weeder must find out which cache entries currently or recently running builds have been using:

02:55:32.489 07/27/2007 -- CALLING -- GetLeases

02:55:32.490 07/27/2007 -- RETURNED -- GetLeases
  cis = 
      14, 107, 120, 242, 245, 269, 273, 286-288, 301, 323, 334, 336, 
      197850, 203841, 203855, 211079, 211144, 215160, 220513 
    (5126 total)

It's not safe to delete these, so the weeder must mark them to be kept as well, plus their entire call tree of other cache entries:

02:55:32.496 07/27/2007 -- Started marking entries reachable from leased entries

Resume Lease Expiration

Normally cache entry leases only last for a little while. (See the secription of the [CacheServer]LeaseTimeoutHrs configuration setting on [ the cache daemon man page].) However, during the initial part of the weeder's work leases don't expire. Now that the weeder has the set of cache entry leases it is safe for them to expire again so it tells the cache server that by calling ResumeLeaseExp:

02:55:32.496 07/27/2007 -- CALLED -- ResumeLeaseExp

02:55:32.496 07/27/2007 -- RETURNED -- ResumeLeaseExp

Mark Leased Entries

Next the weeder must mark all the entries that were leased to be kept:

02:55:32.496 07/27/2007 -- Started marking leased cache entries

02:55:32.529 07/27/2007 -- Finished marking leased cache entries
  entries marked = 51

Then to find the child cache entires of the leased entries (i.e. functions they called) it must again scan the graph log:

02:55:32.529 07/27/2007 -- Starting scan of pending graph log

02:55:32.826 07/27/2007 -- Finished scan of pending graph log
  nodes read = 27036
  entries marked = 14
  nodes written = 26985

End of Mark Phase

Finally the weeder reaches the end of its mark phse and knows all the cache entries to be kept out of the original set:

02:55:33.945 07/27/2007 -- Finished weeder mark phase
  total entries marked = 143426
  total entries to be weeded = 36985

Store State on Disk

To make sure that the weeder can recover if it is terminated before it can complete its work, it writes out its state after it finishes its mark phase:

02:55:33.945 07/27/2007 -- Logging miscellaneous variables
  startTime = 1185519303
  disShortId = 0x91cb6300
  markedRoots =
    size = 74
    content =
      pkgDirFP = f385219c4c4631a1 45e528383948c414; modelShortId = 0x9a02f474   [fresh]
      pkgDirFP = bc034f1c19229a1d be382c6c63c87a49; modelShortId = 0x9a02f474   [fresh]

02:55:33.978 07/27/2007 -- Logging `weeded'
  weeded = 
      5-6, 86, 732, 743, 811, 1083-1084, 1405, 1429-1430, 1432-1434, 
    (36985 total)

Deletion Phase

Now that the weeder has determined exactly which entries will be kept and which will be deleted, it begins the deletion phase:

02:55:34.033 07/27/2007 -- Starting weeder deletion phase


Earlier the weeder gave the cache server a guess at which entries would be deleted by calling the HitFilter procedure. Now that it has marked the leased entries and all entries reachable from the leased entries, it can give the cache the full list to be deleted. It does this by calling the EndMark procedure:

02:55:34.646 07/27/2007 -- CALLING -- EndMark
  cis = 
      5-6, 86, 732, 743, 811, 1083-1084, 1405, 1429-1430, 1432-1434, 
    (36985 total)
  prefixes = 17525 total

02:55:34.848 07/27/2007 -- RETURNED -- EndMark
  chkptVer = 176

After this call the cache will begin deleting cache entries. Of course it continues to keep clients from getting cache hits on any entries which are about to be deleted.

Note that "chkptVer" here is the same as "newLogVer" returned by StartMark.

Deleting Files

While it was marking entries, the weeder also gathered a list of derived files to keep. Since all files (both sources and deriveds) are stored by the repository, the weeder must next ask the repository to delete files and reclaim disk space:

02:55:34.848 07/27/2007 -- Started deleting dead repository files

03:51:25.584 07/27/2007 -- Finished deleting dead repository files

See [#find_sources_to_keep "Finding Source Files To Keep" below] for a look inside the what the repository daemon does during this step.

Pruning the Graph Log

The cache daemon takes care of removing cache entries from the MultiPKFiles in which they are actually stored. However it doesn't remove them from the graph log, which is a data structure used primarily during weeding. Instead, the weeder does that:

03:51:25.585 07/27/2007 -- Started pruning graph log

Essentially the weeder write a new .ckp file to go in the graph log suib-directory of the cache. Once it has done so, it does a little sanity check to make sure that it wrote out all the graph log roots which it marked during the mark phase:

03:51:32.478 07/27/2007 -- Checking marked vs. written roots

(Graph log roots represent whole builds. Graph log nodes represent individual cache entries.)

Next it prints information on how much it wrote into the new graph log checkpoint:

03:51:32.478 07/27/2007 -- Finished pruning graph log
  roots written = 1006
  nodes written = 143426

Finally it asks the cache daemon to accept the new .ckp file it has written:

03:51:32.478 07/27/2007 -- CALLED -- CommitChkpt
  chkptFileName = 176.ckp;46a9a3fd

03:51:34.211 07/27/2007 -- RETURNED -- CommitChkpt
  result: checkpoint accepted

Clear State on Disk

Now that the weeder is almost done, it's time to clear the information stored on disk which another process would use to recover and continue a partially completed weed:

03:51:34.211 07/27/2007 -- Logging `weeded'
  weeded = 
    (0 total)

Checkpoint the Repository

Since we have just cleaned up, now is a good time to have the repository write a new .ckp file of its own state:

03:51:34.236 07/27/2007 -- Started checkpointing the repository

03:55:14.743 07/27/2007 -- Finished checkpointing the repository

Right after weeding the checkpoint will be a little more compact than at other times. Also, writing a new one periodically will help the repository daemon start up faster when it restarts (e.g. in the event of a power outage or other fault).

End of Deletion Phase

Finally we reach the end of the weeding process:

03:55:14.744 07/27/2007 -- Finished weeder deletion phase

QuickWeed Process

[ QuickWeed] does the same thing as [ the normal weeder] if you gave it instructions to keep every entry in the cache (i.e. the equivalent of "-keep infinity"). It does much less work than the normal weeder, but doesn't delete much unless you've just done something like erase your cache.

It starts by reading the graph log and gathering a list of every derived file:

Writing ShortIds of deriveds to keep to file 95925900
19:11:51.911 07/27/2007 -- Started marking derived files to keep
19:11:51.911 07/27/2007 -- Started graph log scan
  Configuration file: /etc/vesta.cfg
  Graph log: /var/lib/vesta/cache/graphLog
19:12:00.380 07/27/2007 -- Finished graph log scan
  Graph log entries processed = 123020
  Derived files found = 23504083
19:12:00.381 07/27/2007 -- Finished marking derived files to keep

Then it calls the repository daemon to ask it to mark sources to keep and delete files to reclaim space:

19:12:00.381 07/27/2007 -- Started marking sources and deleting unreachable files
19:13:17.235 07/27/2007 -- Finished marking sources and deleting unreachable files

Finally it checkpoints the repository:

19:13:17.235 07/27/2007 -- Started checkpointing the repository
19:13:18.592 07/27/2007 -- Finished checkpointing the repository

When Does the Repository Daemon Block?

There are two points during the weeding process when the repository must stop responding to incoming network requests while it does book-keeping related to weeding. For most users these delays are very short, however for large repositories they can become noticeable delays.


Finding Source Files To Keep

When the weeder calls the repository to delete files, it passes a couple arguments:

Here's how the repository prints these arguments to its standard output (which is normally logged for administrators):

 07/27/2007 02:55:34 keep derived: keepDerivedTime 1185519303, sid 0x91cb6300

Then the repository must build a list of source files to keep. It picks another shortid file to hold the list of a sources, and another time. Any source files with a modification time after this will be kept.

 07/27/2007 02:55:34 source weed: keepSourceTime 1185519304, sid 0xb5e0249b

After this it must scan the entire directory structure of the repository to gather the files to keep. While it is doing this it cannot respond to any incoming network requests. If it allowed any changes during this period there is a possibility that it might miss some files which should be kept.

When it finishes writing the list of sources to keep it prints another message:

 07/27/2007 03:28:41 source weed done

Once this message is printed, the repository begins handling incoming network requests again.

After the repository has determined the list of source files to keep, it combines it with the list of derived files to keep and then deletes all other files.

 07/27/2007 03:35:29 wrote combined keepSidFile 0xb5e0249c
 07/27/2007 03:51:25 deletions done


When the repository daemon writes a new .ckp file containing its entire state (directory structure, attributes, mastership flags, etc.) it must also stop responding to netwokr requests until it has written it out. It always prints these messages while it is doing this:

 07/27/2007 03:51:34 starting checkpoint
 07/27/2007 03:55:14 checkpoint done

See Also

Man pages:

Other pages in this wiki:

Source code packages containing the implementation: