This file contains a description of the RevisionRecorder / RevisionExcluder / RevisionReader mechanism. cvs2svn now includes hooks to make it possible to avoid having to invoke CVS or RCS zillions of times in OutputPass (which is otherwise the most expensive part of the conversion). Here is a brief description of how the hooks work. Each conversion requires an instance of RevisionReader, whose responsibility is to produce the text contents of CVS revisions on demand during OutputPass. The RevisionReader can read the CVS revision contents directly out of the RCS files during OutputPass. But additional hooks support the construction of different kinds of RevisionReader that record the CVS file revisions' contents during CollectRevsPass then output the contents during OutputPass. (Indeed, for non-SVN backends, OutputPass might not even require the file contents.) Specifically, the RevisionReader instance can supply instances of two other classes to help it out: RevisionRecorder -- can record the CVS revisions' text during CollectRevsPass to avoid having to parse the RCS files again during OutputPass. RevisionExcluder -- is informed during FilterSymbolsPass about CVS revisions that have been excluded from the conversion and will therefore not be needed during OutputPass. This mechanism can be used to discard temporary data that will not be required. The type of RevisionReader to be used for a run of cvs2svn can be set using --use-internal-co, --use-rcs, or --use-cvs, or via the --options file with a line like: ctx.revision_reader = MyRevisionReader() The following RevisionReaders are supplied with cvs2svn: InternalRevisionReader -- an InternalRevisionRecorder records the revisions' delta text and their dependencies during CollectRevsPass; an InternalRevisionExcluder discards unneeded deltas in FilterSymbolsPass; an InternalRevisionReader reconstitutes the revisions' contents during OutputPass from the recorded data. This is by far the fastest option, but it requires a substantial amount of temporary disk space for the duration of the conversion. RCSRevisionReader -- uses RCS's "co" command to extract the revision text during OutputPass. This is slower than InternalRevisionReader because "co" has to be executed very many times, but is better tested and does not require any temporary disk space. RCSRevisionReader does not use a RevisionRecorder or RevisionExcluder. CVSRevisionReader -- uses the "cvs" command to extract the revision text during OutputPass. This is even slower than RCSRevisionReader, but it can handle a some CVS file quirks that stymy RCSRevisionReader (see the cvs2svn HTML documentation). CVSRevisionReader does not use a RevisionRecorder or RevisionExcluder. It is possible to write your own RevisionReader if you would like to do things differently. A RevisionReader that wants to record information during CollectRevsPass should define a method get_revision_recorder(), which should return an instance of RevisionRecorder. A RevisionRecorder has callback methods that are invoked as the CVS files are parsed. For example, RevisionRecorder.record_text() is passed the log message and text (full text or delta) for each file revision. The record_text() method is allowed to return an arbitrary token (for example, a content hash), and that token is stored into CVSRevision.revision_recorder_token and carried along by cvs2svn. A RevisionReader that wants to be informed about revisions that will be excluded from the conversion should define a method get_revision_excluder(), which returns an instance of RevisionExcluder. The RevisionExcluder's callbacks are invoked during FilterSymbolsPass to tell it which CVS revisions will actually be needed by the conversion. The RevisionExcluder has the opportunity to use this information to delete unneeded temporary data. Later, when OutputPass requires the file contents, it calls RevisionReader.get_content_stream(), which is passed a CVSRevision instance and has to return a stream object that produces the file revision's contents. The fancy RevisionReader could use the token to retrieve the pre-stored file contents without having to call CVS or RCS at all.