Hacking the jigdo source code ============================= This file is not very organized... if in doubt, use the source, Luke! Use --enable-debug when hacking the source, this enables quite a few additional sanity checks (the "Paranoid()" assertions - the "Assert()" assertions are always enabled), #defines GTK_DISABLE_DEPRECATED, calls abort() on failed assertions instead of just printing the assertion, etc. Mailing list, CVS ~~~~~~~~~~~~~~~~~ The jigdo-user mailing list is currently used both for end-user support and development. Subscribe: https://lists.berlios.de/mailman/listinfo/jigdo-user Archive: https://lists.berlios.de/pipermail/jigdo-user Address: jigdo-user at lists.berlios.de Anonymous CVS access to the latest source code is available. Execute cvs -d:pserver:anonymous@cvs.jigdo.berlios.de:/cvsroot/jigdo login Just press return at the login prompt, then execute cvs -z3 -d:pserver:anonymous@cvs.jigdo.berlios.de:/cvsroot/jigdo co jigdo Latest code: http://cvs.berlios.de/cgi-bin/viewcvs.cgi/jigdo/jigdo/jigdo.tar.gz?tarball=1 Browse tree: http://cvs.berlios.de/cgi-bin/viewcvs.cgi/jigdo/jigdo/ Bootstrapping the code ~~~~~~~~~~~~~~~~~~~~~~ If you checkout the code from CVS or download the CVS snapshot, some additional files have to be generated - in the jigdo release tarballs, these files are already present. The Makefiles automatically handle regeneration of those files if they are not present; the only thing that you need to do manually is to invoke autoconf to create the configure script, then you can do "./configure --enable-debug ..." and "make" as usual. Additional tools required for bootstrapping the code: - autoconf, to create the configure script - docbook-utils (docbook2html, docbook2man), to generate HTML documentation and manpages from the SGML in the "doc" subdirectory. - the gettext tools (msgfmt), to generate the .gmo files in the "po" subdirectory. - glade, to generate "src/gtk/interface.cc" from "jigdo.glade" - awk (or gawk/mawk), to generate "src/Makedeps" and to post-process "src/gtk/interface.cc" On Debian systems, bootstrapping and compiling is possible simply by typing "deb/rules". Patches ~~~~~~~ Small bugfixes are very welcome, just send them to the mailing list. Before starting anything bigger, better check with me whether it coincides with my ideas - just to avoid frustration later on. I'm pretty open to all ideas, though, especially if they save me some work. :) ATM, GCC 2.95 is still supported, so contributed code should compile with GCC 2.95 as well as the latest GCC 4.x. See compat.hh for a few replacements for things which GCC 2.95 doesn't have. Sub-projects ~~~~~~~~~~~~ There are some things I'd *love* to see done by somebody else: Windows/Solaris/BSD/MacOS porting: ATM, I test and build jigdo on Linux and Windows, and occasionally on Solaris/BSD. Especially maintaining the Windows port is quite a lot of work... It'd also be nice if someone ensured that it builds on Redhat/SuSE/Mandrake, and maintained the jigdo.spec file. Paul Bolle has been doing this, get in touch with him if you want to improve the .spec file. Andrew Mathieson has started maintaining a Mac OS X port - the Fink port of jigdo had become rather neglected. See Non-interactive frontend for jigdo: The current GTK+ jigdo program is meant to be used interactively, not from a script or similar. The moment it is released, I will be inundated with requests for a version which does not need GTK+ or which behaves more like wget, or which uses curses, or... The code is organized in a way which makes adding new frontends quite easy. CGI-like program for on-the-fly creation from .jigdo/.template files: NOTE: Steve McIntyre has implemented something like this as part of JTE, see With Debian, we have lots of mirror admins who have the bandwidth to host CD images, but not the disc space. They'd like a CGI-like program which generates a pseudo dirlisting with .iso links in it, and, when such an .iso link is downloaded, it creates the .iso data from the .jigdo/.template files and a local Debian mirror _on_the_fly_ as it is being sent over the net. NB however: This *cannot* be a CGI for performance reasons and because it must support HTTP/1.1 ranges (i.e. resuming of downloads). Also, it will be necessary to "pre-process" the .template file because repeatedly ungzipping it would be too slow. If not a CGI, what form should it have? I don't know - I'd go for an Apache module, but Attila Nagy tells me Apache needs too much memory if you have many thousand concurrent downloads. Maybe one could add the functionality to an existing small httpd? ".rawtemplate" support: NOTE: This is now being tackled by Steve McIntyre, see It is possible to make .jigdo creation for Debian CDs much faster by hacking mkisofs to support more direct output of .jigdo/.template files: First, change mkisofs to output not .iso format, but a special ".rawtemplate" (or whatever) format. That format is similar to the normal .template format, except that files are not referenced by their md5sum, but by their filename, and that none of the data is compressed. This should be quite simple, and thus the modifications to mkisofs should be few and small, making it more likely that such a patch is accepted by mkisofs upstream. Next, the .rawtemplate is piped into jigdo-file, which creates .jigdo and .template files as usual. However, due to its cache, it won't need to read the files' contents to know their md5sum, so the overall process is much faster. Translations ~~~~~~~~~~~~ Both jigdo-file and jigdo use gettext, translations are welcome. Please announce your efforts on the mailing list before you start, to avoid duplication of effort. Note: Don't translate any of the strings beginning with '#'. glade doesn't allow me to selectively switch off the "gettextizing" of some strings, and it would make editing the .glade file awkward if I left those strings empty, so I'm marking them with the '#'. Directory layout ~~~~~~~~~~~~~~~~ src: Files which don't belong into one of the subdirs... Also contains lots of stuff from before the point where I started creating subdirs - much of this should move into a subdir sooner or later. src/net: Low-level network access. This used to use libwww for the actual net access. I switched over to libcurl between 0.7.1 and 0.7.2 and haven't looked back! The only disadvantage of libcurl over libwww is that it doesn't support HTTP pipelining, but libwww really is very difficult to work with. All code accessing libcurl should be isolated in this dir *only*. Files in this dir may make calls to glib, but not gtk. src/job: Application logic A "job" is a certain task, e.g. downloading an URL, processing a .jigdo file etc. For the gtk frontend, one job corresponds to one line in the lower part of the window - but this visualisation is the gtk GUI's decision. All jobs are classes in namespace "Job". Each such class declares an abstract child class (i.e. an interface) named "IO", which the code of the user interface needs to implement. For example, Job::SingleURL::IO is implemented by some code in the src/gtk directory (GtkSingleUrl), which when called sets up some GTK widgets. Files in this dir may make calls to glib, but not gtk. src/gtk: GTK+ graphical user interface Provides a way to access the facilities of "jobs" via GTK widgets etc. This is the only dir whose files may make calls to GTK (and of course also glib). src/util: Independent small helper classes Something goes in here if all of the following is true: - It is a single header file, optionally accompanied by a single .cc file and/or a corresponding *test.cc file - It accesses neither gtk nor glib - It is a small, reusable tool, maybe useful for other projects BTW, I try at all costs only to use C libraries from the code, never C++ libraries. You save yourself a lot of trouble that way, IMHO. Code style ~~~~~~~~~~ You're not required to adhere to my rules here, but maybe you'll find them useful... I use K&R-style indentation, indenting by 2 spaces for each level of indentation. I dislike using tabs in the source because it seems people can never agree on a value for the tab indentation. Emacs can be told never to generate tabs when indenting code; see below. Source files in different directories must not have the same leafname, or depend.awk and also #include will break. (IMHO identical leafnames are also a source of confusion.) Functions/methods should not be much longer than one screen. Source files should never be larger than 1000 lines, and probably not even larger than 500 lines. I'm perfectly aware that the code violates these rules all the time. ;-/ All code must, *MUST* be commented. If possible, larger components should be accompanied by a *-test.cc file which tests the code. If the name of the file is "foo-test.cc" (with a "-"), depend.awk will automatically generate a simple Makefile entry. If that entry is not sufficient, call the file "footest.cc" and manually add a Makefile rule. Lines must not be longer than 77 characters. To format comments to exactly that width, and also never to use tabs for indentation, use this in your ~/.emacs: (add-hook 'c++-mode-hook (lambda () (interactive) (setq fill-column 77) (setq indent-tabs-mode nil))) Don't use reference parameters if a function/method writes to its args, use pointers instead. If I call a function, I want to see that I'm calling it as "function(&data)", so "data" may be modified. Don't use multiple inheritance, except if all but one of the base classes are abstract or trivial (where "trivial" is things like NoCopy, which prevents copying of its derived classes). Don't have identically named classes in different namespaces, big confusion will ensue. (Don't over-use namespaces in the first place.) Header files (.fh, .hh, .ih) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ To avoid excessive recompilation, I use three levels of headers. .fh contain just forward declarations of classes and global functions, and must not include .hh or .ih headers. .hh are the main headers, and .ih files contain functions which are compiled inline for --disable-debug and not inline for --enable-debug. This is done by checking for the definedness of NOINLINE. "make depend" updates the dependencies in src/Makedeps. The contents of this file are then appended to the Makefile by the configure script. See also the comments in scripts/depend.awk. Unit tests ~~~~~~~~~~ Note: Because the unit tests use stringstreams and other useful stuff which GCC 2.95 doesn't provide, they won't work with that version of GCC. I tried out cppunit and had a look at other things, but did not like any of the solutions, because IMHO they are over-engineered and unnecessarily complicated. I prefer creating a new executable for each test, and not to pack all tests into one executable, because this avoids linking problems if you replace some classes with dummy versions for one test, but want to link in the original code for another. My alternative solution is very simple: Each test is an executable which returns zero on success, and non-zero otherwise. You can use a normal Assert() to check for correct results, because it calls abort() if the assertion fails, which will return a non-zero exit value. To add the test unit program in file subdir/foobar-test.cc (which supposedly tests the code in subdir/foobar.cc): - Add a line "#test-deps" to your subdir/foobar-test.cc source file inside a comment. The directive can optionally be followed by a space-separated list of .o files which need to be linked into the executable. $(TEST-DEFAULTOBJS) are added automatically to this list in the Makefile (expands to "util/log.o util/string-utf.o util/debug.o") - Add "subdir/foobar-test@exe@" to the definition of "test-programs" in Makefile.in - "make depend" - this reads the above "#test-deps" directive and creates an appropriate Makefile entry for the unit test. - "make test" to run all tests, in quiet mode, or "make test-v" in verbose mode (if "dir/prog" failed, re-executes it as "dir/prog all" to enable debug info). If --enable-debug was passed to the configure script, the unit tests are also run by "make all" When linking the tests, $(TEST-LDFLAGS) is added to the link command. By convention, if the test is called with exactly one arg, that arg sets up the debug units for which output is to be enabled - just like the arg to the --debug switch of jigdo and jigdo-file. Add support for this in main() as follows: #include int main(int argc, char* argv[]) { if (argc == 2) Logger::scanOptions(argv[1], argv[0]); ... This is useful while testing the unit test: Whereas "make test" will call the test as "subdir/foobar-test", you can execute it as "subdir/foobar-test all" to enable debug output. Have a look at log.hh for the useful debug("fmt", args...) feature. Implementation ~~~~~~~~~~~~~~ The application logic in job/ and net/ is separated from the frontend in gtk/. Each Job corresponds to one action (downloading a file, processing a .jigdo file), the GTK GUI decides to visualize this as one line in the display. The GTK code maintains a list of JobLines, each of which references a Job of some kind. A simple type of job is a SingleURL, which takes care of downloading a single file. A more complicated job is the processing of a .jigdo file. This involves a number of sub-jobs for downloading data. The "master" job is a MakeImageDl, it starts other "child" jobs for the individual downloads. The action of *assembling* the data is not implemented as a job, instead as a MakeImage object which is owned by the MakeImageDl and fed data by it. See makeimage.hh. Important for the UI: Any UI (whether GTK+ or other) is only allowed to create and destroy top-level jobs. Any job created by a "master" MakeImageDl can only be deleted by the same MakeImageDl. This means that the GUI code which handles SingleURL GTK+ visualization must distinguish between a "child mode" (we don't own the SingleUrl) and "non-child mode" (we created and own the SingleUrl; this will only be the case if the UI also allows single-file downloads). It might even be advisable to have two classes for the two cases; ATM the GTK+ frontend doesn't. MakeImageDl ~~~~~~~~~~~ This serves as an example (and it's the most complicated one) of how the frontend in gtk/ is decoupled from the application logic in job/ and other dirs. Basic idea: In the object that makes data available (e.g. SingleUrl in the case of a HTTP/FTP download), maintain a pointer to an IO object. This pointer is stored inside a public IOSource member (e.g. SingleUrl::io). SingleUrl::IO is an abstract class, to be implemented by the "consumer" of the data. This consumer is fed the data via calls to virtual methods like SingleUrl::IO::singleUrl_data(). There are many IO classes; Job::IO is their base (see job.hh). Implementers of SingleUrl::IO include GtkSingleUrl (GTK+ frontend) and JigdoIO (decompression/parsing of .jigdo data.) The consumer registers its IO-implementing object using IOSource::addListener(). There were *lots* of problems with dangling pointers with early versions of the producer-consumer mechanism: Typically, the frontend object would be deleted because of some user intervention, but the pointer to its producer would still be dereferenced to make one last "hey frontend, you are deleting me" (aka job_deleted()) call. This was not un-fixable, but difficult to maintain. So with the current code, all IO implementers are automatically deregistered from their IOSource when they are destroyed. Often, there is more than one consumer. For example, when downloading .jigdo data, a SingleUrl creates the data (more accurately its Download does) and writes it to disc. The MakeImageDl which owns that SingleUrl wants to have a look at the data and interpret the sections inside the .jigdo, which it does by registering a JigdoIO with SingleUrl::io in MakeImageDl::createJigdoDownload() ("io" member inherited from DataSource). In order to get notified of any news, the GtkSingleUrl frontend also registers itself with the SingleUrl in GtkSingleUrl::openOutputAndRun(). In the most complicated scenario, the downloaded data must also be decompressed by the JigdoIO, resulting in the following flow of data - the "->" can be interpreted both as "an IO* points this way" and "information flows this way": Download -> SingleUrl -+-> JigdoIO -> JigdoIO +-> GtkSingleUrl It might be confusing that JigdoIO appears twice; this is because the same JigdoIO object has a double role as consumer of the SingleUrl data and as consumer of the uncompressed data of the Gunzip it owns. IOW, it implements both DataSource::IO and Gunzip::IO. MakeImageDl's job is to start downloads as necessary and to pump their data into the MakeImage object it owns. MakeImage then creates the output image. With childFor(), MakeImageDl is a factory of DataSource objects. A DataSource can be either a download (SingleUrl) or a "dummy" spooler object which just reads data from a file in the cache (CachedUrl). After creation of the DataSource, an appropriate consumer for the downloaded data (e.g. a JigdoIO) is attached to it. JigdoIO and its related classes are helpers of the MakeImageDl. They hold a pointer to their "master" MakeImageDl and notify it whenever downloads fail or succeed. The MakeImageDl can act as necessary, e.g. because the complete information of the .jigdo file is available to it, it can retry the failed download, but use a different server. -- __ _ |_) /| Richard Atterer | \/¯| http://atterer.net ¯ '` ¯