gawk: Derived Files

C.2.4 Why Generated Files Are Kept In Git
-----------------------------------------

If you look at the 'gawk' source in the Git repository, you will notice
that it includes files that are automatically generated by GNU
infrastructure tools, such as 'Makefile.in' from Automake and even
'configure' from Autoconf.

This is different from many Free Software projects that do not store
the derived files, because that keeps the repository less cluttered, and
it is easier to see the substantive changes when comparing versions and
trying to understand what changed between commits.

However, there are several reasons why the 'gawk' maintainer likes to
have everything in the repository.
1
First, it is then easy to reproduce any given version completely,
without relying upon the availability of (older, likely obsolete, and
maybe even impossible to find) other tools.

As an extreme example, if you ever even think about trying to
compile, oh, say, the V7 'awk', you will discover that not only do you
have to bootstrap the V7 'yacc' to do so, but you also need the V7
'lex'. And the latter is pretty much impossible to bring up on a modern
GNU/Linux system.(1)

(Or, let's say 'gawk' 1.2 required 'bison' whatever-it-was in 1989
and that there was no 'awkgram.c' file in the repository. Is there a
guarantee that we could find that 'bison' version? Or that _it_ would
build?)

If the repository has all the generated files, then it's easy to just
check them out and build. (Or _easier_, depending upon how far back we
go.)
1
And that brings us to the second (and stronger) reason why all the
files really need to be in Git. It boils down to a question: whom do
you cater to--the 'gawk' developer(s), or the user who just wants to
check out a version and try it out?

The 'gawk' maintainer wants it to be possible for any interested
'awk' user in the world to just clone the repository, check out the
branch of interest and build it, without having to have the correct
version(s) of the autotools.(2) That is the point of the
'bootstrap.sh' file. It touches the various other files in the right
order such that

     # The canonical incantation for building GNU software:
     ./bootstrap.sh && ./configure && make

will _just work_.
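The order matters because Git does not record file modification times,
while 'make' rebuilds a target whenever one of its prerequisites looks
newer than the target. The following is a minimal sketch of the idea
only, not the contents of the real 'bootstrap.sh': the function name
'touch_in_order' and the file list in the usage comment are assumptions
made for illustration.

```shell
#!/bin/sh
# touch_in_order FILE...
# Refresh the modification time of each listed file, in the order given,
# so that every file ends up newer than the files named before it.
# Files that do not exist are skipped.
touch_in_order () {
    for f in "$@"; do
        if [ -f "$f" ]; then
            touch "$f"
            sleep 1   # keep timestamps distinct where resolution is one second
        fi
    done
}

# Hypothetical usage (the list and its order are assumed, not taken from
# the real script): prerequisites first, then the files built from them.
#   touch_in_order aclocal.m4 configure Makefile.in
```

With the generated files newer than their inputs, 'make' has no reason
to invoke the autotools, so they need not be installed.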
1
This is extremely important for the 'master' and 'gawk-X.Y-stable'
branches.

Further, the 'gawk' maintainer would argue that it's also important
for the 'gawk' developers. When he tried to check out the 'xgawk'
branch(3) to build it, he couldn't. (No 'ltmain.sh' file, and he had no
idea how to create it, and that was not the only problem.)

He felt _extremely_ frustrated. With respect to that branch, the
maintainer is no different from Jane User who wants to try to build
'gawk-4.1-stable' or 'master' from the repository.

Thus, the maintainer thinks that it's not just important, but
critical, that for any given branch, the above incantation _just works_.
1
A third reason to have all the files is that without them, using 'git
bisect' to try to find the commit that introduced a bug is exceedingly
difficult. The maintainer tried to do that on another project that
requires running bootstrapping scripts just to create 'configure' and so
on; it was really painful. When the repository is self-contained, using
'git bisect' in it is very easy.
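When every commit builds with the same incantation, the search can be
automated with 'git bisect run', which decides each step from a
command's exit status: 0 marks the commit good, 125 asks Git to skip
it, and any other status from 1 to 127 marks it bad. A minimal sketch,
assuming a hypothetical helper named 'bisect_step' and a hypothetical
reproducer file '/tmp/repro.awk'; neither comes from the 'gawk' sources:

```shell
#!/bin/sh
# bisect_step BUILD_CMD TEST_CMD
# Run BUILD_CMD, then TEST_CMD, and translate the results into the exit
# statuses that 'git bisect run' understands.
bisect_step () {
    sh -c "$1" || return 125   # build failed: this commit cannot be tested
    sh -c "$2" || return 1     # test failed: the bug is present
    return 0                   # test passed: the bug is absent
}

# Hypothetical usage.  Save this file outside the working tree (say, as
# /tmp/bisect-step.sh), append the call below to it, and then run:
#   git bisect start BAD-COMMIT GOOD-COMMIT
#   git bisect run sh /tmp/bisect-step.sh
#
#   bisect_step './bootstrap.sh && ./configure && make' \
#               './gawk -f /tmp/repro.awk'
```

Returning 125 for a failed build keeps an unbuildable commit from being
mistaken for the one that introduced the bug.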
1
What are some of the consequences and/or actions to take?

  1. We don't mind that there are differing files in the different
     branches as a result of different versions of the autotools.

       A. It's the maintainer's job to merge them and he will deal with
          it.

       B. He is really good at 'git diff x y > /tmp/diff1 ; gvim
          /tmp/diff1' to remove the diffs that aren't of interest in
          order to review code.

  2. It would certainly help if everyone used the same versions of the
     GNU tools as he does, which in general are the latest released
     versions of Automake, Autoconf, 'bison', and GNU 'gettext'.

     Installing from source is quite easy. It's how the maintainer
     worked for years (and still works). He had '/usr/local/bin' at the
     front of his 'PATH' and just did:

          wget https://ftp.gnu.org/gnu/PACKAGE/PACKAGE-X.Y.Z.tar.gz
          tar -xpzvf PACKAGE-X.Y.Z.tar.gz
          cd PACKAGE-X.Y.Z
          ./configure && make && make check
          make install    # as root

          NOTE: Because of the 'https://' URL, you may have to supply
          the '--no-check-certificate' option to 'wget' to download the
          file.
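The manual pruning described in item 1.B can be partly automated. The
sketch below uses 'awk' to drop the per-file sections of a 'git diff'
that belong to generated files. The function name 'drop_generated' and
the list of file names are assumptions for illustration (only
'configure' and 'Makefile.in' are named in the text above), so adjust
the pattern as needed.

```shell
#!/bin/sh
# drop_generated
# Filter 'git diff' output on standard input, omitting the sections for
# generated files.  Each section starts with a 'diff --git a/X b/X' line.
drop_generated () {
    awk '
        # At each file header, decide whether to skip that file section.
        /^diff --git / {
            skip = ($0 ~ /\/(configure|Makefile\.in|aclocal\.m4)$/)
        }
        !skip    # print every line of the sections we keep
    '
}

# Hypothetical usage, with x and y standing for two commits or branches:
#   git diff x y | drop_generated > /tmp/diff1
```

The pattern is anchored at the end of the header line, so a
hand-written file such as 'configure.ac' still appears in the output.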
1
Most of the above was originally written by the maintainer to other
'gawk' developers. It raised the objection from one of the developers
"... that anybody pulling down the source from Git is not an end user."

However, this is not true. There are "power 'awk' users" who can
build 'gawk' (using the magic incantation shown previously) but who
can't program in C. Thus, the major branches should be kept buildable
all the time.

It was then suggested that there be a 'cron' job to create nightly
tarballs of "the source." Here, the problem is that there are multiple
source trees, corresponding to the various branches! So, nightly
tarballs aren't the answer, especially as the repository can go for
weeks without significant change being introduced.
1
Fortunately, the Git server can meet this need. For any given branch
named BRANCHNAME, use:

     wget https://git.savannah.gnu.org/cgit/gawk.git/snapshot/gawk-BRANCHNAME.tar.gz

to retrieve a snapshot of the given branch.
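For scripted use, the URL can be built from the branch name. A minimal
sketch following the URL pattern shown above; the function names
'snapshot_url' and 'fetch_snapshot' and the local file name
'gawk-snapshot.tar.gz' are hypothetical, and a branch name containing a
slash may need different handling:

```shell
#!/bin/sh
# snapshot_url BRANCHNAME
# Print the snapshot URL for the given branch.
snapshot_url () {
    printf 'https://git.savannah.gnu.org/cgit/gawk.git/snapshot/gawk-%s.tar.gz\n' "$1"
}

# fetch_snapshot BRANCHNAME
# Download the snapshot into the current directory and unpack it.
fetch_snapshot () {
    wget -O gawk-snapshot.tar.gz "$(snapshot_url "$1")" &&
        tar -xzf gawk-snapshot.tar.gz
}

# Hypothetical usage:
#   fetch_snapshot master
```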
1
---------- Footnotes ----------

(1) We tried. It was painful.

(2) There is one GNU program that is (in our opinion) severely
difficult to bootstrap from the Git repository. For example, on the
author's old (but still working) PowerPC Macintosh with Mac OS X 10.5,
it was necessary to bootstrap a ton of software, starting with Git
itself, in order to try to work with the latest code. It's not
pleasant, and especially on older systems, it's a big waste of time.

Starting with the latest tarball was no picnic either. The
maintainers had dropped '.gz' and '.bz2' files and only distribute
'.tar.xz' files. It was necessary to bootstrap 'xz' first!

(3) A branch (since removed) created by one of the other developers
that did not include the generated files.