gawk: Derived Files

C.2.4 Why Generated Files Are Kept In Git
-----------------------------------------

If you look at the 'gawk' source in the Git repository, you will notice
that it includes files that are automatically generated by GNU
infrastructure tools, such as 'Makefile.in' from Automake and even
'configure' from Autoconf.

   This differs from many Free Software projects, which do not store
the derived files: leaving them out keeps the repository less cluttered
and makes it easier to see the substantive changes when comparing
versions and trying to understand what changed between commits.

   However, there are several reasons why the 'gawk' maintainer likes to
have everything in the repository.

   First, it is then easy to reproduce any given version completely,
without relying upon the availability of other tools (older, likely
obsolete, and maybe even impossible to find).

   As an extreme example, if you ever even think about trying to
compile, oh, say, the V7 'awk', you will discover that not only do you
have to bootstrap the V7 'yacc' to do so, but you also need the V7
'lex'.  And the latter is pretty much impossible to bring up on a modern
GNU/Linux system.(1)

   (Or, let's say 'gawk' 1.2 required 'bison' whatever-it-was in 1989
and that there was no 'awkgram.c' file in the repository.  Is there a
guarantee that we could find that 'bison' version?  Or that _it_ would
build?)

   If the repository has all the generated files, then it's easy to just
check them out and build.  (Or _easier_, depending upon how far back we
go.)

   And that brings us to the second (and stronger) reason why all the
files really need to be in Git.  It boils down to whom you cater to: the
'gawk' developer(s), or the user who just wants to check out a version
and try it out?

   The 'gawk' maintainer wants it to be possible for any interested
'awk' user in the world to just clone the repository, check out the
branch of interest, and build it, without needing the correct
version(s) of the autotools.(2)  That is the point of the
'bootstrap.sh' file.  It touches the various other files in the right
order such that

     # The canonical incantation for building GNU software:
     ./bootstrap.sh && ./configure && make

will _just work_.
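
   Touching matters because Git does not record file modification
times: after a checkout, a generated file can look older than the file
it was generated from, and 'make' may then try to rerun the autotools.
The following is a minimal sketch of the idea only, not the contents of
the real 'bootstrap.sh'.  The function name 'touch_in_order' and the
file list in the usage comment are assumptions made for illustration.

```shell
#!/bin/sh
# Illustrative sketch only; NOT the real bootstrap.sh from the gawk tree.
# touch_in_order is a hypothetical helper: it refreshes the timestamp of
# each named file in the order given, so that files listed later end up
# newer than files listed earlier.  Missing files are skipped.
touch_in_order() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            touch "$f"
            # Pause so timestamps differ even with one-second resolution.
            sleep 1
        fi
    done
}

# Hypothetical usage: list each input before the output derived from it.
#   touch_in_order aclocal.m4 configure Makefile.in
```

   The one-second pause is a deliberately conservative choice for
filesystems and tools that compare whole seconds only.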

   This is extremely important for the 'master' and 'gawk-X.Y-stable'
branches.

   Further, the 'gawk' maintainer would argue that it's also important
for the 'gawk' developers.  When he tried to check out the 'xgawk'
branch(3) to build it, he couldn't.  (No 'ltmain.sh' file, and he had no
idea how to create it, and that was not the only problem.)

   He felt _extremely_ frustrated.  With respect to that branch, the
maintainer is no different than Jane User who wants to try to build
'gawk-4.1-stable' or 'master' from the repository.

   Thus, the maintainer thinks that it's not just important, but
critical, that for any given branch, the above incantation _just works_.

   A third reason to have all the files is that without them, using 'git
bisect' to try to find the commit that introduced a bug is exceedingly
difficult.  The maintainer tried to do that on another project that
requires running bootstrapping scripts just to create 'configure' and so
on; it was really painful.  When the repository is self-contained, using
'git bisect' in it is very easy.
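
   With a self-contained repository, each bisection step can be
automated with 'git bisect run', which treats exit status 0 as good,
125 as "cannot test this commit, skip it", and any other status from 1
to 127 as bad.  A minimal sketch follows.  The helper 'bisect_step',
its two arguments, and the file names 'step.sh' and 'test-bug.sh' are
hypothetical; none of them comes from the 'gawk' tree.

```shell
#!/bin/sh
# Hypothetical helper for use with 'git bisect run'.
#   $1: command that builds the tree
#   $2: command that exits 0 when the bug is absent
bisect_step() {
    # A commit that cannot be built tells us nothing: ask Git to skip it.
    sh -c "$1" >/dev/null 2>&1 || return 125
    # Otherwise report good (0) or bad (1) according to the test command.
    sh -c "$2" >/dev/null 2>&1 || return 1
    return 0
}

# Hypothetical usage, with this file saved as step.sh (untracked) and a
# made-up reproducer script test-bug.sh:
#   git bisect start BAD-COMMIT GOOD-COMMIT
#   git bisect run sh -c '. ./step.sh;
#       bisect_step "./bootstrap.sh && ./configure && make" ./test-bug.sh'
```

   Returning 125 for build failures keeps an unbuildable commit from
being mistaken for the one that introduced the bug.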

   What are some of the consequences and/or actions to take?

  1. We don't mind that there are differing files in the different
     branches as a result of different versions of the autotools.

       A. It's the maintainer's job to merge them and he will deal with
          it.

       B. He is really good at using 'git diff x y > /tmp/diff1 ; gvim
          /tmp/diff1' to remove the diffs that aren't of interest in
          order to review code.

  2. It would certainly help if everyone used the same versions of the
     GNU tools as he does, which in general are the latest released
     versions of Automake, Autoconf, 'bison', and GNU 'gettext'.

     Installing from source is quite easy.  It's how the maintainer
     worked for years (and still works).  He had '/usr/local/bin' at the
     front of his 'PATH' and just did:

          wget https://ftp.gnu.org/gnu/PACKAGE/PACKAGE-X.Y.Z.tar.gz
          tar -xpzvf PACKAGE-X.Y.Z.tar.gz
          cd PACKAGE-X.Y.Z
          ./configure && make && make check
          make install    # as root

          NOTE: Because of the 'https://' URL, you may have to supply
          the '--no-check-certificate' option to 'wget' to download the
          file.
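
   For developers who prefer not to trim the diff by hand as in item
1.B, Git's 'exclude' pathspec magic can leave generated files out of
the comparison in the first place.  This is only a sketch of an
alternative, not the maintainer's own workflow; the function name
'review_diff' and the list of excluded files are assumptions, and the
list would need adjusting to the branch at hand.

```shell
#!/bin/sh
# Hypothetical helper: diff two revisions while leaving out some files
# that the autotools and bison generate.  The exclusion list below is
# illustrative, not a complete inventory of gawk's generated files.
#   $1, $2: the two revisions to compare
review_diff() {
    git diff "$1" "$2" -- . \
        ':(exclude)configure' \
        ':(exclude)aclocal.m4' \
        ':(exclude)*Makefile.in' \
        ':(exclude)awkgram.c'
}

# Hypothetical usage, run from the top of the working tree:
#   review_diff master gawk-4.1-stable > /tmp/diff1
```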

   Most of the above was originally written by the maintainer to other
'gawk' developers.  It drew the objection from one of the developers
"... that anybody pulling down the source from Git is not an end user."

   However, this is not true.  There are "power 'awk' users" who can
build 'gawk' (using the magic incantation shown previously) but who
can't program in C.  Thus, the major branches should be kept buildable
all the time.

   It was then suggested that there be a 'cron' job to create nightly
tarballs of "the source."  Here, the problem is that there is not one
source tree but several, corresponding to the various branches!  So,
nightly tarballs aren't the answer, especially as the repository can go
for weeks without significant change being introduced.

   Fortunately, the Git server can meet this need.  For any given branch
named BRANCHNAME, use:

     wget https://git.savannah.gnu.org/cgit/gawk.git/snapshot/gawk-BRANCHNAME.tar.gz

to retrieve a snapshot of the given branch.
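
   Since the URL differs only in the branch name, it is easy to wrap in
a small helper.  The sketch below simply fills in the pattern given
above; the function name 'snapshot_url' is hypothetical, and the name
of the unpacked directory in the usage comment is an assumption that
should be checked against the actual tarball.

```shell
#!/bin/sh
# Hypothetical helper: print the snapshot URL for the branch named in $1,
# following the gawk-BRANCHNAME.tar.gz pattern from the text.
snapshot_url() {
    printf 'https://git.savannah.gnu.org/cgit/gawk.git/snapshot/gawk-%s.tar.gz\n' "$1"
}

# Hypothetical usage (needs network access; the unpacked directory name
# is assumed to match the tarball name):
#   wget "$(snapshot_url master)"
#   tar -xzf gawk-master.tar.gz
#   cd gawk-master && ./bootstrap.sh && ./configure && make
```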

   ---------- Footnotes ----------

   (1) We tried.  It was painful.

   (2) There is one GNU program that is (in our opinion) severely
difficult to bootstrap from the Git repository.  For example, on the
author's old (but still working) PowerPC Macintosh with Mac OS X 10.5,
it was necessary to bootstrap a ton of software, starting with Git
itself, in order to try to work with the latest code.  It's not
pleasant, and especially on older systems, it's a big waste of time.

   Starting with the latest tarball was no picnic either.  The
maintainers had dropped '.gz' and '.bz2' files and only distribute
'.tar.xz' files.  It was necessary to bootstrap 'xz' first!

   (3) A branch (since removed) created by one of the other developers
that did not include the generated files.