gawk: Passwd Functions

1 
1 10.5 Reading the User Database
1 ==============================
1 
1 The 'PROCINFO' array (⇒Built-in Variables) provides access to the
1 current user's real and effective user and group ID numbers, and, if
1 available, the user's supplementary group set.  However, because these
1 are numbers, they do not provide very useful information to the average
1 user.  There needs to be some way to find the user information
1 associated with the user and group ID numbers.  This section presents
1 a suite of functions for retrieving information from the user database.
1 See ⇒Group Functions for a similar suite that retrieves information
1 from the group database.
1 
1    The POSIX standard does not define the file where user information is
1 kept.  Instead, it provides the '<pwd.h>' header file and several C
1 language subroutines for obtaining user information.  The primary
1 function is 'getpwent()', for "get password entry."  The "password"
1 comes from the original user database file, '/etc/passwd', which stores
1 user information along with the encrypted passwords (hence the name).
1 
1    Although an 'awk' program could simply read '/etc/passwd' directly,
1 this file may not contain complete information about the system's set of
1 users.(1)  To be sure you are able to produce a readable and complete
1 version of the user database, it is necessary to write a small C program
1 that calls 'getpwent()'.  'getpwent()' is defined as returning a pointer
1 to a 'struct passwd'.  Each time it is called, it returns the next entry
1 in the database.  When there are no more entries, it returns 'NULL', the
1 null pointer.  When this happens, the C program should call 'endpwent()'
1 to close the database.  Following is 'pwcat', a C program that "cats"
1 the password database:
1 
1      /*
1       * pwcat.c
1       *
1       * Generate a printable version of the password database.
1       */
1      #include <stdio.h>
1      #include <pwd.h>
1 
1      int
1      main(int argc, char **argv)
1      {
1          struct passwd *p;
1 
1          while ((p = getpwent()) != NULL)
1              printf("%s:%s:%ld:%ld:%s:%s:%s\n",
1                  p->pw_name, p->pw_passwd, (long) p->pw_uid,
1                  (long) p->pw_gid, p->pw_gecos, p->pw_dir, p->pw_shell);
1 
1          endpwent();
1          return 0;
1      }
1 
1    If you don't understand C, don't worry about it.  The output from
1 'pwcat' is the user database, in the traditional '/etc/passwd' format of
1 colon-separated fields.  The fields are:
1 
1 Login name
1      The user's login name.
1 
1 Encrypted password
1      The user's encrypted password.  This may not be available on some
1      systems.
1 
1 User-ID
1      The user's numeric user ID number.  (On some systems, it's a C
1      'long', and not an 'int'.  Thus, we cast it to 'long' for all
1      cases.)
1 
1 Group-ID
1      The user's numeric group ID number.  (Similar comments about 'long'
1      versus 'int' apply here.)
1 
1 Full name
1      The user's full name, and perhaps other information associated with
1      the user.
1 
1 Home directory
1      The user's login (or "home") directory (familiar to shell
1      programmers as '$HOME').
1 
1 Login shell
1      The program that is run when the user logs in.  This is usually a
1      shell, such as Bash.
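
   As a minimal sketch of how these seven fields map onto an 'awk'
array, the following splits one entry on colons and stores the pieces
under descriptive keys.  The function name 'pw_parse()' and the key
names are hypothetical, chosen only for this illustration; they are not
part of the library presented in this section.

```awk
# pw_parse --- split one passwd-format entry into named fields.
# Hypothetical helper for illustration; not part of passwd.awk.
# Returns the number of fields found (7 for a well-formed entry).
function pw_parse(entry, rec,    f, n)
{
    delete rec
    n = split(entry, f, ":")
    rec["name"]   = f[1]
    rec["passwd"] = f[2]
    rec["uid"]    = f[3]
    rec["gid"]    = f[4]
    rec["gecos"]  = f[5]    # full name and related information
    rec["dir"]    = f[6]
    rec["shell"]  = f[7]
    return n
}

BEGIN {
    pw_parse("arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh", u)
    printf "%s (uid %d) logs in with %s\n", u["gecos"], u["uid"], u["shell"]
}
```

   Empty fields, such as a missing full name or login shell, simply
come back as null strings, so callers can test them with '== ""'.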
1 
1    A few lines representative of 'pwcat''s output are as follows:
1 
1      $ pwcat
1      -| root:x:0:1:Operator:/:/bin/sh
1      -| nobody:*:65534:65534::/:
1      -| daemon:*:1:1::/:
1      -| sys:*:2:2::/:/bin/csh
1      -| bin:*:3:3::/bin:
1      -| arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh
1      -| miriam:yxaay:112:10:Miriam Robbins:/home/miriam:/bin/sh
1      -| andy:abcca2:113:10:Andy Jacobs:/home/andy:/bin/sh
1      ...
1 
1    With that introduction, following is a group of functions for getting
1 user information.  There are several functions here, corresponding to
1 the C functions of the same names:
1 
1      # passwd.awk --- access password file information
1 
1      BEGIN {
1          # tailor this to suit your system
1          _pw_awklib = "/usr/local/libexec/awk/"
1      }
1 
1      function _pw_init(    oldfs, oldrs, olddol0, pwcat, using_fw, using_fpat)
1      {
1          if (_pw_inited)
1              return
1 
1          oldfs = FS
1          oldrs = RS
1          olddol0 = $0
1          using_fw = (PROCINFO["FS"] == "FIELDWIDTHS")
1          using_fpat = (PROCINFO["FS"] == "FPAT")
1          FS = ":"
1          RS = "\n"
1 
1          pwcat = _pw_awklib "pwcat"
1          while ((pwcat | getline) > 0) {
1              _pw_byname[$1] = $0
1              _pw_byuid[$3] = $0
1              _pw_bycount[++_pw_total] = $0
1          }
1          close(pwcat)
1          _pw_count = 0
1          _pw_inited = 1
1          FS = oldfs
1          if (using_fw)
1              FIELDWIDTHS = FIELDWIDTHS
1          else if (using_fpat)
1              FPAT = FPAT
1          RS = oldrs
1          $0 = olddol0
1      }
1 
1    The 'BEGIN' rule sets a private variable to the directory where
1 'pwcat' is stored.  Because it is used to help out an 'awk' library
1 routine, we have chosen to put it in '/usr/local/libexec/awk'; however,
1 you might want it to be in a different directory on your system.
1 
1    The function '_pw_init()' stores three copies of the user information
1 in three associative arrays.  The arrays are indexed by username
1 ('_pw_byname'), by user ID number ('_pw_byuid'), and by order of
1 occurrence ('_pw_bycount').  The variable '_pw_inited' is used for
1 efficiency, as the database needs to be read only once.
1 
1    Because this function uses 'getline' to read information from
1 'pwcat', it first saves the values of 'FS', 'RS', and '$0'.  It also
1 records in the variable 'using_fw' whether field splitting with
1 'FIELDWIDTHS' is in effect.  Doing so is necessary, as these functions
1 could be called from anywhere within a user's program, and the user may
1 have their own way of splitting records and fields.  Saving this state
1 makes it possible to restore the correct field-splitting mechanism
1 later.  The 'using_fw' test can only be true for 'gawk'.  It is false
1 if using 'FS' or 'FPAT', or on some other 'awk' implementation.
1 
1    The check for 'FPAT', which uses 'using_fpat' and 'PROCINFO["FS"]',
1 is similar.
1 
1    The main part of the function uses a loop to read database lines,
1 split the lines into fields, and then store the lines into each array as
1 necessary.  When the loop is done, '_pw_init()' cleans up by closing the
1 pipeline, setting '_pw_inited' to one, and restoring 'FS' (and
1 'FIELDWIDTHS' or 'FPAT' if necessary), 'RS', and '$0'.  The use of
1 '_pw_count' is explained shortly.
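
   The save-and-restore steps can be seen in isolation in the following
sketch.  It assumes a hypothetical helper, 'colon_field()', that
temporarily switches to colon-separated fields, extracts one field from
a string, and then puts the caller's record and field-splitting
mechanism back.  It is an illustration of the technique, not part of
the library.

```awk
# colon_field --- return field N of LINE, split on colons, without
# disturbing the caller's record or field-splitting setup.
# Hypothetical helper; it mirrors the save/restore steps of _pw_init().
function colon_field(line, n,    oldfs, olddol0, using_fw, using_fpat, result)
{
    oldfs = FS
    olddol0 = $0
    # PROCINFO["FS"] is set only by gawk; elsewhere both tests are false
    using_fw = (PROCINFO["FS"] == "FIELDWIDTHS")
    using_fpat = (PROCINFO["FS"] == "FPAT")

    FS = ":"
    $0 = line               # resplit using the temporary FS
    result = $n

    FS = oldfs
    if (using_fw)
        FIELDWIDTHS = FIELDWIDTHS   # assignment reselects FIELDWIDTHS mode
    else if (using_fpat)
        FPAT = FPAT                 # likewise for FPAT
    $0 = olddol0            # resplit with the caller's mechanism
    return result
}

BEGIN {
    FS = ","
    $0 = "one,two,three"
    print colon_field("sys:*:2:2::/:/bin/csh", 7)
    print $2, NF            # the caller's record is intact
}
```

   The order of the restoration matters: '$0' is assigned last, so that
the record is split again with the caller's own mechanism rather than
with the temporary colon separator.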
1 
1    The 'getpwnam()' function takes a username as a string argument.  If
1 that user is in the database, it returns the appropriate line.
1 Otherwise, it relies on the array reference to a nonexistent element to
1 create the element with the null string as its value:
1 
1      function getpwnam(name)
1      {
1          _pw_init()
1          return _pw_byname[name]
1      }
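
   The side effect mentioned above can be observed directly.  The
following sketch compares a plain array reference, as used by
'getpwnam()', with a guarded lookup that tests membership first.  The
function 'pw_lookup()' and the array 'db' are hypothetical and serve
only as a comparison; the library itself uses the plain reference.

```awk
# pw_lookup --- return TABLE[KEY], or "" if KEY is absent, without
# creating TABLE[KEY].  Hypothetical variant, shown for comparison only.
function pw_lookup(table, key)
{
    return (key in table) ? table[key] : ""
}

BEGIN {
    db["sys"] = "sys:*:2:2::/:/bin/csh"

    entry = db["nosuch"]            # plain reference, as in getpwnam()
    print "plain:   empty=" (entry == "") " created=" ("nosuch" in db)

    entry = pw_lookup(db, "ghost")  # guarded reference
    print "guarded: empty=" (entry == "") " created=" ("ghost" in db)
}
```

   Both forms return the null string for an unknown name.  The plain
reference is shorter, at the cost of leaving an empty element behind
for every name that was looked up and not found.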
1 
1    Similarly, the 'getpwuid()' function takes a user ID number argument.
1 If that user number is in the database, it returns the appropriate line.
1 Otherwise, it returns the null string:
1 
1      function getpwuid(uid)
1      {
1          _pw_init()
1          return _pw_byuid[uid]
1      }
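
   A typical use of 'getpwuid()' is to turn a numeric user ID into a
login name.  The sketch below assumes a hypothetical helper,
'uid2name()', and includes a small stand-in for 'getpwuid()' with two
sample entries so that it can run by itself.  With the real library
loaded, the stand-in would be left out.

```awk
# Stand-in for the library's getpwuid(), so this sketch runs by itself.
# Assumption: with passwd.awk loaded via -f, omit this function.
function getpwuid(uid)
{
    _demo_byuid[0] = "root:x:0:1:Operator:/:/bin/sh"
    _demo_byuid[2076] = "arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh"
    return _demo_byuid[uid]
}

# uid2name --- map a numeric user ID to a login name, falling back to
# the number itself when the ID is unknown.  Hypothetical helper.
function uid2name(uid,    entry, f)
{
    entry = getpwuid(uid)
    if (entry == "")
        return uid ""
    split(entry, f, ":")
    return f[1]
}

BEGIN {
    print uid2name(2076)
    print uid2name(4242)    # not in the sample data
}
```

   In 'gawk', passing 'PROCINFO["uid"]' to such a helper would name the
user running the program.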
1 
1    The 'getpwent()' function simply steps through the database, one
1 entry at a time.  It uses '_pw_count' to track its current position in
1 the '_pw_bycount' array:
1 
1      function getpwent()
1      {
1          _pw_init()
1          if (_pw_count < _pw_total)
1              return _pw_bycount[++_pw_count]
1          return ""
1      }
1 
1    The 'endpwent()' function resets '_pw_count' to zero, so that
1 subsequent calls to 'getpwent()' start over again:
1 
1      function endpwent()
1      {
1          _pw_count = 0
1      }
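
   Together, 'getpwent()' and 'endpwent()' support a simple scanning
loop.  The following sketch counts the users with a given login shell.
The helper 'count_shell()' is hypothetical, and the first part of the
block is a stand-in for the library, loaded with three sample entries
instead of the output of 'pwcat', so that the example runs by itself.

```awk
# --- Stand-in for passwd.awk, so this sketch runs on its own. ---
# Assumption: with the real library loaded via -f, omit this part.
function _pw_init()
{
    if (_pw_inited)
        return
    _pw_bycount[++_pw_total] = "root:x:0:1:Operator:/:/bin/sh"
    _pw_bycount[++_pw_total] = "sys:*:2:2::/:/bin/csh"
    _pw_bycount[++_pw_total] = "arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh"
    _pw_count = 0
    _pw_inited = 1
}

function getpwent()
{
    _pw_init()
    if (_pw_count < _pw_total)
        return _pw_bycount[++_pw_count]
    return ""
}

function endpwent()
{
    _pw_count = 0
}

# --- The usage sketch itself. ---
# count_shell --- count the users whose login shell is SHELL.
function count_shell(shell,    entry, f, n)
{
    endpwent()                  # start from the first entry
    while ((entry = getpwent()) != "") {
        split(entry, f, ":")
        if (f[7] == shell)
            n++
    }
    endpwent()                  # rewind for later callers
    return n + 0
}

BEGIN {
    print count_shell("/bin/sh")
}
```

   Calling 'endpwent()' both before and after the loop is a defensive
choice in this sketch: the scan then does not depend on, or disturb,
the position left by other code that also calls 'getpwent()'.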
1 
1    A conscious design decision in this suite is that each subroutine
1 calls '_pw_init()' to initialize the database arrays.  The overhead of
1 running a separate process to generate the user database, and the I/O to
1 scan it, are only incurred if the user's main program actually calls one
1 of these functions.  If this library file is loaded along with a user's
1 program, but none of the routines are ever called, then there is no
1 extra runtime overhead.  (The alternative is to move the body of
1 '_pw_init()' into a 'BEGIN' rule, which always runs 'pwcat'.  This
1 simplifies the code but runs an extra process that may never be needed.)
1 
1    In turn, calling '_pw_init()' is not too expensive, because the
1 '_pw_inited' variable keeps the program from reading the data more than
1 once.  If you are worried about squeezing every last cycle out of your
1 'awk' program, the check of '_pw_inited' could be moved out of
1 '_pw_init()' and duplicated in all the other functions.  In practice,
1 this is not necessary, as most 'awk' programs are I/O-bound, and such a
1 change would clutter up the code.
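
   For readers curious what the inlined check would look like, here is
a sketch of 'getpwnam()' rewritten that way.  The '_pw_init()' shown is
a stand-in that loads one sample entry rather than running 'pwcat', and
'_pw_init_calls' is a hypothetical counter added only to make the
effect visible.

```awk
# Stand-in for _pw_init() that counts how often it is called.
# Assumption: sample data replaces the pipe from pwcat so the sketch
# runs alone; _pw_init_calls is a hypothetical counter for the demo.
function _pw_init()
{
    _pw_init_calls++
    if (_pw_inited)
        return
    _pw_byname["sys"] = "sys:*:2:2::/:/bin/csh"
    _pw_inited = 1
}

# Variant of getpwnam() with the _pw_inited check duplicated inline,
# which saves a function call on every lookup after the first.
function getpwnam(name)
{
    if (! _pw_inited)
        _pw_init()
    return _pw_byname[name]
}

BEGIN {
    getpwnam("sys")
    getpwnam("sys")
    getpwnam("nobody")
    print _pw_init_calls    # _pw_init() was entered only once
}
```

   The saving is one function call per lookup, which is rarely
measurable next to the cost of the program's own I/O.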
1 
1    The 'id' program in ⇒Id Program uses these functions.
1 
1    ---------- Footnotes ----------
1 
1    (1) It is often the case that password information is stored in a
1 network database.
1