gawk: Passwd Functions

10.5 Reading the User Database
==============================

The 'PROCINFO' array (see ⇒Built-in Variables) provides access to the
current user's real and effective user and group ID numbers, and, if
available, the user's supplementary group set. However, because these
are numbers, they do not provide very useful information to the average
user. There needs to be some way to find the user information
associated with the user and group ID numbers. This minor node presents
a suite of functions for retrieving information from the user database.
See ⇒Group Functions for a similar suite that retrieves information
from the group database.

The POSIX standard does not define the file where user information is
kept. Instead, it provides the '<pwd.h>' header file and several C
language subroutines for obtaining user information. The primary
function is 'getpwent()', for "get password entry." The "password"
comes from the original user database file, '/etc/passwd', which stores
user information along with the encrypted passwords (hence the name).

Although an 'awk' program could simply read '/etc/passwd' directly,
this file may not contain complete information about the system's set of
users.(1) To be sure you are able to produce a readable and complete
version of the user database, it is necessary to write a small C program
that calls 'getpwent()'. 'getpwent()' is defined as returning a pointer
to a 'struct passwd'. Each time it is called, it returns the next entry
in the database. When there are no more entries, it returns 'NULL', the
null pointer. When this happens, the C program should call 'endpwent()'
to close the database. Following is 'pwcat', a C program that "cats"
the password database:

     /*
      * pwcat.c
      *
      * Generate a printable version of the password database.
      */
     #include <stdio.h>
     #include <pwd.h>

     int
     main(int argc, char **argv)
     {
         struct passwd *p;

         while ((p = getpwent()) != NULL)
             printf("%s:%s:%ld:%ld:%s:%s:%s\n",
                 p->pw_name, p->pw_passwd, (long) p->pw_uid,
                 (long) p->pw_gid, p->pw_gecos, p->pw_dir, p->pw_shell);

         endpwent();
         return 0;
     }

If you don't understand C, don't worry about it. The output from
'pwcat' is the user database, in the traditional '/etc/passwd' format of
colon-separated fields. The fields are:

Login name
     The user's login name.

Encrypted password
     The user's encrypted password. This may not be available on some
     systems.

User-ID
     The user's numeric user ID number. (On some systems, it's a C
     'long', and not an 'int'. Thus, we cast it to 'long' for all
     cases.)

Group-ID
     The user's numeric group ID number. (Similar comments about 'long'
     versus 'int' apply here.)

Full name
     The user's full name, and perhaps other information associated with
     the user.

Home directory
     The user's login (or "home") directory (familiar to shell
     programmers as '$HOME').

Login shell
     The program that is run when the user logs in. This is usually a
     shell, such as Bash.
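
Because every record uses this same colon-separated layout, an 'awk'
program can take one apart with 'split()'. The following is a minimal
sketch and not part of the library: the helper name 'pw_field()' is
hypothetical, and the record is copied from this section's sample
'pwcat' output.

```awk
# pw_field --- return field number n (1 through 7) of a passwd-format record
# Hypothetical helper for illustration; not part of passwd.awk.

function pw_field(rec, n,    parts)
{
    split(rec, parts, ":")
    return parts[n]
}

BEGIN {
    rec = "arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh"
    printf "%s (UID %d) logs in to %s running %s\n",
        pw_field(rec, 1), pw_field(rec, 3),
        pw_field(rec, 6), pw_field(rec, 7)
}
```

Run on its own, this prints 'arnold (UID 2076) logs in to /home/arnold
running /bin/sh'. Empty fields, such as a missing full name, come back
as the null string.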

A few lines representative of 'pwcat''s output are as follows:

     $ pwcat
     -| root:x:0:1:Operator:/:/bin/sh
     -| nobody:*:65534:65534::/:
     -| daemon:*:1:1::/:
     -| sys:*:2:2::/:/bin/csh
     -| bin:*:3:3::/bin:
     -| arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh
     -| miriam:yxaay:112:10:Miriam Robbins:/home/miriam:/bin/sh
     -| andy:abcca2:113:10:Andy Jacobs:/home/andy:/bin/sh
     ...

With that introduction, following is a group of functions for getting
user information. There are several functions here, corresponding to
the C functions of the same names:

     # passwd.awk --- access password file information

     BEGIN {
         # tailor this to suit your system
         _pw_awklib = "/usr/local/libexec/awk/"
     }

     function _pw_init(    oldfs, oldrs, olddol0, pwcat, using_fw, using_fpat)
     {
         if (_pw_inited)
             return

         oldfs = FS
         oldrs = RS
         olddol0 = $0
         using_fw = (PROCINFO["FS"] == "FIELDWIDTHS")
         using_fpat = (PROCINFO["FS"] == "FPAT")
         FS = ":"
         RS = "\n"

         pwcat = _pw_awklib "pwcat"
         while ((pwcat | getline) > 0) {
             _pw_byname[$1] = $0
             _pw_byuid[$3] = $0
             _pw_bycount[++_pw_total] = $0
         }
         close(pwcat)
         _pw_count = 0
         _pw_inited = 1
         FS = oldfs
         if (using_fw)
             FIELDWIDTHS = FIELDWIDTHS
         else if (using_fpat)
             FPAT = FPAT
         RS = oldrs
         $0 = olddol0
     }

The 'BEGIN' rule sets a private variable to the directory where
'pwcat' is stored. Because it is used to help out an 'awk' library
routine, we have chosen to put it in '/usr/local/libexec/awk'; however,
you might want it to be in a different directory on your system.

The function '_pw_init()' stores three copies of the user information
in three associative arrays. The arrays are indexed by username
('_pw_byname'), by user ID number ('_pw_byuid'), and by order of
occurrence ('_pw_bycount'). The variable '_pw_inited' is used for
efficiency, as '_pw_init()' needs to be called only once.

Because this function uses 'getline' to read information from
'pwcat', it first saves the values of 'FS', 'RS', and '$0'. It notes in
the variable 'using_fw' whether field splitting with 'FIELDWIDTHS' is in
effect or not. Doing so is necessary, as these functions could be
called from anywhere within a user's program, and the user may have his
or her own way of splitting records and fields. This makes it possible
to restore the correct field-splitting mechanism later. The test can
only be true for 'gawk'. It is false if using 'FS' or 'FPAT', or on
some other 'awk' implementation.

The code that checks whether 'FPAT' is in use, by way of 'using_fpat'
and 'PROCINFO["FS"]', is similar.

The main part of the function uses a loop to read database lines,
split the lines into fields, and then store the lines into each array as
necessary. When the loop is done, '_pw_init()' cleans up by closing the
pipeline, setting '_pw_inited' to one, and restoring 'FS' (and
'FIELDWIDTHS' or 'FPAT' if necessary), 'RS', and '$0'. The use of
'_pw_count' is explained shortly.
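
The save-and-restore pattern can be tried in isolation. The following
is a minimal sketch under stated assumptions: 'colon_field()' is a
hypothetical helper, the command 'echo root:x:0' merely stands in for
'pwcat', and the 'FIELDWIDTHS' and 'FPAT' handling is left out for
brevity.

```awk
# colon_field --- run cmd, return field n of its first output line,
# and leave the caller's FS, RS, and $0 as they were.
# Hypothetical helper for illustration; it ignores FIELDWIDTHS and FPAT.

function colon_field(cmd, n,    oldfs, oldrs, olddol0, result)
{
    oldfs = FS
    oldrs = RS
    olddol0 = $0

    FS = ":"
    RS = "\n"
    if ((cmd | getline) > 0)    # sets $0 and NF, so $n is now valid
        result = $n
    close(cmd)

    FS = oldfs
    RS = oldrs
    $0 = olddol0                # resplit the caller's record with its FS
    return result
}

BEGIN {
    $0 = "one two three"
    print colon_field("echo root:x:0", 1)
    print $2
}
```

This prints 'root' and then 'two', showing that the caller's record and
its fields survive the call. Note that the plain form of 'getline' used
here also increments 'NR', which neither this sketch nor '_pw_init()'
restores.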

The 'getpwnam()' function takes a username as a string argument. If
that user is in the database, it returns the appropriate line.
Otherwise, it relies on the array reference to a nonexistent element to
create the element with the null string as its value:

     function getpwnam(name)
     {
         _pw_init()
         return _pw_byname[name]
     }
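
One consequence of this approach is that a failed lookup leaves an
empty element behind, so the name afterwards tests as present with the
'in' operator. The following stand-alone sketch shows the effect. The
array 'byname' is a small stand-in for '_pw_byname', and
'lookup_quiet()' is a hypothetical alternative, not part of the library:

```awk
# lookup_quiet --- like getpwnam(), but test with "in" first so that a
# failed lookup does not add an empty element to the array.
# Hypothetical helper for illustration only.

function lookup_quiet(table, name)
{
    if (name in table)
        return table[name]
    return ""
}

BEGIN {
    byname["arnold"] = "arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh"

    rec = lookup_quiet(byname, "nosuchuser")
    printf "empty=%d present=%d\n", (rec == ""), ("nosuchuser" in byname)

    rec = byname["nosuchuser"]      # a plain reference creates the element
    printf "empty=%d present=%d\n", (rec == ""), ("nosuchuser" in byname)
}
```

This prints 'empty=1 present=0' and then 'empty=1 present=1'. For this
library the extra element is harmless, because a later lookup of the
same name still returns the null string.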

Similarly, the 'getpwuid()' function takes a user ID number argument.
If that user number is in the database, it returns the appropriate line.
Otherwise, it returns the null string:

     function getpwuid(uid)
     {
         _pw_init()
         return _pw_byuid[uid]
     }

The 'getpwent()' function simply steps through the database, one
entry at a time. It uses '_pw_count' to track its current position in
the '_pw_bycount' array:

     function getpwent()
     {
         _pw_init()
         if (_pw_count < _pw_total)
             return _pw_bycount[++_pw_count]
         return ""
     }

The 'endpwent()' function resets '_pw_count' to zero, so that
subsequent calls to 'getpwent()' start over again:

     function endpwent()
     {
         _pw_count = 0
     }
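
Taken together, 'getpwent()' and 'endpwent()' let a program scan the
whole database and then rewind it. The sketch below is self-contained
so that it can be run without 'pwcat': its '_pw_init()' is a stand-in
that loads three records from this section's sample output, and
'count_shell()' is a hypothetical caller. With the real library loaded,
only 'count_shell()' would be needed.

```awk
# Stand-in for the library's _pw_init(): load sample records instead of
# running pwcat.  For illustration only.
function _pw_init()
{
    if (_pw_inited)
        return
    _pw_bycount[++_pw_total] = "root:x:0:1:Operator:/:/bin/sh"
    _pw_bycount[++_pw_total] = "sys:*:2:2::/:/bin/csh"
    _pw_bycount[++_pw_total] = "arnold:xyzzy:2076:10:Arnold Robbins:/home/arnold:/bin/sh"
    _pw_count = 0
    _pw_inited = 1
}

# getpwent() and endpwent(), as defined in this section
function getpwent()
{
    _pw_init()
    if (_pw_count < _pw_total)
        return _pw_bycount[++_pw_count]
    return ""
}

function endpwent()
{
    _pw_count = 0
}

# count_shell --- hypothetical caller: count users whose login shell is sh
function count_shell(sh,    rec, f, n)
{
    while ((rec = getpwent()) != "") {
        split(rec, f, ":")
        if (f[7] == sh)
            n++
    }
    endpwent()      # rewind so that the next scan starts from the top
    return n + 0
}

BEGIN {
    print count_shell("/bin/sh")
    print count_shell("/bin/csh")
}
```

With the sample records, this prints '2' and then '1'. The second scan
sees every entry again only because the first one ended with a call to
'endpwent()'.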

A conscious design decision in this suite is that each subroutine
calls '_pw_init()' to initialize the database arrays. The overhead of
running a separate process to generate the user database, and the I/O to
scan it, are only incurred if the user's main program actually calls one
of these functions. If this library file is loaded along with a user's
program, but none of the routines are ever called, then there is no
extra runtime overhead. (The alternative is to move the body of
'_pw_init()' into a 'BEGIN' rule, which always runs 'pwcat'. This
simplifies the code but runs an extra process that may never be needed.)

In turn, calling '_pw_init()' is not too expensive, because the
'_pw_inited' variable keeps the program from reading the data more than
once. If you are worried about squeezing every last cycle out of your
'awk' program, the check of '_pw_inited' could be moved out of
'_pw_init()' and duplicated in all the other functions. In practice,
this is not necessary, as most 'awk' programs are I/O-bound, and such a
change would clutter up the code.

The 'id' program in ⇒Id Program uses these functions.

---------- Footnotes ----------

(1) It is often the case that password information is stored in a
network database.