bash: GNU Parallel

3.2.6 GNU Parallel
------------------

There are ways to run commands in parallel that are not built into Bash.
GNU Parallel is a tool to do just that.

   GNU Parallel, as its name suggests, can be used to build and run
commands in parallel.  You may run the same command with different
arguments, whether they are filenames, usernames, hostnames, or lines
read from files.  GNU Parallel provides shorthand references to many of
the most common operations (input lines, various portions of the input
line, different ways to specify the input source, and so on).  Parallel
can replace 'xargs' or feed commands from its input sources to several
different instances of Bash.

   For a complete description, refer to the GNU Parallel documentation.
A few examples should provide a brief introduction to its use.

   For example, it is easy to replace 'xargs' to gzip all HTML files in
the current directory and its subdirectories:
     find . -type f -name '*.html' -print | parallel gzip
If you need to protect special characters such as newlines in file
names, use find's '-print0' option and parallel's '-0' option.
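
   The NUL-delimited variant can be sketched as follows.  This is only
an illustration: the scratch directory and the file names are
hypothetical, and the first line simply skips the demonstration when
GNU Parallel is not installed.

```shell
# Skip the demonstration if GNU Parallel is not installed.
parallel --version 2>/dev/null | grep -q GNU || exit 0

# Work in a scratch directory so that no real files are touched.
workdir=$(mktemp -d)
mkdir "$workdir/site"

# Hypothetical file names; the second one contains an embedded newline.
printf '<p>one</p>\n' > "$workdir/site/index.html"
printf '<p>two</p>\n' > "$workdir/site/$(printf 'odd\nname').html"

# -print0 and -0 delimit names with NUL bytes, so the newline is harmless.
find "$workdir/site" -type f -name '*.html' -print0 | parallel -0 gzip

# Count the compressed files by counting the NUL terminators find emits.
gz_count=$(find "$workdir/site" -type f -name '*.gz' -print0 |
           tr -cd '\0' | wc -c | tr -d ' ')
echo "compressed files: $gz_count"

rm -rf "$workdir"
```

With plain '-print', the name containing a newline would be read as two
separate input lines, and neither would name an existing file.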

   You can use Parallel to move files from the current directory when
the number of files is too large to process with one 'mv' invocation:
     ls | parallel mv {} destdir

   As you can see, the {} is replaced with each line read from standard
input.  While using 'ls' will work in most instances, it is not
sufficient to deal with all filenames.  If you need to accommodate
special characters in filenames, you can use

     find . -mindepth 1 -maxdepth 1 \! -name '.*' -print0 | parallel -0 mv {} destdir

as alluded to above.

   This will run as many 'mv' commands as there are files in the current
directory.  You can emulate a parallel 'xargs' by adding the '-X'
option:
     find . -mindepth 1 -maxdepth 1 \! -name '.*' -print0 | parallel -0 -X mv {} destdir

   GNU Parallel can replace certain common idioms that operate on lines
read from a file (in this case, filenames listed one per line):
     while IFS= read -r x; do
         do-something1 "$x" "config-$x"
         do-something2 < "$x"
     done < file | process-output

with a more compact syntax reminiscent of lambdas:
     cat file | parallel "do-something1 {} config-{} ; do-something2 < {}" | process-output

   Parallel provides a built-in mechanism to remove filename extensions,
which lends itself to batch file transformations or renaming:
     ls *.gz | parallel -j+0 "zcat {} | bzip2 >{.}.bz2 && rm {}"
Here '{.}' stands for the input line with its last extension removed.
This will recompress all files in the current directory with names
ending in .gz using bzip2, running one job per CPU (-j+0) in parallel.
(We use 'ls' for brevity here; using 'find' as above is more robust in
the face of filenames containing unexpected characters.)  Parallel can
take arguments from the command line; the above can also be written as

     parallel "zcat {} | bzip2 >{.}.bz2 && rm {}" ::: *.gz
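
   To see how the replacement strings expand without running anything,
Parallel's '--dry-run' option prints each generated command instead of
executing it.  The sketch below uses two hypothetical file names, which
need not exist, and skips itself when GNU Parallel is not installed.

```shell
# Skip the demonstration if GNU Parallel is not installed.
parallel --version 2>/dev/null | grep -q GNU || exit 0

# --dry-run prints the commands rather than running them;
# -k keeps the printed lines in the order of the arguments.
# logs.gz and data.tar.gz are placeholder names, not real files.
preview=$(parallel -k --dry-run "zcat {} | bzip2 >{.}.bz2 && rm {}" \
          ::: logs.gz data.tar.gz)
printf '%s\n' "$preview"
```

This should print 'zcat logs.gz | bzip2 >logs.bz2 && rm logs.gz'
followed by 'zcat data.tar.gz | bzip2 >data.tar.bz2 && rm data.tar.gz',
showing that '{.}' drops only the final extension.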

   If a command generates output, you may want to preserve the input
order in the output.  For instance, the following command
     { echo foss.org.my ; echo debian.org; echo freenetproject.org; } | parallel traceroute
will display first the output of whichever traceroute invocation
finishes first.  Adding the '-k' option
     { echo foss.org.my ; echo debian.org; echo freenetproject.org; } | parallel -k traceroute
will ensure that the output of 'traceroute foss.org.my' is displayed
first.
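
   The effect of '-k' can be reproduced without network access.  In
this sketch, 'sleep' stands in for a command whose running time varies;
the sleep durations are arbitrary, and the first line skips the
demonstration when GNU Parallel is not installed.

```shell
# Skip the demonstration if GNU Parallel is not installed.
parallel --version 2>/dev/null | grep -q GNU || exit 0

# Each job sleeps for the number of seconds it is given, then prints it.
# -j3 starts all three jobs at once, so shorter sleeps finish earlier.
unordered=$(parallel -j3 'sleep {}; echo {}' ::: 2 0 1)

# With -k, output is held back until it can be shown in input order.
ordered=$(parallel -j3 -k 'sleep {}; echo {}' ::: 2 0 1)

# The variables are left unquoted on purpose to join the lines.
echo "without -k:" $unordered
echo "with -k:   " $ordered
```

Without '-k' the numbers typically appear in order of completion; with
'-k' they appear as 2, 0, 1, matching the order of the arguments.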

   Finally, Parallel can be used to run a sequence of shell commands in
parallel, similar to 'cat file | bash'.  It is not uncommon to take a
list of filenames, create a series of shell commands to operate on them,
and feed that list of commands to a shell.  Parallel can speed this up.
Assuming that 'file' contains a list of shell commands, one per line,

     parallel -j 10 < file

will evaluate the commands using the shell (since no explicit command is
supplied as an argument), running up to ten jobs at a time.
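
   A small, self-contained sketch of this usage follows.  The commands
in the temporary file are placeholders chosen only for illustration,
'-k' is added so that the output order is predictable, and the first
line skips the demonstration when GNU Parallel is not installed.

```shell
# Skip the demonstration if GNU Parallel is not installed.
parallel --version 2>/dev/null | grep -q GNU || exit 0

cmdfile=$(mktemp)

# One complete shell command per line (placeholder commands).
cat > "$cmdfile" <<'EOF'
echo alpha | tr 'a-z' 'A-Z'
echo beta && echo gamma
printf '%s\n' delta
EOF

# No command is given on the command line, so each input line is run
# as a command by a shell; -j 2 allows at most two jobs at once.
output=$(parallel -j 2 -k < "$cmdfile")
printf '%s\n' "$output"

rm -f "$cmdfile"
```

Each line may use pipelines, lists, and quoting, because it is passed to
a shell as a whole rather than split into a command and arguments.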