Here are some portable Bourne shell idioms
that I find useful to remember for scripting.
The Bourne shell does much more than most
users realize, and the ksh and bash
extensions are rarely essential.  (From the
command-line, bash and ksh are vastly more
useful.)

My favorite reference book is "Portable Shell
Programming --- An Extensive Collection of
Bourne Shell Examples" by Bruce Blinn from
Prentice Hall.

Get information on a built-in bash command
with ``help''.  It's much easier than reading
the full bash man page at
http://www.gnu.org/software/bash/manual/bash.html

For Bash suggestions, I recommend this bash FAQ:
http://mywiki.wooledge.org/BashFAQ/

* Text filtering commands *

Administration from scripts and the
command line often benefits from pipelines of
text-filtering commands.  Here are some that
are easy to overlook or forget.

--

o ``mmencode'' converts to and from
base64 and "quoted-printable" formats for email.
Search for the ``metamail'' package.
Unfortunately, this has become hard to find.

Alternatively ``uuencode -m'' converts to base64, 
and ``uudecode -m'' converts from base64.

Or decode and encode quoted-printable and
base64 with

=>
perl -pe 'use MIME::QuotedPrint; $_=MIME::QuotedPrint::decode($_);'
perl -pe 'use MIME::QuotedPrint; $_=MIME::QuotedPrint::encode($_);'
perl -pe 'use MIME::Base64; $_=MIME::Base64::encode($_);'
perl -pe 'use MIME::Base64; $_=MIME::Base64::decode($_);'
<=

URL-encode a string with

=>
perl -ne 'chomp; s/([^-_.~A-Za-z0-9])/sprintf("%%%02X", ord($1))/seg; print "$_\n"'
<=

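o Convert non-ASCII UTF-8 characters to numeric character references
for HTML, and back.  The first command writes decimal references; the
second decodes both decimal and hexadecimal ones.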

=>
perl -C -pe 's/([^\x00-\x7f])/sprintf("&#%d;", ord($1))/ge;'
perl -C -pe 's/&\#(\d+);/chr($1)/ge;s/&\#x([a-fA-F\d]+);/chr(hex($1))/ge;'
<=

o ``uniq'' lets you remove duplicated lines
from a sorted file.

o Count the number of times a given line
occurs with

=>
sort | uniq -c | sort -n
sort | uniq -c | sort -k1,1nr -k2
<=
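
As a small worked example, here is the second pipeline applied to a
few sample lines.  The name ``count_lines'' and the sample words are
made up for illustration, and the trailing ``awk'' is an addition that
strips the padding ``uniq -c'' puts before each count (it assumes
one-word lines).

```shell
# Hypothetical helper: count duplicate lines on stdin, most frequent first.
count_lines() {
    sort | uniq -c | sort -k1,1nr -k2 | awk '{ print $1, $2 }'
}

# Sample data, invented for this sketch.
printf '%s\n' pear apple pear fig apple pear | count_lines
# prints:
#   3 pear
#   2 apple
#   1 fig
```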

o Break one word per line with

=>
perl -pe 's/\s+/\n/g'
<=

o Combine separate lines into a single line
of words with

=>
paste -s -d" " 
<=

o Add up numbers that arrive one per line

=>
paste -s -d+ | bc
<=

o ``comm'' lets you suppress lines unique to
one or both of two files.
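
A minimal sketch of the three most common uses, assuming ``mktemp'' is
available.  Both inputs must already be sorted.  The function name
``compare_lists'' and the file contents are invented for the example.

```shell
# Hypothetical demo: compare two sorted lists held in temporary files.
compare_lists() {
    a=`mktemp` || return 1
    b=`mktemp` || return 1
    printf '%s\n' apple banana cherry > "$a"
    printf '%s\n' banana cherry date  > "$b"
    # -12 suppresses columns 1 and 2, leaving lines common to both files
    echo "common: `comm -12 "$a" "$b" | paste -s -d' ' -`"
    # -23 leaves lines found only in the first file
    echo "only in first: `comm -23 "$a" "$b" | paste -s -d' ' -`"
    # -13 leaves lines found only in the second file
    echo "only in second: `comm -13 "$a" "$b" | paste -s -d' ' -`"
    rm -f "$a" "$b"
}

compare_lists
```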

o ``cat -s'' never prints more than one blank
line in a row.

o Remove all blank lines with

=>
perl -ne 'print if /\S/'
<=

o Print lines starting with one containing
FOO and ending with one containing BAR.

=>
sed -n '/FOO/,/BAR/p'
<=

o Print lines other than those starting with one containing
FOO and ending with one containing BAR.

=>
sed -n '/FOO/,/BAR/!p'
<=
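
To see both range commands at work, this sketch runs them on five
sample lines.  The ``sample'' function and its contents are made up
for the example.

```shell
# Sample input: FOO appears on line 2 and BAR on line 4.
sample() { printf '%s\n' one 'FOO two' three 'four BAR' five ; }

sample | sed -n '/FOO/,/BAR/p'    # prints "FOO two", "three", "four BAR"
sample | sed -n '/FOO/,/BAR/!p'   # prints "one" and "five"
```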

o ``diff3 -m'' for merging changes in files edited from a common ancestor.

o ``fold'' breaks lines to proper width, and
``fmt'' will reformat lines into paragraphs.

o ``dirname'' and ``basename'' let you
extract the directory and filenames from a
full path to a file.
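
A short sketch with a made-up path.  The parameter expansions at the
end are a POSIX alternative that avoids starting a process; they give
the same answers for a simple path like this one, but not for paths
without a slash or with a trailing slash.

```shell
path=/usr/local/bin/script.sh   # sample path, chosen only for illustration

dirname "$path"        # /usr/local/bin
basename "$path"       # script.sh
basename "$path" .sh   # script  (the second argument strips a suffix)

echo "${path%/*}"      # /usr/local/bin  (drop the shortest trailing /...)
echo "${path##*/}"     # script.sh       (drop the longest leading .../)
```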

o ``namei'' breaks a pathname into pieces and
follows symbolic links.

o ``expand'' and ``col -x'' replace tabs by
spaces.

o ``col -b'' removes backspaces from a file.

o ``cat -v'' shows non-printing characters as
ascii escapes.

o ``sed '1,10d' '' deletes the first 10
lines.

o ``sed -n '3p' '' and ``sed -n '3{p;q}' ''
both print the third line, but the latter is
more efficient.

o ``sed '/foo/q' '' truncates a file after
the line containing ``foo''.

o ``sed -ne '/foo/,/bar/p' '' prints
everything from the line containing ``foo''
to the line containing ``bar''.

o Align space-delimited fields into orderly
columns with ``column -t''.

o Right justify queries with

=>
printf "%40s" "Do you want to delete? [y/N] "
<=

o Convert dos text files to unix, and vice
versa:

=>
dos2unix file.txt
unix2dos file.txt
tr -d \\r < win.txt > unix.txt  # if you can't find dos2unix
sed -e 's/$/\r/' < unix.txt > win.txt  # if you can't find unix2dos
<=

o ``cat -n'' and ``nl'' number lines.

o Both of these perform string substitution,
but the latter allows more general regular
expressions:

=>
sed -e 's/oldtext/newtext/g'
perl -pe 's/oldtext/newtext/g'
<=

Here's how to replace double quotes by single
quotes for TeX:

=>
< in.tex perl -pne 's%\B"\b%``%g' | 
  perl -pne "s%\b\"\B%''%g" > out.tex
<=

o Use ``iconv'' to convert between character
encodings.

o Here are two ways to find string patterns (regular expressions)
in a file:

=>
grep 'pattern' filename [file] [< file]
perl -ne 'print if /pattern/' [file] [< file]
<=

o Print the first and third columns of each
line:

=>
awk '{print $1,$3}'
perl -lane 'print "$F[0] $F[2]"'
while read a b c d ; do echo "$a $c" ; done
<=

o Convert to lower-case:

=>
tr '[A-Z]' '[a-z]'
tr '[:upper:]' '[:lower:]'
perl -pe 'tr/A-Z/a-z/'
perl -pe '$_ = lc'
<=

o Simple character substitutions and
deletions may be simplest with ``tr''.

=>
tr -d '\r'  # delete carriage returns
tr '\n' '\0' # replace newlines by null characters. 
<=

=>
$ echo 1-2a-3b | tr "[1-9]" "[2-9]" | tr '-' '_' | tr -d 'a'
2_3_4b
<=

o You can pipe into a loop with ``read -r''.
Here is a complicated way to cat a text file,
piping in and out of a loop.

=>
cat file | while read -r a; do echo "$a" ; done | cat
<=
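
One caveat: ``read -r a'' still trims leading and trailing whitespace
from each line.  A common refinement, sketched here with a made-up
function name, is to empty ``IFS'' for the ``read'' alone and to print
with ``printf'' rather than ``echo''.

```shell
# copy_lines is a hypothetical name for this sketch.
copy_lines() {
    # IFS= applies only to read, so leading blanks survive;
    # -r keeps backslashes literal.
    while IFS= read -r line ; do
        printf '%s\n' "$line"
    done
}

printf '%s\n' '  indented' 'back\slash' | copy_lines
```

Both sample lines should come out unchanged.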

o To read lines in pairs from two files try

=>
paste file1 file2 | while read -r a b ; do echo "$a $b" ; done
<=

o Divide words one per line, then sum them as numbers:

=>
$ echo 1 2 3.1 | 
    perl -pe 's/\s+/\n/g' |
    perl -e '$s=0; while (<>) {$s += $_;} ; print "$s\n";'
6.1
<=

o Reverse the order of lines with ``tac'', and
reverse the characters within each line with
``rev''.

o Sort a list of dependencies with ``tsort''.

o Shuffle lines randomly with ``shuf''.  Generate shuffled integers with

=>
$ shuf -i1-100 -n3
93
57
71
<=

o Generate random lottery numbers between 1 and 292201338:

=>
$ echo "($RANDOM + 32768*($RANDOM + 32768*$RANDOM)) % 292201338 + 1" |  bc
130237776
<=


_

* Files and directories *

--

o Select text (non-binary) files with
one of these

=>
\ls | perl -lne 'print if -T'
perl -le 'for (glob "*") {print if -T }'
perl -le 'print for grep -T, <*>'
<=

The perl algorithm for detecting text files
is very good.

o To do something to files with goofy names,
including spaces and dashes, delimit the
files with null characters instead of
whitespace or newlines.

=>
find . -type f -print0 | xargs -r0 ls
<=

Or read one line at a time:

=>
cd "$dir1" && find . -type f  | 
  while read -r f ; do 
    if [ ! -f "$dir2/$f" ] ; then 
      echo "$f is in $dir1 but not in $dir2"
    fi
  done
<=

o See if a directory contains any files, including broken links.

=>
has_files() { 
  set -- "$1"/.[!.]* "$1"/*; test -e "$1" || test -e "$2" || test -L "$1" || test -L "$2"; 
}

if has_files "${dir}" ; then echo "${dir} has files" ; else echo "${dir} is empty" ; fi
<=

See if files of a certain type exist:

=>
  if [ "`printf '%s' *.par`" != '*.par' ] ; then echo "has par files" ; fi
[or]
  test "`printf '%s' *.par`" != '*.par' && echo "has pars" || echo "no pars"
<=

o ``readlink -f'' will fully resolve what a
symbolic link points to.

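Find all bad symbolic links by testing whether
each link's target exists.  (GNU ``readlink -f''
succeeds even when the last component of the
target is missing, so its exit status does not
identify every broken link.)

=>
find . -type l |
  while read -r f ; do if [ ! -e "$f" ]
  then echo "$f" ; fi ; done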
<=

o To see the canonical path for the current directory, you can use either of these:

=>
readlink -f .
pwd -P
<=

_

* Variables *

--

o To see if a variable matches a regular
expression, combine ``if'' and ``grep''.  For
example, to see if the name of a file begins
with a dot, try

=>
 if echo "$filename" | grep '^[.]' >/dev/null 
 then echo yes ; else echo no ; fi
<=

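``expr'' also has limited support for
regular expressions.  The pattern is
implicitly anchored at the start of the
string, and ``expr'' prints the number of
characters matched.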

=>
if [ `expr "$filename" : '[.].*'` -ne 0 ] 
then echo yes ; else echo no ; fi
<=
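
When a shell glob pattern is enough, ``case'' performs the same test
without starting another process.  This is a sketch, and the function
name ``starts_with_dot'' is made up.  Note that ``case'' matches glob
patterns, not regular expressions.

```shell
# Hypothetical helper: succeed if the argument begins with a dot.
starts_with_dot() {
    case $1 in
        .*) return 0 ;;
        *)  return 1 ;;
    esac
}

if starts_with_dot ".profile" ; then echo yes ; else echo no ; fi   # yes
if starts_with_dot "profile"  ; then echo yes ; else echo no ; fi   # no
```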

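o Use ``read -r'' to avoid tokenizing filenames
with spaces.  Here's how to find all files
whose names contain a space, and print the
``mv'' commands that would replace the spaces
with underscores.  Remove the ``echo'' to
actually rename the files.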

=>
find . -name '* *' | 
  while read -r f ; do 
    echo mv "$f" "`echo "$f" | sed 's/  */_/g'`"
  done
<=

o For simple integer arithmetic use ``expr'':

=>
N=`expr "$N" + 3`
<=
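
POSIX shells (bash, ksh, dash) also offer arithmetic expansion, which
avoids running ``expr''.  The original Bourne shell lacks it, so treat
this as an alternative for scripts that only need POSIX.  The values
are arbitrary.

```shell
N=4
N=$((N + 3))              # same effect as N=`expr "$N" + 3`
echo "$N"                 # 7
echo $(( (N * 2) % 5 ))   # 14 % 5 = 4
```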

o For arbitrary-precision floating-point
math, use ``bc -l''

=>
# Get pi to 10 places with arctangent (bc man page)
PI=`echo "scale=10; 4*a(1)" | bc -l`
# Expensive calculation of zero (Craig Artley):
ZERO=`echo "c($PI/4)-sqrt(2)/2" | bc -l`
<=

o ``seq 1 100'' generates all integers
between 1 and 100.  To iterate a loop 100
times, try

=>
for i in `seq 1 100` ; do ... ; done
<=
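
``seq'' is not part of POSIX, so where it may be missing a counter
loop does the same job.  This sketch sums the integers as a stand-in
for a real loop body; the variable names are arbitrary.

```shell
i=1
total=0
while [ "$i" -le 100 ] ; do
    total=`expr "$total" + "$i"`   # loop body goes here
    i=`expr "$i" + 1`
done
echo "$total"    # 5050
```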

o You can set the environment of a subprocess
by defining a variable on the same line.  The
current shell is not affected.

=>
$ x=doggie sh -c 'echo x=$x'
x=doggie
$ x=pig ; x=doggie echo x=$x
x=pig
<=

o Test that a string has non-zero length with

=>
if [ -n "$string" ] ; then echo "not empty" ; fi
<=

The ``-n'' is actually the default for a
string expression, so you can omit it:

=>
if [ "$string" ] ; then echo "not empty" ; fi
<=

o There are several good ways to set default
values for environment variables.  Many do
this

=>
if [ ! "$VARIABLE" ] ; then VARIABLE="default value" ; fi
export VARIABLE
<=

A simple alternative is

=> 
: ${VARIABLE:="default value"} 
export VARIABLE 
<=

The colon at the beginning of the line is
necessary as a no-op that allows its
arguments to be evaluated.

o Rarely you may want to accept a variable
defined as an empty string.  If so, then omit
the colon before the equals when setting the
default.

=> 
: ${VARIABLE="default value"} 
export VARIABLE 
<=

To test whether a variable is defined, even if
empty, test

=>
if [ "${VARIABLE+x}" ] ; then echo DEFINED ; fi
<=
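
The forms with and without the colon are easy to mix up, so here is a
side-by-side sketch.  ``DEMO'' is a made-up variable name.

```shell
unset DEMO
echo "[${DEMO-default}]"    # unset:          [default]
echo "[${DEMO:-default}]"   # unset:          [default]

DEMO=""
echo "[${DEMO-default}]"    # set but empty:  []
echo "[${DEMO:-default}]"   # empty counts as missing: [default]
echo "[${DEMO+x}]"          # set, even if empty:      [x]

: ${DEMO:="default"}        # assigns, because DEMO is empty
echo "[$DEMO]"              # [default]
```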

o In bash, list the names of all variables that start with X:
``echo ${!X*}''

o To check whether a series of variables are defined, try

=>
for V in JAVA_HOME SSH_AGENT_PID TEXMFDIR NETHACKOPTIONS ; do
  eval v="\$$V"
  if [ ! "$v" ] ; then echo "You must define $V" ; fi
done
<=

_

* Running commands *

--

o Use ``"$@"'' when passing command-line
arguments unaltered to subprocesses.  This is
equivalent to passing ``"$1" "$2" ...'', but
the first version works properly for no
arguments.

o Test the processing of arguments, like this

=>
$ set a 'b c' d
$ for i in "$@" ; do echo "|$i|" ; done
|a|
|b c|
|d|
$ for i in "$*" ; do echo "|$i|" ; done
|a b c d|
$ for i in $* ; do echo "|$i|" ; done
|a|
|b|
|c|
|d|
<=
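
The same difference shows up when a wrapper function or script hands
its arguments to another command.  In this sketch all three function
names are invented.

```shell
count_args() { echo $# ; }

wrapper_good() { count_args "$@" ; }   # each argument is preserved
wrapper_bad()  { count_args $* ; }     # arguments are re-split on whitespace

wrapper_good a 'b c' d    # prints 3
wrapper_bad  a 'b c' d    # prints 4
wrapper_good              # prints 0
```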

o See what runtime options you may have set with these

=>
set -o; bind -p; shopt -p; stty -a
<=

For example, bash edits the command line in
emacs mode by default.  Change to vi mode with

=>
set -o vi
<=

In emacs mode, you can edit the current command
in the editor named by ``$EDITOR'' with
``Ctrl-x Ctrl-e''.

In vi mode, press ``Esc'' and then ``v''.  See
``help fc'' for more.

o Repeat the last argument of the previous
command with ``!$''.  Repeat all arguments
without the command with ``!*''.

o To guarantee that a background process
outlives the current shell, add extra
parentheses like this:

=>
( command & )
<=

Otherwise, when your current shell exits (by
leaving X or ssh, for example), it may
terminate all processes that have your shell
as the parent process.  The extra parentheses
start a subshell that exits as soon as the
command is spawned in the background.  The
orphaned background process is then adopted,
typically by process ID 1.  This is a
command-line version of the "double fork."

o Repeat until a command succeeds:

=>
while ! cvs -z 3 -q update -dPA ; do echo -n . ; sleep 60 ; done
<=
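
The loop above never gives up.  Where a limit is wanted, a small
function can cap the number of attempts.  This is only a sketch: the
name ``retry'' and its argument order are made up, and it uses global
variables because plain Bourne shell has no local ones.

```shell
# retry MAX DELAY COMMAND [ARGS...]
# Run COMMAND until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts.
retry() {
    max=$1 ; delay=$2 ; shift ; shift
    attempt=1
    until "$@" ; do
        if [ "$attempt" -ge "$max" ] ; then
            return 1
        fi
        attempt=`expr "$attempt" + 1`
        sleep "$delay"
    done
    return 0
}

retry 3 0 true  && echo "succeeded"
retry 2 0 false || echo "gave up after 2 attempts"
```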

o Make a progress bar (loop while waiting on
a process)

=>
sleep 10 & while ps -p $! >/dev/null; do echo -n . ; sleep 1 ; done ; echo 
[or]
while pidof mozilla-bin > /dev/null ; do echo -n . ; sleep 1 ; done ; echo
<=

``pgrep -f'' or ``killall -0'' are alternatives to ``pidof'' for this purpose.

_

* Manipulating paths *

--

o Loop over the elements of a PATH by
tokenizing with the character ':'.

=>
# The subshell keeps the IFS change from leaking into the rest of the script
( IFS=':' ; for dir in $PATH ; do echo "$dir" ; done )
<=

o Check for the existence of an executable
version of a command in your PATH:

=>
# The subshell body ( ... ) keeps the IFS change local to the function
checkPath() ( IFS=':' ; for dir in $PATH ; do if [ -x "$dir/$1" ] ;
              then exit 0 ; fi ; done ; exit 1 )
if checkPath commandName ; then ... ; fi
<=

o Here is my preferred way to modify a PATH

=>
# Arguments currentpath newelement [after]
# addtopath a:b c ->  c:a:b
# addtopath a:b c after -> a:b:c
# addtopath a:b a -> a:b
addtopath () {
    P=$1
    E=$2
    O=$3
    if [ ! "$P" ] ; then
        P="$E"
    elif ! echo "$P" | grep -E "(^|:)$E($|:)" >/dev/null ; then
        if [ "$O" = "after" ] ; then
            P="$P:$E"
        else
            P="$E:$P"
        fi
    fi
    echo "$P"
}

# example
PATH=`addtopath "$PATH" /usr/local/bin after`
<=

_

* Common script chores *

--

o Debug the script with ``set -x''.

o Make a script exit immediately after any failed
command with ``set -e''.  

o Process flags in a script:

=>
# Loop on $1 so that each shift really advances to the next argument
while [ $# -gt 0 ] ; do
        case $1 in 
                -a) FLAG_A=1
                        shift ;;
                -b) FLAG_B="$2"
                        shift ; shift ;;
                --) shift ; break ;;
                *) break ;;
        esac
done
<=
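
POSIX shells also provide the ``getopts'' builtin, which handles
combined flags such as ``-ab value'' as well.  Here is a sketch using
the same two flags.  The function name ``parse_flags'' and the
variable ``REST'' are invented for the example.

```shell
# Hypothetical helper: -a is a switch, -b takes a value.
parse_flags() {
    FLAG_A= ; FLAG_B=
    OPTIND=1                       # restart parsing on each call
    while getopts "ab:" opt ; do
        case $opt in
            a) FLAG_A=1 ;;
            b) FLAG_B=$OPTARG ;;
            *) return 1 ;;         # getopts has already printed an error
        esac
    done
    shift `expr "$OPTIND" - 1`     # drop the flags that were parsed
    REST="$*"                      # whatever remains
}

parse_flags -a -b value file1 file2
echo "FLAG_A=$FLAG_A FLAG_B=$FLAG_B REST=$REST"
# prints: FLAG_A=1 FLAG_B=value REST=file1 file2
```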

o Print help from a script:

=>
if [ $# -lt 1 -o "$1" = "-h" -o "$1" = "-help" -o "$1" = "--help" ] ; then
     cat <<-END
	Usage: `basename $0` [-flag] arg1 [arg2]
	More information.
END
     exit
fi
<=

o Handle errors with functions:

Often an error exit is handled most cleanly
with a function.

=>
print_usage_and_exit() {
	# Write to standard error and exit with a non-zero status
	cat >&2 <<-END
Usage: `basename $0` arg1 arg2 [arg3]
The first two arguments are required.
END
	exit 1
}

if [ $# -lt 2 ] ; then
	print_usage_and_exit
fi
<=

o Here's a robust way to locate the directory
containing a script, following symbolic
links.  (Taken from the launch script of
``FindBugs''.)

=>
program="$0"
while [ -h "$program" ]; do
        link=`ls -ld "$program"`
        link=`expr "$link" : '.*-> \(.*\)'`
        if [ "`expr "$link" : '/.*'`" = 0 ]; then
                dir=`dirname "$program"`
                program="$dir/$link"
        else
                program="$link"
        fi
done
script_directory=`dirname "$program"`
script_directory=`cd "$script_directory" && /bin/pwd`
<=

o Trapping signals to stop scripts:

Ever try to interrupt your script, then
discover that it killed only one command and
continued to the next?  Force a complete exit
by adding the following line early in your
script.

=>
trap "exit 1" 1 2 3 15
<=

You can also trap normal and error exits:

=>
# force script to exit when any command fails
set -e 

# Trap on any exit
trap "echo Always called before exit" 0

# Trap on error exit only (ERR is a bash and ksh extension)
trap "echo Error exit was called " ERR

echo "Next command will fail"

# Returns error code of 1
false

echo "Will not see this comment"
<=
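
A common use of the exit trap is to remove scratch files however the
script ends.  This is a sketch that assumes ``mktemp'' is available.
The function name ``with_scratch_file'' is made up, and the function
body is a subshell so that the traps stay local to it.

```shell
with_scratch_file() (
    tmpfile=`mktemp` || exit 1
    trap 'rm -f "$tmpfile"' 0     # runs on any exit from this subshell
    trap 'exit 1' 1 2 3 15        # turn signals into a normal exit
    date > "$tmpfile"             # ... real work would go here ...
    echo "$tmpfile"               # report the name so the caller can check
)

name=`with_scratch_file`
if [ -e "$name" ] ; then echo "still there" ; else echo "cleaned up" ; fi
```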

o Process ID's

Get the process ID of the current shell with
``$$'', of the parent process with ``$PPID'',
and of the most recently backgrounded child
process with ``$!''.

Interactively, you can see child PIDs
with ``jobs -p''.

o Here's how to ask a yes or no question,
with a default of no.  It checks whether the
first letter is a y or Y and ignores leading
spaces.

=>
echo -n "Do you want to continue? [y/N]: "
read answer
if expr "$answer" : ' *[yY].*' > /dev/null; then 
   echo Continuing 
else 
   echo Quitting
   exit
fi
<=

o Here's how to ask for a password without
echoing the characters.  The trapping ensures
that an interrupt does not leave the echoing
off.

=>
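# Set the trap first so that an early interrupt still restores echoing
trap "stty echo ; echo 'Interrupted' ; exit 1" 1 2 3 15
stty -echo
echo -n "Enter password: "
read password
stty echo
echo    # the newline typed by the user was not echoed
echo "Your password is \"$password\""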
<=

GNOME and other frameworks often allow simple
scripting of GUIs: 

=> 
password=`zenity --entry --hide-text --text "Enter password:"`
<=

_

* File descriptors *

--

o Redirecting output file descriptors

Here are common ways to capture the
standard output and standard error 
of a single command in a log file:

=>
command >file.log 2>&1 
command 2>&1 | tee file.log
<=
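
The order of the redirections matters, because ``2>&1'' copies
wherever standard output points at that moment.  A sketch, assuming
``mktemp'' is available; ``noisy'' is a made-up command that writes
one line to each stream.

```shell
noisy() { echo "to stdout" ; echo "to stderr" >&2 ; }

log=`mktemp` || exit 1

noisy >"$log" 2>&1       # stdout moves first, so stderr follows it to the log
both=`wc -l < "$log"`

noisy 2>&1 >"$log"       # stderr copies the old stdout; only stdout is logged
one=`wc -l < "$log"`

echo $both $one          # 2 1
rm -f "$log"
```

The second form still prints ``to stderr'' on the original standard
output.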

o If you have a script with many commands,
you can have them all write to the same log
file by default:

=>
# save default standard output in file descriptor 10
exec 10>&1
# redirect standard output to a log file.
exec >file.log
# redirect standard error to same log file
exec 2>&1
# close stdin
exec 0<&-
# This command will write to log file
command
# echo to default standard output instead of log file
echo "Visible message" 1>&10
<=

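Avoid file descriptor 5, which bash already
uses.  Descriptors above 9 are not supported
by every shell, so a free single digit such as
3 is the more portable choice.  (``ulimit -n''
shows how many file descriptors a process may
open.)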

o Avoid writing to stdout if it is not connected to a terminal:

=>
  test -t 1 && echo "Connected to a terminal"
<=

o Open a socket

In bash (this is not a Bourne shell feature),
associate a file descriptor, say 4, with a TCP
socket for reading and writing, and later
close it, with

=>
exec 4<>/dev/tcp/$hostname/$port
exec 4<&-
<=

A more portable solution is to use ``nc''.
Listen on a port with

=>
nc -l -p 3535
<=

Connect to a remote host port like

=>
echo 'GET /' | nc hostname 80
<=

An even more general utility is ``socat'',
which also handles Unix sockets.

_

o Hostname lookups on Linux

General utilities are ``dig'', ``nslookup'', ``host'', ``hostname''.

Get an IP address for a specific hostname:

=>
host samplehostname | sed 's/.* //'
<=

Get a hostname for an IP address:

=>
nslookup 123.123.123.123 | grep 'name = ' | sed 's/.*name = //'
<=