Awk
Awk is an excellent text parser and scripting language. It can run inline as part of a Unix command pipeline, which makes it very useful for adding more complicated behavior to a shell script.
Introduction to Awk
Awk's scripting language is structured as a sequence of patterns and actions:
# A pattern definition
PATTERN { ACTION1; ACTION2; }
# A real example
/^[a-z0-9]/ { print $1 }
# Another example
/^[0-9]\s*/ { Sum += $1 }
A pattern is typically a regular expression, a boolean expression, or a special pattern. Regular expressions are always enclosed in a leading and trailing forward slash. Boolean expressions combine conditions with the && (and), || (or), and ! (not) operators. Special patterns are awk built-ins that trigger on specific conditions: BEGIN runs before the first line is read and END runs after the last line.
# To run an action before the first input, use BEGIN
BEGIN { Sum = 0 }
# Get a number and add it to sum using regular expressions
/^[0-9]+ / { Sum += $1 }
# Combine a regular expression with a comparison in a boolean expression
/^[0-9]+ / && $1 >= 0 { print "Adding " $1 }
# To run an action after the last input, use END
END { print "Sum is " Sum }
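Putting these pieces together, here is a minimal sketch that sums the first field of every line starting with a number. The sample input and the shell variable name total are made up for illustration.

```shell
# Sample input is invented: two numeric lines and one that does not match.
total=$(printf '10 apples\n5 pears\nn/a bananas\n' | awk '
    BEGIN { Sum = 0 }            # runs once, before any input is read
    /^[0-9]+ / { Sum += $1 }     # only lines that start with digits and a space
    END { print Sum }            # runs once, after the last line
')
echo "Sum is $total"
```

The third line is skipped because it does not match the regular expression, so only 10 and 5 are added.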
Fields
Fields are the 'words' of the current line, delimited by one or more whitespace characters. The field $1 references the first word of the current line, $2 the second word, and so on. The field $0 represents the entire line.
# Here's an example line:
#
# $1  $2          $3       $4  $5 $6   $7
# GET /index.html HTTP/1.0 Dec 20 2020 01:23:45
/index\.html/ {
    print "Someone accessed index.html on", $4, $5, $6, $7
}
Tip: You may alter the current line being processed by assigning a modified value to $0.
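As a small sketch of that tip, the following replaces $0 with a shorter line built from a few of the original fields. After the assignment, awk splits the new line into fields again, so NF and $1 change. The variable names line, result, and n_before are only for illustration.

```shell
line='GET /index.html HTTP/1.0 Dec 20 2020 01:23:45'
result=$(echo "$line" | awk '{
    n_before = NF            # 7 fields in the original line
    $0 = $2 " " $4 " " $5    # replace the whole line; awk re-splits it
    print n_before, NF, $1   # field count before, field count after, new first field
}')
echo "$result"
```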
Scripting
Functions
Functions are defined in the C style:
function do_something(parameter1, parameter2, ...) {
ACTIONS
}
They can then be called like any other function when defining actions.
/^GET/ { do_something($0) }
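For a concrete version of the placeholder above, here is a sketch with a hypothetical helper named describe that takes two fields and returns a string. The function name and the sample request lines are assumptions made for the example.

```shell
out=$(printf 'GET /index.html\nPOST /login\nGET /about.html\n' | awk '
    # Hypothetical helper: returns a short description of a request.
    function describe(method, path) {
        return method " request for " path
    }
    /^GET/ { print describe($1, $2) }
')
echo "$out"
```

Function definitions sit at the top level of the script, next to the pattern and action rules, not inside an action.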
Variables
Awk has built-in variables:
NR - current line number
NF - number of fields in current line
OFS - output field separator
FS - field separator, specified with -F
RS - record separator
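A short sketch showing several of these variables at once, using made-up colon-separated input. -F: sets FS, and OFS is set in BEGIN so that the commas in print are replaced by an arrow.

```shell
out=$(printf 'root:x:0\nalice:x:1000\n' | awk -F: '
    BEGIN { OFS = " -> " }      # separator placed between print arguments
    { print NR, NF, $1 }        # line number, field count, first field
')
echo "$out"
```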
You can also pass your own custom variables with the -v var=value parameter. Eg:
$ echo | awk \
-v hostname="$(hostname)" \
-v timestamp="$(date +%s)" '
{
    printf("Hostname is %s at time %s\n", hostname, timestamp)
}
'
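If the script only needs the variables and no input, a BEGIN rule avoids the echo pipe. This is a sketch with invented values; quoting the shell variable keeps a value that contains spaces in one piece.

```shell
user_label='build host'     # invented value containing a space
greeting=$(awk -v who="$user_label" -v count=3 '
    BEGIN { printf("%s ran %d jobs\n", who, count) }
')
echo "$greeting"
```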
Inlining with Bash
When writing a bash script, you may inline Awk as part of a command pipeline by passing the Awk script within a set of single quotes. We use single quotes because we do not want Bash to treat Awk fields as variables (Eg. values such as $2 should not be replaced).
#!/bin/bash
cat /etc/hosts | awk '
/^server/ { print $2 }
'
Tip: To add a single quote, you will need to use '"'"' which closes the single-quoted string, inserts a single ' inside double quotes, and then reopens the single-quoted string. This is useful if you need to insert a single quote in order to trigger a subshell call, for instance.
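A sketch of the quote-switching trick in use. The second line shows an alternative that stays inside the single quotes by writing the quote as the octal escape \047 in the awk string. The variable names are only for illustration.

```shell
# Leave single quotes, add a double-quoted ', then re-enter single quotes.
with_switch=$(echo 'hello' | awk '{ print "It'"'"'s " $1 }')
# Alternative: let awk produce the quote from its octal escape.
with_octal=$(echo 'hello' | awk '{ print "It\047s " $1 }')
echo "$with_switch"
echo "$with_octal"
```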
Tasks
Line Matching
With Awk, it's easy to do something for each matching line using the regex matching operator.
Print every line before a line match
Simple awk code:
# cat list.txt | awk '/PATTERN/ { exit } { print $0 }'
The code matches each line against the given pattern. If a line matches, awk exits. Otherwise, it prints the line and continues.
Print the line number on matching lines
To print the number of lines up to a matching line, we do something similar to the previous example but now we keep an accumulator (n):
awk 'BEGIN { n = 0 } /PATTERN/ { print n; exit } { n++ }'
Eg: Suppose I have a file with the contents:
leo
spoon
cake
fork
To find the number of lines up until 'cake', do:
# cat list.txt | awk 'BEGIN { n = 0 } /cake/ { print n; exit } { n++ }'
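Both one-liners can be tried without creating list.txt by feeding the sample contents from printf. This is only a sketch; the shell variable names list, before, and count are made up.

```shell
list=$(printf 'leo\nspoon\ncake\nfork\n')
# Print every line before the first match.
before=$(echo "$list" | awk '/cake/ { exit } { print $0 }')
# Count the lines before the first match.
count=$(echo "$list" | awk 'BEGIN { n = 0 } /cake/ { print n; exit } { n++ }')
echo "$before"
echo "$count"
```

Two lines (leo and spoon) come before cake, so the count is 2.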
Retrieve a section of text with line matching
To grab a specific section from a .spec file (where sections begin with %):
if [ $# -ne 2 ] ; then
    echo "Usage: $0 file.spec section"
    exit 1
fi
Section="$2"
awk -v Section="$Section" '
BEGIN {
    InSection = 0
}
# A line starting with % opens a new section
/^%/ {
    InSection = ($0 ~ Section) ? 1 : 0
    next
}
# Print body lines only while inside the requested section
InSection { print $0 }
' "$1"
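To see the section logic on its own, here is a sketch that runs the same idea against a tiny invented spec passed in from printf instead of a file. The section names and contents are made up.

```shell
# Invented spec contents; %% in printf produces a literal %.
spec=$(printf '%%description\nA demo package.\n%%build\nmake all\nmake check\n%%install\nmake install\n')
body=$(echo "$spec" | awk -v Section="build" '
    /^%/ { InSection = ($0 ~ Section); next }   # header line: enter or leave the section
    InSection { print }                         # body line inside the wanted section
')
echo "$body"
```

Note that Section is used as a regular expression and can match anywhere in the header line, so a short name may match more than one section.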
String Matching
Use the match(string, regex, output_array) function to parse out specific values with a regex. The third (array) argument is a gawk extension.
# Given a string Job <87010>, User <asdf>, Project <default>
# we can parse out the Job ID with:
match($0, /^Job <([^>]+)>.*/, arr)
print "Job ID: " arr[1]
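Where gawk is not available, a similar result can be had with the two-argument match(), which sets RSTART and RLENGTH, plus substr(). This is a sketch of that alternative, not the approach above, and the offsets are specific to this sample string.

```shell
line='Job <87010>, User <asdf>, Project <default>'
job_id=$(echo "$line" | awk '{
    if (match($0, /^Job <[^>]+>/)) {
        # The matched text is "Job <87010>": skip the 5-character "Job <"
        # prefix and leave off the closing ">".
        print substr($0, RSTART + 5, RLENGTH - 6)
    }
}')
echo "Job ID: $job_id"
```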
Converting Values
Convert Number as Bytes to Human Readable Value
I wanted to sort sizes in reverse order, but to do that properly the values from du need to be plain numbers in kilobytes. I also didn't want to run everything through du again just to get the human readable values. So, I did this:
$ du -s ./*/ ./*/*/ ./*/*/*/ \
| sort -rn \
| awk 'BEGIN { \
split("KB MB GB TB PB", type) \
} \
{ \
y = 0; \
x = $1; \
for (i = 4; y < 1 && i >= 0; i--) \
y = x / (2 ^ (10 * i)); \
print y type[i+2]" "$2 \
}'
If you want to count using bytes instead of kilobytes, add B to the front of the list passed to split (so it reads "B KB MB GB TB PB") and replace i=4 with i=5.
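The same conversion can also be written by dividing upward from kilobytes, which may be easier to follow. This is a sketch of an alternative formulation, not the loop above; the sample sizes and paths are invented.

```shell
# Input mimics "du -s" output: size in kilobytes, then a path.
human=$(printf '1536 ./a\n512 ./b\n3145728 ./c\n' | awk '
    BEGIN { split("KB MB GB TB PB", unit) }
    {
        size = $1; u = 1
        # Step up one unit at a time while the value is still 1024 or more.
        while (size >= 1024 && u < 5) { size /= 1024; u++ }
        print size unit[u] " " $2
    }
')
echo "$human"
```

Here 1536 KB becomes 1.5MB, 512 KB stays as it is, and 3145728 KB (3 * 1024 * 1024) becomes 3GB.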
Convert ps Elapsed Time to Seconds
To convert the elapsed time (in POSIX locale formatted as [[dd-]hh:]mm:ss) to seconds using awk:
Elapsed=`ps -p $Pid -o etime=`
Elapsed=`echo $Elapsed | tr - : | tr : ' '`
Seconds=`echo $Elapsed | awk '
NF == 2 { print ($1 * 60) + $2 }
NF == 3 { print ((($1 * 60) + $2) * 60) + $3 }
NF == 4 { print ((((($1 * 24) + $2) * 60) + $3) * 60) + $4 }
{}'`
echo $Seconds
For example, a process with an elapsed time of 66-00:12:58 has been running for 5703178 seconds. The Awk command checks how many fields were found and then does the matching calculation.
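The conversion can be checked without a live process by wrapping it in a shell function and feeding it fixed strings. This is a sketch; the function name to_seconds is made up, and it splits on : with -F instead of turning the separators into spaces.

```shell
# Hypothetical helper: converts [[dd-]hh:]mm:ss to seconds.
to_seconds() {
    echo "$1" | tr '-' ':' | awk -F: '
        NF == 2 { print $1 * 60 + $2 }                          # mm:ss
        NF == 3 { print ($1 * 60 + $2) * 60 + $3 }              # hh:mm:ss
        NF == 4 { print (($1 * 24 + $2) * 60 + $3) * 60 + $4 }  # dd-hh:mm:ss
    '
}
s_days=$(to_seconds '66-00:12:58')
s_hours=$(to_seconds '01:00:05')
s_minutes=$(to_seconds '12:58')
echo "$s_days $s_hours $s_minutes"
```

For the first value: 66 days is 5702400 seconds, 12 minutes adds 720, and 58 seconds brings the total to 5703178.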
String Manipulation
String Upper/Lower Casing
There is a tolower() function that lowercases an entire string (and a matching toupper() function for upper case). You could use this to mass-rename a bunch of files to lower case, for instance.
## Lower case every file in the current directory
$ for i in * ; do mv "$i" "$(echo "$i" | awk '{print tolower($0)}')"; done
## Alternatively, use `tr 'A-Z' 'a-z'` to do the lowercasing.
$ for i in * ; do mv "$i" "$(echo "$i" | tr 'A-Z' 'a-z')"; done
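To try the case functions without renaming anything, here is a sketch that only transforms strings. The file names are invented.

```shell
lower=$(echo 'README.TXT' | awk '{ print tolower($0) }')
upper=$(echo 'readme.txt' | awk '{ print toupper($0) }')
echo "$lower $upper"
```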
Retrieving all columns after column N
You can retrieve a specific column using $N, where N is the column number. If you wish to get every value from a specific column onward, you can use this function, which concatenates the fields starting at the specified column and returns the result:
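# i and out are declared as extra parameters so they stay local to the function
function after(x,    i, out) {
    out = $x
    for (i = x + 1; i <= NF; i++) out = out " " $i
    return out
}
{ printf("From column 11 on: %s\n", after(11)) }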
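A runnable sketch of the function on a short invented line. This variant declares i and out as extra parameters, the usual awk idiom for local variables, and starts from the requested column so the result has no leading space.

```shell
rest=$(echo 'a b c d e' | awk '
    function after(x,    i, out) {
        out = $x                                         # start with column x itself
        for (i = x + 1; i <= NF; i++) out = out " " $i   # append the remaining columns
        return out
    }
    { print after(3) }
')
echo "$rest"
```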
Executing Commands
If your awk script needs to call an external process, pipe the command into getline, followed by the name of the variable that will store a line of the command's stdout.
Eg. To get the path of an executable from a given PID, use:
"readlink /proc/" $2 "/exe" | getline proc
printf("%s\n", proc)
If you intend to run many commands, you should close the pipe or else you will get a fatal: cannot open pipe 'xyz' (Too many open files) error. Do so by using the close function:
cmd="date -d\""$1" "$2"\" \"+%s\""; cmd | getline timestamp; close(cmd)
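A self-contained sketch of the same pattern that does not depend on /proc or GNU date: each input line is passed to a small shell command, one line of its output is read with getline, and the pipe is closed before the next line. The command and input are invented for the example.

```shell
out=$(printf 'a\nb\n' | awk '{
    cmd = "echo " $1 " | tr a-z A-Z"               # command string built from the field
    if ((cmd | getline result) > 0) print result   # read one line of its output
    close(cmd)                                     # release the pipe before the next line
}')
echo "$out"
```

Because the command string is handed to the shell, building it from untrusted input is risky; this sketch assumes the input is known to be safe.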