Awk is an excellent text parser and scripting language. It can be run inline as part of a Unix command pipeline, which makes it useful for adding more complicated behavior to a shell script.
Introduction to Awk
Awk's scripting language is structured as a sequence of patterns and actions:
# A pattern definition
PATTERN { ACTION1; ACTION2; }
# A real example
/^[a-z0-9]/ { print $1 }
# Another example
/^[0-9]\s*/ { Sum += $1 }
A pattern can be a regular expression, a boolean expression, or a special pattern. Regular expressions are enclosed in forward slashes. Boolean expressions combine conditions with the && (and), || (or), and ! (not) operators. Special patterns are awk built-ins that trigger on specific conditions: BEGIN runs before the first line of input is read and END runs after the last line.
# To run an action before the first input, use BEGIN
BEGIN { Sum = 0 }
# Get a number and add it to sum using regular expressions
/^[0-9]+ / { Sum += $1 }
# Combine a regular expression with a boolean expression
/^[0-9]+ / && $1 >= 0 { print "Adding " $1 }
# After the last input, use END
END { print "Sum is " Sum }
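As a rough sketch of how these rules combine, the following feeds three made-up lines to awk. The sample input and the shell variable name total are assumptions chosen for illustration.

```shell
# Sample input is invented for illustration: two numeric lines and one text line.
total=$(printf '10 apples\nfoo bar\n5 pears\n' | awk '
BEGIN { Sum = 0 }                 # runs once, before any input is read
/^[0-9]+ / { Sum += $1 }          # runs for each line that starts with a number
END { print "Sum is " Sum }       # runs once, after the last line
')
echo "$total"
```

Only the lines beginning with a number contribute, so this prints Sum is 15.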
Fields
Fields are the 'words' of the current line, delimited by one or more whitespace characters. $1 references the first word of the current line, $2 the second, and so on. $0 represents the entire line.
# Here's an example line:
#
# $1 $2 $3 $4 $5 $6 $7
# GET /index.html HTTP/1.0 Dec 20 2020 01:23:45
/index\.html/ {
print "Someone accessed index.html on", $4, $5, $6, $7
}
Tip: You may alter the current line being processed by assigning a modified value to $0.
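A minimal sketch of the tip above, assuming a made-up input line: after $0 is assigned, awk splits the new text into fields again, so NF and $2 reflect the new value.

```shell
# Assigning to $0 makes awk split the new text into fields again.
fields=$(echo 'a b c' | awk '{ $0 = "x y"; print NF, $2 }')
echo "$fields"
```

This prints 2 y, because the replaced line has two fields and the second one is y.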
Scripting
Functions
Functions are defined in the C style:
function do_something(parameter1, parameter2, ...) {
ACTIONS
}
They can then be called like any other function when defining actions.
/^GET/ { do_something($0) }
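For a concrete version of the outline above, here is a small sketch. The function name bracket and the two input lines are hypothetical and chosen only to show a definition and a call together.

```shell
wrapped=$(printf 'GET /index.html\nPOST /form\n' | awk '
# Hypothetical helper: wraps a string in square brackets.
function bracket(text) {
    return "[" text "]"
}
/^GET/ { print bracket($2) }
')
echo "$wrapped"
```

Only the GET line matches the pattern, so the output is [/index.html].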
Variables
Awk has built-in variables:
NR - current line number
NF - number of fields in current line
OFS - output field separator
FS - field separator, specified with -F
RS - record separator
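The built-in variables listed above can be seen together in a short sketch. The colon-separated input is invented for illustration, and the " | " output separator is an arbitrary choice.

```shell
report=$(printf 'root:x:0\ndaemon:x:1\n' | awk -F: '
BEGIN { OFS = " | " }             # separator placed between print arguments
{ print NR, NF, $1 }              # record number, field count, first field
')
echo "$report"
```

Each output line shows the record number, the number of colon-separated fields, and the first field, joined by OFS.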
You can also pass your own custom variables with the -v var=value parameter. E.g.:
$ echo | awk \
-v hostname="$(hostname)" \
-v timestamp="$(date +%s)" '
{
printf("Hostname is %s at time %s\n", hostname, timestamp)
}
'
Inlining with Bash
When writing a bash script, you may inline Awk as part of a command pipeline by passing the Awk script within a set of single quotes. We use single quotes because we do not want Bash to treat Awk fields as shell variables (e.g. a field such as $2 should reach Awk unchanged rather than being expanded by Bash).
#!/bin/bash
cat /etc/hosts | awk '
/^server/ { print $2 }
'
Tip: To add a single quote inside the script, use '"'"' which ends the single-quoted string, inserts a single ' wrapped in double quotes, and then reopens the single-quoted string. This is useful if, for instance, the command you build for a subshell call needs a single quote in it.
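The quoting trick in the tip can be sketched as follows, assuming a made-up input word. The awk program that actually reaches awk is { print "'" $1 "'" }.

```shell
# The sequence '"'"' ends the single-quoted script, adds a double-quoted ',
# and then resumes the single-quoted script.
quoted=$(echo 'hello' | awk '{ print "'"'"'" $1 "'"'"'" }')
echo "$quoted"
```

This prints the input word wrapped in single quotes: 'hello'.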
Tasks
Line Matching
With Awk, it's easy to do something for each matching line using the regex matching operator.
Print every line before a line match
Simple awk code:
# cat list.txt | awk '/PATTERN/ { exit } { print $0 }'
The code matches each line against the given pattern. On the first match, awk exits. Otherwise, it prints the line and continues.
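A self-contained sketch of the same idea, with a four-line input invented for illustration and cake standing in for PATTERN:

```shell
# Print every line until the first one that matches /cake/, then stop.
before=$(printf 'leo\nspoon\ncake\nfork\n' | awk '/cake/ { exit } { print $0 }')
echo "$before"
```

The output is the two lines leo and spoon. The matching line itself is not printed because the exit rule runs before the print rule.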
Print the line number on matching lines
To print the number of lines up to a matching line, we do something similar to the previous example but now we keep an accumulator (n):
awk 'BEGIN { n = 0 } /PATTERN/ { print n; exit } { n++ }'
Eg: Suppose I have a file with the contents:
leo
spoon
cake
fork
To find the number of lines up until 'cake', do:
# cat list.txt | awk 'BEGIN { n = 0 } /cake/ { print n; exit } { n++ }'
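As an alternative sketch, the built-in NR variable can replace the manual accumulator, since NR already holds the number of the current line. The input below repeats the sample file contents inline so the example runs on its own.

```shell
# NR is the current line number, so NR - 1 lines came before the match.
count=$(printf 'leo\nspoon\ncake\nfork\n' | awk '/cake/ { print NR - 1; exit }')
echo "$count"
```

This prints 2, the same result as the accumulator version.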
Retrieve a section of text with line matching
To grab a specific section from a .spec file (where sections begin with %):
if [ $# -ne 2 ] ; then
echo "Usage: $0 file.spec section"
exit 1
fi
Section="$2"
awk -v Section="$Section" '
BEGIN {
InSection=0
}
# A line starting with % begins a new section
/^%.*/ {
if ($0 ~ Section) {
InSection=1
} else {
InSection=0
}
next
}
{
if (InSection) {
print $0
}
}
' "$1"
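To try the technique without a real .spec file, here is a condensed sketch. The two-section input is invented, and the flag logic is folded into a single assignment, which is a stylistic choice rather than a required form.

```shell
# Invented two-section spec file contents, for illustration only.
body=$(printf '%%description\nA sample package.\n%%build\nmake all\n' | awk -v Section='build' '
/^%/ { InSection = ($0 ~ Section); next }   # section header: update the flag
InSection { print }                          # print lines inside the wanted section
')
echo "$body"
```

Only the line under %build is printed: make all.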
String Matching
Use the match(string, regex, output_array) function to parse out specific values with regex. The three-argument form with an output array is a gawk extension.
# Given a string Job <87010>, User <asdf>, Project <default>
# we can parse out the Job ID with:
match($0, /^Job <([^>]+)>.*/, arr)
print "Job ID: " arr[1]
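Where only the standard two-argument match() is available, a similar result can be sketched with the RSTART and RLENGTH variables that match() sets. The offsets 5 and 6 below are specific to this input format and are an assumption of the example.

```shell
job=$(echo 'Job <87010>, User <asdf>, Project <default>' | awk '
match($0, /^Job <[^>]+>/) {
    # The match spans "Job <" (5 characters) plus the ID plus ">" (1 character).
    print "Job ID: " substr($0, RSTART + 5, RLENGTH - 6)
}')
echo "$job"
```

This prints Job ID: 87010.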
Converting Values
Convert Number as Bytes to Human Readable Value
I wanted to sort size in reverse order, but in order to do that properly, the value from du needs to be in kilobytes. I also didn't want to run this through du again just to get the human readable value. So, I did this:
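$ du -s ./*/ ./*/*/ ./*/*/*/ \
| sort -rn \
| awk 'BEGIN {
split("KB MB GB TB PB", type)
}
{
y = 0
x = $1
# Try the largest unit first and step down until the value is at least 1.
# The i >= 0 guard stops the loop for a size of 0.
for (i = 4; y < 1 && i >= 0; i--)
y = x / (2 ^ (10 * i))
print y type[i+2] " " $2
}'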
If you want to count using bytes instead of kilobytes, add B to the front of the list passed to the split function and replace i = 4 with i = 5.
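The conversion can also be sketched as a reusable function that counts upward from the smallest unit. The function name human and the sample sizes are hypothetical, and the input is assumed to be in kilobytes.

```shell
sizes=$(printf '512\n2048\n3145728\n' | awk '
# Hypothetical helper: converts a size in kilobytes to a scaled string.
function human(kb,    units, n, i) {
    n = split("KB MB GB TB PB", units, " ")
    i = 1
    while (kb >= 1024 && i < n) {   # divide down until the value fits the unit
        kb /= 1024
        i++
    }
    return kb units[i]
}
{ print human($1) }
')
echo "$sizes"
```

For these inputs the output is 512KB, 2MB and 3GB, one per line. The extra names after kb in the parameter list are awk's usual way of declaring local variables.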
Convert ps Elapsed Time to Seconds
To convert the elapsed time (formatted in the POSIX locale as [[dd-]hh:]mm:ss) to seconds using awk:
# $Pid is assumed to hold the ID of the process to inspect
Elapsed=$(ps -p "$Pid" -o etime=)
Elapsed=$(echo "$Elapsed" | tr - : | tr : ' ')
Seconds=$(echo "$Elapsed" | awk '
NF == 2 { print ($1 * 60) + $2 }
NF == 3 { print ((($1 * 60) + $2) * 60) + $3 }
NF == 4 { print ((((($1 * 24) + $2) * 60) + $3) * 60) + $4 }
')
echo "$Seconds"
For example, a process with an elapsed time of 66-00:12:58 has been running for 5703178 seconds. Awk checks how many fields were found and applies the matching calculation.
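As a sketch of an alternative, the splitting can be done inside awk itself, which avoids the two tr calls. The elapsed time is hard-coded here for illustration instead of being read from ps.

```shell
seconds=$(echo '66-00:12:58' | awk '{
    n = split($0, part, /[-:]/)     # split on both "-" and ":"
    total = 0
    # Multipliers for seconds, minutes, hours and days, counted from the right.
    split("1 60 3600 86400", mult, " ")
    for (i = n; i >= 1; i--)
        total += part[i] * mult[n - i + 1]
    print total
}')
echo "$seconds"
```

This prints 5703178, matching the example above. Walking the parts from the right means the same loop handles the two-, three- and four-part forms.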
String Manipulation
String Upper/Lower Casing
The tolower() function lowercases an entire string, and toupper() does the reverse.
You could use this to mass-rename a bunch of files to lower case for instance.
## Lower case every file in the current directory
$ for i in * ; do mv "$i" "$(echo "$i" | awk '{print tolower($0)}')"; done
## Alternatively, use `tr 'A-Z' 'a-z'` to do the lowercasing.
$ for i in * ; do mv "$i" "$(echo "$i" | tr 'A-Z' 'a-z')"; done
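A quick sketch of both casing functions on made-up strings, without touching any files:

```shell
cased=$(echo 'README.TXT' | awk '{ print tolower($0), toupper("done") }')
echo "$cased"
```

This prints readme.txt DONE.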
Retrieving all columns after column N
You can retrieve a specific column using $N, where N is the column number. To get every value from a given column onward, you can use this function, which concatenates the fields from the specified column through the last one and returns the result:
# out and i are listed as extra parameters so that they are local to the function
function after(x,    out, i) {
out=""
for (i=x; i<=NF; i++) out=out" "$i
return out
}
{ printf("After column 11: %s\n", after(11)) }
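Here is a runnable sketch of the same idea on a short made-up line. This variant starts from the first requested field so that the result has no leading space, which is a small deviation from the function above.

```shell
rest=$(echo 'a b c d e' | awk '
# Joins fields first..NF with single spaces; out and i are local variables.
function after(first,    out, i) {
    out = $first
    for (i = first + 1; i <= NF; i++) out = out " " $i
    return out
}
{ print after(3) }
')
echo "$rest"
```

This prints c d e.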
Executing Commands
If your awk script needs to call an external process, pipe the command into getline followed by the name of the variable that will store a line of the command's stdout.
E.g. to get the path of an executable from a given PID, use:
"readlink /proc/" $2 "/exe" | getline proc
printf("%s\n", proc)
If you intend to run many commands, you should close each pipe, or else you will get a fatal: cannot open pipe 'xyz' (Too many open files) error. Do so by using the close function:
cmd="date -d\""$1" "$2"\" \"+%s\""; cmd | getline timestamp; close(cmd)
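The pattern of running a command per record and closing it afterwards can be sketched as below. The echo and tr pipeline is just a stand-in for a real external command, and the input words are invented. Building a command from input fields like this is only reasonable when the input is trusted.

```shell
upper_words=$(printf 'one\ntwo\n' | awk '{
    cmd = "echo " $1 " | tr a-z A-Z"   # external command built from the record
    cmd | getline upper               # read the first line of its output
    close(cmd)                        # release the pipe before the next record
    print upper
}')
echo "$upper_words"
```

Each record spawns one command, reads one line of its output into upper, and closes the pipe, so the number of open pipes stays constant regardless of input size.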