Awk


Awk is an excellent text parser and scripting language. It can be run inline as part of a Unix command pipeline, which makes it extremely useful when you need to add more complicated behavior to a shell script.

Introduction to Awk

Awk's scripting language is structured as a sequence of patterns and actions:

# A pattern definition
PATTERN  { ACTION1; ACTION2; }

# A real example
/^[a-z0-9]/ { print $1 }

# Another example
/^[0-9]\s*/ { Sum += $1 }

A pattern can be a regular expression, a boolean expression, or a special pattern. Regular expressions are always enclosed in a leading and trailing forward slash. Boolean expressions combine conditions using the && (and), || (or), and ! (not) operators. Special patterns are additional awk built-ins that trigger on specific conditions, such as BEGIN before the first line and END after the last line.

# To run an action before the first input, use BEGIN
BEGIN { Sum = 0 }

# Get a number and add it to sum using regular expressions
/^[0-9]+ / { Sum += $1 }

# Add more numbers with boolean expressions
/^[0-9]+ / && $1 >= 0 { print "Adding " $1 }

# To run an action after the last input, use END
END { print "Sum is " Sum }

Fields

Fields are the 'words' within the current line, delimited by one or more whitespace characters (or by the field separator, FS). The field $1 references the first 'word' of the current line, $2 the second, and so on. The field $0 represents the entire line.

# Here's an example line:
#
# $1     $2            $3          $4  $5 $6   $7
# GET    /index.html   HTTP/1.0    Dec 20 2020 01:23:45
/index.html/ {
    print "Someone accessed index.html on", $4, $5, $6, $7
}

Tip: You may alter the current line being processed by assigning a modified value to $0.
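
For example, a minimal sketch that assigns a modified value to $0; the fields are re-split from the new line:

# Lower-case the whole line by assigning to $0; $1, $2, ... now refer to the modified line
{ $0 = tolower($0); print $1 }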

Scripting

Functions

Functions are defined in the C style:

function do_something(parameter1, parameter2, ...) {
   ACTIONS
}

They can then be called like any other function when defining actions.

/^GET/ { do_something($0) }
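
As a concrete sketch, here is a small function that trims leading whitespace from its argument and is then called from an action (the name trim_leading is just an example):

function trim_leading(s) {
    # sub() modifies the local copy of the parameter and returns the match count
    sub(/^[ \t]+/, "", s)
    return s
}

/^GET/ { print trim_leading($0) }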

Variables

Awk has built-in variables:

  • NR - current line number
  • NF - number of fields in current line
  • OFS - output field separator
  • FS - field separator, specified with -F
  • RS - record separator
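
As a quick sketch tying these together, this prints each line's record number and field count from /etc/passwd, splitting fields on ':' and joining the output with '=':

$ awk 'BEGIN { FS = ":"; OFS = "=" } { print NR, NF }' /etc/passwd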

You can also pass your own custom variables with the -v var=value parameter. Eg:

$ echo | awk \
   -v hostname=`hostname` \
   -v timestamp=`date +%s` '
{
   printf("Hostname is %s at time %s", hostname, timestamp")
}
'


Inlining with Bash

When writing a Bash script, you may inline Awk as part of a command pipeline by passing the Awk script within a set of single quotes. Single quotes are used so that Bash does not expand Awk fields as shell variables (e.g. values such as $2 should reach Awk untouched).

#!/bin/bash

cat /etc/hosts | awk '
/^server/ { print $2 }
'

Tip: To add a single quote inside the single-quoted script, you will need to use '"'"', which inserts a single ' by briefly switching from single to double quotes. This is useful if you need to insert a single quote in order to trigger a subshell call, for instance.
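
For example, a sketch that wraps the first field of each line in single quotes; each '"'"' sequence ends the single-quoted script, emits a double-quoted ', and reopens the single quotes:

$ awk '{ print "'"'"'" $1 "'"'"'" }' /etc/hosts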

Tasks

Line Matching

With Awk, it's easy to do something for each matching line using the regex matching operator.
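
For example, a small sketch that uses the ~ matching operator against a single field rather than the whole line (reusing the web-log layout from the Fields section above):

# Print the request method for any request whose path ends in .html
$2 ~ /\.html$/ { print $1 }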

Print every line before a line match

Simple awk code:

# cat list.txt | awk '/PATTERN/ { exit } { print $0 }'

The code matches each line against the given pattern: if it matches, awk exits; otherwise, the line is printed and processing continues.

Print the line number on matching lines

To print the number of lines up to a matching line, we do something similar to the previous example but now we keep an accumulator (n):

awk 'BEGIN { n = 0 } /PATTERN/ { print n; exit } { n++ }'

Eg: Suppose I have a file with the contents:

leo
spoon
cake
fork

To find the number of lines before 'cake' (here, 2), do:

# cat list.txt | awk 'BEGIN { n = 0 } /cake/ { print n; exit } { n++ }'

Retrieve a section of text with line matching

To grab a specific section from a .spec file (where sections begin with %):

if [ $# -ne 2 ] ; then
	echo "Usage: $0 file.spec section"
	exit
fi

Section="$2"

cat $1 | awk -v Section="$Section" '
BEGIN {
	InSection=0
} 
/^%.*/ { 
	if ($0 ~ Section) {
		InSection=1
	} else {
		InSection=0
	}

	next
}
{
	if (InSection) {
		print $0
	}
}
'

String Matching

Use the match(string, regex, output_array) function to parse out specific values with a regex. Note that the three-argument form, which captures groups into an array, is a gawk extension.

# Given a string Job <87010>, User <asdf>, Project <default>
# we can parse out the Job ID with:

match($0, /^Job <([^>]+)>.*/, arr)
print "Job ID: " arr[1]

Converting Values

Convert Number as Bytes to Human Readable Value

I wanted to sort size in reverse order, but in order to do that properly, the value from du needs to be in kilobytes. I also didn't want to run this through du again just to get the human readable value. So, I did this:

$ du -s ./*/ ./*/*/ ./*/*/*/ \
   | sort -rn \
   | awk 'BEGIN {
      split("KB MB GB TB PB", type)
   }
   {
      y = 0
      x = $1
      for (i = 4; y < 1; i--)
         y = x / (2 ** (10 * i))
      print y type[i+2]" "$2
   }'

If you want to count using bytes instead of kilobytes, prepend B to the list in the split function (so it reads "B KB MB GB TB PB") and replace i = 4 with i = 5.

Convert ps Elapsed Time to Seconds

To convert the elapsed time (in POSIX locale formatted as [[dd-]hh:]mm:ss) to seconds using awk:

Elapsed=`ps -p $Pid -o etime=`
Elapsed=`echo $Elapsed | tr - : | tr : ' '` 

Seconds=`echo $Elapsed | awk ' 
	NF == 2 { print ($1 * 60) + $2 } 
	NF == 3 { print ((($1 * 60) + $2) * 60) + $3 } 
	NF == 4 { print ((((($1 * 24) + $2) * 60) + $3) * 60) + $4 } 
	{}'` 
	
echo $Seconds

For example, a process running for 66-00:12:58 has been running for 5703178 seconds. The Awk command will match how many columns were found and then do the proper calculation.

String Manipulation

String Upper/Lower Casing

There are tolower() and toupper() functions that can lowercase or uppercase an entire string.

You could use this to mass-rename a bunch of files to lower case, for instance.

## Lower case every file in the current directory
$ for i in `ls` ; do mv $i `echo $i | awk '{print tolower($0)}'`; done

## Alternatively, use `tr 'A-Z' 'a-z'` to do the lowercasing.
$ for i in `ls` ; do mv $i `echo $i | tr 'A-Z' 'a-z'`; done

Retrieving all columns after column N

You can retrieve a specific column using $N, where N is the column number. If you wish to get all values from a specific column onward, you can use this function, which concatenates all fields from the given column to the end of the line and returns the result:

function after(x) {
        out=""
        for (i=x; i<=NF; i++) out=out" "$i
        return out
}

printf("After column 11: %s\n", after(11))

Executing Commands

If your awk script needs to call an external process, pipe the command string into getline, followed by the variable name that should receive the command's stdout output.

Eg. To get the path of an executable from a given PID (here taken from field $2), use:

"readlink /proc/" $2 "/exe" | getline proc
printf("%s\n", proc)

If you intend to run many commands, you should close the pipe or else you will get a fatal: cannot open pipe 'xyz' (Too many open files) error. Do so by using the close function:

cmd="date -d\""$1" "$2"\" \"+%s\""; cmd | getline timestamp; close(cmd)