sed and awk

sed and awk are two powerful but often overlooked data processing tools, and if you know how to use them effectively you’ll be well ahead of many sysadmins and developers.

sed

sed is a stream editor, that is, it accepts input from a file or STDIN, manipulates the data stream in some way, and then passes the results to STDOUT, from where they can be redirected to another file. It overwrites its input file only when asked to with the -i switch. It is another product of Bell Labs, and Linux users are most likely to encounter the GNU version. TLDP is also an excellent resource.

More specifically, sed matches data using line numbers or regular expressions and then acts on these matches as specified by the supplied commands. sed thinks in terms of its pattern space, a data buffer into which sed reads each full line, minus its trailing newline (\n) character, before performing the specified commands on it. When the script finishes, the pattern space is printed out, with the newline restored, unless you limit sed’s output (for example, with the -n option). sed’s hold space is a second buffer in which you can stash data between lines.
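
The two buffers are easier to follow on a tiny input. The following is a minimal sketch; the three-line sample and the variable names are made up for illustration. The first command shows -n suppressing the automatic print, and the second uses the hold space to reverse the order of the lines.

```shell
# Sample input: three lines fed to sed on STDIN.
input=$(printf 'one\ntwo\nthree')

# -n suppresses the automatic print, so only the explicit p (line 2) is shown.
second=$(printf '%s\n' "$input" | sed -n '2p')
printf '%s\n' "$second"

# Reverse the line order using the hold space:
#   1!G  on every line but the first, append the hold space to the pattern space
#   h    copy the pattern space into the hold space
#   $p   on the last line, print the accumulated result
reversed=$(printf '%s\n' "$input" | sed -n '1!G;h;$p')
printf '%s\n' "$reversed"
```

The first command prints "two"; the second prints the three lines in reverse order.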

Some sed Commands and Switches

Here are the commands and switches I use most frequently. General usage is sed [switches] 'command' [file].

Command                  Purpose
s/old/new/               The master substitute command, changes the first instance of ‘old’ to ‘new’ in the pattern space
/!                       Inverter, i.e., only perform actions on lines that do not match the specified address
/1, /2, etc.             Specify which occurrence of a match is changed (a flag to s)
/d                       Delete the pattern space
/g                       Perform substitution on all matches in the pattern space, not just the first
/I                       Ignore case (a GNU extension)
/p                       Print the pattern space to STDOUT; usually used with -n
/w file                  Write the pattern space to file

Switch                   Purpose
-e script                Run the commands given in script on the command line; can be repeated
-f file                  Run the commands from a specified file
-i.optional-extension    Edit files in-place, that is, overwrite the input file(s); if extension is supplied, copy the original to file.extension before overwriting
-l number                Specify the line-wrap length for the l command (default is 70 characters)
-n                       Don’t print anything unless requested (usually with /p)
-r                       Use extended regular expressions (GNU sed also accepts -E)
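
As a quick illustration of the -i and -e switches together, here is a sketch that runs against a scratch file. The directory, the file name demo.txt, and its contents are invented for the example; the extension is attached directly to -i, which is the form this page's table describes.

```shell
# Work in a throwaway directory so no real file is touched.
workdir=$(mktemp -d)
printf 'old value\nOld value\n' > "$workdir/demo.txt"

# -i.bak edits demo.txt in place and keeps the original as demo.txt.bak.
# The two -e expressions are applied in order to every line.
sed -i.bak -e 's/old/new/' -e 's/Old/New/' "$workdir/demo.txt"

edited=$(cat "$workdir/demo.txt")
backup=$(cat "$workdir/demo.txt.bak")
printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$backup"

# Clean up the scratch directory.
rm -r "$workdir"
```

Supplying an extension is a cheap safety net: if the script turns out to be wrong, the untouched original is still sitting next to the edited file.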

A Few sed Examples

Here are some examples of sed in action. Additional examples can be found on my search and replace page. The sed one-liners page provides many more.

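Match “text” at any point, only at the beginning, or only at the end of a line (each command deletes the match it finds):

$ sed 's/text//' file.txt
$ sed 's/^text//' file.txt
$ sed 's/text$//' file.txt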

Replace all instances of “text”, case-insensitive first letter, with “blar” in file.txt:

$ sed 's/[Tt]ext/blar/g' file.txt

Delete all lines that do or don’t contain “blar”:

$ sed '/blar/d' file.txt
$ sed '/blar/!d' file.txt

Replace all instances of “text” with “blar” from line 1-100, or from line 101 to EOF:

$ sed '1,100 s/text/blar/g' file.txt
$ sed '101,$ s/text/blar/g' file.txt

Change “text” to “blar” on all lines except between START and END:

$ sed '/START/,/END/!s/text/blar/g' file.txt
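
To see what line-number and pattern addresses actually select, it helps to run them on a small input. This is only a sketch; the five-line sample and the variable names are made up for the purpose.

```shell
# Five sample lines with a START/END block in the middle.
sample=$(printf 'text a\nSTART\ntext b\nEND\ntext c')

# Numeric range: substitute only on lines 1 through 2.
by_number=$(printf '%s\n' "$sample" | sed '1,2 s/text/blar/g')
printf 'lines 1-2 only:\n%s\n' "$by_number"

# Inverted pattern range: substitute everywhere except from START to END.
outside=$(printf '%s\n' "$sample" | sed '/START/,/END/!s/text/blar/g')
printf 'outside START..END:\n%s\n' "$outside"
```

In the first result only "text a" changes; in the second, "text a" and "text c" change while "text b", which sits inside the block, is left alone.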

Wrap every line in file.txt in parentheses (& stands for the matched text):

$ sed 's/.*/( & )/' file.txt

Delete blank lines in file.txt:

$ sed '/^$/d' file.txt

Lowercase or uppercase an entire file (\L and \U are GNU sed extensions):

$ sed 's/.*/\L&/' file.txt
$ sed 's/.*/\U&/' file.txt

Uppercase the first letter of each word on every line (\< and \u are also GNU extensions):

$ sed 's/\<./\u&/g' file.txt
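
The case-conversion escapes are easy to mistype, so here is a small sketch that runs them on literal strings. It assumes GNU sed, since \L, \U, \u and \< are GNU extensions; other sed implementations may print the input unchanged or with stray letters.

```shell
# \U uppercases everything that follows it in the replacement.
upper=$(printf 'hello world\n' | sed 's/.*/\U&/')

# \L lowercases everything that follows it in the replacement.
lower=$(printf 'HELLO WORLD\n' | sed 's/.*/\L&/')

# \< anchors at the start of a word; \u uppercases only the next character.
title=$(printf 'hello world\n' | sed 's/\<./\u&/g')

printf '%s\n%s\n%s\n' "$upper" "$lower" "$title"
```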

awk

awk is an interpreted language used for text processing and reporting. The original is yet another product of Bell Labs and various other implementations are widely encountered: mawk is the default interpreter for many current Linux distros, and the GNU implementation (gawk) is common as well. The awk Manual is a good resource for the awk language itself, as is the awk Primer.

Basic awk is pretty straightforward. It reads input either from STDIN or from specified files and splits this input into records (the default being one line, much like sed), with the current record denoted by $0. Each record is split into fields denoted by $1, $2, etc. The specified patterns are then tested against each record, and the action attached to every matching pattern is run.
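
A short sketch of the record and field model, using two made-up input lines. The first program has no pattern, so its action runs for every record; the second has a pattern and no action, so awk falls back to printing the records that match.

```shell
# Two records; the first has three fields, the second has two.
people=$(printf 'alice 30 admin\nbob 25')

# No pattern: the action runs on every record.
records=$(printf '%s\n' "$people" |
    awk '{ print NR ": first=" $1 ", fields=" NF ", whole=[" $0 "]" }')
printf '%s\n' "$records"

# Pattern plus action: print the name only where field 2 exceeds 26.
older=$(printf '%s\n' "$people" | awk '$2 > 26 { print $1 }')
printf '%s\n' "$older"
```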

awk also understands arithmetic operations (+, -, /, *, %), numeric functions (sin, exp, sqrt, rand), arrays, C-like string formatting with printf (%f, %s, %d), if-else conditions, for and while loops, and some string and system functions like substr(), length(), and systime(). The available functions vary between awk implementations, so check your documentation. On the command line, general usage is awk [options] 'search-pattern { program actions }' file.
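
Several of these features fit in one small program. The sketch below uses invented score data and sticks to constructs that the common awk implementations share: a for loop over fields, arithmetic, if-else, and printf.

```shell
# Each record: a name followed by any number of scores.
scores=$(printf 'ann 90 80\nbob 55 65')

report=$(printf '%s\n' "$scores" | awk '{
    total = 0
    for (i = 2; i <= NF; i++)      # sum fields 2 through NF
        total += $i
    avg = total / (NF - 1)         # NF - 1 scores per record
    if (avg >= 70)
        grade = "pass"
    else
        grade = "fail"
    printf "%s %.2f %s\n", $1, avg, grade
}')
printf '%s\n' "$report"
```

Multi-line programs like this one are usually easier to maintain in a separate file loaded with -f.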

Some awk Options and Built-in Variables

Here are some commonly available awk command-line switches and variables. More are available in gawk, mawk, etc. but these are the basics.

Option          Purpose
-Ffs            Specifies the field delimiter fs (default is whitespace)
-f file         Specify a file from which to load awk commands or programs

Variable        Purpose
$0              The full record; printing it is equivalent to “print all fields”
$1, $2, etc.    Given field in a record
FILENAME        The name of the file being read
FS              The field separator; default is whitespace; same as using -F on the command line
NF              Number of fields in a given record
NR              Number of records read so far
OFS             The output field separator; default is the space character
ORS             Output record separator; default is the newline character
RS              The record separator; default is the newline character
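
The separator variables are usually set in a BEGIN block, before any input is read. A small sketch with made-up input: the first command changes the field separators, and the second sets RS to the empty string, which makes awk treat blank-line-separated paragraphs as records.

```shell
# Read colon-separated fields, write comma-separated ones.
# Assigning $1 to itself forces awk to rebuild $0 using OFS.
converted=$(printf 'root:x:0\nbin:x:1\n' |
    awk 'BEGIN { FS = ":"; OFS = "," } { $1 = $1; print NR, $0 }')
printf '%s\n' "$converted"

# An empty RS switches awk to paragraph mode, so NR counts paragraphs.
paragraphs=$(printf 'a\nb\n\nc\n' | awk 'BEGIN { RS = "" } END { print NR }')
printf '%s\n' "$paragraphs"
```

Without the $1 = $1 assignment, $0 would be printed exactly as it was read, colons included.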

Some awk Examples and One-Liners

Print the fifth field from /etc/passwd, delimited by ‘:’, from any record (line) containing ‘admin’, then print the record number and its number of fields:

$ awk -F: '/admin/ { print $5, "Records: "NR, "Fields: "NF }' /etc/passwd
Gnats Bug-Reporting System (admin) Records: 17 Fields: 7

Double-space or triple-space a file. (Note: In awk, ‘1’ always evaluates to true, making awk perform the default operation {print $0}, so it’s a shorthand way of printing a line):

$ awk '1;{print ""}' file.txt  # This is the same as $ awk '{print $0 "\n"}' file.txt
$ awk '1;{print "\n"}' file.txt

Number each line with its line number, followed by tab:

$ awk '{print NR "\t" $0}' file.txt

Count the lines in a file:

$ awk 'END{print NR}' file.txt

Print every line with more than 4 fields, or every line where the value of the last field is > 4:

$ awk 'NF > 4' file.txt
$ awk '$NF > 4' file.txt

Delete leading whitespace (align text left), delete trailing whitespace:

$ awk '{sub(/^[ \t]+/, "")};1' file.txt
$ awk '{sub(/[ \t]+$/, "")};1' file.txt

Match/inverted match of a field against a regular expression:

$ awk '$1  ~ /^[a-f]/' file.txt # print if match
$ awk '$1 !~ /^[a-f]/' file.txt # print if doesn't match

Delete all blank lines from a file:

$ awk NF file.txt # this works because 'NF' = 0 for a blank record, thus nothing is printed
