The Ultimate Guide to Mastering the Powerful awk Command on Linux

Learn how to effectively manipulate text in Linux with the powerful awk command. Discover its origins, understand its versatile uses, and master key features such as rules, patterns, and actions. Explore output field separators, input field separators, and built-in functions to enhance your text processing skills. Uncover the secrets of awk scripting and put an end to any awk-ward confusion.

Key Takeaways

awk is used to filter and manipulate output from other programs by applying rules with patterns and actions.

awk is capable of printing designated fields from text, altering the delimiters between fields, and executing diverse functions with its integrated capabilities.

The moniker "awk" is derived from the combined names of its developers: Alfred Aho, Peter Weinberger, and Brian Kernighan.

On Linux, awk is a command-line text manipulation dynamo, as well as a powerful scripting language. Here's an introduction to some of its coolest features.

How awk Got Its Name

The awk command takes its name from the initials of its original developers, Alfred Aho, Peter Weinberger, and Brian Kernighan, who created it in 1977 at AT&T Bell Laboratories. All three were part of the esteemed Unix pantheon of the time. Since then, awk has been developed much further with the help of numerous contributors.

awk is not only a comprehensive text manipulation toolkit for the command line, but also a complete scripting language. If this article whets your appetite, awk's own documentation covers the language in much greater depth.

What is awk Used For? Rules, Patterns, and Actions

awk filters and manipulates text, typically the output of other programs. An awk program is made up of rules, each consisting of a pattern and an action. When a line of text matches the pattern, awk performs the action on it. Actions are enclosed in curly braces ({}); a pattern, if there is one, comes before its action, and together they form a rule. On the command line, the entire awk program is enclosed in single quotes (').
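As a minimal sketch of a single rule, the example below feeds awk three made-up lines (the text in the log variable is hypothetical sample data, not output from a real program). The pattern /warn/ selects lines, and the action in curly braces runs once for each selected line.

```shell
# Hypothetical sample input, one status message per line.
log='boot ok
disk warn
net ok'

# Rule = pattern + action, all inside single quotes.
# Only the line containing "warn" matches, so the action runs once.
matches=$(printf '%s\n' "$log" | awk '/warn/ {print "found a warning"}')
printf '%s\n' "$matches"
# prints: found a warning
```

Piping the text in with printf keeps the example repeatable; in practice the input would usually come from another command or a file.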

Here is the output generated by the "who" command:

who

Perhaps we don't need all of that information and only want to see the account names. We can pipe the output from the "who" command into "awk" and tell "awk" to print only the first field.

By default, "awk" treats a field as a string of characters bounded by whitespace, the start of a line, or the end of a line. Fields are identified by a dollar sign ($) and a number, so $1 represents the first field. We'll use it with the print statement to output the first field.

We type the following:

who | awk '{print $1}'

awk prints the first field and discards the rest of the line.

We can print as many fields as we like. If we add a comma as a separator, awk prints a space between each field.

We type the following to also print the time the person logged in (field four):

who | awk '{print $1,$4}'
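To see what the comma does, the sketch below uses one made-up line in the general shape of "who" output (the user name, terminal, date, and time are hypothetical). It assumes nothing beyond standard awk behavior: runs of spaces count as a single separator, a comma in print inserts a space, and leaving the comma out joins the fields together.

```shell
# Hypothetical line shaped like "who" output; note the repeated spaces.
line='dave     pts/0        2020-02-19 09:15'

# With a comma, print puts a space between the two fields.
with_comma=$(printf '%s\n' "$line" | awk '{print $1,$4}')

# Without a comma, the two fields are concatenated.
without_comma=$(printf '%s\n' "$line" | awk '{print $1 $4}')

printf '%s\n' "$with_comma"      # prints: dave 09:15
printf '%s\n' "$without_comma"   # prints: dave09:15
```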

There are a couple of special field identifiers. These represent the entire line of text and the last field in the line of text:

$0: Represents the entire line of text.

$1: Represents the first field.

$2: Represents the second field.

$7: Represents the seventh field.

$45: Represents the 45th field.

$NF: NF stands for "number of fields," so $NF represents the last field.

We'll type the following to bring up a small text file that contains a short quote attributed to Dennis Ritchie:

cat dennis_ritchie.txt

We want awk to print the first, second, and last field of the quote. Note that although it's wrapped around in the terminal window, it's just a single line of text.

We type the following command:

awk '{print $1,$2,$NF}' dennis_ritchie.txt

We don't know, or need to know, which field number "simplicity." has. We do know it is the last field, so $NF gives us its value. The period is simply treated as another character in the field.
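The relationship between NF and $NF can be sketched with a short made-up line (the four words below are hypothetical sample text). NF on its own is the count of fields, $NF is the content of the last field, and $(NF-1) is the field before it.

```shell
# Hypothetical sample line with four whitespace-separated fields.
line='one two three four'

count=$(printf '%s\n' "$line" | awk '{print NF}')             # number of fields
last=$(printf '%s\n' "$line" | awk '{print $NF}')             # last field
second_last=$(printf '%s\n' "$line" | awk '{print $(NF-1)}')  # next-to-last field

printf '%s %s %s\n' "$count" "$last" "$second_last"
# prints: 4 four three
```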

Adding Output Field Separators to awk Output

You can specify a character other than the default space character to be printed between fields in awk. The output generated by the date command has an unusual format as the time is placed in the middle. However, by using awk, we can extract the desired fields by executing the following command:

date | awk '{print $2,$3,$6}'

We'll use the OFS (output field separator) variable to put a separator between the month, day, and year. Note that in the commands below, the OFS assignment sits inside the single quotes (') that enclose the whole program, but outside the curly braces ({}) of the action:

date | awk 'OFS="/" {print $2,$3,$6}'

date | awk 'OFS="-" {print $2,$3,$6}'
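Placing the assignment in front of the action works because awk evaluates it for every line before running the action. Two more conventional ways to set OFS are a BEGIN rule and the -v option, sketched below. The sample line is a hypothetical stand-in for "date" output; the real layout depends on your locale, so the field numbers may differ on your system.

```shell
# Hypothetical fixed line in a layout "date" commonly prints.
stamp='Wed Feb 19 09:15:30 EST 2020'

# Set OFS once in a BEGIN rule, before any input is read.
begin_form=$(printf '%s\n' "$stamp" | awk 'BEGIN {OFS="/"} {print $2,$3,$6}')

# Set OFS from the command line with the -v option.
v_form=$(printf '%s\n' "$stamp" | awk -v OFS="-" '{print $2,$3,$6}')

printf '%s\n' "$begin_form"   # prints: Feb/19/2020
printf '%s\n' "$v_form"       # prints: Feb-19-2020
```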

The BEGIN and END Rules

Before any text processing begins, the BEGIN rule is executed. In fact, the BEGIN rule is executed even before awk reads any text. Once all processing is complete, the END rule is executed. It is possible to have multiple BEGIN and END rules, and they will be executed in the order they appear.

For our example of a BEGIN rule, we'll print the entire quote from the dennis_ritchie.txt file we used previously with a title above it.

To do so, we type this command:

awk 'BEGIN {print "Dennis Ritchie"} {print $0}' dennis_ritchie.txt

The BEGIN rule has its own set of actions enclosed within its own curly braces ({}). We can apply the same technique to the command we used previously to pipe output from "who" into "awk". To do so, we type the following:

who | awk 'BEGIN {print "Active Sessions"} {print $1,$4}'
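The examples so far use only BEGIN. A sketch that also uses an END rule is shown below, with two made-up session lines in place of real "who" output. It relies on NR, awk's built-in count of the lines read so far, which still holds the total when the END rule runs.

```shell
# Hypothetical sample input: user name and login time.
sessions='dave 09:15
mary 10:02'

# BEGIN runs before any input, the middle rule runs for every line,
# and END runs after the last line has been processed.
report=$(printf '%s\n' "$sessions" | awk '
  BEGIN {print "Active Sessions"}
  {print $1,$2}
  END {print NR " sessions"}')

printf '%s\n' "$report"
# prints:
# Active Sessions
# dave 09:15
# mary 10:02
# 2 sessions
```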

Input Field Separators

If you want awk to process text that does not use whitespace to separate fields, you need to specify the character that the text uses as the field separator. For instance, in the /etc/passwd file, colons (:) are used to separate fields.

To tell awk to use a colon as the separator, we use the -F (field separator) option. The command below prints the user account name and the home folder:

```
awk -F: '{print $1, $6}' /etc/passwd
```

The output contains the name of the user account (or application or daemon name) and the home folder (or the location of the application).
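The effect of -F can be sketched without touching the real file. The two lines below are hypothetical entries in the /etc/passwd layout. Without -F, a line that contains no whitespace is one single field; with -F:, the same line splits into seven.

```shell
# Hypothetical entries in /etc/passwd format (seven colon-separated fields).
passwd_sample='root:x:0:0:root:/root:/bin/bash
dave:x:1000:1000:Dave:/home/dave:/bin/bash'

# Default separator: the first line has no whitespace, so it is one field.
default_fields=$(printf '%s\n' "$passwd_sample" | awk 'NR == 1 {print NF}')

# Colon separator: the same line now has seven fields.
colon_fields=$(printf '%s\n' "$passwd_sample" | awk -F: 'NR == 1 {print NF}')

# Account name and home folder for every line.
homes=$(printf '%s\n' "$passwd_sample" | awk -F: '{print $1, $6}')

printf '%s %s\n' "$default_fields" "$colon_fields"   # prints: 1 7
printf '%s\n' "$homes"
# prints:
# root /root
# dave /home/dave
```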

Adding Patterns to awk

We can narrow the output to regular user accounts by adding a pattern to the rule. Regular user accounts typically have User ID numbers of 1,000 or greater, so we can filter on that. To run the print action only when the third field ($3) holds a value of 1,000 or greater, we type the following:

awk -F: '$3 >= 1000 {print $1,$6}' /etc/passwd

The pattern should come directly before the associated action.

To create a title for our report, we can utilize the BEGIN rule. We enter the following code, incorporating the (\n) notation to insert a newline character into the title string:

awk -F: 'BEGIN {print "User Accounts\n-------------"} $3 >= 1000 {print $1,$6}' /etc/passwd
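Here is a small sketch of the numeric pattern on made-up data, so the result does not depend on the accounts on your machine. The four entries are hypothetical. The example also assumes, as is the case on many distributions, that the "nobody" account has User ID 65534, which means it passes the 1,000 test too; a second condition joined with && is one possible way to leave it out.

```shell
# Hypothetical entries in /etc/passwd format.
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
dave:x:1000:1000:Dave:/home/dave:/bin/bash
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin'

# The pattern comes before the action; only lines with $3 >= 1000 are printed.
users=$(printf '%s\n' "$passwd_sample" | awk -F: '$3 >= 1000 {print $1,$6}')

# Two conditions combined with &&; 65534 is an assumed UID for "nobody".
strict=$(printf '%s\n' "$passwd_sample" | awk -F: '$3 >= 1000 && $3 != 65534 {print $1,$6}')

printf '%s\n' "$users"
# prints:
# dave /home/dave
# nobody /nonexistent
printf '%s\n' "$strict"
# prints: dave /home/dave
```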

Patterns can also be full regular expressions, which is one of awk's great strengths.

Suppose we desire to view the universally unique identifiers (UUIDs) of the mounted file systems. By searching the /etc/fstab file for instances of the phrase "UUID," we can obtain the desired information.

We use the search pattern "/UUID/" in our command:

awk '/UUID/ {print $0}' /etc/fstab

This finds every line that contains "UUID" and prints it. Strictly speaking, the print action is unnecessary, because awk's default action is to print the whole line. Spelling it out makes the intent of the command clearer, which you will appreciate when you revisit a script or your history file.

Because the /UUID/ pattern matches "UUID" anywhere in a line, it can also match lines that are not mount instructions, such as comments. To make awk process only lines that begin with "UUID," we add the start-of-line token (^) to the regular expression:

awk '/^UUID/ {print $0}' /etc/fstab

That's better! Now, we only see genuine mount instructions. To refine the output even further, we type the following and restrict the display to the first field:

awk '/^UUID/ {print $1}' /etc/fstab

If we had multiple file systems mounted on this machine, we'd get a neat table of their UUIDs.
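The difference the ^ anchor makes can be sketched on a made-up file. The three lines below are a hypothetical /etc/fstab excerpt, and the UUID value is invented. The unanchored pattern also catches the comment line, whose first field is just "#".

```shell
# Hypothetical /etc/fstab excerpt; the UUID value is invented.
fstab_sample='# / was on /dev/sda1 during installation, see UUID below
UUID=1234-ABCD / ext4 errors=remount-ro 0 1
/swapfile none swap sw 0 0'

# Matches "UUID" anywhere in the line, including the comment.
anywhere=$(printf '%s\n' "$fstab_sample" | awk '/UUID/ {print $1}')

# Matches only lines that start with "UUID".
anchored=$(printf '%s\n' "$fstab_sample" | awk '/^UUID/ {print $1}')

printf '%s\n' "$anywhere"
# prints:
# #
# UUID=1234-ABCD
printf '%s\n' "$anchored"
# prints: UUID=1234-ABCD
```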

How to Use awk's Built-In Functions

awk offers a wide range of functions that can be utilized in your programs, whether accessed through the command line or used in scripts. Exploring these functions can greatly benefit you.

To illustrate the steps involved in calling a function, let's consider some numerical functions. As an example, the code below displays the square root of 625:

awk 'BEGIN { print sqrt(625)}'

This command calls atan2() with the arguments 0 (zero) and -1; the result happens to be the mathematical constant pi:

awk 'BEGIN {print atan2(0, -1)}'

In the following command, we modify the result of the atan2() function before we print it:

awk 'BEGIN {print atan2(0, -1)*100}'

Functions can accept expressions as parameters. For example, here's a convoluted way to ask for the square root of 25:

awk 'BEGIN { print sqrt((2+3)*5)}'
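awk's built-in functions are not limited to numbers. As a brief sketch, the command below calls three standard string functions, length(), toupper(), and substr(), on a sample word. The variable name w and the word itself are arbitrary choices for this illustration; the value is passed in with the -v option.

```shell
# Sample word to work on; any string would do.
word='simplicity'

# length() counts characters, toupper() converts to upper case,
# and substr(w, 1, 6) returns six characters starting at position 1.
result=$(awk -v w="$word" 'BEGIN {print length(w), toupper(w), substr(w, 1, 6)}')

printf '%s\n' "$result"
# prints: 10 SIMPLICITY simpli
```

As in the numeric examples, everything happens in a BEGIN rule, so awk does not wait for any input.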

awk Scripts

If your command line gets complicated, or you develop a routine you know you'll want to use again, you can transfer your awk command into a script.

In our example script, we're going to do all of the following:

Tell the shell which executable to use to run the script.

Set the FS variable so awk splits the input into fields on colons (:), and set the OFS variable so awk uses colons (:) to separate fields in the output.

Set a counter to 0 (zero).

Set the second field of each line of text to a blank value (it's always an "x," so we don't need to see it).

Print the line with the modified second field.

Increment the counter.

Print the value of the counter.

Our script is shown below.

The preparation steps are carried out by the BEGIN rule, while the counter value is displayed by the END rule. The middle rule has no name or pattern, so it matches every line; it modifies the second field, prints the line, and increments the counter.

The first line of the script tells the shell which executable to use to run it (awk, in our example). It also passes the -f (file) option, which tells awk to read its program from a file, in this case the script itself. The name of the file to process is given when we run the script. The complete script is listed below so you can copy and paste it:

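#!/usr/bin/awk -f

BEGIN {
  # set the input and output field separators
  FS=":"
  OFS=":"
  # zero the accounts counter
  accounts=0
}

{
  # set field 2 to nothing
  $2=""
  # print the entire line
  print $0
  # count another account
  accounts++
}

END {
  # print the result
  print accounts " accounts.\n"
}

Save the script in a file called omit.awk. To make it executable, we type the following:

chmod +x omit.awk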

Now, we'll run it and pass the /etc/passwd file to the script. This is the file awk will process for us, using the rules within the script:

./omit.awk /etc/passwd

The file is processed and each line is displayed.

The "x" entries in the second field were removed, but note the field separators are still present. The lines are counted and the total is given at the bottom of the output.
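To try the same rules without reading the real /etc/passwd, the sketch below writes a condensed copy of the script to a temporary file and runs it with awk -f against two made-up entries. The temporary file (created with mktemp) and the sample lines are assumptions made for this illustration; the rules are the ones from the script, with the trailing newline in the END rule left out.

```shell
# Write a condensed copy of the script to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
BEGIN { FS=":"; OFS=":"; accounts=0 }
{ $2=""; print $0; accounts++ }
END { print accounts " accounts." }
EOF

# Hypothetical entries in /etc/passwd format.
sample='root:x:0:0:root:/root:/bin/bash
dave:x:1000:1000:Dave:/home/dave:/bin/bash'

# awk -f FILE reads the program from FILE; the data arrives on standard input.
output=$(printf '%s\n' "$sample" | awk -f "$script")
rm -f "$script"

printf '%s\n' "$output"
# prints:
# root::0:0:root:/root:/bin/bash
# dave::1000:1000:Dave:/home/dave:/bin/bash
# 2 accounts.
```

Assigning to $2 makes awk rebuild the line using OFS, which is why the colons on either side of the emptied field are still there.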

awk Doesn't Stand for Awkward

awk doesn't stand for awkward; it stands for elegance. It has been described as both a processing filter and a report writer, and it handles both jobs efficiently. In a few lines, awk achieves what would require extensive coding in a conventional language.

That power comes from the simple concept of rules, which contain patterns that select the text to process and actions that define the processing.
