In this beginner-friendly guide, you'll learn the very basics of the awk
command. You'll also see some of the ways you can use it when dealing with text.
Let's get started!
What is the awk command?
awk is a scripting language that is helpful when working in the command line. It's also a widely used command for text processing.
When using awk, you can select data (one or more pieces of text) based on a pattern you provide.
For example, with awk you can search for a specific word or pattern in a piece of text, or select a certain line or a certain column in a file you provide.
The awk command
In its simplest form, the awk command is followed by a pair of single quotation marks that enclose a set of curly braces, with the name of the file you want to search through mentioned last.
It looks something like this:
awk '{action}' your_file_name.txt
When you want to search for text that has a specific pattern or you're looking for a specific word in the text, the command would look something like this:
awk '/regex pattern/{action}' your_file_name.txt
To create a file in the command line, you use the touch command.
For example: touch filename.txt, where filename is the name of your file.
You can then open the file in a text editor and add its contents. On macOS, for instance, the open command (open filename.txt) opens it in a program like TextEdit.
So, say you have a text file, information.txt, that contains data separated into different columns.
The file contents could look something like this:
firstName lastName age city ID
Thomas Shelby 30 Rio 400
Omega Night 45 Ontario 600
Wood Tinker 54 Lisbon N/A
Giorgos Georgiou 35 London 300
Timmy Turner 32 Berlin N/A
In my example, there is one column each for firstName, lastName, age, city, and ID.
At any time, you can view the contents of your file by typing cat text_file, where text_file is the name of your file.
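If you'd rather create the sample file without leaving the terminal, one option is a shell here-document. This is only a sketch of an alternative, not the approach described above, and it assumes a POSIX-style shell such as bash or zsh:

```shell
# Write the sample data to information.txt without opening an editor.
# The quoted 'EOF' delimiter stops the shell from expanding anything in the text.
cat > information.txt <<'EOF'
firstName lastName age city ID
Thomas Shelby 30 Rio 400
Omega Night 45 Ontario 600
Wood Tinker 54 Lisbon N/A
Giorgos Georgiou 35 London 300
Timmy Turner 32 Berlin N/A
EOF

# Confirm the contents.
cat information.txt
```

Note that this overwrites any existing information.txt in the current directory.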
awk
To print all the contents of a file, the action you specify inside the curly braces is print $0.
This will work in exactly the same way as the cat command mentioned previously.
awk '{print $0}' information.txt
Output:
firstName lastName age city ID
Thomas Shelby 30 Rio 400
Omega Night 45 Ontario 600
Wood Tinker 54 Lisbon N/A
Giorgos Georgiou 35 London 300
Timmy Turner 32 Berlin N/A
If you would like each line to be prefixed with its line number, you would use the NR built-in variable:
awk '{print NR,$0}' information.txt
Output:
1 firstName lastName age city ID
2 Thomas Shelby 30 Rio 400
3 Omega Night 45 Ontario 600
4 Wood Tinker 54 Lisbon N/A
5 Giorgos Georgiou 35 London 300
6 Timmy Turner 32 Berlin N/A
awk
When using awk, you can specify certain columns you want printed.
To have the first column printed, you use the command:
awk '{print $1}' information.txt
Output:
firstName
Thomas
Omega
Wood
Giorgos
Timmy
The $1 stands for the first field, in this case the first column.
To print the second column, you would use $2:
awk '{print $2}' information.txt
Output:
lastName
Shelby
Night
Tinker
Georgiou
Turner
By default, awk decides where each column starts and ends by splitting every line on whitespace (spaces or tabs).
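If your data uses a different delimiter, awk's -F option lets you set the field separator yourself. Here is a minimal sketch, assuming a comma-separated file called people.csv (a made-up name and made-up contents, not part of this guide's example file):

```shell
# Create a small comma-separated file (people.csv is a hypothetical example).
cat > people.csv <<'EOF'
firstName,lastName,city
Thomas,Shelby,Rio
Omega,Night,Ontario
EOF

# -F ',' tells awk to split each line on commas instead of whitespace.
awk -F ',' '{print $1, $3}' people.csv
```

This prints firstName city, Thomas Rio, and Omega Ontario, one pair per line.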
To print more than one column, for example the first and fourth columns, you would do:
awk '{print $1, $4}' information.txt
Output:
firstName city
Thomas Rio
Omega Ontario
Wood Lisbon
Giorgos London
Timmy Berlin
The $1 represents the first input field (first column), and the $4 represents the fourth. You separate them with a comma, $1, $4, so the output has a space between the two values and is more readable.
To print the last field (the last column), you can use $NF, which represents the last field in a record:
awk '{print $NF}' information.txt
Output:
ID
400
600
N/A
300
N/A
You can also specify the line you want printed from your chosen column:
awk '{print $1}' information.txt | head -1
Output:
firstName
Let's break that command down. awk '{print $1}' information.txt prints the first column. Then the output of that command (which you saw earlier on) is piped, using the pipe symbol |, to the head command, where its -1 argument selects the first line of the column.
If you wanted two lines printed, you'd do:
awk '{print $1}' information.txt | head -2
Output:
firstName
Thomas
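You can get the same result with awk alone by putting a condition on the NR variable in front of the action, so nothing needs to be piped to head. A small sketch, using a made-up file sample.txt that holds the first few rows of the example data:

```shell
# Build a short sample file (sample.txt is a hypothetical name for illustration).
printf '%s\n' \
  'firstName lastName age city ID' \
  'Thomas Shelby 30 Rio 400' \
  'Omega Night 45 Ontario 600' > sample.txt

# Print the first column of line 1 only.
awk 'NR == 1 {print $1}' sample.txt

# Print the first column of lines 1 and 2.
awk 'NR <= 2 {print $1}' sample.txt
```

The first command prints firstName, and the second prints firstName followed by Thomas.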
awk
You can print a line that starts with a specific letter.
For example:
awk '/^O/' information.txt
Output:
Omega Night 45 Ontario 600
That command selects any line that starts with an O.
You use the caret symbol (^) first, which indicates the beginning of a line, and then the letter you want the line to start with.
You can also print a line that ends in a specific pattern:
awk '/0$/' information.txt
Output:
Thomas Shelby 30 Rio 400
Omega Night 45 Ontario 600
Giorgos Georgiou 35 London 300
This prints out the lines that end in a 0. The $ symbol is used after a character to signify how a line will end.
That command could also be changed to:
awk '! /0$/' information.txt
The ! is used as a NOT, so in this case it selects the lines that DON'T end in a 0.
Output:
firstName lastName age city ID
Wood Tinker 54 Lisbon N/A
Timmy Turner 32 Berlin N/A
awk
To print lines that contain certain letters or match a pattern you specify, you again use the slashes, //, shown previously.
If you want to look for lines containing io, you'd do:
awk '/io/{print $0}' information.txt
Output:
Thomas Shelby 30 Rio 400
Omega Night 45 Ontario 600
Giorgos Georgiou 35 London 300
This matches all lines that contain io.
Say you had an extra column, a department column:
firstName lastName age city ID department
Thomas Shelby 30 Rio 400 IT
Omega Night 45 Ontario 600 Design
Wood Tinker 54 Lisbon N/A IT
Giorgos Georgiou 35 London 300 Data
Timmy Turner 32 Berlin N/A Engineering
To find all the information of people working in IT, you would need to specify the string you're searching for between the slashes, //:
awk '/IT/' information.txt
Output:
Thomas Shelby 30 Rio 400 IT
Wood Tinker 54 Lisbon N/A IT
What if you wanted to see only the first and last names of the people working in IT?
You can specify the columns like so:
awk '/IT/{print $1, $2}' information.txt
Output:
Thomas Shelby
Wood Tinker
This will display only the first and second columns of the lines where IT appears, instead of presenting all fields.
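Keep in mind that /IT/ matches IT anywhere on the line, not just in the department column. If that matters for your data, one option is to compare the sixth field directly. The sketch below assumes a made-up file staff.txt, and the Ada WHITE row is invented purely to show the difference:

```shell
# Build a sample file with a department column (staff.txt and the
# "Ada WHITE" row are hypothetical, added only for illustration).
printf '%s\n' \
  'firstName lastName age city ID department' \
  'Thomas Shelby 30 Rio 400 IT' \
  'Omega Night 45 Ontario 600 Design' \
  'Ada WHITE 41 Oslo 700 Design' > staff.txt

# Matches IT anywhere on the line, so the WHITE row is included too.
awk '/IT/ {print $1, $2}' staff.txt

# Matches only rows whose sixth field is exactly IT.
awk '$6 == "IT" {print $1, $2}' staff.txt
```

The first command prints Thomas Shelby and Ada WHITE, while the second prints only Thomas Shelby.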
When searching for text with a specific pattern, there may be times when you'll need to use an escape character, like so:
awk '/N\/A$/' information.txt
Output:
Wood Tinker 54 Lisbon N/A
Timmy Turner 32 Berlin N/A
I wanted to find lines that end with the pattern N/A.
So, when searching between the slashes, //, as shown so far, I had to put an escape character (\) before the / in N/A. Otherwise, awk would have treated that / as the end of the pattern and I would've gotten an error.
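When the value you're looking for is a whole field, one way to avoid escaping altogether is to compare the field with a string instead of using a regex. A minimal sketch, assuming a made-up file ids.txt that holds a few rows from the example data:

```shell
# Build a short sample file (ids.txt is a hypothetical name for illustration).
printf '%s\n' \
  'Thomas Shelby 30 Rio 400' \
  'Wood Tinker 54 Lisbon N/A' \
  'Timmy Turner 32 Berlin N/A' > ids.txt

# $NF is the last field; a string comparison needs no escaping of the slash.
# With no action given, awk prints the whole matching line.
awk '$NF == "N/A"' ids.txt
```

This prints the Wood Tinker and Timmy Turner lines.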
awk
If, for example, you wanted to find all the information of employees under the age of 40, you would use the < comparison operator like so:
awk '$3 < 40 { print $0 }' information.txt
Output:
Thomas Shelby 30 Rio 400
Giorgos Georgiou 35 London 300
Timmy Turner 32 Berlin N/A
The output shows only the information of people under 40.
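The header row is left out of that output even though the command never mentions it: its third field is the text age rather than a number, and the comparison happens not to match it. If you'd rather not rely on that, a sketch like the following skips the header explicitly with NR > 1 and joins the two conditions with &&. The file name ages.txt and the trimmed-down rows are only for illustration:

```shell
# Build a short sample file (ages.txt is a hypothetical name for illustration).
printf '%s\n' \
  'firstName lastName age city ID' \
  'Thomas Shelby 30 Rio 400' \
  'Omega Night 45 Ontario 600' \
  'Timmy Turner 32 Berlin N/A' > ages.txt

# NR > 1 skips the header line; $3 < 40 keeps people under 40.
awk 'NR > 1 && $3 < 40 {print $1, $3}' ages.txt
```

This prints Thomas 30 and Timmy 32.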
And there you have it! You now know the absolute basics to start working with awk and manipulating text data.
To learn more about Linux, freeCodeCamp has a wide variety of learning materials available.
Thanks for reading and happy learning!