Basics of grep, sed and awk

grep, sed and awk are powerful Linux tools that every developer should know. They provide flexible and efficient mechanisms for processing text. Their benefit shows most when working with large files, meaning not 30,000 lines but millions of lines, which most interactive text editors cannot handle efficiently. I will go through each tool pragmatically to get you started.

First of all, the number of lines in a file can be determined as follows:

wc -l filename
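When the count is needed inside a script, the filename that wc prints next to the number gets in the way. A minimal sketch, assuming a throwaway file created with mktemp (the file and its contents are made up for illustration):

```shell
# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf 'first\nsecond\nthird\n' > "$sample"

# Reading from STDIN makes wc print only the number, without the filename.
# tr strips the padding that some wc implementations add.
line_count=$(wc -l < "$sample" | tr -d ' ')
echo "$line_count"

rm -f "$sample"
```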

grep

grep stands for "global regular expression print", after the ed command g/re/p. It is a text search utility used from the Linux command line to search a file or STDIN for a given regular expression and print the matching lines to STDOUT. The basic syntax looks as follows:

grep [options] regex [filename]

Here are some examples of using grep:


# Show grep version
grep --version
# Print lines containing sys
grep sys /etc/passwd
# Print lines containing SYS, ignoring case
grep -i SYS /etc/passwd
# Count lines containing sys
grep -c sys /etc/passwd
# Print lines containing sys among the last 10 lines of the file
tail -n 10 /etc/passwd | grep sys
# Print lines matching a regex (put the regex in single quotes)
grep '^[a-z]' /etc/passwd
# Print non-empty lines
grep -v '^$' /etc/passwd
# Print each match plus the 2 lines after it
# Use -B for lines before the match, -C for lines before and after
grep -A2 proxy /etc/passwd
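To try these options without touching the real /etc/passwd, here is a minimal sketch that runs them against a small passwd-style file. The file contents and the variable names are assumptions made up for illustration:

```shell
# Hypothetical passwd-style sample data; not a real /etc/passwd.
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
sys:x:3:3:sys:/dev:/usr/sbin/nologin
proxy:x:13:13:proxy:/bin:/usr/sbin/nologin
backup:x:34:34:backup:/var/backups:/usr/sbin/nologin
SYSLOG:x:104:110::/var/log:/usr/sbin/nologin
EOF

# Count lines containing "sys" (case sensitive)
sys_count=$(grep -c sys "$sample")
# Count lines containing "sys" regardless of case
sys_count_ci=$(grep -ci sys "$sample")
# The matching line plus the 2 lines after it
proxy_context=$(grep -A2 proxy "$sample")

echo "case sensitive: $sys_count, case insensitive: $sys_count_ci"
echo "$proxy_context"

rm -f "$sample"
```

The case-insensitive count is higher because it also picks up the SYSLOG line.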


Working with grep requires basic knowledge of regular expressions, so here is a short summary:


# Anchors
^ Start of line
$ End of line
# Ranges
[A-Za-z] any letter
[0-9] any digit
[349] matches 3, 4 or 9
[^5] any character except 5 (negation)
# Character classes and boundaries (GNU grep extensions)
\s whitespace
\S non-whitespace
\b word boundary
\B non-word boundary
# Quantifiers (use grep -E for ?, + and {n})
* zero or more times
? zero or one time
+ one or more times
{n} exactly n times
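The anchors, ranges and quantifiers from the summary can be tried out with grep -E. This is only a sketch on made-up input lines; the variable names are assumptions chosen for illustration:

```shell
# Made-up input lines, one of them empty.
sample=$(mktemp)
printf '%s\n' 'abc' '349' 'a5' '' 'foo bar' 'aaa' > "$sample"

# ^ and $ anchor the pattern to the whole line:
# lines made only of the digits 3, 4 or 9
only_349=$(grep -E '^[349]+$' "$sample")
# {n} quantifier: lines of exactly three lowercase letters
three_letters=$(grep -cE '^[a-z]{3}$' "$sample")
# Negated range with *: lines without any 5 (the empty line also qualifies)
no_five=$(grep -cE '^[^5]*$' "$sample")

echo "$only_349 $three_letters $no_five"

rm -f "$sample"
```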


sed

sed (stream editor) is a command-line text editor. It is one of the "veterans" of the Linux world and is included in virtually every Linux installation. It performs common text editing tasks such as printing, substituting, inserting, deleting and appending lines. The basic syntax looks as follows:

sed [options] sed-script [filename]

Here are some examples of using sed:


# Print every line (each line appears twice: once from p and once from the default output)
sed 'p' /etc/passwd
# Print every line once; -n suppresses the default output
sed -n 'p' /etc/passwd
# Print lines 1 to 5
sed -n '1,5 p' /etc/passwd
# Print lines matching a regex
sed -n '/^root/ p' /etc/passwd
# Replace the first bin on each line with binary
# An optional address range can be specified in front
sed 's/bin/binary/' /etc/passwd
# Replace every bin on each line with binary (g flag)
sed 's/bin/binary/g' /etc/passwd
# Replace /bin/bash with /bin/sh
# As the search and replacement strings contain /, another delimiter can be chosen, here @
sed 's@/bin/bash@/bin/sh@' /etc/passwd
# Substitute and print only the changed lines
sed -n 's/bin/binary/p' /etc/passwd
# Write changes back to the file and keep the original as a .bak backup (-i option)
sed -i.bak 's/bin/binary/' /etc/passwd
# Insert line before line starting with 'root'
sed '/^root/ i line to be inserted' /etc/passwd
# Insert line after line starting with 'root'
sed '/^root/ a line to be appended' /etc/passwd
# Delete line starting with 'root'
sed '/^root/ d' /etc/passwd
# Multiple sed expressions in the command line
sed '{
/^root/ i line to be inserted
/^root/ a line to be appended
/^root/ d
}' /etc/passwd
# Using a sed script file
sed -f myscript.sed /etc/passwd
# Uppercase 1st column, lowercase 2nd column in a comma-separated file using capture groups
# The group \([^,]*\) captures a run of characters other than a comma
# \U and \L are GNU sed extensions
sed 's/\([^,]*\),\([^,]*\)/\U\1,\L\2/' file.csv
# Substitute and then execute the result as a command (e flag, GNU sed)
sed 's/^/sudo useradd /e' user.list
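The difference between the substitution variants is easiest to see on a small file. A minimal sketch, assuming a made-up passwd-style sample (file contents and variable names are illustrative only):

```shell
# Hypothetical passwd-style sample data; not a real /etc/passwd.
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
EOF

# Only the first match on line 3 is replaced
first_only=$(sed -n '3 s/bin/binary/p' "$sample")
# With the g flag every match on line 3 is replaced
all_matches=$(sed -n '3 s/bin/binary/gp' "$sample")
# An alternative delimiter avoids escaping the slashes
new_shell=$(sed -n '1 s@/bin/bash@/bin/sh@p' "$sample")
# Delete lines starting with root and count what remains
remaining=$(sed '/^root/ d' "$sample" | wc -l | tr -d ' ')

printf '%s\n' "$first_only" "$all_matches" "$new_shell" "$remaining"

rm -f "$sample"
```

Note that the g flag also rewrites the bin inside sbin, which is usually a sign that the pattern should be more specific.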


awk

awk is a scripting language for editing and analyzing text. Input is processed record by record, where a record is a line by default. The name awk comes from the initials of its developers: Aho, Weinberger and Kernighan. The basic syntax looks as follows:

awk [options] awk-script filename

awk comes with a number of built-in variables:

  • FS: field separator
  • OFS: output field separator
  • RS: record separator
  • ORS: output record separator
  • NR: number of records read so far (across all input files)
  • NF: number of fields in the current record
  • FILENAME: name of the file being read
  • FNR: number of records read so far in the current file
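A short sketch of how some of these variables behave, again on a made-up passwd-style file (contents and variable names are assumptions for illustration):

```shell
# Hypothetical passwd-style sample data; not a real /etc/passwd.
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
EOF

# FS splits the input on ":", OFS joins the printed fields with "-".
# NR is the current record number, NF the number of fields in that record.
summary=$(awk 'BEGIN { FS=":"; OFS="-" } { print NR, NF, $1 }' "$sample")
# In the END block, NR holds the total number of records read.
total=$(awk 'END { print NR }' "$sample")

echo "$summary"
echo "$total"

rm -f "$sample"
```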

Here are some examples of using awk:


# Execute an awk script file
awk -f users.awk /etc/passwd
# Script file: print a specific column and show the total number of lines processed
BEGIN { FS=":" ; print "Username"}
{print $1}
END {print "Total users= " NR}
# Script file: print the column only where a criterion is met
BEGIN { FS=":" ; print "Username"}
$3>499 {print $1}
# Script file: print and count lines beginning with 'root'
BEGIN { FS=":" ; print "Username"}
/^root/ {print $1 ; count++}
END {print "Total users= " count}
# Uppercase 1st column, lowercase 2nd column in a comma-separated file
# Compare with the sed command above; this is much easier
awk -F"," '{print toupper($1), tolower($2), $3}' file.csv
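# Script file xml.awk: extract XML records that are separated by blank lines
# (a multi-character RS is not portable to every awk; RS="" is the portable paragraph mode)
BEGIN { RS="\n\n"}
$0 ~ search {print}
# Run it, passing the search term as a variable
awk -f xml.awk search=example xmlfile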
# Count occurrences of a specific element
BEGIN { FS=" "; print "Log access"}
{ip[$1]++} # the value of $1 is the key (associative array)
END { for (i in ip)
print i, " has accessed ", ip[i], "times."
}
# Print the most frequent element
BEGIN { FS=" "; print "Most popular browser"}
{browser[$1]++}
END { for (b in browser)
if (max < browser[b]) {
max = browser[b];
maxbrowser = b;
}
print "Most access was from ", maxbrowser, " and ", max, " times."
}
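The associative-array pattern from the last two scripts can also be run as a one-off command. A minimal sketch, assuming a made-up access log whose first field is the client address (log lines and variable names are hypothetical):

```shell
# Hypothetical access log; the first field is the client address.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 GET /index.html
10.0.0.2 GET /about.html
10.0.0.1 GET /contact.html
10.0.0.1 POST /login
EOF

# hits[] is keyed by the first field; the END block finds the largest count.
busiest=$(awk '
{ hits[$1]++ }
END {
    for (ip in hits)
        if (hits[ip] > max) { max = hits[ip]; top = ip }
    print top, max
}' "$log")

echo "$busiest"

rm -f "$log"
```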


Recap Linux

Permissions

I work with Linux only irregularly. When I do, however, I often have to change the permissions of a file, and almost every time I cannot remember the basic concepts and commands. I have therefore decided to write them down here very briefly. Maybe it will even help someone else.

Show permissions of a file or folder:

ls -ld filename

What does the output mean? Its columns are, from left to right:

Mode fields, Hard links, Owner, Group, File size, Date & Time, Filename

The first character of the mode fields is the "special file" designator, which marks the type of the file. Regular files display as - (none). It is followed by three triples of the form rwx. The first triple determines the permissions for the user (owner), the second for the group and the third for others. r ⇒ read access, w ⇒ write access, x ⇒ execute access.

Give all permissions to everyone:

chmod 777 filename

chmod means change mode. Each digit is the sum of r = 4, w = 2 and x = 1, so 7 is 4+2+1, that is 111 in binary, and grants rwx.
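To see how the digits map to the rwx triples, here is a small sketch on a temporary file. The modes 640 and 750 are arbitrary examples chosen for illustration; cut keeps only the first ten characters of the ls output, which are the mode fields:

```shell
# Temporary file used only for this demonstration.
f=$(mktemp)

# 6 = 4+2 (rw-), 4 = r--, 0 = ---
chmod 640 "$f"
mode_640=$(ls -l "$f" | cut -c1-10)

# 7 = 4+2+1 (rwx) for user, group and others
chmod 777 "$f"
mode_777=$(ls -l "$f" | cut -c1-10)

# Symbolic notation, equivalent to 750
chmod u=rwx,g=rx,o= "$f"
mode_750=$(ls -l "$f" | cut -c1-10)

printf '%s\n' "$mode_640" "$mode_777" "$mode_750"

rm -f "$f"
```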

Installing and updating software

http://superuser.com/questions/125933/what-is-the-difference-between-yum-apt-get-rpm-configure-make-install