Basics of grep, sed and awk

Grep, sed and awk are really powerful Linux tools that are worth to be known to any developer. Basically, these tools provide very flexible and strong text processing mechanisms. Most benefit of these tools is gained when working with large files. This does not mean 30’000 lines but millions of lines. For the most part, interactive text editors can not handle these large files very efficiently. I will dive into each of them very pragmatically to get you started.

First of all, defining the number of lines in a file can be performed as follows:

wc -l filename

grep

Grep means Global Regular Expression and Print and basically allows you to search output or file. It is a text search utility used from Linux command line to globally search a file or STFIN for a given regular expression. It will print matching lines to STDOUT. The basic syntax looks as follows:

grep [options] regex [filename]

Here are some examples of using grep:


# Show grep version
grep –version
# Print lines containing sys
grep sys /etc/passwd
# Print lines containing SYS case insensitive
grep SYS -i /etc/passwd
# Count lines containing sys
grep -c sys /etc/passwd
# Print last 10 lines containing sys
tail -10 /etc/passwd | grep sys
# Print lines fullfilling regex (put regex in single quotes)
grep '^[a-z]' /etc/passwd
# Print non empty lines
grep -v '^$' /etc/passwd
# Print 2 lines after the match (including the match)
# use B instead for before, C for before and after
grep proxy -A2 /etc/passwd

view raw

grep

hosted with ❤ by GitHub

Working with grep requires basic knowledge of regular expressions. Therefore, a short summary of regex is given in the following:


# Anchors
^ Start of string
$ End of string
# Ranges
[A-Za-z] any letter
[0-9] any digit
[349] matches 3, 4 or 9
[^5] any character except 5 (negation)
# Boundaries
\s whitespace
\S non-whitespace
\b word boundary
\B non-word boundary
# Quantifiers
* zero or more times
? zero or one time
+ one or more times
{n} exactly n times

view raw

regex

hosted with ❤ by GitHub

sed

sed (stream editor) is a command-line based text editor. It is one of the „veterans“ in the Linux world and is virtually in every Linux installation included. It allows to perform common text editing tasks like printing, substituting, inserting, deleting, appending lines etc. The basic syntax looks as follows:

sed [options] sed-script [filename]

Here are some examples of using sed:


# Print (note that print lines are duplicated as the match line and the standard output is printed)
sed 'p' /etc/passwd
# Print lines but standard output is suppressed
sed -n 'p' /etc/passwd
# Print lines 1 to 5
sed -n '1,5 p' /etc/passwd
# Print lines fullfilling regex
sed -n '/^root/ p' /etc/passwd
# Substitute bin through binary
# An optional range can be specified in front
sed 's/bin/binary/' /etc/passwd
# Substitute bin through binary but also replace multiple matches in one line
sed 's/bin/binary/g' /etc/passwd
# Substitute bin/bash through bin/sh
# As the search and replacement string contain / another delimiter can be chossen, here @
sed 's@/bin/bash@/bin/sh@' /etc/passwd
# Substitute and print only changes
sed -n 's/bin/binary/p' /etc/passwd
# Write changes to file and backup original file with i-option
sed -i.bak 's/bin/binary/' /etc/passwd
# Insert line before line starting with 'root'
sed '/^root/ i line to be inserted' /etc/passwd
# Insert line after line starting with 'root'
sed '/^root/ a line to be appended' /etc/passwd
# Delete line starting with 'root'
sed '/^root/ d' /etc/passwd
# Multiple sed expressions in the command line
sed '{
/^root/ i line to be inserted
/^root/ a line to be appended
/^root/ d
}' /etc/passwd
# Using a sed script file
sed -f myscript.sed /etc/passwd
# Uppercase 1st column, lowercase 2nd column in comma-separted file with substituting grouping
# Substitution group \([^,]*\) means everything else than a comma
sed 's/\([^,]*\),\([^,]*\)/\U\1,\L\2/ file.csv
# Substitute and subsequently execute
sed 's/^/sudo useradd/e user.list

view raw

sed

hosted with ❤ by GitHub

awk

awk is a scripting language for editing and analyzing texts. Input data is always processed line by line. The name awk comes from the initials of the developer.

awk [options] awk-script filename

There are a number of variables coming with awk:

  • FS: field separator
  • OFS: output field separator
  • RS: record separator
  • ORS: output record separator
  • NR: number of records in file
  • NF: number of fields in record
  • FILENAME: name of file being read
  • FNR: number of records relative to current file

Here are some examples of using awk:


# execute awk file
awk -f users.awk /etc/passwd
# print specific column and show total lines processed
BEGIN { FS=":" ; print "Username"}
{print $1}
END {print "Total users= " NR}
# print column where it meets critera
BEGIN { FS=":" ; print "Username"}
$3>499 {print $1}
# count lines beginning with 'root' and print total users
BEGIN { FS=":" ; print "Username"}
/^root/ {print $1 ; count++}
END {print "Total users= " count}
# Uppercase 1st column, lowercase 2nd column in comma-separted file with substituting grouping
# compare to same sed command above. this is much easier
awk -F"," {print toupper($1), tolower($2), $3} file.csv
# extract xml records which are separated by two new lines
BEGIN { RS="\n\n"}
$0 ~ search {print}
awk -f xml.awk search=example xmlfile
# Count number of specific element
BEGIN { FS=" "; print "Log access"}
{ip[$1]++} // value $1 is the key (associative array)
END { for (i in ip)
print i, " has accesed ", ip[i], "times."
}
# print max number of specific element
BEGIN { FS=" "; print "Most popular browser"}
{browser[$1]++}
END { for (b in browser)
if (max < browser[b]) {
max = browser[b];
maxbrwoser = b;
}
print "Most access was from ", maxbrowser, " and ", max, " times."
}

view raw

awk

hosted with ❤ by GitHub

Logging Aspect with Spring AOP

Logging is a cross-cutting concern and therefore AOP comes on. With Spring Boot it is quite easy to make AOP work and configure different aspects. Here, I will concentrate on a logger  aspect that logs the entry and exit of a method call. Additionally, the execution time is logged as well.

First, we need to clarify some terminology that comes with AOP:

  • aspect: Class containing cross-cutting concern
  • adivice: Job of an aspect
  • join point: possible point in the application where an aspect can be applied
  • pointcut: defines where an aspect is applied
  • weaving: process of applying aspects to a target object

Spring AOP aspects are woven in during runtime, i.e. during the execution of the application. AspectJ allows to weave in aspects during compile time.

Runtime weaving with Spring AOP is based on proxy usage and thus has several limitations like that private or protected methods, constructors and attributes can not be used as pointcuts. However, I think in most cases this is not a big deal. If these things should be used anyway, consider using AspectJ’s compile time weaving.

In the following,  I will quickly demonstrate how to integrate AOP with Spring and implement the a logging aspect. First of all, the aspectjweaver dependency has to be included in the pom.xml as this will provide some annotations used for defining an aspect. Its version is managed by Spring Boot.


<dependency>
<groupId>org.aspectj</groupId>
<artifactId>aspectjweaver</artifactId>
</dependency>

view raw

pom.xml

hosted with ❤ by GitHub

There are two pointcuts defined in the logger aspect, the first for  @RequestMapping and the second for @Profiled annotations. The aspect is called for all methods annotated with one of these annotations. For the case there are multiple aspects defined and triggered simultaneously, the aspect class is annotated with @Order and set to lowest precedence where with the execution order of the aspect is controlled.


@Aspect
@Order(Ordered.LOWEST_PRECEDENCE)
@Component
public class LoggerAspect {
private Logger logger;
public LoggerAspect() {
logger = LoggerFactory.getLogger(getClass());
}
@Pointcut("@annotation(org.springframework.web.bind.annotation.RequestMapping)")
public void requestMapping() {
}
@Pointcut("@annotation(com.xovis.standalone_multisensor.aspects.Profiled)")
public void profiled() {
}
@Around("requestMapping() || profiled()")
public Object profile(ProceedingJoinPoint pjp) throws Throwable {
StopWatch sw = new StopWatch();
String className = pjp.getTarget().getClass().getSimpleName();
String methodName = pjp.getSignature().getName();
String name = className + "." + methodName;
logger.debug("call " + name);
try {
sw.start();
return pjp.proceed();
} finally {
sw.stop();
logger.debug("exit " + name + " [" + sw.getTotalTimeMillis() + "ms]");
}
}
}

@Profiled is a custom annotation and is implemented as follows:


@Retention(RetentionPolicy.RUNTIME)
public @interface Profiled {
}

view raw

Profiled.java

hosted with ❤ by GitHub

The output of an intercepted method will look like this
c.x.s.aspects.LoggerAspect: call StatusController.getStatus
c.x.s.aspects.LoggerAspect: exit StatusController.getStatus [15ms]