The power of grep, sed, and awk

Grep

g/re/p: globally search for a regular expression and print matching lines

Usage

grep [OPTION]... PATTERNS [FILE]...

instructions from running grep --help

PATTERNS can contain multiple patterns separated by newlines. When FILE is ‘-‘, read standard input. With no FILE, read ‘.’ if recursive, ‘-‘ otherwise. With fewer than two FILEs, assume -h. Exit status is 0 if any line is selected, 1 otherwise; if any error occurs and -q is not given, the exit status is 2.

Grep supports basic regular expressions by default.

most used switches

--color={always|never|auto}
--exclude-dir=GLOB: skip directories that match GLOB
--exclude=GLOB: skip files that match GLOB
--include=GLOB: search only files that match GLOB (a file pattern)
-C, --context=NUM: print NUM lines of output context
-E, --extended-regexp: PATTERNS are extended regular expressions
-F, --fixed-strings: PATTERNS are strings
-L, --files-without-match: print only names of FILEs with no selected lines
-c, --count: print only a count of selected lines per FILE
-i, --ignore-case: ignore case distinctions in patterns and data
-l, --files-with-matches: print only names of FILEs with selected lines
-n, --line-number: print line number with output lines
-q, --quiet, --silent: suppress all normal output
-r, --recursive: search all files recursively in current directory
-v, --invert-match: select non-matching lines

a nice alias for grep

alias grep="grep --color=auto --exclude-dir={.bzr,CVS,.git,.hg,.svn,.idea,.tox}"

Alternatives

ripgrep

Sed

stream editor

Usage

sed [OPTION]... {script-only-if-no-other-script} [input-file]...

instructions from running sed --help

If no -e, –expression, -f, or –file option is given, then the first non-option argument is taken as the sed script to interpret. All remaining arguments are names of input files; if no input files are specified, then the standard input is read.

Sed supports basic regular expressions by default.

most used switches

--debug: annotate program execution
-E, -r, --regexp-extended: use extended regular expressions in the script (for portability use POSIX -E).
-e script, --expression=script: add the script to the commands to be executed
-i[SUFFIX], --in-place[=SUFFIX]: edit files in place (makes backup if SUFFIX supplied)
-n, --quiet, --silent: suppress automatic printing of pattern space

most used commands

`d`

DELETE pattern space

<address>d

seq 10 | sed 1,5d # 6 7 8 9 10
echo "hello\n\nworld" | sed '/^$/d' # hello world

`p`

PRINT pattern space to stdout. Usually used with -n

<address>p

seq 10 | sed -n 1~3p # 1 4 7 10

`q`

QUIT

<address>q

seq 10 | sed 3q # 1 2 3

`s`

SUBSTITUTE

<address>s/regexp/replacement/flags

command | sed -E 's/(apple)/\U\1/g'

/ may be replaced by any other single character. This is particularly useful if regexp itself contains /.

sed 's#/#_#g' <<< "path/to/file" # path_to_file

`{ Commands }`

<address>{ command1; command2; command3 }

seq 10 | sed '{3,7s/[[:digit:]]/x/; /x/d}' # 1 2 8 9 10

Special sequences

\E: stop case conversion
\L: lowercase all characters after it
\U: uppercase all characters after it
\l: lowercase next character
\u: uppercase next character

Flags

number: replace the number^th^ match
g: replace all matches (first match by default)
I: case-insensitive mode

Address

number: the number^th^ line
/regexp/: lines matching a pattern
/regexp/I: lines matching a pattern in case-insensitive mode
start,end: lines within a range
start~step: every step^th^ line from the start^th^ line
$: the last line

printf "%s\n" a b c | sed '1d' # b c
printf "%s\n" a B c | sed '/b/Id' # a c
printf "%s\n" a b c | sed '$d' # a b

Appending an ! after an address negates the selection

printf "%s\n" a b B c | sed '/b/I!d' # b B

Snippet

capitalize

function capitalize {
  sed -E "s/^(.)/\U\1/"
}

dequote

function dequote {
  sed -e "s/^'//" -e "s/'$//"
}

Awk

[Designers] Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan @ AT&T Bell Laboratories

Usage

awk [POSIX or GNU style options] [--] 'program' file ...

Awk supports most extended regular expressions (ERE) by default.

most used switches

-F fs, --field-separator=fs
-v var=val

Program structure

<pattern> { <action1>; <action2> }; <pattern> { <action1>; <action2> }

Patterns

BEGIN
END

BEGIN and END are two special kinds of patterns which are not tested against the input. The action parts of all BEGIN patterns are merged as if all the statements had been written in a single BEGIN rule. They are ex‐ ecuted before any of the input is read. Similarly, all the END rules are merged, and executed when all the input is exhausted (or when an exit statement is executed). BEGIN and END patterns cannot be combined with other patterns in pattern expressions. BEGIN and END patterns cannot have missing action parts.
<test_expression>

Tests whether certain fields match certain regular expressions.
/<regexp>/

See regular expressions section in man awk
&&, ||, !

AND, OR, NOT
/<regexp1>/, /<regexp2>/

Range
null (empty pattern)

Matches all records

Actions

Action statements are enclosed in braces, { and }. Action statements consist of the usual assignment, conditional, and looping statements found in most languages. The operators, control statements, and input/output statements available are patterned after those in C.

For a detailed list, see actions section in man awk.

most used control statement

if (<cond>) <stmt> else <stmt>

most used i/o statements

next

Stop processing current record
print <expr_list>

Print a comma or space separated expression list
printf <fmt> <expr_list>

Similar to that in bash

Both print and printf can be followed by redirection > and >>.

Builtins

Variable

FS: Field separator

OFS: Output field separator

Print username and default shell

awk 'BEGIN {FS=":"; OFS="-";} {print $1, $NF}' /etc/passwd

Same as

awk -F':' -v OFS='-' '{print $1, $NF}' /etc/passwd

NR: Record number

Print file with line number

awk '{print NR, $0}' <filename>

Count line number

awk 'END {print NR}' <filename>

NF: Field number

Print the last field of each record

awk '{print $NF}' <filename>

Function

gsub(re, sub [, str]): Replace every substring matching re with sub in str (default $0)
sub(re, sub [, str]): Same as gsub, but only replace the first match
length(str): Length of str (default $0)
substr(str, idx [, len]): Substring of str at index idx of length len, use the rest of str if len is not provided

Grep

Usage

Alternatives

Sed

Usage

`d`

`p`

`q`

`s`

`{ Commands }`

Special sequences

Flags

Address

Snippet

Awk

Usage

Program structure

Patterns

Actions

Builtins

Variable

Function

Acknowledgements

CATALOG

FEATURED TAGS

FRIENDS