How to use grep command in Linux (Along with Regular Expressions)

The grep Command

Like many UNIX command names, grep is a mnemonic. It is derived from the ex editor command g/re/p: globally (g) search for a regular expression (re) and print (p) the results. The grep utility searches text files for a specified pattern and prints all lines that contain that pattern. If no files are specified, grep reads text from standard input.

Consider the following scenario. A user comes to you and says that the msxyz program on her machine has locked up her machine. She cannot get the program to halt. It is your job to find the process and stop it. The ‘ps -ef’ command gives you a long list of running processes. The output is probably too long to find this user’s process. You want a single line or a specific process list.

$ ps -ef | grep 'msxyz'

The previous grep command takes the ‘ps -ef’ command output as its input. The grep command performs a search for the string msxyz and prints the results. This gives you specific lines to review. The grep command syntax is:

grep [options] pattern [file...]

You have multiple terminal and console windows open. To see those specific processes, execute a ps command and search for the dtterm command.

$ ps -e | grep 'dtterm'
   352 ??       0:00 dtterm
   353 ??       0:13 dtterm
   354 ??       0:11 dtterm
   1766 pts/5    0:00 dtterm
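
The two input modes described above (a file operand versus standard input) can be sketched as follows. This is a minimal illustration rather than part of the original session: the sample.txt file and its contents are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample data: the file name and its contents are made up.
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$tmpdir/sample.txt"

# With a file operand, grep opens and searches the file itself.
from_file=$(grep 'beta' "$tmpdir/sample.txt")

# With no file operand, grep searches whatever arrives on standard input.
from_stdin=$(printf 'alpha\nbeta\ngamma\n' | grep 'beta')

echo "file:  $from_file"
echo "stdin: $from_stdin"

rm -r "$tmpdir"
```

Both searches print the same line, because grep applies the pattern the same way whichever source supplies the text.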

The grep Options

The table below shows grep command options. These options modify grep behavior.

Option  Meaning
-i      Makes the search case insensitive
-c      Prints the count of lines that match
-l      Prints the names of the files in which the lines match
-v      Prints the lines that do not contain the search pattern
-n      Prints the line number before each matching line

For example, the -i option matches the pattern the regardless of case:

$ grep -i 'the' /etc/default/login
# Set the TZ environment variable of the shell.
# ULIMIT sets the file size limit for the login.  Units are disk blocks.
# The default of zero means no limit.
# ALTSHELL determines if the SHELL environment variable should be set
# PATH sets the initial shell PATH variable

Compare this to the output of the following command, which prints only the lines of text that match the pattern “The”.

$ grep 'The' /etc/default/login
# The default of zero means no limit.
# bad password is provided.  The range is limited from
# The SYSLOG_FAILED_LOGINS variable is used to determine how many failed

The -c option counts the number of lines that match the pattern. It then prints the count and not the actual lines that matched the pattern.

$ grep -ci 'the' /etc/default/login
19
$ grep -c 'The' /etc/default/login
3
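
To try the -i and -c options without depending on /etc/default/login, the following sketch builds a small stand-in file first. The login.sample file and its four lines are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical sample file standing in for a real configuration file.
tmpdir=$(mktemp -d)
cat > "$tmpdir/login.sample" <<'EOF'
# The default of zero means no limit.
# Set the TZ environment variable of the shell.
# PATH sets the initial shell PATH variable
TIMEOUT=300
EOF

# Case-sensitive count: only the line that contains "The".
exact=$(grep -c 'The' "$tmpdir/login.sample")

# Case-insensitive count: lines that contain "the" in any mix of case.
any_case=$(grep -ci 'the' "$tmpdir/login.sample")

echo "exact=$exact any_case=$any_case"
rm -r "$tmpdir"
```

Note that -c counts matching lines, not occurrences: a line that contains the pattern twice still adds one to the count.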

Use the -l option to:

  • Search for a string in many files.
  • Have the output list only the files in which the string is found.

The -l option is often useful when you want to feed the output of grep to another utility to process a list.

# grep -l 'grep' /etc/init.d/*
/etc/init.d/apache
/etc/init.d/cachefs.daemon
/etc/init.d/dhcp
/etc/init.d/dodatadm.udaplt
/etc/init.d/dtlogin
/etc/init.d/imq
/etc/init.d/init.wbem
/etc/init.d/ncakmod
/etc/init.d/swupboots
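
As a rough sketch of feeding the -l output to another utility, the example below passes the list of matching file names to wc to count them. The three script files are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Hypothetical scripts: names and contents are invented for illustration.
tmpdir=$(mktemp -d)
printf 'ps -ef | grep httpd\n'     > "$tmpdir/apache"
printf 'echo starting\n'           > "$tmpdir/cron"
printf 'grep -c nfs /etc/vfstab\n' > "$tmpdir/nfs"

# -l prints each matching file name once, however many lines match.
matching=$(grep -l 'grep' "$tmpdir"/*)

# Feed the list to another utility; here wc -l counts the names.
# tr removes the padding that some wc implementations add.
count=$(printf '%s\n' "$matching" | wc -l | tr -d ' ')

echo "$matching"
echo "files matching: $count"
rm -r "$tmpdir"
```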

To find a search pattern in a large file, print the line number before each match using the -n option. This is useful when you are editing files:

# grep -n 'user' /etc/passwd
18:user1:x:100:10::/export/home/user1:/bin/sh
19:user2:x:101:10::/export/home/user2:/bin/ksh

The -v option prints lines that do not contain the search pattern.

# grep -v 'root' /etc/group
staff::10:
sysadmin::14:
smmsp::25:
gdm::50:
webservd::80:
postgres::90:
nobody::60001:
noaccess::60002:
nogroup::65534:
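
The -n and -v options can also be combined. The following sketch uses a hypothetical four-line group.sample file (not the real /etc/group) to show each option alone and then both together.

```shell
#!/bin/sh
# Hypothetical group-style data, made up for this example.
tmpdir=$(mktemp -d)
cat > "$tmpdir/group.sample" <<'EOF'
root::0:
bin::2:root,daemon
staff::10:
nobody::60001:
EOF

# -n prefixes each matching line with its line number and a colon.
numbered=$(grep -n 'root' "$tmpdir/group.sample")

# -v inverts the match: only lines WITHOUT the pattern are printed.
inverted=$(grep -v 'root' "$tmpdir/group.sample")

# Combined: number the lines that do not contain the pattern.
both=$(grep -vn 'root' "$tmpdir/group.sample")

printf '%s\n---\n%s\n---\n%s\n' "$numbered" "$inverted" "$both"
rm -r "$tmpdir"
```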

Regular Expression Metacharacters

A regular expression (RE) is a character pattern that matches the same characters in a search. Regular expressions:

  • Allow you to specify patterns to search in text.
  • Provide a powerful way to search files for specific pattern occurrences.
  • Give additional meaning to patterns (as shown in the metacharacter table below).

When you use regular expression characters with the grep command, enclose the pattern in quotes. Some regular expression characters used by grep are also shell metacharacters, and the shell might expand one as a file name metacharacter before grep ever sees it. Use single quotes (') rather than double quotes, because single quotes hide more metacharacters from the shell.
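
The effect of quoting can be seen in a small experiment. This is a sketch under stated assumptions: the directory contains a hypothetical empty file named abc, which happens to match the unquoted pattern when the shell treats it as a file name pattern.

```shell
#!/bin/sh
# Hypothetical directory: an empty file named "abc" plus a data file.
tmpdir=$(mktemp -d)
: > "$tmpdir/abc"
printf 'ac\nabc\nabbc\n' > "$tmpdir/data"

# What grep would receive as its pattern in each case. Each command
# substitution runs in a subshell, so the cd does not leak out.
unquoted=$(cd "$tmpdir" && printf '%s' ab*c)    # shell expands it to a file name
quoted=$(cd "$tmpdir" && printf '%s' 'ab*c')    # passed through untouched

# The effect on the search itself.
hits_unquoted=$(cd "$tmpdir" && grep -c ab*c data)
hits_quoted=$(cd "$tmpdir" && grep -c 'ab*c' data)

echo "unquoted pattern: $unquoted ($hits_unquoted matching lines)"
echo "quoted pattern:   $quoted ($hits_quoted matching lines)"
rm -r "$tmpdir"
```

Without quotes the shell replaces ab*c with the file name abc, so grep searches for that literal string and misses the ac and abbc lines.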

The table below lists the grep command metacharacters:

Metacharacter  Function
\              Escapes the special meaning of an RE character
^              Matches the beginning of the line
$              Matches the end of the line
\<             Matches the beginning of a word (word anchor)
\>             Matches the end of a word (word anchor)
[]             Matches any one character from the specified set
[-]            Matches any one character in the specified range
*              Matches zero or more of the preceding character
.              Matches any single character
\{ \}          Specifies the minimum and maximum number of times the preceding character may repeat

Regular Expressions

Using a regular expression, you can search the current process table (and header) for any process that contains a capital letter. Do this by using the following range as the pattern to the grep command:

# ps -ef | grep '[A-Z]'
     UID   PID  PPID   C    STIME TTY         TIME CMD
	 root   647     1   0 06:14:45 ?           0:00 /usr/lib/dmi/snmpXdmid -s sls-s10
	 host
	 noaccess   797     1   0 06:15:03 ?           1:34 /usr/java/bin/java -server -Xmx128m
	 XX:+UseParallelGC -XX:ParallelGCThreads=4
	 root   708   704   4 06:14:50 ?           5:22 /usr/X11/bin/Xorg :0 -depth 24
	 nobanner -auth /var/dt/A:0-9Aaayb
	 root   813   739   0 06:15:16 ?           0:00 /bin/ksh /usr/dt/bin/Xsession
	 root   905   903   0 06:15:27 pts/2       0:00 -sh -c      unset DT;     DISPLAY=:0;
	 /usr/dt/bin/dtsession_res -merge
	 root  1045     1   1 06:15:51 ?           1:10 /usr/lib/mixer_applet2 --oaf
	 activate-iid=OAFIID:GNOME_MixerApplet_Factory --oa
	 root  1050     1   0 06:15:52 ?           0:01 /usr/lib/notification-area-applet -oaf-activate-iid=OAFIID:GNOME_NotificationA
	 root  1440  1284   0 08:20:35 pts/4       0:00 grep [A-Z]

If you are only interested in current processes that contain the capital letter A in the line, limit the pattern to specify that character.

# ps -ef | grep 'A'
root   708   704   7 06:14:50 ?           5:26 /usr/X11/bin/Xorg :0 -depth 24
nobanner -auth /var/dt/A:0-9Aaayb
root   905   903   0 06:15:27 pts/2       0:00 -sh -c      unset DT;     DISPLAY=:0;
 /usr/dt/bin/dtsession_res -merge
 root  1442  1284   0 08:21:17 pts/4       0:00 grep A

Escaping a Regular Expression

To escape a regular expression character, precede it with a \ (backslash). A backslash followed by a single character matches that character literally. Thus, a \$ matches a dollar sign and a \. matches a period. Doing this divests a metacharacter of its special meaning. The following example shows the $ as a regular expression character that matches the end of a line.

# grep '$' /etc/init.d/nfs.server
#!/sbin/sh

case "$1" in
'start')
       svcadm enable -t network/nfs/server
       ;;
'stop')
        svcadm disable -t network/nfs/server
		;;
*)        echo "Usage: $0 { start | stop }"
       exit 1
	   ;;
esac

The output contains all the lines from the script because the $ matches the end-of-line character for each line in the script. Verify this with the wc command.

$ grep '$' /etc/init.d/nfs.server | wc -l
     24
$ wc -l /etc/init.d/nfs.server
     24 /etc/init.d/nfs.server

To display only the lines from the nfs.server boot script that contain the literal character $, hide its special meaning by preceding the character with the \ regular expression character.

$ grep '\$' /etc/init.d/nfs.server
case "$1" in
        echo "Usage: $0 { start | stop }"
$ grep '\$' /etc/init.d/nfs.server | wc -l
 2
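
The same escaping rule applies to the period. The following sketch compares escaped and unescaped patterns on a hypothetical notes.txt file whose three lines were invented for this example.

```shell
#!/bin/sh
# Hypothetical sample file, made up for this example.
tmpdir=$(mktemp -d)
printf 'version 1.5\nbuild 125\ncost $5\n' > "$tmpdir/notes.txt"

# Unescaped, the period matches any character, so "125" also matches.
loose=$(grep -c '1.5' "$tmpdir/notes.txt")

# Escaped, the period matches only a literal period.
strict=$(grep '1\.5' "$tmpdir/notes.txt")

# Escaped, the dollar sign is a literal character, not the end-of-line anchor.
dollar=$(grep '\$' "$tmpdir/notes.txt")

echo "loose=$loose"
echo "strict: $strict"
echo "dollar: $dollar"
rm -r "$tmpdir"
```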

Line Anchors

An anchor is a symbol that matches a character position on a line. The ^ and $ anchors match text patterns relative to the beginning ^ or ending $ of a line of text. For example, the following command finds all lines that contain the pattern root in the /etc/group file:

$ grep 'root' /etc/group
root::0:
other::1:root
bin::2:root,daemon
sys::3:root,bin,adm
adm::4:root,daemon
uucp::5:root

If you intend to display only the one entry for the root group in the /etc/group file, then the pattern must specify that the line begins with the pattern (given the syntax of the file).

$ grep '^root' /etc/group
root::0:

The ^ regular expression character allows you to anchor the pattern match to the beginning of the line. Similarly, the $ regular expression character allows you to anchor the pattern match to the end of the line. Lines print only if the specified pattern represents the characters preceding the end-of-line character.

$ grep 'mount$' /etc/vfstab
#device      device       mount        FS      fsck    mount   mount

Word Anchors

A backslash used with an angle bracket is a word anchor. The less-than bracket (\<) marks the beginning of a word, and the greater-than bracket (\>) marks the end of a word. Text that follows \< is matched only when it occurs at the beginning of a word; text that precedes \> is matched only when it occurs at the end of a word. Words are delimited by spaces, tabs, beginnings of line, ends of line, and punctuation. For example, if you wanted to print the group file entry for the uucp group, issuing the grep command without regular expression characters gives you the uucp and nuucp group entries. Using the following command, however, should give only the single group entry for uucp.

$ grep '\<uucp'  /etc/group
uucp::5:root

Use both word anchors at the same time to ensure your pattern is a complete word by itself, rather than a sub-string of another word. Note the output if you search for the pattern user in the /etc/passwd file.

$ grep 'user' /etc/passwd
user:x:100:1::/home/user:/bin/sh
user2:x:101:1::/home/user2:/bin/sh
user3:x:102:1::/home/user3:/bin/sh

The preceding output includes lines with user as a sub-string of words, such as user2 and user3. If you were searching for the specific user named user, you should use both word anchors (or the -w option).

$ grep '\<user\>' /etc/passwd
user:x:100:1::/home/user:/bin/sh
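
The difference between the two word anchors, and their equivalence with -w when used together, can be sketched as follows. This assumes a grep that supports \<, \> and -w (GNU grep does); the passwd.sample file and the superuser account are hypothetical.

```shell
#!/bin/sh
# Hypothetical passwd-style data, invented for this example.
tmpdir=$(mktemp -d)
cat > "$tmpdir/passwd.sample" <<'EOF'
user:x:100:1::/home/user:/bin/sh
user2:x:101:1::/home/user2:/bin/sh
superuser:x:102:1::/home/superuser:/bin/sh
EOF

# \<user : "user" must start a word (matches user and user2).
begins=$(grep -c '\<user' "$tmpdir/passwd.sample")

# user\> : "user" must end a word (matches user and superuser).
ends=$(grep -c 'user\>' "$tmpdir/passwd.sample")

# Both anchors: "user" must be a complete word.
whole=$(grep -c '\<user\>' "$tmpdir/passwd.sample")

# -w asks for the same whole-word behavior without the anchors.
with_w=$(grep -cw 'user' "$tmpdir/passwd.sample")

echo "begins=$begins ends=$ends whole=$whole with_w=$with_w"
rm -r "$tmpdir"
```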

Character Classes

A string enclosed in square brackets specifies a character class. Any single character in the string is matched. For example, the grep ‘[abc]’ frisbee command displays every line that contains an a, b, or c in the frisbee file. The following command prints the lines from the /etc/group file that contain either the letter i or the letter u.

$ grep '[iu]' /etc/group
bin::2:root,daemon
sys::3:root,bin,adm
uucp::5:root
mail::6:root
nuucp::9:root
sysadmin::14:
nogroup::65534:

You might also specify a range of characters, which results in printing lines that contain at least one of the specified characters in the range.

$ grep '[u-y]' /etc/group
sys::3:root,bin,adm
uucp::5:root
tty::7:root,adm
nuucp::9:root
sysadmin::14:
webservd::80:
nobody::60001:
nogroup::65534:

The following examples show the contents of the teams file and how character classes can be used to find the word the or The in any line in the teams file:

$ cat teams
Team one consists of
Tom
Team two consists of
Fred
The teams are chosen randomly.
Tea for two and Dom
Tea for two and Tom

$ grep '\<[Tt]he\>' teams
The teams are chosen randomly.
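
Sets and ranges can be tried side by side with the sketch below. It uses a hypothetical variant of the teams file; the line beginning with a digit is an addition made for this example so that the [0-9] range has something to match.

```shell
#!/bin/sh
# Hypothetical variant of the teams file, adapted for this example.
tmpdir=$(mktemp -d)
cat > "$tmpdir/teams" <<'EOF'
Team one consists of
Tom
The teams are chosen randomly.
Tea for two and Dom
4 players per team
EOF

# [TD]om : one character from the set, then "om" (Tom or Dom).
set_match=$(grep -c '[TD]om' "$tmpdir/teams")

# ^[0-9] : a range of digits, anchored to the start of the line.
digit_start=$(grep -c '^[0-9]' "$tmpdir/teams")

# [Tt]he : "The" or "the".
the_lines=$(grep '[Tt]he' "$tmpdir/teams")

echo "set_match=$set_match digit_start=$digit_start"
echo "$the_lines"
rm -r "$tmpdir"
```

A character class is case sensitive like any other pattern, which is why [TD]om does not match the dom inside randomly.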

Character Match

Single Character Match

The . regular expression character matches any one character except the newline character. The following command looks for all lines containing c, followed by any three characters, followed by h.

$ grep 'c...h' /usr/dict/words

The following command looks for all lines that begin with c, followed by any three characters, followed by h. The ^ anchor means that no character can precede the c.

$ grep '^c...h' /usr/dict/words

The following command looks for all lines that consist of c, followed by any three characters, followed by h, with nothing before the c and nothing after the h; that is, five-letter words that begin with c and end with h.

$ grep '^c...h$' /usr/dict/words
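
Because /usr/dict/words is not present on every system, the three patterns can be compared on a small hypothetical word list instead. The words below were chosen for this sketch and are not from the dictionary file.

```shell
#!/bin/sh
# Hypothetical word list, one word per line.
tmpdir=$(mktemp -d)
printf 'cat\ncatch\ncouch\nscotch\ncatcher\nclash\ncloth\n' > "$tmpdir/words"

# c...h anywhere in the line: also matches scotch and catcher.
anywhere=$(grep -c 'c...h' "$tmpdir/words")

# ^c...h : the c must be the first character, so scotch drops out.
at_start=$(grep -c '^c...h' "$tmpdir/words")

# ^c...h$ : the h must also be the last character, so catcher drops out.
whole_line=$(grep -c '^c...h$' "$tmpdir/words")

echo "anywhere=$anywhere at_start=$at_start whole_line=$whole_line"
rm -r "$tmpdir"
```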

Character Match by Specifying a Range

The \{ and \} expressions allow you to specify the minimum and maximum number of times the preceding character may repeat. The following example shows the use of this expression.

$ cat test
root
rooot
roooot
rooooot
$ grep 'ro\{3\}t' test
rooot
$ grep 'ro\{2,4\}t' test
root
rooot
roooot
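
The interval expression also accepts an open-ended form, \{n,\}, meaning n or more repetitions. The sketch below recreates a test file like the one above, with a hypothetical extra line (rot), and tries all three forms.

```shell
#!/bin/sh
# Sample data modeled on the test file above; "rot" is an added line.
tmpdir=$(mktemp -d)
printf 'rot\nroot\nrooot\nroooot\nrooooot\n' > "$tmpdir/test"

# \{3\} : exactly three o characters between r and t.
exactly_3=$(grep 'ro\{3\}t' "$tmpdir/test")

# \{2,4\} : between two and four o characters.
two_to_four=$(grep -c 'ro\{2,4\}t' "$tmpdir/test")

# \{3,\} : three or more o characters.
at_least_3=$(grep -c 'ro\{3,\}t' "$tmpdir/test")

echo "exactly_3=$exactly_3 two_to_four=$two_to_four at_least_3=$at_least_3"
rm -r "$tmpdir"
```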

Closure (*)

The *, when used in a regular expression, is termed a closure. The closure symbol matches the preceding symbol or character zero or more times.

$ grep 'Team*' teams
Team one consists of
Team two consists of
Tea for two and Dom
Tea for two and Tom

For example, to find all lines that contain a T at the beginning of a word and, anywhere after it, an m at the end of a word, combine the closure with the . metacharacter and the word anchors:

$ grep '\<T.*m\>' teams
Team one consists of
Tom
Team two consists of
Tea for two and Dom
Tea for two and Tom
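
The following sketch separates three uses of the asterisk: a closure on a single character, the .* combination, and an escaped literal asterisk. The names file and its contents are hypothetical.

```shell
#!/bin/sh
# Hypothetical data, invented for this example.
tmpdir=$(mktemp -d)
printf 'Tea\nTeam\nTeammate\nTerm\na*b\n' > "$tmpdir/names"

# Team* : "Tea" followed by zero or more m characters, so plain Tea matches.
star=$(grep -c 'Team*' "$tmpdir/names")

# ^T.*m$ : T, then any run of characters, then m at the end of the line.
dotstar=$(grep -c '^T.*m$' "$tmpdir/names")

# \* : an escaped asterisk matches a literal asterisk.
literal=$(grep -c '\*' "$tmpdir/names")

echo "star=$star dotstar=$dotstar literal=$literal"
rm -r "$tmpdir"
```

The closure applies only to the single character before it, which is why Team* does not require the m to be present at all.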

An asterisk (*) has special meaning only when it follows another character. If it is the first character in a regular expression or if it is by itself, it has no special meaning. The following example searches for lines containing a literal asterisk within the file called teams.

$ grep '*' teams

The asterisk has another meaning outside of the regular expression. The following command searches all files in the current directory for the string abc.

$ grep 'abc' *
data1:abcd

In the above example, the * is a shell metacharacter rather than a grep metacharacter.

The egrep Command

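The egrep command (or the extended grep command) searches a file for a pattern using full regular expressions. Full regular expressions add metacharacters such as | (alternation), which plain grep treats as an ordinary character. In the following example, grep finds no line that contains the literal string two | team, whereas egrep prints every line that contains either of the two alternatives. The spaces are part of each alternative, so the alternatives are "two " and " team". For example: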

# grep "two | team" teams
# egrep "two | team" teams
Team two consists of
The teams are chosen randomly.
Tea for two and Dom
Tea for two and Tom
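
On current systems the same extended syntax is available as grep -E, the form defined by POSIX. The sketch below contrasts basic and extended matching on a hypothetical three-line teams file; the alternatives here are written without the surrounding spaces.

```shell
#!/bin/sh
# Hypothetical three-line teams file for this example.
tmpdir=$(mktemp -d)
cat > "$tmpdir/teams" <<'EOF'
Team two consists of
Fred
The teams are chosen randomly.
EOF

# Basic syntax: | is an ordinary character, so grep looks for the
# literal text "two|team" and finds no matching line.
basic=$(grep -c 'two|team' "$tmpdir/teams")

# Extended syntax: | separates alternatives, "two" or "team".
extended=$(grep -cE 'two|team' "$tmpdir/teams")

echo "basic=$basic extended=$extended"
rm -r "$tmpdir"
```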