Project 23. Search File Content

"How do I find all files containing the text Dear Janet before my wife does?"

This project shows how to search a file, or many files, for particular text. The search term can be straight text or a regular expression. The project covers the commands grep, wc, awk, and sed.
Use grep

This chapter puts the spotlight on grep and friends. The grep command searches files for text that matches a search pattern. Each file is searched line by line, and a match occurs when a line contains the search pattern. Note that a line need only contain the pattern to match; it does not have to be identical to it.

Let's search all text files (*.txt) in the current directory for the words Dear Janet.

    $ grep "Dear Janet" *.txt
    hello.txt:Dear Janet,
    lets-meet.txt:Dear Janet, sauciest of vixens
    secret-liaison.txt:Dear Janet,

grep displays every matching line from every file, each preceded by the name of the file it came from.

grep Options

The grep command has many options, the most useful of which are explained below.

To change the output format from filename:text of matching line, specify the following:
To change the pattern-matching rules, specify the following:
Use recursive mode:
    $ grep -r "Janet" *
    archive/old-letter.txt:Dear Janet,
    hello.txt:Dear Janet,
    lets-meet.txt:Dear Janet, sauciest of vixens
    secret-liaison.txt:Dear Janet,
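The difference between a plain search and a recursive one can be tried safely in a scratch directory. The following is only a sketch: the sandbox and every file name in it are made up for illustration, modelled on the letters used in this project.

```shell
# Throwaway sandbox; every file name below is hypothetical.
sandbox=$(mktemp -d)
mkdir "$sandbox/archive"
printf 'Dear Janet,\nSee you soon.\n' > "$sandbox/hello.txt"
printf 'Dear Bob,\n'                  > "$sandbox/boring.txt"
printf 'Dear Janet,\n'                > "$sandbox/archive/old-letter.txt"

# Non-recursive: the shell expands *.txt, so only top-level files are searched.
flat=$(cd "$sandbox" && grep "Janet" *.txt)

# Recursive: * also expands to the directory archive, which grep -r descends into.
# sort is used only to make the order of the results predictable.
deep=$(cd "$sandbox" && grep -r "Janet" * | sort)

printf 'flat:\n%s\n\ndeep:\n%s\n' "$flat" "$deep"

# Clean up the sandbox.
rm -r "$sandbox"
```

Only the recursive search should report the letter stored in archive.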
The next example of recursion doesn't work as expected. We intended to say, "Search the current directory recursively for all *.txt files." What actually happens is that the shell expands *.txt to the list of matching filenames (which does not include the directory archive); grep then searches each name in that list, recursing only into those that are directories. We can't specify to grep both a directory to search recursively and, at the same time, which files to consider.

    $ grep -r "Janet" *.txt
    hello.txt:Dear Janet,
    lets-meet.txt:Dear Janet, sauciest of vixens
    secret-liaison.txt:Dear Janet,

The solution is to use find and xargs.

    $ find . -iname "*.txt" -print0 | xargs -0 grep "Janet"
    ./archive/old-letter.txt:Dear Janet,
    ./hello.txt:Dear Janet,
    ./lets-meet.txt:Dear Janet, sauciest of vixens
    ./secret-liaison.txt:Dear Janet,

Some grep Examples

Mac OS X has a handy dictionary (a list of words, but bereft of definitions) located at /usr/share/dict/web2. Let's use grep to count how many words contain the sequence xy. We use option -c to count the matching lines instead of displaying them.

    $ grep -c "xy" /usr/share/dict/web2
    579

How many words start with xy? This requires a regular expression that says "a line that starts with xy".

    $ grep -c "^xy" /usr/share/dict/web2
    75

Name two of them! (Xylophone is the easy one.)

The grep command is often combined with the ps command to look for specific processes. In the next example, grep filters the output from ps to display only those lines containing safari. (The ps command does not require its options to be preceded by a dash.)
    $ ps axww | grep -i safari
    27946 ?? S 31:08.79 /Applications/Safari.app/Contents/MacOS/Safari -psn_0_1739980801
    16705 std R+ 0:00.00 grep -i safari
If you want to use the results of this command to extract the process ID of Safari, for example, the second line of output (the grep process itself) is unwelcome. It can be eliminated in either of two ways.

Use grep -v.

    $ ps axww | grep -i safari | grep -v grep
    27946 ?? S 31:09.33 /Applications/Safari.app/Contents/MacOS/Safari -psn_0_1739980801
Employ some clever regular-expression trickery.

    $ ps axww | grep -i "safar[i]"
    27946 ?? S 31:09.50 /Applications/Safari.app/Contents/MacOS/Safari -psn_0_1739980801

How does this safar[i] trick work? The pattern is a regular expression equivalent to "safari", so (with -i) it still matches "Safari". The grep command line itself, however, no longer matches, because it contains the text "safar[i]" and not "safari". Think about it.

Escape and Double Escape

Remember to enclose a regular expression in single quotes to avoid interpretation by the shell. The regular-expression sequence .* matches any string of characters, for example, but it must be escaped from the shell to stop the shell from treating the star as a globbing character and potentially expanding it. To match "line" and then any character sequence and then "1", we would type:

    $ grep 'line.*1' *.txt

If we wish to search for the star character itself, the star must also be escaped from regular-expression interpretation. To search for "line *1", we would type:

    $ grep 'line \*1' *.txt

The escape character ensures that the star is matched literally rather than being interpreted as a regular-expression operator. Refer to Project 77 if you are unfamiliar with regular expressions.

The next line is equivalent. Without quotes, the space, the backslash, and the star must each be escaped from the shell individually.

    $ grep line\ \\\*1 *.txt

Remember fgrep? It searches for fixed patterns and does not activate regular expressions, so we can type simply

    $ fgrep 'line *1' *.txt

Zipped Files

Use a grep-based command to examine the contents of a zip- or bzip2-compressed file directly by using these commands:
These bz variants correspond to the versions of grep discussed in the "grep Options" section above.

Count Words

The wc command counts the number of characters, words, and lines in a text file. It's often used to count the number of results returned by a command or pipeline. We can repeat the dictionary example from earlier by using wc.

    $ grep "xy" /usr/share/dict/web2 | wc -l
    579
    $ grep "^xy" /usr/share/dict/web2 | wc -l
    75

Option -l says to count lines only; options -c and -w count characters (bytes) and words, respectively.
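The two counting styles agree for a single file but differ once several files are involved: grep -c reports one count per file, whereas piping into wc -l gives a single total. The sketch below illustrates this with two tiny made-up word lists standing in for the dictionary; the file contents and variable names are assumptions chosen for the example.

```shell
# Two small, hypothetical word lists in temporary files.
words=$(mktemp)
more=$(mktemp)
printf 'xylophone\noxygen\nxyst\nzebra\n' > "$words"
printf 'proxy\nquartz\n'                  > "$more"

# Single file: both approaches count matching lines.
contains=$(grep -c "xy" "$words")
starts=$(grep "^xy" "$words" | wc -l | tr -d ' ')   # tr strips the padding some wc versions add

# Several files: grep -c prints one "file:count" line per file...
per_file=$(grep -c "xy" "$words" "$more")
# ...while wc -l counts every matching line in the combined output.
total=$(grep "xy" "$words" "$more" | wc -l | tr -d ' ')

echo "contain xy: $contains, start with xy: $starts, total over both files: $total"
echo "$per_file"

rm "$words" "$more"
```

If a single grand total is what you need across many files, the wc -l form is usually the more convenient of the two.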
Use awk to Isolate and Format Text

The awk command (named after its authors, Aho, Weinberger, and Kernighan) is a powerful pattern-processing language. It's explored in detail in Projects 60 and 62, but one (very simple) way it can be used is to isolate a particular portion of each line of text it receives as input. More specifically, this use of awk involves printing a selected field from the input text (field in this instance meaning a sequence of characters separated by white space).

We can use awk to isolate Safari's process ID (PID) from the results of our earlier grep/ps search, for example. This example extends the earlier command with a pipeline to awk. An awk script, enclosed in single quotes, tells awk to print the value of the first field (field #1) of each input line. Because the first text string in a line of ps output is always a PID, this yields the PID of process Safari.

    $ ps axww | grep -i "safar[i]" | awk '{print $1}'
    27946

The number 27946 is the PID of Safari, and this number can be given as an argument to the kill command to abort the running process. We'll enclose the pipeline sequence in $(), which tells Bash to execute it, write the result back to the command line, and then execute the new command line. Before we do any actual killing, use echo to demonstrate that the expression enclosed by $() still outputs Safari's PID.

    $ echo $(ps axww | grep -i "safar[i]" | awk '{print $1}')
    27946
Now run kill.

    $ kill $(ps axww | grep -i "safar[i]" | awk '{print $1}')

For completeness, let's create a shell function killer to kill a given process by name.

    $ killer () { kill $(ps axww | grep -i "$1" | grep -v "grep -i $1" | awk '{print $1}'); }
    $ killer safari
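Before pointing such a pipeline at kill, it can be rehearsed on canned input. The sketch below is a dry run under stated assumptions: fake_ps is a hypothetical stand-in that prints three invented process lines (including one for grep itself) rather than calling ps, and nothing is actually killed.

```shell
# fake_ps: hypothetical stand-in for "ps axww". Its argument is the pattern
# that would appear on grep's own command line in the process list.
fake_ps() {
  printf '%s\n' \
    "27946  ??  S     31:08.79 /Applications/Safari.app/Contents/MacOS/Safari" \
    "16705 std  R+     0:00.00 grep -i $1" \
    "  301  ??  Ss     0:02.10 /usr/sbin/cron"
}

# Plain pattern: grep's own line contains "safari", so two PIDs come back.
naive=$(fake_ps "safari" | grep -i "safari" | awk '{print $1}')

# Bracket trick: grep's own line contains "safar[i]", which the pattern does not match.
clever=$(fake_ps "safar[i]" | grep -i "safar[i]" | awk '{print $1}')

# Show what would be passed to kill instead of running it.
echo "naive pattern would run:  kill" $naive
echo "bracket trick would run: kill" $clever
```

Swapping echo for kill only after the printed command looks right is a cheap safeguard against terminating the wrong process.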
The awk statement printf prints a formatted, or embellished, version of each input line. Here's a quick example of what can be done.

    $ ls -l | awk '{printf("Date: %s %s, File %s\n",$7,$6,$9)}'
    Date: , File
    Date: 13 Sep, File csv
    Date: 13 Sep, File double-space
    Date: 30 Aug, File script

The first line (Date: , File) results from the first line written by ls -l, the total line, which has too few fields. It can easily be removed with grep.

Use sed

The sed command is a stream editor and, like awk, processes its input lines based on matching patterns. It's covered in detail in Projects 59 and 61, and we'll use it here simply to search text files for lines that match a given pattern (Jan). Here are a couple of examples equivalent to the grep examples shown earlier in this project.

Option -n stops sed from echoing every input line, which it usually does. The construct /re/p searches for a regular expression (re) and displays the lines that contain it.

    $ sed -n '/Jan/p' *.txt
    Dear Janet,
    Dear Janet, sauciest of vixens
    Dear Jan,
    Dear Janet,
    Perhaps on Jan 31st?

Next, we count the number of words starting with xy.

    $ sed -n '/^xy/p' /usr/share/dict/web2 | wc -l
    75

To filter the output from ps:

    $ ps axww | sed -n "/Safar[i]/p"
    470 ?? S 0:15.71 /Applications/Safari.app/Contents/MacOS/Safari -psn_0_3407873

Ignoring case is less elegant. One has to convert all uppercase letters to lowercase (or vice versa) by using the sed function y and then match the pattern. Note that the lines are displayed in their converted, all-lowercase form.

    $ ps axww | sed -n "y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/;/safar[i]/p"
    470 ?? s 0:15.71 /applications/safari.app/contents/macos/safari -psn_0_3407873
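The equivalence between sed -n '/re/p' and grep, and the side effect of the y conversion, can be checked on a small sample. This is a sketch only; the sample letter lines are invented for the purpose.

```shell
# A few hypothetical lines to search, held in a temporary file.
letters=$(mktemp)
printf 'Dear Janet,\nDear Bob,\nPerhaps on Jan 31st?\nJANUARY sale\n' > "$letters"

# Case-sensitive search: sed -n '/re/p' selects the same lines as grep.
with_sed=$(sed -n '/Jan/p' "$letters")
with_grep=$(grep 'Jan' "$letters")

# Case-insensitive search with sed: fold to lowercase with y, then match.
# Unlike grep -i, this prints the folded text, not the original line.
folded=$(sed -n 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/;/jan/p' "$letters")

[ "$with_sed" = "$with_grep" ] && echo "sed and grep agree"
echo "$folded"

rm "$letters"
```

When the original capitalization of the matching lines matters, grep -i is the simpler choice, because it leaves the text untouched.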