Here's a page where I keep a collection of shell commands, for the next time I need them.

On the Mac: every once in a while you run into a Unix command that doesn't quite work the same way on the Mac. I have notes on some of these exceptions below.

One-Line Commands

Getting Files Not Containing a String

Use a combination of find and grep to get a list of files not containing a particular string.

In this example, we are searching a directory of markdown files to find all the ones that don't contain the string 'grep'.

$ find . -name '*.md' | xargs grep -H -c 'grep' | grep ':0$' | cut -d':' -f1
./web.md
./macNotes.md
./index.md
./styleGuide.md

The pipes in the command above delineate the following steps:

  1. Find all the .md files in this directory and below.
  2. grep through each of these files for the string 'grep', outputting just the file name and a count of the number of times the string appears in that file (the -H and -c options). One line of the output of this sub-command, representing the file commandline.md, is: ./commandline.md:24, meaning that 'grep' occurs 24 times in this file.
  3. grep out only those lines whose count is 0.
  4. Remove the count from the output, retaining only the file name.
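
Note that newer versions of grep can do the whole job in one step, with the -L ("files without match") option. A one-line sketch, assuming a grep that supports --include (GNU and BSD grep both do):

$ grep -rL 'grep' --include='*.md' .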

perl

Here are some useful one-liners that may serve as a starting point for your own. For details on the arguments (-pe, -nle, and so on), see the "perl Options" section that follows.

(For my introduction to one-line Perl programs, I owe a big thank you to Peteris Krumins and his Perl One-Liners: 130 Programs That Get Things Done.)

perl -pe 's/\t/\n/g' data.txt

  • Replace tab (\t) with newline in the file data.txt. Output goes to stdout.

perl -pi.bak -e 's/\t/\n/g' data.txt

  • Copy data.txt to data.txt.bak, then replace tab with newline in data.txt, editing the file in place.

perl -nle 'my @vals = $_ =~ / 212.* /g; print "@vals"' data.txt

  • Read every line of the file data.txt and output every string that starts with "212" preceded by a space character, ending with the last space character on that line. Output goes to stdout.
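
For example, here's the command run against a made-up input line. Note the single greedy match, with its leading and trailing spaces:

$ echo "call 212-555-0100 at 212 Main St now" | perl -nle 'my @vals = $_ =~ / 212.* /g; print "@vals"'
 212-555-0100 at 212 Main St 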

Numbering Strings

Peteris Krumins has an entire chapter devoted to numbering strings. I had a very special requirement that took me longer to work out than I expected. Here I start with one of Peteris's numbering examples and show how I modified it for my particular situation.

Both examples start with a simple text file, such as the following from Wikipedia:

Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages, Perl 5 and Perl 6.Though Perl is not officially an acronym, there are various backronyms in use, including "Practical Extraction and Reporting Language". Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier.

We assume this is in a file named file.txt.

perl -pe 's/(\w+)/++$i/ge' file.txt

  • Replace all words with their numeric positions. The result is:

1 2 3 4 5 6 7-8, 9-10, 11, 12 13 14, 15 16 17 18 19.20 21 22 23 24 25 26, 27 28 29 30 31 32, 33 "34 35 36 37 38". 39 40 41 42 43 44 45 46 47 48 49 50-51 52 53 54 55 56 57 58 59.

Notice that only word characters are replaced; punctuation like '-' and '.' is not.


perl -pe 's/(Perl)/++$i/ge' file.txt

perl -pe '$i = ($j==0) ? 150 : $j+2;s/(Perl)/++$i/ge;$j=$i;' file.txt

  • My requirements were slightly different. I wanted to:

    • Replace only a particular string, not every word.

    • Have the option, but not the requirement, of starting with a given number.

The first of the above commands starts its counter at 0 (so the first match is replaced by "1"), and its result is:

1 is a family of two high-level, general-purpose, interpreted, dynamic programming languages, 2 5 and 3 6.Though 4 is not officially an acronym, there are various backronyms in use, including "Practical Extraction and Reporting Language". 5 was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier.

(Don't be confused by the "2 5" and the "3 6". These strings replace the strings Perl 5 and Perl 6.)

The second command allows you to specify a starting number for the counting. In this case the starting number is 150, meaning that the first string is replaced by "151". I fully admit that a prettier version may be possible, but it's the best I could come up with in a minimal amount of time. It produces the following:

151 is a family of two high-level, general-purpose, interpreted, dynamic programming languages, 152 5 and 153 6.Though 154 is not officially an acronym, there are various backronyms in use, including "Practical Extraction and Reporting Language". 155 was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier.
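
For what it's worth, a slightly tidier variant initializes the counter in a BEGIN block, which runs once before any input is read. It should produce the same output as the command above:

perl -pe 'BEGIN { $i = 150 } s/(Perl)/++$i/ge' file.txt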

perl Options

The examples above use these perl options. Of course you can use any of these options with any particular Perl command, and multiple options (e.g. -p -e) can be merged together (-pe). Note that -e must come last when merged, since it takes the program text as its argument.

Option Explanation
-e Indicates you're running a Perl program from the command line.
-p Run the command on every line of input, and print every line after execution.
-n Like -p, run the command on every line of input, but don't print the lines automatically. (Use a print statement to write output.)
-i Edit the file in place--overwriting the file contents.
-pi.bak Modify the file in place, but first copy the file to a new backup file with '.bak' appended to its name. If the backup file already exists, overwrite it.
-l Run chomp on each line of input (i.e. remove the line's trailing newline), and add the newline back when printing.

cut

cut -c1-3 data.txt

  • Extract the first through third character of each line of the file data.txt.

cut -c3- data.txt

  • Extract from the third character to the end of the line for each line of data.txt.

cut -d' ' -f2 data.txt

  • Extract the second field, where space is the delimiter, of data.txt.
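
A couple of throwaway examples to illustrate both forms:

$ echo "abcdef" | cut -c1-3
abc

$ echo "alpha beta gamma" | cut -d' ' -f2
beta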

find

By default, find searches a directory structure recursively; that is, it explores every sub-directory it finds. For this reason, some searches can take a very long time. Also, some people advocate always eliminating error messages and warnings from the output (by appending 2>/dev/null to the find command), so that stray warnings don't mislead whatever command you pipe the results to.

find . -size 0

find . -empty

  • Two ways to find all empty files starting in the current directory, recursively down into the tree. Add the -ls argument (as in the next example) to list the file properties in detail rather than just the file names.

find . -not -empty -ls

  • Find all non-empty files starting in the current directory, recursively, and provide full listing information on each one.

find . -maxdepth 1 -empty -ls 2>/dev/null

  • Find all empty files in the current directory, non-recursively—i.e., in the current directory only—and don't output any error messages.

find . -size 0 -print0 2>/dev/null | xargs -0 rm

  • Find and delete all empty files starting in the current directory, recursively.

find . -name "*txt" 2>/dev/null

  • Find all the txt files in the current directory, recursively.

find . -name "*txt" 2>/dev/null | wc -l

  • Get a count of all the txt files starting in the current directory, recursively.

find . -type f 2>/dev/null | wc -l

  • Find and get a count of all regular files (i.e., not directories, links, and so on), recursively.

find . -maxdepth 1 -type d 2>/dev/null | while read -r dir; do printf "%s:\t" "$dir"; find "$dir" -type f | wc -l; done

  • For each directory in the current directory (find . -maxdepth 1 -type d), do a recursive count of all the regular files in that directory (find "$dir" -type f | wc -l), and print the result.

find options

The examples above use these options. (Some are technically not find options, but shell constructs appended to the find command.)

Option Explanation
-name pattern Limit search to files whose name matches pattern.
-maxdepth n Override the default recursive behavior by limiting the descent into the tree to n levels.
-type t Limit search to files of type t (f = regular file; d = directory; other options available).
-size 0, -empty Two ways of limiting output to empty files and directories.
-not -empty Limit output to non-empty files and directories.
-ls Provide full listing information on the files found.
2>/dev/null Do not return any error messages (e.g., permission warnings).
| wc -l Instead of printing the result, print just a count of the number of lines in the result.

find piped to other commands

Here's a nifty pipeline that returns a list of all the extension types belonging to the files in a directory tree, recursively, along with a count of each.

# Sort by file extension.
$ find . -type f 2>/dev/null | rev | egrep '^\w+\.' | cut -d '.' -f 1 | rev | sort | uniq -c
      2 docx
     10 html
      3 png
    106 py
      5 xml

# Sort by count.
$ find . -type f 2>/dev/null | rev | egrep '^\w+\.' | cut -d '.' -f 1 | rev | sort | uniq -c | sort -n
      2 docx
      3 png
      5 xml
     10 html
    106 py

rev lets us get the last field after a cut by first reversing the line, then getting the first field after the cut, then reversing the line back to its original order.

egrep '^\w+\.' handles the case where files have no extension by filtering them out. (See note below.)

The combination of sort | uniq -c gets us the counts.
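
If the double rev feels like a trick, a single sed substitution can pull out the extension instead. A sketch that should mirror the egrep filter (it keeps only names ending in a dot followed by alphanumeric characters):

$ find . -type f 2>/dev/null | sed -n 's/.*\.\([[:alnum:]]\{1,\}\)$/\1/p' | sort | uniq -c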

Historical Note

  • The original version of this command was find . -type f 2>/dev/null | rev | cut -d '.' -f 1 | rev | sort | uniq -c.
  • It performed poorly on files that had no extension, counting each one—with its full path—as a separate extension.
  • The piped command egrep '^\w+\.' seems to fix this.

Other Useful Commands & Techniques

ls !(*.txt)

  • Display the files which don't have the .txt extension. (This requires bash's extended globbing, enabled with shopt -s extglob.) Also works with grep, as in grep -i 'track record' !(*.txt).

ls -d t*

  • Non-recursive ls of all directory contents whose names start with "t". (Without -d you also get the contents of the items if they are directories.)

ls -1

  • Perform an ls on this directory, displaying each item on a separate line. (That's the digit "1" after the minus sign, not the letter "l".)

Ctrl-a

  • Go to the start of the command at the current command line.

Ctrl-e

  • Go to the end of the command at the current command line.

unzip -qq -t file.jar

  • Check whether a jar (or any zip-format) file is corrupt. If the command produces no output, the file is NOT corrupt.

$ sudo !!

  • Run the last command as root.

$ ^<string_1>^<string_2>

  • Run the previous command, replacing '<string_1>' with '<string_2>'. Example:
$ grep Python python/virtualenv.md
title: Python Virtual Environments

$ ^Python^error
grep error python/virtualenv.md
I got the following error.

rename

  • Rename files.
  • Change all .md files to .txt files with: rename -v 's/\.md$/.txt/' * (escape the dot and anchor the match at the end of the name so that only the extension is changed).
  • If your system doesn't have rename, see For Each File in pwd below.
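
For example, with hypothetical files notes.md and todo.md in the current directory (assuming the Perl-based rename; the -v output format may vary by version):

$ rename -v 's/\.md$/.txt/' *
notes.md renamed as notes.txt
todo.md renamed as todo.txt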

Multi-Line Shell Scripts

Here are a couple of simple scripts that I often use as templates for other multi-line shell scripts.

For Each File in pwd

This script renames all the .txt files whose names begin with "data" in the current directory, removing "data" from each one's name.

for i in data*.txt
do
    mv "$i" "`echo $i | sed 's/data//'`"
done
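
If you'd rather not spawn sed in a subshell for every file, bash parameter expansion can do the same prefix removal. A minimal sketch:

for i in data*.txt
do
    # ${i#data} strips the shortest leading match of "data" from $i.
    mv "$i" "${i#data}"
done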

For Each Line in a File

This script takes two arguments:

  • The first is the name of a file to read. In this example each line contains a directory path.
  • The second is a directory path, providing a destination for a repeated copy command.

The script reads every line in the file and uses it to copy files to the directory given in the second argument.

Bare-bones script

exec 4<"$1"
echo Start
while read -u4 p ; do
    echo "$p"
    cp "$p" "$2"
done
echo Done

The use of the file descriptor 4 was new to me. I learned of it from one of the less popular answers to a Stack Overflow question.

The next code snippet shows how you would use this script, assuming:

  • The name of the shell script is copyFilesToDest.
  • The script has been made executable with chmod.
  • The files to be copied are listed with their full path in a file in the present working directory called fullPathFileList.
  • You have a directory in the present working directory called fileListDest.
$ ./copyFilesToDest.sh fullPathFileList fileListDest/
Start
/Users/[...]/source/aaa.txt
/Users/[...]/source/ccc
Done

$ ls fileListDest/
aaa.txt ccc
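
(Incidentally, the more common idiom redirects the file straight into the loop, with no extra file descriptor. This sketch should behave the same as the bare-bones script:)

echo Start
while IFS= read -r p ; do
    echo "$p"
    cp "$p" "$2"
done < "$1"
echo Done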

A more complicated example:

With the bare-bones version as a starting point, the script can easily be modified to suit a wide variety of situations.

exec 4<"$1"
echo Start
while read -u4 p ; do
    echo "$p"
    cp source/"$p".* "$2"
done
echo Done

In this version, the entries in the input file needn't be full paths; they are simply a list of files in source, a directory in the present working directory. More precisely, the input list is expected to consist of file-name "stems": for example, "bbb" is the stem of the file names "bbb.txt" and "bbb.xml".

$ ls source
aaa.txt bbb.txt bbb.xml ccc ddd.csv ddd.xml

$ cat stemList
bbb
ddd

And here's how you run the script from the command line, assuming of course that the script is in a file called copyMatchingToDest.sh:

$ ls stemListDest/

$ ./copyMatchingToDest.sh stemList stemListDest/
Start
bbb
ddd
Done

$ ls stemListDest
bbb.txt bbb.xml ddd.csv ddd.xml

Tricks with grep

grep is great. And powerful. Here are a few ways you can maximize its usefulness.

Counting String Instances on a Single Line

We often use grep <string> <file> | wc -l to count the number of times a string appears in a file.

But what that command actually does is count the number of lines in the file that contain the string. If the string appears more than once on a single line, the extra occurrences aren't counted.

What if you want to know the number of times the string occurs in the file? Or, what if what you really want is a count of how many times the string appears on each line?

Here I show you how to do both of these.

Let's start with a file called data.txt that contains multiple instances of the string "man".

A man is a man.
A man who would hurt a friend is no friend of mine.
The cat sat on a hat.
One small step for man, one giant leap for mankind.

Here we get a list of each line containing the word "man". Then we count how many of those lines there are.

$ grep "man" data.txt
A man is a man.
A man who would hurt a friend is no friend of mine.
One small step for man, one giant leap for mankind.

$ grep "man" data.txt | wc -l
       3

To get instead the total number of times "man" occurs in the file, use grep's "-o" option, which puts each hit on a separate line. Then you can do your line counting.

$ grep -o "man" data.txt
man
man
man
man
man

$ grep -o "man" data.txt | wc -l
       5

If you don't want "mankind" included in your count, specify word boundaries.

$ grep -o "\bman\b" data.txt
man
man
man
man

$ grep -o "\bman\b" data.txt | wc -l
       4

But, as I said above, what if what you really want is to know which lines contain the word "man," and how many instances of the word are on each line? Here's a series of commands that will get you just that. (Unfortunately, this seems only to work on Linux—I haven't been able to figure out how to easily do this on a Mac.)

The starting point is combining the -o and the -n options. (This is where the Linux and Mac worlds divide. The output on my Mac is slightly different, causing the following steps to fail.)

$ grep -o -n man data.txt 
1:man
1:man
2:man
4:man
4:man

The -n option prefixes each match with the number of the line it appears on. Now all we have to do is some easy formatting, then get a unique count.

$ grep -o -n man data.txt  | cut -d : -f 1 | uniq -c
      2 1
      1 2
      2 4

Here's how to read this output. Line 1 contains two instances of the string "man"; line 2 contains one; and line 4 contains two more.

We can pretty-up the output a little bit with a few perl one-liners, borrowed from the top of this page.

$ grep -o -n man data.txt  | cut -d : -f 1 | uniq -c | perl -p -e 's/^      /count:/g' | perl -p -e 's/ / line: /g'
count:2 line: 1
count:1 line: 2
count:2 line: 4
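
As an aside, awk can produce the same per-line counts portably, since gsub returns the number of substitutions it makes. This sketch (gsub and the "&" replacement are POSIX, so it should work on the Mac as well) is an alternative:

$ awk '{ n = gsub(/man/, "&"); if (n) print "count:" n " line: " NR }' data.txt
count:2 line: 1
count:1 line: 2
count:2 line: 4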

Performing grep on Multiple Patterns

Here are a few quick examples on how to grep on multiple patterns.

Use egrep instead of grep if you want to place your patterns in the bash command itself.

$ egrep "cat|dog" *.txt

To grep on a file of patterns, put your patterns into a file, one pattern per line. Let's call the file patterns.txt. Then invoke fgrep with -f, thus:

$ cat patterns.txt
cat
dog
octopus
zebra

$ fgrep -f patterns.txt *
<...>

The command above expects the patterns to be fixed strings, not regular expressions. To allow regular expressions in the file, use egrep -f. (Be careful: these are regular expressions, not shell globs; a pattern like d* means "zero or more d's" and would match every line.)

$ cat patterns.txt
^d.*g$

$ egrep -f patterns.txt *
<...>

You can also grep on a list of patterns held in a shell variable. Here we create a variable called patterns, then instruct grep to use it. (Don't start the variable with a newline: a leading blank line is an empty pattern, which matches every line.)

$ patterns="cat
dog
octopus
zebra"

$ grep "${patterns}" *
<...>
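
One more variant: plain grep accepts multiple patterns directly on the command line via repeated -e options, with no pattern file or shell variable required.

$ grep -e cat -e dog *
<...>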