{"id":185,"date":"2020-08-15T20:14:44","date_gmt":"2020-08-16T01:14:44","guid":{"rendered":"https:\/\/www.brezeale.com\/?p=185"},"modified":"2024-04-14T09:35:27","modified_gmt":"2024-04-14T14:35:27","slug":"185","status":"publish","type":"post","link":"https:\/\/www.brezeale.com\/?p=185","title":{"rendered":"Text Processing in Linux"},"content":{"rendered":"\r\n<p>I originally created and posted this November 22, 2004.<\/p>\r\n<p>Here are some examples of using the utilities found on Unix (available on some other platforms also) for manipulating the text in files. awk and perl both allow writing full programs, but I primarily use both as short one-liner programs which allows them to be piped to\/from other Unix programs. Each of these programs has capabilities that make it better than the others in some situations which I have attempted to demonstrate below. I don&#8217;t claim any of these to be original to me; references are at the bottom of the page.<\/p>\r\n<p>I have collected this information over the course of several years, during which time I have used Sun Solaris and various flavors of Linux. Note that the versions of these tools included with Solaris don&#8217;t entirely match the GNU versions, so some of what you see below may need to be tinkered with to make work.<\/p>\r\n<p>The philosophy of Unix utilities is to develop a tool that is very good at doing a specific thing. The output of a tool can be sent to another tool via the pipe (i.e., the<span class=\"example\"> | <\/span>character) as shown in several examples below. 
So, one program&#8217;s output becomes the next program&#8217;s input.<\/p>\r\n<p><a href=\"#awk\">awk<\/a>\u00a0\u00a0<a href=\"#cat\">cat<\/a>\u00a0\u00a0<a href=\"#csplit\">csplit<\/a>\u00a0\u00a0<a href=\"#cut\">cut<\/a>\u00a0\u00a0<a href=\"#find\">find<\/a>\u00a0\u00a0<a href=\"#fmt\">fmt<\/a>\u00a0\u00a0<a href=\"#fold\">fold<\/a>\u00a0\u00a0<a href=\"#grep\">grep<\/a>\u00a0\u00a0<a href=\"#head\">head<\/a>\u00a0\u00a0<a href=\"#join\">join<\/a>\u00a0\u00a0<a href=\"#nl\">nl<\/a>\u00a0\u00a0<a href=\"#paste\">paste<\/a>\u00a0\u00a0<a href=\"#perl\">perl<\/a>\u00a0\u00a0<a href=\"#sdiff\">sdiff<\/a>\u00a0\u00a0<a href=\"#sed\">sed<\/a>\u00a0\u00a0<a href=\"#sort\">sort<\/a>\u00a0\u00a0<a href=\"#split\">split<\/a>\u00a0\u00a0<a href=\"#tail\">tail<\/a>\u00a0\u00a0<a href=\"#uniq\">uniq<\/a>\u00a0\u00a0<a href=\"#wc\">wc<\/a> <br \/><br \/><a href=\"#examples\">Examples<\/a> <a href=\"#references\">References<\/a><\/p>\r\n<h3><a name=\"sedawkperl\"><\/a>sed, awk, and perl<\/h3>\r\n<p><b><a name=\"awk\"><\/a>awk<\/b> \u2014 good for working with files that contain information in columns.<\/p>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>Display only the first three columns of the file <span class=\"example\">SOMEFILE<\/span>, using tabs to separate the results:<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">awk &#8216;{print $1 &#8220;\\t\\t&#8221; $2 &#8220;\\t&#8221; $3}&#8217; SOMEFILE<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>Display the first and fifth columns of the password file with a tab between them<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">awk -F: &#8216;{print $1 &#8220;\\t&#8221; $5}&#8217; 
\/etc\/passwd<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<p><span class=\"example\">-F: <\/span>changes the column delimiter from whitespace (the default) to a colon (:)<\/p>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>Display the second column of the file using double colons as the field separator<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">awk -v &#8216;FS=::&#8217; &#8216;{print $2}&#8217; ratings.dat<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>replace the first column with &#8220;ORACLE&#8221; in <span class=\"example\">SOMEFILE<\/span><\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">awk &#8216;{$1 = &#8220;ORACLE&#8221;; print }&#8217; SOMEFILE<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>print the last field of every input line:<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">awk &#8216;{ print $NF }&#8217; SOMEFILE<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>print the first 50 characters of each line. 
if a line has fewer than 50 characters, then the line is padded with spaces.<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">awk &#8216;{ printf(&#8220;%-50.50s\\n&#8221;, $0) }&#8217; SOMEFILE<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>sum the values in column 1<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">awk &#8216;BEGIN{total=0;} {total += $1;} END{print &#8220;total is &#8220;, total}&#8217; SOMEFILE<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>sum the values in columns 1, 2 and 4 in order to calculate precision and recall<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">awk -F &#8216;,&#8217; &#8216;BEGIN{TP=0; FP=0; FN=0} {TP += $1; FP += $2; FN += $4} END{print &#8220;precision is &#8220;, TP\/(FP+TP); print &#8220;recall is &#8220;, TP\/(FN+TP)}&#8217; prec-recall-2states.txt<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>sum each row<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">awk &#8216;{sum=0; for(i=1; i&lt;=NF; i++){sum+=$i}; print sum}&#8217; SOMEFILE<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<p>&nbsp;<\/p>\r\n<p><b><a name=\"sed\"><\/a>sed<\/b> \u2014 from the man page:<\/p>\r\n<blockquote>\r\n<p>Sed is a stream editor. A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline). 
While in some ways similar to an editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient. But it is sed\u2019s ability to filter text in a pipeline which particularly distinguishes it from other types of editors.<\/p>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>Double space <span class=\"example\">infile<\/span> and send the output to <span class=\"example\">outfile<\/span><\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">sed G &lt; infile &gt; outfile<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<p>I use the input\/output notation shown above. It is appropriate in many, if not all, cases to leave out the less than sign, e.g., <span class=\"example\">sed G infile &gt; outfile<\/span><\/p>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>Double space a file which already has blank lines in it. 
Output file should contain no more than one blank line between lines of text.<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">sed &#8216;\/^$\/d;G&#8217; &lt; infile &gt; outfile<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>Triple space a file<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">sed &#8216;G;G&#8217; &lt; infile &gt; outfile<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>Undo double-spacing (assumes even-numbered lines are always blank)<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">sed &#8216;n;d&#8217; &lt; infile &gt; outfile<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>Insert a blank line above every line which matches <span class=\"example\"> regex <\/span>(&#8220;regex&#8221; represents a <a href=\"http:\/\/www.google.com\/search?q=regular%2Bexpression\">regular expression<\/a>)<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">sed &#8216;\/regex\/{x;p;x;}&#8217; &lt; infile &gt; outfile<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>Print the line immediately before <span class=\"example\">regex<\/span>, but not the line containing <span class=\"example\">regex<\/span><\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" 
bgcolor=\"#CCCCCC\">sed -n &#8216;\/regexp\/{g;1!p;};h&#8217; &lt; infile &gt; outfile<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>Print the line immediately after <span class=\"example\">regex<\/span>, but not the line containing <span class=\"example\">regex<\/span><\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">sed -n &#8216;\/regexp\/{n;p;}&#8217; &lt; infile &gt; outfile<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>Insert a blank line below every line which matches <span class=\"example\">regex<\/span><\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">sed &#8216;\/regex\/G&#8217; &lt; infile &gt; outfile<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>Insert a blank line above and below every line which matches <span class=\"example\"> regex <\/span><\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">sed &#8216;\/regex\/{x;p;x;G;}&#8217; &lt; infile &gt; outfile<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>Convert DOS newlines (CR\/LF) to Unix format<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td bgcolor=\"#CCCCCC\"><span class=\"example\">sed &#8216;s\/^M$\/\/&#8217; &lt; infile &gt; outfile <\/span># in bash\/tcsh, to get <span class=\"example\">^M<\/span> press Ctrl-V then Ctrl-M<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li 
style=\"list-style-type: none;\">\r\n<ol>\r\n<li>Print only those lines matching the regular expression\u2014similar to grep<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td bgcolor=\"#CCCCCC\">\r\n<p class=\"example\">sed -n &#8216;\/some_word\/p&#8217; infile<br \/>sed &#8216;\/some_word\/!d&#8217;<\/p>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>Print those lines that do not match the regular expression\u2014similar to grep -v<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td bgcolor=\"#CCCCCC\">\r\n<p class=\"example\">sed -n &#8216;\/regexp\/!p&#8217;<br \/>sed &#8216;\/regexp\/d&#8217;<\/p>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>Skip the first two lines (start at line 3) and then alternate between printing 5 lines and skipping 3 for the entire file<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">sed -n &#8216;3,${p;n;p;n;p;n;p;n;p;n;n;n;}&#8217; &lt; infile &gt; outfile<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<br \/>Notice that there are five p&#8217;s in the sequence, representing the five lines to print. 
The three lines to skip between each set of lines to print are represented by the <span class=\"example\">n;n;n;<\/span> at the end of the sequence.<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>Delete trailing whitespace (spaces, tabs) from the end of each line<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">sed &#8216;s\/[ \\t]*$\/\/&#8217; &lt; infile &gt; outfile<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>Substitute (find and replace) <span class=\"example\"> foo <\/span>with <span class=\"example\">bar<\/span> on each line<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td bgcolor=\"#CCCCCC\"><span class=\"example\">sed &#8216;s\/foo\/bar\/&#8217; &lt; infile &gt; outfile <\/span># replaces only 1st instance in a line<br \/><span class=\"example\">sed &#8216;s\/foo\/bar\/4&#8217; &lt; infile &gt; outfile <\/span> # replaces only 4th instance in a line<br \/><span class=\"example\">sed &#8216;s\/foo\/bar\/g&#8217; &lt; infile &gt; outfile <\/span> # replaces ALL instances in a line<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>Replace each occurrence of the hexadecimal character 92 with an apostrophe:<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">sed &#8220;s\/\\x92\/&#8217;\/g&#8221; &lt; old_file.txt &gt; new_file.txt<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>Print the section of a file between two regular expressions (inclusive)<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" 
width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">sed -n &#8216;\/regex1\/,\/regex2\/p&#8217; &lt; old_file.txt &gt; new_file.txt<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>Combine the line containing <span class=\"example\">REGEX<\/span> with the line that follows it<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">sed -e &#8216;N&#8217; -e &#8216;s\/REGEX\\n\/REGEX\/&#8217; &lt; old_file.txt &gt; new_file.txt<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<p>&nbsp;<\/p>\r\n<p><b><a name=\"perl\"><\/a>perl<\/b> \u2014 can do anything sed and awk can do, but not always as easily as shown in the examples above.<\/p>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>replace OLDSTRING with NEWSTRING in the file(s) in FILELIST [e.g.,<span class=\"example\"> file1 file2<\/span> or <span class=\"example\">*.txt<\/span>]<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">perl -pi.bak -e &#8216;s\/OLDSTRING\/NEWSTRING\/g&#8217; FILELIST<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<p>The options used are:<\/p>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ul>\r\n<li><span class=\"example\">-e<\/span> \u2014 allows a one-line script to be run from the command line<\/li>\r\n<li><span class=\"example\">-i<\/span> \u2014 files are edited in place. 
In the example above, the .bak extension is added to the backup copies of the original files<\/li>\r\n<li><span class=\"example\">-p<\/span> \u2014 wraps the script in a while loop that reads each line of the files named as arguments and prints the line after the script has run on it<\/li>\r\n<\/ul>\r\n<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<p>&nbsp;<\/p>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>the full perl program to perform the same substitution as the one-liner (it prints the result to standard output instead of editing the files in place, so no backup copies are created) is<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td bgcolor=\"#CCCCCC\">\r\n<pre>#!\/usr\/bin\/perl\r\n# perl-example.pl\r\nwhile (&lt;&gt;)\r\n{\r\n\ts\/OLDSTRING\/NEWSTRING\/g;\r\n\tprint;\r\n}\r\n\t\t\t<\/pre>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<p>run using <span class=\"example\">.\/perl-example.pl FILELIST<\/span><\/p>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>remove the carriage returns used by DOS text files from files on a Unix system<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td bgcolor=\"#CCCCCC\"><span class=\"example\">perl -pi.bak -e &#8216;s\/\\r$\/\/g&#8217; FILELIST<\/span><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<p>&nbsp;<\/p>\r\n<h3>Assorted Utilities<\/h3>\r\n<p>Some of the examples below use the following files:<\/p>\r\n<div align=\"center\">\r\n<table border=\"0\" width=\"80%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" width=\"50%\">file1<\/td>\r\n<td class=\"example\" width=\"50%\">file2<\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" bgcolor=\"#CCCCCC\">\r\n<pre>Tom 123 Main \r\nDick 4787 West\r\nHarry 98 North\r\nSue 1035 Cooper<\/pre>\r\n<\/td>\r\n<td valign=\"top\" bgcolor=\"#CCCCCC\">\r\n<pre>Tom programmer\r\nDick lawyer\r\nHarry artist<\/pre>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<p>&nbsp;<\/p>\r\n<table border=\"0\" width=\"80%\">\r\n<tbody>\r\n<tr>\r\n<td 
class=\"example\">ga.txt<\/td>\r\n<\/tr>\r\n<tr>\r\n<td bgcolor=\"#CCCCCC\">\r\n<pre>The Gettysburg Address\r\nGettysburg, Pennsylvania\r\nNovember 19, 1863\r\n\r\n\r\nFour score and seven years ago our fathers brought forth on this continent,\r\na new nation, conceived in Liberty, and dedicated to the proposition that\r\nall men are created equal.\r\n \r\nNow we are engaged in a great civil war, testing whether that nation, or any\r\nnation so conceived and so dedicated, can long endure. We are met on a great\r\nbattle-field of that war. We have come to dedicate a portion of that field,\r\nas a final resting place for those who here gave their lives that that nation\r\nmight live. It is altogether fitting and proper that we should do this.\r\n \r\nBut, in a larger sense, we can not dedicate -- we can not consecrate -- we\r\ncan not hallow -- this ground. The brave men, living and dead, who struggled\r\nhere, have consecrated it, far above our poor power to add or detract. The\r\nworld will little note, nor long remember what we say here, but it can never\r\nforget what they did here. It is for us the living, rather, to be dedicated\r\nhere to the unfinished work which they who fought here have thus far so\r\nnobly advanced. It is rather for us to be here dedicated to the great task\r\nremaining before us -- that from these honored dead we take increased devotion\r\nto that cause for which they gave the last full measure of devotion -- that we\r\nhere highly resolve that these dead shall not have died in vain -- that this\r\nnation, under God, shall have a new birth of freedom -- and that government\r\nof the people, by the people, for the people, shall not perish from the earth.\r\n \r\nSource: The Collected Works of Abraham Lincoln, Vol. VII, edited by Roy\r\nP. 
Basler.<\/pre>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n<p>&nbsp;<\/p>\r\n<p>In the examples using these files, the percent sign (%) at the beginning of the line represents the command prompt. Comments of what is happening follow the pound sign (#).<\/p>\r\n<p>&nbsp;<\/p>\r\n<p><b><a name=\"grep\"><\/a>grep<\/b> \u2014 prints the lines of a file that match a search string (<span class=\"example\">string<\/span> can be a <a href=\"http:\/\/www.google.com\/search?q=regular%2Bexpression\">regular expression<\/a>)<\/p>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">grep -i string some_file # <span style=\"font-family: Times New Roman, Times, serif;\">print the lines containing <\/span>string<span style=\"font-family: Times New Roman, Times, serif;\"> regardless of case<\/span><br \/>grep -v string some_file # <span style=\"font-family: Times New Roman, Times, serif;\">print the lines that don&#8217;t contain <\/span>string <br \/>grep -E &#8220;string1|string2&#8221; some_file # <span style=\"font-family: Times New Roman, Times, serif;\">print the lines that contain <\/span>string1<span style=\"font-family: Times New Roman, Times, serif;\"> or <\/span>string2<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<p><b><a name=\"find\"><\/a>find<\/b> \u2014 find has many parameters for restricting what it finds, but I only demonstrate here how to use it to recursively search from the current location for files containing <span class=\"example\">the_word<\/span>. <a href=\"https:\/\/www.brezeale.com\/?p=178\">More examples of using find<\/a>.<\/p>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">find . -type f -print | xargs grep the_word 2&gt;\/dev\/null <br \/>find . 
-type f -exec grep &#8216;the_word&#8217; {} \\; -print<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<br \/>In the first example, the results of the <span class=\"example\">find<\/span> command are piped to <span class=\"example\">xargs<\/span>, which passes the filenames to<span class=\"example\"> grep<\/span> as arguments, in batches rather than one at a time. Error messages written to STDERR are discarded by using <span class=\"example\">2&gt;\/dev\/null.<\/span> The second example shows how to <span class=\"example\">grep<\/span> each filename by using a command-line option of <span class=\"example\">find<\/span>.<\/blockquote>\r\n<p>&nbsp;<\/p>\r\n<p><b>Operations on entire files<\/b><\/p>\r\n<p><b><a name=\"cat\"><\/a>cat<\/b> \u2014 concatenate files and print on the standard output<\/p>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td bgcolor=\"#CCCCCC\" height=\"66\">\r\n<pre>% cat -E file2  # display file2, showing $ at end of each line\r\nTom programmer$\r\nDick lawyer$\r\nHarry artist$\r\n\r\n\r\n\r\ncat -v somefile  # display somefile, showing nonprinting characters using ^ and M- notation, except for LFD and TAB\r\ncat -e somefile  # display somefile, combining the effects of -v and -E<\/pre>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<p><b><a name=\"nl\"><\/a>nl<\/b> \u2014 Number lines of files<\/p>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td bgcolor=\"#CCCCCC\">\r\n<pre>% nl file1\r\n     1\tTom 123 Main \r\n     2\tDick 4787 West\r\n     3\tHarry 98 North\r\n     4\tSue 1035 Cooper<\/pre>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<p><b><a name=\"wc\"><\/a>wc<\/b> \u2014 print the number of bytes, words, and lines in files<\/p>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td bgcolor=\"#CCCCCC\">\r\n<pre>% wc -l file1  # print number of lines\r\n      4 file1\r\n% wc -w file1  # print number of words\r\n     12 file1\r\n% wc -m file1 
 # print number of characters\r\n     60 file1\r\n% wc file1     # print number of lines, words, and bytes\r\n      4      12      60 file1<\/pre>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<p>&nbsp;<\/p>\r\n<p><b>Alter the format of a file<\/b><\/p>\r\n<p><b><a name=\"fmt\"><\/a>fmt \u2014 <\/b> Reformat each paragraph of a file<\/p>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td bgcolor=\"#CCCCCC\">\r\n<pre>% fmt -w 50 ga.txt # reformat to at most 50 characters per line\r\nThe Gettysburg Address Gettysburg, Pennsylvania\r\nNovember 19, 1863\r\n\r\nFour score and seven years ago our fathers\r\nbrought forth on this continent, a new nation,\r\nconceived in Liberty, and dedicated to the\r\nproposition that all men are created equal.\r\n\r\nNow we are engaged in a great civil war, testing\r\nwhether that nation, or any nation so conceived\r\nand so dedicated, can long endure. We are met on\r\na great battle-field of that war. We have come\r\nto dedicate a portion of that field, as a final\r\nresting place for those who here gave their lives\r\nthat that nation might live. It is altogether\r\nfitting and proper that we should do this.\r\n\r\nBut, in a larger sense, we can not dedicate --\r\nwe can not consecrate -- we can not hallow --\r\nthis ground. The brave men, living and dead, who\r\nstruggled here, have consecrated it, far above\r\nour poor power to add or detract. The world will\r\nlittle note, nor long remember what we say here,\r\nbut it can never forget what they did here. It is\r\nfor us the living, rather, to be dedicated here\r\nto the unfinished work which they who fought here\r\nhave thus far so nobly advanced. 
It is rather\r\nfor us to be here dedicated to the great task\r\nremaining before us -- that from these honored\r\ndead we take increased devotion to that cause for\r\nwhich they gave the last full measure of devotion\r\n-- that we here highly resolve that these dead\r\nshall not have died in vain -- that this nation,\r\nunder God, shall have a new birth of freedom --\r\nand that government of the people, by the people,\r\nfor the people, shall not perish from the earth.\r\n\r\nSource: The Collected Works of Abraham Lincoln,\r\nVol. VII, edited by Roy P. Basler.<\/pre>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<p><b><a name=\"fold\"><\/a>fold<\/b> \u2014 wrap each input line to fit in specified width<\/p>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td bgcolor=\"#CCCCCC\">\r\n<pre>% fold -w 50 ga.txt\r\nThe Gettysburg Address \r\nGettysburg, Pennsylvania \r\nNovember 19, 1863\r\n\r\nFour score and seven years ago our fathers brought\r\n forth on this continent,\r\na new nation, conceived in Liberty, and dedicated \r\nto the proposition that\r\nall men are created equal.\r\n\r\nNow we are engaged in a great civil war, testing w\r\nhether that nation, or any\r\nnation so conceived and so dedicated, can long end\r\nure. We are met on a great\r\nbattle-field of that war. We have come to dedicate\r\n a portion of that field,\r\nas a final resting place for those who here gave t\r\nheir lives that that nation\r\nmight live. It is altogether fitting and proper th\r\nat we should do this.\r\n\r\nBut, in a larger sense, we can not dedicate -- we \r\ncan not consecrate -- we\r\ncan not hallow -- this ground. The brave men, livi\r\nng and dead, who struggled\r\nhere, have consecrated it, far above our poor powe\r\nr to add or detract. The\r\nworld will little note, nor long remember what we \r\nsay here, but it can never\r\nforget what they did here. 
It is for us the living\r\n, rather, to be dedicated\r\nhere to the unfinished work which they who fought \r\nhere have thus far so\r\nnobly advanced. It is rather for us to be here ded\r\nicated to the great task\r\nremaining before us -- that from these honored dea\r\nd we take increased devotion\r\nto that cause for which they gave the last full me\r\nasure of devotion -- that we\r\nhere highly resolve that these dead shall not have\r\n died in vain -- that this\r\nnation, under God, shall have a new birth of freed\r\nom -- and that government\r\nof the people, by the people, for the people, shal\r\nl not perish from the earth.\r\n\r\nSource: The Collected Works of Abraham Lincoln, Vo\r\nl. VII, edited by Roy\r\nP. Basler.<\/pre>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<p>&nbsp;<\/p>\r\n<p><b>Output parts of files<\/b><\/p>\r\n<p><b><a name=\"head\"><\/a>head<\/b> \u2014 Output the first part of files<\/p>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td bgcolor=\"#CCCCCC\">\r\n<pre>% head -2 file1  # print the first two lines\r\nTom 123 Main\r\nDick 4787 West<\/pre>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<p><b><a name=\"tail\"><\/a>tail<\/b> \u2014 Output the last part of files<\/p>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td bgcolor=\"#CCCCCC\">\r\n<pre>% tail -2 file1  # display the last 2 lines\r\nHarry 98 North\r\nSue 1035 Cooper<\/pre>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<p><b><a name=\"split\"><\/a>split<\/b> \u2014 Split a file into pieces (default is 1000 lines each)<\/p>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td bgcolor=\"#CCCCCC\">\r\n<pre>split somefile         # create files of the form xaa, xab, and so on\r\nsplit -l 500 somefile  # each new file will be at most 500 lines long<\/pre>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<p><b><a 
name=\"csplit\"><\/a>csplit<\/b> \u2014 split a file into sections determined by context lines<\/p>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td bgcolor=\"#CCCCCC\">\r\n<pre>csplit bigfile \"\/The End\/+4\"            # break at the line that is 4 lines below The End\r\ncsplit -k bigfile \"\/The End\/+1\" \"{99}\"  # break at the line below each occurrence of The End up to 99 times<\/pre>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<p>&nbsp;<\/p>\r\n<p><b>Operate on fields within a line<\/b><\/p>\r\n<p><b><a name=\"cut\"><\/a>cut<\/b> \u2014 print selected parts of lines from each file<\/p>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td bgcolor=\"#CCCCCC\">\r\n<pre>% cut -c1-10 file2                  # cut characters 1 through 10 from file2\r\nTom progra\r\nDick lawye\r\nHarry arti\r\n \r\n% cut -d \" \" -f2 file1              # cut the second column (-f2); use a space as the delimiter (-d \" \")\r\n123\r\n4787\r\n98\r\n1035\r\n\r\nls *.txt | cut -c1-3 | xargs mkdir  # create directories named after the first three letters of each .txt filename<\/pre>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<p><b><a name=\"paste\"><\/a>paste<\/b> \u2014 merge lines of files, separated by tabs. 
The columns of the input files are placed side by side.<\/p>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td bgcolor=\"#CCCCCC\">\r\n<pre>% paste file1 file2 \r\nTom 123 Main \tTom programmer\r\nDick 4787 West\tDick lawyer\r\nHarry 98 North\tHarry artist\r\nSue 1035 Cooper <\/pre>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<p><b><a name=\"join\"><\/a>join<\/b> \u2014 join lines of two files on a common field (files should be sorted by the common field)<\/p>\r\n<blockquote class=\"exambox\">\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td bgcolor=\"#CCCCCC\">\r\n<pre>% join -a 2 -a 1 -o 1.1,1.2,2.2 -e \" \" file1 file2\r\nTom 123 programmer\r\nDick 4787 lawyer\r\nHarry 98 artist\r\nSue 1035\r\n\r\njoin -a 2 -a 1 -o 1.1,1.2,2.2 -e \" \" -1 1 -2 3 file1 file2<\/pre>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<p>-a list unpairable lines in file1 and file2<br \/>-o display fields 1 and 2 of file1 and field 2 of file2<br \/>-e replace any empty output fields with blanks<br \/>-1 join on field 1 of file1<br \/>-2 join on field 3 of file2<\/p>\r\n<\/blockquote>\r\n<p><b><a name=\"sdiff\"><\/a>sdiff<\/b> \u2014 print differences between files<\/p>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">sdiff -s file1 file2<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<p><span class=\"example\">-s <\/span>suppress identical lines<\/p>\r\n<\/blockquote>\r\n<p>&nbsp;<\/p>\r\n<p><b>Operate on sorted files<\/b><\/p>\r\n<p><b><a name=\"sort\"><\/a>sort<\/b> \u2014 sort lines of text files<\/p>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td bgcolor=\"#CCCCCC\">\r\n<pre>% sort +1 file1     # sort on the second column (the count starts at zero)\r\nSue 1035 Cooper\r\nTom 123 Main \r\nDick 4787 West\r\nHarry 98 North\r\n\r\n\r\n% sort -n +1 file1  # perform a numeric sort (-n) by the second 
column\r\nHarry 98 North\r\nTom 123 Main \r\nSue 1035 Cooper\r\nDick 4787 West<\/pre>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<br \/>use <a href=\"http:\/\/examples.oreilly.com\/upt3\/split\/lensort\">lensort<\/a> to sort by line length<br \/>use <a href=\"http:\/\/examples.oreilly.com\/upt2\/split\/chunksort\">chunksort<\/a> to sort paragraphs separated by a blank line<\/blockquote>\r\n<p><b><br \/><a name=\"uniq\"><\/a>uniq<\/b> \u2014 displays unique lines from a sorted file<\/p>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td bgcolor=\"#CCCCCC\">\r\n<pre>cat SOMEFILE | sort | uniq   # this could have been done more easily with  sort SOMEFILE | uniq\r\nuniq -c filename             # prefix lines by the number of occurrences\r\nuniq -d filename             # display one copy of each line that is repeated\r\nuniq -D filename             # print all duplicate lines\r\nuniq -i filename             # ignore differences in case when comparing\r\nuniq -s N filename           # avoid comparing the first N characters\r\nuniq -u filename             # only print unique lines<\/pre>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<p>&nbsp;<\/p>\r\n<\/blockquote>\r\n<p>&nbsp;<\/p>\r\n<p>To perform these operations on multiple files, it is often helpful to create a <a href=\"https:\/\/www.brezeale.com\/?p=183\">simple shell script<\/a> to operate on the appropriate files.<\/p>\r\n<p>&nbsp;<\/p>\r\n<h3><a name=\"examples\"><\/a>Assorted Examples that Combine Tools<\/h3>\r\n<p>These examples don&#8217;t necessarily rely on the sample files given above.<\/p>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>find all files starting from the current directory and keep a running total of the number of lines in them<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">find . 
-exec wc -l {} \\; | awk &#8216;{total = total+$1;print total &#8221; &#8221; $1 &#8221; &#8221; $2}&#8217;<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>print the 4th, 3rd, and 2nd columns of <span class=\"example\">SOMEFILE<\/span> (in that order), and sort on the last column (the 2nd column of the original file)<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">cat SOMEFILE | awk &#8216;{ print $4 &#8221; &#8221; $3 &#8221; &#8221; $2 }&#8217; | sort +2<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>print total size of all files<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">find . -type f -name &#8220;*.*&#8221; -ls | awk &#8216;BEGIN{ FILECNT = 0; T_SIZE = 0;} { T_SIZE += $7; FILECNT++} END{print &#8220;Total Files:&#8221;, FILECNT, &#8220;Total Size:&#8221;, T_SIZE,&#8221;Average Size:&#8221;, T_SIZE \/ FILECNT;}&#8217;<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>list all files with a size less than 100 bytes<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">ls -l | awk &#8216;{if ($5 &lt; 100) {print $5 &#8221; &#8221; $8}}&#8217;<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<p>here <span class=\"example\">$5<\/span> represents the column of file sizes produced by <span class=\"example\">ls -l<\/span><\/p>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>delete all files with a size less than 100 bytes<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" 
width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">ls -l | awk &#8216;{if ($5 &lt; 100) {print $8}}&#8217; | xargs -i -t rm \\{}<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>if the number in the second column is less than 1000, prefix it with a zero<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">awk &#8216;{if ($2 &lt; 1000) {print $1 &#8221; 0&#8243; $2 &#8221; &#8221; $3} else {print $1 &#8221; &#8221; $2 &#8221; &#8221; $3}}&#8217; &lt; dvd-titles2.sh &gt; dvd-titles3.sh<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>combine <span class=\"example\">file1<\/span> and <span class=\"example\">file2<\/span> and show TAB characters as <span class=\"example\">^I<\/span><\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">% paste file1 file2 | cat -T <br \/>Tom 123 Main ^ITom programmer<br \/>Dick 4787 West^IDick lawyer<br \/>Harry 98 North^IHarry artist<br \/>Sue 1035 Cooper^I<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>sort ratings.dat on column 2 and subsort on column 0 using <span class=\"example\">:<\/span> as the delimiter, redirecting the output to ratings-sorted.dat<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">sort -t : -n +2 +0 ratings.dat &gt; ratings-sorted.dat<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>cut the first and third columns of movies-ratings.dat, using the <span class=\"example\">:<\/span> as 
the delimiter, and count the unique lines<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">cut -d : -f 1,3 movies-ratings.dat | uniq -c<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>In a file where each line begins with &#8216;File&#8217; followed by one or more digits followed by &#8216;=&#8217;, e.g., &#8216;File23=&#8217;, find the duplicates<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">awk -F = &#8216;{print $2}&#8217; untitled.pls |sort|uniq -c |sort<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>Find all files from the current location with filenames of at least 50 characters<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">find . -exec basename {} \\; | sed -n &#8216;\/^.\\{50\\}\/p&#8217;<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>A file of closed captions needs to be cleaned up. Search for the blank lines and remove them as well as the two lines that follow the blank lines. This works by not printing everything from the blank line (\/^$\/) to the line with the colons (\/:\/). 
Since the first section to clean up doesn&#8217;t have a blank line to look for, begin on the 3rd line of the file.<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">% head -15 0273-mary_shelleys_frankenstein.cc<br \/>1<br \/>00:00:30,063 &#8211;&gt; 00:00:33,066<br \/>[ Woman ]<br \/>&#8220;I BUSIED MYSELF<br \/>TO THINK OF A STORY&#8230;<br \/><br \/>2<br \/>00:00:33,066 &#8211;&gt; 00:00:37,570<br \/>&#8220;WHICH WOULD SPEAK<br \/>TO THE MYSTERIOUS FEARS<br \/>OF OUR NATURE&#8230;<br \/><br \/>3<br \/>00:00:37,570 &#8211;&gt; 00:00:39,572<br \/>&#8220;AND AWAKEN&#8230;<br \/>%<br \/>% sed -n &#8216;3,${\/^$\/,\/:\/!p}&#8217; &lt; 0273-mary_shelleys_frankenstein.cc &gt; 0273-mary_shelleys_frankenstein.cc.clean <br \/>%<br \/>% head -7 0273-mary_shelleys_frankenstein.cc.clean<br \/>[ Woman ]<br \/>&#8220;I BUSIED MYSELF<br \/>TO THINK OF A STORY&#8230;<br \/>&#8220;WHICH WOULD SPEAK<br \/>TO THE MYSTERIOUS FEARS<br \/>OF OUR NATURE&#8230;<br \/>&#8220;AND AWAKEN&#8230;<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>Search for lines containing <span class=\"example\">::0038::<\/span> or <span class=\"example\">::0148::<\/span> or <span class=\"example\">::0187::<\/span>, use sed to replace the <span class=\"example\">::<\/span> field delimiters with a %, and then perform a numerical sort on the second column. 
Note that egrep is equivalent to grep -E<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">$ egrep &#8220;::0038::|::0148::|::0187::&#8221; ratings.dat | sed &#8216;s\/::\/%\/g&#8217; | sort -t % +1 -n &gt; match-ratings.txt<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>determine the disk usage of each subdirectory of the current directory, sort in descending order, and format for readability<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">$ du -s *|sort -n -r|awk &#8216;{printf(&#8220;%8.0fKB %s\\n&#8221;, $1, $2)}&#8217;<br \/>29223820KB bob<br \/>23038660KB tom<br \/>19999376KB sue<br \/>11010288KB andy<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>for columns 3-6125, find those columns that have some value other than &#8216;0,&#8217; and count the number of occurrences<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">\r\n<pre>#!\/bin\/sh\r\n\r\nfor col in $(seq 3 6125); do \r\n\techo \"column $col\"\r\n\tawk '{print $'$col'}' allshots2nd10minutes.shots | grep -vc \"0,\"\r\ndone\r\n<\/pre>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>print column 51 followed by the line number for this value, sorted by the values from column 51<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">$ awk &#8216;{print $51 &#8220;\\t&#8221; FNR}&#8217; allshots2nd5-10thIframes-sparse.shots 
|sort<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>extract the 6th column from all but the last line of <span class=\"example\">somefile<\/span><\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">$ head -n -1 somefile | awk &#8216;{print $6}&#8217;<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>print all but the first column of <span class=\"example\">somefile<\/span><\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">$ awk -f remove_first_column.awk somefile<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<p>where the file <span class=\"example\">remove_first_column.awk<\/span> consists of the following:<\/p>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">\r\n<pre># remove_first_column.awk\r\nBEGIN {\r\n\tORS=\"\"\r\n}\r\n{\r\n\tfor (i = 2; i &lt;= NF; i++)\r\n\t\tif (i == NF)\r\n\t\t\tprint $i \"\\n\"\r\n\t\telse\r\n\t\t\tprint $i \" \"\r\n}\r\n<\/pre>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>The first line of <span class=\"example\">file1<\/span> contains header information, which we don&#8217;t want. <span class=\"example\">file2<\/span> lacks the column headers and therefore contains one fewer line than <span class=\"example\">file1<\/span>. 
Extract all but the first line of <span class=\"example\">file1<\/span> and combine with the columns of <span class=\"example\">file2<\/span> to create <span class=\"example\">file3<\/span> with the vertical bar (|) as the delimiter between the columns of each.<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">$ tail -n+2 file1 | paste -d '|' - file2 &gt; file3<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>delete the lines up to and including the regular expression (REGEX)<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">$ sed '1,\/REGEX\/d;' somefile.txt<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>delete the lines up to the regular expression (REGEX)<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">$ sed -e '\/REGEX\/p' -e '1,\/REGEX\/d;' somefile.txt<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>delete all newlines (this turns the entire document into a single line)<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">$ tr -d '\\n' &lt; somefile.txt<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>combine groups of nonblank lines into a single line, where each group is separated by a single blank line. 
This works by first changing each blank line to XXXXX; second, each newline is replaced by a space; third, each XXXXX is now replaced with a newline in order to separate the original groups into lines.<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">$ cat somefile.txt\r\n<pre>this is the\r\nfirst section of\r\nthe file\r\n\r\nthis is the\r\nsecond section of\r\nthe file\r\n\r\nthis is the\r\nthird section of\r\nthe file\r\n<\/pre>\r\n$ sed &#8216;s\/^$\/XXXXX\/&#8217; somefile.txt | tr &#8216;\\n&#8217; &#8216; &#8216; | sed &#8216;s\/XXXXX\/\\n\/g&#8217;| sed &#8216;s\/^ \/\/&#8217;\r\n<pre>this is the first section of the file\r\nthis is the second section of the file\r\nthis is the third section of the file\r\n<\/pre>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<ol>\r\n<li style=\"list-style-type: none;\">\r\n<ol>\r\n<li>remove non-alphabetic characters and convert uppercase to lowercase<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<blockquote>\r\n<table border=\"1\" width=\"100%\">\r\n<tbody>\r\n<tr>\r\n<td class=\"example\" bgcolor=\"#CCCCCC\">$ tr -cs &#8220;[:alpha:]&#8221; &#8221; &#8221; &lt; somefile.txt | tr &#8220;[:upper:]&#8221; &#8220;[:lower:]&#8221;<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/blockquote>\r\n<p>&nbsp;<\/p>\r\n<h3><a name=\"references\"><\/a>References<\/h3>\r\n<ol>\r\n<li><a href=\"http:\/\/www.gnu.org\/software\/coreutils\/\">GNU core utilities<\/a><\/li>\r\n<li><a href=\"http:\/\/www-106.ibm.com\/developerworks\/linux\/edu\/l-dw-linux-gnutex-i.html\">Using the GNU text utilities<\/a><\/li>\r\n<li><a href=\"http:\/\/www.student.northpark.edu\/pemente\/awk\/awk1line.txt\">awk one-liners<\/a><\/li>\r\n<li><a href=\"http:\/\/www.gnu.org\/software\/gawk\/manual\/gawk.html\">The GNU Awk User&#8217;s Guide<\/a><\/li>\r\n<li><a href=\"http:\/\/www.grymoire.com\/Unix\/Awk.html#uh-4\">Awk: Dynamic 
Variables<\/a><\/li>\r\n<li><a href=\"http:\/\/sparky.rice.edu\/~hartigan\/awk.html\">How to Use Awk <\/a> (Hartigan)<\/li>\r\n<li><a href=\"http:\/\/www.student.northpark.edu\/pemente\/sed\/sed1line.txt\">sed one-liners<\/a><\/li>\r\n<li><a href=\"http:\/\/sed.sourceforge.net\/grabbag\/scripts\/\">sed scripts<\/a><\/li>\r\n<li><a href=\"http:\/\/www.grymoire.com\/Unix\/Sed.html\">Sed &#8211; An Introduction<\/a><\/li>\r\n<li><a href=\"http:\/\/www-106.ibm.com\/developerworks\/linux\/library\/l-p101\/\">Perl one-liners<\/a><\/li>\r\n<li><a href=\"http:\/\/www-106.ibm.com\/developerworks\/linux\/library\/l-p102.html\">Perl one-liners<\/a><\/li>\r\n<li><a href=\"http:\/\/www.troubleshooters.com\/codecorn\/littperl\/perlreg.htm\">Perl regular expressions<\/a><\/li>\r\n<li><a href=\"http:\/\/www.oreilly.com\/catalog\/upt3\/\">Unix Power Tools, 2<sup>nd<\/sup> Ed., O&#8217;Reilly<\/a><\/li>\r\n<li><a href=\"http:\/\/www.nostarch.com\/frameset.php?startat=lcbk2_stutz\">Linux Cookbook, 2<sup>nd<\/sup> Ed., No Starch Press<\/a><\/li>\r\n<li><a href=\"http:\/\/www.oreilly.com\/catalog\/unixnut3\/\">Unix in a Nutshell, 3<sup>rd<\/sup> Ed., O&#8217;Reilly<\/a><\/li>\r\n<li><a href=\"http:\/\/www.unixreview.com\/documents\/s=8989\/ur0412d\/\">John &amp; Ed&#8217;s Miscellaneous Unix Tips<\/a><\/li>\r\n<li><a href=\"http:\/\/www.oreilly.com\/catalog\/shellsrptg\/\">Classic Shell Scripting, O&#8217;Reilly<\/a> \u2014 great overview of the Unix philosophy of combining small tools that are each very good at a specific thing<\/li>\r\n<\/ol>\r\n","protected":false},"excerpt":{"rendered":"<p>I originally created and posted this November 22, 2004. Here are some examples of using the utilities found on Unix (available on some other platforms also) for manipulating the text in files. awk and perl both allow writing full programs, but I primarily use both as short one-liner programs which allows them to be piped to\/from other Unix programs. 
Each of these programs has capabilities that make it better than the others in some situations which I have attempted to&#8230;<\/p>\n<p class=\"read-more\"><a class=\"btn btn-default\" href=\"https:\/\/www.brezeale.com\/?p=185\"> Read More<span class=\"screen-reader-text\">  Read More<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[7],"tags":[],"class_list":["post-185","post","type-post","status-publish","format-standard","hentry","category-linux"],"_links":{"self":[{"href":"https:\/\/www.brezeale.com\/index.php?rest_route=\/wp\/v2\/posts\/185","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.brezeale.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.brezeale.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.brezeale.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.brezeale.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=185"}],"version-history":[{"count":7,"href":"https:\/\/www.brezeale.com\/index.php?rest_route=\/wp\/v2\/posts\/185\/revisions"}],"predecessor-version":[{"id":838,"href":"https:\/\/www.brezeale.com\/index.php?rest_route=\/wp\/v2\/posts\/185\/revisions\/838"}],"wp:attachment":[{"href":"https:\/\/www.brezeale.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=185"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.brezeale.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=185"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.brezeale.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=185"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}
","templated":true}]}}