Chapter 10

Line editors - ed and sed

Introduction

Now that we know about regular expressions, we can learn a little about Unix's line editors, ed and sed. There are about 20 commands in ed, but we will learn only the handful that are most useful. We can get by with so little knowledge because we will not be using ed directly.

Line editor

A line editor is a program that allows us to change a file one line at a time; we don't normally see the file or even the line we are changing while editing it. The alternative is a full-screen editor which shows us a screenful of the file while we are making changes. It may seem odd to learn about line editors when full-screen editors are available. However, the line editors and their commands are very powerful; they will be needed when we build our own tools and commands. Also, one of the most important features of the full-screen editor vi is that it allows us to use ed's powerful commands.

Starting ed

To edit an existing file, we use:

$ ed cars
727
1
The typical American male devotes more than 1,600 hours a

The 727 is the number of characters in the cars file; it was displayed by ed. The second number was typed by me; it is the command which makes the first line the current line. Mostly, ed carries out commands without displaying anything, but when it moves to a new line it displays that line. Notice that ed does not display a prompt; newcomers find that unsettling. Another peculiarity of ed is that it has only one error message, which consists of just a question mark!

Substitute command

The most used command in ed is the substitute command:

s/devote/waste/

It consists of the letter s followed by a character known as the delimiter; then comes a string of characters which is to be removed from the line, followed by the delimiter; lastly comes a string of characters which is to replace what was removed, followed again by the delimiter. So our example takes devote out of the line and puts waste in its place. The delimiter occurs three times because the first occurrence announces which character will be used in this particular substitute command. We could use a different delimiter each time we perform a substitution. In practice, most people use slash (/), as we have done, unless it occurs in one of the strings of characters.
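Because ed reads its commands interactively, it is awkward to show in a runnable script. A convenient way to try a substitute command is to feed one line of text through sed, the stream editor described later in this chapter, which accepts the same s command. The sketch below assumes sed behaves like ed for a single line, which holds for these simple cases. The sample path is made up. It shows why another delimiter is handy when the text itself contains slashes.

```shell
# Replace /usr/local with /opt in a sample path (the path is only an example).
# With slash as the delimiter, every slash inside the strings must be escaped:
with_slash=$(printf '%s\n' '/usr/local/bin' | sed 's/\/usr\/local/\/opt/')

# With a different delimiter (here |), no escaping is needed:
with_bar=$(printf '%s\n' '/usr/local/bin' | sed 's|/usr/local|/opt|')

echo "$with_slash"   # /opt/bin
echo "$with_bar"     # /opt/bin
```

Both commands give the same result. The second is much easier to read, which is the usual reason for choosing a different delimiter.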

Errors

If the string of characters we have asked to be removed does not occur in the current line, ed will display its error message:

s/demote/waste/
?

It will not try to find the string on some other line.

Substitute suffix - p

Do not expect ed to display the new version of the line - it won't. We can make that happen by adding a p to the end of the command. Here is another substitute command:

sXmaleXmanXp
The typical American man wastes more than 1,600 hours a

It has a different delimiter (X) and has the p suffix to make ed display the latest version of the line.

Substitute suffix - g

The basic form of the substitute command changes only the first match on the line. If we want to change every occurrence in the current line, we have to add a g suffix, as shown here:

s/an/AN/gp
The typical AmericAN mAN wastes more thAN 1,600 hours a

It is OK to use two suffixes but they must be in the order shown. Notice, however, that only the current line is changed; all the others are left alone.
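The difference the g suffix makes can be tried non-interactively by passing the chapter's sample line through sed. This is a sketch that assumes sed's s command treats a single line as ed's does:

```shell
line='The typical American man wastes more than 1,600 hours a'

# Without g, only the first match on the line is changed.
first=$(printf '%s\n' "$line" | sed 's/an/AN/')

# With g, every match on the line is changed.
every=$(printf '%s\n' "$line" | sed 's/an/AN/g')

echo "$first"   # The typical AmericAN man wastes more than 1,600 hours a
echo "$every"   # The typical AmericAN mAN wastes more thAN 1,600 hours a
```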

Regular expressions

The first string of characters in the substitute command is actually a regular expression (RE) as this example shows:

s/AN.*AN/an man wastes more than/p
The typical American man wastes more than 1,600 hours a

Notice that ed matches as many characters as possible against the RE.
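The greediness of .* can be seen by replacing the whole match with a single X. This sketch uses sed in place of ed and assumes identical matching rules for one line. The second command, with a bracket expression that cannot cross an A, is one way (not the only way) to stop at the nearer AN instead:

```shell
line='The typical AmericAN mAN wastes more thAN 1,600 hours a'

# .* is greedy: the match runs from the first AN to the LAST AN on the line.
greedy=$(printf '%s\n' "$line" | sed 's/AN.*AN/X/')

# [^A]* cannot step past an A, so this match stops at the next AN.
nearest=$(printf '%s\n' "$line" | sed 's/AN[^A]*AN/X/')

echo "$greedy"    # The typical AmericX 1,600 hours a
echo "$nearest"   # The typical AmericX wastes more thAN 1,600 hours a
```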

Because we can use REs, ed does not need separate commands to insert text at the ends of lines or to split a line in two. The substitute command is used for all three jobs. For instance:

s/^/Fact: /p
Fact: The typical American man wastes more than 1,600 hours a

inserts text at the start of a line, and:

s/$/t/p
Fact: The typical American man wastes more than 1,600 hours at

appends text to the end of the line.
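The same two substitutions can be repeated with sed. This is a sketch assuming sed's s command matches ^ and $ exactly as ed's does on a single line:

```shell
line='The typical American man wastes more than 1,600 hours a'

# ^ matches the empty string at the start of the line,
# so the replacement text ends up in front of everything else.
start=$(printf '%s\n' "$line" | sed 's/^/Fact: /')

# $ matches the empty string at the end of the line,
# so the replacement text is appended.
both=$(printf '%s\n' "$start" | sed 's/$/t/')

echo "$start"   # Fact: The typical American man wastes more than 1,600 hours a
echo "$both"    # Fact: The typical American man wastes more than 1,600 hours at
```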

Splitting a line

The following command changes the space character between n and m into a new-line:

s/n m/n\
m/p
man wastes more than 1,600 hours at
-
Fact: The typical American
u
p
Fact: The typical American man wastes more than 1,600 hours at

It is the backslash (\) immediately before the end of the line that tells ed that the new-line is part of the replacement string. Without the backslash, ed would take the new-line as the end of the command. Notice the use of the - command to go to the line above the current line, and the use of the u command to undo the effect of the previous change.
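A backslash followed by a new-line in the replacement works the same way in POSIX sed. The sketch below uses sed instead of ed, so there is no current line and no u command. It only shows the split itself:

```shell
line='Fact: The typical American man wastes more than 1,600 hours at'

# The backslash at the very end of the first line of the command makes
# the following new-line part of the replacement, splitting the line.
split=$(printf '%s\n' "$line" | sed 's/n m/n\
m/')

printf '%s\n' "$split"
# Fact: The typical American
# man wastes more than 1,600 hours at

# Count the lines to confirm that one line became two.
lines=$(printf '%s\n' "$split" | wc -l | tr -d ' ')
echo "$lines"   # 2
```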

Missing replacement string

The string of characters which replaces those matched by the RE is known as the replacement string. If the replacement string is empty, the characters matching the RE are simply removed, as shown here:

s/......//p
The typical American man wastes more than 1,600 hours at
s/.$//p
The typical American man wastes more than 1,600 hours a

Notice that when looking for a match for a RE, ed starts at the left-hand end of the line. That is why the string of dots matched the first six characters on the line. It was the dollar sign after the single dot that made it match the last character on the line.
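Both deletions can be checked with sed. This is a sketch assuming sed's leftmost matching on a single line is the same as ed's:

```shell
line='Fact: The typical American man wastes more than 1,600 hours at'

# Six dots match the first six characters ("Fact: "), because matching
# starts at the left-hand end; the empty replacement deletes them.
trimmed=$(printf '%s\n' "$line" | sed 's/......//')

# .$ can only match the last character on the line.
shorter=$(printf '%s\n' "$trimmed" | sed 's/.$//')

echo "$trimmed"   # The typical American man wastes more than 1,600 hours at
echo "$shorter"   # The typical American man wastes more than 1,600 hours a
```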

& in the replacement string

An ampersand (&) in the replacement string has a special meaning. Let's see it in action before looking at the details:

s/man/(&)/p
The typical American (man) wastes more than 1,600 hours a

The ampersand means "whatever the RE matched". In the example that was "man", so the overall effect is to enclose the word in parentheses. If needed, more than one ampersand can be used, perhaps to duplicate the matched text.
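The sketch below tries & in sed, assuming it behaves as in ed. It also shows the escaped form \&, which puts a literal ampersand into the replacement. The sample strings other than the chapter's line are made up:

```shell
line='The typical American man wastes more than 1,600 hours a'

# & stands for whatever the RE matched, here "man".
wrapped=$(printf '%s\n' "$line" | sed 's/man/(&)/')

# Two ampersands duplicate the matched text.
doubled=$(printf '%s\n' 'man' | sed 's/man/&&/')

# A backslash before & gives a literal ampersand instead.
literal=$(printf '%s\n' 'cars and trucks' | sed 's/ and / \& /')

echo "$wrapped"   # The typical American (man) wastes more than 1,600 hours a
echo "$doubled"   # manman
echo "$literal"   # cars & trucks
```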

Accessing parts of a string

We can enclose parts of a complex regular expression in escaped parentheses, \( and \). Doing so lets us split the complex regular expression into smaller parts and refer to the smaller parts individually. For example:

s/\(.*American\).*\(wastes.*\)/\1 person \2/p
The typical American person wastes more than 1,600 hours a

Here the regular expression matched the whole line. The match was done in three parts. The first part was up to and including "American". The last was from "wastes" to the end of the line. The middle part was all the characters between the first and last parts, that is "(man)" (and the surrounding space characters).

In the replacement string, \1 means the text matched by the first set of escaped parentheses. Similarly, \2 means the text matched by the second set. So our line of text is replaced by the first and last parts, with "person" (and spaces) sandwiched between them. Notice that we didn't bother to wrap the middle part in escaped parentheses because we had no need to refer to it later.

This facility looks complicated but is extremely useful when building tools to alter complex text. If the need arises, we can refer to up to nine parts of a RE, using \1 to \9.
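The chapter's example can be rerun with sed, assuming sed handles \( \) and \1 to \9 as ed does. A second, made-up example then rearranges a year-month-day date into day/month/year using three groups. In that one, | is the delimiter, so the slashes in the replacement need no backslashes:

```shell
# The chapter's example: two groups, with the middle part left ungrouped.
line='The typical American (man) wastes more than 1,600 hours a'
person=$(printf '%s\n' "$line" | sed 's/\(.*American\).*\(wastes.*\)/\1 person \2/')

# Illustration only: three groups reorder the parts of a date.
reordered=$(printf '%s\n' '2012-04-05' |
    sed 's|\([0-9]*\)-\([0-9]*\)-\([0-9]*\)|\3/\2/\1|')

echo "$person"      # The typical American person wastes more than 1,600 hours a
echo "$reordered"   # 05/04/2012
```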

Other commands

Because so much can be done with the substitute command and regular expressions, ed only has about 20 commands. We will only learn seven more of them here. You can read ed's man page if you need to find out about the rest.

Our seven new commands are:

Command | Function
--------+-------------------------
d       | delete the current line
p       | display the current line
u       | undo the last change
w       | write changes to disk
q       | quit
g       | global: run commands on every line matching a RE
v       | inverse global: run commands on every line not matching a RE

Here are the first three of those commands in action:

p
The typical American person wastes more than 1,600 hours a
d
p
year to his car.  He sits in it while it goes and while it
u
p
The typical American person wastes more than 1,600 hours a

The lines containing just a single character are the commands. Notice that the delete command causes no output, and that the line after the deleted one becomes the current line. Also, when the deletion is undone, the restored line becomes the current line.

Searching

We can look for a line containing a particular word by using a command like this:

/spends/
tickets.  He spends four of his sixteen waking hours on the

The search starts at the line after the current line; it continues towards the last line of the file and wraps round from the last line to the first one if needed. It is also possible to search from the current line back towards line one:

?petrol?
He works to pay for petrol, tolls, insurance, taxes and

This search also wraps round.

Context addresses

These ways of referring to lines:

/spends/
?petrol?

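are known as context addresses; they can be used as if they were line numbers. Notice that we can use REs in context addresses; we aren't limited to fixed text. Note, however, that unlike the delimiter in the substitute command, these delimiters are fixed: slashes search forwards and question marks search backwards.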

Missing RE

If we do not give ed anything where it is expecting a RE, it re-uses the last RE we did give. This is useful because it saves us typing, as shown here:

/[A-Z][a-z][a-z]*/
He works to pay for petrol, tolls, insurance, taxes and
//
tickets.  He spends four of his sixteen waking hours on the
//
road or gathering resources for it.  The model American puts

The same thing applies to the substitute command too:

s//A/p
road or gathering resources for it.  A model American puts

In all the above commands ed re-used the RE which represents a word starting with a capital letter.
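sed re-uses the previous RE in the same way, which makes it easy to check the last substitution. This sketch assumes sed's empty-RE rule matches ed's. It sets LC_ALL=C so that [A-Z] and [a-z] mean only ASCII capitals and lower-case letters whatever the user's locale:

```shell
LC_ALL=C
export LC_ALL

line='road or gathering resources for it.  The model American puts'

# The address /[A-Z][a-z][a-z]*/ selects the line; the empty RE in
# s//A/ re-uses it, so the first capitalised word ("The") becomes "A".
changed=$(printf '%s\n' "$line" | sed '/[A-Z][a-z][a-z]*/s//A/')

echo "$changed"   # road or gathering resources for it.  A model American puts
```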

Line numbers

If we put a line number in front of an ed command it will affect only the appropriate line of the file. We can also specify a range of lines by giving two line numbers separated by a comma. So this:

11p

would display line 11, and this:

2,6p

would display lines two to six. If we had used d instead of p, we would have deleted five lines.
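The count of five is easy to confirm with sed, which accepts the same line ranges. In this sketch, ten numbered lines stand in for a ten-line file, as an assumption for the demonstration:

```shell
# 2,6d deletes lines two to six inclusive.
remaining=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 | sed '2,6d')
printf '%s\n' "$remaining"   # prints 1, 7, 8, 9 and 10, one per line

# Ten lines in, five lines out: five lines were deleted.
left=$(printf '%s\n' "$remaining" | wc -l | tr -d ' ')
echo "$((10 - $left)) lines deleted"   # 5 lines deleted
```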

This example shows line numbers in front of the substitute command:

2s/car/automobile/p
year to his automobile.  He sits in it while it goes and while it
1,12s/ it / his car /gp
money to put down on his car and to meet the monthly instalments.

Notice that ed makes the last line on which a substitution was made the new current line, and displays only that line, even though it may have changed several lines.

Special line numbers

This table shows some special line numbers and their meanings:


Symbol | Meaning
-------+----------------------------
1      | the first line
$      | the last line
.      | the current line
0      | the line before the first!

If a command is given without any line number, ed applies it to the current line. Combining the special line numbers above with context addresses and a little arithmetic gives ranges like these:

Range           | Meaning
----------------+------------------------------------------------------------
1,$             | all lines
.,$             | from the current line to the end of the file
1,.             | from line one to the current line
.-2,.+2         | the five lines centred on the current line
/parks/,/four/  | from the line containing parks to the line containing four

For example, this command would display all of the file:

1,$p

But we needn't see the output here!

Global command - g

The trouble with trying to change all lines in a file by using 1,$ is that the command fails, with ed's ? message, if the change cannot be made on any of the lines. To solve this problem we can use the global command; it has this format:

g/RE/commands

where commands represents any ed commands we wish to perform on the lines that match the RE. Here is an example:

g/car/p
year to his automobile.  He sits in his car while his car goes and while it
stands idling.  He parks his car and searches for it.  He earns the
money to put down on his car and to meet the monthly instalments.

In the example, commands has been replaced with the p command which displays the line. The net effect of our global command therefore is to display all lines containing car.

Notice the format of the last command. It was g/RE/p or, ignoring the slashes, gREp. Performing that command efficiently, as a separate program, is where grep gets its name.

Here is another g command:

g/automobile/s//car/g

Here the command is a substitute command. See how we didn't have to type the RE again; ed re-used the previous one, which in this case was automobile. The advantage of using the global command is that it can safely be used to change all lines. There is no problem if none of them contains automobile.

Do not confuse the g at the start of the line with the one at the end. The first one means: all lines containing the RE; the second means: make all possible changes on each line.

If we wish to see the changed lines we can use a command like this:

g/his car/s//it/gp
year to it.  He sits in it while it goes and while it
stands idling.  He parks it and searches for it.  He earns the
money to put down on it and to meet the monthly instalments.

It is the p suffix on the substitute command that makes the difference.

Often, we wish to delete certain lines. Here is how it is done:

g/ it /d

We used the d command to delete the lines containing the RE.
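sed has close counterparts of both g commands: /RE/d and /RE/s//new/g act only on the lines matching the RE. The sketch below assumes sed selects lines exactly as ed's g does. The three sample lines are adapted from the cars file:

```shell
sample=$(printf '%s\n' \
    'The typical American male devotes more than 1,600 hours a' \
    'year to his automobile.  He sits in it while it goes and while it' \
    'He works to pay for petrol, tolls, insurance, taxes and')

# Like g/automobile/s//car/g: change every automobile on matching lines.
renamed=$(printf '%s\n' "$sample" | sed '/automobile/s//car/g')

# Like g/ it /d: delete every line containing " it ".
dropped=$(printf '%s\n' "$sample" | sed '/ it /d')

printf '%s\n' "$renamed"
printf '%s\n' "$dropped"   # the first and third lines only
```

As with ed's g command, lines that do not match are left alone. Nothing goes wrong if no line matches at all.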

The inverse of g - v

The v command is the inverse of g in that the commands are performed on lines that do not match the RE. This command would delete the lines that the previous command did not delete:

v/ it /d

Since all lines have now been deleted, we must take care not to let ed write the changes back to disk. The w command would do that. Instead, all we have to do is quit, using the q command:

q
?
q
$

Notice that we have to repeat the command to confirm that we are aware that we have changed the text but not written it to disk.
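The v command's effect, keeping only the lines that match, can be imitated with sed. Use the -n option (described later in this chapter) together with the p command. This is a sketch on three sample lines adapted from the cars file, assuming sed selects lines as ed does:

```shell
sample=$(printf '%s\n' \
    'The typical American male devotes more than 1,600 hours a' \
    'year to his car.  He sits in it while it goes and while it' \
    'He works to pay for petrol, tolls, insurance, taxes and')

# Like v/ it /d: throw away the lines WITHOUT " it " and keep the rest.
# -n stops sed printing every line; p prints just the matching ones.
kept=$(printf '%s\n' "$sample" | sed -n '/ it /p')

echo "$kept"   # year to his car.  He sits in it while it goes and while it
```

Applying both g/ it /d and v/ it /d, as in the ed session above, therefore leaves nothing at all.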

Stream editor - sed

The editor ed was designed for interactive use. Typically, it moves backwards and forwards between the lines of a file in whatever order the user chooses, making changes as directed. The other line editor, sed, is a stream editor. It always starts at the first line and works through, line by line, towards the end of the text. Traditionally, sed never puts the changed text back into a file; it simply writes the new version of the text to the standard output. (Some versions, such as GNU sed, have a -i option for editing a file in place.) It is not restricted to working on text in a file; it can edit its standard input too. It is more efficient than ed for non-interactive edits and can handle larger files.

Apart from those differences in how the two editors operate, there is an important difference between ed and sed commands. The ed commands are done only on the current line, unless we specify otherwise. The sed commands are done on every line, unless we specify otherwise.

For example, this ed command:

s/He/She/

would only affect the current line. When we use it with sed it is attempted on every line, as shown:

$ sed 's/He/She/' cars
The typical American male devotes more than 1,600 hours a
year to his car.  She sits in it while it goes and while it
stands idling.  She parks it and searches for it.  He earns the
money to put down on it and to meet the monthly instalments.
She works to pay for petrol, tolls, insurance, taxes and
tickets.  She spends four of his sixteen waking hours on the
road or gathering resources for it.  The model American puts
in 1,600 hours to get 7,500 miles: less than five miles per
hour.  In countries deprived of a transportation industry,
people manage to do the same, walking wherever they want to
go, and they allocate only three to eight percent of their
society's time budget to traffic instead of 28 per cent.

Ivan Illich
$

As you can see, the first of sed's parameters is the editing command; the second is a file name. The quotation marks around the editing command were, strictly speaking, not necessary. When we looked at grep, we saw that it is usually a good idea to wrap REs in quotation marks so that the shell does not interpret the metacharacters and spaces. The same reasoning applies to sed's commands.

A very important point about the previous example is that cars is just an input file. Its contents were not altered. The edited text was only sent to the standard output, not put back into the file.

If we need to save the new version of the text, we have to do this:

$ sed 's/He/She/' cars > newcars
$

which redirects the output into a file.
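The sketch below confirms that the input file is only read. It works in a temporary directory so that no real file is touched. The one-line "cars" file is a stand-in made up for the demonstration:

```shell
# Create a throw-away directory and a small stand-in for the cars file.
dir=$(mktemp -d)
printf '%s\n' 'year to his car.  He sits in it' > "$dir/cars"

# Redirect sed's output into a new file; the input file is not altered.
sed 's/He/She/' "$dir/cars" > "$dir/newcars"

original=$(cat "$dir/cars")
edited=$(cat "$dir/newcars")
echo "$original"   # year to his car.  He sits in it
echo "$edited"     # year to his car.  She sits in it

# Tidy up the temporary files.
rm -r "$dir"
```

Do not redirect the output into the input file itself, as in sed 's/He/She/' cars > cars. The shell empties cars before sed gets to read it, so the text is lost.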

Just as with ed, we can use line numbers or context addresses. For example:

$ sed '2s/He/She/' cars > newcars
$ sed '2,6s/He/She/' cars > newcars
$ sed '/typical/,/parks/s/He/She/' cars > newcars
$
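The context-address range in the last command can be watched at work on a few lines adapted from the cars file. The range runs from the first line matching typical to the next line matching parks. The substitution is attempted only on the lines in between; the rest pass through unchanged:

```shell
result=$(printf '%s\n' \
    'The typical American male devotes more than 1,600 hours a' \
    'year to his car.  He sits in it while it goes and while it' \
    'stands idling.  He parks it and searches for it.  He earns the' \
    'He works to pay for petrol, tolls, insurance, taxes and' |
  sed '/typical/,/parks/s/He/She/')

printf '%s\n' "$result"
# Lines 2 and 3 now start their second sentence with "She";
# line 4 is outside the range, so its "He" is left alone.
```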

Multiple commands

So far, we have made only one change at a time with sed. Usually, however, we wish to make several. Here is how it is done:

$ sed -e 's/car/auto/' -e 's/petrol/gas/' cars > newcars
$

The difference is that each editing command is preceded by the -e option; without it, sed would take the second and subsequent editing commands to be file names.

For clarity, we could split a long command over several lines, with each editing command on its own line:

$ sed -e 's/car/auto/' \
>     -e 's/petrol/gas/' \
>     -e 's/tickets/fines/' cars > newcars
$

Don't be confused by the >; it is the secondary prompt, used by the shell to remind the user that the command has spilled over onto another line. It was typed by the shell, not by me.

We could achieve the same effect like this:

$ sed  's/car/auto/
>       s/petrol/gas/
>       s/tickets/fines/' cars > newcars
$

The last method, without the -e options and with only one set of quotation marks, is by far the neatest. Notice how, in the previous two examples, the editing commands have been lined up with each other to enhance readability.
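The two styles are interchangeable. The sketch below runs both on a made-up sample sentence and shows they produce the same text:

```shell
text='He pays for the car, petrol and tickets'

# Style 1: one -e option per editing command.
one=$(printf '%s\n' "$text" | sed -e 's/car/auto/' -e 's/petrol/gas/' -e 's/tickets/fines/')

# Style 2: one quoted argument, one editing command per line.
two=$(printf '%s\n' "$text" | sed 's/car/auto/
s/petrol/gas/
s/tickets/fines/')

echo "$one"   # He pays for the auto, gas and fines
echo "$two"   # He pays for the auto, gas and fines
```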

Most common usage

The next example shows what is probably the most common way of using sed:

$ sed '/Man/d
>      /car/s//auto/
>      /petrol/s//gas/
>      /tickets/s//fines/' cars > newcars
$

All the commands look just like ed's g command except they don't start with "g"! And they work exactly the same, executing the command after the initial RE only on the lines that match the RE. The empty RE in the substitute commands causes the previous RE (which is car in the first substitute command) to be re-used. The big advantage of this variation is that all the editing commands begin with the RE, making the whole thing easier to read.
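A small run of that pattern is shown below. Each command starts with a RE that selects the lines, and the empty RE in each substitute command re-uses it. The three input lines are invented for the illustration:

```shell
out=$(printf '%s\n' \
    'Man of the road' \
    'He parks his car' \
    'He pays for petrol' |
  sed '/Man/d
/car/s//auto/
/petrol/s//gas/')

printf '%s\n' "$out"
# He parks his auto
# He pays for gas
```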

Quiet operation - -n

Usually, sed writes every line to the standard output after making any changes. The -n option and sed's p command allow us to display only certain lines of a file. This example displays line four:

$ sed -n '4p' cars
money to put down on it and to meet the monthly instalments.
$

And this example makes sed do the same as grep:

$ sed -n '/model/p' cars
road or gathering resources for it.  The model American puts
$

That is, it displays the lines matching the RE (model).
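The claim that sed -n '/RE/p' does the same as grep can be checked by comparing their outputs on made-up sample lines. The sketch also prints a single line by number:

```shell
sample=$(printf '%s\n' 'one car' 'no vehicle' 'two cars')

# Both commands should print only the lines containing "car".
via_sed=$(printf '%s\n' "$sample" | sed -n '/car/p')
via_grep=$(printf '%s\n' "$sample" | grep 'car')

# With -n, a line number and p print just that one line.
second=$(printf '%s\n' "$sample" | sed -n '2p')

echo "$via_sed"   # one car, then two cars
echo "$second"    # no vehicle
```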

Standard input

So far, we have only seen sed operating on files. This example is different because it shows sed operating on its standard input:

$ date | sed 's/:/ /g' | wc -w
       8
$

Here, the standard input comes via a pipe from the date command; sed changes the colons in the time to spaces and then wc counts the words in sed's output.

This may be a contrived example but it demonstrates a very common way of using sed.
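The output of date changes every time, so the sketch below feeds a fixed string through the same pipeline to make the count repeatable. The string's format is an assumption; the real format depends on the system and locale. The tr removes the leading spaces that some versions of wc print:

```shell
stamp='Thu Apr  5 17:45:00 BST 2012'

# Colons become spaces, so 17:45:00 counts as three words.
words=$(printf '%s\n' "$stamp" | sed 's/:/ /g' | wc -w | tr -d ' ')

echo "$words"   # 8
```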

sed: command garbled

The following command looks perfectly straightforward:

sed 's/model/typical/ ' cars

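Surely it just changes "model" in its input to "typical"? In fact, when we run it with the version of sed used here, we get the following (the exact wording of the message varies between versions of sed):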

sed: command garbled: s/model/typical/

The problem is that sed is very intolerant of extra spaces, and there is one at the end of the substitute command. To make matters worse, there is no sign of the extra space in sed's error message. Also, the error message is usually all sed has to say when you get an editing command wrong.

QUESTIONS

In these questions, you have to work out the effect of the given substitute command on this line of text:

Smith, A.B.

Each substitution is to be done on the original line, not on the result of the previous substitution.

  1. s/ith/ythe/

  2. s/./-/

  3. s/./-/g

  4. s/.*/-/

  5. s/,/./

  6. s/\./,/g

  7. s/m.t/mot/

  8. s?m.t?mot?

  9. s/^/Jean /

  10. s/$/ (Ms)/

  11. s/[mi]/X/

  12. s/[im]/X/

  13. s/[mi][mi]*/XX/

  14. s/[mi]*/XX/

  15. s/[A-Z]/X/

  16. s/[a-z]/X/

  17. s/[a-z][a-z]*/X/

  18. s/[A-Z]/X/g

  19. s/[^AB]/X/g

  20. s/[^,]*/X/

  21. s/^[^,]*, .\./X/

  22. s/.*/& (&)/

  23. s/^\([^,]*\), *\(.*\)/\2 \1/

ANSWERS

  1. Smythe, A.B.
    
  2. -mith, A.B.
    
  3. -----------
    
  4. -
    
  5. Smith. A.B.
    
  6. Smith, A,B,
    
  7. Smoth, A.B.
    
  8. Smoth, A.B.
    
  9. Jean Smith, A.B.
    
  10. Smith, A.B. (Ms)
    
  11. SXith, A.B.
    
  12. SXith, A.B.
    
  13. SXXth, A.B.
    
  14. XXSmith, A.B.
    
  15. Xmith, A.B.
    
  16. SXith, A.B.
    
  17. SX, A.B.
    
  18. Xmith, X.X.
    
  19. XXXXXXXAXBX
    
  20. X, A.B.
    
  21. XB.
    
  22. Smith, A.B. (Smith, A.B.)
    
  23. A.B. Smith
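The answers can also be checked mechanically. The script below is a sketch that assumes sed's s command gives the same result as ed's on a single line, which holds for these commands. It sets LC_ALL=C so that [a-z] and [A-Z] cover only ASCII letters. The check function is a hypothetical helper written for this illustration:

```shell
LC_ALL=C
export LC_ALL

line='Smith, A.B.'
failures=0

# check COMMAND EXPECTED: run COMMAND on the sample line and compare.
check() {
    got=$(printf '%s\n' "$line" | sed "$1")
    if [ "$got" = "$2" ]; then
        echo "ok:   $1"
    else
        echo "FAIL: $1 gave '$got', expected '$2'"
        failures=$((failures + 1))
    fi
}

check 's/ith/ythe/'                  'Smythe, A.B.'
check 's/./-/'                       '-mith, A.B.'
check 's/./-/g'                      '-----------'
check 's/.*/-/'                      '-'
check 's/,/./'                       'Smith. A.B.'
check 's/\./,/g'                     'Smith, A,B,'
check 's/m.t/mot/'                   'Smoth, A.B.'
check 's?m.t?mot?'                   'Smoth, A.B.'
check 's/^/Jean /'                   'Jean Smith, A.B.'
check 's/$/ (Ms)/'                   'Smith, A.B. (Ms)'
check 's/[mi]/X/'                    'SXith, A.B.'
check 's/[im]/X/'                    'SXith, A.B.'
check 's/[mi][mi]*/XX/'              'SXXth, A.B.'
check 's/[mi]*/XX/'                  'XXSmith, A.B.'
check 's/[A-Z]/X/'                   'Xmith, A.B.'
check 's/[a-z]/X/'                   'SXith, A.B.'
check 's/[a-z][a-z]*/X/'             'SX, A.B.'
check 's/[A-Z]/X/g'                  'Xmith, X.X.'
check 's/[^AB]/X/g'                  'XXXXXXXAXBX'
check 's/[^,]*/X/'                   'X, A.B.'
check 's/^[^,]*, .\./X/'             'XB.'
check 's/.*/& (&)/'                  'Smith, A.B. (Smith, A.B.)'
check 's/^\([^,]*\), *\(.*\)/\2 \1/' 'A.B. Smith'

echo "$failures failure(s)"
```

Answer 14 is the one that surprises most people. [mi]* can match zero characters, so its leftmost match is the empty string in front of the S.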
    

http://homepages.shu.ac.uk/~cmsps/unix/edsed.html
Last updated: Thursday 05 April 2012 at 17:45