Now that we know about
regular expressions, we can learn a little about Unix's line
editors
ed
and
sed.
There are about 20 commands in
ed
but we will only learn the eight most useful ones.
We can get by with so little knowledge because we will not be using
ed
directly.
A
line
editor is a program that allows us to change a file
one line at a time; we don't normally see the file or even the line
we are changing while editing it.
The alternative is a
full-screen
editor which shows us a screenful of the file while we are making changes.
It may seem odd to learn about line editors when
full-screen editors are available.
However, the line editors and their commands are very powerful;
they will be needed when we build our own tools and commands.
Also, one of the most important features of the full-screen editor
vi
is that it allows us to use
ed's powerful commands.
To edit an existing file, we use:
$ ed cars
727
1
The typical American male devotes more than 1,600 hours a
The
727
is the number of characters in the
cars
file; it was displayed by
ed.
The second number was typed by me; it is the command which makes
the first line the current line.
Mostly,
ed
does commands without displaying anything but when it moves to a
new line it does display it.
Notice that
ed
does not display a prompt;
newcomers find that unsettling.
Another peculiarity of
ed
is that it has only one error message - which consists of just a question mark!
The most used command in
ed
is the substitute command:
s/devote/waste/
It consists of the letter
s
followed by a character known as the
delimiter;
then comes a string of characters which is to be removed from the line,
followed by the delimiter;
lastly comes a string of characters which is to replace what was removed,
followed again by the delimiter.
So our example takes
devote
out of the line and puts
waste
in its place.
The reason that the delimiter occurs three times is that the first one
announces which character will be used in this particular substitute command.
We could use a different delimiter each time we perform a substitution.
In practice, most people use slash
(/)
like we have done unless it occurs in one of the strings of characters.
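As a small sketch of when a different delimiter helps, the following uses sed (the stream editor covered later in this section, whose substitute command takes the same form) so that it can be run non-interactively. The path is made up purely for illustration. Because both strings contain slashes, a comma serves as the delimiter:

```shell
# Hypothetical input: a path full of slashes, so / is an awkward delimiter.
path='/usr/local/bin'

# The character straight after s becomes the delimiter; here it is a comma.
result=$(printf '%s\n' "$path" | sed 's,/usr/local,/opt,')
echo "$result"    # prints /opt/bin
```

With slash as the delimiter, every slash inside the two strings would have to be escaped with a backslash.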
If the string of characters we have asked to be removed does not
occur in the current line,
ed
will display its error message:
s/demote/waste/
?
It will not try to find the string on some other line.
Do not expect
ed
to display the new version of the line - it won't.
We can make that happen by adding a
p
to the end of the command.
Here is another substitute command:
sXmaleXmanXp
The typical American man wastes more than 1,600 hours a
It has a different delimiter
(X)
and has the
p
suffix to make
ed
display the latest version of the line.
The basic form of the substitute command only makes
one
change to the line.
If we want to change every occurrence in the current line, we
have to add a
g
suffix, as shown here:
s/an/AN/gp
The typical AmericAN mAN wastes more thAN 1,600 hours a
It is OK to use two suffixes but they must be in the order shown. Notice, however, that only the current line is changed; all the others are left alone.
The first string of characters in the substitute command is actually a regular expression (RE) as this example shows:
s/AN.*AN/an man wastes more than/p
The typical American man wastes more than 1,600 hours a
Notice that
ed
matches as many characters as possible against the RE.
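The effect of this "as many as possible" rule can be sketched with sed (described later in this section), which treats the RE in a substitute command the same way. The replacement text [] is only a marker chosen for this illustration, to show what was removed:

```shell
line='The typical AmericAN mAN wastes more thAN 1,600 hours a'

# .* takes as much as it can: the match runs from the first AN
# to the last AN on the line.
greedy=$(printf '%s\n' "$line" | sed 's/AN.*AN/[]/')

# [^A]* cannot cross a capital A, so the match stops at the second AN.
limited=$(printf '%s\n' "$line" | sed 's/AN[^A]*AN/[]/')

echo "$greedy"     # The typical Americ[] 1,600 hours a
echo "$limited"    # The typical Americ[] wastes more thAN 1,600 hours a
```

Replacing the dot with a character class that excludes something is one common way to keep a match from running further than intended.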
Because we can use REs,
ed
does not need separate commands to insert text at the ends
of lines or to split a line in two.
The substitute command is used for all three jobs.
For instance:
s/^/Fact: /p
Fact: The typical American man wastes more than 1,600 hours a
inserts text at the start of a line, and:
s/$/t/p
Fact: The typical American man wastes more than 1,600 hours at
sticks text to the end of the line.
The following command changes the space character between
n
and
m
into a new-line:
s/n m/n\
m/p
man wastes more than 1,600 hours at
-
Fact: The typical American
u
p
Fact: The typical American man wastes more than 1,600 hours at
It is the backslash
(\) immediately before the end of the line that
tells
ed
that the new-line is part of the replacement string.
Without the backslash,
ed
would take the new-line as the end of the command.
Notice the use of the
-
command to go to the line above the current
line, and the use of the
u
command to undo the effect of
the previous change.
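The same backslash-and-new-line trick can be tried outside ed. This is a minimal sketch using sed (introduced later in this section), assuming its substitute command handles the new-line in the same way, which is the case for the standard versions:

```shell
line='Fact: The typical American man wastes more than 1,600 hours at'

# The backslash at the end of the first line makes the new-line part of
# the replacement string instead of ending the command.
split=$(printf '%s\n' "$line" | sed 's/n m/n\
m/')
echo "$split"
```

The output is two lines: the first ends with "American" and the second starts with "man".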
The string of characters which replace those matched by the RE is known as the replacement string. If no replacement string is supplied the characters matching the RE are simply removed - as shown here:
s/......//p
The typical American man wastes more than 1,600 hours at
s/.$//p
The typical American man wastes more than 1,600 hours a
Notice that when it is finding a match for a RE,
ed
starts looking at the
left-hand
end of the line.
That is why the string of dots matched the
first
six characters on the line.
It was the dollar sign after the single dot that made it match the
last
character on the line.
An ampersand (&) in the replacement string has a special meaning.
Let's see it in action before looking at the details:
s/man/(&)/p
The typical American (man) wastes more than 1,600 hours a
The ampersand means: whatever the RE matched in the most recent match. In the example that was "man", so the overall effect is to enclose the word in parentheses. If needed, more than one ampersand can be used, perhaps to duplicate the matched text.
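Here is a short sketch of the ampersand used twice, run through sed (covered later in this section) because it accepts the same replacement strings. The input text is made up for the illustration. The second command relies on a detail not mentioned above: a backslash in front of the ampersand turns it back into an ordinary character.

```shell
# & stands for whatever the RE matched; it may appear more than once.
doubled=$(printf '%s\n' 'more than 1,600 hours' | sed 's/hours/& and &/')

# \& is a literal ampersand, not the matched text.
literal=$(printf '%s\n' 'tolls and taxes' | sed 's/and/\&/')

echo "$doubled"    # more than 1,600 hours and hours
echo "$literal"    # tolls & taxes
```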
We can enclose parts of a complex regular expression in escaped parentheses
(\() and (\)).
Doing so
lets us split the complex regular expression into smaller parts
and refer to the smaller parts individually.
For example:
s/\(.*American\).*\(wastes.*\)/\1 person \2/p
The typical American person wastes more than 1,600 hours a
Here the regular expression matched the whole line. The match was done in three parts. The first part was up to and including "American". The last was from "wastes" to the end of the line. The middle part was all the characters between the first and last parts, that is "(man)" (and the surrounding space characters).
In the replacement string,
\1
means the text matched by the first set of escaped parentheses.
Obviously,
\2
means the text matched by the second set.
This means that our line of text is replaced with the first and last parts
with "person" (and spaces) sandwiched between.
Notice that we didn't bother to wrap the middle part in escaped parentheses
because we had no need to refer to it later.
This facility is very complex but amazingly useful when building tools to alter complex text. If the need arises, we can refer to up to nine parts of a RE.
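As a sketch of the kind of tool this makes possible, the following rearranges a date with three escaped groups. It uses sed (described later in this section), which supports the same notation; the date and the variable names are invented for the example:

```shell
# Hypothetical input: a date written year-month-day.
iso='2012-04-05'

# Three escaped groups capture year, month and day; the replacement
# emits them in reverse order. A comma is the delimiter so that the
# slashes in the replacement need no escaping.
uk=$(printf '%s\n' "$iso" | sed 's,\([0-9]*\)-\([0-9]*\)-\([0-9]*\),\3/\2/\1,')
echo "$uk"    # prints 05/04/2012
```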
Because so much can be done with the substitute command and regular
expressions,
ed
only has about 20 commands.
We will only learn seven more of them here.
You can read
ed's
man
page if you need to find out about the rest.
Our seven new commands are:
Command | Function
--------+-------------------------
d       | delete the current line
p       | display the current line
u       | undo the last change
w       | write changes to disk
q       | quit
g       | global
v       | inverse global
Here are the first three of those commands in action:
p
The typical American person wastes more than 1,600 hours a
d
p
year to his car. He sits in it while it goes and while it
u
p
The typical American person wastes more than 1,600 hours a
The lines containing just a single character are the commands. Notice that the delete command causes no output, and that the line after the deleted one becomes the current line. Also, when the deletion is undone, the restored line becomes the current line.
We can look for a line containing a particular word by using a command like this:
/spends/
tickets. He spends four of his sixteen waking hours on the
The search starts at the current line; it continues towards the last line of the file and wraps round from the last line to the first one if needed. It is also possible to search backwards, from the current line towards line one:
?petrol?
He works to pay for petrol, tolls, insurance, taxes and
This search also wraps round.
These ways of referring to lines:
/spends/
?petrol?
are known as context addresses; they can be used as if they were line numbers. Notice that we can use REs in context addresses; we aren't limited to fixed text. Note, however, that the delimiters are fixed.
If we do not give
ed
anything where it is expecting a RE it re-uses the last RE
we did give.
This is useful because it saves us typing - as shown here:
/[A-Z][a-z][a-z]*/
He works to pay for petrol, tolls, insurance, taxes and
//
tickets. He spends four of his sixteen waking hours on the
//
road or gathering resources for it. The model American puts
The same thing applies to the substitute command too:
s//A/p
road or gathering resources for it. A model American puts
In all the above commands
ed
re-used the RE which represents a word starting with a capital letter.
If we put a line number in front of an
ed
command it will affect only the appropriate line of the file.
We can also specify a range of lines by giving two line numbers
separated by a comma.
So this:
11p
would display line 11, and this:
2,6p
would display lines two through to six.
If we had used
d
instead of
p, we would have deleted five lines.
This example shows line numbers in front of the substitute command:
2s/car/automobile/p
year to his automobile. He sits in it while it goes and while it
1,12s/ it / his car /gp
money to put down on his car and to meet the monthly instalments.
Notice that
ed
makes the last line it changed the new current line,
and only displays that line, although it may have changed
as many as twelve lines.
This table shows some special line numbers and their meanings:
Symbol | Meaning
-------+----------------------------
1      | the first line
$      | the last line
.      | the current line
0      | the line before the first!
If a line number is left out,
ed
assumes the current line is referred to.
Therefore the following ranges can be used:
Range | Meaning
----------------+----------------------------
1,$ | all lines
,$ | from the current line to the end of the file
1, | from line one to the current line
.-2,.+2 | the five lines centred on the current line
/parks/,/four/ | from the line containing parks
| to the line containing four
For example, this command would display all of the file:
1,$p
But, we needn't see the output!
The trouble with trying to change all lines in a file by using
1,$
is that it fails, with ed's error message, if none of the lines
contains a match for the RE.
To solve this problem we can use the global command;
it has this format:
g/RE/commands
where
commands
represents any
ed
commands we wish to perform on the lines that match the RE.
Here is an example:
g/car/p
year to his automobile. He sits in his car while his car goes and while it
stands idling. He parks his car and searches for it. He earns the
money to put down on his car and to meet the monthly instalments.
In the example,
commands
has been replaced with the
p
command which displays the line.
The net effect of our global command therefore
is to display all lines containing
car.
Notice the format of the last command.
It was
g/RE/p, or, ignoring the slashes,
gREp.
Efficiently performing that command is where
grep
gets its name.
Here is another
g
command:
g/automobile/s//car/g
Here the command is a substitute command.
See how we didn't have to type the RE again;
ed
re-used the previous one which in this case was
automobile.
The advantage of using the global command is that it can safely be used
to change all lines.
There is no problem if none of them contain
automobile.
Do not confuse the
g
at the start of the line with the one
at the end.
The first one means:
all lines containing the RE; the second means:
make all possible changes on each line.
If we wish to see the changed lines we can use a command like this:
g/his car/s//it/gp
year to it. He sits in it while it goes and while it
stands idling. He parks it and searches for it. He earns the
money to put down on it and to meet the monthly instalments.
It is the
p
suffix on the substitute command that makes the difference.
Often, we wish to delete certain lines. Here is how it is done:
g/ it /d
We used the
d
command to delete the lines containing the RE.
The
v
command is the inverse of
g
in that the commands are
performed on lines that do
not
match the RE.
This command would delete the lines that the previous command did
not delete:
v/ it /d
Since all lines have now been deleted, we must take care not to let
ed
write the changes back to disk.
The
w
command would do that.
Instead, all we have to do is quit, using the
q
command:
q
?
q
$
Notice, we have to repeat the command to confirm that we are aware that we have altered the file but not written it to disk.
The editor
ed
was designed for interactive use.
Typically, it moves backwards and forwards from line to line in a file randomly,
making changes as directed by the user.
The other line editor,
sed, is a
stream
editor.
It always starts at the first line
and works through, line by line, towards the end of the text.
The changed text is never put back into a file by
sed;
it simply puts the new version of the text onto the standard output.
It is not restricted to working on text in a file - it can edit
its standard input too.
It is more efficient than
ed
for non-interactive edits and can handle larger files.
Apart from those differences in how the two editors operate, there
is an important difference between
ed
and
sed
commands.
The
ed
commands are done
only on the current
line, unless we specify otherwise.
The
sed
commands are done
on every
line, unless we specify otherwise.
For example, this
ed
command:
s/He/She/
would only affect the current line.
When we use it with
sed
it is attempted on
every
line, as shown:
$ sed 's/He/She/' cars
The typical American male devotes more than 1,600 hours a
year to his car. She sits in it while it goes and while it
stands idling. She parks it and searches for it. He earns the
money to put down on it and to meet the monthly instalments.
She works to pay for petrol, tolls, insurance, taxes and
tickets. She spends four of his sixteen waking hours on the
road or gathering resources for it. The model American puts
in 1,600 hours to get 7,500 miles: less than five miles per
hour. In countries deprived of a transportation industry,
people manage to do the same, walking wherever they want to
go, and they allocate only three to eight percent of their
society's time budget to traffic instead of 28 per cent.
Ivan Illich
$
As you can see, the first of
sed's parameters is the editing command; the second is a file name.
The quotation marks around the editing command were, strictly speaking,
not necessary.
When we looked at
grep, we saw that it is usually a good idea to wrap REs in quotation marks
so that the shell does not interpret the metacharacters and spaces.
The same reasoning applies to
sed's commands.
A very important point about the previous example is that
cars
is just an input file.
Its contents were not altered.
The edited text
was only sent to the standard output, not put back into the file.
If we need to save the new version of the text, we have to do this:
$ sed 's/He/She/' cars > newcars
$
which redirects the output into a file.
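One pitfall is worth a sketch here: redirecting the output straight back onto the input file does not work, because the shell empties the file before sed gets to read it. A common workaround is to write to a second file and then rename it. The scratch directory and file contents below are made up so that no real file is touched:

```shell
# Work in a throwaway directory (illustrative only).
workdir=$(mktemp -d)
printf '%s\n' 'He parks it.' 'He earns the money.' > "$workdir/cars"

# Write the edited text to a new file, and replace the original only
# if sed succeeded.
sed 's/He/She/' "$workdir/cars" > "$workdir/cars.new" &&
    mv "$workdir/cars.new" "$workdir/cars"

edited=$(cat "$workdir/cars")
echo "$edited"
rm -rf "$workdir"
```

After the rename, the file named cars holds the edited text.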
Just as with
ed, we can use line numbers or context addresses.
For example:
$ sed '2s/He/She/' cars > newcars
$ sed '2,6s/He/She/' cars > newcars
$ sed '/typical/,/parks/s/He/She/' cars > newcars
$
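To see the effect of such addresses without needing the cars file, here is a minimal sketch on five invented lines. The variable names are mine:

```shell
# Five numbered sample lines stand in for a real file.
sample=$(printf '%s\n' 'line 1' 'line 2' 'line 3' 'line 4' 'line 5')

# A numeric range: only lines two to four are changed.
by_number=$(printf '%s\n' "$sample" | sed '2,4s/line/LINE/')

# Context addresses: from the line matching 2 to the line matching 4.
by_context=$(printf '%s\n' "$sample" | sed '/2/,/4/s/line/LINE/')

echo "$by_number"
```

Both commands change the same three lines here, leaving the first and last lines alone.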
So far, we have only done one change at a time with
sed.
Usually, however, we wish to do several.
Here is how it is done:
$ sed -e 's/car/auto/' -e 's/petrol/gas/' cars > newcars
$
The difference is that each editing command is preceded by the
-e
option, otherwise
sed
assumes the second and subsequent editing commands are file names.
For clarity, we could split a long command over several lines, with each editing command on its own line:
$ sed -e 's/car/auto/' \
>     -e 's/petrol/gas/' \
>     -e 's/tickets/fines/' cars > newcars
$
Don't be confused by the
>; it is the secondary prompt, used by the shell to remind the user that the
command has spilled over onto another line.
It was typed by Unix and not by me.
We could achieve the same effect like this:
$ sed 's/car/auto/
>      s/petrol/gas/
>      s/tickets/fines/' cars > newcars
$
The last method, without the
-e
options and with only one set of quotation marks, is by far the neatest.
Notice how, in the previous two examples, the editing commands have
been lined up with each other to enhance readability.
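One detail that these multi-command examples imply but do not show is that sed applies the editing commands in order to each line, so a later command sees the result of an earlier one. A minimal sketch with an invented input line:

```shell
# The second command acts on the output of the first,
# so car ends up as bus rather than auto.
chained=$(printf '%s\n' 'his car' | sed -e 's/car/auto/' -e 's/auto/bus/')
echo "$chained"    # prints his bus
```

The order of the commands can therefore matter when one replacement string contains text that another command looks for.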
The next example shows probably the most common way
of using
sed.
$ sed '/Man/d
>      /car/s//auto/
>      /petrol/s//gas/
>      /tickets/s//fines/' cars > newcars
$
All the commands look just like
ed's
g
command except they don't start with "g"!
And they work exactly the same, executing the command after the initial RE
only on the lines that match the RE.
The empty RE in the substitute commands causes the previous
RE (which is
car
in the first substitute command) to be re-used.
The big advantage of this variation is that all the editing commands
begin with the RE, making the whole thing easier to read.
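The same style can be tried on a few made-up lines, as in this small sketch (the sample text is invented and is not taken from the cars file):

```shell
sample=$(printf '%s\n' 'Man overboard' 'my car' 'petrol price')

# /Man/d deletes matching lines; the empty RE in each s command re-uses
# the RE from the address in front of it.
edited=$(printf '%s\n' "$sample" | sed '/Man/d
/car/s//auto/
/petrol/s//gas/')
echo "$edited"
```

The first line is deleted, and the other two come out as "my auto" and "gas price".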
Usually,
sed
writes every line to the standard output after making any changes.
The
-n
option and
sed's
p
command allow us to display only certain lines of a file.
This example displays line four:
$ sed -n '4p' cars
money to put down on it and to meet the monthly instalments.
$
And this example makes
sed
do the same as
grep:
$ sed -n '/model/p' cars
road or gathering resources for it. The model American puts
$
That is, it displays the lines matching the RE (model).
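To see why the -n option is needed, this sketch runs the same p command with and without it on three invented lines:

```shell
sample=$(printf '%s\n' 'alpha' 'beta' 'gamma')

# With -n, only the lines that p prints reach the output.
quiet=$(printf '%s\n' "$sample" | sed -n '/^b/p')

# Without -n, every line is written anyway, so a matching line
# appears twice.
noisy=$(printf '%s\n' "$sample" | sed '/^b/p')

echo "$quiet"
echo "$noisy"
```

The first command prints just beta; the second prints all three lines with beta doubled.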
So far, we have only seen
sed
operating on files.
This example is different because it shows
sed
operating on its standard input:
$ date | sed 's/:/ /g' | wc -w
8
$
Here, the standard input comes via a pipe from the
date
command;
sed
changes the colons in the time to spaces and then
wc
counts the words in
sed's output.
This may be a contrived example but it demonstrates a very
common way of using
sed.
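Because the output of date changes from run to run and place to place, here is a repeatable sketch of the same pipeline. The fixed timestamp is made up and only stands in for what date might print:

```shell
# A fixed, invented timestamp replaces date so the result is repeatable.
stamp='Thu Apr  5 17:45:00 BST 2012'

# Without sed the time counts as one word; with the colons turned into
# spaces it counts as three.
before=$(printf '%s\n' "$stamp" | wc -w)
after=$(printf '%s\n' "$stamp" | sed 's/:/ /g' | wc -w)
echo "$before" "$after"
```

The word count rises from 6 to 8, which matches the figure shown for the date pipeline.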
The following command looks perfectly straightforward:
sed 's/model/typical/ ' cars
Surely it just changes "model" in its input to "typical"? In fact when we run it, we get the following:
sed: command garbled: s/model/typical/
The problem is that
sed
is very intolerant of extra spaces, and there is one at the end
of the substitute command.
To make matters worse, there is no sign of the extra space in
sed's error message.
Also, the error message is usually all
sed
has to say when you get an editing command wrong.
In these questions, you have to work out the effect of the given substitute command on this line of text:
Smith, A.B.
Each substitution is to be done on the original line, not on the result of the previous substitution.
s/ith/ythe/
Answer
Smythe, A.B.
s/./-/
Answer
-mith, A.B.
s/./-/g
Answer
-----------
s/.*/-/
Answer
-
s/,/./
Answer
Smith. A.B.
s/\./,/g
Answer
Smith, A,B,
s/m.t/mot/
Answer
Smoth, A.B.
s?m.t?mot?
Answer
Smoth, A.B.
s/^/Jean /
Answer
Jean Smith, A.B.
s/$/ (Ms)/
Answer
Smith, A.B. (Ms)
s/[mi]/X/
Answer
SXith, A.B.
s/[im]/X/
Answer
SXith, A.B.
s/[mi][mi]*/XX/
Answer
SXXth, A.B.
s/[mi]*/XX/
Answer
XXSmith, A.B.
s/[A-Z]/X/
Answer
Xmith, A.B.
s/[a-z]/X/
Answer
SXith, A.B.
s/[a-z][a-z]*/X/
Answer
SX, A.B.
s/[A-Z]/X/g
Answer
Xmith, X.X.
s/[^AB]/X/g
Answer
XXXXXXXAXBX
s/[^,]*/X/
Answer
X, A.B.
s/^[^,]*, .\./X/
Answer
XB.
s/.*/& (&)/
Answer
Smith, A.B. (Smith, A.B.)
s/^\([^,]*\), *\(.*\)/\2 \1/
Answer
A.B. Smith
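The answers can be checked by feeding the line to sed, which accepts the same substitute commands. Here is a small sketch; the function name try is my own invention, not a standard command:

```shell
# try COMMAND: apply one substitute command to the original line.
try() {
    printf '%s\n' 'Smith, A.B.' | sed "$1"
}

try 's/ith/ythe/'                     # prints Smythe, A.B.
try 's/^\([^,]*\), *\(.*\)/\2 \1/'    # prints A.B. Smith
```

Each call starts again from the original line, as the questions require.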
http://homepages.shu.ac.uk/~cmsps/unix/edsed.html
Last updated: Thursday 05 April 2012 at 17:45