Mac Forum / Country Specific / UK Mac Group / September 2008



sed usage question

Rowland McDonnell - 26 Aug 2008 00:12 GMT
Okay, it's like this:

I'm trying to work out a sed incantation to perform three simple search
and replace jobs on a single file.

I want to replace every occurrence of:

<tab>        with  <newline>
&            with  \&
char no. 11  with  \par

(where <tab> and <newline> are the characters with those meanings to the
computer, and \& and \par are those literal strings.  LaTeX speak, for
those who might be curious.  And that's char 11 in dec; B in hex)

I've read the man page, I've read some tutorials (including a tutorial
on quoting which left me totally incomprehending, damnit).  I've tried
to figure out what to do and failed.

But:

sed 's/\&/\\&/g' <inputfile >outputfile

does one of the three jobs I need to have done.

Here are my other two attempts:

sed 's/\t/\n/g' <inputfile >outputfile

replaces `t's with `n's rather than \t (tab)s with \n (newline)s as I
thought it should do.

sed 's/\xB/\\par/g' <inputfile >outputfile

seems to do bugger all rather than replacing character number B (hex)
with the string \par as I thought it should do.

So: can anyone suggest how to get sed to do the two jobs I can't work
out myself?

(And can anyone explain why I have to put the s/ [...] argument to sed
inside quote marks?  I can't see any reason for needing to do that
myself.)

Cheers,
Rowland.

Signature

Remove the animal for email address: rowland.mcdonnell@dog.physics.org
                                           Sorry - the spam got to me
http://www.mag-uk.org                             http://www.bmf.co.uk
UK biker?   Join MAG and the BMF and stop the Eurocrats banning biking

Steve Firth - 26 Aug 2008 00:48 GMT
> I'm trying to work out a sed incantation to perform three simple search
> and replace jobs on a single file.

Yes, well since you're the sort of person that can determine guilt or
innocence based simply on a photograph, no doubt something as simple as
RTFM isn't beyond you. So off you go and stop snivelling.
Rowland McDonnell - 26 Aug 2008 02:00 GMT
> > I'm trying to work out a sed incantation to perform three simple search
> > and replace jobs on a single file.
>
> Yes, well since you're the sort of person that can determine guilt or
> innocence based simply on a photograph, no doubt something as simple as
> RTFM isn't beyond you. So off you go and stop snivelling.

I've read the manual, you ill-bred oaf.  I developed my attempts based
on my best understanding of the documentation I could find.

And then I asked a straightforward technical question, presenting my
best ideas so far based on my best understanding of the available
documentation.

Your point in making your post was to achieve what technical aim?

And I've snivelled about it precisely where?  Can you show me?  Here is
my original question:

(don't bother replying: I'm not interested in your personal opinions.)

========================================================================

Okay, it's like this:

I'm trying to work out a sed incantation to perform three simple search
and replace jobs on a single file.

I want to replace every occurrence of:

<tab>        with  <newline>
&            with  \&
char no. 11  with  \par

(where <tab> and <newline> are the characters with those meanings to the
computer, and \& and \par are those literal strings.  LaTeX speak, for
those who might be curious.  And that's char 11 in dec; B in hex)

I've read the man page, I've read some tutorials (including a tutorial
on quoting which left me totally incomprehending, damnit).  I've tried
to figure out what to do and failed.

But:

sed 's/\&/\\&/g' <inputfile >outputfile

does one of the three jobs I need to have done.

Here are my other two attempts:

sed 's/\t/\n/g' <inputfile >outputfile

replaces `t's with `n's rather than \t (tab)s with \n (newline)s as I
thought it should do.

sed 's/\xB/\\par/g' <inputfile >outputfile

seems to do bugger all rather than replacing character number B (hex)
with the string \par as I thought it should do.

So: can anyone suggest how to get sed to do the two jobs I can't work
out myself?

(And can anyone explain why I have to put the s/ [...] argument to sed
inside quote marks?  I can't see any reason for needing to do that
myself.)

Cheers,
Rowland.

Rowland McDonnell - 26 Aug 2008 02:06 GMT
> > I'm trying to work out a sed incantation to perform three simple search
> > and replace jobs on a single file.
>
> Yes, well since you're the sort of person that can determine guilt or
> innocence based simply on a photograph, no doubt something as simple as
> RTFM isn't beyond you. So off you go and stop snivelling.

(I need to add:

The creature Firth's claim about me being able to determine guilt or
innocence based on a photograph is a product purely of his own diseased
imagination and has no basis in reality or any claim that I have made.

The creature Firth insults me with his dishonest claim that my
straightforward technical question could be classed as `snivelling' by
anyone not an inveterate liar.

The creature Firth has decided to insult me by suggesting that I RTFM,
when it is perfectly clear that I've done just that and got 1/3 of the
way towards my end goal and no further since TFM is FS.

Don't take my word for it:

<http://www.grymoire.com/Unix/Sed.html#toc-uh-0>:

"Anyhow, sed is a marvelous utility. Unfortunately, most people never
learn its real power. The language is very simple, but the documentation
is terrible."

Moreover, the creature Firth was in my killfile for a reason I had
forgotten: his replies to me almost always contain nothing but
ill-mannered insults of a sort that demonstrate little more than the
fact he was brought up very badly and has no respect for social
decency.)

Rowland McDonnell - 26 Aug 2008 02:16 GMT
> > I'm trying to work out a sed incantation to perform three simple search
> > and replace jobs on a single file.
>
> Yes, well since you're the sort of person that can determine guilt or
> innocence based simply on a photograph, no doubt something as simple as
> RTFM isn't beyond you. So off you go and stop snivelling.

Firth, you are scum plain and simple.

This reply of yours, which I downloaded specially since you are in my
killfile, was one I had hoped contained an honest attempt at a useful
reply.

What I got was something that upset me very badly because it's such a
shitty response.

Yes, that's right: you've upset me very very badly.

So f.ck you Firth, and I hope you die slowly and horribly of something
very nasty in the near future.

Rowland.

Steve Firth - 26 Aug 2008 10:27 GMT
> Yes, that's right: you've upset me very very badly.

Good, now you know how the way that you treat every other user of this
group feels to the person that is treated that way. I know you won't
learn a lesson, since you're learning resistant, but your nasty little
sign off proves what a poostain on the bedsheet of life you are, and I
hope it acts as a warning to others about your character, or rather lack
of it.

> (don't bother replying: I'm not interested in your personal opinions.)

Bwhahahahahahahahahahahahaha. And you posted three replies to show how
not interested you are. By *your* standards that makes you a liar, or
worse.

What do people say when they see your photograph - once they've got over
the retching?
Stewart Smith - 26 Aug 2008 09:01 GMT
> sed 's/\t/\n/g' <inputfile >outputfile
>
> replaces `t's with `n's rather than \t (tab)s with \n (newline)s as I
> thought it should do.

I had originally thought that it might need extra backslashes.  I've
definitely used a similar command in vim but maybe that uses a different
syntax to standard sed.  I found this:

http://sed.sourceforge.net/sed1line.txt
"USE OF '\t' IN SED SCRIPTS: For clarity in documentation, we have used
the expression '\t' to indicate a tab character (0x09) in the scripts.
However, most versions of sed do not recognize the '\t' abbreviation,
so when typing these scripts from the command line, you should press
the TAB key instead. '\t' is supported as a regular expression
metacharacter in awk, perl, and HHsed, sedmod, and GNU sed v3.02.80."

Looks like there's lots of useful stuff on that page anyway.

> (And can anyone explain why I have to put the s/ [...] argument to sed
> inside quote marks?  I can't see any reason for needing to do that
> myself.)

It's probably in case you want to use spaces in one of the things you're
trying to match or replace.  If you didn't use the quotes how would the
shell know what sed is supposed to use as input?

Stewart
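
A minimal sketch of one portable way to do the tab-to-newline job in sed, assuming a POSIX shell: build the tab with printf rather than relying on sed understanding '\t', and write the newline in the replacement as a backslash followed by a real line break. The sample input and the variable names are made up for illustration.

```shell
# Build a literal tab character; printf understands '\t' even where sed does not.
tab=$(printf '\t')

# Made-up sample input: four fields separated by tabs.
# Double quotes let the shell expand $tab. The doubled backslash reaches sed
# as a single backslash, and backslash + real newline in the replacement
# part of s/// inserts a newline.
result=$(printf 'a\tb\tc\td\n' | sed "s/$tab/\\
/g")

# Prints a, b, c and d on four separate lines.
printf '%s\n' "$result"
```

With a real file, the same sed command takes <inputfile >outputfile exactly as in the original attempts.
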
Rowland McDonnell - 26 Aug 2008 17:58 GMT
> > sed 's/\t/\n/g' <inputfile >outputfile
> >
[quoted text clipped - 6 lines]
>
> http://sed.sourceforge.net/sed1line.txt

Ah!  Examples - excellent!

> "USE OF '\t' IN SED SCRIPTS: For clarity in documentation, we have used
> the expression '\t' to indicate a tab character (0x09) in the scripts.
> However, most versions of sed do not recognize the '\t' abbreviation,
> so when typing these scripts from the command line, you should press
> the TAB key instead. '\t' is supported as a regular expression
> metacharacter in awk, perl, and HHsed, sedmod, and GNU sed v3.02.80."

Ah!

Any idea where I can find out about the regexp format used by sed?

I've read the man pages I can find on the subject but got baffled.

And for those who put that down to whinging, whining Rowland, one of the
man pages I'm pushed towards says this about itself:

"This is an alpha release with known defects.  Please report problems."

> Looks like there's lots of useful stuff on that page anyway.

Well, it might not teach me what I need, but it looks about a thousand
times more useful than anything I managed to find myself.

> > (And can anyone explain why I have to put the s/ [...] argument to sed
> > inside quote marks?  I can't see any reason for needing to do that
> > myself.)
>
> It's probably in case you want to use spaces in one of the things you're
> trying to match or replace.

I get errors if I miss the quotes out, and nary a space in sight.

>  If you didn't use the quotes how would the
> shell know what sed is supposed to use as input?

By parsing the input according to the rules?  The point here is that I
clearly don't understand the rules because I don't understand why the
quotes are needed and I'd like to.

Rowland.

Steve Firth - 26 Aug 2008 18:06 GMT
> The point here is that I clearly don't understand the rules because I
> don't understand

... the difference between arse and elbow.
chris - 27 Aug 2008 09:14 GMT
>>> sed 's/\t/\n/g' <inputfile >outputfile
>>>
[quoted text clipped - 13 lines]
> clearly don't understand the rules because I don't understand why the
> quotes are needed and I'd like to.

The quotes are there to protect the command from being interpreted by
the shell. Thus, everything within the quotes is passed to the command
(sed in this case) as quoted, not following interpretation by the shell.

Take this as an example:

ls *.txt

This will pass all files that end in .txt (as matched by the shell) in
the current directory to the 'ls' command, which will then list them on
screen.

ls '*.txt'

This will only list the one file that matches exactly to *.txt, if it
exists.

So, in the first example the *.txt is interpreted by the shell to expand
to all files that match the '*' wildcard and ending in '.txt'. Then all
those files are passed to ls for output.

I'll leave it to you to read up on the difference between single quotes
and double quotes.

A book on bash may well be a good start as manpages don't seem to give
you the information in a way you like.
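
The effect of the quotes can be seen directly with a small helper that prints each argument exactly as the command receives it. This is only a sketch, assuming a POSIX shell; show_args, the temporary directory and the two .txt files are hypothetical names invented for the demonstration. It also hints at why the unquoted sed commands failed: without quotes the shell strips the backslashes itself, and in s/\&/\\&/g the final & is left unescaped, so the shell treats it as its background operator.

```shell
# Hypothetical helper: print each argument on its own line, in brackets.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Scratch directory holding two made-up files, a.txt and b.txt.
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt"

# Unquoted: the shell expands the wildcard before the command runs.
unquoted_glob=$(cd "$workdir" && show_args *.txt)
# Quoted: the command receives the five characters *.txt untouched.
quoted_glob=$(cd "$workdir" && show_args '*.txt')

# Unquoted: the shell removes the backslashes, so sed would see s/t/n/g.
unquoted_sed=$(show_args s/\t/\n/g)
# Quoted: the script reaches sed byte for byte.
quoted_sed=$(show_args 's/\t/\n/g')

rm -r "$workdir"

printf '%s\n' "$unquoted_glob" "$quoted_glob" "$unquoted_sed" "$quoted_sed"
```
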
Rowland McDonnell - 31 Aug 2008 03:51 GMT
> >>> sed 's/\t/\n/g' <inputfile >outputfile
> >>>
[quoted text clipped - 16 lines]
> The quotes are there to protect the command from being interpreted by
> the shell.

Ah!

Ah so!

Yes!

Obvious.  I feel silly.

On the other hand, I've never read anything that said what you just said
plainly.  Why can't Unix docs just say things plainly like you've just
done?

Thank you kindly, sir, you have just cleared up something that's been
bothering me for a while.  Argh.  Arhg.  My brain hurts.

> Thus, everything within the quotes is passed to the command
> (sed in this case) as quoted, not following interpretation by the shell.
[quoted text clipped - 6 lines]
> the current directory to the 'ls' command, which will then list them on
> screen.

Uhuh.

> ls '*.txt'
>
> This will only list the one file that matches exactly to *.txt, if it
> exists.

By which you mean, a file with the exact name: *.txt; not the file with
the name <anything>.txt.

Yes?

> So, in the first example the *.txt is interpreted by the shell to expand
> to all files that match the '*' wildcard and ending in '.txt'. Then all
> those files are passed to ls for output.
>
> I'll leave it to you to read up on the difference between single quotes
> and double quotes.

<heh>  Cheers. ;-)  However, now you've got me thinking along the right
lines, the next bit shouldn't be a major problem.

> A book on bash may well be a good start as manpages don't seem to give
> you the information in a way you like.

Any suggestions?  I've got Learning Unix for MacOS X Tiger and I reckon
it's crap.

(It's like this: I've never got anywhere learning Unix stuff unless I've
had a guru to hand, as in in the room, to help me out when I got stuck,
which was frequently.  Thing is, all I need is a hand to unstick me from
the problem, and off I go again until the next patch of mud.  I don't
need constant hand-holding, just pulling out of the frequent bogs they
drop you in to.)

Rowland.

chris - 02 Sep 2008 09:56 GMT
>> ls '*.txt'
>>
[quoted text clipped - 5 lines]
>
> Yes?

Correct.

>> A book on bash may well be a good start as manpages don't seem to give
>> you the information in a way you like.
>
> Any suggestions?  I've got Learning Unix for MacOS X Tiger and I reckon
> it's crap.

I'm afraid not. I've been able to get along with the manpages and
resources on the interweb. So, I've never really needed a 'good' book on
bash.
Rowland McDonnell - 02 Sep 2008 10:58 GMT
[snip]

> >> A book on bash may well be a good start as manpages don't seem to give
> >> you the information in a way you like.
[quoted text clipped - 5 lines]
> resources on the interweb. So, I've never really needed a 'good' book on
> bash.

Righto - thanks.

I don't think there exists a good book on the subject.  But if I'm lucky
enough to get a non-abusive reply to my queries here, it seems that I
can, one piece at a time, learn a few things.

The problem is that anything I learn will get forgotten soon because I
don't do a lot of command line stuff, and the bits and pieces I'm
learning are not explained well in any documentation that I've seen.

If I weren't so f.cked in the head, I'd write a Unix manual myself.  One
that would be a lot better than all others in existence for beginners
wanting to learn how to do the sorts of things I want to do.

Rowland.

Gary - 26 Aug 2008 23:05 GMT
> Okay, it's like this:
>
[quoted text clipped - 27 lines]
> replaces `t's with `n's rather than \t (tab)s with \n (newline)s as I
> thought it should do.

Okay for your tab/newline one, do this:

sed -E 's/  /^L^M/g' <afile

Now that needs some explaining

I don't think the -E is needed... But the thing in quotes is most important.

The key sequence is,

's/Ctrl-VCtrl-I/Ctrl-VCtrl-LCtrl-VCtrl-M/g'

The Ctrl-V says, expect a control character next, and the Ctrl-I is tab.
Ctrl-L is LF and Ctrl-M is CR. You can obviously replace tab with either CR,
LF or both, or whatever. The g on the end says to replace all occurrences.

My input file changed from

$ more afile
a       b       c       d

The whitespace there is Tab, to:

$ sed -E 's/  /^L^M/g' <afile
a
b
c
d

Hope that's what you're after there. You can use the same technique for your
hex character: hex B is 11 decimal, and K is the 11th letter of the alphabet,
so use Ctrl-VCtrl-K as the thing to be replaced for that one.

Enjoy.
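
A hedged sketch of the same replacement without typing control characters by hand, assuming a POSIX shell: printf builds character 11 from its octal code (013), which is the Ctrl-K mentioned above. For reference, in ASCII Ctrl-J is line feed (0x0A) and Ctrl-L is form feed (0x0C), so generating the characters by number may be less error-prone than typing them. The sample input and variable names are made up.

```shell
# Character number 11 (decimal) is octal 013, hex 0B: the vertical tab, Ctrl-K.
vt=$(printf '\013')

# Made-up sample input: two words separated by a vertical tab.
# Inside double quotes the four backslashes reach sed as two, and sed's
# replacement syntax turns those two into one literal backslash.
result=$(printf 'first\013second\n' | sed "s/$vt/\\\\par/g")

# Prints: first\parsecond
printf '%s\n' "$result"
```
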

Signature

remove stars for email
g*a*r*y*c*o*w*e*l*l*a*t*m*a*c*d*o*t*c*o*m

Steve Folly - 26 Aug 2008 23:19 GMT
On 26/08/2008 23:05, in article
0001HW.C4DA3CB3005AA279B01AD9AF@news-europe.giganews.com, "Gary"
<postmaster@127.0.0.1> wrote:

> The key sequence is,
>
[quoted text clipped - 3 lines]
> Ctrl-L is LF and Ctrl-M is CR. You can obviously replace tab with either CR,
> LF or both, or whatever. The g on the end says to replace all occurrences.

Be careful - text files in Unix have just a LF (newline) as a line
delimiter. Text files created in Windows have CR/LF delimiters.

> Enjoy.

OK, you got sed working, but I wonder if 'tr' is simpler for single
character to single character replacements (which does recognize \t and \n)?

$ tr '\t' '\n' < infile > outfile
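
As a small illustration of both points, here is a sketch (assuming a POSIX shell and made-up sample data) that uses tr twice: once to drop the CR from a Windows-style CR/LF line ending, and once for the tab-to-newline replacement.

```shell
# Made-up input: three tab-separated fields on a line ending in CR/LF.
# tr -d '\r' deletes every carriage return; tr '\t' '\n' maps tab to newline.
result=$(printf 'a\tb\tc\r\n' | tr -d '\r' | tr '\t' '\n')

# Prints a, b and c on three separate lines, with no stray CR left behind.
printf '%s\n' "$result"
```

tr maps single characters to single characters, so it cannot by itself produce a multi-character string such as \par.
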

Signature

Regards,
Steve

"...which means he created the heaven and the earth... in the DARK! How good
is that?"

Gary - 26 Aug 2008 23:30 GMT
> On 26/08/2008 23:05, in article
> 0001HW.C4DA3CB3005AA279B01AD9AF@news-europe.giganews.com, "Gary"
[quoted text clipped - 10 lines]
> Be careful - text files in Unix have just a LF (newline) as a line
> delimiter. Text files created in Windows have CR/LF delimiters.

Well, I did think along those lines and that's why I described all the
options. If I used only LF in Mac OS X terminal, I get stepped output:

$ sed -E 's/  /^L/g' <afile
a
b
 c
  d

> OK, you got sed working, but I wonder if 'tr' is simpler for single
> character to single character replacements (which does recognize \t and \n)?
>
> $ tr '\t' '\n' < infile > outfile

That's what so great about UNIX. You can skin the same cat 100 different ways
just before you shoot yourself in the foot with it.

Still, the OP's problems were more than just single character substitutions
and whilst you could pipeline a tr before/after your sed, why have another
process when you don't have to? The single sed should be able to do it all.

Choices though. It's all about choices.

I might have done the whole thing with an awk. Much better than perl or
somesuch :)

Rowland McDonnell - 31 Aug 2008 03:51 GMT
> > The key sequence is,
> >
[quoted text clipped - 6 lines]
> Be careful - text files in Unix have just a LF (newline) as a line
> delimiter. Text files created in Windows have CR/LF delimiters.

*SOME* text files created by Windoze have that hopelessly obsolete
brain-dead waste-of-space line terminator; not all.

(why oh why did MS ever decide to do it, I mean why waste space like
that?  WHY???)

> > Enjoy.
>
> OK, you got sed working, but I wonder if 'tr' is simpler for single
> character to single character replacements (which does recognize \t and \n)?
>
> $ tr '\t' '\n' < infile > outfile

My wife found that one for me - I didn't.  Thanks to you too for putting
me on to a neat tool I didn't know about.

But: the job I want to do isn't all single character replacements. If
nothing else works, I'll probably use tr in part.

Rowland.

Bruce Horrocks - 31 Aug 2008 14:18 GMT
> But: the job I want to do isn't all single character replacements. If
> nothing else works, I'll probably use tr in part.

Sed will also work from a file of instructions so you can have the 3
commands in there rather than try and create a hugely complicated,
single command line. Alternatively use a shell script that pipes the
output of sed or tr to another invocation of sed or tr that does the
next transformation. Repeat as required.
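
The file-of-instructions approach might look like the sketch below, which assumes a POSIX shell and puts all three replacements from the original question into one sed script run with -f. The script path, variable names and sample input are invented for the example, and the control characters are generated with printf so that nothing depends on sed understanding '\t' or '\x'.

```shell
tab=$(printf '\t')      # character 9
vt=$(printf '\013')     # character 11 decimal (octal 013)
script=$(mktemp)        # hypothetical scratch file for the sed script

# Unquoted here-document: $tab and $vt are expanded, and each doubled
# backslash is written to the file as a single backslash. The file ends up
# holding three commands:
#   & -> \&         (in a replacement, \\ is a backslash, & is the match)
#   char 11 -> \par
#   tab -> newline  (backslash followed by a real line break)
cat > "$script" <<EOF
s/&/\\\\&/g
s/$vt/\\\\par/g
s/$tab/\\
/g
EOF

# Made-up sample input containing one of each character to be replaced.
result=$(printf 'A & B\013x\ty\n' | sed -f "$script")
rm -f "$script"

# Prints "A \& B\parx" on one line and "y" on the next.
printf '%s\n' "$result"
```
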

Regards,
Signature

Bruce Horrocks
Surrey
England
(bruce at scorecrow dot com)

Rowland McDonnell - 01 Sep 2008 06:15 GMT
> > But: the job I want to do isn't all single character replacements. If
> > nothing else works, I'll probably use tr in part.
>
> Sed will also work from a file of instructions so you can have the 3
> commands in there rather than try and create a hugely complicated,
> single command line.

I have to say I am sure it'd be /much/ easier to write a `hugely
complicated' single command line than to learn how to drive sed from
a file of instructions.  Writing a `hugely complicated' command line is
just a matter of putting one thing after another - it's no more
complicated than writing the bits on separate lines.  Learning how to do
each thing with sed has proven very very hard.  Very very very hard
indeed.

But it's very easy to write a shell script with each step done with a
separate call to sed (or whatever), so who needs long command lines
trying to do it all in one go?

> Alternatively use a shell script that pipes the
> output of sed or tr to another invocation of sed or tr that does the
> next transformation. Repeat as required.

I've been thinking along those lines.  But thanks - if I hadn't been,
your suggestion would probably have proven to be invaluable.

Rowland.

Mark Bestley - 31 Aug 2008 14:19 GMT
> > > The key sequence is,
> > >
[quoted text clipped - 13 lines]
> (why oh why did MS ever decide to do it, I mean why waste space like
> that?  WHY???)

Because they followed various standards. Unix was the non standard way.
RFCs for mail and HTTP use CRLF.  (And did anyone but Mac just have CR?)

They are based on how printers and teletypes work.

A version of the history is <http://en.wikipedia.org/wiki/Newline>

Signature

Mark

Andrew Stephenson - 31 Aug 2008 15:18 GMT
> > *SOME* text files created by Windoze have that hopelessly obsolete
> > brain-dead waste-of-space line terminator; not all.
[quoted text clipped - 6 lines]
>
> [...]

And many programs use the two line-end codes in clever ways to
save bytes here and there.  Word Star, frex, sets the high bit
of the CR to denote "soft carriage return" (as computed by the
formatting function) and of LF to denote a page break (ditto).
Setting those bits must save a lot of time when redisplaying a
page on-screen (and a little when printing).

But Rowland knows better, hence it is a stupid way of working.

Sidebar: an ASCII toolkit filter wot I writ and use daily, has
line-end recognition rules that look for:
        <cr> <lf>
        <cr> <ff>
        <cr>
        <lf> <cr>
        <lf> <ff>
        <lf>
        <ff>
So one can learn to live with these little differences.  Note,
frex, how some Mac apps will happily insert whichever combo is
preferred.  This is only an issue for those who enjoy stress.

IMHO, of course. :-)
Signature

Andrew Stephenson

Rowland McDonnell - 01 Sep 2008 06:15 GMT
> > > *SOME* text files created by Windoze have that hopelessly obsolete
> > > brain-dead waste-of-space line terminator; not all.
[quoted text clipped - 9 lines]
> And many programs use the two line-end codes in clever ways to
> save bytes here and there.

But that has nothing whatever to do with using them *as a pair* to
indicate the end of each line in a stored text file.

There's no reason to do that - except when the software you are using
expects that, for reasons of historical idiocy.

>  Word Star, frex, sets the high bit
> of the CR to denote "soft carriage return" (as computed by the
> formatting function) and of LF to denote a page break (ditto).
> Setting those bits must save a lot of time when redisplaying a
> page on-screen (and a little when printing).

I don't see what's clever about deciding to waste space above the ASCII
range to duplicate existing control codes.

FF does nicely for a `new page' marker - surely that's what it's meant
for?  So why waste space >127 to duplicate it?

ASCII's got no `soft CR', but it has got a lot of control codes that
were redundant come the end of the 1970s, so WordStar could have used a
private re-definition of one of those, leaving the >127 range clear, and
keeping (almost) all the control codes down below 32 where they belong
(IMHO).

On the other hand, why not just set the high bit of CR for that job?  No
stupidity in that because it's not necessarily inefficient - it's just
`daft to me to use that part of the range that way'.

> But Rowland knows better, hence it is a stupid way of working.

You know, I find false and snide remarks like that most annoying.

Why state that I hold opinions and attitudes that I do not?

It is certainly stupid to use both CR and LF as a combination to delimit
every line ending.

That is certainly stupid because it's a completely pointless waste of
space.

But you've decided to claim that I've made a different claim, haven't
you?

Now, you're talking about using different ASCII codes to indicate
different things.  I am firmly of the opinion that is the opposite of
stupid because it is a way of using what you've got in the way it was
meant to be used in order to improve efficiency.

So you see, you have just insulted me on the grounds that I hold
opinions, but I do not hold the opinions you falsely and maliciously
accuse me of holding.

The truth is that I hold the opposite opinions to the ones that you
falsely and maliciously accuse me of holding.

Why do that?  Why use dishonest tactics to launch a personal attack
against me?

> Sidebar: an ASCII toolkit filter wot I writ and use daily, has
> line-end recognition rules that look for:
[quoted text clipped - 6 lines]
>            <ff>
> So one can learn to live with these little differences.

If all the above combinations are turned into identical (say) LF EOL
characters, the above filtering spec would bugger up some things I've
done in the past.

I've worked with files where a FF means FF, for example.

>  Note,
> frex, how some Mac apps will happily insert whichever combo is
> preferred.  This is only an issue for those who enjoy stress.

Some Mac applications barf when faced with CR/LF but can handle CR *OR*
LF quite happily.  It's necessary to get the issue dealt with.

[snip]

Rowland.

Jack Campin - bogus address - 01 Sep 2008 22:06 GMT
> It is certainly stupid to use both CR and LF as a combination
> to delimit every line ending.

It mapped exactly onto what you wanted a golfball or dot matrix
printer to do at the end of a line: move the head back and roll
the carriage up.

And there were frequent occasions when you'd want to decouple those
actions, as when overstriking or vertical tabbing.  Having separate
CR and LF must have saved vast amounts of time and wear on printers.

That sort of hardware survived long enough that there were vast
volumes of data designed to be printed on it.  It made no sense to
introduce a change in format just to free up one control character
(what would you then do with it?) or save maybe 5% of file space.

==== j a c k  at  c a m p i n . m e . u k  ===  <http://www.campin.me.uk> ====
Jack Campin, 11 Third St, Newtongrange EH22 4PU, Scotland == mob 07800 739 557
CD-ROMs and free stuff:  Scottish music, food intolerance, and Mac logic fonts
Rowland McDonnell - 02 Sep 2008 07:47 GMT
> > It is certainly stupid to use both CR and LF as a combination
> > to delimit every line ending.
>
> It mapped exactly onto what you wanted a golfball or dot matrix
> printer to do at the end of a line: move the head back and roll
> the carriage up.

But why store two characters in your file when you only need one to
delimit a line ending?  Why transmit two characters as a line delimiter
when you only need one to delimit a line ending?

Well, there are sometimes reasons for the latter, but none for the
former, so:

Why not translate the internal coding to external coding as and when you
want to print - if the printer really does require the separate
characters?

> And there were frequent occasions when you'd want to decouple those
> actions, as when overstriking or vertical tabbing.  Having separate
> CR and LF must have saved vast amounts of time and wear on printers.

I recall using old dot matrix printers in the early 80s.  Lots of
different Epsons, mostly.   I recall them doing the CR/LF job on receipt
of a single character (and I recall CR doing the job, although I suspect
that one could change this using DIP switches in many cases).

You can wind up one line using LF if CR is used for `CR/LF'.

Backspace exists.

But all that aside: you don't have to save your file on the host
computer in the same form you send it to the printer.  It makes no sense
to waste space when storage is so expensive, and it does make sense to
write a tiny bit of trivial code that sends CR/LF when the source file
says CR only.

Best of all possible worlds, that way - if you ask me.

> That sort of hardware survived long enough that there were vast
> volumes of data designed to be printed on it.

Yes - and I recall that hardware: I recall that some of it output CR/LF
combined on receipt of a CR character - I say `some' because in many
cases, I had no idea exactly what was going on, but in some, I did.

I recall some host computers having an option to /transmit/ both to the
printer if it needed both - but the host computer would only /store/ one
character to delimit the end of the line to save on storage costs.

Storage and transmission - two different jobs.  It made no sense back
then, given the very high cost of storage, to store the form you're
going to transmit if it's inefficient and can be generated from a more
efficient storage format using a trivially small amount of low-CPU-load
code as is the case with turning a stored CR into a transmitted CR/LF.
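
For what it's worth, here is a minimal sketch of the sort of translation I mean, written as a shell function. The name cr_to_crlf is made up for illustration, and it assumes the stored file uses a bare CR as its line terminator and that awk accepts a single-character RS.

```shell
#!/bin/sh
# cr_to_crlf: hypothetical helper, not from the thread.
# Reads CR-terminated text on stdin and writes CR/LF-terminated text on
# stdout, so the file on disk keeps one byte per line ending.
cr_to_crlf() {
    # RS is the input record separator (a bare CR);
    # ORS is what awk appends after each record it prints (CR then LF).
    awk 'BEGIN { RS = "\r"; ORS = "\r\n" } { print }'
}

# Example use (file and device names are assumptions):
#   cr_to_crlf < stored.txt > /dev/lp0
```

The stored copy stays one byte per line ending; the second byte only exists on the wire to the printer.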

> It made no sense to
> introduce a change in format just to free up one control character
> (what would you then do with it?) or save maybe 5% of file space.

Alternatively, it made no sense to waste that storage space, so that is
in fact what people did: they wrote the computer code so that they could
store text files with single character line delimiters, and the printers
either worked fine with that, or the host computer sent a CR/LF combo
(or, irritatingly, LF/CR in the case of the BBC Micro) when the original
data said `CR' only.

That's what was actually done at the time on the hardware *I* used.

It was implemented in practice and I used it.

Rowland.

Signature

Remove the animal for email address: rowland.mcdonnell@dog.physics.org
                                           Sorry - the spam got to me
http://www.mag-uk.org                             http://www.bmf.co.uk
UK biker?   Join MAG and the BMF and stop the Eurocrats banning biking

Mark Bestley - 02 Sep 2008 10:47 GMT
> > > It is certainly stupid to use both CR and LF as a combination
> > > to delimit every line ending.
[quoted text clipped - 13 lines]
> want to print - if the printer really does require the separate
> characters?

Because this was all defined in days when there were no processors.
Teletypes and telex machines.

> > And there were frequent occasions when you'd want to decouple those
> > actions, as when overstriking or vertical tabbing.  Having separate
[quoted text clipped - 8 lines]
>
> Backspace exists.

But slow

> But all that aside: you don't have to save your file on the host
> computer in the same form you send it to the printer.  It makes no sense
> to waste space when storage is so expensive, and it does make sense to
> write a tiny bit of trivial code that sends CR/LF when the source file
> says CR only.

> Best of all possible worlds, that way - if you ask me.
>
> > That sort of hardware survived long enough that there were vast
> > volumes of data designed to be printed on it.

The standards were set to include non computer equipment

Signature

Mark

Rowland McDonnell - 02 Sep 2008 10:58 GMT
> > > > It is certainly stupid to use both CR and LF as a combination
> > > > to delimit every line ending.
[quoted text clipped - 16 lines]
> Because this was all defined in days when there were no processors.
> Teletypes and telex machines.

Baudot, ITA1, and ITA2 were defined back in the
pre-digital-electronic-stored-program-computer era.

ASCII was defined long after the computer era had started.

> > > And there were frequent occasions when you'd want to decouple those
> > > actions, as when overstriking or vertical tabbing.  Having separate
[quoted text clipped - 10 lines]
>
> But slow

Only if you're using a teletype or something else with a proper carriage
return mechanism.

Yer typical dot matrix printer can whizz back to the start of a line as
fast as the stepper motor can whine, regardless of whether or not it's
receiving a carriage return or multiple backspaces - assuming the data
feed is fast enough, that is.

Page and line printers pay no attention to such concepts.

But this is getting away from my main point.

What I'm trying to get across here is the idea that *storing a CR/LF
pair in your computer file is not that sensible because it's
inefficient* - a point that's not terribly critical these days, but did
matter back in the 70s and 80s when storage was more expensive.

There are valid reasons for being able to *send* control codes to
devices to control them - but why store 'em in the file?  Just a waste
of space.

> > But all that aside: you don't have to save your file on the host
> > computer in the same form you send it to the printer.  It makes no sense
[quoted text clipped - 8 lines]
>
> The standards were set to include non computer equipment

The computer standards were not.

Rowland.

Bruce Horrocks - 03 Sep 2008 00:56 GMT
>>> It is certainly stupid to use both CR and LF as a combination
>>> to delimit every line ending.
[quoted text clipped - 5 lines]
> delimit a line ending?  Why transmit two characters as a line delimiter
> when you only need one to delimit a line ending?

Because ASCII was invented to standardise codes used on teletypes, and
this was done before there were computers. So the codes reflect
functions required for teletypes. For historical reasons derived from
typewriters, teletypes used one code to return the carriage to the
beginning of the line and another to advance the paper. Hence CR
and LF.

A /different/ question is why people decided to use two codes for end of
line in computer files? And the reasons for this are less clear. My best
guess, and it is only a guess really, is that teletypes were used as the
standard output device on early computers, so it was easier to use CRLF
at the end of a line as any file echoed to the teletype would print
properly.

I agree with your later point that a small program could easily have
translated an EOL code to CRLF when printing to a teletype. However, I
suspect that this was not done for either or both of two reasons.
Firstly, with CRLF stored in the file, a single "dump file to output
device" program could be used whether that device was a paper tape, mag
tape or teletype. (And in those early days, one program instead of two
similar ones, or one with extra options, was a big saving.)

Secondly, the early computers often had dedicated i/o hardware
associated with peripherals that avoided loading the main CPU. It is
therefore possible that "dump file to teletype" was hardware assisted
which would make interpreting the file to translate EOL to CRLF much
harder whereas a simple dump was, well, simple.

In hindsight, it looks like an odd decision and I can't really defend
it. However, back in those days, the people working with computers were
really rather bright mathematicians etc. I'm sure that they would have
appreciated that they were wasting space in file store and comms, and so
wouldn't have made the decision lightly.

Regards,
Signature

Bruce Horrocks
Surrey
England
(bruce at scorecrow dot com)

Jack Campin - bogus address - 03 Sep 2008 11:47 GMT
>>> It is certainly stupid to use both CR and LF as a combination
>>> to delimit every line ending.
[quoted text clipped - 4 lines]
> to delimit a line ending?  Why transmit two characters as a line
> delimiter when you only need one to delimit a line ending?

In some situations, there were NO line ending markers at all.  When
your data was on 80-column cards, the mapping from "next card" to
"start of new line" was implicit - I used to take decks of cards
and print them on a teletype with no computer at all being involved,
it would just roll the carriage up a line for every card fed in.

Many computer file systems (from 1960s mainframes on) implemented
80-column card images as an allocation unit, whether the medium
was cards, tape or disk.  The delimiters were either nonexistent
or invisible to the user.  As other people have pointed out, you
needed more than that when you had to control I/O hardware.  Given
run length encoding in the disk or tape drivers, an 80-column card
image is as space-efficient as any other data structure for text
files.

==== j a c k  at  c a m p i n . m e . u k  ===  <http://www.campin.me.uk> ====
Jack Campin, 11 Third St, Newtongrange EH22 4PU, Scotland == mob 07800 739 557
CD-ROMs and free stuff:  Scottish music, food intolerance, and Mac logic fonts

Woody - 31 Aug 2008 15:40 GMT
> > > > The key sequence is,
> > > >
[quoted text clipped - 10 lines]
> > *SOME* text files created by Windoze have that hopelessly obsolete
> > brain-dead waste-of-space line terminator; not all.

Not all, just most.
Very few native windows applications will not write cr/lf as many
windows applications will not display any other type of line terminator
[1]. It is hardly obsolete as it is used everywhere.

> > (why oh why did MS ever decide to do it, I mean why waste space like
> > that?  WHY???)
>
> Because they followed various standards. Unix was the non standard way.
> RFCS for Mail, HTTP use CRLF.  (And did anyone but Mac just have CR?)

No, that is a mac only thing.

> They are based on how printers and teletypes work.

Indeed, and on that basis they are the ones that make sense.

> A version of the history is <http://en.wikipedia.org/wiki/Newline>

Why everyone couldn't pick the same standard really, whatever it was.

[1] Notepad won't. Visual Studio will, but will tell you that your line
terminators are wrong.

Signature

Woody

www.alienrat.com

Rowland McDonnell - 01 Sep 2008 06:15 GMT
> > > > > The key sequence is,
> > > > >
[quoted text clipped - 15 lines]
> windows applications will not display any other type of line terminator
> [1]. It is hardly obsolete as it is used everywhere.

`Everywhere' in the Windoze world.  Not so in the Unix world.

It's obsolete because the technical reasons for using it vanished in the
1970s if not before, and the only reason we still use it is that CP/M
used it, so MS-DOS used it and it's just become a nasty inefficient
habit.

> > > (why oh why did MS ever decide to do it, I mean why waste space like
> > > that?  WHY???)
[quoted text clipped - 3 lines]
>
> No, that is a mac only thing.

No, it's also an `all the 8 bit micros you've heard of except for CP/M'
thing.

But the idiotic inefficient wasteful CP/M way of doing things took over
for historical reasons I don't have to explain to you, surely?

> > They are based on how printers and teletypes work.
>
> Indeed, and on that basis they are the ones that make sense.

But it does not make sense because printers and teletypes can generate
their own `LF' on seeing a `CR', thus permitting a saving on storage and
transmission.

> > A version of the history is <http://en.wikipedia.org/wiki/Newline>
>
> Why everyone couldn't pick the same standard really, whatever it was.

They did all pick the same standard, except for IBM.  That standard was
ASCII.

They all implemented this standard in a different way.

Most firms were bright enough to use either CR *OR* LF.  A few used
both.  Unfortunately, MS-DOS was a port of CP/M, one of the brain-dead
few that used both.

[snip]

Rowland.

Rowland McDonnell - 01 Sep 2008 06:15 GMT
> > > > The key sequence is,
> > > >
[quoted text clipped - 15 lines]
>
> Because they followed various standards.

But why waste time following an obsolete standard designed for the
1960s?  Designed for technology that was obsolete before they started
designing the IBM PC in the first place?

> Unix was the non standard way.

`The' non standard way?  No, there were multiple ways of doing things,
none of which were accepted as `the standard'.

IBM didn't use ASCII at all and certainly didn't bother wasting space
like that.

DEC did waste space.

Unix didn't waste space.

<shrug>

You could argue that it was only the lunatic fringe that used the
inefficient CR/LF combination.

> RFCS for Mail, HTTP use CRLF.

Really?  Tell me more: which RFCs?

They were surely all written *AFTER* MS-DOS had taken over the world?
So specifying CR/LF just means `We want to be compatible with the lowest
common denominator' which makes a sort of sense - and you might as well
do so come 1990 (when the Web started) because storage and transmission
costs had dropped a *LOT* by then compared to the 1970s.

>  (And did anyone but Mac just have CR?)

How about pretty much every 8 bit micro in the known universe /except/
for the CP/M crowd - they used CR/LF which I hope explains to you why
MS-DOS and Windoze got CR/LF.

> They are based on how printers and teletypes work.

Printers and teletypes have long (maybe always?) had the ability to
provide their own CR/LF combination from a single character input[1].
So there is no sanity in wasting storage space *saving* two characters
when only one is required to be *stored* - especially back in the 1970s
when CP/M was created and storage was very expensive.

<shrug>  But CP/M did it the insane way.

Even if you do need to kick out a CR/LF pair, you can do that at
printing time with a very small bit of code - thus saving expensive
storage space even taking into account the fact that you need more code
to run things.

> A version of the history is <http://en.wikipedia.org/wiki/Newline>

Reads a bit dodgily to me.  However: it seems to me from that that CR/LF
was a weird way of doing it even back in the old days.

Rowland.

[1]  I used to think otherwise, having had a teletype myself and having
to send it CR/LF (BBC Micros send LF/CR when set to send both - fine for
a 1980s dot matrix, NBG for an old teletype.  So I had to write an
output routine patch to send an extra CR) - but then I found out that
they had a *mechanical* adjustment to change to the other mode of
operation.  But I'd had no manual for the thing and no way was I going
to poke around tweaking things inside a machine that complex without
guidance.

Rod - 01 Sep 2008 09:34 GMT
>>>>> The key sequence is,
>>>>>
[quoted text clipped - 80 lines]
> to poke around tweaking things inside a machine that complex without
> guidance.

ICL's VME, which started in the teletype era, used EBCDIC. It had LF, CR
and NL (x'15'). NL combined the functions of LF and CR in one character.
But x'14' and x'15' had different uses in older-style teletypes, I gather.

Signature

Rod

Hypothyroidism is a seriously debilitating condition with an insidious
onset.
Although common it frequently goes undiagnosed.
<www.thyromind.info> <www.thyroiduk.org> <www.altsupportthyroid.org>

Rowland McDonnell - 02 Sep 2008 07:28 GMT
[snip]
> >> They are based on how printers and teletypes work.
> >
[quoted text clipped - 3 lines]
> > when only one is required to be *stored* - especially back in the 1970s
> > when CP/M was created and storage was very expensive.

[snip]

> > [1]  I used to think otherwise, having had a teletype myself and having
> > to send it CR/LF (BBC Micros send LF/CR when set to send both - fine for
[quoted text clipped - 7 lines]
> ICL's VME, which started in the teletype era, used EBCDIC. It had LF, CR
> and NL (x'15'). NL combined the functions of LF and CR in one character.

Now *that* makes good sense.

btw, EBCDIC came in many forms - within IBM, not just due to those who
lifted it for their own use.

> But x'14' and x'15' had different uses in older-style teletypes, I gather.

Does x'14' mean 20, and x'15', 21?
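
For the record, the shell can do this sort of conversion itself, assuming the x'..' notation is plain hexadecimal. A tiny sketch (hex_to_dec is a made-up name, not a standard command):

```shell
#!/bin/sh
# hex_to_dec: hypothetical helper, not from the thread.
# Prints the decimal value of each hexadecimal argument, one per line.
hex_to_dec() {
    for h in "$@"; do
        # printf accepts C-style 0x constants for %d.
        printf '%d\n' "0x$h"
    done
}

# hex_to_dec 14 15    (the two EBCDIC codes mentioned above)
# hex_to_dec B        (the "char no. 11" from the start of the thread)
```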

Older teletypes used Baudot code (ITA1), or International Telegraph
Alphabet No 2 (ITA2).

Read and learn - I did: <http://en.wikipedia.org/wiki/Baudot_code>.

Baudot was arranged in a particular way to minimise operator fatigue.
It's a five bit code and the original scheme involved a five finger
chording keyboard - for direct binary input, required to be synchronous
with the `whole system', whatever that was (or so it seems to me from
reading the Wikipedia page).

CR and LF didn't turn up until the 1901 creation of a different code
arrangement - created by the bloke who'd just invented a typewriter-like
keyboard for teletype operation.

Rowland.

Mark Bestley - 01 Sep 2008 13:46 GMT
> > > > > The key sequence is,

> > Because they followed various standards.
>
[quoted text clipped - 28 lines]
> do so come 1990 (when the Web started) because storage and transmission
> costs had dropped a *LOT* by then compared to the 1970s.

Most if not all. A quick search for CRLF gives an early one as RFC561 -
Standardizing Network Mail Headers from September 1973
I know it is in the Mail and Http RFC having had to debug some agents.

and as these standards were in use you needed to keep them for
compatibility.

> Printers and teletypes have long (maybe always?) had the ability to
> provide their own CR/LF combination from a single character input[1].
[quoted text clipped - 8 lines]
> storage space even taking into account the fact that you need more code
> to run things.

Yes but how do you go to the beginning of the line to overwrite? That is
one reason that CR is on its own. Plus I have seen LF on its own used to
print things

> > A version of the history is <http://en.wikipedia.org/wiki/Newline>
>
> Reads a bit dodgily to me.  However: it seems to me from that that CR/LF
> was a weird way of doing it even back in the old days.

What exactly is dodgy about this - it seems to roughly match information
I have seen over the years.

Signature

Mark

Tim Streater - 01 Sep 2008 15:25 GMT
> > Really?  Tell me more: which RFCs?
> >
[quoted text clipped - 10 lines]
> and as these standards were in use you needed to keep them for
> compatibility.

A quick google for "SMTP mail" led to the wikipedia article and to RFCs
2821 and 2822 (both pretty recent) which both unequivocally state that
CRLF is mandatory.

Richard Tobin - 01 Sep 2008 16:09 GMT
>Most if not all. A quick search for CRLF gives an early one as RFC561 -
>Standardizing Network Mail Headers from September 1973
>I know it is in the Mail and Http RFC having had to debug some agents.

I think the first RFC that standardised this was RFC139, for
Telnet (May 1971):

  The representation of the end of a physical line at a terminal is
  implemented differently on network HOSTS.  For example, some use a
  return (or new line) key, the terminal hardware both returns the
  carriage or printer to start of line and feeds the paper to the next
  line.  In other implementations, the user hits carriage return and
  the hardware returns carriage while the software returns to the
  terminal a line feed.  The network-wide representation will be
  carriage return followed by line feed.  It represents the physical
  formatting that is being attempted, and is to be interpreted and
  appropriately translated by both using site and serving site.

This was referring to the encoding of the data.  Later RFCs have (I
think) uniformly used CR-LF for textual protocol header fields (one of
the nice things about internet protocols is that many of them use
human-readable headers, unlike the bit-saving formats in, for example,
the Janet protocols).  HTTP is a relatively recent example.

-- Richard
Signature

Please remember to mention me / in tapes you leave behind.

Rowland McDonnell - 02 Sep 2008 07:28 GMT
> >Most if not all. A quick search for CRLF gives an early one as RFC561 -
> >Standardizing Network Mail Headers from September 1973
[quoted text clipped - 19 lines]
> human-readable headers, unlike the bit-saving formats in, for example,
> the Janet protocols).  HTTP is a relatively recent example.

Hmm.

Hmmmmmm...........

Righto.

I shall be thinking about this.

Rowland.

Rowland McDonnell - 02 Sep 2008 07:28 GMT
> > > > > > The key sequence is,
[snip]

> > > RFCS for Mail, HTTP use CRLF.
> >
[quoted text clipped - 8 lines]
> Most if not all. A quick search for CRLF gives an early one as RFC561 -
> Standardizing Network Mail Headers from September 1973

Oh!  Hmm.  Interesting.  1973, eh?

Hmmm....

I want to learn more about why they did it.

> I know it is in the Mail and Http RFC having had to debug some agents.
>
> and as these standards were in use you needed to keep them for
> compatibility.

Aye - but you don't have to use them internally, do you?   Use efficient
methods inside your OS environment, and translate to the inefficient
communication protocol only when you have to.

> > Printers and teletypes have long (maybe always?) had the ability to
> > provide their own CR/LF combination from a single character input[1].
[quoted text clipped - 11 lines]
> Yes but how do you go to the beginning of the line to overwrite? That is
> one reason that CR is on its own.

The reason that one could adjust teletypes to provide an LF in addition
to performing a CR on receipt of a CR is that quite often *that* was
rather handy.  Backspace characters exist, you know.

But if you have a teletype as a terminal, the original ASCII control
code set makes some sort of sense.  I never suggested otherwise.

However, they make no sense in a world of printers with buffers, and
electronic displays such as CRT monitors.  Use X,Y addressing, use page
printing methods, use line printing methods.  There is no need to
perform a `CR' operation when printing in the modern sense, is there?

There was never any need to do so when printing from late 70s/early 80s
home micros onto normal dot matrix printers of that era, either.

By that time, the ability to perform a separate `carriage return'
operation had ceased to make any sense from the point of view of
printing utility - although it did make sense back in the teletype era.

> Plus I have seen LF on its own used to
> print things

Yes, but it makes a lot more sense to do things the PostScript way,
don't you think?

Generate a page, and then print it?

I think you've hit upon why it is that most early 80s PCs used CR as a
`line terminator' - there was no need back then for the ability to
perform a carriage return, but winding up a single line?  Yes.

Although there is also a vertical tab character defined in ASCII - as
well as an escape character, which permits any extension to ASCII you
like.  So if you really want to build a printer which lets the user play
around with it in fashions that have no obvious utility, why not?

> > > A version of the history is <http://en.wikipedia.org/wiki/Newline>
> >
> > Reads a bit dodgily to me.  However: it seems to me from that that CR/LF
> > was a weird way of doing it even back in the old days.
>
> What exactly is dodgy about this

It reads like unreliable information - the style of presentation damns
it according to my nose.

> - it seems to roughly match information
> I have seen over the years.

Righto - thanks for the confirmation of its validity.

Rowland.

Rowland McDonnell - 01 Sep 2008 06:56 GMT
> > Okay, it's like this:
> >
[quoted text clipped - 42 lines]
> The Ctrl-V says, expect a control character next, and the Ctrl-I is tab.
> Ctrl-L is LF and Ctrl-M is CR.

Umm.  I need LFs.

(LF/CR is surely wrong?  CR/LF would be less wrong, but still wrong.)

At least, I *thought* I needed LFs.  I've just done it with LFs alone -
TextEdit doesn't like the result, and nor does TeXShell.

TeXShell claims to like both Unix and Mac line terminators.

It seems that here&now, TextEdit and TeXShell don't work with Unix line
terminators (LF) but do work with Mac and Windoze line terminators (CR
or CR/LF).

Does anyone have any remarks to make that might shed some light on this
bizarrity?
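
One way to find out what terminators a file really contains is to count the bytes. The sketch below assumes a POSIX tr and wc; count_byte and count_eol are made-up names. It counts form feeds as well, because Ctrl-L normally produces a form feed (char 12), whereas LF is char 10 (Ctrl-J).

```shell
#!/bin/sh
# count_byte and count_eol are hypothetical helpers, not from the thread.

# count_byte ESCAPE FILE: count occurrences of one byte, given as a tr
# escape such as '\r', in FILE.
count_byte() {
    # tr -cd deletes every byte except the one named; wc -c counts the rest.
    tr -cd "$1" < "$2" | wc -c | tr -d ' '
}

# count_eol FILE: report how many CR, LF and form feed bytes FILE holds.
count_eol() {
    printf 'CR=%s LF=%s FF=%s\n' \
        "$(count_byte '\r' "$1")" \
        "$(count_byte '\n' "$1")" \
        "$(count_byte '\f' "$1")"
}

# count_eol outputfile
```

Only CRs would suggest old Mac line endings, only LFs Unix ones, and matching counts of both most likely CR/LF pairs. A non-zero FF count would suggest the replacement character was not an LF at all.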

> You can obviously replace tab with either CR,
> LF or both, or whatever. The g on the end says to replace all occurrences.
[quoted text clipped - 13 lines]
>
> Hope that's what you're after there,

Something like it - thanks.

When I type in the line you suggest, and wrap round for a second line of
text entry, I get the second line appearing on the same line as the
first.

It seems to work anyway.  I do get inconsistent behaviour when typing in
these lines - sometimes what appears is what you posted here, sometimes
I get crazy stuff like:

sed -E 's/\342\210\232\313\232

which just appeared when I was trying to type in another attempt to
replace character 11 (dec) with \par.

Any idea what's going on with all this?  It makes it very hard to recall
what you've done when you can't read it on account of it being munged
several different ways by the Terminal.

What the Terminal's doing at the moment makes it impossible to `go back
a line' with up-arrow, so it's turned out quite tedious to do the
required fiddling to get it to work.

(it's all well and good having a command to enter, but you have got to
get it right - and when part of what you type doesn't appear or appears
in random guises, and the rest of it is combinations of \'/, mistakes
are easy.  Very very easy.  Oh god not again!  ARGH!  That should have
been a backslash, boy, a BACKSLASH!!!! <cough>)

> You can use the same technique for your
> hex character: B is 11, and K is the 11th letter of the alphabet, so
> use Ctrl-V Ctrl-K as the thing to be replaced for that one.

Yep, I've got that to work too.

So my question is: how come this works?  I've not seen any reference to
this way of entering data for sed anywhere I've looked.

Rowland.

Gary - 01 Sep 2008 16:58 GMT
> Yep, I've got that to work too.
>
> So my question is: how come this works?  I've not seen any reference to
> this way of entering data for sed anywhere I've looked.
>
> Rowland.

It's not a sed thing specifically. It's a shell thing. Normally if you wanted
x'03' and just pressed CTRL-c, you'd terminate whatever you were doing as the
CTRL-c would be picked up by the shell. The CTRL-v first tells the shell that
the next CTRL- character is not to be interpreted as a signal, but just used
as data.

Signature

remove stars for email
g*a*r*y*c*o*w*e*l*l*a*t*m*a*c*d*o*t*c*o*m

Richard Tobin - 01 Sep 2008 17:23 GMT
>It's not a sed thing specifically. It's a shell thing.

It's not even a shell thing, though probably bash implements its own
version of it.  It's in the terminal driver (see the termios man page,
under LNEXT).
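
To see which key your own terminal has bound to LNEXT, you can ask stty. A minimal sketch, assuming `stty -a` prints its special characters in the usual `name = value;` form; lnext_key is a made-up helper name:

```shell
#!/bin/sh
# lnext_key: hypothetical helper, not from the thread.
# Reads `stty -a` output on stdin and prints the key bound to lnext.
lnext_key() {
    sed -n 's/.*lnext = \([^;]*\);.*/\1/p'
}

# At an interactive prompt:
#   stty -a | lnext_key
```

On a default setup this would typically report ^V, i.e. Ctrl-V.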

-- Richard
Signature

Please remember to mention me / in tapes you leave behind.

Gary - 01 Sep 2008 17:54 GMT
> It's not even a shell thing, though probably bash implements its own
> version of it.  It's in the terminal driver (see the termios man page,
> under LNEXT).
>
> -- Richard

Yep. People don't usually get that close to the terminal driver, but it's
true. I use this technique in applications where my login shell _is_ the
application (from /etc/passwd) so yep. I should not have been so generic.

/me shoots myself in all three feet.

Signature

remove stars for email
g*a*r*y*c*o*w*e*l*l*a*t*m*a*c*d*o*t*c*o*m

Rowland McDonnell - 02 Sep 2008 07:47 GMT
> >It's not a sed thing specifically. It's a shell thing.
>
> It's not even a shell thing, though probably bash implements its own
> version of it.  It's in the terminal driver (see the termios man page,
> under LNEXT).

Ye gods.  Urgh.

Okay - no wonder I've never come across this particular bobble.

Even if I'd met the man page in question, I think this intro:

SYNOPSIS

    #include <termios.h>

DESCRIPTION

    This describes a general terminal line discipline that is supported
    on tty asynchronous communication ports.

would have put me off reading it instantly.  Hell, that intro *still*
puts me off reading it...

But I've had a look and I don't see how I'm supposed to be able to go
from:

LNEXT   Special character on input and is recognized if the IEXTEN flag
       is set.  Receipt of this character causes the next character to
       be taken literally.

to understanding that that's what ctrl-v does.  I mean, yeah, okay, it's
talking about the job in hand - but really...  This is, let's face it,
not really what one would call user documentation, is it?

The wonderful world of man pages, eh?

Cheers,
Rowland.

Rowland McDonnell - 04 Sep 2008 09:33 GMT
> > Okay, it's like this:
> >
[quoted text clipped - 60 lines]
> hex character: B is 11, and K is the 11th letter of the alphabet, so
> use Ctrl-V Ctrl-K as the thing to be replaced for that one.

The above works okay - but I've run into another problem.

I'd like to put the above into a command line script - but the above
stuff's for the Terminal, not TextEdit.

Any ideas what I might try?

Cheers,
Rowland.

Justin C - 04 Sep 2008 22:58 GMT
>> $ sed -E 's/  /^L^M/g' <afile
>> a
[quoted text clipped - 6 lines]
> I'd like to put the above into a command line script - but the above
> stuff's for the Terminal, not TextEdit.

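------
#!/bin/sh

# ^L and ^M stand for the literal control characters (entered with
# Ctrl-V first); $1 is the input file and $2 the output file.
sed -E 's/  /^L^M/g' < "$1" > "$2"
------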

Call the file whatever you like, 'boris' or something. Don't forget to make it executable ('chmod u+x boris'), then:
$ boris input.txt output.txt
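
If getting literal control characters into a script file from a text editor is awkward, another approach is to have the shell generate them with printf. This is only a sketch: the name tex_munge is made up, the three substitutions are the ones asked for at the top of the thread, and it assumes a POSIX sh, sed and tr.

```shell
#!/bin/sh
# tex_munge: hypothetical helper, not from the thread.
# Filters stdin to stdout, applying:
#   &        -> \&
#   char 11  -> \par
#   tab      -> newline
tex_munge() {
    # Build char 11 (octal 013, vertical tab) without typing it literally.
    vt=$(printf '\013')
    # In the first expression, \\ is a literal backslash and \& a literal
    # ampersand. The second splices $vt in between single-quoted pieces.
    sed -e 's/&/\\\&/g' -e 's/'"$vt"'/\\par/g' |
        tr '\t' '\n'    # tr understands \t and \n itself
}

# In a script taking input and output file names:
#   tex_munge < "$1" > "$2"
```

Nothing in the file is a control character, so it survives being edited and pasted in any text editor.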

    Justin.

Signature

Justin C, by the sea.

jim - 05 Sep 2008 06:36 GMT
> ------
> #!/bin/sh
>
> sed -E 's/  /^L^M/g' < $1 > $2
> ------
>
> Call the file whatever you like, 'boris' or something. Don't forget to
> make it executable ('chmod u+x boris'), then:
> $ boris input.txt output.txt

For that to work you'll also have to move 'boris' to somewhere like
/usr/local/bin

Otherwise you'll have to type

$ ./boris input.txt output.txt

Jim
Signature

"Well, well. We've come a long way from the Prime Minister's
exploding cake." - Adam West, Batman.

http://www.UrsaMinorBeta.co.uk   http://twitter.com/GreyAreaUK

Rowland McDonnell - 08 Sep 2008 09:25 GMT
> > ------ #!/bin/sh
> >
[quoted text clipped - 9 lines]
>
> $ ./boris input.txt output.txt

I was thinking of saving the file with a .command extension and running
it using the Finder.

Rowland.

Justin C - 08 Sep 2008 23:38 GMT
>> > ------ #!/bin/sh
>> >
[quoted text clipped - 12 lines]
> I was thinking of saving the file with a .command extension and running
> it using the Finder.

How will you pass it the names of the file(s) to process?

WRT character number 11, I don't know how to do that with sed.
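
For what it's worth, one approach that appears to work with sed is to let the shell generate the character with printf, so it never has to be typed into an editor. This is only a sketch: the helper name replace_vt is made up, and it assumes a POSIX shell whose printf understands octal escapes.

```shell
#!/bin/sh
# Sketch only: replace_vt is a hypothetical helper name.

# Let printf produce character 11 (octal 013, hex 0b) and keep it in a variable.
vt=$(printf '\013')

replace_vt() {
    # Reads stdin, writes stdout. Inside double quotes the shell turns
    # \\\\ into \\, which sed then reads as one literal backslash.
    sed "s/$vt/\\\\par/g"
}

# Example use: replace_vt < input.txt > output.txt
```

The double quotes are needed here so that the shell expands $vt before sed sees the expression, which is also why the backslashes have to be doubled twice.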

Here's the Perl I worked out for the char 11 part. It requires the files
to be processed to be the *only* files in the directory. It creates
new files with the prefix "new_". It's untested.

If you wish to give it a try, copy and paste it into TextMate (which it
sounds like you have), save it, and make it executable (as per a previous
post). In Terminal, navigate to the directory containing the files on
which the search/replace needs to be performed, and type:
   ~/path/to/executable

It'll work through your files; wherever it finds char 11 (hex 0b) it'll
replace it with the character string \par (the first \ in the replace
string is there to escape the second, it isn't a typo). The output goes
into a new file, leaving the original data intact.

I just created a few test files, and ran a few tests (so it is
partially tested now with fake data), and it appears to do what you
want. I release it into the public domain.

To any perl gurus out there, yes, I know there are probably better
ways of doing it, but I don't know them! And to those command line
gurus: This is the only way I know how!

[begin here]
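#!/usr/bin/perl

use warnings;
use strict;

# Take every plain file in the current directory (directories are skipped).
my @files = grep { -f $_ } glob "*";

foreach my $oldname ( @files ) {
   my $newname = "new_" . $oldname;
   open (my $in, "<" , $oldname)
                   or die "Cannot open $oldname : $!";
   open (my $out, ">" , $newname)
                   or die "Cannot create $newname : $!";

   # Replace each char 11 (hex 0b) with the literal string \par.
   while ( my $line = <$in> ) {
       $line =~ s/\x0b/\\par/g;
       print $out $line;
   }

   close $in;
   close $out or die "Cannot close $newname : $!";
}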
[done] <-- don't copy this or the begin line, but you know that
already.

    Justin.

Signature

Justin C, by the sea.

Rowland McDonnell - 08 Sep 2008 09:25 GMT
> >> $ sed -E 's/  /^L^M/g' <afile
> >> a
[quoted text clipped - 16 lines]
> make it executable ('chmod u+x boris'), then:
> $ boris input.txt output.txt

Aha!

Yes - got that.

But, erm, I'm still stuck with the `character number 11' problem - I
can't enter that into TextEdit, can I?

Cheers,
Rowland.


Bruce Horrocks - 08 Sep 2008 22:05 GMT
>>>> $ sed -E 's/  /^L^M/g' <afile
>>>> a
[quoted text clipped - 21 lines]
> But, erm, I'm still stuck with the `character number 11' problem - I
> can't enter that into TextEdit, can I?

Hi Rowland,

Do this instead:

perl -p -e 's/\t/\n/g;' -e 's/&/\\\&/g;' -e 's/\xB/\\par/g;' input_file > output_file

All one line.

Regards,
Signature

Bruce Horrocks
Surrey
England
(bruce at scorecrow dot com)

Justin C - 09 Sep 2008 00:54 GMT
> perl -p -e 's/\t/\n/g;' -e 's/&/\\\&/g;' -e 's/\xB/\\par/g;'
> input_file > output_file

There, I bloody knew someone would!

    Justin.

Signature

Justin C, by the sea.

 