sed usage question
Rowland McDonnell - 26 Aug 2008 00:12 GMT Okay, it's like this:
I'm trying to work out a sed incantation to perform three simple search and replace jobs on a single file.
I want to replace every occurrence of:
<tab> with <newline>
& with \&
char no. 11 with \par
(where <tab> and <newline> are the characters with those meanings to the computer, and \& and \par are those literal strings. LaTeX speak, for those who might be curious. And that's char 11 in dec; B in hex)
I've read the man page, I've read some tutorials (including a tutorial on quoting which left me totally incomprehending, damnit). I've tried to figure out what to do and failed.
But:
sed 's/\&/\\&/g' <inputfile >outputfile
does one of the three jobs I need to have done.
Here are my other two attempts:
sed 's/\t/\n/g' <inputfile >outputfile
replaces `t's with `n's rather than \t (tab)s with \n (newline)s as I thought it should do.
sed 's/\xB/\\par/g' <inputfile >outputfile
seems to do bugger all rather than replacing character number B (hex) with the string \par as I thought it should do.
So: can anyone suggest how to get sed to do the two jobs I can't work out myself?
(And can anyone explain why I have to put the s/ [...] argument to sed inside quote marks? I can't see any reason for needing to do that myself.)
Cheers, Rowland.
 Signature Remove the animal for email address: rowland.mcdonnell@dog.physics.org Sorry - the spam got to me http://www.mag-uk.org http://www.bmf.co.uk UK biker? Join MAG and the BMF and stop the Eurocrats banning biking
Steve Firth - 26 Aug 2008 00:48 GMT > I'm trying to work out a sed incantation to perform three simple search > and replace jobs on a single file. Yes, well since you're the sort of person that can determine guilt or innocence based simply on a photograph, no doubt something as simple as RTFM isn't beyond you. So off you go and stop snivelling.
Rowland McDonnell - 26 Aug 2008 02:00 GMT > > I'm trying to work out a sed incantation to perform three simple search > > and replace jobs on a single file. > > Yes, well since you're the sort of person that can determine guilt or > innocence based simply on a photograph, no doubt something as simple as > RTFM isn't beyond you. So off you go and stop snivelling. I've read the manual, you ill-bred oaf. I developed my attempts based on my best understanding of the documentation I could find.
And then I asked a straightforward technical question, presenting my best ideas so far based on my best understanding of the available documentation.
Your point in making your post was to achieve what technical aim?
And I've snivelled about it precisely where? Can you show me? Here is my original question:
(don't bother replying: I'm not interested in your personal opinions.)
========================================================================
Okay, it's like this:
I'm trying to work out a sed incantation to perform three simple search and replace jobs on a single file.
I want to replace every occurrence of:
<tab> with <newline>
& with \&
char no. 11 with \par
(where <tab> and <newline> are the characters with those meanings to the computer, and \& and \par are those literal strings. LaTeX speak, for those who might be curious. And that's char 11 in dec; B in hex)
I've read the man page, I've read some tutorials (including a tutorial on quoting which left me totally incomprehending, damnit). I've tried to figure out what to do and failed.
But:
sed 's/\&/\\&/g' <inputfile >outputfile
does one of the three jobs I need to have done.
Here are my other two attempts:
sed 's/\t/\n/g' <inputfile >outputfile
replaces `t's with `n's rather than \t (tab)s with \n (newline)s as I thought it should do.
sed 's/\xB/\\par/g' <inputfile >outputfile
seems to do bugger all rather than replacing character number B (hex) with the string \par as I thought it should do.
So: can anyone suggest how to get sed to do the two jobs I can't work out myself?
(And can anyone explain why I have to put the s/ [...] argument to sed inside quote marks? I can't see any reason for needing to do that myself.)
Cheers, Rowland.
Rowland McDonnell - 26 Aug 2008 02:06 GMT > > I'm trying to work out a sed incantation to perform three simple search > > and replace jobs on a single file. > > Yes, well since you're the sort of person that can determine guilt or > innocence based simply on a photograph, no doubt something as simple as > RTFM isn't beyond you. So off you go and stop snivelling. (I need to add:
The creature Firth's claim about me being able to determine guilt or innocence based on a photograph is a product purely of his own diseased imagination and has no basis in reality or any claim that I have made.
The creature Firth insults me with his dishonest claim that my straightforward technical question could be classed as `snivelling' by anyone not an inveterate liar.
The creature Firth has decided to insult me by suggesting that I RTFM, when it is perfectly clear that I've done just that and got 1/3 of the way towards my end goal and no further since TFM is FS.
Don't take my word for it:
<http://www.grymoire.com/Unix/Sed.html#toc-uh-0>:
"Anyhow, sed is a marvelous utility. Unfortunately, most people never learn its real power. The language is very simple, but the documentation is terrible."
Moreover, the creature Firth was in my killfile for a reason I had forgotten: his replies to me almost always contain nothing but ill-mannered insults of a sort that demonstrate little more than the fact he was brought up very badly and has no respect for social decency.)
Rowland McDonnell - 26 Aug 2008 02:16 GMT > > I'm trying to work out a sed incantation to perform three simple search > > and replace jobs on a single file. > > Yes, well since you're the sort of person that can determine guilt or > innocence based simply on a photograph, no doubt something as simple as > RTFM isn't beyond you. So off you go and stop snivelling. Firth, you are scum plain and simple.
This reply of yours, which I downloaded specially since you are in my killfile, was one I had hoped contained an honest attempt at a useful reply.
What I got was something that upset me very badly because it's such a shitty response.
Yes, that's right: you've upset me very very badly.
So f.ck you Firth, and I hope you die slowly and horribly of something very nasty in the near future.
Rowland.
Steve Firth - 26 Aug 2008 10:27 GMT > Yes, that's right: you've upset me very very badly. Good, now you know how the way that you treat every other user of this group feels to the person that is treated that way. I know you won't learn a lesson, since you're learning resistant, but your nasty little sign off proves what a poostain on the bedsheet of life you are, and I hope it acts as a warning to others about your character, or rather lack of it.
> (don't bother replying: I'm not interested in your personal opinions.) Bwhahahahahahahahahahahahaha. And you posted three replies to show how not interested you are. By *your* standards that makes you a liar, or worse.
What do people say when they see your photograph - once they've got over the retching?
Stewart Smith - 26 Aug 2008 09:01 GMT > sed 's/\t/\n/g' <inputfile >outputfile > > replaces `t's with `n's rather than \t (tab)s with \n (newline)s as I > thought it should do. I had originally thought that it might need extra backslashes. I've definitely used a similar command in vim but maybe that uses a different syntax to standard sed. I found this:
http://sed.sourceforge.net/sed1line.txt "USE OF '\t' IN SED SCRIPTS: For clarity in documentation, we have used the expression '\t' to indicate a tab character (0x09) in the scripts. However, most versions of sed do not recognize the '\t' abbreviation, so when typing these scripts from the command line, you should press the TAB key instead. '\t' is supported as a regular expression metacharacter in awk, perl, and HHsed, sedmod, and GNU sed v3.02.80."
Looks like there's lots of useful stuff on that page anyway.
> (And can anyone explain why I have to put the s/ [...] argument to sed > inside quote marks? I can't see any reason for needing to do that > myself.) It's probably in case you want to use spaces in one of the things you're trying to match or replace. If you didn't use the quotes how would the shell know what sed is supposed to use as input?
Stewart
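A minimal sketch of the tab-to-newline job for a sed that does not understand '\t' or '\n', assuming a POSIX shell. The names tab and tabs_to_newlines are made up for illustration. printf supplies the real tab character, and in the replacement a backslash followed by a real newline stands for a newline:

```shell
# Let printf produce a real tab, so nothing has to be typed with the TAB key.
tab=$(printf '\t')

# tabs_to_newlines (hypothetical helper): filter stdin to stdout,
# turning every tab into a newline.
tabs_to_newlines() {
    # Inside double quotes, \\ reaches sed as a single backslash; the real
    # newline that follows it is what sed puts in place of each tab.
    sed "s/$tab/\\
/g"
}

printf 'a\tb\tc\n' | tabs_to_newlines
```

The sample line at the end should come out as a, b and c on three separate lines.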
Rowland McDonnell - 26 Aug 2008 17:58 GMT > > sed 's/\t/\n/g' <inputfile >outputfile > > [quoted text clipped - 6 lines] > > http://sed.sourceforge.net/sed1line.txt Ah! Examples - excellent!
> "USE OF '\t' IN SED SCRIPTS: For clarity in documentation, we have used > the expression '\t' to indicate a tab character (0x09) in the scripts. > However, most versions of sed do not recognize the '\t' abbreviation, > so when typing these scripts from the command line, you should press > the TAB key instead. '\t' is supported as a regular expression > metacharacter in awk, perl, and HHsed, sedmod, and GNU sed v3.02.80." Ah!
Any idea where I can find out about the regexp format used by sed?
I've read the man pages I can find on the subject but got baffled.
And for those who put that down to whinging, whining Rowland, one of the man pages I'm pushed towards says this about itself:
"This is an alpha release with known defects. Please report problems."
> Looks like there's lots of useful stuff on that page anyway. Well, it might not teach me what I need, but it looks about a thousand times more useful than anything I managed to find myself.
> > (And can anyone explain why I have to put the s/ [...] argument to sed > > inside quote marks? I can't see any reason for needing to do that > > myself.) > > It's probably in case you want to use spaces in one of the things you're > trying to match or replace. I get errors if I miss the quotes out, and nary a space in sight.
> If you didn't use the quotes how would the > shell know what sed is supposed to use as input? By parsing the input according to the rules? The point here is that I clearly don't understand the rules because I don't understand why the quotes are needed and I'd like to.
Rowland.
Steve Firth - 26 Aug 2008 18:06 GMT > The point here is that I clearly don't understand the rules because I > don't understand ... the difference between arse and elbow.
chris - 27 Aug 2008 09:14 GMT >>> sed 's/\t/\n/g' <inputfile >outputfile >>> [quoted text clipped - 13 lines] > clearly don't understand the rules because I don't understand why the > quotes are needed and I'd like to. The quotes are there to protect the command from being interpreted by the shell. Thus, everything within the quotes is passed to the command (sed in this case) as quoted, not following interpretation by the shell.
Take this as an example:
ls *.txt
This will pass all files that end in .txt (as matched by the shell) in the current directory to the 'ls' command, which will then list them on screen.
ls '*.txt'
This will only list the one file that matches exactly to *.txt, if it exists.
So, in the first example the *.txt is interpreted by the shell to expand to all files that match the '*' wildcard and ending in '.txt'. Then all those files are passed to ls for output.
I'll leave it to you to read up on the difference between single quotes and double quotes.
A book on bash may well be a good start as manpages don't seem to give you the information in a way you like.
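To make the quoting point concrete for the sed command in question, here is a small sketch, assuming a POSIX shell; show_arg is a made-up helper that just prints what the shell hands over. Without quotes the shell strips one layer of backslashes and reads a bare & as "run what came before in the background", so the rest of the line is treated as a separate command, which would explain errors with no space in sight:

```shell
# show_arg (hypothetical helper): print the first argument exactly as the
# shell delivered it, without interpreting any backslashes.
show_arg() {
    printf '%s\n' "$1"
}

# Single-quoted: the shell passes the text through untouched.
show_arg 's/\&/\\&/g'

# Unquoted: every backslash and every & has to be escaped by hand
# to hand the same text to the command.
show_arg s/\\\&/\\\\\&/g
```

Both calls print the same text, s/\&/\\&/g, which is why the single-quoted form is the easier one to type and to read.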
Rowland McDonnell - 31 Aug 2008 03:51 GMT > >>> sed 's/\t/\n/g' <inputfile >outputfile > >>> [quoted text clipped - 16 lines] > The quotes are there to protect the command from being interpreted by > the shell. Ah!
Ah so!
Yes!
Obvious. I feel silly.
On the other hand, I've never read anything that said what you just said plainly. Why can't Unix docs just say things plainly like you've just done?
Thank you kindly, sir, you have just cleared up something that's been bothering me for a while. Argh. Arhg. My brain hurts.
> Thus, everything within the quotes is passed to the command > (sed in this case) as quoted, not following interpretation by the shell. [quoted text clipped - 6 lines] > the current directory to the 'ls' command, which will then list them on > screen. Uhuh.
> ls '*.txt' > > This will only list the one file that matches exactly to *.txt, if it > exists. By which you mean, a file with the exact name: *.txt; not the file with the name <anything>.txt.
Yes?
> So, in the first example the *.txt is interpreted by the shell to expand > to all files that match the '*' wildcard and ending in '.txt'. Then all > those files are passed to ls for output. > > I'll leave it to you to read up on the difference between single quotes > and double quotes. <heh> Cheers. ;-) However, now you've got me thinking along the right lines, the next bit shouldn't be a major problem.
> A book on bash may well be a good start as manpages don't seem to give > you the information in a way you like. Any suggestions? I've got Learning Unix for MacOS X Tiger and I reckon it's crap.
(It's like this: I've never got anywhere learning Unix stuff unless I've had a guru to hand, as in in the room, to help me out when I got stuck, which was frequently. Thing is, all I need is a hand to unstick me from the problem, and off I go again until the next patch of mud. I don't need constant hand-holding, just pulling out of the frequent bogs they drop you in to.)
Rowland.
chris - 02 Sep 2008 09:56 GMT >> ls '*.txt' >> [quoted text clipped - 5 lines] > > Yes? Correct.
>> A book on bash may well be a good start as manpages don't seem to give >> you the information in a way you like. > > Any suggestions? I've got Learning Unix for MacOS X Tiger and I reckon > it's crap. I'm afraid not. I've been able to get along with the manpages and resources on the interweb. So, I've never really needed a 'good' book on bash.
Rowland McDonnell - 02 Sep 2008 10:58 GMT [snip]
> >> A book on bash may well be a good start as manpages don't seem to give > >> you the information in a way you like. [quoted text clipped - 5 lines] > resources on the interweb. So, I've never really needed a 'good' book on > bash. Righto - thanks.
I don't think there exists a good book on the subject. But if I'm lucky enough to get a non-abusive reply to my queries here, it seems that I can, one piece at a time, learn a few things.
The problem is that anything I learn will get forgotten soon because I don't do a lot of command line stuff, and the bits and pieces I'm learning are not explained well in any documentation that I've seen.
If I weren't so f.cked in the head, I'd write a Unix manual myself. One that would be a lot better than all others in existence for beginners wanting to learn how to do the sorts of things I want to do.
Rowland.
Gary - 26 Aug 2008 23:05 GMT > Okay, it's like this: > [quoted text clipped - 27 lines] > replaces `t's with `n's rather than \t (tab)s with \n (newline)s as I > thought it should do. Okay for your tab/newline one, do this:
sed -E 's/ /^L^M/g' <afile
Now that needs some explaining
I don't think the -E is needed... But the thing in quotes is most important.
The key sequence is,
's/Ctrl-V Ctrl-I/Ctrl-V Ctrl-L Ctrl-V Ctrl-M/g'
The Ctrl-V says, expect a control character next, and the Ctrl-I is tab. Ctrl-L is LF and Ctrl-M is CR. You can obviously replace tab with either CR, LF or both, or whatever. The g on the end says to replace all occurrences.
My input file changed from
$ more afile
a b c d
The whitespace there is Tab, to:
$ sed -E 's/ /^L^M/g' <afile
a
b
c
d
Hope that's what you're after there. You can use the same technique for your hex character: hex B is 11, and K is the 11th letter of the alphabet, so use Ctrl-V Ctrl-K as the thing to be replaced for that one.
Enjoy.
 Signature remove stars for email g*a*r*y*c*o*w*e*l*l*a*t*m*a*c*d*o*t*c*o*m
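For reference, in ASCII Ctrl-I is tab (9), Ctrl-J is line feed (10), Ctrl-K is vertical tab (11), Ctrl-L is form feed (12) and Ctrl-M is carriage return (13), so it is Ctrl-J rather than Ctrl-L that gives a line feed. A sketch of the char-11 job that avoids typing control characters altogether, assuming a POSIX shell; vt and vt_to_par are made-up names, and printf builds the byte from its octal code 013:

```shell
# Character 11 (hex 0B, octal 013), the same byte that Ctrl-V Ctrl-K types.
vt=$(printf '\013')

# vt_to_par (hypothetical helper): filter stdin to stdout, replacing every
# char-11 byte with the literal string \par.
vt_to_par() {
    # Inside double quotes, \\\\ reaches sed as \\, which sed writes
    # out as one literal backslash in front of "par".
    sed "s/$vt/\\\\par/g"
}

printf 'first paragraph\013second paragraph\n' | vt_to_par
```

This sidesteps the '\xB' escape from the original question, which not every sed appears to recognise.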
Steve Folly - 26 Aug 2008 23:19 GMT On 26/08/2008 23:05, in article 0001HW.C4DA3CB3005AA279B01AD9AF@news-europe.giganews.com, "Gary" <postmaster@127.0.0.1> wrote:
> The key sequence is, > [quoted text clipped - 3 lines] > Ctrl-L is LF and Ctrl-M is CR. You can obviously replace tab with either CR, > LF or both, or whatever. The g on the end says to replace all occurences. Be careful - text files in Unix have just a LF (newline) as a line delimiter. Text files created in Windows have CR/LF delimiters.
> Enjoy. OK, you got sed working, but I wonder if 'tr' is simpler for single character to single character replacements (which does recognize \t and \n)?
$ tr '\t' '\n' < infile > outfile
 Signature Regards, Steve
"...which means he created the heaven and the earth... in the DARK! How good is that?"
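Since tr only swaps one character for another, the & to \& job still needs sed. A minimal sketch of combining the two in a pipeline, assuming a POSIX shell; latex_prep is a made-up name:

```shell
# latex_prep (hypothetical helper): tr does the one-for-one swap of tab to
# newline, then sed does the one-to-two replacement of & with \&.
latex_prep() {
    # In the sed replacement, \\ is a literal backslash and \& a literal &.
    tr '\t' '\n' | sed 's/&/\\\&/g'
}

printf 'salt & pepper\tfish & chips\n' | latex_prep
```

The sample line should come out as two lines, each with its & turned into \&.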
Gary - 26 Aug 2008 23:30 GMT > On 26/08/2008 23:05, in article > 0001HW.C4DA3CB3005AA279B01AD9AF@news-europe.giganews.com, "Gary" [quoted text clipped - 10 lines] > Be careful - text files in Unix have just a LF (newline) as a line > delimiter. Text files created in Windows have CR/LF delimiters. Well, I did think along those lines and that's why I described all the options. If I used only LF in Mac OS X terminal, I get stepped output:
$ sed -E 's/ /^L/g' <afile a b c d
> OK, you got sed working, but I wonder if 'tr' is simpler for single > character to single character replacements (which does recognize \t and \n)? > > $ tr '\t' '\n' < infile > outfile That's what so great about UNIX. You can skin the same cat 100 different ways just before you shoot yourself in the foot with it.
Still, the OP's problems were more than just single character substitutions and whilst you could pipeline a tr before/after your sed, why have another process when you don't have to? The single sed should be able to do it all.
Choices though. It's all about choices.
I might have done the whole thing with an awk. Much better than perl or somesuch :)
 Signature remove stars for email g*a*r*y*c*o*w*e*l*l*a*t*m*a*c*d*o*t*c*o*m
Rowland McDonnell - 31 Aug 2008 03:51 GMT > > The key sequence is, > > [quoted text clipped - 6 lines] > Be careful - text files in Unix have just a LF (newline) as a line > delimiter. Text files created in Windows have CR/LF delimiters. *SOME* text files created by Windoze have that hopelessly obsolete brain-dead waste-of-space line terminator; not all.
(why oh why did MS ever decide to do it, I mean why waste space like that? WHY???)
> > Enjoy. > > OK, you got sed working, but I wonder if 'tr' is simpler for single > character to single character replacements (which does recognize \t and \n)? > > $ tr '\t' '\n' < infile > outfile My wife found that one for me - I didn't. Thanks to you too for putting me on to a neat tool I didn't know about.
But: the job I want to do isn't all single character replacements. If nothing else works, I'll probably use tr in part.
Rowland.
Bruce Horrocks - 31 Aug 2008 14:18 GMT > But: the job I want to do isn't all single character replacements. If > nothing else works, I'll probably use tr in part. Sed will also work from a file of instructions so you can have the 3 commands in there rather than try and create a hugely complicated, single command line. Alternatively use a shell script that pipes the output of sed or tr to another invocation of sed or tr that does the next transformation. Repeat as required.
Regards,
 Signature Bruce Horrocks Surrey England (bruce at scorecrow dot com)
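A sketch of the "file of instructions" approach, assuming a POSIX shell with mktemp available. The variable names and the sample input are made up for illustration. The three commands go into a script file, one per line, and sed -f runs them all in a single pass:

```shell
tab=$(printf '\t')      # real tab character
vt=$(printf '\013')     # character 11 (octal 013)

# Write the three commands to a temporary script file.
script=$(mktemp)
{
    printf '%s\n' 's/&/\\\&/g'           # & becomes \&
    printf '%s\n' "s/$vt/\\\\par/g"      # char 11 becomes \par
    printf '%s\n' "s/$tab/\\" '/g'       # tab becomes newline: this command
                                         # spans two lines of the script file
} > "$script"

# Run all three substitutions over a sample line.
out=$(printf 'A & B\tC\013D\n' | sed -f "$script")
printf '%s\n' "$out"

rm -f "$script"
```

On a real job the last step would be something like sed -f scriptfile <inputfile >outputfile, with the script file kept around for reuse.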
Rowland McDonnell - 01 Sep 2008 06:15 GMT > > But: the job I want to do isn't all single character replacements. If > > nothing else works, I'll probably use tr in part. > > Sed will also work from a file of instructions so you can have the 3 > commands in there rather than try and create a hugely complicated, > single command line. I have to say I am sure it'd be /much/ easier to write a `hugely complicated' single command line than to learn how to drive sed from a file of instructions. Writing a `hugely complicated' command line is just a matter of putting one thing after another - it's no more complicated than writing the bits on separate lines. Learning how to do each thing with sed has proven very very hard. Very very very hard indeed.
But it's very easy to write a shell script with each step done with a separate call to sed (or whatever), so who needs long command lines trying to do it all in one go?
> Alternatively use a shell script that pipes the > output of sed or tr to another invocation of sed or tr that does the > next transformation. Repeat as required. I've been thinking along those lines. But thanks - if I hadn't been, your suggestion would probably have proven to be invaluable.
Rowland.
Mark Bestley - 31 Aug 2008 14:19 GMT > > > The key sequence is, > > > [quoted text clipped - 13 lines] > (why oh why did MS ever decide to do it, I mean why waste space like > that? WHY???) Because they followed various standards. Unix was the non-standard way. The RFCs for mail and HTTP use CRLF. (And did anyone but Mac just have CR?)
They are based on how printers and teletypes work.
A version of the history is <http://en.wikipedia.org/wiki/Newline>
 Signature Mark
Andrew Stephenson - 31 Aug 2008 15:18 GMT > > *SOME* text files created by Windoze have that hopelessly obsolete > > brain-dead waste-of-space line terminator; not all. [quoted text clipped - 6 lines] > > [...] And many programs use the two line-end codes in clever ways to save bytes here and there. Word Star, frex, sets the high bit of the CR to denote "soft carriage return" (as computed by the formatting function) and of LF to denote a page break (ditto). Setting those bits must save a lot of time when redisplaying a page on-screen (and a little when printing).
But Rowland knows better, hence it is a stupid way of working.
Sidebar: an ASCII toolkit filter wot I writ and use daily, has line-end recognition rules that look for: <cr> <lf> <cr> <ff> <cr> <lf> <cr> <lf> <ff> <lf> <ff> So one can learn to live with these little differences. Note, frex, how some Mac apps will happily insert whichever combo is preferred. This is only an issue for those who enjoy stress.
IMHO, of course. :-)
 Signature Andrew Stephenson
Rowland McDonnell - 01 Sep 2008 06:15 GMT > > > *SOME* text files created by Windoze have that hopelessly obsolete > > > brain-dead waste-of-space line terminator; not all. [quoted text clipped - 9 lines] > And many programs use the two line-end codes in clever ways to > save bytes here and there. But that has nothing whatever to do with using them *as a pair* to indicate the end of each line in a stored text file.
There's no reason to do that - except when the software you are using expects that, for reasons of historical idiocy.
> Word Star, frex, sets the high bit > of the CR to denote "soft carriage return" (as computed by the > formatting function) and of LF to denote a page break (ditto). > Setting those bits must save a lot of time when redisplaying a > page on-screen (and a little when printing). I don't see what's clever about deciding to waste space above the ASCII range to duplicate existing control codes.
FF does nicely for a `new page' marker - surely that's what it's meant for? So why waste space >127 to duplicate it?
ASCII's got no `soft CR', but it has got a lot of control codes that were redundant come the end of the 1970s, so WordStar could have used a private re-definition of one of those, leaving the >127 range clear, and keeping (almost) all the control codes down below 32 where they belong (IMHO).
On the other hand, why not just set the high bit of CR for that job? No stupidity in that because it's not necessarily inefficient - it's just `daft to me to use that part of the range that way'.
> But Rowland knows better, hence it is a stupid way of working. You know, I find false and snide remarks like that most annoying.
Why state that I hold opinions and attitudes that I do not?
It is certainly stupid to use both CR and LF as a combination to delimit every line ending.
That is certainly stupid because it's a completely pointless waste of space.
But you've decided to claim that I've made a different claim, haven't you?
Now, you're talking about using different ASCII codes to indicate different things. I am firmly of the opinion that is the opposite of stupid because it is a way of using what you've got in the way it was meant to be used in order to improve efficiency.
So you see, you have just insulted me on the grounds that I hold opinions, but I do not hold the opinions you falsely and maliciously accuse me of holding.
The truth is that I hold the opposite opinions to the ones that you falsely and maliciously accuse me of holding.
Why do that? Why use dishonest tactics to launch a personal attack against me?
> Sidebar: an ASCII toolkit filter wot I writ and use daily, has > line-end recognition rules that look for: [quoted text clipped - 6 lines] > <ff> > So one can learn to live with these little differences. If all the above combinations are turned into identical (say) LF EOL characters, the above filtering spec would bugger up some things I've done in the past.
I've worked with files where a FF means FF, for example.
> Note, > frex, how some Mac apps will happily insert whichever combo is > preferred. This is only an issue for those who enjoy stress. Some Mac applications barf when faced with CR/LF but can handle CR *OR* LF quite happily. It's necessary to get the issue dealt with.
[snip]
Rowland.
Jack Campin - bogus address - 01 Sep 2008 22:06 GMT > It is certainly stupid to use both CR and LF as a combination > to delimit every line ending. It mapped exactly onto what you wanted a golfball or dot matrix printer to do at the end of a line: move the head back and roll the carriage up.
And there were frequent occasions when you'd want to decouple those actions, as when overstriking or vertical tabbing. Having separate CR and LF must have saved vast amounts of time and wear on printers.
That sort of hardware survived long enough that there were vast volumes of data designed to be printed on it. It made no sense to introduce a change in format just to free up one control character (what would you then do with it?) or save maybe 5% of file space.
==== j a c k at c a m p i n . m e . u k === <http://www.campin.me.uk> ==== Jack Campin, 11 Third St, Newtongrange EH22 4PU, Scotland == mob 07800 739 557 CD-ROMs and free stuff: Scottish music, food intolerance, and Mac logic fonts
Rowland McDonnell - 02 Sep 2008 07:47 GMT > > It is certainly stupid to use both CR and LF as a combination > > to delimit every line ending. > > It mapped exactly onto what you wanted a golfball or dot matrix > printer to do at the end of a line: move the head back and roll > the carriage up. But why store two characters in your file when you only need one to delimit a line ending? Why transmit two characters as a line delimiter when you only need one to delimit a line ending?
Well, there are sometimes reasons for the latter, but none for the former, so:
Why not translate the internal coding to external coding as and when you want to print - if the printer really does require the separate characters?
> And there were frequent occasions when you'd want to decouple those > actions, as when overstriking or vertical tabbing. Having separate > CR and LF must have saved vast amounts of time and wear on printers. I recall using old dot matrix printers in the early 80s. Lots of different Epsons, mostly. I recall them doing the CR/LF job on receipt of a single character (and I recall CR doing the job, although I suspect that one could change this using DIP switches in many cases).
You can wind up one line using LF if CR is used for `CR/LF'.
Backspace exists.
But all that aside: you don't have to save your file on the host computer in the same form you send it to the printer. It makes no sense to waste space when storage is so expensive, and it does make sense to write a tiny bit of trivial code that sends CR/LF when the source file says CR only.
Best of all possible worlds, that way - if you ask me.
> That sort of hardware survived long enough that there were vast > volumes of data designed to be printed on it. Yes - and I recall that hardware: I recall that some of it output CR/LF combined on receipt of a CR character - I say `some' because in many cases, I had no idea exactly what was going on, but in some, I did.
I recall some host computers having an option to /transmit/ both to the printer if it needed both - but the host computer would only /store/ one character to delimit the end of the line to save on storage costs.
Storage and transmission - two different jobs. It made no sense back then, given the very high cost of storage, to store the form you're going to transmit if it's inefficient and can be generated from a more efficient storage format using a trivially small amount of low-CPU-load code as is the case with turning a stored CR into a transmitted CR/LF.
> It made no sense to > introduce a change in format just to free up one control character > (what would you then do with it?) or save maybe 5% of file space. Alternatively, it made no sense to waste that storage space, so that is in fact what people did: they wrote the computer code so that they could store text files with single character line delimiters, and the printers either worked fine with that, or the host computer sent a CR/LF combo (or, irritatingly, LF/CR in the case of the BBC Micro) when the original data said `CR' only.
That's what was actually done at the time on the hardware *I* used.
It was implemented in practice and I used it.
Rowland.
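The "store one character, transmit two" idea can be sketched in a couple of lines, assuming a POSIX shell with tr, awk and od available. to_crlf is a made-up name, and the input is assumed to be stored with CR-only line endings as described above:

```shell
# to_crlf (hypothetical helper): read text stored with CR-only line endings
# and write it out with CR/LF pairs, as a printer driver might.
to_crlf() {
    # tr turns each stored CR into LF so awk sees ordinary lines;
    # awk then ends every line with CR followed by LF.
    tr '\r' '\n' | awk '{ printf "%s\r\n", $0 }'
}

# Dump the bytes in hex: each stored 0d comes out as the pair 0d 0a.
printf 'one\rtwo\r' | to_crlf | od -An -tx1
```

The stored copy keeps one byte per line ending; the second byte only exists on the way out.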
Mark Bestley - 02 Sep 2008 10:47 GMT > > > It is certainly stupid to use both CR and LF as a combination > > > to delimit every line ending. [quoted text clipped - 13 lines] > want to print - if the printer really does require the separate > characters? Because this was all defined in days when there were no processors. Teletypes and telex machines.
> > And there were frequent occasions when you'd want to decouple those > > actions, as when overstriking or vertical tabbing. Having separate [quoted text clipped - 8 lines] > > Backspace exists. But slow
> But all that aside: you don't have to save your file on the host > computer in the same form you send it to the printer. It makes no sense > to waste space when storage is so expensive, and it does make sense to > write a tiny bit of trivial code that sends CR/LF when the source file > says CR only.
> Best of all possible worlds, that way - if you ask me. > > > That sort of hardware survived long enough that there were vast > > volumes of data designed to be printed on it. The standards were set to include non computer equipment
 Signature Mark
Rowland McDonnell - 02 Sep 2008 10:58 GMT > > > > It is certainly stupid to use both CR and LF as a combination > > > > to delimit every line ending. [quoted text clipped - 16 lines] > Because this was all defined in days when there were no processors. > Teletypes and telex machines. Baudot, ITA1, and ITA2 were defined back in the pre-digital-electronic-stored-program-computer era.
ASCII was defined long after the computer era had started.
> > > And there were frequent occasions when you'd want to decouple those > > > actions, as when overstriking or vertical tabbing. Having separate [quoted text clipped - 10 lines] > > But slow Only if you're using a teletype or something else with a proper carriage return mechanism.
Yer typical dot matrix printer can whizz back to the start of a line as fast as the stepper motor can whine, regardless of whether or not it's receiving a carriage return or multiple backspaces - assuming the data feed is fast enough, that is.
Page and line printers pay no attention to such concepts.
But this is getting away from my main point.
What I'm trying to get across here is the idea that *storing a CR/LF pair in your computer file is not that sensible because it's inefficient* - a point that's not terribly critical these days, but did matter back in the 70s and 80s when storage was more expensive.
There are valid reasons for being able to *send* control codes to devices to control them - but why store 'em in the file? Just a waste of space.
> > But all that aside: you don't have to save your file on the host > > computer in the same form you send it to the printer. It makes no sense [quoted text clipped - 8 lines] > > The standards were set to include non computer equipment The computer standards were not.
Rowland.
Bruce Horrocks - 03 Sep 2008 00:56 GMT >>> It is certainly stupid to use both CR and LF as a combination >>> to delimit every line ending. [quoted text clipped - 5 lines] > delimit a line ending? Why transmit two characters as a line delimiter > when you only need one to delimit a line ending? Because ASCII was invented to standardise codes used on teletypes, and this was done before there were computers. So the codes reflect functions required for teletypes. For historical reasons derived from typewriters, teletypes used one code to return the carriage to the beginning of the line and another to advance the paper. Hence CR and LF.
A /different/ question is why people decided to use two codes for end of line in computer files? And the reasons for this are less clear. My best guess, and it is only a guess really, is that teletypes were used as the standard output device on early computers, so it was easier to use CRLF at the end of a line as any file echoed to the teletype would print properly.
I agree with your later point that a small program could easily have translated an EOL code to CRLF when printing to a teletype; however, I suspect that this was not done for either or both of two reasons. Firstly, a "dump file to output device" program could be the same program regardless of whether that device was a paper tape, mag tape or teletype. (And in those early days, one program instead of two similar ones, or one with extra options, was a big saving.)
Secondly, the early computers often had dedicated i/o hardware associated with peripherals that avoided loading the main CPU. It is therefore possible that "dump file to teletype" was hardware assisted which would make interpreting the file to translate EOL to CRLF much harder whereas a simple dump was, well, simple.
In hindsight, it looks like an odd decision and I can't really defend it. However, back in those days, the people working with computers were really rather bright mathematicians etc. I'm sure that they would have appreciated that they were wasting space in file store and comms, and so wouldn't have made the decision lightly.
Regards,
 Signature Bruce Horrocks Surrey England (bruce at scorecrow dot com)
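The "small program" that turns a stored single-character line ending into CR/LF at output time really is small. A minimal sketch in sh, assuming the stored file uses a bare CR as its line terminator and contains no LFs (the function name cr_to_crlf is made up for illustration, not something from the thread):

```shell
# Expand bare-CR line endings to CR/LF on the way to the output device.
# Assumption: one CR per line ending in the stored data, and no LFs.
cr_to_crlf() {
    cr=$(printf '\r')
    # Step 1: turn each stored CR into LF so sed sees ordinary lines.
    # Step 2: append a CR to every line, giving CR followed by LF.
    tr '\r' '\n' | sed "s/\$/${cr}/"
}
```

Something like `cr_to_crlf < stored.txt` would then feed the printer or teletype, where stored.txt is a placeholder name. The stored file stays one character per line ending; only the transmitted stream carries two.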
Jack Campin - bogus address - 03 Sep 2008 11:47 GMT >>> It is certainly stupid to use both CR and LF as a combination >>> to delimit every line ending. [quoted text clipped - 4 lines] > to delimit a line ending? Why transmit two characters as a line > delimiter when you only need one to delimit a line ending? In some situations, there were NO line ending markers at all. When your data was on 80-column cards, the mapping from "next card" to "start of new line" was implicit - I used to take decks of cards and print them on a teletype with no computer at all being involved, it would just roll the carriage up a line for every card fed in.
Many computer file systems (from 1960s mainframes on) implemented 80-column card images as an allocation unit, whether the medium was cards, tape or disk. The delimiters were either nonexistent or invisible to the user. As other people have pointed out, you needed more than that when you had to control I/O hardware. Given run length encoding in the disk or tape drivers, an 80-column card image is as space-efficient as any other data structure for text files.
==== j a c k at c a m p i n . m e . u k === <http://www.campin.me.uk> ==== Jack Campin, 11 Third St, Newtongrange EH22 4PU, Scotland == mob 07800 739 557 CD-ROMs and free stuff: Scottish music, food intolerance, and Mac logic fonts
Woody - 31 Aug 2008 15:40 GMT > > > > The key sequence is, > > > > [quoted text clipped - 10 lines] > > *SOME* text files created by Windoze have that hopelessly obsolete > > brain-dead waste-of-space line terminator; not all. Not all, just most. Very few native Windows applications will not write CR/LF, as many Windows applications will not display any other type of line terminator [1]. It is hardly obsolete as it is used everywhere.
> > (why oh why did MS ever decide to do it, I mean why waste space like > > that? WHY???) > > Because they followed various standards. Unix was the non standard way. > RFCS for Mail, HTTP use CRLF. (And did anyone but Mac just have CR?) No, that is a mac only thing.
> They are based on how printers and teletypes work. Indeed, and on that basis they are the ones that make sense.
> A version of the history is <http://en.wikipedia.org/wiki/Newline> Why everyone couldn't pick the same standard really, whatever it was.
[1] Notepad won't. Visual Studio will, but will tell you that your line terminators are wrong.
 Signature Woody
www.alienrat.com
Rowland McDonnell - 01 Sep 2008 06:15 GMT > > > > > The key sequence is, > > > > > [quoted text clipped - 15 lines] > windows applications will not display any other type of line terminator > [1]. It is hardly obsolete as it is used everywhere. `Everywhere' in the Windoze world. Not so in the Unix world.
It's obsolete because the technical reasons for using it vanished in the 1970s if not before, and the only reason we still use it is that CP/M used it, so MS-DOS used it and it's just become a nasty inefficient habit.
> > > (why oh why did MS ever decide to do it, I mean why waste space like > > > that? WHY???) [quoted text clipped - 3 lines] > > No, that is a mac only thing. No, it's also an `all the 8 bit micros you've heard of except for CP/M' thing.
But the idiotic inefficient wasteful CP/M way of doing things took over for historical reasons I don't have to explain to you, surely?
> > They are based on how printers and teletypes work. > > Indeed, and on that basis they are the ones that make sense. But it does not make sense because printers and teletypes can generate their own `LF' on seeing a `CR', thus permitting a saving on storage and transmission.
> > A version of the history is <http://en.wikipedia.org/wiki/Newline> > > Why everyone couldn't pick the same standard really, whatever it was. They did all pick the same standard, except for IBM. That standard was ASCII.
They all implemented this standard in a different way.
Most firms were bright enough to use either CR *OR* LF. A few used both. Unfortunately, MS-DOS was a port of CP/M, one of the brain-dead few that used both.
[snip]
Rowland.
Rowland McDonnell - 01 Sep 2008 06:15 GMT > > > > The key sequence is, > > > > [quoted text clipped - 15 lines] > > Because they followed various standards. But why waste time following an obsolete standard designed for the 1960s? Designed for technology that was obsolete before they started designing the IBM PC in the first place?
> Unix was the non standard way. `The' non standard way? No, there were multiple ways of doing things, none of which were accepted as `the standard'.
IBM didn't use ASCII at all and certainly didn't bother wasting space like that.
DEC did waste space.
Unix didn't waste space.
<shrug>
You could argue that it was only the lunatic fringe that used the inefficient CR/LF combination.
> RFCS for Mail, HTTP use CRLF. Really? Tell me more: which RFCs?
They were surely all written *AFTER* MS-DOS had taken over the world? So specifying CR/LF just means `We want to be compatible with the lowest common denominator' which makes a sort of sense - and you might as well do so come 1990 (when the Web started) because storage and transmission costs had dropped a *LOT* by then compared to the 1970s.
> (And did anyone but Mac just have CR?) How about pretty much every 8 bit micro in the known universe /except/ for the CP/M crowd - they used CR/LF which I hope explains to you why MS-DOS and Windoze got CR/LF.
> They are based on how printers and teletypes work. Printers and teletypes have long (maybe always?) had the ability to provide their own CR/LF combination from a single character input[1]. So there is no sanity in wasting storage space *saving* two characters when only one is required to be *stored* - especially back in the 1970s when CP/M was created and storage was very expensive.
<shrug> But CP/M did it the insane way.
Even if you do need to kick out a CR/LF pair, you can do that at printing time with a very small bit of code - thus saving expensive storage space even taking into account the fact that you need more code to run things.
> A version of the history is <http://en.wikipedia.org/wiki/Newline> Reads a bit dodgily to me. However: it seems to me from that that CR/LF was a weird way of doing it even back in the old days.
Rowland.
[1] I used to think otherwise, having had a teletype myself and having to send it CR/LF (BBC Micros send LF/CR when set to send both - fine for a 1980s dot matrix, NBG for an old teletype. So I had to write an output routine patch to send an extra CR) - but then I found out that they had a *mechanical* adjustment to change to the other mode of operation. But I'd had no manual for the thing and no way was I going to poke around tweaking things inside a machine that complex without guidance.
Rod - 01 Sep 2008 09:34 GMT >>>>> The key sequence is, >>>>> [quoted text clipped - 80 lines] > to poke around tweaking things inside a machine that complex without > guidance. ICL's VME, which started in the teletype era, used EBCDIC. It had LF, CR and NL (x'15'). NL combined the functions of LF and CR in one character. But x'14' and x'15' had different uses in older-style teletypes, I gather.
 Signature Rod
Hypothyroidism is a seriously debilitating condition with an insidious onset. Although common it frequently goes undiagnosed. <www.thyromind.info> <www.thyroiduk.org> <www.altsupportthyroid.org>
Rowland McDonnell - 02 Sep 2008 07:28 GMT [snip]
> >> They are based on how printers and teletypes work. > > [quoted text clipped - 3 lines] > > when only one is required to be *stored* - especially back in the 1970s > > when CP/M was created and storage was very expensive. [snip]
> > [1] I used to think otherwise, having had a teletype myself and having > > to send it CR/LF (BBC Micros send LF/CR when set to sent both - fine for [quoted text clipped - 7 lines] > ICL's VME, which started in the teletype era, used EBCDIC. It had LF, CR > and NL (x'15'). NL combined the functions of LF and CR in one character. Now *that* makes good sense.
btw, EBCDIC came in many forms - within IBM, not just due to those who lifted it for their own use.
> But x'14' and x'15' had different uses in older-style teletypes, I gather. Does x'14' mean 20, and x'15', 21?
Older teletypes used Baudot code (ITA1), or International Telegraph Alphabet No 2 (ITA2).
Read and learn - I did: <http://en.wikipedia.org/wiki/Baudot_code>.
Baudot was arranged in a particular way to minimise operator fatigue. It's a five bit code and the original scheme involved a five finger chording keyboard - for direct binary input, required to be synchronous with the `whole system', whatever that was (or so it seems to me from reading the Wikipedia page).
CR and LF didn't turn up until the 1901 creation of a different code arrangement - created by the bloke who'd just invented a typewriter-like keyboard for teletype operation.
Rowland.
Mark Bestley - 01 Sep 2008 13:46 GMT > > > > > The key sequence is,
> > Because they followed various standards. > [quoted text clipped - 28 lines] > do so come 1990 (when the Web started) because storage and transmission > costs had dropped a *LOT* by then compared to the 1970s. Most if not all. A quick search for CRLF gives an early one as RFC561 - Standardizing Network Mail Headers, from September 1973. I know it is in the Mail and HTTP RFCs, having had to debug some agents.
and as these standards were in use you needed to keep them for compatibility.
> Printers and teletypes have long (maybe always?) had the ability to > provide their own CR/LF combination from a single character input[1]. [quoted text clipped - 8 lines] > storage space even taking into account the fact that you need more code > to run things. Yes, but how do you go to the beginning of the line to overwrite? That is one reason that CR is on its own. Plus I have seen LF on its own used to print things.
> > A version of the history is <http://en.wikipedia.org/wiki/Newline> > > Reads a bit dodgily to me. However: it seems to me from that that CR/LF > was a weird way of doing it even back in the old days. What exactly is dodgy about this - it seems to roughly match information I have seen over the years.
 Signature Mark
Tim Streater - 01 Sep 2008 15:25 GMT > > Really? Tell me more: which RFCs? > > [quoted text clipped - 10 lines] > and as these standards were in use you needed to keep them for > compatibility. A quick google for "SMTP mail" led to the wikipedia article and to RFCs 2821 and 2822 (both pretty recent) which both unequivocally state that CRLF is mandatory.
Richard Tobin - 01 Sep 2008 16:09 GMT >Most if not all. A quick search for CRLF gives an early one as RFC561 - >Standardizing Network Mail Headers from September 1973 >I know it is in the Mail and Http RFC having had to debug some agents. I think the first RFC that standardised this was RFC139, for Telnet (May 1971):
The representation of the end of a physical line at a terminal is implemented differently on network HOSTS. For example, some use a return (or new line) key, the terminal hardware both returns the carriage or printer to start of line and feeds the paper to the next line. In other implementations, the user hits carriage return and the hardware returns carriage while the software returns to the terminal a line feed. The network-wide representation will be carriage return followed by line feed. It represents the physical formatting that is being attempted, and is to be interpreted and appropriately translated by both using site and serving site.
This was referring to the encoding of the data. Later RFCs have (I think) uniformly used CR-LF for textual protocol header fields (one of the nice things about internet protocols is that many of them use human-readable headers, unlike the bit-saving formats in, for example, the Janet protocols). HTTP is a relatively recent example.
-- Richard
 Signature Please remember to mention me / in tapes you leave behind.
Rowland McDonnell - 02 Sep 2008 07:28 GMT > >Most if not all. A quick search for CRLF gives an early one as RFC561 - > >Standardizing Network Mail Headers from September 1973 [quoted text clipped - 19 lines] > human-readable headers, unlike the bit-saving formats in, for example, > the Janet protocols). HTTP is a relatively recent example. Hmm.
Hmmmmmm...........
Righto.
I shall be thinking about this.
Rowland.
Rowland McDonnell - 02 Sep 2008 07:28 GMT > > > > > > The key sequence is, [snip]
> > > RFCS for Mail, HTTP use CRLF. > > [quoted text clipped - 8 lines] > Most if not all. A quick search for CRLF gives an early one as RFC561 - > Standardizing Network Mail Headers from September 1973 Oh! Hmm. Interesting. 1973, eh?
Hmmm....
I want to learn more about why they did it.
> I know it is in the Mail and Http RFC having had to debug some agents. > > and as these standards were in use you needed to keep them for > compatibility. Aye - but you don't have to use them internally, do you? Use efficient methods inside your OS environment, and translate to the inefficient communication protocol only when you have to.
> > Printers and teletypes have long (maybe always?) had the ability to > > provide their own CR/LF combination from a single character input[1]. [quoted text clipped - 11 lines] > Yes but how do you go to the beginning of the line to overwrite? That is > one reason that CR is on its own. The reason that one could adjust teletypes to provide an LF in addition to performing a CR on receipt of a CR is that quite often *that* was rather handy. Backspace characters exist, you know.
But if you have a teletype as a terminal, the original ASCII control code set makes some sort of sense. I never suggested otherwise.
However, they make no sense in a world of printers with buffers, and electronic displays such as CRT monitors. Use X,Y addressing, use page printing methods, use line printing methods. There is no need to perform a `CR' operation when printing in the modern sense, is there?
There was never any need to do so when printing from late 70s/early 80s home micros onto normal dot matrix printers of that era, either.
By that time, the ability to perform a separate `carriage return' operation had ceased to make any sense from the point of view of printing utility - although it did make sense back in the teletype era.
> Plus I have seen LF on its own used to > print things Yes, but it makes a lot more sense to do things the PostScript way, don't you think?
Generate a page, and then print it?
I think you've hit upon why it is that most early 80s PCs used CR as a `line terminator' - there was no need back then for the ability to perform a carriage return, but winding up a single line? Yes.
Although there is also a vertical tab character defined in ASCII - as well as an escape character, which permits any extension to ASCII you like. So if you really want to build a printer which lets the user play around with it in fashions that have no obvious utility, why not?
> > > A version of the history is <http://en.wikipedia.org/wiki/Newline> > > > > Reads a bit dodgily to me. However: it seems to me from that that CR/LF > > was a weird way of doing it even back in the old days. > > What exactly is dodgy about this It reads like unreliable information - the style of presentation damns it according to my nose.
> - it seems to roughly match information > I have seen over the years. Righto - thanks for the confirmation of its validity.
Rowland.
Rowland McDonnell - 01 Sep 2008 06:56 GMT > > Okay, it's like this: > > [quoted text clipped - 42 lines] > The Ctrl-V says, expect a control character next, and the Ctrl-I is tab. > Ctrl-L is LF and Ctrl-M is CR. Umm. I need LFs.
(LF/CR is surely wrong? CR/LF would be less wrong, but still wrong.)
At least, I *thought* I needed LFs. I've just done it with LFs alone - TextEdit doesn't like the result, and nor does TeXShell.
TeXShell claims to like both Unix and Mac line terminators.
It seems that here&now, TextEdit and TeXShell don't work with Unix line terminators (LF) but do work with Mac and Windoze line terminators (CR or CR/LF).
Does anyone have any remarks to make that might shed some light on this bizarrity?
> You can obviously replace tab with either CR, > LF or both, or whatever. The g on the end says to replace all occurrences. [quoted text clipped - 13 lines] > > Hope that's what you're after there, Something like it - thanks.
When I type in the line you suggest, and wrap round for a second line of text entry, I get the second line appearing on the same line as the first.
It seems to work anyway. I do get inconsistent behaviour when typing in these lines - sometimes I get appearing what you posted here, sometimes I get crazy stuff like:
sed -E 's/\342\210\232\313\232
which just appeared when I was trying to type in another attempt to replace character 11 (dec) with \par.
Any idea what's going on with all this? It makes it very hard to recall what you've done when you can't read it on account of it being munged several different ways by the Terminal.
What the Terminal's doing at the moment makes it impossible to `go back a line' with up-arrow, so it's turned out quite tedious to do the required fiddling to get it to work.
(it's all well and good having a command to enter, but you have got to get it right - and when part of what you type doesn't appear or appears in random guises, and the rest of it is combinations of \'/, mistakes are easy. Very very easy. Oh god not again! ARGH! That should have been a backslash, boy, a BACKSLASH!!!! <cough>)
> You can use the same technique for your > hex character, of course B is 11, which is the 11th letter of the alphabet so > use Ctrl-VCtrl-K as the thing to be replaced for that one. Yep, I've got that to work too.
So my question is: how come this works? I've not seen any reference to this way of entering data for sed anywhere I've looked.
Rowland.
Gary - 01 Sep 2008 16:58 GMT > Yep, I've got that to work too. > > So my question is: how come this works? I've not seen any reference to > this way of entering data for sed anywhere I've looked. > > Rowland. It's not a sed thing specifically. It's a shell thing. Normally, if you wanted x'03' and just pressed CTRL-c, you'd terminate whatever you were doing as the CTRL-c would be picked up by the shell. The CTRL-v first tells the shell that the next CTRL- character is not to be interpreted as a signal, but just used as data.
 Signature remove stars for email g*a*r*y*c*o*w*e*l*l*a*t*m*a*c*d*o*t*c*o*m
Richard Tobin - 01 Sep 2008 17:23 GMT >It's not a sed thing specifically. It's a shell thing. It's not even a shell thing, though probably bash implements its own version of it. It's in the terminal driver (see the termios man page, under LNEXT).
-- Richard
 Signature Please remember to mention me / in tapes you leave behind.
Gary - 01 Sep 2008 17:54 GMT > It's not even a shell thing, though probably bash implements its own > version of it. It's in the terminal driver (see the termios man page, > under LNEXT). > > -- Richard Yep. People don't usually get that close to the terminal driver, but it's true. I use this technique in applications where my login shell _is_ the application (from /etc/passwd) so yep. I should not have been so generic.
/me shoots myself in all three feet.
 Signature remove stars for email g*a*r*y*c*o*w*e*l*l*a*t*m*a*c*d*o*t*c*o*m
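As for why Ctrl-K gives character 11: a control key sends the ASCII code of its letter with the upper bits masked off. That makes Ctrl-I 9 (tab), Ctrl-J 10 (line feed), Ctrl-K 11 (vertical tab), Ctrl-L 12 (form feed) and Ctrl-M 13 (carriage return). Ctrl-V merely stops the terminal acting on whichever key follows. A small sketch of the arithmetic in sh (ctrl_code is a made-up helper name, not a standard command):

```shell
# Print the decimal character code produced by Ctrl-<letter>.
ctrl_code() {
    # printf with a leading quote yields the numeric code of the character.
    code=$(printf '%d' "'$1")
    # Keep only the low five bits, as the Ctrl key does.
    echo $(( code & 31 ))
}

ctrl_code K   # prints 11
```

Running `stty -a` in a terminal should list the literal-next key among the special characters (shown as lnext = ^V on typical systems), which ties the termios LNEXT entry back to Ctrl-V.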
Rowland McDonnell - 02 Sep 2008 07:47 GMT > >It's not a sed thing specifically. It's a shell thing. > > It's not even a shell thing, though probably bash implements its own > version of it. It's in the terminal driver (see the termios man page, > under LNEXT). Ye gods. Urgh.
Okay - no wonder I've never come across this particular bobble.
Even if I'd met the man page in question, I think this intro:
SYNOPSIS
#include <termios.h>
DESCRIPTION
This describes a general terminal line discipline that is supported on tty asynchronous communication ports.
would have put me off reading it instantly. Hell, that intro *still* puts me off reading it...
But I've had a look and I don't see how I'm supposed to be able to go from:
LNEXT Special character on input and is recognized if the IEXTEN flag is set. Receipt of this character causes the next character to be taken literally.
to understanding that that's what ctrl-v does. I mean, yeah, okay, it's talking about the job in hand - but really... This is, let's face it, not really what one would call user documentation, is it?
The wonderful world of man pages, eh?
Cheers, Rowland.
Rowland McDonnell - 04 Sep 2008 09:33 GMT > > Okay, it's like this: > > [quoted text clipped - 60 lines] > hex character, of course B is 11, which is the 11th letter of the alphabet so > use Ctrl-VCtrl-K as the thing to be replaced for that one. The above works okay - but I've run into another problem.
I'd like to put the above into a command line script - but the above stuff's for the Terminal, not TextEdit.
Any ideas what I might try?
Cheers, Rowland.
Justin C - 04 Sep 2008 22:58 GMT >> $ sed -E 's/ /^L^M/g' <afile >> a [quoted text clipped - 6 lines] > I'd like to put the above into a command line script - but the above > stuff's for the Terminal, not TextEdit.
------
#!/bin/sh

sed -E 's/ /^L^M/g' < "$1" > "$2"
------
Call the file whatever you like, 'boris' or something. Don't forget to make it executable ('chmod u+x boris'), then: $ boris input.txt output.txt
Justin.
 Signature Justin C, by the sea.
jim - 05 Sep 2008 06:36 GMT > ------ #!/bin/sh > > sed -E 's/ /^L^M/g' < $1 > $2 ------ > > Call the file whatever you like, 'boris' or something. Don't forget to > make it executable ('chmod u+x boris'), then: $ boris input.txt output.txt For that to work you'll also have to move 'boris' to somewhere like /usr/local/bin
Otherwise you'll have to type
$ ./boris input.txt output.txt
Jim
 Signature "Well, well. We've come a long way from the Prime Minister's exploding cake." - Adam West, Batman.
http://www.UrsaMinorBeta.co.uk http://twitter.com/GreyAreaUK
Rowland McDonnell - 08 Sep 2008 09:25 GMT > > ------ #!/bin/sh > > [quoted text clipped - 9 lines] > > $ ./boris input.txt output.txt I was thinking of saving the file with a .command extension and running it using the Finder.
Rowland.
Justin C - 08 Sep 2008 23:38 GMT >> > ------ #!/bin/sh >> > [quoted text clipped - 12 lines] > I was thinking of saving the file with a .command extension and running > it using the Finder. How will you pass it the names of the file(s) to process?
WRT character number 11, I don't know how to do that with sed.
Here's the perl I worked out for the Char 11 part. It requires the files to be processed to be the *only* files in the directory. It creates new files with the prefix "new_". It's untested.
If you wish to give it a try, copy and paste into TextMate (which it sounds like you have), save it, make it executable (as per a previous post). In terminal navigate to the directory containing the files on which the search/replace needs to be performed, and type: ~/path/to/executable
It'll work through your files; where it finds Char 11 (hex 0b) it'll replace it with the character string \par (the first \ in the replace string is to escape the second, it isn't a typo). The output will be in a new file, leaving the original data intact.
I just created a few test files, and ran a few tests (so it is partially tested now with fake data), and it appears to do what you want. I release it into the public domain.
To any perl gurus out there, yes, I know there are probably better ways of doing it, but I don't know them! And to those command line gurus: This is the only way I know how!
[begin here]
#!/usr/bin/perl

use warnings;
use strict;

# Process every file in the current directory.
my @files = glob "*";

foreach ( @files ) {
    my $newname = "new_" . $_;
    my $oldname = $_;
    open (INPUT, "<" , $oldname) or die "Cannot open $oldname : $!";
    open (OUTPUT, ">" , $newname) or die "Cannot create $newname : $!";

    # Replace each character 11 (hex 0b) with the literal string \par.
    while ( <INPUT> ) {
        $_ =~ s/\x0b/\\par/g;
        print OUTPUT $_;
    }
    close INPUT;
    close OUTPUT;
}
[done] <-- don't copy this or the begin line, but you know that already.
Justin.
 Signature Justin C, by the sea.
Rowland McDonnell - 08 Sep 2008 09:25 GMT > >> $ sed -E 's/ /^L^M/g' <afile > >> a [quoted text clipped - 16 lines] > make it executable ('chmod u+x boris'), then: > $ boris input.txt output.txt Aha!
Yes - got that.
But, erm, I'm still stuck with the `character number 11' problem - I can't enter that into TextEdit, can I?
Cheers, Rowland.
Bruce Horrocks - 08 Sep 2008 22:05 GMT >>>> $ sed -E 's/ /^L^M/g' <afile >>>> a [quoted text clipped - 21 lines] > But, erm, I'm still stuck with the `character number 11' problem - I > can't enter that into TextEdit, can I? Hi Rowland,
Do this instead:
perl -p -e 's/\t/\n/g;' -e 's/&/\\\&/g;' -e 's/\xB/\\par/g;' input_file > output_file
All one line.
Regards,
 Signature Bruce Horrocks Surrey England (bruce at scorecrow dot com)
Justin C - 09 Sep 2008 00:54 GMT > perl -p -e 's/\t/\n/g;' -e 's/&/\\\&/g;' -e 's/\xB/\\par/g;' input_file > output_file There, I bloody knew someone would!
Justin.
 Signature Justin C, by the sea.
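For a script that can be typed into an ordinary text editor, none of the control characters needs to appear literally: printf can generate them from octal escapes. A minimal sh sketch of all three replacements asked for at the top of the thread, assuming a POSIX tr and sed (the function name latexify and the file names are placeholders, not anything the posters used):

```shell
#!/bin/sh
# Character 11 (vertical tab), built from its octal code 013.
vt=$(printf '\013')

latexify() {
    # tr turns every tab into a newline; sed then escapes each & as \&
    # and replaces each character 11 with the literal string \par.
    tr '\t' '\n' | sed -e 's/&/\\\&/g' -e "s/${vt}/\\\\par/g"
}

# Example: latexify < input.txt > output.txt
```

Using tr for the tab-to-newline step sidesteps the differences between sed versions over \t and \n, which is what tripped up the first attempts in the thread.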