Deleting paragraph markers at the end of each line
Stephen Fox - 26 Feb 2007 00:04 GMT I'm new to this group, so forgive me if this has been asked and answered before. Using Office X.
I've converted a long (5 MB) PDF file to a Word document. But each line is now a separate paragraph. Is there a way to delete these line paragraphs so as to consolidate lines into a single paragraph?
Hope that's clear.
Thanks!
Steve
Clive Huggan - 26 Feb 2007 03:24 GMT Welcome, Steve!
Click on the pilcrow button (backwards "P" -- paragraph mark) on the toolbar to make non-printing marks visible if they aren't already. Check whether the lines end just in a paragraph mark or a space followed by a paragraph mark.
Bring up the "Find & Replace" pane with Command-Shift-h. <==[I think it's Command-Shift-h in Word X -- I skipped from Word 2001 to Word 2004 so can't be sure; if not it's Command-h].
Click in the "Find what" box, then hold down the Shift key and type "6" to give you a caret "^", then follow that with a "p". Together, "^p" stands for "paragraph mark" in this pane.
In the "Replace with" box, either type a space (if you will need one to stop the adjacent words from being joined) or if not just click in the box, which will result in the paragraph mark being replaced with nothing.
Click "Find next" then "Replace" if it does what you want.
Once you're happy, key Command-r to trigger the Replace action. If you didn't have to watch for the genuine paragraph marks, you could key Command-a to "Replace All" instead.
I usually display the Word document next to the PDF and replace the "genuine" paragraph marks first with a temporary character -- say, the Option-z character. Then I key Command-a to "Replace All" (i.e., to replace all the remaining "non-genuine" paragraph marks). Then I replace the temporary characters with paragraph marks, again with Command-a.
After the first time, the whole procedure takes less time than reading this post! ;-)
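For anyone who would rather work on a plain-text export than in the Find & Replace pane, here is a minimal sketch of the same check in Python. It assumes each paragraph mark shows up as a newline character in the exported text, and `join_lines` is a hypothetical helper name, not anything Word provides. It joins every line, adding a space only where the line did not already end in one.

```python
import re


def join_lines(text: str) -> str:
    """Join all lines into one paragraph (hypothetical helper).

    Assumes each unwanted paragraph mark is a newline in the text.
    """
    # Lines that already end in one or more spaces: keep a single
    # space and drop the line break.
    text = re.sub(r" +\n", " ", text)
    # Remaining breaks have no space before them, so insert one to
    # stop the adjacent words from being joined.
    return text.replace("\n", " ")


if __name__ == "__main__":
    sample = "each line \nis now\na separate paragraph"
    print(join_lines(sample))
```

Like a plain "Replace All", this also removes the genuine paragraph marks, so it only suits text where those have been protected first.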
You may well know that you can paste text from PDFs into Word via Edit menu -> Paste Special -> Unformatted Text and the pasted-in text will have the characteristics of the paragraph in which your insertion point is located -- saves a lot of re-formatting. You can also export PDF text from Acrobat if that's what you're using.
I do a huge amount of this on occasions (have just checked this newsgroup as light relief from an hour of it). If you need to do it often, we have a macro that automates the paste-in; post back if you want details.
Cheers,
Clive Huggan
Canberra, Australia
(My time zone is 5-11 hours different from North America and Europe, so my follow-on responses to those regions can be delayed)
============================================================
Avoid long delays before your post appears -- use Entourage or newsreader software -- see http://word.mvps.org/Mac/AccessNewsgroups.html
============================================================
On 26/2/07 11:04 AM, in article #MCT#mTWHHA.600@TK2MSFTNGP05.phx.gbl,
> I'm new to this group, so forgive me if this has been asked and answered
> before. Using Office X.
[quoted text clipped - 8 lines]
>
> Steve
Clive Huggan - 26 Feb 2007 03:31 GMT Well thanks for nothing, Steve -- I've just seen the other thread, created long before your new one, and realize I've been completely wasting my time.
Clive Huggan ============
On 26/2/07 2:24 PM, in article C2089E2A.26758%REMOVETHISoffice@ANDTHISstrategists.com.au, "Clive Huggan" <REMOVETHISoffice@ANDTHISstrategists.com.au> wrote:
> Welcome, Steve!
>
[quoted text clipped - 63 lines]
>>
>> Steve
CyberTaz - 26 Feb 2007 06:20 GMT No Worries, Clive - perhaps another will benefit from the information :)
Regards |:>) Bob Jones [MVP] Office:Mac
On 2/25/07 10:31 PM, in article C2089FD6.2675B%REMOVETHISoffice@ANDTHISstrategists.com.au, "Clive Huggan" <REMOVETHISoffice@ANDTHISstrategists.com.au> wrote:
> Well thanks for nothing, Steve -- I've just seen the other thread, created
> long before your new one, and realize I've been completely wasting my time.
[quoted text clipped - 73 lines]
>>>
>>> Steve
Stephen Fox - 26 Feb 2007 17:55 GMT
> Well thanks for nothing, Steve -- I've just seen the other thread, created
> long before your new one, and realize I've been completely wasting my time.
[quoted text clipped - 73 lines]
>>>
>>> Steve
Clive,
I hadn't seen the other thread either, before starting my own. Sorry. I will give your technique a try. Got nowhere with the suggestions from the other thread.
BTW, is this a top or bottom posting list?
Steve
Daiya Mitchell - 26 Feb 2007 18:12 GMT
> BTW, is this a top or bottom posting list?
General policy--maintain the existing pattern to avoid the worst-case scenario of back and forth responses. Inline posting with snipping irrelevant material is always accepted and welcomed. Top posting completely accepted. Bottom posting--just realize that some people will not bother to scroll all the way to the bottom to read your post. Some will.
I personally get kinda furious to be asked to scroll past four screens of irrelevant material to read a "thanks, all fine now", because if the conversation is over, it really doesn't matter about the logical flow of discussion. I generally choose top or bottom posting as seems appropriate to the context.
See also: http://word.mvps.org/Mac/AccessNewsgroups.html
Daiya
Clive Huggan - 26 Feb 2007 23:01 GMT On 27/2/07 4:55 AM, in article ObQoV9cWHHA.3980@TK2MSFTNGP02.phx.gbl,
>> Well thanks for nothing, Steve -- I've just seen the other thread, created
>> long before your new one, and realize I've been completely wasting my time.
[quoted text clipped - 83 lines]
>
> Steve
Most of the people who respond to a lot of posts in the Microsoft newsgroups prefer top posting, Steve, because the time taken to scroll down becomes significant. However, if a bottom-posting trend has already started, we tend to follow. Otherwise it can become confusing. And of course there is inline posting. No great dramas either way.
Clive Huggan ============
Stephen Fox - 27 Feb 2007 19:15 GMT Clive,
I got your method to work, somewhat.
I'm going through the nearly 600 pages paragraph by paragraph, using the Command/Shift/H technique. But that's better than one line at a time.
Steve
> On 27/2/07 4:55 AM, in article ObQoV9cWHHA.3980@TK2MSFTNGP02.phx.gbl,
>
[quoted text clipped - 94 lines]
> Clive Huggan
> ============
Clive Huggan - 28 Feb 2007 09:18 GMT Thanks for the feedback, Steve.
Yes, those "real" end-of-para paragraph marks will be a pain -- but oh, the sheer enjoyment when you zap all the remaining, unwanted paragraph marks in one hit!
It's just occurred to me (sorry, I became very busy when I originally replied): I should have mentioned that if there are two paragraph marks between paragraphs, as there often are, the correcting is very quick. Your first step is to replace all instances of ^p^p with a temporary character (say, the Option-z character), then replace all the remaining instances of ^p with nothing or a space, and finally replace the temporary character with ^p -- all done in a minute, even with 600 pages (but on a Saved As copy, just in case ;-)
Though on second thoughts Elliott probably mentioned that.
Cheers, Clive H =======
On 28/2/07 6:15 AM, in article eNLZeOqWHHA.392@TK2MSFTNGP06.phx.gbl,
> Clive,
>
[quoted text clipped - 124 lines]
>> Clive Huggan
>> ============
Stephen Fox - 28 Feb 2007 17:26 GMT Clive,
I must have missed something, because the problem I ran into is a bit more complicated than the ones you describe.
The first time I tried the ^p replace thing, it did get rid of them all, including the "real" paragraph markers. I wound up with a document that looked like a solid block of text. Bad news.
That left me with the same trick, but omitting the last line of each paragraph. But going through a 600 page book in that manner is also formidable.
Like I said, I must have missed something because you make it sound like Cheez Whiz.
Steve
> Thanks for the feedback, Steve.
>
[quoted text clipped - 143 lines]
>>> Clive Huggan
>>> ============
Elliott Roper - 28 Feb 2007 17:45 GMT
> Clive,
>
[quoted text clipped - 11 lines]
> Like I said, I must have missed something because you make it sound like
> Cheez Whiz.
You did, and it is, except not as cheesy.
The bit you missed is replacing ^p^p (i.e., any sequence of two paragraph marks) with something not likely to be in your document. It becomes a temporary placeholder for your real paragraph ends while you nuke the unwanted ones into a smouldering pile of green glowing space characters. Yes, at that stage, your document is one enormous paragraph.
The next step replaces your placeholder with brand new genuine paragraph marks.
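The three passes can be sketched in Python for text exported from the document. This is only an illustration under stated assumptions: each paragraph mark is taken to be a newline, real paragraph ends are exactly two newlines in a row, and both the `reflow` name and the placeholder character are hypothetical choices rather than anything built into Word.

```python
PLACEHOLDER = "\u03a9"  # assumed not to occur anywhere in the document


def reflow(text: str, placeholder: str = PLACEHOLDER) -> str:
    """Rejoin broken lines while keeping real paragraph ends.

    Assumes a real paragraph end is a pair of newlines and every
    single newline is an unwanted line break.
    """
    if placeholder in text:
        raise ValueError("placeholder already occurs in the text")
    # Pass 1: protect the real paragraph ends (the ^p^p pairs).
    text = text.replace("\n\n", placeholder)
    # Pass 2: turn every remaining, unwanted break into a space.
    # At this point the text is one enormous paragraph.
    text = text.replace("\n", " ")
    # Pass 3: swap the placeholder back for a genuine paragraph end.
    return text.replace(placeholder, "\n")


if __name__ == "__main__":
    sample = "first line\nsecond line\n\nnew paragraph\ncontinues"
    print(reflow(sample))
```

The placeholder check mirrors the advice to pick something not likely to be in your document: if the character already appears in the text, pass 3 would invent paragraph ends that were never there.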
 Signature To de-mung my e-mail address:- fsnospam$elliott$$ PGP Fingerprint: 1A96 3CF7 637F 896B C810 E199 7E5C A9E4 8E59 E248