Replace all paragraphs by their plain text versions

Hans List

Hi All,

In order to remove all character formatting, language codes
etc. from all paragraphs I wrote the following macro:

Sub ReplaceParsByPlainText()
    Dim aPar As Paragraph
    Dim myRange As Range
    Dim myText As String

    For Each aPar In ActiveDocument.Paragraphs
        Set myRange = aPar.Range
        'Remove paragraph mark from selection
        myRange.SetRange Start:=myRange.Start, _
            End:=myRange.End - 1
        myText = myRange.Text

        aPar.Range.Select
        Selection.TypeText myText
    Next aPar

End Sub

I presume the line 'Selection.TypeText myText' will probably
be the slowest part.

My question:
- Is this the fastest way to achieve my goal (paragraphs
without character formatting etc.)?

Thank you for your suggestions!

Hans List
 
Hans List

Jezebel said:
With ActiveDocument.Content
.Cut
.PasteAndFormat wdFormatPlainText
End With


Well, actually I want to preserve all other formatting ;-)
Hans
 
Doug Robbins - Word MVP

How about just selecting the text and using Ctrl+Space? That will remove
any direct character formatting that has been applied and leave the text
with the default format set for the Style that is applied to the paragraph.

--
Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP
 
Graham Mayor

I looked at that, but it won't affect headers/footers and it all gets even
more complicated if there are sections in the document each with its own
header/footer - and heaven forbid, text boxes :(?

--
<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
Graham Mayor - Word MVP


<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
 
Jezebel

That's true ... but if you're converting to plain text, surely you'd want to
discard all that stuff anyway?
 
Hans List

Doug said:
How about just selecting the text and using Ctrl+Space? That will remove
any direct character formatting that has been applied and leave the text
with the default format set for the Style that is applied to the paragraph.

I'm not sure if this method is drastic enough: I also want to remove all
language changes etc. I will investigate the result!

Is there also a VBA command to simulate Ctrl+Space?

Hans List
 
Hans List

Hi Jezebel and Doug,
ActiveDocument.Content.Font.Reset

I'll investigate the result.

Perhaps I should give a little background: I translate Word documents.
For this I have to convert them to RTF. These documents often were made
with different Word versions (e.g. Word 2, German + Word 97 Japanese).

This (and other things) causes the RTF to be interspersed with all kinds
of (what we call) rogue codes, often in the middle of words. This slows
down the translation process a lot and also disables automatic
terminology recognition (because of the codes in the middle of words).

So, I was thinking about a three step procedure:

1. Replace all important character (bold, italics, super/subscript etc.)
formatting by html-like codes:
The <b>quick</b> brown <i>fox</i> jumps over the lazy dog.

2. Run a macro (like ReplaceParsByPlainText) to remove all other
(unimportant) character formatting and things that cause rogue codes.

3. Run a macro that replaces the html-like codes with real formatting.

Steps 1 and 3 are ready. I'm currently investigating the fastest and
most bullet-proof way for step 2.
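
A minimal sketch of how steps 1 and 3 above might look for bold text only, using Word's Find and Replace. The macro names TagBoldText and UntagBoldText are hypothetical and not from this thread; italics, superscript and subscript would need analogous passes. Caveats worth checking on a copy of the document: a bold paragraph mark gets wrapped in tags as well, and the wildcard pattern in step 3 can match across paragraph boundaries if a closing tag is missing.

```vba
' Step 1 sketch: wrap every bold run in <b>...</b>.
' TagBoldText is a hypothetical name, not from the original post.
Sub TagBoldText()
    Dim rng As Range
    Set rng = ActiveDocument.Content
    With rng.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = ""                       ' empty search text: match on formatting only
        .Font.Bold = True
        .Replacement.Text = "<b>^&</b>"  ' ^& stands for the text that was found
        .Format = True
        .Forward = True
        .Wrap = wdFindContinue
        .MatchWildcards = False
        .Execute Replace:=wdReplaceAll
    End With
End Sub

' Step 3 sketch: turn <b>...</b> back into real bold formatting.
' UntagBoldText is a hypothetical name, not from the original post.
Sub UntagBoldText()
    Dim rng As Range
    Set rng = ActiveDocument.Content
    With rng.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = "\<b\>(*)\</b\>"         ' < and > must be escaped in wildcard searches
        .Replacement.Text = "\1"         ' keep only the text between the tags
        .Replacement.Font.Bold = True
        .Format = True
        .Forward = True
        .Wrap = wdFindContinue
        .MatchWildcards = True
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

Both macros only touch the main text story; headers, footers and text boxes would need the same search run on their own ranges.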

Hans List
 
Hans List

Hi Graham and Jezebel,
That's true ... but if you're converting to plain text, surely you'd want to
discard all that stuff anyway?


No, I don't. See my other posting today about the background.

But if the command ActiveDocument.Content.Font.Reset results in a good
RTF file, I guess I can cycle through all stories in the active document
and reset the font info.

The 'cycling code' is available on the MVP site.

I'm not sure, however, if the reset font command is drastic enough.
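
A minimal sketch of what cycling through all stories and resetting the font might look like. This is not the code from the MVP site, only the commonly used StoryRanges / NextStoryRange pattern; the macro name ResetFontsInAllStories is hypothetical. Whether Font.Reset is drastic enough for the rogue codes described here is not established in this thread.

```vba
' Hypothetical sketch: apply Font.Reset to every story in the document
' (main text, headers, footers, footnotes, text boxes, ...).
Sub ResetFontsInAllStories()
    Dim rngStory As Range

    For Each rngStory In ActiveDocument.StoryRanges
        ' Stories of the same type (e.g. the headers of several sections)
        ' are linked, so walk the chain with NextStoryRange.
        Do
            rngStory.Font.Reset   ' remove manual character formatting
            Set rngStory = rngStory.NextStoryRange
        Loop Until rngStory Is Nothing
    Next rngStory
End Sub
```

Text boxes nested inside headers or footers may not be reached by this loop, so the result is worth checking on a document that contains them.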

Thanks for your help!

Hans List
 
Graham Mayor

I thought we had established that this wasn't plain text - now confirmed in
a later post :)

--
<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
Graham Mayor - Word MVP


<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
 
Graham Mayor

Have you tried opening and saving the RTF file in WordPad? This clears out
quite a lot of superfluous stuff.

--
<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
Graham Mayor - Word MVP


<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
 
Hans List

Graham said:
Have you tried opening and saving the RTF file in WordPad? This clears out
quite a lot of superfluous stuff.

No, I haven't. But I would imagine that I'd lose a lot of
formatting that WordPad doesn't support.

Hans List
 
Hans List

Strange thing with this macro:

Sub ReplaceParsByPlainText()
    Dim aPar As Paragraph
    Dim myRange As Range
    Dim myText As String

    For Each aPar In ActiveDocument.Paragraphs
        Set myRange = aPar.Range
        'Remove paragraph mark from selection
        myRange.SetRange Start:=myRange.Start, _
            End:=myRange.End - 1
        myText = myRange.Text

        aPar.Range.Select
        Selection.TypeText myText
    Next aPar

End Sub

It works all right to a certain point. After that paragraphs
are pasted directly after each other. Some paragraphs later
everything looks all right. Again some paragraphs later
everything is glued together.

I cannot discover what is triggering this behavior. Can it
be lack of memory? (I got a warning, 'This action cannot be
undone.').

Hans List
 
Hans List

Hans said:
Strange thing with this macro:

Sub ReplaceParsByPlainText()
    Dim aPar As Paragraph
    Dim myRange As Range
    Dim myText As String

    For Each aPar In ActiveDocument.Paragraphs
        Set myRange = aPar.Range
        'Remove paragraph mark from selection
        myRange.SetRange Start:=myRange.Start, _
            End:=myRange.End - 1
        myText = myRange.Text

        aPar.Range.Select

Should be:
        myRange.Select ;-)
        Selection.TypeText myText
    Next aPar

End Sub


Seems to work OK now.
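
For reference, a sketch of the macro with the correction above folded in. Two additions are assumptions, not part of the original post: screen updating is switched off during the loop, which may speed things up, and empty paragraphs are skipped. Selection.TypeText only overwrites the selection when Word's "Typing replaces selection" option is on, and paragraphs inside table cells end in a cell marker rather than a plain paragraph mark, so both cases are worth testing on a copy.

```vba
Sub ReplaceParsByPlainText()
    Dim aPar As Paragraph
    Dim myRange As Range
    Dim myText As String

    ' Assumption: hiding screen updates is safe here and reduces redraw time
    Application.ScreenUpdating = False

    For Each aPar In ActiveDocument.Paragraphs
        Set myRange = aPar.Range
        ' Shrink the range so that it excludes the paragraph mark
        myRange.SetRange Start:=myRange.Start, _
            End:=myRange.End - 1
        myText = myRange.Text

        ' Assumption: paragraphs with no text need no retyping
        If Len(myText) > 0 Then
            ' Select the text WITHOUT the paragraph mark, then retype it
            myRange.Select
            Selection.TypeText myText
        End If
    Next aPar

    Application.ScreenUpdating = True
End Sub
```

Selecting aPar.Range, as the earlier version did, includes the paragraph mark, so retyping the text overwrote the mark and glued the paragraph to the next one.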

Hans List
 
Graham Mayor

Don't imagine it - try it! Your comment is a bit like the old Guinness ad.
"I don't like Guinness so I have never tried it!"

--
<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
Graham Mayor - Word MVP


<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
 
Hans List

Graham said:
Don't imagine it - try it! Your comment is a bit like the old Guinness ad.
"I don't like Guinness so I have never tried it!"

Did so... ;-)

Lost a whole bunch of formatting:

- Positioning of floating graphics
- Headers/footers
- TOC can no longer automatically be updated
- Headings lost styles

etc. etc.

So, this conversion isn't a solution.

Hans
 
