Working with Outline levels and paragraphs in VBA


Sandusky

Windows XP Pro SP2
Word 2002 SP3

I'm new to working in VBA with Word, but I've done tons of VBA in Excel, so
I understand the environment pretty well, just not the Word objects.

I have a Word document that was created from a text file, presumably from
Notepad. Anyway, all of the "paragraphs" have hard returns at the end of each
line in the paragraph. If I'm not mistaken, Word then treats each line
as its own paragraph.

I'm looking for a way to clean this up. The only thing I have going for me
is that whoever had this before me meticulously inserted well-organized
outline levels throughout the document (400+ pages). Can these outline
levels be used to my advantage? Can sections or ranges be assigned to a
specific outline level?

I could really use any advice you might have, as I have no idea how hard or
easy it would be to clean up these "paragraphs".

Thanks!

-gk-

Klaus Linke

Hi,

As for the hard ¶ returns at the end of each line:

For tips, see http://www.word.mvps.org/FAQs/Formatting/CleanWebText.htm

Often in such files, the "line breaks" have a space before them, while the
"real" paragraph marks don't. In that case, you could replace " ^p" with a
space.
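To experiment with this rule on a plain-text copy before touching the 400-page document, here is a minimal Python sketch. It is not Word code: it assumes the paragraph mark is represented by "\r" (character 13, what Word's ^13 refers to), and the function name is hypothetical.

```python
PARA = "\r"  # assumption: paragraph mark modeled as character 13 (Word's ^13)


def join_space_marked_breaks(text: str) -> str:
    """Plain-text equivalent of Find ' ^p' / Replace ' ' in Word.

    A paragraph mark preceded by a space is treated as a wrapped line and
    becomes a space; a paragraph mark with no space before it is kept.
    """
    return text.replace(" " + PARA, " ")


print(repr(join_space_marked_breaks("one \rtwo\rthree")))
```

Here "one " ends with a space, so it is joined to "two", while "two" has no trailing space and stays a real paragraph end.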

Or the "real" paragraph marks are marked by two or more consecutive
paragraph marks, in which case you could first format the runs of multiple
paragraph marks as, say, bold:
Edit > Replace, check "Use wildcards",
Find what: ^13(^13){1,}
Replace with: ^& (and set Format > Font > Bold)

Then, with a regular (non-wildcard) Replace, replace the remaining single
paragraph marks that are not bold with spaces.
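The bold-marking trick amounts to "join every paragraph mark that stands alone, keep runs of two or more". A minimal Python sketch of that logic on a plain-text copy, assuming "\r" stands for Word's ^13; the regex lookarounds replace the bold formatting, and the function name is hypothetical:

```python
import re


def join_single_breaks(text: str) -> str:
    # A "\r" with no "\r" directly before or after it is a wrapped line and
    # becomes a space; runs of two or more "\r" are real paragraph breaks
    # and are left alone (the role played by the bold formatting in Word).
    return re.sub(r"(?<!\r)\r(?!\r)", " ", text)


print(repr(join_single_breaks("line one\rline two\r\rnext para\rcont.")))
```

The lookbehind and lookahead make sure neither mark of a "\r\r" pair is touched, which is exactly what the non-bold condition achieves in Word.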


As for the headings/outline levels: How have they been applied in the text
file?
Often there are, say, 3 empty paragraphs before a heading and one empty ¶
below it.
Then you could remove the empty ¶s and add a tag with a wildcard
replacement:
Find what: (^13)^13^13^13([!^13]@^13)^13
Replace with: \1<H>\2

That gets rid of the empty paragraph marks. (In a wildcard "Find", ^13 is
used instead of ^p, which is not accepted there; \1 inserts the first
parenthesized (expression), \2 the second, and so on.)
It also inserts "<H>" as a marker tag in front of the designated heading.
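The same pattern can be tried on a plain-text copy with Python's re module. This is a rough translation under stated assumptions, not something Word runs: "\r" stands for ^13, [^\r]+ stands for [!^13]@, and the function name is hypothetical.

```python
import re


def tag_blank_line_headings(text: str) -> str:
    # Word:   Find (^13)^13^13^13([!^13]@^13)^13   Replace \1<H>\2
    # Here:   group 1 keeps the mark ending the previous paragraph,
    #         the three empty paragraphs above and the one below are dropped,
    #         and group 2 (the heading line plus its mark) gets the <H> tag.
    pattern = r"(\r)\r\r\r([^\r]+\r)\r"
    return re.sub(pattern, r"\1<H>\2", text)


print(repr(tag_blank_line_headings("body text.\r\r\r\rIntroduction\r\rFirst para.")))
```

The heading "Introduction" ends up on its own paragraph with the tag in front, and the surrounding empty paragraphs are gone.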

Or (other) headings might be distinguishable from regular text because they
are short (say, between 2 and 20 characters) and don't end with a punctuation
mark (? ! .).
Then you could insert a tag with a wildcard replacement:
Find what: (^13)([!^13\!\?.]{2,20}^13)
Replace with: \1<H>\2

[!^13\!\?.] matches any character except a ¶ mark and the punctuation marks
"! ? .", and {2,20} looks for between 2 and 20 of them.
In locales where the list separator is a semicolon instead of a comma, write
{2;20} instead.
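A plain-text Python approximation of this heading test, assuming "\r" stands for ^13 (the function name is hypothetical). Inside a Python character class, ! ? and . are literal, so the Word escapes are not needed:

```python
import re


def tag_short_headings(text: str) -> str:
    # Word:   Find (^13)([!^13\!\?.]{2,20}^13)   Replace \1<H>\2
    # Here:   2 to 20 characters that are neither "\r" nor ! ? . , sitting
    #         between two paragraph marks, are treated as a heading.
    return re.sub(r"(\r)([^\r!?.]{2,20}\r)", r"\1<H>\2", text)


print(repr(tag_short_headings("intro text.\rGetting Started\rThis is a sentence.\r")))
```

Like the Word version, this heuristic also tags any short body line without end punctuation, so the result is worth spot-checking before the tags are turned into styles.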

Then, in two regular (non-wildcard) replacements, first replace the tag <H>
with a "Heading" style of your choice (Replace with: ^&, plus Format > Style)
to apply the style, then replace the tag with nothing to delete it.
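To check what those two final replacements would produce, here is a Python model that only records a style name per paragraph; Word applies a real paragraph style instead. It assumes "\r" separates paragraphs, and the function name and the "Heading 1"/"Normal" labels are illustrative choices.

```python
from typing import List, Tuple

TAG = "<H>"


def apply_heading_tags(text: str, heading_style: str = "Heading 1") -> List[Tuple[str, str]]:
    """Return (style, paragraph text) pairs after the two final replacements."""
    result = []
    for para in text.split("\r"):
        if TAG in para:
            # Step 1: a paragraph containing the tag gets the heading style
            # (a paragraph style always applies to the whole paragraph).
            # Step 2: the tag itself is deleted.
            result.append((heading_style, para.replace(TAG, "")))
        else:
            result.append(("Normal", para))
    return result


print(apply_heading_tags("body text.\r<H>Introduction\rFirst para."))
```

Doing the style step before deleting the tag matters: once the tag is gone, nothing distinguishes the heading paragraphs any more.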

How to best do it depends a lot on how the file looks exactly...

Regards,
Klaus