Parsing Numbered/Bulleted Paragraphs

James K

I'm looking for some direction on a code snippet that I'm trying to develop. My VBA experience is entirely in Excel, so I'm not sure whether this is feasible in Word in the first place.


Context: I have a questionnaire that is very long and has the numbered questions mixed in with lots of irrelevant text and directions for the administrator.

What I want to do is "comb" through a document and pull out any bullet-numbered paragraphs. Then the bullets would be placed into a new doc with the same numbering. Ideally I am going to put these into a lookup in Excel, but at the moment I'm just trying to figure out if this first part is even possible.

I tried parsing the XML with Python, and managed to pull out the text I needed. But I soon realized that there was no practical way to reverse engineer the actual bullet numbers that way.

In short, is there any easy way to identify each number-bulleted paragraph and then copy-paste the bullet and its contents to another doc?


Thanks for any help or direction in this.

JK
 
Shauna Kelly

Hi James

Assuming that the bulleted paragraphs are all formatted in the same style,
create a table of contents using that style, click in the ToC and do
Ctrl-Shift-F9 to unlink the table of contents (it's something equivalent to
Excel Copy > Paste as Values: it turns everything to flat text), then you
can copy into any other document you need.

If the bulleted paragraphs are not all formatted in the same style then (a)
you've just learned why using styles to format text is always a good idea
and (b) you'll have to cycle through the .Paragraphs collection looking for
those paragraphs with a .ListFormat and read the .ListString. Try something
like ActiveDocument.Paragraphs(n).Range.ListFormat.ListString.
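
A rough sketch of that loop, written in Python because James mentions trying it. It assumes a document object that exposes the same `Paragraphs`, `Range`, `ListFormat.ListString` and `Text` members as the Word object model. The `fake_paragraph` helper and `sample_doc` are hypothetical stand-ins so the loop can run without Word installed.

```python
from types import SimpleNamespace


def numbered_paragraphs(doc):
    """Yield (list_string, text) for each paragraph that carries a list label."""
    for para in doc.Paragraphs:
        rng = para.Range
        label = rng.ListFormat.ListString  # empty string when there is no list label
        if label:
            # Range.Text ends with the paragraph mark (a carriage return); drop it
            yield label, rng.Text.rstrip("\r")


def fake_paragraph(label, text):
    """Hypothetical stand-in mimicking the few Word properties used above."""
    rng = SimpleNamespace(
        ListFormat=SimpleNamespace(ListString=label),
        Text=text + "\r",
    )
    return SimpleNamespace(Range=rng)


# Hypothetical sample data: one plain paragraph, two numbered, one bulleted
sample_doc = SimpleNamespace(Paragraphs=[
    fake_paragraph("", "Instructions for the administrator"),
    fake_paragraph("1.", "What is your age?"),
    fake_paragraph("\u2022", "Read the next item aloud"),
    fake_paragraph("1.1", "What is your date of birth?"),
])

for label, text in numbered_paragraphs(sample_doc):
    print(label, text)
```

Against a real document, the same function could probably be handed `win32com.client.GetActiveObject("Word.Application").ActiveDocument` from the pywin32 package, though that setup is an assumption and has not been tried here. In VBA the loop would have the same shape: a `For Each` over `ActiveDocument.Paragraphs`.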

Or, if you can create a throw-away copy of the document, you can do
ActiveDocument.ConvertNumbersToText. This is a one-way street. It converts
all paragraph numbering to flat text. There's no reverse or undo.

Cheers

Shauna

Shauna Kelly
http://www.shaunakelly.com/word
 
James K

Thanks Shauna,

ActiveDocument.Paragraphs(n).Range.ListFormat.ListString was exactly what I was looking for. After reading that, I just checked whether the .ListString contained only 0-9 or a decimal point, and copied the paragraph to Excel if it did. That caught about 99% of the cases I wanted to drop, and the rest are easy to filter out in Excel.
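
The check described here (keep a label only if it is made up of digits and decimal points) might look something like the sketch below. The function name `is_numeric_label` is hypothetical, and requiring at least one digit is an added assumption so that a lone "." does not pass.

```python
import re

# Digits and dots only, with at least one digit somewhere in the label
NUMERIC_LABEL = re.compile(r"[0-9.]*[0-9][0-9.]*")


def is_numeric_label(list_string):
    """Return True if the list label is made up solely of 0-9 and '.'."""
    return NUMERIC_LABEL.fullmatch(list_string) is not None


# Hypothetical labels of the kind ListString can return
for label in ["1.", "2.3.1", "a)", "\u2022", ""]:
    print(repr(label), is_numeric_label(label))
```

Labels such as "(1)" or "1)" fail this test. That matches the check as described, but it is probably one source of the leftover cases that need filtering by hand.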

I agree about the styles, but unfortunately I'm getting these documents from a number of different sources and can't count on them all to do that.


Thanks again,

JK
 
