Another tagging question

Vince

The complexity of this task got my attention. Let me explain as best I can:

Problem / Input:
1. djskljfdslkfjdsf
2.. sadaskdsa
3.. dsadsadsa
1.. fsdfdsf
2.. dfdsfds
3.. dfdsfds
4.. djasda

Desired Output:
<LISTGROUP>
<LISTITEM>
<NUMBER>1.</NUMBER> <TEXT>djskljfdslkfjdsf</TEXT>
</LISTITEM>
<LISTITEM>
<NUMBER>2.</NUMBER> <TEXT>sadaskdsa</TEXT>
</LISTITEM>
<LISTITEM>
<NUMBER>3.</NUMBER> <TEXT>dsadsadsa</TEXT>
<LISTITEM>
<NUMBER>a.</NUMBER> <TEXT>fsdfdsf</TEXT>
</LISTITEM>
<LISTITEM>
<NUMBER>b.</NUMBER> <TEXT>dfdsfds</TEXT>
</LISTITEM>
<LISTITEM>
<NUMBER>c.</NUMBER> <TEXT>dfdsfds</TEXT>
</LISTITEM>
</LISTITEM>
<LISTITEM>
<NUMBER>4.</NUMBER> <TEXT>djasda</TEXT>
</LISTITEM>
</LISTGROUP>

Notice how each individual item is enclosed in a <LISTITEM> tag, and how the
items one level down (3a, 3b, 3c) are nested inside item 3's <LISTITEM> tag.
This is hard to explain, but I hope you can see the logic.
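
To make the nesting rule concrete, here is a minimal sketch in Python (not Word VBA). It assumes the level of each item has already been worked out somehow; the function name `build_listgroup` and the `(level, number, text)` tuple layout are made up for illustration. An item stays open until the next item at the same or a shallower level arrives, which is what puts a, b and c inside item 3.

```python
from xml.sax.saxutils import escape


def build_listgroup(items):
    """Return nested <LISTITEM> markup for (level, number, text) tuples.

    level 1 is the outermost level; deeper items nest inside the
    most recent shallower item.
    """
    out = ["<LISTGROUP>"]
    open_levels = []  # levels of the <LISTITEM> elements still open
    for level, number, text in items:
        # Close every open item at the same or a deeper level.
        while open_levels and open_levels[-1] >= level:
            out.append("</LISTITEM>")
            open_levels.pop()
        out.append("<LISTITEM>")
        out.append(f"<NUMBER>{escape(number)}</NUMBER> <TEXT>{escape(text)}</TEXT>")
        open_levels.append(level)
    # Close whatever is still open at the end of the list.
    while open_levels:
        out.append("</LISTITEM>")
        open_levels.pop()
    out.append("</LISTGROUP>")
    return "\n".join(out)
```

Feeding it the seven items from the example (levels 1, 1, 1, 2, 2, 2, 1) gives the same tag structure as the desired output shown earlier.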

Now, there could be up to 4 such levels, i.e. 3.a may have a 3.a.i, which may
in turn have a 3.a.i.A, and so on. The tagging needs to be done
appropriately at every level. Also, the numbers may either be typed by hand or
be part of the automatic numbered lists that Word has. And if manual spaces are
used for indentation instead of automatic lists, there is usually about the
right number of them. I mean,
1. dasdas
    a. dssffsd
    b. sadajkjs
    c. dfskjs

is also possible (the number of spaces is usually off by no more than 2 either way)
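
One way to turn space-indented lines into levels while allowing for that error is sketched below in Python (not Word VBA). The function name `indent_levels` and the `tolerance` parameter are made up for illustration. A line counts as deeper only if its indent exceeds the current level's indent by more than the tolerance, so the sketch assumes each real indent step is wider than 2 spaces and that tabs are not mixed in.

```python
def indent_levels(lines, tolerance=2):
    """Return a 1-based nesting level for each line, judged by leading spaces."""
    levels = []
    stack = []  # indent width that opened each currently open level
    for line in lines:
        indent = len(line) - len(line.lstrip(" "))
        # Leave every level that is clearly deeper than this line.
        while stack and indent < stack[-1] - tolerance:
            stack.pop()
        # Clearly deeper than the current level: open a new one.
        if not stack or indent > stack[-1] + tolerance:
            stack.append(indent)
        # Otherwise the indent is within tolerance of the current level.
        levels.append(len(stack))
    return levels
```

With indents of 0, 5, 4, 6 and 0 spaces, the three middle lines all land on level 2 even though their spacing differs.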

Questions:
1) Any idea how I should go about doing this in the most error-free
fashion? We get many documents to process, so the most reliable method is
desirable.

Before I begin coding this, I thought I would see if anyone had any special
tips I should take into consideration.

Thank you for your time and response.

Vince
 
Klaus Linke

Hi Vince,

If you have used styles for your outline numbering, putting in the <TEXT> and
<LISTITEM> tags is easy:
Search for those styles with "Match wildcards" checked,
Find what: ([!^13]@) (^13)
Replace with: <TEXT>\1</TEXT>\2
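
For anyone processing the text outside Word, a rough regular-expression analogue of that wildcard search is sketched here in Python. It reads the two groups as adjacent, which is an assumption about the pattern's intent, and treats the paragraph mark as a carriage return (character 13). The function name `wrap_paragraphs` is made up for illustration.

```python
import re


def wrap_paragraphs(text, tag="TEXT"):
    """Wrap every non-empty paragraph (terminated by \\r) in <tag>...</tag>."""
    # ([^\r]+) plays the role of ([!^13]@): one or more non-paragraph-mark characters.
    # (\r) plays the role of (^13): the paragraph mark itself.
    return re.sub(r"([^\r]+)(\r)", rf"<{tag}>\1</{tag}>\2", text)
```

Empty paragraphs are left alone because the first group needs at least one character.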

The numbers are trickier. You probably need to loop through the paragraphs and
look for list paragraphs:
http://www.word.mvps.org/faqs/numbering/ListString.htm

After you have added the <NUMBER> tags and the numbers, you can add the
<LISTITEM> tags with another replacement:
Find what: ([!^13]@) (^13)
Replace with: <LISTITEM>\1</LISTITEM>\2

With the <LISTGROUP> tags, you are on your own because Word has no such concept.
As far as Word is concerned, every list starts at the beginning of the document
and goes up to the end.

So you'd have to check the restarts that may have been applied, and figure out
the beginning and end of the lists for yourself.

It might be simpler to save as HTML or XML, examine the tags, and see if you can
live with that, or process those tags further.

Greetings,
Klaus
 
Klaus Linke

Klaus Linke said:
The numbers are trickier.

Probably you could simply use "ActiveDocument.ConvertNumbersToText", insert the
<LISTITEM> tags, and then add the <NUMBER>...</NUMBER> tags between the
<LISTITEM> and <TEXT> tags with another Find/Replace or two.
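
The last step, moving the number out of <TEXT> and into its own <NUMBER> element, could look roughly like this. It is a sketch in Python rather than a Word Find/Replace, and it assumes each paragraph already reads `<LISTITEM><TEXT>1. text</TEXT></LISTITEM>` with the converted number followed by a space or tab. The label pattern (digits, a single letter, or a roman numeral, then a dot) is a heuristic, and `add_number_tags` is a made-up name.

```python
import re

# A label directly after the opening tags: "12.", "a.", "B.", "iv." or "IV.",
# followed by at least one space or tab.
LABEL = re.compile(
    r"<LISTITEM><TEXT>((?:\d+|[A-Za-z]|[ivxlcdm]+|[IVXLCDM]+)\.)[ \t]+"
)


def add_number_tags(line):
    """Insert <NUMBER>...</NUMBER> between <LISTITEM> and <TEXT> if a label is found."""
    return LABEL.sub(r"<LISTITEM><NUMBER>\1</NUMBER> <TEXT>", line, count=1)
```

Paragraphs without a recognizable label come back unchanged, so they can be reviewed by hand afterwards.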

Klaus
 
Vince

Klaus,

I'll look into this, thanks a lot.
Vince

Klaus Linke said:
Probably you could simply use "ActiveDocument.ConvertNumbersToText", insert the
<LISTITEM> tags, and then add the <NUMBER>...</NUMBER> tags between the
<LISTITEM> and <TEXT> tags with another Find/Replace or two.

Klaus
 
Vince

Klaus,

That link did the trick!! Of course, it only works for numbered lists (those
numbered automatically by Word). I am trying to write a routine that converts
the manually typed lists to a numbered list...

Thanks, again for the link and your suggestions.

Vince
 
Klaus Linke

Hi Vince,

Vince said:
That link did the trick!!

Good to hear!

Vince said:
Of course, it only works for numbered lists (those automatically
done using word). I am trying to write a routine that converts
the manually typed lists to a numbered list...

You could probably loop through the paragraphs, check directly for typed
numbering (such as a number followed by a dot followed by whitespace), and
insert the proper <NUMBER> tags.
Turning typed numbering into automatic numbering could be quite a job, and you
want to turn it back into "hard" text later anyway.
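
A minimal sketch of that direct check, in Python rather than Word VBA. It only handles the case described (digits, a dot, then whitespace); letter and roman labels for the deeper levels would need extra patterns. The function name `tag_typed_paragraph` is made up for illustration.

```python
import re
from xml.sax.saxutils import escape

# Optional leading whitespace, digits, a dot, whitespace, then the item text.
TYPED = re.compile(r"^\s*(\d+)\.\s+(\S.*)$")


def tag_typed_paragraph(paragraph):
    """Return the tagged form of a typed list paragraph, or None if it is not one."""
    match = TYPED.match(paragraph)
    if match is None:
        return None
    number, text = match.groups()
    return f"<NUMBER>{number}.</NUMBER> <TEXT>{escape(text.rstrip())}</TEXT>"
```

Requiring whitespace after the dot keeps a paragraph that begins with something like "3.14" from being mistaken for item 3.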

Greetings,
Klaus
 
