Vince
The complexity of this task got my attention. Let me explain as best I can:
Problem / Input:
1. djskljfdslkfjdsf
2. sadaskdsa
3. dsadsadsa
   1. fsdfdsf
   2. dfdsfds
   3. dfdsfds
4. djasda
Desired Output:
<LISTGROUP>
<LISTITEM>
<NUMBER>1.</NUMBER> <TEXT>djskljfdslkfjdsf</TEXT>
</LISTITEM>
<LISTITEM>
<NUMBER>2.</NUMBER> <TEXT>sadaskdsa</TEXT>
</LISTITEM>
<LISTITEM>
<NUMBER>3.</NUMBER> <TEXT>dsadsadsa</TEXT>
<LISTITEM>
<NUMBER>a.</NUMBER> <TEXT>fsdfdsf</TEXT>
</LISTITEM>
<LISTITEM>
<NUMBER>b.</NUMBER> <TEXT>dfdsfds</TEXT>
</LISTITEM>
<LISTITEM>
<NUMBER>c.</NUMBER> <TEXT>dfdsfds</TEXT>
</LISTITEM>
</LISTITEM>
<LISTITEM>
<NUMBER>4.</NUMBER> <TEXT>djasda</TEXT>
</LISTITEM>
</LISTGROUP>
Notice how each individual item is enclosed in a <LISTITEM> tag, and how
the items one level down (3a, 3b, 3c) are nested inside item 3's <LISTITEM>
tag. This is hard to explain, but I hope you can see the logic.
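To make the nesting rule concrete, here is a minimal Python sketch of it. It assumes the level of every item is already known and is passed in as (level, number, text) tuples; the function name render_list and that input shape are made up for illustration, not part of any existing tool.

```python
from xml.sax.saxutils import escape


def render_list(items):
    """Render (level, number, text) tuples, level starting at 1, as nested tags."""
    out = ["<LISTGROUP>"]
    depth = 0  # number of <LISTITEM> elements currently open
    for level, number, text in items:
        # A list cannot skip a level on the way down, so clamp such jumps.
        level = min(level, depth + 1)
        # Close every open item at the same or a deeper level first.
        while depth >= level:
            out.append("</LISTITEM>")
            depth -= 1
        out.append("<LISTITEM>")
        out.append(
            f"<NUMBER>{escape(number)}</NUMBER> <TEXT>{escape(text)}</TEXT>"
        )
        depth += 1
    # Close whatever is still open at the end of the list.
    while depth > 0:
        out.append("</LISTITEM>")
        depth -= 1
    out.append("</LISTGROUP>")
    return "\n".join(out)


if __name__ == "__main__":
    sample = [
        (1, "1.", "djskljfdslkfjdsf"),
        (1, "2.", "sadaskdsa"),
        (1, "3.", "dsadsadsa"),
        (2, "a.", "fsdfdsf"),
        (2, "b.", "dfdsfds"),
        (2, "c.", "dfdsfds"),
        (1, "4.", "djasda"),
    ]
    print(render_list(sample))
```

An item is left open until the next item at the same or a shallower level arrives, which is why 3a, 3b and 3c end up inside item 3's tag.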
Now, there could be up to 4 such levels, i.e. 3.a may have a 3.a.i, which may
in turn have a 3.a.i.A, and so on. The tags need to be nested
accordingly. Also, the numbers may either be typed by hand or be part of the
automatic bullet lists that Word has. Also, if manual spaces are used for
indentation instead of bullet lists, the number of spaces is usually right. I mean,
1. dasdas
    a. dssffsd
    b. sadajkjs
    c. dfskjs
is also possible (the number of spaces is usually accurate to within plus or minus 2)
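For the manually spaced case, one possible way to recover the levels is to compare each line's leading spaces with the indentation of the levels that are currently open, treating anything within the tolerance as the same level. The sketch below is only an illustration under some assumptions: the text has already been exported from Word as plain lines, markers look like "1." or "a." followed by a space, and each real indentation step is larger than the tolerance. The names split_line and assign_levels are hypothetical.

```python
import re

# A marker is digits or up to four letters, followed by a period and a space.
# (Assumed marker shape; real documents may need more patterns.)
MARKER = re.compile(r"^( *)([0-9]+|[A-Za-z]{1,4})\.\s+(.*)$")


def split_line(line):
    """Return (indent_width, number, text), or None if the line is not a list item."""
    m = MARKER.match(line.expandtabs(4).rstrip())
    if m is None:
        return None
    indent, label, text = m.groups()
    return len(indent), label + ".", text


def assign_levels(lines, tolerance=2):
    """Return (level, number, text) tuples, inferring level from indentation."""
    stack = []  # indent widths of the currently open levels, outermost first
    result = []
    for line in lines:
        parsed = split_line(line)
        if parsed is None:
            continue  # not a list item; skipped in this sketch
        indent, number, text = parsed
        # Leave every level that is clearly deeper than this line.
        while stack and indent < stack[-1] - tolerance:
            stack.pop()
        # Clearly deeper than the current level: open a new one.
        if not stack or indent > stack[-1] + tolerance:
            stack.append(indent)
        # Otherwise the indent is within tolerance: same level as before.
        result.append((len(stack), number, text))
    return result


if __name__ == "__main__":
    sample = [
        "1. dasdas",
        "    a. dssffsd",
        "     b. sadajkjs",   # one space too many
        "   c. dfskjs",       # one space too few
        "2. next",
    ]
    for item in assign_levels(sample):
        print(item)
```

Comparing against the previous items rather than dividing by a fixed step means a slightly misaligned line still lands on its neighbours' level. Lists that use Word's automatic numbering carry no typed number or spaces in the text, so they would need the level read from Word itself rather than from this kind of text matching.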
Questions:
1) Any idea how I should go about doing this in the most error-free
fashion? We get many documents to process, so accuracy matters more than
anything else.
Before I begin coding this, I thought I would see if anyone had any special
tips I should take into consideration.
Thank you for your time and response.
Vince