Parsing Word Document based on Selection.Styles

STeve

Hey guys,

I currently have a 100-page Word document filled with various
"articles". These articles are delimited by the style of the text
(i.e. Heading 1 for the various titles). The articles will then be
converted into HTML and saved. I want to write a parser in VB.NET
that uses the Word object model, and I was wondering how this could
be achieved. The problem I am running into is that I cannot test
whether the selected text has a certain style. My code so far in
VB.NET:

Dim dc As Word.Document
Dim w As Word.Application
w = New Word.Application()

Dim arguments As [String]() = Environment.GetCommandLineArgs()

dc = w.Documents.Open("c:\static\test.doc")
w.Visible = True

' Select the whole document so its lines can be counted
w.Selection.EndKey(Word.WdUnits.wdStory)
w.Selection.HomeKey(Word.WdUnits.wdStory, Word.WdMovementType.wdExtend)

Dim count As Integer
count = w.Selection.Range.ComputeStatistics(Word.WdStatistic.wdStatisticLines)

Dim i As Integer

' Only the first three lines are checked while prototyping
For i = 0 To 2
    w.Selection.HomeKey(Word.WdUnits.wdLine)

    ' This is the style test I cannot get to work
    If w.Selection.Style = "Heading 3" Then
        MsgBox("heading 3")
    End If

    w.Selection.MoveDown(Word.WdUnits.wdLine, 1)
Next



I currently have something like this in a macro in the Word document.
Note that this is just a prototype. saveHTML simply copies the
selected text from one document, pastes it into a new document, and
saves that content as an HTML file, because Microsoft Word cannot
"save to HTML" a highlighted selection:

Sub ParseDocument()
    ' Select the whole document and count its lines
    Selection.HomeKey Unit:=wdStory
    Selection.EndKey Unit:=wdStory
    Selection.HomeKey Unit:=wdStory, Extend:=wdExtend
    count1 = Selection.Range.ComputeStatistics(Statistic:=wdStatisticLines)

    ' Walk the document one line at a time
    For x = 0 To count1
        Selection.HomeKey Unit:=wdLine

        If Selection.Style = "Heading 3" Then
            saveHTML (file)
        ElseIf Selection.Style = "Heading 2" Then
            Selection.EndKey Unit:=wdLine, Extend:=wdExtend
            z = Selection.Text
        End If

        If Selection.Style = "Heading 1" Then
            ' A new title starts: save the previous article first
            file = path + y
            saveHTML (file)

            Selection.EndKey Unit:=wdLine, Extend:=wdExtend
            y = Selection.Text

            ' Strip line feed, carriage return and page break characters
            y = Replace(y, Chr(10), "")
            y = Replace(y, Chr(13), "")
            y = Replace(y, Chr(12), "")

            y1 = y
        End If

        Selection.MoveDown Unit:=wdLine, count:=1
    Next

    ' Save the last article
    file = path + y
    saveHTML (file)
End Sub

Any help is appreciated. Thanks, guys.
Steve
 
Cindy M -WordMVP-

Hi STeve,
> I currently have a 100 page word document filled with various
> "articles". These articles are delimited by the Style of the text
> (IE. Heading 1 for the various titles) These articles will then be
> converted into HTML and saved. I want to write a parser through
> vb.net that uses the word object model and was wondering how this
> could be achieved?

My inclination would be to use Word's Find/Replace functionality.
Roughly, I'd do this:

- declare two ranges
- rng1.Find to locate the first instance of "Heading 1"
- rng2.Find to get the second instance of "Heading 1"
- set the rng1.End to the rng2.Start point
- put this rng1 into your HTML-whatever

From this point on, you already have the starting point, so you can
collapse rng1 to its end point, collapse rng2 to its end-point, and
rng2.Find again to get the next section. And so on.
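
The steps above can be sketched as a Word VBA macro. This is only a
rough sketch under a few assumptions, not a finished parser: the
names SplitAtHeading1, FindHeading1 and SaveRangeAsHtml are made up
for illustration, the output path c:\static\articleN.htm is a
placeholder, and the check that a hit really lies after the current
heading is a precaution added here against a search that does not
move forward.

```vba
Option Explicit

' Sketch: split the active document into articles at each "Heading 1".
Sub SplitAtHeading1()
    Dim doc As Document
    Dim rng1 As Range        ' current heading, start of the article
    Dim rng2 As Range        ' next heading, marks where the article stops
    Dim rngArticle As Range
    Dim foundNext As Boolean
    Dim n As Long

    Set doc = ActiveDocument
    Set rng1 = doc.Content
    If Not FindHeading1(rng1) Then Exit Sub   ' no Heading 1 in the document

    Do
        ' Start looking for the next heading just after the current one
        Set rng2 = rng1.Duplicate
        rng2.Collapse wdCollapseEnd
        foundNext = FindHeading1(rng2)
        ' Precaution: a hit that does not lie after rng1 counts as "not found"
        If foundNext And rng2.Start < rng1.End Then foundNext = False

        ' The article runs from the current heading to the next one,
        ' or to the end of the document for the last article
        Set rngArticle = rng1.Duplicate
        If foundNext Then
            rngArticle.End = rng2.Start
        Else
            rngArticle.End = doc.Content.End
        End If

        n = n + 1
        ' Placeholder file name, adjust to your own naming scheme
        SaveRangeAsHtml rngArticle, "c:\static\article" & n & ".htm"

        If Not foundNext Then Exit Do
        Set rng1 = rng2
    Loop
End Sub

' Searches for text in the Heading 1 style. On success the range that
' was passed in is redefined to the text that was found.
Private Function FindHeading1(ByVal rng As Range) As Boolean
    With rng.Find
        .ClearFormatting
        .Text = ""
        .Style = wdStyleHeading1
        .Format = True
        .Forward = True
        .Wrap = wdFindStop
        FindHeading1 = .Execute
    End With
End Function

' Hypothetical stand-in for saveHTML: copies the range into a new
' document and saves that document as HTML.
Private Sub SaveRangeAsHtml(ByVal rng As Range, ByVal fileName As String)
    Dim newDoc As Document
    Set newDoc = Documents.Add
    newDoc.Content.FormattedText = rng.FormattedText
    newDoc.SaveAs FileName:=fileName, FileFormat:=wdFormatHTML
    newDoc.Close SaveChanges:=wdDoNotSaveChanges
End Sub
```

Working with Range objects instead of the Selection means the cursor
never has to move line by line. If you port this to VB.NET, note that
the Style property comes back as an Object there, so a comparison
would probably need something like
CType(rng.Style, Word.Style).NameLocal = "Heading 1" rather than
comparing the property to a string directly.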

Learning the Word object model isn't simple, but a good way to get a
handle on it is to try a few things out in the UI, then record them
in macros. The code you get isn't optimal, but it will give you an
idea what objects, methods and properties you need.

For Find/Replace, you should also review all the articles on that
topic at mvps.org/word. They should give you an idea of "proper
syntax" and alert you to possible pitfalls.

Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister (last update Sep 30 2003)
http://www.mvps.org/word

This reply is posted in the Newsgroup; please post any follow-up
question or reply in the newsgroup and not by e-mail :)