reading text out of word docs

Keith G Hicks

Yesterday I posted to the microsoft.public.dotnet.languages.vb newsgroup
with the same subject as this one (reading text out of word docs).
It's about how to read Word docs using VB.NET. I haven't gotten any
responses. I figured it wouldn't be very hard for someone who knows what
they're doing. I spent hours online and in the help file yesterday and got
almost nowhere. Maybe someone here would know how to handle this. I'm not
going to put all the info here because it's not VBA and I don't want to
cross-post.
 
Keith G Hicks

Okay. I'm not getting help in the other newsgroup, so I'm moving this here:

I started working on a program to read text out of some well-organized Word
docs. I've done this sort of thing in VBA, but not quite this extensively, and
I'm not great with Word automation. I know enough to be dangerous. LOL. I
need to open the doc (got that part done), locate certain phrases that are
in all of them, and then read some text after those phrases into variables so
I can post them to a SQL db. The part I'm struggling with is how to read the
doc. I'm not changing the docs in any way. They are deposited into a folder
on the network and I open and read them as they arrive. Setting up the
watcher for this in general is not a problem. I just need help reading the
docs in VB.NET.

Here's some of what I have so far:

oWord = CreateObject("Word.Application")
oWord.Visible = True
' Third argument (ReadOnly) = True: the docs are only read, never changed
oDoc = oWord.Documents.Open("C:\SomeWordDoc.doc", , True)

Dim rng As Word.Range

With oWord.Selection
    .HomeKey(wdStory)
    rng = .Range
End With

rng.Find.Text = "Issue date:"
If rng.Find.Execute() Then
    'MsgBox("found")
    rng = oWord.Selection.Range
    rng.End = rng.Next(wdLine, 1).End ' rng.MoveEnd(wdLine)
    MsgBox(rng.Text)
Else
    MsgBox("Not found")
End If

' Move to the line below "Issue Date:" to get the county

I decided that it might be best to read the entire text into a string
variable and use RegEx to get the pieces I need. But there's a problem with
that. There are some places in the text where that will work and I know how
to do that. But the bigger problem for me is how to read specific lines. For
example, I need to read the 4th line of each document. There is no specific
text in the 4th line that RegEx could match, so I have to read it by
position. I found this idea somewhere:

rng.Start = oDoc.Paragraphs(4).Range.Start
rng.End = oDoc.Paragraphs(4).Range.End

It seems to work, but I'm not sure if that's the best way.
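
For the "read it all into a string" idea, here is a minimal sketch of how the label lookup and the 4th-line lookup might work on plain text. It assumes the string comes from something like oDoc.Content.Text, where Word separates paragraphs with Chr(13), so "4th line" here really means "4th paragraph" (a wrapped line or a manual line break would not count as a new one). The names ValueAfterLabel and NthParagraph and the sample text are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module WordTextSketch

    ' Hypothetical helper: returns the text that follows a label, up to the
    ' end of that paragraph, or Nothing when the label is not present.
    Function ValueAfterLabel(docText As String, label As String) As String
        ' [ \t]* skips blanks after the label; [^\r]* stops at the
        ' paragraph mark (Chr(13)) that Word puts in Range.Text.
        Dim pattern As String = Regex.Escape(label) & "[ \t]*([^\r]*)"
        Dim m As Match = Regex.Match(docText, pattern, RegexOptions.IgnoreCase)
        If m.Success Then Return m.Groups(1).Value.Trim()
        Return Nothing
    End Function

    ' Hypothetical helper: returns the n-th paragraph (1-based), or Nothing
    ' when the text has fewer than n paragraphs.
    Function NthParagraph(docText As String, n As Integer) As String
        Dim paragraphs() As String = docText.Split(ControlChars.Cr)
        If n < 1 OrElse n > paragraphs.Length Then Return Nothing
        Return paragraphs(n - 1)
    End Function

    Sub Main()
        ' Made-up sample standing in for the text of a real document.
        Dim sample As String = "Notice" & vbCr &
                               "Issue date: 1/5/2010" & vbCr &
                               "Some County" & vbCr &
                               "Fourth paragraph" & vbCr

        Console.WriteLine(ValueAfterLabel(sample, "Issue date:")) ' prints 1/5/2010
        Console.WriteLine(NthParagraph(sample, 4))                ' prints Fourth paragraph
    End Sub

End Module
```

Working on the string keeps Word out of the parsing step, at the cost of losing anything that depends on page layout, such as where a line visually wraps.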

The last thing is that there is a large block of text in the middle of
these documents that I will need to read. I know the line it starts on but
have no idea which line it will stop on. There is, however, a line following
it that I can find using RegEx. I'm not sure how to grab the text between
those two points.
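
One possible way to grab such a block from the plain text, sketched under the same assumption that paragraphs are separated by Chr(13): start at the known paragraph and keep collecting until a paragraph matches the RegEx for the line that follows the block. ReadBlock, the "^Signed:" pattern, and the sample text are hypothetical, not taken from the real documents.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module BlockSketch

    ' Hypothetical helper: collects paragraphs from startParagraph (1-based)
    ' up to, but not including, the first paragraph that matches endPattern.
    ' If nothing matches, everything to the end of the text is returned.
    Function ReadBlock(docText As String, startParagraph As Integer,
                       endPattern As String) As String
        If startParagraph < 1 Then
            Throw New ArgumentOutOfRangeException("startParagraph")
        End If

        Dim paragraphs() As String = docText.Split(ControlChars.Cr)
        Dim block As New List(Of String)

        For i As Integer = startParagraph - 1 To paragraphs.Length - 1
            If Regex.IsMatch(paragraphs(i), endPattern) Then Exit For
            block.Add(paragraphs(i))
        Next

        Return String.Join(Environment.NewLine, block)
    End Function

    Sub Main()
        ' Made-up sample: the block starts on paragraph 2 and ends before "Signed:".
        Dim sample As String = "Header" & vbCr &
                               "Body line one" & vbCr &
                               "Body line two" & vbCr &
                               "Signed: J. Doe" & vbCr

        Console.WriteLine(ReadBlock(sample, 2, "^Signed:"))
    End Sub

End Module
```

With the sample above this prints the two "Body" paragraphs and stops before the "Signed:" paragraph.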

Help with the above will really get me started on this. I'd really
appreciate it.

Thanks,

Keith
 
Greg Maxey

See if any of this helps:
Sub ScratchMaco()
    Dim oRng As Word.Range
    Set oRng = ActiveDocument.Range
    With oRng.Find
        .Text = "Issue date:"
        If .Execute Then
            oRng.Collapse wdCollapseEnd
            oRng.MoveEndUntil Chr(13)
            If IsDate(oRng.Text) Then
                MsgBox oRng.Text
                oRng.Collapse wdCollapseStart
                oRng.Move wdParagraph
                oRng.MoveEndUntil Chr(13)
                MsgBox "County is: " & oRng.Text
                On Error Resume Next
                MsgBox ActiveDocument.Paragraphs(4).Range.Text
                'Or specifically the fourth line.
                ActiveDocument.Range(0, 0).Select
                Dim i As Long
                For i = 1 To 3
                    With Selection
                        .MoveDown unit:=wdLine
                        .Bookmarks("\line").Select
                    End With
                Next i
                MsgBox Selection.Text
                'GoTo a specific line e.g., line 8:
                Selection.GoTo What:=wdGoToLine, Count:=8
                'Set a range equal to the complete paragraph range.
                Set oRng = Selection.Paragraphs(1).Range
                MsgBox oRng.Text
            Else
                MsgBox "No date found on this line"
            End If
        End If
    End With
End Sub
 
Keith G Hicks

Very helpful. Thank you.


 
