Character count & bookmark position not tallying

Paulkelly77

Hello there,

I am trying to write a VBScript to run over a collection of 4000 or so
documents that were created from a Word template. When a document is
created, the template displays a form that people fill in, and the form
then inserts text at various bookmarks throughout the document. I now need
to extract the text at some of those bookmarks using the Mid function, but
before I can do that I need accurate information about where each bookmark
starts.

Using the following code I locate the position of one of the bookmarks
(bookMeetingDate) in a test document, and it reports position 990. I know
the text inserted just after the bookmark in my test document is
"18 January", but when I do an InStr(oRangeWhole, ...) for that text I get
position 559. Can anyone shed some light on why these two values differ,
and what I might do to get an accurate value for the bookmark position?

The text "18 January" appears only once in the test document.

Option Explicit

Dim oWord
Set oWord = CreateObject ("Word.Application")

Dim oDoc
Set oDoc = oWord.Documents.Open("C:\docparser\I-06-01.doc",True,True) ' FileName, ConfirmConversions, ReadOnly

Dim oRangeWhole
Set oRangeWhole = oDoc.Range

WScript.echo oDoc.Bookmarks("bookMeetingDate").Start ' Reports 990 (0-based position in the document)
WScript.echo InStr(oRangeWhole,"18 January") ' Reports 559 (1-based position in oRangeWhole.Text)

oDoc.Close(False)

oWord.Quit

Set oDoc = Nothing
Set oWord = Nothing
 
Shauna Kelly

Hi Paul

There are lots of possible things going on here.

First, a bookmark can cover several characters, or be 'empty', and live
between two characters. So you might get a different result for the .Start
and .End of the Bookmark's range.
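A minimal VBScript sketch of that check, assuming oDoc is the Document opened in Paul's script and that the bookmark exists. Bookmark.Empty is True when Start = End, i.e. when the bookmark marks a point rather than enclosing text.

```vbscript
' Report whether a bookmark is an insertion point or spans some text.
' Assumes oDoc is an open Word Document (as in the script above).
Dim oBm
Set oBm = oDoc.Bookmarks("bookMeetingDate")

If oBm.Empty Then
    ' Start = End: the bookmark sits between two characters
    WScript.Echo "Empty bookmark at position " & oBm.Start
Else
    ' The bookmark encloses text, which Range.Text returns
    WScript.Echo "Bookmark spans " & oBm.Start & " to " & oBm.End & _
        ": " & oBm.Range.Text
End If
```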

Second, a Word document isn't just a run of text in which you can use InStr
as one would for a String. If anything like a Table or an image is in the
document, then the position you'll get *within the document* and the
position *within the string of text* might be completely different.

If you need to extract text, then the best way is to create a Range object
that covers the text you want to extract (eg by using the range's .Find
property, or the bookmark's .Range, or some such) and then read the
.Text property of the Range.
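One way to see the difference, sketched under the assumption that oDoc is the document opened in Paul's script: search with Range.Find instead of InStr. A Range found this way reports Start and End in the same 0-based document coordinates as Bookmark.Start, so if the form text begins exactly at an empty bookmark you would expect the two Start values to agree.

```vbscript
' Locate the text with Range.Find so that its position is measured in the
' same units as Bookmark.Start, instead of using InStr on the plain text.
' Assumes oDoc is an open Word Document.
Dim oRng
Set oRng = oDoc.Content          ' a Range covering the main text story

With oRng.Find
    .Text = "18 January"
    .MatchCase = True
    .Forward = True
    .Wrap = 0                    ' wdFindStop: do not wrap around the document
End With

If oRng.Find.Execute() Then
    ' On success, oRng is redefined to cover just the matched text
    WScript.Echo "Found at " & oRng.Start & "-" & oRng.End & ": " & oRng.Text
    WScript.Echo "Bookmark starts at " & oDoc.Bookmarks("bookMeetingDate").Start
Else
    WScript.Echo "Text not found"
End If
```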

So, for example, if you wanted the text in the whole of the paragraph in
which the bookMeetingDate Bookmark appears, you could use
ActiveDocument.Bookmarks("bookMeetingDate").Range.Paragraphs(1).Range.Text.
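In a standalone .vbs file there is no global ActiveDocument, so a sketch of the same call uses the Document object returned by Documents.Open (oDoc in Paul's script) or oWord.ActiveDocument. It assumes the bookmark exists.

```vbscript
' Read the whole paragraph that contains the bookmark.
' Assumes oDoc is an open Word Document.
Dim oPara
Set oPara = oDoc.Bookmarks("bookMeetingDate").Range.Paragraphs(1).Range
WScript.Echo "Paragraph " & oPara.Start & "-" & oPara.End & ": " & oPara.Text
```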

Don't forget that bookmarks are easily deleted or overwritten accidentally
by users. (And I always deliberately delete any bookmarks that templates
leave lying in my documents for no good reason!) So you can't always be
sure that they exist.
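Across 4000 documents it is worth guarding against missing bookmarks with Bookmarks.Exists. This is a sketch only: the folder C:\docparser comes from Paul's script, while the ".doc" filter, skipping Word's "~$" owner files, and the tab-separated output are assumptions to adjust.

```vbscript
Option Explicit
' Sketch: walk a folder of documents and read one bookmark's paragraph,
' skipping documents in which the bookmark has been deleted.
Const BOOKMARK_NAME = "bookMeetingDate"

Dim oFso, oWord, oFile, oDoc, sText
Set oFso = CreateObject("Scripting.FileSystemObject")
Set oWord = CreateObject("Word.Application")

For Each oFile In oFso.GetFolder("C:\docparser").Files
    ' Only .doc files, and not Word's temporary "~$" owner files
    If LCase(oFso.GetExtensionName(oFile.Name)) = "doc" And _
       Left(oFile.Name, 2) <> "~$" Then
        ' Args: FileName, ConfirmConversions, ReadOnly
        Set oDoc = oWord.Documents.Open(oFile.Path, False, True)
        If oDoc.Bookmarks.Exists(BOOKMARK_NAME) Then
            sText = oDoc.Bookmarks(BOOKMARK_NAME).Range.Paragraphs(1).Range.Text
            WScript.Echo oFile.Name & vbTab & sText
        Else
            WScript.Echo oFile.Name & vbTab & "(bookmark missing)"
        End If
        oDoc.Close False        ' close without saving
        Set oDoc = Nothing
    End If
Next

oWord.Quit
Set oWord = Nothing
Set oFso = Nothing
```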

Hope this helps.

Shauna Kelly. Microsoft MVP.
http://www.shaunakelly.com/word
 
Jezebel

Perhaps it might be more useful if you explained what you're actually trying
to do. As Shauna has explained, you can't treat the content of a document as
a string. There are *lots* of things that occur in a document, and affect
the start and end values of individual ranges, that don't show up in the
strings you can extract from the content (just try inserting a table, and
account for all those end-of-cell and end-of-row markers). There's also:

- content that isn't in the body of the document at all (textboxes,
footnotes, comments, etc)
- fields (do you want to count the field code or the field result?)
- hidden text (count it or ignore it?)
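A rough way to see these effects on a given document, assuming oDoc is an open Document: compare the Range's character span with the length of its .Text under different TextRetrievalMode settings, then list the separate stories. Note that StoryRanges yields only the first story of each type; further linked stories are reached through NextStoryRange, which this sketch does not follow.

```vbscript
' Compare the main story's character span with the length of its text
' under different retrieval settings. Assumes oDoc is an open Word Document.
Dim oRng
Set oRng = oDoc.Content
WScript.Echo "Range span (End - Start): " & (oRng.End - oRng.Start)

oRng.TextRetrievalMode.IncludeFieldCodes = False
oRng.TextRetrievalMode.IncludeHiddenText = False
WScript.Echo "Text length, field results only, no hidden text: " & Len(oRng.Text)

oRng.TextRetrievalMode.IncludeFieldCodes = True
oRng.TextRetrievalMode.IncludeHiddenText = True
WScript.Echo "Text length, with field codes and hidden text: " & Len(oRng.Text)

' Stories outside the main body (footnotes, comments, text boxes, headers...)
Dim oStory
For Each oStory In oDoc.StoryRanges
    WScript.Echo "StoryType " & oStory.StoryType & ": " & Len(oStory.Text) & " chars"
Next
```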
 
Paulkelly77

Thanks for the quick response,

The
ActiveDocument.Bookmarks("bookMeetingDate").Range.Paragraphs(1).Range.Text
thing works, thanks a lot.

I think what was throwing me off pursuing that was the fact that the
.Start and .End values of the range were the same character position, so
I thought the bookmark range contained no text.

Oh well, a bit more persistence next time, eh?

Thanks a lot for the advice / help.
 
Jean-Guy Marcil

(e-mail address removed) was telling us:
I think what was throwing me off pursuing that was the fact that the
.Start and .End values of the range were the same character position,
so I thought the bookmark range contained no text.


If you mean that oDoc.Bookmarks("bookMeetingDate").Start =
oDoc.Bookmarks("bookMeetingDate").End, so that the Start and End values of
the bookmark are the same, then you are correct: the bookmark range does not
contain any text. This means that the text you are looking for was inserted
right after the bookmark, and the bookmark was preserved.
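Given an empty bookmark followed by the form text, a sketch of reading just that text is to build a Range from the bookmark's position to the end of its paragraph and pull the end back by one character to drop the paragraph mark. This assumes oDoc is an open Document, the bookmark exists, and the paragraph ends with an ordinary paragraph mark (a paragraph inside a table cell ends differently).

```vbscript
' Read from an empty bookmark to the end of its paragraph.
' Assumes oDoc is an open Word Document.
Dim oBm, oRng
Set oBm = oDoc.Bookmarks("bookMeetingDate")
Set oRng = oDoc.Range(oBm.Start, oBm.Range.Paragraphs(1).Range.End)
oRng.MoveEnd 1, -1     ' 1 = wdCharacter: exclude the paragraph mark
WScript.Echo "Text after bookmark: " & oRng.Text
```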

Also, remember that
ActiveDocument.Bookmarks("bookMeetingDate").Range.Paragraphs(1).Range.Text
will include the ¶. You may want to strip it off if you are going to process
the paragraph text itself and send it elsewhere...
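In the string returned by .Text, the ¶ is a carriage return, Chr(13) (vbCr). A minimal helper for stripping it, written as plain VBScript so it works on any string:

```vbscript
' Remove a trailing paragraph mark (Chr(13), i.e. vbCr) from text read
' through Range.Text. Text without a trailing vbCr is returned unchanged.
Function StripParaMark(sText)
    If Len(sText) > 0 And Right(sText, 1) = vbCr Then
        StripParaMark = Left(sText, Len(sText) - 1)
    Else
        StripParaMark = sText
    End If
End Function
```

Usage, with oDoc as in Paul's script:
StripParaMark(oDoc.Bookmarks("bookMeetingDate").Range.Paragraphs(1).Range.Text)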

--
Cheers!
_______________________________________
Jean-Guy Marcil - Word MVP
(e-mail address removed)
Word MVP site: http://www.word.mvps.org
 
