Range.Text and Range.Characters

dan8

Dear colleagues,

I am developing an MS Word add-in. The goal is to perform a specialized search in a Word
document and then set the cursor to the positions found.

Range.Characters returns a collection of single- or double-byte characters.
Double-byte characters occur, for example, in a document with a table: the table
contains some "D7" characters.

Characters is too slow a way to access the whole document text, so I use
Range.Text. However, Text returns a string of single-byte characters: D
and 7 are two separate characters, so Range.Text turns out to be longer
than Range.Characters.Count.

This becomes a problem when I use Range.Move to set the cursor to positions
found in Range.Text. Move counts the same way as Characters, i.e. D7 is a single
character, so the cursor position becomes incorrect.

I need at least one of the following:

- to be able to get the text as a multibyte string, where a two-byte character is one
character. Collecting the text from the Characters property is not fast
enough.

- to be able to position (Move) the cursor by bytes, not characters.

Thank you in advance for your help!
 
zkid

A little confused on what it is you're trying to accomplish. Are you just
searching a table, cell by cell? What do you mean that D7 is a single
character? Can you provide a sample of what might be contained in a cell and
for what you're actually testing?
 
dan8

zkid said:
A little confused on what it is you're trying to accomplish. Are you just
searching a table, cell by cell? What do you mean that D7 is a single
character? Can you provide a sample of what might be contained in a cell and
for what you're actually testing?

Never mind what the add-in is searching for, or how.

I need to get the text of a Word document, find some positions in this text,
and be able to set the cursor to these positions in the original document.

An example of the problem follows.

Consider a Word document with the plain text: abcdef
If we get ActiveDocument.Range(EmptyParam,EmptyParam) and then read the
Text property of this Range, we have "abcdef". The Characters property contains
the same 6 items (Items, one by one):
a
b
c
d
e
f
So we are happy.

Now let's insert an empty 2x2 table between 'c' and 'd', and get the same
Range, Text and Characters:
Text="abc#$D#$D#7#$D#7#$D#7#$D#7#$D#7#$D#7def"
(so the table added 13 characters to the Text string)
and the Characters property items (Items, one by one) are:
a
b
c
#$D
#$D#7
#$D#7
#$D#7
#$D#7
#$D#7
#$D#7
d
e
f
(The table added 7 items to the Characters property; the last 6 of them are double-byte.)

If I search for "d" in Text, the position will be 17. But setting the
cursor to position 17 using Range.Move is incorrect, because Move counts in
the same way as the Characters property (it sees a double-byte character as a
single character).
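
The mismatch in this example can be reproduced without Word. What follows is only a minimal Python sketch of the string side, assuming that the only multi-character items are the table markers (character 13 followed by character 7, written #$D#7 above). The helper split_into_items is hypothetical and is not part of the Word object model.

```python
CELL_MARK = "\r\x07"  # character 13 + character 7, written #$D#7 in the post


def split_into_items(text):
    """Group text the way the post describes Range.Characters:
    each "\r\x07" pair is one item, every other character is its own item."""
    items = []
    i = 0
    while i < len(text):
        if text.startswith(CELL_MARK, i):
            items.append(CELL_MARK)
            i += 2
        else:
            items.append(text[i])
            i += 1
    return items


# "abc", one paragraph mark, six markers for the empty 2x2 table, then "def"
text = "abc" + "\r" + CELL_MARK * 6 + "def"
items = split_into_items(text)

text_pos = text.index("d") + 1   # 1-based position in the Text string
item_pos = items.index("d") + 1  # 1-based position in the item list

print(len(text), len(items))     # 19 13
print(text_pos, item_pos)        # 17 11
```

The Text string is 6 characters longer than the item list, one extra character per table marker, which is exactly the drift between position 17 and position 11.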

The obvious solution is to collect the text from the Characters property, but
this is too slow for large documents. I have to use the Text property.

Do you understand the problem now?

Thank you.
 
Jay Freedman

Hi Dan,

I believe you're going about the job the wrong way. Using absolute
character positions in a Word document is too unreliable -- there are
other things that can mess up the count, including hidden text,
fields, characters from double-character fonts...

Instead, declare a Word.Range object, set it to the document's range,
and use its .Find method to search for the desired text. When
.Find.Execute returns True, call the .Select method of the Range
object.

I gather from your sample that you're automating this from .Net code,
and I'm not familiar enough with that to give you a working code
sample, but in VBA it would be

Dim oRg As Range
Set oRg = ActiveDocument.Range
With oRg.Find
    .ClearFormatting
    .Text = "d" ' fill in your search string
    .Forward = True
    .Wrap = wdFindStop
    .Format = False
    .MatchWildcards = False
    If .Execute Then
        oRg.Select
    End If
End With

Even in a very large document this will be very fast. You can include
additional criteria in the .Find parameters (e.g., set .Format = True
and .Font.Bold = True to find only occurrences that are bold), or you
can add an If statement around the .Select (e.g.,
If oRg.Information(wdWithinTable) Then
to select only occurrences within tables).

--
Regards,
Jay Freedman
Microsoft Word MVP
Email cannot be acknowledged; please post all follow-ups to the
newsgroup so all may benefit.
 
Tony Jollans

Terminology is a little bit of a problem here!

Each Item in Range.Characters is actually, itself, a Range which may contain
more than one character (with a small c, in other words the normal English
meaning of the word) - each of these characters normally occupies two bytes
although that is not really relevant to the issue at hand.

Now to your problem! I don't think you can do what you are asking and the
best way to proceed rather depends on what you are really trying to do.

Are you processing (or do you want to process) every character in the text
or are you simply searching for something?

If you are processing every character it's probably as easy as anything to
keep your own count. Assuming you are not working with Unicode code points
above U+FFFF, the "D7" is one of very few such sequences (off the top of my
head I can't think of another) and you should be able to hard code them. You
may find it helpful to compare Len(Range) with Range.Characters.Count or, if
tables are your only concern, checking Range.Tables.Count might be useful. I
suspect, however, that you may run into other problems with things such as
Fields and, perhaps, inserted symbols from Symbol Fonts. In specific
instances you may be alright but in an AddIn that may be more general
purpose it could be harder to handle all the possibilities.

If you are just trying to search for a character, why not use Word's Find?
It will deal with all the issues and probably be more efficient than
anything you can write.
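
The "keep your own count" idea can be sketched as an offset map that is built in one pass over the Text string. This is only an illustration in Python, under the assumption that the table marker (character 13 followed by character 7) is the sole sequence to hard code. Fields, hidden text and symbol-font characters are not handled, and build_offset_map is a hypothetical name.

```python
CELL_MARK = "\r\x07"  # the one hard-coded two-character sequence


def build_offset_map(text):
    """Return a list m where m[i] is the 0-based item index
    (counting each "\r\x07" pair as one item) of text offset i."""
    mapping = []
    item = 0
    i = 0
    while i < len(text):
        if text.startswith(CELL_MARK, i):
            mapping.extend([item, item])  # both halves of the pair share one item
            i += 2
        else:
            mapping.append(item)
            i += 1
        item += 1
    return mapping


# Text of a document "abcdef" with an empty 2x2 table between 'c' and 'd'
text = "abc" + "\r" + CELL_MARK * 6 + "def"
offset_map = build_offset_map(text)

hit = text.index("d")          # 0-based offset found by a search in the string
print(hit, offset_map[hit])    # 16 10
```

Building the map once costs a single pass over the string, after which every search hit converts with one list lookup. Whether the converted value can be passed straight to Range.Move in a real document depends on the other cases mentioned above.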
 
dan8

Jay Freedman said:
I believe you're going about the job the wrong way. Using absolute
character positions in a Word document is too unreliable -- there are
other things that can mess up the count, including hidden text,
fields, characters from double-character fonts...

Yes. What I need is just the same positions for searching and for setting the
cursor afterwards. This text may contain any hidden or formatting characters, etc.
Actually, Range.Characters and Range.Move do the work without errors. But getting
the whole text by collecting Characters.Items is too slow.

Jay Freedman said:
Instead, declare a Word.Range object, set it to the document's range,
and use its .Find method to search for the desired text. When
.Find.Execute returns True, call the .Select method of the Range
object.

Thank you for the idea and the code sample. But my search is too specialized; it
cannot be implemented with Word's Range.Find. Actually, this is the reason
why the user needs an add-in and can't use the standard Word search.
 
dan8

Tony Jollans said:
Each Item in Range.Characters is actually, itself, a Range which may contain
more than one character (with a small c, in other words the normal English
meaning of the word) - each of these characters normally occupies two bytes
although that is not really relevant to the issue at hand.

As I wrote above, using Range.Characters and then Range.Move does the work.
I turn the Text of each item (1 or 2 bytes) into a 2-byte character in a Unicode
string, perform my search in this string, and then use Range.Move in the original
document. The position is always correct.
The only problem is the performance of the first step, collecting the document
text from Characters.

Tony Jollans said:
Are you processing (or do you want to process) every character in the text
or are you simply searching for something?

Searching, but the search itself is complicated and cannot be implemented
with the built-in Range.Find. I need this string in my add-in code.

Tony Jollans said:
In specific
instances you may be alright but in an AddIn that may be more general
purpose it could be harder to handle all the possibilities.

Quite so :( The add-in should work well in any Word document. It is not a
problem for the search if some hidden characters appear in the text being searched.
But the subsequent cursor positioning should take these characters into account as
well, and set the cursor to the correct visible position.
 
Tony Jollans

There isn't really any way to reliably identify the position you want from
the text string you have.

What is it about the search that means you can't use Word's own Find? From
what you've said, it only involves normal characters, and if you can code
the logic for your own search through text it ought to be possible to code
the same logic using one or more Finds.
 
dan8

Tony Jollans said:
There isn't really any way to reliably identify the position you want from
the text string you have.

However, Word's own Find does this somehow. I am wondering whether the same
ways to access text and positions are available to my own search.


What is it about the search that means you can't use Word's own Find? From
what you've said, it only involves normal characters,

A single character was only an illustration. Actually, the user will search for
words in the text that match specific criteria. In particular, there will be a
so-called fuzzy search, where target words have no more than k differences from
the pattern. Or a search without a pattern at all, for words that occur several
times in the text (the number of occurrences is a user-defined parameter).
Just believe me that the standard search is not the right tool here, even with wildcards.
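
To make the two kinds of search concrete, here is a minimal Python sketch that works on the plain text string only. It assumes that "k differences" means Levenshtein edit distance and that a word is a run of word characters. Both are assumptions, and the function names are hypothetical. The offsets it returns are offsets into the string, not Word cursor positions.

```python
import re
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def fuzzy_word_hits(text, pattern, k):
    """Return (offset, word) for each word within k edits of pattern."""
    return [(m.start(), m.group())
            for m in re.finditer(r"\w+", text)
            if edit_distance(m.group().lower(), pattern.lower()) <= k]


def repeated_words(text, min_count):
    """Return {word: count} for words occurring at least min_count times."""
    counts = Counter(w.lower() for w in re.findall(r"\w+", text))
    return {w: n for w, n in counts.items() if n >= min_count}


print(fuzzy_word_hits("color colour colr cooler", "color", 1))
# [(0, 'color'), (6, 'colour'), (13, 'colr')]
print(repeated_words("a b a c b a", 2))
# {'a': 3, 'b': 2}
```

Each hit carries its string offset, so it still has to be translated into a document position before the cursor can be moved there.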
 
Tony Jollans

dan8 said:
However, Word's own Find does this somehow. I am wondering about the same
ways to access text and positioning, but with my own search.

You have the access - you just find the performance unacceptable.

I do appreciate that doing some things using Word's object model can be slow
and painful and all I can really offer, if you have to work with the text,
is the comparison of Len(Range.Text) with Range.Characters.Count I mentioned
earlier - it doesn't give you a direct answer but it does alert you to the
situations where some further processing will be required. As I write this I
wonder whether, having done your own search within the text and (presumably)
identified an exact string, you could then use Word's Find on the Range for
that exact string which would provide you with the correct location (maybe
even search backwards from the text offset you have identified to minimise
the distance Word has to look - as you know the text position is greater
than or equal to the 'range position').

--
Enjoy,
Tony


 
dan8

Tony Jollans said:
You have the access - you just find the performance unacceptable.

Word performs its own search and positioning much more quickly than I can using
Range.Characters and Range.Move. The bottleneck is exactly Range.Characters;
I have traced the code. That's why I suspect that quicker
methods exist.

all I can really offer, if you have to work with the text,
is the comparison of Len(Range.Text) with Range.Characters.Count I mentioned
earlier - it doesn't give you a direct answer but it does alert you to the
situations where some further processing will be required. As I write this I
wonder whether, having done your own search within the text and (presumably)
identified an exact string, you could then use Word's Find on the Range for
that exact string which would provide you with the correct location (maybe
even search backwards from the text offset you have identified to minimise
the distance Word has to look - as you know the text position is greater
than or equal to the 'range position').

Thank you for the idea; I agree this is the correct approach, but only once
we are sure there is no other way (apart from Range.Characters) to get
a string with multibyte characters.
 
Tony Jollans

All I can really do is wish you luck. I know of no way through the Word
Object Model to get what you ask for.
 
Klaus Linke

dan8 said:
Thank you for the idea, I agree this is the correct approach. But only
when we are sure there is no other way (apart from Range.Characters)
to get string with multibyte characters.

Range.Text should work fine for that.

In tables, you don't have D7 characters (×); you do have D and 7 as two
characters (which together are called "end-of-cell markers" or
"end-of-row markers").

Regards,
Klaus
 
