Word's Find utility: wildcard problem


Arne

I want to use Word's Find utility to find two words that are no more than a certain number of characters apart. For instance, I'd assume that
"THIS?{1,9}THAT"
finds the words THIS and THAT separated by up to 9 other characters.
However, Find also marks both words when they are separated by more than 9 characters.
Am I trying to stretch the capabilities of the Find utility too far, or am I missing something?
 

Klaus Linke

Hi Arne,

First off, the fact that the expression matches even if THIS and THAT are
more than 9 characters apart looks like a bug to me.

That said, "Find what: THIS?{1,9}THAT" wouldn't work as you expect anyway.
It matches THIS, and then matches 1 to 9 arbitrary characters, taking as many as it can. Only *then* does it look at the following text, and it matches only if that text is "THAT"; it doesn't go back and try fewer characters.
So it should only match if "THIS" and "THAT" are exactly 9 characters apart.
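
The difference is easier to see side by side. The Python sketch below models the behaviour described above, where the repeat takes as many characters as it can and never gives any back, and compares it with Python's backtracking `re` module. `greedy_no_backtrack` is a hypothetical helper that only models this explanation; it is not Word's actual search engine.

```python
import re

def greedy_no_backtrack(text, first="THIS", second="THAT", lo=1, hi=9):
    """Model of the described behaviour of THIS?{lo,hi}THAT: after `first`,
    take as many characters as possible (up to `hi`), never give any back,
    then require `second` to follow immediately."""
    start = text.find(first)
    while start != -1:
        gap_start = start + len(first)
        # Greedy: take up to `hi` characters, limited by the end of the text
        gap_end = min(gap_start + hi, len(text))
        if gap_end - gap_start >= lo and text.startswith(second, gap_end):
            return text[start:gap_end + len(second)]
        start = text.find(first, start + 1)
    return None

def backtracking(text, lo=1, hi=9):
    """Python's re backtracks, so .{lo,hi} gives characters back until THAT fits."""
    m = re.search(r"THIS.{%d,%d}THAT" % (lo, hi), text, re.DOTALL)
    return m.group(0) if m else None

short_gap = "THIS abc THAT"        # 5 characters between the words
exact_gap = "THIS 1234567 THAT"    # exactly 9 characters between the words
print(greedy_no_backtrack(short_gap), backtracking(short_gap))  # None THIS abc THAT
print(greedy_no_backtrack(exact_gap), backtracking(exact_gap))  # both find the full text
```

Under this model the short gap is missed, which is the opposite of what Arne observed, so the over-matching he reports would be a separate problem, as Klaus suggests.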

What would work is "Find what: THIS[!T]{1,9}THAT":
It matches "THIS", then it matches 1 to 9 characters as long as they are
not "T". As soon as a "T" appears, the match for the expression [!T]{1,9}
stops, and Word now checks if the following text is "THAT".
But if the text between THIS and THAT contains a T, you're out of luck.
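
For readers who know regular expressions: Word's `[!T]` ("any character except T") is written `[^T]` in Python's `re` syntax. A minimal sketch, assuming that translation, shows the working case and the limitation when the gap contains a capital T:

```python
import re

# Word wildcard THIS[!T]{1,9}THAT, with [!T] written as [^T]
pattern = re.compile(r"THIS[^T]{1,9}THAT")

samples = [
    "THIS abc THAT",                 # gap without any T: found
    "THIS is THE THAT",              # gap contains a capital T: not found
    "THIS is a long way from THAT",  # gap longer than 9 characters: not found
]
for s in samples:
    m = pattern.search(s)
    print(repr(s), "->", m.group(0) if m else None)
```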

The "proper" way to do it would probably require more than one Find/Replace.
For example,
-- replace "THIS" with "THIS$",
-- replace "THAT" with "§THAT",
-- find $[!$§]{1,9}§
(and then delete all the $ and § markers again).
Use marker characters that don't otherwise occur in your text.
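
The marker steps can be sketched in Python as follows. `find_near` is a hypothetical helper; it assumes `$` and `§` never occur in the text, and instead of editing a document it returns each passage found with the markers stripped again. (Python's `re` backtracks, so the markers aren't strictly needed there, but the sketch follows the steps as listed.)

```python
import re

OPEN, CLOSE = "$", "§"  # marker characters; assumed not to occur in the text

def find_near(text, first="THIS", second="THAT", max_gap=9):
    # Steps 1 and 2: THIS -> THIS$ and THAT -> §THAT
    marked = text.replace(first, first + OPEN).replace(second, CLOSE + second)
    # Step 3: first word, $, 1..max_gap non-marker characters, §, second word
    pattern = re.compile(
        re.escape(first + OPEN) + r"[^$§]{1," + str(max_gap) + "}" + re.escape(CLOSE + second)
    )
    # Step 4: remove the markers again from each passage found
    return [m.group(0).replace(OPEN, "").replace(CLOSE, "")
            for m in pattern.finditer(marked)]

print(find_near("THIS is THE THAT"))              # ['THIS is THE THAT']
print(find_near("THIS is a long way from THAT"))  # []
```

Unlike `THIS[!T]{1,9}THAT`, the gap may now contain a T, because only the markers are excluded.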

The kind of proximity search you often see in search engines or data retrieval
software isn't easy to do in Word, even using wildcards; it really
requires specialized search algorithms.
Other searches along these lines: find sentences/paragraphs that contain the
words X, Y, Z (...) in any order, or find sentences/paragraphs that
contain word X and also word Y or Z, but not word N.
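
Outside Word, the paragraph filters just mentioned are simple boolean combinations of whole-word tests. A Python sketch, with hypothetical helper names and made-up sample paragraphs:

```python
import re

def contains_word(paragraph, word):
    """Whole-word, case-insensitive test."""
    return re.search(r"\b" + re.escape(word) + r"\b", paragraph, re.IGNORECASE) is not None

def has_all(paragraph, words):
    """Contains every word in `words`, in any order."""
    return all(contains_word(paragraph, w) for w in words)

def x_and_y_or_z_not_n(paragraph, x, y, z, n):
    """Contains X, and also Y or Z, but not N."""
    return (contains_word(paragraph, x)
            and (contains_word(paragraph, y) or contains_word(paragraph, z))
            and not contains_word(paragraph, n))

# Made-up sample paragraphs, one per line
text = ("Supercritical water oxidation of phenol.\n"
        "Supercritical CO2 extraction; water was added.\n"
        "Phenol in water at ambient conditions.")
paragraphs = text.split("\n")

any_order = [p for p in paragraphs if has_all(p, ["phenol", "water"])]
combined = [p for p in paragraphs
            if x_and_y_or_z_not_n(p, "water", "oxidation", "extraction", "CO2")]
```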

Perhaps you could write to [email protected] and suggest such a feature...

Greetings,
Klaus
 

Arne

Ah, very useful information. Thanks a lot! Here is what I am trying to do: I receive a file containing the results of an automated literature search. It usually contains hundreds of abstracts of scientific articles, and I am getting tired of sifting through them all.

In these abstracts I am looking for terms of interest to me, for instance "supercritical water". But an article may spell it as "supercrit. water", so I might search for "supercrit*water". However, the two words can also turn up unrelated, separated by a number of other words, so I want to make sure that "supercrit" and "water" are not too far apart.
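
A minimal Python sketch of such a proximity filter over a list of abstracts, assuming each abstract is a plain string. `mentions_near` is hypothetical and the default of 15 characters is an arbitrary choice; the gap is counted from the end of "supercrit", so the rest of "supercritical" counts toward it.

```python
import re

def mentions_near(abstract, first="supercrit", second="water", max_gap=15):
    """True if a word starting with `first` is followed by the word `second`
    within `max_gap` characters on the same line (counted from the end of `first`)."""
    pattern = re.compile(
        r"\b" + re.escape(first)
        + r"[^\n]{0,%d}?" % max_gap          # short, lazy gap that stays on one line
        + r"\b" + re.escape(second) + r"\b",
        re.IGNORECASE,
    )
    return pattern.search(abstract) is not None

abstracts = [
    "Oxidation of phenol in supercritical water.",
    "Hydrolysis in supercrit. water at 400 C.",
    "Extraction with supercrit. CO2 blah blah blah water.",
]
relevant = [a for a in abstracts if mentions_near(a)]
```

The third abstract is dropped because 21 characters separate "supercrit" and "water".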

 

Greg Maxey

Arne,

How about searching for something like:
(<supercri)(*)(water>)
--
Greg Maxey
A peer in "peer to peer" support
Rockledge, FL
To e-mail, edit out the "w...spam" in [email protected]
 

Arne

Interesting... Suppose the words are arranged as follows: "...supercrit. CO2 blah blah blah water...". This abstract is clearly about supercritical carbon dioxide rather than supercritical water, and I want it discarded. If I used (<supercri)(*)(water>), wouldn't I find the whole sentence as well?
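
Arne's worry can be checked with a rough Python counterpart of Greg's pattern, in which Word's `<` and `>` (start and end of a word) become `\b` and `*` becomes an unbounded lazy `.*?`. The translation is approximate, and the bounded variant, which allows at most `max_words` whitespace-separated words between the two terms, is a hypothetical alternative sketched only in Python.

```python
import re

text = "...supercrit. CO2 blah blah blah water..."

# Rough counterpart of Greg's (<supercri)(*)(water>): no upper limit on the gap
unbounded = re.compile(r"\b(supercri)(.*?)(water)\b")

def bounded(max_words):
    """Hypothetical alternative: at most `max_words` whitespace-separated
    words may sit between the supercri... word and "water"."""
    return re.compile(r"\bsupercri\S*(?:\s+\S+){0,%d}?\s+water\b" % max_words)

print(unbounded.search(text).group(0))   # supercrit. CO2 blah blah blah water
print(bounded(1).search(text))           # None
print(bounded(1).search("...in supercrit. water at 400 C...").group(0))  # supercrit. water
```

So the answer to the question is yes: with an unbounded gap the CO2 passage is found too.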

 

Greg Maxey

Arne,

I clearly blew that one ;-)



 

Arne

Thanks anyway, Greg. I appreciate your input. I myself have posted loads of 'helpful' replies that did turn out to be, well, less useful. You cannot always be sharp :)

 