Regex & Wildcards

Vince

Hey,

I need to find all of the following with a single wildcard match:

1.1 mol/L
1 mol/L
1mol/L
1.1 mol /L
1 mol /L
1mol /L
1.1 mol / L
1 mol / L
1mol / L
1.1 mol/ L
1 mol/ L
1mol/ L

A sentence could contain any one of these. For instance, "James drank a
solution of Nitrogen Peroxide with a concentration of 5.15 mol/L".
This is what I could come up with:

([0-9.]@)( @)(mol/L)

This takes care of any numerals / decimals but does not account for:
a) A missing space between the number and mol/L (it looks for one or more
spaces, but there might be no space at all, as in 1.1mol/L).
b) Spacing around the slash. It strictly looks for mol/L and can't match
mol / L, mol/ L or mol /L. To cover those, I would have to repeat the
pattern once for each spacing variant!

Questions:
1) How do I write a single wildcard match for all the possibilities listed
above?
2) How can I say "optional" in regex? E.g. Di[peg] could be any one of "Dig",
"Dip" or "Die". But I need to say that "Di" may or may not be followed by
"p", "e" or "g". In Perl, I would say "(Di)([epg])*". How do I say that in
VBA?

Thanks a lot for your time / any response.

Vince
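
Editor's sketch for question 2: Word's wildcard search has no "zero or more" operator, but the VBScript regular-expression engine does, and it can be reached from VBA through late binding. This is a different technique from the wildcard search used in the thread. The sketch assumes the VBScript.RegExp object is available on the machine; the function names NormalizeMolPerL and IsDiWord are hypothetical. In this engine, "?" means zero or one and "*" means zero or more.

```vba
Option Explicit

' Rewrites every spacing variant of "<number>mol/L" as "<number> mol/L".
' Works on a plain string, not on a Word document.
Public Function NormalizeMolPerL(ByVal txt As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")   ' assumed to be installed
    re.Global = True          ' replace every match, not only the first
    re.IgnoreCase = False     ' "mol/L" is matched case-sensitively
    ' " *" matches zero or more spaces, which covers all twelve variants.
    re.Pattern = "([0-9.]+) *mol */ *L"
    NormalizeMolPerL = re.Replace(txt, "$1 mol/L")
End Function

' True when word is "Di" optionally followed by one of e, p or g.
Public Function IsDiWord(ByVal word As String) As Boolean
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = False
    re.Pattern = "^Di[epg]?$"   ' use [epg]* to allow several letters
    IsDiWord = re.Test(word)
End Function
```

Because these functions work on strings, writing the result back into a document range would replace its text and may lose character formatting, so this suits plain-text checks better than in-place editing.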
 
Helmut Weber

Hi Vince,
before putting much effort into something
that is hardly possible, as wildcard search
does not allow searching for zero or more occurrences,
why not adjust the text beforehand, like this:

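Sub Test777()
    ' Normalizes every spacing variant to "<number> mol/L" in four passes.
    ResetSearch
    Dim rDcm As Range
    Set rDcm = ActiveDocument.Range
    With rDcm.Find
        ' 1) Guarantee at least one space before "mol".
        '    Caution: this also hits words such as "molecule".
        .MatchWildcards = False
        .Text = "mol"
        .Replacement.Text = " mol"
        .Execute Replace:=wdReplaceAll
        ' 2) Remove spaces between "mol" and "/".
        .Text = "mol[ ]{1,}/"
        .Replacement.Text = "mol/"
        .MatchWildcards = True
        .Execute Replace:=wdReplaceAll
        ' 3) Remove spaces between "/" and "L".
        .Text = "mol/[ ]{1,}L"
        .Replacement.Text = "mol/L"
        .MatchWildcards = True
        .Execute Replace:=wdReplaceAll
        ' 4) Collapse the spaces before "mol/L" to a single space.
        .Text = "[ ]{1,}mol/L"
        .Replacement.Text = " mol/L"
        .MatchWildcards = True
        .Execute Replace:=wdReplaceAll
    End With
End Sub
'---
Public Sub ResetSearch()
    ' Resets the Find settings of the Selection (the Find dialog),
    ' not those of the Range used in Test777.
    With Selection.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = ""
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
        .Execute
    End With
End Sub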

HTH

Greetings from Bavaria, Germany
Helmut Weber, MVP
"red.sys" & chr(64) & "t-online.de"
Word XP, Win 98
http://word.mvps.org/
 
Vince

Hey Helmut,

Thanks for your response.

I wanted to save effort by coming up with a text file that contains all
find and replace conditions. At the risk of boring you, please allow me to
explain.

Problem: I am trying to copy edit Word files, and part of the long list of
copy editing rules involves separating numerals and units in the format
"numeral thin space unit". So, I copied a huge list of units from the
internet and wrote a function that reads from a text file and does the find
and replace automatically. For instance, the text file could be:

([0-9.]@)( @)(mol/L)SPLIT\1^s\3SPLITTRUESPLITTRUE
([0-9.]@)( @)(m/s)SPLIT\1^s\3SPLITTRUESPLITTRUE

Each line tells the program to find the part before the first SPLIT, replace
it with the part before the second SPLIT, match wildcards, and be case
sensitive.

Basically, I wanted this text file to be edited by the user so that they can
add their own units that I missed. But, the problem or rather, the
inconvenience is that they need to type all possibilities into the file. For
instance, the above would be:

([0-9.]@)( @)(mol/L)SPLIT\1^s\3SPLITTRUESPLITTRUE
([0-9.]@)( @)(m/s)SPLIT\1^s\3SPLITTRUESPLITTRUE
([0-9.]@)( @)(mol / L)SPLIT\1^smol/LSPLITTRUESPLITTRUE
([0-9.]@)( @)(m / s)SPLIT\1^sm/sSPLITTRUESPLITTRUE
([0-9.]@)( @)(mol /L)SPLIT\1^smol/LSPLITTRUESPLITTRUE
([0-9.]@)( @)(m /s)SPLIT\1^sm/sSPLITTRUESPLITTRUE
([0-9.]@)( @)(mol/ L)SPLIT\1^smol/LSPLITTRUESPLITTRUE
([0-9.]@)( @)(m/ s)SPLIT\1^sm/sSPLITTRUESPLITTRUE

Each line is read as described above.

These two units alone multiply to over 8 lines! This could slow down the
program (I don't really mind that...), but the main problem is that the text
file could become too big in the long run. This is why I was wondering if I
could somehow accommodate the possibilities in the text file to begin with
(using some wildcard search).

What I could do, however, is use your method so that the program (when
reading from the file) also makes room for the possibilities listed above.
If you have a better idea, please let me know.

Thank you for your time.

Vince
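
Editor's sketch of the idea in the last paragraph: the text file could list each unit once, and the program could generate the spacing variants itself. This is a minimal sketch, assuming units contain a single "/" and no characters that are special in Word wildcard patterns. The function names SlashVariants and BuildRuleLines are hypothetical; the rule layout copies the SPLIT format shown above.

```vba
Option Explicit

' Returns the four slash spacings of a unit such as "mol/L":
' "mol/L", "mol / L", "mol /L", "mol/ L".
Public Function SlashVariants(ByVal unitText As String) As Variant
    SlashVariants = Array(unitText, _
                          Replace(unitText, "/", " / "), _
                          Replace(unitText, "/", " /"), _
                          Replace(unitText, "/", "/ "))
End Function

' Builds one rule line per variant. Every variant is replaced by
' "<number><nonbreaking space><unit>" with the canonical spacing.
Public Function BuildRuleLines(ByVal unitText As String) As Collection
    Dim ruleLines As New Collection
    Dim v As Variant
    For Each v In SlashVariants(unitText)
        ruleLines.Add "([0-9.]@)( @)(" & v & ")SPLIT\1^s" & _
                      unitText & "SPLITTRUESPLITTRUE"
    Next v
    Set BuildRuleLines = ruleLines
End Function
```

With this, a user would add only "mol/L" or "m/s" to the file, and BuildRuleLines("mol/L") would yield the four rule lines. The case with no space between number and unit is still not covered, because ( @) needs at least one space.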

 
Helmut Weber

Hi Vince,
I don't understand everything, but for things like:

"mol /L", "mol/ L", "mol / L", "mol / L"
"m /s", "m / s", "m/ s", "m /s", "m / s"

a possible workaround would be
to first replace "/" with " / ", in order to
overcome the limitation that there is no search for
zero or more occurrences of a character.
So we add additional characters first!
After that, each "/" is surrounded by spaces.
Then a wildcard search finds all occurrences
of [ ]{1,}/[ ]{1,}, which can be replaced
by "/", resulting in "mol/L", "m/s".

And there may be more such simple tricks.

HTH
Greetings from Bavaria, Germany
Helmut Weber, MVP
"red.sys" & chr(64) & "t-online.de"
Word 2002, Windows 2000
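
Editor's sketch of the two passes described above, for readers who want them as a macro. It is a minimal sketch, assuming the whole active document should be processed; the names NormalizeSlashSpacing and ReplaceAllInDocument are hypothetical. Note that it changes every slash in the document, not only those in units.

```vba
Option Explicit

' Runs one replace-all over the whole active document.
Private Sub ReplaceAllInDocument(ByVal findText As String, _
                                 ByVal replText As String, _
                                 ByVal useWildcards As Boolean)
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchCase = False
        .MatchWildcards = useWildcards
        .Text = findText
        .Replacement.Text = replText
        .Execute Replace:=wdReplaceAll
    End With
End Sub

Public Sub NormalizeSlashSpacing()
    ' Pass 1 (literal search): pad every slash, so that each one
    ' is guaranteed to have at least one space on both sides.
    ReplaceAllInDocument "/", " / ", False
    ' Pass 2 (wildcards): "one or more spaces" now always matches,
    ' so all spaces around each slash can be removed.
    ReplaceAllInDocument "[ ]{1,}/[ ]{1,}", "/", True
End Sub
```

Using a fresh ActiveDocument.Content range for each pass avoids relying on the state a previous replace leaves behind.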
 
Vince

Hey Helmut,

Thanks, that's a great idea! I just have to find out if adding a space
before and after every slash in the document is acceptable (what if there's
some text that has a '/' and is not a unit). But, I don't think it should be
a problem.....

Thanks, again!

Vince
 
Helmut Weber

Hi Vince,

just one more word:
depending on how big and how complex your docs
are, and on how much effort is justified,
one could even create a macro that, after
removing all spaces around slashes, highlights all
units as they are defined in a list, and then locates
each "/" that is not highlighted. And there are many
more variations.

Cheers

Helmut Weber
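
Editor's sketch of such a macro, offered as one possible reading of the suggestion above. It assumes spaces around slashes have already been removed, uses a short hypothetical in-code unit list instead of a file, and the name FlagUnlistedSlashes is hypothetical. Listed units are highlighted green, and every remaining slash is highlighted yellow and counted.

```vba
Option Explicit

' Highlights listed units in green, marks all other slashes in yellow,
' and returns the number of slashes that were not part of a listed unit.
Public Function FlagUnlistedSlashes() As Long
    Dim units As Variant, u As Variant
    units = Array("mol/L", "m/s")        ' hypothetical unit list

    ' Replacement.Highlight uses the default highlight color.
    Options.DefaultHighlightColorIndex = wdBrightGreen
    For Each u In units
        With ActiveDocument.Content.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .Text = u
            .Replacement.Text = "^&"     ' keep the found text unchanged
            .Replacement.Highlight = True
            .Format = True
            .MatchCase = True
            .MatchWildcards = False
            .Wrap = wdFindStop
            .Execute Replace:=wdReplaceAll
        End With
    Next u

    ' Locate every slash that did not receive a highlight.
    Dim rng As Range
    Dim found As Long
    Set rng = ActiveDocument.Content
    With rng.Find
        .ClearFormatting
        .Text = "/"
        .Highlight = False
        .Format = True
        .MatchWildcards = False
        .Wrap = wdFindStop
        Do While .Execute
            rng.HighlightColorIndex = wdYellow
            found = found + 1
            rng.Collapse wdCollapseEnd
        Loop
    End With
    FlagUnlistedSlashes = found
End Function
```

A nonzero return value means the document contains slashes, such as "and/or", that deserve a manual look.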
 
Vince

Thanks, Helmut!

Excellent idea. I am coloring everything that comes from the text file
green, which makes the odd ones out easy to spot, as you mentioned.

Thanks, again!

Vince
 
