Symbol font chars changing to "(" after passing through VBA Regex

jalanford

I've been having some trouble with characters in Symbol font converting to
"(" when the paragraph that contains them is altered using VBA Regular
Expressions. I'm using the Range.Text property because the FormattedText
property will not work (for me, anyway) with Regex. The characters are fine
if only a match is done. However, once any kind of substitution is performed
in the paragraph, the characters "break". Has anyone else out there
encountered this? If so, have you come up with any workarounds? The search is
complex enough to rule out using Word's wildcard search capabilities with
Selection.

Thanks,
 
Klaus Linke

Hi,

Your RegEx replacement reads the text into a string, and then re-inserts
that string (or parts of it).

Symbols from symbol fonts aren't in the string if the symbol has been
inserted from the "Insert > Symbol" dialog. Instead, you see a "(" in the
string, as you found out.

Symbols that are typed in using any symbol font appear as Unicode characters
with codes between &HF000 and &HF0FF. You can turn the "(" into those codes
using the macro from here:
http://groups.google.com/group/micr...700ed?lnk=st&q=&rnum=7&hl=en#43be18bf4c8700ed
That still doesn't solve your problem completely, though, because all symbol
fonts use the same range of codes, so the code alone doesn't tell you which
font a character came from. You'd need to reapply the proper symbol font
after your replacement.
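
A small illustration of the code range described above, written in Python rather than VBA purely to show the arithmetic. It assumes that a symbol-font character with code n (0 to 255) appears in the string at &HF000 + n; the function names are made up for this sketch.

```python
PUA_BASE = 0xF000  # start of the range mentioned above (&HF000)
PUA_LAST = 0xF0FF  # end of the range (&HF0FF)


def symbol_to_pua(char_code: int) -> str:
    """Map a symbol-font character code (0-255) into the &HF000 range."""
    if not 0 <= char_code <= 0xFF:
        raise ValueError("symbol font codes are expected to be 0-255")
    return chr(PUA_BASE + char_code)


def pua_to_symbol(ch: str) -> int:
    """Recover the symbol-font character code from a &HF000-range character."""
    code_point = ord(ch)
    if not PUA_BASE <= code_point <= PUA_LAST:
        raise ValueError("character is outside the symbol range")
    return code_point - PUA_BASE
```

Note that the result carries no font information: code 232 maps to the same character whether it came from Wingdings or Symbol, which is why the font has to be reapplied separately.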

The easiest way out would be to avoid using symbol fonts altogether, and use
proper Unicode fonts/characters instead.

If that is not an option, maybe explain what the replacement is supposed to
do. You say it is too complex for Word's wildcard capabilities, but maybe
there's a possibility anyway? It would be best to post a sample text and how
the result should look.

Or you could try to tag the symbol font characters together with the font
used (say, <symbol font="Wingdings" char="232">) before you do the RegEx
replacement, and turn the tags back into symbols after the replacement.
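
The tagging idea can be sketched roughly as follows. This is Python, not VBA, and only models the round trip: a paragraph is represented as a list of (text, font name) pairs, which is an assumption of the sketch and not how Word exposes its text. The tag format follows the example above; the function names are hypothetical.

```python
import re

# Matches tags of the form <symbol font="Wingdings" char="232">
SYMBOL_TAG = re.compile(r'<symbol font="([^"]*)" char="(\d+)">')


def tag_symbols(runs):
    """Flatten (text, font) runs into one string, replacing each
    character in the &HF000-&HF0FF range with a <symbol ...> tag."""
    out = []
    for text, font in runs:
        for ch in text:
            code_point = ord(ch)
            if 0xF000 <= code_point <= 0xF0FF:
                out.append(f'<symbol font="{font}" char="{code_point - 0xF000}">')
            else:
                out.append(ch)
    return "".join(out)


def untag_symbols(flat):
    """Split a tagged string back into runs. Symbol runs carry their
    font name; ordinary text gets None (font left unchanged)."""
    runs = []
    pos = 0
    for match in SYMBOL_TAG.finditer(flat):
        if match.start() > pos:
            runs.append((flat[pos:match.start()], None))
        runs.append((chr(0xF000 + int(match.group(2))), match.group(1)))
        pos = match.end()
    if pos < len(flat):
        runs.append((flat[pos:], None))
    return runs
```

Any regex substitution then runs on the flat string between the two calls. As long as the pattern never matches inside a tag, the symbol and its font survive the round trip.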

Regards,
Klaus
 
jalanford

Thank you for your quick response. You've confirmed what I pretty much
suspected. What I'm trying to accomplish is to identify paragraphs in generic
papers that might have "run-in" heads (that is, the head is part of the
paragraph, not on a line by itself). These heads do not have a standard
format but do generally follow a rough pattern. I've converted everything to
flat text by converting all the formatting to tagged markup. The wide variety
of possibilities in the tagged markup is what causes Word's wildcard option
to fall short in this case. The regex identifies the paragraphs with run-in
heads and places a marker at the end of the head text for later processing.
The placement of the marker is the problem, since that is the point at which
the string text is written back in the substitution.
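
To make the marker step concrete, here is a rough Python sketch (not the actual VBA or the actual pattern). Everything specific in it is an assumption for illustration: the <b>...</b> tag markup, the rule that a run-in head is a short bold phrase ending in a period or colon at the start of the paragraph, and the marker text.

```python
import re

# Hypothetical rule: a bold phrase of up to 80 characters, ending in "." or ":",
# at the very start of the paragraph and followed by more text.
RUN_IN_HEAD = re.compile(r'^(<b>[^<]{1,80}[.:]</b>)(?=\s+\S)')
MARKER = "<<HEAD-END>>"  # hypothetical marker for later processing


def mark_run_in_head(paragraph):
    """Insert MARKER right after a run-in head, if the paragraph has one."""
    return RUN_IN_HEAD.sub(lambda m: m.group(1) + MARKER, paragraph, count=1)
```

The substitution returns a new string for the whole paragraph, which is the step where protected symbols are lost if that string is written back as-is.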

I am intrigued by your last suggestion. That's exactly the kind of solution
I was contemplating. Your SymbolToUnicode macro supplies the hints on how to
do this. I plan on trying your 2 macros first before I get too involved with
writing the "character to tag" macro. It seems, however, that the tagging
solution will be needed anyway to prevent "surprises" from other symbol
fonts.

Thanks again.
 
Klaus Linke

Just to make sure: I was referring to the SymbolsUnprotect macro, to turn the
"(" into the "real" codes of the symbols.

I suppose if you can detect run-in heads with RegEx, you probably could also
do it with Word wildcard searches (possibly more than one, because one of
the differences between RegEx and Word wildcards is that the latter do not
support "OR").

But if you can make the tagging work, it should solve the problem.

Greetings,
Klaus
 
jalanford

Yes, I understand that SymbolsUnprotect will "unlock" the information in the
protected symbol characters, making that information available to be captured
and processed. It also unprotects the characters, which makes them more
susceptible to inadvertent alteration but allows the SymbolToUnicode macro to
work.

Thank you.
 
Klaus Linke

:)

Good luck with your macros... If you manage to write macros to turn the
symbols into tags and back, it would be great if you could post them for
future reference.

Klaus
 
