Unicode text Find & Replace

Chand · Dec 31, 2009

I am developing a Unicode-to-general-text program in VSTO (VB.NET). I am
working on Gurmukhi/Punjabi Unicode, i.e. the Raavi font. There are many characters in it
that are formed by combining two or more different characters, like ਕ + ਿ = ਕਿ or ਕ
+ ਿ + ਂ = ਕਿਂ. Earlier I was not able to search for any single instance of ਿ or ਂ,
but later I tried using the wildcard option along with [ਿ]. This character is found
perfectly, but replacing it with the plain character "i" replaces the whole "ਕਿ" or
"ਕਿਂ" with the single character "i". Can anyone help? How can I search for single characters
and replace them with other characters to normalize Unicode for general-font
conversion?
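A quick way to see what these clusters are made of is to list their code points. This is a minimal Python sketch run outside Word (Python is used only for illustration; the project itself is VB.NET), and the helper name `describe` is hypothetical. It shows that ਕਿ is stored as two code points and ਕਿਂ as three, in logical order: the vowel sign I is drawn to the left of ਕ but is stored after it.

```python
import unicodedata


def describe(cluster):
    # One entry per code point: hex value plus its Unicode character name.
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in cluster]


# The clusters from the question, written as escapes so the code points are
# unambiguous: KA + VOWEL SIGN I, then KA + VOWEL SIGN I + SIGN BINDI.
for cluster in ["\u0A15\u0A3F", "\u0A15\u0A3F\u0A02"]:
    print(len(cluster), describe(cluster))
```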

Klaus Linke · Jan 5, 2010

Chand said:
I am developing a Unicode-to-general-text program in VSTO (VB.NET).
I am working on Gurmukhi/Punjabi Unicode, i.e. the Raavi font.
There are many characters in it that are formed by combining two or more
different characters, like ਕ + ਿ = ਕਿ or ਕ + ਿ + ਂ = ਕਿਂ.
Earlier I was not able to search for any single instance of ਿ or ਂ,
but later I tried using the wildcard option along with [ਿ].

That's a problem I know from diacritics in other languages: neither the
diacritic alone nor the letter it's combined with is found on its own.
The workaround I use is the same as yours.
I filed a bug report years ago but haven't heard back on whether it will be
fixed.
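Outside Word's Find dialog each sign is its own code point, so a plain per-code-point mapping replaces only the sign and leaves the consonant alone. A minimal sketch using Python's `str.translate`, assuming the text has already been taken out of the document as a string. The `LEGACY_MAP` table and its target characters ("i", "M") are hypothetical placeholders, not the real codes of any particular legacy Gurmukhi font, and writing the result back into Word with its formatting is a separate problem this sketch does not address.

```python
# Hypothetical mapping from Unicode code points to the characters a legacy
# (non-Unicode) font might expect. The target strings are placeholders only.
LEGACY_MAP = {
    0x0A3F: "i",  # GURMUKHI VOWEL SIGN I -> placeholder "i"
    0x0A02: "M",  # GURMUKHI SIGN BINDI   -> placeholder "M"
}


def to_legacy(text):
    # translate() looks up each code point independently, so the base
    # consonant (e.g. U+0A15 KA) is untouched; only mapped signs change.
    return text.translate(LEGACY_MAP)


print(to_legacy("\u0A15\u0A3F\u0A02"))
```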

This character is found perfectly, but replacing it with the plain
character "i" replaces the whole "ਕਿ" or "ਕਿਂ" with the single character "i".
Can anyone help? How can I search for single characters and replace them
with other characters to normalize Unicode for general-font
conversion?

I'm not sure from your description what you want to replace with what.
Since you're doing a wildcard replacement, you can re-use anything matched
if you put parentheses around it in "Find what", and then use the
appropriate placeholder in "Replace with" (\1 for the first expression in
parentheses, \2 for the second...).
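The same capture-and-reinsert idea can be sketched with Python's `re` module, whose replacement strings also use `\1` for the first parenthesized group. This is an analogy, not Word's wildcard engine. The pattern assumes the base consonant lies in U+0A15..U+0A39 (the main Gurmukhi consonants; the nukta letters U+0A59..U+0A5E are ignored for simplicity), and emitting the placeholder "i" before the consonant is an assumption about what a visually ordered target font might expect. The function names are hypothetical.

```python
import re

# Group 1: one base consonant in U+0A15..U+0A39 (assumed range), followed by
# GURMUKHI VOWEL SIGN I (U+0A3F).
SIHARI_AFTER_CONSONANT = re.compile("([\u0A15-\u0A39])\u0A3F")


def replace_whole_match(text):
    # Mirrors the problem in the question: the match covers consonant + sign,
    # so replacing it with "i" throws the consonant away.
    return SIHARI_AFTER_CONSONANT.sub("i", text)


def move_sihari(text):
    # Puts the captured consonant back via \1; the placeholder "i" goes first,
    # assuming the target font stores the sign in visual order.
    return SIHARI_AFTER_CONSONANT.sub(r"i\1", text)


print(replace_whole_match("\u0A15\u0A3F\u0A02"))
print(move_sihari("\u0A15\u0A3F\u0A02"))
```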

I have no idea about Gurmukhi/Punjabi Unicode. In other scripts that use
ligatures and diacritics, such as Arabic, the ligatures form automatically
if a well-designed font is used. The ligature glyphs may be in Unicode only
for compatibility reasons: because old fonts don't form the ligatures, or
because old files used the ligature characters since fonts back then didn't
form them automatically.
So maybe ask in a group with knowledgeable people (say
microsoft.public.word.international.features) whether the replacements you
are trying to make are sensible, or whether you can instead use a font that
handles the ligatures automatically.
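The point about compatibility glyphs can be checked with Unicode normalization: ligature code points that exist only for compatibility carry a compatibility decomposition, and NFKC maps them back to their component letters, leaving ligation to the font. A minimal Python sketch using the Latin ligature U+FB01 and the Arabic lam-alef presentation form U+FEFB as examples; `expand_compat` is a hypothetical helper name. NFKC also folds many other compatibility characters (full-width forms, superscripts and so on), so it may be broader than a given conversion wants. The Gurmukhi cluster from the question passes through unchanged.

```python
import unicodedata


def expand_compat(text):
    # NFKC applies compatibility decompositions, then canonical composition,
    # so presentation-form ligatures become their component letters.
    return unicodedata.normalize("NFKC", text)


for ch in ["\uFB01", "\uFEFB"]:
    parts = expand_compat(ch)
    print(unicodedata.name(ch), "->", [f"U+{ord(c):04X}" for c in parts])
```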

Regards,
Klaus

Rich B. · Jan 12, 2010

I'm not sure whether this addresses the problem, but the code I wrote to
replace both ANSI and Unicode character strings with ligatures had to:
1. Specify match case (prevents character variants from matching)
2. Exclude small caps
3. Enable format matching (to detect caps and bold/italic properly)
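Point 1 can be illustrated at the plain-string level; points 2 and 3 (small caps, bold/italic) depend on Word formatting and have no counterpart in a bare string. A minimal Python sketch, using the Latin ligature U+FB01 as a stand-in for whatever ligature is being inserted (the function names are hypothetical): a case-sensitive replacement touches only lowercase "fi", while a case-insensitive one also rewrites "FI" with a lowercase ligature.

```python
import re

LIGATURE_FI = "\uFB01"  # LATIN SMALL LIGATURE FI


def ligate_case_sensitive(text):
    # str.replace is case-sensitive: only lowercase "fi" is replaced.
    return text.replace("fi", LIGATURE_FI)


def ligate_ignoring_case(text):
    # With IGNORECASE, "FI" and "Fi" also match and receive the lowercase
    # ligature, the kind of wrong substitution "match case" prevents.
    return re.sub("fi", LIGATURE_FI, text, flags=re.IGNORECASE)


print(ligate_case_sensitive("Office FILE"))
print(ligate_ignoring_case("Office FILE"))
```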

Cheers


Similar Threads

search/find by unicode codepoint	1	Apr 29, 2010
Be consistent with Unicode codepoints!	2	Apr 8, 2010
Symbol "font" - not Unicode compliant, how to Search/Replace?	2	Apr 7, 2010
how do I use macro to find and replace unicode characters	6	Apr 1, 2009
Find and Replace	6	May 16, 2012
find and replace using wildcard characters	2	Jan 27, 2009
Find and Replace Problem	4	Apr 20, 2008
Find and replace many strings	1	Feb 26, 2009