How to extract UTF-8 format from Symbol font in Word OpenXml...?

Anita

Hello all,

I have a Word 2007 document with some text in the Symbol font.
The OpenXml for this document looks as below.

<w:p w:rsidR="00980537" w:rsidRPr="003F0139" w:rsidRDefault="003F0139">
<w:r>
<w:rPr>
<w:rFonts w:ascii="Symbol" w:hAnsi="Symbol" />
</w:rPr>
<w:t>ï¤</w:t>
</w:r>
</w:p>

Is there any way to extract the symbol and convert it to the corresponding
UTF-8 format...?

Any pointers on this would be of great help.

Thanks,
Anita.
 
Klaus Linke

Anita said:
Is there any way to extract the symbol and convert it to corresponding
UTF-8 format...?

Just about every character in the Symbol font can be found in Unicode. You
can find the mapping table here:
http://www.unicode.org/Public/MAPPINGS/VENDORS/ADOBE/symbol.txt

Note that Word doesn't use the plain codes as given in the table for symbol
fonts, but adds F000 to it.

So instead of
2260 B9 # NOT EQUAL TO # notequal
in the link above, you may have to use
2260 F0B9 # NOT EQUAL TO # notequal

(that is, replace U+F0B9 in the Symbol font with U+2260 in a Unicode font).
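
The substitution described above can be sketched in Python. This is only a minimal illustration, not a complete converter: the table holds a small hand-picked excerpt of the Adobe mapping, and the names `SYMBOL_TO_UNICODE`, `PUA_OFFSET` and `symbol_run_to_unicode` are made up for this example. It assumes the text passed in comes from a run formatted in the Symbol font.

```python
# Excerpt of the Adobe Symbol mapping (Symbol code -> Unicode code point).
# Only a few entries are listed; the full table is in symbol.txt.
SYMBOL_TO_UNICODE = {
    0x64: 0x03B4,  # GREEK SMALL LETTER DELTA
    0xB9: 0x2260,  # NOT EQUAL TO
    0xCE: 0x2208,  # ELEMENT OF
}

PUA_OFFSET = 0xF000  # Word stores Symbol-font characters as F000 + code


def symbol_run_to_unicode(text: str) -> str:
    """Translate the text of a Symbol-font run into plain Unicode."""
    out = []
    for ch in text:
        code = ord(ch)
        # Strip the F000 offset if present; codes without the offset
        # are looked up directly.
        if PUA_OFFSET <= code <= PUA_OFFSET + 0xFF:
            code -= PUA_OFFSET
        mapped = SYMBOL_TO_UNICODE.get(code)
        # Leave unmapped characters untouched rather than guessing.
        out.append(chr(mapped) if mapped is not None else ch)
    return "".join(out)


if __name__ == "__main__":
    converted = symbol_run_to_unicode("\uf0b9")
    # Encoding the result gives the UTF-8 bytes asked about in the question.
    print(converted, converted.encode("utf-8"))
```

Once the character is an ordinary Unicode string, getting UTF-8 is just a matter of encoding it; the run's Symbol font setting should then be dropped or replaced with a Unicode font.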

Regards,
Klaus
 
Klaus Linke

Just about every character in the Symbol font can be found in Unicode.

Anita said:
Hello Klaus,

Is there a way to get this mapping directly using some code...?


Hi Anita,

The mapping isn't found in the Symbol font itself (which predates Unicode
anyway), so it would have to be hard-coded into Word.
And unfortunately, it isn't.

IIRC it *is* hard coded into MacWord, which automatically uses the Unicode
codes for the Symbol font and a few other popular fonts with symbols.
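
Since Word does not expose the mapping, one option is to build it in your own code from the Adobe mapping file linked earlier in the thread. The Python sketch below assumes each data line starts with the Unicode value followed by the Symbol code, both in hexadecimal, as in the "2260 B9 # NOT EQUAL TO # notequal" example. The function name `parse_symbol_table` and the inline `SAMPLE` text are illustrative only; in practice you would feed it the lines of the downloaded file.

```python
def parse_symbol_table(lines):
    """Build {Word code point (F000 + Symbol code): Unicode code point}."""
    table = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            unicode_cp = int(fields[0], 16)
            symbol_code = int(fields[1], 16)
        except ValueError:
            continue  # not a data line
        # If a Symbol code is listed more than once, keep the first entry.
        table.setdefault(0xF000 + symbol_code, unicode_cp)
    return table


# Two sample lines in the assumed format, standing in for the real file.
SAMPLE = """\
# Unicode value, Symbol code, # Unicode name, # PostScript name
2260\tB9\t# NOT EQUAL TO\t# notequal
2208\tCE\t# ELEMENT OF\t# element
"""

if __name__ == "__main__":
    table = parse_symbol_table(SAMPLE.splitlines())
    for word_cp, unicode_cp in sorted(table.items()):
        print(f"U+{word_cp:04X} -> U+{unicode_cp:04X}")
```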

Regards,
Klaus
 
Anita

Hello,
I have some symbols, such as ∈.

I am creating WordOpenXml from this information. I am creating each <w:p> in
the default font (i.e. Calibri, body).

<w:p w:rsidR="00FD58B1" w:rsidRDefault="00771ABE">
<w:r>
<w:t> ∈</w:t>
</w:r>
</w:p>

However, these symbols don't get rendered properly.

These symbols first need to be converted as per the above algorithm so that
they are displayed properly in Word. I will also have to set the font to
Symbol, as shown below, so that Word displays them properly.

<w:p w:rsidR="00980537" w:rsidRPr="003F0139" w:rsidRDefault="003F0139">
<w:r>
<w:rPr>
<w:rFonts w:ascii="Symbol" w:hAnsi="Symbol" />
</w:rPr>
<w:t>&#xF0CE;</w:t>
</w:r>
</w:p>

Also, there are some symbols that get rendered properly only in Cambria
Math.
So the problem is that, while creating this OpenXML, I need to go through
every such symbol and determine the font that should be used to display it.
Is there a way for Word itself to determine the font to use by looking at
the symbol?
Also, how do I determine the font to be used for a symbol?
Thanks,
Anita
 
Klaus Linke

Hi Anita,

I'm not entirely sure I understand what you are trying to do. In any case, I
would try to avoid using the Symbol font, myself.

As you see in your posts here in the newsgroup, the Unicode characters
appear pretty fine...
<w:t> ∈</w:t>
Not so for the Symbol font...
<w:t> </w:t>

In Word, those old "decorative" fonts are notoriously difficult to deal
with.
For example, Word usually won't show the font that really is applied
("Symbol") at all, since changing that font would destroy the symbol.

If you're creating XML files, I'd try to stick with Unicode in the text
(<w:t>...</w:t> without specifying a font), and would try to define the
fonts you want to use in the style definition, say:
<w:style w:type="paragraph" w:default="on" w:styleId="myStyle">[...]<w:rPr><w:rFonts w:ascii="Calibri" w:h-ansi="Arial Unicode MS"/>

As far as I know, with the above definition Word should use "Calibri" for
characters that are available in the Calibri font, and switch to "Arial
Unicode MS" for all that aren't.
I've not looked deeply into the documentation, nor experimented much with
it, though.

Anita said:
Is there a way by which Word can itself determine the font to use by looking
at the symbol?

In the user interface, if you insert some "exotic" Unicode character which
isn't available in the current font, Word does try to change the font
automatically to some font that contains it.
But the font that is chosen seems pretty random, unless specific fonts (say
for Thai text...) have already been defined in the style. If you leave it up
to Word, you usually end up with lots and lots of large fonts like "MS
Gothic", "Sim Sun", "Batang"... being applied haphazardly (as "manual"
formatting).

With the right style definitions (see above), Word should use the "proper"
font as defined in the style for Western text, Asian text, RTL text
(Arabic/Hebrew), ..., depending on the languages you type in.
But in my experience, that system does not work terribly well for
mathematical symbols and other stuff without an associated language (say
phonetics, with the characters that are used in IPA phonetics coming from
many different Unicode blocks).

Anita said:
Also, how do I determine the font to be used for a symbol?

For a rough idea of what font contains what characters, see Alan Wood's
Unicode Resources:
http://www.alanwood.net/unicode/index.html

Many fonts, though, contain only parts of certain Unicode blocks, and I
don't know how to determine whether a certain character exists in a certain
font from VBA. There's probably some API call that could deliver that
information.
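
Outside VBA, one way to answer the "does this font contain this character" question is to read the font file's character map directly. The Python sketch below is an assumption-laden illustration rather than anything Word does itself: it relies on the third-party fontTools package for reading the font, and `load_coverage`, `pick_font` and the font path are hypothetical names chosen for this example.

```python
def load_coverage(font_path):
    """Return the set of code points a font file maps to glyphs.

    Assumes the third-party fontTools package is installed.
    """
    from fontTools.ttLib import TTFont  # imported lazily; optional dependency

    font = TTFont(font_path)
    try:
        # getBestCmap() returns {code point: glyph name}, or None.
        return set(font.getBestCmap() or {})
    finally:
        font.close()


def pick_font(char, coverage_by_font, preferred):
    """Return the first font in `preferred` whose coverage contains `char`."""
    code = ord(char)
    for name in preferred:
        if code in coverage_by_font.get(name, ()):
            return name
    return None  # no listed font covers this character


if __name__ == "__main__":
    # Hypothetical path; point this at a real .ttf file on your system.
    coverage = {"Calibri": load_coverage("calibri.ttf")}
    print(pick_font("\u2208", coverage, ["Calibri"]))
```

Keeping the lookup separate from the font loading means the coverage sets can be computed once and reused for every symbol in the input.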

Regards,
Klaus
 
Klaus Linke

Klaus Linke said:
If you're creating XML files, I'd try to stick with Unicode in the text
(<w:t>...</w:t> without specifying a font), and would try to define the
fonts you want to use in the style definition, say:
<w:style w:type="paragraph" w:default="on" w:styleId="myStyle">[...]<w:rPr><w:rFonts w:ascii="Calibri" w:h-ansi="Arial Unicode MS"/>

Just tried it, and it does not work well at all...
Word only uses Calibri for lower-ASCII characters, and switches to "Arial
Unicode MS" for lots of characters that are available in Calibri.

Unless you find some better way to define the styles, the next-best solution
might be to use dedicated character and/or paragraph styles for mathematical
formulas, and make sure they use a font that contains all the necessary
symbols.
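
The dedicated-character-style idea could look roughly like this when generating the XML. This is a hedged Python sketch, not a tested recipe for Word: the style id "MathSymbol" is hypothetical and would have to be defined in the document's styles with a font that contains the symbols, and treating "non-ASCII characters in Unicode category Sm" as the symbols to style is just one possible rule.

```python
import unicodedata
from itertools import groupby
from xml.sax.saxutils import escape

MATH_STYLE_ID = "MathSymbol"  # hypothetical character style id


def is_math_symbol(ch):
    """True for non-ASCII characters in Unicode category Sm (math symbol)."""
    return ord(ch) > 0x7F and unicodedata.category(ch) == "Sm"


def text_to_runs(text):
    """Split text into <w:r> runs, applying the math style to symbol runs."""
    runs = []
    # groupby yields consecutive characters that share the same classification.
    for is_math, group in groupby(text, key=is_math_symbol):
        chunk = escape("".join(group))
        rpr = ""
        if is_math:
            rpr = f'<w:rPr><w:rStyle w:val="{MATH_STYLE_ID}"/></w:rPr>'
        # xml:space="preserve" keeps leading and trailing spaces in the run.
        runs.append(f'<w:r>{rpr}<w:t xml:space="preserve">{chunk}</w:t></w:r>')
    return "".join(runs)


if __name__ == "__main__":
    print("<w:p>" + text_to_runs("x \u2208 A") + "</w:p>")
```

With this split, the text itself stays plain Unicode and the font choice lives in one style definition instead of being repeated on every run.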

Regards,
Klaus
 
Anita

Hello Klaus,

Thanks for your prompt reply.

If I have understood it correctly, this approach would still require me to
go through each symbol in the input XML, decide which font supports it, and
then generate the OpenXml.

Is there some other way to accomplish this, so that I don't have to hard-code
the supported symbols for each font at my end?

Any pointers on this would be of great help.

Thanks,
~Anita

Tony Jollans

If you are going to build a document, you must have logic that decides what
fonts to use. Do you really require so many different fonts that this logic
might be difficult?

Word has its own logic, using, amongst other things, the Unicode subset
bitfields, which it applies when you type a character (although I would agree
with Klaus that it does appear rather haphazard sometimes), but it will not
override existing content in a document.

As a workaround, it might be possible, though possibly rather cumbersome, to
not state a font when you create your document and then copy and paste the
symbols in Word to force Word to use its own logic. I have just done a
rudimentary test and it appears to work for a single character; whether it
will scale up I don't know.

--
Enjoy,
Tony

www.WordArticles.com
