WordML, XML and HTML

LNowak

Problem: In my XML file, there is an element that has HTML-formatted text in
it. When Word imports my XML file, it displays the element as a text string
and does not render the HTML stored in the element.

Normally that is probably what I would want, but in this case I need the
table to render.

<examples>
<customer> John smith</customer>
<example_table><table style="width: 100%" class="style8">
<tr><td>ABC</td><td>XYZ</td><td>123</td></tr>
<tr><td>ABC</td><td>XYZ</td><td>123</td></tr>
</table>
</example_table>
</examples>

I have been trying to get this to work for a couple of hours now... :(

I have an XSD, an XML file and an XSLT.

I open the XSD in Word,
format the XML tags etc. and create the seed doc,
save it as an XML file with "Save data only" unchecked,
run the seed doc through xml2xslt,
run the XML and XSLT through the transformer.

Everything works and the doc looks good, except for the couple of elements
that have preformatted HTML data. They are displayed as strings in the doc and
not rendered as tables.

Does anyone know what I might be missing, or have a pointer?

Thanks
Leigh
 
Cindy M.

Hi Leigh,

HTML is not "native" to Word's XML support: WordML knows nothing about HTML. If
you have HTML in your XML file as you show, then you have to put it in an element
defined in your XSD that can serve as a "heads-up, this is HTML" identifier. Your
XSLT then needs to take the content of such elements and transform the HTML into
the WordprocessingML equivalent.
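Because the HTML in the sample is well-formed, it reaches an XSLT (or any XML parser) as child elements of example_table, not as a string, so a template can match those elements directly. The Python sketch below illustrates that first step outside XSLT: it finds the elements flagged as carrying HTML and pulls out the table cells. The marker set HTML_ELEMENTS and the function name are hypothetical, and the sketch assumes the data has no namespace. If the HTML were stored escaped (&lt;table&gt;) or inside a CDATA section, it would arrive as text and would have to be parsed separately.

```python
import xml.etree.ElementTree as ET

# The sample from the question. Because the embedded HTML is well-formed,
# the parser treats <table>, <tr> and <td> as ordinary child elements.
SAMPLE = """<examples>
<customer> John smith</customer>
<example_table><table style="width: 100%" class="style8">
<tr><td>ABC</td><td>XYZ</td><td>123</td></tr>
<tr><td>ABC</td><td>XYZ</td><td>123</td></tr>
</table>
</example_table>
</examples>"""

# Hypothetical: the element names your schema uses as "this is HTML" markers.
HTML_ELEMENTS = {"example_table"}


def html_tables(xml_text: str) -> list:
    """Return (element name, rows of cell strings) for each flagged element."""
    root = ET.fromstring(xml_text)
    found = []
    for elem in root.iter():
        if elem.tag not in HTML_ELEMENTS:
            continue
        table = elem.find("table")
        if table is None:
            continue  # flagged element without an embedded <table>
        rows = [
            ["".join(td.itertext()).strip() for td in tr.findall("td")]
            for tr in table.findall("tr")
        ]
        found.append((elem.tag, rows))
    return found


print(html_tables(SAMPLE))
```

In XSLT the equivalent would be a template matching example_table/table that emits WordML instead of copying the markup through as text.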

Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister (last update Jun 17 2005)
http://www.word.mvps.org

This reply is posted in the Newsgroup; please post any follow-up question or reply
in the newsgroup and not by e-mail :)
 
LNowak

Thanks for the reply :)
Unfortunately, I didn't develop the schema (and I have hundreds of them), or I
would never have let HTML be embedded in an XML element... It kind of defeats the
whole point of decoupling data from formatting.

Anyway, any insight on the following:
- open a web page in IE
- highlight a table
- copy the table
- start MS Word
- paste the table into Word

The HTML table renders as a Word table with all its formatting etc.

Thanks
Leigh
 
Peter Jamieson

Copying a table from IE and pasting it into Word is different from including
HTML-format text in a WordML element (assuming that is what you are driving at!)

When you copy/paste from IE to Word, you are not copying HTML-format text
from one .htm file to another .htm/.xml/.doc file. You are copying
material from one application to another via the Windows clipboard.

So let's make the simplifying assumption that when you copy a table in
IE, IE places a copy of the original HTML in the clipboard.

The question is what happens when you then paste into Word. Well, Word can
understand HTML too, so what really happens is that Word reads the table
HTML and converts it so that it becomes part of Word's in-memory
document representation. However, when you save that Word document, the
table is written out as HTML if you save as a Web page, as RTF if you
save in .rtf format, in .doc format if you save in that format, and as
WordML if you save as WordML .xml or in Word 2007 .docx format.

For example, suppose I start with the following chunk of HTML, which represents
the first cell of a table where the cell contains the text "1.":

<table border="1" width="100%">
<tr>
<td width="4%" valign="top"><font size="2">1.</font></td>


If I copy that from IE7 and paste it into Word 2007 as HTML, then save
the document as a Web page and re-open it as a plain text file, I see:

<table class=MsoNormalTable border=1 cellpadding=0 width="100%"
style='width:100.0%;mso-cellspacing:1.5pt;border:outset #660000 1.0pt;
mso-border-alt:outset #660000 .75pt;mso-yfti-tbllook:1184'>
<tr style='mso-yfti-irow:0;mso-yfti-firstrow:yes'>
<td width="4%" valign=top style='width:4.0%;border:inset #660033 1.0pt;
mso-border-alt:inset #660033 .75pt;padding:.75pt .75pt .75pt .75pt'>
<p class=MsoNormal
style='margin-bottom:0cm;margin-bottom:.0001pt;line-height:
normal'><span style='font-size:10.0pt;font-family:"Book Antiqua","serif";
mso-fareast-font-family:"Times New Roman";mso-bidi-font-family:"Times
New Roman";
color:black'>1.</span><span style='font-size:12.0pt;font-family:"Book
Antiqua","serif";
mso-fareast-font-family:"Times New Roman";mso-bidi-font-family:"Times
New Roman";
color:black'><o:p></o:p></span></p>
</td>

Since this is an HTML file, the table is, not surprisingly, still rendered
using the <TABLE>, <TR> and <TD> elements; there's no other way to do it.
But the content is radically different from the original, because Word
needs to record loads of layout information that it has, in effect, added.

If I save as Word 2003 format WordML, the equivalent chunk is

<w:tbl><w:tblPr><w:tblW w:w="5000" w:type="pct"/><w:tblCellSpacing
w:w="15" w:type="dxa"/><w:tblBorders><w:top w:val="outset" w:sz="6"
wx:bdrwidth="15" w:space="0" w:color="660000"/><w:left w:val="outset"
w:sz="6" wx:bdrwidth="15" w:space="0" w:color="660000"/><w:bottom
w:val="outset" w:sz="6" wx:bdrwidth="15" w:space="0"
w:color="660000"/><w:right w:val="outset" w:sz="6" wx:bdrwidth="15"
w:space="0" w:color="660000"/></w:tblBorders><w:tblCellMar><w:top
w:w="15" w:type="dxa"/><w:left w:w="15" w:type="dxa"/><w:bottom w:w="15"
w:type="dxa"/><w:right w:w="15" w:type="dxa"/></w:tblCellMar><w:tblLook
w:val="04A0"/></w:tblPr><w:tblGrid><w:gridCol w:w="404"/><w:gridCol
w:w="1554"/><w:gridCol w:w="1554"/><w:gridCol w:w="2899"/><w:gridCol
w:w="2735"/></w:tblGrid><w:tr wsp:rsidR="00596248"
wsp:rsidRPr="00596248"><w:trPr><w:tblCellSpacing w:w="15"
w:type="dxa"/></w:trPr><w:tc><w:tcPr><w:tcW w:w="200"
w:type="pct"/><w:tcBorders><w:top w:val="outset" w:sz="6"
wx:bdrwidth="15" w:space="0" w:color="660033"/><w:left w:val="outset"
w:sz="6" wx:bdrwidth="15" w:space="0" w:color="660033"/><w:bottom
w:val="outset" w:sz="6" wx:bdrwidth="15" w:space="0"
w:color="660033"/><w:right w:val="outset" w:sz="6" wx:bdrwidth="15"
w:space="0" w:color="660033"/></w:tcBorders></w:tcPr><w:p
wsp:rsidR="00596248" wsp:rsidRPr="00596248" wsp:rsidRDefault="00596248"
wsp:rsidP="00596248"><w:pPr><w:spacing w:after="0" w:line="240"
w:line-rule="auto"/><w:rPr><w:rFonts w:ascii="Book Antiqua"
w:fareast="Times New Roman" w:h-ansi="Book Antiqua"/><wx:font
wx:val="Book Antiqua"/><w:color w:val="000000"/><w:sz
w:val="24"/><w:sz-cs w:val="24"/></w:rPr></w:pPr><w:r
wsp:rsidRPr="00596248"><w:rPr><w:rFonts w:ascii="Book Antiqua"
w:fareast="Times New Roman" w:h-ansi="Book Antiqua"/><wx:font
wx:val="Book Antiqua"/><w:color w:val="000000"/><w:sz
w:val="20"/><w:sz-cs w:val="20"/></w:rPr><w:t>1.</w:t></w:r></w:p></w:tc>

No sign of any HTML there; it's all WordML. However, if you opened that
.xml document and saved it as .html, you'd probably get roughly the same
thing as the previous .htm chunk.
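The HTML chunk and the WordML chunk above line up once you know the WordML units: pct widths are in fiftieths of a percent, dxa lengths in twentieths of a point, border w:sz in eighths of a point, and font w:sz in half-points. A small Python sketch of that arithmetic, checked against values from the two chunks (w:tblW w:w="5000" is the 100% width, w:tcW w:w="200" the 4% cell, w:tblCellMar 15 dxa the .75pt padding, border w:sz="6" the .75pt border, and w:sz w:val="20" the 10pt text). The helper names are made up for illustration.

```python
# Unit conventions in Word 2003 XML (WordML) measurements:
#   w:type="pct" widths   -> fiftieths of a percent (5000 = 100%)
#   w:type="dxa" lengths  -> twentieths of a point ("twips")
#   border w:sz           -> eighths of a point
#   font w:sz / w:sz-cs   -> half-points


def pct_to_percent(value: int) -> float:
    return value / 50


def dxa_to_points(value: int) -> float:
    return value / 20


def border_sz_to_points(value: int) -> float:
    return value / 8


def font_sz_to_points(value: int) -> float:
    return value / 2


# WordML values from the chunk above, next to what the HTML version said.
print(pct_to_percent(5000), '% (table width="100%")')
print(pct_to_percent(200), '% (cell width="4%")')
print(dxa_to_points(15), "pt (cell padding .75pt)")
print(border_sz_to_points(6), "pt (mso-border-alt .75pt)")
print(font_sz_to_points(20), "pt (font-size:10.0pt)")
```

Knowing these units is most of what you need when writing the HTML-to-WordML conversion by hand, since every width and size has to be rescaled.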


Peter Jamieson

http://tips.pjmsn.me.uk
 
LNowak

Peter Jamieson said:
The question is what happens when you then paste into Word. Well, Word can
understand HTML too, so what really happens is that Word reads the table
HTML and converts it so that it becomes part of Word's in-memory
document representation.

Exactly. I need to insert the contents of the XML element into "Word's
in-memory document representation" and have Word convert the HTML.

Any ideas?

I am thinking of a macro... but I was really hoping that Word would expose some
kind of component (a text box?) that would do just that.

Many thanks for your help :)
Leigh
 
Peter Jamieson

OK, I see roughly where you're coming from now. Not sure I can help
much, but...

Even if you can select the HTML, then open it in Word (e.g. save it to
disk then open it using the Word object model), it's still not easy to
extract the XML that specifically pertains to the table (because you get
loads of header stuff too).
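If you do go through Word (save the HTML, open it, save as WordML), the tables can at least be cut out of the result mechanically. A minimal Python sketch, assuming the file is Word 2003 WordML; the function name is hypothetical, and namespaces other than w: (such as wx: or wsp:) will be given generated prefixes like ns0 by ElementTree unless you register them.

```python
import copy
import xml.etree.ElementTree as ET

# Word 2003 XML (WordML) main namespace, serialized with the usual w: prefix.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)


def extract_tables(wordml: str) -> list:
    """Return every w:tbl in a WordML document, serialized on its own.

    Everything else (w:fonts, w:styles, w:docPr, document properties, ...)
    is dropped. A nested table is returned inside its parent and also
    separately.
    """
    root = ET.fromstring(wordml)
    tables = []
    for tbl in root.iter(f"{{{W_NS}}}tbl"):
        piece = copy.copy(tbl)
        piece.tail = None  # don't carry along text that follows </w:tbl>
        tables.append(ET.tostring(piece, encoding="unicode"))
    return tables


doc = (
    f'<w:wordDocument xmlns:w="{W_NS}"><w:styles/><w:body><w:p/>'
    "<w:tbl><w:tr><w:tc><w:p><w:r><w:t>1.</w:t></w:r></w:p></w:tc></w:tr></w:tbl>"
    "</w:body></w:wordDocument>"
)
print(extract_tables(doc))
```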

Part of the problem is that HTML (and here I'd guess you are really
dealing with XHTML, or it probably wouldn't be well-formed XML) has a
bunch of defaults and implicit formatting (e.g. the borders your table cells
get from a border attribute or a CSS class) that you would have to specify
explicitly in WordML.

Last time I had to deal with this kind of problem I sat down and
experimented with WordML until I'd worked out exactly what the WordML
equivalents of those defaults would be. For example in this case I'd
start with the assumption that I could replace

<table> by <w:tbl>
<tr> by <w:tr>
<td> by <w:tc>

(and the equivalent closing tags)

but then you probably need to wrap the text in each cell like this:

<w:p> (at least one paragraph per cell)

<w:r><w:t> and </w:t></w:r> around the text itself

</w:p>

That at least gives the basic structure in simple cases, but for borders
etc. you just have to work your way through the schema and/or some examples.
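The mapping above can be sketched in a few lines of Python with ElementTree (an XSLT would do the same job with one template per HTML element). This is a minimal sketch, not a complete converter: the function names are made up, the border values are assumptions, only direct tr/td children are handled (no tbody, th, colspan or CSS), and the output has not been checked against the WordML schema; in particular it emits no w:tblGrid.

```python
import xml.etree.ElementTree as ET

# Word 2003 XML (WordML) main namespace, serialized with the usual w: prefix.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)


def w(tag: str) -> str:
    """Return the ElementTree name for w:<tag>."""
    return f"{{{W_NS}}}{tag}"


def html_table_to_wordml(table: ET.Element, borders: bool = True) -> ET.Element:
    """Map <table>/<tr>/<td> to <w:tbl>/<w:tr>/<w:tc> with a w:p/w:r/w:t per cell."""
    tbl = ET.Element(w("tbl"))
    if borders:
        # Borders that HTML/CSS supplied implicitly have to be spelled out here.
        # Assumed values: single lines, w:sz="4" (eighths of a point), auto colour.
        tbl_borders = ET.SubElement(ET.SubElement(tbl, w("tblPr")), w("tblBorders"))
        for side in ("top", "left", "bottom", "right", "insideH", "insideV"):
            ET.SubElement(tbl_borders, w(side), {
                w("val"): "single", w("sz"): "4", w("space"): "0", w("color"): "auto",
            })
    for tr in table.findall("tr"):
        w_tr = ET.SubElement(tbl, w("tr"))
        for td in tr.findall("td"):
            tc = ET.SubElement(w_tr, w("tc"))
            p = ET.SubElement(tc, w("p"))  # each cell needs at least one paragraph
            r = ET.SubElement(p, w("r"))   # the run wraps the text...
            t = ET.SubElement(r, w("t"))   # ...which lives in w:t
            t.text = "".join(td.itertext()).strip()
    return tbl


source = ET.fromstring("<table><tr><td>ABC</td><td>XYZ</td><td>123</td></tr></table>")
print(ET.tostring(html_table_to_wordml(source, borders=False), encoding="unicode"))
```

Keeping the borders optional reflects the point about defaults: whatever the HTML side got for free has to be decided and written out explicitly on the WordML side.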


 
Cindy M.

Hi Leigh,
The only way to trigger Word's HTML converter is when pasting from the
Clipboard, opening a file, or inserting from a file. Beyond that, you have to
"roll your own".
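If a macro goes the clipboard route, note that what IE (and other programs) put on the Windows clipboard under the registered "HTML Format" name is not bare HTML: it begins with a short text header giving the byte offsets of the HTML and of the copied fragment, and the fragment sits between <!--StartFragment--> and <!--EndFragment--> comments. The Python sketch below only builds those bytes. Actually placing them on the clipboard (through the Win32 clipboard API) and pasting are left out, the function name is hypothetical, and the 10-digit zero-padding of the offsets is a convenience choice.

```python
def build_cf_html(fragment: str) -> bytes:
    """Wrap an HTML fragment in the Windows "HTML Format" (CF_HTML) layout."""
    header_tmpl = (
        "Version:0.9\r\n"
        "StartHTML:{:010d}\r\n"
        "EndHTML:{:010d}\r\n"
        "StartFragment:{:010d}\r\n"
        "EndFragment:{:010d}\r\n"
    )
    prefix = "<html><body>\r\n<!--StartFragment-->".encode("utf-8")
    suffix = "<!--EndFragment-->\r\n</body></html>".encode("utf-8")
    body = fragment.encode("utf-8")
    # Offsets count bytes. Zero-padding keeps the header length fixed,
    # so it can be measured before the real numbers are filled in.
    start_html = len(header_tmpl.format(0, 0, 0, 0).encode("ascii"))
    start_frag = start_html + len(prefix)
    end_frag = start_frag + len(body)
    end_html = end_frag + len(suffix)
    header = header_tmpl.format(start_html, end_html, start_frag, end_frag)
    return header.encode("ascii") + prefix + body + suffix


data = build_cf_html("<table><tr><td>ABC</td><td>XYZ</td><td>123</td></tr></table>")
print(data.decode("utf-8"))
```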

If you could, for example, put the HTML into a text file, save that to
disk, and then use Insert/File, you should get Word's interpretation of that HTML.
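A sketch of the first half of that suggestion, assuming Python is available to the process that produces the data: wrap the fragment in a minimal HTML page and save it with an .htm extension. The Word half would then be Selection.InsertFile (the object-model counterpart of Insert/File) in a macro, which is not shown here; the function name is hypothetical.

```python
import os
import tempfile


def write_html_for_insert(fragment: str) -> str:
    """Wrap an HTML fragment in a minimal page, save it as .htm, return the path."""
    page = (
        "<html><head>"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        "</head><body>\n" + fragment + "\n</body></html>\n"
    )
    fd, path = tempfile.mkstemp(suffix=".htm")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(page)
    return path  # the caller deletes the file once Word has inserted it


path = write_html_for_insert(
    '<table style="width: 100%"><tr><td>ABC</td><td>XYZ</td><td>123</td></tr></table>'
)
print(path)
# ... Word would insert the file at this point, e.g. via Selection.InsertFile ...
os.remove(path)
```

Declaring the charset explicitly avoids relying on Word guessing the encoding of the temporary file.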

 
