How to get HTML content from Word (directly)

Zhiguo

Hi, my friends:

I can get the content of a Word document *in plain text* with the
following code:
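///////////////////////////////////////////////////////////////////////////
WORD::_DocumentPtr spDoc = m_spApp->ActiveDocument;
WORD::RangePtr spRange1 = NULL;
spDoc->get_Content(&spRange1);   // range covering the main document body
BSTR bstr1 = NULL;
spRange1->get_Text(&bstr1);      // plain text only, formatting is dropped
// ... use bstr1 ...
::SysFreeString(bstr1);          // the caller owns the returned BSTR

///////////////////////////////////////////////////////////////////////////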
But now I want to get the content in HTML format.
I know I can save the document in HTML format and read the HTML
back from the saved file.
Is there a way to do it directly, without the "Save As" trick?

Thanks! Any help will be greatly appreciated!
 
Jay Freedman

No, there is no way to do it directly. The HTML tags don't exist in the
document that's in memory; they are inserted by the HTML conversion filter
that's selected in the Save As dialog, which interprets the style and direct
formatting information in the document, and the tags exist only in the
character stream sent to the disk file.

--
Regards,
Jay Freedman
Microsoft Word MVP
Email cannot be acknowledged; please post all follow-ups to the newsgroup so
all may benefit.
 
Zhiguo

But the HTML that Word produces with Save As is very... complicated.
Is there a way to convert a .doc file into clean HTML files?
Thanks!
 
Jay Freedman

There are two HTML entries in the Save As Type dropdown. If you choose "Web
Page, Filtered" instead of "Web Page" you will get considerably less
complicated markup. The point of the unfiltered "Web Page" output is to
enable rendering of everything the native file format is capable of
expressing; the filtered output is closer to clean HTML but not completely
clean.

If you want very simple HTML, you can write a macro (or maybe find one on
the Web, although I don't have an example handy) to find each occurrence of
text with the types of formatting you're interested in, and surround it with
the appropriate tags; the result can then be saved as plain text.
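
As a rough illustration of the tag-wrapping idea, here is a minimal C++ sketch of the step that turns formatted text into simple HTML. It makes no Word calls. The Run struct and the EscapeHtml and RunsToHtml functions are hypothetical names made up for this example, and it assumes you have already walked the document and collected each stretch of uniformly formatted text together with its bold and italic state.

```cpp
#include <string>
#include <vector>

// Hypothetical record for one stretch of uniformly formatted text.
// Filling these from the Word document is assumed to happen elsewhere.
struct Run {
    std::string text;
    bool bold;
    bool italic;
};

// Escape the characters that are special in HTML text content.
std::string EscapeHtml(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:  out += c;       break;
        }
    }
    return out;
}

// Surround each run with <b>/<i> tags according to its flags.
// Tags are closed in the reverse order they were opened.
std::string RunsToHtml(const std::vector<Run>& runs) {
    std::string html;
    for (const Run& r : runs) {
        if (r.bold)   html += "<b>";
        if (r.italic) html += "<i>";
        html += EscapeHtml(r.text);
        if (r.italic) html += "</i>";
        if (r.bold)   html += "</b>";
    }
    return html;
}
```

The returned string can then be written out as a plain text file. Paragraphs, headings, lists and tables are not handled here; each would need its own rule in the same spirit.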

An alternative would be to dump the plain text into a real HTML editor and
tag it any way you like.
 