Trying to Retrieve RTF From Word Document

  • Thread starter AliR (VC++ MVP)

AliR (VC++ MVP)

Hi Everyone,

I am not sure if this is the correct newsgroup for this question.
What I am trying to do is open a Word document and read some text out of it.
So far I have had a great start. I am using the Office object model, and I am
able to retrieve a lot of the information that I need except for one thing:
the text with RTF formatting information! I am able to enumerate the
paragraphs and ask each one for its style and its text, but the only
text I have been able to get out of it has been plain text without any
RTF formatting. Does anyone know how to get the RTF data? I even tried
CParagraph::get_FormattedText, but that returns plain text also. I wonder
if there is a way to tell it that I want CF_RTF.

Here is the quick and dirty test code so far.

CDocuments, CDocument0, CApplication, CParagraphs, CParagraph, and CRange
are wrapper classes for the interfaces created by VStudio. (Please overlook
the bad error handling, as this is just test code.)

CDocument0 Doc;
COleVariant True((short)TRUE),
            False((short)FALSE),
            // Stands in for every optional argument that is left out
            Long((long)DISP_E_PARAMNOTFOUND, VT_ERROR);
try
{
    CFileDialog Dlg(TRUE, "*.doc");
    if (Dlg.DoModal() == IDOK)
    {
        CApplication App;
        if (!App.CreateDispatch("Word.Application"))
        {
            AfxMessageBox("Could Not Create The Application Object");
            return;
        }

        App.put_Visible(FALSE);

        CDocuments Documents(App.get_Documents());

        // Attach first: Doc wraps no interface until a document is open
        Doc.AttachDispatch(Documents.Open(COleVariant(Dlg.GetPathName()),
            Long, Long, Long, Long, Long, Long, Long, Long,
            Long, Long, Long, Long, Long, Long, Long));

        Doc.put_UpdateStylesOnOpen(TRUE);
        Doc.UpdateStyles();

        CStyles Styles;
        Styles.AttachDispatch(Doc.get_Styles());

        CParagraphs Paragraphs;
        Paragraphs.AttachDispatch(Doc.get_Paragraphs());
        LPDISPATCH lpDisp = Paragraphs.get_First();
        while (lpDisp != NULL)
        {
            CParagraph Paragraph;
            Paragraph.AttachDispatch(lpDisp);

            CRange wdRange;
            wdRange = Paragraph.get_Range();

            // here is where I get the text, but it doesn't have the RTF stuff
            CString Text = wdRange.get_Text();
            Text.Trim();
            if (!Text.IsEmpty())
            {
                // Keep the style VARIANT in a variable instead of taking
                // the address of a temporary
                VARIANT vStyle = Paragraph.get_Style();
                CStyle Style;
                Style.AttachDispatch(Styles.Item(&vStyle));
                ::VariantClear(&vStyle);
                Text += CString(" - ") + Style.get_NameLocal();

                MessageBox(Text);
            }

            lpDisp = Paragraph.Next(Long);
        }
    }
}
catch (CException *e)
{
    char Error[255];
    e->GetErrorMessage(Error, 254);
    MessageBox(Error, "Exception Thrown");
    e->Delete();
}
// Close only if a document was actually opened
if (Doc.m_lpDispatch != NULL)
    Doc.Close(False, Long, Long);


AliR.
 
Cindy M.

Hi AliR,
> I am not sure if this is the correct newsgroup for this question.
> What I am trying to do is open a Word document and read some text out of it.
> So far I have had a great start. I am using the Office object model, and I am
> able to retrieve a lot of the information that I need except for one thing:
> the text with RTF formatting information! I am able to enumerate the
> paragraphs and ask each one for its style and its text, but the only
> text I have been able to get out of it has been plain text without any
> RTF formatting. Does anyone know how to get the RTF data? I even tried
> CParagraph::get_FormattedText, but that returns plain text also. I wonder
> if there is a way to tell it that I want CF_RTF.

No, there is nothing in Word's object model that will give you the RTF for the
text in the document. Word's native format is NOT RTF, it has to use a
converter for that.

If you copy the text onto the Clipboard you should be able to extract the RTF
from that.

And if you're automating Word 2003 or 2007 you can pick up the XML directly
from a range then transform/parse that into RTF if you really need RTF.

Beyond that, you'd have to inspect every ParagraphFormat and character
formatting property and "translate" it to RTF in your code.
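
As a rough illustration of that last option, here is a minimal sketch of translating formatting into RTF by hand. The Run struct and the EscapeRtf and RunsToRtf helpers are hypothetical names, not part of Word's object model; the sketch assumes you have already read the text and the bold/italic flags out of each range yourself, and that the text is plain 7-bit ASCII.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the character formatting of one
// stretch of text, as read from Word by your own code.
struct Run {
    std::string text;
    bool bold;
    bool italic;
};

// Escape the three characters that have special meaning in RTF.
std::string EscapeRtf(const std::string& text)
{
    std::string out;
    for (char c : text) {
        if (c == '\\' || c == '{' || c == '}')
            out += '\\';
        out += c;
    }
    return out;
}

// Wrap each run in its own group so its formatting ends with the group.
std::string RunsToRtf(const std::vector<Run>& runs)
{
    std::string rtf = "{\\rtf1\\ansi ";
    for (const Run& run : runs) {
        rtf += '{';
        if (run.bold)   rtf += "\\b ";
        if (run.italic) rtf += "\\i ";
        rtf += EscapeRtf(run.text);
        rtf += '}';
    }
    rtf += '}';
    return rtf;
}
```

A real translation would also need paragraph breaks (\par), fonts, sizes, and non-ASCII text, which is why this route is the most work of the options listed.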

Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister (last update Jun 17 2005)
http://www.word.mvps.org

This reply is posted in the Newsgroup; please post any follow-up question or
reply in the newsgroup and not by e-mail :)
 
spm

As Cindy said, RTF is not the native format used by Word, but if you
have a need to get text out in RTF, it can be done, using the
IDataObject interface. Basically, this is what you do to get a Word
document in RTF:

- Get the IDataObject interface from the Document object (that's the
Document object from the Word Object Model). It may be that you can get
one from a Range object, too, but I haven't tried that. If you can,
this will enable you to get the RTF of a range of text instead.

- Get an id for the RTF clipboard format, by calling
RegisterClipboardFormat(CF_RTF). (CF_RTF is the "Rich Text Format" name
string defined in richedit.h.)

- Set up a FORMATETC structure, thus:

cfFormat = the RTF format id
ptd = null
dwAspect = DVASPECT_CONTENT
lindex = -1
tymed = TYMED_HGLOBAL

- Call your IDataObject's GetData method, passing a pointer to your
FORMATETC as the first parameter and a pointer to a STGMEDIUM structure
as the second. The call fills in the STGMEDIUM.

- Lock the hGlobal field of the STGMEDIUM with GlobalLock, and you have a
pointer to a string containing all the RTF.

When finished, unlock the STGMEDIUM's hGlobal (and free it if the
pUnkForRelease is null, else use the pUnkForRelease interface to do
so; ReleaseStgMedium handles both cases), and finally release the
IDataObject interface you obtained up front.

The advantage of all this is that you avoid overwriting the clipboard
(something that can really upset a user).
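
The steps above can be sketched roughly as follows. This is an untested outline, not a definitive implementation: GetDocumentRtf, RtfFromBuffer, and LooksLikeRtf are hypothetical helper names, pDoc is assumed to be the Document's IDispatch (for example Doc.m_lpDispatch from the MFC wrapper), COM is assumed to be initialized already, and it relies on the Document object really answering QueryInterface for IDataObject as described. The RTF is assumed to arrive as a NUL-terminated 8-bit string.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Copy at most `size` bytes from a locked buffer, stopping at the first NUL.
std::string RtfFromBuffer(const char* data, std::size_t size)
{
    if (data == nullptr) return std::string();
    const void* nul = std::memchr(data, '\0', size);
    std::size_t len = nul ? static_cast<const char*>(nul) - data : size;
    return std::string(data, len);
}

// Cheap sanity check: an RTF stream starts with "{\rtf".
bool LooksLikeRtf(const std::string& s)
{
    return s.compare(0, 5, "{\\rtf") == 0;
}

#ifdef _WIN32
#include <windows.h>
#include <ole2.h>

// Hypothetical helper: asks the Word Document for its content as RTF.
bool GetDocumentRtf(IDispatch* pDoc, std::string& rtf)
{
    if (pDoc == nullptr) return false;

    IDataObject* pData = nullptr;
    if (FAILED(pDoc->QueryInterface(IID_IDataObject,
                                    reinterpret_cast<void**>(&pData))))
        return false;

    FORMATETC fmt = {};
    fmt.cfFormat = static_cast<CLIPFORMAT>(
        ::RegisterClipboardFormatA("Rich Text Format"));
    fmt.ptd      = nullptr;
    fmt.dwAspect = DVASPECT_CONTENT;
    fmt.lindex   = -1;
    fmt.tymed    = TYMED_HGLOBAL;

    STGMEDIUM stg = {};
    bool ok = false;
    if (SUCCEEDED(pData->GetData(&fmt, &stg)))
    {
        if (stg.tymed == TYMED_HGLOBAL && stg.hGlobal != nullptr)
        {
            const char* p =
                static_cast<const char*>(::GlobalLock(stg.hGlobal));
            if (p != nullptr)
            {
                rtf = RtfFromBuffer(p, ::GlobalSize(stg.hGlobal));
                ::GlobalUnlock(stg.hGlobal);
                ok = LooksLikeRtf(rtf);
            }
        }
        // Frees hGlobal or calls pUnkForRelease, whichever applies
        ::ReleaseStgMedium(&stg);
    }
    pData->Release();
    return ok;
}
#endif
```

With the test code from the first post, a call might look like GetDocumentRtf(Doc.m_lpDispatch, rtf) once the document has been opened.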
 
