Iterate embedded (attached) objects in Word

  • Thread starter roger.searjeant

roger.searjeant

I am writing a C# program and using the official MS PIAs to access Word
documents (Office version 10, i.e. Office XP) via interop.

Everything works fine except that I cannot figure out how to get at any
OLE objects embedded in a Word document (e.g. another document, a
picture, Excel file etc.): the Word.Document API doesn't appear to have
any method or property for this. I have tried using Reflector to
search for the appropriate call, and I have Googled these newsgroups,
but found nothing.

Can anyone help me?

Roger Searjeant.
 
khcharles

I wish I knew how to do it without opening windows and calling DoVerb, which
is how I'm doing it right now. I know that you can get the object descriptor
and the embedded object itself, which is encapsulated as an embedded object of
some class type (outlook.fileattach in my case). I just don't know how to
extract the file from that embedded object. For one thing, I don't even know
what interface that object implements.

Someone has got to know!
 
roger.searjeant

OK, I think I have a solution! I am using C# and the PIAs (Primary
Interop Assemblies) from MS, which provide access to the standard Word
and Office COM interfaces. It takes some digging to get what you need
(there really isn't much documentation), but I believe I have done it.

I will post a zip containing a working example solution, if you (or
others reading this) are interested.

Note that there are subtle (but important) differences between Office
10 (XP) and Office 11 (2003): the assignment I have been working on
required Office 10 (they are a bit behind the times where I work). I
might try to include code for both versions (compiled conditionally).

Do you know if I can post a zip here?
Roger.
 
khcharles

I don't think so. My email is my display name @hotmail.com. Please email it
to me if you don't post it here.

Nice work
 
Cindy M -WordMVP-

Hi Roger.searjeant,
> Everything works fine except that I cannot figure out how to get at any
> OLE objects embedded in a Word document (e.g. another document, a
> picture, Excel file etc.):

The simplest way, when using automation, is to create a LINK field for
the object, then break the LINK if you don't want an active link.

Other than that, the object model provides the InsertObject method.

Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister (last update Jun 8 2004)
http://www.word.mvps.org

This reply is posted in the Newsgroup; please post any follow question or
reply in the newsgroup and not by e-mail :)
 
khcharles

I think that Roger is trying to extract embedded Outlook attachments, not
insert them.
 
Cindy M -WordMVP-

Hi khcharles,
> I think that Roger is trying to extract embedded Outlook attachments, not
> insert them.

Certainly not Outlook attachments (at least, not according to his original
post), but OLE objects. But yes, I missed the word "at" in "get at any OLE
objects" when I originally read the post.

Cindy Meister
 
khcharles

Extracting embedded Outlook attachments is what I'm trying to do. For
anything else, I think you can use InlineShape.OLEFormat.Object.SaveAs.
 
roger.searjeant

Hello again folks. Here's an update.

First of all, khcharles is correct about what I am trying to do: I have
a Word document containing embedded objects, potentially of mixed type
and origin. I need to extract these and save them out to a folder.

Next, I have a skeleton VS.Net solution which I will share with anyone
who wants it (khcharles, I will send you a copy).

It seems to be possible to access embedded OLE objects in two quite
separate ways, using the Word automation API. First by iterating the
Word document's InlineShapes, and testing for
wdInlineShapeEmbeddedOLEObject, and second by iterating the document's
Fields and testing for wdFieldEmbed.

In both cases, the OLEFormat property of the associated Word object
(i.e. InlineShape or Field object, respectively) yields a
Word.OLEFormat object. This object contains an 'Object' property which
appears to be the way you are supposed to access the embedded OLE
object.
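The two routes described above could be sketched roughly as follows. This is an untested illustration, not the code from the skeleton solution mentioned in this post. It assumes the Word PIA is referenced and aliased as `Word`; the class and method names (`EmbeddedObjectLister`, `ListEmbeddedObjects`) are hypothetical. The two routes may report the same embedded object twice, and floating shapes are not covered.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word; // assumed alias for the Word PIA namespace

static class EmbeddedObjectLister // hypothetical helper, for illustration only
{
    // Prints the ProgID of each embedded OLE object found by either route.
    public static void ListEmbeddedObjects(Word.Document doc)
    {
        // Route 1: inline shapes whose type marks them as embedded OLE objects.
        foreach (Word.InlineShape shape in doc.InlineShapes)
        {
            if (shape.Type == Word.WdInlineShapeType.wdInlineShapeEmbeddedOLEObject)
            {
                Word.OLEFormat ole = shape.OLEFormat;
                Console.WriteLine("InlineShape: " + ole.ProgID);
            }
        }

        // Route 2: fields of type EMBED.
        foreach (Word.Field field in doc.Fields)
        {
            if (field.Type == Word.WdFieldType.wdFieldEmbed)
            {
                Word.OLEFormat ole = field.OLEFormat;
                Console.WriteLine("Field: " + ole.ProgID);
            }
        }
    }
}
```

Only ProgID is read here; OLEFormat.Object is deliberately left alone, since reading it is the step that fails for some object types, as described below.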

This works fine (for me) when the embedded object is another Word
document, but fails when it is (e.g.) an Excel worksheet. I receive an
InvalidCastException even if I simply extract the Object value into
another (CLR) Object variable, thus:

// Calling context defines: Word.OLEFormat oOLE
Object theEmbeddedObject = oOLE.Object; // throws InvalidCastException

This does seem odd: you might expect the InvalidCastException if you
tried to cast from System.Object to something more specific, but this
assignment is essentially redundant, i.e. there is no re-interpretation
here.

The OLEFormat object appears to be a wrapper around a
System.__ComObject (i.e. a runtime-callable-wrapper for a COM object).
In the VS.NET debugger you can see that Word correctly handles the COM
object for an embedded Word document, and assigns a Word.DocumentClass
object to the OLEFormat.Object property. When the embedded Excel sheet
is encountered, however, you see that the Object property value is not
assigned - it raises the exception.

The other way to handle this, it seems to me, is to use the OLE ProgID
string to start a COM server of the correct type, then ask that server
to handle the object. All I want to do is call the 'SaveAs' OLE verb
on the object. I haven't located the list of Excel's OLE verbs (can
anyone help with that?) so I don't know the correct syntax to use. But
all this seems really silly when I should be able to pull the OLE
object straight out of the enclosing document!

Cindy, it's good to see a Word MVP working on this issue! Can you help
to resolve this?

Many thanks to all,
Roger Searjeant.
 
roger.searjeant

This is a test posting - I wrote quite a detailed follow-up message,
which seemed to post OK but hasn't appeared. This is just to see
whether posting works OK (from Google Groups).
 
roger.searjeant

Another short follow-up to let you all know what I have discovered.

Cindy - I found another of your postings concerning Word automation
which did help with this issue. It turns out that when trying to deal
with the Excel embedding, the following sequence of calls resolves the
InvalidCastException:

// Word.OLEFormat oOLE is defined in this context:
oOLE.ActivateAs(oOLE.ProgID); // Use ProgID to select correct COM server
oOLE.Activate();              // Activate the server.

The above lines work, in that the Excel application starts, appears on
the screen and successfully loads the embedded Worksheet. Extracting
the oOLE.Object also now works without raising an Exception; however,
this doesn't really help because the extracted object is still only an
untyped RCW as far as I can tell.

I now need to find a way to call 'Save As' on the Excel sheet. With an
OLE object, we should be able to drive it via the IDispatch interface
if we know the OLE verbs. I just need to do SaveAs, but I don't know
the verb for that and (more fundamentally) it seems that the RCW
doesn't provide access to a dispatch interface, which seems very odd.
I tried the following code, which uses reflection to late-bind to a
function name. Without an IDispatch-based object to play with, this
isn't going to work:

// The following block is also no good. The type object (t) refers to
// a System.__ComObject and not to the Excel application object.
// InvokeMember only works on IDispatch interfaces: this isn't a
// dispatch-based object.
// Type t = Type.GetTypeFromProgID(sProgId);
// MethodInfo[] methods = t.GetMethods();
// object o = Activator.CreateInstance(t);
// t.InvokeMember("SaveAs", BindingFlags.InvokeMethod, null, o, null);
// Don't know member name, though.

SO: to summarise, I am stuck at this stage: I can extract the Excel
embedding, and start/load Excel but I cannot:
(1) extract a strongly-typed Excel object from the OLE object (in the
same way I can with Word)
(2) Invoke 'SaveAs' in Excel via the OLE verb.

Cindy - any further thoughts on this? Anyone else?
Cheers,
Roger.
 
Cindy M -WordMVP-

Hi Roger.searjeant,
> SO: to summarise, I am stuck at this stage: I can extract the Excel
> embedding, and start/load Excel but I cannot:
> (1) extract a strongly-typed Excel object from the OLE object (in the
> same way I can with Word)
> (2) Invoke 'SaveAs' in Excel via the OLE verb.

Sorry, I was away in "ferrin parts" for some three weeks, and have been
playing catch-up. Are you still looking for help with this?

Cindy Meister
 
roger.searjeant

Hi Cindy!

Yes, I am certainly still looking for help. I haven't (yet) tried
talking to our MSFT TAM. Maybe I should also do that? It's looking
more and more like a bug.

I can zip and send you a C# project if you like, but I should warn you
that we use Office 10 ('XP') and not Office 2003, so you'd need to have
that (and the PIAs) installed.

I'd be grateful for any suggestions or pointers.

Thanks,
Roger Searjeant.
 
Cindy M -WordMVP-

Hi Roger.searjeant,

I've been unable to concentrate on this again for quite a while, I know.
My apologies. During that time, however, I did work with automating Excel
via OLE, and recall the following, which may be pertinent:

Excel does not provide SaveAs if you open it in an OLE-bound control
(such as in a Word document). You must open it in an Excel window in
order to use SaveAs.

So rather than use an "Activate" verb, it would need to be "Open".
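
One possible reading of this suggestion, as an untested sketch: issue the "Open" verb through OLEFormat.DoVerb, then try a late-bound SaveAs on whatever OLEFormat.Object returns. The class and method names (`EmbeddedObjectSaver`, `OpenAndSaveAs`) and the `targetPath` parameter are hypothetical. It is an assumption that the returned object supports IDispatch and exposes a SaveAs member taking a file name; the thread does not confirm that this works for Excel.

```csharp
using System;
using System.Reflection;
using Word = Microsoft.Office.Interop.Word; // assumed alias for the Word PIA namespace

static class EmbeddedObjectSaver // hypothetical helper, for illustration only
{
    public static void OpenAndSaveAs(Word.OLEFormat ole, string targetPath)
    {
        // Ask the server to open the object in its own window ("Open"),
        // rather than activating it in place inside the Word document.
        object verb = (int)Word.WdOLEVerb.wdOLEVerbOpen;
        ole.DoVerb(ref verb);

        // Assumption: once opened, Object yields a dispatch-capable wrapper.
        object embedded = ole.Object;

        // Late-bound call, so no server-specific PIA is needed here.
        // Throws if the object has no SaveAs member or rejects the argument.
        embedded.GetType().InvokeMember(
            "SaveAs",
            BindingFlags.InvokeMethod,
            null,
            embedded,
            new object[] { targetPath });
    }
}
```

Late binding is used so that the same call can be tried against objects from different servers; a failure surfaces as an exception from InvokeMember.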

Cindy Meister
 
