retrieve last 10 authors from Word file

AC

Is this a magical question? I know there are tools that will give you this
but I want to automate it in some code. Posted a similar question
yesterday. Maybe it was too vague.

I am looking at maybe parsing it at a binary level to get the info out. It
doesn't look like a standard format. Will post it here if I don't get a
response.

Thanks in advance,
--AC
 
Tony Jollans

I'm not an expert but I don't believe Word holds that information. If you
have track changes on, changes can be recorded by author and one might be
able to extract all the authors who have made changes but I think that's the
limit of it.

Parsing the file at a binary level requires knowledge of the OLE structured
storage format that Word uses and, except for file properties, Microsoft do
not document this so, at best, it would require guesswork.
 
Jezebel

It's actually even harder than that. A Word document file doesn't have an
OLE structured format. It's polymorphic and, by all accounts deliberately
obfuscated, as a security measure.

Document properties will tell the original and latest authors. That's it as
far as I know.
 
Jonathan West

Jezebel said:
It's actually even harder than that. A Word document file doesn't have an
OLE structured format. It's polymorphic and, by all accounts deliberately
obfuscated, as a security measure.

It is an OLE structured storage document, but an exceedingly complex one. I
don't believe there is any obfuscation for security purposes; it is just a
difficult file format. That will be changing in Office 12, where the native
file format will be XML-based and the specification will be fully published.

Jezebel said:
Document properties will tell the original and latest authors. That's it
as far as I know.

If you open a Word document using the "Recover Text from Any File"
option, the last 10 authors are listed towards the end. But I don't know how
to reliably select just the right paragraphs in order to machine-read the
author list by this technique.
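
One rough way to approximate this in code is to scan the raw bytes for runs
of text, much as the "Recover Text from Any File" converter appears to do.
The sketch below is only that. It assumes the author names and paths are
stored as UTF-16LE text (an assumption about the file, not something
confirmed in this thread), and the names utf16_runs, scan_file and min_chars
are made up for illustration. It does not solve the problem of deciding which
runs belong to the author list.

```python
import re


def utf16_runs(data, min_chars=4):
    """Return runs of printable ASCII characters stored as UTF-16LE.

    Heuristic only: each character is expected to be one printable ASCII
    byte followed by a zero byte. Non-ASCII names will be missed.
    """
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_chars)
    return [m.group().decode("utf-16-le") for m in pattern.finditer(data)]


def scan_file(path, min_chars=4):
    """Hypothetical helper: read a file and list its UTF-16LE text runs."""
    with open(path, "rb") as handle:
        return utf16_runs(handle.read(), min_chars)
```

Calling scan_file on a .doc file returns every text run it finds, in file
order, so the author names and paths would still have to be picked out of
that list by eye or by some further rule.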

--
Regards
Jonathan West - Word MVP
www.intelligentdocuments.co.uk
Please reply to the newsgroup
Keep your VBA code safe, sign the ClassicVB petition www.classicvb.org
 
Jezebel

Perhaps we have a different understanding of "OLE structured" -- the Word
object model is OLE structured, but I don't know what it means to say that
the file itself is: you can't link the file itself or make OLE calls to it.
 
Graham Mayor

I suspect the Script Editor (ALT+SHIFT+F11) will give you all that there is
to get from the document, but I don't have to hand a document edited by
multiple authors to check what is included.

--
<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
Graham Mayor - Word MVP
 
Tony Jollans

The file itself is not an OLE object; it allows for the storing of OLE
objects.

I also believe that what few API routines are available do not have VBA
wrappers and, although I have a rough idea of the structure, for all
practical purposes it is a black box.
 
Jonathan West

Jezebel said:
Perhaps we have a different understanding of "OLE structured" -- the Word
object model is OLE structured, but I don't know what it means to say that
the file itself is: you can't link the file itself or make OLE calls to
it.

"OLE structured storage" is a technical term for a class of file formats,
just like "XML" is. Saying that a file is XML doesn't help you a whole lot
unless and until you know what the various XML tags within the file are
supposed to mean. That meaning varies depending on what XML Schema(s) have
been applied. So it is with the OLE structured storage format: there is a
very long and detailed specification of the Word Binary File Format (which I
have not seen), and that format uses the overall "grammar" of the OLE
structured storage format.
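
To make the "grammar" point concrete, here is a minimal sketch that checks
whether a file starts with the OLE structured storage (compound file) header
and reads the sector size from it. It says nothing about the Word-specific
streams inside. The function name read_ole_header and the shape of the
returned dictionary are made up for illustration.

```python
import struct

# Every OLE structured storage file begins with this 8-byte signature.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def read_ole_header(header):
    """Return basic container facts, or None if this is not an OLE file.

    Only the generic container header is examined; the layout of the
    Word data stored inside the container is not touched.
    """
    if len(header) < 32 or header[:8] != OLE_MAGIC:
        return None
    # Offsets 24..31: minor version, major version, byte order, sector shift.
    _minor, major, byte_order, sector_shift = struct.unpack_from(
        "<HHHH", header, 24
    )
    if byte_order != 0xFFFE:  # little-endian marker expected here
        return None
    return {"major_version": major, "sector_size": 1 << sector_shift}
```

Passing the first 512 bytes of a .doc file should be enough for this check. A
result of None only means the file is not in this container format at all.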

 
AC

WordLeaker is the only freeware tool we have found so far. There are
apparently some packages you can pay for, but I didn't find demos to run and
this works, so I went with it. It works OK and I have more or less automated
it. It comes with source too. I am using it on Word 97 files, so I'm not sure
how well it works with later versions.

http://tinyurl.com/aggad

--AC

P.S. For the search engines: Microsoft Word, Last Ten Authors, Author
History
 
Anne P.

Actually, Word does store info about the last 10 authors to edit a document
and also complete path and filename info. It is called "metadata" and it
has been biting a lot of people in the a__. There are numerous documents
on Microsoft's Knowledge Base about Metadata and these articles tell you
exactly what type of information is being stored in your documents.

If you want to see the metadata that is in a document, from Word choose
File, Open. In the file type drop down list select Recover Text From any
File, then select a document that you know has been edited by several
people.

As you scroll down through the document you will see this info.

Anne P.
 
AC

"There are numerous documents on Microsoft's Knowledge Base about Metadata
and these articles tell you exactly what type of information is being stored
in your documents."

I know it stores the last ten authors, which is what this thread is about.
That's not the question. The question was how to retrieve the author
history programmatically. If I am talking about handling Word files at the
binary level, don't you think I would know how to view the data as text? My
"solution" post was an attempt to help others looking for programmatic
information, since it is hard to find.


Regards
 
Anne P.

Save your attitude for someone who deserves it. Maybe my reply got posted
in the "wrong level" of this thread. I was not referring directly to your
post, or impugning your knowledge. I was simply pointing out in reply to
"other" posts that the author information most assuredly is stored and is
referred to as Metadata. I assumed that an interested party would think to
do a Google search on metadata. After all, there are a large number of
solutions out there for sale to "clean" this information out of a file, so
maybe the poster could find out how they retrieve this information.

Anne P.
 
