What exactly is saved in Word, Excel or PowerPoint files?

Alexander Erlich

Hello,

when I create a new, empty document in Word, Excel or PowerPoint, it
has a size of several kB.

What exactly is saved into these files (except for what I change by
editing, formatting etc.)? The time of creation? The name of the user
to whom Office is licensed? Information about the operating system? The
license key?

When I have a document from another user, what can I learn about him?
What happens when documents are changed (e.g. on other PCs, with other
Office and Windows versions)?

Thx in advance!

Alexander
 
Beth Melton

What you are asking about is called metadata - information that is stored in
a file but not immediately visible on the screen. Primarily all that is
saved is the file structure, creation date, author, subject - the
information you can see when you go to File/Properties.

In the past, additional information was also stored, such as the current
and previous document paths, the last authors, and a GUID for the file.
Due to Personally Identifiable Information (PII) concerns, this type of
information is no longer stored in Office documents. IOW, all you can learn
about an author of a document these days is essentially the same
information you can access by going to File/Properties. (Note that there
are various features, such as Track Changes, that will include additional
information.)

Also note that Word has an option to remove personal information on save so
for some documents there is nothing to learn.

Please post all follow-up questions to the newsgroup. Requests for
assistance by email cannot be acknowledged.

~~~~~~~~~~~~~~~
Beth Melton
Microsoft Office MVP

Word FAQ: http://mvps.org/word
TechTrax eZine: http://mousetrax.com/techtrax/
MVP FAQ site: http://mvps.org/
 
Alexander Erlich

Hello JoAnn,
No, the license number is not saved in there. That would lead to widespread
theft of software.

You mean that experts could easily "look up" a serial number from the
files? Yes, I hadn't thought about that; this would of course increase
the number of serial numbers available on the internet.

Thank you!

Alexander
 
Alexander Erlich

Another question: how come the file size grows with each save (even if
the content of the file does not, like when you change the position of
pictures or rewrite text)?

Alexander
 
databaseben

there is a freeware
from microsoft
called something like
"remove hidden data"
or something like that.

it clears out any hidden
data of the file so if you
send it, nothing but the data
will be made available to the
receiver....
 
Alexander Erlich

Regarding a file, is it possible to find out whether the Office version is
legal or not? Is this used in order to prevent theft of software? Do
other developers of wide-spread formats, such as Flash or PDF, store
metadata to find out whether the software to create these files was
acquired legally?

Alexander
 
Bob Buckland ?:-)

Hi Alexander,

In recent versions of MS Office, the approach from MS and others has been to remove or limit the identifying metadata included in
documents, to protect privacy and private data, as mentioned in the articles linked to in the earlier messages. In some countries you
might have legal issues in including 'tracking data'.

To see a bit of what is in a Word document, for example, try using File=>Open, set the file type to 'recover text', and then open a
Word document.
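
A rough way to do something similar outside Word is to scan the raw bytes of a file for readable text. The sketch below is only an illustration, not what Word's 'recover text' filter actually does: the function name `printable_runs` and the minimum run length of 6 are assumptions, and it only looks for plain ASCII and UTF-16LE text (older binary .doc files can hold strings in both forms). The sample bytes are made up for the demo.

```python
import re


def printable_runs(data, min_len=6):
    """Return runs of readable text found in raw bytes.

    Looks for two encodings: plain ASCII, and UTF-16LE where each
    printable character is followed by a zero byte.
    """
    ascii_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    utf16_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    runs = [m.group().decode("ascii") for m in ascii_pat.finditer(data)]
    runs += [m.group().decode("utf-16-le") for m in utf16_pat.finditer(data)]
    return runs


# Made-up sample: binary noise around an author name and a file path.
sample = (
    b"\x00\x01Alexander Erlich\x00\xff"
    + "C:\\Docs\\draft.doc".encode("utf-16-le")
    + b"\x00\x02ab\x03"
)

for run in printable_runs(sample):
    print(run)
```

To try it on a real file, read the file in binary mode (`open(path, "rb").read()`) and pass the bytes to `printable_runs`; author names and paths, if present, tend to show up among the results.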

As a 'widespread' document could be found in many different locations it wouldn't be too helpful to authors to know what copy made
the file.
===============
Regarding a file, is it possible to find out whether the Office version is
legal or not? Is this used in order to prevent theft of software? Do
other developers of wide-spread formats, such as Flash or PDF, store
metadata to find out whether the software to create these files was
acquired legally?

Alexander >>
--

Bob Buckland ?:)
MS Office System Products MVP

*Courtesy is not expensive and can pay big dividends*
 
databaseben

i don't think that
it was a design flaw
nor a deliberate intent.

instead it is an inherited
quality. In that all files
are no longer simply a
filename on an envelope
of information.

Now the envelopes have
more than the filename.
And if you use the additional
data, then it is beneficial towards
better organization.

If you use those files on another system
then the data is helpful. If you send
a file to someone else, there would be
no need to send that data. Hence
the hidden data removal tool.

But all of this is my personal
opinion. For me, instead
of sending any office file in
an editable state, i send them
in pdf format.

If i send any office file, regardless
of whatever metadata is sent, I
don't want any of my information
transposed or rewritten by the
end user. So I will always
send my office files in a non-editable
format like pdf or mdi. Emails are
exempt, however....
 
Echo S

That's because of "fast saves." Basically, that causes changes to be
appended to the file as opposed to the file being rewritten. This causes the
file size to increase.

In PPT, you can turn it off in Tools | Options, and I recommend that you do
so.
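
The difference between appending and rewriting can be shown with a toy model. This is a sketch under loud assumptions: the `#EDIT` record layout and the function names `full_save` and `fast_save` are invented for illustration and have nothing to do with the real Office file formats.

```python
def full_save(text):
    """Rewrite the whole file: the size tracks the current content only."""
    return text.encode("utf-8")


def fast_save(previous_file, change):
    """Append a change record to the existing bytes instead of rewriting.

    The '#EDIT' marker is a hypothetical record format for this demo.
    """
    return previous_file + b"\n#EDIT " + change.encode("utf-8")


original = "The quick brown fox"

# Start from a full save, then apply edits that leave the visible
# content the same in the end (move a picture and move it back, etc.).
file_fast = full_save(original)
sizes = [len(file_fast)]
for edit in ["move picture 1", "move picture 1 back", "retype sentence"]:
    file_fast = fast_save(file_fast, edit)
    sizes.append(len(file_fast))

file_full = full_save(original)

print("fast-save sizes:", sizes)
print("full-save size: ", len(file_full))
```

In this model the fast-saved file keeps growing with every edit, even though a full rewrite of the same visible content stays the same size. The earlier bytes also remain in the appended file, which is one reason old content can linger in such files.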
 
Alexander Erlich

Hello,

thank you for your answers, I also tried the Metadata removal tool. I
have two further questions:

1) How can it be that Office requires roughly half a GB of hard disk
space, whereas OpenOffice requires only about a hundred MB? I know that
OO has fewer tools, i.e. it has neither Access nor FrontPage. But
still, the disk requirements seem somewhat disproportionate. Is there an
explanation for this? Is it the number of features?

2) Maybe I have misunderstood something, but I am still not aware
whether metadata can be used to identify illegal copies of Office or
Windows. Even though it has been said that MS and other developers had
a tendency to reduce that, I am not certain whether it is or is not
possible to identify such copies. I would be grateful for information
on that, and also about other developers, like Adobe or Macromedia.

Alexander
 
Echo S

2) Maybe I have misunderstood something, but I am still not aware
whether metadata can be used to identify illegal copies of Office or
Windows. Even though it has been said that MS and other developers had
a tendency to reduce that, I am not certain whether it is or is not
possible to identify such copies. I would be grateful for information
on that, and also about other developers, like Adobe or Macromedia.

I believe that metadata in Office files cannot be used to identify illegal
copies of Office, but I don't know for sure. I can tell you, as others have,
that the license key for Office is not saved in the files.

As for Macromedia, it was purchased by Adobe. As for Adobe, I have heard
that there are ways to see this type of info in Photoshop (and probably
other) files, but I don't know the details. (ISTR something about a student
version watermark and copyright watermarks....) You will be better off
asking about Adobe products in the Adobe forums at http://www.adobe.com
 
Beth Melton

I'd say the disk space has to do with the number of features Office has over
other apps.

To answer your question about whether an Office document can identify an
illegal copy, I'm sure it's possible but the Microsoft applications
definitely do not store that type of information. Microsoft has enough
lawsuits - they don't need any more.

btw, with so many questions about a document storing information on illegal
software, one might surmise you're making sure you aren't "found out". ;-)

 
Alexander Erlich

Hello,

thank you for your answers. I am not making sure not to be "found out",
but finding out how metadata works. In computer science we are
currently dealing with algorithms that permit us to encode data. I am
making a presentation about "serial numbers". I will try to explain
what they are and how they "protect" software, how key generators work
etc. Metadata in MS Office and other wide-spread tools is very
important here, and it is also important how well it can protect
software from theft.

Thank you!

Alexander
 