Is a file DOCX or XLSX?

Jialiang Ge [MSFT]

Hello Dave,

From your post, my understanding of the issue is that you want to determine
the file type (docx or xlsx) from a file stream alone. If I'm off base,
please feel free to let me know.

With earlier Office file formats, we could determine the file type from the
file signature in the first few bytes of the stream (see the File Signature
Table at http://www.garykessler.net/library/file_sigs.html). But Office 2007
files are all ZIP packages, so they all start with the same ZIP signature,
and I do not think we can rely on the file signature to tell them apart.
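To illustrate the point about signatures, here is a minimal Python sketch. The function name sniff_container is hypothetical and not part of any library; the two byte patterns are the well-known OLE2 compound-file and ZIP local-file-header signatures. The leading bytes identify only the container, so every Office 2007 file comes back as "zip".

```python
# Signature of the OLE2 compound file used by the binary Office formats.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header signature found at the start of a non-empty ZIP archive.
ZIP_MAGIC = b"PK\x03\x04"


def sniff_container(head: bytes) -> str:
    """Classify a stream by its first bytes (hypothetical helper).

    Returns "ole2", "zip", or "unknown". It cannot tell docx from xlsx,
    because both begin with the same ZIP signature.
    """
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"
```

Passing the first eight bytes of the stream is enough for this check.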

Based on my discussion with the Office product team, Office 2007 internally
relies mainly on the file extension to determine the file type, because of
some security issues. For instance, if you rename an xlsx file to docx,
Office will open it with Word rather than Excel, even though the file
format is actually xlsx.

As for how to determine the file type with just a file stream, I think we
do need to extract [Content_Types].xml from the package (with SharpZipLib,
for instance) and then check its content type items.
For a docx file:
<Override PartName="/word/document.xml"
ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml" />
For an xlsx file:
<Override PartName="/xl/workbook.xml"
ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml" />
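The check described above could be sketched roughly as follows. This uses Python's standard zipfile module in place of SharpZipLib, and the name detect_ooxml_type is hypothetical. It assumes the stream is seekable and looks only for the two content types quoted above.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespace of the elements in [Content_Types].xml.
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"

# Only the two main-part content types quoted in the post are covered.
MAIN_PARTS = {
    "application/vnd.openxmlformats-officedocument"
    ".wordprocessingml.document.main+xml": "docx",
    "application/vnd.openxmlformats-officedocument"
    ".spreadsheetml.sheet.main+xml": "xlsx",
}


def detect_ooxml_type(stream):
    """Return "docx", "xlsx", or None for a seekable binary stream.

    Hypothetical helper: reads [Content_Types].xml from the ZIP package
    and matches the ContentType of each Override element.
    """
    try:
        with zipfile.ZipFile(stream) as package:
            xml_bytes = package.read("[Content_Types].xml")
        root = ET.fromstring(xml_bytes)
    except (zipfile.BadZipFile, KeyError, ET.ParseError):
        # Not a ZIP, no content types part, or malformed XML.
        return None
    for override in root.iter(CT_NS + "Override"):
        kind = MAIN_PARTS.get(override.get("ContentType"))
        if kind is not None:
            return kind
    return None
```

For a file on disk, a call such as detect_ooxml_type(open(path, "rb")) should work. Other package variants (macro-enabled documents and templates, for instance) declare different content types and would return None here unless added to MAIN_PARTS.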

Please let me know if you have any other concerns, or need anything else.

Sincerely,
Jialiang Ge ([email protected], remove 'online.')
Microsoft Online Community Support

==================================================
For MSDN subscribers whose posts are left unanswered, please check this
document: http://blogs.msdn.com/msdnts/pages/postingAlias.aspx

Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/subscriptions/managednewsgroups/default.aspx#notifications.
If you are using Outlook Express/Windows Mail, please make sure you clear
the check box "Tools/Options/Read: Get 300 headers at a time" to see your
reply promptly.

Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://msdn.microsoft.com/subscriptions/support/default.aspx.
==================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
 
David Thielen

Ok, that's what I am doing. This strikes me as poor design.

--
thanks - dave
david_at_windward_dot_net
http://www.windwardreports.com

Cubicle Wars - http://www.windwardreports.com/film.htm




