docx file corrupted when saved and retrieved from SQL2005

Haim

I have an application that saves Word files to a SQL2005 database and
retrieves them on demand. It works fine for files saved in the doc format,
but if the file is saved in the docx format, I cannot open it in my code.
When I manually open the file retrieved from the database in Word 2007, I get
an error message "The file is corrupted and cannot be open". Then I get a
second dialog box "some of the content cannot be read, if you trust the source
word will attempt to open it", and then it does open.
I have similar code which automates Word and has it save the file to the
database. If I save the file in the old format I have no problems with it.
If I save the file in the docx format, then I cannot reopen it.


Here is the code for retrieving the document:
' Take the first row returned for this proposal.
Dim myrow As bardpagesDS.SelTempPropFilesRow = _
    myfileds.SelTempPropFiles.Rows(0)
Dim mybuffer() As Byte
Dim myext As String

' Pick the stored bytes and extension for the requested file type.
Select Case filetype
    Case "abstract"
        mybuffer = myrow.AbstractFile
        myext = myrow.ABSFiletype
    Case "body"
        mybuffer = myrow.ProposalFile
        myext = myrow.PropFileType
End Select

' Replace any existing copy, then write the bytes out unchanged.
If My.Computer.FileSystem.FileExists(filetype + "." + myext) Then
    My.Computer.FileSystem.DeleteFile(filetype + "." + myext)
End If
Dim myfile As New IO.FileStream(filetype + "." + myext, _
    IO.FileMode.Create, IO.FileAccess.ReadWrite)
myfile.Write(mybuffer, 0, mybuffer.Length)
myfile.Close()
Return filetype + "." + myext

Code which uploads the file:
Public Function uploadfile(ByVal mytcn As Integer, ByVal mypw As String, _
        ByVal myfile As HttpPostedFile, ByVal filetype As String) As String
    Try
        Dim myarray(myfile.ContentLength) As Byte

        Dim mybr As New System.IO.BinaryReader(myfile.InputStream)
        mybr.Read(myarray, 0, myfile.ContentLength)
        mybr.Close()

        Dim c As String = System.IO.Path.GetFileName(myfile.FileName)
        c = Right(c, 3)

        Dim myscmd As Data.SqlClient.SqlCommand
        myscmd = Me.SCMD_AddAbstractFile

        With myscmd
            .Parameters(1).Value = mytcn
            .Parameters(2).Value = mypw
            .Parameters(3).Value = c
            .Parameters(4).Value = myarray

            .Connection.Open()
            .ExecuteNonQuery()
            .Connection.Close()
        End With
        Return "OK"
    Catch ex As Exception
        Return ex.Message
    End Try
End Function
--
Haim Katz
BARD

Jialiang Ge [MSFT]

Good morning Haim. Welcome to Microsoft Newsgroup Support Service! My name
is Jialiang Ge [MSFT]. It's my pleasure to work with you on this issue.

I have reproduced the symptom "The file is corrupted and cannot be open"
for docx with your code. After debugging the program and looking into the
binary of the resulting docx, I find that the symptom is caused by the
declaration of buffer: Dim myarray(myfile.ContentLength) As Byte

Let's first look at the resolution of the issue, then I will explain why.
Last, I will share how to troubleshoot this kind of issue, which may
benefit you in the future.

--- RESOLUTION ---
Change the code line

Dim myarray(myfile.ContentLength) As Byte

in the function "uploadfile" to

Dim myarray(myfile.ContentLength - 1) As Byte
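
As a minimal sketch of the same idea, the helper below sizes the buffer with an upper bound of length - 1 and reads until exactly that many bytes have arrived. The module and function names (UploadBufferSketch, ReadExactly, ExtensionOf) are hypothetical and are not part of the original code.

```vbnet
Imports System
Imports System.IO

Module UploadBufferSketch
    ' Reads exactly "length" bytes into an array that holds "length" elements
    ' (upper bound length - 1), so no extra padding byte is stored.
    Function ReadExactly(ByVal source As Stream, ByVal length As Integer) As Byte()
        Dim data(length - 1) As Byte
        Dim total As Integer = 0
        While total < length
            ' Stream.Read may return fewer bytes than requested, so loop.
            Dim n As Integer = source.Read(data, total, length - total)
            If n = 0 Then Throw New EndOfStreamException("Stream ended early.")
            total += n
        End While
        Return data
    End Function

    ' Returns the extension without the leading dot, e.g. "docx".
    Function ExtensionOf(ByVal fileName As String) As String
        Return Path.GetExtension(fileName).TrimStart("."c)
    End Function
End Module
```

With this, the upload could call ReadExactly(myfile.InputStream, myfile.ContentLength). ExtensionOf is included because Right(c, 3) on a name ending in ".docx" yields "ocx"; whether that matters depends on how the stored extension is used.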

--- CAUSE ---
In VB.NET, VB6 and VBA, Dim abArray(n) declares an array whose upper bound
is n, so with the default lower bound of 0 it holds n+1 elements. In other
words, Dim myarray(myfile.ContentLength) declares an array with
(myfile.ContentLength + 1) elements, which results in an extra byte at the
end of the file. To allocate an array with n elements, we need to use Dim
abArray(n-1).
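
A small console sketch makes the off-by-one visible. The module and function names are made up for illustration; 1069 is the test file size mentioned in this reply.

```vbnet
Imports System

Module ArrayBoundsDemo
    ' Number of elements in a Byte array declared with the given upper bound.
    Function ElementCount(ByVal upperBound As Integer) As Integer
        Dim data(upperBound) As Byte
        Return data.Length
    End Function

    Sub Main()
        Console.WriteLine(ElementCount(1069))     ' prints 1070
        Console.WriteLine(ElementCount(1069 - 1)) ' prints 1069
    End Sub
End Module
```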

--- SKILL ---
You may wonder how I found the problem. I used your code to import and
export a docx file. The resulting docx could not be opened, so I used a
binary editor to compare the two files. To my surprise, the resulting one
always had an extra byte (00) at the end. If I removed that byte, it opened
fine. So if we can determine where the extra byte is added (import or
export), we are set. I started with the import procedure.
The source docx used for the test contained 1069 bytes (as reported by my
binary editor). I saw myfile.ContentLength return 1069 as expected;
however, at .Parameters(4).Value = myarray, I found that myarray
contained 1070 elements. That is where the problem is.
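
If no binary editor is at hand, the same comparison can be sketched in code. This is only an illustration under assumed names (BinaryCompareSketch, FirstDifference, CompareFiles); it reports where two byte sequences first diverge.

```vbnet
Imports System
Imports System.IO

Module BinaryCompareSketch
    ' Returns the index of the first differing byte, the length of the shorter
    ' array if one is a prefix of the other, or -1 if both are identical.
    Function FirstDifference(ByVal a() As Byte, ByVal b() As Byte) As Integer
        Dim common As Integer = Math.Min(a.Length, b.Length)
        For i As Integer = 0 To common - 1
            If a(i) <> b(i) Then Return i
        Next
        If a.Length = b.Length Then Return -1
        Return common
    End Function

    ' Convenience wrapper that compares two files on disk.
    Function CompareFiles(ByVal pathA As String, ByVal pathB As String) As Integer
        Return FirstDifference(File.ReadAllBytes(pathA), File.ReadAllBytes(pathB))
    End Function
End Module
```

For the case described above (a 1069-byte original and a 1070-byte round-tripped copy that differ only by the trailing 00), FirstDifference would return 1069, the offset of the extra byte.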

You may also wonder why the problem did not occur with doc files. It's
because doc and docx have completely different file formats (see
http://msdn.microsoft.com/en-us/library/aa338205.aspx): doc is a binary
format, while docx is a ZIP-based Open XML package. Microsoft Word uses a
different parser for each. The doc parser does not regard the extra 00 byte
at the end of the file as an error, but the docx parser does.
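
Because a docx is a ZIP package, and a ZIP archive normally ends with its end-of-central-directory (EOCD) record, stray bytes can be detected by counting what follows that record. The sketch below does that under assumed names (ZipTailSketch, TrailingBytes). It does not claim to reproduce Word's own validation, and an archive comment that happens to contain the signature bytes could confuse it.

```vbnet
Imports System

Module ZipTailSketch
    ' Returns the number of bytes after the last ZIP end-of-central-directory
    ' record, or -1 if no EOCD signature (50 4B 05 06) is found.
    Function TrailingBytes(ByVal data() As Byte) As Integer
        ' The fixed part of the EOCD record is 22 bytes long.
        For i As Integer = data.Length - 22 To 0 Step -1
            If data(i) = &H50 AndAlso data(i + 1) = &H4B AndAlso _
               data(i + 2) = &H5 AndAlso data(i + 3) = &H6 Then
                ' Comment length is a little-endian 16-bit value at offset 20.
                Dim commentLen As Integer = _
                    CInt(data(i + 20)) Or (CInt(data(i + 21)) << 8)
                Return data.Length - (i + 22 + commentLen)
            End If
        Next
        Return -1
    End Function
End Module
```

A result of 0 means the data ends exactly where the ZIP structure says it should; a result of 1 would match the extra 00 byte described above.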

Haim, please try the solution and let me know whether it works for you or
not. If you have any other questions or concerns, please DON'T hesitate to
tell me.

P.S.
I notice that you once posted this question in the
microsoft.public.office.developer.officedev.other newsgroup a few weeks
ago. Our managed newsgroup system did not capture that post and we
(Microsoft) did not jump in because
microsoft.public.office.developer.officedev.other is not a managed
newsgroup. For the list of queues that are managed by us, please refer to
http://msdn.microsoft.com/en-us/subscriptions/aa974230.aspx.

Regards,
Jialiang Ge (e-mail address removed)
Microsoft Online Community Support

Delighting our customers is our #1 priority. We welcome your comments and
suggestions about how we can improve the support we provide to you. Please
feel free to let my manager know what you think of the level of service
provided. You can send feedback directly to my manager at:
(e-mail address removed).

==================================================
Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/subscriptions/managednewsgroups/default.aspx#notifications.

Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://msdn.microsoft.com/subscriptions/support/default.aspx.
==================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
 

Haim

Dear Jialiang,
That was the timeliest, most thorough and accurate response I have ever
received in any newsgroup. This problem has been bugging me for months (I
use this in many of my applications and have been saving files in doc format
to work around the bug) and in less than a few hours you have answered all my
questions, including the one why my original post was not answered.

You did the equivalent of ending an inning by jumping over the fence to
prevent a home run, and then got up to bat with the bases loaded and hit a
grand slam.

Thank you very much.

Haim
--
Haim Katz
BARD

Jialiang Ge [MSFT]

You are welcome, Haim. Thanks for using Microsoft Newsgroup Support Service!

Have a great day!

Regards,
Jialiang Ge (e-mail address removed)
Microsoft Online Community Support

liam wheldon

Amazing response, not seen such a good response in ages and gave me the solution that I couldn't find anywhere else! along with a great explanation :)

Amazing how much hell, sweat and stress 1 byte can cause!

regards
Liam

In reply to: jialg (Wednesday, July 30, 2008 2:22 AM) and tom-hi
(Wednesday, July 30, 2008 8:28 AM), both quoted in full above.