Extract files from BLOB corrupts PDF

  • Thread starter Danny J. Lesandrini
  • Start date
D

Danny J. Lesandrini

I need to extract some PDF files from a SQL Server BLOB field, linked in an
Access MDB file. I found a couple different code approaches, both of which
are represented below, and both of which I've tested out.

I keep getting this error when I try to open the file that is created.

Adobe Reader could not open <file name> because it is either not a
supported file type or because the file has been corrupted.


I know the file IS a PDF file, because if I double click on the OLE field in
the linked table from the Access MDB, the PDF opens up fine. I've tried it
with several variations:

* From an Access 2003 database
* From an Access 2000 database
* With the DAO.Recordset.GetChunk code (below)
* With the ADO.Recordset.GetChunk code (not shown)
* With the ADO.Stream object (commented out below)

I also tried it on a few different backup copies of the SQL Server database,
from and on different servers. Sometimes through a VPN connection and
sometimes directly from the server.

I'm out of options here. Any chance someone sees what's wrong with my
approach and/or has an alternate solution for getting files out of a BLOB field?
--
Danny J. Lesandrini
(e-mail address removed)
www.amazecreations.com



Function ProcessEventRows()
On Error GoTo Err_WriteBLOB

Dim intNumBlocks As Integer
Dim intDestFile As Integer
Dim intI As Integer
Dim lngFileLength As Long
Dim lngLeftOver As Long
Dim strFileData As String
Dim varRetVal As Variant
Dim rst As DAO.Recordset
Dim dbs As DAO.Database

Dim strField As String
Dim strDestination As String
Dim strSQL As String
Dim strErr As String
Dim lEID As Long
Dim lAID As Long
Dim strPath As String
Dim strFileName As String

If MsgBox("Do you want to clear the log and process ALL event files?", vbQuestion + vbYesNo, "Continue?") = vbNo Then Exit
Function

Set dbs = CurrentDb
strSQL = "DELETE FROM tblAttachmentLog"
dbs.Execute strSQL

strSQL = "SELECT AbsenceID, EventID, FilePath, OldName, AltName, Attachment FROM qryEventLog"
Set rst = CurrentDb.OpenRecordset(strSQL, dbOpenSnapshot)

Do Until rst.EOF
lEID = rst!EventID
lAID = rst!AbsenceID
strPath = rst!FilePath
strFileName = Trim(Nz(rst!OldName, ""))

If strFileName = "" Then strFileName = strAltName
strFileName = lAID & "-" & lEID & strFileName & ".pdf"

strDestination = strPath & strFileName
If Dir(strDestination) <> "" Then Kill strDestination

'Get the size of the field.
strField = "Attachment"
lngFileLength = rst(strField).FieldSize

'Cancel if field is empty.
If lngFileLength = 0 Then
' can't process this one ...
Stop
Else
' alternate code example for extracting BLOBs ... didn't work either !
'Dim mStream As ADODB.Stream
'Set mStream = New ADODB.Stream
'mStream.Type = adTypeBinary
'mStream.Open
'mStream.Write rst.Fields(strField).Value
'mStream.SaveToFile strDestination, adSaveCreateOverWrite
'Set mStream = Nothing

strSQL = "INSERT INTO tblAttachmentLog (EventID, FilePath, FileName, FileSize) " & _
"VALUES (" & lEID & ",'" & strPath & "','" & strFileName & "'," & lngFileLength & ")"
dbs.Execute strSQL

'Calculate number of blocks to write and the leftover bytes.
intNumBlocks = lngFileLength \ BLOCKSIZE
lngLeftOver = lngFileLength Mod BLOCKSIZE

'Create pointer for to destination file.
intDestFile = FreeFile
Open strDestination For Output As intDestFile
Close intDestFile

'Open the destination file.
Open strDestination For Binary As intDestFile
strFileData = rst(strField).GetChunk(0, lngLeftOver)
Put intDestFile, , strFileData

'Read the leftover chunks and write it to output file.
For intI = 1 To intNumBlocks
DoEvents
strFileData = rst(strField).GetChunk((intI - 1) * BLOCKSIZE + lngLeftOver, BLOCKSIZE)
Put intDestFile, , strFileData
Next intI
Put intDestFile, , strFileData
Close intDestFile
End If

rst.MoveNext
Loop

Exit_Here:
Set dbs = Nothing
Set rst = Nothing
Exit Function

Err_WriteBLOB:
' log error here --
Resume Next
End Function
 
J

Jeff Boyce

I got a similar message when I already had Adobe Reader open.

Can you confirm that no other copies of Adobe Reader are open when you do
this?

Regards

Jeff Boyce
Microsoft Office/Access MVP
 
S

Sylvain Lafontaine

Before storing them into an OLE Object (or Blob field), Access prepends a
header - usually of a length of 78 bytes but I'm not sure if this is always
this value under any condition - and you must strip it before extracting the
required binary file. You can easily verify that by making a comparaison
between the length of a PDF file before it is stored and after it has been
stored and manually extracted. See

http://bytes.com/groups/net/419364-images-ole-fields-access-database

--
Sylvain Lafontaine, ing.
MVP - Windows Live Platform
Email: sylvain aei ca (fill the blanks, no spam please)
Independent consultant and remote programming for Access and SQL-Server
(French)
 
D

Danny J. Lesandrini

Sylvain:

Thanks so much. This is exactly the kind of thing I thought was going on. I'll
look into it today
 
S

Sylvain Lafontaine

A shame that there is no function provided by Access to provide this
information in the first place or even to automatically retrieve the stored
blob file without the header (or to write it with the header). Even (or
especially) with the OLE Object control, the support for images and for
other blob files like PDF or Word documents has always been - and is
still - one of the weakliest point of Access.

--
Sylvain Lafontaine, ing.
MVP - Windows Live Platform
Email: sylvain aei ca (fill the blanks, no spam please)
Independent consultant and remote programming for Access and SQL-Server
(French)
 
D

Danny J. Lesandrini

Sylvain:

I tried stripping off 85 characters of the first block written, but the PDF still says it's corrupt.
I'm still looking for a solution to this problem. Maybe it's the Adobe version I have installed, but
it seems to me that shouldn't matter at all. I mean, all the code is doing is streaming bits to a file.
Shouldn't matter if you have an app on the machine that can open it or not. And if the file that is
created can't be (eventually) opened in Acrobat, then what's the point of extracting it.
 
D

David W. Fenton

I need to extract some PDF files from a SQL Server BLOB field,
linked in an Access MDB file. I found a couple different code
approaches, both of which are represented below, and both of which
I've tested out.

I've read the rest of the thread, and I wonder if the problem is
that you're using OLE in Access and a BLOB field in SQL Server. A
BLOB field knows nothing of OLE -- it just stores binary data. But
an OLE field takes the binary data and encloses it in a wrapper that
tells COM what to do with it. What you seem to need is the ability
to store the raw binary data without the OLE wrapper. I would
suggest that the way to do that is to *not* use an OLE control for
interacting with it.

I can't suggest much else, as this is beyond my area of expertise,
but that would be the direction I'd be going in if trying to work
this problem myself. You also might want to poke around Stephen
Lebans' website, as he's done a lot of this kind of work.
 
D

Danny J. Lesandrini

I wondered about that myself. I did some more searching and I *did*
find the utility on Steven Leban's site to extract docs ...

http://www.lebans.com/oletodisk.htm

I only had time to do a quick test, but the way it works, you open his
utility and point it at an Access table in another database. I pointed to
a database with a table link to SQL Server. The utility identified the
correct OLE field and processed the files. (I had to modify the rst
code slightly to accomodate for SQL Server, adding dbSeeChanges.)

Anyhow, I produced 3 PDF files and they all three worked fine. These
were the same 3 that failed on every other attempt.

Steven's approach requires that you put a DLL in the System32 folder
and it figures out by itself if the file is PDF, DOC, XLS, etc. Tonight I'll
run this on the whole 7000 files and see what I get.
 
S

Sylvain Lafontaine

Well, then I suppose that's not 85 caracters. Try with 78 car. or better,
verify with a known PDF what's the exact difference in length before and
after the storage. You can also make a verification on the exact beginning
of the file by looking at the first bytes.

--
Sylvain Lafontaine, ing.
MVP - Windows Live Platform
Email: sylvain aei ca (fill the blanks, no spam please)
Independent consultant and remote programming for Access and SQL-Server
(French)
 
I

IanG

Hi Danny

Do you end up with a solution for this?

I'm getting the same error but from a different scenario. I have uses
adding PDF files to a linked SQL table via the Access DB. The files are then
retrieved via an ASP.net website.

I had found examples of Read/Write functions, of which the write function is
close to your example (the read and write seem to be named with reference to
the file system rather than the DB so the read takes gets the file and
inserts it into the table while the write creates the fiels from the DB
content). Unfortunately I am getting the same error when the ASP code
retrives it.

I would be interested in your comments and any solution you can up with.

Kind regards
 
M

mchesonis

I don't know if this is related, if the file size of the pdf is less than
1.43mb than it will display correctly, if over 1.43mb than it gives the error
you documented. Does anyone know why the file size is limited?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top