HOW TO: embed binary data in Excel - an odyssey

D

dermot

Imagine if you could embed binary data within an Excel
file. You could
* encrypt and store confidential data extremely securely
* store compressed data compactly
* include setup files inside a single Excel file
and "unpack" them when running it the first time (useful
for web downloads), eg a little DLL

I'll illustrate the second of these. I had some data I
needed to include in a workbook to use with my code.
* Size of data as a stand-alone CSV file: 670k
* Size of the same data included in an otherwise empty
XLS file: 2,100k (bloatware!)
* Size of the same data compressed and embedded in an XLS
file: just 230k, including 50k of VBA compression code

So how can you embed data in an Excel file? It was an
odyssey for me because Excel is so inconsistent in the
way it stores things.

My first thought was a bitmap, because they get stored
pixel for pixel in the Excel file on disk. I planned to
create an empty bitmap of the right size inside the
workbook, save the workbook, and then overwrite the
picture part of the file on disk, with my data. The
problem was that no matter what I did, Excel wouldn't
leave the bitmap alone. Yes, it stored it all right, but
it kept changing the bits, or inserting chunks of its own
code at intervals throughout the bitmap. Very frustrating.

I looked at other standard "containers" that could
include binary data, considering only standard objects
included with Excel. I didn't find any.

So then I looked at text containers. Most of them, such
as file properties and textboxes, had severe length
limits. However, the userform Textbox control could take
an indefinite string, it seemed. The only problem was
that certain ASCII characters caused the string to
truncate.

So I "remapped" the ASCII strings. I discovered which
characters caused the problem, and replaced them by two
characters - character 1 (which I added to the list of
prohibited characters so it could be used for this
purpose) and the offending character, shifted 20
characters forward to a less offensive character. So for
example, character 10 would be replaced by characters 1
and then 30. When decoding, every time you hit character
1, you skip it and subtract 20 from the next character.
This remapping increased the length of a string by about
3%, quite a small penalty.

After all this effort, it worked fine - except that it
became very, very slow for long strings over 100k. Grrrr..

I tried just about everything, even worksheet
DocumentProperties, which seem to be new with XP. But
would you believe it? Each character is stored as two
characters, the first being a zero, so that doubles the
original length of the string.

Eventually, after lots of testing, I tried cellnotes.
They can take 32,767 bytes each, and they dislike about
25 characters (different ones from the textbox,
naturally), but are fast. Interestingly, when Excel hits
one of the offending characters, it doesn't truncate the
string. Instead, it allocates two characters to ALL of
the original characters, even those which were not
offensive, doubling the string length. So as long as I
remap my data to avoid those ASCII characters, I avoid
doubling up. This remapping adds about 12% to the length
of the string, but that's acceptable.

The result is that using only VBA, you can store binary
data efficiently inside an Excel workbook.

I should add that I have pure VBA compression code which
does a little better than Zip but is of course a lot
slower - so you can do everything with VBA.

I'll post some sample code soon on my site, for anyone
who is interested.

Dermot Balson
Free VBA code for user interfaces, internet connectivity,
encryption
http://www.webace.com.au/~balson/InsaneExcel/Default.html
Last updated March 2003
 
B

Bill Lunney

There are established encoding techniques for this kind of thing. Encoding
of binary data within the standard printable ASCII range is a necessity for
some mail format transmission protocols etc.

Personally I think I would write a class to do this using one of the
traditional methods. The code is likely to be already available and just
needs wrapping. Of course there is a bit of an overhead in terms of size
and (particularly in the case of VBA) execution speed.

As a side note characters stored in the two byte format you mention is
standard UNICODE. It's necessary for Asian alphabets etc. where 255 isn't
enough.

All this said I think I may leverage of your experiences should I need to do
this kind of thing. It's clear you've put a reasonable amount of effort
into this!



--

Regards,


Bill Lunney
www.billlunney.com
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top