Print to PDF from Word 2008 results in large (huge!) files

Tom_R._in_OK

Version: 2008
Operating System: Mac OS X 10.5 (Leopard)
Processor: Intel

I have noticed that creating a PDF of a Word 2008 document (by choosing "print to PDF" from the Print dialog box) results in inordinately large PDF files. The most recent case in point: I had a 44-page Word document, with no images, only 2 fonts, and 6 tables, which was less than 1 megabyte when saved as a .docx file. When I printed to PDF, the resulting PDF was 38 megabytes.

As a counterexample, I printed the entire Gutenberg version of "Thuvia, Maid of Mars", containing 116 pages (http://www.gutenberg.org/files/72/72-h/72-h.htm), from Safari to PDF, and got a PDF that was only 520 KB.

Emailing PDF versions of my Word files to clients is nearly a daily requirement for me; needless to say, a 38 megabyte PDF is not something I can email.

Is this a known problem with Word? Is there any known workaround for this?

Thanks for any advice,

Tom
 
Elliott Roper

That should not be happening. Expect such a PDF to be between half and
double the size of the original.
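
To put that rule of thumb in concrete terms, here is a minimal Python sketch that compares a PDF's size with the size of its source document. The function names and the 2.0 threshold are illustrative assumptions that simply restate the "half to double" estimate above; they are not a documented limit of Word or OS X. The 1 MB figure in the example is an upper bound taken from the original post ("less than 1 megabyte").

```python
MB = 1024 * 1024


def pdf_size_ratio(source_bytes: int, pdf_bytes: int) -> float:
    """Return the PDF size divided by the source document size."""
    if source_bytes <= 0:
        raise ValueError("source size must be positive")
    return pdf_bytes / source_bytes


def looks_bloated(source_bytes: int, pdf_bytes: int,
                  upper_bound: float = 2.0) -> bool:
    """True when the PDF is larger than upper_bound times the source.

    upper_bound=2.0 is an assumption that mirrors the rule of thumb
    quoted in this thread, nothing more.
    """
    return pdf_size_ratio(source_bytes, pdf_bytes) > upper_bound


# Sizes reported in the question: a .docx under 1 MB and a 38 MB PDF.
# Using 1 MB for the source gives a conservative (lowest possible) ratio.
print(pdf_size_ratio(1 * MB, 38 * MB))
print(looks_bloated(1 * MB, 38 * MB))
```

For real files, `os.path.getsize(path)` would supply the two byte counts.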

Word hands off the PDF job to OS X, although it sometimes confuses
matters by passing along the wrong flavour of images. That cannot be
what is happening with yours, of course, since it has no images.

Before printing, try Page Setup and change the printer to "Any
Printer". It is a wild guess, but if you do not have a PostScript-capable
printer selected, you might be producing a huge bitmap.

Open the offending PDF in Preview.app and zoom in hugely. Are there
jaggies? If yes, you have a bitmap.
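
If you would rather not rely on eyeballing, here is a rough Python sketch that counts two kinds of markers in the raw bytes of a PDF: image objects and embedded font files. It is only a heuristic and rests on an assumption: that the PDF stores its object dictionaries uncompressed. Newer PDFs can pack those dictionaries into compressed object streams, in which case the counts may come back as zero even though images or fonts are present. The function names are made up for illustration.

```python
import re
import sys

# Byte patterns that appear in uncompressed PDF object dictionaries.
# Assumption: the file does not hide them inside compressed object streams.
IMAGE_MARKER = re.compile(rb"/Subtype\s*/Image")
FONT_FILE_MARKER = re.compile(rb"/FontFile[23]?\b")


def count_markers(pdf_bytes: bytes) -> dict:
    """Count image objects and embedded font programs in raw PDF bytes."""
    return {
        "images": len(IMAGE_MARKER.findall(pdf_bytes)),
        "embedded_fonts": len(FONT_FILE_MARKER.findall(pdf_bytes)),
    }


def summarize(path: str) -> dict:
    """Read a PDF from disk and return its size plus the marker counts."""
    with open(path, "rb") as handle:
        data = handle.read()
    counts = count_markers(data)
    counts["size_bytes"] = len(data)
    return counts


if __name__ == "__main__":
    # Usage: python pdf_markers.py file1.pdf file2.pdf ...
    for pdf_path in sys.argv[1:]:
        print(pdf_path, summarize(pdf_path))
```

A document with no pictures that reports one large image per page has probably been rasterised. A small image count with a huge file points elsewhere, and the zoom test above is still the quickest check.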
 
Michel Bintener

Open the PDF file with an application called ColorSync Utility, which
resides in the Utilities folder. Use Spotlight to find it. In the PDF
window, click on the dropdown box in the lower left corner and choose Reduce
File Size, then click on Apply in the lower right corner. When you're done,
save the file. See if the filter you've just applied has any effect on the
file size.

Apart from that, there are no features built into Word that allow you to
define the size of a PDF file.


--
Michel Bintener
Microsoft MVP
Office:mac (Entourage & Word)

*** Please always reply to the newsgroup. ***
 
Tom_R._in_OK

Michel and Elliott, thank you for the replies.

Running ColorSync and applying the "reduce size" filter to the PDF did not have any effect on its size.

I opened the PDF in Preview to look for jaggies and found none. I confirmed that it is not a bitmap by searching for text (successfully) within the PDF.

I tried the recommendation to format the Word document for any printer in Page Setup; this resulted in a PDF that was only 9 megs instead of 38 megs, which is a great improvement but still seems unusually large (and still too large to email conveniently).

Originally, the Word file was formatted for an HP LaserJet 2015 with PCL 6 installed.

Thanks,

Tom
 
Elliott Roper

Tom_R._in_OK said:
Michel and Elliott, thank you for the replies.

Running ColorSync and applying the "reduce size" filter to the PDF did not
have any effect on its size.
I opened the PDF in Preview to look for jaggies and found none. I confirmed
that it is not a bitmap by searching for text (successfully) within the PDF.

Good trick!

Tom_R._in_OK said:
I tried the recommendation to format the Word document for any printer in
page set-up; this resulted in a PDF that was only 9 megs instead of 38 megs,
which is a great improvement but still seems unusually large (and still too
large to conveniently email files).

Mmm...

Tom_R._in_OK said:
Originally, the Word file was formatted for an HP LaserJet 2015 with PCL 6
installed.

Thanks for posting back. There are so many different set-ups out there,
it is hard to tell if one is posting bad advice.

9 from 38 is interesting. You don't have a PostScript printer available
to you?
Could you try something else? In the print dialog, probably under PDF,
can you see something like "save as postscript"? Do so, then open the
result in Mac's Preview.app and save that as PDF. If it lets you, how
big is that?
(That's my trick for convincing Word to use EPS illustrations instead
of their bitmap preview.) There is a faint chance that procedure will
leave behind whatever is causing that outrageous bloat.

This is mostly to satisfy my own morbid curiosity. If it does reduce
the pdf size, we can work together to find out what the real cause is.
I'm wondering if it follows a particular document, font or template
choice.
 
Daiya Mitchell

Elliott said:
Could you try something else? In the print dialog, probably under PDF,
can you see something like "save as postscript"? Do so, then open the
result in Mac's Preview.app and save that as PDF. If it lets you, how
big is that?
(That's my trick for convincing Word to use EPS illustrations instead
of their bitmap preview.) There is a faint chance that procedure will
leave behind whatever is causing that outrageous bloat.

While experimenting:

Word 2008 also has an option to Save As PDF, in the File | Save As
dialog (change Format to PDF), NOT the File | Print dialog. That might
produce different results, so try that too.

I'd also be inclined to run the document through the corruption fixes and
see if the rebuilt doc still produces the large PDF.
http://word.mvps.org/Mac/DocumentCorruption.html
 
CyberTaz

Just another vote in favor of Daiya's description - The size, structure &
extent of revision of the 6 tables could very well account for corruption
contributing to the output file size.

Another thought: Has the Track Changes feature ever been used in that file?

Regards |:>)
Bob Jones
[MVP] Office:Mac
 
Elliott Roper

CyberTaz said:
Just another vote in favor of Daiya's description - The size, structure &
extent of revision of the 6 tables could very well account for corruption
contributing to the output file size.

Another thought: Has the Track Changes feature ever been used in that file?

I kept *most* mousy quiet about table corruption and track changes
bloat because I don't use Word 2008. Silly me. I had assumed .docx
meant no more corrupted detritus sloshing about in the bilges after
track changes and tables were invoked.

In any case, what possible mechanism could there be for that stuff
finding its way to a PDF?

How naive of me!

All is for the best, in the best of all possible Words.

Elliott "Pangloss" Roper.
 
Tom_R._in_OK

Okay, wow, thanks for all the suggestions.

(Note, I wrote this in order of things I tried, but the best results appear to have to do with the corrupt document idea, which is at the bottom of my post.)

First, an update on description of file in question.

I had forgotten that there were more than 6 tables. There are 6 moderately large tables (5 columns, between 15-45 or 50 rows), and many small tables of 3 columns by 5 or 6 rows. Probably around 50-60 of the smaller tables. There is one section break.

Second, a report on the suggested experiments.

PRINTING TO POSTSCRIPT

Printing to postscript resulted in two separate files - filename.ps and filename.2.ps. These two files contained only 28 of the 44 pages of the original. Converting these pages to PDF resulted in two PDFs sized 14.2 and 9 Megs, respectively. The pages missing were 16 pages immediately preceding the section break. The 2nd file began with the section break.

SAVING AS PDF

Saving as PDF, as opposed to Printing as PDF from the Print dialog box, resulted in two separate PDF files, containing all of the 44 pages between them. These files were sized 8.7 and 29.7 Megs, respectively.

CHECKING FOR CORRUPTION

Saving as a webpage then converting back lost significant formatting, so I didn't bother trying to PDF it.

Copying all but the last paragraph symbol and pasting into a new document, and then printing to PDF, resulted in two PDFs containing all 44 pages; these files were only 156 KB and 316 KB, respectively. I lost a little bit of formatting (certain styles had been modified, and it appears the styles reverted to the Microsoft-defined versions).

So, it looks like the file was probably corrupt. But now there's the issue of the Word document creating two PDF documents. That's a minor inconvenience, but I can easily put them back together using a program I have called PDFpen.
 
Elliott Roper

Tom_R._in_OK said:
Okay, wow, thanks for all the suggestions.

Excellent report. Thanks for that!

<snip>

So it *was* a corrupted document. That should not have made it through
to what was printed. That is a bug on top of a bug.

About the 2 pdfs. That is a famous bug which happens whenever a section
break changes page margins or orientation.

Your solution (PDF post-production putting humpty-dumpty back together
again) is the only rational course until *that* bug is fixed.

Some time before hell freezes over by the look of it.
 
CyberTaz

Hello Elliott -

Interjections below:


Elliott said:
I kept *most* mousy quiet about table corruption and track changes
bloat because I don't use Word 2008. Silly me. I had assumed .docx
meant no more corrupted detritus sloshing about in the bilges after
track changes and tables were invoked.

As is my understanding as well [pregnant pause while awaiting the inevitable
"but..."]

Elliott said:
In any case, what possible mechanism could there be for that stuff
finding its way to a PDF?

Well, for one thing it wasn't explicitly stated that the doc originated in
2008. It may have been, in fact, a .doc in Compatibility Mode. There's also
no guarantee that much of the table content wasn't copied from other
sources. And, although the .docx format should avoid *generating* the crud
there's nothing to say that it can't carry it along once it's dumped in. I
don't know of anything which suggests that saving a .doc in .docx "purges
it" of all sins & afflictions visited upon it in the past.

Elliott said:
How naive of me!

Not really - you were just pursuing from a different perspective :)

Elliott said:
All is for the best, in the best of all possible Words.

And sometimes you just make a lucky guess ;-)))

Regards |:>)
Bob Jones
[MVP] Office:Mac
 
CyberTaz

Yes Tom - I certainly join with Mr. Roper in commending you for the generous
contribution of your findings. I'm sure the information will be useful to
many who may have a similar experience.

Regards |:>)
Bob Jones
[MVP] Office:Mac
 
Daiya Mitchell

Tom_R._in_OK said:
So, it looks like the file was probably corrupt. But now there's the issue of the Word document creating two PDF documents. That's a minor inconvenience, but I can easily put them back together using a program I have called PDFpen.

Indeed, thanks for the report and identifying a new sign of corruption.
Just FYI--Preview in Leopard lets you join separate PDF files.

Tables can corrupt, by the way, so with 60 tables--yeah, corrupt would
have been our first guess. :)

It's possible that one of the Paste Options would have saved your
formatting when you did the Paste-Without-Last-Paragraph-Mark.
 
