WORD document size limit

Beyond X

A few weeks ago (July 8) I posted about the problems that occurred
to my Word document, apparently because of its large size. Suzanne (MS
MVP) kindly responded that the document size MS Word can handle is
limited to 32 MB. Considering that today's HDD sizes exceed 100 GB and
memory sizes exceed 10 GB (and a CD holds 700 MB), I am sure there are
more than a few folks who would build a document far exceeding a mere
32 MB. MS Word offers a way around the 32 MB limit by splitting the
document into "subdocuments" and managing them with a "Master document",
but that is cumbersome enough to deter constant editing and
cross-referencing within the documents; in short, the concept seems
ancient. For example, a number of dictionaries far larger than 50 MB are
sold on CDs. Of course we cannot edit them as they are, but it does not
seem too difficult to write software that lets us edit a dictionary,
for instance, stored on an HDD, with WORD-like software.
Is there any effort going on at MS to expand (if not eliminate) the 32
MB limit? Does anyone know of any word-processor-like software that
permits a document of, say, 100 MB?
 
macropod

Hi Beyond X,

The maximum file size is limited to 32MB for the total document text only and does not include graphics, regardless of how the
graphics image is inserted (Link to file, Save with document, or Wrapping style) into the document. Therefore, if the file contains
graphics, the maximum file size can be larger than 32MB. See: http://support.microsoft.com/kb/211489

A 32MB document equates to 32*1048576 = 33,554,432 characters (at one byte per character). At 1,000 words per page and 5 characters
per word (plus a space), getting to the character limit would require a document of almost 5,600 pages. Even if the character limit
includes the formatting metadata, you'd probably still end up with a document of more than 5,000 pages before the limit is reached.
I think your concerns overstate the significance of the limit.
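macropod's back-of-envelope arithmetic can be checked with a few lines of Python. This is a sketch using only the post's own assumptions (one byte per character, 1,000 words per page, five letters plus a space per word); `pages_at_limit` is a hypothetical helper, not anything Word provides.

```python
# Assumptions taken from the post: 1 byte per character,
# 1,000 words per page, and 5 letters + 1 space = 6 characters per word.
MB = 1024 * 1024  # bytes in a (binary) megabyte


def pages_at_limit(limit_mb=32, words_per_page=1000, chars_per_word=6):
    """Return (characters, pages) needed to fill a text limit of limit_mb."""
    chars = limit_mb * MB  # one byte per character
    pages = chars / (words_per_page * chars_per_word)
    return chars, pages


chars, pages = pages_at_limit()
print(f"{chars:,} characters, about {pages:,.0f} pages")
# prints "33,554,432 characters, about 5,592 pages"
```

Halving `words_per_page` to 500 doubles the page estimate, so a less dense page only makes the limit harder to reach.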

The "Master document" 'feature' has been widely reported as unreliable and prone to corrupting all of the documents it's associated
with. If you really need to link multiple documents, Word has much more stable tools, including INCLUDETEXT and RD fields.
 
roccogrand

If I may comment,

If you are trying to build a dictionary, why not use MS Access or SQL
Server? Access databases can be 2 GB and you can link them. And based on
what I have read, a full-blown SQL Server database is limited only by the
memory on your system.
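As a minimal sketch of the database approach David suggests, the snippet below uses SQLite through Python's built-in `sqlite3` module. SQLite is a swapped-in stand-in chosen only because it ships with Python, not Access or SQL Server, and the `entries` table and its columns are hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per headword. An in-memory database is
# used for the demo; pass a file path instead for a real dictionary.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (headword TEXT PRIMARY KEY, definition TEXT)")

conn.executemany(
    "INSERT INTO entries (headword, definition) VALUES (?, ?)",
    [("abacus", "A frame with beads for counting."),
     ("abandon", "To leave behind.")],
)

# Editing one entry touches one row, not a multi-megabyte document.
conn.execute(
    "UPDATE entries SET definition = ? WHERE headword = ?",
    ("To give up completely.", "abandon"),
)
conn.commit()

row = conn.execute(
    "SELECT definition FROM entries WHERE headword = ?", ("abandon",)
).fetchone()
print(row[0])  # To give up completely.
```

The point of the design is that each entry is edited and looked up on its own, so the total size of the dictionary never has to be loaded or saved as one stream of text.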

Your observation explains why my large documents slow to a crawl and crash
when they become more than 25 MB, especially with lots of graphics. Thanks.

HTH.

David
 
Bob Buckland ?:-)

For some of what's going on for future Microsoft products you may want to visit http://research.microsoft.com :)

To add to macropod's reply: the size of the document on disk isn't the only consideration. With Word 2007 and its new file format,
which compresses files during save, the 32 MB text limit may not be easily discernible by simply looking at file size. The size of
the content in memory, the speed of the processor, the memory in your graphics card and more all affect being able to load the file
into memory for faster processing (i.e. so it wouldn't be swapping in and out to disk as you scrolled, or searched, or jumped to an
endnote, for example). Word also has to take into account being used by folks on a wide range of hardware and internet connection
speeds.
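Because a Word 2007 .docx is a ZIP package, Python's standard `zipfile` module can show how far the on-disk size understates the uncompressed content. This is a sketch: `part_sizes` is a hypothetical helper, and the demo builds a stand-in package with one repetitive XML part rather than a real Word document.

```python
import io
import zipfile


def part_sizes(docx):
    """Map each part of a .docx package to (uncompressed, compressed) bytes.

    A .docx is a ZIP package, so zipfile can open it directly;
    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as zf:
        return {info.filename: (info.file_size, info.compress_size)
                for info in zf.infolist()}


# Demo: a stand-in package (not a valid Word document) whose single
# part holds repetitive XML, much like real document markup.
SNIPPET = "<w:p><w:r><w:t>text</w:t></w:r></w:p>"
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("word/document.xml", SNIPPET * 50_000)

for name, (raw, packed) in part_sizes(buf).items():
    print(f"{name}: {raw:,} bytes uncompressed, {packed:,} bytes on disk")
```

Pointing `part_sizes` at a real .docx path would list every part (text, styles, images), which makes it easier to see where the bulk of a large file actually lives.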

Uploading or downloading a file that size to or from a shared workspace or network drive can be an issue if you don't have a fast
connection (and how fast does it need to be for it to be 'fast'? <g>). You'd be limited in the email destinations that could accept
it (although many email programs can break files down and then reassemble them).

For memory considerations, while a 3"x3" jpg picture might be as small as 1KB on disk, the 'memory size' when it's opened for
display in your document could be 263K. While that may not seem like a lot in '2GB of RAM', not all of that space is available for
manipulating a single document.
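A rough way to see why a small JPEG can take far more memory once displayed: decoded, it becomes an uncompressed bitmap of width × height × bytes per pixel. The dpi and bytes-per-pixel values below are illustrative assumptions, not Word's actual internal representation, so the results won't necessarily match the 263K figure above; `decoded_bitmap_bytes` is a hypothetical helper.

```python
def decoded_bitmap_bytes(width_in, height_in, dpi=96, bytes_per_pixel=3):
    """Rough size of an uncompressed bitmap once a picture is decoded.

    Assumptions (illustrative only): the picture is rendered at `dpi`
    and held as `bytes_per_pixel` bytes per pixel (3 = 24-bit RGB).
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


for dpi, bpp in [(96, 3), (96, 4), (150, 3)]:
    kb = decoded_bitmap_bytes(3, 3, dpi, bpp) / 1024
    print(f"3in x 3in at {dpi} dpi, {bpp} bytes/pixel: about {kb:,.0f} KB")
```

Whatever the exact settings, the decoded size is in the hundreds of kilobytes, while the JPEG on disk can be a few kilobytes.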

If you take a look at some of the features in Word, SharePoint and document management and workflows, I'm guessing that what will
often be more common than creating (by typing) single, extra-large documents would be rapid document assembly/publishing. Take as an
example the specification MS created for Office Open XML. Part 4, as a .docx, is 14,810KB in size. It has 5,255 pages and,
according to Word's word count, 1,145,697 'words'. Double-clicking the file to open it took, in one test, about 1:45 until it first
appeared. Having Word then repaginate it to the current printer (Word doesn't 'think' in terms of pages) and jump to the end of the
document the first time took almost an additional 2 minutes, with the CPU generally spiked at 100% most of that time (and that's with
the check-spelling-as-you-type features turned off <g>), and without loading into memory any of the building block galleries such as
Insert=>QuickParts (which don't load until first use). When saved as a Word 97-2003 .doc (legacy) file, that same file lost the use
of these Word 2007 features:

4 content controls,
896 equations [converted to pictures],
5 alignment tabs and
72 SmartArt graphics [converted to pictures]

and became 62,052KB. It took (on a local, non-network/shared drive) 4+ minutes to save, and again tied up the computer so it couldn't
do much else. (The machine isn't the latest and greatest, but it's a not atypical work machine.)

Making a change to a style used throughout the document isn't instantaneous either (i.e. it's not a very enjoyable experience if
you had to work on this document, in Word, every day <g>).

Microsoft, using and creating various developer tools to assemble all of the parts of the spec from its various database and work
files (edited and maintained by a team of folks), produced all 5 parts of the document (not just the largest, Part 4) in less
time than it took Word to save the .doc version of the one part <g> (that's pulling the data/content together, applying
formatting and generating final documents).

You can find out more through content on places such as http://channel9.msdn.com

Tools such as IBM's SoDA (a bit of it is shown here)
http://www.ibm.com/developerworks/rational/library/jan07/karlsen_johnson
can be used to generate large documents with differing formatting from single content parts much more quickly than even minor
changes might take to review and print in a single, large Word document.

Proofing tools are often not regular documents, but structured database systems that can be fairly complex in their structure and
not easily edited without specialized tools.

Word as the user interface may not be the best tool for working with large structured files with multiple contributors. With the
new XML-based document formats, even folks at Microsoft are not always using Word to manipulate, build, or format Word 'documents'.
Word document files are, in effect, database structures, and database files can and do exist in not just gigabyte but terabyte
sizes, but you wouldn't edit them as a single, really long 'page' (text stream) presented to you (which is, in effect, what Word does
show you) :) For Excel, Microsoft Office 2007 products include server products to assist with working on really large spreadsheets
with complex, processor/memory-hungry calculations. There may be more of that in the next version of Word. For this version, being
able to work through SharePoint supports more of the 'work on parts, assemble later' approach.

As to Word's 'Master/Sub' document feature, its lineage comes from being able to move users and their documents from DOS WordPerfect
to Word and retain similar features. It may be more stable with .docx files in Word 2007, but my guess is that in a future version
we'd see a tool that takes more advantage of the ability to work with the parts of the new file formats in a different, friendlier
way.

--

Bob Buckland ?:)
MS Office System Products MVP

*Courtesy is not expensive and can pay big dividends*
 
Beyond X

Thanks a lot for your comments.
My Word document currently runs to:
2,677 pages
360,735 words
1,956,455 characters
These numbers seem far below the corresponding numbers you quoted.
However, Explorer indicates the document size is 31,896 KB, which is
about 32 MB and close to the limit. Can you explain this discrepancy in
the document size? (My document includes some jpg graphics, but not
many.) Am I still safe to continue?
 
macropod

Hi Beyond X,

In addition to the text, Word needs to hold a mass of 'metadata' about the formatting of every paragraph, all the image data for the
jpegs, and various other bits of information about the document (eg all the document properties, creation & print dates, author,
etc.). Consequently, the actual text can be much less than the overall file size.
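To put numbers on this using Beyond X's own figures, a short sketch can estimate what share of the 31,896 KB file the text itself accounts for. The per-character cost is an assumption, so the sketch tries both 1 and 2 bytes per character; `text_share` is a hypothetical helper.

```python
def text_share(chars, file_kb, bytes_per_char=1):
    """Fraction of a file's size taken up by its plain text.

    Assumption: each character costs `bytes_per_char` bytes
    (1 for 8-bit text, 2 for 16-bit Unicode); Word's storage may differ.
    """
    return chars * bytes_per_char / (file_kb * 1024)


# Figures Beyond X reported earlier in the thread.
chars, file_kb = 1_956_455, 31_896
for bpc in (1, 2):
    share = text_share(chars, file_kb, bpc)
    print(f"{bpc} byte(s) per character: text is about {share:.1%} of the file")
# prints about 6.0% and 12.0%
```

Under either assumption, the great majority of the file is something other than the text itself.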
 
roccogrand

I am still curious why you need a 10 GB Word document. My dissertation was
six hundred pages and fit on one floppy. What kinds of documents do you
create that are so large?

David
 
Beyond X

Thanks for the quick response.
To get a picture of what is going on, I did some experiments, as
follows:
The current Word document (.doc) is, as I stated previously, about
32 MB.
I converted it to a Notepad document (.txt) and found the file size
shrank to 2.7 MB, or about 8%.
When a picture is saved as .doc using Word, the file size is 163 MB. If
it is saved as a .jpg file, the size is 20 KB. (To be sure, Notepad
cannot take graphics.) You say that inserted pictures do not count toward
the maximum size for a Word document, so the number of inserted pictures
should not matter anyway.
So I realize that Word provides minor (but perhaps very convenient for
some people) word-processing features at the expense of document size.

What would I miss if I worked on the document with Notepad (or perhaps
WordPad), besides the absence of pictures and fewer options in Search (Find)?
Not much, as long as Notepad permits a larger document size. Is this true?
 
macropod

Hi Beyond X,

For starters, you lose all capacity to format text with attributes like bold, italics, underline, colour, etc. and the ability to
format paragraphs as justified, centered, etc. With Notepad, there's no capacity to insert images, tables, autoshapes, etc., or to use
bookmarks, cross-references, footnotes or links to other files. The list goes on and on. Notepad is not a word processor; it's a text
editor, and all you get with it is the plain text. I think you'll also find that Notepad slows to a crawl with large text files.

Besides, when was the last time you created a 5,000 page document, consisting of nothing more than a narrative?
 
grammatim

Back when he was Marcel Proust, or Jules Romains, or Vardis Fisher, or
Anthony Powell, or Patrick O'Brian, or ...

Bob Buckland ?:-)

Hi Beyond X.,

I'm not sure I'm following part of your statistics. You mentioned that adding a single 20KB .JPG to a Word document caused the
Word document to grow to 163MB with that one insertion?

That would tend to indicate one or more of the following possibilities contributing to the size of the document, if the document is
not corrupted:

a. Pictures being inserted from Insert=>Object rather than Insert=>Picture

b. Pasting pictures in rather than inserting them.

c. Using 'Allow Fast Saves' under Tools=>Options=>Save

d. Saving in a file format other than 'Word document' (such as Word 6-95).

e. The Word document maintaining duplicate copies of each picture.
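For possibility e, if the document were saved as a .docx (a ZIP package), duplicate pictures could be spotted by hashing the parts stored under `word/media/`. This is only a sketch: `duplicate_media` is a hypothetical helper, the demo uses a stand-in package with fake image bytes, and a legacy .doc file is a different binary format that this approach cannot read.

```python
import hashlib
import io
import zipfile
from collections import defaultdict


def duplicate_media(docx):
    """Group identical picture parts in a .docx package by content hash.

    Only works for ZIP-based .docx files, not legacy binary .doc files.
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(docx) as zf:
        for name in zf.namelist():
            if name.startswith("word/media/"):
                digest = hashlib.sha256(zf.read(name)).hexdigest()
                groups[digest].append(name)
    # Keep only content that appears under more than one part name.
    return [names for names in groups.values() if len(names) > 1]


# Demo: a stand-in package holding two identical "pictures".
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/media/image1.jpeg", b"\xff\xd8fake-jpeg-bytes")
    zf.writestr("word/media/image2.jpeg", b"\xff\xd8fake-jpeg-bytes")
    zf.writestr("word/media/image3.png", b"\x89PNGother-bytes")
print(duplicate_media(buf))
# [['word/media/image1.jpeg', 'word/media/image2.jpeg']]
```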

What version of Word are you using for this document?
