I'm working with a long (150-page), gnarly document that was scanned in as a
Word document. I'm using Word 2003. How do I delete all formatting in this
original document? Most specifically, all column settings and all page
breaks?
In the Replace dialog (Edit > Replace, or Ctrl+H), click the More
button. Click in the Find What box and then click the Special button.
Select "Manual Page Break". This puts the code ^m in the Find What box.
Leave the Replace With box blank, and click Replace All. That will
remove all manual page breaks.
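
For illustration only: Word stores a manual page break as character
code 12 (a form feed), which is what ^m matches. If you ever work with
the text outside Word, the same replace can be sketched in Python. This
assumes the document was exported as plain text and that the export
kept the form feed characters; the function name is made up for the
example.

```python
# Assumption: the page breaks survive in the exported text as form feeds.
FORM_FEED = "\x0c"  # character code 12, Word's manual page break


def remove_page_breaks(text: str) -> str:
    """Delete every form feed, like Replace All with an empty Replace With box."""
    return text.replace(FORM_FEED, "")


sample = "End of page one." + FORM_FEED + "Start of page two."
cleaned = remove_page_breaks(sample)
```

As in Word, the break is simply deleted, so the text on either side of
it runs together unless there was already a paragraph mark there.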
Select the entire document (Ctrl+A), go to the Columns dialog
(Format > Columns), and choose "One".
To remove all direct (non-style) paragraph formatting, you can select
everything and press Ctrl+Q.
To remove all direct font formatting, select everything and press
Ctrl+spacebar.
If you just want to flatten the whole document to Normal style, select
all and press Ctrl+Shift+N.
There's probably another problem that's not so easily dealt with.
Scanning/OCR software often puts blocks of text into text boxes to try
to maintain absolute position. Although it's possible to write a macro
to get the text out of the boxes, it often isn't possible for the
macro to know where to put the text, so it winds up as a worse jumble
than the original.
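
To see why, here is a minimal Python sketch of the obvious strategy: read
the boxes top to bottom, then left to right. The TextBox type, the
coordinates, and the two-column page are all hypothetical; this is not
how any particular OCR package or macro works, just a model of the
problem.

```python
from typing import List, NamedTuple


class TextBox(NamedTuple):
    top: float   # distance from the top of the page (hypothetical units)
    left: float  # distance from the left edge of the page
    text: str


def naive_reading_order(boxes: List[TextBox]) -> List[str]:
    """Order boxes top to bottom, then left to right, ignoring columns."""
    ordered = sorted(boxes, key=lambda box: (box.top, box.left))
    return [box.text for box in ordered]


# A made-up two-column page: the reader expects all of column 1 first.
page = [
    TextBox(100, 50, "Column 1, paragraph 1"),
    TextBox(220, 50, "Column 1, paragraph 2"),
    TextBox(100, 320, "Column 2, paragraph 1"),
    TextBox(220, 320, "Column 2, paragraph 2"),
]

result = naive_reading_order(page)
```

Here the position-based order interleaves the two columns (column 1
paragraph 1, column 2 paragraph 1, column 1 paragraph 2, and so on),
which is exactly the kind of jumble described above. Sorting by column
first would fix this page but break any page where a box spans both
columns, so no single rule works everywhere.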