The James Joyce idea isn't that far off... you have to divine the gestalt
and feel dk's pain...
It sounds like the OCR program that's being used to convert the scanner
output to editable text is making a mess of the result. Many OCR programs
try to match the positioning of text on the original page by putting chunks
of text into frames, which then float in the resulting Word document. The
problem with this is that all of the frames on a page are usually anchored
to the first paragraph on the page, so there's no permanent record of the
order in which the frames should appear. Any sort of processing other than
very simple editing or printing causes the frames to move around, and often
go out of order. Saving to RTF and reopening the document is certainly going
to do that.
One possibility -- not a good one -- is to insert a separate empty text
paragraph for each frame and move the anchors to those paragraphs, in the
order they should appear. Since the frames are still floating, though, they
can still move out of position. A better solution is to cut the text from
the frames and paste it into plain text paragraphs, then delete the frames.
Unfortunately, Word offers no quick, built-in way to automate this tedious job.
(Yes, there's a Remove Frame button in the Format Frame dialog, but it will
dump the frame's contents wherever the frame's anchor happens to be. This is
not a solution.)
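
The one mechanical part of that job is deciding the order in which the frames' text should end up. As a rough sketch only: if each frame's page number, position and text could somehow be listed (an assumption; nothing above says the OCR program or Word will hand you that list), then sorting top to bottom and left to right gives the paragraph order. The Frame record, the reading_order and to_plain_paragraphs helpers, and the 6-point line_tolerance are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical record for one floating frame (not a Word API object)."""
    page: int    # page the frame sits on
    top: float   # distance from the top of the page, e.g. in points
    left: float  # distance from the left edge of the page
    text: str    # text the frame contains


def reading_order(frames, line_tolerance=6.0):
    """Sort frames page by page, top to bottom, then left to right.

    Frames whose tops lie within line_tolerance of the first frame in a
    row are treated as being on the same line and ordered by left edge.
    """
    ordered = sorted(frames, key=lambda f: (f.page, f.top, f.left))
    result, row = [], []
    for f in ordered:
        new_row = row and (f.page != row[0].page
                           or f.top - row[0].top > line_tolerance)
        if new_row:
            # Flush the finished row, left to right.
            result.extend(sorted(row, key=lambda r: r.left))
            row = []
        row.append(f)
    result.extend(sorted(row, key=lambda r: r.left))
    return result


def to_plain_paragraphs(frames):
    """Join the frames' text in reading order, one plain paragraph each."""
    return "\n\n".join(f.text.strip() for f in reading_order(frames))


if __name__ == "__main__":
    sample = [
        Frame(1, 300, 72, "Body text."),
        Frame(1, 98, 300, "Date"),
        Frame(1, 100, 72, "Title"),
    ]
    print(to_plain_paragraphs(sample))
```

This simple row-by-row ordering would get a multi-column page wrong, so the result still needs checking by eye against the scan.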
Possibly there's some control in the OCR program that lets you turn off the
"match original page position" effect and get unframed plain text to start
with. If so, that would be the best solution.