Scanned text exported to Word


trizab

Version: 2008
Operating System: Mac OS X 10.6 (Snow Leopard)
Processor: Intel

Using an Epson 4990 scanner and Acrobat 8 Pro's OCR, I can export scanned text to a Word document wonderfully. However, the text ends up enclosed in Frames, sometimes many, sometimes few. Right-clicking a Frame brings up a contextual menu that includes Format Frame. Clicking that opens the Frame formatting box, which includes a Remove Frame button that works.
Alternatively, I can copy and paste the contents of the Frames into another Word document.
Either way, this can be very tedious and time-consuming, as there may be many Frames of enclosed text.
I'd like to eliminate the Frames from the export altogether. I have looked for a preference or setting to do this but haven't found anything yet.

Any suggestions would be appreciated.
 

CyberTaz

Unfortunately, that's how OCR works. I haven't used the Acrobat facility
itself, but I would imagine that you aren't scanning a standard business
letter or comparable pages of plain text. Documents that have structured
layouts and/or contain graphic objects are a different story. The
only way the text content can be extracted from those documents is in
'chunks' of consecutive text, and the only way those chunks can be presented
in Word is in Frames, one Frame per chunk.

There would probably be a VBA solution if Office 2008 supported VBA, but it
doesn't, so that isn't an option. I don't know whether something can be done
with AppleScript, but you may get some additional responses.
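For anyone comfortable with scripting outside Word, here is a minimal, hypothetical sketch in Python. It assumes the Acrobat export has first been re-saved from Word 2008 as a .docx file (Office Open XML). In that format a Word frame is stored as a `w:framePr` element inside a paragraph's properties, so deleting those elements should have roughly the same effect as clicking Remove Frame on each one. This is not an Acrobat or Word feature; the function names `strip_frames` and `strip_frames_docx` are made up for illustration, and the script only edits the main document part (`word/document.xml`), so frames defined in styles, and text boxes (a different kind of object), are left alone. Always run it on a copy.

```python
import re
import zipfile

# Matches a paragraph's frame properties element, either self-closing
# (<w:framePr .../>) or written with an explicit closing tag.
FRAME_PR = re.compile(rb"<w:framePr\b[^>]*?(?:/>|>.*?</w:framePr>)", re.DOTALL)


def strip_frames(xml: bytes) -> bytes:
    """Return the document XML with every w:framePr element removed."""
    return FRAME_PR.sub(b"", xml)


def strip_frames_docx(src: str, dst: str) -> int:
    """Copy the .docx at src to dst, removing frame properties from
    word/document.xml. Returns the number of frames removed."""
    removed = 0
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                # subn returns (new_bytes, number_of_substitutions)
                data, removed = FRAME_PR.subn(b"", data)
            # Reusing the original ZipInfo keeps each entry's name and settings.
            zout.writestr(item, data)
    return removed
```

Usage would be something like `strip_frames_docx("scan.docx", "scan-noframes.docx")` (hypothetical file names). A text-level edit is used instead of parsing and re-serializing the XML, because re-serializing can rename namespace prefixes in ways Word may reject.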
 

trizab

Thank you for your prompt response.
The source material I've been scanning contains lists of individual sentences, some with broken structure, others complete. It is all in B&W, with no graphics or images. I'm scanning at 300 dpi, which, in combination with Acrobat, gives almost error-free OCR. The Framed text has been the only nuisance.
Always looking for the expedient way to do things, I appreciate your confirmation that the Frames just have to be dealt with.

Tricia
 

Randy Singer (MacAttorney)

On Jan 27, 6:49 am, (e-mail address removed) wrote:

While Acrobat Pro is a very powerful (and expensive) application, it
isn't the best OCR program for the Macintosh.

I use OmniPage Pro for OCR, and it is head and shoulders above any
other OCR program for the Macintosh. It maintains all formatting and
can create a perfect Word document.

While OmniPage Pro X has a suggested retail price of around $500:

OmniPage Pro X
http://WWW.NUANCE.COM/imaging/omnipage/omnipage-macintosh.asp

....I found out about this semi-secret deal, direct from Nuance!:

OmniPage Pro X for $99!
<http://shop.nuance.com/store/nuanceus/pd/productID.111905000/OfferID.1177248709?ClickID=ce7qv4axzwpsnkls7aqixllkxsqzaxxvikq&resid=lngSwwoBAkgAAE0or50AAAAV&rests=1249365496052>

or

http://is.gd/21Hwx

The downsides to OmniPage Pro for the Macintosh are that since the
program was purchased from Caere by Nuance, support is now non-existent.
If you have a problem with the program, you can forget
about getting any sort of useful advice from Nuance.

Some folks have had a hard time getting OmniPage Pro installed. The
work-around is to boot up as "root" and to install the program.

The above is a pain, but OmniPage is worth the trouble if you really
need a good OCR program.
 

trizab

Thanks for the suggestion. A light version of OmniPage was included with a long-gone scanner. It didn't work well enough for me; I spent too much time correcting. I tried opening it recently and it wouldn't open, so I uninstalled it.
In my search for a cheap OCR program I found a few, but the one that worked best was VelOCRaptor (http://velocraptor.com/). It's shareware, so I could try it before buying it ($29.00), which I didn't.
Acrobat 8 Pro is included in the CS3 Design Standard I bought a few years back. Its OCR is wonderfully accurate. The Frames issue when exporting to Word is a bother, but I've found somewhat expedient workarounds. When I had a glitch with losing formatting, I learned about the Special options in Find/Replace through Bob Jones.
I've got what I need for what I'm doing, but I'll consider your suggestion when I need more.
 

John McGhie

Yeah, the issue you will always come up against is the same as you will get
attempting to convert from PDF: there is not enough "information" on the
page to build a proper, fully-functional Word document.

OmniPage Pro is perhaps the best of a bad job, in that what it gives you
will PRINT pretty much the same as the original.

However, when you try to "edit" the thing, or use the text somewhere else,
it's a disaster area: full of frames and lines and boxes and with text in
strange pieces.

Generally, I would simply recognise the text using whatever was the cheapest
OCR I could find. Then save the result as "Plain Text", which will get rid
of all the formatting entirely.

Then simply use Word's built-in abilities to re-create the document. You
can reformat a 100-page document from plain text to camera-ready final print
production standard in less than an hour if you have Word set up correctly:
that's what it is designed for.

But trying to keep the other guy's formatting and layout is an exercise in
frustration: the OCR program has to guess how he did that, and it can never
guess completely accurately.

Cheers


This email is my business email -- Please do not email me about forum
matters unless you intend to pay!

--

John McGhie, Microsoft MVP (Word, Mac Word), Consultant Technical Writer,
McGhie Information Engineering Pty Ltd
Sydney, Australia. | Ph: +61 (0)4 1209 1410
 
