Printouts are pictures
Unfortunately, yes.
AFAICS this design was not the best of all possible solutions.
An *Import* feature working on known file formats would IMO have been
a preferable solution. In the case of PDFs, e.g., a tool like the one used
in "ABBYY PDF Transformer" (which produces really well-formatted
output for WinWord [AFAICS based on ABBYY's expertise in OCR software])
would have been far better than sending text through a printer and
then re-creating the text by OCR. This seems a bit crazy to me.
So yes, they are of course OCRed.
But with what result?
A really bad one! (see below)
Right-click on one of those printouts and select copy all text.
Then paste the text somewhere else and take a look.
You can see for yourself then whether the quality of the OCR is
the issue or WDS.
Thanks for the suggestion!
It reveals how badly OCR is implemented in ON.
ON's OCR is the culprit, not WDS.
I.
1.) Sorry to say, but the OCR produces output that is hardly usable for a
search.
Unfortunately I cannot attach any files, so please bear with the longer
samples here:
a) Result of Copy+Paste in Acrobat:
0-110 Polizei 367 E106 Blessing, Peter, Dr.
0-112 Feuerwehr KÜN-190/156 C206 Bleyel, Bernd
0-19222 Rettungsleitstelle 318/467 D040 Bluthardt, Christian
A 221 A214 Bochert, Ralf, Dr.
367 E106 Ahrens, Uwe, Prof. 230/281/285 A304 Boelke, Klaus, Dr.
0-579796 A014 AISEC 393 E141 Boese, Jürgen, Dr.
263/264 B026 Akademisches Auslandsamt 280 C040 Böhm, Hugo
375 Y104 Albrecht, Tobias 326 C009 Bossack, Sandra
KÜN-137 A406 Albrecht, Wolfgang, Dr. 202 B007 Böttcher, Michael
432 F015 Asche, Gerd 449 Y006 Bouché, Daniel
207 A011 Asta (0-251460) 90 A013 Braner, Hannelore
0-506348 A012 Asta HN (Fax) 554 B001b Bräsel, Martina
KÜN-155 C105a Asta KÜN (KÜN-544756) 430 Z005 Bray, Laurent, Dr.
KÜN-53078 C105a Asta KÜN (Fax) KÜN-218 D110.1 Brazel, Christa
KÜN-208 A117 Auerbach, Achim 218 A204 Brecht, Ulrich, Dr.
288 C035 Aufenthaltsraum KÜN-211 D013.1 Breitenbacher, Manuel
640 A Aufzug 1-3 KÜN-166/167 C016 Breitkreuz, Ehrenfried
641 B Behindertenaufzug 260 B023 Brnic, Sonja
644 D Aufzug 321 D110 Brückner, Hans
646 E Aufzug 384 F129 Bucher, Georg, Dr.
645 F Aufzug 221 A214 Buer, Christian, Dr.
403 F222 Auth, Werner, Dr. KÜN-252 D219 Burk, Uwe
<<
Words are separated by blanks, so they are easy to index and to use in a
search.
b) Copy+Paste from ON (input from PDF via ON printer):
0-110Polizei 367E106Blessing, Peter, Dr.
0-112Feuerwehr KÜN-190/156C206Bleyel, Bernd
0-19222Rettungsleitstelle 318/467D040Bluthardt, Christian
A 221A214Bochert, Ralf, Dr.
367E106Ahrens, Uwe, Prof.230/281/285A304Boelke, Klaus, Dr.
0-579796A014AISEC393E141Boese, Jürgen, Dr.
263/264B026Akademisches Auslandsamt280C040Böhm, Hugo
375Y104Albrecht, Tobias 326C009 Bossack, Sandra
KÜN-137A406Albrecht, Wolfgang, Dr.202B007Böttcher, Michael
432F015Asche, Gerd 449Y006Bouché, Daniel
207A011Asta (0-251460)90A013Braner, Hannelore
0-506348A012Asta HN (Fax)554B001bBräsel, Martina
KÜN-155C105aAsta KÜN (KÜN-544756)430Z005Bray, Laurent, Dr.
KÜN-53078C105aAsta KÜN (Fax)KÜN-218D110.1Brazel, Christa
KÜN-208A117Auerbach, Achim218A204Brecht, Ulrich, Dr.
288C035Aufenthaltsraum KÜN-211D013.1Breitenbacher, Manuel
640AAufzug 1-3KÜN-166/167C016Breitkreuz, Ehrenfried
641BBehindertenaufzug260B023Brnic, Sonja
644DAufzug321D110Brückner, Hans
646EAufzug384F129Bucher, Georg, Dr.
645FAufzug221A214Buer, Christian, Dr.
403F222Auth, Werner, Dr. KÜN-252D219Burk, Uwe
<<
Words are separated only where a comma + blank (", ") occurs.
2.) I'm sure you'll agree that a search cannot work at all with
text material like that.
MOST URGENT fix needed.
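
To illustrate why the missing blanks matter, here is a minimal Python sketch. It assumes a word index that splits text on whitespace and matches whole tokens only; I do not know how ON or WDS actually tokenizes, so that is purely an assumption, and the helper names `tokens` and `finds` are made up for this example. The two sample lines are taken from the pasted output in a) and b).

```python
def tokens(line):
    """Split on whitespace, strip surrounding punctuation, lowercase.

    This is an ASSUMED indexing scheme, not the documented behaviour
    of ON or WDS.
    """
    return {t.strip(",.()").lower() for t in line.split()}


def finds(line, word):
    """True if `word` appears as a whole token in `line`."""
    return word.lower() in tokens(line)


# Same directory line, once pasted from Acrobat and once from ON.
acrobat_line = "0-112 Feuerwehr KÜN-190/156 C206 Bleyel, Bernd"
on_line = "0-112Feuerwehr KÜN-190/156C206Bleyel, Bernd"

for word in ("Feuerwehr", "C206", "Bleyel", "Bernd"):
    print(word, finds(acrobat_line, word), finds(on_line, word))
```

Under that assumption every word is found in the Acrobat text, while in the ON text only "Bernd" (the word after comma + blank) is found; "Feuerwehr", "C206" and "Bleyel" are glued into longer tokens and get lost.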
II.
Words separated by comma + blank are found by the search.
If there are multiple hits on a page, they are not shown in the
result list.
III.
While we are at it:
The search engine implemented in ON could be at least a bit better.
There are no options at all: neither truncated search
(wildcards) nor combined search using Boolean operators.
I would have expected that at least an "expert mode" would be provided
and that at least something like what Acrobat offers would be available in ON
(not to mention askSam's features).
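
To make clear what kind of options I mean, here is a rough Python sketch of a truncated (wildcard) search combined with Boolean operators. It is not how Acrobat, askSam or ON work internally; the names `has`, `AND`, `OR`, `NOT`, `search` and the two sample "pages" are hypothetical and only serve the illustration. Wildcards are handled by `fnmatchcase` from Python's standard library.

```python
from fnmatch import fnmatchcase


def page_tokens(text):
    """Whitespace tokens, punctuation stripped, lowercased (assumed scheme)."""
    return [t.strip(",.()").lower() for t in text.split()]


def has(pattern):
    """Predicate: some token on the page matches the wildcard pattern."""
    pat = pattern.lower()
    return lambda toks: any(fnmatchcase(t, pat) for t in toks)


def AND(*preds):
    return lambda toks: all(p(toks) for p in preds)


def OR(*preds):
    return lambda toks: any(p(toks) for p in preds)


def NOT(pred):
    return lambda toks: not pred(toks)


# Hypothetical pages; the text is taken from the Acrobat sample above.
pages = {
    "p1": "403 F222 Auth, Werner, Dr. KÜN-252 D219 Burk, Uwe",
    "p2": "640 A Aufzug 1-3 KÜN-166/167 C016 Breitkreuz, Ehrenfried",
}


def search(query):
    """Return the names of all pages whose tokens satisfy the query."""
    return [name for name, text in pages.items() if query(page_tokens(text))]


print(search(has("Au*")))                         # ['p1', 'p2']
print(search(AND(has("Au*"), NOT(has("Burk")))))  # ['p2']
```

Even something this simple would already allow queries like "everything starting with Au, but not on a page that mentions Burk".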
Although I would prefer to have the content of my PDFs in ON, I guess that
in order to be able to perform intelligent searches I will have to
stick with Acrobat for PDF-based material and askSam for other material
[siiiigh]
Rainald
(who is seriously disappointed)