Can I automatically set all pictures to be enabled for text search?

Alex · Jul 12, 2006

Right click the picture, point at Make Text in Image Searchable, and choose
Disable. Then do this again and choose English. Unfortunately you will have
to do this individually for each picture.

Grant Robertson · Jul 12, 2006

Unfortunately you will have
to do this individually for each picture.

Please tell me that you won't really expect users who have been using the
beta to manually go through all of the possibly hundreds of pages with
potentially thousands of imported "printouts" and manually do this for
each and every one. I'm sorry, but isn't that what the heck computers are
for? It always amazes me when developers think it is not a big deal for
users to perform hundreds of tedious manual steps in software on a
computer when the computer could have just done all that work. At the
very least, please provide a separate power toy that will go through all
our OneNote files and reset this setting for each "printout." Otherwise,
you are essentially just throwing out one of the main reasons people
would have to use OneNote in the first place. The main thing you guys
constantly sell us on in your marketing. "The one place to put all our
notes where we will be able to _search_ and find everything." What good
does it do when half of a user's notebooks are unsearchable unless they
manually do all that work?

Rainald Taesler · Jul 12, 2006

Patrick,
thanks for keeping with me on this issue.

On Wednesday, 12 July 2006 at 17:12, Patrick Schmid wrote:

Does it use German for you?

Yes, it did.
As you might see from the samples in my last posting (below) the
Umlauts were recognized correctly.

Right-click and select make
searchable. Is German selected?

Yes it was.

OCR is in general extremely language dependent and
if English is selected there I am not surprised about this outcome.

Only if it comes to certain issues as recognizing words with Umlauts
etc.
Just the "ö" and "Ä" and the "ü" are treated differently:
"Böttcher" with English OCR shows as "BOttcher" and "Bräsel" became
"Brazel".

But as obvious in the samples I sent, the language used for OCR is
without any influence on the issue dealt with here.
Possibly with Chinese <g> the separation of words might be an issue.
But the serious mistake to not separate words but glue everything
together (pls have a closer look again on the samples) in languages
with Latin characters has absolutely nothing to do with how strings
are built. And in the case given it was just names and figures which
won't make sense to an OCR dictionary in any of the languages anyway
;-)
So this theory is not without a certain crudeness ;-).

But manual OCR brought the solution of the riddle. I've got it now:
It's a bug in the *automatic* indexing.

The sample b) was taken from a document which had not been treated
with manual OCR. The text behind the visual layer had been generated
automatically.

If the same document is treated with manual OCR, however, the result
is correct !!!

Sample c)
result of document after running *manual* OCR:

0-110 Polizei 367 E106 Blessing, Peter, Dr.
0-112 Feuerwehr KÜN-190/156 0206 Bleyel, Bernd
0-19222 Rettungsleitstelle 318/467 D040 Bluthardt, Christian
A 221 A214 Bochert, Ralf, Dr.
367 E106 Ahrens, Uwe, Prof. 230/281/285 A304 Boelke, Klaus, Dr.
0-579796 A014 AISEC 393 E141 Boese, Jürgen, Dr.
263/264 B026 Akademisches Auslandsamt 280 0040 Böhm, Hugo
375 Y104 Albrecht, Tobias 326 0009 Bossack, Sandra
KÜN-137 A406 Albrecht, Wolfgang, Dr. 202 B007 Böttcher, Michael
432 F015 Asche, Gerd 449 Y006 Bouch, Daniel

Now compare this to sample b) below.
I'm sure that thereafter you'll agree that there is a really serious
bug in the OCR engine which most urgently needs to be fixed in order
to make the result usable.

I would write it up and post it in "Connect" if I could get in :-( :-(

Thanks again for your suggestions.
Although it was the wrong direction <g>, experimenting on them brought
me to find out what really is the cause of the search problems.

Rainald
P.S. I would be grateful for your comments on my points II. and III.
(below)

How about things *printed* to ON?

Printouts are pictures

Unfortunately Yes.
AFAICS this construction was not the best of all possible
solutions. An *Import* feature working on known file formats
would IMO have been a preferable solution. In the case of PDFs,
for example, a tool like "ABBYY PDF Transformer" (which
produces really finely formatted output for WinWord [AFAICS based
on ABBYY's expertise in OCR software]) would have been way
better than sending text through a printer and then re-creating
the text by OCR. This seems a bit crazy to me.

So yes, they are of course OCRed.

But to which result?
A really bad one! (see below)

Right-click on one of those printouts and select copy all text.
Then paste the text somewhere else and take a look.
You can see for yourself then whether the quality of the OCR is
the issue or WDS.

Thanks for the suggestion!
It reveals how badly OCR is implemented in ON.

ON's OCR is the culprit, not WDS.

I.
1.) Sorry to say so: The OCR produces output hardly usable for a
search.
Unfortunately I cannot make any attachments, so pls permit
longer input here:

a) Result of Copy+Paste in Acrobat:

0-110 Polizei 367 E106 Blessing, Peter, Dr.
0-112 Feuerwehr KÜN-190/156 C206 Bleyel, Bernd
0-19222 Rettungsleitstelle 318/467 D040 Bluthardt, Christian
A 221 A214 Bochert, Ralf, Dr.
367 E106 Ahrens, Uwe, Prof. 230/281/285 A304 Boelke, Klaus, Dr.
0-579796 A014 AISEC 393 E141 Boese, Jürgen, Dr.
263/264 B026 Akademisches Auslandsamt 280 C040 Böhm, Hugo
375 Y104 Albrecht, Tobias 326 C009 Bossack, Sandra
KÜN-137 A406 Albrecht, Wolfgang, Dr. 202 B007 Böttcher, Michael
432 F015 Asche, Gerd 449 Y006 Bouché, Daniel
207 A011 Asta (0-251460) 90 A013 Braner, Hannelore
0-506348 A012 Asta HN (Fax) 554 B001b Bräsel, Martina
KÜN-155 C105a Asta KÜN (KÜN-544756) 430 Z005 Bray, Laurent, Dr.
KÜN-53078 C105a Asta KÜN (Fax) KÜN-218 D110.1 Brazel, Christa
KÜN-208 A117 Auerbach, Achim 218 A204 Brecht, Ulrich, Dr.
288 C035 Aufenthaltsraum KÜN-211 D013.1 Breitenbacher, Manuel
640 A Aufzug 1-3 KÜN-166/167 C016 Breitkreuz, Ehrenfried
641 B Behindertenaufzug 260 B023 Brnic, Sonja
644 D Aufzug 321 D110 Brückner, Hans
646 E Aufzug 384 F129 Bucher, Georg, Dr.
645 F Aufzug 221 A214 Buer, Christian, Dr.
403 F222 Auth, Werner, Dr. KÜN-252 D219 Burk, Uwe
<<
Words are separated by blanks. Easy to be indexed and used in a
search.

b) Copy+Paste from ON (input from PDF via ON printer)

0-110Polizei 367E106Blessing, Peter, Dr.
0-112Feuerwehr KÜN-190/156C206Bleyel, Bernd
0-19222Rettungsleitstelle 318/467D040Bluthardt, Christian
A 221A214Bochert, Ralf, Dr.
367E106Ahrens, Uwe, Prof.230/281/285A304Boelke, Klaus, Dr.
0-579796A014AISEC393E141Boese, Jürgen, Dr.
263/264B026Akademisches Auslandsamt280C040Böhm, Hugo
375Y104Albrecht, Tobias 326C009 Bossack, Sandra
KÜN-137A406Albrecht, Wolfgang, Dr.202B007Böttcher, Michael
432F015Asche, Gerd 449Y006Bouché, Daniel
207A011Asta (0-251460)90A013Braner, Hannelore
0-506348A012Asta HN (Fax)554B001bBräsel, Martina
KÜN-155C105aAsta KÜN (KÜN-544756)430Z005Bray, Laurent, Dr.
KÜN-53078C105aAsta KÜN (Fax)KÜN-218D110.1Brazel, Christa
KÜN-208A117Auerbach, Achim218A204Brecht, Ulrich, Dr.
288C035Aufenthaltsraum KÜN-211D013.1Breitenbacher, Manuel
640AAufzug 1-3KÜN-166/167C016Breitkreuz, Ehrenfried
641BBehindertenaufzug260B023Brnic, Sonja
644DAufzug321D110Brückner, Hans
646EAufzug384F129Bucher, Georg, Dr.
645FAufzug221A214Buer, Christian, Dr.
403F222Auth, Werner, Dr. KÜN-252D219Burk, Uwe
<<
Separation of words only if following comma +blank (", ").

2.) I'm sure that you'll agree that a search cannot work at all
with text material like that.
MOST URGENT fix needed.
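
To illustrate why the glued output in sample b) defeats a search, here is a minimal Python sketch. It assumes a naive indexer that splits text on whitespace and strips surrounding punctuation; how ON or WDS actually tokenize is not known from this thread, so `index_terms` and `found` are hypothetical helpers for illustration only.

```python
# First line of sample a) (Acrobat) and of sample b) (ON printout).
acrobat_line = "0-110 Polizei 367 E106 Blessing, Peter, Dr."
onenote_line = "0-110Polizei 367E106Blessing, Peter, Dr."


def index_terms(text):
    """Split on whitespace and strip '.' and ',' (an assumed, naive indexer)."""
    stripped = (tok.strip(".,").lower() for tok in text.split())
    return {tok for tok in stripped if tok}


def found(term, text):
    """True if the whole-word search term is among the indexed terms."""
    return term.lower() in index_terms(text)


for term in ("Polizei", "Blessing", "Peter"):
    print(term, found(term, acrobat_line), found(term, onenote_line))
# Polizei True False
# Blessing True False
# Peter True True
```

Under that assumption only "Peter", which follows a comma plus blank, stays findable in the ON text, which matches the behaviour described in point II.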

II.
Words separated by comma+blank are found in the search.
If there are multiple hits on a page the hits are not shown on
the list.

III.
As we are at it:
The search engine implemented in ON could be at least a bit
better. There are no options at all: neither truncated search
(wildcards) nor combined searches using Boolean operators. I
would have expected that at least an "expert mode" would be
provided and that at least something like what Acrobat offers
would be available in ON (not to mention askSam's features).
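
As a rough idea of what such an "expert mode" could mean, the following Python sketch combines truncated (wildcard) terms with Boolean AND / OR over lines taken from sample a). It is only an illustration built on the standard `fnmatch` module; `tokens`, `matches`, `search_all` and `search_any` are hypothetical names and say nothing about how ON, Acrobat or askSam implement their searches.

```python
from fnmatch import fnmatchcase

# Three lines taken from sample a).
lines = [
    "263/264 B026 Akademisches Auslandsamt 280 C040 Böhm, Hugo",
    "KÜN-137 A406 Albrecht, Wolfgang, Dr. 202 B007 Böttcher, Michael",
    "375 Y104 Albrecht, Tobias 326 C009 Bossack, Sandra",
]


def tokens(line):
    """Lower-cased words with surrounding punctuation removed."""
    stripped = (tok.strip(".,()").lower() for tok in line.split())
    return [tok for tok in stripped if tok]


def matches(line, pattern):
    """True if any word matches a wildcard pattern ('*' and '?')."""
    return any(fnmatchcase(tok, pattern.lower()) for tok in tokens(line))


def search_all(lines, patterns):
    """Boolean AND: every pattern must match some word of the line."""
    return [ln for ln in lines if all(matches(ln, p) for p in patterns)]


def search_any(lines, patterns):
    """Boolean OR: at least one pattern must match some word of the line."""
    return [ln for ln in lines if any(matches(ln, p) for p in patterns)]


print(search_all(lines, ["bö*", "albrecht"]))  # only the Böttcher line
print(search_any(lines, ["bö*"]))              # the Böhm and Böttcher lines
```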

Although I would prefer to have things from PDFs in ON, I guess
that in order to be able to perform intelligent searches I will
have to stick with Acrobat for PDFfed material and askSam for
other material [siiiigh]

Rainald
(who is seriously disappointed)

Rainald Taesler · Jul 12, 2006

Alex <MS> <MS> shared these words of wisdom:

Alex,
would you pls be so kind as to check my last two postings in this
thread?

I detected a most serious bug of the OCR engine which makes the
results of "printed" input unusable for a search.

TIA
Rainald

Rainald Taesler · Jul 12, 2006

Grant Robertson said:
[...] What good does it
do when half of a user's notebooks are unsearchable unless they
manually do all that work?

I'm with you!

Rainald

Patrick Schmid · Jul 12, 2006

Only if it comes to certain issues as recognizing words with Umlauts
etc.
Just the "ö" and "Ä" and the "ü" are treated differently:
"Böttcher" with English OCR shows as "BOttcher" and "Bräsel" became
"Brazel".

But as obvious in the samples I sent, the language used for OCR is
without any influence on the issue dealt with here.
Possibly with Chinese <g> the separation of words might be an issue.
But the serious mistake to not separate words but glue everything
together (pls have a closer look again on the samples) in languages
with Latin characters has absolutely nothing to do with how strings
are built. And in the case given it was just names and figures which
won't make sense to an OCR dictionary in any of the languages anyway
;-)
So this theory is not without a certain crudeness ;-).

Sorry to disappoint you. The way you describe OCR hasn't really made it
out of research labs yet. OCR is completely language dependent, and not
just for individual characters. For example in your case, it might not
have separated the words, because it didn't recognize them as words. I
have a professor who does research in this area and believe me, this is
really a hard problem.

But manual OCR brought the solution of the riddle. I've got it now:
It's a bug in the *automatic* indexing.

Manual OCR meaning you selected the language manually?

The sample b) was taken from a document which had not been treated
with manual OCR. The text behind the visual layer had been generated
automatically.

If the same document is treated with manual OCR, however, the result
is correct !!!

Interesting. I suppose it used the wrong language then.

Now compare this to sample b) below.
I'm sure that thereafter you'll agree that there is a really serious
bug in the OCR engine which most urgently needs to be fixed in order
to make the result usable.

I would write it up and post it in "Connect" if I could get in :-( :-(

See the post from Alex a bit earlier. You need to sign up for the
OneNote site

Thanks again for your suggestions.
Although it was the wrong direction <g>, experimenting on them brought
me to find out what really is the cause of the search problems.

Rainald
P.S. I would be grateful for your comments on my points II. and III.
(below)

II: UI issue. Hit the small little arrows in the yellow bar above the
list and it'll jump through all of them, including the ones on the same
page.

III: Outlook supports advanced search terms. Maybe ON does too and we
just don't know the syntax?

Patrick Schmid
--------------
http://pschmid.net

Patrick Schmid · Jul 12, 2006

Grant Robertson said:

Please tell me that you won't really expect users who have been using the
beta to manually go through all of the possibly hundreds of pages with
potentially thousands of imported "printouts" and manually do this for
each and every one. I'm sorry, but isn't that what the heck computers are

Yes, why would they not expect you to do that? You are dealing with beta
software. Issues like that are to be expected and you have to deal with
them yourself. Microsoft will definitely not write a powertoy for you,
because they will point to the following that you should have read
before using the beta:
# Beta testers may experience problems with 2007 Microsoft Office system
Beta 2 products that could potentially result in loss, corruption, or
destruction of existing data.
# This beta testing release is not appropriate for production use.
# Beta code is offered "as is," and does not include technical support.

Sorry, but in this case you are on your own. If this had happened with a
released version, I'd be on your side and screaming for a tool. But with
beta software?
E.g. prior to B2, printouts ended up in ON sometimes with horrible
quality. I still have those in my notebooks as a consequence of using
beta software.
Those are just the issues you have to swallow and deal with. Consider
those issues to be the price tag of beta software.

Patrick Schmid

Rainald Taesler · Jul 12, 2006

Sorry to disappoint you. The way you describe OCR hasn't really
made it out of research labs yet. OCR is completely language
dependent, and not just for individual characters.

Obviously a misunderstanding.
Could it be that you did not read through this - unfortunately long -
branch of the thread carefully enough? Could easily happen with the
huge workload on your shoulders ...

I was *NOT* talking of individual characters.
I just wanted to show you to what extent the language influences the
character recognition.

For example in your case, it might not have separated the words,
because it didn't recognize them as words.

Sorry, NO!!
This obviously was not the situation in the case given.
Please check samples b) and c) again.
Added: Naturally prior to my last posting I carefully compared the
output in English and in German.
There was no influence of the language. Period.

I have a professor who does
research in this area and believe me, this is really a hard
problem.

No doubt about that. But let's get back to the topic.

Manual OCR meaning you selected the language manually?

Not just the *language*!
I just did what I had been told to do:
"Right-click and select make searchable".
I played with using different languages too. Ran a whole series of
tests with importing the document several times and trying every
combination. But - again - *language* is not an issue here.

Interesting. I suppose it used the wrong language then.

No. No. No!
I repeat: it has nothing to do with the language.

The results were absolutely the same with using German and with using
English (except Umlaut issues).
The automatic OCR swallows the blanks (except ", "), manual OCR does
not do this and treats each item individually keeping the blanks
(regardless of German or English)
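
The claim that automatic and manual OCR differ only in the blanks can be checked mechanically. The Python sketch below compares the first line of sample b) with the first line of sample c); `without_blanks` and `swallowed_blanks` are hypothetical helper names, not part of ON.

```python
automatic = "0-110Polizei 367E106Blessing, Peter, Dr."   # sample b), first line
manual = "0-110 Polizei 367 E106 Blessing, Peter, Dr."   # sample c), first line


def without_blanks(text):
    """Remove all whitespace so that only the recognized characters remain."""
    return "".join(text.split())


def swallowed_blanks(reference, candidate):
    """Number of blanks missing from candidate, or None if the two
    texts differ in more than just whitespace."""
    if without_blanks(reference) != without_blanks(candidate):
        return None
    return reference.count(" ") - candidate.count(" ")


print(swallowed_blanks(manual, automatic))      # 3
print(swallowed_blanks("Böttcher", "BOttcher"))  # None (a character differs)
```

For this line the recognized characters are identical and three blanks are missing, whereas an Umlaut problem such as "BOttcher" shows up as a character difference instead.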

Please, dear Patrick, jump off your "language" theory and re-read what
I had reported.
I would be too sad if you just would not see what is the case.

See the post from Alex a bit earlier. You need to sign up for the
OneNote site

Yes, saw it. And will try it later. But some things getting
unnecessarily lengthy get in the way ;-) ;-)

II: UI issue. Hit the small little arrows in the yellow bar
above the list and it'll jump through all of them, including the
ones on the same page.

Misunderstanding again. [siiiigh]
I was not talking of the hits marked in the *documents*.
I said "the hits are not shown on the list".
"List" means:
- if during/after a search one clicks on "View list" (field yellow,
item bearing a hyperlink), in the "Task area" (Aufgabenbereich) a list
of the matches is opened;
- only a part of the hits appears there.

III: Outlook supports advanced search terms. Maybe ON does too
and we just don't know the syntax?

Only a problem of the syntax?
I do not think so. At least there is no UI component for an
intelligent search.
And I tried it with the usual wildcards - to no avail.

As this is really important for me: Would you pls be so kind as to ask
through your backroom channels if there might perhaps be something in
the making?

Rainald

Patrick Schmid · Jul 12, 2006

Manual OCR meaning you selected the language manually?

Not just the *language*!
I just did what I had been told to do:
"Right-click and select make searchable".
I played with using different languages too. Ran a whole series of
tests with importing the document several times and trying every
combination. But - again - *language* is not an issue here.

Gotcha

Misunderstanding again. [siiiigh]
I was not talking of the hits marked in the *documents*.
I said "the hits are not shown on the list".
"List" means:
- if during/after a search one clicks on "View list" (field yellow,
item bearing a hyperlink), in the "Task area" (Aufgabenbereich) a list
of the matches is opened;
- only a part of the hits appears there.

I know. I was pointing out to you how to still be able to get to each
hit on a page. I am aware that there is only one hit per page shown in
the list. That's why I said UI issue. WDS does find every entry on each
page, but the UI doesn't display it.

Only a problem of the syntax?
I do not think so. At least there is no UI component for an
intelligent search.
And I tried it with the usual wildcards - to no avail.

As this is really important for me: Would you pls be so kind as to ask
through your backroom channels if there might perhaps be something in
the making?

I'll see what I can do.

Patrick Schmid

Rainald Taesler · Jul 13, 2006

Thanks for staying with me.

Gotcha

Great! This gives my heart relief ;-)
So you'll now also share my view that this is not just an annoying bug
but one of the killers?

I know. I was pointing out to you how to still be able to get to
each hit on a page. I am aware that there is only one hit per
page shown in the list. That's why I said UI issue. WDS does
find every entry on each page, but the UI doesn't display it.

Exactly!
And no weight either etc., etc.
I'd really have expected a lot more [siiiigh]

I'll see what I can do.

Many thanks
Rainald

Patrick Schmid · Jul 13, 2006

Great! This gives my heart relief ;-)

So you'll now also share my view that this is not just an annoying bug
but one of the killers?

Actually, I am still curious about one thing. If you set the submenu to
automatic versus English/German, does it do it wrong then or right?

Patrick Schmid

Rainald Taesler · Jul 13, 2006

Actually, I am still curious about one thing. If you set the
submenu to automatic versus English/German, does it do it wrong
then or right?

As I said (several times):
The language does not matter at all.

Just automatic OCR vs. manual OCR.

To quote from my previous posting of tonight, 0:20 AM:

The results were absolutely the same with using German and with using
English (except Umlaut issues).
The automatic OCR swallows the blanks (except ", "), manual OCR does
not do this and treats each item individually keeping the blanks
(regardless of German or English)
<<

Clear enough now?

Rainald

Patrick Schmid · Jul 13, 2006

I am still trying to understand what you mean with automatic vs manual
OCR.
If I print to ON, and then right-click on it, go into Make Text in Image
Searchable, there is a language selected (in my case English). I don't
actually have to set it. To explain why I am also so pressing on the
language issue: That submenu gives you the option of Disable (which I
assume means no OCR) or you pick a language to OCR in.
So how exactly do you differentiate between automatic and manual? Where
do you set each one?

From what I understand so far, you mean automatic OCR occurs when you
just print something to ON vs. manual OCR which occurs when you pick a
language from that submenu?
If that is the case, what did the submenu show as selected item straight
after printing to ON? Did it show English or German, or something
completely different? If it showed English or German, then the bug is
somewhere else than if it didn't show either one.
Am I making sense?

Patrick Schmid

Grant Robertson · Jul 13, 2006

pds- said:
Yes, why would they not expect you to do that? You are dealing with beta
software. Issues like that are to be expected and you have to deal with
them yourself. Microsoft will definitely not write a powertoy for you,
because they will point to the following that you should have read
before using the beta:
# Beta testers may experience problems with 2007 Microsoft Office system
Beta 2 products that could potentially result in loss, corruption, or
destruction of existing data.

Actually, I was mostly speaking for the thousands of others who are
probably not using the newsgroup but are busily adding printouts to ON 07
as we speak.

I understand your point about beta but a pissed off customer who has lost
tons of functionality is still a pissed off customer. Regular people, who
don't pay much attention to EULAs, are probably the vast majority of beta
users right now. Would I, as a business owner, really want to tell my
thousands of pissed off customers, "So sad, too bad! I told you so!", and
expect to keep them as happy customers? I don't think so.

Patrick Schmid · Jul 13, 2006

Unfortunately the concept of a "beta" has eroded in recent years due to
Google's forever betas (and notable ones by Microsoft and others as
well). That means that users nowadays think that beta software is just a
different way of distributing free software. Then they get real beta
software like Office 2007, and are surprised when they encounter bugs,
hassles, etc.
A public beta user isn't really a customer in Microsoft's eyes, as he or
she hasn't paid yet for the software. Keep in mind that Microsoft is
labeling this as preview (in form of Beta 2), but not actually as beta
testing. The real beta testing is conducted by a group of around 10,000
people worldwide and MS is providing support to them. However, there
hasn't been a single instance since November in which MS provided any
tool or anything besides workaround steps to address beta related
issues. I don't see them changing that now for something that doesn't
corrupt your data (if MS did something to mass-corrupt data, they'd
bring out a tool probably. But they try very hard that the data
corrupting versions don't get outside of MS).
OneNote is actually somewhat of an exception. As the user base of ON is
rather small (even within the 10,000 tech beta members), the ON team
decided to accept actual bug submissions from any user, not just tech
beta ones. In addition, MS people are replying in the ON newsgroup.
Neither of the two is available for any other Office 2007 beta product.
So ON 2007 users are more beta testers than just previewing the program.
That doesn't get them any different treatment though compared to tech
beta members, and we don't get any special tools...

Patrick Schmid

Grant Robertson · Jul 13, 2006

pds- said:
A public beta user isn't really a customer in Microsoft's eyes, as he or
she hasn't paid yet for the software.

Well, I was going to say that MS should think of these as the most
important customers, the ones who are most interested in buying the
product when it comes out. But then I realized, these people are probably
going to buy the product when it comes out anyway and MS knows it. What I
imagine they are most concerned about are the millions more regular
people who would never even try a beta. I'm guessing MS is more concerned
with issues that would prevent those people from buying the product once
they had looked at a demo, tried a demo version or had a friend show it
to them. Unfortunately for us this means they are only truly concerned
with superficial look and feel issues rather than long term usability
issues that a power user would be most interested in.

Patrick Schmid · Jul 13, 2006

Here is another angle: Why should MS be concerned about all the users
who are using ON 2007 for production work right now? By the time ON is
released, those users will have accumulated half a year's worth of notes
in 2007. As those can't be converted back to ON 2003, those users are
essentially locked into ON 2007.
So, the ones using the beta right now are already guaranteed customers
come the release date... No need to keep them particularly happy.

Patrick Schmid

Rainald Taesler · Jul 13, 2006

Hello Patrick,

I am still trying to understand what you mean with automatic vs
manual OCR.

The basic "definition" problem (which we both know from our academic
background <bg>).
1. Automatic OCR:
Performed by ON automatically on importing images
(a) by cut+paste
(b) by printing with the ON printer driver

2. Manual OCR:
Executed by selecting "Make ... Searchable".

If I print to ON, and then right-click on it, go into Make Text
in Image Searchable, there is a language selected (in my case
English). I don't actually have to set it.

Exactly.
On my system it defaults to German.

But one can select a different language, and the results are slightly
different:
* Umlauts / accents
* Text (words) taken from a graphical (image) part of the original PDF
differ to a certain extent.

The "Make ... Searchable" tool opens a msgbox with a progress bar
whilst it works.
It does not fire when the language selected is the same one used by the
"automatic" recognition.
For example: manual OCR only works with German selected after it has
been run with a different language before.

To explain why I am also so pressing on the language issue: That
submenu gives you the option of Disable (which I assume means no
OCR)

Right assumption.
Selecting this option shuts off OCR.

or you pick a language to OCR in. So how exactly do you
differentiate between automatic and
manual? Where do you set each one?

"Automatic" is done - as the word says <g>- *automatically*.
One just does nothing at all.
The system runs OCR automatically in the background without even
telling you that it does so.
It runs recognition on each and every image imported into ON or printed
to ON.

From what I understand so far, you mean automatic OCR occurs
when you just print something to ON vs. manual OCR which occurs
when you pick a language from that submenu?

You've got it, finally!

If that is the case, what did the submenu show as selected item
straight after printing to ON?

Did it show English or German, or something completely different?

If it showed English or German, then the bug is somewhere else than
if it didn't show either one.

For sure. That's what I have been saying/explaining from the very
beginning of the discussion in this branch of the thread.

Am I making sense?

Yes.
But still you were sticking with your old language theory<!g>
Why won't you believe me? As said numerous times, the difference simply
lies between the "automatic" and the "manual" recognition.
There's a dramatic difference in the results (as I had shown with the
text samples).

I have put together a notebook with the details and put it on my
FTP-server:
ftp://ftp.hs-heilbronn.de/vdb/onenote/

Regards
Rainald

Patrick Schmid · Jul 13, 2006

Now that I finally got it, have you tried submitting it as bug on
connect?

Patrick Schmid

Rainald Taesler · Jul 13, 2006

Now that I finally got it, have you tried submitting it as bug on
connect?

Not yet.
I thought that I should wait until you might have found the time
to go through my sample and check that it's understandable and
error-free.

Rainald
