How Spotlight and Entourage Work Together

Andy Ruff

I'll be posting a revised version of this message to the Entourage weblog
sometime this week -- http://blogs.msdn.com/entourage. However, I wanted to
give the newsgroups a heads-up preview. Let me know if this helps and
whether there are areas I should elaborate upon.

-Andy

----

Spotlight is a blazing fast search engine integrated directly into your Mac.
It's a great tool, and the ability to search all your Entourage items within
Spotlight makes it, for me, absolutely "mission critical" to each day's
activities.

The recent release of Entourage includes ample documentation on how to use
Spotlight with Entourage, but it says very little about how the whole
process works. The following will be rather detailed, but I hope that
it will provide a good starting point for understanding why we designed the
feature the way we did and how you might go about troubleshooting problems
that may arise.

BASICS OF SPOTLIGHT

Understanding how Spotlight generally works is a prerequisite for grasping
how Entourage hooks into the Spotlight system. For our purposes, Spotlight
has three primary roles: watching for changes to a file, importing metadata
for a file, and allowing users to search all this metadata. Metadata is, in
short, the key descriptive information within a file. For songs, this
usually includes the title, the album, or the genre (all those fields you
see in iTunes). For e-mail messages, this includes the subject, the date
sent, the Entourage categories, and many other properties of your e-mail
message.

When a file is modified on your computer, Spotlight is notified that such a
change occurred. Spotlight keeps track of all these changes and will, at
some point, begin the process of extracting metadata from the modified
files. How soon it begins to extract metadata may vary. Apple cleverly
designed Spotlight to extract metadata only at times when it wouldn't
interfere with your ongoing use of the computer.

Once the Mac OS does kick off the extraction of metadata from a file, it
does so through a Spotlight Importer. Spotlight Importers are plug-ins for
the Mac OS that a developer provides specifically to make files created by
their applications searchable within Spotlight. Spotlight crawls through
its list of changed files, handing each one to the appropriate importer.
The importers then read the files, compile a list of metadata, and hand the
metadata back to Spotlight. At this point, the changed file is available
for searching within Spotlight.

For example, there's a Spotlight Importer for music files such as MP3s
(System/Library/Spotlight/Audio.mdimporter). When Spotlight is ready to
extract metadata from an MP3, Spotlight asks the music importer to read the
MP3 and provide any information about the MP3 that should be searchable
within Spotlight. The music importer quickly reads the music file and
responds with info like "Title = Ain't Life Grand" and "Author = Widespread
Panic." After all this is done, you can search for "Widespread" in
Spotlight and the corresponding MP3 file will show up in the results.
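
The flow described above (changed file, matching importer, metadata handed
back, searchable index) can be sketched in a few lines of Python. This is
only a toy model and not Apple's API: the names IMPORTERS, import_audio,
index_changed_files, and search are made up for illustration, and the
"importer" is handed the tags directly instead of parsing a real MP3.

```python
from pathlib import PurePath

# Toy stand-in for an audio importer. A real importer would parse the file;
# here the tags are passed in so the sketch stays self-contained.
def import_audio(path, raw_tags):
    return {"Title": raw_tags.get("title", ""), "Author": raw_tags.get("artist", "")}

# Hypothetical registry: file extension -> importer function.
IMPORTERS = {".mp3": import_audio}

def index_changed_files(changed, index):
    """Hand each changed file to its importer and store the returned metadata."""
    for path, raw in changed.items():
        importer = IMPORTERS.get(PurePath(path).suffix.lower())
        if importer is None:
            continue  # no importer installed for this file type
        index[path] = importer(path, raw)

def search(index, term):
    """Return paths whose metadata values contain the term (case-insensitive)."""
    term = term.lower()
    return sorted(
        path for path, meta in index.items()
        if any(term in str(value).lower() for value in meta.values())
    )

index = {}
changed = {
    "Music/aint_life_grand.mp3": {"title": "Ain't Life Grand", "artist": "Widespread Panic"},
    "Documents/notes.xyz": {},  # no importer for this type, so it is skipped
}
index_changed_files(changed, index)
print(search(index, "Widespread"))  # ['Music/aint_life_grand.mp3']
```

The file only becomes findable once its importer has run, which is why a
freshly changed file can be missing from results for a while.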


WHAT TOOK SO LONG?


When we announced our Spotlight feature at MacWorld this year, I noticed two
types of reactions among attendees: a giant smile followed by an "awesome,"
or a grumbled stare preceded by "about time" or "what took so long?"

The challenge of supporting Spotlight for Entourage primarily centered on
the way Spotlight, at its fundamentals, is designed. Spotlight is designed
around changes in files, not databases. I'm not knocking Apple's design --
as Entourage now demonstrates, there are very viable, simple solutions for
Mac applications facing this obstacle.

If you recall, Spotlight uses the change of a file to determine when it
needs to get new metadata for the search index. Entourage, however, stores
all information within a single database file (you can find yours inside
Documents/Microsoft User Data/Office 2004 Identities). Each time you get a
new message, create a contact, or delete an event, that single database file
is changed.

As with all files, Spotlight receives a notification that the Entourage
database file has changed. We could have written an importer that opened up
the database file, searched your database for the changes, and then provided
all the new metadata to Spotlight. In fact, we tried this experiment, and it
was incredibly taxing on your machine's performance and heavily prone to
causing problems.

Another option would have been to move away from the single file database.
This is, in fact, what Apple chose to do with Address Book, Mail, and iCal
in Mac OS 10.4. This is a long-running, sometimes heated discussion among
members of the Entourage team. There are merits and challenges to having a
single file database or splitting into a series of files for each item
within the database. We made the decision to stay with our current database
structure primarily because such a change, at this time, would prove highly
disruptive and risky. Remember, the goal here was to provide Spotlight
searching to users, not to re-architect the way Entourage stores data.

HOW DOES ENTOURAGE WORK WITH SPOTLIGHT?

When you enable Spotlight indexing within Entourage, a "cache" file is
created for each item within your Entourage database. If you have 100,000
e-mail messages in your Entourage database, 100,000 cache files will be
created. If you want to see the cache files, you can find them within your
Library/Caches/Metadata/Microsoft folder.

Each cache file contains all the metadata that will be needed for indexing
by Spotlight. All changes within Entourage are reflected in the cache files.
Create a new item and a new cache file will be created. Update an item and
its cache file will update. Delete an item and its cache file will be
deleted. With all these changes, Spotlight receives file change
notifications and eventually will send the modified cache files through
the import process using the Entourage Spotlight Importer.
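
The create/update/delete mirroring described above can be sketched roughly
as follows. This is a minimal illustration, not Entourage's actual
implementation: the CacheMirror class, the JSON file format, and the
"msg-1.json" naming are all assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path

class CacheMirror:
    """Keeps one small file per database item so that file-level change
    notifications map to individual items (hypothetical layout)."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, item_id):
        return self.cache_dir / f"{item_id}.json"  # file naming is an assumption

    def save(self, item_id, metadata):
        # Create or update: rewriting the file is what an indexer would notice.
        self._path(item_id).write_text(json.dumps(metadata))

    def delete(self, item_id):
        self._path(item_id).unlink(missing_ok=True)

    def item_ids(self):
        return sorted(p.stem for p in self.cache_dir.glob("*.json"))

with tempfile.TemporaryDirectory() as tmp:
    mirror = CacheMirror(tmp)
    mirror.save("msg-1", {"subject": "Lunch?", "categories": ["Personal"]})
    mirror.save("msg-2", {"subject": "Status report", "categories": ["Work"]})
    # Updating an item rewrites only that item's cache file.
    mirror.save("msg-1", {"subject": "Lunch?", "categories": ["Personal", "Friends"]})
    mirror.delete("msg-2")
    remaining = mirror.item_ids()
    updated = json.loads((Path(tmp) / "msg-1.json").read_text())

print(remaining)              # ['msg-1']
print(updated["categories"])  # ['Personal', 'Friends']
```

Because each item lives in its own file, a change notification already says
which item changed, and the importer never has to search the whole database.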

When you first turn on Spotlight in Entourage, this may take some time.
Entourage has to crawl through your entire database, reading each item and
creating the corresponding cache file. For a moderately sized database
(50,000 items), the process typically takes 20 minutes, though there are
many factors that cause this time to vary. Once the first set of cache files
is created, Entourage will update cache files almost instantaneously.
Typically, delays in Entourage items showing up in Spotlight results are due
to Spotlight waiting for idle time to index the modified cache files. If you
do a "Rebuild" within the Entourage Spotlight preferences, Entourage will
simply delete the previous cache files and kick off the crawling process
that regenerates all the Entourage cache files.
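
As a back-of-the-envelope reading of the figures above, the snippet below
turns "50,000 items in about 20 minutes" into a rate and extrapolates to
other database sizes. The linear scaling is purely an assumption for
illustration; as noted, many factors cause the real time to vary.

```python
# Figures from the text: about 50,000 items in about 20 minutes.
REFERENCE_ITEMS = 50_000
REFERENCE_MINUTES = 20

def items_per_second():
    """Average cache-file creation rate implied by the reference figures."""
    return REFERENCE_ITEMS / (REFERENCE_MINUTES * 60)

def estimated_minutes(item_count):
    """Linear extrapolation (an assumption; real times will vary)."""
    return item_count * REFERENCE_MINUTES / REFERENCE_ITEMS

print(round(items_per_second(), 1))  # 41.7
print(estimated_minutes(100_000))    # 40.0
```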

As Spotlight begins the import process, the cache files are handed one by
one to the Entourage Spotlight Importer. The importer reads the cache file
and, just as the music importer did with MP3s, provides all the relevant
metadata to Spotlight for searching. Once the importer provides this data,
the item is searchable via Spotlight. Again, keep in mind that Spotlight
determines when this indexing happens -- delays in Entourage items showing
up in Spotlight results most likely mean Spotlight has not yet indexed the
item.

We chose to be very liberal in the amount of metadata produced and provided
by Entourage's cache files. Not only did we try to align our metadata with
that of the equivalent Apple applications (e.g. where Apple Mail defined a
property as an e-mail subject, we used the same property), but we also
pumped out a lot more information such as categories and projects.
Essentially, our goal was to provide all properties of items accessible via
our AppleScript Dictionary as metadata properties for Spotlight searching.
What this means is, you can use Spotlight to do very powerful, fast queries
against nearly all data within your Entourage database. Try this: type the
name of one of your categories in the Mac OS Spotlight search field --
you'll see that all Entourage items within that category show up!
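
To illustrate why exposing extra properties such as categories and projects
pays off, here is a small sketch of searching item metadata either across
every attribute or within one attribute. The records and attribute names
("categories", "projects", and so on) are hypothetical and are not the real
Spotlight attribute keys.

```python
# Hypothetical metadata records, one per Entourage item.
ITEMS = [
    {"kind": "message", "subject": "Q3 budget", "categories": ["Work"], "projects": ["Budget"]},
    {"kind": "contact", "name": "Pat Lee", "categories": ["Work", "Family"], "projects": []},
    {"kind": "event", "subject": "Dentist", "categories": ["Personal"], "projects": []},
]

def values_of(item):
    """Flatten an item's metadata (including list values) into strings."""
    out = []
    for value in item.values():
        out.extend(value if isinstance(value, list) else [value])
    return [str(v) for v in out]

def find(term, attribute=None):
    """Match the term against one attribute, or against all of them."""
    term = term.lower()
    hits = []
    for item in ITEMS:
        if attribute is None:
            candidates = values_of(item)
        else:
            raw = item.get(attribute, [])
            candidates = raw if isinstance(raw, list) else [raw]
        if any(term in str(c).lower() for c in candidates):
            hits.append(item)
    return hits

print([i["kind"] for i in find("work")])                          # ['message', 'contact']
print([i["kind"] for i in find("budget", attribute="projects")])  # ['message']
```

Typing a category name finds messages, contacts, and events alike because
the category is just another searchable value on each item.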

This approach ends up being a very simple way to provide Spotlight searching
for Entourage. Its downside is that it consumes a bit more disk space, as it
essentially mirrors some contents of your Entourage database -- you've now
got both the Entourage database file and the associated cache files. We
looked at this for some time, investigating how much additional disk space
an average user would need and how often the two sources of Entourage data
would fall out of sync. We found that for a large number of users, the
additional disk consumption was relatively small (typically less than 15% of
the original database size) and, with performance efforts on our part, the
risk of getting the two out of sync was rather minimal.

SUMMARY

Spotlight's a very cool addition to the Mac OS -- it's amazingly fast, very
powerful, and incredibly handy. The Entourage team is excited to finally
get to share with you our efforts on supporting Spotlight for Entourage.
The project was an interesting technical challenge for us, but we hope that
we've provided a solution that makes searching Entourage items with
Spotlight a part of your daily routine.

-Andy

--
Andy Ruff
MacBU Program Management
Entourage Weblog: http://blogs.msdn.com/entourage/

This posting is provided "AS IS" with no warranties, and confers no rights.
 
somebodynew

I have a lot of old mail on my computer as Entourage archives in mbox
format. At this point, it does not seem like Entourage will index
those mbox files. Is there a way to enable or induce Spotlight
indexing of these older messages that do not reside within Entourage?
 
