> I am using a PowerMac G4 with OSX 10.4.11 Tiger. I have 1G memory. I did
> some experimenting with Spotlight and Finder and found what I believe to be
[quoted text clipped - 27 lines]
> This is unacceptable with the structure that I have compiled. What can be
> done to correct this glaring deficiency?
080628 8:56 - CyberTaz posted:
> Certainly a well-tested matter, and I'm certain that your findings will be
> both interesting & helpful to some of those who frequent this group, but I
[quoted text clipped - 50 lines]
>> This is unacceptable with the structure that I have compiled. What can be
>> done to correct this glaring deficiency?
I've been to the discussion areas on the Apple site and posted this
problem, but the only responses are more complaints from others who are
having problems with Spotlight, or the usual "fix" of forcing the
indexing.
On Usenet, on a Mac OS X forum, I did get this response:
-------------------------------------------
It's not initially clear where the problem really lies. Gathering
information from a document to be indexed for Spotlight is the
responsibility of small chunks of plugin code called importers. So right up
front the first possibility is that the importer for Word documents is only
submitting the first 8k of the content to the Spotlight engine, and in that
case the vendor that wrote the plugin needs to fix it.
On the other hand, it's possible that when indexing content (which is
handled differently from other indexed metadata) the importer *is* giving
Spotlight everything and the engine is giving up after 8k.
From a single, very quick test here it looks like the former; I was just
able to find a 32k AppleWorks document by searching for a made-up word that
only occurs at the end. The solution then is to complain to whatever company
provided the plugin (which I presume is Microsoft, but best to check these
things). In a terminal, mdimport -nd1 <filename> will output a message
telling you the path of the importer plugin used; from there, a Get Info
on the plugin will likely identify the vendor.
---------------------------------------------
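The test described in that response (a made-up word that occurs only at the end of a document of known size) can be set up in a repeatable way. What follows is only a rough sketch, not anything the poster supplied: the function names, the filler text and the "zq" marker prefix are all hypothetical. It writes plain-text files, so it exercises the plain-text importer rather than the Word one; for Word the same text would have to be pasted into a document and saved from Word itself. It also assumes the mdfind command-line tool is available and that Spotlight has had time to index the file before the search is run.

```python
import shutil
import subprocess
import uuid
from pathlib import Path


def make_probe_text(size_bytes, marker):
    """Return ASCII filler of size_bytes total, with marker as the last word."""
    filler = "lorem ipsum dolor sit amet "
    # Reserve room for the marker plus the space in front of it.
    body_len = max(0, size_bytes - len(marker) - 1)
    repeats = body_len // len(filler) + 1
    body = (filler * repeats)[:body_len]
    return body + " " + marker


def write_probe(directory, size_bytes):
    """Write a probe file into directory and return (path, marker)."""
    # Hypothetical marker format: "zq" plus random hex, unlikely to occur elsewhere.
    marker = "zq" + uuid.uuid4().hex[:10]
    path = Path(directory) / ("probe_%d.txt" % size_bytes)
    path.write_text(make_probe_text(size_bytes, marker), encoding="ascii")
    return path, marker


def spotlight_finds(path, marker):
    """Return True/False if mdfind can be run, or None if it is not installed.

    Indexing is asynchronous, so a False straight after writing the file
    may only mean that Spotlight has not caught up yet.
    """
    if shutil.which("mdfind") is None:
        return None
    result = subprocess.run(
        ["mdfind", "-onlyin", str(path.parent), marker],
        capture_output=True,
        text=True,
        check=False,
    )
    return str(path.resolve()) in result.stdout.splitlines()
```

Writing probes at a few sizes (say 4k, 16k, 64k and 256k) into a folder that Spotlight indexes, waiting a little, and then calling spotlight_finds on each should show roughly where, if anywhere, the content stops being searchable for that file type.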
CyberTaz - 29 Jun 2008 06:43 GMT
Perhaps the .doc format wasn't constructed to provide more. However, I did
some quick checking with .docx format produced by Word 2008. Similar to the
test you cited I created a 172KB document comprising 195,126 characters
(249,156 with spaces) with a unique string at the very end. Spotlight had no
problem finding it instantly when that string was used for the search.
Regards |:>)
Bob Jones
[MVP] Office:Mac
On 6/28/08 10:08 PM, in article C48C58CB.AFF1%indago@att.net, "indago" wrote:
> 080628 8:56 - CyberTaz posted:
>
[quoted text clipped - 80 lines]
> will likely identify the vendor.
> ---------------------------------------------
Elliott Roper - 30 Jun 2008 17:26 GMT
> Perhaps the .doc format wasn't constructed to provide more. However, I did
> some quick checking with .docx format produced by Word 2008. Similar to the
> test you cited I created a 172KB document comprising 195,126 characters
> (249,156 with spaces) with a unique string at the very end. Spotlight had no
> problem finding it instantly when that string was used for the search.
A long time ago I looked into this when Tiger's Spotlight was not
returning hits on Word docs (.doc format) when the first occurrence was
more than 200K characters from the beginning of the file.
I dimly recall that the author of the importer (Apple, Microsoft or
some other entity - I forget) had somewhere stated that was deliberate
design.
My solution then was to print to PDF and use either Preview's or
Spotlight's search (which of course worked for the full length
of the PDF). Since both were an order of magnitude faster than Word's
own internal search of a single document, that was quite useful while
collaborating on a single large document with several versions. All the
PC-toting people would look to me in meetings to find references
several hundred pages into the damn thing.
Matters Spotlight have only got better with Leopard. I have no
first-hand knowledge of the state of the mdi importer for Word 2008 and
.docx files. Nor do I expect to gain such knowledge in the near future.
I retrieved the above 'collaboration' document from archive and I can
confirm that the Word 2004 mdi importer I currently have on this
Leopard machine still fails to find matches more than 30-odd pages into
it.
