Mac Forum / Applications / Word / June 2008



Spotlight/Finder

indago - 28 Jun 2008 14:11 GMT
I am using a PowerMac G4 with OS X 10.4.11 Tiger and 1 GB of memory.  I did
some experimenting with Spotlight and Finder and found what I believe to be
a problem.  I compile news articles on a particular subject into a single
document, one after another in chronological order; one such document is
entitled FinancialMarkets/Global.  I use Microsoft Word, in Office 2001.
When a document reaches around 95K characters, I begin a new one.  If a
document reaches 100K, it doesn't register in the info frame at the bottom
of the window, so I keep them under this figure.  I label them
FinancialMarkets/GlobalA, FinancialMarkets/GlobalB, and so on.

The FinancialMarkets/Global document is at present around 32.8K, with
articles beginning around 1995.  I have been following the French trader
Kerviel, whose enormous losses in the marketplace stunned the French banking
industry, and I had compiled these articles in the FinancialMarkets/Global
document.  When I tried to find this document with Spotlight to add another
article to it, searching for the name kerviel, Spotlight didn't find it.  I
tried the methods of forced indexing and nothing worked.  Searching for
names of individuals that appear earlier in this document worked OK.
Finally, today, I put the name kerviel early in the document, around 20
characters in, and Spotlight found the document.  I moved the name further
into the document and it still worked OK.  I kept moving the name further in
until Spotlight didn't find it anymore.  The last place that Spotlight
worked was around 8.1K characters into the document; the first article with
the name Kerviel was at 9.1K characters.
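
The move-the-name-and-search-again experiment above could be done in one pass by planting many unique tokens at known depths. What follows is only a minimal sketch under stated assumptions: the function names, the "zqprobe" token prefix and the filler text are all made up for illustration, and the generated text would still have to be pasted into a Word document by hand before Spotlight indexes it.

```python
def build_probe_text(total_chars, step, filler="lorem "):
    """Return (text, markers); markers maps each unique token to its offset.

    A token such as "zqprobe0001005" is planted roughly every `step`
    characters, and the digits in the token are its own character offset.
    """
    parts = []
    markers = {}
    length = 0
    next_mark = 0
    while length < total_chars:
        if length >= next_mark:
            token = f"zqprobe{length:07d}"  # hypothetical, unlikely real word
            markers[token] = length
            parts.append(token + " ")
            length += len(token) + 1
            next_mark += step
        else:
            parts.append(filler)
            length += len(filler)
    return "".join(parts), markers


def estimate_cutoff(markers, found_tokens):
    """Return (deepest offset that was found, shallowest offset that was missed)."""
    found = [off for tok, off in markers.items() if tok in found_tokens]
    missed = [off for tok, off in markers.items() if tok not in found_tokens]
    return (max(found) if found else None, min(missed) if missed else None)


if __name__ == "__main__":
    text, markers = build_probe_text(total_chars=32000, step=500)
    with open("probe.txt", "w") as handle:
        handle.write(text)
    print(f"Wrote {len(text)} characters with {len(markers)} probe tokens")
```

After searching Spotlight for each token, passing the set of tokens that were found to estimate_cutoff brackets the depth at which indexing stops, to within one step.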

I tried this experiment in another document with another name, with the same
results.  Spotlight will index a document up to around 8.2K characters and
no further.  This is unacceptable given the way I have structured my
documents.  What can be done to correct this glaring deficiency?
CyberTaz - 28 Jun 2008 14:56 GMT
Certainly a well-tested matter, and I'm certain that your findings will be
both interesting & helpful to some of those who frequent this group, but I
believe you're preaching to the wrong congregation:-)

Both Finder & Spotlight are OS X features - neither MS nor any other
software developer determines how those features work. If there appears to
be a limit to how "deep" Spotlight searches into any given file, it is
established by Apple. Likewise, if the parameter can be adjusted, the setting
would be in the Preferences for Spotlight itself.

You may get some useful insights from those who frequent the Apple
Discussions, many of whom are quite technically oriented. They may have some
suggestions to offer which go beyond the typical user level. This link is
directly to the OS X 10.4.x Spotlight forum:

http://discussions.apple.com/forum.jspa?forumID=757

Regards |:>)
Bob Jones
[MVP] Office:Mac

indago - 29 Jun 2008 03:08 GMT
I've been to the discussion areas on the Apple forums at the Apple site and
posted this problem, but the only responses are more complaints from others
who are having problems with Spotlight, or the usual "fix" of forcing the
indexing.

On Usenet, on a Mac forum for OS X, I did get this response:

-------------------------------------------
It's not initially clear where the problem really lies. Gathering
information from a document to be indexed for Spotlight is the
responsibility of small chunks of plugin code called importers. So right up
front the first possibility is that the importer for Word documents is only
submitting the first 8k of the content to the Spotlight engine, and in that
case the vendor that wrote the plugin needs to fix it.

On the other hand, it's possible that when indexing content (which is
handled differently from other indexed metadata) the importer *is* giving
Spotlight everything and the engine is giving up after 8k.

From a single, very quick test here it looks like the former; I was just
able to find a 32k AppleWorks document by searching for a made up word that
only occurs at the end. The solution then is to complain to whatever company
provided the plugin (who I presume is Microsoft, but best to check these
things). In a terminal, mdimport -nd1 <filename> will output a message
telling you the path of the importer plugin used; from there, a Get Info
will likely identify the vendor.
---------------------------------------------
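
The importer check suggested in the quoted response could be wrapped in a small script. This is only a sketch: the flag string "-nd1" is copied as-is from the post and may not be accepted by the mdimport shipped with other OS X releases, the helper names are made up, and the output is returned unparsed because its exact format is not shown in this thread.

```python
import shutil
import subprocess
import sys


def build_mdimport_command(path):
    """Build the command quoted in the post: mdimport -nd1 <filename>."""
    # Assumption: "-nd1" is taken verbatim from the post (Tiger-era mdimport).
    return ["mdimport", "-nd1", path]


def run_importer_check(path):
    """Run the check and return its raw output, or None if mdimport is absent."""
    if shutil.which("mdimport") is None:
        return None  # not on a Mac, or mdimport is not on the PATH
    result = subprocess.run(
        build_mdimport_command(path), capture_output=True, text=True
    )
    # Debug messages may go to either stream, so return both.
    return result.stdout + result.stderr


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: importer_check.py <filename>")
    output = run_importer_check(sys.argv[1])
    print(output if output is not None else "mdimport not found on this system")
```

The path of the importer plugin should appear somewhere in the printed output; a Get Info on that plugin can then be used to identify the vendor, as the response describes.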
CyberTaz - 29 Jun 2008 06:43 GMT
Perhaps the .doc format wasn't constructed to provide more. However, I did
some quick checking with .docx format produced by Word 2008. Similar to the
test you cited I created a 172KB document comprising 195,126 characters
(249,156 with spaces) with a unique string at the very end. Spotlight had no
problem finding it instantly when that string was used for the search.
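
A test like this one can also be checked from a script rather than the Spotlight menu. The sketch below assumes the mdfind command-line tool is available and that Spotlight has already indexed the document; the helper names are hypothetical.

```python
import os
import shutil
import subprocess


def paths_from_mdfind(output):
    """Split mdfind output (one path per line) into a list of paths."""
    return [line for line in output.splitlines() if line.strip()]


def spotlight_finds(token, document):
    """Return True or False if mdfind ran, or None if mdfind is unavailable."""
    if shutil.which("mdfind") is None:
        return None
    target = os.path.realpath(document)
    folder = os.path.dirname(target)
    # -onlyin limits the search to the folder that holds the document.
    result = subprocess.run(
        ["mdfind", "-onlyin", folder, token], capture_output=True, text=True
    )
    hits = [os.path.realpath(p) for p in paths_from_mdfind(result.stdout)]
    return target in hits
```

Calling spotlight_finds with the unique string and the path of the saved document should return True when the importer passed that part of the text to Spotlight, and False when the string lies beyond whatever limit applies.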

Regards |:>)
Bob Jones
[MVP] Office:Mac

Elliott Roper - 30 Jun 2008 17:26 GMT
> Perhaps the .doc format wasn't constructed to provide more. However, I did
> some quick checking with .docx format produced by Word 2008. Similar to the
> test you cited I created a 172KB document comprising 195,126 characters
> (249,156 with spaces) with a unique string at the very end. Spotlight had no
> problem finding it instantly when that string was used for the search.

A long time ago I looked into this when Tiger's Spotlight was not
returning hits on Word docs (.doc format) when the first occurrence was
more than 200K characters from the beginning of the file.

I dimly recall that the author of the importer (Apple, Microsoft or
some other entity - I forget) had somewhere stated that was deliberate
design.

My solution then was to print to PDF and use either Preview's or
Spotlight's search (which of course was perfectly OK for the full length
of the PDF). Since both were an order of magnitude faster than Word's
own internal search of a single document, that was quite useful while
collaborating on a single large document with several versions. All the
PC-toting people would look to me in meetings to find references
several hundred pages into the damn thing.

Matters Spotlight have only got better with Leopard. I have no
first-hand knowledge of the state of the mdimporter for Word 2008 and
.docx files. Nor do I expect to gain such knowledge in the near future.

I retrieved the above 'collaboration' document from archive and I can
confirm that the Word 2004 mdimporter I currently have on this
Leopard machine still fails to find matches more than 30-odd pages into
it.

To de-mung my e-mail address:- fsnospam$elliott$$
PGP Fingerprint: 1A96 3CF7 637F 896B C810  E199 7E5C A9E4 8E59 E248

 