Thursday, August 28, 2008

Indexing PDF documents with Adobe Reader v8.0 and Search Server Express 2008

Search Server Express 2008 does not index PDF files out of the box. You need an ifilter for PDF, which is installed automatically when you install Adobe Acrobat Reader v8.0. (Note: this will only index PDFs created in Adobe 8.0 and earlier.)

Steps to get PDF indexing working with SSE 2008:

1. On your server, download and install Adobe Acrobat Reader v8.0.
2. Go to WSS admin and open the SSE 2008 shared services admin page. Under 'File Types', add the pdf extension.
3. Go to
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\{ANYGUID}\Gather\Search\Extensions\ExtensionList
Find the highest number in the list (should be 38), add a new entry with the next number, and enter pdf for the value data.


Then modify the following registry keys by changing their 'Default' value to the new CLSID of the Adobe Reader v8.0 ifilter:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf\Default - set to {E8978DA6-047F-4E3D-9C78-CDBE46041603}

*** Note: if your version of Adobe Reader is 8.x rather than 8.0, you will need a different CLSID here ***

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf\Default - set to {E8978DA6-047F-4E3D-9C78-CDBE46041603}

4. Add the installation directory of Adobe Reader v8.0 to the Path variable under 'Environment Variables' in the System part of the Windows Server Control Panel:
"c:\Program Files\Adobe\Reader 8.0\Reader" - this is important, as it tells SSE 2008 where to find the right DLLs.

5. Stop and restart the search service

net stop osearch
net start osearch
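
If you would rather not edit the two 'Default' values from step 3 by hand, they could be captured in a .reg file and imported with regedit. The Python sketch below only builds the text of such a file; it does not touch the registry. The function name build_reg_text is made up for this example, and the key paths and CLSID are copied exactly as written in this post, so check them against your own server before importing anything.

```python
# Sketch only: builds the text of a .reg file that sets the (Default)
# value of each .pdf filter key to the Adobe Reader 8.0 ifilter CLSID.
# Key paths and CLSID are copied from the post; verify them first.

ADOBE8_IFILTER_CLSID = "{E8978DA6-047F-4E3D-9C78-CDBE46041603}"

PDF_FILTER_KEYS = [
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf",
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf",
]


def build_reg_text(keys, clsid):
    """Return .reg file text that sets the default value of each key."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key in keys:
        lines.append("[" + key + "]")      # key names are not escaped
        lines.append('@="' + clsid + '"')  # "@" means the (Default) value
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Redirect the output to a file such as pdf-ifilter.reg (name is arbitrary).
    print(build_reg_text(PDF_FILTER_KEYS, ADOBE8_IFILTER_CLSID))
```

If your Reader version needs a different CLSID, pass that one in instead. Export the existing keys before importing so you can roll back.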

Test it by running a search; you should see PDFs and the first couple of lines of content from each. If you don't see the content from inside the PDFs, something is not set right.

Reboot the server if you are still not seeing it.
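
One thing worth checking when PDFs still do not show up is whether the Reader directory from step 4 really made it into the path. The sketch below is a small Python helper for that; path_contains is a name made up for this example, and it uses the standard ntpath module so the comparison ignores case, quotes and a trailing backslash. Note that it only inspects the PATH of the process running the script, which I am assuming matches the system path after a fresh logon.

```python
# Sketch only: check whether a directory appears in a Windows PATH string.
import ntpath
import os

# Directory from step 4 of this post.
READER_DIR = r"c:\Program Files\Adobe\Reader 8.0\Reader"


def path_contains(path_value, directory):
    """Return True if directory is one of the ';'-separated PATH entries."""
    wanted = ntpath.normcase(ntpath.normpath(directory))
    for entry in path_value.split(";"):
        entry = entry.strip().strip('"')  # tolerate spaces and quoted entries
        if entry and ntpath.normcase(ntpath.normpath(entry)) == wanted:
            return True
    return False


if __name__ == "__main__":
    print(path_contains(os.environ.get("PATH", ""), READER_DIR))
```

Run it from a newly opened command prompt, since consoles that were already open keep the old environment.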

NOTE: you will need to completely recrawl all your content sources first! I would reset the index to clear SSE out and recrawl everything from scratch.
