I'm specifying an internal, non-SharePoint web site as a content source,
e.g. http://internal.example.com
However, a full crawl only ever crawls 22 pages in the root of the web site (there are 100), unless I select 'Custom - specify page depth and server hops:' and leave both 'Page Depth' and 'Limit Server Hops' set to Unlimited.
That then makes the crawl go mental!
Update: I'm using MS Search Server Express 2008
From stackoverflow
-
You might want to specify exactly what tool/technology you are using to do this crawling. Also, have you tried something other than unlimited, and what are the results?
-
Wget is pretty smart. Here is a command line I use to recursively snapshot sites.
wget -r -k -K --no-parent http://internal.example.com/
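If an unbounded crawl is the worry, wget's recursion depth can be capped as well. A minimal sketch, assuming a depth of 2 is enough (that number is a guess, not something from the question): build_snapshot_cmd is a hypothetical helper that only prints the long-option form of the command above with --level added, so it can be checked before running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): prints, but does not run,
# a depth-limited version of the snapshot command.
# The default depth of 2 is an illustrative assumption.
build_snapshot_cmd() {
    url="$1"
    depth="${2:-2}"
    # --recursive        = -r  follow links
    # --level=N          = -l  stop after N levels of links
    # --convert-links    = -k  rewrite links for local viewing
    # --backup-converted = -K  keep a .orig copy of each converted file
    # --no-parent              never ascend above the starting directory
    printf 'wget --recursive --level=%s --convert-links --backup-converted --no-parent %s\n' "$depth" "$url"
}

build_snapshot_cmd "http://internal.example.com/" 2
```

When --level is left out, wget falls back to its own default depth of 5. This only affects wget; it says nothing about how Search Server counts page depth.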