The Spider Engine is a data fetching and analysis software module that can be used to track information sources such as web sites or document repositories. The Engine's function is to periodically monitor a target information source and report any changes in its content. These reports can be delivered as simple email alerts pushed to the user, or integrated into a larger Business Intelligence (BI) or Knowledge Management (KM) system.
For example, consider a competitor's web site that contains information of interest, such as press releases or new product announcements. The Spider Engine can be configured to track the competitor's site so that an email alert is generated whenever a new press release or product information is posted. This allows an analyst to track several competitor sites without having to manually visit and review each one on a regular basis.
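The change check behind such an alert can be illustrated with a minimal sketch. It assumes the page content has already been fetched as text, and it detects a change by comparing content fingerprints between visits. The function names and the in-memory `last_seen` store are hypothetical and are not part of the Spider Engine's actual interface.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Return a SHA-256 hex digest of the page content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def check_for_change(last_seen: dict, url: str, content: str) -> bool:
    """Store the latest fingerprint for url and report whether it changed.

    The first visit only establishes a baseline, so it returns False.
    (Hypothetical helper: the real Engine's storage is not described.)
    """
    new = fingerprint(content)
    old = last_seen.get(url)
    last_seen[url] = new
    return old is not None and old != new
```

A caller would run `check_for_change` on each polling cycle and send an email alert only when it returns `True`. Hashing the whole page is the simplest possible strategy; it cannot say what changed, only that something did.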
As another example, consider a document repository that holds a large number of PDF documents. The Spider Engine can be configured to periodically analyze the repository and report any additions, deletions, or changes to its documents. This analysis can be configured to include details such as which pages within each document have changed, and to provide capsule summaries of the exact changes.
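One way to picture this kind of repository report is as a comparison of two snapshots. The sketch below assumes each snapshot is a dictionary mapping a document name to a list of per-page contents (or per-page hashes); extracting pages from PDF files is outside its scope. The function name and the snapshot layout are assumptions made for illustration, not the Engine's own data model.

```python
def diff_snapshots(old: dict, new: dict) -> dict:
    """Compare two repository snapshots.

    Each snapshot maps a document name to a list of per-page values.
    Returns added and deleted document names, plus the 1-based page
    numbers that differ for documents present in both snapshots.
    """
    added = sorted(set(new) - set(old))
    deleted = sorted(set(old) - set(new))
    changed = {}
    for name in sorted(set(old) & set(new)):
        old_pages, new_pages = old[name], new[name]
        longest = max(len(old_pages), len(new_pages))
        pages = [
            i + 1
            for i in range(longest)
            # A page counts as changed if it exists on only one side
            # or if its content differs between the two snapshots.
            if i >= len(old_pages)
            or i >= len(new_pages)
            or old_pages[i] != new_pages[i]
        ]
        if pages:
            changed[name] = pages
    return {"added": added, "deleted": deleted, "changed": changed}
```

Pages are compared by position, so inserting a page near the front of a document marks every later page as changed. A sequence-alignment approach would give tighter results at the cost of more code.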
Input to the Spider Engine can come from a variety of sources, including traditional databases, XML, web site content, web logs (blogs), Usenet posts, and RSS feeds. By default, the Engine produces its reports as XML, but the output format can be configured as needed.
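To give a sense of what an XML change report might look like, here is a minimal sketch built with Python's standard library. The element and attribute names (`report`, `item`, `source`, `type`, `name`) are invented for illustration; the Engine's actual report schema is not described here.

```python
import xml.etree.ElementTree as ET


def build_report(source: str, changes: dict) -> str:
    """Serialize a change summary as an XML string.

    `changes` maps a change type ("added", "deleted", "changed") to a
    list of item names. The schema is hypothetical.
    """
    root = ET.Element("report", {"source": source})
    for kind in ("added", "deleted", "changed"):
        for name in changes.get(kind, []):
            ET.SubElement(root, "item", {"type": kind, "name": name})
    return ET.tostring(root, encoding="unicode")
```

Calling `build_report("repository", {"added": ["c.pdf"]})` yields a single `report` element containing one `item` child. A downstream BI or KM system could parse this with any standard XML parser.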