Programming Language – Java
Highly extensible and flexible system for web crawling
Implements search when combined with open source search platforms like Apache Lucene or Apache Solr
Dynamically scalable with Hadoop
Difficult to set up
Poor documentation
Some operations take longer as the size of the crawl grows
Programming Language – Java
Excellent user documentation and easy setup
Extensible, good performance and decent support for distributed crawls
Respects robots.txt
Not dynamically scalable
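
To illustrate what respecting robots.txt involves, here is a minimal Java sketch of the check a crawler makes before fetching a path. It is not taken from either crawler's source. The class name RobotsRules, its methods, and the "examplebot" user agent are hypothetical. It is also deliberately simplified: it handles only Disallow prefixes, it merges the "*" group with any agent-specific group rather than preferring the most specific one, and it ignores Allow lines and wildcards.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified robots.txt rule set for a single user agent. */
public class RobotsRules {
    private final List<String> disallowed = new ArrayList<>();

    /** Collects Disallow prefixes from groups naming the given agent or "*". */
    public static RobotsRules parse(String robotsTxt, String userAgent) {
        RobotsRules rules = new RobotsRules();
        boolean applies = false;      // does the current group apply to us?
        boolean inAgentLines = false; // are we inside a run of User-agent lines?
        for (String raw : robotsTxt.split("\\R")) {
            String line = raw.replaceAll("#.*", "").trim(); // strip comments
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            if (field.equalsIgnoreCase("user-agent")) {
                // Consecutive User-agent lines share one group; a new run starts a new group.
                if (!inAgentLines) {
                    applies = false;
                }
                inAgentLines = true;
                if (value.equals("*") || value.equalsIgnoreCase(userAgent)) {
                    applies = true;
                }
            } else {
                inAgentLines = false;
                // An empty Disallow value means "nothing is disallowed", so skip it.
                if (field.equalsIgnoreCase("disallow") && applies && !value.isEmpty()) {
                    rules.disallowed.add(value);
                }
            }
        }
        return rules;
    }

    /** Returns false if the path starts with any collected Disallow prefix. */
    public boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\nDisallow: /private/\n";
        RobotsRules rules = RobotsRules.parse(robots, "examplebot");
        System.out.println(rules.isAllowed("/index.html"));   // prints "true"
        System.out.println(rules.isAllowed("/private/data")); // prints "false"
    }
}
```

A production crawler would typically fetch and cache robots.txt once per host and consult the cached rules before every request to that host.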