Crawling Framework

Apache Nutch

Programming Language – Java

Pros

Highly extensible and flexible system for web crawling
Provides search when combined with open source search platforms such as Apache Lucene or Apache Solr
Dynamically scalable with Hadoop

Cons

Difficult to set up
Poor documentation
Some operations take longer as the size of the crawl grows

Programming Language – Java

Pros

Excellent user documentation and easy setup
Extensible, with good performance and decent support for distributed crawls
Respects robots.txt

Cons

Not dynamically scalable
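
Example: checking robots.txt before fetching

The "Respects robots.txt" point above means the crawler consults a site's robots.txt rules before requesting a URL. The sketch below illustrates that general idea with Python's standard urllib.robotparser module. It is not the code of either framework listed here; the robots.txt content, the user agent name "example-crawler", and the helper names build_parser and is_allowed are all assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs without network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser (illustrative helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    agent = "example-crawler"  # hypothetical user agent name
    for url in ("https://example.com/index.html",
                "https://example.com/private/data.html"):
        print(url, "allowed" if is_allowed(rules, agent, url) else "blocked")
    print("crawl delay (seconds):", rules.crawl_delay(agent))
```

In a real crawl the rules would be fetched from each host (for example with RobotFileParser.set_url followed by read) and cached per host, and the crawl delay would be used to space out requests to that host.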

  • Last modified: 2021/01/27 01:52
  • Author: 127.0.0.1