Nutch 1.6 Study Notes


Updated every evening or night during the summer break.

Nisekoi is the best!

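The Nutch 1.6 source can be downloaded here:

apache-nutch-1.6-src.zip
115 cloud drive gift code: 5lbcymlo6u76
http://115.com/lb/5lbcymlo6u76

If you would rather not compile the source yourself, you can download my compiled build directly. It includes both the standalone local version and the Hadoop-dependent deploy version (64-bit):

apache-nutch1.6-runtime.zip
115 cloud drive gift code: 5lbcy4rl8e4l
http://115.com/lb/5lbcy4rl8e4l

Or download only the official deploy version:

apache-nutch-1.6-bin.tar.gz
115 cloud drive gift code: 5lbbtpwwpbq2
http://115.com/lb/5lbbtpwwpbq2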

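Edit, 7/20: ---------------------

Today I came across the download location for every Nutch release: http://archive.apache.org/dist/nutch/

Every release of the other Apache projects can be found here as well: http://archive.apache.org/dist/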

-----------------------------------

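Contents

Installing Nutch 1.6

Using local Nutch and its commands

The Nutch crawl cycle

Domain statistics

webgraph

nodedumper and linkrank

Injecting scores

Lightweight crawling with freegen

Configuring the Solr server

Using Luke

Configuring the custom tokenizer mmseg4j in Solr

Configuring mmseg4j in Luke

Solr 4.2

Installing Nutch on Cygwin

Nutch and Hadoop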

Installing Nutch 1.6 and a primer on connecting it to Hadoop 1.0.3

http://wxweven.blog.163.com/blog/static/1974791152014127115626958/

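Using local Nutch and its commands

Inside the runtime folder, the local folder holds the Nutch build that runs without Hadoop; it implements MapReduce on a single machine.

Local Nutch is generally used for testing and debugging. Go into the local folder.

The conf folder contains many Nutch configuration files. nutch-default.xml holds the default configuration and documents many of the settings. nutch-site.xml is the main configuration file; anything set there overrides the value in nutch-default.xml.

Before running Nutch, add the http.agent.name property to nutch-site.xml.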

The http.agent.name entry in nutch-default.xml looks like this:

<property>
  <name>http.agent.name</name>
  <value></value>
  <description>HTTP 'User-Agent' request header. MUST NOT be empty -
  please set this to a single word uniquely related to your organization.

  NOTE: You should also check other related properties:

    http.robots.agents
    http.agent.description
    http.agent.url
    http.agent.email
    http.agent.version

  and set their values appropriately.
  </description>
</property>

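Copy this property into nutch-site.xml and put the request header inside the value tag. The header string can be taken from a browser.

For example, the User-Agent sent by Chrome 28 is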

Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36

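Here is the complete content of my nutch-site.xml:

<configuration>
  <property>
    <name>http.agent.name</name>
    <value>Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36</value>
  </property>
</configuration>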
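
As a minimal sketch of the same step done from the command line, the snippet below writes a nutch-site.xml that sets only http.agent.name. The helper name write_agent_config and the value MyTestCrawler are made up for illustration; the single-word value follows the advice in the default description, and the file is written to a scratch directory rather than a real conf folder. The agent name is inserted verbatim, so it should not contain characters such as & or <.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): write a minimal nutch-site.xml
# that sets http.agent.name.
#   $1 = path of the file to write
#   $2 = agent name (inserted as-is, so avoid XML special characters)
write_agent_config() {
  conf_file=$1
  agent_name=$2
  cat > "$conf_file" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>$agent_name</value>
  </property>
</configuration>
EOF
}

# Try it in a scratch directory; "MyTestCrawler" is an assumed example value.
demo_dir=$(mktemp -d)
write_agent_config "$demo_dir/nutch-site.xml" "MyTestCrawler"
cat "$demo_dir/nutch-site.xml"
```

To use it for real, the target path would be conf/nutch-site.xml inside the local folder.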

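Once the configuration is changed, you can start experimenting.

Run the nutch script in bin with no arguments; it prompts you to supply a command.

Part of what follows is reproduced from: http://www.blogjava.net/kxx129/archive/2009/09/05/294000.html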

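Crawl: Crawl is an alias for "org.apache.nutch.crawl.Crawl". It is a single command that runs the complete crawl and indexing process.

      Usage:
      bin/nutch crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN N]

      Parameters:
        <urlDir>: an existing directory containing the text files that list the URLs to crawl.
        [-dir d]: the working directory where Nutch stores the crawl data. Defaults to ./crawl-[date], where [date] is the current date.
        [-threads n]: number of Fetcher threads; overrides the fetcher.threads.fetch value from the default configuration (default 10).
        [-depth i]: number of crawl iterations (the depth). Defaults to 5.
        [-topN N]: limits each iteration to the top N records. Defaults to Integer.MAX_VALUE.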

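     Example 1: ./bin/nutch crawl urls -dir data -threads 50 -depth 2 -topN 2 (do not run this command yet)

        The URLs to crawl are stored in the urls folder (Nutch reads them from the files inside urls),

        the crawled data goes into data,

        and 50 threads are used, with an iteration depth of 2 and at most the top 2 records fetched per iteration.

        Note that, for efficiency, Nutch does not strictly follow either depth-first or breadth-first order when it explores links.

      Example 2: nohup ./bin/nutch crawl urls -dir data -threads 50 -depth 2 -topN 2 &

          With nohup in front, Nutch's console output is written to nohup.out in the current directory

          (a more detailed log is kept in logs/hadoop.log).

          The trailing & is the Linux way of running a command in the background.

          If something goes wrong, nohup.out will contain log entries that look like Java exceptions.

          For a test run, first create the urls folder, then create url.txt inside it

          and write http://blog.tianya.cn/ into it, which means the Tianya blog site will be crawled.

          The name url.txt is not fixed; you can call it anything, because every file in urls is read as a file of URLs.

          -topN can be dropped, in which case every URL in the first two levels is crawled. I tried it and some pages took a very long time, so it is better to keep it.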
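
The seed setup described above can be sketched as follows. This is only an illustration: it works in a scratch directory standing in for the local folder (an assumption, so nothing in a real install is touched), and it leaves the crawl command itself as a comment.

```shell
#!/bin/sh
# Scratch directory standing in for runtime/local (an assumption for this sketch).
work_dir=$(mktemp -d)
cd "$work_dir" || exit 1

# Create the seed directory. Every file inside it is read as a URL list,
# so the file name url.txt is arbitrary.
mkdir -p urls
printf '%s\n' 'http://blog.tianya.cn/' > urls/url.txt

# Count the seed URLs across all files in urls/ (blank lines are ignored).
seed_count=$(cat urls/* | grep -c .)
echo "seed URLs: $seed_count"

# From the real local folder, the crawl would then be started with example 2:
# nohup ./bin/nutch crawl urls -dir data -threads 50 -depth 2 -topN 2 &
```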

          Run example 2. After a short while the crawl finishes. Below is the nohup.out log from my run; it shows no exceptions:

solrUrl is not set, indexing will be skipped…
crawl started in: data
rootUrlDir = urls
threads = 50
depth = 2
solrUrl=null
topN = 2
Injector: starting at 2014-07-13 20:37:26
Injector: crawlDb: data/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: total number of urls rejected by filters: 1
Injector: total number of urls injected after normalization and filtering: 2
Injector: Merging injected urls into crawl db.
Injector: finished at 2014-07-13 20:37:57, elapsed: 00:00:30
Generator: starting at 2014-07-13 20:37:57
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 2
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: data/segments/20140713203805
Generator: finished at 2014-07-13 20:38:13, elapsed: 00:00:15
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2014-07-13 20:38:13
Fetcher: segment: data/segments/20140713203805
Using queue mode : byHost
Fetcher: threads: 50
Fetcher: time-out divisor: 2
QueueFeeder finished: total 1 records + hit by time limit :0
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
fetching http://blog.tianya.cn/
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Fetcher: throughput threshold: -1
Fetcher: throughput threshold retries: 5
-finishing thread FetcherThread, activeThreads=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2014-07-13 20:38:26, elapsed: 00:00:13
ParseSegment: starting at 2014-07-13 20:38:26
ParseSegment: segment: data/segments/20140713203805
Parsed (64ms):http://blog.tianya.cn/
ParseSegment: finished at 2014-07-13 20:38:33, elapsed: 00:00:07
CrawlDb update: starting at 2014-07-13 20:38:33
CrawlDb update: db: data/crawldb
CrawlDb update: segments: [data/segments/20140713203805]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: 404 purging: false
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2014-07-13 20:38:47, elapsed: 00:00:13
Generator: starting at 2014-07-13 20:38:47
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 2
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: data/segments/20140713203855
Generator: finished at 2014-07-13 20:39:02, elapsed: 00:00:15
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2014-07-13 20:39:02
Fetcher: segment: data/segments/20140713203855
Using queue mode : byHost
Fetcher: threads: 50
Fetcher: time-out divisor: 2
QueueFeeder finished: total 2 records + hit by time limit :0
Using queue mode : byHost
Using queue mode : byHost
fetching http://blog.tianya.cn/blog/culture
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Fetcher: throughput threshold: -1
Fetcher: throughput threshold retries: 5
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=1
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 1
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405255142707
now = 1405255143834

0. http://blog.tianya.cn/blog/daren
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=1
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 1
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405255142707
now = 1405255144838
0. http://blog.tianya.cn/blog/daren
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=1
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 1
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405255142707
now = 1405255145841
0. http://blog.tianya.cn/blog/daren
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405255151041
now = 1405255146844
0. http://blog.tianya.cn/blog/daren
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405255151041
now = 1405255147847
0. http://blog.tianya.cn/blog/daren
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405255151041
now = 1405255148852
0. http://blog.tianya.cn/blog/daren
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405255151041
now = 1405255149855
0. http://blog.tianya.cn/blog/daren
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405255151041
now = 1405255150858
0. http://blog.tianya.cn/blog/daren
fetching http://blog.tianya.cn/blog/daren
-finishing thread FetcherThread, activeThreads=49
-finishing thread FetcherThread, activeThreads=48
-finishing thread FetcherThread, activeThreads=46
-finishing thread FetcherThread, activeThreads=47
-finishing thread FetcherThread, activeThreads=45
-finishing thread FetcherThread, activeThreads=44
-finishing thread FetcherThread, activeThreads=43
-finishing thread FetcherThread, activeThreads=41
-finishing thread FetcherThread, activeThreads=42
-finishing thread FetcherThread, activeThreads=40
-finishing thread FetcherThread, activeThreads=39
-finishing thread FetcherThread, activeThreads=38
-finishing thread FetcherThread, activeThreads=37
-finishing thread FetcherThread, activeThreads=36
-finishing thread FetcherThread, activeThreads=35
-finishing thread FetcherThread, activeThreads=34
-finishing thread FetcherThread, activeThreads=33
-finishing thread FetcherThread, activeThreads=32
-finishing thread FetcherThread, activeThreads=31
-finishing thread FetcherThread, activeThreads=30
-finishing thread FetcherThread, activeThreads=29
-finishing thread FetcherThread, activeThreads=28
-finishing thread FetcherThread, activeThreads=27
-finishing thread FetcherThread, activeThreads=26
-finishing thread FetcherThread, activeThreads=25
-finishing thread FetcherThread, activeThreads=24
-finishing thread FetcherThread, activeThreads=23
-finishing thread FetcherThread, activeThreads=22
-finishing thread FetcherThread, activeThreads=21
-finishing thread FetcherThread, activeThreads=20
-finishing thread FetcherThread, activeThreads=19
-finishing thread FetcherThread, activeThreads=18
-finishing thread FetcherThread, activeThreads=17
-finishing thread FetcherThread, activeThreads=16
-finishing thread FetcherThread, activeThreads=15
-finishing thread FetcherThread, activeThreads=14
-finishing thread FetcherThread, activeThreads=13
-finishing thread FetcherThread, activeThreads=12
-finishing thread FetcherThread, activeThreads=11
-finishing thread FetcherThread, activeThreads=10
-finishing thread FetcherThread, activeThreads=9
-finishing thread FetcherThread, activeThreads=8
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=6
-finishing thread FetcherThread, activeThreads=5
-finishing thread FetcherThread, activeThreads=4
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2014-07-13 20:39:18, elapsed: 00:00:16
ParseSegment: starting at 2014-07-13 20:39:18
ParseSegment: segment: data/segments/20140713203855
Parsed (13ms):http://blog.tianya.cn/blog/culture
Parsed (3ms):http://blog.tianya.cn/blog/daren
ParseSegment: finished at 2014-07-13 20:39:25, elapsed: 00:00:07
CrawlDb update: starting at 2014-07-13 20:39:25
CrawlDb update: db: data/crawldb
CrawlDb update: segments: [data/segments/20140713203855]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: 404 purging: false
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2014-07-13 20:39:38, elapsed: 00:00:13
LinkDb: starting at 2014-07-13 20:39:38
LinkDb: linkdb: data/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: internal links will be ignored.
LinkDb: adding segment: file:/home/lan/nutch/local/data/segments/20140713203805
LinkDb: adding segment: file:/home/lan/nutch/local/data/segments/20140713203855
LinkDb: finished at 2014-07-13 20:39:48, elapsed: 00:00:10
crawl finished: data
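
A log like the one above is long, so one way to skim it is to pull out only the fetched URLs and count any exception lines. The sketch below is an illustration, not part of Nutch: the function name summarize_crawl_log is made up, and it assumes that fetched pages appear on lines beginning with "fetching " and that failures show up as lines containing "Exception". The sample file is built from a few lines of the log above so the function can be tried without running a crawl.

```shell
#!/bin/sh
# Hypothetical helper: summarise a crawl log such as nohup.out.
#   $1 = path of the log file
summarize_crawl_log() {
  log_file=$1
  echo "fetched URLs:"
  # Print only the URL part of each "fetching <url>" line (leading spaces tolerated).
  sed -n 's/^ *fetching //p' "$log_file"
  # Count lines that mention an exception; 0 means none were logged.
  echo "exception lines: $(grep -c 'Exception' "$log_file")"
}

# Small sample made of lines taken from the log above.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
Fetcher: starting at 2014-07-13 20:38:13
fetching http://blog.tianya.cn/
Parsed (64ms):http://blog.tianya.cn/
fetching http://blog.tianya.cn/blog/culture
crawl finished: data
EOF

summarize_crawl_log "$sample_log"
```

On a real run the argument would be nohup.out in the directory where the crawl was started.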

The log from a run without -topN:

solrUrl is not set, indexing will be skipped…
crawl started in: data
rootUrlDir = urls
threads = 50
depth = 2
solrUrl=null
Injector: starting at 2014-07-14 20:52:04
Injector: crawlDb: data/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: total number of urls rejected by filters: 1
Injector: total number of urls injected after normalization and filtering: 2
Injector: Merging injected urls into crawl db.
Injector: finished at 2014-07-14 20:52:23, elapsed: 00:00:19
Generator: starting at 2014-07-14 20:52:23
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: data/segments/20140714205231
Generator: finished at 2014-07-14 20:52:39, elapsed: 00:00:15
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2014-07-14 20:52:39
Fetcher: segment: data/segments/20140714205231
Using queue mode : byHost
Fetcher: threads: 50
Fetcher: time-out divisor: 2
QueueFeeder finished: total 1 records + hit by time limit :0
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
fetching http://blog.tianya.cn/
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Fetcher: throughput threshold: -1
Fetcher: throughput threshold retries: 5
-finishing thread FetcherThread, activeThreads=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2014-07-14 20:53:01, elapsed: 00:00:22
ParseSegment: starting at 2014-07-14 20:53:01
ParseSegment: segment: data/segments/20140714205231
Parsed (47ms):http://blog.tianya.cn/
ParseSegment: finished at 2014-07-14 20:53:08, elapsed: 00:00:07
CrawlDb update: starting at 2014-07-14 20:53:08
CrawlDb update: db: data/crawldb
CrawlDb update: segments: [data/segments/20140714205231]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: 404 purging: false
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2014-07-14 20:53:22, elapsed: 00:00:13
Generator: starting at 2014-07-14 20:53:22
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: data/segments/20140714205330
Generator: finished at 2014-07-14 20:53:37, elapsed: 00:00:15
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2014-07-14 20:53:37
Fetcher: segment: data/segments/20140714205330
Using queue mode : byHost
Fetcher: threads: 50
Fetcher: time-out divisor: 2
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
fetching http://www.tianya.cn/mobile
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
fetching http://blog.tianya.cn/post-5010184-62889385-1.shtml
QueueFeeder finished: total 100 records + hit by time limit :0
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Fetcher: throughput threshold: -1
Fetcher: throughput threshold retries: 5
-activeThreads=50, spinWaiting=48, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=48, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=48, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=48, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=48, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=48, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=48, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=48, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=48, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=48, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=48, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=98
fetching http://blog.tianya.cn/blog/culture
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=97
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=97
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=97
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=97
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=97
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=97
fetching http://blog.tianya.cn/post-4487705-62917227-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=96
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=96
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=96
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=96
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=96
fetching http://blog.tianya.cn/post-1119083-62403495-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=95
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=95
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=95
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=95
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=95
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=95
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=95
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=95
fetching http://blog.tianya.cn/blog/ent
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=94
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=94
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=94
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=94
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=94
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=94
fetching http://blog.tianya.cn/post-4598537-62971598-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=93
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=93
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=93
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=93
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=93
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=93
fetching http://blog.tianya.cn/post-5010184-62834903-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=92
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=92
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=92
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=92
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=92
fetching http://blog.tianya.cn/post-4877164-61406732-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=91
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=91
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=91
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=91
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=91
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=91
fetching http://blog.tianya.cn/post-78180-59109533-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=90
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=90
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=90
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=90
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=90
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=90
fetching http://blog.tianya.cn/post-4362114-63792588-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=89
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=89
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=89
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=89
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=89
fetching http://blog.tianya.cn/post-3961685-62977022-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=88
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=88
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=88
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=88
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=88
fetching http://blog.tianya.cn/post-5010184-62890806-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=87
fetching http://blog.tianya.cn/blog/mingbo
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=86
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=86
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=86
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=86
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=86
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=86
fetching http://blog.tianya.cn/post-959477-62971507-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=85
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=85
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=85
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=85
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=85
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=85
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=85
fetching http://blog.tianya.cn/post-4562315-62807399-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=84
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=84
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=84
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=84
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=84
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=84
fetching http://blog.tianya.cn/post-3941055-62934113-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=83
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=83
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=83
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=83
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=83
fetching http://blog.tianya.cn/blog/daren
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=82
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=82
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=82
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=82
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=82
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=82
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=82
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=82
fetching http://blog.tianya.cn/post-1196211-63799917-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=81
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=81
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=81
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=81
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=81
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=81
fetching http://blog.tianya.cn/post-196238-62376389-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=80
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=80
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=80
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=80
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=80
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=80
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=80
fetching http://blog.tianya.cn/post-4700528-62898660-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=79
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=79
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=79
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=79
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=79
fetching http://blog.tianya.cn/post-1119083-62958234-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=78
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=78
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=78
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=78
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=78
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=78
fetching http://blog.tianya.cn/post-1671874-62898829-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=77
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=77
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=77
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=77
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=77
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=77
fetching http://blog.tianya.cn/post-5010184-62313586-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=76
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=76
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=76
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=76
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=76
fetching http://blog.tianya.cn/post-4598537-62379563-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=75
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=75
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=75
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=75
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=75
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=75
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=75
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=75
fetching http://blog.tianya.cn/post-236764-59417277-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=74
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=74
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=74
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=74
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=74
fetching http://blog.tianya.cn/post-4360774-62845782-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=73
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=73
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=73
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=73
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=73
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=73
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=73
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=73
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=73
fetching http://blog.tianya.cn/post-196238-61158698-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=72
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=72
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=72
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=72
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=72
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=72
fetching http://blog.tianya.cn/post-3340761-62357537-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=71
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=71
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=71
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=71
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=71
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=71
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=71
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=71
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=71
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=71
fetching http://blog.tianya.cn/post-4562315-62367801-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=70
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=70
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=70
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=70
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=70
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=70
fetching http://blog.tianya.cn/post-38484-61144592-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=69
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=69
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=69
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=69
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=69
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=69
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=69
fetching http://blog.tianya.cn/post-4487705-63000074-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=68
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=68
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=68
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=68
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=68
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=68
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=68
fetching http://blog.tianya.cn/post-3941055-62972581-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=67
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=67
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=67
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=67
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=67
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=67
fetching http://blog.tianya.cn/post-2066284-62926321-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=66
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=66
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=66
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=66
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=66
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=66
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=66
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=66
fetching http://blog.tianya.cn/post-4608093-62651701-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=65
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=65
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=65
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=65
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=65
fetching http://blog.tianya.cn/post-236764-60248116-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=64
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=64
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=64
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=64
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=64
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=64
fetching http://blog.tianya.cn/post-5010184-62718271-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=63
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=63
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=63
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=63
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=63
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=63
fetching http://blog.tianya.cn/post-234213-62960519-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=62
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=62
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=62
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=62
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=62
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=62
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=62
fetching http://blog.tianya.cn/post-4600300-62374308-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=61
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=61
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=61
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=61
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=61
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=61
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=61
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=61
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=61
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=61
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=61
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=61
fetching http://blog.tianya.cn/post-3739914-62875218-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=60
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=60
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=60
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=60
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=60
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=60
fetching http://blog.tianya.cn/post-1119083-62979540-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=59
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=59
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=59
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=59
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=59
fetching http://blog.tianya.cn/post-3773157-62890053-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=58
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=58
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=58
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=58
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=58
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=58
fetching http://blog.tianya.cn/post-4562315-62899385-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=57
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=57
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=57
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=57
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=57
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=57
fetching http://blog.tianya.cn/post-2513619-62970447-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=56
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=56
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=56
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=56
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=56
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=56
fetching http://blog.tianya.cn/post-4482611-62820517-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=55
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=55
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=55
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=55
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=55
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=55
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=55
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=55
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=55
fetching http://blog.tianya.cn/post-236764-58766442-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=54
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=54
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=54
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=54
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=54
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=54
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=54
fetching http://blog.tianya.cn/post-351212-59432160-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=53
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=53
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=53
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=53
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=53
fetching http://blog.tianya.cn/post-174091-62981677-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=52
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=52
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=52
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=52
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=52
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=52
fetching http://blog.tianya.cn/post-78180-62903890-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=51
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=51
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=51
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=51
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=51
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=51
fetching http://blog.tianya.cn/blog/history
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=50
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=50
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=50
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=50
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=50
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=50
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=50
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=50
fetching http://blog.tianya.cn/post-1578250-62896383-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=49
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=49
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=49
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=49
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=49
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=49
fetching http://blog.tianya.cn/post-196238-62190438-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=48
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=48
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=48
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=48
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=48
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=48
fetching http://blog.tianya.cn/post-196238-61974722-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=47
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=47
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=47
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=47
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=47
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=47
fetching http://blog.tianya.cn/post-4700528-62898663-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=46
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=46
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=46
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=46
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=46
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=46
fetching http://blog.tianya.cn/post-5010184-62837336-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=45
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=45
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=45
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=45
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=45
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=45
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=45
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=45
fetching http://blog.tianya.cn/blog/finance
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=44
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=44
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=44
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=44
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=44
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=44
fetching http://blog.tianya.cn/post-145340-62426203-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=43
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=43
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=43
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=43
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=43
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=43
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=43
fetching http://blog.tianya.cn/post-1870300-63794004-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=42
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=42
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=42
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=42
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=42
fetching http://blog.tianya.cn/post-863996-62974859-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=41
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=41
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=41
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=41
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=41
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=41
fetching http://blog.tianya.cn/post-3727390-62972109-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=40
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=40
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=40
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=40
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=40
fetching http://blog.tianya.cn/blog/emotion
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=39
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=39
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=39
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=39
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=39
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=39
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=39
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=39
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=39
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=39
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=39
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=39
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=39
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=39
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=39
fetching http://blog.tianya.cn/post-336487-63732130-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=38
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=38
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=38
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=38
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=38
fetching http://blog.tianya.cn/post-4025452-63785440-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=37
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=37
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=37
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=37
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=37
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=37
fetching http://blog.tianya.cn/post-137239-63797690-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=36
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=36
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=36
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=36
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=36
fetching http://blog.tianya.cn/post-1838543-62970839-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=35
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=35
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=35
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=35
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=35
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=35
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=35
fetching http://blog.tianya.cn/blog/society
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=34
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=34
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=34
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=34
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=34
fetching http://blog.tianya.cn/post-542686-63799203-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=33
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=33
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=33
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=33
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=33
fetching http://blog.tianya.cn/post-1438407-62987507-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=32
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=32
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=32
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=32
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=32
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=32
fetching http://blog.tianya.cn/post-3773157-62390018-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=31
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=31
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=31
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=31
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=31
fetching http://blog.tianya.cn/post-78180-58859246-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=30
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=30
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=30
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=30
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=30
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=30
fetching http://blog.tianya.cn/post-236764-62962675-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=29
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=29
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=29
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=29
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=29
fetching http://blog.tianya.cn/blog/life
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=28
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=28
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=28
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=28
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=28
fetching http://blog.tianya.cn/post-1883179-62390915-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=27
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=27
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=27
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=27
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=27
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=27
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=27
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=27
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=27
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=27
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=27
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=27
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=27
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=27
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=27
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=27
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=27
fetching http://blog.tianya.cn/post-4009947-62401775-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=26
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=26
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=26
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=26
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=26
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=26
fetching http://blog.tianya.cn/post-4047683-63794167-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=25
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=25
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=25
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=25
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=25
fetching http://blog.tianya.cn/post-1755624-62987935-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=24
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=24
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=24
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=24
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=24
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=24
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=24
fetching http://blog.tianya.cn/post-5010184-62690266-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=23
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=23
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=23
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=23
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=23
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=23
fetching http://blog.tianya.cn/post-4353581-62972558-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=22
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=22
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=22
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=22
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=22
fetching http://blog.tianya.cn/blog/international
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=21
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=21
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=21
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=21
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=21
fetching http://blog.tianya.cn/post-196238-61768175-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=20
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=20
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=20
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=20
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=20
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=20
fetching http://blog.tianya.cn/post-4877164-61415979-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=19
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=19
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=19
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=19
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=19
fetching http://blog.tianya.cn/post-544588-62883194-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=18
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=18
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=18
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=18
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=18
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=18
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=18
fetching http://blog.tianya.cn/post-4250142-62927024-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=17
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17
fetching http://blog.tianya.cn/post-78180-62980961-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=16
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=16
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=16
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=16
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=16
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16
fetching http://blog.tianya.cn/post-4353581-62972544-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15
fetching http://blog.tianya.cn/blog/stock
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=14
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=14
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=14
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=14
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=14
fetching http://blog.tianya.cn/post-4482611-62391796-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=13
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=13
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=13
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=13
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=13
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=13
fetching http://blog.tianya.cn/post-4482611-62900444-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=12
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=12
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=12
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=12
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=12
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=12
fetching http://blog.tianya.cn/blog/sports
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=11
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=11
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=11
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=11
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=11
fetching http://blog.tianya.cn/post-1882702-63776337-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=10
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=10
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=10
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=10
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=10
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=10
fetching http://blog.tianya.cn/post-3773157-62958131-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=9
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=9
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=9
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=9
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=9
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=9
fetching http://blog.tianya.cn/post-4101233-62653750-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=8
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=8
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=8
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=8
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=8
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=8
fetching http://blog.tianya.cn/blog/newPush
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=7
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=7
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=7
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=7
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=7
fetching http://blog.tianya.cn/post-2111189-62899907-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=6
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=6
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=6
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=6
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=6
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=6
fetching http://blog.tianya.cn/blog/food
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=5
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=5
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=5
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=5
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=5
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=5
fetching http://blog.tianya.cn/post-1515015-63779836-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=4
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 1
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405343081023
now = 1405343081793
0. http://blog.tianya.cn/post-142905-62961160-1.shtml
1. http://blog.tianya.cn/blog/travel
2. http://blog.tianya.cn/post-4598537-62971461-1.shtml
3. http://blog.tianya.cn/post-4598537-62971498-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=4
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405343087577
now = 1405343082796
0. http://blog.tianya.cn/post-142905-62961160-1.shtml
1. http://blog.tianya.cn/blog/travel
2. http://blog.tianya.cn/post-4598537-62971461-1.shtml
3. http://blog.tianya.cn/post-4598537-62971498-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=4
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405343087577
now = 1405343083799
0. http://blog.tianya.cn/post-142905-62961160-1.shtml
1. http://blog.tianya.cn/blog/travel
2. http://blog.tianya.cn/post-4598537-62971461-1.shtml
3. http://blog.tianya.cn/post-4598537-62971498-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=4
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405343087577
now = 1405343084804
0. http://blog.tianya.cn/post-142905-62961160-1.shtml
1. http://blog.tianya.cn/blog/travel
2. http://blog.tianya.cn/post-4598537-62971461-1.shtml
3. http://blog.tianya.cn/post-4598537-62971498-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=4
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405343087577
now = 1405343085806
0. http://blog.tianya.cn/post-142905-62961160-1.shtml
1. http://blog.tianya.cn/blog/travel
2. http://blog.tianya.cn/post-4598537-62971461-1.shtml
3. http://blog.tianya.cn/post-4598537-62971498-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=4
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405343087577
now = 1405343086809
0. http://blog.tianya.cn/post-142905-62961160-1.shtml
1. http://blog.tianya.cn/blog/travel
2. http://blog.tianya.cn/post-4598537-62971461-1.shtml
3. http://blog.tianya.cn/post-4598537-62971498-1.shtml
fetching http://blog.tianya.cn/post-142905-62961160-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=3
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405343092743
now = 1405343087813
0. http://blog.tianya.cn/blog/travel
1. http://blog.tianya.cn/post-4598537-62971461-1.shtml
2. http://blog.tianya.cn/post-4598537-62971498-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=3
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405343092743
now = 1405343088816
0. http://blog.tianya.cn/blog/travel
1. http://blog.tianya.cn/post-4598537-62971461-1.shtml
2. http://blog.tianya.cn/post-4598537-62971498-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=3
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405343092743
now = 1405343089819
0. http://blog.tianya.cn/blog/travel
1. http://blog.tianya.cn/post-4598537-62971461-1.shtml
2. http://blog.tianya.cn/post-4598537-62971498-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=3
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405343092743
now = 1405343090821
0. http://blog.tianya.cn/blog/travel
1. http://blog.tianya.cn/post-4598537-62971461-1.shtml
2. http://blog.tianya.cn/post-4598537-62971498-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=3
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405343092743
now = 1405343091824
0. http://blog.tianya.cn/blog/travel
1. http://blog.tianya.cn/post-4598537-62971461-1.shtml
2. http://blog.tianya.cn/post-4598537-62971498-1.shtml
fetching http://blog.tianya.cn/blog/travel
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=2
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 1
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405343092743
now = 1405343092826
0. http://blog.tianya.cn/post-4598537-62971461-1.shtml
1. http://blog.tianya.cn/post-4598537-62971498-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405343098775
now = 1405343093829
0. http://blog.tianya.cn/post-4598537-62971461-1.shtml
1. http://blog.tianya.cn/post-4598537-62971498-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405343098775
now = 1405343094833
0. http://blog.tianya.cn/post-4598537-62971461-1.shtml
1. http://blog.tianya.cn/post-4598537-62971498-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405343098775
now = 1405343095835
0. http://blog.tianya.cn/post-4598537-62971461-1.shtml
1. http://blog.tianya.cn/post-4598537-62971498-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405343098775
now = 1405343096838
0. http://blog.tianya.cn/post-4598537-62971461-1.shtml
1. http://blog.tianya.cn/post-4598537-62971498-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405343098775
now = 1405343097840
0. http://blog.tianya.cn/post-4598537-62971461-1.shtml
1. http://blog.tianya.cn/post-4598537-62971498-1.shtml
fetching http://blog.tianya.cn/post-4598537-62971461-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=1
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 1
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405343098775
now = 1405343098843
0. http://blog.tianya.cn/post-4598537-62971498-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=1
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 1
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405343098775
now = 1405343099846
0. http://blog.tianya.cn/post-4598537-62971498-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=1
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 1
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405343098775
now = 1405343100849
0. http://blog.tianya.cn/post-4598537-62971498-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405343105959
now = 1405343101851
0. http://blog.tianya.cn/post-4598537-62971498-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405343105959
now = 1405343102853
0. http://blog.tianya.cn/post-4598537-62971498-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405343105959
now = 1405343103855
0. http://blog.tianya.cn/post-4598537-62971498-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405343105959
now = 1405343104857
0. http://blog.tianya.cn/post-4598537-62971498-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1
* queue: http://blog.tianya.cn
maxThreads = 1
inProgress = 0
crawlDelay = 5000
minCrawlDelay = 0
nextFetchTime = 1405343105959
now = 1405343105859
0. http://blog.tianya.cn/post-4598537-62971498-1.shtml
fetching http://blog.tianya.cn/post-4598537-62971498-1.shtml
-finishing thread FetcherThread, activeThreads=49
-finishing thread FetcherThread, activeThreads=48
-finishing thread FetcherThread, activeThreads=47
-finishing thread FetcherThread, activeThreads=46
-finishing thread FetcherThread, activeThreads=45
-finishing thread FetcherThread, activeThreads=44
-finishing thread FetcherThread, activeThreads=43
-finishing thread FetcherThread, activeThreads=42
-finishing thread FetcherThread, activeThreads=41
-finishing thread FetcherThread, activeThreads=40
-finishing thread FetcherThread, activeThreads=39
-finishing thread FetcherThread, activeThreads=38
-finishing thread FetcherThread, activeThreads=37
-finishing thread FetcherThread, activeThreads=36
-finishing thread FetcherThread, activeThreads=29
-finishing thread FetcherThread, activeThreads=30
-finishing thread FetcherThread, activeThreads=31
-finishing thread FetcherThread, activeThreads=32
-finishing thread FetcherThread, activeThreads=33
-finishing thread FetcherThread, activeThreads=34
-finishing thread FetcherThread, activeThreads=35
-finishing thread FetcherThread, activeThreads=28
-finishing thread FetcherThread, activeThreads=20
-finishing thread FetcherThread, activeThreads=21
-finishing thread FetcherThread, activeThreads=22
-finishing thread FetcherThread, activeThreads=23
-finishing thread FetcherThread, activeThreads=19
-finishing thread FetcherThread, activeThreads=18
-finishing thread FetcherThread, activeThreads=17
-finishing thread FetcherThread, activeThreads=16
-finishing thread FetcherThread, activeThreads=15
-finishing thread FetcherThread, activeThreads=14
-finishing thread FetcherThread, activeThreads=13
-finishing thread FetcherThread, activeThreads=12
-finishing thread FetcherThread, activeThreads=11
-finishing thread FetcherThread, activeThreads=10
-finishing thread FetcherThread, activeThreads=9
-finishing thread FetcherThread, activeThreads=8
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=6
-finishing thread FetcherThread, activeThreads=24
-finishing thread FetcherThread, activeThreads=25
-finishing thread FetcherThread, activeThreads=26
-finishing thread FetcherThread, activeThreads=27
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=4
-finishing thread FetcherThread, activeThreads=5
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2014-07-14 21:05:17, elapsed: 00:11:39
ParseSegment: starting at 2014-07-14 21:05:17
ParseSegment: segment: data/segments/20140714205330
Parsed (13ms):http://blog.tianya.cn/blog/culture
Parsed (2ms):http://blog.tianya.cn/blog/daren
Parsed (15ms):http://blog.tianya.cn/blog/emotion
Parsed (8ms):http://blog.tianya.cn/blog/ent
Parsed (7ms):http://blog.tianya.cn/blog/finance
Parsed (7ms):http://blog.tianya.cn/blog/food
Parsed (12ms):http://blog.tianya.cn/blog/history
Parsed (6ms):http://blog.tianya.cn/blog/international
Parsed (6ms):http://blog.tianya.cn/blog/life
Parsed (3ms):http://blog.tianya.cn/blog/mingbo
Parsed (7ms):http://blog.tianya.cn/blog/newPush
Parsed (16ms):http://blog.tianya.cn/blog/society
Parsed (8ms):http://blog.tianya.cn/blog/sports
Parsed (8ms):http://blog.tianya.cn/blog/stock
Parsed (20ms):http://blog.tianya.cn/blog/travel
Parsed (6ms):http://blog.tianya.cn/post-1119083-62403495-1.shtml
Parsed (4ms):http://blog.tianya.cn/post-1119083-62958234-1.shtml
Parsed (5ms):http://blog.tianya.cn/post-1119083-62979540-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-1196211-63799917-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-137239-63797690-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-1438407-62987507-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-145340-62426203-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-1515015-63779836-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-1578250-62896383-1.shtml
Parsed (1ms):http://blog.tianya.cn/post-1671874-62898829-1.shtml
Parsed (1ms):http://blog.tianya.cn/post-174091-62981677-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-1755624-62987935-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-1838543-62970839-1.shtml
Parsed (1ms):http://blog.tianya.cn/post-1870300-63794004-1.shtml
Parsed (1ms):http://blog.tianya.cn/post-1882702-63776337-1.shtml
Parsed (1ms):http://blog.tianya.cn/post-1883179-62390915-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-196238-61158698-1.shtml
Parsed (1ms):http://blog.tianya.cn/post-196238-61768175-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-196238-61974722-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-196238-62190438-1.shtml
Parsed (1ms):http://blog.tianya.cn/post-196238-62376389-1.shtml
Parsed (1ms):http://blog.tianya.cn/post-2066284-62926321-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-2111189-62899907-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-234213-62960519-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-236764-58766442-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-236764-59417277-1.shtml
http://blog.tianya.cn/post-236764-60248116-1.shtml skipped. Content of size 65778 was truncated to 64957
Parsed (1ms):http://blog.tianya.cn/post-236764-62962675-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-2513619-62970447-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-3340761-62357537-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-3727390-62972109-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-3739914-62875218-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-3773157-62390018-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-3773157-62890053-1.shtml
Parsed (1ms):http://blog.tianya.cn/post-3773157-62958131-1.shtml
http://blog.tianya.cn/post-38484-61144592-1.shtml skipped. Content of size 154978 was truncated to 64956
Parsed (0ms):http://blog.tianya.cn/post-3941055-62934113-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-3941055-62972581-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-3961685-62977022-1.shtml
Parsed (1ms):http://blog.tianya.cn/post-4009947-62401775-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-4025452-63785440-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-4047683-63794167-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-4101233-62653750-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-4250142-62927024-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-4353581-62972544-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-4353581-62972558-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-4360774-62845782-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-4362114-63792588-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-4482611-62391796-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-4482611-62820517-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-4482611-62900444-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-4487705-62917227-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-4487705-63000074-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-4562315-62367801-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-4562315-62807399-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-4562315-62899385-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-4598537-62379563-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-4598537-62971461-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-4598537-62971498-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-4598537-62971598-1.shtml
Parsed (1ms):http://blog.tianya.cn/post-4600300-62374308-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-4608093-62651701-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-4700528-62898660-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-4700528-62898663-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-4877164-61406732-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-4877164-61415979-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-5010184-62313586-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-5010184-62690266-1.shtml
Parsed (1ms):http://blog.tianya.cn/post-5010184-62718271-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-5010184-62834903-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-5010184-62837336-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-5010184-62889385-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-5010184-62890806-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-542686-63799203-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-544588-62883194-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-78180-58859246-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-78180-59109533-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-78180-62903890-1.shtml
Parsed (0ms):http://blog.tianya.cn/post-78180-62980961-1.shtml
Parsed (1ms):http://blog.tianya.cn/post-863996-62974859-1.shtml
Parsed (1ms):http://blog.tianya.cn/post-959477-62971507-1.shtml
ParseSegment: finished at 2014-07-14 21:05:30, elapsed: 00:00:13
CrawlDb update: starting at 2014-07-14 21:05:30
CrawlDb update: db: data/crawldb
CrawlDb update: segments: [data/segments/20140714205330]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: 404 purging: false
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2014-07-14 21:05:43, elapsed: 00:00:13
LinkDb: starting at 2014-07-14 21:05:43
LinkDb: linkdb: data/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: internal links will be ignored.
LinkDb: adding segment: file:/home/lan/nutch/local/data/segments/20140714205231
LinkDb: adding segment: file:/home/lan/nutch/local/data/segments/20140714205330
LinkDb: finished at 2014-07-14 21:05:53, elapsed: 00:00:10
crawl finished: data

The whole crawl took a little under 14 minutes. Near the end of the log you can see the time spent parsing each page.

Run this command:

grep elapsed nohup.out

The output is:

Injector: finished at 2014-07-14 20:52:23, elapsed: 00:00:19
Generator: finished at 2014-07-14 20:52:39, elapsed: 00:00:15
Fetcher: finished at 2014-07-14 20:53:01, elapsed: 00:00:22
ParseSegment: finished at 2014-07-14 20:53:08, elapsed: 00:00:07
CrawlDb update: finished at 2014-07-14 20:53:22, elapsed: 00:00:13
Generator: finished at 2014-07-14 20:53:37, elapsed: 00:00:15
Fetcher: finished at 2014-07-14 21:05:17, elapsed: 00:11:39
ParseSegment: finished at 2014-07-14 21:05:30, elapsed: 00:00:13
CrawlDb update: finished at 2014-07-14 21:05:43, elapsed: 00:00:13
LinkDb: finished at 2014-07-14 21:05:53, elapsed: 00:00:10
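
To see where the time goes, the elapsed values can be added up per stage. The following is a minimal sketch in Python; the function name elapsed_per_stage is my own, and the log lines are simply pasted in from the output above rather than read from nohup.out.

```python
import re

# Lines as printed by grepping "elapsed" in nohup.out (copied from the run above).
LOG = """\
Injector: finished at 2014-07-14 20:52:23, elapsed: 00:00:19
Generator: finished at 2014-07-14 20:52:39, elapsed: 00:00:15
Fetcher: finished at 2014-07-14 20:53:01, elapsed: 00:00:22
ParseSegment: finished at 2014-07-14 20:53:08, elapsed: 00:00:07
CrawlDb update: finished at 2014-07-14 20:53:22, elapsed: 00:00:13
Generator: finished at 2014-07-14 20:53:37, elapsed: 00:00:15
Fetcher: finished at 2014-07-14 21:05:17, elapsed: 00:11:39
ParseSegment: finished at 2014-07-14 21:05:30, elapsed: 00:00:13
CrawlDb update: finished at 2014-07-14 21:05:43, elapsed: 00:00:13
LinkDb: finished at 2014-07-14 21:05:53, elapsed: 00:00:10
"""

LINE = re.compile(
    r"^(?P<stage>[^:]+): finished at .*, elapsed: (?P<h>\d+):(?P<m>\d+):(?P<s>\d+)$"
)


def elapsed_per_stage(text):
    """Return {stage name: total seconds}, summed over all rounds."""
    totals = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match is None:
            continue  # not an "elapsed" line
        seconds = int(match["h"]) * 3600 + int(match["m"]) * 60 + int(match["s"])
        totals[match["stage"]] = totals.get(match["stage"], 0) + seconds
    return totals


totals = elapsed_per_stage(LOG)
# Print the slowest stage first.
for stage, seconds in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{stage:15s} {seconds:4d} s")
print("total", sum(totals.values()), "s")
```

With these ten lines the Fetcher accounts for 721 of the 826 seconds, so fetching is the only stage worth tuning here.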

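Every crawl starts with the Injector (inject the URLs). One round then consists of Generator (build the fetch list), Fetcher (fetch), ParseSegment (parse the content) and CrawlDb update (update the crawldb), and the crawl ends with LinkDb. Because no topN was given, the second-level fetch took more than 11 minutes: every URL sits in the single blog.tianya.cn queue, which the log shows with maxThreads = 1 and crawlDelay = 5000, so pages are fetched one at a time with at least five seconds between them, no matter how many threads are running.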
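
A rough way to check this against the log is to look at the nextFetchTime values in the queue dumps. The sketch below is only a back-of-the-envelope model in Python: it assumes the gap between two consecutive nextFetchTime values is the crawlDelay plus the time the fetch itself took, and the page count of 100 is a hypothetical figure, not one taken from the log.

```python
# Distinct nextFetchTime values (milliseconds) copied from the queue dumps in the log.
NEXT_FETCH_TIMES = [
    1405343081023,
    1405343087577,
    1405343092743,
    1405343098775,
    1405343105959,
]


def mean_interval_ms(times):
    """Average gap between consecutive nextFetchTime values."""
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


def estimated_fetch_seconds(n_urls, interval_ms):
    """Time needed to fetch n_urls one after another from a single host queue."""
    return n_urls * interval_ms / 1000.0


interval = mean_interval_ms(NEXT_FETCH_TIMES)
print(f"mean interval between fetches: {interval:.0f} ms")

# 100 is a hypothetical number of pages in the second round.
minutes = estimated_fetch_seconds(100, interval) / 60
print(f"about {minutes:.1f} minutes for 100 URLs from one host")
```

The estimate lands in the same range as the 11 minutes 39 seconds reported by the Fetcher, which is consistent with the per-host delay, rather than the thread count, being the limit.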

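This article describes the steps of Nutch's crawl command in detail: http://www.cnblogs.com/huligong1234/p/3515214.html

The commands below are only shown by example; see the link above for detailed explanations.

readdb:  readdb is an alias for "org.apache.nutch.crawl.CrawlDbReader". It prints or exports information from the crawl database (crawldb).

      Example 1: ./bin/nutch readdb data/crawldb -stats

          data/crawldb is where the data is stored once the crawl has finished

          -stats prints statistics to standard output, such as the total number of URLs and how many have and have not been fetched

The output is: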

CrawlDb statistics start: data/crawldb
Statistics for CrawlDb: data/crawldb
TOTAL urls: 1469
retry 0: 1469
min score: 0.0
avg score: 0.0017549354
max score: 1.032
status 1 (db_unfetched): 1368
status 2 (db_fetched): 97
status 4 (db_redir_temp): 3
status 5 (db_redir_perm): 1
CrawlDb statistics: done
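
As a quick sanity check on these numbers, the status counts should add up to the total. Here is a small Python sketch that parses the pasted output; parse_stats is a name I made up for illustration, and the text is copied from the statistics above rather than read from Nutch.

```python
import re

# Copied from the -stats output above.
STATS = """\
TOTAL urls: 1469
retry 0: 1469
min score: 0.0
avg score: 0.0017549354
max score: 1.032
status 1 (db_unfetched): 1368
status 2 (db_fetched): 97
status 4 (db_redir_temp): 3
status 5 (db_redir_perm): 1
"""


def parse_stats(text):
    """Return (total URL count, {status name: count})."""
    total = None
    by_status = {}
    for line in text.splitlines():
        line = line.strip()
        match = re.match(r"TOTAL urls:\s*(\d+)$", line)
        if match:
            total = int(match.group(1))
            continue
        match = re.match(r"status \d+ \((\w+)\):\s*(\d+)$", line)
        if match:
            by_status[match.group(1)] = int(match.group(2))
    return total, by_status


total, by_status = parse_stats(STATS)
for name, count in by_status.items():
    print(f"{name:15s} {count:5d}  {100.0 * count / total:5.1f}%")
```

Most URLs are still db_unfetched because the crawl stopped after the second round, so the links discovered there were never fetched.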

      Example 2: ./bin/nutch readdb data/crawldb -dump data/crawldb/crawldb_dump

          -dump writes the crawldb records as text to the output path given after it

      Example 3: ./bin/nutch readdb data/crawldb -url http://zxcvbnm20111.blog.tianya.cn/

          Prints the details of the URL http://zxcvbnm20111.blog.tianya.cn/

          I found this URL in data/crawldb/crawldb_dump after running the command from Example 2

          The output is:

CrawlDb dump: starting
CrawlDb db: data/crawldb
CrawlDb dump: done
lan@Ubuntu1:~/nutch/local$
lan@Ubuntu1:~/nutch/local$ ./bin/nutch readdb data/crawldb -url http://zxcvbnm20111.blog.tianya.cn/
URL: http://zxcvbnm20111.blog.tianya.cn/
Version: 7
Status: 1 (db_unfetched)
Fetch time: Mon Jul 14 21:05:40 CST 2014
Modified time: Thu Jan 01 08:30:00 CST 1970
Retries since fetch: 0
Retry interval: 2592000 seconds (30 days)
Score: 2.9411764E-4
Signature: null
Metadata:

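      Example 4: ./bin/nutch readdb data/crawldb -topN 10 data/crawldb/crawldb_topN 0.5

          Writes the 10 highest-scoring URLs whose score is >= 0.5, together with their scores, to data/crawldb/crawldb_topN

readseg:  Example 1: ./bin/nutch readseg -dump data/segments/20140714205330 data/segments/dump -nocontent -nofetch -noparse -noparsedata -noparsetext

          Dumps the information produced by the generate step of the segment to data/segments/dump (-nogenerate is the only -no option missing from the arguments, so only the generate data is written)

          To see the fetch information, change -nofetch to -nogenerate

          To see the content information, change -nocontent to -nogenerate

          The same applies to parse, parsedata and parsetext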
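
The rule is "pass every -no option except the one for the part you want to see". A small Python helper makes that explicit; readseg_dump_command is a hypothetical name of my own, and it only builds the command line without running Nutch.

```python
# The six parts of a segment that readseg -dump can leave out.
PARTS = ["content", "fetch", "generate", "parse", "parsedata", "parsetext"]


def readseg_dump_command(segment, out_dir, keep):
    """Build a readseg -dump command that writes only the part named `keep`."""
    if keep not in PARTS:
        raise ValueError(f"unknown part: {keep}")
    # Suppress every part except the one we want to keep.
    flags = [f"-no{part}" for part in PARTS if part != keep]
    return ["./bin/nutch", "readseg", "-dump", segment, out_dir] + flags


command = readseg_dump_command(
    "data/segments/20140714205330", "data/segments/dump", keep="generate"
)
print(" ".join(command))
```

With keep="generate" this prints exactly the command from Example 1; with keep="fetch" the -nofetch flag disappears and -nogenerate takes its place.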

      Example 2: ./bin/nutch readseg -list -dir data/segments

          Lists the segments produced by each round

      Example 3: ./bin/nutch readseg -get data/segments/20140714205231 http://blog.tianya.cn/

          Shows what a segment holds for one URL. Wow, that is a big pile of HTML source and content~

readlinkdb:  Example 1: ./bin/nutch readlinkdb data/linkdb -dump data/linkdb/dump

            Dumps the contents of the linkdb to data/linkdb/dump

         Example 2: ./bin/nutch readlinkdb data/linkdb -url http://cnrdn.com/4NJC

            Looks up one specific URL. I copied this URL from the dump file above

            The output is the same text as the lines below that URL in the dump file

Nutch的抓取周期

generate -> fetch -> parse -> update db

实际上,crawl命令等于inject命令+generate命令+fetch命令+parse命令+updatedb命令+invertlinks命令:

inject:     例子1:  ./bin/nutch inject data/crawldb urls

            把要抓取的url注入到crawldb中。url存放在urls文件夹中的所有文件中,注入到data/crawldb中。

            要保证data不存在

generate:      例子: ./bin/nutch generate data/crawldb data/segments

fetch:      例子:./bin/nutch fetch data/segments/20140716205702 -threads 3

parse:     例子:./bin/nutch parse data/segments/20140716205702

updatedb:   例子: ./bin/nutch updatedb data/crawldb -dir data/segments

mergesegs:   例子:./bin/nutch mergesegs data2/segments_all -dir data2/segments

            要注意,在segments文件夹及其子文件夹中不要有自己另外生成的东西

            非常有用的命令,合并之后文件变小。文件越多越大,合并效果越好。I/O越快
            类似的还有mergedb、mergelinkdb命令

invertlinks:  例子:./bin/nutch invertlinks data/linkdb -dir data/segments

            要注意,在segments文件夹及其子文件夹中不要有自己另外生成的东西。

            计算反向链接分析新输入的segment目录,产生新的反向链接库
            把新产生的反向链接库与原来的库进行合并

            通过计算有多少个网页指向当前网页,来计算当前网页的分值

parsechecker:  Example 1: ./bin/nutch parsechecker http://apdplat.org

            A convenient way to see which links a page contains

        Example 2: ./bin/nutch parsechecker -dumpText http://apdplat.org

            Shows only the text of the page
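The cycle above can be sketched as a small loop. This is only a rough outline under a few assumptions of mine: the function names and the NUTCH variable are hypothetical, the newest segment is taken to be the last directory name in sorted order (segment names are timestamps), and updatedb is given the single new segment instead of -dir.

```shell
#!/bin/sh
# Path to the nutch launcher; NUTCH is a hypothetical override variable.
NUTCH=${NUTCH:-./bin/nutch}

# latest_segment DIR: print the newest segment under DIR.
# Segment names are timestamps, so the last name in sorted order is the newest.
latest_segment() {
  ls -d "$1"/* | sort | tail -n 1
}

# crawl_rounds URLS DATA ROUNDS: inject once, repeat the cycle ROUNDS times,
# then rebuild the linkdb from all segments.
crawl_rounds() {
  urls=$1
  data=$2
  rounds=$3
  "$NUTCH" inject "$data/crawldb" "$urls" || return 1
  i=0
  while [ "$i" -lt "$rounds" ]; do
    # Assumption: a non-zero exit from generate means no segment was produced.
    "$NUTCH" generate "$data/crawldb" "$data/segments" || break
    seg=$(latest_segment "$data/segments")
    "$NUTCH" fetch "$seg" -threads 3 || return 1
    "$NUTCH" parse "$seg" || return 1
    "$NUTCH" updatedb "$data/crawldb" "$seg" || return 1
    i=$((i + 1))
  done
  "$NUTCH" invertlinks "$data/linkdb" -dir "$data/segments"
}
```

Calling crawl_rounds urls data 2 from the local folder would roughly correspond to a crawl of depth 2.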

Domain statistics

./bin/nutch domainstats data/crawldb/current host host

The first host is the output directory; the second host is the aggregation level

./bin/nutch domainstats data/crawldb/current domain domain

./bin/nutch domainstats data/crawldb/current suffix suffix

./bin/nutch domainstats data/crawldb/current tld tld

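From the host level to the tld level the output gets progressively shorter, because each later level groups together the entries of the level before it.

Take http://www.cnblogs.com.cn/ as an example: the host is www.cnblogs.com.cn, the domain is cnblogs.com.cn, the suffix is com.cn, and the tld (top-level domain) is cn. For http://www.cnblogs.com, the suffix and the tld are both com.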
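To compare the four levels side by side, the four commands above can be run in one loop. A minimal sketch; the function name, the NUTCH variable and the idea of writing each level into its own subdirectory are my own choices, not something Nutch requires.

```shell
#!/bin/sh
# Path to the nutch launcher; NUTCH is a hypothetical override variable.
NUTCH=${NUTCH:-./bin/nutch}

# domainstats_all CRAWLDB_CURRENT OUTROOT: run domainstats at all four levels,
# writing each level to its own directory under OUTROOT.
domainstats_all() {
  for level in host domain suffix tld; do
    "$NUTCH" domainstats "$1" "$2/$level" "$level" || return 1
  done
}
```

For example, domainstats_all data/crawldb/current stats would leave the results in stats/host, stats/domain, stats/suffix and stats/tld.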

webgraph

./bin/nutch webgraph -segmentDir data/segments -webgraphdb data/webgraphdb

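Specifies the segments input path and the webgraphdb output path. Outlinks, Inlinks and Nodes are generated under data/webgraphdb

They hold, respectively, the outgoing links and their count, the incoming links and their count, and each URL with its score

Right after the first webgraph run, every URL in nodes has a score of 0, so the linkrank command has to be run next

Outgoing links are stored in parse_data, so parse_data is the input of OutLinkDb

From the outgoing links the incoming links of every page can be derived, and from those each page's score can be computed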

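nodedumper and linkrank

./bin/nutch nodedumper -topn 1 -inlinks -output inlinks_topn_1 -webgraphdb data/webgraphdb
  Inspects the contents of data/webgraphdb; the output shows each URL and its number of incoming links

  -asSequenceFile writes the output as a sequence file; sequence files are binary, so it is not used here

  -topn: limits the output to the top n entries

  -inlinks: sorts by number of incoming links in descending order; -outlinks and -scores work the same way

  -output: the output directory

  -webgraphdb: the path of the webgraphdb

  When sorting by scores, every URL in the generated file has a score of 0,

  which shows that after the webgraph command alone all URL scores are still 0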

./bin/nutch linkrank -webgraphdb data/webgraphdb

Computes the scores and stores them

Then run:

./bin/nutch nodedumper -topn 1 -scores -output after-inject-scores -webgraphdb data/webgraphdb

The URL scores in the files under after-inject-scores are no longer 0

./bin/nutch nodedumper -group domain sum -inlinks -output inlink_domain_sum -webgraphdb data/webgraphdb

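Produces grouped data

domain can be replaced with host, and sum with max. Both arguments must come right after -group

If -topn 1 is added to the command above and the output path is changed to inlink_domain_sum_1, some of the incoming-link counts in the resulting file turn out smaller

This suggests that nodedumper groups first and then sums only the top 1 entry of each group (so the sum equals the largest incoming-link count in the group)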
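To make the "group first, then sum the top n of each group" behaviour concrete, here is a toy model in plain sort and awk. It is not Nutch's implementation, only a sketch of the behaviour inferred above; the function name and the "group value" input format are made up for the illustration.

```shell
#!/bin/sh
# group_sum_topn N: read "group value" lines on stdin, keep the N largest
# values of each group, and print "group sum" for every group.
group_sum_topn() {
  # Sort by group, then by value in descending numeric order,
  # so the first N lines of each group are its N largest values.
  sort -k1,1 -k2,2nr | awk -v n="$1" '
    $1 != g { if (NR > 1) print g, s; g = $1; s = 0; c = 0 }
    c < n   { s += $2; c++ }
    END     { if (NR > 0) print g, s }'
}
```

With the input lines "a.com 3", "a.com 5" and "b.com 2", calling group_sum_topn 1 reports 5 for a.com (its largest value), while group_sum_topn 2 reports 8, which mirrors the smaller counts seen when -topn 1 is added.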

Injecting scores

./bin/nutch scoreupdater -crawldb data/crawldb -webgraphdb data/webgraphdb

By default the crawl command uses the opic plugin to compute scores. The webgraph scoring approach has been available since Nutch 1.0 and is more complete.
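The webgraph, linkrank and scoreupdater commands from the last three sections are meant to run in that order, so they can be chained. A minimal sketch; the function name and the NUTCH variable are hypothetical, and the directory layout (segments, webgraphdb and crawldb under one data folder) is the one used in these notes.

```shell
#!/bin/sh
# Path to the nutch launcher; NUTCH is a hypothetical override variable.
NUTCH=${NUTCH:-./bin/nutch}

# rescore DATA: rebuild the web graph, compute link-rank scores and
# write them back into the crawldb. Stops at the first failing step.
rescore() {
  data=$1
  "$NUTCH" webgraph -segmentDir "$data/segments" -webgraphdb "$data/webgraphdb" &&
  "$NUTCH" linkrank -webgraphdb "$data/webgraphdb" &&
  "$NUTCH" scoreupdater -crawldb "$data/crawldb" -webgraphdb "$data/webgraphdb"
}
```

Running rescore data after a crawl would then replace the opic scores in data/crawldb with the link-rank scores.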

Lightweight crawling with freegen

./bin/nutch freegen urls2 data3/segments

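The urls2 folder holds a newly created URL file containing a single URL: http://apdplat.org

The newly generated segment is written to data3/segments

This command bypasses the potentially huge crawldb and generates segments directly from a given set of URLs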

Configuring the Solr server

Check that the indexing plugins are configured correctly:

./bin/nutch indexchecker http://www.163.com

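In the output, title and content are the most important fields

It took me a long time to find download links for 3.6.2 and 4.2.0; they can no longer be downloaded from the home page.

All Solr versions can be downloaded here: http://archive.apache.org/dist/lucene/solr/

Version 3.6.2 is used here

Configuring Solr

1. Copy Nutch's conf/schema.xml into Solr's /example/solr/conf, backing up Solr's own schema.xml first

Search for index- in Nutch's conf/nutch-default.xml and you will find the following XML fragment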

    <property>
      <name>plugin.includes</name>
      <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
      <description>Regular expression naming plugin directory names to include. Any plugin not matching this expression is excluded. In any case you need at least include the nutch-extensionpoints plugin. By default Nutch includes crawling just HTML and plain text via HTTP, and basic indexing and search plugins. In order to use HTTPS please enable protocol-httpclient, but be aware of possible intermittent problems with the underlying commons-httpclient library.</description>
    </property>

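This is the setting that includes plugins into Nutch; index-basic and index-anchor are two of those plugins

Open Solr's /example/solr/conf/schema.xml (the Nutch schema.xml copied a moment ago)

Search for index- again; there are two sections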







    <!-- fields for index-anchor plugin -->  
    <field name="anchor" type="string" stored="true" indexed="true"  
        multiValued="true"/>

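The field elements define the fields of these two plugins; above them there is also the id field

2. In Solr's /example/solr/conf/solrconfig.xml, change every occurrence of text to content

The default search field should be content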

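3. Start Solr: go to the example directory and run start.jar: java -jar start.jar &   (runs in the background)

If step 2 was skipped, Solr reports an error saying that text cannot be found

4. Open localhost:8983 in a browser

You should see the Solr admin page. Solr embeds a Jetty server, so it can be administered through the browser

5. Back in Nutch's local directory, run bin/nutch | grep solr

Three commands show up: solrindex (builds the Solr index), solrdedup (removes duplicates) and solrclean (removes 301 permanent redirects and 404 URLs)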

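6. Submit the index to http://localhost:8983/solr using crawldb, linkdb and segments:

In the local directory run bin/nutch solrindex http://localhost:8983/solr data/crawldb -linkdb data/linkdb -dir data/segments

The output includes: Indexing 3 documents

With more than 250 documents, indexing happens in several batches. The batch size can be set through solr.commit.size in conf/nutch-default.xml or nutch-site.xml (nutch-default.xml has a sample entry)

A larger value improves throughput but uses more memory

Solr stores the index in example/solr/data/index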
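Steps 5 and 6 can be wrapped into one helper that indexes and then removes duplicates. This is a sketch only: the function name and the NUTCH variable are hypothetical, and calling solrdedup with just the Solr URL is how I recall its usage in Nutch 1.6, so check the output of bin/nutch solrdedup before relying on it.

```shell
#!/bin/sh
# Path to the nutch launcher; NUTCH is a hypothetical override variable.
NUTCH=${NUTCH:-./bin/nutch}

# index_to_solr SOLR_URL DATA: send crawldb, linkdb and all segments under
# DATA to Solr, then remove duplicate documents from the Solr index.
index_to_solr() {
  "$NUTCH" solrindex "$1" "$2/crawldb" -linkdb "$2/linkdb" -dir "$2/segments" || return 1
  # Assumption: solrdedup takes only the Solr URL as its argument.
  "$NUTCH" solrdedup "$1"
}
```

For the setup in these notes the call would be index_to_solr http://localhost:8983/solr data.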

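Using Luke

Luke is a toolbox for Lucene indexes; it makes it easy to inspect and search an index, which helps with debugging

Download: http://code.google.com/p/luke/downloads/list

The version used here is lukeall-4.0.0-ALPHA.jar

Copy Solr's example/solr/data/index directory to the local machine (I copied the index directory to the Windows desktop and put the Luke jar there as well)

Double-click the jar to run Luke. Luke prompts you to point it at the index folder.

Luke shows 10 fields. The box at the lower left lists them as defined in schema.xml: 1 id, 3 core fields, 5 index-basic fields (I do not know why the cache field is missing) and 1 index-anchor field

Select a field and click Show top terms to see the actual terms

The id field is stored whole and is not tokenized

Click the Documents tab to browse the fields by document number

Click the green left arrow at the upper left once (only so that the box gets some content and the field information is displayed), then press the green right arrow.

Pick a term from the title field, for example 2014. Click the Search tab, enter title:2014 in the box at the upper left and click Search; the matching documents are found. Note that some titles may not be found: the analyzer used at indexing time and the one used at search time must be the same! For example, with title=明星娱乐圈, Luke splits the whole title into single characters before searching, so nothing is found. Setting the analyzer is covered later.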

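Configuring the mmseg4j tokenizer in Solr

Solr's built-in tokenizer handles Chinese poorly, so Luke searches fail to find indexed content; mmseg4j is used instead

Download: https://code.google.com/p/mmseg4j/downloads/list

The version used here is mmseg4j-1.8.5.zip

1. Stop Solr. Use the jps command to find the process ID, then run kill -9 <pid> to stop Solr

2. Delete Solr's example/solr/data directory

3. Create a lib folder under Solr's example/solr

4. Copy mmseg4j-all-1.8.5-with-dic.jar from mmseg4j into Solr's example/solr/lib so that the Solr server loads the jar

5. Edit Solr's example/solr/conf/schema.xml:

  Replace

  with

  In other words, Solr is configured to tokenize with mmseg4j's classes, because the default tokenization works poorly for Chinese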

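Then start the Solr server again, submit the index to Solr, and open the index folder in Luke; searches on Chinese fields now find results

Number of terms in Luke drops noticeably, because with mmseg4j Solr groups several Chinese characters into one term, whereas before each character was probably its own term.

Searches can also be run from the Query String box on the localhost:8983/solr/admin page; the syntax is the same as in Luke: field:value

Although Solr's tokenizer is now configured, Luke itself is not yet using mmseg4j as its analyzer,

so searching for "title:客户端" in Luke splits the query into 3 terms,

whereas with mmseg4j configured as Luke's analyzer, "客户端" is treated as a single term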
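The file-system part of the steps above (2 to 4) can be scripted. A minimal sketch, assuming Solr has already been stopped; the function name is hypothetical, and the schema.xml edit from step 5 still has to be done by hand.

```shell
#!/bin/sh
# install_mmseg4j SOLR_HOME JAR: drop the old index and install the jar.
# SOLR_HOME is the directory that holds conf/ and data/ (example/solr above).
# Stop Solr before calling this.
install_mmseg4j() {
  solr_home=$1
  jar=$2
  # Refuse to delete anything if the jar is missing.
  [ -f "$jar" ] || { echo "jar not found: $jar" >&2; return 1; }
  rm -rf "$solr_home/data"      # step 2: delete the old index
  mkdir -p "$solr_home/lib"     # step 3: create lib if it does not exist
  cp "$jar" "$solr_home/lib/"   # step 4: put the jar where Solr loads it
}
```

For example: install_mmseg4j example/solr mmseg4j-all-1.8.5-with-dic.jar, run from the Solr directory with the jar copied next to it.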

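Configuring mmseg4j in Luke

Although Luke can now find the indexed content, Solr and Luke should preferably use the same analyzer.

Version 1.8.5 conflicts with Luke 4.0, so mmseg4j 1.9.1 is used with Luke; the download link is given above.

The file used here is mmseg4j-1.9.1.v20130120-SNAPSHOT.zip

Extract the three jars in the dist directory of mmseg4j-1.9.1, and copy the extracted data and com folders into the Luke jar

Open Luke and point it at the index directory. On the Search tab, the drop-down on the right selects the analyzer class; choose ComplexAnalyzer, and to the right of the drop-down set the default field to content

Search for: title:客户端

The Query Details box shows the single term "客户端"; without this configuration, "客户端" would be split into 3 terms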

solr4.2

All Solr directories mentioned below are relative to

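1. Solr 4.2 has an additional collection1 folder under example/solr/. Copy Nutch's local/conf/schema-solr4.xml into the conf directory of Solr 4.2's example/solr/collection1 and rename it to schema.xml

2. With Solr 4.2 the text-to-content replacement is not needed. How do you tell which field is expected? Open schema.xml and scroll down to this tag:

content

In Solr 4.2 this tag contains text, so nothing needs changing. In Solr 3.6.2 it contains content, which is why every text has to be changed to content

3. Add a _version_ field inside the fields tag of schema.xml, otherwise Solr reports an error on startup:

4. Start Solr, again by running start.jar

5. Copy the mmseg4j-1.9.1 jars. As before, copying the jars is all it takes: copy the jars from mmseg4j's dist directory into the lib directory of Solr 4.2's collection1, creating the lib folder with mkdir first if it does not exist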

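6. Configure mmseg4j.

  Edit Solr's example/solr/collection1/conf/schema.xml:

  Replace

  with

  Take care not to change any tokenizer other than WhitespaceTokenizerFactory and StandardTokenizerFactory

After that the index can be submitted to Solr

Once the index is submitted, the number of indexed documents and other details can be viewed under core admin at localhost:8983.

The Core Selector drop-down at the lower left lets you choose collection1; then click query below it, and the index can be queried in the panel on the right.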

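Installing Nutch on Cygwin

Running Linux in a virtual machine on Windows is fairly heavyweight;

my Ubuntu VM needs 4 GB of memory to run without too much lag.

Cygwin is lightweight by comparison, and Windows resources are easy to use from within the Cygwin environment.

If the Java path contains spaces, the nutch crawl command fails.

For example, with Cygwin, first copy apache-nutch-1.6 into home/Administrator under the Cygwin directory (the name depends on your Windows user name),

then open Cygwin, go to Nutch's bin directory and run ./nutch crawl; it reports that the directory does not exist (if Java is installed under c:/Program Files/).

Solution:

Copy the whole Java directory into Cygwin's home/Administrator directory

and set JAVA_HOME to c:/cygwin/home/Administrator/Java/jdk1.6.0_21

If the machine has several JDKs, set NUTCH_JAVA_HOME for Cygwin instead.

Note that environment variables in Cygwin hold Windows paths.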
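Before running the crawl in Cygwin, it may help to check whether the Java path contains a space. A minimal sketch; both function names are my own, and it assumes that NUTCH_JAVA_HOME, when set, takes precedence over JAVA_HOME.

```shell
#!/bin/sh
# has_space PATH: succeed if PATH contains a space.
has_space() {
  case $1 in
    *" "*) return 0 ;;
    *)     return 1 ;;
  esac
}

# check_java_home: report whether the Java path Nutch would use is free of spaces.
# Assumption: NUTCH_JAVA_HOME, when set, takes precedence over JAVA_HOME.
check_java_home() {
  java_home=${NUTCH_JAVA_HOME:-$JAVA_HOME}
  if [ -z "$java_home" ]; then
    echo "neither NUTCH_JAVA_HOME nor JAVA_HOME is set"
    return 1
  fi
  if has_space "$java_home"; then
    echo "Java path contains a space: $java_home"
    return 1
  fi
  echo "Java path looks usable: $java_home"
}
```

With JAVA_HOME pointing into c:/Program Files/ the check fails; after copying the JDK as described and exporting the new path it passes.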

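Nutch and Hadoop

A Hadoop tutorial is available here: http://www.cnblogs.com/lanhj/p/3841709.html

The Nutch used here is the deploy folder produced earlier by the ant build, i.e. the version in which Nutch submits its jobs to Hadoop

Add the http.agent.name property to Nutch's conf/nutch-site.xml (shown earlier) and the configuration is done

Of course Hadoop must at least be able to run its bundled WordCount.java successfully, and the HADOOP_HOME environment variable must be set.

As before, create a urls folder in deploy and put a file containing the URLs in it

Then run the crawl command: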

bin/nutch crawl urls -dir data -threads 50 -depth 2 -topN 1

Did it fail with an invalid input error?

The reason: the job is run by Hadoop, and Hadoop's default paths are on HDFS, so urls has to be uploaded to HDFS:

hadoop fs -put urls /user/xxx/

hadoop fs -ls /user/xxx

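The first command uploads urls to /user/xxx on HDFS (the Nutch job expects the injected urls under /user/<user name>/); the second lists the files in that directory.

urls is now on HDFS (in my other post I was too lazy to write about HDFS concepts, commands and the API for now. I still have the slides and code from before and will upload them when I have time)

Run the crawl command again.

While the command is running, open http://localhost:50030 (Hadoop's page for MapReduce and the JobTracker)

where the Map and Reduce tasks are visible

Once it finishes, list /user/xxx on HDFS again; a data folder has been created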
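The whole sequence (upload the seeds, crawl, check the result) can be sketched as one function. The function name and the NUTCH and HADOOP variables are hypothetical; /user/xxx stands for your own HDFS home directory, as in the commands above.

```shell
#!/bin/sh
# Launchers; NUTCH and HADOOP are hypothetical override variables.
NUTCH=${NUTCH:-bin/nutch}
HADOOP=${HADOOP:-hadoop}

# crawl_on_hadoop URLS HDFS_HOME: upload the seed directory to HDFS,
# run the crawl, then list HDFS_HOME so the new data folder shows up.
crawl_on_hadoop() {
  urls=$1
  hdfs_home=$2          # for example /user/xxx
  "$HADOOP" fs -put "$urls" "$hdfs_home/" || return 1
  "$HADOOP" fs -ls "$hdfs_home" || return 1
  "$NUTCH" crawl "$urls" -dir data -threads 50 -depth 2 -topN 1 || return 1
  "$HADOOP" fs -ls "$hdfs_home"
}
```

From the deploy folder the call would be crawl_on_hadoop urls /user/xxx.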

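If the HDFS commands are too cumbersome, open localhost:50070 and click Browse the filesystem to view the files on HDFS

This page can only browse directories and files; it cannot delete, upload or update them

localhost:50060 shows TaskTracker information

Hadoop also embeds a Jetty server, which is why its status can be viewed through web pages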

Updates completed on July 22
