Modifier and Type | Class and Description |
---|---|
class | Crawler: the 风铃虫 crawler |
Modifier and Type | Method and Description |
---|---|
boolean | InMemoryRequestCache.exist(Task task, Request request) |
boolean | RequestCache.exist(Task task, Request request): checks whether the request already exists in the collection |
boolean | RedisRequestCache.exist(Task task, Request request) |
long | InMemoryRequestCache.getCount(Task task) |
long | RequestCache.getCount(Task task): returns the number of requests in the specified cache collection |
long | RedisRequestCache.getCount(Task task) |
boolean | InMemoryRequestCache.lookAndCache(Task task, Request request) |
boolean | RequestCache.lookAndCache(Task task, Request request): first checks whether the request exists in the collection, then stores the request in that collection |
boolean | RedisRequestCache.lookAndCache(Task task, Request request) |
void | InMemoryRequestCache.remove(Task task) |
void | RequestCache.remove(Task task): removes the specified cache collection |
void | RedisRequestCache.remove(Task task) |
void | InMemoryRequestCache.save(Task task, Request request) |
void | RequestCache.save(Task task, Request request): stores the request in the collection with the specified name |
void | RedisRequestCache.save(Task task, Request request) |
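
The five `RequestCache` operations listed above can be sketched with a map of per-task URL sets. This is only an illustration: `Task`, `Request`, and `MapBackedRequestCache` below are local stand-ins written for this example, not the library's classes, and the choice that `lookAndCache` returns `true` when the request was already cached is an assumption that the listing does not confirm.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: every type here is a local stand-in.
class RequestCacheSketch {

    // Hypothetical stand-in: a task is identified by a name.
    interface Task {
        String getName();
    }

    // Hypothetical stand-in: a request is identified by its URL.
    static final class Request {
        final String url;
        Request(String url) { this.url = url; }
    }

    // Mirrors the five method signatures from the table above.
    interface RequestCache {
        boolean exist(Task task, Request request);
        long getCount(Task task);
        boolean lookAndCache(Task task, Request request);
        void remove(Task task);
        void save(Task task, Request request);
    }

    // Assumption: one concurrent set of URLs per task name.
    static final class MapBackedRequestCache implements RequestCache {
        private final Map<String, Set<String>> urlsByTask = new ConcurrentHashMap<>();

        private Set<String> urlsOf(Task task) {
            return urlsByTask.computeIfAbsent(task.getName(), k -> ConcurrentHashMap.newKeySet());
        }

        @Override public boolean exist(Task task, Request request) {
            Set<String> urls = urlsByTask.get(task.getName());
            return urls != null && urls.contains(request.url);
        }

        @Override public long getCount(Task task) {
            Set<String> urls = urlsByTask.get(task.getName());
            return urls == null ? 0L : urls.size();
        }

        // Assumed semantics: true means "was already cached"; the request is stored either way.
        @Override public boolean lookAndCache(Task task, Request request) {
            return !urlsOf(task).add(request.url);
        }

        @Override public void remove(Task task) {
            urlsByTask.remove(task.getName());
        }

        @Override public void save(Task task, Request request) {
            urlsOf(task).add(request.url);
        }
    }

    public static void main(String[] args) {
        Task task = () -> "demo";
        RequestCache cache = new MapBackedRequestCache();
        Request first = new Request("https://example.com/a");
        System.out.println(cache.lookAndCache(task, first)); // false: not seen before
        System.out.println(cache.lookAndCache(task, first)); // true: already cached
        System.out.println(cache.getCount(task));            // 1
        cache.remove(task);
        System.out.println(cache.exist(task, first));        // false
    }
}
```

Doing the lookup and the store in a single set insertion keeps `lookAndCache` atomic per URL, which matters if several crawler threads share one cache.
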
Modifier and Type | Method and Description |
---|---|
void | CrawlerListener.exitOnBlock(Task task): the task exits because it was blocked by the target server |
void | SimpleCrawlerListener.exitOnBlock(Task task) |
void | CrawlerListener.exitOnFinish(Task task): the task exits because it has finished |
void | SimpleCrawlerListener.exitOnFinish(Task task) |
void | CrawlerListener.onDownError(Task task, Page page, Exception e): notification that a page download failed |
void | SimpleCrawlerListener.onDownError(Task task, Page page, Exception e) |
void | CrawlerListener.onDownSuccess(Task task, Page page): notification that a page download succeeded |
void | SimpleCrawlerListener.onDownSuccess(Task task, Page page) |
void | CrawlerListener.onExtractError(Task task, Page page, Exception e): notification that page extraction failed |
void | SimpleCrawlerListener.onExtractError(Task task, Page page, Exception e) |
void | CrawlerListener.onExtractSuccess(Task task, Page page): notification that page extraction succeeded |
void | SimpleCrawlerListener.onExtractSuccess(Task task, Page page) |
void | CrawlerListener.onNullRquest(Task task): triggered when the URL of the request obtained from the scheduler is empty |
void | SimpleCrawlerListener.onNullRquest(Task task) |
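
As a rough sketch of how the listener callbacks above might be consumed, the example below counts events per callback. `Task`, `Page`, and `CountingListener` are hypothetical stand-ins defined only for this example; the interface copies the listed signatures, including the `onNullRquest` spelling, but nothing here is taken from the library's implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: every type here is a local stand-in.
class CrawlerListenerSketch {

    interface Task {
        String getName();
    }

    // Hypothetical stand-in for a downloaded page.
    static final class Page {
        final String url;
        Page(String url) { this.url = url; }
    }

    // Mirrors the seven callback signatures from the table above.
    interface CrawlerListener {
        void exitOnBlock(Task task);
        void exitOnFinish(Task task);
        void onDownError(Task task, Page page, Exception e);
        void onDownSuccess(Task task, Page page);
        void onExtractError(Task task, Page page, Exception e);
        void onExtractSuccess(Task task, Page page);
        void onNullRquest(Task task); // spelling kept as listed
    }

    // Hypothetical listener that only keeps counters.
    static final class CountingListener implements CrawlerListener {
        final AtomicInteger downloaded = new AtomicInteger();
        final AtomicInteger downloadErrors = new AtomicInteger();
        final AtomicInteger extracted = new AtomicInteger();
        final AtomicInteger extractErrors = new AtomicInteger();
        final AtomicInteger nullRequests = new AtomicInteger();
        volatile String exitReason = "running";

        @Override public void exitOnBlock(Task task) { exitReason = "blocked"; }
        @Override public void exitOnFinish(Task task) { exitReason = "finished"; }
        @Override public void onDownError(Task task, Page page, Exception e) { downloadErrors.incrementAndGet(); }
        @Override public void onDownSuccess(Task task, Page page) { downloaded.incrementAndGet(); }
        @Override public void onExtractError(Task task, Page page, Exception e) { extractErrors.incrementAndGet(); }
        @Override public void onExtractSuccess(Task task, Page page) { extracted.incrementAndGet(); }
        @Override public void onNullRquest(Task task) { nullRequests.incrementAndGet(); }
    }

    public static void main(String[] args) {
        Task task = () -> "demo";
        CountingListener listener = new CountingListener();
        Page page = new Page("https://example.com/a");

        // Simulate one successful page, one failed download, then a normal exit.
        listener.onDownSuccess(task, page);
        listener.onExtractSuccess(task, page);
        listener.onDownError(task, new Page("https://example.com/b"), new RuntimeException("timeout"));
        listener.exitOnFinish(task);

        System.out.println(listener.downloaded.get());     // 1
        System.out.println(listener.downloadErrors.get()); // 1
        System.out.println(listener.exitReason);           // finished
    }
}
```
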
Modifier and Type | Method and Description |
---|---|
void | SimpleStatuObserver.update(Task task, Statu statu) |
void | StatuObserver.update(Task task, Statu statu): the task's status has changed |
Modifier and Type | Method and Description |
---|---|
void | SchedulerDecorator.clear(Task task) |
void | Scheduler.clear(Task task): clears the task |
Request | SchedulerDecorator.poll(Task task) |
Request | Scheduler.poll(Task task): retrieves one request from the scheduler |
void | SchedulerDecorator.push(Task task, Request request) |
void | Scheduler.push(Task task, Request request): receives all requests and stores them |
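
The `Scheduler` contract above (push, poll, clear) and the decorator idea behind `SchedulerDecorator` can be sketched as follows. All types are local stand-ins for this example: `QueueScheduler` and `CountingScheduler` are hypothetical names, and returning `null` from `poll` when nothing is queued is an assumption rather than documented behavior.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only: every type here is a local stand-in.
class SchedulerSketch {

    interface Task {
        String getName();
    }

    static final class Request {
        final String url;
        Request(String url) { this.url = url; }
    }

    // Mirrors the three method signatures from the table above.
    interface Scheduler {
        void clear(Task task);
        Request poll(Task task);
        void push(Task task, Request request);
    }

    // Hypothetical in-memory scheduler: one FIFO queue per task name.
    static final class QueueScheduler implements Scheduler {
        private final Map<String, Queue<Request>> queues = new ConcurrentHashMap<>();

        @Override public void clear(Task task) {
            queues.remove(task.getName());
        }

        // Assumption: null signals that no request is currently queued.
        @Override public Request poll(Task task) {
            Queue<Request> queue = queues.get(task.getName());
            return queue == null ? null : queue.poll();
        }

        @Override public void push(Task task, Request request) {
            queues.computeIfAbsent(task.getName(), k -> new ConcurrentLinkedQueue<>()).add(request);
        }
    }

    // Hypothetical decorator: delegates every call and counts pushes on the way.
    static final class CountingScheduler implements Scheduler {
        private final Scheduler delegate;
        private final AtomicLong pushed = new AtomicLong();

        CountingScheduler(Scheduler delegate) { this.delegate = delegate; }

        long pushedCount() { return pushed.get(); }

        @Override public void clear(Task task) { delegate.clear(task); }
        @Override public Request poll(Task task) { return delegate.poll(task); }
        @Override public void push(Task task, Request request) {
            pushed.incrementAndGet();
            delegate.push(task, request);
        }
    }

    public static void main(String[] args) {
        Task task = () -> "demo";
        CountingScheduler scheduler = new CountingScheduler(new QueueScheduler());
        scheduler.push(task, new Request("https://example.com/a"));
        scheduler.push(task, new Request("https://example.com/b"));
        System.out.println(scheduler.poll(task).url); // https://example.com/a
        System.out.println(scheduler.pushedCount());  // 2
        scheduler.clear(task);
        System.out.println(scheduler.poll(task));     // null
    }
}
```

Because the decorator implements the same interface as the scheduler it wraps, extra behavior can be layered on without touching the storage-specific implementation.
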
Modifier and Type | Method and Description |
---|---|
void | RedisScheduler.clear(Task task) |
void | SimpleScheduler.clear(Task task) |
Request | RedisScheduler.poll(Task task) |
Request | SimpleScheduler.poll(Task task) |
void | RedisScheduler.push(Task task, Request request) |
void | SimpleScheduler.push(Task task, Request request) |
Modifier and Type | Method and Description |
---|---|
void | SimpleDuplicateRemover.doWhenNoDuplicate(Task task, RequestCache requestCache, Request request) |
void | DuplicateRemover.doWhenNoDuplicate(Task task, RequestCache requestCache, Request request): the action to take when the current request is not a duplicate; typically it is enough to store the request in the request cache |
void | HostDuplicateRemover.doWhenNoDuplicate(Task task, RequestCache requestCache, Request request) |
boolean | SimpleDuplicateRemover.noDuplicate(Task task, RequestCache requestCache, Request request) |
boolean | DuplicateRemover.noDuplicate(Task task, RequestCache requestCache, Request request): determines whether the current request is a duplicate |
boolean | HostDuplicateRemover.noDuplicate(Task task, RequestCache requestCache, Request request) |
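
The two `DuplicateRemover` methods above suggest a check-then-record flow, sketched below. The types are local stand-ins (`RequestCache` is reduced to the two methods this example needs), `UrlDuplicateRemover` and the `accept` helper are hypothetical, and reading `noDuplicate` as "returns `true` when the request has not been seen" is an assumption based on the method name.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only: every type here is a local stand-in.
class DuplicateRemoverSketch {

    interface Task {
        String getName();
    }

    static final class Request {
        final String url;
        Request(String url) { this.url = url; }
    }

    // Reduced stand-in for RequestCache: only the two methods used below.
    interface RequestCache {
        boolean exist(Task task, Request request);
        void save(Task task, Request request);
    }

    // Mirrors the two method signatures from the table above.
    interface DuplicateRemover {
        boolean noDuplicate(Task task, RequestCache requestCache, Request request);
        void doWhenNoDuplicate(Task task, RequestCache requestCache, Request request);
    }

    // Hypothetical single-threaded cache keyed by "taskName|url".
    static final class SetCache implements RequestCache {
        private final Set<String> keys = new HashSet<>();

        private static String key(Task task, Request request) {
            return task.getName() + "|" + request.url;
        }

        @Override public boolean exist(Task task, Request request) {
            return keys.contains(key(task, request));
        }

        @Override public void save(Task task, Request request) {
            keys.add(key(task, request));
        }
    }

    // Hypothetical remover: a request is new when its URL is not cached yet.
    static final class UrlDuplicateRemover implements DuplicateRemover {
        @Override public boolean noDuplicate(Task task, RequestCache requestCache, Request request) {
            return !requestCache.exist(task, request);
        }

        // As the description says, storing the request in the cache is usually enough.
        @Override public void doWhenNoDuplicate(Task task, RequestCache requestCache, Request request) {
            requestCache.save(task, request);
        }
    }

    // Hypothetical driver: returns true when the request is new and was recorded.
    static boolean accept(DuplicateRemover remover, Task task, RequestCache cache, Request request) {
        if (!remover.noDuplicate(task, cache, request)) {
            return false;
        }
        remover.doWhenNoDuplicate(task, cache, request);
        return true;
    }

    public static void main(String[] args) {
        Task task = () -> "demo";
        RequestCache cache = new SetCache();
        DuplicateRemover remover = new UrlDuplicateRemover();
        Request request = new Request("https://example.com/a");
        System.out.println(accept(remover, task, cache, request)); // true: first time seen
        System.out.println(accept(remover, task, cache, request)); // false: duplicate
    }
}
```
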
Modifier and Type | Method and Description |
---|---|
static Task | LocalCrawler.get(): gets the 风铃虫 task information |
Modifier and Type | Method and Description |
---|---|
static void | LocalCrawler.put(Task crawler): stores a 风铃虫 task's information |
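
One plausible way to back a static `get()`/`put(Task)` pair like `LocalCrawler` is a `ThreadLocal`, so that each worker thread sees only the task it stored. The listing does not say how `LocalCrawler` is implemented, so the sketch below is purely an assumption; `Task`, the nested `LocalCrawler`, and `readOnNewThread` are stand-ins written for this example.

```java
// Illustrative sketch only: the ThreadLocal backing is an assumption.
class LocalCrawlerSketch {

    interface Task {
        String getName();
    }

    // Stand-in holder with the same static signatures as listed above.
    static final class LocalCrawler {
        private static final ThreadLocal<Task> CURRENT = new ThreadLocal<>();

        private LocalCrawler() { }

        // Returns the task stored by the calling thread, or null if none was stored.
        static Task get() {
            return CURRENT.get();
        }

        // Stores the task for the calling thread only.
        static void put(Task crawler) {
            CURRENT.set(crawler);
        }
    }

    // Reads LocalCrawler.get() on a fresh thread and returns what that thread saw.
    static Task readOnNewThread() {
        Task[] seen = new Task[1];
        Thread thread = new Thread(() -> seen[0] = LocalCrawler.get());
        thread.start();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }

    public static void main(String[] args) {
        LocalCrawler.put(() -> "demo");
        System.out.println(LocalCrawler.get().getName()); // demo
        System.out.println(readOnNewThread());            // null: other threads see nothing
    }
}
```
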
Copyright © 2020 Pivotal Software, Inc. All rights reserved.