Modifier and Type | Method and Description |
---|---|
ExtractRule | CrawlerBuilder.extractRule(String key)<br>Gets the content extraction rule that matches the given rule code. |
Modifier and Type | Method and Description |
---|---|
List<ExtractRule> | CrawlerBuilder.extractRules()<br>Gets all content extraction rules. |
Modifier and Type | Method and Description |
---|---|
CrawlerBuilder | CrawlerBuilder.addExtractRule(ExtractRule extractRule)<br>Adds a content extraction rule. |
static SimulatorData | Crawler.testContent(String url, SiteRule siteRule, ExtractRule contentExtractRule)<br>Tests a content extraction rule using the default downloader. |
static SimulatorData | Crawler.testContent(String url, SiteRule siteRule, ExtractRule contentExtractRule, Downloader downloader)<br>Tests a content extraction rule using a custom downloader. |
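
The two `Crawler.testContent` overloads differ only in the downloader argument. A common way to structure such a pair is for the shorter overload to delegate to the longer one with a default value. The sketch below illustrates that idea only. `TestContentSketch`, `DEFAULT_DOWNLOADER`, the `Function`-based downloader, and the tag-name "rule" are hypothetical stand-ins, not the library's `Crawler`, `Downloader`, `ExtractRule`, or `SimulatorData`, and whether the library delegates this way is an assumption.

```java
import java.util.function.Function;

// Hypothetical stand-in, NOT the library's Crawler class.
final class TestContentSketch {

    // Assumed default downloader: returns a fixed page body, no network access.
    static final Function<String, String> DEFAULT_DOWNLOADER =
            url -> "<html><title>default:" + url + "</title></html>";

    // Overload without a downloader: delegates using the default one.
    static String testContent(String url, String tag) {
        return testContent(url, tag, DEFAULT_DOWNLOADER);
    }

    // Overload with a caller-supplied downloader.
    // The "rule" is modeled as a tag name; the text between <tag> and </tag> is returned.
    static String testContent(String url, String tag, Function<String, String> downloader) {
        String body = downloader.apply(url);
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        int start = body.indexOf(open);
        if (start < 0) {
            return "";
        }
        int contentStart = start + open.length();
        int end = body.indexOf(close, contentStart);
        return end < 0 ? "" : body.substring(contentStart, end);
    }
}
```

Passing a custom downloader, for example a lambda that returns canned HTML, makes it possible to test a rule without fetching the real page.
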
Modifier and Type | Method and Description |
---|---|
CrawlerBuilder | CrawlerBuilder.addExtractRules(List<ExtractRule> list)<br>Adds content extraction rules. |
CrawlerBuilder | CrawlerBuilder.setExtractRules(List<ExtractRule> list)<br>Sets the content extraction rules. This clears the existing content extraction rules. |
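
The difference between adding and setting rules can be sketched as follows: adding appends to the rules already registered, while setting clears them first. `RuleStub` and `BuilderSketch` are hypothetical stand-ins that model only this behavior. They are not the library's `ExtractRule` or `CrawlerBuilder`, and returning a defensive copy from `extractRules()` is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ExtractRule; only a rule code is modeled.
final class RuleStub {
    final String key;

    RuleStub(String key) {
        this.key = key;
    }
}

// Hypothetical stand-in for CrawlerBuilder; only the rule-list methods are modeled.
final class BuilderSketch {
    private final List<RuleStub> rules = new ArrayList<>();

    // Appends to the existing rules and returns this builder for chaining.
    BuilderSketch addExtractRules(List<RuleStub> list) {
        rules.addAll(list);
        return this;
    }

    // Clears the existing rules, then stores the given ones.
    BuilderSketch setExtractRules(List<RuleStub> list) {
        rules.clear();
        rules.addAll(list);
        return this;
    }

    // Returns a copy so callers cannot modify the builder's internal list (an assumption).
    List<RuleStub> extractRules() {
        return new ArrayList<>(rules);
    }
}
```
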
Modifier and Type | Method and Description |
---|---|
void | ContentExtract.extract(ContentRule contentRule, List<ExtractRule> rules, Page page)<br>Parses all data that meets the requirements from the web page content. |
void | ContentExtractDecorator.extract(ContentRule contentRule, List<ExtractRule> rules, Page page) |
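
`ContentExtractDecorator.extract` has the same signature as `ContentExtract.extract`, and its name suggests the decorator pattern: an object that implements the same interface, wraps a delegate, and adds behavior around the call. This page does not document what the decorator actually adds, so the sketch below is only an illustration of the pattern. `ExtractStep`, `PlainExtract`, and `CountingExtractDecorator` are hypothetical stand-ins, with rules modeled as strings and the page as a `StringBuilder`.

```java
import java.util.List;

// Hypothetical stand-in for the ContentExtract interface.
interface ExtractStep {
    void extract(List<String> rules, StringBuilder page);
}

// Base implementation: records each rule key on the page in brackets.
final class PlainExtract implements ExtractStep {
    @Override
    public void extract(List<String> rules, StringBuilder page) {
        for (String rule : rules) {
            page.append('[').append(rule).append(']');
        }
    }
}

// Decorator: same interface, forwards to a delegate and counts the calls.
final class CountingExtractDecorator implements ExtractStep {
    private final ExtractStep delegate;
    int calls = 0;

    CountingExtractDecorator(ExtractStep delegate) {
        this.delegate = delegate;
    }

    @Override
    public void extract(List<String> rules, StringBuilder page) {
        calls++;                        // added behavior
        delegate.extract(rules, page);  // unchanged core behavior
    }
}
```
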
Modifier and Type | Method and Description |
---|---|
void | SimpleContentExtract.extract(ContentRule contentRule, List<ExtractRule> rules, Page page) |
Modifier and Type | Method and Description |
---|---|
ContentExtractor | ExtractorFactory.getContentExtractor(ExtractRule contentRule)<br>Creates a content extractor from the content extraction rule. |
abstract ContentExtractor | AbstractExtractorFactory.getContentExtractor(ExtractRule contentRule)<br>Creates a content extractor from the content extraction rule. |
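
Because `AbstractExtractorFactory.getContentExtractor` is abstract, a concrete factory has to decide which extractor to build for a given rule. A minimal sketch of that shape is shown below. `RuleSpec`, `TextExtractor`, `AbstractFactorySketch`, and `SimpleFactorySketch` are hypothetical stand-ins, and the `type` and `expr` fields and the `"prefix"` and `"upper"` rule types are invented for illustration. None of these names come from the library.

```java
import java.util.Locale;

// Hypothetical stand-in for ExtractRule: a rule type plus an expression.
final class RuleSpec {
    final String type;
    final String expr;

    RuleSpec(String type, String expr) {
        this.type = type;
        this.expr = expr;
    }
}

// Hypothetical stand-in for ContentExtractor.
interface TextExtractor {
    String extract(String content);
}

// Hypothetical stand-in for AbstractExtractorFactory.
abstract class AbstractFactorySketch {
    abstract TextExtractor getContentExtractor(RuleSpec rule);
}

// Concrete factory: picks an extractor based on the (invented) rule type.
final class SimpleFactorySketch extends AbstractFactorySketch {
    @Override
    TextExtractor getContentExtractor(RuleSpec rule) {
        switch (rule.type) {
            case "prefix":
                // Returns the text after the prefix, or "" if the prefix is absent.
                return content -> content.startsWith(rule.expr)
                        ? content.substring(rule.expr.length())
                        : "";
            case "upper":
                // Returns the content in upper case.
                return content -> content.toUpperCase(Locale.ROOT);
            default:
                throw new IllegalArgumentException("Unknown rule type: " + rule.type);
        }
    }
}
```
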
Modifier and Type | Field and Description |
---|---|
protected ExtractRule | AbstractContentExtractor.contentRule |
Constructor and Description |
---|
AbstractContentExtractor(ExtractRule contentRule) |
SimpleContentExtractor(ExtractRule contentRule) |
Modifier and Type | Method and Description |
---|---|
SimulatorData | Simulator.extract(String url, SiteRule siteRule, ExtractRule contentExtractRule, Downloader downloader)<br>Runs an extraction test. |
SimulatorData | SimpleSimulator.extract(String url, SiteRule siteRule, ExtractRule contentExtractRule, Downloader downloader) |
Copyright © 2020 Pivotal Software, Inc. All rights reserved.