webmagic-0.4.0
Release date: 2013-11-07 07:54:37
Latest release of code4craft/webmagic: WebMagic-1.0.1 (2024-10-26 01:46:00)
Improve performance of Downloader.
- Update HttpClient to 4.3.1 and rewrite the code of HttpClientDownloader #32.
- Use gzip by default to reduce the transport cost #31.
- Enable HTTP Keep-Alive and connection persistence, and fix the incorrect usage of PoolConnectionManager #30.
The performance of Downloader improved by 90% in my test. Test code: Kr36NewsModel.java.
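To give a feel for why gzip reduces the transport cost, here is a minimal sketch that uses only the JDK. It is not WebMagic code: the class `GzipSavings` and its methods are hypothetical names made up for this illustration, and the sample page is synthetic. The actual saving depends on the content being downloaded.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration class, not part of WebMagic.
public class GzipSavings {

    /** Compresses bytes with gzip, as a server does for "Content-Encoding: gzip". */
    public static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Decompresses a gzip payload, as the client does after the download. */
    public static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Builds a synthetic, repetitive HTML page to stand in for a crawled document. */
    public static byte[] samplePage(int rows) {
        StringBuilder html = new StringBuilder("<html><body><ul>");
        for (int i = 0; i < rows; i++) {
            html.append("<li class=\"item\"><a href=\"/news/").append(i)
                .append("\">News item ").append(i).append("</a></li>");
        }
        html.append("</ul></body></html>");
        return html.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = samplePage(500);
        byte[] compressed = gzip(raw);
        System.out.println("raw bytes:  " + raw.length);
        System.out.println("gzip bytes: " + compressed.length);
    }
}
```

Markup-heavy pages tend to compress well because tags and attribute names repeat, which is why enabling gzip by default is a reasonable choice for a crawler.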
Add synchronous API for small tasks #28.
OOSpider ooSpider = OOSpider.create(Site.me().setSleepTime(100), BaiduBaike.class);
BaiduBaike baike = ooSpider.<BaiduBaike>get("http://baike.baidu.com/search/word?word=httpclient&pic=1&sug=1&enc=utf8");
System.out.println(baike);
More config for site
- HTTP proxy support via Site.setHttpProxy #22.
- More customization of HTTP headers via Site.addHeader #27.
- Allow disabling gzip with Site.setUseGzip(false).
- Move Site.addStartUrl to Spider.addUrl, because I think startUrl is more a property of the Spider than of the Site.
Code refactor in Spider
- Refactor the multi-threaded part of Spider and fix some concurrency problems.
- Import Google Guava API for simpler code.
- Allow adding requests that carry more information with Spider.addRequest() instead of addUrl #29.
- Allow downloading only the start URLs, without following the URLs extracted from them, with Spider.setSpawnUrl(false).
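The effect of a spawn-URL switch on a multi-threaded crawl loop can be sketched with a toy model. This is not WebMagic's Spider implementation: `SpawnUrlSketch`, `crawl`, `schedule` and `sampleLinks` are hypothetical names, and an in-memory link map stands in for real downloads and link extraction. The sketch uses a `Phaser` to detect when all worker tasks have finished, which is one common way to handle the termination problem in this kind of loop.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;

// Hypothetical toy model, not WebMagic code.
public class SpawnUrlSketch {

    /** A tiny in-memory "web": each page maps to the links found on it. */
    public static Map<String, List<String>> sampleLinks() {
        Map<String, List<String>> links = new HashMap<>();
        links.put("a", Arrays.asList("b", "c"));
        links.put("b", Arrays.asList("c", "d"));
        links.put("c", Arrays.asList("a"));
        links.put("d", Collections.<String>emptyList());
        return links;
    }

    /** Visits the start URLs and, only if spawnUrl is true, the URLs reachable from them. */
    public static Set<String> crawl(Map<String, List<String>> links, List<String> startUrls,
                                    boolean spawnUrl, int threads) {
        Set<String> visited = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Phaser pending = new Phaser(1); // one party for the calling thread
        try {
            for (String url : startUrls) {
                schedule(url, links, spawnUrl, visited, pool, pending);
            }
            pending.arriveAndAwaitAdvance(); // blocks until every scheduled task is done
        } finally {
            pool.shutdown();
        }
        return new TreeSet<>(visited);
    }

    private static void schedule(String url, Map<String, List<String>> links, boolean spawnUrl,
                                 Set<String> visited, ExecutorService pool, Phaser pending) {
        if (!visited.add(url)) {
            return; // already scheduled by another worker
        }
        pending.register(); // registered before submission, so the phase cannot end early
        pool.execute(() -> {
            try {
                if (spawnUrl) {
                    for (String next : links.getOrDefault(url, Collections.emptyList())) {
                        schedule(next, links, spawnUrl, visited, pool, pending);
                    }
                }
            } finally {
                pending.arriveAndDeregister();
            }
        });
    }

    public static void main(String[] args) {
        List<String> start = Arrays.asList("a");
        System.out.println("spawnUrl=false: " + crawl(sampleLinks(), start, false, 4));
        System.out.println("spawnUrl=true:  " + crawl(sampleLinks(), start, true, 4));
    }
}
```

With spawning turned off, only the start URLs are visited. With it on, each worker registers the pages it discovers before it deregisters itself, so the caller is released only once the whole reachable set has been processed.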