webmagic-0.3.0
Release date: 2013-09-04 11:02:13
Latest release of code4craft/webmagic: WebMagic-1.0.1 (2024-10-26 01:46:00)
- Change the default XPath selector from HtmlCleaner to Xsoup.
  Xsoup is an XPath selector based on Jsoup, written by me, and it performs much better than HtmlCleaner: the time to process a page drops from 7–9 ms to 0.4 ms.
  If Xsoup is not stable enough for your use case, just call `Spider.xsoupOff()` to turn it off, and report an issue to me!
- Add cycle retry times for `Site`.
  When the cycle retry times option is set, `Spider` puts a URL whose download failed back into the scheduler, so it is retried after one full cycle of the queue.
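The cycle-retry behaviour can be sketched without WebMagic itself. What follows is a minimal, self-contained simulation that uses only the JDK. The names `CycleRetryDemo`, `Request` and `crawl` are hypothetical and are not part of WebMagic's API. The sketch also assumes the scheduler is a plain FIFO queue. A failed URL is appended to the back of the queue, so it is retried only after every other pending URL has been attempted. Once it has been re-queued `cycleRetryTimes` times, it is dropped.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical simulation of cycle retry; not WebMagic's actual implementation.
public class CycleRetryDemo {

    // A queued URL plus how many times it has already been put back into the queue.
    private static final class Request {
        final String url;
        final int cycleTriedTimes;

        Request(String url, int cycleTriedTimes) {
            this.url = url;
            this.cycleTriedTimes = cycleTriedTimes;
        }
    }

    /**
     * Processes URLs in FIFO order. A failed download goes to the back of the
     * queue until it has been re-queued cycleRetryTimes times, then it is dropped.
     * Returns the URLs in the order their downloads were attempted.
     */
    public static List<String> crawl(List<String> urls, int cycleRetryTimes, Predicate<String> download) {
        Deque<Request> queue = new ArrayDeque<>();
        for (String url : urls) {
            queue.addLast(new Request(url, 0));
        }
        List<String> attempts = new ArrayList<>();
        while (!queue.isEmpty()) {
            Request request = queue.pollFirst();
            attempts.add(request.url);
            boolean succeeded = download.test(request.url);
            if (!succeeded && request.cycleTriedTimes < cycleRetryTimes) {
                // Re-queue at the tail: the retry happens after one cycle of the queue.
                queue.addLast(new Request(request.url, request.cycleTriedTimes + 1));
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        // Simulated downloader: "b" fails only on its first attempt, "c" always fails.
        Map<String, Integer> calls = new HashMap<>();
        Predicate<String> flaky = url -> {
            int n = calls.merge(url, 1, Integer::sum);
            if (url.equals("b")) {
                return n > 1;
            }
            return !url.equals("c");
        };
        System.out.println(crawl(List.of("a", "b", "c"), 2, flaky)); // prints [a, b, c, b, c, c]
    }
}
```

With `cycleRetryTimes = 2`, `b` succeeds on its first retry. `c` gets one initial attempt and two cycle retries before it is given up. Putting failures at the tail rather than retrying them immediately gives a transiently failing server time to recover while other URLs are processed.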