gocolly/colly
Fork: 1775 Star: 23443 (更新于 2024-12-14 21:55:23)
license: Apache-2.0
Language: Go .
Elegant Scraper and Crawler Framework for Golang
最后发布版本: v2.1.0 ( 2020-06-09 07:47:53)
Colly
Lightning Fast and Elegant Scraping Framework for Gophers
Colly provides a clean interface to write any kind of crawler/scraper/spider.
With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.
Sponsors
Scrapfly is an enterprise-grade solution providing Web Scraping API that aims to simplify the scraping process by managing everything: real browser rendering, rotating proxies, and fingerprints (TLS, HTTP, browser) to bypass all major anti-bots. Scrapfly also unlocks the observability by providing an analytical dashboard and measuring the success rate/block rate in detail.
Features
- Clean API
- Fast (>1k request/sec on a single core)
- Manages request delays and maximum concurrency per domain
- Automatic cookie and session handling
- Sync/async/parallel scraping
- Caching
- Automatic encoding of non-unicode responses
- Robots.txt support
- Distributed scraping
- Configuration via environment variables
- Extensions
Example
func main() {
c := colly.NewCollector()
// Find and visit all links
c.OnHTML("a[href]", func(e *colly.HTMLElement) {
e.Request.Visit(e.Attr("href"))
})
c.OnRequest(func(r *colly.Request) {
fmt.Println("Visiting", r.URL)
})
c.Visit("http://go-colly.org/")
}
See examples folder for more detailed examples.
Installation
Add colly to your go.mod
file:
module github.com/x/y
go 1.14
require (
github.com/gocolly/colly/v2 latest
)
Bugs
Bugs or suggestions? Visit the issue tracker or join #colly
on freenode
Other Projects Using Colly
Below is a list of public, open source projects that use Colly:
- greenpeace/check-my-pages Scraping script to test the Spanish Greenpeace web archive.
- altsab/gowap Wappalyzer implementation in Go.
- jesuiscamille/goquotes A quotes scraper, making your day a little better!
- jivesearch/jivesearch A search engine that doesn't track you.
- Leagify/colly-draft-prospects A scraper for future NFL Draft prospects.
- lucasepe/go-ps4 Search playstation store for your favorite PS4 games using the command line.
- yringler/inside-chassidus-scraper Scrapes Rabbi Paltiel's web site for lesson metadata.
- gamedb/gamedb A database of Steam games.
- lawzava/scrape CLI for email scraping from any website.
- eureka101v/WeiboSpiderGo A sina weibo(chinese twitter) scraper
- Go-phie/gophie Search, Download and Stream movies from your terminal
- imthaghost/goclone Clone websites to your computer within seconds.
- superiss/spidy Crawl the web and collect expired domains.
- docker-slim/docker-slim Optimize your Docker containers to make them smaller and better.
- seversky/gachifinder an agent for asynchronous scraping, parsing and writing to some storages(elasticsearch for now)
- eval-exec/goodreads crawl all tags and all pages of quotes from goodreads.
If you are using Colly in a project please send a pull request to add it to the list.
Contributors
This project exists thanks to all the people who contribute. [Contribute].
Backers
Thank you to all our backers! 🙏 [Become a backer]
Sponsors
Support this project by becoming a sponsor. Your logo will show up here with a link to your website. [Become a sponsor]
License
最近版本更新:(数据更新于 2024-10-08 15:29:17)
2020-06-09 07:47:53 v2.1.0
2019-11-28 20:29:32 v2.0.0
2019-02-13 21:20:30 v1.2.0
2018-08-29 00:55:27 v1.1.0
2018-05-14 15:19:53 v1.0.0
主题(topics):
crawler, crawling, framework, go, golang, scraper, scraping, spider
gocolly/colly同语言 Go最近更新仓库
2024-12-22 07:52:58 navidrome/navidrome
2024-12-21 20:15:12 SagerNet/sing-box
2024-12-21 03:25:54 SpecterOps/BloodHound
2024-12-19 23:11:24 shadow1ng/fscan
2024-12-19 21:50:56 minio/minio
2024-12-19 10:04:39 istio/istio