Releases · elixir-crawly/crawly

What's Changed

Added a simple management web UI, available at localhost:4001
Added support for defining spiders in YML. Read more here: https://github.com/elixir-crawly/crawly/blob/master/documentation/spiders_in_yml.md
Added the ability to run Crawly (and your scraping projects) without Elixir. Read more here: https://github.com/elixir-crawly/crawly/blob/master/documentation/standalone_crawly.md
Added generators for Crawly spiders and configuration files to reduce boilerplate
Improved the UniqueRequest middleware so that it can store hashes instead of complete URLs (special thanks to @serpent213)
Added the SameDomainFilter middleware, my favorite, which will probably remove the need to rely on base_url in the future. Again, thanks to @serpent213!
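
As a rough illustration of the two middleware changes above, the fragment below sketches how they might be enabled in a project's configuration. The module names come from the notes; the `hash: :sha` option is an assumption about how UniqueRequest selects hashing and should be checked against the module documentation. Note that setting `middlewares:` replaces the default middleware list rather than extending it.

```elixir
# config/config.exs
# Sketch only: option names are assumptions, verify them against the Crawly docs.
import Config

config :crawly,
  middlewares: [
    # Intended to drop requests that leave the crawled domain,
    # so the spider does not have to rely on base_url for filtering.
    Crawly.Middlewares.SameDomainFilter,
    # Remembers a hash of each seen URL instead of the full URL.
    # The `hash:` key and the `:sha` value are assumed here.
    {Crawly.Middlewares.UniqueRequest, hash: :sha}
  ]
```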

Bugfix for start_urls size (very large start_urls lists are now supported)
Split business logs from other logs; logging is now per spider
Send logs to CrawlyUI (optional)
Allow overriding more spider options:
- closespider_itemcount
- closespider_timeout
- concurrent_requests_per_domain (number of started workers)
Changed on_spider_log_callback (it now also receives the crawl_id)
Parse pipelines

The release includes the following improvements:

WriteToFile pipeline now adds timestamps to filenames
WriteToFile pipeline will now create a folder if missing
SendToUI item pipeline sends data to the experimental CrawlyUI management dashboard
Other smaller features
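
A minimal sketch of a pipeline configuration that uses the WriteToFile changes listed above. The folder path is hypothetical, and `include_timestamp` is an assumed option name; in the release described here the timestamp may be added unconditionally, so treat the option as something to verify.

```elixir
# config/config.exs
# Sketch only: the path is hypothetical and `include_timestamp` is an assumption.
import Config

config :crawly,
  pipelines: [
    # Encode each item as JSON before it is written.
    Crawly.Pipelines.JSONEncoder,
    # Write one file per spider; the folder is expected to be created if missing.
    {Crawly.Pipelines.WriteToFile,
     folder: "/tmp/crawly_output", extension: "jl", include_timestamp: true}
  ]
```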

This release contains the following features:

Automatic cookie management (allows scraping websites behind a login form or a ZIP code form)
Spider custom settings (allows overriding settings such as concurrency at the spider level)
Injected on_spider_closed_callback (allows notifying other parts of the system when a crawl ends)
Documentation fixes and improvements
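
The spider-level settings mentioned above could look roughly like the sketch below. It assumes the optional `override_settings/0` callback of `Crawly.Spider`; the spider name, URLs, and setting values are hypothetical and only illustrate the shape.

```elixir
# Sketch only: assumes Crawly is a dependency and that `override_settings/0`
# is the callback used for spider-level settings.
defmodule ExampleSpider do
  use Crawly.Spider

  @impl Crawly.Spider
  def base_url, do: "https://example.com"

  @impl Crawly.Spider
  def init, do: [start_urls: ["https://example.com/"]]

  @impl Crawly.Spider
  def parse_item(_response) do
    # A real spider would extract items and follow-up requests here.
    %Crawly.ParsedItem{items: [], requests: []}
  end

  @impl Crawly.Spider
  def override_settings do
    # Values returned here take precedence over the global config
    # for this spider only.
    [concurrent_requests_per_domain: 2, closespider_timeout: 5]
  end
end
```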

This release contains the following features:

Retries support
Pluggable user agents
Browser rendering support
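
For orientation, the three features above map onto configuration along the lines of the fragment below. The key names follow the current Crawly configuration documentation as I understand it and may differ in the release described here; the user agent string and the Splash URL are placeholders.

```elixir
# config/config.exs
# Sketch only: key names may differ between Crawly versions.
import Config

config :crawly,
  # Retries: which HTTP status codes trigger a retry, and how many attempts.
  retry: [
    retry_codes: [500, 503],
    max_retries: 3,
    # Retried requests must bypass the duplicate filter.
    ignored_middlewares: [Crawly.Middlewares.UniqueRequest]
  ],
  # Pluggable user agents: one is picked from this list per request.
  middlewares: [
    {Crawly.Middlewares.UserAgent, user_agents: ["ExampleBot/1.0"]}
  ],
  # Browser rendering: fetch pages through a Splash instance
  # (assumed to be running locally).
  fetcher: {Crawly.Fetchers.Splash, [base_url: "http://localhost:8050/render.html"]}
```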

0.15.0 Release

Release 0.13.0

0.12.0

Release 0.11.0

Release 0.10.0

Release 0.9.0

Release 0.8.0