
Releases: elixir-crawly/crawly

Release 0.15.0

11 Apr 08:16

What's Changed

  1. Added a simple management web UI. Try it at localhost:4001
  2. Added support for defining spiders in YML (YAML) format. Read more here: https://github.com/elixir-crawly/crawly/blob/master/documentation/spiders_in_yml.md
  3. Added the ability to run Crawly (and your scraping projects) without Elixir. Read more here: https://github.com/elixir-crawly/crawly/blob/master/documentation/standalone_crawly.md
  4. Added generators for Crawly spiders and configuration files to reduce boilerplate
  5. Improved the UniqueRequest middleware so that it can store URL hashes instead of complete URLs (special thanks to @serpent213)
  6. Added the SameDomainFilter middleware (my favorite), which will probably remove the need to rely on base_url in the future. Thanks again to @serpent213!
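The ideas behind items 5 and 6 can be illustrated with a short, language-agnostic Python sketch. It is not Crawly's Elixir implementation: the class and function names, the choice of SHA-256, and the exact-hostname comparison are all hypothetical. The seen-set stores fixed-size digests rather than full URLs, so memory per entry stays constant however long the URLs are, and the domain check drops requests whose host differs from the start URL's host.

```python
import hashlib
from urllib.parse import urlsplit


class HashedSeenSet:
    """Hypothetical dedup store: keeps 32-byte SHA-256 digests instead of full URLs."""

    def __init__(self) -> None:
        self._digests: set[bytes] = set()

    def add_if_new(self, url: str) -> bool:
        """Return True the first time a URL is seen, False for repeats."""
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        if digest in self._digests:
            return False
        self._digests.add(digest)
        return True


def same_domain(request_url: str, start_url: str) -> bool:
    """Hypothetical same-domain check: compare hostnames only (scheme and port ignored)."""
    return urlsplit(request_url).hostname == urlsplit(start_url).hostname


def filter_requests(urls, start_url, seen):
    """Keep URLs on the start URL's host that have not been scheduled before.

    The domain check runs first, so off-domain URLs are never recorded in `seen`.
    """
    return [u for u in urls if same_domain(u, start_url) and seen.add_if_new(u)]


if __name__ == "__main__":
    seen = HashedSeenSet()
    batch = [
        "https://example.com/a",
        "https://example.com/a",  # duplicate, dropped
        "https://other.org/b",    # different host, dropped
        "https://example.com/b",
    ]
    print(filter_requests(batch, "https://example.com/", seen))
```

Running it prints `['https://example.com/a', 'https://example.com/b']`. A real middleware might also need to treat subdomains as the same site; this sketch deliberately compares exact hostnames.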

Release 0.13.0

05 Feb 20:22
  1. Fixed a bug related to the size of start_urls (very large start_urls lists are now supported)
  2. Split business logs from other logs, and added per-spider logging
  3. Logs can optionally be sent to CrawlyUI
  4. More spider options can now be overridden:
    • closespider_itemcount
    • closespider_timeout
    • concurrent_requests_per_domain (number of started workers)
  5. Changed on_spider_log_callback (it now also receives the crawl_id)
  6. Parse pipelines

Release 0.12.0

16 Nov 20:01
810cef8
Updated gollum so that newer HTTPoison versions can be used (#139)

Release 0.11.0

07 Oct 08:26
Update version to 0.11.0

Release 0.10.0

18 May 09:27

The release includes the following improvements:

  1. The WriteToFile pipeline now adds timestamps to file names
  2. The WriteToFile pipeline now creates the output folder if it is missing
  3. The SendToUI item pipeline sends data to the experimental CrawlyUI management dashboard
  4. Other smaller features
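The first two WriteToFile changes combine two common patterns: embedding a timestamp in the output file name so successive crawls do not overwrite each other, and creating the output directory on demand. The Python sketch below illustrates both in a language-agnostic way; it is not Crawly's Elixir code, and the function names, the `<spider>_<YYYYMMDDTHHMMSS>.<extension>` naming scheme, and the JSON-lines format are hypothetical.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from typing import Optional


def timestamped_path(folder: str, spider: str, extension: str,
                     now: Optional[datetime] = None) -> str:
    """Build e.g. <folder>/<spider>_20200518T092700.<extension> (hypothetical scheme)."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%S")
    return os.path.join(folder, f"{spider}_{stamp}.{extension}")


def write_items(items, folder: str, spider: str, extension: str = "jl") -> str:
    """Append items as JSON lines, creating the output folder first if it is missing."""
    os.makedirs(folder, exist_ok=True)  # creates nested folders; no error if they exist
    path = timestamped_path(folder, spider, extension)
    with open(path, "a", encoding="utf-8") as fh:
        for item in items:
            fh.write(json.dumps(item) + "\n")
    return path


if __name__ == "__main__":
    # Write into a nested folder that does not exist yet.
    out_dir = os.path.join(tempfile.mkdtemp(), "crawls", "nested")
    print(write_items([{"title": "example"}], out_dir, "my_spider"))
```

Using UTC for the timestamp keeps file names sortable and unambiguous across machines in different time zones.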

Release 0.9.0

14 Apr 10:17

This release contains the following features:

  1. Automatic cookie management (allows scraping websites behind a login form or a ZIP code form)
  2. Custom spider settings (allows overriding settings such as concurrency at the spider level)
  3. Injectable on_spider_closed_callback (allows notifying other parts of the system when a crawl ends)
  4. Documentation fixes and improvements

Release 0.8.0

19 Feb 19:56

This release contains the following features:

  1. Retries support
  2. Pluggable user agents
  3. Browser rendering support
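Retry support usually means re-scheduling a request when its response carries a retryable status code, while capping the number of attempts so a permanently failing URL cannot loop forever. The Python sketch below is a language-agnostic illustration of that policy, not Crawly's Elixir implementation; the `Request` type, the default status codes, and the limit of 3 retries are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Request:
    url: str
    retries: int = 0  # how many times this request has already been retried


def maybe_retry(request: Request, status: int,
                retry_codes: FrozenSet[int] = frozenset({429, 500, 502, 503}),
                max_retries: int = 3) -> Optional[Request]:
    """Return a rescheduled copy of the request, or None if nothing should be retried."""
    if status not in retry_codes:
        return None  # status is not retryable; handle the response normally
    if request.retries >= max_retries:
        return None  # retry budget exhausted; give up on this request
    return replace(request, retries=request.retries + 1)


if __name__ == "__main__":
    req = Request("https://example.com/")
    # Simulate a server that keeps answering 503: the request is retried 3 times, then dropped.
    while req is not None:
        print(req)
        req = maybe_retry(req, 503)
```

Returning a new immutable `Request` with an incremented counter, rather than mutating the original, keeps the retry count attached to the request itself as it moves back through the scheduler.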