edited tags

Link

edited Mar 22, 2022 at 14:59

Reinderien

65.5k
5
70
188

Tweeted twitter.com/StackCodeReview/status/1395846866353393671

occurred May 21, 2021 at 21:00

added 20 characters in body

Source Link

edited May 20, 2021 at 15:23

Reinderien

65.5k
5
70
188

I wrote a simple zooplaZoopla real estate scraper for just practicing what iI learned so far in pythonPython,requests Requests,BeautifulSoup BeautifulSoup and overall web scraping fundamentals and by. By looking at my code iI feel like there would be a better and more elegant way to write it but unfortunately as a beginner iI don't know yet.So So, iI believe iI should let experienced guys here at SOStack Exchange to review my code for enhancement.

Basically in this scraper iI mostly did is jsonJSON parsing.Problem Problem was all the data on the website was coming from javascriptJavaScript under the scriptscript tag so iI have to select that tag and then pass it to json.loads()json.loads() and parse the jsonJSON dict to find the right key value-value pair.

I wrote a simple zoopla real estate scraper for just practicing what i learned so far in python,requests,BeautifulSoup overall web scraping fundamentals and by looking at my code i feel like there would be a better and elegant way to write it but unfortunately as a beginner i don't know yet.So, i believe i should let experienced guys here at SO to review my code for enhancement.

Basically in this scraper i mostly did is json parsing.Problem was all the data on the website was coming from javascript under the script tag so i have to select that tag and then pass it to json.loads() and parse the json dict to find the right key value pair.

I wrote a simple Zoopla real estate scraper for just practicing what I learned so far in Python, Requests, BeautifulSoup and overall web scraping fundamentals. By looking at my code I feel like there would be a better and more elegant way to write it but unfortunately as a beginner I don't know yet. So, I believe I should let experienced guys here at Stack Exchange to review my code for enhancement.

Basically in this scraper I mostly did is JSON parsing. Problem was all the data on the website was coming from JavaScript under the script tag so I have to select that tag and then pass it to json.loads() and parse the JSON dict to find the right key-value pair.

Source Link

asked May 19, 2021 at 20:55

Tauqeer Abbas

63
2

Refactor Web Scraper

I wrote a simple zoopla real estate scraper for just practicing what i learned so far in python,requests,BeautifulSoup overall web scraping fundamentals and by looking at my code i feel like there would be a better and elegant way to write it but unfortunately as a beginner i don't know yet.So, i believe i should let experienced guys here at SO to review my code for enhancement.

import requests
import json
import csv
import time
from bs4 import BeautifulSoup as bs

class ZooplaScraper:

    results = []

def fetch(self, url):
    print(f'HTTP GET request to URL: {url}', end='')
    res = requests.get(url)
    print(f' | Status code: {res.status_code}')

    return res

def parse(self, html):
    content = bs(html, 'html.parser')
    content_array = content.select('script[id="__NEXT_DATA__"]')
    content_dict = json.loads(content_array[0].string)
    content_details = content_dict['props']['initialProps']['pageProps']['regularListingsFormatted']

    for listing in content_details:
        self.results.append ({
            'listing_id': listing['listingId'],
            'name_title': listing['title'],
            'names': listing['branch']['name'],
            'addresses': listing['address'],
            'agent': 'https://zoopla.co.uk' + listing['branch']['branchDetailsUri'],
            'phone_no': listing['branch']['phone'],
            'picture': listing['image']['src'],
            'prices': listing['price'],
            'Listed_on': listing['publishedOn'],
            'listing_detail_link': 'https://zoopla.co.uk' + listing['listingUris']['detail']


        })

def to_csv(self):
    
    with open('zoopla.csv', 'w') as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=self.results[0].keys())
        writer.writeheader()

        for row in self.results:
            writer.writerow(row)

        print('Stored results to "zoopla.csv"')

def run(self):

    for page in range(1, 5):
        url = 'https://www.zoopla.co.uk/for-sale/property/london/?page_size=25&q=London&radius=0&results_sort=newest_listings&pn='
        url += str(page)
        res = self.fetch(url)
        self.parse(res.text)
        time.sleep(2)
    self.to_csv()

  if __name__ == '__main__':
     scraper = ZooplaScraper()
     scraper.run()

Basically in this scraper i mostly did is json parsing.Problem was all the data on the website was coming from javascript under the script tag so i have to select that tag and then pass it to json.loads() and parse the json dict to find the right key value pair.

python-3.x json web-scraping beautifulsoup

Stack Exchange Network

Return to Question

Refactor Web Scraper