I have been working on improving my knowledge of how to scrape JSON safely. By that I mean that I could do an ugly workaround and wrap everything in a single try-except, but that is not what I want. I would like to know where things can go wrong and how I can improve the code I have been working with:
import json
from json import JSONDecodeError

import requests

payload = {
    "name": None,
    "price": None,
    "sizes": None
}

headers = {'origin': 'https://www.aplace.com', 'Content-Type': 'application/json'}

with requests.get("http://api.aplace.com/sv/dam/aaltoili-iso-mehu-gren-red-sand", headers=headers) as response:
    if not response.ok:
        print(f"Requests NOT ok -> {payload}")

    try:
        # They added an extra comma to break the JSON. Fixed it by removing duplicate commas
        doc = json.loads(response.text.replace(",,", ",")).get("productData", {})
    except (JSONDecodeError, KeyError, ValueError) as err:
        print(f"Not able to parse json -> {err} -> Payload: {payload}")
        raise

    if doc:
        product_name = doc.get('name', {})
        product_brand = doc.get('brandName', {})
        product_price = doc.get('markets', {}).get('6', {}).get('pricesByPricelist', {}).get('3', {}).get('price', {})

        # Added exception in case the [0] is not able to get any value from 'media'
        try:
            product_image = doc.get('media', {})[0].get('sources', {}).get('full', {}).get('url', {})
        except KeyError:
            product_image = None

        product_size = doc.get('items', {})

        try:
            if product_name and product_brand:
                payload['name'] = f"{product_brand} {product_name}"

            if product_price:
                payload['price'] = product_price.replace('\xa0', '')

            if product_image:
                payload['image'] = product_image

            if product_size:
                count = 0
                sizes = []
                sizesWithStock = []

                for value in product_size.values():
                    if value.get('stockByMarket', {}).get('6', {}):
                        count += value['stockByMarket']['6']
                        sizes.append(value['name'])
                        sizesWithStock.append(f"{value['name']} - ({value['stockByMarket']['6']})")

                payload['sizes'] = sizes
                payload['sizesWithStock'] = sizesWithStock
                payload['stock'] = count
        except Exception as err:
            print("Create notification if we hit here later on")
            raise

print(f"Finished payload -> {payload}")
My goal here is to cover all the values. If a value is not found in the JSON, the lookup falls back to a default, and the if statements then check whether we actually got a value before it goes into the payload.
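To illustrate what I mean, here is a stripped-down sketch with a made-up document (the values are invented, this is not the real API response). A missing key falls through to the empty-dict default, which is falsy, so the payload keeps its None:

```python
# Made-up document with most keys missing (NOT the real API response)
doc = {"name": "Iso Mehu", "markets": {"6": {}}}

payload = {"name": None, "price": None}

product_name = doc.get('name', {})
# 'pricesByPricelist' is missing, so every later .get() works on the {} default
product_price = doc.get('markets', {}).get('6', {}).get('pricesByPricelist', {}).get('3', {}).get('price', {})

if product_name:
    payload['name'] = product_name

if product_price:  # {} is falsy, so a missing price is simply skipped
    payload['price'] = product_price

print(payload)  # {'name': 'Iso Mehu', 'price': None}
```

As far as I understand, this only holds as long as every intermediate value is a dict. If the API sends null for one of them, the next .get() is called on None and raises AttributeError.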
Things that annoy me:
- The long chains of nested dict lookups. I'm not sure whether there is any other way; is there a better way?
- The try-except for product_image. 'media' is a list and I just want to grab the first item in it; is it possible to skip the try-except?
- If an unexpected error happens, I print a message in the except block and re-raise, but am I doing that correctly?
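For the first two points, the direction I have been considering is a small helper that walks a path of dict keys and list indices and gives up with a default as soon as one step is missing. This is only a rough sketch: `dig` is a name I made up (it does not come from any library), and the sample document is invented to resemble the response shape used in my code.

```python
from typing import Any


def dig(data: Any, *path: Any, default: Any = None) -> Any:
    """Follow `path` through nested dicts/lists; return `default` on any miss."""
    current = data
    for key in path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            # KeyError: dict key missing, IndexError: list too short,
            # TypeError: current is None or otherwise not subscriptable this way
            return default
    return current


# Invented sample (hypothetical values), shaped like the keys used in my code
doc = {
    "media": [{"sources": {"full": {"url": "https://example.com/a.jpg"}}}],
    "markets": {"6": {"pricesByPricelist": {"3": {"price": "1\xa0299 SEK"}}}},
}

product_image = dig(doc, "media", 0, "sources", "full", "url")
product_price = dig(doc, "markets", "6", "pricesByPricelist", "3", "price")
stock = dig(doc, "items", "1", "stockByMarket", "6", default=0)  # 'items' is missing
```

With something like this, the image lookup would no longer need its own try-except, because an empty or missing 'media' list just yields the default. I am not sure whether hiding the three exception types this way is considered good practice, though.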
Hopefully I can get new knowledge from here :)