I have been working with requests on a script that is about as simple as it gets: it scrapes a webpage, adds the values it finds to a dict, and prints the payload whether or not we found a new value.
import json
import re

import requests
from selectolax.parser import HTMLParser

payload = {
    "store": "Not found",
    "name": "Not found",
    "price": "Not found",
    "image": "Not found",
    "sizes": []
}

response = requests.get("https://shelta.se/sneakers/nike-air-zoom-type-whiteblack-cj2033-103")

if response.ok:
    bs4 = HTMLParser(response.text)

    product_name = bs4.css_first('h1[class="product-page-header"]')
    product_price = bs4.css_first('span[class="price"]')
    product_image = bs4.css_first('meta[property="og:image"]')

    if product_name:
        payload['name'] = product_name.text().strip()

    if product_price:
        payload['price'] = "{} Price".format(product_price.text().strip())

    if product_image:
        payload['image'] = product_image.attrs['content']

    try:
        attribute = json.loads(
            '{}'.format(
                re.search(
                    r'var\s*JetshopData\s*=\s*(.*?);',
                    response.text,
                    re.M | re.S
                ).group(1)
            )
        )

        payload['sizes'] = [
            f'{get_value["Variation"][0]}'
            for get_value in attribute['ProductInfo']['Attributes']['Variations']
            if get_value.get('IsBuyable')
        ]
    except Exception:  # noqa
        pass

    del bs4
    print("New payload!", payload)
else:
    print("No new payload!", payload)
The idea is mostly that if we find a value, we replace the default in the dict, and if we don't find it, we simply skip it.
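Stated on its own, the "replace if found, otherwise skip" idea might be sketched roughly like this. The helper name `merge_found` and the convention of using `None` for "not found" are assumptions for illustration only; they are not part of my script.

```python
def merge_found(defaults: dict, found: dict) -> dict:
    """Return a copy of `defaults` where every value that was found replaces the default.

    `merge_found` is a hypothetical helper; `None` is assumed to mean "not found".
    """
    merged = dict(defaults)  # copy, so the defaults themselves stay untouched
    for key, value in found.items():
        if value is not None:
            merged[key] = value  # found: replace the default
        # not found: skip it and keep the default
    return merged


defaults = {"name": "Not found", "price": "Not found"}
scraped = {"name": "Nike Air Zoom Type", "price": None}  # made-up example values
print(merge_found(defaults, scraped))
```

Keeping the defaults in a separate dict would also make it possible to tell afterwards which fields were actually scraped.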
Things that concern me:
- What happens if one of the if statements fails? By "fails" I mean, for example, that I cannot scrape `product_image.attrs['content']`. That would raise an exception and stop the script, which I do not want.
- I'm pretty sure that using `except Exception:  # noqa` is bad practice, and I do not know what the best practice for handling it would be.
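To make the second concern concrete, here is one direction I have been considering for the JetshopData part: check the regex match explicitly and catch only the exceptions I can actually name. This is a minimal sketch, not a definitive solution; the function name `extract_sizes` is hypothetical, and the list of exceptions is my guess at what can go wrong with this data.

```python
import json
import re

# Same pattern as in my script; the non-greedy group stops at the first ';'.
JETSHOP_RE = re.compile(r'var\s*JetshopData\s*=\s*(.*?);', re.M | re.S)


def extract_sizes(html: str) -> list:
    """Return the buyable sizes, or an empty list if anything is missing.

    `extract_sizes` is a hypothetical helper, not part of the original script.
    """
    match = JETSHOP_RE.search(html)
    if match is None:
        # No JetshopData on the page, so there is nothing to parse.
        return []
    try:
        data = json.loads(match.group(1))
        variations = data['ProductInfo']['Attributes']['Variations']
        return [
            str(item['Variation'][0])
            for item in variations
            if item.get('IsBuyable')
        ]
    except (json.JSONDecodeError, KeyError, IndexError, TypeError, AttributeError):
        # Invalid JSON or an unexpected structure: treat it as "no sizes found".
        return []
```

With this shape, `payload['sizes'] = extract_sizes(response.text)` would replace the whole try/except block, and an unrelated bug (a typo in a variable name, for instance) would still surface instead of being silently swallowed.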
I would appreciate all kinds of tips and tricks on how I can improve my knowledge of Python!
To run the script: pip install selectolax requests