HP Laptops


WEB SCRAPING AND EXPLORATORY DATA ANALYSIS
ON HP LAPTOPS
ABOUT US

Name: Rayapudi Gautam Kumar
Qualification: B.Tech (EEE)
Experience: Fresher

Name: Tadicharla Rama Chaitanya
Qualification: B.Tech (EEE)
Experience: Fresher
Problem Statement
The objective of this project is to analyze HP laptops and suggest a laptop with better features
according to the customer's budget.
CONTENTS
• Introduction
• Problem Statement
• Web Scraping
• Tools Used
• Website
• Steps For Collecting Data
• Raw Data
• Steps For Cleaning Data
• Cleaned Data
• Data Visualization
• Challenges Faced
• Observations
• Conclusion
Introduction
• HP: HP Inc. is an American multinational information technology company headquartered in Palo Alto, California, that develops
personal computers (PCs), printers and related supplies, and 3D printing solutions.
Data Description
• In this project we analyze HP laptops using data scraped from the website below:
• https://www.hp.com/in-en/shop/catalogsearch/result/index/?p=1&product_list_limit=24&q=laptop
The dataset contains the following information:
• Product Name - Name of the product
• Processor - Processor model
• RAM - The size of RAM (in GB)
• Hard Drive - The size of the hard drive (in GB)
• OS - Operating system of the laptop
• Screen Size - Screen size of the laptop (in cm)
• Additional Features - Additional Features of the laptop(eg.,Backlit Keyboard, Camera Quality, Type of speakers used etc.)
• Processor_Type - Processor manufacturer (Intel or AMD)
• Original Price - Original price of the product
• GST - GST levied on the product
• Offer Percentage - The discount on the product (in %)
• Price Saved - The total amount saved on the product
• Price - The final price of the product
• Product ID - The product ID of the product
Data Shape:
• Number of rows: 360
• Number of columns: 13
Web Scraping
• The Internet holds a massive volume of data that can be collected and analyzed.
• The collected data may be unstructured or semi-structured.
• We used Beautiful Soup in our project to scrape the data from the website.

STEPS:
• Find the URL to scrape the data
• Send a request to the server and receive the response
• Inspect the page
• Identify the data to extract
• Store the data in required format
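The five steps above can be sketched in Python with Requests and BeautifulSoup. This is a minimal sketch under stated assumptions, not our exact script: the tag and class names (`product-item-link`, `price`) are hypothetical placeholders that must be replaced with the ones found when inspecting the page, and the `p` query parameter in the URL is assumed to select the results page.

```python
import requests
from bs4 import BeautifulSoup

# Search URL from the Website slide; {page} fills the "p" parameter,
# which is ASSUMED to be the results-page number.
BASE_URL = ("https://www.hp.com/in-en/shop/catalogsearch/result/index/"
            "?p={page}&product_list_limit=24&q=laptop")


def page_urls(n_pages):
    """Step 1: build the URL of each results page (pages start at 1)."""
    return [BASE_URL.format(page=page) for page in range(1, n_pages + 1)]


def fetch(url, timeout=30):
    """Step 2: request the page and raise an error on a bad HTTP status."""
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=timeout)
    response.raise_for_status()
    return response.text


def parse_products(html):
    """Steps 3-4: extract product names and prices from one page.

    The class names are HYPOTHETICAL; replace them with the ones seen
    when inspecting the live page.
    """
    soup = BeautifulSoup(html, "html.parser")
    names = [a.get_text(strip=True) for a in soup.find_all("a", class_="product-item-link")]
    prices = [span.get_text(strip=True) for span in soup.find_all("span", class_="price")]
    return names, prices
```

For example, `names, prices = parse_products(fetch(url))` can be run for each `url` in `page_urls(15)`; assuming one row per listing, 360 rows at 24 products per page would span 15 pages. Step 5 (storing the data) is a matter of appending these lists to the column lists that become the DataFrame.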
TOOLS USED

Web Scraping
• Python
• Requests
• BeautifulSoup
Data Cleaning and Manipulation
• NumPy
• Pandas
• Regular Expressions
Data Visualization
• Matplotlib
• Seaborn
Website
https://www.hp.com/in-en/shop/catalogsearch/result/index/?p=1&product_list_limit=24&q=laptop
Steps for collecting Data

• Select the website to scrape the data.
• Import the necessary libraries (install any that are missing).
• After scraping, check the length of every column.
• If all the columns have the same length, combine them into a DataFrame.
• Finally, save it in the required format (we saved it in pickle format).
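The length check, DataFrame construction and pickle export described above might look like this with pandas. It is a sketch: the helper names and the file name `hp_laptops_raw.pkl` are hypothetical, and the scraped columns are assumed to be held as a dict of lists.

```python
import pandas as pd


def to_dataframe(columns):
    """Build a DataFrame from scraped lists after checking that they align.

    `columns` maps a column name to its list of scraped values. Lists of
    different lengths would misalign rows, so we stop instead of guessing.
    """
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    return pd.DataFrame(columns)


def save(df, path="hp_laptops_raw.pkl"):
    """Persist the raw DataFrame in pickle format (file name is hypothetical)."""
    df.to_pickle(path)
    return path
```

`pd.read_pickle(path)` loads the DataFrame back unchanged, which is why pickle is convenient between the scraping and cleaning stages.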
Raw Data
Steps For Cleaning and Manipulating Data
• Check for null values.
• Check for duplicates.
• Check for and remove special characters.
• Convert data types.
• Check for missing values and handle them based on suitable criteria.
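A minimal pandas and regex sketch of these cleaning steps follows. The function names are hypothetical, the column names are assumed to match the Data Description, and dropping rows that still lack a number is only one possible criterion for missing values.

```python
import re

import numpy as np
import pandas as pd


def to_number(value):
    """Drop currency symbols, commas and units, e.g. '₹1,23,999' -> 123999.0.

    Assumes the value is already in the target unit (a '1 TB' drive would
    become 1.0, not 1000.0). Returns NaN when no number is found.
    """
    if pd.isna(value):
        return np.nan
    match = re.search(r"\d+(?:\.\d+)?", str(value).replace(",", ""))
    return float(match.group()) if match else np.nan


def clean(df, numeric_columns):
    """Apply the listed cleaning steps; returns (cleaned_df, null_counts)."""
    null_counts = df.isnull().sum()              # check null values per column
    df = df.drop_duplicates().copy()             # remove duplicate rows
    for column in numeric_columns:               # strip special characters, convert to float
        df[column] = df[column].map(to_number)
    df = df.dropna(subset=numeric_columns)       # ASSUMED criterion: drop rows without a number
    return df, null_counts
```

For example, `clean(raw_df, ["Price", "Original Price", "RAM", "Screen Size"])` would return numeric columns ready for plotting.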
Cleaned Data
Data Visualization
• Data visualization makes data easier to understand and interpret.
• Visualization is usually done in three ways: univariate, bivariate and multivariate analysis.
• Univariate: only one variable is used in the analysis.
• Bivariate: the analysis is done between two variables.
• Multivariate: the analysis is done using more than two variables.
• Using these, we can relate different variables and draw key observations and
conclusions.
Univariate Analysis
From this bar plot we can easily see how many laptops are available in each screen size.

Analysis: From this bar plot we can say that most of the laptops are
manufactured with a screen size of 35.6 cm.
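A bar plot of laptop counts per screen size can be sketched with pandas and Matplotlib as below; Seaborn's `countplot` would draw an equivalent chart. The function name is hypothetical and a numeric `Screen Size` column (in cm) is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line in a notebook
import matplotlib.pyplot as plt


def screen_size_counts(df):
    """Count laptops per screen size and draw them as a bar plot."""
    counts = df["Screen Size"].value_counts().sort_index()
    fig, ax = plt.subplots()
    ax.bar(counts.index.astype(str), counts.values)
    ax.set_xlabel("Screen Size (cm)")
    ax.set_ylabel("Number of laptops")
    return counts, fig
```

`counts.idxmax()` then gives the most common screen size directly, without reading it off the chart.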
Pie Chart:
• Here we plotted the share of laptops for each value of the Offer Percentage column.

Analysis: From this plot we can say that most of the laptops have an offer of 9%.
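The pie chart can be reproduced with Matplotlib, returning the share of each offer as a number so it can be checked alongside the chart. This sketch assumes a numeric `Offer Percentage` column; the function name is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line in a notebook
import matplotlib.pyplot as plt


def offer_share(df):
    """Pie chart of laptops by Offer Percentage; returns each offer's share in %."""
    counts = df["Offer Percentage"].value_counts()
    fig, ax = plt.subplots()
    ax.pie(counts.values, labels=[f"{offer}% off" for offer in counts.index],
           autopct="%1.1f%%")
    ax.set_title("Laptops by offer percentage")
    return counts / counts.sum() * 100, fig
```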
• Violin plot: This plot is plotted using the Price data.

Analysis: We can observe that most of the price data is concentrated in the range 80000-100000.
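A violin plot of prices can be drawn with Matplotlib's `violinplot` (Seaborn's `violinplot` is an alternative). As an added check that is not part of the original analysis, the sketch also returns the interquartile range, one way to put a number on where "most" prices lie. A numeric `Price` column is assumed; the function name is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line in a notebook
import matplotlib.pyplot as plt


def price_violin(df):
    """Violin plot of Price; returns the 25th and 75th percentiles."""
    prices = df["Price"].dropna()
    fig, ax = plt.subplots()
    ax.violinplot(prices.to_numpy(), showmedians=True)
    ax.set_ylabel("Price")
    q1, q3 = prices.quantile([0.25, 0.75])   # middle 50% of prices lies in [q1, q3]
    return (q1, q3), fig
```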
Bivariate Analysis
Bar plot: In this plot we can observe the price ranges of each Operating System.

Analysis: From this we can say that Windows 11 Pro laptops are priced higher than laptops with the other two
OSs.
• Bar Plot: Here Processor type and price columns are considered for plotting.

Analysis: From this we can observe that laptops with AMD processors have lower prices
than laptops with Intel processors.
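Both bivariate bar plots (price by OS and price by processor type) can come from one helper that averages Price within each category, which matches the default estimator of Seaborn's `barplot`. The helper name is hypothetical and numeric prices are assumed.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line in a notebook
import matplotlib.pyplot as plt


def mean_price_by(df, column):
    """Bar plot of the mean Price for each category in `column`."""
    means = df.groupby(column)["Price"].mean().sort_values()
    fig, ax = plt.subplots()
    ax.bar(means.index.astype(str), means.values)
    ax.set_xlabel(column)
    ax.set_ylabel("Mean price")
    return means, fig
```

Calling `mean_price_by(df, "OS")` and `mean_price_by(df, "Processor_Type")` gives the two comparisons; the returned Series makes the ranking explicit.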
• Stacked Bar Plot: Here the Screen Size and Operating System columns are considered for plotting.

Analysis: Only the 35.6 cm screen size is available with every OS that HP offers.
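A stacked bar plot is easiest to build from a cross-tabulation of screen size against OS. The sketch below also lists the screen sizes available with every OS, the check behind the analysis above. Column names are assumed to match the Data Description; the function name is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def os_by_screen_size(df):
    """Stacked bars of laptop counts per screen size, split by OS."""
    table = pd.crosstab(df["Screen Size"], df["OS"])   # rows: sizes, columns: OS
    fig, ax = plt.subplots()
    table.plot.bar(stacked=True, ax=ax)
    ax.set_ylabel("Number of laptops")
    # Screen sizes for which every OS has at least one laptop.
    all_os_sizes = table.index[(table > 0).all(axis=1)].tolist()
    return table, all_os_sizes, fig
```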
Multivariate Analysis
• This heat map shows the correlation between the numeric columns of the DataFrame.
• Line Plot: In this plot we considered RAM, Hard Drive and Processor type columns for plotting
the graph.

Analysis: From this plot we can observe that, compared to AMD, laptops with Intel processors
have the highest RAM even though they have less storage. Meanwhile, for AMD the RAM and
storage increase linearly, i.e., as RAM increases, storage also
increases.
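The two multivariate plots can be sketched as follows: a correlation heat map drawn with Matplotlib's `imshow` (Seaborn's `heatmap` is the usual shortcut) and a line plot of mean storage against RAM with one line per processor type. The exact axes of the original line plot are not stated, so RAM on the x-axis is an assumption, as are the function names.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line in a notebook
import matplotlib.pyplot as plt


def correlation_heatmap(df):
    """Heat map of pairwise Pearson correlations between numeric columns."""
    corr = df.select_dtypes("number").corr()
    fig, ax = plt.subplots()
    image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticks(range(len(corr.columns)))
    ax.set_yticklabels(corr.columns)
    fig.colorbar(image, ax=ax)
    return corr, fig


def ram_vs_storage(df):
    """Mean Hard Drive size per RAM value, one line per Processor_Type."""
    table = df.groupby(["RAM", "Processor_Type"])["Hard Drive"].mean().unstack()
    fig, ax = plt.subplots()
    table.plot(marker="o", ax=ax)   # one line per column, i.e. per processor type
    ax.set_xlabel("RAM (GB)")
    ax.set_ylabel("Hard Drive (GB)")
    return table, fig
```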
Scatter Plot: The scatter plot is plotted using the Price, Screen Size and Processor_Type columns.

Analysis: From the above plot we can see that laptops with Intel processors and a screen size of
35.6 cm are concentrated in the 65000 to 95000 price range.
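The scatter plot and a numeric version of the observation above might look like this. The `Processor_Type` values (`"Intel"`, `"AMD"`) and the function names are assumptions; `np.isclose` is used so that float screen sizes such as 35.6 compare safely.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def price_vs_screen(df):
    """Scatter of Price against Screen Size, one colour per Processor_Type."""
    fig, ax = plt.subplots()
    for processor, group in df.groupby("Processor_Type"):
        ax.scatter(group["Screen Size"], group["Price"], label=processor, alpha=0.7)
    ax.set_xlabel("Screen Size (cm)")
    ax.set_ylabel("Price")
    ax.legend()
    return fig


def share_in_price_range(df, processor, screen_size, low, high):
    """Fraction of laptops with this processor and screen size priced in [low, high]."""
    mask = (df["Processor_Type"] == processor) & np.isclose(df["Screen Size"], screen_size)
    return df.loc[mask, "Price"].between(low, high).mean()
```

For example, `share_in_price_range(df, "Intel", 35.6, 65000, 95000)` returns the fraction of 35.6 cm Intel laptops priced in that range.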
Challenges Faced
• Several fields had the same tags and values.
• Some data was lost.
• We were not able to extract the ratings and number of reviews for each product.

Solution
• The collected data was unordered and not readable, so we converted it into a readable format
using regex.
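As an illustration of this regex step, the sketch below pulls RAM, storage and screen size out of a single specification string. The spec-string format and the patterns are hypothetical and would need adjusting to the actual scraped text; converting TB with 1 TB = 1000 GB (the decimal convention drive makers use) is also an assumption.

```python
import re

# HYPOTHETICAL patterns for a spec string such as
# "Intel Core i5-1235U, 16 GB DDR4 RAM, 1 TB PCIe NVMe SSD, 39.6 cm display".
RAM_RE = re.compile(r"(\d+)\s*GB\s+(?:LP)?DDR\d", re.IGNORECASE)
SSD_RE = re.compile(r"(\d+)\s*(GB|TB)\b[^,]*SSD", re.IGNORECASE)   # stays within one comma field
SCREEN_RE = re.compile(r"(\d+(?:\.\d+)?)\s*cm", re.IGNORECASE)


def parse_specs(text):
    """Return RAM (GB), Hard Drive (GB) and Screen Size (cm); None if not found."""
    ram = RAM_RE.search(text)
    ssd = SSD_RE.search(text)
    screen = SCREEN_RE.search(text)
    storage_gb = None
    if ssd:
        size = int(ssd.group(1))
        storage_gb = size * 1000 if ssd.group(2).upper() == "TB" else size  # ASSUMED 1 TB = 1000 GB
    return {
        "RAM": int(ram.group(1)) if ram else None,
        "Hard Drive": storage_gb,
        "Screen Size": float(screen.group(1)) if screen else None,
    }
```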
Observations

• The data columns (Series) mostly had strong correlations with one another.
• The plots were mostly linear, since the variables had a proportional relationship.
• We can observe that most of the data is concentrated in the price range of 80000-100000.
• From the bar plot in the univariate analysis we can say that most of the laptops are
manufactured with a screen size of 35.6 cm.
• From the bar plot in the bivariate analysis we can say that Windows 11 Pro laptops are
priced higher than laptops with the other two OSs.
• From the scatter plot we can see that laptops with Intel processors and a screen size of
35.6 cm are concentrated in the 65000 to 95000 price range.
Conclusion
• From this we can conclude several points:
• Although laptops with AMD processors have lower prices than those with Intel processors, it is better to
buy a laptop with an Intel processor, since the difference in price is not large.
• On the other hand, if you are looking for a laptop with satisfactory RAM and better storage
at a lower price, it is better to choose a laptop with an AMD processor.
• If you are looking for the highest-priced laptop with cutting-edge features, you
can go for the HPEnvy360 2-in-1 Laptop OLED 15-fe0014TX.
• If you want the lowest-priced laptop with satisfactory features, you can buy the
HPPavilionLaptop14-ec1003AU.
THANK YOU
