Pandas CheatSheet
A Beginner's Guide
@apexiq.ai
Introduction
What is Pandas?
Pandas is a free, open-source Python library for working with data. It
provides two main data structures: Series (a one-dimensional labeled array,
like a list) and DataFrame (a two-dimensional labeled table, like a
spreadsheet). With Pandas, you can organize, analyze, and manipulate data.
Installation
To install Pandas, open your terminal or command prompt and type:
pip install pandas
In a Jupyter notebook cell, prefix the command with an exclamation mark:
!pip install pandas
1. Loading Data
Loading data is the first step in any data analysis workflow. Pandas provides
several functions to read data from various file formats.
Import the Pandas library:
import pandas as pd
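As a minimal sketch of the reading step, the snippet below loads CSV text with pd.read_csv(). The sample data and the in-memory io.StringIO buffer are assumptions made so the example runs without a file on disk; in practice you would pass a file path instead. Other readers such as pd.read_excel() and pd.read_json() follow the same pattern.

```python
import io

import pandas as pd

# Hypothetical sample data, standing in for a file such as 'data.csv'
csv_text = """name,age,city
Alice,30,Paris
Bob,25,London
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```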
2. Viewing Data
After loading the data, it’s important to inspect it to understand its structure
and content. Pandas provides several methods for this.
Summary of the DataFrame (column names, data types, non-null counts):
df.info()
Descriptive statistics for numerical columns (count, mean, std, min,
quartiles, max):
df.describe()
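The inspection methods can be tried on a small table. This is a sketch on made-up data (the names and ages are assumptions); head(), tail(), and shape are common companions to info() and describe() that the list above does not mention.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Cara'], 'age': [30, 25, 41]})

print(df.head(2))       # first 2 rows (head() defaults to 5)
print(df.tail(1))       # last row
print(df.shape)         # (number of rows, number of columns)
df.info()               # column names, dtypes, non-null counts
stats = df.describe()   # summary statistics for numeric columns
print(stats)
```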
3. Selecting Data
Selecting specific data from a DataFrame is crucial for analysis. Pandas allows
you to select columns and rows easily.
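Select a single column (returns a Series):
df['column_name']
Select multiple columns (returns a DataFrame):
df[['col1', 'col2']]
Select a row by integer position:
df.iloc[0]
Select a row by index label (0 here is a label, not a position):
df.loc[0]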
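The difference between iloc and loc is easiest to see when the index is not the default 0, 1, 2. The sketch below assumes a small DataFrame with string labels 'x', 'y', 'z' (hypothetical data) and shows position-based versus label-based selection.

```python
import pandas as pd

# Hypothetical data with a non-default string index
df = pd.DataFrame(
    {'col1': [10, 20, 30], 'col2': ['a', 'b', 'c']},
    index=['x', 'y', 'z'],
)

one_col = df['col1']               # single column -> Series
two_cols = df[['col1', 'col2']]    # list of columns -> DataFrame
first_row = df.iloc[0]             # row at position 0
row_y = df.loc['y']                # row whose index label is 'y'
cell = df.loc['z', 'col1']         # single value by row label and column name
print(first_row, row_y, cell, sep='\n')
```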
4. Modifying Data
Modifying data in a DataFrame is essential for preparing your dataset for
analysis.
Add Columns:
Add a new column derived from an existing one
df['new_column'] = df['existing_column'] * 2
Rename Columns:
Rename a specific column
Drop Columns:
df.drop(columns=['column_to_drop'], inplace=True)
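The rename step above is listed without its code. One common way to do it is DataFrame.rename() with a columns mapping, sketched here together with the add and drop steps. The data and the name 'renamed_column' are assumptions for illustration; the example reassigns the result rather than using inplace=True, and either style works.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    'existing_column': [1, 2, 3],
    'column_to_drop': ['a', 'b', 'c'],
})

# Add a column computed from an existing one
df['new_column'] = df['existing_column'] * 2

# Rename a specific column: keys are old names, values are new names
df = df.rename(columns={'existing_column': 'renamed_column'})

# Drop a column
df = df.drop(columns=['column_to_drop'])
print(df)
```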
5. Handling Missing Values
Dealing with missing values is crucial to ensure the integrity of your analysis.
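Count missing values per column:
df.isnull().sum()
Drop rows with any missing value:
df.dropna(inplace=True)
Drop rows with a missing value in a specific column:
df.dropna(subset=['column_name'], inplace=True)
Fill missing values with a constant:
df.fillna(value=0, inplace=True)
Forward-fill missing values from the previous row (fillna(method='ffill') is
deprecated in recent Pandas versions):
df.ffill(inplace=True)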
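To see what each of these calls does, the sketch below runs them on a tiny DataFrame with one gap per column. The data is made up, and the results are assigned to new variables instead of using inplace=True so that each step can be compared against the original.

```python
import pandas as pd

# Hypothetical data: None becomes a missing value in each column
df = pd.DataFrame({'a': [1.0, None, 3.0], 'b': ['x', 'y', None]})

missing_counts = df.isnull().sum()     # missing values per column
dropped_any = df.dropna()              # keep rows with no missing values
dropped_a = df.dropna(subset=['a'])    # only require column 'a' to be present
filled = df['a'].fillna(0)             # replace missing values with 0
forward = df['a'].ffill()              # carry the previous value forward
print(missing_counts, filled, forward, sep='\n')
```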
6. Removing Duplicates
Removing duplicate rows keeps repeated records from skewing your analysis.
Remove rows that are duplicated across all columns:
df.drop_duplicates(inplace=True)
Remove rows that are duplicated in specific columns:
df.drop_duplicates(subset=['col1'], inplace=True)
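A short sketch of how drop_duplicates() decides which rows to keep, on made-up data. It also uses duplicated() to preview the rows that would be removed, and the keep parameter to retain the last occurrence instead of the first.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are identical
df = pd.DataFrame({'col1': [1, 1, 2, 2], 'col2': ['a', 'a', 'b', 'c']})

dup_mask = df.duplicated()                  # True where a row repeats an earlier row
unique_rows = df.drop_duplicates()          # compare all columns
first_per_col1 = df.drop_duplicates(subset=['col1'])               # keep first per col1
last_per_col1 = df.drop_duplicates(subset=['col1'], keep='last')   # keep last per col1
print(unique_rows, first_per_col1, last_per_col1, sep='\n')
```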
7. Sorting Data
Sorting data is essential for analysis and presentation. You can sort your
DataFrame by one or more columns.
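No sorting code appears in this section, so here is a minimal sketch using sort_values() and sort_index(). The column names and data are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    'name': ['Bob', 'Alice', 'Cara'],
    'dept': ['B', 'A', 'A'],
    'age': [25, 30, 41],
})

by_age = df.sort_values('age')                         # ascending by one column
by_age_desc = df.sort_values('age', ascending=False)   # descending
# Sort by dept ascending, then by age descending within each dept
by_dept_age = df.sort_values(['dept', 'age'], ascending=[True, False])
by_index = by_age_desc.sort_index()                    # restore original row order
print(by_dept_age)
```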
8. Grouping and Aggregating Data
Grouping data allows you to perform operations on subsets of your data.
Group by a column:
grouped = df.groupby('column_name')
Sum and mean of a column within each group:
grouped['value_column'].sum()
grouped['value_column'].mean()
Multiple aggregations:
agg_df = grouped.agg({'value_column': ['sum', 'mean'], 'another_column': 'count'})
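The grouping calls can be run end to end on a small table. The data below is made up. The last step uses named aggregation, an alternative form of agg() that yields flat column names; the output names total, average, and n are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    'column_name': ['a', 'a', 'b'],
    'value_column': [1, 2, 10],
    'another_column': ['x', 'y', 'z'],
})

grouped = df.groupby('column_name')
sums = grouped['value_column'].sum()
means = grouped['value_column'].mean()

# Named aggregation: output_name=(input_column, function)
summary = grouped.agg(
    total=('value_column', 'sum'),
    average=('value_column', 'mean'),
    n=('another_column', 'count'),
)
print(summary)
```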
9. Merging and Joining DataFrames
Combining multiple DataFrames is often necessary when working with related
datasets.
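This section lists no code, so the sketch below shows two common ways to combine DataFrames: pd.merge() for joining on a key column and pd.concat() for stacking rows. The customers and orders tables are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical related tables sharing a 'customer_id' key
customers = pd.DataFrame({'customer_id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Cara']})
orders = pd.DataFrame({'customer_id': [1, 1, 3, 4], 'amount': [50, 20, 70, 10]})

# Inner join: only keys present in both tables
inner = pd.merge(customers, orders, on='customer_id', how='inner')

# Left join: every customer, with missing values where there is no order
left = pd.merge(customers, orders, on='customer_id', how='left')

# Concatenate rows of DataFrames with the same columns
stacked = pd.concat([customers, customers], ignore_index=True)
print(inner, left, sep='\n')
```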
10. Applying Functions
You can apply custom functions to your DataFrame or Series to manipulate or
transform data.
Apply a function to each value in a column:
df['new_col'] = df['existing_col'].apply(lambda x: x + 1)
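As a sketch of the two main uses of apply(), the example below applies a function to each value of a Series and then to each row of a DataFrame with axis=1. The data and the column 'other_col' are assumptions. For simple arithmetic like this, a plain vectorized expression gives the same result and is usually faster.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'existing_col': [1, 2, 3], 'other_col': [10, 20, 30]})

# Element-wise on a Series
df['new_col'] = df['existing_col'].apply(lambda x: x + 1)

# Row-wise on a DataFrame: each row is passed as a Series
df['row_total'] = df.apply(lambda row: row['existing_col'] + row['other_col'], axis=1)

# Vectorized equivalent of the first apply
df['vectorized'] = df['existing_col'] + 1
print(df)
```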
11. String Methods
Pandas provides string methods that allow you to perform vectorized string
operations on Series.
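Converting to Lowercase:
Convert all strings in a column to lowercase
df['string_column'] = df['string_column'].str.lower()
Checking for Substrings:
Flag rows whose string contains 'text'
df['contains_text'] = df['string_column'].str.contains('text')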
Replacing Substrings:
Replace 'old' with 'new' in strings
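The replacement step above is listed without its code. A likely candidate is Series.str.replace(), sketched here on made-up strings alongside the other two string methods. Passing regex=False is a choice made for this example so that 'old' is treated as a literal substring; the 'replaced' column name is hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'string_column': ['Old Text', 'some old value', 'NEW']})

df['string_column'] = df['string_column'].str.lower()
df['contains_text'] = df['string_column'].str.contains('text')

# Replace the literal substring 'old' with 'new'
df['replaced'] = df['string_column'].str.replace('old', 'new', regex=False)
print(df)
```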
12. Advanced Data Manipulation
Advanced data manipulation techniques allow for more complex
transformations and reshaping of your DataFrame.
Melt Function:
The melt() function is used to transform wide-format data into long-format
data.
Pivot Function:
The pivot() function reshapes the DataFrame by specifying index, columns,
and values.
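Stack and Unstack:
stack() moves the column labels into the innermost level of the row index;
unstack() moves that index level back into columns.
stacked_df = df.stack()
unstacked_df = stacked_df.unstack()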
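The melt() and pivot() descriptions above come without code, so here is a minimal round trip: wide to long with melt(), back to wide with pivot(), then stack() and unstack(). The table of scores and the names 'subject' and 'score' are assumptions for illustration.

```python
import pandas as pd

# Hypothetical wide-format data: one column per subject
wide = pd.DataFrame({'id': [1, 2], 'math': [90, 80], 'art': [70, 60]})

# Wide -> long: one row per (id, subject) pair
long_df = wide.melt(id_vars='id', var_name='subject', value_name='score')

# Long -> wide: subjects become columns again
back = long_df.pivot(index='id', columns='subject', values='score')

# stack() moves the column labels into the row index; unstack() reverses it
stacked_df = back.stack()
unstacked_df = stacked_df.unstack()
print(long_df, back, sep='\n')
```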
13. Creating and Using Pivot Tables
Pivot tables allow you to summarize data in a flexible way.
Pivot Table Function:
The pivot_table() function summarizes data by grouping on index and columns
and aggregating the values, so unlike pivot() it can handle duplicate
index/column pairs.
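A pivot table can be sketched with DataFrame.pivot_table(), which aggregates the values that fall into each cell. The sales data below is made up; aggfunc='sum' and fill_value=0 are choices for this example rather than required settings.

```python
import pandas as pd

# Hypothetical sales data with repeated region/product pairs
sales = pd.DataFrame({
    'region': ['North', 'North', 'South', 'South', 'North'],
    'product': ['A', 'B', 'A', 'A', 'A'],
    'amount': [100, 150, 200, 50, 25],
})

# Rows are regions, columns are products, cells hold the summed amount
table = sales.pivot_table(
    index='region',
    columns='product',
    values='amount',
    aggfunc='sum',
    fill_value=0,   # show 0 where a region has no sales of a product
)
print(table)
```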
14. Working with Categorical Data
Pandas provides support for categorical data, which can improve performance
and memory usage.
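Convert a column to the categorical type:
df['category_column'] = df['category_column'].astype('category')
Get the integer code assigned to each category:
codes = df['category_column'].cat.codes
Count rows per category:
grouped = df.groupby('category_column').size()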
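The sketch below runs the categorical calls on made-up labels and shows how codes map to categories, which are sorted alphabetically by default. It also builds an ordered categorical with an explicit order, an extension beyond the calls listed above. observed=True is passed to groupby() so that only categories present in the data are counted.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'category_column': ['low', 'high', 'medium', 'low']})

df['category_column'] = df['category_column'].astype('category')
categories = list(df['category_column'].cat.categories)   # sorted alphabetically
codes = df['category_column'].cat.codes                   # position of each value in categories
sizes = df.groupby('category_column', observed=True).size()

# Ordered categorical with an explicit, meaningful order
ordered = pd.Categorical(
    ['low', 'high', 'medium', 'low'],
    categories=['low', 'medium', 'high'],
    ordered=True,
)
print(categories, codes.tolist(), ordered.codes.tolist())
```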
15. Handling Date and Time Data
Pandas provides powerful tools for working with date and time data, making it
easy to manipulate and analyze time series.
Convert a column to datetime
df['date_column'] = pd.to_datetime(df['date_column'])
Extract year
df['year'] = df['date_column'].dt.year
Extract month
df['month'] = df['date_column'].dt.month
Extract day
df['day'] = df['date_column'].dt.day
Set the date column as the index
df.set_index('date_column', inplace=True)
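Once the date column is the index, time-based selection and aggregation become available. The sketch below repeats the steps on made-up dates, then selects one month by partial date string and sums values per month with resample(). The 'value' column and the 'MS' (month start) frequency are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    'date_column': ['2024-01-15', '2024-02-20', '2024-02-25'],
    'value': [1, 2, 3],
})

df['date_column'] = pd.to_datetime(df['date_column'])
df['year'] = df['date_column'].dt.year
df['month'] = df['date_column'].dt.month
df['day'] = df['date_column'].dt.day
df = df.set_index('date_column')

feb = df.loc['2024-02']                       # all rows in February 2024
monthly = df['value'].resample('MS').sum()    # one total per calendar month
print(feb, monthly, sep='\n')
```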