Pandas CheatSheet
A Beginner's Guide
What is Pandas?
Pandas is a free, open-source library for Python that makes it easy to work
with data. It provides two main data structures: Series (a one-dimensional
labeled array, like a list) and DataFrame (a two-dimensional table, like a
spreadsheet). With Pandas, you can easily organize, analyze, and manipulate data.
To install Pandas, open your terminal or command prompt and type:
pip install pandas
In a Jupyter notebook cell, prefix the command with an exclamation mark:
!pip install pandas
1. Loading Data
Loading data is the first step in any data analysis workflow. Pandas provides
several functions to read data from various file formats.
Import Pandas library:
import pandas as pd
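As a minimal sketch of loading data, the example below reads CSV text with pd.read_csv(). The CSV content and column names are made up for illustration, and an in-memory buffer stands in for a real file so the snippet runs anywhere. With a real file you would pass its path instead. pd.read_excel() and pd.read_json() follow the same pattern for other formats.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice you would pass a file path,
# for example pd.read_csv('data.csv'), instead of an in-memory buffer.
csv_text = "name,age,city\nAna,31,Lisbon\nBen,25,Oslo\n"

# read_csv parses the header row into column names and infers dtypes
df = pd.read_csv(io.StringIO(csv_text))

print(df)
```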
2. Viewing Data
After loading the data, it’s important to inspect it to understand its structure
and content. Pandas provides several methods for this.
Summary of DataFrame:
Display descriptive statistics for numerical columns (count, mean, std, min,
quartiles, max).
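The inspection methods mentioned here can be sketched as follows. The DataFrame and its column names are hypothetical sample data. head() returns the first rows, info() prints a summary of columns and dtypes, and describe() computes the descriptive statistics.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({
    'name': ['Ana', 'Ben', 'Cy'],
    'age': [31, 25, 40],
    'score': [88.5, 92.0, 79.5],
})

first_rows = df.head(2)   # first two rows
df.info()                 # prints column names, non-null counts, and dtypes
stats = df.describe()     # statistics for the numerical columns only

print(first_rows)
print(stats)
```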
3. Selecting Data
Selecting specific data from a DataFrame is crucial for analysis. Pandas allows
you to select columns and rows easily.
df[['col1', 'col2']]
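Beyond selecting columns by name, rows can be selected by position with iloc or by label or condition with loc. The sketch below uses hypothetical data, and the threshold in the filter is arbitrary.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'col2': ['a', 'b', 'c', 'd'],
    'col3': [10.0, 20.0, 30.0, 40.0],
})

subset = df[['col1', 'col2']]                      # select two columns
first_two = df.iloc[:2]                            # first two rows by position
filtered = df.loc[df['col1'] > 2, ['col1', 'col3']]  # rows matching a condition

print(filtered)
```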
4. Modifying Data
Modifying data in a DataFrame is essential for preparing your dataset for
analysis.
df['new_column'] = df['existing_column'] * 2
Rename Columns:
Rename a specific column
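Renaming a specific column is usually done with rename() and a mapping from old name to new name. The column names below are hypothetical.

```python
import pandas as pd

# Hypothetical column names for illustration
df = pd.DataFrame({'old_name': [1, 2], 'other': [3, 4]})

# rename returns a new DataFrame; columns not in the mapping are unchanged
renamed = df.rename(columns={'old_name': 'new_name'})

print(renamed.columns.tolist())
```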
Drop Columns:
df.drop(columns=['column_to_drop'], inplace=True)
5. Handling Missing Values
Dealing with missing values is crucial to ensure the integrity of your analysis.
df.dropna(subset=['column_name'], inplace=True)
df.fillna(value=0, inplace=True)
df.ffill(inplace=True)
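Before dropping or filling, it usually helps to count how many values are missing. The sketch below, on hypothetical data, counts missing values per column and then shows dropping, filling with per-column defaults, and forward filling without modifying the original DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column
df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': ['x', None, 'z']})

missing_per_column = df.isna().sum()           # count of missing values per column
dropped = df.dropna(subset=['a'])              # drop rows where 'a' is missing
filled = df.fillna({'a': 0, 'b': 'unknown'})   # a different fill value per column
forward = df.ffill()                           # carry the last valid value forward

print(missing_per_column)
```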
6. Removing Duplicates
Removing duplicate rows keeps repeated records from distorting your analysis.
df.drop_duplicates(subset=['col1'], inplace=True)
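To see which rows would be removed before dropping them, duplicated() returns a boolean mask. The keep parameter controls which occurrence survives. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical data: 'a' appears twice in col1
df = pd.DataFrame({'col1': ['a', 'a', 'b'], 'col2': [1, 2, 3]})

dup_mask = df.duplicated(subset=['col1'])                     # True for repeats
kept_first = df.drop_duplicates(subset=['col1'])              # keep first occurrence
kept_last = df.drop_duplicates(subset=['col1'], keep='last')  # keep last occurrence

print(dup_mask.tolist())
```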
7. Sorting Data
Sorting data is essential for analysis and presentation. You can sort your
DataFrame by one or more columns.
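Sorting by one or more columns can be sketched with sort_values(). The column names and values are hypothetical. Passing a list to ascending sets the direction for each column.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({'col1': ['b', 'a', 'b', 'a'], 'col2': [1, 4, 3, 2]})

# Sort by a single column, largest values first
by_one = df.sort_values('col2', ascending=False)

# Sort by col1 ascending, then by col2 descending within each col1 value
by_two = df.sort_values(['col1', 'col2'], ascending=[True, False])

print(by_two)
```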
8. Grouping and Aggregating Data
Grouping data allows you to perform operations on subsets of your data.
grouped = df.groupby('column_name')
Multiple aggregations
agg_df = grouped.agg({'value_column': ['sum', 'mean'], 'another_column':
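A complete grouping example might look like the sketch below. The data is hypothetical, and the 'max' aggregation chosen for another_column is an assumption made for illustration, not taken from the line above.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({
    'column_name': ['x', 'x', 'y'],
    'value_column': [10, 20, 30],
    'another_column': [1, 5, 3],
})

grouped = df.groupby('column_name')

# Single aggregation on one column
totals = grouped['value_column'].sum()

# Multiple aggregations; 'max' for another_column is an assumed choice
agg_df = grouped.agg({'value_column': ['sum', 'mean'], 'another_column': 'max'})

print(agg_df)
```

The result of agg() with a list of functions has two-level column labels, so individual values are addressed with a tuple such as ('value_column', 'mean').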
9. Merging and Joining DataFrames
Combining multiple DataFrames is often necessary when working with related
data.
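As a sketch of combining DataFrames, pd.merge() joins on a shared key column and pd.concat() stacks DataFrames on top of each other. The tables and the customer_id key are hypothetical.

```python
import pandas as pd

# Hypothetical related tables sharing a 'customer_id' key
customers = pd.DataFrame({'customer_id': [1, 2, 3], 'name': ['Ana', 'Ben', 'Cy']})
orders = pd.DataFrame({'customer_id': [1, 1, 3, 4], 'amount': [50, 20, 70, 10]})

# Inner join: only keys present in both tables
inner = pd.merge(customers, orders, on='customer_id', how='inner')

# Left join: every customer, with NaN where no order exists
left = pd.merge(customers, orders, on='customer_id', how='left')

# Concatenate rows of two DataFrames with the same columns
stacked = pd.concat([customers, customers], ignore_index=True)

print(left)
```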
10. Applying Functions
You can apply custom functions to your DataFrame or Series to manipulate or
transform data.
df['new_col'] = df['existing_col'].apply(lambda x: x + 1)
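For simple arithmetic like the line above, a vectorized expression gives the same result and is typically faster than apply(). apply() with axis=1 is useful when a function needs several columns of each row. The sketch below uses hypothetical data, and other_col is an assumed extra column.

```python
import pandas as pd

# Hypothetical sample data; 'other_col' is added for illustration
df = pd.DataFrame({'existing_col': [1, 2, 3], 'other_col': [10, 20, 30]})

# Element-wise function on a Series
df['new_col'] = df['existing_col'].apply(lambda x: x + 1)

# Equivalent vectorized form
df['vectorized'] = df['existing_col'] + 1

# Row-wise function that reads two columns of each row
df['row_total'] = df.apply(lambda row: row['existing_col'] + row['other_col'], axis=1)

print(df)
```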
11. String Methods
Pandas provides string methods that allow you to perform vectorized string
operations on Series.
df['string_column'] = df['string_column'].str.lower()
df['contains_text'] = df['string_column'].str.contains('text')
Replacing Substrings:
Replace 'old' with 'new' in strings
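Replacing substrings can be sketched with str.replace(). The sample strings are hypothetical. Passing regex=False treats 'old' as literal text rather than a regular expression.

```python
import pandas as pd

# Hypothetical sample strings for illustration
df = pd.DataFrame({'string_column': ['old car', 'Bold move', 'new']})

# Replace every literal occurrence of 'old' with 'new'
df['replaced'] = df['string_column'].str.replace('old', 'new', regex=False)

print(df)
```

Note that the replacement matches inside longer words too, so 'Bold' becomes 'Bnew'.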
12. Advanced Data Manipulation
Advanced data manipulation techniques allow for more complex
transformations and reshaping of your DataFrame.
Melt Function:
The melt() function is used to transform wide-format data into long-format
data.
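A minimal melt() sketch on hypothetical data: the id column is kept as an identifier, and the remaining columns are unpivoted into name and value pairs. The names subject and score are chosen for illustration.

```python
import pandas as pd

# Hypothetical wide-format data: one column per subject
wide = pd.DataFrame({'id': [1, 2], 'math': [90, 80], 'art': [70, 60]})

# Unpivot the subject columns into (subject, score) rows
long = wide.melt(id_vars='id', var_name='subject', value_name='score')

print(long)
```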
Pivot Function:
The pivot() function reshapes the DataFrame by specifying index, columns,
and values.
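A minimal pivot() sketch on hypothetical long-format data. Each combination of index and columns values must appear only once, otherwise pivot() raises an error.

```python
import pandas as pd

# Hypothetical long-format data: one row per (id, subject) pair
long = pd.DataFrame({
    'id': [1, 1, 2, 2],
    'subject': ['math', 'art', 'math', 'art'],
    'score': [90, 70, 80, 60],
})

# One row per id, one column per subject, filled with the scores
wide = long.pivot(index='id', columns='subject', values='score')

print(wide)
```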
stacked_df = df.stack()
unstacked_df = stacked_df.unstack()
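To make the effect of stack() and unstack() concrete, the sketch below uses a small hypothetical DataFrame. stack() moves the column labels into an inner index level, producing a Series with a two-level index, and unstack() moves that level back into columns.

```python
import pandas as pd

# Hypothetical sample data with labeled rows
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=['r1', 'r2'])

stacked_df = df.stack()                # Series indexed by (row label, column label)
unstacked_df = stacked_df.unstack()    # back to one column per original column

print(stacked_df)
print(unstacked_df)
```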
13. Creating and Using Pivot Tables
Pivot tables allow you to summarize data in a flexible way.
Pivot Table Function:
The pivot_table() function summarizes the DataFrame by specifying index,
columns, values, and an aggregation function (aggfunc) applied to each group.
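A pivot table can be sketched with pivot_table(), which aggregates rows that share the same index and column values. The sales data, column names, and choice of 'sum' are hypothetical.

```python
import pandas as pd

# Hypothetical sales records; region N has two rows for product A
sales = pd.DataFrame({
    'region': ['N', 'N', 'S', 'S', 'N'],
    'product': ['A', 'B', 'A', 'A', 'A'],
    'amount': [10, 20, 30, 40, 50],
})

# Sum amounts per region and product; missing combinations become 0
table = sales.pivot_table(
    index='region',
    columns='product',
    values='amount',
    aggfunc='sum',
    fill_value=0,
)

print(table)
```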
14. Working with Categorical Data
Pandas provides support for categorical data, which can improve performance
and memory usage.
df['category_column'] = df['category_column'].astype('category')
codes = df['category_column'].cat.codes
grouped = df.groupby('category_column').size()
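The categorical workflow can be sketched end to end on hypothetical data. The .cat accessor exposes the list of categories and the integer code stored for each row. Passing observed=True to groupby is a choice made here to limit the result to categories that actually appear.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({'category_column': ['low', 'high', 'low', 'mid']})

# Convert the string column to a categorical dtype
df['category_column'] = df['category_column'].astype('category')

categories = df['category_column'].cat.categories  # unique categories, sorted
codes = df['category_column'].cat.codes            # integer code for each row

# Count rows per category
counts = df.groupby('category_column', observed=True).size()

print(list(categories))
print(codes.tolist())
```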
15. Handling Date and Time Data
Pandas provides powerful tools for working with date and time data, making it
easy to manipulate and analyze time series.
df['date_column'] = pd.to_datetime(df['date_column'])
df['year'] = df['date_column'].dt.year
Extract month
df['month'] = df['date_column'].dt.month
Extract day
df['day'] = df['date_column'].dt.day
df.set_index('date_column', inplace=True)
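Once the date column is the index, rows can be selected by partial date strings and aggregated by period. The sketch below uses hypothetical dates and values. The 'MS' frequency groups rows by month and labels each group with the first day of the month.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({
    'date_column': ['2024-01-15', '2024-01-20', '2024-02-03'],
    'value': [1, 2, 3],
})

df['date_column'] = pd.to_datetime(df['date_column'])  # parse strings to datetimes
df['day'] = df['date_column'].dt.day                   # day of month
df = df.set_index('date_column')

january = df.loc['2024-01']                  # all rows in January 2024
monthly = df['value'].resample('MS').sum()   # total value per month

print(monthly)
```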