Pandas Viva Questions
Pandas provides two main data structures: Series and DataFrame. Both are built on top of
NumPy.
A Series is a one-dimensional labeled array that can store values of any data type.
The row labels of a Series are called the index. Using the pd.Series() constructor, we can
easily convert a list, tuple, or dictionary into a Series. A Series cannot contain multiple columns.
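The conversions described above can be sketched as follows. This is a minimal illustration; the variable names and sample values are made up for the example.

```python
import pandas as pd

# Build a Series from a list, a tuple, and a dictionary.
from_list = pd.Series([10, 20, 30])
from_tuple = pd.Series((1.5, 2.5, 3.5))
from_dict = pd.Series({'a': 1, 'b': 2, 'c': 3})

# A list or tuple gets default integer labels 0, 1, 2, ...
print(from_list.index.tolist())
# For a dictionary, the keys become the index labels.
print(from_dict.index.tolist())
```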
A DataFrame is a widely used pandas data structure: a two-dimensional array with labeled
axes (rows and columns). It is the standard way to store tabular data and has two
indexes, a row index and a column index.
It can be seen as a dictionary of Series in which both the rows and the columns are
indexed. The column labels are exposed as "columns" and the row labels as "index".
Working with DataFrames in pandas offers the following features:
• Memory efficiency
• Data alignment
• Reshaping
• Merge and join
• Time series support
Reindexing conforms a DataFrame to a new index, with optional filling logic. It places
NA/NaN in every location whose label was not present in the previous index. It returns a
new object unless the new index is equivalent to the current one and copy is set to False.
It can be used to change the index of both the rows and the columns of a DataFrame.
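As a small sketch of the filling behaviour described above (the frame `scores` and its labels are invented for illustration), reindexing with a label that did not exist before yields NaN unless `fill_value` is given:

```python
import pandas as pd

scores = pd.DataFrame({'score': [90, 80, 70]}, index=['a', 'b', 'c'])

# 'd' is not in the old index, so its row is filled with NaN.
with_nan = scores.reindex(['a', 'c', 'd'])

# fill_value supplies a replacement for labels that were missing.
with_zero = scores.reindex(['a', 'c', 'd'], fill_value=0)

print(with_nan)
print(with_zero)
```

The original frame `scores` is left unchanged; both calls return new objects.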
We can also create a Series from a dict. If a dictionary is passed as input and no index
is specified, the dictionary keys are used to construct the index. Recent pandas versions
keep the keys in insertion order; older versions sorted them.
If an index is passed, the values corresponding to each label in the index are extracted
from the dictionary.
import pandas as pd
import numpy as np
info = {'x': 0., 'y': 1., 'z': 2.}
a = pd.Series(info)
print(a)
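The case where an index is passed together with the dictionary might look like the sketch below. The label 'w' is a made-up example of a label that has no matching key.

```python
import pandas as pd

info = {'x': 0., 'y': 1., 'z': 2.}

# Only the labels listed in `index` are kept, in that order.
# 'w' has no key in the dictionary, so its value becomes NaN.
b = pd.Series(info, index=['y', 'z', 'w'])
print(b)
```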
pandas.Series.copy
Series.copy(deep=True)
Calling copy() with deep=True (the default) makes a deep copy that includes a copy of the
data and the indices. If we set deep to False, neither the indices nor the data are copied;
the new object only references them.
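A minimal sketch of the difference, using an invented Series named `original`. Only the deep copy is modified here, because how writes to a shallow copy propagate depends on the pandas version and its Copy-on-Write setting.

```python
import pandas as pd

original = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

deep = original.copy()               # deep=True is the default
shallow = original.copy(deep=False)  # references the same data, no copy made

# Changing the deep copy never affects the original.
deep.loc['a'] = 100
print(original['a'])
print(deep['a'])
```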
A DataFrame is a two-dimensional array with labeled axes (rows and columns). Calling the
pd.DataFrame() constructor with no arguments creates an empty DataFrame.
Output:
Empty DataFrame
Columns: []
Index: []
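Output of this kind can be produced by a call such as the following sketch (the name `empty_df` is chosen for illustration):

```python
import pandas as pd

# A DataFrame built with no data has no rows and no columns.
empty_df = pd.DataFrame()
print(empty_df)
```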
We can add a new column to an existing DataFrame by assigning a Series or a list to a new
column label.
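One way this could look is sketched below. The column names 'one' to 'four' and the values are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'two': [10, 20, 30]})

# Add a column by assigning a Series; values are aligned on the row index.
df['three'] = pd.Series([100, 200, 300])

# Add a column computed from existing columns.
df['four'] = df['one'] + df['three']
print(df)
```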
12. How to Delete Indices, Rows or Columns From a Pandas Data Frame?
To remove the index from a DataFrame, reset it with reset_index(). To remove duplicate
index values, reset the index and drop the duplicate values from the resulting index column.
You can use the drop() method to delete a column from the DataFrame.
The axis argument passed to drop() is 0 to drop rows and 1 to drop columns.
You can pass inplace=True to delete the column without reassigning the DataFrame.
You can also delete duplicate values from a column by using the drop_duplicates()
method.
You can pass row index labels to drop() to remove those rows from the DataFrame.
You can use the rename() method to give different names to the columns or the index
labels of a DataFrame.
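The operations listed above can be sketched on a small made-up frame. The labels 'r1' to 'r3' and columns 'A' to 'C' are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2], 'B': [4, 5, 5], 'C': [7, 8, 9]},
                  index=['r1', 'r2', 'r3'])

no_col = df.drop('C', axis=1)                     # axis=1 drops a column
no_row = df.drop('r1', axis=0)                    # axis=0 drops a row
deduped = df.drop_duplicates(subset=['A', 'B'])   # keeps the first of each duplicate
renamed = df.rename(columns={'A': 'a'}, index={'r1': 'first'})
reset = df.reset_index(drop=True)                 # replaces labels with 0, 1, 2

print(no_col.columns.tolist())
print(deduped.index.tolist())
```

Each call returns a new DataFrame, so `df` itself still has all three rows and columns afterwards.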
Numerical Python (NumPy) is a Python package for numerical computation and for
processing single-dimensional and multidimensional arrays. Calculations on NumPy arrays
are faster than the same calculations on normal Python lists.
In pandas, the groupby() function lets us rearrange data by splitting it into groups, which
is useful on real-world data sets. Its primary task is to split the data into groups that are
formed according to some criteria. The objects can be divided along any of their axes.
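A short sketch of splitting by a column and aggregating each group follows. The `sales` frame and its columns are invented for the example.

```python
import pandas as pd

sales = pd.DataFrame({'region': ['East', 'West', 'East', 'West'],
                      'amount': [100, 200, 300, 400]})

# Split the rows by region, then sum the amounts within each group.
totals = sales.groupby('region')['amount'].sum()
print(totals)
```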
The set_index() method sets an existing column as the index. To rename labels, pass a
dict of the form {original name: new name} to the index or columns argument of
rename().
index is for the row labels and columns is for the column names. If you want to change
only one of them, you need only specify that one.
A new DataFrame is returned; the original DataFrame is not changed.
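import pandas as pd
# Sample data matching the output shown in the comments
df = pd.DataFrame({'A': [11, 21, 31], 'B': [12, 22, 32], 'C': [13, 23, 33]},
                  index=['ONE', 'TWO', 'THREE'])
df_new = df.rename(columns={'A': 'a'}, index={'ONE': 'one'})
print(df_new)
#         a   B   C
# one    11  12  13
# TWO    21  22  23
# THREE  31  32  33
print(df)
#         A   B   C
# ONE    11  12  13
# TWO    21  22  23
# THREE  31  32  33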
Output:
10 100
11 110
12 120
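import numpy as np
import pandas as pd
N = 20  # assumed sample size (not given in the notes); any N >= 6 keeps row label 5
df = pd.DataFrame({
    'A': pd.date_range(start='2016-01-01', periods=N, freq='D'),
    'x': np.linspace(0, stop=N-1, num=N),
    'y': np.random.rand(N),
    'C': np.random.choice(['Low', 'Medium', 'High'], N).tolist(),
    'D': np.random.normal(100, 10, size=(N)).tolist()
})
# Keep rows 0, 2, 5 and columns A, C, B; B does not exist in df, so it is all NaN
df_reindexed = df.reindex(index=[0, 2, 5], columns=['A', 'C', 'B'])
print(df_reindexed)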
Its output is as follows (the values in column C are random):
           A     C   B
0 2016-01-01   Low NaN
2 2016-01-03  High NaN
5 2016-01-06   Low NaN
# display
data.head()
22. How will you get the number of rows and columns of a DataFrame in pandas?
# Get the row and column count of df as a (rows, columns) tuple
df.shape
(4, 4)
23. TRANSPOSE
>>> df
  t1 t2 t3
0  T  T  T
1  C  G  G
2  C  C  -
3  A  A  A
4  A  A  A
In this DataFrame, each row holds one character position and each column holds one
sequence. If you want the rows to hold sequences, just transpose the DataFrame:
>>> df.transpose()
    0  1  2  3  4
t1  T  C  C  A  A
t2  T  G  C  A  A
t3  T  G  -  A  A
Now we will use the DataFrame.empty attribute to check whether the given DataFrame is empty.
# check if there is any element
# in the given dataframe or not
result = df.empty
# Print the result
print(result)
28. How will you get the top 2 rows from a DataFrame in pandas?
# Select the first 2 rows of the DataFrame
dfObj1 = empDfObj.head(2)
print("First 2 rows of the DataFrame:")
print(dfObj1)
32. What Are The Different Ways A DataFrame Can Be Created In pandas?
Ans:
A DataFrame can be created in different ways. Here are some of them:
• Using List:
# initialize list of lists
data = [['p', 1], ['q', 2], ['r', 3]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Letter', 'Number'])
# print dataframe.
df
• Using dict of ndarray/lists:
To create a DataFrame from a dict of ndarrays/lists, all the arrays must be of the same
length. If an index is passed, its length must equal the length of the arrays. If no index is
passed, the index defaults to range(n), where n is the array length.
• Using arrays with an explicit index:
# DataFrame from a dict of lists with custom row labels.
import pandas as pd
# initialise data of lists.
data = {'Name': ['Tom', 'Jack', 'nick', 'juli'], 'marks': [99, 98, 95, 90]}
# Creates pandas DataFrame; the index length must match the list length.
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
# print the data
df
• Using a dictionary of Series:
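Building a DataFrame from a dictionary of Series could be sketched as below. The Series contents and labels are made up for illustration. Unlike the dict-of-lists case, the Series may have different lengths, because the rows are aligned on their index labels.

```python
import pandas as pd

data = {'one': pd.Series([10, 20, 30], index=['a', 'b', 'c']),
        'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

# The row index is the union of the Series indexes;
# 'one' has no value for label 'd', so that cell is NaN.
df = pd.DataFrame(data)
print(df)
```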
33. Adding columns in a DataFrame
loc[] : loc is a label-based selection method, which means that we pass the name of the
row or column we want to select. A slice passed to loc includes its last element, unlike
iloc. loc also accepts a boolean array or a boolean Series as a mask.
display(data.loc[2 : 5])
display(data)
iloc[] : iloc is an integer-position-based selection method, which means that we pass
integer positions to select specific rows/columns. A slice passed to iloc does not
include its last element, unlike loc. iloc does not accept a boolean Series as a mask,
although it does accept a plain boolean list or array.
display(data.iloc[[0, 2, 4, 7]])
display(data.iloc[1 : 5, 2 : 5])
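The difference in slice end points can be seen in a small sketch. The frame `data` here is a made-up example whose row labels (10 to 50) deliberately differ from the row positions (0 to 4).

```python
import pandas as pd

data = pd.DataFrame({'score': [55, 65, 75, 85, 95]},
                    index=[10, 20, 30, 40, 50])

by_label = data.loc[20:40]               # labels 20, 30, 40: end label is included
by_position = data.iloc[1:3]             # positions 1 and 2: end position is excluded
by_mask = data.loc[data['score'] > 70]   # boolean Series used as a row mask

print(by_label.index.tolist())
print(by_position.index.tolist())
print(by_mask.index.tolist())
```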
DataFrame.drop
Remove rows or columns by specifying label names and the corresponding axis, or by
specifying index or column names directly. When using a multi-index, labels on different
levels can be removed by specifying the level.
Examples
>>> df.drop(index='cow', columns='small')
                big
lama    speed   45.0
        weight  200.0
        length  1.5
falcon  speed   320.0
        weight  1.0
        length  0.3
DataFrame.items()
Iterate over (column name, Series) pairs.
Iterates over the DataFrame columns, returning a tuple with the column name and the
content as a Series. Older pandas versions also offered this as iteritems(), which was
removed in pandas 2.0.
DataFrame.iterrows
Iterate over DataFrame rows as (index, Series) pairs.
DataFrame.itertuples
Iterate over DataFrame rows as namedtuples of the values.
>>> df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
...                    'population': [1864, 22000, 80000]},
...                   index=['panda', 'polar', 'koala'])
>>> df
         species  population
panda       bear        1864
polar       bear       22000
koala  marsupial       80000
>>> for label, content in df.items():
...     print(f'label: {label}')
...     print('content:', content, sep='\n')
...
label: species
content:
panda bear
polar bear
koala marsupial
Name: species, dtype: object
label: population
content:
panda 1864
polar 22000
koala 80000
Name: population, dtype: int64
DataFrame.itertuples(index=True, name='Pandas')
Iterate over DataFrame rows as namedtuples.
Parameters
index : bool, default True
If True, return the index as the first element of the tuple.
name : str or None, default "Pandas"
The name of the returned namedtuples or None to return regular tuples.
>>> df = pd.DataFrame({'num_legs': [4, 2], 'num_wings': [0, 2]},
...                   index=['dog', 'hawk'])
>>> df
      num_legs  num_wings
dog          4          0
hawk         2          2
>>> for row in df.itertuples():
...     print(row)
...
Pandas(Index='dog', num_legs=4, num_wings=0)
Pandas(Index='hawk', num_legs=2, num_wings=2)
With the name parameter set we set a custom name for the yielded namedtuples:
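A sketch of the custom-name case, reusing the same small frame. The name 'Animal' is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({'num_legs': [4, 2], 'num_wings': [0, 2]},
                  index=['dog', 'hawk'])

# Each row is yielded as a namedtuple whose type is called 'Animal'.
rows = list(df.itertuples(name='Animal'))
for row in rows:
    print(row)
```

Fields are then available by attribute, for example `row.Index` and `row.num_legs`.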
DataFrame.pop(item)
Return item and drop from frame. Raise KeyError if not found.
DataFrame.isna
Detect missing values.
DataFrame.isnull
Alias of isna.
DataFrame.notna
Detect existing (non-missing) values.
DataFrame.dropna
Remove missing values.
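The four methods above can be tried on a small frame with gaps. The column names and values below are invented for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [5.0, np.nan, 7.0],
                   'name': ['Al', 'Bo', None]})

missing = df.isna()     # True where a value is missing
present = df.notna()    # the element-wise opposite of isna()
complete = df.dropna()  # keeps only rows with no missing values

print(missing)
print(complete)
```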
pandas.DataFrame.size
Return an int representing the number of elements in this object.
Return the number of rows if Series. Otherwise return the number of rows times
number of columns if DataFrame.
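A quick sketch of size on both structures, with made-up sample data:

```python
import pandas as pd

s = pd.Series([1, 2, 3])
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

print(s.size)    # number of rows in the Series
print(df.size)   # rows times columns
print(df.shape)  # (rows, columns), for comparison
```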
1. WITH AXIS
DF1.drop(labels=[rowlabel], axis=0)  # Temporary deletion: returns a new DataFrame, DF1 is unchanged
DF1.drop(labels=[rowlabel1, rowlabel2], axis=0)  # Temporary deletion
OR DF1.drop(rowlabel, axis=0) OR DF1.drop(rowlabel)
DF1.drop(labels=[rowlabel], axis=0, inplace=True)  # Permanent deletion
DF1 = DF1.drop(labels=[rowlabel], axis=0)  # Permanent deletion
2. WITHOUT AXIS
DF1.drop(index=[rowlabel])  # Temporary deletion
DF1.drop(index=[rowlabel], inplace=True)  # Permanent deletion
DF1 = DF1.drop(index=[rowlabel])  # Permanent deletion