The Series Data Structure
In [ ]: import pandas as pd
You are currently looking at version 1.0 of this notebook. To download notebooks and datafiles, as
well as get help on Jupyter notebooks in the Coursera platform, visit the Jupyter Notebook FAQ
(https://www.coursera.org/learn/python-data-analysis/resources/0dhYG) course resource.
In [ ]: numbers = [1, 2, 3]
pd.Series(numbers)
In [ ]: import numpy as np
np.nan == None
In [ ]: np.nan == np.nan
In [ ]: np.isnan(np.nan)
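The comparisons above show that NaN never compares equal to anything, including itself. A minimal sketch of what that means inside a Series follows; the name nums is an illustrative assumption, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# None in a numeric list is stored as NaN, so the dtype becomes float64.
nums = pd.Series([1, 2, None])

# An equality test never matches NaN, so it cannot find missing entries.
eq_mask = nums == np.nan

# isna() is the reliable way to locate them.
na_mask = nums.isna()

print(nums.dtype)
print(eq_mask.tolist())
print(na_mask.tolist())
```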
In [ ]: s = pd.Series(numbers)
s.index
Querying a Series
In [ ]: sports = {'Archery': 'Bhutan',
'Golf': 'Scotland',
'Sumo': 'Japan',
'Taekwondo': 'South Korea'}
s = pd.Series(sports)
s
In [ ]: s.iloc[3]
In [ ]: s.loc['Golf']
In [ ]: s[3] # Positional fallback on a label index; deprecated in recent pandas, prefer s.iloc[3]
In [ ]: s['Golf']
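In [ ]: s = pd.Series({99: 'Bhutan', 100: 'Scotland', 101: 'Japan', 102: 'South Korea'}) # example integer labels
s[0] # With an integer index this is a label lookup, not s.iloc[0]; label 0 does not exist, so it raises a KeyError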
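The difference between label-based and position-based lookup can be sketched on its own as follows. The Series name codes and its integer labels are illustrative assumptions, chosen so that no label equals 0.

```python
import pandas as pd

# Hypothetical integer labels; none of them is 0.
codes = pd.Series(['Bhutan', 'Scotland', 'Japan'], index=[99, 100, 101])

first_by_position = codes.iloc[0]  # position 0, regardless of labels
by_label = codes.loc[100]          # the entry whose label is 100

# Plain [] on an integer index is a label lookup, so 0 is not found.
try:
    codes[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(first_by_position, by_label, lookup_failed)
```

Using .loc and .iloc explicitly avoids having to remember which rule plain [] applies.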
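In [ ]: s = pd.Series([100.00, 120.00, 101.00, 3.00]) # example numeric Series; string values cannot be summed this way
total = 0
for item in s:
    total += item
print(total)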
In [ ]: import numpy as np
total = np.sum(s)
print(total)
In [ ]: len(s)
In [ ]: %%timeit -n 100
summary = 0
for item in s:
    summary += item
In [ ]: %%timeit -n 100
summary = np.sum(s)
In [ ]: %%timeit -n 10
s = pd.Series(np.random.randint(0,1000,10000))
for label, value in s.items(): # iteritems() was removed in pandas 2.0
    s.loc[label] = value + 2
In [ ]: %%timeit -n 10
s = pd.Series(np.random.randint(0,1000,10000))
s+=2
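The two timing cells above compare an explicit loop with a vectorized update. As a rough check that both produce the same values, here is a small sketch outside of %%timeit; the names values, looped and vectorized, the seed, and the reduced size of 1,000 are assumptions made for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
values = pd.Series(rng.integers(0, 1000, 1_000))

# Element-by-element update through .loc (slow).
looped = values.copy()
for label, value in looped.items():
    looped.loc[label] = value + 2

# The same update as a single vectorized operation (fast).
vectorized = values + 2

same = bool((looped == vectorized).all())
print(same)
```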
In [ ]: s = pd.Series([1, 2, 3])
s.loc['Animal'] = 'Bears'
s
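The cells that follow display original_sports, cricket_loving_countries and all_countries, but their definitions do not appear in this copy of the notebook. A minimal sketch of one way to build them, so the cells can run, is given here. The cricket entries and the use of pd.concat are assumptions, not the notebook's confirmed code.

```python
import pandas as pd

original_sports = pd.Series({'Archery': 'Bhutan',
                             'Golf': 'Scotland',
                             'Sumo': 'Japan',
                             'Taekwondo': 'South Korea'})

# Hypothetical second Series in which every entry shares the label 'Cricket'.
cricket_loving_countries = pd.Series(['Australia', 'Barbados', 'Pakistan', 'England'],
                                     index=['Cricket'] * 4)

# pd.concat returns a new Series and leaves both inputs unchanged.
all_countries = pd.concat([original_sports, cricket_loving_countries])

print(all_countries.loc['Cricket'])
```

Because the label 'Cricket' is repeated, all_countries.loc['Cricket'] returns a Series rather than a single value.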
In [ ]: original_sports
In [ ]: cricket_loving_countries
In [ ]: all_countries
In [ ]: all_countries.loc['Cricket']
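The remaining cells in this part query a DataFrame named df with row labels such as 'Store 1' and columns such as 'Name' and 'Cost', but the cell that creates it is missing from this copy. The sketch below builds a small stand-in so those queries can run. All names, items and prices are made-up placeholder values, and the 'Item Purchased' column is an assumption.

```python
import pandas as pd

# Hypothetical purchase records; every value here is a placeholder.
purchase_1 = pd.Series({'Name': 'Chris', 'Item Purchased': 'Dog Food', 'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn', 'Item Purchased': 'Kitty Litter', 'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod', 'Item Purchased': 'Bird Seed', 'Cost': 5.00})

# Each Series becomes one row; the row label 'Store 1' is deliberately repeated.
df = pd.DataFrame([purchase_1, purchase_2, purchase_3],
                  index=['Store 1', 'Store 1', 'Store 2'])

print(df)
```

With this layout, df.loc['Store 2'] returns a Series (one matching row), while df.loc['Store 1'] returns a DataFrame (two matching rows).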
In [ ]: df.loc['Store 2']
In [ ]: type(df.loc['Store 2'])
In [ ]: df.loc['Store 1']
In [ ]: df.T
In [ ]: df.T.loc['Cost']
In [ ]: df['Cost']
In [ ]: df.loc['Store 1']['Cost']
In [ ]: df.loc[:,['Name', 'Cost']]
In [ ]: df.drop('Store 1')
In [ ]: df
In [ ]: copy_df = df.copy()
copy_df = copy_df.drop('Store 1')
copy_df
In [ ]: copy_df.drop?
In [ ]: del copy_df['Name']
copy_df
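The drop cells above rely on the fact that drop returns a new DataFrame and leaves the original untouched, whereas del removes a column in place. A small sketch on a hypothetical frame named shop (placeholder values) illustrates this.

```python
import pandas as pd

# Hypothetical data; the names and costs are placeholders.
shop = pd.DataFrame({'Name': ['Chris', 'Kevyn'], 'Cost': [22.5, 2.5]},
                    index=['Store 1', 'Store 2'])

without_row = shop.drop('Store 1')      # drops a row label, returns a copy
without_col = shop.drop(columns='Name') # drops a column, returns a copy

# shop itself still has both rows and both columns.
print(shop.shape, without_row.shape, without_col.shape)
```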
In [ ]: df['Location'] = None
df
In [ ]: costs = df['Cost'] # under older pandas defaults this selection can be a view of df
costs += 2
costs
In [ ]: df
In [ ]: !cat olympics.csv
In [ ]: df = pd.read_csv('olympics.csv')
df.head()
In [ ]: df.columns
df.head()
Querying a DataFrame
In [ ]: df['Gold'] > 0
In [ ]: only_gold = df.where(df['Gold'] > 0) # rows that fail the condition become NaN
only_gold['Gold'].count()
In [ ]: df['Gold'].count()
In [ ]: only_gold = only_gold.dropna()
only_gold.head()
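Boolean masks such as df['Gold'] > 0 can filter rows directly and can be combined with | and &. The sketch below uses a tiny hypothetical medal table; the country codes and counts are invented for illustration.

```python
import pandas as pd

# Hypothetical medal counts, not taken from olympics.csv.
medals = pd.DataFrame({'Gold': [0, 3, 1], 'Gold.1': [2, 0, 1]},
                      index=['AAA', 'BBB', 'CCC'])

mask = medals['Gold'] > 0

# Indexing with the mask keeps only the rows where it is True.
only_gold_rows = medals[mask]

# where() keeps the shape and fills failing rows with NaN; dropna() removes them.
where_version = medals.where(mask).dropna()

# Parentheses are required when combining conditions with | or &.
either = medals[(medals['Gold'] > 0) | (medals['Gold.1'] > 0)]

print(only_gold_rows)
```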
Indexing DataFrames
In [ ]: df.head()
In [ ]: df['country'] = df.index
df = df.set_index('Gold')
df.head()
In [ ]: df = df.reset_index()
df.head()
In [ ]: df = pd.read_csv('census.csv')
df.head()
In [ ]: df['SUMLEV'].unique()
In [ ]: df = df[df['SUMLEV'] == 50]
df.head()
In [ ]: columns_to_keep = ['STNAME',
'CTYNAME',
'BIRTHS2010',
'BIRTHS2011',
'BIRTHS2012',
'BIRTHS2013',
'BIRTHS2014',
'BIRTHS2015',
'POPESTIMATE2010',
'POPESTIMATE2011',
'POPESTIMATE2012',
'POPESTIMATE2013',
'POPESTIMATE2014',
'POPESTIMATE2015']
df = df[columns_to_keep]
df.head()
In [ ]: df = df.set_index(['STNAME', 'CTYNAME'])
df.head()
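Once two columns form the index, rows are addressed by a tuple of labels, one per level. The sketch below uses a small hypothetical stand-in for the census data; the county names are only examples and the birth counts are invented placeholders.

```python
import pandas as pd

# Hypothetical stand-in for census.csv; the numbers are placeholders.
census = pd.DataFrame({'STNAME': ['Michigan', 'Michigan', 'Ohio'],
                       'CTYNAME': ['Washtenaw County', 'Wayne County', 'Franklin County'],
                       'BIRTHS2010': [100, 200, 300]})
census = census.set_index(['STNAME', 'CTYNAME'])

# A full tuple plus a column name selects a single value.
one_value = census.loc[('Michigan', 'Washtenaw County'), 'BIRTHS2010']

# A label for the outer level alone selects every county in that state.
michigan = census.loc['Michigan']

print(one_value)
print(michigan)
```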
Missing values
In [ ]: df = pd.read_csv('log.csv')
df
In [ ]: df.fillna?
In [ ]: df = df.set_index('time')
df = df.sort_index()
df
In [ ]: df = df.reset_index()
df = df.set_index(['time', 'user'])
df
In [ ]: df = df.ffill() # fillna(method='ffill') is deprecated in recent pandas
df.head()
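Forward filling copies the last known value into each gap that follows it, which is why the log is sorted by time first. A minimal sketch on hypothetical log rows follows; the volume column and its values are assumptions made for the example.

```python
import pandas as pd

# Hypothetical log rows, deliberately out of time order.
log = pd.DataFrame({'time': [3, 1, 2],
                    'user': ['a', 'a', 'a'],
                    'volume': [5.0, 10.0, None]})

# Sort by time so that "previous row" means "earlier in time".
log = log.set_index('time').sort_index()

# Each missing value takes the most recent non-missing value above it.
filled = log.ffill()

print(filled)
```

If the rows were not sorted first, the gap at time 2 would be filled from time 3 instead of time 1.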