I've got (probably) quite a simple problem that I just cannot wrap my head around right now. I'm collecting the following two series:
from pandas_datareader import wb
countries = [
'DZA', 'ARM','AZE','BLR','BIH','BRN','KHM','CHN','HRV', 'CZE','EGY',\
'EST','GEO','HUN','IND','IDN','ISR','JPN','JOR','KAZ','KOR','KGZ','LAO','LVA',\
'LBN','LTU','MYS','MDA','MNG','MMR','MKD','PHL','POL','ROU', 'RUS','SAU',\
'SGP','SVK','SVN','TJK','THA','TUR','UKR','UZB','VNM'
]
dat = wb.download(indicator='FR.INR.LEND', country=countries, start=2010, end=2019)
dat.columns = ['lending_rate']
us = wb.download(indicator='FR.INR.LEND', country='US', start=2010, end=2019)
us.columns = ['lending_rate_us']
dat2=pd.concat([dat,us])
dat2
I'd like to take the difference between lending_rate
and lending_rate_us
but obviously would like to subtract lending_rate_us
for the US only from lending_rate
in all other countries (ie. avoid what would otherwise lead to NANs everywhere).
So I guess what I'm trying to do is to copy the values for lending_rate_us
to all other countries to then take the difference between both columns.
Does anyone have an idea how to do that (or an alternative idea that makes more sense)?
Thanks!
EDIT:
I tried the following, alas without success:
from pandas_datareader import wb
countries = [
'DZA', 'ARM','AZE','BLR','BIH','BRN','KHM','CHN','HRV', 'CZE','EGY',\
'EST','GEO','HUN','IND','IDN','ISR','JPN','JOR','KAZ','KOR','KGZ','LAO','LVA',\
'LBN','LTU','MYS','MDA','MNG','MMR','MKD','PHL','POL','ROU', 'RUS','SAU',\
'SGP','SVK','SVN','TJK','THA','TUR','UKR','UZB','VNM'
]
dat = wb.download(indicator='FR.INR.LEND', country=countries, start=2010, end=2019)
dat.columns = ['lending_rate']
us = wb.download(indicator='FR.INR.LEND', country='US', start=2010, end=2019)
us.columns = ['lending_rate']
for i in dat.index.get_level_values(0).unique():
dat["lending_rate_spread"]=dat.loc[i,:]-us.loc["United States",:]
dat
Output:
lending_rate lending_rate_spread
country year
Armenia
2019 12.141989 NaN
2018 12.793042 NaN
2017 14.406002 NaN
2016 17.356706 NaN
2015 17.590330 NaN
... ... ... ...
Vietnam
2014 8.665000 NaN
2013 10.374167 NaN
2012 13.471667 NaN
2011 16.953833 NaN
2010 13.135250 NaN
450 rows × 2 columns
But when I just print the result of the loop without creating a new column I get the correct values:
for i in dat.index.get_level_values(0):
print(dat.loc[i,:]-us.loc["United States",:])
Output:
lending_rate
year
2019 6.859489
2018 7.888875
2017 10.309335
2016 13.845039
2015 14.330330
2014 13.158665
2013 12.744987
2012 13.980068
2011 14.504474
2010 15.950428
lending_rate
year
2019 11.998715
2018 12.544167
2017 12.445833
2016 12.863560
2015 14.274167
I don't understand why I would get the correct result but not present it in the correct way?
lending_rate_us
for the US only from lending_rate in all other countries mean?lending_rate_us
is only reported for the US, so it isNaN
in all other countries. So if I haddiff=lending_rate-lending_rate_us
, I'd get a column withNaN
values. So basically, I want to subtract values for one MultiIndex level (US) from all other levels.