Column Differences for MultiIndex Dataframes

Question

I've got (probably) quite a simple problem that I just cannot wrap my head around right now. I'm collecting the following two series:

from pandas_datareader import wb
countries = [
            'DZA', 'ARM','AZE','BLR','BIH','BRN','KHM','CHN','HRV', 'CZE','EGY',\
             'EST','GEO','HUN','IND','IDN','ISR','JPN','JOR','KAZ','KOR','KGZ','LAO','LVA',\
             'LBN','LTU','MYS','MDA','MNG','MMR','MKD','PHL','POL','ROU', 'RUS','SAU',\
             'SGP','SVK','SVN','TJK','THA','TUR','UKR','UZB','VNM'
]
dat = wb.download(indicator='FR.INR.LEND', country=countries, start=2010, end=2019)
dat.columns = ['lending_rate']
us = wb.download(indicator='FR.INR.LEND', country='US', start=2010, end=2019)
us.columns = ['lending_rate_us']
dat2=pd.concat([dat,us])
dat2

I'd like to take the difference between lending_rate and lending_rate_us but obviously would like to subtract lending_rate_us for the US only from lending_rate in all other countries (ie. avoid what would otherwise lead to NANs everywhere).

So I guess what I'm trying to do is to copy the values for lending_rate_us to all other countries to then take the difference between both columns.

Does anyone have an idea how to do that (or an alternative idea that makes more sense)?

Thanks!

EDIT:

I tried the following, alas without success:

from pandas_datareader import wb
countries = [
            'DZA', 'ARM','AZE','BLR','BIH','BRN','KHM','CHN','HRV', 'CZE','EGY',\
             'EST','GEO','HUN','IND','IDN','ISR','JPN','JOR','KAZ','KOR','KGZ','LAO','LVA',\
             'LBN','LTU','MYS','MDA','MNG','MMR','MKD','PHL','POL','ROU', 'RUS','SAU',\
             'SGP','SVK','SVN','TJK','THA','TUR','UKR','UZB','VNM'
]
dat = wb.download(indicator='FR.INR.LEND', country=countries, start=2010, end=2019)
dat.columns = ['lending_rate']
us = wb.download(indicator='FR.INR.LEND', country='US', start=2010, end=2019)
us.columns = ['lending_rate']

for i in dat.index.get_level_values(0).unique():
    dat["lending_rate_spread"]=dat.loc[i,:]-us.loc["United States",:]
dat

Output:

        lending_rate    lending_rate_spread
country year        
Armenia 
2019    12.141989   NaN
2018    12.793042   NaN
2017    14.406002   NaN
2016    17.356706   NaN
2015    17.590330   NaN
... ... ... ...
Vietnam 
2014    8.665000    NaN
2013    10.374167   NaN
2012    13.471667   NaN
2011    16.953833   NaN
2010    13.135250   NaN
450 rows × 2 columns

But when I just print the result of the loop without creating a new column I get the correct values:

for i in dat.index.get_level_values(0):
    print(dat.loc[i,:]-us.loc["United States",:])

Output:

      lending_rate
year              
2019      6.859489
2018      7.888875
2017     10.309335
2016     13.845039
2015     14.330330
2014     13.158665
2013     12.744987
2012     13.980068
2011     14.504474
2010     15.950428
      lending_rate
year              
2019     11.998715
2018     12.544167
2017     12.445833
2016     12.863560
2015     14.274167

I don't understand why I would get the correct result but not present it in the correct way?

What does subtract lending_rate_us for the US only from lending_rate in all other countries mean? — Ynjxsjmh, Commented Apr 23, 2021 at 15:12
Yes, that was a bit confusing perhaps. I'd like to get spreads of all countries relative to the US. lending_rate_us is only reported for the US, so it is NaN in all other countries. So if I had diff=lending_rate-lending_rate_us, I'd get a column with NaN values. So basically, I want to subtract values for one MultiIndex level (US) from all other levels. — Paul, Commented Apr 24, 2021 at 1:51
I noticed that the display loop succeeds and the write error occurs when the destination country is not specified. Also, there are some countries where the rate values do not add up to the year data being obtained in US, so you need to adjust the US data according to the number of year data for the country you are writing to. — r-beginners, Commented Apr 24, 2021 at 4:54

r-beginners · Accepted Answer · 2021-04-25 02:59:57Z

0

In response to your comment, I reviewed the data again. I reworked the data for each country as NA data existed, and found that all of the data was for 10 years. The method @Paul commented on is possible, so I modified the code.

dat['lending_rate_us'] = us['lending_rate_us']*len(dat['country'].unique())
dat['lending_rate_spread'] = dat['lending_rate'] - us['lending_rate_us']
dat.head()
country     year    lending_rate    lending_rate_us     lending_rate_spread
0   Armenia     2019    12.141989   237.7125    6.859489
1   Armenia     2018    12.793042   220.6875    7.888875
2   Armenia     2017    14.406002   184.3500    10.309335
3   Armenia     2016    17.356706   158.0250    13.845039
4   Armenia     2015    17.590330   146.7000    14.330330

edited Apr 25, 2021 at 2:59

answered Apr 24, 2021 at 5:35

r-beginners

35k3 gold badges18 silver badges39 bronze badges

If my answer helped you, please consider accepting it as the correct answer
– r-beginners
Commented Apr 24, 2021 at 5:35
That does seem to work, thanks. Although I do wonder, if there isn't a more efficient solution for this. It seems to be a fairly complex loop for doing something so simple like calculating a spread. Having to set a destination country does make sense though. I wonder if one cannot simply add a column that repeats values from the US for all other countries and then take the difference between columns?
– Paul
Commented Apr 25, 2021 at 2:25

Add a comment |

Collectives™ on Stack Overflow

Column Differences for MultiIndex Dataframes

1 Answer 1

Your Answer

Not the answer you're looking for? Browse other questions tagged
python
pandas
multi-index
or ask your own question.

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged pythonpandasmulti-index or ask your own question.

Related

Not the answer you're looking for? Browse other questions tagged
python
pandas
multi-index
or ask your own question.