barplot normalization and ordering of groups (x-axis)


Asked 5 years, 11 months ago

Modified 5 years, 10 months ago

Viewed 571 times

Below is what I needed to do to get to the part where I attempt to implement seaborn's barplot.

import matplotlib.pyplot as plt 
import seaborn as sns 
import pandas as pd 
import statsmodels.api as sm 
import numpy as np

da = pd.read_csv("nhanes_2015_2016.csv")

# Label missing marital status, then map the numeric codes to descriptions
da["DMDMARTL"] = da.DMDMARTL.fillna("Missing")
da["DMDMARTLdescript"] = da.DMDMARTL.replace({
    1: "Married", 2: "Widowed", 3: "Divorced", 4: "Separated",
    5: "Never married", 6: "Living with partner",
    77: "Refused", 99: "Don't know"})

da["RIAGENDRx"] = da.RIAGENDR.replace({1: "Male", 2: "Female"})

# Bin ages into ten-year intervals: (10, 20], (20, 30], ..., (70, 80]
da["agegrp"] = pd.cut(da.RIDAGEYR, [10, 20, 30, 40, 50, 60, 70, 80])

I pieced together bits of code here and there and arrived at what I have below.

y = "prop"
# Keep only the rows that are not male
dx = da.loc[~da.RIAGENDRx.isin(["Male"]), :]
plt.figure(figsize=(12, 5))
# Build a tidy frame of normalized counts for plotting
prop_df = (dx["agegrp"]
           .groupby(dx["DMDMARTLdescript"])
           .value_counts(normalize=True)
           .rename(y)
           .reset_index())
sns.barplot(x="agegrp", y=y, hue="DMDMARTLdescript", data=prop_df)

The result of running the code above is the following

I have the following issues with the plot it generates:

1. Although I asked for the counts to be normalized (`normalize=True`), the plot makes it fairly obvious that the bars in each age group sum to more than 1.
2. The age groups are ordered along the x-axis in a somewhat arbitrary way. I am not sure how to put them in numerical order.
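
For the second issue, one possible approach is to take the bin order from the categorical that `pd.cut` returns and hand it to the plot. The sketch below uses a made-up `ages` series in place of `da.RIDAGEYR`, and the names `age_order` and `age_labels` are hypothetical helpers, not part of the original code.

```python
import pandas as pd

# Toy stand-in for da.RIDAGEYR; the values are made up for illustration.
ages = pd.Series([18, 25, 33, 47, 52, 68, 79, 22, 61, 35])
bins = [10, 20, 30, 40, 50, 60, 70, 80]
agegrp = pd.cut(ages, bins)

# pd.cut returns an ordered categorical, so its categories are already
# in ascending order no matter how the rows themselves are arranged.
age_order = list(agegrp.cat.categories)

# String labels are often easier to plot than Interval objects.
age_labels = [str(interval) for interval in age_order]
print(age_labels)
```

Assuming the `agegrp` column of the plotted frame is converted to the same strings (for example with `prop_df["agegrp"].astype(str)`), passing `order=age_labels` to `sns.barplot` should fix the x-axis order.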

(the csv file is publicly available here github link.)

edited Jan 21, 2019 at 16:03

asked Jan 20, 2019 at 20:24

Blackwidow


Concerning (1.) the normalization takes place according to the descript values. I.e. all "divorced" cases sum up to 1.
– ImportanceOfBeingErnest
Commented Jan 20, 2019 at 21:35
Please print output of print(prop_df.groupby(['DMDMARTLdescript']).sum()) and check. And please provide actual sample as we do not have your .csv file. See How to make good reproducible pandas examples.
– Parfait
Commented Jan 20, 2019 at 22:46
@ImportanceOfBeingErnest thank you for your input. So I thought each age group is a divorced case and the sum of the bars in each age group would amount to 1. But I see that the red bar in [10,20] alone is already 1.
– Blackwidow
Commented Jan 21, 2019 at 13:40
Yes, because the red bar is the only case of "Missing" so sum("Missing") is indeed 1.
– ImportanceOfBeingErnest
Commented Jan 21, 2019 at 13:41
@ImportanceOfBeingErnest I ran print(prop_df.groupby(['DMDMARTLdescript']).sum()) and I see what you mean. Is there a way I can make the sum of the bars in each age group normalized instead?
– Blackwidow
Commented Jan 21, 2019 at 13:45
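
One way to get bars that sum to 1 within each age group, following the comments above, is to swap the roles of the two columns: count `DMDMARTLdescript` values grouped by `agegrp` instead of the other way round. The sketch below is a minimal illustration on a hypothetical toy frame (`toy`, `by_status`, and `by_age` are made-up names, and the rows are not taken from the NHANES file).

```python
import pandas as pd

# Hypothetical toy frame standing in for dx; the rows are made up.
toy = pd.DataFrame({
    "agegrp": ["(20, 30]", "(20, 30]", "(20, 30]", "(20, 30]",
               "(30, 40]", "(30, 40]"],
    "DMDMARTLdescript": ["Married", "Married", "Never married", "Divorced",
                         "Married", "Divorced"],
})

# Grouping used in the question: proportions of age groups
# WITHIN each marital status, so each status sums to 1.
by_status = (toy["agegrp"]
             .groupby(toy["DMDMARTLdescript"])
             .value_counts(normalize=True)
             .rename("prop")
             .reset_index())

# Swapped grouping: proportions of marital status
# WITHIN each age group, so each age group sums to 1.
by_age = (toy["DMDMARTLdescript"]
          .groupby(toy["agegrp"])
          .value_counts(normalize=True)
          .rename("prop")
          .reset_index())

# Compare the per-age-group totals of the two frames.
print(by_status.groupby("agegrp")["prop"].sum())
print(by_age.groupby("agegrp")["prop"].sum())
```

With the real data, the same swap would presumably give a frame that can be passed to `sns.barplot(x="agegrp", y="prop", hue="DMDMARTLdescript", data=by_age)` unchanged.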
