Below is what I needed to do to get to the point where I try to use seaborn's barplot:
```python
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import statsmodels.api as sm
import numpy as np

da = pd.read_csv("nhanes_2015_2016.csv")

# Recode marital status and gender into readable labels
da["DMDMARTL"] = da.DMDMARTL.fillna("Missing")
da["DMDMARTLdescript"] = da.DMDMARTL.replace({1: "Married", 2: "Widowed", 3: "Divorced", 4: "Separated", 5: "Never married",
                                              6: "Living with partner", 77: "Refused", 99: "Don't know"})
da["RIAGENDRx"] = da.RIAGENDR.replace({1: "Male", 2: "Female"})

# Bin ages into 10-year groups
da["agegrp"] = pd.cut(da.RIDAGEYR, [10, 20, 30, 40, 50, 60, 70, 80])
```
I pieced together bits of code from here and there and arrived at what I have below:
```python
y = "prop"
dx = da.loc[~da.RIAGENDRx.isin(["Male"]), :]
plt.figure(figsize=(12, 5))
prop_df = (dx["agegrp"]
sns.barplot(x="agegrp", y=y, hue="DMDMARTLdescript", data=prop_df)
```
The result of running the code above is the following
I have the following issues with the plot it generates:

1. Although I asked for each age group to be normalized (`normalize=True`), the image makes it fairly obvious that the bars in each age group sum to more than 1.
2. The age groups are ordered along the x axis in a somewhat arbitrary way. I am not sure how to put them in numerical order.
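
For the ordering issue, one possible approach is to take the bin order from the categorical that `pd.cut` produces and hand it to `sns.barplot` through its `order` parameter. The sketch below uses a small made-up `ages` series in place of `da.RIDAGEYR` and only builds the ordered list. Whether seaborn matches interval values exactly may depend on the installed version, so the plotting call in the comment is an assumption to verify rather than a tested recipe.

```python
import pandas as pd

# Hypothetical ages standing in for da.RIDAGEYR.
ages = pd.Series([72, 25, 41, 19, 58, 33, 66, 47])
agegrp = pd.cut(ages, [10, 20, 30, 40, 50, 60, 70, 80])

# pd.cut returns an ordered categorical, so its categories
# are already listed from the lowest bin to the highest.
order = list(agegrp.cat.categories)
print([str(interval) for interval in order])

# Assumed usage with the question's variables (not run here):
# sns.barplot(x="agegrp", y="prop", hue="DMDMARTLdescript",
#             data=prop_df, order=order)
```

If the intervals do not match in a given seaborn version, sorting `prop_df` by `agegrp` before plotting is another option worth trying.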
(The CSV file is publicly available at this GitHub link.)
and check. Also, please provide an actual sample, as we do not have your .csv file. See "How to make good reproducible pandas examples".

`.sum("Missing")` is indeed 1.

`print(prop_df.groupby(['DMDMARTLdescript']).sum())` and I see what you mean. Is there a way I can make the bars sum to 1 within each age group instead?