I have a very large DataFrame containing statistics for various firms for the years 1950 to 2020. I am trying to split the data first by year and then by 4-digit industry code. Both 'year' and 'industry_code' are columns of the DataFrame. I created a dictionary keyed by year, but I am stuck on splitting each year's data by industry, because each dictionary value is a sub-DataFrame that still contains all of the original columns. Here is my starting code:
import pandas as pd

df = pd.read_csv('xyz')
dictio = {}
# Store one sub-DataFrame per year, keyed by year
for year in df['year'].unique():
    dictio[year] = df[df['year'] == year]
Could someone help me figure out a groupby / loc / if statement or other approach to complete the split by year and by industry? Thank you!
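
For reference, here is a minimal sketch of the kind of result being asked for, assuming a groupby on both columns is acceptable. The toy DataFrame, the 'sales' column, and the names by_year_industry and nested are made up for illustration; only 'year' and 'industry_code' come from the real data.

```python
import pandas as pd

# Toy stand-in for the real CSV (all values here are made up for illustration).
df = pd.DataFrame({
    "year": [1950, 1950, 1950, 1951],
    "industry_code": [1311, 1311, 2834, 1311],
    "sales": [10.0, 20.0, 5.0, 7.0],
})

# Group on both columns at once; each group key is a (year, industry_code) tuple.
grouped = df.groupby(["year", "industry_code"])

# Flat dictionary keyed by (year, industry_code).
by_year_industry = {key: sub_df for key, sub_df in grouped}

# Nested dictionary: nested[year][industry_code] -> sub-DataFrame.
nested = {}
for (year, code), sub_df in grouped:
    nested.setdefault(year, {})[code] = sub_df
```

With this layout, nested[1950][1311] would hold the rows for industry 1311 in 1950. If only per-group statistics are needed, working on the grouped object directly (for example grouped.mean(numeric_only=True)) may avoid building a dictionary at all.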