Convert List of dictionaries to DataFrame

Question

I'm learning how to analyze data using Python. I'm working on a project analyzing data from the latest Madrid elections in Spain. After scraping the data I need from a website with a web crawler, I have the following data structure:

    [{'municipio': 'Ajalvir',
      'link': 'https://resultados.elpais.com/elecciones/2019/autonomicas/12/28/02.html',
      'escrutinio': [{'escrutado': 100.0},
       {'votos_totales': 2178.0, 'votos_totales_porcentaje': 6634.0},
       {'abstencion': 1105.0, 'abstencion_porcentaje': 3366.0},
       {'votos_nulos': 12.0, 'votos_nulos_porcentaje': 55.0},
       {'votos_blancos': 27.0, 'votos_blancos_porcentaje': 125.0}],
      'partidos': [{'pp': 15.0, 'pp_porcentaje': 4054.0},
       {'podemos_iu': 7.0, 'podemos_iu_porcentaje': 1892.0},
       {'psoe': 6.0, 'psoe_porcentaje': 1622.0},
       {'cs': 3.0, 'cs_porcentaje': 811.0},
       {'mas_madrid': 3.0, 'mas_madrid_porcentaje': 811.0},
       {'vox': 2.0, 'vox_porcentaje': 541.0},
       {'pacma': 1.0, 'pacma_porcentaje': 27.0}]},
     {'municipio': 'Alameda del Valle',
      'link': 'https://resultados.elpais.com/elecciones/2019/autonomicas/12/28/03.html',
      'escrutinio': [{'escrutado': 100.0},
       {'votos_totales': 140.0, 'votos_totales_porcentaje': 8284.0},
       {'abstencion': 29.0, 'abstencion_porcentaje': 1716.0},
       {'votos_nulos': 0.0, 'votos_nulos_porcentaje': 0.0},
       {'votos_blancos': 0.0, 'votos_blancos_porcentaje': 0.0}],
      'partidos': [{'pp': 15.0, 'pp_porcentaje': 4054.0},
       {'podemos_iu': 7.0, 'podemos_iu_porcentaje': 1892.0},
       {'psoe': 6.0, 'psoe_porcentaje': 1622.0},
       {'cs': 3.0, 'cs_porcentaje': 811.0},
       {'mas_madrid': 3.0, 'mas_madrid_porcentaje': 811.0},
       {'vox': 2.0, 'vox_porcentaje': 541.0},
       {'pacma': 1.0, 'pacma_porcentaje': 27.0}]},
       ...... ]

I would like to extract the data under 'partidos' and build a table that also includes 'municipio' and 'link'. I tried the following to create my DataFrame:

df = pd.json_normalize(results_pruebas_formatted, record_path='partidos', meta=['municipio', 'link'])

The result is as follows:

    pp  pp_porcentaje   podemos_iu  podemos_iu_porcentaje   psoe    psoe_porcentaje cs  cs_porcentaje   mas_madrid  mas_madrid_porcentaje   vox vox_porcentaje  pacma   pacma_porcentaje    municipio   link
0   15.0    4054.0  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Ajalvir https://resultados.elpais.com/elecciones/2019/...
1   NaN NaN 7.0 1892.0  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Ajalvir https://resultados.elpais.com/elecciones/2019/...
2   NaN NaN NaN NaN 6.0 1622.0  NaN NaN NaN NaN NaN NaN NaN NaN Ajalvir https://resultados.elpais.com/elecciones/2019/...
3   NaN NaN NaN NaN NaN NaN 3.0 811.0   NaN NaN NaN NaN NaN NaN Ajalvir https://resultados.elpais.com/elecciones/2019/...
4   NaN NaN NaN NaN NaN NaN NaN NaN 3.0 811.0   NaN NaN NaN NaN Ajalvir https://resultados.elpais.com/elecciones/2019/...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ..

I would like to group by the 'municipio' column so that each municipality ends up in a single row, instead of several rows that are all NaN except for one party (in the output above, 'Ajalvir' is the municipality whose rows should be merged).

I tried different options after searching on Stack Overflow, but didn't succeed. For example:

df0 = df.groupby('municipio', axis=0, as_index=True).sum()

The structure returned is what I'm looking for (my data grouped by the 'municipio' column), but I don't know why the values are identical in every row.

pp  pp_porcentaje   podemos_iu  podemos_iu_porcentaje   psoe    psoe_porcentaje cs  cs_porcentaje   mas_madrid  mas_madrid_porcentaje   vox vox_porcentaje  pacma   pacma_porcentaje
municipio                                                       
Ajalvir 15.0    4054.0  7.0 1892.0  6.0 1622.0  3.0 811.0   3.0 811.0   2.0 541.0   1.0 27.0
Alameda del Valle   15.0    4054.0  7.0 1892.0  6.0 1622.0  3.0 811.0   3.0 811.0   2.0 541.0   1.0 27.0
Alcalá de Henares   15.0    4054.0  7.0 1892.0  6.0 1622.0  3.0 811.0   3.0 811.0   2.0 541.0   1.0 27.0
Alcobendas  15.0    4054.0  7.0 1892.0  6.0 1622.0  3.0 811.0   3.0 811.0   2.0 541.0   1.0 27.0

Another option I tried is:

df1 = df.astype(str).groupby('municipio').agg(','.join).reset_index()

It returns the data like this:

municipio   pp  pp_porcentaje   podemos_iu  podemos_iu_porcentaje   psoe    psoe_porcentaje cs  cs_porcentaje   mas_madrid  mas_madrid_porcentaje   vox vox_porcentaje  pacma   pacma_porcentaje    link
0   Ajalvir 15.0,nan,nan,nan,nan,nan,nan    4054.0,nan,nan,nan,nan,nan,nan  nan,7.0,nan,nan,nan,nan,nan nan,1892.0,nan,nan,nan,nan,nan  nan,nan,6.0,nan,nan,nan,nan nan,nan,1622.0,nan,nan,nan,nan  nan,nan,nan,3.0,nan,nan,nan nan,nan,nan,811.0,nan,nan,nan   nan,nan,nan,nan,3.0,nan,nan nan,nan,nan,nan,811.0,nan,nan   nan,nan,nan,nan,nan,2.0,nan nan,nan,nan,nan,nan,541.0,nan   nan,nan,nan,nan,nan,nan,1.0 nan,nan,nan,nan,nan,nan,27.0    https://resultados.elpais.com/elecciones/2019/...
1   Alameda del Valle   15.0,nan,nan,nan,nan,nan,nan    4054.0,nan,nan,nan,nan,nan,nan  nan,7.0,nan,nan,nan,nan,nan nan,1892.0,nan,nan,nan,nan,nan  nan,nan,6.0,nan,nan,nan,nan nan,nan,1622.0,nan,nan,nan,nan  nan,nan,nan,3.0,nan,nan,nan nan,nan,nan,811.0,nan,nan,nan   nan,nan,nan,nan,3.0,nan,nan nan,nan,nan,nan,811.0,nan,nan   nan,nan,nan,nan,nan,2.0,nan nan,nan,nan,nan,nan,541.0,nan   nan,nan,nan,nan,nan,nan,1.0 nan,nan,nan,nan,nan,nan,27.0    https://resultados.elpais.com/elecciones/2019/...
2   Alcalá de Henares   15.0,nan,nan,nan,nan,nan,nan    4054.0,nan,nan,nan,nan,nan,nan  nan,7.0,nan,nan,nan,nan,nan nan,1892.0,nan,nan,nan,nan,nan  nan,nan,6.0,nan,nan,nan,nan nan,nan,1622.0,nan,nan,nan,nan  nan,nan,nan,3.0,nan,nan,nan nan,nan,nan,811.0,nan,nan,nan   nan,nan,nan,nan,3.0,nan,nan nan,nan,nan,nan,811.0,nan,nan   nan,nan,nan,nan,nan,2.0,nan nan,nan,nan,nan,nan,541.0,nan   nan,nan,nan,nan,nan,nan,1.0 nan,nan,nan,nan,nan,nan,27.0    https://resultados.elpais.com/elecciones/2019/...

What I'm asking is how to group my data into a DataFrame with one row per 'municipio' while preserving the values of each original row. What am I doing wrong?

Thank you in advance.

Corralien · Accepted Answer · 2021-08-23 19:19:13Z

You can group by 'municipio', back-fill the NaN values within each group with bfill, and then keep only the first row of each group:

>>> df.groupby('municipio') \
      .apply(lambda x: x.bfill().head(1)) \
      .reset_index(drop=True)

     pp  pp_porcentaje  podemos_iu  ...  pacma_porcentaje          municipio                                               link
0  15.0         4054.0         7.0  ...              27.0            Ajalvir  https://resultados.elpais.com/elecciones/2019/...
1  15.0         4054.0         7.0  ...              27.0  Alameda del Valle  https://resultados.elpais.com/elecciones/2019/...
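A possibly shorter variant of the same idea, offered as a sketch rather than a drop-in replacement: GroupBy.first() returns the first non-null entry of each column within a group, so it collapses the sparse rows without an explicit bfill. The sample list below is an assumption that mirrors the structure in the question with only two parties; the second municipality's numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical sample with the same shape as the question's data.
# The figures for 'Alameda del Valle' are made up for this example.
results = [
    {"municipio": "Ajalvir",
     "link": "https://resultados.elpais.com/elecciones/2019/autonomicas/12/28/02.html",
     "partidos": [{"pp": 15.0, "pp_porcentaje": 4054.0},
                  {"psoe": 6.0, "psoe_porcentaje": 1622.0}]},
    {"municipio": "Alameda del Valle",
     "link": "https://resultados.elpais.com/elecciones/2019/autonomicas/12/28/03.html",
     "partidos": [{"pp": 20.0, "pp_porcentaje": 5000.0},
                  {"psoe": 9.0, "psoe_porcentaje": 2250.0}]},
]

# Same call as in the question: one sparse row per party and municipality.
df = pd.json_normalize(results, record_path="partidos", meta=["municipio", "link"])

# first() picks the first non-null value per column in each group,
# which merges the sparse rows into one row per municipality.
compact = df.groupby("municipio", as_index=False).first()
print(compact)
```

This assumes each party appears at most once per municipality; if a party could appear twice, first() would silently keep only the earliest value.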

Hi! Thank you for your answer; it works fine, but I would prefer not to keep the first row.
– Ber Tsacianegu del Tepuy
Commented Aug 24, 2021 at 6:26
Thank you! Now I see that all my 'partidos' data is the same. I must have made a mistake preparing the data. I'm going to fix it and see if your solution can help me. Thank you again!
– Ber Tsacianegu del Tepuy
Commented Aug 24, 2021 at 6:34
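
Following up on the comments, one way to sidestep the NaN rows altogether might be to merge each municipality's one-party dictionaries into a single flat row before building the DataFrame. The sketch below assumes the record layout shown in the question; the helper name flatten and the variable names are hypothetical. It also flags rows whose party figures repeat an earlier row exactly, which could help spot the data-preparation mistake mentioned above.

```python
import pandas as pd

# Two records copied from the question (two parties only, for brevity).
# Note that both carry identical 'partidos' values, as in the original post.
results = [
    {"municipio": "Ajalvir",
     "link": "https://resultados.elpais.com/elecciones/2019/autonomicas/12/28/02.html",
     "partidos": [{"pp": 15.0, "pp_porcentaje": 4054.0},
                  {"psoe": 6.0, "psoe_porcentaje": 1622.0}]},
    {"municipio": "Alameda del Valle",
     "link": "https://resultados.elpais.com/elecciones/2019/autonomicas/12/28/03.html",
     "partidos": [{"pp": 15.0, "pp_porcentaje": 4054.0},
                  {"psoe": 6.0, "psoe_porcentaje": 1622.0}]},
]

def flatten(record):
    """Merge the one-party dicts under 'partidos' into a single flat row."""
    row = {"municipio": record["municipio"], "link": record["link"]}
    for party in record["partidos"]:
        row.update(party)
    return row

rows = [flatten(r) for r in results]
df_flat = pd.DataFrame(rows)  # one row per municipality, no NaN padding

# Mark rows whose party columns are an exact copy of an earlier row.
party_cols = df_flat.columns.difference(["municipio", "link"])
repeated = df_flat.duplicated(subset=list(party_cols))
print(df_flat[repeated]["municipio"].tolist())
```

If most municipalities show up in that list, the scraped 'partidos' values were probably reused across records rather than fetched per page.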
