Convert list of dictionaries to a pandas DataFrame

Question

How can I convert a list of dictionaries into a DataFrame? I want to turn

[{'points': 50, 'time': '5:00', 'year': 2010}, 
 {'points': 25, 'time': '6:00', 'month': "february"}, 
 {'points':90, 'time': '9:00', 'month': 'january'}, 
 {'points_h1':20, 'month': 'june'}]

into

      month  points  points_h1  time  year
0       NaN      50        NaN  5:00  2010
1  february      25        NaN  6:00   NaN
2   january      90        NaN  9:00   NaN
3      june     NaN         20   NaN   NaN

Mateen Ulhaq · Accepted Answer · 2022-06-04 22:32:21Z

1681

If ds is a list of dicts:

df = pd.DataFrame(ds)

Note: this does not work with nested data.
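As a minimal, self-contained sketch of the one-liner above, here it is applied to the data from the question. The variable name by_time is only illustrative, and the set_index step mirrors the suggestion made in the comments below.

```python
import pandas as pd

# The list of dicts from the question; keys differ from record to record.
ds = [
    {'points': 50, 'time': '5:00', 'year': 2010},
    {'points': 25, 'time': '6:00', 'month': 'february'},
    {'points': 90, 'time': '9:00', 'month': 'january'},
    {'points_h1': 20, 'month': 'june'},
]

# Each dict becomes one row; keys missing from a dict become NaN in that row.
df = pd.DataFrame(ds)
print(df)

# Optionally promote one of the columns to the index afterwards.
by_time = df.set_index('time')
```

The column order depends on the pandas version, so it may differ from the output shown in the question.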

edited Jun 4, 2022 at 22:32

Mateen Ulhaq

answered Dec 17, 2013 at 15:35

joris

5

How might one use one of the key/value pairs as the index (eg. time)?
– CatsLoveJazz
Commented Jun 28, 2016 at 13:37
13

@CatsLoveJazz You can just do df = df.set_index('time') afterwards
– joris
Commented Jun 28, 2016 at 13:38
4

@CatsLoveJazz No, that is not possible when converting from a dict.
– joris
Commented Jun 29, 2016 at 8:16
7

As of Pandas 0.19.2, there's no mention of this in the documentation, at least not in the docs for pandas.DataFrame
– Leo Alekseyev
Commented Apr 13, 2017 at 22:56
4

Mind that for a nested dictionary '{"":{"... you use the json_normalize approach, see the detailed answer of @cs95
– questionto42
Commented May 27, 2020 at 22:16

Asclepius · Accepted Answer · 2023-02-08 21:10:33Z

How do I convert a list of dictionaries to a pandas DataFrame?

The other answers are correct, but not much has been explained in terms of advantages and limitations of these methods. The aim of this post will be to show examples of these methods under different situations, discuss when to use (and when not to use), and suggest alternatives.

`DataFrame()`, `DataFrame.from_records()`, and `.from_dict()`

Depending on the structure and format of your data, there are situations where either all three methods work, or some work better than others, or some don't work at all.

Consider a very contrived example.

import numpy as np
import pandas as pd

np.random.seed(0)
data = pd.DataFrame(
    np.random.choice(10, (3, 4)), columns=list('ABCD')).to_dict('records')

print(data)
[{'A': 5, 'B': 0, 'C': 3, 'D': 3},
 {'A': 7, 'B': 9, 'C': 3, 'D': 5},
 {'A': 2, 'B': 4, 'C': 7, 'D': 6}]

This list consists of "records" with every key present. This is the simplest case you could encounter.

# The following methods all produce the same output.
pd.DataFrame(data)
pd.DataFrame.from_dict(data)
pd.DataFrame.from_records(data)

   A  B  C  D
0  5  0  3  3
1  7  9  3  5
2  2  4  7  6

Word on Dictionary Orientations: `orient='index'`/`'columns'`

Before continuing, it is important to distinguish between the different dictionary orientations and how pandas supports each. There are two primary types: "columns" and "index".

orient='columns'
Dictionaries with the "columns" orientation will have their keys correspond to columns in the equivalent DataFrame.

For example, data above is in the "columns" orient.

data_c = [
 {'A': 5, 'B': 0, 'C': 3, 'D': 3},
 {'A': 7, 'B': 9, 'C': 3, 'D': 5},
 {'A': 2, 'B': 4, 'C': 7, 'D': 6}]

pd.DataFrame.from_dict(data_c, orient='columns')

   A  B  C  D
0  5  0  3  3
1  7  9  3  5
2  2  4  7  6

Note: If you are using pd.DataFrame.from_records, the orientation is assumed to be "columns" (you cannot specify otherwise), and the dictionaries will be loaded accordingly.

orient='index'
With this orient, keys are assumed to correspond to index values. This kind of data is best suited for pd.DataFrame.from_dict.

data_i ={
 0: {'A': 5, 'B': 0, 'C': 3, 'D': 3},
 1: {'A': 7, 'B': 9, 'C': 3, 'D': 5},
 2: {'A': 2, 'B': 4, 'C': 7, 'D': 6}}

pd.DataFrame.from_dict(data_i, orient='index')

   A  B  C  D
0  5  0  3  3
1  7  9  3  5
2  2  4  7  6

This case is not considered in the OP, but is still useful to know.
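As a rough sketch of how the two orientations relate, the index-oriented dictionary can also be loaded with the plain constructor (which treats the outer keys as columns) and then transposed. The names by_index and transposed are only illustrative.

```python
import pandas as pd

data_i = {
    0: {'A': 5, 'B': 0, 'C': 3, 'D': 3},
    1: {'A': 7, 'B': 9, 'C': 3, 'D': 5},
    2: {'A': 2, 'B': 4, 'C': 7, 'D': 6},
}

# Outer keys become the index directly.
by_index = pd.DataFrame.from_dict(data_i, orient='index')

# Outer keys become columns first, then the frame is transposed.
transposed = pd.DataFrame(data_i).T

print(by_index)
print(transposed)
```

Transposing a frame with mixed column types can change dtypes, so from_dict with orient='index' is usually the safer choice for real data.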

Setting Custom Index

If you need a custom index on the resultant DataFrame, you can set it using the index=... argument.

pd.DataFrame(data, index=['a', 'b', 'c'])
# pd.DataFrame.from_records(data, index=['a', 'b', 'c'])

   A  B  C  D
a  5  0  3  3
b  7  9  3  5
c  2  4  7  6

This is not supported by pd.DataFrame.from_dict.
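If you still want to go through from_dict, one possible workaround (a sketch, not the only option) is to build the frame first and assign the index afterwards:

```python
import pandas as pd

data = [
    {'A': 5, 'B': 0, 'C': 3, 'D': 3},
    {'A': 7, 'B': 9, 'C': 3, 'D': 5},
    {'A': 2, 'B': 4, 'C': 7, 'D': 6},
]

# from_dict has no index= parameter, so set the labels after construction.
df = pd.DataFrame.from_dict(data)
df.index = ['a', 'b', 'c']
print(df)
```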

Dealing with Missing Keys/Columns

All methods work out-of-the-box when handling dictionaries with missing keys/column values. For example,

data2 = [
     {'A': 5, 'C': 3, 'D': 3},
     {'A': 7, 'B': 9, 'F': 5},
     {'B': 4, 'C': 7, 'E': 6}]

# The methods below all produce the same output.
pd.DataFrame(data2)
pd.DataFrame.from_dict(data2)
pd.DataFrame.from_records(data2)

     A    B    C    D    E    F
0  5.0  NaN  3.0  3.0  NaN  NaN
1  7.0  9.0  NaN  NaN  NaN  5.0
2  NaN  4.0  7.0  NaN  6.0  NaN

Reading Subset of Columns

"What if I don't want to read in every single column"? You can easily specify this using the columns=... parameter.

For example, from the example list data2 above, if you wanted to read only columns 'A', 'D', and 'F', you can do so by passing a list:

pd.DataFrame(data2, columns=['A', 'D', 'F'])
# pd.DataFrame.from_records(data2, columns=['A', 'D', 'F'])

     A    D    F
0  5.0  3.0  NaN
1  7.0  NaN  5.0
2  NaN  NaN  NaN

This is not supported by pd.DataFrame.from_dict with the default orient "columns".

pd.DataFrame.from_dict(data2, orient='columns', columns=['A', 'B'])

ValueError: cannot use columns parameter with orient='columns'
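A possible workaround with from_dict, sketched here under the assumption that loading every column first is acceptable, is to select the columns after construction. reindex is used rather than plain indexing so that a requested column missing from the data yields NaN instead of raising a KeyError.

```python
import pandas as pd

data2 = [
    {'A': 5, 'C': 3, 'D': 3},
    {'A': 7, 'B': 9, 'F': 5},
    {'B': 4, 'C': 7, 'E': 6},
]

# Load everything, then keep only the columns of interest.
subset = pd.DataFrame.from_dict(data2).reindex(columns=['A', 'D', 'F'])
print(subset)
```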

Reading Subset of Rows

Not supported by any of these methods directly. One option is to iterate over your data in reverse and delete the unwanted entries in place; note that this modifies the original list. For example, to extract only the 0th and 2nd rows from data2 above, you can use:

rows_to_select = {0, 2}
for i in reversed(range(len(data2))):
    if i not in rows_to_select:
        del data2[i]

pd.DataFrame(data2)
# pd.DataFrame.from_dict(data2)
# pd.DataFrame.from_records(data2)

     A    B  C    D    E
0  5.0  NaN  3  3.0  NaN
1  NaN  4.0  7  NaN  6.0
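If the original list should stay untouched, a list comprehension gives the same rows without deleting anything. This is a small sketch; the name selected is only illustrative.

```python
import pandas as pd

data2 = [
    {'A': 5, 'C': 3, 'D': 3},
    {'A': 7, 'B': 9, 'F': 5},
    {'B': 4, 'C': 7, 'E': 6},
]

rows_to_select = {0, 2}

# Build a new list with only the wanted positions; data2 itself is not modified.
selected = [rec for i, rec in enumerate(data2) if i in rows_to_select]
df = pd.DataFrame(selected)
print(df)
```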

The Panacea: `json_normalize` for Nested Data

A strong, robust alternative to the methods outlined above is the json_normalize function which works with lists of dictionaries (records), and in addition can also handle nested dictionaries.

pd.json_normalize(data)

   A  B  C  D
0  5  0  3  3
1  7  9  3  5
2  2  4  7  6

pd.json_normalize(data2)

     A    B  C    D    E
0  5.0  NaN  3  3.0  NaN
1  NaN  4.0  7  NaN  6.0

Again, keep in mind that the data passed to json_normalize needs to be in the list-of-dictionaries (records) format.

As mentioned, json_normalize can also handle nested dictionaries. Here's an example taken from the documentation.

data_nested = [
  {'counties': [{'name': 'Dade', 'population': 12345},
                {'name': 'Broward', 'population': 40000},
                {'name': 'Palm Beach', 'population': 60000}],
   'info': {'governor': 'Rick Scott'},
   'shortname': 'FL',
   'state': 'Florida'},
  {'counties': [{'name': 'Summit', 'population': 1234},
                {'name': 'Cuyahoga', 'population': 1337}],
   'info': {'governor': 'John Kasich'},
   'shortname': 'OH',
   'state': 'Ohio'}
]

pd.json_normalize(data_nested,
                  record_path='counties',
                  meta=['state', 'shortname', ['info', 'governor']])

         name  population    state shortname info.governor
0        Dade       12345  Florida        FL    Rick Scott
1     Broward       40000  Florida        FL    Rick Scott
2  Palm Beach       60000  Florida        FL    Rick Scott
3      Summit        1234     Ohio        OH   John Kasich
4    Cuyahoga        1337     Ohio        OH   John Kasich

For more information on the meta and record_path arguments, check out the documentation.
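For nested dictionaries without a nested list, no record_path is needed. The sketch below uses a trimmed-down, hypothetical version of the records above to show how nested keys are flattened into column names, and how the sep parameter changes the separator.

```python
import pandas as pd

# Simplified records: one nested dict per record, no nested lists.
records = [
    {'state': 'Florida', 'info': {'governor': 'Rick Scott'}},
    {'state': 'Ohio', 'info': {'governor': 'John Kasich'}},
]

# Nested keys are joined with '.' by default.
flat = pd.json_normalize(records)
print(flat)

# The separator can be changed with sep.
flat_underscore = pd.json_normalize(records, sep='_')
print(flat_underscore)
```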

Summarising

Here's a table of all the methods discussed above, along with supported features/functionality.

* Use orient='columns' and then transpose to get the same effect as orient='index'.

anyone have FutureWarning: Using short name for 'orient' is deprecated. for the first one? pd.__version__ '1.3.5' — rubengavidia0x, Commented Feb 25, 2022 at 22:39
This is a wholesome answer. Although in my case, I have a pandas column with the values as a list of dictionaries. And I want to explode each of those values into a column in the same dataframe. I am unable to apply your method to my case. — Avantika Banerjee, Commented Sep 27, 2022 at 13:58
Maybe I missed it, but was there ever a time that from_records was useful? It seemed to be redundant every time it was applicable. — Joe, Commented Sep 19, 2023 at 3:41

Asclepius · Accepted Answer · 2018-05-14 13:12:38Z

109

In pandas 0.16.2, I had to do pd.DataFrame.from_records(d) to get this to work.

edited May 14, 2018 at 13:12

Asclepius

answered Oct 8, 2015 at 15:59

szeitlin

shivsn · Accepted Answer · 2017-07-07 06:03:08Z

You can also use pd.DataFrame.from_dict(d):

In [8]: d = [{'points': 50, 'time': '5:00', 'year': 2010}, 
   ...: {'points': 25, 'time': '6:00', 'month': "february"}, 
   ...: {'points':90, 'time': '9:00', 'month': 'january'}, 
   ...: {'points_h1':20, 'month': 'june'}]

In [12]: pd.DataFrame.from_dict(d)
Out[12]: 
      month  points  points_h1  time    year
0       NaN    50.0        NaN  5:00  2010.0
1  february    25.0        NaN  6:00     NaN
2   january    90.0        NaN  9:00     NaN
3      june     NaN       20.0   NaN     NaN

Community · Accepted Answer · 2020-06-20 09:12:55Z

Python 3: Most of the solutions listed previously work. However, there are cases where the row number (index) of the DataFrame is not required and each row (record) has to be written individually.

The following method is useful in that case.

import csv

# Raw string so the backslashes in the Windows path are not treated as escapes.
myfile = r'C:\Users\John\Desktop\export_dataframe.csv'

records_to_save = data2  # used as in the thread.

# colnames is a list of all keys across the records, in first-seen order.
# Values are written under their keys, and "None" is written for a missing value.
colnames = list(dict.fromkeys(k for rec in records_to_save for k in rec))

with open(myfile, 'w', newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(colnames)
    for d in records_to_save:
        writer.writerow([d.get(r, "None") for r in colnames])
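The same idea can be sketched with csv.DictWriter from the standard library, which fills in missing keys through its restval argument. To keep the example self-contained it writes to an in-memory buffer rather than a file; the names buffer, fieldnames and csv_text are only illustrative.

```python
import csv
import io

records_to_save = [
    {'A': 5, 'C': 3, 'D': 3},
    {'A': 7, 'B': 9, 'F': 5},
    {'B': 4, 'C': 7, 'E': 6},
]

# Collect every key that appears in any record, keeping first-seen order.
fieldnames = list(dict.fromkeys(k for rec in records_to_save for k in rec))

# An in-memory buffer stands in for a file opened with newline="".
buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval='None')
writer.writeheader()
writer.writerows(records_to_save)

csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, replace the buffer with a file object opened as in the answer above.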

cottontail · Accepted Answer · 2023-10-31 23:37:28Z

If there are missing keys in the dicts, simple pd.DataFrame() construction will handle it by assigning NaN values to the missing keys. This "messes up" the dtypes and converts integers to floats. For example, using the sample data in the OP, 'year' column has missing values which get converted into floats, which is probably not something desirable now that we have nullable integer dtypes. One way to solve this issue is to construct the dataframe anyway and handle the dtypes later using astype():

lst = [{'points': 50, 'time': '5:00', 'year': 2010}, 
       {'points': 25, 'time': '6:00', 'month': "february"}, 
       {'points':90, 'time': '9:00', 'month': 'january'}, 
       {'points_h1':20, 'month': 'june'}]

dtypes = {'points': 'Int32', 'time': 'string', 'year': 'Int32', 'month': 'string', 'points_h1': 'Int32'}
df = pd.DataFrame(lst).astype(dtypes)

However, if there are a lot of keys, it doesn't scale well. A simple out-of-the-box method is to convert the list into a json array and read as a json using pd.read_json. Nice thing about it is that you can set a dtype during construction, which casts integers into Int dtypes but leaves everything else (e.g. strings, floats etc.) as is.

import json, io  # both of these are in the standard library
df = pd.read_json(io.StringIO(json.dumps(lst)), dtype='Int32')
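Another option worth trying, assuming pandas 1.0 or later, is DataFrame.convert_dtypes, which infers nullable dtypes for every column without listing them by hand. Treat this as a sketch: it picks Int64 rather than Int32, and the exact string dtype it selects can vary between pandas versions.

```python
import pandas as pd

lst = [
    {'points': 50, 'time': '5:00', 'year': 2010},
    {'points': 25, 'time': '6:00', 'month': 'february'},
    {'points': 90, 'time': '9:00', 'month': 'january'},
    {'points_h1': 20, 'month': 'june'},
]

# Float columns that hold only whole numbers and NaN become nullable integers.
df = pd.DataFrame(lst).convert_dtypes()
print(df.dtypes)
```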

cs95 · Accepted Answer · 2020-10-01 11:04:58Z

0

The easiest way I have found to do it is like this:

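dict_count = len(dict_list)
df = pd.DataFrame(dict_list[0], index=[0])
# DataFrame.append was removed in pandas 2.0; pd.concat is its replacement.
for i in range(1, dict_count):  # range end is exclusive, so this includes the last dict
    df = pd.concat([df, pd.DataFrame(dict_list[i], index=[0])], ignore_index=True)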

edited Oct 1, 2020 at 11:04

cs95

answered Aug 15, 2019 at 6:12

scottapotamus

Avoid looping when using pandas; looping defeats the whole purpose of pandas
– sushanth
Commented Aug 31, 2020 at 11:29
I didn't downvote, but while this will technically work its performance is quite poor. See this for more information.
– EJoshuaS - Stand with Ukraine
Commented Jun 27, 2021 at 4:50

President James K. Polk · Accepted Answer · 2024-02-24 15:54:59Z

0

I have the following list of dicts with datetime keys and int values:

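import datetime
import pandas as pd

# Named records rather than list to avoid shadowing the built-in.
records = [{datetime.date(2022, 2, 10): 7},
           {datetime.date(2022, 2, 11): 1},
           {datetime.date(2022, 2, 11): 1}]

I had a problem converting it to a DataFrame with the methods above, as they created a DataFrame with the dates as columns...

My solution:

# DataFrame.append was removed in pandas 2.0, so build the pieces and concatenate once.
frames = [pd.DataFrame.from_dict(d, orient='index') for d in records]
df = pd.concat(frames)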
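An alternative sketch for this shape of data is to flatten the single-key dicts into (date, value) pairs and build the frame in one call, which avoids creating one small DataFrame per record. The column names 'date' and 'value' are assumptions chosen for illustration.

```python
import datetime

import pandas as pd

records = [
    {datetime.date(2022, 2, 10): 7},
    {datetime.date(2022, 2, 11): 1},
    {datetime.date(2022, 2, 11): 1},
]

# One (key, value) tuple per dict entry, in the original order.
pairs = [(day, value) for rec in records for day, value in rec.items()]
df = pd.DataFrame(pairs, columns=['date', 'value'])
print(df)
```

Duplicate dates are kept as separate rows here, just as they are in the loop-based version.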

edited Feb 24 at 15:54

President James K. Polk

answered Jul 13, 2022 at 8:33

vloubes

You are changing orientation of the dataframe. Selected answer will also give you dataframe in column/vertical orientation.
– GodWin1100
Commented Jul 13, 2022 at 17:56
