Get difference between two lists with Unique Entries

Question

I have two lists in Python:

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

Assuming the elements in each list are unique, I want to create a third list with items from the first list which are not in the second list:

temp3 = ['Three', 'Four']

Is there a fast way to do this without explicit loops and membership checks?

Are the elements guaranteed unique? If you have temp1 = ['One', 'One', 'One'] and temp2 = ['One'], do you want ['One', 'One'] back, or []? — Michael Mrozek, Commented Aug 11, 2010 at 19:43
Does this answer your question? Finding elements not in a list — Gonçalo Peres, Commented Apr 28, 2021 at 8:48

Austin · Accepted Answer · 2022-12-20 15:36:20Z

1813

To get elements which are in temp1 but not in temp2 (assuming uniqueness of the elements in each list):

In [5]: list(set(temp1) - set(temp2))
Out[5]: ['Four', 'Three']

Beware that set difference is asymmetric:

In [5]: set([1, 2]) - set([2, 3])
Out[5]: set([1])

where you might expect/want it to equal set([1, 3]). If you do want set([1, 3]) as your answer, you can use set([1, 2]).symmetric_difference(set([2, 3])).
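
A minimal Python 3 sketch of the two operations above, reusing the question's lists. Set iteration order is arbitrary, so the results are sorted here only to make the printed output reproducible; the variable names are illustrative.

```python
temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

# One-directional difference: items of temp1 that are absent from temp2.
only_in_temp1 = sorted(set(temp1) - set(temp2))

# Symmetric difference: items that appear in exactly one of the two inputs.
in_exactly_one = sorted(set([1, 2]) ^ set([2, 3]))

print(only_in_temp1)   # ['Four', 'Three']
print(in_exactly_one)  # [1, 3]
```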

edited Dec 20, 2022 at 15:36

Austin


answered Aug 11, 2010 at 19:40

ars


48

@Drewdin: Lists do not support the "-" operator. Sets, however, do, and that is what is demonstrated above if you look closely.
– Godsmith
Commented Oct 14, 2014 at 21:21
79

symmetric difference can be written with: ^ (set1 ^ set2)
– Bastian
Commented Oct 1, 2015 at 18:18
10

Note that since sets are unordered, an iterator over the difference can return the elements in any order. E.g., list(set(temp1) - set(temp2)) == ['Four', 'Three'] or list(set(temp1) - set(temp2)) == ['Three', 'Four'].
– Arthur
Commented Mar 31, 2017 at 12:50
6

Order of input list is not preserved by this method.
– Krishna
Commented Apr 25, 2019 at 13:29
9

what if there are duplicate elements? For example a=[1, 1, 1, 1, 2, 2], b=[1, 1, 2, 2]
– Shark Deng
Commented Jun 26, 2020 at 4:58


Mark Byers · Accepted Answer · 2010-08-11 20:35:25Z

657

The existing solutions all offer either one or the other of:

Faster than O(n*m) performance.
Preserve order of input list.

But so far no solution has both. If you want both, try this:

s = set(temp2)
temp3 = [x for x in temp1 if x not in s]
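
The two lines above can be wrapped in a small helper. This is only a sketch: the name ordered_difference is hypothetical, and it assumes the items are hashable. Duplicates in the first list are kept if they survive the filter.

```python
def ordered_difference(first, second):
    """Return items of `first` that are not in `second`, keeping the order of `first`."""
    exclude = set(second)  # built once; membership tests are O(1) on average
    return [item for item in first if item not in exclude]


temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']
temp3 = ordered_difference(temp1, temp2)
print(temp3)  # ['Three', 'Four']
```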

Performance test

import timeit
init = 'temp1 = list(range(100)); temp2 = [i * 2 for i in range(50)]'
print(timeit.timeit('list(set(temp1) - set(temp2))', init, number = 100000))
print(timeit.timeit('s = set(temp2);[x for x in temp1 if x not in s]', init, number = 100000))
print(timeit.timeit('[item for item in temp1 if item not in temp2]', init, number = 100000))

Results:

4.34620224079 # ars' answer
4.2770634955  # This answer
30.7715615392 # matt b's answer

The method I presented, as well as preserving order, is also (slightly) faster than the set subtraction because it doesn't require constructing an unnecessary set. The performance difference would be more noticeable if the first list is considerably longer than the second and if hashing is expensive. Here's a second test demonstrating this:

init = '''
temp1 = [str(i) for i in range(100000)]
temp2 = [str(i * 2) for i in range(50)]
'''

Results:

11.3836875916 # ars' answer
3.63890368748 # this answer (3 times faster!)
37.7445402279 # matt b's answer
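
For re-running the comparison on Python 3, a sketch along these lines might be used. The function names are made up for illustration, and the list size and repeat count are deliberately smaller than in the tests above so that it finishes quickly; absolute timings will differ by machine and Python version.

```python
import timeit

temp1 = [str(i) for i in range(5_000)]
temp2 = [str(i * 2) for i in range(50)]


def set_subtraction():
    return list(set(temp1) - set(temp2))


def set_lookup():
    s = set(temp2)
    return [x for x in temp1 if x not in s]


def list_lookup():
    return [item for item in temp1 if item not in temp2]


# All three agree on the contents; only the last two preserve the order of temp1.
assert sorted(set_subtraction()) == sorted(set_lookup())
assert set_lookup() == list_lookup()

for func in (set_subtraction, set_lookup, list_lookup):
    seconds = timeit.timeit(func, number=10)  # total seconds for 10 calls
    print(f"{func.__name__}: {seconds:.4f} s")
```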

edited Aug 11, 2010 at 20:35

answered Aug 11, 2010 at 19:44

Mark Byers


3

Additional support for this answer: I ran across a use case where preserving list order was important for performance. When working with tarinfo or zipinfo objects, I was using set subtraction to exclude certain tarinfo objects from being extracted from the archive. Creating the new list was fast, but extraction was super slow. The reason evaded me at first. It turns out that reordering the list of tarinfo objects caused a huge performance penalty. Switching to the list comprehension method saved the day.
– Ray Thompson
Commented Dec 13, 2011 at 0:26
@MarkByers - perhaps I should write an entirely new question for this. But how would this work in a forloop? For instance, if my temp1 and temp2 keep changing.. and I want to append the new information to temp3?
– Ason
Commented Aug 9, 2012 at 17:57
1

Could you please explain why your code takes less time than that of matt's answer? @MarkByers
– haccks
Commented Nov 5, 2015 at 16:36
8

@haccks Because checking membership of a list is an O(n) operation (iterating over the entire list), but checking membership of a set is O(1).
– Mark Byers
Commented Nov 5, 2015 at 16:57
1

This is still the fastest answer in v3.11. Note that denfromufa's answer using numpy is much much slower.
– ChaimG
Commented Jan 12, 2023 at 15:02


SuperNova · Accepted Answer · 2016-09-19 05:25:16Z

272

This can be done using Python's XOR operator (^) on sets.

This will remove the duplicates in each list.
This will show the difference of temp1 from temp2 and of temp2 from temp1 (the symmetric difference).

set(temp1) ^ set(temp2)
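
With the question's lists, ^ and - happen to give the same elements, because temp2 has nothing that temp1 lacks. The sketch below adds an extra value to temp2 (an assumption for illustration, not part of the question) to show where the two operators diverge. Results are sorted only to make the output reproducible.

```python
temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two', 'Five']  # 'Five' does not occur in temp1

one_way = set(temp1) - set(temp2)  # only what temp1 has and temp2 lacks
two_way = set(temp1) ^ set(temp2)  # what either list has and the other lacks

print(sorted(one_way))  # ['Four', 'Three']
print(sorted(two_way))  # ['Five', 'Four', 'Three']
```

If only the items missing from the second list are wanted, as in the question, the one-way form is the one that matches.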

edited Sep 19, 2016 at 5:25

answered Jul 7, 2016 at 7:50

SuperNova


1

Good find! I always overlooked this section of the documentation: docs.python.org/3/library/….
– user3521099
Commented Dec 28, 2020 at 16:09
4

This is the best for a 2-side difference
– EuberDeveloper
Commented Mar 17, 2021 at 22:35
1

Definitely the best answer that addresses the OP's question directly "Get difference between two lists". The others are too complicated with side cases. And there is no datatype conversion.
– Rich Lysakowski PhD
Commented Apr 7, 2021 at 3:25
2

does this perform better than any other solution w.r.t time?
– Gangula
Commented Sep 13, 2021 at 9:11
3

@Gangula To see the difference between the two methods, add a value to temp2 that is not present in temp1 and try again.
– urig
Commented Oct 19, 2021 at 5:41


Chris · Accepted Answer · 2022-05-06 18:19:15Z

141

You could use list comprehension:

temp3 = [item for item in temp1 if item not in temp2]
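
One property of this form is that list membership relies on == rather than hashing, so it also works when the items are unhashable. A small sketch with made-up data (rows1 and rows2 are illustrative names):

```python
rows1 = [[1, 2], [3, 4], [5, 6]]
rows2 = [[3, 4]]

# Works: `in` on a list compares with ==, so no hashing is needed.
remaining = [row for row in rows1 if row not in rows2]
print(remaining)  # [[1, 2], [5, 6]]

# The set-based form fails for the same data.
try:
    set(rows1) - set(rows2)
except TypeError as exc:
    print(exc)  # unhashable type: 'list'
```

The price is that every membership test scans the second list, so the cost grows with the product of the two lengths.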

edited May 6, 2022 at 18:19

Chris


answered Aug 11, 2010 at 19:40

matt b


23

Turning temp2 into a set before would make this a bit more efficient.
– user355252
Commented Aug 11, 2010 at 19:47
9

True, depends if Ockonal cares about duplicates or not (original question doesn't say)
– matt b
Commented Aug 11, 2010 at 19:47
4

Comment says the (lists|tuples) don't have duplicates.
– user395760
Commented Aug 11, 2010 at 19:52
2

I upvoted your answer because I thought you were right about the duplicates at first. But item not in temp2 and item not in set(temp2) will always return the same results, regardless if there are duplicates or not in temp2.
– arekolek
Commented Mar 7, 2016 at 22:42
9

Up vote for not requiring list items to be hashable.
– Brent
Commented Sep 11, 2017 at 15:19


Maciej Kucharz · Accepted Answer · 2010-08-11 19:39:59Z

32

Try this:

temp3 = set(temp1) - set(temp2)

answered Aug 11, 2010 at 19:39

Maciej Kucharz


Seperman · Accepted Answer · 2016-03-10 21:57:35Z

In case you want the difference recursively, I have written a package for python: https://github.com/seperman/deepdiff

Installation

Install from PyPi:

pip install deepdiff

Example usage

Importing

>>> from deepdiff import DeepDiff
>>> from pprint import pprint
>>> from __future__ import print_function # In case running on Python 2

Same object returns empty

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = t1
>>> print(DeepDiff(t1, t2))
{}

Type of an item has changed

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:"2", 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{ 'type_changes': { 'root[2]': { 'newtype': <class 'str'>,
                                 'newvalue': '2',
                                 'oldtype': <class 'int'>,
                                 'oldvalue': 2}}}

Value of an item has changed

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:4, 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

Item added and/or removed

>>> t1 = {1:1, 2:2, 3:3, 4:4}
>>> t2 = {1:1, 2:4, 3:3, 5:5, 6:6}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff)
{'dic_item_added': ['root[5]', 'root[6]'],
 'dic_item_removed': ['root[4]'],
 'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

String difference

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world"}}
>>> t2 = {1:1, 2:4, 3:3, 4:{"a":"hello", "b":"world!"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { 'root[2]': {'newvalue': 4, 'oldvalue': 2},
                      "root[4]['b']": { 'newvalue': 'world!',
                                        'oldvalue': 'world'}}}

String difference 2

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world!\nGoodbye!\n1\n2\nEnd"}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n1\n2\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { "root[4]['b']": { 'diff': '--- \n'
                                                '+++ \n'
                                                '@@ -1,5 +1,4 @@\n'
                                                '-world!\n'
                                                '-Goodbye!\n'
                                                '+world\n'
                                                ' 1\n'
                                                ' 2\n'
                                                ' End',
                                        'newvalue': 'world\n1\n2\nEnd',
                                        'oldvalue': 'world!\n'
                                                    'Goodbye!\n'
                                                    '1\n'
                                                    '2\n'
                                                    'End'}}}

>>> 
>>> print (ddiff['values_changed']["root[4]['b']"]["diff"])
--- 
+++ 
@@ -1,5 +1,4 @@
-world!
-Goodbye!
+world
 1
 2
 End

Type change

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n\n\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'type_changes': { "root[4]['b']": { 'newtype': <class 'str'>,
                                      'newvalue': 'world\n\n\nEnd',
                                      'oldtype': <class 'list'>,
                                      'oldvalue': [1, 2, 3]}}}

List difference

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3, 4]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{'iterable_item_removed': {"root[4]['b'][2]": 3, "root[4]['b'][3]": 4}}

List difference 2:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'iterable_item_added': {"root[4]['b'][3]": 3},
  'values_changed': { "root[4]['b'][1]": {'newvalue': 3, 'oldvalue': 2},
                      "root[4]['b'][2]": {'newvalue': 2, 'oldvalue': 3}}}

List difference ignoring order or duplicates: (with the same dictionaries as above)

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2, ignore_order=True)
>>> print (ddiff)
{}

List that contains dictionary:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:1, 2:2}]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:3}]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'dic_item_removed': ["root[4]['b'][2][2]"],
  'values_changed': {"root[4]['b'][2][1]": {'newvalue': 3, 'oldvalue': 1}}}

Sets:

>>> t1 = {1, 2, 8}
>>> t2 = {1, 2, 3, 5}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (DeepDiff(t1, t2))
{'set_item_added': ['root[3]', 'root[5]'], 'set_item_removed': ['root[8]']}

Named Tuples:

>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> t1 = Point(x=11, y=22)
>>> t2 = Point(x=11, y=23)
>>> pprint (DeepDiff(t1, t2))
{'values_changed': {'root.y': {'newvalue': 23, 'oldvalue': 22}}}

Custom objects:

>>> class ClassA(object):
...     a = 1
...     def __init__(self, b):
...         self.b = b
... 
>>> t1 = ClassA(1)
>>> t2 = ClassA(2)
>>> 
>>> pprint(DeepDiff(t1, t2))
{'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

Object attribute added:

>>> t2.c = "new attribute"
>>> pprint(DeepDiff(t1, t2))
{'attribute_added': ['root.c'],
 'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

arulmr · Accepted Answer · 2017-07-05 12:07:17Z

30

The difference between two lists (say list1 and list2) can be found using the following simple function.

def diff(list1, list2):
    c = set(list1).union(set(list2))  # or c = set(list1) | set(list2)
    d = set(list1).intersection(set(list2))  # or d = set(list1) & set(list2)
    return list(c - d)

or

def diff(list1, list2):
    return list(set(list1).symmetric_difference(set(list2)))  # or return list(set(list1) ^ set(list2))

Using the above function, the difference can be found with diff(temp2, temp1) or diff(temp1, temp2). Both will give the result ['Four', 'Three'] (the element order may vary). You don't have to worry about the order of the list or which list is given first.

Python doc reference

edited Jul 5, 2017 at 12:07

answered Aug 17, 2012 at 11:38

arulmr


7

Why not set(list1).symmetric_difference(set(list2))?
– swietyy
Commented Mar 4, 2015 at 16:49


Mohideen bin Mohammed · Accepted Answer · 2017-11-30 12:59:00Z

26

The simplest way is to use set().difference(set()):

list_a = [1, 2, 3]
list_b = [2, 3]
print(set(list_a).difference(set(list_b)))

The answer is {1} (displayed as set([1]) in Python 2).

It can also be printed as a list:

print(list(set(list_a).difference(set(list_b))))

answered Nov 30, 2017 at 12:59

Mohideen bin Mohammed


2

removes duplicates and doesn't preserve order
– Albert Rothman
Commented Sep 3, 2020 at 18:51


aaronasterling · Accepted Answer · 2010-08-11 20:57:18Z

19

I'll toss this in since none of the present solutions yield a tuple:

temp3 = tuple(set(temp1) - set(temp2))

alternatively:

# Edited using @Mark Byers' idea. If you accept this one as the answer, just accept his instead.
s = set(temp2)  # build the set once rather than on every iteration
temp3 = tuple(x for x in temp1 if x not in s)

Like the other non-tuple-yielding answers in this direction, the second version preserves order.

edited Aug 11, 2010 at 20:57

answered Aug 11, 2010 at 19:42

aaronasterling


den.run.ai · Accepted Answer · 2017-04-29 07:04:12Z

18

If you are really looking into performance, then use numpy!

Here is the full notebook as a gist on github with comparison between list, numpy, and pandas.

https://gist.github.com/denfromufa/2821ff59b02e9482be15d27f2bbd4451

edited Apr 29, 2017 at 7:04

answered Aug 7, 2015 at 15:30

den.run.ai


i updated the notebook in the link and also the screenshot. Surprisingly pandas is slower than numpy even when switching to hashtable internally. Partly this maybe due to upcasting to int64.
– den.run.ai
Commented Apr 29, 2017 at 7:06
1

running the tests from Mark Byers Answer, numpy took the longest of all answers (ars, SuperNova, Mark Byers, Matt b).
– Gangula
Commented Sep 13, 2021 at 14:24


arekolek · Accepted Answer · 2016-03-07 22:35:27Z

I wanted something that would take two lists and could do what diff in bash does. Since this question pops up first when you search for "python diff two lists" and is not very specific, I will post what I came up with.

Using SequenceMatcher from difflib you can compare two lists like diff does. None of the other answers will tell you the position where the difference occurs, but this one does. Some answers give the difference in only one direction. Some reorder the elements. Some don't handle duplicates. But this solution gives you a true difference between two lists:

a = 'A quick fox jumps the lazy dog'.split()
b = 'A quick brown mouse jumps over the dog'.split()

from difflib import SequenceMatcher

for tag, i, j, k, l in SequenceMatcher(None, a, b).get_opcodes():
  if tag == 'equal': print('both have', a[i:j])
  if tag in ('delete', 'replace'): print('  1st has', a[i:j])
  if tag in ('insert', 'replace'): print('  2nd has', b[k:l])

This outputs:

both have ['A', 'quick']
  1st has ['fox']
  2nd has ['brown', 'mouse']
both have ['jumps']
  2nd has ['over']
both have ['the']
  1st has ['lazy']
both have ['dog']

Of course, if your application makes the same assumptions the other answers make, you will benefit from them the most. But if you are looking for a true diff functionality, then this is the only way to go.

For example, none of the other answers could handle:

a = [1,2,3,4,5]
b = [5,4,3,2,1]

But this one does:

  2nd has [5, 4, 3, 2]
both have [1]
  1st has [2, 3, 4, 5]
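
The loop above can be packaged as a function that returns the pieces instead of printing them. This is a sketch only: diff_ops and its labels are hypothetical names, and SequenceMatcher requires the list elements to be hashable.

```python
from difflib import SequenceMatcher


def diff_ops(a, b):
    """Return (label, items) pairs describing how list `a` differs from list `b`."""
    result = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag == 'equal':
            result.append(('both', a[i1:i2]))
        if tag in ('delete', 'replace'):
            result.append(('1st', a[i1:i2]))
        if tag in ('insert', 'replace'):
            result.append(('2nd', b[j1:j2]))
    return result


ops = diff_ops([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
for label, items in ops:
    print(label, items)
# 2nd [5, 4, 3, 2]
# both [1]
# 1st [2, 3, 4, 5]
```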

Taylor D. Edmiston · Accepted Answer · 2017-02-07 03:26:27Z

Here's a Counter answer for the simplest case.

This is shorter than the one above that does two-way diffs because it only does exactly what the question asks: generate a list of what's in the first list but not the second.

from collections import Counter

lst1 = ['One', 'Two', 'Three', 'Four']
lst2 = ['One', 'Two']

c1 = Counter(lst1)
c2 = Counter(lst2)
diff = list((c1 - c2).elements())

Alternatively, depending on your readability preferences, it makes for a decent one-liner:

diff = list((Counter(lst1) - Counter(lst2)).elements())

Output:

['Three', 'Four']

Note that you can remove the list(...) call if you are just iterating over it.

Because this solution uses counters, it handles quantities properly vs the many set-based answers. For example on this input:

lst1 = ['One', 'Two', 'Two', 'Two', 'Three', 'Three', 'Four']
lst2 = ['One', 'Two']

The output is:

['Two', 'Two', 'Three', 'Three', 'Four']
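
Counter subtraction groups equal items together in its output. If the original positions in the first list matter, a variant like the following could be used. It is a sketch under one assumption that the question does not settle: for each item in the second list, the earliest matching occurrence in the first list is the one removed. The name multiset_difference is hypothetical.

```python
from collections import Counter


def multiset_difference(first, second):
    """Cancel one occurrence in `first` per occurrence in `second`, keeping order."""
    to_remove = Counter(second)
    kept = []
    for item in first:
        if to_remove[item] > 0:
            to_remove[item] -= 1  # cancel this occurrence against `second`
        else:
            kept.append(item)
    return kept


print(multiset_difference(['b', 'a', 'b'], ['b']))
# ['a', 'b']
print(list((Counter(['b', 'a', 'b']) - Counter(['b'])).elements()))
# ['b', 'a']
```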

Good job! This is the right, general answer.
– Victor Wang
Commented Jan 13, 2022 at 7:25

pylang · Accepted Answer · 2018-07-21 00:22:48Z

13

This could be even faster than Mark's list comprehension:

import itertools

list(itertools.filterfalse(set(temp2).__contains__, temp1))

edited Jul 21, 2018 at 0:22

pylang


answered Aug 18, 2011 at 6:01

Mohammed


7

Might want to include the from itertools import filterfalse bit here. Also note that this doesn't return a sequence like the others, it returns an iterator.
– Matt Luongo
Commented Jan 17, 2012 at 16:16
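
To illustrate the point about iterators: filterfalse on its own returns a lazy, single-use iterator, which is why the result is wrapped in list() when a real list is needed. A short sketch using the question's data (the name lazy is illustrative):

```python
from itertools import filterfalse

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

lazy = filterfalse(set(temp2).__contains__, temp1)  # nothing is computed yet
temp3 = list(lazy)  # consumes the iterator

print(temp3)       # ['Three', 'Four']
print(list(lazy))  # [] because the iterator is already exhausted
```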


Abercrombie · Accepted Answer · 2021-04-26 01:49:09Z

8

Here is a modified version of @SuperNova's answer

def get_diff(a: list, b: list) -> list:
    return list(set(a) ^ set(b))

answered Apr 26, 2021 at 1:49

Abercrombie


sreemanth pulagam · Accepted Answer · 2022-09-03 00:54:55Z

8

single line version of arulmr solution

def diff(listA, listB):
    return set(listA) - set(listB) | set(listB) - set(listA)

edited Sep 3, 2022 at 0:54

answered Jun 26, 2014 at 11:54

sreemanth pulagam


This makes no sense and is very unclear. Is it (set(a) - set(b)) | (set(a) - set(b)) (union of a difference with itself?) or set(a) - (set(b) | set(a)) - set(b) (which would subtract the whole set a from itself, always leading to an empty result)?. I can tell you that it is the first one, because of operator precedence, but still, the union and the repetition here is useless.
– Victor Schröder
Commented Feb 14, 2022 at 18:21


manhgd · Accepted Answer · 2014-03-29 02:36:31Z

7

This is another solution:

def diff(a, b):
    xa = [i for i in set(a) if i not in b]
    xb = [i for i in set(b) if i not in a]
    return xa + xb

answered Mar 29, 2014 at 2:36

manhgd


Jenobi · Accepted Answer · 2018-10-13 20:40:16Z

Let's say we have two lists

list1 = [1, 3, 5, 7, 9]
list2 = [1, 2, 3, 4, 5]

we can see from the above two lists that items 1, 3, 5 exist in list2 and items 7, 9 do not. On the other hand, items 1, 3, 5 exist in list1 and items 2, 4 do not.

What is the best solution to return a new list containing items 7, 9 and 2, 4?

All the answers above find the solution, but which one is the most efficient?

def difference(list1, list2):
    new_list = []
    for i in list1:
        if i not in list2:
            new_list.append(i)

    for j in list2:
        if j not in list1:
            new_list.append(j)
    return new_list

versus

def sym_diff(list1, list2):
    return list(set(list1).symmetric_difference(set(list2)))

Using timeit we can see the results:

import timeit

t1 = timeit.Timer("difference(list1, list2)", "from __main__ import difference, list1, list2")
t2 = timeit.Timer("sym_diff(list1, list2)", "from __main__ import sym_diff, list1, list2")

# timeit reports the total time in seconds for all runs
print('Using two for loops', t1.timeit(number=100000), 'seconds')
print('Using symmetric_difference', t2.timeit(number=100000), 'seconds')

returns

[7, 9, 2, 4]
Using two for loops 0.11572412995155901 seconds
Using symmetric_difference 0.11285737506113946 seconds

Process finished with exit code 0

Oksana B · Accepted Answer · 2021-02-17 13:49:22Z

7

If you need to remove from list a all the values that are present in list b:

def list_diff(a, b):
    r = []

    for i in a:
        if i not in b:
            r.append(i)
    return r

list_diff([1,2,2], [1])

Result: [2,2]

or

def list_diff(a, b):
    return [x for x in a if x not in b]

edited Feb 17, 2021 at 13:49

answered Feb 17, 2021 at 13:06

Oksana B


soundcorner · Accepted Answer · 2014-05-29 13:21:24Z

You could use a naive method if both lists are sorted, contain no duplicates, and the second list is a prefix of the first.

list1 = [1, 2, 3, 4, 5]
list2 = [1, 2, 3]

print(list1[len(list2):])

or with native set methods:

subset = set(list1).difference(list2)

print(subset)

import timeit
init = 'temp1 = list(range(100)); temp2 = [i * 2 for i in range(50)]'
print("Naive solution: ", timeit.timeit('temp1[len(temp2):]', init, number = 100000))
print("Native set solution: ", timeit.timeit('set(temp1).difference(temp2)', init, number = 100000))

Naive solution: 0.0787101593292

Native set solution: 0.998837615564

Community · Accepted Answer · 2017-05-23 12:10:41Z

6

If you run into TypeError: unhashable type: 'list' you need to turn lists or sets into tuples, e.g.

set(map(tuple, list_of_lists1)).symmetric_difference(set(map(tuple, list_of_lists2)))
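
A worked sketch of the conversion above with made-up data. It only handles one level of nesting: if the inner lists themselves contain lists, the tuples would still be unhashable. The result is sorted only to make the output reproducible.

```python
list_of_lists1 = [[1, 2], [3, 4], [5, 6]]
list_of_lists2 = [[3, 4], [7, 8]]

# Tuples are hashable (when their items are), so they can go into a set.
as_tuples1 = set(map(tuple, list_of_lists1))
as_tuples2 = set(map(tuple, list_of_lists2))

difference = as_tuples1.symmetric_difference(as_tuples2)

# Convert the tuples back to lists.
result = sorted(list(t) for t in difference)
print(result)  # [[1, 2], [5, 6], [7, 8]]
```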

See also How to compare a list of lists/sets in python?

edited May 23, 2017 at 12:10

CommunityBot


answered Jul 10, 2014 at 10:26

the


Alex Jacob · Accepted Answer · 2017-10-18 17:44:58Z

I am a little late to the game, but you can compare the performance of some of the above-mentioned code with this. Two of the fastest contenders are:

list(set(x).symmetric_difference(set(y)))
list(set(x) ^ set(y))

I apologize for the elementary level of coding.

import time
import random
from itertools import filterfalse

# 1 - performance (time taken)
# 2 - correctness (answer - 1,4,5,6)
# set performance
performance = 1
numberoftests = 7

# time.perf_counter() replaces time.clock(), which was removed in Python 3.8
def answer(x,y,z):
    if z == 0:
        start = time.perf_counter()
        # both one-way differences; the second term must be y - x, not y - y
        lists = (str(list(set(x)-set(y))+list(set(y)-set(x))))
        times = ("1 = " + str(time.perf_counter() - start))
        return (lists,times)

    elif z == 1:
        start = time.perf_counter()
        lists = (str(list(set(x).symmetric_difference(set(y)))))
        times = ("2 = " + str(time.perf_counter() - start))
        return (lists,times)

    elif z == 2:
        start = time.perf_counter()
        lists = (str(list(set(x) ^ set(y))))
        times = ("3 = " + str(time.perf_counter() - start))
        return (lists,times)

    elif z == 3:
        start = time.perf_counter()
        # filterfalse is lazy; list() forces the work so that it is actually timed
        lists = (list(filterfalse(set(y).__contains__, x)))
        times = ("4 = " + str(time.perf_counter() - start))
        return (lists,times)

    elif z == 4:
        start = time.perf_counter()
        lists = (tuple(set(x) - set(y)))
        times = ("5 = " + str(time.perf_counter() - start))
        return (lists,times)

    elif z == 5:
        start = time.perf_counter()
        lists = ([tt for tt in x if tt not in y])
        times = ("6 = " + str(time.perf_counter() - start))
        return (lists,times)

    else:    
        start = time.perf_counter()
        Xarray = [iDa for iDa in x if iDa not in y]
        Yarray = [iDb for iDb in y if iDb not in x]
        lists = (str(Xarray + Yarray))
        times = ("7 = " + str(time.perf_counter() - start))
        return (lists,times)

n = numberoftests

if performance == 2:
    a = [1,2,3,4,5]
    b = [3,2,6]
    for c in range(0,n):
        d = answer(a,b,c)
        print(d[0])

elif performance == 1:
    for tests in range(0,10):
        print("Test Number" + str(tests + 1))
        a = random.sample(range(1, 900000), 9999)
        b = random.sample(range(1, 900000), 9999)
        for c in range(0,n):
            #if c not in (1,4,5,6):
            d = answer(a,b,c)
            print(d[1])

pylang · Accepted Answer · 2018-08-24 18:09:46Z

6

Here are a few simple, order-preserving ways of diffing two lists of strings.

Code

An unusual approach using pathlib:

import pathlib


temp1 = ["One", "Two", "Three", "Four"]
temp2 = ["One", "Two"]

p = pathlib.Path(*temp1)
r = p.relative_to(*temp2)
list(r.parts)
# ['Three', 'Four']

This assumes both lists contain strings with equivalent beginnings. See the docs for more details. Note, it is not particularly fast compared to set operations.

A straightforward implementation using itertools.zip_longest:

import itertools as it


[x for x, y in it.zip_longest(temp1, temp2) if x != y]
# ['Three', 'Four']
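
Because zip_longest pairs items by position, the comprehension only behaves like a difference when the second list is a prefix of the first. The sketch below uses two made-up variations of temp2 (reordered and longer are illustrative names) to show what happens otherwise.

```python
import itertools as it

temp1 = ['One', 'Two', 'Three', 'Four']

# Same elements as the question's temp2, but in a different order.
reordered = ['Two', 'One']
result_reordered = [x for x, y in it.zip_longest(temp1, reordered) if x != y]
print(result_reordered)  # ['One', 'Two', 'Three', 'Four']

# When the second list is longer, the fill value None leaks into the result.
longer = ['One', 'Two', 'Three', 'Four', 'Five']
result_longer = [x for x, y in it.zip_longest(temp1, longer) if x != y]
print(result_longer)  # [None]
```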

edited Aug 24, 2018 at 18:09

answered May 10, 2018 at 1:35

pylang


1

The itertools solution only works when the elements in temp1 and temp2 line up well. If you, for example, turn around the elements in temp2 or insert some other value in the beginning of temp2, the listcomp will just return the same elements as in temp1
– KenHBS
Commented Aug 24, 2018 at 10:50
Yes, it is a feature of these approaches. As mentioned, these solutions are order preserving - they assume some relative order between the lists. An unordered solution would be to diff two sets.
– pylang
Commented Aug 24, 2018 at 18:09
"This assumes both lists contain strings with equivalent beginnings". If you assume this, all you need to do is truncate the longest list with the length of the shortest; you don’t need pathlib or any module.
– bfontaine
Commented Mar 31, 2023 at 15:06


Shakhyar Gogoi · Accepted Answer · 2020-05-22 07:09:47Z

6

I prefer converting to sets and then using the difference() function. The full code is:

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']
set1 = set(temp1)
set2 = set(temp2)
set3 = set1.difference(set2)
temp3 = list(set3)
print(temp3)

Output:

>>>print(temp3)
['Three', 'Four']

It's the easiest to understand, and moreover, if you work with large data in the future, converting it to sets will remove duplicates if duplicates are not required. Hope it helps ;-)

answered May 22, 2020 at 7:09

Shakhyar Gogoi


1

The difference function is the same as the - operator shown in the accepted answer, so not sure this really adds any new information 10 years later
– OneCricketeer
Commented Dec 3, 2020 at 6:49


Kiprono Elijah Koech · Accepted Answer · 2021-01-24 10:38:43Z

5

I know this question got great answers already but I wish to add the following method using numpy.

import numpy as np

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two']

np.setdiff1d(temp1, temp2).tolist()  # tolist() returns plain Python strings

['Four', 'Three'] #Output

answered Jan 24, 2021 at 10:38

Kiprono Elijah Koech


Nicholas Franceschina · Accepted Answer · 2015-12-17 22:20:50Z

if you want something more like a changeset... could use Counter

from collections import Counter

def diff(a, b):
  """ more verbose than needs to be, for clarity """
  ca, cb = Counter(a), Counter(b)
  to_add = cb - ca
  to_remove = ca - cb
  changes = Counter(to_add)
  changes.subtract(to_remove)
  return changes

lista = ['one', 'three', 'four', 'four', 'one']
listb = ['one', 'two', 'three']

In [127]: diff(lista, listb)
Out[127]: Counter({'two': 1, 'one': -1, 'four': -2})
# in order to go from lista to listb, you need to add a "two", remove a "one", and remove two "four"s

In [128]: diff(listb, lista)
Out[128]: Counter({'four': 2, 'one': 1, 'two': -1})
# in order to go from listb to lista, you must add two "four"s, add a "one", and remove a "two"

Mohammad Etemaddar · Accepted Answer · 2017-04-21 11:34:24Z

3

We can calculate the union minus the intersection of the lists:

temp1 = ['One', 'Two', 'Three', 'Four']
temp2 = ['One', 'Two', 'Five']

set(temp1+temp2)-(set(temp1)&set(temp2))

Out: set(['Four', 'Five', 'Three'])

answered Apr 21, 2017 at 11:34

Mohammad Etemaddar


fgaim · Accepted Answer · 2017-06-09 15:44:02Z

2

This can be solved with one line. The question is given two lists (temp1 and temp2) return their difference in a third list (temp3).

temp3 = list(set(temp1).difference(set(temp2)))

answered Jun 9, 2017 at 15:44

fgaim


S.K. Venkat · Accepted Answer · 2018-05-29 11:55:31Z

1

Here is a simple way to find the differences between two lists (whatever the contents are); you can get the result as shown below. The sets module was removed in Python 3, so the built-in set type is used here, and the order of the displayed elements may vary:

>>> l1 = ['xvda', False, 'xvdbb', 12, 'xvdbc']
>>> l2 = ['xvda', 'xvdbb', 'xvdbc', 'xvdbd', None]
>>>
>>> set(l1).symmetric_difference(set(l2))
{False, 'xvdbd', None, 12}

Hope this helps.

answered May 29, 2018 at 11:55

S.K. Venkat


mr potato head · Accepted Answer · 2021-01-17 00:34:00Z

0

You can loop through the first list and add every item that isn't in the second list to a third list. E.g.:

temp3 = []
for i in temp1:
    if i not in temp2:
        temp3.append(i)
print(temp3)

answered Jan 17, 2021 at 0:34

mr potato head


navalega0109 · Accepted Answer · 2023-08-15 08:27:33Z

Of all the options, the fastest is:

s = set(temp2)
[x for x in temp1 if x not in s]

Performance results

    import timeit
    init = 'temp1 = list(range(100)); temp2 = [i * 2 for i in range(50)]'
    print(timeit.timeit('list(set(temp1) - set(temp2))', init, number = 100000))
    print(timeit.timeit('s = set(temp2);[x for x in temp1 if x not in s]', init, number = 100000))
    print(timeit.timeit('[item for item in temp1 if item not in temp2]', init, number = 100000))
    print(timeit.timeit('list(set(temp1) ^ set(temp2))', init, number = 100000))

Results

0.3485999000258744
0.3314229999668896
2.067719299986493
0.3791518000070937
