self-join with Pandas

Question

I would like to perform a self-join on a Pandas DataFrame so that each row gets another row of the same frame appended on its right. Column 'i' holds the index of the row that should be appended.

import pandas as pd

d = pd.DataFrame(['A','B','C'], columns = ['some_col'])
d['i'] = [2,1,1]

In [17]: d
Out[17]: 
  some_col  i
0        A  2
1        B  1
2        C  1

Desired output:

  some_col  i some_col_y
0        A  2          C
1        B  1          B
2        C  1          B

That is, row 2 gets appended to row 0, row 1 to row 1, row 1 to row 2 (as indicated by i).

My idea of how to go about it was

pd.merge(d, d, left_index = True, right_on = 'i', how = 'left')

But that produces something else altogether. How can I do this correctly?

MSeifert · Accepted Answer · 2017-01-03 00:08:12Z

16

Instead of using merge you can also use indexing and assignment:

>>> d['new_col'] = d['some_col'][d['i']].values
>>> d
  some_col  i new_col
0        A  2       C
1        B  1       B
2        C  1       B
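
The .values call matters here: the lookup returns a Series indexed by the looked-up labels (2, 1, 1), and dropping that index makes the assignment positional instead of label-aligned. A minimal sketch, assuming the same d as in the question; the Series.map variant is an alternative and not part of the answer above:

```python
import pandas as pd

d = pd.DataFrame({'some_col': ['A', 'B', 'C'], 'i': [2, 1, 1]})

# Label-based lookup: the result keeps the looked-up labels 2, 1, 1 as its index.
picked = d['some_col'].loc[d['i']]

# Dropping the index makes the assignment positional (row 0 gets 'C', and so on).
by_position = d.assign(new_col=picked.to_numpy())

# Alternative: map each value of 'i' through the Series, which keeps d's index.
by_map = d.assign(new_col=d['i'].map(d['some_col']))
```

Both variants return a new frame and leave d itself unchanged.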


I like your answer more than mine. If the OP needs the new column without modifying d in place, it can be done this way: d.assign(some_col_y=d['some_col'].loc[d['i']].values)
– MaxU - stand with Ukraine, Jan 3, 2017 at 0:13


piRSquared · Accepted Answer · 2017-01-03 00:51:45Z

8

Use join with on='i', which matches column i of the left frame against the index of the right frame:

d.join(d.drop(columns='i'), on='i', rsuffix='_y')

  some_col  i some_col_y
0        A  2          C
1        B  1          B
2        C  1          B
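
As a self-contained sketch of the same idea, assuming the d from the question (the name lookup is introduced here only for illustration):

```python
import pandas as pd

d = pd.DataFrame({'some_col': ['A', 'B', 'C'], 'i': [2, 1, 1]})

# Right-hand frame: only the column to attach, indexed 0..2 like d itself.
lookup = d[['some_col']]

# on='i' matches each left row's value of 'i' against lookup's index;
# rsuffix renames the overlapping right-hand column to 'some_col_y'.
result = d.join(lookup, on='i', rsuffix='_y')
print(result)
```

Because join defaults to a left join, the row order and index of d are preserved.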


MaxU - stand with Ukraine · Accepted Answer · 2017-01-03 00:04:06Z

2

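Try this. Note that joining against d.set_index('i') matches in the opposite direction (each row picks up the rows whose i points at it), so it does not give the desired output: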

In [69]: d.join(d.set_index('i'), rsuffix='_y')
Out[69]:
  some_col  i some_col_y
0        A  2        NaN
1        B  1          B
1        B  1          C
2        C  1          A

or merge on i and sort the index, which gives the desired values with some_col_y as the first column:

In [64]: pd.merge(d[['some_col']], d, left_index=True, right_on='i', suffixes=['_y','']).sort_index()
Out[64]:
  some_col_y some_col  i
0          C        A  2
1          B        B  1
2          B        C  1
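
A variant that may be easier to read swaps the roles of the two frames: keep d on the left and match its column i against the index of the right-hand frame. This is a sketch under the same assumptions as the question, not the code from the answer above:

```python
import pandas as pd

d = pd.DataFrame({'some_col': ['A', 'B', 'C'], 'i': [2, 1, 1]})

# Left rows are matched by their value of 'i' against the right frame's index.
result = pd.merge(
    d,
    d[['some_col']],
    left_on='i',
    right_index=True,
    how='left',
    suffixes=('', '_y'),
)
print(result)
```

With how='left' the rows come back in the order of d, so no sort_index call should be needed.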
