16

I would like to perform a self-join on a Pandas dataframe so that some rows get appended to the original rows. Each row has a marker 'i' indicating which row should get appended to it on the right.

d = pd.DataFrame(['A','B','C'], columns = ['some_col'])
d['i'] = [2,1,1]

In [17]: d
Out[17]: 
  some_col  i
0        A  2
1        B  1
2        C  1

Desired output:

  some_col  i some_col_y
0        A  2          C
1        B  1          B
2        C  1          B

That is, row 2 gets appended to row 0, row 1 to row 1, row 1 to row 2 (as indicated by i).

My idea of how to go about it was

pd.merge(d, d, left_index = True, right_on = 'i', how = 'left')

But it produces something else altogether. How to do it correctly?

3 Answers 3

16

Instead of using merge you can also use indexing and assignment:

>>> d['new_col'] = d['some_col'][d['i']].values
>>> d
  some_col  i new_col
0        A  2       C
1        B  1       B
2        C  1       B
1
  • I like your answer more than mine. If OP needs a virtual column - it can be done this way: d.assign(some_col_y=d['some_col'].loc[d['i']].values) Commented Jan 3, 2017 at 0:13
8

join with on='i'

d.join(d.drop('i', 1), on='i', rsuffix='_y')

  some_col  i some_col_y
0        A  2          C
1        B  1          B
2        C  1          B
0
2

Try this:

In [69]: d.join(d.set_index('i'), rsuffix='_y')
Out[69]:
  some_col  i some_col_y
0        A  2        NaN
1        B  1          B
1        B  1          C
2        C  1          A

or:

In [64]: pd.merge(d[['some_col']], d, left_index=True, right_on='i', suffixes=['_y','']).sort_index()
Out[64]:
  some_col_y some_col  i
0          C        A  2
1          B        B  1
2          B        C  1

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Not the answer you're looking for? Browse other questions tagged or ask your own question.