Using LIKE inside pandas.query()

Question

I have been using pandas for more than 3 months and have a fair idea of how to access and query DataFrames.

I have a requirement to query a DataFrame using the LIKE keyword (as in SQL) inside pandas.query().

That is, I am trying to run pandas.query("column_name LIKE 'abc%'"), but it fails.

I know an alternative approach is to use str.contains("abc%"), but that doesn't meet our requirement.

We want to use LIKE inside pandas.query(). How can I do this?

It's been a while since this question was posted: has a solution to this been found or is this still only obtainable through str.contains()? — user1717828, Commented Dec 4, 2017 at 18:26
For those new to pandas: look at the docs for Series string methods such as pandas.Series.str.contains: pandas.pydata.org/pandas-docs/stable/reference/api/… — Timothy C. Quinn, Commented Sep 2, 2020 at 2:02

volodymyr · Accepted Answer · 2020-04-21 12:45:17Z

104

If you have to use df.query(), the correct syntax is:

df.query('column_name.str.contains("abc")', engine='python')

You can easily combine this with other conditions:

df.query('column_a.str.contains("abc") or column_b.str.contains("xyz") and column_c>100', engine='python')
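Note that in the combined query, and binds more tightly than or (ordinary Python precedence), so the expression reads as A or (B and C). A small sketch with made-up data, assuming engine='python' so that method calls are allowed, showing how explicit parentheses change the grouping:

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "column_a": ["abc1", "zzz", "zzz", "abc2"],
    "column_b": ["q", "xyz", "xyz", "q"],
    "column_c": [50, 150, 50, 150],
})

# 'and' binds tighter than 'or': this is A or (B and C)
implicit = df.query(
    'column_a.str.contains("abc") or column_b.str.contains("xyz") and column_c > 100',
    engine="python",
)

# Parentheses force the other grouping: (A or B) and C
grouped = df.query(
    '(column_a.str.contains("abc") or column_b.str.contains("xyz")) and column_c > 100',
    engine="python",
)

print(implicit)
print(grouped)
```

Here implicit keeps rows 0, 1 and 3, while grouped keeps only rows 1 and 3.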

The str.contains approach is not a full equivalent of SQL LIKE, but it can be useful nevertheless.
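If you need closer LIKE semantics, one option is to translate the LIKE pattern into a regular expression and match it against the whole value with str.fullmatch (available since pandas 1.1) inside the query. The helper like_to_regex is hypothetical, not part of pandas; it handles only % and _ (no ESCAPE clause), and the data is made up:

```python
import re

import pandas as pd


def like_to_regex(pattern: str) -> str:
    """Hypothetical helper: translate a SQL LIKE pattern into a regex.

    '%' matches any run of characters (including none), '_' matches exactly
    one character, and everything else is matched literally.
    ESCAPE clauses are not supported in this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # (?s) lets '.' also match newlines, as % and _ do in SQL
    return "(?s)" + "".join(parts)


df = pd.DataFrame({"column_name": ["abcd", "xabc", "abc", "ab"]})
pat = like_to_regex("abc%")

# fullmatch anchors the regex to the whole value, like LIKE does
result = df.query("column_name.str.fullmatch(@pat)", engine="python")
print(result)
```

Unlike str.contains("abc"), this does not match "xabc", because LIKE 'abc%' requires the value to start with "abc".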

edited Apr 21, 2020 at 12:45

answered Jun 22, 2018 at 5:07

volodymyr



It does not work for me with pandas 0.24.2 unless I add engine='python', as @P.Panayotov mentioned. Moreover, writing pandas instead of df might confuse beginners.
– Bouncner
Commented Oct 6, 2019 at 16:36

Good suggestions; made a few changes.
– volodymyr
Commented Jan 6, 2021 at 15:34
Just a reminder: the text passed to str.contains() is treated as a regex by default.
– Edward Weinert
Commented Jun 17, 2021 at 9:28
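For example, a '.' in the pattern is a regex wildcard unless you pass regex=False (made-up data):

```python
import pandas as pd

s = pd.Series(["abc", "a.c", "xyz"])

# Default regex=True: '.' matches any single character
as_regex = s.str.contains("a.c")

# regex=False: the pattern is matched as a plain substring
as_literal = s.str.contains("a.c", regex=False)

print(as_regex.tolist())
print(as_literal.tolist())
```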
"If you have to use df.query", is there a better way (or different?) than .query if one wants to filter within a method chain?
– baxx
Commented Mar 13, 2022 at 19:24
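One option for method chains (not something the answer above covers) is .loc with a callable, which receives the intermediate DataFrame at that point in the chain. A minimal sketch with made-up column names:

```python
import pandas as pd

df = pd.DataFrame({"column_name": ["abcd", "xyz", "abcx"], "value": [1, 2, 3]})

result = (
    df
    .assign(value2=lambda d: d["value"] * 10)
    # The lambda gets the DataFrame produced by .assign(), so it can
    # refer to columns created earlier in the chain
    .loc[lambda d: d["column_name"].str.startswith("abc")]
)
print(result)
```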

@Bouncner I didn't need to add engine='python', probably because it's almost 3 years later and I'm using pandas 1.4.1.
– KBurchfiel
Commented Aug 25, 2022 at 14:36


P.Panayotov · Accepted Answer · 2021-12-21 09:46:23Z

34

@volodymyr is right, but what he forgot is that you need to set engine='python' for the expression to work.

Example:

>>> pd_df.query('column_name.str.contains("abc")', engine='python')

Here is more information on the default engine ('numexpr') and the 'python' engine. Also, keep in mind that the 'python' engine is slower on large data.

edited Dec 21, 2021 at 9:46

answered Jul 17, 2018 at 7:23

P.Panayotov



Edit: I can confirm this worked for me after setting engine to 'python'. Take care to nest the " and ' quotes correctly.
– Thomas
Commented Oct 29, 2019 at 15:31
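Either quoting order works, as long as the string inside the expression uses the other kind of quote than the one delimiting the whole expression. A small check with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"column_name": ["abcd", "xyz"]})

# Single quotes outside, double quotes inside the expression...
a = df.query('column_name.str.contains("abc")', engine="python")

# ...or the other way round; both give the same filter
b = df.query("column_name.str.contains('abc')", engine="python")

print(a)
```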


khammel · Accepted Answer · 2015-07-14 00:02:32Z

16

Not using query(), but this will give you what you're looking for:

df[df.col_name.str.startswith('abc')]


df
Out[93]: 
  col_name
0     this
1     that
2     abcd

df[df.col_name.str.startswith('abc')]
Out[94]: 
  col_name
2     abcd

query() uses pandas' eval() and is limited in what you can use within it. If you want to use pure SQL, you could consider pandasql, where the following statement would work for you:

from pandasql import sqldf
sqldf("select col_name from df where col_name like 'abc%';", locals())

Alternatively, if your problem with the pandas str methods was that your column isn't entirely of string type, you could do the following:

df[df.col_name.str.startswith('abc').fillna(False)]
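A variant that skips the separate fillna step is the na parameter of str.startswith, which sets the value returned for elements that are not strings. A sketch with made-up data, assuming an object column that mixes strings, None and an integer:

```python
import pandas as pd

# Made-up column that is not entirely of string type
df = pd.DataFrame({"col_name": ["this", "abcd", None, 42]})

# na=False: missing and non-string elements count as "no match"
mask = df.col_name.str.startswith("abc", na=False)
result = df[mask]
print(result)
```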

edited Jul 14, 2015 at 0:02

answered Jul 13, 2015 at 21:40

khammel


I have tried sqldf and it solves my problem, but I am seeing a huge performance issue with it. With 95 lakh (9.5 million) records, a regular df.query() gives me the result in 1 minute, but sqldf takes at least 10 minutes.
– Pradeep M
Commented Jul 22, 2015 at 18:12
sqldf creates and tears down an SQLite database, hence the performance hit. Is there a reason you can't use startswith()?
– khammel
Commented Jul 23, 2015 at 0:09


jrjc · Accepted Answer · 2017-05-24 12:16:08Z

11

Super late to this post, but for anyone who comes across it: you can use boolean indexing, building your search criteria from the string method str.contains.

Example:

dataframe[dataframe.summary.str.contains('Windows Failed Login', case=False)]

In the code above, the expression inside the brackets refers to the summary column of the dataframe and uses the .str.contains method to search for 'Windows Failed Login' within every value of that Series. Case sensitivity is controlled with case=True or case=False. This returns a boolean Series, which is then used to select the rows you're looking for. You can also use .fillna(False) inside the brackets if you run into any NaN errors.
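Putting case=False together with na=False (which handles missing values in place of .fillna(False)), with made-up data:

```python
import pandas as pd

dataframe = pd.DataFrame(
    {"summary": ["windows failed login from host A", "Disk full", None]}
)

# case=False ignores case; na=False treats the missing summary as "no match"
mask = dataframe.summary.str.contains("Windows Failed Login", case=False, na=False)
matches = dataframe[mask]
print(matches)
```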

Hope this helps!

edited May 24, 2017 at 12:16

jrjc


answered Dec 22, 2016 at 5:34

Terrance DeJesus



I didn't have a summary column, so for an arbitrary column name one can use new_df = df[df['Column'].str.contains('something')]
– arie64
Commented Jun 15, 2017 at 17:53


Shovalt · Accepted Answer · 2020-01-19 14:32:49Z

A trick I just came up with for "starts with":

df.query('"abc" <= column_name <= "abc~"')

Explanation: pandas accepts "greater than" and "less than" comparisons for strings in a query, so anything starting with "abc" will be greater than or equal to "abc" in lexicographic order. The tilde (~) is the largest printable character in the ASCII table, so anything starting with "abc" followed by printable ASCII characters will be less than or equal to "abc~" (strings that start with "abc~" and continue, such as "abc~x", are an exception).

A few things to take into consideration:

This is, of course, case sensitive: all lowercase characters come after all uppercase characters in the ASCII table.
This won't work fully for Unicode strings, but the general principle should be the same.
I couldn't come up with parallel tricks for "contains" or "ends with".
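A variation on the same lexicographic idea that avoids the "abc~x" edge case is a half-open range: bump the last character of the prefix by one code point ("abc" becomes "abd") and use a strict < for the upper bound. The helper prefix_upper_bound is hypothetical and assumes a non-empty prefix; the comparison is still case sensitive, and the data is made up:

```python
import pandas as pd


def prefix_upper_bound(prefix: str) -> str:
    """Hypothetical helper: smallest string greater than every string
    that starts with `prefix` ("abc" -> "abd").

    Assumes the prefix is non-empty and its last character is not the
    maximum code point.
    """
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


df = pd.DataFrame({"column_name": ["abc", "abcd", "abc~x", "abd", "ab", "ABC"]})

lo, hi = "abc", prefix_upper_bound("abc")
# Chained comparison: lo <= value < hi selects exactly the "abc..." strings
result = df.query("@lo <= column_name < @hi", engine="python")
print(result)
```

With the "abc~" upper bound, the "abc~x" row would be missed; the half-open range keeps it.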

Abhijeet · Accepted Answer · 2022-05-26 17:11:13Z

0

DataFrame:

   Name     Code  App
0  Jhon     8010  google
1  Michael  9020  github
2  Mandy    1240  google.com
3  Krish    1240  facebook

Search for a word (or text containing it) in a DataFrame:

S = df[df["column_name"].str.contains("word")]
S.head()

Example:

Myword = input("Enter the word you want to search for: ")

S = df[df["App"].str.contains(Myword)]
print(S)

Output:

Enter the word you want to search for: google

   Name   Code  App
0  Jhon   8010  google
2  Mandy  1240  google.com

Note: this method is case sensitive.
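Because the typed word is passed to str.contains as a regular expression, characters such as "." or "(" in the input are read as regex syntax. A sketch that matches the input literally and ignores case, with the input() call replaced by a fixed string for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jhon", "Michael", "Mandy", "Krish"],
    "Code": [8010, 9020, 1240, 1240],
    "App": ["google", "github", "google.com", "facebook"],
})

Myword = "Google"  # stands in for input(...) in this sketch

# regex=False: match the text literally; case=False: ignore case
S = df[df["App"].str.contains(Myword, case=False, regex=False)]
print(S)
```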

edited May 26, 2022 at 17:11

answered May 25, 2022 at 15:13

Abhijeet


