All Questions

2 votes
3 answers
123 views

Remove special character and units from Pandas column name with Python [duplicate]

I'm working on a script to convert a data file from one format to another. I need to remove the special characters from the column headers. I am using Pandas to read a CSV file with the below ...
user15690830's user avatar
1 vote
3 answers
77 views

Merge multiline rows in pandas dataframe based on regex pattern

I have a single column dataframe similar to this: cat = { 'cat': ['a','b','c-',' -d','e']} df = pd.DataFrame(cat) >>> print(df) cat 0 a 1 b 2 c- ...
eazyezy's user avatar
  • 163
2 votes
2 answers
63 views

Is there a better way to replace all non-ASCII characters from specific columns on a DataFrame?

There are some sentences and words in Chinese and Japanese that I just want to drop. Or if there is a better solution than just dropping them, I would like to explore them as well. import pandas as pd ...
Mr Jxtr's user avatar
  • 120
1 vote
1 answer
49 views

Create new column in dataframe based on regex pattern from different column

I have a dataframe with two columns, date of birth ("PN") (Swedish version YYMMDD-NNNN) and date of analysis. I would like to create a new column with only YYMMDD from the first one. But ...
Sofie's user avatar
  • 11
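
One possible approach to the question above, as a minimal sketch only: it assumes every value follows the YYMMDD-NNNN layout described in the excerpt. The helper name `extract_birth_date` and the sample values are hypothetical and do not come from the question.

```python
import re

# Assumption: each value looks like "YYMMDD-NNNN"
# (six digits, a hyphen, then four digits).
PN_PATTERN = re.compile(r"^(\d{6})-\d{4}$")


def extract_birth_date(pn):
    """Return the YYMMDD part of a personal number, or None if the format differs."""
    match = PN_PATTERN.match(pn)
    return match.group(1) if match else None


# Hypothetical sample values, not taken from the question.
samples = ["850101-1234", "991231-0000", "not a number"]
birth_dates = [extract_birth_date(s) for s in samples]
print(birth_dates)
```

In pandas, the same capture group could be passed to `Series.str.extract`, for example `df["PN"].str.extract(r"^(\d{6})-\d{4}$", expand=False)`, assuming the column holds strings.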
1 vote
4 answers
291 views

Pandas Extract Phone Number if it is in Correct Format

I have a column that has phone numbers. They are usually formatted in (555) 123-4567 but sometimes they are in a different format or they are not proper numbers. I am trying to convert this field to ...
Bijan's user avatar
  • 8,564
2 votes
1 answer
108 views

How to remove specific parts of text from string?

I am using Python. I have a dataframe with string "description". In these strings, I have things like: "п. 5.6.2 ГОСТ 2.114-2016", "п. 4.1 ГОСТ 2.102-2013", "п.5 ...
user24958090's user avatar
0 votes
0 answers
26 views

Assign output of expanded pandas str.extract to new columns of same dataframe when rows are filtered with .loc [duplicate]

I have a dataframe with some registration types and some alphanumeric registration numbers for certain types. If a row has a certain type, I need to perform a regex operation on the relevant number, ...
Alex Howard's user avatar
-2 votes
1 answer
141 views

Parsing data with Pandas - how to output match as a new column

I've got a routine to read in a CSV file and spit out selected columns that match specific criteria: CSV Input File looks like this Name Role Login Phil Role A | Role B 2024/01/01 Bob Role A | Role ...
Marcus Webb's user avatar
0 votes
1 answer
307 views

Pandas dataframe: writing specific values to a file with specific formatting?

I want to update external files. I want to do this by writing new data from Pandas dataframes to the files in a particular format. Here is an example of the first two blocks of data in a file (what I ...
Ant's user avatar
  • 867
0 votes
1 answer
66 views

Regex for column not producing expected output

I have this dataframe: dfsupport = pd.DataFrame({'Date': ['8/12/2020','8/12/2020','13/1/2020','24/5/2020','31/10/2020','11/7/2020','11/7/2020'], 'Category': ['Table','Chair','...
Dan's user avatar
  • 1,065
1 vote
1 answer
75 views

Parsing Text data and converting the parsed data into dataframe

I have text data and I want to parse the content and extract the Brand, MPN, Condition, Qty, Price, Customer name. The text data is in the format: ---ABB INSTALLATION PRODUCT--- 54905BE06, NEW, qty ...
Sanjeet Kumar's user avatar
1 vote
1 answer
92 views

Detect rows in a column containing only emojis in a data frame

How to detect rows in a column containing only emojis in a data frame? The rows containing text with emojis will not be considered. Given DF: content 😎🤘🏾 Wow Amazing!!! I am loving it😍😘 🤘🏾 ...
hxgx_0990's user avatar
0 votes
1 answer
63 views

Python dataframe split a string column into many

The data I import comes in irregular fashion. df = # following data all in one column 1 CABATT CAR BATTERY VOLTAGE -10.0 200.0 2 CPTEMP CAR DAS PANEL TEMP C -10.0 200.0 3 CAPTMA CAR PANEL A ...
Mainland's user avatar
  • 4,544
0 votes
0 answers
45 views

Is there a way to slice a dataframe when the characteristic variable has non-printable characters?

I imported multiple CSV files from census data. I was able to successfully bring in the files and append the three files (axis 0) one on top of the other. I was able to put the data into a dataframe ...
Sean's user avatar
  • 1
1 vote
1 answer
77 views

Literal match of strings in dataframe to other dataframe with multiple match options

I have a dataframe (df) with values in a column 'country' that I wish to standardize, using another dataframe called 'country_codes'. A value from df can match against any item from 'country_codes', ...
Benjamin Allen's user avatar
0 votes
0 answers
36 views

How do I delete symbols I scraped from websites in a csv file using pandas

I am trying to analyse airline reviews in a single-column dataframe containing only the text of the reviews. Unfortunately, there is a tick symbol ✅ and a special character | in each row showing ...
عمر عيسى's user avatar
1 vote
0 answers
19 views

Python - Unable to Replace Long String Values in DataFrame using pd.replace or re.sub [duplicate]

I have a dataframe which consists of a column with weird formatted string output. The data file is about 50k lines but the strings are repeating, there are about 10 unique strings in it. I am trying ...
sam's user avatar
  • 107
0 votes
0 answers
46 views

To extract a number after apostrophe in a string [duplicate]

I'm looking for a way to extract the number after an apostrophe in a string. Example String = "abc '22 xyz" What regular expression can we use to extract 22 from the above string? Are there ...
Will Graham's user avatar
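
A minimal sketch of one regular expression that fits the example string in the question above. It assumes the number is a run of digits that immediately follows the apostrophe; other layouts would need a different pattern.

```python
import re

text = "abc '22 xyz"  # example string from the question

# Match an apostrophe followed by one or more digits and
# capture only the digits.
match = re.search(r"'(\d+)", text)
number = match.group(1) if match else None
print(number)
```

The captured value is a string; wrap it in `int(...)` if a numeric type is needed.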
1 vote
2 answers
73 views

After extracting numbers from dataframe, how can I calculate the average price after converting to one unit of measure uniformly

I have a dataset which contains rent prices for different properties. It looks like this: data = { 'prices': [ '$350.00', '$450.00 pw', '$325 per week', '$495pw - Views! ...
X.x's user avatar
  • 13
-1 votes
3 answers
160 views

Matching several string matches from lists and making a new row for each match

I have a data frame with text in one of the columns and I am using regex formatted strings to see if I can find any matches from three lists. However, when there are multiple matches from list 1, I ...
user22391597's user avatar
-1 votes
1 answer
82 views

Using a regex stored as a variable in Python [closed]

I read regexes and their replacements from a CSV into a dictionary and then run that over a column in a Dataframe looking for locations: for regex, replacement in regex_replace.items(): df["...
Tomp's user avatar
  • 45
1 vote
2 answers
55 views

Tag data using matching keywords without using nested loops

I've been working on a problem for the past few days that I am unable to solve. Any help or advice would be greatly appreciated. Basically I have a dataset containing free text called mydata. Data ...
Lacri Mosa's user avatar
1 vote
1 answer
99 views

PyPolars efficient regex mass matching

I'm trying to mass match a column against a dictionary of regexes as follows : import random import time col_to_map_possibilities = [str(x) for x in range(100)] col_to_map_generated = random.choices(...
mlisthenewcool's user avatar
0 votes
1 answer
73 views

Iterate over rows in a data frame to search by regular expressions

I am trying to fetch sql tables from query using regular expressions. That is done for single query by using re.findall import re Query = ["SELECT * FROM WS_DE_Staging.stage_dual_h_20"] ...
Gaurav Jan's user avatar
-1 votes
1 answer
101 views

Joining two dataframes using regex

I'm trying to join 2 data frames using regex. In one data frame is Postcode Area (e.g. BA, M) in the other is Postcode District (e.g. BA1, M18). I want to join on the Postcode Area. My regex is ([A-Z][...
oscarwitch's user avatar
3 votes
3 answers
59 views

Remove a string from the column when it meets the condition

I would like to remove a string from a string column when it contains a lowercase letter (the string column may be NaN or include multiple strings in one row) Column1 Column2 Column3 NaN NaN NaN ...
ProgramNewHand's user avatar
0 votes
0 answers
15 views

Converting multiple dates, using regex and strptime and applying results using lambda apply [duplicate]

I'm mapping date formats using regex, and I want to convert my dates to "%Y-%m-%d" When testing with "%Y-%m-%d %H:%M:%S", my dates don't convert correctly. Can anyone see what I'm ...
Mizanur Choudhury's user avatar
1 vote
1 answer
66 views

How to extract a string of mixed number and text, and get the average numbers?

I have a column in my data frame that contains strings of numbers and text such as: 0 200 to 500 people 1 1 to 5 people 2 5000 people and over 3 2000 to 3000 people I want to convert each ...
pymn's user avatar
  • 171
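
A rough sketch of how the four sample strings in the question above might be reduced to one number each. The excerpt is truncated, so the intended result is an assumption: here each row becomes the mean of the integers found in it, and a row with a single number (such as "5000 people and over") keeps that number.

```python
import re

# Sample rows shown in the question excerpt.
values = [
    "200 to 500 people",
    "1 to 5 people",
    "5000 people and over",
    "2000 to 3000 people",
]


def average_of_numbers(text):
    """Average all integers found in text; return None when there are none."""
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    return sum(numbers) / len(numbers) if numbers else None


averages = [average_of_numbers(v) for v in values]
print(averages)
```

The pattern `\d+` does not handle thousands separators or decimals; those would need a broader pattern.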
-3 votes
2 answers
59 views

Formatting a date column with regular expressions

I have the following problem. I have a dataframe with a date column. The date is saved as str and mostly in the following formats: df[dates]=[['5 July 2023'], ['Brussels, 18 September 1998'],['...
Nick's user avatar
  • 23
0 votes
2 answers
55 views

How to identify characters in the matched string that corresponded to dot in regex pattern?

I have the following dataframe. The column 'regex' shows my regex patterns and the column 'matched_string' contains the strings that are matched with them. What I would like to have, is another column ...
Bluegirl FK 's user avatar
1 vote
1 answer
46 views

Split a column into 2 columns like alphabetic text in one column and alphanumeric or numbers or anything in 2nd column

I have a dataframe column which contains product and technical details merged. I just want to split them separately into 2 columns like actual product name in one column and other technical details in ...
Pravin's user avatar
  • 319
2 votes
1 answer
41 views

I am having issues using Regex to parse a chat and turn it into a dataframe. - it is just skipping info

I am creating a function to parse the copied chat from a yahoo fantasy mock draft and turn it into a dataframe of drafted players. It seems to skip lines for reasons I cannot determine. I am using ...
Nicholas Wood's user avatar
-4 votes
1 answer
58 views

Adding elements to a list when specific pattern is found

When reading the file, I would like my script to sum lines following the specific pattern. The thing is that every certain number of lines I have a header. The sum of lines is supposed to happen between ...
kata248's user avatar
  • 59
0 votes
1 answer
27 views

How to obtain multiple partial strings from a dataframe?

I am trying to obtain multiple partial strings from my dataframe and put those partial strings as added columns to my dataframe. Below you will find a simple data sample: df Serienummer 15 SAA ...
Tessa's user avatar
  • 55
1 vote
1 answer
45 views

How to extract date from a specified column containing different types of date formats of a given Pandas DataFrame using Regex

def find_valid_dates(dt): result = re.findall("\d{1,2}-\d{2}-\d{2,4}|\d{1,2} (?:januari|februari|maart|april|mei|juni|juli|augustus|september|oktober|november|december) \d{1,4}", dt) ...
Tessa's user avatar
  • 55
0 votes
3 answers
73 views

Extracting multiple info from string to create new columns

I have the following dataframe df = pd.DataFrame({'info':{0:'1cr1:782906:F:He:1:Ho1:0:Ho2:0', 1:'5cr1:782946:G:He:1:Ho1:0:Ho2:0'}}) that looks like this info 0 1cr1:...
newbzzs's user avatar
  • 305
0 votes
1 answer
77 views

Regex to suppress line breaks to aggregate groups of 3 lines into 1

I'm trying to fix a txt file format using python and regex. For that I used this post as a starting point, but I can't make it work with my file format. Regex: ([^1|-])[\n](.)|(.)[\n]([^|-]) My file ...
Renan Bueno Angioletto's user avatar
0 votes
1 answer
65 views

Removing strings outside of parentheses in python

I have a dataset and need to remove parentheses from some rows within a column. test (ABC) ABC(DEF)G ABC Desired Output test ABC DEF ABC This is what I tried: df['test'] = df['test']....
Coding_Nubie's user avatar
0 votes
2 answers
66 views

How to remove first string part of a transcript in a Dataframe?

I have a transcript column in a large data frame and I want to remove unhelpful messages from the beginning of the transcript, something like hi or hello if the user types it. Transcript ...
Tal1992's user avatar
  • 73
2 votes
4 answers
61 views

Python, Regex everything before a sequence

I have a data frame column which contains the following strings. I want to parse everything to the left of the 4 or 5 digit number into a separate column column name 0 129 4,029.16 08-31-13 8043 ...
Nev1111's user avatar
  • 1,049
0 votes
1 answer
69 views

pyspark : conditionally transform columns based on multiple columns

I have a dataframe where the ID column is represented in different ways and with different character lengths, and I am trying to make it uniform. How can I do this with multiple conditions? I tried the ...
Datamaniac's user avatar
-1 votes
1 answer
54 views

How can I search a value from a df inside another df?

I am trying to search for a value from one df inside another df. I will explain it with a table, which should be clearer. df (the comment column is for better comprehension and is not necessary) https://pastebin.com/...
Teddy's user avatar
  • 143
0 votes
1 answer
45 views

Remove solo symbols in Pandas DataFrame

I'm trying to remove any instances of a symbol that is all by itself. I want to keep any other uses of symbols and remove a symbol only if it is by itself. This dataframe will have a mixture of letters and ...
Jared's user avatar
  • 47
1 vote
2 answers
36 views

Removing the numbers that appear at the end of text in a column of a data frame

I want to do a bit of cleaning and remove the numbers that appear at the end of text, but the cleaning differs depending on the ids. I know how to do it when it is a dictionary and I developed the ...
ella's user avatar
  • 201
0 votes
1 answer
40 views

Remove the last number in data frame columns depending on their data frames - optimizing the code

I wrote the code as follows to remove the last number in data frame columns depending on their data frame. Is there a better way of implementing it instead of using a lot of if statements? I want to remove all ...
ella's user avatar
  • 201
1 vote
1 answer
81 views

How to filter pandas dataframe by a feature value ending with Case sensitive letter

I have a data frame like this: df: C1 C2 Ford 11 ram 13 SUV 19 SEDAN 14 I want to filter the data frame on column C1, keeping rows where the C1 value ends with an uppercase character. So the expected output ...
Yash's user avatar
  • 357
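
A small sketch of the filter described in the question above, written with the standard `re` module on plain tuples so it runs without pandas. It assumes "upper case" means ASCII A-Z; the expected output in the excerpt is cut off, so the result here is only what this pattern produces.

```python
import re

# Rows from the question's example frame, as (C1, C2) pairs.
rows = [("Ford", 11), ("ram", 13), ("SUV", 19), ("SEDAN", 14)]

# Assumption: "upper case" means ASCII A-Z.
# "$" anchors the match at the end of the string.
ends_upper = re.compile(r"[A-Z]$")

filtered = [(c1, c2) for c1, c2 in rows if ends_upper.search(c1)]
print(filtered)
```

On a DataFrame, the same pattern could be used as a boolean mask, for example `df[df["C1"].str.contains(r"[A-Z]$")]`, since `Series.str.contains` treats the pattern as a regex and is case sensitive by default.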
0 votes
1 answer
97 views

how to rename dataframe columns that match regex

I'm trying to rename pandas dataframe columns that match a regular expression and leave others as they are. I have multiple dataframes and some of them do not contain columns which match the regex at ...
Zviad Melitskauri's user avatar
1 vote
3 answers
73 views

Iterating through columns in a dataframe, how do I turn the row values from objects to strings so I can use regular expressions?

I have a dataframe of 54k+ rows and 31 columns, and the last 10 columns are essays that I want to investigate. The aim of the regex I want to run is to strip out the punctuation. Running this ...
DanInco's user avatar
  • 13
-2 votes
1 answer
68 views

Pandas DataFrame: .replace() and .strip() methods returning NaN values

I read a pdf file into a DataFrame using tabula and used .concat() to combine it all into one DataFrame by doing the following: import pandas as pd import tabula df = tabula.read_pdf('card_details....
Adam Idris's user avatar
2 votes
1 answer
38 views

Python how to split columns each time there's a chunk of numbers in string

I have a table(dataframe) to clean up, each row looks like this: Column A Column B Cell 1 1234 abcd 667 randomthings Cell 3 4455 abcd abc 847 other randomthings 1 endings I want to split it into a ...
Samiiir's user avatar
  • 23
