All Questions
721 questions
2
votes
3
answers
123
views
Remove special character and units from Pandas column name with Python [duplicate]
I'm working on a script to convert a data file from one format to another. I need to remove the special characters from the column headers.
I am using Pandas to read a CSV file with the below ...
1
vote
3
answers
77
views
Merge multiline rows in pandas dataframe based on regex pattern
I have a single column dataframe similar to this:
cat = { 'cat': ['a','b','c-',' -d','e']}
df = pd.DataFrame(cat)
>>> print(df)
cat
0 a
1 b
2 c-
...
2
votes
2
answers
63
views
Is there a better way to replace all non-ASCII characters from specific columns on a DataFrame?
There are some sentences and words in Chinese and Japanese that I just want to drop.
Or if there is a better solution than just dropping them, I would like to explore them as well.
import pandas as pd
...
1
vote
1
answer
49
views
Create new column in dataframe based on regex pattern from different column
I have a dataframe with two columns, date of birth ("PN") (Swedish version YYMMDD-NNNN) and date of analysis. I would like to create a new column with only YYMMDD from the first one. But ...
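A minimal sketch of the extraction described in this excerpt, assuming every value follows the YYMMDD-NNNN layout. The sample values and the `birth_part` helper are hypothetical, not taken from the question. The same pattern could presumably be passed to pandas `Series.str.extract` on the "PN" column.

```python
import re

# Hypothetical sample values in the YYMMDD-NNNN layout described above.
pn_values = ["850312-1234", "990101-0007", "not a number"]

# Capture the six digits before the hyphen, but only when four digits follow.
PN_PATTERN = re.compile(r"^(\d{6})-\d{4}$")


def birth_part(pn):
    """Return the YYMMDD part, or None when the value does not match."""
    match = PN_PATTERN.match(pn)
    return match.group(1) if match else None


birth_dates = [birth_part(pn) for pn in pn_values]
print(birth_dates)
```

Values that do not match the layout come back as None, so malformed rows are easy to spot.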
1
vote
4
answers
291
views
Pandas Extract Phone Number if it is in Correct Format
I have a column that has phone numbers. They are usually formatted in (555) 123-4567 but sometimes they are in a different format or they are not proper numbers. I am trying to convert this field to ...
2
votes
1
answer
108
views
How to remove specific parts of text from string?
I am using Python. I have a dataframe with string "description". In these strings, I have things like: "п. 5.6.2 ГОСТ 2.114-2016", "п. 4.1 ГОСТ 2.102-2013", "п.5 ...
0
votes
0
answers
26
views
Assign output of expanded pandas str.extract to new columns of same dataframe when rows are filtered with .loc [duplicate]
I have a dataframe with some registration types and some alphanumeric registration numbers for certain types. If a row has a certain type, I need to perform a regex operation on the relevant number, ...
-2
votes
1
answer
141
views
Parsing data with Pandas - how to output match as a new column
I've got a routine to read in a CSV file and spit out selected columns that match specific criteria:
CSV Input File looks like this
Name  Role             Login
Phil  Role A | Role B  2024/01/01
Bob   Role A | Role ...
0
votes
1
answer
307
views
Pandas dataframe: writing specific values to a file with specific formatting?
I want to update external files. I want to do this by writing new data from Pandas dataframes to the files in a particular format.
Here is an example of the first two blocks of data in a file (what I ...
0
votes
1
answer
66
views
Regex for column not producing expected output
I have this dataframe:
dfsupport = pd.DataFrame({'Date': ['8/12/2020','8/12/2020','13/1/2020','24/5/2020','31/10/2020','11/7/2020','11/7/2020'],
'Category': ['Table','Chair','...
1
vote
1
answer
75
views
Parsing Text data and converting the parsed data into dataframe
I have text data and I want to parse the content and extract the Brand, MPN, Condition, Qty, Price, and Customer name.
The text data is in the format:
---ABB INSTALLATION PRODUCT---
54905BE06, NEW, qty ...
1
vote
1
answer
92
views
Detect rows in a column containing only emojis in a data frame
How to detect rows in a column containing only emojis in a data frame? The rows containing text with emojis will not be considered.
Given DF:
content
😎🤘🏾
Wow Amazing!!!
I am loving it😍😘
🤘🏾 ...
0
votes
1
answer
63
views
Python dataframe split a string column into many
The data I import comes in an irregular fashion.
df =
# following data all in one column
1 CABATT CAR BATTERY VOLTAGE -10.0 200.0
2 CPTEMP CAR DAS PANEL TEMP C -10.0 200.0
3 CAPTMA CAR PANEL A ...
0
votes
0
answers
45
views
Is there a way to slice a dataframe when the characteristic variable has non-printable characters?
I imported multiple CSV files from census data. I was able to successfully bring in the files and append the three files (axis 0) one on top of the other. I was able to put the data into a dataframe ...
1
vote
1
answer
77
views
Literal match of strings in dataframe to other dataframe with multiple match options
I have a dataframe (df) with values in a column 'country' that I wish to standardize, using another dataframe called 'country_codes'. A value from df can match against any item from 'country_codes', ...
0
votes
0
answers
36
views
How do I delete symbols I scraped from websites in a csv file using pandas
I am trying to analyse airline reviews stored in a single-column dataframe containing only the text of the reviews. Unfortunately, there is a tick symbol ✅ and the special character | in each row showing ...
1
vote
0
answers
19
views
Python - Unable to Replace Long String Values in DataFrame using pd.replace or re.sub [duplicate]
I have a dataframe which consists of a column with weird formatted string output. The data file is about 50k lines but the strings are repeating, there are about 10 unique strings in it. I am trying ...
0
votes
0
answers
46
views
To extract a number after apostrophe in a string [duplicate]
I'm looking for a way to extract the number after an apostrophe in a string.
Example String = "abc '22 xyz"
What regular expression can we use to extract 22 from the above string? Are there ...
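One possible approach for the example string above, as a sketch rather than the accepted answer: match the apostrophe literally and capture the digits that follow it. The variable names are illustrative.

```python
import re

text = "abc '22 xyz"

# A literal apostrophe followed by one or more captured digits.
match = re.search(r"'(\d+)", text)

# group(1) holds only the digits; fall back to None when nothing matches.
number = match.group(1) if match else None
print(number)  # 22
```

The result is a string; wrap it in `int()` if a numeric value is needed.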
1
vote
2
answers
73
views
After extracting numbers from dataframe, how can I calculate the average price after converting to one unit of measure uniformly
I have a dataset which contains rent prices for different properties. It looks like this:
data = {
'prices': [
'$350.00',
'$450.00 pw',
'$325 per week',
'$495pw - Views! ...
-1
votes
3
answers
160
views
Matching several string matches from lists and making a new row for each match
I have a data frame with text in one of the columns and I am using regex formatted strings to see if I can find any matches from three lists. However, when there are multiple matches from list 1, I ...
-1
votes
1
answer
82
views
Using a regex stored as a variable in Python [closed]
I read regexes and their replacements from a CSV into a dictionary and then run that over a column in a Dataframe looking for locations:
for regex, replacement in regex_replace.items():
df["...
1
vote
2
answers
55
views
Tag data using matching keywords without using nested loops
I've been working on a problem for the past few days that I am unable to solve. Any help or advice would be greatly appreciated. Basically I have a dataset containing free text called mydata. Data
...
1
vote
1
answer
99
views
PyPolars efficient regex mass matching
I'm trying to mass match a column against a dictionary of regexes as follows :
import random
import time
col_to_map_possibilities = [str(x) for x in range(100)]
col_to_map_generated = random.choices(...
0
votes
1
answer
73
views
Iterate over rows in a data frame to search by regular expressions
I am trying to extract SQL table names from queries using regular expressions. For a single query this is done with re.findall:
import re
Query = ["SELECT * FROM WS_DE_Staging.stage_dual_h_20"]
...
-1
votes
1
answer
101
views
Joining two dataframes using regex
I'm trying to join 2 data frames using regex. In one data frame is Postcode Area (e.g. BA, M) in the other is Postcode District (e.g. BA1, M18). I want to join on the Postcode Area. My regex is ([A-Z][...
3
votes
3
answers
59
views
Remove a string from the column when it meets the condition
I would like to remove a string from a string column when it contains a lowercase letter (the string column may be NaN or include multiple strings in one row)
Column1  Column2  Column3
NaN      NaN      NaN
...
0
votes
0
answers
15
views
Converting multiple dates, using regex and strptime and applying results using lambda apply [duplicate]
I'm mapping date formats using regex, and I want to convert my dates to "%Y-%m-%d"
When testing with "%Y-%m-%d %H:%M:%S", my dates don't convert correctly. Can anyone see what I'm ...
1
vote
1
answer
66
views
How to extract a string of mixed number and text, and get the average numbers?
I have a column in my data frame that contains strings of numbers and text such as:
0 200 to 500 people
1 1 to 5 people
2 5000 people and over
3 2000 to 3000 people
I want to convert each ...
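The excerpt is cut off, so the following is only a sketch under the assumption (taken from the title) that each string should become the average of the numbers it contains. The `average_of_numbers` helper is a hypothetical name.

```python
import re
from statistics import mean

# The four sample strings shown in the excerpt.
sizes = [
    "200 to 500 people",
    "1 to 5 people",
    "5000 people and over",
    "2000 to 3000 people",
]


def average_of_numbers(text):
    """Average all integers found in the text; None when there are none."""
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    return mean(numbers) if numbers else None


averages = [average_of_numbers(s) for s in sizes]
print(averages)
```

A string with a single number, such as "5000 people and over", simply averages to that number.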
-3
votes
2
answers
59
views
Formatting a date column with regular expressions
I have the following problem. I have a dataframe with a date column. The date is saved as str and mostly in the following formats: df[dates]=[['5 July 2023'], ['Brussels, 18 September 1998'],['...
0
votes
2
answers
55
views
How to identify characters in the matched string that corresponded to dot in regex pattern?
I have the following dataframe. The column 'regex' shows my regex patterns and the column 'matched_string' contains the strings that are matched with them. What I would like to have, is another column ...
1
vote
1
answer
46
views
Split a column into 2 columns like alphabetic text in one column and alphanumeric or numbers or anything in 2nd column
I have a dataframe column which contains product and technical details merged. I just want to split them separately into 2 columns like actual product name in one column and other technical details in ...
2
votes
1
answer
41
views
I am having issues using Regex to parse a chat and turn it into a dataframe. - it is just skipping info
I am creating a function to parse the copied chat from a yahoo fantasy mock draft and turn it into a dataframe of drafted players. It seems to skip lines for reasons I cannot determine. I am using ...
-4
votes
1
answer
58
views
Adding elements to a list when specific pattern is found
When reading the file, I would like my script to sum lines following the specific pattern. The thing is that there is a header every certain number of lines. The sum of lines is supposed to happen between ...
0
votes
1
answer
27
views
How to obtain multiple partial strings from a dataframe?
I am trying to obtain multiple partial strings from my dataframe and put those partial strings as added columns to my dataframe. Below you will find a simple data sample:
df
Serienummer
15 SAA ...
1
vote
1
answer
45
views
How to extract date from a specified column containing different types of date formats of a given Pandas DataFrame using Regex
def find_valid_dates(dt):
    # Raw string so that \d is passed to the regex engine unchanged.
    result = re.findall(r"\d{1,2}-\d{2}-\d{2,4}|\d{1,2} (?:januari|februari|maart|april|mei|juni|juli|augustus|september|oktober|november|december) \d{1,4}", dt)
...
0
votes
3
answers
73
views
Extracting multiple info from string to create new columns
I have the following dataframe
df = pd.DataFrame({'info':{0:'1cr1:782906:F:He:1:Ho1:0:Ho2:0',
1:'5cr1:782946:G:He:1:Ho1:0:Ho2:0'}})
that looks like this
info
0 1cr1:...
0
votes
1
answer
77
views
Regex to suppress line breaks to aggregate groups of 3 lines into 1
I'm trying to fix a txt file format using python and regex. For that I used this post as a starting point, but I can't make it work with my file format.
Regex:
([^1|-])[\n](.)|(.)[\n]([^|-])
My file ...
0
votes
1
answer
65
views
Removing strings outside of parentheses in python
I have a dataset and need to remove parentheses from some rows within a column.
test
(ABC)
ABC(DEF)G
ABC
Desired Output
test
ABC
DEF
ABC
This is what I tried: df['test'] = df['test']....
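A sketch that reproduces the desired output listed above, assuming each value has at most one parenthesised part and that values without parentheses should be kept unchanged. The `inside_parentheses` helper is hypothetical and is not the asker's truncated attempt.

```python
import re

values = ["(ABC)", "ABC(DEF)G", "ABC"]


def inside_parentheses(text):
    """Return the text inside the first pair of parentheses, else the text itself."""
    match = re.search(r"\(([^)]*)\)", text)
    return match.group(1) if match else text


cleaned = [inside_parentheses(v) for v in values]
print(cleaned)  # ['ABC', 'DEF', 'ABC']
```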
0
votes
2
answers
66
views
How to remove first string part of a transcript in a Dataframe?
I have a transcript column in a large data frame and I want to remove unhelpful messages from the beginning of each transcript, such as hi or hello if the user types it.
Transcript ...
2
votes
4
answers
61
views
Python, Regex everything before a sequence
I have a data frame column which contains the following strings.
I want to parse everything to the left of the 4 or 5 digit number into a separate column
column name
0 129 4,029.16 08-31-13 8043 ...
0
votes
1
answer
69
views
pyspark : conditionally transform columns based on multiple columns
I have a dataframe where the ID column is represented in different ways and with different character lengths, and I am trying to make it uniform. How can I do this with multiple conditions? I tried the ...
-1
votes
1
answer
54
views
How can I search a value from a df inside another df?
I am trying to look up a value from one df inside another df. I will explain it with a table, which should make it clearer.
df (the comment column is for better comprehension and is not necessary)
https://pastebin.com/...
0
votes
1
answer
45
views
Remove solo symbols in Pandas DataFrame
I'm trying to remove any instances of a symbol that is all by itself. I want to keep any other uses of symbols and remove a symbol only if it is by itself. This dataframe will have a mixture of letters and ...
1
vote
2
answers
36
views
Removing the numbers that appear at the end of text in a column of a data frame
I want to do a bit of cleaning and remove the numbers that appear at the end of text, but the cleaning differs depending on the ids. I know how to do it when it is a dictionary and I developed the ...
0
votes
1
answer
40
views
Remove the last number in data frame columns depending on their data frames - optimizing the code
I wrote the code as follows to remove the last number in data frame columns depending on their data frame. Is there a better way of implementing it instead of using a lot of if statements?
I want to remove all ...
1
vote
1
answer
81
views
How to filter pandas dataframe by a feature value ending with Case sensitive letter
I have a data frame like this:
df:
C1 C2
Ford 11
ram 13
SUV 19
SEDAN 14
I want to filter the data frame column C1 where the C1 values end with an uppercase character. So the expected output ...
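A sketch of the filter described above, using plain tuples in place of the data frame so it runs without pandas. The pattern `[A-Z]$` assumes ASCII letters only. In pandas, the same pattern could presumably be used as a boolean mask via `Series.str.contains` on C1.

```python
import re

# The sample rows from the excerpt as (C1, C2) pairs.
rows = [("Ford", 11), ("ram", 13), ("SUV", 19), ("SEDAN", 14)]

# Keep rows whose C1 value ends with an uppercase ASCII letter.
ENDS_UPPER = re.compile(r"[A-Z]$")
filtered = [row for row in rows if ENDS_UPPER.search(row[0])]
print(filtered)  # [('SUV', 19), ('SEDAN', 14)]
```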
0
votes
1
answer
97
views
how to rename dataframe columns that match regex
I'm trying to rename pandas dataframe columns that match a regular expression and leave others as they are.
I have multiple dataframes and some of them do not contain columns which match the regex at ...
1
vote
3
answers
73
views
Iterating through columns in a dataframe, how do I turn the row values from objects to strings so I can use regular expressions?
I have dataframe of 54k+ rows and 31 columns, and the last 10 columns are essays that I want to investigate.
The aim of the regex I want to run is to strip out the punctuation.
Running this ...
-2
votes
1
answer
68
views
Pandas DataFrame: .replace() and .strip() methods returning NaN values
I read a pdf file into a DataFrame using tabula and used .concat() to combine it all into one DataFrame by doing the following:
import pandas as pd
import tabula
df = tabula.read_pdf('card_details....
2
votes
1
answer
38
views
Python how to split columns each time there's a chunk of numbers in string
I have a table(dataframe) to clean up, each row looks like this:
Column A  Column B
Cell 1    1234 abcd 667 randomthings
Cell 3    4455 abcd abc 847 other randomthings 1 endings
I want to split it into a ...