173 questions
0
votes
0
answers
19
views
DataFrame.apply converts integer data to float even in absence of mixed numerical types
I am using DataFrame.apply() to calculate a new "Metric" column by taking an existing integer categorical column and looking up the integer in a list, i.e., indexing into the list. It works ...
2
votes
1
answer
62
views
advanced logic with groupby, apply and transform - compare row value with previous value and create new column
I have the following pandas dataframe:
d= {'Time': [0,1,2,0,1,2,2,3,4], 'Price': ['Auction', 'Auction','800','900','By Negotiation','700','250','250','Make Offer'],'Item': ['Picasso', 'Picasso', '...
1
vote
2
answers
35
views
DataFrame Creation from DataFrame.apply
I have a function that returns a pd.DataFrame given a row of another dataframe:
def func(row):
    if row['col1'] == 1:
        return pd.DataFrame({'a': [1], 'b': [11]})
    else:
        return pd....
0
votes
1
answer
40
views
Why is .apply / .map only running against the first column?
All!
I have two [2] pd.DataFrames: df_colors [20,3] and df_master [200,13].
I am attempting to update the values for each item in each row.
df_colors.head(5)
COLOR  CODE
WHITE  0
BLACK  1
GREEN  0
...
0
votes
0
answers
90
views
create multiple columns with df.apply that uses multi return function
I am creating a new column with df.apply and a function and would like to use the same function to create another column in the dataframe. The function returns several values from which I'm selecting ...
1
vote
3
answers
96
views
How do I sum a column based on separate category types, and preserve the zeros?
Here is my dataframe:
Id  Category  Hours
1   A         1
1   A         3
1   B         4
2   A         2
2   B         6
3   A         3
And here is the output I want:
Id  Total Hours  A_Hours  B_Hours
1   5            4        4
2   8            2        6
3   3            3        0
How do I achieve this?
I ...
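One possible approach to the question above, as a minimal sketch: group by `Id` and `Category`, sum the hours, and unstack with `fill_value=0` so that missing categories show up as zeros. It assumes `Total Hours` is meant to be the sum of the per-category columns, so the total for Id 1 comes out as 8 rather than the 5 shown in the desired output. The variable names `wide` and `result` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "Id": [1, 1, 1, 2, 2, 3],
    "Category": ["A", "A", "B", "A", "B", "A"],
    "Hours": [1, 3, 4, 2, 6, 3],
})

# Sum hours per (Id, Category), then pivot categories into columns.
# fill_value=0 keeps a zero where an Id has no rows for a category.
wide = (
    df.groupby(["Id", "Category"])["Hours"]
    .sum()
    .unstack(fill_value=0)
    .add_suffix("_Hours")
)

# Assumption: the total is the sum of the per-category columns.
wide.insert(0, "Total Hours", wide.sum(axis=1))

result = wide.reset_index()
result.columns.name = None  # drop the leftover "Category" axis label
print(result)
```

Using `unstack(fill_value=0)` rather than filling afterwards keeps the columns as integers.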
0
votes
0
answers
25
views
Apply logical comparison but truth value ambiguous
I am trying to map current resources to available resources in GCP.
I want to check the current RAM and vCPUs (calculated elsewhere) then create a new column for each machine series that ...
1
vote
0
answers
50
views
Apply a function with multiple arguments to a large Pandas dataframe efficiently
My dataframe (1,957,046 x 4) is of baby names by year, count and gender, as follows:
Year  Name  Gender  Count
1880  A     F       1
1880  B     M       5
1880  C     F       2
...   ...   ...     ...
2018  X     M       7
2018  Y     F       4
2018  Z     M       2
I ...
1
vote
1
answer
79
views
Vectorized dataframe filtering with complex logic
I have a very big dataframe with five columns: ID and four numerical columns, say integers between 0 and 50. My goal is to calculate a cosine similarity matrix for every ID.
However, I want to force some ...
2
votes
1
answer
2k
views
Get correlation per groupby/apply in Python Polars
I have a pandas DataFrame df:
d = {'era': ["a", "a", "b","b","c", "c"], 'feature1': [3, 4, 5, 6, 7, 8], 'feature2': [7, 8, 9, 10, 11, 12], '...
0
votes
1
answer
45
views
Assigning values to columns inside df.apply()
I need to assign multiple values to multiple columns inside a pandas.DataFrame.
What I want to do looks like that:
df.apply(
lambda x: x['card_{}'.format(card)] = score
for card, score in zip(...
0
votes
1
answer
145
views
Speed up groupby rolling apply utilising multiple columns
I'm trying to create a Brier Score for a grouped rolling window. As the function that calculates the Brier Score utilises multiple columns in the grouped rolling window I've had to use the answer here ...
1
vote
1
answer
71
views
index compatibility of dataframe with multiindex result from apply on group
We have to apply an algorithm to columns in a dataframe, the data has to be grouped by a key and the result shall form a new column in the dataframe. Since it is a common use-case we wonder if we have ...
1
vote
1
answer
30
views
How do I keep values based on dataframe values?
I have the following dataframe.
ID path1 path2 path3
1 12 NaN NaN
1 1 5 NaN
1 2 NaN ''
1 2 4 111
2 123 NaN NaN
3 11 ...
0
votes
1
answer
185
views
Add columns to Dataframe when apply custom function that returns dictionary
def tFunc(row):
    if random.random() > 0.5:
        info = {'A': 'a', 'B': 'b', 'C': 'c', 'D': 'd'}
    else:
        info = {'A': 'a', 'B': 'b', 'C': 'c'}
    return info
# Workaround
# for ...
0
votes
1
answer
41
views
How to create a column in a dataframe based on another value in the row (Python)
I have the following data:
country      code  continent  plants  invertebrates  vertebrates  total
Afghanistan  AFG   Asia       5       2              33           40
Albania      ALB   Europe     5       71             61           137
Algeria      DZA   Africa     24      40             81           145
I want to ...
2
votes
3
answers
1k
views
Pandas : Concat rows of a dataframe with same index to form custom string in pairs
Say I have a dataframe
df = pd.DataFrame({'colA' : ['ABC', 'JKL', 'STU', '123'],
'colB' : ['DEF', 'MNO', 'VWX', '456'],
'colC' : ['GHI', 'PQR', 'YZ', '789'],}, ...
1
vote
1
answer
462
views
Pandas : Prevent groupby-apply to sort the results according to index
Say I have a dataframe,
dict_ = {
'Query' : ['apple', 'banana', 'mango', 'bat', 'cat', 'rat', 'lion', 'potato', 'london', 'new jersey'],
'Category': ['fruits', 'fruits', 'fruits', 'animal', '...
1
vote
1
answer
289
views
Apply T-Test test per group
I have a dataframe like this:
features_df = pd.DataFrame({
'group': np.array([0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1]),
'variable': ['var1'] * 8 + ['var2'] * 8,
'value': np.array([5.582443, 7....
0
votes
1
answer
378
views
How to improve performance of pandas.apply() to perform text cleaning operations on large sized pandas column?
I'm working on a tweet dataset where one column is the text of the tweet. The following function cleans each tweet, which involves removing punctuation and stopwords, lowercase conversion, ...
0
votes
3
answers
51
views
Check within a column if a certain value is contained, if yes set a value
I have a problem. I want to run a loop through the whole series and check if it contains a certain value. If this row contains a certain value, it should be set to true. I get the following error: ...
1
vote
0
answers
261
views
Numba - how to return multiple columns ( arrays) - after group by apply
I would like to run groupby and then apply a Numba function on top of pandas.
This is the example :
@nb.jit(nopython=True)
def my_Numba_function(arr1, arr2):
    arr1[:] = 11
    arr2[:] = 22
...
0
votes
1
answer
58
views
pandas df.apply Including condition with NaN values
I'm trying to replace outliers and NaN values in my pandas.DataFrame with the mode of the series, using the apply method and a lambda function and filtering by a property. I've tried in three ...
0
votes
1
answer
889
views
Python Dataframe add two columns containing lists
I have a dataframe of two columns, each containing lists as elements. I want to perform row-wise element addition of the two lists. My solution below is inspired by this answer Element-wise addition of 2 ...
0
votes
2
answers
996
views
DataFrame apply/append a function that returns a dict to each row
I'm looking to apply get_sentiment to each row in a dataframe and have the returned dict append to that row. Is there a good way of doing this?
def get_sentiment(txt: str) -> dict:
response = ...
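A minimal sketch of one way to attach a returned dict to each row, under stated assumptions: the body of `get_sentiment` is elided in the excerpt, so `fake_sentiment` below is a hypothetical stand-in, and the column name `text` is also assumed. The idea is to build a second DataFrame from the list of dicts and join it back on the index.

```python
import pandas as pd

def fake_sentiment(txt: str) -> dict:
    # Hypothetical stand-in for get_sentiment; the real function's body
    # is not shown in the question, so these keys are invented.
    return {"length": len(txt), "has_exclaim": "!" in txt}

# Assumed input: a single text column named "text".
df = pd.DataFrame({"text": ["good!", "bad"]})

# Apply the function per value, then expand the dicts into columns.
# Reusing df.index keeps the new columns aligned with the original rows.
expanded = pd.DataFrame(df["text"].map(fake_sentiment).tolist(), index=df.index)

result = df.join(expanded)
print(result)
```

Building the frame once from a list of dicts usually avoids the per-row overhead of returning a `pd.Series` from `apply`, though whether that matters depends on the data size.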
0
votes
1
answer
301
views
How to apply a function that takes multiple arguments to a pandas DataFrame
I want to create two functions, apply those functions on the DataFrame, and return the result to column interval_ratio
import seaborn as sns
import pandas as pd
import numpy as np
max_testing_data = ...
0
votes
1
answer
55
views
Pandas locate and apply changes to column
This is something I always struggle with and is very beginner. Essentially, I want to locate and apply changes to a column based on a filter from another column.
Example input.
import pandas as pd
...
0
votes
1
answer
318
views
How to return a DataFrame when using pandas.apply()
I'm trying to get a concatenated DataFrame using pandas.apply(); there is a demo below:
As the code shows, apply() returns a Series instead of the concatenated DataFrame that I expected. How can I ...
0
votes
1
answer
552
views
Pandas: error when creating a new column using a function that takes one argument from another column
I have the following data frame df:
df = pd.DataFrame({'result' : ['s17h10e7', 's5e3h2S105h90e15',
's17H10e7S5e3H2s105h90e15'],
'status' : [102, 117, ...
0
votes
1
answer
521
views
Python: How to apply TTEST_IND to multiple columns and multiple variants in a Dataframe?
Need a quick way to apply a t-test to multiple groups and multiple variables. Let's assume I have a table like this:
df = pd.DataFrame({'group': 'a a b b'.split(), 'B': [1,2,3,4], 'C': [4,6, 5,10]})
...
1
vote
1
answer
562
views
Pandas simple groupby and apply complains "Columns must be same length as key"
Essentially I have a table of timestamps and some data and want to group by the same timestamps and change the timestamps on a grouping basis. I got something working with Interpolate seconds to ...
2
votes
3
answers
237
views
Pandas Groupby and Apply
I am performing a groupby and apply over a dataframe that returns some strange results. I am using pandas 1.3.1.
Here is the code:
ddf = pd.DataFrame({
"id": [1,1,1,1,2]
})
def ...
5
votes
1
answer
147
views
Why does pandas.GroupBy.apply() ignore the sort flag in some situations?
When and why is the sort flag of a DataFrame grouping ignored in pd.GroupBy.apply()? The problem is best understood with an example. In the following 4 equivalent solutions to a dummy problem, ...
2
votes
1
answer
4k
views
faster alternatives to .apply() in pandas
I am trying to speed up the process of applying a custom function to columns in a data frame. I have found that this:
b = b.apply(lambda x: 'not_ticker' if x is None else x)
b = b.apply(lambda x: x if ...
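For the first of the two `apply` calls above, a vectorized replacement could look like the sketch below. It assumes `b` is a Series of strings with `None` for missing entries (the sample values are made up). Note one difference: `fillna` also replaces `NaN`, whereas the original lambda only replaces `None`. The second `apply` call is truncated in the excerpt, so it is not covered here.

```python
import pandas as pd

# Assumed sample data; the real contents of b are not shown in the question.
b = pd.Series(["AAPL", None, "MSFT"], dtype=object)

# Original element-wise version from the question.
slow = b.apply(lambda x: 'not_ticker' if x is None else x)

# Vectorized alternative: fill missing values in one call.
# Caveat: this treats NaN as missing too, not only None.
fast = b.fillna('not_ticker')

print(fast.tolist())
```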
0
votes
2
answers
2k
views
How to Compare Multiple Columns, and Produce Values in single New Column , Using Apply Function in Pandas
Using the apply function in pandas, I want to compare multiple columns in a DataFrame to see if their values are higher or lower than a numerical value. Then, based on the result of the condition, if ...
1
vote
0
answers
599
views
Python Type Error: Series.name must be a hashable
I have a grouped dataset (by groupby) called level_temp_grouped and I want to apply a function to two boolean columns namely, up and fill_cand, for each group.
level_temp['neighbor'] = ...
0
votes
1
answer
605
views
Count values using groupby function and using apply function at the same time
I'm trying to count the occurrences of grouped values and write the values to a column using the apply and groupby functions on a dataframe. I have the following data frame:
df = pd.DataFrame({'colA': ['name1', '...
1
vote
2
answers
98
views
How can I apply multiple functions involving multiple columns of a pandas dataframe with groupby?
Considering the following dataframe:
   id  cat  date   max  score
1  s1  A    12/06  9    5.4
2  s1  B    12/06  10   5.4
3  s2  C    11/04  13   4.2
4  s2  D    11/04  28   10
5  s3  E    08/02  16   5.4
5  s3  F    08/02  6    5.4
I want to group ...
0
votes
1
answer
396
views
Using Panda's Apply Function to loop through List of Coordinates (csv) in Google Places API query with Python
For a project, I have a csv file with 60 coordinates and the radius (for each centroid of a city district) I want to get my Google Maps results of. Aim is to loop through the coordinates and the ...
0
votes
2
answers
875
views
Pandas df.apply function returns None [closed]
What I'm trying to do:
Pass a column through a regex search in order to return a value that will be added to another column
How:
By writing a function with simple if-else clauses:
def category(series):
...
0
votes
2
answers
197
views
Keep NaN groups when using GroupBy apply
I'm looking to keep the structure when using apply on a GroupBy object where some groups are NaN. Using dropna=False does not appear to help, NaN groups are still lost with apply.
mux = pd.MultiIndex....
1
vote
1
answer
881
views
How to iterate over an array using a lambda function with pandas apply
I have the following dataset:
0 1 2
0 2.0 2.0 4
0 1.0 1.0 2
0 1.0 1.0 3
3 1.0 1.0 5
4 1.0 1.0 2
5 1.0 NaN 1
6 NaN 1.0 1
...
0
votes
1
answer
121
views
Issues Converting Python Code Block to Function
There is a block of code I use regularly in my analysis to standardize the description of the types of devices used by customers to access an internet provider's services. The block of code is as ...
1
vote
0
answers
67
views
how does DataFrameGroupBy.apply handle large dataframes with duplicate index in pandas?
Suppose that we have a large dataframe with duplicate index,
# IPython
In [1]: import pandas as pd
In [2]: from numpy.random import randint
In [3]: df = pd.DataFrame({'a': randint(1, 10, 10000)}, ...
0
votes
1
answer
392
views
How to make decimal part rounding filter, using python pandas Dataframe apply method
I want to make a decimal filter with a pandas DataFrame.
The filter will apply ceiling or floor to each value based on its decimal part.
Like this
threshold is 0.3 and 0.7
0.75 -> 1
1.99 -> 2
9.13 -> 9
326.2 -> 326 ...
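A rough sketch of such a filter without `apply`, under assumptions the excerpt does not confirm: values whose fractional part is at least 0.7 are rounded up, values whose fractional part is at most 0.3 are rounded down, and anything in between is left unchanged. The function name `round_decimal_part` and its parameters are hypothetical.

```python
import numpy as np
import pandas as pd

def round_decimal_part(s: pd.Series, low: float = 0.3, high: float = 0.7) -> pd.Series:
    """Ceil values with a large fractional part, floor those with a small one.

    Assumption: values with low < fraction < high are returned unchanged,
    and the thresholds themselves are treated as inclusive.
    """
    frac = s - np.floor(s)                      # fractional part of each value
    out = s.mask(frac >= high, np.ceil(s))      # round up near the next integer
    out = out.mask(frac <= low, np.floor(s))    # round down near the lower integer
    return out

values = pd.Series([0.75, 1.99, 9.13, 326.2, 2.5])
result = round_decimal_part(values)
print(result.tolist())
```

The same function can be used on a whole DataFrame column by column, for example with `df.apply(round_decimal_part)`.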
0
votes
1
answer
81
views
Enhancing performance of pandas groupby&apply
These days I've been stuck on the problem of speeding up groupby & apply. Here is the code:
dat = dat.groupby(['glass_id','label','step'])['equip'].apply(lambda x:'_'.join(sorted(list(x)))).reset_index()
...
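One option that may be faster than the lambda above, sketched on made-up sample data: sort the `equip` column once up front, then let `groupby` join the strings. This relies on `groupby` preserving the row order within each group, so the per-group `sorted(list(x))` is no longer needed. Whether it actually helps depends on the number and size of groups, so it is worth timing on the real data.

```python
import pandas as pd

# Assumed sample data using the column names from the question.
dat = pd.DataFrame({
    "glass_id": [1, 1, 1, 2, 2],
    "label": ["a", "a", "a", "b", "b"],
    "step": [1, 1, 1, 2, 2],
    "equip": ["z", "x", "y", "q", "p"],
})

keys = ["glass_id", "label", "step"]

# Original approach: sort inside every group via a Python lambda.
slow = dat.groupby(keys)["equip"].apply(lambda x: '_'.join(sorted(list(x)))).reset_index()

# Alternative: one global sort, then a plain string join per group.
fast = (
    dat.sort_values("equip")
    .groupby(keys)["equip"]
    .agg("_".join)
    .reset_index()
)

print(fast)
```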
0
votes
1
answer
114
views
Problems with Dataframe.apply() in combine phase
Issue
I am trying to use DataFrame.apply() to add new columns to a dataframe. The number of columns being added is dependent on each row of the original dataframe. There is overlap between the columns ...
1
vote
2
answers
246
views
Python pandas dataframe apply result of function to multiple columns where NaN
I have a dataframe with three columns and a function that calculates the values of columns y and z given the value of column x. I need to calculate the values only if they are missing (NaN).
def ...
0
votes
1
answer
33
views
Make apply return two series
Say I have the following dataframe
id | dict_col
---+---------
1 {"age":[1,2],"name":["john","doe"]}
2 {"age":[3,4],"name":["foo&...
1
vote
1
answer
230
views
Pandas, apply simple function to NaN returns value instead of NaN?
import pandas as pd
import numpy as np
pd.DataFrame(
{'a':[0,1,2,3],
'b':[np.nan, np.nan, np.nan,3]}
).apply(lambda x: x> 1)
returns False for column b, whereas I would like to get ...
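The behaviour follows from how comparisons treat missing values: `NaN > 1` evaluates to `False`. Assuming the goal is to keep `NaN` wherever the input was `NaN` (the end of the excerpt is cut off, so this is a guess), one sketch is to do the comparison on the whole frame and then mask the result with `notna()`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'a': [0, 1, 2, 3],
     'b': [np.nan, np.nan, np.nan, 3]}
)

# Compare everything at once, then blank out positions that were NaN in df.
# where() keeps values where the condition holds and inserts NaN elsewhere.
result = (df > 1).where(df.notna())

print(result)
```

Column `b` can no longer be a plain boolean column once it holds `NaN`, so expect its dtype to change.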