All Questions
91,432 questions
2
votes
0
answers
23
views
streamlit update df interactively
I want to update a df interactively. The user selects a row, and then I want the displayed df to "disappear" and only the leftover options should stay visible.
In the original problem, ...
0
votes
2
answers
34
views
Cannot compare tz-naive and tz-aware timestamps
I'm getting the error below:
Cannot compare tz-naive and tz-aware timestamps
How can I convert dates to fix the issue? The error appears at the end of the code below.
from datetime import datetime, ...
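A minimal sketch of the usual cause of this error, assuming pandas `Timestamp` objects (the excerpt's own code is truncated, so the variable names and dates here are hypothetical). Ordering comparisons between a tz-naive and a tz-aware timestamp raise `TypeError`; making both sides naive or both sides aware is one way to resolve it.

```python
import pandas as pd

# Hypothetical values: one timestamp without a timezone, one with UTC.
naive = pd.Timestamp("2024-01-01 12:00")
aware = pd.Timestamp("2024-01-01 12:00", tz="UTC")

# Comparing the two directly raises TypeError.
raised = False
try:
    naive < aware
except TypeError:
    raised = True

# Option 1: attach a timezone to the naive value (assumes it really is UTC).
naive_as_utc = naive.tz_localize("UTC")

# Option 2: drop the timezone from the aware value, keeping the wall-clock time.
aware_as_naive = aware.tz_localize(None)

print(naive_as_utc <= aware, aware_as_naive <= naive)
```

For a whole column, the same methods are available through the `.dt` accessor, for example `df["col"].dt.tz_localize("UTC")`. Which option is appropriate depends on what timezone the naive values actually represent.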
0
votes
0
answers
95
views
How to efficiently make a large matrix of 1s and 0s
I have two numpy arrays x and y of same length, and I am trying to make a square matrix A such that the (i,j) entry of the matrix will contain a 1 if a certain relationship holds between x[i], x[j], y[...
1
vote
0
answers
37
views
Snowflake - Error while creating Temp View from snowpark dataframe
Hope you are all doing well.
I am facing a weird issue in Snowpark (Python) while creating a temp view from Dataframe.
I have searched online and while I have had hits, there is no proper solution.
...
0
votes
0
answers
31
views
problems when using Dask in a Dataframe in Python
A newbie here using parallel computing in Python
I have ~80 huge CSV files (32 GB each) that I need to process in Python to retrieve some rows from them.
The file structure is
'Barra', 'D1', 'D2','D3'...
0
votes
0
answers
19
views
plot pre-computed mean and confidence intervals for two types of firms (python)
I want to plot two time-series lines with mean and CI each for low and high diversity firms (variable div_ind) throughout 8 calendar years (variable calyr). May I know how to do it? Online sources ...
-2
votes
0
answers
40
views
How do I Get rid of '0' in index column?
I try to rename the Index column to 'idx' and get rid of 0 using this code:
df1.index.rename(name='idx', inplace=True)
However, I end up with the second dataframe as below. It results in messing up ...
1
vote
1
answer
48
views
merge several columns of the same data into one
I've created a dataframe that adds data from several sources. Here is an example subset:
index CompanyName Source1site Source2site Source3site City
1 Comp1 web1.com Nan ...
2
votes
2
answers
73
views
How to use numpy.where in a pipe function for pandas dataframe groupby?
Here is a script to simulate the issue I am facing:
import pandas as pd
import numpy as np
data = {
'a':[1,2,1,1,2,1,1],
'b':[10,40,20,10,40,10,20],
'c':[0.3, 0.2, 0.6, 0.4, 0....
1
vote
2
answers
63
views
Update Pandas DataFrame slice row-wise using dictionary
Question
I am updating the values in a slice of a pandas.DataFrame by row such that each row of the slice has a unique value. I am using pandas version 2.2.3.
I have found an approach that seems to work ...
0
votes
0
answers
24
views
python df astype doesn't work on object from csv_read
I have a function that reads a CSV into a df. The CSV is quite large, but it's a combination of strings (categories) and numbers in different columns:
import pandas as pd
df_temp=pd.read_csv('somefile....
0
votes
0
answers
31
views
AttributeError: 'numpy.ndarray' object has no attribute 'categories'
Modin DataFrame Merge Issue After dropna on Categorical Column:
I'm encountering an issue when using Modin to merge DataFrames that contain categorical columns. The issue arose after I performed a ...
0
votes
1
answer
42
views
Indices mismatch during merge in pandas
I am trying to merge two dataframes in Python, pandas, df1 and df2.
I am trying to merge them on Column1, and then assign value of Column2 from df2 to df1.
This is my code:
df1 = df1.reset_index()
...
0
votes
3
answers
54
views
How to separate multiple tickers into individual dataframes with yfinance downloaded data
I'm trying to download stock data information using yfinance. Currently, I can successfully download a single ticker using yf.download which returns a dataframe with information I can use. This API ...
0
votes
1
answer
13
views
QTableView's ComboBox Delegate model is not synchronizing with the PandasModel
I've got a QTableView that uses a combobox delegate in 2 of the table's columns. The selected item from the combobox displays in the TableView correctly if no columns are sorted. When a column is ...
-2
votes
0
answers
14
views
How to impute OPEN_CLS_STS based on values in DT_CLS in Python [duplicate]
I'm trying to impute the OPEN_CLS_STS based on the values in DT_CLS.
IF DT_CLS has a date populated then OPEN_CLS_STS should have a value 'C'. Otherwise OPEN_CLS_STS should have a value 'O'.
I tried ...
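The rule stated in this question (a populated `DT_CLS` means `'C'`, otherwise `'O'`) can be sketched with `numpy.where`. This is only an illustration under the assumption that missing dates are stored as `NaT`/`NaN`; the sample data is made up, and the asker's own attempt is not shown in the excerpt.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two populated close dates and one missing value.
df = pd.DataFrame({"DT_CLS": pd.to_datetime(["2024-01-15", None, "2024-03-02"])})

# 'C' where DT_CLS holds a date, 'O' where it is missing.
df["OPEN_CLS_STS"] = np.where(df["DT_CLS"].notna(), "C", "O")

print(df)
```

If blanks are stored as empty strings rather than missing values, they would need to be converted first, since `notna()` treats an empty string as populated.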
-1
votes
1
answer
48
views
How to impute OPEN_CLS_STS based on values in DT_CLS [duplicate]
I'm trying to impute the OPEN_CLS_STS based on the values in DT_CLS.
IF DT_CLS has a date populated then OPEN_CLS_STS should have a value 'C'. Otherwise OPEN_CLS_STS should have a value 'O'.
I tried ...
0
votes
0
answers
25
views
Creating separate groups in a dataframe when column values repeat [duplicate]
I have a dataframe with numbers formatted as follows:
df = pd.DataFrame({"ColumnA": [1,2,3,4,5,6,7,8,9,10], "ColumnB": [1,3,5,6,4,7,5,4,1,2], "ColumnC": [0,1,1,2,0,2,1,1,...
0
votes
1
answer
72
views
Manipulation of a Pandas dataframe most time- and memory-efficiently
Please imagine I have a dataframe like this:
df = pd.DataFrame(index=pd.Index(['1', '1', '2', '2'], name='from'), columns=['to'], data= ['2', '2', '4', '5'])
df:
Now, I would like to calculate a ...
1
vote
2
answers
40
views
Pandas Dataframe Multiindex - Calculate Mean and add additional column to each level of the index
Given the following dataframe:
Year 2024 2023 2022
Header N Result SD N Result SD N Result SD
Vendor
A 5 20 3 5 22 4 1 21 3
B 4 25 2 ...
-1
votes
0
answers
50
views
My Exponential Moving Average calculations are still somehow wrong?
Where am I going wrong...
Here is my Python code which interacts with the MetaTrader5 API.
import numpy as np
import MetaTrader5 as mt5
import pandas as pd
from sklearn.preprocessing import ...
2
votes
1
answer
27
views
python dataframe slicing by row number
all Python experts,
I'm a Python newbie, stuck with a problem which may look very simple to you. Say I have a data frame of 100 rows, how can I split it into 5 sub-frames, each of which contains the ...
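A short sketch of one way to split a 100-row frame into 5 sub-frames by row position. The excerpt is cut off before it says what each sub-frame should contain, so this assumes consecutive blocks of 20 rows; the column name `value` is hypothetical.

```python
import pandas as pd

# Hypothetical 100-row frame.
df = pd.DataFrame({"value": range(100)})

chunk_size = 20  # assumption: 5 equal, consecutive blocks

# iloc slices by position, so this works regardless of the index labels.
sub_frames = [
    df.iloc[start:start + chunk_size]
    for start in range(0, len(df), chunk_size)
]

print(len(sub_frames), [len(part) for part in sub_frames])
```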
0
votes
0
answers
46
views
I cannot get all data to export to CSV
# Collect batting stats for the 2022, 2023, and 2024 seasons
try:
print("Collecting batting stats from 2022 to 2024...")
batting_data = batting_stats(2021, 2024, league="all...
1
vote
1
answer
40
views
Comparing empty dataframes
I have a function, extract_redundant_values, to extract redundant rows from a pandas dataframe. I am testing it by running it on in_df to generate out_df. I am then comparing this against my expected ...
0
votes
1
answer
51
views
loop over date range and appending new values to a new data frame
I wish to loop each row of the data frame below over each date of the date range below, check the following condition and return the current date of the date range in a new data frame with all columns we have ...
1
vote
1
answer
60
views
Fill in rows to dataframe based on another dataframe
I have 2 dataframes that look like this:
import pandas as pd
data = {'QuarterYear': ["Q3 2023", "Q4 2023", "Q1 2024", 'Q2 2024', "Q3 2024", "Q4 2024"]...
1
vote
1
answer
49
views
Alternate background colors in styled pandas df that also apply to MultiIndex in python pandas
SETUP
I have the following df:
import pandas as pd
import numpy as np
arrays = [
np.array(["fruit", "fruit", "fruit","vegetable", "vegetable", ...
1
vote
1
answer
35
views
How to style all cells in a row of a specific MultiIndex value in pandas
SETUP
I have the following df:
import pandas as pd
import numpy as np
arrays = [
np.array(["fruit", "fruit", "fruit","vegetable", "vegetable", ...
1
vote
1
answer
39
views
Pyspark computation time increases with less data
I'm faced with a problem where I have to iterate the same computations on each row of data until they converge. My train of thought was to remove the converged rows after each iteration so the ...
-1
votes
0
answers
40
views
XML to Pandas dataFrame [closed]
0 A 51 non-null object
1 B 51 non-null object
2 C 51 non-null object
3 D 45 non-null object
This is the info of the dataframe.
It is fine when I just return it ...
0
votes
0
answers
30
views
How to transform nested data to be used with a tabular learning network
I have a dataframe containing measurements and errors with their corresponding status and counter, and I want to feed it into e.g. TabNet
Raw dataframe
protocolid
time
measurement_1
...
0
votes
0
answers
33
views
Data retrieving and SQL database update
I'm trying to retrieve some data from an API and save them to a local database I created. All data come from Google Ads campaigns, and I need to make two separate calls because of their docs, but that'...
0
votes
0
answers
31
views
Why am I getting an error from: sns.lineplot(x=anomaly_df['Date'], y=scaler.inverse_transform(anomaly_df['Close/Last']))
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Input, Dropout
from keras.layers import Dense
from keras.layers import RepeatVector
from keras.layers import ...
0
votes
0
answers
17
views
Trying to iterate over different teams in mean data from a larger dataset [duplicate]
Basically after the mean data for the teams home and away games is taken, I want to plot multiple graphs for each team in one loop, essentially, in the code below, where Arsenal is in quotes in the ...
1
vote
2
answers
78
views
How can I clean a year column with messy values?
I have a project I'm working on for a data analysis course, where we pick a data set and go through the steps of cleaning and exploring the data with a question to answer in mind.
I want to be able to ...
0
votes
3
answers
81
views
Pandas dataframe reshape with columns name [closed]
I have a dataframe like this:
>>> df
TYPE A B C D
0 IN 550 350 600 360
1 OUT 340 270 420 190
I want to reshape it to this shape:
AIN AOUT BIN BOUT CIN COUT ...
2
votes
1
answer
57
views
Dropping duplicates by column in PySpark
I have a PySpark dataframe like this but with a lot more data:
user_id
event_date
123
'2024-01-01 14:45:12.00'
123
'2024-01-02 14:45:12.00'
456
'2024-01-01 14:45:12.00'
456
'2024-03-01 14:45:12....
0
votes
0
answers
34
views
Create a new line for comma separated values in pandas column - I don't want to add new rows, I want to have same rows in output [duplicate]
I have a dataframe like this,
df
col1 col2
1 'abc,pqr'
2 'ghv'
3 'mrr, jig'
Now I want to create a new line for each comma separated values in col2, so the output would look ...
0
votes
1
answer
49
views
How can I change a column data type in pandas without creating null values in the whole column in my dataframe
I have been getting null values when trying to convert a column with non-numeric values to a column with numeric values
I have been using the below code line to change my column data ...
0
votes
0
answers
49
views
How to Increase Precision of Decimal Points in Python DataFrames? [closed]
I am developing a system in Python that replicates another written in LabWindows. A part of the design involves calculating the Periodogram, which returns a decimal array. I then add this array to a ...
0
votes
0
answers
28
views
Pandas DataFrame uses more memory than it claimed
My program is very simple. I run it in Jupyter Notebook. It loads data from MongoDB. I tried to store the data as pandas.DataFrame at first.
import pandas as pd
import pymongo
mongo = pymongo....
0
votes
3
answers
60
views
Add columns to dataframe from a dictionary
There are many answers out there to this question, but I couldn't find one that applies to my case.
I have a dataframe that contains ID's:
df = pd.DataFrame({"id": [0, 1, 2, 3, 4]})
Now, I ...
0
votes
1
answer
45
views
How to check pyspark dataframe column for incorrect value type using pytest? [closed]
I am trying to write a test to see if the spark dataframe has records with incorrect value type, but I'm stuck.
There is the dataframe:
schema1 = StructType(
[
StructField("id_key...
0
votes
0
answers
23
views
Reassigning pandas columns in chained .assign() gives incorrect values [duplicate]
I often follow the convention (for better or worse) of loading data and preprocessing manipulations in a single line of chained pandas commands. In one such manipulation, I need to multiply a set of ...
0
votes
2
answers
67
views
How to convert string scientific notation to float within a txt file
I have code in a .txt file that has scientific notation values stored as strings, and I am trying to convert them to floats so that I can perform calculations on them. However, when I try to attempt ...
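As a small illustration of the conversion this question describes: Python's built-in `float()` already parses scientific notation strings. The sample values below are hypothetical, and reading them out of the .txt file is left out because the file's layout is not shown in the excerpt.

```python
# Hypothetical strings as they might appear after reading the file.
raw_values = ["1.5e-3", "2.0E+04", " 6.02e23\n"]

# float() accepts both 'e' and 'E' and ignores surrounding whitespace.
floats = [float(value) for value in raw_values]

print(floats)
```

A `ValueError` from `float()` usually points to extra characters in the string (units, commas, quotes) that need to be stripped first.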
-1
votes
0
answers
51
views
Pandas read_excel is throwing an issue related to datetime conversion while reading an .xlsx or .xls file, but file doesn’t have any datetime columns
I am facing an issue with the code below:
I am trying to read .xlsx as well as .xls files.
df = pd.read_excel(filepath,sheet_name = "Package ID Informatio", header=hd, dtype=str)
Code is running well ...
1
vote
2
answers
78
views
Pandas dataframe - combine cell values as strings [duplicate]
I have a dataframe:
Email | Col1 | Col2 | Col3 | Name
--------------------------------------------------------------------
[email protected] | CellStr11 | 1.4 | CellStr13 |...
1
vote
2
answers
42
views
Pandas dataframe - finding row comparing two cell values
I have a dataframe:
Email | ... | Name
--------------------------------------
[email protected] | ... | John Cena
[email protected] | ... | John Cena
I need to find a row, that ...
1
vote
1
answer
37
views
Filter Pandas DataFrame when all IDs are blank [duplicate]
This is how I am populating my DataFrame:
import pandas as pd
data = {'ID1': ['BBG01Q69DW37', 'BBG01Q69DW37','BBG01Q69TEST','BBG01Q69TES1'],
'ID2': ['YU3384903', 'YU3384903','','YU338TES1'],
...
-2
votes
0
answers
26
views
Errors reading csv file from different URLs [duplicate]
I cannot figure out why the same approach in pandas cannot be used to read the CSV files at the following two URLs.
import pandas as pd
url1 = "https://data.ontario.ca/dataset/a2dfa674-a173-45b3-...