147,557 questions
-1
votes
0
answers
18
views
Get rid of '0' in index column
I am trying to rename the index column to 'idx' and get rid of the 0 using this code:
df1.index.rename(name='idx', inplace=True)
However, I end up with the second dataframe as below. It results in messing up ...
0
votes
1
answer
34
views
merge several columns of the same data into one
I've created a dataframe that adds data from several sources. Here is an example subset:
index CompanyName Source1site Source2site Source3site City
1 Comp1 web1.com Nan ...
0
votes
1
answer
16
views
Debugging Error: Non-Numeric Argument in R Function for Calculating Animal Movement
I have an animal movement dataset ("data") that looks like this:
Data
ID Time x y u v
A 2008-02-01 12:00:00 9155834.12606686 -1085858.899 ...
2
votes
2
answers
64
views
How to use numpy.where in a pipe function for pandas dataframe groupby?
Here is a script to simulate the issue I am facing:
import pandas as pd
import numpy as np
data = {
'a':[1,2,1,1,2,1,1],
'b':[10,40,20,10,40,10,20],
'c':[0.3, 0.2, 0.6, 0.4, 0....
1
vote
2
answers
45
views
Update Pandas DataFrame slice row-wise using dictionary
Question
I am trying to update the values in a pandas (version 2.2.3) dataframe by row (such that each row has the same values) using a dictionary with row indices as keys and row values as values.
In ...
0
votes
0
answers
19
views
python df astype doesn't work on object from read_csv
I have a function that reads a CSV into a df. The CSV is quite large, but it's a combination of strings (categories) and numbers in different columns:
import pandas as pd
df_temp=pd.read_csv('somefile....
0
votes
0
answers
24
views
AttributeError: 'numpy.ndarray' object has no attribute 'categories'
Modin DataFrame Merge Issue After dropna on Categorical Column:
I'm encountering an issue when using Modin to merge DataFrames that contain categorical columns. The issue arose after I performed a ...
0
votes
1
answer
34
views
Indices mismatch during merge in pandas
I am trying to merge two dataframes in Python, pandas, df1 and df2.
I am trying to merge them on Column1, and then assign value of Column2 from df2 to df1.
This is my code:
df1 = df1.reset_index()
...
0
votes
1
answer
29
views
Pandas read_excel with nrows and skiprows and lazy-loading?
I am searching for ways to read an .xlsx file as chunks of dataframes, instead of loading the whole thing into memory. What exactly happens when I call pd.read_excel(nrows, skiprows, usecols)? Is the ...
0
votes
1
answer
28
views
Comparing every latitude and longitude in a dataframe
Probably in over my head here as I'm still learning R...
I have a dataframe containing a column with longitude values and another column with corresponding latitude values. I want to find long/lat ...
0
votes
3
answers
49
views
How to separate multiple tickers into individual dataframes with yfinance downloaded data
I'm trying to download stock data information using yfinance. Currently, I can successfully download a single ticker using yf.download which returns a dataframe with information I can use. This API ...
0
votes
0
answers
9
views
QTableView's ComboBox Delegate model is not synchronizing with the PandasModel
I've got a QTableView that uses a combobox delegate in 2 of the table's columns. The selected item from the combobox displays in the TableView correctly if no columns are sorted. When a column is ...
3
votes
2
answers
82
views
How to extract strings after specific symbols in one column and separate to multiple rows?
I have data that contains the nearest gene sets, including their genomic region and strand, in one column.
I want to make a new column for the single gene extracted from that column and separate them ...
-2
votes
0
answers
13
views
How to impute OPEN_CLS_STS based on values in DT_CLS in Python [duplicate]
I'm trying to impute the OPEN_CLS_STS based on the values in DT_CLS.
IF DT_CLS has a date populated then OPEN_CLS_STS should have a value 'C'. Otherwise OPEN_CLS_STS should have a value 'O'.
I tried ...
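A minimal sketch of the rule described above, assuming DT_CLS is a datetime column in which a missing close date shows up as NaT. The sample frame is made up for illustration, and np.where is only one of several ways to express the condition.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: None becomes NaT after conversion.
df = pd.DataFrame(
    {"DT_CLS": pd.to_datetime(["2024-01-15", None, "2024-03-02", None])}
)

# 'C' (closed) where a close date is populated, 'O' (open) otherwise.
df["OPEN_CLS_STS"] = np.where(df["DT_CLS"].notna(), "C", "O")

print(df)
```

If DT_CLS is stored as text instead, empty strings would also need to be treated as missing before applying the same check.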
-1
votes
1
answer
45
views
How to impute OPEN_CLS_STS based on values in DT_CLS [duplicate]
I'm trying to impute the OPEN_CLS_STS based on the values in DT_CLS.
IF DT_CLS has a date populated then OPEN_CLS_STS should have a value 'C'. Otherwise OPEN_CLS_STS should have a value 'O'.
I tried ...
0
votes
0
answers
24
views
Creating separate groups in a dataframe when column values repeat [duplicate]
I have a dataframe with numbers formatted as follows:
df = pd.DataFrame({"ColumnA": [1,2,3,4,5,6,7,8,9,10], "ColumnB": [1,3,5,6,4,7,5,4,1,2], "ColumnC": [0,1,1,2,0,2,1,1,...
-1
votes
0
answers
43
views
How to convert complex JSON into a Parquet file
I need to convert the following JSON into a Parquet file. Converting this kind of JSON with PySpark is really easy, but the complication here is that I have nested children and have to do more than one explode, and ...
0
votes
1
answer
71
views
Manipulation of a Pandas dataframe most time- and memory-efficiently
Please imagine I have a dataframe like this:
df = pd.DataFrame(index=pd.Index(['1', '1', '2', '2'], name='from'), columns=['to'], data= ['2', '2', '4', '5'])
df:
Now, I would like to calculate a ...
1
vote
2
answers
39
views
Pandas Dataframe Multiindex - Calculate Mean and add additional column to each level of the index
Given the following dataframe:
Year 2024 2023 2022
Header N Result SD N Result SD N Result SD
Vendor
A 5 20 3 5 22 4 1 21 3
B 4 25 2 ...
0
votes
1
answer
27
views
Order dataframe within pivot_wider function?
I have a dataframe in a longlist format with duplicate IDs. Each ID has a so-called donornr and timepoint (Tijdspunt). One ID (Deelnemernr.) can have duplicate timepoints like so:
Deelnemernr. ...
-1
votes
0
answers
49
views
My Exponential Moving Average calculations are still somehow wrong?
Where am I going wrong...
Here is my Python code which interacts with the MetaTrader5 API.
import numpy as np
import MetaTrader5 as mt5
import pandas as pd
from sklearn.preprocessing import ...
2
votes
1
answer
27
views
python dataframe slicing by row number
all Python experts,
I'm a Python newbie, stuck with a problem which may look very simple to you. Say I have a data frame of 100 rows, how can I split it into 5 sub-frames, each of which contains the ...
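A minimal sketch of one way to do this, assuming the goal is five consecutive blocks of 20 rows each (the excerpt is cut off, so that reading is an assumption). The 100-row frame and the column name "value" are placeholders.

```python
import pandas as pd

# Hypothetical 100-row frame standing in for the asker's data.
df = pd.DataFrame({"value": range(100)})

n_parts = 5
size = len(df) // n_parts  # 20 rows per piece; assumes len(df) divides evenly

# Slice by row position with iloc; each piece keeps its original index labels.
parts = [df.iloc[i : i + size] for i in range(0, len(df), size)]

for part in parts:
    print(part.index[0], part.index[-1], len(part))
```

Each element of parts is a view-like slice of the original frame, so call .copy() on a piece before modifying it independently.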
0
votes
0
answers
45
views
I cannot get all data to export to CSV
# Collect batting stats for the 2022, 2023, and 2024 seasons
try:
    print("Collecting batting stats from 2022 to 2024...")
    batting_data = batting_stats(2021, 2024, league="all"...
1
vote
1
answer
38
views
Comparing empty dataframes
I have a function, extract_redundant_values, to extract redundant rows from a pandas dataframe. I am testing it by running it on in_df to generate out_df. I am then comparing this against my expected ...
0
votes
1
answer
46
views
Loop over a date range and append new values to a new data frame
I wish to loop each row of the data frame below over each date of the date range below, check the following condition and return the current date of the date range in a new data frame with all columns we have ...
1
vote
1
answer
58
views
Fill in rows to dataframe based on another dataframe
I have 2 dataframes that look like this:
import pandas as pd
data = {'QuarterYear': ["Q3 2023", "Q4 2023", "Q1 2024", 'Q2 2024', "Q3 2024", "Q4 2024"]...
-1
votes
0
answers
43
views
Convert null values from a JSON file into an empty string
I need help from all of you.
I have to copy a json file from an S3 bucket to another S3 bucket, but this new json file must contain all the fields that have "null" value as an "" ...
1
vote
1
answer
47
views
Alternate background colors in styled pandas df that also apply to MultiIndex in python pandas
SETUP
I have the following df:
import pandas as pd
import numpy as np
arrays = [
    np.array(["fruit", "fruit", "fruit","vegetable", "vegetable", "...
0
votes
2
answers
77
views
Combine likert plot from appended data frame and bar plot from pure data frame in R using ggplot2
I have a data frame in R called df :
library(tibble)
library(tidyverse)
library(ggplot2)
library(ggstats)
var_levels <- c(LETTERS[1:20])
n = 500
likert_levels = c(
  "Very \n Dissatisfied"...
0
votes
0
answers
45
views
force a column of all NaNs to be seen as a string
I have two dataframes that I need to merge with an automated process with the corresponding details:
They are read as CSV and there is a corresponding type inference of the dtypes.
Most of the time, ...
1
vote
1
answer
35
views
How to style all cells in a row of a specific MultiIndex value in pandas
SETUP
I have the following df:
import pandas as pd
import numpy as np
arrays = [
    np.array(["fruit", "fruit", "fruit","vegetable", "vegetable", "...
1
vote
1
answer
37
views
Pyspark computation time increases with less data
I'm faced with a problem where I have to iterate the same computations on each row of data until they converge. My train of thought was to remove the converged rows after each iteration so the ...
0
votes
0
answers
18
views
Bad bean: I tried to create multiple databases for H2 and MySQL [closed]
Error creating bean with name 'mysqlEntityManagerFactory' defined in class path resource [com/example/bank/Configure/MysqlConfig.class]: No PersistenceProvider specified in EntityManagerFactory ...
0
votes
1
answer
31
views
Create an empty schema with struct inside
Hello guys, I have a small question today about something that I want to set when I create an empty dataframe.
I want to set an empty schema if the "data" field of the JSON that I receive is empty.
i ...
-1
votes
0
answers
40
views
XML to Pandas dataFrame [closed]
0 A 51 non-null object
1 B 51 non-null object
2 C 51 non-null object
3 D 45 non-null object
This is the info of the dataframe.
It is fine when I just return it ...
0
votes
0
answers
28
views
How to transform nested data to be used with a tabular learning network
I have a dataframe containing measurements and errors with their corresponding status and counter, and I want to feed it into e.g. TabNet.
Raw dataframe
protocolid  time  measurement_1  ...
0
votes
1
answer
37
views
How can I extract specific values from a .csv-File and add them into a specific cell in a pre-existing dataframe/tibble in R automatically?
I want to automatically extract specific values from a .csv-File, which is generated by our measuring device, into a dataframe/tibble in R which has a pre-defined layout. The name of the measured ...
0
votes
0
answers
33
views
Data retrieving and SQL database update
I'm trying to retrieve some data from an API and save them to a local database I created. All data come from Google Ads campaigns, and I need to make two separate calls because of their docs, but that'...
0
votes
0
answers
31
views
Why am I getting an error from: sns.lineplot(x=anomaly_df['Date'], y=scaler.inverse_transform(anomaly_df['Close/Last']))
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Input, Dropout
from keras.layers import Dense
from keras.layers import RepeatVector
from keras.layers import ...
0
votes
0
answers
17
views
Trying to iterate over different teams in mean data from a larger dataset [duplicate]
Basically, after the mean data for the teams' home and away games is taken, I want to plot multiple graphs for each team in one loop. Essentially, in the code below, where Arsenal is in quotes in the ...
0
votes
0
answers
21
views
I reduced a dataframe of times series, and now get "Error in replCmat4" and "Incompatible methods" error messages
I want to estimate a panel VAR model on a large set of data (130 companies, 6 variables, 2 identifiers over 10 years), so large I had to cut my sample by half to have enough memory to run the function....
1
vote
2
answers
77
views
How can I clean a year column with messy values?
I have a project I'm working on for a data analysis course, where we pick a data set and go through the steps of cleaning and exploring the data with a question to answer in mind.
I want to be able to ...
0
votes
3
answers
77
views
Pandas dataframe reshape with columns name [closed]
I have a dataframe like this:
>>> df
TYPE A B C D
0 IN 550 350 600 360
1 OUT 340 270 420 190
I want to reshape it to this shape:
AIN AOUT BIN BOUT CIN COUT ...
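A minimal sketch of one way to get that layout, assuming the target is a single row whose column labels are the original column name followed by the TYPE value (the target in the excerpt is truncated, so the single-row shape is an assumption). The intermediate names long, labels and wide are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"TYPE": ["IN", "OUT"], "A": [550, 340], "B": [350, 270],
     "C": [600, 420], "D": [360, 190]}
)

# Long format: one row per (TYPE, column) pair, ordered A-IN, A-OUT, B-IN, ...
long = df.melt(id_vars="TYPE", var_name="col", value_name="value")

# Build the combined labels (AIN, AOUT, ...) and lay the values out as one row.
labels = (long["col"] + long["TYPE"]).tolist()
wide = pd.DataFrame([long["value"].tolist()], columns=labels)

print(wide)
```

melt walks the value columns in order and keeps the row order within each, which is what produces the IN/OUT alternation here.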
0
votes
0
answers
25
views
Pandas Dataframe rolling mean of last 50 daily values differs from rolling("50D").mean() [duplicate]
I'm trying to find how the "50D" rolling mean is being calculated in the following example because really I cannot find the way.
import numpy as np
import pandas as pd
values = [np.nan, -0.00076194, -0....
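For context, a small sketch of how an integer window and an offset window can disagree. It uses made-up data with a 2-row and a "2D" window rather than the asker's values, and it only illustrates general pandas behaviour: an integer window counts rows and by default needs a full window, while an offset window covers a span of calendar time and by default needs only one observation.

```python
import pandas as pd

# Hypothetical daily series with a gap between 2 Jan and 5 Jan.
idx = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-05", "2024-01-06"])
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

# Integer window: always the last 2 rows, NaN until 2 rows are available.
by_count = s.rolling(2).mean()

# Offset window: every row whose timestamp falls in the last 2 calendar days.
by_time = s.rolling("2D").mean()

print(pd.DataFrame({"rolling(2)": by_count, 'rolling("2D")': by_time}))
```

On 5 Jan the row-based mean still includes the value from 2 Jan, while the time-based mean sees only the 5 Jan value, so the two columns differ wherever dates are missing or the window is not yet full.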
2
votes
1
answer
55
views
Dropping duplicates by column in PySpark
I have a PySpark dataframe like this but with a lot more data:
user_id  event_date
123      '2024-01-01 14:45:12.00'
123      '2024-01-02 14:45:12.00'
456      '2024-01-01 14:45:12.00'
456      '2024-03-01 14:45:12....
0
votes
0
answers
34
views
Create a new line for comma-separated values in a pandas column - I don't want to add new rows, I want to have the same rows in the output [duplicate]
I have a dataframe like this,
df
col1 col2
1 'abc,pqr'
2 'ghv'
3 'mrr, jig'
Now I want to create a new line for each comma-separated value in col2, so the output would look ...
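A minimal sketch, assuming "a new line" means a newline character inside the same cell so that the row count stays unchanged (the expected output in the excerpt is cut off, so this reading is an assumption).

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["abc,pqr", "ghv", "mrr, jig"]})

# Swap each comma (and any spaces around it) for a newline inside the cell.
df["col2"] = df["col2"].str.replace(r"\s*,\s*", "\n", regex=True)

for value in df["col2"]:
    print(repr(value))
```

Whether the line breaks are visible depends on where the frame is displayed; the plain console repr shows them as escaped characters.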
1
vote
3
answers
53
views
How can I count combinations of variables in R?
I'm trying to count the number of occurrences of combinations across two variables in a data frame in R.
If I have the following dataframe:
df <- data.frame(v1 = c("A", "A", "...
0
votes
1
answer
49
views
How can I change a column data type in pandas without creating null values in the whole column in my dataframe
I have been getting null values when trying to convert a column with non-numeric values to a column with numeric values.
I have been using the below code line to change my column data ...
0
votes
0
answers
47
views
How to Increase Precision of Decimal Points in Python DataFrames? [closed]
I am developing a system in Python that replicates another written in LabWindows. A part of the design involves calculating the Periodogram, which returns a decimal array. I then add this array to a ...
0
votes
0
answers
34
views
When I change the status and save the spreadsheet, the status I changed is not modified [closed]
I am comparing two excel spreadsheets, I select the first one that has all the data that should be in the system and the second spreadsheet has the data that was included in the system, after ...