1,229 questions
-2
votes
0
answers
26
views
How to create a robust preprocessing function for any dataset? [closed]
I am working on a project where I predict the appropriate doctor specialization based on patient symptoms.
The project includes a feature where researchers can upload their own datasets and evaluate ...
0
votes
0
answers
6
views
no change using pyspark.ml.feature VectorAssembler
Following an example from Databricks with my own data, I can't get the VectorAssembler transformation to work.
string_indexer = StringIndexer(inputCol='ptype', outputCol='index_ptype', handleInvalid="...
3
votes
3
answers
127
views
How can I do one-hot encoding from multiple columns
When I search for this topic, I get answers that do not match what I want to do. Let's say I have a table like this:
Item   N1  N2  N3  N4
Item1   1   2   4   8
Item2   2   3   6   7
Item3   4   5   7   9
Item4   1   5   6   7
...
1
vote
2
answers
64
views
How do I onehotencode a single column in a dataframe?
I have a dataframe called "vehicles" with 8 columns. Seven are numerical, but the column named 'Car_name', which is at index 1 in the dataframe, is categorical. I need to encode it.
I tried this ...
1
vote
1
answer
88
views
Error in validate_column_names(): Missing required columns after applying recipe in Tidymodels workflow with XGBoost
I'm encountering an issue when using tidymodels with xgboost in a workflow. After applying a recipe that includes step_dummy() to convert categorical variables into dummy variables, I receive the ...
1
vote
2
answers
104
views
Represent categorical column as One-Hot Encoding using SQL
I want to represent a string column as binary 1 or 0 by pivoting the string column and using its values as headers in SQL (Snowflake). It would be the Python equivalent of pd.get_dummies where the ...
0
votes
1
answer
25
views
onehot encoding array shape does not match all labels in a pandas dataframe column
I am trying to use onehot encoding on a pandas dataframe column. The encoder generates 1582 features but when I proceed to merge these features to my original dataframe, I get the following error ...
0
votes
2
answers
82
views
One-hot-encoded table to tbl_summary
I have a one-hot-encoded tibble that contains the occurrence of certain arteries in specific tumours. These can be divided into 2 groups: primary and secondary.
If there would be only 1 occurrence ...
0
votes
0
answers
33
views
ValueError: setting an array element with a sequence when running Linear Regression model
I am doing a Kaggle competition on predicting house sales using regression models. I have converted the categorical data into floats using OneHotEncoder in a pipeline. However, I keep getting the ...
1
vote
0
answers
54
views
SMOTE Oversampling in Text Classification Fails with Multiple Input Features
I have a text classification problem where the input has 2 features: a text and a language:
The text is a string variable. The language is a string variable that has the following values: "EN"...
0
votes
0
answers
38
views
sklearn OneHotEncoder - create new dataframe with certain rows that have value = 1
I have a dataframe with 20 columns and 10000 rows. I applied OneHotEncoder (drop='first') to the entire dataframe which gave me 79 columns. I am trying to grab specific rows and create a new dataframe ...
3
votes
5
answers
273
views
Convert count row to one hot encoding efficiently
I have a table with rows in this format where the integers are a count:
A B C D E
0 a 2 0 3 x
1 b 1 2 0 y
I'd like to convert it into a format where each count is a one hot encoded ...
1
vote
1
answer
49
views
Transferring NaNs to Dummy Variables While Using One Hot Encoder
I am using OneHotEncoder to create a series of dummy variables based on a categorical variable. The problem I encounter is that any missing values are not transferred to the available dummy variables.
...
0
votes
1
answer
45
views
Sklearn : how to keep NaN values through OneHotEncoder?
INPUT
I have the following data:
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.impute import SimpleImputer
test_df = pd.DataFrame({'sex': ['...
0
votes
0
answers
13
views
ML model deployment, error in MinMaxScaler
This script is used to predict the risk of heart disease for a user based on the user's input data.
Extra Trees classifier with get_dummies encoder for categorical features and MinMaxScaler for numeric ...
1
vote
1
answer
39
views
Regarding onehotencoder space cost
Why doesn't one-hot encoding use bit-based encoding? Wouldn't it take much less memory? What I mean is when you encode for example four cities you can do it like what one-hot encoder does by expanding ...
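To make the comparison in this question concrete, here is a minimal sketch in plain Python. The city names and both helper functions are hypothetical, made up for illustration. It shows that a bit-based code does need fewer columns (2 instead of 4 for four categories), but the categories then share bits, so a model can read relationships into the codes that do not exist in the data. One-hot vectors avoid this because every category gets its own column.

```python
from math import ceil, log2

def one_hot(categories):
    """Map each category to a vector with a single 1 (one column per category)."""
    n = len(categories)
    return {c: [1 if i == j else 0 for j in range(n)] for i, c in enumerate(categories)}

def binary_code(categories):
    """Map each category to the bits of its integer index (ceil(log2(n)) columns)."""
    width = max(1, ceil(log2(len(categories))))
    return {c: [(i >> b) & 1 for b in reversed(range(width))] for i, c in enumerate(categories)}

cities = ["Paris", "Rome", "Oslo", "Lima"]  # hypothetical example values

oh = one_hot(cities)      # 4 columns, each city has its own column
bc = binary_code(cities)  # 2 columns, "Rome" and "Lima" share the low bit

for city in cities:
    print(city, oh[city], bc[city])
```

In practice the memory cost of one-hot encoding is often reduced by storing the result as a sparse matrix, which keeps only the non-zero entries.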
0
votes
0
answers
17
views
One-hot-encoding categories not seen by the validation set
I'm training a RNN on the IDS-2018 dataset.
I wrote a pipeline for data preprocessing and applied it to the training set, during which I used one hot encoding using fit_transform. However, when I ...
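One common way to approach this, sketched here with scikit-learn's OneHotEncoder: fit the encoder on the training split only, then call transform (not fit_transform) on the validation split, with handle_unknown="ignore" so that unseen categories become all-zero rows. The column values below are hypothetical stand-ins, not taken from the IDS-2018 dataset.

```python
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-ins for one categorical column of the train / validation splits.
train = [["tcp"], ["udp"], ["tcp"]]
valid = [["udp"], ["icmp"]]  # "icmp" never appears in the training split

# handle_unknown="ignore" encodes unseen categories as an all-zero row
# instead of raising an error at transform time.
enc = OneHotEncoder(handle_unknown="ignore")
enc.fit(train)                                   # learn categories from training data only
valid_encoded = enc.transform(valid).toarray()   # same columns as the training encoding

print(enc.categories_)
print(valid_encoded)
```

Because the encoder is fitted once and reused, the validation matrix always has the same number of columns as the training matrix.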
0
votes
1
answer
27
views
How to Automatically Dummy Code High Cardinality Variables in Python
I am working my way through the data engineer salary data set on Kaggle. The salary_currency column has the following value counts.
salary_currency
USD 13695
GBP 558
EUR 406
INR 51
CAD 49
....
-1
votes
1
answer
28
views
One Hot Encoding with large dimensions [closed]
I am building a sales prediction model which consists of "Year", "Month", "Economy Indicator", "Customer_Id", "Product_Id", "Quantity", "...
0
votes
1
answer
136
views
ValueError: For a sparse output, all columns should be a numeric or convertible to a numeric
I am doing a pre-processing for my data before applying sklearn models, but I am having trouble identifying why an error keeps happening. When I run the code for each individual column index in ...
0
votes
0
answers
48
views
PySpark for big data analytics, Assertion Error: facing issues converting string features using hashing and one-hot encoding
I am new to big data analytics and working on machine learning tasks with big data, specifically credit card fraud detection, using PySpark. However, I've encountered a roadblock. In my dataset, I ...
1
vote
1
answer
61
views
ML ColumnTransformer OneHotEncoder
When converting categorical data in the first column of my dataframe, I am getting strange behavior from ColumnTransformer with OneHotEncoder. The behavior occurs when I add one row to my csv file.
The ...
0
votes
1
answer
157
views
Troubleshooting OneHotEncoder issue in custom pipeline class conversion from Jupyter Notebook to .py file
tl;dr: I have a pipeline defined in an ipynb file that works fine, but when I tried to encapsulate it in a class it didn't work as expected. I am probably making some mistake with OneHotEncoder. The ...
0
votes
1
answer
50
views
Can I add Multilevel Indexing for one-hot encoded features?
I am working on a dataset of mushroom features, almost all of which I encoded with pandas into binary but some are nominally encoded. I am wondering if I can take the original columns as a second ...
0
votes
1
answer
94
views
One-hot-encoding while loading data with arrow-rs
In my Rust project I am loading documents from Mongo and deserialize them into serde_json Values:
match cursor.deserialize_current() {
Ok(d) => {
let doc = serde_json::to_value(&d)....
0
votes
0
answers
92
views
Auto-arima from pmdarima gives 'Could not successfully fit a viable ARIMA model to input data' after one hot encoding and scaling
I've tried to train a sarimax model through auto-arima function from the pmdarima library. First I've tried to train it without the scaling and encoding of categorical and numerical exogenous features ...
0
votes
1
answer
29
views
Replace single float values in pandas series with array
I am trying to create a one-hot encoding for target values using the built-in iris dataset with pandas. I have split the data into features and target labels. The target labels are a pandas series (...
0
votes
0
answers
66
views
Map Onehot Encoded Features to Regression Coefficients in Pyspark
I have trained a linear regression model in Pyspark. Aside from continuous predictors, it contains categorical features that I one-hot encoded. I'd like to have a look at the coefficients per input ...
0
votes
2
answers
48
views
Pyspark one-hot encoding with grouping same id
Is there a way to perform OHE in Spark and 'flatten' dataset so that each Id has only one row?
For example if input is like this:
+---+--------+
| id|category|
+---+--------+
| 0| a|
| 1| ...
0
votes
0
answers
49
views
How to apply the sklearn OneHotEncoder to a subset of rows in a Pandas Dataframe?
I have a pandas dataframe with numerical as well as categorical columns.
For any input row (to keep things simple we take any row from the original dataframe), I want to find the N most similar rows to ...
1
vote
0
answers
64
views
Label Encoding for Categorical Features: Preserving Label Consistency Across Runs
Problem Description:
Label Encoding Issue: Upon rerunning the label encoding code, the labels change, causing inconsistency.
Dynamic Data from a Server: Incoming data might introduce new values, ...
0
votes
1
answer
49
views
Core ML MLOneHotEncoder Error Post-Update: "unknown category String"
Stack Overflow community,
I recently updated Xcode and Core ML from version 13.0.1 to 14.1.2 and am facing an issue with the MLOneHotEncoder in my Core ML classifier. The same code and data that ...
1
vote
1
answer
316
views
OneHotEncoder not behaving?
When I run the following code:
from sklearn.preprocessing import OneHotEncoder as ohc
enc = ohc(drop='if_binary', sparse_output=False).set_output(transform='pandas')
I get the error:
----------------...
1
vote
1
answer
144
views
How can I make a one neuron neural network?
I want to make a one neuron function like w1*x1+w2*x2+w3*x3+b1
My training input is
[1, 0, 0],
[0, 1, 0],
[0, 0, 1],
[1, ...
3
votes
2
answers
11k
views
why is pd.get_dummies returning Boolean values instead of the binaries of 0 1
I don't know why my one-hot encoding code, "pd.get_dummies", is returning Boolean values instead of the binary values 0 and 1
df = pd.get_dummies(df)
after writing the following line of code; df = ...
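For context, pandas 2.0 and later return bool columns from get_dummies by default. A minimal sketch of getting 0/1 integers instead, using the dtype parameter; the dataframe below is hypothetical example data, not the asker's:

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})

# dtype=int asks get_dummies for integer 0/1 columns instead of True/False.
encoded = pd.get_dummies(df, dtype=int)

print(encoded)
```

An already encoded boolean frame can also be converted afterwards with astype(int) on the dummy columns.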
2
votes
5
answers
141
views
Decoding multiple hot-encoded columns efficiently in R
I have the following data frame:
id = c(1,2,3)
where_home = c(1, 0, NA)
where_work = c(0, 1, NA)
with_alone = c(0,0,0)
with_parents = c(0,1,1)
with_colleagues = c(1,1,0)
gender_male = c(1,0,1)
...
0
votes
1
answer
45
views
Transforming multiple hot-encoded columns and converting to long-format
I have a quite complex data frame structure:
ID = c(1,2,3)
Sessions = c("2023-11-14 19:01:39+01:00", "2023-11-14 20:01:39+01:00", "2023-11-14 21:01:39+01:00")
P_affect =...
1
vote
1
answer
76
views
How to perform one hot encoding without converting a data frame into an array?
I have df data frame with categorical features columns 'temp_of_extremities', 'peripheral_pulse', 'mucous_membrane'.
I want to encode categorical features like here:
from sklearn.preprocessing import ...
0
votes
1
answer
96
views
AxisError: axis 1 is out of bounds for array of dimension 1 (OneHotEncoded python) [duplicate]
I'm working on a classification model for dog breeds and I'm trying to display examples of labels and their respective one-hot encoded labels, but I'm getting an error saying AxisError: axis 1 is out ...
1
vote
1
answer
259
views
Difference between one-hot-encoded and integer output in Sklearn
Consider the case of a multiclass classification problem with 12 classes (classes 0 to 11). These classes are nominal categorical variables (no ranking order).
I have trained two models (M1 and M2), ...
0
votes
0
answers
53
views
one-hot encoding from floating point index? (No gradients provided for any variable with Generalized Dice Loss)
I'm building an Unet-based neural network architecture for an I2I translation task using a paired dataset [input image, ground_truth].
The model takes in input an RGB image of shape 64x64x3 and ...
0
votes
2
answers
1k
views
Change categorical data to numeric data in the required columns so that a linear regression can be applied to it
Some of the columns in the given dataset contain categorical data. I have to change the data to numeric so that I can apply simple linear regression to predict the score. The column names are ...
0
votes
1
answer
50
views
How to turn a hot-encoded variable (multiple columns can be true) to one variable in R?
I am trying to convert columns that are one-hot encoded from a multiple-choice (multiple answers can be true) questionnaire to only one column, without affecting the other variables. I found similar ...
0
votes
0
answers
15
views
Handle New Categories for both training and test set
I worked on a classification problem for the Kaggle Competition, and I found there are 3 categorical columns, but I noticed that the service column in the train data has 66 categories, and in the test ...
0
votes
1
answer
241
views
How do I encode a csv with OneHotEncoder?
I'm trying to train an AI to predict football results.
But I always get the following error:
ValueError: could not convert string to float: '2023-09-30T14:00Z'
The picture is the csv file ...
-1
votes
1
answer
121
views
Why am I getting Hidden size error with PyTorch RNN
I am trying to build a RNN for next word prediction, following a next character prediction example (tutorial, github, colab (runtime ~1min)).
In the example, the input shape is (3,14,17) for ...
-2
votes
1
answer
1k
views
Got " TypeError: OneHotEncoder.__init__() got an unexpected keyword argument 'categorical_features' "
I wrote the following code:
Here I attached the code image
from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder(categorical_features=[0])
and it gives me:
TypeError: OneHotEncoder.__init__()...
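The categorical_features argument was removed from OneHotEncoder in newer scikit-learn releases. A commonly used replacement is ColumnTransformer, which selects the columns to encode. The sketch below assumes a small hypothetical array where column 0 is categorical; it is not the asker's data.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: column 0 is categorical, column 1 is numeric.
X = np.array([["red", 1.0], ["blue", 2.0], ["red", 3.0]], dtype=object)

ct = ColumnTransformer(
    [("ohe", OneHotEncoder(), [0])],  # encode only column 0
    remainder="passthrough",          # keep the other columns unchanged
    sparse_threshold=0,               # always return a dense array
)
X_encoded = ct.fit_transform(X)

print(X_encoded)
```

The encoded columns come first in the output, followed by the passed-through columns.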
1
vote
0
answers
140
views
Pipeline in Pyspark ML 2.4.0 with OneHotEncoder, to migrate to Pyspark 3.0.0
I have created a PipelineModel with Pyspark 2.4.0 with several OneHotEncoder (and not OneHotEncoderEstimator).
The steps can be summarized with this kind of snippet :
import pyspark.sql.functions as F
...
2
votes
1
answer
106
views
Random Forest predicting neither class when target is one hot encoded
I know that trees are sensitive to one-hot encoded (OHE) targets; however, I want to understand why it returns predictions like this:
array([[0, 0, 0, 0],
[0, 0, 0, 0],
.
...
0
votes
1
answer
77
views
one-hot-encoding : troubles fitting after applying encoding to train and test dataframes
I have 2 dataframes, testing and training, that at the beginning have the same number of columns.
But, because in the columns with categorical data the 2 dataframes have different values, after ...
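A minimal sketch of one way to keep the two frames aligned when encoding with pandas: encode both, then reindex the test frame to the training columns, filling missing dummies with 0 and dropping dummies the model never saw. The frames and the "city" column are hypothetical example data.

```python
import pandas as pd

# Hypothetical data: "Lima" appears only in test, "Rome" only in train.
train = pd.DataFrame({"city": ["Rome", "Oslo", "Rome"]})
test = pd.DataFrame({"city": ["Oslo", "Lima"]})

train_enc = pd.get_dummies(train, dtype=int)
test_enc = pd.get_dummies(test, dtype=int)

# Force the test frame onto the training column layout:
# missing columns are added as 0, extra columns are dropped.
test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0)

print(train_enc.columns.tolist())
print(test_enc)
```

With scikit-learn, the equivalent approach is to fit a single OneHotEncoder on the training frame and reuse it to transform the test frame.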