-2 votes
0 answers
26 views

How to create a robust preprocessing function for any dataset? [closed]

I am working on a project where I predict the appropriate doctor specialization based on patient symptoms. The project includes a feature where researchers can upload their own datasets and evaluate ...
Oula R.'s user avatar
  • 31
0 votes
0 answers
6 views

no change using pyspark.ml.feature VectorAssembler

Following an example from Databricks with my own data, I can't get the VectorAssembler transformation working. string_indexer = StringIndexer(inputCol='ptype', outputCol='index_ptype', handleInvalid=...
Jonathan Roy's user avatar
3 votes
3 answers
127 views

How can I do one-hot encoding from multiple columns

When I search for this topic, I get answers that do not match what I want to do. Let's say I have a table like this: Item N1 N2 N3 N4 Item1 1 2 4 8 Item2 2 3 6 7 Item3 4 5 7 9 Item4 1 5 6 7 ...
Eric's user avatar
  • 129
1 vote
2 answers
64 views

How do I onehotencode a single column in a dataframe?

I have a dataframe called "vehicles" with 8 columns. Seven are numerical, but the column named 'Car_name', which is at index 1 in the dataframe, is categorical. I need to encode it. I tried this ...
levi mungai's user avatar
1 vote
1 answer
88 views

Error in validate_column_names(): Missing required columns after applying recipe in Tidymodels workflow with XGBoost

I'm encountering an issue when using tidymodels with xgboost in a workflow. After applying a recipe that includes step_dummy() to convert categorical variables into dummy variables, I receive the ...
TarJae's user avatar
  • 78.6k
1 vote
2 answers
104 views

Represent categorical column as One-Hot Encoding using SQL

I want to represent a string column as a binary 1 or 0 by pivoting the string column and using its values as headers in SQL (Snowflake). It would be the Python equivalent of pd.get_dummies where the ...
Krishnang K Dalal's user avatar
0 votes
1 answer
25 views

onehot encoding array shape does not match all labels in a pandas dataframe column

I am trying to use onehot encoding on a pandas dataframe column. The encoder generates 1582 features but when I proceed to merge these features to my original dataframe, I get the following error ...
Paolo C's user avatar
0 votes
2 answers
82 views

One-hot-encoded table to tbl_summary

I have a one-hot-encoded tibble that contains the occurrence of certain arteries in specific tumours. These can be divided into 2 groups: primary and secondary. If there would be only 1 occurrence ...
JLA's user avatar
  • 13
0 votes
0 answers
33 views

ValueError: setting an array element with a sequence when running Linear Regression model

I am doing a Kaggle competition on predicting house sales using regression models. I have converted the categorical data into a float using OneHotEncoder using a pipeline. However I keep getting the ...
Red_bull's user avatar
1 vote
0 answers
54 views

SMOTE Oversampling in Text Classification Fails with Multiple Input Features

I have a text classification problem where the input has 2 features, a text and a language. The text is a string variable. The language is a string variable that has the following values: "EN...
Sandra Sukarieh's user avatar
0 votes
0 answers
38 views

sklearn OneHotEncoder - create new dataframe with certain rows that have value = 1

I have a dataframe with 20 columns and 10000 rows. I applied OneHotEncoder (drop='first') to the entire dataframe which gave me 79 columns. I am trying to grab specific rows and create a new dataframe ...
FutureDataScientist's user avatar
3 votes
5 answers
273 views

Convert count row to one hot encoding efficiently

I have a table with rows in this format where the integers are a count: A B C D E 0 a 2 0 3 x 1 b 1 2 0 y I'd like to convert it into a format where each count is a one hot encoded ...
donkey's user avatar
  • 1,428
1 vote
1 answer
49 views

Transferring NaNs to Dummy Variables While Using One Hot Encoder

I am using OneHotEncoder to create a series of dummy variables based on a categorical variable. The problem I encounter is that any missing values are not transferred to the available dummy variables. ...
Englishman Bob's user avatar
0 votes
1 answer
45 views

Sklearn : how to keep NaN values through OneHotEncoder?

INPUT I have the following data: import pandas as pd import numpy as np from sklearn.preprocessing import OneHotEncoder from sklearn.impute import SimpleImputer test_df = pd.DataFrame({'sex': ['...
ThibaultDECO's user avatar
0 votes
0 answers
13 views

ML model deployment, error in MinMaxScaler

This script is used to predict the risk of heart disease for a user based on the user's input data. Extra Trees classifier with get_dummies encoding for categorical features and MinMaxScaler for numeric ...
yasser Mamdouh's user avatar
1 vote
1 answer
39 views

Regarding onehotencoder space cost

Why doesn't one-hot encoding use bit-based encoding? Wouldn't it take much less memory? What I mean is when you encode for example four cities you can do it like what one-hot encoder does by expanding ...
Laggy's user avatar
  • 51
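For readers skimming this entry, the contrast the question draws can be sketched in a few lines. This is only an illustration under assumptions: the four city names are hypothetical (the excerpt does not list them), and the "bit-based" scheme is taken to mean a plain binary code of the category index.

```python
# Hypothetical categories; the original excerpt does not name the four cities.
cities = ["Paris", "Rome", "Oslo", "Lima"]


def one_hot(index, n):
    """Return a length-n vector with a single 1 at position `index`."""
    return [1 if i == index else 0 for i in range(n)]


def binary_code(index, n):
    """Return the binary digits of `index`, using just enough bits for n categories."""
    width = max(1, (n - 1).bit_length())
    return [(index >> bit) & 1 for bit in reversed(range(width))]


one_hot_rows = {city: one_hot(i, len(cities)) for i, city in enumerate(cities)}
binary_rows = {city: binary_code(i, len(cities)) for i, city in enumerate(cities)}

for city in cities:
    print(city, one_hot_rows[city], binary_rows[city])
```

The binary code needs 2 columns instead of 4, but each of its columns is shared by several categories, whereas every one-hot column belongs to exactly one category.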
0 votes
0 answers
17 views

One-hot-encoding categories not seen by the validation set

I'm training an RNN on the IDS-2018 dataset. I wrote a pipeline for data preprocessing and applied it to the training set, during which I used one-hot encoding using fit_transform. However, when I ...
Agnese Castellani's user avatar
0 votes
1 answer
27 views

How to Automatically Dummy Code High Cardinality Variables in Python

I am working my way through the data engineer salary data set on Kaggle. The salary_currency column has the following value counts. salary_currency USD 13695 GBP 558 EUR 406 INR 51 CAD 49 ....
Englishman Bob's user avatar
-1 votes
1 answer
28 views

One Hot Encoding with large dimensions [closed]

I am building a sales prediction model which consists of "Year", "Month", "Economy Indicator", "Customer_Id", "Product_Id", "Quantity", ...
ProfessorE's user avatar
0 votes
1 answer
136 views

ValueError: For a sparse output, all columns should be a numeric or convertible to a numeric

I am doing a pre-processing for my data before applying sklearn models, but I am having trouble identifying why an error keeps happening. When I run the code for each individual column index in ...
J.K.'s user avatar
  • 371
0 votes
0 answers
48 views

PySpark for big data analytics, Assertion Error: facing issues converting string features using hashing and one-hot encoding

I am new to big data analytics and working on machine learning tasks with big data, specifically credit card fraud detection, using PySpark. However, I've encountered a roadblock. In my dataset, I ...
Sankar's user avatar
  • 1
1 vote
1 answer
61 views

ML ColumnTransformer OneHotEncoder

When converting categorical data in the first column of my dataframe, I am getting strange behavior from ColumnTransformer with OneHotEncoder. The behavior occurs when I add one row to my CSV file. The ...
Tech Wizard's user avatar
0 votes
1 answer
157 views

Troubleshooting OneHotEncoder issue in custom pipeline class conversion from Jupyter Notebook to .py file

tl;dr: I have a pipeline defined in an ipynb file that works fine, but when I tried to encapsulate it in a class it didn't work as expected. I am probably making some mistake with OneHotEncoder. The ...
Dimitri's user avatar
  • 119
0 votes
1 answer
50 views

Can I add Multilevel Indexing for one-hot encoded features?

I am working on a dataset of mushroom features, almost all of which I encoded with pandas into binary but some are nominally encoded. I am wondering if I can take the original columns as a second ...
Georgia Anderson's user avatar
0 votes
1 answer
94 views

One-hot-encoding while loading data with arrow-rs

In my Rust project I am loading documents from Mongo and deserialize them into serde_json Values: match cursor.deserialize_current() { Ok(d) => { let doc = serde_json::to_value(&d)....
chmielot's user avatar
  • 371
0 votes
0 answers
92 views

Auto-arima from pmdarima gives 'Could not successfully fit a viable ARIMA model to input data' after one hot encoding and scaling

I've tried to train a sarimax model through auto-arima function from the pmdarima library. First I've tried to train it without the scaling and encoding of categorical and numerical exogenous features ...
Gabriele Passoni's user avatar
0 votes
1 answer
29 views

Replace single float values in pandas series with array

I am trying to create a one-hot encoding for target values using the built-in iris dataset with pandas. I have split the data into features and target labels. The target labels are a pandas series (...
Micah Bassett's user avatar
0 votes
0 answers
66 views

Map Onehot Encoded Features to Regression Coefficients in Pyspark

I have trained a linear regression model in PySpark. Aside from continuous predictors, it contains categorical features that I one-hot encoded. I'd like to have a look at the coefficients per input ...
Michael S's user avatar
0 votes
2 answers
48 views

Pyspark one-hot encoding with grouping same id

Is there a way to perform OHE in Spark and 'flatten' dataset so that each Id has only one row? For example if input is like this: +---+--------+ | id|category| +---+--------+ | 0| a| | 1| ...
Alex_Y's user avatar
  • 608
0 votes
0 answers
49 views

How to apply the sklearn OneHotEncoder to a subset of rows in a Pandas Dataframe?

I have a pandas dataframe with numerical as well as categorical columns. For any input row (to keep things simple, we take any row from the original dataframe), I want to find the N most similar rows to ...
saptwa's user avatar
  • 1
1 vote
0 answers
64 views

Label Encoding for Categorical Features: Preserving Label Consistency Across Runs

Problem Description: Label Encoding Issue: Upon rerunning the label encoding code, the labels change, causing inconsistency. Dynamic Data from a Server: Incoming data might introduce new values, ...
Zeeshan Khalid's user avatar
0 votes
1 answer
49 views

Core ML MLOneHotEncoder Error Post-Update: "unknown category String"

Stack Overflow community, I recently updated Xcode and Core ML from version 13.0.1 to 14.1.2 and am facing an issue with the MLOneHotEncoder in my Core ML classifier. The same code and data that ...
Simon Bogutzky's user avatar
1 vote
1 answer
316 views

OneHotEncoder not behaving?

When I run the following code: from sklearn.preprocessing import OneHotEncoder as ohc enc = ohc(drop='if_binary', sparse_output=False).set_output(transform='pandas') I get the error: ----------------...
NDStar14's user avatar
1 vote
1 answer
144 views

How can I make a one neuron neural network?

I want to make a one-neuron function like w1*x1+w2*x2+w3*x3+b1. My training input is [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, ...
Ahmet Rauf Oktay's user avatar
3 votes
2 answers
11k views

why is pd.get_dummies returning Boolean values instead of the binaries of 0 1

I don't know why my one-hot encoding code, "pd.get_dummies", is returning Boolean values instead of the binaries 0 and 1: df = pd.get_dummies(df). After writing the following line of code: df = ...
shasilon's user avatar
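The behaviour described in this entry can be reproduced with a small sketch. It assumes pandas 2.0 or later, where `pd.get_dummies` returns boolean columns by default. The `color` column and its values are hypothetical and not taken from the question. Passing `dtype=int` is one way to get 0/1 integers instead.

```python
import pandas as pd

# Hypothetical data; the question's own dataframe is not shown in the excerpt.
df = pd.DataFrame({"color": ["red", "green", "red"]})

# Default call: on pandas >= 2.0 the dummy columns are boolean (True/False).
as_default = pd.get_dummies(df)

# Requesting an integer dtype yields 0/1 columns instead.
as_int = pd.get_dummies(df, dtype=int)

print(as_default.dtypes)
print(as_int)
```

Calling `.astype(int)` on the default result is an equivalent alternative when the dataframe has already been encoded.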
2 votes
5 answers
141 views

Decoding multiple hot-encoded columns efficiently in R

I have the following data frame: id = c(1,2,3) where_home = c(1, 0, NA) where_work = c(0, 1, NA) with_alone = c(0,0,0) with_parents = c(0,1,1) with_colleagues = c(1,1,0) gender_male = c(1,0,1) ...
Codrin Mironiuc's user avatar
0 votes
1 answer
45 views

Transforming multiple hot-encoded columns and converting to long-format

I have a quite complex data frame structure: ID = c(1,2,3) Sessions = c("2023-11-14 19:01:39+01:00", "2023-11-14 20:01:39+01:00", "2023-11-14 21:01:39+01:00") P_affect =...
Codrin Mironiuc's user avatar
1 vote
1 answer
76 views

How to perform one hot encoding without converting a data frame into an array?

I have df data frame with categorical features columns 'temp_of_extremities', 'peripheral_pulse', 'mucous_membrane'. I want to encode categorical features like here: from sklearn.preprocessing import ...
Паша физтех's user avatar
0 votes
1 answer
96 views

AxisError: axis 1 is out of bounds for array of dimension 1 (OneHotEncoded python) [duplicate]

I'm working on a classification model for dog breeds and I'm trying to display examples of labels and their respective one-hot-encoded labels, but I'm getting an error saying that AxisError: axis 1 is out ...
Jeremy Jin's user avatar
1 vote
1 answer
259 views

Difference between one-hot-encoded and integer output in Sklearn

Consider the case of a multiclass classification problem with 12 classes (classes 0 to 11). These classes are nominal categorical variables (no ranking order). I have trained two models (M1 and M2), ...
Murilo's user avatar
  • 679
0 votes
0 answers
53 views

one-hot encoding from floating point index? (No gradients provided for any variable with Generalized Dice Loss)

I'm building a U-Net-based neural network architecture for an I2I translation task using a paired dataset [input image, ground_truth]. The model takes as input an RGB image of shape 64x64x3 and ...
Salvatore Amodio's user avatar
0 votes
2 answers
1k views

Change of categorical data to numeric data in the required columns so that a linear regression can be applied to it

Some of the columns in the given dataset contain categorical data. I have to change the data to numeric so that I can apply simple linear regression to predict the score. The column names are ...
gaurav najpande's user avatar
0 votes
1 answer
50 views

How to turn a hot-encoded variable (multiple columns can be true) to one variable in R?

I am trying to convert columns that are one-hot encoded from a multiple-choice (multiple answers can be true) questionnaire to only one column, without affecting the other variables. I found similar ...
Codrin Mironiuc's user avatar
0 votes
0 answers
15 views

Handle New Categories for both training and test set

I worked on a classification problem for the Kaggle Competition, and I found there are 3 categorical columns, but I noticed that the service column in the train data has 66 categories, and in the test ...
user22689820's user avatar
0 votes
1 answer
241 views

How do I encode a csv with OneHotEncoder?

I'm trying to train an AI to predict football results, but I always get the following error: ValueError: could not convert string to float: '2023-09-30T14:00Z' The picture is the csv file ...
qing762's user avatar
-1 votes
1 answer
121 views

Why am I getting Hidden size error with PyTorch RNN

I am trying to build an RNN for next word prediction, following a next character prediction example (tutorial, github, colab (runtime ~1min)). In the example, the input shape is (3,14,17) for ...
vriv's user avatar
  • 21
-2 votes
1 answer
1k views

Got " TypeError: OneHotEncoder.__init__() got an unexpected keyword argument 'categorical_features' "

I wrote the following code (I have attached an image of it): from sklearn.preprocessing import OneHotEncoder ohe = OneHotEncoder(categorical_features=[0]) and it gives me: TypeError: OneHotEncoder.__init__()...
Tejal Badgujar's user avatar
1 vote
0 answers
140 views

Pipeline in Pyspark ML 2.4.0 with OneHotEncoder, to migrate to Pyspark 3.0.0

I have created a PipelineModel with Pyspark 2.4.0 with several OneHotEncoder (and not OneHotEncoderEstimator). The steps can be summarized with this kind of snippet : import pyspark.sql.functions as F ...
Antoine Fernandes's user avatar
2 votes
1 answer
106 views

Random Forest predicting neither class when target is one hot encoded

I know that trees are sensitive to one-hot-encoded (OHE) targets; however, I want to understand why it returns predictions like this: array([[0, 0, 0, 0], [0, 0, 0, 0], . ...
Apollonia Vitelli's user avatar
0 votes
1 answer
77 views

one-hot-encoding : troubles fitting after applying encoding to train and test dataframes

I have 2 dataframes, testing and training, that at the beginning have the same number of columns. But because the 2 dataframes have different values in the columns with categorical data, after ...
Angelo's user avatar
  • 3
