All Questions

0 votes
2 answers
35 views

Feature Importance with ColumnTransformer and OneHotEncoder in RandomForestClassifier

Apologies for bothering you, but I haven't been able to find a definitive answer after searching the site. I'm building a RandomForestClassifier on some clinical data where the target variable (...
Aezhel's user avatar
  • 11
0 votes
0 answers
29 views

How to use Keras with Optuna tuning and Sklearn Pipeline

I am developing a model using Keras and use Optuna for the hyperparameter tuning. I need to use the K-fold method for the development. However, I cannot successfully run it. Please help. Here is the code: ...
HappyFish's user avatar
0 votes
0 answers
23 views

Encountered NaN value in between pipeline steps, sklearn's custom estimators and imblearn's custom sampler

I was trying out a custom estimator and a custom sampler. MyFeatureConcator and MyFeatureResampler are the custom estimators that I would like to use as steps in my pipeline. The error encountered is: ...
Sid's user avatar
  • 1
1 vote
1 answer
44 views

Data Shape Issues in SKL Pipeline using TFIDF

I am stumped on an issue with Python/Sci-Kit Learn/Pipelines. I am receiving an error that the shape of the data as it passes through the pipeline is not what is expected. Specific error: blocks[0,:] ...
Josh Willis's user avatar
1 vote
0 answers
30 views

The features selected by SelectKBest do not match those transformed by ColumnTransformer

I am in the process of deploying a machine learning model for study purposes and I have some questions about it: My POST method will send to the API my original features (without transformations ...
leandro.starke's user avatar
0 votes
0 answers
29 views

How to implement pipeline into machine learning model

I would like to apply one-hot encoding and label encoding to my dataset using a Pipeline that feeds into my random forest model. I have created a function that utilizes Pipeline from scikit-learn together with ...
Stackie's user avatar
1 vote
0 answers
65 views

Passing Sample Weights to Sklearn Pipeline object with XGBoost

There are some good questions on this topic, however, I haven't found any solution to this error involving using XGBoost models with sample_weight in sklearn's Pipeline framework. Here is my example ...
a.powell's user avatar
  • 1,722
1 vote
1 answer
32 views

Label encoder the target in Pipeline

I want to create a pipeline to do preprocessing in both training features and target, then train the model. Dataset would be something like: v1 v2 target 0 1 a yes 1 5 c no 2 3 f ...
Fernando Quintino's user avatar
0 votes
0 answers
51 views

Sklearn preprocessors work sequentially but produce NAs when used in Pipeline

Here's the context: I'm working with a dataset containing various feature types (numerical, categorical). My task is the binary prediction of startup success dependent on a target variable defined ...
Elias Hofmann's user avatar
1 vote
0 answers
62 views

Pipeline for ML model using LabelEncoding in a Transformer [duplicate]

I'm attempting to incorporate various transformations into a scikit-learn pipeline along with a LightGBM model. This model aims to predict the prices of second-hand vehicles. Once trained, I plan to ...
alexquilis1's user avatar
0 votes
1 answer
83 views

Error using a custom transformer in an SKLearn pipeline, but not as a standalone transformer

As an exercise I'm trying to create a custom transformer that takes a dataset and labels and returns the transformed dataset keeping only those columns with a correlation with the labels above a ...
Dargscisyhp's user avatar
0 votes
0 answers
45 views

Dynamically set K value of SelectKBest

I am using SelectKBest in my pipeline and I want to be able to configure the number of features I want to select using a config.ini file. So essentially in the .ini file I have this : # ...
Hahanaki's user avatar
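
One way to approach the question above is to read the value with the standard-library configparser and hand it to SelectKBest. This is only a minimal sketch: the excerpt truncates the actual .ini contents, so the section name `feature_selection`, the option name `k`, and the helper `read_k` are all hypothetical.

```python
import configparser

# Hypothetical config.ini contents; the real file is not shown in the question.
INI_TEXT = """
[feature_selection]
k = 5
"""


def read_k(ini_text, section="feature_selection", option="k"):
    """Return k as an int, or the string "all" if it is unset or set to "all"."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # fallback covers both a missing section and a missing option
    raw = parser.get(section, option, fallback="all")
    return "all" if raw.strip().lower() == "all" else int(raw)


k = read_k(INI_TEXT)

# With scikit-learn installed, the value can be passed straight through:
#   SelectKBest(score_func=f_classif, k=k)
# or, for a pipeline step that is assumed to be named "select":
#   pipeline.set_params(select__k=k)
```

SelectKBest accepts either an integer or the string "all" for k, which is why the helper falls back to "all" rather than to an arbitrary number.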
0 votes
0 answers
29 views

Sklearn : ValueError feature shape during training is different than feature shape during validation

I'm trying to use sklearn to build a custom Pipeline for a school project that uses ML to analyze text. I have established some logging into my custom Transformers and am encountering an issue that ...
Hahanaki's user avatar
1 vote
2 answers
884 views

Python raises an AttributeError when methods on the sklearn Pipeline object are called

Problem I am calling the fit_transform() and transform() methods on a Pipeline object, but Python is raising an AttributeError whenever I try to do so. Here is what I'm trying to run, with imports. (...
Martin's user avatar
  • 25
0 votes
0 answers
60 views

Sklearn: Extract feature names after model fitting with polynomialFeature, onehot encoding and OrdinalEncoder

As suggested in many other posts e.g., there are ways of extracting relevant feature names. However, How do I make sure that feature names align/are in the same order as the model.coef_? The structure ...
abalone's user avatar
-2 votes
1 answer
39 views

Problems creating a transformer for a pipeline

Right now I'm trying to create a pipeline that first uses Random Oversampling, and the second step I want to use is a custom outlier remover, but I'm having problems executing that pipeline. That ...
Roterun's user avatar
0 votes
0 answers
22 views

ColumnTransformer and Pipelines: how to properly use it

I am trying to build a pipeline, but every time I get rid of one issue, I end up with a new one. ColumnTransformer is really playing with me. I want to make some transformations in some columns of a ...
Dimitri's user avatar
  • 119
0 votes
2 answers
221 views

Not sure on how to use the make_pipeline of sklearn correctly

I am playing around with the Titanic dataset and trying to use sklearn's make_pipeline correctly, but I'm becoming a little confused about how to correctly build the pipelines. Here's the ...
Dimitri's user avatar
  • 119
2 votes
3 answers
590 views

how to properly incorporate early stopping validation in sklearn Pipeline with ColumnTransformer

I want to setup a lightGBM model with early stop validation. I also want to follow the best practice of using Pipeline to combine preprocessing and model fitting and prediction. Code below: ...
PingPong's user avatar
  • 965
0 votes
1 answer
48 views

ColumnTransformer with non-trivially intersecting column domains

I'm working with a housing dataset containing both numerical and categorical data. The only missing values in my data occur in two of the numerical features. As an example, consider X and y given by ...
vonbecker's user avatar
0 votes
0 answers
65 views

Including multiple dataset transformers in custom transformer

Here is my custom transformer, meant to transform the subject dataframe of encoding and scaling: class DfGrooming(BaseEstimator, TransformerMixin): def __init__(self): self....
Aditya Shandilya's user avatar
3 votes
1 answer
200 views

Pass parameters across sklearn pipelines

I am writing a custom sklearn pipeline as follows: Step 1: class Step1(BaseEstimator, TransformerMixin): def __init__(self, input1: str = "Input1") -> None: self.input1 = ...
Ach Raf's user avatar
  • 31
0 votes
0 answers
90 views

Permutation feature importance on features transformed within a pipeline (sklearn)

A similar issue has been raised earlier. I need to compute feature importance of preprocessed features via sklearn.inspection.permutation_importance. The preprocessing is implemented within a pipeline....
victoris_93's user avatar
0 votes
1 answer
38 views

Error with encoding categorical data in order

Data source text I am trying to encode the categorical data columns sex with Ohe and Blood Pressure and Diet with Oe, and then scale the data before passing it through a classifier in a pipeline. ...
111's user avatar
  • 17
0 votes
1 answer
175 views

Pipeline for Machine Learning Model has 'Feature shape mismatch' when trying to predict the target for a single observation

Here is the outline of my Machine Learning / Python project: Build a ColumnTransformer called preprocessor containing multiple transformers (e.g. One Hot Encoding, Ordinal Encoding etc) Build a ...
Zoe's user avatar
  • 35
2 votes
1 answer
460 views

How to manually select features for Scikit-Learn model regression?

There are various methods for doing automated feature selection in Scikit-learn. E.g. my_feature_selector = SelectKBest(score_func=f_regression, k=3) my_feature_selector.fit_transform(X, y) The ...
Bill's user avatar
  • 11.6k
-2 votes
1 answer
319 views

how to use SHAP library for text classification?

I have text data and a pipeline model. I want to use the shap library to visualize the impact on all the output classes. I got this error: TypeError: The passed model is not callable and cannot be ...
MelinA's user avatar
  • 1
0 votes
1 answer
46 views

How can I force a GridSearchCV model (or a pipeline model) to use a given hyperparameter value?

I have used GridSearchCV to find the best hyperparameters of a regularized logistic model. It also includes a pipeline to impute and standardize the covariates. numeric_cols = X_train.select_dtypes(...
skan's user avatar
  • 7,710
0 votes
0 answers
101 views

The features are not getting considered despite adding feature selection in sklearn Pipeline

The pipeline includes a FeatureSelection step, but it is not picking up the updated feature values. This is what my pipeline looks like: # Define pipeline pipeline = ImbPipeline(steps=[ ('preprocessor', ...
Kaiwalya Patil's user avatar
1 vote
1 answer
173 views

Invalid parameter 'logisticregression' for estimator Pipeline. GridSearchCV and ColumnTransformer

I'm trying to perform a GridSearchCV including a pipeline. I want to impute and standardize the numerical variables. And just impute the categorical ones. I've tried to do it like this: numeric_cols = ...
skan's user avatar
  • 7,710
-2 votes
1 answer
128 views

DATA INGESTION -TypeError: cannot unpack non-iterable NoneType object

I am getting this error in data ingestion part (training pipeline). I am trying to run trainining_pipeline.py and this error shows up. Full traceback: Traceback (most recent call last): File "...
bhavay bukkal's user avatar
4 votes
2 answers
147 views

sklearn transformer for outlier removal - returning xy?

I am trying to remove rows that are labeled outliers. I have this partially working, but not in the context of a pipeline and I am not sure why. from sklearn.datasets import make_classification X1, ...
mmann1123's user avatar
  • 5,275
1 vote
1 answer
181 views

Sklearn pipeline with LDA and KNN

I am trying to use the LinearDiscriminantAnalysis (LDA) class from sklearn as a preprocessing step to reduce the dimensionality of my data, and then apply a KNN classifier. I know that a good ...
Adrien Riaux's user avatar
1 vote
1 answer
294 views

Drop a step from a sklearn pipeline using the step name

How to remove a step from a sklearn pipeline using the step name? By position I know that it can be done: pipeline.steps.pop(n) But with a very large pipeline, it can be difficult to find the ...
Slevin_42's user avatar
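
A possible way to do what this question asks is to look the step up by name in the list of (name, estimator) pairs that Pipeline exposes as `steps`, then pop it by the index found. The sketch below is an illustration only: `drop_step_by_name` is a hypothetical helper, and plain strings stand in for real estimators so that it runs without scikit-learn.

```python
def drop_step_by_name(steps, name):
    """Remove the (name, estimator) pair called `name` from `steps` in place and return it."""
    for index, (step_name, _estimator) in enumerate(steps):
        if step_name == name:
            return steps.pop(index)
    raise KeyError(f"no step named {name!r}")


# Stand-in for Pipeline.steps; the strings replace real estimator objects.
steps = [
    ("scale", "StandardScaler()"),
    ("pca", "PCA()"),
    ("clf", "LogisticRegression()"),
]

removed = drop_step_by_name(steps, "pca")
remaining_names = [step_name for step_name, _ in steps]
```

With a real pipeline the call would be `drop_step_by_name(pipeline.steps, "pca")`, which mutates the list in the same way as the `pipeline.steps.pop(n)` call mentioned in the question. A pipeline that was already fitted should be refitted afterwards.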
0 votes
1 answer
184 views

What is the best practice to chain DL model into sklearn Pipeline() stages and still access hyperparameters e.g, batch_size \ epochs in pipeline?

I want to experiment with a DL regression model over time-series data by implementing the model using sklearn pipeline() properly. I formed the following DL model in the form of the class WaveNet and would ...
Mario's user avatar
  • 1,960
1 vote
1 answer
36 views

'Vect' not defined sklearn logistic regression error message

So I have this pipeline I used for a text classifier that works fine. from sklearn.feature_extraction.text import TfidfTransformer from sklearn.feature_extraction.text import CountVectorizer from ...
Barri's user avatar
  • 44
0 votes
1 answer
807 views

Sklearn. Pipeline. Several transformers. get_feature_names_out

I've implemented a custom sklearn transformer in which I process a column of text data. I create a pipeline where I combine two transformers: NameTransformer and OneHotEncoder. But I get an error. ...
Anton Troitsky's user avatar
1 vote
2 answers
934 views

How to run sklearn.preprocessing.OrdinalEncoder on several columns?

This code raises an error: import pandas as pd from sklearn.compose import ColumnTransformer from sklearn.pipeline import Pipeline from sklearn.preprocessing import OrdinalEncoder # Define categorical ...
parvij's user avatar
  • 1,390
0 votes
2 answers
664 views

Problem With Scikit Learn One Hot and Ordinal Encoders

I'm having a problem with Scikit Learn's one-hot and ordinal encoders that I hope someone can explain to me. I'm following along with a Towards Data Science article that uses a Kaggle data set to ...
duffymo's user avatar
  • 308k
1 vote
1 answer
38 views

What object is a sklearn.pipeline.Pipeline that applies a ColumnTransformer actually fitting on when fit(X, Y) is called on it

I am trying to get an idea of the inner workings of a scikit learn Pipeline. Consider the below data set and pipeline construction. data = pd.DataFrame({ 'Name': ['Alice', 'Bob', 'Charlie'], '...
gebruiker's user avatar
  • 117
1 vote
2 answers
2k views

What is the correct order in data preprocessing stage for Machine Learning?

I am trying to create some sort of step-by-step guide/cheat sheet for myself on how to correctly go over the data preprocessing stage for Machine Learning. Let's imagine we have a binary ...
Yara1994's user avatar
  • 391
1 vote
1 answer
206 views

Staged_predict from a Pipeline object

I am having the same issue which was outlined years ago here: https://github.com/scikit-learn/scikit-learn/issues/10197 It seems to not have been resolved so I am looking for a work around. The ...
Keith's user avatar
  • 4,894
2 votes
1 answer
964 views

Using sample_weight param with XGBoost through a pipeline

I want to use the sample_weight parameter with XGBClassifier from the xgboost package. The problem happens when I want to use it inside a pipeline from sklearn.pipeline. from sklearn.preprocessing ...
Will's user avatar
  • 1,835
2 votes
1 answer
194 views

Return pipeline score as one of multiple evaluation metrics

I am using a pipeline in a hyperparameter gridsearch in sklearn. I would like the search to return multiple evaluation scores - one a custom scoring function that I wrote, and the other the default ...
David Pellow's user avatar
0 votes
1 answer
40 views

Extracting feature importances along with column names from sklearn pipeline

I have a sklearn pipeline with two steps (a columntransformer preprocessor with a One hot encoder and a randomforestregressor estimator). I would like to get the feature names of the encoded columns ...
Sherwin R's user avatar
0 votes
1 answer
285 views

Error using categorical data in Pipeline with OneHotEncoder

I would like to build a pipeline to predict 'Survival' from the three features 'SibSp_category', 'Parch_category', 'Embarked'. In the preprocessing step, I use (1) OrdinalEncoder to convert the ...
ja_doe's user avatar
  • 3
3 votes
1 answer
147 views

How to specify the parameter for FeatureUnion to let it pass to underlying transformer

In my code, I am trying to access the sample_weight of the StandardScaler. However, this StandardScaler is within a Pipeline which again is within a FeatureUnion. I can't seem to get this parameter ...
Olivier_s_j's user avatar
  • 5,152
0 votes
1 answer
685 views

Key Error when passing list of input in .predict() using Pipeline

From what I found out when trying myself and reading here on Stack Overflow, when I pass a pandas dataframe to .predict(), it successfully gives me a prediction value. Like below: pipe = Pipeline([('...
Tejas Padhye's user avatar
1 vote
2 answers
631 views

sklearn to pmml pipeline how to apply postprocessing linear transformation

I'm having a tough time trying to apply a postprocessing step with the sklearn2pmml packages. What I'm trying to do is to apply a linear transformation after applying the predict_proba method within ...
Tom's user avatar
  • 606
0 votes
1 answer
377 views

sklearn to pmml, can't create pipeline for preprocessing step of categorical columns

I'm having a tough time trying to create a PMML pipeline in the library sklearn2pmml (python). I want to convert categorical variables to numerical ones by reassigning them but don't have any clue, I ...
Tom's user avatar
  • 606
