All Questions
520 questions
0 votes, 2 answers, 35 views
Feature Importance with ColumnTransformer and OneHotEncoder in RandomForestClassifier
Apologies for bothering you, but I haven't been able to find a definitive answer after searching the site.
I'm building a RandomForestClassifier on some clinical data where the target variable (...
0 votes, 0 answers, 29 views
How to use Keras with Optuna tuning and Sklearn Pipeline
I am developing a model using Keras and using Optuna for hyperparameter tuning.
I need to use K-fold cross-validation for the development.
However, I cannot successfully run it.
Please help.
Here is the code:
...
0 votes, 0 answers, 23 views
Encountered NaN value in between pipeline steps, sklearn's custom estimators and imblearn's custom sampler
I was trying out a custom estimator and a custom sampler. MyFeatureConcator and MyFeatureResampler are the custom estimators that I would like to use as steps in my pipeline.
The error encountered is:
...
1 vote, 1 answer, 44 views
Data Shape Issues in SKL Pipeline using TFIDF
I am stumped on an issue with Python/scikit-learn/Pipelines. I am receiving an error that the shape of the data as it passes through the pipeline is not what is expected.
Specific error:
blocks[0,:] ...
1 vote, 0 answers, 30 views
The features selected by SelectKBest do not match those transformed by ColumnTransformer
I am in the process of deploying a machine learning model for study purposes and I have some questions about it:
My POST method will send to the API my original features (without transformations ...
0 votes, 0 answers, 29 views
How to implement pipeline into machine learning model
I would like to apply one-hot encoding and label encoding to my dataset using a Pipeline with my random forest model. I have created a function that uses Pipeline from scikit-learn together with ...
1 vote, 0 answers, 65 views
Passing Sample Weights to Sklearn Pipeline object with XGBoost
There are some good questions on this topic; however, I haven't found any solution to this error involving XGBoost models with sample_weight in sklearn's Pipeline framework.
Here is my example ...
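A minimal sketch of how `Pipeline.fit` routes fit parameters, assuming the weights only need to reach the final estimator: keys of the form `<step name>__<parameter>` are forwarded to that step's `fit()`. LogisticRegression stands in for XGBClassifier so the sketch runs without xgboost; XGBClassifier.fit also accepts `sample_weight`, so the same key pattern should apply with your own step name. The step name `model` and the weighting scheme are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)
# Illustrative weights: each positive sample counts twice as much
weights = np.where(y == 1, 2.0, 1.0)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression()),  # stand-in for XGBClassifier
])
# "<step name>__<fit parameter>" sends the keyword only to that step's fit()
pipe.fit(X, y, model__sample_weight=weights)
```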
1 vote, 1 answer, 32 views
Label encoder the target in Pipeline
I want to create a pipeline that preprocesses both the training features and the target, then trains the model. The dataset looks something like:
v1 v2 target
0 1 a yes
1 5 c no
2 3 f ...
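A minimal sketch, assuming the goal is a numeric target: Pipeline steps only ever transform X, so the target is usually encoded outside the pipeline with LabelEncoder and decoded after prediction. The data below is made up, loosely following the sample above. Note that scikit-learn classifiers such as LogisticRegression also accept string labels directly, in which case no target encoding is needed.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Made-up data in the shape of the example above
df = pd.DataFrame({
    "v1": [1, 5, 3, 2, 4, 6],
    "v2": ["a", "c", "f", "a", "c", "f"],
    "target": ["yes", "no", "yes", "yes", "no", "no"],
})
X, y_raw = df[["v1", "v2"]], df["target"]

# Pipeline steps transform X only, so the target is encoded separately
le = LabelEncoder()
y = le.fit_transform(y_raw)  # classes_ are sorted: "no" -> 0, "yes" -> 1

pipe = Pipeline([
    ("pre", ColumnTransformer(
        [("ohe", OneHotEncoder(handle_unknown="ignore"), ["v2"])],
        remainder="passthrough",  # keep v1 as-is
    )),
    ("clf", LogisticRegression()),
])
pipe.fit(X, y)

# Map integer predictions back to the original labels
pred_labels = le.inverse_transform(pipe.predict(X))
```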
0 votes, 0 answers, 51 views
Sklearn preprocessors work sequentially but produce NAs when used in Pipeline
Here's the context:
I'm working with a dataset containing various feature types (numerical, categorical).
My task is the binary prediction of startup success dependent on a target variable defined ...
1 vote, 0 answers, 62 views
Pipeline for ML model using LabelEncoding in a Transformer [duplicate]
I'm attempting to incorporate various transformations into a scikit-learn pipeline along with a LightGBM model. This model aims to predict the prices of second-hand vehicles. Once trained, I plan to ...
0 votes, 1 answer, 83 views
Error using a custom transformer in an SKLearn pipeline, but not as a standalone transformer
As an exercise I'm trying to create a custom transformer that takes a dataset and labels and returns the transformed dataset keeping only those columns with a correlation with the labels above a ...
0 votes, 0 answers, 45 views
Dynamically set K value of SelectKBest
I am using SelectKBest in my pipeline and I want to be able to configure the number of features to select using a config.ini file. So essentially, in the .ini file I have this:
# ...
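One way to read `k` from an .ini file is the standard-library configparser plus `set_params`. The section name `features` and key `k` are assumptions, since the question's .ini contents are elided; `read_string` stands in for `config.read("config.ini")` so the sketch is self-contained.

```python
import configparser

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical config contents; in practice: config.read("config.ini")
config = configparser.ConfigParser()
config.read_string("""
[features]
k = 5
""")
k = config.getint("features", "k")

pipe = Pipeline([
    ("selector", SelectKBest(score_func=f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])
# Set k on the selector step after the pipeline is built
pipe.set_params(selector__k=k)

X, y = make_classification(n_samples=100, n_features=20, random_state=0)
pipe.fit(X, y)
```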
0 votes, 0 answers, 29 views
Sklearn: ValueError feature shape during training is different than feature shape during validation
I'm trying to use sklearn to build a custom Pipeline for a school project that uses ML to analyze text. I have added some logging to my custom Transformers and am encountering an issue that ...
1 vote, 2 answers, 884 views
Python raises an AttributeError when methods on the sklearn Pipeline object are called
Problem
I am calling the fit_transform() and transform() methods on a Pipeline object, but Python is raising an AttributeError whenever I try to do so. Here is what I'm trying to run, with imports. (...
0 votes, 0 answers, 60 views
Sklearn: Extract feature names after model fitting with PolynomialFeatures, one-hot encoding and OrdinalEncoder
As suggested in many other posts, there are ways of extracting relevant feature names. However, how do I make sure that the feature names align with, and are in the same order as, model.coef_?
The structure ...
-2 votes, 1 answer, 39 views
Problems creating a transformer for a pipeline
Right now I'm trying to create a pipeline that first uses random oversampling, and as the second step I want to use a custom outlier remover, but I'm having problems executing that pipeline.
That ...
0 votes, 0 answers, 22 views
ColumnTransformer and Pipelines: how to properly use it
I am trying to build a pipeline, but every time I get rid of one issue, I end up with a new one. ColumnTransformer is really playing tricks on me. I want to make some transformations in some columns of a ...
0 votes, 2 answers, 221 views
Not sure on how to use the make_pipeline of sklearn correctly
I am playing around with the Titanic dataset and trying to use sklearn's make_pipeline correctly, but I'm getting a little confused about how to build the pipelines correctly. Here's the ...
2 votes, 3 answers, 590 views
how to properly incorporate early stopping validation in sklearn Pipeline with ColumnTransformer
I want to set up a LightGBM model with early-stopping validation. I also want to follow the best practice of using Pipeline to combine preprocessing, model fitting and prediction. Code below:
...
0 votes, 1 answer, 48 views
ColumnTransformer with non-trivially intersecting column domains
I'm working with a housing dataset containing both numerical and categorical data. The only missing values in my data occur in two of the numerical features. As an example, consider X and y given by
...
0 votes, 0 answers, 65 views
Including multiple dataset transformers in custom transformer
Here is my custom transformer, meant to apply encoding and scaling to the subject dataframe:
class DfGrooming(BaseEstimator, TransformerMixin):
    def __init__(self):
        self....
3 votes, 1 answer, 200 views
Pass parameters across sklearn pipelines
I am writing a custom sklearn pipeline as follows:
Step 1:
class Step1(BaseEstimator, TransformerMixin):
    def __init__(self, input1: str = "Input1") -> None:
        self.input1 = ...
0 votes, 0 answers, 90 views
Permutation feature importance on features transformed within a pipeline (sklearn)
A similar issue has been raised earlier. I need to compute feature importance of preprocessed features via sklearn.inspection.permutation_importance. The preprocessing is implemented within a pipeline....
0 votes, 1 answer, 38 views
Error with encoding categorical data in order
Data source text
I am trying to encode the categorical column sex with a OneHotEncoder (OHE) and the Blood Pressure and Diet columns with an OrdinalEncoder (OE), and then scale the data before passing it through a classifier in a pipeline.
...
0 votes, 1 answer, 175 views
Pipeline for Machine Learning Model has 'Feature shape mismatch' when trying to predict the target for a single observation
Here is the outline of my Machine Learning / Python project:
Build a ColumnTransformer called preprocessor containing multiple transformers (e.g. One Hot Encoding, Ordinal Encoding etc)
Build a ...
2 votes, 1 answer, 460 views
How to manually select features for Scikit-Learn model regression?
There are various methods for doing automated feature selection in Scikit-learn.
E.g.
my_feature_selector = SelectKBest(score_func=f_regression, k=3)
my_feature_selector.fit_transform(X, y)
The ...
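A minimal sketch of manual selection, assuming the features are named DataFrame columns: a ColumnTransformer with a `"passthrough"` transformer keeps only the listed columns, and `remainder="drop"` discards the rest, so the choice lives inside the pipeline. The column names and synthetic data are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Hypothetical, hand-picked feature names
selected = ["x1", "x3"]

model = Pipeline([
    # Keep only the selected columns unchanged; drop everything else
    ("select", ColumnTransformer([("keep", "passthrough", selected)], remainder="drop")),
    ("reg", LinearRegression()),
])

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(50, 4)), columns=["x1", "x2", "x3", "x4"])
y = 2 * X["x1"] - X["x3"]  # synthetic target that depends only on x1 and x3
model.fit(X, y)
```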
-2 votes, 1 answer, 319 views
How to use the SHAP library for text classification?
I have text data and a pipeline model. I want to use the SHAP library to visualize the impact on all the output classes.
I got this error:
TypeError: The passed model is not callable and cannot be ...
0 votes, 1 answer, 46 views
How can I force a GridSearchCV model (or a pipeline model) to use a given hyparameter value?
I have used GridSearchCV to find the best hyperparameters of a regularized logistic model.
It also includes a pipeline to impute and standardize the covariates.
numeric_cols = X_train.select_dtypes(...
0 votes, 0 answers, 101 views
The features are not getting considered despite adding feature selection in sklearn Pipeline
The pipeline has FeatureSelection in it, but it is not picking up the updated feature values.
This is what my pipeline looks like:
# Define pipeline
pipeline = ImbPipeline(steps=[
('preprocessor', ...
1 vote, 1 answer, 173 views
Invalid parameter 'logisticregression' for estimator Pipeline. GridSearchCV and ColumnTransformer
I'm trying to perform a GridSearchCV including a pipeline.
I want to impute and standardize the numerical variables.
And just impute the categorical ones.
I've tried to do it like this:
numeric_cols = ...
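An "Invalid parameter" error like this one typically means a param_grid key does not match a step name. Keys take the form `<step name>__<parameter>`; `make_pipeline` derives step names from the lowercased class names, while `Pipeline([...])` uses whatever names you pass, and `pipe.get_params().keys()` lists every valid key. A small sketch on synthetic data, leaving out the imputation and ColumnTransformer parts of the question:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# make_pipeline names each step after its lowercased class name
print(list(pipe.named_steps))  # ['standardscaler', 'logisticregression']

# The grid key must be "<step name>__<parameter>"
param_grid = {"logisticregression__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=3).fit(X, y)
```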
-2 votes, 1 answer, 128 views
DATA INGESTION - TypeError: cannot unpack non-iterable NoneType object
I am getting this error in the data ingestion part (training pipeline). I am trying to run trainining_pipeline.py and this error shows up.
Full traceback:
Traceback (most recent call last):
File "...
4 votes, 2 answers, 147 views
sklearn transformer for outlier removal - returning xy?
I am trying to remove rows that are labeled outliers. I have this partially working, but not in the context of a pipeline and I am not sure why.
from sklearn.datasets import make_classification
X1, ...
1 vote, 1 answer, 181 views
Sklearn pipeline with LDA and KNN
I am trying to use the LinearDiscriminantAnalysis (LDA) class from sklearn as the preprocessing part of my modeling to reduce the dimensionality of my data, and then apply a KNN classifier. I know that a good ...
1 vote, 1 answer, 294 views
Drop a step from a sklearn pipeline using the step name
How to remove a step from a sklearn pipeline using the step name?
By position I know that it can be done:
pipeline.steps.pop(n)
But with a very large pipeline, it can be difficult to find the ...
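A sketch of removal by name, assuming rebuilding the pipeline is acceptable: the hypothetical helper `drop_step` filters `pipeline.steps` by name and returns a new Pipeline (the estimator objects are shared, not copied). Alternatively, `set_params(<name>="passthrough")` disables a step while leaving the other steps in place.

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=2)),
    ("clf", LogisticRegression()),
])

def drop_step(pipe, name):
    """Hypothetical helper: return a new Pipeline without the step called `name`."""
    if name not in pipe.named_steps:
        raise KeyError(f"no step named {name!r}")
    kept = [(n, est) for n, est in pipe.steps if n != name]
    return Pipeline(kept, memory=pipe.memory, verbose=pipe.verbose)

slim = drop_step(pipeline, "pca")
print([n for n, _ in slim.steps])  # ['scaler', 'clf']

# Alternative: keep the step's slot but make it a no-op
pipeline.set_params(pca="passthrough")
```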
0 votes, 1 answer, 184 views
What is the best practice to chain a DL model into sklearn Pipeline() stages and still access hyperparameters, e.g., batch_size/epochs, in the pipeline?
I want to experiment with a DL regression model on time-series data by implementing the model properly with sklearn's Pipeline(). I formed the following DL model in the form of the class WaveNet and would ...
1 vote, 1 answer, 36 views
'Vect' not defined sklearn logistic regression error message
So I have this pipeline I used for a text classifier that works fine.
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_extraction.text import CountVectorizer
from ...
0 votes, 1 answer, 807 views
Sklearn. Pipeline. Several transformers. get_feature_names_out
I've implemented a custom sklearn transformer in which I process a column of text data.
I created a pipeline that combines two transformers, NameTransformer and OneHotEncoder, but I got an error.
...
1 vote, 2 answers, 934 views
How to run sklearn.preprocessing.OrdinalEncoder on several columns?
This code raises an error:
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder
# Define categorical ...
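OrdinalEncoder can encode several columns at once when its ColumnTransformer entry is given a list of column names; with explicit `categories`, one list is expected per column, in the same order. The columns and categories below are hypothetical, and since the question's code and error are elided, this is a sketch of the pattern rather than a diagnosis.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical data with two categorical columns and one numeric column
df = pd.DataFrame({
    "size": ["S", "M", "L", "M"],
    "color": ["red", "blue", "red", "green"],
    "price": [10, 12, 15, 11],
})

ct = ColumnTransformer(
    [(
        "ord",
        # One category list per encoded column, in column order
        OrdinalEncoder(categories=[["S", "M", "L"], ["blue", "green", "red"]]),
        ["size", "color"],
    )],
    remainder="passthrough",  # price is appended unchanged
)
out = ct.fit_transform(df)
print(out)
```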
0 votes, 2 answers, 664 views
Problem With Scikit Learn One Hot and Ordinal Encoders
I'm having a problem with Scikit Learn's one-hot and ordinal encoders that I hope someone can explain to me.
I'm following along with a Towards Data Science article that uses a Kaggle data set to ...
1 vote, 1 answer, 38 views
What object is a sklearn.pipeline.Pipeline that applies a ColumnTransformer actually fitting on when fit(X, Y) is called on it
I am trying to get an idea of the inner workings of a scikit learn Pipeline.
Consider the below data set and pipeline construction.
data = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie'],
'...
1 vote, 2 answers, 2k views
What is the correct order in data preprocessing stage for Machine Learning?
I am trying to create some sort of step-by-step guide/cheat sheet for myself on how to correctly go over the data preprocessing stage for Machine Learning.
Let's imagine we have a binary ...
1 vote, 1 answer, 206 views
Staged_predict from a Pipeline object
I am having the same issue which was outlined years ago here:
https://github.com/scikit-learn/scikit-learn/issues/10197
It seems not to have been resolved, so I am looking for a workaround. The ...
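A workaround sketch, assuming every step except the last is a transformer: slice off the preprocessing with `pipe[:-1]`, transform the data, and call `staged_predict` on the final estimator `pipe[-1]`. GradientBoostingRegressor and the synthetic data are stand-ins for the question's model.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("gbr", GradientBoostingRegressor(n_estimators=20, random_state=0)),
])
pipe.fit(X, y)

# pipe[:-1] is a sub-pipeline containing every step except the final estimator
X_t = pipe[:-1].transform(X)
# One prediction array per boosting stage
staged = list(pipe[-1].staged_predict(X_t))
```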
2 votes, 1 answer, 964 views
Using sample_weight param with XGBoost through a pipeline
I want to use the sample_weight parameter with XGBClassifier from the xgboost package.
The problem happens when I want to use it inside a pipeline from sklearn.pipeline.
from sklearn.preprocessing ...
2 votes, 1 answer, 194 views
Return pipeline score as one of multiple evaluation metrics
I am using a pipeline in a hyperparameter gridsearch in sklearn. I would like the search to return multiple evaluation scores - one a custom scoring function that I wrote, and the other the default ...
0 votes, 1 answer, 40 views
Extracting feature importances along with column names from sklearn pipeline
I have a sklearn pipeline with two steps (a ColumnTransformer preprocessor with a OneHotEncoder, and a RandomForestRegressor estimator). I would like to get the feature names of the encoded columns ...
0 votes, 1 answer, 285 views
Error using categorical data in Pipeline with OneHotEncoder
I would like to build a pipeline to predict 'Survival' from the three features 'SibSp_category', 'Parch_category', 'Embarked'.
In the preprocessing step, I use (1) OrdinalEncoder to convert the ...
3 votes, 1 answer, 147 views
How to specify the parameter for FeatureUnion to let it pass to underlying transformer
In my code, I am trying to access the sample_weight of the StandardScaler. However, this StandardScaler is within a Pipeline which again is within a FeatureUnion. I can't seem to get this parameter ...
0 votes, 1 answer, 685 views
Key Error when passing list of input in .predict() using Pipeline
From what I found when trying it myself and reading here on Stack Overflow, when I pass a pandas DataFrame to .predict(), it successfully gives me a prediction value, as below:
pipe = Pipeline([('...
1 vote, 2 answers, 631 views
sklearn to PMML pipeline: how to apply a postprocessing linear transformation
I'm having a tough time trying to apply a postprocessing step with the sklearn2pmml package. What I'm trying to do is to apply a linear transformation after applying the predict_proba method within ...
0 votes, 1 answer, 377 views
sklearn to PMML: can't create a pipeline for the preprocessing step of categorical columns
I'm having a tough time trying to create a PMML pipeline in the library sklearn2pmml (python). I want to convert categorical variables to numerical ones by reassigning them but don't have any clue, I ...