Newest 'python+scikit-learn' Questions

0 votes

0 answers

11 views

HistGradientBoostingClassifier - ValueError: could not convert string to float

I'm running a HistGradientBoostingClassifier with some categorical features and getting a 'ValueError: Columns must be same length as key' error. I defined the categorical features through the ...

user28824349

1

asked 1 hour ago

-3 votes

1 answer

36 views

How could I make the accuracy better in my decision tree? [closed]

Here is my code # Prepare target and features target_column = 'resolution' X = data.drop(columns=[target_column]) y = data[target_column] # Convert categorical data to binary/numerical where needed ...

kevineriksson

7

asked 20 hours ago

0 votes

0 answers

24 views

What is the alpha parameter for L1 normalization in scikit-learn QuantileRegressor [migrated]

An example in the scikit-learn documentation for Quantile Regression sets a parameter alpha to zero. The default is 1. The documentation for QuantileRegressor shows the ...

user2138149

16k

asked 2 days ago

1 vote

1 answer

55 views

Getting "TypeError: ufunc 'isnan' not supported for the input types"

I am doing a machine learning project in Jupyter Notebook to predict the prices of electric cars. I run these cells: from sklearn import preprocessing le = preprocessing.LabelEncoder() cols = ['County'...

Steve Austin

29

asked 2 days ago

1 vote

0 answers

26 views

How to save XGBoost model in ipynb and load in javascript in order to call the model and prompt input from user and get predicted value?

How can I link the trained model (ipynb or Python file stored locally) to JavaScript (front end)? I have a trained XGB model that uses some float features to predict one value (carbon intensity). I want to ...

Ryan

15

asked Dec 13 at 4:51

-1 votes

0 answers

30 views

normalization vs minmax scaling on iris dataset [closed]

I am running some experiments on the Iris dataset. I am seeing different behaviours between MinMaxScaler and normalize. Even though I know I shouldn't normalize or standardize the data, I tried it (for ...

Mattia Sospetti

11

asked Dec 12 at 15:53

2 votes

1 answer

27 views

Create custom kernel for GPR

I would like to write an RBF kernel that works only within a specific range on the X axis. I tried to write a class that contains an RBF kernel to test the code: class RangeLimitedRBFTest(Kernel): def ...

crema997

21

asked Dec 11 at 11:00

0 votes

0 answers

66 views

KNeighborsClassifier predict throws "Expected 2D array, got 1D array instead" [duplicate]

I am writing an image similarity algorithm. I am using cv2.calcHist to extract image features. After the features are created I save them to a json file as a list of numpy.float64: list(numpy.float64(...

papaya

1

asked Dec 10 at 21:03

0 votes

1 answer

40 views

ImportError: cannot import name 'Memory' from 'joblib' [closed]

I get this error: ImportError: cannot import name 'Memory' from 'joblib'. This happened after installing some new packages, probably tensorflow-intel. How do I solve this, or which versions do I have ...

Stine

21

asked Dec 10 at 15:25

0 votes

1 answer

49 views

ValueError: could not convert string to float: 'N/' [closed]

I am working on a machine learning project in Jupyter Notebook that predicts the price of electric cars. I run this cell: p = regressor.predict(df2) I get this error: ValueError ...

Steve Austin

29

asked Dec 10 at 15:00

0 votes

1 answer

36 views

Scikit-learn estimator that accepts non array-like input

I am developing a scikit-learn estimator. The documentation states that the input "X" for the .fit() method should be an "array-like of shape (n_samples, n_features)". However, my ...

Fumagalli

3

asked Dec 10 at 8:02

-2 votes

0 answers

30 views

NotFittedError: This DecisionTreeClassifier instance is not fitted yet in Jupyter [closed]

I am working on a machine learning project in Jupyter Notebook that analyzes the cheapest electric vehicles. I run this cell: fig = plt.figure(figsize=(25,20)) tree.plot_tree(clf) ...

Steve Austin

29

asked Dec 9 at 20:32

0 votes

0 answers

11 views

TruncatedSVD and randomized_svd from sklearn yield a different matrix U

I recently ran into the following dilemma when using two sklearn implementations: suppose I have an array A of shape (p, q) that I want to decompose using truncated SVD for, let's say, the first 10 ...

Julián Andrés Hernández Potes

3

asked Dec 9 at 17:14

-1 votes

0 answers

37 views

Fit a multivariable polynomial model with Python [closed]

How can I use python packages to fit a polynomial model that connects between data set x: xa1 xb1 xc1 xa2 xb2 xc2 ... . . to a data set y: ya1 yb1 yc1 ya2 yb2 yc2 ... . . . I am trying to fit a ...

Shely

49

asked Dec 8 at 12:49

0 votes

0 answers

50 views

HistGradientBoostingClassifier Using Categorical Variables

I landed on the HistGradientBoostingClassifier because, according to the docs, it has native support for categorical variables without the need for one-hot encoding or label encoding. I tried to run ...

aaron_paynev4

1

asked Dec 3 at 19:17

0 votes

0 answers

25 views

AttributeError: module 'scikit_posthocs' has no attribute 'plot_cd_diagram'

I'm trying to plot the critical difference diagram for the Nemenyi post-hoc test using scikit-learn. # Step 6: Visualize the Results with Critical Difference Diagram plt.figure(figsize=(8, 6)) sp....

Jyoti

55

asked Dec 3 at 16:47

-1 votes

0 answers

19 views

BracketError when using diffxpy for a scRNAseq experiment

I am trying to use diffxpy to do differential gene expression analysis between 2 samples (batch) in one cell type from a scRNAseq experiment and I am using the following code: #subsetting an adata ...

CATA_Bioinfo

1

asked Dec 3 at 5:26

0 votes

0 answers

22 views

SageMaker's SKLearnModel requirements.txt not getting installed

This is my code: from sagemaker.sklearn import SKLearnModel role = sagemaker.get_execution_role() model = SKLearnModel( model_data= f"s3://{default_bucket}/{prefix}/model.tar.gz", ...

Daniele Gentili

609

asked Dec 2 at 23:51

0 votes

2 answers

35 views

Feature Importance with ColumnTransformer and OneHotEncoder in RandomForestClassifier

Apologies for bothering you, but I haven't been able to find a definitive answer after searching the site. I'm building a RandomForestClassifier on some clinical data where the target variable (...

Aezhel

11

asked Dec 2 at 13:09

0 votes

1 answer

18 views

Implementing sklearn.ensemble.GradientBoostingRegressor with sklearn.multioutput.MultiOutputRegressor and sklearn.model_selection.RandomizedSearchCV

I'm trying to create models that support multivariate output. One of the models I'm trying to use is the GradientBoostingRegressor which does not natively support multivariate output. There is a ...

Cesinco

63

asked Dec 1 at 22:49

0 votes

0 answers

30 views

Polynomial regression with 2D array

I have two 2D datasets, X and Y, which represent experimental and theoretical data respectively. In both cases, each row corresponds to a sample (one physical configuration), and each column ...

booo

29

asked Nov 26 at 14:57

0 votes

0 answers

46 views

Cannot import name '_check_array_key' from 'skfda._utils' in Python 3.12.7

I am trying to use the skfda library in my Python code. I have already installed the package using `pip install scikit-fda`; however, the error was a little bit strange to me. Can anyone explain why this ...

minh chanh le

31

asked Nov 26 at 8:19

0 votes

0 answers

29 views

How to use Keras with Optuna tuning and Sklearn Pipeline

I am developing a model using Keras and using Optuna for the hyperparameter tuning. I need to use the K-fold method for the development. However, I cannot run it successfully. Please help. Here is the code: ...

HappyFish

1

asked Nov 26 at 7:32

-1 votes

0 answers

39 views

One Hot Encoding Feature Mismatch Issue

I am doing a Kaggle Challenge which requires us to predict the 12 product ids customers are most likely to purchase based on their past history. It would take way too long to go through all the ...

Dave Patel

9

asked Nov 19 at 23:21

0 votes

0 answers

55 views

How to reduce the size of Numpy data type

I am using Python to do cosine similarity. similarity_matrix = cosine_similarity(tfidf_matrix) The problem is that I am getting this error MemoryError: Unable to allocate 44.8 GiB for an array with ...

asmgx

7,960

asked Nov 17 at 16:41

1 vote

1 answer

69 views

Ignore NaN to calculate mean_absolute_error

I'm trying to calculate MAE (Mean absolute error). In my original DataFrame, I have 1826 rows and 3 columns. I'm using columns 2 and 3 to calculate MAE. But, in column 2, I have some NaN values. When ...

Daniel M M

73

asked Nov 12 at 19:20

0 votes

0 answers

30 views

Efficient parallelization of silhouette score calculation

I have a large dataset (2 million rows, 100 columns), and I need to perform clustering. I used the elbow method to determine the optimal number of clusters. However, in order to get a more refined ...

AbliusKarfax

1

asked Nov 12 at 12:26

-2 votes

1 answer

73 views

Problem in installing and using scikit-learn in Python

I want to run the following code in Jupyter Notebook: from sklearn.linear_model import SGDRegressor. Before that, I ran the following lines in Windows cmd: python -m venv sklearn-env sklearn-env\...

Mohammad Ghadian

1

asked Nov 11 at 8:45

-2 votes

1 answer

65 views

Cannot convert dataframe column to an int64 data type

I have a problem. In my Pandas DataFrame, I have a column called 'job'. I've created a simple custom transformer that maps the values in that column, which correspond to the type of job. The ...

coffee_programmer

1

asked Nov 11 at 2:00

0 votes

2 answers

49 views

Columns are missing after imputing and creating dummy variables. How should I fix this?

In short: my columns differ between the train set and the test set after imputing. Code for creating the train and test datasets: random_state_value = 0 #Define target X = data.drop(columns = 'income', axis=1) y =...

Emma Sul

3

asked Nov 8 at 13:46

-1 votes

1 answer

50 views

How to get scikit-learn to ensure that all prediction outputs sum to 100%?

I have a 'MultiOutputRegressor' which is based on a 'LinearRegression' regressor. I am using it to predict three outputs per row of X_data (like a classifier) which represent the percentage likelihood ...

Richard

1,132

asked Nov 8 at 12:12

0 votes

1 answer

65 views

How to create a scaler applying log transformation and MinMaxScaler in sklearn

I want to apply log() to my DataFrame and MinMaxScaler() together. I want the output to be a pandas DataFrame() with indexes and columns from the original data. I want to use the parameters used to ...

Guilherme Parreira

1,001

asked Nov 7 at 18:41

0 votes

2 answers

63 views

How do I set ‘random_state’ correctly so that my results are always the same?

If I have for example this snippet of code: knn = KNeighborsClassifier() grid_search_knn = GridSearchCV( estimator=knn, n_jobs=-1) Do I have to set it like this: knn = KNeighborsClassifier(...

markusnether

3

asked Nov 7 at 10:17

1 vote

1 answer

42 views

scikit-learn classifiers and regressors caching training data?

I have some 22,000 rows of training data. I use train_test_split to get training and testing data. I run fitting and then get some idea of how well fitting went using various methods of estimation. I ...

Richard

1,132

asked Nov 6 at 13:12

0 votes

0 answers

23 views

Encountered NaN value in between pipeline steps, sklearn's custom estimators and imblearn's custom sampler

I was trying a custom estimator and a custom sampler. MyFeatureConcator and MyFeatureResampler are the custom estimators that I would like to use as steps in my pipeline. The error encountered is: ...

Sid

1

asked Nov 6 at 12:59

0 votes

0 answers

133 views

Recursive one-step forecasting in timeseries model

I am trying to implement a recursive one-step forecasting approach for a Random Forest model. The idea is to get a 12-month forecast iteratively, where each prediction becomes part of the ...

seralouk

33k

asked Nov 1 at 21:53

0 votes

0 answers

53 views

Error when running mlflow with sklearn model

I'm training a RandomForestRegressor and keeping track of it using mlflow. The following code works perfectly only when n_estimators is lower than 90. The code: import mlflow import mlflow.sklearn ...

Ivan

1,433

asked Oct 29 at 15:30

0 votes

1 answer

39 views

How to apply multiple estimators to different numbers of features to select the combination with the highest f1 score?

I would like to run recursive feature elimination with multiple estimator algorithms on different numbers of features and keep the combination with the highest f1 score on the test data. Instead of reviewing the ...

user20159866

11

asked Oct 28 at 19:57

0 votes

0 answers

42 views

How can I install scikit-learn from source using my locally modified clone

I have made modifications to the scikit-learn codebase that I want to test locally, but when I run pytest, I get the error "scikit-learn is not built correctly". So I try to build it by ...

success moses

1

asked Oct 25 at 6:51

3 votes

2 answers

76 views

How to preserve data types when working with pandas and sklearn transformers?

While working with a large sklearn Pipeline (fit using a DataFrame) I ran into an error that led back to a wrong data type in my input. The problem occurred on a single observation coming from an ...

Woodly0

434

asked Oct 23 at 13:01

1 vote

2 answers

64 views

How do I onehotencode a single column in a dataframe?

I have a dataframe called "vehicles" with 8 columns. Seven are numerical, but the column named 'Car_name', which is at index 1 in the dataframe, is categorical. I need to encode it. I tried this ...

levi mungai

1

asked Oct 22 at 15:12

1 vote

1 answer

45 views

How to check whether an sklearn estimator is a scaler?

I'm writing a function that needs to determine whether an object passed to it is an imputer (can check with isinstance(obj, _BaseImputer)), a scaler, or something else. While all imputers have a ...

ascripter

6,205

asked Oct 21 at 9:47

-3 votes

2 answers

64 views

ValueError: could not convert string to float: '?' while working with MSE

I am using the auto-mpg dataset. The link to the dataset is: https://www.kaggle.com/datasets/uciml/autompg-dataset My code is below: df = pd.read_csv('data/auto-mpg.csv') df....

Sahil Mantoo

15

asked Oct 20 at 14:49

1 vote

1 answer

62 views

How to pass parameters to this sklearn Cox model in a Pipeline?

If I run the following Python code it works well: target = 'churn' tranOH = ColumnTransformer([ ('one', OneHotEncoder(drop='first', dtype='int'), make_column_selector(dtype_include='category', ...

skan

7,710

asked Oct 16 at 18:58

-1 votes

1 answer

43 views

Why does mean prediction go flat after more data points are added in Gaussian Process Regressor

I'm trying to do Bayesian optimization in a robot simulator to find optimal Kd and Kp values that fit a desired trajectory (sinusoidal motion). First I make some random movements of the arm using ...

ethandsz

31

asked Oct 16 at 10:59

4 votes

2 answers

157 views

Python Regime Labeling Using Explicit Threshold for Increase/Decrease from peak to trough

I am trying to find the longest stretches of time in a time series where the value from start to end increases by at least a certain threshold without any declines in the interim by at least that ...

cpage

39

asked Oct 15 at 23:30

0 votes

1 answer

44 views

Is it normal for a class in multiclass SVM to have nearly all data points as support vectors?

I’m using scikit-learn’s SVC for multiclass classification of the iris dataset and one class has almost all its data points as support vectors. Is this expected, or could there be an issue with my ...

paul

21

asked Oct 15 at 14:28

0 votes

1 answer

68 views

Gaussian process binary classification: why is the variance with GPy much smaller than with scikit-learn?

I'm learning about binary classification with Gaussian Processes and I am comparing GPy with scikit-learn on a toy 1D problem inspired by Martin Krasser's blog post. Both implementations (GPy and ...

olamarre

175

asked Oct 14 at 13:10

2 votes

1 answer

62 views

PNG conversion into Scikit learn digit format [closed]

I need some help converting an RGBA PNG into the correct format for digit recognition in scikit-learn. This is my code: image = Image.open(image_path) print (image.size) print (image.mode) ...

Fede Lists

39

asked Oct 12 at 0:18

1 vote

1 answer

38 views

Grid Search gives nan for best_score with LOGO or LOO, not k-fold CV

I have a nan R2 score problem with grid search. FDODB=pd.read_excel('Final Training Set for LOGO.xlsx') array = FDODB.values X = array[:,2:126] Y = array[:,1] Compd = array[:,0] scaler = ...

Nakyung Lee

11

asked Oct 11 at 6:01
