All Questions

0 votes
0 answers
11 views

HistGradientBoostingClassifier - ValueError: could not convert string to float

I'm running a HistGradientBoostingClassifier with some categorical features and getting a 'ValueError: Columns must be same length as key' error. I defined the categorical features through the ...
user28824349's user avatar
-3 votes
1 answer
36 views

How could I make the accuracy better in my decision tree? [closed]

Here is my code # Prepare target and features target_column = 'resolution' X = data.drop(columns=[target_column]) y = data[target_column] # Convert categorical data to binary/numerical where needed ...
kevineriksson's user avatar
0 votes
0 answers
24 views

What is the alpha parameter for L1 normalization in scikit-learn QuantileRegressor [migrated]

The documentation for scikit-learn Quantile Regression includes an example where the parameter alpha is set to zero. The default is 1. The documentation for QuantileRegressor shows the ...
user2138149's user avatar
1 vote
1 answer
55 views

Getting "TypeError: ufunc 'isnan' not supported for the input types"

I am doing a Machine Learning project to predict the prices of electric cars on Jupyter Notebook. I run these cells: from sklearn import preprocessing le = preprocessing.LabelEncoder() cols = ['County'...
Steve Austin's user avatar
1 vote
0 answers
26 views

How to save XGBoost model in ipynb and load in javascript in order to call the model and prompt input from user and get predicted value?

How can I link the trained model (ipynb or Python file stored locally) to JavaScript (front end)? I have a trained XGB model using some features (float) to predict one value (carbon intensity). I want to ...
Ryan's user avatar
  • 15
-1 votes
0 answers
30 views

normalization vs minmax scaling on iris dataset [closed]

I am running some experiments on the Iris dataset. I am seeing different behaviours between MinMaxScaler and normalize. Even though I know I shouldn't normalize or standardize the data, I tried it (for ...
Mattia Sospetti's user avatar
2 votes
1 answer
27 views

Create custom kernel for GPR

I would like to write an RBF kernel that works only in a specific range on the X axis. I tried to write a class that contains an RBF kernel to test the code: class RangeLimitedRBFTest(Kernel): def ...
crema997's user avatar
0 votes
0 answers
66 views

KNeighborsClassifier predict throws "Expected 2D array, got 1D array instead" [duplicate]

I am writing an image similarity algorithm. I am using cv2.calcHist to extract image features. After the features are created I save them to a json file as a list of numpy.float64: list(numpy.float64(...
papaya's user avatar
  • 1
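For the shape error in the question above, here is a minimal sketch, assuming the stored features are flat lists of floats and that a single image is being classified at a time. The training data, the three-bin "histograms", and all variable names are made up for illustration; they are not taken from the original post.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training set: four 3-bin histograms with two labels.
X_train = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
])
y_train = np.array([0, 0, 1, 1])

clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

# A single feature vector read back from JSON is a flat (1D) list.
single = np.asarray([0.85, 0.15, 0.0], dtype=np.float64)

# predict expects shape (n_samples, n_features), so add a leading axis
# to turn the 1D vector into a one-row 2D array.
prediction = clf.predict(single.reshape(1, -1))
```

`reshape(1, -1)` treats the vector as one sample with many features; `reshape(-1, 1)` would instead treat it as many samples with one feature each, which is usually not what is wanted here.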
0 votes
1 answer
40 views

ImportError: cannot import name 'Memory' from 'joblib' [closed]

I get the following error: ImportError: cannot import name 'Memory' from 'joblib'. This happened after installing some new packages, probably tensorflow-intel. How to solve this or which versions I have ...
Stine's user avatar
  • 21
0 votes
1 answer
49 views

ValueError: could not convert string to float: 'N/' [closed]

I am working on a Machine Learning project that predicts the price of electric cars in Jupyter Notebook. I run this cell in Jupyter Notebook: p = regressor.predict(df2) I get this error: ValueError ...
Steve Austin's user avatar
0 votes
1 answer
36 views

Scikit-learn estimator that accepts non array-like input

I am developing a scikit-learn estimator. The documentation states that the input "X" for the .fit() method should be an "array-like of shape (n_samples, n_features)". However, my ...
Fumagalli's user avatar
-2 votes
0 answers
30 views

NotFittedError: This DecisionTreeClassifier instance is not fitted yet in Jupyter [closed]

I am working on a Machine Learning project on analyzing the cheapest electric vehicles on Jupyter Notebook. I run this cell on Jupyter Notebook: fig = plt.figure(figsize=(25,20)) tree.plot_tree(clf) ...
Steve Austin's user avatar
0 votes
0 answers
11 views

TruncatedSVD and randomized_svd from sklearn yield a different matrix U

I recently ran into the following dilemma when using two sklearn implementations: Suppose I have an array A of shape (p, q) that I want to decompose using truncated SVD for, let's say, the first 10 ...
Julián Andrés Hernández Potes's user avatar
-1 votes
0 answers
37 views

Fit a multi-variable polynomial model with Python [closed]

How can I use python packages to fit a polynomial model that connects between data set x: xa1 xb1 xc1 xa2 xb2 xc2 ... . . to a data set y: ya1 yb1 yc1 ya2 yb2 yc2 ... . . . I am trying to fit a ...
Shely's user avatar
  • 49
0 votes
0 answers
50 views

HistGradientBoostingClassifier Using Categorical Variables

I landed on the HistGradientBoostingClassifier because, according to the docs, it has native support for categorical variables without the need for one-hot encoding or label encoding. I tried to run ...
aaron_paynev4's user avatar
0 votes
0 answers
25 views

AttributeError: module 'scikit_posthocs' has no attribute 'plot_cd_diagram'

I'm trying to plot the critical difference diagram in Nemenyi post-hoc test using the scikit-learn. # Step 6: Visualize the Results with Critical Difference Diagram plt.figure(figsize=(8, 6)) sp....
Jyoti's user avatar
  • 55
-1 votes
0 answers
19 views

BracketError when using diffxpy for a scRNAseq experiment

I am trying to use diffxpy to do differential gene expression analysis between 2 samples (batch) in one cell type from a scRNAseq experiment and I am using the following code: #subsetting an adata ...
CATA_Bioinfo's user avatar
0 votes
0 answers
22 views

Sagemaker's SklearnModel requirements.txt not getting installed

This is my code: from sagemaker.sklearn import SKLearnModel role = sagemaker.get_execution_role() model = SKLearnModel( model_data= f"s3://{default_bucket}/{prefix}/model.tar.gz", ...
Daniele Gentili's user avatar
0 votes
2 answers
35 views

Feature Importance with ColumnTransform and OneHotEncoder in RandomForestClassifier

Apologies for bothering you, but I haven't been able to find a definitive answer after searching the site. I'm building a RandomForestClassifier on some clinical data where the target variable (...
Aezhel's user avatar
  • 11
0 votes
1 answer
18 views

Implementing sklearn.ensemble.GradientBoostingRegressor with sklearn.multioutput.MultiOutputRegressor and sklearn.model_selection.RandomizedSearchCV

I'm trying to create models that support multivariate output. One of the models I'm trying to use is the GradientBoostingRegressor which does not natively support multivariate output. There is a ...
Cesinco's user avatar
  • 63
0 votes
0 answers
30 views

Polynomial regression with 2D array

I have two 2D datasets X and Y, which respectively represent experimental and theoretical data. In both cases, each row corresponds to a sample (one physical configuration), and each column ...
booo's user avatar
  • 29
0 votes
0 answers
46 views

Cannot import name '_check_array_key' from 'skfda._utils' in Python 3.12.7

I am trying to use the skfda library in my Python code. I have already installed the package using `pip install scikit-fda`; however, the error was a little strange to me. Can anyone explain why this ...
minh chanh le's user avatar
0 votes
0 answers
29 views

How to use Keras with Optuna tuning and Sklearn Pipeline

I am developing a model using Keras and using Optuna for the hyperparameter tuning. I need to use the K-fold method for the development. However, I cannot run it successfully. Please help. Here is the code: ...
HappyFish's user avatar
-1 votes
0 answers
39 views

One Hot Encoding Feature Mismatch Issue

I am doing a Kaggle Challenge which requires us to predict the 12 product ids customers are most likely to purchase based on their past history. It would take way too long to go through all the ...
Dave Patel's user avatar
0 votes
0 answers
55 views

How to reduce the size of Numpy data type

I am using Python to do cosine similarity. similarity_matrix = cosine_similarity(tfidf_matrix) The problem is that I am getting this error MemoryError: Unable to allocate 44.8 GiB for an array with ...
asmgx's user avatar
  • 7,960
1 vote
1 answer
69 views

Ignore NaN to calculate mean_absolute_error

I'm trying to calculate MAE (Mean absolute error). In my original DataFrame, I have 1826 rows and 3 columns. I'm using columns 2 and 3 to calculate MAE. But, in column 2, I have some NaN values. When ...
Daniel M M's user avatar
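One common way to handle the NaN case described above is to drop the incomplete rows before calling `mean_absolute_error`. This is a minimal sketch under that assumption; the column names `observed` and `predicted` and the four sample rows are hypothetical stand-ins for columns 2 and 3 of the asker's DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error

# Hypothetical data: one NaN in the "observed" column.
df = pd.DataFrame({
    "observed": [1.0, np.nan, 3.0, 5.0],
    "predicted": [1.5, 2.0, 2.0, 5.0],
})

# Keep only rows where both columns are present, so the two series
# stay aligned after the NaN rows are removed.
valid = df[["observed", "predicted"]].dropna()

mae = mean_absolute_error(valid["observed"], valid["predicted"])
```

Dropping on both columns at once matters: removing NaN values from one column alone would leave the two inputs with different lengths.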
0 votes
0 answers
30 views

Efficient parallelization of silhouette score calculation

I have a large dataset (2 million rows, 100 columns), and I need to perform clustering. I used the elbow method to determine the optimal number of clusters. However, in order to get a more refined ...
AbliusKarfax's user avatar
-2 votes
1 answer
73 views

Problem in installing and using scikit-learn in Python

I want to run the following code in Jupyter Notebook: from sklearn.linear_model import SGDRegressor. Before that, I ran the following lines in Windows cmd: python -m venv sklearn-env sklearn-env\...
Mohammad Ghadian's user avatar
-2 votes
1 answer
65 views

Cannot convert dataframe column to a int64 data type

I have a problem. In my Pandas DataFrame, I have a column called 'job'. I've created a simple custom transformer that will map values in that column that correspond to the type of job. The ...
coffee_programmer's user avatar
0 votes
2 answers
49 views

Columns are missing after imputing and creating dummy variables. How should I fix this?

In short: My columns are different between the train set and the test set after imputing. Code for creating the train and test datasets: random_state_value = 0 #Define target X = data.drop(columns = 'income', axis=1) y =...
Emma Sul's user avatar
-1 votes
1 answer
50 views

How to get scikit-learn to ensure that all prediction outputs sum to 100%?

I have a 'MultiOutputRegressor' which is based on a 'LinearRegression' regressor. I am using it to predict three outputs per row of X_data (like a classifier) which represent the percentage likelihood ...
Richard's user avatar
  • 1,132
0 votes
1 answer
65 views

How to create a scaler applying log transformation and MinMaxScaler in sklearn

I want to apply log() to my DataFrame and MinMaxScaler() together. I want the output to be a pandas DataFrame() with indexes and columns from the original data. I want to use the parameters used to ...
Guilherme Parreira's user avatar
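The combination asked about above can be sketched as a two-step pipeline: a `FunctionTransformer` wrapping `np.log`, followed by `MinMaxScaler`, with the result wrapped back into a DataFrame. This is only an illustration; it assumes all values are strictly positive (otherwise `np.log` yields `-inf` or NaN), and the sample frame and names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler

# Hypothetical strictly positive data with a custom index.
df = pd.DataFrame(
    {"a": [1.0, 10.0, 100.0], "b": [1.0, np.e, np.e ** 2]},
    index=["r1", "r2", "r3"],
)

# Step 1 applies log element-wise; step 2 learns min/max on the logged data.
pipe = make_pipeline(FunctionTransformer(np.log), MinMaxScaler())

# fit_transform returns a NumPy array, so restore the original labels.
scaled = pd.DataFrame(
    pipe.fit_transform(df), index=df.index, columns=df.columns
)

# The fitted pipeline reuses the learned min/max on new data.
new_scaled = pipe.transform(pd.DataFrame({"a": [10.0], "b": [np.e]}))
```

Because the minimum and maximum are stored in the fitted `MinMaxScaler`, calling `pipe.transform` on later data applies the same parameters rather than refitting.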
0 votes
2 answers
63 views

How do I set ‘random_state’ correctly so that my results are always the same?

If I have for example this snippet of code: knn = KNeighborsClassifier() grid_search_knn = GridSearchCV( estimator=knn, n_jobs=-1) Do I have to set it like this: knn = KNeighborsClassifier(...
markusnether's user avatar
1 vote
1 answer
42 views

scikit-learn classifiers and regressors caching training data?

I have some 22,000 rows of training data. I use train_test_split to get training and testing data. I run the fitting and then get some idea of how well the fitting went using various methods of estimation. I ...
Richard's user avatar
  • 1,132
0 votes
0 answers
23 views

Encountered NaN value in between pipeline steps, sklearn's custom estimators and imblearn's custom sampler

I was trying a custom estimator and a custom sampler. MyFeatureConcator and MyFeatureResampler are the custom estimators that I would like to use as steps in my pipeline. The error encountered is: ...
Sid's user avatar
  • 1
0 votes
0 answers
133 views

Recursive one-step forecasting in timeseries model

I am trying to implement a recursive one-step forecasting approach for a Random Forest model. The idea is to get a 12-month forecast in an iterative way where each prediction becomes part of the ...
seralouk's user avatar
  • 33k
0 votes
0 answers
53 views

Error when running mlflow with sklearn model

I'm training a RandomForestRegressor and keeping track of it using mlflow. The following code works perfectly only when n_estimators is lower than 90. The code: import mlflow import mlflow.sklearn ...
Ivan's user avatar
  • 1,433
0 votes
1 answer
39 views

How to apply multiple estimators to multiple numbers of features to select the combination with the highest F1 score?

I would like to run recursive feature elimination with multiple estimator algorithms on multiple numbers of features and keep the combination with the highest F1 score on the test data. Instead of reviewing the ...
user20159866's user avatar
0 votes
0 answers
42 views

How can I install scikit-learn from source using my locally modified clone?

I have made modifications to the scikit-learn codebase that I want to test locally, but when I run pytest, I get the error "scikit-learn is not built correctly". So I try to build it by ...
success moses's user avatar
3 votes
2 answers
76 views

How to preserve data types when working with pandas and sklearn transformers?

While working with a large sklearn Pipeline (fit using a DataFrame) I ran into an error that led back to a wrong data type of my input. The problem occurred on a single observation coming from an ...
Woodly0's user avatar
  • 434
1 vote
2 answers
64 views

How do I one-hot encode a single column in a dataframe?

I have a dataframe called "vehicles" with 8 columns. Seven are numerical, but the column named 'Car_name', which is at index 1 in the dataframe, is categorical. I need to encode it. I tried this ...
levi mungai's user avatar
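Encoding one column while leaving the rest untouched is typically done with a `ColumnTransformer`. The sketch below assumes that approach; only the frame name `vehicles` and the column `Car_name` come from the question, while the other columns and all values are invented for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical three-row stand-in for the "vehicles" dataframe.
vehicles = pd.DataFrame({
    "mpg": [18.0, 15.0, 24.0],
    "Car_name": ["ford", "buick", "ford"],
    "weight": [3504, 3693, 2372],
})

# One-hot encode only "Car_name"; pass the numeric columns through as-is.
# sparse_threshold=0 forces a dense NumPy array as the combined output.
ct = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["Car_name"])],
    remainder="passthrough",
    sparse_threshold=0,
)

# Encoded columns come first (categories sorted alphabetically),
# followed by the passthrough columns in their original order.
encoded = ct.fit_transform(vehicles)
```

Selecting the column by name rather than by position keeps the transformer working if the column order changes.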
1 vote
1 answer
45 views

How to check whether an sklearn estimator is a scaler?

I'm writing a function that needs to determine whether an object passed to it is an imputer (can check with isinstance(obj, _BaseImputer)), a scaler, or something else. While all imputers have a ...
ascripter's user avatar
  • 6,205
-3 votes
2 answers
64 views

ValueError: could not convert string to float: '?' while working with MSE

I am using the auto-mpg dataset. The link to the dataset is: https://www.kaggle.com/datasets/uciml/autompg-dataset The code is below: df = pd.read_csv('data/auto-mpg.csv') df....
Sahil Mantoo's user avatar
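An error like the one in the title above usually means a numeric column contains a placeholder string. A minimal sketch, assuming the `'?'` marks missing values: tell `read_csv` to parse it as NaN, then drop or impute those rows. The inline CSV and its column names are made up for illustration and stand in for the real file.

```python
import io

import pandas as pd

# Hypothetical CSV text with a '?' placeholder in a numeric column.
csv_text = "mpg,horsepower\n18.0,130\n25.0,?\n22.0,95\n"

# na_values="?" makes pandas read the placeholder as NaN, so the column
# is parsed as float instead of as strings.
df = pd.read_csv(io.StringIO(csv_text), na_values="?")

# Drop rows with a missing value in that column (imputing is an alternative).
df = df.dropna(subset=["horsepower"])
```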
1 vote
1 answer
62 views

How to pass parameters to this sklearn Cox model in a Pipeline?

If I run the following Python code it works well: target = 'churn' tranOH = ColumnTransformer([ ('one', OneHotEncoder(drop='first', dtype='int'), make_column_selector(dtype_include='category', ...
skan's user avatar
  • 7,710
-1 votes
1 answer
43 views

Why does mean prediction go flat after more data points are added in Gaussian Process Regressor

I'm trying to do a Bayesian optimization in a robot simulator to find optimal Kd and Kp values that fit a desired trajectory (sinusoidal motion). First I make some random movements of the arm using ...
ethandsz's user avatar
4 votes
2 answers
157 views

Python Regime Labeling Using Explicit Threshold for Increase/Decrease from peak to trough

I am trying to find the longest stretches of time in a time series where the value from start to end increases by at least a certain threshold without any declines in the interim by at least that ...
cpage's user avatar
  • 39
0 votes
1 answer
44 views

Is it normal for a class in multiclass SVM to have nearly all data points as support vectors?

I’m using scikit-learn’s SVC for multiclass classification of the iris dataset and one class has almost all its data points as support vectors. Is this expected, or could there be an issue with my ...
paul's user avatar
  • 21
0 votes
1 answer
68 views

Gaussian process binary classification: why is the variance with GPy much smaller than with scikit-learn?

I'm learning about binary classification with Gaussian Processes and I am comparing GPy with scikit-learn on a toy 1D problem inspired by Martin Krasser's blog post. Both implementations (GPy and ...
olamarre's user avatar
  • 175
2 votes
1 answer
62 views

PNG conversion into Scikit learn digit format [closed]

I need some help converting an RGBA PNG into the correct format for digit recognition in scikit-learn. This is my code: image = Image.open(image_path) print (image.size) print (image.mode) ...
Fede Lists's user avatar
1 vote
1 answer
38 views

Grid Search gives nan for best_score with LOGO or LOO, not k-fold CV

I have a nan R2 score problem with grid search. FDODB=pd.read_excel('Final Training Set for LOGO.xlsx') array = FDODB.values X = array[:,2:126] Y = array[:,1] Compd = array[:,0] scaler = ...
Nakyung Lee's user avatar
