All Questions
Tagged with python scikit-learn
21,959 questions
0
votes
0
answers
11
views
HistGradientBoostingClassifier - ValueError: could not convert string to float
I'm running a HistGradientBoostingClassifier with some categorical features and getting a 'ValueError: Columns must be same length as key' error. I defined the categorical features through the ...
-3
votes
1
answer
36
views
How could I make the accuracy better in my decision tree? [closed]
Here is my code
# Prepare target and features
target_column = 'resolution'
X = data.drop(columns=[target_column])
y = data[target_column]
# Convert categorical data to binary/numerical where needed
...
0
votes
0
answers
24
views
What is the alpha parameter for L1 normalization in scikit-learn QuantileRegressor [migrated]
An example in the documentation for scikit-learn Quantile Regression sets the parameter alpha to zero. The default is 1.
The documentation for QuantileRegressor shows the ...
1
vote
1
answer
55
views
Getting "TypeError: ufunc 'isnan' not supported for the input types"
I am doing a Machine Learning project to predict the prices of electric cars on Jupyter Notebook.
I run these cells:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
cols = ['County'...
1
vote
0
answers
26
views
How to save XGBoost model in ipynb and load in javascript in order to call the model and prompt input from user and get predicted value?
How can I link a trained model (an ipynb or Python file stored locally) to JavaScript (the front end)?
I have a trained XGB model using some features(float) to predict one
value (carbon intensity).
I want to ...
-1
votes
0
answers
30
views
normalization vs minmax scaling on iris dataset [closed]
I am running some experiments on the Iris Dataset.
I am facing different behaviours between MinMaxScaler and minimize.
Even though I know I shouldn't normalize nor standardize data, I tried it (for ...
2
votes
1
answer
27
views
Create custom kernel for GPR
I would like to write a RBF kernel that is working only in a specific range on X axis. I tried to write a class that contains a RBF kernel to test the code
class RangeLimitedRBFTest(Kernel):
def ...
0
votes
0
answers
66
views
KNeighborsClassifier predict throws "Expected 2D array, got 1D array instead" [duplicate]
I am writing an image similarity algorithm. I am using cv2.calcHist to extract image features. After the features are created I save them to a json file as a list of numpy.float64:
list(numpy.float64(...
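A minimal sketch of the usual cause of this error, assuming the query histogram is loaded back from JSON as a flat list: `predict` expects shape (n_samples, n_features), so a single sample has to be reshaped to one row. The training data and feature length here are made up for illustration and are not from the question.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-in for stored histogram features: 20 images, 8 bins each.
rng = np.random.default_rng(0)
X_train = rng.random((20, 8))
y_train = np.arange(20) % 2

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# A single feature vector read back from JSON arrives as a flat list (1D).
query = np.asarray(X_train[0].tolist())

# reshape(1, -1) turns the 1D vector into one row with n_features columns.
pred = knn.predict(query.reshape(1, -1))
```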
0
votes
1
answer
40
views
ImportError: cannot import name 'Memory' from 'joblib' [closed]
I get an error code as:
ImportError: cannot import name 'Memory' from 'joblib'
This happened after installing some new packages, probably tensorflow-intel. How to solve this or which versions I have ...
0
votes
1
answer
49
views
ValueError: could not convert string to float: 'N/' [closed]
I am working on a Machine Learning project that predicts the price of electric cars in Jupyter Notebook.
I run this cell on Jupyter Notebook:
p = regressor.predict(df2)
I get this error:
ValueError ...
0
votes
1
answer
36
views
Scikit-learn estimator that accepts non array-like input
I am developing a scikit-learn estimator, the documentation states that the input "X" for the .fit() method should be an "array-like of shape (n_samples, n_features)".
However, my ...
-2
votes
0
answers
30
views
NotFittedError: This DecisionTreeClassifier instance is not fitted yet in Jupyter [closed]
I am working on a Machine Learning project on analyzing the cheapest electric vehicles on Jupyter Notebook.
I run this cell on Jupyter Notebook:
fig = plt.figure(figsize=(25,20))
tree.plot_tree(clf)
...
0
votes
0
answers
11
views
TruncatedSVD and randomized_svd from sklearn yield a different matrix U
I recently ran into the following dilemma when using 2 sklearn implementations:
Suppose I have an array A of the shape (p, q) that I want to decompose using truncated SVD for let's say the 10 first ...
-1
votes
0
answers
37
views
Fit multi-variable polynomial model with Python [closed]
How can I use python packages to fit a polynomial model that connects between data set x:
xa1 xb1 xc1
xa2 xb2 xc2
...
.
.
to a data set y:
ya1 yb1 yc1
ya2 yb2 yc2
...
.
.
.
I am trying to fit a ...
0
votes
0
answers
50
views
HistGradientBoostingClassifier Using Categorical Variables
I landed on the HistGradientBoostingClassifier because, according to the docs, it has native support for categorical variables without the need for one-hot encoding or label encoding. I tried to run ...
0
votes
0
answers
25
views
AttributeError: module 'scikit_posthocs' has no attribute 'plot_cd_diagram'
I'm trying to plot the critical difference diagram in Nemenyi post-hoc test using the scikit-learn.
# Step 6: Visualize the Results with Critical Difference Diagram
plt.figure(figsize=(8, 6))
sp....
-1
votes
0
answers
19
views
BracketError when using diffxpy for a scRNAseq experiment
I am trying to use diffxpy to do differential gene expression analysis between 2 samples (batch) in one cell type from a scRNAseq experiment and I am using the following code:
#subsetting an adata ...
0
votes
0
answers
22
views
Sagemaker's SklearnModel requirements.txt not getting installed
This is my code:
from sagemaker.sklearn import SKLearnModel
role = sagemaker.get_execution_role()
model = SKLearnModel(
model_data= f"s3://{default_bucket}/{prefix}/model.tar.gz",
...
0
votes
2
answers
35
views
Feature Importance with ColumnTransform and OneHotEncoder in RandomForestClassifier
Apologies for bothering you, but I haven't been able to find a definitive answer after searching the site.
I'm building a RandomForestClassifier on some clinical data where the target variable (...
0
votes
1
answer
18
views
Implementing sklearn.ensemble.GradientBoostingRegressor with sklearn.multioutput.MultiOutputRegressor and sklearn.model_selection.RandomizedSearchCV
I'm trying to create models that support multivariate output. One of the models I'm trying to use is the GradientBoostingRegressor which does not natively support multivariate output. There is a ...
0
votes
0
answers
30
views
Polynomial regression with 2D array
I have two 2D datasets X and Y, which respectively represent experimental and theoretical data. In the two cases, each row corresponds to a sample (one physical configuration), and each column ...
0
votes
0
answers
46
views
Cannot import name '_check_array_key' from 'skfda._utils' in Python 3.12.7
I am trying to use the skfda library in my Python code. I have already installed the package using `pip install scikit-fda`; however, the error was a little bit strange to me. Can anyone explain why this ...
0
votes
0
answers
29
views
How to use Keras with Optuna tuning and Sklearn Pipeline
I am developing a model using Keras and using Optuna for the hyperparameter tuning.
I need to use K-fold method for the development.
However, I cannot successfully run it.
Please help.
Here is the code:
...
-1
votes
0
answers
39
views
One Hot Encoding Feature Mismatch Issue
I am doing a Kaggle Challenge which requires us to predict the 12 product ids customers are most likely to purchase based on their past history. It would take way too long to go through all the ...
0
votes
0
answers
55
views
How to reduce the size of Numpy data type
I am using Python to do cosine similarity.
similarity_matrix = cosine_similarity(tfidf_matrix)
The problem is that I am getting this error
MemoryError: Unable to allocate 44.8 GiB for an array with ...
1
vote
1
answer
69
views
Ignore NaN to calculate mean_absolute_error
I'm trying to calculate MAE (Mean absolute error).
In my original DataFrame, I have 1826 rows and 3 columns. I'm using columns 2 and 3 to calculate MAE.
But, in column 2, I have some NaN values.
When ...
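One way to handle this, sketched under assumptions: drop the rows where either of the two compared columns is NaN, then compute the error on what remains. The DataFrame and its column names (`observed`, `predicted`) are hypothetical placeholders for the question's columns 2 and 3.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error

# Hypothetical 3-column frame; the real one has 1826 rows.
df = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=5),
    "observed": [1.0, np.nan, 3.0, 4.0, np.nan],
    "predicted": [1.5, 2.0, 2.0, 4.0, 5.0],
})

# Keep only rows where both compared columns are present.
valid = df[["observed", "predicted"]].dropna()

mae = mean_absolute_error(valid["observed"], valid["predicted"])
```

Dropping rows changes the number of samples behind the score, so it is worth reporting `len(valid)` alongside the MAE.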
0
votes
0
answers
30
views
Efficient parallelization of silhouette score calculation
I have a large dataset (2 million rows, 100 columns), and I need to perform clustering. I used the elbow method to determine the optimal number of clusters. However, in order to get a more refined ...
-2
votes
1
answer
73
views
Problem in installing and using scikit-learn in Python
I want to run the following code in jupyter-notebook
from sklearn.linear_model import SGDRegressor
before that, I have used the following lines in windows cmd:
python -m venv sklearn-env
sklearn-env\...
-2
votes
1
answer
65
views
Cannot convert dataframe column to a int64 data type
I have a problem.
In my Pandas DataFrame, I have a column called 'job' column. I've created a simple and custom transformer that will map values in that column that corresponds to the type of job. The ...
0
votes
2
answers
49
views
Columns are missing after imputing and creating dummy variables. How should I fix this?
In short: My columns are different between train set and test set after imputing.
Code of making train, test dataset
random_state_value = 0
#Define target
X = data.drop(columns = 'income', axis=1)
y =...
-1
votes
1
answer
50
views
How to get scikit-learn to ensure that all prediction outputs should sum to 100%?
I have a 'MultiOutputRegressor' which is based on a 'LinearRegression' regressor.
I am using it to predict three outputs per row of X_data (like a classifier) which represent the percentage likelihood ...
0
votes
1
answer
65
views
How to create a scaler applying log transformation and MinMaxScaler in sklearn
I want to apply log() to my DataFrame and MinMaxScaler() together.
I want the output to be a pandas DataFrame() with indexes and columns from the original data.
I want to use the parameters used to ...
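A possible sketch of this, assuming all values are strictly positive so that `np.log` is defined: chain a `FunctionTransformer` and a `MinMaxScaler` in a pipeline, then wrap the result back into a DataFrame with the original index and columns. The sample data and the helper name `to_frame` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler

# Hypothetical strictly positive data with a non-default index.
df = pd.DataFrame(
    {"a": [1.0, 10.0, 100.0], "b": [2.0, 4.0, 8.0]},
    index=["r1", "r2", "r3"],
)

# The log is applied first; MinMaxScaler then learns min/max on the logged values.
log_minmax = make_pipeline(FunctionTransformer(np.log), MinMaxScaler())


def to_frame(values, like):
    """Wrap a transformed array in a DataFrame shaped like the input frame."""
    return pd.DataFrame(values, index=like.index, columns=like.columns)


scaled = to_frame(log_minmax.fit_transform(df), df)

# The fitted parameters are reused on new data via transform (no refit).
new = pd.DataFrame({"a": [10.0], "b": [4.0]}, index=["r4"])
scaled_new = to_frame(log_minmax.transform(new), new)
```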
0
votes
2
answers
63
views
How do I set ‘random_state’ correctly so that my results are always the same?
If I have for example this snippet of code:
knn = KNeighborsClassifier()
grid_search_knn = GridSearchCV(
estimator=knn,
n_jobs=-1)
Do I have to set it like this:
knn = KNeighborsClassifier(...
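For context, a hedged sketch: `KNeighborsClassifier` has no `random_state` parameter, and `n_jobs` only affects parallelism, so with this estimator the randomness that matters is in how the folds are drawn. Passing a seeded splitter as `cv` is one way to fix it. The iris data and the parameter grid below are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# A shuffled splitter with an integer seed yields the same folds on every call.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def run_search():
    """Run the same grid search from scratch and return its best result."""
    search = GridSearchCV(
        estimator=KNeighborsClassifier(),
        param_grid={"n_neighbors": [3, 5, 7]},  # illustrative grid
        cv=cv,
    )
    search.fit(X, y)
    return search.best_params_, search.best_score_


first = run_search()
second = run_search()
```

Estimators that do use randomness internally (for example random forests) would additionally need their own `random_state` set.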
1
vote
1
answer
42
views
scikit-learn classifiers and regressors caching training data?
I have some 22,000 rows of training data.
I use train_test_split to get training and testing data.
I run fitting and then get some idea of how well fitting went using various methods or estimation.
I ...
0
votes
0
answers
23
views
Encountered NaN value in between pipeline steps, sklearn's custom estimators and imblearn's custom sampler
I was trying a custom estimator and a custom sampler. MyFeatureConcator and MyFeatureResampler are the custom estimators that I would like to use as steps in my pipeline.
The error encountered is as:
...
0
votes
0
answers
133
views
Recursive one-step forecasting in timeseries model
I am trying to implement a recursive one-step forecasting approach for a Random Forest model.
The idea is to get a 12-months forecast in an iterative way where each prediction becomes part of the ...
0
votes
0
answers
53
views
Error when running mlflow with sklearn model
I'm training a RandomForestRegressor and keeping track of it using mlflow
Using the following code works perfectly only when n_estimators is lower than 90
The code:
import mlflow
import mlflow.sklearn
...
0
votes
1
answer
39
views
How to apply multiple estimator on multiple number of features to select the combination with highest f1 score?
I would like to run recursive feature elimination with multiple estimator algorithms on multiple number of features and keep the highest f1 score combination on the test data.
Instead of reviewing the ...
0
votes
0
answers
42
views
How can I install scikit-learn from source using my locally modified clone
I have made modifications to the scikit-learn codebase that I want to test locally, but when I run pytest, I get the error "scikit-learn is not built correctly".
So I try to build it by ...
3
votes
2
answers
76
views
How to preserve data types when working with pandas and sklearn transformers?
While working with a large sklearn Pipeline (fit using a DataFrame) I ran into an error that led back to a wrong data type of my input. The problem occurred on a single observation coming from an ...
1
vote
2
answers
64
views
How do I onehotencode a single column in a dataframe?
I have a dataframe called "vehicles" with 8 columns. 7 are numerical, but the column named 'Car_name', which is at index 1 in the dataframe, is categorical. I need to encode it.
I tried this ...
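A minimal sketch of one common approach, assuming a small made-up version of the "vehicles" frame: a `ColumnTransformer` applies `OneHotEncoder` to 'Car_name' only and passes the numeric columns through unchanged. The numeric column names and values are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical reduced frame; the real one has 7 numeric columns.
vehicles = pd.DataFrame({
    "mpg": [18.0, 15.0, 24.0, 22.0],
    "Car_name": ["ford", "buick", "toyota", "ford"],
    "weight": [3504, 3693, 2372, 2833],
})

encoder = ColumnTransformer(
    transformers=[
        ("car_name", OneHotEncoder(handle_unknown="ignore"), ["Car_name"]),
    ],
    remainder="passthrough",  # keep the numeric columns as they are
    sparse_threshold=0,       # always return a dense array
)

# Output columns: one-hot columns first (categories sorted), then the rest.
encoded = encoder.fit_transform(vehicles)
```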
1
vote
1
answer
45
views
How to check whether an sklearn estimator is a scaler?
I'm writing a function that needs to determine whether an object passed to it is an imputer (can check with isinstance(obj, _BaseImputer)), a scaler, or something else.
While all imputers have a ...
-3
votes
2
answers
64
views
ValueError: could not convert string to float: '?' while working with MSE
I am using the auto-mpg dataset. The link to the dataset is below:
https://www.kaggle.com/datasets/uciml/autompg-dataset
I am giving the code below:
df = pd.read_csv('data/auto-mpg.csv')
df....
1
vote
1
answer
62
views
How to pass parameters to this sklearn Cox model in a Pipeline?
If I run the following Python code it works well:
target = 'churn'
tranOH = ColumnTransformer([ ('one', OneHotEncoder(drop='first', dtype='int'),
make_column_selector(dtype_include='category', ...
-1
votes
1
answer
43
views
Why does mean prediction go flat after more data points are added in Gaussian Process Regressor
I'm trying to do a Bayesian optimization in a robot simulator to find optimal Kd and Kp values that fit a desired trajectory (Sinusoidal motion). First I make some random movements of the arm using ...
4
votes
2
answers
157
views
Python Regime Labeling Using Explicit Threshold for Increase/Decrease from peak to trough
I am trying to find the longest stretches of time in a time series where the value from start to end increases by at least a certain threshold without any declines in the interim by at least that ...
0
votes
1
answer
44
views
Is it normal for a class in multiclass SVM to have nearly all data points as support vectors?
I’m using scikit-learn’s SVC for multiclass classification of the iris dataset and one class has almost all its data points as support vectors. Is this expected, or could there be an issue with my ...
0
votes
1
answer
68
views
Gaussian process binary classification: why is the variance with GPy much smaller than with scikit-learn?
I'm learning about binary classification with Gaussian Processes and I am comparing GPy with scikit-learn on a toy 1D problem inspired by Martin Krasser's blog post. Both implementations (GPy and ...
2
votes
1
answer
62
views
PNG conversion into Scikit learn digit format [closed]
I would need some help for converting a PNG RGBA into the correct format for digit recognition in Scikit learn. This is my code
image = Image.open(image_path)
print (image.size)
print (image.mode)
...
1
vote
1
answer
38
views
Grid Search gives nan for best_score with LOGO or LOO, not k-fold CV
I have a nan R2 score problem with grid search.
FDODB=pd.read_excel('Final Training Set for LOGO.xlsx')
array = FDODB.values
X = array[:,2:126]
Y = array[:,1]
Compd = array[:,0]
scaler = ...