Newest 'sklearn-pandas' Questions

0 votes

0 answers

34 views

Create a new line for comma-separated values in pandas column - I don't want to add new rows, I want to have the same rows in output [duplicate]

I have a dataframe like this, df col1 col2 1 'abc,pqr' 2 'ghv' 3 'mrr, jig' Now I want to create a new line for each comma-separated value in col2, so the output would look ...

Kallol

2,189

asked Dec 6 at 9:31
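
One possible reading of the question above is to keep one row per record and put a line break inside the cell rather than exploding the values into new rows. A minimal sketch, assuming pandas is available and that `col2` holds plain strings; the sample frame is rebuilt from the excerpt and is illustrative only.

```python
import pandas as pd

# Sample frame rebuilt from the excerpt (assumed to hold plain strings).
df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["abc,pqr", "ghv", "mrr, jig"]})

# Replace each comma (and any spaces after it) with a newline inside the cell.
# The row count stays the same; nothing is exploded into extra rows.
out = df.assign(col2=df["col2"].str.replace(r",\s*", "\n", regex=True))

print(out["col2"].tolist())
```

Whether the line break is rendered visibly depends on where the frame is displayed or exported, so this may need adjusting for the target output.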

0 votes

1 answer

31 views

Timestamp issue while creating the model using pipeline in Vertex AI

I am currently utilizing the XGBoost classifier within a pipeline that includes normalization and the XGBoost model itself. The model has been successfully developed in the Notebook environment. The ...

MMM

11

asked Nov 1 at 10:19

0 votes

1 answer

31 views

Cross-Validation Function returns "Unknown label type: (array([0.0, 1.0], dtype=object),)"

Here is the full error: `--------------------------------------------------------------------------- ValueError Traceback (most recent call last) Cell In[33], line 2 ...

nicklaus-slade

3

asked Jul 20 at 21:29

0 votes

0 answers

30 views

Issues with Converting Sklearn Logistic Regression Predicted Probabilities into Scores

I'm trying to convert a logistic regression model into user-level scores, based on this article. y_pred_df['sub_primary'] = logreg.predict_proba(y_pred_df.loc[:, [col for col in y_pred_df.columns if ...

jajastrzemb

11

asked Jul 16 at 19:11

11 votes

2 answers

122k views

How to use DataFrameMapper to delete rows with a null value in a specific column?

I am using sklearn-pandas.DataFrameMapper to preprocess my data. I don't want to impute for a specific column. I just want to drop the row if this column is Null. Is there a way to do that?

topcan5

1,707

asked Jul 13 at 15:34
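
For the question above, one common workaround is to drop the rows in pandas before the frame reaches the mapper, since a per-column transformer is generally expected to return the same number of rows it receives. A minimal sketch, assuming pandas and NumPy are available; the column names and the `required_col` variable are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "age" must be present, "city" may be imputed later.
df = pd.DataFrame({"age": [25.0, np.nan, 40.0], "city": ["NY", "LA", None]})

required_col = "age"  # hypothetical name of the column that must not be null

# Drop only the rows where the required column is null; nulls in other
# columns are left alone so they can still be imputed downstream.
clean = df.dropna(subset=[required_col]).reset_index(drop=True)

print(clean)
```

The cleaned frame can then be passed to `DataFrameMapper.fit_transform` as usual. If a target vector is kept separately, it needs the same row filter applied.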

1 vote

2 answers

53 views

ElasticNetCV in Python: Get full grid of hyperparameters with corresponding MSE?

I have fitted an ElasticNetCV in Python with three splits: import numpy as np from sklearn.linear_model import LinearRegression #Sample data: num_samples = 100 # Number of samples num_features = 1000 ...

george1994

261

asked Jun 26 at 8:49

2 votes

3 answers

93 views

Pandas takes all columns of a dataframe even when some columns are specified

I am trying to train a KMeans model using Scikit-Learn. I have been stuck on this issue for 2 days. Pandas is selecting all columns of a dataframe even though I specified 2 columns. Here is the dataframe in ...

Shree_ML

43

asked May 31 at 8:59

0 votes

0 answers

24 views

_fit_method for KNN gives KD-tree even though I'm working in a high dimensional space

Since the KNeighborsClassifier class in sklearn picks the best algorithm depending on the values passed to the fit method when using auto (which is the default), when accessing the algorithm using ._fit_method I ...

aisha kh

1

asked May 24 at 19:53

1 vote

2 answers

52 views

Using SKLearn KMeans With Externally Generated Correlation Matrix

I receive a correlation file from an external source. It is a fairly straightforward file and looks like the following. A sample csv can be found here https://www.dropbox.com/scl/fi/...

Stumbling Through Data Science

1,950

asked May 18 at 9:40

0 votes

2 answers

69 views

Using a Mask to Insert Values from sklearn Iterative Imputer

I created a set of random missing values to practice with a tree imputer. However, I'm stuck on how to overwrite the missing values in my dataframe. My missing values look like this: from ...

Englishman Bob

483

asked May 6 at 19:14

0 votes

1 answer

170 views

model.fit() class weights do not work when training the model

When calculating classes_weight with from sklearn.utils import class_weight class_weights = class_weight.compute_class_weight(class_weight="balanced", classes=np.unique(...

oliver6626

1

asked May 6 at 17:32

0 votes

0 answers

205 views

Cannot use fetch_california_housing. How to solve this?

I tried executing this code. But I am getting an error. How do I solve this? from sklearn.datasets import fetch_california_housing x = fetch_california_housing() Error ssl.SSLCertVerificationError: [...

dhj

11

asked Apr 13 at 1:48

0 votes

0 answers

54 views

How to pipeline custom function call with dedicated input?

I created a small NLP model with Scikit-Learn and RandomForest step-by-step and it worked very well. The catch is I have to include additional data input in the middle of my pipeline and then package ...

Marc Debureaux

93

asked Apr 12 at 14:08

0 votes

1 answer

32 views

Data cardinality is ambiguous sklearn.train

model.fit(x_train, y_train, epochs=1000) I'm trying to make an AI, but my code gives an error and I don't know how to fix it. This is the error: ValueError: Data cardinality is ambiguous: x sizes: 455 y ...

user24242174

1

asked Apr 11 at 15:20

0 votes

1 answer

127 views

Mlflow log_figure deletes artifact

I am running mlflow with autologging to track an xgboost model. By default, under artifacts it saves the model, requirements, and feature importances. Cool stuff I want to keep. But, if I try to add ...

illan

375

asked Apr 10 at 16:16

0 votes

0 answers

46 views

creating folds for cross validation based on data labels in pandas

I have a dataset from a scientific lab study, with columns specifying a unique number for each person who attended and a time point specifying when each person attended the experiment (a ...

rogolo

3

asked Apr 10 at 12:15

1 vote

1 answer

68 views

multiple linear regression house price r2 score problem

I have sample house price data and simple code: import pandas as pd from sklearn.preprocessing import LabelEncoder, StandardScaler from sklearn.model_selection import train_test_split from sklearn....

mehran arbabian

180

asked Apr 4 at 16:01

0 votes

0 answers

61 views

How can I fix this error? Attempt to get argmax of an empty sequence

I was trying to explore data using Python and this is my code: import pandas as pd import matplotlib.pyplot as plt import seaborn as sns df = pd.read_csv('transactions-pet_store-clean.csv') ...

Thierry irambona

1

asked Mar 30 at 9:52

0 votes

1 answer

69 views

How to transform Dataframe Mapper to PMML?

I want to use multiple PMMLs to keep the transformation of the data and the application of the model separate. Here is the code I am using. I am doing this because I want to include some kind of ...

Habenzu

77

asked Mar 28 at 18:31

0 votes

0 answers

214 views

ImportError: cannot import name '_gb_losses' from 'sklearn.ensemble' ; Error deploying ML python webapp to streamlit community cloud

Output below from the install log: 2024-03-13 00:11:02.729 Uncaught app exception Traceback (most recent call last): File "/home/adminuser/venv/lib/python3.9/site-packages/streamlit/runtime/...

LVB

1

asked Mar 13 at 19:59

0 votes

0 answers

44 views

How to calculate WOE for a dataset with a weights column for both continuous and categorical data

I want to calculate Weight of Evidence for a dataset with weights. I understand the ln(% of events/% of non-events) but how does it apply for data with weights and bins? One way I thought that may or ...

Shivaranjani C

1

asked Feb 26 at 2:46
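
The ln(% of events / % of non-events) formula in the question above can be extended to weighted data by replacing row counts with sums of weights. This is one possible convention, not a confirmed answer from the thread. A minimal sketch in plain Python, where the `weighted_woe` helper and the sample bins are hypothetical; continuous variables are assumed to have been binned beforehand.

```python
import math
from collections import defaultdict

def weighted_woe(bins, events, weights):
    """Weight of Evidence per bin, using weight sums instead of row counts.

    Assumption: each bin has non-zero event and non-event weight;
    otherwise the logarithm is undefined and smoothing would be needed.
    """
    ev = defaultdict(float)   # total event weight per bin
    nev = defaultdict(float)  # total non-event weight per bin
    for b, e, w in zip(bins, events, weights):
        if e:
            ev[b] += w
        else:
            nev[b] += w
    total_ev = sum(ev.values())
    total_nev = sum(nev.values())
    return {
        b: math.log((ev[b] / total_ev) / (nev[b] / total_nev))
        for b in set(ev) | set(nev)
    }

# Hypothetical sample: two bins, binary event flag, one weight per row.
bins = ["low", "low", "high", "high"]
events = [1, 0, 1, 0]
weights = [1.0, 3.0, 2.0, 2.0]

woe = weighted_woe(bins, events, weights)
print(woe)
```

With all weights set to 1 this reduces to the usual unweighted WOE, which is a quick way to sanity-check the result against an existing implementation.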

1 vote

1 answer

156 views

How to get immediate neighbors using a kd-tree irrespective of the spacing?

I want to find the immediate neighbours around a given point in a multidimensional space (up to 7 dimensions). Important facts about the space: non-linear spacing among points within a single ...

skm

5,639

asked Feb 21 at 10:08

0 votes

1 answer

85 views

DataFrameMapper with sklearn2pmml Domains

I have a PMMLPipeline with the following DataFrameMapper inside (Domains are coming from sklearn2pmml, while the Mapper is from sklearn-pandas): {'features': [(['A'], [ContinuousDomain(dtype=<...

Habenzu

77

asked Jan 11 at 16:35

0 votes

0 answers

16 views

LinearDiscriminantAnalysis() classifier in Python recognizes any face even if it does not already exist

My face recognition code is working correctly: it can recognize an already existing face in the pickle file and label it correctly. My problem is that even if the face doesn't exist in the pickle file ...

Osama Mohammed

2,841

asked Dec 23, 2023 at 8:48

0 votes

0 answers

52 views

How do I remove RangeIndex and dtypes from output display?

OUTPUT DISPLAY- This is the output of my project; the only thing remaining is the RangeIndex and dtypes values, which I am unable to remove from the output display. SOURCE CODE- I am working on ...

Somansh Bhayani

3

asked Dec 1, 2023 at 19:01

0 votes

1 answer

57 views

Custom classifier won't accept data from test_train_split in sklearn

I am attempting to write a custom classifier for use in a sklearn gridsearchCV pipeline. I've stripped everything back to the bare minimum in the class which currently looks like this: from sklearn....

Ben

451

asked Nov 13, 2023 at 6:05

0 votes

1 answer

63 views

Using kNN with weighted dataset

I have a dataset df: category var 1 ... var 32 weighting country 1 blue 1.0 54.2 3.0 US 2 pink 0.0 101.0 1.0 other 3 blue 1.0 49.9 3.0 US 4 green 1.0 72.2 9.0 US I'm using the kNN classifier ...

MC Jong

63

asked Oct 30, 2023 at 13:03

1 vote

1 answer

39 views

Even though I have successfully installed sklearn via Jupyter, I cannot access its classes. What mistake did I make?

Even though I have successfully installed sklearn via Jupyter, I cannot access its classes. What mistake did I make? !pip install sklearn import sklean from sklearn.preprocessing import LabelEncoder ...

Fernando

11

asked Oct 10, 2023 at 1:08

0 votes

1 answer

73 views

XGBoost evaluation

I built a model using the XGBoost algorithm to predict precipitation. It turns out that the RMSE is equal to 7.6. Does it mean that the model performs poorly? If so, what would be your piece of advice to ...

Willy Mbenza

1

asked Oct 5, 2023 at 17:21

1 vote

1 answer

50 views

sklearn requirements installation with pip: ensure that binary wheels are used

In the sklearn installation guide for the latest version (1.3.1) it mentions that you can install dependencies with pip, but says "When using pip, please ensure that binary wheels are used, and ...

Arran Duff

1,434

asked Oct 3, 2023 at 19:32

0 votes

0 answers

121 views

How to Implement Custom Distance Metrics in Sklearn Nearest Neighbor

I am trying to implement my own distance metric, specifically Jaro distance, in Sklearn Nearest Neighbour and I am getting back some errors. I've tried looking up online and didn't manage to find a ...

Gabriel Choo

127

asked Sep 26, 2023 at 2:54

1 vote

2 answers

74 views

I want to create a linear regression scatterplot for an assignment. State names are giving me an error (?)

I am working on a project tracking poverty across the US between 1995 and 2020. As I am working on a linear regression scatterplot, I have this: # Create a regression object. regression = ...

EC Cotterman

31

asked Aug 7, 2023 at 14:50

1 vote

0 answers

218 views

Why is InterClusterDistance from yellowbrick failing with "AttributeError: 'NoneType' object has no attribute '_get_renderer'"

I am trying to initialize an InterClusterDistance visualizer from the yellowbrick library. When I execute the following: from sklearn.datasets import make_blobs from sklearn.cluster import KMeans from ...

Data guy

11

asked Aug 3, 2023 at 14:49

1 vote

0 answers

167 views

Using faster pandas groupby class on multiple columns

Short version: I need help applying someone else's groupby class on multiple pandas columns and with more complicated functions. Long version: Someone else (Elizabeth Santorella) wrote a python class ...

Inder Jalli

128

asked Jul 27, 2023 at 15:11

0 votes

1 answer

110 views

MinMaxScaler doesn't scale small values to 1

I found weird behavior in sklearn.preprocessing.MinMaxScaler, and the same in sklearn.preprocessing.RobustScaler. When the data's max value is very small (< 10^(-16)), the transformer doesn't change the data's max value ...

Dima

49

asked Jul 24, 2023 at 10:56

0 votes

1 answer

1k views

How to Prepare the Data for a Logistic Regression Using SKLearn

Hello there, I'm working on an undergraduate data analysis project and am seeking guidance on the following case study: What I'm working with: I have a data frame consisting of 3'891 ...

Dan_San

11

asked Jul 24, 2023 at 9:51

0 votes

1 answer

49 views

Application of Min & Max function

My goal is to normalize my data (minimization & Maximization) for machine learning purposes. The issue I am having is presented when you run the code below. #importation of libraries: import ...

Toy L

55

asked Jul 21, 2023 at 23:32

0 votes

1 answer

34 views

How to count number of occurrences of each element using groupby in pandas column [duplicate]

Following is the df from io import StringIO import pandas as pd df = pd.read_csv(StringIO(""" Group Date Rank A 01-01-2023 1 A 01-02-2023 2 A 01-03-2023 3 A 01-04-2023 2 A 01-05-2023 1 ...

Yogesh Kamboj

41

asked Jul 9, 2023 at 14:40

-1 votes

2 answers

108 views

Sklearn Random Forest: determine the name of features ascertained by parameter grid for model fit and prediction

New to ML here and trying my hand at fitting a model using Random Forest. Here is my simplified code: X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.15, ...

Sinha

446

asked Jul 6, 2023 at 4:45

0 votes

1 answer

418 views

I got the following error: 'DataFrame' object has no attribute 'year'

Picture of csv file containing raw data. I am trying to plot a scatter graph using an online csv file I downloaded in order to get the linear regression. %matplotlib inline plt.scatter(df.year, df....

Jamilu

5

asked Jun 27, 2023 at 21:26

0 votes

1 answer

92 views

How to reverse the encoding of sklearn LabelEncoder() after training the model?

So I am currently creating a machine learning model in Python which predicts the outcome of a football match. Below is the code from the training of the model: features = ['Home Team',..., '...

Andreas Wong

3

asked Jun 16, 2023 at 18:06

0 votes

1 answer

260 views

Sklearn Pipelines - Feature Engineering

I wrote a simple generic XGBoost classifier code that runs with a pipeline. This is the code (with simple config example): import optuna import pickle import pandas as pd from xgboost import ...

Zag Gol

1,076

asked Jun 10, 2023 at 16:23

0 votes

1 answer

71 views

Why are the kmeans centroids far from the data? Python

I'm making a kmeans model with the data from Twitter, but when I apply the polarity and subjectivity analysis on the scatterplot, the centroids (red x) appear far from the data: from sklearn....

Nana

13

asked Jun 6, 2023 at 4:14

-1 votes

1 answer

229 views

Error when trying to fit a dataset. (python)

I am trying to fit a sklearn linear regression model with many points from a pandas dataframe. This is the program: features =["floors", "waterfront", "lat", "...

Legofan35664

11

asked May 18, 2023 at 4:28

-1 votes

1 answer

794 views

ValueError: Input contains NaN, infinity or a value too large for dtype('float64') when using randomizedSearch

I am trying to use RandomizedSearchCV from sklearn on an MLPRegressor model, and I have scaled the data using standardScaler. The code for the model is presented below. When I try to run the code I ...

user17637519

31

asked May 10, 2023 at 11:12

0 votes

0 answers

76 views

ValueError: setting an array element with a sequence. In decisionTreeClassifier fit

I have the following code, I'm just trying to teach myself how to use a machine learning model. import ast import csv import pandas as pd import numpy as np from sklearn.tree import ...

Malelizarazo

1

asked Apr 25, 2023 at 21:33

0 votes

1 answer

201 views

How to find RMSE without test value in python

First, I am wondering if there is a way to find the RMSE value without the y-test value (I can do it if I have a y-test value). For instance, we have train data and test data. But in the test data, we don'...

ash1

433

asked Apr 20, 2023 at 18:09

-1 votes

1 answer

217 views

Do we need to exclude OneHotEncoded columns while standardizing or normalizing using MinMaxScaler() or StandardScaler()?

This is the final cleaned DataFrame (df2) before Standardizing my code: scaler=StandardScaler() df2[list(df2.columns)]=scaler.fit_transform(df2[list(df2.columns)]) df2 This returns a DataFrame after ...

SAJEER AR

3

asked Apr 6, 2023 at 18:42

0 votes

1 answer

238 views

How to get predictions from string data in sklearn

When I convert data from a pandas dataframe to sklearn so I can make predictions, string data becomes problematic. So I used LabelEncoder but it seems to limit me to using the encoded data instead of ...

M.Namjoo

1

asked Mar 27, 2023 at 13:23

2 votes

1 answer

80 views

Complicated double sum using groupby in Pandas dataframe

I have a dataframe that looks like Race_ID Date Student_ID a b 1 1/1/2023 1 3 1 1 1/1/2023 2 2 2 1 1/1/...

Ishigami

405

asked Mar 24, 2023 at 9:26
