Newest 'k-means' Questions

0 votes

0 answers

29 views

Why bother with k-means number of clusters? Why not generate them all and see which one works? [closed]

I'm a sociologist with a CS background. I'm analyzing longitudinal data and I'm not up to speed with the statistical lingo around the whole thing. I'm trying to figure out the statistical names of the ...

Guillaume

1

asked Dec 7 at 13:39

2 votes

0 answers

16 views

How to cluster based on x and y coordinates

I am trying to identify rows in groups of points using clustering algorithms. The bigger picture problem I'm trying to solve is to identify shelves given x and y coordinates of products. I can cluster ...

Tommy Wolfheart

121

asked Dec 4 at 13:17

0 votes

0 answers

11 views

Identify predictors for clustering output?

I have a dataset with variables collected years ago, and many variables collected this year as outcome variables. I want to combine all the variables collected this year to get one outcome, e.g. ...

NPpsy

43

asked Nov 15 at 15:20

1 vote

0 answers

19 views

Question about running k means cluster analysis

In a previous analysis I had 3 groups of subjects - group x with 35 subjects, control group y with 25 subjects, and control group z with 25 subjects. For each group I have levels of 6 different ...

FastBallooningHead

511

asked Oct 18 at 20:39

1 vote

0 answers

40 views

Question on using the elbow method for calculating ideal number of clusters for k means cluster analysis

Newb to cluster analysis here. I have a group of 35 subjects. For all of the subjects I have data for different measures of IQ (verbal, math, etc) and different biomarkers. There are 6 IQ measures in ...

FastBallooningHead

511

asked Oct 7 at 10:42

0 votes

0 answers

19 views

Clustering Mixed Data Types: Algorithm Selection, Distance Measurement, and Feature Weighting

I have a database of 74,000 records with 29 features. Fourteen of these features are categorical and are either 0 or 1, while the other 15 features are continuous and have been normalized and scaled ...

peiman razavi

43

asked Aug 31 at 7:46

1 vote

1 answer

28 views

Is this the right approach to cluster using many different evaluations on the same dimension?

I'm working on a project where I want to sort political parties into two groups. I want to do so using the answers of many respondents in a survey who indicated for each party where they see them on a ...

p0.051

asked Aug 30 at 16:28

1 vote

0 answers

38 views

What if PCA is unable to group my samples, but K-means perfectly clusters them? Is there any problem with my data analysis? Is it possible? [closed]

I am not an expert, but I am currently using unsupervised methods to better explain my mass spectrometry data obtained via DART-MS analyses. I am still learning. It turned out that when analyzing my ...

Isabela

11

asked Aug 5 at 14:26

0 votes

0 answers

14 views

Manual calculation of the C-index for clustering [duplicate]

Can anyone give me a worked example of the C-index cluster validity test, calculated manually?

Raaa

1

asked Jul 23 at 16:37
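For readers who want to check the arithmetic by hand, here is a minimal sketch of the C-index under its usual definition, C = (S_w - S_min) / (S_max - S_min). S_w is the sum of the n_w within-cluster pairwise distances, and S_min and S_max are the sums of the n_w smallest and n_w largest pairwise distances overall. The function name `c_index` and the four toy points are assumptions made for illustration; they do not come from the question.

```python
import math
from itertools import combinations


def c_index(points, labels):
    """C-index of a labeling: values near 0 indicate compact clusters."""
    pairs = list(combinations(range(len(points)), 2))
    # All pairwise Euclidean distances, sorted ascending.
    distances = sorted(math.dist(points[i], points[j]) for i, j in pairs)
    # Distances between points that share a cluster label.
    within = [
        math.dist(points[i], points[j])
        for i, j in pairs
        if labels[i] == labels[j]
    ]
    n_w = len(within)
    if n_w == 0:
        raise ValueError("no within-cluster pairs")
    s_w = sum(within)
    s_min = sum(distances[:n_w])                   # n_w smallest distances
    s_max = sum(distances[len(distances) - n_w:])  # n_w largest distances
    if s_max == s_min:
        raise ValueError("C-index is undefined when S_max equals S_min")
    return (s_w - s_min) / (s_max - s_min)


# Toy 1-D data (hypothetical): two tight pairs far apart.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = c_index(points, [0, 0, 1, 1])  # clusters match the tight pairs
bad = c_index(points, [0, 1, 0, 1])   # clusters split the tight pairs
```

By hand, the six pairwise distances are 1, 1, 9, 10, 10, 11 and each labeling has n_w = 2 within-cluster pairs, so S_min = 2 and S_max = 21. The matching labeling gives S_w = 2 and C = 0. The mismatched labeling gives S_w = 20 and C = 18/19.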

0 votes

0 answers

25 views

Calculation of the C-index for clustering [duplicate]

Can anyone give me a worked example of the C-index cluster validity test, calculated manually?

Raaa

1

asked Jul 23 at 15:14

1 vote

0 answers

22 views

Spatial Temporal Clustering evenly spaced over time

I have a large dataset of spatio-temporal data. It has longitude and latitude coordinates, and a date for each observation. For example, with columns Long, Lat, Date: 50, 20.43, 9-19-2010; 51, 19.5, 10-4-2010; 51, 19.3, ...

Robertmg

121

asked Jul 10 at 17:03

0 votes

0 answers

11 views

What are the right metrics to validate the performance of a custom clustering model with three possible outcomes?

I have developed a custom clustering model on top of MiniBatchKmeans that has three possible outcomes for each data point: (1) assign the point to the correct cluster; (2) assign the point to the wrong ...

Sanjay Mythili

1

asked Jun 20 at 7:19

0 votes

0 answers

24 views

Curse of dimensionality in Time series with K-means

I have been looking at the following notebook: time series clustering where the writer says that the dataset is affected by the "Curse of Dimensionality", so applying TimeSeriesKMeans ...

Zackbord

1

asked May 30 at 20:45

3 votes

1 answer

28 views

What is "clall" in index.Gap in "clusterSim" R package?

I am using the "clusterSim" package in my project (https://cran.r-project.org/web/packages/clusterSim/clusterSim.pdf, page 39) and I do not understand the meaning of the "clall" ...

user2702

51

asked May 29 at 10:05

0 votes

0 answers

62 views

Variable importance in cluster analysis

I'm new to cluster analysis. I have read a lot, but I'm not able to understand how the variables are ordered into clusters. I mean, I find that my data are clustered into 3 different clusters, but ...

Riccardo

101

asked May 24 at 11:35

0 votes

0 answers

16 views

Should the same environmental variable measured with different methods be removed before K-means? What about variables repr. sep. and by their ratio?

So I'm running the K-means clustering algorithm on environmental variables measured at different locations. The aim is to see if the environmental variables can be clustered into separate clusters. Same ...

Cordex

77

asked Apr 28 at 10:23

1 vote

0 answers

125 views

k-means clustering on a probability distribution instead of a dataset

Normally, clustering algorithms such as $k$-means are defined on a dataset in the following sense: if $D$ is a dataset, find a partition of $D$ into sets $\{S_1, \dots, S_n\}$ that minimises the ...

Harry Partridge

111

asked Apr 18 at 8:06

0 votes

1 answer

150 views

Applying clustering algorithms after t-SNE in R

So I'm doing my bachelor's work and I'm applying different clustering algorithms to certain data. Before any of the clustering, of course, I'm using a dimensionality reduction algorithm such as t-SNE for ...

Danielius Lesun

asked Apr 2 at 8:42

2 votes

1 answer

204 views

What is the standard threshold value that is best for accuracy when employing Euclidean distance as a metric for gauging textual similarity?

I'm using Euclidean distance as a metric to compare two sentences for similarity while clustering them using my custom incremental KMeans algorithm. The current threshold value I'm using is 0.7 which ...

sanjay M

21

asked Apr 1 at 7:32

0 votes

0 answers

10 views

What is normalized winning frequency in kernel self-organizing map (SOM)?

In the k-means based kernel SOM, proposed by MacDonald and Fyfe (2000), the update of the mean is based on a soft learning algorithm $m_i(t+1) = m_i(t) + \Lambda[\phi(x) - m_i(t)]$ where $\Lambda$ is the normalized ...

Anshuman Jayaprakash

1

asked Mar 1 at 10:51

0 votes

0 answers

43 views

Why does this K-Means cluster example show 'overlap' between clusters?

I was reading the hypertools docs and came across this pictorial that shows 10 clusters (some seem to share very similar coloring) generated from some (mushroom) ...

Vincent Karuri

1

asked Mar 1 at 7:56

1 vote

0 answers

68 views

K-means clustering - weird PCA visualization

I performed PCA on 4 variables, and the results are shown in this visualization: At first glance it doesn't look convincing, and some clusters seem weird. The data was cleaned and standardized beforehand. Only ...

Simon

11

asked Feb 22 at 8:22

0 votes

1 answer

20 views

K means clustering of image with k=1 vs mean of all pixels

I have relatively uniformly colored images and I extracted colors using k-means. K-means with k = 1 showed the best results for my modeling purposes, k = 2 not so much, and with k = 3 there ceased to ...

phil27

1

asked Feb 12 at 11:45
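On the comparison in the title: with a single cluster, every pixel is assigned to the only centroid, and the update step replaces that centroid with the mean of its members. The sketch below shows this equivalence on three made-up RGB pixels. The helper names `mean_color` and `kmeans_k1` and the pixel values are hypothetical and are not taken from the question.

```python
def mean_color(pixels):
    """Per-channel arithmetic mean of a list of RGB tuples."""
    n = len(pixels)
    return tuple(sum(channel) / n for channel in zip(*pixels))


def kmeans_k1(pixels, start, max_iter=10):
    """Lloyd iterations with k = 1, started from an arbitrary centroid."""
    centroid = start
    for _ in range(max_iter):
        members = pixels  # with k = 1 every pixel belongs to the only cluster
        updated = mean_color(members)
        if updated == centroid:  # centroid stopped moving
            break
        centroid = updated
    return centroid


# Hypothetical pixels from a mostly red image.
pixels = [(200, 10, 10), (210, 20, 10), (190, 30, 40)]
from_kmeans = kmeans_k1(pixels, start=(0.0, 0.0, 0.0))
from_mean = mean_color(pixels)
```

Under these assumptions both values are the same tuple whatever starting centroid is chosen, so any difference seen in practice would have to come from something other than the clustering step itself, such as preprocessing or color-space conversion.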

0 votes

0 answers

21 views

Method for pairwise ordering two datasets

Given two rather small but unordered multidimensional vectors/datasets (e.g., sets of a handful of 3D coordinates), what is a simple method for pairwise alignment/ordering? I've thought about using ...

joaocandre

161

asked Feb 11 at 19:29

1 vote

1 answer

59 views

Elbow method not giving a proper curve in python code

I am trying to determine how many clusters to use for my k-means clustering using different methods. First I used the following code to calculate different metrics per cluster number and different ...

rebwar

11

asked Feb 2 at 15:49

3 votes

2 answers

455 views

Termination conditions for K-means and their interconnection

As far as I know, there are two termination criteria for the K-means clustering algorithm: (1) assignments of data points do not change; (2) centroids do not change. I wonder if there is any kind of relation ...

Artem Tartakovskiy

31

asked Jan 21 at 19:34
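To make the two stopping criteria in the question above concrete, here is a minimal pure-Python sketch of Lloyd's algorithm that reports both after every iteration. The function name `lloyd_kmeans`, its parameters, and the toy data are assumptions for illustration only. Library implementations typically use a tolerance on centroid movement rather than exact equality.

```python
import math


def lloyd_kmeans(points, centroids, max_iter=100):
    """Plain Lloyd iterations; returns which stopping criteria held at the end."""
    assignments = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: index of the nearest centroid for every point.
        new_assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, new_assignments) if a == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        assignments_stable = new_assignments == assignments
        centroids_stable = new_centroids == centroids
        assignments, centroids = new_assignments, new_centroids
        if assignments_stable or centroids_stable:
            return assignments, centroids, iteration, assignments_stable, centroids_stable
    return assignments, centroids, max_iter, False, False


# Toy data (hypothetical): two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers, n_iter, a_stable, c_stable = lloyd_kmeans(
    points, [(0.0, 0.0), (5.0, 5.0)]
)
```

In this sketch, unchanged assignments imply unchanged centroids, because each mean is recomputed from exactly the same members. Conversely, if the update step leaves the centroids where they were, the next assignment step necessarily reproduces the same labels.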

1 vote

1 answer

46 views

Mathematics behind standardizing the data points in machine learning algorithms (e.g., K-means clustering)

For the K-means algorithm, among other methods using distance-based measurements to determine similarity between data points, why do we have to standardize the data points with mean as 0 and standard ...

Sophia

121

asked Jan 13 at 1:58
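A small numeric illustration may help with the question above. Euclidean distance adds squared differences across features, so a feature measured in large units can dominate the total. The sketch below z-scores each column (subtract the mean, divide by the standard deviation) and compares distances before and after. The helper name `standardize` and the income/age rows are made up for illustration.

```python
import math
import statistics


def standardize(rows):
    """Z-score each column: subtract its mean, divide by its standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stdevs))
        for row in rows
    ]


# Hypothetical rows: (income in dollars, age in years).
rows = [(30_000.0, 25.0), (30_500.0, 65.0), (60_000.0, 26.0), (60_500.0, 64.0)]
scaled = standardize(rows)

# Raw distances: the income gap swamps the 40-year age gap.
raw_same_income = math.dist(rows[0], rows[1])  # rows differ mainly in age
raw_same_age = math.dist(rows[0], rows[2])     # rows differ mainly in income

# After standardizing, both features contribute on a comparable scale.
scaled_same_income = math.dist(scaled[0], scaled[1])
scaled_same_age = math.dist(scaled[0], scaled[2])
```

On these made-up rows the raw income-driven distance is far larger than the age-driven one, while the two standardized distances are close to each other. Whether equal weighting is actually desirable depends on the application, which this sketch does not settle.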

0 votes

1 answer

53 views

Continuous monitoring of KMeans model post production

I am in the process of deploying a KMeans model for a customer segmentation use case into production. KMeans doesn’t produce the same results every time and after production cluster sizes and arrangements ...

ibarbo

65

asked Jan 5 at 6:08

0 votes

1 answer

169 views

Proving that K-means corresponds to an EM algorithm?

Just wanted to make sure that my proof is correct and that I am not missing anything in the process. Any thoughts? " To demonstrate mathematically that the K-means algorithm corresponds to an ...

Naomi Pomella

1

asked Dec 18, 2023 at 16:14

2 votes

1 answer

36 views

Can I use kmeans on paired data?

I want to see if a treatment brings patients closer to controls using multiple dependent variables. Can I do kmeans and see if the controls are separate from the patients before treatment, but cluster ...

maglorismyspiritanimal

413

asked Dec 5, 2023 at 14:57

4 votes

2 answers

409 views

Question about Silhouette index calculation using scikit

I am currently working with continuous data measured from different sensors (thermometers and voltmeters). I have a matrix whose columns represent the sensors and the rows are normalized measurements (...

slow_learner

205

asked Oct 25, 2023 at 14:03

0 votes

0 answers

41 views

Turning heatmap into clusters - Classification

Assume that you have a heatmap that looks like this. The goal is to classify all the "dots" inside the image. How can that be done? The assumptions of the image: The image always has black ...

euraad

425

asked Sep 11, 2023 at 20:18

0 votes

1 answer

49 views

In unsupervised learning, is a result of 2 clusters meaningful?

I used both agglomerative clustering and k-means on a dataset and got the results below. The result from agglomerative clustering was evaluated with the silhouette score, while k-means was evaluated with the inertia score. ...

LCheng

219

asked Aug 7, 2023 at 15:14

1 vote

0 answers

18 views

Method to find group associated with a target variable [closed]

The business question that I am trying to answer is: what group(s) of people have the highest chance of default? The features that I have are income, debt to income ratio, fico, etc. How do I find the ...

Victoria B

11

asked Aug 5, 2023 at 22:00

1 vote

1 answer

110 views

How to tell whether segments from K Means clustering result are "successful" and will impact business metrics?

Background: I'm a data analyst. The business unit I'm assigned to needs to segment users into power vs non-power users so they can target each segment with proper treatments. Goal: Segment users (...

Blaze Tama

135

asked Jul 26, 2023 at 5:16

0 votes

1 answer

337 views

Dummy Variable Trap in KMeans Clustering

My data set has a Gender column, so I have to apply one-hot encoding to perform KMeans clustering. Q1. Should I take care about ...

mainak mukherjee

23

asked Jul 21, 2023 at 11:20

2 votes

1 answer

160 views

Clustering algorithm puts data points that are visually far apart in same cluster

I am trying to cluster a very large set of data points, of roughly (20000, 100) shape. I could not run density based DBSCAN or SpectralClustering due to the ...

pingo

29

asked Jun 19, 2023 at 23:20

1 vote

2 answers

384 views

Interpreting results of K-means after PCA

I have this dataset about an airline company's customers with 22 explanatory variables. My goal is to perform some sort of customer segmentation with the k-means algorithm. One problem that I've found ...

ScarceChicken

11

asked Jun 15, 2023 at 11:38

0 votes

0 answers

66 views

General technique for loss function minimization

I was trying to rationalize the K-Means algorithm and came up with the following thoughts. Suppose we need to compute: $T=\min_x L(x)$ but we struggle because $L$ is complex. Suppose we find $L'$ s.t.: ...

Thomas

952

asked May 16, 2023 at 8:54

1 vote

1 answer

1k views

Elbow method Vs Gap statistics, which one? challenging for data scientist

I am working on hourly-weather data. It contains four features: rain, wind speed, humidity, and temperature. Obviously, all of them are continuous values. The number of records is around 17000. Other ...

Asa Ya

73

asked Apr 4, 2023 at 11:51

1 vote

2 answers

92 views

Can I use K-Means to group customers based on a single variable?

I have a test dataset of 11m records. The dataset contains a global customer id and spend figure. I need to group customers into the following categories: 0 = Low, 1 = Low/Med, 2 = Med, 3 = Med/High, 4 = High. I ...

John Edwards

11

asked Mar 17, 2023 at 15:39

0 votes

0 answers

58 views

How to identify the clusters in SSE plot?

How to determine the number of clusters from the following plot?

Niro

1

asked Feb 9, 2023 at 8:06

1 vote

1 answer

336 views

Unsupervised learning: How to identify differences between clusters?

I'm learning about unsupervised learning and I tried to use KMeans, AgglomerativeClustering and DBSCAN on the same dataset. The result was ok, they seem to work fine according to silhouette_score() ...

Antonio Caipora

61

asked Jan 23, 2023 at 21:24

2 votes

0 answers

37 views

Does it make sense to transform a feature containing hours (24h) into two features with xy-coordinates of each hour in the space? [duplicate]

I have a clustering problem that I might solve with an algorithm based on Euclidean distance (e.g. K-Means). One potential feature is the "hour" at which each user began an interaction. As ...

rusiano

566

asked Jan 21, 2023 at 12:55
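The transformation described in the title above can be sketched as placing each hour on the unit circle, so that 23:00 and 00:00 end up as close together as 00:00 and 01:00. The function name `hour_to_xy` and its `period` parameter are assumptions for illustration. Whether this encoding suits a particular clustering problem is a separate question.

```python
import math


def hour_to_xy(hour, period=24.0):
    """Map an hour of day to (x, y) coordinates on the unit circle."""
    angle = 2.0 * math.pi * (hour % period) / period
    return (math.cos(angle), math.sin(angle))


# Euclidean gaps between encoded hours.
gap_23_to_0 = math.dist(hour_to_xy(23), hour_to_xy(0))
gap_0_to_1 = math.dist(hour_to_xy(0), hour_to_xy(1))
gap_0_to_12 = math.dist(hour_to_xy(0), hour_to_xy(12))
```

With the raw hour as a single feature, 23 and 0 would be 23 units apart. With this encoding, adjacent hours are always the same distance apart, and opposite hours (such as 0 and 12) are the farthest apart, at the diameter of the circle.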

0 votes

0 answers

19 views

How do I choose k for k means clustering [duplicate]

Given a set of points, I'm trying to find the right cluster. However, I am lost on what the process is. Here is the graph of all possible points. I am unsure what I should look at

whuang2017

asked Jan 17, 2023 at 18:25

2 votes

1 answer

370 views

Choosing the best clustering algorithm and evaluating the results

I'm trying to separate my data into clusters using the k-means algorithm and the hierarchical algorithm, choose which algorithm fits my data the best, and evaluate the results. However, all of my ...

Jim

61

asked Jan 8, 2023 at 11:45

0 votes

0 answers

31 views

How to interpret the Scatter Plot result from PCA? [duplicate]

I have a project in school about clustering analysis. I have applied standardization and principal component analysis (PCA) to my dataset (I used K-means), which is about heart disease patients. I ...

AK6000W

1

asked Jan 5, 2023 at 8:43

1 vote

1 answer

140 views

In $k$-means, how is it NP-hard if the dimensionality of the data is at least $2$ ($d\geq 2$)?

In $k$-means, how is it NP-hard if the dimensionality of the data is at least $2$ ($d\geq 2$)? Can someone justify or give reasons for this statement? Any guidance would be appreciated.

Maryam Faheem

11

asked Dec 31, 2022 at 16:46

0 votes

0 answers

22 views

K-means on linearly projected features

I am looking for references on K-Means applied to linearly projected features instead of to the original features, in the sense that both K-Means and the projection matrix are learned at the same time....

f10w

213

asked Dec 16, 2022 at 14:02

1 vote

0 answers

48 views

Can K-means put most of the noise in the same cluster?

I am working on clustering text data (very short sentences) vectorized with tf-idf. The data are characterized by high sparseness and the presence of abundant noise (considered here as documents that ...

zurgo

11

asked Dec 15, 2022 at 18:49
