Newest 'gini' Questions

3 votes

1 answer

470 views

How is Gini impurity related to accuracy when predicting the majority class?

For simplicity, consider the binary case, where we have a set of elements with each element belonging to one of two classes (0 or 1). Let p(j) be the proportion of ...

alex76

31

asked Apr 24 at 19:58

3 votes

2 answers

75 views

Is it possible to calculate a standard deviation from the gini coefficient and mean?

I am looking to create an analysis showing how many people in a given country have more than X dollars in income. I know the average income, population count, and Gini Coefficient of income ...

Andrew

31

asked Mar 19 at 17:45

0 votes

0 answers

70 views

Relation between gini coefficient/accuracy ratio and roc_auc_score when there are many identical predictions

I have been working on ranking metrics related to various estimators lately, and came across a curious phenomenon related to the Gini coefficient which I would like to understand better. I will start ...

user405288

1

asked Jan 19 at 18:57

2 votes

1 answer

376 views

Gini impurity greedily optimises a loss function in decision trees

I am trying to understand how the Gini criterion for decision tree construction actually greedily optimises a loss function. The Gini impurity, sometimes also called Gini index, for a region (...

ngmir

339

asked Oct 9, 2023 at 12:15

0 votes

0 answers

39 views

What is the benefit of implicit performance of a probability of default model?

I have a model which predicts a binary outcome (default/no default). The discriminatory power of the model is normally quantified with Somers' D which is the same as Gini in the binary context. $Gini ...

PalimPalim

333

asked Jun 26, 2023 at 12:07

1 vote

1 answer

2k views

Calculation of the GINI coefficient, Accuracy and AUROC for credit scoring using Python code

I have the following data and I want to compute the GINI and Accuracy for model validation purposes. I tried to calculate the GINI and Accuracy using Python code, but the result seems incorrect. I would ...

StatsUser

1,839

asked May 22, 2023 at 21:09

4 votes

1 answer

1k views

Calculating Area Under Curve (AUC) using cumulative events and non-events rates after binning the data

I understand that the AUC is basically the area under the ROC curve, which is the plot of the proportion of true positives versus the proportion of false positives at different probability cutoffs. ...

Dr. Dre

53

asked Oct 28, 2022 at 6:09

0 votes

0 answers

46 views

How to find significance for Gini coefficient changes?

I'm using the Gini coefficient to evaluate the performance of a model. Making some changes (feature selection, hyperparameter tuning, etc.) I created variant models with different Gini coefficients. ...

Lafayette

111

asked Oct 20, 2022 at 14:58

3 votes

1 answer

669 views

Calculate Herfindahl-Hirschman Index when you know the total but only observe the largest few

The Herfindahl–Hirschman Index (HHI) is a concentration measure defined as $$H = \sum_i p_i^2,$$ where $p_i$ is the market share of firm $i$. However this assumes knowing all $p_i$ for an industry. ...

jf328

811

asked Jul 11, 2022 at 5:57
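
As a rough illustration of the definition quoted in the excerpt above, $H = \sum_i p_i^2$, here is a minimal Python sketch. The function name `hhi` and the four-firm market are hypothetical; the sketch assumes all shares are observed, so it does not address the partial-observation problem the question asks about.

```python
def hhi(shares):
    """Herfindahl-Hirschman Index: the sum of squared market shares."""
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must have a positive total")
    # Normalise first, so raw sales figures or percentages also work.
    return sum((s / total) ** 2 for s in shares)


# Hypothetical market with four firms; the result is approximately 0.30.
print(hhi([0.4, 0.3, 0.2, 0.1]))
```

With $n$ equally sized firms the index equals $1/n$, which is its minimum for a market of $n$ firms.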

0 votes

0 answers

302 views

Ways to measure deviation from a discrete uniform distribution [duplicate]

I'm looking for a way to characterize the deviation from a discrete uniform distribution. Example: 50 balls are distributed over 10 urns. In the most equal case, all urns get 5 balls. In the most ...

cosine

1

asked May 8, 2022 at 20:51

0 votes

1 answer

143 views

Inverse transform sampling : comparing bias, variance and mse for an estimator

Starting from the PDF of the Pareto distribution, \begin{equation} f_{\theta_1, \theta_2}(x) = \begin{cases} \frac{\theta_1 \theta_2^{\theta_1}}{x^{\theta_1 + 1}}, &\quad x \geq \theta_2 \...

Mathieu Rousseau

111

asked Apr 22, 2022 at 18:44

0 votes

0 answers

40 views

Computing Gini coefficient for a 2 parameters density function

I have a random variable $X$ defined by the following the density function, \begin{equation} f_{\theta_1, \theta_2}(x) = \begin{cases} \frac{\theta_1 \theta_2^{\theta_1}}{x^{\theta_1 + 1}}, &...

Mathieu Rousseau

111

asked Apr 15, 2022 at 8:29

1 vote

1 answer

376 views

Creating a function to compute Gini Index

I'm trying to compute the Gini Index for different examples given in this page. I don't get what I'm doing wrong, as the formula showed is: $Gini Index = 1 - \sum_{i=1}^{C}(p_{i})^{2}$ And my code ...

Chris

535

asked Apr 5, 2022 at 2:02
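
The formula quoted in the excerpt above, $1 - \sum_{i=1}^{C} p_i^2$, could be sketched in Python roughly as follows. The name `gini_impurity` and the sample labels are illustrative assumptions, not the asker's code, which is not shown in the excerpt.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a collection of class labels: 1 - sum_i p_i^2."""
    n = len(labels)
    if n == 0:
        raise ValueError("labels must not be empty")
    # p_i is the fraction of items carrying class label i.
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical node with two equally frequent classes.
print(gini_impurity([0, 0, 1, 1]))  # 0.5
```

A pure node (a single class) gives 0, and a node with $C$ equally frequent classes gives $1 - 1/C$.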

4 votes

1 answer

177 views

Where does the Gini coefficient come from?

I understand what a ROC curve is. However, I do not understand the Gini coefficient in the context of binary classification. All the resources I have checked state that $Gini = 1 - (2 \times AUC_{ROC})...

Arturo Sbr

569

asked Mar 2, 2022 at 20:01

2 votes

1 answer

809 views

Why is my logistic regression outperforming neural networks?

I have 5 samples (each one contains ~380K records, 33 predictive variables and 1 binary Target): one sample is used to train the models; the remaining 4 samples are used to validate the models. The ...

Giampaolo Levorato

123

asked Nov 10, 2021 at 13:22

0 votes

0 answers

128 views

variations in 4-fold cross-validation coefficients

What does it mean when one of the 4 folds' Gini coefficients is low, for instance 83%, 84%, 85% and 75%? Is this variation in a normal range? Can it be caused by outliers? Is it worth ...

Sadegh

205

asked Jul 30, 2021 at 12:24

3 votes

1 answer

167 views

Splitting criterion of classification tree: Does the growth process come naturally to a stop?

With respect to growing a classification tree: Does growing with Gini or Cross-entropy (CE) imply we would grow the tree until every leaf is pure (in case of no other stopping criteria)? Put ...

J3lackkyy

745

asked Jul 27, 2021 at 18:17

0 votes

0 answers

124 views

Remove features with low Gini importance score to improve accuracy of Random forest

For a research project on a networking related subject, I am training and testing a Random forest model with a data set that contains 20 features. Initially, I obtained a baseline accuracy of around ...

Nht_e0

33

asked Jul 22, 2021 at 5:00

5 votes

1 answer

259 views

How are entropy and Gini Impurity related?

I know the differences between entropy and Gini impurity and why we use Gini in order to construct trees. But I would like to find some relation between those two measures. It leads me to one ...

ltrd

151

asked Jun 21, 2021 at 20:25

3 votes

1 answer

65 views

Is my understanding of the Gini plot to detect fat tails correct?

I'm trying to reproduce the following plot: which was generated on the Danish dataset of fire insurance claims using the ineq() function (a wrapper for functions ...

Antoni Parellada

26.9k

asked Jun 12, 2021 at 1:13

2 votes

0 answers

462 views

Derive Gini coefficient of lognormal distribution from definition

The Gini coefficient of a lognormal distribution $\operatorname{Lognormal}(\mu, \sigma^2)$ is $\operatorname{erf}(\sigma / 2)$, where $\operatorname{erf}$ is the error function. But how do I derive ...

Fredrik P

502

asked Jun 2, 2021 at 11:10
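
The closed form stated in the excerpt above, $\operatorname{erf}(\sigma / 2)$, can be sanity-checked numerically. This is only a simulation sketch, not the derivation the question asks for. The helper `sample_gini`, the seed, the sample size and $\sigma = 0.5$ are all arbitrary choices made for illustration.

```python
import math
import random


def sample_gini(values):
    """Plug-in sample Gini coefficient for non-negative values."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    # G = 2 * sum_i(i * x_(i)) / (n * sum_i x_i) - (n + 1) / n, ranks i = 1..n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


rng = random.Random(0)  # fixed seed so the run is repeatable
sigma = 0.5
draws = [rng.lognormvariate(0.0, sigma) for _ in range(100_000)]

estimate = sample_gini(draws)
theory = math.erf(sigma / 2.0)
print(estimate, theory)  # the two values should be close
```

The location parameter $\mu$ is set to 0 here because it only rescales the draws, and the Gini coefficient is scale invariant.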

1 vote

1 answer

1k views

GridsearchCV() gives optimum criterion for Decision Tree should be entropy, but why am I getting better accuracy with Gini?

I ran this code ...

K.Swaviman

19

asked Apr 30, 2021 at 21:54

1 vote

0 answers

25 views

How can I show a mathematical proof of entropy in classification tree? [closed]

I am trying to understand the splitting criteria in the classification tree. How can I show that for $p_1,p_2,...,p_n$ these functions attain their maximum and minimum? $g(p_1,p_2,...,p_n) = Σp_i(1-...

Pastor Soto

123

asked Mar 29, 2021 at 2:03

1 vote

0 answers

53 views

How is MeanDecreaseGini calculated for categorical predictors?

I'm implementing a random forest algorithm but I noticed that the categorical variable in the database is not selected among the important variables. So I want to know how RF calculates ...

user1988

155

asked Mar 16, 2021 at 3:26

0 votes

1 answer

32 views

How can I forecast Gini using ML?

I have a data set containing 20 years of Gini values for a country. The latest data are for 2018. I want to predict the Gini values for this country by 2025. How can I do this using ML techniques? ...

Maxpayne

39

asked Feb 23, 2021 at 6:17

2 votes

1 answer

98 views

Is this case possible for Decision Tree?

I am studying decision tree and I would like to know if this case is possible: We have 2 features, each does not decrease the Gini of the previous node (=> not choose), but their combination (two ...

Lucas Bensaid

145

asked Jan 13, 2021 at 15:39

0 votes

0 answers

457 views

In gradient boost, do we still split nodes based on splitting criteria(impurity measure)?

Am I correct if I say that we use the loss function to calculate residuals, and the splitting criteria to determine which splits to make to predict these residuals? If this is the case how do we ...

Polarni1

85

asked Dec 18, 2020 at 22:49

0 votes

0 answers

79 views

How can I plot a Gini curve?

I am using a scoring metric as below: (gini) ...

Maths12

579

asked Nov 10, 2020 at 10:07

2 votes

1 answer

329 views

Log probabilities versus squared probabilities (entropy vs Gini)

The advantage of log probabilities over direct probabilities, as discussed here and here, is that they make numerical values close to $0$ more easy to work with. (my question, instead of the links, ...

develarist

4,049

asked Oct 21, 2020 at 13:27

3 votes

1 answer

209 views

Gini Index calculation for near duplicate rows

My data set has near duplicate rows because there are multiple rows for each employee depending on how long they have stayed in the organization. Therefore, employee Ann has 3 rows, Bob has 2 rows etc....

learner

627

asked Aug 16, 2020 at 9:37

1 vote

1 answer

937 views

Why do we use squared probabilities in the Gini impurity? [duplicate]

Why are we using squared probabilities instead of plain probabilities in Gini impurity? Probabilities will always be positive, so why square them?

Daya

11

asked Aug 3, 2020 at 23:28

2 votes

0 answers

30 views

How is the fraction of individuals with negative income handled in calculating the Gini coefficient in grouped data?

Much of the literature on theorizing and estimating the Gini coefficient $G$ is predicated upon the lower bound of the income distribution being $\$0$ (or whatever your unit of currency is); that is, ...

Alexis

30.7k

asked Jul 25, 2020 at 19:39

2 votes

0 answers

24 views

When calculating the Gini coefficient for the US, how should the portion of the population which has not filed a return be incorporated?

The Gini coefficient $G$ is a commonly used measure of income distribution inequality, taking values from 0 (meaning every individual in the population has an identical income) to 1 (meaning a single ...

Alexis

30.7k

asked Jul 24, 2020 at 20:43

3 votes

1 answer

3k views

How to derive equation of Gini index used in Decision Trees?

Gini coefficient formally is measured as the area between the equality curve and the Lorenz curve. By using the definition I can derive the equation However, I can't obtain the exact Gini index ...

Tarlan Ahad

173

asked Jul 14, 2020 at 7:28

3 votes

1 answer

468 views

Gini Index of Vector with Negative Values

I would like to use the Gini Index to measure the sparsity in a signal. From my research so far it seems that the Gini Index is defined for a vector of positive values. My vector however also contains ...

thebear

31

asked Jul 10, 2020 at 8:49

1 vote

0 answers

30 views

Calculate GINI inequality coefficient from IRS SOI data

I am trying to calculate the GINI coefficient from the IRS SOI dataset using the adjusted gross income (AGI) bins provided in the csv. I know this will not be an exact GINI index score, and only a ...

psw

55

asked Jul 6, 2020 at 15:04

0 votes

1 answer

102 views

Gini values are not corresponding with Lorenz Curve area

I'm using Gini coefficient and Lorenz Curve plots to show the accumulation of beneficiaries in ecosystem services (ES) supply points, in R. I classify ES into three categories and calculate Gini and ...

Josep Pueyo

111

asked Apr 28, 2020 at 7:14

1 vote

1 answer

168 views

Gini and Lift With Transformed Variable

With regard to Gini Index/Net Lift, if I build a model where the target value is transformed by something - say natural log for example - will the Gini and Lift calculated on the transformed variable ...

Seraphim

135

asked Mar 21, 2020 at 20:50

0 votes

0 answers

357 views

What do all the distributions that have the same Gini index have in common?

According to the Wikipedia article about Income inequality metrics, the Gini index has the following disadvantage: As a disadvantage, the Gini index only maps a number to the properties of a diagram, but the ...

Yanirmr

101

asked Mar 3, 2020 at 12:40

1 vote

0 answers

171 views

Decision trees minimizing the Gini error

I was reading the Elements of Statistical Learning and I stumbled upon the formula for minimizing the misclassification error. I was wondering if I could write something like that for the Gini index. ...

glouis

237

asked Jan 28, 2020 at 15:53

5 votes

1 answer

3k views

What is the difference between Gini index and Gini coefficient?

I am building a decision tree from scratch. I have been using entropy so far (calculated this way): ...

Carlos Mougan

380

asked Jan 22, 2020 at 16:55

2 votes

1 answer

1k views

Can someone explain the Gini Index for a tree?

So I know the formula for the Gini index. However, I have a few questions that I am hoping to clarify. I saw this, which tells you how to calculate the Gini index for each feature: Computing ...

confused

3,263

asked Jan 19, 2020 at 2:34

1 vote

0 answers

57 views

Why is the off-diagonal summation notation for the Gini index used only in classification problems with more than 2 classes?

The formula for the Gini index as a node impurity measure can be written as: $Gini(q)= \sum_{k=1}^M p_{qk}(1-p_{qk})$ Where $q$ is the node and $M$ represents the number of classes. Why can we only ...

Tommyixi

233

asked Jan 18, 2020 at 22:10
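
For reference, the node impurity formula quoted in the excerpt above can be rewritten in the off-diagonal form using only $\sum_{k=1}^M p_{qk} = 1$. This is a sketch of the standard identity, not an answer taken from the thread:

$$\sum_{k=1}^{M} p_{qk}(1-p_{qk}) = \sum_{k=1}^{M} p_{qk} - \sum_{k=1}^{M} p_{qk}^2 = 1 - \sum_{k=1}^{M} p_{qk}^2 = \sum_{k \neq k'} p_{qk}\, p_{qk'},$$

where the last step uses $1 = \left(\sum_{k} p_{qk}\right)^2 = \sum_{k} p_{qk}^2 + \sum_{k \neq k'} p_{qk}\, p_{qk'}$. Algebraically the identity also holds for $M = 2$, where both sides reduce to $2\,p_{q1}(1 - p_{q1})$.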

1 vote

0 answers

45 views

Looking at two PDF plots, is it possible to guess which distribution has a greater Gini coefficient?

By observing the PDF of two different distributions over the same support (as in the image), is it possible to infer which PDF describes the distribution with the greater Gini coefficient? I assume ...

Tecon

21

asked Sep 2, 2019 at 14:15

0 votes

1 answer

715 views

decision tree training: gini vs entropy vs precision vs recall

When training decision trees, the standard algorithms (e.g. ID3, C4.5, C5.0) use either the gini index or entropy to determine which node to add next. Only once the tree is built, and the ROC curve is ...

mathew gunther

35

asked Jul 14, 2019 at 17:31

2 votes

1 answer

2k views

Gini Index Formula

I've read many related articles and posts. The more I read, the more I got confused about 'Gini index' and 'Gini Impurity'. I understood the concept but it seems to me that these things are used ...

Dr Nisha Arora

1,044

asked Jun 5, 2019 at 4:29

4 votes

0 answers

779 views

What are the loss functions used in Gradient Boosting vs Random Forest? Would Gini/Entropy work for both models?

When I look at Python package tutorials, I compared the function for GradientBoostingClassifier and RandomForestClassifier and found 2 differences: 1) GBM does not mention 'Gini' or 'Entropy', which ...

user246123

41

asked Apr 27, 2019 at 0:00

2 votes

2 answers

4k views

High AUC but low R squared in a random forest classifier

I have been looking for an answer on this website and on Google but I can't seem to find a clear explanation anywhere. The problem is the following. I built a Random Forest model (using Python's ...

LoicM

158

asked Apr 15, 2019 at 13:44

4 votes

0 answers

1k views

Measuring relative variability for variables with different scales II

I'm reformulating this question to see if I might have better luck than OP did at encouraging a response. Consider that you have two univariate datasets at different scales, and need to establish ...

geotheory

647

asked Mar 25, 2019 at 1:43

0 votes

2 answers

298 views

Gini index in classification tree

In Gareth James et al.'s book "An introduction to statistical learning", when it's talking about Gini index, I clipped the paragraph in the following image: My question is the statement that "...

Ames ISU

101

asked Mar 11, 2019 at 5:54
