Questions tagged [gini]

The Gini coefficient is used to measure income inequality and the discriminatory power of a classifier. If everybody has the same income, Gini coefficient = 0. If one person has all the income, Gini coefficient = 1. All other values are somewhere in between.
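
As a rough illustration of the two extremes described above, here is a minimal sketch of the income-inequality version of the coefficient. The function name `gini_coefficient` and the sample incomes are hypothetical, and the rank-weighted population formula used here is only one of several common conventions. Under this convention a finite population of n people in which one person holds everything gives (n - 1)/n, which approaches 1 as n grows.

```python
import numpy as np

def gini_coefficient(incomes):
    """Gini coefficient of non-negative incomes (population formula, hypothetical helper)."""
    x = np.sort(np.asarray(incomes, dtype=float))
    n = x.size
    total = x.sum()
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive income")
    # Rank-weighted form of the mean absolute difference:
    # G = sum_i (2i - n - 1) * x_i / (n * sum(x)), with i = 1..n over sorted x.
    ranks = np.arange(1, n + 1)
    return float(np.sum((2 * ranks - n - 1) * x) / (n * total))

equal = gini_coefficient([5, 5, 5, 5])        # everybody has the same income
one_owner = gini_coefficient([0, 0, 0, 10])   # one person has all the income
print(equal, one_owner)
```

With four people the second call gives 0.75 rather than exactly 1, which is the finite-sample effect mentioned above.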

3 votes
1 answer
470 views

How is Gini impurity related to accuracy when predicting the majority class?

For simplicity, consider the binary case, where we have a set of elements with each element belonging to one of two classes (0 or 1). Let p(j) be the proportion of ...
alex76's user avatar
  • 31
3 votes
2 answers
75 views

Is it possible to calculate a standard deviation from the gini coefficient and mean?

I am looking to create an analysis showing how many people in a given country have more than X dollars in income. I know the average income, population count, and Gini Coefficient of income ...
Andrew's user avatar
  • 31
0 votes
0 answers
70 views

Relation between gini coefficient/accuracy ratio and roc_auc_score when there are many identical predictions

I have been working on ranking metrics related to various estimators lately, and came across a curious phenomenon related to the Gini coefficient which I would like to understand better. I will start ...
user405288's user avatar
2 votes
1 answer
376 views

Gini impurity greedily optimises a loss function in decision trees

I am trying to understand how the Gini criterion for decision tree construction actually greedily optimises a loss function. The Gini impurity, sometimes also called Gini index, for a region (...
ngmir's user avatar
  • 339
0 votes
0 answers
39 views

What is the benefit of implicit performance of a probability of default model?

I have a model which predicts a binary outcome (default/no default). The discriminatory power of the model is normally quantified with Somers' D which is the same as Gini in the binary context. $Gini ...
PalimPalim's user avatar
1 vote
1 answer
2k views

Calculation of the GINI coefficient, Accuracy and AUROC for credit scoring using Python code

I have the following data and I want to compute the GINI and Accuracy for model validation purposes. I tried to calculate the GINI and Accuracy using Python code, but the result seems incorrect. I would ...
StatsUser's user avatar
  • 1,839
4 votes
1 answer
1k views

Calculating Area Under Curve (AUC) using cumulative events and non-events rates after binning the data

I understand that the AUC is basically the area under the ROC curve, which is the plot of the proportion of true positives versus the proportion of false positives at different probability cutoffs. ...
Dr. Dre's user avatar
  • 53
0 votes
0 answers
46 views

How to find significance for Gini coefficient changes?

I'm using the Gini coefficient to evaluate the performance of a model. Making some changes (feature selection, hyperparameter tuning, etc.) I created variant models with different Gini coefficients. ...
Lafayette's user avatar
  • 111
3 votes
1 answer
669 views

Calculate Herfindahl-Hirschman Index when you know the total but only observe the largest few

The Herfindahl–Hirschman Index (HHI) is a concentration measure defined as $$H = \sum_i p_i^2,$$ where $p_i$ is the market share of firm $i$. However this assumes knowing all $p_i$ for an industry. ...
jf328's user avatar
  • 811
0 votes
0 answers
302 views

Ways to measure deviation from a discrete uniform distribution [duplicate]

I'm looking for a way to characterize the deviation from a discrete uniform distribution. Example: 50 balls are distributed over 10 urns. In the most equal case, all urns get 5 balls. In the most ...
cosine's user avatar
  • 1
0 votes
1 answer
143 views

Inverse transform sampling: comparing bias, variance and MSE for an estimator

Starting from the PDF of the Pareto distribution, \begin{equation} f_{\theta_1, \theta_2}(x) = \begin{cases} \frac{\theta_1 \theta_2^{\theta_1}}{x^{\theta_1 + 1}}, &\quad x \geq \theta_2 \...
Mathieu Rousseau's user avatar
0 votes
0 answers
40 views

Computing Gini coefficient for a two-parameter density function

I have a random variable $X$ defined by the following density function, \begin{equation} f_{\theta_1, \theta_2}(x) = \begin{cases} \frac{\theta_1 \theta_2^{\theta_1}}{x^{\theta_1 + 1}}, &...
Mathieu Rousseau's user avatar
1 vote
1 answer
376 views

Creating a function to compute Gini Index

I'm trying to compute the Gini Index for different examples given in this page. I don't get what I'm doing wrong, as the formula shown is: $Gini Index = 1 - \sum_{i=1}^{C}(p_{i})^{2}$ And my code ...
Chris's user avatar
  • 535
4 votes
1 answer
177 views

Where does the Gini coefficient come from?

I understand what a ROC curve is. However, I do not understand the Gini coefficient in the context of binary classification. All the resources I have checked state that $Gini = 1 - (2 \times AUC_{ROC})...
Arturo Sbr's user avatar
2 votes
1 answer
809 views

Why is my logistic regression outperforming neural networks?

I have 5 samples (each one contains ~380K records, 33 predictive variables and 1 binary Target): one sample is used to train the models; the remaining 4 samples are used to validate the models. The ...
Giampaolo Levorato's user avatar
0 votes
0 answers
128 views

variations in 4-fold cross-validation coefficients

What does it mean when one of the 4 folds' Gini coefficients is low, for instance 83%, 84%, 85% and 75%? Is this variation in a normal range? Can it be caused by outliers? Is it worth ...
Sadegh's user avatar
  • 205
3 votes
1 answer
167 views

Splitting criterion of classification tree: Does the growth process come naturally to a stop?

With respect to growing a classification tree: Does growing with Gini or Cross-entropy (CE) imply we would grow the tree until every leaf is pure (in case of no other stopping criteria)? Put ...
J3lackkyy's user avatar
  • 745
0 votes
0 answers
124 views

Remove features with low Gini importance score to improve accuracy of Random Forest

For a research project on a networking related subject, I am training and testing a Random forest model with a data set that contains 20 features. Initially, I obtained a baseline accuracy of around ...
Nht_e0's user avatar
  • 33
5 votes
1 answer
259 views

How are entropy and Gini Impurity related?

I know the differences between entropy and Gini impurity and why we use Gini in order to construct trees. But I would like to find some relation between those two measures. It leads me to one ...
ltrd's user avatar
  • 151
3 votes
1 answer
65 views

Is my understanding of the Gini plot to detect fat tails correct?

I'm trying to reproduce the following plot: which was generated on the Danish dataset of fire insurance claims using the ineq() function (a wrapper for functions ...
Antoni Parellada's user avatar
2 votes
0 answers
462 views

Derive Gini coefficient of lognormal distribution from definition

The Gini coefficient of a lognormal distribution $\operatorname{Lognormal}(\mu, \sigma^2)$ is $\operatorname{erf}(\sigma / 2)$, where $\operatorname{erf}$ is the error function. But how do I derive ...
Fredrik P's user avatar
  • 502
1 vote
1 answer
1k views

GridsearchCV() gives optimum criterion for Decision Tree should be entropy, but why am I getting better accuracy with Gini?

I ran this code ...
K.Swaviman's user avatar
1 vote
0 answers
25 views

How can I show a mathematical proof of entropy in classification tree? [closed]

I am trying to understand the splitting criteria in the classification tree. How can I show that for $p_1,p_2,..,p_n$ these functions attain their maximum and minimum? $g(p_1,p_2,...,p_n) = Σp_i(1-...
Pastor Soto's user avatar
1 vote
0 answers
53 views

How is MeanDecreaseGini calculated for categorical predictors?

I'm implementing a random forest algorithm but I noticed that the categorical variable in the database is not selected among the important variables. So I want to know how RF calculates ...
user1988's user avatar
  • 155
0 votes
1 answer
32 views

How can I forecast Gini using ML?

I have a data set containing 20 years of Gini values for a country. The latest data are for 2018. I want to predict the Gini values for this country by 2025. How can I do this using ML techniques? ...
Maxpayne's user avatar
2 votes
1 answer
98 views

Is this case possible for Decision Tree?

I am studying decision trees and I would like to know if this case is possible: We have 2 features, each of which does not decrease the Gini of the previous node (=> not chosen), but their combination (two ...
Lucas Bensaid's user avatar
0 votes
0 answers
457 views

In gradient boosting, do we still split nodes based on splitting criteria (impurity measure)?

Am I correct if I say that we use the loss function to calculate residuals, and the splitting criteria to determine which splits to make to predict these residuals? If this is the case, how do we ...
Polarni1's user avatar
0 votes
0 answers
79 views

How can I plot a Gini curve?

I am using a scoring metric as below: (gini) ...
Maths12's user avatar
  • 579
2 votes
1 answer
329 views

Log probabilities versus squared probabilities (entropy vs Gini)

The advantage of log probabilities over direct probabilities, as discussed here and here, is that they make numerical values close to $0$ easier to work with. (my question, instead of the links, ...
develarist's user avatar
  • 4,049
3 votes
1 answer
209 views

Gini Index calculation for near duplicate rows

My data set has near duplicate rows because there are multiple rows for each employee depending on how long they have stayed in the organization. Therefore, employee Ann has 3 rows, Bob has 2 rows etc....
learner's user avatar
  • 627
1 vote
1 answer
937 views

Why do we use squared probabilities in the Gini impurity? [duplicate]

Why are we using squared probabilities instead of plain probabilities in Gini impurity? Probabilities will always be positive, so why square them?
Daya's user avatar
  • 11
2 votes
0 answers
30 views

How is the fraction of individuals with negative income handled in calculating the Gini coefficient in grouped data?

Much of the literature on theorizing and estimating the Gini coefficient $G$ is predicated upon the lower bound of the income distribution being $\$0$ (or whatever your unit of currency is); that is, ...
Alexis's user avatar
  • 30.7k
2 votes
0 answers
24 views

When calculating the Gini coefficient for the US, how should the portion of the population which has not filed a return be incorporated?

The Gini coefficient $G$ is a commonly used measure of income distribution inequality, taking values from 0 (meaning every individual in the population has an identical income) to 1 (meaning a single ...
Alexis's user avatar
  • 30.7k
3 votes
1 answer
3k views

How to derive equation of Gini index used in Decision Trees?

The Gini coefficient is formally measured as the area between the equality curve and the Lorenz curve. By using the definition I can derive the equation. However, I can't obtain the exact Gini index ...
Tarlan Ahad's user avatar
3 votes
1 answer
468 views

Gini Index of Vector with Negative Values

I would like to use the Gini Index to measure the sparsity in a signal. From my research so far it seems that the Gini Index is defined for a vector of positive values. My vector however also contains ...
thebear's user avatar
  • 31
1 vote
0 answers
30 views

Calculate GINI inequality coefficient from IRS SOI data

I am trying to calculate the GINI coefficient from the IRS SOI dataset using the adjusted gross income (AGI) bins provided in the csv. I know this will not be an exact GINI index score, and only a ...
psw's user avatar
  • 55
0 votes
1 answer
102 views

Gini values are not corresponding with Lorenz Curve area

I'm using Gini coefficient and Lorenz Curve plots to show the accumulation of beneficiaries in ecosystem services (ES) supply points, in R. I classify ES into three categories and calculate Gini and ...
Josep Pueyo's user avatar
1 vote
1 answer
168 views

Gini and Lift With Transformed Variable

With regard to Gini Index/Net Lift, if I build a model where the target value is transformed by something - say natural log for example - will the Gini and Lift calculated on the transformed variable ...
Seraphim's user avatar
  • 135
0 votes
0 answers
357 views

What do all the distributions that have the same Gini index have in common?

According to the Wikipedia article about Income inequality metrics, the Gini index has the following disadvantage: As a disadvantage, the Gini index only maps a number to the properties of a diagram, but the ...
Yanirmr's user avatar
  • 101
1 vote
0 answers
171 views

Decision trees minimizing the Gini error

I was reading the Elements of Statistical Learning and I stumbled upon the formula for minimizing the misclassification error. I was wondering if I could write something like that for the Gini index. ...
glouis's user avatar
  • 237
5 votes
1 answer
3k views

What is the difference between Gini index and Gini coefficient?

I am building a decision tree from scratch. I have been using entropy so far (calculated this way): ...
Carlos Mougan's user avatar
2 votes
1 answer
1k views

Can someone explain the Gini Index for a tree?

So I know the formula for the Gini index. However, I have a few questions that I am hoping to clarify. I saw this, which tells you how to calculate the Gini index for each feature: Computing ...
confused's user avatar
  • 3,263
1 vote
0 answers
57 views

Why is the off-diagonal summation notation for the Gini index used only in classification problems with more than 2 classes?

The formula for the Gini index as a node impurity measure can be written as: $Gini(q)= \sum_{k=1}^M p_{qk}(1-p_{qk})$ Where $q$ is the node and $M$ represents the number of classes. Why can we only ...
Tommyixi's user avatar
  • 233
1 vote
0 answers
45 views

Looking at two PDF plots, is it possible to guess which distribution has a greater Gini coefficient?

By observing the PDF of two different distributions over the same support (as in the image), is it possible to infer which PDF describes the distribution with the greater Gini coefficient? I assume ...
Tecon's user avatar
  • 21
0 votes
1 answer
715 views

decision tree training: gini vs entropy vs precision vs recall

When training decision trees, the standard algorithms (e.g. ID3, C4.5, C5.0) use either the gini index or entropy to determine which node to add next. Only once the tree is built, and the ROC curve is ...
mathew gunther's user avatar
2 votes
1 answer
2k views

Gini Index Formula

I've read many related articles and posts. The more I read, the more I got confused about 'Gini index' and 'Gini Impurity'. I understood the concept but it seems to me that these things are used ...
Dr Nisha Arora's user avatar
4 votes
0 answers
779 views

What are the loss functions used in Gradient Boosting vs Random Forest? Would Gini/Entropy work for both models?

When I look at Python package tutorials, I compared the function for GradientBoostingClassifier and RandomForestClassifier and found 2 differences: 1) GBM does not mention 'Gini' or 'Entropy', which ...
user246123's user avatar
2 votes
2 answers
4k views

High AUC but low R squared in a random forest classifier

I have been looking for an answer on this website and on Google but I can't seem to find a clear explanation anywhere. The problem is the following. I built a Random Forest model (using Python's ...
LoicM's user avatar
  • 158
4 votes
0 answers
1k views

Measuring relative variability for variables with different scales II

I'm reformulating this question to see if I might have better luck than OP did at encouraging a response. Consider that you have two univariate datasets at different scales, and need to establish ...
geotheory's user avatar
  • 647
0 votes
2 answers
298 views

Gini index in classification tree

In the book "An Introduction to Statistical Learning" by Gareth James et al., where it discusses the Gini index, I clipped the paragraph in the following image: My question is about the statement that "...
Ames ISU's user avatar
  • 101