Questions tagged [gini]
The Gini coefficient is used to measure income inequality and the discriminatory power of a classifier. If everybody has the same income, the Gini coefficient is 0. If one person has all the income, the Gini coefficient is 1 (a value approached as the population grows). All other values lie somewhere in between.
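As a rough illustration of the two extremes in the description above, here is a minimal Python sketch. The function name `gini` and the sample incomes are hypothetical, and the sketch assumes non-negative incomes with a positive total. For a finite sample of n people, the "one person has everything" case gives (n - 1)/n rather than exactly 1.

```python
def gini(values):
    """Sample Gini coefficient of non-negative values (illustrative sketch)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive total")
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n,
    # where i = 1..n indexes the values in ascending order.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


equal = gini([5, 5, 5, 5])        # everybody has the same income
extreme = gini([0, 0, 0, 10])     # one person has all the income
print(equal, extreme)
```

With four people the extreme case comes out at 3/4, which moves toward 1 as more zero-income people are added.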
119 questions
3
votes
1
answer
470
views
How is Gini impurity related to accuracy when predicting the majority class?
For simplicity, consider the binary case, where we have a set of elements with each element belonging to one of two classes (0 or 1). Let p(j) be the proportion of ...
3
votes
2
answers
75
views
Is it possible to calculate a standard deviation from the gini coefficient and mean?
I am looking to create an analysis showing how many people in a given country have more than X dollars in income.
I know the average income, population count, and Gini Coefficient of income ...
0
votes
0
answers
70
views
Relation between gini coefficient/accuracy ratio and roc_auc_score when there are many identical predictions
I have been working on ranking metrics related to various estimators lately, and came across a curious phenomenon related to the Gini coefficient which I would like to understand better.
I will start ...
2
votes
1
answer
376
views
Gini impurity greedily optimises a loss function in decision trees
I am trying to understand how the Gini criterion for decision tree construction actually greedily optimises a loss function.
The Gini impurity, sometimes also called Gini index, for a region (...
0
votes
0
answers
39
views
What is the benefit of implicit performance of a probability of default model?
I have a model which predicts a binary outcome (default/no default).
The discriminatory power of the model is normally quantified with Somers' D which is the same as Gini in the binary context.
$Gini ...
1
vote
1
answer
2k
views
Calculation of the GINI coefficient, Accuracy and AUROC for credit scoring using Python code
I have the following data and I want to compute the GINI and Accuracy for model validation purposes. I tried to calculate the GINI and Accuracy using Python code, but the result seems incorrect. I would ...
4
votes
1
answer
1k
views
Calculating Area Under Curve (AUC) using cumulative events and non-events rates after binning the data
I understand that the AUC is basically the area under the ROC curve, which is the plot of the proportion of true positives versus the proportion of false positives at different probability cutoffs. ...
0
votes
0
answers
46
views
How to find significance for Gini coefficient changes?
I'm using the Gini coefficient to evaluate the performance of a model. After making some changes (feature selection, hyperparameter tuning, etc.), I created variant models with different Gini coefficients.
...
3
votes
1
answer
669
views
Calculate Herfindahl-Hirschman Index when you know the total but only observe the largest few
The Herfindahl–Hirschman Index (HHI) is a concentration measure defined as
$$H = \sum_i p_i^2,$$
where $p_i$ is the market share of firm $i$. However this assumes knowing all $p_i$ for an industry.
...
0
votes
0
answers
302
views
Ways to measure deviation from a discrete uniform distribution [duplicate]
I'm looking for a way to characterize the deviation from a discrete uniform distribution.
Example: 50 balls are distributed over 10 urns.
In the most equal case, all urns get 5 balls.
In the most ...
0
votes
1
answer
143
views
Inverse transform sampling: comparing bias, variance and MSE for an estimator
Starting from the PDF of the Pareto distribution,
\begin{equation}
f_{\theta_1, \theta_2}(x) =
\begin{cases}
\frac{\theta_1 \theta_2^{\theta_1}}{x^{\theta_1 + 1}}, &\quad x \geq \theta_2 \...
0
votes
0
answers
40
views
Computing the Gini coefficient for a two-parameter density function
I have a random variable $X$ defined by the following density function,
\begin{equation}
f_{\theta_1, \theta_2}(x) =
\begin{cases}
\frac{\theta_1 \theta_2^{\theta_1}}{x^{\theta_1 + 1}}, &...
1
vote
1
answer
376
views
Creating a function to compute Gini Index
I'm trying to compute the Gini Index for different examples given in this page. I don't get what I'm doing wrong, as the formula shown is:
$Gini Index = 1 - \sum_{i=1}^{C}(p_{i})^{2}$
And my code ...
4
votes
1
answer
177
views
Where does the Gini coefficient come from?
I understand what a ROC curve is. However, I do not understand the Gini coefficient in the context of binary classification.
All the resources I have checked state that $Gini = 1 - (2 \times AUC_{ROC})...
2
votes
1
answer
809
views
Why is my logistic regression outperforming neural networks?
I have 5 samples (each one contains ~380K records, 33 predictive variables and 1 binary Target):
one sample is used to train the models
the remaining 4 samples are used to validate the models
The ...
0
votes
0
answers
128
views
Variations in 4-fold cross-validation coefficients
What does it mean when one of the 4 folds has a low Gini coefficient? For instance, 83%, 84%, 85% and 75%?
Is this variation in a normal range?
Can it be caused by outliers?
Is it worth ...
3
votes
1
answer
167
views
Splitting criterion of classification tree: Does the growth process come naturally to a stop?
With respect to growing a classification tree: Does growing with Gini or Cross-entropy (CE) imply we would grow the tree until every leaf is pure (in case of no other stopping criteria)? Put ...
0
votes
0
answers
124
views
Remove features with low Gini importance score to improve accuracy of Random forest
For a research project on a networking related subject, I am training and testing a Random forest model with a data set that contains 20 features.
Initially, I obtained a baseline accuracy of around ...
5
votes
1
answer
259
views
How are entropy and Gini Impurity related?
I know the differences between entropy and Gini impurity and why we use Gini in order to construct trees. But I would like to find some relation between those two measures. It leads me to one ...
3
votes
1
answer
65
views
Is my understanding of the Gini plot to detect fat tails correct?
I'm trying to reproduce the following plot:
which was generated on the Danish dataset of fire insurance claims using the ineq() function (a wrapper for functions ...
2
votes
0
answers
462
views
Derive Gini coefficient of lognormal distribution from definition
The Gini coefficient of a lognormal distribution $\operatorname{Lognormal}(\mu, \sigma^2)$ is $\operatorname{erf}(\sigma / 2)$, where $\operatorname{erf}$ is the error function. But how do I derive ...
1
vote
1
answer
1k
views
1
vote
0
answers
25
views
How can I show a mathematical proof of entropy in classification tree? [closed]
I am trying to understand the splitting criteria in the classification tree. How can I show that for $p_1,p_2,..,p_n$ these functions attain their maximum and minimum?
$g(p_1,p_2,...,p_n) = Σp_i(1-...
1
vote
0
answers
53
views
How is MeanDecreaseGini calculated for categorical predictors?
I'm implementing a random forest algorithm but I noticed that the categorical variable in the database is not selected among the important variables. So I want to know how RF calculates ...
0
votes
1
answer
32
views
How can I forecast Gini using ML?
I have a data set containing 20 years of Gini values for a country. The latest data are for 2018. I want to predict the Gini values for this country by 2025. How can I do this using ML techniques? ...
2
votes
1
answer
98
views
Is this case possible for Decision Tree?
I am studying decision trees and I would like to know if this case is possible:
We have 2 features, each of which does not decrease the Gini of the previous node (so neither is chosen), but their combination (two ...
0
votes
0
answers
457
views
In gradient boosting, do we still split nodes based on a splitting criterion (impurity measure)?
Am I correct if I say that we use the loss function to calculate residuals, and the splitting criterion to determine which splits to make to predict these residuals?
If this is the case how do we ...
0
votes
0
answers
79
views
How can I plot a Gini curve?
I am using a scoring metric as below (Gini):
...
2
votes
1
answer
329
views
Log probabilities versus squared probabilities (entropy vs Gini)
The advantage of log probabilities over direct probabilities, as discussed here and here, is that they make numerical values close to $0$ easier to work with. (my question, instead of the links, ...
3
votes
1
answer
209
views
Gini Index calculation for near duplicate rows
My data set has near duplicate rows because there are multiple rows for each employee depending on how long they have stayed in the organization. Therefore, employee Ann has 3 rows, Bob has 2 rows etc....
1
vote
1
answer
937
views
Why do we use squared probabilities in the Gini impurity? [duplicate]
Why do we use squared probabilities instead of plain probabilities in the Gini impurity? Probabilities are always non-negative, so why square them?
2
votes
0
answers
30
views
How is the fraction of individuals with negative income handled in calculating the Gini coefficient in grouped data?
Much of the literature on theorizing and estimating the Gini coefficient $G$ is predicated upon the lower bound of the income distribution being $\$0$ (or whatever your unit of currency is); that is, ...
2
votes
0
answers
24
views
When calculating the Gini coefficient for the US, how should the portion of the population which has not filed a return be incorporated?
The Gini coefficient $G$ is a commonly used measure of income distribution inequality, taking values from 0 (meaning every individual in the population has an identical income) to 1 (meaning a single ...
3
votes
1
answer
3k
views
How to derive equation of Gini index used in Decision Trees?
The Gini coefficient is formally measured as the area between the equality curve and the Lorenz curve. By using the definition I can derive the equation
However, I can't obtain the exact Gini index ...
3
votes
1
answer
468
views
Gini Index of Vector with Negative Values
I would like to use the Gini Index to measure the sparsity in a signal. From my research so far it seems that the Gini Index is defined for a vector of positive values. My vector however also contains ...
1
vote
0
answers
30
views
Calculate GINI inequality coefficient from IRS SOI data
I am trying to calculate the GINI coefficient from the IRS SOI dataset using the adjusted gross income (AGI) bins provided in the csv. I know this will not be an exact GINI index score, and only a ...
0
votes
1
answer
102
views
Gini values are not corresponding with Lorenz Curve area
I'm using Gini coefficient and Lorenz Curve plots to show the accumulation of beneficiaries in ecosystem services (ES) supply points, in R. I classify ES into three categories and calculate Gini and ...
1
vote
1
answer
168
views
Gini and Lift With Transformed Variable
With regard to Gini Index/Net Lift: if I build a model where the target value is transformed by something, say the natural log, will the Gini and Lift calculated on the transformed variable ...
0
votes
0
answers
357
views
What do all the distributions that have the same Gini index have in common?
According to the Wikipedia article about Income inequality metrics, the Gini index has the following disadvantage:
As a disadvantage, the Gini index only maps a number to the properties of a diagram, but the ...
1
vote
0
answers
171
views
Decision trees minimizing the Gini error
I was reading the Elements of Statistical Learning and I stumbled upon the formula for minimizing the misclassification error. I was wondering if I could write something like that for the Gini index.
...
5
votes
1
answer
3k
views
What is the difference between Gini index and Gini coefficient?
I am building a decision tree from scratch. I have been using entropy so far (calculated this way):
...
2
votes
1
answer
1k
views
Can someone explain the Gini Index for a tree?
So I know the formula for the Gini index. However, I have a few questions that I am hoping to clarify. I saw this, which tells you how to calculate the Gini index for each feature: Computing ...
1
vote
0
answers
57
views
Why is the off-diagonal summation notation for the Gini index used only in classification problems with more than 2 classes?
The formula for the Gini index as a node impurity measure can be written as:
$Gini(q)= \sum_{k=1}^M p_{qk}(1-p_{qk})$
Where $q$ is the node and $M$ represents the number of classes. Why can we only ...
1
vote
0
answers
45
views
Looking at two PDF plots, is it possible to guess which distribution has a greater Gini coefficient?
By observing the PDF of two different distributions over the same support (as in the image), is it possible to infer which PDF describes the distribution with the greater Gini coefficient?
I assume ...
0
votes
1
answer
715
views
decision tree training: gini vs entropy vs precision vs recall
When training decision trees, the standard algorithms (e.g. ID3, C4.5, C5.0) use either the gini index or entropy to determine which node to add next. Only once the tree is built, and the ROC curve is ...
2
votes
1
answer
2k
views
Gini Index Formula
I've read many related articles and posts. The more I read, the more I got confused about 'Gini index' and 'Gini Impurity'. I understood the concept but it seems to me that these things are used ...
4
votes
0
answers
779
views
What are the loss functions used in Gradient Boosting vs Random Forest? Would Gini/Entropy work for both models?
Looking at Python package tutorials, I compared the functions GradientBoostingClassifier and RandomForestClassifier and found 2 differences:
1) GBM does not mention 'Gini' or 'Entropy', which ...
2
votes
2
answers
4k
views
High AUC but low R squared in a random forest classifier
I have been looking for an answer on this website and on Google but I can't seem to find a clear explanation anywhere.
The problem is the following. I built a Random Forest model (using Python's ...
4
votes
0
answers
1k
views
Measuring relative variability for variables with different scales II
I'm reformulating this question to see if I might have better luck than OP did at encouraging a response.
Consider that you have two univariate datasets at different scales, and need to establish ...
0
votes
2
answers
298
views
Gini index in classification tree
In the book "An Introduction to Statistical Learning" by Gareth James et al., there is a paragraph about the Gini index, which I clipped in the following image:
My question is the statement that "...