753 questions
0
votes
1
answer
56
views
Percentile formula limited to lookup value in array in separate sheet
I have a dataset in Sheet 1 where the data forms a long vertical-scrolling list, i.e.
A B C D
Location Param Value Date
L1 Conductivity 1,250 01/...
0
votes
1
answer
71
views
Creating deciles in SQL
I'm trying to bucket my data into deciles, but not in the traditional sense where the dimension is the basis of the decile.
I have 463 unique it_scores ranging from 316 to 900 (my dimension) with 1,296,...
2
votes
1
answer
40
views
Explaining the different methods for percentile calculation in NumPy
I was trying to create some box plots of my data. I first plotted them in Excel, but wanted to move to Python to be able to customize them. However, I was surprised to see the ...
0
votes
2
answers
51
views
How to create variable that shows the percentile ranges an observation is in
Say that I have the iris data.
I know that I can create a variable that shows the values that fall into a certain percentile:
library(tidyverse)
iris %>% mutate(Range = cut(Sepal.Length, quantile(...
0
votes
0
answers
28
views
Confidence Interval to percentile from VGAM
How can I add a confidence interval to each quantile, with a fill color similar to the line color, using VGAM or gamlss? Thank you for your help.
I am using the following code with the VGAM package in R:
...
0
votes
1
answer
38
views
Assigning a Percentile Range based off a Product/Retailer's Revenue
I'm dealing with quite a puzzle at work within Power BI that I/ChatGPT/Copilot can't figure out.
I created a summary table of this dummy data set to showcase what I'm trying to do (image attached)
I'm ...
0
votes
0
answers
32
views
Python percentile functions give weird results [duplicate]
I am trying to calculate percentile(0.05) to percentile(0.95) with a step of 0.05. I ran some tests and one of the implementations gave me weird results. I couldn't figure out why. I will just use ...
1
vote
1
answer
80
views
Calculating percentile over a sliding window in Power Query
I have a table with 5 columns, District (type text), Month Cumulative (Int64.Type), Measurement (type text), Value (type number) and Monthdate (type Date, 1/mm/yyyy). The table is sorted on District ...
0
votes
0
answers
444
views
How to calculate percentile using PERCENT_RANK() using ANSI SQL
I am trying to calculate the 25th, 75th, 95th and 99th percentiles for datasets that have multiple columns: ID-Count, x, y, etc.
I am using Amazon Athena, which supports Presto SQL.
...
0
votes
0
answers
51
views
Assigning points based on quartiles (percentiles) according to a grouped virtual table
Good day to all!
I have a table with tasks that one or another counterparty (person) performs
My task is:
Group the data by counterparty and count how many tasks each one completed
Find the value of 25% 50% 75% ...
0
votes
1
answer
60
views
combining 50th percentiles curve gamlss in one plot
I am a seasoned R programmer trying to help my friend visualize percentile growth curves using gamlss. I have successfully produced the curve for each group. Now the problem is that she asked me to ...
0
votes
1
answer
62
views
Percentile Aggregation in Elastic Search returns value 0 even though there are matching documents with values
I am trying to calculate the 75th percentile value for the pricingscore property through an aggregation in my Elasticsearch query. My last condition seems to return a docCount of 0, which gives a percentile value of 0.
This ...
0
votes
1
answer
28
views
percentiles aggregation in Elastic Search just gives percentile value as whatever is given in missing parameter
I just simplified my query as this
GET /index-name/_search
{
"from": 0,
"query": {
"bool": {
"filter": [
{
"terms": {
...
1
vote
2
answers
82
views
Python - Pandas - What is the exact formula for percentile calculation by describe() method?
Could someone please explain how percentiles are calculated by the describe() method?
Different sources explain this using different approaches. What is the exact method of calculation?
For example, ...
2
votes
1
answer
106
views
Colouring background of dataframe cells using percentiles
So I have a timeseries dataframe, df, which has 'n' columns and a load of rows:
df = pd.read_csv('percentiles.csv', index_col=0, parse_dates=True)
The last 3 rows of df look something like this:
ATH
...
3
votes
2
answers
57
views
Calculating percentile of values from separate grouped dataframes
I have two dataframes: one (df1) contains the minimum 'flow' value for specific events observed at different sites. The second dataframe (df2) contains the complete flow time series.
df1 <- data....
0
votes
0
answers
118
views
How can I calculate percentile for every single data inside an xarray dataset
I have a dataset of one variable with the dimensions time, lat, Lon. The dataset looks like the following, and it has several grid cells where there are NaN values:
<xarray.Dataset>
Dimensions: (...
-1
votes
1
answer
464
views
How to Find and Use Percentiles in Stata
I have a variable, income, that details some respondents' incomes. I now want to create a new variable, income_group, which has a value of 1 if the respondent's income is less than the 50th percentile,...
0
votes
0
answers
75
views
Getting the right behaviour of Excel's PERCENTRANK.EXC on MySQL
I would like to replicate the functionality and behaviour of Excel's PERCENTRANK.EXC() in SQL.
Sample Data set:
From what I understand from PERCENTRANK.EXC(), it excludes the first and last value of the ...
0
votes
2
answers
1k
views
Percentile formula in Excel
I have a range of numbers I am trying to summarize using percentiles; 10th and 90th to weed out the outliers and then 25th, 50th, and 75th to demonstrate the distribution within the range. The range ($...
1
vote
1
answer
65
views
Add an aggregate over full table outside the buckets to every bucket row
I have a query that filters the emails of people who "have a quantity" bigger than the median over all quantities found in the table:
SELECT
AVG(quantity) AS avg_quantity,
email
...
1
vote
1
answer
336
views
What is the mathematical way of calculating PERCENTILE_DISC() in oracle
I have a series of numbers as follows: 2, 3, 5, 1, 9, 10, 20, 34, 77, 55, 11, 13, 56, 99
I wrote a query to calculate the 25th percentile as follows:
SELECT
PERCENTILE_DISC(0.25) WITHIN GROUP (ORDER ...
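For readers comparing results by hand, the discrete percentile in the question above can be sketched outside the database. This is a minimal Python sketch, assuming the commonly documented definition of PERCENTILE_DISC (the smallest value, in ascending order, whose cumulative distribution is at least the requested fraction) and assuming the truncated ORDER BY sorts the values ascending. `percentile_disc` is a hypothetical helper, not an Oracle API.

```python
import math

def percentile_disc(values, fraction):
    """Smallest value whose cumulative distribution (rank / n) is >= fraction.

    Hypothetical helper that mirrors the usual PERCENTILE_DISC definition
    with ascending order; it is not part of any database driver.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(values)
    # 1-based rank of the first row whose rank / n reaches the fraction.
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]

# The series quoted in the question.
numbers = [2, 3, 5, 1, 9, 10, 20, 34, 77, 55, 11, 13, 56, 99]
result = percentile_disc(numbers, 0.25)
print(result)  # 5
```

With 14 values, 0.25 * 14 = 3.5, so the fourth value in sorted order (5) is the first one whose cumulative share reaches 25%. No interpolation takes place, which is the main difference from PERCENTILE_CONT.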
1
vote
1
answer
1k
views
How to get highest and average 95th percentile over a period of time in Prometheus Query Language
I need to get the highest and average 95th percentile recorded over time in PromQL.
This query only gets the current 95th percentile.
histogram_quantile(
0.95,
sum(
rate(
...
-1
votes
1
answer
138
views
Query for calculating percentile based on average
I have a group of athletes and I would like to calculate where their scores land from a percentile perspective. Doing so based on their best score doesn't make sense, but doing so based on their ...
0
votes
1
answer
756
views
Aggregation-Percentile Clickhouse
I would like to store the 99th percentile for every 5-minute interval in ClickHouse. However, if I want to calculate the p99 over 10 minutes, the average of the p99 values of the two intervals may not be accurate.
I ...
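The concern raised in this question can be illustrated with a small, database-independent example. The sketch below uses made-up data and a nearest-rank percentile (an assumption; ClickHouse's own quantile functions may use a different estimator) to show that averaging two per-window p99 values need not match the p99 of the combined window. `nearest_rank_percentile` is a hypothetical helper.

```python
import math

def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: the value at 1-based rank ceil(pct * n / 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Two made-up 5-minute windows with very different profiles.
window_a = list(range(1, 101))  # values 1..100
window_b = [1000] * 100         # a uniformly slow window

p99_a = nearest_rank_percentile(window_a, 99)
p99_b = nearest_rank_percentile(window_b, 99)
average_of_p99s = (p99_a + p99_b) / 2

# Percentile computed over the raw values of both windows together.
combined_p99 = nearest_rank_percentile(window_a + window_b, 99)

print(average_of_p99s, combined_p99)  # 549.5 1000
```

Percentiles are not additive, so an exact p99 over a longer window generally has to be recomputed from the raw values or from a mergeable summary rather than from stored per-window results.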
1
vote
3
answers
202
views
A neat way in R to get mean and 5 and 95 percentiles given a probability distribution in a string format?
I would like to write a function that takes a string as an input. To be exact this input represents a probability distribution and the output would be 5 percentile, mean, and 95 percentile as a vector ...
0
votes
1
answer
83
views
How to get percentile in "reverse" in php [closed]
I am not sure if this is the correct title. Let's say I have 3000 members, each ranked by number of points. The max rank is 3000 and the min rank is 1. I need to check for each member on which top 10% ...
2
votes
1
answer
203
views
Groupby and percentage distributions pyspark equivalent of given pandas code
Whenever I want to get distributions in pandas for my entire dataset I just run the following basic code:
x.groupby('y').describe(percentiles=[.1, .25, .5, .75, .9, 1])
where I get the distribution ...
0
votes
1
answer
96
views
ElasticSearch, calculate 75th percentile of the first 25 hits of _score
In Elasticsearch I'm running a multi_match query on three fields (Field1, Field2, Field3). I now want to calculate, within an Elasticsearch aggregation, the 75th percentile of the _score values. Calculation should ...
0
votes
1
answer
240
views
In Excel, how can I PercentRank only values above a certain threshold in one column?
In Microsoft Excel, I'm looking to percentrank values in one column, but I'd only like to rank those that are above 1,000. For all the others that are below 1,000, I'd like to leave a zero. Here'...
0
votes
1
answer
39
views
qBCPEo with negative sigma values
I am trying to use qBCPEo to produce the lower limit of normal for a gamlss BCPE object. However, some of the sigma values (the sigma link is log) come out negative, so when I attempt to use qBCPEo I ...
4
votes
1
answer
161
views
How to select observations that are within a certain quantile
I have data (~1000 rows) that look like this:
head(data)
alt alb alp alt_zscore alb_zscore alp_zscore
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ...
1
vote
1
answer
373
views
How to calculate a growing/expanding percentile in Polars using cumulative_eval?
I'm trying to calculate a growing percentile for a column in my Polars DataFrame. The goal is to calculate the percentile from the beginning of the column up until the current observation.
Example ...
1
vote
2
answers
43
views
If value of column in pandas above 90th percentile then 1, if in between 75th percentile and 90th percentile then 2
I have a df column with volume data. I want to categorize the volume data as 1 if the value is above the 90th percentile of the column, and 2 if it is between the 75th and 90th percentiles.
I ...
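The rule described in this question can be sketched without pandas. The example below uses the standard library's `statistics.quantiles` rather than a DataFrame (in pandas the thresholds would typically come from the column's quantiles instead). The sample data, the `categorize_by_percentile` helper, the strict comparisons at the thresholds, and the `None` label for values at or below the 75th percentile are all assumptions, since the excerpt does not specify them.

```python
from statistics import quantiles

def categorize_by_percentile(values):
    """Label each value 1 (above the 90th percentile), 2 (above the 75th
    and up to the 90th), or None (everything else; an assumption, because
    the excerpt does not say what the remaining rows should get)."""
    # quantiles(..., n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = quantiles(values, n=100, method="inclusive")
    p75, p90 = cuts[74], cuts[89]
    labels = []
    for v in values:
        if v > p90:
            labels.append(1)
        elif v > p75:
            labels.append(2)
        else:
            labels.append(None)
    return labels

volumes = list(range(1, 101))  # made-up sample data: 1..100
labels = categorize_by_percentile(volumes)
print(labels.count(1), labels.count(2))  # 10 15
```

The `"inclusive"` method interpolates linearly between the minimum and maximum of the data, so the thresholds here are 75.25 and 90.1. A different interpolation method can move values that sit close to a boundary into another category.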
0
votes
1
answer
69
views
Find quantile bracket threshold R
I have a dataset where people are stored in income groups like this:
test<-tibble(income_group = c(1:3), pop = c(20,25,10),max_income_from=c(100,200,500))
I would like to know if there is a ...
0
votes
1
answer
118
views
rolling percentile rank of values from column B compared to column C
I'm trying to calculate the percentile rank for each row value in column #2 compared to the 4 lagged values in column #3.
I've tried using zoo::rollapply and slider::slide_dbl but I can't get the ...
0
votes
1
answer
77
views
Statsmodels ARIMA: how to get 0, 10, 20, .., 90,100 percentile forecast?
My goal is to compute various percentile forecasts for the same day.
My code:
# import the data
catfish_sales = pd.read_csv('/kaggle/input/time-series-toy-data-set/catfish.csv',
...
0
votes
1
answer
62
views
How to get the percentile of categories in a JSON file using Python?
How to count percentiles of categories located in a list in Python
I have this giant JSON to work with. Assuming I'm new to Python (I do have a background in programming), how can I get the percentile ...
0
votes
0
answers
31
views
Issues with IF AND statement in Excel
I'm having issues with a "PERCENTILE(IF(AND(" statement. I want to return a percentile based on two criteria, but for some reason it keeps giving me "$0." What am I doing wrong ...
0
votes
0
answers
268
views
How to create segment scores in SQL based on column values (instead of row based NTILES)
I am trying to create a preset number of x segments using column values in SQL (Google BigQuery).
Example:
Using a table sp_sales with 19 salespersons and sales, I want to create a segment from 1 to 5 ...
0
votes
0
answers
69
views
How to return percentiles for different variables in one calculation in R?
I have a dataset along the lines of that below (but many more entries) and am trying to see if there is a way in R to calculate the percentile rank in a new column ("Column D") that would be ...
1
vote
0
answers
20
views
python dataframe rolling on month-end dates: rolling average of the lowest 1% values
For each month-end date and each "PERMNO" (company identifier), I'd like to compute the rolling average of the lowest 1% values in the past 252 days from a dataframe column named "RET"...
-1
votes
2
answers
260
views
Python, Pandas: Percentile of grouped data with multiple columns
I haven't been able to find an answer to this specific question. I have data that looks like:
df = pd.DataFrame({'Product': ['Alpha', 'Alpha', 'Alpha', 'Alpha','Alpha', 'Beta', 'Beta', 'Beta','Beta', ...
0
votes
1
answer
71
views
I want ntile(3) within ntile(3) as in subdivision within division by ntile()
I want to create an ntile(3) within an ntile(3).
I have the following table:
Customer  Total_amt  Digital_amt
1         100        45
2         200        150
3         150        23
4         300        100
5         350        350
6         112        10
7         312        15
8         260        160
9         232        ...
0
votes
0
answers
22
views
Find at what percentile the average of each key lies on in PySpark
I have a dataframe with the columns values, Key and avg for each key. I need to calculate at what percentile the average lies for each key in PySpark.
How do I replicate the same output for each key....
0
votes
0
answers
108
views
PowerBI: Calculate Median, Q1 and Q3 in an aggregated table at the lowest common denominator
I'm currently facing a problem for which I haven't found a solution on the forum.
I'm working on an aggregated table containing information such as the installation region, departmental perimeter, lot ...
1
vote
1
answer
1k
views
Calculating percentile over multiple large files without holding them all in memory (Python)
I'm trying to calculate the 99th percentile of a certain value in a climate dataset. I have hourly observations spread across a lat-lon grid of 361 x 576 points for 43 years (1980-2022).
Is there a ...
0
votes
1
answer
484
views
Boxplot with additional lines for 10th and 90th percentile in R
Unfortunately I am not very experienced in R, but I need to solve a problem, which appears quite difficult to me, but probably is quite easy if one knows how to work with boxplots in R. I would be ...
0
votes
1
answer
315
views
Get Percentile on the Group by Result in sql
I have a table which stores friendships with a tag, like below:
id | userid | friendid | tag
1  | 123    | 124      | a
2  | 123    | 125      | b
3  | 211    | 212      | c
4  | 213    | 214      | ...
1
vote
1
answer
52
views
Understanding the subtle difference in calculating percentile
When calculating the percentile using numpy, I see some authors use:
Q1, Q3 = np.percentile(X, [25, 75])
which is clear to me. However, I also see others use:
loss = np.percentile(X, 4)
I presume 4 ...