1

Any ideas on how to code a running ratio of counts (unique and total) for ordered observations (rows) in t-SQL (SQL Server 2017 Community Edition or higher)?

Given a data set, which is necessarily ordered (Ex. a time series indexed/ordered by date):

CREATE TABLE #x (dt DATE, Name VARCHAR(MAX));
INSERT INTO #x VALUES
('2012-01-01', 'a'),('2012-01-02', 'b'),('2012-01-03', 'c'),
('2012-01-04', 'a'),('2012-01-05', 'b'),('2012-01-06', 'd'),
('2012-01-07', 'a'),('2012-01-08', 'b'),('2012-01-09', 'e'),('2012-01-10', 'e');

So, given columns Dt and Names, I need the output Ratio below. For clarity, I also included output columns UCnt that counts unique names to date (i.e. a restricted running count) and TCnt counts all names to date (i.e. a simple running count). The output Ratio field is a just a running proportion of the two.

Dt          Name    UCnt    TCnt    Ratio
2012-01-01  a       1       1       1.000
2012-01-02  b       2       2       1.000
2012-01-03  c       3       3       1.000
2012-01-04  a       3       4       0.750
2012-01-05  b       3       5       0.600
2012-01-06  d       4       6       0.666
2012-01-07  a       4       7       0.571
2012-01-08  b       4       8       0.500
2012-01-09  e       5       9       0.555
2012-01-10  e       5       10      0.500
2

2 Answers 2

4

A combination of OUTER APPLY and ROW_NUMBER seems to do the trick:

WITH CTE AS
(
    SELECT  *,
            Tot_Cnt = ROW_NUMBER() OVER(ORDER BY dt)
    FROM #x
)
SELECT  A.dt,
        A.[Name],
        B.Unq_Cnt,
        A.Tot_Cnt,
        Ratio_of_Cnts = CONVERT(NUMERIC(10,4),B.Unq_Cnt)/A.Tot_Cnt 
FROM CTE A
OUTER APPLY (SELECT Unq_Cnt = COUNT(DISTINCT [Name]) 
             FROM CTE
             WHERE dt <= A.dt) B
;

Here is a demo of this.

0

Thanks Lamak. The OUTER APPLY ran for 2+ hours (~1MM rows); I had to kill it. Here is my solution that takes ~30 sec on the same-size table.

WITH y AS (-- count dupped Names and each observation
    SELECT dt, Name,
    UCnt_=COUNT(*) OVER (PARTITION BY Name ORDER BY Dt), 
    TCnt=ROW_NUMBER() OVER (ORDER BY Dt)
    FROM #x
),
z AS (-- count unique Names only
    SELECT *,
    UCnt=SUM(CASE WHEN UCnt_>1 THEN 0 ELSE 1 END) OVER (ORDER BY Dt) 
    FROM y
)
SELECT Dt, Name, UCnt_, UCnt, TCnt, Ratio=UCnt/1.0/TCnt
FROM z
ORDER BY Dt

The output:

Dt          Name    UCnt_   UCnt    TCnt    Ratio
2012-01-01  a       1       1       1       1.000
2012-01-02  b       1       2       2       1.000
2012-01-03  c       1       3       3       1.000
2012-01-04  a       2       3       4       0.750
2012-01-05  b       2       3       5       0.600
2012-01-06  d       1       4       6       0.666
2012-01-07  a       3       4       7       0.571
2012-01-08  b       3       4       8       0.500
2012-01-09  e       1       5       9       0.555
2012-01-10  e       2       5       10      0.500

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Not the answer you're looking for? Browse other questions tagged or ask your own question.