1

I have the below table extracted from a Type 2 Slowly Changing dimension. I want to group rows to only have distinct classifications and their respective 'switch' time periods in a chronological order. A simple GROUP BY won't work because of the classifications that can 'come back', like the classification 'K' that 'came back' in RowNum 5.

Current Data :

RowNum_Helper Item_ID Classification Start_Date End_Date
1 X K 2021-01-07 02:02:44.060 2021-01-08 02:03:45.980
2 X K 2021-01-08 02:03:45.980 2021-01-31 20:03:38.993
3 X N 2021-01-31 20:03:38.993 2021-03-26 14:44:08.617
4 X N 2021-03-26 14:44:08.617 2021-04-08 02:02:51.013
5 X K 2021-04-08 02:02:51.013 2021-10-13 02:02:51.013
6 X N 2021-10-13 02:02:51.013 2021-10-14 02:02:51.013
7 X K 2021-10-14 02:02:51.013 9999-12-31 23:59:59.999
8 A K 2021-06-14 02:02:51.013 9999-12-31 23:59:59.999

Desired data :

Item_ID Classification Earliest_Start_Date Latest_End_Date
X K 2021-01-07 02:02:44.060 2021-01-31 20:03:38.993
X N 2021-01-31 20:03:38.993 2021-04-08 02:02:51.013
X K 2021-04-08 02:02:51.013 2021-10-13 02:02:51.013
X N 2021-10-13 02:02:51.013 2021-10-14 02:02:51.013
X K 2021-10-14 02:02:51.013 9999-12-31 23:59:59.999
A K 2021-06-14 02:02:51.013 9999-12-31 23:59:59.999

I tired using windowed functions like SUM OVER PARTITION BY etc, but can't really find an elegant way of doing it. Any help would be appreciated.

SQLFiddle seems to be offline, I used StackExchangeDE to create table with values for playground

0

1 Answer 1

0

Try an approach like this.

First identify the rows that have changed by using LAG. The do a running total on those change points. Once you know that you can do the Group By but include the groups.

DROP TABLE IF EXISTS #Source
CREATE TABLE #Source (RowNum_Helper INT, ItemID CHAR(1), Classification CHAR(1), Start_Date DATETIME2, End_Date DATETIME2)

INSERT INTO #Source
SELECT 1, 'X','K','2021-01-07 02:02:44.060', '2021-01-08 02:03:45.980' UNION ALL
SELECT 2, 'X','K','2021-01-08 02:03:45.980', '2021-01-31 20:03:38.993' UNION ALL
SELECT 3, 'X','N','2021-01-31 20:03:38.993', '2021-03-26 14:44:08.617' UNION ALL
SELECT 4, 'X','N','2021-03-26 14:44:08.617', '2021-04-08 02:02:51.013' UNION ALL
SELECT 5, 'X','K','2021-04-08 02:02:51.013', '2021-10-13 02:02:51.013' UNION ALL
SELECT 6, 'X','N','2021-10-13 02:02:51.013', '2021-10-14 02:02:51.013' UNION ALL
SELECT 7, 'X','K','2021-10-14 02:02:51.013', '9999-12-31 23:59:59.999' UNION ALL
SELECT 8, 'A','K','2021-06-14 02:02:51.013', '9999-12-31 23:59:59.999'
Go

WITH IdentifyChangedRows
AS (SELECT RowNum_Helper,
           ItemID,
           Classification,
           Start_Date,
           End_Date,
           LAG(Classification) OVER (PARTITION BY ItemID ORDER BY Start_Date) AS Classification_Lag
    FROM #Source S
   ),
     CreateGroups
AS (SELECT RowNum_Helper,
           ItemID,
           Classification,
           Start_Date,
           End_Date,
           SUM(   CASE
                      WHEN Classification_Lag = Classification
                           OR Classification_Lag IS NULL THEN
                          0
                      ELSE
                          1
                  END
              ) OVER (ORDER BY RowNum_Helper
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                     ) AS ChangeGroup
    FROM IdentifyChangedRows
   )
SELECT ItemID,
       Classification,
       MIN(Start_Date) Start_Date,
       MAX(End_Date) End_Date
FROM CreateGroups G
GROUP BY ItemID,
         Classification,
         ChangeGroup
ORDER BY ItemID DESC,
         Start_Date
1
  • That works very well indeed... Thank you !
    – Yass T
    Commented Oct 7, 2021 at 16:49

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Not the answer you're looking for? Browse other questions tagged or ask your own question.