I am trying to build a complex sql query for hours but still didn't find any way to do it as expected.
Here is my table and my dataset :
create table Skills
(
ID varchar(10),
StartDate date,
EndDate date,
Skill varchar(10)
);
Insert into Skills values
('1','2021-01-01','2021-12-31','A'),
('1','2022-01-01','2022-12-31','B'),
('2','2021-01-01','2021-12-31','A'),
('2','2021-11-30','2022-12-31','B'),
('3','2021-01-01','2021-12-31','A'),
('3','2021-11-30','2022-12-31','B'),
('3','2022-11-30','2023-12-31','C'),
('4','2021-01-01','2021-12-31','A'),
('4','2022-01-01','2022-12-31','B'),
('4','2022-11-30','2023-12-31','C');
I would like to aggregate rows by ID only when dates range (StartDate, EndDate) overlap. Here is the expected result :
1, 2021-01-01, 2021-12-31, A
1, 2022-01-01, 2022-12-31, B
2, 2021-01-01, 2022-12-31, B
3, 2021-01-01, 2023-12-31, C
4, 2021-01-01, 2021-12-31, A
4, 2022-01-01, 2023-12-31, C
When rows with overlapping dates range are aggregated, we need to keep the oldest StartDate, the newest EndDate and the Skill associated to the newest EndDate.
I tried so many queries with partition by, lag, cte, etc.
Could you help me find the right solution please ?
Thanks, Regards