Aggregate rows only if date ranges overlap

Question

I have been trying to build a complex SQL query for hours but still haven't found a way to get the expected result.

Here is my table and dataset:

create table Skills
(
ID varchar(10),
StartDate date,
EndDate date,
Skill varchar(10)
);

Insert into Skills values
('1','2021-01-01','2021-12-31','A'),
('1','2022-01-01','2022-12-31','B'),
('2','2021-01-01','2021-12-31','A'),
('2','2021-11-30','2022-12-31','B'),
('3','2021-01-01','2021-12-31','A'),
('3','2021-11-30','2022-12-31','B'),
('3','2022-11-30','2023-12-31','C'),
('4','2021-01-01','2021-12-31','A'),
('4','2022-01-01','2022-12-31','B'),
('4','2022-11-30','2023-12-31','C');

I would like to aggregate rows by ID only when their date ranges (StartDate, EndDate) overlap. Here is the expected result:

1, 2021-01-01, 2021-12-31, A
1, 2022-01-01, 2022-12-31, B
2, 2021-01-01, 2022-12-31, B
3, 2021-01-01, 2023-12-31, C
4, 2021-01-01, 2021-12-31, A
4, 2022-01-01, 2023-12-31, C

When rows with overlapping date ranges are aggregated, we need to keep the earliest StartDate, the latest EndDate, and the Skill associated with the latest EndDate.

I have tried many queries with PARTITION BY, LAG, CTEs, etc.

Could you help me find the right solution, please?

Thanks, Regards

"and the Skill associated to the newest EndDate": What if two rows have the same EndDate but a different Skill? What if the StartDate is the same too?
– Akina
Commented Jul 11, 2023 at 13:53
For a given ID, there can't be two rows with the same StartDate or the same EndDate (in my case, at least).
– Gosfly
Commented Jul 11, 2023 at 14:13
Is this enforced by corresponding unique indexes? If there are no such indexes, then duplicates (including complete ones) MAY EXIST (for example, because of a programming error or failure), and nothing prevents this. You must take this into account.
– Akina
Commented Jul 12, 2023 at 4:25
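
The uniqueness Akina asks about could be enforced with something like the following. This is only a sketch against the Skills table from the question; the constraint names are hypothetical, and the statement would fail if duplicate (ID, StartDate) or (ID, EndDate) pairs already exist in the table.

```sql
-- Hypothetical constraint names; assumes the Skills table from the question.
-- One unique index per rule: no two rows of the same ID may share a StartDate,
-- and no two rows of the same ID may share an EndDate.
alter table Skills
    add constraint uq_skills_id_startdate unique (ID, StartDate),
    add constraint uq_skills_id_enddate unique (ID, EndDate);
```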

SelVazi · Accepted Answer · 2023-07-11 14:13:20Z

This is a gaps-and-islands problem. To solve it, use lag() to fetch the previous row's end date and detect where each "island" starts, then use a cumulative sum() to give every island its own group number:

Assuming EndDate is unique per ID:

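select d.*, s.Skill
from (
  -- collapse each island to its earliest StartDate and latest EndDate
  select d.ID, min(d.StartDate) as StartDate, max(d.EndDate) as EndDate
  from (
      -- running count of island starts = island number within each ID
      select d.*,
      sum(case when DATEDIFF(prev_end_date, StartDate) > 0 then 0 else 1 end) over (partition by ID order by StartDate) as grp
      from (
            -- previous row's EndDate within the same ID (NULL on the first row)
            select d.*,
            lag(EndDate) over (partition by ID order by StartDate) as prev_end_date
            from Skills d
      ) d
  ) d
  group by d.ID, grp
) d
-- fetch the Skill of the row that carries the island's latest EndDate
inner join Skills s on s.ID = d.ID and s.EndDate = d.EndDate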
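
To see how the islands are formed, the two inner levels can be run on their own. The following is a sketch only, assuming the Skills table and data from the question and MySQL 8.0 or later (window functions are required); the aliases s and x are illustrative.

```sql
-- Show, for every row, the previous EndDate within the same ID
-- and the island number (grp) the row is assigned to.
select x.ID, x.StartDate, x.EndDate, x.Skill, x.prev_end_date,
       -- a row starts a new island (counts 1) unless the previous
       -- range ends after this row's StartDate
       sum(case when DATEDIFF(x.prev_end_date, x.StartDate) > 0 then 0 else 1 end)
           over (partition by x.ID order by x.StartDate) as grp
from (
    select s.*,
           lag(s.EndDate) over (partition by s.ID order by s.StartDate) as prev_end_date
    from Skills s
) x
order by x.ID, x.StartDate;
```

For ID 4, the three rows should get grp 1, 2, 2: the first row has no previous EndDate, the second starts the day after the first one ends, and the third starts before the second one ends. Grouping by ID and grp then yields the two result rows expected for ID 4.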

edited Jul 11, 2023 at 14:13

answered Jul 11, 2023 at 13:21

Man, you're so strong, thank you very much, this does it perfectly!
– Gosfly
Commented Jul 11, 2023 at 13:29
I just found a small issue with max(Skill): it picks the maximum in alphabetical order instead of the Skill value from the row with the latest EndDate. In my example, if you change ('4','2022-11-30','2023-12-31','C'); to ('4','2022-11-30','2023-12-31','A'); the result will be 'B' instead of 'A'.
– Gosfly
Commented Jul 11, 2023 at 13:43
That is correct, let me check
– SelVazi
Commented Jul 11, 2023 at 13:50
That's correct, that's what I would have done too, thanks for your answer and your help
– Gosfly
Commented Jul 11, 2023 at 14:43
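
The max(Skill) issue raised in the comments above can be reproduced in isolation. This is a sketch, assuming MySQL 8.0 or later (for the CTE); rows4 is a hypothetical inline copy of the two overlapping rows of ID 4, with the last Skill changed from 'C' to 'A' as in the comment.

```sql
-- rows4: the two overlapping rows of ID 4, last Skill changed to 'A'
with rows4 as (
    select '2022-01-01' as StartDate, '2022-12-31' as EndDate, 'B' as Skill
    union all
    select '2022-11-30', '2023-12-31', 'A'
)
select max(Skill) as alphabetical_max,             -- compares the Skill strings only
       (select r.Skill
        from rows4 r
        order by r.EndDate desc                    -- ISO dates sort correctly as strings
        limit 1) as skill_of_latest_enddate        -- Skill of the row with the latest EndDate
from rows4;
-- alphabetical_max = 'B', skill_of_latest_enddate = 'A'
```

This is why the answer joins back to Skills on EndDate rather than aggregating Skill directly.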
