OdeToCode IC Logo

SQL Brainteasers

Friday, May 19, 2006

I love the little challenges Ayende Rahien puts on his blog. The last SQL challenge was to convert a table like this:

FromDate ToDate
01/01/2000 01/31/2000
02/01/2000 02/28/2000
03/05/2000 03/31/2000

Into this:

FromDate ToDate
01/01/2000 02/28/2000
03/05/2000 03/31/2000

Where adjacent date ranges collapse to a single record. The first step was to setup some experimental tables.

CREATE TABLE #InDates
(
   FromDate datetime,
   ToDate datetime
)

INSERT INTO #InDates VALUES('1/1/2000', '1/31/2000')
INSERT INTO #InDates VALUES('2/1/2000', '2/28/2000')
INSERT INTO #InDates VALUES('3/5/2000', '3/31/2000')

CREATE TABLE #OutDates
(
   FromDate datetime,
   ToDate datetime
)

I have to think about SQL in small chunks, so I first thought about how to get a resultset with a FromDate minus one day. That would give me a column to compare against another record's ToDate:

SELECT DATEADD(dd, -1, FromDate) AS FromDateMinusOne, ToDate FROM #InDates

Typically these kind of solutions involve a self join, so I added the above into the following query:

SELECT *
FROM #InDates I
  LEFT JOIN
      (SELECT
         DATEADD(dd, -1, FromDate) AS FromDateMinusOne,
         ToDate 
       FROM #InDates) M
    ON I.ToDate = M.FromDateMinusOne   

Which generates the following resultset:

FromDate ToDate FromDate ToDate
2000-01-01 2000-01-31 2000-01-31 2000-02-28
2000-02-01 2000-02-28 NULL NULL
2000-03-05 2000-03-31 NULL NULL

Looking at that resultset it was easy to see that coalescing the ToDate into the NULL value left behind by a LEFT JOIN and using a GROUP BY would process the sample data.

INSERT INTO #OutDates
  SELECT
    MIN(I.FromDate),
    COALESCE(M.ToDate, I.ToDate) AS ToDate
FROM #InDates I
  LEFT JOIN
    (SELECT DATEADD(dd, -1, FromDate) AS FromDateMinusOne, ToDate
     FROM #InDates) M ON I.ToDate = M.FromDateMinusOne
GROUP BY COALESCE(M.ToDate, I.ToDate)

Then Ayende had to up the ante by updating the post with a harder challenge. This one has me scratching my head. I ended up with the above SELECT query inside a SQL 2005 common table expression. I’m sure someone will come along and tell me the following query is horribly inefficient, and terribly wrong:

WITH CollapsedDates(FromDate, ToDate)
AS
(
   SELECT
      MIN(T1.FromDate),
      COALESCE(T2.ToDate, T1.ToDate)      
   FROM #InDates T1
   LEFT JOIN
      (SELECT DATEADD(dd, -1, FromDate) AS FromDateMinusOne, ToDate
       FROM #InDates) T2 ON T1.ToDate = T2.FromDateMinusOne
   GROUP BY COALESCE(T2.ToDate, T1.ToDate)      
)

SELECT
   MIN(FromDate),
   ToDate
FROM   
   (SELECT
      CD1.FromDate AS FromDate,
     (SELECT MAX(ToDate) FROM CollapsedDates CD2 
      WHERE CD1.ToDate BETWEEN CD2.FromDate AND CD2.ToDate) AS ToDate
   FROM CollapsedDates CD1) UGLY
GROUP BY ToDate

The query plan is an ugly beast (but not the ugliest). I think I need to step away from this problem. There has to be a more elegant solution.