Date Range in SQL

While creating SQL queries for reports, it is often necessary to produce a continuous list of dates for a given period, e.g. all dates from 1980-01-01 through 2013-12-31 (12,419 records). In this post I describe several ways this can be done, along with the pros and cons of each method.

First I will briefly describe each method, and then gather them all into one table to draw some conclusions.

Use system tables (or large tables in your database)

The query looks like:

DECLARE @startDate datetime, @endDate datetime;
SET @startDate = {d '1980-01-01'};
SET @endDate = {d '2013-12-31'};
select DATEADD(d, n, @startDate)
from (
select 0 n
union all
select row_number() over (order by a.ID) n
from sysobjects a with(nolock)
cross join sysobjects b with(nolock)
) tab1
group by n
having DATEADD(d, n, @startDate) <= @endDate
order by n

Advantages:

  • Small amount of code.
  • Great performance: CPU 63, Reads 1500, Duration 0.09 seconds.

Disadvantages:

  • We are working with system tables and Microsoft can change their names or structure in future versions of SQL Server.
  • It depends on table size: for small databases this method cannot be used, as there would be too few records in the sysobjects table.

Use large union query

DECLARE @startDate datetime, @endDate datetime;
SET @startDate = {d '1980-01-01'};
SET @endDate = {d '2013-12-31'};
with
t0(n) as
(
select 1
union all
select 1
),
t1(n) as
(
select 1
from t0 as a
cross join t0 as b
),
t2(n) as
(
select 1
from t1 as a
cross join t1 as b
),
t3(n) as
(
select 1
from t2 as a
cross join t2 as b
),
t4(n) as
(
select 1
from t3 as a
cross join t3 as b
),
t5(n) as
(
select 1
from t4 as a
cross join t4 as b
),
Numbers(n) as
(
select row_number() over (order by n) as n
from t5
)
select dateadd(d, n - 1, @startDate) as n
from Numbers
where n <= datediff(d, @startDate, @endDate) + 1

Advantages:

  • Ideal performance: CPU 16, Reads 0, Duration 0.009 seconds.

Disadvantages:

  • Code is rather bulky.

Use CTE

DECLARE @startDate datetime, @endDate datetime;
 
SET @startDate = {d '1980-01-01'};
SET @endDate   = {d '2013-12-31'};
 
WITH [dates] ([Sequence], [date]) AS
   (SELECT 1 AS [Sequence]
          ,@startDate AS [date]
    UNION ALL
    SELECT Sequence + 1 AS Sequence
          ,DATEADD(d, 1, [date]) AS [date]
    FROM [dates]
    WHERE [date] < @endDate)
 
SELECT [Sequence]
      ,[date]
FROM [dates]
OPTION (MAXRECURSION 32747);

Advantages:

  • Code is ok – it’s rather compact.

Disadvantages:

  • Performance is rather poor, especially the number of reads: CPU 172, Reads 111,000, Duration 0.3 seconds.
  • We cannot use it on SQL Server 2000.
  • You are limited to roughly 89 years (the MAXRECURSION cap), but I think for most cases this will be enough.
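If the roughly-89-year ceiling matters, the recursion cap can be lifted with OPTION (MAXRECURSION 0). A minimal sketch follows; note that with the cap removed, a wrong termination condition will recurse forever, and the read-heavy performance profile described above still applies.

```sql
-- Sketch: MAXRECURSION 0 removes the recursion cap entirely.
-- Use with care: if the WHERE condition is wrong, the CTE never terminates.
DECLARE @startDate datetime, @endDate datetime;
SET @startDate = {d '1900-01-01'};
SET @endDate   = {d '2099-12-31'};

WITH [dates] ([date]) AS
   (SELECT @startDate AS [date]
    UNION ALL
    SELECT DATEADD(d, 1, [date])
    FROM [dates]
    WHERE [date] < @endDate)
SELECT [date]
FROM [dates]
OPTION (MAXRECURSION 0); -- 0 = unlimited recursion
```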

To sum up, here is a small table with all three methods in one place, along with some of my thoughts about each of them.

| Method | Advantages | Disadvantages | Conclusion |
|---|---|---|---|
| System tables | Small code; good performance | Depends on internal SQL Server objects; depends on the size of the database | From my point of view, not a reliable method, and its performance lead over the other methods is not that large. |
| Large union query | The best performance | The code is rather bulky | A great method: flexible enough to cover any date range, and it works very quickly. |
| CTE | Compact code | Poor performance (especially the number of reads) | Can be used for small date ranges (it shows quite good performance for 1-2 year periods), but in the general case the second method is much better. |
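Since the only real drawback of the second method is the bulk of its code, one option is to hide the union chain behind an inline table-valued function and then select from it like a table. A sketch is below; the function name dbo.DateRange is my own invention, not something defined elsewhere.

```sql
-- Sketch: the "large union" number generator wrapped in an inline
-- table-valued function. The name dbo.DateRange is hypothetical.
CREATE FUNCTION dbo.DateRange (@startDate datetime, @endDate datetime)
RETURNS TABLE
AS
RETURN
with
t0(n) as (select 1 union all select 1),                -- 2 rows
t1(n) as (select 1 from t0 as a cross join t0 as b),   -- 4 rows
t2(n) as (select 1 from t1 as a cross join t1 as b),   -- 16 rows
t3(n) as (select 1 from t2 as a cross join t2 as b),   -- 256 rows
t4(n) as (select 1 from t3 as a cross join t3 as b),   -- 65,536 rows
t5(n) as (select 1 from t4 as a cross join t4 as b),   -- ~4.3 billion rows
Numbers(n) as (select row_number() over (order by n) from t5)
select dateadd(d, n - 1, @startDate) as [date]
from Numbers
where n <= datediff(d, @startDate, @endDate) + 1;
```

After that, producing a date range is a one-liner: SELECT [date] FROM dbo.DateRange({d '1980-01-01'}, {d '2013-12-31'});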

6 Responses to Date Range in SQL

  1. quickjs says:

    I beg to differ with your conclusions. Firstly, you forgot to mention that your CTE has one distinct disadvantage in that even with MAXRECURSION set to 32747, it will still blow out for larger results.

    Secondly, using the system tables is, if done properly, the fastest method I am aware of and carries none of the disadvantages that you list. Here is the implementation that I use:

    with tally(n) as
    (
    select 0
    union all
    select row_number() over (order by a.object_id)
    from master.sys.columns a cross join master.sys.columns b
    with (nolock)
    )

    Using the master database (and the cross join) will give you enough rows for pretty much anything you’d want to do, and the NOLOCK hint will alleviate the other issue you mentioned.

    In comparing this with your CTE approach to generate every day from 1 Jan 1930 until today, my method was around 60% faster (~450ms vs ~1100) and used 235 logical reads compared to a whopping 182305.

    • Boris Frolov says:

      For my SQL Server, the recommended select returns only 1937 records. This is really not enough; also, access to the master database is not always allowed (the service account under which the system runs usually does not need such permissions). But the performance was really good, so I will correct this point in my post.

      Still, the problem with the number of records is really annoying. I ran the query using the sysobjects table on several databases, and the row counts were quite different. So I prefer to stick with the second method, as it offers maximum flexibility and rather good performance.

      • quickjs says:

        Yep, every choice is a trade-off between something, whether it’s performance, portability, scalability, predictability, and so on. There’s no ideal solution.

        For me personally, the hard limit imposed by MAXRECURSION is a dealbreaker; I’d rather have the freedom to just keep adding CROSS JOINs as required. Using the master database takes care of the variance in the number of rows returned, but as you point out it’s not always possible depending on how locked down your environment is.

        The more I think about this, the more I think it’s best to turn method 2 (large unions) into a table-valued UDF or a view and never have to worry about the size and ugliness of the code behind it again.

      • Boris Frolov says:

        Agree. I've also settled on the second method, especially after I sped it up. Now it has excellent performance and no visible disadvantages. Truth is born in discussion; thanks a lot for your participation )))

  2. Juan Arellano says:

    This was my solution when I had to do it a while ago:

    DECLARE @x as INT
    SELECT @x = 37620 -- DATEDIFF(dd, CAST(0 AS DATETIME), '2003-01-01')
    CREATE TABLE #table (id int IDENTITY(37620,1), dateofyear DATETIME)
    WHILE @x <= 38350 -- DATEDIFF(dd, CAST(0 AS DATETIME), '2004-12-31')
    BEGIN
    INSERT #table (dateofyear) SELECT 1
    SET @x = @x + 1
    END
    SELECT CAST(id AS DATETIME) as dateofyear
    FROM #table
    DROP TABLE #table

    It seems to be the most cost-effective and efficient option, with less CPU and fewer reads compared to any of the solutions in the original post.
    It can also be turned into a function or stored procedure, so parameters can be passed while keeping the same performance.

    • Boris Frolov says:

      Juan,
      Thanks a lot for your comment, but I cannot agree with you. Generally, SQL Server is not very good at cursors and other loops; it is a query-oriented language, and the cost of such a loop grows with the number of iterations.
      Concerning your example, I tested it for the period from 1980 till 2013 and the results were not good:
      • CPU 765,
      • Reads 13,500,
      • Duration 0.8 seconds.
      Just to compare, the second method (based on several unions) gives: CPU 16, Reads 0, Duration 0.009 seconds.
