What's Wrong With This Code (#6)

Tuesday, September 26, 2006

Joe Developer is working on a new application for a book publisher. Authors can publish zero or more books. Books can have zero or more authors.

Book and Author schema

Joe wrote a query to get a total count of all authors, and a total count of all books. Joe read on the Internet that the DISTINCT keyword is good to use in these scenarios.

SELECT DISTINCT
  COUNT(Authors.Id) AS TotalAuthors,
  COUNT(Books.Id) As TotalBooks
FROM
  Authors  
  FULL JOIN AuthorsBooks AB ON Authors.Id = AB.AuthorID
  FULL JOIN Books ON AB.BookID = Books.Id

The problem is - the numbers seem too high. What's wrong? Is it easy to fix?


Comments
Eric W. Bachtal Tuesday, September 26, 2006
DISTINCT at the SELECT level will only ensure that unique rows appear in the result set. In this example, DISTINCT should be used within each COUNT() to ensure that duplicates are eliminated before the counts are calculated:

SELECT
COUNT(DISTINCT Authors.Id) AS TotalAuthors,
COUNT(DISTINCT Books.Id) As TotalBooks
FROM
Authors
FULL JOIN AuthorsBooks AB ON Authors.Id = AB.AuthorID
FULL JOIN Books ON AB.BookID = Books.Id
Julian Kuiters Tuesday, September 26, 2006
DISTINCT should be place within the count function, this way you are only counting the distinct values within each column.

SELECT
COUNT(DISTINCT Authors.Id) AS TotalAuthors,
COUNT(DISTINCT Books.Id) As TotalBooks
FROM
Authors
FULL JOIN AuthorsBooks AB ON Authors.Id = AB.AuthorID
FULL JOIN Books ON AB.BookID = Books.Id

scott Tuesday, September 26, 2006
Julian, Eric: Correct!

I also updated the post to fix a small typo (not in the code).
Wilhelm Svenselius Tuesday, September 26, 2006
Um, I might be missing something, but wouldn't it be much (much!!) faster and simpler to just SELECT COUNT(*) from the Authors and Books tables in order to get the author and book counts?
scott Tuesday, September 26, 2006
It could be, Wilhelm [but then it wouldn't be interesting to look at for bugs :)]
Rick Glos Tuesday, September 26, 2006
In fact, I think AuthorsBooks is just confusing the developer. If you want 1 resultset containing a count of authors and a count of books the query would be better written like this:

-- use count(1) not count(*) to speed up
select
(select count(1) from authors) as [AuthorsCount]
, (select count(1) from books) as [BooksCount]
yaip Tuesday, September 26, 2006
I am a bit ignorant about this. What tool have you used for the ER diagram?
scott Wednesday, September 27, 2006
@ Rick: Good one, I like that.

@ yaip: This is the database diagram tool in SQL Server 2005. It is similar to the same tool in SQL 2000, just a slightly different look.
marshall Monday, October 2, 2006
@rick - count(*) and count(1) make no difference. unless that's an access thing of course but turn all the profiling options on in sqlserver and you'll see no difference.

just trying to squash that myth
Comments are now closed.
by K. Scott Allen K.Scott Allen
My Pluralsight Courses
The Podcast!