Tuesday, February 24, 2009

Get Rid of NULL

"My query isn't returning the right rows" is a problem we all hear often. Upon investigation, I find that the most common cause of a miswritten query is a misunderstanding and/or mishandling of NULLs.

Well, you know what? I've had it. Let's get rid of NULL. There, I said it.

I've written articles explaining NULL before, a couple of times in fact*. So have plenty of other Oracle specialists, in both blogs and books. And it's just not getting through. So let's just get rid of NULLs.

* Links to my previous articles:
NULLs in Oracle
NULL vs Nothing - ANSI SQL is unlike programming languages because NULL is not nothing.

The basic problem is that programmers coming from different backgrounds mistakenly think NULL is nothing. You may think NULL is nothing, you may design your application to treat NULL as nothing, but the problem is that Oracle does not consider NULL to be nothing. NULL is, quite simply, "I don't know."

NULL is not nothing, it's a value. It's a value as real as any other. It's simply an unknown value. So why do we need NULL anyway? Let's just call it UNKNOWN. That should avoid any programmer misconceptions.

Let me give you an example to help clear it up. Let's take a table of employees, and for whatever reason, HR keeps track of everyone's favourite colour. How would you treat the following situations?

1. An employee leaves that section blank.
2. An employee writes "blind since birth - never seen a colour."
3. An employee writes "that's personal" or "what a stupid question."
4. An employee writes "I do not have a favourite colour."

In most cases today, each of these employees would have NULL entered as their favourite colour. But would it be appropriate in all cases? In some cases the employee has a favourite colour, we simply don't know what it is. In at least one case, the employee has no favourite colour (ie "nothing").

What if I asked you if two of these employees had the same favourite colour? You couldn't tell me, could you? What if I asked you if they all had different favourite colours? You can't answer that either.

So what would I do? I'd get rid of NULL and use a value more appropriately named UNKNOWN instead.

"But Robert," some of you are thinking, "What about foreign keys?" You may have a table called colour and employee.colour_id might map to colour.id. UNKNOWN may be appropriate for some of those employees, what about the employee without a favourite colour? We can't put UNKNOWN because we know his favourite colour (ie. none). So what do we do? Create another type called NONE or NOTHING?

No. No we don't. I doubt Oracle would even support that properly anyway. I'll tell you what we'd do. We'd insert "Nothing" into COLOUR. And why not? What's the problem with that?

In more practical terms, think of the "supervisor_id" field in a typical employee table. The supervisor_id column would be self-referential to employee.id. What I'm saying is this:
1. Employees with an unknown supervisor would have supervisor_id = UNKNOWN (which is just a simple re-naming of NULL).
2. There would be a "Nobody" employee. (Make sure not to pay him/her!)
3. Employees with no supervisor (the President, for example) would have a supervisor_id equal to Nobody's id.

Since I'm adding "Nobody" as an employee, would I therefore want to add "Unknown" as an employee too, make the column "NOT NULL", thus avoiding the NULL/UNKNOWN value altogether?

No. No way. Look at those questions I asked above - what if I asked you if two employees had the same boss? If they both had a supervisor_id set to the "Unknown" employee, you'd say "yes," which would be wrong! They might have two completely different supervisors - the point is that we don't know. Which is what NULL/UNKNOWN value is for.

I know I'm not alone in my opinion, but I also know that there is no consensus on this issue. I'm expecting more than a few comments from experienced Oracle experts finding fault with my proposals and/or examples, and backing it up quite convincingly. NULL is no doubt one of the great RDBMS arguments, and while it may be frustrating to work with, I hope it made for interesting reading.

Friday, February 20, 2009

OVERLAPS

"What's wrong?" the guru asked as he sat down next to the company's newest database analyst. "I heard you were having some trouble with the room booking application."

"Yep," said Chad. "I just can't seem to get it right."

"Don't put it all on yourself," said the guru. "What can I do to help?"

"Well, I've got it narrowed down to this one procedure, which is responsible for checking if the room is available or not. It takes a start time, and end time, and the room number. For some reason it sometimes says the room is available when it isn't."

The wise guru studied the code, scratched his beard, and finally provided his much anticipated wisdom. "Have you ever heard of OVERLAPS?"

The well-trained database analyst dutifully opened up the Oracle documentation. He knew the Oracle guru strongly believed in the completeness and accuracy of the Oracle documentation, and it would certainly win him some points to be seen going directly to the guru's favourite source of information.

Unfortunately Chad was a little embarrassed when he couldn't find OVERLAPS anywhere. Finally the guru broke the tension with a hearty laugh. "You won't find it in the Oracle documentation, but you were right to look there first!" Chad smiled sheepishly. "For some reason Oracle has chosen not to advertise the existence of this command, but here's how it works."

The guru slid the keyboard over, cracked his knuckles, and typed out this query.

SQL> SELECT 'Yes' FROM DUAL
2 WHERE (DATE '2009-01-01', DATE '2009-01-03')
3 OVERLAPS (DATE '2009-01-02', DATE '2009-01-04');

'YE
---
Yes

"Wow, that's neat! But without documentation, how am I supposed to know how this works?"

"One of the many advantages to using Oracle is its popularity. As a result, there is an abundance of Oracle experts around the world, most of whom are quite generous with their knowledge and experience. You should be sure to include some of the more popular and trusted forums and sites to your daily reading list."

Chad looked optimistic, but still a little shaken. "Hey, don't feel so bad," said the guru as he placed his hand on the shoulder. "Open up your copy of Expert One-on-One Oracle, and check out Chapter 1. Flip to page 31."

Chad studied the code, which attacked the very same problem. Finally he looked up, obviously still confused. The guru smiled at him. "Study the query carefully. What happens when a meeting request comes in that both starts and ends while another meeting is running?"

Chad studied the code with furrowed eyebrows, but eventually his face broke out into a broad smile. "The room gets double-booked! Wow, even Tom Kyte got this wrong!"

"See? It happens to the best of us."

"Once I test this, should I rewrite my procedure to use OVERLAPS?"

"Personally, I wouldn't recommend using any undocumented feature. Furthermore, my own experience with OVERLAPS is that it doesn't consider equality as an overlap, and that may not satisfy your business requirements."

SQL> SELECT 'Yes' FROM DUAL
2 WHERE (DATE '2009-01-01', DATE '2009-01-03')
3 OVERLAPS (DATE '2009-01-03', DATE '2009-01-04');

no rows selected

"Instead, just consider OVERLAPS a handy tool to help you with your testing."

"Thanks!"

The guru turned up to leave, but just as he was about to leave sight, he turned around in a fashion reminiscent of Peter Falk's Columbo. "You know, I could use some lunch ..."


---
To the Chads out there, here are some authors with more information on OVERLAPS:
Eddie Awad
Kevin Meade
David Aldridge

This page is powered by Blogger. Isn't yours?