The views expressed on this blog site are my own and do not necessarily reflect the views of an employer.

Experienced thinking on designing, building, supporting, and maintaining software – database software and otherwise.

Sep

If you don’t know what it is, why are you searching for it?

Posted by Andrew Published in General Posts

...Or, don't put IS NULL in the WHERE clause! There is a tendency to use NULL, or '', or even zero as a special value. So what's wrong with that? Like me, you may remember a lot of college and grad school courses in which professors and T.A.s would casually put a zero somewhere to serve as a flag or marker. And think, of course, of the venerable "null-terminated string" we inherited from Unix and C! The problem is that none of these magic values are magic in the slightest. A "null byte" is not really null, it's just eight bits of zero. And zero is a completely usable value — just ask the people who had to clean up buffer-overflow vulnerabilities in all the operating systems since the mid-1990's. In your schema, using NULL as any special value, rather than as an indeterminate or absent value, is an invitation for trouble. Your first problem will be that you can't search for NULL in an index. Get it in your head: NULL is what you use to keep a row out of an index. If you encounter a place where you're using special indexing techniques to optimize a search for NULL, you're not actually using it as NULL but as a state marker or as a default value. Then you encounter the issue where NULL has different meanings when you INSERT or UPDATE, and when you SELECT or DELETE. "Oh, well on a SELECT I pass in a null to indicate that I don't want to filter on that column at all." Well, in that case you're using it as a wild-card, not as a NULL. When column Z is NULL it indicates that it is defaulted. Defaulted to what? Defaulted as of when? This is a very easy trap to fall into. Here are some typical cases. You're in a numeric domain, and you just need one special value. It's not really a number. So you use NULL, and pile up a couple of million rows of live data in which some percentage have it NULL. Some time later you want to refine the meaning of NULL. Or you discover that this NULL actually has two or three subcases. You're in a textual domain, and an empty string seems like a fine way to show that this value has been deleted from the row. But then you encounter a case where you need to insert the row with this value empty, but that has completely different ramifications from removing the value. How do you distinguish these? If something has meaning, if you're searching for it, DO NOT use NULL! NULL is meaningless. Be alert to when you are treating it otherwise. The following are common hazardous uses of NULL, '', NIL, and zero: * logical values that exclude values in the domain. For example, NULL in the "start date" might mean "this record has been active since system start." But what does it mean if the "end date" is also null, or you have to merge into a system with a much earlier epoch? * encoding record state or lifecycle in a strongly related attribute rather than its own. Yes, I know that when you find a null in the "superseded by item #" column, it means the item has not been superseded yet. What happens when an item is superseded by multiple other items? What if it is just obsoleted and not replaced or superseded? * wild-card matching We've all used NULL in these ways, but when you do, be smart. Ask yourself whether it really is NULL. And if you think you might search for it - well, it probably isn't NULL at all. If you code it as NULL, why would you search for it? So if you're searching for it, it has some meaning, you need this particular state for a substantial reason - so find some other way to represent that condition in the column rather than emptiness.

no comment

« Previous Page

If you don’t know what it is, why are you searching for it?

Blogroll

Categories

Recent Posts