In 1981, ACM bestowed its Turing Award on E.F. Codd, formulator of relational database theory and practice. Codd's work, The Relational Model for Database Management, had changed the computing world - and changed it for the better. I work for the company that commercialized and dominates relational technology. But it's not some fluke, and not some academic toy that made for the success of relational technology.
Most of us are introduced to relational technology by a book co-written by Codd with C.J. Date, An Introduction to Database Systems. Codd's contribution was not just applying mathematical rigor to "relations" - tables - but also the definition of the process of data normalization. Lots of us pretended to pay attention during our one course on database. And unfortunately, sometimes abetted by jaded professors, a lot of us discard the learnings as soon as we get our grades.
So what's with that? Well, "relational" looks stupidly simple. And it's widely "known" that a fully normalized database schema "can't perform well." So naturally, when database design is underway, people are expecting to compromise on normalization.
The drill goes something like this:
- Programmer realizes he needs to store something - a phone number for a vendor sales representative
- Programmer does not have realistic sample data with relationships to existing test data
- Programmer does, however, have the use case of wanting to submit a vendor name and get back the phone number to call
- Programmer codes a table with a name and a phone number
- Programmer trumps DBAs and data modelers with "customer requirements"
- Applications depend on the illusory tight functional dependency of phone number and vendor name
- When the vendor relationship is found more complicated, or two share a phone number, the world ends.
- The programmer has another year of employment
Regrettably, the practical effect of this commonly-held belief is to short-circuit the actual data modeling - after all, if normalization doesn't really work, why follow through on it at all - or for that matter, start it? Of course this is not particularly honest.
And it doesn't work. A well-normalized schema (remember 3NF, BCNF, 4NF?) typically has superior performance over a poorly-designed schema. Please go back, pick up something to remind you what normalization is, and take the time to apply it on your database project.