| Dog | Name | Sex | Kennel | Address | Phone | Trick #1 | Trick #2 | Trick #3 |
|---|---|---|---|---|---|---|---|---|
| 3 | Tuck Yang | Male | Ruff Haus | 1200 Frank St., Chattanooga, TN 37411 | 423-820-2412 | Rollover 53 | Retrieve 71 | Tightrope 91 |
| 12 | Nip Yin | Male | K9HQ | 4051 Lewis St., Chattanooga, TN 37412 | 423-833-4122 | Roll Over 40 | Play Dead 70 | Fetch 50 |
| 13 | Shadow Muttley | N/A | K9HQ | 4051 Lewis St., Chattanooga, TN 37412 | 615-833-4122 | Sit 10% | Stay 10% | Heel 12% |
There are several problems with this data. First, we haven't accurately identified each piece of data in enough detail. For example, "Name" needs to be broken down into "First Name" and "Last Name." Also, the address needs to be broken down into street address, city, state, and ZIP code. Also, the tricks need to be separated into trick name and skill level. Identifying each entity allows for better queries in the future (e.g., we want to send targeted advertising to just certain states, or certain ZIP codes).
One of the rules of a relational database is that attributes (columns) must have the same data type. Therefore, skill levels will be numeric only (no percent signs). Here is the same data in a relational database's table, but still unnormalized:
| Dog | First Name | Last Name | Sex | Kennel | Street | City | State | ZIP | Phone | Trick Name #1 | Trick #1 Skill | Trick Name #2 | Trick #2 Skill | Trick #3 | Trick #3 Skill |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Tuck | Yang | M | Ruff Haus | 1200 Frank St. | Chattanooga | TN | 37411 | 423-820-2412 | Rollover | 53 | Retrieve | 71 | Tightrope | 91 |
| 12 | Nip | Yin | M | K9HQ | 4051 Lewis St. | Chattanooga | TN | 37412 | 423-833-4122 | Roll Over | 40 | Play Dead | 70 | Fetch | 50 |
| 13 | Shadow | Muttley | U | K9HQ | 4051 Lewis St. | Chattanooga | TN | 37412 | 615-833-4122 | Sit | 10 | Stay | 10 | Heel | 12 |
We had to make some non-technical decisions, which in the corporate world might be called "redesigning our business processes." In our scenario, we decided that all skill levels would be recorded as percentages, so a 70 is 70 out of 100 possible points. We also made the decision that percentages will not include any decimal points — integers only. That also meets the relational requirement that an attribute have a domain. In this case, the domain for a skill level is an integer between 0 and 100. You'll also notice that we conserved space by using a single-letter code for sex: M=Male, F=Female, and U=Unknown. We can programmatically enforce these domains, either in the applications or through triggers (more on triggers in a following page).
From here on, I'll refer to columns as fields, and rows as records. As you can see, the order of the records doesn't change the data, there are no duplicate records, each record has the same number of fields (making it a two dimensional table), any given field is of the same data type, and all the fields have a domain. This last characteristic isn't obvious, but when working with a database tool, you would define a field's domain. For example, you would set "First Name" to accept characters, with a maximum length of 30. By setting this field to be required (often called a "Not Null" constraint) we enforce a domain of "at least one character, but no more than 30 characters."