Our SKILL table still has flaws. Here it is again for reference, in the 1NF:
SKILL Table| Dog | Trick | Skill |
|---|---|---|
| 3 | Rollover | 53 |
| 3 | Retrieve | 71 |
| 3 | Tightrope | 91 |
| 12 | Roll Over | 40 |
| 12 | Play Dead | 70 |
| 12 | Fetch | 50 |
| 13 | Sit | 10 |
| 13 | Stay | 10 |
| 13 | Heel | 12 |
We want to eliminate redundancy by using shorter codes for the trick names.
Codes will save disk space, smaller files will process faster, and short codes are
easier for users to enter
into applications. This is especially true if the
descriptions were a lot longer than our example data. Then the savings and
performance gains could be quite significant.
By storing codes, we also gain an integrity benefit. You don't have to worry about variations is spelling, capitalization, or the use of synonyms. You'll want to limit who can manage the list of trick definitions, so that consistency is enforced. This is because consistency is especially important when queries are made (e.g., "Give a list of all dogs who know the trick Fetch" could return invalid results in the 1NF because it would miss dogs who know that trick as Retrieve.)
When we move the redundant data to a different table, here is how the data would look:
SKILL Table
|
TRICK Table
|
Below is the ERD for our two new tables in the second normal form (the DOG table isn't in the 2NF, but it will be fixed when we bring that table to the 3NF). You can see that the SKILL table is the "many" side for two relationships. That is, one dog can learn zero or more skills, and that one trick can occur zero or more times in the skills table. The skills table is in (what is referred to as) a "Many-to-Many Relationship."
The 3NF is just a step away, but before we reach that destination, I want to talk about Data Dictionaries (abbreviated DD). Below is a DD entry for our TRICK table. A DD gives detailed information about each field in a record. This is where we document the data type, domain (limits such as size), and what role a field might play in a relationship between other tables. In our example, the notes for the TRICK field show how the trick number links the table TRICK with the table SKILL. The comments use the "dot" notation. With dot notation, the word before the period is the table, and the field follows the period. Since TRICK is a field in both the SKILL and TRICK tables, it helps identify which we're talking about. Take a look at the example below, and we'll discuss some finer points below.
| Type: | Numeric |
| Size: | 5 (99999) |
| Index: | Primary Key |
| Rules: | Required, Unique. |
| Notes: | Internal trick number. There may be many SKILL.TRICK entries for a single TRICK.TRICK |
| Type: | Character |
| Size: | 40 |
| Index: | Secondary Key |
| Rules: | Required. Text entered into this field is converted to uppercase. |
| Notes: | Full description for the trick. |
There's nothing magic about the layout of this example DD. It's just a document created by the database designer to record the decisions made during development. Any changes that follow implementation should also be recorded in your DD. You might want to use a different layout, record additional details, or your database tool might have a place for this information to be stored. Regardless of location, it's important to document these details so they won't be forgotten. I would also encourage you to document, not only the WHAT, but also the WHY things were set up a certain way.
Looking at our specific example, you can see that the TRICK field is the primary key used to link to the SKILL table. Earlier I spoke of using short, mnemonic codes, so why are we using numbers here? First, primary keys used to link must be unique. Second, to ensure that link integrity is enforced, professional database designers hide these links from the user (for both their benefit and for integrity's sake). Third, professional database designers prefer to use arbitrary keys so maintenance is minimal. Numeric fields are used, by the vast majority of professional developers, as primary keys. That's because a new record can be added by simply adding one to the highest key that exists (e.g., the next trick in our sample table of seven tricks would be trick #8).
If an application is designed to use codes for expert user's efficiency, we could add a field named CODE to the TRICK table. The user could use CODE, but behind the scenes the TRICK field transparently links to the SKILL table. We don't sacrifice maintainability, either. For example, let's say we added a trick #8 that's named "Triple Reverse Somersault." We want to use the code "TR" but that code is already used by "Tightrope." To solve our quandary, we'll make "Tightrope" code "T" and "Triple Reverse Somersault" as code "TR." This change only affects two records if we used arbitrary key. However, if we had used a code to link the tables, we'd have to replace ALL occurrences of "TR" with "T" first, and then all occurrences for "Triple Reverse Somersault" to "TR." If there were millions of records, such changes become a maintenance nightmare. Also, if the sequence isn't correct, you can accidentally corrupt a ton of data!