Click here to Skip to main content
15,868,016 members
Articles / Database Development

Get Ready to Learn SQL: 8. Database Normalization Explained in Simple English

Rate me:
Please Sign up or sign in to vote.
5.00/5 (9 votes)
22 Oct 2014MIT5 min read 19.8K   12   2
Database normalization in SQL explained in simple English

Introduction

Database normalization is a process used to organize a database into tables and columns. The idea is that a table should be about a specific topic and that only those columns which support that topic are included. For example, a spreadsheet containing information about sales people and customers serves several purposes:

  • Identify sales people in your organization
  • List all customers your company calls upon to sell product
  • Identify which sales people call on specific customers

By limiting a table to one purpose, you reduce the number of duplicate data that is contained within your database, which helps eliminate some issues stemming from database modifications. To assist in achieving these objectives, some rules for database table organization have been developed. The stages of organization are called normal forms; there are three normal forms most databases adhere to using. As tables satisfy each successive normalization form, they become less prone to database modification anomalies and more focused toward a sole purpose or topic. Before we move on, be sure you understand the definition of a database table.

Reasons for Normalization

There are three main reasons to normalize a database. The first is to minimize duplicate data, the second is to minimize or avoid data modification issues, and the third is to simplify queries. As we go through the various states of normalization, we’ll discuss how each form addresses these issues, but to start, let’s look at some data which hasn’t been normalized and discuss some potential pitfalls. Once these are understood, I think you’ll better appreciate the reason to normalize the data. Consider the following table:

Image 1

Note: The primary key columns are underlined.

The first thing to notice is this table serves many purposes including:

  1. Identifying the organization’s salespeople
  2. Listing the sales offices and phone numbers
  3. Associating a salesperson with a sales office
  4. Showing each salesperson’s customers

As a DBA, this raises a red flag. In general, I like to see tables that have one purpose. Having the table serve many purposes introduces many of the challenges; namely, data duplication, data update issues, and increased effort to query data.

Data Duplication and Modification Anomalies

Notice that for each SalesPerson, we have listed both the SalesOffice and OfficeNumber. This information is duplicated for each SalesPerson. Duplicated information presents two problems:

  1. It increases storage and decreases performance.
  2. It becomes more difficult to maintain data changes.

For example:

  • Consider if we move the Chicago office to Evanston, IL. To properly reflect this in our table, we need to update the entries for all the SalesPersons currently in Chicago. Our table is a small example, but you can see if it were larger, that potentially this could involve hundreds of updates.
  • Also consider what would happen if John Hunt quits. If we remove his entry, then we lose the information for New York.

These situations are modification anomalies. There are three modification anomalies that can occur:

Insert Anomaly

There are facts we cannot record until we know information for the entire row. In our example, we cannot record a new sales office until we also know the sales person. Why? Because in order to create the record, we need provide a primary key. In our case, this is the EmployeeID.

Update Anomaly

Image 2

The same information is recorded in multiple rows. For instance, if the office number changes, then there are multiple updates that need to be made. If these updates are not successfully completed across all rows, then an inconsistency occurs.

Deletion Anomaly

Image 3

Deletion of a row can cause more than one set of facts to be removed. For instance, if John Hunt retires, then deleting that row causes us to lose information about the New York office.

Search and Sort Issues

The last reason we’ll consider is making it easier to search and sort your data. In the SalesStaff table, if you want to search for a specific customer such as Ford, you would have to write a query like:

SQL
SELECT SalesOffice
FROM SalesStaff
WHERE Customer1 = ‘Ford’ OR
      Customer2 = ‘Ford’ OR
      Customer3 = ‘Ford’

Clearly if the customers were somehow in one column, our query would be simpler. Also, consider if you want to run a query and sort by customer. The way the table is currently defined, this isn’t possible, unless you use three separate queries with a UNION. These anomalies can be eliminated or reduced by properly separating the data into different tables, to house the data in tables which serve a single purpose. The process to do this is called normalization, and the various stages you can achieve are called the normal forms.

Definition of Normalization

There are three common forms of normalization: 1st, 2nd, and 3rd normal form. There are several additional forms, such as BCNF, but I consider those advanced, and not too necessary to learn in the beginning. The forms are progressive, meaning that to qualify for 3rd normal form, a table must first satisfy the rules for 2nd normal form, and 2nd normal form must adhere to those for 1st normal form. Before we discuss the various forms and rules in details, let’s summarize the various forms:

  • First Normal Form – The information is stored in a relational table and each column contains atomic values, and there are not repeating groups of columns.
  • Second Normal Form – The table is in first normal form and all the columns depend on the table’s primary key.
  • Third Normal Form – The table is in second normal form and all of its columns are not transitively dependent on the primary key.

Do not get too hung up if you don’t know what these rules mean at the moment; we’ll explain them in detail in the next post using examples. For now, it’s important to understand that there are three rules which build upon each other. More tutorials are to follow! Remember! I want to remind you all that if you have other questions you want answered, then post a comment or tweet me. I’m here to help you.

What other topics would you like to know more about?

License

This article, along with any associated source code and files, is licensed under The MIT License


Written By
Easy Computer Academy, LLC
United States United States
Hello my name is Kris. I’m here because I am passionate about helping non-techie people to overcome their fear of learning SQL.

I know what it is like to not know where to start or whether the time spent learning is worth the effort. That is why I am here to help you to:
- Get started in an easy to follow step-by-step manner.
- Use your time wisely so you focus on what is important to learn to get the most value from your time.
- Answer your questions. Really! Just post a comment and I’ll respond. I’m here to help.

It wasn’t long ago that I was helping a colleague with some reporting. She didn’t know where to start and soon got overwhelmed and lost as she didn’t know SQL.

I felt really bad, as she was under pressure to get some summary information to her boss, the built-in reports were falling short, and to make them better would require her to know SQL. At that time that seemed impossible! It in dawned on me, it doesn’t have to be that way.

Then I discovered a way for anyone with the desire to easily learn SQL. I worked with my co-worker, started to teach her what I learned and soon she was able to write reports and answer her boss’ questions without getting stressed or ploughing hours into manipulating data in Excel.

It hasn’t always been easy. Sometimes the information seems abstract or too conceptual. In this case I’ve found out that a visual explanation is best. I really like to use diagrams or videos to explain hard-to-grasp ideas.

Having video, pictures, and text really help to reinforce the point and enable learning.

And now I want to help you get the same results.

The first step is simple, click here http://www.essentialsql.com/get-started-with-sql-server/

Comments and Discussions

 
QuestionNormalization, who needs it? Pin
KP Lee25-Oct-14 16:43
KP Lee25-Oct-14 16:43 
QuestionExcellent article taking us back to first principles Pin
Nji, Klaus24-Oct-14 14:54
Nji, Klaus24-Oct-14 14:54 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.