(Other Format - BK&CD-ROM)
Build the skills needed on the job—and on the Microsoft Certified Professional (MCP) exam—with this official Microsoft study guide for MCP Exam 70-229: Designing and Implementing Databases with Microsoft SQL Server 2000 Enterprise Edition.
Walks through the steps involved in designing and implementing an SQL server database, creating and maintaining database objects, and implementing data integrity. The study guide also introduces how to use transact-SQL to query an SQL server database and manipulate data stored in that database, and describes techniques for optimizing a database and managing server security. Exercises demonstrate a particular concept or skill. The CD-ROM contains practice files and an evaluation edition of the SQL server 2000. Annotation c. Book News, Inc., Portland, OR (booknews.com)
More Reviews and RecommendationsFounded in 1975, Microsoft Corporation (Nasdaq 'MSFT') is the worldwide leader in software for personal and business computing. The company offers a wide range of products and services designed to empower people through great software—any time, any place, and on any device.
Reader Rating:
See Detailed Ratings
March 16, 2003: I think this book is pretty good exam prep but it's not a comprehensive textbook on SQL Server. So, if you need to learn the fundamentals of SQL Server, you probably should learn those first. Then use this book to brush up on the specific things that are likely to be covered in the exam. The exam covers a lot of ground - it's used in the MCSE, MCSD, and MCDBA programs and tests stuff that looks a lot like programming (TSQL, stored procedures, etc.) as well as a lot of admin stuff. So, it's a wide range of topics and not really specialized in any single area. This book will help you determine the specific knowledge areas where you need to learn more. The CD has SQL Server 2000 Developer ed. with a 120-day expiration. That's good cause you really need to try stuff out before you take the exam.
This official MCSE Training Kit teaches how to design and implement database solutions using Microsoft SQL Server 2000 Enterprise Edition, as well as providing in-depth preparation for MCP Exam 70-229-a core exam for MCDBA certification and an elective on the Microsoft Windows 2000 MCSE track. Topics map directly to the objectives measured by the MCP exam, including developing a logical data model, implementing the physical database, retrieving and modifying data, programming business logic, tuning and optimizing data access, and designing a database security plan. Students learn through an integrated system of skill-building tutorials, case study examples, and self-assessment exercises. An economical alternative to classroom instruction, this kit enables students to set their own pace and learn by doing!
Walks through the steps involved in designing and implementing an SQL server database, creating and maintaining database objects, and implementing data integrity. The study guide also introduces how to use transact-SQL to query an SQL server database and manipulate data stored in that database, and describes techniques for optimizing a database and managing server security. Exercises demonstrate a particular concept or skill. The CD-ROM contains practice files and an evaluation edition of the SQL server 2000. Annotation c. Book News, Inc., Portland, OR (booknews.com)
Loading...A SQL Server database consists of a collection of tables that contain data and other objectsincluding views, indexes, stored procedures, and triggersthat are defined so as to support activities performed with the data. The data stored in a database is usually related to a particular subject or process, such as a retailer's customer information or a manufacturer's sales information. This chapter introduces you to the process of creating a SQL Server database and describes the basic concepts of database design. This chapter also provides information about planning a database, identifying system requirements, and developing a logical data model.
Before You Begin
To complete the lessons in this chapter, you must have
After this lesson, you will be able to:
Estimated lesson time: 30 minutes
For example, suppose that you create a database named MyCoDB to manage the data in your company. In the MyCoDB database, you create a table named Employees to store information about each employee, and the table contains columns named EmpID, LastName, FirstName, Dept, and Title. To ensure that no two employees share the same EmpID and that the Dept column contains only valid numbers for the departments in your company, you must add constraints to the table. Because you want to quickly find the data for an employee based on the employee's ID or last name, you define indexes. For each employee, you will have to add a row of data to the Employees table, so you create a stored procedure named AddEmployee that is customized to accept the data values for a new employee and that performs the operation of adding the row to the Employees table. You might need a departmental summary of employees, in which case you define a view named DeptEmps that combines data from the Departments and Employees tables and produces the output. Figure 3.1 shows the parts of the MyCoDB database.
Click to view graphic
Figure 3.1 The MyCoDB database, the Employees table, and the DeptEmps view.
Reasonable normalization often improves performance. When useful indexes are available, the SQL Server 2000 query optimizer is efficient at selecting rapid, efficient joins between tables.
As normalization increases, so do the number and complexity of joins required to retrieve data. Too many complex relational joins between too many tables can hinder performance. Reasonable normalization should include few regularly executed queries that use joins involving more than four tables.
A database that is used primarily for decision support (as opposed to update-intensive transaction processing) might not have redundant updates and might be more understandable and efficient for queries if the design is not fully normalized. Nevertheless, data that is not normalized is a more common design problem in database applications than over-normalized data. Starting with a normalized design and then selectively denormalizing tables for specific reasons is a good strategy.
Sometimes the logical database design is already fixed, however, and total redesign is not feasible. But even then, it might be possible to normalize a large table selectively into several smaller tables. If the database is accessed through stored procedures, this schema change could take place without affecting applications. If not, it might be possible to create a view that hides the schema change from the applications.
Achieving a Well-Designed Database
In relational database design theory, normalization rules identify certain attributes that must be present or absent in a well-designed database. These rules can become quite complicated and go well beyond the scope of this book. There are a few rules that can help you achieve a sound database design, however. A table should have an identifier, it should store data for only a single type of entity, it should avoid nullable columns, and it should not have repeating values or columns.
A Table Should Have an Identifier
The fundamental rule of database design theory is that each table should have a unique row identifier, which is a column or a set of columns used to distinguish any single record from every other record in the table. Each table should have an ID column, and no two records can share the same ID value. The column (or columns) that serves as the unique row identifier for a table is the primary key of the table.
In Figure 3.2, the Employees table does not include a column that uniquely identifies each row within the table. Notice that the name David Mendlen appears twice. Because there is no unique identifier in this table, there is no way to easily distinguish one row from another. This situation could be worse if both employees worked in the same department and had the same job title.
Click to view graphic
Figure 3.2 A table that has no unique identifier.
Click to view graphic
Figure 3.3 A normalized table that has a unique identifier.
You can normalize the table by adding a column that uniquely identifies each row, as shown in Figure 3.3. Notice that each instance of David Mendlen has a unique EmpID value.
A Table Should Store Data for Only a Single Type of Entity
Attempting to store too much information in a table can prevent the efficient and reliable management of the data in the table. For example, in Figure 3.4, the Books table includes information about each book's publisher.
Click to view graphic
Figure 3.4 A Table that includes title and publisher information.
Although it is possible to have columns that contain information for both the book and the publisher in the same table, this design leads to several problems. The publisher information must be added and stored redundantly for each book published by a given publisher. This information uses extra storage space in the database. If the address for the publisher changes, the change must be made for each book. Furthermore, if the last book for a publisher is removed from the Books table, the information for that publisher is lost.
In a normalized database, the information about books and publishers would be stored in at least two tables: one for book titles and one for publishers (as shown in Figure 3.5).
Click to view graphic
Figure 3.5 A normalized database design that includes a table for book titles and a table for publisher information.
The information about the publisher now has to be entered only once and then linked to each book. If the publisher information changes, it must be changed in only one place, and the publisher information will be there even if the publisher has no books in the database.
A Table Should Avoid Nullable Columns
Tables can have columns defined to allow null values. A null value indicates that the record has no value for that attribute. Although it can be useful to allow null values in isolated cases, it is best to use them sparingly because they require special handling that increases the complexity of data operations. If you have a table that has several nullable columns and several of the rows have null values in the columns, you should consider placing these columns in another table linked to the primary table. Storing the data in two separate tables enables the primary table to be simple in design but capable of accommodating the occasional need for storing this information.
A Table Should Not Have Repeating Values or Columns
A table should not contain a list of values for a specific piece of information. For example, suppose that you want to track book titles and their authors. Although most books might have only one author, many of them might have two or more. If there is only one column in the Books table for the name of the author, this situation presents a problem. One solution is to store the name of both authors in the column, but showing a list of individual authors would then be difficult. Another solution is to change the structure of the table to add another column for the name of the second author, but this solution accommodates only two authors. Yet another column must be added if a book has three authors.
Figure 3.6 shows two methods of handling multiple authors per title.
Click to view graphic
Figure 3.6 Two methods for structuring the Books table.
If you find that you need to store a list of values in a single column or if you have multiple columns for a single piece of data (Author1, Author2, and so on), you should consider placing the duplicated data in another table with a link to the primary table. In the case of the Books table, you could create an additional primary table for authors and then create a third table that matches books to authors and accommodates repeating values, as shown in Figure 3.7. This design enables any number of authors for a book without modifying the definition of the table and allocates no unused storage space for books that have a single author.
Click to view graphic
Figure 3.7 Three tables that store information about books and their authors. (Image unavailable)
There are three types of relationships between tables: one-to-one, one-to-many, and many-to-many. The type of relationship depends on how the related columns are defined.
One-to-One Relationships
In a one-to-one relationship, a row in table A can have no more than one matching row in table B (and vice versa). A one-to-one relationship is created if both of the related columns are primary keys or have unique constraints. This type of relationship is not common, however, because information related in this way would usually be in one table.
One-to-Many Relationships
A one-to-many relationship is the most common type of relationship. In this type of relationship, a row in table A can have many matching rows in table B, but a row in table B can have only one matching row in table A. For example, the Publishers and Titles tables mentioned previously have a one-to-many relationship. Each publisher produces many titles, but each title comes from only one publisher. A one-to-many relationship is created if only one of the related columns is a primary key or has a unique constraint.
Many-to-Many Relationships
In a many-to-many relationship, a row in table A can have many matching rows in table B (and vice versa). You create such a relationship by defining a third table, called a junction table, whose primary key consists of the foreign keys from both table A and table B. In Figures 3-6 and 3-7, you saw how the author information could be separated into another table. The Books table and the Authors table have a many-to-many relationship. Each of these tables has a one-to-many relationship with the BookAuthor table, which serves as the junction table between the two primary tables.
Exercise 1: Exploring the Basic Concepts of Database Design
In this exercise, you will view the primary objects that are contained in a SQL Server database. You will then apply the principles of normalization to a database design and identify the relationships that exist between entities within a database. To perform this exercise, you should be logged into your Windows 2000 Server computer as Administrator. You will use SQL Query Analyzer and SQL Server Enterprise Manager for part of the exercise, and you will need paper and a pencil to complete the rest of the exercise.
To identify the main components of a SQL Server database
The Object Browser window displays a hierarchical tree of database objects contained in the instance of SQL Server that you are logged on to.
A list of object types appears. The list includes users tables, system tables, stored procedures, functions, and user-defined data types. Each category contains objects specific to the Northwind database.
A list of users tables in the Northwind database appears. Notice that each table object is preceded by the object owner (which, in this case, is dbo).
To view the contents of a table
The Open Table window appears and displays the contents of the Categories table.
What are the column names (attributes) in the Categories table, and how many rows of data are displayed?
To use the sp_help system stored procedure to view table information
Use Northwind
GO
sp_help
To normalize a database design
| FirstName | LastName | City |
| Elizabeth | Boyle | Cleveland |
| Rob | Caron | Chicago |
| FirstName | LastName | City |
| Neil | Smith | Denver |
| Denise | Smith | Boston |
Which rule is being violated in the Customers table?
How should you modify the data?
| CustID | FirstName | LastName | City | PurchaseType |
| 101 | Elizabeth | Boyle | Cleveland | Books, CDs |
| 102 | Rob | Caron | Chicago | Books, videos |
| 103 | Neil | Smith | Denver | CDs, videos, DVDs |
| 104 | Denise | Smith | Boston | Books |
The PurchaseType column contains a list of the types of products that the customer has purchased.
How should you modify the current design?
| CustID | FirstName | LastName | City | Purchase | Manufacturer | ManContact |
| 101 | Elizabeth | Boyle | Cleveland | Spring candles | Pavlova, Ltd. | Ian Devling |
| 102 | Rob | Caron | Chicago | Sandalwood incense | Mayumi's | Mayumi Ohno |
| 103 | Neil | Smith | Denver | Sage | Pavlova, Ltd | Ian Devling |
| 104 | Denise | Smith | Boston | Hanging crystal | Leka Trading | Chandra Leka |
How should you modify the current design?
To generate a database diagram in SQL Server
SQL Server Enterprise Manager appears.
A list of object types in the Pubs database should now be displayed.
The Create Database Diagram wizard appears.
The Select Tables To Be Added screen appears. The list of tables in the Pubs database appears in the left window.
The How Many Levels Of Related Tables option becomes active.
The Authors table, TitleAuthor table, and Titles table are added to the Tables To Add To Diagram list.
The Completing The Create Database Diagram Wizard screen appears.
The diagram is generated and appears in a new window. The diagram includes the Authors table, the TitleAuthor table, and the Titles table.
To view entity relationships in a database diagram
The tables should now be arranged so that you can see the relationships between them.
The connector indicates that a relationship exists between the tables. Notice that there is no connector between the Author table and the Titles table because no direct relationship exists between the tables.
At one end of the connecter is a key, which indicates one. The other side of the connector is an infinity sign, which indicates many. These symbols tell you that a one-to-many relationship exists between the Authors table and the TitleAuthor table and that a one-to-many relationship exists between the Titles table and the TitleAuthor table.
What is the implied relationship between the Titles table and the Authors table?
What type of table is the TitleAuthor table (in terms of the implied relationship between Titles and Authors)?
Designing a SQL Server Database
About This Chapter
A SQL Server database consists of a collection of tables that contain data and other objectsincluding views, indexes, stored procedures, and triggersthat are defined so as to support activities performed with the data. The data stored in a database is usually related to a particular subject or process, such as a retailer’s customer information or a manufacturer’s sales information. This chapter introduces you to the process of creating a SQL Server database and describes the basic concepts of database design. This chapter also provides information about planning a database, identifying system requirements, and developing a logical data model.
Before You Begin
To complete the lessons in this chapter, you must have
Lesson 1: Introduction to Database Design
Before you can develop a logical data modeland subsequently create a database and the objects it containsyou should understand the fundamental concepts of database design. In addition, you should be familiar with the basic components of a database and how those components work together to provide efficient data storage and to provide access to those who require specific types of data, in specific formats, from the database. This lesson introduces you to the basic components of a database and the terminology that describes those components. The lesson then discusses normalization and entity relationshipstwo concepts that are integral to understanding relational database design.
After this lesson, you will be able to:
Estimated lesson time: 30 minutes
Components of a SQL Server Database
A SQL Server database consists of a collection of tables that store specific sets of structured data. A table (entity) contains a collection of rows (tuples) and columns (attributes). Each column in the table is designed to store a certain type of information (for example, dates, names, dollar amounts, or numbers). Tables have several types of controls (constraints, rules, triggers, defaults, and customized user data types) that ensure the validity of the data. Tables can have indexes (similar to those in books) that enable rows to be found quickly. You can add declarative referential integrity constraints to the tables to ensure consistency between interrelated data in different tables. A database can also store procedures that use Transact-SQL programming code to perform operations with the data in the database, such as storing views that provide customized access to table data.
For example, suppose that you create a database named MyCoDB to manage the data in your company. In the MyCoDB database, you create a table named Employees to store information about each employee, and the table contains columns named EmpID, LastName, FirstName, Dept, and Title. To ensure that no two employees share the same EmpID and that the Dept column contains only valid numbers for the departments in your company, you must add constraints to the table. Because you want to quickly find the data for an employee based on the employee’s ID or last name, you define indexes. For each employee, you will have to add a row of data to the Employees table, so you create a stored procedure named AddEmployee that is customized to accept the data values for a new employee and that performs the operation of adding the row to the Employees table. You might need a departmental summary of employees, in which case you define a view named DeptEmps that combines data from the Departments and Employees tables and produces the output. Figure 3.1 shows the parts of the MyCoDB database.
Figure 3.1 The MyCoDB database, the Employees table, and the DeptEmps view. (Image unavailable)
Optimizing a database design includes the process of normalization. Normalizing a logical database design involves using formal methods to separate the data into multiple, related tables. Having a greater number of narrow tables (with fewer columns) is characteristic of a normalized database; having a few wide tables (with more columns) is characteristic of a denormalized database.
Reasonable normalization often improves performance. When useful indexes are available, the SQL Server 2000 query optimizer is efficient at selecting rapid, efficient joins between tables.
As normalization increases, so do the number and complexity of joins required to retrieve data. Too many complex relational joins between too many tables can hinder performance. Reasonable normalization should include few regularly executed queries that use joins involving more than four tables.
A database that is used primarily for decision support (as opposed to update-intensive transaction processing) might not have redundant updates and might be more understandable and efficient for queries if the design is not fully normalized. Nevertheless, data that is not normalized is a more common design problem in database applications than over-normalized data. Starting with a normalized design and then selectively denormalizing tables for specific reasons is a good strategy.
Sometimes the logical database design is already fixed, however, and total redesign is not feasible. But even then, it might be possible to normalize a large table selectively into several smaller tables. If the database is accessed through stored procedures, this schema change could take place without affecting applications. If not, it might be possible to create a view that hides the schema change from the applications.
Achieving a Well-Designed Database
In relational database design theory, normalization rules identify certain attributes that must be present or absent in a well-designed database. These rules can become quite complicated and go well beyond the scope of this book. There are a few rules that can help you achieve a sound database design, however. A table should have an identifier, it should store data for only a single type of entity, it should avoid nullable columns, and it should not have repeating values or columns.
A Table Should Have an Identifier
The fundamental rule of database design theory is that each table should have a unique row identifier, which is a column or a set of columns used to distinguish any single record from every other record in the table. Each table should have an ID column, and no two records can share the same ID value. The column (or columns) that serves as the unique row identifier for a table is the primary key of the table.
In Figure 3.2, the Employees table does not include a column that uniquely identifies each row within the table. Notice that the name David Mendlen appears twice. Because there is no unique identifier in this table, there is no way to easily distinguish one row from another. This situation could be worse if both employees worked in the same department and had the same job title.
Figure 3.2 A table that has no unique identifier. (Image unavailable)
Figure 3.3 A normalized table that has a unique identifier. (Image unavailable)
You can normalize the table by adding a column that uniquely identifies each row, as shown in Figure 3.3. Notice that each instance of David Mendlen has a unique EmpID value.
A Table Should Store Data for Only a Single Type of Entity
Attempting to store too much information in a table can prevent the efficient and reliable management of the data in the table. For example, in Figure 3.4, the Books table includes information about each book’s publisher.
Figure 3.4 A Table that includes title and publisher information. (Image unavailable)
Although it is possible to have columns that contain information for both the book and the publisher in the same table, this design leads to several problems. The publisher information must be added and stored redundantly for each book published by a given publisher. This information uses extra storage space in the database. If the address for the publisher changes, the change must be made for each book. Furthermore, if the last book for a publisher is removed from the Books table, the information for that publisher is lost.
In a normalized database, the information about books and publishers would be stored in at least two tables: one for book titles and one for publishers (as shown in Figure 3.5).
Figure 3.5 A normalized database design that includes a table for book titles and a table for publisher information. (Image unavailable)
The information about the publisher now has to be entered only once and then linked to each book. If the publisher information changes, it must be changed in only one place, and the publisher information will be there even if the publisher has no books in the database.
A Table Should Avoid Nullable Columns
Tables can have columns defined to allow null values. A null value indicates that the record has no value for that attribute. Although it can be useful to allow null values in isolated cases, it is best to use them sparingly because they require special handling that increases the complexity of data operations. If you have a table that has several nullable columns and several of the rows have null values in the columns, you should consider placing these columns in another table linked to the primary table. Storing the data in two separate tables enables the primary table to be simple in design but capable of accommodating the occasional need for storing this information.
A Table Should Not Have Repeating Values or Columns
A table should not contain a list of values for a specific piece of information. For example, suppose that you want to track book titles and their authors. Although most books might have only one author, many of them might have two or more. If there is only one column in the Books table for the name of the author, this situation presents a problem. One solution is to store the name of both authors in the column, but showing a list of individual authors would then be difficult. Another solution is to change the structure of the table to add another column for the name of the second author, but this solution accommodates only two authors. Yet another column must be added if a book has three authors.
Figure 3.6 shows two methods of handling multiple authors per title.
Figure 3.6 Two methods for structuring the Books table. (Image unavailable)
If you find that you need to store a list of values in a single column or if you have multiple columns for a single piece of data (Author1, Author2, and so on), you should consider placing the duplicated data in another table with a link to the primary table. In the case of the Books table, you could create an additional primary table for authors and then create a third table that matches books to authors and accommodates repeating values, as shown in Figure 3.7. This design enables any number of authors for a book without modifying the definition of the table and allocates no unused storage space for books that have a single author.
Figure 3.7 Three tables that store information about books and their authors. (Image unavailable)
In a relational database, relationships help to prevent redundant data. A relationship works by matching data in key columnsusually columns that have the same name in both tables. In most cases, the relationship matches the primary key from one table, which provides a unique identifier for each row with an entry in the foreign key in the other table. Primary keys and foreign keys are discussed in more detail in Chapter 5, "Implementing Data Integrity."
There are three types of relationships between tables: one-to-one, one-to-many, and many-to-many. The type of relationship depends on how the related columns are defined.
One-to-One Relationships
In a one-to-one relationship, a row in table A can have no more than one matching row in table B (and vice versa). A one-to-one relationship is created if both of the related columns are primary keys or have unique constraints. This type of relationship is not common, however, because information related in this way would usually be in one table.
One-to-Many Relationships
A one-to-many relationship is the most common type of relationship. In this type of relationship, a row in table A can have many matching rows in table B, but a row in table B can have only one matching row in table A. For example, the Publishers and Titles tables mentioned previously have a one-to-many relationship. Each publisher produces many titles, but each title comes from only one publisher. A one-to-many relationship is created if only one of the related columns is a primary key or has a unique constraint.
Many-to-Many Relationships
In a many-to-many relationship, a row in table A can have many matching rows in table B (and vice versa). You create such a relationship by defining a third table, called a junction table, whose primary key consists of the foreign keys from both table A and table B. In Figures 3-6 and 3-7, you saw how the author information could be separated into another table. The Books table and the Authors table have a many-to-many relationship. Each of these tables has a one-to-many relationship with the BookAuthor table, which serves as the junction table between the two primary tables.
Exercise 1: Exploring the Basic Concepts of Database Design
In this exercise, you will view the primary objects that are contained in a SQL Server database. You will then apply the principles of normalization to a database design and identify the relationships that exist between entities within a database. To perform this exercise, you should be logged into your Windows 2000 Server computer as Administrator. You will use SQL Query Analyzer and SQL Server Enterprise Manager for part of the exercise, and you will need paper and a pencil to complete the rest of the exercise.
To identify the main components of a SQL Server database
The Object Browser window displays a hierarchical tree of database objects contained in the instance of SQL Server that you are logged on to.
A list of object types appears. The list includes users tables, system tables, stored procedures, functions, and user-defined data types. Each category contains objects specific to the Northwind database.
A list of users tables in the Northwind database appears. Notice that each table object is preceded by the object owner (which, in this case, is dbo).
To view the contents of a table
The Open Table window appears and displays the contents of the Categories table.
What are the column names (attributes) in the Categories table, and how many rows of data are displayed?
To use the sp_help system stored procedure to view table information
Use Northwind
GO
sp_help
To normalize a database design
| FirstName | LastName | City |
| Elizabeth | Boyle | Cleveland |
| Rob | Caron | Chicago |
| FirstName | LastName | City |
| Neil | Smith | Denver |
| Denise | Smith | Boston |
Which rule is being violated in the Customers table?
How should you modify the data?
| CustID | FirstName | LastName | City | PurchaseType |
| 101 | Elizabeth | Boyle | Cleveland | Books, CDs |
| 102 | Rob | Caron | Chicago | Books, videos |
| 103 | Neil | Smith | Denver | CDs, videos, DVDs |
| 104 | Denise | Smith | Boston | Books |
The PurchaseType column contains a list of the types of products that the customer has purchased.
How should you modify the current design?
| CustID | FirstName | LastName | City | Purchase | Manufacturer | ManContact |
| 101 | Elizabeth | Boyle | Cleveland | Spring candles | Pavlova, Ltd. | Ian Devling |
| 102 | Rob | Caron | Chicago | Sandalwood incense | Mayumi’s | Mayumi Ohno |
| 103 | Neil | Smith | Denver | Sage | Pavlova, Ltd | Ian Devling |
| 104 | Denise | Smith | Boston | Hanging crystal | Leka Trading | Chandra Leka |
How should you modify the current design?
To generate a database diagram in SQL Server
SQL Server Enterprise Manager appears.
A list of object types in the Pubs database should now be displayed.
The Create Database Diagram wizard appears.
The Select Tables To Be Added screen appears. The list of tables in the Pubs database appears in the left window.
The How Many Levels Of Related Tables option becomes active.
The Authors table, TitleAuthor table, and Titles table are added to the Tables To Add To Diagram list.
The Completing The Create Database Diagram Wizard screen appears.
The diagram is generated and appears in a new window. The diagram includes the Authors table, the TitleAuthor table, and the Titles table.
To view entity relationships in a database diagram
The tables should now be arranged so that you can see the relationships between them.
The connector indicates that a relationship exists between the tables. Notice that there is no connector between the Author table and the Titles table because no direct relationship exists between the tables.
At one end of the connecter is a key, which indicates one. The other side of the connector is an infinity sign, which indicates many. These symbols tell you that a one-to-many relationship exists between the Authors table and the TitleAuthor table and that a one-to-many relationship exists between the Titles table and the TitleAuthor table.
What is the implied relationship between the Titles table and the Authors table?
What type of table is the TitleAuthor table (in terms of the implied relationship between Titles and Authors)?
A SQL Server database consists of a collection of tables that store a specific set of structured data. A table contains a collection of rows and columns. Each column in the table is designed to store a certain type of information (for example, dates, names, dollar amounts, or numbers). The logical design of the database, including the tables and the relationships between them, is the core of an optimized, relational database. Optimizing a database design includes the process of normalization. Normalizing a logical database design involves using formal methods to separate the data into multiple, related tables. As normalization increases, so do the number and complexity of joins that are required to retrieve data. Normalization rules identify certain attributes that must be present or absent in a well-designed database. Tables in a normalized database should have an identifier, should only store data for a single type of entity, should avoid nullable columns, and should not have repeating values or columns. You can create relationships between your tables in a database diagram to show how the columns in one table are linked to columns in another table. In a relational database, relationships help to prevent redundant data. A relationship works by matching data in key columnsusually columns that have the same name in both tables. There are three types of relationships between tables: one-to-one, one-to-many, and many-to-many. The type of relationship depends on how you define the related columns.
Lesson 2: Planning a SQL Server Database
When planning a SQL Server database, the design must take into consideration a number of factors, including database files and filegroups, transaction logs, the SQL Server installation and its operating environment, and security. This lesson discusses each of these considerations.
After this lesson, you will be able to:
Estimated lesson time: 25 minutes
To map a database, SQL Server 2000 uses a set of operating system files. All data and objects in the database, such as tables, stored procedures, triggers, and views, are stored within the following types of operating system files:
A simple database can be created with one primary file that contains all data and objects and a log file that contains the transaction log information. Alternatively, a more complex database can be created with one primary file and five secondary files. The data and objects within the database spread across all six files, and four additional log files contain the transaction log information.
Filegroups group files together for administrative and data allocation/placement purposes. For example, three files (Data1.ndf, Data2.ndf, and Data3.ndf) can be created on three disk drives and assigned to the filegroup fgroup1. A table can then be created specifically on the filegroup fgroup1. Queries for data from the table will be spread across the three disks, thereby improving performance. The same performance improvement can be accomplished with a single file created on a redundant array of independent disks (RAID) stripe set. Files and filegroups, however, help to easily add new files to new disks. Additionally, if your database exceeds the maximum size for a single Windows NT file, you can use secondary data files to grow your database further.
Rules for Designing Files and Filegroups
When designing files and filegroups, you should adhere to the following rules:
Default Filegroups
A database comprises a primary filegroup and any user-defined filegroups. The filegroup that contains the primary file is the primary filegroup. When a database is created, the primary filegroup contains the primary data file and any other files that are not put into another filegroup. All system tables are allocated in the primary filegroup. If the primary filegroup runs out of space, no new catalog information can be added to the system tables. The primary filegroup is filled only if autogrow is turned off or if all of the disks that are holding the files in the primary filegroup run out of space. If this situation happens, either turn autogrow back on or move other files off the disks to free more space.
User-defined filegroups are any filegroups that are specifically created by the user when he or she is first creating or later altering the database. If a user-defined filegroup fills up, only the users tables specifically allocated to that filegroup would be affected.
At any time, exactly one filegroup is designated as the default filegroup. When objects are created in the database without specifying to which filegroup they belong, they are assigned to the default filegroup. The default filegroup must be large enough to hold any objects not allocated to a user-defined filegroup. Initially, the primary filegroup is the default filegroup.
The default filegroup can be changed by using the ALTER DATABASE statement. When you change the default filegroup, any objects that do not have a filegroup specified when they are created are allocated to the data files in the new default filegroup. Allocation for the system objects and tables, however, remains within the primary filegroup, not in the new default filegroup.
Changing the default filegroup prevents user objects that are not specifically created on a user-defined filegroup from competing with the system objects and tables for data space.
Recommendations
When implementing a database, you should try to adhere to the following guidelines for using files and filegroups:
A database in SQL Server 2000 has at least one data file and one transaction log file. Data and transaction log information is never mixed on the same file, and individual files are used by only one database.
SQL Server uses the transaction log of each database to recover transactions. The transaction log is a serial record of all modifications that have occurred in the database as well as the transactions that performed the modifications. The transaction log records the start of each transaction and records the changes to the data. This log has enough information to undo the modifications (if necessary later) made during each transaction. For some large operations, such as CREATE INDEX, the transaction log instead records the fact that the operation took place. The log grows continuously as logged operations occur in the database.
The transaction log records the allocation and deallocation of pages and the commit or rollback of each transaction. This feature enables SQL Server to either apply (roll forward) or back out (roll back) each transaction in the following ways:
At a checkpoint, SQL Server ensures that all transaction log records and database pages that were modified are written to disk. During each database’s recovery process, which occurs when SQL Server is restarted, a transaction needs to be rolled forward only if it is not known whether all the transaction’s data modifications were actually written from the SQL Server buffer cache to disk. Because a checkpoint forces all modified pages to disk, it represents the point at which the startup recovery must start rolling forward transactions. Because all pages modified before the checkpoint are guaranteed to be on disk, there is no need to roll forward anything done before the checkpoint.
Transaction log backups enable you to recover the database to a specific point in time (for example, prior to entering unwanted data) or to the point of failure. Transaction log backups should be a consideration in your media-recovery strategy.
Generally, the larger the database, the greater the hardware requirements. Database design should always take into consideration processor speeds, memory, and hard disk space and configuration. There are other determining factors, however: the number of concurrent users/sessions, transaction throughput, and the types of operations within the database. For example, a database containing infrequently updated school library data would generally have lower hardware requirements than a one-terabyte (TB) data warehouse containing frequently analyzed sales, product, and customer information for a large corporation. Aside from the disk storage requirements, more memory and faster processors would be needed for the data warehouse to cache more of the data in memory and to quickly process queries referencing large amounts of data.
Estimating the Size of a Database
When designing a database, you might need to estimate how big the database will be when it is filled with data. Estimating the size of the database can help you determine the hardware configuration that you will need to meet the following requirements:
Estimating the size of a database can also lead you to determine whether the database design needs refining. For example, you might determine that the estimated size of the database is too large to implement in your organization and that more normalization is required. Conversely, the estimated size might be smaller than expected, requiring you to denormalize the database to improve query performance.
To estimate the size of a database, estimate the size of each table individually and then add the values that you obtain. The size of a table depends on whether the table has indexes and, if so, what type of indexes. For more information about estimating the sizes of various types of tables, refer to SQL Server Books Online.
Physical Database Design
The I/O subsystem (storage engine) is a key component of any relational database. A successful database implementation usually requires careful planning in the early stages of your project. The storage engine of a relational database requires much of this planning, which includes the following:
Although the installation of SQL Server is beyond the scope of this training kit, you should always take into consideration the following issues before performing an installation:
Before running SQL Server 2000 Setup, create one or more domain user accounts if you are installing SQL Server 2000 on a computer running Windows NT or Windows 2000 and want SQL Server 2000 to communicate with other clients and servers.
You should log on to the operating system under a user account that has local administrative permissions; otherwise, you should assign the appropriate permissions to the domain user account. Be sure to shut down all services dependent on SQL Server (including any service that uses ODBC, such as Internet Information Services, or IIS). In addition, shut down Windows NT Event Viewer and registry viewers (Regedit.exe or Regedt32.exe).
A database must have a solid security system to control the activities that can be performed and to determine which information can be viewed and modified. A solid security system ensures the protection of data, regardless of how users gain access to the database.
Planning Security
A security plan identifies which users can see which data and perform which activities in the database. You should take the following steps to develop a security plan:
Security Levels
A user passes through two stages of security when working in SQL Server: authentication and authorization (permissions validation). The authentication stage identifies the user who is using a login account and verifies only the capability to connect to an instance of SQL Server. If authentication is successful, the user connects to an instance of SQL Server. The user then needs permissions to access databases on the server, which is done by granting access to an account in each database (mapped to the user login). The permissions validation stage controls the activities that the user is allowed to perform in the SQL Server database.
Authentication Modes
SQL Server can operate in one of two security (authentication) modes: Windows Authentication and Mixed mode. Windows Authentication mode enables a user to connect through a Windows NT 4.0 or Windows 2000 user account. Mixed mode (Windows Authentication and SQL Server Authentication) enables users to connect to an instance of SQL Server by using either Windows Authentication or SQL Server Authentication. Users who connect through a Windows NT 4.0 or Windows 2000 user account can make use of trusted connections in either Windows Authentication mode or Mixed mode.
When planning a SQL Server database, you must take into consideration database files and filegroups, transaction logs, the SQL Server installation and its operating environment, and security. All data and objects in the database are stored within primary, secondary, and transaction log files. Filegroups group files together for administrative and data allocation/placement purposes. The filegroup that contains the primary file is the primary filegroup. A database in SQL Server 2000 has at least one data file and one transaction log file. Data and transaction log information is never mixed on the same file, and individual files are used by only one database. SQL Server uses the transaction log of each database to recover transactions. Database design should always take into consideration processor speeds, memory, and hard disk space and configuration. There are other determining factors, however: the number of concurrent users/sessions, transaction throughput, and the types of operations within the database. When designing a database, you might need to estimate how big the database will be when it is filled with data. When installing SQL Server, you should take into consideration several issues. A database must have a solid security system to control the activities that can be performed and to determine which information can be viewed and modified. A security plan identifies which users can see which data and perform which activities in the database.
Lesson 3: Identifying System Requirements
Before creating a database, you must have a thorough understanding of the job that the database is expected to do. You can gain this understanding only by identifying specific types of information that are essential to developing an efficient database design. This lesson discusses the types of information that you must gather before you can begin creating a logical data model. To develop this model, you must identify the goals of your database project. You must also understand the type and amount of data with which you will be working, how you will use the data, and any business rules that apply to the new system that you are implementing.
After this lesson, you will be able to:
Estimated lesson time: 35 minutes
The Process of Identifying System Requirements
The process of identifying system requirements includes a variety of steps. The number of steps included in this process, how these steps are defined, and the detail in which they are defined can vary greatly from resource to resource (with no one method necessarily being the most accurate). For the purpose of this training kit, however, this process has been divided into four primary tasks:
You do not necessarily have to perform these tasks one at a time. For example, while identifying the amount and types of data, you might also find it useful to determine how the data will be used and what constraints should be placed on the data. Figure 3.8 illustrates the process of identifying system requirements.
Figure 3.8 Identifying system requirements. (Image unavailable)
Designing a database requires an understanding of the business functions that you want to model. As much as possible, your database design should accurately model the business. It is time-consuming to change the design of a database significantly once it has been implemented. A well-designed database also performs better. When designing a database, you must consider the purpose of the database and how it affects the design. In other words, you must determine the goals of the new system. Why are you creating this database?
The system goals are the reasons why you are implementing the new database. To create an effective database design, you must have thorough knowledge of the job that the database is expected to perform. Without this understanding, you cannot make informed decisions about how the database should be designed. The system goals are the reasons why the database is being developed.
Determining the system goals is not always a straightforward process. Most database development projects have many goals (both tangible and intangible), and trying to discover them can often take a fair amount of detective work. For example, a manufacturing company might decide to automate its process for managing inventory. One of the company’s stated goals for the project is "to make it easer to manage inventory." Your job is to take this intangible goal and try to determine the underlying tangible goal(s). Does the company want to speed up the process of managing inventory? Does it want to more accurately track inventory? Does it want to reduce costs? The intangible goal of "making it easier" might include all of these more tangible goals and more.
Although these goals are more tangible, they are still vague. Vague goals tend to be stated in general terms, such as "increase productivity" or "improve performance." As you go through the process of identifying goals, you must determine the degree to which these goals should be achieved. If the goal is to increase productivity, you should try to find out from what to what. Whenever possible, your goals should be directly measurable.
You should be aware, however, of the dangers of going overboard when measuring certain goals. Often, in order to determine a measurable goal, you must be able to establish an initial measurement. For example, if a goal of an inventory database is to improve accuracy, it might take a great deal of resources to study how much inaccuracy exists in the current process. A study of this sort could span years of inventory history and perhaps cost more to conduct than it would to design and implement the database. In cases such as these, it might be better to talk to managers and bookkeepers first to get a general sense of where the problems lie and what can be done to solve them. When determining the extent to which a base measurement should be studied, you should keep in mind the scale of the project and its practical application while always maintaining a sense of proportion.
Sometimes, intangible goals can be difficult to translate into more tangible goals. This situation is particularly true for goals that adopt popular marketing jargon, such as product position, getting on the same page, or thinking out of the box. For example, the stated goal might be something that seems to have no meaning or relevance: "We want the new system to show our clients that we’re thinking out of the box and getting on the same page as themin order to improve product positioning." In these cases, you must work closely with the organization to clearly define what its stated goal means.
After you have defined the initial goals for the new database, you can begin looking at the type and amount of data that the system will support. As you move forward with the database design process, however, be prepared to re-evaluate these goals. As projects move forward, management changes, business requirements change, and company expectations are revised. As a result, goals evolvewhich means that the database design might need to be modified. You might also discover, as you dig deeper into the project, that some goals are unattainable or inappropriate. As new understandings and additional information continue to unfold, you must be prepared to act accordingly.
Identifying the Amount and Types of Data
The amount and types of data that your database will store can affect database performance and should be taken into consideration when creating your database. The amount of data will, of course, affect the size of your database, and the types of data are a factor in determining the kinds of constraints that are incorporated into the database design.
In many cases, determining the amount and types of data is a straightforward process because a system is already implemented and you are simply upgrading or replacing that system. In these situations, you can examine the body of data that already exists.
In those cases in which you are implementing a new systemor radically altering the existing oneyour job might be a little more difficult because you might have to spend a fair amount of time determining what types of data will be stored and how much data there will be. You might need to interview key participants and collect copies of relevant documents and forms, such as customer statements, inventory lists, management reports, and any other documents that are currently being used.
Whatever the current system, you must determine the volume of data that the system will manage. When examining data volume, you should identify the actual amount of data and its growth pattern. For example, a warehouse might currently carry only a few thousand items, but it might be planning to add several hundred a day over the next few years to substantially increase how many items are kept on hand. Another warehouse might carry millions of items but it might plan to add only a few new items a day, never letting the inventory go much beyond its current capacity. The growth patterns for these two systems are very different, and as a result, the design approach will vary.
When looking at the types of data, you are basically trying to get a general sense of the categories of information that you will be storing and what details about the categories are necessary to store. This process will prepare you for mapping out the entities and attributes that you will incorporate into your database design. For example, if you are developing a database for a clothing retailer, the types of data might include information about customers who purchase products from the store. Customer information could include names, addresses, phone numbers, and even style preferences.
At this point in the design process, you do not have to get too specific about how data should be categorized or grouped. You are merely attempting to gain a general sense of the types of data involved and creating a centralized list for those types of data. When you actually begin to identify database objects, you will take the information you gather here and use it to develop a data model.
Identifying How the Data Will Be Used
As you gather system requirements, you must determine how information in your database will be used. The purpose of this step is to identify who will be using the data, the number of users who will be accessing the data, and the tasks they will be performing when they access that data.
When determining who will be using the data, you should think in terms of categories of users. For example, one category of users might be the general public (who accesses data through the Internet). You might also have another category of users who access data through the company’s intranet. Some organizations might have only one type of user, while other organizations might have many types. In addition, there is no set minimum or maximum number of users that each category must contain. The only limitations are those dictated by hardware configurations and database design. One category might contain only one user, while another category might contain 100,000 users.
As you determine who will be using the data, you must also identify how many users in each category will be accessing the data. This estimate should include not only current numbers but projected figures, as well. In addition, you will have to define user concurrency. You should know how many people will be connected to the database at one time and how many might be trying to update the database at one time.
Once you have defined who your users are and how many there are, you must identify the tasks they will be performing when they access the database. For example, suppose a manufacturing company includes a category of users who take orders from customers. This order-taking group must be able to access and update customer information. In addition, it must be able to view inventory data in order to place these orders. The company might also include a category of users from human resources. The human resources group must be able to view and update employee information. By identifying these tasks, you can determine how the database will be accessed and how data will be viewed and manipulated. When you combine this information with the number and type of users, you will be able to implement a database design that serves the needs of everyone in the organization.
Identifying Business Rules of the System
By identifying the business rules, you are determining the constraints that govern how data and the system should be handled and protected. These constraints refer to more than the individual integrity applied to entity attributes. Business rules are much broader and incorporate all of the constraints on the system, including data integrity as well as system security. In other words, you are defining what each category of users can or cannot do.
Returning to the example of the manufacturing company, the order-taking group can access and update customer records and view inventory data. You might determine that these users should not be able to update inventory data and should not be able to view employee data, however. You might also determine that no customer records can be created without a full mailing address and phone number. Another constraint might be that any item added to a customer order should be removed from inventory. Business rules can include a wide spectrum of constraints, some pertaining to the system as a whole and others pertaining to specific types of data.
Exercise 2: Identifying the System Requirements for Your Database Design
In this exercise, you will review the following scenario. From the information in the scenario, you will identify the system requirements for a database design. You will be using this scenario and the result of this exercise in subsequent exercises. The end product will be a database that you have designed and implemented on your SQL Server computer. To complete this exercise, you need paper and a pencil. Because you need to save the result of this exercise, however, you might want to copy it into a word processing file or text file.
Book Shop Scenario
The manager of a small book shop has asked you to design and implement a database that centralizes information so that it is easier and more efficient to manage inventory and track orders and sales. The shop handles rare and out-of-print books and tends to carry only a few thousand titles at any one time. Currently, the manager tracks all of the sales and inventory on paper. For each book, the manager records the title, author, publisher, publication date, edition, cost, suggested retail price, and a rating that indicates the condition of the book. Each book is assigned one of the following ratings: superb, excellent, good, fair, poor, or damaged. The manager would like to be able to add a description to each rating (just a couple of sentences), but the description should not be required. The information about each book must include the title, author, cost, suggested retail price, and rating. The publisher, publication date, and edition are not always available. If the year a book was published is available, the year will never be before 1600. And for purposes of the new database system, the publication date will never fall after the year 2099.
Because these books are rare, each title must be tracked individuallyeven if they are the same book (identical title, author, publisher, publication date, and edition). Currently, the manager assigns a unique ID to each book so that identical titles can be differentiated. This ID must be included with the book information. The book ID assigned by the manager is an eight-character ID made up of numbers and letters.
The manager also maintains limited information about each author whose books the store has carried or is carrying. The store might carry more than one book by an author, and sometimes more than one author will have written a book. The manager currently maintains information about approximately 2500 authors. The information includes the author’s first name, last name, year of birth, and year of death (if applicable). The information must includeat the very leastthe author’s last name. The manager would like to include a brief description of each author, if available, when the author is added to the list. The description will usually be no longer than one or two sen tences.
The bookstore has 12 employees (including the manager and assistant manager). The manager expects to hire an additional employee every year for the next few years. Both the manager and the assistant manager must be able to access and modify information about each employee as necessary. Employee information must include each employee’s first name, last name, address, phone number, date of birth, hire date, and position in the store. Positions include Manager, Assistant Manager, Full Time Sales Clerk, and Part Time Sales Clerk. The manager might at some point want to add new job titles to the list or change existing ones and would eventually like to add a brief description of job duties to each title (at least, to some of the titles). An employee can hold only one position at any one time. No employeeother than the two managersshould have access to the employee information. The manager also likes to track how many books and which books each employee is selling.
The bookstore currently maintains information about customers. For each customer, the information includes the customer’s first name, last name, telephone number, mailing address, books that the customer has purchased, and when the purchase was made. Because some customers do not like to give out personal information, only a first name or a last name is required. The manager currently has a list of about 2000 customers. Not all customers who are included in the list have bought books, although most have.
The manager maintains a record of sales by tracking each order from when a sales clerk takes the order to when the sale is complete. In some cases, such as for walk-in customers, these two events occur concurrently. Each order must include information about the book sold, the customer who bought the book, the salesperson who sold the book, the amount of the sale, and the date of the order. The order must also include the delivery or pickup date, which is added after the merchandise is actually picked up or delivered. An order is completed when a book has been paid for and picked up at the store or paid for and shipped to the customer. A book cannot be taken out of the store or shipped unless it is paid for. Each order includes the payment method and the status of the order. Payment methods include cash, check, and credit cards. The status of an order must be one of the following: (1) to be shipped, (2) customer will pick up, (3) shipped, or (4) picked up. An order can contain only one customer, salesperson, order date, delivery date, payment method, and order status; however, an order can contain one or more books.
Currently, orders are generated, tracked, and modified on paper order forms. The forms are used to make sure that the orders get shipped (if applicable) and to maintain a record of sales. Whenever a book is added to an order, it is removed from the inventory list. This process has been very tedious and not always very efficient. This situation can also lead to confusion and mistakes. Ideally, the manager would like sold books to remain in the list of books but be marked somehow to show that the book has been sold.
The store sells about 20 books a day. The store is open five days a week for about 10 hours a day. There are one to two salespeople working at the same time, and there are two sales counters where people pick up and pay for books and where salespeople process orders. At least one manager is in the store at one time.
The manager expects sales to increase by about 10 percent each year. As a result, the number of books on hand, authors, and customers should all increase at about the same rate.
In order to serve customers effectively, each employee must be able to access a centralized source of information about authors, books in stock, customers, and orders. Currently, employees access this information from index cards and lists. Often, these lists are not up-to-date, and errors are made. In addition, each employee should be able to create, track, and modify orders online, rather than having to maintain paper order forms. Only the managers should be able to modify information about authors, books, and customers, however.
To identify system goals
What are those goals?
Which goals are measurable?
To identify the amount and type of data
What categories of data can you identify?
What types of information can you identify?
What is the volume of data for each category?
What is the growth pattern for each category?
To identify how the data will be used
What are those categories of users?
What are the current number of users and the projected number of users in each category?
What tasks will each type of user be performing?
To identify business rules
What are the business rules?
Before you can develop a data model, you must identify the goals of your database project, the type and amount of data that you will be working with, how the data will be used, and any business constraints that should exist on the data. You must consider the purpose of the database and how it affects the design. You should have a clear understanding of why you are creating this database. Another area of concern when identifying system requirements is the amount and types of data that your database will store. Whatever the current system, you must determine the volume of data that the system will manage. When examining data volume, you should determine the actual amount of data and its growth pattern. When looking at the types of data, you are basically trying to get a general sense of the categories of information you will be storing and what details about the categories are necessary to store. As you gather system requirements, you must identify who will be using the data, the number of users who will be accessing the data, and the tasks they will be performing when they access that data. By identifying the constraints on the data, you are determining the business rules that govern how data should be handled and protected. Business rules include data integrity as well as system security. They enable you to define what each category of users can and cannot do.
Lesson 4: Developing a Logical Data Model
Once you have identified the system requirements, you are ready to develop a logical data model. The data model is essentially an extension of the system requirements. When creating the data model, you are organizing the requirements into a logical representation of the database. The data model includes definitions of entities, their attributes, and entity constraints. The model also includes definitions of the relationships between entities and the constraints on those relationships. This lesson describes how to develop a data model by identifying the entities, their attributes and constraints, and their relationships.
After this lesson, you will be able to:
Estimated lesson time: 35 minutes
Identifying Entities and Their Attributes
When you gather system requirements for a database design, one of the steps that you take is to define the types of data that the database will contain. These types of data can be separated into categories that represent a logical division of information. In most instances, each category translates to a table object within the database. Normally, there is a set of primary objects, and after they are identified, the related objects become more apparent.
For example, in the Pubs database, one of the primary objects is the Titles table. One of the objects related to the Titles table is the RoySched table, which provides information about the royalty schedules associated with each book. Another object is the TitleAuthor table, which matches authors to books.
By using the categories of data defined in the system requirements, you can start to create a map of the table objects within your new database. For example, suppose you are designing a database for a hotel’s reservation system. During the process of gathering system requirements, you identify several categories of data, including rooms, guests, and reservations. As a result, you add tables to your database design that match each of these categories, as shown in Figure 3.9.
Figure 3.9 The primary objects in a database design: the Rooms table, the Reservations table, and the Guests table. (Image unavailable)
When identifying the business rules for this system, you determined that the hotel has eight types of rooms and that regular guests prefer a certain type of room. As a result, the Rooms table and the Guests table will each include a room type attribute. You decide to create a table for room types, as shown in Figure 3.10.
Figure 3.10 The hotel's reservation database, which includes the Room Type table. (Image unavailable)
Now, the Rooms table and the Guests table can reference the RoomType table without having to repeat a room description for each room and each guest. In addition, as room types change, you can update the information in one location, rather than having to update multiple tables and records.
Before you can complete the process of defining table objects within the database, you must define the relationships between the tables. Whenever you identify a many-to-many relationship, you will have to add a junction table. Relationships are discussed in more detail later in this lesson.
After you have defined all of the tables that you can define at this point, you can define the columns (attributes) for those tables. Again, you will be taking this information directly from the system requirements in which you identified which types of data should be included with each category of information.
Using the earlier hotel database example, suppose that you determined during the process of gathering system requirements that the Guests category of data should include information about the guests’ first names, last names, addresses, telephone numbers, and room preferences. As a result, you plan to add columns to the Guests table for each of these types of information. You also plan to add a unique identifier for each guest, as is the case with any normalized entity. Figure 3.11 shows the Guests table with all of the columns that the table will contain.
Figure 3.11 The Guests table and its attributes. (Image unavailable)
Identifying Relationships Between Entities
After you have defined the tables and their columns, you should define the relationships between the tables. Through this process, you might discover that you need to modify the design that you have created to this point.
Start by choosing one of the primary tables and selecting the entities that have relationships to that table. Referring once more to the hotel database used in earlier examples, assume that the system requirements state that all reservations must include room and guest information. Rooms, guests, and reservations are the categories of data. As a result, you can deduce that a relationship exists between rooms and reservations and between guests and reservations. Figure 3.12 shows the relationships between these objects. A line connecting the two tables signifies a relationship. Notice that a relationship also exists between the Rooms table and the RoomType table and between the Guests table and the RoomType table.
Figure 3.12 The relationships that exist between tables in the hotel's reservation database. (Image unavailable)
Once you establish that a relationship exists between tables, you must define the type of relationship. In Figure 3.12, each relationship (line) is marked at each end (where it connects to the table) with the number 1 or with an infinity symbol. The 1 refers to the one side of a relationship, and the infinity symbol refers to the many side of a relationship.
To determine the types of relationships that exist between tables, you should look at the types of data that each table contains and the types of interchange between them. For example, a relationship exists between the Guests table and the Reservations table. The relationship exists because guests must be included in reservation information. According to the business rules, a guest can make one or more reservations, but each reservation record can include the name of only one guest, usually the person who is making the reservation. As a result, a one-to-many relationship exists between the two tables: one guest to many reservations.
A relationship also exists between the Reservations table and the Rooms table. According to the business rules, a reservation can be made for one or more rooms, and a room can be included in one or more reservations (on different dates). In this case, a many-to-many relationship exists: many reservations to many rooms. In a normalized database design, however, many-to-many relationships must be modified by adding a junction table and creating one-to-many relationships between each original table and the junction table, as shown in Figure 3.13.
Figure 3.13 The RoomReserv table as a junction table between the Rooms table and the Reservations table. (Image unavailable)
Identifying Constraints on Data
At this point in the database design process, you should have the entities, their attributes, and the relationships between entities mapped. Now, you must identify the constraints on the data that will be stored in your tables. Most of your work was already completed when you identified the business rules as you gathered system requirements. As stated previously, business rules include all constraints on a system, including data integrity and security. For this stage of the design process, your focus will be on the constraints specific to the data. You will take the data-related business rules and refine and organize them. You should try to organize the constraints based on the objects that you created in the database, and you should word them in a way that reflects those objects.
Returning again to the database design in Figure 3.13, suppose that one of the business rules is stated as follows: "A guest record can, but is not required to, include one of the predefined room type preferences but cannot include any other room type preference." When defining the data constraints, you should reference the relevant tables and columns and separate them so that they each focus on a single instruction:
When possible, you should organize data constraints according to tables and their columns. In some cases, a constraint applies to the table as a whole, to more than one table, to a relationship between tables, or to data security. In these cases, try to organize the constraints in a way that is the most logical and the most relevant to the project you are working on. The goal of identifying the data constraints is to have a clear road map when creating database objects and their relationships and enforcing data integrity.
Exercise 3: Developing a Logical Data Model
In this exercise, you will take the steps necessary to create a logical data model. Much of this exercise involves drawing the tables, entities, and relationships that make up the database. Although you can use a drawing program such as Visio to create these objects, paper and a pencil are all that you really need. If you like, you can later transfer your model to a drawing program. In addition, you will need paper and a pencil to write the data constraints. You can also write these directly to a word processing document or a text document. Whatever method you choose, you should save the result for subsequent exercises. To perform this exercise, you will use the book shop scenario from Exercise 2 in Lesson 3.
To identify which tables to add to a database
Each category represents one of the primary table objects in your database design.
Your drawing should include five tables.
Your next step will be to identify any related tables. At this point, designing a database becomes a little more complicated. A good source to use for determining related tables is the list of business rules that you identified when you gathered the system requirements. Essentially, you are looking for subcategories of information or business rules that lead you to believe that additional tables are necessary. Remember, you can modify the database design as you identify relationships between tables and constraints on data.
For consistency, use the following names for your new tables: OrderStatus, FormOfPayment, Positions, and BookCondition.
You should now have 10 tables.
To identify which columns to add to the tables
For each category of data, you defined which information should be included with each category. This information makes up your columns.
For consistency, use the following labels for column names:
| Table | Columns |
| Books | TitleID, AuthorID, Publisher, PubDate, Edition, Cost, SRP, ConditionID, Sold |
| BookCondition | ConditionID, ConditionName, Description |
| Authors | AuthorID, FirstName, LastName, YearBorn, YearDied, Description |
| Employees | EmployeeID, FirstName, LastName, Address1, Address2, City, State, Zip, Phone, DOB, HireDate, PositionID |
| Positions | PositionID, Title, JobDescrip |
| Customers | CustomerID, FirstName, LastName, Phone, Address1, Address2, City, State, Zip |
| Orders | OrderID, CustomerID, EmployeeID, Amount, OrderDate, DeliveryDate, PaymentID, StatusID |
| OrderStatus | StatusID, StatusDescrip |
| FormOfPayment | PaymentID, PaymentDescrip |
| BookOrders | OrderID, BookID |
Notice that the Employees table does not include a column for books purchased and dates of purchases. Because each customer can purchase more than one book, you would not include the information here. You could create a table to store this information, but it would be unnecessary because it would duplicate information that already exists in a database (information that can be derived through views or ad hoc queries).
To identify relationships between entities
You are looking for direct relationships. For example, the Books table has a direct relationship with the BookCondition table. BookCondition data applies directly to Books data. In addition, Authors data is directly related to Book data (authors write books). There is also a direct relationship between Books data and BookOrders data (orders include the books being sold).
Notice that there is no direct relationship between the Books table and the Orders table. The relationship between the two tables is indirect and is expressed through the BookOrders table.
Your database design should look similar to the schema in Figure 3.14.
Figure 3.14 Identifying the relationships between tables in the logical data model. (Image unavailable)
To determine the type of relationship, think in terms of the data associated with each object. For example, a relationship exists between employees and the orders that they generate. An employee can create many orders, but only one employee can create an order. Therefore, a one-to-many relationship exists between the Orders table and the Employees table (one employee can create many orders). The Employees table is on the one side of the relationship, and the Orders table is on the many side.
Your database should now look similar to the schema in Figure 3.15.
Figure 3.15 Identifying the types of relationships between tables in the logical data model. (Image unavailable)
Which relationship is many-to-many?
You are deleting the relationship between the two tables because a direct relationship no longer exists. Instead, an indirect relationship is created through the BookAuthors table. In addition, the AuthorID column is no longer necessary in the Books table because the book/author relationship is expressed in the BookAuthors table.
Your database design should now look similar to the schema in Figure 3.16.
Figure 3.16 Adding the BookAuthors table to the logical data model. (Image unavailable)
To identify constraints on data
To which object(s) does this business rule apply?
What are the data constraints?
What are the data constraints for your database design?
The data model includes definitions of entities, their attributes, and entity constraints. The model also includes definitions of the relationships between entities and the constraints on those relationships. One of the first steps that you must take toward creating a data model is to identify the types of data that the database will contain. These types of data can be separated into categories that represent a logical division of information. In most instances, each category translates to a table object within the database. Once you have defined the tables and their columns, you should define the relationship between the tables. To determine the type of relationship that exists between tables, you should look at the types of data that each table contains and the types of interchanges between them. Once you determine the relationships between tables, you must identify the constraints on the data that will be stored in your tables. You should organize data constraints according to tables and their columns (when possible).
The following questions are intended to reinforce key information presented in this chapter. If you are unable to answer a question, review the appropriate lesson and then try the question again. You can find answers to the questions in Appendix A, "Questions and Answers."
loading...
loading...
loading...
Terms of Use, Copyright, and Privacy Policy
© 1997-2009 Barnesandnoble.com llc