Object Oriented System Analysis: Chapter 11: Physical Database Design

True-False Questions

1. The primary focus of attention for physical database design is efficiency and speed.

Answer: True Page Reference: 303 Difficulty: Moderate

2. The latest standard for SQL is SQL-2003.

Answer: False Page Reference: 304 Difficulty: Moderate

3. Establishment of constraints on field values can be accomplished using the CREATE TABLE SQL command.

Answer: True Page Reference: 305 Difficulty: Moderate

4. The SELECT statement includes an option for determining the primary key for a database table.

Answer: False Page Reference: 305-306 Difficulty: Moderate

5. The Values clause indicates the requirements for a join in a SELECT statement.

Answer: False Page Reference: 305-306 Difficulty: Moderate

6. The field list of an SQL SELECT query comes before the FROM clause.

Answer: True Page Reference: 306 Difficulty: Moderate

7. Group By and Having clauses are used primarily to enhance aggregate operations like Sum and Avg.

Answer: True Page Reference: 307 Difficulty: Moderate

8. Oracle’s Char data type should be used when you want fields to contain a variable number of characters.

Answer: False Page Reference: 308 Difficulty: Moderate

9. The highest year value that can be represented in a Oracle’s Date data type is 9999AD.

Answer: True Page Reference: 308 Difficulty: Moderate

10. The type of operation that performs instance creation and initialization is called a derivation.

Answer: False Page Reference: 309 Difficulty: Moderate

11. It is NOT necessary to explicitly store derived field data in a database.

Answer: True Page Reference: 309-310 Difficulty: Moderate

12. A format control follows specific patterns of characters that must be satisfied by the data in a field.

Answer: True Page Reference: 310 Difficulty: Moderate

13. Range controls are used to ensure referential integrity.

Answer: False Page Reference: 310 Difficulty: Moderate

14. Violations of normalization requirements can sometimes improve database performance.

Answer: True Page Reference: 311 Difficulty: Moderate

15. If a data field from an associative class is moved to one of the other tables involved in a many-to-many relationship, this will likely cause a violation of second normal form.

Answer: True Page Reference: 313 Difficulty: Moderate

16. Determining the scan time of a table involves calculating blocking factors.

Answer: True Page Reference: 314 Difficulty: Moderate

17. Primary keys requiring sequential access should be hashed.

Answer: False Page Reference: 315 Difficulty: Moderate

18. Seek time involves waiting for the proper block to rotate under the controllers read-write head.

Answer: False Page Reference: 316 Difficulty: Moderate

19. The number of row objects that can fit in one block is called an extent.

Answer: False Page Reference: 316 Difficulty: Moderate

20. B+ trees are superior to hash algorithms for supporting sequential data access in a table.

Answer: True Page Reference: 319-320 Difficulty: Moderate

21. Traditional relational database technology supports nested tables for aggregation purposes.

Answer: False Page Reference: 324 Difficulty: Moderate

Multiple-Choice Questions

22. Physical database design is primarily concerned with issues of:

a. normalization.

b. referential integrity.

c. efficiency.

d. identification of entity classes.

Answer: c Page Reference: 303 Difficulty: Moderate

23. SQL is a:

a. database design standard.

b. relational database query language.

c. major database vendor.

d. file organization indexing scheme.

Answer: b Page Reference: 304 Difficulty: Easy

24. The latest ANSI standard for SQL is:

a. SQL-92.

b. SQL-95.

c. SQL-99.

d. SQL-02.

Answer: c Page Reference: 304 Difficulty: Moderate

25. The purpose of a SQL SELECT statement is to:

a. retrieve data from the database.

b. select a table for creation.

c. place new data into a table.

d. make modifications to existing data in a table.

Answer: a Page Reference: 306 Difficulty: Moderate

26. The ____________ keyword of the SELECT statement is used to identify the tables involved in the query.

a. TABLES

b. SELECT

c. WHERE

d. FROM

Answer: d Page Reference: 306 Difficulty: Moderate

27. A query that involves retrieval of data from multiple tables based on connecting primary keys and foreign keys is called a ____________ query.

a. composite

b. join

c. merge

d. key

Answer: b Page Reference: 306-307 Difficulty: Moderate

28. When translating from logical to physical database design, an attribute maps to a(n):

a. table.

b. index.

c. field.

d. data type.

Answer: c Page Reference: 307 Difficulty: Easy

29. The maximum length allowed for a variable-length character field in Oracle 9i is ____________ bytes.

a. 1000

b. 2000

c. 3000

d. 4000

Answer: d Page Reference: 308 Difficulty: Hard

30. The purpose of a constructor is to:

a. create an instance of a class.

b. define the structure of a table.

c. set up the overall format of a database.

d. ensure data integrity when inserting rows.

Answer: a Page Reference: 309 Difficulty: Moderate

31. The interpretation a field’s bit pattern is determined by its:

a. identifier.

b. data type.

c. content.

d. blocking factor.

Answer: b Page Reference: 309 Difficulty: Moderate

32. The term “input mask” refers to a field’s:

a. format control.

b. range control.

c. default value.

d. referential integrity.

Answer: a Page Reference: 310 Difficulty: Moderate

33. The main advantage of denormalization is:

a. increased referential integrity.

b. reduction in data duplication.

c. speed of data retrieval.

d. reduction of storage space.

Answer: c Page Reference: 311-312 Difficulty: Moderate

34. Denormalization of tables in one-to-many relationships is a viable option if:

a. you want to avoid null values in fields.

b. updates outnumber retrievals.

c. retrievals outnumber updates.

d. referential integrity is a must.

Answer: c Page Reference: 313 Difficulty: Moderate

35. A unit of data retrieved by the operating system from secondary storage is called a(n):

a. field.

b. row.

c. object.

d. block.

Answer: d Page Reference: 315 Difficulty: Moderate

36. A (n)____________ file structure uses an algorithm to convert a key value into a row address.

a. heap

b. hash

c. block scan

d. indexed

Answer: b Page Reference: 317 Difficulty: Moderate

37. A B+ tree data structure is used to support:

a. block extents.

b. hash algorithms.

c. indexed file organizations.

d. heaps.

Answer: c Page Reference: 317-318 Difficulty: Moderate

38. The fan-out ratio in a B+ tree represents the number of:

a. levels of the tree.

b. blocks required for scanning.

c. rows in the table.

d. indexes in a block.

Answer: d Page Reference: 319 Difficulty: Moderate

39. Which circumstance supports the decision to index a table’s foreign key?

a. Primary key is not indexed

b. Row-object ratio is small

c. Table is cached

d. Operating system performs sequential pre-fetches

Answer: b Page Reference: 322-323 Difficulty: Moderate

40. In object-relational databases, aggregation is implemented via:

a. nested tables.

b. object identifiers.

c. array data types.

d. domain constraints.

Answer: a Page Reference: 324 Difficulty: Moderate

41. Which of the following is an object-relational feature not found in traditional relational database systems?

a. Multi-table joins

b. Default values for fields

c. Relationships by reference

d. Relational integrity constraints

Answer: c Page Reference: 326 Difficulty: Moderate

Essay and Problem-Solving Questions

42. Describe the process of physical database design. What are the important types of information a designer must use in order to construct a well-designed physical database?

Answer:

Physical database design is the portion of the database design process that involves consideration of efficiency and performance of the database system. Therefore, issues of response time, transaction throughput, storage space utilization, and other nonfunctional aspects of the database are dealt with at this stage. The decisions made by the database designer include merging or splitting entity classes, denormalizing object relations, constructing indexes, and file organizations. The database designer needs information from many sources. From the systems analyst, a full description of all entity classes and object relations are required, together with characteristics of all attributes, as well as volume estimates of data. Information about expected system usage is also important, including descriptions of where and when data will be used, frequency of accesses and updates to database records, and requirements for response time and throughput. The designer also needs to know the type of technology that will be available for the database, including available storage structures, database software, and object-relational capabilities of the underlying DBMS.

Page Reference: 303-204 Difficulty: Moderate

43. What is SQL? Who is the standards body responsible for development of SQL, and what versions of SQL are most recent and most widely used? Name and describe four common SQL commands.

Answer:

SQL stands for Structured Query Language, and is the standard language for creating and querying tables in relational databases. The SQL standard is developed and maintained by the American National Standards Institute (ANSI), and the two main versions are SQL-92 (which is the most widely used) and SQL-99 (which is the most recent). Relational DBMSs are compliant with SQL-92, and are working toward compliance with SQL-99. Four common SQL commands are Create Table, Select, Insert, Update, and Delete. Create Table is used to define and create table relations, including naming the table, naming all fields and giving their data types and sizes, and identifying any primary and foreign keys, as well as other data constraints. Select is the command for querying tables to obtain information, based on identifying the fields to return, the tables from which to search, and the conditions under which records should be included. Insert is the command used to add new records to a table, Update is used to modify data in existing records, and Delete is used to remove records from a table.

Page Reference: 304-307 Difficulty: Moderate

44. Consider the database schema from Table 11-1 of the text. Based on this data, show the result of executing the following SQL statement:

SELECT ENAME, DNAME, SAL FROM EMP, DEPT

WHERE EMP.DEPTNO = DEPT.DEPTNO AND JOB = ‘CLERK’

ORDER BY SAL;

Answer:

SMITH RESEARCH 800

JAMES SALES 950

ADAMS RESEARCH 1100

MILLER ACCOUNTING 1300

Page Reference: 304-307 Difficulty: Moderate

45. Consider the database schema from Table 11-1 of the text. Based on this data write the SQL statement that will give the following results:

7788 SCOTT ANALYST

7369 SMITH CLERK

7876 ADAMS CLERK

7566 JONES MANAGER

Answer (others are also possible):

SELECT EMPNO, ENAME, JOB FROM EMP, DEPT

WHERE EMP.DEPTNO = DEPT.DEPTNO AND DNAME='RESEARCH'

ORDER BY JOB;

Page Reference: 304-307 Difficulty: Moderate

46. Consider the database schema from Table 11-1 of the text. Based on this data write the SQL statement that will show the employee name and location of all employees making at least $1500 and no more than $2500. The results should display in alphabetical order based on the employee’s name.

Answer:

SELECT ENAME, LOCATION FROM EMP, DEPT

WHERE EMP.DEPTNO = DEPT.DEPTNO

AND SAL BETWEEN 1500 AND 2500

ORDER BY ENAME;

Page Reference: 304-307, 321 Difficulty: Hard

47. Consider the database schema from Table 11-1 of the text. Based on this data write the SQL statement that will change the salary of all clerks making under $1000 to $1000.

Answer:

UPDATE EMP SET SALARY = 1000 WHERE JOB=’CLERK’ AND SAL<1000;

Page Reference: 307 Difficulty: Moderate

48. Give a detailed description of the SQL Select statement. Describe the overall structure, including all the keywords and their purposes. Discuss the concept of joins and aggregation operations.

Answer:

The SQL Select statement is used to query one or more tables of a database. It is composed of the following keywords, which must be used in the following order: Select, From, Where, Order By, Group By, and Having. Following the Select keyword is a list of columns (fields) that should be returned from the query; these are separated by commas. Alternatively, if all fields should be returned, you can follow Select with an asterisk (*). The From keyword indicates the table(s) that are involved in the query; if more than one table is involved, they are separated by commas. Select and From are the only mandatory keywords to use in a query. The Where keyword is used to specify selection criteria for including rows in the query results; in addition, join specifications are included in the Where clause. The Order By keyword specifies sort conditions. The Group By and Having keywords are used for aggregation operations. The Group By allows you to obtain aggregates for groups of records based on the columns following the Group By keyword. The Having allows you to specify that only groups satisfying certain conditions in their aggregates will be included; it is something like a Where statement for aggregates instead of individual rows. As mentioned previously, the Where statement may also include conditions for a join. A join is a query on multiple tables in which the primary key of one table is matched with the foreign key of another. Matching of foreign key with primary key is done in the Where section of a join query. Aggregation operations involve obtaining information that spans across multiple rows; these include counts, sums or averages of values in specified fields.

Page Reference: 306-307 Difficulty: Moderate

49. What is a field? How are fields created in SQL? What types of data can a field hold in Oracle databases? What considerations are involved in assigning data types to fields?

Answer:

A field is the smallest unit of data in a database. Generally, fields correspond with attributes in entity classes, although composite attributes map onto several fields in a table. Fields are created using the Create Table command in SQL, in which their names, data types, sizes, and integrity checks are specified. Fields can hold a variety of data types. In Oracle databases, the main data types include: text-based types such as Char, Varchar2, and Long; Number, which can involve fixed or floating point numbers; Date and Timestamp for keeping track of dates and times; and various types of large object blocks (LOBs), such as BLOB, CLOB, or NCLOB, which are used to store large objects like images, video files, or international character data.

When designing fields, selection of data types involves consideration of the following objectives: minimization of storage space requirements, representing all possible values of an attribute, ensuring data integrity for the field, and supporting all desired data modifications for the field.

Page Reference: 307-309 Difficulty: Moderate

50. List and describe six techniques supported by most DBMSs for controlling data integrity.

Answer:

The main techniques for controlling data integrity are data typing, specifying default values, format controls, range controls, referential integrity checks, and null value controls. Specifying data types for fields assist integrity control by restricting the possible range of values that a field can take on. For example, specifying a data type can ensure that text data will not go into fields that should be in mathematical operations. A default value specifies the value a field will assume if no explicit value is entered for that field. Format controls (also called input masks) ensure that data entered satisfy a specified structure and size. For example, a format can specify that a character field begins with two letters and then contains three numeric digits. Alternatively, a format control can require numeric data to be formatted in currency form. A range control can limit the possible values in a field more precisely than a data type. For example, a numeric field can be limited to include only numbers between 0 and 100, or a character field can be limited to allow only values from a specified list. Referential integrity controls ensure that foreign keys of one table match primary keys of another in one-to-many relationships. Null value controls can be imposed to control whether fields are allowed to be empty, that is, devoid of all data.

Page Reference: 310-311 Difficulty: Moderate

51. What is denormalization? What are the advantages and disadvantages of denormalization? Describe three situations in which denormalization would be an appropriate decision.

Answer:

Denormalization is the process of combining normalized relations into physical tables based on affinity and use of rows and fields, and on retrieval and update frequencies of the tables. The main advantage of denormalization is improved system performance, mainly because the combination of data into denormalized tables reduces the need for join queries. However, denormalization increases the risk of data inconsistencies, due to data duplications. One circumstance that supports denormlization is when tables have a one-to-one relationship. Such a circumstance will not introduce any duplications or anomalies, but can result in null values in fields. Another circumstance involves many-to-many relationships in which an associative entity class has non-key values. In this case, combining the associative entity table with one of the other tables can improve performance by turning a three-way join into a two-way join, but at the cost of duplicating data and risking data inconsistencies. Finally, in one-to-many relationships, where the one side has a relatively small number of rows, the data in the one side can be merged into the table on the many side. Again, duplications will result, but join queries can be avoided.

Page Reference: 311-314 Difficulty: Moderate

52. Discuss the concept of file organizations, and the objectives considered by a database designer when selecting file organizations for a database. Then identify the steps that a designer uses when determining a file organization for a database.

Answer:

A file organization is a technique for physically arranging the row objects for a file, whether they are table rows in a relational database or object rows in an object-relational database. A file organization should be selected to achieve a number of goals. These include fast retrieval time, and high transaction throughput, efficient use of storage space, minimal need for reorganization, and accommodation for growth. The steps involved in file organization are as follows. First, the designer needs to estimate the number of rows expected in a table and the size of the rows. Based on this information, the designer calculates blocking factors and number of blocks, which are the storage size of a unit of retrieval of physical data by the system software. This information can help determine the table scan time, or the time it takes to read the entire table from disk. Then frequencies of updates vs. queries should be compared, because the benefit of file organizations are primarily for queries whereas the file organization cost is primarily incurred for inserts, updates, and deletes. The next step is to cache small, frequently used tables in memory. Next, fields that will be hashed are determined; these will typically be primary key fields that do not require sequential access. Finally, fields that will be indexed are determined; these include fields that will require random or sequential access and that are used frequently in queries.

Page Reference: 314-315 Difficulty: Moderate

53. The average time it takes a disk controller to locate and read a block of data from the disk is 3ms. Each block is 4000 bytes. A table contains 2 million rows, each containing 80 bytes of data. How many seconds will it take to scan the entire table? Show your work.

Answer:

The blocking factor is 4000/80 = 50.

Therefore, the number of blocks is 2000000/50 = 40000.

Total scan time = 40000 * 3 = 120000 ms, or 12 seconds.

Page Reference: 315-316 Difficulty: Moderate

54. Compare and contrast the following three file organizations: heap, hash, and index. Describe the structure and implementation of each organization. Then discuss the advantages and disadvantages of each.

Answer:

A heap is essentially a data structure that is unorganized. Rows are added to the end of the table chronologically, and they are not stored in any logical order. The advantage of a heap is that no organizational costs are incurred, since there is no organization that takes place. However, heaps result in very poor data access performance, and are only acceptable if data retrieval is to be done by full table scans. This implies that the table will be small, and probably cached in main memory. A hash file organization is one in which an algorithm is applied to convert a key value into a logical row address (called the relative address) in the table, which is then translated into the absolute physical row address. This has the advantage of providing fast access for random row retrieval. However, because hash algorithms typically result in non-sequential assignment of key values to row numbers, this is not an efficient technique for organizing tables in which sequential data access will be frequently needed. Indexes are the most frequently used file organization in most database systems. An index is a file structure in which each key value is stored with an associated pointer mapping it to a particular row address in the table. This is similar in concept to the index at the end of a book, which contains an alphabetized list of keywords with associated page numbers. However, whereas book indexes are linear, database indexes are typically structured in the form of B+ trees, which are balanced hierarchical data structures. This ensures a minimum time for locating a key value for a large table. Indexes have the advantage of providing fast retrieval for queries requiring either random or sequential data access. But there are significant costs for maintaining indexes when updates, insertions, or deletions take place. Therefore, indexes should be applied only to those fields that will be used frequently in queries, such as primary and foreign keys.

Page Reference: 316-320 Difficulty: Moderate

55. Describe the heuristics (the decision process) used to determine whether a table field should be indexed.

Answer:

The decision process for determining whether to index a field is essentially a cost-benefit question. Indexing a field speeds access time for queries, but slows down processing time for updates, inserts, and deletions. With this in mind, the heuristic is as follows. First, the primary key will typically be indexed for all tables. In determining indexes for nonprimary key fields, The first consideration regards the size of the table. If it is small enough to be cached in main memory, no further indexing is required. Even if it is not cached, a small table with minimal table scan time does not require further indexing. For larger tables, the foreign key should be indexed if the row objects ratio is small, that is, if the number of rows on the many side of a one-to-many relationship will be small for each primary key on the one side of the relationship. Finally, the deciding factor for whether to index any non-key fields involves two questions: (1) Is the field likely to be used frequently in the Where clause of SQL select statements? (2) Does the field have a large number of distinct values? If the answer to both of these questions is yes, then the field should be indexed.

Page Reference: 322-323 Difficulty: Moderate

56. List and describe five object-relational features that are enhancements of traditional relational database technology. How are these features implemented in object-relational DBMSs? What are some commercial DBMS products that support object-relational features?

Answer:

Five features that are found in object-relational databases but not in strictly relational databases are generalization, aggregation, support for multivalued attributes, object identifiers, and relationship by reference. Generalization, based on inheritance, provides for supertype/subtype relationships, and is implemented through the use of pointers. Aggregation is a strongly-bound association in which the aggregate has no meaning without its parts. Aggregation is implemented in ORDBMSs by clustering, which involves nesting of tables. Multivalued attributes are supported via array data types, and has a performance advantage over the reliance on joins which would be necessary in traditional relational databases. Object identifiers are system-generated unique values that ensure uniqueness of objects on instantiation. These are used primarily for maintaining relationships via reference. Oracle and Informix are two examples of commercial DBMSs that support object-relational features.

Page Reference: 323-326 Difficulty: Moderate

Object Oriented System Analysis

Wednesday 23 January 2013

Chapter 11: Physical Database Design

1 comment: