Introduction

A database is an organized collection of data. The data are typically organized to model relevant aspects of reality in a way that supports processes requiring this information. For example, modeling the availability of rooms in hotels in a way that supports finding a hotel with vacancies.
Database management systems (DBMSs) are specially designed software applications that interact with the user, other applications, and the database itself to capture and analyze data. A general-purpose DBMS is a software system designed to allow the definition, creation, querying, update, and administration of databases. Well-known DBMSs include MySQL, MariaDB, PostgreSQL, SQLite, Microsoft SQL Server, Oracle, SAP HANA, dBASE, FoxPro, IBM DB2, LibreOffice Base and FileMaker Pro. A database is not generally portable across different DBMSs, but different DBMSs can interoperate by using standards such as SQL and ODBC or JDBC to allow a single application to work with more than one database.

Purpose of Database Systems

To see why database management systems are necessary, let's look at a typical ``file-processing system'' supported by a conventional operating system. The application is a savings bank:
- Savings account and customer records are kept in permanent system files.
- Application programs are written to manipulate files to perform the following tasks:
  - Debit or credit an account.
  - Add a new account.
  - Find an account balance.
  - Generate monthly statements.
Development of the system proceeds as follows:
- New application programs must be written as the need arises.
- New permanent files are created as required.
- but over a long period of time files may be in different formats, and
- Application programs may be in different languages.
So we can see there are problems with the straight file-processing approach:
- Data redundancy and inconsistency
  - Same information may be duplicated in several places.
  - All copies may not be updated properly.
- Difficulty in accessing data
  - May have to write a new application program to satisfy an unusual request.
  - E.g. find all customers with the same postal code.
  - Could generate this data manually, but a long job...
- Data isolation
  - Data in different files.
  - Data in different formats.
  - Difficult to write new application programs.
- Multiple users
  - Want concurrency for faster response time.
  - Need protection for concurrent updates.
  - E.g. two customers withdrawing funds from the same account at the same time - account has $500 in it, and they withdraw $100 and $50. The result could be $350, $400 or $450 if no protection.
- Security problems
  - Every user of the system should be able to access only the data they are permitted to see.
  - E.g. payroll people only handle employee records, and cannot see customer accounts; tellers only access account data and cannot see payroll data.
  - Difficult to enforce this with application programs.
- Integrity problems
  - Data may be required to satisfy constraints.
  - E.g. no account balance below $25.00.
  - Again, difficult to enforce or to change constraints with the file-processing approach.
These problems and others led to the development of database management systems.

Data Abstraction

The major purpose of a database system is to provide users with an abstract view of the system. The system hides certain details of how data is stored and created and maintained
Complexity should be hidden from database users.
There are several levels of abstraction:
1. Physical Level:
  - How the data are stored.
  - E.g. index, B-tree, hashing.
  - Lowest level of abstraction.
  - Complex low-level structures described in detail.
2. Conceptual Level:
  - Next highest level of abstraction.
  - Describes what data are stored.
  - Describes the relationships among data.
  - Database administrator level.
3. View Level:
  - Highest level.
  - Describes part of the database for a particular group of users.
  - Can be many different views of a database.
  - E.g. tellers in a bank get a view of customer accounts, but not of payroll data.
Fig. 1.1 illustrates the three levels.
Figure 1.1: The three levels of data abstraction

Data Model

The E-R Model
The entity-relationship model is based on a perception of the world as consisting of a collection of basic objects (entities) and relationships among these objects.
- An entity is a distinguishable object that exists.
- Each entity has associated with it a set of attributes describing it.
- E.g. number and balance for an account entity.
- A relationship is an association among several entities.
- e.g. A cust_acct relationship associates a customer with each account he or she has.
- The set of all entities or relationships of the same type is called the entity set or relationship set.
- Another essential element of the E-R diagram is the mapping cardinalities, which express the number of entities to which another entity can be associated via a relationship set.
We'll see later how well this model works to describe real world situations.
The overall logical structure of a database can be expressed graphically by an E-R diagram:
- rectangles: represent entity sets.
- ellipses: represent attributes.
- diamonds: represent relationships among entity sets.
- lines: link attributes to entity sets and entity sets to relationships.
See figure 1.2 for an example.
Figure 1.2: A sample E-R diagram.
Instances and Schemes
Databases change over time.
The information in a database at a particular point in time is called an instance of the database.
The overall design of the database is called the database scheme.
Analogy with programming languages:
- Data type definition - scheme
- Value of a variable - instance
There are several schemes, corresponding to levels of abstraction:
- Physical scheme
- Conceptual scheme
- Subscheme (can be many)
Data Independence
The ability to modify a scheme definition in one level without affecting a scheme definition in a higher level is called data independence.
There are two kinds:
- Physical data independence
  - The ability to modify the physical scheme without causing application programs to be rewritten
  - Modifications at this level are usually to improve performance
- Logical data independence
  - The ability to modify the conceptual scheme without causing application programs to be rewritten
  - Usually done when logical structure of database is altered
Logical data independence is harder to achieve as the application programs are usually heavily dependent on the logical structure of the data. An analogy is made to abstract data types in programming languages.
Data Definition Language (DDL)
Used to specify a database scheme as a set of definitions expressed in a DDL
DDL statements are compiled, resulting in a set of tables stored in a special file called a data dictionary or data directory.
The data directory contains metadata (data about data)
The storage structure and access methods used by the database system are specified by a set of definitions in a special type of DDL called a data storage and definition language
basic idea: hide implementation details of the database schemes from the users

Database Manager

The database manager is a program module which provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.
Databases typically require lots of storage space (gigabytes). This must be stored on disks. Data is moved between disk and main memory (MM) as needed.
The goal of the database system is to simplify and facilitate access to data. Performance is important. Views provide simplification.
So the database manager module is responsible for
- Interaction with the file manager: Storing raw data on disk using the file system usually provided by a conventional operating system. The database manager must translate DML statements into low-level file system commands (for storing, retrieving and updating data in the database).
- Integrity enforcement: Checking that updates in the database do not violate consistency constraints (e.g. no bank account balance below $25)
- Security enforcement: Ensuring that users only have access to information they are permitted to see
- Backup and recovery: Detecting failures due to power failure, disk crash, software errors, etc., and restoring the database to its state before the failure
- Concurrency control: Preserving data consistency when there are concurrent users.
Some small database systems may miss some of these features, resulting in simpler database managers. (For example, no concurrency is required on a PC running MS-DOS.) These features are necessary on larger systems.

Database Administrator

The database administrator is a person having central control over data and programs accessing that data. Duties of the database administrator include:
- Scheme definition: the creation of the original database scheme. This involves writing a set of definitions in a DDL (data storage and definition language), compiled by the DDL compiler into a set of tables stored in the data dictionary.
- Storage structure and access method definition: writing a set of definitions translated by the data storage and definition language compiler
- Scheme and physical organization modification: writing a set of definitions used by the DDL compiler to generate modifications to appropriate internal system tables (e.g. data dictionary). This is done rarely, but sometimes the database scheme or physical organization must be modified.
- Granting of authorization for data access: granting different types of authorization for data access to various users
- Integrity constraint specification: generating integrity constraints. These are consulted by the database manager module whenever updates occur.

Database Users

The database users fall into several categories:

Application programmers are computer professionals interacting with the system through DML calls embedded in a program written in a host language (e.g. C, PL/1, Pascal).
- These programs are called application programs.
- The DML precompiler converts DML calls (prefaced by a special character like $, #, etc.) to normal procedure calls in a host language.
- The host language compiler then generates the object code.
- Some special types of programming languages combine Pascal-like control structures with control structures for the manipulation of a database.
- These are sometimes called fourth-generation languages.
- They often include features to help generate forms and display data.
Sophisticated users interact with the system without writing programs.
- They form requests by writing queries in a database query language.
- These are submitted to a query processor that breaks a DML statement down into instructions for the database manager module.
Specialized users are sophisticated users writing special database application programs. These may be CADD systems, knowledge-based and expert systems, complex data systems (audio/video), etc.
Naive users are unsophisticated users who interact with the system by using permanent application programs (e.g. automated teller machine)

Data Manipulation Language (DML)

Data Manipulation is:
- retrieval of information from the database
- insertion of new information into the database
- deletion of information in the database
- modification of information in the database
A DML is a language which enables users to access and manipulate data. The goal is to provide efficient human interaction with the system.
There are two types of DML:
- procedural: the user specifies what data is needed and how to get it
- nonprocedural: the user only specifies what data is needed
  - Easier for user
  - May not generate code as efficient as that produced by procedural languages
A query language is a portion of a DML involving information retrieval only. The terms DML and query language are often used synonymously.

Overall System Structure
Database systems are partitioned into modules for different functions. Some functions (e.g. file systems) may be provided by the operating system.
Components include:
- File manager manages allocation of disk space and data structures used to represent information on disk.
- Database manager: The interface between low-level data and application programs and queries.
- Query processor translates statements in a query language into low-level instructions the database manager understands. (May also attempt to find an equivalent but more efficient form.)
- DML precompiler converts DML statements embedded in an application program to normal procedure calls in a host language. The precompiler interacts with the query processor.
- DDL compiler converts DDL statements to a set of tables containing metadata stored in a data dictionary.
In addition, several data structures are required for physical system implementation:
- Data files: store the database itself.
- Data dictionary: stores information about the structure of the database. It is used heavily. Great emphasis should be placed on developing a good design and efficient implementation of the dictionary.
- Indices: provide fast access to data items holding particular values.
Figure 1.6 shows these components.
Figure 1.6: Database system structure.

Mapping Constraints

An E-R scheme may define certain constraints to which the contents of a database must conform.

Mapping Cardinalities: express the number of entities to which another entity can be associated via a relationship. For binary relationship sets between entity sets A and B, the mapping cardinality must be one of:
1. One-to-one: An entity in A is associated with at most one entity in B, and an entity in B is associated with at most one entity in A. (Figure 2.3)
2. One-to-many: An entity in A is associated with any number in B. An entity in B is associated with at most one entity in A. (Figure 2.4)
3. Many-to-one: An entity in A is associated with at most one entity in B. An entity in B is associated with any number in A. (Figure 2.5)
4. Many-to-many: Entities in A and B are associated with any number from each other. (Figure 2.6)
The appropriate mapping cardinality for a particular relationship set depends on the real world being modeled. (Think about the CustAcct relationship...)
Existence Dependencies: if the existence of entity X depends on the existence of entity Y, then X is said to be existence dependent on Y. (Or we say that Y is the dominant entity and X is the subordinate entity.) For example,
- Consider account and transaction entity sets, and a relationship log between them.
- This is one-to-many from account to transaction.
- If an account entity is deleted, its associated transaction entities must also be deleted.
- Thus account is dominant and transaction is subordinate.
Keys
Differences between entities must be expressed in terms of attributes.
A superkey is a set of one or more attributes which, taken collectively, allow us to identify uniquely an entity in the entity set.
For example, in the entity set customer, customer-name and S.I.N. is a superkey.
Note that customer-name alone is not, as two customers could have the same name.
A superkey may contain extraneous attributes, and we are often interested in the smallest superkey. A superkey for which no subset is a superkey is called a candidate key.
In the example above, S.I.N. is a candidate key, as it is minimal, and uniquely identifies a customer entity.
A primary key is a candidate key (there may be more than one) chosen by the DB designer to identify entities in an entity set.

An entity set that does not possess sufficient attributes to form a primary key is called a weak entity set. One that does have a primary key is called a strong entity set. For example,

The entity set transaction has attributes transaction-number, date and amount.
Different transactions on different accounts could share the same number.
These are not sufficient to form a primary key (uniquely identify a transaction).
Thus transaction is a weak entity set.

For a weak entity set to be meaningful, it must be part of a one-to-many relationship set. This relationship set should have no descriptive attributes. (Why?)
The idea of strong and weak entity sets is related to the existence dependencies seen earlier.

Member of a strong entity set is a dominant entity.
Member of a weak entity set is a subordinate entity.

A weak entity set does not have a primary key, but we need a means of distinguishing among the entities.
The discriminator of a weak entity set is a set of attributes that allows this distinction to be made.
The primary key of a weak entity set is formed by taking the primary key of the strong entity set on which its existence depends (see Mapping Constraints) plus its discriminator.
To illustrate:

transaction is a weak entity. It is existence-dependent on account.
The primary key of account is account-number.
transaction-number distinguishes transaction entities within the same account (and is thus the discriminator).
So the primary key for transaction would be (account-number, transaction-number).

Just Remember: The primary key of a weak entity is found by taking the primary key of the strong entity on which it is existence-dependent, plus the discriminator of the weak entity set.

The Entity Relationship Diagram

We can express the overall logical structure of a database graphically with an E-R diagram.
Its components are:

rectangles representing entity sets.
ellipses representing attributes.
diamonds representing relationship sets.
lines linking attributes to entity sets and entity sets to relationship sets.

In the text, lines may be directed (have an arrow on the end) to signify mapping cardinalities for relationship sets. Figures 2.8 to 2.10 show some examples.

Figure 2.7: An E-R diagram

Figure 2.8: One-to-many from customer to account

Figure 2.9: Many-to-one from customer to account

Figure 2.10: One-to-one from customer to account

Go back and review mapping cardinalities. They express the number of entities to which an entity can be associated via a relationship.
The arrow positioning is simple once you get it straight in your mind, so do some examples. Think of the arrow head as pointing to the entity that ``one'' refers to

Other Styles of E-R Diagram

The text uses one particular style of diagram. Many variations exist.
Some of the variations you will see are:

Diamonds being omitted - a link between entities indicates a relationship.
- Less symbols, clearer picture.
- What happens with descriptive attributes?
- In this case, we have to create an intersection entity to possess the attributes.
Numbers instead of arrowheads indicating cardinality.
- Symbols, 1, n and m used.
- E.g. 1 to 1, 1 to n, n to m.
- Easier to understand than arrowheads.
A range of numbers indicating optionality of relationship. (See Elmasri & Navathe, p 58.)
- E.g (0,1) indicates minimum zero (optional), maximum 1.
- Can also use (0,n), (1,1) or (1,n).
- Typically used on near end of link - confusing at first, but gives more information.
- E.g. entity 1 (0,1) -- (1,n) entity 2 indicates that entity 1 is related to between 0 and 1 occurrences of entity 2 (optional).
- Entity 2 is related to at least 1 and possibly many occurrences of entity 1 (mandatory).
Multivalued attributes may be indicated in some manner.
- Means attribute can have more than one value.
- E.g. hobbies.
- Has to be normalized later on.
Extended E-R diagrams allowing more details/constraints in the real world to be recorded. (See Elmasri & Navathe, chapter 21.)
- Composite attributes.
- Derived attributes.
- Subclasses and superclasses.
- Generalization and specialization.

Roles in E-R Diagrams
The function that an entity plays in a relationship is called its role. Roles are normally explicit and not specified.
They are useful when the meaning of a relationship set needs clarification.
For example, the entity sets of a relationship may not be distinct. The relationship works-for might be ordered pairs of employees (first is manager, second is worker).
In the E-R diagram, this can be shown by labelling the lines connecting entities (rectangles) to relationships (diamonds). (See figure 2.11).

Figure 2.11: E-R diagram with role indicators

Weak Entity Sets in E-R Diagrams
A weak entity set is indicated by a doubly-outlined box. For example, the previously-mentioned weak entity set transaction is dependent on the strong entity set account via the relationship set log.
Figure 2.12) shows this example.

Figure 2.12: E-R diagram with a weak entity set

Nonbinary Relationships
Non-binary relationships can easily be represented. Figure 2.13) shows an example.

Figure 2.13: E-R diagram with a ternary relationship

This E-R diagram says that a customer may have several accounts, each located in a specific bank branch, and that an account may belong to several different customers.

Reducing E-R Diagrams to Tables

A database conforming to an E-R diagram can be represented by a collection of tables. We'll use the E-R diagram of Figure 2.14) as our example.

Figure 2.14: E-R diagram with strong and weak entity sets

For each entity set and relationship set, there is a unique table which is assigned the name of the corresponding set. Each table has a number of columns with unique names. (E.g. Figs. 2.14 - 2.18 in the text).

Representation of Strong Entity Sets

We use a table with one column for each attribute of the set. Each row in the table corresponds to one entity of the entity set. For the entity set account, see the table of figure 2.14.
We can add, delete and modify rows (to reflect changes in the real world).
A row of a table will consist of an n-tuple where n is the number of attributes.
Actually, the table contains a subset of the set of all possible rows. We refer to the set of all possible rows as the cartesian product of the sets of all attribute values.
We may denote this as

for the account table, where

and

denote the set of all account numbers and all account balances, respectively.
In general, for a table of n columns, we may denote the cartesian product of

Representation of Weak Entity Sets

For a weak entity set, we add columns to the table corresponding to the primary key of the strong entity set on which the weak set is dependent.
For example, the weak entity set transaction has three attributes: transaction-number, date and amount. The primary key of account (on which transaction depends) is account-number. This gives us the table of figure 2.16.

Generalization

Consider extending the entity set account by classifying accounts as being either savings-account or chequing-account.
Each of these is described by the attributes of account plus additional attributes. (savings has interest-rate and chequing has overdraft-amount.)
We can express the similarities between the entity sets by generalization. This is the process of forming containment relationships between a higher-level entity set and one or more lower-level entity sets.
In E-R diagrams, generalization is shown by a triangle, as shown in Figure 2.19.

Figure 2.19: Generalization

Generalization hides differences and emphasizes similarities.
Distinction made through attribute inheritance.
Attributes of higher-level entity are inherited by lower-level entities.
Two methods for conversion to a table form:
- Create a table for the high-level entity, plus tables for the lower-level entities containing also their specific attributes.
- Create only tables for the lower-level entities.

Aggregation

The E-R model cannot express relationships among relationships.
When would we need such a thing?
Consider a DB with information about employees who work on a particular project and use a number of machines doing that work. We get the E-R diagram shown in Figure 2.20 .

Figure 2.20: E-R diagram with redundant relationships

Relationship sets work and uses could be combined into a single set. However, they shouldn't be, as this would obscure the logical structure of this scheme.
The solution is to use aggregation.

An abstraction through which relationships are treated as higher-level entities.
For our example, we treat the relationship set work and the entity sets employee and project as a higher-level entity set called work.
Figure 2.21 shows the E-R diagram with aggregation.
Figure 2.21: E-R diagram with aggregation

Transforming an E-R diagram with aggregation into tabular form is easy. We create a table for each entity and relationship set as before. The table for relationship set uses contains a column for each attribute in the primary key of machinery and work.

1 comment:

Kris Lewis20 February 2026 at 04:06
Interesting insights shared here on emerging technologies and their practical applications. The growing convergence of data science and artificial intelligence
is transforming how organizations analyze information, automate decisions, and generate predictive insights. It’s exciting to see discussions that highlight how these fields continue to drive innovation and real-world impact.

Database Engineering

introduction