Links
|
Introduction to databases
CS 284 (CSA), Spring 2020
Terms
Database: Organized collection of data. Example:
student records, library catalog.
Record (or row): An element of information in a
database, structured to consist of values for certain fields.
DBMS (Database Management System): Software
designed to assist in maintaining and using large collections of data.
DBA (Database Administrator): A person who uses
features of a DBMS to design and organize databases for particular
purposes.
RDB (Relational Database): A database using
a relational data model. The relational
model is the most common model for contemporary DBMSs.
Field (or column or attribute): A
component of a record, possessing a name and a particular value type.
Relation (or table): A collection of records
having the same field structure.
Key: A designated field or collection of fields in a
relation for which no two distinct records may have the same
value(s). For example, student ID number would presumably represent a
key for a relation consisting of student records.
Query: A DBMS command for searching a
database for information matching specified criteria.
Transaction: A sequence of DBMS commands which is
atomic (either all of the sequence is performed, or none of
it) and concurrent (correct behavior even if more than one
process is attempting to modify a database at the same time).
Why use databases?
Databases vs. file systems
In general, databases are desirable for (potentially large)
quantities of field-structured data, since DBMSs are designed for
extensive and efficient retrieval of such information.
In general, files are preferable for irregularly structured
data, whether or not that data has a high degree of logical
structure. For example, program code has very specific syntax, but
no simple regular structure that could readily be represented as
records and fields in a database relation.
To choose between databases and file systems for representing
data, consider factors such as the following:
The form, degree, and use of structuring. (Is it readily
expressed in terms of records and fields? Or is the structure more
logical than regular?)
The size of the data. Few operating systems can support 500GB
files, but 500GB databases are not unheard of. On the other hand,
individual fields in a database with text data types may have limited
size.
Features of DBMSs vs file systems. (For example,
DBMSs offer sophisticated search queries, report generation,
constraint checking such as keys on entry, etc.,
whereas
UNIX file
systems offer symbolic and "hard" links, well-supported directories,
etc.)
Advantages of a DBMS
Independence of data vs. applications. For
data structured
appropriately for a DBMS, all applications may reuse the same
efficient, powerful querying mechanism and storage strategy.
Efficient data access.
Data integrity,
i.e., avoiding corruption of data.
Data security,
i.e., fine-grained access control to data within a database.
Expertise of a DBA
(when one is available).
Concurrent access with crash recovery,
through database transactions.
E.g., the SABRE airline reservation system.
Programmer productivity.
DBMSs include features that facilitate the development of database
applications.
Advantages of a file system
Diversity of data format. Programs may choose
their own formats for storing data within files, customizing in order
to optimize according to appropriate criteria. For example,
.gif and .jpeg file formats for images are
both widely used on the web. Each has its own advantages and
disadvantages; for instance, .gif supports only 256
colors, but saving images in .jpeg could lead to loss of
image quality.
Flexibility of access. Directories provide an
easily customizable way to organize files; file system features such
as symbolic links allow a single file to be stored under multiple
names. It is usually possible to view files as linear sequences of
bytes and thus to retrieve information in sequential order when
needed, without having to negotiate the record/field structure of a
database.
Specialized processing. Much specialized
software exists for carrying out specific tasks on certain file
formats. New programs can be created; in fact, scripts and filters
enable programmers to construct temporary programs on an as-needed
basis.
Independence of physical location. Whereas a DBMS
typically operates on a particular computer with a single underlying
disk system, files can be stored on disks, tapes, flash drives, etc.,
and can be moved to other machines readily.
Well-established standards. Many file formats
(e.g., for image files or documents) are standard enough to be moved
from system to system; also, some operating systems (e.g., Linux or
OS/X) can even mount entire file systems that were developed on
different systems. In contrast, no DBMS adheres to all of the most
common database standard, SQL, and different DBMS products have often
significant incompatibilities.
Database as a topic of study
The following subjects typically appear in a Database course.
Database design. Given an application need, how should the
data be organized within a database for effective use through a DBMS?
Tuning of databases. How can we improve the performance of
database operations for a particular application? How can we plan for
scaling up of the size of an application?
Concurrency and robustness. Includes transaction management.
Implementation of a DBMS.
Data models; schemas
A data model is a strategy for organizing information
in a database. Most databases are designed to support only one data model.
Some data models include:
Network model (early 60's)
Hierarchical model (late 60's)
Relational model (70's)
Object model (emerging in 90's)
Object-relation model (mid-late 90's, a hybrid of object and
relational models)
The relational model has been dominant throughout the
past few decades, and most prominent systems today represent that
model. In the relational model, data is organized into relations,
typically visualized as tables with rows and columns, with rows
representing records and columns representing fields within those
records.
Among the operations supported for tables in a relational database
are projection, selection, and the relational join (see the SQL page
for more information).
A schema (pronounced "SKEEma", plural
schemas) is a description of the structuring of data
using a particular data model. For example, the definition of the
tables in a database, together with constraint information (such as
keys), describes a database schema in a relational DBMS.
A schema is typically specified using a data definition
language (DDL).
Schema levels:
External schema, as viewed by the users or non-DBMS
portions of an application.
Logical (or conceptual) schema, representing the
organization of data into base tables.
Physical schema, or storage strategy within a DBMS for
efficient access.
Disk storage.
|