Links
CS home page
Dick Brown's home page
Site home page
Printable version of this page
-----
CSA online text
Laboratory assignments
Homework assignments
Escher: Web Portfolio Manager Project
Course directory ~cs284
-----
Java API
Project log form




Introduction to databases

CS 284 (CSA), Spring 2020

Terms

  • Database: Organized collection of data. Example: student records, library catalog.

  • Record (or row): An element of information in a database, structured to consist of values for certain fields.

  • DBMS (Database Management System): Software designed to assist in maintaining and using large collections of data.

  • DBA (Database Administrator): A person who uses features of a DBMS to design and organize databases for particular purposes.

  • RDB (Relational Database): A database using a relational data model. The relational model is the most common model for contemporary DBMSs.

  • Field (or column or attribute): A component of a record, possessing a name and a particular value type.

  • Relation (or table): A collection of records having the same field structure.

  • Key: A designated field or collection of fields in a relation for which no two distinct records may have the same value(s). For example, student ID number would presumably represent a key for a relation consisting of student records.

  • Query: A DBMS command for searching a database for information matching specified criteria.

  • Transaction: A sequence of DBMS commands which is atomic (either all of the sequence is performed, or none of it) and concurrent (correct behavior even if more than one process is attempting to modify a database at the same time).

Why use databases?

Databases vs. file systems

  • In general, databases are desirable for (potentially large) quantities of field-structured data, since DBMSs are designed for extensive and efficient retrieval of such information.

  • In general, files are preferable for irregularly structured data, whether or not that data has a high degree of logical structure. For example, program code has very specific syntax, but no simple regular structure that could readily be represented as records and fields in a database relation.

  • To choose between databases and file systems for representing data, consider factors such as the following:

    • The form, degree, and use of structuring. (Is it readily expressed in terms of records and fields? Or is the structure more logical than regular?)

    • The size of the data. Few operating systems can support 500GB files, but 500GB databases are not unheard of. On the other hand, individual fields in a database with text data types may have limited size.

    • Features of DBMSs vs file systems. (For example, DBMSs offer sophisticated search queries, report generation, constraint checking such as keys on entry, etc., whereas UNIX file systems offer symbolic and "hard" links, well-supported directories, etc.)

Advantages of a DBMS

  • Independence of data vs. applications. For data structured appropriately for a DBMS, all applications may reuse the same efficient, powerful querying mechanism and storage strategy.

  • Efficient data access.

  • Data integrity, i.e., avoiding corruption of data.

  • Data security, i.e., fine-grained access control to data within a database.

  • Expertise of a DBA (when one is available).

  • Concurrent access with crash recovery, through database transactions. E.g., the SABRE airline reservation system.

  • Programmer productivity. DBMSs include features that facilitate the development of database applications.

Advantages of a file system

  • Diversity of data format. Programs may choose their own formats for storing data within files, customizing in order to optimize according to appropriate criteria. For example, .gif and .jpeg file formats for images are both widely used on the web. Each has its own advantages and disadvantages; for instance, .gif supports only 256 colors, but saving images in .jpeg could lead to loss of image quality.

  • Flexibility of access. Directories provide an easily customizable way to organize files; file system features such as symbolic links allow a single file to be stored under multiple names. It is usually possible to view files as linear sequences of bytes and thus to retrieve information in sequential order when needed, without having to negotiate the record/field structure of a database.

  • Specialized processing. Much specialized software exists for carrying out specific tasks on certain file formats. New programs can be created; in fact, scripts and filters enable programmers to construct temporary programs on an as-needed basis.

  • Independence of physical location. Whereas a DBMS typically operates on a particular computer with a single underlying disk system, files can be stored on disks, tapes, flash drives, etc., and can be moved to other machines readily.

  • Well-established standards. Many file formats (e.g., for image files or documents) are standard enough to be moved from system to system; also, some operating systems (e.g., Linux or OS/X) can even mount entire file systems that were developed on different systems. In contrast, no DBMS adheres to all of the most common database standard, SQL, and different DBMS products have often significant incompatibilities.

Database as a topic of study

The following subjects typically appear in a Database course.

  • Database design. Given an application need, how should the data be organized within a database for effective use through a DBMS?

  • Tuning of databases. How can we improve the performance of database operations for a particular application? How can we plan for scaling up of the size of an application?

  • Concurrency and robustness. Includes transaction management.

  • Implementation of a DBMS.

Data models; schemas

  • A data model is a strategy for organizing information in a database. Most databases are designed to support only one data model. Some data models include:

    • Network model (early 60's)

    • Hierarchical model (late 60's)

    • Relational model (70's)

    • Object model (emerging in 90's)

    • Object-relation model (mid-late 90's, a hybrid of object and relational models)

  • The relational model has been dominant throughout the past few decades, and most prominent systems today represent that model. In the relational model, data is organized into relations, typically visualized as tables with rows and columns, with rows representing records and columns representing fields within those records.

    Among the operations supported for tables in a relational database are projection, selection, and the relational join (see the SQL page for more information).

  • A schema (pronounced "SKEEma", plural schemas) is a description of the structuring of data using a particular data model. For example, the definition of the tables in a database, together with constraint information (such as keys), describes a database schema in a relational DBMS.

  • A schema is typically specified using a data definition language (DDL).

  • Schema levels:

    • External schema, as viewed by the users or non-DBMS portions of an application.

    • Logical (or conceptual) schema, representing the organization of data into base tables.

    • Physical schema, or storage strategy within a DBMS for efficient access.

    • Disk storage.





rab@stolaf.edu, April 15, 2020