Links
CS home page
Dick Brown's home page
Site home page
Printable version of this page
-----
CSA online text
Laboratory assignments
Homework assignments
Escher: Web Portfolio Manager Project
Course directory ~cs284
-----
Java API
Project log form




Introduction to SQL

CS 284 (CSA), Spring 2020

Table of contents

Standards and DBMSs

  • Recall that a standard describes the expected features for a programming language. For example, the MIT Scheme interpreter used at St. Olaf satisfies the Scheme standard known as RSR5, except that the empty list () is returned by predicates instead of the standard false value #f. MIT Scheme also includes many extensions of RSR5, such as the predefined predicate list? (RSR5 only specifies null? and pair?). We do not consider extensions to violate a standard unless those extensions contradict some requirement in that standard.

    Standards are developed by (often lengthy) deliberation by committees that have broad representation of (typically commercial) parties interested in those standards, e.g., producers of compilers and interpreters, vendors who create applications that will use the standard, etc.

  • The two most recent SQL standards are SQL92 and SQL99, produced in 1992 and 1999, respectively. Unlike Scheme and C++, actual DBMS products all differ from the standards in many respects, although basic forms of the core SQL commands are generally accepted by all of the DBMSs

  • Four example DBMSs are:

    • Oracle, a leading commercial system with many features, known for its scalability and reliability, available on many computing platforms (i.e., a wide variety of hardware/software combinations). Oracle can be tuned to optimize performance for specific applications; the number of adjustable system parameters makes it necessary to have a DBA at most installations.

    • SQL Server (pronounced "sequel server") is Microsoft's (commercial) DBMS, and runs only on Microsoft platforms. It is low in cost, high in performance, and easy to use, but far less tunable than Oracle.

    • Postgres, the DBMS we will use for the course, is an open source system that is rich in features (though not as rich as Oracle) and adheres to the SQL standard more closely than any other non-commercial DBMS. It runs on many platforms and includes support for large-scale applications.

    • MySQL is a widely used open source system known for its high performance. MySQL provides fewer features than the DBMSs above. For example, transactions (in which multiple database operations are performed as an indivisible unit) are not supported; some do not consider MySQL to be a true DBMS for this reason. MySQL is available on platforms including Linux.

Types and values

  • Types available illustrate some of the ways that DBMSs differ from the SQL standards. The table below shows some type names specified in SQL99 and corresponding type names actually used in Oracle and Postgres. Type names in this font are extensions of the standard.

    SQL99 standardPostgresOracleNote
    character(n)char(n)char(n) or character(n)string of fixed length, padded with spaces
    character varying(n)varchar(n), textvarchar, longUp to n characters, not padded by spaces. Postgres text and Oracle varchar does not specify n; Oracle long allows for up to 2GB strings
    national character varying(n) national character varying(n)Unicode strings
    integerint2, int4, int8integer(n), smallint2,4,8,n indicate size in bytes
    float(p,s)float4, float8floatStandard calls for precision and size specification
    booleanbool 
    temporaldate, time, timespan, interval, timestamp, timetz, datetimedatePostgres timespan includes two times, interval is a "delta time", timetz includes time zone, etc.
    binary BLOB, bfile, CLOB, raw, long rawFor representing binary data such as images, large quantities of text, etc.; BLOB is "binary large object", CLOB is "character large object; Oracle raw is up to 2K binary, long raw up to 4G
  • The value NULL is available for all types, and indicates a missing value in a table field.

  • Strings are delimited using ' or `, not ". Example: 'hello'. Double the string delimiter to include it in a string, e.g., 'Who''s on first?'.

SQL Statements

All SQL statements are terminated with a semicolon ;. Two dashes -- introduce a comment; /* ... */ comments are also available.

We will consider a database implementation of a simplified telephone-address book as a running example, e.g., perhaps representing the address by only a state field.

Names of commands, fields, types, etc., are all case insensitive. We will use all caps for SQL keywords (e.g., parts of commands) and use all lower case for (user-selected) names of tables, fields, etc. Of course, the case of characters appearing within strings is preserved, e.g., 'Hello'.

All information is for Postgres; see the Postgres reference manual for actual specifications and more information.

CREATE TABLE

  • The SQL CREATE TABLE command defines (logical) relations. Besides specifying the names and types of columns in a table, CREATE TABLE also provides options for using or constraining the data placed in a table.

  • Examples:

    CREATE TABLE state (
           abbr char(2) primary key, -- abbreviation
           name char(15)
    );
    
    CREATE TABLE person (
           eid int2 primary key, -- entry id
           name char(25),
           phone char(4), -- campus phone
           state char(2) references state(abbr)
    );
    

  • The two example CREATE TABLE statements specify the schema for relations state and person.

  • Each line within these examples above describes a field in a table, consisting of a field name and a type, optionally followed by qualifying information for that field; field descriptions are separated (not terminated) by commas.

  • The qualifications primary key and references state(abbr) are called table constraints. Constraints are checked by the DBMS when a new row is inserted in a table, when a row is modified, etc.

  • Primary key constraints specify that the indicated field is the main key; there may be other keys (perhaps involving multiple columns), but only one can be designated "primary." The DBMS will automatically check that all keys in a table are distinct; for example, two rows in the table state may not have the same abbr value.

  • The constraint references state(abbr) is called a foreign key constraint. This constraint indicates that the field state of the table person should appear among the keys abbr in the table state. The DBMS will not allow us to add a row to person that does not satisfy this condition.

  • We chose the type int2 for eid because a two-byte integer (range from -32,768 to +32,767) should suffice for this application. We chose the type char(4) for the campus phone instead of int2 because we would like a campus phone number such as 0767 to be treated as a four-character string, not a 3-digit numerical value 767.

  • More on CREATE TABLE
  • "Added key fields."

    • In person, the sole purpose of the field eid is to provide a one-field key. We will refer to such a field as an added key field. Without eid, the shortest key would consist of all three fields name, phone and state, since there might, for example, be two people with the same phone number, or with the same name but different states, or two phone numbers for a single person.

    • Added key fields may be necessary for some applications. For example, in the person example, if roommates from the same state had the same name there would be no way to distinguish them in person without eid.

    • An added key field was not necessary in the table state, since the two-character abbreviations MN, IL, etc., already determine state names uniquely.

    • The choice whether to add a key field represents a tradeoff: an added key field may be convenient for refering to individual records in an application, even if that added field is unnecessary; but when another field already serves as a key, an added key field would clutter a database with redundancy.

  • The CREATE TABLE command has many options not illustrated by these examples. Here are some constraints that could be added as additional lines within the CREATE TABLE command (preceded by comma separators):

    • unique(name, phone) would cause the DBMS to check that a new row (record) has a different combination of name and phone number than every other row in the table person. Because of this uniqueness, this pair of columns could serve as an key (in addition to eid). Of course, this uniqueness constraint would not be a good idea if two persons might conceivably have the same name and phone number.

    • check phone ~ '[0-9][0-9][0-9][0-9]' will cause the DBMS to verify that the phone field of person is actually four digits, instead of any four characters. The string '[0-9][0-9][0-9][0-9]' is a regular expression in Postgres, specifying a pattern to be matched, and the operator ~ is the pattern matching operator in Postgres.

      Note. The operator ~ is not standard SQL; neither is the regular expression above (although that form of regular expression is standard in UNIX). Standard SQL provides a like operator and a different form of regular expressions, but standard SQL regular expressions cannot specify a pattern of four digits.

    • Other constraints are available, as well. For example, adding not null in the description of a field will cause the DBMS to verify that field is not assigned the special value null.

  • Other qualifications are available besides table constraints. For example,

           state char(2) default 'MN'
    
    would specify a default value for the state attribute in new rows of person. If this is combined with the foreign key constraint references state(abbr), we would have to make sure that MN appears in the abbr field of some record in the table state before using that default value in a new record of person.

  • Here is a definition of person that uses these additional constraints and the default value for the field state.

    CREATE TABLE person (
           eid int2 primary key, -- entry id
           name char(25) not null,
           phone char(4), -- campus phone
           state char(2) references state(abbr) default 'MN',
           unique(name, phone),
           check (phone ~ '[0-9][0-9][0-9][0-9]')
    );
    

INSERT

  • The INSERT command adds a row to a table.

  • Examples:

    INSERT INTO state VALUES ('MN', 'Minnesota');
    INSERT INTO state VALUES ('WA', 'Washington');
    INSERT INTO state (name, abbr) VALUES ('Illinois', 'IL');
    INSERT INTO person VALUES (1, 'Dick', '3860', 'WA');
    INSERT INTO person VALUES (2, 'Sue' , '3098', 'IL');
    INSERT INTO person VALUES (3, 'Chris', '2997');
    INSERT INTO person (eid, name) VALUES (4, 'John');
    

  • These lines insert three rows in the table state, then two rows in the table person.

  • The values listed between parentheses must match the order of fields in the table schema by default. To override this default, insert a parenthesized list of field names between the table name and the keyword VALUES, as in the Illinois example above.

  • (Default values) The final two insertions above provide only three values; the fourth field in each case, state, will be initialized with the default value 'MN', assuming that we describe that field state as default 'MN'.

    In the last example, a second field phone is left unspecified. The value null is typically entered for unspecified fields without a default value. To insure that this will be the case, add default null to the description of that field phone when creating the table person.

Introduction to SELECT

  • SELECT is the general purpose data retrieval command in SQL. A command for retrieving data from a database is called a query. SELECT has many options and variants; we will consider only a few basic uses here.

  • Example:

    1. SELECT * FROM person;
      
      This retrieves all rows and columns from the table person. The following output represents the results:
       eid |           name            | phone | state
      -----+---------------------------+-------+-------
         1 | Dick                      | 3860  | WA
         2 | Sue                       | 3098  | IL
         3 | Chris                     | 2997  | MN
         4 | John                      |       | MN
      (4 rows)
      
      Assuming the insertions above, note that although the rows will have these contents, they may not necessarily appear in this order.

    2. SELECT name, phone FROM person;
      
      This selects only the two named columns from the table person. The operation of choosing only certain columns is called projection. Results:
                 name            | phone
      ---------------------------+-------
       Dick                      | 3860
       Sue                       | 3098
       Chris                     | 2997
       John                      |
      (4 rows)
      

    3. SELECT * FROM person WHERE state = 'MN';
      
      The WHERE clause in a SELECT command allows one to choose among the rows in a table by specifying a boolean criterion. The operation of choosing among rows is called selection (and obviously gives rise to the name of the command SELECT). Observe that a single = sign is used the equality predicate in SQL (unlike the == of C++). Results:
       eid |           name            | phone | state
      -----+---------------------------+-------+-------
         3 | Chris                     | 2997  | MN
         4 | John                      |       | MN
      (2 rows)                                           
      

    4. SELECT name, phone FROM person WHERE state = 'MN';
      
      Here, we combine projection and selection. Observe that the column(s) used in the WHERE condition need(s) not appear in the output. Results:
                 name            | phone
      ---------------------------+-------
       Chris                     | 2997
       John                      |
      (2 rows)                                                
      

    5. SELECT * FROM person ORDER BY phone;
      
      Here we choose an ordering for the output. Results:
       eid |           name            | phone | state
      -----+---------------------------+-------+-------
         3 | Chris                     | 2997  | MN
         2 | Sue                       | 3098  | IL
         1 | Dick                      | 3860  | WA
         4 | John                      |       | MN
      (4 rows)
      

    6. SELECT name, state FROM person ORDER BY phone;
      
      This example of ordering illustrates the fact that columns used in the ordering criteria need not appear in the output. Results:
                 name            | state
      ---------------------------+-------
       Chris                     | MN
       Sue                       | IL
       Dick                      | WA
       John                      | MN
      (4 rows)
      

    7. SELECT name, state FROM person ORDER BY state ASC, name DESC;
      
      This suggests some of the possibilities for ordering rows: we require that rows be ordered first by state in ascending order, then by name in descending order. (ASC isn't necessary here, because ascending order is the default.) String ordering is lexicographic (dictionary) order. Results:
                 name            | state
      ---------------------------+-------
       Sue                       | IL
       John                      | MN
       Chris                     | MN
       Dick                      | WA
      (4 rows)
      

    8. Joins

    9. SELECT p.name, s.name FROM person p, state s WHERE p.state = s.abbr;
      
      This query requests the person names and state names for each row in person. The FROM clause defines temporary names p and s for rows of the tables person and state (respectively), and the WHERE clause requires that the state column of row p is the same as the abbr column of row s. Results:
                 name            |      name
      ---------------------------+-----------------
       Sue                       | Illinois
       John                      | Minnesota
       Chris                     | Minnesota
       Dick                      | Washington
      (4 rows)
      

      This kind of query, which involves rows from two or more tables, is called a join operation. It is equivalent to asking the DBMS to create all pairs of rows (p, s) where p appears in person and s appears in state, then discarding any pairs that do not match the condition p.state = s.abbr, and finally projecting the columns p.name and s.name from the pairs that remain.

    10. SELECT person.name, state.name FROM person 
      JOIN state ON person.state = state.abbr;
      
      This query produces the same results as the previous query. The WHERE clause style of the previous query is understood by most DBMSs, but this form with a JOIN clause satisfies the ANSI standard.

      Also, we used aliases p and s for the table names person and state in the previous query, but in this query we use those table names directly without aliases.

    11. SELECT p.name FROM person p, state s WHERE p.state = s.abbr AND s.name ~ 'a';
      
      This query requests the names of all persons who come from states containing the letter a. Results:
                 name
      ---------------------------
       John
       Chris
       Dick
      (3 rows)
      

    12. SELECT p.name, s.name FROM person p 
      JOIN state s ON p.state = s.abbr WHERE s.name ~ 'a';
      

      This query returns the same results as the previous query, but uses the JOIN syntax for specifying the join instead of specifying the join as part of the WHERE clause constraints.

  • Join operations are very powerful, but may be computationally quite expensive.

UPDATE

  • The UPDATE command in SQL allows for modification of one or more records (table rows) after those records have been entered.

  • Examples:

    1. UPDATE person SET eid = 10 + eid;
      SELECT * FROM person;
      
      This command adds ten to each eid value in the table person. Results:
      UPDATE 4
       eid |           name            | phone | state
      -----+---------------------------+-------+-------
        14 | John                      |       | MN
        11 | Dick                      | 3860  | WA
        12 | Sue                       | 3098  | IL
        13 | Chris                     | 2997  | MN
      (4 rows)
      
      The first line, UPDATE 4, reports the number of successful UPDATE operations performed.

    2. UPDATE person SET phone = '3999' WHERE name = 'Sue';
      UPDATE person SET name = 'Richard' WHERE name = 'Dick';
      UPDATE person SET eid = 29, state = 'IL' WHERE eid = 13;
      SELECT * FROM person;
      
      These three UPDATE commands modify individual rows of the table person, using a WHERE clause to identify the row to be modified. Observe that the WHERE clause may or may not involve the field to be modified, and that multiple columns may be modified in a single UPDATE command. Results:
      UPDATE 1
      UPDATE 1
      UPDATE 1
       eid |           name            | phone | state
      -----+---------------------------+-------+-------
        14 | John                      |       | MN
        12 | Sue                       | 3999  | IL
        11 | Richard                   | 3860  | WA
        29 | Chris                     | 2997  | IL
      (4 rows)
      
      Recall that the order of rows in a table is not significant (unless there is an ORDER BY clause).

DELETE

  • To remove one or more rows from a table, use the DELETE command.

  • Examples:

    1. DELETE FROM person WHERE name = 'Richard';
      SELECT * FROM person;
      DELETE FROM person WHERE state = 'IL';
      SELECT * FROM person;
      DELETE FROM person;
      SELECT * FROM person;
      
      Results:
      DELETE 1
      
       eid |           name            | phone | state
      -----+---------------------------+-------+-------
        14 | John                      |       | MN
        12 | Sue                       | 3999  | IL
        29 | Chris                     | 2997  | IL
      (3 rows)
      
      DELETE 2
      
       eid |           name            | phone | state
      -----+---------------------------+-------+-------
        14 | John                      |       | MN
      (1 row)
      
      DELETE 1
      
       eid | name | phone | state
      -----+------+-------+-------
      (0 rows)
      

DROP

  • The DROP command removes an entire table. The table need not be empty before DROPping.

  • Examples:

    DROP TABLE person;
    DROP TABLE state;
    

File interface

  • _____

  • Example:

    _____
    
    _____

  • _____

  • _____

  • _____

  • _____

  • _____

  • _____

  • _____





rab@stolaf.edu, April 15, 2020