Introduction to SQL (CS 284 (CSA), Spring 2020)

Recall that a standard describes the expected features for a programming language. For example, the MIT Scheme interpreter used at St. Olaf satisfies the Scheme standard known as RSR5, except that the empty list () is returned by predicates instead of the standard false value #f. MIT Scheme also includes many extensions of RSR5, such as the predefined predicate list? (RSR5 only specifies null? and pair?). We do not consider extensions to violate a standard unless those extensions contradict some requirement in that standard.

Standards are developed by (often lengthy) deliberation by committees that have broad representation of (typically commercial) parties interested in those standards, e.g., producers of compilers and interpreters, vendors who create applications that will use the standard, etc.
The two most recent SQL standards are SQL92 and SQL99, produced in 1992 and 1999, respectively. Unlike Scheme and C++, actual DBMS products all differ from the standards in many respects, although basic forms of the core SQL commands are generally accepted by all of the DBMSs
Four example DBMSs are:
- Oracle, a leading commercial system with many features, known for its scalability and reliability, available on many computing platforms (i.e., a wide variety of hardware/software combinations). Oracle can be tuned to optimize performance for specific applications; the number of adjustable system parameters makes it necessary to have a DBA at most installations.
- SQL Server (pronounced "sequel server") is Microsoft's (commercial) DBMS, and runs only on Microsoft platforms. It is low in cost, high in performance, and easy to use, but far less tunable than Oracle.
- Postgres, the DBMS we will use for the course, is an open source system that is rich in features (though not as rich as Oracle) and adheres to the SQL standard more closely than any other non-commercial DBMS. It runs on many platforms and includes support for large-scale applications.
- MySQL is a widely used open source system known for its high performance. MySQL provides fewer features than the DBMSs above. For example, transactions (in which multiple database operations are performed as an indivisible unit) are not supported; some do not consider MySQL to be a true DBMS for this reason. MySQL is available on platforms including Linux.

Types and values

Types available illustrate some of the ways that DBMSs differ from the SQL standards. The table below shows some type names specified in SQL99 and corresponding type names actually used in Oracle and Postgres. Type names in this font are extensions of the standard.

SQL99 standard	Postgres	Oracle	Note
`character(n)`	`char(n)`	`char(n)` or `character(n)`	string of fixed length, padded with spaces
`character varying(n)`	`varchar(n)`, `text`	`varchar`, `long`	Up to n characters, not padded by spaces. Postgres `text` and Oracle `varchar` does not specify n; Oracle `long allows for up to 2GB strings`
`national character varying(n)`		`national character varying(n)`	Unicode strings
`integer`	`int2`, `int4`, `int8`	`integer(n)`, `smallint`	2,4,8,n indicate size in bytes
`float(p,s)`	`float4`, `float8`	`float`	Standard calls for `p`recision and `s`ize specification
`boolean`	`bool`
`temporal`	`date`, `time`, `timespan`, `interval`, `timestamp`, `timetz`, `datetime`	`date`	Postgres `timespan` includes two times, `interval` is a "delta time", `timetz` includes time zone, etc.
`binary`		`BLOB`, `bfile`, `CLOB`, `raw`, `long raw`	For representing binary data such as images, large quantities of text, etc.; BLOB is "binary large object", CLOB is "character large object; Oracle `raw` is up to 2K binary, `long raw` up to 4G

The value NULL is available for all types, and indicates a missing value in a table field.
Strings are delimited using ' or `, not ". Example: 'hello'. Double the string delimiter to include it in a string, e.g., 'Who''s on first?'.

SQL Statements

All SQL statements are terminated with a semicolon ;. Two dashes -- introduce a comment; /* ... */ comments are also available.

We will consider a database implementation of a simplified telephone-address book as a running example, e.g., perhaps representing the address by only a state field.

Names of commands, fields, types, etc., are all case insensitive. We will use all caps for SQL keywords (e.g., parts of commands) and use all lower case for (user-selected) names of tables, fields, etc. Of course, the case of characters appearing within strings is preserved, e.g., 'Hello'.

All information is for Postgres; see the Postgres reference manual for actual specifications and more information.

`CREATE TABLE`

The SQL CREATE TABLE command defines (logical) relations. Besides specifying the names and types of columns in a table, CREATE TABLE also provides options for using or constraining the data placed in a table.

Examples:

CREATE TABLE state (
       abbr char(2) primary key, -- abbreviation
       name char(15)
);

CREATE TABLE person (
       eid int2 primary key, -- entry id
       name char(25),
       phone char(4), -- campus phone
       state char(2) references state(abbr)
);

The two example CREATE TABLE statements specify the schema for relations state and person.
Each line within these examples above describes a field in a table, consisting of a field name and a type, optionally followed by qualifying information for that field; field descriptions are separated (not terminated) by commas.
The qualifications primary key and references state(abbr) are called table constraints. Constraints are checked by the DBMS when a new row is inserted in a table, when a row is modified, etc.
Primary key constraints specify that the indicated field is the main key; there may be other keys (perhaps involving multiple columns), but only one can be designated "primary." The DBMS will automatically check that all keys in a table are distinct; for example, two rows in the table state may not have the same abbr value.
The constraint references state(abbr) is called a foreign key constraint. This constraint indicates that the field state of the table person should appear among the keys abbr in the table state. The DBMS will not allow us to add a row to person that does not satisfy this condition.
We chose the type int2 for eid because a two-byte integer (range from -32,768 to +32,767) should suffice for this application. We chose the type char(4) for the campus phone instead of int2 because we would like a campus phone number such as 0767 to be treated as a four-character string, not a 3-digit numerical value 767.

More on `CREATE TABLE`

"Added key fields."
- In person, the sole purpose of the field eid is to provide a one-field key. We will refer to such a field as an added key field. Without eid, the shortest key would consist of all three fields name, phone and state, since there might, for example, be two people with the same phone number, or with the same name but different states, or two phone numbers for a single person.
- Added key fields may be necessary for some applications. For example, in the person example, if roommates from the same state had the same name there would be no way to distinguish them in person without eid.
- An added key field was not necessary in the table state, since the two-character abbreviations MN, IL, etc., already determine state names uniquely.
- The choice whether to add a key field represents a tradeoff: an added key field may be convenient for refering to individual records in an application, even if that added field is unnecessary; but when another field already serves as a key, an added key field would clutter a database with redundancy.
The CREATE TABLE command has many options not illustrated by these examples. Here are some constraints that could be added as additional lines within the CREATE TABLE command (preceded by comma separators):
- unique(name, phone) would cause the DBMS to check that a new row (record) has a different combination of name and phone number than every other row in the table person. Because of this uniqueness, this pair of columns could serve as an key (in addition to eid). Of course, this uniqueness constraint would not be a good idea if two persons might conceivably have the same name and phone number.
- check phone ~ '[0-9][0-9][0-9][0-9]' will cause the DBMS to verify that the phone field of person is actually four digits, instead of any four characters. The string '[0-9][0-9][0-9][0-9]' is a regular expression in Postgres, specifying a pattern to be matched, and the operator ~ is the pattern matching operator in Postgres.
  Note. The operator ~ is not standard SQL; neither is the regular expression above (although that form of regular expression is standard in UNIX). Standard SQL provides a like operator and a different form of regular expressions, but standard SQL regular expressions cannot specify a pattern of four digits.
- Other constraints are available, as well. For example, adding not null in the description of a field will cause the DBMS to verify that field is not assigned the special value null.
Other qualifications are available besides table constraints. For example,
```
       state char(2) default 'MN'
```
would specify a default value for the state attribute in new rows of person. If this is combined with the foreign key constraint references state(abbr), we would have to make sure that MN appears in the abbr field of some record in the table state before using that default value in a new record of person.

Here is a definition of person that uses these additional constraints and the default value for the field state.

CREATE TABLE person (
       eid int2 primary key, -- entry id
       name char(25) not null,
       phone char(4), -- campus phone
       state char(2) references state(abbr) default 'MN',
       unique(name, phone),
       check (phone ~ '[0-9][0-9][0-9][0-9]')
);

`INSERT`

The INSERT command adds a row to a table.

Examples:

INSERT INTO state VALUES ('MN', 'Minnesota');
INSERT INTO state VALUES ('WA', 'Washington');
INSERT INTO state (name, abbr) VALUES ('Illinois', 'IL');
INSERT INTO person VALUES (1, 'Dick', '3860', 'WA');
INSERT INTO person VALUES (2, 'Sue' , '3098', 'IL');
INSERT INTO person VALUES (3, 'Chris', '2997');
INSERT INTO person (eid, name) VALUES (4, 'John');

These lines insert three rows in the table state, then two rows in the table person.
The values listed between parentheses must match the order of fields in the table schema by default. To override this default, insert a parenthesized list of field names between the table name and the keyword VALUES, as in the Illinois example above.
(Default values) The final two insertions above provide only three values; the fourth field in each case, state, will be initialized with the default value 'MN', assuming that we describe that field state as default 'MN'.
In the last example, a second field phone is left unspecified. The value null is typically entered for unspecified fields without a default value. To insure that this will be the case, add default null to the description of that field phone when creating the table person.

Introduction to `SELECT`

SELECT is the general purpose data retrieval command in SQL. A command for retrieving data from a database is called a query. SELECT has many options and variants; we will consider only a few basic uses here.

Example:

SELECT * FROM person;

This retrieves all rows and columns from the table person. The following output represents the results:

 eid |           name            | phone | state
-----+---------------------------+-------+-------
   1 | Dick                      | 3860  | WA
   2 | Sue                       | 3098  | IL
   3 | Chris                     | 2997  | MN
   4 | John                      |       | MN
(4 rows)

Assuming the insertions above, note that although the rows will have these contents, they may not necessarily appear in this order.

SELECT name, phone FROM person;

This selects only the two named columns from the table person. The operation of choosing only certain columns is called projection. Results:

           name            | phone
---------------------------+-------
 Dick                      | 3860
 Sue                       | 3098
 Chris                     | 2997
 John                      |
(4 rows)

SELECT * FROM person WHERE state = 'MN';

The WHERE clause in a SELECT command allows one to choose among the rows in a table by specifying a boolean criterion. The operation of choosing among rows is called selection (and obviously gives rise to the name of the command SELECT). Observe that a single = sign is used the equality predicate in SQL (unlike the == of C++). Results:

 eid |           name            | phone | state
-----+---------------------------+-------+-------
   3 | Chris                     | 2997  | MN
   4 | John                      |       | MN
(2 rows)

SELECT name, phone FROM person WHERE state = 'MN';

Here, we combine projection and selection. Observe that the column(s) used in the WHERE condition need(s) not appear in the output. Results:

           name            | phone
---------------------------+-------
 Chris                     | 2997
 John                      |
(2 rows)

SELECT * FROM person ORDER BY phone;

Here we choose an ordering for the output. Results:

 eid |           name            | phone | state
-----+---------------------------+-------+-------
   3 | Chris                     | 2997  | MN
   2 | Sue                       | 3098  | IL
   1 | Dick                      | 3860  | WA
   4 | John                      |       | MN
(4 rows)

SELECT name, state FROM person ORDER BY phone;

This example of ordering illustrates the fact that columns used in the ordering criteria need not appear in the output. Results:

           name            | state
---------------------------+-------
 Chris                     | MN
 Sue                       | IL
 Dick                      | WA
 John                      | MN
(4 rows)

```
SELECT name, state FROM person ORDER BY state ASC, name DESC;
```
This suggests some of the possibilities for ordering rows: we require that rows be ordered first by state in ascending order, then by name in descending order. (ASC isn't necessary here, because ascending order is the default.) String ordering is lexicographic (dictionary) order. Results:
```
           name            | state
---------------------------+-------
 Sue                       | IL
 John                      | MN
 Chris                     | MN
 Dick                      | WA
(4 rows)
```

Joins

```
SELECT p.name, s.name FROM person p, state s WHERE p.state = s.abbr;
```
This query requests the person names and state names for each row in person. The FROM clause defines temporary names p and s for rows of the tables person and state (respectively), and the WHERE clause requires that the state column of row p is the same as the abbr column of row s. Results:
```
           name            |      name
---------------------------+-----------------
 Sue                       | Illinois
 John                      | Minnesota
 Chris                     | Minnesota
 Dick                      | Washington
(4 rows)
```
This kind of query, which involves rows from two or more tables, is called a join operation. It is equivalent to asking the DBMS to create all pairs of rows (p, s) where p appears in person and s appears in state, then discarding any pairs that do not match the condition p.state = s.abbr, and finally projecting the columns p.name and s.name from the pairs that remain.
```
SELECT person.name, state.name FROM person 
JOIN state ON person.state = state.abbr;
```
This query produces the same results as the previous query. The WHERE clause style of the previous query is understood by most DBMSs, but this form with a JOIN clause satisfies the ANSI standard.

Also, we used aliases p and s for the table names person and state in the previous query, but in this query we use those table names directly without aliases.

SELECT p.name FROM person p, state s WHERE p.state = s.abbr AND s.name ~ 'a';

This query requests the names of all persons who come from states containing the letter a. Results:

           name
---------------------------
 John
 Chris
 Dick
(3 rows)

```
SELECT p.name, s.name FROM person p 
JOIN state s ON p.state = s.abbr WHERE s.name ~ 'a';
```
This query returns the same results as the previous query, but uses the JOIN syntax for specifying the join instead of specifying the join as part of the WHERE clause constraints.

Join operations are very powerful, but may be computationally quite expensive.

`UPDATE`

The UPDATE command in SQL allows for modification of one or more records (table rows) after those records have been entered.

Examples:

UPDATE person SET eid = 10 + eid;
SELECT * FROM person;

This command adds ten to each eid value in the table person. Results:

UPDATE 4
 eid |           name            | phone | state
-----+---------------------------+-------+-------
  14 | John                      |       | MN
  11 | Dick                      | 3860  | WA
  12 | Sue                       | 3098  | IL
  13 | Chris                     | 2997  | MN
(4 rows)

The first line, UPDATE 4, reports the number of successful UPDATE operations performed.

UPDATE person SET phone = '3999' WHERE name = 'Sue';
UPDATE person SET name = 'Richard' WHERE name = 'Dick';
UPDATE person SET eid = 29, state = 'IL' WHERE eid = 13;
SELECT * FROM person;

These three UPDATE commands modify individual rows of the table person, using a WHERE clause to identify the row to be modified. Observe that the WHERE clause may or may not involve the field to be modified, and that multiple columns may be modified in a single UPDATE command. Results:

UPDATE 1
UPDATE 1
UPDATE 1
 eid |           name            | phone | state
-----+---------------------------+-------+-------
  14 | John                      |       | MN
  12 | Sue                       | 3999  | IL
  11 | Richard                   | 3860  | WA
  29 | Chris                     | 2997  | IL
(4 rows)

Recall that the order of rows in a table is not significant (unless there is an ORDER BY clause).

`DELETE`

To remove one or more rows from a table, use the DELETE command.

Examples:

DELETE FROM person WHERE name = 'Richard';
SELECT * FROM person;
DELETE FROM person WHERE state = 'IL';
SELECT * FROM person;
DELETE FROM person;
SELECT * FROM person;

Results:

DELETE 1

 eid |           name            | phone | state
-----+---------------------------+-------+-------
  14 | John                      |       | MN
  12 | Sue                       | 3999  | IL
  29 | Chris                     | 2997  | IL
(3 rows)

DELETE 2

 eid |           name            | phone | state
-----+---------------------------+-------+-------
  14 | John                      |       | MN
(1 row)

DELETE 1

 eid | name | phone | state
-----+------+-------+-------
(0 rows)

`DROP`

The DROP command removes an entire table. The table need not be empty before DROPping.
Examples:
```
DROP TABLE person;
DROP TABLE state;
```

File interface

_____
Example:
```
_____
```
_____
_____
_____
_____
_____
_____
_____
_____

rab@stolaf.edu, April 15, 2020

Table of contents

Standards and DBMSs

Types and values

SQL Statements

CREATE TABLE

More on CREATE TABLE

INSERT

Introduction to SELECT

Joins

UPDATE

DELETE

DROP

File interface

`CREATE TABLE`

More on `CREATE TABLE`

`INSERT`

Introduction to `SELECT`

`UPDATE`

`DELETE`

`DROP`