Introduction to SQL
CS 284 (CSA), Spring 2005
Recall that a standard describes the expected features
for a programming language. For example, the MIT Scheme interpreter
used at St. Olaf satisfies the Scheme standard known as RSR5, except
that the empty list ()
is returned by predicates instead
of the standard false value #f
. MIT Scheme also includes
many extensions of RSR5, such as the predefined predicate
list?
(RSR5 only specifies null?
and
pair?
). We do not consider extensions to violate a
standard unless those extensions contradict some requirement
in that standard.
Standards are developed by (often lengthy) deliberation by committees that have broad representation of (typically commercial) parties interested in those standards, e.g., producers of compilers and interpreters, vendors who create applications that will use the standard, etc.
The two most recent SQL standards are SQL92 and SQL99, produced in 1992 and 1999, respectively. Unlike Scheme and C++, actual DBMS products all differ from the standards in many respects, although basic forms of the core SQL commands are generally accepted by all of the DBMSs
Four example DBMSs are:
Oracle, a leading commercial system with many features, known for its scalability and reliability, available on many computing platforms (i.e., a wide variety of hardware/software combinations). Oracle can be tuned to optimize performance for specific applications; the number of adjustable system parameters makes it necessary to have a DBA at most installations.
SQL Server (pronounced "sequel server") is Microsoft's (commercial) DBMS, and runs only on Microsoft platforms. It is low in cost, high in performance, and easy to use, but far less tunable than Oracle.
Postgres, the DBMS we will use for the course, is an open source system that is rich in features (though not as rich as Oracle) and adheres to the SQL standard more closely than any other non-commercial DBMS. It runs on many platforms and includes support for large-scale applications.
MySQL is a widely used open source system known for its high performance. MySQL provides fewer features than the DBMSs above. For example, transactions (in which multiple database operations are performed as an indivisible unit) are not supported; some do not consider MySQL to be a true DBMS for this reason. MySQL is available on platforms including Linux.
Types available illustrate some of the ways that DBMSs differ
from the SQL standards. The table below shows some type names specified
in SQL99 and corresponding type names actually used in Oracle and
Postgres. Type names in this font
are
extensions of the standard.
SQL99 standard | Postgres | Oracle | Note |
character(n) | char(n) | char(n)
or character(n) | string of fixed length, padded with spaces |
character
varying(n) | varchar(n) , text | varchar ,
long | Up to n characters, not padded by
spaces. Postgres text and Oracle
varchar does not specify n; Oracle long allows for up
to 2GB strings |
national character
varying(n) | | national
character varying(n) | Unicode strings |
integer | int2 ,
int4 , int8 | integer(n) ,
smallint | 2,4,8,n indicate size in bytes |
float(p,s) | float4 ,
float8 | float | Standard calls
for p recision and s ize specification |
boolean | bool | | |
temporal | date ,
time , timespan ,
interval , timestamp ,
timetz ,
datetime | date | Postgres
timespan includes two times,
interval is a "delta time",
timetz includes time zone, etc. |
binary | | BLOB ,
bfile , CLOB ,
raw , long raw | For
representing binary data such as images, large quantities of text,
etc.; BLOB
is "binary large object", CLOB is "character large object; Oracle
raw is up to 2K binary, long
raw up to 4G |
The value NULL
is available for all types, and
indicates a missing value in a table field.
Strings are delimited using '
or `
,
not "
. Example: 'hello'
. Double the
string delimiter to include it in a string, e.g., 'Who''s on first?'
.
All SQL statements are terminated with a semicolon
;
. Two dashes --
introduce a comment;
/* ... */
comments are also available.
We will consider a database implementation of a simplified telephone-address book as a running example, e.g., perhaps representing the address by only a state field.
Names of commands, fields, types, etc., are all case
insensitive. We will use all caps for SQL keywords (e.g.,
parts of commands) and use all lower case for (user-selected) names of
tables, fields, etc. Of course, the case of characters appearing
within strings is preserved, e.g., 'Hello'
.
All information is for Postgres; see the Postgres reference manual for actual specifications and more information.
CREATE TABLE
The SQL CREATE TABLE
command defines (logical)
relations. Besides specifying the names and types of columns in a
table, CREATE TABLE
also provides options for
using or constraining the data placed in a table.
Examples:
CREATE TABLE state ( abbr char(2) primary key, -- abbreviation name char(15) ); CREATE TABLE person ( eid int2 primary key, -- entry id name char(25), phone char(4), -- campus phone state char(2) references state(abbr) );
The two example CREATE TABLE
statements specify
the schema for
relations state
and person
.
Each line within these examples above describes a field in a table, consisting of a field name and a type, optionally followed by qualifying information for that field; field descriptions are separated (not terminated) by commas.
The qualifications primary key
and
references state(abbr)
are called table
constraints. Constraints are checked by the DBMS when a new row
is inserted in a table, when a row is modified, etc.
Primary key
constraints specify that
the indicated field is the main key; there may be other keys (perhaps
involving multiple columns), but only one can be designated "primary."
The DBMS will automatically check that all keys in a table are
distinct; for example, two rows in the table state
may
not have the same abbr
value.
The constraint references state(abbr)
is called
a foreign key constraint. This constraint indicates that the
field state
of the table person
should
appear among the keys abbr
in the table
state
. The DBMS will not allow us to add a row to
person
that does not satisfy this condition.
We chose the type int2
for eid
because a two-byte integer (range from -32,768 to +32,767) should
suffice for this application. We chose the type char(4)
for the campus phone instead of int2
because we would
like a campus phone number such as 0767
to be treated as a
four-character string, not a 3-digit numerical value 767.
CREATE TABLE
"Added key fields."
In person
, the sole purpose of the field
eid
is to provide a one-field key. We will refer to such
a field as an added key field. Without eid
, the
shortest key would consist of all three fields name
,
phone
and state
, since there might, for
example, be two people with the same phone number, or with the same
name but different states, or two phone numbers for a single person.
Added key fields may be necessary for some applications. For example,
in the person
example, if roommates from the same state
had the same name there
would be no way to distinguish them in person
without
eid
.
An added key field was not necessary in the table
state
, since the two-character abbreviations
MN
, IL
, etc., already determine state names
uniquely.
The choice whether to add a key field represents a tradeoff: an added key field may be convenient for refering to individual records in an application, even if that added field is unnecessary; but when another field already serves as a key, an added key field would clutter a database with redundancy.
The CREATE TABLE
command has many options not
illustrated by these examples. Here are some constraints that could
be added as additional lines within the CREATE TABLE
command (preceded by comma separators):
unique(name, phone)
would cause the DBMS to
check that a new row (record)
has a different combination of name and phone number than every other
row in the table person
.
Because of this uniqueness, this pair of columns could serve as an
key (in addition to eid
). Of
course, this uniqueness constraint
would not be a good idea if two persons might conceivably have the same
name and phone number.
check phone ~ '[0-9][0-9][0-9][0-9]'
will cause
the DBMS to verify that the phone
field of
person
is actually four digits, instead of any four
characters. The string '[0-9][0-9][0-9][0-9]'
is a
regular expression in Postgres, specifying a pattern to be
matched, and the operator ~
is the pattern matching operator in
Postgres.
Note. The operator ~
is not
standard SQL; neither is the regular expression above (although that
form of regular expression is standard in UNIX).
Standard SQL provides a like
operator and a different
form of regular expressions, but standard SQL regular expressions
cannot specify a pattern of four digits.
Other constraints are available, as well. For example, adding
not null
in the description of a field will cause the
DBMS to verify that field is not assigned the special value
null
.
Other qualifications are available besides table constraints. For example,
state char(2) default 'MN'would specify a default value for the
state
attribute in new rows of person
. If this is combined
with the foreign key constraint references state(abbr)
,
we would have to make sure that MN
appears in the
abbr
field of some record in the table state
before using that default value in a new record of person
.
Here is a definition of person
that uses these
additional constraints and the default
value for the
field state
.
CREATE TABLE person ( eid int2 primary key, -- entry id name char(25) not null, phone char(4), -- campus phone state char(2) references state(abbr) default 'MN', unique(name, phone), check (phone ~ '[0-9][0-9][0-9][0-9]') );
INSERT
The INSERT
command adds a row to a table.
Examples:
INSERT INTO state VALUES ('MN', 'Minnesota'); INSERT INTO state VALUES ('WA', 'Washington'); INSERT INTO state (name, abbr) VALUES ('Illinois', 'IL'); INSERT INTO person VALUES (1, 'Dick', '3860', 'WA'); INSERT INTO person VALUES (2, 'Sue' , '3098', 'IL'); INSERT INTO person VALUES (3, 'Chris', '2997'); INSERT INTO person (eid, name) VALUES (4, 'John');
These lines insert three rows in the table state
,
then two rows in the table person
.
The values listed
between parentheses must match the order of fields in the table schema
by default. To override this default, insert a parenthesized list of
field names between the table name and the keyword
VALUES
, as in the Illinois example above.
(Default values) The final two insertions above provide only three values; the
fourth field in each case, state
, will be initialized
with the default value 'MN'
, assuming that we describe
that field state
as default 'MN'
.
In the
last example, a second field phone
is left unspecified.
The value null
is typically entered for unspecified
fields without a default value. To insure that this will be the case,
add default null
to the description of that field
phone
when creating the table person
.
SELECT
SELECT
is the general purpose data retrieval
command in SQL. A command for retrieving data from a database is
called a query.
SELECT
has many options and variants; we will consider
only a few basic uses here.
Example:
SELECT * FROM person;This retrieves all rows and columns from the table
person
. The following output represents the results:
eid | name | phone | state -----+---------------------------+-------+------- 1 | Dick | 3860 | WA 2 | Sue | 3098 | IL 3 | Chris | 2997 | MN 4 | John | | MN (4 rows)Assuming the insertions above, note that although the rows will have these contents, they may not necessarily appear in this order.
SELECT name, phone FROM person;This selects only the two named columns from the table
person
. The operation of choosing only certain columns
is called projection.
Results: name | phone ---------------------------+------- Dick | 3860 Sue | 3098 Chris | 2997 John | (4 rows)
SELECT * FROM person WHERE state = 'MN';The
WHERE
clause in a SELECT
command allows
one to choose among the rows in a table by specifying a boolean
criterion. The operation of choosing among rows is called
selection (and obviously gives rise to the name of the
command SELECT
). Observe that a single
=
sign is used the equality predicate in SQL (unlike the
==
of C++).
Results: eid | name | phone | state -----+---------------------------+-------+------- 3 | Chris | 2997 | MN 4 | John | | MN (2 rows)
SELECT name, phone FROM person WHERE state = 'MN';Here, we combine projection and selection. Observe that the column(s) used in the
WHERE
condition need(s) not appear in the output.
Results: name | phone ---------------------------+------- Chris | 2997 John | (2 rows)
SELECT * FROM person ORDER BY phone;Here we choose an ordering for the output. Results:
eid | name | phone | state -----+---------------------------+-------+------- 3 | Chris | 2997 | MN 2 | Sue | 3098 | IL 1 | Dick | 3860 | WA 4 | John | | MN (4 rows)
SELECT name, state FROM person ORDER BY phone;This example of ordering illustrates the fact that columns used in the ordering criteria need not appear in the output. Results:
name | state ---------------------------+------- Chris | MN Sue | IL Dick | WA John | MN (4 rows)
SELECT name, state FROM person ORDER BY state ASC, name DESC;This suggests some of the possibilities for ordering rows: we require that rows be ordered first by state in ascending order, then by name in descending order. (
ASC
isn't necessary here, because
ascending order is the default.) String ordering is lexicographic
(dictionary) order.
Results: name | state ---------------------------+------- Sue | IL John | MN Chris | MN Dick | WA (4 rows)
SELECT p.name, s.name FROM person p, state s WHERE p.state = s.abbr;This query requests the person names and state names for each row in
person
. The FROM
clause defines temporary
names p
and s
for rows of the
tables person
and state
(respectively), and
the WHERE
clause requires that the state
column of row p
is the same as the abbr
column of row s
.
Results: name | name ---------------------------+----------------- Sue | Illinois John | Minnesota Chris | Minnesota Dick | Washington (4 rows)
This kind of query, which involves rows from two or more tables, is
called a join operation. It is equivalent to asking the DBMS
to create all pairs of rows (p
, s
)
where p
appears in person
and s
appears in state
, then discarding any pairs that do not
match the condition p.state = s.abbr
, and finally
projecting the columns p.name
and s.name
from the pairs that remain.
SELECT person.name, state.name FROM person JOIN state ON person.state = state.abbr;This query produces the same results as the previous query. The
WHERE
clause style of the previous query is understood by most
DBMSs, but this form with a JOIN
clause satisfies the ANSI
standard.
Also, we used aliases p
and s
for the table names person
and state
in the
previous query, but in this query we use those table names directly
without aliases.
SELECT p.name FROM person p, state s WHERE p.state = s.abbr AND s.name ~ 'a';This query requests the names of all persons who come from states containing the letter
a
.
Results: name --------------------------- John Chris Dick (3 rows)
SELECT p.name, s.name FROM person p JOIN state s ON p.state = s.abbr WHERE s.name ~ 'a';
This query returns the same results as the previous query,
but uses the JOIN
syntax for specifying the join instead of
specifying the join as part of the WHERE
clause constraints.
Join operations are very powerful, but may be computationally quite expensive.
UPDATE
The UPDATE command in SQL allows for modification of one or more records (table rows) after those records have been entered.
Examples:
UPDATE person SET eid = 10 + eid; SELECT * FROM person;This command adds ten to each
eid
value in the table
person
.
Results: UPDATE 4 eid | name | phone | state -----+---------------------------+-------+------- 14 | John | | MN 11 | Dick | 3860 | WA 12 | Sue | 3098 | IL 13 | Chris | 2997 | MN (4 rows)The first line,
UPDATE 4
, reports the number of
successful UPDATE
operations performed. UPDATE person SET phone = '3999' WHERE name = 'Sue'; UPDATE person SET name = 'Richard' WHERE name = 'Dick'; UPDATE person SET eid = 29, state = 'IL' WHERE eid = 13; SELECT * FROM person;These three
UPDATE
commands modify individual rows of the
table person
, using a WHERE
clause to
identify the row to be modified. Observe that the WHERE
clause may or may not involve the field to be modified, and that
multiple columns may be modified in a single UPDATE
command.
Results: UPDATE 1 UPDATE 1 UPDATE 1 eid | name | phone | state -----+---------------------------+-------+------- 14 | John | | MN 12 | Sue | 3999 | IL 11 | Richard | 3860 | WA 29 | Chris | 2997 | IL (4 rows)Recall that the order of rows in a table is not significant (unless there is an
ORDER BY
clause).DELETE
To remove one or more rows from a table, use the
DELETE
command.
Examples:
DELETE FROM person WHERE name = 'Richard'; SELECT * FROM person; DELETE FROM person WHERE state = 'IL'; SELECT * FROM person; DELETE FROM person; SELECT * FROM person;Results:
DELETE 1 eid | name | phone | state -----+---------------------------+-------+------- 14 | John | | MN 12 | Sue | 3999 | IL 29 | Chris | 2997 | IL (3 rows) DELETE 2 eid | name | phone | state -----+---------------------------+-------+------- 14 | John | | MN (1 row) DELETE 1 eid | name | phone | state -----+------+-------+------- (0 rows)
DROP
The DROP
command removes an entire table. The
table need not be empty before DROP
ping.
Examples:
DROP TABLE person; DROP TABLE state;
_____
Example:
__________
_____
_____
_____
_____
_____
_____
_____
rab@stolaf.edu, October 09, 2006