Links
|
Introduction to SQL
CS 284 (CSA), Spring 2020
Table of contents
Standards and DBMSs
Recall that a standard describes the expected features
for a programming language. For example, the MIT Scheme interpreter
used at St. Olaf satisfies the Scheme standard known as RSR5, except
that the empty list () is returned by predicates instead
of the standard false value #f . MIT Scheme also includes
many extensions of RSR5, such as the predefined predicate
list? (RSR5 only specifies null? and
pair? ). We do not consider extensions to violate a
standard unless those extensions contradict some requirement
in that standard.
Standards are developed by (often lengthy) deliberation by
committees that have broad
representation of (typically commercial) parties interested in those
standards, e.g., producers of compilers and interpreters, vendors who
create applications that will use the standard, etc.
The two most recent SQL standards are SQL92 and SQL99, produced
in 1992 and 1999, respectively. Unlike Scheme and C++, actual DBMS
products all differ from the standards in many respects,
although basic forms of the core SQL commands are generally accepted
by all of the DBMSs
Four example DBMSs are:
Oracle, a leading commercial system with many
features, known for its scalability and reliability, available on
many computing platforms (i.e., a wide variety of hardware/software
combinations). Oracle can be tuned to optimize performance
for specific applications; the number of adjustable system parameters
makes it necessary to have a DBA at most installations.
SQL Server (pronounced "sequel server") is Microsoft's
(commercial) DBMS, and runs only on Microsoft platforms. It is low
in cost, high in performance, and easy to use, but far less tunable
than Oracle.
Postgres, the DBMS we will use for the course, is an
open source system that is rich in features (though not as rich as
Oracle) and adheres to the SQL standard more closely than any other
non-commercial DBMS. It runs on many platforms and includes support
for large-scale applications.
MySQL is a widely used open source system known for
its high performance. MySQL provides fewer features than the
DBMSs above. For example, transactions (in which multiple
database operations are performed as an indivisible unit) are not
supported; some do not consider MySQL to be a true DBMS for this
reason. MySQL is available on platforms including Linux.
Types and values
Types available illustrate some of the ways that DBMSs differ
from the SQL standards. The table below shows some type names specified
in SQL99 and corresponding type names actually used in Oracle and
Postgres. Type names in this font are
extensions of the standard.
SQL99
standard | Postgres | Oracle | Note |
character(n) | char(n) | char(n)
or character(n) | string of fixed length, padded
with spaces |
character
varying(n) | varchar(n) , text | varchar ,
long | Up to n characters, not padded by
spaces. Postgres text and Oracle
varchar does not specify n; Oracle long allows for up
to 2GB strings |
national character
varying(n) | | national
character varying(n) | Unicode strings |
integer | int2 ,
int4 , int8 | integer(n) ,
smallint | 2,4,8,n indicate size in bytes |
float(p,s) | float4 ,
float8 | float | Standard calls
for p recision and s ize specification |
boolean | bool | | |
temporal | date ,
time , timespan ,
interval , timestamp ,
timetz ,
datetime | date | Postgres
timespan includes two times,
interval is a "delta time",
timetz includes time zone, etc. |
binary | | BLOB ,
bfile , CLOB ,
raw , long raw | For
representing binary data such as images, large quantities of text,
etc.; BLOB
is "binary large object", CLOB is "character large object; Oracle
raw is up to 2K binary, long
raw up to 4G |
The value NULL is available for all types, and
indicates a missing value in a table field.
Strings are delimited using ' or ` ,
not " . Example: 'hello' . Double the
string delimiter to include it in a string, e.g., 'Who''s on first?' .
SQL Statements
All SQL statements are terminated with a semicolon
; . Two dashes -- introduce a comment;
/* ... */ comments are also available.
We will consider a database implementation of a simplified
telephone-address book as a running example, e.g., perhaps representing the
address by only a state field.
Names of commands, fields, types, etc., are all case
insensitive. We will use all caps for SQL keywords (e.g.,
parts of commands) and use all lower case for (user-selected) names of
tables, fields, etc. Of course, the case of characters appearing
within strings is preserved, e.g., 'Hello' .
All information is for Postgres; see the
Postgres
reference
manual for actual specifications and more information.
CREATE TABLE
The SQL CREATE TABLE command defines (logical)
relations. Besides specifying the names and types of columns in a
table, CREATE TABLE also provides options for
using or constraining the data placed in a table.
Examples: CREATE TABLE state (
abbr char(2) primary key, -- abbreviation
name char(15)
);
CREATE TABLE person (
eid int2 primary key, -- entry id
name char(25),
phone char(4), -- campus phone
state char(2) references state(abbr)
);
The two example CREATE TABLE statements specify
the schema for
relations state and person .
Each
line within these examples above describes a field in a table,
consisting of a field name and a type, optionally followed by
qualifying information for that field; field descriptions are
separated (not terminated) by commas.
The qualifications primary key and
references state(abbr) are called table
constraints. Constraints are checked by the DBMS when a new row
is inserted in a table, when a row is modified, etc.
Primary key constraints specify that
the indicated field is the main key; there may be other keys (perhaps
involving multiple columns), but only one can be designated "primary."
The DBMS will automatically check that all keys in a table are
distinct; for example, two rows in the table state may
not have the same abbr value.
The constraint references state(abbr) is called
a foreign key constraint. This constraint indicates that the
field state of the table person should
appear among the keys abbr in the table
state . The DBMS will not allow us to add a row to
person that does not satisfy this condition.
We chose the type int2 for eid
because a two-byte integer (range from -32,768 to +32,767) should
suffice for this application. We chose the type char(4)
for the campus phone instead of int2 because we would
like a campus phone number such as 0767 to be treated as a
four-character string, not a 3-digit numerical value 767.
More on CREATE TABLE
"Added key fields."
In person , the sole purpose of the field
eid is to provide a one-field key. We will refer to such
a field as an added key field. Without eid , the
shortest key would consist of all three fields name ,
phone and state , since there might, for
example, be two people with the same phone number, or with the same
name but different states, or two phone numbers for a single person.
Added key fields may be necessary for some applications. For example,
in the person example, if roommates from the same state
had the same name there
would be no way to distinguish them in person without
eid .
-
An added key field was not necessary in the table
state , since the two-character abbreviations
MN , IL , etc., already determine state names
uniquely.
The choice whether to add a key field represents a
tradeoff: an added key field may be convenient for refering
to individual records in an application, even if that added field is
unnecessary; but when another field already serves as a key, an added
key field would clutter a database with redundancy.
The CREATE TABLE command has many options not
illustrated by these examples. Here are some constraints that could
be added as additional lines within the CREATE TABLE
command (preceded by comma separators):
unique(name, phone) would cause the DBMS to
check that a new row (record)
has a different combination of name and phone number than every other
row in the table person .
Because of this uniqueness, this pair of columns could serve as an
key (in addition to eid ). Of
course, this uniqueness constraint
would not be a good idea if two persons might conceivably have the same
name and phone number.
check phone ~ '[0-9][0-9][0-9][0-9]' will cause
the DBMS to verify that the phone field of
person is actually four digits, instead of any four
characters. The string '[0-9][0-9][0-9][0-9]' is a
regular expression in Postgres, specifying a pattern to be
matched, and the operator ~ is the pattern matching operator in
Postgres.
Note. The operator ~ is not
standard SQL; neither is the regular expression above (although that
form of regular expression is standard in UNIX).
Standard SQL provides a like operator and a different
form of regular expressions, but standard SQL regular expressions
cannot specify a pattern of four digits.
Other constraints are available, as well. For example, adding
not null in the description of a field will cause the
DBMS to verify that field is not assigned the special value
null .
Other qualifications are available besides table constraints.
For example, state char(2) default 'MN'
would specify a default value for the state
attribute in new rows of person . If this is combined
with the foreign key constraint references state(abbr) ,
we would have to make sure that MN appears in the
abbr field of some record in the table state
before using that default value in a new record of person .
Here is a definition of person that uses these
additional constraints and the default value for the
field state . CREATE TABLE person (
eid int2 primary key, -- entry id
name char(25) not null,
phone char(4), -- campus phone
state char(2) references state(abbr) default 'MN',
unique(name, phone),
check (phone ~ '[0-9][0-9][0-9][0-9]')
);
INSERT
The INSERT command adds a row to a table.
Examples: INSERT INTO state VALUES ('MN', 'Minnesota');
INSERT INTO state VALUES ('WA', 'Washington');
INSERT INTO state (name, abbr) VALUES ('Illinois', 'IL');
INSERT INTO person VALUES (1, 'Dick', '3860', 'WA');
INSERT INTO person VALUES (2, 'Sue' , '3098', 'IL');
INSERT INTO person VALUES (3, 'Chris', '2997');
INSERT INTO person (eid, name) VALUES (4, 'John');
These lines insert three rows in the table state ,
then two rows in the table person .
The values listed
between parentheses must match the order of fields in the table schema
by default. To override this default, insert a parenthesized list of
field names between the table name and the keyword
VALUES , as in the Illinois example above.
(Default values) The final two insertions above provide only three values; the
fourth field in each case, state , will be initialized
with the default value 'MN' , assuming that we describe
that field state as default 'MN' . In the
last example, a second field phone is left unspecified.
The value null is typically entered for unspecified
fields without a default value. To insure that this will be the case,
add default null to the description of that field
phone when creating the table person .
Introduction to SELECT
SELECT is the general purpose data retrieval
command in SQL. A command for retrieving data from a database is
called a query.
SELECT has many options and variants; we will consider
only a few basic uses here.
Example:
SELECT * FROM person;
This retrieves all rows and columns from the table
person . The following output represents the results:
eid | name | phone | state
-----+---------------------------+-------+-------
1 | Dick | 3860 | WA
2 | Sue | 3098 | IL
3 | Chris | 2997 | MN
4 | John | | MN
(4 rows)
Assuming the insertions above,
note that although the rows will have these contents, they may not
necessarily appear in this order.
SELECT name, phone FROM person;
This selects only the two named columns from the table
person . The operation of choosing only certain columns
is called projection.
Results: name | phone
---------------------------+-------
Dick | 3860
Sue | 3098
Chris | 2997
John |
(4 rows)
SELECT * FROM person WHERE state = 'MN';
The WHERE clause in a SELECT command allows
one to choose among the rows in a table by specifying a boolean
criterion. The operation of choosing among rows is called
selection (and obviously gives rise to the name of the
command SELECT ). Observe that a single
= sign is used the equality predicate in SQL (unlike the
== of C++).
Results: eid | name | phone | state
-----+---------------------------+-------+-------
3 | Chris | 2997 | MN
4 | John | | MN
(2 rows)
SELECT name, phone FROM person WHERE state = 'MN';
Here, we combine projection and selection. Observe that the column(s)
used in the WHERE condition need(s) not appear in the output.
Results: name | phone
---------------------------+-------
Chris | 2997
John |
(2 rows)
SELECT * FROM person ORDER BY phone;
Here we choose an ordering for the output.
Results: eid | name | phone | state
-----+---------------------------+-------+-------
3 | Chris | 2997 | MN
2 | Sue | 3098 | IL
1 | Dick | 3860 | WA
4 | John | | MN
(4 rows)
SELECT name, state FROM person ORDER BY phone;
This example of ordering illustrates the fact that columns used in the
ordering criteria need not appear in the output.
Results: name | state
---------------------------+-------
Chris | MN
Sue | IL
Dick | WA
John | MN
(4 rows)
SELECT name, state FROM person ORDER BY state ASC, name DESC;
This suggests some of the possibilities for ordering rows: we require
that rows be ordered first by state in ascending order, then by name
in descending order. (ASC isn't necessary here, because
ascending order is the default.) String ordering is lexicographic
(dictionary) order.
Results: name | state
---------------------------+-------
Sue | IL
John | MN
Chris | MN
Dick | WA
(4 rows)
Joins
SELECT p.name, s.name FROM person p, state s WHERE p.state = s.abbr;
This query requests the person names and state names for each row in
person . The FROM clause defines temporary
names p and s for rows of the
tables person and state (respectively), and
the WHERE clause requires that the state
column of row p is the same as the abbr
column of row s .
Results: name | name
---------------------------+-----------------
Sue | Illinois
John | Minnesota
Chris | Minnesota
Dick | Washington
(4 rows)
This kind of query, which involves rows from two or more tables, is
called a join operation. It is equivalent to asking the DBMS
to create all pairs of rows (p , s )
where p appears in person and s
appears in state , then discarding any pairs that do not
match the condition p.state = s.abbr , and finally
projecting the columns p.name and s.name
from the pairs that remain.
SELECT person.name, state.name FROM person
JOIN state ON person.state = state.abbr;
This query produces the same results as the previous query. The
WHERE clause style of the previous query is understood by most
DBMSs, but this form with a JOIN clause satisfies the ANSI
standard.
Also, we used aliases p and s
for the table names person and state in the
previous query, but in this query we use those table names directly
without aliases.
SELECT p.name FROM person p, state s WHERE p.state = s.abbr AND s.name ~ 'a';
This query requests the names of all persons who come from states
containing the letter a .
Results: name
---------------------------
John
Chris
Dick
(3 rows)
SELECT p.name, s.name FROM person p
JOIN state s ON p.state = s.abbr WHERE s.name ~ 'a';
This query returns the same results as the previous query,
but uses the JOIN syntax for specifying the join instead of
specifying the join as part of the WHERE clause constraints.
Join operations are very powerful, but may be computationally
quite expensive.
UPDATE
DELETE
DROP
File interface
_____
Example: _____
_____
_____
_____
_____
_____
_____
_____
_____
|