Overview of XML
CS 284 (CSA), Spring 2005
On this page, we present an outline of markup languages in general
and XML in particular, providing a framework for programming with
XML. We assume only general prior exposure to HTML.
Markup
Terms: markup, markup languages, rendition ("source" text with markup),
presentation (formatted view), style sheet (determines how to present a
rendition).
Markup languages such as HTML and XML use tags for the markup.
General form: <name attrib="value" ...> content... </name>
Tags such as <name attrib="value" ...> are open tags;
</name> is the corresponding close tag.
Elements of the document are indicated by pairs of tags:
<name ...> content... </name>
The content of such an element is the (marked up) text
between the open and close tags.
Attributes are options of the form attrib="value" ... within an
open tag.
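To make the terms element, attribute, and content concrete, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree` parser. The document and its element and attribute names (`book`, `lang`, `title`, `chapter`, `num`) are invented for illustration only.

```python
import xml.etree.ElementTree as ET

# A small rendition; the element and attribute names are invented.
rendition = '<book lang="en"><title>XML Basics</title><chapter num="1">Tags</chapter></book>'

root = ET.fromstring(rendition)   # parse the rendition into a tree of elements
print(root.tag, root.attrib)      # book {'lang': 'en'}

# Each child element has a name (tag), attributes, and content (text).
for child in root:
    print(child.tag, child.attrib, child.text)
# title {} XML Basics
# chapter {'num': '1'} Tags
```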
Entities are additional objects within the rendition. For example, the
entity &nbsp; represents a "non-breaking space", and &lt; represents the
less-than character. Other ideas for entities: an entity used to insert
a company's logo graphic, and an entity used to insert a standard body
of text, such as a copyright notice.
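The following sketch, again using only Python's standard library, shows why `<` must be written as the entity `&lt;` inside a rendition, and how a parser replaces entity references with the characters they stand for. One caution: XML itself predefines only five entities (`&lt;`, `&gt;`, `&amp;`, `&quot;`, `&apos;`). `&nbsp;` comes from HTML, so in XML it must be declared (for example, in a DTD) or written as the character reference `&#160;`.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# Raw text containing characters that have a special meaning in markup.
raw = "if a < b & b < c"

# escape() rewrites & as &amp;, < as &lt; and > as &gt;, so the text
# can be placed inside an element without being mistaken for markup.
marked_up = "<expr>" + escape(raw) + "</expr>"
print(marked_up)                      # <expr>if a &lt; b &amp; b &lt; c</expr>

# A parser replaces each entity reference with the character it stands for.
print(ET.fromstring(marked_up).text)  # if a < b & b < c
print(unescape("&lt;tag&gt;"))        # <tag>
```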
History of markup languages
SGML, Standard Generalized Markup Language, Goldfarb et al.,
since the 1960s, beginning at IBM.
Goldfarb coined the term markup in 1970.
Standards in '86, '91
Tags of the form <name attrib="value" ...> content... </name>.
Properly bracketed, i.e., every open tag in the rendition has a close tag.
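Proper bracketing can be checked with a stack: push each open tag, and pop when a close tag appears, which must match the most recently opened tag. The function `properly_bracketed` below is a hypothetical, deliberately simplified sketch. A regular expression finds the tags, and comments, CDATA sections and `>` inside attribute values are not handled, so it is no substitute for a real parser.

```python
import re

# Simplified tag pattern: group 1 is "/" for a close tag, group 2 the element
# name, group 3 is "/" for an empty-element tag such as <br/>.
# Comments, CDATA sections and ">" inside attribute values are NOT handled.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^>]*?(/?)>")

def properly_bracketed(rendition: str) -> bool:
    """Return True if every open tag has a matching close tag, properly nested."""
    stack = []  # names of open tags not yet closed, innermost last
    for match in TAG.finditer(rendition):
        is_close = match.group(1) == "/"
        name = match.group(2)
        is_empty = match.group(3) == "/"
        if is_close:
            # The close tag must match the most recently opened tag.
            if not stack or stack.pop() != name:
                return False
        elif not is_empty:
            stack.append(name)  # empty-element tags open and close at once
    return not stack  # anything left on the stack was never closed

print(properly_bracketed("<a><b>text</b></a>"))  # True
print(properly_bracketed("<a><b>text</a></b>"))  # False: crossed tags
print(properly_bracketed("<a><b>text</b>"))      # False: <a> never closed
```

A conforming XML parser such as `xml.etree.ElementTree.fromstring` enforces the same rule and raises `ParseError` on crossed or unclosed tags.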
Fundamental goals:
Common rendition representation
Extensibility, i.e., ability to define new tags, etc.
Document type rules
Document type rules are represented in a separate DTD (Document
Type Definition) language, using regular expressions. See below.
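To illustrate the phrase "using regular expressions", the sketch below takes one hypothetical DTD element declaration (the `memo`, `to`, `from`, `subject` and `body` names are invented), translates its content model into a Python regular expression by hand, and checks a parsed element's sequence of children against it. This only illustrates the idea. The Python standard-library parsers used here do not perform DTD validation; that requires a validating parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical DTD declaration (element names invented for this sketch):
#   <!ELEMENT memo (to+, from, subject?, body)>
# Its content model is a regular expression over child element names:
# "," is sequence, "+" one or more, "?" optional, "*" zero or more.
# Hand-translated to a Python regex over names that each end in a space:
MEMO_MODEL = re.compile(r"(to )+from (subject )?body ")

def children_match(elem: ET.Element, model: re.Pattern) -> bool:
    """Check the sequence of elem's child element names against a content model."""
    names = "".join(child.tag + " " for child in elem)
    return model.fullmatch(names) is not None

ok = ET.fromstring("<memo><to/><to/><from/><body/></memo>")
bad = ET.fromstring("<memo><from/><to/><body/></memo>")
print(children_match(ok, MEMO_MODEL))   # True
print(children_match(bad, MEMO_MODEL))  # False: <to> must come first
```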
LaTeX, Leslie Lamport, published 1985
Implemented as a macro package over the TeX typesetting language
(Donald Knuth, 1978-81).
Some proper bracketing: e.g., "environments" have the form \begin{name}...\end{name}.
Common rendition representation, extensible, but no explicit
document type rules.
Commonly used in mathematics and CS research publications.
HTML, Tim Berners-Lee (creator of the WWW) and Anders Berglund, '89
Tags <name attrib="value" ...> content... </name>, as in the
predecessor language SGML.
Common rendition, but no extensibility (at first) and no
(modifiable) document type rules.
Designed as a simplification of SGML for WWW authoring.
XML, Berners-Lee et al. (World Wide Web Consortium, w3c.org), '96; standard '98.
Simplified subset of SGML, but with extensibility and document
type rules.
Document types may be expressed using DTDs or XML Schema
(an XML form for specifying document types).
DTDs
Processing XML