September 16, 2008
Harvard University
Division of Continuing Education
Extension School
Course Web Site: http://cscie153.dce.harvard.edu/
Instructor email:
david_heitmeyer@harvard.edu
Course staff email:
cscie153@fas.harvard.edu
Copyright 2003-2008 David P. Heitmeyer
CSCI E-153
Web Development Using XML (12151)
David P. Heitmeyer, AM, Senior Software Product Architect/Engineer, iCommons, Central Administration Information Technology, Harvard University.
Fall term:
Tuesdays, Sept. 16, 5:30-7:30 pm, Harvard Hall, Room 104. Optional sections to be arranged.
Online and on-campus options. See Distance Education.
Students learn key XML technologies (XML, XPath, XSL, XSL-FO, XML Schema, RNG, DTD, XQuery, DOM) as well as specific markup languages relevant to website development (XHTML, XHTML Mobile Profile, RSS, RDF, XSL-FO, SVG, DocBook, OOXML, OpenDocument, XForms). In addition, the course covers topics such as XML and databases (native XML databases and RDBMS), XML programming APIs (DOM and SAX), Apache Cocoon (an open-source XML publishing framework), and the role of XML in Web 2.0 to deliver data and functionality through Ajax and web services (SOAP and REST). Using these technologies, students develop dynamic, data-driven websites that are capable of delivering content in a variety of media formats (screen, text, print, graphics) to a variety of devices (desktop, handheld, mobile) for a variety of audiences. Prerequisite: CSCI E-12, or equivalent experience. (4 credits)
In addition to the texts for the course, there will be suggested readings of online material.
Harold, Elliotte Rusty and W. Scott Means. 2004. XML in a Nutshell, 3rd Edition. O'Reilly & Associates. 689 p. ISBN 0596007647. This is another excellent Nutshell book from O'Reilly. It provides good, solid, readable information about the key XML technologies you need to know.
Kay, Michael. 2008. XSLT 2.0 and XPath 2.0, 4th edition. Wiley Publishing, Inc. 1316 p. ISBN 0470192747. This is an excellent book that provides thorough coverage of, and reference for, XSLT and XPath. It is essentially the XSLT 2.0, 3rd ed. and XPath 2.0, 3rd ed. books combined.
Kay, Michael. 2004. XSLT 2.0, 3rd edition. Wiley Publishing, Inc. 911 p. ISBN 0764569090. This is an excellent book that provides thorough coverage of, and reference for, XSLT versions 1 and 2.
Kay, Michael. 2004. XPath 2.0. Wiley Publishing, Inc. 530 p. ISBN 0764569104. This is the companion book for XSLT 2.0, 3rd edition. It is optional, and is certainly not needed if you have "XSLT 2.0 and XPath 2.0, 4th edition".
Required and recommended texts are available through:
XML experience. No prior experience with XML will be assumed.
Web experience. Students are expected to have an understanding of the web already and to have taken CSCI E-12 (Fundamentals of Web Site Development), or CSCI S-H, or have equivalent experience. Students should have a solid understanding of Hypertext Markup Language (HTML) and a familiarity with Cascading Style Sheets (CSS).
Programming experience. Experience with any programming language will suffice (Pascal, Basic, Lisp, C/C++, C#, Perl, Python, Java, etc.)
Computing Environment. It is expected that students will be able to install, configure, and run certain software in their own computing environment, namely Cocoon, Maven, and Subversion. Cocoon is a Java-based XML publishing framework, which can be installed and run with the utility Maven. In addition, Subversion, a widely used revision control system, will be used to submit assignments. While guidance on getting these tools to work on other systems will be provided, it is expected that students will be capable of setting up their computing environment. Although a course server will be provided for students to work on, students are encouraged to use their own environments for reasons of control, learning, and performance. In addition, the use of Amazon's Elastic Compute Cloud (EC2) will be demonstrated to provide low-cost, flexible, and dedicated computing environments for course work. Use of EC2 is not required, and any usage charges will be the responsibility of the student. Assignment 1 is geared towards getting the environment established.
| Description | Points |
|---|---|
| Assignments. There will be 5 assignments over the course of the term (50 to 150 points each). | 650 points |
| Community Participation. This component is focused on participation in the course community (online as well as in-person). There will be specific opportunities throughout the term for points in this area. | 100 points (maximum) |
| Final Project. The final project will consist of an XML-driven Course Catalog Website with a variety of media-type outputs (XHTML, HTML, XHTML Mobile Profile, PDF). A component of the final project will include an optional presentation at the last class meeting. Students will have an option to collaborate in self-formed groups (3 to 4 students) or to work independently for the final project. Additional details regarding the Final Project will be provided in class. | 300 points |
| Total Points Possible: | 1050 points |
Late assignments will not be accepted, and a score of zero will be recorded for any assignment that is not submitted on time.
Each student will have 4 tokens that may be used to extend the due date of an assignment or assignments. One token will extend the due date by one day. You may use more than one token on an assignment. Each student can use at most four tokens (they cannot be traded, shared, or given away, and they have no cash value). Tokens cannot be used on the Final Project. Use of tokens will be indicated when the assignment is submitted.
It is expected that students will read, understand, and comply with the Student Responsibilities at Harvard Extension School.
Harvard Hall, Harvard University, Cambridge, Massachusetts

Students
Happy 10th Birthday to XML!
"There is essentially no computer in the world, desk-top, hand-held, or back-room, that doesn't process XML sometimes..."
Tim Bray, read more from the press release.
Contacts in an Address Book in XML:
XHTML for presentation:
Displayed in a browser:
| Name | David Heitmeyer |
|---|---|
| Address | 8 Story Street, Cambridge, Massachusetts 02138 |
| Phone | 617-384-6656 |
| Email | david_heitmeyer@harvard.edu |
| Name | Drew Gilpin Faust |
| Address | Massachusetts Hall, Cambridge, Massachusetts 02138 |
| Email | president@harvard.edu |
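The XML and XHTML listings for this slide did not survive, so the following is only a minimal Python sketch of the same idea: contact data held in XML and turned into table rows for presentation. The element names (addressbook, contact, name, phone, email) are assumptions made for illustration and are not the course's actual markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical address book markup; the element names are illustrative only.
ADDRESS_BOOK = """<addressbook>
  <contact>
    <name>David Heitmeyer</name>
    <phone>617-384-6656</phone>
    <email>david_heitmeyer@harvard.edu</email>
  </contact>
  <contact>
    <name>Drew Gilpin Faust</name>
    <email>president@harvard.edu</email>
  </contact>
</addressbook>"""


def contacts_to_table(xml_text):
    """Build a <table> element with one row per contact field."""
    book = ET.fromstring(xml_text)
    table = ET.Element("table")
    for contact in book.findall("contact"):
        for field in contact:
            row = ET.SubElement(table, "tr")
            # The element name becomes the row label, its text the value.
            ET.SubElement(row, "th").text = field.tag.capitalize()
            ET.SubElement(row, "td").text = field.text
    return table


table = contacts_to_table(ADDRESS_BOOK)
print(ET.tostring(table, encoding="unicode"))
```

The data and its presentation stay separate here: the same address book could just as well be rendered as plain text or another format by a different function.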
The World Wide Web Consortium was created in October 1994 to lead the World Wide Web to its full potential by developing common protocols that promote its evolution and ensure its interoperability. W3C has around 450 Member organizations from all over the world and has earned international recognition for its contributions to the growth of the Web. (from http://www.w3.org/Consortium/ )
XSLT - Extensible Stylesheet Language Transformations
[Screenshots of output formats: XHTML; XML; Mobile Devices (XHTML, XHTML Mobile Profile, and WML); PDF; Text; SVG]
XML is often at the center of the data "aggregation," "mashups," and "re-mixing" that are characteristic of some "Web 2.0" sites and applications.
NOAA's National Weather Service offers several XML services: National Digital Forecast Database XML Web Service
baseball.xml snippet:
Red = Bush
Blue = Kerry

[Maps of Massachusetts Cities and Towns: Red/Blue Map; Purple Map]
RSS and Atom are lightweight XML formats for sharing headlines and other Web content.
The heart of an RSS feed is an "item", which can have a title, link, and description.
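To make the title, link, and description structure concrete, here is a small Python sketch that reads the items out of an RSS 2.0 document. The feed is made up for illustration (the headlines and example.org URLs are placeholders, not taken from any real feed), and read_items is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed; titles and example.org URLs are placeholders.
FEED = """<rss version="2.0">
  <channel>
    <title>Example Headlines</title>
    <link>http://example.org/</link>
    <description>Placeholder feed for illustration</description>
    <item>
      <title>First headline</title>
      <link>http://example.org/news/1</link>
      <description>Summary of the first story.</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.org/news/2</link>
      <description>Summary of the second story.</description>
    </item>
  </channel>
</rss>"""


def read_items(feed_text):
    """Return a list of (title, link, description) tuples, one per item."""
    root = ET.fromstring(feed_text)
    items = []
    for item in root.iter("item"):
        items.append((
            item.findtext("title"),
            item.findtext("link"),
            item.findtext("description"),
        ))
    return items


items = read_items(FEED)
for title, link, description in items:
    print(title, "->", link)
```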
Item from BBC News:

Atom feed displayed in Firefox:

Firefox "Live Bookmarks" alert you to the existence of an RSS feed for a site and allow convenient access to the items in the feed.


RSS Feed of CSCI E-153 Announcements
Adapting and extending RSS for use with iTunes, iPods, and other mobile media players.

Podcast snippet (WBUR/NPR On Point with Tom Ashbrook Podcast):
Yahoo! Maps and RSS with GeoInfo
| Task or Purpose | Standard or Technology | Tools |
|---|---|---|
| Authoring and structuring | XML, SGML | Text Editor |
| Defining and Validating | DTD, W3C XML Schema, RELAX NG (RNG), Schematron | XML Parser (Apache Xerces) |
| Formatting, Styling, Presentation | CSS, XSLT, XSL-FO, DSSSL | XSL-FO Processor (Apache FOP); XSLT Processor (Apache Xalan, Saxon) |
| Transforming | XSLT | XSLT Stylesheet Processor (Apache Xalan, Saxon) |
| Navigating and Searching | XPath, XQuery | |
| Specific Document Formats (XML Applications) | XHTML, DocBook, WML, SVG, MathML, etc. | Text Editor |
| Merging | XML Inclusions (XInclude) | |
| Relating and Associating | RDF, XTM (XML Topic Maps) | |
| Programming | SAX, DOM | Programming Language (Java, C#, Perl, Python, etc.) |
| Application Server (Publishing / Programming) | XML, XSLT, XPath, SAX, DOM, ... | Apache Cocoon, Apache AxKit |
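The "Programming" row names two APIs, SAX and DOM. As a rough illustration of how they differ, the sketch below reads the same small document both ways using Python's standard library: SAX reports events as the parser streams through the input, while DOM builds an in-memory tree that can be queried afterwards. The document content and the class name BookCounter are invented for this example.

```python
import xml.sax
from xml.dom import minidom

# Invented sample document for illustration.
DOC = "<library><book>First Title</book><book>Second Title</book></library>"


class BookCounter(xml.sax.ContentHandler):
    """SAX handler: counts <book> start tags as the parser streams by."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "book":
            self.count += 1


# SAX: event-driven, nothing is kept in memory except what the handler stores.
handler = BookCounter()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_count = handler.count

# DOM: the whole document becomes a tree that can be navigated at will.
dom = minidom.parseString(DOC)
dom_titles = [node.firstChild.data for node in dom.getElementsByTagName("book")]

print(sax_count, dom_titles)
```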

| Entity | Character |
|---|---|
| &lt; | < |
| &gt; | > |
| &amp; | & |
| &quot; | " |
| &apos; | ' |

Tree view using Stylus Studio
Tree view using Amaya Browser/Editor
Tree View generated with XSLT (XSLT used to make a Tree View of an XML document.)
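The tree views above were shown as screenshots. As a rough stand-in, this Python sketch prints an indented tree for a small document. It is not the XSLT used in the course; the sample document and the function name tree_view are assumptions, and text that follows a child element (tail text) is ignored for brevity.

```python
import xml.etree.ElementTree as ET

# Invented sample document for illustration.
DOC = '<contact id="1"><name>Ada</name><phone>555-0100</phone></contact>'


def tree_view(element, depth=0):
    """Return indented lines showing elements, attributes, and text."""
    indent = "  " * depth
    label = element.tag
    for name, value in element.attrib.items():
        label += ' @{}="{}"'.format(name, value)
    lines = [indent + label]
    text = (element.text or "").strip()
    if text:
        # Text nodes are shown one level below their parent element.
        lines.append(indent + '  "' + text + '"')
    for child in element:
        lines.extend(tree_view(child, depth + 1))
    return lines


lines = tree_view(ET.fromstring(DOC))
print("\n".join(lines))
```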
XML declaration: <?xml version="1.0" encoding="UTF-8" ?>
Version 1.0: Extensible Markup Language (XML) 1.0 (Fourth Edition), August 2006
You may also see ISO-8859-1 quite a bit (also known as Latin-1). When in doubt, use UTF-8 (it is the default anyway).
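A short Python sketch of what the encoding declaration means in practice: the same text stored as UTF-8 and as ISO-8859-1 produces different bytes on disk, but a parser that honors the declaration recovers the same characters. The sample word is an arbitrary choice for illustration.

```python
import xml.etree.ElementTree as ET

# The same one-element document, serialized in two encodings.
# "\u00e9" is the character e-acute, written as an escape to keep this file ASCII.
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><word>caf\u00e9</word>'.encode("utf-8")
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><word>caf\u00e9</word>'.encode("iso-8859-1")

# The parser reads the declaration and decodes the bytes accordingly.
utf8_text = ET.fromstring(utf8_doc).text
latin1_text = ET.fromstring(latin1_doc).text

print(utf8_text == latin1_text)   # same characters after parsing
print(utf8_doc == latin1_doc)     # different bytes before parsing
```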
In XML, there are only 5 named character entities defined:
Numeric character references: &#169; → © and &#xA9; → ©. The syntax is &#nnn; (nnn = decimal number) or &#xnnnn; (nnnn = hexadecimal number). Named entities (e.g. &copy;) are defined as part of the markup language.
Named XHTML entities are not generally available in XML documents.
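The following Python sketch checks these rules with the standard-library parser: decimal and hexadecimal character references both resolve to the copyright sign, the five predefined entities resolve without any declaration, and the XHTML name for the copyright sign is rejected because nothing in the document declares it. The element name c is arbitrary.

```python
import xml.etree.ElementTree as ET

# Numeric character references work in any XML document.
decimal = ET.fromstring("<c>&#169;</c>").text
hexadecimal = ET.fromstring("<c>&#xA9;</c>").text

# The five predefined named entities need no declaration.
predefined = ET.fromstring("<c>&lt;&gt;&amp;&quot;&apos;</c>").text

# An XHTML named entity is undefined here, so parsing fails.
try:
    ET.fromstring("<c>&copy;</c>")
    copy_is_defined = True
except ET.ParseError:
    copy_is_defined = False

print(decimal, hexadecimal, predefined, copy_is_defined)
```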
Whitespace. In general, whitespace does not change the meaning of XML (as in (X)HTML). Note that there are exceptions to this.
Order of elements is meaningful.
Order of attributes is not meaningful.
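A small Python sketch of the two ordering rules, using an invented contact element: swapping the attributes leaves the parsed attribute set unchanged, while swapping the child elements yields a different sequence of children.

```python
import xml.etree.ElementTree as ET

# Same element, attributes written in a different order.
a = ET.fromstring('<contact id="7" type="staff"><name/><phone/></contact>')
b = ET.fromstring('<contact type="staff" id="7"><name/><phone/></contact>')

# Same attributes as `a`, but the child elements are swapped.
c = ET.fromstring('<contact id="7" type="staff"><phone/><name/></contact>')

# Attributes are an unordered set of name/value pairs.
same_attributes = a.attrib == b.attrib

# Children are an ordered sequence.
a_children = [child.tag for child in a]
c_children = [child.tag for child in c]
same_child_order = a_children == c_children

print(same_attributes, same_child_order)
```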
Comments. XML comments are just like comments in HTML: <!-- comment text -->
CDATA sections
In CDATA sections, you do not need to write <, >, and & as character entities. They are often used to put 'markup' into an element as literal text.
Syntax: <![CDATA[ ... ]]>
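As a quick check of how CDATA behaves, this Python sketch parses the same text written once inside a CDATA section and once with character entities. The element name code and the sample text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Inside CDATA, <, > and & are taken literally.
with_cdata = ET.fromstring(
    "<code><![CDATA[if (a < b && b > c) { run(); }]]></code>"
)

# Without CDATA, the same characters must be written as entities.
with_entities = ET.fromstring(
    "<code>if (a &lt; b &amp;&amp; b &gt; c) { run(); }</code>"
)

# After parsing, both forms carry exactly the same text.
same_text = with_cdata.text == with_entities.text
print(with_cdata.text, same_text)
```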


Define prefix and namespace URI binding and default namespace:
Use MathML inline in XHTML document:

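The attribute xmlns:mml="http://www.w3.org/1998/Math/MathML" binds the prefix "mml" to the namespace URI http://www.w3.org/1998/Math/MathML, so the expanded name of the mml:math element is {http://www.w3.org/1998/Math/MathML}math.
The following documents have the same "meaning":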
Snippet XHTML and MathML, with "mml" prefix used for all "MathML" elements. | XHTML and MathML, with MathML defined as default namespace for "math" element and all children. |
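The two listings being compared did not survive, so here is a minimal Python sketch of the same point using a tiny MathML fragment invented for illustration. The standard-library parser reports each element by its expanded name, {namespace URI}local-name, so the prefixed form and the default-namespace form come out identical.

```python
import xml.etree.ElementTree as ET

# MathML elements written with an explicit "mml" prefix.
prefixed = ET.fromstring(
    '<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">'
    "<mml:mi>x</mml:mi>"
    "</mml:math>"
)

# The same elements with MathML declared as the default namespace.
defaulted = ET.fromstring(
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<mi>x</mi>"
    "</math>"
)

# Expanded names do not depend on which prefix (if any) was used.
prefixed_names = [element.tag for element in prefixed.iter()]
defaulted_names = [element.tag for element in defaulted.iter()]

print(prefixed_names)
print(prefixed_names == defaulted_names)
```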
W3C RSS document
GovTrack.US People RDF Data
XSLT


In this course we will be using the XSLT processors Saxon (Michael Kay) and Apache Xalan (Apache Software Foundation).



The simplest XSLT
The result:
xsl:template: define a template to process a node that matches
xsl:apply-templates: select list of nodes to apply templates to
xsl:value-of: output a value of a selected node or expression
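The three roles described above can be mimicked in a few lines of Python. This is only an analogy for the processing model, not XSLT itself and not how Saxon or Xalan work internally: the names template, apply_templates, and value_of are hypothetical helpers written for this sketch, matching is by element name only, and the source document is invented.

```python
import xml.etree.ElementTree as ET

# Invented source document for illustration.
SOURCE = "<greetings><greeting>Hello</greeting><greeting>World</greeting></greetings>"

# Maps an element name to the function that handles it.
TEMPLATES = {}


def template(match):
    """Register a function as the rule for elements named `match`."""
    def register(func):
        TEMPLATES[match] = func
        return func
    return register


def apply_templates(nodes):
    """Run the matching rule for each node and concatenate the output."""
    return "".join(TEMPLATES[node.tag](node) for node in nodes)


def value_of(node):
    """Return the text value of a node."""
    return node.text or ""


@template("greetings")
def greetings_rule(node):
    # Select the child nodes to process next, wrapped in a list element.
    return "<ul>" + apply_templates(node.findall("greeting")) + "</ul>"


@template("greeting")
def greeting_rule(node):
    # Output the value of the current node.
    return "<li>" + value_of(node) + "</li>"


result = apply_templates([ET.fromstring(SOURCE)])
print(result)
```

Each rule only knows how to handle its own kind of node and hands the rest back to apply_templates, which is the same division of labor the three XSLT instructions describe.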