CSCI E-153, Web Development Using XML

September 16, 2008

Harvard University
Division of Continuing Education
Extension School

Course Web Site: http://cscie153.dce.harvard.edu/

Instructor email: david_heitmeyer@harvard.edu
Course staff email: cscie153@fas.harvard.edu

Copyright 2003-2008 David P. Heitmeyer

About CSCI E-153


CSCI E-153 Web Development Using XML (12151)

David P. Heitmeyer, AM, Senior Software Product Architect/Engineer, iCommons, Central Administration Information Technology, Harvard University.
Fall term: Tuesdays beginning Sept. 16, 5:30-7:30 pm, Harvard Hall, Room 104. Optional sections to be arranged.
Online and on-campus options. See Distance Education.

Students learn key XML technologies (XML, XPath, XSL, XSL-FO, XML Schema, RNG, DTD, XQuery, DOM) as well as specific markup languages relevant to website development (XHTML, XHTML Mobile Profile, RSS, RDF, XSL-FO, SVG, DocBook, OOXML, OpenDocument, XForms). In addition, the course covers topics such as XML and databases (native XML databases and RDBMS), XML programming APIs (DOM and SAX), Apache Cocoon (an open-source XML publishing framework), and the role of XML in Web 2.0 to deliver data and functionality through Ajax and web services (SOAP and REST). Using these technologies, students develop dynamic, data-driven websites that are capable of delivering content in a variety of media formats (screen, text, print, graphics) to a variety of devices (desktop, handheld, mobile) for a variety of audiences. Prerequisite: CSCI E-12, or the equivalent experience. (4 credits)

Syllabus: Books

In addition to the texts for the course, there will be suggested readings of online material.

Required: XML in a Nutshell, 3rd edition


Harold, Elliotte Rusty and W. Scott Means. 2004. XML in a Nutshell, 3rd Edition. O'Reilly & Associates. 689 p. ISBN 0596007647.

This is another excellent Nutshell book from O'Reilly. It provides good, solid, readable information about the key XML technologies you need to know.

Required: XSLT 2.0, 3rd edition or XSLT 2.0 and XPath 2.0, 4th edition


Kay, Michael. 2008. XSLT 2.0 and XPath 2.0, 4th edition. Wiley Publishing, Inc. 1316 p. ISBN 0470192747.

This book provides thorough coverage of XSLT and XPath and works well as a reference. It is essentially the XSLT 2.0, 3rd ed. and XPath 2.0, 3rd ed. books combined.

XSLT 2.0, 3rd edition

Kay, Michael. 2004. XSLT 2.0, 3rd edition. Wiley Publishing, Inc. 911 p. ISBN 0764569090.

This book provides thorough coverage of XSLT, versions 1 and 2, and works well as a reference.

Recommended: XPath 2.0


Kay, Michael. 2004. XPath 2.0. Wiley Publishing, Inc. 530 p. ISBN 0764569104.

This is the companion book for XSLT 2.0, 3rd edition. This is optional, and is certainly not needed if you have "XSLT 2.0 and XPath 2.0, 4th edition".

Required and recommended texts are available through:

Syllabus: Prerequisites

XML experience. No prior experience with XML will be assumed.

Web experience. Students are expected to have an understanding of the web and to have taken CSCI E-12 (Fundamentals of Web Site Development) or CSCI S-H, or to have equivalent experience. Students should have a solid understanding of Hypertext Markup Language (HTML) and a familiarity with Cascading Style Sheets (CSS).

Programming experience. Experience with any programming language will suffice (Pascal, Basic, Lisp, C/C++, C#, Perl, Python, Java, etc.).

Computing Environment. It is expected that students will be able to install, configure, and run certain software in their own computing environment, namely Cocoon, Maven, and Subversion. Cocoon is a Java-based XML publishing framework, which can be installed and run with the utility Maven. In addition, Subversion, a widely used revision control system, will be used to submit assignments. While guidance on getting these tools to work on other systems will be provided, it is expected that students will be capable of setting up their computing environment. Although a course server will be provided for students to work on, students are encouraged to use their own environments for reasons of control, learning, and performance. In addition, the use of Amazon's Elastic Compute Cloud (EC2) will be demonstrated as a way to provide low-cost, flexible, and dedicated computing environments for course work. Use of EC2 is not required, and any usage charges will be the responsibility of the student. Assignment 1 is geared towards getting the environment established.

Syllabus: Requirements

Description and Points
Assignments. There will be 5 assignments over the course of the term (50 or 150 points each).
  • Assignment 1. Getting Started and XML Basics. (50 pts)
  • Assignment 2. XSLT and XPath, Part 1; Getting Started with Cocoon. (150 pts)
  • Assignment 3. XSLT and XPath, Part 2. (150 pts)
  • Assignment 4. XML content formats: XML and print (XSL-FO), Mobile Devices, Office. (150 pts)
  • Assignment 5. XML and Web 2.0. (150 pts)
650 points
Community Participation.
This component is focused on participation in the course community (online as well as in-person). There will be specific opportunities throughout the term for points in this area, such as:
  • Short responses to blog posts by instructors
  • Contributions of information to a Google map of the class participants
  • Contributions to the CSCI E-153 "Delicious" bookmark collection
  • Blog entries about uses of XML in your professional or personal areas of interest
100 points (maximum)
Final Project

The final project will consist of an XML-driven Course Catalog Website with a variety of media-type outputs (XHTML, HTML, XHTML Mobile Profile, PDF). A component of the final project will include an optional presentation at the last class meeting.

Students will have an option to collaborate in self-formed groups (3 to 4 students) or to work independently for the final project.

Additional details regarding the Final Project will be provided in class.

300 points
Total Points Possible: 1050 points

Late Assignments

Late assignments will not be accepted, and a score of zero will be recorded for any assignment that is not submitted on time.

Each student will have 4 tokens that may be used to extend the due date of an assignment or assignments. One token extends the due date by one day. You may use more than one token on an assignment. Each student can use at most four tokens in total (tokens cannot be traded, shared, or given away, and have no cash value). Tokens cannot be used on the Final Project. Use of tokens must be indicated when the assignment is submitted.

Academic Honesty and Student Responsibilities

It is expected that students will read, understand, and comply with the Student Responsibilities at Harvard Extension School.

Welcome to CSCI E-153

Harvard Hall, Harvard University, Cambridge, Massachusetts

XML: Extensible Markup Language

Happy 10th Birthday to XML!

W3C XML 10th anniversary

"There is essentially no computer in the world, desk-top, hand-held, or back-room, that doesn't process XML sometimes..."
Tim Bray, quoted in the press release.

Address Book Example

Data

Contacts in an Address Book in XML:

Presentation

XHTML for presentation:

Displayed in a browser:

Name: David Heitmeyer
Address: 8 Story Street, Cambridge, Massachusetts 02138
Phone: 617-384-6656
Email: david_heitmeyer@harvard.edu

Name: Drew Gilpin Faust
Address: Massachusetts Hall, Cambridge, Massachusetts 02138
Email: president@harvard.edu
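
The split between data and presentation can be sketched in a few lines of Python. This is an illustration only, not the markup used in the lecture: the element names (addressbook, contact, name, email) and the function to_xhtml_table are assumptions, and only the names and e-mail addresses shown above are used.

```python
import xml.etree.ElementTree as ET

# Data: a hypothetical address book; the element names are assumptions.
ADDRESS_BOOK = """<addressbook>
  <contact>
    <name>David Heitmeyer</name>
    <email>david_heitmeyer@harvard.edu</email>
  </contact>
  <contact>
    <name>Drew Gilpin Faust</name>
    <email>president@harvard.edu</email>
  </contact>
</addressbook>"""


def to_xhtml_table(xml_text):
    """Presentation: build a table with a Name row and an Email row per contact."""
    book = ET.fromstring(xml_text)
    table = ET.Element("table")
    for contact in book.findall("contact"):
        for label, child in (("Name", "name"), ("Email", "email")):
            row = ET.SubElement(table, "tr")
            ET.SubElement(row, "th").text = label
            ET.SubElement(row, "td").text = contact.findtext(child)
    return ET.tostring(table, encoding="unicode")


xhtml = to_xhtml_table(ADDRESS_BOOK)
print(xhtml)
```

The data document says nothing about tables or rows; a different function could render the same contacts as plain text or as a feed.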

XML in 10 Points

  1. XML is for structuring data
  2. XML looks a bit like HTML
  3. XML is text, but isn't meant to be read
  4. XML is verbose by design
  5. XML is a family of technologies
  6. XML is new, but not that new
  7. XML leads HTML to XHTML
  8. XML is modular
  9. XML is the basis for RDF and the Semantic Web
  10. XML is license-free, platform-independent and well-supported

An aside: W3C

The World Wide Web Consortium was created in October 1994 to lead the World Wide Web to its full potential by developing common protocols that promote its evolution and ensure its interoperability. W3C has around 450 Member organizations from all over the world and has earned international recognition for its contributions to the growth of the Web. (from http://www.w3.org/Consortium/ )

Content and Presentation

Content

XML is all about content.

Presentation

XML is all about presenting your content.

Demonstration of the Power of XML and XSLT

XSLT - Extensible Stylesheet Language Transformations

  • XHTML
  • XML
  • Mobile Devices (XHTML for iPhone, XHTML Mobile Profile, and WML)
  • PDF
  • Text
  • SVG

Aggregation, Re-mixing, Mashups

XML is often at the center of data "aggregation," "mashups," and "re-mixing" that is characteristic of some "Web 2.0" sites and applications.

  • my.summer.harvard.edu (Summer School Portal)
  • GovTrack.US
  • ChicagoCrime.org
  • HousingMaps.com

Weather


National Weather Service

NOAA's National Weather Service offers several XML services: National Digital Forecast Database XML Web Service
KBOS Weather in RSS Feed

NWS MA Weather

Major League Baseball

American League East Division

How they were made

  1. Programmatic Interface to construct XML from non-XML source (SAX or DOM)
    JSON Data for August 1, 2008 from mlb.com
  2. Transformation to XML "Chart format" (XSLT)
  3. Transformation to Scalable Vector Graphic (SVG, by XSLT)
  4. Serialization of SVG to PNG image format
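
Step 1 can be sketched as follows. This is a rough illustration under stated assumptions: the JSON shape, the placeholder team names and numbers, and the output element names (standings, team, wins, losses) are all hypothetical, and it uses Python's ElementTree API rather than the SAX or DOM interfaces named in the list.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical JSON shape with placeholder values; the real mlb.com data is not shown here.
STANDINGS_JSON = """[
  {"team": "Team A", "wins": 10, "losses": 5},
  {"team": "Team B", "wins": 8, "losses": 7}
]"""


def standings_to_xml(json_text):
    """Build an XML tree from a non-XML (JSON) source."""
    root = ET.Element("standings")
    for record in json.loads(json_text):
        team = ET.SubElement(root, "team", name=record["team"])
        ET.SubElement(team, "wins").text = str(record["wins"])
        ET.SubElement(team, "losses").text = str(record["losses"])
    return root


standings = standings_to_xml(STANDINGS_JSON)
print(ET.tostring(standings, encoding="unicode"))
```

Once the data is XML, the remaining steps (chart format, SVG) can be handled by transformations instead of custom code.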

baseball.xml snippet:

Election 2004: Massachusetts in Red, Blue, and Purple

Red = Bush
Blue = Kerry

Massachusetts Purple Map, Presidential Election 2004

Select a thumbnail to view a larger image
Massachusetts Cities and Towns Boundaries
Massachusetts Cities and Towns
Massachusetts Red/Blue Map, Presidential Election 2004
Red/Blue Map
Massachusetts Purple Map, Presidential Election 2004
Purple Map

How they were made

  1. GIS data (non-XML) from MassGIS
  2. XML Document Object Model (DOM)
  3. Scalable Vector Graphics (SVG)
  4. Vote data from USA Today Web Site (XHTML)
  5. XSLT and XPath
  6. SVG to JPEG/PNG Conversion

Google Maps & Google Earth

Google Earth (KML)

Google Maps

KML file:

DHS Threat Level Advisory

http://www.dhs.gov/dhspublic/getAdvisoryCondition

Syndication Feeds: RSS, Atom, Podcasts

RSS and Atom are lightweight XML formats for sharing headlines and other Web content.

RSS Snippet: Item

The heart of an RSS feed is an "item", which can have a title, link, and description.
Item from BBC News:

BBC RSS
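
As a rough sketch of that structure, the following Python reads the items out of a small RSS 2.0 document. The feed content here is invented for illustration (it is not the BBC item), and the function name read_items is an assumption.

```python
import xml.etree.ElementTree as ET

# Invented feed for illustration; not taken from any real site.
RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Example headline</title>
      <link>http://example.org/story/1</link>
      <description>One-sentence summary of the story.</description>
    </item>
  </channel>
</rss>"""


def read_items(rss_text):
    """Return the title, link, and description of every item in the channel."""
    channel = ET.fromstring(rss_text).find("channel")
    return [
        {field: item.findtext(field) for field in ("title", "link", "description")}
        for item in channel.findall("item")
    ]


items = read_items(RSS)
print(items)
```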


Atom Snippet

Atom feed displayed in Firefox:
BBC RSS

Consuming Feeds

Subscribe to Feeds via an E-mail client

Thunderbird RSS

Firefox "Live Bookmarks"

Firefox "Live Bookmarks" alert you to the existence of an RSS feed for a site and allow convenient access to the items in the feed.


CSCI E-153 RSS Feed

RSS Feed of CSCI E-153 Announcements

Podcasts: An Adaptation of RSS

Adapting and extending RSS for use with iTunes, iPods, and other mobile media players.


Podcast snippet (WBUR/NPR On Point with Tom Ashbrook Podcast):

RSS and Yahoo! Maps

Yahoo! Maps and RSS with GeoInfo
yahoo! maps with RSS

Technologies, Tasks, and Tools

XML-related standards, technologies and tools grouped by purpose
Task or Purpose | Standard or Technology | Tools
Authoring and structuring | XML, SGML | Text Editor
Defining and Validating | DTD, W3C XML Schema, RELAX NG (RNG), Schematron | XML Parser (Apache Xerces)
Formatting, Styling, Presentation | CSS, XSLT, XSL-FO, DSSSL | XSL-FO Processor (Apache FOP); XSLT Processor (Apache Xalan, Saxon)
Transforming | XSLT | XSLT Stylesheet Processor (Apache Xalan, Saxon)
Navigating and Searching | XPath, XQuery | (none listed)
Specific Document Formats (XML Applications): data, presentation | XHTML, DocBook, WML, SVG, MathML, etc. | Text Editor
Merging | XML Inclusions (XInclude) | (none listed)
Relating and Associating | RDF, XTM (XML Topic Maps) | (none listed)
Programming | SAX, DOM | Programming Language (Java, C#, Perl, Python, etc.)
Application Server (Publishing / Programming) | XML, XSLT, XPath, SAX, DOM, ... | Apache Cocoon, Apache AxKit

Class Schedule

SGML and XML

sgml and xml relationship

XML: a structure for creating and working with markup languages

XML provides a way to create markup languages. The markup languages are called applications of XML. Some applications of XML are:
Looking for existing applications of XML? Check out OASIS Open or XML.org.

XML Documents: Well-formed, Valid

Well-formed XML Documents

Valid

  1. Well-formed and
  2. Conforming to rules of a schema
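
A quick way to see the well-formedness half of this distinction is to hand documents to a parser. The sketch below uses Python's standard-library parser, which checks well-formedness only; checking validity against a schema would require a validating parser such as the Apache Xerces parser used in this course. The helper name is_well_formed is an assumption.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


nested_ok = is_well_formed("<note><to>Class</to></note>")          # properly nested
overlapping = is_well_formed("<note><to>Class</note></to>")        # tags overlap
two_roots = is_well_formed("<note>one</note><note>two</note>")     # more than one root
print(nested_ok, overlapping, two_roots)  # True False False
```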

A Tree


Visualizing a Tree


Tree view using Stylus Studio
stylus studio screenshot

Tree view using Amaya Browser/Editor
amaya screenshot

Tree View generated with XSLT (XSLT used to make a Tree View of an XML document.)
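
The same idea can be sketched in a few lines of Python: walk the element tree recursively and indent each element name by its depth. The sample document and the function name outline are assumptions for illustration; this is not the stylesheet mentioned above.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Return one line per element, indented two spaces per level of nesting."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


doc = ET.fromstring(
    "<book><title>XML</title><chapter><para>Text</para></chapter></book>"
)
tree_lines = outline(doc)
print("\n".join(tree_lines))
```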

Mechanics of an XML Document

XML Declaration

<?xml version="1.0" encoding="UTF-8" ?>

XML version

1.0
Extensible Markup Language (XML) 1.0 (Fourth Edition), August 2006

XML encoding

XML supports Unicode

You may also see ISO-8859-1 quite a bit (also known as Latin-1). When in doubt, use UTF-8 (it is the default anyway).
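
A small sketch of what the encoding declaration does, using Python's standard-library parser: the same character is stored as different bytes in ISO-8859-1 and UTF-8, and the parser relies on the declaration (or on the UTF-8 default) to decode it. The element name w and the sample word are arbitrary choices.

```python
import xml.etree.ElementTree as ET

word = "caf\u00e9"  # "café": the last letter is outside ASCII

# Declared as ISO-8859-1: the e-acute is the single byte 0xE9.
latin1_doc = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?><w>'
    + word.encode("iso-8859-1")
    + b"</w>"
)
# No declaration: the parser assumes UTF-8, where the e-acute takes two bytes.
utf8_doc = b"<w>" + word.encode("utf-8") + b"</w>"

from_latin1 = ET.fromstring(latin1_doc).text
from_utf8 = ET.fromstring(utf8_doc).text
print(from_latin1 == from_utf8 == word)  # True
```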

Character Entities

In XML, only 5 named character entities are predefined: &lt; (<), &gt; (>), &amp; (&), &quot; ("), and &apos; (').

Numeric Character Entities

Use &#nnn; (nnn = decimal number) or &#xnnnn; (nnnn = hexadecimal number).
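
As a brief illustration, the sketch below lets Python's standard-library parser expand the five predefined entities and both forms of numeric reference. The element name t is arbitrary.

```python
import xml.etree.ElementTree as ET

# The five predefined entities expand to the characters they stand for.
named = ET.fromstring("<t>&lt; &gt; &amp; &quot; &apos;</t>").text

# Decimal 169 and hexadecimal A9 are the same code point (the copyright sign).
numeric = ET.fromstring("<t>&#169; &#xA9;</t>").text

print(named)
print(numeric)
```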

Named Entities from XHTML

Named entities (e.g. &copy;) are defined as part of the markup language.
Named XHTML entities are not available generally in XML documents.
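
A short sketch of the consequence, using Python's standard-library parser on a document with no DTD: the XHTML name &copy; is rejected, while the equivalent numeric reference is accepted. The helper name parses is an assumption.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


named_ok = parses("<p>&copy; 2008</p>")     # &copy; is not defined in plain XML
numeric_ok = parses("<p>&#169; 2008</p>")   # numeric references always work
print(named_ok, numeric_ok)  # False True
```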

XML document mechanics and meaning

XHTML and MathML


XML Namespaces

XHTML and MathML

Define prefix and namespace URI binding and default namespace:

Use MathML inline in XHTML document:

XML Namespaces

mathml

Parts

Purposes

Setting Default Namespaces

The following documents have the same "meaning":

Snippet XHTML and MathML, with "mml" prefix used for all "MathML" elements.

XHTML and MathML, with MathML defined as default namespace for "math" element and all children.
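
One way to check that the two styles mean the same thing is to compare the namespace-qualified element names a parser reports. The sketch below is an illustration with two small hand-written documents (not the lecture's snippets); Python's ElementTree reports each name as {namespace URI}local-name, so prefixes disappear from the comparison.

```python
import xml.etree.ElementTree as ET

# MathML elements carry the "mml" prefix.
PREFIXED = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:mml="http://www.w3.org/1998/Math/MathML">
  <body><mml:math><mml:mi>x</mml:mi></mml:math></body>
</html>"""

# MathML is the default namespace for "math" and its children.
DEFAULTED = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body><math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math></body>
</html>"""


def qualified_names(xml_text):
    """List every element name in document order, in {URI}local-name form."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


prefixed_names = qualified_names(PREFIXED)
defaulted_names = qualified_names(DEFAULTED)
print(prefixed_names == defaulted_names)  # True
```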

XML Namespaces: Prefix and URI

Need to bind a prefix to a URI.
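
A minimal sketch of why the binding matters, using Python's standard-library parser. The sample document is invented; the URI is the Dublin Core elements namespace. The query binds a different prefix ("dublin") to the same URI and still finds the element, while a document that uses a prefix without binding it is rejected.

```python
import xml.etree.ElementTree as ET

DOC = """<item xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator>Example Author</dc:creator>
</item>"""

# The prefix is only shorthand; the URI is what identifies the namespace.
bindings = {"dublin": "http://purl.org/dc/elements/1.1/"}
creator = ET.fromstring(DOC).findtext("dublin:creator", namespaces=bindings)

# A prefix that is never bound to a URI makes the document unparseable.
try:
    ET.fromstring("<item><dc:creator>Example Author</dc:creator></item>")
    unbound_accepted = True
except ET.ParseError:
    unbound_accepted = False

print(creator, unbound_accepted)
```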

W3C RSS document


GovTrack.US People RDF Data


Extensible Stylesheet Language Transformations (XSLT) and XML Path Language (XPath)

XSLT

XPath

Simple XSLT Example as an Introduction

XSLT

XSLT Processors at work

XSLT Processor


A closer look:

XSLT Processor

XSLT Processors

In this course we will be using the XSLT processors Saxon (Michael Kay) and Apache Xalan (Apache Software Foundation).

XSLT Examples

XSLT Example 1

BBC RSS

XSLT Example 2

BBC RSS

XSLT Example 3

BBC RSS

Default Rules for XSLT Processing
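
The built-in rules can be summarized as: for the root and for elements, keep processing the children; for text nodes, copy the text to the output; for comments and processing instructions, output nothing. Attributes are not visited by default. The Python sketch below only mimics the net effect of those rules (it does not run an XSLT processor), and the sample document is invented.

```python
import xml.etree.ElementTree as ET


def builtin_rules(element):
    """Mimic the default rules: descend through elements, keep only the text."""
    return "".join(element.itertext())


doc = ET.fromstring('<greeting lang="en">Hello, <b>world</b>!</greeting>')
result = builtin_rules(doc)
print(result)  # Hello, world!
```

The markup and the attribute value are gone and only the text remains, which is what a stylesheet with no templates of its own produces.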


The simplest XSLT

The result:

Common XSLT elements

xsl:template

defines a template that processes the nodes matching its pattern

xsl:apply-templates

selects a list of nodes and applies the matching templates to each of them

When you think about XSL, think in terms of recursion! Don't loop, recurse!

xsl:value-of

outputs the value of a selected node or expression
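
The interplay of these three elements can be imitated in Python to show the recursion. This is a simplified analogy, not XSLT: templates are plain functions keyed by element name, text nodes are ignored, and all names (process, apply_templates, the sample feed) are assumptions.

```python
import xml.etree.ElementTree as ET


def apply_templates(node, templates):
    """Like xsl:apply-templates: process each child with its matching template."""
    return "".join(process(child, templates) for child in node)


def process(node, templates):
    """Use the template registered for this element name, else just recurse."""
    rule = templates.get(node.tag)
    if rule is None:
        return apply_templates(node, templates)
    return rule(node, templates)


# Each entry plays the role of an xsl:template with match="channel" or match="item".
templates = {
    "channel": lambda n, t: "<ul>" + apply_templates(n, t) + "</ul>",
    # findtext("title") stands in for xsl:value-of select="title".
    "item": lambda n, t: "<li>" + n.findtext("title") + "</li>",
}

feed = ET.fromstring(
    "<rss><channel><title>Feed</title>"
    "<item><title>One</title></item>"
    "<item><title>Two</title></item>"
    "</channel></rss>"
)
html = process(feed, templates)
print(html)  # <ul><li>One</li><li>Two</li></ul>
```

No loop over items appears in any template; each template handles its own node and hands its children back to the matching machinery.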