WWW 2002

Sheraton Waikiki on left, Diamond Head on right

  • Frank Fujimoto
  • May 8-10, 2002
  • Sheraton Waikiki
  • Honolulu, HI

As in other years, I concentrated on the plenary sessions and panels, but instead of filling time with presentations of submitted papers, I went to several of the W3C update sessions. There was a lot going on, although there were fewer attendees than at previous WWW conventions.


Speakers

Tim Berners-Lee

  • W3C
  • Specs Count

Tim Berners-Lee, the developer of the World Wide Web, is the director of the W3C. In past plenary addresses, he’s presented his vision of a Semantic Web, but since that effort is gaining momentum, he didn’t focus on that this year.

Tim’s premise is that the web is built on a foundation of layers, and that each of those layers is based on standards to which everything must conform for things to work correctly. Each layer is a small abstraction of the layer below, and general enough to allow many different applications on the layer above. As an example, HTTP is general enough not only to transport HTML, but also to transport other types of information (such as XML).

Tim also touched on a couple of other topics, including patents. He recognizes that patents have their place, including rewarding research and protecting against plagiarism, but he also believes that most specifications should be royalty-free, as they have been in the past, and that new growth can be built onto new foundations only if they are royalty-free. Asked how he thought the web would look in 50 years, he said that more innovation will come from looking only a few years into the future rather than trying to predict what will happen decades from now. His analogy was that he had no idea what would happen with the web when it began; by the same token, he has no idea what today’s activities will enable in the future, but by making things general enough, it’s more likely that new ideas will be able to take off.

Ian Foster

  • University of Chicago
  • The Grid, Enabling Resource Sharing within Virtual Organizations

The Grid is a framework for connecting several computers together via a network and splitting a task among them. It’s somewhat like a generalized version of the SETI project, which divides signal analysis among computers all over the internet. Where the Grid goes beyond SETI is that a client can be set up as either a user or a provider of computing or storage services, and there are multiple qualities of service. Each request has a time-to-live (TTL) associated with it, and the requesting system needs to update those TTLs during the course of the job.
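To make the TTL mechanism concrete, here’s a minimal Python sketch of the lease-refresh pattern described above; the class and names are hypothetical, not part of any actual Grid toolkit API.

```python
import time

class ResourceLease:
    """A hypothetical lease on a Grid resource, governed by a TTL."""

    def __init__(self, resource_id: str, ttl_seconds: float):
        self.resource_id = resource_id
        self.ttl_seconds = ttl_seconds
        self.expires_at = time.monotonic() + ttl_seconds

    def refresh(self) -> None:
        # The requesting system renews the lease before it expires.
        self.expires_at = time.monotonic() + self.ttl_seconds

    def is_expired(self) -> bool:
        return time.monotonic() >= self.expires_at

lease = ResourceLease("compute-node-17", ttl_seconds=30.0)
for _ in range(3):          # stand-in for "while the job is running"
    time.sleep(10)          # do some of the work
    lease.refresh()         # renew well before the 30-second TTL lapses
```

The point of the TTL is garbage collection: a provider never has to trust that a failed requester will clean up after itself, since a lease that stops being refreshed simply expires and the resource can be reclaimed.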

Alfred Spector

  • IBM Research
  • Architecting Knowledge Middleware

Alfred believes that while the web of the early 1990s was elegant and simple, new technologies are now required, in particular for text analysis. However, text analysis technologies and techniques seldom work well together.

He believes that users would like seamless use of all kinds of information, and want good results. As an example, if you are looking for a product at an online store, you probably won’t find it if you misspell its name. However, if those misspellings can be collected and indexed, that could greatly help buyers. Another example is that adding knowledge to search can help, such as personalized information or task-specific context.
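As a rough illustration of the misspelling idea (not any particular store’s implementation), a fuzzy fallback over a product index can catch common typos; the catalog and names here are made up.

```python
import difflib

# Hypothetical product index for an online store.
catalog = ["PowerBook G4", "PowerBook G3", "Handspring Visor", "Palm m515"]

def search(query: str) -> list[str]:
    # Try an exact match first; fall back to fuzzy matching so a
    # misspelled query like "PowerBok" still finds the intended product.
    exact = [name for name in catalog if name.lower() == query.lower()]
    return exact or difflib.get_close_matches(query, catalog, n=3, cutoff=0.6)

print(search("PowerBok G4"))    # ['PowerBook G4', 'PowerBook G3']
```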

Alfred thinks that information retrieval, grammatical, statistical, and semantic technologies will all need to be combined to make this happen.

Rich Demillo

  • Hewlett-Packard
  • New Foundations for Trust and the Web

Much of Rich’s talk built up to how more hardware, firmware, and software trust will be attainable with the Itanium processor, but he had some interesting numbers along the way. He claims that more content will be created in the next three years than in the past 40,000 years (three years from now, there will be 57 exabytes of content: 3 EB created in 2002, 6 EB in 2003, and 12 EB in 2004). Also, much of the backbone traffic increase between 1996 and 2000 was business-to-business.

On the topic of trust and the web, he feels that a chain of trust must be established to guarantee that hardware and software can’t be corrupted. Each stage must have checks, since the chain of trust is only as strong as its weakest link, and those checks must be embedded into technology rather than patched on as an afterthought.
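As a toy sketch of that idea (not HP’s actual design), each stage’s code can be measured against a digest anchored somewhere tamper-resistant before control is handed to it:

```python
import hashlib

# Toy stand-ins for the code at each stage of the chain.
stages = [
    ("firmware",   b"firmware image bytes"),
    ("bootloader", b"bootloader bytes"),
    ("os_kernel",  b"kernel bytes"),
]

# In a real design the expected digests would be anchored in hardware
# (e.g. burned into ROM); here they are precomputed for illustration.
expected = {name: hashlib.sha256(code).hexdigest() for name, code in stages}

def verify_chain() -> bool:
    """Check every link; one failed measurement breaks the whole chain."""
    for name, code in stages:
        if hashlib.sha256(code).hexdigest() != expected[name]:
            print(f"stage {name!r} failed verification")
            return False
    return True

print(verify_chain())   # True until any stage's bytes are tampered with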

Pamela Samuelson

  • UC Berkeley
  • Towards a New Politics of Intellectual Property

Pam said that little attention was paid to copyright laws in the past because they mostly regulated public and commercial activities, rather than private and noncommercial ones. However, computers and networking changed the ease and cost of copying and distribution, and digital copies are perfect ones.

The widespread view among the public is that private copying is OK, since it’s only one copy and doesn’t hurt anyone. The industry’s views are far more extreme, however, ranging from wanting to crack down on a few people and prosecute them fully to set an example, to wanting general-purpose computers banned.

The latest view in the music industry is that every access in digital form requires making temporary copies in RAM, and that those copies should also be subject to copyright law.

Pam concluded with her view that a new form of copyright politics is needed, since over time copyright industries have gotten far more control than they need. Copyright law now affects everyone directly, so past and current politics are no longer applicable.

Panels

Web Engineering

  • Yogesh Deshpande, University of Western Sydney (Moderator)
  • Bebo White, Stanford Linear Accelerator Center
  • Paul Dantzig, IBM Research
  • Athula Ginige, University of Western Sydney
  • David Lowe, University of Technology, Sydney
  • Martin Gaedke, University of Karlsruhe

This panel tackled the question of whether web engineering is worthy of being its own field. Some panelists thought that it is, citing distinctive aspects of web engineers’ work such as the criticality of the applications and their on-time requirements.

All of the panelists thought that web engineering was worthy of being separate from software engineering, especially because of the need to draw from other disciplines.

XML and Databases: Fad or Disruptive Technology

  • Michael Rys, Microsoft (Moderator)
  • Paul Cotton, Microsoft
  • Andrew Eisenberg, IBM
  • Dana Florescu, BEA
  • Mary Fernandez, AT&T
  • Don Deutsch, Oracle

This was a rather lively panel, but instead of debating whether databases and XML were a good match, the time was spent talking about whether SQL as it stands works well with XML. Most of the panelists thought that SQL makes a good back-end database for XML content (and some thought the reverse was true, too). However, one panelist thought that a new kind of database is required, and that it’s a good time to rethink the DB field.

It was agreed that one existing problem is the movement and translation of data between XML and SQL.

Mary Fernandez mentioned in passing that her company uses compression with its data, and that sparked a conversation. Her contention is that with the large amount of information the company exchanges, there has to be at least some compression for the networks and systems not to become completely buried. Another panelist countered that compression makes the system less portable. One thing they didn’t mention is that field and attribute names can be made quite short, which can help reduce content size.
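A quick sketch of both points with made-up records: generic gzip compression shrinks verbose XML dramatically, and abbreviated element names reduce the raw size before compression even starts.

```python
import gzip

# Hypothetical records with verbose vs. abbreviated element names.
verbose = (b"<order><customerName>A. Smith</customerName>"
           b"<quantity>2</quantity></order>") * 1000
short = b"<o><cn>A. Smith</cn><q>2</q></o>" * 1000

for label, doc in (("verbose tags", verbose), ("short tags", short)):
    print(f"{label}: {len(doc):,} bytes raw, "
          f"{len(gzip.compress(doc)):,} bytes gzipped")
```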

Do Web Measurements Measure Up

  • Balachander Krishnamurthy, AT&T Research Lab (Moderator)
  • Marc Crovella, Boston University
  • Jeffrey Mogul, Compaq (HP) Western Research Lab
  • Eric Siegel, Keynote Systems
  • Andrew Certain, Amazon.com

It’s well-recognized that it’s very hard to get good information about usage of the web, and the panelists had different ideas on how the problem can be mitigated.

The web measurement efforts struggle with size (there are too many transactions to measure), complexity (many protocols and systems, and proxies complicate measurements), and change (workloads change faster than measurement tools and algorithms can keep up). People still have an interest in measuring to help enforce service level agreements, to find and fix problems quickly, to improve site revenue and load capacity, and to provide research data.

Another related topic was cache performance; Marc Crovella attributes its limits to the fact that web requests follow Zipf’s power curve: a small number of popular objects account for the vast majority of requests, while the enormous number of rarely requested objects in the tail still make up so much of the traffic that it’s very difficult to cache them effectively.
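A back-of-the-envelope sketch of why that tail hurts: with Zipf-distributed popularity (the exponent of 1.0 here is an assumption), caching even the thousand most popular of a million objects catches only about half of all requests.

```python
def zipf_hit_rate(num_objects: int, cache_size: int, alpha: float = 1.0) -> float:
    """Fraction of requests served when the cache holds the most popular objects."""
    # Under Zipf's law, the object of popularity rank r is requested
    # with weight proportional to 1 / r**alpha.
    weights = [1.0 / rank**alpha for rank in range(1, num_objects + 1)]
    return sum(weights[:cache_size]) / sum(weights)

# Cache the 1,000 most popular of 1,000,000 objects:
print(f"{zipf_hit_rate(1_000_000, 1_000):.0%}")    # about 52%
```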

Jeff Mogul comes from the research side of web log analysis, and has found it too difficult to get companies to send web logs (among other reasons, the companies are worried about privacy of the content, as well as giving away internal information about purchases and products). He’s concluded that it would probably be easier to provide code to the vendors and receive the aggregate totals in return.

Eric Siegel had an interesting anecdote about an ISP that blindly agreed to a service level agreement with a customer that it could never meet, so the customer got a lot of free service.

In the end, lots of ideas were tossed around, but it doesn’t look as if log analysis will get much better in the near future.

W3C Activity

The W3C usually has its own track where they present reports on current activity. One thing that is apparent is that the different working groups are working more closely together than ever to help standardize and modularize their work. Some of the topics:

XML Core

The primary goal of XML 1.1 is to make sure that XML 1.0 documents will continue to work. The working group will also provide the ability to specify a sequence of operations on a document (for example, first validate against a schema, then do an XSLT transformation).
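As a sketch of what such a pipeline can look like (using the third-party lxml Python library and made-up file names, not whatever mechanism the working group eventually specifies):

```python
from lxml import etree

doc = etree.parse("order.xml")

# Step 1: validate the document against its schema.
schema = etree.XMLSchema(etree.parse("order.xsd"))
schema.assertValid(doc)             # raises DocumentInvalid on failure

# Step 2: run an XSLT transformation over the validated document.
transform = etree.XSLT(etree.parse("order-to-html.xsl"))
print(str(transform(doc)))
```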

DOM Level 2 and 3

Level 2 work items for the Document Object Model include enhancing the core and events, as well as adding abstract schemas. DOM Level 3 will work on keyboard events, grouped events, and improving user events. The load and save features will be enhanced (including adding options for either synchronous or asynchronous loading and saving).

XSL and MathML

XSLT 2.0 will have added complexity, but much of that is due to the optional typing information – there will be no fundamental change in processing or in the data model. The typing information does make XSLT code longer, but it’s also much more explicit, and its output is easier to predict. However, XSLT 1.0 shorthand will still be accepted. While there will be small incompatibilities (which make XSLT 2.0 compatible with XQuery), only the edge cases should be affected.

MathML is an XML language used to express mathematics. There are two ways to use MathML, one which uses an infix-style notation for equation layout, and the other which uses prefix-style notation for interchange and storing of equations.

CSS: Targeting More Media

Current developments for CSS are profiles for different devices (such as handhelds and TV). CSS3 will be multi-modal (for example, sounds can accompany the display), have better print options (columns, headers and footers, footnotes, and cross-references), and support non-western typography.

SVG

SVG version of the demo image

GIF version of the demo image

Scalable Vector Graphics (SVG) was demonstrated in one session, and it’s looking very good. Adobe already has SVG support in many of its applications, as well as a plug-in which can be used in many browsers. The simple example above (presuming your browser has loaded the SVG viewer) shows that the text and ellipse scale (along with the drop shadow) when you zoom in. I was able to create it with Adobe Illustrator with just two objects (the ellipse with its drop shadow effect, and the text on a path).

While the size of the SVG file is larger than the equivalent GIF (16K vs. 4.5K), the GIF is not only non-scalable but has a limited color palette. It’s hard to tell unless you look closely, but the drop shadow exhibits some mottling in the GIF, where it’s smooth in the SVG (at least on a display with more than 8-bit color).
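For reference, here’s a rough sketch that writes an SVG along the lines of the demo (an ellipse with a faked drop shadow plus text); the markup Illustrator actually generated would look different.

```python
# A tiny hand-written SVG: an offset gray ellipse stands in for the
# drop shadow. Being vector, it stays sharp at any zoom level.
svg = """<svg xmlns="http://www.w3.org/2000/svg" width="220" height="110">
  <ellipse cx="114" cy="59" rx="95" ry="42" fill="#999999"/>
  <ellipse cx="110" cy="55" rx="95" ry="42" fill="#4a90d9"/>
  <text x="110" y="60" text-anchor="middle" fill="white">WWW 2002</text>
</svg>
"""
with open("example.svg", "w", encoding="utf-8") as f:
    f.write(svg)
```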

Not shown in this demo are the abilities to interact with SVG files or load from data stores.

During the Mobile SVG talk they showed screenshots of an SVG player on Palm and PocketPC devices.

Other

Fire knife dancer at convention banquet

As with past WWW conventions, there was a high percentage of international attendees.

XML isn’t as big a buzzword as at past conferences; it’s been replaced by applications of XML (XPath, XQuery, SVG, and MathML were mentioned quite a bit). There were even a couple of references to ChessGML, which describes chess games in XML. There are viewers where you can watch the game on a board move-by-move, or create a PDF with each move.

Apple supplied the wireless connectivity, and the coverage was complete, with no problems with the number of DHCP addresses.

While I found many PowerBooks at ApacheCon, there were fewer at WWW 2002. In fact, there were more of the older PowerBook G3 laptops than titanium PowerBook G4s. Tim Berners-Lee used a G4, saying he especially liked it since it reminded him of the NeXT machine he used to develop the first web browser.

To attest to the Hawaiianification of the attendees, I saw far more people wearing sandals towards the end of the conference than when it started.
