ApacheCon 2001

  • April 4-6, 2001
  • Santa Clara Convention Center
  • Santa Clara, CA
  • Frank Fujimoto

General

ApacheCon continues to grow, covering more topics in greater depth, even though attendance doesn’t seem to grow very quickly. The 2000 conference included more talks about commercial products, but this time the organizers tried to have no talks based strictly on commercial products.

Apache 2.0 Beta was released during the conference.

Many Titanium G4 PowerBooks were being used, and I saw at least as many Linux laptops as Windows ones.

The conference organizers managed to get fairly good wireless connectivity set up, mostly through AirPort base stations which were brought in by conference delegates.

Cell phone ringing was quite prevalent and distracting. While reading from his latest book, David Brin finally stopped to ask that a phone which had rung several times be turned off.

Below are summaries of some of the more interesting talks I attended.

Apache-Specific Talks

The State of Apache

Roy Fielding

The current focus of the Apache Software Foundation is on the Apache 2.0 release, although there will be some minor 1.3.x changes as necessary.

Roy likens open source to shared custom software. More important than the code, however, is the open source development community, even though the code is needed to form the community.

With Apache, all work is done by someone who sees a need for something and goes ahead and does it. This way, the people who are writing code have an interest in getting it to work.

Apache 2.0

Ryan Bloom

Apache is becoming much more stable with version 2.0; the Windows port of the 2.0 beta is considered more stable than the released 1.3 Windows port.

Some of the new features with 2.0:

  • Multi-processing modules
    These modules allow the site administrator to select whether to use multiple children (as Apache 1.3 does), or a smaller number of children with multiple threads. In addition, these modules can be replaced with ones which are tuned for specific operating systems (BeOS, Windows, OS/2, etc.)
  • Hooks registered at runtime
    With all pre-2.0 versions of Apache, module hooks were statically declared at compile time. With 2.0, modules register the hooks they are interested in at startup, which means that it’s easier to add new hooks without recompiling all preexisting modules. In addition, modules can specify where in the callback chain they should be called (first, last, somewhere in the middle, or before/after other modules by name), and this can be done on a per-hook basis. (A sketch of the registration interface follows this list.)
  • Modules helping each other
    Modules can use services offered by other modules. For example, the Server-Side Includes module can call a CGI by invoking that service from the CGI module. This way, less code needs to be replicated in separate modules.
  • Filters
    Apache now has the ability to set up chains of filters, so the output of one module can be processed by another. As an example, CGI output can be passed through the Server-Side Includes module for more processing. I also attended Ryan’s talk about the technical aspects of filters; this functionality was implemented in a very general, expandable way. (A second sketch after this list shows the shape of a filter.)
  • Apache Portable Runtime
    This abstraction layer makes it much easier to port Apache to other platforms. The APR will be available as a separate product, for use in other applications. I attended Christian Gross’ talk about the APR, and it seems the Apache group did a good job in making a consistent API. (A small standalone example follows this list.)
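
To give a feel for the new hook mechanism, here is a minimal sketch of a 2.0-style module that registers a content handler at startup. The module and handler names are hypothetical, but ap_hook_handler() and the APR_HOOK_* ordering constants are the actual 2.0 interfaces (written against the API as eventually released; details were still shifting at beta time):

    #include <string.h>
    #include "httpd.h"
    #include "http_config.h"
    #include "http_protocol.h"

    /* A hypothetical content handler. */
    static int example_handler(request_rec *r)
    {
        if (!r->handler || strcmp(r->handler, "example-handler") != 0) {
            return DECLINED;  /* not ours; let another module try */
        }
        ap_set_content_type(r, "text/plain");
        ap_rputs("Hello from Apache 2.0\n", r);
        return OK;
    }

    /* Called once at startup: this is where a 2.0 module registers
       the hooks it is interested in, and says where in the callback
       chain it wants to run (first, middle, or last). */
    static void example_register_hooks(apr_pool_t *p)
    {
        ap_hook_handler(example_handler, NULL, NULL, APR_HOOK_MIDDLE);
    }

    module AP_MODULE_DECLARE_DATA example_module = {
        STANDARD20_MODULE_STUFF,
        NULL, NULL, NULL, NULL, NULL,  /* no config or command tables */
        example_register_hooks         /* 2.0-style hook registration */
    };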
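
The filter mechanism has a similarly small surface. Here is a hedged sketch of a pass-through output filter, again with hypothetical names; ap_register_output_filter() and ap_pass_brigade() are the real calls in the 2.0 API as released, though the interface was still settling at the time of the conference:

    #include "httpd.h"
    #include "http_config.h"
    #include "util_filter.h"

    /* A pass-through output filter. A real filter would inspect or
       rewrite the buckets in the brigade before sending them on. */
    static apr_status_t example_out_filter(ap_filter_t *f,
                                           apr_bucket_brigade *bb)
    {
        /* ... transform the bucket brigade here ... */
        return ap_pass_brigade(f->next, bb);
    }

    static void example_register_filters(apr_pool_t *p)
    {
        ap_register_output_filter("EXAMPLE", example_out_filter,
                                  NULL,               /* no init function */
                                  AP_FTYPE_RESOURCE); /* content-level filter */
    }

The filter is then enabled in the configuration (for example with SetOutputFilter EXAMPLE), and the core takes care of chaining it with whatever other filters apply.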
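
As for the APR, a small standalone program gives the flavor: everything is allocated from pools, and the file I/O calls below (apr_file_open() and friends) are the portable replacements for the usual POSIX calls. The path is just an example:

    #include <stdio.h>
    #include "apr_general.h"
    #include "apr_file_io.h"

    int main(void)
    {
        apr_pool_t *pool;
        apr_file_t *file;
        char buf[256];
        apr_size_t len = sizeof(buf) - 1;

        apr_initialize();              /* once per process */
        apr_pool_create(&pool, NULL);  /* APR resources live in pools */

        if (apr_file_open(&file, "/etc/motd", APR_READ,
                          APR_OS_DEFAULT, pool) == APR_SUCCESS) {
            apr_file_read(file, buf, &len);  /* len is in/out */
            buf[len] = '\0';
            printf("%s", buf);
            apr_file_close(file);
        }

        apr_pool_destroy(pool);        /* frees everything in the pool */
        apr_terminate();
        return 0;
    }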

With Apache 2.0, the SSL module will be implemented as a filter. One new feature will be that name-based virtual hosts will be able to use SSL. With pre-2.0 versions, this is not possible because the server must present its certificate before the HTTP request (containing the desired virtual host) is read. Apache 2.0 will support the TLS function of starting a connection as cleartext, and then switching to encryption; the HTTP request is read before the server certificate is presented to the client. This feature will require client support, and no current browsers have this ability.
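
For reference, this cleartext-then-upgrade scheme is the HTTP Upgrade mechanism from RFC 2817. On the wire it looks roughly like this (the hostname is made up); because the Host header arrives before encryption starts, the server knows which virtual host’s certificate to present:

    GET /index.html HTTP/1.1
    Host: vhost1.example.com
    Upgrade: TLS/1.0
    Connection: Upgrade

    HTTP/1.1 101 Switching Protocols
    Upgrade: TLS/1.0
    Connection: Upgrade

    (TLS handshake follows; the rest of the exchange is encrypted)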

WebDAV and Apache

Greg Stein

Greg sees WebDAV as an enabling technology which turns the web into a writable medium. He believes DAV has benefits for many people:

  • User (web surfer)
    Document metadata becomes available, such as the author and publishing date. Also, things such as more intelligent directory listings become available.
  • Author (content provider)
    DAV presents a standard means of publishing content, moving it around, and tagging with metadata. In addition, it provides collaborative tools, such as protection from overwriting in group scenarios.
  • Web administrator
    DAV doesn’t need to mirror the directory structure, so setup is much more flexible. Also, all authentication and authorization are done via HTTP, so any non-standard mechanisms (in our case, pubcookie) will also work. In addition, since it’s built on HTTP, DAV works through proxies and firewalls, and can take advantage of encryption which is already available.

With the mod_dav implementation for Apache, the available features provide infrastructure for collaboration (such as per-resource locking and lock timeouts), metadata, and namespace management (allowing different directories to be logically mapped onto one DAV server). Future features include version control, content/metadata searching, and finer-grained access control.
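
Since DAV is just new HTTP methods carrying XML bodies, the metadata and listing features ride on requests like PROPFIND. A rough sketch of what a client sends to enumerate a collection and its properties (the host and path are hypothetical); the server answers with a 207 Multi-Status response listing each resource:

    PROPFIND /docs/ HTTP/1.1
    Host: dav.example.com
    Depth: 1
    Content-Type: text/xml

    <?xml version="1.0"?>
    <D:propfind xmlns:D="DAV:">
      <D:allprop/>
    </D:propfind>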

Microsoft is using DAV in many places, such as to download mail from HotMail. In fact, it’s possible to configure a DAV client to do the same.

Many open source tools are available to work with DAV (sitecopy, which uploads a local site to the DAV server; cadaver, which offers a command-line ftp-like DAV client; Nautilus, a Gnome file manager, making DAV trees available in a GUI; Subversion, a DAV-based CVS replacement), as well as commercial clients (Dreamweaver 4, Adobe GoLive). DAV support is showing up in newer operating systems, too – Whistler can map URLs to filenames, which would include DAV repositories, and Mac OS X will allow users to mount a DAV repository into the filesystem.

Other Talks

AxKit

Matt Sergeant

AxKit performs server-side XML transformations, and is implemented in Perl. For the XSLT-specific transformations, the functionality is very much like how I’ve already implemented XML support for our servers. However, AxKit is very module-oriented, and content can come from many types of sources (XML, extracts from Word documents, etc.), as well as be presented in many different ways (printable, HTML, graphics-light, etc.)

The reason that AxKit was implemented on the server side is the lack of browser-side XSLT implementations. To help reduce server load, pages are cached based on the URI and other components (such as whether the output is printable or paginated HTML); requests with POST data are never cached.
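
For reference, enabling AxKit is mostly a matter of a few mod_perl lines in httpd.conf. This is a sketch from memory of the setup in AxKit’s documentation of the time, mapping a stylesheet MIME type to the Sablotron-based XSLT processor; treat it as an approximation rather than a recipe:

    PerlModule AxKit
    <Location /xml>
        SetHandler perl-script
        PerlHandler AxKit
        AxAddStyleMap text/xsl Apache::AxKit::Language::Sablot
    </Location>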

Why Logging Is A Complete Nightmare

Tony Finch

Tony was unable to make this presentation, but it was done by a colleague of his, Dirk.

There are many reasons why companies wish to log access information, such as collecting statistics on visitors, performance analysis, optimizing site usability (seeing which pages users visit), and debugging. However, there are many reasons why logging can slow down a server. Among them:

  • Reverse DNS lookups can add up to half a second per request. In many cases, it’s better to just log the IP address and do the DNS lookups later. However, don’t wait too long, lest the DNS data change. (A configuration sketch follows this list.)
  • Writing to disk can be slow. Tony believes that the most throughput you’ll get from a typical disk is 100 to 200 operations per second. By logging to a completely separate disk, you avoid competing with accesses to content pages. Also, commingling the data into one file and splitting it out later reduces disk seeking.
  • Many objects are logged. Some sites can increase performance by logging only page accesses, and not logging images or other auxiliary objects.
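
A couple of these points translate directly into Apache configuration. A minimal sketch using standard directives (the log path is hypothetical): reverse lookups are turned off so %h logs the bare IP address, which can be resolved later, offline, with the bundled logresolve utility, and the log itself goes to a dedicated disk:

    # Log the client IP address; resolve names later with logresolve.
    HostnameLookups Off

    # One common log on its own spindle; split it out per-site later.
    LogFormat "%h %l %u %t \"%r\" %>s %b" common
    CustomLog /logdisk/access_log common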

During the Q&A portion, people wondered about using syslog or feeding the log data directly into a database. Syslog is an unreliable mechanism, so not only would you not know whether all the entries got logged, you wouldn’t know how many got dropped. Logging to a database could end up being slower than spooling to disk; it’s better to write to disk and feed the data into a database later.

Keynote Speakers

Bill and Larry: Both Right, Both Wrong

John “maddog” Hall, Linux International

According to John, Bill Gates sees computing as the PC, fat clients, and .NET for delivery of applications. Larry Ellison sees thin clients plugged in everywhere, and the data is kept and computing done on large servers with databases. In the meantime, Scott McNealy says, “The network is the computer.”

As do many pundits, John likes to make predictions. However, he claims that many of his observations have come true. Some of them:

  • 1988: “I don’t understand the difference between workstations and PCs.”
    The engineers he managed were never able to tell him how workstations were fundamentally different from the PCs available at the time; they finally settled on “They run UNIX,” but later realized that SCO was available on Intel systems.
  • 1989: “Software will come in cereal boxes.”
    When John first saw a CD-ROM, he thought it would be a better way to distribute software than on TK50 cartridges. Several years later, a friend of his called, telling him he found a CD-ROM in a cereal box.
  • 1994: “It is inevitable.”
    When Linus Torvalds first showed Linux to John, he felt that it would become a big thing.

The way John sees the future is that the web will become a supplier of information at very low prices; this includes things like the weather and financial information. This information will be cached on the home computers, along with family calendars and other data.

John also feels that many items will become “intelligent” and networked together wirelessly. When you bring home a new device, it will automatically register with the other devices in the home, along with taking an inventory of other items with which it can communicate. The central computer will be remotely managed, since most people wouldn’t want to manage it themselves.

The home office will become like a Star Trek captain’s ready room, in that almost all the information one needs will be available there. Continuing with the Star Trek theme, personal holosuites aren’t too far from reality, since they can be seen as a logical step beyond what is now known as a media room.

Open Source and the Corporation

Lee Nackman, IBM

IBM has been investing heavily in open source, Linux and Apache in particular. They believe that open source projects need a process, and that open source is no longer just for mavericks. This process helps participation, and is a means for providing developer training.

Some of the benefits of open source: it’s what the customer wants (quality is high, price is low, and it reduces dependence on single vendors), and it reduces proprietary infrastructure.

IBM’s contributions to open source are not only code (they contributed to the Linux kernel, JFS, system management tools, and hardware support; Apache contributions include work on Jakarta and XML tools) but also things such as sponsoring conferences and offering legal support (an IBM lawyer helped set up the bylaws for the Apache Software Foundation, and then needed to bow out once it was formed, since he was still an IBM employee).

The face of open source has been evolving, and will continue to change. For example, open source has gone past just being “cool to techies,” and is now being accepted in the corporate world; effective standards have been helping with this. While it is threatening conventional business models, open source is providing new areas of potential growth for businesses.

Probing for Quicksand: How We Peer a Bit Ahead, Into Tomorrow’s World

David Brin

No, David Brin doesn’t look that bad in person; I had a hard time getting a good picture. Also, since he’s an author who is well-known for taking a long time between novels, it didn’t surprise me that he went beyond his allotted time.

One of Brin’s themes is that, in the big scheme of things, people like to think that things are worse than they really are. For example, while he agrees that schools today need improvement, many if not most students end up going to college, which is a big improvement over past generations.

Another theme is that today’s society is unlike any other in history. Previous civilizations were structured as a pyramid, with the elite few trying to retain power and keep it from the majority of people on the lower parts of the pyramid, mostly by withholding information. Brin sees today’s society structured as a diamond, with most people in the middle, and everyone trying to rise. Since information is much more easily available, people do move up, and since everyone has the opportunity to rise, the whole diamond is rising.

According to Brin, we’re heading into an age of amateurs. His reasoning is that the economy will be driven by people’s hobbies.
