Web Standards Do - the Way of Web Standards
Sunday, March 30th, 2008
One of the things that made the Web so popular from its earliest days was its ease of access: HTML was simple. Anyone could write a web page. This is still true, to some extent, and thanks to a number of Web authoring tools and to services such as wikis, blog software and CMSes, anyone can create a web page. But Web technologies got richer: CSS, scripting, the DOM, SVG, widgets… From this increased richness and complexity rose a new group of people: the Web Professionals.
To the outside eye, Web Professionals are pragmatists, knowledgeable about technologies. They know Web Architecture; their bedside reading is the W3C Specifications. But insiders know that the Web Professionals are a highly dedicated and disciplined caste, following the age-old teachings of the Web Standards 道 - the Way (or Tao) of Web Standards, striving to achieve the seven virtues.
Disclaimer: this article is a humorous look at principles of Web quality, viewed through the filter of the Bushido, the Samurai’s code of Honor. It is a companion to a talk given at the Days of Web Standards Conference, in Tokyo on July 15th, 2007. The metaphor should be taken with a smile, the principles of Web Architecture it showcases, seriously.
The seven virtues of Web Standards 道
誠 - Honesty: Use Semantic Markup for profit
Most professional creators of Web content will certainly cite “valid markup” as one of the things they
care about a lot when working on the Web. This is a worthy goal, one that brings many benefits: among
others, it makes content more portable across platforms, and easier to style consistently.
But there are benefits to using HTML to its full potential way beyond validity. HTML is a well-structured
language providing meaning to its different elements, and making good use of semantics can reap a lot of benefits.
For example:
Using semantic elements instead of styling unstructured markup (e.g. using headings instead of bold text) will make the content easier for search engines to index, and thus easier to find on the Web.
Declaring the language of a document or a block (e.g. with the lang attribute) will allow tools and external services to know that your content is in this language: voice browsers can adopt the proper voice setting, and some services will even automatically provide a free translation of the content.
The link element, with its rel attribute, can be used for smooth navigation in collections of documents. Some browsers will also pre-fetch documents linked this way, resulting in a faster, more pleasant user experience.
Rich markup can also be queried, reused, rehashed: GRDDL can be used to extract and reuse data from documents that use rich markup such as RDFa or Microformats (learn how).
Many web sites invest a lot of time and money building complex APIs to access their information, when often, they could simply use rich, semantic HTML markup: HTML can be a cheap and efficient API.
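To make the "HTML as an API" idea concrete, here is a minimal sketch in Python, using only the standard library's html.parser module. It reads the headings, the declared languages and the link relations back out of a small page. The class name SemanticExtractor and the sample document are made up for this illustration; they are not taken from any tool mentioned above.

```python
from html.parser import HTMLParser


class SemanticExtractor(HTMLParser):
    """Collect headings, declared languages and <link rel> relations."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []    # (tag, text) pairs in document order
        self.languages = []   # value of every lang attribute seen
        self.relations = {}   # rel value -> href, from <link> elements
        self._open_heading = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("lang"):
            self.languages.append(attrs["lang"])
        if tag == "link" and attrs.get("rel") and attrs.get("href"):
            self.relations[attrs["rel"]] = attrs["href"]
        if tag in self.HEADINGS:
            self._open_heading = tag
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that sits inside an open heading.
        if self._open_heading:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._open_heading:
            self.headings.append((tag, "".join(self._buffer).strip()))
            self._open_heading = None


# Hypothetical sample page, invented for this example.
SAMPLE = """<html lang="en">
<head>
  <title>Chapter 2</title>
  <link rel="prev" href="chapter1.html">
  <link rel="next" href="chapter3.html">
</head>
<body>
  <h1>The seven virtues</h1>
  <h2>Honesty</h2>
  <p lang="ja">誠</p>
</body>
</html>"""

extractor = SemanticExtractor()
extractor.feed(SAMPLE)
print(extractor.headings)
print(extractor.languages)
print(extractor.relations)
```

Had the same page used bold text for its titles and no lang or link markup, none of this information could have been recovered without guessing.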
礼 - Respect/Etiquette: Use HTTP for Content/Language Negotiation
In a social context, etiquette is the art of acting and communicating in a manner appropriate to the context, taking into consideration who you are communicating with. This virtue can be followed in our usage of Web technologies: when serving Web content, it is important to take into account the capabilities and preferences of the user and the user agent.
That does not mean browser sniffing, which is the act of serving different content depending on which browsing engine is detected. Instead, HTTP provides mechanisms for a user agent to declare which types of content it supports and prefers (a feed reader, for instance, will claim a preference for the Atom format, while a graphical browser will typically prefer HTML), and which languages are acceptable and preferred (based on user preferences).
Using this Language-Negotiation technology we can provide a single resource under a single URI, but still serve it in different languages. For example when using the CSS Validator:
Tom, who speaks English as his mother tongue and whose browser sends Accept-Language: en, will see a page in English.
Tomoyo, who speaks Japanese and German and whose browser is set up to send Accept-Language: ja, de;q=0.8 will get the page in Japanese.
Finally, Tommi speaks fluent English but prefers Finnish, so his browser sends a header reflecting his preferences: Accept-Language: fi, en;q=0.9. Since the CSS Validator is not available in Finnish, he will receive his second choice, that is, English.
Another benefit? Even though they may each see the page in a different language, Tommi, Tomoyo and Tom can all link to the same resource and exchange links, and the content will automatically be adapted to each of them.
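The selection a server makes for Tom, Tomoyo and Tommi can be sketched in a few lines of Python. This is a simplified illustration, not the negotiation code of the CSS Validator or of any particular server: the function names and the AVAILABLE list of translations are assumptions made for the example, and a wildcard range is simply mapped to the default language.

```python
def parse_accept_language(header):
    """Return (language, q) pairs sorted by descending preference."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        lang = parts[0].lower()
        if not lang:
            continue
        q = 1.0  # a language sent without a q-value counts as q=1
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # unreadable q-value: treat as not acceptable
        prefs.append((lang, q, position))
    # Highest q first; ties keep the order in which they were sent.
    prefs.sort(key=lambda p: (-p[1], p[2]))
    return [(lang, q) for lang, q, _ in prefs]


def negotiate(header, available, default="en"):
    """Pick the best available language for an Accept-Language header."""
    for lang, q in parse_accept_language(header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if lang == "*":
            return default
        for candidate in available:
            c = candidate.lower()
            # "en" matches "en" and regional variants such as "en-gb".
            if c == lang or c.startswith(lang + "-"):
                return candidate
    return default


# Assumed list of translations, for illustration only (no Finnish).
AVAILABLE = ["en", "ja", "de", "fr"]

print(negotiate("en", AVAILABLE))              # Tom
print(negotiate("ja, de;q=0.8", AVAILABLE))    # Tomoyo
print(negotiate("fi, en;q=0.9", AVAILABLE))    # Tommi
```

With this list, Tom and Tommi are both served English and Tomoyo is served Japanese, matching the three cases above, while all three keep using the same URI.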
To learn more about language negotiation, including how to set up language preferences in browsers and enable negotiation on a Web server, see the techniques on the W3C Internationalization Web site.
仁 - Benevolence: Use Caching capabilities to save time and money
Most Web professionals know and fear this scenario: a Web site is getting some attention. Visitors are flowing to it, users are mashing up its rich, interesting content. But the systems administrators worry. The servers don’t seem to cope with the load very well. They’ll have to get some budget for a new server, and replication will be complicated. The site becomes slow, hardly usable. Users start to walk away and turn to the competition, which may not be as cool but, at least, works. Before the budget for a new server can be granted, it’s too late: the site has lost an opportunity to go from cool to successful.
Scalability is a complex issue, and sometimes its problems cannot be avoided. But often they can be avoided altogether, or at least alleviated, by making smarter use of Web technologies.
Smaller page weight can have a dramatically positive effect on page load times: this is one of many reasons to use clean, structured markup and CSS stylesheets rather than bloated presentational tag soup.
But there’s a part of the equation too often overlooked: caching. Images and style sheets seldom change: are you sure your Web server properly tells browsers, proxies and search engines that they are not changing, and should therefore still be considered “fresh”? Even dynamically generated content has a certain life span, and there are techniques to reflect that in how it is served, to ensure that server-heavy dynamically generated content will not be requested in vain when a cached copy would have worked.
This practice is a win-win solution for the server and the client:
For the server, this reduces network traffic dramatically. Large sites can save gigabytes of bandwidth per week simply by caching static documents, stylesheets, and especially images, videos and multimedia content. Fewer requests also mean less heavily loaded servers and faster response times.
For the client, this simply means faster page loads. Stylesheets and layout images, for instance, are loaded once and for all, making browsing faster, providing a better user experience. Remember the findings of Jakob Nielsen: wait more than a second, and you are already losing users.
How is this done?
Switching on caching in a directory where your images or stylesheets lie, on a server like Apache, is as trivial as adding a handful of lines to your configuration or .htaccess. With the Apache server, the mod_expires module, if enabled, can take care of sending the Expires and Cache-Control headers for you, alongside the Last-Modified header that Apache already sends for static files.
ExpiresActive On
ExpiresDefault "modification plus 4 weeks"
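To see what a "modification plus 4 weeks" policy amounts to on the wire, here is a small Python sketch that computes the corresponding freshness headers by hand. It is an illustration of the idea under stated assumptions, not a replica of what Apache does internally: the function name freshness_headers and the example dates are invented, and both datetimes are assumed to be timezone-aware UTC values.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def freshness_headers(last_modified, now, lifetime=timedelta(weeks=4)):
    """Build caching headers for a resource that stays fresh for
    `lifetime` after its last modification (UTC datetimes expected)."""
    expires = last_modified + lifetime
    # max-age counts seconds from the time of the response; never negative.
    max_age = max(0, int((expires - now).total_seconds()))
    return {
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        "Expires": format_datetime(expires, usegmt=True),
        "Cache-Control": f"max-age={max_age}",
    }


# Hypothetical stylesheet last edited on March 1st, requested two weeks later.
modified = datetime(2008, 3, 1, 12, 0, tzinfo=timezone.utc)
requested = datetime(2008, 3, 15, 12, 0, tzinfo=timezone.utc)

for name, value in freshness_headers(modified, requested).items():
    print(f"{name}: {value}")
```

A browser or proxy that receives these headers can reuse its copy for the remaining two weeks without contacting the server at all.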
PHP scripts are often used to draw content from a database. If the database has a field with the timestamp of the last relevant change, this can be forwarded to the user agents. For example, Simon Willison’s method: