The Web Is Just Text

In the beginning (around 1990 or so), the World Wide Web was composed of three parts:

  • HTML, a markup language for displaying rich documents
  • HTTP, a stateless protocol for communicating with a web server
  • The URI, a short, standardized reference to a network resource

Almost 20 years later, very little has changed, especially from the point of view of the server. To the list above, we have added client-side stuff like JavaScript and CSS, and HTML is now at version 4. But HTTP and the URI are close to unchanged.

And all three of the items in the list above are just text. Not just any text, mind you, but text which is formatted in such a way as to be mostly easy for humans to read.

This means that what web servers do is very simple. They accept text over a text-based protocol and return more text.
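To make that concrete, here is a minimal sketch of one HTTP exchange written out as ordinary Python strings. The request, the response, and the `parse_response` helper are all made up for illustration (nothing here was captured from a real server), but the shape is what actually travels over the wire.

```python
# A canned HTTP/1.1 exchange, written out as plain text.
# The host, path, and body are illustrative assumptions, not real traffic.
BODY = "<html><body>Hi</body></html>"

REQUEST = (
    "GET /foo.php HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Connection: close\r\n"
    "\r\n"
)

RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    f"Content-Length: {len(BODY)}\r\n"
    "\r\n"
    + BODY
)


def parse_response(raw):
    """Split raw HTTP response text into (status_code, headers, body).

    Hypothetical helper: it handles only the simple case shown here
    (no chunked encoding, no repeated headers).
    """
    # A blank line separates the status line and headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


status, headers, body = parse_response(RESPONSE)
print(status, headers["content-type"])
print(body)
```

Nothing in it needs more than string splitting, which is rather the point.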

Despite this, people occasionally seem to find web servers confusing, in part because we like to lie about what they do. For example, you might see a URI which looks like this:

http://www.example.com/foo.php

This is a lie. If you point a web browser at this URI, you will almost certainly not receive a PHP source file; instead, you will probably get HTML. That the HTML was generated with PHP source code is an implementation detail unsuitable for inclusion in a Uniform Resource Identifier.
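As a small illustration, Python's standard `urllib.parse.urlsplit` will take that URI apart. The sketch below only shows that ".php" is a few characters at the end of the path string; nothing in the URI's structure gives it any special meaning.

```python
from urllib.parse import urlsplit

# The example URI from above, split into its textual components.
parts = urlsplit("http://www.example.com/foo.php")

print(parts.scheme)  # http
print(parts.netloc)  # www.example.com
print(parts.path)    # /foo.php
```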

Some people might think I’m being pedantic here. How many end users actually care what the extension in the URI is? These are probably the same people who post questions on Stack Overflow wondering why their web application doesn’t work, without including in the question the only things that the server really cares about: the three items in the list at the beginning of this post. It doesn’t matter if you generated your HTML using aspx, rhtml, or PHP. It’s still just plain HTML to the browser. When you submit a form, you’re submitting name/value pairs of strings, not "integers" and "dates" and "classes."
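Here is a rough sketch of what that means for a form post, using Python's standard `urllib.parse` module. The field names `quantity` and `ship_date` are invented for the example. The encoding step stands in for what a browser does with a form, and the parsing step for what a server-side framework does before any model binding happens.

```python
from urllib.parse import urlencode, parse_qsl

# Hypothetical form fields: one "integer" and one "date".
form_fields = {"quantity": 3, "ship_date": "2009-02-17"}

# What the browser sends as the body of the POST is one line of text.
body = urlencode(form_fields)
print(body)  # quantity=3&ship_date=2009-02-17

# What the server receives is name/value pairs of strings.
received = dict(parse_qsl(body))
print(received)  # {'quantity': '3', 'ship_date': '2009-02-17'}
```

The integer went in as `3` and came out as the string `'3'`. Any typing on the server side is something your framework layers on afterwards.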

That’s why this matters to a programmer: Web applications, as a class, are extremely easy to debug. Tools like Firebug and Fiddler allow you to see the text that is exchanged between the browser and the server, and even pretty it up a little to make it more readable than it already is. But already built into your web browser is the single most important tool that you need for debugging a web application: the "View Source" command. Chances are very good that if you look at the HTML, the HTTP, and the URI, you can solve whatever issue you’re having with your web application. And if you can’t, including whichever of these three is relevant to the problem at hand will dramatically increase your chances of getting a solution when you ask someone else for help.

Perhaps it’s the fact that we’re used to debugging within an IDE that causes us to look at, for example, aspx source code instead of HTML when a web page doesn’t render the way we’d like, or to debug through model binders instead of looking at the form submission when the input to a controller action is not what we expect. But remember, the browser doesn’t see your aspx or PHP. The browser just sees the HTML you generate.

Perhaps another reason why this tends to confuse people is that certain web frameworks, such as ASP.NET, introduce fictions such as "event handlers" and "postbacks." No such concepts exist within HTTP. Instead, they are (extraordinarily leaky) abstractions built on top of HTTP within a certain framework. But under the hood, there is still just HTTP. If you want to understand the communication between your web application and a browser, look at the HTTP and the HTML.

I’m going to close this post with a recommendation: If you’ve never visited a web site via telnet, go do it right now. It only takes a second, and you’ll learn a lot in the process. This is an easy way to understand at a very fundamental level how web browsers and web servers talk to each other. If you need to debug your web application at a client site and can’t install Fiddler, you can probably use telnet instead.
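If telnet isn't available either, a few lines of Python can do roughly what you would type into a telnet session by hand. This is a minimal sketch, not a real HTTP client: `build_request` and `fetch_raw` are names I made up, it speaks plain HTTP only (no HTTPS), and it simply reads until the server closes the connection.

```python
import socket


def build_request(host, path="/"):
    """Return the text you would otherwise type into telnet by hand."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"  # ask the server to hang up when it is done
        "\r\n"                   # blank line: end of the request headers
    )


def fetch_raw(host, path="/", port=80, timeout=10.0):
    """Send a GET request over a bare TCP socket and return the raw reply.

    The result is the status line, headers, and body exactly as the server
    sent them, decoded as ISO-8859-1 so that every byte maps to a character.
    """
    request = build_request(host, path)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("iso-8859-1")
```

Calling `print(fetch_raw("www.example.com"))` should show the status line, the headers, a blank line, and then the HTML, all of it readable text.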

This is the first article in a series on the dynamic web.

{ 1 } Comments

  1. Jan Derk | February 17, 2009 at 12:57 am | Permalink

    Thanks for pointing out Fiddler. It is a very handy tool.
