An Overview of the Web

November 22, 2012

Written as a Thanksgiving present for Kate Compton

OVERVIEW

A device you access the Internet with is a computer. Even if you think of it as your phone or your tablet or your laptop or your desktop, all of these things are actually computers. Your computer, when connecting to the Internet, is called a “client”. Your client, running a program that acts on your behalf (called a “user agent”) sends messages to another kind of computer called a “server”, that does its best to answer your client’s questions. The language the user agent uses to talk to the server is called a “protocol”. When you’re requesting a web page, your user agent is your web browser – such as Firefox, Chrome, Safari, or Internet Explorer – and this user agents uses the HyperText Transfer Protocol (HTTP) to ask questions of a “web” server and receive a response. There are other protocols, like the File Transfer Protocol (FTP), the Simple Mail Transfer Protocol (SMTP, used for email), Skype, and many others. Many downloaded multiplayer video games use their own special protocols to talk to game servers. Let’s dig in a little more into what happens when you fetch a web page.

Your computer is connected to the Internet most commonly via WiFi or the cell network. Your computer has an antenna inside that communicates with a nearby special computer called a “base station” that also has a similar kind of antenna in it. Your computer says hello to the base station and asks for an Internet Protocol (IP) address using a protocol called DHCP; the base station responds with an IP address assignment, let’s say 10.0.1.102, and some other information, including a Domain Name System (DNS) address. Your computer is now ready to try and send requests to a server over the Internet.

When you type in an address into your web browser, like “news.ycombinator.com”, your computer first needs to look up the IP address for that server so it can send the request. To do this, it sends a DNS request to the DNS server returned by the base station. In this case, the base station looks up what server is responsible for all of .COM, asks that server who’s responsible for Ycombinator.com, then asks that server what the address is for news.ycombinator.com. Having obtained the answer, the base station then responds to your client with the IP address for news.ycombinator.com, which is 174.132.225.106. Most IP addresses look like this: four numbers between 0 and 255. You can also force your client’s operating system to use a particular name server – OpenDNS (208.67.222.222, 208.67.220.220) and Google (8.8.8.8, 8.8.4.4) both offer high-quality free DNS resolution to the public which is often much better than you’d otherwise get and allows for some protection against malware and phishing.

(SIDEBAR: There is an updated version of IP called IPv6 that uses sixteen numbers, but it’s not very broadly used yet – because we’ve taken so long to adopt IPv6, we’ve basically run out of IP addresses in the current scheme, IPv4.)

The server needs to know what protocol we are trying to use to speak to it, so we pick a “port number” that indicates that we’d like to talk to the server over HTTP. HTTP uses port 80. (That number is somewhat arbitrary but is assigned by an international committee on a per-protocol basis.) HTTP sits on top of a lower-level protocol that makes sure that information arrives in the proper order and can recover from interruption – this lower-level protocol is called the Transmission Control Protocol, or TCP. TCP uses a “three way handshake” to make sure that the client and server are actually talking to each other before any higher level communication occurs. It starts with the client sending a hello to the server’s IP and port – this greeting is, for historical reasons, call a synchronization packet, or SYN. If the server is reachable and is running a program that is listening on the given port, the server will attempt to respond to the client with an acknowledgement, called a SYN-ACK. Finally, the client lets the server know that the SYN-ACK was received successfully with an ACK. At this point, the client and the server are now ready to begin communicating and the connection is said to be “open”.

In HTTP, the client speaks first (this is not the case with all protocols!). The client starts with saying what kind of request it is making, the request itself, the version of HTTP it is trying to speak, and several other request headers. In some cases, such as after submitting a form or when uploading a file, there’s also data attached after the headers. These request headers include the name of the user agent, any cookies associated with that website, and other information.

The server then issues an HTTP response starting with a response code – a three digit number that indicates success, a request for a redirect, or different kinds of errors. You may have seen “404 errors” on the web before – that is the code to indicate that a resource could not be found on the server. 200 is the code that indicates that the request was successful and the response will include the result. (There are many other codes.) The server response also has headers and then data.

To save time, the client is then free to make another request to the server over this connection without having to complete another TCP three way handshake. This is called “keep alive”. If, after a certain period of time, the client hasn’t yet made another request, the server will close the connection (by sending a TCP FIN packet to the client).

WEB SERVERS

Now let’s take a look at that program that runs on the server that listens for HTTP requests. The server must be able to piece together the request, figure out what to do, and transmit a response. If the request is for a file, say an image or logo, this is pretty straightforward – we check to see whether or not the file exists. If the file doesn’t exist, we send the client a “404” and we’re done. If the file does exist, we send the client a “200”, the length and type of the file, and then the file itself. This is called a “static resource” – anyone who requests that image will get the same image. More difficult are responses that are different depending on who asked – for instance, going to www.Facebook.com results in different information being displayed depending on whether you or your friend is currently logged in. Some code logic separate from the basic mechanics of a web server needs to evaluate “hey, is this person currently logged in, is it valid, and what information am I going to need to show to them?” This code is called a “web application”. Web applications can be written in many different programming languages, but it is popular to write them in environments where it’s harder to write bugs that can crash your server. (Server crashes can not only make your service unavailable they can sometimes be used to break into your server and steal information.) This is called a “managed” environment.

People get into religious wars about what server programming language is best to use, so suffice it to say there are several choices, each with their attendant pros and cons. In my experience, there are three general camps: the first is .NET programmers, who write in a language called ASP.NET or another called C# (pronounced “See Sharp”). C# in particular is viewed as a very nicely designed language but it only runs well on Microsoft servers, which in turn more or less require all of your servers to be running on Microsoft. This is not very popular in Silicon Valley.

The second camp is Java programmers. Since Java has been around for over 15 years now, there’s a lot of history and a big community around the language, with many different and sophisticated techniques for programming server applications. Culturally, however, it’s rare to see hip startups using Java – most of the Java code I see these days is at larger and enterprise-oriented companies. Part of this is that there has been time to develop careful engineering methodologies for building and testing Java code using large teams.

Finally, there is the camp of those who are using interpreted server languages, such as PHP, Ruby, Python, or Javascript. I’ll break each of these sub-camps down, though I should note that these camps share more in common with each other than they do with Java or .NET.

A decade ago, a language called PHP was pretty popular, and it is still in very wide use. Facebook, Wikipedia, Yahoo, and Zynga use PHP to write most of their server code. There are a lot of people with PHP programming experience, so it’s pretty easy to hire them or find code examples. That said, there are many people who really dislike PHP due to inconsistencies within the language (which grew very organically from cramming many different libraries together) and find it inelegant and difficult to use in very large projects.

A very clean and sophisticated language called Ruby was created way back in 1995 by a Japanese programmer, but it didn’t become very popular until 2005, when a Chicago design agency published as open source a clever way of writing clean, powerful web applications in Ruby. They called this set of code and libraries “Rails”. (A combination of code and libraries that offer a way to program an application is called a “framework”.) Rails introduced a number of concepts that make it fun and easy to program. Many of these concepts have since been copied into other languages and frameworks.

There is another very nice programming language called Python that is very popular inside Google. It is a nice mix of being relatively easy to learn but sophisticated at the same time – many of the smartest nerds I know prefer to write their most interesting code in Python.

Finally, it has become very popular in the last two years to write server applications in Javascript, using a framework called Node.js. Javascript is actually a totally different language from Java. It has no relationship to Java at all! Code for Node.js is written in a special way that forces you to not cause the server to twiddle its thumbs needlessly waiting (called “blocking”). This is very different from how applications in many other environments work, where it is very difficult to avoid accidentally blocking. As a result of this, it’s possible to write code that serves tens of thousands of people at the same time on a single machine, whereas that is hard to do with e.g. PHP.

DATABASES

Once you have written your application, you generally need to be able to store data somewhere. If people can sign up for your service, you’ll need to save people’s username, password, profile picture, and basic information. The place where you keep this information around is called a “database”. It used to be very popular to use one particular kind of database called a “relational database”. Almost everyone in the startup community used a database called MySQL, though some (like Instagram!) used another one called Postgres; people at larger companies used relational databases from IBM (DB2), Oracle, or Microsoft (SQL Server). There’s also a very tiny SQL database called SQLite that’s easy to embed inside other programs or for very lightweight needs and is popularly used in that way.

There is a Standard Query Language (SQL) that one uses to talk to a relational database. You can INSERT information into the database, or REPLACE or DELETE it, or if you’re looking to ask it questions you can SELECT information from it. Information is stored in TABLEs that you can think of as Excel sheets – there are labeled columns and any number of rows of information. You SELECT what columns you want to fetch (or * for “all of them”) FROM which table and then WHERE certain criteria are met. Let’s say you wanted to get a list of all of the names and email addresses of the thirtysomethings using your service so you could send them an email about how to avoid getting gray hair. You’d “SELECT name, email FROM users WHERE age >= 30 AND age < 40”. Fun fact: the airport in San Carlos, CA – just south of Oracle – has the airport code SQL.

There are other ways to store and manage information, however. Sometimes you don’t need something as fancy as a relational database. For instance, you might need to check whether or not a user is currently logged in – you need to be able to check this quickly or every page on your site will be slow. You might store a “sessionID” and when it is “validUntil”. When a client presents you with a sessionID, you just need to check to see whether you know about the sessionID and whether it’s still valid. If it isn’t, you send the user to a login screen. Because you want to be able to check this information very quickly, it could be a good idea to store it in memory without writing it to a hard drive – memory is often thousands of times faster than hard drives. Of course, if the power goes out, after the server reboots everyone is going to need to log in again, but this is not a really catastrophic failure. This kind of setup is served with an “in-memory key-value store” such as memcached or Redis. A database like MySQL spends a lot of its time “parsing” SQL, but a simple database like memcached can use a much simpler protocol and therefore run much faster than talking to MySQL via SQL. Consequently, databases that use simpler protocols than SQL are popularly called “NoSQL”, also implying that the database is likely not relational.

Also popular are “document oriented databases” which have much less rigid definitions around how the data should be stored and organized. MongoDB is one of the most popular of these, though CouchDB, and Cassandra are both in common use. The kind of flexibility afforded by document oriented databases can make it easy to rapidly develop sophisticated applications, though there have been complaints about how some of these databases perform when under very heavy load (e.g. if your website suddenly becomes very popular).

CLOUD

All of this above was written assuming that you are running your own server. This used to be pretty common to do. The most popular kinds of servers look kind of like extra large pizza boxes. They are exactly 19″ wide, little more than an inch tall, and quite long. They were this funny shape so you could stick a lot of them on top of each other, bolting them to vertical rails 19″ apart. You’d put a rack full of servers in a place with a really fast Internet connection where other people would also have racks full of servers. Since you are all putting your servers in the same place, these places are called “colocation centers”. If you are a really enormous Internet company, you need so many servers that it makes sense to build your own facility, or “datacenter”. Apple, Google, Microsoft, and Facebook have all built their own datacenters, generally where electricity is cheap and plentiful, like near hydroelectric facilities.

But it’s hard to manage your own servers, particularly if you suddenly need a lot of them. So a few companies allow you to quickly ask to have access to new servers, use them for a bit, and then stop using them, and only pay by the server-hour. This ability to rapidly scale up and down the amount of computing power you need is called “elastic computing”. Amazon has the most popular cloud offering, but many other companies offer elastic compute services, including Microsoft and Rackspacetags.

These elastic compute services are still quite low-level, however. You usually start off with what appears to be a very bare operating system installation and have to install all of the programming languages and modules and environments you are going to need to actually run your application. While in reality there are many of these “virtual machines” running on a single server, a special operating system called a “hypervisor” manages running all these operating systems and makes sure that they can’t see what each other are doing. You’ll generally get a randomly assigned IP address for your instance – if you want to build a service that’s exposed on the web, you’ll probably have to register at least one static IP and point the name you want at that IP. This usually costs a very small amount extra.

It’s surprisingly hard to usefully add dozens of servers to speed up your service. If an application is performing slowly because it is trying to write to a database a lot, adding more computing power is not going to help it. Surprisingly, even adding more database servers won’t necessarily help, since if you’re bottlenecked on writes and all the servers need to stay up to date, they’re all now going to be bottlenecked on writes. You need to “partition” the problem, so that some writes go to one database server and others go to another. Figuring out how to smoothly partition data in a way that you can easily add and remove servers to make your database more or less powerful is a bit of an art. As a quick sidebar, because these are intellectually exciting challenges, many people immediately want to jump in to figuring out how to make an application “scalable” without first ensuring they are building a popular application – i.e. one that will need to scale. An old saying is that “premature optimization is the root of all evil”. Instead, scale your solution as you need. Two great reads on this are the LiveJournal presentation about how Brad Fitzpatrick scaled LiveJournal from one script run on a shared server to a sophisticated multirole cluster (building now-vital parts of Internet infrastructure like memcached and mogilefs in the process), and the Instagram presentation about how two guys with no backend experience built a service used by tens of millions without hiring very many people.

Some services like Heroku and CloudFoundry take care of some aspects of scaling for you, helping make sure that your operating system and software is up to date and secured and that you can seamlessly “dial up” and “dial down” computing resources without having to be a whiz. Naturally, these services are more expensive – many themselves run atop elastic compute services like Amazon’s.

BROWSERS

Now back to the client. We’ve got our data from the server that we asked for…what now? On the web, the data that we’re expecting to get back for a web page is a text document formatted as HyperText Markup Language. On most web browsers you can “View Source” as easily as right-clicking on a web page to read through the document that the server returned to you. This document contains a header and the body of the content. By default, whatever is in the body of the document will be displayed in your web browser as text. With HTML you can “tag” certain chunks of text. Tags have a beginning and an end. If a web browser has no idea what a given tag means, it will just ignore it. Tags are generally structural (here is a new paragraph), images (display a picture that you should get from this location), stylistic (make the following words bold), or script (run the following Javascript program in the browser).

<html>
 <head>
  <title>A Web Page!</title>
 </head>
 <body>
   <h1>Hello!</h1>
   <p>This is a <b>web page</b>, wow.</p>
  </body>
</html>

The web browser parses the document “tree” (tags can be nested inside each other) and all of its tags and stores it in an in-memory database of its own, called the Document Object Model (DOM). The browser then figures out how to style the information to be presented in the DOM and then renders it to the screen. Browsers also know how to run Javascript code given to it by the server. This code is run in a very limited environment – for example, it can’t look at what’s on your hard drive and it stops running immediately as soon as you close the web page. Modern web browsers have competed vigorously on how quickly they can run Javascript code and the result has been an astonishing improvement in only a few years. Very sophisticated programs can now be run in the browser that would have been unthinkable only five years ago. Browsers continue to rapidly grow more sophisticated, in the last two years allowing access to web cameras, phone accelerometers, GPS, and 3D hardware. The code itself can be included inline in the HTML using the