In part II of this series we looked at HTTP messages. We saw examples of the text commands and codes that flow from the client to the server and back in an HTTP transaction. But how does the information in these messages move through the network? When are the network connections opened? When are the connections closed? These are some of the questions this article will answer as we look at HTTP from a lower level. First, we'll need to understand some of the abstractions below HTTP.
The Series:
Part I: Resources
Part II: Messages
Part III: Connections (you are here)
Part IV: Architecture
Part V: Security
A Whirlwind Tour of Networking
To understand HTTP connections, we have to know a bit about what happens in the layers underneath HTTP. Network communication protocols, like most business applications, consist of layers. Each layer in a communication stack has a specific and limited set of responsibilities.
HTTP is what we call an application layer protocol because it allows two applications to communicate over the network. Quite often one of the applications is a web browser, and the other is a web server like IIS or Apache. We saw how HTTP messages allow the browser to request resources from the server. But the HTTP specifications don't say anything about how the messages actually cross the network and reach the server; that's the job of lower layer protocols. A message from a web browser has to travel down through a series of layers, and when it arrives at the web server it travels up through a series of layers to reach the web server process.
The layer underneath HTTP is a transport layer protocol. Most HTTP traffic travels over TCP (short for Transmission Control Protocol) in this layer, although TCP isn't required by HTTP. When a user types a URL into the browser, the browser opens a TCP socket by specifying the server address and port, then starts writing data into the socket. All the browser needs to worry about is writing the proper HTTP message into the socket. The TCP layer accepts the data and ensures the data gets delivered to the server without getting lost or duplicated. TCP will automatically resend any information that might get lost in transit. The application doesn't have to worry about lost data, and this is why TCP is known as a reliable protocol. In addition to error detection, TCP also provides flow control. Flow control ensures the sender does not send data too fast for the receiver or the network to process the data. Flow control is important in this world of varied networks and devices.
In short, TCP provides many vital services for the successful delivery of HTTP messages, but it does so transparently. Most applications don't need to worry about TCP at all. And TCP is just the first layer beneath HTTP. Beneath TCP at the transport layer sits IP, a network layer protocol.
IP is short for Internet Protocol. While TCP is responsible for error detection, flow control, and overall reliability, IP is responsible for taking pieces of information and moving them through the various switches, routers, gateways, repeaters, and other devices that move information from one network to the next and all around the world. IP tries hard to deliver the data to the destination (but it doesn't guarantee delivery; that's TCP's job). To deliver data, IP requires computers to have an address (the famous IP address, an example being 208.192.32.40). IP is also responsible for packaging data into packets (often called datagrams), and sometimes for fragmenting and reassembling these packets so they fit the size limits of a particular network segment.
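Routers and operating systems handle fragmentation for you, but the idea is easy to model. The sketch below is a simplified, hypothetical illustration rather than real IP. It splits a payload into pieces no larger than a given size and tags each piece with its byte offset. It then reassembles the pieces by offset, even if they arrive out of order. Real IPv4 fragments carry the offset in 8-byte units in the IP header, along with a "more fragments" flag.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified fragment: real IPv4 stores the offset in 8-byte
// units plus a "more fragments" flag; here we keep a plain byte offset.
public sealed class Fragment
{
    public int Offset { get; }
    public byte[] Data { get; }
    public Fragment(int offset, byte[] data) { Offset = offset; Data = data; }
}

public static class FragmentationSketch
{
    // Split a payload into pieces of at most maxSize bytes.
    public static List<Fragment> Split(byte[] payload, int maxSize)
    {
        if (maxSize <= 0) throw new ArgumentOutOfRangeException(nameof(maxSize));
        var fragments = new List<Fragment>();
        for (int offset = 0; offset < payload.Length; offset += maxSize)
        {
            int length = Math.Min(maxSize, payload.Length - offset);
            var data = new byte[length];
            Array.Copy(payload, offset, data, 0, length);
            fragments.Add(new Fragment(offset, data));
        }
        return fragments;
    }

    // Reassemble fragments that may arrive in any order by sorting on offset.
    public static byte[] Reassemble(IEnumerable<Fragment> fragments)
    {
        return fragments
            .OrderBy(f => f.Offset)
            .SelectMany(f => f.Data)
            .ToArray();
    }
}
```

Splitting a 2,500-byte payload with a 1,000-byte limit yields fragments at offsets 0, 1000, and 2000. The last fragment holds the remaining 500 bytes.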
Everything we've talked about so far happens inside a computer, but eventually these IP packets have to travel over a piece of wire, a fiber optic cable, a wireless network, or a satellite link. This is the responsibility of the data link layer. A common choice of technology at this point is Ethernet. In Ethernet, packets become frames, and low-level protocols like Ethernet are focused on 1s, 0s, and electrical signals.
Eventually the signal reaches the server and comes in through a network card where the process is reversed. The data link layer delivers packets to the IP layer, which hands over data to TCP, which can reassemble the data into the original HTTP message sent by the client and push it into the web server process. It's a beautifully engineered piece of work all made possible by standards.
Quick HTTP Request with Sockets and C#
If you are wondering what it looks like to write an application that makes HTTP requests, the following C# code is a simple example. This application does almost no error handling, and it writes the server response to the console window (so you'll need to request a textual resource), but it works.
```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;

public class GetSocket
{
    public static void Main(string[] args)
    {
        var host = args[0];      // e.g. odetocode.com
        var resource = args[1];  // e.g. /
        var result = GetResource(host, resource);
        Console.WriteLine(result);
    }

    private static string GetResource(string host, string resource)
    {
        // Resolve the host name to one or more IP addresses.
        var hostEntry = Dns.GetHostEntry(host);
        using (var socket = CreateSocket(hostEntry))
        {
            SendRequest(socket, host, resource);
            return GetResponse(socket);
        }
    }

    private static Socket CreateSocket(IPHostEntry hostEntry)
    {
        const int httpPort = 80;
        // Try each address until a TCP connection succeeds.
        foreach (var address in hostEntry.AddressList)
        {
            var endPoint = new IPEndPoint(address, httpPort);
            var socket = new Socket(endPoint.AddressFamily, SocketType.Stream, ProtocolType.Tcp);
            try
            {
                socket.Connect(endPoint);
                return socket;
            }
            catch (SocketException)
            {
                // Connect throws on failure; release this socket and try the next address.
                socket.Dispose();
            }
        }
        throw new InvalidOperationException("Could not connect to " + hostEntry.HostName);
    }

    private static void SendRequest(Socket socket, string host, string resource)
    {
        // Request line, Host header, and a blank line. "Connection: close" asks the
        // server to close the connection after responding; otherwise the HTTP/1.1
        // connection stays open and the receive loop waits for the server's idle timeout.
        var requestMessage = String.Format(
            "GET {0} HTTP/1.1\r\n" +
            "Host: {1}\r\n" +
            "Connection: close\r\n" +
            "\r\n",
            resource, host);
        var requestBytes = Encoding.ASCII.GetBytes(requestMessage);
        socket.Send(requestBytes);
    }

    private static string GetResponse(Socket socket)
    {
        int bytes;
        byte[] buffer = new byte[256];
        var result = new StringBuilder();
        // Receive returns 0 once the server has closed its side of the connection.
        do
        {
            bytes = socket.Receive(buffer);
            result.Append(Encoding.ASCII.GetString(buffer, 0, bytes));
        } while (bytes > 0);
        return result.ToString();
    }
}
```
Notice how the program needs to look up the server address (using Dns.GetHostEntry) and formulate a proper HTTP message with a GET method and a Host header. The actual networking part is fairly easy, because the Socket APIs and TCP take care of most of the work. TCP understands, for example, how to manage multiple connections to the same server (each connection receives a different local port number), so two outstanding requests to the same server won't get confused and receive each other's data.
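You can see the local port behavior without touching the Internet. The following sketch is our own illustration and was not part of the original program. It opens a TcpListener on the loopback address as a stand-in for a web server and connects two TcpClient instances to it. The operating system completes each handshake and queues the connection even before the listener calls Accept. Each client socket gets its own local port, so the combination of local and remote address and port keeps the two connections apart.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class LocalPortDemo
{
    // Open two TCP connections to the same server endpoint and return the
    // local (client-side) port the operating system picked for each one.
    public static (int First, int Second) ConnectTwice()
    {
        // Loopback listener standing in for a web server; port 0 means "any free port".
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        try
        {
            var serverEndPoint = (IPEndPoint)listener.LocalEndpoint;
            using (var first = new TcpClient(AddressFamily.InterNetwork))
            using (var second = new TcpClient(AddressFamily.InterNetwork))
            {
                first.Connect(serverEndPoint);
                second.Connect(serverEndPoint);
                var firstLocal = (IPEndPoint)first.Client.LocalEndPoint;
                var secondLocal = (IPEndPoint)second.Client.LocalEndPoint;
                return (firstLocal.Port, secondLocal.Port);
            }
        }
        finally
        {
            listener.Stop();
        }
    }
}
```

Both clients talk to the same server address and port, but the two returned local port numbers differ.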
Networking and Wireshark
If you want some visibility into TCP and IP, you can install a program like Wireshark (available for OS X and Windows). Wireshark is a network analyzer that can show you every bit of information flowing through your network interfaces. Using Wireshark you can observe TCP handshakes, which are the TCP messages required to establish a connection between client and server before the actual HTTP messages start flowing. You can also see the TCP and IP headers (20 bytes each, when no options are present) on every packet. Below is a screen shot showing the last two steps of the handshake, followed by a GET request and a 304 (Not Modified) response.
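Those fixed headers make small packets comparatively expensive. The helper below computes the fraction of each packet taken up by IP and TCP headers. It assumes the 20-byte minimum for each header (no options) and ignores Ethernet framing. The class name is hypothetical.

```csharp
using System;

public static class HeaderOverhead
{
    // Minimum header sizes with no options: IPv4 and TCP are 20 bytes each.
    public const int IpHeaderBytes = 20;
    public const int TcpHeaderBytes = 20;

    // Fraction of an IP packet spent on IP + TCP headers for a given payload size.
    public static double OverheadFraction(int payloadBytes)
    {
        int headers = IpHeaderBytes + TcpHeaderBytes;
        return (double)headers / (headers + payloadBytes);
    }
}
```

A full-size segment on a typical Ethernet link (1,500-byte MTU) carries 1,460 bytes of data. Its headers therefore take 40/1,500, about 2.7%. A 100-byte payload spends 40/140, about 28.6%, on headers.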
With Wireshark you can see exactly when HTTP connections are established and closed. The important part to take away from all of this is not how handshakes and TCP work at the lowest level, but that HTTP relies almost entirely on TCP to take care of all the hard work and TCP involves some overhead, like handshakes. The performance characteristics of HTTP rely on the performance characteristics of TCP, and this is the topic for the next section.
HTTP, TCP, and the Evolution of the Web
In the very old days of the web, most resources were textual. You could request a document from a web server, go off and read for 5 minutes, then request another document. The world was simple.
For today's web, most web pages require more than a single resource to fully render. A typical page in a web application has one or more images, one or more JavaScript files, and one or more CSS stylesheets. It's not uncommon for the initial request for a "home page" to spawn 30 to 50 additional requests to retrieve all the other resources associated with the page.
In the old days it was simple for a browser to establish a connection with a server, send a request, receive the response, and close the connection. It seemed wasteful to keep a connection open. If today's web browsers opened connections one at a time and waited for each resource to download before starting the next download, the web would feel very slow. The Internet is full of latency. Signals have to travel long distances and wind their way through different pieces of hardware. There is also some overhead in establishing a TCP connection. As we saw in the Wireshark screen shot, there is a three-step handshake to complete before an HTTP transaction can begin.
The evolution from simple documents to complex pages has required some ingenuity in the practical use of HTTP.
Parallel Connections
Most user agents (a.k.a. web browsers) will not make requests in a serial one-by-one fashion. Instead, they open multiple, parallel connections to a server. For example, when downloading the HTML for a page the browser might see two <img> tags in the page, so the browser will open two parallel connections to download the images simultaneously. The number of parallel connections depends on the user agent and the agent's configuration.
For a long time, we considered 2 as the maximum number of parallel connections a browser would create. We considered 2 because the most popular browser for many years, Internet Explorer 6, would only allow 2 simultaneous connections to a single host (see: How do I configure Internet Explorer to download more than 2 files at a time). Internet Explorer was only obeying the rules spelled out in the HTTP 1.1 specification, which states:
A single-user client SHOULD NOT maintain more than 2 connections with any server or proxy.
To increase the number of parallel downloads, many web sites use some tricks. For example, the 2 connection limit is per host, meaning a browser like IE6 would happily make 2 parallel connections to www.odetocode.com, and 2 parallel connections to images.odetocode.com. By hosting images on a different host name, web sites could increase the number of parallel downloads and make their pages load faster. This works even if the DNS records point both host names at the same server, because the limit applies per host name, not per IP address. For a detailed look at this trick, and its potential benefits, see "Circumventing Browser Connections For Fun and Profit".
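The arithmetic behind the trick is simple. With a per-host-name limit, the total number of parallel downloads grows with the number of host names. The sketch below spreads resource paths across the two host names from the example in round-robin order, and its method names are hypothetical. A real site would more likely map each resource to the same host name every time, for example with a stable hash of the path, so that browser caches keep working.

```csharp
using System;
using System.Collections.Generic;

public static class ShardingSketch
{
    // Assign each resource path to one of the host names in round-robin order.
    public static List<string> AssignUrls(IList<string> paths, IList<string> hosts)
    {
        var urls = new List<string>();
        for (int i = 0; i < paths.Count; i++)
        {
            string host = hosts[i % hosts.Count];
            urls.Add("http://" + host + paths[i]);
        }
        return urls;
    }

    // With a per-host-name limit, more host names means more parallel downloads.
    public static int MaxParallelDownloads(int hostCount, int perHostLimit)
    {
        return hostCount * perHostLimit;
    }
}
```

With IE6's limit of 2 per host name, two host names allow `MaxParallelDownloads(2, 2)`, or 4 simultaneous downloads.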
These days we don't have to work quite as hard to achieve more than 2 parallel connections because most user agents use a different set of heuristics when deciding on how many parallel connections to establish. For example, IE 8 will now open up to 6 concurrent connections (see: Connectivity Enhancements in Internet Explorer 8).
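A user agent's connection limit behaves like a counting semaphore. A download must grab a free slot before it starts and gives the slot back when it finishes. The sketch below uses .NET's SemaphoreSlim, with delegates standing in for real downloads, to cap concurrency at a configurable number. The usage example uses 6, matching the IE 8 figure. It only illustrates the idea and is not how any particular browser is implemented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelDownloadSketch
{
    // Run every download, but never more than maxConnections at the same time.
    // Returns the highest number of downloads that were in flight at once.
    public static async Task<int> DownloadAllAsync(IEnumerable<Func<Task>> downloads, int maxConnections)
    {
        var gate = new SemaphoreSlim(maxConnections);   // one slot per allowed connection
        var sync = new object();
        int inFlight = 0, peak = 0;

        var tasks = downloads.Select(async download =>
        {
            await gate.WaitAsync();                     // wait for a free connection slot
            try
            {
                lock (sync) { inFlight++; peak = Math.Max(peak, inFlight); }
                await download();                       // simulated resource download
            }
            finally
            {
                lock (sync) { inFlight--; }
                gate.Release();                         // hand the slot to a waiting download
            }
        }).ToList();                                    // ToList starts all the tasks

        await Task.WhenAll(tasks);
        return peak;
    }
}
```

Consider 20 delegates that each await `Task.Delay(100)`, run with a limit of 6. The first six start immediately and the rest wait for a slot, so the reported peak is 6.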
Now the question to ask is this: how many connections is too many? Parallel connections obey the law of diminishing returns, as too many connections can saturate and congest the network, particularly when mobile devices or unreliable networks are involved. Thus, having too many connections can hurt performance. Also, a server can only accept a finite number of connections, so if 100,000 user agents simultaneously create 100 connections each to a single web server, bad things will happen. Still, using more than one connection per agent is better than downloading everything in a serial fashion.
Fortunately, parallel connections are not the only performance optimization.
Persistent Connections
In the early days of the web, a user agent would open and close a connection for each individual request it sent to a server. This implementation was in line with HTTP's idea of being a completely stateless protocol. As the number of requests per page grew, so did the overhead generated by TCP handshakes and the in-memory data structures required to establish each TCP socket. To reduce this overhead and improve performance, the HTTP 1.1 specification suggests that implementations support persistent connections, and it makes persistent connections the default type of connection.
A persistent connection stays open after the completion of one request-response transaction. That leaves a user agent with an already open socket it can use to continue making requests to the server without the overhead of opening a new socket. Persistent connections also avoid repeating the slow start phase of TCP congestion control for every request. A connection that stays open has typically already ramped up its sending rate, so persistent connections perform better over time. In short, persistent connections reduce memory usage, reduce CPU usage, reduce network congestion, reduce latency, and generally improve the response time of a page. But, like everything in life, there is a downside.
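To put rough numbers on the difference, here is a back-of-the-envelope model for requests made one after another. It is our own simplification, not something from the HTTP specification, and it counts only network round trips. Each new TCP connection costs one round trip for the handshake before the request can go out, and each request/response pair costs one more. Bandwidth, server processing time, and slow start are ignored, and the class name is hypothetical.

```csharp
using System;

public static class RoundTripModel
{
    // New connection per request: handshake round trip + request/response round trip, every time.
    public static double NewConnectionPerRequestMs(int requests, double rttMs)
        => requests * 2 * rttMs;

    // One persistent connection: a single handshake, then one round trip per request.
    public static double PersistentConnectionMs(int requests, double rttMs)
        => (1 + requests) * rttMs;
}
```

Assume 30 requests and a round-trip time of 50 ms. The model gives 3,000 ms with a new connection per request and 1,550 ms over one persistent connection.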
As I mentioned earlier, a server can only support a finite number of incoming connections. The exact number depends on the amount of memory available, the configuration of the server software, the performance of the application, and many other variables. It's difficult to give an exact number, but generally speaking, if you talk about supporting thousands of concurrent connections, you'll have to start testing to see if a server will support the load. In fact, many servers are configured to limit the number of concurrent connections far below the point where the server will fall over. The configuration is a security measure to help prevent denial of service attacks. It's relatively easy for someone to create a program that will open thousands of persistent connections to a server. Persistent connections are a performance optimization but also a vulnerability.
Thinking along the lines of a vulnerability: we've talked about persistent connections remaining open, but for how long? In a world of infinite scalability the connections could stay open for as long as the user agent program was running. But because a server supports a finite number of connections, most servers are configured to close a persistent connection if it is idle for some period of time (5 seconds in Apache, for example). User agents can also close connections after a period of idle time. Usually the only visibility into when connections close is through a network analyzer like Wireshark.
In addition to aggressively closing persistent connections, most web server software can be configured to disable persistent connections entirely. This is common, for example, on shared servers, which sacrifice performance to allow as many connections as possible. Because persistent connections are the default connection style with HTTP 1.1, a server that does not allow persistent connections has to include a Connection header in every HTTP response. Below is an example.
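Idle-timeout bookkeeping on a server might look roughly like the following sketch. The IdleConnectionTracker class and its members are hypothetical names. Connections are identified by an integer, and the current time is passed in explicitly so the logic is easy to test. A real server would close the sockets whose IDs CloseIdle returns.

```csharp
using System;
using System.Collections.Generic;

public sealed class IdleConnectionTracker
{
    private readonly TimeSpan _idleTimeout;
    private readonly Dictionary<int, DateTime> _lastActivity = new Dictionary<int, DateTime>();

    public IdleConnectionTracker(TimeSpan idleTimeout) { _idleTimeout = idleTimeout; }

    // Record that a request or response just used this connection.
    public void Touch(int connectionId, DateTime now) => _lastActivity[connectionId] = now;

    // Return (and stop tracking) every connection idle longer than the timeout.
    public List<int> CloseIdle(DateTime now)
    {
        var closed = new List<int>();
        foreach (var entry in _lastActivity)
        {
            if (now - entry.Value > _idleTimeout) closed.Add(entry.Key);
        }
        foreach (var id in closed) _lastActivity.Remove(id);
        return closed;
    }

    public int OpenCount => _lastActivity.Count;
}
```

Take a 5-second timeout, as in the Apache example. A connection last used 6 seconds ago is closed, while one used 2 seconds ago stays open.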
```
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Server: Microsoft-IIS/7.0
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
Connection: close
Content-Length: 17149
```
The "Connection: close" header is a signal to the user agent that the connection will not be persistent and should be closed as soon as possible. The agent isn't allowed to make a second request on the same connection.
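A user agent's decision about whether it may reuse a connection can be sketched as a small function. It follows the persistence rules in the HTTP/1.1 specification (RFC 7230, section 6.3). A "close" token always ends persistence, and HTTP/1.1 connections are otherwise persistent. Older HTTP/1.0 connections stay open only when the "keep-alive" token is present. That HTTP/1.0 rule goes beyond what this article covers, and the function name is hypothetical.

```csharp
using System;
using System.Linq;

public static class ConnectionPolicy
{
    // Decide whether the connection stays open after this response, given the
    // protocol version and the Connection header value (null if absent).
    public static bool IsPersistent(string httpVersion, string connectionHeader)
    {
        // Connection holds a comma-separated list of case-insensitive tokens.
        var tokens = (connectionHeader ?? "")
            .Split(',')
            .Select(t => t.Trim().ToLowerInvariant())
            .ToList();

        if (tokens.Contains("close"))
            return false;                    // explicit opt-out always wins

        if (httpVersion == "HTTP/1.1")
            return true;                     // persistent by default in HTTP/1.1

        return tokens.Contains("keep-alive"); // HTTP/1.0: only when requested
    }
}
```

For the IIS response above, `IsPersistent("HTTP/1.1", "close")` returns false, so the agent must open a new connection for its next request.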
Pipelined Connections
Parallel connections and persistent connections are both widely used and supported by clients and servers. The HTTP specification also allows for pipelined connections, which are not as widely supported by either servers or clients. In a pipelined connection a user agent can send multiple HTTP requests on a connection before waiting for the first response. Pipelining allows more efficient packing of requests into packets and can reduce latency.
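On the client side, pipelining amounts to writing several requests into one buffer before reading anything back. The helper below is hypothetical. Its output could be passed to Socket.Send just like the single request in the earlier C# example. HTTP/1.1 requires the server to return the responses in the same order as the requests, so the client must read and parse them one after another from the same connection.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class PipelineSketch
{
    // Build one buffer holding several GET requests back to back, so they can all
    // be written to a single connection before any response is read.
    public static byte[] BuildPipelinedRequests(string host, IEnumerable<string> resources)
    {
        var builder = new StringBuilder();
        foreach (var resource in resources)
        {
            builder.Append("GET ").Append(resource).Append(" HTTP/1.1\r\n");
            builder.Append("Host: ").Append(host).Append("\r\n");
            builder.Append("\r\n");          // blank line ends each request
        }
        return Encoding.ASCII.GetBytes(builder.ToString());
    }
}
```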
Where Are We?
In this article we've looked at HTTP connections and talked about some of the performance optimizations made possible by the HTTP specifications. In the next article we'll take a step back and look at the web from a wider, architectural perspective.