The Peril Of Using ETags In A Cluster
April 24, 2003
Apache administrators: beware ETags if you have more than one webserver! (If you only have one webserver this article will not be useful to you.)
HTTP/1.1 added the “ETag” response header to allow a server to define its own way of uniquely identifying a point-in-time version of a specific file. The ETag is unstructured data; it’s just a string. When re-requesting a document, the client submits the ETag it has cached in an “If-None-Match” header. If this value does not match the server’s current ETag for the file, the server must retransmit the document, even if the HTTP/1.0 “If-Modified-Since” header exactly matches the “Last-Modified” date of the file.
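The revalidation rule described above can be sketched in a few lines of Python. This is a simplified illustration, not Apache's actual code: the function name response_status and its parameters are made up for this example, dates are compared as plain strings, and weak validators are ignored.

```python
from typing import Optional


def response_status(current_etag: str, last_modified: str,
                    if_none_match: Optional[str] = None,
                    if_modified_since: Optional[str] = None) -> int:
    """Return 304 if the client's cached copy can be reused, else 200."""
    if if_none_match is not None:
        # The client sent an ETag validator: it has to match,
        # no matter what the dates say.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if current_etag not in candidates and "*" not in candidates:
            return 200
        # The ETag matched; a date validator, if present, must agree too.
        if if_modified_since is not None and if_modified_since != last_modified:
            return 200
        return 304
    # No ETag validator: fall back to the HTTP/1.0-style date check.
    if if_modified_since is not None and if_modified_since == last_modified:
        return 304
    return 200
```

Calling it with a matching date but a mismatched ETag returns 200, which is exactly the wasted retransmission this article is about.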
This wouldn’t be so bad as-is if it weren’t for the way that Apache implements ETag support by default. The default setting is to incorporate the file’s last modification date, its current size, and its Unix inode. The first two make sense; I can understand wanting to make sure that both the last-modified time and the size match what’s on the client. But incorporating the inode leads to some very bad behavior on clusters, because a given file, such as LOGO.JPG, might have the same size and modification time on all of the webservers of the cluster, but its inode number will almost certainly be different on each one.
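To see why the inode spoils things, here is a rough Python sketch of an ETag built from those three fields. The function name file_etag and the hex-and-dashes layout are assumptions for illustration; Apache's exact formatting differs between versions.

```python
import os


def file_etag(path: str, use_inode: bool = True) -> str:
    """Build an ETag-like string from a file's inode, size, and mtime."""
    st = os.stat(path)
    parts = []
    if use_inode:
        # The inode number is local to one filesystem, so copies of the
        # same file on different servers will normally differ here.
        parts.append(format(st.st_ino, "x"))
    parts.append(format(st.st_size, "x"))
    parts.append(format(int(st.st_mtime), "x"))
    return '"' + "-".join(parts) + '"'
```

Two byte-identical copies with the same modification time get different values from file_etag(path), but the same value from file_etag(path, use_inode=False).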
This means that if you have four web servers, three times out of four when a client connects to a random web server, the client’s stored ETag will not match the server’s and the server will needlessly be forced to retransmit the file to the client. As the number of web servers grows, the situation quickly approaches the point where effectively no caching is happening at all.
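The arithmetic behind that claim is easy to check. The sketch below assumes each request lands on a uniformly random server and that the client's cached ETag came from one of them; the function name is made up for this example.

```python
def etag_match_probability(num_servers: int) -> float:
    """Chance that a revalidation hits the server that issued the cached ETag."""
    if num_servers < 1:
        raise ValueError("need at least one server")
    return 1.0 / num_servers


for n in (1, 2, 4, 8, 16):
    miss = 1.0 - etag_match_probability(n)
    print(f"{n} servers: {miss:.1%} of revalidations re-download the file")
```

With four servers the miss rate is 75%, and it keeps climbing toward 100% as servers are added.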
This is all compounded by a bug that I found in Internet Explorer 5 and 6, where if the downloaded file’s Last-Modified header matches the If-Modified-Since header it sent in the request, IE doesn’t bother to update its cached ETag. This means that even if you were to force IE to keep connecting to the same server (with the same inode for the file, etc.), once it’s made up its mind about an ETag it won’t change it until the Last-Modified time changes!
To fix this insanity, stick the following line in your Apache httpd.conf:
FileETag MTime Size
This will tell Apache to construct ETags based on only the modification time and the file size; specifically, it prevents Apache from using the inode of the file in the ETag. Then touch all of your files to update their last-modified times. The next time a client goes to your page, they’ll re-download the files, since the last-modified time changed, but then they will have the “simplified” ETag (without an inode) and they won’t have to download the file again until the file actually next changes. Your pages will be much snappier! 🙂
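If you would rather script the touch step, something like the following Python sketch would do it. The function name touch_tree is made up for this example, and you would pass it your own document root.

```python
import os
import time
from typing import Optional


def touch_tree(root: str, when: Optional[float] = None) -> int:
    """Set the mtime and atime of every regular file under root to `when`."""
    when = time.time() if when is None else when
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks and other oddities
            os.utime(path, (when, when))
            count += 1
    return count
```

One thing to watch in a cluster: since the modification time is still part of the ETag, every server needs to end up with the same timestamp on a given file. Passing the same explicit `when` value on each server, or touching the files once and then syncing them out with timestamps preserved, should take care of that.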