Kristian's web log

February 2, 2010

Varnish purges

Filed under: Varnish — Tags: , , , — kristian @ 14:55

Varnish purges can be tricky. Both the model of purging that Varnish uses and the syntax you need to take advantage of it can be difficult to grasp. It took me about five Varnish Administration Courses until I was happy with how I explained it to the participants, especially because the purge syntax is the most confusing syntax we have in VCL. However, it’s not very hard to work with once you understand the magic at work.

0. Separating purges and forced expiry

There are two ways to throw something out of the cache before the TTL is due in Varnish. You can either find the object you want gone and set its TTL to 0, forcing it to expire, or use Varnish’s purge mechanism. Setting the TTL to 0 has its advantages, since you evict the object immediately, but it also means you have to evict one object at a time. This is fairly easy and usually done by having Varnish look for a “PURGE” request and handle it. This is not what I’ll talk about today, though. Read http://varnish-cache.org/wiki/VCLExamplePurging for information on forcibly expiring an object.

1. The challenges of purging a cache

The main reason people purge their cache is to make room for updated content. Some journalist updated an article and you want the old one (possibly cached for days) gone. In addition, you may not know exactly what to purge, or it might be broader than just one item. An example would be a template used to generate multiple php files. Or all sports articles.

All in all, you do not purge to conserve memory, because you expect that the cache will be filled again soon.

If you are to purge all your php pages and you have 150 000 objects, you may not want to go looking for them either. This is the reason some competing cache products are slow at large purges: by looking for all these objects, you might have to hit the disk to fetch cold objects.

In Varnish, we also leave it up to VCL to decide what’s unique to an object. That is to say: You can override the cache hash. By default it’s the host name or server IP combined with the “URL”. This is usually what people want, but sometimes you may want to add a cookie into the mix, for instance. The point is, we don’t know exactly what people cache on.

2. How Varnish attacks the problem

In Varnish, you purge by adding a purge to a list. This list can grow large if you add several very specific purges, but we try to reduce the overlap as much as possible. The purge in question can be pretty much anything you can match in VCL, including regular expressions on URLs, host names and user-agents for that matter. You can see the list by typing “purge.list” in the command line interface (CLI, or telnet).

Each object in your cache points to the last purge it was tested against. When you hit an object, Varnish checks if there are any new purges in the list, tests the object against them, then either evicts the object and fetches a new one, or updates the “last tested against”-pointer.

Because of this, the ‘req’-structure you are evaluating is actually that of the next client to access the object, not the client who pulled the object from the backend. It also means that every single object in your cache that is hit will be tested against all new purges to see if it matches, but the work is spread out over time. It might sound wasteful, but it means you can add purges in constant time, and not really think about the cost of evaluating them.

It also means the object stays in the cache until it expires if it is not hit. So you don’t free up memory.
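To make the model concrete, here is a minimal Python sketch of the lazy purge list described above. It is an illustration under simplifying assumptions, not Varnish’s implementation: the `Cache` and `CachedObject` classes, the `last_tested` counter and the lambda used as a purge are all hypothetical names invented for this example.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class CachedObject:
    body: str
    headers: Dict[str, str] = field(default_factory=dict)
    last_tested: int = 0  # how many purges this object has been checked against


# A purge is modelled as a predicate over (req, obj); purely illustrative.
Purge = Callable[[dict, CachedObject], bool]


class Cache:
    def __init__(self) -> None:
        self.purges: List[Purge] = []
        self.objects: Dict[str, CachedObject] = {}

    def add_purge(self, purge: Purge) -> None:
        # Constant time: no cached object is touched here.
        self.purges.append(purge)

    def insert(self, key: str, obj: CachedObject) -> None:
        # A fresh object only needs testing against purges added later.
        obj.last_tested = len(self.purges)
        self.objects[key] = obj

    def hit(self, key: str, req: dict) -> Optional[CachedObject]:
        obj = self.objects.get(key)
        if obj is None:
            return None  # plain miss
        # Test only against purges added since the object was last checked.
        for purge in self.purges[obj.last_tested:]:
            if purge(req, obj):
                del self.objects[key]  # evicted; the caller would fetch a new copy
                return None
        obj.last_tested = len(self.purges)
        return obj


cache = Cache()
cache.insert("/sport/1", CachedObject("old sports article"))
cache.insert("/news/1", CachedObject("news article"))

# Roughly "purge req.url ~ ^/sport/": evaluated against the *next* client's req.
cache.add_purge(lambda req, obj: re.search(r"^/sport/", req["url"]) is not None)

print(len(cache.objects))                          # both objects are still stored
print(cache.hit("/sport/1", {"url": "/sport/1"}))  # evicted when it is next hit
print(cache.hit("/news/1", {"url": "/news/1"}).body)
```

Adding the purge touches no objects at all. The sports article is only evicted when it is next hit, and an object that is never hit again simply stays in storage until its TTL runs out.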

3. Adding purges “by hand”

Want to purge http://example.com/somedirectory/ and everything beneath that path?

purge req.http.host == example.com && req.url ~ ^/somedirectory/.*$

or

purge req.url ~ ^/somedirectory/ && req.http.host == example.com

Want to purge all objects with a “Cache-Control: max-age=” set to 3600 ?

purge obj.http.Cache-Control ~ max-age=3600

or to take white space into account and no trailing numbers:

purge obj.http.Cache-Control ~ max-age ?= ?3600[^0-9]

Notice that all of the variables are in the same “VCL-context” as the client to hit the object next, so if you purge on req.http.user-agent, it’s fairly random if the object is really purged, because you (probably) can’t predict what user-agent the next person to visit a specific object is using. If you wish to purge based on a parameter sent from the “original” client, you will have to store that parameter in obj.http somewhere and remove it in vcl_deliver if you don’t want to expose it.

4. Adding purges in VCL

This is where it gets tricky. The usual example of why is this: purge("req.url == " req.url);

Normal programming-thinking would tell you that this would match everything, since the url is always equal to itself. This is where VCL string concatenation comes into the picture. In reality, you are writing: “add this to the purge list: The string containing “req.url == ” and the value of the variable req.url”.

In other words, if the client accesses http://example.com/foobar and hits the code above, this would say: “Add the string containing “req.url == ” and “/foobar” to the purge list.” The quotation marks are essential!

I find it easier to think of it as preparing a string for the purge command on the CLI. VCL concatenates adjacent strings without any special sign.

In the end, this is the rule of thumb: Put everything you expect to see literally when you type “purge.list” inside quotation marks, and put things you wish to replace with the value of a variable from the calling session outside.

So you actually have three different VCL contexts to worry about:

  1. The context that originally pulled the object in from a backend (not much you can do here unless you hide things in obj.http)
  2. The context that will hit the object and thereby test the object against the purge. Any variable in this context has to be inside quotation marks.
  3. The context that triggered the purge. Variables from this context should be outside quotation marks, so they are replaced with their string values before being added to the purge list.

The reason you do not need quotation marks if you enter the purge command on the command line interface is because you don’t have the third context. There is no req.url in telnet, since you are not going through VCL at all.
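The quoting rule can be mimicked in a few lines of Python. This is only a sketch of the idea, assuming a made-up `build_purge` helper and a hypothetical `req` dictionary standing in for the request that triggers the purge (the third context); it does not talk to Varnish.

```python
def build_purge(*parts: str) -> str:
    """Mimic VCL's implicit concatenation: adjacent strings are simply joined."""
    return "".join(parts)


# Hypothetical request that triggers the purge (context 3).
req = {
    "url": "/foobar",
    "http.host": "example.com",
    "http.X-Purge-Regex": "^/sport/",
}

# VCL: purge("req.url == " req.url);
simple = build_purge("req.url == ", req["url"])

# VCL: purge("req.http.host == " req.http.host " && req.url ~ " req.http.X-Purge-Regex);
combined = build_purge(
    "req.http.host == ", req["http.host"],
    " && req.url ~ ", req["http.X-Purge-Regex"],
)

print(simple)    # req.url == /foobar
print(combined)  # req.http.host == example.com && req.url ~ ^/sport/
```

The quoted parts survive literally and are evaluated later, against whichever client hits the object next (context 2). The unquoted parts are replaced by their values at the moment the purge is added.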

Some examples. Note that when I say “supplied by the client” I mean the client initiating the purge, typically some smart system you’ve set up:

Purge objects on the current host with URLs matching the regex stored in the X-Purge-Regex header supplied by the client:

purge("req.http.host == " req.http.host " && req.url ~ " req.http.X-Purge-Regex);

Purge all php for any example.com-domain:

purge("req.http.host ~ example.com$ && req.url ~ ^/.*\.php");

Same, but for the host provided in the X-Purge-HostPHP header:

purge("req.http.host ~ " req.http.X-Purge-HostPHP " && req.url ~ ^/.*\.php");

Purge objects with X-Cache-Channel set to “sport”:

purge("obj.http.X-Cache-Channel ~ sport");

Same, but purge the cache-channel set in the header ‘X-Purge-CC’:

purge("obj.http.X-Cache-Channel ~ " req.http.X-Purge-CC);

Purge in vcl_fetch if the backend sent an X-Purge-URL header (weird thing to do, but fun example):

sub vcl_fetch {
    (....)
    if (obj.http.X-Purge-URL) {
        purge("req.url ~ " obj.http.X-Purge-URL);
    }
    (...)
}

(PS: I have not actually tested all these examples, but they look correct)

January 26, 2010

Varnish best practices

Filed under: /dev/random — Tags: , , , , — kristian @ 16:21

A while ago I wrote about common Varnish issues, and I think it’s time for an updated version. This time, I’ve decided to include a few somewhat uncommon issues that, if set, can be difficult to spot or track down. A sort of pitfall-avoidance, if you will. I’ll add a little summary with parameters and such at the end.

1. Run Varnish on a 64 bit operating system

Varnish works on 32-bit, but was designed for 64-bit. It’s all about virtual memory: Things like stack size suddenly matter on 32-bit. If you must use Varnish on 32-bit, you’re somewhat on your own. However, try to fit it within 2GB. I wouldn’t recommend a cache larger than 1GB, and no more than a few hundred threads… (Why are you on 32-bit again?)

2. Watch /var/log/syslog

Varnish is flexible, and has a relatively robust architecture. If a Varnish worker thread were to do something Bad and Varnish noticed, an assert would be triggered, Varnish would shut down and the management process would start it up again almost instantly. This is logged. If it wasn’t, there’s a decent chance you wouldn’t notice, since the downtime is often sub-second. However, your cache is emptied. We’ve had several customers contact us about performance issues, only to realize they’re essentially restarting Varnish several times per minute.

This might make it sound like Varnish is unstable: It’s not. But there are bugs, and I happen to see a lot of them, since that’s my job.

An extra note: On Debian-based systems, /var/log/messages and /var/log/syslog are not the same. Varnish will log the restart in /var/log/messages but the actual assert error is only found in /var/log/syslog, so make sure you look there too.

The best way to deal with assert errors is to search our bug tracker for the relevant function-name.

3. Threads

The default values for threads are based on a philosophy I’ve since come to realize isn’t optimal. The idea was to minimize the memory footprint of Varnish. So by default, Varnish uses 5 threads per thread pool. With the default of two thread pools, that’s 10 threads minimum. The maximum is far higher, but in reality, threads are fairly cheap. If you expect to handle 500 concurrent requests, tune Varnish for that.

A little clarification on the thread-parameters: thread_pool_min is the minimum number of threads for each thread pool. thread_pool_max is the maximum total number of threads. That means the values are not on the same scale. The thread_pools parameter can safely be ignored (tests have indicated that it doesn’t matter as much as we thought), but ideally having one thread_pool for each cpu core is the rule of thumb, if you want to modify it.

You also do not want more than 5000 as the thread_pool_max. It’s dangerous, though fixed in trunk. It’s also more often than not an indication that something else is wrong. If you find yourself using 5000 threads, the solution is to find out why it’s happening, not to increase the number of threads.

To reduce the startup time, you also want to reduce the thread_pool_add_delay parameter. ‘2’ is a good value (as opposed to 20, which makes for a slow start).

4. Tune based on necessity

I often look at sites where someone has tried to tune Varnish to get the most out of it, but taken it a bit too far. After working with Varnish I’ve realized that you do not really need to tune Varnish much: The defaults are tuned. The only real exception I’ve found to this is number of threads and possibly work spaces.

Varnish is, by default, tuned for high performance on the vast majority of real-life production sites. And it scales well, in most directions. By default. Do yourself a favor and don’t fix a problem which isn’t there. Of all the issues I’ve dealt with on Varnish, the vast majority have been related to finding out the real problem and either using Varnish to work around it, or fixing it on the related system. Off the top of my head, I can really only remember one or two cases where Varnish itself has been the problem with regards to performance.

To be more specific:

  • Do not modify lru_interval. I often see the value “3600”, which is a 180 000% (one hundred and eighty thousand percent) increase from the default. This is downright dangerous if you suddenly need the lru-list, and so far my tests haven’t been able to prove any noticeable performance improvement.
  • Setting sess_timeout to a higher value increases your file descriptor consumption, and there’s little to gain by doing it. You risk running out of file descriptors, at least until we can get the fix into a released version.

So the rule of thumb is: Adjust your threads, then leave the rest until you see a reason to change it.

5. Pay attention to work spaces

To avoid locking, Varnish allocates a chunk of memory to each thread, session and object. While keeping the object workspace small is a good thing to reduce the memory footprint (this has been improved vastly in trunk), sometimes the session workspace is a bit too small, especially when ESI is in use. The default sess_workspace is 16kB, but I know we have customers running with 5MB sess_workspace without trouble. We’re obviously looking to fix this, but so far it seems that having some extra sess_workspace isn’t that bad. The way to tell is by asserts (unfortunately), typically something related to “(p != NULL) Condition not true” (though there can obviously be other reasons for that). Look for it in our bug tracker, then try to increase the session workspace.

6. Keep your VCL simple

Most of your VCL-work should be focused around vcl_recv and vcl_fetch. That’s where you define the majority of your caching policies. If that’s where you do your work, you’re fairly safe.

If you want to add extra headers, do it in vcl_deliver. Adding a header in vcl_hit is not safe. You can use the “obj.hits” variable in vcl_deliver to determine if it was a cache hit or not.

You should also review the default VCL, and if you can, let Varnish fall through to it. When you define your VCL, Varnish appends the default VCL, but if you terminate a function, the default is never run. This is an important detail in vcl_recv, where the default code passes any request that carries cookies or an Authorization header. That’s far safer than forcing a lookup. The default vcl_recv code also ensures that only GET and HEAD requests go through the cache.

Focus on caching policy and remember that the default VCL is appended to your own VCL – and use it.

7. Choosing storage backend (malloc or file?)

If you can contain your cache in memory, use malloc. If you have 32GB of physical memory, using -smalloc,30G is a good choice. The size you specify is for the cache and does not include session workspace and such, which is why you don’t want to specify -smalloc,32G on a 32GB system.

If you can not contain your cache in memory, first consider if you really need that big a cache. Then consider buying more memory. Then sleep on it. Then, if you still think you need to use disk, use -sfile. On Linux, -sfile performs far better than -smalloc once you start hitting disk. We’re talking pie-chart-material. You should also make sure the filesystem is mounted with noatime, though it shouldn’t be necessary. On Linux, my cold-hit tests (a cold hit being a cache hit that has to be read from disk, as opposed to a hot hit which is read from memory) take about 6000 seconds to run on -smalloc, while they take 4000 seconds on -sfile with the same hardware. Consistently. However, your mileage may vary with things such as kernel version, so test both anyway. My tests are easy enough: Run httperf through x-thousand urls in order. Then do it again in the same order.

Some of the most challenging setups we work with are disk-intensive setups, so try to avoid it. SSD is a relatively cheap way to buy yourself out of disk-issues though.

8. Use packages and supplied scripts

While it may seem easier to just write your own script and/or install from source, it rarely pays off in the long run. Varnish usually runs on machines where downtime has to be planned, and you don’t want a surprise when you upgrade it. Nor do you want to risk missing that little bug we realized was a problem on your distro but not others. If you do insist on running home-brew, make sure you at least get the ulimit-commands from the startup scripts.

This is really something you want regardless of what sort of software you run, though.

9. Firewall and sysctl-tuning

Do not set “tw_reuse” to 1 (sysctl). It will work perfectly fine for everyone. Except thousands of people behind various NAT-based firewalls. And it’s a pain to track down. Unfortunately, this has been given as advice in the past.

Avoid connection-tracking on the Varnish server too. If you need it, you’ll need to tune it for high performance, but the best approach is simply to not do connection-tracking on a server with potentially thousands of new connections per second.

10. Service agreements

(Service agreements are partly responsible for my salary, so with that “conflict of interest” in mind….)

You do not need a service agreement to run Varnish. It’s free software.

However, if you intend to run Varnish and your site is business critical, it’s sound financial advice to invest some money in it. We are the best at finding potential problems with your Varnish-setup before they occur, and solving them fast when they do occur.

We typically start out by doing a quick sanity-test of your configuration. This is something we can do fast, with regards to parameters, VCL and system configuration alike. Some of our customers only contact us when there’s something horribly wrong, others more frequently to sanity-check their plans or check up on how to use varnishncsa for their particular logging tool and so on. It’s all up to you.

We also have a public bug tracker anyone can access and submit to. We do not have a private bug tracker, though there are bugs that never hit the public bug tracker – but that’s because we fix them immediately. Just like any other free software project, really. We have several public mailing lists, and we answer them to the best of our ability, but there is no guarantee and our time is far more limited. If you run into a bug, my work on other bugs will be postponed until your problems are solved. Better yet: if you run into something you don’t know is a bug, we can track it down.

A service agreement gives you safety. And your needs will get priority when we decide where we want to take Varnish in the future.

We also offer training on Varnish, if you prefer not to rely on outside competence.

Oh, and I get to eat. Yum.

Summary

Keep it simple and clean. Do not use connection tracking or tw_reuse. Try to fit your cache into memory on a 64-bit system.

Watch your logs.

Parameters:

thread_pool_add_delay = 2
thread_pools = <Number of cpu cores>
thread_pool_min = <800/number of cpu cores>
thread_pool_max = 4000
session_linger = 50
sess_workspace = <16k to 5m>

So if you have a dual quad core CPU, you would have 8 cpu cores. This would make sense: thread_pools=8, thread_pool_min=100, thread_pool_max=4000. The number 800 is semi-random: it seems to cover most use-cases. I added session_linger into the mix because it’s a default in Varnish 2.0.5 and 2.0.6 but not in prior versions, and it makes good sense.
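The arithmetic behind these values is easy to script. The sketch below simply restates the summary above in Python; the function name `thread_parameters` and its `target_min_threads` argument are made up for this example, and the 800 figure is the semi-random number from the text, not a Varnish constant.

```python
def thread_parameters(cpu_cores: int, target_min_threads: int = 800) -> dict:
    """Derive the suggested thread settings from the number of CPU cores."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return {
        "thread_pool_add_delay": 2,
        "thread_pools": cpu_cores,
        # thread_pool_min is per pool, so divide the target by the pool count.
        "thread_pool_min": target_min_threads // cpu_cores,
        # thread_pool_max is the total across all pools, so it is not divided.
        "thread_pool_max": 4000,
        "session_linger": 50,
    }


params = thread_parameters(cpu_cores=8)
for name, value in params.items():
    print(f"{name} = {value}")

# Threads started up front: per-pool minimum times the number of pools.
print("minimum threads in total:", params["thread_pool_min"] * params["thread_pools"])
```

For the dual quad core example this gives thread_pools=8 and thread_pool_min=100, which means 800 threads are available from the start while the total stays capped at 4000.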

January 19, 2010

Real-time and hit-and-run based statistics for VSTS

Filed under: Varnish — Tags: , , , , — kristian @ 14:18

A while ago I wrote VSTS – the Varnish Stress Testing Suite. It’s not nearly as fancy as the name makes it sound: It’s a simple set of shell scripts that runs on a set of test servers and pounds Varnish under a couple of different scenarios. The idea is simple: Detect possible performance issues during the development cycle of Varnish by periodically testing the current code against the same well-established test.

It does the job, but could be far better. Especially when it comes to statistics.

The rrdtool-backed Python script is simply insufficient. As rrdtool likes to put each data entry into a time slot, and I don’t really operate in time slots, the data is both slow and imprecise. This is fine if you want data for every 5 minutes throughout a year, but not if you want data collected anywhere from one to ten times a day. Simply put: I want one data entry to represent one unit on the X-scale.

I also want to add “real-time” statistics. The current data is collected after each test, but I want data to be gathered throughout the relevant tests. This should be separate from the other statistics. Hopefully, the system I end up with is flexible enough to allow me to plot all the datasets on the same graph, which should make any variations easy to spot.

Up until now I’ve only ever dealt with rrdtool when it comes to statistics (unless you count pre-rrdtool mrtg and the like), so I was hoping for some pointers before I start digging through what I suspect is a jungle of similar but different solutions. I’m comfortable working with most languages, and the primary concerns I’m looking at are implementation complexity and flexibility.

When I’m done with this batch of refactoring, I’ll do a proper writeup. However, what I’m most excited about is adding OpenSolaris into the target-platforms (should help clean out a few bugs that have been plaguing the Solaris-users for a while), adding better support for customized tests and improved robustness of VSTS.

January 13, 2010

Pushing Varnish even further

Filed under: Varnish — Tags: , , — kristian @ 21:33

A while ago I did a little writeup on high-end Varnish tuning, where I noted that I made our single core 2.2GHz Opteron reach 27k requests/second. This raised the question of how well Varnish scales with hardware. So I went ahead and tried to overload our quad-core Xeon at 2.4GHz. It would obviously take some extra fire power. At the very least, four times as much as the last batch of tests.

Hardware involved

Our main set of test servers for Varnish are called varnish1, varnish2, varnish3, varnish4, varnish6 and varnish7. These have mostly different software and hardware, which is done intentionally so we can perform tests under different circumstances. We routinely run tests against Varnish2 and Varnish4, which run CentOS and FreeBSD, respectively. For my last test, I used Varnish2 as the server and the remaining servers as test nodes. By any normal math, I would need about 4 times more fire power to overload a 2.4GHz Quad core, compared to a single core Opteron at 2.2GHz.

To sum it up as far as this round of tests go:

  • Varnish1 – Single core Opteron
  • Varnish2 – Single core Opteron at 2.2GHz (used in the last round of tests)
  • Varnish3 – Single core Xeon (if I’m not much mistaken). It’s also the nginx server used as backend, but that just means 1 request every X minutes.
  • Varnish4 – Single core Opteron (FreeBSD)
  • Varnish6 – Dual core Xeon of some kind
  • Varnish7 – Quad-core Xeon at 2.4GHz

So I needed more power. As it happens, we do a lot of training and we have three classrooms full of computers for students, and I borrowed two of these classrooms, adding the following to the mix:

  • 10 x single core Pentium Celerons at 2.9x GHz
  • 10 x Core 2 Duos at 2.4ish GHz

As you might notice – a large part of the challenge when you want to test Varnish is getting your test systems to keep up.

Basic test procedures

Same as last time, more or less: 1 byte pages and httperf. I’ve tried ab, siege and curl… And they simply do not offer the raw power of httperf combined with the control – if anyone cares to enlighten me on how to get the most out of them, then I’m more than willing to listen.

Ideally I wanted to test with 10 requests for each connection, and with mixed data set size. As it turns out, I ended up using 100 requests per connection and bursting all of the requests, which is far from realistic. More on this later.

I have an intricate script system for the nightly tests, but that’s a story for another time. For these tests I simply used clusterssh to replicate my input on 37ish shells. This has allowed me to instantly test identical setups on all the nodes, and to quickly review what their status is. I probably ran a thousand or more different variants of the same test this time around.

I’ve used varnishstat to monitor the request rate and other relevant stats, and top to monitor general load.

The backend I use is hosted on varnish3, which runs nginx and a simple rewrite to ‘current.txt’, which for this occasion was linked to a 1byte file.

Results

Varnish uses a lot of threads, and as such, when it does finally saturate the CPU, the load average will skyrocket. On the last test, Varnish2 had a load of 600-700. During this load, Varnish2 would use 10-15 seconds to start ‘top’.

During this round of tests I had roughly 87GHz worth of clients, spread over 25 physical computers. All of the test systems were running at full load. Varnish7 had a load average around 45. Logging in and starting top was close to instant. And Varnish was serving 143k requests per second.

Based on the load and general snappiness, I think it is safe to conclude that while Varnish was close to the breaking point, it hadn’t actually reached it. To put it simply: My clients were not fast enough. Before I told httperf to burst 100 requests for each connection, Varnish was serving 110-120k requests per second with a load less than 1.0, and the clients were still using all their fire power. I ended up stress testing my clients. Dammit.

However, as I came fairly close to the breaking point, I still believe there are a few interesting things to look at.

The scaling nature of Varnish

It’s very rare that you can see an application scale so well just by throwing cpu power and cpu cores at it. Varnish essentially didn’t get affected at all by the extra work needed to synchronize work on 4 cpu cores. In fact, if you look at the math, the raw performance on 4 cpu cores was actually BETTER than on one cpu core, when you look at it on a cycle-by-cycle basis.
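A rough way to check the cycle-by-cycle claim is to divide the available clock cycles by the request rate. The sketch below uses only the figures quoted in these posts (27k requests/second on one 2.2GHz core, 143k on four 2.4GHz cores); the helper name `cycles_per_request` is made up, and comparing raw clock cycles across an Opteron and a Xeon is itself a simplifying assumption.

```python
def cycles_per_request(requests_per_second: float, cores: int, ghz_per_core: float) -> float:
    """Clock cycles available per served request, summed over all cores."""
    total_cycles_per_second = cores * ghz_per_core * 1e9
    return total_cycles_per_second / requests_per_second


single = cycles_per_request(27_000, cores=1, ghz_per_core=2.2)
quad = cycles_per_request(143_000, cores=4, ghz_per_core=2.4)

print(f"single core: {single:,.0f} cycles per request")
print(f"quad core:   {quad:,.0f} cycles per request")
print(f"quad core needs {quad / single:.0%} of the single-core cycles")
```

That works out to roughly 81,000 cycles per request on the single core and roughly 67,000 on the quad core, and since the quad core was never pushed to its breaking point, the real figure may be lower still.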

I think it’s reasonably safe to say that when it comes to raw performance, we’ve nailed it with Varnish.

In fact, scaling Varnish is far more difficult when you increase your active data set beyond physical memory. Or when you introduce latency. Or when you have a low cache hit rate. Or any other number of corner cases. There will always be bottlenecks.

What you can learn from this is actually simple: Do not focus on the CPU when you want to scale your Varnish setup. I know it’s tempting to buy the biggest baddest server around for a high-traffic site, but if your active data set can fit within physical memory and you have a 64-bit CPU, Varnish will thrive. And for the record: All CPU-usage graphs I’ve seen from Varnish installations confirm this. Most of the time, those sexy CPUs are just sitting idle regardless of traffic.

Myths and further research

Since I didn’t reach the breaking point, there’s not much I can say conclusively. However, I can repeat a few points.

Adjusting the lru_interval had little impact regardless of data set and access patterns. If I repeat this often enough, perhaps I’ll stop seeing new installations with an lru_interval of 3600: DO NOT SET lru_interval TO 3600. There. I didn’t even add the usual “unless you know what you are doing” part. I might’ve explained it before, but the problem is that it leaves you with a really badly sorted lru-list that will cause Bad Things once you need to lru-nuke something. Possibly really really bad things. Like throwing out the 200 most popular objects on your site at the same time.

And the size of your VCL has little impact on the performance. I have not tested this extensively, but I’ve never registered a difference, and since your cpu will be idle most of the time anyway, you should NOT worry about CPU cycles in VCL.

Another important detail is that your shmlog shouldn’t trigger disk activity. On my setup, it didn’t sync to disk to begin with, but you may want to stick it on a tmpfs just to be sure. I suspect this has improved throughout the 2.0-series of Varnish, but it’s an easy insurance. Typically the shmlog is found in /usr/var/varnish, /usr/local/var/varnish or similar (“ls -l /proc/*/fd | grep _.vsl” is the lazy way to find it).

I tried several different settings of thread pools, acceptors, listen depth, shmlog parameters, rush exponent and such, but none of it revealed much – most likely because I never pressured Varnish enough. This will be what I want to investigate further. But it should tell you something about how far you have to go before these obscure settings start to matter.

Feedback wanted

I figure this must be some sort of record, but I’m interested in what sort of numbers others have seen or are seeing. Has anyone even come close to the numbers above (synthetic or otherwise) from a single server? Regardless of software or hardware? This is not meant as a challenge or boast, but I’m genuinely curious about what sort of traffic people are able to push. I’m interested in more “normal” request rates too. I’m a sucker for numbers. What are you seeing on your site? Have you had scaling issues?

January 8, 2010

Hitpass objects and Varnish

Filed under: Varnish — Tags: , — kristian @ 10:39

Yesterday presented an interesting case of “what the hey is going on!?” for a customer of ours. The problem presented itself as a front page that sporadically needed 40 to 50 seconds to purge. After a brief hunt I discovered a somewhat interesting set of coincidences.

Before I go into detail, I need to tell you what a hitpass object is and why it’s needed.

Varnish can pass objects (that is to say, deliver an object without storing it in the cache) at two different stages in the request handling. The first and simplest stage you can pass at is before you have contacted the web server. This can be because you know that every page under a specific url should not be cached, or that a cookie is present and needed. This is done in vcl_recv. The slightly more complex stage to pass at is after Varnish has gotten a reply from the backend, typically because the backend sent a “Set-Cookie” header or otherwise had headers that indicate that the page isn’t cacheable. This is done in vcl_fetch.

Normally Varnish tries to only request a specific object once: since it will look the same to all users, there is no reason to make the web server generate it multiple times, and it’s definitely not faster either. When you are passing an object, however, every client request has to go to the web server, so serializing the requests for the same url is not wanted.

If you pass in vcl_recv, it’s easy for Varnish to avoid this serialization. It already knows that the next request for the same URL will have to go to the backend. Varnish makes an “anonymous” object that is never entered into the cache. A very simple and effective design.

But in vcl_fetch, things are not so easy. Varnish can’t predict a pass in vcl_fetch ahead of time, so it has to serialize these requests as if it was being cached. This is obviously not a desirable situation. Enter the hitpass object.

To avoid serialized requests after a pass in vcl_fetch, Varnish makes a hitpass object which is entered into the cache for ‘obj.ttl’ period of time, which simply says that this url will be passed for the next obj.ttl period of time. This looks almost just like a normal object in the cache, but it has no content, just a flag telling Varnish to fetch it from a backend. And thus Varnish can perform parallel requests again, because it knows the request will not be cached before it contacts the web server. The moment Varnish sees a hitpass object, it will dereference (and thus unlock) the hitpass object and make an anonymous one, entering the same code-path as if it was a pass in vcl_recv.

Now, back to my original story and purging. What’s it got to do with hitpass objects? Well, the purges on this particular site are done in vcl_hit/vcl_miss by resetting ttl to 0, effectively expiring the object. The problem was that at some point, a header on the relevant front page had told Varnish not to cache. So a hitpass object was made, and it had a ttl of roughly 4 days. So the request never reached vcl_hit or vcl_miss (where the purging was being done) but instead went to vcl_pass and directly to the backend, which kindly generated the page even though it just got a PURGE request, which is a bit strange, but not all that strange. What’s worse, of course, is that the front page wasn’t cached at all for this period of time. As it turns out, Varnish also has a minor issue which I also discovered yesterday, in that it doesn’t actually purge hitpass objects correctly, at least not in this particular version of Varnish.

And if you are thinking “why not set the ttl to 0 in vcl_pass if you see a PURGE?”, even if we allowed you to modify obj.ttl in vcl_pass, it wouldn’t be the ttl of the hitpass object, but the anonymous object. This is also why you can’t simply decide to start caching it again – when you reach vcl_fetch after hitting a hitpass object, the object you are working on is anonymous, and never entered into the cache.

My solution? Add an ‘x’ to the cache hash for the particular url/host combination for an immediate “fix”. It was either that or restarting Varnish at peak hour. Oh, and add some sanity to the VCL to reduce the effect of this in the future.

Recommendations

Understand hitpass objects. If you are doing ‘pass’ in vcl_fetch, remember that the TTL matters. You may want to switch ‘pass’ in vcl_fetch with your own pass subroutine which just sets the ttl to 1 minute for instance, and use that function consistently. That way, you avoid long-lived hitpass objects, but still get the benefit of them when you want them.
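
A minimal sketch of such a pass subroutine, assuming Varnish 2.0-style VCL where vcl_fetch works on ‘obj’ and ‘pass’ is a plain statement. The name pass_short and the Set-Cookie condition are made up for illustration, and 60s is just the “1 minute” from the text:

```vcl
# Hypothetical helper: pass, but keep the resulting hitpass object short-lived.
sub pass_short {
    set obj.ttl = 60s;
    pass;
}

sub vcl_fetch {
    # Illustrative condition only; use whatever marks a response as uncacheable for you.
    if (obj.http.Set-Cookie) {
        call pass_short;
    }
}
```

The point of routing every pass in vcl_fetch through one subroutine is that the hitpass ttl is decided in a single place, instead of being inherited from whatever the backend happened to send.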

Also: If you are using the artificial http “PURGE”-method to set ttl to zero in vcl_hit, don’t forget to check for it in vcl_pass. You probably want to throw a 503-error in vcl_pass if you see a PURGE request.
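
Roughly, and again assuming Varnish 2.0-style syntax, the check could look like the sketch below. The error text is my own wording, and it assumes your vcl_recv sends PURGE requests to lookup in the first place:

```vcl
sub vcl_pass {
    # A PURGE that ends up here hit a hitpass object (or was passed),
    # so nothing was actually expired. Say so instead of fetching the page.
    if (req.request == "PURGE") {
        error 503 "PURGE reached vcl_pass, nothing was purged";
    }
}
```

That way whoever sends the PURGE gets a clear failure instead of a freshly generated page from the backend.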

December 8, 2009

Varnishstat for dummies

Filed under: /dev/random — Tags: , , , — kristian @ 14:16

Varnishstat is the tool used to monitor the basic health of Varnish. Unlike all the other tools, it doesn’t read log entries, but counters that Varnish updates in real-time. It can be used to determine your request rate, memory usage, thread usage, and just about anything that’s not related to a specific request. As such, it’s nice to know how to work with it. Below is a rough introduction.

Reading varnishstat

In its simplest form, varnishstat is run with: «varnishstat». It will look something like this, depending on your terminal size:

I’ve added the red text, in case you didn’t already guess that.

The uptime here is 5 days, 7 hours, 36 minutes. The server name is the hostname by default, or what you specify with the -n argument to varnishd (and varnishstat).

While running varnishstat, you will see the three numbers right of “Hitrate ratio:” increase from 0 to 10, 100 and 1000 respectively. They are simple indicators for the numbers in the “Hitrate avg:” listing below. In the picture above, we can see that during the last 10 seconds, the hitrate average was 0.9671, during the last 100 seconds, it was 0.9687, and during the last 131 seconds, it was 0.9688. The actual number in the hitrate average is the fraction of cache lookups that were hits, as opposed to misses. Keep in mind that “pass” in vcl_recv is not a cache miss, so you can have a hitrate average of 1.0, and still see backend requests. For puny mortals, you typically multiply the number by 100 and pronounce it as a percentage (i.e. 96.71%, 96.87% and 96.88%).

The rest of the output is a list of all counters. In Varnish 2.0.5 (and 2.0.4, possibly 2.0.3?), an interactive varnishstat only prints numbers that are different from zero. This is a futile attempt to display as much information as possible, as the varnishstat I have on my laptop currently has 98 counters.

The first column is the raw data of the counter. In case of cache hits, for instance, this is the total number of cache hits since Varnish was started.

The second column is the change per second in realtime. So on the image above, the server is handling 1546 requests per second. The next column is the average change per second since Varnish started. So during the past 5+ days, this server has handled 925 requests per second on average.

You will notice that some counters do not have the “per second” columns. These are counters that can decrease. Number of objects, for instance, will go both up and down, so the value of a change/second is small, since it doesn’t really tell you much.

Some values to care about

(Assuming you are the caring, loving type)

  • Client connections accepted (per second).
  • Client requests received. Experience shows that a ratio close to 1:10 between connections and requests is natural on web sites. If it’s far below or far above that – investigate.
  • Backend connections failures – Should be low, obviously. This typically results in 503-errors. Are your backends struggling?
  • N struct object – number of cached objects.
  • N worker threads – how many threads you have right now
  • N worker threads created – how many threads have been created (should be close to the number you are running now)
  • N worker threads not created – ZERO – Threads that Varnish tried to create but failed (should never happen)
  • N worker threads limited – reasonably low after startup – Number of threads Varnish wanted to create, but wasn’t able to, either because of max threads or the thread_pool_add_delay.
  • N overflowed work requests – requests that had to be put on the request queue. Should be fairly static after startup.
  • N dropped work requests. Requests Varnish never got to respond to because the request queue was full. Should ideally never happen.
  • N LRU nuked objects – Objects thrown out to make room for others. If this is zero, there’s no point in making your cache larger.
  • esi_parse and esi_errors – ESI parsed pages and ESI pages parsed with errors, respectively. Only relevant if you use ESI.
  • n_expired – Objects expired (ttl reached 0 (or was set to 0) and grace too).

There are obviously more variables that might be of interest, but the list above is a good place to start and should give you a general idea of the state Varnish is in.

Other uses

  • To list all stats, use «varnishstat -1». This will list everything once.
  • To list just a few values: use «varnishstat -l» to find the name for the field, then «varnishstat -f field1,field2,field3»
  • Use munin to graph the data into related blocks.

October 19, 2009

High-end Varnish-tuning

Filed under: Varnish — Tags: , , , — kristian @ 12:19

Most of the time when I tune varnish servers, the main problem is hit rate. That’s mostly a matter of whack-the-weasel, and fairly straightforward. However, once you go beyond that, things get fun. I’ll take you through a few common tuning tricks. This also assumes no disk I/O, so either sort that out first or expect different results.

The big ones

The first thing you want to do is sort your threads out. One thread pool for each CPU core. Never run with less than, say, 800 threads. If you think that’s a lot, then you don’t need these tips. For max, I don’t advise going over 6000; I’ll explain that shortly. So if you have 8 CPU cores, you will want to set the following (thread_pool_min is per pool, so 8 pools of 100 threads gives the 800 minimum):

thread_pools 8
thread_pool_min 100
thread_pool_max 5000
thread_pool_add_delay 2

Note that I also set the thread_pool_add_delay to 2ms. That should drastically reduce the startup time for your threads, and is fairly safe. The reason we don’t create everything instantly is to avoid bombing the kernel.

The main danger with threads – if we rule out I/O – is file descriptors. Currently the log format we use has a 16 bit field reserved for file descriptors, which I believe is fixed in trunk, but that limits us to 64k file descriptors. Your kernel will only clean them up periodically, so running out is very, very relevant. Please keep in mind that synthetic tests are horrible at testing this. You can probably use 40 000 threads in a synthetic test without running into file descriptor issues, but do not use that in production. 6000 might be high, and unless you really really really need it, I wouldn’t go beyond 2000 or 3000. I’ve done quite a bit of testing and tried out different options on production sites, and have found that 800 is a sane minimum, and I’ve rarely seen max threads be an issue until you hit the fd-limit. You can watch /proc/<PID of varnish child>/fd/ to see how many fds varnish has allocated at any given time.

The next issue you are likely to run into is cli_timeout. If your varnish is heavily loaded, it might not answer the management thread in a timely fashion, and the management thread will in turn kill it off. To avoid that, set cli_timeout to 20 seconds or more. Yes, 20. That’s the extreme, but I have gradually increased this over months of routine tests. I’m currently running these tests with a cli_timeout of 25 seconds, which so far has worked. 23 worked until today. For most sites and most real work loads, I doubt this is necessary, but if it is and you actually hit this in production, your Varnish will restart when it’s most busy – which is probably the worst possible scenario you have. Set it to at least 10-15 seconds (we increased the default to 10 seconds a while ago. It’s a sane compromise, but a tad low for an overloaded Varnish).

Last but not least of the common tricks is a well kept secret: session_linger. When you have a bunch of threads and Varnish becomes CPU-bound, you are likely to get killed by context switching and whatnot. To reduce this, setting session_linger can help. You may have to experiment a bit, as it depends on your content. I recently had to set it to 120ms to get it to really do the trick. The site load would climb to 60k req/s then crumble to a measly 2-5k req/s during tests. Session linger did the trick. However, don’t set it too high. That will leave your threads idling.

Session_linger has been improved in trunk, and will be enabled by default in 2.0.5, but it’s still useful in 2.0.4.

[Update] Session linger causes your threads to wait around for more data from the client they are currently working with. Without it, you risk switching threads between piped requests, which requires moving a lot of data around and allocating/freeing threads. It’s better to have spare threads than to constantly switch the ones you have around.

Misc

Another value you may want to change is lru_interval. This is mainly to update the lru list, and the default is 2 seconds. There are several pages that will mention an lru_interval of 3600, but we’ve seen such values cause problems in the past. I would consider something like 20 seconds. It’s not going to have a huge impact on your performance.

People also increase the listen depth. This might be necessary, but I’ve not seen any solid evidence that it is, so I generally avoid it.

Another thing to consider is using critbit instead of classic hashing. That is more relevant for huge data sets, and I’ve not seen any significant performance gain on my synthetic tests yet, but I know some people have, so it’s something you might want to look into.

Session timeout is generally fine at the default (4s), but you should not increase it, or you might run into file descriptor issues.

Then there’s your load balancer. We’ve had several cases where Varnish has run into issues because of an enormous number of connections. You do NOT want to make a connection for every single request.

Summary

thread_pools 8
thread_pool_min 100
thread_pool_max 5000
thread_pool_add_delay 2
cli_timeout 25
session_linger 50/100/150
lru_interval 20

Testing

Testing all of this is a different story, but I will point out a few common pitfalls:

  • Testing your stress testing tool. You need a number of machines to test Varnish – otherwise Varnish isn’t going to be the bottleneck, your stress testing system is. I use a cluster of 6 servers to test Varnish: one will be the Varnish server and the other 5 will hammer it – and that’s barely enough, even though the Varnish server is not specced for high performance compared to the other nodes.
  • Using too few connections or too many – Real life seems to suggest that 10 requests per connection is fairly realistic.
  • Testing only cache hits. This is great for getting huge numbers, but obviously not all that realistic. For a proper test, you may want to generate urls from log files and balance them accordingly.

Results?

Our single-core Opteron at 2.2GHz handles 27k requests/s consistently. Sure, the load can hit 400-600 but hey, it works. This scales fairly well too, so if that was a dual quad core I wouldn’t be surprised if we could reach 180 k req/s (but I have no idea where we’d get the firepower from to test that – or the bandwidth. I assume there’d be some completely different issues at that point). This is with 1-byte pages, mind you. I’ve seen varnish deliver favicon.ico at 60k req/s on a dual quad, but that was an underachiever ;)

September 15, 2009

The importance of hit-rate and why you should care

Filed under: Varnish — Tags: , , , — kristian @ 00:09

Tonight is election-night in Norway, which means most media-sites are beat up properly, which truly puts them to the test.

So what happens when your site has enormous amounts of traffic?

Cache-hit and what a cache can do for you is often significantly underestimated, and this is where I tell you why and what you can do about it.

(more…)

August 24, 2009

Saint Mode

Filed under: Varnish — Tags: , , — kristian @ 16:45

SSIA?

Saint mode is now committed to trunk. This means you can evaluate responses in vcl_fetch and choose to discard them, hold off requests to the backend for N seconds and use graced objects instead. In other words, if your image-site is delivering images with http 200 OK but a single-digit content-length, you can decide to not use that response, but use the previously cached one instead (and/or try a different backend).

The syntax is reasonably simple. We’ve introduced the ‘beresp.saintmode’ variable which can be set in vcl_fetch. This adds the objecthead to a list of ‘troubled’ objects hanging off each backend. This means that if one backend is broken but another is fine, only the broken one will be blacklisted for that specific object. After this is set, you need to restart. So something like this:

if (beresp.status == 500) { set beresp.saintmode = 20s; restart; }
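
The image scenario mentioned above could look roughly like the sketch below. It only assumes the trunk syntax already shown (beresp, beresp.saintmode, restart); the file extensions, the single-digit regex and the 20s are illustrative guesses, not tested values:

```vcl
sub vcl_fetch {
    # A "200 OK" image with a one-digit Content-Length is almost certainly bogus.
    if (req.url ~ "[.](jpg|jpeg|png|gif)$" &&
        beresp.status == 200 &&
        beresp.http.Content-Length ~ "^[0-9]$") {
        # Blacklist this object on this backend for 20 seconds, then retry.
        set beresp.saintmode = 20s;
        restart;
    }
}
```

After the restart, the request can be answered from a graced object or by a different backend, as described above.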

Unfortunately we ran into a snag with regards to doing this in vcl_error, since we don’t have what we need there. This is more of a general problem than saint mode-specific, and with PHK’s blessings, I’ll be writing a workaround/fix for this with regards to saintmode tomorrow. It should be reasonably simple to fix, if the VCL syntax is acceptable. (We’re talking an hour or two + testing).

August 12, 2009

Security.VCL

Filed under: /dev/random, Varnish — Tags: , , , , — kristian @ 14:40

Edward Bjarte Fjellskål, Kacper Wysocki and myself have been working on Security.VCL, a small (for now?) project to imitate much of the functionality that mod_security has in VCL.

The basic idea is this: You’re already running Varnish, and you want to add some basic filtering of Bad Things. Varnish is Quite Fast[tm] when it comes to parsing headers, so doing so in VCL makes sense. VCL also makes it easy to include other VCL files as they are just appended, so let’s write a VCL that detects some common script-kiddie stuff. The typical example would be someone trying to access “../../” or similar.
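
To give an idea of the kind of rule we are talking about, here is a tiny sketch for the “../” case. It is not taken from Security.VCL itself; the regex and the 403 response are just my illustration, written in Varnish 2.0-style syntax:

```vcl
sub vcl_recv {
    # Reject anything that looks like directory traversal.
    if (req.url ~ "[.][.]/") {
        error 403 "Forbidden";
    }
}
```

Since the check only looks at the request URL, it runs before any cache lookup or backend request is made.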

Kacper made a script to re-write most of the mod_security rules to VCL, I made the basic framework and Edward has written several customized rules. The flexibility that VCL offers makes it easy to customize security.vcl in a simple manner. We’ve yet to write any backend-magic, but in theory, you could easily block clients in your firewall, log them or just redirect them. We supply a couple of examples of this in the main VCL.

So far, the major drawback is that Varnish is not designed to analyze the content of POST requests.

At the moment, Security.VCL is a working (internal) PoC, and we’ve yet to decide what to do with it. But the concept is quite interesting, if I may say so myself.

Since we haven’t done a release yet, we figured we’d post a (mostly useless) screenshot of a random portion of the code. Unfortunately, I’m not quite cool enough to use green fonts. So grey will have to do:
