Day: September 19, 2012
Varnish 3.0 Setup in HA for Drupal 7 with Redhat (Part 2)
Varnish 3.0 Setup in HA for Drupal 7 using Redhat Servers
Now that you have read my post on how Varnish works, we can begin with how I went about setting up Varnish. The diagram above is basically the same setup we had.
Since we were using Redhat and this was eventually going into production, I decided it was best to stick to repos. Keep in mind you don’t have to do this; you can compile your own version if you wish. For the purpose of this tutorial, we’re going to use a third-party repo called EPEL.
- Installing Varnish 3.0 on Redhat
- This configuration is based on Lullabot’s setup, with some tweaks and things I found that they forgot to mention, which I spent hours learning.
Varnish is distributed in the EPEL (Extra Packages for Enterprise Linux) package repositories. However, while EPEL allows new versions to be distributed, it does not allow for backwards-incompatible changes. Therefore, new major versions will not hit EPEL and it is therefore not necessarily up to date. If you require a newer major version than what is available in EPEL, you should use the repository provided by varnish-cache.org.
To use the varnish-cache.org repository, run
rpm --nosignature -i http://repo.varnish-cache.org/redhat/varnish-3.0/el5/noarch/varnish-release-3.0-1.noarch.rpm
and then run
yum install varnish
The --nosignature flag is only needed on initial installation, since the Varnish GPG key is not yet in the yum keyring.
Note: After you install it, you will notice that the daemon will not start. Total piss off, right? This is because you need to configure a few things based on the resources you have available. This is all explained in my post on how Varnish works, which you have already read 😛
2. We need to get Varnish running before we can play with it. This is done in /etc/sysconfig/varnish. These are the settings I used for my configuration. My VMs had 2 CPUs and 4 GB of RAM each.
If you want to know what these options do, go read my previous post. It would take too long to explain each flag here, and it would get boring, which is why I wrote this in two parts. Save the file and then start Varnish with /etc/init.d/varnish start. If it doesn’t start, you have a mistake somewhere in the settings below.
# -p lru_interval=1500 is disabled. It is kept out here because a commented
# line cannot sit inside the quoted string below.
DAEMON_OPTS="-a *:80,*:443 \
  -T 127.0.0.1:6082 \
  -f /etc/varnish/default.vcl \
  -u varnish -g varnish \
  -S /etc/varnish/secret \
  -p thread_pool_add_delay=<Number of CPU cores> \
  -p thread_pools=<Number of CPU cores> \
  -p thread_pool_max=1500 \
  -p listen_depth=2048 \
  -h classic,169313 \
  -p obj_workspace=4096 \
  -p connect_timeout=600 \
  -p sess_workspace=50000 \
  -p max_restarts=6 \
  -s malloc,2G"
3. Now that Varnish is started, you need to set up the VCL which it will read. The better you understand how your application works, the better you will be able to fine-tune the way the cache works. There is no one way to do this. This is simply how I went about it.
VCL Configuration
The VCL file is the main location for configuring Varnish and it’s where we’ll be doing the majority of our changes. It’s important to note that Varnish includes a large set of defaults that are always automatically appended to the rules that you have specified. Unless you force a particular command like “pipe”, “pass”, or “lookup”, the defaults will be run. Varnish includes an entirely commented-out default.vcl file that is for reference.
This configuration connects to two web server backends. Each web server has a health probe which the VCL checks. If the probe fails, Varnish removes the web server from the caching round robin. The cache also updates every 30 seconds as long as one of the web servers is up and running. If both web servers go down, it will serve objects from the cache for up to 12 hours; this varies depending on how you configure it.
http://www.nicktailor.com/files/default.vcl
Now, what some people do is put a PHP script on the web servers, which Varnish requests, and if everything passes, the web server stays in the pool. I didn’t bother to do it this way. I set up a website that connected to a database and had the health probe just look for a status 200 code. If the page came up, the web server stayed in the pool. If it didn’t, Varnish dropped it.
# Define the list of backends (web servers).
# Port 80 Backend Servers
backend web1 { .host = "status.nicktailor.com"; .probe = { .url = "/"; .interval = 5s; .timeout = 1s; .window = 5; .threshold = 3; } }
backend web2 { .host = "status.nicktailor.com"; .probe = { .url = "/"; .interval = 5s; .timeout = 1s; .window = 5; .threshold = 3; } }
Caching Even if Apache Goes Down
So a few people have written articles about this and say how to do it, but it took me a bit to get this working.
Even in an environment where everything has a redundant backup, it’s possible for the entire site to go “down” due to any number of causes. A programming error, a database connection failure, or just plain excessive amounts of traffic. In such scenarios, the most likely outcome is that Apache will be overloaded and begin rejecting requests. In those situations, Varnish can save your bacon with the Grace period. Apache gives Varnish an expiration date for each piece of content it serves. Varnish automatically discards outdated content and retrieves a fresh copy when it hits the expiration time. However, if the web server is down it’s impossible to retrieve the fresh copy. “Grace” is a setting that allows Varnish to serve up cached copies of the page even after the expiration period if Apache is down. Varnish will continue to serve up the outdated cached copies it has until Apache becomes available again.
To enable Grace, you just need to specify the setting in vcl_recv and in vcl_fetch:
# Respond to incoming requests.
sub vcl_recv {
# Allow the backend to serve up stale content if it is responding slowly.
set req.grace = 6h;
}
# Code determining what to do when serving items from the Apache servers.
sub vcl_fetch {
# Allow items to be stale if needed.
set beresp.grace = 6h;
}
Note: The missing piece here is the most important one. Without the ttl setting below, if the web servers go down you will get an error page after about 2 minutes, because by default the TTL that keeps objects in the cache when the web servers (aka the backends) go down is set extremely low. Everyone seems to forget to mention this crucial piece of information.
# Code determining what to do when serving items from the Apache servers.
sub vcl_fetch {
# Allow items to be stale if needed.
set beresp.grace = 24h;
set beresp.ttl = 6h;
}
Just remember: while the powers of grace are awesome, Varnish can only serve up a page that it has already received a request for and cached. This can be a problem when you’re dealing with authenticated users, who are usually served customized versions of pages that are difficult to cache. If you’re serving uncached pages to authenticated users and all of your web servers die, the last thing you want is to present them with error messages. Instead, wouldn’t it be great if Varnish could “fall back” to the anonymous pages that it does have cached until the web servers came back? Fortunately, it can — and doing this is remarkably easy! Just add this extra bit of code into the vcl_recv sub-routine:
# Respond to incoming requests.
sub vcl_recv {
# …code from above.
# Use anonymous, cached pages if all backends are down.
if (!req.backend.healthy) {
unset req.http.Cookie;
}
}
Varnish sets a property req.backend.healthy if any web server is available. If all web servers go down, this flag becomes FALSE. Varnish will then strip the cookie that indicates a logged-in user from incoming requests, and attempt to retrieve an anonymous version of the page. As soon as one server becomes healthy again, Varnish will quit stripping the cookie from incoming requests and pass them along to Apache as normal.
Making Varnish Pass to Apache for Uncached Content
Often when configuring Varnish to work with an application like Drupal, you’ll have some pages that should absolutely never be cached. In those scenarios, you can easily tell Varnish to not cache those URLs by returning a “pass” statement.
# Do not cache these paths.
if (req.url ~ "^/status\.php$" ||
    req.url ~ "^/update\.php$" ||
    req.url ~ "^/ooyala/ping$" ||
    req.url ~ "^/admin/build/features" ||
    req.url ~ "^/info/.*$" ||
    req.url ~ "^/flag/.*$" ||
    req.url ~ "^.*/ajax/.*$" ||
    req.url ~ "^.*/ahah/.*$") {
  return (pass);
}
Varnish will still act as an intermediary between requests from the outside world and your web server, but the “pass” command ensures that it will always retrieve a fresh copy of the page.
In some situations, though, you do need Varnish to give the outside world a direct connection to Apache. Why is it necessary? By default, Varnish will always respond to page requests with an explicitly specified “content-length”. This information allows web browsers to display progress indicators to users, but some types of files don’t have predictable lengths. Streaming audio and video, and any files that are being generated on the server and downloaded in real-time, are of unknown size, and Varnish can’t provide the content-length information. This is often encountered on Drupal sites when using the Backup and Migrate module, which creates a SQL dump of the database and sends it directly to the web browser of the user who requested the backup.
To keep Varnish working in these situations, it must be instructed to “pipe” those special request types directly to Apache.
# Pipe these paths directly to Apache for streaming.
if (req.url ~ "^/admin/content/backup_migrate/export") {
  return (pipe);
}
How to view the log and what to look for
varnishlog | grep -i -v ng (This drops every line containing “ng”, which trims the output enough that you can read it without it scrolling all over the place)
- One of the key things to look for is whether your backend is healthy. The log should show this; if it does not, something is still wrong. I have jotted down what it should look like below.
Every poll is recorded in the shared memory log as follows:
NB: subject to polishing before 2.0 is released!
0 Backend_health - b0 Still healthy 4--X-S-RH 9 8 10 0.029291 0.030875 HTTP/1.1 200 Ok
The fields are:
- 0 — Constant
- Backend_health — Log record tag
- - — client/backend indication (XXX: wrong! should be ‘b’)
- b0 — Name of backend (XXX: needs qualifier)
- two words indicating state:
- “Still healthy”
- “Still sick”
- “Back healthy”
- “Went sick”
Notice that the second word indicates present state, and the first word == “Still” indicates unchanged state.
- 4--X-S-RH — Flags indicating how the latest poll went
- 4 — IPv4 connection established
- 6 — IPv6 connection established
- x — Request transmit failed
- X — Request transmit succeeded
- s — TCP socket shutdown failed
- S — TCP socket shutdown succeeded
- r — Read response failed
- R — Read response succeeded
- H — Happy with result
- 9 — Number of good polls in the last .window polls
- 8 — .threshold (see above)
- 10 — .window (see above)
- 0.029291 — Response time this poll or zero if it failed
- 0.030875 — Exponential average (r=4) of response time for good polls.
- HTTP/1.1 200 Ok — The HTTP response from the backend.
- Varnishhist – The varnishhist utility reads varnishd(1) shared memory logs and presents a continuously updated histogram showing the distribution of the last N requests by their processing. The value of N and the vertical scale are displayed in the top left corner. The horizontal scale is logarithmic. Hits are marked with a pipe character (“|”), and misses are marked with a hash character (“#”).
- Varnishtop – The varnishtop utility reads varnishd(1) shared memory logs and presents a continuously updated list of the most commonly occurring log entries. With suitable filtering using the -I, -i, -X and -x options, it can be used to display a ranking of requested documents, clients, user agents, or any other information which is recorded in the log.
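If you only care about the health state, the interesting fields of a Backend_health record can be pulled out with awk. This is a rough sketch that assumes the field order of the sample line above; parse_backend_health is a hypothetical helper name, not part of Varnish.

```shell
# Summarise Backend_health records read from stdin.
# Field positions follow the sample record shown earlier:
#   0 Backend_health - b0 Still healthy 4--X-S-RH 9 8 10 ...
parse_backend_health() {
    awk '$2 == "Backend_health" {
        # $4 = backend name, $6 = present state,
        # $8 = good polls, $9 = .threshold, $10 = .window
        printf "%s %s %d/%d (threshold %d)\n", $4, $6, $8, $10, $9
    }'
}

echo "0 Backend_health - b0 Still healthy 4--X-S-RH 9 8 10 0.029291 0.030875 HTTP/1.1 200 Ok" \
    | parse_backend_health
# prints: b0 healthy 9/10 (threshold 8)
```

On a live server the same function could be fed with varnishlog -i Backend_health | parse_backend_health, where -i limits the output to that one tag.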
Warming up the Varnish Cache
Example:
wget --mirror -r -N -D www.nicktailor.com http://www.nicktailor.com/ (You will need to check the wget flags; I did this from memory)
- Varnishreplay – The varnishreplay utility parses varnish logs and attempts to reproduce the traffic. It is typically used to warm up caches or for various forms of testing. The following options are available:
- -a backend: Send the traffic over tcp to this server, specified by an address and a port. This option is mandatory. Only IPv4 is supported at this time.
- -D: Turn on debugging mode.
- -r file: Parse logs from this file. The input file has to be from a varnishlog of the same version as the varnishreplay binary. This option is mandatory.
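Put together, capturing traffic and replaying it could look like the sketch below. capture_traffic and replay_traffic are hypothetical wrapper names, and the address and file path in the usage note are placeholders; only the varnishlog -w and varnishreplay -a/-r options come from the tools themselves.

```shell
# capture_traffic FILE
# Write the shared memory log to FILE until interrupted (Ctrl-C).
capture_traffic() {
    varnishlog -w "$1"
}

# replay_traffic HOST:PORT FILE
# Replay a previously captured log against the given server.
# FILE must come from a varnishlog of the same version as varnishreplay.
replay_traffic() {
    varnishreplay -a "$1" -r "$2"
}
```

For example, run capture_traffic /tmp/varnish.log on the live server for a while, then replay_traffic 127.0.0.1:80 /tmp/varnish.log against the cache you want to warm.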
Understanding how Varnish works (Part 1)
I put this post together because you need to understand these things before you try to set up Varnish; otherwise you will be doing trial and error like I was, which took a bit longer. If I had known these things, it would have helped.
Varnish 3.0 How it works
I am writing this blog post because Varnish is very painful to learn when you first set it up: it does not work out of the box. It needs to be configured to even start on Redhat. Although there are some great posts out there on how to set it up, they all fail to mention key details that every newb wants to know and ends up digging all over the net to find. So I have decided to save everyone the trouble and write it up from beginning to end, with descriptions of why and how it all works.
Understanding The Architecture and process model
Varnish has two main processes: the management process and the child process. The management process applies configuration changes (VCL and parameters), compiles VCL, monitors Varnish, initializes Varnish and provides a command line interface, accessible either directly on the terminal or through a management interface.
The management process polls the child process every few seconds to see if it’s still there. If it doesn’t get a reply within a reasonable time, the management process will kill the child and start it back up again. The same happens if the child unexpectedly exits, for example from a segmentation fault or assert error.
This ensures that even if Varnish does contain a critical bug, it will start back up again fast. Usually within a few seconds, depending on the conditions.
The child process
The child process consists of several different types of threads, including, but not limited to:
- Acceptor thread, to accept new connections and delegate them.
- Worker threads, one per session. It’s common to use hundreds of worker threads.
- Expiry thread, to evict old content from the cache.
Varnish uses workspaces to reduce the contention between each thread when they need to acquire or modify memory. There are multiple workspaces, but the most important one is the session workspace, which is used to manipulate session data. An example is changing www.example.com to example.com before it is entered into the cache, to reduce the number of duplicates.
It is important to remember that even if you have 5MB of session workspace and are using 1000 threads, the actual memory usage is not 5GB. The virtual memory usage will indeed be 5GB, but unless you actually use the memory, this is not a problem. Your memory controller and operating system will keep track of what you actually use.
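The arithmetic behind that figure is simple enough to sketch. virtual_workspace_mb is a hypothetical helper name; it only multiplies the per-thread workspace by the thread count, which gives the worst-case virtual size, not the memory actually touched.

```shell
# virtual_workspace_mb WORKSPACE_MB THREADS
# Worst-case virtual address space taken by session workspaces, in MB.
virtual_workspace_mb() {
    echo $(( $1 * $2 ))
}

# 5MB of session workspace across 1000 threads, as in the example above.
echo "$(virtual_workspace_mb 5 1000) MB of virtual address space"
# prints: 5000 MB of virtual address space
```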
To communicate with the rest of the system, the child process uses a shared memory log accessible from the file system. This means that if a thread needs to log something, all it has to do is grab a lock, write to a memory area and then free the lock. In addition to that, each worker thread has a cache for log data to reduce lock contention.
The log file is usually about 90MB, and split in two. The first part is counters, the second part is request data. To view the actual data, a number of tools exist that parses the shared memory log. Because the log-data is not meant to be written to disk in its raw form, Varnish can afford to be very verbose. You then use one of the log-parsing tools to extract the piece of information you want – either to store it permanently or to monitor Varnish in real-time.
Child restarts of this kind are logged to syslog. This makes it crucially important to monitor syslog; otherwise you may never know they happened unless you look for them, because the perceived downtime is so short.
VCL compilation
Configuring the caching policies of Varnish is done in the Varnish Configuration Language (VCL). Your VCL is then translated by the management process into C and then compiled by a normal C compiler, typically gcc. Lastly, it is linked into the running Varnish instance.
As a result of this, changing configuration while Varnish is running is very cheap. Varnish may want to keep the old configuration around for a bit in case it still has references to it, but the policies of the new VCL take effect immediately.
Because the compilation is done outside of the child process, there is no risk of affecting the running Varnish by accidentally loading an ill-formatted VCL.
A compiled VCL file is kept around until you restart Varnish completely, or until you issue vcl.discard from the management interface. You can only discard compiled VCL files after all references to them are gone, and the amount of references left is part of the output of vcl.list.
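In practice, loading a new VCL without restarting could look like the sketch below. It assumes the management interface address and secret file from the DAEMON_OPTS in Part 2 (-T 127.0.0.1:6082 and -S /etc/varnish/secret); vadm, reload_vcl and discard_vcl are hypothetical helper names wrapped around the real vcl.load, vcl.use and vcl.discard commands.

```shell
# Management interface settings; the defaults are assumptions taken from
# the -T and -S options used in Part 2.
VARNISH_ADMIN=${VARNISH_ADMIN:-127.0.0.1:6082}
VARNISH_SECRET=${VARNISH_SECRET:-/etc/varnish/secret}

# Run one management command against the running instance.
vadm() {
    varnishadm -T "$VARNISH_ADMIN" -S "$VARNISH_SECRET" "$@"
}

# reload_vcl NAME FILE
# Compile FILE under the label NAME, then switch to it only if it compiled.
reload_vcl() {
    vadm vcl.load "$1" "$2" && vadm vcl.use "$1"
}

# discard_vcl NAME
# Drop an old compiled VCL once nothing references it any more.
discard_vcl() {
    vadm vcl.discard "$1"
}
```

A typical run would be reload_vcl new1 /etc/varnish/default.vcl, followed by vadm vcl.list to check the reference counts before calling discard_vcl on the old label.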
Storage backends
Varnish supports different methods of allocating space for the cache, and you choose which one you want with the -s argument.
- file
- malloc
- persistent (experimental)
Rule of thumb: malloc if it fits in memory, file if it doesn’t.
Expect around 1kB of overhead per object cached.
They approach the same basic problem from two different angles. With the malloc-method, Varnish will request the entire size of the cache with a malloc() (memory allocation) library call. The operating system divides the cache between memory and disk by swapping out what it can’t fit in memory.
The alternative is to use the file storage backend, which instead creates a file on a filesystem to contain the entire cache, then tell the operating system through the mmap() (memory map) system call to map the entire file into memory if possible.
The file storage method does not retain data when you stop or restart Varnish! This is what persistent storage is for. When -s file is used, Varnish does not keep track of what is written to disk and what is not. As a result, it’s impossible to know whether the cache on disk can be used or not — it’s just random data. Varnish will not (and can not) re-use old cache if you use -s file.
While malloc will use swap to store data to disk, file will use memory to cache the data instead. Varnish allows you to choose between the two because the performance of the two approaches has varied historically.
The persistent storage backend is similar to file, but experimental. It does not yet gracefully handle situations where you run out of space. We only recommend using persistent if you have a large amount of data that you must cache and are prepared to work with us to track down bugs.
Tunable parameters
In the CLI:
param.show -l
Varnish has many different parameters which can be adjusted to make Varnish act better under specific workloads or with specific software and hardware setups. They can all be viewed with param.show in the management interface and set with the -p option passed to Varnish – or directly in the management interface.
Remember that changes made in the management interface are not stored anywhere, so unless you store your changes in a startup script, they will be lost when Varnish restarts.
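The difference between the two routes can be sketched as follows, assuming the management address and secret file from Part 2. set_param_runtime and persist_hint are hypothetical helper names; param.set is the real management command.

```shell
# Management interface settings; the defaults are assumptions taken from
# the -T and -S options used in Part 2.
VARNISH_ADMIN=${VARNISH_ADMIN:-127.0.0.1:6082}
VARNISH_SECRET=${VARNISH_SECRET:-/etc/varnish/secret}

# set_param_runtime NAME VALUE
# Change a parameter on the running instance. Lost when Varnish restarts.
set_param_runtime() {
    varnishadm -T "$VARNISH_ADMIN" -S "$VARNISH_SECRET" param.set "$1" "$2"
}

# persist_hint NAME VALUE
# Print the -p option to add to DAEMON_OPTS so the change survives a restart.
persist_hint() {
    printf '%s %s=%s\n' "-p" "$1" "$2"
}
```

For example, set_param_runtime thread_pool_min 100 changes the live value, and persist_hint thread_pool_min 100 prints the -p thread_pool_min=100 option to copy into /etc/sysconfig/varnish.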
The general advice with regards to parameters is to keep it simple. Most of the defaults are very good, and even though they might give a small boost to performance, it’s generally better to use safe defaults if you don’t have a very specific need.
A few hidden commands exist in the CLI, which can be revealed with help -d. These are meant exclusively for development or testing, and many of them are downright dangerous. They are hidden for a reason, and the only exception is perhaps debug.health, which is somewhat common to use.
The shared memory log
Varnish’s shared memory log is used to log most data. It’s sometimes called a shm-log, and operates in a round-robin fashion.
There’s not much you have to do with the shared memory log, except ensure that it does not cause I/O. This is easily accomplished by putting it on a tmpfs.
This is typically done in ‘/etc/fstab’, and the shmlog is normally kept in ‘/var/lib/varnish’ or equivalent locations. All the content in that directory is safe to delete.
The shared memory log is not persistent, so do not expect it to contain any real history.
The typical size of the shared memory log is 80MB. If you want to see old log entries, not just real-time, you can use the -d argument for varnishlog: varnishlog -d.
Warning: Some packages will use -s file by default with a path that puts the storage file in the same directory as the shmlog. You want to avoid this.
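A tmpfs entry for the shmlog directory could be generated like this. It is a sketch only: shmlog_fstab_entry is a hypothetical helper name, and the 128M size is an assumption chosen to leave some headroom over the typical log size, so adjust both arguments to your system.

```shell
# shmlog_fstab_entry [DIR] [SIZE]
# Print an /etc/fstab line that mounts a tmpfs on the shmlog directory.
# Both defaults are assumptions, not values mandated by Varnish.
shmlog_fstab_entry() {
    dir=${1:-/var/lib/varnish}
    size=${2:-128M}
    printf 'tmpfs %s tmpfs rw,size=%s 0 0\n' "$dir" "$size"
}

shmlog_fstab_entry /var/lib/varnish 128M
# prints: tmpfs /var/lib/varnish tmpfs rw,size=128M 0 0
```

Review the printed line before adding it to /etc/fstab, and stop Varnish before mounting over a directory it is using.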
Threading model
The child process runs multiple threads
Worker threads are the bread and butter of the Varnish architecture
Utility-threads
Balance
The child process of Varnish is where the magic takes place. It consists of several distinct threads performing different tasks. The following table lists some interesting threads, to give you an idea of what goes on. The table is not complete.
Thread-name | Amount of threads | Task
cache-worker | One per active connection | Handle requests
cache-main | One | Startup
ban lurker | One | Clean bans
acceptor | One | Accept new connections
epoll/kqueue | Configurable, default: 2 | Manage thread pools
expire | One | Remove old content
backend poll | One per backend poll | Health checks
Most of the time, we only deal with the cache-worker threads when configuring Varnish. With the exception of the amount of thread pools, all the other threads are not configurable.
For tuning Varnish, you need to think about your expected traffic. The thread model allows you to use multiple thread pools, but time and experience has shown that as long as you have 2 thread pools, adding more will not increase performance.
The most important thread setting is the number of worker threads.
Note: If you run across tuning advice that suggests running one thread pool for each CPU core, rest assured that this is old advice. Experiments and data from production environments have revealed that as long as you have two thread pools (which is the default), there is nothing to gain by increasing the number of thread pools.
Threading parameters
Thread pools can safely be ignored
Maximum: Roughly 5000 (total)
Start them sooner rather than later
Maximum and minimum values are per thread pool
Details of threading parameters
While most parameters can be left to the defaults, the exception is the number of threads.
Varnish will use one thread for each session and the number of threads you let Varnish use is directly proportional to how many requests Varnish can serve concurrently.
The available parameters directly related to threads are:
Parameter | Default value
thread_pool_add_delay | 2 [milliseconds]
thread_pool_add_threshold | 2 [requests]
thread_pool_fail_delay | 200 [milliseconds]
thread_pool_max | 500 [threads]
thread_pool_min | 5 [threads]
thread_pool_purge_delay | 1000 [milliseconds]
thread_pool_stack | 65536 [bytes]
thread_pool_timeout | 300 [seconds]
thread_pools | 2 [pools]
thread_stats_rate | 10 [requests]
Among these, thread_pool_min and thread_pool_max are most important. The thread_pools parameter is also of some importance, but mainly because it is used to calculate the final number of threads.
Varnish operates with multiple pools of threads. When a connection is accepted, the connection is delegated to one of these thread pools. The thread pool will further delegate the connection to available thread if one is available, put the connection on a queue if there are no available threads or drop the connection if the queue is full. By default, Varnish uses 2 thread pools, and this has proven sufficient for even the most busy Varnish server.
For the sake of keeping things simple, the current best practice is to leave thread_pools at the default 2 [pools].
Number of threads
Varnish has the ability to spawn new worker threads on demand, and remove them once the load is reduced. This is mainly intended for traffic spikes. It’s a better approach to try to always keep a few threads idle during regular traffic than it is to run on a minimum amount of threads and constantly spawn and destroy threads as demand changes. As long as you are on a 64-bit system, the cost of running a few hundred threads extra is very limited.
The thread_pool_min parameter defines how many threads will be running for each thread pool even when there is no load. thread_pool_max defines the maximum amount of threads that will be used per thread pool.
The defaults of a minimum of 5 [threads] and a maximum of 500 [threads] per thread pool, with 2 [pools], will result in:
- At any given time, at least 5 [threads] * 2 [pools] worker threads will be running.
- No more than 500 [threads] * 2 [pools] threads will run.
We rarely recommend running with more than 5000 threads. If you seem to need more than 5000 threads, it’s very likely that there is something not quite right about your setup, and you should investigate elsewhere before you increase the maximum value.
For minimum, it’s common to operate with 500 to 1000 threads minimum (total). You can observe if this is enough through varnishstat, by looking at the N queued work requests (n_wrk_queued) counter over time. It should be fairly static after startup.
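The per-pool arithmetic above can be checked with a few lines of shell. total_threads is a hypothetical helper name; the numbers are the defaults and the 1000-thread minimum mentioned in the text.

```shell
# total_threads PER_POOL POOLS
# Thread limits are set per pool, so the total is PER_POOL * POOLS.
total_threads() {
    echo $(( $1 * $2 ))
}

pools=2
echo "default minimum: $(total_threads 5 "$pools")"
# prints: default minimum: 10
echo "default maximum: $(total_threads 500 "$pools")"
# prints: default maximum: 1000

# Per-pool thread_pool_min needed for a total minimum of 1000 threads.
echo "thread_pool_min for 1000 total: $(( 1000 / pools ))"
# prints: thread_pool_min for 1000 total: 500
```

The same helper is a quick way to confirm that a planned thread_pool_max keeps the total under the 5000 threads mentioned above.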
Timing thread growth
Varnish can use several thousand threads, and has had this capability from the very beginning. Not all operating system kernels were prepared to deal with this, though, so the parameter thread_pool_add_delay was added which ensures that there is a small delay between each thread that spawns. As operating systems have matured, this has become less important and the default value of thread_pool_add_delay has been reduced dramatically, from 20ms to 2ms.
There are a few, less important parameters related to thread timing. The thread_pool_timeout is how long a thread is kept around when there is no work for it before it is removed. This only applies if you have more threads than the minimum, and is rarely changed.
Another is thread_pool_fail_delay, which defines how long to wait after the operating system denied us a new thread before we try again.
System parameters
As Varnish has matured, fewer and fewer parameters require tuning. The sess_workspace is one of the parameters that could still pose a problem.
- sess_workspace – incoming HTTP header workspace (from client)
- Common values range from the default of 16384 [bytes] to 10MB
- ESI typically requires exponential growth
- Remember: It’s all virtual – not physical memory.
Workspaces are some of the things you can change with parameters. The session workspace is how much memory is allocated to each HTTP session for tasks like string manipulation of incoming headers. It is also used to modify the object returned from a web server before the precise size is allocated and the object is stored read-only.
Sometimes you may have to increase the session workspace to avoid running out of workspace.
As most of the parameters can be left unchanged, we will not go through all of them, but take a look at the list param.show gives you to get an impression of what they can do.
Timers
Parameter | Default | Description | Scope
connect_timeout | 0.700000 [s] | OS/network latency | Backend
first_byte_timeout | 60.000000 [s] | Page generation? | Backend
between_bytes_timeout | 60.000000 [s] | Hiccoughs? | Backend
send_timeout | 60 [seconds] | Client-in-tunnel | Client
sess_timeout | 5 [seconds] | keep-alive timeout | Client
cli_timeout | 10 [seconds] | Management thread->child | Management
The timeout-parameters are generally set to pretty good defaults, but you might have to adjust them for strange applications. The connection timeout is tuned for a geographically close web server, and might have to be increased if your Varnish server and web server are not close.
Keep in mind that the session timeout affects how long sessions are kept around, which in turn affects file descriptors left open. It is not wise to increase the session timeout without taking this into consideration.
The cli_timeout is how long the management thread waits for the worker thread to reply before it assumes it is dead, kills it and starts it back up. The default value seems to do the trick for most users today.
Now that you have read this, you can go read my Varnish Configuration for Drupal in HA on Redhat post.