Apache sending “Vary: Host” making things uncacheable for Varnish
TL;DR
Using %{HTTP_HOST} in .htaccess causes Apache to include a “Vary: Host” field in the response.
That “Vary: Host” header from Apache, in turn, forces Varnish not to cache otherwise cacheable content.
HTTP Vary is not a trivial concept. It is by far the most misunderstood HTTP header. (Varnish Docs)
On a project I’ve been working on, I could not make Varnish hard cache the site, no matter what I did.
It was a mostly read-only site, and I wanted it to keep running from the cache in case of backend failures. To avoid surprises, I was using my own configuration template, which was working exactly as I wanted on a different project.
But no matter what I did, Varnish was not caching everything. Instead, I got a ‘per-user’ cache: if a user had visited, e.g., the homepage, then during a backend failure that user could reload the homepage and Varnish would serve it from the stale cache. Visiting any other page the user had not visited before the backend failure (and which was therefore not in their cache) resulted in Varnish complaining that the backend was down.
What I did notice was a strangely large Vary header in the Varnish response:
Vary: Host,Accept-Encoding,User-Agent
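To see why a Vary value like that produces a ‘per-user’ cache, here is a toy Python model of a cache that stores one variant per combination of the request header values named in Vary. It is a simplified sketch under stated assumptions, not how Varnish is actually implemented; the `VaryCache` class, the host `example.com` and the two visitors are hypothetical.

```python
class VaryCache:
    """Toy Vary-aware cache (hypothetical, NOT Varnish's implementation)."""

    def __init__(self):
        # (host, url) -> (sorted Vary field names, {variant key: body})
        self._store = {}

    @staticmethod
    def _fields(vary):
        # Vary is a comma-separated list of case-insensitive request header names.
        return sorted({f.strip().lower() for f in vary.split(",") if f.strip()})

    @staticmethod
    def _variant_key(fields, request_headers):
        # One key per combination of the request's values for the Vary fields.
        headers = {k.lower(): v for k, v in request_headers.items()}
        return tuple(headers.get(f, "") for f in fields)

    def store(self, host, url, vary, request_headers, body):
        fields = self._fields(vary)
        old = self._store.get((host, url))
        # Keep existing variants only if the Vary list is unchanged.
        variants = old[1] if old and old[0] == fields else {}
        variants[self._variant_key(fields, request_headers)] = body
        self._store[(host, url)] = (fields, variants)

    def lookup(self, host, url, request_headers):
        # Returns the cached body, or None on a miss (which would need the backend).
        entry = self._store.get((host, url))
        if entry is None:
            return None
        fields, variants = entry
        return variants.get(self._variant_key(fields, request_headers))


cache = VaryCache()
alice = {"Host": "example.com", "Accept-Encoding": "gzip", "User-Agent": "Firefox"}
bob = {"Host": "example.com", "Accept-Encoding": "gzip", "User-Agent": "Chrome"}

# Alice visits the homepage while the backend is still up.
cache.store("example.com", "/", "Host,Accept-Encoding,User-Agent", alice, "<html>home</html>")

# Backend goes down: only requests matching a stored variant can be answered.
print(cache.lookup("example.com", "/", alice))  # <html>home</html>
print(cache.lookup("example.com", "/", bob))    # None
```

With only `Accept-Encoding` in Vary, both visitors would map to the same variant, which is the behaviour a shared cache like this one is meant to provide.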
After spending hours tunneling, debugging the Varnish configuration, analysing response headers, comparing server configurations and hunting for cookies that could somehow have magically slipped through, the last place I looked was my main .htaccess. I was using mod_deflate there, but as it turned out, that part was fine.
On a side note, I had been experimenting with per-host configuration in .htaccess, so that Basic Authentication would only kick in on the dev environment. It had been working great until now, and it goes something like this:
<If "%{HTTP_HOST} == 'dev.nivas.hr'">
    Require valid-user
    ...
    Allow from facebook.com
</If>
What I had to find out the hard way is that if the special Apache variable %{HTTP_HOST} is used in, e.g., .htaccess, Apache changes the response headers. The server was returning a header that included a “Vary: Host” field, which means that the server didn’t serve a regular static page; its reply depends on the “Host” field in the HTTP request. Browsers interpret this as “the content returned is dynamic, don’t cache it” (source).
curl -I http://localhost/
HTTP/1.1 200 OK
Date: Mon, 13 Feb 2017 14:48:14 GMT
Server: Apache/2.4.6 (CentOS)
Vary: Host,User-Agent
Content-Type: text/html; charset=UTF-8
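A quick way to catch this is to inspect the Vary field of a raw header dump such as the `curl -I` output. The Python sketch below parses such a dump and flags Vary fields that tend to hurt a shared cache; the `parse_headers` and `audit_vary` helpers and the list of problem fields are illustrative assumptions, not part of any existing tool.

```python
# Illustrative list (an assumption): Vary fields that tend to hurt a shared cache.
PROBLEM_FIELDS = {
    "host": "redundant: the Host is already part of the cache key",
    "user-agent": "fragments the cache into one variant per browser string",
}


def parse_headers(raw):
    """Parse a raw response header dump (status line, then 'Name: value' lines)."""
    headers = {}
    for line in raw.splitlines()[1:]:  # skip the "HTTP/1.1 200 OK" status line
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a header line
        key, value = name.strip().lower(), value.strip()
        # Repeated fields (e.g. two Vary lines) are combined with a comma.
        headers[key] = f"{headers[key]}, {value}" if key in headers else value
    return headers


def audit_vary(raw):
    """Return {vary_field: reason} for every problematic field listed in Vary."""
    vary = parse_headers(raw).get("vary", "")
    fields = [f.strip().lower() for f in vary.split(",") if f.strip()]
    return {f: PROBLEM_FIELDS[f] for f in fields if f in PROBLEM_FIELDS}


response = """HTTP/1.1 200 OK
Date: Mon, 13 Feb 2017 14:48:14 GMT
Server: Apache/2.4.6 (CentOS)
Vary: Host,User-Agent
Content-Type: text/html; charset=UTF-8"""

for field, reason in audit_vary(response).items():
    print(f"{field}: {reason}")
```

For the dump above this reports both `host` and `user-agent`; a response whose Vary lists only `Accept-Encoding` yields an empty dict.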
It is really strange that I had not hit this before, because RewriteCond can add “Host” to the Vary header as well: a RewriteCond that evaluates %{HTTP_HOST} automatically adds “Host” to the Vary header. This is unnecessary and not permitted according to https://tools.ietf.org/html/rfc7231#section-7.1.4, which defines Vary as describing what parts of a request message, aside from the method, the Host header field and the request target, might influence the response. The issue was reported and has been sitting in Apache's bug tracker for a while without a clear resolution.
I can understand why Varnish is not caching, but I cannot understand Apache's logic. I do have a VirtualHost defined, so my responses already vary by Host. There is no need to spell this out in the response.
After removing %{HTTP_HOST} from .htaccess, the site was cacheable, just as we wanted:
HTTP/1.1 200 OK
Date: Mon, 13 Feb 2017 22:18:33 GMT
Content-Type: text/html; charset=UTF-8
Vary: Accept-Encoding
Age: 3707
X-Nivas-Crew: loves you :) http://www.nivas.hr
X-Backend: backend_app1
X-Cache: HIT
X-Cache-Hits: 786332
X-Vudu-Url-Cache: hit
Don’t forget to normalize your Vary in Varnish; chances are that without normalization you will never see a cache hit.
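The post does not show the VCL used for normalization, so the following is only a Python model of two common ideas. It assumes the site's output really does not depend on Host or User-Agent. The first idea is to strip those fields from the backend's Vary (in VCL, a rewrite like this would typically live in `vcl_backend_response`, acting on `beresp.http.Vary`). The second is to collapse the request's Accept-Encoding to either `gzip` or nothing (typically done in `vcl_recv` on `req.http.Accept-Encoding`), so that `Vary: Accept-Encoding` yields at most two variants. The function names are hypothetical.

```python
# Assumption: the site's pages do not actually differ by Host or User-Agent.
DROP_FROM_VARY = {"host", "user-agent"}


def normalize_vary(vary):
    """Drop fields that should not split the cache and remove duplicates."""
    kept, seen = [], set()
    for field in vary.split(","):
        name = field.strip()
        key = name.lower()
        if name and key not in DROP_FROM_VARY and key not in seen:
            seen.add(key)
            kept.append(name)
    return ",".join(kept)


def _qvalue(params):
    """Return the q= weight from ';'-separated parameters (1.0 if absent)."""
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "q":
            try:
                return float(value)
            except ValueError:
                return 0.0
    return 1.0


def normalize_accept_encoding(value):
    """Collapse Accept-Encoding to 'gzip' or '' (the '*' wildcard is ignored for simplicity)."""
    for item in value.split(","):
        coding, _, params = item.partition(";")
        # "gzip;q=0" explicitly refuses gzip, so it does not count.
        if coding.strip().lower() == "gzip" and _qvalue(params) > 0:
            return "gzip"
    return ""


print(normalize_vary("Host,Accept-Encoding,User-Agent"))  # Accept-Encoding
print(normalize_accept_encoding("gzip, deflate, br"))      # gzip
print(repr(normalize_accept_encoding("br, gzip;q=0")))     # ''
```

Collapsing Accept-Encoding trades a little precision (e.g. Brotli-capable clients get gzip) for a much smaller number of cache variants.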
Happy caching!