Ever since I discovered that Amazon CloudFront supports custom origins I have been thinking about how it could be used in a Sitecore solution. If you set up a custom origin pointing to your website, it will effectively work as a caching proxy (edge cache). This can take some of the load off your Sitecore server and give a general boost to page performance*, which in turn could have a positive effect on your SEO ranking.
In this post I want to show how CloudFront can be leveraged as an edge cache for Sitecore media library content. One problem that immediately arises is that CloudFront does not forward query string parameters to the origin server. This is a problem if you use query string parameters to control your media requests. The obvious example is requesting images from the media library with different parameters.
Note how the query string disappears as CloudFront forwards the requests to the origin.
Hopefully one day AWS will support query string parameters - in the meantime we will have to do a little bit of URL rewriting. If we were to move the query string to the beginning of the path and substitute & and ? with ^ (%5E when URL encoded) - like this:
CloudFront will forward the path in full to the origin server. The URL can then easily be rewritten using the URL Rewrite module for IIS to restore the query string.
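As a rough illustration of the idea, here is a minimal Python sketch of the transformation. The exact layout of the rewritten path is an assumption on my part (a leading `^` standing in for `?`, followed by the parameters and then the original path), and the function name `move_query_into_path` is hypothetical.

```python
def move_query_into_path(url_path: str) -> str:
    """Move the query string in front of the path so an edge cache that
    drops query strings still sees every parameter.

    Assumed layout: /^param1=a^param2=b/original/path
    """
    path, _, query = url_path.partition("?")
    if not query:
        # Nothing to move: the path is already safe to send through the CDN.
        return path
    # '?' becomes the leading '^' and every '&' becomes '^'.
    return "/^" + query.replace("&", "^") + path


if __name__ == "__main__":
    print(move_query_into_path("/~/media/Images/banner.ashx?w=200&h=100"))
    # prints /^w=200^h=100/~/media/Images/banner.ashx
```

The `^` is left unencoded here for readability; a browser or HTTP client may send it as `%5E`.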
The rewrite rule could look like this:
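The actual rule lives in the IIS URL Rewrite configuration. As a language-neutral sketch of the logic such a rule needs to implement, the Python below restores the query string from a path that starts with a `^`-delimited parameter segment. The path layout, the regular expression and the function name `restore_query_string` are assumptions for illustration, not the original rule.

```python
import re

# Assumed layout of an incoming CDN path: /^w=200^h=100/~/media/image.ashx
_CDN_PATH = re.compile(r"^/\^([^/]+)(/.*)$")


def restore_query_string(path: str) -> str:
    """Turn /^a=1^b=2/original/path back into /original/path?a=1&b=2."""
    # The caret may arrive percent-encoded, so normalise it first.
    decoded = path.replace("%5E", "^").replace("%5e", "^")
    match = _CDN_PATH.match(decoded)
    if match is None:
        # Not a rewritten CDN path: leave the request untouched.
        return path
    query = match.group(1).replace("^", "&")
    return f"{match.group(2)}?{query}"


if __name__ == "__main__":
    print(restore_query_string("/%5Ew=200%5Eh=100/~/media/Images/banner.ashx"))
    # prints /~/media/Images/banner.ashx?w=200&h=100
```

A URL Rewrite rule would express the same thing with a match pattern on the first path segment and a rewrite action that appends the captured parameters as a query string.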
Outputting CDN image paths
A while ago I blogged about customising the Image field renderer to enable the crop parameter. It’s possible to override the Sitecore.Pipelines.RenderField.GetImageFieldValue class to change the generated image source URL. The same technique can be used to substitute local media library URLs with CDN URLs.
You will also need to handle media library URLs that are output inline in Rich Text fields. That can be done by elaborating on the functionality of the Sitecore.Pipelines.RenderField.ExpandLinks processor in the RenderField pipeline.
Both of these use this code to generate CDN URLs:
You may at this point think that it would be easier to just have an outbound URL rewrite rule (I did), but you would lose some control over when and where URLs are being rewritten. There could be cases (e.g. Preview and Page Editor mode) where you would want to disable the CDN URL rewriting.
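The processors themselves are .NET classes, but the URL logic they share can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the host name `cdn.example.com` is a placeholder, the `enabled` flag stands in for whatever check you use to detect Preview or Page Editor mode, and the `^` path layout is my guess at the scheme described earlier.

```python
MEDIA_PREFIX = "/~/media/"  # Sitecore's default media link prefix


def to_cdn_url(media_url: str, cdn_host: str, enabled: bool = True) -> str:
    """Swap a local media library URL for its CDN equivalent.

    `enabled` is a hypothetical switch so callers can skip the rewrite,
    for example while an editor is previewing unpublished content.
    """
    if not enabled or not media_url.startswith(MEDIA_PREFIX):
        # Leave non-media URLs, and everything in editing modes, alone.
        return media_url
    path, _, query = media_url.partition("?")
    if query:
        # Fold the query string into the path so the edge does not drop it.
        path = "/^" + query.replace("&", "^") + path
    return f"https://{cdn_host}{path}"


if __name__ == "__main__":
    print(to_cdn_url("/~/media/Images/banner.ashx?w=200", "cdn.example.com"))
    # prints https://cdn.example.com/^w=200/~/media/Images/banner.ashx
    print(to_cdn_url("/~/media/Images/banner.ashx?w=200", "cdn.example.com", enabled=False))
    # prints /~/media/Images/banner.ashx?w=200
```

Keeping the switch inside the helper is what an outbound rewrite rule cannot easily give you: the decision is made per request, with full knowledge of the current page mode.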
One big problem faced when using edge caching is invalidating content when it is updated at the origin. Most CDNs - CloudFront included - support some sort of content invalidation. Ideally you would want to expire content on the edge as soon as a new version has been published by Sitecore. Because we include dynamic parameters in the URLs and CloudFront does not support a full purge, this is difficult. To circumvent this issue I chose to embed a token in the CDN URLs. There are two tokens that could be used: the item revision or the last published timestamp. The latter will basically generate new URLs for all content after a publish and force CloudFront to go back to the origin for all requests.
Note that the old media content may linger on the edge for however long the Expires HTTP header dictates. You would be able to invalidate content on the edge using the AWS API if you know the exact URLs. If you are governed by legislation that requires the ability to invalidate the entire edge cache instantly, the approach described above is not appropriate.
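To make the token idea concrete, here is a small Python sketch. It assumes the token is carried as an extra query string parameter before the URL is converted to its CDN form; the parameter name `rev` and the function name `add_cache_token` are hypothetical, and the token value would come from the item revision or the last publish time.

```python
def add_cache_token(media_url: str, token: str) -> str:
    """Append a version token so updated content gets a brand-new URL.

    The edge cache keys on the full URL, so a changed token means a
    guaranteed cache miss and a fresh fetch from the origin.
    """
    separator = "&" if "?" in media_url else "?"
    return f"{media_url}{separator}rev={token}"


if __name__ == "__main__":
    # Same image, two different revisions: two distinct cache entries.
    print(add_cache_token("/~/media/Images/banner.ashx?w=200", "a1b2c3"))
    # prints /~/media/Images/banner.ashx?w=200&rev=a1b2c3
    print(add_cache_token("/~/media/Images/banner.ashx", "d4e5f6"))
    # prints /~/media/Images/banner.ashx?rev=d4e5f6
```

An item revision token only changes the URLs of items that were actually modified, whereas a publish timestamp changes every URL at once, which is simpler but empties the useful part of the edge cache on each publish.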
Another consideration is how long content should live on the edge. By default Sitecore will not allow media content to be cached on the edge but this can be adjusted by changing the following configuration options:
Sitecore already has a powerful caching layer for media library content. The performance gain from using a CDN may be limited depending on your set-up and the nature of your website.
I would not suggest using a CDN for caching pages, as you would lose the ability to do personalisation. You will also lose the ability to record DMS goals on media library downloads.
When you create a custom origin CloudFront distribution it will mirror your entire site (pages included). You may want your origin server to filter out non-media requests coming from CloudFront. You can use a Request Blocking rule in IIS URL Rewrite for requests where the user agent HTTP header matches “Amazon CloudFront”.
When trying to establish whether content is being cached, you can look for these HTTP headers coming from CloudFront: “X-Cache: Miss from cloudfront” or “X-Cache: Hit from cloudfront”.
There are many other CDN products available besides Amazon CloudFront (Akamai is popular), but as far as I’m aware none has such a transparent pricing policy. CloudFront does not require a heavy financial commitment to start using edge caching and should be affordable for most.
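The decision such a blocking rule has to make is simple enough to sketch in Python. The function name `should_block` is hypothetical, and the check for `/~/media/` anywhere in the path assumes Sitecore's default media prefix and allows for a rewritten parameter segment in front of it.

```python
def should_block(user_agent: str, path: str) -> bool:
    """Return True for CloudFront requests that are not for media content."""
    from_cloudfront = "Amazon CloudFront" in user_agent
    # The media prefix may be preceded by a rewritten parameter segment,
    # so look for it anywhere in the path rather than only at the start.
    is_media = "/~/media/" in path
    return from_cloudfront and not is_media


if __name__ == "__main__":
    print(should_block("Amazon CloudFront", "/products/overview.aspx"))
    # prints True
    print(should_block("Amazon CloudFront", "/^w=200/~/media/Images/banner.ashx"))
    # prints False
    print(should_block("Mozilla/5.0", "/products/overview.aspx"))
    # prints False
```

Ordinary visitors are never affected, because the rule only fires when the user agent identifies the request as coming from CloudFront.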
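If you want to check this from a script rather than by eye, the header values above are easy to classify. The Python sketch below works on a plain mapping of response headers, however you obtained them; the function name `edge_cache_status` is hypothetical.

```python
from typing import Mapping, Optional


def edge_cache_status(headers: Mapping[str, str]) -> Optional[str]:
    """Return "hit" or "miss" based on CloudFront's X-Cache header.

    Returns None when the header is absent or was not set by CloudFront.
    """
    for name, value in headers.items():
        if name.lower() != "x-cache":
            continue  # header names are case-insensitive
        lowered = value.strip().lower()
        if "cloudfront" not in lowered:
            return None
        if lowered.startswith("hit"):
            return "hit"
        if lowered.startswith("miss"):
            return "miss"
    return None


if __name__ == "__main__":
    print(edge_cache_status({"X-Cache": "Hit from cloudfront"}))   # prints hit
    print(edge_cache_status({"X-Cache": "Miss from cloudfront"}))  # prints miss
    print(edge_cache_status({"Content-Type": "image/png"}))        # prints None
```

Requesting the same URL twice and seeing a miss followed by a hit is a quick way to confirm that the origin's caching headers allow the edge to store the response.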
*) Be warned: It’s important to realise that rewriting the URLs in the FieldRenderer will degrade field rendering performance slightly and that the impact of any modification to the RenderField pipeline is multiplied by the number of fields output via the pipeline. You would need to evaluate whether the performance hit to the FieldRenderer is outweighed by the potential gains of using a CDN. The code provided in this post has not been performance tested or tuned - it’s written purely as examples to help explain the processes involved.