Gotchas with static sites on S3 via CloudFront

There’s a ton of posts already on how to set up static sites hosted on S3 via CloudFront. This isn’t going to be one of them. What I want to discuss is some weirdness that I encountered with setting up this blog. For the purpose of this post we’re going to assume that you are going to create a hypothetical static site: https://example.com

If you host a static site on your own hardware, or a VPS (be it an EC2 instance or a DigitalOcean Droplet), you’ll most likely do this with one of the various available web servers, like nginx or Apache, and use a service like Let’s Encrypt to create the TLS certificate. When you do that, for requests like https://example.com/blog/ the web server will return the contents of https://example.com/blog/index.html if it’s available. Nice!

If you host a static site via an AWS S3 bucket (named example.com) with static website hosting, it too will return the content of http://example.com.s3-website-us-east-1.amazonaws.com/blog/index.html for a request to http://example.com.s3-website-us-east-1.amazonaws.com/blog/. Unfortunately, if you want to use HTTPS and/or your actual domain name (example.com) then you’ll need CloudFront or another CDN to do that for you.

If you put a CloudFront distribution in front of your S3 bucket to provide caching and HTTPS support, you will only get that behaviour if you use the website domain as the origin in your distribution (i.e. example.com.s3-website-us-east-1.amazonaws.com) and not the bucket domain (i.e. example.com.s3.amazonaws.com). You should also NOT specify a default root object for the CloudFront distribution (i.e. index.html), and instead let S3 do that for you. The problem with this approach is that your static site is still accessible directly via its website domain URL (i.e. http://example.com.s3-website-us-east-1.amazonaws.com). If you want to prevent direct access and only allow CloudFront to access your S3 bucket, then in practical terms you’re pretty much out of luck.

There is another option. You can give CloudFront access to a bucket that is NOT configured as a website endpoint. This means that your bucket is now nicely inaccessible to unauthorised requests. You can also set up a default root object (i.e. index.html) for your CloudFront distribution so that https://example.com will return the content of https://example.com/index.html. However, https://example.com/blog/ will now return an S3 access error, because the default root object only applies to the root of the site and the bucket and the distribution no longer operate exactly like a web server.
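
To make the difference between the two setups concrete, here is a toy model of which S3 object key each one ends up looking for. This is only a sketch of the behaviour described above, not AWS code: the function names and the INDEX_DOCUMENT constant are made up for illustration, and it ignores details such as redirects and error pages.

```javascript
// Simplified model only: neither function is part of any AWS API.
// Assumed name of the index document / default root object.
var INDEX_DOCUMENT = 'index.html';

// S3 website endpoint: the index document applies to every "directory".
function websiteEndpointKey(uri) {
  var key = uri.slice(1); // drop the leading slash
  return (key === '' || key.endsWith('/')) ? key + INDEX_DOCUMENT : key;
}

// CloudFront in front of the plain bucket endpoint:
// the default root object applies to "/" only.
function restEndpointKey(uri) {
  return uri === '/' ? INDEX_DOCUMENT : uri.slice(1);
}
```

In this model both functions map / to index.html, but for /blog/ the website endpoint looks for blog/index.html while the plain bucket endpoint looks for an object literally named blog/, which does not exist.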

The solution to this requires some code. CloudFront Functions allow you to intercept and modify incoming requests before they reach the origin server (in our case an S3 bucket). You can write a function that intercepts the incoming request to see if it is attempting to access a file or a directory, and if it is a directory then you can ask S3 to serve an index.html page from that directory. As a bonus, you can tell WordPress vulnerability scanners to piss off without forwarding the request to S3.

function handler(event) {
  var request = event.request;
  var uri = request.uri;

  // piss off php/wordpress scanner
  if (uri.includes('/wp-includes/') || uri.endsWith('.php')) {
    return {
      statusCode: 404,
      statusDescription: 'Not Found'
    };
  }

  // check for /blog/
  if (uri.endsWith('/')) {
    request.uri += 'index.html';
    return request;
  }

  // disambiguate between /blog and /blog/style.css
  var tail = uri.split('/').pop();
  if (tail.includes('.')) {
    return request;
  }

  // redirect /blog to /blog/ so that pages
  // with relative resources work properly
  return {
    statusCode: 302,
    statusDescription: 'Found',
    headers: {
      'location': {
        'value': uri + '/'
      }
    }
  };
}

Create this function via the AWS Console as a viewer request event type, associate it with your CloudFront distribution, and you’re back in business!
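
Before deploying, it can be handy to poke at the function locally, for example in Node. The helpers below are a minimal sketch for that: makeEvent and describe are hypothetical names of my own, and the event they build is a stripped-down stand-in for the real viewer request event, which carries more fields than the handler above ever looks at.

```javascript
// Build a minimal viewer-request event. Real CloudFront events carry more
// fields (version, context, viewer, ...); the handler only reads request.uri.
function makeEvent(uri) {
  return {
    request: { method: 'GET', uri: uri, querystring: {}, headers: {}, cookies: {} }
  };
}

// Run a handler against a URI and summarise what would happen next:
// a generated response (status plus optional Location) or a forwarded request.
function describe(handler, uri) {
  var result = handler(makeEvent(uri));
  if (result.statusCode) {
    var loc = result.headers && result.headers.location;
    return result.statusCode + (loc ? ' -> ' + loc.value : '');
  }
  return 'forward ' + result.uri;
}
```

With the handler from this post in scope, describe(handler, '/blog/') gives 'forward /blog/index.html', describe(handler, '/blog') gives '302 -> /blog/', and describe(handler, '/wp-login.php') gives '404'.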