I recently ran into an issue with my static file serving strategy. To keep it short, here is a high-level overview of my architecture: I use the Django web framework, with CloudFront serving my static files and S3 as the origin. The problem was that after I ran collectstatic to upload files to S3, refreshing the CloudFront URL would randomly give me either an old version or the latest version of the object.
Here are a few more details. My CloudFront cache time was set to 24 hours. Yesterday I ran collectstatic twice. The first time, I uploaded test.txt with a single line of content, “test 1”. The website ran for 4 hours, then I changed the content of test.txt to “test 2” and ran collectstatic again. Now two questions: first, what is in test.txt on S3? Second, what will you see in test.txt from CloudFront? The answers: on S3 it is “test 2”, but from CloudFront you will sometimes get “test 1” and sometimes “test 2” as you keep refreshing the URL.
I thought this was super weird. Since I sometimes got the latest version of my object from CloudFront, I expected the latest version to be served all the time. So I investigated further today: I updated test.txt again to “test 3” and ran collectstatic to upload it to S3. All three collectstatic runs happened within 24 hours. Same questions: what is on S3, and what is on CloudFront? The answers: S3 has the latest version, “test 3”, while CloudFront returns “test 1”, “test 2” or “test 3” at random as I refresh.
I spent some time debugging, trying to get CloudFront to serve only the latest version and make “test 1” and “test 2” disappear, but had no luck.
On second thought, before taking any further action, I started to realize that I was probably thinking in completely the wrong direction. After reading through the CloudFront documentation, I found this:
CloudFront makes another request to the origin only after an object expires. But my “test 1” had not expired, so why was I starting to see “test 2” and “test 3”?
This is what I think is happening. I am in San Francisco; assume my edge location has 20 servers. The first time, I ran collectstatic to upload “test 1” to S3 and then made a request that landed on CF server 1 at my edge location. CF server 1 did not have the file, so it forwarded the request to S3 and fetched it from there. Then I refreshed the URL, and this time the request hit CF server 2 at my edge location. It did not have the file either, so it also fetched it from S3. At that point, CF servers 1 and 2 at my edge location both had “test 1”. After the second collectstatic uploaded “test 2” to S3, a request that hit CF server 1 or 2 would find the cached file there and serve it. If the request hit one of the other 18 CF servers at my edge location, that server would ask S3 for the file, get the latest version “test 2”, cache it, and serve it. From that point on, whether a refresh showed “test 1” or “test 2” depended on which CF server received my request.
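To make that guess concrete, here is a minimal simulation of it. This is only a sketch of my mental model, not how CloudFront is actually implemented: the class names (Origin, EdgeServer, EdgeLocation), the 20-server count, and the per-server cache with a 24-hour TTL are all assumptions made for illustration.

```python
import random


class Origin:
    """Stands in for the S3 bucket: it always holds the latest upload."""

    def __init__(self):
        self.objects = {}

    def put(self, key, body):
        self.objects[key] = body

    def get(self, key):
        return self.objects[key]


class EdgeServer:
    """One hypothetical cache server with its own independent cache."""

    def __init__(self, origin, ttl_hours=24):
        self.origin = origin
        self.ttl_hours = ttl_hours
        self.cache = {}  # key -> (body, hour the copy was fetched)

    def get(self, key, now_hour):
        cached = self.cache.get(key)
        if cached is not None and now_hour - cached[1] < self.ttl_hours:
            return cached[0]  # unexpired copy: served without asking the origin
        body = self.origin.get(key)  # miss or expired: fetch from the origin
        self.cache[key] = (body, now_hour)
        return body


class EdgeLocation:
    """A group of servers; each request lands on one of them at random."""

    def __init__(self, origin, server_count=20, seed=0):
        self.servers = [EdgeServer(origin) for _ in range(server_count)]
        self.rng = random.Random(seed)

    def get(self, key, now_hour):
        return self.rng.choice(self.servers).get(key, now_hour)


origin = Origin()
edge = EdgeLocation(origin, server_count=20)

origin.put("test.txt", "test 1")  # first collectstatic, hour 0
edge.servers[0].get("test.txt", now_hour=0)  # request lands on CF server 1
edge.servers[1].get("test.txt", now_hour=0)  # refresh lands on CF server 2

origin.put("test.txt", "test 2")  # second collectstatic, hour 4
seen = [server.get("test.txt", now_hour=4) for server in edge.servers]
print(seen.count("test 1"), seen.count("test 2"))  # 2 18
```

In this model the two servers that cached “test 1” keep serving it until their copies expire, while the other 18 fetch “test 2” on their first request, so a randomly routed refresh (edge.get) can return either version.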
Someone may ask: doesn’t CF server 1 propagate the latest version to all the other CF servers when it gets it? Apparently it does not. Here is my guess at the reason: if an edge location has hundreds or thousands of servers, and every time one of them gets a new version of a file it has to push that file to all the others, the file transfer alone would cost a lot. However, there must be some way to “propagate” a change, and that is where invalidation comes to the rescue. If I were a developer on the CloudFront team, from an implementation point of view, invalidating an object would be much cheaper than propagating it to every CloudFront server: invalidation only has to delete the object from the servers that hold a copy, while propagation has to transfer the object to all the servers. That is why I believe the philosophy behind CloudFront is that they want you to bust the old versions of a file once you have a new one.
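A quick back-of-the-envelope comparison shows why I find this plausible. The two cost functions below are my own simplification, not CloudFront internals, and the 5 MiB file size and 1000-server count are made-up numbers.

```python
def propagation_cost(object_size_bytes, total_servers):
    """Bytes of object data moved if one server pushed the file to all others."""
    return object_size_bytes * (total_servers - 1)


def invalidation_cost(servers_with_copy):
    """Number of cache deletions needed; no object data is transferred."""
    return servers_with_copy


size = 5 * 1024 * 1024  # hypothetical 5 MiB asset
print(propagation_cost(size, total_servers=1000))  # object bytes pushed around
print(invalidation_cost(servers_with_copy=2))  # only two deletions
```

Propagation grows with both the file size and the number of servers, while invalidation in this model only touches the servers that actually cached the object.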
Finally, I invalidated my test.txt file, and from then on I got “test 3” every time, as I expected.
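For reference, an invalidation can also be requested from Python with boto3's CloudFront client and its create_invalidation call. This is a sketch rather than the exact steps I took: the helper names are hypothetical, the distribution ID is a placeholder, and the path has to match however your distribution maps URLs to S3 keys.

```python
import time


def build_invalidation_batch(paths, caller_reference=None):
    """Build the InvalidationBatch payload; paths must start with '/'."""
    items = [p if p.startswith("/") else "/" + p for p in paths]
    return {
        "Paths": {"Quantity": len(items), "Items": items},
        # CallerReference must be unique per request; a timestamp is one option.
        "CallerReference": caller_reference or str(time.time()),
    }


def invalidate(distribution_id, paths):
    """Ask CloudFront to drop cached copies of the given paths."""
    import boto3  # imported here so the builder above works without boto3

    client = boto3.client("cloudfront")
    return client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths),
    )


# Example call (needs AWS credentials and a real distribution ID):
# invalidate("YOUR_DISTRIBUTION_ID", ["/test.txt"])
```

Running this after each collectstatic would clear the stale copies for the listed paths, at the price of one extra request per deploy.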
Thanks for reading.