Debugging CDN Cache Issues: When S3 File Updates Don't Propagate
The Problem
After updating a file with the same name on S3, some endpoints saw the updated version immediately, while others continued serving the old version for over 30 minutes. This inconsistent behavior across different client locations strongly suggested a CDN caching issue.
The symptoms were clear:
- File was successfully updated on S3
- Some users could see the new version
- Other users were stuck with the old version for 40+ minutes
- Infrastructure team reported "no CDN" was in use
Investigation
When dealing with file update propagation issues, the first step is to understand the complete request path. Even when teams believe they're not using a CDN, modern web architectures often involve multiple caching layers.
Step 1: Analyzing the Request Headers
Using curl -I to examine the HTTP response headers revealed crucial information:
curl -I https://yourdomain.com/yourfile.js
The response headers told the complete story:
age: 6584
cache-control: public, max-age=14400
cf-cache-status: HIT
These headers immediately identified the root cause:
- cf-cache-status: HIT confirmed Cloudflare was serving cached content
- age: 6584 showed the cached object had been in the cache for 6,584 seconds (~1.8 hours)
- cache-control: public, max-age=14400 indicated a 4-hour cache TTL
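The same reading of the headers can be done programmatically. The sketch below is a minimal illustration, not part of the original investigation: `parse_headers` and `cache_summary` are hypothetical helper names, and the "remaining" figure assumes the edge honors the origin's max-age rather than a Cloudflare-side override.

```python
import re


def parse_headers(raw: str) -> dict[str, str]:
    """Parse 'name: value' lines (as printed by curl -I) into a lowercase-keyed dict."""
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # the status line and blank lines contain no colon and are skipped
            headers[name.strip().lower()] = value.strip()
    return headers


def cache_summary(raw: str) -> dict:
    """Extract the cache-related fields and compute the seconds left before expiry."""
    headers = parse_headers(raw)
    match = re.search(r"max-age=(\d+)", headers.get("cache-control", ""))
    max_age = int(match.group(1)) if match else None
    age = int(headers["age"]) if "age" in headers else None
    remaining = max_age - age if max_age is not None and age is not None else None
    return {
        "status": headers.get("cf-cache-status"),
        "age": age,
        "max_age": max_age,
        "remaining": remaining,  # assumes the edge TTL equals the origin max-age
    }


# The three headers from the curl output above, plus an illustrative status line.
sample = """HTTP/2 200
age: 6584
cache-control: public, max-age=14400
cf-cache-status: HIT
"""

summary = cache_summary(sample)
print(summary)
```

For the headers above, the remaining freshness works out to 14400 - 6584 = 7816 seconds.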
Step 2: Understanding the Architecture
The actual request flow was:
Client → Cloudflare Edge Node → S3 (when cache miss)
Even though the infrastructure team reported "no CDN," Cloudflare itself is a CDN with global edge nodes that cache content.
Root Cause
The issue stemmed from CDN edge node caching behavior:
- S3 was setting cache headers: The origin (S3) was instructing downstream caches to store the file for 4 hours (max-age=14400)
- Cloudflare respected the cache headers: Different Cloudflare edge nodes had cached the file at different times
- Geographic cache distribution: Some users hit edge nodes with expired caches (showing new content), while others hit nodes with valid caches (showing old content)
- Time-based inconsistency: The age: 6584 header showed this particular edge node's cache still had ~2.2 hours remaining before expiration
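To make the timing argument concrete, here is a small model of why different edge nodes disagree. It is a simplified sketch: the edge names and timestamps are invented for illustration, times are in seconds relative to the S3 update at t=0, and it assumes each edge keeps an object for exactly max-age seconds.

```python
MAX_AGE = 14400  # seconds, from cache-control: max-age=14400


def serves_stale(cached_at: int, updated_at: int, now: int, max_age: int = MAX_AGE) -> bool:
    """True if an edge that cached the object at `cached_at` still serves the pre-update copy."""
    cached_before_update = cached_at < updated_at
    still_fresh = (now - cached_at) < max_age
    return cached_before_update and still_fresh


UPDATED_AT = 0  # the moment the file was overwritten on S3
NOW = 2400      # 40 minutes after the update

# Hypothetical edges and the time each one last fetched the file from S3.
edges = {
    "edge-a": -4184,   # cached ~70 min before the update, so age is 6584 at NOW
    "edge-b": -14000,  # cached long ago; its copy expired 400 s after the update
    "edge-c": 100,     # had nothing cached, fetched the new file after the update
}

for name, cached_at in edges.items():
    state = "OLD version" if serves_stale(cached_at, UPDATED_AT, NOW) else "new version"
    print(f"{name}: {state}")
```

Only the first edge still serves the old file, which matches the symptom of some users seeing the update while others did not.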
Solution
Immediate Fix: Cache Invalidation
The quickest solution was to purge the Cloudflare cache:
- Go to Cloudflare Dashboard
- Navigate to Caching → Cache Purge
- Select Purge by URL and enter the specific file URL
- This forces all edge nodes to fetch fresh content from S3
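The same purge can be scripted against Cloudflare's purge_cache API endpoint, which is useful for deploy pipelines. The sketch below only builds the request, so it can be inspected without touching a real zone. "ZONE_ID" and "API_TOKEN" are placeholders you would replace with your own values, and `build_purge_request` is a hypothetical helper name.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_purge_request(zone_id: str, api_token: str, urls: list[str]) -> urllib.request.Request:
    """Build a purge-by-URL request for the given zone (does not send it)."""
    body = json.dumps({"files": urls}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/zones/{zone_id}/purge_cache",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder credentials; nothing is sent until urlopen is called.
request = build_purge_request("ZONE_ID", "API_TOKEN", ["https://yourdomain.com/yourfile.js"])
print(request.get_method(), request.full_url)

# To actually send the purge:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Keeping the request construction separate from the network call makes it easy to log or review exactly which URLs are about to be purged.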
Long-term Solutions
1. Implement Versioned URLs
The most robust approach is to avoid same-name file updates:
// Instead of overwriting:
https://example.com/app.js
// Use versioned URLs:
https://example.com/app.js?v=20250924
// or
https://example.com/app.20250924.js
2. Configure Appropriate Cache Rules
Set up granular caching policies in Cloudflare:
HTML files: TTL = 60 seconds (frequent updates)
JavaScript/CSS: TTL = 24 hours + versioning
Images: TTL = 7 days + versioning
API responses: TTL = 30 seconds or no-cache
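One way to keep a policy like this consistent is to express it as a lookup table that deploy scripts can share. The sketch below is an assumption-laden illustration: the suffix list, the `cache_control_for` helper, and the "no-cache" fallback for anything unrecognized (such as API paths) are choices made for the example, not settings taken from the setup described here.

```python
from pathlib import PurePosixPath

# TTLs in seconds, mirroring the table above.
TTL_BY_SUFFIX = {
    ".html": 60,       # frequent updates
    ".js": 86400,      # 24 hours, paired with versioned names
    ".css": 86400,
    ".png": 604800,    # 7 days, paired with versioned names
    ".jpg": 604800,
    ".svg": 604800,
}


def cache_control_for(path: str) -> str:
    """Return a Cache-Control value for a path based on its file suffix."""
    suffix = PurePosixPath(path).suffix.lower()
    ttl = TTL_BY_SUFFIX.get(suffix)
    if ttl is None:
        return "no-cache"  # assumed fallback for API routes and unknown types
    return f"public, max-age={ttl}"


for example in ["index.html", "app.20250924.js", "logo.PNG", "/api/users"]:
    print(f"{example}: {cache_control_for(example)}")
```

The returned string could be passed to an upload step as the Cache-Control metadata, or used as a reference when writing the equivalent Cloudflare rules.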
3. Optimize S3 Cache Headers
Configure S3 metadata for appropriate caching:
# For frequently updated files
aws s3 cp file.html s3://bucket/ --cache-control "max-age=60"
# For static assets with versioning
aws s3 cp app.js s3://bucket/ --cache-control "max-age=86400"
Understanding CDN Architecture
To answer the broader questions about CDN behavior:
Edge Node Caching
- Cloudflare operates global edge nodes - these are CDN cache servers
- When you route traffic through Cloudflare (the orange cloud icon on a proxied DNS record), you're using their CDN
- Each geographic region may have different cache states
Cache Hierarchy
Client → ISP Cache → Cloudflare Edge → Origin (S3/CloudFront)
Traditional CDN Behavior
- AWS CloudFront: Another CDN layer that can be added
- ISP Caches: Network providers often have their own caching infrastructure
- Multiple CDN layers: It's common to have CloudFront → Cloudflare → Client setups
Cache Control Inheritance
- S3 Cache-Control headers influence downstream caches
- Cloudflare can override these settings with Page Rules or Cache Rules
- Browser caches also respect these headers
Lessons Learned
Prevention Tips
- Always use versioned static assets for files that change frequently
- Implement proper cache strategies based on content type and update frequency
- Monitor cache hit rates to balance performance vs freshness
- Set up cache invalidation workflows for emergency updates
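The first tip, versioned static assets, is easy to automate. The sketch below derives the version from a hash of the file content instead of a date, which is a common variant rather than the exact scheme shown earlier. `versioned_name` is a hypothetical helper, and the 8-character digest length is an arbitrary choice.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the file extension, e.g. app.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    # with_name keeps any directory part of the original path intact
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


old = versioned_name("app.js", b"console.log('v1');")
new = versioned_name("app.js", b"console.log('v2');")
print(old)
print(new)
```

Because the name changes whenever the content changes, every release gets a URL that no cache has seen before, so a long TTL on the asset is safe. The HTML that references the asset still needs a short TTL so clients pick up the new name.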
Debugging Process
When facing similar issues:
- Check direct origin access (bypass CDN) to confirm file updates
- Analyze HTTP headers to understand cache behavior
- Test from multiple locations to identify geographic inconsistencies
- Document your CDN architecture - teams often forget about caching layers
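The first two steps can be combined into a quick staleness check: fetch headers from the origin directly and through the CDN, then compare their ETag values. This is a sketch under assumptions: both responses are expected to include an ETag header, the dictionaries use lowercase keys, and the function names and ETag values are made up for illustration.

```python
def normalize_etag(etag: str) -> str:
    """Strip the weak-validator prefix and surrounding quotes from an ETag value."""
    etag = etag.strip()
    if etag.startswith("W/"):
        etag = etag[2:]
    return etag.strip('"')


def edge_is_stale(origin_headers: dict[str, str], edge_headers: dict[str, str]) -> bool:
    """True if the CDN response carries a different ETag than the origin response."""
    origin_etag = origin_headers.get("etag")
    edge_etag = edge_headers.get("etag")
    if origin_etag is None or edge_etag is None:
        raise ValueError("both responses need an ETag header to compare")
    return normalize_etag(origin_etag) != normalize_etag(edge_etag)


# Illustrative values only: the origin has the new object, the edge has the old one.
origin = {"etag": '"9b2cf535f27731c974343645a3985328"'}
edge = {"etag": 'W/"0f343b0931126a20f133d67c2b018a3b"', "cf-cache-status": "HIT"}

print("edge is stale:", edge_is_stale(origin, edge))
```

Running this comparison from several locations covers the third step as well, since each location may reach a different edge node.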
Performance Considerations
Reducing TTL from 4 hours to 1 minute would solve the update issue but comes with trade-offs:
- Lower cache hit rates (for example, 90% → 20-30%)
- Increased origin load and costs
- Higher latency for end users
- More bandwidth usage
The optimal approach is content-type-specific caching rather than blanket short TTLs.
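The origin-load side of this trade-off can be estimated with simple arithmetic. The sketch below is a rough model, not a measurement: it assumes the object is requested steadily, that each edge node refetches once per expiry, and that nothing is evicted early.

```python
SECONDS_PER_DAY = 86400


def origin_fetches_per_day(ttl_seconds: int) -> int:
    """Refetches per edge node per day for one object, at one fetch per expiry."""
    return SECONDS_PER_DAY // ttl_seconds


for label, ttl in [("4 hours", 14400), ("1 minute", 60)]:
    fetches = origin_fetches_per_day(ttl)
    print(f"TTL {label}: about {fetches} origin fetches per edge node per day")
```

Under these assumptions, dropping the TTL from 4 hours to 1 minute raises the per-object refetch count from 6 to 1440 per edge node per day, which is why short TTLs are best reserved for content that actually changes often.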
Architecture Documentation
Always maintain clear documentation of your caching architecture:
DNS → Cloudflare → [CloudFront] → S3/Origin
Understanding each layer's caching behavior prevents confusion when troubleshooting update propagation issues.
By implementing versioned URLs and appropriate cache policies, you can achieve both fast updates and good cache performance, largely avoiding the trade-off between cache efficiency and content freshness.
