My AWS Bill Was Out of Control — Here’s What I Did

Cerebrix

Wednesday, December 4, 2024

Franck Kengne

I thought I was being a smart cloud-native engineer. We had Terraform-managed infrastructure, CI/CD pipelines humming, and the team was happy deploying to ECS, Lambda, S3, you name it. It all looked great — until the bill showed up.

Last month, our AWS charges were up 68%. It felt like a punch in the gut.

We’re not talking about a few hundred dollars. This was tens of thousands of dollars that had no clear owner, no explanation, and no real accountability.

Step 1: Diagnosing the Bleed

I went straight to the Cost Explorer. It’s ugly, but it tells the truth. Here’s what I found:

✅ Orphaned RDS Instances
We had dev databases no one had used in half a year. Each was costing hundreds a month.

✅ EBS Zombie Volumes
Detached but un-deleted volumes racking up storage charges.

✅ Lambda Sprawl
A misconfigured batch job in a new feature scaled up on every SQS message and spiked Lambda concurrency 10x.

✅ S3 Without Rules
Buckets holding user-generated content had no lifecycle rules. 30 terabytes of images were sitting in Standard storage.

✅ Cross-AZ Data Transfer
We had ECS services talking to each other across availability zones, quietly draining money at $0.01/GB in each direction on millions of calls per day.

I remember thinking, how the hell did this happen? Answer: death by a thousand papercuts.
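If you want to do the same diagnosis in code rather than by clicking around, here is a minimal sketch of comparing two months of spend by service. It is not the exact query I ran. It assumes the response shape of the Cost Explorer `get_cost_and_usage` API grouped by `SERVICE`, and the helper names (`cost_by_service`, `biggest_increases`, `fetch_monthly_results`) are made up for illustration. Pagination is ignored.

```python
from typing import Dict, List, Tuple


def cost_by_service(result: dict, metric: str = "UnblendedCost") -> Dict[str, float]:
    """Flatten one ResultsByTime entry into {service name: cost}."""
    costs: Dict[str, float] = {}
    for group in result.get("Groups", []):
        service = group["Keys"][0]
        costs[service] = float(group["Metrics"][metric]["Amount"])
    return costs


def biggest_increases(previous: dict, current: dict, top: int = 5) -> List[Tuple[str, float]]:
    """Return (service, increase) pairs, largest increase first."""
    before = cost_by_service(previous)
    after = cost_by_service(current)
    services = set(before) | set(after)
    deltas = {svc: after.get(svc, 0.0) - before.get(svc, 0.0) for svc in services}
    ranked = sorted(deltas.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


def fetch_monthly_results(start: str, end: str) -> list:
    """Fetch monthly cost per service. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without it

    ce = boto3.client("ce")
    response = ce.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},  # dates as YYYY-MM-DD
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )
    return response["ResultsByTime"]
```

With two consecutive entries from `fetch_monthly_results`, `biggest_increases(older, newer)` gives a short list of where to start digging.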

Step 2: Emergency Fixes

I knew we needed to get costs down fast, so here’s what we changed in week one:

✅ Enforced Cost Allocation Tags
Every single resource, even dev boxes, had to carry team, project, and environment tags. We made it a Terraform policy with OPA checks — no deploy if you didn’t tag.
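OPA policies are written in Rego, so the following is not our policy. It is a Python stand-in that shows the same rule applied to the JSON form of a Terraform plan (the output of `terraform show -json`). It assumes tags live in each resource's `tags` attribute and skips resources that have no such attribute. The function name `missing_tags` is hypothetical, and tags injected through provider-level defaults are not considered.

```python
from typing import Dict, List

# The three tags named in the text above.
REQUIRED_TAGS = ("team", "project", "environment")


def missing_tags(plan: dict) -> Dict[str, List[str]]:
    """Map resource address -> required tags it lacks.

    Only resources that are being created or updated and that expose a
    `tags` attribute are checked.
    """
    problems: Dict[str, List[str]] = {}
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        after = change.get("after") or {}
        actions = set(change.get("actions", []))
        if "tags" not in after or not (actions & {"create", "update"}):
            continue
        tags = after.get("tags") or {}
        missing = [name for name in REQUIRED_TAGS if not tags.get(name)]
        if missing:
            problems[rc["address"]] = missing
    return problems
```

In a pipeline, a non-empty result would fail the job before `terraform apply` ever runs.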

✅ Immediate Budgets & Alerts
Slack alerts on any service that crossed $300/month. It felt extreme, but the first day it caught another unexpected DynamoDB burst.
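A rough sketch of the alert logic, assuming you already have month-to-date cost per service as a plain dictionary and a Slack incoming webhook URL. The function names are illustrative, and this is only one of several ways to wire it up (AWS Budgets can also send notifications on its own).

```python
import json
import urllib.request
from typing import Dict, List

THRESHOLD_USD = 300.0  # per-service monthly threshold from the text


def services_over_threshold(costs: Dict[str, float], threshold: float = THRESHOLD_USD) -> List[str]:
    """Return one alert line per service above the threshold, most expensive first."""
    lines = []
    for service, amount in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        if amount > threshold:
            lines.append(f"{service}: ${amount:,.2f} month-to-date (threshold ${threshold:,.0f})")
    return lines


def post_to_slack(webhook_url: str, lines: List[str]) -> None:
    """Send the alert lines to a Slack incoming webhook as a single message."""
    payload = json.dumps({"text": "AWS cost alert\n" + "\n".join(lines)}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        response.read()
```

Only call `post_to_slack` when `services_over_threshold` returns something, otherwise the channel fills up with empty alerts.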

✅ RDS Cleanup Script
A Lambda runs every Sunday to look for idle RDS instances. If an instance has had zero connections in the past 14 days, the Lambda snapshots it and then deletes it.
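The core of such a cleanup job could look like the sketch below. It is an illustration under assumptions, not our production script: idleness is judged from the CloudWatch `DatabaseConnections` metric, an instance with no datapoints at all is treated as "unknown" rather than idle, and the snapshot naming scheme is invented. The `cloudwatch` and `rds` arguments are expected to be boto3 clients.

```python
from datetime import datetime, timedelta

IDLE_DAYS = 14  # idle window from the text


def is_idle(datapoints: list) -> bool:
    """True only if CloudWatch returned data and every daily maximum is zero."""
    return bool(datapoints) and all(dp["Maximum"] == 0 for dp in datapoints)


def connection_datapoints(cloudwatch, instance_id: str, now: datetime) -> list:
    """Fetch the daily maximum connection count for the idle window."""
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="DatabaseConnections",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        StartTime=now - timedelta(days=IDLE_DAYS),
        EndTime=now,
        Period=86400,  # one datapoint per day
        Statistics=["Maximum"],
    )
    return response["Datapoints"]


def delete_with_final_snapshot(rds, instance_id: str, now: datetime) -> str:
    """Delete the instance and let RDS take a final snapshot first."""
    snapshot_id = f"{instance_id}-final-{now:%Y%m%d}"  # hypothetical naming scheme
    rds.delete_db_instance(
        DBInstanceIdentifier=instance_id,
        SkipFinalSnapshot=False,
        FinalDBSnapshotIdentifier=snapshot_id,
    )
    return snapshot_id
```

Asking RDS for a final snapshot as part of the delete call means the instance cannot disappear without a snapshot having been requested, which is simpler than a separate snapshot step followed by a delete.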

✅ S3 Lifecycle Policies
Finally bit the bullet:

Standard-IA after 30 days
Glacier after 90
Delete after 365

It was the easiest 20% cost reduction I’ve ever seen.
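For reference, the three rules above map onto a single S3 lifecycle rule. This is a sketch using boto3's `put_bucket_lifecycle_configuration`. The rule ID and the empty prefix (meaning "every object in the bucket") are assumptions, and `GLACIER` is the storage class name I am assuming for "Glacier" here.

```python
def lifecycle_configuration(prefix: str = "") -> dict:
    """Build a lifecycle configuration: IA at 30 days, Glacier at 90, delete at 365."""
    return {
        "Rules": [
            {
                "ID": "tier-then-expire",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # empty prefix matches all objects
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    }


def apply_lifecycle(s3, bucket: str, prefix: str = "") -> None:
    """Apply the configuration to a bucket; `s3` is a boto3 S3 client."""
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=lifecycle_configuration(prefix),
    )
```

Be aware that this call replaces whatever lifecycle configuration the bucket already has, so merge existing rules first if there are any.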

✅ Data Transfer Review
We forced ECS tasks to talk to peers in their own AZ instead of sending traffic across AZs. That alone dropped data transfer charges by 40%.
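To get a feel for how quickly chatty services add up, here is some back-of-the-envelope arithmetic. The $0.01/GB rate is the one quoted earlier. The assumption that it is billed in each direction (so effectively twice per GB moved), the 30-day month, and the traffic numbers in the usage note are all mine, for illustration only.

```python
def gb_per_day(requests_per_day: int, payload_bytes: int) -> float:
    """Daily cross-AZ volume in GB, using 1 GB = 1e9 bytes for simplicity."""
    return requests_per_day * payload_bytes / 1e9


def cross_az_monthly_cost(
    daily_gb: float,
    rate_per_gb: float = 0.01,  # rate quoted in the text
    directions: int = 2,        # assumption: charged on both sides of the transfer
    days: int = 30,
) -> float:
    """Estimated monthly cross-AZ transfer cost in USD."""
    return daily_gb * days * rate_per_gb * directions
```

With a made-up workload of 5 million calls a day at 200,000 bytes each, `gb_per_day(5_000_000, 200_000)` is 1,000 GB, and `cross_az_monthly_cost(1000.0)` comes out at about $600 a month for a single pair of services.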

✅ Lambda Concurrency Controls
We migrated the batch process with unpredictable spikes to ECS Fargate with a scheduled scaling policy. That gave us predictable container pricing instead of Lambda charges that swing with every concurrency spike.
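Moving to Fargate is what we did. A lighter guard for functions that stay on Lambda is a reserved concurrency cap, which is a different technique from the migration described above. The sketch below uses boto3's `put_function_concurrency`; the wrapper name `cap_concurrency` and the function name and limit in the usage note are placeholders.

```python
def cap_concurrency(lambda_client, function_name: str, limit: int) -> int:
    """Set reserved concurrency on a function and return the value Lambda reports back.

    `lambda_client` is expected to be a boto3 Lambda client.
    """
    if limit < 0:
        raise ValueError("limit must be zero or greater")
    response = lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )
    return response["ReservedConcurrentExecutions"]
```

Usage would look like `cap_concurrency(boto3.client("lambda"), "batch-worker", 20)`. Invocations beyond the cap are throttled instead of scaling out, so pick a limit the queue behind the function can tolerate.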

Step 3: Building a Better Culture

Once the bleeding stopped, I realized the root problem was cultural:

Nobody felt responsible for the bill.
“It’s on the cloud, so who cares?”
“Storage is cheap, right?”

Wrong.

So I pulled the team together and set new norms:

✅ Cloud cost reviews in sprint retro: we do a five-minute look at our top services every two weeks.
✅ Shared dashboards: I built a Grafana board pulling from AWS Cost Explorer so everyone could see what was trending day to day. (Cost Explorer data is refreshed roughly daily, so this is not truly real time.)
✅ FinOps mindset: if you build it, you own its budget impact.

I even started tracking unit economics for features: “This endpoint costs X per million requests,” something product managers actually understood. That way it wasn’t just an engineering story — it became a product conversation.
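The arithmetic behind that number is simple. A minimal sketch, assuming you can attribute a monthly cost to a feature (for example through its tags) and know its monthly request count; the function name and the figures in the usage note are made up.

```python
def cost_per_million_requests(monthly_cost_usd: float, monthly_requests: int) -> float:
    """Cost of serving one million requests, in USD."""
    if monthly_requests <= 0:
        raise ValueError("monthly_requests must be positive")
    return monthly_cost_usd * 1_000_000 / monthly_requests
```

An endpoint that costs $1,200 a month and serves 48 million requests works out to `cost_per_million_requests(1200.0, 48_000_000)`, which is $25 per million requests.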

Lessons I’ll Never Forget

1️⃣ Cloud is not magic.
AWS will happily let you pay forever for zombie resources. Automate your kill switches.

2️⃣ Tags are oxygen.
No tags = no accountability.

3️⃣ Observability matters for cost, too.
Just like you monitor latency, you should monitor costs with the same rigor.

4️⃣ Serverless is awesome — until it explodes.
Uncapped concurrency can blow up your bill overnight.

5️⃣ FinOps is everyone’s job.
Dev teams need to think about budgets just as much as product features.

What’s Next for Us

I’m not stopping here. Now that we’ve stabilized, we’re moving on to:

✅ Adopting Graviton instances to lower EC2/RDS pricing
✅ Spot Instances for non-critical workloads
✅ Savings Plans for predictable compute
✅ OpenCost to track Kubernetes clusters with more granularity
✅ Workshops for engineers on designing with cost in mind

I want to make this sustainable, so nobody ends up panicking at the end of next quarter.

Final Takeaway

Cloud is incredible. It lets us move fast, ship faster, and experiment like never before. But it will run away with your wallet if you aren’t actively managing it.

My biggest lesson? Treat your AWS bill like a product. Review it, test it, measure it, refactor it — and give it an owner.

Because if you don’t own your cloud costs, your cloud costs will absolutely own you.

July 24, 2025