Cloud Cost Analysis

Analyze and optimize cloud infrastructure costs — identifying waste, right-sizing resources, evaluating reserved vs on-demand pricing, and producing savings roadmaps with ROI projections.

# Cloud Cost Analysis

## Before you start

Gather the following from the user:

1. **Which cloud provider(s)?** (AWS, GCP, Azure, or multi-cloud)
2. **Current monthly spend** (total and by service if available)
3. **Cost breakdown access** (billing console exports, Cost Explorer data, or CSV dumps)
4. **Growth trajectory** (expected traffic or workload changes over 6-12 months)
5. **Commitment constraints** (existing reserved instances, savings plans, or enterprise agreements)

If the user says "our cloud bill is too high," push back: "What's your current monthly spend, which services make up the top 80%, and do you have any existing reservations or savings plans?"

## Cost analysis template

### 1. Spend Summary

Break down current spend into the top cost categories. Cover at least 80% of total spend.

```
| Service          | Monthly Cost | % of Total | Trend (3mo) |
|------------------|--------------|------------|-------------|
| EC2 / Compute    | $42,300      | 38%        | +12%        |
| RDS / Databases  | $28,100      | 25%        | +5%         |
| S3 / Storage     | $15,200      | 14%        | +8%         |
| Data Transfer    | $9,800       | 9%         | +22%        |
| Other            | $15,600      | 14%        | flat        |
| **Total**        | **$111,000** | **100%**   | **+9%**     |
```

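A minimal Python sketch of the share and coverage arithmetic behind a spend summary. The function names and the `"Other"` catch-all convention are assumptions made for illustration; the dollar figures are the sample values from the template table.

```python
def spend_shares(costs):
    """Return (service, cost, share_of_total) rows, largest cost first."""
    total = sum(costs.values())
    ranked = sorted(costs.items(), key=lambda item: item[1], reverse=True)
    return [(service, cost, cost / total) for service, cost in ranked]


def named_coverage(costs, catch_all="Other"):
    """Fraction of total spend attributed to a named category."""
    total = sum(costs.values())
    named = sum(cost for service, cost in costs.items() if service != catch_all)
    return named / total


# Sample monthly costs from the template table (USD).
costs = {
    "EC2 / Compute": 42_300,
    "RDS / Databases": 28_100,
    "S3 / Storage": 15_200,
    "Data Transfer": 9_800,
    "Other": 15_600,
}

for service, cost, share in spend_shares(costs):
    print(f"{service:<16} ${cost:>8,} {share:>5.0%}")

coverage = named_coverage(costs)
print(f"Named categories cover {coverage:.0%} of spend")
if coverage < 0.80:
    # The template asks for at least 80% of spend to be itemized.
    print("Break 'Other' down further before continuing")
```

If the coverage check fails, split the catch-all bucket into its largest services before moving on to the waste audit.
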
### 2. Waste Identification

Audit each category for idle, oversized, or orphaned resources. Use this checklist:

- **Idle resources**: Instances, load balancers, or databases with <5% average utilization over 14 days
- **Orphaned storage**: Unattached EBS volumes, old snapshots, unused S3 buckets
- **Oversized instances**: CPU/memory utilization consistently below 30%; these are candidates for right-sizing
- **Zombie environments**: Dev/staging environments running 24/7 that could use scheduling
- **Unused reservations**: Reserved capacity for instance types no longer in use

For each finding, document the resource, its current cost, and the estimated savings.

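The idle and oversized thresholds in the checklist can be applied mechanically once utilization metrics are exported. The sketch below is one possible reading, not a definitive rule: it assumes "utilization" means the higher of average CPU and average memory, and the resource names, costs, and the `Usage` record are hypothetical.

```python
from dataclasses import dataclass

IDLE_PCT = 5.0        # checklist: <5% average utilization
OVERSIZED_PCT = 30.0  # checklist: CPU/memory consistently below 30%
MIN_DAYS = 14         # observation window named in the checklist


@dataclass
class Usage:
    name: str
    avg_cpu_pct: float
    avg_mem_pct: float
    days_observed: int
    monthly_cost: float


def classify(usage: Usage) -> str:
    """Label a resource as idle, oversized, ok, or lacking enough data."""
    if usage.days_observed < MIN_DAYS:
        return "needs-more-data"
    # Assumption: judge by the busier of the two dimensions.
    busiest = max(usage.avg_cpu_pct, usage.avg_mem_pct)
    if busiest < IDLE_PCT:
        return "idle"
    if busiest < OVERSIZED_PCT:
        return "oversized"
    return "ok"


# Hypothetical fleet, for illustration only.
fleet = [
    Usage("old-staging-api", 2.0, 4.0, 30, 140.0),
    Usage("report-worker", 8.0, 15.0, 14, 496.0),
    Usage("new-service", 3.0, 6.0, 5, 70.0),
    Usage("checkout-api", 55.0, 62.0, 30, 280.0),
]

findings = {usage.name: classify(usage) for usage in fleet}
for usage in fleet:
    print(f"{usage.name:<16} {findings[usage.name]:<16} ${usage.monthly_cost:,.0f}/mo")
```

Averages hide peaks, so treat an "oversized" label as a prompt to inspect the resource rather than a final verdict.
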
### 3. Right-Sizing Recommendations

For every oversized resource, propose a specific target:

```
| Resource     | Current Type | Avg CPU | Avg Memory | Recommended | Monthly Savings |
|--------------|--------------|---------|------------|-------------|-----------------|
| api-prod-1   | m5.2xlarge   | 12%     | 28%        | m5.xlarge   | $120            |
| worker-batch | c5.4xlarge   | 8%      | 15%        | c5.xlarge   | $310            |
| analytics-db | r5.4xlarge   | 22%     | 45%        | r5.2xlarge  | $520            |
```

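Before committing to a target type, it helps to project the observed load onto the smaller instance. The sketch below does that for the `worker-batch` row; the vCPU and memory figures are the published sizes for these C5 types, while the 70% ceiling and the function names are assumptions chosen for illustration.

```python
# Instance sizes as (vCPU, memory in GiB).
INSTANCE_SPECS = {
    "c5.4xlarge": (16, 32),
    "c5.2xlarge": (8, 16),
    "c5.xlarge": (4, 8),
}


def projected_utilization(current, target, avg_cpu_pct, avg_mem_pct):
    """Scale observed average utilization from the current type to the target."""
    cur_vcpu, cur_mem = INSTANCE_SPECS[current]
    tgt_vcpu, tgt_mem = INSTANCE_SPECS[target]
    cpu = avg_cpu_pct * cur_vcpu / tgt_vcpu
    mem = avg_mem_pct * cur_mem / tgt_mem
    return cpu, mem


def fits(current, target, avg_cpu_pct, avg_mem_pct, ceiling_pct=70.0):
    """True if both projected figures stay under the (assumed) ceiling."""
    cpu, mem = projected_utilization(current, target, avg_cpu_pct, avg_mem_pct)
    return cpu <= ceiling_pct and mem <= ceiling_pct


# worker-batch: c5.4xlarge at 8% CPU and 15% memory, proposed c5.xlarge.
cpu, mem = projected_utilization("c5.4xlarge", "c5.xlarge", 8, 15)
print(f"projected CPU {cpu:.0f}%, memory {mem:.0f}%")
print("fits:", fits("c5.4xlarge", "c5.xlarge", 8, 15))
```

A target that projects above 100% on either dimension cannot hold the current workload at all, so step up one size and recompute the savings.
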
### 4. Pricing Model Optimization

Evaluate the mix of on-demand, reserved, savings plans, and spot:

- **Stable baseline workloads**: Recommend 1-year or 3-year reservations. Calculate the break-even point (typically 7-9 months for a 1-year RI).
- **Variable workloads**: Recommend savings plans with a commitment level matching the floor of historical usage.
- **Fault-tolerant batch jobs**: Recommend spot instances with interruption handling. Document the spot vs on-demand discount (typically 60-80%).
- **Dev/test environments**: Recommend scheduling (stop nights/weekends) or spot-based environments.

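The break-even and commitment-floor calculations above can be sketched as follows. This assumes a reservation with an upfront payment (for an all-upfront 1-year RI, break-even is simply 12 × (1 − discount) months, so discounts of 25-40% land in the 7-9 month range). The 35% discount, the dollar amounts, and the function names are hypothetical.

```python
def ri_break_even_months(on_demand_monthly, upfront, ri_monthly=0.0):
    """Months of continuous use after which the reservation beats on-demand."""
    saving_per_month = on_demand_monthly - ri_monthly
    if saving_per_month <= 0:
        raise ValueError("reservation never pays back at these rates")
    return upfront / saving_per_month


def savings_plan_floor(hourly_spend):
    """Commitment level equal to the lowest observed hourly spend."""
    return min(hourly_spend)


# Hypothetical workload: $1,000/month on-demand, 35% off if paid upfront for a year.
on_demand_monthly = 1_000
upfront = 7_800  # 12 months x $1,000 x (1 - 0.35)
months = ri_break_even_months(on_demand_monthly, upfront)
print(f"Break-even after {months:.1f} months")

# Hypothetical hourly compute spend samples (USD/hour).
hourly_spend = [4.2, 3.8, 5.1, 3.5, 6.0]
print(f"Savings plan commitment floor: ${savings_plan_floor(hourly_spend):.2f}/hour")
```

If the workload might be retired or migrated before the break-even month, the reservation costs more than staying on-demand.
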
### 5. Architecture-Level Optimizations

Identify structural changes that reduce cost:

- **Data transfer**: Move cross-AZ traffic to same-AZ where possible. Use VPC endpoints instead of NAT gateways for AWS service calls.
- **Storage tiering**: Move infrequently accessed data to cheaper tiers (S3 Infrequent Access, Glacier, or equivalent).
- **Compute model**: Evaluate containers (ECS/EKS) vs VMs for better bin-packing and utilization.
- **Caching**: Add caching layers to reduce database and API call volume.
- **Serverless migration**: Identify low-traffic services where serverless would eliminate idle compute costs.

### 6. Savings Roadmap

Prioritize recommendations by effort and impact. Use this format:

```
| Priority | Action                           | Monthly Savings | Effort    | Timeline  |
|----------|----------------------------------|-----------------|-----------|-----------|
| P0       | Delete orphaned EBS volumes      | $1,200          | 1 day     | This week |
| P0       | Schedule dev environments        | $3,800          | 2 days    | This week |
| P1       | Right-size top 10 instances      | $4,500          | 1 week    | 2 weeks   |
| P1       | Purchase 1-year RIs for baseline | $8,200          | 1 day     | 30 days   |
| P2       | Migrate logs to S3 IA tier       | $2,100          | 1 sprint  | 60 days   |
| P2       | Move batch jobs to spot          | $5,600          | 2 sprints | 90 days   |
```

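Rolling the roadmap up by priority gives the subtotals that the ROI section needs. A short sketch using the sample rows from the table above; the helper names are assumptions for illustration.

```python
from collections import defaultdict

# (priority, action, monthly savings in USD) from the sample roadmap.
roadmap = [
    ("P0", "Delete orphaned EBS volumes", 1_200),
    ("P0", "Schedule dev environments", 3_800),
    ("P1", "Right-size top 10 instances", 4_500),
    ("P1", "Purchase 1-year RIs for baseline", 8_200),
    ("P2", "Migrate logs to S3 IA tier", 2_100),
    ("P2", "Move batch jobs to spot", 5_600),
]


def savings_by_priority(items):
    """Sum monthly savings per priority level, ordered P0, P1, P2, ..."""
    totals = defaultdict(int)
    for priority, _action, monthly in items:
        totals[priority] += monthly
    return dict(sorted(totals.items()))


def cumulative_savings(totals):
    """Running monthly total as each priority level is completed."""
    running, result = 0, {}
    for priority, amount in totals.items():
        running += amount
        result[priority] = running
    return result


totals = savings_by_priority(roadmap)
cumulative = cumulative_savings(totals)
for priority in totals:
    print(f"{priority}: ${totals[priority]:,}/mo (cumulative ${cumulative[priority]:,}/mo)")

annual_run_rate = 12 * sum(totals.values())
print(f"Annual run rate once everything ships: ${annual_run_rate:,}")
```
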
### 7. ROI Projection

Summarize the total opportunity:

- **Quick wins (0-2 weeks)**: Total monthly savings from P0 items
- **Medium-term (1-3 months)**: Cumulative savings including P1 items
- **Full realization (3-6 months)**: Total annual savings with all recommendations implemented
- **Implementation cost**: Engineering hours required, expressed in estimated cost

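One way to turn these four bullets into numbers is a payback calculation. The sketch below uses the sample roadmap's monthly savings; the engineering hours and hourly rate are hypothetical, and it assumes the full run rate applies for all twelve months, which overstates first-year savings because the roadmap phases in over time.

```python
def implementation_cost(hours, hourly_rate):
    """Engineering effort expressed in dollars."""
    return hours * hourly_rate


def payback_months(cost, monthly_savings):
    """Months of savings needed to recover the implementation cost."""
    return cost / monthly_savings


def first_year_net(cost, monthly_savings):
    """Net benefit over 12 months at a constant run rate."""
    return 12 * monthly_savings - cost


# Monthly savings per phase, taken from the sample roadmap (USD).
quick_wins = 5_000         # P0 items
medium_term = 17_700       # P0 + P1 items
full_realization = 25_400  # P0 + P1 + P2 items

# Hypothetical effort: 300 engineering hours at $120/hour.
cost = implementation_cost(300, 120)

print(f"Implementation cost: ${cost:,}")
print(f"Payback at full run rate: {payback_months(cost, full_realization):.1f} months")
print(f"First-year net benefit: ${first_year_net(cost, full_realization):,}")
```

Reporting payback alongside the annual figure keeps the projection honest when engineering time is scarce.
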
## Quality checklist

Before delivering the analysis, verify:

- [ ] Top 80% of spend is broken down by service with 3-month trends
- [ ] Every waste finding references a specific resource or resource group
- [ ] Right-sizing recommendations include current utilization data
- [ ] Pricing model recommendations include break-even calculations
- [ ] Savings roadmap has priorities, effort estimates, and timelines
- [ ] ROI projection includes implementation cost, not just savings
- [ ] Recommendations account for existing commitments and growth trajectory

## Common mistakes to avoid

- **Optimizing without utilization data.** Right-sizing based on instance type alone is guessing. Always require at least 14 days of CPU/memory metrics before recommending a downsize.
- **Ignoring data transfer costs.** These are often the fastest-growing line item and the hardest to spot. Always check cross-AZ, cross-region, and internet egress charges.
- **Recommending 3-year reservations without growth context.** A 3-year RI saves more per month but locks you in. If the workload might migrate to containers or serverless, prefer 1-year or convertible RIs.
- **Listing savings without effort estimates.** "$50K/year savings" means nothing if it requires 6 months of engineering work. Always pair savings with implementation cost.
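- **Forgetting about non-production environments.** Dev, staging, and QA environments often run 24/7 but are only used during business hours. Scheduling alone can cut their compute cost by about 65%, since weekday business hours make up only a little over a third of the week. Attached storage keeps billing while instances are stopped.
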
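The scheduling figure follows from simple hour counting. A minimal sketch, assuming compute is billed only while the environment runs; the function name and the example schedules are illustrative.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def scheduling_savings(hours_per_day, days_per_week):
    """Fraction of always-on compute cost avoided by running on a schedule."""
    running_hours = hours_per_day * days_per_week
    if not 0 <= running_hours <= HOURS_PER_WEEK:
        raise ValueError("schedule must fit within one week")
    return 1 - running_hours / HOURS_PER_WEEK


# 12 hours a day on weekdays leaves the environment off for 108 of 168 hours.
print(f"12h x 5 days: {scheduling_savings(12, 5):.0%} saved")
print(f"10h x 5 days: {scheduling_savings(10, 5):.0%} saved")
```

Storage, reserved capacity, and other charges that accrue while instances are stopped are outside this estimate.
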

