Cloud Cost Optimization Strategies for AWS, Azure, and GCP
Cloud computing offers incredible flexibility and scalability, but without proper cost management, cloud bills can quickly spiral out of control. In this comprehensive guide, we will explore practical cost optimization strategies across the three major cloud providers: Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). We will cover cost monitoring, resource rightsizing, reserved instances, spot instances, auto-scaling, storage optimization, network costs, tagging strategies, and FinOps practices.
Understanding Cloud Cost Fundamentals
Cloud costs typically fall into several categories:
- Compute: Virtual machines, containers, serverless functions
- Storage: Object storage, block storage, databases
- Network: Data transfer, load balancers, CDN
- Managed Services: Databases, analytics, AI/ML services
- Support: Technical support plans
The pay-as-you-go model provides flexibility but requires active management to avoid waste.
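Putting rough numbers on those categories early helps set expectations. A minimal back-of-the-envelope sketch in Python, where every unit price is an illustrative placeholder rather than a current list price:
# Rough monthly estimate built from the cost categories above.
# All unit prices are illustrative placeholders, not current list prices.
HOURS_PER_MONTH = 730

estimate = {
    "compute": 10 * 0.10 * HOURS_PER_MONTH,   # 10 VMs at an assumed $0.10/hour
    "storage": 5_000 * 0.023,                 # 5 TB object storage at $0.023/GB-month
    "network": 2_000 * 0.09,                  # 2 TB egress at $0.09/GB
    "managed_services": 1_500.00,             # managed database, flat estimate
    "support": 100.00,                        # support plan
}

for category, cost in estimate.items():
    print(f"{category:>18}: ${cost:,.2f}")
print(f"{'total':>18}: ${sum(estimate.values()):,.2f}")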
Cost Monitoring and Visibility
The first step in cost optimization is understanding where money is being spent.
AWS Cost Monitoring
# Install AWS CLI
pip install awscli
# Configure credentials
aws configure
# Get cost and usage data
aws ce get-cost-and-usage \
--time-period Start=2024-01-01,End=2024-01-31 \
--granularity MONTHLY \
--metrics "UnblendedCost" \
--group-by Type=DIMENSION,Key=SERVICE
# Create cost budget
aws budgets create-budget \
--account-id 123456789012 \
--budget file://budget.json \
--notifications-with-subscribers file://notifications.json
Budget configuration file:
{
"BudgetName": "Monthly-Budget-2024",
"BudgetLimit": {
"Amount": "10000",
"Unit": "USD"
},
"TimeUnit": "MONTHLY",
"BudgetType": "COST",
"CostFilters": {
"TagKeyValue": [
"user:Environment$Production"
]
}
}
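The create-budget command above also references a notifications.json file that is not shown. A minimal sketch of the same budget created with boto3, including an alert at 80 percent of the limit; the account ID and email address are placeholders:
import boto3

budgets = boto3.client("budgets")

budgets.create_budget(
    AccountId="123456789012",  # placeholder account ID
    Budget={
        "BudgetName": "Monthly-Budget-2024",
        "BudgetLimit": {"Amount": "10000", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80,               # percent of the budget limit
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "team@example.com"}
            ],
        }
    ],
)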
Azure Cost Monitoring
# Install Azure CLI
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
# Login
az login
# Show cost analysis
az consumption usage list \
--start-date 2024-01-01 \
--end-date 2024-01-31 \
--query "[].{Name:instanceName,Cost:pretaxCost}" \
--output table
# Create budget
az consumption budget create \
--budget-name monthly-budget \
--amount 10000 \
--category Cost \
--time-grain Monthly \
--start-date 2024-01-01 \
--end-date 2024-12-31
Azure Cost Management query using Python:
from azure.identity import DefaultAzureCredential
from azure.mgmt.costmanagement import CostManagementClient
from azure.mgmt.costmanagement.models import QueryDefinition, QueryDataset, QueryTimePeriod
subscription_id = "<your-subscription-id>"
credential = DefaultAzureCredential()
client = CostManagementClient(credential)
# Define query
query = QueryDefinition(
type="Usage",
timeframe="MonthToDate",
dataset=QueryDataset(
granularity="Daily",
aggregation={
"totalCost": {
"name": "PreTaxCost",
"function": "Sum"
}
},
grouping=[
{
"type": "Dimension",
"name": "ResourceGroup"
}
]
)
)
# Execute query
scope = f"/subscriptions/{subscription_id}"
result = client.query.usage(scope, query)
for row in result.rows:
print(f"Date: {row[0]}, Resource Group: {row[1]}, Cost: ${row[2]:.2f}")
GCP Cost Monitoring
# Install gcloud CLI
curl https://sdk.cloud.google.com | bash
# Initialize
gcloud init
# List billing accounts (billing export to BigQuery is a one-time setup in the console)
gcloud billing accounts list
# Query costs using bq
bq query --use_legacy_sql=false '
SELECT
service.description as service,
SUM(cost) as total_cost
FROM `project-id.billing_dataset.gcp_billing_export_v1_XXXXX`
WHERE DATE(_PARTITIONTIME) BETWEEN "2024-01-01" AND "2024-01-31"
GROUP BY service
ORDER BY total_cost DESC
'
GCP Budget alert using Terraform:
resource "google_billing_budget" "monthly_budget" {
billing_account = var.billing_account
display_name = "Monthly Budget"
budget_filter {
projects = ["projects/${var.project_number}"]
}
amount {
specified_amount {
currency_code = "USD"
units = "10000"
}
}
threshold_rules {
threshold_percent = 0.5
}
threshold_rules {
threshold_percent = 0.9
}
threshold_rules {
threshold_percent = 1.0
}
all_updates_rule {
pubsub_topic = google_pubsub_topic.budget_alerts.id
}
}
resource "google_pubsub_topic" "budget_alerts" {
name = "budget-alerts"
}
Resource Rightsizing
Rightsizing ensures you are using the appropriate instance types and sizes for your workloads.
AWS Rightsizing
import boto3
from datetime import datetime, timedelta
cloudwatch = boto3.client('cloudwatch')
ec2 = boto3.client('ec2')
def analyze_ec2_utilization(instance_id, days=30):
"""Analyze EC2 instance CPU and memory utilization"""
end_time = datetime.utcnow()
start_time = end_time - timedelta(days=days)
# Get CPU utilization
cpu_metrics = cloudwatch.get_metric_statistics(
Namespace='AWS/EC2',
MetricName='CPUUtilization',
Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
StartTime=start_time,
EndTime=end_time,
Period=3600,
Statistics=['Average', 'Maximum']
)
    # Calculate averages (guard against instances with no datapoints yet)
    datapoints = cpu_metrics['Datapoints']
    if not datapoints:
        print(f"Instance {instance_id}: no CPU data available")
        return None, None
    avg_cpu = sum(m['Average'] for m in datapoints) / len(datapoints)
    max_cpu = max(m['Maximum'] for m in datapoints)
print(f"Instance {instance_id}:")
print(f" Average CPU: {avg_cpu:.2f}%")
print(f" Maximum CPU: {max_cpu:.2f}%")
# Recommendations
if avg_cpu < 20 and max_cpu < 40:
print(" Recommendation: Consider downsizing or using Burstable instances (t3/t4g)")
elif avg_cpu > 70:
print(" Recommendation: Consider upgrading instance type")
return avg_cpu, max_cpu
# Get all running instances
instances = ec2.describe_instances(
Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
)
for reservation in instances['Reservations']:
for instance in reservation['Instances']:
analyze_ec2_utilization(instance['InstanceId'])
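CloudWatch alone does not report memory utilization unless the CloudWatch agent is installed, so the script above only sees part of the picture. AWS Compute Optimizer can complement it; a minimal sketch, assuming the account has already opted in to Compute Optimizer:
import boto3

# Assumes AWS Compute Optimizer has been enabled for the account.
optimizer = boto3.client("compute-optimizer")

response = optimizer.get_ec2_instance_recommendations()
for rec in response["instanceRecommendations"]:
    print(f"Instance: {rec['instanceArn']}")
    print(f"  Finding: {rec['finding']}")  # e.g. OVER_PROVISIONED, UNDER_PROVISIONED, OPTIMIZED
    for option in rec.get("recommendationOptions", [])[:1]:
        print(f"  Suggested type: {option['instanceType']}")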
Azure Rightsizing with Azure Advisor
# Get Azure Advisor recommendations
az advisor recommendation list \
--category Cost \
--query "[].{Category:category,Impact:impact,Resource:impactedValue,Recommendation:shortDescription.solution}" \
--output table
Python script to automate rightsizing:
from azure.identity import DefaultAzureCredential
from azure.mgmt.advisor import AdvisorManagementClient
from azure.mgmt.compute import ComputeManagementClient
subscription_id = "<your-subscription-id>"
credential = DefaultAzureCredential()
advisor_client = AdvisorManagementClient(credential, subscription_id)
compute_client = ComputeManagementClient(credential, subscription_id)
# Get cost recommendations
recommendations = advisor_client.recommendations.list(
filter="Category eq 'Cost'"
)
for rec in recommendations:
if rec.impacted_field == "Microsoft.Compute/virtualMachines":
print(f"VM: {rec.impacted_value}")
print(f"Recommendation: {rec.short_description.solution}")
print(f"Potential Savings: ${rec.extended_properties.get('savingsAmount', 'N/A')}")
print(f"Current SKU: {rec.extended_properties.get('currentSku')}")
print(f"Recommended SKU: {rec.extended_properties.get('targetSku')}")
print("---")
GCP Rightsizing Recommendations
from google.cloud import recommender_v1
def get_rightsizing_recommendations(project_id):
"""Get VM rightsizing recommendations from GCP"""
client = recommender_v1.RecommenderClient()
    # The machine type recommender is zonal, so query one zone at a time
    parent = f"projects/{project_id}/locations/us-central1-a/recommenders/google.compute.instance.MachineTypeRecommender"
recommendations = client.list_recommendations(parent=parent)
for recommendation in recommendations:
print(f"Recommendation: {recommendation.name}")
print(f"Description: {recommendation.description}")
print(f"Priority: {recommendation.priority}")
# Parse recommendation content
for operation in recommendation.content.operation_groups:
for op in operation.operations:
print(f"Action: {op.action}")
print(f"Resource: {op.resource}")
print(f"Current machine type: {op.value_matcher}")
# Cost impact
if recommendation.primary_impact:
impact = recommendation.primary_impact
if impact.category == recommender_v1.Impact.Category.COST:
print(f"Estimated monthly savings: ${abs(impact.cost_projection.cost.units)}")
print("---")
# Usage
get_rightsizing_recommendations("your-project-id")
Reserved Instances and Savings Plans
Reserved instances, savings plans, and committed use discounts offer significant savings over on-demand pricing in exchange for a one- or three-year usage commitment.
AWS Reserved Instances and Savings Plans
import boto3
ce = boto3.client('ce')
def get_ri_recommendations():
"""Get Reserved Instance recommendations"""
    response = ce.get_reservation_purchase_recommendation(
        Service='Amazon Elastic Compute Cloud - Compute',
        ServiceSpecification={'EC2Specification': {'OfferingClass': 'STANDARD'}},
PaymentOption='PARTIAL_UPFRONT',
TermInYears='ONE_YEAR',
LookbackPeriodInDays='SIXTY_DAYS'
)
for recommendation in response['Recommendations']:
details = recommendation['RecommendationDetail']
print(f"Instance Type: {details['InstanceDetails']['EC2InstanceDetails']['InstanceType']}")
print(f"Recommended Instances: {details['RecommendedNumberOfInstancesToPurchase']}")
print(f"Estimated Monthly Savings: ${details['EstimatedMonthlySavingsAmount']}")
print(f"Upfront Cost: ${details['UpfrontCost']}")
print("---")
def get_savings_plans_recommendations():
"""Get Savings Plans recommendations"""
response = ce.get_savings_plans_purchase_recommendation(
SavingsPlansType='COMPUTE_SP',
TermInYears='ONE_YEAR',
PaymentOption='PARTIAL_UPFRONT',
LookbackPeriodInDays='SIXTY_DAYS'
)
for rec in response['SavingsPlansPurchaseRecommendation']['SavingsPlansPurchaseRecommendationDetails']:
print(f"Hourly Commitment: ${rec['HourlyCommitmentToPurchase']}")
print(f"Estimated Monthly Savings: ${rec['EstimatedMonthlySavingsAmount']}")
print(f"Estimated ROI: {rec['EstimatedROI']}%")
print("---")
The Terraform AWS provider does not expose a resource for purchasing Reserved Instances, so purchases are typically made through the console or the CLI:
# Find a matching one-year Reserved Instance offering
aws ec2 describe-reserved-instances-offerings \
    --instance-type t3.large \
    --offering-class standard \
    --offering-type "Partial Upfront" \
    --product-description "Linux/UNIX" \
    --max-duration 31536000
# Purchase the selected offering
aws ec2 purchase-reserved-instances-offering \
    --reserved-instances-offering-id <offering-id> \
    --instance-count 10
Azure Reserved VM Instances
# List available reservations
az reservations catalog show \
--subscription-id $SUBSCRIPTION_ID \
--reserved-resource-type VirtualMachines \
--location eastus
# Purchase reservation
az reservations reservation-order purchase \
--reservation-order-id /providers/Microsoft.Capacity/reservationOrders/XXXXX \
--sku Standard_D2s_v3 \
--location eastus \
--quantity 10 \
--term P1Y \
--billing-plan Monthly
GCP Committed Use Discounts
# Create committed use discount
gcloud compute commitments create my-commitment \
--region us-central1 \
--resources vcpu=100,memory=400GB \
--plan 12-month
# List active commitments
gcloud compute commitments list
Terraform configuration:
resource "google_compute_commitment" "commitment" {
name = "production-commitment"
region = "us-central1"
plan = "TWELVE_MONTH"
type = "GENERAL_PURPOSE"
resources {
type = "VCPU"
amount = "100"
}
resources {
type = "MEMORY"
amount = "400"
}
}
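Whichever provider you commit with, verify that the commitment is actually being consumed, because unused reservations are pure waste. A minimal AWS sketch using Cost Explorer; the date range is illustrative:
import boto3

ce = boto3.client("ce")

# Check how much of the purchased Reserved Instance capacity was actually used.
response = ce.get_reservation_utilization(
    TimePeriod={"Start": "2024-01-01", "End": "2024-01-31"},
    Granularity="MONTHLY",
)
total = response["Total"]
print(f"RI utilization: {total['UtilizationPercentage']}%")
print(f"Unused hours: {total['UnusedHours']}")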
Spot Instances and Preemptible VMs
Spot capacity can cost up to 90 percent less than on-demand, but the provider can reclaim it at short notice, so reserve it for fault-tolerant, flexible workloads such as batch jobs, CI builds, and stateless services.
AWS Spot Instances
# AWS Spot Fleet request
apiVersion: v1
kind: ConfigMap
metadata:
name: spot-config
data:
spot-request.json: |
{
"IamFleetRole": "arn:aws:iam::123456789012:role/aws-ec2-spot-fleet-role",
"AllocationStrategy": "lowestPrice",
"TargetCapacity": 10,
"SpotPrice": "0.05",
"ValidFrom": "2024-01-01T00:00:00Z",
"ValidUntil": "2024-12-31T23:59:59Z",
"LaunchSpecifications": [
{
"ImageId": "ami-12345678",
"InstanceType": "t3.medium",
"KeyName": "my-key",
"SubnetId": "subnet-12345",
"SpotPrice": "0.05"
},
{
"ImageId": "ami-12345678",
"InstanceType": "t3.large",
"KeyName": "my-key",
"SubnetId": "subnet-12345",
"SpotPrice": "0.08"
}
]
}
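Whatever provisioning mechanism you use, workloads on Spot must tolerate interruption; AWS posts a two-minute warning to the instance metadata service. A minimal polling sketch to run on the instance, assuming IMDSv1 is enabled (IMDSv2 requires a session token first), with the drain step left as a placeholder for your own shutdown logic:
import time
import urllib.error
import urllib.request

# EC2 instance metadata endpoint; returns 404 until an interruption is scheduled.
# Assumes IMDSv1 is enabled; IMDSv2 requires fetching a session token first.
SPOT_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

def drain_workload():
    """Placeholder: deregister from the load balancer, checkpoint work, etc."""
    print("Interruption notice received, draining workload...")

while True:
    try:
        with urllib.request.urlopen(SPOT_ACTION_URL, timeout=2) as resp:
            if resp.status == 200:
                drain_workload()
                break
    except urllib.error.HTTPError:
        pass  # 404: no interruption scheduled yet
    except urllib.error.URLError:
        pass  # metadata service unreachable (e.g. not running on EC2)
    time.sleep(5)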
Using Spot Instances with Kubernetes Karpenter:
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
name: spot-provisioner
spec:
requirements:
- key: karpenter.sh/capacity-type
operator: In
values: ["spot"]
- key: node.kubernetes.io/instance-type
operator: In
values: ["t3.medium", "t3.large", "t3.xlarge"]
limits:
resources:
cpu: 1000
memory: 1000Gi
provider:
instanceProfile: KarpenterNodeInstanceProfile
subnetSelector:
karpenter.sh/discovery: my-cluster
securityGroupSelector:
karpenter.sh/discovery: my-cluster
tags:
Name: karpenter-spot-node
ttlSecondsAfterEmpty: 30
ttlSecondsUntilExpired: 604800
Azure Spot VMs
# Create Spot VM
az vm create \
--resource-group myResourceGroup \
--name mySpotVM \
--image UbuntuLTS \
--priority Spot \
--max-price 0.05 \
--eviction-policy Deallocate \
--size Standard_D2s_v3
Azure Spot with VMSS:
resource "azurerm_linux_virtual_machine_scale_set" "spot_vmss" {
name = "spot-vmss"
resource_group_name = azurerm_resource_group.main.name
location = azurerm_resource_group.main.location
sku = "Standard_D2s_v3"
instances = 5
priority = "Spot"
eviction_policy = "Deallocate"
max_bid_price = 0.05
admin_username = "azureuser"
admin_ssh_key {
username = "azureuser"
public_key = file("~/.ssh/id_rsa.pub")
}
source_image_reference {
publisher = "Canonical"
offer = "UbuntuServer"
sku = "18.04-LTS"
version = "latest"
}
os_disk {
caching = "ReadWrite"
storage_account_type = "Standard_LRS"
}
network_interface {
name = "spot-nic"
primary = true
ip_configuration {
name = "internal"
primary = true
subnet_id = azurerm_subnet.main.id
}
}
}
GCP Preemptible VMs
# Create preemptible VM
gcloud compute instances create preemptible-vm \
--zone us-central1-a \
--machine-type n1-standard-4 \
--preemptible \
--maintenance-policy TERMINATE
# Create instance template with preemptible
gcloud compute instance-templates create preemptible-template \
--machine-type n1-standard-4 \
--preemptible \
--boot-disk-size 100GB \
--image-family ubuntu-2004-lts \
--image-project ubuntu-os-cloud
GKE with Spot VMs:
apiVersion: container.cnrm.cloud.google.com/v1beta1
kind: ContainerNodePool
metadata:
name: spot-pool
spec:
clusterRef:
name: my-cluster
location: us-central1
initialNodeCount: 3
autoscaling:
minNodeCount: 1
maxNodeCount: 10
nodeConfig:
machineType: n1-standard-4
preemptible: true
diskSizeGb: 100
oauthScopes:
- "https://www.googleapis.com/auth/cloud-platform"
labels:
workload-type: batch
taints:
- key: preemptible
value: "true"
effect: NoSchedule
Auto-Scaling
Auto-scaling adjusts capacity to match demand, so you pay for extra instances only while they are actually needed.
AWS Auto Scaling
# AWS Auto Scaling Group with CloudFormation
AWSTemplateFormatVersion: '2010-09-09'
Resources:
LaunchTemplate:
Type: AWS::EC2::LaunchTemplate
Properties:
LaunchTemplateName: WebServerTemplate
LaunchTemplateData:
ImageId: ami-12345678
InstanceType: t3.medium
SecurityGroupIds:
- sg-12345678
UserData:
Fn::Base64: !Sub |
#!/bin/bash
yum update -y
yum install -y httpd
systemctl start httpd
AutoScalingGroup:
Type: AWS::AutoScaling::AutoScalingGroup
Properties:
LaunchTemplate:
LaunchTemplateId: !Ref LaunchTemplate
Version: !GetAtt LaunchTemplate.LatestVersionNumber
MinSize: 2
MaxSize: 10
DesiredCapacity: 2
TargetGroupARNs:
- !Ref TargetGroup
VPCZoneIdentifier:
- subnet-12345
- subnet-67890
ScaleUpPolicy:
Type: AWS::AutoScaling::ScalingPolicy
Properties:
AdjustmentType: ChangeInCapacity
AutoScalingGroupName: !Ref AutoScalingGroup
Cooldown: 60
ScalingAdjustment: 2
CPUAlarmHigh:
Type: AWS::CloudWatch::Alarm
Properties:
AlarmDescription: Scale up when CPU exceeds 70%
MetricName: CPUUtilization
Namespace: AWS/EC2
Statistic: Average
Period: 300
EvaluationPeriods: 2
Threshold: 70
AlarmActions:
- !Ref ScaleUpPolicy
Dimensions:
- Name: AutoScalingGroupName
Value: !Ref AutoScalingGroup
ComparisonOperator: GreaterThanThreshold
Azure Auto-scaling
resource "azurerm_monitor_autoscale_setting" "vmss_autoscale" {
name = "vmss-autoscale"
resource_group_name = azurerm_resource_group.main.name
location = azurerm_resource_group.main.location
target_resource_id = azurerm_linux_virtual_machine_scale_set.main.id
profile {
name = "default"
capacity {
default = 2
minimum = 2
maximum = 10
}
rule {
metric_trigger {
metric_name = "Percentage CPU"
metric_resource_id = azurerm_linux_virtual_machine_scale_set.main.id
time_grain = "PT1M"
statistic = "Average"
time_window = "PT5M"
time_aggregation = "Average"
operator = "GreaterThan"
threshold = 70
}
scale_action {
direction = "Increase"
type = "ChangeCount"
value = "2"
cooldown = "PT5M"
}
}
rule {
metric_trigger {
metric_name = "Percentage CPU"
metric_resource_id = azurerm_linux_virtual_machine_scale_set.main.id
time_grain = "PT1M"
statistic = "Average"
time_window = "PT5M"
time_aggregation = "Average"
operator = "LessThan"
threshold = 30
}
scale_action {
direction = "Decrease"
type = "ChangeCount"
value = "1"
cooldown = "PT5M"
}
}
}
}
GCP Auto-scaling
# Create managed instance group with autoscaling
gcloud compute instance-groups managed create web-group \
--base-instance-name web \
--template web-template \
--size 2 \
--zone us-central1-a
gcloud compute instance-groups managed set-autoscaling web-group \
--max-num-replicas 10 \
--min-num-replicas 2 \
--target-cpu-utilization 0.70 \
--cool-down-period 60 \
--zone us-central1-a
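Demand-based scaling can be combined with time-based scaling, since non-production environments rarely need capacity outside working hours. A minimal AWS sketch using scheduled Auto Scaling actions; the group name and schedule are assumptions, and cron expressions are evaluated in UTC:
import boto3

autoscaling = boto3.client("autoscaling")

# Scale an assumed non-production Auto Scaling group to zero at night
# and back up in the morning on weekdays.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="dev-asg",            # assumed ASG name
    ScheduledActionName="scale-down-evening",
    Recurrence="0 20 * * 1-5",
    MinSize=0,
    MaxSize=0,
    DesiredCapacity=0,
)
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="dev-asg",
    ScheduledActionName="scale-up-morning",
    Recurrence="0 7 * * 1-5",
    MinSize=2,
    MaxSize=10,
    DesiredCapacity=2,
)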
Storage Optimization
Optimize storage costs by choosing the right storage class and implementing lifecycle policies.
AWS S3 Storage Classes and Lifecycle
import boto3
s3 = boto3.client('s3')
lifecycle_policy = {
'Rules': [
{
'Id': 'TransitionToIA',
'Filter': {'Prefix': 'logs/'},
'Status': 'Enabled',
'Transitions': [
{
'Days': 30,
'StorageClass': 'STANDARD_IA'
},
{
'Days': 90,
'StorageClass': 'GLACIER'
},
{
'Days': 365,
'StorageClass': 'DEEP_ARCHIVE'
}
],
'Expiration': {
'Days': 730
}
},
{
'Id': 'DeleteOldVersions',
'Filter': {},
'Status': 'Enabled',
'NoncurrentVersionTransitions': [
{
'NoncurrentDays': 30,
'StorageClass': 'STANDARD_IA'
}
],
'NoncurrentVersionExpiration': {
'NoncurrentDays': 90
}
}
]
}
s3.put_bucket_lifecycle_configuration(
Bucket='my-bucket',
LifecycleConfiguration=lifecycle_policy
)
Azure Blob Storage Tiers and Lifecycle
resource "azurerm_storage_account" "main" {
name = "mystorageaccount"
resource_group_name = azurerm_resource_group.main.name
location = azurerm_resource_group.main.location
account_tier = "Standard"
account_replication_type = "LRS"
access_tier = "Hot"
}
resource "azurerm_storage_management_policy" "lifecycle" {
storage_account_id = azurerm_storage_account.main.id
rule {
name = "rule1"
enabled = true
filters {
prefix_match = ["logs/"]
blob_types = ["blockBlob"]
}
actions {
base_blob {
tier_to_cool_after_days_since_modification_greater_than = 30
tier_to_archive_after_days_since_modification_greater_than = 90
delete_after_days_since_modification_greater_than = 730
}
snapshot {
delete_after_days_since_creation_greater_than = 90
}
}
}
}
GCP Cloud Storage Classes
from google.cloud import storage

def set_bucket_lifecycle(bucket_name):
    """Set lifecycle rules for a GCS bucket."""
    storage_client = storage.Client()
    # Load current metadata so existing lifecycle rules are not silently dropped
    bucket = storage_client.get_bucket(bucket_name)

    # Transition objects to progressively colder storage classes as they age
    bucket.add_lifecycle_set_storage_class_rule("NEARLINE", age=30)
    bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=90)
    bucket.add_lifecycle_set_storage_class_rule("ARCHIVE", age=365)

    # Delete objects after two years
    bucket.add_lifecycle_delete_rule(age=730)

    bucket.patch()
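Lifecycle policies cover object storage, but orphaned block storage is just as common: volumes left behind after instances are terminated keep accruing charges. A minimal AWS sketch that lists unattached EBS volumes for review; it only reports, it does not delete:
import boto3

ec2 = boto3.client("ec2")

# "available" means the volume is not attached to any instance.
volumes = ec2.describe_volumes(
    Filters=[{"Name": "status", "Values": ["available"]}]
)
for volume in volumes["Volumes"]:
    print(f"Unattached volume {volume['VolumeId']}: "
          f"{volume['Size']} GiB, created {volume['CreateTime']:%Y-%m-%d}")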
Network Cost Optimization
Network costs can be significant, especially egress and cross-region or cross-zone data transfer.
Strategies to Reduce Network Costs
- Use CDN for static content delivery
- Route traffic between VPCs over peering or private endpoints instead of the public internet
- Minimize cross-region data transfer
- Use private endpoints for cloud services
- Compress data before transfer (see the sketch after this list)
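The last item deserves a concrete illustration: compressible payloads such as logs or JSON often shrink by 80 to 90 percent, which directly reduces egress and storage charges. A minimal sketch that gzips a file before uploading it to S3; the file, bucket, and key names are placeholders:
import gzip
import shutil
import boto3

def upload_compressed(local_path, bucket, key):
    """Gzip a file locally, then upload the compressed copy to S3."""
    compressed_path = f"{local_path}.gz"
    with open(local_path, "rb") as src, gzip.open(compressed_path, "wb") as dst:
        shutil.copyfileobj(src, dst)

    s3 = boto3.client("s3")
    s3.upload_file(
        compressed_path,
        bucket,
        key,
        ExtraArgs={"ContentEncoding": "gzip"},  # so clients know to decompress
    )

# Placeholder names for illustration
upload_compressed("app.log", "my-log-bucket", "logs/app.log.gz")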
AWS Network Cost Optimization
# Use VPC Endpoints to avoid NAT Gateway costs
resource "aws_vpc_endpoint" "s3" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.us-east-1.s3"
route_table_ids = [aws_route_table.private.id]
tags = {
Name = "s3-endpoint"
}
}
resource "aws_vpc_endpoint" "dynamodb" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.us-east-1.dynamodb"
route_table_ids = [aws_route_table.private.id]
}
# Use CloudFront for content delivery
resource "aws_cloudfront_distribution" "cdn" {
origin {
domain_name = aws_s3_bucket.static.bucket_regional_domain_name
origin_id = "S3-static"
s3_origin_config {
origin_access_identity = aws_cloudfront_origin_access_identity.oai.cloudfront_access_identity_path
}
}
enabled = true
default_root_object = "index.html"
default_cache_behavior {
allowed_methods = ["GET", "HEAD"]
cached_methods = ["GET", "HEAD"]
target_origin_id = "S3-static"
forwarded_values {
query_string = false
cookies {
forward = "none"
}
}
viewer_protocol_policy = "redirect-to-https"
min_ttl = 0
default_ttl = 3600
max_ttl = 86400
}
price_class = "PriceClass_100" # Use only US, Canada, Europe
restrictions {
geo_restriction {
restriction_type = "none"
}
}
viewer_certificate {
cloudfront_default_certificate = true
}
}
Tagging Strategies
Proper tagging enables cost allocation and tracking.
Comprehensive Tagging Strategy
# AWS resource tagging
import boto3
def apply_tags(resource_arn, tags):
"""Apply standardized tags to AWS resources"""
client = boto3.client('resourcegroupstaggingapi')
client.tag_resources(
ResourceARNList=[resource_arn],
Tags=tags
)
# Standard tag schema
standard_tags = {
'Environment': 'production',
'Project': 'web-application',
'Owner': 'platform-team',
'CostCenter': 'engineering',
'Application': 'api-backend',
'ManagedBy': 'terraform',
'Backup': 'daily',
'Compliance': 'pci-dss'
}
Terraform module for consistent tagging:
# modules/tags/variables.tf
variable "environment" {
type = string
}
variable "project" {
type = string
}
variable "additional_tags" {
type = map(string)
default = {}
}
# modules/tags/outputs.tf
output "tags" {
value = merge(
{
Environment = var.environment
Project = var.project
ManagedBy = "Terraform"
CreatedDate = timestamp()
},
var.additional_tags
)
}
# Usage in main.tf
module "common_tags" {
source = "./modules/tags"
environment = "production"
project = "web-app"
additional_tags = {
Owner = "platform-team"
CostCenter = "engineering"
}
}
resource "aws_instance" "web" {
ami = "ami-12345678"
instance_type = "t3.medium"
tags = module.common_tags.tags
}
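Tag schemas only pay off if they are enforced, so it helps to audit for resources that are missing mandatory tags. A minimal AWS sketch using the Resource Groups Tagging API; the required tag key is an assumption based on the schema above:
import boto3

REQUIRED_TAG = "CostCenter"  # assumed mandatory tag key

tagging = boto3.client("resourcegroupstaggingapi")
paginator = tagging.get_paginator("get_resources")

untagged = []
for page in paginator.paginate():
    for resource in page["ResourceTagMappingList"]:
        tag_keys = {tag["Key"] for tag in resource.get("Tags", [])}
        if REQUIRED_TAG not in tag_keys:
            untagged.append(resource["ResourceARN"])

print(f"{len(untagged)} resources missing the {REQUIRED_TAG} tag")
for arn in untagged[:20]:
    print(f"  {arn}")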
FinOps Practices
FinOps is a cultural practice that brings financial accountability to cloud spending.
Key FinOps Principles
- Teams need to collaborate
- Everyone takes ownership of cloud usage
- A centralized team drives FinOps
- Reports should be accessible and timely (see the showback sketch after this list)
- Decisions are driven by business value
- Take advantage of variable cost model
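One way to make reports accessible and timely is an automated showback that groups spend by the CostCenter tag introduced earlier. A minimal AWS sketch, assuming the tag has been activated as a cost allocation tag in the billing console and using an illustrative date range:
import boto3

ce = boto3.client("ce")

# Group last month's spend by the CostCenter cost allocation tag.
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-01-01", "End": "2024-01-31"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "TAG", "Key": "CostCenter"}],
)
for group in response["ResultsByTime"][0]["Groups"]:
    cost_center = group["Keys"][0]            # e.g. "CostCenter$engineering"
    amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
    print(f"{cost_center}: ${amount:,.2f}")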
Implementing FinOps with Infrastructure as Code
# Cost-aware infrastructure deployment
import boto3
import json
class CostAwareDeployer:
def __init__(self):
self.ce = boto3.client('ce')
self.ec2 = boto3.client('ec2')
def get_current_month_cost(self):
"""Get current month's cost"""
response = self.ce.get_cost_and_usage(
TimePeriod={
'Start': '2024-01-01',
'End': '2024-01-31'
},
Granularity='MONTHLY',
Metrics=['UnblendedCost']
)
return float(response['ResultsByTime'][0]['Total']['UnblendedCost']['Amount'])
def check_budget(self, proposed_cost):
"""Check if proposed deployment fits within budget"""
current_cost = self.get_current_month_cost()
budget_limit = 10000 # $10,000
if current_cost + proposed_cost > budget_limit:
return False, f"Deployment would exceed budget: ${current_cost + proposed_cost} > ${budget_limit}"
return True, "Within budget"
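    def get_instance_pricing(self, instance_type):
        """Return an assumed hourly on-demand rate for the instance type.

        These rates are illustrative placeholders, not current AWS prices;
        in practice this would call the AWS Pricing API or an internal rate card.
        """
        assumed_hourly_rates = {
            "t3.medium": 0.0416,
            "t3.large": 0.0832,
            "m5.large": 0.096,
        }
        return assumed_hourly_rates.get(instance_type, 0.10)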
def deploy_with_cost_check(self, instance_type, count):
"""Deploy instances only if within budget"""
# Calculate estimated cost
pricing = self.get_instance_pricing(instance_type)
monthly_cost = pricing * count * 730 # hours per month
can_deploy, message = self.check_budget(monthly_cost)
if can_deploy:
print(f"Deploying {count} {instance_type} instances")
print(f"Estimated monthly cost: ${monthly_cost:.2f}")
# Actual deployment code here
else:
print(f"Deployment blocked: {message}")
# Send notification to team
Cost Optimization Automation
# Automated cost optimization script
import boto3
from datetime import datetime, timedelta
class CostOptimizer:
def __init__(self):
self.ec2 = boto3.client('ec2')
self.rds = boto3.client('rds')
def stop_unused_instances(self):
"""Stop EC2 instances with low CPU utilization"""
cloudwatch = boto3.client('cloudwatch')
instances = self.ec2.describe_instances(
Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
)
for reservation in instances['Reservations']:
for instance in reservation['Instances']:
instance_id = instance['InstanceId']
# Check CPU utilization
metrics = cloudwatch.get_metric_statistics(
Namespace='AWS/EC2',
MetricName='CPUUtilization',
Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
StartTime=datetime.utcnow() - timedelta(days=7),
EndTime=datetime.utcnow(),
Period=86400,
Statistics=['Average']
)
if metrics['Datapoints']:
avg_cpu = sum(m['Average'] for m in metrics['Datapoints']) / len(metrics['Datapoints'])
if avg_cpu < 5:
print(f"Stopping unused instance {instance_id} (avg CPU: {avg_cpu:.2f}%)")
self.ec2.stop_instances(InstanceIds=[instance_id])
def delete_old_snapshots(self, days=90):
"""Delete snapshots older than specified days"""
snapshots = self.ec2.describe_snapshots(OwnerIds=['self'])
cutoff_date = datetime.utcnow() - timedelta(days=days)
for snapshot in snapshots['Snapshots']:
snapshot_date = snapshot['StartTime'].replace(tzinfo=None)
if snapshot_date < cutoff_date:
print(f"Deleting old snapshot {snapshot['SnapshotId']}")
self.ec2.delete_snapshot(SnapshotId=snapshot['SnapshotId'])
def schedule_non_prod_instances(self):
"""Schedule non-production instances to stop at night"""
# Implementation for scheduling
pass
# Run optimization
optimizer = CostOptimizer()
optimizer.stop_unused_instances()
optimizer.delete_old_snapshots()
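The schedule_non_prod_instances stub above is left to the reader; one common approach is to stop instances tagged as non-production outside business hours, triggered by a cron job or an EventBridge schedule. A minimal sketch, where the tag key and values are assumptions:
import boto3

def stop_non_prod_instances():
    """Stop running instances tagged as non-production (assumed tag values)."""
    ec2 = boto3.client("ec2")
    instances = ec2.describe_instances(
        Filters=[
            {"Name": "tag:Environment", "Values": ["dev", "staging"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    instance_ids = [
        instance["InstanceId"]
        for reservation in instances["Reservations"]
        for instance in reservation["Instances"]
    ]
    if instance_ids:
        print(f"Stopping {len(instance_ids)} non-production instances")
        ec2.stop_instances(InstanceIds=instance_ids)

# Invoke this on a schedule, e.g. from a nightly cron job or an EventBridge rule.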
Conclusion
Cloud cost optimization is an ongoing process that requires continuous monitoring, analysis, and adjustment. By combining cost monitoring, resource rightsizing, commitment-based pricing, spot capacity, auto-scaling, storage and network optimization, consistent tagging, and FinOps practices, organizations can significantly reduce cloud spending while maintaining or improving performance.
Key takeaways:
- Implement comprehensive cost monitoring and alerting
- Regularly review and rightsize resources
- Use commitment-based pricing for predictable workloads
- Leverage spot instances for fault-tolerant workloads
- Implement auto-scaling to match demand
- Optimize storage with lifecycle policies
- Reduce network costs with CDN and VPC endpoints
- Use consistent tagging for cost allocation
- Adopt FinOps culture across the organization
- Automate cost optimization where possible
References
- AWS Cost Management: https://aws.amazon.com/aws-cost-management/
- AWS Well-Architected Framework - Cost Optimization Pillar: https://docs.aws.amazon.com/wellarchitected/latest/cost-optimization-pillar/welcome.html
- Azure Cost Management: https://azure.microsoft.com/en-us/products/cost-management/
- Azure Architecture Center - Cost Optimization: https://learn.microsoft.com/en-us/azure/architecture/framework/cost/
- GCP Cost Management: https://cloud.google.com/cost-management
- GCP Best Practices for Cost Optimization: https://cloud.google.com/architecture/best-practices-for-optimizing-your-cloud-costs
- FinOps Foundation: https://www.finops.org/
- Cloud FinOps Book: https://www.oreilly.com/library/view/cloud-finops/9781492054610/
- AWS Spot Instance Best Practices: https://aws.amazon.com/ec2/spot/getting-started/
- Kubernetes Resource Management: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/