From Monolith to Serverless Migration
A practical guide to migrating legacy monolithic applications to serverless architectures with real-world patterns, cost analysis, and production lessons.
Over the past year, I’ve led three major monolith-to-serverless migrations for enterprise clients. Each time, the conversation starts the same way: “We want to reduce costs and improve scalability.” What they don’t expect is that serverless isn’t a silver bullet—it’s a fundamental architectural shift that requires careful planning, phased execution, and a willingness to rethink how applications are built.
In this post, I’ll share the migration strategy I’ve refined across these projects, including the patterns that worked, the pitfalls I discovered, and the cost analysis that justified the investment.
The Problem with “Big Bang” Migrations
The biggest mistake I see teams make is attempting to rewrite their entire monolith as serverless functions in one go. I learned this the hard way on my first migration project when we spent six months rewriting a .NET monolith into Lambda functions, only to discover performance issues, hidden dependencies, and a cost structure that was 3x higher than projected.
The issue isn’t serverless itself—it’s the approach. Modern serverless migrations require a strangler fig pattern: gradually replacing monolith functionality while maintaining the existing system. This allows you to:
- Validate assumptions incrementally with real traffic and costs
- Learn serverless patterns without betting the entire project
- Maintain business continuity throughout the migration
- Roll back quickly if something doesn’t work
Phase 1: Identify the Low-Hanging Fruit
Not all parts of a monolith are equal candidates for serverless migration. I start every project with a two-week analysis phase to map the application and identify the best targets for initial extraction.
What Makes a Good First Candidate?
Asynchronous background jobs are ideal starting points. These workloads typically:
- Have clear input/output boundaries
- Don’t require low-latency responses
- Can tolerate cold starts
- Often have variable load patterns (perfect for serverless economics)
Read-heavy API endpoints are my second choice, especially those that:
- Return data from databases or external APIs
- Have predictable response times under 30 seconds
- Don’t maintain state between requests
- Experience traffic spikes
Scheduled tasks and cron jobs are the easiest wins. If your monolith runs daily reports, data cleanup, or batch processing, these are perfect candidates for EventBridge + Lambda.
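As a concrete illustration, here’s a minimal Terraform sketch of the EventBridge + Lambda wiring for a nightly report job. The nightly_report function name is a hypothetical placeholder, not something from a real migration:

# Run the report function every night at 02:00 UTC
resource "aws_cloudwatch_event_rule" "nightly_report" {
  name                = "nightly-report"
  schedule_expression = "cron(0 2 * * ? *)"
}

resource "aws_cloudwatch_event_target" "nightly_report" {
  rule = aws_cloudwatch_event_rule.nightly_report.name
  arn  = aws_lambda_function.nightly_report.arn
}

# Allow EventBridge to invoke the function
resource "aws_lambda_permission" "allow_eventbridge" {
  statement_id  = "AllowEventBridgeInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.nightly_report.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.nightly_report.arn
}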
What to Avoid Early On
I’ve learned to stay away from these patterns until later phases:
- WebSocket connections (require API Gateway WebSocket APIs and state management)
- Long-running processes (Lambda’s 15-minute limit is real)
- Highly stateful operations (you’ll need to redesign around distributed state)
- Critical path authentication/authorization (get your security model right first)
Phase 2: Build the Serverless Foundation
Before extracting your first function, you need the infrastructure foundation. I use Terraform for this because it allows me to version, review, and replicate the entire setup across environments.
Core Infrastructure Components
# terraform/serverless-foundation/main.tf

# API Gateway with custom domain
resource "aws_apigatewayv2_api" "main" {
  name          = "${var.project_name}-api"
  protocol_type = "HTTP"

  cors_configuration {
    allow_origins = var.cors_origins
    allow_methods = ["GET", "POST", "PUT", "DELETE", "OPTIONS"]
    allow_headers = ["content-type", "authorization"]
    max_age       = 300
  }
}

# Lambda execution role with least privilege
resource "aws_iam_role" "lambda_execution" {
  name = "${var.project_name}-lambda-execution"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "lambda.amazonaws.com"
      }
    }]
  })
}

# CloudWatch Log Groups with retention
resource "aws_cloudwatch_log_group" "lambda_logs" {
  for_each = var.lambda_functions

  name              = "/aws/lambda/${each.key}"
  retention_in_days = 30

  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

# DynamoDB table for distributed state (if needed)
resource "aws_dynamodb_table" "state" {
  name         = "${var.project_name}-state"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "pk"
  range_key    = "sk"

  attribute {
    name = "pk"
    type = "S"
  }

  attribute {
    name = "sk"
    type = "S"
  }

  ttl {
    attribute_name = "ttl"
    enabled        = true
  }

  point_in_time_recovery {
    enabled = true
  }
}
Observability from Day One
I’ve learned that observability can’t be an afterthought. Before migrating any functionality, I set up:
Structured logging using JSON format:
// Lambda function logger utility
import { Context } from 'aws-lambda';

interface LogContext {
  requestId: string;
  functionName: string;
  functionVersion: string;
  [key: string]: any;
}

export class Logger {
  private context: LogContext;

  constructor(lambdaContext: Context) {
    this.context = {
      requestId: lambdaContext.awsRequestId,
      functionName: lambdaContext.functionName,
      functionVersion: lambdaContext.functionVersion,
    };
  }

  info(message: string, meta?: object) {
    console.log(JSON.stringify({
      level: 'info',
      message,
      ...this.context,
      ...meta,
      timestamp: new Date().toISOString(),
    }));
  }

  error(message: string, error: Error, meta?: object) {
    console.error(JSON.stringify({
      level: 'error',
      message,
      error: {
        message: error.message,
        stack: error.stack,
        name: error.name,
      },
      ...this.context,
      ...meta,
      timestamp: new Date().toISOString(),
    }));
  }
}
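For context, here’s a minimal sketch of how this logger gets wired into a handler; the route and business logic are hypothetical placeholders:

// Hypothetical handler showing the Logger in use
import { APIGatewayProxyEventV2, APIGatewayProxyResultV2, Context } from 'aws-lambda';

export const handler = async (
  event: APIGatewayProxyEventV2,
  context: Context
): Promise<APIGatewayProxyResultV2> => {
  const logger = new Logger(context);
  logger.info('Request received', { path: event.rawPath });

  try {
    // ... business logic goes here
    return { statusCode: 200, body: JSON.stringify({ ok: true }) };
  } catch (err) {
    logger.error('Unhandled error', err as Error);
    return { statusCode: 500, body: JSON.stringify({ ok: false }) };
  }
};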
Custom CloudWatch metrics for business KPIs:
import { CloudWatch } from '@aws-sdk/client-cloudwatch';

const cloudwatch = new CloudWatch({});

export async function trackMetric(
  metricName: string,
  value: number,
  unit: 'Count' | 'Milliseconds' | 'Bytes' = 'Count'
) {
  await cloudwatch.putMetricData({
    Namespace: 'CustomApp/Business',
    MetricData: [{
      MetricName: metricName,
      Value: value,
      Unit: unit,
      Timestamp: new Date(),
    }],
  });
}
X-Ray tracing for distributed request tracking—this is critical when you have functions calling other functions or external services.
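Enabling tracing is mostly configuration. A minimal Terraform sketch, assuming the reports function that appears in the routing example below and the execution role defined earlier:

# Enable active X-Ray tracing on a Lambda function
resource "aws_lambda_function" "reports" {
  # ... other config (handler, runtime, role, etc.)

  tracing_config {
    mode = "Active"
  }
}

# The execution role also needs permission to write trace segments
resource "aws_iam_role_policy_attachment" "xray" {
  role       = aws_iam_role.lambda_execution.name
  policy_arn = "arn:aws:iam::aws:policy/AWSXRayDaemonWriteAccess"
}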
Phase 3: The Strangler Pattern in Practice
Here’s how I actually implement the strangler pattern using API Gateway and routing:
Step 1: Route Specific Endpoints to Lambda
# Route /api/reports/* to new Lambda function
resource "aws_apigatewayv2_integration" "reports" {
  api_id                 = aws_apigatewayv2_api.main.id
  integration_type       = "AWS_PROXY"
  integration_uri        = aws_lambda_function.reports.invoke_arn
  integration_method     = "POST"
  payload_format_version = "2.0"
}

resource "aws_apigatewayv2_route" "reports" {
  api_id    = aws_apigatewayv2_api.main.id
  route_key = "ANY /api/reports/{proxy+}"
  target    = "integrations/${aws_apigatewayv2_integration.reports.id}"
}

# Route everything else to existing monolith (ECS/EC2)
resource "aws_apigatewayv2_integration" "monolith" {
  api_id             = aws_apigatewayv2_api.main.id
  integration_type   = "HTTP_PROXY"
  integration_uri    = "http://${var.monolith_alb_dns}"
  integration_method = "ANY"
}

resource "aws_apigatewayv2_route" "monolith_catchall" {
  api_id    = aws_apigatewayv2_api.main.id
  route_key = "ANY /{proxy+}"
  target    = "integrations/${aws_apigatewayv2_integration.monolith.id}"
}
This approach allows you to:
- Migrate one endpoint at a time
- Compare performance and costs between old and new
- Roll back instantly by updating routes
- Run A/B tests with weighted routing (see the sketch below)
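On that last point, the simplest lever is a weighted Lambda alias, so a slice of traffic hits the new version while the rest stays on the previous one. A minimal sketch, assuming the API Gateway integration targets the alias rather than a fixed version; the version numbers are illustrative:

# Send 90% of traffic to version 5 and 10% to version 6
resource "aws_lambda_alias" "reports_live" {
  name             = "live"
  function_name    = aws_lambda_function.reports.function_name
  function_version = "5"

  routing_config {
    additional_version_weights = {
      "6" = 0.10
    }
  }
}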
Step 2: Shared Data Access
The biggest challenge in strangler migrations is data access. You can’t just split your database in half. I use this pattern:
Database read replicas for Lambda functions:
- Create a dedicated read replica with connection pooling (RDS Proxy; sketched after this list)
- Lambda functions read from the replica
- Writes still go through the monolith (initially)
- Gradually migrate write operations as confidence grows
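A rough Terraform sketch of the proxy-in-front-of-the-replica piece. The read replica, the proxy’s IAM role, and the Secrets Manager secret are assumed to exist already:

# RDS Proxy pooling connections to an existing Postgres read replica
resource "aws_db_proxy" "read" {
  name           = "${var.project_name}-read-proxy"
  engine_family  = "POSTGRESQL"
  role_arn       = aws_iam_role.rds_proxy.arn                 # assumed to exist
  vpc_subnet_ids = var.private_subnet_ids
  require_tls    = true

  auth {
    auth_scheme = "SECRETS"
    iam_auth    = "DISABLED"
    secret_arn  = aws_secretsmanager_secret.db_readonly.arn   # assumed to exist
  }
}

resource "aws_db_proxy_default_target_group" "read" {
  db_proxy_name = aws_db_proxy.read.name

  connection_pool_config {
    max_connections_percent = 90
  }
}

resource "aws_db_proxy_target" "replica" {
  db_proxy_name          = aws_db_proxy.read.name
  target_group_name      = aws_db_proxy_default_target_group.read.name
  db_instance_identifier = var.read_replica_identifier        # assumed to exist
}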
Event-driven synchronization for eventual consistency:
// Monolith publishes events to EventBridge
import { EventBridge } from '@aws-sdk/client-eventbridge';
import { EventBridgeEvent } from 'aws-lambda';

const eventBridge = new EventBridge({});

export async function publishOrderCreated(order: Order) {
  await eventBridge.putEvents({
    Entries: [{
      Source: 'monolith.orders',
      DetailType: 'OrderCreated',
      Detail: JSON.stringify(order),
      EventBusName: 'application-events',
    }],
  });
}

// Lambda functions subscribe to these events (a separate function in practice)
export const handler = async (event: EventBridgeEvent<'OrderCreated', Order>) => {
  // EventBridge delivers `detail` as an already-parsed object, not a string
  const order = event.detail;
  // Update materialized view, cache, or search index
  await updateOrderSearchIndex(order);
};
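The wiring between the two sides lives in infrastructure. A minimal Terraform sketch of the bus, rule, and target, assuming an order_indexer Lambda as the subscriber (a hypothetical name for this example):

# Custom event bus plus a rule matching OrderCreated events from the monolith
resource "aws_cloudwatch_event_bus" "application" {
  name = "application-events"
}

resource "aws_cloudwatch_event_rule" "order_created" {
  name           = "order-created"
  event_bus_name = aws_cloudwatch_event_bus.application.name

  event_pattern = jsonencode({
    source        = ["monolith.orders"]
    "detail-type" = ["OrderCreated"]
  })
}

resource "aws_cloudwatch_event_target" "order_indexer" {
  rule           = aws_cloudwatch_event_rule.order_created.name
  event_bus_name = aws_cloudwatch_event_bus.application.name
  arn            = aws_lambda_function.order_indexer.arn
}

resource "aws_lambda_permission" "order_indexer_events" {
  statement_id  = "AllowEventBridgeInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.order_indexer.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.order_created.arn
}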
Phase 4: Optimize for Cost
Here’s where serverless migrations either succeed or fail financially. I’ve seen projects where the serverless version cost more than the monolith because teams didn’t optimize.
Real Cost Analysis
From a recent migration of a .NET API serving 50M requests/month:
Before (EC2 Auto Scaling):
- 4x m5.xlarge instances (on-demand): $560/month
- Application Load Balancer: $23/month
- RDS db.m5.large (Multi-AZ): $350/month
- Total: ~$933/month
After (Serverless - Unoptimized):
- Lambda invocations (50M × $0.20/1M): $10/month
- Lambda duration (50M × 200ms avg × $0.0000166667/GB-sec × 1GB): $166/month
- API Gateway (50M requests × $1.00/1M): $50/month
- RDS Proxy: $65/month
- DynamoDB (for session state): $45/month
- Total: ~$336/month (64% reduction)
After (Serverless - Optimized):
- Lambda invocations: $10/month
- Lambda duration (50M × 120ms avg × 1.5GB memory): $150/month
- API Gateway: $50/month
- RDS Proxy: $65/month
- DynamoDB: $45/month
- Total: ~$320/month (66% reduction)
Optimization Techniques That Moved the Needle
1. Right-size Lambda memory (memory also controls CPU):
// Use AWS Lambda Power Tuning to find the sweet spot
// https://github.com/alexcasalboni/aws-lambda-power-tuning
// Our finding: 1536MB was optimal for our Node.js functions
// - Faster execution (lower duration costs)
// - Better cold start times
// - Minimal additional memory cost
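In Terraform that tuning result is a one-line change per function; a sketch:

resource "aws_lambda_function" "reports" {
  # ... other config

  # More memory also buys proportionally more CPU, which lowered our duration costs
  memory_size = 1536
  timeout     = 30
}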
2. Connection pooling for databases:
import { Client } from 'pg';

// BAD: Creating a new connection on every invocation
export const handler = async (event) => {
  const client = new Client({ connectionString: process.env.DB_URL });
  await client.connect();
  // ... query
  await client.end();
};

// GOOD: Reuse connections across warm invocations
let client: Client | null = null;

async function getClient() {
  if (!client) {
    client = new Client({ connectionString: process.env.DB_URL });
    await client.connect();
  }
  return client;
}

export const handler = async (event) => {
  const db = await getClient();
  // ... query (connection persists while the execution environment stays warm)
};
3. Batch processing instead of individual invocations:
// Process SQS messages in batches
import { SQSEvent } from 'aws-lambda';

export const handler = async (event: SQSEvent) => {
  const records = event.Records;
  // Process up to 10 messages in one invocation
  await Promise.all(records.map(record =>
    processMessage(JSON.parse(record.body))
  ));
};
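One caveat worth flagging: with plain Promise.all, a single bad message fails the whole batch and every message gets redelivered. If that bites you, SQS supports partial batch responses; a sketch, assuming ReportBatchItemFailures is enabled on the event source mapping:

import { SQSEvent, SQSBatchResponse } from 'aws-lambda';

// Requires function_response_types = ["ReportBatchItemFailures"] on the event source mapping
export const handler = async (event: SQSEvent): Promise<SQSBatchResponse> => {
  const results = await Promise.allSettled(
    event.Records.map(record => processMessage(JSON.parse(record.body)))
  );

  // Report only the failed messages so successful ones aren't redelivered
  const batchItemFailures = results
    .map((result, i) => ({ result, messageId: event.Records[i].messageId }))
    .filter(({ result }) => result.status === 'rejected')
    .map(({ messageId }) => ({ itemIdentifier: messageId }));

  return { batchItemFailures };
};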
4. Reserved concurrency for predictable workloads:
resource "aws_lambda_function" "api" {
# ... other config
# Reserve 10 concurrent executions
# Prevents throttling and provides cost predictability
reserved_concurrent_executions = 10
}
Lessons Learned and Trade-offs
After three major migrations, here’s what I wish I’d known from the start:
What Worked Well
Progressive migration is worth the complexity. Yes, running two systems simultaneously is messy. But it’s far less risky than a big-bang rewrite.
Observability investments pay off immediately. The structured logging and metrics I mentioned earlier caught issues in production that would have been nearly impossible to debug otherwise.
Serverless forces better architecture. The constraints (stateless, timeouts, cold starts) push you toward cleaner separation of concerns and event-driven design.
What Didn’t Work
Over-engineering for “eventual scale.” I built a complex saga pattern for distributed transactions on the first migration. We never needed it. Start simple.
Ignoring cold starts. For user-facing APIs, cold starts matter. Use provisioned concurrency for critical endpoints or consider Lambda SnapStart for Java workloads.
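For what it’s worth, provisioned concurrency is only a few lines of Terraform; a minimal sketch, assuming the function has a published version behind a live alias:

# Keep 5 warm execution environments for the user-facing API function
resource "aws_lambda_provisioned_concurrency_config" "api_live" {
  function_name                     = aws_lambda_function.api.function_name
  qualifier                         = aws_lambda_alias.live.name
  provisioned_concurrent_executions = 5
}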
Underestimating the learning curve. Even experienced developers need time to internalize serverless patterns. Budget for this.
The Surprising Benefits
Developer velocity increased. Once the foundation was in place, new features shipped 40% faster because developers didn’t need to worry about infrastructure.
Incident response improved. Smaller, isolated functions meant faster root cause analysis and more surgical fixes.
Security posture strengthened. Least-privilege IAM per function, automatic patching, and reduced attack surface made our security team happy.
When NOT to Go Serverless
Serverless isn’t always the answer. I’ve recommended against it for:
- Consistent, high-throughput workloads (containers on ECS/EKS are more cost-effective)
- Sub-10ms latency requirements (cold starts and invocation overhead matter)
- Existing team expertise in container orchestration (don’t force a migration)
- Complex stateful workflows (unless you’re ready to embrace Step Functions)
The Migration Checklist
Here’s the checklist I use for every migration:
Week 1-2: Analysis
- Map monolith components and dependencies
- Identify top 10 candidates for extraction
- Analyze traffic patterns and cost