What Is LangGraph and Why Deploy It Now
You spent weeks perfecting a multi-agent graph in your local Jupyter notebook. It handles state correctly, pauses for human input at the right moment, and streams responses cleanly to the front end. Then you push it live and everything breaks. Context disappears between turns. Costs spike without warning. Debugging becomes a guessing game.
This is the exact wall where most AI engineering teams stall.
LangGraph sits inside the LangChain ecosystem. It lets you build stateful, cyclic multi-agent workflows where you define nodes for LLM calls or tool use, connect them with edges, and add conditional routing logic. The graph remembers state across conversation turns, pauses for human review when you configure it to, and streams results in real time.
Six months ago most teams treated LangGraph graphs as sophisticated prototypes. That changed fast. Cloud-managed runtimes and self-hosted infrastructure-as-code templates turned experimental setups into repeatable production deployments. Enterprises across insurance, banking, and customer support now run them at scale.
The business case is clear. Teams that fully reimagine workflows around reusable agents cut nonessential work by 30 to 50 percent, according to McKinsey research on AI-powered workflow redesign. Those gains only show up after the graph leaves the notebook.
This guide walks through every step — from local setup to live deployment — covering three proven production paths, real cost figures, and the most common mistakes teams make along the way.
Core Production Features That Changed Everything in Late 2025
Several capabilities arrived together in late 2025, making LangGraph production-ready in a way it simply was not before.
Checkpointing
The graph saves its exact state after every node execution. You can resume, rewind, or branch from any point in a conversation. Long-running sessions no longer lose context on failure.
Interrupts
The graph pauses at any step you define, waits for external approval or input, then continues automatically. This is critical for workflows in compliance-heavy industries.
Streaming APIs
Partial results push to the front end without blocking the full response. Users see incremental output rather than waiting for the entire graph to finish.
Persistence Backends
The persistence backend determines how well these features hold up under real traffic. MemorySaver works in tests. SQL or Redis checkpointers are required for production. The right backend is the difference between a demo and a deployable service.
The LangGraph Platform rebranded as LangSmith Deployment around October 2025. It standardized the runtime layer so teams no longer need to glue persistence, streaming, and authentication together by hand.
Three Proven Deployment Options
LangSmith CLI — One-Command Deploy
Time to live: Minutes
Best for: Teams already using LangSmith tracing who want zero ops overhead.
The CLI packages your compiled graph into an Agent Server microservice, handling HTTP endpoints, authentication, persistence, and horizontal scaling behind a single command. Monitoring flows directly into LangSmith dashboards.
Key trade-off: Platform fees accumulate at scale, and all telemetry flows through an external service.
AWS Fargate with CDK Templates
Time to live: 10 to 20 minutes
Best for: Organizations requiring VPC isolation, data residency, or regulated-industry compliance.
The official LangGraph AWS deployment template ships Docker, ECS Fargate, CloudFormation, and auto-scaling out of the box. The CDK version lets you define the entire stack in typed code.
Key trade-off: More initial IaC setup compared to the managed CLI path.
Bedrock AgentCore for Multi-Agent Workflows
Time to live: Under 15 minutes
Best for: Heavy Bedrock users who want minimal deviation from the AWS-native toolchain.
The AWS Bedrock AgentCore toolkit integrates directly with LangGraph's StateGraph API. It handles supervisor-agent coordination, parallel node execution, and tool calling through MCP. Persistence lands in Aurora Serverless automatically.
Key trade-off: Tied to AWS inference; less portable than the self-hosted Fargate path.
There is no single right path. The best option depends on your team's speed requirements, existing cloud footprint, and tolerance for vendor dependency.
Prerequisites and Local Setup Before You Deploy
Skipping local validation wastes hours when behavior differs in the cloud. Complete every step here before touching a deploy script.
What You Need Before Starting
- A working LangGraph graph (state defined with `TypedDict` and `add_messages`)
- Nodes built for LLM calls and tool use, connected with `conditional_edges` using `tools_condition`
- A checkpointer added before `compile()`
- API keys for your LLM provider stored in environment variables
- For AWS paths: an account with permissions for ECS, ECR, Bedrock, and SSM Parameter Store
Deploying with the LangSmith CLI
- Install `langgraph-cli` via pip
- Compile your graph exactly as you do locally, with a checkpointer attached
- Configure your `langgraph.json` file with environment variable references
- Run `langgraph deploy` from the project root
- Confirm the health endpoint returns 200 and run a sample conversation end-to-end
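A minimal `langgraph.json` follows this shape (the module path and graph name are illustrative; secrets stay in the referenced `.env` file rather than in the config itself):

```json
{
  "dependencies": ["."],
  "graphs": {
    "agent": "./my_agent/graph.py:graph"
  },
  "env": ".env"
}
```

The `graphs` entry maps a deployment name to the module path and variable of your compiled graph.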
Deploying to AWS Fargate with CDK
- Clone the official Fargate template repository from the LangChain GitHub organization
- Configure the `.env` file with your LLM provider keys and region settings
- Store sensitive secrets in AWS SSM Parameter Store rather than in environment files
- Run the production deploy script — it builds the Docker image, pushes to ECR, and provisions ECS tasks
- Verify the Application Load Balancer health checks pass
- Enable CloudWatch alarms on token usage, p99 latency, and task CPU
- Document the graph structure as a Mermaid diagram for future engineers
Infrastructure the template provisions:
| Component | Purpose |
|---|---|
| Amazon ECR | Container image storage |
| AWS ECS Fargate | Serverless task execution, no EC2 management |
| Application Load Balancer | Routes traffic, terminates TLS |
| AWS SSM Parameter Store | Secrets injection at runtime |
| Amazon CloudWatch | Metrics, logs, and alarms |
| Amazon Aurora or ElastiCache | Persistent checkpointing backend |
Deploying with Bedrock AgentCore
- Define your `StateGraph` the same way you would for any other deployment
- Configure the AgentCore agent definition with tool permissions and memory scope
- Attach Bedrock knowledge bases or Aurora for persistent context
- Wire the AppSync real-time endpoint to your front end
- Enable cross-region inference as a single configuration toggle if needed
- Run a load test to validate supervisor-to-subagent routing under concurrent requests
Local Validation Checklist
Only after this checklist passes is the graph ready for a production deployment.
Install packages
Confirm you have the latest `langgraph`, cloud SDKs, and any vector store clients installed.
Run with MemorySaver
The graph should complete a full cycle without errors before you swap in a production checkpointer.
Test interrupts
Verify that execution pauses and resumes correctly on human input.
Test state persistence
Confirm that state survives between separate invocations, not just within a single session.
Check streaming
Partial tokens should reach the client without blocking the full response.
Confirm tool calls
All tools should resolve correctly and return expected outputs before you go live.
Monitoring, Scaling, and Hardening Your Live App
Production means observability before anything else. A graph that works in testing but fails silently in production is worse than one that fails loudly during development.
Observability Stack
- LangSmith for end-to-end traces of every graph run, node by node
- Amazon CloudWatch for Fargate task metrics, ALB access logs, and request counts
- Structured logging at each node for graph-specific events that generic infrastructure tools cannot capture: include `graph_id`, `node_name`, `run_id`, and `token_count` in a consistent JSON schema, then ship to CloudWatch Logs Insights for queryable graph-level visibility
- Mermaid diagrams committed to the repository so new engineers understand the execution flow without reading code
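A minimal helper for that JSON schema might look like the following (field names match the schema above; logger configuration and handlers are left to your container setup):

```python
import json
import logging
import time

logger = logging.getLogger("graph_events")

def log_node_event(graph_id: str, node_name: str, run_id: str,
                   token_count: int) -> str:
    """Emit one structured JSON line per node execution."""
    record = {
        "timestamp": time.time(),
        "graph_id": graph_id,
        "node_name": node_name,
        "run_id": run_id,
        "token_count": token_count,
    }
    line = json.dumps(record)
    # Anything a Fargate task writes to stdout/stderr lands in CloudWatch
    # Logs, where Logs Insights can filter and aggregate on these JSON fields
    logger.info(line)
    return line
```

Calling this at the end of each node keeps every run queryable by graph, node, and token spend.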
Scaling Configuration
Checkpointing is what makes horizontal scaling safe. When a Fargate task restarts or a new task spins up under load, the Redis or Aurora backend restores state exactly where the previous task left it. Configure auto-scaling groups on ECS based on ALB request count rather than CPU, since LangGraph workloads are often IO-bound rather than compute-bound.
Security Hardening
| Control | Implementation |
|---|---|
| IAM permissions | Least-privilege roles per ECS task definition |
| Secret management | SSM Parameter Store or AWS Secrets Manager — never environment files baked into images |
| Network isolation | ECS tasks in private subnets; only the ALB exposed publicly |
| TLS termination | Enforce HTTPS at the ALB and redirect all HTTP |
| Health checks | Configure both ALB target group checks and container-level health endpoints |
Real Production Costs and Business Results
AWS Reference Architecture Costs
A typical multi-agent deployment running a supervisor plus five specialized agents on AWS Fargate with Bedrock inference costs approximately $245 to $265 per month for 10,000 user interactions.
| Cost Component | Approximate Monthly Cost |
|---|---|
| ECS Fargate compute | $90 to $110 |
| Bedrock inference (model-dependent) | $120 to $130 |
| Aurora Serverless or ElastiCache | $20 to $30 |
| CloudWatch logs and metrics | $5 to $10 |
| ALB and data transfer | $10 to $15 |
| Total (10,000 interactions) | $245 to $265 |
Costs shift with model selection — smaller models cut inference spend significantly. Exact figures vary by AWS region and traffic patterns.
Documented Business Results
Insurance carrier (claims summarization): A graph-style orchestration workflow for claims summaries reached 95 percent user acceptance on visual collaboration interfaces, according to McKinsey tracking of real deployments.
E-commerce customer support (AWS Fargate + Bedrock): A supervisor agent routing queries across order management, troubleshooting, and personalization subagents delivered real-time responses via AppSync. Prospecting efficiency doubled in a comparable McKinsey supplier test using the same multi-agent pattern.
These results only materialize once the graph runs in production with real traffic. Teams that keep graphs in notebooks do not see these numbers show up in throughput or token spend metrics.
Common Deployment Problems and How to Fix Them
Breaking Changes from Library Updates
LangGraph moves quickly. Teams that do not pin dependency versions spend weekends rewriting edge definitions after a minor upgrade.
Fix: Pin every package version in requirements.txt, maintain a staging environment that mirrors production, and run integration tests against a pinned LangGraph version before promoting any upgrade.
Token Cost Spikes in Multi-Agent Loops
Multi-agent graphs can enter feedback loops that amplify token usage unexpectedly.
Fix: Add time-travel debugging to replay exact execution paths, combine rule-based routing with LLM-based decisions at critical junctions, and enforce hard per-run limits: cap supersteps with `recursion_limit` in the run config, and track a token budget in graph state so runaway loops abort instead of silently burning spend.
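LangGraph has no built-in token budget, so one workable pattern is to accumulate usage in graph state and fail fast. This sketch is illustrative: the `tokens_used` field name and the 8,000-token default are assumptions, and each LLM node would call it with the usage figure reported by your provider:

```python
class TokenBudgetExceeded(RuntimeError):
    """Raised when a single graph run spends more tokens than allowed."""

def charge_tokens(state: dict, tokens_this_call: int,
                  budget: int = 8000) -> dict:
    """Accumulate token usage in graph state; abort the run if over budget.

    Call from each LLM node with the provider-reported usage, and merge
    the returned delta back into graph state so the total persists in
    checkpoints alongside the rest of the conversation.
    """
    total = state.get("tokens_used", 0) + tokens_this_call
    if total > budget:
        raise TokenBudgetExceeded(
            f"run spent {total} tokens, budget is {budget}")
    return {"tokens_used": total}
```

Because the running total lives in checkpointed state, the budget survives restarts and applies across an entire multi-agent run, not just a single node.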
Vendor Lock-in with Managed Platforms
LangSmith Platform fees accumulate at scale, and some teams are uncomfortable with external telemetry for sensitive workloads.
Fix: The Fargate self-hosted path runs without any managed runtime layer. Zero telemetry leaves your infrastructure.
Observability Gaps Outside LangSmith
CloudWatch captures infrastructure-level metrics but nothing graph-specific.
Fix: Add structured JSON logs at each node with a consistent schema including graph_id, node_name, run_id, and token_count. Ship these to CloudWatch Logs Insights for queryable graph-level visibility.
Teams Reverting to Plain Python
Some engineers abandon LangGraph citing dependency bloat or perceived complexity.
Fix: Start small. Pick one workflow, prove value on a scoped deployment, then expand. The framework pays off once reusable components span multiple workflows.
Most production failures fall into a small set of repeatable patterns. Recognizing them early saves significant debugging time.
Frequently Asked Questions
How do I deploy a LangGraph agent to AWS Fargate?
Clone the official Fargate template from the LangChain GitHub organization. Configure your .env and load secrets into SSM Parameter Store. Run the production deploy script — it builds the Docker image, pushes to ECR, and provisions ECS tasks with an Application Load Balancer. The full stack tears down and recreates in minutes because everything is defined as code.
What does it actually cost to run LangGraph in production?
Expect roughly $245 to $265 per month for 10,000 interactions on a standard AWS multi-agent setup using Fargate and Bedrock. Add persistence and monitoring on top. Exact figures shift significantly with model choice and traffic volume.
Can I run LangGraph without using the LangSmith Platform?
Yes. Self-hosted Docker with FastAPI, or the full AWS Fargate CDK template, gives complete control without any managed runtime. You give up the one-command Agent Server but keep everything inside your own accounts with no external telemetry.
How do checkpointing and interrupts work in a live deployment?
Checkpointing saves graph state to SQL or Redis after every node completes. On restart or failover, the graph resumes exactly where it left off. Interrupts pause execution at any node you configure, wait for external input or approval, then continue automatically. Both features work identically whether you use the managed platform or raw Fargate.
Which is better for production: LangSmith CLI or a custom Docker setup?
The CLI wins for speed and built-in observability. Custom Docker or Fargate wins when you need zero external dependencies and full data residency. Most teams start with the CLI to prove business value, then migrate to self-hosted once traffic volumes justify the added infrastructure work.
What persistence backend should I use?
Use MemorySaver only in local tests and unit tests. For production, choose a SQL-based checkpointer (LangGraph ships `SqliteSaver` for lightweight single-instance setups and `PostgresSaver` for PostgreSQL) or a Redis checkpointer for lower-latency state reads. Aurora Serverless is the right choice on AWS if you want managed scaling without provisioned capacity.
The graph only pays off in production
Every feature covered here — checkpointing, interrupts, streaming, horizontal scaling — only delivers real business value once the graph is running live with real traffic. Teams that keep graphs in notebooks do not see the throughput gains or cost efficiencies show up in their metrics.
Pick one deployment path, validate locally, and ship it.
Build with Octopus Builds
Need help turning the article into an actual system?
We design the operating model, product surface, and delivery plan behind AI systems that need to ship cleanly and keep working in production.