What Is LangGraph and Why Deploy It Now
You spent weeks perfecting a multi-agent graph in your local Jupyter notebook. It handles state correctly, pauses for human input at the right moment, and streams responses cleanly to the front end. Then you push it live and everything breaks. Context disappears between turns. Costs spike without warning. Debugging becomes a guessing game.
This is the exact wall where most AI engineering teams stall.
LangGraph sits inside the LangChain ecosystem. It lets you build stateful, cyclic multi-agent workflows where you define nodes for LLM calls or tool use, connect them with edges, and add conditional routing logic. The graph remembers state across conversation turns, pauses for human review when you configure it to, and streams results in real time.
Six months ago most teams treated LangGraph graphs as sophisticated prototypes. That changed fast. Cloud-managed runtimes and self-hosted infrastructure-as-code templates turned experimental setups into repeatable production deployments. Enterprises across insurance, banking, and customer support now run them at scale.
The business case is clear. Teams that fully reimagine workflows around reusable agents cut nonessential work by 30 to 50 percent, according to McKinsey research on AI-powered workflow redesign. Those gains only show up after the graph leaves the notebook.
This guide walks through every step — from local setup to live deployment — covering three proven production paths, real cost figures, and the most common mistakes teams make along the way.
Core Production Features That Changed Everything in Late 2025
Several capabilities arrived together in late 2025, making LangGraph production-ready in a way it simply was not before.
Checkpointing
The graph saves its exact state after every node execution. You can resume, rewind, or branch from any point in a conversation. Long-running sessions no longer lose context on failure.
Interrupts
The graph pauses at any step you define, waits for external approval or input, then continues automatically. This is critical for workflows in compliance-heavy industries.
Streaming APIs
Partial results push to the front end without blocking the full response. Users see incremental output rather than waiting for the entire graph to finish.
Persistence Backends
The persistence backend determines how well these features hold up under real traffic. MemorySaver works in tests. SQL or Redis checkpointers are required for production. The right backend is the difference between a demo and a deployable service.
The LangGraph Platform rebranded as LangSmith Deployment around October 2025. It standardized the runtime layer so teams no longer need to glue persistence, streaming, and authentication together by hand.
Three Proven Deployment Options
LangSmith CLI — One-Command Deploy
Time to live: Minutes
Best for: Teams already using LangSmith tracing who want zero ops overhead.
The CLI packages your compiled graph into an Agent Server microservice, handling HTTP endpoints, authentication, persistence, and horizontal scaling behind a single command. Monitoring flows directly into LangSmith dashboards.
Key trade-off: Platform fees accumulate at scale, and all telemetry flows through an external service.
AWS Fargate with CDK Templates
Time to live: 10 to 20 minutes
Best for: Organizations requiring VPC isolation, data residency, or regulated-industry compliance.
The official LangGraph AWS deployment template ships Docker, ECS Fargate, CloudFormation, and auto-scaling out of the box. The CDK version lets you define the entire stack in typed code.
Key trade-off: More initial IaC setup compared to the managed CLI path.
Bedrock AgentCore for Multi-Agent Workflows
Time to live: Under 15 minutes
Best for: Heavy Bedrock users who want minimal deviation from the AWS-native toolchain.
The AWS Bedrock AgentCore toolkit integrates directly with LangGraph's StateGraph API. It handles supervisor-agent coordination, parallel node execution, and tool calling through MCP. Persistence lands in Aurora Serverless automatically.
Key trade-off: Tied to AWS inference; less portable than the self-hosted Fargate path.
There is no single right path. The best option depends on your team's speed requirements, existing cloud footprint, and tolerance for vendor dependency.
Prerequisites and Local Setup Before You Deploy
Skipping local validation wastes hours when behavior differs in the cloud. Complete every step here before touching a deploy script.
What You Need Before Starting
- A working LangGraph graph (state defined with `TypedDict` and `add_messages`)
- Nodes built for LLM calls and tool use, connected with `conditional_edges` using `tools_condition`
- A checkpointer added before `compile()`
- API keys for your LLM provider stored in environment variables
- For AWS paths: an account with permissions for ECS, ECR, Bedrock, and SSM Parameter Store
Deploying with the LangSmith CLI
- Install `langgraph-cli` via pip
- Compile your graph exactly as you do locally, with a checkpointer attached
- Configure your `langgraph.json` file with environment variable references
- Run `langgraph deploy` from the project root
- Confirm the health endpoint returns 200 and run a sample conversation end-to-end
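A minimal `langgraph.json` follows this shape (the module path and graph name are illustrative; secrets stay in the referenced `.env` file rather than in the config itself):

```json
{
  "dependencies": ["."],
  "graphs": {
    "agent": "./my_agent/graph.py:graph"
  },
  "env": ".env"
}
```

The `graphs` entry maps a deployment name to the module path and variable of your compiled graph.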
Deploying to AWS Fargate with CDK
- Clone the official Fargate template repository from the LangChain GitHub organization
- Configure the `.env` file with your LLM provider keys and region settings
- Store sensitive secrets in AWS SSM Parameter Store rather than in environment files
- Run the production deploy script — it builds the Docker image, pushes to ECR, and provisions ECS tasks
- Verify the Application Load Balancer health checks pass
- Enable CloudWatch alarms on token usage, p99 latency, and task CPU
- Document the graph structure as a Mermaid diagram for future engineers
Infrastructure the template provisions:
| Component | Purpose |
|---|---|
| Amazon ECR | Container image storage |
| AWS ECS Fargate | Serverless task execution, no EC2 management |
| Application Load Balancer | Routes traffic, terminates TLS |
| AWS SSM Parameter Store | Secrets injection at runtime |
| Amazon CloudWatch | Metrics, logs, and alarms |
| Amazon Aurora or ElastiCache | Persistent checkpointing backend |
Deploying with Bedrock AgentCore
- Define your `StateGraph` the same way you would for any other deployment
- Configure the AgentCore agent definition with tool permissions and memory scope
- Attach Bedrock knowledge bases or Aurora for persistent context
- Wire the AppSync real-time endpoint to your front end
- Enable cross-region inference as a single configuration toggle if needed
- Run a load test to validate supervisor-to-subagent routing under concurrent requests
Local Validation Checklist
Only after this checklist passes is the graph ready for a production deployment.
Install packages
Confirm you have the latest `langgraph`, cloud SDKs, and any vector store clients installed.
Run with MemorySaver
The graph should complete a full cycle without errors before you swap in a production checkpointer.
Test interrupts
Verify that execution pauses and resumes correctly on human input.
Test state persistence
Confirm that state survives between separate invocations, not just within a single session.
Check streaming
Partial tokens should reach the client without blocking the full response.
Confirm tool calls
All tools should resolve correctly and return expected outputs before you go live.
Monitoring, Scaling, and Hardening Your Live App
Production means observability before anything else. A graph that works in testing but fails silently in production is worse than one that fails loudly during development.
Observability Stack
- LangSmith for end-to-end traces of every graph run, node by node
- Amazon CloudWatch for Fargate task metrics, ALB access logs, and request counts
- Structured logging at each node for graph-specific events that generic infrastructure tools cannot capture: include `graph_id`, `node_name`, `run_id`, and `token_count` in a consistent JSON schema, then ship to CloudWatch Logs Insights for queryable graph-level visibility
- Mermaid diagrams committed to the repository so new engineers understand the execution flow without reading code
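A minimal helper for that JSON schema might look like the following (field names match the schema above; logger configuration and handlers are left to your container setup):

```python
import json
import logging
import time

logger = logging.getLogger("graph_events")

def log_node_event(graph_id: str, node_name: str, run_id: str,
                   token_count: int) -> str:
    """Emit one structured JSON line per node execution."""
    record = {
        "timestamp": time.time(),
        "graph_id": graph_id,
        "node_name": node_name,
        "run_id": run_id,
        "token_count": token_count,
    }
    line = json.dumps(record)
    # Anything a Fargate task writes to stdout/stderr lands in CloudWatch
    # Logs, where Logs Insights can filter and aggregate on these JSON fields
    logger.info(line)
    return line
```

Calling this at the end of each node keeps every run queryable by graph, node, and token spend.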
Scaling Configuration
Checkpointing is what makes horizontal scaling safe. When a Fargate task restarts or a new task spins up under load, the Redis or Aurora backend restores state exactly where the previous task left it. Configure auto-scaling groups on ECS based on ALB request count rather than CPU, since LangGraph workloads are often IO-bound rather than compute-bound.
Security Hardening
| Control | Implementation |
|---|---|
| IAM permissions | Least-privilege roles per ECS task definition |
| Secret management | SSM Parameter Store or AWS Secrets Manager — never environment files baked into images |
| Network isolation | ECS tasks in private subnets; only the ALB exposed publicly |
| TLS termination | Enforce HTTPS at the ALB and redirect all HTTP |
| Health checks | Configure both ALB target group checks and container-level health endpoints |
Real Production Costs and Business Results
AWS Reference Architecture Costs
A typical multi-agent deployment running a supervisor plus five specialized agents on AWS Fargate with Bedrock inference costs approximately $245 to $265 per month for 10,000 user interactions.
| Cost Component | Approximate Monthly Cost |
|---|---|
| ECS Fargate compute | $90 to $110 |
| Bedrock inference (model-dependent) | $120 to $130 |
| Aurora Serverless or ElastiCache | $20 to $30 |
| CloudWatch logs and metrics | $5 to $10 |
| ALB and data transfer | $10 to $15 |
| Total (10,000 interactions) | $245 to $265 |
Costs shift with model selection — smaller models cut inference spend significantly. Exact figures vary by AWS region and traffic patterns.
Documented Business Results
Insurance carrier (claims summarization): A graph-style orchestration workflow for claims summaries reached 95 percent user acceptance on visual collaboration interfaces, according to McKinsey tracking of real deployments.
E-commerce customer support (AWS Fargate + Bedrock): A supervisor agent routing queries across order management, troubleshooting, and personalization subagents delivered real-time responses via AppSync. Prospecting efficiency doubled in a comparable McKinsey supplier test using the same multi-agent pattern.
These results only materialize once the graph runs in production with real traffic. Teams that keep graphs in notebooks do not see these numbers show up in throughput or token spend metrics.
Common Deployment Problems and How to Fix Them
Breaking Changes from Library Updates
LangGraph moves quickly. Teams that do not pin dependency versions spend weekends rewriting edge definitions after a minor upgrade.
Fix: Pin every package version in requirements.txt, maintain a staging environment that mirrors production, and run integration tests against a pinned LangGraph version before promoting any upgrade.
Token Cost Spikes in Multi-Agent Loops
Multi-agent graphs can enter feedback loops that amplify token usage unexpectedly.
Fix: Add time-travel debugging to replay exact execution paths, combine rule-based routing with LLM-based decisions at critical junctions, and enforce hard per-run limits: cap supersteps with `recursion_limit` in the run config, and track a token budget in graph state so runaway loops abort instead of silently burning spend.
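LangGraph has no built-in token budget, so one workable pattern is to accumulate usage in graph state and fail fast. This sketch is illustrative: the `tokens_used` field name and the 8,000-token default are assumptions, and each LLM node would call it with the usage figure reported by your provider:

```python
class TokenBudgetExceeded(RuntimeError):
    """Raised when a single graph run spends more tokens than allowed."""

def charge_tokens(state: dict, tokens_this_call: int,
                  budget: int = 8000) -> dict:
    """Accumulate token usage in graph state; abort the run if over budget.

    Call from each LLM node with the provider-reported usage, and merge
    the returned delta back into graph state so the total persists in
    checkpoints alongside the rest of the conversation.
    """
    total = state.get("tokens_used", 0) + tokens_this_call
    if total > budget:
        raise TokenBudgetExceeded(
            f"run spent {total} tokens, budget is {budget}")
    return {"tokens_used": total}
```

Because the running total lives in checkpointed state, the budget survives restarts and applies across an entire multi-agent run, not just a single node.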
Vendor Lock-in with Managed Platforms
LangSmith Platform fees accumulate at scale, and some teams are uncomfortable with external telemetry for sensitive workloads.
Fix: The Fargate self-hosted path runs without any managed runtime layer. Zero telemetry leaves your infrastructure.
Observability Gaps Outside LangSmith
CloudWatch captures infrastructure-level metrics but nothing graph-specific.
Fix: Add structured JSON logs at each node with a consistent schema including graph_id, node_name, run_id, and token_count. Ship these to CloudWatch Logs Insights for queryable graph-level visibility.
Teams Reverting to Plain Python
Some engineers abandon LangGraph citing dependency bloat or perceived complexity.
Fix: Start small. Pick one workflow, prove value on a scoped deployment, then expand. The framework pays off once reusable components span multiple workflows.
Most production failures fall into a small set of repeatable patterns. Recognizing them early saves significant debugging time.
Frequently Asked Questions
How do I deploy a LangGraph agent to AWS Fargate?
Clone the official Fargate template from the LangChain GitHub organization. Configure your .env and load secrets into SSM Parameter Store. Run the production deploy script — it builds the Docker image, pushes to ECR, and provisions ECS tasks with an Application Load Balancer. The full stack tears down and recreates in minutes because everything is defined as code.
What does it actually cost to run LangGraph in production?
Expect roughly $245 to $265 per month for 10,000 interactions on a standard AWS multi-agent setup using Fargate and Bedrock. Add persistence and monitoring on top. Exact figures shift significantly with model choice and traffic volume.
Can I run LangGraph without using the LangSmith Platform?
Yes. Self-hosted Docker with FastAPI, or the full AWS Fargate CDK template, gives complete control without any managed runtime. You give up the one-command Agent Server but keep everything inside your own accounts with no external telemetry.
How do checkpointing and interrupts work in a live deployment?
Checkpointing saves graph state to SQL or Redis after every node completes. On restart or failover, the graph resumes exactly where it left off. Interrupts pause execution at any node you configure, wait for external input or approval, then continue automatically. Both features work identically whether you use the managed platform or raw Fargate.
Which is better for production: LangSmith CLI or a custom Docker setup?
The CLI wins for speed and built-in observability. Custom Docker or Fargate wins when you need zero external dependencies and full data residency. Most teams start with the CLI to prove business value, then migrate to self-hosted once traffic volumes justify the added infrastructure work.
What persistence backend should I use?
Use MemorySaver only in local tests and unit tests. For production, choose a SQL-based checkpointer (LangGraph ships `SqliteSaver` for lightweight single-instance setups and `PostgresSaver` for PostgreSQL) or a Redis checkpointer for lower-latency state reads. Aurora Serverless is the right choice on AWS if you want managed scaling without provisioned capacity.
The graph only pays off in production
Every feature covered here — checkpointing, interrupts, streaming, horizontal scaling — only delivers real business value once the graph is running live with real traffic. Teams that keep graphs in notebooks do not see the throughput gains or cost efficiencies show up in their metrics.
Pick one deployment path, validate locally, and ship it.
Build with Octopus Builds
Need help turning the article into an actual system?
We design the operating model, product surface, and delivery plan behind AI systems that need to ship cleanly and keep working in production.