From AWS Lock-In to LLM Freedom: Building a Cost-Effective Hybrid-Cloud AI Stack
Self-hosting LiteLLM, Langflow, and Khoj for maximum flexibility

Earlier this year, I wrote about reducing my AWS costs through a hybrid-cloud approach. The core insight: the majority of my AWS costs came from compute (Lambda) usage, while the AI “heavy lifting” performed by AWS Bedrock constituted only 5% of my costs. By moving compute on-premises, I reduced costs significantly. However, maintaining the on-premises software stack proved more expensive than anticipated.
The Solution That Wasn’t
Problems with the on-premises solution:
- The Bedrock Proxy API implementation is incomplete and does not support all OpenAI API calls. Implementing all missing APIs is non-trivial.
- Dependencies on AWS Secrets Manager, Key Management Services, and Systems Manager Parameter Store cost $0.32 per day.
Replacing AWS dependencies? Easy — maybe a day of work. Implementing full OpenAI API support? That’s a different story. It would require significant upfront development and constant maintenance every time OpenAI updates their API.
I actually started building this, but quickly realized I was signing up for endless maintenance work. That’s a fool’s errand, so I abandoned the approach.
If you’re curious about my implementation before I archived it, you can check out the repos where I split the front-end API and Bedrock adapter: here and here
Back to the drawing board
Defining the Requirements
What problem am I trying to solve? I want a self-hosted piece of software that presents an OpenAI-compatible API while routing requests to arbitrary LLM providers. This would unlock compatibility between OpenAI-only tools (which are numerous) and any LLM — including self-hosted models and Anthropic’s offerings on AWS Bedrock.
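The contract is simple to sketch: any tool that speaks OpenAI's `/v1/chat/completions` only needs a different base URL and API key to talk to such a proxy. A minimal illustration of why this works — the host name and key below are placeholders, not part of any real deployment:

```python
import json

# An OpenAI-style chat completion request. The proxy maps "model" to
# whichever backend (Bedrock, self-hosted, OpenAI) is configured for it,
# so the payload shape never changes per provider.
payload = {
    "model": "us.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "messages": [{"role": "user", "content": "Say hello."}],
}

# The only provider-specific pieces are the endpoint and the key:
endpoint = "https://litellm.example.com/v1/chat/completions"  # placeholder host
headers = {
    "Authorization": "Bearer sk-proxy-key",  # placeholder key
    "Content-Type": "application/json",
}
body = json.dumps(payload)
```

Swapping providers then means changing `endpoint` and the key — never the calling tool.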
A Note on Motivation
I maintain an OpenAI account, so this isn’t about avoiding their service. Rather, it’s about architectural flexibility: the ability to choose the best LLM for each use case without being locked into a single provider.
Why Not Extend Existing Solutions?
The Bedrock API Gateway sample handles the Bedrock integration but has two limitations:
- It doesn’t support all OpenAI APIs
- It doesn’t support self-hosted models
I considered two options: extend the project and contribute back via pull request, or find an existing solution that already meets my needs. While I’m capable of implementing the missing features, that doesn’t make it the right choice. The maintenance burden and opportunity cost led me to search for existing open-source alternatives instead.
Introducing LiteLLM
LiteLLM solved everything. I genuinely wish I had known about it a year or two ago — it would have saved me from building custom solutions. Their Docker and Kubernetes deployment docs got me up and running in minutes, and since it checked all my boxes while working perfectly with my existing Kubernetes and Ansible setup, I stopped searching.
LiteLLM - Getting Started: https://docs.litellm.ai (source: https://github.com/BerriAI/litellm)
What Changed?
My API keys now live on-premises in the LiteLLM database. That means I no longer need:
- AWS Secrets Manager
- Key Management Services
- Systems Manager Parameter Store
Since I am not implementing IAM Roles Anywhere, my AWS footprint consists solely of an IAM user for generating access keys. I added a CDK stack to my bedrock-access-gateway-cdk repository to automate this one-time provisioning step.
CDK Deployment Overview
This CDK stack deploys an IAM user and group with the permissions needed to call AWS Bedrock APIs. Instructions for generating the AWS access key and secret key can be found here.
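For reference, the permissions involved are roughly of this shape — a hedged sketch, not the stack's actual policy (the exact actions and resource scoping live in `bedrock_api_user_stack.py`, and you would normally scope `Resource` to specific model ARNs):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "*"
    }
  ]
}
```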
CDK source: bedrock_api_user_stack.py
PowerShell Deployment Script: deploy-bedrock-api-users-stack.ps1
.\deploy-bedrock-api-users-stack.ps1 `
-AwsAccountId "987654321098" `
-AwsRegion "us-west-2" `
-UserName "BedrockApiUser" `
-GroupName "BedrockApiUsers" `
-RemovalPolicy "DESTROY"
# Environment variables may be substituted for command-line parameters
$env:BEDROCK_API_USER_NAME = "BedrockApiUser"
$env:BEDROCK_API_USERS_GROUP_NAME = "BedrockApiUsers"
.\deploy-bedrock-api-users-stack.ps1 `
-AwsAccountId "987654321098" `
-AwsRegion "us-west-2" `
-RemovalPolicy "DESTROY"
Bash Deployment Script: deploy-bedrock-api-users-stack.sh
./scripts/deploy-bedrock-api-users-stack.sh \
--aws-account-id 987654321098 \
--aws-region us-west-2 \
--removal-policy RETAIN \
--user-name "BedrockApiUser" \
--group-name "BedrockApiUsers"
LiteLLM Deployment Overview
I deployed my LiteLLM stack to Kubernetes using Ansible. The playbook below has been modified from my production configuration to make it suitable for others to adapt.
Prerequisites
- Ansible knowledge required
- Alternative: Use the LiteLLM Helm chart if unfamiliar with Ansible
Important Notes
- This code is provided as-is and requires modification for your environment
- DNS Provider: This example uses Cloudflare. To use AWS Route 53, replace Cloudflare-specific sections with the Route 53 example below
Deployment Commands:
# deploy to dev
ansible-playbook -vv -i inventory/litellm.yml deploy.yml \
--extra-vars="@group_vars/secrets.yml" \
--limit dev-cp-1
# deploy to ppe
ansible-playbook -vv -i inventory/litellm.yml deploy.yml \
--extra-vars="@group_vars/secrets.yml" \
--limit ppe-cp-1
# deploy to prod
ansible-playbook -vv -i inventory/litellm.yml deploy.yml \
--extra-vars="@group_vars/secrets.yml" \
--limit prod-cp-1
Deployment Playbook:
# deploy.yml
---
- hosts: all
  become: yes
  become_user: ansible
  pre_tasks:
    - name: Create directory for manifests
      file:
        path: "{{ manifests_dir }}"
        state: directory
        mode: '0755'

    - name: Create namespace manifest
      copy:
        dest: "{{ manifests_dir }}/00-app-namespace.yaml"
        content: |
          apiVersion: v1
          kind: Namespace
          metadata:
            name: {{ app_namespace }}
    # Database
    - name: Create PostgreSQL credentials secret
      copy:
        dest: "{{ manifests_dir }}/00-database-credentials-secret.yaml"
        content: |
          ---
          apiVersion: v1
          kind: Secret
          metadata:
            name: {{ app_name }}-database-credentials
            namespace: {{ app_namespace }}
          stringData:
            POSTGRES_USER: "{{ postgres_user }}"
            POSTGRES_PASSWORD: "{{ postgres_password }}"
            POSTGRES_DB: "{{ postgres_db }}"
            POSTGRES_HOST: "{{ app_name }}-database.{{ app_namespace }}.svc.cluster.local"
            POSTGRES_PORT: "5432"
    - name: Create LiteLLM database StatefulSet manifest
      copy:
        dest: "{{ manifests_dir }}/00-database-statefulset.yaml"
        content: |
          ---
          # Headless Service for the StatefulSet
          apiVersion: v1
          kind: Service
          metadata:
            name: {{ app_name }}-database
            namespace: {{ app_namespace }}
          spec:
            clusterIP: None
            selector:
              app: {{ app_name }}-database
            ports:
              - port: 5432
                name: tcp
          ---
          apiVersion: apps/v1
          kind: StatefulSet
          metadata:
            name: {{ app_name }}-database
            namespace: {{ app_namespace }}
          spec:
            serviceName: {{ app_name }}-database
            replicas: 1
            selector:
              matchLabels:
                app: {{ app_name }}-database
            template:
              metadata:
                labels:
                  app: {{ app_name }}-database
              spec:
                containers:
                  - name: {{ app_name }}-database
                    image: {{ database_container_image }}
                    resources:
                      requests:
                        cpu: 1000m
                        memory: 1Gi
                      limits:
                        cpu: 2000m
                        memory: 2Gi
                    ports:
                      - containerPort: 5432
                    env:
                      - name: POSTGRES_DB
                        valueFrom:
                          secretKeyRef:
                            name: {{ app_name }}-database-credentials
                            key: POSTGRES_DB
                      - name: POSTGRES_USER
                        valueFrom:
                          secretKeyRef:
                            name: {{ app_name }}-database-credentials
                            key: POSTGRES_USER
                      - name: POSTGRES_PASSWORD
                        valueFrom:
                          secretKeyRef:
                            name: {{ app_name }}-database-credentials
                            key: POSTGRES_PASSWORD
                      - name: PGDATA
                        value: /var/lib/postgresql/data/pgdata
                    readinessProbe:
                      exec:
                        command:
                          - pg_isready
                          - -U
                          - postgres
                      initialDelaySeconds: 30
                      periodSeconds: 15
                      timeoutSeconds: 10
                      successThreshold: 1
                      failureThreshold: 10
                    volumeMounts:
                      - name: database-data
                        mountPath: /var/lib/postgresql/data/
            volumeClaimTemplates:
              - metadata:
                  name: database-data
                spec:
                  accessModes: [ "ReadWriteOnce" ]
                  storageClassName: {{ db_storage_class }}
                  resources:
                    requests:
                      storage: {{ db_storage_size }}
    - name: Apply Kubernetes manifests
      shell: "kubectl apply -f {{ manifests_dir }}/{{ item }}"
      with_items:
        - 00-app-namespace.yaml
        - 00-database-credentials-secret.yaml
        - 00-database-statefulset.yaml
  tasks:
    # Part 1: Certificate Management
    ## Cloudflare
    - name: Create Cloudflare credentials secret
      copy:
        dest: "{{ manifests_dir }}/01-cloudflare-credentials-secret.yaml"
        content: |
          ---
          apiVersion: v1
          kind: Secret
          metadata:
            name: {{ app_name }}-cloudflare-credentials
            namespace: {{ app_namespace }}
          type: Opaque
          stringData:
            dns-api-token: "{{ cloudflare_prod_dns_api_token }}"
            global-api-key: "{{ cloudflare_prod_global_api_key }}"

    - name: Create Cloudflare cert-manager staging issuer
      copy:
        dest: "{{ manifests_dir }}/01-cloudflare-staging-issuer.yaml"
        content: |
          ---
          apiVersion: cert-manager.io/v1
          kind: Issuer
          metadata:
            name: {{ cloudflare_acme_test_issuer }}
            namespace: {{ app_namespace }}
          spec:
            acme:
              server: https://acme-staging-v02.api.letsencrypt.org/directory
              email: "{{ acme_staging_email }}"
              privateKeySecretRef:
                name: {{ cloudflare_acme_test_issuer }}
              solvers:
                - selector: {}
                  dns01:
                    cloudflare:
                      apiTokenSecretRef:
                        name: {{ app_name }}-cloudflare-credentials
                        key: dns-api-token

    - name: Create Cloudflare cert-manager production issuer
      copy:
        dest: "{{ manifests_dir }}/01-cloudflare-prod-issuer.yaml"
        content: |
          ---
          apiVersion: cert-manager.io/v1
          kind: Issuer
          metadata:
            name: {{ cloudflare_acme_prod_issuer }}
            namespace: {{ app_namespace }}
          spec:
            acme:
              server: https://acme-v02.api.letsencrypt.org/directory
              email: "{{ acme_prod_email }}"
              privateKeySecretRef:
                name: {{ cloudflare_acme_prod_issuer }}
              solvers:
                - selector: {}
                  dns01:
                    cloudflare:
                      apiTokenSecretRef:
                        name: {{ app_name }}-cloudflare-credentials
                        key: dns-api-token
    # Part 2: LiteLLM Configs
    - name: Create LiteLLM config
      copy:
        dest: "{{ manifests_dir }}/02-litellm-config.yaml"
        content: |
          ---
          apiVersion: v1
          kind: ConfigMap
          metadata:
            name: {{ app_name }}-config
            namespace: {{ app_namespace }}
          data:
            config.yaml: |
              # https://docs.litellm.ai/docs/proxy/config_management
              # include:
              #   - model_config.yaml
              router_settings:
                debug_level: "{{ log_level }}"
              litellm_settings:
                telemetry: False
                drop_params: true # Drop unsupported params https://docs.litellm.ai/docs/completion/drop_params#openai-proxy-usage
                request_timeout: 600 # Raise a Timeout error if a call takes longer than 600 seconds; defaults to 6000 seconds if not set
                set_verbose: {{ set_verbose }}
                json_logs: true # Emit debug logs in JSON format
              general_settings:
                master_key: os.environ/LITELLM_MASTER_KEY
                database_url: os.environ/DATABASE_URL
                # Set up Slack alerting - get alerts on LLM exceptions, budget alerts, slow LLM responses
                alerting: {{ app_alerting }}
                # Batch write spend updates
                proxy_batch_write_at: {{ proxy_batch_write_at }}
                # Limit DB connections to: max number of DB connections / number of LiteLLM proxy instances (around 10-20 is a good number)
                database_connection_pool_limit: {{ database_connection_pool_limit }}
                allow_requests_on_db_unavailable: {{ allow_requests_on_db_unavailable }} # Allow requests to still be processed even if the DB is unavailable
                background_health_checks: {{ background_health_checks }} # Enable background health checks
                health_check_interval: {{ health_check_interval }} # Frequency of background health checks
            model_config.yaml: |
              # https://docs.litellm.ai/docs/proxy/model_management
              # litellm_params: https://docs.litellm.ai/docs/completion/input#input-params-1
              model_list:
                # Anthropic Claude Sonnet 4.5 v1
                - model_name: us.anthropic.claude-sonnet-4-5-20250929-v1:0
                  litellm_params:
                    model: us.anthropic.claude-sonnet-4-5-20250929-v1:0
                    litellm_credential_name: default_bedrock_credential
                  model_info:
                    custom_llm_provider: bedrock
                # Anthropic Claude Opus 4.1 v1
                - model_name: us.anthropic.claude-opus-4-1-20250805-v1:0
                  litellm_params:
                    model: us.anthropic.claude-opus-4-1-20250805-v1:0
                    litellm_credential_name: default_bedrock_credential
                  model_info:
                    custom_llm_provider: bedrock
                # Anthropic Claude Haiku 4.5 v1
                - model_name: us.anthropic.claude-haiku-4-5-20251001-v1:0
                  litellm_params:
                    model: us.anthropic.claude-haiku-4-5-20251001-v1:0
                    litellm_credential_name: default_bedrock_credential
                  model_info:
                    custom_llm_provider: bedrock
                # Stability Stable Image Ultra v1.1
                - model_name: stability.stable-image-ultra-v1:1
                  litellm_params:
                    model: stability.stable-image-ultra-v1:1
                    litellm_credential_name: default_bedrock_credential
                  model_info:
                    custom_llm_provider: bedrock
                    mode: image_generation
                # Stability Stable Diffusion 3.5 Large v1.0
                - model_name: stability.sd3-5-large-v1:0
                  litellm_params:
                    model: stability.sd3-5-large-v1:0
                    litellm_credential_name: default_bedrock_credential
                  model_info:
                    custom_llm_provider: bedrock
                    mode: image_generation
              # https://docs.litellm.ai/docs/proxy/configs#centralized-credential-management
              credential_list:
                - credential_name: default_bedrock_credential
                  credential_values:
                    aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
                    aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
                    aws_region_name: os.environ/AWS_REGION_NAME
                  credential_info:
                    description: "AWS Bedrock credentials"
                    custom_llm_provider: bedrock
    - name: Create LiteLLM secrets
      copy:
        dest: "{{ manifests_dir }}/02-litellm-secrets.yaml"
        content: |
          ---
          apiVersion: v1
          kind: Secret
          metadata:
            name: {{ app_name }}-secrets
            namespace: {{ app_namespace }}
          stringData:
            AWS_REGION_NAME: "{{ aws_region }}"
            AWS_REGION: "{{ aws_region }}"
            AWS_ACCESS_KEY_ID: "{{ bedrock_credentials_access_key }}"
            AWS_SECRET_ACCESS_KEY: "{{ bedrock_credentials_secret_key }}"
            # Your master key for the proxy server; use it to send /chat/completions requests, etc.
            LITELLM_MASTER_KEY: "{{ litellm_master_key }}"
            # Can NOT CHANGE THIS ONCE SET - it is used to encrypt/decrypt credentials stored in the DB. If the value of 'LITELLM_SALT_KEY' changes, your models cannot be retrieved from the DB
            LITELLM_SALT_KEY: ""
            DATABASE_URL: "postgresql://{{ postgres_user }}:{{ postgres_password }}@{{ app_name }}-database.{{ app_namespace }}.svc.cluster.local:5432/{{ postgres_db }}"
            UI_USERNAME: "{{ litellm_ui_username }}"
            UI_PASSWORD: "{{ litellm_ui_password }}"
            SLACK_WEBHOOK_URL: "{{ litellm_slack_webhook_url }}"
            SMTP_USERNAME: ""
            SMTP_PASSWORD: ""
    # Part 3: LiteLLM Proxy
    - name: Create LiteLLM Deployment
      copy:
        dest: "{{ manifests_dir }}/03-litellm-deployment.yaml"
        content: |
          ---
          apiVersion: apps/v1
          kind: Deployment
          metadata:
            name: {{ app_name }}
            namespace: {{ app_namespace }}
          spec:
            replicas: {{ desired_replicas }}
            selector:
              matchLabels:
                app: {{ app_name }}
            strategy:
              type: RollingUpdate
              rollingUpdate:
                maxSurge: 1
                maxUnavailable: 0
            template:
              metadata:
                labels:
                  app: {{ app_name }}
              spec:
                containers:
                  - name: {{ app_name }}
                    image: {{ app_container_image }}
                    ports:
                      - containerPort: {{ app_container_port }}
                    volumeMounts:
                      - name: config-volume
                        mountPath: /app/config.yaml
                        subPath: config.yaml
                      - name: config-volume
                        mountPath: /app/model_config.yaml
                        subPath: model_config.yaml
                    envFrom:
                      - secretRef:
                          name: {{ app_name }}-secrets
                    env:
                      - name: CONFIG_FILE_PATH
                        value: /app/config.yaml
                      - name: DISABLE_ADMIN_UI
                        value: "False"
                      - name: STORE_MODEL_IN_DB
                        value: "True"
                      - name: USE_PRISMA_MIGRATE
                        value: "True"
                      - name: SMTP_SENDER_EMAIL
                        value: {{ smtp_sender_email }}
                      - name: SMTP_HOST
                        value: {{ smtp_host }}
                      - name: SMTP_PORT
                        value: "465"
                      - name: SMTP_TLS
                        value: "True"
                      - name: SMTP_USERNAME
                        valueFrom:
                          secretKeyRef:
                            name: {{ app_name }}-secrets
                            key: SMTP_USERNAME
                      - name: SMTP_PASSWORD
                        valueFrom:
                          secretKeyRef:
                            name: {{ app_name }}-secrets
                            key: SMTP_PASSWORD
                      - name: LITELLM_LOG
                        value: {{ log_level }}
                      - name: LITELLM_MASTER_KEY
                        valueFrom:
                          secretKeyRef:
                            name: {{ app_name }}-secrets
                            key: LITELLM_MASTER_KEY
                      - name: POD_NAME
                        valueFrom:
                          fieldRef:
                            fieldPath: metadata.name
                    readinessProbe:
                      httpGet:
                        path: /health/readiness
                        port: {{ app_container_port }}
                      initialDelaySeconds: 30
                      periodSeconds: 15
                      timeoutSeconds: 10
                      successThreshold: 1
                      failureThreshold: 10
                    livenessProbe:
                      httpGet:
                        path: /health/liveness
                        port: {{ app_container_port }}
                      initialDelaySeconds: 30
                      periodSeconds: 15
                      timeoutSeconds: 10
                      successThreshold: 1
                      failureThreshold: 10
                    resources:
                      requests:
                        cpu: 2000m
                        memory: 2Gi
                      limits:
                        cpu: 4000m
                        memory: 8Gi
                volumes:
                  - name: config-volume
                    configMap:
                      name: {{ app_name }}-config
    - name: Create LiteLLM Service
      copy:
        dest: "{{ manifests_dir }}/03-litellm-service.yaml"
        content: |
          ---
          apiVersion: v1
          kind: Service
          metadata:
            name: {{ app_name }}
            namespace: {{ app_namespace }}
          spec:
            type: ClusterIP
            ports:
              - port: {{ app_container_port }}
                targetPort: {{ app_container_port }}
                protocol: TCP
            selector:
              app: {{ app_name }}

    - name: Create LiteLLM Ingress
      copy:
        dest: "{{ manifests_dir }}/03-litellm-ingress.yaml"
        content: |
          ---
          apiVersion: networking.k8s.io/v1
          kind: Ingress
          metadata:
            name: {{ app_name }}
            namespace: {{ app_namespace }}
            annotations:
              cert-manager.io/issuer: {{ app_cert_issuer }}
          spec:
            ingressClassName: {{ app_ingress_class }}
            tls:
              - hosts: {{ app_hosts }}
                secretName: {{ app_name }}-tls
            rules:
              - host: {{ app_hosts | first }}
                http:
                  paths:
                    - path: /
                      pathType: Prefix
                      backend:
                        service:
                          name: {{ app_name }}
                          port:
                            number: {{ app_container_port }}
    # Part 4: Apply Kubernetes manifests
    - name: Apply Kubernetes manifests
      shell: "kubectl apply -f {{ manifests_dir }}/{{ item }}"
      with_items:
        # Cloudflare ACME
        - 01-cloudflare-credentials-secret.yaml
        - 01-cloudflare-staging-issuer.yaml
        - 01-cloudflare-prod-issuer.yaml
        # LiteLLM Configs
        - 02-litellm-config.yaml
        - 02-litellm-secrets.yaml
        # LiteLLM Proxy
        - 03-litellm-deployment.yaml
        - 03-litellm-service.yaml
        - 03-litellm-ingress.yaml
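One detail worth decoding from the listing above: the `DATABASE_URL` handed to LiteLLM is just a standard PostgreSQL connection string assembled from the same values the database credentials secret uses. A quick sketch of how the pieces line up — the names are the inventory defaults and the password is a placeholder:

```python
# Assemble the LiteLLM DATABASE_URL from the database secret's parts.
# Host follows the pattern <app_name>-database.<namespace>.svc.cluster.local
user, password, db = "litellm", "CHANGE_ME", "litellm"
host = "litellm-database.litellm.svc.cluster.local"
port = 5432

database_url = f"postgresql://{user}:{password}@{host}:{port}/{db}"
```

If the proxy cannot reach the database at startup, this URL (wrong host suffix or namespace, usually) is the first thing to check.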
AWS Route 53 ACME
If desired, you can replace the Cloudflare ACME tasks with Route 53 equivalents.
- hosts: all
  become: yes
  become_user: ansible
  tasks:
    ## AWS ACME credentials
    - name: Create ACME AWS credentials secret
      copy:
        dest: "{{ manifests_dir }}/02-aws-credentials-secret.yaml"
        content: |
          ---
          apiVersion: v1
          kind: Secret
          metadata:
            name: {{ app_name }}-aws-credentials
            namespace: {{ app_namespace }}
          stringData:
            access-key-id: "{{ acme_route53_access_key_id }}"
            secret-access-key: "{{ acme_route53_secret_access_key }}"

    - name: Create cert-manager test issuer
      copy:
        dest: "{{ manifests_dir }}/02-cert-manager-test-issuer.yaml"
        content: |
          ---
          apiVersion: cert-manager.io/v1
          kind: Issuer
          metadata:
            name: test-route53
            namespace: {{ app_namespace }}
          spec:
            acme:
              server: https://acme-staging-v02.api.letsencrypt.org/directory
              email: "{{ acme_staging_email }}"
              privateKeySecretRef:
                name: test-route53
              solvers:
                - selector: {}
                  dns01:
                    route53:
                      region: {{ aws_region }}
                      accessKeyIDSecretRef:
                        name: {{ app_name }}-aws-credentials
                        key: access-key-id
                      secretAccessKeySecretRef:
                        name: {{ app_name }}-aws-credentials
                        key: secret-access-key

    - name: Create cert-manager production issuer
      copy:
        dest: "{{ manifests_dir }}/02-cert-manager-prod-issuer.yaml"
        content: |
          ---
          apiVersion: cert-manager.io/v1
          kind: Issuer
          metadata:
            name: acme-route53
            namespace: {{ app_namespace }}
          spec:
            acme:
              server: https://acme-v02.api.letsencrypt.org/directory
              email: "{{ acme_prod_email }}"
              privateKeySecretRef:
                name: acme-route53
              solvers:
                - selector: {}
                  dns01:
                    route53:
                      region: {{ aws_region }}
                      accessKeyIDSecretRef:
                        name: {{ app_name }}-aws-credentials
                        key: access-key-id
                      secretAccessKeySecretRef:
                        name: {{ app_name }}-aws-credentials
                        key: secret-access-key

    - name: Apply Kubernetes manifests
      shell: "kubectl apply -f {{ manifests_dir }}/{{ item }}"
      with_items:
        # AWS Route 53 ACME
        - 02-aws-credentials-secret.yaml
        - 02-cert-manager-test-issuer.yaml
        - 02-cert-manager-prod-issuer.yaml
Secrets
# The following variables need to be defined in an Ansible Vault secrets file
# All Environments
litellm_ui_admin
litellm_ui_password
postgres_user
postgres_db
# Cloudflare ACME
cloudflare_prod_dns_api_token
cloudflare_prod_global_api_key
# AWS Route 53 ACME
acme_route53_access_key_id
acme_route53_secret_access_key
# Slack notifications
litellm_slack_webhook_url
# Dev Environment
bedrock_test_aws_access_key
bedrock_test_aws_secret_key
litellm_dev_master_key
litellm_dev_postgres_password
litellm_dev_replication_password
# Preproduction Environment
bedrock_gamma_aws_access_key
bedrock_gamma_aws_secret_key
litellm_ppe_master_key
litellm_ppe_postgres_password
litellm_ppe_replication_password
# Production Environment
bedrock_prod_aws_access_key
bedrock_prod_aws_secret_key
litellm_prod_master_key
litellm_prod_postgres_password
litellm_prod_replication_password
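To show the shape of that file, here is a hedged sketch — the values are placeholders, and in practice you would create it encrypted with `ansible-vault create group_vars/secrets.yml` rather than writing plaintext:

```shell
# Example plaintext shape of the vault file (placeholder values only).
# Real usage: ansible-vault create group_vars/secrets.yml
cat > /tmp/secrets-example.yml <<'EOF'
litellm_ui_admin: admin
litellm_ui_password: change-me
postgres_user: litellm
postgres_db: litellm
cloudflare_prod_dns_api_token: change-me
litellm_slack_webhook_url: https://hooks.slack.com/services/REPLACE/ME
litellm_dev_master_key: sk-change-me
litellm_dev_postgres_password: change-me
EOF
```

The playbook then picks these up via `--extra-vars="@group_vars/secrets.yml"` as shown in the deployment commands.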
Inventory
# inventory/litellm.yml
all:
  vars:
    ansible_user: ansible
    manifests_dir: /home/ansible/litellm
    app_namespace: litellm
    app_name: litellm
    app_ingress_class: nginx
    app_container_port: 4000 # LiteLLM default port
    app_storage_class: csi-rbd-sc
    app_container_image: ghcr.io/berriai/litellm:v1.79.1-stable
    aws_region: us-west-2
    default_model: us.anthropic.claude-sonnet-4-5-20250929-v1:0
    acme_staging_email: "YOUR_EMAIL@YOUR_DOMAIN"
    acme_prod_email: "YOUR_EMAIL@YOUR_DOMAIN"
    cloudflare_acme_prod_issuer: acme-cloudflare
    cloudflare_acme_test_issuer: test-cloudflare
    database_container_image: docker.io/pgvector/pgvector:pg15
    db_storage_size: 10Gi
    db_storage_class: csi-rbd-sc
    postgres_user: litellm
    postgres_db: litellm
    litellm_ui_username: "{{ litellm_ui_admin }}"
    litellm_ui_password: ""
    allow_requests_on_db_unavailable: "True"
    health_check_interval: 300
    database_connection_pool_limit: 10
    proxy_batch_write_at: 60
    smtp_host: email-smtp.us-west-2.amazonaws.com
    smtp_sender_email: "YOUR_EMAIL@YOUR_DOMAIN"
  dev:
    hosts:
      dev-cp-1:
        ansible_host: REPLACEME
    vars:
      app_cert_issuer: "{{ cloudflare_acme_test_issuer }}"
      app_hosts:
        - litellm.dev.example.com
      app_env: dev
      desired_replicas: 1
      log_level: DEBUG
      set_verbose: "True"
      background_health_checks: "False"
      app_alerting: ["slack"]
      bedrock_credentials_access_key: "{{ bedrock_test_aws_access_key }}"
      bedrock_credentials_secret_key: "{{ bedrock_test_aws_secret_key }}"
      litellm_master_key: "{{ litellm_dev_master_key }}"
      postgres_password: "{{ litellm_dev_postgres_password }}"
      replication_password: "{{ litellm_dev_replication_password }}"
  ppe:
    hosts:
      ppe-cp-1:
        ansible_host: REPLACEME
    vars:
      app_cert_issuer: "{{ cloudflare_acme_prod_issuer }}"
      app_hosts:
        - litellm.ppe.example.com
      app_env: ppe
      desired_replicas: 1
      log_level: DEBUG
      set_verbose: "True"
      background_health_checks: "False"
      app_alerting: ["slack"]
      bedrock_credentials_access_key: "{{ bedrock_gamma_aws_access_key }}"
      bedrock_credentials_secret_key: "{{ bedrock_gamma_aws_secret_key }}"
      litellm_master_key: "{{ litellm_ppe_master_key }}"
      postgres_password: "{{ litellm_ppe_postgres_password }}"
      replication_password: "{{ litellm_ppe_replication_password }}"
  prod:
    hosts:
      prod-cp-1:
        ansible_host: REPLACEME
    vars:
      app_cert_issuer: "{{ cloudflare_acme_prod_issuer }}"
      app_hosts:
        - litellm.example.com
      app_env: prod
      desired_replicas: 3
      log_level: INFO
      set_verbose: "False"
      background_health_checks: "False"
      app_alerting: ["slack", "email"]
      bedrock_credentials_access_key: "{{ bedrock_prod_aws_access_key }}"
      bedrock_credentials_secret_key: "{{ bedrock_prod_aws_secret_key }}"
      litellm_master_key: "{{ litellm_prod_master_key }}"
      postgres_password: "{{ litellm_prod_postgres_password }}"
      replication_password: "{{ litellm_prod_replication_password }}"
But wait, there’s more!
Langflow: OMG, this is exactly what I needed! It's definitely worth checking out if you're into building agentic workflows. The easiest way to get started (in my opinion) is with Langflow for Desktop.
Deploying Langflow is significantly more involved, however, because my flows depend on additional services: a vector database, external chat memory, and similar infrastructure components. I deploy everything as containers via Kubernetes, managing the deployment through a GitHub repository — no custom Langflow code or components, just a collection of Ansible manifests and bash scripts orchestrated by a GoCD pipeline-as-code setup. The deployment code isn’t useful to share since it’s specific to my infrastructure configuration.
Self-Hosted Chat Bot
Earlier this year, I made a post about my asinine conversations with “fake Spock” on my self-hosted Mattermost server. While the copilot AI plug-in is cool, there’s a prompt embedded in its logic that is a challenge to overcome. Rather than fight with a chatbot designed for Mattermost, I found an open-source, self-hostable service for a personal chat bot. Now I have two personal chatbots! Perfect! 2 > 0 > 1 (Amazon employees will recognize that mathematically inaccurate expression as “Two is better than none, and none is better than one”.)
Following the same pattern as LiteLLM, Khoj’s documentation made self-hosting remarkably easy. Khoj functions as an AI-powered personal assistant that indexes and searches across notes, documents, and files.
The killer feature for my workflow is its native Obsidian integration. Since Obsidian is my primary knowledge management tool, having AI-assisted search and contextual chat built directly into my note-taking environment significantly enhances productivity. I can now query my entire knowledge base conversationally and surface relevant information without leaving my editor.
Overview | Khoj AI - Your Second Brain: docs.khoj.dev
Conclusion
With LiteLLM deployed, I have two options: point my tools directly at providers like OpenAI, Anthropic, or AWS Bedrock, or route everything through LiteLLM for maximum flexibility. I chose the latter, connecting Mattermost, Khoj, and Langflow to my LiteLLM instance. This approach provides centralized AI spending tracking across all applications, with daily reports delivered automatically via Mattermost. The result? A simple, flexible, and cost-effective architecture that gives me provider independence without sacrificing functionality.