Theorem Agency Team · AI Architecture · 11 min read
Portable GenAI Architecture: Patterns That Run on Any Cloud (or On-Prem)
Concrete architectural patterns that preserve your options without sacrificing the speed benefits of managed services. A checklist for teams actively designing agent platforms.

If you read our piece on cloud lock-in, you understand the problem. The hyperscalers are running their standard playbook on AI infrastructure, and the switching costs are accumulating faster than most teams realize.
But understanding the problem doesn’t build systems. This article is about the “how”—concrete architectural patterns that preserve your options without sacrificing the speed benefits of managed services. Consider it a checklist for teams actively designing agent platforms.
The goal isn’t to avoid cloud services entirely. That’s impractical and, frankly, unnecessary. The goal is intentional architecture: understanding which layers must stay portable, which can safely use managed services, and how to structure the boundaries between them.
The Layer Model for GenAI Platforms
Before diving into specific technologies, it helps to have a mental model for how GenAI platforms decompose. Most production systems have five distinct layers, each with different portability characteristics.
The model layer handles LLM inference—either through API calls to providers like OpenAI and Anthropic, or through self-hosted models. This is where the actual language understanding happens.
The orchestration layer contains your agent logic: the chains, the tool use, the multi-step reasoning, the decision trees that turn raw model capabilities into useful workflows. This is typically where most of your custom code lives.
The storage layer encompasses vector databases for retrieval, conversation history, knowledge bases, and any persistent state your agents need to function.
The observability layer handles tracing, logging, evaluation, and monitoring—everything required to understand what your agents are doing and whether they’re doing it well.
The infrastructure layer is the compute, networking, and deployment machinery that runs everything else.
These layers have fundamentally different portability requirements. Some must stay portable to preserve meaningful flexibility. Others can use managed services without creating problematic dependencies. The key is knowing which is which.
Layer-by-Layer Portability Analysis
Model Layer: Abstract Your Interfaces
The model layer is deceptively simple to get wrong. The temptation is to call provider APIs directly—OpenAI’s client library is well-designed, and the code is clean. But direct coupling creates two problems: you can’t switch providers without touching every file that calls the API, and you can’t easily test with different models.
The solution is a thin abstraction layer. Not a massive framework—just enough indirection to swap providers without rewriting business logic.
Here’s a minimal example in Python:
from abc import ABC, abstractmethod
from typing import List, Dict, Any


class LLMProvider(ABC):
    """Abstract interface for LLM providers."""

    @abstractmethod
    def complete(self, messages: List[Dict[str, str]], **kwargs) -> str:
        """Generate a completion from a list of messages."""
        pass

    @abstractmethod
    def embed(self, text: str) -> List[float]:
        """Generate an embedding vector for text."""
        pass


class OpenAIProvider(LLMProvider):
    def __init__(self, model: str = "gpt-4o"):
        from openai import OpenAI
        self.client = OpenAI()
        self.model = model
        self.embed_model = "text-embedding-3-small"

    def complete(self, messages: List[Dict[str, str]], **kwargs) -> str:
        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            **kwargs
        )
        return response.choices[0].message.content

    def embed(self, text: str) -> List[float]:
        response = self.client.embeddings.create(
            model=self.embed_model,
            input=text
        )
        return response.data[0].embedding


class AnthropicProvider(LLMProvider):
    def __init__(self, model: str = "claude-sonnet-4-20250514"):
        from anthropic import Anthropic
        self.client = Anthropic()
        self.model = model

    def complete(self, messages: List[Dict[str, str]], **kwargs) -> str:
        # Anthropic uses a different message format: the system prompt is a
        # top-level parameter, not a "system" role in the message list.
        system = "\n".join(m["content"] for m in messages if m["role"] == "system")
        chat_messages = [m for m in messages if m["role"] != "system"]
        params = {
            "model": self.model,
            "max_tokens": kwargs.get("max_tokens", 4096),
            "messages": chat_messages,
        }
        if system:
            params["system"] = system
        response = self.client.messages.create(**params)
        return response.content[0].text

    def embed(self, text: str) -> List[float]:
        # Anthropic doesn't ship an embeddings endpoint; use a dedicated,
        # portable embedding provider (e.g., Voyage AI or a self-hosted model).
        raise NotImplementedError("Use dedicated embedding service")

The abstraction doesn’t need to be elaborate. What matters is that your agent code depends on LLMProvider, not on OpenAI or Anthropic directly. When you need to switch providers—or support multiple providers for different use cases—you change the implementation, not the consumers.
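In practice, the switch usually comes from configuration rather than code changes. Here is a minimal sketch of that wiring; the LLM_PROVIDER environment variable and the make_provider helper are illustrative names, not part of any framework:

import os

def make_provider() -> LLMProvider:
    """Pick an LLMProvider implementation from configuration.

    Illustrative helper: the environment variable name and the registry
    below are assumptions for this sketch, not a standard.
    """
    registry = {
        "openai": OpenAIProvider,
        "anthropic": AnthropicProvider,
    }
    name = os.environ.get("LLM_PROVIDER", "openai")
    try:
        return registry[name]()
    except KeyError:
        raise ValueError(f"Unknown LLM provider: {name!r}")

# Consumers only ever see the abstract interface:
# provider = make_provider()
# answer = provider.complete([{"role": "user", "content": "Hello"}])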
One nuance worth noting: fine-tuned models complicate portability. If you fine-tune through a cloud provider (Bedrock, Vertex), those weights exist only in that provider’s infrastructure. If portability matters, consider fine-tuning approaches that produce weights you control—LoRA adapters you can host anywhere, or full fine-tunes on open models like Llama that you deploy yourself.
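As a rough illustration of the weights-you-control approach, here is a sketch of loading a LoRA adapter onto an open base model with the Hugging Face transformers and peft libraries; the model name and adapter path are placeholders, not a recommendation:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Placeholder identifiers: substitute your own base model and adapter path.
BASE_MODEL = "meta-llama/Llama-3.1-8B-Instruct"
ADAPTER_PATH = "./adapters/support-agent-lora"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# The adapter weights are ordinary files in your repository or artifact store,
# so they deploy to any cloud or on-prem cluster you control.
model = PeftModel.from_pretrained(base, ADAPTER_PATH)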
Orchestration Layer: This Is the Critical Layer
The orchestration layer is where portability is won or lost. Your agent logic—the chains, the reasoning patterns, the tool integrations—represents the bulk of your custom development. If this layer is locked to a specific provider, everything else becomes academic.
The distinction is straightforward: LangChain, CrewAI, and AutoGen are portable. They’re open-source frameworks that run anywhere Python runs. The agent definitions, the chain logic, the tool configurations—they’re your code, stored in your repository, deployable to any infrastructure.
Bedrock Agents, Vertex AI Agents, and Azure AI Agent Service are not portable. They use proprietary runtimes and configuration formats. The agents you build exist only within those platforms’ consoles and APIs. You can’t export a Bedrock Agent as code and run it elsewhere.
The key question to ask: Does your agent logic run outside the cloud console? If you can clone your repository, run python agent.py, and have your agent function (with appropriate credentials), your orchestration is portable. If your agent only exists as a configuration in a cloud console, it isn’t.
This doesn’t mean cloud-native orchestration is always wrong. For simple, single-purpose agents where portability genuinely doesn’t matter, managed services offer convenience. But for core platform capabilities—the agents that power your key workflows—you want orchestration you control.
Here’s what portable orchestration looks like in practice with LangChain:
from typing import List

from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain.tools import Tool
from langchain_core.prompts import ChatPromptTemplate


# Provider-agnostic: swap the LLM implementation without changing agent logic
def create_support_agent(llm_provider: LLMProvider, tools: List[Tool]):
    """Create a support agent that runs on any infrastructure."""
    prompt = ChatPromptTemplate.from_messages([
        ("system", """You are a customer support agent.
Use the provided tools to look up information.
Always cite your sources. Escalate if uncertain."""),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}")
    ])

    # The agent definition is pure Python—runs anywhere
    agent = create_tool_calling_agent(
        llm=llm_provider.as_langchain_llm(),  # Adapter method
        tools=tools,
        prompt=prompt
    )

    return AgentExecutor(
        agent=agent,
        tools=tools,
        verbose=True,
        max_iterations=10
    )

This agent runs on EKS, GKE, AKS, on-prem Kubernetes, or a laptop. The orchestration logic doesn’t know or care where it’s deployed.
Storage Layer: Where Your Knowledge Lives
Vector databases are the sleeper lock-in risk. Your knowledge base—potentially representing months of data preparation, chunking optimization, and embedding generation—lives in your vector storage. If that storage doesn’t export, neither does your investment.
Postgres with pgvector is the maximally portable option. It’s the database you already know, with vector operations added. Standard pg_dump exports everything. Your vectors, your metadata, your relational data—all in one portable package.
-- pgvector schema: completely standard Postgres
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    embedding vector(1536), -- OpenAI ada-002 dimensions
    metadata JSONB,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Efficient similarity search with IVFFlat index
CREATE INDEX ON documents
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

Pinecone offers a middle ground. It’s a managed SaaS, but with standard APIs and documented export capabilities. You can extract your vectors and metadata in portable formats. The convenience of managed infrastructure without the data captivity.
Bedrock Knowledge Bases and Vertex Vector Search present portability challenges. The ingestion is easy—point at S3 or GCS, and indexing happens automatically. But the export story is weak. These services are designed for data to flow in, not out. If your knowledge base represents significant investment, that investment becomes cloud-specific.
When evaluating vector storage, ask two questions: Can I export my vectors in a standard format? and Can I export my index, or do I need to rebuild it? Rebuilding indexes for millions of vectors is expensive. True portability means exporting the operational system, not just the raw data.
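For pgvector, that export can be as simple as pg_dump, or a small script that streams rows into a neutral format. Here is a rough sketch, assuming the documents table above and a psycopg2 connection string; export_documents is an illustrative helper, not a library function:

import json
import psycopg2

def export_documents(dsn: str, out_path: str) -> None:
    """Stream the documents table to JSON Lines, one record per row."""
    conn = psycopg2.connect(dsn)
    try:
        # Server-side (named) cursor so millions of rows stream in batches
        with conn.cursor(name="doc_export") as cur, open(out_path, "w") as out:
            cur.itersize = 1000
            cur.execute(
                "SELECT id, content, embedding::text, metadata FROM documents"
            )
            for doc_id, content, embedding, metadata in cur:
                out.write(json.dumps({
                    "id": doc_id,
                    "content": content,
                    # pgvector renders as "[0.1, 0.2, ...]"; parse to a list
                    "embedding": json.loads(embedding) if embedding else None,
                    "metadata": metadata,
                }) + "\n")
    finally:
        conn.close()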
Observability Layer: Tracing Without Traps
AI observability is still maturing, which makes early choices particularly consequential. The tools you adopt now will shape your operational practices for years.
LangSmith, Langfuse, and Phoenix represent the portable options. They’re designed for AI workloads specifically, with features like trace visualization, prompt versioning, and evaluation tracking. Critically, they either self-host (Langfuse, Phoenix) or offer data export (LangSmith).
OpenTelemetry deserves special mention because it provides infrastructure-agnostic tracing that works across providers. OTEL collectors can route trace data anywhere—cloud-native services, self-hosted Jaeger, commercial observability platforms. Building on OTEL means your tracing investment follows you.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter


# Provider-agnostic tracing setup
def configure_tracing(endpoint: str):
    """Configure OTEL tracing to any compatible backend."""
    provider = TracerProvider()
    processor = BatchSpanProcessor(OTLPSpanExporter(endpoint=endpoint))
    provider.add_span_processor(processor)
    trace.set_tracer_provider(provider)
    return trace.get_tracer("genai-platform")


# Usage in agent code
tracer = configure_tracing("http://otel-collector:4317")
with tracer.start_as_current_span("agent_invocation") as span:
    span.set_attribute("agent.name", "support_agent")
    span.set_attribute("model.provider", "anthropic")
    # Agent execution happens here

The trap to avoid: proprietary evaluation tools with no export. If your evaluation data—the scores, the human feedback, the regression test results—lives only in a platform-specific format, you can’t switch evaluation providers without losing history. Evaluation data is institutional knowledge; treat it accordingly.
Infrastructure Layer: Containers Are Your Friend
Container-based deployment is portable by design. A Docker image runs on any container runtime. Kubernetes manifests deploy to any conformant cluster. This isn’t accidental—it’s the point of containerization.
For GenAI platforms, this means packaging your agents as container images and deploying through Kubernetes. The same manifests that deploy to EKS deploy to GKE, AKS, or on-prem clusters with minimal modification.
Terraform with provider abstraction extends this portability to infrastructure provisioning:
# Abstract provider configuration
variable "cloud_provider" {
  type    = string
  default = "aws"
}

variable "instance_types" {
  type = map(string)
  default = {
    aws = "m6i.2xlarge" # example sizes; tune per workload
    gcp = "n2-standard-8"
  }
}

# Provider-specific implementations behind consistent interfaces.
# Terraform requires module sources to be literal strings, so each cloud gets
# its own thin root configuration (or a count-gated block) wiring the same
# module interface; the AWS root is shown here.
module "kubernetes_cluster" {
  source = "./modules/k8s-aws"

  cluster_name  = "genai-platform"
  node_count    = 3
  instance_type = var.instance_types[var.cloud_provider]
}

module "vector_database" {
  source = "./modules/postgres-aws"

  instance_class = "db.r6g.large"
  storage_gb     = 100
}

The serverless question is more nuanced. AWS Lambda, Cloud Functions, and Azure Functions offer convenience, but they also introduce provider-specific invocation patterns, cold start characteristics, and deployment mechanisms. For components where serverless makes sense (infrequent batch jobs, simple webhooks), the lock-in is usually acceptable. For core agent workloads, containers provide better portability with comparable operational simplicity.
Reference Architecture
Bringing these patterns together, here’s a reference architecture that runs on any major cloud or on-prem:
Compute: Kubernetes cluster (EKS, GKE, AKS, or self-managed). Agent services deployed as containerized FastAPI applications with horizontal pod autoscaling.
Orchestration: LangChain or CrewAI for agent logic, stored as Python code in your repository. All agent definitions, tool configurations, and prompt templates version-controlled.
Model access: Abstracted provider interface supporting OpenAI, Anthropic, and self-hosted models. Provider selection via environment configuration.
Vector storage: PostgreSQL with pgvector extension, deployed as a managed service (RDS, Cloud SQL, Azure Database) or self-hosted. Standard connection pooling via PgBouncer.
Observability: OpenTelemetry for distributed tracing, exported to your collector of choice. Langfuse or Phoenix for AI-specific evaluation and prompt tracking.
Infrastructure-as-code: Terraform with modular provider configurations. Separate modules for each cloud provider behind consistent interfaces.
The key property: any component can be replaced without rewriting the others. Swap OpenAI for Anthropic by changing configuration. Move from AWS to GCP by deploying different Terraform modules. Switch vector databases by implementing a new repository interface. The boundaries are clean, and the coupling is intentional.
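To make the compute layer concrete, here is a minimal sketch of an agent service packaged this way. FastAPI and the /invoke route are illustrative choices, and make_provider and create_support_agent are the helpers sketched earlier in this article, not library functions:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="genai-platform")

class InvokeRequest(BaseModel):
    input: str

# Wire the portable pieces together at startup: provider from configuration,
# agent logic from your own repository.
provider = make_provider()                        # model-layer sketch above
agent = create_support_agent(provider, tools=[])  # real deployments pass tools

@app.post("/invoke")
def invoke(request: InvokeRequest) -> dict:
    """Run the agent; the same container image serves EKS, GKE, AKS, or on-prem."""
    result = agent.invoke({"input": request.input})
    return {"output": result["output"]}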
The 90-Day Migration Test
Before committing to any architecture, run this mental exercise: If we had to move this to a different cloud in 90 days, what would break?
Walk through each layer systematically.
Model layer: Are we calling providers directly, or through an abstraction? If direct, how many files need changes?
Orchestration layer: Does our agent logic exist as code we control, or as configurations in a cloud console? Can we run our agents locally?
Storage layer: Can we export our vector database? Is the export format standard, or proprietary? How long would re-indexing take?
Observability layer: Does our tracing data export in standard formats? Would we lose evaluation history?
Infrastructure layer: Is our deployment described in provider-agnostic IaC? What provider-specific services have we adopted?
The answers identify your long poles. Maybe model abstraction is weak—that’s fixable in a sprint. Maybe your knowledge base is locked into Bedrock—that’s a quarter of extraction work. The exercise surfaces risks before they become regrets.
Where Managed Services Are Fine
Portability absolutism is as counterproductive as ignoring lock-in entirely. Some managed services create minimal switching costs and provide genuine operational value.
Authentication and identity can safely use cloud-native IAM. AWS IAM, GCP IAM, Azure AD—they’re all solving the same problem with similar concepts. Migration involves configuration changes, not architecture changes.
Secrets management works well as managed services. Vault is portable; AWS Secrets Manager and GCP Secret Manager are operationally equivalent. The interfaces are similar enough that switching is straightforward.
CDN and DNS are essentially commoditized. CloudFront, Cloud CDN, Azure CDN—the capabilities are comparable, and migration is configuration updates.
Basic networking—VPCs, subnets, security groups—is provider-specific by nature, but the concepts translate directly. Your network architecture knowledge transfers even when the specific APIs don’t.
The principle: accept coupling where the switching cost is low and the operational benefit is high. Resist coupling where the switching cost is high and the benefit is merely convenience.
Making It Real
These patterns aren’t theoretical. They’re how we approach every platform we build at theorem.agency. The specifics vary—some clients need multi-cloud from day one, others just want the option preserved—but the architectural discipline is consistent.
If you’re in the middle of designing your GenAI architecture and want to pressure-test your decisions, we’re happy to do a quick review. Sometimes a second set of eyes catches the lock-in risk hiding in an innocuous-looking integration.
The window for intentional architecture is smaller than it appears. The patterns you establish in the next few months will shape your options for years. Make them count.
theorem.agency builds AI platforms on open-source components that run in any cloud—or on-prem. We handle the architecture, the orchestration, the prompt engineering, and the handover. You own everything.



