Choosing between LangChain and LlamaIndex is less about hype and more about how you’ll compose workflows: ingestion, retrieval, memory, tool-calling, and streaming across your stack. This guide compares both libraries with practical TypeScript snippets in a Next.js context.
If you need a quick primer on Next.js model integration, start with OpenAI integration. For retrieval patterns, read RAG for SaaS and Vector Databases. For chat scaffolds, see AI chatbot with React + Node and LangChain + Next.js chatbot.
Architecture Overview
Docs → Ingestion (parse → chunk → embed) → Vector DB
Query → Retrieval (filters, hybrid) → Orchestration (chains/engines) → LLM → Answer + Citations
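The chunking stage of the ingestion step can be sketched as a fixed-size character window with overlap; `chunkText` below is an illustrative helper, not an API from either library:

```ts
// Split text into fixed-size character windows with overlap, so that
// sentences cut at a chunk boundary still appear whole in a neighboring chunk.
export const chunkText = (text: string, size = 800, overlap = 100): string[] => {
  if (size <= overlap) throw new Error("size must exceed overlap");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
};
```

Both libraries ship their own splitters (recursive, sentence-aware, markdown-aware); the point is that chunk size and overlap are tuning knobs you own, whichever stack you pick.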
LangChain: ChatPromptTemplate, RunnableSequence, Memory, Tools
LlamaIndex: Nodes, VectorStoreIndex, QueryEngine, Observability
Installing the Stacks
```bash
pnpm add langchain @langchain/openai @langchain/core zod
pnpm add llamaindex
```
Add OPENAI_API_KEY=sk-... to .env.local.
LangChain: Minimal RAG Chain
```ts
// lib/lc/rag.ts
import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { RunnableSequence } from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";

type Retrieved = { id: string; text: string; source?: string };

export const buildLcRag = ({ retrieve }: { retrieve: (q: string) => Promise<Retrieved[]> }) => {
  const llm = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0.2 });
  const prompt = ChatPromptTemplate.fromTemplate(
    "You answer using ONLY the provided context. If insufficient, say you don't know. Include citations like [CITATION:id].\n\nContext:\n{context}\n\nQuestion: {question}"
  );
  const chain = RunnableSequence.from([
    // Fetch context first, then hand both question and context to the prompt.
    async ({ question }: { question: string }) => {
      const docs = await retrieve(question);
      const context = docs.map((d) => `[CITATION:${d.id}] ${d.text}`).join("\n\n");
      return { question, context };
    },
    prompt,
    llm,
    new StringOutputParser(), // normalize the AIMessage to a plain string
  ]);
  return { chain };
};
```
LlamaIndex: Minimal Query Engine
```ts
// lib/li/engine.ts
import { Document, OpenAI, Settings, VectorStoreIndex } from "llamaindex";

type Doc = { id: string; text: string };

export const buildLiEngine = async (docs: Doc[]) => {
  // Route all LLM calls through gpt-4o-mini.
  Settings.llm = new OpenAI({ model: "gpt-4o-mini" });
  // Documents are parsed into nodes and embedded when the index is built.
  const index = await VectorStoreIndex.fromDocuments(
    docs.map((d) => new Document({ id_: d.id, text: d.text }))
  );
  const engine = index.asQueryEngine({
    retriever: index.asRetriever({ similarityTopK: 8 }),
  });
  return { engine };
};
```
Next.js API Routes (Side-by-Side)
```ts
// app/api/lc/route.ts
import { NextRequest, NextResponse } from "next/server";
import { buildLcRag } from "@/lib/lc/rag";

export const POST = async (req: NextRequest) => {
  const { question } = (await req.json()) as { question?: string };
  if (!question) return NextResponse.json({ error: "Missing question" }, { status: 400 });
  // Stub retriever; swap in a real vector store lookup.
  const retrieve = async () => [{ id: "1", text: "Example context" }];
  const { chain } = buildLcRag({ retrieve });
  const res = await chain.invoke({ question });
  const text = typeof res === "string" ? res : String(res.content ?? res);
  return NextResponse.json({ text });
};
```

```ts
// app/api/li/route.ts
import { NextRequest, NextResponse } from "next/server";
import { buildLiEngine } from "@/lib/li/engine";

export const POST = async (req: NextRequest) => {
  const { question } = (await req.json()) as { question?: string };
  if (!question) return NextResponse.json({ error: "Missing question" }, { status: 400 });
  const { engine } = await buildLiEngine([{ id: "1", text: "Example context" }]);
  const res = await engine.query({ query: question });
  return NextResponse.json({ text: String(res.response ?? "") });
};
```
Streaming Patterns
Both stacks can stream tokens server-side. Prefer SSE or ReadableStream in API routes and progressively render in the client. See our OpenAI streaming in Next.js integration.
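The ReadableStream approach can be sketched library-agnostically; the `sseChunk` and `toSSEStream` helpers below are illustrative, and the token source is assumed to be any async iterable (e.g. what LangChain's `chain.stream(...)` returns):

```ts
// Encode one token as a Server-Sent Events frame.
const encoder = new TextEncoder();
export const sseChunk = (data: string): Uint8Array => encoder.encode(`data: ${data}\n\n`);

// Wrap any async iterable of tokens in a ReadableStream suitable
// for returning from a Next.js route handler.
export const toSSEStream = (tokens: AsyncIterable<string>): ReadableStream<Uint8Array> =>
  new ReadableStream({
    async start(controller) {
      for await (const token of tokens) controller.enqueue(sseChunk(token));
      controller.enqueue(sseChunk("[DONE]")); // sentinel so the client knows to stop
      controller.close();
    },
  });
```

In a route handler you would return `new Response(toSSEStream(source), { headers: { "Content-Type": "text/event-stream" } })` and read it client-side with `EventSource` or a fetch reader.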
Strengths and Trade-offs
- LangChain: rich graph/runnables, wide tool ecosystem, explicit chains.
- LlamaIndex: strong ingestion/index abstractions, convenient query engines, observability.
Choose based on where you spend your time: orchestration and tools (LangChain) or indexing/query engines and data connectors (LlamaIndex).
Hybrid Retrieval and Filters
Regardless of library, hybrid retrieval (semantic + keyword) improves recall for IDs and rare terms. See our patterns in Vector Databases.
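One library-agnostic way to merge the semantic and keyword result lists is reciprocal rank fusion; `fuseRanks` below is a sketch under the assumption that each retriever returns document IDs ordered best-first:

```ts
// Reciprocal rank fusion: score(id) = sum over lists of 1 / (k + rank),
// so items ranked highly by either retriever float to the top.
export const fuseRanks = (lists: string[][], k = 60): string[] => {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
};
```

The constant `k` dampens the influence of any single top rank; 60 is a conventional default, not a tuned value.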
Evaluations and Cost Controls
Measure groundedness, answer quality, and latency/cost. Cache query rewrites, cap candidates, and compress context before generation. For deployment and runtime hardening, review our VPS deployment guide.
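A minimal sketch of the caching idea, using an in-memory TTL cache keyed by the rewritten query (`TtlCache` is illustrative; a production setup would typically reach for Redis or similar):

```ts
// Tiny in-memory cache with per-entry expiry, e.g. for query rewrites
// so repeated questions don't pay for a second LLM call.
export class TtlCache<V> {
  private store = new Map<string, { value: V; expires: number }>();
  constructor(private ttlMs: number) {}
  get(key: string): V | undefined {
    const hit = this.store.get(key);
    if (!hit) return undefined;
    if (Date.now() > hit.expires) {
      this.store.delete(key); // lazily evict expired entries
      return undefined;
    }
    return hit.value;
  }
  set(key: string, value: V): void {
    this.store.set(key, { value, expires: Date.now() + this.ttlMs });
  }
}
```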
Conclusion
LangChain and LlamaIndex overlap but shine in different layers. Compose a stack that fits your priorities: explicit chains and tool use (LangChain) or indexing/query engines and connectors (LlamaIndex). Start with a minimal RAG endpoint, stream responses, add filters, and measure quality before scaling.
