Choosing between LangChain and LlamaIndex is less about hype and more about how you’ll compose workflows: ingestion, retrieval, memory, tool-calling, and streaming across your stack. This guide compares both libraries with practical TypeScript snippets in a Next.js context.
If you need a quick primer on Next.js model integration, start with OpenAI integration. For retrieval patterns, read RAG for SaaS and Vector Databases. For chat scaffolds, see AI chatbot with React + Node and LangChain + Next.js chatbot.
Architecture Overview
Docs → Ingestion (parse → chunk → embed) → Vector DB
Query → Retrieval (filters, hybrid) → Orchestration (chains/engines) → LLM → Answer + Citations
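The chunking stage of the ingestion step can be sketched as a fixed-size character window with overlap; `chunkText` below is an illustrative helper, not an API from either library:

```ts
// Split text into fixed-size character windows with overlap, so that
// sentences cut at a chunk boundary still appear whole in a neighboring chunk.
export const chunkText = (text: string, size = 800, overlap = 100): string[] => {
  if (size <= overlap) throw new Error("size must exceed overlap");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
};
```

Both libraries ship their own splitters (recursive, sentence-aware, markdown-aware); the point is that chunk size and overlap are tuning knobs you own, whichever stack you pick.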
LangChain: ChatPromptTemplate, RunnableSequence, Memory, Tools
LlamaIndex: Nodes, VectorStoreIndex, QueryEngine, Observability
Installing the Stacks
```bash
pnpm add langchain @langchain/openai @langchain/core zod
pnpm add llamaindex
```
Add OPENAI_API_KEY=sk-... to .env.local.
LangChain: Minimal RAG Chain
```ts
// lib/lc/rag.ts
import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { RunnableSequence } from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";

type Retrieved = { id: string; text: string; source?: string };

export const buildLcRag = ({ retrieve }: { retrieve: (q: string) => Promise<Retrieved[]> }) => {
  const llm = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0.2 });
  const prompt = ChatPromptTemplate.fromTemplate(
    "You answer using ONLY the provided context. If insufficient, say you don't know. Include citations like [CITATION:id].\n\nContext:\n{context}\n\nQuestion: {question}"
  );
  const chain = RunnableSequence.from([
    // Fetch context first, then hand both question and context to the prompt.
    async ({ question }: { question: string }) => {
      const docs = await retrieve(question);
      const context = docs.map((d) => `[CITATION:${d.id}] ${d.text}`).join("\n\n");
      return { question, context };
    },
    prompt,
    llm,
    new StringOutputParser(), // normalize the AIMessage to a plain string
  ]);
  return { chain };
};
```
LlamaIndex: Minimal Query Engine
```ts
// lib/li/engine.ts
import { Document, OpenAI, Settings, VectorStoreIndex } from "llamaindex";

type Doc = { id: string; text: string };

export const buildLiEngine = async (docs: Doc[]) => {
  // Route all LLM calls through gpt-4o-mini.
  Settings.llm = new OpenAI({ model: "gpt-4o-mini" });
  // Documents are parsed into nodes and embedded when the index is built.
  const index = await VectorStoreIndex.fromDocuments(
    docs.map((d) => new Document({ id_: d.id, text: d.text }))
  );
  const engine = index.asQueryEngine({
    retriever: index.asRetriever({ similarityTopK: 8 }),
  });
  return { engine };
};
```
Next.js API Routes (Side-by-Side)
```ts
// app/api/lc/route.ts
import { NextRequest, NextResponse } from "next/server";
import { buildLcRag } from "@/lib/lc/rag";

export const POST = async (req: NextRequest) => {
  const { question } = (await req.json()) as { question?: string };
  if (!question) return NextResponse.json({ error: "Missing question" }, { status: 400 });
  // Stub retriever; swap in a real vector store lookup.
  const retrieve = async () => [{ id: "1", text: "Example context" }];
  const { chain } = buildLcRag({ retrieve });
  const res = await chain.invoke({ question });
  const text = typeof res === "string" ? res : String(res.content ?? res);
  return NextResponse.json({ text });
};
```

```ts
// app/api/li/route.ts
import { NextRequest, NextResponse } from "next/server";
import { buildLiEngine } from "@/lib/li/engine";

export const POST = async (req: NextRequest) => {
  const { question } = (await req.json()) as { question?: string };
  if (!question) return NextResponse.json({ error: "Missing question" }, { status: 400 });
  const { engine } = await buildLiEngine([{ id: "1", text: "Example context" }]);
  const res = await engine.query({ query: question });
  return NextResponse.json({ text: String(res.response ?? "") });
};
```
Streaming Patterns
Both stacks can stream tokens server-side. Prefer SSE or ReadableStream in API routes and progressively render in the client. See our OpenAI streaming in Next.js integration.
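The ReadableStream approach can be sketched library-agnostically; the `sseChunk` and `toSSEStream` helpers below are illustrative, and the token source is assumed to be any async iterable (e.g. what LangChain's `chain.stream(...)` returns):

```ts
// Encode one token as a Server-Sent Events frame.
const encoder = new TextEncoder();
export const sseChunk = (data: string): Uint8Array => encoder.encode(`data: ${data}\n\n`);

// Wrap any async iterable of tokens in a ReadableStream suitable
// for returning from a Next.js route handler.
export const toSSEStream = (tokens: AsyncIterable<string>): ReadableStream<Uint8Array> =>
  new ReadableStream({
    async start(controller) {
      for await (const token of tokens) controller.enqueue(sseChunk(token));
      controller.enqueue(sseChunk("[DONE]")); // sentinel so the client knows to stop
      controller.close();
    },
  });
```

In a route handler you would return `new Response(toSSEStream(source), { headers: { "Content-Type": "text/event-stream" } })` and read it client-side with `EventSource` or a fetch reader.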
Strengths and Trade-offs
- LangChain: rich graph/runnables, wide tool ecosystem, explicit chains.
- LlamaIndex: strong ingestion/index abstractions, convenient query engines, observability.
Choose based on where you spend your time: orchestration and tools (LangChain) or indexing/query engines and data connectors (LlamaIndex).
Hybrid Retrieval and Filters
Regardless of library, hybrid retrieval (semantic + keyword) improves recall for IDs and rare terms. See our patterns in Vector Databases.
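One library-agnostic way to merge the semantic and keyword result lists is reciprocal rank fusion; `fuseRanks` below is a sketch under the assumption that each retriever returns document IDs ordered best-first:

```ts
// Reciprocal rank fusion: score(id) = sum over lists of 1 / (k + rank),
// so items ranked highly by either retriever float to the top.
export const fuseRanks = (lists: string[][], k = 60): string[] => {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
};
```

The constant `k` dampens the influence of any single top rank; 60 is a conventional default, not a tuned value.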
Evaluations and Cost Controls
Measure groundedness, answer quality, and latency/cost. Cache query rewrites, cap candidates, and compress context before generation. For deployment and runtime hardening, review our VPS deployment guide.
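A minimal sketch of the caching idea, using an in-memory TTL cache keyed by the rewritten query (`TtlCache` is illustrative; a production setup would typically reach for Redis or similar):

```ts
// Tiny in-memory cache with per-entry expiry, e.g. for query rewrites
// so repeated questions don't pay for a second LLM call.
export class TtlCache<V> {
  private store = new Map<string, { value: V; expires: number }>();
  constructor(private ttlMs: number) {}
  get(key: string): V | undefined {
    const hit = this.store.get(key);
    if (!hit) return undefined;
    if (Date.now() > hit.expires) {
      this.store.delete(key); // lazily evict expired entries
      return undefined;
    }
    return hit.value;
  }
  set(key: string, value: V): void {
    this.store.set(key, { value, expires: Date.now() + this.ttlMs });
  }
}
```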
Conclusion
LangChain and LlamaIndex overlap but shine in different layers. Compose a stack that fits your priorities: explicit chains and tool use (LangChain) or indexing/query engines and connectors (LlamaIndex). Start with a minimal RAG endpoint, stream responses, add filters, and measure quality before scaling.
