Building AI features into a Next.js app is straightforward once you structure concerns properly: secure server calls, predictable types, robust error handling, and efficient UX. This guide shows a practical, production‑ready integration of OpenAI’s API with the App Router using TypeScript - including streaming chat, image generation, and deployment tips - so you can ship quickly and safely.
TL;DR
- Install SDK + set env: add OPENAI_API_KEY to .env.local and use the official SDK.
- Server‑only calls: use app/api/*/route.ts or server actions to keep keys private.
- Streaming chat: stream tokens for a responsive UX; fall back to non‑stream on error.
- Image generation: separate route with explicit size and content policy checks.
- Observability: log request IDs and failures; add basic rate limits.
- Next steps: For grounding over your docs, see our RAG production guide. For metadata, see Next.js SEO best practices.
Architecture at a Glance
Client (React) ──▶ API Route (`/api/chat`) ──▶ OpenAI (Responses)
      ▲                    │                        │
      │                    ├── Rate limiting        │
Streaming UI ◀─────────────┤   Validation           │
                           └── Logging / Metrics ◀──┘

Optional:

Client ──▶ API Route (`/api/image`) ──▶ OpenAI (Images)

If you plan to ground responses on your content, add a retrieval layer before the model call. We cover that pattern in detail in Retrieval‑Augmented Generation (RAG).
Prerequisites
- Next.js 13+ App Router
- TypeScript
- Node 18+
Install dependencies:
pnpm add openai zod

Create .env.local (never commit this):

OPENAI_API_KEY=sk-...

Server SDK Initialization (Typed)
// lib/ai.ts
import OpenAI from "openai";
/**
* Creates a singleton OpenAI client configured for server‑side usage only.
* Ensure this file is imported ONLY from server code (API routes, server actions, RSC).
*/
export const getOpenAIClient = () => {
const apiKey = process.env.OPENAI_API_KEY;
if (!apiKey) {
throw new Error("Missing OPENAI_API_KEY env var");
}
return new OpenAI({ apiKey });
};

Basic Chat Completion API Route
// app/api/chat/route.ts
import { NextRequest, NextResponse } from "next/server";
import { z } from "zod";
import { getOpenAIClient } from "@/lib/ai";
export const runtime = "nodejs"; // or "edge" if you prefer; verify SDK support
/**
* Validates client payload to avoid unexpected shapes and injection of unwanted fields.
*/
const ChatSchema = z.object({
messages: z
.array(
z.object({
role: z.enum(["system", "user", "assistant"]),
content: z.string().min(1),
})
)
.min(1),
temperature: z.number().min(0).max(2).optional(),
});
export const POST = async (req: NextRequest) => {
const requestId = crypto.randomUUID();
try {
const json = await req.json();
const { messages, temperature } = ChatSchema.parse(json);
const openai = getOpenAIClient();
const response = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages,
temperature: temperature ?? 0.5,
});
const text = response.choices[0]?.message?.content ?? "";
return NextResponse.json({ id: requestId, text });
} catch (error: unknown) {
// Surface useful diagnostics without leaking sensitive details
const message = error instanceof Error ? error.message : "Unknown error";
return NextResponse.json(
{ id: requestId, error: message },
{ status: 400 }
);
}
};

This synchronous approach returns the full message when ready. For a more responsive UX, stream tokens as they arrive.
Streaming Chat (ReadableStream)
// app/api/chat/stream/route.ts
import { NextRequest } from "next/server";
import { z } from "zod";
import { getOpenAIClient } from "@/lib/ai";
export const runtime = "nodejs";
const ChatSchema = z.object({
messages: z
.array(
z.object({
role: z.enum(["system", "user", "assistant"]),
content: z.string().min(1),
})
)
.min(1),
});
export const POST = async (req: NextRequest) => {
let body: z.infer<typeof ChatSchema>;
try {
body = ChatSchema.parse(await req.json());
} catch {
// An invalid payload should return 400, not surface as an unhandled 500
return new Response("Invalid request payload", { status: 400 });
}
const { messages } = body;
const openai = getOpenAIClient();
const stream = new ReadableStream<Uint8Array>({
start: async (controller) => {
try {
const resp = await openai.chat.completions.create({
model: "gpt-4o-mini",
stream: true,
messages,
});
for await (const chunk of resp) {
const content = chunk.choices[0]?.delta?.content;
if (content) controller.enqueue(new TextEncoder().encode(content));
}
} catch (e) {
// controller.error() already puts the stream into an errored state;
// calling controller.close() afterwards throws, so return instead
controller.error(e);
return;
}
controller.close();
},
});
return new Response(stream, {
headers: {
"Content-Type": "text/plain; charset=utf-8",
"Cache-Control": "no-store",
},
});
};

You can enhance this with SSE framing, JSON Lines, or the OpenAI Responses API, depending on your preference. The key is keeping long‑lived connections in the API route, not the client.
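As a sketch of the SSE framing mentioned above, each token can be wrapped in a `data:` frame before it is enqueued; the `{ token }` payload shape and the `[DONE]` sentinel are illustrative conventions, not part of any SDK:

```typescript
// Formats a single token as a Server-Sent Events frame.
// SSE frames are "data: <payload>\n\n"; JSON-encoding the token
// preserves newlines and special characters inside the payload.
export const toSseFrame = (token: string): string =>
  `data: ${JSON.stringify({ token })}\n\n`;

// A terminal frame lets the client detect a clean end of stream.
export const SSE_DONE = "data: [DONE]\n\n";
```

In the streaming route, you would enqueue `encoder.encode(toSseFrame(content))` for each delta and send `Content-Type: text/event-stream` instead of `text/plain`.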
Client Chat Component (TypeScript, Hooks)
// components/ChatBox.tsx
"use client";
import { useCallback, useMemo, useRef, useState } from "react";
type Role = "system" | "user" | "assistant";
type ChatMessage = { role: Role; content: string };
const ERROR_GENERIC = "Something went wrong. Please try again." as const;
/**
* Simple chat UI with non‑streaming submit and optional streaming switch.
*/
export const ChatBox = () => {
const [messages, setMessages] = useState<ChatMessage[]>([
{ role: "system", content: "You are a concise assistant." },
]);
const [input, setInput] = useState("");
const [thinking, setThinking] = useState(false);
const [streaming, setStreaming] = useState(false);
const abortRef = useRef<AbortController | null>(null);
const canSend = useMemo(() => input.trim().length > 0 && !thinking, [input, thinking]);
const send = useCallback(async () => {
if (!canSend) return;
const userMsg: ChatMessage = { role: "user", content: input.trim() };
setInput("");
setThinking(true);
try {
if (!streaming) {
const res = await fetch("/api/chat", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ messages: [...messages, userMsg] }),
});
const data = (await res.json()) as { text?: string; error?: string };
const assistant: ChatMessage = { role: "assistant", content: data.text ?? data.error ?? ERROR_GENERIC };
setMessages((prev) => [...prev, userMsg, assistant]);
} else {
abortRef.current?.abort();
abortRef.current = new AbortController();
const res = await fetch("/api/chat/stream", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ messages: [...messages, userMsg] }),
signal: abortRef.current.signal,
});
const reader = res.body?.getReader();
const decoder = new TextDecoder();
let acc = "";
if (reader) {
// optimistically append the user message and an empty assistant message
setMessages((prev) => [...prev, userMsg, { role: "assistant", content: "" }]);
// stream loop
while (true) {
const { done, value } = await reader.read();
if (done) break;
// stream: true keeps multi-byte characters intact across chunk boundaries
acc += decoder.decode(value, { stream: true });
setMessages((prev) => {
const copy = [...prev];
copy[copy.length - 1] = { role: "assistant", content: acc };
return copy;
});
}
}
}
} catch (_e) {
setMessages((prev) => [...prev, userMsg, { role: "assistant", content: ERROR_GENERIC }]);
} finally {
setThinking(false);
}
}, [canSend, input, messages, streaming]);
return (
<div className="space-y-3">
<div className="flex items-center gap-2">
<input
className="flex-1 rounded border px-3 py-2"
value={input}
onChange={(e) => setInput(e.target.value)}
placeholder="Ask something…"
/>
<button className="rounded bg-black px-3 py-2 text-white disabled:opacity-50" disabled={!canSend} onClick={send}>
{thinking ? "Thinking…" : "Send"}
</button>
</div>
<label className="flex items-center gap-2 text-sm opacity-75">
<input type="checkbox" checked={streaming} onChange={(e) => setStreaming(e.target.checked)} />
Stream tokens
</label>
<div className="space-y-2 rounded border p-3">
{messages.map((m, i) => (
<div key={i} className={m.role === "user" ? "text-black" : "text-zinc-700"}>
<strong>{m.role}:</strong> {m.content}
</div>
))}
</div>
</div>
);
};

Place the component anywhere in your page tree, e.g., app/page.tsx or a dedicated /chat route.
Image Generation Route
// app/api/image/route.ts
import { NextRequest, NextResponse } from "next/server";
import { z } from "zod";
import { getOpenAIClient } from "@/lib/ai";
const Schema = z.object({ prompt: z.string().min(5), size: z.enum(["256x256", "512x512", "1024x1024"]).default("512x512") });
export const POST = async (req: NextRequest) => {
try {
const { prompt, size } = Schema.parse(await req.json());
const openai = getOpenAIClient();
const result = await openai.images.generate({ prompt, size });
const url = result.data?.[0]?.url;
return NextResponse.json({ url });
} catch (e: unknown) {
const message = e instanceof Error ? e.message : "Unknown error";
return NextResponse.json({ error: message }, { status: 400 });
}
};

On the client, fetch the URL and render with next/image. Consider content policies and user controls before exposing this feature publicly.
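A minimal client helper for that fetch might look like the sketch below. The fetch implementation is injected only to keep the sketch easy to unit test (pass `window.fetch` in the browser), and the `{ url }` response shape is the one returned by the route above:

```typescript
type ImageSize = "256x256" | "512x512" | "1024x1024";

// Narrow structural type so the helper works with window.fetch or a stub.
type FetchLike = (
  input: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string }
) => Promise<{ ok: boolean; json: () => Promise<unknown> }>;

// Requests an image URL from the /api/image route defined above.
// Returns null on any failure so the caller can render a fallback.
export const requestImageUrl = async (
  fetchImpl: FetchLike,
  prompt: string,
  size: ImageSize = "512x512"
): Promise<string | null> => {
  try {
    const res = await fetchImpl("/api/image", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ prompt, size }),
    });
    if (!res.ok) return null;
    const data = (await res.json()) as { url?: string };
    return data.url ?? null;
  } catch {
    return null;
  }
};
```

Render the returned URL with next/image, and show a placeholder or retry affordance when the helper returns null.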
Server Actions (Optional)
If you prefer server actions instead of API routes:
// app/actions/ask.ts
"use server";
import { getOpenAIClient } from "@/lib/ai";
type AskArgs = { prompt: string };
/** Executes a single‑turn prompt on the server. */
export const ask = async ({ prompt }: AskArgs) => {
if (!prompt?.trim()) return "";
const openai = getOpenAIClient();
const resp = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [
{ role: "system", content: "You are a concise assistant." },
{ role: "user", content: prompt.trim() },
],
temperature: 0.5,
});
return resp.choices[0]?.message?.content ?? "";
};

Server actions keep calls server‑only and can reduce boilerplate for simple flows. For multi‑turn chat and streaming, API routes offer more control.
Security, Limits, and Costs
- Never expose keys: Keep all OpenAI calls on the server.
- Validate inputs: Use zod to bound temperature, size, and content length.
- Rate limit: Consider IP/user throttling (e.g., Redis or a simple in‑memory token bucket) to prevent abuse.
- Observability: Log request IDs and error categories. Add breadcrumbs to debug streaming disconnects.
- Caching: Cache deterministic prompts where acceptable; avoid caching sensitive user inputs.
- SEO: Ensure your pages have accurate metadata and OG images; see Next.js SEO best practices.
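The in‑memory token bucket mentioned above fits in a few lines. Note it is per‑process only (use Redis or similar across multiple instances), and the capacity and refill numbers are illustrative:

```typescript
type Bucket = { tokens: number; last: number };

const buckets = new Map<string, Bucket>();
const CAPACITY = 10;       // maximum burst per key
const REFILL_PER_SEC = 1;  // sustained requests per second

// Returns true if the request identified by `key` (e.g., IP or user id)
// is allowed, consuming one token; false if the bucket is empty.
export const allowRequest = (key: string, now = Date.now()): boolean => {
  const b = buckets.get(key) ?? { tokens: CAPACITY, last: now };
  // Refill proportionally to elapsed time, capped at capacity.
  const elapsedSec = (now - b.last) / 1000;
  b.tokens = Math.min(CAPACITY, b.tokens + elapsedSec * REFILL_PER_SEC);
  b.last = now;
  if (b.tokens < 1) {
    buckets.set(key, b);
    return false;
  }
  b.tokens -= 1;
  buckets.set(key, b);
  return true;
};
```

Call it at the top of a POST handler with the client IP or user id, and return a 429 response when it yields false.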
Deployment Notes
- Edge vs Node: The official SDK supports fetch. If you use Responses API streaming or other specific features, verify compatibility with the Edge runtime; otherwise, stick to runtime = "nodejs".
- Environment management: Use project/host‑level secrets. Avoid bundling .env.local in CI artifacts.
- Cold starts: Keep clients singleton‑ish. Avoid re‑creating SDK clients per token chunk.
- VPS or serverless: If you self‑host, see our guide on deploying Next.js on a VPS to tune Node and reverse proxies.
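The "singleton‑ish" client advice above can be enforced with a tiny lazy initializer. This is a sketch of the general pattern, not tied to any particular SDK; `createClient` in the comment is a hypothetical factory:

```typescript
// Caches the result of an expensive factory so repeated calls
// (e.g., per request or per token chunk) reuse one instance.
export const lazySingleton = <T>(factory: () => T): (() => T) => {
  let instance: T | undefined;
  return () => {
    if (instance === undefined) instance = factory();
    return instance;
  };
};

// Example (hypothetical factory):
// const getClient = lazySingleton(() => createClient());
```

Module‑level state like this survives warm invocations on most serverless platforms, which is exactly what keeps cold‑start cost from being paid on every call.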
Extending with RAG (Grounding on Your Data)
To answer with citations from your docs, add:
- An ingestion pipeline (parse, chunk, embed) into a vector DB.
- A retrieval step before the model call.
- A prompt that requires citing sources.
We provide a complete blueprint and TypeScript snippets in our RAG production guide. Pair this with clean metadata for higher CTR as outlined in Next.js SEO best practices.
Common Pitfalls (and Fixes)
- Mixing client and server concerns - keep model calls server‑only.
- Missing input validation - always zod.parse payloads before calling the model.
- Streaming in the client only - stream from the server and render progressively.
- No backpressure - pause reads if the UI can’t keep up; or buffer minimally.
- Silent failures - return structured errors with messages suitable for end users.
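One way to avoid silent failures is a small mapper that turns whatever was thrown into a structured, user‑safe payload; the error categories and messages below are illustrative:

```typescript
type ApiError = { error: string; status: number };

// Maps arbitrary thrown values to a structured, user-safe error.
// Internal details should be logged server-side, never returned verbatim.
export const toApiError = (e: unknown): ApiError => {
  if (e instanceof Error && e.name === "ZodError") {
    return { error: "Invalid request payload.", status: 400 };
  }
  if (e instanceof Error && /timeout/i.test(e.message)) {
    return { error: "The model took too long to respond.", status: 504 };
  }
  return { error: "Something went wrong. Please try again.", status: 500 };
};
```

In a route handler's catch block, spread the result into the JSON body and status so every failure path produces the same predictable shape.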
Minimal E2E Test Plan
- Non‑stream chat: send 3 prompts and assert non‑empty answers.
- Stream chat: verify incremental updates, final completion, and graceful abort.
- Image route: validate allowed sizes and 400s for invalid prompts.
- Load test: short soak with 10–20 concurrent requests to find obvious bottlenecks.
What to Ship Next
- A polished chat page using your design system; consider optimistic UI and retry.
- Guardrails: input length caps, profanity filter, or moderation endpoint checks.
- Structured outputs (JSON) with zod validation when the UI expects fields.
If you plan to monetize AI features, you might connect billing with a provider like Paddle - see the practical integration notes in Paddle integration for SaaS. Also, consider performance culture and developer efficiency; our piece on Vibe coding with Cursor has practical tips for fast iteration loops.
Conclusion
Integrating OpenAI with Next.js is mostly disciplined web engineering: keep secrets on the server, validate inputs, stream for responsiveness, and log what matters. With the patterns above, you can ship a robust MVP in hours and extend it with grounding (RAG), better UX, and production controls as your app grows. When you’re ready to scale traffic, revisit your hosting setup with our VPS deployment guide and polish your metadata using the SEO checklist.
