Chunking for LLMs: How to Format Content for AI Retrieval
One of the most under-discussed concepts in GEO is also one of the most fundamental: chunking. It's the process of breaking content into the smaller units AI systems actually retrieve and reuse. Every AI engine that pulls from your content does it through chunks, and the way your content is structured determines whether those chunks are clean, useful, and citable, or fragmented, ambiguous, and skipped.
If you're writing content for AI systems and don't know what chunking is, you're optimizing in the dark. Here's the practical guide.
What chunking actually is
Chunking is what RAG (retrieval-augmented generation) systems do every time they need to find content from a corpus. The system takes your full document, splits it into smaller units (chunks), embeds each chunk into a vector representation, and stores those embeddings. When a user asks a question, the system retrieves the most semantically similar chunks and feeds them to the LLM as context for the answer.
The chunks are the unit of retrieval. They're what gets pulled. They're what gets cited. They're what gets quoted back to users. Whatever ends up in a chunk is what AI systems can use; whatever falls between chunks is invisible.
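The split-embed-retrieve loop above can be sketched in a few lines. This is a toy illustration, not a production system: the `embed` function below is a bag-of-words stand-in for a real learned embedding model, and all names are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy stand-in for a learned embedding: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(chunks, query, k=1):
    # Embed every chunk once, then rank chunks by similarity to the query.
    index = [(chunk, embed(chunk)) for chunk in chunks]
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(pair[1], q), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

chunks = [
    "Chunking splits a document into retrievable units.",
    "Semantic HTML gives chunkers explicit boundaries.",
]
print(retrieve(chunks, "what is chunking?"))
```

The point of the sketch is the shape of the pipeline: the query is never matched against the full document, only against the chunks, so the chunk is the only unit the model ever sees.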
The Strapi guide on GEO content strategy puts the implication directly: AI optimization requires "modular content chunking" as a core priority for RAG systems. The way your content is structured determines the chunks the AI ends up with.
The core principle of good chunks
Both Pinecone and Weaviate, two leading vector database providers with deep practical experience tuning chunking for production RAG systems, converge on the same guiding principle:
"If a chunk of text makes sense to a human reading it alone, it will make sense to the language model too."
That's the single most useful test you can run on any chunk. Take it out of context. Read it cold. Does it stand on its own? Does it answer a question? Does it make a coherent point without needing the rest of the document? If yes, it's doing its job. If no, the chunk is fragile, and the AI will either pull it incorrectly or skip it entirely.
This is also why answer-first writing, self-contained sections, and entity-rich paragraphs all matter. They're the writing-level techniques that produce good chunks at the retrieval level.
The major chunking strategies AI systems use
You don't need to be a vector database engineer to understand these, but knowing them helps you write content that survives any of them. Pinecone and Weaviate together describe several primary methods:
1. Fixed-size chunking. The simplest and most common approach: pick a token count (commonly 256, 512, or 1024 tokens) and split the document into equal-sized chunks. As Pinecone notes, "fixed-sized chunking will be the best path in most cases" because it's predictable, fast, and works across content types. The risk is that chunks can split mid-sentence or mid-thought, fragmenting meaning.
2. Content-aware chunking. Splits at sentence boundaries, paragraph boundaries, or recursive character-level separators (paragraph break, then sentence, then word). This preserves natural language boundaries but produces variable chunk sizes.
3. Document-structure chunking. Uses the document's native formatting (markdown headers, HTML tags, code blocks) to identify chunk boundaries. This is the chunking method most relevant to GEO, because it works with content authored the way most articles already are.
4. Semantic chunking. Splits the document at meaning boundaries detected by embedding similarity. When two consecutive paragraphs are semantically distant, the system creates a chunk boundary between them.
5. Hierarchical chunking. Creates multiple abstraction layers (broad summaries at one level, granular details at another) so the system can retrieve different levels of detail depending on the query.
The key insight: document-structure chunking is the strategy your content can directly influence. Write with clean headings, semantic HTML, and self-contained sections, and you make your content chunk well under any of these strategies.
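Document-structure chunking is also the easiest strategy to picture in code. Here is a minimal sketch that splits a markdown document at H2 boundaries, keeping each heading with its body; the function name and sample text are illustrative.

```python
import re

def chunk_by_headers(markdown):
    # Split at H2 boundaries ("## " at line start), keeping each
    # heading together with the body text that follows it.
    parts = re.split(r"(?m)^(?=## )", markdown)
    return [p.strip() for p in parts if p.strip()]

doc = """## What chunking is
Chunking splits documents into retrievable units.

## Why structure matters
Clean headings give chunkers explicit boundaries.
"""
for chunk in chunk_by_headers(doc):
    print(chunk, "\n---")
```

Notice that the chunker never has to guess: every H2 you write becomes an explicit split point, which is exactly why clean heading hierarchies matter.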
Chunk size recommendations
The most common chunk sizes in production AI systems land in two ranges. Pinecone recommends "smaller chunks (e.g., 128 or 256 tokens)" for granular information and "larger chunks (e.g., 512 or 1024 tokens)" for retaining context. Weaviate suggests starting with "a chunk size of 512 tokens and a chunk overlap of 50-100 tokens" as a baseline.
The Strapi guide cites Azure's documentation as the most practical default for content optimization: "512-1024 tokens with 20% overlap for semantic coherence."
What does this mean for content writers? Two things:
- Aim for sections roughly 300-700 words long (which corresponds to roughly 400-900 tokens depending on writing style). This gives chunkers a natural unit to grab.
- Make sure each section is self-contained at that length. A 600-word section that depends on context from earlier in the document will produce a 600-word chunk that doesn't stand alone.
The 20% overlap means consecutive chunks share roughly 20% of their content, which helps the system handle queries that span chunk boundaries. But the overlap is something the chunker handles automatically; you just have to write sections that hold together.
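To make the size-and-overlap numbers concrete, here is a sketch of fixed-size chunking with overlap. The token list is a simple stand-in for real tokenizer output, and the parameter names are illustrative.

```python
def chunk_with_overlap(tokens, size=512, overlap=100):
    # Slide a window of `size` tokens, stepping size - overlap each time,
    # so consecutive chunks share exactly `overlap` tokens.
    step = size - overlap
    # Stop before emitting a final chunk fully contained in the previous one.
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

tokens = [f"t{i}" for i in range(1200)]  # stand-in for a tokenized document
chunks = chunk_with_overlap(tokens, size=512, overlap=100)
print(len(chunks), [len(c) for c in chunks])
```

With a 1,200-token document, a 512-token window, and 100 tokens of overlap, you get three chunks; the last 100 tokens of each chunk reappear at the start of the next, which is what keeps boundary-spanning queries retrievable.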
Use semantic HTML to give chunkers explicit boundaries
Document-structure chunking depends on your markup. If your content has a clean H1-H2-H3 hierarchy with article and section tags marking the boundaries between major content blocks, the chunker has clear places to split. If your content is one giant div with no semantic structure, the chunker has to guess.
The Strapi guide reinforces this: AI systems require "semantic HTML5 elements (article, section tags) and strict heading hierarchies (H1 → H2 → H3)" to parse content systematically. The semantic markup gives the chunker a roadmap; without it, the chunker is operating blind.
Two practical rules:
- Use H2s as natural chunk boundaries. Each H2 section should be a complete, self-contained unit that could stand alone as a chunk.
- Use article and section tags to mark major content blocks. The article tag wraps the entire piece; section tags mark each major H2-delimited part inside it.
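A chunker that honors these section boundaries can be sketched with Python's standard-library HTML parser. This is a toy, assuming a flat article/section layout; the class name and sample markup are illustrative.

```python
from html.parser import HTMLParser

class SectionChunker(HTMLParser):
    # Collects the text inside each top-level <section> as one chunk.
    def __init__(self):
        super().__init__()
        self.chunks, self.depth, self.buffer = [], 0, []

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "section" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.chunks.append(" ".join(self.buffer))
                self.buffer = []

    def handle_data(self, data):
        # Only text inside a <section> counts toward a chunk.
        if self.depth and data.strip():
            self.buffer.append(data.strip())

page = """<article>
  <section><h2>What chunking is</h2><p>Chunking splits documents.</p></section>
  <section><h2>Why it matters</h2><p>Chunks are the unit of retrieval.</p></section>
</article>"""

parser = SectionChunker()
parser.feed(page)
print(parser.chunks)
```

With the section tags present, the chunk boundaries are unambiguous; strip them out and the same parser would see one undifferentiated block of text, which is the "one giant div" failure described above.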
Write sections that survive being chunked
The most important writing-level rule for chunkability: each section should make sense in isolation. This means:
- No "as we discussed above" or other backward references that depend on earlier sections
- No "we'll cover this in the next section" or forward references that depend on what comes after
- Entity names spelled out rather than relying on pronouns whose antecedents live in other sections
- Key claims stated directly rather than implied based on context the chunk doesn't have
- The opening sentence of each section being a clean, complete answer to the section's question
This is the same advice as answer-first writing, self-contained sections, and entity-rich content, and it converges here for a reason. All of those techniques produce better chunks. Better chunks produce better citations.
Avoid the chunking failure modes
Some common content patterns produce chunks that perform poorly:
- Long unbroken paragraphs: the chunker can't find natural splits, so it splits mid-sentence and produces fragmented chunks
- Short, fragmentary sections: too small to be self-contained, they get merged with neighboring sections, producing chunks that mix unrelated topics
- Heavy use of inline formatting (footnotes, sidebars, callouts): the chunker may merge or split these unpredictably
- Tables embedded in prose without clear semantic markup: the chunker often fragments tables across chunks, breaking the data
- Lists without clear container elements: items get separated from their parent context
None of these are catastrophic individually. Combined, they produce content that's much harder to retrieve cleanly than equivalent content with better structure.
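The first failure mode is easy to demonstrate. A naive fixed-size splitter run over one long unbroken paragraph cuts wherever the limit falls, with no regard for sentence boundaries; the function and sample text below are illustrative.

```python
def fixed_split(text, size=80):
    # Naive fixed-size character splitting: ignores sentence boundaries
    # entirely, so cuts can land in the middle of a word or a thought.
    return [text[i:i + size] for i in range(0, len(text), size)]

paragraph = ("A single long unbroken paragraph gives the chunker no natural "
             "split points, so a fixed-size splitter cuts wherever the limit "
             "falls, often in the middle of a sentence.")
for piece in fixed_split(paragraph):
    print(repr(piece))
```

Each printed piece is a chunk a retrieval system might surface on its own; the mid-sentence fragments are exactly the kind of chunk that fails the standalone test.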
Test chunkability before publishing
The most useful exercise: the standalone-section read. Take any H2 section from your draft. Copy it out of the document. Read it on its own. Ask:
- Does it make sense without the surrounding context?
- Does it answer a specific question?
- Does it contain enough entity names that it doesn't rely on antecedents from elsewhere?
- If an AI extractor pulled just this section as the answer to a user query, would the user get value from it?
If the answer to all four is yes, the section will chunk well under almost any chunking strategy. If any answer is no, fold the missing context inline before publishing.
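Part of this check can even be automated. A crude linter can flag the backward and forward references that make a section depend on context it won't have as a chunk; the phrase list below is a small illustrative sample, not an exhaustive one.

```python
import re

# Phrases that signal a section depends on context outside itself.
RED_FLAGS = [
    r"\bas (?:we )?discussed above\b",
    r"\bas mentioned earlier\b",
    r"\bin the next section\b",
    r"\bsee above\b",
]

def chunkability_warnings(section):
    # Return every cross-reference phrase found in the section text.
    return [p for p in RED_FLAGS if re.search(p, section, re.IGNORECASE)]

good = "Chunking splits a document into self-contained retrievable units."
bad = ("As we discussed above, chunk size matters; "
       "we cover overlap in the next section.")
print(chunkability_warnings(good))  # no warnings
print(chunkability_warnings(bad))
```

A script like this catches the mechanical failures; the judgment calls, like whether a section actually answers a question, still need the cold read.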
Chunkability is GEO at the molecular level
Most GEO advice focuses on page-level and site-level decisions: schema, sitemap, internal linking, content strategy. Chunkability is the same thinking applied at the section level, making each section of content a discrete unit that can be retrieved, understood, and cited on its own.
Use semantic HTML so chunkers have clear boundaries. Aim for sections in the 300-700 word range. Lead each section with a self-contained answer. Eliminate backward and forward references. Use entity names instead of pronouns. Test each section by reading it cold.
Stop writing for "the article" and start writing for "the chunk that will be retrieved." Same content. Slightly different mental model. Substantially more citations.