There’s a common misconception about page-level GEO: many people think it’s just another technical layer on top of what they’re already doing. One more JSON-LD schema, a few more tags, and the matter’s settled.

That’s not what it’s about. Page-level GEO is first and foremost a different way of writing. The technique follows; it doesn’t replace the writing.

Here’s what’s new, point by point.

The front-loading principle: answer first, development later

This is the most concrete change, and the one that most challenges editorial habits.

Large language models structurally give more weight to what appears at the beginning of a document. This is a technical reality linked to how attention works in these architectures: tokens in early positions have a greater influence on the representation of the text as a whole. In practice, if the answer to the question your page poses sits in the third paragraph, after a contextual introduction and an outline of the plan, you lose a significant part of your advantage.

What it looks like in practice: after the H1, a maximum of two lines of framing, then the direct answer. Not “in this article, we’ll look at…” – the answer. One or two sentences, self-contained, understandable out of context. This TL;DR positioned at the top is what Perplexity, ChatGPT and AI Overviews extract first when they synthesize your page.
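To make the principle concrete, here is a minimal sketch of that opening structure in markup; the title and wording are invented for illustration:

<article>
  <h1>How do I structure a page for GEO?</h1>

  <!-- Maximum two lines of framing, no “in this article, we’ll look at…” -->
  <p>Most pages bury their answer under an introduction. GEO inverts the habit.</p>

  <!-- The direct answer: one or two sentences, self-contained, understandable out of context -->
  <p><strong>Open with the answer itself: H1, at most two lines of framing, then one or
  two self-contained sentences that resolve the query the page targets.</strong></p>

  <!-- The development follows, under question-form headings -->
  <h2>Why does position in the document matter?</h2>
</article>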

The rest of the content develops, argues and nuances. But the substance is already there, on first reading.

This inversion of the usual narrative structure is counter-intuitive for anyone trained in academic or essay writing, disciplines where you build towards the conclusion. Journalists will recognize it instead: it’s the inverted pyramid of news writing, where the lead carries the essential and the details follow. For GEO, you conclude first, then build.

H2s and H3s: write headings as real questions

The role of intermediate headings has evolved. They no longer serve only to structure human reading; they also signal to models the intents covered by each section.

A title like “Introduction” or “Development” says nothing. A title like “How do LLMs decide what to quote?” says precisely what the section contains – and corresponds to the natural form of a conversational query.

The practical rule: rephrase each H2 into a question or statement that could be typed as is into ChatGPT. “What is GEO?”, “Why is schema.org not enough?”, “How do I structure a page for Perplexity?” – these formulations create a direct bridge between search intent and your content.

One H1 per page. Never two. Multiple H1s send a signal of semantic disorganization that crawlers interpret as “everything is equally important, so nothing is”.
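Applied to this very subject, a compliant outline could look like this; the titles are invented for illustration:

<h1>What is page-level GEO?</h1>

<h2>What is GEO?</h2>
<h2>Why is schema.org not enough?</h2>
<h2>How do I structure a page for Perplexity?</h2>
<h3>Where should the direct answer go?</h3>
<h3>How long should each paragraph be?</h3>

One h1, and every h2 or h3 phrased as a query a user could type as is.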

Paragraph modularity: an imperative, not a stylistic preference

This is where the tension between human writing and algorithmic parsing is most apparent.

LLMs build their answers by assembling fragments. They take a piece from your page, a piece from another source, a third elsewhere, and synthesize. For your fragment to be usable, it must stand on its own: an idea, expressed clearly, without depending on the preceding paragraph to be understood.

Target: 40 to 60 words per paragraph. One idea per block. High factual density – proper nouns, verifiable figures, concrete examples. No intro “fluff” like “it’s important to note that…” or “as we can see…”. These formulas eat up words without contributing anything.

A simple test to check the modularity of a paragraph: extract it and read it on its own. If it needs the context of the previous paragraph to make sense, it’s not self-contained enough.

A caveat: modularity does not mean dryness. A text chopped into identical 50-word blocks is monotonous for the human reader and, paradoxically, scores worse on some engagement criteria. Varying sentence length, mixing very short sentences with longer ones, gives a rhythm that works for humans and also registers on the burstiness metrics some AI detectors use to tell human writing from machine output.

The formats LLMs cite most, and why

Formats are not created equal. Some are systematically over-represented in generative answers. In order of what is observed in practice:

Centralized industry-statistics pages come out on top. A single place to find sourced figures, with date and methodology: that’s exactly what Perplexity is looking for when it answers “what are X’s trends in 2026?”. If you produce this page and keep it updated quarterly, you’re creating a durable citation asset.

Guides in numbered steps with descriptive titles for each step. Not “Step 1”, but “Step 1: Audit your strategic pages for JavaScript rendering”. Numbering plus description turns the content into a directly extractable sequence (a markup sketch follows below).

End-of-page FAQs. As long as they’re written in natural language, answer questions that people actually ask (not disguised marketing questions), and provide self-contained answers in 2 to 4 sentences.

Comparison tables with explicit criteria. They can be scanned both by the human in a hurry and by the algorithm seeking to compare options for a user.
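Here is a minimal markup sketch of the two most extractable formats, numbered steps and end-of-page FAQ; the second step title and the question are invented for illustration:

<h2>How do I audit a page for GEO?</h2>
<ol>
  <li>
    <h3>Step 1: Audit your strategic pages for JavaScript rendering</h3>
    <p>…</p>
  </li>
  <li>
    <h3>Step 2: Move the direct answer into the opening paragraph</h3>
    <p>…</p>
  </li>
</ol>

<h2>Frequently asked questions</h2>
<h3>Does schema.org replace well-structured content?</h3>
<p>No. Markup reinforces a page that is already clear; it does not make up for
missing factual density or a buried answer.</p>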

What doesn’t work so well: long narrative content without intermediate semantic anchors. A 3,000-word article in continuous prose, with no question headings, no lists, no initial summary, will be cited less often even if it is excellent in substance.

Information gain: the only real question

Google has filed a patent on the concept of “information gain”. The idea: does this page provide something that the rest of the web doesn’t already say?

It’s an uncomfortable question because the honest answer, for many pages, is no.

If your article on “GEO 2026 best practices” says the same thing as the ten others published this quarter – better worded, with better headlines, but the same background – it doesn’t create information gain. It creates additional noise in an already saturated space. LLMs, trained on this corpus, have already integrated this information. They don’t need your version to explain it to their users.

What creates information gain: your proprietary data, field observations that contradict or qualify the consensus, documented real-life experience with quantified results, a clear-cut position on a subject where everyone else is saying the same thing in a generic way.

A concrete example: if you’ve analyzed 50 URLs in your sector and found that 80% of them unintentionally block at least one major AI crawler – publish that figure with the methodology. This isn’t a complex academic study. It’s a documented field observation. And it’s exactly this kind of data that Perplexity cites when answering questions about GEO visibility.

Methodological transparency is an integral part of the value. Saying “we analyzed 50 URLs”, with the method used and the limitations of the analysis, is worth far more in credibility for LLMs than an unsourced assertion, even a correct one.

schema.org: useful as reinforcement, useless as a substitute

Google has officially confirmed that Gemini uses structured data to improve its understanding of content; the announcement was made at Search Central Live in Madrid. It’s a long-awaited validation.

But we need to understand exactly what that means, and what it doesn’t mean.

LLMs understand natural language. They don’t need schema to understand that a page is about a product, a person or an FAQ. What schema provides is contextual precision where plain text can be ambiguous: a machine-readable layer that removes ambiguity of interpretation.

The Organization schema on each page of your site anchors your entity in knowledge graphs. The Person schema on your author pages links your experts to a network of recognizable entities. The FAQPage schema on your question-and-answer content optimizes extraction. The HowTo schema on your step-by-step guides facilitates sequence parsing.
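As an illustration, here is what FAQPage markup could look like for the FAQ question sketched earlier; the wording is invented, and the snippet sits in a script tag anywhere in the page:

<!-- Illustrative example: the question and answer are invented -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "Does schema.org replace well-structured content?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "No. Markup reinforces a page that is already clear; it does not make up for missing factual density or a buried answer."
    }
  }]
}
</script>

The visible FAQ text and the markup must say the same thing: the schema reinforces, it does not replace.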

Where schema won’t save you: if your information architecture is chaotic, if your content is vague and lacking in factual density, if your pages don’t clearly answer an identifiable intent. Markup is a quality multiplier. It doesn’t turn mediocre content into good content.

The audio transcription test

It’s a rapid diagnostic I use regularly in audits.

Take any page you want to optimize. Read it aloud, as if you were dictating it. Where do you stumble? Where is the meaning lost without the visual context – a table that lacks a label, a list whose introduction is separated by a subheading, a reference to “as we have seen previously” without the “previously” being accessible in this piece?

Every point of friction for the ear is also a point of friction for a language model that parses text sequentially. Narrative breaks, unmarked digressions, elements whose meaning depends on a non-textual visual context – all these degrade parsability in measurable ways.

The test also works the other way round: a text that “sounds good” read aloud, fluent, with sentences of varying length, natural transitions and well-defined ideas, is generally a text that also performs well in algorithmic extraction.

It’s not an absolute rule. But as a quick heuristic for identifying priority pages for reworking, it’s reliable.

What LLMs don’t do naturally (and you can)

There are characteristic tropes in AI-generated text. The most common: “In today’s digital landscape…”, “The future of X is…”, “It’s important to note that…”. The same tics exist in every language; French readers will recognize “Dans le paysage numérique actuel”, “Il convient de noter que”, “Cette transformation radicale de”.

These formulas are not mistakes; they’re just statistically over-represented in LLM output because they frequently appeared in the training corpora as openers to informative paragraphs. The models reproduce them by statistical inertia.

What LLMs don’t do spontaneously without intensive creative prompting: real-life anecdotes with precise and incongruous details, contextual humor that presupposes shared knowledge, direct contradiction of a received idea in the sector (“contrary to what you read everywhere, X doesn’t work like that”), and assumed regrets or mistakes (“we advised this approach for two years before realizing that…”).

These elements are not stylistic embellishments. They are signals of authenticity that human readers perceive, that AI detectors look for, and that LLMs value as sources because they bring something they can’t synthesize themselves.

Use them. Not artificially: if you don’t have a field anecdote on the subject, don’t invent one. But if you’ve learned something instructive while working on these issues, it belongs in the article, not in your internal slides or in a client conversation.

GEO metrics at page level: what’s being measured today

Let’s be honest about the state of the tools: precise measurement of GEO visibility at page level is still a work in progress in 2026.

The most reliable method is manual. List 15 to 20 target queries for each strategic page. Test them on ChatGPT, Perplexity and Claude, with systematic screenshots. Note whether your page is cited, whether the URL appears, whether content is paraphrased without attribution. Repeat every month. It’s tedious. It’s also the only way to get reliable data on the evolution of your presence.

In addition, track Google Search Console impressions on the informational queries corresponding to your pages (to identify whether demand exists), and the branded queries associated with the page’s subject (to measure whether your content is starting to generate awareness beyond direct traffic).

Some tools are beginning to automate this tracking – Otterly.ai, Profound, and a few modules in development at Semrush. None yet have the maturity of classic SEO rank tracking tools. Integrate them if you have access to them, but don’t regard them as a source of absolute truth.

What doesn’t change in measurement: organic traffic remains a useful indicator, provided it is disaggregated by type of intent. A page that loses traffic on generic informational queries while gaining on specific branded queries is often repositioning itself successfully for GEO: LLMs absorb the former, while users who are specifically looking for you arrive through the latter.