How Large Language Models Actually Work: A Plain-English Guide

A clear, jargon-free explanation of how large language models like ChatGPT work, from tokens and training to why they sometimes get things confidently wrong.

Marcus Lin, May 2, 2026, 6 min read

You have almost certainly talked to a large language model this week, even if you did not call it that. It drafted an email, summarized a document, or answered a question that would once have meant a trip to a search engine. Yet most people use these tools daily without any sense of what is happening behind the cursor.

This guide strips away the mystique. No equations, no buzzwords left unexplained, just a working mental model of what a large language model (LLM) is, how it learns, and why it behaves the way it does.

What a Language Model Actually Predicts

At its core, an LLM does one deceptively simple thing: it predicts the next chunk of text. Given everything written so far, it estimates which word, or fragment of a word, is most likely to come next, then repeats that guess over and over to build a full response.
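To make that loop concrete, here is a minimal sketch in Python. The probability table is invented for illustration, and it looks only at the previous token, whereas a real LLM conditions on everything written so far. The names NEXT_TOKEN_PROBS and generate are hypothetical.

```python
# Toy table: for each token, made-up probabilities of what comes next.
# A real model computes these numbers from the whole preceding text.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "mat": 0.2},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"down": 0.8, "<end>": 0.2},
    "down": {"<end>": 1.0},
}


def generate(max_tokens=10):
    """Repeatedly pick the most likely next token until <end> or the limit."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        # Unknown tokens simply end the text in this toy version.
        probs = NEXT_TOKEN_PROBS.get(tokens[-1], {"<end>": 1.0})
        next_token = max(probs, key=probs.get)
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the <start> marker


print(generate())  # ['the', 'cat', 'sat', 'down']
```

Always taking the single most likely token is only one possible strategy. Deployed systems usually sample with some randomness, which is why the same prompt can produce different answers.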

That fragment is called a token. A token is not always a whole word. Common words like "the" are single tokens, while a rarer word like "antidisestablishmentarianism" might be split into several. The model reads and writes in tokens, not letters, which is why it sometimes miscounts the letters in a word but writes flawless paragraphs.
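A rough sketch of how a word gets split, assuming a tiny hand-written vocabulary and a simple "take the longest known piece" rule. Real tokenizers learn their vocabularies from data and use more elaborate algorithms, so treat TOY_VOCAB and tokenize as hypothetical stand-ins.

```python
# Hand-picked vocabulary for illustration only.
TOY_VOCAB = {"the", "anti", "dis", "establish", "ment", "arian", "ism"}


def tokenize(word, vocab):
    """Split a word into the longest vocabulary pieces, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate piece first, then shorter ones.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No known piece starts here: fall back to a single character.
            tokens.append(word[i])
            i += 1
    return tokens


print(tokenize("the", TOY_VOCAB))
# ['the']
print(tokenize("antidisestablishmentarianism", TOY_VOCAB))
# ['anti', 'dis', 'establish', 'ment', 'arian', 'ism']
```

Notice that the model never sees the 28 individual letters of the long word, only six pieces. That is the root of the letter-counting trouble.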

An LLM is not a database of facts. It is a prediction engine that has read so much text that accurate predictions often look exactly like knowledge.

This distinction matters. When the model tells you the capital of France, it is not looking up an entry. It is predicting that, after "The capital of France is," the most probable next token is "Paris," because that pattern appeared countless times in its training data.
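As an illustration, the snippet below turns a handful of raw scores into probabilities using the softmax function, the standard way a model's final scores become a probability for each candidate token. The scores themselves are made up; a real model would produce one score for every token in its vocabulary.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores.values())
    # Subtracting the highest score keeps the exponentials numerically stable.
    exps = {tok: math.exp(s - highest) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Hypothetical scores for what follows "The capital of France is".
scores = {"Paris": 9.0, "Lyon": 4.0, "London": 3.0, "a": 1.0}
probs = softmax(scores)

best = max(probs, key=probs.get)
print(best)  # Paris
```

Nothing in this calculation consults a list of capitals. "Paris" wins purely because its score is highest.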

How the Model Learns: Training in Two Phases

Building a useful LLM happens in stages, and understanding them explains a lot of the model's quirks.

Pretraining: Reading the Internet

In the first phase, the model is shown enormous quantities of text, much of it from the public web, books, and code. It plays a relentless guessing game: hide the next token, predict it, check the answer, adjust. Repeat trillions of times.

Through this process the model gradually tunes billions of internal numbers called parameters. Each parameter is a tiny dial that nudges predictions. Collectively they encode patterns of grammar, style, factual associations, and reasoning shortcuts, none of it programmed by hand.
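The guess, check, adjust loop can be sketched at toy scale. The example below is an assumption-heavy miniature: a nine-word corpus, one parameter per pair of tokens, and a plain gradient step. Real pretraining uses a transformer with billions of parameters, but the shape of the loop is the same. All names here are hypothetical.

```python
import math

corpus = "the cat sat on the mat the cat ate".split()
vocab = sorted(set(corpus))

# One adjustable number ("parameter") per (previous token, next token) pair.
params = {prev: {nxt: 0.0 for nxt in vocab} for prev in vocab}


def predict(prev):
    """Probability of each possible next token, given the previous one."""
    scores = params[prev]
    highest = max(scores.values())
    exps = {tok: math.exp(s - highest) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def train(epochs=200, learning_rate=0.5):
    for _ in range(epochs):
        for prev, actual in zip(corpus, corpus[1:]):
            probs = predict(prev)  # guess
            for tok in vocab:
                target = 1.0 if tok == actual else 0.0  # check
                # Adjust: nudge each dial toward the token that really came next.
                params[prev][tok] -= learning_rate * (probs[tok] - target)


before = predict("the")["cat"]
train()
after = predict("the")["cat"]
print(round(before, 2), "->", round(after, 2))
```

Before training, every token is equally likely after "the". Afterward, "cat" is the favorite, because it followed "the" more often than anything else in the corpus. Nobody wrote that rule down; it emerged from the adjustments.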

Fine-Tuning and Alignment: Learning Manners

A freshly pretrained model is knowledgeable but unruly. It might continue your question with ten more questions, because that is a common text pattern. The second phase teaches it to be helpful.

This usually involves:

Instruction tuning, where the model studies examples of good responses to prompts so it learns to actually answer rather than ramble.
Reinforcement learning from human feedback (RLHF), where people rank competing answers and the model is steered toward the responses humans prefer.
Safety tuning, which discourages harmful, biased, or dangerous outputs.
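One common way to turn human rankings into a training signal is a pairwise preference loss: a scoring model is penalized whenever it rates the answer people rejected above the one they preferred. The sketch below shows only that loss, with made-up scores. It is an illustration of the idea, not a description of any particular vendor's pipeline, and preference_loss is a hypothetical name.

```python
import math


def preference_loss(score_preferred, score_rejected):
    """Small when the preferred answer scores higher, large when it does not."""
    margin = score_preferred - score_rejected
    # Negative log of the sigmoid of the margin.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))


# The scoring model agrees with the human ranking: low loss.
print(round(preference_loss(2.0, 0.0), 3))  # 0.127
# The scoring model disagrees with the human ranking: high loss.
print(round(preference_loss(0.0, 2.0), 3))  # 2.127
```

Training pushes the scoring model to shrink this loss across many ranked pairs, and the language model is then steered toward answers that score well.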

This alignment step is why the assistant you use feels polite and on-task, while the raw model underneath is closer to an autocomplete engine on steroids.

The Engine Room: Attention and the Transformer

The breakthrough that made modern LLMs possible is an architecture called the transformer, introduced by Google researchers in the 2017 paper "Attention Is All You Need." Its key trick is a mechanism called attention.

Attention lets the model weigh how relevant every earlier word is to the word it is about to generate. When processing the sentence "The trophy did not fit in the suitcase because it was too big," attention helps the model figure out that "it" refers to the trophy, not the suitcase. The model learns these relationships statistically, by seeing similar sentences millions of times.
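Here is a bare-bones sketch of that weighing step, known as scaled dot-product attention. Each word is represented by a short list of numbers; the ones below are invented by hand so that "it" lines up with "trophy". In a real model these vectors are learned during training, have many more dimensions, and many attention layers are stacked on top of each other.

```python
import math


def softmax(xs):
    highest = max(xs)
    exps = [math.exp(x - highest) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Weigh each earlier word by how well its key matches the query."""
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in keys
    ]
    weights = softmax(scores)
    # The output is a blend of the values, mixed according to the weights.
    output = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, output


words = ["trophy", "suitcase"]
keys = [[1.0, 0.2], [0.2, 1.0]]    # made-up vectors for the earlier words
values = [[1.0, 0.0], [0.0, 1.0]]  # made-up content carried by each word
query_for_it = [2.0, 0.5]          # made-up vector for the word "it"

weights, output = attention(query_for_it, keys, values)
for word, weight in zip(words, weights):
    print(word, round(weight, 2))
```

With these numbers, "trophy" receives the larger share of attention, so the information flowing into "it" comes mostly from the trophy.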

What makes transformers special is that, during training, they can process an entire passage in parallel rather than one word at a time. (When generating a reply, the model still emits tokens one after another.) That parallelism is what allowed training to scale up to today's massive models on modern hardware.

The Context Window: The Model's Short-Term Memory

Every LLM has a context window, the maximum amount of text it can consider at once, measured in tokens. Think of it as working memory. Anything inside the window, your prompt, the documents you paste, the conversation so far, can influence the answer. Anything that falls outside is simply gone.
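A simplified picture of that cutoff, assuming the application just keeps the most recent tokens when the conversation grows too long. Real products handle overflow in different ways, and fit_to_window is a hypothetical helper.

```python
def fit_to_window(tokens, window_size):
    """Keep only the most recent `window_size` tokens."""
    if window_size <= 0:
        return []
    return tokens[-window_size:]


conversation = ["my", "name", "is", "Ada", ".", "what", "is", "my", "name", "?"]
visible = fit_to_window(conversation, 6)
print(visible)  # ['.', 'what', 'is', 'my', 'name', '?']
```

With a window of six tokens, "Ada" has already slipped out of view, so the model has nothing to base a correct answer on.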

Early models could hold a few thousand tokens, roughly a long article. Newer ones stretch to hundreds of thousands or even millions, enough to read an entire book. But a larger window is not free: it costs more computation, and models can still lose track of details buried in the middle of very long inputs.

Crucially, the base model does not remember you between conversations. Any sense of long-term memory comes from software layers that store and re-inject relevant text into the window, not from the model itself learning about you on the fly.
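In sketch form, such a memory layer can be as simple as a list of saved notes that gets pasted in front of each new message. Everything here is hypothetical, including the names and the prompt wording; actual products differ in what they store and how they choose what to re-inject.

```python
# Notes kept by the surrounding software, not by the model itself.
saved_notes = []


def remember(note):
    saved_notes.append(note)


def build_prompt(user_message):
    """Put saved notes back into the context window ahead of the new message."""
    lines = ["Notes saved from earlier conversations:"]
    lines += ["- " + note for note in saved_notes]
    lines += ["", "User: " + user_message]
    return "\n".join(lines)


remember("Prefers metric units")
print(build_prompt("How far is a marathon?"))
```

The model's parameters never change in this process. It appears to remember only because the note is sitting inside the window again.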

Why LLMs Make Confident Mistakes

The single most important thing to understand is hallucination, the industry term for when a model states something false with total confidence.

This is not a bug bolted on by accident. It is a direct consequence of how the system works. The model is optimized to produce plausible-sounding text, and a fluent, confident wrong answer can be statistically more "likely" than an awkward admission of ignorance. The model has no built-in fact-checker and no sense of certainty in the human sense.

A few practical consequences follow:

Verify anything that matters. Names, dates, statistics, legal or medical claims, and citations all deserve independent checking.
Be wary of specifics it could not know. If you ask about a private document it never saw, a detailed answer is a red flag, not a feature.
Give it the source. Models are far more reliable when you paste the relevant material into the prompt rather than relying on their memory.

Techniques like retrieval-augmented generation (RAG), where the system first fetches real documents and then asks the model to answer using them, can substantially reduce hallucination by grounding responses in actual sources. They do not eliminate it, so the advice to verify still applies.
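The two steps of RAG, fetch and then ask, can be sketched with nothing more than word overlap. This is a deliberately crude stand-in: production systems typically search with learned embeddings rather than shared words, and the documents, function names, and prompt wording below are all invented for illustration.

```python
import re

DOCUMENTS = [
    "The refund window is 30 days from delivery.",
    "Support is available on weekdays from 9 to 5.",
    "Shipping takes 3 to 5 business days.",
]


def words(text):
    """Lowercase a text and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question, documents, k=1):
    """Return the k documents sharing the most words with the question."""
    question_words = words(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & words(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_grounded_prompt(question, documents):
    """Paste the retrieved text into the prompt so the model can rely on it."""
    sources = "\n".join(retrieve(question, documents))
    return (
        "Answer using only the sources below.\n"
        "Sources:\n" + sources + "\n"
        "Question: " + question
    )


print(build_grounded_prompt("How long is the refund window?", DOCUMENTS))
```

The resulting prompt would then be sent to the model. Because the relevant sentence is now inside the context window, the model can answer from the source instead of from its own recollection.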

What These Models Are Good and Bad At

A clear mental model helps you delegate sensibly. LLMs excel at tasks that are about transforming and structuring language: summarizing, rewriting, translating, drafting, brainstorming, and explaining concepts. They are genuinely useful coding assistants and tireless first-draft generators.

They struggle, by contrast, with anything requiring precise calculation, up-to-the-minute facts they were not trained on, or genuine logical guarantees. They can reason impressively in many cases, but that reasoning is pattern-matched, not proven, so it can collapse on unusual problems.

Treat a language model like a brilliant, fast, slightly overconfident intern: wonderful for first drafts, never the final word.

The frontier is moving fast. Models that can call tools, browse, run code, and check their own work are blurring some of these limits. But the underlying nature, prediction over tokens, remains the foundation, and knowing it makes you a sharper, safer user.

The Bottom Line

Large language models are next-token prediction engines, trained on vast text and then shaped to be helpful through human feedback. The transformer's attention mechanism lets them track relationships across a passage, while the context window defines what they can "see" at any moment. Their fluency is real, but so are their hallucinations, because plausibility, not truth, is what they optimize for. Use them for drafting, transforming, and exploring ideas, supply your own sources when accuracy counts, and always verify the claims that matter. Understand the prediction machine, and it becomes one of the most powerful tools in your kit.

#large-language-models #artificial-intelligence #machine-learning #explainers