Lesson 1 of 6
What a language model actually does
What you will learn
- Describe a language model as a next-token predictor in your own words
- Trace a prompt → response, token by token
- Distinguish "pattern-matching over text" from "understanding"
Strip the marketing away and an LLM is doing one thing: predicting the next chunk of text given everything that came before it. That sounds small. It is not. This lesson builds the mental model that everything else in this course hangs from.
The one-sentence explanation
A large language model (LLM) is a next-token predictor. You give it text. It reads every piece of that text, then produces a probability distribution over what the next piece of text should be, picks one, appends it, and repeats. That is the entire loop. Every conversation you have ever had with ChatGPT, Claude, or Gemini is this loop running thousands of times.
Let us make that concrete. Suppose you type:
The capital of France is
The model has seen billions of documents during training. Many of them contain the phrase "The capital of France is Paris." So when it computes the probabilities for the next token, "Paris" gets a very high score. The model picks it. Now the text reads "The capital of France is Paris" and the model predicts again. This time a period (.) is highly likely. Then perhaps a newline. And so on, token by token, until the response is complete.
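In code, a single prediction step looks roughly like this. The probability numbers below are made up for illustration; they do not come from any real model, and for simplicity we just pick the highest-probability token (how selection really works is the topic of Lesson 4):

```python
# Hypothetical next-token probabilities after "The capital of France is".
# Illustrative values only; a real model assigns a probability to every
# token in a vocabulary of tens of thousands.
next_token_probs = {
    " Paris": 0.92,
    " a": 0.03,
    " the": 0.02,
    " located": 0.01,
    # ...the rest of the vocabulary shares the remaining probability mass
}

# Greedy selection: pick the single highest-probability token.
best_token = max(next_token_probs, key=next_token_probs.get)
print(repr(best_token))  # ' Paris'
```

The model then appends the chosen token and runs the same computation again on the longer sequence.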
What "training" actually produced
During training, the model read a massive portion of the internet and books and code. But it did not memorize facts the way a database stores rows. Instead, it learned statistical patterns: which tokens tend to follow which other tokens, across billions of contexts. The result is not a lookup table. It is a neural network with billions of parameters that encodes incredibly rich patterns about language, logic, facts, and style — all compressed into those weights.
Think of it this way: the model never stored "Paris is the capital of France" as a discrete fact. Instead, it adjusted its internal parameters so that, given the pattern "The capital of France is ___", the token "Paris" gets a high probability. The distinction matters because the model has no way to flag which of its patterns are true and which are plausible-sounding nonsense. We will explore that problem deeply in Lesson 3.
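A bigram counter is a vastly simplified stand-in for a neural network, but it captures the idea of learning "which tokens tend to follow which" rather than storing facts as rows. Everything here (the toy corpus, the counting scheme) is an assumption for illustration:

```python
from collections import Counter, defaultdict

# Toy "training": count which word follows which across a tiny corpus.
# Real models adjust billions of neural weights instead of counting,
# but both move probability toward continuations seen in the data.
corpus = (
    "the capital of france is paris . "
    "the capital of france is paris . "
    "the capital of spain is madrid . "
).split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

# "is" was followed by "paris" twice and "madrid" once, so "paris"
# gets the higher probability; no discrete fact was ever stored.
counts = follows["is"]
total = sum(counts.values())
probs = {word: n / total for word, n in counts.items()}
print(probs)  # paris ≈ 0.67, madrid ≈ 0.33
```

Note that this toy model cannot tell you whether "paris" is *true*; it only knows the pattern was frequent. That is exactly the limitation the paragraph above describes.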
The generation loop, step by step
Here is what happens every time you press Enter:
- Your full prompt becomes the input sequence.
- The model processes every token in that sequence through its neural network layers.
- It outputs a probability for every possible next token in its vocabulary (tens of thousands of options).
- One token is selected from that distribution (how it is selected is the topic of Lesson 4 on temperature).
- That token is appended to the sequence.
- Steps 2-5 repeat until the model produces a stop token or hits a length limit.
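The steps above can be sketched as a short loop. The `predict_next` function here is a hypothetical stand-in for the model; the toy version below just completes a fixed phrase:

```python
def generate(prompt_tokens, predict_next, stop_token, max_tokens=100):
    """Autoregressive generation loop (a sketch; `predict_next` stands in
    for a real model that returns one token given the full sequence)."""
    sequence = list(prompt_tokens)
    for _ in range(max_tokens):          # repeat until done or limit hit
        token = predict_next(sequence)   # model picks one next token
        if token == stop_token:          # stop token ends the response
            break
        sequence.append(token)           # append; output feeds back as input
    return sequence

# A toy "model": completes a fixed phrase, then signals a stop.
phrase = ["The", "capital", "of", "France", "is", "Paris", ".", "<stop>"]
def toy_predict(seq):
    return phrase[len(seq)] if len(seq) < len(phrase) else "<stop>"

print(" ".join(generate(phrase[:5], toy_predict, "<stop>")))
# The capital of France is Paris .
```

Notice that `generate` never sees the prompt and the model's own output as different things: after the first iteration, the sequence it feeds back in already contains tokens the model produced itself.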
This is called autoregressive generation. "Autoregressive" just means "each output feeds back as input for the next step." The model literally reads its own previous output as part of the context for generating the next token. This is why early mistakes in a response can cascade — the model builds on what it has already written.
Pattern completion, not understanding
Here is the most important insight in this lesson: the model does not "understand" your question the way you understand it. It does not have beliefs, intentions, or a world model in the human sense. What it has is an extraordinarily sophisticated pattern-completion engine trained on more text than any human could read in a thousand lifetimes.
But here is the twist: the patterns are so complex, so layered, and so deeply encoded that the output is often indistinguishable from genuine understanding. When Claude explains a subtle bug in your code, it is not "understanding" the code the way a senior engineer does. It is completing a pattern that looks like: "given this code and this error, a helpful explanation would say..." And yet the explanation is often correct and useful.
This is not a reason to dismiss LLMs. It is a reason to use them with clear eyes. The practical consequences are:
- When the pattern is well-represented in training data (common programming languages, well-known facts, standard formats), the output is remarkably reliable.
- When the pattern is rare, novel, or contradicted by plausible-looking alternatives in the training data, the output can be confidently wrong.
Knowing this distinction is the single most valuable skill for working with AI effectively. It tells you when to trust and when to verify.
Why this matters for you
Understanding next-token prediction changes how you prompt. If the model is a pattern-completion engine, then your prompt is the beginning of a pattern. A clear, well-structured prompt creates a pattern that the model can complete well. A vague, ambiguous prompt creates a pattern that could go anywhere — and often does.
This also explains why context matters so much. The model does not remember previous conversations (unless they are included in the current context). It does not learn from your corrections between sessions. Every conversation starts fresh, with only the tokens in the current window to work from. We will explore this in the next lesson on tokens and context windows.
What is next
Now that you have the core mental model — next-token prediction — we need to understand the units it operates on. In the next lesson, we will learn what tokens actually are, why Arabic and English tokenize differently, and what happens when your conversation exceeds the model's context window.
The large language model: a next-token predictor
Strip away the marketing layer and a large language model does one thing: it reads all the text in front of it, then computes probabilities for the next token. It picks a token, appends it to the text, and repeats. Every conversation you have had with Claude or ChatGPT is this loop running thousands of times. There is no "understanding" in the human sense, only a pattern-completion engine trained on more text than any human could read in a thousand lifetimes.
But the patterns are so complex and deeply layered that the output often looks identical to genuine understanding. When the patterns are well represented in the training data (common programming languages, well-known facts, standard formats), the result is remarkably reliable. When the pattern is rare or novel, the model can be confidently wrong.
Understanding this mechanism changes how you work with AI: your prompt is the beginning of a pattern, and the clearer and better structured it is, the better the model completes it. This is the foundation the rest of the course builds on.
Try it yourself
Start a generation and read it token by token. Try to predict the next word before it appears. Notice when you succeed and when you fail: this is the same task the model performs.
Reflect
After reading this lesson, what has changed in how you interpret a model's answer? Does knowing that it is next-token prediction increase your trust or decrease it?