Lesson 5 of 6
Fine-tuning vs. prompting
What you'll learn
- Compare prompting, RAG, and fine-tuning honestly
- Choose the right lever for a real use case
- Estimate the cost, complexity, and time-to-value of each
Most teams reach for fine-tuning when prompting would have shipped in an afternoon. This lesson gives you a clean decision tree for the three main levers of AI customization — and it might save you months of unnecessary work.
Three levers, one question
When you want an AI model to behave differently — to match your tone, to know your company's data, to follow your specific format — you have three main options. They differ dramatically in cost, speed, and complexity. Understanding when to use each one is one of the most valuable practical skills in AI engineering.
Let us walk through them from simplest to most complex.
Lever 1: Prompting
Prompting is the simplest and fastest lever. You write instructions in the system prompt or user message, and the model adjusts its behavior accordingly. No training. No infrastructure. No waiting.
What prompting can do:
- Set tone and style ("Write in a formal academic tone")
- Enforce format ("Respond in JSON with these exact keys")
- Provide role context ("You are a senior tax advisor specializing in French tax law")
- Add constraints ("Never recommend specific stocks" or "Always include a disclaimer")
- Include few-shot examples (show 2-3 examples of ideal input/output pairs)
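To make these levers concrete, here is a minimal sketch of a single request that combines a role, format constraints, and two few-shot examples. The message structure follows the chat format that most model APIs share; the provider-specific client call is omitted, and the tax-advisor scenario is purely illustrative.

```python
# A chat-style request combining role context, format constraints,
# and two few-shot examples. Only the message structure is shown;
# the client call itself is provider-specific.
system_prompt = (
    "You are a senior tax advisor specializing in French tax law. "
    "Respond in JSON with exactly two keys: 'answer' and 'disclaimer'. "
    "Never recommend specific stocks."
)

few_shot_examples = [
    {"role": "user", "content": "Can I deduct home-office costs?"},
    {"role": "assistant", "content": '{"answer": "Yes, under conditions...", '
                                     '"disclaimer": "Not individual tax advice."}'},
    {"role": "user", "content": "Should I buy Company X shares to cut my taxes?"},
    {"role": "assistant", "content": '{"answer": "I cannot recommend specific stocks...", '
                                     '"disclaimer": "Not individual tax advice."}'},
]

messages = [{"role": "system", "content": system_prompt},
            *few_shot_examples,
            {"role": "user", "content": "How are capital gains taxed in France?"}]
```

Changing any part of this structure and re-running it is the whole iteration loop, which is why time to value is measured in minutes.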
What prompting costs: nearly nothing. You are paying per-token for the system prompt, but that is cents, not dollars. Iteration time is instant — change the prompt, run it again, see if it improved.
What prompting cannot do well: it cannot give the model knowledge it does not already have. If you need the model to know about your company's internal documents, product catalog, or proprietary data, prompting alone will not get you there (unless the documents fit in the context window).
Time to value: minutes to hours.
Lever 2: RAG (Retrieval-Augmented Generation)
RAG solves the knowledge problem. Instead of trying to teach the model everything during training, you fetch relevant documents at runtime and include them in the prompt. The model reads your data and responds based on it.
How RAG works in practice:
- You store your documents in a searchable database (often a vector database).
- When a user asks a question, you search your database for relevant documents.
- You inject those documents into the prompt alongside the user's question.
- The model generates a response grounded in your actual data.
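Below is a deliberately tiny sketch of that loop. A production pipeline would use an embedding model and a vector database; here, a naive word-overlap score stands in for vector search so the retrieve-then-inject step stays visible in a few lines. All document names and contents are illustrative.

```python
# A toy RAG loop: retrieve the most relevant documents, then build a
# grounded prompt. Word overlap stands in for real vector search.
documents = {
    "returns.md": "Customers may return items within 30 days with a receipt.",
    "shipping.md": "Standard shipping takes 3-5 business days within the EU.",
    "warranty.md": "All electronics carry a two-year manufacturer warranty.",
}

def score(query: str, text: str) -> int:
    """Crude relevance: count query words that also appear in the document."""
    doc_words = set(text.lower().split())
    return sum(w.strip("?.,") in doc_words for w in query.lower().split())

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k highest-scoring document names for this query."""
    ranked = sorted(documents, key=lambda name: score(query, documents[name]),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Inject retrieved documents into the prompt alongside the question."""
    context = "\n\n".join(f"[{name}]\n{documents[name]}" for name in retrieve(query))
    return ("Answer using only the documents below. Cite the file name.\n\n"
            f"{context}\n\nQuestion: {query}")

print(build_prompt("How long do I have to return an item?"))
```

Note that the model itself is untouched: updating a document updates what the model "knows" on the very next request.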
What RAG can do:
- Give the model access to your proprietary data without training
- Keep information current (update documents, and the model "knows" the latest version)
- Provide citations (the model can point to the specific document it used)
- Scale to enormous knowledge bases (millions of documents)
What RAG costs: more than prompting, less than fine-tuning. You need a vector database, an embedding model, and a retrieval pipeline. Good open-source tools exist for all of these. The main cost is engineering time to build and maintain the pipeline.
What RAG cannot do well: it cannot change the model's fundamental behavior or style. RAG gives the model information, but it does not change how the model processes that information. If you need the model to consistently adopt a very specific writing style or reasoning pattern, RAG will not help.
Time to value: days to weeks.
Lever 3: Fine-tuning
Fine-tuning means further training the model on your specific data. You provide hundreds or thousands of example input/output pairs, and the model adjusts its internal weights to match those patterns. The result is a new version of the model that permanently behaves differently.
What fine-tuning can do:
- Deeply change the model's style and tone in ways prompting cannot
- Teach the model to consistently follow complex formats
- Improve performance on specific, narrow tasks
- Reduce prompt size (behavior baked into weights does not need prompt instructions)
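For a sense of what "hundreds or thousands of example input/output pairs" looks like on disk, here is a sketch of a training file in the chat-style JSONL format that several providers accept for fine-tuning. Field names vary by provider, and the brand-voice scenario is hypothetical, so treat the shape as illustrative rather than a specific API contract.

```python
import json

# Each line of the training file is one complete example: the same
# system prompt, a realistic user input, and the ideal output you want
# baked into the weights. Field names vary by provider.
examples = [
    {"messages": [
        {"role": "system", "content": "You write in the AcmeCo brand voice."},
        {"role": "user", "content": "Announce our new billing dashboard."},
        {"role": "assistant", "content": "Say hello to clearer billing. ..."},
    ]},
    {"messages": [
        {"role": "system", "content": "You write in the AcmeCo brand voice."},
        {"role": "user", "content": "Apologize for yesterday's outage."},
        {"role": "assistant", "content": "We let you down yesterday. ..."},
    ]},
    # ...a real project needs hundreds to thousands more of these.
]

with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```

Writing, reviewing, and maintaining that file is where most of the cost described below actually lives.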
What fine-tuning costs: significantly more. You need training data (hundreds to thousands of high-quality examples), compute resources for training, evaluation infrastructure to measure improvement, and ongoing maintenance as base models update. A fine-tuning project typically takes weeks to months.
What fine-tuning cannot do well: it cannot inject large amounts of factual knowledge reliably. Fine-tuning on your company's documentation does not create a reliable knowledge base — the model may still hallucinate about your data. For knowledge, RAG is almost always better.
Time to value: weeks to months.
The decision tree
When you face a new AI use case, work through these questions in order:
Question 1: Can I get the result with a better prompt?
Try prompting first. Write a clear system prompt. Add few-shot examples. Test it on 10-20 real inputs. If it works well enough, stop. You are done. Ship it.
Most people skip this step or do it poorly. They write a two-sentence prompt, see mediocre results, and conclude that prompting is not enough. In reality, a well-crafted system prompt with 3-5 examples often outperforms a poorly executed fine-tune.
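"Test it on 10-20 real inputs" can be as unceremonious as the sketch below: a list of real cases, a pass/fail check per case, and a score. The `generate` function is a placeholder for whatever model call you use, and the JSON check mirrors the format constraint from the earlier prompting example.

```python
import json

def generate(prompt: str, user_input: str) -> str:
    """Placeholder for your actual model call (any provider)."""
    raise NotImplementedError

def is_valid(output: str) -> bool:
    """Example check: output must be JSON with exactly the keys we asked for."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return set(data) == {"answer", "disclaimer"}

real_inputs = ["How are capital gains taxed?", "Can I deduct childcare costs?"]
# ...extend to 10-20 inputs pulled from real traffic before trusting the score.

def pass_rate(prompt: str) -> float:
    """Fraction of real inputs whose output passes the check."""
    results = [is_valid(generate(prompt, x)) for x in real_inputs]
    return sum(results) / len(results)
```

If the pass rate is high enough on real inputs, you have your answer to Question 1.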
Question 2: Does the model need access to my specific data?
If the model needs to answer questions about your documents, products, or internal knowledge — and that knowledge does not fit in a single prompt — you need RAG. Build a retrieval pipeline. This is the right answer for customer support bots, documentation assistants, internal knowledge bases, and any system where the data changes regularly.
Question 3: Do I need the model to consistently behave in a way that prompting cannot achieve?
This is a high bar. Fine-tuning is the right choice when you have tried serious prompt engineering, it is not enough, and you have the data and resources to do it properly. Real examples include: matching a very specific brand voice across thousands of outputs, performing a narrow classification task at production scale where every token of prompt costs money, or teaching the model a specialized reasoning pattern for your domain.
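The three questions compress into a few lines of Python. Treat this as a mnemonic for the ordering, not a library:

```python
def choose_lever(prompting_good_enough: bool,
                 needs_my_data: bool,
                 data_fits_in_prompt: bool,
                 behavior_beyond_prompting: bool) -> str:
    """The decision tree, in order: prompt first, then RAG, then fine-tune."""
    if prompting_good_enough:
        return "prompting"                  # Question 1: ship it
    if needs_my_data and not data_fits_in_prompt:
        return "RAG"                        # Question 2: build retrieval
    if behavior_beyond_prompting:
        return "fine-tuning"                # Question 3: a high bar
    return "better prompting"               # default: iterate on the prompt
```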
The 95% rule
Here is the uncomfortable truth: approximately 95% of AI use cases in production today are solved by prompting alone, or by prompting plus RAG. Fine-tuning is powerful but rarely necessary. The teams that reach for it too early waste months building training datasets and evaluation pipelines for problems that a senior prompt engineer could have solved in a day.
Start simple. Exhaust the simpler options. Escalate only when you have evidence that the simpler approach is insufficient.
What is next
You now understand the three levers for customizing AI behavior. The final lesson zooms all the way out: given all these models — Claude, GPT, Gemini, open source — how do you choose the right one for your task? We will build a practical decision framework that survives the monthly model releases.
Three levers for customizing AI behavior
When you want to change a model's behavior, you have three options, ordered from simplest to most complex. Prompting is the cheapest and fastest: you write instructions in the system prompt and the model adapts immediately. It costs cents and takes minutes. RAG (retrieval-augmented generation) solves the knowledge problem: you fetch your documents at runtime and inject them into the prompt, so the model answers based on your actual data. Building it takes days to weeks. Fine-tuning permanently changes the weights of the model itself. It is powerful, but it needs hundreds or thousands of examples and weeks to months of work.
The rule: roughly 95% of use cases are solved by prompting alone or by prompting plus RAG. Start simple. Exhaust the simple options. Escalate only when you have evidence that the simpler approach is not enough. Teams that jump to fine-tuning early waste months building training data for problems that a well-crafted prompt could have solved in a single day.
Try it yourself
Take a real use case from your own work. Sketch its solution in three ways: pure prompting, RAG, and fine-tuning. Compare the cost, time, and effort. Which one wins?
Reflect
Have you ever over-engineered a solution when a simpler one would have been enough? How does that intuition apply to the prompting-RAG-fine-tuning spectrum?