Lesson 6 of 6
The model landscape today
What you'll learn
- Place the major model families on a single mental map
- Match a task to a tier (small / mid / frontier)
- Stay sane as new models ship monthly
New models drop every month. If you try to keep up with every release, you will burn out. This lesson gives you a durable framework: think in tiers, not names. Match the task to the tier. Let the specific model names change — your decision-making stays stable.
The major families
As of April 2026, three major commercial families — Anthropic, OpenAI, and Google — dominate the landscape, with a vibrant open-source ecosystem alongside them. Here is the map.
Anthropic (Claude)
Anthropic offers three tiers:
- Claude Opus 4.7 — The frontier model. Strongest reasoning, deepest analysis, best at complex multi-step tasks. 1M token context window. Pricing: $5 per million input tokens, $25 per million output tokens. Use for: hard coding problems, research synthesis, complex document analysis, tasks where accuracy on the first try saves more money than the token cost.
- Claude Sonnet 4.6 — The mid-tier workhorse. Very strong capabilities at lower cost. 1M token context window. Pricing: $3 per million input tokens, $15 per million output tokens. Use for: daily professional work, writing, code review, summarization, most tasks where Opus-level reasoning is not strictly necessary.
- Claude Haiku 4.5 — The speed-and-cost champion. Fast, cheap, surprisingly capable for its size. 200K token context window. Pricing: $1 per million input tokens, $5 per million output tokens. Use for: high-volume automation, classification, simple extraction, chat applications where latency and cost matter more than peak reasoning.
OpenAI (GPT)
OpenAI's flagship is GPT-4o, a strong frontier-class model competitive with Claude Opus on many tasks. GPT-4o-mini serves the small/cheap tier. OpenAI also offers o-series models (o3, o4-mini) with extended "thinking" capabilities for math and reasoning-heavy tasks. Their pricing and context windows vary; check current documentation for specifics.
Google (Gemini)
Gemini 2.5 Pro is Google's frontier offering, with a 1M token context window and strong multimodal capabilities (text, image, video, audio). Gemini 2.5 Flash is their fast/cheap option. Google's models tend to excel at multimodal tasks and have competitive pricing, especially for developers already in the Google Cloud ecosystem.
Open source
The open-source ecosystem is thriving:
- Meta's Llama (Llama 4) — Available in multiple sizes, competitive with commercial mid-tier models. Free to use with generous licensing.
- Mistral — European AI lab with strong, efficient models. Mistral Large competes at the mid-tier level.
- DeepSeek — Chinese lab producing surprisingly capable models, especially in coding and math.
Open-source models give you full control: you can run them on your own hardware, fine-tune them freely, and avoid per-token API costs. The trade-off is infrastructure complexity and typically lower peak capability than the top commercial models.
Think in tiers, not names
The most durable skill you can build is tier-based thinking. Instead of memorizing which model is best at what (a moving target), learn to categorize tasks by the level of capability they require:
Frontier tier — Tasks requiring deep reasoning, complex multi-step logic, nuanced judgment, or where accuracy on the first attempt is critical. These tasks justify the highest cost because errors are expensive.
Examples: debugging a subtle concurrency bug, synthesizing a legal argument from multiple documents, writing production code for complex business logic, analyzing a research paper and identifying methodology flaws.
Mid tier — The bulk of professional knowledge work. Tasks that require strong language understanding and generation but not maximum reasoning depth.
Examples: drafting emails and documents, code review, summarizing meeting notes, translating content, most chat-based assistance, content creation, data analysis with clear instructions.
Small/fast tier — High-volume tasks where speed and cost matter most. Tasks that are relatively simple but need to be done thousands or millions of times.
Examples: classifying support tickets, extracting structured data from invoices, routing queries to the right department, simple Q&A from a knowledge base, generating short summaries, input validation.
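To make the taxonomy concrete, here is a minimal sketch of a static task-to-tier routing table. The task categories and their tier assignments mirror the examples above; the `TASK_TIER` dict and `tier_for` helper are illustrative names, not a library API.

```python
# A static task-to-tier routing table. The assignments mirror the example
# tasks listed above; names are illustrative, not a real API.

TASK_TIER = {
    # Frontier tier: errors are expensive; first-try accuracy matters.
    "debug_concurrency_bug": "frontier",
    "synthesize_legal_argument": "frontier",
    # Mid tier: the bulk of professional knowledge work.
    "draft_document": "mid",
    "code_review": "mid",
    "summarize_meeting_notes": "mid",
    # Small/fast tier: simple, high-volume, latency- and cost-sensitive.
    "classify_support_ticket": "small",
    "extract_invoice_fields": "small",
    "route_query_to_department": "small",
}

def tier_for(task_type: str) -> str:
    """Look up a task's tier; default to the cheapest and escalate later,
    per the matching rule in the next section."""
    return TASK_TIER.get(task_type, "small")
```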
The matching rule
Here is the rule: use the cheapest model tier that solves your task reliably. Not the most powerful. Not the most impressive. The cheapest one that works.
This matters more than most people realize. Across the landscape, the cost difference between tiers runs 5-25x. Take the Claude pricing above: Opus costs exactly five times Haiku per token, so a production system handling 10,000 requests per day that runs $50/day on Haiku would run $250/day on Opus for the same workload. At scale, the right model selection is one of the highest-leverage cost decisions you will make.
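To see the arithmetic concretely, here is a minimal cost sketch using the Claude per-million-token prices listed earlier in this lesson. The request volume and per-request token counts are illustrative assumptions, not measurements.

```python
# Estimated daily spend per tier. Prices are the Claude per-million-token
# rates listed above; the workload numbers are illustrative assumptions.

PRICING = {  # USD per million tokens: (input, output)
    "haiku": (1.00, 5.00),
    "sonnet": (3.00, 15.00),
    "opus": (5.00, 25.00),
}

def daily_cost(tier: str, requests: int, in_tokens: int, out_tokens: int) -> float:
    """Daily cost for `requests` calls of `in_tokens` in / `out_tokens` out."""
    in_price, out_price = PRICING[tier]
    per_request = (in_tokens * in_price + out_tokens * out_price) / 1_000_000
    return per_request * requests

# Assumed workload: 10,000 requests/day, ~2,000 tokens in, ~600 tokens out.
for tier in PRICING:
    print(f"{tier:>6}: ${daily_cost(tier, 10_000, 2_000, 600):,.2f}/day")
# -> haiku: $50.00/day, sonnet: $150.00/day, opus: $250.00/day
```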
The matching process is simple (a code sketch follows the list):
- Start with the cheapest tier (Haiku-class).
- Test on 20-30 representative examples from your real workload.
- If quality is sufficient, ship it.
- If not, move up one tier and test again.
- Only use frontier models for tasks that demonstrably fail at lower tiers.
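Here is a minimal sketch of that loop. `call_model` is a hypothetical stand-in for your provider's API client, and `is_good_enough` is a task-specific quality check you would write yourself; the 90% pass-rate threshold is an illustrative assumption.

```python
from dataclasses import dataclass

@dataclass
class Example:
    prompt: str
    expected: str

# Cheapest first, matching the escalation order described above.
TIERS = ["haiku-class", "sonnet-class", "opus-class"]

def call_model(tier: str, prompt: str) -> str:
    """Hypothetical stand-in: wire up your provider's API client here."""
    raise NotImplementedError

def is_good_enough(output: str, expected: str) -> bool:
    """Hypothetical task-specific check: define 'sufficient quality' here."""
    raise NotImplementedError

def cheapest_sufficient_tier(examples: list[Example], pass_rate: float = 0.9) -> str:
    """Walk up the tiers; return the first whose pass rate clears the bar."""
    for tier in TIERS:
        passed = sum(
            is_good_enough(call_model(tier, ex.prompt), ex.expected)
            for ex in examples
        )
        if passed / len(examples) >= pass_rate:
            return tier
    return TIERS[-1]  # nothing cheaper sufficed; the task needs the frontier
```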
Staying sane in a fast-moving landscape
A new model announcement every month can feel overwhelming. Here is how to stay grounded:
Ignore the hype, test the claims. When a new model drops, do not reorganize your stack based on a blog post. Run it on your actual tasks and measure. Most "breakthrough" models are incremental improvements on specific benchmarks.
Re-evaluate quarterly, not weekly. Set a calendar reminder every three months to check if a newer, cheaper model can handle tasks you currently run on an expensive one. Between evaluations, do not chase every release.
Build model-agnostic systems. Design your applications so that swapping the underlying model is a configuration change, not a rewrite. Use abstraction layers. Keep your prompts in version control. This gives you the freedom to upgrade with minimal risk.
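As a sketch of what "configuration change, not a rewrite" can look like: the model ID lives in config, and call sites only name a tier. The model IDs below are illustrative stand-ins derived from the names in this lesson, and `complete` is a hypothetical wrapper around whichever SDK you use.

```python
import os

# Tier-to-model mapping lives in configuration, not in call sites.
# The IDs below are illustrative stand-ins; pin whatever your provider ships.
DEFAULT_MODELS = {
    "small": "haiku-4.5",
    "mid": "sonnet-4.6",
    "frontier": "opus-4.7",
}

def model_for(tier: str) -> str:
    """Resolve a tier to a model ID, with an environment-variable override
    so upgrades are a deploy-time config change, not a code change."""
    return os.environ.get(f"MODEL_{tier.upper()}", DEFAULT_MODELS[tier])

def complete(tier: str, prompt: str) -> str:
    """Hypothetical wrapper: every call site names a tier, never a model."""
    model = model_for(tier)
    raise NotImplementedError(f"call your SDK with model={model!r} here")
```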
The names will change. The tiers will not. Six months from now, the specific model names in this lesson may be outdated. But there will still be frontier models, mid-tier workhorses, and fast/cheap options. Your tier-based decision framework will still work.
Course wrap-up
You have now built a complete mental model for working with LLMs as an informed practitioner:
- Lesson 1: LLMs are next-token predictors — sophisticated pattern completion, not understanding.
- Lesson 2: Tokens are the currency, the context window is the memory.
- Lesson 3: Hallucinations are inherent — build verification habits, not blind trust.
- Lesson 4: Temperature controls the creativity-accuracy trade-off.
- Lesson 5: Start with prompting, escalate to RAG, fine-tune only when necessary.
- Lesson 6: Think in tiers, match tasks to capability levels, stay model-agnostic.
You are not an ML engineer after this course, and you do not need to be. You are someone who understands how these tools work well enough to make smart decisions about when to use them, how to use them, and when not to trust them. That puts you ahead of most people building with AI today.
The model map: think in tiers, not names
As of April 2026, the landscape comprises three major commercial families — Anthropic (Claude), OpenAI (GPT), and Google (Gemini) — alongside a thriving open-source ecosystem (Llama, Mistral, DeepSeek). Each family offers models at several levels: frontier (strongest reasoning, highest cost), mid-tier (the daily workhorse), and small/fast (high-volume automation at minimal cost).
The most durable skill is thinking in tiers, not names. The rule: use the cheapest tier that handles your task reliably. Start small, test on a real sample, and escalate only when the lower tier demonstrably falls short. The cost difference between tiers runs 5-25x — in large-scale production, this is one of the highest-impact budget decisions you will make.
The names will change every few months. The tiers will not. Build your systems so that swapping models is a configuration change rather than a rewrite, and re-evaluate your choices every three months. That keeps you effective without burning out chasing every new release.
Try it yourself
Pick three real tasks from your past week. For each, choose the cheapest model tier that solves it well. Run the tasks on those models. Were you right, or did you over- or under-estimate what you needed?
Reflect
Do you tend to default to the most powerful model available, or do you actively match models to tasks? What would change if you adopted a tiered approach?