Series · Part 15 of 21

Practice
AI Demystified
Abhishek Saha
· 🤖 AI / ML

How Do You Know AI Is Actually Working?

Demos always look good. Production AI degrades silently. Here's the evaluation framework — from exact match to human review — and how to catch hallucinations.

Every AI demo looks impressive. The demo was designed to. The real question is whether it works reliably across all inputs, over time — and for that you need evals.

Evaluating AI: The Eval Hierarchy
Bottom = cheapest, top = most reliable.

Level 4, Human Eval: cost high, speed slow, manual
Level 3, LLM-as-Judge: cost medium, speed moderate, automated
Level 2, Functional Correctness: cost low, speed fast, automated
Level 1, Exact Match: cost free, speed instant, automated

Automatable levels: 75% (3 of 4)
Cost range: free → high
In practice: combine all 4

Start at the bottom of the pyramid (exact match) and work up only as needed. 50 carefully chosen test cases run on every deploy beat 5,000 examples run once.
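
As a rough illustration of "start at the bottom and work up", here is a minimal sketch of a small eval suite that tries exact match first (Level 1) and falls back to a functional check (Level 2) only when a case defines one. Everything in it is an assumption made for the example: the `fake_model` stub stands in for a real model call, and the case format, helper names, and test prompts are hypothetical rather than taken from any particular eval library.

```python
import json
from typing import Callable


def exact_match(expected: str, actual: str) -> bool:
    """Level 1: compare strings after trimming whitespace and lowercasing."""
    return expected.strip().lower() == actual.strip().lower()


def valid_json_with_keys(actual: str, required_keys: set) -> bool:
    """Level 2: the output must parse as a JSON object containing the required keys."""
    try:
        parsed = json.loads(actual)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and required_keys.issubset(parsed)


def grade(case: dict, output: str) -> bool:
    """Try the cheapest check first; use the functional check only if one is defined."""
    if "expected" in case and exact_match(case["expected"], output):
        return True
    check = case.get("check")
    return bool(check and check(output))


def run_suite(model: Callable[[str], str], cases: list) -> tuple:
    """Run every case and return (pass_rate, prompts_that_failed)."""
    failures = [c["prompt"] for c in cases if not grade(c, model(c["prompt"]))]
    pass_rate = 1 - len(failures) / len(cases)
    return pass_rate, failures


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call, with canned answers."""
    canned = {
        "Capital of France?": " paris ",
        "Return the user as JSON.": '{"name": "Ada", "id": 7}',
        "What is 2 + 2?": "5",  # deliberately wrong, so the suite has a failure
    }
    return canned.get(prompt, "")


# A tiny, hand-picked suite; a real one would hold a few dozen such cases.
CASES = [
    {"prompt": "Capital of France?", "expected": "Paris"},
    {
        "prompt": "Return the user as JSON.",
        "check": lambda out: valid_json_with_keys(out, {"name", "id"}),
    },
    {"prompt": "What is 2 + 2?", "expected": "4"},
]

if __name__ == "__main__":
    rate, failed = run_suite(fake_model, CASES)
    print(f"pass rate: {rate:.0%}, failed: {failed}")
```

Running a suite like this on every deploy, and blocking the deploy when the pass rate drops below a threshold you choose, is one way to turn a small fixed set of cases into a regression check. Cases that neither level can grade are the ones to escalate to an LLM judge or a human reviewer.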

Next up: Part 16 starts the Hands-On track — set up your Python environment and run real AI code from scratch.
