How AI Trains Itself — And What It Actually Produces

In Part 1, we built a digital brain.
We watched an email turn into numbers, saw neurons fire, and witnessed a decision emerge. But there was a glaring problem: that brain was a "blank slate." Its weights were random, and its decisions were essentially coin flips.
So, how does a machine go from guessing blindly to blocking 100 million spam emails a day with near-perfect accuracy?
The answer isn't magic. It's a process of failing, measuring exactly how much you failed, and shifting just enough to do better next time.
If you’re just joining us, here is the 30-second recap: a neural network is just layers of artificial neurons. They take in data, multiply it by "importance scores" (weights), and pass the result forward to make a final call. It’s a simple structure that becomes incredibly powerful at scale. Part 1 covers this in more detail.
But when we left off, our spam detector was useless. It had the hardware, but no experience. Today, we put it to work.
The Goal: Generalization, Not Memorization
Before we talk about how AI learns, let's talk about what it's trying to achieve. Because this is where most people get it wrong.
The goal of training is not to memorize the training data.
Think about a student preparing for an exam. If they just memorize last year's questions word-for-word, they'll fail the moment a new question appears. But if they understand the underlying concepts, they can answer questions they've never seen before.
That's exactly what we want from our spam detector:
❌ Wrong goal (memorization / overfitting):
"I saw this exact email before — it's spam."
→ Fails on new emails it hasn't seen.
✅ Right goal (generalization):
"Emails with these patterns tend to be spam."
→ Catches new spam it has never seen before.
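In AI, we call this the difference between overfitting and generalization. An overfit model scores very high on the emails it trained on but noticeably lower on test emails it has never seen; a model that generalizes scores well on both.
The test accuracy is the real one: it's how the model performs on emails it has never seen. That's the number that matters.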
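To make the gap between training accuracy and test accuracy concrete, here is a minimal sketch. The emails, the `memorizer` lookup table, and the `pattern_rule` function are all made up for illustration; a real spam filter would not be built this way.

```python
# Hypothetical toy data: (email text, label). Not from a real dataset.
train = [
    ("win free iphone click now", "SPAM"),
    ("team meeting tomorrow at 9am", "SAFE"),
    ("congratulations you won free cash", "SPAM"),
    ("here is the q3 report you asked for", "SAFE"),
]
test = [
    ("claim your free prize now", "SPAM"),
    ("lunch meeting moved to noon", "SAFE"),
]

# Memorizer: a lookup table of emails it has seen; guesses SAFE otherwise.
memory = dict(train)

def memorizer(email):
    return memory.get(email, "SAFE")

def words(email):
    return set(email.split())

# Pattern rule: flag any email containing a word seen only in spam.
spam_words = set().union(*(words(e) for e, y in train if y == "SPAM"))
safe_words = set().union(*(words(e) for e, y in train if y == "SAFE"))
spam_only = spam_words - safe_words

def pattern_rule(email):
    return "SPAM" if words(email) & spam_only else "SAFE"

def accuracy(model, data):
    return sum(model(e) == y for e, y in data) / len(data)

print("memorizer    train:", accuracy(memorizer, train), "test:", accuracy(memorizer, test))
print("pattern rule train:", accuracy(pattern_rule, train), "test:", accuracy(pattern_rule, test))
```

Both approaches are perfect on the training emails. Only the pattern rule catches the new spam email, and the memorizer gets the new safe email right purely because SAFE is its default guess.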
The Training Data: What We Feed It
To train our spam detector, we start with thousands of emails that humans have already labeled:
| Email | Label |
|-----------------------------------------|-------|
| "Team meeting tomorrow at 9am" | SAFE |
| "Win FREE iPhone! Click now!" | SPAM |
| "Verify your password immediately" | THREAT|
| "Here's the Q3 report you asked for" | SAFE |
| "Congratulations! You won $$$" | SPAM |
| "Your account has been compromised" | THREAT|
| ... (thousands more) | ... |
This is called supervised learning — we supervise the training by providing the correct answers alongside each input. The network learns by comparing its guesses to those correct answers.
There are other ways AI can learn (without labels, or through trial and error), but supervised learning is the foundation of most real-world AI systems — including the one protecting your inbox.
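In code, "providing the correct answers" usually means pairing each input with a target. Here is a small sketch, assuming the three classes are ordered SAFE, SPAM, THREAT and that the target is encoded as a one-hot vector (a common convention, not something Part 1 specified).

```python
LABELS = ["SAFE", "SPAM", "THREAT"]

# A few of the labeled examples from the table above.
dataset = [
    ("Team meeting tomorrow at 9am", "SAFE"),
    ("Win FREE iPhone! Click now!", "SPAM"),
    ("Verify your password immediately", "THREAT"),
]

def one_hot(label):
    """Target vector: 1.0 for the correct class, 0.0 for the others."""
    return [1.0 if name == label else 0.0 for name in LABELS]

targets = [one_hot(label) for _, label in dataset]

for (email, label), target in zip(dataset, targets):
    print(f"{label:6} {target}  {email}")
```

The target vector is what "probability should be 1.0" means for the correct class: the network's output is compared against it during training.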
The Training Loop: Four Steps Repeated Thousands of Times
Here's where the real work happens. Training is not complicated in concept: it's one loop of four steps (forward pass, calculate loss, backpropagation, update weights), run over and over until the network gets good.
Let's walk through each step with our spam detector.
Step 1 — Forward Pass: The Network Makes a Guess
We feed an email through the network and let it produce an answer.
Email: "Verify your password immediately — your account is at risk"
Features: [0, 0, 0, 1, 1, 0, 1, 0] ← password, urgency, link
Network guess (early in training, weights still random):
SAFE: 70% ← it thinks this is probably safe
SPAM: 20%
THREAT: 10%
Correct answer: THREAT
That's a terrible guess. The network is very wrong. Good — that's how learning starts.
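The forward pass above can be sketched in a few lines. This is a simplified version that assumes a single layer (no hidden layers) and a softmax at the end to turn scores into percentages; the random weights and the `forward` helper are illustrative, so the exact percentages will differ from the ones shown above.

```python
import math
import random

LABELS = ["SAFE", "SPAM", "THREAT"]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(weights, biases, features):
    """One score per class: weighted sum of the features plus a bias."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)

random.seed(0)  # fixed seed so the "random" guess is repeatable
features = [0, 0, 0, 1, 1, 0, 1, 0]  # the feature vector from the example above
weights = [[random.uniform(-1, 1) for _ in features] for _ in LABELS]
biases = [0.0 for _ in LABELS]

probs = forward(weights, biases, features)
for name, p in zip(LABELS, probs):
    print(f"{name}: {p:.0%}")
```

With untrained weights, the three percentages are essentially arbitrary. That is the "coin flip" starting point.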
Step 2 — Calculate Loss: How Wrong Was It?
We now calculate exactly how wrong the network was. This number is called the loss (or error).
The bigger the loss, the worse the guess. The goal of all training is to make this number as small as possible.
Correct answer: THREAT (probability should be 1.0)
Network said: THREAT = 0.10
Loss = how far off we were = very high ❌
Think of loss like the distance between where your arrow landed and where the bullseye is. The training loop exists to reduce that distance, shot by shot.
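To put a number on "very high", here is a sketch using cross-entropy loss. That choice is an assumption on my part: it is a common loss for classification, but the idea works the same with other loss functions. The `better_guess` values are hypothetical.

```python
import math

def cross_entropy(probs, correct_index):
    """Loss is the negative log of the probability given to the correct class."""
    return -math.log(probs[correct_index])

THREAT = 2  # index of THREAT in [SAFE, SPAM, THREAT]

early_guess = [0.70, 0.20, 0.10]   # the guess from Step 1
better_guess = [0.05, 0.05, 0.90]  # hypothetical guess later in training

print(round(cross_entropy(early_guess, THREAT), 3))   # 2.303
print(round(cross_entropy(better_guess, THREAT), 3))  # 0.105
```

A confident correct answer gives a loss close to 0. A confident wrong answer is punished heavily, because the log of a small probability is a large negative number.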
Step 3 — Backpropagation: Finding the Culprits
This is the most important step — and the one with the scariest name.
Backpropagation simply means: work backwards through the network to figure out which weights caused the mistake.
Think of it like a manager reviewing a failed project:
"The output was wrong. What decision in the output layer caused that? What inputs to that layer were off? What did the hidden layers feed it? Where did it go wrong at the source?"
The network traces the error backward, layer by layer, calculating: "If I had changed this weight by a small amount, would the loss have gone up or down?"
That gradient — that direction of change — is the signal the network uses to learn.
Backpropagation traces back and finds:
→ weight of "password" was too LOW (should be high — it's a red flag)
→ weight of "urgency" was too LOW (should be high — classic threat)
→ weight of "trusted domain" was ok
→ Adjust accordingly next step...
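The "would the loss go up or down?" question has an exact answer. The sketch below assumes a single-layer network with softmax and cross-entropy loss, where the gradient for each weight is simply (predicted probability minus target) times the input feature. Real backpropagation applies the same chain-rule idea through every hidden layer; this only shows the last step. Treating index 3 as the "password" feature is also an assumption.

```python
import math

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def loss(weights, features, correct):
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return -math.log(softmax(scores)[correct])

def gradients(weights, features, correct):
    """Gradient of the loss for every weight: (p_k - y_k) * x_j."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    probs = softmax(scores)
    return [
        [(p - (1.0 if k == correct else 0.0)) * x for x in features]
        for k, p in enumerate(probs)
    ]

features = [0, 0, 0, 1, 1, 0, 1, 0]
weights = [[0.1 * (k + 1)] * len(features) for k in range(3)]  # arbitrary start
THREAT = 2
PASSWORD = 3  # assumed position of the "password" feature

grads = gradients(weights, features, THREAT)

# Sanity check: nudge one weight a tiny amount and measure the loss change.
eps = 1e-6
nudged = [row[:] for row in weights]
nudged[THREAT][PASSWORD] += eps
numeric = (loss(nudged, features, THREAT) - loss(weights, features, THREAT)) / eps

print("analytic gradient:", grads[THREAT][PASSWORD])
print("numeric estimate: ", numeric)
```

The gradient for the THREAT "password" weight comes out negative, which means raising that weight lowers the loss. That is the code version of "the weight was too LOW". Features that were 0 in this email get a gradient of 0, so they take no blame.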
Step 4 — Update Weights: Learn From the Mistake
Now we actually fix the weights. A component called the optimizer nudges each weight in the direction that would have reduced the loss.
Before update (random): After update (slightly smarter):
w(password) = 0.12 → w(password) = 0.38
w(urgency) = 0.09 → w(urgency) = 0.29
w(free) = 0.23 → w(free) = 0.51
The nudges are tiny — controlled by a number called the learning rate. Too large a nudge and the network overshoots and becomes unstable. Too small and it learns painfully slowly.
Then we go back to Step 1 with the next email. And the next. And the next.
One full pass through all training emails is called an epoch. We run hundreds or thousands of epochs.
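The effect of the learning rate is easy to see on a toy problem. This sketch minimizes a made-up loss, `w**2`, whose gradient is `2*w`; it stands in for a real network's loss only to show the three behaviors described above.

```python
def descend(learning_rate, steps=20, start=1.0):
    """Minimize loss(w) = w**2 by repeatedly stepping against the gradient 2*w."""
    w = start
    for _ in range(steps):
        gradient = 2 * w
        w = w - learning_rate * gradient
    return w

print(abs(descend(0.1)))    # shrinks toward 0: steady progress
print(abs(descend(0.001)))  # barely moves from 1.0: painfully slow
print(abs(descend(1.1)))    # ends further from 0 than it started: overshoots
```

The best value of `w` here is 0. A moderate learning rate gets close in 20 steps, a tiny one hardly moves, and an oversized one jumps past 0 by more than it started with on every step.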
Watching the Network Get Smarter
As training progresses, the loss number falls, typically quickly at first and then more gradually, tracing a downward curve.
Every dip in that curve is the network adjusting thousands of weights. Every dip is the network becoming a little less wrong. A little smarter.
That curve going down is one of the most satisfying things to watch in all of software engineering.
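Here is the whole loop in one place, as a minimal sketch. It assumes a single-layer softmax classifier, a hypothetical six-email training set with 8 binary features each, weights that start at zero rather than random, and a learning rate of 0.5. None of those choices come from Part 1; they just keep the example short.

```python
import math

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical training set: (features, label index into [SAFE, SPAM, THREAT]).
emails = [
    ([0, 0, 0, 0, 0, 1, 0, 0], 0),
    ([0, 0, 0, 0, 0, 0, 0, 1], 0),
    ([1, 1, 1, 0, 0, 0, 1, 0], 1),
    ([1, 1, 0, 0, 0, 0, 0, 0], 1),
    ([0, 0, 0, 1, 1, 0, 1, 0], 2),
    ([0, 0, 0, 1, 1, 0, 0, 0], 2),
]

weights = [[0.0] * 8 for _ in range(3)]
biases = [0.0] * 3
learning_rate = 0.5
losses = []  # average loss per epoch

for epoch in range(50):
    total = 0.0
    for features, correct in emails:
        # Step 1: forward pass
        scores = [sum(w * x for w, x in zip(row, features)) + b
                  for row, b in zip(weights, biases)]
        probs = softmax(scores)
        # Step 2: loss (cross-entropy)
        total += -math.log(probs[correct])
        # Steps 3 and 4: gradient per class, then nudge weights against it
        for k in range(3):
            error = probs[k] - (1.0 if k == correct else 0.0)
            for j, x in enumerate(features):
                weights[k][j] -= learning_rate * error * x
            biases[k] -= learning_rate * error
    losses.append(total / len(emails))

print(f"epoch 1 loss: {losses[0]:.3f}, epoch 50 loss: {losses[-1]:.3f}")
```

Plotting `losses` would give the downward curve described above. Printing the first and last values is enough to see the network getting less wrong.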
The Archery Analogy
The entire training process maps perfectly onto a physical intuition:
🎯 Imagine learning archery — blindfolded, with a coach.
Shot 1: You release. The coach says "3 meters left, 1 meter low." You adjust your stance slightly right, raise your aim.
Shot 10: "Half a meter left." You adjust.
Shot 100: "20 centimetres low." Tiny adjustment.
Shot 10,000: Bullseye. Consistently.
You didn't memorize where to aim for any specific bullseye. You learned the mechanics of aiming. That's generalization.
The neural network is the archer. The loss is the coach's feedback. Backpropagation is working out which way to adjust your stance, and the weight update is the adjustment itself. And the weights are the muscle memory being built, shot by shot.
What Training Produces: Weights and Parameters
When training ends, something important happens.
The network's intelligence — everything it learned — gets saved as numbers in a file.
Before training (random):
w1=0.23, w2=-0.67, w3=0.91, w4=-0.14, b=0.01 ...
After training (learned):
w1=2.80, w2=2.10, w3=1.90, w4=2.50, b=0.50 ...
These numbers, the weights and biases (together called parameters), are the model. Load this file tomorrow and you have an instant expert spam detector. No retraining needed.
Weight = an individual connection's strength between two neurons
Bias = a neuron's personal adjustment value
Parameters = all the weights and biases together: everything the model learned
Here's what makes this profound:
The knowledge is not in the code. The knowledge is in the weights. The code is just the skeleton. The weights are the brain.
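To show how literal "numbers in a file" is, here is a sketch that saves and reloads a handful of parameters as JSON. The file name and the dictionary layout are made up for illustration; real frameworks use their own binary formats, but the principle is the same.

```python
import json
import os
import tempfile

# Hypothetical trained parameters (the "after training" numbers from above).
model = {"weights": [2.80, 2.10, 1.90, 2.50], "bias": 0.50}

path = os.path.join(tempfile.gettempdir(), "spam_detector_weights.json")

# Save: the whole model is just numbers written to disk.
with open(path, "w") as f:
    json.dump(model, f)

# Load: later, or on another machine, read the numbers back. No retraining.
with open(path) as f:
    restored = json.load(f)

print(restored)
```

The code that runs the network stays the same before and after training. Only the contents of this file change.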
And the scale of that gets almost incomprehensible very quickly:
Our spam detector: ~200 parameters ← tiny
GPT-2 (2019): 117 million to 1.5 billion ← small
GPT-3 (2020): 175 billion ← enormous
Largest modern LLMs: reportedly ~1–2 trillion ← staggering
Human brain synapses: ~100 trillion ← nature still wins
Same fundamental idea — learned weights from training — just at a scale that produces something that feels like intelligence.
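Parameter counts like these come from simple arithmetic: every fully connected layer has one weight per input-output pair plus one bias per output. The sketch below assumes a hypothetical layout for our spam detector (8 inputs, 16 hidden neurons, 3 outputs) that happens to land near 200; the actual layer sizes were not specified.

```python
def count_parameters(layer_sizes):
    """Each layer has (inputs * outputs) weights plus one bias per output."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

# Hypothetical layout: 8 input features, 16 hidden neurons, 3 output classes.
print(count_parameters([8, 16, 3]))  # 195
```

That is (8 × 16 + 16) + (16 × 3 + 3) = 144 + 51 = 195. Widen the layers and stack more of them, and the same formula climbs into the billions.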
The Three Ways AI Learns
Before we close, let's zoom out quickly. Our spam detector uses one specific type of learning, but AI systems learn in different ways:
Supervised learning — learn from labeled examples (what we just built). Human provides input + correct answer. Network learns the mapping. Used in: spam filters, medical diagnosis, fraud detection.
Unsupervised learning — find patterns without labels. No human tells it the answers. It discovers structure on its own. Used in: customer segmentation, anomaly detection, data compression.
Reinforcement learning — learn by trial, error, and reward. No labeled data — just a score for good or bad outcomes. Like training a dog with treats. Used in: game-playing AI (AlphaGo), robotics, self-driving cars.
ChatGPT and Claude use a combination: they're pre-trained with self-supervised learning (predicting the next word across a huge body of text, where the text itself supplies the correct answers), then fine-tuned, including with reinforcement learning, to be helpful and safe. More on that in Part 3.
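Next-word prediction needs no human labeling because every sentence contains its own answers. Here is a simplified sketch of how one sentence turns into training pairs. It splits on whole words for readability; real language models work on smaller units called tokens, and `next_word_examples` is a name invented for this example.

```python
def next_word_examples(text):
    """Split text into (context, next word) training pairs."""
    words = text.split()
    return [(words[:i], words[i]) for i in range(1, len(words))]

for context, target in next_word_examples("the capital of France is Paris"):
    print(" ".join(context), "->", target)
```

Each pair works exactly like a labeled email: the context is the input, the next word is the correct answer, and the same four-step training loop applies.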
What You Learned in Part 2
The goal is generalization — understanding patterns, not memorizing examples.
Training is a loop — forward pass → calculate loss → backpropagation → update weights → repeat thousands of times.
Backpropagation finds the culprit — traces the error backward to identify which weights caused the mistake.
The optimizer fixes it — nudges weights in the direction that reduces loss, controlled by the learning rate.
Weights are the output — training ends, weights get saved to a file. That file is the model. Load it anywhere, it works instantly.
Scale transforms capability — 200 parameters = spam detector. 175 billion = GPT-3. Same training loop. Incomprehensible scale.
What's Coming in Part 3
You now understand how a neural network thinks (Part 1) and how it learns (Part 2).
But here's what's still missing:
Our spam detector reads 8 features from an email. ChatGPT reads thousands of words and understands context, nuance, history, and intent across an entire conversation.
How? It's not just scale.
There's one breakthrough idea that changed everything — introduced in a single 2017 Google research paper with a confident title: "Attention Is All You Need."
That idea — Attention — is what makes language models feel intelligent. It's how they know that "France" matters more than "the" when predicting the next word. It's what lets them track meaning across thousands of tokens.
In Part 3, we explain Attention, Transformers, and exactly how ChatGPT and Claude are built — and by the end, you'll have the complete picture. From a single neuron all the way to the most powerful AI systems ever created.
See you there. 👇
Part 1: What is a neural network? → [link]
Part 2: How AI trains itself ← You are here
Part 3: How ChatGPT and Claude actually work → coming next week



