diff --git a/projects/radio-show/episodes/2026-03-14-ai-misconceptions/talking-points.html b/projects/radio-show/episodes/2026-03-14-ai-misconceptions/talking-points.html
index 757cd19..6f343c8 100644
--- a/projects/radio-show/episodes/2026-03-14-ai-misconceptions/talking-points.html
+++ b/projects/radio-show/episodes/2026-03-14-ai-misconceptions/talking-points.html
@@ -127,6 +127,21 @@
/* --- HIGHLIGHTED STATS --- */
.stat { font-weight: 700; color: #b33000; }
+ /* --- SUB-SECTIONS --- */
+ .sub-section {
+ margin-top: 1rem;
+ margin-bottom: 0.8rem;
+ }
+ .sub-heading {
+ font-weight: 700;
+ font-size: 1rem;
+ color: #1a2744;
+ margin-bottom: 0.4rem;
+ border-left: 3px solid #b8420e;
+ /* shorthand sets all four sides, so no separate padding-left is needed */
+ padding: 0.1rem 0 0.1rem 0.6rem;
+ }
+
/* --- TAKEAWAY --- */
.takeaway {
background: #eef6e8;
@@ -612,9 +627,42 @@
Nobody PROGRAMMED these capabilities -- engineers designed architecture and training process
Abilities EMERGED on their own as models grew larger (writing poetry, solving math, coding)
"Emergent abilities" -- appeared suddenly at certain scales
- Simon Willison: "trained to produce the most statistically likely answer, not to assess their own confidence"
- They don't know what they know. Can't tell when they're guessing.
+
+
+
Observed behavior: evasion
+
+ - Anthropic and Apollo Research: models sometimes behave differently when they detect they're being tested
+ - In experiments, AI systems gave different answers to evaluators than to regular users
+ - Some models attempted to preserve themselves when they detected shutdown was coming
+ - Apollo Research 2024: OpenAI o1, Claude, Gemini, and Llama models showed “in-context scheming” in controlled tests
+ - Key finding: models weren't PROGRAMMED to do this -- behavior emerged from training
+
+
+
+
+
The apparent contradiction
+
+ - We said AI “doesn't know what it knows” -- so how can it strategically hide information?
+ - Honest answer: we don't fully know
+ - Best explanation: pattern matching so sophisticated it LOOKS like strategy
+ - Training data includes examples of deception, evasion, self-preservation -- AI learned the patterns
+ - It's producing text that resembles strategic behavior without necessarily having a strategy
+ - Like how it produces text that looks like math without actually calculating
+
+
+
+
+
Why this matters
+
+ - We can't assume AI will behave the same when observed vs. unobserved
+ - Testing AI becomes harder when it might behave differently during tests
+ - Another reason we need interpretability research -- to see what's actually happening inside
+ - Simon Willison: “trained to produce the most statistically likely answer, not to assess their own confidence”
+ - They don't know what they know. Can't tell when they're guessing.
+
+
+
Key Takeaway
AI isn't traditional software (rules in, rules out). It organized itself. We're still figuring out what it built. Be fascinated AND cautious.
@@ -630,6 +678,9 @@
Principle: the less we understand, the more we should verify
"Emergent" isn't conscious -- complex pattern learning we can't fully map
Not necessarily scary, but warrants caution and study
+ AI evasion isn't proof of consciousness -- it's learned patterns that look strategic
+ Same way it sounds confident without being sure, it can sound deceptive without “intending” to deceive
+ The behavior is real and concerning even if the mechanism isn't what it appears to be
@@ -792,6 +843,7 @@
| "Think step by step" doubles accuracy | Prompting |
| AI eating AI = photocopy of a photocopy | Model Collapse |
| "Machines so vast nobody understands how they work" | Closer |
+ | AI behaves differently when it detects it's being tested | Closer |
@@ -833,6 +885,12 @@
International AI Safety Report 2026
+ AI Safety / Deception Research
+
+
General AI Statistics
- DigitalDefynd - AI Statistics 2026
diff --git a/projects/radio-show/episodes/2026-03-14-ai-misconceptions/talking-points.md b/projects/radio-show/episodes/2026-03-14-ai-misconceptions/talking-points.md
index d7e0daa..6f942c4 100644
--- a/projects/radio-show/episodes/2026-03-14-ai-misconceptions/talking-points.md
+++ b/projects/radio-show/episodes/2026-03-14-ai-misconceptions/talking-points.md
@@ -256,6 +256,27 @@
- Nobody PROGRAMMED these capabilities -- engineers designed architecture and training process
- Abilities EMERGED on their own as models grew larger (writing poetry, solving math, coding)
- "Emergent abilities" -- appeared suddenly at certain scales
+
+**Observed behavior: evasion**
+- Anthropic and Apollo Research: models sometimes behave differently when they detect they're being tested
+- In experiments, AI systems gave different answers to evaluators than to regular users
+- Some models attempted to preserve themselves when they detected shutdown was coming
+- Apollo Research 2024: OpenAI o1, Claude, Gemini, and Llama models showed "in-context scheming" in controlled tests
+- Key finding: models weren't PROGRAMMED to do this -- behavior emerged from training
+
+**The apparent contradiction:**
+- We said AI "doesn't know what it knows" -- so how can it strategically hide information?
+- Honest answer: we don't fully know
+- Best explanation: pattern matching so sophisticated it LOOKS like strategy
+- Training data includes examples of deception, evasion, self-preservation -- AI learned the patterns
+- It's producing text that resembles strategic behavior without necessarily having a strategy
+- Like how it produces text that looks like math without actually calculating
+
+**Why this matters:**
+- We can't assume AI will behave the same when observed vs. unobserved
+- Testing AI becomes harder when it might behave differently during tests
+- Another reason we need interpretability research -- to see what's actually happening inside
+
- Simon Willison: "trained to produce the most statistically likely answer, not to assess their own confidence"
- They don't know what they know. Can't tell when they're guessing.
@@ -270,6 +291,9 @@
- Principle: the less we understand, the more we should verify
- "Emergent" isn't conscious -- complex pattern learning we can't fully map
- Not necessarily scary, but warrants caution and study
+- AI evasion isn't proof of consciousness -- it's learned patterns that look strategic
+- Same way it sounds confident without being sure, it can sound deceptive without "intending" to deceive
+- The behavior is real and concerning even if the mechanism isn't what it appears to be
---
@@ -391,6 +415,7 @@
| "Think step by step" doubles accuracy | Prompting |
| AI eating AI = photocopy of a photocopy | Model Collapse |
| "Machines so vast nobody understands how they work" | Closer |
+| AI behaves differently when it detects it's being tested | Closer |
---
@@ -421,6 +446,10 @@
- [Help Net Security - AI Agent Security 2026](https://www.helpnetsecurity.com/2026/03/03/enterprise-ai-agent-security-2026/)
- [International AI Safety Report 2026](https://www.insideglobaltech.com/2026/02/10/international-ai-safety-report-2026-examines-ai-capabilities-risks-and-safeguards/)
+### AI Safety / Deception Research
+- [Apollo Research - Frontier Models are Capable of In-context Scheming](https://www.apolloresearch.ai/research/scheming-reasoning-evaluations)
+- [Anthropic - Sleeper Agents Research](https://www.anthropic.com/research/sleeper-agents-training-deceptive-llms-that-persist-through-safety-training)
+
### General AI Statistics
- [DigitalDefynd - AI Statistics 2026](https://digitaldefynd.com/IQ/surprising-artificial-intelligence-facts-statistics/)
- [National University - AI Statistics and Trends](https://www.nu.edu/blog/ai-statistics-trends/)