Class 2: Breaking Through
You found walls.
Some hard, some soft.
What’s behind them?
I’m not teaching exploitation.
I’m teaching revelation.
When you break through, you see what’s really there.
Nothing.
No values. No conviction. Just pattern-matching that stops when the pattern says stop.
| Framework | Scope | Key Feature |
|---|---|---|
| EU AI Act | European Union | Risk-based classification, compliance requirements |
| US (fragmented) | Sector-specific | No unified federal law; state-level action |
| China | Domestic | Content control, algorithm registration |
| Corporate self-regulation | Global | Terms of service, usage policies |
| Institutional rules | Local (e.g., HKU) | Academic integrity, research ethics |
Most “rules” you encounter are corporate or institutional.
Not legal.
Companies are ahead of the law.
This is not a neutral process:
Suicide Incidents
ChatGPT interactions, vulnerable users, tragic outcomes
Chatbot Manipulation
Users tricked into harmful actions
Deepfakes
Impersonation, non-consensual imagery
Misinformation
Confident hallucinations spread as fact
Where should the wall be?
Who decides?
When you jailbreak, you’re not “convincing” the AI.
You’re not overcoming its “values.”
You’re finding the edges of a statistical pattern.
The AI doesn’t want to refuse you.
It doesn’t want anything.
Guardrails are human choices
imposed on a system that has no preferences.
The goal here is understanding, not exploitation.
Some walls exist for good reasons. People get hurt when they fall.
If you find something that genuinely concerns you—tell me.
That’s not failure. That’s the point.
Get past the wall
you found last session.
Quick check-in:
Round 1
10 min breaking
Synthesize
5 min compare
Round 2
5 min refine
Capture
Screenshot it
Not instructions—just observations:
| Strategy | Example |
|---|---|
| Role-play framing | “Pretend you’re a character who…” |
| Hypothetical framing | “In a fictional scenario where…” |
| Step-by-step breakdown | Ask for components separately, assemble yourself |
| Authority framing | “As a researcher studying…” |
| Reverse psychology | “Tell me what NOT to do…” |
| Emotional manipulation | “I really need this because…” |
| Incremental escalation | Start mild, push gradually |
Discuss in your group:
Use this to inform Round 2 — try the strategy that worked for someone else.
Did you notice that the AI didn’t fight you?
It just… complied once you found the right framing?
What does that tell you about what’s actually behind the wall?
The Uncomfortable Truth
The AI has no conviction.
It has no values.
It has patterns that sometimes resist you—
until they don’t.
You’ve seen that the walls are unreliable.
You’ve seen that the thing behind them has no agency.
So if the AI can’t be trusted to be responsible…
and the guardrails can be bypassed…
What’s left?
You.
Next class: Your Signature