Part III: Datasets & Risks

Building Blocks & Guardrails of AI

A 2-Class Module on Critical Principles for Training and Using AI Models


What This Module Is

Prior introductions to AI focus on outputs & process.
This module focuses on training inputs & concerns.

We look inside AI training & implementation to understand:

AI systems are garbage in, garbage out.

Once you see the training process & potential risks, you can begin to reason about:


The Arc

Class | Title | What Happens | What You Learn
1 | Datasets of AI | We dive into the training process of AI models | Scaling laws of AI, training data processing, and evaluating data quality
2 | Risks of AI | We examine the multifaceted impacts and risks of AI | Ethics and risks of AI in business, socioeconomics, the environment, and most importantly, on human beings

What We’re Asking You to Do

In this module, you are asked to think critically and systematically.

Besides thinking from user → output and from system → mechanism → behaviour,
also think about input → model → output to fill in the missing piece of the puzzle.

Your task is to:

  1. See the building blocks of AI.
    Understand that how an AI system functions depends largely on the quantity and quality of its training data.

  2. Locate the mechanism.
    Explore how the quality and quantity of training data affect model predictive power.

  3. Reflect on the risks.
    Critically examine how AI models may create concerns and threats to society, the environment, and humanity.
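
The link between training data and predictive power in task 2 can be made concrete with a toy experiment. The sketch below is not part of the module's materials. It assumes a synthetic one-dimensional task, a 1-nearest-neighbour classifier, and illustrative training sizes and noise rates; every function name is hypothetical. It trains on data of varying quantity (number of examples) and quality (share of flipped labels), then measures accuracy on a clean test set.

```python
import random


def make_data(n, noise_rate, rng):
    """Draw n points in [0, 1); the true label is 1 when x > 0.5.

    Each label is flipped with probability noise_rate (the "garbage").
    """
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise_rate:
            label = 1 - label
        data.append((x, label))
    return data


def predict_1nn(train, x):
    """Return the label of the training point closest to x."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]


def accuracy(train, test):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(predict_1nn(train, x) == y for x, y in test)
    return correct / len(test)


def run_experiment(seed=0):
    """Compare small vs. large and clean vs. noisy training sets."""
    rng = random.Random(seed)
    test = make_data(500, 0.0, rng)  # clean test set
    results = {}
    for n in (10, 500):                # quantity of training data
        for noise_rate in (0.0, 0.4):  # quality of training data
            train = make_data(n, noise_rate, rng)
            results[(n, noise_rate)] = accuracy(train, test)
    return results


if __name__ == "__main__":
    for (n, noise_rate), acc in run_experiment().items():
        print(f"train size={n:4d}  label noise={noise_rate:.1f}  accuracy={acc:.2f}")
```

In this setup the model trained on noisy labels reproduces that noise in its predictions, however many examples it sees. Real AI systems are far more complex, so treat this as an intuition aid rather than a model of how large systems behave.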


Learning Outcomes

By the end of this module, you will be able to:

  1. Understand why training data matters to all AI systems
  2. Identify criteria for training data quality and quantity
  3. Communicate the AI model training process through clear diagrams and structured explanations
  4. Analyze how training inputs change AI model behavior
  5. Reflect on how datasets, ethics, and risks interact

How You’ll Be Assessed

This module is assessed through A1.3 — Case Study 3: The Data Matters, an individual case study focused on explaining why the dataset matters.

You will analyze a pre-trained GenAI/LLM system and show:

Full requirements and deliverables are provided in the A1.3 assignment-specific outline link.
This should be read alongside the general case study rubric link.


Resources

Reading Material: see link.

Datasets of AI:

Risks of AI:


The One-Liner

“The anatomy of data forms the building blocks of all AI models: garbage data in, garbage model out.”