A student-built, offline-first AI assistant robot for the computer lab — from a desk companion to a fully mobile autonomous agent.
CLAW (Computer Lab Autonomous Workmate) is a real AI assistant that lives in the lab. It knows who students are, remembers past interactions, can answer questions, help debug code, and eventually move around the room. It runs entirely on local hardware — no student data ever leaves the building. The brain lives in the iMac. The body lives on a Raspberry Pi 5. Students build both.
Get the iMac running as the AI brain. Install Ollama, pull a local model, build the FastAPI server, and lock down the network firewall. Every student also runs this stack on their own machine to understand it before touching hardware.
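The firewall step can be sketched as pf rules on the macOS side. This is untested scaffolding, not a finished policy: the interface name (`en0`) and LAN subnet are assumptions to verify against the actual lab network.

```
# Sketch of /etc/pf.conf additions for the iMac.
# en0 and 192.168.1.0/24 are assumptions: check with ifconfig.
block out on en0 all                              # default-deny: nothing leaves the machine
pass out on en0 to 192.168.1.0/24                 # ...except traffic to the lab LAN
pass in on en0 proto tcp from any to any port 8000  # let the Pi reach the FastAPI server
```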
Give CLAW a real memory. Build the vector database for semantic recall and SQLite for structured facts about people and places. This is the hardest conceptual phase — and the richest for teaching.
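The structured half of that memory can be prototyped with nothing beyond Python's standard library. A minimal sketch, assuming an illustrative `facts` table rather than the project's final schema:

```python
# memory/structured_store.py — sketch of the SQLite half of CLAW's memory.
# Table and helper names are illustrative, not the final design.
import sqlite3

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS facts (
            person TEXT NOT NULL,
            key    TEXT NOT NULL,
            value  TEXT NOT NULL,
            PRIMARY KEY (person, key)
        )""")
    return conn

def remember(conn: sqlite3.Connection, person: str, key: str, value: str) -> None:
    # Upsert so a repeated fact overwrites instead of duplicating
    conn.execute(
        "INSERT INTO facts VALUES (?, ?, ?) "
        "ON CONFLICT(person, key) DO UPDATE SET value = excluded.value",
        (person, key, value),
    )

def recall(conn: sqlite3.Connection, person: str) -> dict:
    rows = conn.execute(
        "SELECT key, value FROM facts WHERE person = ?", (person,)
    ).fetchall()
    return dict(rows)
```

ChromaDB handles the fuzzy "what did we talk about?" recall; this table handles the crisp "what is Alex's project called?" lookups the LLM prompt gets stuffed with.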
Give CLAW ears and a voice. Run OpenAI's Whisper model locally for speech-to-text, and a text-to-speech engine for output. The Pi handles audio capture and playback — the iMac handles transcription and generation.
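One cheap way for the Pi to decide when audio is worth shipping to the iMac is an energy gate over the raw PCM stream. A sketch with a made-up threshold, not the project's actual wake logic; it assumes 16-bit little-endian mono frames, matching PyAudio's common `paInt16` capture format:

```python
# pi_body/vad.py — sketch: energy-based "is anyone talking?" gate.
# Threshold of 500 is an illustrative guess; tune against the real mic.
import math
import struct

def rms(frame: bytes) -> float:
    """Root-mean-square energy of a 16-bit little-endian mono PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)

def is_speech(frame: bytes, threshold: float = 500.0) -> bool:
    return rms(frame) > threshold
```

The Pi would only open a POST to `/transcribe` once a few consecutive frames pass the gate, which keeps the iMac's Whisper queue from filling with silence.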
CLAW learns to recognize students by face. When someone walks up, it greets them by name and recalls their last session. Built entirely with local models — no cloud vision APIs.
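Whatever local model produces the face embeddings (face_recognition's 128-dimension encodings, for instance), the identification step reduces to nearest-neighbour search. A toy sketch with short vectors standing in for real embeddings; the 0.6 tolerance mirrors face_recognition's documented default but is an assumption here:

```python
# perception/match.py — sketch: nearest-neighbour matching over face embeddings.
import math

def euclidean(a, b) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(embedding, known: dict, tolerance: float = 0.6) -> str:
    """Return the closest enrolled person, or 'unknown' if nobody is within tolerance."""
    best_name, best_dist = "unknown", tolerance
    for name, ref in known.items():
        d = euclidean(embedding, ref)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name
```

The `/identify` endpoint would enroll students once, cache their encodings, and call something like this on every frame the Pi uploads.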
Build the physical enclosure and put CLAW on a desk. Servo-driven head movement, expressive LED eyes, a small screen for a face. The robot becomes a real presence in the lab.
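The head-tracking math is simpler than it looks: map where a face sits in the camera frame to a pan offset, then hand that angle to the servo. A sketch assuming a 60° horizontal field of view, which should be checked against the actual camera module's spec:

```python
# pi_body/head.py — sketch: face position in frame -> pan servo angle.
# On hardware the result would drive e.g. a gpiozero AngularServo; here it is pure math.
def pan_angle(face_x: float, frame_width: int, fov_deg: float = 60.0) -> float:
    """Degrees to pan: 0 when the face is centred, negative = left, positive = right."""
    offset = (face_x / frame_width) - 0.5   # -0.5 at left edge .. +0.5 at right edge
    return offset * fov_deg
```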
Mount CLAW on a wheeled platform. Add navigation sensors. Implement basic autonomous movement and room mapping. This is advanced — a stretch goal that high-performing students can own.
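Before any ROS 2 or mapping, autonomous movement can start as a trivial reflex over the ultrasonic reading. A sketch with made-up thresholds; a real robot would also debounce noisy sensor values:

```python
# pi_body/drive.py — sketch: ultrasonic distance -> drive command (Phase 5 stretch).
# The 20 cm / 50 cm thresholds are illustrative, not measured values.
def drive_command(distance_cm: float) -> str:
    if distance_cm < 20:
        return "stop"      # too close: halt before deciding anything
    if distance_cm < 50:
        return "turn"      # obstacle ahead: rotate to find clear space
    return "forward"
```

A loop like this is a good first milestone for the Phase 5 squad before graduating to SLAM.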
Divide the class into four squads. Each squad owns a vertical slice of the system. Squads collaborate via the FastAPI interface — a real-world API contract between teams. Rotate students between squads each phase so everyone touches every layer.
Owns the iMac server. FastAPI endpoints, Ollama integration, prompt engineering, context window management, inference optimization.
Owns ChromaDB + SQLite. Schema design, embedding pipelines, RAG retrieval, entity extraction, memory summarization strategies.
Owns voice and vision. Whisper transcription, speaker output, face recognition, camera streaming, wake word detection.
Owns the Pi and physical hardware. GPIO, servo control, motors, sensors, Pygame face animations, enclosure design.
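The cross-squad API contract works best as one shared module every squad imports, so a renamed field breaks loudly everywhere at once. A sketch using plain dataclasses (the real server uses Pydantic; the field names mirror the `/chat` endpoint):

```python
# shared/contract.py — sketch: the squad-to-squad API contract in one place.
from dataclasses import asdict, dataclass
import json

@dataclass
class ChatRequest:
    message: str
    speaker_name: str = "unknown"

@dataclass
class ChatReply:
    reply: str

def encode(req: ChatRequest) -> str:
    """Serialize a request exactly as the Pi client POSTs it."""
    return json.dumps(asdict(req))
```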
| Component | Runs On | Purpose |
|---|---|---|
| Ollama + Llama 3.3 | iMac | Local LLM inference — no cloud required |
| FastAPI | iMac | HTTP API server — Pi↔Brain communication |
| ChromaDB | iMac | Vector database for semantic memory retrieval |
| SQLite (stdlib) | iMac | Structured store — people, places, sessions, facts |
| openai-whisper | iMac | Offline speech-to-text transcription |
| deepface / face_recognition | iMac | Local face identification pipeline |
| PyAudio | Pi 5 | Microphone capture, audio buffering |
| pyttsx3 / espeak | Pi 5 | Offline text-to-speech output |
| picamera2 | Pi 5 | Camera feed capture and streaming |
| gpiozero / RPi.GPIO | Pi 5 | Servo, motor, LED, sensor GPIO control |
| Pygame | Pi 5 | Animated face display — teacher's home turf |
| ROS 2 | Pi 5 (Phase 5) | Navigation, SLAM, autonomous mobility |
```
┌─ iMAC — THE BRAIN ──────────────────────────────────────
│
│  FastAPI Server (:8000)
│    POST /chat           ← main conversation endpoint
│    POST /transcribe     ← audio → text via Whisper
│    POST /identify       ← image → person name
│    GET  /memory/{name}  ← retrieve person's history
│    GET  /health         ← status check from Pi
│
│  [Ollama LLM] ← [Memory Retriever] ← [ChromaDB]
│                        ↓
│                   [SQLite DB]
│
│  Firewall: INBOUND open · OUTBOUND blocked
└─────────────────────────────────────────────────────────
                  ↕ WiFi (LAN only)
┌─ RASPBERRY PI 5 — THE BODY ─────────────────────────────
│
│  [Microphone] → buffer audio  → POST /transcribe
│  [Camera]     → capture frame → POST /identify
│  [Speaker]    ← receive text  ← speak response
│
│  GPIO:
│    Pan/Tilt Servos → head tracks speaker
│    LED Matrix      → expressive face/eyes
│    Ultrasonic      → obstacle detection (Phase 5)
│    Motor HAT       → wheel drive (Phase 5)
│
│  Pygame display → animated face on small screen
└─────────────────────────────────────────────────────────
```
This is Day 1 code for students. Get the brain talking before touching any hardware. Every student runs this on their own machine first.
```python
# imac_brain/server.py — the minimal brain server
# Run: uvicorn server:app --host 0.0.0.0 --port 8000
import ollama
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="CLAW Brain")

class ChatRequest(BaseModel):
    message: str
    speaker_name: str = "unknown"

@app.post("/chat")
async def chat(req: ChatRequest):
    response = ollama.chat(
        model="llama3.3",
        messages=[
            {"role": "system", "content": f"You are CLAW, a helpful lab assistant. You are speaking with {req.speaker_name}."},
            {"role": "user", "content": req.message},
        ],
    )
    return {"reply": response["message"]["content"]}

@app.get("/health")
async def health():
    return {"status": "online", "model": "llama3.3"}
```
```python
# pi_body/client.py — the minimal Pi client
# Students run this from their laptop first to test
import requests

BRAIN_URL = "http://imac.local:8000"

def ask_claw(message: str, name: str = "Student") -> str:
    response = requests.post(
        f"{BRAIN_URL}/chat",
        json={"message": message, "speaker_name": name},
        timeout=60,  # local LLM inference can be slow on the first token
    )
    response.raise_for_status()
    return response.json()["reply"]

# Test it
if __name__ == "__main__":
    reply = ask_claw("Hey CLAW, what can you help me with?", "Alex")
    print(f"CLAW: {reply}")
```