GG
Applied AI Engineer
Oakland, CA · youdontneedmy.help · Contact via /submit
Summary

I build LLM systems wrapped in deterministic, verifiable workflows: multi-model routing, non-LLM scoring, dataset-backed evals, human-in-the-loop review gates, and fail-closed safety. My work spans shipped SaaS products, agentic safety fences, and live dashboards. Every public demo runs on sample data only; no client data is ever exposed.

Selected Work
Shipped a Groq-tutored exam-prep SaaS with Stripe billing, SM-2 spaced-repetition scheduling, bilingual (EN/ES) content, and a deterministic localization pipeline (~3,400 items, 5-stage QA gate, resumable, cost-capped).
Groq · Next.js · Supabase · TypeScript · Stripe · SM-2 algorithm
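A minimal sketch of one SM-2 review step, following the commonly cited SuperMemo-2 update rules. The type and function names (`CardState`, `Quality`, `reviewSm2`) are hypothetical and not taken from the product's code.

```typescript
// One SM-2 review step. Names are hypothetical; the update rules follow the
// commonly cited SuperMemo-2 description.
type Quality = 0 | 1 | 2 | 3 | 4 | 5; // self-graded recall, 5 = perfect

interface CardState {
  repetitions: number; // consecutive successful reviews
  intervalDays: number; // days until the next review
  easeFactor: number; // SM-2 "E-Factor", never below 1.3
}

function reviewSm2(state: CardState, quality: Quality): CardState {
  let { repetitions, intervalDays, easeFactor } = state;

  if (quality >= 3) {
    // Successful recall: 1 day, then 6 days, then previous interval x ease factor.
    if (repetitions === 0) intervalDays = 1;
    else if (repetitions === 1) intervalDays = 6;
    else intervalDays = Math.round(intervalDays * easeFactor);
    repetitions += 1;
  } else {
    // Failed recall: restart the sequence with a 1-day interval.
    repetitions = 0;
    intervalDays = 1;
  }

  // Grade 5 raises the ease factor by 0.1, grade 4 leaves it unchanged,
  // lower grades shrink it; it is clamped at 1.3.
  easeFactor += 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02);
  if (easeFactor < 1.3) easeFactor = 1.3;

  return { repetitions, intervalDays, easeFactor };
}

// Usage: a new card reviewed perfectly three times.
const grades: Quality[] = [5, 5, 5];
let card: CardState = { repetitions: 0, intervalDays: 0, easeFactor: 2.5 };
for (const q of grades) card = reviewSm2(card, q);
console.log(card.intervalDays); // 16 (intervals 1, 6, then round(6 x 2.7))
```

Keeping the step a pure function of (state, grade) makes scheduling reproducible and easy to unit-test.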
One blog topic fans out to a Facebook caption, an Instagram caption with hashtags, and an email subject and body, all from a single Groq call. Runs on a nightly cron and ships in production as an admin tool.
Groq · Next.js · social publish API · nightly cron
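One way a single-call fan-out can stay safe to publish is to ask the model for one JSON object and validate it strictly before anything is posted. This is a sketch under that assumption; the `FanOut` shape, its field names, and `parseFanOut` are hypothetical, not the tool's actual schema.

```typescript
// Hypothetical output shape for the single fan-out call; field names are assumptions.
interface FanOut {
  facebookCaption: string;
  instagramCaption: string;
  instagramHashtags: string[];
  emailSubject: string;
  emailBody: string;
}

// Parse and validate the model's raw JSON reply. Any missing, empty, or
// mistyped field throws, so a malformed generation never reaches publishing.
function parseFanOut(raw: string): FanOut {
  const data: unknown = JSON.parse(raw); // throws on non-JSON text
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error("reply is not a JSON object");
  }
  const obj = data as Record<string, unknown>;

  // Required non-empty string field.
  const str = (key: string): string => {
    const v = obj[key];
    if (typeof v !== "string" || v.trim() === "") throw new Error(`missing or empty field: ${key}`);
    return v;
  };

  const tags = obj["instagramHashtags"];
  if (!Array.isArray(tags) || !tags.every((t) => typeof t === "string" && t.startsWith("#"))) {
    throw new Error("instagramHashtags must be an array of #tags");
  }

  return {
    facebookCaption: str("facebookCaption"),
    instagramCaption: str("instagramCaption"),
    instagramHashtags: tags as string[],
    emailSubject: str("emailSubject"),
    emailBody: str("emailBody"),
  };
}
```

Validating one object is cheaper than three calls and means all three channels either publish together or not at all.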
Multi-Model QA Cascade
AI
Each item is routed to Ollama, Groq, and Codex in parallel; a non-LLM scoring layer picks the winner, and a human-in-the-loop review gate sits before the SQL commit. $0 to run locally.
Ollama · Groq · OpenAI/Codex · deterministic scoring · human-in-loop · SQL
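A sketch of the selection step: rule-based scoring with no LLM judge, a stable tie-break, and a winner that is queued for review rather than written to SQL. The scoring rules, weights, and names (`Candidate`, `scoreCandidate`, `pickWinner`, `"pending_review"`) are illustrative assumptions, not the project's rubric.

```typescript
// Hypothetical candidate shape: one answer per model for the same QA item.
interface Candidate {
  model: string; // e.g. "ollama", "groq", "codex"
  answer: string;
}

interface Verdict {
  winner: Candidate;
  score: number;
  status: "pending_review"; // a human approves before any SQL commit
}

// Plain string comparison, independent of locale, so ordering is reproducible.
const byName = (a: string, b: string): number => (a < b ? -1 : a > b ? 1 : 0);

// Illustrative rule-based scorer; the rules and weights are assumptions.
function scoreCandidate(c: Candidate, requiredTerms: string[]): number {
  const text = c.answer.toLowerCase();
  let score = 0;
  for (const term of requiredTerms) {
    if (text.includes(term.toLowerCase())) score += 10; // covers a required term
  }
  if (c.answer.length > 0 && c.answer.length <= 500) score += 5; // within length budget
  if (/as an ai/i.test(c.answer)) score -= 20; // boilerplate penalty
  return score;
}

// Highest score wins; ties break on model name so reruns pick the same winner.
function pickWinner(cands: Candidate[], requiredTerms: string[]): Verdict {
  if (cands.length === 0) throw new Error("no candidates");
  const ranked = cands
    .map((c) => ({ c, s: scoreCandidate(c, requiredTerms) }))
    .sort((a, b) => b.s - a.s || byName(a.c.model, b.c.model));
  return { winner: ranked[0].c, score: ranked[0].s, status: "pending_review" };
}
```

Because the judge is plain code, the same three answers always produce the same winner, which makes the reviewer's job auditable.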
agent-gate — Safety Fence
AI
Deterministic guardrails for AI agents, backed by a 53-test suite. The design is fail-closed: any uncaught edge case halts the agent instead of letting it proceed. Built as a standalone, reusable safety layer.
deterministic rules · fail-closed · 53-test suite
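A minimal sketch of fail-closed rule evaluation, assuming each rule returns allow, deny, or abstain. The names (`AgentAction`, `Rule`, `evaluate`) and the example rules are hypothetical and not agent-gate's actual API.

```typescript
// Hypothetical action shape an agent proposes before executing a tool call.
interface AgentAction {
  tool: string;
  args: Record<string, unknown>;
}

type Decision = { allow: true } | { allow: false; reason: string };

// Each rule allows, denies, or abstains; the names here are illustrative.
type Rule = (a: AgentAction) => "allow" | "deny" | "abstain";

// Fail-closed evaluation: any deny wins, a rule that throws counts as a deny,
// and an action no rule explicitly allows is also denied.
function evaluate(action: AgentAction, rules: Rule[]): Decision {
  let allowed = false;
  for (const rule of rules) {
    let verdict: "allow" | "deny" | "abstain";
    try {
      verdict = rule(action);
    } catch (err) {
      return { allow: false, reason: `rule error: ${String(err)}` };
    }
    if (verdict === "deny") return { allow: false, reason: "denied by rule" };
    if (verdict === "allow") allowed = true;
  }
  return allowed ? { allow: true } : { allow: false, reason: "no rule allowed this action" };
}

// Example rules (hypothetical): allow a read-only tool, deny shell access.
const rules: Rule[] = [
  (a) => (a.tool === "read_file" ? "allow" : "abstain"),
  (a) => (a.tool === "shell" ? "deny" : "abstain"),
];
```

The default branch is the important design choice: an unrecognized tool is denied, so new capabilities have to be allowed on purpose.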
Vision Factory — Dual-Model Consensus
AI
GPT-4o generates assets and Gemini verifies them; the gate fails closed if centroid drift exceeds 85 px. A spec-driven pipeline coordinates large, consistent asset-generation runs and tracks each asset on a status board (todo → generating → verify → placed).
OpenAI GPT-4o · Gemini vision · consensus gate · asset pipeline
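A sketch of the 85 px consensus check, assuming the spec supplies a target bounding box and the verifier reports the box it detects in the generated asset; drift is taken as the Euclidean distance between the two box centers. The box format and function names (`Box`, `passesConsensus`) are assumptions.

```typescript
// Hypothetical box format: top-left corner plus size, in pixels.
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

const MAX_DRIFT_PX = 85; // threshold from the project description

const isFiniteBox = (b: Box): boolean =>
  [b.x, b.y, b.width, b.height].every((v) => Number.isFinite(v));

const centroid = (b: Box): { cx: number; cy: number } => ({
  cx: b.x + b.width / 2,
  cy: b.y + b.height / 2,
});

// Euclidean distance between the two box centers.
function driftPx(a: Box, b: Box): number {
  const p = centroid(a);
  const q = centroid(b);
  return Math.hypot(p.cx - q.cx, p.cy - q.cy);
}

// Fail closed: a missing or malformed box is a rejection, not a pass.
function passesConsensus(spec: Box | undefined, verified: Box | undefined): boolean {
  if (!spec || !verified) return false; // verifier returned nothing
  if (!isFiniteBox(spec) || !isFiniteBox(verified)) return false; // unusable coordinates
  return driftPx(spec, verified) <= MAX_DRIFT_PX;
}
```

A rejected asset would move back to generating instead of advancing to placed.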
Address → ranked candidate matches drawn from 25+ public-record sources. A weighted issue-scoring engine produces an auditable per-issue breakdown. Determinism-audited: the same inputs always produce the same ranked output.
Next.js · TypeScript · determinism audit · 25+ public record sources
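A sketch of weighted scoring with a per-issue breakdown and a fully ordered ranking. The issue names, weights, severity scale, and the reading that a higher total ranks first are all hypothetical; the real engine's categories are not described here.

```typescript
// Hypothetical record: one candidate match pulled from a public-record source.
interface CandidateRecord {
  id: string; // stable record identifier
  source: string; // which source it came from
  issues: Record<string, number>; // issue name -> severity in [0, 1]
}

// Hypothetical issue weights; unknown issue names are ignored.
const ISSUE_WEIGHTS: Readonly<Record<string, number>> = {
  openCodeViolation: 3,
  expiredPermit: 2,
  taxDelinquency: 1,
};

interface ScoredRecord {
  id: string;
  total: number;
  breakdown: { issue: string; severity: number; weight: number; points: number }[];
}

function scoreRecord(r: CandidateRecord): ScoredRecord {
  // Sorted issue names keep the breakdown independent of key insertion order.
  const breakdown = Object.keys(r.issues)
    .filter((issue) => issue in ISSUE_WEIGHTS)
    .sort()
    .map((issue) => {
      const severity = r.issues[issue];
      const weight = ISSUE_WEIGHTS[issue];
      return { issue, severity, weight, points: severity * weight };
    });
  const total = breakdown.reduce((sum, row) => sum + row.points, 0);
  return { id: r.id, total, breakdown };
}

// Total order: score descending, then id ascending, so input order never matters.
function rankRecords(records: CandidateRecord[]): ScoredRecord[] {
  return records
    .map(scoreRecord)
    .sort((a, b) => b.total - a.total || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
}
```

The explicit id tie-break is what a determinism audit checks for: without it, equal scores could come back in different orders depending on how sources returned them.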
blood-suga — Vision Meal Analysis
AI · DEMO
A live Groq text call per dish returns a carb/calorie range, and an eval loop computes MAPE against USDA-derived ground truth in real time. The production path uses llama-4-scout vision. The offline fallback is clearly labeled as such.
Groq · llama-3.3-70b · dataset-backed eval · MAPE · offline fallback
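The eval metric is mean absolute percentage error, MAPE = (100 / n) · Σ |actual − predicted| / |actual|. Since the model returns a range, this sketch scores the midpoint of each range; that choice, and the names `RangeEstimate` and `mape`, are assumptions.

```typescript
// Hypothetical model output: a carb range in grams for one dish.
interface RangeEstimate {
  low: number;
  high: number;
}

// MAPE in percent, scoring the midpoint of each predicted range (an assumption).
function mape(predicted: RangeEstimate[], actual: number[]): number {
  if (actual.length === 0 || predicted.length !== actual.length) {
    throw new Error("inputs must be non-empty and the same length");
  }
  let sum = 0;
  for (let i = 0; i < actual.length; i++) {
    // Percentage error is undefined when the ground truth is zero.
    if (actual[i] === 0) throw new Error(`zero ground truth at index ${i}`);
    const mid = (predicted[i].low + predicted[i].high) / 2;
    sum += Math.abs(actual[i] - mid) / Math.abs(actual[i]);
  }
  return (100 * sum) / actual.length;
}

// Usage: midpoints 55 and 90 vs. ground truth 50 and 100; each is 10% off.
console.log(mape([{ low: 50, high: 60 }, { low: 80, high: 100 }], [50, 100])); // about 10
```

Zero-carb items (black coffee, for example) have to be excluded or scored with a different metric, because MAPE divides by the true value.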
Capabilities & Stack
Models orchestrated
Claude · OpenAI / Codex / GPT-4o · Groq · Gemini vision · Local Ollama · Stable Diffusion
Infrastructure
Next.js · TypeScript · Supabase Realtime · Playwright · Tailwind · Vercel
AI / Pipeline
Deterministic pipelines · Dataset-backed evals · Non-LLM scoring · Multi-model routing · Fail-closed safety · MAPE / eval loops
Systems shipped
SaaS (Stripe + SM-2) · Publish automation · Human-in-loop workbench · Agentic safety fence · Schema-driven intake · Unit-economics engine
How I Work
Deterministic first. LLM output is wrapped in rules, scoring, and gates so the system behaves predictably even when the model does not.
Verifiable by design. Evals run against real labeled data; MAPE and pass/fail rates are visible in the demo, not buried in a notebook.
Fail-closed safety. Any uncaught edge case halts the pipeline; there is no silent fallback to "probably fine." agent-gate is a dedicated, reusable layer for this.
Human-in-the-loop. Operator review gates are a first-class component, not an afterthought. audit-kit ships a typed JSON workbench whose output the next stage reads back.
No client data in demos. Every demo tile on the portfolio runs on sample data only. Real products operate under separate, isolated data paths.