Applied AI Program — Comprehensive Seminar Guide

1 What Is Applied AI?

Applied AI is not training foundation models from scratch. It is the discipline of taking existing models (GPT, Claude, Gemini, Llama) and engineering reliable products around them: retrieval pipelines, agent orchestration, guardrails, evaluation, deployment, and continuous improvement in production.

80% of "AI Engineer" roles = API + RAG + agents

74% of LLM job posts mention RAG explicitly

$159K–$245K US total comp (same work, title variance)

"The best engineer I hired last year had 'Software Engineer' on LinkedIn and shipped three production LLM systems. The worst candidate was a 'Principal Agentic GenAI Forward-Deployed Context Architect' whose GitHub was three forks of a LangChain tutorial." — Industry hiring analysis, 2026

Applied AI vs Adjacent Roles

Role	Core Output	PhD Required?	Typical Stack
ML Engineer	Trains models from data; feature stores; batch inference	Often helpful	PyTorch, SageMaker, Kubeflow
Applied AI / GenAI Engineer	Ships LLM features users touch daily	No	LangGraph, RAG, FastAPI, eval pipelines
Forward Deployed Engineer	Embeds with customer; end-to-end delivery on-site	No	Platform + custom integrations + stakeholder mgmt
AI PM	Prioritizes use cases, metrics, guardrails, rollout	No	Eval rubrics, cost models, user research
Research Scientist	Publishes; trains/fine-tunes foundation models	Usually yes	Distributed training, novel architectures

2 Job Market Analysis — What Industry Is Actually Hiring For

Analysis synthesized from 200+ job postings across LinkedIn, Levels.fyi, company career pages, and specialized boards (AgenticCareers, NLP People, Boundev's 50-post sample, Deloitte/Palantir FDE listings). Percentages reflect frequency across LLM-focused roles in 2025–2026.

Skill Frequency Matrix

Skill / Requirement	Frequency	What "Good" Looks Like in Interviews
RAG (Retrieval-Augmented Generation)	~74%	Debugged chunking failures; tuned hybrid search + reranking; measured groundedness
Python + API development	~95%	FastAPI/Flask, async, clean service boundaries, typed schemas
Prompt engineering (systematic)	~88%	Versioned prompts, few-shot libraries, structured outputs — not "I wrote a good prompt once"
Agent / tool-use orchestration	~65%	LangGraph state machines, retries, human-in-the-loop, tool schema design
Vector databases	~70%	pgvector, Pinecone, Weaviate, OpenSearch — plus when NOT to use vectors
Evaluation & observability	~58%	RAGAS, DeepEval, LangSmith/Langfuse, production trace analysis
Cloud deployment (AWS/Azure/GCP)	~72%	Bedrock, Vertex AI, Azure OpenAI — with cost/latency tradeoffs
Guardrails & safety	~45%	Input/output filtering, PII redaction, escalation paths
MCP / tool protocol	~22% (rising fast)	FastMCP servers, OpenAPI-bound action groups
Fine-tuning	~25%	Nice-to-have; most roles expect RAG + prompting first
Graph RAG / knowledge graphs	~18%	Differentiator for enterprise, compliance, lineage queries

Frameworks Mentioned in Postings

Orchestration: LangChain, LangGraph, LlamaIndex, Semantic Kernel, CrewAI, AutoGen

LangGraph/LangChain dominate enterprise JDs

Evaluation: RAGAS, DeepEval, LangSmith, Braintrust, PromptFoo, G-Eval

Eval is the fastest-growing differentiator

Hiring manager signal: Founders want someone who has shipped at least two LLM features end-to-end in production, can debug retrieval failures without guessing, and can explain architectural tradeoffs to a non-engineer in plain English.

Title Chaos — Pick Skills Over Labels

The same work appears under many titles: AI Engineer, Applied AI Engineer, GenAI Engineer, LLM Engineer, AgentOps Engineer, AI Delivery Engineer, Forward Deployed Engineer. Compensation can vary by up to $86K for identical scope. Optimize for portfolio and production stories, not title collection.

3 The Conversation Problem Taxonomy

The biggest mistake in enterprise conversational AI is treating every user message as "chat with RAG." The better instinct is to classify intent first, then route to the correct technical pattern. Below is an expanded taxonomy for Applied AI across industries. It builds on the 3-class model (Interpretive → RAG, Transactional → Agents, High-Risk → Guardrails + Handoff) and adds three classes seen in production systems and research (TUNA framework, Broder's web search taxonomy, intent-first RAG literature).

Class	Nature of Intent	GenAI Technical Solution
1. Interpretive (Informational)	User asks about complex business logic, contract terms, policy language, or general how-tos. Requires analyzing dense text and synthesizing accurate answers from authoritative sources.	RAG + Knowledge Base. Vector + keyword hybrid search, reranking, citation-required generation. AWS: Bedrock Knowledge Bases + OpenSearch. Azure: AI Search + Foundry. GCP: Vertex AI Search.
2. Transactional (Action-Oriented)	User needs to mutate state: update an order, book an appointment, change a password, submit a claim, trigger a workflow. Or retrieve a specific literal data point from a system of record.	Agents + Action Groups. OpenAPI specs bound to Lambda/API functions. Tool schemas with idempotency keys, confirmation steps, and audit logs. LangGraph tool nodes; Bedrock Agents; MCP servers.
3. High-Risk / Empathy (Compliance)	Emotionally charged interactions, regulatory constraints, medical/legal/financial advice boundaries, or non-deterministic goals where wrong answers carry brand or legal liability.	Automated Guardrails + Warm Handoff. LLM restricted to triage: classify, empathize, collect context, route to human. Never autonomous diagnosis or binding commitments. Bedrock Guardrails, NeMo Guardrails, custom policy engines.
4. Analytical (Multi-Source)	User needs synthesis across multiple documents, comparison ("Plan A vs Plan B"), trend analysis, or relationship traversal ("what depends on X?").	Agentic / Hybrid RAG. Multi-hop retrieval, graph traversal (Neo4j + vectors), SQL over structured data, decomposition into sub-queries. Adaptive-RAG routing by complexity.
5. Navigational (Routing)	User wants to reach a specific resource, form, dashboard, or human team, not a generated answer: "Take me to billing" or "I need to speak to retention."	Intent Router + Deep Links. Lightweight classifier → UI action, CRM queue, or IVR transfer. Often no LLM generation is needed, just classification + routing.
6. Meta-Conversation (System)	User talks about the AI itself: "Why did you say that?", "Forget what I told you", "Start over", feedback, corrections.	Session State + Memory Policies. Explicit memory scopes, correction handling, feedback loops into eval datasets. Critical for trust and continuous improvement.

Design rule: If two intent classes route to the same pipeline, merge them. Build your taxonomy from logged production queries, not theoretical categories. Cluster 500 real user messages before choosing architecture.
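
The merge rule above can be checked mechanically once the taxonomy is written down as data. The following is a minimal sketch, assuming a hypothetical `INTENT_TO_PIPELINE` mapping; the class and pipeline names (including the extra "faq" class) are illustrative and not taken from any real system.

```python
from collections import defaultdict

# Hypothetical mapping from intent class to pipeline; all names are illustrative.
INTENT_TO_PIPELINE = {
    "interpretive": "rag",
    "transactional": "agent",
    "high_risk": "guardrails_handoff",
    "analytical": "agentic_rag",
    "navigational": "router",
    "meta": "session_manager",
    "faq": "rag",  # deliberately redundant with "interpretive"
}


def merge_candidates(intent_to_pipeline: dict[str, str]) -> dict[str, list[str]]:
    """Return each pipeline that is the target of more than one intent class."""
    by_pipeline: dict[str, list[str]] = defaultdict(list)
    for intent, pipeline in intent_to_pipeline.items():
        by_pipeline[pipeline].append(intent)
    return {p: sorted(intents) for p, intents in by_pipeline.items() if len(intents) > 1}


print(merge_candidates(INTENT_TO_PIPELINE))
```

Any pipeline that shows up in the result has two or more classes pointing at it, which makes those classes candidates for merging.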

4 Intent-First Architecture (How to Build It)

Intent-first architecture inverts naive RAG: classify before retrieve, route before generate.

flowchart TD
    U[User Message] --> IC["Intent Classifier<br/>fast model / fine-tuned classifier"]
    IC -->|Interpretive| RAG["RAG Pipeline<br/>hybrid search + rerank + cite"]
    IC -->|Transactional| AG["Agent + Tools<br/>OpenAPI / MCP / Lambda"]
    IC -->|High-Risk| GR["Guardrails + Triage<br/>empathy template + handoff"]
    IC -->|Analytical| AR["Agentic RAG<br/>multi-hop / graph / SQL"]
    IC -->|Navigational| RT["Router<br/>deep link / queue / transfer"]
    IC -->|Meta| MEM["Session Manager<br/>memory / reset / feedback"]
    RAG --> OUT[Response + Citations]
    AG --> OUT
    GR --> HITL[Human Agent] --> OUT
    AR --> OUT
    RT --> OUT
    MEM --> OUT
    OUT --> EV[Eval + Trace Log]
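
The flow in the diagram can be sketched as a classifier followed by a dispatch table. This is an illustration only: the keyword rules stand in for a real classifier (small LLM or fine-tuned model), and `KEYWORD_RULES`, `PIPELINES`, `classify`, and `handle` are hypothetical names introduced here.

```python
from typing import Callable

# Stand-in for a real intent classifier. Rules are checked in order, so the
# high-risk class wins when several match. Plain substring matching is crude
# (for example "sue" also matches "issue"); it is used here only for brevity.
KEYWORD_RULES: list[tuple[str, tuple[str, ...]]] = [
    ("high_risk", ("sue", "chest pain", "lawyer")),
    ("transactional", ("cancel", "schedule", "update")),
    ("navigational", ("take me to", "where is")),
    ("meta", ("why did you", "start over", "forget")),
    ("analytical", ("compare", " vs ")),
]

# Hypothetical pipeline names, one per branch of the flowchart.
PIPELINES = {
    "interpretive": "rag",
    "transactional": "agent",
    "high_risk": "guardrails",
    "analytical": "agentic_rag",
    "navigational": "router",
    "meta": "session_manager",
}


def classify(message: str) -> str:
    """Return the first matching intent class, defaulting to interpretive."""
    text = message.lower()
    for intent, keywords in KEYWORD_RULES:
        if any(keyword in text for keyword in keywords):
            return intent
    return "interpretive"


def make_handler(pipeline: str) -> Callable[[str], str]:
    """Build a placeholder handler; a real one would call the actual pipeline."""
    def handler(message: str) -> str:
        return f"[{pipeline}] {message}"
    return handler


HANDLERS = {intent: make_handler(name) for intent, name in PIPELINES.items()}


def handle(message: str) -> tuple[str, str]:
    """Classify first, then route to the matching pipeline."""
    intent = classify(message)
    return intent, HANDLERS[intent](message)


print(handle("Take me to the claims upload page."))
```

The point of the structure is that generation never runs before classification: adding a new intent class means adding one rule and one handler, not touching the existing pipelines.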

Classifier Options (Pick One to Start)

Approach	Latency	Accuracy	When to Use
Small LLM with structured output (Haiku, GPT-4o-mini)	~200ms	Good	Fast MVP, <10 intent classes
Fine-tuned classifier (BERT, DistilBERT)	<50ms	Very good in-domain	High volume, stable taxonomy
Rules + embeddings hybrid	<30ms	Moderate	Regulated industries with explicit policies
LLM + confidence threshold → clarifying question	~300ms	Best UX	Ambiguous queries common

Confidence Thresholds & Escalation

≥ 0.85 confidence: Route automatically to selected pipeline
0.60 to < 0.85: Ask one clarifying question OR retrieve with broader context
< 0.60 OR high-risk class detected: Escalate to human or safe fallback response
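
The three bands above can be expressed as a small decision function. A minimal sketch, assuming the classifier returns an intent label and a confidence in [0, 1]; `decide` and `HIGH_RISK_CLASSES` are hypothetical names, and the return values are plain strings chosen for illustration.

```python
# Intent labels treated as high-risk; the label itself is an assumption.
HIGH_RISK_CLASSES = {"high_risk"}


def decide(intent: str, confidence: float) -> str:
    """Map (intent, confidence) to one of: 'route', 'clarify', 'escalate'."""
    # High-risk intents escalate regardless of how confident the classifier is.
    if intent in HIGH_RISK_CLASSES or confidence < 0.60:
        return "escalate"
    if confidence >= 0.85:
        return "route"
    # Remaining band: 0.60 <= confidence < 0.85.
    return "clarify"


for intent, conf in [("interpretive", 0.92), ("transactional", 0.70), ("high_risk", 0.99)]:
    print(intent, conf, "->", decide(intent, conf))
```

Checking the high-risk condition first means a confident classification of a risky message still goes to a human rather than being routed automatically.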

5 Full Stack Map — Beyond "Chatbot"

RAG: Knowledge Q&A

Ingestion → chunking → embeddings → vector store → hybrid retrieval → rerank → grounded generation with citations.

When: Interpretive intents, document-heavy domains (insurance, legal, HR policies).

Watch-outs: Fixed chunking breaks semantic coherence; stale docs; no citation = no trust.
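
One common way to implement the hybrid retrieval step is reciprocal rank fusion (RRF), which merges a keyword ranking and a vector ranking without needing comparable scores. The guide does not prescribe a fusion method, so treat this as one option under that assumption; the document IDs are made up, and `k = 60` is the constant conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc IDs; each hit contributes 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up results for one query from two retrievers.
keyword_hits = ["doc_maternity", "doc_dental", "doc_vision"]
vector_hits = ["doc_pregnancy", "doc_maternity", "doc_dental"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents found by both retrievers rise to the top, while a document found by only one (here the vector-only hit) still survives into the candidate set that the reranker sees.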

Agents: Tool-Use & Workflows

State graph → tool selection → execution → observation → loop until done or budget exhausted.

When: Transactional intents, multi-step processes (claims, IT tickets, scheduling).

Watch-outs: Unbounded agent loops burn cost; always set max hops + timeout.
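
The "max hops + timeout" rule can be sketched as a plain loop with two budgets. This is not how any particular framework implements it; `run_agent` and the `step` callback are hypothetical, with `step` standing in for one round of tool selection, execution, and observation.

```python
import time
from typing import Callable, Optional


def run_agent(
    step: Callable[[list[str]], Optional[str]],
    max_hops: int = 5,
    timeout_s: float = 10.0,
) -> tuple[str, list[str]]:
    """Run `step` until it returns None (done) or a budget is exhausted.

    `step` receives the observations so far and returns the next observation,
    or None when the task is complete.
    """
    observations: list[str] = []
    deadline = time.monotonic() + timeout_s
    for _ in range(max_hops):
        if time.monotonic() > deadline:
            return "timeout", observations
        result = step(observations)
        if result is None:
            return "done", observations
        observations.append(result)
    return "max_hops", observations


# A step that never finishes is cut off after three hops.
print(run_agent(lambda obs: "again", max_hops=3))
# A step that finishes after two observations stops early.
print(run_agent(lambda obs: None if len(obs) >= 2 else f"obs{len(obs)}"))
```

Returning the stop reason alongside the observations lets the caller log budget exhaustion separately from normal completion, which is useful when reviewing traces for runaway loops.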

Guardrails: Safety Layer

Input filtering (PII, jailbreaks) → policy check → output validation → escalation triggers.

When: High-risk intents, regulated industries, customer-facing support.

Watch-outs: Guardrails are not optional "later" — ship them with v1.
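
A first version of the input-filtering and escalation-trigger stages can be very small. The sketch below assumes regex-based redaction of two PII types and a keyword escalation trigger; the patterns and terms are illustrative and far from complete, and production systems typically rely on a dedicated PII detector or a managed guardrail service.

```python
import re

# Illustrative patterns only; real PII detection needs much broader coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Illustrative escalation terms, matched on word boundaries.
ESCALATION_RE = re.compile(r"\b(sue|lawsuit|chest pain)\b", re.IGNORECASE)


def redact_pii(text: str) -> str:
    """Replace emails and US SSN-shaped numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return SSN_RE.sub("[SSN]", text)


def check_input(text: str) -> dict:
    """Return the redacted text plus a flag telling the caller to escalate."""
    return {
        "text": redact_pii(text),
        "escalate": ESCALATION_RE.search(text) is not None,
    }


print(check_input("Reach me at jane.doe@example.com, SSN 123-45-6789. I might sue."))
```

Redacting before anything is logged or sent to a model keeps raw PII out of traces, which matters for the compliance constraints listed in the industry playbooks.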

Eval / LLMOps: Production Discipline

Offline eval sets → CI gates → online A/B → trace analysis → prompt versioning → cost dashboards.

When: Always. The #1 skill gap in candidates who can demo but can't ship.

Watch-outs: "It works on my laptop" is not production.

Additional Modalities in the Market (Don't Forget These)

Modality	Example Products	Applied AI Pattern
Copilot / Inline Assist	GitHub Copilot, Cursor, Notion AI	Context from current doc + lightweight completion; not full chat
Voice AI	ElevenLabs, Vapi, OpenAI Realtime API	STT → intent → TTS; latency-critical; often hybrid with human transfer
Workflow Automation	n8n, Zapier AI, UiPath	Deterministic triggers + LLM for unstructured steps
Search + Gen	Perplexity-style, enterprise search	RAG with web or internal index; citation-first UX
Multi-Agent Teams	CrewAI, AutoGen patterns	Role-specialized agents; high cost — use when decomposition is proven necessary
Computer Use / UI Agents	Browser automation, RPA+LLM	Fragile; prefer API-first transactional agents when APIs exist

Anti-pattern: "Agentic RAG" for every query. ~70% of production cases work with single-pass RAG + good reranking. Add agent loops only when eval data proves compound queries are failing.

6 Industry Playbooks — Taxonomy in Practice

🏥 Healthcare

Interpretive: "What does my plan cover for physical therapy?" → RAG over benefits docs.

Transactional: "Schedule my follow-up" → Agent + EHR scheduling API.

High-Risk: "Is this chest pain serious?" → Guardrail → nurse triage line. Never diagnose.

Compliance: HIPAA, no PHI in logs, BAA with vendors.

🏦 Insurance / Financial Services

Interpretive: Advisor RAG over 500-page policy PDFs (maternity vs pregnancy keyword gap).

Transactional: FNOL claim intake, beneficiary updates → Agents + core policy admin APIs.

High-Risk: Investment advice, fraud disputes → human handoff + audit trail.

Analytical: Coverage gap summaries across product lines → multi-doc agentic RAG.

🛒 E-Commerce / Marketplaces

Interpretive: "Will this fit my 2019 Honda?" → RAG + structured fitment graph.

Transactional: Order status, returns, refunds → Agents bound to OMS APIs.

Navigational: "Where's my seller dashboard?" → route, don't generate.

Search AI: Query understanding + semantic retrieval (not generative answers for every search).

💼 HR / Talent (ResumeInterview Domain)

Interpretive: "How do I tailor my resume for this JD?" → RAG over role requirements + user profile.

Transactional: Apply to job, schedule mock interview, export PDF → product actions.

Analytical: Fit score explanation across skills gap → structured comparison agent.

Meta: "Make it more senior" → session memory + iterative refinement loop.

🏛️ Government / Public Sector

Interpretive: Benefits eligibility, permit requirements → RAG over official docs only.

Transactional: Form pre-fill, case status → Agents with strict auth.

High-Risk: Legal interpretation, immigration → mandatory human review.

Constraint: FedRAMP, data residency, no external model calls for classified data.

🏭 Manufacturing / Supply Chain

Interpretive: SOP lookup, safety procedures → RAG with version-controlled docs.

Analytical: "Which suppliers depend on component X?" → graph RAG.

Transactional: PO creation, inventory holds → ERP agents.

Voice: Hands-free floor worker queries via voice AI.

⚖️ Legal

Interpretive: Contract clause lookup → RAG with precise citations (page, section).

Analytical: Compare redlines across versions → multi-doc agent.

High-Risk: "Should I sign this?" → never autonomous; attorney handoff.

🎓 EdTech / Training

Interpretive: Concept explanation, study guides → RAG over curriculum.

Meta: "Quiz me harder" → adaptive difficulty from session state.

Transactional: Enroll, submit assignment → LMS integration.

Eval focus: Factual accuracy rubrics; hallucination = student harm.

7 Forward Deployed AI — The Delivery Model

Palantir pioneered the Forward Deployed Engineer (FDE) — now adopted by Deloitte, Scale AI, Databricks, and enterprise AI consultancies. In 2026, FDE roles explicitly require GenAI/agentic delivery, not just data integration.

FDE vs Platform Engineer vs Applied AI Engineer

Dimension	Platform / Product Engineer	Applied AI Engineer	Forward Deployed Engineer
Customer proximity	Indirect (PM proxy)	Sometimes	Embedded on-site / in war room
Problem shape	Generalizable features	LLM system components	One client's ambiguous problem → working solution
Success metric	DAU, feature adoption	Latency, accuracy, cost	Customer mission outcome in weeks
Skills emphasis	Scale, abstractions	RAG, agents, eval	Stakeholder mgmt + rapid prototyping + politics

"As an FDE, your responsibilities look similar to a startup CTO: own end-to-end execution of high-stakes projects — architecture, data wrangling, custom apps, executive conversations, and team strategy." — Palantir FDSE job description

The FDE Pod Model (Deloitte / Enterprise Pattern)

FDE (Engineer): Builds working AI software, agent workflows, integrations
Deployment Strategist / AI PM: Owns use case prioritization, ROI narrative, change management
Domain SME: Validates outputs, defines guardrails, accepts/rejects model behavior
Platform Engineer (back at HQ): Hardens patterns into reusable assets — prompt libraries, eval templates, reference architectures

Career insight: FDE is the fastest path to learning Applied AI in context. You see why RAG fails on real PDFs, why agents need confirmation steps, and why executives care about cost per conversation — not benchmark scores.

8 Build vs Buy vs Platform

Layer	Build	Buy / Managed	Recommendation
Foundation model	Train from scratch	OpenAI, Anthropic, Bedrock, Vertex	Buy API — 99% of Applied AI roles
Vector search	Custom FAISS	Pinecone, OpenSearch, pgvector	Managed until scale proves otherwise
Agent orchestration	Custom state machine	LangGraph, Bedrock Agents	Framework first, custom only for edge cases
Evaluation	Custom pytest + rubrics	LangSmith, Braintrust, DeepEval	Hybrid — platform for traces, custom for domain rubrics
Full conversational platform	From scratch	Kore.ai, Cognigy, Ada, Sierra	Buy for standard support bots; build for differentiated IP

9 Evaluation & LLMOps — The Skill That Gets You Hired

Minimum Viable Eval Stack

Golden dataset: 50–200 question/answer pairs from real user queries (anonymized)
Offline metrics: Faithfulness, answer relevance, citation accuracy (RAGAS or custom rubric)
CI gate: PR cannot merge if faithfulness drops >2% vs baseline
Production traces: Log every request — intent class, retrieval docs, latency, cost, user feedback thumbs
Weekly review: Sample 20 failed traces; add failures to golden set
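
The CI gate in the list above reduces to comparing two numbers. A minimal sketch, assuming "drops >2%" means an absolute drop of more than 0.02 on a 0 to 1 faithfulness score (the guide does not say whether the threshold is absolute or relative); `ci_gate` and the per-example scores are hypothetical.

```python
from statistics import mean


def ci_gate(baseline: float, candidate: float, max_drop: float = 0.02) -> bool:
    """Return True if the PR may merge, False if faithfulness regressed too far.

    Assumes scores in [0, 1] and an absolute threshold. The small epsilon keeps
    floating-point noise from failing a drop of exactly `max_drop`.
    """
    return (baseline - candidate) <= max_drop + 1e-9


# Hypothetical per-example faithfulness scores from an offline eval run.
baseline_scores = [0.90, 0.95, 0.85]
candidate_scores = [0.80, 0.90, 0.85]

ok = ci_gate(mean(baseline_scores), mean(candidate_scores))
print("merge allowed" if ok else "blocked: faithfulness regression")
```

In a CI job the boolean would typically be turned into the process exit code so that a regression fails the pipeline.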

Metrics That Matter to Business (Not Just ML)

Metric	Definition	Why Executives Care
Containment rate	% resolved without human	Support cost reduction
Groundedness / faithfulness	Answer supported by retrieved context	Legal/compliance risk
Time-to-resolution	Median conversation length to outcome	Customer satisfaction
Cost per conversation	Tokens + infra / session	Unit economics at scale
Escalation rate	% routed to human	Guardrail effectiveness
Hallucination rate	Factually wrong on golden set	Brand trust
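
Several of the metrics in the table can be computed directly from production traces. The sketch below assumes a hypothetical `Trace` record with three fields; real trace schemas (LangSmith, Langfuse, or custom) will differ, and here "contained" is taken to mean resolved without escalation.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """Hypothetical per-conversation record; field names are assumptions."""
    resolved: bool
    escalated: bool
    cost_usd: float


def summarize(traces: list[Trace]) -> dict[str, float]:
    """Compute containment rate, escalation rate, and cost per conversation."""
    if not traces:
        raise ValueError("need at least one trace")
    n = len(traces)
    return {
        "containment_rate": sum(t.resolved and not t.escalated for t in traces) / n,
        "escalation_rate": sum(t.escalated for t in traces) / n,
        "cost_per_conversation": sum(t.cost_usd for t in traces) / n,
    }


sample = [
    Trace(resolved=True, escalated=False, cost_usd=0.02),
    Trace(resolved=True, escalated=False, cost_usd=0.04),
    Trace(resolved=True, escalated=True, cost_usd=0.10),
    Trace(resolved=False, escalated=False, cost_usd=0.04),
]
print(summarize(sample))
```

Note that containment and escalation do not have to sum to 1: a conversation that was neither resolved nor escalated (the last sample trace) counts against both.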

10 90-Day Applied AI Learning Roadmap

Weeks 1–2

Foundations: Python async, FastAPI, LLM APIs (structured outputs, tool calling). Build a CLI chatbot with conversation memory. Read model docs for Claude, GPT, Gemini — understand context windows and pricing.

Weeks 3–4

RAG deep dive: Ingest 50 PDFs, experiment with chunk sizes, hybrid search, Cohere rerank. Measure recall@5 on 30 test questions. Document what broke and why.
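
Recall@5 for the test questions can be measured with a few lines of code. A minimal sketch, assuming each test question has a known set of relevant chunk IDs and the retriever returns a ranked list of IDs; the IDs below are made up.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int = 5) -> float:
    """Average recall@k over (retrieved, relevant) pairs, one per test question."""
    return sum(recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs) / len(runs)


runs = [
    (["c1", "c2", "c3", "c4", "c5", "c6"], {"c1", "c6"}),  # one of two relevant chunks in top 5
    (["c9", "c7"], {"c7"}),                                # the only relevant chunk is found
]
print(mean_recall_at_k(runs))
```

Tracking this number while changing chunk size or adding a reranker gives a concrete record of what broke and what helped.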

Weeks 5–6

Intent taxonomy: Log 100 sample queries, cluster into 4–6 classes, build classifier router. Wire each class to appropriate pipeline (RAG vs stub agent vs handoff mock).

Weeks 7–8

Agents: LangGraph state machine with 3 tools (search, calculator, API mock). Add max hops, retry logic, human confirmation for write operations.
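
Human confirmation for write operations can be prototyped independently of any agent framework. The sketch below is not LangGraph code; `ToolExecutor` and `idempotency_key` are hypothetical names, the tool call is faked, and the idempotency key is derived from the tool name and arguments so that a retried write is not applied twice.

```python
import hashlib
import json


def idempotency_key(tool: str, args: dict) -> str:
    """Derive a stable key from the tool name and its arguments."""
    payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:16]


class ToolExecutor:
    """Gate writes behind confirmation, deduplicate retries, keep an audit log."""

    def __init__(self) -> None:
        self._results: dict[str, dict] = {}
        self.audit_log: list[tuple[str, str]] = []

    def execute(self, tool: str, args: dict, *, is_write: bool, confirmed: bool = False) -> dict:
        if is_write and not confirmed:
            # Nothing is executed until a human approves the write.
            return {"status": "needs_confirmation"}
        key = idempotency_key(tool, args)
        if key in self._results:
            # Same call seen before: return the stored result, do not re-run.
            return self._results[key]
        result = {"status": "ok", "tool": tool, "key": key}  # placeholder for the real call
        self._results[key] = result
        self.audit_log.append((tool, key))
        return result


executor = ToolExecutor()
print(executor.execute("update_beneficiary", {"policy_id": 1}, is_write=True))
print(executor.execute("update_beneficiary", {"policy_id": 1}, is_write=True, confirmed=True))
```

Combined with retry logic, the deduplication step matters: without it, a retried write after a timeout could apply the same change twice.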

Weeks 9–10

Eval + observability: RAGAS or DeepEval in CI. LangSmith/Langfuse tracing. Build dashboard: latency, cost, faithfulness over time.

Weeks 11–12

Capstone: Deploy end-to-end domain assistant (pick one vertical from Section 6). Write architecture doc, eval report, and 5-min demo video. Publish GitHub repo with README.

11 Portfolio Projects That Get Interviews

Project A: Policy Q&A with Intent Router

Upload benefits PDFs. Classify: interpretive vs transactional vs high-risk. RAG with citations for Q&A; mock API for "update beneficiary"; handoff UI for advice requests.

Signals: Taxonomy thinking, RAG debugging, guardrails awareness.

Project B: Job Search Copilot (ResumeInterview-aligned)

Parse resume + JD → fit analysis → tailored bullet suggestions → mock interview questions. Separate interpretive (explain gap) from transactional (save application).

Signals: Product sense, structured outputs, multi-step workflow.

Project C: Support Agent with Eval CI

50 FAQ golden set. Agent with order lookup tool. CI fails on faithfulness regression. Public trace viewer.

Signals: LLMOps maturity — the #1 differentiator in 2026 hiring.

Project D: GraphRAG for Dependencies

Ingest architecture docs + YAML configs. Answer "what breaks if X fails?" using Neo4j + vectors.

Signals: Advanced retrieval — differentiates senior candidates.

12 Seminar Exercises & Discussion Prompts

Live Exercise 1 — Classify These Queries (5 min)

Audience breaks into groups. Classify each query using the 6-class taxonomy and name the technical solution:

"What's the deductible on my gold plan?"
"I want to cancel my subscription effective today."
"I'm really frustrated — your bot charged me twice and I might sue."
"Compare PPO vs HMO for a family of four with one chronic condition."
"Take me to the claims upload page."
"Why did you recommend that plan? I said budget under $300."

Live Exercise 2 — Architecture Whiteboard (10 min)

Pick one industry (healthcare, e-commerce, HR). Draw: intent classifier → pipelines → eval layer. Identify ONE high-risk path that must never be fully automated.

Discussion Prompts

When does agentic RAG justify its cost vs single-pass RAG + reranking?
How do you convince a client their "chatbot for everything" should be 4 specialized pipelines?
What's the minimum eval set size before you trust offline metrics?
FDE vs product engineer: which path fits your career goals and why?
Build vs buy: where would you draw the line for a Series A startup?

Recommended Follow-Up Resources

AWS Bedrock Agents + Knowledge Bases documentation (action groups pattern)
LangGraph docs — stateful agent workflows
RAGAS + DeepEval — RAG evaluation frameworks
arXiv TUNA taxonomy — user needs in conversational AI
ResumeInterview blog — FDE career guides + interview prep (coming soon)

Appendix: Applied AI Patterns in ResumeInterview

Connecting seminar concepts to product work already in this codebase — useful for demonstrating real Applied AI delivery:

Product Feature	Intent Class	Pattern Used
Resume tailoring / curator	Interpretive + Meta	Structured LLM outputs, iterative refinement, user profile context
Mock interview	Interpretive + Analytical	Multi-turn conversation, follow-up generation, rubric scoring
Job fit scoring	Analytical	Structured comparison, semantic matching, explainable gaps
Simple Apply / FlashApply	Transactional	Agent-like workflow — form mutation via integrations (Greenhouse, extension)
Screening sessions	High-Risk (hiring decisions)	Human-in-loop, scored reports, not autonomous hire/reject
Study pack generation	Interpretive	RAG over role requirements + generated curriculum

Seminar close: Applied AI is not one model call. It is taxonomy → routing → the right pattern → evaluation → deployment. The engineers who win in 2026 ship systems, not demos.
