Applied AI Program — Seminar Guide

A practitioner-first curriculum for building production LLM systems: what the market actually hires for, how to architect conversational AI across industries using intent taxonomy, and the full stack from RAG to agents to guardrails. Synthesized from 200+ job postings, enterprise deployment patterns, and real product work.

ResumeInterview | Applied AI Seminar | ~90 min presentation + Q&A | Updated June 2026

Table of Contents

1. What Is Applied AI (vs ML Research)?
2. Job Market Analysis: 200+ Postings
3. The Conversation Problem Taxonomy
4. Intent-First Architecture
5. Full Stack Map: RAG → Agents → Voice → Eval
6. Industry Playbooks (8 Verticals)
7. Forward Deployed AI: The Delivery Model
8. Build vs Buy vs Platform
9. Evaluation & LLMOps (Non-Negotiable)
10. 90-Day Learning Roadmap
11. Portfolio Projects That Get Interviews
12. Seminar Exercises & Discussion Prompts

1 What Is Applied AI?

Applied AI is not training foundation models from scratch. It is the discipline of taking existing models (GPT, Claude, Gemini, Llama) and engineering reliable products around them: retrieval pipelines, agent orchestration, guardrails, evaluation, deployment, and continuous improvement in production.

80% of "AI Engineer" roles = API + RAG + agents
74% of LLM job posts mention RAG explicitly
$159K–$245K US total comp (same work, title variance)

"The best engineer I hired last year had 'Software Engineer' on LinkedIn and shipped three production LLM systems. The worst candidate was a 'Principal Agentic GenAI Forward-Deployed Context Architect' whose GitHub was three forks of a LangChain tutorial." — Industry hiring analysis, 2026

Applied AI vs Adjacent Roles

Role | Core Output | PhD Required? | Typical Stack
ML Engineer | Trains models from data; feature stores; batch inference | Often helpful | PyTorch, SageMaker, Kubeflow
Applied AI / GenAI Engineer | Ships LLM features users touch daily | No | LangGraph, RAG, FastAPI, eval pipelines
Forward Deployed Engineer | Embeds with customer; end-to-end delivery on-site | No | Platform + custom integrations + stakeholder mgmt
AI PM | Prioritizes use cases, metrics, guardrails, rollout | No | Eval rubrics, cost models, user research
Research Scientist | Publishes; trains/fine-tunes foundation models | Usually yes | Distributed training, novel architectures

2 Job Market Analysis — What Industry Is Actually Hiring For

Analysis synthesized from 200+ job postings across LinkedIn, Levels.fyi, company career pages, and specialized boards (AgenticCareers, NLP People, Boundev's 50-post sample, Deloitte/Palantir FDE listings). Percentages reflect frequency across LLM-focused roles in 2025–2026.

Skill Frequency Matrix

Skill / Requirement | Frequency | What "Good" Looks Like in Interviews
RAG (Retrieval-Augmented Generation) | ~74% | Debugged chunking failures; tuned hybrid search + reranking; measured groundedness
Python + API development | ~95% | FastAPI/Flask, async, clean service boundaries, typed schemas
Prompt engineering (systematic) | ~88% | Versioned prompts, few-shot libraries, structured outputs; not "I wrote a good prompt once"
Agent / tool-use orchestration | ~65% | LangGraph state machines, retries, human-in-the-loop, tool schema design
Vector databases | ~70% | pgvector, Pinecone, Weaviate, OpenSearch; plus when NOT to use vectors
Evaluation & observability | ~58% | RAGAS, DeepEval, LangSmith/Langfuse, production trace analysis
Cloud deployment (AWS/Azure/GCP) | ~72% | Bedrock, Vertex AI, Azure OpenAI, with cost/latency tradeoffs
Guardrails & safety | ~45% | Input/output filtering, PII redaction, escalation paths
MCP / tool protocol | ~22% (rising fast) | FastMCP servers, OpenAPI-bound action groups
Fine-tuning | ~25% | Nice-to-have; most roles expect RAG + prompting first
Graph RAG / knowledge graphs | ~18% | Differentiator for enterprise, compliance, lineage queries

Frameworks Mentioned in Postings

Orchestration: LangChain, LangGraph, LlamaIndex, Semantic Kernel, CrewAI, AutoGen

LangGraph/LangChain dominate enterprise JDs

Evaluation: RAGAS, DeepEval, LangSmith, Braintrust, PromptFoo, G-Eval

Eval is the fastest-growing differentiator

Hiring manager signal: Founders want someone who has shipped at least two LLM features end-to-end in production, can debug retrieval failures without guessing, and can explain architectural tradeoffs to a non-engineer in plain English.

Title Chaos — Pick Skills Over Labels

The same work appears under many titles: AI Engineer, Applied AI Engineer, GenAI Engineer, LLM Engineer, AgentOps Engineer, AI Delivery Engineer, Forward Deployed Engineer. Compensation varies by up to $86K for identical scope. Optimize for portfolio + production stories, not title collection.

3 The Conversation Problem Taxonomy

The biggest mistake in enterprise conversational AI is treating every user message as "chat with RAG." The right instinct is to classify intent first, then route to the correct technical pattern. Below is an expanded taxonomy for Applied AI across industries. It builds on the 3-class model (Interpretive → RAG, Transactional → Agents, High-Risk → Guardrails + Handoff) and adds three classes seen in production systems and research (TUNA framework, Broder's web search taxonomy, intent-first RAG literature).

Class | Nature of Intent | GenAI Technical Solution
1. Interpretive (Informational) | User asks about complex business logic, contract terms, policy language, or general how-tos. Requires analyzing dense text and synthesizing accurate answers from authoritative sources. | RAG + Knowledge Base. Vector + keyword hybrid search, reranking, citation-required generation. AWS: Bedrock Knowledge Bases + OpenSearch. Azure: AI Search + Foundry. GCP: Vertex AI Search.
2. Transactional (Action-Oriented) | User needs to mutate state (update an order, book an appointment, change a password, submit a claim, trigger a workflow) or retrieve a specific literal data point from a system of record. | Agents + Action Groups. OpenAPI specs bound to Lambda/API functions. Tool schemas with idempotency keys, confirmation steps, and audit logs. LangGraph tool nodes; Bedrock Agents; MCP servers.
3. High-Risk / Empathy (Compliance) | Emotionally charged interactions, regulatory constraints, medical/legal/financial advice boundaries, or non-deterministic goals where wrong answers carry brand or legal liability. | Automated Guardrails + Warm Handoff. LLM restricted to triage: classify, empathize, collect context, route to human. Never autonomous diagnosis or binding commitments. Bedrock Guardrails, NeMo Guardrails, custom policy engines.
4. Analytical (Multi-Source) | User needs synthesis across multiple documents, comparison ("Plan A vs Plan B"), trend analysis, or relationship traversal ("what depends on X?"). | Agentic / Hybrid RAG. Multi-hop retrieval, graph traversal (Neo4j + vectors), SQL over structured data, decomposition into sub-queries. Adaptive-RAG routing by complexity.
5. Navigational (Routing) | User wants to reach a specific resource, form, dashboard, or human team, not a generated answer. "Take me to billing" or "I need to speak to retention." | Intent Router + Deep Links. Lightweight classifier → UI action, CRM queue, or IVR transfer. Often no LLM generation is needed, just classification + routing.
6. Meta-Conversation (System) | User talks about the AI itself: "Why did you say that?", "Forget what I told you", "Start over", feedback, corrections. | Session State + Memory Policies. Explicit memory scopes, correction handling, feedback loops into eval datasets. Critical for trust and continuous improvement.
Design rule: If two intent classes route to the same pipeline, merge them. Build your taxonomy from logged production queries, not theoretical categories. Cluster 500 real user messages before choosing architecture.
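
The merge rule above can be checked mechanically once a routing table exists. The sketch below is illustrative only: the `ROUTES` table, the extra `faq` class, and the pipeline names are hypothetical, not taken from a real system.

```python
from collections import defaultdict

# Hypothetical routing table: intent class -> pipeline name (assumed for illustration).
ROUTES = {
    "interpretive": "rag",
    "faq": "rag",              # routes to the same pipeline as "interpretive"
    "transactional": "agent",
    "high_risk": "handoff",
}


def merge_candidates(routes: dict) -> dict:
    """Return every pipeline that is served by more than one intent class."""
    by_pipeline = defaultdict(list)
    for intent, pipeline in routes.items():
        by_pipeline[pipeline].append(intent)
    # Keep only pipelines with two or more classes; those classes are merge candidates.
    return {p: sorted(intents) for p, intents in by_pipeline.items() if len(intents) > 1}


print(merge_candidates(ROUTES))
```

Any pipeline that shows up in the result signals intent classes that probably do not need to be distinguished by the classifier.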

4 Intent-First Architecture (How to Build It)

Intent-first architecture inverts naive RAG: classify before retrieve, route before generate.

flowchart TD
    U[User Message] --> IC["Intent Classifier<br/>fast model / fine-tuned classifier"]
    IC -->|Interpretive| RAG["RAG Pipeline<br/>hybrid search + rerank + cite"]
    IC -->|Transactional| AG["Agent + Tools<br/>OpenAPI / MCP / Lambda"]
    IC -->|High-Risk| GR["Guardrails + Triage<br/>empathy template + handoff"]
    IC -->|Analytical| AR["Agentic RAG<br/>multi-hop / graph / SQL"]
    IC -->|Navigational| RT["Router<br/>deep link / queue / transfer"]
    IC -->|Meta| MEM["Session Manager<br/>memory / reset / feedback"]
    RAG --> OUT[Response + Citations]
    AG --> OUT
    GR --> HITL[Human Agent] --> OUT
    AR --> OUT
    RT --> OUT
    MEM --> OUT
    OUT --> EV[Eval + Trace Log]
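
The classify-then-route flow can be sketched in a few lines. This is a toy under stated assumptions: `classify` uses hand-written keyword rules as a stand-in for the fast model or fine-tuned classifier, and the `PIPELINES` names are hypothetical labels for the six branches.

```python
import re

# Hypothetical pipeline names, one per intent class in the taxonomy.
PIPELINES = {
    "interpretive": "rag_pipeline",
    "transactional": "agent_with_tools",
    "high_risk": "guardrails_and_handoff",
    "analytical": "agentic_rag",
    "navigational": "router",
    "meta": "session_manager",
}


def classify(message: str) -> str:
    """Keyword stand-in for a real intent classifier (assumption, not a recommendation)."""
    text = message.lower()
    words = set(re.findall(r"[a-z']+", text))
    # High-risk signals are checked first so they always win over other matches.
    if words & {"sue", "lawsuit"} or "chest pain" in text:
        return "high_risk"
    if text.startswith(("why did you", "start over", "forget")):
        return "meta"
    if text.startswith(("take me to", "where's", "where is")):
        return "navigational"
    if words & {"compare", "vs"}:
        return "analytical"
    if words & {"cancel", "schedule", "refund", "update"}:
        return "transactional"
    return "interpretive"  # default: answer from the knowledge base


def route(message: str) -> tuple:
    """Classify before retrieving; return (intent, pipeline)."""
    intent = classify(message)
    return intent, PIPELINES[intent]


print(route("I want to cancel my subscription effective today."))
```

In a real system the classifier would be replaced by one of the options in the next subsection; the dispatch table is the part that stays the same.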

Classifier Options (Pick One to Start)

Approach | Latency | Accuracy | When to Use
Small LLM with structured output (Haiku, GPT-4o-mini) | ~200ms | Good | Fast MVP, <10 intent classes
Fine-tuned classifier (BERT, DistilBERT) | <50ms | Very good in-domain | High volume, stable taxonomy
Rules + embeddings hybrid | <30ms | Moderate | Regulated industries with explicit policies
LLM + confidence threshold → clarifying question | ~300ms | Best UX | Ambiguous queries common

Confidence Thresholds & Escalation
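
One way to act on classifier confidence is a three-way decision: route, ask a clarifying question, or hand off. The sketch below is an assumption-laden illustration: the `Prediction` type and both threshold values (0.80 and 0.50) are hypothetical and would need tuning against logged queries.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    intent: str
    confidence: float  # classifier probability in [0, 1]


ROUTE_THRESHOLD = 0.80    # assumed value: at or above this, route directly
CLARIFY_THRESHOLD = 0.50  # assumed value: between the two, ask a clarifying question


def decide(pred: Prediction) -> str:
    """Map a classifier prediction to an action."""
    # High-risk intents go to a human regardless of confidence.
    if pred.intent == "high_risk":
        return "handoff"
    if pred.confidence >= ROUTE_THRESHOLD:
        return f"route:{pred.intent}"
    if pred.confidence >= CLARIFY_THRESHOLD:
        return "clarify"
    # Very low confidence: escalate instead of guessing.
    return "handoff"


print(decide(Prediction("transactional", 0.62)))
```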

5 Full Stack Map — Beyond "Chatbot"

RAG: Knowledge Q&A

Ingestion → chunking → embeddings → vector store → hybrid retrieval → rerank → grounded generation with citations.

When: Interpretive intents, document-heavy domains (insurance, legal, HR policies).

Watch-outs: Fixed chunking breaks semantic coherence; stale docs; no citation = no trust.
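
For the hybrid retrieval step, one common way to combine a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The guide does not name a fusion method, so RRF is an assumption here, as are the document IDs and the conventional constant `k = 60`.

```python
def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse several ranked lists of doc IDs; each doc scores sum(1 / (k + rank))."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for a query about maternity coverage.
keyword_hits = ["doc_maternity", "doc_dental", "doc_vision"]
vector_hits = ["doc_pregnancy", "doc_maternity", "doc_dental"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that appear in both lists rise to the top, while a document found only by the vector search (here `doc_pregnancy`) still survives into the candidate set passed to the reranker.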

Agents: Tool-Use & Workflows

State graph → tool selection → execution → observation → loop until done or budget exhausted.

When: Transactional intents, multi-step processes (claims, IT tickets, scheduling).

Watch-outs: Unbounded agent loops burn cost; always set max hops + timeout.
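
A bounded loop makes the "max hops + timeout" rule concrete. This is a minimal sketch, not a framework API: `run_agent` and the `step` callback are hypothetical, and `step` stands in for one cycle of tool selection, execution, and observation.

```python
import time
from typing import Callable, Optional


def run_agent(
    step: Callable[[list], Optional[str]],
    max_hops: int = 5,
    timeout_s: float = 10.0,
) -> tuple:
    """Call step(observations) until it returns None (done) or a budget runs out."""
    deadline = time.monotonic() + timeout_s
    observations = []
    for _ in range(max_hops):
        if time.monotonic() > deadline:
            return "timeout", observations
        result = step(observations)
        if result is None:  # the step signals that the task is complete
            return "done", observations
        observations.append(result)
    # Hop budget exhausted without completion: stop instead of burning more cost.
    return "budget_exhausted", observations


# A step that never finishes is cut off after max_hops.
print(run_agent(lambda obs: "looked up order", max_hops=3))
```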

Guardrails: Safety Layer

Input filtering (PII, jailbreaks) → policy check → output validation → escalation triggers.

When: High-risk intents, regulated industries, customer-facing support.

Watch-outs: Guardrails are not optional "later" — ship them with v1.
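
As an illustration of the input-filtering stage, the sketch below redacts a few PII shapes with regular expressions before text reaches a model or a log. The patterns are deliberately simple assumptions (US-style phone and SSN formats only); a production filter would need broader coverage and testing.

```python
import re

# Assumed, simplified patterns; order matters because each pass rewrites the text.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace each matched PII span with a bracketed label."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact("Email jane@example.com or call 555-123-4567, SSN 123-45-6789."))
```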

Eval / LLMOps: Production Discipline

Offline eval sets → CI gates → online A/B → trace analysis → prompt versioning → cost dashboards.

When: Always. The #1 skill gap in candidates who can demo but can't ship.

Watch-outs: "It works on my laptop" is not production.

Additional Modalities in the Market (Don't Forget These)

Modality | Example Products | Applied AI Pattern
Copilot / Inline Assist | GitHub Copilot, Cursor, Notion AI | Context from current doc + lightweight completion; not full chat
Voice AI | ElevenLabs, Vapi, OpenAI Realtime API | STT → intent → TTS; latency-critical; often hybrid with human transfer
Workflow Automation | n8n, Zapier AI, UiPath | Deterministic triggers + LLM for unstructured steps
Search + Gen | Perplexity-style, enterprise search | RAG with web or internal index; citation-first UX
Multi-Agent Teams | CrewAI, AutoGen patterns | Role-specialized agents; high cost; use when decomposition is proven necessary
Computer Use / UI Agents | Browser automation, RPA+LLM | Fragile; prefer API-first transactional agents when APIs exist
Anti-pattern: "Agentic RAG" for every query. ~70% of production cases work with single-pass RAG + good reranking. Add agent loops only when eval data proves compound queries are failing.

6 Industry Playbooks — Taxonomy in Practice

🏥 Healthcare

Interpretive: "What does my plan cover for physical therapy?" → RAG over benefits docs.

Transactional: "Schedule my follow-up" → Agent + EHR scheduling API.

High-Risk: "Is this chest pain serious?" → Guardrail → nurse triage line. Never diagnose.

Compliance: HIPAA, no PHI in logs, BAA with vendors.

🏦 Insurance / Financial Services

Interpretive: Advisor RAG over 500-page policy PDFs (maternity vs pregnancy keyword gap).

Transactional: FNOL claim intake, beneficiary updates → Agents + core policy admin APIs.

High-Risk: Investment advice, fraud disputes → human handoff + audit trail.

Analytical: Coverage gap summaries across product lines → multi-doc agentic RAG.

🛒 E-Commerce / Marketplaces

Interpretive: "Will this fit my 2019 Honda?" → RAG + structured fitment graph.

Transactional: Order status, returns, refunds → Agents bound to OMS APIs.

Navigational: "Where's my seller dashboard?" → route, don't generate.

Search AI: Query understanding + semantic retrieval (not generative answers for every search).

💼 HR / Talent (ResumeInterview Domain)

Interpretive: "How do I tailor my resume for this JD?" → RAG over role requirements + user profile.

Transactional: Apply to job, schedule mock interview, export PDF → product actions.

Analytical: Fit score explanation across skills gap → structured comparison agent.

Meta: "Make it more senior" → session memory + iterative refinement loop.

🏛️ Government / Public Sector

Interpretive: Benefits eligibility, permit requirements → RAG over official docs only.

Transactional: Form pre-fill, case status → Agents with strict auth.

High-Risk: Legal interpretation, immigration → mandatory human review.

Constraint: FedRAMP, data residency, no external model calls for classified data.

🏭 Manufacturing / Supply Chain

Interpretive: SOP lookup, safety procedures → RAG with version-controlled docs.

Analytical: "Which suppliers depend on component X?" → graph RAG.

Transactional: PO creation, inventory holds → ERP agents.

Voice: Hands-free floor worker queries via voice AI.

⚖️ Legal

Interpretive: Contract clause lookup → RAG with precise citations (page, section).

Analytical: Compare redlines across versions → multi-doc agent.

High-Risk: "Should I sign this?" → never autonomous; attorney handoff.

🎓 EdTech / Training

Interpretive: Concept explanation, study guides → RAG over curriculum.

Meta: "Quiz me harder" → adaptive difficulty from session state.

Transactional: Enroll, submit assignment → LMS integration.

Eval focus: Factual accuracy rubrics; hallucination = student harm.

7 Forward Deployed AI — The Delivery Model

Palantir pioneered the Forward Deployed Engineer (FDE) — now adopted by Deloitte, Scale AI, Databricks, and enterprise AI consultancies. In 2026, FDE roles explicitly require GenAI/agentic delivery, not just data integration.

FDE vs Platform Engineer vs Applied AI Engineer

Dimension | Platform / Product Engineer | Applied AI Engineer | Forward Deployed Engineer
Customer proximity | Indirect (PM proxy) | Sometimes | Embedded on-site / in war room
Problem shape | Generalizable features | LLM system components | One client's ambiguous problem → working solution
Success metric | DAU, feature adoption | Latency, accuracy, cost | Customer mission outcome in weeks
Skills emphasis | Scale, abstractions | RAG, agents, eval | Stakeholder mgmt + rapid prototyping + politics
"As an FDE, your responsibilities look similar to a startup CTO: own end-to-end execution of high-stakes projects — architecture, data wrangling, custom apps, executive conversations, and team strategy." — Palantir FDSE job description

The FDE Pod Model (Deloitte / Enterprise Pattern)

Career insight: FDE is the fastest path to learning Applied AI in context. You see why RAG fails on real PDFs, why agents need confirmation steps, and why executives care about cost per conversation — not benchmark scores.

8 Build vs Buy vs Platform

Layer | Build | Buy / Managed | Recommendation
Foundation model | Train from scratch | OpenAI, Anthropic, Bedrock, Vertex | Buy API (99% of Applied AI roles)
Vector search | Custom FAISS | Pinecone, OpenSearch, pgvector | Managed until scale proves otherwise
Agent orchestration | Custom state machine | LangGraph, Bedrock Agents | Framework first, custom only for edge cases
Evaluation | Custom pytest + rubrics | LangSmith, Braintrust, DeepEval | Hybrid: platform for traces, custom for domain rubrics
Full conversational platform | From scratch | Kore.ai, Cognigy, Ada, Sierra | Buy for standard support bots; build for differentiated IP

9 Evaluation & LLMOps — The Skill That Gets You Hired

Minimum Viable Eval Stack

  1. Golden dataset: 50–200 question/answer pairs from real user queries (anonymized)
  2. Offline metrics: Faithfulness, answer relevance, citation accuracy (RAGAS or custom rubric)
  3. CI gate: PR cannot merge if faithfulness drops >2% vs baseline
  4. Production traces: Log every request — intent class, retrieval docs, latency, cost, user feedback thumbs
  5. Weekly review: Sample 20 failed traces; add failures to golden set
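
The CI gate in step 3 can be sketched as a small comparison against a stored baseline. Two assumptions are made loudly here: "2%" is read as 2 absolute points on a 0 to 1 faithfulness scale, and the per-example scores are hypothetical placeholders for whatever the eval tool emits.

```python
from statistics import mean

MAX_DROP = 0.02  # assumption: "2%" means 2 absolute points on a 0-1 scale
EPSILON = 1e-9   # absorbs floating-point noise exactly at the boundary


def faithfulness_gate(baseline_scores: list, candidate_scores: list,
                      max_drop: float = MAX_DROP) -> tuple:
    """Return (passed, drop), where drop = baseline mean minus candidate mean."""
    drop = mean(baseline_scores) - mean(candidate_scores)
    return drop <= max_drop + EPSILON, drop


# Hypothetical per-example faithfulness scores on the golden set.
baseline = [0.92, 0.88, 0.90]
candidate = [0.85, 0.84, 0.86]

passed, drop = faithfulness_gate(baseline, candidate)
exit_code = 0 if passed else 1  # a CI script would pass this to sys.exit()
print(f"drop={drop:.3f} passed={passed}")
```

A relative threshold (2% of the baseline value) is an equally defensible reading; whichever is chosen should be written down next to the gate.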

Metrics That Matter to Business (Not Just ML)

Metric | Definition | Why Executives Care
Containment rate | % resolved without human | Support cost reduction
Groundedness / faithfulness | Answer supported by retrieved context | Legal/compliance risk
Time-to-resolution | Median conversation length to outcome | Customer satisfaction
Cost per conversation | Tokens + infra / session | Unit economics at scale
Escalation rate | % routed to human | Guardrail effectiveness
Hallucination rate | Factually wrong on golden set | Brand trust
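
Three of these metrics fall straight out of production traces. The sketch below assumes a hypothetical `Trace` record with `resolved`, `escalated`, and `cost_usd` fields; real trace schemas will differ, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    resolved: bool    # the conversation reached an outcome
    escalated: bool   # the conversation was routed to a human
    cost_usd: float   # tokens + infra cost attributed to the session


def summarize(traces: list) -> dict:
    """Compute containment rate, escalation rate, and cost per conversation."""
    if not traces:
        raise ValueError("no traces to summarize")
    n = len(traces)
    return {
        # Contained = resolved without any human involvement.
        "containment_rate": sum(t.resolved and not t.escalated for t in traces) / n,
        "escalation_rate": sum(t.escalated for t in traces) / n,
        "cost_per_conversation": sum(t.cost_usd for t in traces) / n,
    }


sample = [
    Trace(resolved=True, escalated=False, cost_usd=0.25),
    Trace(resolved=True, escalated=False, cost_usd=0.50),
    Trace(resolved=True, escalated=False, cost_usd=0.25),
    Trace(resolved=True, escalated=True, cost_usd=1.00),
]
print(summarize(sample))
```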

10 90-Day Applied AI Learning Roadmap

Weeks 1–2
Foundations: Python async, FastAPI, LLM APIs (structured outputs, tool calling). Build a CLI chatbot with conversation memory. Read model docs for Claude, GPT, Gemini — understand context windows and pricing.
Weeks 3–4
RAG deep dive: Ingest 50 PDFs, experiment with chunk sizes, hybrid search, Cohere rerank. Measure recall@5 on 30 test questions. Document what broke and why.
Weeks 5–6
Intent taxonomy: Log 100 sample queries, cluster into 4–6 classes, build classifier router. Wire each class to appropriate pipeline (RAG vs stub agent vs handoff mock).
Weeks 7–8
Agents: LangGraph state machine with 3 tools (search, calculator, API mock). Add max hops, retry logic, human confirmation for write operations.
Weeks 9–10
Eval + observability: RAGAS or DeepEval in CI. LangSmith/Langfuse tracing. Build dashboard: latency, cost, faithfulness over time.
Weeks 11–12
Capstone: Deploy end-to-end domain assistant (pick one vertical from Section 6). Write architecture doc, eval report, and 5-min demo video. Publish GitHub repo with README.
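
The recall@5 measurement from weeks 3–4 needs only a few lines. This sketch assumes each test question comes with a hand-labeled set of relevant chunk IDs; the IDs below are hypothetical.

```python
def recall_at_k(retrieved: list, relevant: set, k: int = 5) -> float:
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        raise ValueError("each test question needs at least one relevant ID")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(results: list, k: int = 5) -> float:
    """Average recall@k over (retrieved, relevant) pairs, one pair per question."""
    return sum(recall_at_k(r, rel, k) for r, rel in results) / len(results)


# Two hypothetical test questions with labeled relevant chunks.
results = [
    (["a", "b", "c", "d", "e", "f"], {"b", "f"}),  # "f" is ranked 6th, so it is missed
    (["x", "y", "z"], {"x"}),
]
print(mean_recall_at_k(results))
```

Tracking this number while changing chunk size or adding a reranker gives the "what broke and why" notes something concrete to point at.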

11 Portfolio Projects That Get Interviews

Project A: Policy Q&A with Intent Router

Upload benefits PDFs. Classify: interpretive vs transactional vs high-risk. RAG with citations for Q&A; mock API for "update beneficiary"; handoff UI for advice requests.

Signals: Taxonomy thinking, RAG debugging, guardrails awareness.

Project B: Job Search Copilot (ResumeInterview-aligned)

Parse resume + JD → fit analysis → tailored bullet suggestions → mock interview questions. Separate interpretive (explain gap) from transactional (save application).

Signals: Product sense, structured outputs, multi-step workflow.

Project C: Support Agent with Eval CI

50 FAQ golden set. Agent with order lookup tool. CI fails on faithfulness regression. Public trace viewer.

Signals: LLMOps maturity — the #1 differentiator in 2026 hiring.

Project D: GraphRAG for Dependencies

Ingest architecture docs + YAML configs. Answer "what breaks if X fails?" using Neo4j + vectors.

Signals: Advanced retrieval — differentiates senior candidates.

12 Seminar Exercises & Discussion Prompts

Live Exercise 1 — Classify These Queries (5 min)

Audience breaks into groups. Classify each query using the 6-class taxonomy and name the technical solution:

  1. "What's the deductible on my gold plan?"
  2. "I want to cancel my subscription effective today."
  3. "I'm really frustrated — your bot charged me twice and I might sue."
  4. "Compare PPO vs HMO for a family of four with one chronic condition."
  5. "Take me to the claims upload page."
  6. "Why did you recommend that plan? I said budget under $300."

Live Exercise 2 — Architecture Whiteboard (10 min)

Pick one industry (healthcare, e-commerce, HR). Draw: intent classifier → pipelines → eval layer. Identify ONE high-risk path that must never be fully automated.

Discussion Prompts

Recommended Follow-Up Resources

Appendix: Applied AI Patterns in ResumeInterview

Connecting seminar concepts to product work already in this codebase — useful for demonstrating real Applied AI delivery:

Product Feature | Intent Class | Pattern Used
Resume tailoring / curator | Interpretive + Meta | Structured LLM outputs, iterative refinement, user profile context
Mock interview | Interpretive + Analytical | Multi-turn conversation, follow-up generation, rubric scoring
Job fit scoring | Analytical | Structured comparison, semantic matching, explainable gaps
Simple Apply / FlashApply | Transactional | Agent-like workflow: form mutation via integrations (Greenhouse, extension)
Screening sessions | High-Risk (hiring decisions) | Human-in-loop, scored reports, not autonomous hire/reject
Study pack generation | Interpretive | RAG over role requirements + generated curriculum
Seminar close: Applied AI is not one model call. It is taxonomy → routing → the right pattern → evaluation → deployment. The engineers who win in 2026 ship systems, not demos.