Models & Evaluation

AIH06 Building Fast and Useful Multimodal AI Solutions: A Visual Document Helper from Idea to Product

11/19/2026

1:00pm - 2:15pm

Level: Introductory to Intermediate

Veronika Kolesnikova

Microsoft MVP (AI)

Principal AI Engineer

Multimodal vision‑language models make it easy to build an AI-based application that understands your documents, but making it work quickly and reliably in production is much harder. This session uses a visual document helper on Azure, which answers questions about invoices and forms, as a running example to explore practical architecture decisions, from storage and pre‑processing to the AI services and models. You will learn which design choices most affect latency, cost, and reliability, how to apply safety and evaluation in a way that fits into existing cloud apps, and what pitfalls to avoid when multimodal features leave the prototype stage.

You will learn:

  • How to design a practical architecture for vision‑language applications
  • How to evaluate design trade‑offs for performance and reliability
  • How to apply safety and evaluation practices effectively