~/work/copilot-whiteboard

case study / ai & collaboration

Copilot for Whiteboard: AI-powered collaboration at Microsoft scale

Bringing LLM-powered suggestions and automations to real-time collaboration across Web, iOS, Android, Surface Hub, and Teams Rooms.

2021–2023 · Engineering Lead · Millions of users, cross-platform

Context

Microsoft Whiteboard is a real-time collaboration canvas used across the Microsoft 365 ecosystem. As Engineering Lead for Mobile Experiences, I owned the Whiteboard experience across iOS, Android, Surface Hub, and Microsoft Teams Rooms (MTR): surfaces with radically different input modalities, screen sizes, and performance envelopes.

The opportunity: bring LLM-powered intelligence into the collaborative canvas itself, enabling AI to suggest content, organize ideas, and automate repetitive whiteboard tasks in real-time as teams collaborate.

Constraints

Cross-platform consistency: the AI features needed to work identically across Web, iOS, Android, Surface Hub (large-format touch), and MTR (room-scale displays with limited interaction). Real-time collaboration meant every AI suggestion had to be conflict-free with concurrent edits from multiple users. Security added its own challenges: anonymous users in Teams meetings authenticate differently from standard Microsoft 365 flows, so granting them board access required dedicated proof token generation.

Performance budgets varied dramatically: a Surface Hub has different compute capabilities than a mobile phone on cellular data. AI inference latency had to feel instantaneous in a real-time collaboration context.

Architecture

The integration layered LLM capabilities into the existing Whiteboard service architecture, with the AI features operating as a parallel intelligence layer that observes canvas state and offers contextual suggestions. The system was designed to degrade gracefully: AI features enhanced the experience but never blocked core collaboration functionality.
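To make the "degrade gracefully" idea concrete, a minimal sketch of the pattern follows. The names (`CanvasState`, `Suggestion`, `suggestionsOrNothing`) and the latency budget are illustrative assumptions, not the actual Whiteboard service API: the point is that the AI call races a timeout and collapses to "no suggestions" on any failure, so the canvas itself never blocks.

```typescript
// Hypothetical sketch: the AI layer observes canvas state and proposes
// suggestions, but a slow or failing model call must never block editing.
interface Suggestion {
  kind: "organize" | "summarize" | "generate";
  label: string;
}

type CanvasState = { objects: number };

// Race the model call against a latency budget; on timeout or error,
// degrade to "no suggestions" rather than surfacing an error in the canvas.
async function suggestionsOrNothing(
  fetchSuggestions: (state: CanvasState) => Promise<Suggestion[]>,
  state: CanvasState,
  budgetMs = 800,
): Promise<Suggestion[]> {
  const timeout = new Promise<Suggestion[]>((resolve) =>
    setTimeout(() => resolve([]), budgetMs),
  );
  try {
    return await Promise.race([fetchSuggestions(state), timeout]);
  } catch {
    return []; // the AI layer enhances collaboration but never breaks it
  }
}
```

Callers treat an empty suggestion list and a degraded AI layer identically, which is what keeps the intelligence layer strictly parallel to core collaboration.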

For Teams platform integration, we addressed the cross-platform security model by implementing proof token generation that handled anonymous meeting participants while maintaining the security guarantees required for enterprise data.
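As a rough illustration of what a proof token can look like, the sketch below signs a short-lived claim binding an anonymous participant to a board. Every detail here (field names, HMAC-SHA256 signing, the expiry scheme) is a hypothetical stand-in, not the actual Teams/Whiteboard protocol; it only shows the general shape of granting scoped access outside the standard Microsoft 365 auth flow.

```typescript
// Hypothetical proof token: a short-lived HMAC over participant, board,
// and expiry, verified by the service instead of a standard M365 token.
import { createHmac, timingSafeEqual } from "node:crypto";

interface ProofToken {
  participantId: string; // anonymous meeting participant
  boardId: string;
  expiresAt: number; // epoch millis
  sig: string;
}

function sign(secret: string, p: Omit<ProofToken, "sig">): ProofToken {
  const payload = `${p.participantId}.${p.boardId}.${p.expiresAt}`;
  const sig = createHmac("sha256", secret).update(payload).digest("hex");
  return { ...p, sig };
}

function verify(secret: string, t: ProofToken, now = Date.now()): boolean {
  if (now > t.expiresAt) return false; // expired tokens are rejected first
  const expected = sign(secret, t).sig; // recompute over the claimed fields
  return (
    expected.length === t.sig.length &&
    timingSafeEqual(Buffer.from(expected), Buffer.from(t.sig))
  );
}
```

The short expiry matters: an anonymous participant's access should end with the meeting, not persist like a normal account credential.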

I also contributed to the Microsoft AI Hub, creating cross-app AI features that enhance Office experiences using shared data and personalized workflows across the Microsoft 365 suite.

Hard Problems

Making AI suggestions feel natural in a spatial canvas environment, where the “right” suggestion depends on spatial context (where things are on the board), temporal context (what was just added), and social context (who is collaborating and what they’re working on). Traditional text-based AI assistants don’t have to reason about 2D space.
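One way to picture the three contexts is a ranking function over candidate clusters. The sketch below is entirely illustrative (the features, weights, and type names are assumptions, not the shipped ranker): tight spatial grouping, recent edits, and involvement of currently active collaborators all push a cluster toward being a good target for an "organize" suggestion.

```typescript
// Illustrative scoring of a candidate sticky-note cluster by the three
// contexts described above; all weights and features are hypothetical.
interface BoardObject {
  id: string;
  x: number;
  y: number;
  lastEditedMs: number; // epoch millis of last edit (temporal context)
  editorId: string; // who last touched it (social context)
}

function clusterScore(
  objects: BoardObject[],
  activeEditors: Set<string>,
  now: number,
): number {
  if (objects.length < 2) return 0; // nothing to organize
  const cx = objects.reduce((s, o) => s + o.x, 0) / objects.length;
  const cy = objects.reduce((s, o) => s + o.y, 0) / objects.length;
  // Spatial: mean distance from the centroid; tighter clusters score higher.
  const spread =
    objects.reduce((s, o) => s + Math.hypot(o.x - cx, o.y - cy), 0) /
    objects.length;
  const spatial = 1 / (1 + spread / 100);
  // Temporal: exponential decay over minutes since last edit.
  const temporal =
    objects.reduce((s, o) => s + Math.exp(-(now - o.lastEditedMs) / 60_000), 0) /
    objects.length;
  // Social: fraction of objects last edited by people currently on the board.
  const social =
    objects.filter((o) => activeEditors.has(o.editorId)).length / objects.length;
  return spatial * temporal * social;
}
```

A text-only assistant needs none of this; on a canvas, a suggestion about the wrong region of the board reads as noise no matter how good the content is.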

Handling the real-time multiplayer aspect: if the AI suggests organizing a cluster of sticky notes while another user is actively moving one, the system needs to handle that gracefully without creating jarring experiences.

Outcome

Copilot for Microsoft Whiteboard shipped across all platforms, leveraging LLMs to enable AI-powered suggestions and automations that improved real-time collaboration for millions of users. Along the way, we improved Whiteboard service performance for external collaboration scenarios across mobile and device platforms, and established streamlined on-call processes, monitoring systems, and incident response that improved system reliability.

What I’d Do Differently

The cross-platform parity goal was noble but expensive. In practice, Surface Hub and MTR users had fundamentally different collaboration patterns than mobile or web users. I would advocate earlier for platform-specific AI feature sets rather than insisting on identical experiences everywhere. A Surface Hub in a conference room is a shared, facilitated experience; a phone in someone’s hand during a commute is personal and asynchronous. The AI features should have reflected that distinction rather than abstracting it away.

The real-time conflict resolution between AI suggestions and concurrent human edits was over-engineered for the initial launch. We built a sophisticated system for handling edge cases that occurred infrequently in practice. A simpler approach, where AI suggestions yield to any human edit and re-evaluate afterward, would have shipped faster and covered 95% of actual usage patterns. The remaining 5% could have been addressed based on real user feedback rather than hypothetical scenarios.
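The simpler policy described above can be sketched in a few lines. The types and class name are hypothetical, not the shipped system; the essence is that any human edit touching an object a pending suggestion would change invalidates that suggestion, which is then re-evaluated from fresh canvas state instead of racing the user.

```typescript
// Sketch of "AI yields to any human edit": a pending suggestion is
// dropped the moment a human touches one of its target objects.
interface PendingSuggestion {
  affectedIds: Set<string>; // canvas objects the suggestion would change
  apply: () => void;
}

class SuggestionGate {
  private pending: PendingSuggestion | null = null;

  propose(s: PendingSuggestion) {
    this.pending = s;
  }

  // Called on every human edit: if it overlaps the suggestion's targets,
  // the suggestion yields rather than conflicting with the user.
  onHumanEdit(objectId: string) {
    if (this.pending?.affectedIds.has(objectId)) {
      this.pending = null; // re-evaluate later from fresh canvas state
    }
  }

  // Returns true only if the suggestion survived untouched and was applied.
  commit(): boolean {
    if (!this.pending) return false;
    this.pending.apply();
    this.pending = null;
    return true;
  }
}
```

Compared to a full conflict-resolution layer, this trades completeness for predictability: the user always wins, and the AI simply tries again.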

I also wish we had instrumented user reactions to AI suggestions more carefully from the start. Understanding not just whether suggestions were accepted or dismissed, but why (through lightweight in-context feedback), would have dramatically improved the suggestion quality in subsequent iterations.