2. Building Lilli at the speed of change
ChatGPT and similar tools were gaining rapid attention in early 2023. McKinsey wanted to give colleagues a fully secure tool for tapping into the power of large language models (LLMs) as soon as possible. Both time and security were of the essence, so McKinsey moved quickly to mobilize the Lilli effort.
Creating the proof of concept
March 2023, Duration: 1 week
Establishing the roadmap and operating model
April 2023, Duration: 2 weeks
Making key development decisions
May 2023, Duration: 2 weeks
Building, testing, and iterating
May 2023, Duration: Build, 5 weeks; Testing, 3 weeks
Rolling out Lilli
July 2023, Duration: 3 months
Beta (500 users)
MVP (5,000 users)
Initial firm launch (45,000 users)
Lilli’s development decisions were based on five criteria
Cost
Scalability
Timing
Security
Performance
McKinsey followed a three-step approach to establishing Lilli’s use case roadmap
Alpha testing and user feedback enabled rapid improvements
[Figure: improvement in answer quality during alpha testing, Week 1 to Week 2 and Week 2 to Week 3; values shown: 1x, +32%, +52%]
Prioritized list of business-led use cases and initial users
Development decisions for the gen AI stack were guided by a five-point framework that evaluated cost, scalability, performance, security, and timing. When it came to determining whether to adopt a taker, shaper, or maker strategy for Lilli’s underlying LLMs, the best answer turned out to be shaper with a dash of maker. Lilli uses a prebuilt model hosted by a hyperscaler, and the team trained five of its own smaller expert models to understand user intent and improve the relevance of Lilli’s answers.
It all began with a small team that built a proof of concept for Lilli in just one week. That lean build and a five-minute presentation provided leadership with more than enough evidence to support investment in developing a full-blown gen AI platform.
The team developed the Lilli MVP in roughly five weeks and provided access to a set of 200 alpha users, enabling nearly a month of testing and iteration before gradually offering Lilli to the entire firm.
“We take a user-backed ‘learn, build, measure’ approach based on feedback and analytics,” says Kitti. “This is where a lot of companies can go wrong in scaling technology. You need to ensure the product is always anchored in user feedback and solving real problems.”
A more robust central team coalesced to plan the approach before work began on a minimum viable product (MVP). After aligning on a North Star vision, the team established, collected, and prioritized use cases. Through workshops and interviews, they developed use case profiles that helped assess the value, impact, feasibility, and technology requirements for each, which enabled leadership to quickly align on those to tackle first. Cross-functional agile squads, with a 1:1 ratio of nontechnical to technical professionals, were then stood up for core areas such as technical development, safety and security, user analytics, and adoption. Kitti and other leaders guided the entire Lilli platform experience and orchestrated the teams to deliver it.
McKinsey opened Lilli to colleagues gradually over the course of three months, with the platform becoming available to all employees by October 2023. The Lilli development team continued to add new features and improve the performance of existing ones, guided by an unwavering focus on quantitative and qualitative user feedback. New capabilities are always tested rigorously with alpha and beta users to assess performance against carefully selected metrics, such as user interaction depth and output quality, before release to the entire firm.
Ideation
Stakeholder workshops, user interviews, best practice research
1
100% of McKinsey employees have access to Lilli
Impact and feasibility assessments
Value and impact sizing, additional stakeholder and user interviews, discussions with technology leaders
Use case profile and initial tech requirements
2
Prioritized roadmap of use cases