3. Five lessons from Lilli’s development

The team—and McKinsey as a whole—learned a lot over the course of Lilli’s development and continues to do so as it expands Lilli’s capabilities. Below are five of the many lessons.

1. Prompts matter—a lot

Prompt engineering is a new skill, and even with training, software engineers alone can’t do it effectively; domain experts must be involved in the process. That process is more art than science: development teams need to iterate continually to incorporate feedback and emerging best practices, and to define metrics that measure the impact of version changes and experiments.
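
One way to make those metrics concrete is to keep a small evaluation set and score every prompt version against it before shipping. The Python sketch below is a hypothetical illustration, not Lilli’s actual tooling: `ask_model` stands in for a real model client, and the keyword-overlap score stands in for whatever quality metric a team actually uses.

```python
from typing import Callable

# Hypothetical harness for comparing prompt versions against a fixed
# evaluation set. `ask_model` stands in for a real model client; the
# keyword-overlap score stands in for a real quality metric.

EVAL_SET = [
    {"question": "Summarize our pricing study for retail clients.",
     "expected_keywords": ["pricing", "retail"]},
    {"question": "Which experts have published on supply chains?",
     "expected_keywords": ["expert", "supply chain"]},
]

def score_prompt_version(prompt_template: str,
                         ask_model: Callable[[str], str]) -> float:
    """Return the fraction of expected keywords found in the model's answers."""
    hits = total = 0
    for case in EVAL_SET:
        answer = ask_model(prompt_template.format(question=case["question"]))
        for keyword in case["expected_keywords"]:
            total += 1
            hits += keyword.lower() in answer.lower()
    return hits / total

# Compare two versions before promoting one; a stub model keeps this runnable.
stub_model = lambda p: "Pricing study for retail; our supply chain expert notes..."
v1 = "Answer briefly: {question}"
v2 = "You are a research assistant. Cite sources. {question}"
print(score_prompt_version(v1, stub_model), score_prompt_version(v2, stub_model))
```

Even a crude harness like this turns “the new prompt feels better” into a number that can be tracked across versions and experiments.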

Users, too, must learn the art of prompting. “Prompt anxiety,” or not knowing what to ask Lilli, was initially a major barrier to adoption. Just one hour of prompt training boosted our colleagues’ usage substantially.

2. Be vigilant about data curation

Data privacy and intellectual property issues are rightly top of mind for organizations and must be addressed thoroughly. Lilli’s data strategy team, which includes a product manager, a data life cycle director, and legal and risk professionals, among others, plays a central role in ensuring Lilli’s compliance and security.

3. Invest in an orchestration layer

As the Lilli team experimented with off-the-shelf LLMs, it found that no single one delivered the level of specialization needed to accommodate McKinsey-specific content. For example, the word “impact” means something entirely different to a consultant than it does to, say, a worker at an auto manufacturer. And employing a large model for some simpler tasks wasn’t cost-efficient.

To address these issues, our engineers developed a patented orchestration layer that routes requests to different LLMs or other types of AI to better recognize user intent, optimize cost, and deliver high-quality responses. The layer provides the added benefit of enabling experimentation with different LLMs that easily “plug” into the system. Many organizations already face similar issues and would benefit from investing in this critical element of their gen AI systems.
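
Lilli’s orchestration layer is patented and its details aren’t public, so the Python sketch below only illustrates the general pattern the paragraph describes. Every name here (`ModelBackend`, `register`, `route`) and the routing heuristic are assumptions for illustration, not Lilli’s actual design.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of an orchestration layer: model backends "plug" into
# a registry, and a router picks one per request. All names and heuristics
# here are illustrative assumptions, not Lilli's actual (patented) design.

@dataclass
class ModelBackend:
    name: str
    cost_per_1k_tokens: float           # lets the router prefer cheaper models
    call: Callable[[str], str]          # the model's completion function

REGISTRY: dict[str, ModelBackend] = {}

def register(backend: ModelBackend) -> None:
    """Plug a new model into the system without touching routing logic."""
    REGISTRY[backend.name] = backend

def classify_intent(prompt: str) -> str:
    """Toy stand-in for intent recognition; a real system might use a small LLM."""
    return "simple" if len(prompt) < 200 else "complex"

def route(prompt: str) -> str:
    """Send simple requests to the cheapest backend and complex ones to the best."""
    by_cost = sorted(REGISTRY.values(), key=lambda b: b.cost_per_1k_tokens)
    chosen = by_cost[0] if classify_intent(prompt) == "simple" else by_cost[-1]
    return chosen.call(prompt)

# Stub backends keep the example runnable; real ones would wrap API clients.
register(ModelBackend("small-llm", 0.0005, lambda p: f"[small-llm] {p[:30]}..."))
register(ModelBackend("large-llm", 0.0300, lambda p: f"[large-llm] {p[:30]}..."))

print(route("What does 'impact' mean in a consulting context?"))
```

The design point is that routing logic and model backends stay decoupled: swapping in a new LLM is a one-line `register` call, which is what makes experimentation cheap.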

4. Test and test again—and again

Our experience building Lilli taught us to prioritize testing over development. Given the nascency of LLMs, we built active learning loops into our development process to enable swift adjustments. There were certainly bumps along the way: at one point early in the rollout, for example, we changed our chunking strategy (breaking data sets into smaller pieces to improve processing), and the model began to hallucinate. We quickly paused deployment to course-correct.
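
The article doesn’t specify what the chunking change was, but the mechanics are easy to illustrate. Below is a minimal, hypothetical Python sketch of sliding-window chunking with overlap; the sizes are arbitrary assumptions, included only to show why such parameters are worth regression-testing. Shrinking chunks or dropping overlap can strip away context and contribute to hallucinated answers.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping windows for retrieval.

    Hypothetical sketch with arbitrary sizes. Overlap keeps sentences that
    straddle a boundary present in both neighboring chunks, so retrieved
    passages carry enough surrounding context.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# Changing chunk_size or overlap silently changes what retrieval returns,
# which is why such a change deserves regression tests before rollout.
print(len(chunk_text("lorem ipsum " * 200)))
```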

5. It’s never just tech

As documented time and again, using gen AI–based applications demands an entirely new way of working. We prioritized user-adoption programming from Lilli’s inception and have embedded it everywhere possible, beginning with a platform design built around an intuitive interface and self-learning.