
Figure 1: Image generated using the prompt “realistic image of businessman chatting with artificial intelligence” by DALL-E 3.0, Microsoft Designer, 2024 (https://designer.microsoft.com).

tl;dr: Incorporating the next generation of Artificial Intelligence into our B2B Sales exercise offers a useful lens into how these models work and what trade-offs to consider.

A few years ago we published a sales training exercise that highlighted how better questions can help uncover hidden needs, and potentially drive better outcomes. In lieu of having a live customer on call 24/7, we trained IBM Watson Assistant (“Watson”), which uses an intent-detection model, to categorize student submissions, recognize each question by type, and then use these cues to deliver the best possible response. In spite of being one of the first commercial AI platforms, Watson was pretty good at standing in for a real customer, and for the last five years it’s been one of our most popular simulations.

If it ain’t broke, why fix it?

However, our Watson-based approach made a few trade-offs in search of speed and streamlined training. Watson offered a tool to manually set a series of rules and conditions, one that we used to craft a decision tree of sorts that determined the response sent to students. This meant that responses were limited to what we programmed, and we knew this approach would struggle to respond to unorthodox or unexpected messages – the kind that hundreds of curious, creative, and occasionally mischievous students can come up with. In an effort to better manage the wide range of possible student inputs, we began to audition the latest Large Language Models (LLMs) for the role of Customer, and they proved more than capable.
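To make the rule-based approach concrete, here is a rough sketch of the idea (not Watson’s actual tooling or our real rules – the intents, keywords, and responses below are hypothetical): student messages are matched against keyword-defined intents, and each intent maps to a pre-written response.

```python
# Illustrative sketch of a decision-tree-style responder: keyword-based
# intent detection mapped to canned responses. All intents, keywords,
# and replies here are made-up examples, not our production rules.

INTENT_KEYWORDS = {
    "open_question": ["how", "why", "what challenges"],
    "closed_question": ["do you", "is it", "are you"],
    "pitch": ["our product", "we offer"],
}

CANNED_RESPONSES = {
    "open_question": "Well, our biggest headache lately has been paint defects.",
    "closed_question": "Yes.",
    "pitch": "We're happy with our current supplier, thanks.",
    "unknown": "I'm not sure I follow -- could you rephrase that?",
}

def classify(message: str) -> str:
    """Return the first intent whose keywords appear in the message."""
    text = message.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return intent
    return "unknown"

def respond(message: str) -> str:
    """Look up the canned reply for the detected intent."""
    return CANNED_RESPONSES[classify(message)]
```

The limitation the paragraph describes falls straight out of this structure: any message that matches no rule lands in the catch-all "unknown" bucket, no matter how reasonable the question was.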


Figure 2: “Paints Automotive demo discussion” SimCase Co., 2024. Author’s screenshot.

Sounds great! What’s the catch?

This success meant that we had to come to terms with the central trade-off between Watson and an LLM like ChatGPT, Claude, or Gemini: in a bargain familiar to many managers, we would need to swap control for creativity. By design, greater conversational range was only possible if we relinquished control of the response. Students would no longer see a reply picked from a pool of pre-written options; instead, they would see one built word by word, each chosen as the model’s next best prediction. In exchange, the model would be far better equipped to respond to whatever students submit.
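The “next best word” mechanic can be illustrated with a toy example (the probability tables here are invented for the sketch – a real LLM scores its entire vocabulary with a neural network, not a lookup table):

```python
# Toy illustration of greedy next-word selection. The probabilities are
# made up for this example; real models compute them from learned weights.

NEXT_WORD_PROBS = {
    ("our",): {"biggest": 0.6, "supplier": 0.3, "team": 0.1},
    ("our", "biggest"): {"challenge": 0.7, "order": 0.2, "win": 0.1},
    ("our", "biggest", "challenge"): {"is": 0.9, "was": 0.1},
}

def greedy_continue(prompt: tuple[str, ...], max_words: int = 5) -> list[str]:
    """Repeatedly append the highest-probability next word until no entry exists."""
    words = list(prompt)
    for _ in range(max_words):
        probs = NEXT_WORD_PROBS.get(tuple(words))
        if not probs:
            break
        words.append(max(probs, key=probs.get))
    return words
```

Even this toy version shows why no one “dictates” the reply: the output is assembled one word at a time from whatever the model currently considers most likely.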

Understanding Risk and Mitigating It

This trade-off offered the greatest reward in terms of perceived realism and comprehension, but it also represented a new set of risks. For us, selecting the next best word seemed similar to relying on the wisdom of crowds – something we had worked with on another project. Plus, Watson was already successfully using this process to parse inputs, so we decided to move forward.

Yet selecting the next best word also presented a gameplay risk. We could no longer explicitly guide the player towards the optimal response based on the type of questions they submitted. To address this risk, we focused on prompt engineering. Our Watson experience had taught us the value of trial-and-error training, and we applied that persistence to guide the responses. In the end, while the actual text varies between runs, we are able to create the same gameplay dynamics that the experience hinges on.
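The prompt-engineering approach can be sketched roughly as follows – the persona and rules below are hypothetical stand-ins, not our production prompt, but they show how gameplay constraints get encoded as instructions the model must follow on every run:

```python
# Hypothetical sketch of encoding gameplay rules in a system prompt.
# The persona and rules are illustrative examples only.

def build_system_prompt(persona: str, rules: list[str]) -> str:
    """Assemble a role-play system prompt from a persona and numbered rules."""
    numbered = "\n".join(f"{i}. {r}" for i, r in enumerate(rules, 1))
    return (
        f"You are role-playing as {persona}.\n"
        "Stay in character at all times.\n"
        "Follow these rules:\n"
        f"{numbered}"
    )

prompt = build_system_prompt(
    persona="a procurement manager at an automotive paint buyer",
    rules=[
        "Reveal hidden needs only when asked open-ended questions.",
        "Answer closed questions briefly, without volunteering information.",
        "Never break character or mention that you are an AI.",
    ],
)
```

Because the rules constrain behavior rather than dictate exact wording, the surface text can vary between runs while the underlying dynamics – open questions unlocking more information than closed ones – stay consistent.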

The last risk was the most difficult to mitigate. A few years ago, SimCase built a game to highlight implicit bias (link). Understandably, we are conscious that by using an LLM we might be introducing any bias present in the reams of text the LLM was trained on. That is why we concentrated our testing on LLMs with stated bias-mitigation policies and processes in place to reduce the potential downstream impact. We are also asking learners to help ensure the experience is free of perceived bias: we have included a reporting feature in this exercise so that we can review any responses that appear biased. While we haven’t identified any bias issues in our testing so far, we do think the risk merits having a process in place to address it.

Can’t Stop, Won’t Stop Learning

The experience of incorporating LLMs into our B2B Sales exercise has taught us a lot about how they work and what tradeoffs we need to consider. More than most software, this exercise will continue to require regular monitoring and maintenance. However, we hope that this evolution makes for a more enjoyable student experience, and one that serves to better reinforce the core learning objectives.