
Introducing conversation practice: AI-powered simulations to build soft skills

Becoming a successful engineer requires more than just technical chops; it also requires mastering soft skills. However, engineers have limited tools to practice these skills effectively. For example, if you need to give difficult feedback to a coworker, you can find books, podcasts, or videos that offer frameworks for approaching the problem, but it's tough to master the skill until you've actually done it. To grow your career, you need to build these skills, and we've found a revolutionary new way to help you do that: AI-powered conversation practice.

In this blog post, we’ll explain how this feature works, show a few of its use cases, and dive deeper into some of the technical problems we had to solve to build it. 

How AI-powered conversation practice works

Using skills data extracted from hundreds of engineering job descriptions at top companies hiring tech talent, we've built learning paths designed to help you practice and master today's most in-demand soft skills. These new paths cover techniques for mastering key leadership and communication skills, and they include an AI agent that lets engineers put those skills into practice in simulated scenarios. After each practice session, our AI tutor Cosmo provides actionable feedback on how to improve.

A video is worth a thousand words, so here's our CEO, Tigran Sloyan, using conversation practice to prepare for upcoming conversations with reporters about this feature.

Tigran Sloyan preparing to announce CodeSignal’s conversation practice feature.

Now that you’ve seen how our CEO leverages conversation practice, let’s explore how it can empower engineers at various career stages.

An engineering manager practicing delivering tough feedback to a direct report ahead of their one-on-one.
An engineer practicing active listening skills to become a more supportive teammate.
A candidate practicing behavioral interview questions ahead of a recruiter phone screen.

Behind the scenes: Building conversation practice

In designing the conversation practice AI agent, we faced the challenge of replicating the intricacies of human communication. Natural conversations involve countless subtle decisions made in milliseconds, creating a complex interplay of timing, context, and social cues. Consider a scenario where you’re talking to a recruiter on a phone screen. You’ve just answered a question about your greatest professional achievement, and the interviewer responds with a brief pause followed by “I see.” Should you react by elaborating further on your answer, wait for the next question, or ask if they need any clarification?

The answer, of course, depends on the context. Any of these approaches could make sense depending on the recruiter’s tone, body language, and your prior conversation. Similarly, the AI agent needs to adapt its conversational approach to match the user’s cues. To achieve this, it needed to listen to the user and process the input in real time, chime in with a helpful response at the right moment, and cleverly handle any potential interruption. In the rest of this blog post, we’ll explain how we built the AI agent to meet these requirements and create a smooth and seamless experience. 

Minimizing latency for real-time dialogue

Minimizing latency is critical for a fluid conversation, but it's a complex challenge, given bottlenecks at each layer of the experience. Any time a user interacts with the voice agent, the audio from their microphone is transmitted from the browser (client) to our backend, where it gets converted to text via a speech-to-text model. However, each input device captures audio at a different sampling rate, resulting in inconsistent audio quality. Our speech-to-text models require a specific sampling rate to guarantee the most accurate and efficient transcription. Therefore, we used adaptive resampling techniques to standardize incoming audio, reducing variability and ensuring that audio data is processed swiftly.
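As a simplified illustration of that resampling step, here's a linear-interpolation resampler in Python. This is only a sketch, not CodeSignal's implementation; production pipelines typically use filtered polyphase resampling to avoid aliasing, and the function name is our own.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono PCM signal (a list of floats) from src_rate to
    dst_rate using linear interpolation between neighboring samples.

    Illustrative only: real systems use anti-aliased polyphase filters.
    """
    if src_rate == dst_rate:
        return list(samples)
    ratio = src_rate / dst_rate  # input samples consumed per output sample
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * ratio
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        # Blend the two nearest input samples.
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out
```

For example, upsampling a 4 kHz signal to 8 kHz doubles the number of samples, interpolating a new value halfway between each original pair.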

But this is just one half of the equation. Once we have the user input text, we feed it into a customized LLM to generate a response, which is converted to audio via a text-to-speech model and sent back to the client for playback. Depending on the audio file size and the quality of the internet connection, users could be left waiting longer than expected for a response. To solve this problem, we do a few things. First, we use the WebSocket protocol to transfer the audio data back and forth in real time. Second, we break the audio response into chunks, allowing the client to start playback without waiting for the full response. The combination minimizes perceived latency, making the whole experience feel natural and real-time.
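The chunking half of that approach can be sketched as a generator that splits a synthesized audio payload into pieces small enough to stream over a WebSocket as they become available. The 4 KB chunk size and function name are illustrative assumptions, not CodeSignal's actual values.

```python
def chunk_audio(payload: bytes, chunk_size: int = 4096):
    """Yield fixed-size chunks of an audio payload so the client can begin
    playback before the full response has been transferred.

    chunk_size is a hypothetical value; real systems tune it against
    frame overhead and playback buffer requirements.
    """
    for offset in range(0, len(payload), chunk_size):
        yield payload[offset:offset + chunk_size]
```

In a real server, each yielded chunk would be sent as a binary WebSocket message as soon as the text-to-speech model produces it, so playback can begin after the first chunk arrives rather than after the last.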

Mastering turn-taking

For our AI agent, perfecting turn-taking, the balance of knowing when to speak and when to listen, was crucial to creating a seamless interaction. This challenge is especially tricky because the AI agent needs to find just the right "Goldilocks" moment to speak. Too soon, and the user might get cut off. Too late, and they might perceive the agent as laggy and unnatural.

To address this challenge, we needed to understand the content of the user’s speech to determine when they’ve expressed a complete thought. Our AI agent is constantly analyzing what has been said, looking for pauses after a complete thought to take its turn. For example, if the user says “My name is…” and trails off mid-sentence, the AI will wait for the user to finish. But if the user pauses after saying, “My name is John,” then the AI agent concludes that it can speak because they’ve shared a complete thought. 
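A crude version of that end-of-turn decision can be sketched as a pause threshold combined with a completeness heuristic. Our actual system analyzes the content of the speech, as described above; the punctuation-based rules and 700 ms threshold below are purely illustrative stand-ins.

```python
import re

def should_take_turn(transcript: str, silence_ms: int,
                     pause_threshold_ms: int = 700) -> bool:
    """Decide whether the agent may speak: the user must have paused long
    enough AND their last utterance must look like a complete thought.

    Heuristic sketch only; the threshold and rules are hypothetical.
    """
    if silence_ms < pause_threshold_ms:
        return False  # user may just be catching their breath
    text = transcript.strip()
    if not text:
        return False
    # A trailing ellipsis or conjunction suggests the user is mid-thought.
    if text.endswith("...") or re.search(r"\b(and|but|so|because)\s*$",
                                         text, re.IGNORECASE):
        return False
    return True
```

Under these toy rules, "My name is John." followed by a long pause yields the floor to the agent, while "My name is..." keeps it with the user.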

Handling interruptions with flexibility

Interruptions are a natural part of human conversations—whether it’s to ask a quick question, clarify a point, or react to something unexpected. In designing our AI agent, we had to determine how the agent should behave when it was interrupted by the user. Should it keep speaking, or pause and listen?

If this were a situation with two humans, the expectation would depend on the relationship between the speakers and the situational context of the discussion. In our case, we wanted the AI agent to come across as a compassionate and polite human so users felt safe when practicing. Therefore, we decided that if the AI agent is interrupted, it will stop its turn, listen for new input, and use the latest information to craft its future response. This behavior both maintains the AI agent’s persona and ensures that the conversation remains fluid. 
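The interruption policy described above can be sketched as a small state machine: when the user speaks while the agent is mid-response, the agent yields the floor immediately and buffers the interrupting input for its next reply. The class and method names here are hypothetical, not CodeSignal's API.

```python
from enum import Enum, auto

class AgentState(Enum):
    LISTENING = auto()
    SPEAKING = auto()

class ConversationAgent:
    """Minimal sketch of a polite interruption policy."""

    def __init__(self):
        self.state = AgentState.LISTENING
        self.pending_input = []

    def start_speaking(self):
        self.state = AgentState.SPEAKING

    def on_user_audio(self, text: str):
        # An interruption: stop the agent's turn and start listening.
        if self.state is AgentState.SPEAKING:
            self.state = AgentState.LISTENING
        self.pending_input.append(text)

    def next_response_context(self) -> str:
        # The buffered interruption feeds the agent's next response.
        context = " ".join(self.pending_input)
        self.pending_input.clear()
        return context
```

The key design choice is that interrupting input is never discarded: it is folded into the context for the agent's next turn, which is what keeps the conversation fluid.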

Takeaways

Being an effective engineer requires deep technical chops and mastery of soft skills like leadership and communication. We believe the best way to build these skills is by practicing them in realistic simulations that mirror their real-world application. 

Leveraging generative AI, we’ve developed an AI agent that enables immersive, interactive practice through simulated conversations, handling nuances like interruptions and turn-taking.

We feel confident that these simulations will help engineers get the practice they need to master critical soft skills. If you’re interested in trying out conversation practice, we encourage you to check out a soft skills learning path in CodeSignal Learn today.