Live collaborative editing lets multiple participants work on the same remote interview session using CodeSignal’s shared IDE in the browser. Each interview participant can see the same source code as everyone else and make changes to it—while also being able to see each others’ cursors and the updates other participants make. If we didn’t have live collaborative editing, only one participant would be able to make changes in the IDE at a time, which would create an awkward interview experience.
To implement live collaborative editing, you need a way of handling conflicts. For instance, one person might be typing in the middle of a line while another person is typing at the beginning of the same line, resulting in the whole line shifting over and creating position issues for the person typing in the middle. CodeSignal’s infrastructure has to resolve conflicts like these and guarantee that every participant ends up with the same state. In this article, we’ll explain how we achieve this using a concept called operational transformation (OT).
There are two types of collaborative editing: live collaborative editing and asynchronous collaborative editing. Asynchronous collaborative editing is what happens in a version control system like Git. One person makes a change, and then a while later, someone else does. With live collaboration, multiple people are making edits at the same time—assuming some connection latency of milliseconds or seconds.
The challenge of live collaborative editing
It’s this latency that makes live collaborative editing challenging. In theory, if you had a network connection where the delay was guaranteed to be absolutely zero, it would be pretty simple—just like working on the same machine. Changes would appear on screen instantly for everyone, in the exact order they occurred. But in reality, there’s always some delay. Collaborator 1 makes a change, and their change has to be sent to Collaborator 2 so they can see it on their screen. If Collaborator 2 isn’t typing, there’s no problem. But what if Collaborator 2 is making edits right around the same time? They could make a change on their version before seeing Collaborator 1’s update come through.
That’s how conflicts occur. Collaborator 1 and Collaborator 2 started out with the same version, but by making their own edits before seeing the other person’s changes, they got out of sync. By the time C1’s change reaches C2, it might not even be possible to apply that change on C2’s new, different version. For instance, C2 could have removed the whole section where C1’s change was supposed to happen.
CodeSignal’s use case
In a remote interview, we’ve found that usually only one person makes changes in the IDE at a time: the candidate will be typing while the interviewer observes, then the interviewer might write some code as the candidate watches. Live collaborative editing is actually a bit of an edge case for us—but it’s an important and challenging one.
If we didn’t care about lag, the solution for live collaborative editing would be pretty simple. We could just wait for every keypress or change to reach all collaborators. Once they confirmed the change, you would see it on your screen. This is how some remote terminals work—as you’re typing, you don’t see what you type right away because you’re waiting for it to get confirmed on the other end—and it’s not a great experience. When a candidate is working in our IDE, the stakes are high. We want to help them perform at their best, which means not introducing any lag that could throw off their coding flow.
Adding to the challenge, we need to support a variety of functions in our IDE. For instance, we let you do things like multiple block selection or find/replace, where you can change a lot of text at once across a file. To make sure that our collaborative live editing algorithm can handle all of these changes, we need a clean way of representing them in code.
How we represent changes
The most basic thing you can do in an IDE is insert, delete, and replace (update) single characters. It’s pretty easy to think of how we could represent these operations. In the IDE, each character’s position can be described as (row, column). Thus, we could write insert, delete, and update operations that take a position, along with the new character that should go there (in the case of insert and update).
But in an IDE, you’re not always typing single characters. You can select an entire section of code and delete it or replace it with something else. And shortcuts like block editing will let you change multiple places in the code all at once.
What if we want to represent all these types of edits using just one function? (This will be important later on.) A simple approach is to transform every change into a bunch of single character changes. But this could end up generating hundreds of different events just to delete a large chunk of code. The approach that we actually use is to represent every possible change as a replacement of one or more ranges of characters with something else.
If we know the before state and the after state, we don’t have to care about what the operations were—we can simply derive the range replacements. (Here’s a library that lets you do it.) CodeSignal’s IDE, which is based on Monaco, provides the range replacements in the form of a code change operation, which says that the range from position A to position B was replaced with a certain string of characters. If a shortcut like find/replace was used, the code change could contain multiple ranges.
Now that we’ve defined the full scope of the problem, we’re ready to talk about the solution. Live collaborative editing is a well-researched area, and the two most popular approaches are operational transformation (OT) and conflict-free replicated data types (CRDTs). Operational transformation, the more classic method, works by coordinating operations using a centralized server. CRDTs work at the level of the state of the data structure itself, typically on direct P2P connections. We use operational transformation for live collaborative editing in CodeSignal’s IDE.
Operational transformation works by implementing—you guessed it!—transformation functions on operations. These functions take two events, A and B, that happened right around the same time and with the same original state. So B didn’t have knowledge of A’s edit when they made their change, and vice versa. The goal of OT is to consistently transform the events so that their original intent is preserved.
Let’s say the candidate and interviewer both start typing at the same time. In event A, the interviewer inserts a new line at the top of the file. In event B, the candidate types a character on line 10. How can we manage to preserve both of these changes?
If we assume that event A happened first, we could transform event B onto the “branch” that A created. In the transformed event B, we would insert the candidate’s character on line 11, knowing that everything in the file got pushed down by one line because of A’s change. This is called a forward transformation, because event B is trying to catch up with A.
These forward transformations can be defined for essentially any type of operation. You can now see why it was important for us to use just one function to represent all the changes that can happen in the IDE. Otherwise, we would have combinatorially many transformations to implement!
There’s also the concept of a backward transformation, where we would want to see what an event performed on a new version looks like on an older, previous version. We actually don’t have to use backward transformations at all in our implementation, as long as we have a centralized server that determines which operations happened first.
Implementing operational transforms
Our OT implementation requires two things: (1) a way to consistently determine which event came first, and (2) a transformation function.
Which event happened first?
The OT method uses a central server to function as referee between collaborators and determine the order of events.
Let’s say the server receives operation A, and then milliseconds later, receives operation B. If the same collaborator is responsible for both A and B, no problem—clearly these were sequential operations with A coming first. (Caveat: This is only true if your communication protocol preserves the order of events; it might not be the case in non-standard scenarios such as using connections based on UDP.)
The order issues happen when the server receives A from Collaborator 1, and B from Collaborator 2. Which event actually happened first on a universal, objective timeline? This is pretty much impossible to determine, but thankfully it doesn’t matter. For operational transformations, all we care about is that the server determines the order of events as quickly as possible and applies a consistent logic. The simple approach we use is to assume that whichever event the server received first is the event that happened first.
The transformation logic for the IDE isn’t as difficult as you might think. Recall that code change operations A and B each represent one or more ranges of characters that we want to replace with some new characters. There are really just two cases to consider:
1. A and B don’t intersect at all—they’re modifying completely different texts.
In this case, either A’s range is before B’s range, or vice versa. Remember, we’re doing forward transformations, so what we care about is how A needs to be transformed to work in B’s world. If B’s range comes after A’s, it can’t have any effect on where operation A would need to occur in the file, so the transformed version of operation A is just the same as the original. But if B’s range occurs before A’s range, we need to push operation A down/across, in terms of rows and columns.
2. A and B have some intersection in their ranges—in other words, their modifications touch the same text.
This is more complicated—it doesn’t just push the coordinates of the operation, but actually affects the operation’s outcome. What should happen here? There’s really no right answer, as long as the logic is consistent. For example, you could decide to retain both A and B’s versions and put them next to each other. But in practice, it’s best to just consider different cases based on their most likely intent and avoid adding extra complexity. It’s not very often that collaborators will be changing the exact same characters, and as it turns out, people don’t care very much—they’ll just erase or try again.
Putting it all together
At a high level, the implementation for live collaborative editing in our IDE looks like this:
- When a collaborator makes an edit, they send their code change operation and version number to the server.
- The server receives the operation and applies forward transformations as needed to arrive at a new state that incorporates the new operation (as much as possible). To do this, the server keeps a history of operations. It doesn’t have to record all the changes that ever happened—it just has to keep the operations from the oldest version still in use by any collaborators.
- The server updates its state and version number, and broadcasts the operation to all of the collaborators.
- Everyone who receives the operation does some operational transforms on their end and applies these to their own state. Apart from the fact that it would be too costly to transfer the whole state after every single change, this is necessary because the server doesn’t know about operations that collaborators have performed locally but which haven’t yet reached the server.
You might be wondering: What about the non-text operations in the IDE? For example, we let either candidates or interviewers manage frontend libraries, add files, and run the code. These operations need to be collaborative, too—but the outcome tends to be binary, with one change overriding the other (the code either runs, or it doesn’t). Code changes create the most complexity, because you don’t want to just override the whole file, but rather merge the changes as much as possible.
We’ve found that supporting live collaborative editing is less like a typical software engineering problem, and more like a logic puzzle. Getting every transformation correct but then applying them in the wrong order won’t work, and you have to continually ask yourself if the implementation is consistent.
If you have a pretty straightforward use case for live collaborative editing, you might want to consider using a readymade library. We ended up writing our own OT code to provide flexibility, but we’re really grateful to the author of ShareDB for sharing some helpful knowledge. This library provides a general way of transforming arbitrary JSON objects and would be a good resource for anyone looking to implement a similar project.
If you’re just as excited about this type of problem-solving as we are, consider joining our team. CodeSignal is hiring! Check out our careers page for opportunities to work on a remote interviewing tool that goes beyond algorithmic questions to offer a realistic experience—we’re transforming the way hundreds of top companies hire great engineering talent.