Unit testing is a powerful tool in a coding assessment, for both interviewers and candidates. For one, unit testing lets interviewers score assessments automatically, saving time spent grading by hand. While there are other ways of doing automatic scoring, like checking the output of the candidate's code against certain inputs, unit testing is more flexible and lets you dig deeper into language-level skills. For example, you can test concurrency and threading or enforce that the code returns results of a certain type.
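As a small illustration of that last point, a Python unittest case can assert the type of a result as well as its value. The module and function names below (solution, solve) are hypothetical placeholders for a candidate's code:

```python
import unittest

from solution import solve  # hypothetical module and function under test


class TypeAwareTest(unittest.TestCase):
    def test_returns_sorted_list_of_ints(self):
        result = solve([3, 1, 2])
        # Enforce the return type, not just the value
        self.assertIsInstance(result, list)
        self.assertTrue(all(isinstance(x, int) for x in result))
        self.assertEqual(result, [1, 2, 3])


if __name__ == "__main__":
    unittest.main()
```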
And for candidates: when they can write and run their own unit tests, they're able to validate their code before they submit it, giving them a better chance at success. Finally, testing is an important skill for teams to measure in an interview. Not only do software engineers write unit tests nearly every day on the job, but if you're able to write good unit tests, you probably have a strong understanding of what your code is trying to do.
In this article, we'll talk about how we built a way for candidates and interviewers to do unit testing in assessments with CodeSignal.
Requirements
When we set out to develop a unit testing feature, we started by identifying the requirements for the product:
- We should support the most common unit testing frameworks for the most popular languages, adding more over time.
- Interviewers should be able to make unit tests read-only and/or hidden from the candidate.
- Interviewers should be able to assign weights to the test cases, defining how much they should affect the overall score.
- Unit tests must be written in a separate file (and often a separate directory) from the main program. Unit tests are therefore only supported in CodeSignal's Filesystem Tasks, where the assessment can contain multiple files.
- There should be few constraints on how the user can organize their unit tests. If they want to divide their tests across several different files, we should support this.
- We should hide unnecessary complexity from the user so that they can focus on writing code rather than managing tools and frameworks. We should handle all the work of installing the unit testing framework, setting up the directories, building the code, and running the tests.
- We should make the test results as readable as possible. Sometimes, if you look at the raw console output only, it can be hard to see at a glance which tests passed and which failed. That's why most unit testing harnesses offer some kind of formatting that is more structured and user-friendly; we wanted to match this functionality in our IDE.
We'll now discuss how we addressed these requirements.
Frameworks supported
Because each unit testing framework is language-specific and unique, there's non-trivial work involved in supporting new frameworks. In Filesystem Tasks, we currently support the following:
- Java + JUnit Maven
- Java + JUnit Gradle
- JavaScript + Mocha
- TypeScript + Mocha
- C# + .NET Core (NUnit)
- C# + .NET Core (xUnit)
- PHP + PHPUnit
- Python 2 + unittest
- Python 3 + unittest
- Ruby + RSpec
- Go + gotest
Support for C++ is coming soon.
Making tests hidden or read-only
Interviewers should be able to make some unit tests hidden or read-only. Our IDE client follows certain logic for deciding whether to show files or enable editing. If a file is marked hidden by the interviewer, it doesn't show up to the candidate. If it's marked read-only, the candidate can see it, but can't edit it.
You can learn more about how to use these functions in our Measuring Dev Skills episode on Filesystem Tasks. Making test cases visible (but read-only) can help make your expectations clear to the candidate. On the other hand, hiding test cases can be useful if you want to test for the candidate's ability to think ahead about possible edge cases.
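As a rough sketch of that client-side logic (the field names here are illustrative, not CodeSignal's actual data model):

```python
def can_view(file_config: dict) -> bool:
    # Hidden files never show up in the candidate's file tree.
    return not file_config.get("hidden", False)


def can_edit(file_config: dict) -> bool:
    # Read-only files are visible but locked against edits.
    return can_view(file_config) and not file_config.get("readOnly", False)
```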
Adding weights
Since certain test cases might be more important to the overall score, we wanted the interviewer designing the task to be able to add weights to the tests that they write. To make this easier, we auto-detect the test cases so that the interviewer doesn't need to add them manually.
Our trick for doing this is to run our starter project code against the unit tests (it will most likely fail all of them) and then look at the results. The names of the test suites and cases will be in the XML results, and we can just parse them out and present them to the user for adding weights.
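As an illustration of that parsing step, here is roughly what extracting suite and case names from a JUnit-style XML report looks like; the report path and structure are illustrative, not our exact setup:

```python
import xml.etree.ElementTree as ET


def extract_test_names(report_path):
    """Collect (suite, case) name pairs from a JUnit-style XML report."""
    root = ET.parse(report_path).getroot()
    names = []
    # Element.iter() also matches the root, so this works whether the report
    # starts with <testsuites> or a single <testsuite>.
    for suite in root.iter("testsuite"):
        for case in suite.findall("testcase"):
            names.append((suite.get("name"), case.get("name")))
    return names


# e.g. [("CalculatorTest", "testAddition"), ("CalculatorTest", "testDivision")]
print(extract_test_names("build/test-results/test/results.xml"))
```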
Test case weights are summed, and the weight of the passing tests divided by that total gives a ratio, which is then multiplied by the max score for the task. So, for example, if you have 6 test cases that each have a weight of "1," the total weight is 6. Let's say that a task is worth 300 points, and the candidate gets 4/6 test cases correct. We'll apply that ratio (4/6) to the score to get (4/6) * 300 = 200.
But that's an easy example; let's consider what happens if the weights are not equal. If test cases 1 through 5 are each weighted as "1," but test case 6 is weighted as "5," then the total weight is 10. Let's say the candidate gets test cases 1, 2, and 3 correct. Even though that's half the test cases, the total weight ratio is 3/10. So the score for the task will be (3/10) * 300 = 90.
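In code, the rule is just a weighted ratio. A minimal sketch (the test names and data structures here are placeholders, not our production scoring code):

```python
def weighted_score(weights, passed, max_score):
    """Scale the task's max score by the weight of the passing tests."""
    total_weight = sum(weights.values())
    earned_weight = sum(w for name, w in weights.items() if name in passed)
    return max_score * earned_weight / total_weight


# The second example above: tests 1-5 each weigh 1, test 6 weighs 5.
weights = {"t1": 1, "t2": 1, "t3": 1, "t4": 1, "t5": 1, "t6": 5}
print(weighted_score(weights, passed={"t1", "t2", "t3"}, max_score=300))  # 90.0
```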
Minimizing complexity for writing and running unit tests
Probably the biggest effort when implementing a new unit testing framework is figuring out how to minimize the complexity for the user. Our goal is to preconfigure the environment so that all the user has to do is write a few import statements in their unit testing file (as they would normally, to load in the functions they're testing), and press "Run." In general, the work breaks down into a few pieces:
- Pre-installing the unit testing framework so that it works both in our IDE and on our coderunners (the specialized microservices that handle running code for assessments).
- Writing a run script that will extract and combine multiple test cases, and won't depend on having the tests all in one file. For example, the user should be able to add different test files with different names, and the run script should handle it with no problems (see the sketch after this list).
- Making sure that the program will build correctly. Setting up unit testing for compiled languages (like Java or C#) tends to require more work than for interpreted languages (like Python or JavaScript). If you look at a Filesystem Task for a Java + JUnit environment, you'll see a build.gradle file that contains build setup rules and instructions for the compiler so that it understands the dependencies, names, and directories in the program. Our job is to make sure that these build rules are general enough to work for almost all cases. Ideally, if the user is making changes within the scope of what's expected for writing their solution, they never need to touch this file.
- Creating simple starter code that works out of the box using a sample function and a simple test. This helps the user understand how the test cases should be written and what directory they should be located in.
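To make the run-script point concrete: for Python + unittest, discovery across arbitrarily named test files is mostly a matter of pointing the built-in loader at a directory. A simplified sketch (the tests directory name is illustrative):

```python
import sys
import unittest

# Collect every test_*.py file under ./tests, regardless of how the
# candidate split their tests across files.
suite = unittest.defaultTestLoader.discover(start_dir="tests", pattern="test_*.py")
result = unittest.TextTestRunner(verbosity=2).run(suite)

# Exit non-zero on failure so the coderunner can detect it.
sys.exit(0 if result.wasSuccessful() else 1)
```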
Presenting results with more readable formatting
Using only the standard output, it can be hard to quickly scan through your unit test results. We wanted to format the results in a simple, useful table, where tests are grouped by main and sub test cases. When you select Run Pretty Results in the CodeSignal console, you can clearly see which tests passed (green) and which failed (red), along with some context about any errors.
How do we generate these nicely formatted results? Well, the unit testing tools that we support output results in an XML format. The most common one is called JUnit XML, which describes concepts like <testsuite> (the "main" test cases) and <testcase> (the "sub" or "child" test cases). There's also a <failure> element that gives more details when something breaks. Other frameworks like .NET Core output slightly different kinds of test result XML. For each output format, we have an adapter that knows how to take the XML and parse it into our "pretty results" in the console.
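For JUnit-style XML, the adapter's job essentially boils down to walking the suites and cases and recording pass/fail status plus any failure message. A simplified sketch (our real adapters handle more attributes, skipped tests, and the other XML dialects):

```python
import xml.etree.ElementTree as ET


def parse_junit_results(report_path):
    """Turn a JUnit-style XML report into a flat list of per-test results."""
    root = ET.parse(report_path).getroot()
    results = []
    for suite in root.iter("testsuite"):
        for case in suite.findall("testcase"):
            failure = case.find("failure")
            results.append({
                "suite": suite.get("name"),   # the "main" test case
                "test": case.get("name"),     # the "sub" test case
                "passed": failure is None,
                "message": failure.get("message") if failure is not None else None,
            })
    return results
```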
Conclusion
Writing and running unit tests can make a technical assessment that much more realistic and closer to the day-to-day work of software engineering. You can check out this feature in our Filesystem Tasks by turning on Automatic Scoring. Interested in helping us build the future of technical interviewing? Take a look at our Careers page.
Aram Drambyan is a Software Engineer at CodeSignal, mainly focused on infrastructure-heavy projects. He's like the Dark Knight doing the underground work that saves the world!