CodeSignal Solves It: hostnamesOrdering

hostnamesOrdering solution

The Challenge of the Week that ended on March 21 was hostnamesOrdering. Only 72 CodeFighters submitted solutions – this was a tricky one! In this breakdown, I’m going to walk you through my thought process in writing a solution.

In this database challenge, the task is to take a table of numeric ids and hostnames, and return a sorted version of the table. The challenge is that we have to order the hostnames by the reverse hostnameThe hostname is a string of domains separated by periods, such as codesignal.com or www.github.com. The domains don’t include any protocols (such as http://), nor do they include any paths (such as codesignal.com/forums). The general form of a hostname is domain1.domain2. … .domainN. The reversed hostname is domainN. … .domain2.domain1. For example:

[table id=6 /]

Note that the list above is ordered by the reverse hostname. We are told that there are at most 3 domains in any given hostname that appears in this challenge.

Getting started

Let’s start with an example of the table hostnames:

[table id=7 /]

Let’s suppose for a moment that we have a way of generating the reversed hostnames, which will actually be the hardest part of the challenge.

[table id=8 /]

The query we want to return is:

# Assuming we can make a ‘reversed’ column
SELECT id, hostname FROM hostnames ORDER BY reversed;

which would return:

[table id=9 /]

Reversing the domain name

We won’t actually construct an explicit reversed column. Instead, we will generate the pieces of the domain we want on the fly. To do this, we will be using the SUBSTRING_INDEX function. From the documentation, we see that SUBSTRING_INDEX( string, delim, count) returns a substring.

If count is positive, SUBSTRING_INDEX( string, delim, count) returns the substring from the beginning to the countth occurrence of delim. For example:

mysql> SELECT SUBSTRING_INDEX('www.codesignal.com', '.',1);
          -> www
mysql> SELECT SUBSTRING_INDEX('www.codesignal.com', '.', 2);
          -> www.codesignal
mysql> SELECT SUBSTRING_INDEX('www.codesignal.com', '.', 45);
          -> www.codesignal.com

If count is negative, SUBSTRING_INDEX finds the |count|th occurrence of delim from the right, and returns the substring from there to the end of the string. For example:

mysql> SELECT SUBSTRING_INDEX('www.codesignal.com', '.', -1);
       -> com
mysql> SELECT SUBSTRING_INDEX('www.codesignal.com', '.', -2);
       -> codesignal.com
mysql> SELECT SUBSTRING_INDEX('www.codesignal.com', '.', -100);
       -> www.codesignal.com

Using the same example we’ve been working with, here is a query that splits the hostname into three pieces. Using SUBSTRING_INDEX, we can extract the top level domain, the midlevel domain, and the lowest level of domain just by changing the index value.

SELECT id, hostname, SUBSTRING_INDEX(hostname,'.',-1) as top,
                                      SUBSTRING_INDEX(hostname,'.',-2) as mid
               FROM hostnames ORDER BY top, mid, hostname;

This would work if every domain has exactly three pieces. Unfortunately, some domains only have one or two pieces. What is actually returned is:

[table id=10 /]

The first two rows are in the wrong order. They match at the top level (i.e., they are both .com domains). The com domain has no lower level, but the SUBSTRING_INDEX just returns the entire string com again. Since codesignal.com is earlier in the alphabet than com, the table puts codesignal.com first. What should happen is that codesignal should be compared to the empty string, and com should appear first.

One trick to fix this problem is to prepend the hostnames with ‘…’, guaranteeing that there are three periods in each domain. So our query is now:

SELECT id, hostname, SUBSTRING_INDEX(CONCAT('...', hostname),'.',-1) as top,
                                      SUBSTRING_INDEX(CONCAT('...',hostname),'.',-2) as mid
               FROM hostnames ORDER BY top, mid, hostname;

Writing down just the first four rows, this query returns:

[table id=11 /]

This has everything in the right order!

A problem and a workaround

In principle, we just have to eliminate the top and middle column, so we are only reporting the id and hostname. Before making this change, we can try running this in CodeSignal. When we do it, we run into a problem: the CodeSignal MySQL engine returns workbench.mysql.com before dev.mysql.com!

This seems to be a bug; running MySQL on a local machine returns dev.mysql.com before workbench.mysql.com. However, ordering by the hostname on CodeSignal returns the opposite. When using ORDER BY top, mid, hostname, workbench.mysql.com appears first. This seems wrong, since both these hostnames have identical top and mids, so the ordering should be determined by hostname.

Part of being a programmer is working around issues in any software! The problem we are encountering comes from doing multiple orderings. Our workaround will be to concatenate the top, middle, and hostname into one string. We will separate the top, middle, and hostname by a space, because spaces cannot occur in a domain and spaces appear earlier in ASCII than any letter.

So our workaround is:

SELECT id, hostname, SUBSTRING_INDEX(CONCAT('...', hostname),'.',-1) as top,
                                      SUBSTRING_INDEX(CONCAT('...',hostname),'.',-2) as mid
                FROM hostnames ORDER BY CONCAT(top, ' ', mid, ' ', hostname);

This orders dev.mysql.com and workbench.mysql.com correctly.

Eliminating columns

Now we need to eliminate the top and mid columns. We could do a subquery, but instead we will actually construct the top and the middle strings as part of the ORDER BY clause.

Our solution ends up looking like this:

CREATE PROCEDURE hostnamesOrdering()
    SELECT id, hostname FROM hostnames
          ORDER BY CONCAT(SUBSTRING_INDEX(CONCAT('...',hostname),'.',-1),
                                               ' ',
                                               SUBSTRING_INDEX(CONCAT('...',hostname),'.',-2),
                                               ' ',
                                               hostname);

This solution only selects the desired columns. Notice that in the ORDER BY we have a single string, made from the “top” level domain, a space, the “middle level” domain, a space, and then the hostname. This gets around the implementation bug, and gets us a solution with 188 characters.

On a local MySQL database, we don’t need the final concatenation, and we could use multiple ordering to get:

CREATE PROCEDURE hostnamesOrdering()
    SELECT id, hostname FROM hostnames
          ORDER BY SUBSTRING_INDEX(CONCAT('...',hostname),'.',-1),
                              SUBSTRING_INDEX(CONCAT('...',hostname),'.',-2),
                              hostname;

Going further

As a code golfing exercise, we have some compact solutions. As a coding exercise, you might be interested in trying to solve the general case where we don’t have a limit on the maximum number of domains in a hostname!

This takes us back to our original solution:

SELECT id, hostname FROM hostnames ORDER BY reverse_hostname(hostnames);

where the interesting exercise is to create a custom function reverse_hostname. If you get stuck, this StackOverflow post might be useful.

Tell us…

How did you tackle this challenge? What do you think about the approach I took in solving it? Let us hear from you in the CodeSignal forum!

Related Posts

Copyright © 2018 BrainFights Inc. All rights reserved.​