A little while back, our Challenge of the Week was a tricky database challenge called pastEvents. 281 of our awesome CodeFighters solved it! I’ve created a walkthrough of the problem and a few different tactics for finding a solution. Here’s the problem, in case you need a refresher:
During the most recent social event you suddenly realized that you had forgotten your USB drive on one of the previous events. You are pretty sure that you had your flash drive with you just last week, which means that it was probably lost during one of the events of the last 7 days. So, you would like to find all the events you attended during this period.
And the output should be:
A table of all names and dates of events within 7 days of the most recent event, not including the most recent event, order from most to least recent.
The problem gives us a database with the table events, as well as a sample dataset:
[table id=1 /]
How would we construct the solution by hand?
- We would look for the most recent date, in this case the event with
id4 is the most recent with an
- We would then
SELECTall dates that occur within 7 days before
2016-12-02. This would select the events with
ids 2 and 3.
- Finally we would sort the returned events based on
With this strategy in mind, let’s start putting the pieces together!
Step 1: Find the most recent event
For this, we need to find the maximum value of the
SELECT max FROM events
The time taken for this query will scale linearly with the number of events we have stored, which is about the best we can hope for. After all, to find the maximum we have to look at the date of every single record!
Here is a different approach that also selects the most recent date:
SELECT event_date FROM events ORDER BY event_date DESC LIMIT 1
This approach is to take all the dates, order them from latest to earliest, and then select the first one off the list. This approach isn’t as good, because it isn’t as clear what the code is trying to do. If you are reviewing this code, you have to get to the very end of the line to realize that we’re only picking one value. You have to carefully think if you want to order the list in
DESCending order. A naive implementation of this query would also be slower than our first attempt, because first the entire list would need to be sorted before the most recent event was selected (MySQL is almost certainly smart enough to optimize this away). Finally, if our goal is to get a short solution, this line is less clear and more verbose than our attempt based off the max function.
Now we need to write a query that uses the latest date and selects events in the 7 days prior to our event. We can either use our query above as a subquery, or store the results as a variable and use that. We will start by using the variable
@recent, as it leads to more readable code, and then optimize once we’re done.
Our solution so far looks like this:
CREATE PROCEDURE pastEvents() BEGIN SET @recent = (SELECT max(event_date) FROM Events); /* can also use SELECT MAX(event_date) INTO @recent FROM Events; */ END
Step 2: Select events within 7 days of
event_date, we need to determine if it was within 7 days of
@recent. MySQL provides 60 functions for dealing with date calculations. Some of the options relevant to us are:
[table id=3 /]
All of these include the most recent event. We can throw out the most recent event by adding another
DATEDIFF, or by using
BETWEEN to ensure that our
DATEDIFF is also bigger than 1. Because we are assured that there is at most one event per day, eliminating the most recent one can be achieved with
event_date != @recent.
We now add a
SELECT with a
WHERE clause to get the events within 7 days, so our code now takes the form:
CREATE PROCEDURE pastEvents() BEGIN SET @recent = (SELECT MAX(event_date) FROM Events); SELECT name,event_date FROM Events WHERE DATEDIFF(@recent, event_date) <= 7 AND event_date != @recent; END
There is a really sneaky way of eliminating the extra clause to see if the event is the most recent event. We are interested in selecting events where
Instead of using subtraction, what do we know about the reciprocal of date differences? We know that
We get the lower limit by
@recent being 7 days ahead of
event_date. If the difference between the dates is 8 days, then
⅛. The cool part is the upper limit is achieved if
event_date are one day apart. If we look at the case where
event_date are the same day, then
1/(@recent-event_date) is undefined, so MySQL will return false for any comparison. This means we don’t have to worry about the upper limit, and instead we only care if
Writing this another way:
Our shorter (but conceptually harder to understand) solution is
CREATE PROCEDURE pastEvents() BEGIN SET @recent = (SELECT MAX(event_date) FROM Events); SELECT name,event_date FROM Events WHERE 7/DATEDIFF(@recent, event_date) >= 1; END
Step 3: Order the output
Once we’ve gotten this far, it’s simply a case of adding an
ORDER BY clause to the end. Finally we have something that passes the test cases and the hidden tests! Let’s call it a solution with 171 characters:
CREATE PROCEDURE pastEvents() BEGIN SET @recent = (SELECT MAX(event_date) FROM Events); SELECT name,event_date FROM Events WHERE 7/DATEDIFF(@recent, event_date) >= 1 ORDER BY event_date DESC; END
Refinement 1: Subclause
It’s clear that we could get shorter code by naming
@recent something like
@r. We can also eliminate some characters by running a subclause to get the maximum date:
CREATE PROCEDURE pastEvents() BEGIN SELECT name,event_date FROM Events, (SELECT MAX(event_date) as r FROM Events) recent WHERE 7/DATEDIFF(r, event_date) >= 1 ORDER BY event_date DESC; END
JOINS every event with the single value from the subclause
SELECT MAX(event_date) as r FROM Events, which is just the variable we stored in
@recent in our previous example. We want to be careful with
JOINS, because the number of rows in the returned table is the product of the rows in the two input tables. In this case, the derived table recent only has one row, so we only have to sort through the same number of events as we started with.
Refinement 2: Change
We can also eliminate a character by noting that DATEDIFF has to return an integer, so requiring
7/DATEDIFF(r,event_date) >= 1 gives the same dates as
8/DATEDIFF(r,event_date) > 1. MySQL doesn’t require
END for procedures, so we can eliminate those too. This gets us to:
CREATE PROCEDURE pastEvents() SELECT name,event_date FROM Events, (SELECT MAX(event_date) as r FROM Events) recent WHERE 8/DATEDIFF(r, event_date) > 1 ORDER BY event_date DESC;
The shortest solution
The very short solutions to this challenge took a very different approach from the one I just demonstrated. Here’s the code submitted by CodeFighter gennadi_m:
CREATE PROCEDURE pastEvents() SELECT e.name, e.event_date FROM Events e, Events v GROUP BY 2 DESC HAVING 8/datediff(max(v.event_date), event_date) > 1
The first step is taking the Event table, and taking the cartesian product (i.e.
OUTER JOIN) with itself. In our case, this gives us a 25 row table:
[table id=4 /]
The columns are grouped by “column 2” in the output,
e.event_date. The group by is only important when we are using an aggregate function, which is this case is
max. The first group on this table is the group
e.event_date = 2016-11-07, for example. Out of this group, we ask for the maximum
v.event_date, which is
2016-12-02. We can only use aggregate properties, so our table becomes:
[table id=5 /]
The combination of using a self-join, grouping on event date, and max aggregation is just a complicated way of adding the most recent date to every row.
While the query is a lot smaller, joining a table with itself squares the number of entries in it. MySQL might be able to optimize your query in some cases, but you want to be careful before trying this trick in production code!
How did you tackle this challenge? What do you think about the approach I took in solving it, and the refinements I added? Let us hear from you in the CodeSignal forum!