Aside from our cool research, ForAllSecure also works on creating fun and engaging games to promote computer security. Just about every employee in our company has been involved in Capture the Flag exercises for the past several years, and we have been hosting these online events for our customers for about 3 years now. One of our big dreams is to see these types of contests gain in popularity, similar to how e-sports grew.
[Note: if you want to watch our broadcasts without all the reading, click here or scroll to the bottom of the page.]
Unfortunately, there are a lot of issues with this. Right now, CTFs are usually 24 hours long on the short end, and frankly they look terribly boring.
For in-person CTFs, there has been some work on cool visualisations. The idea here is to show the audience the “big picture”. If one team solves a problem or attacks another team, you can visualise that action without going into details about the specific exploits used. For attack-defense CTFs, this is a big improvement over just a scoreboard, and keeps the audience in the loop a bit better. Three of the best visualisation frameworks I have seen in this area are Nirvana from NICT in Japan, the 3D battling avatars at Positive Hack Days CTF, and the open source DEF CON CTF visualisation framework from LegitBS.
These are all steps in the right direction, but they are still a long way from actually showing the audience what is going on. A couple of years ago this changed with the short-lived LiveCTF project. This was an effort mostly put forward by George Hotz, but one I was a bit involved in as well. The idea is simple: don’t just show viewers scores, or just produce write-ups of challenge solutions. Show them the messy and dramatic exploit process in its entirety, with sufficient explanation of the technical bits along the way.
This not only provides more action, but also allows viewers to learn and feel involved in the games. In our view, this is a huge step in the right direction, and a necessary one to expand the popularity of CTFs.
Rapid Fire is a special event we started hosting at our own in-person CTFs in 2014. The idea is pretty simple:
- Create several CTF challenges that can be solved in a few minutes each.
- Set up the challenges on 4 identical computers with some basic tools.
- Mirror the players’ screens so the audience can watch their actions.
- Whoever solves the most challenges the fastest wins.
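The winning rule in the last bullet can be sketched in a few lines of Python. This is only an illustration, not the scoring code we ran at the event: the `Player` class, the `rank` function, and the tie-break (most solves first, then whoever finished their last solve earliest) are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Player:
    name: str
    # Seconds from the start of the round at which each challenge was solved.
    solve_times: list[float] = field(default_factory=list)


def rank(players: list[Player]) -> list[Player]:
    """Order players by most solves, then by earliest final solve.

    The tie-break is an assumption for this sketch; an event could just as
    well use total time or time of first solve.
    """
    def key(p: Player):
        # Players with no solves sort last among equals via +infinity.
        last_solve = max(p.solve_times, default=float("inf"))
        return (-len(p.solve_times), last_solve)

    return sorted(players, key=key)


# Hypothetical results for four players.
players = [
    Player("alice", [95.0, 240.0, 410.0]),
    Player("bob", [120.0, 300.0]),
    Player("carol", [80.0, 200.0, 390.0]),
    Player("dave"),
]
ranking = [p.name for p in rank(players)]
print(ranking)
```

Here carol and alice both solve three challenges, so carol ranks first because her last solve came 20 seconds earlier.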
This event is interesting for a number of reasons: the players are under intense pressure, as everything they do is being watched by several people; the audience can watch several different approaches to the same problems; and people can follow along fairly easily with what is going on with the challenges.
This February, we were asked to help organize security challenges for the inaugural Cambridge 2 Cambridge event. We decided we’d up the ante a bit and not just share players’ screens with the audience at the event, but also broadcast the screens live online along with videos of the players, all with live commentary.
There are several ways one could mix video for an event like this. The easiest to set up would be to have each player’s machine screen-share its 1080p screen and webcam with a central machine. There are a few problems with this, however: we use fairly lightweight Intel NUCs for players (to make things easy to transport and set up), which means that the machines could get bogged down with video transcoding. This could not only slow the machines down, but also introduce compression artifacts and latency in the video.
The next option would be to use several USB to HDMI capture cards. This gets rid of network and transcoding issues on the players’ machines, but now we would need to handle four 1080p desktop streams, as well as four 1080p camera streams. Although USB to HDMI capture cards have come a long way, it was unclear what their performance would be in this setting.
In the end, we settled on a more centralized approach. One large machine was connected directly to the players’ webcams and housed a single large HDMI capture card. This meant that rather than the typical one HDMI cable per player machine, we used four: one from the machine to an HDMI splitter, and then one from the splitter to each of the capture card, the player’s monitor, and the audience’s monitor.
Regardless of how we captured video, mixing it was a whole other issue. Viewing and mixing eight 1080p streams at once is not a walk in the park for a small desktop. Our main requirements were a fairly good graphics card (we settled on an NVIDIA GTX 970) and a good CPU (we ended up using an Intel i7-5930K). Choosing the right CPU is surprisingly important. Many modern CPUs don’t have enough PCI Express lanes to handle a GPU and a 4-channel HDMI capture card, as well as any other devices that may be on the PCI Express bus. For example, the i7-5930K we chose supports 40 PCI Express lanes, compared to just 16 on the i7-6700K.
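To make the lane argument concrete, here is a rough back-of-the-envelope sketch in Python. Only the two CPU lane counts come from the paragraph above. Everything else is an assumption for illustration: uncompressed 24-bit colour at 60 frames per second, a x16 link for the GPU, and a x4 link for the capture card (check the actual card's spec sheet). It also ignores any extra lanes a motherboard chipset may provide.

```python
def raw_stream_mb_per_s(width, height, fps, bytes_per_pixel=3):
    """Uncompressed video bandwidth in megabytes per second."""
    return width * height * bytes_per_pixel * fps / 1e6


def lanes_fit(cpu_lanes, device_lanes):
    """True if every device can get its full link width from the CPU."""
    return sum(device_lanes.values()) <= cpu_lanes


# Assumed link widths, not measured values.
devices = {"gpu": 16, "capture_card": 4}

# Lane counts quoted in the text.
cpu_lanes = {"i7-5930K": 40, "i7-6700K": 16}

# Assumed capture format: 1080p, 60 fps, 3 bytes per pixel.
per_stream = raw_stream_mb_per_s(1920, 1080, 60)
four_screens = 4 * per_stream

print(f"one 1080p stream:  {per_stream:.1f} MB/s")
print(f"four 1080p streams: {four_screens:.1f} MB/s")
for cpu, lanes in cpu_lanes.items():
    verdict = "fits" if lanes_fit(lanes, devices) else "does not fit"
    print(f"{cpu}: {sum(devices.values())} lanes requested, {lanes} available, {verdict}")
```

Under these assumptions the GPU alone uses all 16 lanes on the i7-6700K, so the capture card would have to share or drop to a narrower link, while the i7-5930K has room to spare.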
Finally, the last important piece of hardware was sound isolation. In order to talk freely about what was going on with the competitors without giving away tips, we needed to make sure the competitors could not hear us. We opted to solve this problem using foam earplugs in addition to over-the-ear ear muffs. There are several improvements that could be made here (playing music or white noise inside the ear muffs, using sound isolation booths, broadcasting from another room, etc.), but our testing showed this two-layer protection made normal speech completely unintelligible.
With all the hardware set up, the next step is the actual broadcast. We used the excellent OBS for mixing our video and streaming it to Twitch. But what do we actually talk about? How do we make sure we know what is going on? How do we watch the right players?
For this event, the challenges were relatively short and easy. That meant not only did we have time to practice the challenges beforehand to get a feel for how they worked, but we also got to see a few different possible solutions for each one. This was great, because it gave us some predictive power about what paths players would go down and what steps they would need to take. This also made it possible to explain how the problems worked to the audience during the lulls.
Having shorter challenges also made sure that players were making progress and doing interesting things at just about all points in the game. With challenges that take several hours, there will be periods in which players stare at their screen and think without any visible actions. This makes it more difficult to keep an entertaining commentary going.
Our standardized environment also made it easier for the audience to follow: rather than needing to understand several different tools or layouts of programs, they only needed to be familiar with standard UNIX tools, gdb, and IDA Demo.
Although they were not present in the broadcast, we also had other members of the ForAllSecure staff who could clue us in on things we may have missed on screen, or who could hint at which screen to switch to in order to catch important action.
A few tips we learned which may benefit others trying to do this sort of broadcast in the future:
- If the commentator has a view of all screens at all times, they can more easily switch to the exciting moments.
- For longer or harder challenges, having two commentators talking to each other can help keep the stream from getting too boring.
- Explaining and repeating the basics of what players are doing will help make sure more viewers understand what is going on.
- As a commentator, knowing the problems very well is important.
Since we got a few requests, we’ve posted the source+binaries to the challenges on our GitHub.