From Alan Turing to Future Artificial Intelligences – Reading Security Signals

The notion that the time we are living in is “unprecedented” is a common one, but historians and philosophers alike will happily note that things are rarely so different that we can’t learn a lot from the past. Although IT is often dominated by forward-thinking individuals developing novel designs, many of the problems and potential solutions in IT security have stood the test of time. With that in mind, I hope you’ll indulge me in a trip back nearly a century to explore an IT problem that remains relevant to this day.

The Halting Problem

In 1936, Alan Turing, oft considered one of the fathers of modern computer science, proved a foundational result now generally known as the undecidability of the “Halting Problem.” In effect, it shows that there is no general algorithm that can determine whether an arbitrary program will ever complete, because any algorithm claiming to do so could be fed a program built to do the opposite of whatever it predicts, making it contradict itself. Fast forward to 1987 and Fred Cohen’s “Computer Viruses: Theory and Experiments.” Building on Turing’s undecidable problem, Cohen demonstrated that it is impossible, in general, to determine whether a program is malicious by inspection alone. This will be familiar to the many security researchers who have played cat and mouse with malware authors for years, trying to counter virus and malware payloads built to vary just enough to evade detection. Cohen listed a number of “unanswerable” (formally, undecidable) detection problems for virus scanning:

Detection of a virus by its appearance
Detection of a virus by its behavior
Detection of an evolution of a known virus
Detection of a triggering mechanism by its appearance
Detection of a triggering mechanism by its behavior
Detection of an evolution of a known triggering mechanism
Detection of a virus detector by its appearance
Detection of a viral detector by its behavior
Detection of an evolution of a known viral detector
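
The contradiction behind both Turing's and Cohen's results can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not Cohen's formal construction: a "program" is modelled as a function that returns True if it behaves virally when run, a "detector" is any function that inspects a program without running it, and the names make_contrary and detector_is_wrong are hypothetical helpers invented for this example.

```python
from typing import Callable

# Toy model (an assumption for illustration): a program is a zero-argument
# callable that returns True if it performs viral behavior when run.
Program = Callable[[], bool]
# A detector inspects a program and returns True if it judges it a virus.
Detector = Callable[[Program], bool]


def make_contrary(detector: Detector) -> Program:
    """Build a program that does the opposite of what the detector predicts."""

    def contrary() -> bool:
        # Ask the detector about ourselves, then behave the other way:
        # act virally only if the detector says we are clean.
        return not detector(contrary)

    return contrary


def detector_is_wrong(detector: Detector) -> bool:
    """Return True if the detector misjudges its own contrary program."""
    program = make_contrary(detector)
    verdict = detector(program)   # what the detector claims
    actual = program()            # what the program really does
    return verdict != actual


if __name__ == "__main__":
    # Three inspection-only detectors; none of them runs the program.
    detectors = {
        "flag everything": lambda p: True,
        "flag nothing": lambda p: False,
        "flag by name": lambda p: "contrary" in p.__name__,
    }
    for name, detector in detectors.items():
        print(name, "-> wrong about contrary program:", detector_is_wrong(detector))
```

Whatever deterministic rule the detector applies, the contrary program consults that same rule and then does the reverse, so the verdict is always wrong for at least this one program. A detector that tried to run the program instead would simply recurse forever in this toy model, which is the Halting Problem reappearing in another form.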

90 years on, and it's still a problem

Given this challenge, you might think that it’s an impossible war to win. Much of the AV sector still relies on a signature-database approach to identifying many threats, looking at file attributes without being able to evaluate a file’s true intentions. As a result, you might consider the progress made in this area to be relatively slight. But this is actually far from the truth, thanks to a combination of ever-improving technologies. For virus detection, pattern checking does in fact remain an effective part of the toolkit; malicious software makers still have to put effort into tricking such systems and, as a result, are still often caught simply because their files match known signatures. More importantly, computing performance and the prevalence of high-speed networking ensure that patterns can be checked far more easily than before (certainly in comparison to 1936). But the real innovation has been adding more signals to detection systems, which helps them better distinguish malicious from benign.
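
To make the pattern-checking idea concrete, here is a minimal sketch of hash-based signature matching using Python's standard hashlib module. It is an assumption-laden toy rather than how any particular AV product works: real engines use much richer signatures than whole-file hashes, and the function names and sample bytes below are invented for illustration.

```python
import hashlib
from typing import Iterable, Set


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_signature_db(known_bad_samples: Iterable[bytes]) -> Set[str]:
    """Build a set of digests from previously identified malicious samples."""
    return {sha256_hex(sample) for sample in known_bad_samples}


def matches_signature(data: bytes, signature_db: Set[str]) -> bool:
    """Return True if the data's digest appears in the signature database."""
    return sha256_hex(data) in signature_db


if __name__ == "__main__":
    # Placeholder bytes standing in for a known malicious file.
    known_sample = b"pretend-malicious-payload"
    db = build_signature_db([known_sample])

    # An exact copy of the known sample is caught.
    print(matches_signature(known_sample, db))

    # A one-byte variation produces a different digest and slips through,
    # which is why signatures alone are not enough.
    print(matches_signature(known_sample + b"\x00", db))
```

The lookup is cheap, which is why signature checks remain worthwhile, but the second call shows how little an author needs to change to evade an exact-match check.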

Security Signals, Trending and AI

When I talk about “signals,” I mean information that might be of interest in flagging that something out of the norm is going on. Gathering this information is nothing new, but as computing power has increased, it has become more and more viable to gather vast quantities of it and pool it in data lakes. Innovations in search have made large datasets increasingly easy to use, but it is likely to be innovations in machine learning that make this information practical to work with. The ability to collect a great quantity of information, process it automatically and consistently, and then identify and highlight trends is likely to make security investigations significantly faster and easier. I suspect several major trends in the coming years will revolve around improving that security signal collection process and subsequently automating analysis of the information. Making sure you get the right data in, with the appropriate level of granularity to help spot discrepancies, is key. This is something I talk about a lot with clients during Professional Services engagements deploying Tripwire Enterprise (TE) for File Integrity Monitoring (FIM). Bringing FIM together with other log sources has massive potential to spot zero-days and viral spread, given the right training set. Even better, collecting the data with tools like TE and assessing it with machine learning techniques continues to get easier each year. It’s now quite possible to train an Artificial Intelligence (AI) model on TE data with a quick script and start getting results that highlight interesting patterns, such as when a patch window is exceeded, when an unsigned file shows up across the network, or when a user starts making configuration changes that have knock-on effects across a LAN, opening the door (often inadvertently!) to future attackers.
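
As a rough illustration of two of the patterns mentioned above, the sketch below flags unsigned files that appear on several hosts and changes that fall outside an agreed patch window. It is a simple rule-based stand-in rather than a trained model, and it does not use any Tripwire Enterprise API: the ChangeEvent record, its fields, and both function names are hypothetical, standing in for whatever your FIM export actually contains.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical record of one file change reported by a FIM tool."""
    host: str
    path: str
    sha256: str
    signed: bool
    seen_at: datetime


def unsigned_files_spreading(
    events: Iterable[ChangeEvent], min_hosts: int = 3
) -> Dict[str, Set[str]]:
    """Map each unsigned file hash to its hosts, if seen on >= min_hosts hosts."""
    hosts_by_hash: Dict[str, Set[str]] = defaultdict(set)
    for event in events:
        if not event.signed:
            hosts_by_hash[event.sha256].add(event.host)
    return {
        digest: hosts
        for digest, hosts in hosts_by_hash.items()
        if len(hosts) >= min_hosts
    }


def changes_outside_window(
    events: Iterable[ChangeEvent], start: datetime, end: datetime
) -> List[ChangeEvent]:
    """Return the changes observed before or after the agreed patch window."""
    return [e for e in events if not (start <= e.seen_at <= end)]


if __name__ == "__main__":
    window_start = datetime(2024, 1, 10, 22, 0)
    window_end = datetime(2024, 1, 11, 2, 0)
    events = [
        ChangeEvent("web01", "/usr/bin/tool", "aaa", False, datetime(2024, 1, 10, 23, 0)),
        ChangeEvent("web02", "/usr/bin/tool", "aaa", False, datetime(2024, 1, 10, 23, 5)),
        ChangeEvent("db01", "/usr/bin/tool", "aaa", False, datetime(2024, 1, 11, 9, 30)),
        ChangeEvent("db01", "/etc/app.conf", "bbb", True, datetime(2024, 1, 10, 23, 30)),
    ]
    print(unsigned_files_spreading(events))
    print(changes_outside_window(events, window_start, window_end))
```

Rules like these are a reasonable starting point for labelling data; a machine learning model trained on the same events could then look for combinations of signals that are harder to express as fixed thresholds.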

100 Years on – AI Powered Fixes to Ancient Problems

Perhaps what’s most exciting about all of this is that whilst we will probably never “resolve” issues like the Halting Problem, it’s highly likely that our better understanding of them will drive innovation in areas such as AI for years to come. 100 years on, I wonder what Turing would make of our attempts to work around unsolvable problems like these with tools based on his early work in computer science.