Artificial intelligence (AI) has become such a buzzword that it’s at risk of becoming no more than tech marketing pixie dust. Just sprinkle a little here and suddenly, your solution inherits the foresight of a self-driving Tesla and the simplicity of an Amazon Echo.
As more solutions crowd the cybersecurity market touting the benefits of AI, it’s important to see through the hype. Machine learning (ML) can deliver transformative insights in some domains, but it has limitations. My goal is to help you pick apart vendor claims. If you plan to evaluate a solution that uses ML for cybersecurity, I hope this will inform your decision-making -- or at least give you a framework for learning more.
Do You Want Artificial Intelligence Or Machine Learning?
The answer is machine learning. As a cybersecurity practitioner, I tend to be a little prickly on this. AI implies cognitive introspection on the part of the tech -- an ability to improve itself based on understanding its own performance. We’re nowhere near this yet.
ML is a subfield of computer science that helps computers learn based on their inputs and decide how to behave without being explicitly programmed to do so. The ML practitioner will approach the task with a large and developing toolset. Different algorithms have different uses, and techniques overlap with computational statistics, mathematical optimization and data mining.
ML Uses Algorithms That Can Learn From Data
An ML algorithm builds a model that represents the behavior of a real-world system from data that represents samples of its behavior. Training can be supervised -- with prelabeled example data -- or unsupervised. Either way, the data needs to be representative of the real world. Without representative data, no algorithm can offer useful and generalizable insights.
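To make the supervised/unsupervised distinction concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the two features (failed logins per hour, megabytes uploaded per hour), the toy numbers, and the choice of a nearest-centroid classifier and a two-group clustering routine. It is not how any particular product works.

```python
from math import dist
from statistics import mean

# Hypothetical toy features per host: (failed logins per hour, MB uploaded per hour).
benign = [(1, 5), (2, 7), (0, 4), (1, 6)]
malicious = [(30, 80), (25, 95), (35, 90), (28, 85)]


def centroid(points):
    """Average position of a group of points."""
    return tuple(mean(axis) for axis in zip(*points))


def train_supervised(labeled):
    """Supervised: labels are given, so the model is one centroid per label."""
    return {label: centroid(points) for label, points in labeled.items()}


def predict(model, point):
    """Assign the label whose centroid is closest to the point."""
    return min(model, key=lambda label: dist(model[label], point))


def cluster_unsupervised(points, rounds=10):
    """Unsupervised: no labels, so split the data into two groups (2-means)."""
    centers = [min(points), max(points)]  # deterministic starting guesses
    for _ in range(rounds):
        groups = [[], []]
        for p in points:
            nearest = min((0, 1), key=lambda i: dist(centers[i], p))
            groups[nearest].append(p)
        # Keep the old center if a group ends up empty.
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return groups


model = train_supervised({"benign": benign, "malicious": malicious})
groups = cluster_unsupervised(benign + malicious)
```

Note that predict always returns one of the labels it was trained on, even for a point unlike anything in the training set. That is the practical meaning of the representativeness requirement: the model has no way to say "I have never seen anything like this."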
The challenge in cybersecurity is that the initial phases of an attack, such as malware or spear-phishing emails, vary every time the attack is launched, making them impossible to detect and classify with confidence. (This echoes the halting problem, which Alan Turing proved undecidable in 1936: as a consequence, no computer program can determine in general whether another arbitrary program is good or bad.)
With good training data, state-of-the-art ML algorithms can do a pretty good job of training a model that can then be used to sift through new, unlabeled data. The problem is the term “pretty good job.” It’s hard to know beforehand just how accurate the classification of new data will be. (Was the training data adequate? Is the model good at teasing apart the grey -- things that may be good, bad, etc.?) What’s beyond doubt, however, is that every algorithm will make mistakes. It could generate false alerts or fail to detect the bad guy.
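The two kinds of mistake, false alerts and missed detections, can be measured separately once you have ground truth to compare against. The sketch below is a minimal illustration in Python; the ten events and the model's verdicts are made-up data, and the function names are hypothetical. It assumes both malicious and benign events are present, otherwise the divisions would fail.

```python
def confusion_counts(actual, predicted):
    """Count outcomes, treating True as 'malicious' and False as 'benign'."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, guess in zip(actual, predicted):
        if guess and truth:
            counts["tp"] += 1  # attack correctly flagged
        elif guess and not truth:
            counts["fp"] += 1  # false alert
        elif truth:
            counts["fn"] += 1  # missed attack
        else:
            counts["tn"] += 1  # benign event correctly ignored
    return counts


def error_rates(counts):
    """Return (false alert rate, miss rate); assumes both classes occur."""
    false_alert_rate = counts["fp"] / (counts["fp"] + counts["tn"])
    miss_rate = counts["fn"] / (counts["fn"] + counts["tp"])
    return false_alert_rate, miss_rate


# Hypothetical ground truth and model verdicts for ten events.
actual = [True, True, True, False, False, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False, False, False]

counts = confusion_counts(actual, predicted)
false_alert_rate, miss_rate = error_rates(counts)
```

When evaluating a vendor claim, asking for both numbers on data resembling your own environment is more informative than a single "accuracy" figure, because attacks are usually rare and a model that flags nothing can still look accurate.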
Machine Learning Isn’t Perfect And Can Be Fooled, But It’s Making Progress
To summarize, ML performs poorly when massive variation in the data makes training useless. For example, in anti-virus, polymorphism makes every attack using the same underlying malware look different. ML can’t adapt to this variance. Moreover, ML is not perfect. Depending on the techniques used and the domain of application, it will fail to spot some attacks and may falsely classify benign activity.
Despite these limitations, I’m tremendously excited by the progress being made using ML in cybersecurity, specifically where its application can greatly assist organizations to discover signs of untoward activity and to protect their assets from attack.
Today We Can See User Behavior, App Usage And More
Modern IT infrastructure is increasingly well-instrumented, delivering voluminous log data on user behavior, application use, network traffic, authentication activity and more. First-gen log processing capabilities such as Splunk gave IT pros the ability to make Google-like queries on large indexed data stores, which at least made the tasks at hand possible.
Today, the rapid advances in ML, and particularly self-training ML algorithms, offer a powerful new opportunity to automatically sift through massive amounts of data to look for weird stuff -- patterns of behavior that are outliers when compared to the rest of the data in the set. These tools are self-trained, requiring little to no effort from an expert to set up, and adaptable. As more data is aggregated, they can retrain themselves to include new behaviors and adjust their findings.
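A rough sketch of the idea in Python follows. It flags a value as an outlier when it sits far from the running average of what has been seen so far, then folds every observation back into its history, which stands in for "retraining." The class name, the threshold of three standard deviations, and the daily download counts are all assumptions chosen for illustration; real products use far richer models.

```python
from statistics import mean, stdev


class BaselineDetector:
    """Flags values that deviate strongly from everything observed so far."""

    def __init__(self, threshold=3.0):
        self.history = []
        self.threshold = threshold  # allowed distance, in standard deviations

    def is_outlier(self, value):
        if len(self.history) < 2:
            return False  # too little data to say what "normal" is
        mu = mean(self.history)
        sigma = stdev(self.history)
        if sigma == 0:
            return value != mu
        return abs(value - mu) / sigma > self.threshold

    def observe(self, value):
        """Score the value against the current baseline, then learn from it."""
        flagged = self.is_outlier(value)
        self.history.append(value)  # the "retraining" step
        return flagged


# Hypothetical count of files one user downloads per day.
daily_downloads = [20, 22, 19, 21, 20, 23, 18, 400]

detector = BaselineDetector()
flags = [detector.observe(count) for count in daily_downloads]
```

In this sketch only the final day is flagged. Because every observation is absorbed into the baseline, a second day of 400 downloads would no longer be flagged: that is the adaptability described above, and it is also a trade-off worth asking vendors about, since persistent unusual behavior gradually becomes "normal."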
Current solutions have a few drawbacks. An anomaly found by ML algorithms can often be difficult to interpret, as it may be the result of a set of abstract data points with no obvious meaning to an analyst. In addition, such systems can be poor at teasing apart data that has many points of overlap.
Exciting ML Developments Are Brewing In The Domain Of Protection, At The Point Of Attack
Many “next-gen” vendors claim to detect malware before it executes, which by Turing’s proof is a fool’s errand. Consider instead what happens when malware actually executes on an endpoint: it’s easy to spot as a deviation from the known normal behavior of the application it has attacked. Execution also offers a rich source of forensic detail, solving the need to label examples of malicious activity for ML.
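The "deviation from known normal behavior" idea can be sketched in a few lines of Python. The event format, the action and target names, and both helper functions are hypothetical, and exact set membership is the simplest possible stand-in for a learned behavior model.

```python
def learn_baseline(clean_runs):
    """Collect every (action, target) event seen across known-clean runs."""
    baseline = set()
    for run in clean_runs:
        baseline.update(run)
    return baseline


def find_deviations(baseline, observed_run):
    """Return events, in order, that never appeared in the baseline."""
    return [event for event in observed_run if event not in baseline]


# Hypothetical event traces for a document viewer; names are illustrative only.
clean_runs = [
    [("read", "document"), ("render", "page")],
    [("read", "document"), ("render", "page"), ("write", "temp_cache")],
]
suspect_run = [
    ("read", "document"),
    ("spawn", "command_shell"),
    ("connect", "unknown_host"),
    ("render", "page"),
]

baseline = learn_baseline(clean_runs)
deviations = find_deviations(baseline, suspect_run)

# Each deviation is a candidate labeled example for later training.
labeled_examples = [(event, "suspicious") for event in deviations]
```

Here the spawned shell and the outbound connection stand out because a document viewer never did either during clean runs. The deviations double as labeled examples, which is the forensic by-product the text refers to, though in practice a human or a second model would still need to confirm that an unseen event is actually malicious.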
But when malware executes, all bets are off: The system is compromised and the attack could spread immediately across the network, as WannaCry did. The only way to avoid potentially disastrous consequences is to let malware execute in isolation to study it and map its behavior. ML, coupled with application isolation, prevents the downside of malware execution -- isolation eliminates the breach, ensures that no data is compromised and prevents malware from moving laterally onto the network.
The Future Of This Approach Is Bright
With Microsoft adding capabilities for isolation in its virtualization-based security feature set, I expect to see local learning expand to cover authentication activity and user behavior analysis, in addition to covering a broad set of attack vectors.
Machine learning, applied appropriately, offers exciting new opportunities for cybersecurity. We are witnessing the dawn of a new era of productivity and enhanced protection, but we must avoid the temptation to believe the marketing hype.