RSA 2017 Vendor Vocabulary — “Agent-less” Solutions and “Machine Learning”
February 09 2017 Filed in: Blue Team
As we prepare to descend on San Francisco for the 2017 RSA Conference, I wanted to take a moment to write about two terms cyber security vendors are using, and the types of questions you should ask as a potential buyer, investor, partner or acquirer of these solutions. These terms are “Agent-less” and “Machine Learning”.
Many companies advertise an agent-less method of collecting data. The advantage of having no agent is that nothing has to be installed on the endpoint and there is no performance impact on the endpoint. I am aware of five different types of agent-less techniques on the market today:
- Leveraging automation to log in and run commands — The sensor discovers data by running system commands. This requires centralized credentials that can authenticate to the endpoints, plus network access to them.
- Uploading a “dissolvable” agent — The sensor sends more sophisticated code to the endpoint for execution, but the software is not permanently installed. This requires the same sort of access as the first technique, plus the ability to copy files to the endpoint and execute them. Some solutions install temporary services and uninstall them when finished.
- Indirect analysis of collected data from endpoints — I have only recently started seeing pitches for this, but a crop of vendors claim to be agent-less because they analyze data that another product, such as Tanium, Carbon Black or El Jefe, has already collected from endpoints. Technically, these solutions don’t add a new agent, and they can be useful, especially if you already have one of those products deployed in your organization.
- Network traffic analytics — In January I had a pitch from a new vendor that sounded like they used the first two techniques, when in fact they were using Bro to carve files out of network traffic and detonate them in a sandbox. They talked in great detail about how they tracked process execution, collected hashes and looked for outbound IOCs, and I was stunned when they said they did all of this in a sandbox rather than on the endpoint.
- Indirect access to raw files and memory — Many solutions can read the disk directly, without going through the endpoint’s operating system, especially if you run virtual desktops or virtual servers. VMware allows direct access to disk images, which can then be analyzed “out of band”. Tools that capture memory images for analysis by Volexity’s solutions could also be considered agent-less.
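As a rough illustration of the first technique (logging in and running commands), here is a minimal Python sketch. It assumes Linux endpoints reachable over SSH with key-based authentication already configured, which is exactly the credential and network-access requirement described above. The names `ssh_runner` and `collect_processes` are hypothetical and not taken from any vendor’s product; the runner is passed in as a parameter so the parsing can be tried without a live host.

```python
import subprocess
from typing import Callable, Dict, List

# Hypothetical runner type: takes a host and a shell command, returns stdout text.
Runner = Callable[[str, str], str]


def ssh_runner(host: str, command: str) -> str:
    """Run a command on a remote host over SSH.

    Assumes key-based auth is already set up; BatchMode=yes makes ssh fail
    instead of prompting for a password.
    """
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True,
        text=True,
        timeout=30,
        check=True,
    )
    return result.stdout


def collect_processes(host: str, run: Runner = ssh_runner) -> List[Dict[str, str]]:
    """Agent-less collection: log in, run `ps`, parse the output.

    Nothing is installed or left behind on the endpoint.
    """
    # "pid=,comm=" asks ps for two columns with the header line suppressed.
    output = run(host, "ps -eo pid=,comm=")
    processes = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        pid, _, name = line.partition(" ")
        processes.append({"host": host, "pid": pid, "name": name.strip()})
    return processes
```

Nothing persists on the endpoint, but every collection costs a login and a process launch, and the data is limited to what built-in commands expose. That trade-off is worth keeping in mind when comparing this approach with a dissolvable or resident agent.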
This year, more than any before it, we will see many vendors ship some form of machine learning in their products. Understanding when and where the machine learning comes into play is really important. Vendors may also market it as “anomaly detection” or “artificial intelligence”, or simply as “ML”.
I loosely break down machine learning into two camps I call “Strong ML” and “Weak ML”.
Strong ML is the very definition of data science. Many types of problems, like predicting the weather, elections and malware, can be solved if we have enough sample data and know which questions to ask of that data to find indicators of the outcomes of interest. For example, Strong ML would be able to discover that age is a good indicator of diabetes.
In cyber security, Strong ML is being applied to determine whether software is malicious and whether network connections are malicious. I have more hope for identifying malicious software, because an executable contains more data than a network session.
Strong ML produces signatures (perhaps as an algorithm) that are pushed to sensors to look for badness. Lots of computing power goes into searching for the right indicators or techniques to identify a target.
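To illustrate the Strong ML idea of searching labeled data for a good indicator and turning it into a signature, here is a toy Python sketch of a decision stump: it tries every threshold on a single numeric feature and keeps the one that best separates malicious from benign samples. The feature (a hypothetical per-file entropy score), the sample values and the function names are all assumptions for illustration; real products use many more features and far more sophisticated models. In miniature, this is the same kind of search that could surface age as an indicator of diabetes from patient records.

```python
from typing import Callable, List, Tuple

# Each sample: (feature value, label), where label True means malicious.
Sample = Tuple[float, bool]


def learn_threshold(samples: List[Sample]) -> Tuple[float, float]:
    """Try every observed value as a threshold and keep the most accurate one.

    The learned rule is: value >= threshold -> malicious.
    Returns (threshold, training accuracy).
    """
    best_threshold, best_accuracy = 0.0, -1.0
    for threshold in sorted({value for value, _ in samples}):
        correct = sum((value >= threshold) == label for value, label in samples)
        accuracy = correct / len(samples)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = threshold, accuracy
    return best_threshold, best_accuracy


def make_signature(threshold: float) -> Callable[[float], bool]:
    """The 'signature' pushed to sensors is just the learned rule."""
    def is_malicious(value: float) -> bool:
        return value >= threshold
    return is_malicious


# Toy, made-up training data: (hypothetical entropy score, is_malicious).
toy_samples = [(3.1, False), (4.0, False), (4.8, False),
               (6.9, True), (7.2, True), (7.8, True)]
threshold, accuracy = learn_threshold(toy_samples)  # (6.9, 1.0)
signature = make_signature(threshold)
```

The expensive part is the search over candidate features and thresholds, which happens once, centrally; the sensor only evaluates the cheap resulting rule.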
Weak ML is basically anomaly detection. These solutions collect some sort of telemetry from the cloud, network, endpoints, logs, etc., and then run an algorithm that tells you when something significant has changed or is very rare. This is very useful for situational awareness and for having context about what is happening, but changes or unique events are not, by themselves, an indication of evil or malicious activity.
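A minimal sketch of the unique-event half of Weak ML is frequency analysis (sometimes called stacking): count how many hosts each item appears on and report the rare ones. This is one simple technique among many, not what any particular vendor ships; the function name and the process names in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def rare_items(observations: Iterable[Tuple[str, str]], max_hosts: int = 1) -> List[str]:
    """Return items seen on at most `max_hosts` distinct hosts, sorted by name.

    observations: (host, item) pairs, e.g. (hostname, process name).
    Rare only means unusual in this environment, not malicious.
    """
    hosts_per_item: Dict[str, Set[str]] = defaultdict(set)
    for host, item in observations:
        hosts_per_item[item].add(host)
    return sorted(item for item, hosts in hosts_per_item.items() if len(hosts) <= max_hosts)


# Hypothetical telemetry: one common browser, one odd name, one one-off admin tool.
observations = [
    ("host-a", "chrome.exe"), ("host-b", "chrome.exe"), ("host-c", "chrome.exe"),
    ("host-a", "svch0st.exe"),
    ("host-b", "backup_tool.exe"), ("host-b", "backup_tool.exe"),
]
print(rare_items(observations))  # ['backup_tool.exe', 'svch0st.exe']
```

The output puts a plausibly benign one-off tool right next to a suspicious-looking name, which is exactly why rarity alone is not an indication of evil.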
A common trend we will see at RSA is to combine Weak ML with some sort of threat intelligence. This is still useful for seeing whether any of your devices or network activity is unusual and connecting to bad places, but it is not comprehensive security or the end-all of security monitoring.
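Combining Weak ML with threat intelligence can be sketched as a simple join: take whatever the anomaly detector flagged and check it against a feed of known-bad indicators. The feed contents, the `.example` domain names and the function name below are hypothetical. The sketch also makes the limitation visible: anything the feed does not know about lands in the unexplained bucket, and malicious activity that never looked anomalous is never checked at all.

```python
from typing import Dict, Iterable, List, Set


def triage_anomalies(flagged: Iterable[str], intel: Set[str]) -> Dict[str, List[str]]:
    """Split anomaly-detector output into known-bad and unexplained destinations.

    flagged: destinations the anomaly detector marked as unusual.
    intel:   indicators from a threat intelligence feed (hypothetical contents).
    """
    result: Dict[str, List[str]] = {"known_bad": [], "unexplained": []}
    for destination in flagged:
        bucket = "known_bad" if destination in intel else "unexplained"
        result[bucket].append(destination)
    return result


flagged = ["evil.example", "new-partner.example"]
feed = {"evil.example", "other-bad.example"}
print(triage_anomalies(flagged, feed))
```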
I also hear claims that Weak ML has “no false positives”, and these claims are technically correct. If you track SSL connections and one day there are 100x more connections than the day before, the report is likely accurate. The cause, however, may not be an actual DoS attack or vulnerability scan; it could simply be a load balancer problem or some other networking issue.
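The SSL example can be sketched as a volume check against a baseline. Here the baseline is the median of previous daily counts and the alert fires when today’s count is at least ten times higher; both the median baseline and the factor of ten are assumptions chosen for illustration, and `volume_spike` is a hypothetical name.

```python
from statistics import median
from typing import List


def volume_spike(history: List[int], today: int, factor: float = 10.0) -> bool:
    """Flag today's count if it is at least `factor` times the median of prior days."""
    if not history:
        return False  # no baseline yet, so nothing to compare against
    baseline = median(history)
    if baseline == 0:
        return today > 0  # any traffic after a silent baseline counts as a change
    return today >= factor * baseline


daily_ssl_connections = [1000, 1100, 950, 1050]  # made-up counts; median is 1025
print(volume_spike(daily_ssl_connections, 102_500))  # True
print(volume_spike(daily_ssl_connections, 1_200))    # False
```

The median is used rather than the mean so that a single earlier outlier day does not inflate the baseline. The alert is “correct” in the sense that the count really did jump, but nothing in the arithmetic says whether the cause is a DoS attack, a scan or a misbehaving load balancer.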
Questions to Ask Vendors
While walking the vendor floor at RSA, you can learn a lot about a vendor by asking a few basic questions:
When you see vendors talking about ML, ask them where the machine learning happens, what data (yours or theirs) is used for training and what types of things their customers have found with this technology.
When you see vendors speaking about agent-less solutions, ask them about whitelisting, whether the antivirus industry knows about their agent (so it is not flagged as malicious), what sort of support issues their customers have reported, how it works on managed and unmanaged (BYOD) devices, and how the sensing affects endpoint performance.