Cuckoo Sandbox vs. Reality
By: Thorsten Sick in Technology November 11, 2014
The problem: Having to analyse several hundred thousand potential malware samples every day. The solution: Building a scalable system that has detailed information about sample behavior and functionality with the help of Cuckoo Sandbox.
Problem
We get several hundred thousand potential malware samples every day, a number that increased when we began to detect even more by developing our Avira Protection Cloud technology. In the Avira Protection Lab, one of our primary tasks is to classify the samples and analyse their behavior, either for inclusion in our virus database or for repair. With this number of samples, it is of course impossible to do it all manually.
Solution
Our solution was to build a scalable system with detailed information about sample behaviors and functionalities. This system needed to be fully automated and reliable. To achieve that, one of the tools we are now using is Cuckoo Sandbox.
About: Cuckoo Sandbox
Although different types of “Sandbox” tools exist, Cuckoo is uniquely an “analysis sandbox” or “automated malware analysis system” — i.e. a system built to analyse the behavior of malware by running it in a fake Windows OS and monitoring it. If you saw the film “The Matrix” you should have a pretty good idea of it: a fake reality where the protagonists interact with an environment — and each other — isolated from reality (or, in the case of the sandbox, the real computer).
This kind of sandbox is normally sold as an appliance for companies with enhanced security requirements. A local specialist then investigates the results and classifies the analysed samples.
I discovered Cuckoo Sandbox while looking for a tool to automate experiments for the ITES research project. Cuckoo Sandbox is Open Source: http://cuckoosandbox.org/
Cuckoo Features
The malware-monitoring results go into large log files (6 MB per sample on average, though 100 MB is not uncommon) containing detailed descriptions of the malware's behavior.
The data we collect using Cuckoo comes from the User Space monitor and includes:
API logs
Network logs
Static data for the sample and dropped files
Screenshots
System manipulation: Files/Registry/Mutexes/Services
Started processes and their relationship to the sample
With this information, it’s possible to classify the samples by their behavior. It’s also enough information to create a malware description and repair most of the malware infections.
Cuckoo vs. AV reality
We started to interact with Cuckoo two years ago. Even back in the ‘old days’, it was a good tool for sporadic malware analysis. But when it comes to research projects and AV use we have some special needs. This is why I’ve enhanced the following:
Stability: We have several servers running 24 hours a day, crunching through about 200 samples per hour. If Cuckoo crashed once in 1000 samples, that would mean a crash roughly every five hours at that rate, and lots of maintenance to do. So bug fixing was one of my main tasks.
Performance: Reducing the number of servers needed is essential. Better performance means less hardware running. The more hardware you need, the more expensive it gets, but even worse: hardware can fail and require maintenance. Fewer servers means fewer failures. By improving the performance, I also reduced latency, which means we get our results faster.
Classification: The main task of our Virus Lab is to classify samples, at least into the categories good/bad. To have that done automatically by Cuckoo, I had to add some features to the signatures (detection rules). The most essential feature was “Meta Signatures”: signatures that run at the end and combine several “weaker” signatures into a classification.
Data collection: Cuckoo API logs have a specific view: The commands the sample sends to the Windows API. With some processing, it’s possible to get a new view that is more interesting for us: Which system objects have been manipulated… and how? That is the “enhanced behavior” part of the Cuckoo logs I created. It contains Registry keys, Services, Files, … and the way they got modified. That can be “deleted”, “read”, “stopped” (for services) and more. With that knowledge, repair and automated generation of a description is just one step away.
Other Monitor: An essential part of the ITES project was to test several different sensors. While Cuckoo normally monitors malware in the User Space, the open source tool Volatility can analyse a memory snapshot of the OS and scan it for anomalies. Its specialty is identifying DKOM (Direct Kernel Object Manipulation https://en.wikipedia.org/wiki/Direct_kernel_object_manipulation), which is normally performed by rootkits. Combining Cuckoo and Volatility adds a rootkit scanning feature to Cuckoo.
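To make the “enhanced behavior” view and the “Meta Signatures” idea from the list above more concrete, here is a minimal Python sketch. It is not Cuckoo's actual code or signature API: the log record layout, the names `API_TO_ACTION`, `enhanced_behavior`, `WEAK_SIGNATURES` and `meta_signature`, the sample targets and the threshold are all assumptions made for illustration. Only the Windows API names are real.

```python
from collections import defaultdict

# Hypothetical, simplified mapping from a Windows API name to
# (object type, action). A real mapping would also inspect arguments,
# e.g. the control code passed to ControlService.
API_TO_ACTION = {
    "DeleteFileW": ("file", "deleted"),
    "NtReadFile": ("file", "read"),
    "RegSetValueExW": ("registry", "modified"),
    "RegDeleteKeyW": ("registry", "deleted"),
    "ControlService": ("service", "stopped"),
}


def enhanced_behavior(api_log):
    """Turn a call-centric API log into an object-centric view.

    Returns {(object type, object name): set of actions}.
    """
    view = defaultdict(set)
    for call in api_log:
        mapping = API_TO_ACTION.get(call["api"])
        if mapping is None:
            continue  # this API call does not manipulate a tracked object
        obj_type, action = mapping
        view[(obj_type, call["target"])].add(action)
    return dict(view)


def touches(view, obj_type, action):
    """True if any object of obj_type was subjected to action."""
    return any(t == obj_type and action in actions
               for (t, _name), actions in view.items())


# Hypothetical "weak" signatures: each one alone proves little.
WEAK_SIGNATURES = {
    "deletes_file": lambda view: touches(view, "file", "deleted"),
    "deletes_registry_key": lambda view: touches(view, "registry", "deleted"),
    "stops_service": lambda view: touches(view, "service", "stopped"),
}


def meta_signature(view, threshold=2):
    """Run after the weak signatures and combine them into one verdict."""
    matched = sorted(name for name, check in WEAK_SIGNATURES.items()
                     if check(view))
    # "unknown" rather than "good": missing evidence is not proof of benignity.
    verdict = "bad" if len(matched) >= threshold else "unknown"
    return verdict, matched


# Made-up log entries, only to exercise the functions.
api_log = [
    {"api": "ControlService", "target": "ExampleService"},
    {"api": "RegDeleteKeyW", "target": "HKLM\\SOFTWARE\\Example"},
    {"api": "NtReadFile", "target": "C:\\example.txt"},
]
view = enhanced_behavior(api_log)
print(meta_signature(view))
```

The sketch deliberately returns “unknown” instead of “good” when too few weak signatures match, since a quiet run in the sandbox does not prove that a sample is harmless.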
Weaknesses
Malware can detect “Glitches in the Matrix”. When the malware detects it is running in a simulated environment, it can show non-suspicious behavior or just stop running. Detection of this simulated environment is called “Anti-VM” technology (VM = virtual machine) and it’s been common for a few years now (more on that in another post).
Hooking (and its weaknesses)
The core task of the Cuckoo system is to monitor the behavior of suspicious processes. To achieve that, a DLL is injected into the memory of each process to be monitored. The DLL overwrites the entry instructions of selected API functions in the DLLs called by the process, so that each call is first logged and then jumps back to the original functionality.
For more information, see:
https://github.com/jbremer/monitor
Some weaknesses:
A program can inspect its own process space and overwrite the hooks with the original instructions, removing the logging and going stealth.
Or the program can use hooks itself, accidentally overwrite the Cuckoo hooks with its own hooks and crash horribly.
Those are core weaknesses of the hooking method. To cover those scenarios, Cuckoo can now check whether the hooks are still in place and untouched.
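The idea behind such an integrity check can be illustrated with a small simulation. This is a sketch under stated assumptions, not the code of the Cuckoo monitor (which is a native DLL): it assumes a 5-byte relative jump hook on 32-bit x86, models process memory as a Python `bytearray`, and the names `make_hook`, `install_hook` and `hook_intact` as well as the addresses are made up for illustration.

```python
import struct

JMP_REL32 = 0xE9  # x86 opcode for a near relative jump
HOOK_SIZE = 5     # 1 opcode byte + 4-byte signed displacement


def make_hook(func_addr, handler_addr):
    """Return the 5 bytes of a jump from func_addr to handler_addr."""
    # The displacement is relative to the end of the jump instruction.
    displacement = handler_addr - (func_addr + HOOK_SIZE)
    return bytes([JMP_REL32]) + struct.pack("<i", displacement)


def install_hook(memory, func_addr, handler_addr):
    """Overwrite the function entry with a jump; return the original bytes."""
    original = bytes(memory[func_addr:func_addr + HOOK_SIZE])
    memory[func_addr:func_addr + HOOK_SIZE] = make_hook(func_addr, handler_addr)
    return original


def hook_intact(memory, func_addr, handler_addr):
    """Check that the entry bytes are still exactly the hook we installed."""
    current = bytes(memory[func_addr:func_addr + HOOK_SIZE])
    return current == make_hook(func_addr, handler_addr)


# Simulated process memory with a typical 32-bit function prologue
# (mov edi, edi; push ebp; mov ebp, esp) at offset 16.
memory = bytearray(64)
FUNC, HANDLER = 16, 48
memory[FUNC:FUNC + HOOK_SIZE] = b"\x8b\xff\x55\x8b\xec"

original = install_hook(memory, FUNC, HANDLER)
print(hook_intact(memory, FUNC, HANDLER))   # hook freshly installed

# The sample "unhooks" itself by restoring the original instructions.
memory[FUNC:FUNC + HOOK_SIZE] = original
print(hook_intact(memory, FUNC, HANDLER))   # tampering is now visible
```

A check like this only shows that a hook was modified. It cannot tell whether the sample removed it on purpose or overwrote it with its own hook, so both weaknesses described above look the same to it.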
Results from the Weaknesses
The impact of these weaknesses can be reduced, but never to zero. So we have to accept:
It is not possible to flag software as benign just because we did not see any malicious behavior
Always combine behavior classification with other classification technology
How we use it
Cuckoo Sandbox has officially been added to our toolset in the Virus Lab. Suspicious and unknown samples will be scanned by Cuckoo and the results used for classification. We also take the logs to create experimental repair routines or descriptions. We are just beginning to use it and find more use cases for it. For Avira engineers, there are interesting times ahead.
My first virus lab
On http://malwr.com you can find a live Cuckoo system. Sometimes it does not accept new samples for classification due to heavy load, but at least the historical reports will give you a good impression of the information Cuckoo provides. Because Cuckoo Sandbox is open source, you can install it at home. But my advice: Do not play with malware at home if you don’t know exactly what you’re doing.
And remember: Use the Avira Protection Cloud to benefit from Behavior Detection and other cool tools without needing to install them.
The last sentence shows how powerful APC (the Avira Protection Cloud) is.
Reposted from http://blog.avira.com/cuckoo-sandbox-vs-reality-2/