[Original translation, first published on Kafan] Avast: How do we analyze emerging threats? + FileRep and Machine Learning~

Posted on 2016-11-22 10:56:51

Last edited by 月影天心 on 2016-11-22 14:33

Original: Deborah Salmi
Translation: Kafan Forum - 月影天心

Avast Threat Labs (ATL) sample analysis center: analyst Michal Salat analyzing a sample

Our Avast Threat Lab is Grand Central Station for malware. Somewhere between 600,000 and 1 million files come through the detection system every day, and nearly half of them are unknown files. That means that somewhere in the world, someone is being targeted by cybercriminals. Avast Threat Lab analysts like Michal Salat, in the picture above, work to stop those attacks.

CyberCapture’s automated systems do most of the heavy lifting, but when needed, Avast analysts, like Michal, will examine an unknown file and make the final decision.

How does Avast detect malicious files?
Cybercrooks are software developers who create programs meant to steal your information, hold your data for ransom, or crash your machine. They are constantly modifying malicious code to make variants that travel from computer to computer. Avast has a massive database called FileRep that contains more than 5 BILLION of these kinds of files.
Every day, 250,000 Windows executable binary files flow through FileRep and go through a 100-point checklist to determine if the files are safe or not. And every day, about 40,000 files are classified as malicious and are locked in quarantine so they won’t hurt you.
(Translator's note: the "100 points" are attributes. See "Custom-distance function" under "Avast's machine learning technology" below.)

What happens when Avast discovers brand new malware?
Malware authors try every trick in the book to evade detection by antivirus software like Avast. One of those tricks is a shape-shifting technique called server-side polymorphism. The malware code morphs into something unrecognizable from its original form before it attacks another user. The engine that produces these changes stays on the attacker's server, such as a website, and every unique variant originates there. Cybercrooks like this method because it's an efficient, automated way to attack millions of machines with minimal human interaction and maximum impact.
When one of these morphed files shows up on Avast's FileRep doorstep, CyberCapture activates to give our Nitro Update users zero-second protection against attacks.
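
The following is a minimal Python sketch of why server-side polymorphism defeats lookups keyed on exact file identity. It assumes that reputation is keyed by SHA-256 hash. It also fakes the mutation engine by appending random bytes, whereas real engines re-pack or re-encrypt the code. `BASE_PAYLOAD`, `server_side_variant` and `known_bad` are hypothetical names.

```python
import hashlib
import os

# Hypothetical stand-in for a malicious program's bytes.
BASE_PAYLOAD = b"MZ example payload body"


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def server_side_variant(payload: bytes) -> bytes:
    """Mimic a server-side mutation engine: every download gets unique bytes.
    (Assumption: appending random junk is a stand-in for re-packing/re-encrypting.)"""
    return payload + os.urandom(16)


# A reputation store keyed by exact file hash (illustrative only).
known_bad = {sha256(BASE_PAYLOAD)}

variants = [server_side_variant(BASE_PAYLOAD) for _ in range(5)]
hits = [sha256(v) in known_bad for v in variants]
print(hits)  # every lookup misses: each variant carries a brand-new hash
```

Every variant misses the hash lookup even though the payload itself is unchanged. That is the gap the cloud analysis of unknown files is meant to cover.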

Unknown files are shared in real time with the Avast Threat Labs. There they are examined beneath the layers of false code and the "smoke and mirrors" of encryption and obfuscation that malware authors use to mask the malware's true intentions. CyberCapture observes the binary-level commands inside the malware to better understand the instructions hidden there, so that the malware can be neutralized. If necessary, a Threat Lab analyst will analyze the file manually.

Fast detection and protection
We developed CyberCapture to shorten the time between the discovery of new malware and the deployment of a detection that protects our users. Unlike previous versions of Avast, which did this work locally on the user's PC, CyberCapture runs in the cloud, so we can provide a quick first-response defense against future threats.

CyberCapture examines all unknown objects and automatically blocks malicious code before it can launch its first attack. It continually gathers intelligence on new viruses, so it improves organically with use and keeps getting better over time.
When the file is analyzed, Avast’s team updates the user as to whether it is considered “safe” or “dangerous.” With this direct access to Avast’s security experts in the Threat Labs, users benefit from faster response times to emerging threats and a more secure ecosystem overall.
CyberCapture is only available in the Nitro Update to Avast Antivirus, including Avast Free Antivirus Nitro Update.
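
The hold-then-verdict flow described above might look roughly like the following on the client side. This is a hypothetical sketch, not Avast's code. `Verdict`, `handle_unknown_file` and the stub analyzer are invented names, and the real cloud upload, analyst review and user notification are reduced to a single function call.

```python
from enum import Enum
from typing import Callable, Dict


class Verdict(Enum):
    SAFE = "safe"
    DANGEROUS = "dangerous"


def handle_unknown_file(
    file_hash: str,
    cloud_analyze: Callable[[str], Verdict],
    quarantine: Dict[str, str],
) -> str:
    """Hypothetical client flow for a file with no reputation yet:
    keep it from running, ask the cloud for a verdict, then act on it."""
    # While this call is pending, the file stays blocked (never executed).
    verdict = cloud_analyze(file_hash)
    if verdict is Verdict.DANGEROUS:
        quarantine[file_hash] = verdict.value
        return "quarantined"
    return "released"


# Stub standing in for cloud analysis plus Threat Lab analysts.
known_dangerous = {"hash-bad"}


def stub(file_hash: str) -> Verdict:
    return Verdict.DANGEROUS if file_hash in known_dangerous else Verdict.SAFE


vault: Dict[str, str] = {}
print(handle_unknown_file("hash-bad", stub, vault))   # quarantined
print(handle_unknown_file("hash-good", stub, vault))  # released
```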

Note: For a more detailed introduction to CyberCapture, see my other post: "First on Kafan: Avast CyberCapture Technology Fully Explained".

Bonus:
1. The FileRepMalware detection name
Once a file is classified as malicious and our internal systems check that it is safe to detect this particular file worldwide, a simple flag is set in the FileRep service. Every Avast client that encounters that particular file instantly blocks it and reports it as FileRepMalware.
(Translator's note: FileRep is Avast's cloud reputation service. It now runs throughout Avast's protection and scanning, and dates back to Avast! 7. Below is the alert popup from an older version.)
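
The following is a toy model of the "simple flag" mechanism. The service keeps one flag per file hash, and any client that looks up a flagged hash blocks the file and reports it as FileRepMalware. The class, method and hash names are hypothetical. A real service would be a networked lookup, not an in-memory set.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class FileRepService:
    """Toy stand-in for a cloud reputation service: one flag per file hash."""
    malware_flags: Set[str] = field(default_factory=set)

    def set_malware_flag(self, file_hash: str) -> None:
        # Set only after internal checks confirm a worldwide detection is safe.
        self.malware_flags.add(file_hash)

    def is_flagged(self, file_hash: str) -> bool:
        return file_hash in self.malware_flags


def client_check(service: FileRepService, file_hash: str) -> Optional[str]:
    """What every client does: return the detection name to block, or None to allow."""
    return "FileRepMalware" if service.is_flagged(file_hash) else None


service = FileRepService()
service.set_malware_flag("hash-of-sample-A")            # placeholder hash
print(client_check(service, "hash-of-sample-A"))       # FileRepMalware
print(client_check(service, "hash-of-another-file"))   # None
```

Because the verdict is a single flag on the server, flipping it takes effect for every client on its next lookup. No signature update is needed.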

2. Avast's machine learning technology
We use instance-based learning because of its many beneficial properties, including the ease of:
- re-learning the model, which is only a matter of adding a sample to, or removing one from, the correct set
- understanding the reasons for particular decisions
- fine-tuning the false positive rate
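
The first two advantages are easy to see in a minimal instance-based model. "Training" is just storing labelled samples, and re-learning is adding or removing one. Each decision can be explained by pointing at the nearest stored sample. For brevity, this sketch assumes plain Euclidean distance and 1-nearest-neighbour classification. The distance Avast actually uses is the custom function described in the next section. `InstanceModel` is a hypothetical name.

```python
import math
from typing import Dict, Sequence, Tuple


class InstanceModel:
    """Minimal instance-based (1-nearest-neighbour) classifier.
    'Training' is nothing more than storing labelled samples."""

    def __init__(self) -> None:
        self.samples: Dict[str, Tuple[Sequence[float], str]] = {}

    def add(self, sample_id: str, vec: Sequence[float], label: str) -> None:
        # Re-learning: put a sample into the correct set...
        self.samples[sample_id] = (vec, label)

    def remove(self, sample_id: str) -> None:
        # ...or drop one. No retraining pass is needed.
        self.samples.pop(sample_id, None)

    def classify(self, vec: Sequence[float]) -> Tuple[str, str, float]:
        """Return (label, nearest sample id, distance). The nearest sample
        doubles as the human-readable reason for the decision."""
        if not self.samples:
            raise ValueError("model has no samples")
        best_id = min(self.samples, key=lambda sid: math.dist(vec, self.samples[sid][0]))
        best_vec, best_label = self.samples[best_id]
        return best_label, best_id, math.dist(vec, best_vec)


m = InstanceModel()
m.add("clean-1", (0.0, 0.0), "clean")
m.add("mal-1", (5.0, 5.0), "malware")
print(m.classify((4.0, 4.0))[:2])  # ('malware', 'mal-1')
m.remove("mal-1")                  # e.g. the sample turned out to be mislabelled
print(m.classify((4.0, 4.0))[:2])  # ('clean', 'clean-1')
```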

Custom-distance function
Each sample is represented by a constant-sized feature vector consisting of approximately 100 attributes. We keep the exact composition of the feature vector secret, but, for example, obvious candidates such as section table data in the Portable Executable format are included. In general, there are static and dynamic features, categorized as offsets, sizes, checksums, factors, bit flags and generic numbers. Taking into account the nature of the attributes, we ended up with several distance operators and a weighting scheme that equalizes the importance of the attributes. The following table contains a sample of the operators we use.
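
Neither the feature set nor the operator table is given. This Python sketch therefore only illustrates the general idea of per-type distance operators combined by weights. The attribute names (`entry_point_offset`, `size_of_image`, etc.) and the three operators are assumptions chosen to match the categories named above: offsets, sizes, checksums and bit flags. Each operator is scaled to [0, 1] so that, with equal weights, no attribute dominates. That is one plausible reading of "equalizes the importance of the attributes", not Avast's scheme.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


# Hypothetical per-type operators; each returns a value in [0, 1].
def exact_match(a: int, b: int) -> float:
    """Checksums: identical or not."""
    return 0.0 if a == b else 1.0


def relative_diff(a: float, b: float) -> float:
    """Sizes and offsets: absolute difference relative to the larger magnitude."""
    m = max(abs(a), abs(b))
    return 0.0 if m == 0 else abs(a - b) / m


def bit_flag_diff(a: int, b: int, width: int = 32) -> float:
    """Bit flags: fraction of differing bits (normalized Hamming distance)."""
    return bin((a ^ b) & ((1 << width) - 1)).count("1") / width


@dataclass(frozen=True)
class Attribute:
    name: str
    op: Callable[..., float]
    weight: float = 1.0


def weighted_distance(schema: Sequence[Attribute], x: Sequence[float], y: Sequence[float]) -> float:
    """Weighted mean of per-attribute distances; stays in [0, 1]."""
    total = sum(attr.weight for attr in schema)
    return sum(attr.weight * attr.op(xi, yi) for attr, xi, yi in zip(schema, x, y)) / total


# Illustrative 4-attribute schema (the real ~100-attribute vector is secret).
SCHEMA = [
    Attribute("entry_point_offset", relative_diff),
    Attribute("size_of_image", relative_diff),
    Attribute("text_section_checksum", exact_match),
    Attribute("characteristics_flags", bit_flag_diff),
]

pe_a = (0x1000, 40960, 0xDEADBEEF, 0x0102)
pe_b = (0x2000, 40960, 0x12345678, 0x0102)
print(weighted_distance(SCHEMA, pe_a, pe_a))  # 0.0
print(weighted_distance(SCHEMA, pe_a, pe_b))  # 0.375
```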

kNN classifier
The most common approach for instance-based learning is nearest-neighbor classification. To fine-tune our classifier, we built a tool called Pythia, which displays the nearest neighbors of a given query sample. It uses a dimensionality reduction method (NMDS) to display the neighbors in 2D space, and also displays additional metadata for the selected samples. A human can use this information to judge whether malware and clean neighbors can be told apart in the case at hand. The goal was to create a fully autonomous system, which means high precision at the cost of lower recall. After some experimenting, we added a few thresholds: a minimal allowed distance to clean files, a maximal allowed distance to malware files, and a weighting term that shifts the balance between the clean and malware sets.
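
The three thresholds could be combined with a plain kNN vote roughly as follows. This is a sketch under stated assumptions. The threshold values, `k`, the Euclidean distance and the rule that the classifier either says "malware" or abstains are illustrative choices, not Avast's parameters. Abstaining whenever any condition fails is what trades recall for precision.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], bool]  # (feature vector, is_malware)


def knn_verdict(
    query: Sequence[float],
    samples: List[Sample],
    k: int = 3,
    min_clean_dist: float = 1.0,  # illustrative: must be this far from every clean file
    max_mal_dist: float = 0.5,    # illustrative: nearest malware must be this close
    clean_weight: float = 2.0,    # illustrative: each clean neighbour counts double
) -> str:
    """Return "malware" only when every condition holds, else "undecided"."""
    # The k nearest stored samples, as (distance, is_malware) pairs.
    nearest = sorted((math.dist(query, vec), bad) for vec, bad in samples)[:k]
    mal_d = [d for d, bad in nearest if bad]
    clean_d = [d for d, bad in nearest if not bad]

    # Threshold 1: distance to the closest clean file anywhere in the set.
    closest_clean = min((math.dist(query, v) for v, bad in samples if not bad), default=math.inf)
    far_from_clean = closest_clean >= min_clean_dist
    # Threshold 2: the closest malware neighbour must be near enough.
    near_malware = bool(mal_d) and min(mal_d) <= max_mal_dist
    # Weighting term: clean neighbours are over-weighted to cut false positives.
    outvotes_clean = len(mal_d) > clean_weight * len(clean_d)

    if far_from_clean and near_malware and outvotes_clean:
        return "malware"
    return "undecided"  # high precision, lower recall: leave it to other checks


training = [
    ((0.0, 0.0), True), ((0.2, 0.0), True), ((0.0, 0.3), True),
    ((5.0, 5.0), False), ((5.0, 6.0), False),
]
print(knn_verdict((0.1, 0.1), training))  # malware
print(knn_verdict((4.9, 5.1), training))  # undecided (too close to a clean file)
```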

Real-world data
The redundancy in real-world data is significant. Our internal systems handle around 250,000 new PE files every day. Of those, 150,000 can be directly assigned to one of 20,000 clusters using very strict clustering criteria (low threshold distance and complete linkage). Each cluster can then be classified as a whole. That means 130,000 fewer decisions to make. Nor does the total number of clusters grow by 20,000 every day, because clusters overlap between days.
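
Strict clustering with complete linkage means two groups merge only if every pair across them lies within a small threshold distance. The naive pure-Python sketch below shows that criterion and the decision count. A demo this small is fine, but the approach would not scale to 250,000 files a day. There is one verdict per cluster instead of one per file, so 150,000 files in 20,000 clusters need 130,000 fewer decisions. The function names, points and threshold are assumptions for illustration.

```python
import math
from itertools import combinations
from typing import List, Sequence


def complete_linkage_clusters(points: List[Sequence[float]], threshold: float) -> List[List[int]]:
    """Naive agglomerative clustering with complete linkage: two clusters merge
    only if every cross pair lies within `threshold` (a strict criterion)."""
    clusters: List[List[int]] = [[i] for i in range(len(points))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Complete linkage = distance of the farthest cross pair.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Closest pair of clusters under complete linkage.
        (ia, ib), d = min(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break  # no pair is tight enough: stop merging
        clusters[ia].extend(clusters[ib])
        del clusters[ib]  # safe: combinations() yields ia < ib
    return clusters


def decisions_saved(n_files: int, n_clusters: int) -> int:
    # One verdict per cluster instead of one per file.
    return n_files - n_clusters


files = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.05, 5.0), (9.0, 9.0)]
groups = complete_linkage_clusters(files, threshold=0.2)
print(len(groups), decisions_saved(len(files), len(groups)))  # 3 3
print(decisions_saved(150_000, 20_000))                       # 130000
```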

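Note 2: For a setup guide to the latest version, Avast! 12, see my other post: "Troubleshooting Guide V3.1 Based on the Latest AVAST! 12 (image-heavy; everything you want to know is here~)".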

Original translation; if you repost it, please credit Kafan Forum - 月影天心.
Thanks.

Posted on 2016-11-22 11:15:36

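This has actually been around for a long time. It's just that Microsoft and Symantec have been hyping machine learning lately, so now everyone is marketing machine learning.

The most impressive one is Kaspersky. They insist machine learning is a gimmick and claim they already had machine learning over ten years ago (that is, automated analysis). Now they flatly refuse to market machine learning.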

Posted on 2016-11-22 11:20:05

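驭龙 posted on 2016-11-22 11:15
This has actually been around for a long time. It's just that Microsoft and Symantec have been hyping machine learning lately, so now everyone is marketing machine learning.

The most impressive ...

That's just Kaspersky's attitude. A while ago I saw on Eugene's Weibo that a reporter had interviewed him about applying AI to information security. He ended up criticizing the whole idea, saying it was all concept hype.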

Posted on 2016-11-22 11:22:57

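霄栋 posted on 2016-11-22 11:20
That's just Kaspersky's attitude. A while ago ...

They've even taken on MS, so why would they be afraid to criticize AI?

That said, I've always liked Kaspersky's aggressive development and features. It's just a shame it's unstable right now, so I don't dare use it.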

Posted on 2016-11-22 11:24:44

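The technology isn't really that new; it's just a question of marketing.

When all is said and done, the big global vendors all reach the same destination by different routes. They just keep shuffling the same few technologies back and forth.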

Posted on 2016-11-22 11:25:03

Last edited by pal家族 on 2016-11-22 11:26

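驭龙 posted on 2016-11-22 11:15
This has actually been around for a long time. It's just that Microsoft and Symantec have been hyping machine learning lately, so now everyone is marketing machine learning.

The most impressive ...

I'm honestly speechless.

The full database used by version 18 is nearly 40 MB smaller than version 17's. I really don't know what happened.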

Posted on 2016-11-22 11:26:24

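驭龙 posted on 2016-11-22 11:15
This has actually been around for a long time. It's just that Microsoft and Symantec have been hyping machine learning lately, so now everyone is marketing machine learning.

The most impressive ...

Kaspersky's line is: whatever Microsoft promotes, I won't. Machine learning? Is such an old technology even worth bragging about?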

Posted on 2016-11-22 11:28:52

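月影天心 posted on 2016-11-22 11:26
Kaspersky's line is: whatever Microsoft promotes, I won't. Machine learning? Is such an old technology even worth bragging about?

I have no idea why Mr. Kaspersky is like this either.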

Posted on 2016-11-22 11:30:17

Looks like talking a good game matters more than actual strength.

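Posted on 2016-11-22 11:30:34

pal家族 posted on 2016-11-22 11:25
I'm honestly speechless.

The full database used by version 18 is nearly 40 MB smaller than version 17's. I really don't know what happened.

My worry is that in the end Kaspersky will announce that the 2018 version has ML. I think it's probably just a move to the cloud, though; I didn't expect Kaspersky to start going cloud too. Or it might be an adjustment to the signature database.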
