... Representation in a Distributed Environment for Large Scale Malware Processing
https://www.nsec.io/2016/01/distributing-the-reconstruction-of-high-level-intermediate-representation-for-large-scale-malware-analysis/
Malware is widely acknowledged as a major threat, and the number of new samples is growing at an absurd pace. In addition, targeted and so-called advanced malware has become the rule, not the exception.
At Black Hat 2015 in Las Vegas, the researchers presented co-authored work on distributed reverse engineering techniques that use an intermediate representation in a clustered environment. The results demonstrated several uses for this kind of approach, for example finding algorithmic commonalities between malware families. That work generated a rich dataset of metadata from 2 million malware samples.
In this work the authors focus on analyzing that dataset, extended with recent threats, in a different distributed environment, using data mining and machine learning tools to identify similarities in the code and data structures used across the malware samples. To achieve this goal, a higher-level abstraction of the malware code is constructed from the abstract syntax tree (ctree) provided by the Hex-Rays Decompiler. That abstraction facilitates the extraction of characteristics such as object-oriented types, domain generation algorithms (DGAs), and custom encryption.
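To make the idea of comparing samples through an abstracted syntax tree more concrete, here is a minimal sketch in Python. It is not the authors' method: the abstract does not say which similarity measure or features are used. The sketch assumes each function has already been reduced to a flat sequence of node-kind labels (the labels below are hypothetical placeholders, not Hex-Rays identifiers), and it compares two functions with a weighted Jaccard similarity over n-grams of those labels. The helper names `node_ngrams` and `weighted_jaccard` are also illustrative.

```python
from collections import Counter
from typing import Sequence, Tuple


def node_ngrams(node_kinds: Sequence[str], n: int = 3) -> Counter:
    """Count every window of n consecutive node-kind labels."""
    windows: list[Tuple[str, ...]] = [
        tuple(node_kinds[i:i + n]) for i in range(len(node_kinds) - n + 1)
    ]
    return Counter(windows)


def weighted_jaccard(a: Counter, b: Counter) -> float:
    """Weighted Jaccard similarity of two multisets.

    Sum of per-key minimum counts divided by sum of per-key maximum counts.
    Returns 0.0 when both multisets are empty.
    """
    keys = set(a) | set(b)
    shared = sum(min(a[k], b[k]) for k in keys)
    total = sum(max(a[k], b[k]) for k in keys)
    return shared / total if total else 0.0


# Hypothetical label sequences, as if produced by a pre-order walk of two
# decompiled functions. Real input would come from a decompiler plugin.
func_a = ["loop", "assign", "xor", "index", "index", "return"]
func_b = ["loop", "assign", "xor", "index", "return"]

similarity = weighted_jaccard(node_ngrams(func_a), node_ngrams(func_b))
print(f"structural similarity: {similarity:.2f}")
```

Comparing label n-grams rather than raw bytes is one simple way to tolerate differences in register allocation or constants between builds; at the scale of millions of samples, a pairwise comparison like this would normally be replaced by hashing or clustering in the distributed environment.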
As a contribution, the gathered representation, together with all the raw information from the samples, will be made available to other researchers after the presentation, along with additional ideas for future development. The Hex-Rays Decompiler plugin and the analysis and automation tools used to extract the characteristics will also be made available to the audience on GitHub.