This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
Mixed Boolean-arithmetic (MBA) expression [1, 2] is defined as the expression that mixes the usage of bitwise operations (e.g.,
The wide practical applications of MBA obfuscation have attracted research on simplifying MBA expression. Recent studies [10, 11] demonstrate that existing computer algebra software has a very limited effect on MBA simplification. Consequently, multiple methods are proposed to simplify MBA expressions, including bit blasting [12], pattern matching [13], program synthesis [14–16], deep learning [17, 18], and table-based solutions [5, 11]. Among them, table-based solutions are the state-of-the-art MBA simplification method. However, one strong limitation is that the complexity of creating and storing the precomputed table is
In this study, we propose a novel scheme to simplify an MBA expression without any precomputed requirements. The key idea is that a transformation procedure can be used to reduce a bitwise expression to a unified form, and a mathematical proof is provided to guarantee the correctness of the transformation. Then, the arithmetic reduction is smoothly performed to further simplify the expression and generate the final result. We implement the approach as an open-source tool, named MBA-Flatten. To demonstrate the capability of MBA-Flatten, we evaluate it on two comprehensive MBA benchmarks. The evaluation results show that MBA-Flatten outperforms existing tools in terms of more solved MBA expressions. Due to the low-cost arithmetic computation, MBA-Flatten is also an effective MBA simplification tool. In addition, the evaluation demonstrates that MBA-Flatten can assist malware analysis and boost SMT solving on MBA equations.
In summary, this study makes the following key contributions:
(1) We find that a bitwise expression can be transformed into a unified form and provide a mathematical proof to support it. To the best of our knowledge, we are the first to prove the existence of the transformation.
(2) The bitwise expression transformation paves the way for our novel in-place MBA simplification method. Our proposed scheme first replaces the bitwise expressions with the corresponding equivalent form. In this way, arithmetic reduction rules can be seamlessly applied to further produce the simplification result.
(3) We have implemented our idea as an open-source tool, called MBA-Flatten, and evaluated it on two comprehensive MBA benchmarks. The evaluation results demonstrate that MBA-Flatten is a general and effective MBA simplification method.
The remainder of this study is structured as follows. Section 2 shows the background of MBA expression. Section 3 illustrates the proposed scheme that can be used to simplify an MBA expression. The proof of Theorem 1 can be found in Section 4. In Section 5, we describe the experimental evaluation of the proposed approach. Section 6 discusses some limitations of our proposed scheme, and Section 7 concludes this study.
2. Related Work
In this section, we first introduce the background of MBA expression and its wide applications. Then, we discuss the existing research on simplifying MBA expressions, pointing out the limitations, which also serve as a motivation in this study.
2.1. MBA Expression
Zhou et al. [1, 2] propose the concept of mixed Boolean-arithmetic (MBA) expression based on Boolean-arithmetic algebra, which mixes the usage of bitwise operators (e.g., NOT, AND, and OR) and arithmetic operations (e.g.,
Definition 1.
(Zhou [1]). A polynomial MBA expression is of the form:
Definition 2.
(Zhou [1]). A linear MBA expression is a polynomial MBA expression of the form:
Zhou et al. [1] design a generator using truth tables to produce infinite linear MBA equations. Based on existing linear MBA rules, Liu et al. [3] propose several formal methods to generate an unlimited number of polynomial and non-polynomial MBA expressions. Examples of MBA expressions are shown below. In particular, (3) is a linear MBA expression, (4) is a polynomial MBA expression, and (5) is a non-polynomial MBA expression.
Due to its solid theoretical foundation and simplicity of implementation, MBA expression has been applied in multiple academic tools and industrial products to protect software [5–9]. For example, Cloakware, Irdeto, and Quarkslab apply MBA obfuscation in their commercial products [5, 7]. Tigress [6], an academic C source code obfuscator, encodes simple expressions into complex MBA forms. Blazy et al. [8] develop a C program obfuscator, in which formally verified MBA obfuscation rules are integrated. Ma et al. [9] apply MBA expressions to develop a novel dynamic software watermarking scheme. Figure 1 shows how to use MBA expressions to make software obfuscation [4]. Figure 1(a) demonstrates that the expression
[figure(s) omitted; refer to PDF]
2.2. MBA Expression Simplification
The wide practical application of MBA obfuscation has encouraged research on simplifying MBA expressions. Eyrolles’ PhD thesis [10] shows that popular symbol software (Maple, SageMath, Wolfram Mathematica, and Z3 [20]) fails to simplify MBA expressions. The root cause is that existing reduction rules cannot reduce expressions that mix the usage of bitwise and arithmetic operators [11]. Researchers have developed multiple solutions to simplify MBA expressions, including bit blasting [12], pattern matching [13], program synthesis [14–16], and deep learning-based [17, 18]. While promising, these simplification methods are still in their infancy: they either suffer from high-performance penalties, or they produce many false simplification cases.
To effectively reduce MBA expression, researchers investigate the MBA mechanism and propose table-based solutions. Liu et al. [5] prove a two-way feature in the MBA transformation and design a two-variable transformation table to simplify MBA expression. Xu et al. [11] create multiple semantic-preserving transformation tables, which enumerate all bitwise expressions and the corresponding simplified forms. Using these transformation tables, MBA-Solver can effectively simplify an MBA expression.
So far, table-based solutions are the state-of-the-art MBA simplification methods. However, the space complexity of the transformation table is
3. The Proposed Scheme
To reduce MBA expressions, we first present an existing finding: a bitwise expression can be transformed into a unified form. This finding paves the way for our novel in-place MBA simplification scheme, MBA-Flatten.
3.1. Bitwise Expression Transformation
A bitwise expression is denoted as
An instance of the above transformation procedures is shown in Example 1. One interesting observation is that there is a gap between
Example 1.
For a bitwise expression
Moreover, Theorem 1 shows that the gap between a bitwise expression
Theorem 1.
Let
Then,
By this theorem, Example 2 shows that a bitwise expression
Example 2.
For a bitwise expression
The above procedures introduced so far are integrated into Algorithm 1. The algorithm takes a bitwise expression
Algorithm 1: Transformation procedure of a bitwise expression.
(i) Input: a bitwise expression
(ii) Output: the simplification result of
(1) Function BitTrans
(2) Recursively apply the transformation
(3) Replace all
(4)
(5) Return
(6) End function
3.2. Simplifying MBA Expression
As noted above, Algorithm 1 can transform a bitwise expression
We first introduce how to simplify a linear MBA expression. According to Equation (2), a linear MBA expression is essentially a linear combination of bitwise expressions. Using Algorithm 1, the bitwise expressions in (2) are first substituted with the corresponding transformation result. After combining like terms, (2) will be reduced to the following simple form:
Example 3.
For the MBA expression in Figure 1(a), we have
Enlighten by the above simplification procedure, using Algorithm 1, (1) will be transformed to an equivalent form shown as follows:
Example 4.
For the MBA expression in Figure 1(b), we have
For a non-polynomial MBA expression, we notice that it includes multiple sub-expressions obfuscated by polynomial MBA rules. This finding inspires us to use the polynomial MBA simplification procedure to reduce a non-polynomial MBA expression. In particular, we first simplify the inner sub-expression (polynomial MBA expression), and the simplification result of the inner sub-expression is treated as a temporary variable to expose further reduction opportunities. An instance is shown in Example 5. During the simplification procedure, the inner polynomial MBA expressions are reduced to the simplified form, such as
Example 5.
For the non-polynomial MBA expression
3.3. Algorithm and Implementation
The MBA simplification scheme we have described above is illustrated in Algorithm 2. The algorithm takes an MBA expression
Algorithm 2: Simplification procedure of an MBA expression.
(i) Input: an MBA expression
(ii) Output: the simplification result of
(1) Function MBA-Flatten
(2) If
(3) Return PolySim
(4) Else
(5) For inner sub-expression
(6)
(7) Replace
(8) Replace
(9) End for
(10) Replace all
(11) Arithmetic reduction on
(12) Return
(13) End if
(14) End function
(15) Function PolySim
(16) For every bitwise expression
(17)
(18) Replace
(19) End for
(20) Arithmetic reduction on
(21) Return
(22) End function
We implement Algorithm 2 as an open-source tool, named MBA-Flatten. It accepts a complex MBA expression as the input and outputs the corresponding simplification result. An overview of MBA-Flatten’s architecture is shown in Figure 2. The whole framework is written in around 1,800 lines of Python code. The parser and AST traversal components are coded based on the Python AST library. Moreover, we leverage the Python SymPy library for arithmetic reduction.
[figure(s) omitted; refer to PDF]
Inside MBA-Flatten, the main program consists of three major components. First, a parser receives the MBA expression and translates it to abstract syntax tree (AST) for the remaining process. Then, MBA-Flatten reduces the expression to a concise form. For polynomial MBA expression, the program uses the transformation procedure to reduce a bitwise expression, and a math reduction module is adopted to further simplify the expression. The math reduction module also includes the optimization function to generate an optimal result for some expressions; e.g.,
4. Proof of Theorem 1
To prove Theorem 1, we first present that the transformation
Definition 3.
Suppose two MBA expressions
The maps in Equation (7) are identical in one-bit space. In other words, the bitwise expression
Proposition 1 shows that the transformation
Proposition 1.
Let
Proof.
Example 6.
For the bitwise expressions
Thus,
Next, we present the concept of the signature vector shown as follows. The signature vector of a linear MBA expression is a vector with
Definition 4.
(Xu [11]). Let
Table 1 shows the truth table of multiple 2-variable bitwise expressions, and the column with all “1” is encoded as “−1” [1, 11]. Using Table 1, Example 7 presents the procedure of calculating the signature vector for expression
Table 1
Truth table of multiple bitwise expressions with 2 variables.
−1 | ||||||
0 | 0 | 1 | 1 | 0 | 0 | 1 |
0 | 1 | 0 | 1 | 0 | 1 | 1 |
1 | 0 | 1 | 0 | 0 | 1 | 1 |
1 | 1 | 0 | 0 | 1 | 1 | 1 |
Example 7.
For a linear MBA expression
Then, we introduce the following lemma.
Lemma 1.
(Xu [11]). Given two linear MBA expressions
Using Proposition 1 and Lemma 1, Theorem 1 can be proved as below.
Proof.
Let
We prove
Base step: the basis is the bitwise expression
Case 1.
Suppose
If
If
Therefore,
Case 2.
Suppose
Case 3.
Suppose
If
If
If
If
Therefore,
Case 4.
Suppose
The above four cases led to
Induction step: assume
Assume
Case 5.
Suppose
Then,
According to (27), we get
If
If
Therefore,
Case 6.
Suppose
Then,
If
If
Therefore,
Case 7.
Suppose
Then,
According to (27) and (31), we get
If
If
Therefore,
Case 8.
Suppose
The above four cases led to
Assume
As discussed above, the induction is completed. Thus, we have
5. Experimental Results
In this section, a set of experiments are conducted to evaluate the MBA simplification scheme, MBA-Flatten. We first run MBA-Flatten and existing peer tools on two comprehensive MBA benchmarks. Z3 SMT solver [19] is used to check whether the simplified result is equivalent to the original MBA expression. The corresponding simplification results are discussed in Section 5.2–5.4. As reported in Section 5.5 and 5.6, MBA-Flatten can assist humans in analyzing software. At last, Section 5.7 studies MBA-Flatten’s performance data, such as running time and memory footprint.
5.1. Experimental Setup
5.1.1. Peer Tools for Comparison
We collect and check existing state-of-the-art MBA simplification tools: MBA-Blast [5] and MBA-Solver [11]. MBA-Blast is a Python tool for simplifying MBA expressions via a two-variable transformation table. MBA-Solver produces multiple precomputed transformation tables, which enumerate all bitwise expressions and corresponding concise forms. Then, MBA-Solver uses these tables to simplify an MBA expression. For a more thorough evaluation, we also check other MBA simplification tools: GraphMR [18], SSPAM [13], and Syntia [14]. GraphMR is a neural network-based solution to reduce an MBA expression. SSPAM (symbolic simplification with pattern matching) is a pattern matching method that detects and reduces MBA expressions by multiple known MBA rules. Syntia is a program synthesis framework for approximating the semantics of expressions. It uses a set of input-output samples from the expression, learns the semantics of the samples, and synthesizes a simpler expression that is equal to the original expression.
5.1.2. Benchmarks
To fully expose the capability of diverse methods on simplifying MBA expressions, a large scale of MBA expressions is required for evaluation. Therefore, we consider two comprehensive MBA benchmarks: Dataset 1 [14] and Dataset 2 [11]. Dataset 1 comprises 500 MBA samples generated by Tigress [6] with up to three variables. Dataset 2 collects 3,000 MBA equations with up to four variables, which contains 2,000 polynomial MBA (1,000 linear MBA) and 1,000 non-polynomial MBA expressions. Every sample in datasets is a 2-tuple: (
Table 2
MBA samples in the benchmarks.
5.1.3. MBA Complexity Metrics
We use the following metrics to measure MBA complexity: number of DAG nodes and MBA alternation. For example, the expression
(1) Number of DAG Nodes. An MBA expression is transformed into a directed acyclic graph (DAG) representation in which the nodes are operators, variables, and constants. The number of nodes in the DAG is defined as a complexity metric for an MBA expression.
(2) MBA Alternation. The MBA complexity mainly comes from mixing bitwise operations and arithmetic operations. We adopt “MBA alternation” to measure the number of edges linking different types of operations in the DAG representation of an MBA expression.
[figure(s) omitted; refer to PDF]
5.1.4. Machine Configuration
All of our experiments are performed on a server with Intel Core i9 3.00 GHz CPU, 64 GB DDR4 RAM, 2 TB SSD Hard Drive, and running Ubuntu 20.04 OS.
5.2. Simplification on Dataset 1
In the first experiment, we run MBA-Flatten and other peer tools on Dataset 1. The evaluation result in Table 3 shows that only MBA-Flatten successfully produces verifiable simplification outputs for all MBA expressions with negligible overhead (within 0.1 seconds).
Table 3
Simplification results using Dataset 1.
Tools | ✓ | ✗ | Timeout | Ratio (%) | Average time (s) |
GraphMR | 137 | 363 | 0 | 27.4 | 0.01 |
SSPAM | 332 | 168 | 0 | 66.4 | 1.45 |
Syntia | 369 | 131 | 0 | 73.8 | 38.5 |
MBA-Blast | 416 | 0 | 84 | 83.2 | 0.02 |
MBA-Solver | 454 | 0 | 46 | 90.8 | 0.02 |
MBA-Flatten | 500 | 0 | 0 | 100.0 | 0.02 |
We first study the correctness that means an expression before and after simplification is semantically equivalent. Z3 solver [19] is adopted to check whether the output of a simplification tool is equivalent to the input. The solver may not return the solving result due to the MBA’s complexity, so we set 1 hour as a practical threshold for this and the following experiments.
Table 3 presents the number of MBA expressions that can be reduced by simplification tools. GraphMR is trained on the linear MBA dataset, so it can only simplify 137 of 500 MBA expressions. SSPAM outputs 168 wrong simplification results because of the limited number of MBA rules in the pattern library. Syntia uses stochastic program synthesis to generate a simple expression, which successfully synthesizes 369 simplification results. MBA-Blast performs well on simplifying 2-variable MBA expressions rather than three or more variables, and therefore, it generates 416 simplification results. MBA-Solver can successfully simplify the majority of the MBA expressions (454 of 500), but it cannot process several special cases, e.g., the non-polynomial MBA expression including sub-expression
Next, we investigate the effectiveness that reflects how much complexity is reduced by the simplification methods. Table 4 reports the expression complexity before and after simplification. Two quantitative metrics are used to measure expression complexity: the number of DAG nodes and MBA alternation. Table 4 shows that all simplification tools (except SSPAM) can considerably reduce the complexity measurement of the solved MBA expressions. SSPAM cannot effectively reduce a complex MBA expression to a simpler form due to the limited known MBA rules used in the software.
Table 4
Complexity metrics of the results on correctly simplified samples in Dataset 1 before and after simplification.
Tools | Average # of nodes | Average MBA alternation | ||||
Before | After | A/B (%) | Before | After | A/B (%) | |
GraphMR | 9.7 | 3.6 | 37.1 | 3.6 | 0.2 | 5.6 |
SSPAM | 6.2 | 4.0 | 64.5 | 2.4 | 1.5 | 62.5 |
Syntia | 9.4 | 3.4 | 36.2 | 3.7 | 0.2 | 5.4 |
MBA-Blast | 9.0 | 3.5 | 38.9 | 3.3 | 0.2 | 6.1 |
MBA-Solver | 9.6 | 3.6 | 37.5 | 3.6 | 0.2 | 5.6 |
MBA-Flatten | 9.8 | 3.7 | 37.8 | 3.7 | 0.2 | 5.4 |
5.3. Simplification on Dataset 2
As the second experiment, we run MBA-Flatten and other baseline tools on Dataset 2. As shown in Table 5, MBA-Flatten can successfully simplify 2,943 of 3,000 MBA expressions, and its average processing time is less than 0.2 seconds.
Table 5
Simplification results using Dataset 2.
Tools | ✓ | ✗ | Timeout | Ratio (%) | Average time (s) |
GraphMR | 379 | 2621 | 0 | 12.6 | 0.01 |
SSPAM | 705 | 320 | 1975 | 23.5 | 143.1 |
Syntia | 437 | 2563 | 0 | 14.6 | 38.9 |
MBA-Blast | 1763 | 0 | 1237 | 58.8 | 0.05 |
MBA-Solver | 2899 | 0 | 101 | 96.7 | 0.10 |
MBA-Flatten | 2943 | 0 | 57 | 98.1 | 0.16 |
Considering the MBA expression in Dataset 2 is more complex and diverse than the one in Dataset 1, this experiment exposes more detailed findings. GraphMR and Syntia have limited effect on simplifying complex MBA expression, which can only correctly simplify less than 450 MBA samples. SSPAM cannot generate a simpler expression, so nearly 2/3 (1,975 of 3,000) of the simplified results cannot be checked by the Z3 solver within the time threshold. Compared with MBA-Blast (1,763 simplified samples), MBA-Solver can reduce more MBA expressions with three or four variables, and it successfully simplifies 2,899 MBA samples. MBA-Flatten can reduce 2,943 MBA samples, but it fails to simplify several special cases. One exception is the non-polynomial MBA expression
Table 6
Complexity metrics of the results on correctly simplified samples in Dataset 2 before and after simplification.
Tools | Average # of nodes | Average MBA alternation | ||||
Before | After | A/B (%) | Before | After | A/B (%) | |
GraphMR | 32.1 | 6.9 | 21.5 | 6.8 | 0.9 | 13.2 |
SSPAM | 35.3 | 30.4 | 86.1 | 7.5 | 6.5 | 86.7 |
Syntia | 30.9 | 5.4 | 17.5 | 6.2 | 0.7 | 11.3 |
MBA-Blast | 37.4 | 10.4 | 27.9 | 11.6 | 1.5 | 12.9 |
MBA-Solver | 45.4 | 13.2 | 29.1 | 12.2 | 2.1 | 17.2 |
MBA-Flatten | 45.3 | 11.2 | 24.7 | 12.3 | 1.5 | 12.2 |
Furthermore, we compare the average solving time of simplification tools on the two benchmarks. From Tables 3 and 5, the simplification time of GraphMR and Syntia is almost not increased, but SSPAM takes much more time when it simplifies a more complex MBA expression. MBA-Blast takes less than 0.1 seconds to simplify a two-variable MBA expression. Compared with MBA-Solver, MBA-Flatten takes slightly more time to simplify an MBA expression. The main reason is that MBA-Solver directly gets the bitwise expression simplification results from the transformation tables, rather than reduces it by multiple simplification procedures.
5.4. Case Study
The evaluation results in Tables 3 and 5 show that MBA-Solver and MBA-Flatten are the most powerful MBA simplification tools. Throughout this case study, we demonstrate the strengths and weaknesses between MBA-Solver and MBA-Flatten.
We manually check the MBA expressions solved by MBA-Solver or MBA-Flatten, and one interesting observation is that MBA-Flatten can reduce all polynomial MBA expressions in the datasets, as MBA-Solver does. Does this scenario mean that MBA-Solver and MBA-Flatten can be substituted for each other? The answer is relevant to the number of variables in an MBA expression: as described in Section 2.2, MBA-Solver can successfully simplify a polynomial MBA expression with up to four variables; compared with MBA-Flatten, MBA-Solver is more efficient when it reduces an MBA expression. However, MBA-Flatten can simplify a polynomial MBA expression with an arbitrary number of variables, and the form of simplification result is shown in Equation (15). The following example shows how to apply MBA-Flatten to simplify Equation (6), which is an MBA expression with five variables.
Example 8.
For Equation (6), we have
The other observation is that MBA-Flatten can simplify all non-polynomial MBA expressions solved by MBA-Solver, but not vice versa. It is because that MBA-Solver treats the common sub-expression as an intermediate variable, rather than a sub-expression itself. Therefore, MBA-Flatten can simplify more special cases that cannot be simplified by MBA-Solver. Moreover, MBA-Flatten can reduce a non-polynomial MBA expression with five or more variables.
From the description above, MBA-Flatten is a general MBA simplification method.
5.5. MBA-Powered Malware Deobfuscation
MBA expression is always used to obfuscate code, so malware developer also adopts the MBA expression to complicate the program. Liu et al. [5] report that MBA expressions are used in a ransomware sample to protect the encryption key, and they also observe that MBA rules are integrated into the software obfuscator VMProtect, which is widely used by malware developers.
In this experiment, we demonstrate that MBA-Flatten can assist in reverse-engineering the malware obfuscated by MBA expressions. We collect all MBA expressions used in malware from existing work [5]. Then, MBA-Flatten is applied to simplify the expressions, and the Z3 solver is used to check the correctness of the simplified result. The evaluation result shows that MBA-Flatten can successfully simplify all MBA expressions collected from existing malware samples. One simplification procedure is shown as follows, and MBA-Flatten produces the final result
Furthermore, we replace the MBA expressions used in malware with new MBA expressions involving five or more variables and produce 130 variants, such as the above expression
5.6. Boosting SMT Solving MBA Equations
Satisfiability modulo theory (SMT) solvers have been widely applied in diverse software engineering areas, such as software analysis [21, 22], symbolic execution [23, 24], and test generation [25]. Existing work [10, 11] has presented that SMT solvers are hard to solve MBA equations. However, the MBA simplification method, MBA-Solver, can be used to boost the SMT solver’s performance on solving MBA equations (11).
In this experiment, we report that MBA-Flatten (denoted as MF) can assist SMT solvers in solving MBA equations. We consider the benchmark from work [11] and test three popular SMT solvers: Boolector [26], STP [27], and Z3 [20]. The benchmark is actually considered as Dataset 2 in this study, and MBA-Solver (denoted as MS) is considered as the baseline. MBA-Flatten and MBA-Solver are used to simplify all MBA equations in the benchmark, and then, the simplification results are output to the three SMT solvers.
The evaluation result is shown in Table 7, and the solving time threshold is set as 1 hour. Before simplification, all three SMT solvers can only solve a small portion (Boolector 496 (16.5%), STP 98 (3.3%), Z3 84 (2.8%)) of the MBA equations within the time threshold, but after simplification, all three solvers can solve over 96% of MBA equations. Compared with MBA-Solver, all SMT solvers can solve more MBA equations after MBA-Flatten’s simplification. This is because MBA-Flatten can successfully simplify more MBA expressions than MBA-solver, as shown in Table 5. After MBA-Flatten’s simplification, all SMT solvers can solve 2,943 of 3,000 MBA equations, which means that the distinction between solvers’ performance on solving MBA expressions becomes insignificant. These results indicate that MBA-Flatten is a generic method to boost SMT solver’s performance on solving MBA expressions.
Table 7
Experiment result of SMT solving on Dataset 2.
Boolector | STP | Z3 | |||||||
Before | MS | MF | Before | MS | MF | Before | MS | MF | |
Polynomial | 468 | 2000 | 2000 | 70 | 2000 | 2000 | 56 | 2000 | 2000 |
Non-polynomial | 28 | 899 | 943 | 28 | 899 | 943 | 28 | 899 | 943 |
Total | 496 | 2899 | 2943 | 98 | 2899 | 2943 | 84 | 2899 | 2943 |
5.7. Performance
This section reports MBA-Flatten’s performance data. Table 8 shows the time and memory cost when MBA-Flatten processes an MBA expression with different complexity measured by the number of nodes. For every complexity measurement, 100 different MBA expressions are generated to do the test. As some of the timings are small, we repeat every test 100 times. MBA-Flatten is effective because it only performs low-cost arithmetic computation. Our implementation adopts the Python SymPy library to efficiently perform the arithmetic reduction. Overall, MBA-Flatten is an effective tool for simplifying MBA expressions.
Table 8
MBA-Flatten’s performance on MBA expressions with different complexity.
# of nodes | Time (s) | Memory (MB) |
10 | 0.02 | 0.2 |
50 | 0.18 | 1.1 |
100 | 0.53 | 4.3 |
150 | 0.91 | 7.6 |
6. Discussion
MBA-Flatten has demonstrated the feasibility of automatically reducing MBA expressions. However, we also note some potential enhancements for future improvement.
As introduced in Section 5.3, MBA-Flatten cannot simplify the non-polynomial MBA expression
It is possible that an adversary attacks MBA-Flatten by combining MBA obfuscation with other obfuscation techniques to generate an expression that does not satisfy the MBA definition in this study. Note that MBA-Flatten is designed for simplifying MBA expressions, so it may correctly handle the certain MBA sub-expression, but cannot solve the remaining non-MBA part. It is interesting to further investigate whether MBA-Flatten can interact with other analysis techniques (e.g., symbolic execution) to produce a better result.
7. Conclusion
Existing work performs well on simplifying MBA expression with very few variables. However, the state-of-the-art methods are hard to simplify a multivariable MBA expression. We investigate it and address this challenge using an in-place simplification method. A transformation procedure is proposed to transform a bitwise expression into a unified form, and we provide a mathematical proof to guarantee the correctness of this transformation. Then, the arithmetic reduction is used to further simplify the expression and produce a simplified result. Our large-scale experiments show that MBA-Flatten is a general and effective MBA simplification method. Furthermore, developing MBA-Flatten not only advances automated malware analysis but also boosts SMT solving on the MBA equations.
Acknowledgments
The authors would like to thank team members from Anhui Province Key Laboratory of High Performance Computing for their valuable suggestions. The authors appreciate Jingyao Ke for proofreading the paper. This work was supported by the Core Electronic Devices, High-End Generic Chips and Basic Software of National Science and Technology Major Projects of China, under Grant no. 2012ZX01034-001-001.
[1] Y. Zhou, A. Main, Y. X. Gu, H. Johnson, "Information Hiding in Software with Mixed Boolean-Arithmetic Transforms," Proceedings of the International Workshop on Information Security Applications, pp. 61-75, .
[2] Y. Zhou, A. Main, "Diversity via Code Transformations: A Solution for NGNA Renewable Security," Proceedings of the NCTA-The National Show, .
[3] B. Liu, W. Feng, Q. Zheng, J. Li, D. Xu, "Software Obfuscation with Non-linear Mixed Boolean-Arithmetic Expressions," Proceedings of the International Conference on Information and Communications Security, pp. 276-292, .
[4] S. Schrittwieser, S. Katzenbeisser, J. Kinder, G. Merzdovnik, E. Weippl, "Protecting software through obfuscation: can it keep pace with progress in code analysis?," ACM Computing Surveys, vol. 49 no. 1,DOI: 10.1145/2886012, 2016.
[5] B. Liu, J. Shen, J. Ming, Q. Zheng, J. Li, D. Xu, "MBA-blast: unveiling and simplifying mixed boolean-arithmetic obfuscation," Proceedings of the 30th USENIX Security Symposium, pp. 1701-1718, .
[6] C. Collberg, S. Martin, J. Myers, J. Nagra, "Distributed application tamper detection via continuous software updates," Proceedings of the 28th Annual Computer Security Applications Conference, pp. 319-328, .
[7] C. Liem, Y. X. Gu, H. Johnson, "A compiler-based infrastructure for software-protection," Proceedings of the Third ACM SIGPLAN Workshop on Programming Languages and Analysis for Security, pp. 33-44, .
[8] S. Blazy, R. Hutin, "Formal verification of a program obfuscation based on mixed boolean-arithmetic expressions," Proceedings of the 8th ACM SIGPLAN International Conference on Certified Programs and Proofs, pp. 196-208, .
[9] H. Ma, C. Jia, S. Li, W. Zheng, D. Wu, "Xmark: dynamic software watermarking using Collatz conjecture," IEEE Transactions on Information Forensics and Security, vol. 14 no. 11, pp. 2859-2874, DOI: 10.1109/tifs.2019.2908071, 2019.
[10] N. Eyrolles, Obfuscation with Mixed Boolean-Arithmetic Expressions: Reconstruction, Analysis and Simplification Tools, 2017.
[11] D. Xu, B. Liu, W. Feng, J. Ming, Q. Zheng, J. Li, Q. Yu, "Boosting SMT solver performance on mixed-bitwise-arithmetic expressions," Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, pp. 651-664, .
[12] A. Guinet, N. Eyrolles, M. Videau, "Arybo: manipulation, canonicalization and identification of mixed boolean-arithmetic symbolic expressions," Proceedings of the GreHack, .
[13] N. Eyrolles, L. Goubin, M. Videau, "Defeating mba-based obfuscation," Proceedings of the 2016 ACM Workshop on Software PROtection, pp. 27-38, .
[14] T. Blazytko, M. Contag, C. Aschermann, T. Holz, "Syntia: synthesizing the semantics of obfuscated code," Proceedings of the 6th USENIX Security Symposium, pp. 643-659, .
[15] R. David, L. Coniglio, M. Ceccato, "Qsynth-a Program Synthesis Based Approach for Binary Code Deobfuscation," Proceedings of the BAR 2020 Workshop, .
[16] G. Menguy, S. Bardin, R. Bonichon, C. D. S. Lima, "Search-based local black-box deobfuscation: understand, improve and mitigate," Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, pp. 2513-2525, .
[17] W. Feng, B. Liu, D. Xu, Q. Zheng, Y. Xu, "Neureduce: reducing mixed boolean-arithmetic expressions by recurrent neural network," Proceedings of the Findings of the Association for Computational Linguistics: Empirical Methods in Natural Language Processing, pp. 635-644, .
[18] W. Feng, B. Liu, D. Xu, Q. Zheng, Y. Xu, "GraphMR: Graph neural network for mathematical reasoning," Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 3395-3404, .
[19] C. Collberg, C. Thomborson, D. Low, "Manufacturing cheap, resilient, and stealthy opaque constructs," Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 184-196, .
[20] L. D. Moura, N. Bjørner, "Z3: An Efficient SMT Solver," Proceedings of the International Conference on Tools And Algorithms For the Construction And Analysis Of Systems, pp. 337-340, .
[21] J. Chen, F. He, "Control flow-guided SMT solving for program verification," Proceedings of the 33rd IEEE/ACM International Conference on Automated Software Engineering, pp. 351-361, .
[22] L. Cordeiro, B. Fischer, "Verifying multi-threaded software using SMT-based context-bounded model checking," Proceedings of the 33rd International Conference on Software Engineering, pp. 331-340, .
[23] C. Cadar, D. Dunbar, D. R. Engler, "Klee: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs," Proceedings of the 8th USENIX Symposium On Operating Systems Design And Implementation, pp. 209-224, .
[24] X. Li, Y. Liang, H. Qian, Y.-Q. Hu, L. Bu, Y. Yu, X. Chen, X. Li, "Symbolic execution of complex program driven by machine learning based constraint solving," Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, pp. 554-559, .
[25] E. Bounimova, P. Godefroid, D. Molnar, "Billions and billions of constraints: whitebox fuzz testing in production," Proceedings of the 35th International Conference on Software Engineering, pp. 122-131, .
[26] R. Brummayer, A. Biere, "Boolector: an efficient SMT solver for bit-vectors and arrays," Proceedings of the International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pp. 174-177, .
[27] V. Ganesh, D. L. Dill, "A decision procedure for bit-vectors and arrays," Proceedings of the International Conference on Computer Aided Verification, pp. 519-531, .
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
Copyright © 2022 Binbin Liu et al. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. https://creativecommons.org/licenses/by/4.0/
Abstract
Mixed Boolean-arithmetic (MBA) expression, which involves both bitwise operations (e.g., NOT, AND, and OR) and arithmetic operations (e.g.,
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer