Abstract
Traditional software defenses are reactive: they respond only after an attack has been discovered. Defenders in this situation are comparatively passive, because attackers can probe for vulnerabilities and bugs in many different ways while the software itself remains static. This creates an imbalance between offense and defense. Software diversity mitigates these threats by building a heterogeneous software system. N-Variant eXecution (NVX) systems, an effective and practical runtime diversification method, run multiple variants of a program to improve software security. Higher diversity among the variants can leave fewer vulnerabilities for attacks to exploit. However, runtime diversification methods such as address randomization and reverse stack provide only limited diversity. We therefore enhance the diversity of the variants with a compiler-assisted approach. We use a Deep Reinforcement Learning-based algorithm to generate the variants, ensuring high diversity in the system. For different numbers of variants, we report the results of the Deep Q-Network (DQN) algorithm under different parameter settings.