Why I Write Malware and You Should Too

Yes, you read the headline right, and no, it's not really clickbait. If you're in information security, whether on the blue team, the red team or somewhere in between, I think you should give malware development a try.

Here is why.

Reverse-Reverse Engineering

I started a series on my YouTube channel about two years ago that I called Reverse-Reverse Engineering. The series has since turned into the staple of my channel and gets a fair bit of viewership, considering the niche nature of the content. It set me off on a brilliantly fun journey of learning how malware works by building it myself.

I had done a fair bit of reverse engineering before I dove into the series. I have the GREM certification showing that, on paper, I know a fair bit about reverse engineering malware. I can struggle my way through enough assembly to get a gist of what malware does and how it does it, and I was (and still am) a fairly mediocre dynamic reverser.

I constantly found myself wanting more, though.

I felt like there wasn't much more I could glean from just staring at assembly until my eyes bled. I felt like I wanted to get into the developer's head, to know more about the process and to understand, from their perspective, why some design decisions were made.

I had written some silly little Python RATs before, going so far as to implement keyloggers and such, but I hadn't gone so far as to build a fully featured, multi-stage, modular and modern piece of malware. Moreover, I'd never written one in C/C++ and wanted to give that a try.

So, I dove into what would become a hell of a journey. One of the pivotal moments happened recently, when I realized that throughout the entire series up to that point, I had been script-kiddying my way through development without really understanding what I was writing. I stopped, read the better part of Practical Malware Analysis and C++ for Dummies, and came back with a blank IDE and a new understanding of what I was working with.

With this mid-point pivot, I thought more about the philosophy of Reverse-Reverse Engineering: developing projects for the understanding and knowledge that they bring to the reverse engineering process. Why did I start this journey to begin with? Why is this a particularly good way of learning?

The philosophy of Reverse-Reverse Engineering

First off, I'll be honest. One of the larger reasons I decided to start writing malware is that I thought it would be fun. Turns out, I was right. Just like all things software development, it came with its fair share of frustrations... especially since I am coding it in The Devil's Language. Banging my head against the screen dealing with a strictly typed language when I've been writing almost exclusively in Python and JavaScript for the better part of a decade was... rough.

That being said, I think the philosophy of developing something for its own sake has its own merit. It's the Maker Philosophy: you don't necessarily need a reason to write code or build something. You might just do it because it's cool and fun, or to gain knowledge about the thing itself. So really, my reasoning for Reverse-Reverse Engineering is twofold: to have fun and to learn.

The knowledge I've gained has shown me the merits of the RRE Philosophy. I now understand the Windows API far better than I did before. I understand what goes into writing the malware that I see on a daily basis in my day job. I understand the frustrations, a bit more about the design decisions and methodologies, and honestly I've gained just a bit of empathy for the "bad guys" on the other side of the screen.

Malware development is hard!

Writing something good, something that works, something that's reasonably error-resilient, that's hard in all software development... writing it to specifically run on a system that is inherently adversarial to the software itself is a different animal. As of the time of writing, I still haven't gotten to the part where I talk about AV evasion and such (though I plan to soon!) but I can absolutely understand how frustrating that has to be, to spend all of that time and effort to get a working piece of software only for the AV to find and mitigate it almost immediately and send you back to the drawing board.

Really, to me, RRE is all about tackling a difficult problem (malware analysis) by analyzing it from multiple perspectives, across different layers of abstraction. If you're making a video game, for example, you can pretty easily make one with comparatively little effort in a popular game engine like Unity or Unreal. You can also do it like Casey Muratori, the person behind Handmade Hero, who is creating a video game from complete scratch, with no engine and very few third-party add-ons. Neither of these approaches is necessarily bad, but you can learn about the actual technology behind the creation of video games, from physics engines to graphics and special effects, to an unbelievable level of depth by writing it from scratch. So much of the abstraction in game engines takes you further and further away from the mathematics and complexity underlying what happens on the screen. Which is to say, if you don't really care about those things, it's perfectly fine to use an engine.

Yet another layer of abstraction would be learning about malware through public reverse engineering blogs and articles. There's nothing wrong with doing this. Most of threat intelligence is digesting third-party reporting anyway. There are indeed some analytical worries in trusting third-party reporting without validating it, but as a method of learning reverse engineering, it's just another layer of abstraction. You're reading the reporting of someone who read the assembly.

For me, I'd experienced malware from the "assembly level of abstraction" and the "external reporting level of abstraction" plenty. I'd run it in labs and VMs and had reversed it at a low level, and I spend most of my day reading other people's reporting. I wanted to experience it from a higher level of abstraction, by writing the code myself and viewing the malware through the lens of C/C++ code.

Now that I've done this for several months, I've got some lessons learned to share...

The Windows API is a F&?!ing mess

Why are there thirty different versions of a given function? Why is the type system so incredibly weird, and why does there need to be a new API function for every type? Why is the design of the API so damn convoluted? Why do there need to be a billion different structs for very simple data types? Why is there a typedef for literally fucking everything?

How is the world's most used operating system such an abysmal mess?

Learning to use the Windows operating system is like being force-fed concrete spraying out of a fire hose through a series of interconnected bendy straws. It's far slower than it needs to be, there are way too many bottlenecks and by the time you feel like you've made progress, your code is so bloated with heavy functions and structs that you're weighted to the ground trying to refactor and figure out your own code.

I've spent more time trying to figure out the naming conventions of Windows API functions and structs and typedefs than I'd care to admit. It's obnoxious, bloated and makes me want to cry for all of the people whose job it is to actually write legitimate software using the Windows API.
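
To make the "thirty versions of a function" gripe a bit more concrete: a lot of Windows API functions that take strings come as a pair, one with an A suffix for narrow strings and one with a W suffix for wide strings (MessageBoxA and MessageBoxW, for example), and the un-suffixed name is a macro that picks one depending on whether UNICODE is defined. Below is a rough sketch of that pattern. Everything named Demo or DEMO is made up for illustration, and the two typedefs are stand-ins I declared myself so the snippet builds without windows.h.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Stand-ins that mimic the flavor of Windows typedefs. On Windows these
// names come from <windows.h>; they are redeclared here (an assumption
// for illustration) so the sketch compiles on any platform.
typedef const char*    LPCSTR;   // pointer to a constant narrow string
typedef const wchar_t* LPCWSTR;  // pointer to a constant wide string

// Two versions of one hypothetical function, one per string type.
std::size_t DemoStringLengthA(LPCSTR text)  { return std::strlen(text); }
std::size_t DemoStringLengthW(LPCWSTR text) { return std::wcslen(text); }

// The un-suffixed name is just a macro that selects a version at compile
// time. DEMO_TEXT plays the role that the TEXT() macro plays on Windows.
#ifdef UNICODE
#define DemoStringLength DemoStringLengthW
#define DEMO_TEXT(s) L##s
#else
#define DemoStringLength DemoStringLengthA
#define DEMO_TEXT(s) s
#endif
```

Calling DemoStringLength(DEMO_TEXT("abc")) compiles to either version depending on the build settings, which is a big part of why the same function name can show up under two different names in an import table.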

It all boils back down to C

I was watching a series by Thomas Randall on a game that he was developing from scratch, and he said he started several layers of abstraction above where he was now and that it all essentially boiled back down to C. C is the last layer of abstraction before the hardware (well, aside from Assembly, but I don't see many people, if any, writing full games and graphics software in Assembly), and I've found that eventually it all does fall back to C.

I was writing some of my code within the context of injecting into a process and found that the standard library and a ton of other functionality was unavailable. Once you strip out the standard library and a lot of other third-party libraries, you're functionally not writing C++ anymore. You're just writing C. So leaning on C++ libraries as a crutch can be dangerous in contexts like these.
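
Here is a small sketch of what "just writing C" ends up looking like when the library isn't there: you re-implement the basics by hand. The rre_ names are hypothetical, this is not code from the series, and it only pulls in cstddef for the size_t type without calling any library function.

```cpp
#include <cstddef>  // only for std::size_t; no library functions are called

// Hand-rolled stand-in for strlen: count bytes up to the terminator.
std::size_t rre_strlen(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

// Hand-rolled stand-in for memcpy. The two regions must not overlap.
void* rre_memcpy(void* dst, const void* src, std::size_t count) {
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i < count; ++i) d[i] = s[i];
    return dst;
}

// Hand-rolled stand-in for strcmp: returns 0 when the strings are equal,
// a negative value when a sorts before b, and a positive value otherwise.
int rre_strcmp(const char* a, const char* b) {
    while (*a != '\0' && *a == *b) { ++a; ++b; }
    return static_cast<unsigned char>(*a) - static_cast<unsigned char>(*b);
}
```

One caveat worth hedging: some optimizing compilers can recognize loops like these and quietly swap in a call to the library routine anyway, so build flags matter as much as the source does.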

There are a million ways to skin an OS

The phrase "there are a thousand ways to skin a cat" applies very much to the Windows operating system. If you read modern malware reverse engineering articles, you'll find that there are a ton of different ways to do any given task in the kill chain. Want to establish persistence? There are thirty registry keys for that, a couple of easy methods of DLL search order hijacking and a couple of system directories you can write to that will all basically do the same thing. Want to download a file? You've got the full power of JavaScript and PowerShell native on most Windows systems to trivially do just that in a hundred different ways.

As I learn more about malware, my understanding of its complexity increases. The Windows attack surface is so incredibly vast that it's no wonder malware became and continues to be such a problem: for each step in infecting and affecting a system, there are a million ways to do it, some well known to defenders, some little known and some unknown.

In Conclusion: You Should Write More Malware

RRE has been an incredibly fun and informative journey. I really think that it would benefit defenders, analysts and red teamers to spend at least a bit of time thinking about malware from the perspective of the developer. Don't spend all of your time looking through the lens of one specific level of abstraction, like assembly or dynamic reversing or reading third-party reporting.

Have fun, build cool stuff for its own sake and learn from the journey.