Malware
Getting Malware
Malware is much harder to come by than the average computer scientist might think. Unless you are careless or don't know how to use a computer properly, you are unlikely to pick it up by accident, and most people don't want to ship you software that harms your computer. Common ways we might get infected include opening dodgy email attachments, not updating programs, downloading files like Game_of_thrones.mp4.exe instead of Game_of_thrones.mp4 (easily preventable on Linux by not setting the executable bit and by using MIME types, by the way), or clicking on fake download buttons on websites instead of the actual download button.
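The double-extension trick above can be caught with a simple filename check. The following is a minimal sketch, not a complete filter: the two extension sets and the function name are illustrative assumptions, and a real filter would need a much longer list.

```python
from pathlib import PurePath

# Hypothetical, non-exhaustive set of extensions that Windows will execute.
EXECUTABLE_EXTENSIONS = {".exe", ".scr", ".bat", ".cmd", ".com", ".vbs"}
# Hypothetical set of extensions a user would expect to be harmless media or documents.
DECOY_EXTENSIONS = {".mp4", ".mp3", ".avi", ".jpg", ".png", ".pdf", ".docx"}


def has_deceptive_double_extension(filename: str) -> bool:
    """Return True if the name ends in a decoy extension followed by an executable one."""
    suffixes = [suffix.lower() for suffix in PurePath(filename).suffixes]
    if len(suffixes) < 2:
        return False
    return suffixes[-1] in EXECUTABLE_EXTENSIONS and suffixes[-2] in DECOY_EXTENSIONS


if __name__ == "__main__":
    for name in ("Game_of_thrones.mp4.exe", "Game_of_thrones.mp4"):
        print(name, has_deceptive_double_extension(name))
```

A check like this only looks at the name, so it complements rather than replaces inspecting the file contents.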
We can also obtain many malware samples from the collections kept by antivirus services. For example, VirusTotal has a good selection, which used to be free but is now a subscription service.
Exploit Kits
Why write your own malware download scripts when you could just use a library for it? Exploit kits deliver malware to unsuspecting visitors of websites, bundling exploits for the JS stack, the Java stack, or PDF engines (mainly Adobe). To distribute malware at a large scale, you would use an exploit kit to bundle the malware, then distribute it through e.g., an advertising network.
Exploit kits can present themselves as e.g., the Java auto updater, masquerading as UAC...
Honeypots
Honeypots are used to collect real live data on malware and phishing emails. For example, we can grab a domain name, create various emails for that domain at various places, create a mailserver, then process all attachments on inbound mail to extract and save to a folder.
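The attachment-processing step of such a mail honeypot could be sketched as follows, using only Python's standard email package. The function names, the sample message, and the choice to name saved files by their SHA-256 hash are assumptions made for illustration; the mail server and domain setup are out of scope here.

```python
import email
import hashlib
import tempfile
from email import policy
from email.message import EmailMessage
from pathlib import Path


def save_attachments(raw_message: bytes, out_dir: Path) -> list[Path]:
    """Parse one raw inbound email and write each attachment into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    message = email.message_from_bytes(raw_message, policy=policy.default)
    saved = []
    for part in message.iter_attachments():
        payload = part.get_payload(decode=True)
        if payload is None:  # e.g. an attached message with nested parts
            continue
        # Name files by content hash so an attacker-chosen filename never reaches the disk.
        target = out_dir / f"{hashlib.sha256(payload).hexdigest()}.bin"
        target.write_bytes(payload)
        saved.append(target)
    return saved


def build_sample_message() -> bytes:
    """Build a harmless stand-in for a phishing email (hypothetical addresses)."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "trap@example.com"
    msg["Subject"] = "Invoice"
    msg.set_content("See attached.")
    msg.add_attachment(b"MZ fake sample", maintype="application",
                       subtype="octet-stream", filename="invoice.pdf.exe")
    return msg.as_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        print(save_attachments(build_sample_message(), Path(folder)))
```

Saved samples should only ever be written to an isolated analysis machine, never opened on the host that runs the collector.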
Types
Malware is typically designed for a specific purpose. It can either target lots of people (e.g., when creating a botnet), or hide until it spreads enough to reach a specific target (e.g., Stuxnet).
Malware also comes in many forms, e.g., a Trojan Horse, a virus, or a worm. Applications such as crackers for licensable software (e.g., Photoshop) typically also include viruses.
Characteristics
Malware typically has a goal, e.g., stealing information, gaining remote access (either as a backdoor or for use as part of a botnet), holding data for ransom, hiding itself as a rootkit, scareware, or acting as a downloader or launcher for some other software.
Information Stealing
Malware in this sense collects data and sends it to the attacker. This can include the user's keystrokes with a keylogger, a sniffer that monitors for anything that looks interesting, a password stealer that extracts browser data and exports it, or MITM applications that intercept and manipulate the traffic on the web, e.g., online banking.
Remote Access
A backdoor can allow an attacker to gain remote access to the machine. This is typically a remote shell, but could be something a bit more clever, that takes a payload masquerading as something else and allows the attacker to control the computer.
Botnets can also be a part of this category, as the computer is used as a node on a large network. When needed, the attacker can then leverage large amounts of computing power and networking to perform e.g., a DDoS attack.
A good example of an almost-backdoor was the vulnerability introduced in the xz package, which, had it been included in the packages shipped by Linux distributions, would have allowed an attacker remote access to any machine running the affected software.
Ransomware
As the name suggests, this holds important files on the drive for ransom, using public key cryptography to encrypt files that aren't in use in the background. Once these files have all been encrypted, the ransomware then presents a display demanding payment from the user for access to the private key needed to decrypt them.
These payments would typically go through a Bitcoin wallet, or demand hard-to-trace items such as gift cards, in return for the promise of the decryption key. Whether or not the attacker would provide the decryption key after payment is a whole different story. Examples of this include WannaCry and CTB-Locker.
Modern ransomware is also smarter and will now mask itself until all files are encrypted and all the backups are also encrypted. At a point in the future (e.g., 30 days), the backups will have purged the old data, and the program that is shimming the encrypted files to make them seem genuine will remove itself, leaving the user with a completely encrypted system.
Scareware
This is similar to ransomware, but instead of actually encrypting files, it just makes the computer seem like it's locked down and demands payment under threat of arrest.
Rootkits
These hook into low level operating system calls and conceal the payloads. E.g., we could modify the htop program on Linux so that it doesn't show the process with the PID of our malware.
As these mask themselves, antivirus engines have a hard time detecting them and the only easy way to see what has happened is to take a forensic dump of the system image.
Some rootkits are getting so advanced that they can update the firmware on the disks that present the files, so that even a forensic disk dump wouldn't reveal the files. Common things for a rootkit to mask or interfere with include files and directories, processes, registry entries, and file hashing, alongside preventing applications from running and resisting removal.
Downloaders and Launchers
These are malware that bundle other malware. Downloaders and droppers are small applications which download and execute malware. Launchers are installed to launch other malware in a smart way, such that they bypass UAC, elevate privileges, and make use of other OS exploits to make running the malware as simple as possible.
Locations
Malware is most commonly found on Windows, as it is the OS with the highest market share of non-technical users. Mac is less susceptible to attacks as Apple build in stronger OS protections, but some malware is specifically made and tailored for Mac now as the market share grows.
Linux users typically know a fair bit more about the applications they install, but if someone is able to compromise the server, then they have the ability to gain a lot of resources and data.
Android is also risky, as Android apps are much less regulated than iOS apps, and Android allows for easy sideloading. Historically, the Android OS is also less secure than iOS in terms of OS protections.
Windows
Most malware makes use of the PE (Portable Executable) format. The malware can be statically linked, where it contains the code from the libraries it needs (useful if system libraries are being modified); dynamically linked, where imports are loaded by the OS at program start; or runtime linked, where libraries are loaded only as and when a function is needed.
PE Header
The PE header includes information about the file, including the type of code, the flags on the file (e.g., executable or DLL), what libraries need to be linked, and the size and memory information about the PE file.
The header also includes the compilation time, and the subsystem the program is meant to run on (e.g., CLI or GUI). It is important to note that some of these can be spoofed.
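To make the header fields above concrete, here is a minimal sketch that reads a few of them with Python's struct module. It assumes the standard PE layout (the offset of the PE signature stored at 0x3C, a 20-byte COFF file header, and the Subsystem field 68 bytes into the optional header). The function names and the synthetic sample header are illustrative, and a real analysis would normally rely on a dedicated PE parsing library.

```python
import struct
from datetime import datetime, timezone

IMAGE_FILE_EXECUTABLE_IMAGE = 0x0002  # flag: file is a runnable image
IMAGE_FILE_DLL = 0x2000               # flag: file is a DLL
SUBSYSTEMS = {2: "Windows GUI", 3: "Windows console (CLI)"}


def parse_pe_header(data: bytes) -> dict:
    """Extract a handful of header fields from the raw bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4
    (machine, section_count, timestamp, _symtab, _nsyms,
     _opt_size, characteristics) = struct.unpack_from("<HHIIIHH", data, coff)
    optional = coff + 20
    (entry_point,) = struct.unpack_from("<I", data, optional + 16)
    (subsystem,) = struct.unpack_from("<H", data, optional + 68)
    return {
        "machine": hex(machine),
        "sections": section_count,
        # The compile time is attacker-controlled, so treat it as a hint only.
        "compiled": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
        "is_executable": bool(characteristics & IMAGE_FILE_EXECUTABLE_IMAGE),
        "entry_point_rva": entry_point,
        "subsystem": SUBSYSTEMS.get(subsystem, f"other ({subsystem})"),
    }


def build_sample_header() -> bytes:
    """Build a synthetic header (not a runnable program) for demonstration."""
    data = bytearray(0x200)
    data[0:2] = b"MZ"
    struct.pack_into("<I", data, 0x3C, 0x80)
    data[0x80:0x84] = b"PE\0\0"
    struct.pack_into("<HHIIIHH", data, 0x84, 0x8664, 4, 1_700_000_000, 0, 0,
                     240, IMAGE_FILE_EXECUTABLE_IMAGE)
    optional = 0x84 + 20
    struct.pack_into("<H", data, optional, 0x20B)       # PE32+ magic
    struct.pack_into("<I", data, optional + 16, 0x1000)  # AddressOfEntryPoint
    struct.pack_into("<H", data, optional + 68, 3)       # Subsystem: console
    return bytes(data)


if __name__ == "__main__":
    print(parse_pe_header(build_sample_header()))
```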
PE Sections
Each PE file comprises sections, which have names, flags, and content. This is similar to, but slightly different from, the ELF file format for Linux executables. A typical layout includes the following sections:
- .text contains instructions and executable code
- .rdata contains imports and exports for the program
- .data contains global data for the program
- .rsrc contains resources used by the program, e.g., the icons, dialogs, and strings
PE Execution
When Windows executes a PE file, it takes the entry point of the file, the relevant heap and stack sizes from the header, then iterates through each section to load it into virtual memory.
The entry point recorded in the header is then located (this is the startup code that eventually calls main in C), the loader resolves all imports, and a new thread is created at that address to execute the program.
Linking Information
Linking information includes the imports for the program (external calls and the DLLs that need to be accessed) and the exports for the program, which are functions that other programs can call.
Common DLL files include:
- Kernel32.dll: core functionality (memory, files, hardware)
- Advapi32.dll: Windows components (service manager, registry)
- User32.dll: user interface components
- Gdi32.dll: display
- Ntdll.dll: interface to the Windows kernel
- WSock32.dll / WS2_32.dll: Winsock (network)
- Wininet.dll: high level networking functions
Goals
Malware typically has several goals. It wants to gain persistence within the system; remain hidden; and prevent analysis through packing, obfuscation, rootkits, and anti-analysis.
Persistence
Malware wants to remain active after it has initially executed, including across reboots. This is becoming less of a problem as systems have higher uptimes than previously, but it is still worth doing if possible, as otherwise the malware will stop working if the system is rebooted or crashes.
Malware can make sure it is persistent by adding itself to the login items that launch genuine applications (e.g., Spotify, web browsers), creating hooks into the file explorer, scheduling itself as a task, creating a service or a driver for itself, or scheduling itself to execute on boot.
Another option is to add it to AppInit_DLLs, a registry value listing DLLs that are loaded into every process that loads User32.dll.
If we can remove persistence, then we can typically remove any active damage that the malware is doing, and then hopefully clean up any other damage that already exists on the system.
Malware can prevent the user from removing it by making itself hard to stop once running, preventing removal or changing of the persistence mechanism, or hiding its presence from the system completely.
RunOnce
Here the malware is loaded from the RunOnce registry key, and the entry is then removed from the key once the malware is running in memory. Tools won't show the malware as being persistent as it's no longer in the registry. On a clean shutdown, the malware can write itself back into the RunOnce key and then repeat the process on a new boot.
A simple way to prevent this is to just not clean shutdown (e.g., yank the power cord out).
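To see what a persistence-listing tool would look at, here is a minimal sketch that enumerates the Run and RunOnce keys using Python's winreg module. The function name is an assumption, the module only exists on Windows (the sketch returns an empty list elsewhere), and this covers only two of the many persistence locations. As described above, malware that has already removed its RunOnce entry will not appear in this output while it is running.

```python
try:
    import winreg  # standard library, but only available on Windows
except ImportError:
    winreg = None

AUTORUN_SUBKEYS = (
    r"Software\Microsoft\Windows\CurrentVersion\Run",
    r"Software\Microsoft\Windows\CurrentVersion\RunOnce",
)


def list_autoruns() -> list[tuple[str, str, str]]:
    """Return (key path, value name, command) for each Run/RunOnce entry."""
    if winreg is None:
        return []
    hives = (("HKCU", winreg.HKEY_CURRENT_USER),
             ("HKLM", winreg.HKEY_LOCAL_MACHINE))
    entries = []
    for hive_name, hive in hives:
        for subkey in AUTORUN_SUBKEYS:
            try:
                with winreg.OpenKey(hive, subkey) as key:
                    value_count = winreg.QueryInfoKey(key)[1]
                    for index in range(value_count):
                        name, command, _kind = winreg.EnumValue(key, index)
                        entries.append((f"{hive_name}\\{subkey}", name, str(command)))
            except OSError:
                # Key is missing or unreadable for this user; skip it.
                continue
    return entries


if __name__ == "__main__":
    for path, name, command in list_autoruns():
        print(f"{path}: {name} -> {command}")
```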
Stealth
Another way malware tries to stay on the system is by masking its presence. It might overwrite legitimate system files, which can be detected by checking signatures of those files, pretend it is Microsoft (nobody wants to accidentally remove a critical OS file from the device), inject into other processes (nobody would expect explorer.exe to be malicious), replace legitimate files with legitimate files plus a bit extra (which again would be flagged by a hash check), hide from the OS with a rootkit, or replace applications that aren't used very often.
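The hash check mentioned above can be sketched as a comparison against a baseline of known-good SHA-256 digests. The function names and the demo file are assumptions for illustration. In practice the baseline must come from a trusted source and be stored somewhere the malware cannot modify.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_modified(known_good: dict[str, str], root: Path) -> list[str]:
    """Return the files under root that are missing or whose hash has changed."""
    modified = []
    for relative_name, expected in known_good.items():
        candidate = root / relative_name
        if not candidate.is_file() or sha256_of(candidate) != expected:
            modified.append(relative_name)
    return modified


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        root = Path(folder)
        (root / "system.dll").write_bytes(b"legitimate file")
        baseline = {"system.dll": sha256_of(root / "system.dll")}
        # Simulate a legitimate file being replaced by the file "plus a bit extra".
        (root / "system.dll").write_bytes(b"legitimate file plus a bit extra")
        print(find_modified(baseline, root))
```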
Packing and Obfuscation
This is where we take our malware and put it inside a wrapper application. When the wrapper is run, the inner application is unpacked into memory and executed.
This has legitimate uses for e.g., compression, but can also make it harder for antivirus engines to detect. If and when we inspect the application, we would only see the wrapper for the application, and the remainder would be compressed and unintelligible, meaning we'd be unable to see what it does.
If using a packer like UPX, we can simply unpack the malware and then analyse it, but this might be harder with other packing tools.
Malware Analysis
Malware analysis can be split into two broad categories, static, and dynamic. Static analysis is where we look at it without running it to see whether it looks suspicious and what it will likely do if run. Dynamic analysis is where we look at it when running, to see what it actually does (e.g., by seeing OS calls it makes).
Further than this, we can reverse engineer the code using a tool such as Ghidra, or do dynamic analysis on it by putting it into a debugger and seeing what instructions are executing.
Anti-Analysis
Malware will find it beneficial to try and prevent analysis. It can detect the presence of virtualisation (as is common with security research), to the point where it behaves differently or doesn't work at all in a VM.
This is achieved by looking through the drivers and devices on the machine and seeing if they match signatures of emulated devices. Sometimes malware also looks for support software within the VM (e.g., the presence of the VirtualBoxGuestAdditions.iso file), or for tools such as Volatility, Ghidra, or even GDB. It can also check networking functionality, such as the subnet it is in and the other computers on the network.
Malware can also check if the device is unusually constrained (e.g., low memory, a small disk, or unusual access times for different parts of the system).
This is quite difficult for the malware engineer to build into their malware, so they would typically wrap the malware with something such as VMDetect, which makes it easy to detect a VM and so avoid analysis.
An issue with this is that the VMDetect code will likely have known signatures, and once it has itself been reverse engineered, it will offer less protection than a home-rolled VM detection solution that has not been reverse engineered.
Basic Static Analysis
This involves examining the PE file itself: what it was compiled with, whether it is packed, the sections in the program, other metadata, whether the file has been signed, the libraries it imports, the functions it exports, the strings within the malware, and the resources that the malware has packed in.
Signature and Structure
Tools exist to help understand how a PE is packaged, and how best to unpack it. A tool such as Detect it Easy will use signatures for PE information, to work out how the application was made.
To examine the structure, we can use tools such as DependencyWalker and Peview, which show the file in a tree-like manner.
Packing and Obfuscation
This is another technique that is used as part of static analysis. A packed PE file will often contain few imports, little to no strings, and non-standard section names.
Some sections contain code, whilst others contain data. A packed executable will also typically be smaller. Packed executables may also have symbols for functions such as LoadLibrary and GetProcAddress, with other functions that work with virtual memory or other low level memory management APIs.
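These indicators can be combined into a rough heuristic. The sketch below assumes the section names, import names, and string count have already been extracted by another tool. The thresholds, the set of "standard" section names, and the function name are all illustrative assumptions rather than established cut-offs. UPX0 and UPX1 are used in the demo because those are the section names that UPX itself produces.

```python
# Section names commonly produced by mainstream compilers (non-exhaustive assumption).
STANDARD_SECTIONS = {".text", ".rdata", ".data", ".rsrc", ".reloc",
                     ".idata", ".edata", ".pdata", ".bss", ".tls"}
# Imports an unpacking stub needs to load the real imports and memory at run time.
LOADER_IMPORTS = {"LoadLibraryA", "LoadLibraryW", "GetProcAddress",
                  "VirtualAlloc", "VirtualProtect"}


def packing_indicators(section_names, imports, string_count,
                       few_imports=10, few_strings=20):
    """Return a list of human-readable reasons to suspect the file is packed."""
    indicators = []
    odd_sections = [name for name in section_names if name not in STANDARD_SECTIONS]
    if odd_sections:
        indicators.append("non-standard sections: " + ", ".join(odd_sections))
    if len(imports) <= few_imports:
        indicators.append("few imports")
    if imports and set(imports) <= LOADER_IMPORTS:
        indicators.append("imports are only loader or memory functions")
    if string_count <= few_strings:
        indicators.append("few strings")
    return indicators


if __name__ == "__main__":
    print(packing_indicators(["UPX0", "UPX1", ".rsrc"],
                             ["LoadLibraryA", "GetProcAddress"], 3))
```

An empty result does not prove the file is unpacked; it only means none of these particular signs were present.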
Obfuscated malware will have no useful names, and conventions on naming, etc. won't be followed. Obfuscated malware is also less likely to contain any useful strings.
A packed executable needs to be unpacked before undergoing further analysis. As mentioned earlier, some tools such as UPX offer native packing and unpacking functionality as a library and/or an executable. Other packing methods exist, but if we can set a breakpoint at the point where they hand over control to the inner program, then we can get a good idea of the data involved.
Some are nearly impossible to unpack and thus need the process to be dumped from memory once loaded.
Sections and Metadata
PEStudio is a program that allows us to see the sections in the executable, in addition to lots of other information about the executable. It inspects the PE and information inside, including the imports and exports.
The program also shows anything likely to be suspicious to aid in filtering legitimate programs from malware. It makes it easier for an end user to see if a program is likely to be safe to run without much effort involved on their end.
The program also shows various indicators (e.g., whether the program references the protection of the virtual address space), and assigns a severity to the program for trying to do anything to this. It also includes blacklisted symbols (e.g., those referencing things in the kernel), giving the location of the symbol, etc.
We can also see whether the file has version information attached to it, and whether it has been digitally signed. Typically, files released by a company will be signed with one of the company's signing keys to verify the source. Any modification to the file will break the signature, in which case the program shouldn't be treated as safe to run.
Dependency Walker
This is another program to inspect PE files. It shows the imports to other files, where the import is directly accessed by the program. It also shows the exports and the functions that the malware makes available, which is useful if the malware comprises multiple files.
Common DLLs
Like shared object files (.so extension) in Linux, Windows has its own shared files, called dynamic-link libraries, or DLLs for short. DLLs have the extension .dll and typically manage lots of OS-related functions.
- Kernel32.dll allows access and manipulation of memory, files, and hardware
- Advapi32.dll provides access to core Windows components, including the Service Manager and Registry
- User32.dll contains UI components, including buttons, scroll bars, and components to respond to and control user actions
- Gdi32.dll contains functions to display and manipulate graphics
- Ntdll.dll gives an interface to the Windows kernel. This is not normally directly imported by .exe files, and should a file import it, it means that it is likely trying to do something that is not normally available to Windows programs, e.g., hiding functionality or manipulating processes
- Wsock32.dll and Ws2_32.dll are related to networking. Processes using these are most likely trying to connect to a network, or perform other network related tasks
- Wininet.dll is for higher level network related tasks, and the library implements higher level application protocols, e.g., FTP and HTTP
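The list above can be turned into a small lookup that annotates an import list. This is a minimal sketch: the dictionary wording, the function name, and the way Ntdll.dll is flagged are illustrative assumptions, and the list of imported DLL names is assumed to have been extracted by another tool.

```python
# Capability summaries taken from the list above, keyed by lower-case DLL name.
DLL_CAPABILITIES = {
    "kernel32.dll": "memory, files, and hardware",
    "advapi32.dll": "service manager and registry",
    "user32.dll": "user interface components",
    "gdi32.dll": "graphics display and manipulation",
    "ntdll.dll": "direct interface to the Windows kernel",
    "wsock32.dll": "networking (Winsock)",
    "ws2_32.dll": "networking (Winsock)",
    "wininet.dll": "high level networking (FTP, HTTP)",
}
# DLLs that ordinary programs rarely import directly.
UNUSUAL_DIRECT_IMPORTS = {"ntdll.dll"}


def describe_imports(dll_names):
    """Map each imported DLL to a capability summary, flagging unusual ones."""
    report = {}
    for name in dll_names:
        key = name.lower()  # DLL names are case-insensitive on Windows
        summary = DLL_CAPABILITIES.get(key, "unknown")
        if key in UNUSUAL_DIRECT_IMPORTS:
            summary += " [unusual for a normal program]"
        report[name] = summary
    return report


if __name__ == "__main__":
    for dll, summary in describe_imports(["Kernel32.dll", "Ntdll.dll", "WS2_32.dll"]).items():
        print(f"{dll}: {summary}")
```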
Imported Functions
Files may also use specific functions from these APIs, including:
- Process related functions, e.g., OpenProcess, GetCurrentProcess, and GetProcessHeap
- File related functions, e.g., ReadFile, CreateFile, and WriteFile
- Directory searching, e.g., FindFirstFile and FindNextFile
Strings
Another utility available is the strings command. This shows all printable strings found in the file. This can highlight called functions, domains which the malware attempts to connect to, information that might occur in the UI, and sometimes fun messages from the malware authors.
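The core idea behind the strings command can be sketched in a few lines: scan the raw bytes for runs of printable ASCII characters above a minimum length. The function name and the sample bytes are assumptions for illustration. This sketch only finds single-byte ASCII strings, so wide (UTF-16) strings, which are common in Windows binaries, would need a second pass.

```python
import re


def extract_strings(data: bytes, min_length: int = 4) -> list[str]:
    """Return every run of at least min_length printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


if __name__ == "__main__":
    # Hypothetical bytes standing in for part of a binary.
    sample = b"\x00\x01http://evil.example/payload\x00ab\x00GetProcAddress\xff"
    print(extract_strings(sample))
```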
Resource Hacker
This is another program that allows us to view the sections inside the application. We can view resources that are associated with the application, inspect images, icons, dialogs, and other resources inside the program.
If the application has a GUI, it shows what the GUI might do.
Quantifying Suspicion
It may be hard to establish whether the program warrants suspicion based on the presence or lack of specific features. We can quantify different aspects with a score, to get a total overall score, then see whether we would identify it as suspicious or not based on evidence found or lack thereof:
Category | What | Score |
---|---|---|
Packing | Not Packed | 0 |
Packing | Packed | 2 |
Strings | No Strings | 2 |
Strings | Suspicious Strings | 3 |
Strings | Normal Strings | 0 |
Imports | -- | -- |
Sections | Normal Sections | 0 |
Sections | Abnormal Sections | 1 |
Icons | No Icons | 1 |
Icons | Normal Icons | 0 |
Icons | Suspicious Icons | 2 |
Dialogs | -- | -- |
Version Information | Not Present | 2 |
Version Information | Present | 0 |
Digital Signature | Not Present | 2 |
Digital Signature | Present | 0 |
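The scoring scheme can be expressed as a lookup table and a sum. The sketch below encodes only the categories that have scores in the table (Imports and Dialogs are left out because the table gives no values for them). The function and dictionary names are assumptions, and no decision threshold is encoded because the notes do not give one.

```python
# Scores copied from the table; keys are lower-cased for easier matching.
SCORES = {
    "packing": {"not packed": 0, "packed": 2},
    "strings": {"no strings": 2, "suspicious strings": 3, "normal strings": 0},
    "sections": {"normal sections": 0, "abnormal sections": 1},
    "icons": {"no icons": 1, "normal icons": 0, "suspicious icons": 2},
    "version information": {"not present": 2, "present": 0},
    "digital signature": {"not present": 2, "present": 0},
}


def suspicion_score(observations: dict[str, str]) -> int:
    """Sum the score for each observed category; unknown entries raise KeyError."""
    return sum(SCORES[category.lower()][finding.lower()]
               for category, finding in observations.items())


if __name__ == "__main__":
    sample = {
        "Packing": "Packed",
        "Strings": "No Strings",
        "Sections": "Abnormal Sections",
        "Icons": "No Icons",
        "Version Information": "Not Present",
        "Digital Signature": "Not Present",
    }
    print(suspicion_score(sample))
```

With this table the highest possible total is 12, so a score is best read relative to that maximum rather than as an absolute verdict.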
Advanced Static Analysis
The methods mentioned above are useful for looking at the binary without really trying to establish what it fully does. We have so far just quantified whether 'it looks suspicious'. Advanced analysis allows us to see what it is actually doing, instead of just making an informed guess.
Disassemblers
We can typically pass the file in question into a disassembler. Hopper, IDA Pro, and Ghidra are the typical ones that are used in industry. Ghidra is free, and the other two want some form of payment.
The lecture slides suggest that the only real tool for disassembly is IDA, but I've had lots of success using Ghidra, and Ghidra is what is currently taught in the labs.
The lecture slides also show that the malware will only be available to us in assembly form, but in my experience, the Ghidra decompilation works really quite well to lift the binary to at least vaguely readable C, meaning I don't have to mess with the assembly.
Disassembly of a typical file with its symbols stripped makes it quite hard to see what the program is doing. After all, it has probably been distributed without source to keep it proprietary.
Control Flow
Most disassemblers give access to the text of the functions, in addition to a control flow graph (CFG) showing the calls between different functions. Most disassemblers also have a window or pane listing the functions, where they can be found, and their lengths.
Names
With symbols stripped, quite often there are still a couple of remnants to make the executable run and work with the Windows API. For example, _main is left in so that the address of the main function can be found when the OS hands over control to the process.
Imports and Exports
Most disassemblers also show the imports and exports of the program, so that we can see which functions might be making some questionable API calls.
Structures
Structures are similar to objects in Java and store their internal data in a specific way. Typically, compilers lay out the data within a structure in a sensible, predictable order.
Searching
Disassemblers also give the analyst options to search through the program, seeing text that might appear at specific locations.
Function and Variable Renaming
You can also rename functions and variables within the project to aid in debugging. For example, if you think a function checks a license key, then you can name it as such, instead of the generic auto generated FUN_ABFDU.