A Play-by-play of the Mirai botnet source code - Part 1 (scanner.c)
Today I've decided to start learning something new and unfamiliar. As someone who mostly works with high-level languages like Python and Kotlin, I'm going to try to understand the source code behind the Mirai botnet, which at one point in its life controlled over 380,000 IoT devices. I'll likely get a lot of things wrong the first time around, and this may take quite a few blog posts to complete. My goal is to document everything I figure out, so that someone in a similar position can later use this blog as a resource for understanding the C and Go that ran this groundbreaking botnet.
To begin, I'll read scanner.c, the code in charge of scanning for and infecting new IoT devices. I'll be taking my time with it.
scanner.c has most of its logic in a function called scanner_init, which is called either from main.c (apparently when no attack is happening) or from attack.c (apparently after attacks have concluded). From this behavior, I suspect that each node in the botnet was designed to stop scanning for new IoT devices while an attack was ongoing, and to spend its bandwidth on scanning when no attack was running.
Now diving into the scanner.c code, let's start at the beginning of scanner_init:
void scanner_init(void)
{
    // (A)
    int i;
    uint16_t source_port;
    struct iphdr *iph;
    struct tcphdr *tcph;
    // (B)
    // Let parent continue on main thread
    scanner_pid = fork();
    if (scanner_pid > 0 || scanner_pid == -1)
        return;
    // (C)
    LOCAL_ADDR = util_local_addr();
    // (D)
    rand_init();
    fake_time = time(NULL);
    // (E)
    conn_table = calloc(SCANNER_MAX_CONNS, sizeof (struct scanner_connection));
    for (i = 0; i < SCANNER_MAX_CONNS; i++)
    {
        conn_table[i].state = SC_CLOSED;
        conn_table[i].fd = -1;
    }
    // ... (scanner_init continues beyond this excerpt)
The segment labelled (A) just declares some variables that will become useful later, when the SYN packets are actually crafted to test whether a particular IP accepts TCP connections. This technique is called SYN scanning. It takes advantage of the TCP three-way handshake: the initiating computer sends a SYN to the partner computer, the partner replies with a SYN-ACK, and the initiator finishes with an ACK acknowledging the SYN-ACK. If the malicious device sends a SYN and receives a SYN-ACK, it can assume that the target accepts TCP connections on that particular port, and if that port is 23 or 2323, the target might be open to receiving commands via telnet (so the bot can begin guessing passwords).
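To make the handshake concrete, here is a minimal sketch of what crafting such a SYN packet could look like on Linux, using the same struct iphdr and struct tcphdr types that segment (A) declares. This is my own illustration, not Mirai's code: syn_packet, inet_checksum, build_syn and is_syn_ack are hypothetical names, and the TTL and window values are arbitrary. Actually sending the packet would also need a raw socket (and root), which is left out here.

```c
#define _DEFAULT_SOURCE   /* exposes the Linux-style tcphdr field names in glibc */
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: a 20-byte IPv4 header followed by a 20-byte TCP header. */
struct syn_packet {
    struct iphdr ip;
    struct tcphdr tcp;
};

/* RFC 1071 Internet checksum; the result is stored in network byte order. */
uint16_t inet_checksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;
    while (len > 1) {
        sum += (uint32_t)((p[0] << 8) | p[1]);
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)(p[0] << 8);
    while (sum >> 16)                       /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return htons((uint16_t)~sum);
}

/* Addresses in network byte order; ports and seq in host byte order. */
void build_syn(struct syn_packet *p, uint32_t saddr, uint32_t daddr,
               uint16_t sport, uint16_t dport, uint32_t seq)
{
    uint8_t pseudo[12 + sizeof(struct tcphdr)];

    memset(p, 0, sizeof(*p));
    p->ip.version = 4;
    p->ip.ihl = 5;                          /* 5 x 32-bit words = 20 bytes */
    p->ip.tot_len = htons(sizeof(*p));
    p->ip.ttl = 64;
    p->ip.protocol = IPPROTO_TCP;
    p->ip.saddr = saddr;                    /* the "return address" for the SYN-ACK */
    p->ip.daddr = daddr;
    p->ip.check = inet_checksum(&p->ip, sizeof(p->ip));

    p->tcp.source = htons(sport);
    p->tcp.dest = htons(dport);             /* e.g. 23 or 2323 for telnet */
    p->tcp.seq = htonl(seq);
    p->tcp.doff = 5;                        /* 20-byte TCP header, no options */
    p->tcp.syn = 1;                         /* the only flag set */
    p->tcp.window = htons(1024);

    /* TCP checksum covers a pseudo-header: src, dst, zero, protocol, TCP length. */
    memcpy(pseudo, &p->ip.saddr, 4);
    memcpy(pseudo + 4, &p->ip.daddr, 4);
    pseudo[8] = 0;
    pseudo[9] = IPPROTO_TCP;
    pseudo[10] = 0;
    pseudo[11] = sizeof(struct tcphdr);
    memcpy(pseudo + 12, &p->tcp, sizeof(struct tcphdr));
    p->tcp.check = inet_checksum(pseudo, sizeof(pseudo));
}

/* A reply suggests an open port when both SYN and ACK are set (and not RST). */
int is_syn_ack(const struct tcphdr *tcph)
{
    return tcph->syn && tcph->ack && !tcph->rst;
}
```

For example, build_syn(&p, local_ip, target_ip, 40000, 23, 12345) fills p with a 40-byte SYN aimed at port 23; a reply whose TCP header passes is_syn_ack would mark the target as worth a telnet attempt.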
The segment labelled (B) contains the code that makes the rest of this function run in a child process. fork() duplicates the calling process, and both copies continue executing from the point where fork() returns. It returns 0 in the child process; in the parent, it returns the child's (positive) PID on success, or -1 on failure, in which case no child is created. So in the following code,
if (scanner_pid > 0 || scanner_pid == -1)
    return;
the parent process (where scanner_pid holds the new scanner process's PID, or -1 if the fork failed) returns without running the rest of scanner_init, while the child process (where scanner_pid is 0) continues running the rest of the function.
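Here is a small, self-contained sketch of the same fork pattern. start_worker and wait_for_worker are hypothetical names of mine; unlike the scanner, whose child keeps looping, this child does a trivial piece of work and exits with a status code so the parent can observe it.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the scanner's work loop. */
static int worker_body(void)
{
    return 42;
}

/* Same shape as the check in scanner_init: the parent returns right away
 * (with the child's PID, or -1 on failure); only the child does the work. */
pid_t start_worker(void)
{
    pid_t pid = fork();
    if (pid > 0 || pid == -1)
        return pid;

    /* Only the child gets here (pid == 0). _exit() ensures it never
     * returns into the caller's code. */
    _exit(worker_body());
}

/* Parent side: wait for the child and report its exit code (-1 on error). */
int wait_for_worker(pid_t pid)
{
    int status;
    if (pid <= 0 || waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```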
Segment (C) grabs the local IPv4 address, which is needed to fill in the source IP address when crafting the SYN packet. You can think of this as the return address: it tells the target of the SYN scan where to send its SYN-ACK.
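As far as I can tell, a common way to find this address (and, I believe, roughly what util_local_addr in util.c does) is to connect() a UDP socket toward some outside address and then ask the kernel, via getsockname(), which local address it picked. connect() on a UDP socket sends no packets; it only chooses a route. The function name local_addr_toward and its parameter are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the local IPv4 address (network byte order) the kernel would use
 * as the source address when talking to dst_be, or 0 on failure. */
uint32_t local_addr_toward(uint32_t dst_be)
{
    struct sockaddr_in addr;
    socklen_t addr_len = sizeof(addr);
    uint32_t result = 0;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1)
        return 0;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = dst_be;
    addr.sin_port = htons(53);          /* any port works; nothing is sent */

    /* connect() just picks a route; getsockname() reveals the chosen source. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        getsockname(fd, (struct sockaddr *)&addr, &addr_len) == 0)
        result = addr.sin_addr.s_addr;

    close(fd);
    return result;
}
```

For instance, local_addr_toward(inet_addr("8.8.8.8")) should return the address of the interface used to reach the internet, or 0 if there is no route.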
Segment (D) seems to seed the random number generator from the current time and process IDs (and also stores the current time in fake_time). Here is rand_init:
void rand_init(void)
{
    x = time(NULL);
    y = getpid() ^ getppid();
    z = clock();
    w = z ^ y;
}
From what I could gather, the seed differs between parent and child because getpid() and getppid() return different values in each process. That seems to matter because fork() gives the child a copy of the parent's memory, including any generator state, so without reseeding both processes would produce the same "random" numbers. At this point, I'm simply reading this as initializing the bot's random number generator.
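The x, y, z, w variables look to me like the four-word state of Marsaglia's xorshift128 generator. Below is a minimal sketch of that generator under that assumption. The struct and function names are my own, and the seed inputs are passed in as parameters instead of calling time(), getpid() and clock(), so the behavior is deterministic. It shows why reseeding after fork() matters: a copied state reproduces exactly the same numbers.

```c
#include <stdint.h>

/* Four 32-bit words of generator state, like x, y, z, w in rand_init. */
struct xs128 {
    uint32_t x, y, z, w;
};

/* Same mixing as rand_init, but with the inputs passed in (hypothetical API). */
void xs128_seed(struct xs128 *s, uint32_t now, uint32_t pid_mix, uint32_t ticks)
{
    s->x = now;
    s->y = pid_mix;
    s->z = ticks;
    s->w = ticks ^ pid_mix;
}

/* One step of xorshift128: shift the state along and mix in a new word. */
uint32_t xs128_next(struct xs128 *s)
{
    uint32_t t = s->x;
    t ^= t << 11;
    t ^= t >> 8;
    s->x = s->y;
    s->y = s->z;
    s->z = s->w;
    s->w ^= s->w >> 19;
    s->w ^= t;
    return s->w;
}
```

Copying a struct xs128 (child = parent) is effectively what fork() does to the generator's static variables; only a reseed with a different pid_mix makes the two streams diverge.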
Segment (E) allocates an array of scanner_connection structs. The array is SCANNER_MAX_CONNS long (128), and every entry starts in the closed state (SC_CLOSED) with fd set to -1. calloc already zeroes the memory, but 0 is a valid file descriptor (stdin), so -1 is used to mean "no socket". In a later portion of the code, we will see the scanner loop continuously, "spewing" out SYN packets to 160 new IP addresses at a time and then checking for incoming SYN-ACKs. When a SYN-ACK arrives, Mirai wants to open a connection to check whether it can get in via telnet, but it does not want an unbounded number of connections. So it only ever uses the slots in this array, which means it attempts to connect with at most 128 devices at a time.
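The slot-limiting idea can be sketched as a fixed-size table plus a "claim the first free slot" search. This is a simplified illustration, not Mirai's code: the two-state enum, struct conn, and the functions conn_table_new, conn_claim and conn_release are hypothetical, and a real version would also open and close the sockets.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_CONNS 128   /* the value the post gives for SCANNER_MAX_CONNS */

/* Hypothetical, trimmed-down connection states. */
enum conn_state { SC_CLOSED, SC_CONNECTING };

struct conn {
    enum conn_state state;
    int fd;             /* -1 means "no socket"; 0 would be stdin */
    uint32_t dst_addr;
    uint16_t dst_port;
};

/* Allocate the table and mark every slot free, as segment (E) does. */
struct conn *conn_table_new(void)
{
    struct conn *table = calloc(MAX_CONNS, sizeof(struct conn));
    if (table == NULL)
        return NULL;
    for (int i = 0; i < MAX_CONNS; i++) {
        table[i].state = SC_CLOSED;
        table[i].fd = -1;
    }
    return table;
}

/* On a SYN-ACK: claim the first free slot, or return NULL when all
 * MAX_CONNS slots are busy (the responder is then simply skipped). */
struct conn *conn_claim(struct conn *table, uint32_t addr, uint16_t port)
{
    for (int i = 0; i < MAX_CONNS; i++) {
        if (table[i].state == SC_CLOSED) {
            table[i].state = SC_CONNECTING;
            table[i].dst_addr = addr;
            table[i].dst_port = port;
            return &table[i];
        }
    }
    return NULL;
}

/* Hand a slot back once the telnet attempt is finished. */
void conn_release(struct conn *c)
{
    c->state = SC_CLOSED;
    c->fd = -1;         /* a real version would close(c->fd) first */
}
```

The fixed array caps memory use and open sockets no matter how many SYN-ACKs come back; the trade-off is that responders arriving while every slot is busy are dropped.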
For further reading, here's an awesome resource that I found on the internet. I will continue with this play-by-play in the coming days!