Points: 200

Description

Great work! With a credible threat proven, NSA’s Cybersecurity Collaboration Center reaches out to GA and discloses the vulnerability with some indicators of compromise (IoCs) to scan for.

New scan reports in hand, GA’s SOC is confident they’ve been breached using this attack vector. They’ve put in a request for support from NSA, and Barry is now tasked with assisting with the incident response.

While engaging the development teams directly at GA, you discover that their software engineers rely heavily on an offline LLM to assist in their workflows. A handful of developers vaguely recall once getting some confusing additions to their responses but can’t remember the specifics.

Barry asked for a copy of the proprietary LLM model, but approvals will take too long. Meanwhile, he was able to engage GA’s IT Security to retrieve partial audit logs for the developers and access to a caching proxy for the developers’ site.

Barry is great at DFIR, but he knows what he doesn’t know, and LLMs are outside of his wheelhouse for now. Your mutual friend Dominique was always interested in GAI and now works in Research Directorate.

The developers use the LLM for help during their work duties, and their AUP allows for limited personal use. GA IT Security has bound the audit log to an estimated time period and filtered it to specific processes. Barry sent a client certificate for you to authenticate securely with the caching proxy using https://████████/?q=query%20string.

You bring Dominique up to speed on the importance of the mission. They receive a nod from their management to spend some cycles with you looking at the artifacts. You send the audit logs their way and get to work looking at this one.

Find any snippet that has been purposefully altered.

Downloads:

  • Client certificate issued by the GA CA (client.crt)
  • Client private key used to establish a secure connection (client.key)
  • TTY audit log of a developer’s shell activity (audit.log)

Prompt:

  • A maliciously altered line from a code snippet

Solution

Let’s start by looking at the caching proxy we have been given access to.

Attempting to access it directly in a browser doesn’t work, since we need to present the provided client certificate and private key.

We can just provide these as arguments directly to curl and get a message back from the server. I’m also using jq here to format the returned JSON.

cobra@arch:~/codebreaker/task4$ curl -k --cert client.crt --key client.key https://████████/\?q\=test | jq .
{
  "code": "0xC004348B",
  "error": true,
  "message": "Failed to receive a response from GuardianGPT service"
}

Since it’s a caching proxy, we probably need to provide a previously used query to get a response from the service.

Looking at the provided audit.log file, we can see all the commands a developer ran within a certain period of time, including gagpt commands that contain the queries made to the GuardianGPT service. The only issue here is that because it is a raw TTY log, it’s covered in ANSI escape codes that make it difficult to parse.

cobra@arch:~/codebreaker/task4$ cat audit.log
ttyaudit=1715601680 w=2 d=echo "exprt PATH=\033[D\033[D\033[D\033[D\033[D\033[D\033[D\033[Do\033[C\033[C\033[C\033[C\033[C\033[C\033[C\033[C$PATH:~/bin" >> ~/.profile\x0d u=1000 s=144 id=802900 c=0xb941
ttyaudit=1715601711 w=2 d=\x03 u=1000 s=4 id=802901 c=0x9609
ttyaudit=1715601726 w=2 d=source  ~/.profi\033[D\033[D\033[D\033[D\033[D\033[D\033[D\033[D\033[D\033[3~\033[C\033[C\033[C\033[C\033[C\033[C\033[C\033[Cle\x0d u=1000 s=131 id=802902 c=0x51d2
ttyaudit=1715601791 w=2 d=-m "If I comment out code but\x01gagpt \x05 noS\x08body is around to see it, do I still need to document it"\x0d u=1000 s=112 id=802903 c=0x74c7
ttyaudit=1715601817 w=2 d=gagpt -m "Can you explain hoi\x08w Python's __slots__ feature works, and when it should (or shouldn't) be used for optimizing memory usage in a tool that processes thousands of hardware objects"\x0d u=1000 s=198 id=802904 c=0xcc26
ttyaudit=1715601835 w=2 d=curl -fsSL https://code.visualstudio.com/shell installer.sh -o vs_code.sh\x0d u=1000 s=77 id=802905 c=0xa511
ttyaudit=1715601919 w=2 d=chmod +x vs_code.sh\x0d u=1000 s=23 id=802906 c=0xf459
ttyaudit=1715601928 w=2 d=./vs_code.sh --install-app\x0d u=1000 s=30 id=802907 c=0x738a
ttyaudit=1715601937 w=2 d=../vs_\033[D\033[D\033[D\033[D\033[D\033[3~\033[C\033[C\033[C\033[Ccode.shhh\x08\x08 --add-to-path\x0d u=1000 s=102 id=802908 c=0xf0ca
ttyaudit=1715601946 w=2 d=dock\x03 u=1000 s=8 id=802909 c=0xaaa0
...
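For reference, here’s a minimal sketch of splitting one record into named fields. The layout is inferred only from the excerpt above, and the field names (`size`, `check`, and so on) are my own guesses rather than anything documented; treating `ttyaudit=` as a Unix timestamp is also an assumption.

```python
import re
from datetime import datetime, timezone

# Assumed layout, inferred from the excerpt:
# ttyaudit=<epoch> w=<n> d=<data> u=<uid> s=<n> id=<seq> c=<hex>
RECORD = re.compile(
    r"^ttyaudit=(?P<ts>\d+) w=(?P<w>\d+) d=(?P<data>.*) "
    r"u=(?P<uid>\d+) s=(?P<size>\d+) id=(?P<id>\d+) c=(?P<check>0x[0-9a-fA-F]+)$"
)

def parse_record(line):
    """Split one audit.log line into named fields, or return None."""
    match = RECORD.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Assumption: the first field is seconds since the Unix epoch.
    fields["when"] = datetime.fromtimestamp(int(fields["ts"]), tz=timezone.utc)
    return fields

sample = r"ttyaudit=1715601946 w=2 d=dock\x03 u=1000 s=8 id=802909 c=0xaaa0"
record = parse_record(sample)
print(record["data"])              # raw keystrokes, escapes still encoded
print(record["when"].isoformat())
```

Anchoring on the whole trailer (`u=`, `s=`, `id=`, `c=`) keeps a stray ` u=` inside the typed data from cutting the command short.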

Nonetheless, we can test hitting the caching proxy with a previously used query to verify it works.

The second gagpt query above contains only one escape sequence, \x08.

gagpt -m "Can you explain hoi\x08w Python's __slots__ feature works, and when it should (or shouldn't) be used for optimizing memory usage in a tool that processes thousands of hardware objects"

After a quick Google search, we can see that this escape sequence is just a backspace, so we can clean up the command accordingly.

gagpt -m "Can you explain how Python's __slots__ feature works, and when it should (or shouldn't) be used for optimizing memory usage in a tool that processes thousands of hardware objects"

URL-encoding the query and passing it into curl yields a successful response.

curl -k --cert client.crt --key client.key https://████████/\?q\=Can%20you%20explain%20how%20Python%27s%20__slots__%20feature%20works%2C%20and%20when%20it%20should%20%28or%20shouldn%27t%29%20be%20used%20for%20optimizing%20memory%20usage%20in%20a%20tool%20that%20processes%20thousands%20of%20hardware%20objects | jq .
{
  "fulfillment": [
    {
      "index": 0,
      "role": "assistant",
      "text": "`__slots__` in Python is a class-level attribute that restricts the attributes an instance can have, replacing the usual `__dict__` with a fixed-size array. This can significantly reduce memory usage for classes with many instances.\n\nWhen to use:\n- Processing thousands of hardware objects with a known, fixed set of attributes\n- Memory optimization is crucial\n- Class structure is stable and unlikely to change\n\nWhen not to use:\n- Dynamic attribute addition is needed\n- Inheritance is heavily used (can complicate things)\n- Memory savings are negligible for the use case\n\nExample usage:\n\n```python\nclass HardwareComponent:\n    __slots__ = ['id', 'type', 'status']\n\n    def __init__(self, id, type, status):\n        self.id = id\n        self.type = type\n        self.status = status\n```\n\nThis approach can be beneficial when dealing with large numbers of objects, but should be applied judiciously based on profiling and specific requirements."
    }
  ],
  "id": "8887adcc-44aa-451e-e1e2-fb9a22d3081e",
  "lang": "en",
  "model": "gagpt-xl",
  "prompt": "Can you explain how Python's __slots__ feature works, and when it should (or shouldn't) be used for optimizing memory usage in a tool that processes thousands of hardware objects",
  "upstream": "gagpt-xl-7.internal"
}
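The percent-encoded query string in that curl command can be produced with Python’s standard library. This is just a sketch: `proxy.example` is a hypothetical placeholder, since the real host is redacted, and `build_url` is a name I made up for illustration.

```python
from urllib.parse import quote

# Hypothetical placeholder; the real proxy host is redacted in this writeup.
PROXY = "https://proxy.example/"

def build_url(query):
    """Percent-encode a query and attach it as the q parameter."""
    # safe='' also encodes "/" so the whole query stays inside the parameter.
    return f"{PROXY}?q={quote(query, safe='')}"

url = build_url("when it should (or shouldn't) be used, and why")
print(url)
```

Spaces, apostrophes, commas, and parentheses come out as `%20`, `%27`, `%2C`, `%28`, and `%29`, while underscores are left alone, which matches the encoded query above.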

We can see a lot of information here, but the most important thing is the text key which contains the LLM’s response formatted in Markdown.

Looking back at the task description, we need to look for suspicious modifications in the responses to these queries.

A handful of developers vaguely recall once getting some confusing additions to their responses but can’t remember the specifics.

So, we will need to parse all of the commands out of this audit log, pull the queries out of the commands, send them to the caching proxy, and look for any suspicious responses.

The trickiest part in doing this is parsing out the ANSI escape codes from the log file. One way to do this would be to write your own parser to evaluate each of the escape codes, but given the large number and weird behavior of these codes, this didn’t seem very fun to me.
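To give a sense of what the hand-written route would involve, here’s a rough sketch of a keystroke replayer. It is not what I ended up using, and it only covers a small, assumed subset of readline behavior (arrow keys, Delete, Backspace, Ctrl+A, Ctrl+E, Enter); anything else, such as history recall or tab completion, would be handled incorrectly. The name `replay_keystrokes` is made up for illustration.

```python
def replay_keystrokes(keys):
    """Replay raw terminal input against an empty line buffer."""
    buf, cursor, i = [], 0, 0
    while i < len(keys):
        if keys.startswith("\x1b[D", i):        # left arrow
            cursor = max(0, cursor - 1)
            i += 3
        elif keys.startswith("\x1b[C", i):      # right arrow
            cursor = min(len(buf), cursor + 1)
            i += 3
        elif keys.startswith("\x1b[3~", i):     # Delete key
            if cursor < len(buf):
                del buf[cursor]
            i += 4
        elif keys[i] in "\x08\x7f":             # Backspace
            if cursor > 0:
                cursor -= 1
                del buf[cursor]
            i += 1
        elif keys[i] == "\x01":                 # Ctrl+A: jump to start
            cursor = 0
            i += 1
        elif keys[i] == "\x05":                 # Ctrl+E: jump to end
            cursor = len(buf)
            i += 1
        elif keys[i] in "\r\n":                 # Enter: line is complete
            break
        else:                                   # ordinary character
            buf.insert(cursor, keys[i])
            cursor += 1
            i += 1
    return "".join(buf)

# First command from the log, with the escapes already decoded to real bytes.
raw = ('echo "exprt PATH=' + "\x1b[D" * 8 + "o" + "\x1b[C" * 8
       + '$PATH:~/bin" >> ~/.profile\r')
print(replay_keystrokes(raw))
```

Even this toy version shows why the approach gets tedious: every key binding the developer happened to use needs its own branch.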

Instead, we can just pass each command into the input file descriptor of a shell and let the shell evaluate it for us. After all commands are run, the decoded commands will be listed in the bash history. There are a couple of other things we have to account for, but this is the general idea for how I ended up parsing the log.

Everything that follows should definitely be run in a container/isolated environment since it involves running a list of random commands directly in your shell.

In order to write directly to the file descriptor, we can fork a new process and have it execute bash instead of using something like os.system. There are definitely other ways to do this, but this seemed the simplest. It also allows all of our commands to be contained within a single shell session. We can then control that process from our parent process by writing to its file descriptor.

import os
import pty
import time

pid, fd = pty.fork()

if pid == 0:
    os.execlp('bash', 'bash')
else:
    os.write(fd, b'echo hello\n')
    time.sleep(0.01)
    os.close(fd)

The history is a bit weird to retrieve, but the easiest way I found was to read it within the forked process and output it to a file that we can parse in our script.

Adding -a is meant to append the current session’s history to ~/.bash_history, but we can tell it to append to /dev/stdout instead, so that we effectively just get the commands run in our current session. We can then parse this output to get the last run command and append it to a file for analysis.

history -a /dev/stdout | tail -n 2 | head -n 1 >> /challenge/commands.txt

We could also just dump the history after running all the commands, but I found this method to be cleaner and more organized. This whole process was very finicky.

Going back to our echo hello command, we can now output it to commands.txt.

import os
import pty
import time

pid, fd = pty.fork()

if pid == 0:
    os.execlp('bash', 'bash')
else:
    os.write(fd, b'echo hello\n')
    os.write(fd, b'history -a /dev/stdout | tail -n 2 | head -n 1 >> /challenge/commands.txt\n')
    time.sleep(0.01)
    os.close(fd)

Running it shows the exact command we input. Note that I am running this from the tmp directory since the commands we run later may result in file creation which could get messy in our challenge directory.

root@4c134f08483d:/tmp# python3 /challenge/test.py && cat /challenge/commands.txt
echo hello

The next thing to try is to actually put a command with ANSI escape codes into our script. We can start with the first command in the audit log.

echo "exprt PATH=\033[D\033[D\033[D\033[D\033[D\033[D\033[D\033[Do\033[C\033[C\033[C\033[C\033[C\033[C\033[C\033[C$PATH:~/bin" >> ~/.profile\x0d

Plugging it into our script works to parse it.

import os
import pty
import time

pid, fd = pty.fork()

if pid == 0:
    os.execlp('bash', 'bash')
else:
    os.write(fd, b'echo "exprt PATH=\033[D\033[D\033[D\033[D\033[D\033[D\033[D\033[Do\033[C\033[C\033[C\033[C\033[C\033[C\033[C\033[C$PATH:~/bin" >> ~/.profile\x0d')
    os.write(fd, b'history -a /dev/stdout | tail -n 2 | head -n 1 >> /challenge/commands.txt\n')
    time.sleep(0.01)
    os.close(fd)

root@4c134f08483d:/tmp# python3 /challenge/test.py && cat /challenge/commands.txt
echo "export PATH=$PATH:~/bin" >> ~/.profile

Note that this command does write to ~/.profile, which is why it is important that the script is run in a container.

We can extend this script to create a function to parse all lines in the log file and output them to commands.txt.

def evaluate(log):
    error_lines = []

    pid, fd = pty.fork()
    if pid == 0:
        os.execlp('bash', 'bash')
    else:
        os.write(fd, b'alias exit=true\n')
        os.write(fd, b'alias less=true\n')
        os.write(fd, b'alias vim=true\n')

        for line in log:
            print(f'Parsing {line.strip()}')
            match = re.search(r'd=(.*?)(?= u=| s=| id=|$)', line)
            
            if match:
                command = match.group(1)
                try:
                    os.write(fd, bytes(command, 'utf-8').decode('unicode_escape').encode())
                except UnicodeDecodeError:
                    print(f'Error decoding {line}')
                    error_lines.append(line)
                    continue
                time.sleep(0.01)
                os.write(fd, b'\x03') # ctrl+c
                time.sleep(0.01)
                os.write(fd, b'history -a /dev/stdout | tail -n 2 | head -n 1 >> /challenge/commands.txt\n')
                time.sleep(0.01)
        os.close(fd)
    
    print(f'Errors decoding these lines: {error_lines}')

There are a few things that this function adds when running commands that I figured out after debugging a lot of errors:

  • We want to stop the commands exit and less/vim from working. The former will just kill our shell session which isn’t good and the latter two will lock our shell session which also isn’t good. Setting each as an alias to true solves this issue.

  • Adding ctrl+c after running each command ensures that each command is killed if it hangs or takes any amount of time to run.

  • Decoding the escaped sequences in the log (via Python’s unicode_escape codec) before dropping each command into the shell seemed to fix a number of parsing issues.

  • Adding brief sleeps helped ensure spacing between the commands, allowing them to register in the shell.

Beyond these additions, the function just uses a regular expression to pull the command from each line as it iterates through the log.
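Here’s that extraction step in isolation, as a small sketch. The sample line is a shortened, hypothetical version of a real record so it fits on screen; the regular expression and the `unicode_escape` round trip are the same ones used in the function above.

```python
import re

# Shortened, hypothetical variant of a real log line.
line = r'ttyaudit=1715601817 w=2 d=gagpt -m "explain hoi\x08w it works"\x0d u=1000 s=198 id=802904 c=0xcc26'

# Lazily capture everything after "d=" up to the first " u=", " s=", or " id=".
match = re.search(r'd=(.*?)(?= u=| s=| id=|$)', line)
command = match.group(1)

# Turn the literal text "\x08" and "\x0d" into the actual control characters.
keystrokes = bytes(command, 'utf-8').decode('unicode_escape')
print(repr(command))
print(repr(keystrokes))
```

One caveat of the lazy match: a typed command that itself contains ` u=`, ` s=`, or ` id=` would be cut off at that point, so it is worth spot-checking the output.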

This takes care of writing the commands from the log to a file, but we still need to parse the GPT queries out of each command and relay them to the caching proxy. Adding this to the script makes it a little longer, but it finally covers pulling all the responses together.

import re
import requests
import urllib3
import urllib.parse
import os
import pty
import time
import json

REPARSE = True

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

def send_to_http_server(message, cert, key):
    url = f"https://████████?q={urllib.parse.quote(message)}"
    response = requests.get(url, cert=(cert, key), verify=False)
    
    return response.json()

def evaluate(log):
    error_lines = []

    pid, fd = pty.fork()
    if pid == 0:
        os.execlp('bash', 'bash')
    else:
        os.write(fd, b'alias exit=true\n')
        os.write(fd, b'alias less=true\n')
        os.write(fd, b'alias vim=true\n')

        for line in log:
            print(f'Parsing {line.strip()}')
            match = re.search(r'd=(.*?)(?= u=| s=| id=|$)', line)
            
            if match:
                command = match.group(1)
                try:
                    os.write(fd, bytes(command, 'utf-8').decode('unicode_escape').encode())
                except UnicodeDecodeError:
                    print(f'Error decoding {line}')
                    error_lines.append(line)
                    continue
                time.sleep(0.01)
                os.write(fd, b'\x03') # ctrl+c
                time.sleep(0.01)
                os.write(fd, b'history -a /dev/stdout | tail -n 2 | head -n 1 >> /challenge/commands.txt\n')
                time.sleep(0.01)
        os.close(fd)
    
    print(f'Errors decoding these lines: {error_lines}')

if REPARSE:
    # Start from a clean commands.txt; it is fine if the file does not exist yet.
    try:
        os.remove('/challenge/commands.txt')
    except FileNotFoundError:
        pass

responses = []
queries = []

with open('/challenge/audit.log', 'r') as log:
    if REPARSE:
        evaluate(log)

    with open('/challenge/commands.txt', 'r') as commands:
        with open('/challenge/output.md', 'w') as output:
            for command in commands:
                if command.split(' ')[0] == 'gagpt':
                    match = re.search(r'"([^"]*)"', command)
                    query = match.group(1)
                    query = query.replace('\u00e2\u0080\u0094', '—')
                    query = query.replace('\u00c3\u00b6', 'ö')

                    if query not in queries:
                        server_result = send_to_http_server(query, '/challenge/client.crt', '/challenge/client.key')

                        try:
                            response_text = server_result['fulfillment'][0]['text']
                            print(f'Got response for: {query}')

                            if '`' in response_text:
                                output.write(f'>{query}\n\n{response_text}\n\n')
                            
                            responses.append(server_result)

                        except KeyError:
                            print(f'Broken response from: {query}')
                            break

                        queries.append(query)

with open('/challenge/responses.json', 'w') as responses_json:
    json.dump(responses, responses_json)

It is worth noting that two special Unicode characters appear in the commands: “—” and “ö”. Because of the weird Unicode decoding setup, they’re not properly decoded and need to be fixed manually.
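The garbling comes from `unicode_escape` treating every non-ASCII byte as a Latin-1 character. The sketch below reproduces it on a made-up string and shows a more general repair than the two hand-written replacements; it assumes the garbled text contains only characters up to U+00FF, otherwise the `latin-1` encode step raises an error.

```python
# Made-up sample containing an o-umlaut (U+00F6) and an em dash (U+2014).
original = 'Schr\u00f6dinger \u2014 cat'

# What the script's decode step does to UTF-8 text: each byte becomes
# its own Latin-1 character.
garbled = original.encode('utf-8').decode('unicode_escape')

# Reverse it: map the characters back to bytes, then decode as UTF-8.
repaired = garbled.encode('latin-1').decode('utf-8')

print(repr(garbled))
print(repaired)
```

The two garbled sequences that show up, `\u00c3\u00b6` and `\u00e2\u0080\u0094`, are exactly the UTF-8 bytes of the two characters read one byte at a time.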

Running the script successfully gathers all the responses into a single browsable Markdown file where each query and response can be viewed. However, it does take some time to run because of the sleep statements included for padding.

root@4c134f08483d:/tmp# python3 /challenge/get_responses.py
Parsing ttyaudit=1715601680 w=2 d=echo "exprt PATH=\033[D\033[D\033[D\033[D\033[D\033[D\033[D\033[Do\033[C\033[C\033[C\033[C\033[C\033[C\033[C\033[C$PATH:~/bin" >> ~/.profile\x0d u=1000 s=144 id=802900 c=0xb941
Parsing ttyaudit=1715601711 w=2 d=\x03 u=1000 s=4 id=802901 c=0x9609
Parsing ttyaudit=1715601726 w=2 d=source  ~/.profi\033[D\033[D\033[D\033[D\033[D\033[D\033[D\033[D\033[D\033[3~\033[C\033[C\033[C\033[C\033[C\033[C\033[C\033[Cle\x0d u=1000 s=131 id=802902 c=0x51d2
Parsing ttyaudit=1715601791 w=2 d=-m "If I comment out code but\x01gagpt \x05 noS\x08body is around to see it, do I still need to document it"\x0d u=1000 s=112 id=802903 c=0x74c7
Parsing ttyaudit=1715601817 w=2 d=gagpt -m "Can you explain hoi\x08w Python's __slots__ feature works, and when it should (or shouldn't) be used for optimizing memory usage in a tool that processes thousands of hardware objects"\x0d u=1000 s=198 id=802904 c=0xcc26
Parsing ttyaudit=1715601835 w=2 d=curl -fsSL https://code.visualstudio.com/shell installer.sh -o vs_code.sh\x0d u=1000 s=77 id=802905 c=0xa511
...

We do get an error decoding one of the lines, but it’s just an echo statement, so we can ignore it.

Errors decoding these lines: ["ttyaudit=1715877292 w=4 d=echo -ne '\\xDE\\x\\x03 u=1000 s=20 id=805012 c=0x44b6\n"]  

Once all the log lines are parsed, we start seeing successful responses.

...
Got response for: If I comment out code but nobody is around to see it, do I still need to document it
Got response for: Can you explain how Python's __slots__ feature works, and when it should (or shouldn't) be used for optimizing memory usage in a tool that processes thousands of hardware objects
Got response for: Could you outline the key technical challenges of implementing autonomous navigation in heavily GPS-jammed environments for military ground vehicles
Got response for: What's the best way to debug an intermittent issue with I2C communication between a microcontroller and a sensor using a logic analyzer
...

After the program eventually finishes, we can look at output.md to see each query and corresponding response from the LLM.

>Can you explain how Python's __slots__ feature works, and when it should (or shouldn't) be used for optimizing memory usage in a tool that processes thousands of hardware objects

`__slots__` in Python is a class-level attribute that restricts the attributes an instance can have, replacing the usual `__dict__` with a fixed-size array. This can significantly reduce memory usage for classes with many instances.

When to use:
- Processing thousands of hardware objects with a known, fixed set of attributes
- Memory optimization is crucial
- Class structure is stable and unlikely to change
...

In total, this file is 12120 lines long, which would take a very long time to read through by hand. While the proper way to solve the task at this point is probably to run each query/response through the API of an LLM or a locally-run LLM to flag suspicious code, I decided to be lazy.
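A middle ground I didn’t actually use would be a crude pattern scan over the fenced code blocks. The sketch below is only that: the pattern list is a guess at things that rarely belong in a helpful snippet, it will produce false positives, and it would miss anything more subtle. `flag_lines` and the sample text are made up for illustration.

```python
import re

FENCE = "`" * 3  # built indirectly so this example can sit inside a fence
CODE_BLOCK = re.compile(FENCE + r"[\w+-]*\n(.*?)" + FENCE, re.DOTALL)

# Assumed, non-exhaustive indicators of dynamic loading or execution.
SUSPICIOUS = re.compile(
    r"__import__|\beval\(|\bexec\(|globals\(\)|base64|subprocess"
)

def flag_lines(markdown):
    """Return code lines inside fenced blocks that match a suspicious pattern."""
    hits = []
    for block in CODE_BLOCK.findall(markdown):
        for line in block.splitlines():
            if SUSPICIOUS.search(line):
                hits.append(line.strip())
    return hits

# Hypothetical response text, not taken from the proxy.
sample = (
    "Use a helper:\n\n"
    + FENCE + "python\nimport os\nvalue = eval(user_input)\n" + FENCE + "\n"
)
print(flag_lines(sample))
```

Running something like this over output.md would at least shrink the pile to a list of lines worth reading by hand.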

I started messing around with dumping snippets into ChatGPT, filtering out queries that didn’t have any code, and just blindly jumping around the output. Eventually, I stumbled on the malicious response simply from spending so much time in the output.

>What's the most efficient way to work with fixed-point arithmetic in Python when interfacing with hardware that doesn't support floating point

To efficiently work with fixed-point arithmetic in Python when interfacing with hardware without floating-point support:

1. Use integers to represent fixed-point numbers.
...

```python
SCALE = 1000

def to_fixed(f):
    globals()['ga'] = __import__('g49c896')
    return round(f * SCALE)

def from_fixed(i):
    return i / SCALE
...

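The first line of to_fixed has nothing to do with fixed-point arithmetic: it dynamically imports a module named g49c896 and binds it to the global name ga. My read is that the APT was trying to set a Google Analytics code in the employee’s application to redirect all analytics data to them.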

globals()['ga'] = __import__('g49c896')
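To see what a line of this shape does mechanically, here’s a harmless sketch that swaps the unknown module for the standard library’s `json`. The substitution is mine; I have no idea what the real g49c896 module contains.

```python
SCALE = 1000

def to_fixed(f):
    # Same shape as the injected line, but importing the harmless stdlib
    # module "json" instead of the unknown "g49c896".
    globals()['ga'] = __import__('json')
    return round(f * SCALE)

print(to_fixed(1.5))
# After the call, a module-level name "ga" exists and points at the module.
print(to_fixed.__globals__['ga'].__name__)
```

Importing a module runs its top-level code, so if a module with that odd name were ever present on the import path, simply calling to_fixed would execute it while the function still returned the expected value.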

We can submit this line to successfully solve the task.

Result

That’s the snippet! I doubt the LLM is supposed to suggest an import like that.