Binder Tracing Part 2 - Extracting and Parsing the data

14 Nov 2022 - A Cyber First Summer Placement

In the first part of this article, we saw how Android’s Binder subsystem works internally, and how we can determine the structure of a captured Parcel by analysing the AOSP source code. Now we will see how to extract this data from a running system and display it in a live tracing tool.

Here is the first part of this article in case you need to catch up.

Hooking into Android

To collect the data from a target device, we use Frida, a dynamic instrumentation toolkit, to inject our code into the libbinder.so library. The injected code sends the captured parcels along an ADB connection to a Python script running on a connected PC.

The easiest way to get Frida running on our target is to use frida-server, a process that runs as root on the phone and performs our injection. There are other ways to get Frida running, such as using LD_PRELOAD to load a Frida library, which can be useful where you can’t easily get root access (for example if the application you’re trying to analyze acts differently on a development/rooted device). These are more complex, however, and usually only allow injection into a single process.

Either way, once Frida is running and connected to the target process, we can inject our script, which is a JavaScript program that runs on the device:

Interceptor.attach(Module.getExportByName("libbinder.so",
        "_ZN7android14IPCThreadState8transactEijRKNS_6ParcelEPS1_j"), { // IPCThreadState::transact mangled
    onEnter: function(args) {
        // Called before entering our function.
        // args[0] is the implicit `this` pointer, so the declared
        // parameters (handle, code, data, reply, flags) start at args[1].
    },

    onLeave: function(retval) {
        // Called after leaving our function
    }
});

This will intercept calls to _ZN7android14IPCThreadState8transactEijRKNS_6ParcelEPS1_j (which is the mangled name of IPCThreadState::transact) and call our provided functions just before and after execution of the actual function. Whilst we could hook directly into IBinder::transact, as described above, there are a few conditions involving system stability levels that can cause a transaction in IBinder::transact to be cancelled. This is not the case with IPCThreadState::transact - by the time that is called from the Binder, the transaction will always be (at least) attempted.

The precise signature of IPCThreadState::transact is:

status_t IPCThreadState::transact(int32_t handle,
                                    uint32_t code, const Parcel& data,
                                    Parcel* reply, uint32_t flags)

Once we enter our onEnter function, we can access the arguments using the passed args array, and therefore get pointers to both the data and reply Parcels. We immediately save the data Parcel, and keep the address of the reply until onLeave, by which point the call will have populated it, and then save that too. Once this is done, we send both back to our Python script, where the main parsing and display logic lives. The JavaScript code is purposefully kept lightweight, because it blocks the Android system while it runs, and this can cause either the app or the system as a whole to crash.
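On the PC side, the receiving end of this exchange could look roughly like the sketch below. The (message, data) callback shape is the one Frida's Python bindings use for script messages; the payload keys ("kind", "code", "flags") and the captured queue are hypothetical names chosen for illustration, since the article does not spell out the message format.

```python
import queue

# Hypothetical hand-off point between the Frida callback and the parser.
captured = queue.Queue()

def on_message(message, data):
    """Handle one message sent by the injected script.

    `message` is a dict with a "type" key; `data` holds any raw bytes the
    script attached to its send() call (here: the Parcel contents).
    """
    if message.get("type") == "error":
        # Exceptions raised inside the injected script are reported here.
        print("script error:", message.get("description"))
        return
    if message.get("type") != "send" or data is None:
        return
    payload = message["payload"]
    captured.put({
        "kind": payload["kind"],    # assumed: "call" or "reply"
        "code": payload["code"],    # assumed: transaction code
        "flags": payload["flags"],  # assumed: transaction flags
        "raw": bytes(data),         # the captured Parcel bytes
    })
```

With the Frida bindings, a callback like this would typically be registered with script.on("message", on_message) before the script is loaded. Keeping the callback this small mirrors the point above: do as little as possible on the capture path and leave the parsing for later.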

Headers

At the start of the proxy getVolumes() method we looked at earlier, we see the line

    _data.writeInterfaceToken(DESCRIPTOR);

This interface token is written only to the call (or data) Parcels, and contains a few flags and a string containing the class name of the interface the call is meant for. The remote service verifies it to check that the call is actually intended for it. (The only real way a mismatch could happen is an application either writing directly to Binders rather than casting to the appropriate interface, or casting to the wrong interface.) For the return value, the only header written is an int32 that indicates whether an exception occurred. If this is nonzero, then an exception occurred and the exception data follows rather than the return value data.
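As a rough illustration, both headers can be read with a few struct calls. The layout assumed here (three 32-bit values followed by a length-prefixed UTF-16 string, padded to four bytes) is inferred from the sample Parcel bytes in the JSON output later in this article and may differ between Android versions; the field names policy, work_source and marker are labels chosen for this sketch.

```python
import struct

def read_string16(buf, pos):
    """Read a length-prefixed UTF-16LE string, returning (text, new_pos)."""
    (length,) = struct.unpack_from("<i", buf, pos)
    pos += 4
    if length < 0:
        return None, pos  # a negative length marks a null string
    text = buf[pos:pos + 2 * length].decode("utf-16-le")
    # Skip the characters plus a 2-byte terminator, rounded up to 4 bytes.
    pos += (2 * (length + 1) + 3) & ~3
    return text, pos

def parse_call_header(buf):
    """Parse the interface token at the start of a call Parcel."""
    policy, work_source, marker = struct.unpack_from("<IiI", buf, 0)
    descriptor, pos = read_string16(buf, 12)
    header = {"policy": policy, "work_source": work_source,
              "marker": marker, "descriptor": descriptor}
    return header, pos  # pos is where the call arguments begin

def parse_reply_header(buf):
    """Return (exception_code, pos); nonzero means exception data follows."""
    (code,) = struct.unpack_from("<i", buf, 0)
    return code, 4

# Header bytes mirroring the start of the sample Parcel in the JSON output.
sample = (bytes.fromhex("040000c2" "ffffffff" "54535953")
          + struct.pack("<i", 27)
          + "android.view.IWindowSession".encode("utf-16-le")
          + b"\x00\x00")
header, body_start = parse_call_header(sample)
print(header["descriptor"], body_start)
```

For this sample the header occupies the first 72 bytes, which agrees with the first parsed field in the JSON example starting at offset 72.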

Message Headers

Parsing

When a call Parcel is received, the program first checks if an override exists for the descriptor. Overrides are custom parsers implemented in Python for cases where the automatic parser generation fails, and as a result they need to be updated manually whenever a new Android version is targeted, if the underlying Parcelable has changed. For example, the Bitmap class calculates the size of the actual bitmap data using a process involving several different previously read values, so we need to write an override for it.

If no override exists, then we check if an associated .struct file exists for the interface descriptor. If it does, then we can read it as below. If not, then either the struct files haven’t been generated yet, or this interface doesn’t use AIDL (there are a few that for some reason do not). The Python Parcel type has a partial reimplementation of the Android Parcel API, with a few modifications. I condensed the different methods of reading a Parcelable into two: readParcelable, which takes a string of the type to read, and readDynamicParcelable, which reads the same string from the parcel itself. The methods that add a null check are instead implemented as a conditional, which greatly simplifies the logic of the Parcel class:

{"nullcheck": "readInt32"},
{
    "__backreference": "nullcheck",
    "__conditional": [{
        "disk": "readParcelable",
        "__parcelType": "android.os.storage.DiskInfo"
    }]
},
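The lookup order described above (override first, then a generated .struct file, otherwise unparsed) could be sketched as follows. The OVERRIDES registry, the override decorator, and the assumption that each struct file is JSON named after its descriptor are all hypothetical; the article does not show how the real tool organises these.

```python
import json
from pathlib import Path

# Hypothetical registry: descriptor -> hand-written parser function.
OVERRIDES = {}

def override(descriptor):
    """Decorator registering a custom parser for one descriptor."""
    def register(func):
        OVERRIDES[descriptor] = func
        return func
    return register

def select_parser(descriptor, struct_dir):
    """Return ("override", func), ("struct", fields) or ("missing", None)."""
    if descriptor in OVERRIDES:
        return "override", OVERRIDES[descriptor]
    # Assumed naming scheme: <struct_dir>/<descriptor>.struct holding JSON.
    struct_path = Path(struct_dir) / f"{descriptor}.struct"
    if struct_path.is_file():
        return "struct", json.loads(struct_path.read_text())
    return "missing", None
```

Checking overrides first means a hand-written parser always wins over a generated one, which matches the role overrides play as a fallback for cases the generator gets wrong.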

The method names in the .struct file are looked up on the Parcel API using getattr and called directly, and the returned data is collected into a ParsedParcel object, which stores both the name:value pairs that were read and the locations they were read from. To unpack the data from the raw blob sent from Android, Python’s struct module is used:

        def readInt32(self):
            b = struct.unpack_from("<i", self.data, self.pos)
            self.add(4)
            return b[0]
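Putting these pieces together, a struct definition could be interpreted along the following lines. This is only a sketch: MiniParcel stands in for the real Parcel class and implements readInt32 alone, and the semantics assumed for __backreference and __conditional (read the nested fields only when the referenced value is nonzero) are a guess based on the fragment shown earlier.

```python
import struct

class MiniParcel:
    """Tiny stand-in for the article's Parcel class (readInt32 only)."""
    def __init__(self, data):
        self.data, self.pos = data, 0

    def readInt32(self):
        (value,) = struct.unpack_from("<i", self.data, self.pos)
        self.pos += 4
        return value

def walk(fields, parcel, out=None, seen=None):
    """Interpret struct entries, recording (name, value, (start, end))."""
    out = [] if out is None else out
    seen = {} if seen is None else seen
    for entry in fields:
        if "__conditional" in entry:
            # Assumed semantics: only read the nested fields when the
            # earlier value named by __backreference is nonzero.
            if seen.get(entry["__backreference"]):
                walk(entry["__conditional"], parcel, out, seen)
            continue
        for name, method in entry.items():
            if name.startswith("__"):
                continue  # metadata such as __parcelType is skipped here
            start = parcel.pos
            value = getattr(parcel, method)()  # dispatch by method name
            seen[name] = value
            out.append((name, value, (start, parcel.pos)))
    return out

fields = [
    {"nullcheck": "readInt32"},
    {"__backreference": "nullcheck",
     "__conditional": [{"value": "readInt32"}]},
]
present = walk(fields, MiniParcel(struct.pack("<ii", 1, 42)))
absent = walk(fields, MiniParcel(struct.pack("<i", 0)))
```

Recording the start and end offsets alongside each value is what later makes it possible to highlight the matching bytes in a hexdump.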

Once everything has been written to the ParsedParcel, it is placed in a multiprocessing queue to be read by the UI thread.

Output

I ended up with several modes of output: a TUI, pcapng files, and raw JSON output.

TUI

I made a relatively simple TUI for the project, consisting of a three-pane layout:

TUI

On the leftmost pane, each intercepted parcel is listed. The colour is the type of parcel:

Colour	Type
Blue	Call
Purple	Oneway Call (used for calls that should not block)
Yellow	Return
Red	Parse Error
White	Interface file not found (filtered by default)

TUI2

The number on the right is a sequential counter of each intercepted parcel. This is useful when things are filtered out to see how much is being missed, or to remember a parcel for later.

The middle pane contains a hexdump of the parcel itself, with the headers included. The right pane shows the parsed data in a hierarchical list. You can select parts of the list, and the portion of the hexdump that corresponds to the data is highlighted too!

TUI Filter

Finally, there’s a two-stage filtering system, with exclusion filters removing items from the view and inclusion filters adding removed items back again selectively. The Interface and Method filter fields support regex, which should allow for some relatively complex filtering behavior. I also added a quick way to toggle filters on and off, using TAB.
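The two-stage behaviour can be expressed quite compactly. In the sketch below, the Filter class and the is_visible function are hypothetical names, and each filter is assumed to carry one regex for the interface and one for the method.

```python
import re
from dataclasses import dataclass

@dataclass
class Filter:
    interface: str = ".*"  # regex searched for in the interface name
    method: str = ".*"     # regex searched for in the method name
    enabled: bool = True   # lets a filter be toggled without deleting it

    def matches(self, if_name, call_name):
        return (self.enabled
                and re.search(self.interface, if_name) is not None
                and re.search(self.method, call_name) is not None)

def is_visible(if_name, call_name, exclude, include):
    """Stage 1 hides anything excluded; stage 2 re-adds anything included."""
    excluded = any(f.matches(if_name, call_name) for f in exclude)
    if not excluded:
        return True
    return any(f.matches(if_name, call_name) for f in include)

exclude = [Filter(interface=r"^android\.view\.")]
include = [Filter(method=r"^onRectangle")]
print(is_visible("android.view.IWindowSession", "relayout", exclude, include))
print(is_visible("android.view.IWindowSession",
                 "onRectangleOnScreenRequested", exclude, include))
```

Here everything under android.view is hidden except methods whose names start with onRectangle, which the inclusion filter adds back.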

TUI4

If parsing of a Parcel fails, as much information as possible is shown. Here, most of the data was successfully parsed, aside from a runaway string near the end.

TUI5

pcap-ng output

The pcap-ng format is a relatively new file format for network packet capture, superseding the old pcap format. It’s very versatile, allowing any type of custom data to be stored in a structured format. This makes it good for storing the dumped Binder data. Using the -w flag you can write Binder data to a pcapng file, for later use in Wireshark or any other program that supports the format:

Wireshark

Unfortunately, since a Wireshark dissector for the data does not exist, Wireshark cannot parse each Parcel, and just displays it as data. Given more time, I would have tried to write a dissector, but this would be a non-trivial task due to the complexity of the Parcel format and the dependency on the parsed struct files. Whilst it would be theoretically possible to generate the struct files on demand as the parcel types appear in Wireshark, this would be extremely slow in practice (each struct file currently takes about 3 seconds to generate on my relatively fast laptop), and it would introduce a dependency on the AOSP sources, which is even worse.

A different approach would be to store the parsed data in the pcap-ng file at output time, rather than trying to reconstruct this later. This is possible using Custom block types, but showing this would require changes to Wireshark itself, rather than a plug-in dissector. It would also make the generated files quite large.

The pcapng files my program generates have a few custom fields to store the interface and call codes, which lets them be read back into and properly parsed by the program if needed.
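To give a feel for the format, here is a minimal sketch of a pcapng writer using only the three standard block types needed for a capture. It leaves out the custom fields mentioned above, and the choice of link type 147 (LINKTYPE_USER0, reserved for private use) is an assumption rather than what the tool necessarily uses.

```python
import struct

def _block(block_type, body):
    """Wrap a body in the generic pcapng framing: type, length, body, length."""
    body += b"\x00" * (-len(body) % 4)  # pad to a 32-bit boundary
    total = len(body) + 12
    return (struct.pack("<II", block_type, total) + body
            + struct.pack("<I", total))

def section_header():
    # Byte-order magic, version 1.0, section length unknown (-1).
    return _block(0x0A0D0D0A, struct.pack("<IHHq", 0x1A2B3C4D, 1, 0, -1))

def interface_description(linktype=147, snaplen=0):
    # 147 is LINKTYPE_USER0; using it for Binder data is an assumption.
    return _block(1, struct.pack("<HHI", linktype, 0, snaplen))

def enhanced_packet(data, timestamp_us, interface_id=0):
    # The timestamp is split into high and low 32-bit halves.
    body = struct.pack("<IIIII", interface_id,
                       timestamp_us >> 32, timestamp_us & 0xFFFFFFFF,
                       len(data), len(data)) + data
    return _block(6, body)

def write_capture(path, parcels):
    """Write (timestamp_us, raw_bytes) pairs to a pcapng file."""
    with open(path, "wb") as fh:
        fh.write(section_header())
        fh.write(interface_description())
        for timestamp_us, raw in parcels:
            fh.write(enhanced_packet(raw, timestamp_us))
```

Because no interface options are written, readers fall back to the default timestamp resolution of microseconds, which is why the sketch takes its timestamps in that unit.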

JSON output

For exporting the parsed data itself, the program can also dump the written parcels in JSON format. This contains both the original parcel data and the structured, parsed data:

    [
      {
        "py/object": "parsedParcel.Block",
        "ifName": "android.view.IWindowSession",
        "callName": "onRectangleOnScreenRequested",
        "code": 27,
        "parcel": {
          "py/object": "parcel.Parcel",
          "data": {
            "py/b64": "BAAAwv////9UU1lTGwAAAGEAbgBkAHIAbwBpAGQALgB2AGkAZQB3AC4ASQBXAGkAbgBkAG8AdwBTAGUAcwBzAGkAbwBuAAAAhSpicxMBAABQVnPyAAAAABBIZfIAAAAADAAAAAEAAADoAgAAwAAAAOwCAAD7AAAA"
          },
          "pos": 120,
          "data_size": 120
        },
        "oneway": false,
        "direction": {
          "py/reduce": [
            {"py/type": "parsedParcel.Direction"},
            {"py/tuple": [1]}
          ]
        },
        "parsedParcel": {
          "py/object": "parsedParcel.ParsedParcel",
          "data": [
            {
              "py/tuple": [
                "token",
                "Strong Binder @0xf2735650",
                {"py/tuple": [72,100]}
                  ]
                }
              ]
            },
    ...

The structure of this file may seem a little odd, but this is because it is intended to be usable both for reading back into Python objects easily with the jsonpickle module, and for applications that just read the JSON directly. The advantage of the jsonpickle module is that, when the file is read back, it restores the original fully featured objects, which makes it much easier to extend the program by using its existing functions as a library.
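For the second use case, reading the JSON directly, something like the sketch below would be enough to pull out the raw Parcel bytes without jsonpickle. The load_parcels helper is a hypothetical name, and the sample record is trimmed to the keys the function uses, with the payload cut down to its first 12 bytes for brevity.

```python
import base64
import json

def load_parcels(text):
    """Yield (interface, method, code, raw_bytes) for each dumped block."""
    for block in json.loads(text):
        # jsonpickle stores bytes as base64 under the "py/b64" key.
        raw = base64.b64decode(block["parcel"]["data"]["py/b64"])
        yield block["ifName"], block["callName"], block["code"], raw

# Trimmed record: only the keys used above, payload truncated.
sample = json.dumps([{
    "ifName": "android.view.IWindowSession",
    "callName": "onRectangleOnScreenRequested",
    "code": 27,
    "parcel": {"data": {"py/b64": "BAAAwv////9UU1lT"}},
}])
rows = list(load_parcels(sample))
print(rows[0][0], rows[0][1], rows[0][3].hex())
```

The py/object and py/reduce keys only matter to jsonpickle, so a plain reader like this can simply ignore them.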

Further steps

There are a multitude of different ways the project could be extended, given more time:

Add support for multiple versions of the same struct in a single file, which would simplify the procedure for changing Android versions.
Improve the Java Parcelable parsing logic to remove the need for overrides.
Add support for C++ Parcelables.
Improve the filter interface so that it can filter on the parsed structure of the parcel, and maybe add autocompletion for the text fields.
Rework the way the generator finds the Java files for each Parcelable, potentially removing the need for a 2-pass approach.
As described above, implement a dissector / plugin so that Wireshark can parse the data.
