Binder Tracing Part 2 - Extracting and Parsing the data
14 Nov 2022 - A Cyber First Summer Placement
In the first part of this article, we saw how Android’s Binder subsystem works internally, and how we can determine the structure of a captured Parcel by analysing the AOSP source code. Now we will see how to extract this data from a running system and display it in a live tracing tool.
Here is the first part of this article in case you need to catch up.
Hooking into Android
To collect the data from a target device, we use frida, a dynamic instrumentation toolkit, to inject our code into the libbinder.so library. The injected code sends the captured parcels along an ADB connection to a Python script running on a connected PC.
The easiest way to get frida running on our target is to use frida-server, which is a process that can run as root on the phone and perform our injection. There are other ways to get frida running, which can be useful where you can’t easily get root access (for example if the application you’re trying to analyze acts differently on a development/rooted device), such as using LD_PRELOAD to load a frida library, but these are more complex and usually only allow injection into a single process.
Either way, once frida is running and connected to the target process, we can inject our script, which is a JavaScript program that runs on the device:
Interceptor.attach(Module.getExportByName("libbinder.so",
    "_ZN7android14IPCThreadState8transactEijRKNS_6ParcelEPS1_j"), { // IPCThreadState::transact mangled
  onEnter: function(args) {
    // Called before entering our function
  },
  onLeave: function(retval) {
    // Called after leaving our function
  }
});
This will intercept calls to _ZN7android14IPCThreadState8transactEijRKNS_6ParcelEPS1_j (which is the mangled name of IPCThreadState::transact) and call our provided functions just before and after execution of the actual function. Whilst we could hook directly into IBinder::transact, as described in the first part, there are a few conditions involving system stability levels that can cause a transaction in IBinder::transact to be cancelled. This is not the case with IPCThreadState::transact: by the time that is called from the Binder, the transaction will always be (at least) attempted.
The precise signature of IPCThreadState::transact is:
status_t IPCThreadState::transact(int32_t handle,
uint32_t code, const Parcel& data,
Parcel* reply, uint32_t flags)
Once we enter our onEnter function, we can access the arguments using the passed args array, and therefore get pointers to both the data and reply Parcels. We immediately save the data Parcel, and keep the address of the reply until onLeave, by which point it will have been populated by the call, and then save that too. Once this is done, we send both back to our Python script, where the main parsing and display logic lives. The JavaScript code is purposefully kept lightweight, because it blocks the Android system while it runs, and this can cause either the app or the system as a whole to crash.
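On the host side, frida's Python bindings hand each `send()` from the injected script to a callback as a `(message, data)` pair, where `message` is a dict and `data` is the optional binary payload. The following is a minimal sketch of such a callback that pairs each call Parcel with its reply. The payload fields (`kind`, `id`, `code`, `flags`) and the `Collector` and `Transaction` names are hypothetical and are not the tool's actual wire format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Transaction:
    code: int                      # transaction code passed to transact()
    flags: int                     # transaction flags
    data: bytes                    # raw bytes of the call Parcel
    reply: Optional[bytes] = None  # raw bytes of the reply Parcel, if any


class Collector:
    """Pairs call and reply Parcels using a hypothetical per-call id."""

    def __init__(self) -> None:
        self.pending: Dict[int, Transaction] = {}
        self.finished: List[Transaction] = []

    def on_message(self, message: dict, data: Optional[bytes]) -> None:
        # Script errors arrive with type "error"; only "send" carries a payload.
        if message.get("type") != "send":
            return
        payload = message["payload"]
        if payload["kind"] == "call":      # assumed to be sent from onEnter
            self.pending[payload["id"]] = Transaction(
                payload["code"], payload["flags"], data or b"")
        elif payload["kind"] == "reply":   # assumed to be sent from onLeave
            txn = self.pending.pop(payload["id"], None)
            if txn is not None:
                txn.reply = data
                self.finished.append(txn)
```

With a real frida session, a callback like this would be registered with `script.on("message", collector.on_message)`; keeping the pairing logic on the host is one way to keep the injected script small.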
Headers
At the start of the proxy getVolumes() method we looked at earlier, we see the line:
_data.writeInterfaceToken(DESCRIPTOR);
This interface token is written only to the call (or data) Parcels, and contains a few flags and a string containing the class name of the interface the call is meant for. This is verified by the remote service, to check that the call is actually intended for it. (The only real way a mismatch could happen is an application either writing directly to Binders rather than casting to the appropriate interface, or casting to the wrong interface.) For the return value, the only header written is an int32 value that indicates whether an exception occurred. If this is nonzero, then an exception occurred and the exception data follows rather than the return value data.
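As a rough illustration of these headers, the sketch below decodes the interface token from the first 72 bytes of the example call Parcel shown in the JSON output section. The layout it assumes (a strict-mode policy word, a work-source UID, a four-byte marker, then a length-prefixed UTF-16 string) is inferred from that sample and from recent AOSP sources, and it differs between Android versions. The function names are illustrative only.

```python
import base64
import struct

# First 72 bytes of the sample call Parcel from the JSON output section.
SAMPLE = base64.b64decode(
    "BAAAwv////9UU1lTGwAAAGEAbgBkAHIAbwBpAGQALgB2AGkAZQB3AC4ASQBXAGkA"
    "bgBkAG8AdwBTAGUAcwBzAGkAbwBuAAAA"
)


def read_interface_token(buf: bytes):
    """Return (strict_policy, work_source_uid, marker, descriptor, next_offset)."""
    # Assumed layout: uint32 flags, int32 uid, 4-byte marker, int32 string length.
    strict_policy, work_source, marker, length = struct.unpack_from("<Ii4si", buf, 0)
    start = 16
    descriptor = buf[start:start + 2 * length].decode("utf-16-le")
    # The string is followed by a 2-byte terminator, then padded to 4 bytes.
    end = (start + 2 * (length + 1) + 3) & ~3
    return strict_policy, work_source, marker, descriptor, end


def reply_has_exception(reply: bytes) -> bool:
    """A reply starts with an int32 exception code; nonzero means an exception."""
    (code,) = struct.unpack_from("<i", reply, 0)
    return code != 0
```

On the sample, the descriptor decodes to `android.view.IWindowSession` and the returned offset is 72, which is where the first parsed field in the JSON excerpt begins.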
Parsing
When a call Parcel is received, the program first checks if an override exists for the descriptor. Overrides are custom parsers implemented in Python for cases where the automatic parser generation fails for some reason. As a result, they need to be updated manually whenever a new Android version is targeted, if the underlying Parcelable has changed. For example, the Bitmap class calculates the size of the actual bitmap data using a process involving several different previously read values, so we need to write an override for it.
If no override exists, then we check if an associated .struct file exists for the interface descriptor. If it does, then we can read it as below. If not, then either the struct files haven’t been generated yet, or this interface doesn’t use AIDL (there are a few that for some reason do not). The Python Parcel type has a partial reimplementation of the Android Parcel API, with a few modifications: I condensed the different methods of reading a Parcelable into two: readParcelable, which takes a string of the type to read, and readDynamicParcelable, which reads the same string from the parcel itself. The methods that add a null check are implemented instead as a conditional, which greatly simplifies the logic of the Parcel class:
{"nullcheck": "readInt32"},
{
    "__backreference": "nullcheck",
    "__conditional": [{
        "disk": "readParcelable",
        "__parcelType": "android.os.storage.DiskInfo"
    }]
},
Using getattr, the method names in the .struct file are called directly on the Parcel API, and the returned data is collected into a ParsedParcel object, which stores both the name:value pairs that were read and the locations they were read from. To unpack the data from the raw blob sent from Android, Python’s struct module is used:
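def readInt32(self):
    # Unpack a little-endian signed 32-bit integer at the current position
    b = struct.unpack_from("<i", self.data, self.pos)
    self.add(4)  # move past the 4 bytes just read
    return b[0]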
Once everything has been written to the ParsedParcel, it is placed in a multiprocessing queue to be read by the UI thread.
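The getattr dispatch and the conditional entries described above can be sketched as follows. This is not the tool's implementation: the `Parcel` class here is a tiny stand-in with two readers, `walk` is a hypothetical name, and the semantics of `__backreference` and `__conditional` (parse the nested entries only when the referenced field was read as nonzero) are an assumption based on the struct excerpt.

```python
import struct


class Parcel:
    """Tiny stand-in for the tool's Parcel class (only two readers)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def readInt32(self) -> int:
        (value,) = struct.unpack_from("<i", self.data, self.pos)
        self.pos += 4
        return value

    def readString16(self):
        length = self.readInt32()
        if length < 0:
            return None  # a negative length marks a null string
        raw = self.data[self.pos:self.pos + 2 * length]
        # Skip the characters plus a 2-byte terminator, rounded up to 4 bytes.
        self.pos += (2 * (length + 1) + 3) & ~3
        return raw.decode("utf-16-le")


def walk(parcel, entries, out=None):
    """Interpret struct entries; returns a list of (name, value, (start, end))."""
    out = [] if out is None else out
    for entry in entries:
        if "__conditional" in entry:
            # Assumed semantics: descend only if the referenced field is nonzero.
            seen = {name: value for name, value, _ in out}
            if seen.get(entry["__backreference"]):
                walk(parcel, entry["__conditional"], out)
            continue
        for name, method in entry.items():
            if name.startswith("__"):
                continue  # metadata such as __parcelType is ignored in this sketch
            start = parcel.pos
            value = getattr(parcel, method)()  # method name comes from the struct file
            out.append((name, value, (start, parcel.pos)))
    return out
```

Recording the start and end offsets next to each value is what would let a viewer map a parsed field back to its bytes in a hexdump.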
Output
I ended up with several modes of output: a TUI, pcapng files, and raw JSON output.
TUI
I made a relatively simple TUI for the project, consisting of a three-pane layout:
On the leftmost pane, each intercepted parcel is listed. The colour is the type of parcel:
Colour | Type |
---|---|
Blue | Call |
Purple | Oneway Call (used for calls that should not block) |
Yellow | Return |
Red | Parse Error |
White | Interface file not found (filtered by default) |
The number on the right is a sequential counter of each intercepted parcel. This is useful when things are filtered out to see how much is being missed, or to remember a parcel for later.
The middle pane contains a hexdump of the parcel itself, with the headers included. The right panel shows the parsed data, in a hierarchical list. You can select parts of the list, and the portion in the hexdump that corresponds to the data is highlighted too!
Finally, there’s a two-stage filtering system, with exclusion filters removing items from the view and inclusion filters adding removed items back again selectively. The Interface and Method filter fields support regex, which should allow for some relatively complex filtering behavior. I also added a quick way to toggle filters on and off, using TAB.
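The two-stage behaviour can be sketched in a few lines. The `Rule` class and `is_visible` function below are hypothetical names, and whether the real tool uses a partial or a full regex match is not stated, so `re.search` is an assumption.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    interface: str = ".*"   # regex matched against the interface name
    method: str = ".*"      # regex matched against the method name
    enabled: bool = True    # rules can be toggled on and off

    def matches(self, interface: str, method: str) -> bool:
        return (self.enabled
                and re.search(self.interface, interface) is not None
                and re.search(self.method, method) is not None)


def is_visible(interface, method, exclude, include) -> bool:
    """Stage 1 hides anything an exclusion rule matches; stage 2 lets an
    inclusion rule bring a hidden item back."""
    hidden = any(rule.matches(interface, method) for rule in exclude)
    if not hidden:
        return True
    return any(rule.matches(interface, method) for rule in include)
```

For example, excluding everything under `^android\.view\.` and then including `IWindowSession$` with method `^onRectangle` would hide most window traffic while keeping that one call in view.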
If parsing of a Parcel fails, as much information as possible is shown. Here, most of the data was successfully parsed, aside from a runaway string near the end.
pcap-ng output
The pcap-ng format is a relatively new file format for network packet capture, obsoleting the old pcap format. It’s very versatile, allowing any type of custom data to be stored in a structured format. This makes it good for storing the dumped Binder data. Using the -w flag you can write Binder data to a pcapng file, for later use in Wireshark or any other program that supports the format:
Unfortunately, since a Wireshark dissector for the data does not exist, Wireshark cannot parse each Parcel, and just displays it as data. Given more time, I would have tried to write a dissector, but this would be a non-trivial task due to the complexity of the Parcel format and the dependency on the parsed struct files. Whilst it would be theoretically possible to generate the struct files on demand as the parcel types appear in Wireshark, this would be extremely slow in practice (each struct file currently takes about 3 seconds to generate on my relatively fast laptop), and it would introduce a dependency on the AOSP sources, which is even worse.
A different approach would be to store the parsed data in the pcap-ng file at output time, rather than trying to reconstruct this later. This is possible using Custom block types, but showing this would require changes to Wireshark itself, rather than a plug-in dissector. It would also make the generated files quite large.
The pcapng files my program generates have a few custom fields to store the interface and call codes, which lets them be read back into and properly parsed by the program if needed.
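To give a feel for the container format, here is a minimal sketch that wraps raw Parcel bytes in the three basic pcapng blocks (Section Header, Interface Description, Enhanced Packet). It does not reproduce the custom fields mentioned above, and the choice of the private-use link type 147 (USER0) is an assumption rather than what the tool actually writes. The function names are illustrative.

```python
import struct

LINKTYPE_USER0 = 147  # private-use link type; an assumption for this sketch


def _block(block_type: int, body: bytes) -> bytes:
    """Wrap a body in pcapng block framing: type, total length, body, total length."""
    body += b"\x00" * (-len(body) % 4)   # blocks are padded to 32-bit boundaries
    total = 12 + len(body)
    return struct.pack("<II", block_type, total) + body + struct.pack("<I", total)


def section_header() -> bytes:
    # Byte-order magic, version 1.0, section length unknown (-1), no options.
    return _block(0x0A0D0D0A, struct.pack("<IHHq", 0x1A2B3C4D, 1, 0, -1))


def interface_description(linktype: int = LINKTYPE_USER0) -> bytes:
    # Link type, reserved field, snap length (0 means no limit), no options.
    return _block(0x00000001, struct.pack("<HHI", linktype, 0, 0))


def enhanced_packet(payload: bytes, timestamp_us: int, interface_id: int = 0) -> bytes:
    header = struct.pack(
        "<IIIII",
        interface_id,
        timestamp_us >> 32,           # timestamp, upper 32 bits
        timestamp_us & 0xFFFFFFFF,    # timestamp, lower 32 bits
        len(payload),                 # captured length
        len(payload),                 # original length
    )
    return _block(0x00000006, header + payload)


def build_capture(parcels) -> bytes:
    """Build a single-interface capture from (timestamp_us, raw_bytes) pairs."""
    blocks = [section_header(), interface_description()]
    blocks += [enhanced_packet(raw, ts) for ts, raw in parcels]
    return b"".join(blocks)
```

Writing the result of `build_capture` to a file should give something Wireshark can open, although without a dissector each packet would still show up as opaque data.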
JSON output
For exporting the parsed data itself, the program can also dump the written parcels in JSON format. This contains both the original parcel data and the structured, parsed data:
[
  {
    "py/object": "parsedParcel.Block",
    "ifName": "android.view.IWindowSession",
    "callName": "onRectangleOnScreenRequested",
    "code": 27,
    "parcel": {
      "py/object": "parcel.Parcel",
      "data": {
        "py/b64": "BAAAwv////9UU1lTGwAAAGEAbgBkAHIAbwBpAGQALgB2AGkAZQB3AC4ASQBXAGkAbgBkAG8AdwBTAGUAcwBzAGkAbwBuAAAAhSpicxMBAABQVnPyAAAAABBIZfIAAAAADAAAAAEAAADoAgAAwAAAAOwCAAD7AAAA"
      },
      "pos": 120,
      "data_size": 120
    },
    "oneway": false,
    "direction": {
      "py/reduce": [
        {"py/type": "parsedParcel.Direction"},
        {"py/tuple": [1]}
      ]
    },
    "parsedParcel": {
      "py/object": "parsedParcel.ParsedParcel",
      "data": [
        {
          "py/tuple": [
            "token",
            "Strong Binder @0xf2735650",
            {"py/tuple": [72,100]}
          ]
        }
      ]
    },
    ...
The structure of this file may seem a little odd, but this is because it is intended to be usable both for reading back into Python objects easily using the jsonpickle module, and for applications that just read the JSON directly. The advantage of the jsonpickle module is that when the file is read back, it restores the original fully featured objects, which makes it much easier to extend the program by using its existing functions as a library.
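For the second use case, reading the JSON directly, a sketch using only the standard library might look like the following. The embedded sample is a cut-down, hypothetical variant of the excerpt above (the base64 payload is truncated to its first 12 bytes), and the code assumes every parsed entry is a flat name, value, span tuple as in that excerpt. `summarise` is an illustrative name.

```python
import base64
import json

# Trimmed-down sample in the same shape as the excerpt; payload shortened.
SAMPLE = """
[
  {
    "py/object": "parsedParcel.Block",
    "ifName": "android.view.IWindowSession",
    "callName": "onRectangleOnScreenRequested",
    "code": 27,
    "parcel": {
      "py/object": "parcel.Parcel",
      "data": {"py/b64": "BAAAwv////9UU1lT"}
    },
    "oneway": false,
    "parsedParcel": {
      "py/object": "parsedParcel.ParsedParcel",
      "data": [
        {"py/tuple": ["token", "Strong Binder @0xf2735650", {"py/tuple": [72, 100]}]}
      ]
    }
  }
]
"""


def summarise(text: str):
    """Yield (interface, method, raw_bytes, fields) for each dumped block."""
    for block in json.loads(text):
        # jsonpickle stores bytes as base64 under the "py/b64" key.
        raw = base64.b64decode(block["parcel"]["data"]["py/b64"])
        fields = []
        for item in block["parsedParcel"]["data"]:
            name, value, span = item["py/tuple"]
            start, end = span["py/tuple"]
            fields.append((name, value, start, end))
        yield block["ifName"], block["callName"], raw, fields
```

A consumer that only needs interface and method names can ignore the `py/...` keys entirely, whereas code that wants the original objects back would go through jsonpickle instead.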
Further steps
There are a multitude of different ways the project could be extended, given more time:
- Add support for multiple versions of the same struct in a single file, which would simplify the procedure for changing Android versions.
- Improve the Java Parcelable parsing logic to remove the need for overrides.
- Add support for C++ Parcelables.
- Improve the filter interface to be able to filter on the parsed structure of the parcel, and maybe add autocompletion for the text fields.
- Rework the way the generator finds the Java files for each Parcelable, potentially removing the need for a 2-pass approach.
- As described above, implement a dissector / plugin for Wireshark parsing.