Paper Author: Adonis Fung email@example.com
Serverless computation is one of the major paradigm shifts advanced by modern cloud computing, in which responsibility for scaling and maintaining servers and language runtimes is completely outsourced to cloud providers. Freed from many operational concerns, developers can focus solely on perfecting their application logic. To achieve that, cloud providers are obligated to build a secure, shared multi-tenant infrastructure that isolates tenants and thwarts cross-tenant and host-targeted attacks. Recommended, and even made the default, across all popular virtual assistant (VA) platforms, serverless computing infrastructure is where the different capabilities (Skills for Alexa, Actions for Google, or capsules for Bixby) execute to fulfill users' requests, or intents.
It is a convenient and straightforward option to accommodate cloud-based virtual assistant skills in the existing serverless computing infrastructure. When we looked more closely, we identified the following characteristics and questioned whether this was the right approach for our capsule developers.
Existing serverless infrastructures are designed with broad compatibility and security in mind. To support as many language runtimes as possible, the servers are built to host and execute arbitrary binaries. In other words, they are designed for general-purpose computing, not specifically for virtual assistant fulfillment. Essentially, a skill's fulfillment code is hosted as a web service in an existing serverless infrastructure in order to serve intent requests initiated from the VA cloud servers. Below, we consider two of the most popular architectures, both of which their parent companies use to offer cloud-based virtual assistant skills.
In Firecracker (the infrastructure backing Amazon AWS Lambda), each tenant is isolated in an individual microVM using the Linux Kernel Virtual Machine (KVM) technology. Everything from a guest kernel up to the language runtime is packed into each tenant unit. KVM provides hardware virtualization to the guest kernel, so the attack surface exposed to the host kernel is greatly limited.
In gVisor (the infrastructure backing Google Cloud Functions), each tenant is isolated in a Linux container. A user-space kernel and the language runtime exist in each tenant unit. All system call (syscall) requests are routed at runtime through the user-space kernel, which on the one hand exposes a wide range of syscalls to the binary running on top of it, and on the other hand confines the impact of a compromise to within the container. The syscall exposure that trickles down to the host kernel is significantly reduced.
We believe that using these existing serverless infrastructures is overkill to serve the computation needs for capsule fulfillment.
The very first thing we did was to cut all the unnecessary, enabled-by-default, and dangerous features from the bare V8 runtime, including WebAssembly, the compilation optimizer, executable memory, and the ability to execute dynamic code. Instead of exposing built-in modules (such as http) that touch a broad set of syscalls, we carefully added only the most needed features, including an HTTP request library as well as secret libraries, to support authenticated calls to external API servers. To compensate for the performance lost by disabling compilation optimizations, we compile every function module ahead of time and preload them in memory. This is desirable because (1) these function modules are mostly small code snippets, and (2) it avoids a class of type confusion vulnerabilities that were once classified as zero-days. This lightweight yet still powerful V8 runtime gives us a much cleaner slate than offering a bulky NodeJS runtime for the purpose of capsule fulfillment.
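To make the ahead-of-time approach concrete, here is a minimal sketch in Python (the real system precompiles V8 bytecode, so this is only an analogy): small "function modules" are compiled once at startup and kept resident in memory, so the request path runs precompiled code and never compiles or evaluates new source. The module names and snippets are illustrative, not actual capsule code.

```python
# Stand-ins for developer-submitted capsule function modules
# (illustrative only; real modules are JavaScript run on V8).
MODULE_SOURCES = {
    "greet": "result = f'Hello, {name}!'",
    "add": "result = a + b",
}

# Compile every module once, ahead of time, and keep the code objects
# preloaded in memory -- no compilation happens on the request path.
PRELOADED = {
    name: compile(src, name, "exec")
    for name, src in MODULE_SOURCES.items()
}

def fulfill(module_name, **inputs):
    """Execute a preloaded module against request inputs.

    Only precompiled bytecode runs here; no dynamic compilation
    or evaluation of new source at request time.
    """
    scope = dict(inputs)
    exec(PRELOADED[module_name], {}, scope)
    return scope["result"]
```

For example, `fulfill("add", a=2, b=3)` executes the precompiled "add" module and returns 5.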
We do not externalize capsule code as web services, so it can never be reached from the outside and therefore requires no request-authenticity checks from developers. Instead, we host developer-submitted capsule code in our own secure serverless fulfillment infrastructure, which closely integrates with our platform. Upon receiving an execution request, our lightweight process is cloned into a separate Linux container. This per-request container has its own unique
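The per-request cloning described above can be sketched as follows. This is a loose Python analogy using plain OS processes, not real Linux containers: each request is served in a freshly forked worker, so any state the request mutates dies with the worker and cannot leak into later requests. All names here are hypothetical.

```python
import multiprocessing

def _run_request(handler, payload, out):
    # Runs inside the cloned worker; any mutation made here
    # dies when the worker exits.
    out.put(handler(payload))

def handle_in_fresh_worker(handler, payload):
    """Serve one request in a freshly cloned process (container analogy)."""
    ctx = multiprocessing.get_context("fork")  # child gets a copy-on-write snapshot
    out = ctx.Queue()
    worker = ctx.Process(target=_run_request, args=(handler, payload, out))
    worker.start()
    result = out.get()   # collect the fulfillment result
    worker.join()        # the per-request worker is then discarded
    return result
```

Because the worker is discarded after every request, a handler that mutates shared-looking state only ever mutates its own copy; the parent's state is untouched.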
While it is true that a guest kernel or a user-space kernel is supposedly stronger and is a generic means of protecting the underlying kernel, we approach the problem by reducing attack surfaces from the ground up to produce a lightweight runtime, striking the right balance between security and performance. We limit each untrusted per-request tenant to firing only 22 low-risk syscalls, which is in itself strong isolation of the underlying kernel. For comparison, Firecracker needs 36 syscalls and gVisor requires 64 to function. As an additional layer of defense, we compartmentalize capsules into different VMs based on their riskiness.
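The allowlist idea can be illustrated with a small default-deny policy check, in the spirit of a seccomp filter that kills the process on any non-allowlisted syscall. This Python sketch is purely illustrative: the syscall names below are plausible low-risk examples, not the actual 22-syscall production list.

```python
# Hypothetical low-risk allowlist (illustrative, not the production list).
LOW_RISK_ALLOWLIST = frozenset({
    "read", "write", "close", "mmap", "munmap", "exit_group",
    "clock_gettime", "futex", "epoll_wait", "recvfrom", "sendto",
})

def check_syscall(name):
    """Default-deny policy: anything not explicitly allowlisted is
    rejected, mirroring a seccomp filter's kill-on-violation behavior."""
    if name not in LOW_RISK_ALLOWLIST:
        raise PermissionError(f"syscall {name!r} blocked by policy")
    return True
```

A benign call such as `check_syscall("read")` passes, while a dangerous one such as `check_syscall("ptrace")` is rejected; in a real seccomp filter the violating process would be terminated rather than handed an exception.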