
AWSLambdaFace: serverless face recognition

April 26, 2017
TL;DR: Serverless compute platforms such as Amazon Web Services (AWS) Lambda were intended for web microservices and for handling asynchronous events generated by other AWS services (DynamoDB, S3, SNS, etc.). However, AWS Lambda also allows users to upload arbitrary Linux binaries along with their Lambda functions. These binaries can be executed during a Lambda invocation, effectively turning AWS Lambda into a supercomputer that can be started on-demand in seconds and billed at a 100ms granularity. In this post, I will show you how I deployed a full-blown deep convolutional neural network based face recognition tool on AWS Lambda and used the system to query for faces in videos in a massively parallel way (skip to figure 2 to see the end result!).
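To make the binary-execution trick concrete, here is a minimal sketch of a Python handler that shells out to a binary shipped inside the deployment package (the bin/facetool path and the event shape are hypothetical names for this sketch):

    import subprocess

    def handler(event, context):
        # Everything in the deployment zip is unpacked under /var/task at
        # invocation time; a bundled binary just needs to be compiled for
        # Amazon Linux and have its executable bit set.
        out = subprocess.check_output(
            ["/var/task/bin/facetool", event["frame_path"]])
        return {"result": out.decode("utf-8")}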
[Figure 1 image: an empty EC2 console]

Figure 1: With function as a service (FaaS) platforms like AWS Lambda, going completely serverless has become a viable option for web technology companies. Rather than buying physical machines or provisioning EC2 VMs, companies such as A Cloud Guru, Instant, and Serverless rely entirely on auto-scaling, fully-managed services like DynamoDB, Lambda, SNS, SQS, etc. (an empty EC2 dashboard has become a badge of honor!).

In 2014, Amazon Web Services (AWS) released their function as a service (FaaS) platform, Lambda. Since then, a burgeoning community of developers has formed around this technology, generating countless web frameworks, debugging/monitoring tools, and storage backends specifically designed for the serverless world. The capabilities of this technology are best understood by examining the EC2 dashboards of startups such as A Cloud Guru, Instant, and Serverless (figure 1); these companies have been built using nothing but serverless technologies like Lambda and DynamoDB. The FaaS paradigm has even worked its way into major businesses outside of the tech industry; for example, Nordstrom has migrated much of its on-premise and existing cloud infrastructure to serverless platforms, which has helped them improve efficiency in both developer time and operational cost.

There are three things that make FaaS platforms like AWS Lambda different from other auto-scaling web infrastructure services:

  1. Fully-managed operations: developers using FaaS platforms only need to write code that follows the simple request-response software pattern. The jobs of provisioning servers, load balancing, and patching security vulnerabilities are all outsourced to the FaaS provider. By using these platforms, you are essentially renting the best operations talent on Earth from Amazon, Google, IBM, and Microsoft.
  2. Instantaneous start-up and near-infinite scalability: for many applications, users expect an interactive experience, and delays above a few hundred milliseconds can significantly degrade a user's perceived quality of the service. In the past, obtaining interactive speeds required over-provisioning resources so that delays stayed low even under spikes in load; however, keeping lots of idle servers around is both expensive and adds complexity to the system. FaaS platforms provide an interactive experience without over-provisioning by having developers encapsulate their functionality in Linux containers; these containers can be started almost instantly and replicated to additional nodes in a data center during spikes in load.
  3. Micro-scale billing granularity: AWS Lambda (and other competing services) bill based on invocation time rounded up to the nearest 100ms increment, so you only pay for what you use. Cloud VMs and container management services provide this same benefit, but with much less granularity; for example, EC2 bills a minimum of 1 hour and GCE VMs bill a minimum of 10 minutes. As a result, it is frequently much less expensive to run a system on a FaaS platform than with cloud VMs or on-premise servers where machines may be idle much of the time (the sketch after this list puts rough numbers on the difference).
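To see how much the billing granularity matters, here is a back-of-the-envelope sketch in Python; the prices are approximate 2017 list prices, and the worker count, runtime, memory size, and EC2 rate are illustrative assumptions:

    # Approximate 2017 AWS Lambda list prices (check current pricing).
    PER_GB_SECOND = 0.00001667   # compute charge per GB-second
    PER_REQUEST = 0.20 / 1e6     # charge per invocation

    workers, seconds, memory_gb = 3000, 60, 1.5
    lambda_cost = (workers * seconds * memory_gb * PER_GB_SECOND
                   + workers * PER_REQUEST)
    print("Lambda: $%.2f" % lambda_cost)   # ~$4.50 for the whole burst

    # The same burst on 2017-era EC2: each worker runs for only a minute,
    # but a 1-hour billing minimum charges the full hour (assume ~$0.05
    # per vCPU-hour).
    ec2_cost = workers * 0.05
    print("EC2:    $%.2f" % ec2_cost)      # ~$150.00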

Let's find George Porter with AWS Lambda! (code)
[Figure 2 video panels: "NSDI'17 full day" (top) and "George Porter montage" (bottom)]
Figure 2: The first panel shows six hours of video I recorded while walking around with a GoPro strapped to my head at this year's USENIX NSDI conference. Once the video was recorded, I used a variant of openface, a deep neural network based face recognition tool that I had deployed on AWS Lambda, to search for one of the conference attendees, George Porter. Because it ran on AWS Lambda, the system was able to start 3000+ Lambda workers almost instantly, allowing it to scan all the frames in the video very quickly. Once all of the frames where George was present were identified, the ExCamera system (which also runs on AWS Lambda) was used to stitch together a montage. In total, the face recognition and montage creation took less than 5 minutes and cost about $8!

The benefits of AWS Lambda make it perfect for web microservices, but these same benefits also make it a great tool for other workloads. In fact, it is easy to imagine taking any bursty, highly parallelizable application and deploying it on a FaaS platform, where it would gain all of the advantages of these services. To demonstrate this, the remainder of this blog post is dedicated to explaining how I used AWS Lambda to perform massively parallel face recognition using a deep convolutional neural network.
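As a sketch of what deploying such a bursty job looks like in practice, the fan-out below invokes one Lambda worker per chunk of video with boto3; the function name face-recognizer and the payload shape are made up for illustration:

    import json
    from concurrent.futures import ThreadPoolExecutor

    import boto3

    lam = boto3.client("lambda")

    def invoke_chunk(chunk_id):
        # One synchronous invocation per chunk of video frames.
        resp = lam.invoke(
            FunctionName="face-recognizer",
            InvocationType="RequestResponse",
            Payload=json.dumps({"chunk": chunk_id}),
        )
        return json.loads(resp["Payload"].read())

    # A thread pool is enough to keep thousands of requests in flight,
    # since each thread spends nearly all of its time blocked on the network.
    with ThreadPoolExecutor(max_workers=256) as pool:
        results = list(pool.map(invoke_chunk, range(3000)))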

In the first panel of figure 2, you see a six-hour video I recorded with a GoPro (strapped to my head!) at USENIX NSDI'17, a computer systems and networking conference. I was there with Keith Winstein and Sadjad Fouladi to help present ExCamera, and I wanted to record every minute of the conference! After recording this video, our goal was to scan over the faces in the video and stitch together a montage of all of the times I encountered our UCSD collaborator, George Porter (the second panel of figure 2); we wanted to make sure he made it to the conference safely! And most importantly, we wanted to do all of this on AWS Lambda so that thousands of instances of our deep neural network based face recognizer could (1) run in parallel across the video, (2) be billed at a 100ms granularity, and (3) start up instantly after invocation.

To perform the face recognition and stitch the montage together, our system performs the following steps (see the AWSLambdaFace GitHub page for more details and a simplified demo you can try at home!):

  1. Upload an image with the face of interest to an EC2 coordination server and apply standard image augmentation techniques to generate a training set.
  2. Use a deep neural network (DNN) to locate the face in each augmented image in the training set and extract a 128-dimensional feature vector for it (see the featurizer sketch after this list).
  3. Train a KNN classifier with (1) the augmented-image feature vectors and (2) labeled faces in the wild (lfw) feature vectors (see the classifier sketch after this list).
  4. Run the DNN featurizer and KNN classifier in parallel across the entire video using 3000+ AWS Lambda workers to perform recognition.
  5. Aggregate all the frames where the face of interest was recognized.
  6. Launch ExCamera to encode the frames into a montage!
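For step 2, my featurizer is built on openface; the sketch below follows openface's published demo API, with model paths as they ship in the openface release (your install may differ):

    import cv2
    import openface

    # Pretrained models shipped with openface: a dlib landmark predictor
    # for face alignment, and a Torch network that maps a 96x96 aligned
    # face crop to a 128-dimensional embedding.
    align = openface.AlignDlib("models/dlib/shape_predictor_68_face_landmarks.dat")
    net = openface.TorchNeuralNet("models/openface/nn4.small2.v1.t7", imgDim=96)

    def featurize(image_path):
        # Returns a 128-d embedding for the largest face in the image,
        # or None if no face is detected.
        bgr = cv2.imread(image_path)
        rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
        bb = align.getLargestFaceBoundingBox(rgb)
        if bb is None:
            return None
        face = align.align(96, rgb, bb,
                           landmarkIndices=openface.AlignDlib.OUTER_EYES_AND_NOSE)
        return net.forward(face)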
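And for step 3, a minimal classifier sketch with scikit-learn; the random arrays below are stand-ins for the real embeddings the featurizer produces:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    # Stand-in data: in the real pipeline these 128-d vectors come from
    # the featurizer (step 2) run on the augmented training set and on
    # the labeled faces in the wild (lfw) images.
    rng = np.random.RandomState(0)
    X_target = rng.randn(50, 128)    # embeddings of the face of interest
    X_lfw = rng.randn(500, 128)      # embeddings of other people (negatives)

    X = np.vstack([X_target, X_lfw])
    y = np.concatenate([np.ones(len(X_target)), np.zeros(len(X_lfw))])

    clf = KNeighborsClassifier(n_neighbors=5)
    clf.fit(X, y)

    # Each Lambda worker then classifies the embedding of every face it finds:
    query = rng.randn(1, 128)
    print(clf.predict(query))        # 1.0 -> face of interest, 0.0 -> someone else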

It is unlikely that the major cloud providers (AWS, Google, IBM, or Microsoft) would have predicted that their systems would be used to perform computations like deep learning. However, I hope that after reading this post you will agree that virtually any bursty, highly parallelizable application could benefit from using FaaS platforms. It is time that we re-examine the ways we build mobile and desktop applications, since it is now possible to affordably and interactively ask for thousands of cores in the cloud!