Docker Support in AWS Lambda
It was at re:Invent 2020 that Amazon announced Docker support for Lambda functions. It came as a sigh of relief for many who had struggled to meet the size restrictions of Lambda functions.
I remember the days when I would spend hours trying to package my code into a lean Python bundle so that it could fit within the Lambda code size restrictions. More often than not, the exercise led to hunting for layers already pre-packaged by some noble souls like Keith. Despite the availability of layers, it was always a challenge to make a set of Python libraries work together. Some had version conflicts; some needed a specific version of a Python library that was incompatible with other application dependencies. It was a nightmare to get everything to work if you wanted to go the serverless route.
With container support, life becomes easy: one can create a Docker image of up to 10 GB in size and use it to create a Lambda function. The workflow is pretty straightforward:
- Create a `requirements.txt` file that contains all the Python dependencies
- Put all the utility functions that the main Lambda handler depends on in a folder
- Write a Dockerfile - think of this as a fancier way of zipping up folders. That is an understatement, but that is what it all boils down to: packaging your application and its dependencies so that they can run on any OS/machine. For example, the following are the instructions for a Dockerfile that does basic text preprocessing using `spacy`
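Here is a minimal sketch of such a Dockerfile, assuming the handler lives in `app.py`, the utility code sits in a `utils/` folder, and the small `en_core_web_sm` model is enough for the preprocessing:

```dockerfile
# Start from the AWS-provided Python base image for Lambda
FROM public.ecr.aws/lambda/python:3.9

# Install the Python dependencies listed in requirements.txt (spacy among them)
COPY requirements.txt ${LAMBDA_TASK_ROOT}
RUN pip install -r requirements.txt

# Download the small English model used for the text preprocessing
RUN python -m spacy download en_core_web_sm

# Copy the handler and the utility functions into the Lambda task root
COPY app.py ${LAMBDA_TASK_ROOT}
COPY utils/ ${LAMBDA_TASK_ROOT}/utils/

# Tell the Lambda runtime which function to invoke
CMD ["app.handler"]
```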
As one can see, all the application dependencies are abstracted away in the Dockerfile. All one needs to do is specify the dependencies in the `requirements.txt` file.
- Build the Docker image
- Log on to AWS ECR and push the image
- Specify the image URI when creating the AWS Lambda function (the command-line flow for these last three steps is sketched below)
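The commands below are one way to run these last three steps, assuming a repository and function both named `text-preprocess`, a placeholder account id `123456789012` in `us-east-1`, and an existing execution role `lambda-exec-role`:

```bash
# Build the image locally
docker build -t text-preprocess .

# Authenticate Docker against your ECR registry
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# Create the repository (first time only), then tag and push the image
aws ecr create-repository --repository-name text-preprocess
docker tag text-preprocess:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/text-preprocess:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/text-preprocess:latest

# Create the Lambda function from the pushed image URI
aws lambda create-function \
  --function-name text-preprocess \
  --package-type Image \
  --code ImageUri=123456789012.dkr.ecr.us-east-1.amazonaws.com/text-preprocess:latest \
  --role arn:aws:iam::123456789012:role/lambda-exec-role
```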
In about 6 steps, your Lambda function is deployed and ready to be used. With the image support, there is no need to think about layer restrictions, managing layer versions, managing library conflicts etc. Package the stuff that is already running on your machine into a Docker image and use it. Before you deploy on AWS Lambda, one can easily run the container locally, log in to the container using `docker exec -it <CONTAINER_NAME> bash`, and do any diagnostic checks to ensure that the container runs as it is supposed to.
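A quick local check might look like the following, assuming the image built above (the AWS base images ship with the Lambda Runtime Interface Emulator, so the handler can also be invoked over HTTP):

```bash
# Run the container locally; the emulator listens on port 8080 inside the container
docker run --rm -d --name text-preprocess -p 9000:8080 text-preprocess:latest

# Invoke the handler through the emulator with a sample payload
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" \
  -d '{"text": "Docker support in Lambda makes packaging painless."}'

# Open a shell inside the running container for any diagnostic checks
docker exec -it text-preprocess bash
```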
Maybe a few years ago, I would have thought that Docker and its related technologies were too geeky for me. But in today's world, with so many good books and plenty of good articles on Docker, one can get a good working knowledge of it in a few days' time. In any case, with so many cloud platform components incorporating Docker image support, it is imperative for any big data number cruncher to learn Docker skills to quicken the model training, development and deployment life cycle. Of course, once one learns basic Docker skills, there are many other interesting technology components to understand, such as orchestration of containers using Docker Swarm or Kubernetes. Is there a need for a quant/data scientist to learn Kubernetes? Maybe there is a case for getting a basic understanding of container orchestration too, if you are responsible for developing AND deploying the model.