Docker Support in AWS Lambda
It was at re:Invent 2020 that Amazon announced Docker support for Lambda functions. It came as a sigh of relief for many who had struggled to meet the size restrictions of Lambda functions.
I remember the days when I would spend hours trying to package my code into a lean Python bundle so that it could fit within the Lambda code size restrictions. More often than not, the exercise led to hunting for layers already pre-packaged by some noble souls like Keith. Despite the availability of layers, it was always a challenge to make a set of Python libraries work together. Some had version conflicts; some needed a specific version of a Python library that was incompatible with other application dependencies. It was a nightmare to get everything to work if you wanted to go the serverless route.
With container support, life becomes easy: one can create a Docker image of up to 10 GB in size and use it to create a Lambda function. The workflow is pretty straightforward:
- Create a `requirements.txt` file that contains all the Python dependencies
- Put all the utility functions that the main Lambda handler depends on in a folder
- Write a Dockerfile - think of this as a fancier way of zipping up folders. That is an understatement, but that is what it all boils down to: packaging your application and its dependencies so that they can run on any OS/machine. For example, the following are the instructions for a Dockerfile that does basic text preprocessing using `spacy`
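Here is a minimal sketch of such a Dockerfile, assuming the handler lives in `app.py`, the utility code sits in a `utils/` folder, and the small `en_core_web_sm` model is enough for the preprocessing:

```dockerfile
# Start from the AWS-provided Python base image for Lambda
FROM public.ecr.aws/lambda/python:3.9

# Install the Python dependencies listed in requirements.txt (spacy among them)
COPY requirements.txt ${LAMBDA_TASK_ROOT}
RUN pip install -r requirements.txt

# Download the small English model used for the text preprocessing
RUN python -m spacy download en_core_web_sm

# Copy the handler and the utility functions into the Lambda task root
COPY app.py ${LAMBDA_TASK_ROOT}
COPY utils/ ${LAMBDA_TASK_ROOT}/utils/

# Tell the Lambda runtime which function to invoke
CMD ["app.handler"]
```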
As one can see, all the application dependencies are abstracted away in the Dockerfile. All one needs to do is specify the dependencies in the `requirements.txt` file.
- Build the Docker image
- Log on to AWS ECR and push the image
- Specify the image URI when creating the AWS Lambda function (the command-line flow for these last three steps is sketched below)
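The commands below are one way to run these last three steps, assuming a repository and function both named `text-preprocess`, a placeholder account id `123456789012` in `us-east-1`, and an existing execution role `lambda-exec-role`:

```bash
# Build the image locally
docker build -t text-preprocess .

# Authenticate Docker against your ECR registry
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# Create the repository (first time only), then tag and push the image
aws ecr create-repository --repository-name text-preprocess
docker tag text-preprocess:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/text-preprocess:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/text-preprocess:latest

# Create the Lambda function from the pushed image URI
aws lambda create-function \
  --function-name text-preprocess \
  --package-type Image \
  --code ImageUri=123456789012.dkr.ecr.us-east-1.amazonaws.com/text-preprocess:latest \
  --role arn:aws:iam::123456789012:role/lambda-exec-role
```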
In about 6 steps, your Lambda function is deployed and ready to be used. With the image support, there is no need to think about layer restrictions, managing layer versions, managing library conflicts etc. Package the stuff that is already running on your machine into a Docker image and use it. Before you deploy on AWS Lambda, one can easily run the container locally, log in to the container using `docker exec -it <CONTAINER_NAME> bash`, and do any diagnostic checks to ensure that the container runs as it is supposed to.
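A quick local check might look like the following, assuming the image built above (the AWS base images ship with the Lambda Runtime Interface Emulator, so the handler can also be invoked over HTTP):

```bash
# Run the container locally; the emulator listens on port 8080 inside the container
docker run --rm -d --name text-preprocess -p 9000:8080 text-preprocess:latest

# Invoke the handler through the emulator with a sample payload
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" \
  -d '{"text": "Docker support in Lambda makes packaging painless."}'

# Open a shell inside the running container for any diagnostic checks
docker exec -it text-preprocess bash
```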
Maybe a few years ago, I would have thought that Docker and its related technologies were too geeky for me. But in today's world, with so many good books and plenty of good articles on Docker, one can get a good working knowledge of it in a few days' time. In any case, with so many cloud platform components incorporating Docker image support, it is imperative for any big data number cruncher to learn Docker skills to quicken the model training, development and deployment life cycle. Of course, once one learns basic Docker skills, there are many other interesting technology components to understand, such as orchestration of containers using Docker Swarm or Kubernetes. Is there a need for a quant/data scientist to learn Kubernetes? Maybe there is a case for getting a basic understanding of container orchestration too, if you are responsible for developing AND deploying the model.