
Create Thumbnail Worker With S3 and Lambda: Make the Thumbnail


Introduction

This is the second and final post in the series on building an S3 thumbnail system with Lambda.

You should already be aware of the prerequisites from the last post. To follow this post, familiarity with Python will help.

This is the second part of the two-part tutorial:

  1. Test the Trigger
  2. Make the Thumbnail (you are here)

Overview

In this post, we’ll build upon the resources we provisioned in the last post of this series: a Lambda trigger function and the source bucket. If you have not covered that yet, please go through it first.

Once you have done that, you can head over to the next section.

Prerequisites

We are not going to do any rocket science here, but prior familiarity with the following will help, although it is not essential.

  • IAM
  • AWS CLI
  • Bash

With that said: when our Lambda function had no dependencies, as in the last post, we simply edited the code in the online editor. But the Python standard library does not have anything to deal with images elegantly, so we need to leverage a library such as Pillow. When we work with third-party libraries, we have to package the dependency with the code itself, as a zip.

Check logs of previous invocations in CloudWatch

We have the temp-thumbnail-trigger Lambda function which we worked with in the last post. Let’s check its execution logs.

Step 1:

Note: Please use single quotes around the arguments below. Bash treats single and double quotes differently: inside double quotes, something like $LATEST would be expanded as a shell variable.

$ aws logs describe-log-groups --query 'logGroups[*].logGroupName' | grep thumbnail
    "/aws/lambda/temp-thumbnail-trigger",

CloudWatch holds logs for multiple AWS services. The describe-log-groups subcommand lists all the log groups, i.e. the named containers that those services write their logs to. Its output is very verbose (it’s JSON), so I have narrowed it down to logGroups[*].logGroupName with --query and filtered the result with grep.

Make a note of the output we received; we are going to use it in our next command.

Step 2:

Now we want to see the log streams for this particular log group. Each stream corresponds to one instance of the function’s execution environment, so a single stream can hold the logs of several invocations.

I have narrowed the output down to logStreams[*].logStreamName.

$ aws logs describe-log-streams --log-group-name '/aws/lambda/temp-thumbnail-trigger' --query 'logStreams[*].logStreamName'
[
    "2021/11/14/[$LATEST]f0c4ebb44233445ab27765977016f27f",
    "2021/11/14/[$LATEST]f5d859c9009441c9a8f0cf60b33b47de"
]

The above output shows two log streams, which in my case correspond to the two times I invoked this Lambda. The second one is the latest, so I am going to use it in the next step.

Step 3:

We are now looking for log events in this particular log stream "2021/11/14/[$LATEST]f5d859c9009441c9a8f0cf60b33b47de".

$ aws logs get-log-events --log-group-name '/aws/lambda/temp-thumbnail-trigger' --log-stream-name '2021/11/14/[$LATEST]f5d859c9009441c9a8f0cf60b33b47de' --query 'events[*].message'
[
    "Loading function\n",
    "START RequestId: f168a35d-d4a1-4edc-909d-cd1f24612cd5 Version: $LATEST\n",
    "CONTENT TYPE: binary/octet-stream\n",
    "END RequestId: f168a35d-d4a1-4edc-909d-cd1f24612cd5\n",
    "REPORT RequestId: f168a35d-d4a1-4edc-909d-cd1f24612cd5\tDuration: 226.76 ms\tBilled Duration: 227 ms\tMemory Size: 128 MB\tMax Memory Used: 72 MB\tInit Duration: 420.19 ms\t\n"
]

As you can see, the Lambda was invoked without raising any exceptions. A good sign.

Now you know how to check CloudWatch logs with the AWS CLI.
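
Steps 2 and 3 can also be scripted. Below is a minimal sketch in Python with boto3, not something this series depends on. The helper name latest_log_messages is my own, and the CloudWatch Logs client is passed in as a parameter so that the function can be tried out with a stub before pointing it at a real account.

```python
def latest_log_messages(logs_client, log_group):
    """Return the messages of the most recently active log stream."""
    # Step 2: ask for the single stream that received the newest event.
    streams = logs_client.describe_log_streams(
        logGroupName=log_group,
        orderBy="LastEventTime",
        descending=True,
        limit=1,
    )["logStreams"]
    if not streams:
        return []
    # Step 3: read the events of that stream, oldest first.
    events = logs_client.get_log_events(
        logGroupName=log_group,
        logStreamName=streams[0]["logStreamName"],
        startFromHead=True,
    )["events"]
    return [event["message"] for event in events]

# Usage (needs boto3 and AWS credentials):
#   import boto3
#   latest_log_messages(boto3.client("logs"), "<log group name from Step 1>")
```

Ordering by LastEventTime saves us from eyeballing which stream is the latest one.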

Create destination bucket

Choose whatever name you want, as long as it is globally unique. We are going to reference this name in the Lambda code anyway.

$ aws s3api create-bucket --bucket dest-bucket-sntshk --region ap-south-1 --create-bucket-configuration LocationConstraint=ap-south-1
{
    "Location": "http://dest-bucket-sntshk.s3.amazonaws.com/"
}
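
If you would rather create the bucket from Python, here is a rough boto3 equivalent of the command above. The helper create_bucket_kwargs is a name I made up for this sketch. It only builds the arguments, because the one subtle part is that the us-east-1 region must not be passed as a LocationConstraint.

```python
def create_bucket_kwargs(bucket, region):
    """Build the keyword arguments for the S3 client's create_bucket call."""
    kwargs = {"Bucket": bucket}
    # us-east-1 is the default location and is rejected as a constraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs

# Usage (needs boto3 and AWS credentials):
#   import boto3
#   s3 = boto3.client("s3", region_name="ap-south-1")
#   s3.create_bucket(**create_bucket_kwargs("dest-bucket-sntshk", "ap-south-1"))
```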

Update role policy

If you got an AccessDenied error in the last post while testing the Lambda, that was because by default the Lambda only has permission to talk to CloudWatch (via its role). After I attached a policy with GetObject permission to the role, the Lambda was able to talk to S3 as well.

This time we need to attach one more policy to the role: we want to PutObject to our dest-bucket-sntshk (you will have a different name).

Find the policy

If you followed the last post as it was, you should be able to run the following command and look for a role named like <funcname>-role-<suffix>.

$ aws iam list-roles --query 'Roles[*].RoleName'

I tried piping the output to grep, but it did not work for me. Still, scanning the list, I was able to find temp-thumbnail-trigger-role-fc1p90sl.
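
As an alternative to grep, the filtering can be done in Python. This is only a sketch: find_roles is a hypothetical helper, and the IAM client is passed in as a parameter. It walks the paginated list_roles result and keeps the role names that contain a given substring.

```python
def find_roles(iam_client, needle):
    """Return the names of all IAM roles whose name contains `needle`."""
    names = []
    # list_roles is paginated, so walk every page.
    for page in iam_client.get_paginator("list_roles").paginate():
        for role in page["Roles"]:
            if needle in role["RoleName"]:
                names.append(role["RoleName"])
    return names

# Usage (needs boto3 and AWS credentials):
#   import boto3
#   find_roles(boto3.client("iam"), "thumbnail")
```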

Update the policy

We’ll use the same method we used in the last post to attach a new inline policy that allows putting objects into the destination bucket.

You can add or update the policy on the existing role from the web interface.

My temp-thumbnail-trigger-role-fc1p90sl has three policies now.

  1. The default policy created for writing to CloudWatch.

  2. s3-get-object policy which grants s3:GetObject access on arn:aws:s3:::source-bucket-sntshk/*.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::source-bucket-sntshk/*"
        }
    ]
}

  3. s3-put-object policy which grants s3:PutObject access on arn:aws:s3:::dest-bucket-sntshk/*.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::dest-bucket-sntshk/*"
        }
    ]
}
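
If you prefer code over the web interface, the put-object policy can be generated and attached with boto3. This is a sketch under assumptions: put_object_policy and attach_inline_policy are names I made up, and the IAM client is passed in as a parameter. The underlying call, put_role_policy, creates or overwrites an inline policy on a role.

```python
import json

def put_object_policy(bucket):
    """Return a policy document allowing s3:PutObject on every key in `bucket`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": "arn:aws:s3:::{}/*".format(bucket),
            }
        ],
    }

def attach_inline_policy(iam_client, role_name, policy_name, document):
    """Attach (or overwrite) an inline policy on the given role."""
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(document),  # the API expects a JSON string
    )

# Usage (needs boto3 and AWS credentials):
#   import boto3
#   attach_inline_policy(boto3.client("iam"), "temp-thumbnail-trigger-role-fc1p90sl",
#                        "s3-put-object", put_object_policy("dest-bucket-sntshk"))
```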

Prepare a deployment package

In this section we’ll write the actual code to do the transformation and then bundle the third-party package with it.

Write actual code to generate thumbnail

This is the heart of the tutorial series. We are going to use Python to create our Lambda function. Let’s get started with the code first, then I’ll explain it step by step.

Go ahead and create a directory on your local machine and create a file called lambda_function.py in it. The name matters: the function’s handler is set to lambda_function.lambda_handler, i.e. <file name>.<function name>.

lambda_function.py

import boto3
import uuid
from urllib.parse import unquote_plus
from PIL import Image
import PIL.Image

s3_client = boto3.client('s3')

def resize_image(image_path, resized_path):
    with Image.open(image_path) as image:
        image.thumbnail(tuple(x / 2 for x in image.size))
        image.save(resized_path)

def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = unquote_plus(record['s3']['object']['key'])
        tmpkey = key.replace('/', '')
        download_path = '/tmp/{}{}'.format(uuid.uuid4(), tmpkey)
        upload_path = '/tmp/resized-{}'.format(tmpkey)
        s3_client.download_file(bucket, key, download_path)
        resize_image(download_path, upload_path)
        s3_client.upload_file(upload_path, 'dest-bucket-sntshk', key)

Unlike what I usually do, I’ll start explaining from line 14:

  1. Line 14 defines a function called lambda_handler, which is the entry point of the Lambda. And you see those event and context parameters? event is the data that’s passed to the function upon execution. An event can come from a number of places (API Gateway, another Lambda and so on), and each intermediate service in between can add or remove contents of the event. As for context, its main role is to provide information about the current execution environment. You can read about context in the AWS docs.
  2. On the very next line we run a for loop over all the records. Yes, the event which comes to our Lambda will have one or more records.
  3. For each record, we fetch the bucket name and the key (file name) of the object to be processed from the record dict. You can see we are using bracket notation to access the values in the dict. unquote_plus does the URL decoding, because the key arrives URL-encoded in the event.
  4. Lines 18 to 20 create the path variables for downloading and uploading the image; slashes are removed from the key so that the file sits directly under /tmp. On lines 21 and 23 we do the actual download from the source bucket and the upload to the destination bucket. On line 22 we have a function call to resize_image, which is defined on lines 9 through 12.
  5. resize_image reduces the height and width of the image by a factor of 2 and then saves the result to the specified path.
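
To make the record handling concrete, here is a small standalone sketch that you can run locally without AWS. The sample event is trimmed down and its object key is made up for illustration; extract_paths is a hypothetical helper that mirrors the first lines of the loop body in lambda_handler.

```python
from urllib.parse import unquote_plus

# A trimmed-down S3 event; real events carry many more fields per record.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "source-bucket-sntshk"},
                # Keys arrive URL-encoded: "+" is a space, "%28" is "(".
                "object": {"key": "holiday+photos/cat%281%29.jpg"},
            }
        }
    ]
}

def extract_paths(record):
    """Return (bucket, decoded key, flattened key) for one event record."""
    bucket = record["s3"]["bucket"]["name"]
    key = unquote_plus(record["s3"]["object"]["key"])
    tmpkey = key.replace("/", "")  # flatten so the file sits directly under /tmp
    return bucket, key, tmpkey

print(extract_paths(sample_event["Records"][0]))
# ('source-bucket-sntshk', 'holiday photos/cat(1).jpg', 'holiday photoscat(1).jpg')
```

Without unquote_plus, download_file would be asked for a key that does not exist in the bucket whenever the file name contains spaces or special characters.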

Package Pillow with the deployment bundle

As you know, we depend on an external package called Pillow, which provides the PIL module used in our source code. Lambda does not run pip install for us; that work is left to the function maintainer.

pip install --target ./package Pillow

After above command, my directory structure looks something like this:

$ tree -L 2
.
├── lambda_function.py
└── package
    ├── PIL
    ├── Pillow-8.4.0.dist-info
    └── Pillow.libs

4 directories, 1 file

The point to note here is that we need to get into the package directory and create the zip from there, writing the archive to the parent directory. That way the libraries end up at the root of the zip.

$ cd package
$ zip -r ../my-deployment-package.zip .
  adding: Pillow.libs/ (stored 0%)
  adding: Pillow.libs/libwebpdemux-f117ddb4.so.2.0.8 (deflated 72%)
  adding: Pillow.libs/liblzma-d540a118.so.5.2.5 (deflated 66%)
...

Now that we have created the zip with the dependencies, we will go ahead and add the function file to the same zip.

I’ll go back to the parent directory here:

$ cd ..
$ zip -g my-deployment-package.zip lambda_function.py
  adding: lambda_function.py (deflated 54%)
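
The two zip commands can be reproduced with Python’s standard zipfile module, which is handy on machines without the zip tool. This is a minimal sketch, with build_package being a name of my own: it puts the contents of the package directory and the handler file side by side at the root of the archive. Keep the output zip outside the package directory, as in the commands above.

```python
import os
import zipfile

def build_package(package_dir, handler_file, zip_path):
    """Zip the contents of package_dir plus handler_file at the archive root."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to package_dir, like zipping from inside it.
                archive.write(full_path, os.path.relpath(full_path, package_dir))
        # Same effect as `zip -g`: the handler goes next to the libraries.
        archive.write(handler_file, os.path.basename(handler_file))
    return zip_path

# Usage, from the directory that holds lambda_function.py and package/:
#   build_package("package", "lambda_function.py", "my-deployment-package.zip")
```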

Create a deployment to existing Lambda function

aws lambda update-function-code --function-name temp-thumbnail-trigger --zip-file fileb://my-deployment-package.zip

Updating the function code returns a response similar to this:

{
    "FunctionName": "temp-thumbnail-trigger",
    "FunctionArn": "arn:aws:lambda:ap-south-1:XXXXXXXXXXXX:function:temp-thumbnail-trigger",
    "Runtime": "python3.8",
    "Role": "arn:aws:iam::XXXXXXXXXXXX:role/service-role/temp-thumbnail-trigger-role-fc1p90sl",
    "Handler": "lambda_function.lambda_handler",
    "CodeSize": 3413118,
    "Description": "An Amazon S3 trigger that retrieves metadata for the object that has been updated.",
    "Timeout": 3,
    "MemorySize": 128,
    "LastModified": "2021-11-25T10:12:36.000+0000",
    "CodeSha256": "XSW5FerVZKYdDvRG+XIXKU9Rlaj6xntRb2Ja3etanZU=",
    "Version": "$LATEST",
    "TracingConfig": {
        "Mode": "PassThrough"
    },
    "RevisionId": "7946029f-b70d-43ef-8176-06818f8bfea1",
    "State": "Active",
    "LastUpdateStatus": "InProgress",
    "LastUpdateStatusReason": "The function is being created.",
    "LastUpdateStatusReasonCode": "Creating",
    "Architectures": [
        "x86_64"
    ]
}
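
Notice that LastUpdateStatus says InProgress in the response above: the update is not finished when the command returns. If you script the deployment, you may want to wait for it. The sketch below does the upload with boto3 and then polls the function configuration. deploy_zip and its polling parameters are my own choices, not part of any AWS API.

```python
import time

def deploy_zip(lambda_client, function_name, zip_path, poll_seconds=2, max_polls=30):
    """Upload the zip and wait until Lambda reports the update as finished."""
    with open(zip_path, "rb") as package:
        lambda_client.update_function_code(
            FunctionName=function_name,
            ZipFile=package.read(),
        )
    # The first response says "InProgress"; poll until that changes.
    for _ in range(max_polls):
        config = lambda_client.get_function_configuration(FunctionName=function_name)
        status = config.get("LastUpdateStatus")
        if status != "InProgress":
            return status  # typically "Successful" or "Failed"
        time.sleep(poll_seconds)
    return "InProgress"

# Usage (needs boto3 and AWS credentials):
#   import boto3
#   deploy_zip(boto3.client("lambda"), "temp-thumbnail-trigger",
#              "my-deployment-package.zip")
```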

Test the function with new update

Testing is part of a developer’s life. The test we are doing by manually invoking the function from the UI is known as ad-hoc testing. Here is a GIF of it.

Successful creation of thumbnail from S3 trigger

On the other hand, if you are facing a cannot import name '_imaging' from 'PIL' error when testing, the next section is for you.

cannot import name ‘_imaging’ from ‘PIL’

There could be a case where you’d get a traceback like this:

[ERROR] Runtime.ImportModuleError: Unable to import module 'lambda_function': cannot import name '_imaging' from 'PIL' (/var/task/PIL/__init__.py)
Traceback (most recent call last):

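This happens when the version of Pillow that was packaged is incompatible with the Python version of the Lambda runtime: _imaging is a compiled module, and it is built for one specific Python version. For me the fix was simple: I updated the Lambda runtime from Python 3.7 to 3.8.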

cannot import name '_imaging' from 'PIL' fix

The fix could be different if you are following this tutorial at a later time.
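
One quick sanity check before deploying is to compare the Python you ran pip install with against the Runtime field of the function (python3.8 in the response shown earlier). The sketch below is only that check, with helper names of my own. A matching version is necessary for the compiled modules to load, but it may not be sufficient, since the operating system and CPU architecture of the packaging machine matter too.

```python
import sys

def runtime_for(version_info):
    """Map a Python version tuple to a Lambda-style runtime name."""
    return "python{}.{}".format(version_info[0], version_info[1])

def check_runtime(function_runtime, version_info=sys.version_info):
    """Return a warning if the packaging Python differs from the function runtime."""
    local = runtime_for(version_info)
    if local == function_runtime:
        return None
    return "packaged with {} but the function runs {}".format(local, function_runtime)

print(check_runtime("python3.7", (3, 8)))
# packaged with python3.8 but the function runs python3.7
```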

Homework

As homework, you can try maintaining the hierarchy from the source bucket.

Conclusion

Today we saw how we can leverage S3 triggers to chain a Lambda function to a bucket. But we are not limited to Lambdas: we can also push the event to an SNS topic or an SQS queue, and let some other consumer parse the event and work on it.

As you might have noticed already, you are also not limited to PUT events. You can leverage other types of CRUD events to trigger various other handlers. The possibilities are endless.


WRITTEN BY
Santosh Kumar
Santosh is a Software Developer currently working with Method Studios as a Full Stack Developer.