AWS Lambda Function to transfer S3 csv to DynamoDB table: Quick and dirty learning

Virendra Pratap Singh
6 min read · May 17, 2021

Very frequently I resort to quick and dirty learning short-cuts. One reason is that I avoid, whenever possible, reinventing the wheel; the second, more important one is that these short-cuts give me a starting point for diving deep. Here is a writeup that I created for a learning demo of an AWS Lambda function. It uploads data written to S3 (as a csv file) into an AWS DynamoDB table.

The lambda code (written in Python 3.8) is provided at the end of the writeup. The steps that I used are as follows:

1) Download data from Kaggle (IMDB dataset for testing and experimenting)

Since the original file is pretty big (I used “IMDb movies.csv”, which is 45.71 MB), I created a number of small csv files for testing by taking subsets of the actual file.
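If you want to script that step, a minimal sketch (the file names match the ones used here; the 100-row cut-off is arbitrary) could look like this:

```python
import csv

# Carve a small test file out of the big Kaggle csv: keep the header row plus
# the first 100 data rows.
with open('IMDb movies.csv', newline='', encoding='utf-8') as src, \
        open('imdb_test.csv', 'w', newline='', encoding='utf-8') as dst:
    writer = csv.writer(dst)
    for i, row in enumerate(csv.reader(src)):
        writer.writerow(row)
        if i >= 100:  # row 0 is the header, so this keeps 100 data rows
            break
```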

2) Define an S3 location and directory (strictly speaking, a key prefix) where the files will be loaded. We will move the content of csv files (and only csv files) to DynamoDB.

3) Create a table in DynamoDB (with a proper primary key) that will act as our target.

4) Create an IAM role that will enable the lambda function to access the S3 bucket as well as the DynamoDB table and AWS CloudWatch logs.

5) Create a lambda function that will be triggered by the creation/upload of a csv file in the S3 location and, in turn, transfer the contents to the DynamoDB table.

Let’s go into a bit of detail for each step. Although they are given in sequence, feel free to do them as per your understanding. Remember, this is a quick and dirty guide; you ought to experiment a bit yourself.

Define S3 bucket and directory –

⦁ The S3 bucket should be created in the same region where you will write your lambda function and DynamoDB table. I created it in ap-south-1 (i.e. Mumbai region).

⦁ Within the bucket I created a directory “imdb” to demonstrate that the trigger can be set for a particular directory.

⦁ All other properties were left as default.

⦁ One test file, imdb_test.csv, was uploaded in advance for the sake of testing.
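Uploading it from code is a one-liner with boto3; the bucket name below is a placeholder (the real bucket name is yours to choose):

```python
import boto3

s3 = boto3.client('s3', region_name='ap-south-1')

# Put the small test file under the imdb/ prefix of the (placeholder) bucket.
s3.upload_file('imdb_test.csv', 'my-imdb-demo-bucket', 'imdb/imdb_test.csv')
```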

Create an IAM role for lambda function -

⦁ I created a role, s3-dynamodb-cloutwatch-role, with the following policies attached:

AmazonS3FullAccess

AmazonDynamoDBFullAccess

AWSOpsWorksCloudWatchLogs

Do we need to provide full access for each of the services mentioned above? No, and we should not grant such access, but this was a demo so I was OK with it.
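The role was created through the console, but if you prefer to script it, a boto3 sketch of the same setup (role name and policy list as above) is roughly:

```python
import json
import boto3

iam = boto3.client('iam')

ROLE_NAME = 's3-dynamodb-cloutwatch-role'

# Lambda must be allowed to assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName=ROLE_NAME,
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# The three managed policies used in the demo (far too broad for real use).
for policy in ('AmazonS3FullAccess',
               'AmazonDynamoDBFullAccess',
               'AWSOpsWorksCloudWatchLogs'):
    iam.attach_role_policy(
        RoleName=ROLE_NAME,
        PolicyArn=f'arn:aws:iam::aws:policy/{policy}',
    )
```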

Create a target table in DynamoDB –

⦁ Create a new table “imdb_table”. Remember to use the same region that you chose for the S3 bucket (and plan to use for the Lambda function).

⦁ Choose a primary key: I took imdb_id (string) since it suits the need perfectly.

⦁ All other properties were left as default. Checking the table, we don’t see any records in it right now.
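The console was used for this as well; the equivalent boto3 call would be something like the following (on-demand billing here is just to avoid thinking about capacity, the console defaults work equally well for a demo):

```python
import boto3

dynamodb = boto3.client('dynamodb', region_name='ap-south-1')

# imdb_id (string) as the partition key; everything else left to defaults.
dynamodb.create_table(
    TableName='imdb_table',
    KeySchema=[{'AttributeName': 'imdb_id', 'KeyType': 'HASH'}],
    AttributeDefinitions=[{'AttributeName': 'imdb_id', 'AttributeType': 'S'}],
    BillingMode='PAY_PER_REQUEST',
)
```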

Write a Lambda function –

⦁ The code for the function is provided below. It was written in Python 3.8 and uses the json and boto3 libraries.

⦁ The function is assigned the role — s3-dynamodb-cloutwatch-role that we created earlier.

⦁ I only read and populated the following fields from the csv file into DynamoDB: imdb_id (primary key, field 0), title (field 1), year (field 3), duration (field 6), country (field 7), language (field 8) and director (field 9).

⦁ All fields were treated as string type.

⦁ Only one overall exception handler :-)

⦁ For testing, I used the S3 Put test event template and modified it to use the bucket and test file name that I had created previously.

A test run shows that one record gets loaded; it is reflected in the output of the lambda function, and we can subsequently check the same in the DynamoDB table (imdb_table) as well.
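For reference, the part of that test event the function actually reads boils down to this shape (the bucket name is a placeholder, the key is the test file uploaded earlier):

```python
# Trimmed to the fields the handler uses; the real "S3 Put" template has many more.
test_event = {
    "Records": [
        {
            "eventSource": "aws:s3",
            "s3": {
                "bucket": {"name": "my-imdb-demo-bucket"},   # placeholder bucket name
                "object": {"key": "imdb/imdb_test.csv"},
            },
        }
    ]
}
```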

⦁ We now add a trigger so that the Lambda function runs whenever a file is loaded into the S3 bucket, without manual intervention. We go to the function overview and click “Add trigger”.

⦁ Search for S3 in “Trigger configuration” and select it. Once selected, we have to provide the bucket name, the event type (I chose all object create events), the prefix (i.e. the directory within the bucket; for our case it is imdb) and the suffix (i.e. the type of files; for our case it is csv).

Please note that the suffix should be .csv and not *.csv.

⦁ Once done, the trigger should be reflected in the function overview section, with details in the Configuration -> Triggers tab.

⦁ Load another test csv file into the bucket (imdb directory); it should immediately trigger another function run, and the data should be reflected in the DynamoDB table.
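The console wires up both the invoke permission and the bucket notification behind the scenes. If you ever need to script the same trigger, it comes down to two boto3 calls (bucket name and function ARN below are placeholders):

```python
import boto3

BUCKET = 'my-imdb-demo-bucket'  # placeholder
FUNCTION_ARN = 'arn:aws:lambda:ap-south-1:123456789012:function:s3-csv-to-dynamodb'  # placeholder

lambda_client = boto3.client('lambda', region_name='ap-south-1')
s3 = boto3.client('s3')

# 1) Allow S3 to invoke the function.
lambda_client.add_permission(
    FunctionName=FUNCTION_ARN,
    StatementId='s3-invoke-csv-loader',
    Action='lambda:InvokeFunction',
    Principal='s3.amazonaws.com',
    SourceArn=f'arn:aws:s3:::{BUCKET}',
)

# 2) Route object-created events for imdb/*.csv to the function.
s3.put_bucket_notification_configuration(
    Bucket=BUCKET,
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': FUNCTION_ARN,
            'Events': ['s3:ObjectCreated:*'],
            'Filter': {'Key': {'FilterRules': [
                {'Name': 'prefix', 'Value': 'imdb/'},
                {'Name': 'suffix', 'Value': '.csv'},
            ]}},
        }],
    },
)
```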

Monitor the Lambda function –

⦁ The “Monitor” tab for the lambda function shows the logs and CloudWatch details.

Here is the code that was used as the actual lambda function -
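What follows is a minimal sketch along the lines described above rather than a verbatim copy: it uses the csv module alongside json and boto3, the ap-south-1 region, the imdb_table name, and the column positions listed earlier.

```python
import csv
import json
import urllib.parse

import boto3

# Region and table name from the setup above.
dynamodb = boto3.resource('dynamodb', region_name='ap-south-1')
table = dynamodb.Table('imdb_table')
s3 = boto3.client('s3')


def lambda_handler(event, context):
    try:
        # The S3 trigger (or the S3 Put test event) tells us which object to read.
        record = event['Records'][0]['s3']
        bucket = record['bucket']['name']
        key = urllib.parse.unquote_plus(record['object']['key'])

        # Read the whole csv into memory; fine for the small test files.
        body = s3.get_object(Bucket=bucket, Key=key)['Body'].read().decode('utf-8')
        rows = list(csv.reader(body.splitlines()))

        count = 0
        for row in rows[1:]:  # skip the header row
            # Only the columns described above, all stored as strings.
            table.put_item(Item={
                'imdb_id': row[0],
                'title': row[1],
                'year': row[3],
                'duration': row[6],
                'country': row[7],
                'language': row[8],
                'director': row[9],
            })
            count += 1

        return {
            'statusCode': 200,
            'body': json.dumps(f'{count} record(s) loaded into imdb_table'),
        }

    except Exception as exc:
        # The single catch-all mentioned above: log and re-raise so the failure
        # shows up in the CloudWatch logs.
        print(exc)
        raise
```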

Important: once all your learning is done, remember to delete the DynamoDB table, large S3 files, etc. I do this every time to ensure I don’t get billed for things I don’t want (and to ensure that links/credentials etc. that I may inadvertently share as screenshots are useless to others).

Like any other learning, dig deeper with the official training pages (AWS documentation and https://aws.training). Keep learning.
