Amazon Cloud and Lambda: Use of Lambdas with DynamoDB, RDS, and VPC
You have probably heard of Amazon's cloud platform, Amazon Web Services (AWS), which accounts for more than a third of the cloud-platform market. It is the preferred solution for major customers such as Netflix, Atlassian, Ryanair, and even NASA. It offers a whole range of services tailored to different needs, such as compute, storage, databases, and networking. AWS is trending at the moment, and it's a safe bet that you will work with it one day.
At ISC (SQLI's Innovative Service Centers), we have used many of the tools offered by the AWS platform, including Lambda functions. Here is our feedback, which should help you get your AWS Lambda project off to a good start.
In 2017, we started an ambitious project with AWS. We brought together a team of experienced developers, experts, and architects to build a portal using only cloud services from Amazon.
We set out on an adventurous path and, quite quickly, ran into a number of problems, from which we learnt some important lessons. Below are three tips that we can offer you as a result of this experience.
Tip 1: The importance of design
At Amazon, a Lambda is an atomic (the smallest that exists), autonomous (stateless) unit of executable code, where the developer does not have to deal with the execution environment (serverless). In practice, it is a piece of code that contains a single function, which Amazon invokes in response to an external call.
A Lambda can be triggered by an HTTP call (via the API Gateway), for example, to create an API endpoint. But it can also be invoked by other Lambda functions or by other Amazon services, for example when a message is published to an SNS topic or arrives in an SQS queue.
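To make the idea of a single invoked function concrete, here is a minimal sketch of a Python handler. It assumes two event shapes: an API Gateway proxy request and an SQS batch. The greeting logic and the `name` query parameter are purely illustrative.

```python
import json


def lambda_handler(event, context):
    """Entry point that AWS invokes; `event` carries the trigger payload."""
    # SQS deliveries arrive as a batch under the "Records" key.
    if "Records" in event:
        bodies = [record["body"] for record in event["Records"]]
        return {"processed": len(bodies)}

    # Otherwise assume an API Gateway proxy request.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")  # "name" is an illustrative parameter
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

The same function can be called locally with a hand-built dictionary, which is a convenient way to test a Lambda without deploying it.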
The principle of the Lambda function, which can be instantiated on the fly without worrying about infrastructure, may seem very attractive, particularly in terms of cost. However, you must be careful to design your application architecture properly and think about the composition of your code.
How will you manage security? What authentication systems will you implement? How do your Lambdas communicate with each other? How will you share code between your Lambdas? And to what extent?
These issues should not be taken lightly. Try to:
- Properly scope your requirement.
- Determine why you use Lambda (rather than a more classic monolithic application).
- Know how your application will behave at peak call levels or if there is little traffic.
- Map your application and the dataflows between the compartments of your application.
Important note on warming up
A Lambda can be in one of two states: hot or cold. A “hot” Lambda is one whose source code is already loaded into memory and ready to execute. When a call arrives, the code is triggered almost instantly and processes the request.
However, a “cold” Lambda is not yet loaded into memory and therefore needs a warm-up step. During this step, AWS loads the Lambda source code into memory (on a server) and prepares the execution context. The Lambda then stays “hot,” ready to be executed, for approximately 20 minutes.
The warm-up step costs execution time and is variable in length. Depending on the availability of AWS servers, the size of your source code, and the amount of code executed outside the handler function (the context), warming up your Lambda may take up to 3 seconds (in the worst case). Potentially, your code will take several tenths of a second, or even several seconds, before it begins to run. For an HTTP request, several seconds may be an eternity for the user who is awaiting the result of a query.
It is important therefore to be aware of this constraint and to optimize your code accordingly. There are solutions for bypassing these constraints, even though, in my opinion, the best solution is to adapt and to anticipate, including adjusting the ergonomics of your application.
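The role of the code executed outside the Lambda can be sketched as follows. Anything at module level runs once, during the warm-up, and is then reused by every hot invocation. The `_load_config` helper and its return value are hypothetical stand-ins for real setup work such as parsing configuration or opening clients.

```python
import time

# Module-level code runs once per instance, during the warm-up (cold start).
_started = time.perf_counter()
INIT_COUNT = 0


def _load_config():
    """Hypothetical stand-in for expensive setup work."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"table": "example"}


CONFIG = _load_config()
INIT_SECONDS = time.perf_counter() - _started  # time spent before the first call


def lambda_handler(event, context):
    # Hot invocations reuse CONFIG; nothing above this function runs again.
    return {"table": CONFIG["table"], "init_count": INIT_COUNT}
```

Keeping this module-level section small shortens the warm-up, while moving reusable setup into it avoids repeating that work on every hot call.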
A brief description of our experience
Some of the practices we implemented, after various experiments, are outlined below, to provide you with food for thought in your modeling.
Please note, this is not a standard architecture that will work in all cases, but simply the architecture that we implemented within the technical constraints of our business line.
In terms of security and authentication, we built on standard API Gateway rules, complemented by an identification Lambda. This Lambda is called by the application's other API Lambdas (themselves invoked by HTTP calls) to identify the logged-in user. Based on the HTTP headers in the request, it returns information about the user (retrieved from a database) to the calling API Lambda.
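The call pattern described above might look like the sketch below. It is not our actual implementation: the `USERS` dictionary stands in for the user database, the `Authorization` header and token value are assumptions, and `invoke_identify` abstracts the Lambda-to-Lambda call so the example runs locally.

```python
import json

# Hypothetical in-memory stand-in for the user database.
USERS = {"token-123": {"id": 1, "name": "Ada"}}


def identify_handler(event, context):
    """Identification Lambda: resolve the caller from the forwarded HTTP headers."""
    token = (event.get("headers") or {}).get("Authorization", "")
    user = USERS.get(token)
    return {"authenticated": user is not None, "user": user}


def make_api_handler(invoke_identify):
    """Build an API Lambda; `invoke_identify` abstracts the Lambda-to-Lambda call."""

    def api_handler(event, context):
        identity = invoke_identify({"headers": event.get("headers")})
        if not identity["authenticated"]:
            return {"statusCode": 401, "body": json.dumps({"error": "unauthorized"})}
        return {"statusCode": 200, "body": json.dumps({"hello": identity["user"]["name"]})}

    return api_handler


# Locally, call the identification handler directly. In AWS this would be a
# real invocation of the identification Lambda instead.
api_handler = make_api_handler(lambda payload: identify_handler(payload, None))
```

Injecting the invocation function keeps the API Lambda testable without any AWS access.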
Tip 2: The choice of data-storage system
The vast majority of applications need to store and manipulate data. There are several possible technical solutions, depending on the type of data that you handle. My advice is to analyze your requirement, conceptualize your data model, and choose your storage system accordingly.
Learning the hard way
During the first stages of application development, we were faced with a choice of storage system.
We listened to advice on best practice on Lambdas and were pointed toward the DynamoDB solution to measure the impact effectively.
DynamoDB seemed to be the ideal tool: it works in a serverless architecture, just as Lambdas do. Infrastructure costs are therefore minimal. It provides a guarantee of availability, as well as very good response times and transfer speeds. It is also able to scale dynamically (subject to some settings) and thus absorb peak traffic. However, it does not meet every requirement! DynamoDB is a NoSQL (non-relational) database engine. It allows you to store and manipulate standard and non-standard data models, represented by JSON objects.
In our architecture, each business line is the master of its own data, comprising a set of Lambdas serving as the API endpoint. In this context, we created one or more DynamoDB tables to store data by business line.
The problem occurs when it comes to creating links between data items. Our business data model is relational: we define relationships between data from different lines. Unfortunately, NoSQL databases are not designed for this requirement. We therefore faced major performance problems. When a Lambda aggregates data from several business lines, it must query other Lambdas to ask them for data.
So, imagine a Lambda that retrieves 10 records. For each one, it must link data with the models of two other business lines. It will then need either to make 20 Lambda calls (two per record) or to query two Lambdas in bulk and then match up the returned data itself.
This is not efficient. Calls take a long time, the data-retrieval limits of DynamoDB queries are reached rapidly, and the code quickly becomes overly complex.
If you have a relational data model, we advise you to use a classic relational database system (such as PostgreSQL, Oracle, or MySQL), which you could host on Amazon RDS. We adopted this service, implementing one relational database per business line, with copies of the tables needed for relationships with other business lines. Each business line therefore remains responsible for its data, operating as the SSOT (single source of truth). The business line that owns a piece of data notifies the other lines of any change, so that they can update their local copy of the data as required.
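The arithmetic behind this example can be illustrated with a small sketch. `call_lambda` is a hypothetical stand-in for one remote Lambda invocation, and the `CUSTOMERS` and `PRODUCTS` dictionaries are invented data owned by two other business lines.

```python
calls = {"count": 0}

# Hypothetical data owned by two other business lines.
CUSTOMERS = {i: f"customer-{i}" for i in range(10)}
PRODUCTS = {i: f"product-{i}" for i in range(10)}


def call_lambda(store, keys):
    """Stand-in for one remote Lambda invocation returning the requested items."""
    calls["count"] += 1
    return {k: store[k] for k in keys}


def enrich_one_by_one(orders):
    # Two remote calls per record: 10 records -> 20 calls.
    return [
        (o, call_lambda(CUSTOMERS, [o])[o], call_lambda(PRODUCTS, [o])[o])
        for o in orders
    ]


def enrich_batched(orders):
    # One bulk call per business line: 2 calls, joined in memory.
    customers = call_lambda(CUSTOMERS, orders)
    products = call_lambda(PRODUCTS, orders)
    return [(o, customers[o], products[o]) for o in orders]
```

Batching reduces the number of calls, but the join logic then lives in application code, which is part of the complexity mentioned above.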
We have therefore retained the principles of data ownership as advocated by good micro-service practices and preserved the independence of each business line. Each can operate on a standalone basis, using the copies of the remote data that it holds (if required).
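A minimal sketch of this local-copy approach is shown below. It uses an in-memory SQLite database as a stand-in for an RDS instance. The table names, columns, and the shape of the change event are all assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite stands in for one business line's relational database.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    -- Local copy of data owned by another business line.
    CREATE TABLE customers_copy (id INTEGER PRIMARY KEY, name TEXT);
""")
db.execute("INSERT INTO orders VALUES (1, 42, 99.5)")


def on_customer_changed(event):
    """Apply a change notification from the owning business line to the local copy."""
    db.execute(
        "INSERT OR REPLACE INTO customers_copy (id, name) VALUES (?, ?)",
        (event["id"], event["name"]),
    )
    db.commit()


def orders_with_customer():
    # The relationship is resolved by a local SQL join, with no remote call.
    return db.execute(
        "SELECT o.id, c.name, o.total FROM orders o "
        "JOIN customers_copy c ON c.id = o.customer_id"
    ).fetchall()
```

The owning business line stays the single source of truth, while the copy only changes in response to its notifications.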
Tip 3: Take care with Amazon Virtual Private Cloud (VPC)
It is common to require high security in the implementation of your application. If you use other services and tools hosted on an EC2 instance (such as compute services, an email sender, or stateful software), you will especially need to secure access to these services.
For this, Amazon offers the Amazon VPC service. This set of network and security tools allows you to create a private subnetwork, to add resources to it, and to apply all the security features you need (firewall, fixed IP, secure SSH connection, etc.). The VPC service is full-featured and highly advanced. However, combined with the use of Lambda, it can quickly turn into a nightmare.
Indeed, if you place one or more Lambda functions in a VPC, you oblige Amazon to automatically open a dataflow to trigger your Lambda. At each instantiation of a Lambda (during the warm-up phase), Amazon creates a dynamic network interface to enable access to your Lambda from outside your VPC.
This is not necessarily a problem in itself. However, the creation of this network interface is very costly in time: Amazon takes nearly 8 seconds to create this network interface.
As a result, the first call to each new instance of a Lambda will take 8 to 10 seconds. If your Lambda calls another Lambda via HTTP, and that second Lambda is also cold, you will potentially increase the execution time by an additional 8 seconds.
The warm-up time of Lambdas can be managed ergonomically, but if your user has to wait between 15 and 20 seconds to get the answer to their query, they are likely to leave, well before getting the expected result. These are important factors to take into account before choosing a Lambda model combined with a VPC. Ask yourself this: “Is a loading time of several seconds, or several tens of seconds, acceptable in my use case?”
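To answer that question for your own application, a rough estimate can be sketched as follows. The 8-second warm-up comes from our measurements above, while the 0.5 seconds of useful work per Lambda is an arbitrary assumption.

```python
def chain_latency(cold_flags, cold_start_s=8.0, work_s=0.5):
    """Total latency of Lambdas called one after another.

    cold_flags[i] is True when the i-th Lambda in the chain needs a warm-up.
    """
    return sum(work_s + (cold_start_s if cold else 0.0) for cold in cold_flags)
```

With these assumptions, `chain_latency([True, True])` gives 17.0 seconds for two cold Lambdas in a chain, against 1.0 second when both are hot.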
If you decide to use Amazon Lambdas with an Amazon VPC, below are a few creative solutions that could help you partially bypass this problem.
- Warm up the Lambdas at regular intervals. It is possible to warm up your Lambdas automatically, to keep them hot inside the VPC. For this, you can use a script that makes fictitious calls on a regular basis (at least once every 20 minutes). This will enable you to dispense with the warm-up time of 8 seconds. But you will lose the main benefit of the Lambda approach, namely scalability: each new instance that AWS starts to absorb an increased load still needs 8 seconds to warm up. Taking this option initially allowed us to work around the problem without completely reworking the architecture of our application.
- Lambda gateway. You can also remove your public Lambdas from the VPC, then create a Lambda “gateway” (or use an EC2 gateway instance), which will keep a network interface open permanently in the private VPC. This will enable your public Lambdas to query this gateway, with a security system that you define, and to access the private resources of your VPC. However, you will lose the advantage of the automatic loading of Lambdas, since you need to keep your Lambda gateway hot at all times. In addition, if Amazon needs a new instance of this Lambda gateway, it has to recreate a dynamic network interface, and the first call to this new instance will take longer.
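For the first option, the Lambda itself can recognize the fictitious calls and return immediately, so that warm-up pings never run business logic. The sketch below assumes the warm-up script sends a payload containing a `warmup` key. That key is a convention invented for this example, not an AWS feature.

```python
def lambda_handler(event, context):
    # Hypothetical marker set by the scheduled warm-up call.
    if event.get("warmup"):
        return {"warmed": True}  # return at once and skip the business logic

    # Normal path: handle the real request.
    return {"statusCode": 200, "body": "real work done"}
```

A scheduler would then invoke the function with `{"warmup": True}` at an interval shorter than the period during which the Lambda stays hot.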
Overall, our experience with Amazon Web Services has been positive. Amazon offers a powerful set of tools, the vast majority of which are well thought out. Nevertheless, it is important to plan your technical choices and to ensure that the solutions chosen meet a genuine requirement. We therefore urge you to conceptualize your architecture and to validate its operation through a Proof of Concept (POC).
This article is part of Behind the Code, the media for developers, by developers. Discover more articles and videos by visiting Behind the Code!
Illustration by Blok
Technical Expert @ SQLI