Minimum viable IoT platform with Serverless framework and DynamoDB

9 Jul 2021|Technology

Implementing lightweight APIs with AWS Lambda and DynamoDB is pretty straightforward with the right tools. The Serverless Framework makes it easy and quick, and the serverless model reduces the amount of maintenance required. Most serverless services use a pay-per-use pricing model, which is also highly beneficial for low-demand applications.

LEGO construction on white baseplate showing brown blocks connected to black beams with red and yellow pieces forming a simple structure.

Background

There's a time and place for robust, resilient and scalable solutions. But there are also situations where the goal is to just get things done. Be it validating a hypothesis, building an experiment or just optimizing for a good enough solution, sometimes it just makes sense to not over-engineer a system.

I've had a personal home sensor "platform" for years already. It has been capable of storing sensor values, configuring their metadata and offering the data to UI clients. The main use case has been remote monitoring, like getting alerts if your pipes are about to freeze in the winter. Over the years there have been a few different UIs, but so far my favorite has been my Telegram bot.

The need for change

Years ago, after I had my initial version working locally, I got a VPS from DigitalOcean to run the servers on. First I configured some basics on the virtual machine, then installed PostgreSQL and a process monitor for Node.js, and finally configured everything to run indefinitely. Years went by and the thing just kept on working. When everything works, why bother even thinking about the system internals? As too often happens, I didn't care about updating or upgrading the OS or any of the software. To be honest, it was a calculated risk: there wasn't actually anything valuable on the server, so spending my free time on something else was worth more.

Then at some point a colleague of mine needed a place to store some sensor data of his. Now the system had users. Of course, the server eventually started having issues, and of the kind that are close to impossible to fix after years of ignoring the whole system. I'm not going to go into the details of the problems here, but imagine having an abandoned Ubuntu 16.04 running Node.js 4.x in 2021. The actual problem, though, was that there was now another user who still wanted to use the system.

So now it was time to do something about it, and that something would have to be a time- and cost-efficient rewrite of the most important parts of the system. And since both the world and I have moved forward since the first version, it was time to go back to ignoring maintenance, but this time in a way that ensures the updates happen even without me. Time for cloud and time for serverless.

Architecture

The number of services every cloud platform offers is overwhelming. Even deciding how to store data is not an easy task. In addition, the computing power and the glue between everything have to be considered carefully when designing the system. I had only a few requirements: the ability to read and write sensor data, support for a few users, and a cost that stays as hobby-friendly as possible. Because I was on a mission rather than on a pure learning journey, I decided to go with services I was already quite familiar with: DynamoDB and Lambda.

Usually I'm not much of a fan of NoSQL databases because I love a nicely designed schema that supports the developer and guarantees integrity. This time, however, I decided to go with DynamoDB because it is a serverless service and a good match for sensor data. It also didn't hurt that you can get started for free. The thing I was a bit worried about was that DynamoDB can get pretty expensive if you end up storing too much data in it or doing something stupid.

Figure: Minimal IoT platform. The architecture is really minimal and straightforward.

Selecting a cloud provider is just the first step in implementing your solution for the cloud. The next one is to choose your tools and approach. One could go and open the web browser and configure everything there but a better way is to define the infrastructure as code. This applies to every project big or small.

There are huge differences in the developer-friendliness of different solutions. For AWS there are CloudFormation and the Cloud Development Kit as native solutions. Then there are cloud-agnostic solutions like Terraform and Pulumi. Still, my favorite for simple Function as a Service (FaaS) projects is the Serverless Framework. With it you can also create simple resources for your functions. For this project the resources were the DynamoDB table and the access control rules that let the functions access the table.

The following sections include examples as well as code from the actual implementation. The full source code can be found on GitHub.

Implementation

DynamoDB

Moving from a relational database to a NoSQL database does mean some changes. With NoSQL there's no schema for the database, so "you can store anything" in the records. This, however, doesn't mean that you don't have to think in advance or design the data structure. The first thing to do is to figure out what kind of data you want to store and how you would like to query it.

DynamoDB has a feature called Time to Live (TTL), which we can use to ensure that the database doesn't grow too big. This hobby project shouldn't get too expensive, and there is really no need to store all the data forever at this point.

Table design

For this project, the main query for the data is to get measurement values for a certain sensor. Because there are multiple users and we cannot trust that every user would have unique sensor identifiers, we are going to include the user identifier in the primary key of the measurement value. The partition key for the measurements table is a string in the format userID#sensorID, stored in the sensorId attribute.

The secondary limiting parameter in the queries is the time frame. We want to get all the measurement values for a certain date or week. This is why the sort key for the measurements table is a number representing the timestamp in Unix time.

A third configured attribute in the table is expires_at, which is used for the automatic deletion of items based on their age.
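
As a minimal sketch of the key design described above, the composite partition key could be built and split like this. The helper names are hypothetical and not taken from the actual implementation, and rejecting identifiers that contain the separator is my own assumption to keep the key unambiguous.

```javascript
// Hypothetical helpers for the userID#sensorID partition key format.
const SEPARATOR = "#";

// Compose the partition key from a user identifier and a sensor identifier.
function buildPartitionKey(userId, sensorId) {
  const user = String(userId);
  const sensor = String(sensorId);
  // Assumption: identifiers never contain the separator, otherwise the
  // key could not be split back into its parts reliably.
  if (user.includes(SEPARATOR) || sensor.includes(SEPARATOR)) {
    throw new Error(`Identifiers must not contain "${SEPARATOR}"`);
  }
  return `${user}${SEPARATOR}${sensor}`;
}

// Split a partition key back into the user and sensor identifiers.
function parsePartitionKey(key) {
  const [userId, sensorId] = key.split(SEPARATOR);
  return { userId, sensorId };
}

console.log(buildPartitionKey(10000, "0000067745db")); // 10000#0000067745db
```

Keeping the user identifier in the partition key means that a query can never return another user's measurements, even if two users happen to use the same sensor identifier.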

The table for measurement data is defined in serverless.yml as follows.

resources:
  Resources:
    MeasurementsDynamoDbTable:
      Type: 'AWS::DynamoDB::Table'
      DeletionPolicy: Retain # Keep the table even if rest of the stack is removed
      Properties:
        AttributeDefinitions:
          -
            AttributeName: "sensorId"
            AttributeType: "S" # String
          -
            AttributeName: "timestamp"
            AttributeType: "N" # Number
        KeySchema:
          -
            AttributeName: "sensorId" # partition key (userID#sensorID)
            KeyType: "HASH"
          -
            AttributeName: "timestamp" # sort key
            KeyType: "RANGE"
        ProvisionedThroughput:
          ReadCapacityUnits: 1 # 1 read and write unit is enough
          WriteCapacityUnits: 1
        StreamSpecification:
          StreamEnabled: false # There's no need for streaming
        TimeToLiveSpecification:
          AttributeName: expires_at
          Enabled: true
        TableName: ${self:provider.environment.MEASUREMENT_TABLE}

To access the table, the functions also need the correct IAM role statements. These are defined as follows.

iamRoleStatements:
    - Effect: "Allow"
      Action:
        - dynamodb:Query
        - dynamodb:Scan
        - dynamodb:GetItem
        - dynamodb:PutItem
      Resource: "*" # Applies to all tables; narrowing this to the table's ARN would be stricter

Queries

The only query in the system so far is the one that fetches all the measurement values for a given time span. It is pretty straightforward: one key condition matches the partition key holding the sensor identifier, and a BETWEEN condition limits the sort key to the start and end timestamps.

const params = {
  TableName: process.env.MEASUREMENT_TABLE,
  KeyConditionExpression: "#sensor = :sensorId and #ts BETWEEN :start AND :end",
  ExpressionAttributeNames: {
    "#sensor": "sensorId",
    "#ts": "timestamp"
  },
  ExpressionAttributeValues: {
    ":sensorId": sensorId,
    ":start": queryStart,
    ":end": queryEnd,
  }
};

Figure: Data query for measurement values of sensor 0000067745db of user with id 10000 for a certain time frame.

Functions

This minimal system needs only two functions: one for saving new measurements and one for fetching old ones. Both functions have the simplest possible user authentication and a few lines of business logic.

Create

Inserting new measurement values consists of parsing the request and creating the item for DynamoDB. Most of the lines are for calculating the TTL value for the items. What's good to understand is that the expiration time is calculated when the data is inserted and is item-specific. For this system, it's a bit of a pain, because changing the TTL for the whole system would mean changing it for all the old items as well.
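
A rough sketch of building such an item might look like the following. The function name, the value field and the 90-day retention period are assumptions for illustration and are not taken from the actual implementation; only sensorId, timestamp and expires_at come from the table definition. I also assume timestamps in seconds, which is what DynamoDB TTL expects for the expiry attribute.

```javascript
const SECONDS_PER_DAY = 24 * 60 * 60;
// Assumption: the real retention period is not stated in this article.
const RETENTION_DAYS = 90;

// Build the item to store. The expiry is calculated at insert time, so it
// is fixed per item and does not change if RETENTION_DAYS changes later.
function buildMeasurementItem(userId, sensorId, value, timestamp,
                              now = Math.floor(Date.now() / 1000)) {
  return {
    sensorId: `${userId}#${sensorId}`, // partition key
    timestamp,                         // sort key, Unix time in seconds
    value,                             // hypothetical field name
    expires_at: now + RETENTION_DAYS * SECONDS_PER_DAY, // TTL attribute
  };
}

const item = buildMeasurementItem("10000", "0000067745db", 21.5, 1625788800, 1625788800);
console.log(item.expires_at); // 1633564800
```

With the AWS SDK v2 DocumentClient, an item like this could be written with put({ TableName, Item: item }).promise(). Note that DynamoDB removes expired items in the background, so an item can still show up in queries for some time after its expires_at has passed.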

Read

The read function is also as simple as it gets. It only reads the request parameters and constructs the query for DynamoDB. The partition key for the query is built from the userId obtained from the authorization token, combined with the sensor identifier from the request. In addition, the start and end timestamps are included in the query object.

const queryObj = {
  TableName: process.env.MEASUREMENT_TABLE,
  KeyConditionExpression: "#sensor = :sensorId and #ts BETWEEN :start AND :end",
  ExpressionAttributeNames: {
    "#sensor": "sensorId",
    "#ts": "timestamp"
  },
  ExpressionAttributeValues: {
    ":sensorId": sensorId,
    ":start": queryStart,
    ":end": queryEnd,
  }
};
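
To show how the pieces of the query object could come together, here is a sketch that builds it from an API Gateway proxy event. The function name, the sensorId path parameter, the start and end query parameters and the one-day default range are all assumptions of mine and may differ from the actual implementation.

```javascript
const SECONDS_PER_DAY = 24 * 60 * 60;

// Build the DynamoDB query from the request. userId is expected to come
// from the already validated API token.
function buildQuery(event, userId, tableName, now = Math.floor(Date.now() / 1000)) {
  const sensorId = event.pathParameters.sensorId;
  // queryStringParameters is null when the request has no query string.
  const qs = event.queryStringParameters || {};
  // Assumption: default to the last 24 hours when no range is given.
  const queryEnd = qs.end !== undefined ? Number(qs.end) : now;
  const queryStart = qs.start !== undefined ? Number(qs.start) : queryEnd - SECONDS_PER_DAY;
  if (!Number.isFinite(queryStart) || !Number.isFinite(queryEnd) || queryStart > queryEnd) {
    throw new Error("Invalid time range");
  }
  return {
    TableName: tableName,
    KeyConditionExpression: "#sensor = :sensorId and #ts BETWEEN :start AND :end",
    ExpressionAttributeNames: { "#sensor": "sensorId", "#ts": "timestamp" },
    ExpressionAttributeValues: {
      ":sensorId": `${userId}#${sensorId}`,
      ":start": queryStart,
      ":end": queryEnd,
    },
  };
}
```

Validating the range before querying keeps obviously broken requests away from DynamoDB, where every read counts against the single provisioned read capacity unit.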

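Because one of the current clients using this system is publicly accessible and reads the values directly from the browser, CORS becomes an issue. This is currently handled by having the read function allow all origins for the requests. Browsers do not accept the wildcard origin for requests made with credentials such as cookies, so this setup only serves requests without them.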

return {
  statusCode: 200,
  headers: {
    'Access-Control-Allow-Origin': '*',
    'Access-Control-Allow-Credentials': true,
  },
  body: JSON.stringify(data, null, 2)
}

User management

This is the part I'm both most and least proud of. Because there are currently only two users of the system, I needed functioning user management with API tokens, but I didn't need to do anything too heavy. There's no need to register new users or even to have any data about the users other than the API token and an arbitrary user identifier.

The user database, if I dare to call it one, is a single string with a few separators. It supports both read and write API tokens for multiple users. The actual tokens are UUID v4 strings.
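
The exact format of the string is not shown here, so the following is only a sketch under an assumed format of userId:readToken:writeToken entries separated by semicolons. The function names and the example tokens are made up for illustration.

```javascript
// Assumed format: "userId:readToken:writeToken;userId:readToken:writeToken"
function parseUsers(raw) {
  return raw
    .split(";")
    .filter(Boolean) // ignore empty entries, e.g. from a trailing separator
    .map((entry) => {
      const [userId, readToken, writeToken] = entry.split(":");
      return { userId, readToken, writeToken };
    });
}

// Return the user identifier for a token with the requested access
// ("read" or "write"), or null if the token is unknown.
function authorize(raw, token, access) {
  if (!token) return null;
  const field = access === "write" ? "writeToken" : "readToken";
  const user = parseUsers(raw).find((u) => u[field] === token);
  return user ? user.userId : null;
}

// Made-up example data, for example read from an environment variable.
const USERS =
  "10000:11111111-1111-4111-8111-111111111111:22222222-2222-4222-8222-222222222222;" +
  "10001:33333333-3333-4333-8333-333333333333:44444444-4444-4444-8444-444444444444";

console.log(authorize(USERS, "33333333-3333-4333-8333-333333333333", "read")); // 10001
```

In this sketch a write token does not grant read access; whether the real system treats the two tokens this way is not something I can tell from the description above.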

This is definitely not the way to implement a user database in any real system; it is the smallest disposable experiment I could come up with. Still, it does everything I need right now, and all the elements are there for moving to an actual database later.