Static Website Analytics

2 Jun 2019 golang aws javascript lambda

In this post we’re going to build a small, serverless, self-hosted analytics system out of a few simple tools. It’s the same set-up I currently use to track visitors to my website.

Overview

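The tools and services we’re going to use to build our analytics are Go, AWS Lambda and the Serverless Framework on the backend, a small JavaScript snippet on the frontend, and CloudWatch Insights for querying the results.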

Golang/AWS Lambda/Serverless Framework

To get things off the ground we’ll implement a Lambda function that the browser calls whenever someone visits our website. Our analytics should be as unobtrusive as possible and respect things like “Do Not Track”. The complete set-up can be found at https://github.com/strattonw/static-website-analytics, but we’ll quickly go through everything below.

serverless.yml

Here we define the serverless.yml file that deploys our Lambda function behind an API Gateway POST endpoint at /analytics. Before you deploy, change service: strattonDevAnalytics to a name that fits your own app.

service: strattonDevAnalytics

frameworkVersion: ">=1.28.0 <2.0.0"

provider:
  name: aws
  runtime: go1.x
  stage: dev
  memorySize: 128

package:
  exclude:
    - ./**
  include:
    - ./bin/**

functions:
  analytics:
    handler: bin/analytics
    events:
      - http:
          path: analytics
          method: post
main.go

The most important part of main.go is the Payload struct: it describes the data our website sends to the analytics endpoint for each visit. We’re going to use CloudWatch Insights to query the data and draw our graphs, and Lambda forwards everything written to stdout to CloudWatch Logs, so all the function has to do is log the values to stdout in a parseable format.

package main

import (
	"context"
	"encoding/json"
	"fmt"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
)

type Response events.APIGatewayProxyResponse
type Request events.APIGatewayProxyRequest

type Payload struct {
	Url       string `json:"u"`
	UserAgent string `json:"ua"`
	Referrer  string `json:"r"`
	Timezone  string `json:"tz"`
	Bot       bool   `json:"b"`
}

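func Handler(ctx context.Context, req Request) (Response, error) {
	var p Payload
	// Reject bodies that aren't valid JSON instead of logging empty fields
	if err := json.Unmarshal([]byte(req.Body), &p); err != nil {
		return Response{StatusCode: 400}, nil
	}

	// One line per pageview; %q wraps the user agent in quotes and escapes any quotes inside it
	fmt.Printf("url=%s ref=%s tz=%s bot=%t userAgent=%q\n", p.Url, p.Referrer, p.Timezone, p.Bot, p.UserAgent)

	return Response{StatusCode: 200}, nil
}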

func main() {
	lambda.Start(Handler)
}
Deploying

Thankfully the Serverless Framework makes deploying easy, and the repository’s Makefile wraps the commands we need. Running make deploy deploys the Lambda to our dev environment. As I’m lazy and didn’t want to type sls deploy --stage production every time I deployed to production, I’ve also added a make prod command that does just that. If the deploy goes well we should see output like the following:

Service Information
service: strattonDevAnalytics
stage: dev
region: us-east-1
stack: strattonDevAnalytics-dev
resources: 10
api keys:
  None
endpoints:
  POST - https://rzo9xb4e74.execute-api.us-east-1.amazonaws.com/dev/analytics
functions:
  analytics: strattonDevAnalytics-dev-analytics
layers:
  None

The most important line in that output is POST - https://rzo9xb4e74.execute-api.us-east-1.amazonaws.com/dev/analytics. This is the endpoint URL (yours will differ) that we’ll need in the next step when we write the JavaScript.

JavaScript

Now that we have a functioning Lambda endpoint, we need the JavaScript that reports each visit to it. The snippet below includes extra comments, not present in the actual code, that explain what some lines are doing. You can add it to specific pages or to every page; in my case I put it in the head tag of every page so I track all pageviews.

<script>
(function(window, au) {
    if (!window) return;
    // Respect "doNotTrack"
    if ('doNotTrack' in window.navigator && window.navigator.doNotTrack === '1') return;
    // Skip prerender requests
    if ('visibilityState' in window.document && window.document.visibilityState === 'prerender') return;
    // Skip when localhost
    if (window.location.hostname === 'localhost' || window.location.protocol === 'file:') return;

    try {
        var d = {
            // Add the url
            // We remove some personal data by dropping the query params and possible hashes
            u: window.location.protocol + '//' + window.location.hostname + window.location.pathname,
            ua: window.navigator.userAgent,
            r: window.document.referrer,
            // We could skip bot requests, but as I'd like to see if we get hit by bots I've left this in
            b: window.navigator.userAgent.search(/(bot|spider|crawl)/ig) > -1,
        };

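        try {
            d.tz = Intl.DateTimeFormat().resolvedOptions().timeZone;
        } catch (ignored) {
            // Intl may be missing in older browsers; send the pageview without a timezone
        }

        var r = new XMLHttpRequest();
        r.open('POST', au, true);
        // A text/plain content type keeps this a "simple" CORS request, which prevents a preflight
        r.setRequestHeader('Content-Type', 'text/plain; charset=UTF-8');
        r.send(JSON.stringify(d));
    } catch (e) {
        // Never let an analytics error break the page
    }
// Endpoint URL from the deploy output above
})(window, "https://rzo9xb4e74.execute-api.us-east-1.amazonaws.com/dev/analytics");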
</script>
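The checks in this snippet run in the visitor’s browser, so anyone can POST a payload with a query string still attached or a bot flag set to false. If you’d like the Lambda to enforce the same rules, here is a minimal Go sketch that repeats the URL trimming and the bot check on the server. sanitizeURL and isBot are hypothetical helpers, not part of the repository, and the regular expression simply mirrors the client-side /(bot|spider|crawl)/i check.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// botPattern mirrors the client-side /(bot|spider|crawl)/i check (hypothetical helper).
var botPattern = regexp.MustCompile(`(?i)bot|spider|crawl`)

// isBot reports whether the user agent looks like a crawler.
func isBot(userAgent string) bool {
	return botPattern.MatchString(userAgent)
}

// sanitizeURL keeps only scheme, hostname and path, like the JavaScript does,
// dropping any query string, fragment, port or user info (hypothetical helper).
func sanitizeURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("not an absolute URL: %q", raw)
	}
	return u.Scheme + "://" + u.Hostname() + u.EscapedPath(), nil
}

func main() {
	clean, err := sanitizeURL("https://stratton.dev/tags/?utm_source=feed#top")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(clean)
	fmt.Println(isBot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
}
```

Inside Handler you could then log the result of sanitizeURL(p.Url) and set p.Bot = p.Bot || isBot(p.UserAgent) before printing, so a misbehaving client can’t skew the numbers as easily.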

Testing

Now that the Lambda is deployed and the JavaScript is in place, we want to make sure everything works as expected. The easiest way is to temporarily remove the line in the JavaScript that skips localhost, or, if you’re daring, to test in production. Once we’ve verified that our website is sending the XHR requests, we can check the function’s logs in AWS CloudWatch. If all goes well we should see logs like:

url=https://stratton.dev/28-05-2019-go-basic-sessions/ ref=https://stratton.dev/ tz=America/Los_Angeles bot=false userAgent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Safari/605.1.15"

url=https://stratton.dev/tags/ ref=https://stratton.dev/28-05-2019-go-basic-sessions/ tz=America/Los_Angeles bot=false userAgent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Safari/605.1.15"

url=https://stratton.dev/ ref=https://stratton.dev/tags/ tz=America/Los_Angeles bot=false userAgent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Safari/605.1.15"

CloudWatch Insights

With the entire flow set up, all that’s left is to start getting metrics out of our analytics. CloudWatch Insights gives us numerous ways to query and display the data. The most basic query to run against our log group is:

filter @message like /url/
| parse @message "url=* ref=* tz=* bot=* userAgent=\"*\"" @url, @ref, @tz, @bot, @userAgent
| sort @timestamp desc

This query lists the data we’ve collected, most recent first. Since we do some basic bot detection, we can filter out bot requests by adding | filter @bot = "false" to the query, leaving only non-bot pageviews:

filter @message like /url/
| parse @message "url=* ref=* tz=* bot=* userAgent=\"*\"" @url, @ref, @tz, @bot, @userAgent
| filter @bot = "false"
| sort @timestamp desc
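To check that the log format and the parse pattern agree without waiting on CloudWatch, you can run a similar extraction locally. This is an approximation: the regular expression below is my own translation of the Insights glob pattern, not how Insights implements parse, and parseLine and countByURL are hypothetical names. countByURL mimics the bot filter and then counts pageviews per URL, roughly what appending | stats count(*) by @url to the query would give.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// logPattern approximates the Insights pattern "url=* ref=* tz=* bot=* userAgent=\"*\"".
var logPattern = regexp.MustCompile(`^url=(\S*) ref=(\S*) tz=(\S*) bot=(true|false) userAgent="(.*)"$`)

type pageview struct {
	URL, Ref, TZ, UserAgent string
	Bot                     bool
}

// parseLine extracts the fields from one log message; ok is false for other
// Lambda output such as START/END lines.
func parseLine(line string) (pageview, bool) {
	m := logPattern.FindStringSubmatch(strings.TrimSpace(line))
	if m == nil {
		return pageview{}, false
	}
	return pageview{URL: m[1], Ref: m[2], TZ: m[3], Bot: m[4] == "true", UserAgent: m[5]}, true
}

// countByURL tallies non-bot pageviews per URL.
func countByURL(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, l := range lines {
		if pv, ok := parseLine(l); ok && !pv.Bot {
			counts[pv.URL]++
		}
	}
	return counts
}

func main() {
	// Illustrative lines in the same format as the logs shown in the Testing section
	sample := []string{
		`url=https://stratton.dev/ ref=https://stratton.dev/tags/ tz=America/Los_Angeles bot=false userAgent="Mozilla/5.0"`,
		`url=https://stratton.dev/tags/ ref=https://stratton.dev/ tz=America/Los_Angeles bot=false userAgent="Mozilla/5.0"`,
		`url=https://stratton.dev/tags/ ref= tz= bot=true userAgent="Googlebot/2.1"`,
		`START RequestId: 1234 Version: $LATEST`,
	}
	counts := countByURL(sample)
	urls := make([]string, 0, len(counts))
	for u := range counts {
		urls = append(urls, u)
	}
	sort.Strings(urls)
	for _, u := range urls {
		fmt.Printf("%d %s\n", counts[u], u)
	}
}
```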

I recommend visiting the AWS CloudWatch Logs Insights documentation to build many more queries on top of the data collected.

Source

GitHub

Conclusion

We’ve built a simple, serverless, self-hosted way to track pageviews for a website or static blog. There are lots of ways we could improve the logging, on both the frontend and the backend. One simple backend improvement would be to store our analytics in a database such as DynamoDB instead of relying on CloudWatch logs.