Static Website Analytics
2 Jun 2019
golang
aws
javascript
lambda
In this post we’re going to build a simple, serverless, self-hosted analytics system for a static website. It is the set-up I currently use to track visitors to my own site, and it needs only a handful of tools.
Overview
The tools and services we’re going to use to build our simple analytics are:
- Golang
- Serverless framework
- AWS Lambda
- AWS CloudWatch
- JavaScript
Golang/AWS Lambda/Serverless Framework
To get things off the ground we want to implement a Lambda function that we can hit when someone visits our website. For our analytics we want to be very unintrusive and to respect things like “Do Not Track”. The complete set-up can be found at https://github.com/strattonw/static-website-analytics, but we’ll quickly go through everything below.
serverless.yml
Here we define the serverless.yml file that will deploy our Lambda function. Before you deploy, change the service name, strattonDevAnalytics, to something appropriate for your own app.
service: strattonDevAnalytics
frameworkVersion: ">=1.28.0 <2.0.0"
provider:
  name: aws
  runtime: go1.x
  stage: dev
  memorySize: 128
package:
  exclude:
    - ./**
  include:
    - ./bin/**
functions:
  analytics:
    handler: bin/analytics
    events:
      - http:
          path: analytics
          method: post
main.go
The most important part of main.go is the Payload struct. The Payload is the information sent to our analytics endpoint that we’ll use for tracking visitors. We’re going to use CloudWatch Insights to display our graphs, so all we have to do is log the values to stdout in a parseable format.
package main

import (
	"context"
	"encoding/json"
	"fmt"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
)

type Response events.APIGatewayProxyResponse
type Request events.APIGatewayProxyRequest

// Payload is the JSON body sent by the tracking script.
type Payload struct {
	Url       string `json:"u"`
	UserAgent string `json:"ua"`
	Referrer  string `json:"r"`
	Timezone  string `json:"tz"`
	Bot       bool   `json:"b"`
}

// Handler decodes the payload and logs it to stdout, which Lambda forwards to CloudWatch.
func Handler(ctx context.Context, req Request) (Response, error) {
	var p Payload
	// Reject bodies that are not valid JSON instead of logging an empty entry.
	if err := json.Unmarshal([]byte(req.Body), &p); err != nil {
		return Response{StatusCode: 400}, nil
	}
	fmt.Printf("url=%s ref=%s tz=%s bot=%t userAgent=\"%s\"\n", p.Url, p.Referrer, p.Timezone, p.Bot, p.UserAgent)
	return Response{StatusCode: 200}, nil
}

func main() {
	lambda.Start(Handler)
}
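To see how a request body maps onto the log line without deploying anything, the decode and format steps can be run locally. This is only a minimal sketch: formatLine and the sample body are hypothetical names and values made up for illustration, and the AWS-specific request and response types are left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Payload mirrors the struct in main.go.
type Payload struct {
	Url       string `json:"u"`
	UserAgent string `json:"ua"`
	Referrer  string `json:"r"`
	Timezone  string `json:"tz"`
	Bot       bool   `json:"b"`
}

// formatLine is a hypothetical helper (not part of the repository) that
// decodes a request body and builds the log line in the handler's format.
func formatLine(body string) (string, error) {
	var p Payload
	if err := json.Unmarshal([]byte(body), &p); err != nil {
		return "", err
	}
	return fmt.Sprintf("url=%s ref=%s tz=%s bot=%t userAgent=\"%s\"",
		p.Url, p.Referrer, p.Timezone, p.Bot, p.UserAgent), nil
}

func main() {
	// Sample body with made-up values, using the short keys the script sends.
	body := `{"u":"https://example.com/post/","ua":"ExampleAgent/1.0","r":"https://example.com/","tz":"America/Los_Angeles","b":false}`
	line, err := formatLine(body)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(line)
}
```

Running it prints one line in the url=... ref=... tz=... bot=... userAgent="..." shape that CloudWatch Insights will later parse.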
Deploying
Thankfully the Serverless Framework makes deploying easy, and the Makefile wraps it further. We should be able to run make deploy to deploy the Lambda to our dev environment. As I’m lazy and didn’t want to type sls deploy --stage production every time I wanted to deploy to production, I’ve added a make command to do just that: make prod. If the deploy goes well we should see the following output:
Service Information
service: strattonDevAnalytics
stage: dev
region: us-east-1
stack: strattonDevAnalytics-dev
resources: 10
api keys:
None
endpoints:
POST - https://rzo9xb4e74.execute-api.us-east-1.amazonaws.com/dev/analytics
functions:
analytics: strattonDevAnalytics-dev-analytics
layers:
None
The most important line in that output is the POST endpoint, https://rzo9xb4e74.execute-api.us-east-1.amazonaws.com/dev/analytics, which is the URL we’re going to need in the next step when we implement the JavaScript.
JavaScript
Now that we have a functioning Lambda endpoint we need to implement the JavaScript that will allow us to track visitors. I’ve added comments to the snippet below, which don’t appear in the actual code, to explain what some lines are doing. We can add this to specific pages or to every page; it’s your choice. In my case I put it in the head tag of every page so I can track all pageviews.
<script>
(function(window, au) {
  if (!window) return;
  // Respect "doNotTrack"
  if ('doNotTrack' in window.navigator && window.navigator.doNotTrack === '1') return;
  // Skip prerender requests
  if ('visibilityState' in window.document && window.document.visibilityState === 'prerender') return;
  // Skip when localhost
  if (window.location.hostname === 'localhost' || window.location.protocol === 'file:') return;
  try {
    var d = {
      // Add the url
      // We remove some personal data by dropping the query params and possible hashes
      u: window.location.protocol + '//' + window.location.hostname + window.location.pathname,
      ua: window.navigator.userAgent,
      r: window.document.referrer,
      // We could skip bot requests, but as I'd like to see if we get hit by bots I've left this in
      b: window.navigator.userAgent.search(/(bot|spider|crawl)/ig) > -1,
    };
    try {
      d.tz = Intl.DateTimeFormat().resolvedOptions().timeZone;
    } catch (ignored) {
    }
    var r = new XMLHttpRequest();
    r.open('POST', au, true);
    // Prevents preflight
    r.setRequestHeader('Content-Type', 'text/plain; charset=UTF-8');
    r.send(JSON.stringify(d));
  } catch (e) {
  }
  // Url from the above section
})(window, "https://rzo9xb4e74.execute-api.us-east-1.amazonaws.com/dev/analytics");
</script>
Testing
Now that we have the Lambda deployed and the JavaScript in place, we want to make sure that everything is working as expected. The easiest way to do this is to remove the line in the JavaScript that skips localhost or, if you’re daring, to test in production. Once we verify that our website is sending the XHR requests we should go to AWS CloudWatch and check our logs. If all goes well we should see logs like:
url=https://stratton.dev/28-05-2019-go-basic-sessions/ ref=https://stratton.dev/ tz=America/Los_Angeles bot=false userAgent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Safari/605.1.15"
url=https://stratton.dev/tags/ ref=https://stratton.dev/28-05-2019-go-basic-sessions/ tz=America/Los_Angeles bot=false userAgent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Safari/605.1.15"
url=https://stratton.dev/ ref=https://stratton.dev/tags/ tz=America/Los_Angeles bot=false userAgent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Safari/605.1.15"
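Another way to exercise the endpoint, without touching the tracking script at all, is to post a payload to it directly. The following is a rough sketch under a few assumptions: newPageviewRequest and the sample field values are hypothetical, and the endpoint URL is passed on the command line and should be the one from your own deploy output.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// payload uses the same short JSON keys that the Lambda expects.
type payload struct {
	URL       string `json:"u"`
	UserAgent string `json:"ua"`
	Referrer  string `json:"r"`
	Timezone  string `json:"tz"`
	Bot       bool   `json:"b"`
}

// newPageviewRequest is a hypothetical helper that builds the same kind of
// POST the browser script sends: a JSON body with a text/plain content type.
func newPageviewRequest(endpoint string, p payload) (*http.Request, error) {
	body, err := json.Marshal(p)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "text/plain; charset=UTF-8")
	return req, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: go run main.go <endpoint-url>")
		return
	}
	// Made-up values so the test entry is easy to spot in the logs.
	req, err := newPageviewRequest(os.Args[1], payload{
		URL:       "https://example.com/manual-test/",
		UserAgent: "manual-test",
		Timezone:  "UTC",
	})
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.StatusCode)
}
```

If the function is wired up correctly, a matching url=https://example.com/manual-test/ entry should appear in the log group shortly after the request.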
CloudWatch Insights
We have our entire flow set up; now all we have to do is start getting metrics from our analytics. CloudWatch Insights gives us numerous possibilities to display our analytics. The most basic query to run against our log group would be:
filter @message like /url/
| parse @message "url=* ref=* tz=* bot=* userAgent=\"*\"" @url, @ref, @tz, @bot, @userAgent
| sort @timestamp desc
This query prints the data we’ve collected, from most to least recent. As we do some basic bot detection, we can filter out bot requests by adding | filter @bot = "false" to our query, giving us only non-bot pageviews.
filter @message like /url/
| parse @message "url=* ref=* tz=* bot=* userAgent=\"*\"" @url, @ref, @tz, @bot, @userAgent
| filter @bot = "false"
| sort @timestamp desc
I recommend visiting the AWS CloudWatch documentation to build out many more queries with the data collected.
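The same log format can also be parsed outside of CloudWatch, for example on log lines exported to a file. As a rough sketch, the Go program below mirrors the parse pattern with a regular expression and counts non-bot pageviews per URL. The names parseLine and countHumanViews and the sample lines are assumptions made for illustration, and the pattern assumes URLs, referrers, and timezones contain no spaces.

```go
package main

import (
	"fmt"
	"regexp"
)

// linePattern mirrors the Insights pattern: url=* ref=* tz=* bot=* userAgent="*"
var linePattern = regexp.MustCompile(`url=(\S*) ref=(\S*) tz=(\S*) bot=(\S*) userAgent="(.*)"`)

// pageview holds the fields extracted from one log line.
type pageview struct {
	URL, Referrer, Timezone, UserAgent string
	Bot                                bool
}

// parseLine extracts the fields from a log line; ok is false if it does not match.
func parseLine(line string) (pv pageview, ok bool) {
	m := linePattern.FindStringSubmatch(line)
	if m == nil {
		return pageview{}, false
	}
	return pageview{
		URL:       m[1],
		Referrer:  m[2],
		Timezone:  m[3],
		Bot:       m[4] == "true",
		UserAgent: m[5],
	}, true
}

// countHumanViews tallies pageviews per URL, skipping bots and unparseable lines.
func countHumanViews(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		pv, ok := parseLine(line)
		if !ok || pv.Bot {
			continue
		}
		counts[pv.URL]++
	}
	return counts
}

func main() {
	// Made-up sample lines in the handler's log format.
	lines := []string{
		`url=https://example.com/ ref= tz=UTC bot=false userAgent="ExampleAgent/1.0"`,
		`url=https://example.com/tags/ ref=https://example.com/ tz=UTC bot=false userAgent="ExampleAgent/1.0"`,
		`url=https://example.com/ ref= tz=UTC bot=true userAgent="examplebot"`,
	}
	for url, n := range countHumanViews(lines) {
		fmt.Printf("%s %d\n", url, n)
	}
}
```

With the sample lines above, each of the two URLs ends up with a count of one, because the bot request is skipped; the order in which they print is not fixed since Go map iteration is unordered.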
Conclusion
We’ve shown a simple, serverless, self-hosted way to track pageviews for a website or static blog. There are lots of ways we could improve the logging, both on the frontend and the backend. A simple backend improvement would be to store our analytics in a database, such as DynamoDB, instead of CloudWatch.