ownstats

Host your own privacy-effective website statistics on AWS via CloudFront, S3, Glue and Athena.

Architecture

[Figure: ownstats architecture]

What is provisioned in your AWS account?

Preconditions

This guide assumes that the Serverless Framework is installed on your machine. You also need an AWS account that is permitted to deploy and run the services mentioned above, and AWS credentials that are already configured on your local machine.

How-to

To use ownstats on your websites, follow the steps below.

Set up a Serverless project

There are two ways to get started: either let Serverless generate a basic project structure, or use the “traditional” fork-and-clone approach.

Use Serverless templates

The following command creates a local project structure that you can use to deploy ownstats. It is placed in the ownstats-mywebsite folder, relative to your current working directory:

$ sls create --template-url https://github.com/ownstats/ownstats --path ownstats-mywebsite
Serverless: Generating boilerplate...
Serverless: Downloading and installing "ownstats"...
Serverless: Successfully installed "ownstats"

Hint
When using this method, Serverless automatically replaces the service.name in the serverless.yml file with ownstats-mywebsite. If you want to use a different stack name, you have to replace it manually, and you need to make sure that the stack name contains only allowed characters. When using the “Fork and clone” method below, the stack name is automatically derived from the domain name and sanitized so that it contains only allowed characters.
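To illustrate what such a sanitization could look like, here is a minimal sketch. It is not the project's actual code: the function name toStackName and the lowercasing step are assumptions made for this example. It relies only on the CloudFormation rule that stack names may contain letters, digits, and hyphens, must start with a letter, and can be at most 128 characters long.

```javascript
// Hypothetical helper (not part of ownstats): derives a
// CloudFormation-safe stack name from a domain name.
function toStackName(domain, prefix = "ownstats") {
  const sanitized = domain
    .toLowerCase()                 // assumption: normalize case for readability
    .replace(/[^a-z0-9]+/g, "-")   // dots and other disallowed characters become hyphens
    .replace(/^-+|-+$/g, "");      // strip leading and trailing hyphens
  // The prefix starts with a letter, which keeps the whole name valid.
  return `${prefix}-${sanitized}`.slice(0, 128);
}

console.log(toStackName("yourdomain.yourtld")); // ownstats-yourdomain-yourtld
```

If you pick a stack name by hand after using the template method, running it through a check like this is a quick way to catch disallowed characters before deploying.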

Fork and clone

Once you have forked the repo on GitHub, you can clone it locally via

$ git clone git@github.com:youraccount/yourrepo.git

where youraccount/yourrepo needs to be replaced with the actual repository name of your forked repo.

Install dependencies

To install the dependencies, run

$ npm i

After that, the project is usable.

Deploy

You can deploy ownstats with the following command:

$ sls deploy --domain yourdomain.yourtld --stage dev

where yourdomain.yourtld needs to be replaced with your actual domain name. You can also specify an AWS region via the --region flag; otherwise us-east-1 is used. Furthermore, you can enable a debug mode for hello.js by specifying --debug-mode true.

Example output

Serverless: Packaging service...
Serverless: Excluding development dependencies...
Serverless: Uploading CloudFormation file to S3...
Serverless: Uploading artifacts...
Serverless: Uploading service ownstats-yourdomain-yourtld.zip file to S3 (1.79 MB)...
Serverless: Validating template...
Serverless: Updating Stack...
Serverless: Checking Stack update progress...
.....................
Serverless: Stack update finished...
Service Information
service: ownstats-yourdomain-yourtld
stage: dev
region: us-east-1
stack: ownstats-yourdomain-yourtld-dev
resources: 24
api keys:
  None
endpoints:
  None
functions:
  transformPageViews: ownstats-yourdomain-yourtld-dev-transformPageViews
  createPartition: ownstats-yourdomain-yourtld-dev-createPartition
  moveAccessLogs: ownstats-yourdomain-yourtld-dev-moveAccessLogs
layers:
  None
Serverless: Stack Output processed with handler: modules/renderScript.handler
Serverless: Minified hello.js
Serverless: Compressed hello.js
Serverless: Written hello.js template: /Users/username/development/ownstats-yourdomain-yourtld/src/hello.js
Serverless: Tracking pixel URL: https://d1h3biw9kbtest.cloudfront.net/hello.gif
Serverless: Tracking script URL: https://d1h3biw9kbtest.cloudfront.net/hello.js
Serverless: You can now add '<script src="https://d1h3biw9kbtest.cloudfront.net/hello.js"></script>' to your website's sources to enable the gathering of statistics
S3 Sync: Syncing directories and S3 prefixes...
S3 Sync: Synced.
DistributionIdKey: CloudFrontDistributionId
CloudfrontInvalidate: Invalidation started
Serverless: Removing old service artifacts from S3...

Usage

Using the hello.js script on your website(s)

The deployment (see above) prints the <script> tag that you can use to enable the collection of page view statistics, for example:

<script src="https://d1h3biw9kbtest.cloudfront.net/hello.js"></script>

You need to add that <script> tag to each of your website's pages. That’s all.

Hint
You can also use one deployment of ownstats to gather statistics for multiple websites. If you’re planning to do that, you can use the page_views.domain_name column to distinguish between the different websites.

How does it work internally?

The hello.js script gathers some information about the viewing device (this is not fingerprinting) and other data:

Other querystring entries are discarded, and no cookies are used, for example for detecting recurring visitors.

The data is then “sent” to ownstats by requesting the hello.gif file from the CloudFront CDN and adding the collected data to the querystring of the request. The querystring is then split into its individual fields by the Lambda function that generates the page_views table, so that the collected information can be used for analytical queries.
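The round trip described above can be sketched as follows. This is only an illustration under stated assumptions: the function names buildPixelUrl and parsePixelRequest and the parameter keys u and w are made up for this example, and the real hello.js and Lambda function may use different names and may read the querystring from the access logs rather than from a full URL.

```javascript
// Hypothetical sketch: attach collected data to the tracking pixel URL
// and split it into fields again on the receiving side.

// Client side: append each collected value as a querystring parameter.
function buildPixelUrl(pixelUrl, data) {
  const url = new URL(pixelUrl);
  for (const [key, value] of Object.entries(data)) {
    url.searchParams.set(key, String(value)); // values are URL-encoded automatically
  }
  return url.toString();
}

// Processing side: split the querystring into decoded key/value pairs.
function parsePixelRequest(requestUrl) {
  return Object.fromEntries(new URL(requestUrl).searchParams);
}

const sent = buildPixelUrl("https://d1h3biw9kbtest.cloudfront.net/hello.gif", {
  u: "https://yourdomain.yourtld/blog?utm_source=newsletter", // assumed key: page URL
  w: 1440,                                                    // assumed key: viewport width
});

console.log(sent);
console.log(parsePixelRequest(sent));
```

Note that every value comes back as a string after parsing, so numeric fields would need to be converted before they are written to a typed table column.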

Glue tables (to be used with Athena)

You can use Athena to run queries on the automatically populated Glue tables described below.

access_logs

The access_logs table consists of the following columns:

The table is partitioned by

page_views

The page_views table consists of the following columns:

The table is partitioned by

edge_locations

The edge_locations table consists of the following columns:

The edge_locations table is automatically joined with the access_logs table when the page_views table is generated. The join uses the respective edge_location_prefix columns and adds location information to the page view data.
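The idea behind this join can be sketched in memory as shown below. This is not the project's implementation: apart from edge_location_prefix, all field names and sample rows are hypothetical, and the sketch assumes that the prefix is the first three characters of the CloudFront edge location code.

```javascript
// Hypothetical rows; only edge_location_prefix is a column name from this README.
const edgeLocations = [
  { edge_location_prefix: "FRA", city: "Frankfurt", country: "Germany" },
  { edge_location_prefix: "LHR", city: "London", country: "United Kingdom" },
];

const accessLogs = [
  { edge_location: "FRA2-C1", uri: "/hello.gif" },
  { edge_location: "XXX1-C1", uri: "/hello.gif" }, // unknown location on purpose
];

// Index the lookup table by prefix so each log row needs a single lookup.
const byPrefix = new Map(
  edgeLocations.map((row) => [row.edge_location_prefix, row])
);

// Left join: every log row is kept; location fields are null when nothing matches.
const pageViews = accessLogs.map((log) => {
  const prefix = log.edge_location.slice(0, 3); // assumption: prefix = first 3 characters
  const location = byPrefix.get(prefix);
  return {
    ...log,
    edge_location_prefix: prefix,
    city: location ? location.city : null,
    country: location ? location.country : null,
  };
});

console.log(pageViews);
```

The sketch keeps unmatched rows, which mirrors a left join; whether the actual transformation does the same is not stated here.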

Querying the data

The ownstats project doesn’t come with a pre-packaged analytical frontend. Rather than prescribing which tool analysts should use, it builds the foundation for analyses by setting up an automated workflow from raw access logs to usable table structures. These tables can be queried with any tool that supports Athena (or can use the Athena database drivers).

There are some excellent resources on how to get started and use AWS Athena:

Get page view count by day

select
  count(*) as page_view_count
from
  page_views
where
  event_date = '2019-07-01'
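If you generate such queries from a script, a small builder like the following sketch can help. The function name pageViewCountQuery is hypothetical, and the optional domain_name filter assumes the page_views.domain_name column mentioned in the usage hint above. The sketch only produces the SQL text; submitting it to Athena is left to whichever client you use.

```javascript
// Hypothetical helper: builds the page view count query for a single day,
// optionally restricted to one website.
function pageViewCountQuery(eventDate, domainName) {
  // Validate inputs strictly, because they are interpolated into the SQL text.
  if (!/^\d{4}-\d{2}-\d{2}$/.test(eventDate)) {
    throw new Error("eventDate must look like YYYY-MM-DD");
  }
  const conditions = [`event_date = '${eventDate}'`];

  if (domainName !== undefined) {
    if (!/^[a-z0-9.-]+$/i.test(domainName)) {
      throw new Error("domainName contains unexpected characters");
    }
    conditions.push(`domain_name = '${domainName}'`);
  }

  return [
    "select",
    "  count(*) as page_view_count",
    "from",
    "  page_views",
    "where",
    `  ${conditions.join(" and ")}`,
  ].join("\n");
}

console.log(pageViewCountQuery("2019-07-01"));
console.log(pageViewCountQuery("2019-07-01", "yourdomain.yourtld"));
```

The strict input checks are a deliberate choice: since the values end up inside string literals, rejecting anything unexpected is simpler than trying to escape it.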

IAM policy

To deploy the stack, this example policy can be used:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ownstats",
            "Effect": "Allow",
            "Action": [
                "athena:GetQueryExecution",
                "athena:StartQueryExecution",
                "cloudformation:CreateUploadBucket",
                "cloudformation:CancelUpdateStack",
                "cloudformation:CreateStack",
                "cloudformation:DeleteStack",
                "cloudformation:UpdateStack",
                "cloudformation:UpdateTerminationProtection",
                "cloudformation:CreateChangeSet",
                "cloudformation:Describe*",
                "cloudformation:ContinueUpdateRollback",
                "cloudformation:EstimateTemplateCost",
                "cloudformation:PreviewStackUpdate",
                "cloudformation:List*",
                "cloudformation:ExecuteChangeSet",
                "cloudformation:ValidateTemplate",
                "cloudformation:Get*",
                "cloudfront:CreateCloudFrontOriginAccessIdentity",
                "cloudfront:CreateDistribution",
                "cloudfront:CreateInvalidation",
                "cloudfront:DeleteCloudFrontOriginAccessIdentity",
                "cloudfront:GetCloudFrontOriginAccessIdentityConfig",
                "cloudfront:GetDistribution",
                "cloudfront:TagResource",
                "cloudfront:UpdateDistribution",
                "glue:BatchCreatePartition",
                "glue:CreateDatabase",
                "glue:CreatePartition",
                "glue:CreateTable",
                "glue:DeleteDatabase",
                "glue:DeletePartition",
                "glue:DeleteTable",
                "glue:GetDatabase",
                "glue:GetPartition",
                "glue:GetPartitions",
                "glue:GetTable",
                "glue:UpdateTable",
                "iam:CreateRole",
                "iam:DeleteRole",
                "iam:PutRolePolicy",
                "iam:DeleteRolePolicy",
                "iam:GetRole",
                "lambda:AddPermission",
                "lambda:InvokeFunction",
                "lambda:DeleteFunction",
                "lambda:PublishVersion",
                "lambda:List*",
                "lambda:CreateFunction",
                "lambda:Get*",
                "lambda:RemovePermission",
                "lambda:CreateAlias",
                "lambda:Update*",
                "logs:CreateLogGroup",
                "logs:DeleteLogGroup",
                "logs:DescribeLogGroups",
                "logs:DescribeLogStreams",
                "logs:FilterLogEvents",
                "logs:GetLogEvents",
                "logs:PutRetentionPolicy",
                "s3:CreateBucket",
                "s3:DeleteBucket",
                "s3:DeleteBucketPolicy",
                "s3:DeleteBucketWebsite",
                "s3:DeleteObject",
                "s3:DeleteObjectVersion",
                "s3:GetBucketAcl",
                "s3:GetBucketLocation",
                "s3:GetBucketNotification",
                "s3:GetBucketPolicy",
                "s3:GetBucketWebsite",
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:ListAllMyBuckets",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:ListBucketVersions",
                "s3:PutBucketAcl",
                "s3:PutBucketCORS",
                "s3:PutBucketNotification",
                "s3:PutBucketPolicy",
                "s3:PutBucketTagging",
                "s3:PutBucketWebsite",
                "s3:PutEncryptionConfiguration",
                "s3:PutObject",
                "s3:PutPublicAccessBlock"
            ],
            "Resource": "*"
        }
    ]
}

Removal

You can remove the stack by running

$ sls remove --domain yourdomain.yourtld

Hint
It’s possible that you’ll have to clean the S3 buckets manually before running the above command.