Downloads-Statistics-Part1

Old System

We're talking about https://stats.ow2.org (login required), whose data has not been updated since June this year, when GForge was shut down.

What is it?

This is a set of Perl scripts that were developed back in the 2000s.

How does it work?

The whole mechanism requires two things in order to work:
- access to the GForge database
- access to the web server log files

In a nutshell, the system extracts download hits per project/component/release/file. It matches the path and filename of each downloaded file found in the web server's logs against the FRS project/component/release/file entries defined in the GForge database.

So it relies heavily on both GForge and the web server logs, and it has its own way of counting what a download is. For example, the system counts every hit with status 200 or 304, regardless of IP and timeframe, which tends to inflate the reported metrics somewhat. However, it counts only one download per IP per day for status code 206.
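
To make the counting rule concrete, here is a minimal Python sketch of it. It is not the original Perl code: the tuple layout of a parsed log line, the function name `count_downloads`, and the choice to deduplicate 206 responses per file (as well as per IP and per day) are assumptions made for illustration.

```python
from datetime import date


def count_downloads(hits):
    """Count downloads per file following the rule described above.

    `hits` is an iterable of (ip, day, status, path) tuples, a simplified
    (hypothetical) stand-in for parsed web server log lines.
    """
    counts = {}
    seen_partial = set()  # (ip, day, path) already counted for status 206
    for ip, day, status, path in hits:
        if status in (200, 304):
            # Every 200/304 hit counts, regardless of IP and timeframe.
            counts[path] = counts.get(path, 0) + 1
        elif status == 206:
            # Partial content: at most one download per IP per day
            # (assumed here to be tracked per file).
            key = (ip, day, path)
            if key not in seen_partial:
                seen_partial.add(key)
                counts[path] = counts.get(path, 0) + 1
    return counts


# Made-up log entries, for illustration only.
example_hits = [
    ("192.0.2.1", date(2018, 5, 1), 200, "/demo/a.zip"),
    ("192.0.2.1", date(2018, 5, 1), 200, "/demo/a.zip"),
    ("192.0.2.1", date(2018, 5, 1), 304, "/demo/a.zip"),
    ("192.0.2.2", date(2018, 5, 1), 206, "/demo/a.zip"),
    ("192.0.2.2", date(2018, 5, 1), 206, "/demo/a.zip"),
    ("192.0.2.2", date(2018, 5, 2), 206, "/demo/a.zip"),
    ("192.0.2.3", date(2018, 5, 1), 404, "/demo/a.zip"),
]
example_counts = count_downloads(example_hits)
```

In this example the same client fetching the file twice with status 200 is counted twice, which is the inflation effect mentioned above, while repeated 206 responses from one IP on one day collapse into a single download.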

New system

Testing Matomo

The idea is to rely on existing best-of-breed tools and to rethink our real needs: start simple and enrich later.
A logical choice for the tool is Matomo.

I've set up a test instance at https://ow2-utils.ow2.org/matomo/ and ran the Python script shipped with Matomo to analyse the web server log files and feed the Matomo database.

After that we get raw download statistics; see this example for the Legacy Downloads repository (GForge) for May 2018:

What's a download in Matomo? Several filtering criteria are in place:

  • user-agents
  • HTTP status. For the time being, Matomo's log analyzer counts a partial download (HTTP status code 206) as a regular hit.
  • file extension

On top of that, Matomo reports two metrics for each file. What they represent is detailed in the Matomo Glossary:

  • # Downloads: The number of times this link was clicked. Roughly, these are hits.
  • # Unique Downloads: The number of visits that involved a click on this link. If a link was clicked multiple times during one visit, it is only counted once.

It's also useful to know what a Visit is: if a visitor comes to your website for the first time, or if they visit a page more than 30 minutes after their last page view, this is recorded as a new visit.
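
The difference between the two metrics can be illustrated with a small Python sketch. It is a simplification and not Matomo's actual implementation: the `visitor_id` field and the function name are hypothetical, and only download clicks are treated as activity, whereas in Matomo other actions such as page views also keep a visit alive.

```python
from datetime import datetime, timedelta

VISIT_GAP = timedelta(minutes=30)  # inactivity after which a new visit starts


def download_metrics(clicks):
    """Return (downloads, unique_downloads) dicts keyed by URL.

    `clicks` is an iterable of (visitor_id, timestamp, url) tuples.
    """
    downloads = {}
    unique_downloads = {}
    last_seen = {}   # visitor -> timestamp of their previous click
    visit_no = {}    # visitor -> index of their current visit
    counted = set()  # (visitor, visit index, url) already counted as unique
    for visitor, ts, url in sorted(clicks, key=lambda click: click[1]):
        previous = last_seen.get(visitor)
        if previous is None or ts - previous > VISIT_GAP:
            visit_no[visitor] = visit_no.get(visitor, 0) + 1
        last_seen[visitor] = ts
        # Every click is a download (a hit).
        downloads[url] = downloads.get(url, 0) + 1
        # A unique download is counted once per visit and per URL.
        key = (visitor, visit_no[visitor], url)
        if key not in counted:
            counted.add(key)
            unique_downloads[url] = unique_downloads.get(url, 0) + 1
    return downloads, unique_downloads


# Made-up clicks: two within the same visit, one 50 minutes later.
example_clicks = [
    ("v1", datetime(2018, 5, 1, 10, 0), "/demo/a.zip"),
    ("v1", datetime(2018, 5, 1, 10, 10), "/demo/a.zip"),
    ("v1", datetime(2018, 5, 1, 11, 0), "/demo/a.zip"),
]
example_downloads, example_unique = download_metrics(example_clicks)
```

Here the three clicks give three downloads but only two unique downloads, because the first two clicks belong to the same visit.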

Extending Matomo to OW2's needs

The list of downloaded files as displayed in Matomo's UI is pretty raw and only contains 100 items, because the sub-list (shown when clicking +) cannot paginate. However, we can store many more in the database (the number of items kept is configurable in Matomo). Still, in this shape the list is pretty useless, at least for OW2's needs. This is where it becomes interesting to study Matomo's API.

With the following command, I'm able to retrieve the download metrics in JSON format for all files considered as downloads by Matomo.

curl 'https://ow2-utils.ow2.org/matomo/?module=API&method=Actions.getDownloads&idSite=5&period=month&date=2018-05-01&format=json&token_auth=anonymous&expanded=1'

With this JSON output, one can write a small web app that parses the data, processes it a little and displays it. Keep in mind that the only metadata available for a hit is the project name taken from the accessed URI, e.g. /knowage/Knowage_6.x_CE_Manual.pdf (this is implied by design in the current FRS over SFTP).
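
As a starting point, the following Python sketch aggregates such a JSON report per project. It makes several assumptions that should be checked against the real output: that the expanded report is a nested list whose rows carry `label`, `nb_hits`, `nb_visits` and an optional `subtable`, that file rows carry a `url` field, and that the project is the first path segment. The host name and the numbers in the sample are made up.

```python
import json
from collections import defaultdict
from urllib.parse import urlparse


def iter_files(rows):
    """Yield the leaf rows (those carrying a "url") of the nested report."""
    for row in rows:
        children = row.get("subtable")
        if children:
            yield from iter_files(children)
        elif "url" in row:
            yield row


def downloads_per_project(rows):
    """Sum hits and visits per project (assumed: first path segment)."""
    totals = defaultdict(lambda: {"hits": 0, "visits": 0})
    for row in iter_files(rows):
        segments = [s for s in urlparse(row["url"]).path.split("/") if s]
        # A file directly under the root has no project directory.
        project = segments[0] if len(segments) > 1 else "(root)"
        totals[project]["hits"] += int(row.get("nb_hits", 0))
        totals[project]["visits"] += int(row.get("nb_visits", 0))
    return dict(totals)


# Illustrative sample only; the real data would be the saved curl output.
sample = json.loads("""
[
  {"label": "download.example.org", "nb_hits": 7, "nb_visits": 5, "subtable": [
    {"label": "/knowage", "nb_hits": 7, "nb_visits": 5, "subtable": [
      {"label": "/Knowage_6.x_CE_Manual.pdf", "nb_hits": 4, "nb_visits": 3,
       "url": "http://download.example.org/knowage/Knowage_6.x_CE_Manual.pdf"},
      {"label": "/other-file.zip", "nb_hits": 3, "nb_visits": 2,
       "url": "http://download.example.org/knowage/other-file.zip"}
    ]}
  ]}
]
""")
per_project = downloads_per_project(sample)
print(per_project)
```

Note that the per-project "visits" figure here is the sum of per-file unique downloads, so a visit that fetched two files of the same project is counted twice; whether that is acceptable is part of defining the need.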

Defining the need

Now we need to define what we want in terms of display/layout.
There are basically two ways to consume the metrics:

  • as an MO member: need a summary view of all projects' statistics, per year/month for a start. Overall, the watchwords would be: start simple and light.
  • as a project leader or visitor: need statistics when browsing specific projects, basically when browsing a project from the Project Marketplace.

Further questions:

  • Would it be useful to provide CSV output, so that the metrics can be imported into a spreadsheet?
  • Technical: what could fit as an XWiki app, and how?
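
On the CSV question: Matomo's reporting API can itself return CSV when called with format=csv instead of format=json, but that gives the raw per-file report. For per-project figures, a possible sketch in Python is shown below; the column names and the shape of the `totals` mapping are assumptions, not an agreed format.

```python
import csv
import io


def to_csv(totals):
    """Render {project: (downloads, unique_downloads)} as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["project", "downloads", "unique_downloads"])
    for project in sorted(totals):
        downloads, unique_downloads = totals[project]
        writer.writerow([project, downloads, unique_downloads])
    return buffer.getvalue()


# Made-up figures, for illustration only.
example_csv = to_csv({"knowage": (7, 5)})
```

The resulting text can be saved to a file and opened directly in a spreadsheet.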

Let's discuss, and see Part 2.