Downloads-Statistics-Part1


Content related to this topic is spread across 3 pages:

Testing Matomo

The idea is to rely on existing best-of-breed tools and rethink our real needs: start simple and enrich later.
Matomo is a logical choice for the tool.

I've set up a testing instance at https://ow2-utils.ow2.org/matomo/ and ran the Python script shipped with Matomo to analyse the web server log files and feed the Matomo database.

This gives us raw download statistics; see for example the Legacy Downloads repository (gforge) for May 2018:

What counts as a download in Matomo? Several criteria/filters are applied:

  • user-agents
  • HTTP status. At the time of writing, Matomo's log analyzer counts a partial download (HTTP status code 206) as a regular hit.
  • file extension
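To make these criteria concrete, here is a minimal Python sketch of how a single access-log line could be classified as a download. It is not Matomo's implementation: it assumes the Apache/Nginx "combined" log format, and the extension list and bot markers (`DOWNLOAD_EXTENSIONS`, `BOT_MARKERS`) are hypothetical stand-ins for Matomo's own, much richer, rules. The `count_partial` flag illustrates the 206 caveat: a client that fetches a file through several ranged requests produces several 206 hits, each of which is counted.

```python
import re

# Hypothetical subset of download extensions; Matomo's real list is longer and configurable.
DOWNLOAD_EXTENSIONS = (".zip", ".gz", ".jar", ".war", ".pdf")
# Naive bot detection by substring; Matomo uses a much richer device/bot detector.
BOT_MARKERS = ("bot", "crawler", "spider")

# "combined" format: ip ident user [time] "METHOD path PROTO" status size "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def is_download(line: str, count_partial: bool = True) -> bool:
    """Apply the three filters: user agent, HTTP status, file extension."""
    m = COMBINED.match(line)
    if m is None:
        return False  # unparsable line
    if any(marker in m["ua"].lower() for marker in BOT_MARKERS):
        return False  # looks like a robot
    accepted = {200, 206} if count_partial else {200}
    if int(m["status"]) not in accepted:
        return False  # errors, redirects, etc.
    path = m["path"].split("?", 1)[0].lower()  # ignore any query string
    return path.endswith(DOWNLOAD_EXTENSIONS)

line = ('192.0.2.1 - - [01/May/2018:10:00:00 +0000] '
        '"GET /knowage/Knowage_6.x_CE_Manual.pdf HTTP/1.1" 206 1024 "-" "curl/7.58.0"')
print(is_download(line))                       # True: the 206 counts like a full download
print(is_download(line, count_partial=False))  # False
```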

On top of that, Matomo reports two metrics for each file. Their meaning is detailed in the Matomo Glossary:

  • # Downloads: the number of times this link was clicked. Roughly, these are hits.
  • # Unique Downloads: the number of visits that involved a click on this link. If a link was clicked multiple times during one visit, it is only counted once.

It's also worth knowing what a Visit is: if a visitor comes to your website for the first time, or visits a page more than 30 minutes after their last page view, this is recorded as a new visit.
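The difference between the two metrics can be sketched in Python as follows. This is a simplified model, not Matomo's code: it assumes each hit already carries a visitor identifier (a hypothetical input here, whereas Matomo derives visitors from the request data), and it only looks at download hits, whereas Matomo's 30-minute timeout applies to all of a visitor's actions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

VISIT_TIMEOUT = timedelta(minutes=30)  # inactivity gap that starts a new visit

def download_metrics(hits):
    """hits: iterable of (visitor_id, timestamp, url).
    Returns {url: (downloads, unique_downloads)}."""
    downloads = defaultdict(int)        # every hit counts
    visits_per_url = defaultdict(set)   # distinct (visitor, visit number) pairs
    last_seen = {}                      # visitor -> time of previous hit
    visit_no = {}                       # visitor -> current visit number
    for visitor, ts, url in sorted(hits, key=lambda h: h[1]):
        prev = last_seen.get(visitor)
        if prev is None or ts - prev > VISIT_TIMEOUT:
            visit_no[visitor] = visit_no.get(visitor, 0) + 1  # new visit
        last_seen[visitor] = ts
        downloads[url] += 1
        visits_per_url[url].add((visitor, visit_no[visitor]))
    return {url: (downloads[url], len(visits_per_url[url])) for url in downloads}

t0 = datetime(2018, 5, 1, 10, 0)
hits = [
    ("a", t0, "/knowage/manual.pdf"),
    ("a", t0 + timedelta(minutes=5), "/knowage/manual.pdf"),   # same visit
    ("a", t0 + timedelta(minutes=50), "/knowage/manual.pdf"),  # 45 min idle: new visit
    ("b", t0, "/knowage/manual.pdf"),
]
print(download_metrics(hits))  # {'/knowage/manual.pdf': (4, 3)}
```

Visitor "a" downloads the file three times but over two visits, so the file gets 4 downloads and 3 unique downloads.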

Extending Matomo to OW2's needs

The list of downloaded files, as displayed in Matomo's UI, is pretty raw and shows only 100 items, because the sub-list (opened by clicking +) cannot paginate. However, the database can store many more rows (the number of rows kept in the database is configurable in Matomo). Still, in such a shape the list is of little use, at least for OW2's needs. This is where Matomo's API becomes interesting.

With the following command, I can retrieve the download metrics, in JSON format, for all files that Matomo considers downloads:

curl 'https://ow2-utils.ow2.org/matomo/?module=API&method=Actions.getDownloads&idSite=5&period=month&date=2018-05-01&format=json&token_auth=anonymous&expanded=1'
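The same request can be built with Python's standard library, which is a convenient starting point for a script or webapp. This is a minimal sketch: `downloads_api_url` and `fetch_downloads` are hypothetical helper names, the base URL is the testing instance above (which may not stay online), and only the parameters of the curl command are used.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

MATOMO_URL = "https://ow2-utils.ow2.org/matomo/"  # testing instance

def downloads_api_url(id_site, date, period="month", token_auth="anonymous"):
    """Build the Actions.getDownloads request used in the curl command."""
    params = {
        "module": "API",
        "method": "Actions.getDownloads",
        "idSite": id_site,
        "period": period,
        "date": date,
        "format": "json",
        "token_auth": token_auth,
        "expanded": 1,  # include the nested per-file rows
    }
    return MATOMO_URL + "?" + urlencode(params)

def fetch_downloads(id_site, date):
    """Fetch and decode the JSON report (needs network access to the instance)."""
    with urlopen(downloads_api_url(id_site, date), timeout=30) as resp:
        return json.load(resp)

print(downloads_api_url(5, "2018-05-01"))  # same URL as the curl command
```

Other months are obtained by changing `date`, e.g. `fetch_downloads(5, "2018-06-01")`.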

With this JSON output, one can write a small webapp that parses the data, processes it a little and displays it. Keep in mind that the only metadata available for a hit is the project name, derived from the accessed URI, e.g. /knowage/Knowage_6.x_CE_Manual.pdf (this is implied by the design of the current FRS over SFTP).
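As a first processing step, download rows could be grouped per project using the first segment of the file path. The sketch below assumes a response shape that should be checked against the real output: a list of rows carrying `label`, `nb_hits` (# Downloads) and `nb_visits` (# Unique Downloads), with nested rows under `subtable` when `expanded=1`, and leaf rows exposing the file URL in `url` (falling back to `label`). The sample data is made up.

```python
from collections import defaultdict
from urllib.parse import urlparse

def iter_leaves(rows):
    """Yield leaf rows, descending into nested 'subtable' lists."""
    for row in rows:
        children = row.get("subtable")
        if children:
            yield from iter_leaves(children)
        else:
            yield row

def project_of(row):
    """First path segment of the file URL, e.g. /knowage/... -> 'knowage'."""
    path = urlparse(row.get("url") or row.get("label", "")).path
    segments = [s for s in path.split("/") if s]
    return segments[0] if segments else "(unknown)"

def per_project_totals(rows):
    totals = defaultdict(lambda: {"downloads": 0, "unique_downloads": 0})
    for leaf in iter_leaves(rows):
        t = totals[project_of(leaf)]
        t["downloads"] += int(leaf.get("nb_hits", 0))
        t["unique_downloads"] += int(leaf.get("nb_visits", 0))  # sum of per-file uniques
    return dict(totals)

# Made-up sample mimicking the assumed response shape.
sample = [{
    "label": "download.example.org", "nb_hits": 7, "nb_visits": 5,
    "subtable": [
        {"label": "/knowage/Knowage_6.x_CE_Manual.pdf", "nb_hits": 4, "nb_visits": 3,
         "url": "https://download.example.org/knowage/Knowage_6.x_CE_Manual.pdf"},
        {"label": "/knowage/knowage-server.zip", "nb_hits": 2, "nb_visits": 1,
         "url": "https://download.example.org/knowage/knowage-server.zip"},
        {"label": "/example-project/app.jar", "nb_hits": 1, "nb_visits": 1,
         "url": "https://download.example.org/example-project/app.jar"},
    ],
}]
print(per_project_totals(sample))
# {'knowage': {'downloads': 6, 'unique_downloads': 4}, 'example-project': {'downloads': 1, 'unique_downloads': 1}}
```

Summing `unique_downloads` over a project's files does not give the number of visits that downloaded something from the project: a visit that fetched two files is counted twice, and that figure cannot be derived from per-file numbers alone.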

Defining the need

Now we need to define what we want in terms of display/layout.
There are basically two ways to consume the metrics:

  • as an MO member: a synthetic view of all projects' statistics is needed, per year/month for a start. Overall, the watchwords are: start simple and light.
  • as a project leader or visitor: statistics are needed when browsing a specific project, typically from the Project Marketplace.

Further questions:

  • Would it be useful to provide a CSV output, so that one can import the metrics into a spreadsheet?
  • Technical: what could fit as an XWiki app, and how?
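If CSV output is wanted, per-project totals could be serialized with Python's `csv` module, as in the minimal sketch below. It assumes the totals have already been computed for one month as a dict of the (hypothetical) shape `{project: {"downloads": int, "unique_downloads": int}}`. Matomo's reporting API also accepts `format=csv`, but that would return the raw per-file rows rather than per-project figures.

```python
import csv
import io

def totals_to_csv(totals, month):
    """Render one month of per-project totals as CSV text, one row per project."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["month", "project", "downloads", "unique_downloads"])
    for project in sorted(totals):  # stable, alphabetical order for spreadsheets
        t = totals[project]
        writer.writerow([month, project, t["downloads"], t["unique_downloads"]])
    return buf.getvalue()

print(totals_to_csv({"knowage": {"downloads": 6, "unique_downloads": 4}}, "2018-05"), end="")
# month,project,downloads,unique_downloads
# 2018-05,knowage,6,4
```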

Let's discuss, and see Part 2.