Download-Statistics-Part3


Previous parts

Refining the need #3

We had the chance to discuss the topic further during the TC meeting on 2018-10-05.

Best quotes about it from the meeting logs:

 DZE: regarding YTD, the requirement is: I'd like to see how downloads are going this year and compare with previous years, with details at month level (because I want to see what happened when releasing a new version of the product). If users can get values for the last X months, that's fine
 MHA: anyway let's start easy. We could provide a single CSV per month
 DZE: I'd prefer to have a separate row for each country, and I'd like to have data for all countries, if possible
 MHA: In my original suggestion, I was suggesting a country % distribution per file
 DZE: we can have values for top 10 countries and other countries aggregated, top 10 or top 20 ...

Current requirements

So, at the end of the meeting, the requirements we agreed on were (per project):

  • one CSV file per month
  • following columns: year, month, file, uniqueDownloadCount, country
  • separate row for each country
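
For illustration, a month's CSV for one project could look like the lines below. The file name, counts and countries are made-up placeholder values; only the column layout reflects the agreed requirements:

    year,month,file,uniqueDownloadCount,country
    2018,10,myproject-1.2.0.tar.gz,1234,France
    2018,10,myproject-1.2.0.tar.gz,987,Germany
    2018,10,myproject-1.2.0.tar.gz,42,Other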

Implementation

What I'm (MHA) thinking for the next dev steps:

  • we shouldn't work on the display until we have a working CSV generator
  • once a complete month's CSV has been processed, it should be stored as is in the wiki (exactly "where" is TBD)
  • CSV generator: it should produce the data in a handy way for both usage scenarios detailed in Part 1. Two possible approaches:
    • 1) iterate over every project object in the wiki and issue a Matomo request for each (a request sketch is given after this list).
    • 2) issue a single Matomo request for all projects. In this case Matomo's segment argument isn't mandatory.
      It's a trade-off between the number of requests and the size of the returned data.
  • For both approaches, the job could be triggered using XWiki's scheduler
    • for approach 1, the job could look for all project objects that have a boolean property "dlStatsEnable" (for example), plus a "dlStatsId" for cases where the download folder name doesn't match the XWiki document name, as with lemonldap-ng (in this example they don't match).
    • for approach 2, the job consists of generating a "big" CSV based on Matomo's result for the current month. It should then be post-processed and dispatched over the concerned projects, when applicable (i.e. using the object properties "dlStatsEnable" & "dlStatsId")
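
To make approach 1 more concrete, here is a minimal sketch of a per-project generator. It assumes a Matomo Reporting API call (UserCountry.getCountry restricted by a segment on the download URL) and uses nb_uniq_visitors as a stand-in for "uniqueDownloadCount"; the Matomo URL, site id, token, report method, segment and metric mapping are assumptions to be validated against the real instance, and the "file" column is simplified to the project's download folder:

    # Illustrative sketch only (approach 1): one Matomo request per project, per month.
    # ASSUMPTIONS: MATOMO_URL, SITE_ID and TOKEN are placeholders; the report method,
    # the segment and the metric mapping must be checked against the real Matomo setup.
    import csv
    import requests  # plain HTTP client, used here for brevity

    MATOMO_URL = "https://matomo.example.org/index.php"  # placeholder
    SITE_ID = 1                                          # placeholder
    TOKEN = "anonymous"                                  # placeholder

    def monthly_country_rows(dl_stats_id, year, month):
        """Fetch one month of per-country data for one project's download folder."""
        params = {
            "module": "API",
            "method": "UserCountry.getCountry",   # assumed report, to be confirmed
            "idSite": SITE_ID,
            "period": "month",
            "date": f"{year}-{month:02d}-01",
            "format": "JSON",
            "token_auth": TOKEN,
            # restrict the report to this project's downloads (dlStatsId = folder name)
            "segment": f"downloadUrl=@{dl_stats_id}",
        }
        reply = requests.get(MATOMO_URL, params=params, timeout=60)
        reply.raise_for_status()
        for row in reply.json():
            # nb_uniq_visitors is used as a stand-in for "uniqueDownloadCount";
            # the "file" column is simplified to the download folder here.
            yield [year, month, dl_stats_id, row.get("nb_uniq_visitors", 0), row.get("label", "Unknown")]

    def write_month_csv(dl_stats_id, year, month, path):
        """Write the agreed per-project monthly CSV (one row per country)."""
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["year", "month", "file", "uniqueDownloadCount", "country"])
            writer.writerows(monthly_country_rows(dl_stats_id, year, month))

    # Example: write_month_csv("lemonldap-ng", 2018, 10, "lemonldap-ng-2018-10.csv")

Approach 2 would be the same idea without the segment argument: a single request for the whole site, post-processed and split per project before writing the same per-project CSVs.
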

Potential issues we should avoid

  • For this use case, the Matomo database will be fed by its log analyzer: we must avoid running the analyzer twice on the same log file at all costs, otherwise the counts would be doubled in the Matomo database (a small guard script is sketched after this list).
  • For the record, each web server log file covers a 24-hour time frame and rotates at 6:30 AM. This means a full day, from midnight to midnight, can only be obtained by combining two successive files.
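
Since the doubled-count risk comes entirely from re-importing a log file, a small guard in front of the log analyzer may be enough. The sketch below records a content hash of every file already imported before calling Matomo's import_logs.py; the state-file location, log directory and the exact import_logs.py command line are placeholders to be adapted and checked against the Matomo log-analytics documentation:

    # Illustrative sketch only: never feed the same access log file to Matomo twice.
    # ASSUMPTIONS: paths and the import_logs.py command line below are placeholders.
    import hashlib
    import json
    import pathlib
    import subprocess

    STATE_FILE = pathlib.Path("/var/lib/dlstats/imported-logs.json")  # placeholder
    LOG_DIR = pathlib.Path("/var/log/httpd")                          # placeholder

    def file_digest(path):
        """Identify a log file by its content, not only by its (rotated) name."""
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def import_new_logs():
        done = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
        for log_file in sorted(LOG_DIR.glob("access.log-*")):      # placeholder pattern
            digest = file_digest(log_file)
            if done.get(str(log_file)) == digest:
                continue  # already imported: skip to avoid doubled counts in Matomo
            subprocess.run(
                ["python", "/path/to/matomo/misc/log-analytics/import_logs.py",  # placeholder
                 "--url=https://matomo.example.org", "--idsite=1", str(log_file)],
                check=True,
            )
            done[str(log_file)] = digest
            STATE_FILE.write_text(json.dumps(done, indent=2))

    if __name__ == "__main__":
        import_new_logs()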