Download-Statistics-Part3


Previous parts

Refining the need #3

We had the chance to discuss the topic further during the TC meeting on 2018-10-05.

Best quotes about it from the meeting logs:

 DZE: regarding YTD, the requirement is: I'd like to see how downloads are going this year and compare with previous years, with details at month level (because I want to see what happened when releasing a new version of the product). If users can get values for the last X months, that's fine
 MHA: anyway let's start easy. We could provide a single CSV per month
 DZE: I'd prefer to have a separate row for each country, and I'd like to have data for all countries, if possible
 MHA: In my original suggestion, I was suggesting a country % distribution per file
 DZE: we can have values for top 10 countries and other countries aggregated, top 10 or top 20 ...

Current requirements

So, at the end of the meeting, the requirements we agreed on were (per project):

  • one CSV file per month
  • following columns: year, month, file, uniqueDownloadCount, country
  • separate row for each country
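
For illustration, a month's CSV for one project could look like the lines below. The file name, counts and countries are made-up placeholder values; only the column layout reflects the agreed requirements:

    year,month,file,uniqueDownloadCount,country
    2018,10,myproject-1.2.0.tar.gz,1234,France
    2018,10,myproject-1.2.0.tar.gz,987,Germany
    2018,10,myproject-1.2.0.tar.gz,42,Other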

Implementation

What I'm (MHA) thinking for the next dev steps:

  • we shouldn't work on the display until we have a working CSV generator
  • once a complete month's CSV has been processed, it should be stored as is in the wiki (exactly "where" is TBD)
  • CSV generator: it should produce the data in a handy way for both usage scenarios detailed in Part 1. Two possible approaches:
    • 1) iterate over every project object in the wiki and issue a Matomo request for each (a request sketch is given after this list).
    • 2) issue a single Matomo request for all projects. In this case Matomo's segment argument isn't mandatory.
      It's a trade-off between the number of requests and the size of the returned data.
  • For both approaches, the job could be triggered using XWiki's scheduler
    • for approach 1, the job could look for all project objects that have a boolean property "dlStatsEnable" (for example), plus a "dlStatsId" for cases where the download folder name doesn't match the XWiki document name, as with lemonldap-ng (in this example they don't match).
    • for approach 2, the job consists of generating a "big" CSV based on Matomo's result for the current month. It should then be post-processed and dispatched over the concerned projects, when applicable (i.e. using the object properties "dlStatsEnable" & "dlStatsId")
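
To make approach 1 more concrete, here is a minimal sketch of a per-project generator. It assumes a Matomo Reporting API call (UserCountry.getCountry restricted by a segment on the download URL) and uses nb_uniq_visitors as a stand-in for "uniqueDownloadCount"; the Matomo URL, site id, token, report method, segment and metric mapping are assumptions to be validated against the real instance, and the "file" column is simplified to the project's download folder:

    # Illustrative sketch only (approach 1): one Matomo request per project, per month.
    # ASSUMPTIONS: MATOMO_URL, SITE_ID and TOKEN are placeholders; the report method,
    # the segment and the metric mapping must be checked against the real Matomo setup.
    import csv
    import requests  # plain HTTP client, used here for brevity

    MATOMO_URL = "https://matomo.example.org/index.php"  # placeholder
    SITE_ID = 1                                          # placeholder
    TOKEN = "anonymous"                                  # placeholder

    def monthly_country_rows(dl_stats_id, year, month):
        """Fetch one month of per-country data for one project's download folder."""
        params = {
            "module": "API",
            "method": "UserCountry.getCountry",   # assumed report, to be confirmed
            "idSite": SITE_ID,
            "period": "month",
            "date": f"{year}-{month:02d}-01",
            "format": "JSON",
            "token_auth": TOKEN,
            # restrict the report to this project's downloads (dlStatsId = folder name)
            "segment": f"downloadUrl=@{dl_stats_id}",
        }
        reply = requests.get(MATOMO_URL, params=params, timeout=60)
        reply.raise_for_status()
        for row in reply.json():
            # nb_uniq_visitors is used as a stand-in for "uniqueDownloadCount";
            # the "file" column is simplified to the download folder here.
            yield [year, month, dl_stats_id, row.get("nb_uniq_visitors", 0), row.get("label", "Unknown")]

    def write_month_csv(dl_stats_id, year, month, path):
        """Write the agreed per-project monthly CSV (one row per country)."""
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["year", "month", "file", "uniqueDownloadCount", "country"])
            writer.writerows(monthly_country_rows(dl_stats_id, year, month))

    # Example: write_month_csv("lemonldap-ng", 2018, 10, "lemonldap-ng-2018-10.csv")

Approach 2 would be the same idea without the segment argument: a single request for the whole site, post-processed and split per project before writing the same per-project CSVs.
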

Potential issues we should avoid

  • For this use case, the Matomo database will be fed by its log analyzer: we must avoid running the analyzer twice on the same log file at all costs, otherwise the counts would be doubled in the Matomo database (a small guard script is sketched after this list).
  • For the record, each web server log file covers a 24-hour time frame and rotates at 6:30 AM. This means a full day, from midnight to midnight, can only be obtained by combining two successive files.
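
Since the doubled-count risk comes entirely from re-importing a log file, a small guard in front of the log analyzer may be enough. The sketch below records a content hash of every file already imported before calling Matomo's import_logs.py; the state-file location, log directory and the exact import_logs.py command line are placeholders to be adapted and checked against the Matomo log-analytics documentation:

    # Illustrative sketch only: never feed the same access log file to Matomo twice.
    # ASSUMPTIONS: paths and the import_logs.py command line below are placeholders.
    import hashlib
    import json
    import pathlib
    import subprocess

    STATE_FILE = pathlib.Path("/var/lib/dlstats/imported-logs.json")  # placeholder
    LOG_DIR = pathlib.Path("/var/log/httpd")                          # placeholder

    def file_digest(path):
        """Identify a log file by its content, not only by its (rotated) name."""
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def import_new_logs():
        done = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
        for log_file in sorted(LOG_DIR.glob("access.log-*")):      # placeholder pattern
            digest = file_digest(log_file)
            if done.get(str(log_file)) == digest:
                continue  # already imported: skip to avoid doubled counts in Matomo
            subprocess.run(
                ["python", "/path/to/matomo/misc/log-analytics/import_logs.py",  # placeholder
                 "--url=https://matomo.example.org", "--idsite=1", str(log_file)],
                check=True,
            )
            done[str(log_file)] = digest
            STATE_FILE.write_text(json.dumps(done, indent=2))

    if __name__ == "__main__":
        import_new_logs()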