Download-Statistics-Part2



Defining the need #2

In addition to what has been said about that topic in the first part, Davide expressed the following need:

 <Davide> we would like to have information about downloads, which files where downloaded (tipically Demo package is download more than others), when download occurred and from where
 <Davide> it would be enough for us to have those information as raw data (for example within a csv file)
 ...
 <Davide> last month or last year or YTD

To summarize:

  • The metrics should contain a per-file geographic distribution.
  • A polished display is not a must-have at first; a raw CSV file to download is enough for a start.
  • The download stats, from the project user/leader point of view, should be reachable from OW2's projects dashboard/scorecard.

Now about the "when": does "last year or YTD" mean the data should be available per month?

Example CSV with YTD:

year, month, file, uniqueDownloadCount, top5Country
2018, 01,,...
(...)
2018, 03, Knowage_6.x_CE_Manual.pdf, 1234, [US30,BR24,CN18,...]
2018, 04, Knowage_6.x_CE_Manual.pdf, 567, [US30,BR24,CN18,...]
2018, 05, Knowage_6.x_CE_Manual.pdf, 8910, [US30,BR24,CN18,...]
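
One caveat with the layout above: the top5Country field itself contains commas, so a CSV writer has to quote it. Here is a minimal sketch with Python's csv module. It assumes the number after each country code is a per-country count (the example does not say whether these are counts or percentages), and the function names and dictionary keys are hypothetical.

```python
import csv
import io


def format_top_countries(top):
    """Render [("US", 30), ("BR", 24)] as "[US30,BR24]", as in the example above."""
    return "[" + ",".join(f"{code}{count}" for code, count in top) + "]"


def write_stats_csv(rows, out):
    """Write one line per (year, month, file) to the file-like object `out`."""
    writer = csv.writer(out)
    writer.writerow(["year", "month", "file", "uniqueDownloadCount", "top5Country"])
    for row in rows:
        writer.writerow([
            row["year"],
            f"{row['month']:02d}",  # zero-padded month, e.g. 03
            row["file"],
            row["unique"],
            # This field contains commas, so csv.writer wraps it in double quotes.
            format_top_countries(row["top"]),
        ])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_stats_csv(
        [{"year": 2018, "month": 3, "file": "Knowage_6.x_CE_Manual.pdf",
          "unique": 1234, "top": [("US", 30), ("BR", 24), ("CN", 18)]}],
        buffer,
    )
    print(buffer.getvalue())
```

Using a separator other than a comma inside the country list would avoid the quoting altogether.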

At some point, especially for past data, this will require a storage and caching mechanism: we are not going to retrieve Matomo's data again and again. How the data is going to be stored still needs to be defined.
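
The storage format is still an open question. Purely as an illustration, one possible approach is to keep each completed month's response as a JSON file on disk, since past months no longer change. Everything in this sketch is hypothetical: the function name, the file naming scheme, and the `fetch` callable standing in for the actual Matomo request.

```python
import json
from datetime import date
from pathlib import Path


def cached_month(cache_dir, project, year, month, fetch, today=None):
    """Return the data for one project and month, hitting `fetch` only when needed.

    `fetch(project, year, month)` is a placeholder for the real Matomo call.
    """
    today = today or date.today()
    path = Path(cache_dir) / f"{project}-{year}-{month:02d}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    data = fetch(project, year, month)
    # Only months that are already over are stored: the current month still changes.
    if (year, month) < (today.year, today.month):
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(data), encoding="utf-8")
    return data
```

With such a layout, a YTD report would only trigger a new request for the current month; all earlier months would be read back from disk.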

Research with Matomo's API

To get geolocation details for visits, you cannot rely on Actions.getDownloads. Instead, we might use the following:

curl 'https://ow2-utils.ow2.org/matomo/?module=API&method=Live.getLastVisitsDetails&idSite=5&period=month&date=2018-05-01&format=json&token_auth=anonymous&expanded=1&segment=downloadUrl=^http%3A%2F%2Fdownload.forge.ow2.org%2Fasm'

The above retrieves the details of every visit in which a download URL starting with http://download.forge.ow2.org/asm occurs (here, the whole "asm" project).
It is the visit's details that hold the geographic metadata.
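
To issue the same request for other projects or months, the URL can be assembled programmatically. The sketch below only rebuilds the URL from the curl example; the function name and its defaults are hypothetical. The segment is appended by hand so that the "=^" operator stays exactly as in the curl command.

```python
from urllib.parse import quote, urlencode

MATOMO_BASE = "https://ow2-utils.ow2.org/matomo/"


def build_visits_url(download_prefix, date, period="month", id_site=5,
                     token="anonymous"):
    """Build a Live.getLastVisitsDetails URL limited to visits that downloaded
    a file whose URL starts with `download_prefix`."""
    params = [
        ("module", "API"),
        ("method", "Live.getLastVisitsDetails"),
        ("idSite", id_site),
        ("period", period),
        ("date", date),
        ("format", "json"),
        ("token_auth", token),
        ("expanded", 1),
    ]
    # Percent-encode the prefix only, leaving "downloadUrl=^" literal as in the curl call.
    segment = "downloadUrl=^" + quote(download_prefix, safe="")
    return MATOMO_BASE + "?" + urlencode(params) + "&segment=" + segment


if __name__ == "__main__":
    print(build_visits_url("http://download.forge.ow2.org/asm", "2018-05-01"))
```

Matomo's API limits the number of rows returned by default, so a real script would probably have to page through the results (the standard filter_limit and filter_offset parameters); this has not been checked against this particular instance.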

For a given visit, the nested actionDetails array contains the details of each hit.

Unique and raw download counts can also be derived from this API method:
- For the unique download count per file over the requested period, one has to parse the JSON and search for the given URL within each visit. Each visit containing a match counts as one unique download.
- The raw download count is not visit-based but based on URL hits: it is simply the number of times the given URL appears across all visits.
- To extract the top 5 countries per file, one has to parse the JSON to find which visits accessed the given file and retrieve (for example) the related countryCode.
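
The three computations above can be sketched in a single pass over the decoded JSON. This is only an illustration: the function name is hypothetical, and the key names used here (a per-visit list of hits under "actionDetails", each with a "url", plus a per-visit "countryCode") are assumptions that should be checked against an actual response.

```python
from collections import Counter


def download_stats(visits, file_url, top_n=5, actions_key="actionDetails"):
    """Compute unique count, raw count and top countries for one file URL.

    `visits` is the decoded JSON list returned by Live.getLastVisitsDetails.
    """
    unique = 0            # number of visits that hit the file at least once
    raw = 0               # total number of hits on the file
    countries = Counter() # visits per country, for visits that hit the file
    for visit in visits:
        hits = sum(1 for action in visit.get(actions_key, [])
                   if action.get("url") == file_url)
        if hits:
            unique += 1
            raw += hits
            countries[visit.get("countryCode", "unknown")] += 1
    return {"unique": unique, "raw": raw,
            "top_countries": countries.most_common(top_n)}


if __name__ == "__main__":
    target = "http://download.forge.ow2.org/asm/example.jar"  # made-up file name
    sample = [
        {"countryCode": "us", "actionDetails": [{"url": target}, {"url": target}]},
        {"countryCode": "br", "actionDetails": [{"url": target}]},
        {"countryCode": "us", "actionDetails": [{"url": "http://other"}]},
    ]
    print(download_stats(sample, target))
```

Here the country ranking counts visits rather than hits, which matches the "unique download" definition; ranking by raw hits would be an equally defensible choice.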

Of course, one could also use a combination of Actions.getDownloads and Live.getLastVisitsDetails, depending on taste and performance results.