JIRA to GitLab Migration Keeping References, Components and Dates

OW2 is an independent, global, open source software community. Its mission is to a) promote the development of open source middleware, generic business applications and cloud computing platforms, and b) foster a vibrant community and business ecosystem. The OW2 consortium is an independent non-profit organization open to companies, public organizations, academia and individuals. OW2 is committed to growing a community of open source developers around the OW2 code base, which consists of more than 30 active open source projects. In this context, OW2 maintains a forge. GitLab has recently become the cornerstone of the new OW2 infrastructure for developers, replacing a combination of GForge, Subversion, JIRA and Bamboo.

The infrastructure is built entirely on open source components, and on OW2 projects whenever possible. It comprises four parts:

Identity management: FusionDirectory
Development: GitLab and Nexus
Quality management: SonarQube and FOSSology
Communication: XWiki, SYMPA, Rocket.Chat (work in progress)

To migrate from JIRA to GitLab, we wrote dedicated scripts that are available from the OW2 GitLab under the MIT license, i.e. the same license as GitLab CE. We first considered performing the migration exclusively through the REST APIs of both tools. We quickly understood, however, that this method had important limitations: in particular, the cross-references between issues would be lost, since the identifiers assigned by GitLab would differ from the ones in JIRA. We therefore decided to generate SQL code so as to keep the original IDs, which required looking into the GitLab database schema. Here is a step-by-step description of the migration process. It hinges on two scripts, both of which require administrator access to JIRA and GitLab:

Utils.py - a set of utility methods for dealing with users and groups
SqlGenerator.py - the script performing the actual migration

These scripts provide the following features:

Creation of users in GitLab, based on JIRA credentials
Internal IDs remain the same - hence issue cross-referencing keeps working
The creation date and authors of issues or comments are kept
Attachments are transferred
The JIRA components each issue relates to are converted into labels

Configuration

The config.cfg file specifies the main parameters used by the migration script.

[Global]
# Set this to False if JIRA or GitLab uses a self-signed certificate.
verifySslCertificate: False

[JIRA]
# URL of the JIRA instance
url:
# JIRA account to be used to connect to JIRA, e.g. ('login', 'password')
account:
# JIRA project name
project:
group:
header: {
   'User-Agent': 'My User Agent 1.0', #JIRA won't work with the standard Python user-agent
   'Content-Type': 'application/json',
   'Cache-Control': 'no-cache'
   }

[GitLab]
# URL of the GitLab instance
url:
# Token used to call the GitLab API.
# When the script cannot match the JIRA author of a comment / attachment / issue,
# the identity behind this token is used instead.
token:
# GitLab project name, groupName/projectName
project:
# The numeric project ID. If you don't know it, the script will search for it
# based on the project name.
projectId:
# GitLab API version
version: v4
api: ${GitLab:url}api/${GitLab:version}
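
The `${GitLab:url}` syntax used for the api entry matches the extended interpolation of Python's configparser module. As a minimal sketch, assuming the file is read that way, the settings could be loaded as shown below. The function name load_settings, the sample values and the example.org hosts are placeholders, not taken from the actual scripts.

```python
import ast
from configparser import ConfigParser, ExtendedInterpolation

# Sample content only: the hosts, credentials and project key are placeholders.
SAMPLE_CONFIG = """
[Global]
verifySslCertificate: False

[JIRA]
url: https://jira.example.org/
account: ('login', 'password')
project: DEMO

[GitLab]
url: https://gitlab.example.org/
version: v4
api: ${GitLab:url}api/${GitLab:version}
"""

def load_settings(text):
    """Parse the configuration text and return the values the migration needs."""
    parser = ConfigParser(interpolation=ExtendedInterpolation())
    parser.read_string(text)
    return {
        'verify_ssl': parser.getboolean('Global', 'verifySslCertificate'),
        'jira_url': parser.get('JIRA', 'url'),
        # The account is written as a Python tuple literal in the file
        'jira_account': ast.literal_eval(parser.get('JIRA', 'account')),
        'jira_project': parser.get('JIRA', 'project'),
        # ${GitLab:url} and ${GitLab:version} are expanded by the parser
        'gitlab_api': parser.get('GitLab', 'api'),
    }

settings = load_settings(SAMPLE_CONFIG)
```

With a real file, parser.read('config.cfg') would replace read_string. Note that the GitLab url must end with a slash for the api value to be well formed.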

User and Group utility methods

Utils.py defines two utility methods for importing users: getJiraUsers and addUserToGitLab.

Retrieving the users of a group

The method getJiraUsers uses the JIRA REST API to retrieve a list of users belonging to a group, in JSON.

import requests
from requests.auth import HTTPBasicAuth

def getJiraUsers(JIRA_URL, JIRA_GROUP, JIRA_ACCOUNT, JIRA_HEADER, VERIFY_SSL_CERTIFICATE):
    # Users are collected in a dict keyed by JIRA user name
    u = {}
    fUsers = True
    while fUsers:
        # Ask for the next window of group members, starting after those already collected
        jira_users = requests.get(
            JIRA_URL + 'rest/api/2/group?groupname=%s&expand=users[%d:%d]&maxResults=1000' % (JIRA_GROUP, len(u), len(u)+50),
            auth=HTTPBasicAuth(*JIRA_ACCOUNT),
            verify=VERIFY_SSL_CERTIFICATE,
            headers=JIRA_HEADER
        )
        userItems = jira_users.json()['users']['items']
        for user in userItems:
            u[user['name']] = user
        # An empty window means every member has been retrieved
        if len(userItems) == 0:
            fUsers = False
    print("%d users found" % len(u))
    return u
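
The windowed retrieval can be tried without a JIRA server. The sketch below isolates the loop and feeds it from an in-memory list; collect_users and fetch_window are hypothetical names, and the extra guard against a window that brings no new user is an addition that the original function does not have.

```python
def collect_users(fetch_window, window_size=50):
    """Collect users window by window until the source returns nothing new.

    fetch_window(start, end) stands in for the JIRA request and must return
    a list of dicts that each contain a 'name' key.
    """
    users = {}
    while True:
        items = fetch_window(len(users), len(users) + window_size)
        if not items:
            break
        known_before = len(users)
        for user in items:
            users[user['name']] = user
        # Guard (not in the original): stop if the window added no new user,
        # otherwise the same window would be requested forever
        if len(users) == known_before:
            break
    return users

# In-memory stand-in for a JIRA group with 120 members
all_members = [{'name': 'user%d' % i} for i in range(120)]

def fetch_window(start, end):
    return all_members[start:end]

users = collect_users(fetch_window)
```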

Adding users to GitLab

The utility script contains a method addUserToGitLab that takes a user name as parameter and performs the following operations:

It replaces any space in the user name with an underscore, because spaces are not allowed in GitLab user names
If the user name is an e-mail address, only the part before the '@' is kept, because '@' is not allowed either
If the user name does not exist in GitLab:
- If it exists in the list of JIRA users returned by getJiraUsers, the user is created in GitLab with the same credentials as in JIRA
- Otherwise, a user is created in GitLab with default information
- The script then checks that the user was created successfully and raises an exception if it was not
If the user name already exists in GitLab, the script checks that the real name is the same as in JIRA
Finally, the user is added to the project given as parameter
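
The name clean-up and the decision between the three cases can be sketched as follows. This is an illustration of the rules listed above, not the code of addUserToGitLab: normalize_username and plan_user_import are hypothetical names, and the calls to the GitLab API are left out on purpose.

```python
def normalize_username(jira_name):
    """Turn a JIRA user name into a name GitLab accepts."""
    # Keep only the part before '@' when the name is an e-mail address
    name = jira_name.split('@', 1)[0]
    # Spaces are not allowed in GitLab user names
    return name.replace(' ', '_')

def plan_user_import(jira_name, gitlab_usernames, jira_users):
    """Return the GitLab user name and the action the import should take."""
    username = normalize_username(jira_name)
    if username in gitlab_usernames:
        return username, 'verify_real_name'
    if jira_name in jira_users:
        return username, 'create_from_jira'
    return username, 'create_with_defaults'

existing = {'alice'}
known_in_jira = {'john doe': {'displayName': 'John Doe'}}

plan_existing = plan_user_import('alice', existing, known_in_jira)
plan_known = plan_user_import('john doe', existing, known_in_jira)
plan_unknown = plan_user_import('jane.doe@example.org', existing, known_in_jira)
```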

Migrating issues from JIRA to GitLab

First, all the project's components are imported as labels:

Importing all project's components as labels

def importComponentsAsLabels():
    # Start from a clean slate: drop the labels already attached to the project
    SQL_File.write(u'DELETE FROM labels WHERE project_id = {project_id};\n'.format(project_id = gitLabProjectId))
    jira_project = requests.get(jiraUrl + 'rest/api/2/project/' + jiraProject,
        auth = HTTPBasicAuth(*jiraAccount),
        headers = jiraHeader)
    components = jira_project.json()['components']
    for component in components:
        description = ""
        if 'description' in component:
            description = component['description']
        # Each label gets a random RGB color
        r = lambda : random.randint(0,255)
        SQL_File.write(u'INSERT INTO labels (title, project_id, description, type, color) VALUES (\'{title}\', {project_id}, \'{description}\', \'ProjectLabel\', \'{color}\');\n'.format(title = component['name'], project_id = gitLabProjectId, description = description, color = '#%02X%02X%02X' % (r(), r(), r())))
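
The snippet above places the component name and description directly between single quotes, so a value containing an apostrophe would produce an invalid statement. One possible guard, shown here as a sketch and not as part of the published scripts, is to double the single quotes before interpolation. The helpers sql_literal and label_insert are hypothetical, and the sketch assumes PostgreSQL's default standard_conforming_strings setting, under which backslashes need no escaping.

```python
import random

def sql_literal(value):
    """Quote a value as a SQL string literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"

def label_insert(name, description, project_id, rng=random):
    """Build the INSERT statement for one label with a random color."""
    color = '#%02X%02X%02X' % tuple(rng.randint(0, 255) for _ in range(3))
    return (
        "INSERT INTO labels (title, project_id, description, type, color) "
        "VALUES ({title}, {project_id}, {description}, 'ProjectLabel', '{color}');\n"
    ).format(
        title=sql_literal(name),
        project_id=int(project_id),
        description=sql_literal(description),
        color=color,
    )

statement = label_insert("User's guide", "", 42)
```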

Then, the actual migration consists of the following steps, for each issue:

Generate SQL for each issue
Generate SQL for each issue component
Generate SQL for each issue comment
Retrieve the issue attachments using the JIRA REST API and post them to GitLab, with the corresponding note reference

Generating SQL for each JIRA issue

All JIRA issues belonging to the configured project are retrieved page by page using the JIRA REST API, and each one is processed individually (see the snippet below). Once all issues have been processed, three text transformations are applied through SQL queries using regular expressions, each with one query for the issues and one for the comments. The {code} tags in JIRA are replaced by the string "```" so that code is properly formatted by GitLab's Markdown renderer. References to other issues, such as PROJECT-123, are replaced by the hash sign followed by the issue number. JIRA user mentions, written [~username], are replaced by @username.

acquiredIssues = 0
counter = 0
MAX = 10000
while True:
    # Fetch the next page of issues, starting after those already acquired
    jira_issues = requests.get(
        jiraUrl + 'rest/api/2/search?jql=project=%s+&startAt=%d' % (jiraProject, acquiredIssues),
        auth=HTTPBasicAuth(*jiraAccount),
        verify=verifySslCertificate,
        headers=jiraHeader
    ).json()
    acquiredIssues += jira_issues['maxResults']

    for issue in jira_issues['issues']:
        counter = counter + 1
        if counter < MAX:
            processIssue(issue)
    # Stop once the last page has been fetched
    if acquiredIssues >= jira_issues['total']:
        acquiredIssues = jira_issues['total']
        break

# Replace all the "{code}" occurrences by "```"
# (braces are doubled because the strings go through str.format)
SQL_File.write(u'UPDATE issues SET description = regexp_replace(description, \'\\{{code[^}}]*\\}}(.*?)\\{{code\\}}\', \'\n```\\1```\', \'g\') WHERE project_id = {project_id};\n'.format(project_id = gitLabProjectId))
SQL_File.write(u'UPDATE notes SET note = regexp_replace(note, \'\\{{code[^}}]*\\}}(.*?)\\{{code\\}}\', \'```\n\\1```\', \'g\'), note_html = NULL WHERE noteable_type = \'Issue\' AND noteable_id IN (SELECT id FROM issues WHERE project_id = {project_id});\n'.format(project_id = gitLabProjectId))
# Replace all references to other issues such as "PROJECT-ISSUE_ID" by "#ISSUE_ID"
SQL_File.write(u'UPDATE issues SET description = regexp_replace(description, \'{project_name}-(\\d+)\', \'#\\1\', \'g\') WHERE project_id = {project_id};\n'.format(project_id = gitLabProjectId, project_name = jiraProject))
SQL_File.write(u'UPDATE notes SET note = regexp_replace(note, \'{project_name}-(\\d+)\', \'#\\1\', \'g\'), note_html = NULL WHERE noteable_type = \'Issue\' AND noteable_id IN (SELECT id FROM issues WHERE project_id = {project_id});\n'.format(project_id = gitLabProjectId, project_name = jiraProject))
# Replace all username mentions such as "[~username]" by "@username"
SQL_File.write(u'UPDATE issues SET description = regexp_replace(description, \'\\[~([^\\]]*)\\]\', \'@\\1\', \'g\') WHERE project_id = {project_id};\n'.format(project_id = gitLabProjectId))
SQL_File.write(u'UPDATE notes SET note = regexp_replace(note, \'\\[~([^\\]]*)\\]\', \'@\\1\', \'g\'), note_html = NULL WHERE noteable_type = \'Issue\' AND noteable_id IN (SELECT id FROM issues WHERE project_id = {project_id});\n'.format(project_id = gitLabProjectId))

print("Generated SQL code for %s issues from project %s" % (acquiredIssues, jiraProject))
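
To check the effect of these rewrites before running them against the database, the same three substitutions can be reproduced with Python's re module. This is only an approximation for experimenting: the migration itself relies on PostgreSQL's regexp_replace, and jira_to_markdown is a hypothetical helper that is not part of the scripts.

```python
import re

def jira_to_markdown(text, project_key):
    """Apply the three text rewrites to a JIRA description or comment."""
    # {code} or {code:java} ... {code} becomes a Markdown fenced block
    text = re.sub(r'\{code[^}]*\}(.*?)\{code\}', r'\n```\1```', text, flags=re.DOTALL)
    # PROJECT-123 becomes #123
    text = re.sub(re.escape(project_key) + r'-(\d+)', r'#\1', text)
    # [~username] becomes @username
    text = re.sub(r'\[~([^\]]*)\]', r'@\1', text)
    return text

converted_refs = jira_to_markdown('See DEMO-12 and ask [~alice]', 'DEMO')
converted_code = jira_to_markdown('{code:java}\nint x;\n{code}', 'DEMO')
```

Here converted_refs is 'See #12 and ask @alice', and converted_code wraps the Java line between two "```" lines.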

The processIssue method does the main work: it converts each JIRA issue, together with its comments and attachments, into SQL code targeting GitLab. First, all issue fields are converted to SQL; then the comments are retrieved and converted as well. In both steps, the authors of the issue or comments are added to GitLab if they do not exist already. Attachments are handled differently: they are uploaded to GitLab through its REST API.

The SQL code for an issue is generated as follows:

SQL = u'INSERT INTO issues (title, project_id, iid, author_id, state, created_at, updated_at, description, milestone_id) SELECT \'{title}\', {project_id}, {iid}, id, \'{state}\', \'{created_at}\', \'{updated_at}\', \'{description}\', (SELECT id from milestones WHERE project_id={project_id} and title=\'{milestone_name}\') FROM users WHERE username = \'{username}\';\n'.format(title = title, iid = iid, project_id = gitLabProjectId, state = state, created_at = issue['fields']['created'], updated_at = issue['fields']['updated'], username = gitlabReporter, description = description, milestone_name = milestone_name)

Generating SQL for labels

Then, the 'components' field, if present in the issue, is converted into labels in SQL:

if 'components' in issue['fields']:
    for component in issue['fields']['components']:
        # Link the issue to the label created earlier for this component
        SQL_File.write(u'INSERT INTO label_links (label_id, target_id, target_type) (SELECT l.id, i.id, \'Issue\' FROM labels l, issues i WHERE l.title = \'{label}\' AND iid = {iid} AND l.project_id = {project_id} AND i.project_id = {project_id});\n'.format(label = component['name'], iid = iid, project_id = gitLabProjectId))

Generating SQL for comments

The same goes for comments, turned into GitLab notes:

SQL_File.write(u'INSERT INTO notes (note, noteable_type, author_id, created_at, updated_at, project_id, noteable_id) SELECT \'{note}\', \'Issue\', (SELECT id FROM users WHERE username = \'{username}\'), \'{created_at}\', \'{updated_at}\', {project_id}, (SELECT id FROM issues WHERE iid = {iid} AND project_id = {project_id});\n'.format(note = body, project_id = gitLabProjectId, created_at = comment['created'], updated_at = comment['updated'], username = author, iid = iid))

Handling attachments

Finally, attachments are handled: a POST request uploads each attachment to GitLab, then a note containing the Markdown link returned by GitLab is inserted into the database:

file_info = requests.post(
    gitLabUrl + 'api/' + gitLabVersion + '/projects/%s/uploads' % gitLabProjectId,
    headers = {
        'PRIVATE-TOKEN': gitLabToken,
        'SUDO': author  # GITLAB_USER_NAMES.get(author, author)
    },
    files = {
        'file': (
            filename,
            _content
        )
    },
    verify = verifySslCertificate
)
if 'markdown' in file_info.json():
    # now we got the upload URL. Let's post the comment with an
    # attachment
    SQL_File.write(u'INSERT INTO notes (note, noteable_type, author_id, created_at, updated_at, project_id, noteable_id) SELECT \'{note}\', \'Issue\', (SELECT id FROM users WHERE username = \'{username}\'), \'{created_at}\', \'{updated_at}\', {project_id}, (SELECT id FROM issues WHERE iid = {iid} AND project_id = {project_id});\n'.format(note = file_info.json()['markdown'], created_at = attachment['created'], updated_at = attachment['created'], username = author, iid = iid, project_id = project_id))

We hope these scripts will be useful to others. You will find the complete code in OW2's GitLab. Feel free to report any bug or enhancement request there.