Github wiki import

Last modified by Bart Kummel on 2021/03/18 11:29

Script to import one or more Github wikis into an XWiki space
Developed by: Bart Kummel

License: GNU Lesser General Public License 2.1

Description

Instructions

  • Paste the code below into an empty page.
  • Make sure the Markdown Syntax 1.2 extension is installed and activated. The script uses it to convert Markdown from Github to XWiki syntax.
  • Make sure the Git API is installed and activated. The script uses that API to clone the Github wiki as a Git repository.
  • In getCredentialsProvider(), replace <username> and <token> with your Github username and a personal access token that you created via the Github web interface.
  • View the page.
  • Fill in the fields and click the Import button.

How it works

Github wikis are rather unstructured: a wiki is just a bunch of Markdown files in a Git repository. To import them in a structured way, this script makes some assumptions. The most important one is that a _Sidebar.md file exists that contains the complete structure of the wiki. Pages that are not linked to from _Sidebar.md will NOT be imported. (See also the known issues below.)
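To illustrate the _Sidebar.md assumption, here is a minimal, self-contained Groovy sketch of how a nested list can be turned into dotted page paths. It is not the script's own code: the parseSidebar helper, the sample sidebar text, and the two-spaces-per-level indentation are assumptions made for illustration.

```groovy
// Hypothetical helper: maps sidebar list items to dotted page paths.
// Assumes two spaces of indentation per nesting level.
def parseSidebar(String sidebarText, String rootPageName) {
    def result = []
    def parents = [rootPageName]   // stack of ancestor page names
    def lastLevel = 0
    def lastPage = ''
    sidebarText.eachLine { line ->
        def matcher = (line =~ /(\s*)[*-]\s+(.*)/)
        if (!matcher.matches()) {
            return                  // skip anything that is not a list item
        }
        def level = matcher.group(1).length().intdiv(2)
        def item = matcher.group(2).trim()
        // Use the link target for Markdown links, the plain text otherwise
        def link = (item =~ /\[[^\]]+\]\(([^)]+)\)/)
        def pageName = link.find() ? link.group(1) : item
        if (level > lastLevel) {
            parents << lastPage     // the previous item becomes the parent
        } else if (level < lastLevel) {
            parents = parents.dropRight(lastLevel - level)
        }
        lastLevel = level
        lastPage = pageName
        result << "${parents.join('.')}.${pageName}".toString()
    }
    return result
}

def sample = '''* [Home](Home)
* Guides
  * [Install](Install)
  * [Usage](Usage)
* [FAQ](FAQ)'''

parseSidebar(sample, 'Docs').each { println it }
```

In this sketch the non-link item "Guides" becomes the parent of "Install" and "Usage". The actual script additionally maps Home to the root page itself and creates an empty page for non-link items.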

For each page, the script uses the syntax converter from the Markdown module to convert Markdown to XWiki syntax. Some patches are then applied to the resulting syntax (using regex find/replace), after which the page is created in XWiki.
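As an example of such a regex patch, the sketch below rewrites in-page links of the form [[label>>#anchor-id]] into links with an anchor parameter. It is a simplified, hypothetical variant of the script's fixInternalLinkAnchors() function. The patchAnchorLinks name is an assumption, and the rule for building the anchor id (an "H" prefix plus the capitalized, hyphen-separated words) mirrors the script's heuristic rather than a documented XWiki guarantee.

```groovy
// Hypothetical, simplified variant of the anchor patch.
def patchAnchorLinks(String xwikiSyntax) {
    // Matches [[label>>#anchor-id]] as produced for in-page Markdown links
    xwikiSyntax.replaceAll(/\[\[([^>]+)>>#([^\]]+)\]\]/) { full, label, anchor ->
        // Heuristic taken from the script: "H" prefix plus the
        // hyphen-separated words, capitalized and joined together
        def id = 'H' + anchor.split('-').collect { it.capitalize() }.join('')
        "[[${label}>>||anchor=\"${id}\"]]"
    }
}

println patchAnchorLinks('See [[Getting started>>#getting-started]] for details.')
// prints: See [[Getting started>>||anchor="HGettingStarted"]] for details.
```

Because Github lowercases anchors, the original capitalization of the heading is lost, which is one reason this kind of patch cannot work in all circumstances.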

Each created page is tagged with "Github import" and the name of the root page. This is to make sure the imported pages can be found easily and deleted, should something go wrong during the import.

The script updates pages if they already exist. Therefore, the script can be run several times without problems. 

Known issues

  • It would be better to use the Git API module's own script service to clone the repository. However, that service only supports public Github repositories, so the script calls JGit directly with a credentials provider.
  • The username and token should probably be input fields instead of being hard-coded.
  • Because Github wikis are less structured than XWiki, the following assumptions are made:
    • The _Sidebar.md file is used as the entry point. Only pages that are linked to from _Sidebar.md will be imported. At the end of the script, files that are in the repository but were not imported are listed. The suggested workflow is to add those files to _Sidebar.md on the Github side and re-import.
    • It is assumed that _Sidebar.md only contains list items. The hierarchy of the list items is used as the page hierarchy in XWiki.
    • The hierarchical list in _Sidebar.md can have items that are not links. In that case, an empty page is created in XWiki to preserve the hierarchical structure.
    • It is assumed that Home.md is the root page of the wiki.
    • If an image is linked to from multiple pages, it will be added as an attachment to each of the created XWiki pages it is used in.
  • The input form is very basic and not so pretty.
  • Validation is very minimal.
  • Tags for created pages should be configurable.
  • Markdown conversion issues:
    • Internal links (inside a page) are not converted correctly. The fixInternalLinkAnchors() function in the script tries to fix this, but this does not work in all circumstances. Probably, the best solution is to fix the Markdown converter itself.
    • The image links that are created have to be updated manually to point to the files that are uploaded as attachments.
    • Lists with indented blocks (e.g. a code block inside a list item) are not properly converted: the block ends up at the top level and the list counter is reset. I've not fixed/patched this (yet).
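The "not imported" report mentioned in the known issues can be sketched in isolation as follows. This is a hedged illustration, not the script's exact code: the findNotImported helper and the temporary directory layout are assumptions, and the .git check compares path segments instead of searching for the ".git/" substring.

```groovy
import groovy.io.FileType
import java.nio.file.Files

// Hypothetical helper: names of work-tree files that were not imported.
def findNotImported(File workTree, Collection<String> importedFiles) {
    def notImported = []
    workTree.eachFileRecurse(FileType.FILES) { file ->
        // Compare path segments so the check does not depend on the OS separator
        def segments = workTree.toPath().relativize(file.toPath()).collect { it.toString() }
        if (!segments.contains('.git') && !importedFiles.contains(file.name)) {
            notImported << file.name
        }
    }
    return notImported.sort()
}

// Build a throwaway work tree to try the helper on
def workTree = Files.createTempDirectory('wiki').toFile()
new File(workTree, '.git').mkdir()
new File(workTree, '.git/config').text = ''
new File(workTree, 'Home.md').text = '# Home'
new File(workTree, 'Orphan.md').text = '# Not linked from the sidebar'

println findNotImported(workTree, ['Home.md'])
```

Files that show up in this list are the ones to add to _Sidebar.md before re-running the import.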
{{velocity}}

#if("$!request.sourceRepoURL" == '')

  {{html}}
   <form action="" id="newdoc" method="post">
     <div>
        Source repository URL: <input type="text" name="sourceRepoURL" value="Source repository URL" class="withTip" size="50"/><br/>
        Source repository Name: <input type="text" name="sourceRepoName" value="Source repository name" class="withTip" size="50"/><br/>
        Root page name: <input type="text" name="rootPageName" value="Root page name" class="withTip" size="50"/><br/>
       <span class="buttonwrapper"><input type="submit" value="Import" class="button"/></span>
     </div>
   </form>
  {{/html}}
 
 #stop
#else
 $xcontext.put("sourceRepoURL", $request.sourceRepoURL)
 $xcontext.put("sourceRepoName", $request.sourceRepoName)
 $xcontext.put("rootPageName", $request.rootPageName)
#end

{{/velocity}}

{{groovy}}

import org.apache.commons.io.*
import org.eclipse.jgit.api.*
import org.eclipse.jgit.lib.*
import org.eclipse.jgit.revwalk.*
import org.eclipse.jgit.storage.file.*
import org.gitective.core.*
import org.gitective.core.filter.commit.*
import groovy.json.*
import java.util.regex.*

import org.eclipse.jgit.transport.*;

import org.xwiki.environment.*;
import org.xwiki.rendering.syntax.SyntaxType;

def CredentialsProvider getCredentialsProvider() {
 return new UsernamePasswordCredentialsProvider("<username>", "<token>")
}

def Repository getRepository(String repositoryURI, String localDirectoryName) {
  Repository repository;

  Environment environment = services.component.getInstance(Environment);
  File permDir = environment.getPermanentDirectory();
  File localGitDirectory = new File(permDir, "git")
  File localDirectory = new File(localGitDirectory, localDirectoryName);
  File gitDirectory = new File(localDirectory, ".git");
  println "Local Git repository is at [${gitDirectory}]"
  FileRepositoryBuilder builder = new FileRepositoryBuilder();

 try {
   // Step 1: Initialize Git environment
    repository = builder.setGitDir(gitDirectory)
                       .readEnvironment()
                       .findGitDir()
                       .build();
    Git git = new Git(repository);

   // Step 2: Check whether the local Git directory already exists.
   if (!gitDirectory.exists()) {
     // Step 2.1: Need to clone the remote repository since it doesn't exist
      git.cloneRepository()
        .setCredentialsProvider(getCredentialsProvider())
        .setDirectory(localDirectory)
        .setURI(repositoryURI)
        .call();
    }
  } catch (Exception e) {
    throw new RuntimeException(String.format("Failed to execute Git command in [%s]", gitDirectory), e);
  }

 return repository;
}

def service = services.get("git")
def sourceRepoURL = xcontext.get("sourceRepoURL");
def sourceRepoName = xcontext.get("sourceRepoName");
def rootPageName = xcontext.get("rootPageName");

importedFiles = [];

def fixInternalLinkAnchors(xwikiSyntax) {
 // Fix to create internal links to anchors that actually work
  xwikiSyntax = xwikiSyntax.replaceAll(~/\[\[([^>^]+)>>#([^\]]+)]]/, "[[\$1>>||anchor=\"\$2\"]]")
 def matches = (xwikiSyntax =~ /anchor="([^\"]+)\"\]\]/)
  matches.findAll({it.size() == 2}).each {
    xwikiSyntax = xwikiSyntax.replace(it[0], "anchor=\"H${it[1].split("-").collect{it.capitalize()}.join("").replace(" ", "")}\"]]")
  }
 return xwikiSyntax
}

def uploadImages(workTree, document, text) {
  matches = text =~ /\[\[image:([^\]^\|]+)[\]\]|\|\|]/
  matches.each {
   if (!it[0].contains("http:") && !it[0].contains("https:")) {
     def filePath = it[1]
     def fileNameWithoutDirectory = filePath.tokenize("/").last()
     def fileFound = false
      workTree.eachFileRecurse({ if (fileNameWithoutDirectory == it.name){  fileFound = it } })
     if (fileFound) {
        importedFiles += fileNameWithoutDirectory
        document.addAttachment(fileNameWithoutDirectory, fileFound.bytes)
        // Literal replacement: the path may contain regex metacharacters such as '.'
        text = text.replace(filePath, fileNameWithoutDirectory)
        println "Uploaded file ${filePath}: ${fileFound.length()} bytes"
      } else {
        println "(% style=\"color:red\" %)Image file not found: ${filePath}(%%)"
      }
    }
  }
 return text
}

def createOrUpdatePage(workTree, root, pageName, contents) {
 def xdom = services.rendering.parse(contents, "markdown/1.2")
 def xwikiSyntax = services.rendering.render(xdom.getRoot(), "xwiki/2.1")

 def fixedWikiSyntax = fixInternalLinkAnchors(xwikiSyntax)

 def newDoc
 if (pageName == "Home") {
    newDoc = xwiki.getEntityDocument("${root}", org.xwiki.model.EntityType.SPACE)
    newDoc.setTitle(root)
  } else {
    newDoc = xwiki.getEntityDocument("${root}.${pageName}",  org.xwiki.model.EntityType.SPACE)
    newDoc.setTitle(pageName)
  }

 // Add tags
  newDoc.createNewObject("XWiki.TagClass")
  newDoc.getObject("XWiki.TagClass").set("tags", ["Github import", root.split(/\./).head()])

 // Add images
 def wikiSyntaxWithImages = uploadImages(workTree, newDoc, fixedWikiSyntax)

 // Actually add the content
  newDoc.setContent(wikiSyntaxWithImages)

 // Save it
  newDoc.save("Imported from Github", true)
  println "{{code}}Document created: ${workTree}, [${root}], ${pageName} : ${newDoc.getSpace()}{{/code}}"
}

def handlePage(dir, docName, rootPage, emptyPage = false) {
 if (docName.contains('#')) {
    docName = docName.substring(0, docName.lastIndexOf("#"))
  }
  fileName = "${docName}.md"
 def fileFound = false

 dir.eachFileRecurse({ if (fileName == it.name){  fileFound = it } })

 if (fileFound) {
    importedFiles += fileName.tokenize("/").last()
    createOrUpdatePage(dir, rootPage, docName, fileFound.text)
  } else if (emptyPage){
    createOrUpdatePage(dir, rootPage, docName, "")
  } else {
    println "(% style=\"color:red\" %)File does not exist: '{{{${fileName}}}}'(%%)"
  }
}


if (sourceRepoURL && sourceRepoName && sourceRepoURL != "" && sourceRepoName != "") {
  println "== Source repo: ${sourceRepoName}: ${sourceRepoURL}\n"
 def repo = getRepository(sourceRepoURL, sourceRepoName)
  result = new Git(repo).pull()
                       .setCredentialsProvider(getCredentialsProvider())
                       .call()

 if (result.isSuccessful()) {
    println "Repo pulled successfully.\n\n{{code}}${result}{{/code}}\n"

   def workTree = repo.getWorkTree()

   def sidebar = new File(workTree, "_Sidebar.md")

   def roots = [rootPageName]
   def lastLevel = 0
   def lastPage = ""

   if (sidebar.exists() && sidebar.canRead()) {
      sidebar.eachLine { line ->
       def level = line.indexOf('*')
       if (level < 0) {
          level = line.indexOf('-')
        }
       if (level >= 0) { // skip empty lines
         def links = (line =~ /\[[^\]]+\]\(([^)]+)\)/);
         if (level > lastLevel) {
            roots.add(lastPage)
          } else if (level < lastLevel) {
           def diff = ((lastLevel - level)/2).intValue()
            roots = roots.dropRight(diff)
          }

          lastLevel = level
         if (links) {
            links.each { link ->
              lastPage = link[1]
              handlePage(workTree, link[1], roots.join("."))
            }
          } else{
            // Strip the list marker: a leading '-' as well as any '*'
            lastPage = line.replaceFirst(/^\s*-/, "").replace("*", "").trim()
            handlePage(workTree, lastPage, roots.join("."), true)
          }
        }
      }
     
      println "== Import finished =="
      println "List of not imported files in the Github repo:"
      notImported = []
      workTree.eachFileRecurse(groovy.io.FileType.FILES) {
       if (it.toString().contains(".git/")) {
         return
        }
       if (!importedFiles.contains(it.name)) {
          notImported += it.name
        }
      }
      notImported.sort().each {
        println "*  ${it}"
      }

    } else {
      println "ERROR: Cannot open file ${sidebar}"
    }
  } else {
    println "Error pulling repo.\n\n${result}\n"
  }
} else {
  println "Source repo URL and name not set."
}

{{/groovy}}
     
