Filesystem Attachment Porter

Last modified by Thomas Mortagne on 2021/03/18 11:28

Tool for porting attachments from the database to the filesystem store
Type: Snippet
Developed by: xwiki:XWiki.cjdelisle
License: GNU Lesser General Public License 2.1

Description

This snippet should not be used with XWiki >= 9.10RC1. XWiki now supports mixed storage for attachments, which means that if you switch from the hibernate store to the file store, the existing attachments keep working without any action on your part: they stay in the database and work fine even as new attachments end up on the hard drive.

With the new filesystem-based attachment storage engine, you can save attachments without placing load on the database. You can also save very large (over 1 gigabyte) attachments. However, all of your existing attachments are still in the database. Schemes such as checking the filesystem and then the database, or automatically porting the attachments out, were judged to be complex and magical, both attributes we want to avoid in storage. The answer is to use a script to move all of the attachments under the administrator's supervision.

Anyone with a new wiki who wants to use filesystem attachment storage need only switch it on before starting their wiki for the first time; after that they need not worry about it.

Safety First:
When doing anything with storage it is important to back up your database before proceeding.

The script will walk you through the process when you run it, but a preview of the steps is as follows:

Step 1: Switch to Filesystem attachments.

Edit your xwiki.cfg file by modifying the attachment store lines to read as follows:

xwiki.store.attachment.hint = file
xwiki.store.attachment.versioning.hint = file
xwiki.store.attachment.recyclebin.hint = file

Please note that even if the script seems to work without making these changes first (e.g. doing the port and then changing the storage), it does not actually work properly. Although it does do something, what it does is incomplete. The script was made to work with the storage set to file, and it should be run that way.

Also make sure these lines are not commented out.
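The check described above can be automated. The following is a minimal sketch, not part of the snippet itself: the function names are hypothetical, and it assumes xwiki.cfg uses plain `key = value` lines with `#` marking comments, as in the excerpt on this page.

```python
# The three settings named in Step 1; the helper names below are hypothetical.
REQUIRED_HINTS = (
    "xwiki.store.attachment.hint",
    "xwiki.store.attachment.versioning.hint",
    "xwiki.store.attachment.recyclebin.hint",
)


def active_settings(cfg_text):
    """Return {key: value} for every line that is neither blank nor commented out."""
    settings = {}
    for line in cfg_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def missing_file_hints(cfg_text):
    """List the required hints that are absent, commented out, or not set to 'file'."""
    settings = active_settings(cfg_text)
    return [key for key in REQUIRED_HINTS if settings.get(key) != "file"]


sample = """
xwiki.store.attachment.hint = file
# xwiki.store.attachment.versioning.hint = file
xwiki.store.attachment.recyclebin.hint=file
"""
print(missing_file_hints(sample))
# prints ['xwiki.store.attachment.versioning.hint']
```

An empty list means all three hints are active and set to file. To check a real installation, pass the text of your own xwiki.cfg instead of `sample`.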

The storage section of my xwiki.cfg file reads as follows:

#---------------------------------------
# Storage
#

#-# Role hints that differentiate implementations of the various storage components. To add a new implementation for one
#-# of the storages, implement the appropriate interface and declare it in a components.xml file (using a role-hint other
#-# than 'default') and put its hint here.
#
#-# The main (documents) storage.
# xwiki.store.main.hint=default
#-# The attachment storage.

xwiki.store.attachment.hint=file

#-# The document versioning storage.
# xwiki.store.versioning.hint=default
#-# The attachment versioning storage. Use 'void' to disable attachment versioning.

xwiki.store.attachment.versioning.hint=file

#-# The document recycle bin storage.
# xwiki.store.recyclebin.hint=default
#-# The attachment recycle bin storage.

xwiki.store.attachment.recyclebin.hint=file

#-# Whether the document recycle bin feature is activated or not
# xwiki.recyclebin=1
#-# Whether the attachment recycle bin feature is activated or not

When bringing up the wiki after making these changes, it is advisable to lock users out of uploading attachments: anything they upload will be a filesystem attachment, which the script will mistake for a database attachment that it cannot port.

Step 2: Add a new directory to your backup routine.

Since data is going to be stored on the filesystem, you need to make sure the directory where it is stored is backed up regularly.

Usually, that storage directory is defined in WEB-INF/xwiki.properties by the environment.permanentDirectory setting.
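As an illustration of adding that directory to a backup routine, here is a small sketch. The function names are hypothetical, and it assumes xwiki.properties uses plain `key=value` lines with `#` comments. It only shows the idea; a real backup routine would normally be handled by your existing backup tooling.

```python
import tarfile
from pathlib import Path


def read_permanent_directory(properties_text):
    """Return the active environment.permanentDirectory value, or None if unset."""
    for line in properties_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "environment.permanentDirectory":
            return value.strip()
    return None


def archive_directory(directory, archive_path):
    """Write a gzip-compressed tar archive of `directory` to `archive_path`."""
    directory = Path(directory)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # Store entries relative to the directory name rather than the full path.
        tar.add(str(directory), arcname=directory.name)
    return archive_path
```

A typical use would be to read WEB-INF/xwiki.properties, pass its text to `read_permanent_directory`, and hand the result to `archive_directory` together with a destination of your choice. If the setting is commented out, the function returns None and you need to find out which default location your container uses.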

Step 3: Copy attachments from database to filesystem.

Now you are ready to copy the data over from your database to the filesystem. To do so, click "Download" above and copy the content of the script into a page in your wiki.

You'll need to have programming rights in order to be able to execute the script.

It is prudent to leave the attachments in the database, since in most situations the attachment data is not bothersome just sitting there (the only risk of leaving attachments in the database is that they will bloat the size of the database files). As such, this script contains no facility to delete entries from the database.
If anything goes wrong during the copy, the script will fail with an error message and you should get the stack trace; keep it to confuse and humiliate the developer with. No harm should be done, since the script only loads from the database and only saves to the filesystem.

You can use the "dry run" option to check your attachments for corruption even if you don't have FS attachments enabled. Running this check before the real port is recommended. You can also enable "verbose" to watch the list of attachments being printed to the screen as they are processed.

Once you click the "start" button, the script does the work in 20 second chunks and prints the results of each chunk, so you can tell that it is doing something.
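The idea of working in fixed-duration chunks can be sketched as follows. This is not the snippet's actual code (which is a wiki script driven by a JavaScript front end); the function and parameter names are hypothetical and only illustrate time-boxed batching.

```python
import time


def process_in_chunks(items, handle, chunk_seconds=20.0, clock=time.monotonic):
    """Yield lists of results, starting a new list once the time budget is spent.

    `handle` is called once per item. `clock` is injectable so the behaviour
    can be checked without actually waiting.
    """
    chunk = []
    deadline = clock() + chunk_seconds
    for item in items:
        chunk.append(handle(item))
        if clock() >= deadline:
            # Budget used up: report this chunk and start the next one.
            yield chunk
            chunk = []
            deadline = clock() + chunk_seconds
    if chunk:
        # Report whatever was processed in the final, partial chunk.
        yield chunk
```

Each yielded list corresponds to one chunk of work whose results can be printed before the next chunk begins. Keeping each chunk short means that no single request has to stay open for the whole migration, which is presumably why large wikis are handled better this way.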

Note that, because of https://jira.xwiki.org/browse/XWIKI-14354, some attachments may not be migrated on MySQL: if two pages exist whose names differ only by a trailing whitespace, and one or both of them have attachments, one of the pages will be completely ignored. The porter displays nothing in the UI for the ignored page, not even an error. In addition, it misreports the number of attachments processed: the number listed is not the number of attachments _actually_ processed but the number of attachments present in the database, which includes the attachments of the ignored pages even though they were not processed.
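To find out in advance whether a wiki is exposed to this problem, you can look for page names that differ only by trailing whitespace. A minimal sketch, assuming you have already exported the list of page names by some means of your own (the function name is hypothetical):

```python
from collections import defaultdict


def trailing_whitespace_collisions(page_names):
    """Group page names that become identical once trailing whitespace is stripped."""
    groups = defaultdict(set)
    for name in page_names:
        groups[name.rstrip()].add(name)
    # Keep only the groups where at least two distinct names collide.
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


names = ["Main.WebHome", "Main.WebHome ", "Sandbox.Test"]
print(trailing_whitespace_collisions(names))
# prints {'Main.WebHome': ['Main.WebHome', 'Main.WebHome ']}
```

Any group reported here contains pages whose attachments should be checked by hand after the port.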

Note that you can use the Attachments Checker to check the state of your attachments after migration, which should discover the missing attachments.

Step 4: Make sure everything is working.

Check to make sure your attachments are still there. If an attachment is broken, it will appear to be there, but on opening it the wiki will tell you the attachment does not exist. If something goes terribly wrong with the filesystem attachment store, you can get your old attachment system back simply by changing the lines in the xwiki.cfg file back and restarting the container. HOWEVER: this will not preserve attachments which were uploaded after switching to filesystem attachments, so as users edit the wiki you will become locked in to filesystem attachments unless a script is written to do the inverse of this one.
Despite being experimental, the filesystem attachment storage is quite stable, and the risk of losing something unrecoverably is very remote.
NOTE: This must be run separately in each subwiki.

Step 5: Clean database of old attachment content.

Note that this step is optional. If you choose to do it, there is no going back, and any old attachments which may have somehow failed to import will be irrevocably lost. To do this, back up your database, then manually clear the xwikiattachment_content table and the xwikiattachment_archive table. Then, for each entry in xwikiattrecyclebin, empty the XDA_XML field but leave all other metadata fields intact. After this step, repeat step 4, and if anything is amiss, recover from your backup dump.
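The cleanup described above amounts to three SQL statements. The sketch below runs them against an in-memory SQLite database purely for illustration. The table names and the XDA_XML field come from this page; every other column in the demo schema is a simplified, hypothetical stand-in, and "empty" is interpreted here as setting the field to an empty string. Check the statements against your own database and its real schema before running anything similar.

```python
import sqlite3

CLEANUP_STATEMENTS = (
    "DELETE FROM xwikiattachment_content",
    "DELETE FROM xwikiattachment_archive",
    # Empty only the XML payload; all other recycle bin metadata is left intact.
    "UPDATE xwikiattrecyclebin SET XDA_XML = ''",
)


def clean_attachment_tables(connection):
    """Run all cleanup statements in one transaction; roll back if any of them fails."""
    with connection:  # sqlite3 commits on success and rolls back on an exception
        for statement in CLEANUP_STATEMENTS:
            connection.execute(statement)


# Demo with a hypothetical, simplified schema (not the real XWiki column layout).
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE xwikiattachment_content (id INTEGER, content BLOB);
    CREATE TABLE xwikiattachment_archive (id INTEGER, archive BLOB);
    CREATE TABLE xwikiattrecyclebin (id INTEGER, filename TEXT, XDA_XML TEXT);
    INSERT INTO xwikiattachment_content VALUES (1, x'00');
    INSERT INTO xwikiattachment_archive VALUES (1, x'00');
    INSERT INTO xwikiattrecyclebin VALUES (1, 'a.png', '<attachment/>');
""")
clean_attachment_tables(demo)
print(demo.execute("SELECT filename, XDA_XML FROM xwikiattrecyclebin").fetchall())
# prints [('a.png', '')]
```

Running the statements in a single transaction means a failure partway through leaves the tables untouched. Transaction handling differs between database drivers, so the `with connection:` idiom shown here should be read as specific to Python's sqlite3 module.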

And that's all it is.

Prerequisites & Installation Instructions

Copy the code snippet to a page and save it.

  • You must run this as a user with programming permission.
  • Since it is not intended to be secure, it should be removed or disabled after use.

Release Notes

v2.5

Disabled when version is not lower than 9.10RC1.

v2.4

Fixed some JavaScript that, in some situations, was preventing the asynchronous execution of the script.

v2.3

Works around the new Groovy version, which forbids a private method named "main()".

v2.2

Workaround for XWIKI-9657 and the Groovy private method issue.

v2.1

Integrated Guillaume Delhumeau's patch which works around XWIKI-7936.

v2.0

Handles large wikis much better, doing work in 20 second chunks.
JavaScript front end prints attachments as they are processed.
Errors are explained better and stack traces are printed to the window rather than the log.
Dry run option to scan for corruption without saving files.

v1.2

This version is needed for 3.4 and newer.
