Batch Import API

Last modified by Anca Luca on 2023/01/30 00:15

Imports data from CSV and Excel files into XWiki structured documents.
Type: JAR
Developed by: Anca Paula Luca, Ludovic Dubost, slauriere, mouhb
Active Installs: 81
License: GNU Lesser General Public License 2.1

Installable with the Extension Manager

Description

This extension provides a service to import data from the rows of an MS Excel file or a CSV file into structured documents in the wiki.

The Batch Import Application provides a user interface for this API: a wizard to import data into applications created with AppWithinMinutes.

To import Excel files, the Excel plugin needs to be installed in the wiki. To import CSV files, the OpenCSV library needs to be installed (see the installation instructions for more information).

The import is driven by a mapping from the columns of the source file to the fields of an XWiki class and to the document metadata that can be set on each document (document full name, title, tags, etc.).
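
To make the mapping idea concrete, here is a minimal, self-contained Java sketch. It does not use the Batch Import API. The class name ColumnMappingSketch, the "doc." prefix used to mark document metadata, and the sample headers and field names are all assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: not part of the Batch Import API. */
final class ColumnMappingSketch {

    /**
     * Applies a mapping (target name -> column header) to one row.
     * Targets starting with "doc." stand for document metadata (an assumed
     * convention); all other targets stand for fields of an XWiki class.
     */
    static Map<String, String> mapRow(List<String> headers, List<String> row,
                                      Map<String, String> mapping) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : mapping.entrySet()) {
            int column = headers.indexOf(entry.getValue());
            // Skip targets whose column is missing from the file or from this row.
            if (column >= 0 && column < row.size()) {
                result.put(entry.getKey(), row.get(column));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> headers = List.of("Name", "Headline", "Price", "Keywords");
        List<String> row = List.of("Widget", "A small widget", "9.99", "tools|small");

        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("doc.name", "Name");      // document metadata (assumed key)
        mapping.put("doc.title", "Headline"); // document metadata (assumed key)
        mapping.put("doc.tags", "Keywords");  // document metadata (assumed key)
        mapping.put("price", "Price");        // class field (hypothetical)

        System.out.println(mapRow(headers, row, mapping));
    }
}
```

Keeping the mapping as data, separate from the rows, is what allows the same source file to feed different classes or document layouts without changing the file itself.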

How to use

The batch import can be used either as a component from Java code, by requesting an implementation of BatchImport from the component manager, or as a script service from Velocity or Groovy, as services.batchimport (which is backed by an instance of BatchImportService).

The complete javadoc of the batch import API can be found here.

Features

  • Data to be imported can be read from a page attachment, but also from any input stream.
  • CSV separators and text delimiters can be configured.
  • The locale of the numbers and dates in the source file can be specified, as can any date format and the multiple-values separator (for lists of values).
  • Office files from a folder on the server file system can be imported in the content of the created documents, based on a mapping configured in the Excel or CSV file. Any file can be attached to the imported documents based on such a configuration.
  • Multiple data deduplication options can be configured for the import: behavior of the import when two rows in the imported file seem to be mapped on the same wiki document, or when an imported document would overwrite a document already existing in the wiki.
  • Automatic generation of document names can be done based on a user configured prefix.
  • An import preview can be done to verify the mapping and data conversion, as well as a simulation of the import of the whole data set.
  • The batch import service provides functions to read the import configuration from the request or from an XWiki object, which makes it easy to handle parameters and to persist configurations.
  • XWiki 2.3+: an option makes it possible to specify whether empty values in the input file should be honored, overwriting possibly non-empty values in object properties or document data (title, parent, content), or ignored. In previous versions, empty values were always honored for document data and ignored for object properties.
  • Detailed and internationalizable logging of the import result.
  • Extensible file readers system, with default implementation for Excel and CSV data source files.
    • the Excel file reader is based on the XWiki Excel plugin, which uses the jxl library for reading files
    • the CSV file reader is based on the OpenCSV library
    • XWiki 2.4+: an alternative CSV iterator, based on the Apache Commons CSV library, is available under the name commonscsv. To use it, install the extension with id org.xwiki.contrib:xwiki-batchimport-fileiterators-commonscsv, in the same version as the batch import that you are using.
  • XWiki 1.3+: extensible post-processors system, which makes it possible to inject transformations of the data read from the input file before it gets imported, using the visitor pattern. See Batch Import Database List Identifier Post-processor for such an example.
  • Extensible logging system, allowing to write a different logger, if needed.
  • Although out of the scope of the batch import, a function to delete the existing instances of a class is provided, to facilitate the cleanup of the imported instances.
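
The conversion options listed above (locale for numbers, explicit date format, multiple-values separator) can be illustrated with a small sketch that uses only JDK classes. This is not the extension's own conversion code; the class and method names are hypothetical, and the sample values are made up.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Illustrative only: not part of the Batch Import API. */
final class CellConversionSketch {

    /** Parses a number written with the conventions of the given locale. */
    static double parseNumber(String cell, Locale locale) {
        try {
            return NumberFormat.getInstance(locale).parse(cell.trim()).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number: " + cell, e);
        }
    }

    /** Parses a date using an explicit pattern such as "dd/MM/yyyy". */
    static LocalDate parseDate(String cell, String pattern) {
        return LocalDate.parse(cell.trim(), DateTimeFormatter.ofPattern(pattern));
    }

    /** Splits a cell holding several values; the separator is taken literally. */
    static List<String> splitValues(String cell, String separator) {
        List<String> values = new ArrayList<>();
        for (String part : cell.split(Pattern.quote(separator))) {
            String value = part.trim();
            if (!value.isEmpty()) {
                values.add(value);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // In the German locale the comma is the decimal separator.
        System.out.println(parseNumber("1234,5", Locale.GERMANY));
        System.out.println(parseDate("30/01/2023", "dd/MM/yyyy"));
        System.out.println(splitValues("red| green |blue", "|"));
    }
}
```

Quoting the separator with Pattern.quote matters because characters such as "|" would otherwise be interpreted as regular expression syntax by String.split.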

For developers

If you want to write a new connector to a new type of source file, it is enough to implement the ImportFileIterator interface and set the hint of this implementation as the type of import in the BatchImportConfiguration used by the import.
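
As a rough idea of what such a connector involves, the sketch below reads rows from a tab-separated source. The RowIterator interface is a hypothetical stand-in written for this example, not the actual ImportFileIterator; the real method names and signatures should be taken from the javadoc. The class names and sample data are likewise assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in; the real ImportFileIterator may declare other methods. */
interface RowIterator extends AutoCloseable {
    /** Returns the cells of the next row, or null when the source is exhausted. */
    List<String> readNextLine() throws IOException;

    @Override
    void close() throws IOException;
}

/** Reads rows from a tab-separated source, one line per row. */
final class TabSeparatedIterator implements RowIterator {
    private final BufferedReader reader;

    TabSeparatedIterator(Reader source) {
        this.reader = new BufferedReader(source);
    }

    @Override
    public List<String> readNextLine() throws IOException {
        String line = reader.readLine();
        // The limit -1 keeps trailing empty cells instead of dropping them.
        return line == null ? null : Arrays.asList(line.split("\t", -1));
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}

final class TabSeparatedDemo {
    public static void main(String[] args) throws IOException {
        try (RowIterator rows = new TabSeparatedIterator(new StringReader("a\tb\n1\t\n"))) {
            for (List<String> row = rows.readNextLine(); row != null; row = rows.readNextLine()) {
                System.out.println(row);
            }
        }
    }
}
```

In an actual connector, the implementing class would additionally be registered as an XWiki component under a hint, and that hint is what the import configuration refers to as the type of import.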

If you want to write a new logger of the import results, the interface to implement is BatchImportLog.

Prerequisites & Installation Instructions

We recommend using the Extension Manager to install this extension. To know whether this extension can be installed that way, check that the text "Installable with the Extension Manager" is displayed at the top right of this page.

You can also use the manual method, which involves dropping the JAR file and all its dependencies into the WEB-INF/lib folder and restarting XWiki.

  • If Excel import is desired, the Excel Plugin needs to be installed and configured in the wiki.

Release Notes

v1.1

  • Update parent POM to XWiki Commons 4.5.4

Dependencies

Dependencies for this extension (org.xwiki.contrib:xwiki-batchimport-api 2.5):
