The crux here is that much more effort is required to correct incomplete metadata during the migration process, as metadata must be added manually or external data sources are needed to fill in the missing information.
Director Products @ fme AG
January 30, 2019
Today, we live in a world of “big data” that enables us to collect and process tremendous amounts of information. In this rapidly evolving digital landscape, content migration has become a critical aspect of business growth and transformation. At fme, we are now asking ourselves how the advent of Artificial Intelligence (AI) might help us manage the growing complexity of content migration processes more efficiently.
We all agree that migration projects can be very tedious and present many challenges. One of them concerns the metadata or classification of unstructured content such as Office or PDF documents. This metadata is often incomplete, incorrect, or missing altogether in the source system that needs to be integrated into a new or existing enterprise content management (ECM) application.
As you might expect, most of our customers shy away from the additional investment of correcting metadata during their migration project. They choose the easy way out by setting default values. Beware: this decision has disastrous consequences. The documents are migrated with insufficient context, and users are therefore unable to find them as expected. In other words, user acceptance of the application is very poor from day one, since users don’t see the added value of the new system.
To improve the quality of the ECM platform, it is strongly recommended to enrich the unstructured content with meaningful metadata. This is where AI comes into play: an AI-assisted classification process can drastically reduce this effort. As companies have typically created terabytes of “manually” classified documents in their existing ECM systems over the years, AI is just the technology to unlock this treasure of data.
We have added new functionality to our content migration software, migration-center, that allows existing classified documents to be used as training data for the new AI-based auto-classification module. With these functions, users can extract the existing data (metadata and content) from an ECM application and use it to train an AI algorithm.
The resulting trained AI model can then be used to automatically classify documents with incomplete or missing metadata during migrations. Our algorithm recognizes patterns based on the unstructured content of a document and maps these patterns to possible attribute values.
Subsequently, the user can specify the threshold for acceptance based on a statistical confidence value between 0 and 100 %. Once the selected confidence level of an attribute/class is reached, migration-center automatically sets this value as a new attribute for the corresponding document.
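The acceptance rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not migration-center's actual implementation: the `Prediction` class, the `accept_predictions` function, and the attribute names are hypothetical, and we assume that a value is accepted when its confidence is greater than or equal to the threshold, and that the highest-confidence value wins when several candidates exist for the same attribute.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Prediction:
    """One candidate attribute value proposed by a classifier (hypothetical type)."""
    attribute: str     # name of the metadata attribute, e.g. "doc_type"
    value: str         # proposed value for that attribute
    confidence: float  # statistical confidence in percent, 0 to 100


def accept_predictions(predictions: Iterable[Prediction],
                       threshold: float) -> Dict[str, str]:
    """Return the attribute values whose confidence reaches the threshold.

    Assumption: "reaching" the threshold means confidence >= threshold.
    If several accepted candidates target the same attribute, the one
    with the highest confidence is kept.
    """
    if not 0.0 <= threshold <= 100.0:
        raise ValueError("threshold must be between 0 and 100")

    best: Dict[str, Prediction] = {}
    for prediction in predictions:
        if prediction.confidence < threshold:
            continue  # below the acceptance level: leave the attribute unset
        current = best.get(prediction.attribute)
        if current is None or prediction.confidence > current.confidence:
            best[prediction.attribute] = prediction

    return {attribute: p.value for attribute, p in best.items()}


# Illustrative, made-up predictions for a single document.
candidates = [
    Prediction("doc_type", "Invoice", 96.5),
    Prediction("doc_type", "Contract", 91.0),
    Prediction("department", "Finance", 62.0),
]
accepted = accept_predictions(candidates, threshold=90.0)
```

With a threshold of 90 %, only `doc_type` is set in this made-up example; the `department` candidate stays below the acceptance level and would be left for manual review or a default value.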
One of the main goals for migration-center’s AI capabilities was to drastically shorten the start-up phase (data preparation, algorithm training, and validation) to avoid unnecessary delays in the project due to the use of AI technologies.
Another important goal was to implement an algorithm that classifies documents with a high recognition rate from only a minimal amount of training data. Today, migration-center is able to classify 75 % of all documents with a confidence above 90 %, using a training set of only a few thousand documents.
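To make a figure such as “75 % of all documents with a confidence above 90 %” concrete, the following sketch shows one way such a coverage rate could be computed from per-document confidence values. The function name `coverage_above` and the sample values are hypothetical and chosen for illustration only; they are not measurements from migration-center.

```python
from typing import Sequence


def coverage_above(confidences: Sequence[float], min_confidence: float) -> float:
    """Return the percentage of documents whose confidence is strictly
    above min_confidence (both given in percent, 0 to 100)."""
    if not confidences:
        return 0.0  # no documents, so nothing was classified
    hits = sum(1 for c in confidences if c > min_confidence)
    return 100.0 * hits / len(confidences)


# Made-up confidence values (in percent) for four documents.
sample_confidences = [97.0, 93.5, 91.2, 64.0]
rate = coverage_above(sample_confidences, 90.0)  # 3 of 4 documents -> 75.0
```

Here three of the four sample documents exceed 90 % confidence, which yields a coverage of 75 %; the remaining document would still need manual classification.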
With all these features, customers can easily implement an AI-based migration process for many different use cases:
In summary, migrating content into a new or existing ECM system can be partially simplified by leveraging AI capabilities. However, most migration projects are highly complex and will continue to require manual effort (just think of the data preparation and algorithm training mentioned earlier). As technology continues to evolve at a rapid pace, we look forward to exciting advances in this area.