Author
Florian Piaszyk-Hensen
Director Products @ fme AG
January 30, 2019
The journey starts in the first half of the 20th century, when the movie “The Wizard of Oz” helped introduce a wide audience to the concept of artificially intelligent robots. By the 1950s, a number of scientists, mathematicians, and philosophers were addressing the concept of artificial intelligence (AI) in a scientific context for the very first time. So what stopped the scientists from getting to work on these possibilities? First, computers needed to change fundamentally. Before 1949, computers lacked a key prerequisite for intelligence: they couldn’t store commands, only execute them. In other words, computers could be told what to do but couldn’t remember what they did. Second, computing was extremely expensive. In the early 1950s, the cost of leasing a computer ran up to 200,000 dollars a month.
From 1957 to 1974, computers could store more information and became faster, cheaper, and more accessible. Machine learning algorithms also improved, and people got better at knowing which algorithm to apply to their problem. Optimism was high and expectations were even higher. However, there was still a long way to go before the goals of AI could be achieved.
In the 1980s, AI was boosted by an expansion of the algorithmic toolkit and by newly available funds. The Japanese government heavily funded such projects and invested 400 million dollars with goals that included improving AI. Unfortunately, most of the ambitious goals were not met, with the result that the funding stopped and AI fell out of the public hype. Ironically, in the absence of funding and hype, AI continued to develop. During the 1990s and 2000s, many of the landmark goals of artificial intelligence were achieved. In 1997, the reigning world chess champion and grandmaster Garry Kasparov was defeated by IBM’s Deep Blue, a chess-playing computer program. This was a huge step for artificially intelligent decision-making programs.
Today, we live in the world of “big data”, in which we have the capacity to collect huge amounts of information and the technologies needed to process it. AI applications are used successfully in several industries and allow us to solve many different problems. But how will AI affect the content migration process in the future?
Complex migration projects can be very painful and expose many challenges, whether a product like migration-center is used or not. One such challenge concerns the metadata, or classification, of unstructured content like Office or PDF documents. In the source system that needs to be onboarded to a new or existing Enterprise Content Management (ECM) application, this metadata is often incomplete, incorrect, or simply missing. Fixing incomplete metadata during the migration process takes considerably more effort, because the metadata must be added manually or external data sources are required to supply the missing information. Most customers shy away from this additional investment during the migration project and take the easy route of setting default values instead of meaningful metadata. This decision has a devastating impact: documents are migrated without a meaningful context, so users may not be able to find them as expected. In other words, user acceptance of the application is very poor from the first day on, and many ECM projects fail because the users don’t see the additional value of the new system.
To protect the investment made in the ECM application, it is highly recommended to go the extra mile and enrich the unstructured content with meaningful metadata. This is where AI comes into play: an AI-supported classification process could reduce this effort dramatically.
Many customers have run ECM applications for decades, and over those years users have typically created terabytes of “manually” classified documents in their existing ECM systems. This classified content is a real treasure chest for a migration project, and AI is the right technology to dig up this treasure.
At fme we introduced capabilities in our product migration-center that allow us to use the existing classified documents as training data for our new AI-based auto-classification module. With this set of features, users can extract the existing data (metadata plus content) from an ECM application and use it to train an AI algorithm. The resulting trained AI model can then be used to automatically classify documents with incomplete or missing metadata during the migration. Our algorithm detects patterns in the unstructured content of a document and maps these patterns to possible attribute values. The user can then define an acceptance threshold based on a statistical confidence value between 0 and 100%. If the predicted value for an attribute/class reaches the chosen confidence level, migration-center automatically sets this value as a new attribute on the corresponding document.
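The train, classify, and threshold steps described above can be illustrated with a minimal sketch. It assumes a simple multinomial naive Bayes text classifier, which is a stand-in technique and not the algorithm used in migration-center; the names `NaiveBayesClassifier`, `classify_for_migration`, and the sample documents are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one smoothing (illustrative stand-in)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of documents
        self.vocabulary = set()

    def train(self, documents):
        """documents: iterable of (content, label) pairs already classified."""
        for content, label in documents:
            tokens = tokenize(content)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocabulary.update(tokens)

    def predict(self, content):
        """Return (best_label, confidence), confidence in percent (0 to 100)."""
        tokens = [t for t in tokenize(content) if t in self.vocabulary]
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        log_scores = {}
        for label, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # class prior
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            log_scores[label] = score
        # Turn log scores into normalized posterior probabilities.
        top = max(log_scores.values())
        exp_scores = {lbl: math.exp(s - top) for lbl, s in log_scores.items()}
        norm = sum(exp_scores.values())
        best = max(exp_scores, key=exp_scores.get)
        return best, 100.0 * exp_scores[best] / norm


def classify_for_migration(model, content, threshold_percent):
    """Return the predicted value only if its confidence meets the threshold."""
    label, confidence = model.predict(content)
    return label if confidence >= threshold_percent else None


# Hypothetical, already classified documents from the source system.
training = [
    ("invoice number total amount due payment", "Invoice"),
    ("invoice payment terms amount tax", "Invoice"),
    ("employment contract parties agree term termination", "Contract"),
    ("contract agreement parties signature clause", "Contract"),
]
model = NaiveBayesClassifier()
model.train(training)

# Accepted: the confidence is above the 90% threshold.
print(classify_for_migration(model, "invoice amount due", 90))
# Rejected (None): the evidence is split between both classes.
print(classify_for_migration(model, "parties payment", 90))
```

In this sketch a rejected document simply yields `None`, so a migration job could route it to a default value or to manual review; how rejected documents are actually handled is a project decision and is not specified here.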
One of the main goals for migration-center’s AI capabilities was to dramatically shorten the start-up phase (data preparation, algorithm training, and validation), so that the use of AI technologies does not cause unnecessary delays in the project. Another important goal was to implement an algorithm that can classify documents with a high recognition rate from a minimum of training data. Today, migration-center is able to classify 75% of all documents with a confidence above 90% after training on a small set of just a few thousand documents. These characteristics allow clients to easily implement an AI-based migration process for many different use cases, such as:
• Onboard file shares to an ECM system and set meaningful metadata
• Merge ECM repositories and harmonize/enrich metadata
• Archive file shares to an enterprise archive together with a context
• Improve eDiscovery by extracting information from the unstructured content
• Set safety classifications for documents
• Reduce the effort of creating meaningful metadata
• Increase the quality of the migration results
• Analyze the quality of existing metadata within an ECM system
• Enhance the searchability of documents
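A recognition figure such as "75% of documents at a confidence above 90%" can be checked on a validation sample before a migration run. The following is a minimal sketch, assuming the classifier's confidence values (in percent) are available as a plain list; the function name `coverage_at_threshold` and the sample values are made up for illustration and are not measurements from migration-center.

```python
def coverage_at_threshold(confidences, threshold_percent):
    """Share (in percent) of documents whose confidence meets the threshold."""
    if not confidences:
        return 0.0
    accepted = sum(1 for c in confidences if c >= threshold_percent)
    return 100.0 * accepted / len(confidences)


# Made-up confidence values for eight validation documents.
sample_confidences = [99.1, 95.4, 91.0, 88.2, 97.3, 60.5, 93.8, 92.2]

# Six of the eight values are at or above 90, so this prints 75.0.
print(coverage_at_threshold(sample_confidences, 90))
```

Running the same calculation at several thresholds shows the trade-off between how many documents are classified automatically and how strict the acceptance rule is.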