For validated content used in submissions, even 95% extraction accuracy is too risky.

For a whole host of strategic and business efficiency reasons, not to mention compliance drivers (including preparation for the EU’s implementation of IDMP), most pharmaceutical companies are now actively looking to overhaul their regulatory information management capabilities and enrich the data they routinely store about their products.
In many cases, this means updating or establishing new systems and migrating huge volumes of content across to the new environment – and supplementing or enriching this data in the process so that it better meets ongoing needs and aligns with IDMP controlled value lists. Effective migration is likely to involve locating and transferring information from hundreds of thousands of content files currently residing in rudimentary file shares, where a lot of potentially valuable data exists in unstructured form within single-use documents.
Given the scale of the task before them, and the scarcity of spare capacity to oversee the work manually, it is easy to appreciate why Regulatory teams and supporting IT departments might look to artificial intelligence (AI) as a means of expediting the data extraction and enrichment process, as companies look to convert unstructured information into searchable and re-usable data assets in the new target system.
Certainly, AI specialist tool and service providers have made some pretty lofty promises about the technology’s potential, accuracy, and scope. With training, they say, machine learning solutions can hit 95% accuracy in finding, identifying, tagging and lifting the information needed from commonly used documents and other unstructured content sources. To an overstretched RA team drowning in an ocean of material, spanning metaphorical warehouses and continents in its product coverage, this promise of reliable task automation is undeniably appealing.
BUT – and there is a huge caveat here – 95% accuracy, even if attainable, is still too risky for validated use cases such as regulatory submissions preparation and management. The trouble with monitoring AI algorithm performance is that it is all based on statistics and trends: where exactly the tool is doing well or less well remains much vaguer. In other words, while 95% overall accuracy might sound impressive, the margin of error remains all too great if no one can be quite sure where any gaps or errors are arising. Across a migration of, say, 200,000 documents, a 5% error rate means 10,000 files with missing or misidentified data – and no reliable way of knowing which ones they are. And if humans have to go through everything to check, any time and labor saving to this point will have been for nothing.
This needn’t be cause for outright disillusionment, though. For one thing, there are rules-based processes that offer more predictability than AI and can be used instead to take on much of the legwork while retaining the assurance of human quality control.
Meanwhile, AI tools and techniques can play a useful part in non-validated content management – for example, for enriching/adding metadata to archived content which is no longer used in live submissions, but which has to be retained (e.g. for anything from 10 to 25 years) for compliance reasons. Here, smart automation offers a way to breathe new life and value into legacy records, rendering them more immediately searchable and useful. If, as part of an AI-driven data enrichment/meta-tagging exercise, 5% of the content is missed or indexed incorrectly, someone can perform a manual search or manual checks without any risk to submissions performance, marketing authorization status, or patient safety.
As ever, it’s a case of horses for courses, and for now AI promises more than it can deliver for validated regulatory content migration purposes. But that doesn’t mean there isn’t an alternative to sheer manual graft, and you can count on fme to harness the most effective tools and processes for each project.