Project Leader @ fme AG
April 17, 2019
migration-center now supports a new use case: the pure content move.
A typical document management system (DMS) has three main components: the software, a database and the storage layer. Often the storage is a file share or a SAN, but it can also be a more complex storage provider, of which there are many. One example is EMC Centera.
Centera is a storage management system that offers many functions on top of storing content files. It performs de-duplication and retention management, for instance, and it protects files so that nothing (that is not supposed to be deleted) can be deleted: effectively a read-only filestore.
Now, if the current storage system needs to be replaced or retired, but there is no need for a "real" migration or a transformation of metadata or object type definitions, it would be ideal to simply move the content files from one storage system to the other and update the file locations accordingly.
There are several ways to do that. On a simple file share, you may just copy the files to the new storage and run a script to update the file locations. Sometimes the platform also provides on-board tools for this, but they are often neither convenient, fast, nor suitable for large numbers of content files.
The "migrate_content" admin method is a built-in administration method (job) in OpenText Documentum.
It moves the content files of objects from one storage area to another, either all files on a specific store or only the files selected by a query. It is not bad.
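For illustration, such a run might be triggered via DQL roughly like this. This is a hedged sketch: the store names and the log file path are placeholders, and the exact parameter set of MIGRATE_CONTENT varies by Content Server version, so consult the DQL reference before running anything.

```sql
-- Hypothetical example: move all content from one store to another.
-- 'centera_store_01', 'filestore_02' and the log path are placeholders.
EXECUTE migrate_content
  WITH source_store = 'centera_store_01',
       target_store = 'filestore_02',
       log_file     = 'C:\Temp\migrate_content.log'

-- Or move only the content selected by a query:
EXECUTE migrate_content
  WITH query        = 'dm_document WHERE a_storage_type = ''centera_store_01''',
       target_store = 'filestore_02'
```

The method runs server-side as a background job, which is exactly where the control and reporting limitations described below come from.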
The issue with that method in many scenarios is that you have no real control or overview during the process. There is no interface and no progress indicator. It runs in the background, managed and queued by the Content Server, and it writes a log file, so the report only becomes available at the end of the run.
After such a run (assuming we have 50 million content files to move), there will most likely be errors: corrupted content, documents or versions that were checked out, a full storage, and so on.
Information about the run and its errors can then only be found deep down in the log files.
Another issue can be the sheer volume of documents. Millions of documents should be processed in batches rather than all at once, perhaps only outside working hours or even within a dedicated time window.
That is not easy to achieve with the integrated "migrate_content" method alone.
There is already an OpenText Documentum In-Place connector that can be used to update only the metadata in an existing repository. It also allows redefining object types, metadata definitions and structure, and serves use cases such as enrichment, clean-up, restructuring or reorganization. It is capable of moving content as well, but with limitations: it cannot move or copy content from a read-only device. In short, the connector can:
• Scan a specific scope of documents with a DQL query.
• Transform any kind of attribute, e.g. to change the object types or the folder structure, or simply enrich the attribute set with new or missing attributes and values from other sources of information.
• Update the existing documents in-place.
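The scan scope in the first step could, for example, be defined with a simple DQL query like the following. The type and store names are placeholders chosen for illustration, not part of any product default:

```sql
-- Hypothetical scan query: select current versions whose content
-- still resides on the store that is being retired.
SELECT r_object_id,
       a_storage_type
FROM   dm_document
WHERE  a_storage_type = 'centera_store_01'
```

Anything such a query can express can serve as the scope of an in-place run.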
With those capabilities it is easy to enhance or extend an existing application and even create new business-supporting processes without doing a real content migration or introducing a new system.
Now the In-Place connector can move the content files as well, either in addition to the metadata updates or on their own.
The process of scanning and importing/updating remains nearly the same (only the a_storage_type attribute needs to be scanned in addition), but there are now options to move (only) the content files of objects regardless of their read-only or checkout state.
Technically it utilizes the existing "migrate_content" method in the Content Server, with all the additional benefits of migration-center on top.
No matter if you want to do in-place updates to the metadata and/or move or copy the content files of objects to another storage system – with migration-center you can do both.
Full and easy control of the whole content-move or migration process
Full flexibility: the content migration can be done in batches, in smaller, more manageable units. The batches can be grouped by any attribute, e.g. create_date or object_type.
Actively manage and control the whole process. Start, stop and pause at any time to keep the impact on the system as low as possible.
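Grouping batches by an attribute could look like the following DQL sketch, here splitting by creation year. The names and date bounds are illustrative assumptions, and the DATE() literal format depends on the server's date settings:

```sql
-- Hypothetical batch scope: one batch per creation year,
-- so each run stays at a manageable size.
SELECT r_object_id
FROM   dm_document
WHERE  a_storage_type  = 'centera_store_01'
  AND  r_creation_date >= DATE('01/01/2015')
  AND  r_creation_date <  DATE('01/01/2016')
```

Each such slice can then be scheduled independently, for example only within an agreed maintenance window.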
Individual error handling
Error handling is much easier. It works per object and is well reported: for each batch run an error report is created and is already visible while the content move is in progress. This provides much better information about the failed objects for investigating the matter.
Renditions, versions and checked-out
Parameters in the Importer (Updater) also enable moving the content files of renditions and of all versions. There is no need to scan all objects in the repository: scanning the objects' current versions is enough.
Additionally, checked-out documents can be moved as well. Users will simply not notice that the content was moved.
All tasks can be executed in parallel. Scanning and moving/updating can run as many jobs at the same time, and each job internally uses up to 20 threads. Roughly 100,000 content files are scanned and moved in under an hour, with a minimal setup (one job server).
Clean up, enrich or reorganize
Another advantage is the ability to update the metadata at the same time. Sometimes the only need is to move content as fast as possible (e.g. because the storage system will be shut down), but sometimes it is also really helpful to reclassify the existing documents during that process.
Carry out two tasks with one tool!