Recommendations for Successful Document Migrations

By: Brian J. Stewart

Document migrations are complex endeavors whose effort is often underestimated in a project. Migrations become even more complex when the source is an unstructured system such as a network file share, Documentum eRoom, or another system that doesn’t support integrated metadata management. These systems typically store metadata separately, in a Microsoft Excel spreadsheet or Microsoft Access database. Another complexity is that the source system almost always uses different data dictionaries than the new system, so attribute values need to be mapped to their equivalent values in the new system. This is often a difficult, tedious, and time-consuming process.

A solid, well-written Data Migration Plan is critical to a successful migration. The plan should be as detailed as possible and leave nothing to assumption. It should also include contingency planning and a detailed process for verifying and correcting data.

In addition to a solid Data Migration Plan, several design and process decisions can significantly improve the migration process and help ensure its success, regardless of the source or target system. The recommendations are:

Always build migration tools to support multiple executions
Map and clean up legacy metadata
Always copy source legacy metadata to the target system
Always store legacy repository name and unique identifier
Generate two output files during migration
Migration tool should support two modes – Pre-Migration and Migration
Ensure adequate User Acceptance Testing is done for Migration

These design and process recommendations will ensure quality, future supportability, and options for contingency planning in the event that something should go wrong.

Always build migration tools to support multiple executions

Migration tools should always be built to support multiple (incremental) executions, meaning multiple passes over the legacy system. With each pass, only documents that have not yet been migrated, or that have changed since the previous pass, should be imported (a sketch of this check follows the advantages below). Support for multiple passes is important even if the migration is an ‘all-or-nothing’ migration whose goal is to migrate all documents before retiring the source system. There are many advantages to this approach, including:

Advantage #1: If an error occurs with one or more documents, the problem can be resolved and only the affected documents need to be deleted and re-migrated in a subsequent execution, without re-migrating everything. This saves significant execution time, as migrations typically take several hours or days to run; this is especially true when there is a ‘data freeze’ on updates during the migration. In addition, when a problem affects only a small subset of documents, only that subset needs to be re-verified rather than all documents.
Advantage #2: Manually mapping legacy data may take longer than expected; a tool that can be executed multiple times allows the data to be migrated in phases based on business priorities.
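
As a sketch of what this incremental check might look like, assuming the target repository exposes some lookup, delete, and import calls (the names below are hypothetical, not from any specific product):

```python
# Sketch of an incremental migration pass. The find_by_legacy_id, delete,
# and import_document calls are hypothetical stand-ins for whatever API
# the target repository actually exposes.
def migrate_incrementally(source_records, target):
    migrated, skipped = 0, 0
    for record in source_records:
        existing = target.find_by_legacy_id(record.legacy_id)
        if existing is not None and existing.source_modified >= record.modified:
            skipped += 1          # already migrated and unchanged: leave it alone
            continue
        if existing is not None:
            target.delete(existing)   # changed since the last pass: replace it
        target.import_document(record)
        migrated += 1
    return migrated, skipped
```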

Map and clean up legacy metadata

Migrations provide an excellent opportunity to harmonize business data and clean up metadata. Many legacy systems contain unstructured data. Too often, in order to expedite the migration project, a decision is made to postpone data cleanup, correction, and harmonization until after the migration. This is shortsighted: the data is seldom ‘fixed’ in the new system, which limits the effectiveness of searching, browsing, and locating documents. There are many techniques and options for mapping the data.

For smaller source repositories, use Microsoft Excel spreadsheets to map the metadata. These spreadsheets can be read and processed programmatically by both .NET and Java libraries, and they also give business users a familiar tool for mapping and reviewing data. Microsoft Excel also provides data validation, macros, and rich filtering to facilitate the mapping process.
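
For illustration, a minimal sketch of reading such a mapping spreadsheet with the openpyxl Python library, assuming a hypothetical layout with the legacy value in column A and the mapped value in column B:

```python
# Read a metadata mapping spreadsheet (assumed layout: legacy value in
# column A, mapped value in column B, header row in row 1).
from openpyxl import load_workbook

def load_mapping(path):
    workbook = load_workbook(path, read_only=True)
    sheet = workbook.active
    mapping = {}
    for legacy_value, new_value in sheet.iter_rows(min_row=2, max_col=2,
                                                   values_only=True):
        if legacy_value is not None:
            mapping[str(legacy_value).strip()] = new_value
    return mapping

department_map = load_mapping("department_mapping.xlsx")  # illustrative file name
```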

For larger repositories there are a few options to map the metadata:

Option #1: Use a rich custom mapping tool (either a web or Windows application). It is critical that the tool is designed with user experience in mind, as business users will be asked to map each value repetitively. This is very time-consuming, and a poorly designed tool that requires many clicks and offers no copy/paste support will make the task even more arduous.
Option #2: Use a backend Microsoft SQL or Oracle Database and a Microsoft Excel or Microsoft Access frontend to manipulate the data.

Where possible, define automatic mapping rules to minimize the number of mappings that the business must perform manually. This not only reduces the number of manual mappings, but also improves data accuracy by reducing the potential for human error.
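
One possible shape for such rules is sketched below: explicit business mappings are checked first, then simple pattern-based rules, and anything unresolved is queued for manual mapping. The rule values and names are illustrative only.

```python
import re

# Automatic mapping rules applied before any manual mapping; the first rule
# that matches wins. The patterns and target values are invented examples.
RULES = [
    (re.compile(r"^(qa|quality assurance)$", re.IGNORECASE), "Quality"),
    (re.compile(r"^(r&d|research)$", re.IGNORECASE), "Research & Development"),
]

def auto_map(legacy_value, manual_map):
    value = legacy_value.strip()
    if value in manual_map:               # explicit mapping from the business
        return manual_map[value]
    for pattern, target_value in RULES:   # automatic rule-based mapping
        if pattern.match(value):
            return target_value
    return None                           # unresolved: queue for manual mapping
```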

Another approach for mapping data is to use a staging area where legacy metadata is loaded into a tool and manually cleaned up by business users prior to production data migration. The tool can include data validation, querying, and mapping of values. This staging area provides an excellent quality gateway which must be cleared before any data is actually migrated to the new system.
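
A minimal sketch of one such staging-area validation check, assuming hypothetical required fields and a controlled vocabulary:

```python
# Validate staged legacy metadata before it clears the quality gateway.
# The field names and vocabularies are hypothetical examples.
REQUIRED_FIELDS = ("title", "document_type", "department")
CONTROLLED_VOCAB = {"document_type": {"SOP", "Policy", "Report"}}

def validate_staged_record(record):
    errors = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            errors.append(f"missing required field: {field}")
    for field, allowed in CONTROLLED_VOCAB.items():
        value = record.get(field)
        if value and value not in allowed:
            errors.append(f"{field} value '{value}' not in controlled vocabulary")
    return errors   # an empty list means the record clears the gateway
```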

Always copy source legacy metadata (unchanged) to the target system

When migrating documents to a new repository, it is useful to copy the legacy metadata to a table or data store that can be queried in the future. Metadata for every record, whether or not the record is migrated, should be copied to this data store. There are many advantages and benefits to this approach.

First, the data can be used to generate post-migration reports. For example, it can feed a Migration Summary Report that details which documents were and were not migrated, or a Migration Mapping Summary Report that details the original and mapped metadata. These reports can be reviewed by the business as part of the overall migration process and retained as evidence of the migration.

Another key benefit of copying all source metadata is that typically not all legacy metadata is mapped to the new taxonomy, because equivalent attributes don’t exist in the new system. The legacy metadata may nonetheless be needed for reporting or other purposes in the future, and having it in the new system keeps it easily accessible. The data does not need to be exposed through the new user interface; it only needs to be queryable.

The final key benefit is that often post-migration business users need to know the original or legacy attribute values. There are several potential scenarios including audits, compliance or regulatory inquiries, or business reporting.

Although highly advantageous, this may not always be possible given data size and storage constraints. At a minimum, the legacy data should be archived unchanged, in a queryable format, in easily accessible archive storage.
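
As a concrete illustration, the sketch below copies legacy metadata unchanged into a queryable key/value table, using SQLite purely as an example of an easily accessible, queryable store:

```python
import sqlite3

# Store legacy metadata unchanged as key/value rows keyed by the legacy
# record identifier, so any original attribute remains queryable later.
connection = sqlite3.connect("legacy_metadata.db")
connection.execute(
    """CREATE TABLE IF NOT EXISTS legacy_metadata (
           legacy_repository TEXT NOT NULL,
           legacy_id         TEXT NOT NULL,
           attribute_name    TEXT NOT NULL,
           attribute_value   TEXT
       )"""
)

def archive_record(repository, legacy_id, attributes):
    # attributes: dict of legacy attribute name -> original (unmapped) value
    connection.executemany(
        "INSERT INTO legacy_metadata VALUES (?, ?, ?, ?)",
        [(repository, legacy_id, name, value)
         for name, value in attributes.items()],
    )
    connection.commit()
```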

Always store legacy repository name and unique identifier

When migrating a document to a new repository it is critical to retain the legacy identifier and source system for each migrated record. There are many potential uses for these identifiers, including:

Use #1: Tracking mechanism for incremental migrations
Use #2: Querying mechanism to update any system that integrated with the legacy system
Use #3: Querying mechanism to join legacy metadata with new metadata (see previous recommendation)

The legacy record identifier should be unique in the target system. For documents migrated from a file share, the file path can serve as the legacy identifier. If the identifier is not unique on its own, such as a sequential integer, the source repository name can be added as a prefix.
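
A minimal helper along these lines might look as follows; the separator and repository names are arbitrary illustrations:

```python
# Build a target-wide unique legacy key (illustrative). For a file share,
# the full file path already serves as the repository-local identifier;
# for systems with sequential integer IDs, the repository-name prefix
# makes the key unique across repositories.
def legacy_key(repository_name, local_identifier):
    return f"{repository_name}:{local_identifier}"

legacy_key("eRoom-Finance", "12345")
# -> "eRoom-Finance:12345"
legacy_key("FileShareA", r"\\server\share\specs\widget.docx")
# -> "FileShareA:\\server\share\specs\widget.docx"
```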

Generate two output files during migration

All migration tools should generate two output files during the migration: a log file containing technical information for developers, and a report for business and quality users.

Too often only a single output file is generated, typically containing the detailed per-record information developers want in order to troubleshoot issues. Such a log file is often confusing and of little use to non-technical business users.

A second output file should be created for business and quality users. Ideally it should be stored in a Comma-Separated Values (CSV) or Microsoft Excel file so that it can be easily viewed, queried, and filtered in a familiar tool, such as Microsoft Excel. In addition to the migration status for each record, it should contain all legacy attribute values and new mapped values. This output file can be used as evidence for the migration, as well as post-migration for any reports.
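
One way to produce both outputs from the same run, sketched with Python’s standard logging and csv modules (the column names are illustrative):

```python
import csv
import logging

# Technical log for developers: detailed, timestamped, per-record messages.
logging.basicConfig(filename="migration.log", level=logging.DEBUG,
                    format="%(asctime)s %(levelname)s %(message)s")

def write_business_report(records, path="migration_report.csv"):
    # Business/quality report: one row per record with the legacy values,
    # mapped values, and migration status, easily filtered in Excel.
    fieldnames = ["legacy_id", "legacy_department",
                  "mapped_department", "status"]
    with open(path, "w", newline="") as report_file:
        report = csv.DictWriter(report_file, fieldnames=fieldnames,
                                extrasaction="ignore")
        report.writeheader()
        for record in records:
            logging.debug("reporting on record %s", record["legacy_id"])
            report.writerow(record)
```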

Migration tool should support two modes – Pre-Migration and Migration

The migration tool should support two modes: ‘Pre-Migration’ and ‘Migration’.

From a technical implementation perspective, it is critical that these two modes share as much code as possible, including validation, error checks, and data mapping.

The ‘Pre-Migration’ mode should generate a migration report that includes the source (legacy) and target (mapped) metadata, as well as an indicator of whether each record can be migrated. The ‘Pre-Migration’ report should be reviewed before the migration by as many business users as possible to ensure the data is mapped correctly. It should also contain a summary (e.g. the number of documents that will and will not be migrated) and be grouped by product or category to facilitate review.
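
The sketch below illustrates one way the two modes can share the same validation and mapping code, differing only in whether the final import is performed; the validation and mapping functions are placeholders, and the import call is hypothetical:

```python
def validate(record):
    # Shared validation used by both modes (placeholder rule).
    return [] if record.get("legacy_id") else ["missing legacy_id"]

def map_metadata(record):
    # Shared mapping used by both modes (placeholder: identity mapping).
    return dict(record)

def run(records, target, mode="pre-migration"):
    report_rows = []
    for record in records:
        errors = validate(record)          # same validation in both modes
        mapped = map_metadata(record)      # same mapping in both modes
        can_migrate = not errors
        if can_migrate and mode == "migration":
            target.import_document(record, mapped)   # hypothetical import call
        report_rows.append({"legacy_id": record.get("legacy_id"),
                            "can_migrate": can_migrate,
                            "errors": "; ".join(errors)})
    return report_rows   # feeds the Pre-Migration report for business review
```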

Ensure adequate User Acceptance Testing is done for migration

Too often User Acceptance Testing focuses on the new system and less on the migration and the migrated records. Migration testing is typically done during System Testing by non-business individuals, who can only verify that the mapping rules were properly applied. Only business users with direct knowledge of the business data can confirm that the target system and migration tools meet business requirements and expectations.

In addition, User Acceptance Testing should ensure that a migrated record is equivalent to a non-migrated record in all respects. Migrated records should be tested through the entire document lifecycle and all functions to confirm this. It is very easy to miss setting a single attribute value during migration, which may cause functionality to break.

Data Migration Success Planning

Data migrations are complex and require careful planning to ensure quality and success. A few design and process recommendations, however, go a long way toward ensuring a successful migration:

Always build migration tools to support multiple executions
Map and clean up legacy metadata
Always copy source legacy metadata to the target system
Always store legacy repository name and unique identifier  
Generate two output files during migration  
Migration tool should support two modes – Pre-Migration and Migration
Ensure adequate User Acceptance Testing is done for Migration  

Related Article(s)

1. Do’s and Don’ts in Document Migrations