Why Centralized Repositories Aren’t Always a Good Strategy

By: Brian J. Stewart

Document management is a maturing technology. File shares with no metadata, or with metadata managed separately from content (i.e., in spreadsheets or databases), have been replaced by powerful electronic document management systems (EDMS). However, instead of managing a growing number of network file shares, Information Technology (IT) departments now face building and supporting a growing number of EDMS solutions, each dedicated to managing specific business processes. Although these EDMS solutions provide significant benefits to both IT and the end-user communities, they do so with much higher implementation and support costs. This has led IT executives to question their EDMS strategies. Historically, EDMS solutions were departmental solutions (or decentralized repositories); however, a new trend in document management is the implementation of large centralized repositories. Centralized repositories promise several key benefits, including:

Lower cost of ownership through reduced infrastructure costs (fewer servers and software licenses)
Lower operational costs through reduced number of application support staff and system administrators
Consistent user experience for end users across business processes
Fewer information or content silos to improve operational efficiencies

Although centralized repositories provide real and tangible benefits which can be measured and realized, these benefits must be weighed against the potential disadvantages of centralized repositories. Often the disadvantages and costs are underestimated or hidden, including the following key items:

System performance implications
Change management and support complications
Difficulties with harmonization and standardization

System performance implications

System performance often ‘makes or breaks’ an application and the user experience. A system that performs poorly decreases operational efficiencies and user productivity. Furthermore, a system that doesn’t allow users to perform their tasks efficiently will cause users to circumvent the system and develop workarounds that adversely affect the Return on Investment (ROI). Finally, a poorly performing system results in higher support costs through increased help desk tickets. For these reasons it is critical that performance be adequately planned and designed.

When designing centralized repositories, it is essential to understand the inherent advantages of decentralized and centralized repositories and how their architectural differences impact system performance. With decentralized repositories or departmental solutions:

Server workload is distributed among several servers
Content storage is distributed among the servers and regional network storage
Metadata is stored in dedicated databases

With decentralized repositories, the server workload is spread across multiple servers with dedicated CPUs and memory. Each server can be sized for the unique requirements of its EDMS solution. Additional server resources can be added easily as needed, especially when a server virtualization architecture is used. In contrast, in centralized repositories the business workflow processes and document lifecycles often contend for the same server resources.

With decentralized repositories, the content is distributed among servers and regional network storage. The servers and network storage are placed geographically to optimize content access. In contrast, centralized repositories often require centralized storage of content. This results in a user experience that varies with the location of the user.

Lastly, with decentralized repositories the metadata is stored in dedicated databases. These databases are optimized based on the unique business requirements and use case scenarios. For example, the indexing strategy is designed around how the data is queried and accessed. In contrast, with centralized repositories the opposite may be true. An indexing strategy for one business workflow process and document lifecycle may not be optimal for another. Creating too many database indices (each for a different business process) can actually decrease database performance.
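
The indexing trade-off can be illustrated with a minimal sketch. It assumes a hypothetical shared `documents` table in an in-memory SQLite database; the table, column, and index names are invented for illustration and do not come from any particular EDMS product. The index is designed around how one business process (quality) queries the data, and the query plans show that a second process (supply management) gets no benefit from it.

```python
import sqlite3

# Hypothetical shared metadata table for a centralized repository.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    " id INTEGER PRIMARY KEY,"
    " department TEXT, status TEXT, supplier TEXT, title TEXT)"
)
rows = [
    (
        i,
        "quality" if i % 2 else "supply",
        "approved" if i % 3 else "draft",
        f"supplier-{i % 50}",
        f"Document {i}",
    )
    for i in range(1, 1001)
]
conn.executemany("INSERT INTO documents VALUES (?, ?, ?, ?, ?)", rows)

# Index designed around how the (hypothetical) quality process queries data.
conn.execute(
    "CREATE INDEX idx_quality_status ON documents (department, status)"
)


def query_plan(sql, params):
    """Return SQLite's query plan description for a statement as one string."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in plan)


quality_plan = query_plan(
    "SELECT title FROM documents WHERE department = ? AND status = ?",
    ("quality", "approved"),
)
supply_plan = query_plan(
    "SELECT title FROM documents WHERE supplier = ?",
    ("supplier-7",),
)

print("quality:", quality_plan)  # should mention idx_quality_status
print("supply: ", supply_plan)   # should fall back to scanning the table
```

Adding a second index on `supplier` would help the supply query, but every insert and update would then have to maintain both indices, which is the cost described above when each business process brings its own set.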

Successful centralized repository solutions often require server redundancy and clustering to support the large number of concurrent users. In addition, data replication is often required to optimize the user experience regardless of the geographic location of the end users. These architectural design choices can certainly offset the performance impact of centralized solutions. Servers can be dispersed geographically to improve performance and provide redundancy. Content can be replicated to speed up content viewing. Although performance may improve significantly, the result is a more complex infrastructure design that typically requires specialized knowledge to implement and support. Server redundancy and clustering also necessitate more expensive hardware for clustering, failover, and storage, and replicating content across distributed servers consumes network bandwidth.

Change management and support complications

Another key consideration when deciding between decentralized and centralized repositories is the impact on the change management process. This is especially true for validated systems in regulated industries such as Life Sciences. There are several hidden change management related costs associated with centralized repositories that need to be fully understood and evaluated. Centralized repository solutions:

Often require additional regression testing
Involve additional approvers in the change approval workflow
Have more difficulty scheduling downtime for the deployment of changes
Have a broader impact on business continuity in the event of infrastructure failures

In decentralized repository solutions, each system can be changed independently based on business needs. Isolated application changes limit regression testing to the specific modules that are changed. General application changes that impact more than one module may require additional regression testing, but that testing is still limited in scope to the one application. This is not true for centralized repository solutions used to manage content for multiple business processes. A rigorous impact assessment is necessary to ensure a change to one business process doesn’t impact another. This is especially true for general application changes, which may require all stakeholders to come to a consensus: the larger the system, the more stakeholders there are, and the more difficult consensus becomes.
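
One way to picture the impact assessment is as a lookup from changed components to the business processes that depend on them. The sketch below is illustrative only: the component names and the `DEPENDENCIES` mapping are hypothetical, and a real assessment would draw on the system's actual design documentation.

```python
# Hypothetical mapping of repository components to the business
# processes that depend on them (names invented for illustration).
DEPENDENCIES = {
    "quality_workflow": {"quality"},
    "supply_workflow": {"supply_chain"},
    "shared_security_model": {"quality", "supply_chain", "product_development"},
    "shared_taxonomy": {"quality", "supply_chain", "product_development"},
}


def regression_scope(changed_components):
    """Return the sorted business processes that need regression testing."""
    impacted = set()
    for component in changed_components:
        # Unknown components contribute nothing in this simplified sketch.
        impacted |= DEPENDENCIES.get(component, set())
    return sorted(impacted)


isolated = regression_scope(["quality_workflow"])
general = regression_scope(["shared_taxonomy"])

print(isolated)  # ['quality']
print(general)   # ['product_development', 'quality', 'supply_chain']
```

The isolated change touches one process, while the change to a shared component pulls every dependent process, and its stakeholders, into the regression and approval cycle.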

With a decentralized repository, a single business lead is typically responsible for approving all changes to a system. In centralized repositories, there are usually multiple business leads and subject matter experts who need to be involved in the review and approval of changes. This means the review and approval process will inevitably take more time than with separate systems. It also means there may be competing priorities and visions for system changes. For example, if documents for quality, supply chain, and product development are stored in the same repository, no single business leader can possibly represent all these departments.

It is also typically easier to schedule downtime when implementing changes in decentralized solutions, since each system serves only a single department. The opposite is true with centralized solutions, which typically manage documents across departmental boundaries. For example, if quality documentation is managed in the same repository as the supply management documentation, a change may be required for the quality department at a time when critical supply management activities are ongoing.

The impact on business continuity of system failures or system maintenance is compounded with large centralized repositories. Even the best designed and implemented solutions will inevitably experience problems that affect the availability of the system. Larger systems that serve cross-functional departments become increasingly critical to business continuity and operations because a single failure may affect far more users and ongoing business activities.

Difficulties with harmonization and standardization

For any system to be maintainable and supportable in the long term, it must be methodically designed with careful attention to standardization and consistency. This is especially true for large centralized systems, which can easily become muddled and complex, resulting in a sluggish system that is difficult to support, maintain, and change. In large centralized repositories there are key areas where standardization is critical:

Standardization of the data (taxonomy) model
Harmonization of data dictionary values
Standardization of the security model

These areas may be more difficult to harmonize and standardize with competing visions, business needs, and stakeholder priorities.

The first area where standardization and harmonization are required is the taxonomy, or the organization and classification of documents and content. This includes defining the object model and its attributes (properties). It also includes the folder or view structure used to organize and store documents. Since all documents will be stored in a single centralized repository, all stakeholders need to agree on structure and organization to ensure a cohesive and harmonized data model rather than the disjointed taxonomy that is a byproduct of decentralized repositories. This will inevitably require significant effort and discussion due to the different visions of the stakeholders. However, this effort will not only yield operational efficiency and support benefits, it will also make locating, searching, and presenting content consistent. This often results in productivity gains not possible with the content silos associated with decentralized repositories.

In a centralized repository, the different business workflow processes will need to share common data dictionaries. For this reason, it is critically important that the data is harmonized, meaning the data values are consistent in both form and meaning. For example, the values are not harmonized if they differ in capitalization or in the use of abbreviations: the value Pennsylvania is not the same as PA. Harmonization is especially difficult when legacy systems will be migrated to a single repository. For example, in a Life Sciences organization it will be very difficult to harmonize the product dictionary between the research and development, clinical, and marketing departments due to the different stages of the product development lifecycle. Nonetheless, a single centralized repository is expected to yield significant efficiencies and synergies. Achieving them requires a design that reflects the different and competing needs, as well as harmonized data.
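
The Pennsylvania versus PA example can be sketched as a small normalization step applied during migration. This is a minimal illustration under stated assumptions: the `CANONICAL_STATES` synonym table and the `harmonize` helper are hypothetical, and in practice the list of accepted variants would have to be agreed on by the stakeholders.

```python
# Hypothetical synonym table: each canonical dictionary value lists the
# variants that might appear in legacy systems (illustrative only).
CANONICAL_STATES = {
    "Pennsylvania": {"pa", "penn", "pennsylvania"},
    "New Jersey": {"nj", "new jersey"},
}

# Invert the table into a variant -> canonical lookup.
LOOKUP = {
    variant: canonical
    for canonical, variants in CANONICAL_STATES.items()
    for variant in variants
}


def harmonize(value):
    """Return the canonical value, or None if the variant is unknown."""
    # Ignore case, periods, and extra whitespace before looking up.
    key = " ".join(value.strip().lower().replace(".", "").split())
    return LOOKUP.get(key)


legacy_values = ["PA", "Pennsylvania", "PENNSYLVANIA", " pa ", "P.A.", "Penna"]
harmonized = {value: harmonize(value) for value in legacy_values}
unmapped = [value for value, canonical in harmonized.items() if canonical is None]

print(harmonized)
print(unmapped)  # ['Penna']
```

Values that cannot be mapped are returned as `None` rather than guessed, so they can be routed to the business owners for a decision instead of silently entering the shared dictionary.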

The last key area for standardization, which is especially important in large centralized repositories, is the security model. The security model includes application roles, security groups, document security, and auditing. A poorly designed security model will cripple a system, making it difficult to grant the appropriate users the proper security rights and permissions. Too often with EDMS solutions, security is assumed to be simpler than it is in the real world, where users belong to multiple roles and exceptions are the norm. The exceptions are often overlooked until after a system goes into production. The system then requires changes which, if not carefully implemented, can result in a hodgepodge security model. To avoid this scenario, a consistent and flexible security model needs to be designed into the system from the start. This requires extensive interviewing to understand business needs and use case scenarios. It ensures that the business needs drive the system, rather than the security model driving the business processes. It will also yield efficiencies in user and security administration. In a large centralized repository, the security model can make or break a system. All stakeholders need to focus on flexibility and standardization, and not on protecting fiefdoms or narrow views of content use.
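
A minimal sketch of a security model that treats multiple roles and exceptions as first-class concepts follows. The role names, permission names, and document identifiers are hypothetical, and a production EDMS would also need security groups and auditing, which are omitted here for brevity.

```python
from dataclasses import dataclass, field

# Hypothetical application roles and the permissions each one carries.
ROLE_PERMISSIONS = {
    "author": {"read", "write"},
    "reviewer": {"read", "annotate"},
    "approver": {"read", "approve"},
}


@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)
    # Per-document exceptions, keyed by document id.
    grants: dict = field(default_factory=dict)
    denials: dict = field(default_factory=dict)


def effective_permissions(user, doc_id):
    """Union of role permissions, plus explicit grants, minus explicit denials."""
    permissions = set()
    for role in user.roles:
        permissions |= ROLE_PERMISSIONS.get(role, set())
    permissions |= user.grants.get(doc_id, set())
    permissions -= user.denials.get(doc_id, set())
    return permissions


# A user with two roles and one exception: no write access to SOP-001.
alice = User(
    "alice",
    roles={"author", "reviewer"},
    denials={"SOP-001": {"write"}},
)

print(sorted(effective_permissions(alice, "SOP-001")))  # ['annotate', 'read']
print(sorted(effective_permissions(alice, "SOP-002")))  # ['annotate', 'read', 'write']
```

Modeling exceptions explicitly from the start is one way to keep later production changes from turning into ad hoc patches; applying denials last is a design choice in this sketch, not a rule taken from any specific product.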

Advantages vs. Disadvantages of Centralized Repositories

Centralized repositories offer many clear and measurable advantages, such as lower cost of ownership, lower operational costs, a consistent user experience, and the breaking down of information or content silos. However, it is equally important to plan adequately for the potential disadvantages associated with large centralized repositories, such as system performance implications, change management and support complications, and difficulties with harmonization and standardization.