Information Hybrids

Federated Systems Meld Central And Distributed Data Marts

By Rick Whiting

Centralized data warehouses or distributed data marts? For years, industry experts, vendors, and IS managers have debated the best approach to designing and building a data warehouse system. But a growing number of IS executives are coming around to the view that data warehousing doesn't have to take an either-or approach. Data warehouses based on a hybrid architecture, which some call "federated" or "hub-and-spoke" systems, incorporate aspects of centralized data warehouses and distributed data marts. They can also provide many of the advantages of both without the accompanying problems.

Data warehouses, which first appeared around 10 years ago, are complex and expensive, and they can take years to build. One estimate from Ovum Ltd., a London market-research firm, puts the failure rate of enterprise data warehouse projects at 70% or more. In response, departmental and line-of-business managers have often built their own data marts as a quick and reasonably inexpensive means of meeting their decision-support needs.

But data marts create other problems. Information in line-of-business data marts might be useful to other business units across the company. "The idea is that we've already invested in the customer relationship and we want to share that information across the company," says Pat Komar, information services VP at Prudential Insurance Co. of America, which implemented a federated data warehouse system to share customer data among its business units.

Another problem with having multiple data marts is that each may have its own way of defining and organizing information. "The whole idea behind data warehouses was to get to one version of the truth," says Doug Laney, an analyst at the Meta Group. The inconsistency inherent in data marts makes it nearly impossible to integrate them into a single, centralized system.
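
To make that inconsistency concrete, here is a minimal Python sketch of what integration involves when two marts describe the same customers differently. It is an illustration only: the two marts, their field names, their codes, and the `standardize` and `load_hub` helpers are all hypothetical and do not describe any system mentioned in this article.

```python
# Hypothetical sketch: mart names, field names, and codes are invented
# for illustration and are not taken from any company's system.

# Two marts store the same kind of fact under different names and codes.
LIFE_MART = [{"cust": "A17", "sex": "M"}, {"cust": "B02", "sex": "F"}]
AUTO_MART = [
    {"customer_id": "A17", "gender_cd": 1},
    {"customer_id": "C44", "gender_cd": 2},
]

# One translation table per mart, mapping local codes to a shared standard.
LIFE_CODES = {"M": "MALE", "F": "FEMALE"}
AUTO_CODES = {1: "MALE", 2: "FEMALE"}


def standardize(rows, source, id_field, code_field, code_map):
    """Translate one mart's rows into the shared hub layout."""
    return [
        {
            "customer_id": row[id_field],
            "gender": code_map.get(row[code_field], "UNKNOWN"),
            "source": source,
        }
        for row in rows
    ]


def load_hub(*batches):
    """Merge standardized batches into one record per customer."""
    hub = {}
    for batch in batches:
        for row in batch:
            record = hub.setdefault(
                row["customer_id"], {"gender": row["gender"], "sources": []}
            )
            record["sources"].append(row["source"])
    return hub


hub = load_hub(
    standardize(LIFE_MART, "life", "cust", "sex", LIFE_CODES),
    standardize(AUTO_MART, "auto", "customer_id", "gender_cd", AUTO_CODES),
)
print(hub["A17"])
```

Even in this toy case, every mart needs its own translation table before a shared view of the customer is possible, which suggests why the effort grows quickly as independently built marts multiply.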
Prudential implemented its federated data warehouse system last year with the goal of breaking down the information barriers across the company's line-of-business operations. Although Prudential, in Newark, N.J., has some 50 million customers, 70% owned only a single Prudential product--even though the company's offerings include life, property and casualty insurance, securities, and other financial products and services. The company hoped that sharing customer data among business units would increase marketing and cross-selling opportunities.

"The only way that we could pull it together was to build a central hub," says Komar. The hub, residing on an IBM DB2 database on an MVS mainframe, contains about 1 terabyte of summarized customer data about products, households, and customer accounts, while detailed transactional data resides in the line-of-business data stores. From the summarized data, Prudential analysts create project-specific data marts using Cognos' Impromptu and PowerPlay online analytical processing tools and SAS Institute's Enterprise Miner for analysis.

Prudential uses the systems to create marketing campaigns and to identify prospects for cross-selling Prudential products and services. "We can't do that unless we know everything about the customer," Komar says.

That hits on a key driver behind the move to federated data warehouse systems. Data marts tend to be focused on individual products or product lines. But more businesses today are trying to become customer-focused, which requires understanding how an enterprise is engaged with a customer across all its business units and products.

Optimized Spending

Carlson Wagonlit Travel in Minneapolis, a leading business travel services company, assembled a data warehouse in 1997 to store data about the travel patterns of its customers. That information is studied to optimize travel spending, such as by negotiating deals with airlines.
The company uses Informatica's PowerCenter data integration software to pull data from its operational system and populate an Oracle 8.0 data warehouse. Carlson Wagonlit plans to create similar warehouses for its Pacific/Asia and Europe/Middle East/Africa operations. These systems and the Americas data warehouse need to remain separate because they contain very different kinds of data, including differences in language and currency, for local reporting requirements. Also, a single, all-encompassing data warehouse would be in the multiterabyte range: The agency handles more than $11 billion in travel bookings each year.

Still, the company sometimes needs data about its multinational customers, especially as more companies want to manage their travel expenditures on a global scale. The plan is to create a global data warehouse, using the PowerCenter tool, that will act as a hub for the regional warehouses. "This is the way for us to pull it all together," says Jay Vetsch, director of global information delivery. The global data warehouse will be about 500 Gbytes; the regional warehouses will be 100 Gbytes to 200 Gbytes each. Work on the global hub is expected to take at least a year, says Vetsch.

Getting the project done in a reasonable time frame is one potential problem with federated data warehouse systems. "Any time you try to have a best-of-both-worlds system, such an architectural design is going to be quite complex," says David Wells, a principal analyst with Ovum. Adding to the problem is the fact that there's no specific definition of a federated data warehouse from which to develop a blueprint.

The problem is even more complex if an organization already has a data warehouse or data mart system. "When building a federated data warehouse, you almost have to start from scratch," says Rob Armstrong, a senior data warehouse consultant at NCR Corp.
At Prudential, the challenge was achieving consistency between the line-of-business data stores and the data warehouse hub. "We had 30 different ways of describing genders," Komar says. IS created a set of standard codes, including naming and reference standards and consistent identifiers, that would let the business unit data stores communicate with the hub. Only in that way, for example, could a Prudential sales manager look up a customer in the central data warehouse, then drill down to a line-of-business database to check that customer's driving-accident record.

"Standardization is probably the most difficult thing," Komar says. Getting line-of-business managers to follow the data standards is key. "The whole thing is constant communications and meetings," she says.

Building a federated system from existing data marts is difficult. First, the enterprise data warehouse has to be defined. "And that's no easy task," says John Santaferraro, data warehouse program manager at Hewlett-Packard. "Then you have to go back and basically reverse-engineer your data marts"--including such lengthy chores as conducting conversion analysis.

Many IS managers and data warehouse experts believe that constructing a federated system is easier with a top-down approach--building the corporate-level data warehouse first, followed by attached data marts--assuming the project manager has the good fortune to be starting out with a clean slate or an enterprise-level data warehouse.

One company that is taking such an approach is bowling-equipment manufacturer Brunswick Indoor Recreation, a division of Brunswick Corp. The Muskegon, Mich., company went live at the start of the year with a Microsoft SQL Server 6.5 data warehouse containing sales, back-order, and warranty information. That data is pulled from an AS/400 using Ardent Software Inc.'s DataStage software and analyzed using decision support tools from Business Objects SA.
But the data warehouse already contains a sales table of 3 million to 4 million rows of data, and that's going to increase by 1 million to 2 million rows by the end of the year, says programmer analyst Rob Mark. These numbers don't include order and warranty data. Mark's concern is that the data warehouse's performance will suffer as it grows, and complex reports could take a long time to process.

Mark plans to create a federated system with subject-specific data marts, such as a customer profile data mart, to reduce the strain on the enterprise data warehouse. He is also exploring the use of an intranet to provide users with access to the system.

But Mark is still mulling whether to keep transaction-level data in the data warehouse and provide subsets of the information to the data marts, or keep the detailed data in the marts with the centralized system holding only metadata. Centralizing the data would be easier and less costly to manage, he says, but it also puts all the data in one basket. "The big drawback would be if the main server [with the central data warehouse] crashes," he says. "Then everyone would lose access to the data."

Where's The Data

Another design issue is where operational data should enter the federated systems. Meta Group's Laney says data from enterprise resource planning and other transactional systems should enter at the enterprise data warehouse level and flow down to data marts. That single data feed, he says, puts less strain on the business' operational systems. HP's Santaferraro notes that "cleansing" and preparing a single stream of data for analysis is easier than doing the same for data flowing into multiple data marts. Queries that require access to detailed data need only access a single source rather than multiple sources.

Depending on the autonomy of an organization's line-of-business operations, however, some data marts might contain transactional data.
In this case, a common metadata model to reconcile differences between the data marts can be key. "All this comes down to is the metadata level and a common metadata language," says Harry Kolar, strategy and emerging technologies manager in IBM's global business intelligence solutions operation.

This leads to yet another major headache--the lack of an industry metadata standard. Although a number of industry groups are working--even competing--to develop such a standard, Kolar and others don't expect one until next year at the earliest.

Even political and cultural issues can hinder the construction of a federated data warehouse system. The whole idea of federated systems is that local operations maintain control of their data. But some line-of-business managers balk at having their data mart linked to a corporate system. An even bigger problem can be getting data mart owners to agree on common data descriptions and models. Prudential's Komar met regularly with line-of-business executives to get everyone onboard while the federated system was being designed.

Federated data warehouses are not for everyone. Highly centralized companies or those with a single product or service can make do with a single, enterprise-level system. Conversely, within highly decentralized organizations with autonomous business units, independent data marts are the way to go. "However, most organizations aren't at either of those extremes. They're somewhere in the middle," says Ovum's Wells.

As with all IT decisions, the technology needs to fit the business goals of the enterprise. The architecture of a data warehouse system should be dictated by business needs, not vice versa. "Does it have any benefit from the business perspective?" Wells asks. "If not, it's wrong."

© Copyright 1999 Waterstone Consulting