DATA VIRTUALIZATION: TOP USE CASES THAT MAKE A DIFFERENCE
by Dimitri Maesfranckx

LOGICAL DATA WAREHOUSE
Traditional data warehouses centered on repositories are no longer sufficient to support today’s complex data and analytics landscape. The logical data warehouse (LDW) combines the strengths of traditional repository-based warehouses with alternative data management and access strategies to improve your agility, accelerate innovation, and respond more efficiently to changing business requirements.
Challenges include:
- A data services approach that decouples data access, processing, transformation, and delivery
- Diverse analytic tools and users
- Diverse data types and sources including traditional data repositories, distributed processing (big data), virtualized sources, and analytic sandboxes
- Unified business ontologies that resolve diverse IT taxonomies via common semantics
- Unified information governance including data quality, master data management, security, and more
- Service level agreement (SLA) driven operationalization
Data Virtualization addresses these challenges with a virtualization-centric LDW architecture.
With Data Virtualization you can:
- Access any source including traditional data repositories, distributed processing (big data), virtualized sources, and analytic sandboxes, both on-premises and in the cloud
- Model and transform data services quickly, in conformance with semantic standards
- Deliver data in support of a wide range of use cases via industry-standard APIs including ODBC, JDBC, SOAP, REST, and more (see the sketch after this list)
- Share and reuse data services across many applications
- Automatically allocate workloads to match SLA requirements
- Align data access and use with enterprise security and governance requirements
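As a concrete illustration of the industry-standard API bullet above, here is a minimal Python sketch that reads a virtual LDW view over ODBC. The DSN sales_dv, the credentials, and the view dv.sales are hypothetical names invented for this example, not part of any particular product.

    # Minimal sketch: querying a logical data warehouse view over ODBC.
    # The DSN, credentials, and view name below are hypothetical.
    import pyodbc

    conn = pyodbc.connect("DSN=sales_dv;UID=analyst;PWD=secret")
    cursor = conn.cursor()

    # The consumer sees one logical view; the DV layer federates the
    # underlying warehouse, big data, and cloud sources behind it.
    cursor.execute(
        "SELECT region, SUM(revenue) AS total_revenue "
        "FROM dv.sales WHERE fiscal_year = ? GROUP BY region",
        2024,
    )
    for region, total_revenue in cursor.fetchall():
        print(f"{region}: {total_revenue:,.2f}")
    conn.close()

The same view could equally be consumed over JDBC, SOAP, or REST without changing its definition, which is the point of a shared service layer.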
With Data Virtualization you get:
- One logical place to go for analytic datasets regardless of source or application
- Better analysis from broader data access and more complex transformations
- Faster analysis time-to-solution via agile data service development and reuse
- Higher quality analysis via consistent, well-understood data
- Higher SLAs via loose coupling and optimization of access, processing, and transformation
- Flexibility to add or change data sources or application consumers as required
- More complete and consistent enterprise data security and governance
LOGICAL DATA LAKE
The idea of having all of your data stored and available in a data lake sounds wonderful to everyone. Finally, the promise of Big Data can become reality. But what is the reality behind implementing a data lake? Virtually all organizations already have numerous data repositories: data warehouses, operational data stores, data stored in files, and so on. It is impossible to load all of this data into a data lake and give everyone access to it. Beyond data volume, you should also consider other elements such as differing data formats, data quality, and even data security.
All this complexity, however, doesn’t mean that you should forget about a data lake. Using Data Virtualization, it is possible to leave the data in its current environment and create a virtual, or better, a logical data lake.
Challenges include:
- Moving all the data into a single location
- Moving all the data to that single location in a timely manner
- Delivering the data as integrated data
- Delivering fresh data
- Deciding where to store the data, and how to limit the impact when this storage mechanism needs to change
- Avoiding or limiting the impact of ever-changing technologies
Data Virtualization lets you:
- Isolate your data consumers from the underlying data architecture (see the sketch after this list)
- Be flexible in changing the underlying architecture without impacting your data consumers
- Deliver fresh data without the need to move it upfront
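The isolation bullet above can be sketched in a few lines: the consumer references only a logical name, so the physical home of the data can change without touching consumer code. The DSN logical_lake and the view lake.customer_events are hypothetical.

    # Minimal sketch: a consumer of a logical data lake. All names are
    # hypothetical. The consumer never knows (or cares) whether the rows
    # physically live in HDFS, cloud object storage, or a warehouse.
    import pyodbc

    conn = pyodbc.connect("DSN=logical_lake")
    rows = conn.cursor().execute(
        "SELECT customer_id, event_type, event_ts "
        "FROM lake.customer_events WHERE event_ts >= ?",
        "2024-01-01",
    ).fetchall()
    print(len(rows), "fresh events")
    # If the events later migrate from Hadoop to cloud storage, only the
    # DV-layer mapping changes; this consumer code keeps working as-is.
    conn.close()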
With Data Virtualization you get:
- Fresh data for your data consumers
- Agile data delivery in the shape the data consumers want
- Data delivery that is independent of whether the data is located in traditional systems, multiple big data systems, or a combination of these
- Agility to change the data architecture without impacting the data consumers
DATA INTEGRATION PROTOTYPING
Physical integration is a proven approach to analytic data integration; however, the long lead times associated with physical integration (on average 7+ weeks, according to TDWI) can delay realizing business value. Further, physical integration requires significant data engineering effort and a complex software development lifecycle.
Challenges include:
- Requirements. Business requirements are not always clear at the start of a project and thus can be difficult for business users to clearly communicate.
- Design. Identifying and associating new mappings, new ETLs, and schema changes is complex. Further, current data engineering staff may not understand older schemas and ETLs. This makes detailed technical specifications a key requirement.
- Development. Schema changes and ETL builds are required prior to end user validation. Resultant rework cycles often delay solution delivery.
- Deployment. Modifying existing warehouse / data mart schemas and ETLs can be difficult and/or risky.
Data Virtualization lets you:
- Interactively refine requirements, and based on actual data, build virtual data services side-by-side with business users.
- Quickly deploy agreed datasets into production to meet immediate business needs.
- Invest additional engineering efforts on physical integration later, only if required.
- If required, use the mappings and destination schema within the proven data services as a working prototype for physical integration ETLs and schema changes.
- Once physical integration is tested, transparently migrate from virtual to physical without loss of service (see the sketch after this list).
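To illustrate the last bullet, here is a sketch of what a transparent virtual-to-physical cutover can look like when the DV layer accepts SQL-style DDL. The connection, schema, and view names are hypothetical, and the exact DDL syntax varies by product.

    # Minimal sketch of a transparent virtual-to-physical migration.
    # Consumers always query the same logical view name; only its
    # definition changes behind the scenes. All names are hypothetical.
    import pyodbc

    dv = pyodbc.connect("DSN=dv_admin")
    cur = dv.cursor()

    # Phase 1 (prototype): the view federates two live sources on the fly.
    cur.execute(
        "CREATE OR REPLACE VIEW dv.orders_enriched AS "
        "SELECT o.order_id, o.amount, c.segment "
        "FROM crm.orders o JOIN erp.customers c ON o.cust_id = c.cust_id"
    )

    # Phase 2 (after the physical ETL is built and validated): re-point
    # the same view at the new physical table; consumers notice nothing.
    cur.execute(
        "CREATE OR REPLACE VIEW dv.orders_enriched AS "
        "SELECT order_id, amount, segment FROM warehouse.orders_enriched"
    )
    dv.commit()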
With Data Virtualization you get:
- Faster time-to-solution than physical integration, and accelerated business benefits
- Less effort spent on upfront requirements definition and technical specification
- The right level of data engineering required to meet requirements, while avoiding unnecessary over-engineering
- Less disruption of existing physical repositories, schemas, and ETLs
DATA ACCESS AND SEMANTIC LAYER FOR ANALYTICS
Vendor-specific analytic semantic layers provide specialized data access and semantic transformation capabilities that simplify your analytic application development.
However, these vendor-specific semantic layer solutions have limitations including:
- Delayed support of new data sources and types
- Inability to share analytic datasets with other vendors’ analytic tools
- Federated query performance that is not well optimized
- Limited range of transformation capabilities and tools
Data Virtualization provides a vendor-agnostic data access and semantic layer that addresses these challenges.
Data Virtualization lets you:
- Access any data source required
- Model and transform analytic datasets quickly
- Deliver analytic data to a wide range of analytics vendor tools via industry-standard APIs including ODBC, JDBC, SOAP, REST, and more (see the sketch after this list)
- Share and reuse analytic datasets across multiple vendors’ tools
- Automatically optimize queries
- Conform analytic data access and delivery to enterprise security and governance requirements
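As a sketch of vendor-agnostic delivery, the snippet below consumes one and the same virtual dataset twice: once over ODBC for a SQL-speaking BI tool, and once over REST for a web application. The DSN, URL, view name, and filter parameter are all hypothetical.

    # Minimal sketch: one governed dataset, two delivery channels.
    # The DSN, REST URL, and query parameter below are hypothetical.
    import pyodbc
    import requests

    # Channel 1: a BI client reads the dataset over ODBC/SQL.
    sql_rows = pyodbc.connect("DSN=analytics_dv").cursor().execute(
        "SELECT product, units FROM dv.product_sales"
    ).fetchall()

    # Channel 2: a web app reads the same dataset over REST/JSON.
    resp = requests.get(
        "https://dv.example.com/rest/dv/product_sales",
        params={"$filter": "units gt 100"},  # hypothetical filter syntax
        timeout=30,
    )
    resp.raise_for_status()
    json_rows = resp.json()
    print(len(sql_rows), len(json_rows))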
With Data Virtualization you get:
- One place to go for analytic datasets regardless of analytic tool vendor
- Better analysis from broader data access and more complex transformations
- Lower costs, with reuse of analytic datasets across diverse analytic tools and users
- Faster query performance
- Greater analytic data security and governance
AGILE DATA PREPARATION
Self-service data preparation has proven to be a great way for business users to quickly transform raw data into more analytics-friendly datasets. However, some agile data preparation needs require data engineering skills and higher-level integration capabilities.
Challenges include:
- Support for increasingly diverse and distributed data sources and types
- Limited range of transformation capabilities and tools
- Constraints on securing, governing, sharing, reusing, and productionizing prepared datasets
Data Virtualization provides an agile data preparation solution for data engineers that complements business user data preparation tools.
Data Virtualization lets you:
- Interactively refine requirements and prepare datasets with business users based on actual data (see the sketch after this list)
- Prepare datasets that may require complex transformations or high-performance queries
- Leverage existing datasets when preparing new datasets
- Quickly deploy prepared datasets into production when appropriate
- Align data preparation activities with enterprise security and governance requirements
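Here is a minimal sketch of the engineer-grade side of data preparation: pull actual rows from a virtual view, then shape them into something a business user can validate on the spot. The DSN prep_dv, the view dv.web_orders, and the column names are hypothetical; the shaping uses pandas.

    # Minimal sketch: data preparation on top of a virtual view.
    # DSN, view, and column names are hypothetical.
    import pandas as pd
    import pyodbc

    conn = pyodbc.connect("DSN=prep_dv")
    raw = pd.read_sql("SELECT * FROM dv.web_orders", conn)

    # Typical prep steps: type fixes, a derived column, and a tidy
    # monthly aggregate to review together with business users.
    raw["order_date"] = pd.to_datetime(raw["order_date"])
    raw["margin"] = raw["revenue"] - raw["cost"]
    prepared = (
        raw.groupby(raw["order_date"].dt.to_period("M"))["margin"]
        .sum()
        .reset_index(name="monthly_margin")
    )
    print(prepared.head())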
With Data Virtualization you get:
- Rapid, IT-grade datasets that meet analytic data needs
- The right level of data engineering required to meet requirements, while avoiding unnecessary over-engineering
- Less effort spent productionizing datasets
- More complete and consistent data security and governance
VIRTUAL OPERATIONAL DATA STORE
Physical operational data stores (ODS) have proven to be a useful compromise that balances operational data access needs with operational system SLAs.
However, replicating operational data in an ODS is not without its costs.
Challenges include:
- Significant development investments for ODS set up, and for integration projects that move data to them.
- Higher operating costs for managing the associated infrastructure.
- Integration workloads on the operational system.
- Often the operational source is not resource-constrained, or operational queries may be light enough not to create significant workloads.
- When operational data is in an ODS, it may still require further transformations to make it useful for diverse analysis needs.
Data Virtualization lets you:
- Access any operational data or other sources as required
- Model and transform operational datasets quickly
- Deliver data to a wide range of operational applications via industry-standard APIs, protocols, and architectures including ODBC, JDBC, SOAP, REST, and more
- Share and reuse analytic datasets across applications
- Reduce the impact on operational sources via query optimization and intelligent caching (see the sketch after this list)
- Conform operational data access and delivery to enterprise security and governance requirements
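The caching bullet above deserves a small illustration. The toy function below shows the idea a DV platform applies internally: answer repeat queries from a short-lived cache so the operational source is hit far less often. It is purely illustrative, not any product's API.

    # Toy sketch of intelligent caching in front of an operational source.
    # Purely illustrative; real DV platforms manage this transparently.
    import time

    CACHE_TTL_SECONDS = 60
    _cache = {}  # sql text -> (timestamp, cached rows)

    def fetch_operational(sql, run_query):
        """Return cached rows while fresh; otherwise query the source."""
        now = time.monotonic()
        hit = _cache.get(sql)
        if hit and now - hit[0] < CACHE_TTL_SECONDS:
            return hit[1]              # cache hit: source left untouched
        rows = run_query(sql)          # cache miss: query the source once
        _cache[sql] = (now, rows)
        return rows

    # Repeat calls within 60 seconds never reach the source system.
    rows = fetch_operational("SELECT * FROM ops.orders", lambda s: [("o1",)])
    rows = fetch_operational("SELECT * FROM ops.orders", lambda s: [("o1",)])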
With Data Virtualization you get:
- One virtual place to go for operational data
- Better analysis from broader data access and more flexible transformations
- Lower costs due to less replicated data maintained in physical ODSs
- Query performance that is more than good enough, without impacting operational system SLAs
DATA HUB
The data hub is a logical architecture that enables data sharing by connecting producers of data (applications, processes, and teams) with consumers of data (other applications, processes, and teams). Master data hubs, logical data warehouses, customer data hubs, reference data stores, and more are examples of different kinds of data hubs. Data hub domains might be geographically focused, business process-focused, or application-focused.
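A toy sketch can make the producer/consumer idea tangible: datasets are published into the hub by name, consumed by name, and every flow in and out is recorded so it stays visible. This is an in-memory illustration of the concept, not a real hub implementation.

    # Toy sketch of the data hub pattern: named datasets, registered
    # producers and consumers, and a visible audit trail of data flows.
    from collections import defaultdict

    class DataHub:
        def __init__(self):
            self.datasets = {}               # dataset name -> rows
            self.flows = defaultdict(list)   # dataset name -> flow log

        def publish(self, producer, name, rows):
            self.datasets[name] = rows
            self.flows[name].append(("in", producer))

        def consume(self, consumer, name):
            self.flows[name].append(("out", consumer))
            return self.datasets[name]

    hub = DataHub()
    hub.publish("crm_app", "customer_master", [{"id": 1, "name": "Acme"}])
    customers = hub.consume("orders_app", "customer_master")
    print(hub.flows["customer_master"])  # [('in', 'crm_app'), ('out', 'orders_app')]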
Requirements include:
- The data hub must provision data to and receive data from analytic and operational applications
- Hub data must be governed and secure
- Data flows into and out of the hub must be visible
A Data Virtualization data hub solution delivers on these requirements.
Data Virtualization lets you:
- Introspect sources and identify potential data hub entities and relationships
- Access any data hub data source
- Model and transform data hub datasets
- Deliver data hub datasets to diverse analytic and operational applications via industry-standard APIs, protocols, and architectures including ODBC, JDBC, SOAP, REST, and more
- Share and reuse data hub datasets across multiple applications
- Conform data hub access and delivery to enterprise security and governance requirements
With Data Virtualization you get:
- A complete solution for data hub implementations
- Better analysis and business processes via consistent use of data hub datasets
- Higher analytic and operational application quality via consistent use of data hub datasets
- Greater agility when adding or changing data hub datasets
- Complete visibility into data hub data flows
- End-to-end data hub security and governance
MASTER DATA MANAGEMENT
Master Data Management (MDM) is an essential capability. Analyst firms such as Gartner have identified four MDM implementation styles (consolidation, registry, centralized, and coexistence) that you can deploy independently or combine to help enable successful MDM efforts.
Requirements include:
- Access to master and reference data from diverse sources
- A cross-reference table (index) that reconciles and links related master data entities and identifiers by source (see the sketch after this list)
- Data services that expose the cross-reference table to analytic and operational applications that require master data from one or more sources
- Data federation that leverages the cross-reference table when querying detailed data associated with master entities
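The cross-reference table is easiest to grasp with a tiny worked example. The sketch below uses in-memory SQLite purely for illustration: a registry links each source-local customer id to one golden id, and a single query stitches detail rows from both sources together.

    # Minimal sketch of a registry-style MDM cross-reference table.
    # In-memory SQLite stands in for the federated sources.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE xref (golden_id TEXT, source TEXT, local_id TEXT);
        CREATE TABLE crm_orders (cust TEXT, amount REAL);
        CREATE TABLE erp_orders (kunde TEXT, amount REAL);
        INSERT INTO xref VALUES ('G1', 'crm', 'C-77'), ('G1', 'erp', '4711');
        INSERT INTO crm_orders VALUES ('C-77', 100.0);
        INSERT INTO erp_orders VALUES ('4711', 250.0);
    """)

    # Federated-style query: resolve both sources through the registry
    # to get one 360-degree total per golden customer id.
    rows = db.execute("""
        SELECT x.golden_id, SUM(o.amount)
        FROM xref x
        JOIN (SELECT 'crm' AS src, cust AS lid, amount FROM crm_orders
              UNION ALL
              SELECT 'erp', kunde, amount FROM erp_orders) o
          ON o.src = x.source AND o.lid = x.local_id
        GROUP BY x.golden_id
    """).fetchall()
    print(rows)  # [('G1', 350.0)]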
Data Virtualization is a proven technology for registry-style MDM solutions.
Data Virtualization lets you:
- Introspect sources and identify potential master data entities and relationships
- Build a physical master data registry that relates and links master data across sources
- Cache registry copies adjacent to MDM user applications to accelerate frequent MDM queries
- Combine master, detail, and non-master data to provide more complete 360-degree views of key entities
With Data Virtualization you get:
- A complete solution for registry-style MDM implementations
- Better analysis via more complete views of master data entities across sources
- Higher analytic and data quality via consistent use of master and reference data
- Faster query performance and less disruption to master data sources
- Greater agility when adding or changing master and reference data sources
LEGACY SYSTEM MIGRATION
New technology provides more advanced capabilities and lower-cost infrastructure. You want to take advantage. However, migrating legacy data repositories to new ones, or legacy applications to new application technology, is not easy.
Challenges include:
- Business continuity requires non-stop operations before, during, and after the migration.
- Applications and data repositories are often tightly coupled, making them difficult to change.
- Big bang cutovers are problematic due to so many moving parts.
- Too often, testing and tuning only happen after the fact.
Data Virtualization provides a flexible solution for legacy system migration challenges.
Data Virtualization lets you:
- Create a loosely coupled middle tier of data services that mirrors as-is data access, transformation, and delivery functionality (see the sketch after this list)
- Test and tune these data services on the sidelines without impacting current operations
- Modify the as-is data services to now support the future-state application or repository, then retest and retune
- Migrate the legacy application or repository
- Implement future-state data services to consume or deliver data to and from the new application or repository
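The middle tier of data services from the first bullet can be sketched as a thin facade: consumers call a stable function, and the backend it targets is configuration rather than code. Connection strings, table names, and the service function are hypothetical.

    # Minimal sketch of a loosely coupled data-service tier for migration.
    # All connection strings and names are hypothetical.
    import pyodbc

    BACKENDS = {
        "legacy": "DSN=legacy_rdbms",   # as-is source system
        "future": "DSN=new_cloud_dw",   # future-state target
    }
    ACTIVE = "legacy"                    # flip to "future" at cutover

    def get_open_invoices():
        """Stable data service: same contract before and after migration."""
        conn = pyodbc.connect(BACKENDS[ACTIVE])
        try:
            return conn.cursor().execute(
                "SELECT invoice_id, amount FROM finance.open_invoices"
            ).fetchall()
        finally:
            conn.close()

    # Consumers depend only on get_open_invoices(); the "future" backend
    # can be tested and tuned on the sidelines without disturbing them.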
With Data Virtualization you get:
- The ability to take advantage of new technology opportunities that can improve your business and cut your costs
- The loose coupling you need to divide complex migration projects into more manageable phases
- Less risk by avoiding big bang migrations
- Reusable data services that are easy to modify and extend for additional applications and users
APPLICATION DATA ACCESS
Your applications run on data. However, application data access can be difficult.
Challenges include:
- The need to understand and access increasingly diverse and distributed data sources and types
- Difficulty in sharing data assets with other applications
- Federated query performance that may require optimization
- Complex transformations that may require specialized tools and techniques
- Complex data and application security requirements that need to be enforced
Data Virtualization provides a powerful solution to these application data access challenges.
Data Virtualization lets you:
- Access any data source required
- Model and transform application datasets quickly
- Deliver data to a wide range of application development tools via industry-standard APIs, protocols, and architectures including ODBC, JDBC, SOAP, REST, and more
- Share and reuse application datasets across multiple analytic and operational applications
- Automatically optimize queries
- Conform data access and delivery to enterprise security and governance requirements
CLOUD DATA SHARING
With the rise of cloud-based applications and infrastructure, more data than ever resides outside your enterprise. As a result, your need to share data across your cloud and enterprise sources has grown significantly.
Challenges include:
- The need to understand and access increasingly diverse cloud data sources and APIs
- Diverse data consumers, each with their own data needs and application technologies
- Complex transformations that may require specialized tools and techniques
- Wide-area network (WAN) query performance that may require optimization
- Complex cloud data security requirements that need to be enforced
Data Virtualization provides a powerful solution for these cloud data sharing challenges.
Data Virtualization lets you:
- Access any cloud data source
- Model and transform cloud datasets quickly
- Deliver cloud data to a wide range of application development tools via industry-standard APIs, protocols, and architectures including ODBC, JDBC, SOAP, REST, and more
- Share and reuse cloud data across multiple applications
- Automatically optimize queries and apply caching to mitigate WAN latency
- Align data access and delivery to conform with enterprise and cloud data security and governance requirements
With Data Virtualization you get:
- One place to go for cloud and enterprise data
- Better applications from broader cloud data access and more complex transformations
- Lower costs due to dataset reuse across diverse applications
- Faster query performance
- Greater cloud data security and governance
Would you like to know how Datalumen can help your organization benefit from Data Virtualization? Contact us and start the data conversation.