BUSINESS GLOSSARY VS DATA CATALOG: A TALE OF TWO CITIES?

Business glossaries and data catalogs play vital roles within data management. They are essential components in virtually any data architecture, but their purposes and interconnections are not always clear to everyone and are therefore worth exploring.

Exploring the relationships

A business glossary and a data catalog are closely related components within the field of data management. They both serve the purpose of organizing and documenting information about data assets within an organization, but they focus on different aspects.

Business Glossaries – Establishing the common language

A business glossary is a centralized repository or collection of terms and definitions that are specific to the organization’s business domain. It provides a common understanding and consistent definition of business terms used across different departments and stakeholders. The business glossary helps ensure clear communication and alignment between business users, data professionals, and technical teams by establishing a shared vocabulary.

Data Catalogs – Unveiling the data landscape

On the other hand, a data catalog is a comprehensive inventory of the available data assets within an organization. It provides detailed information about the structure, content, and characteristics of each dataset or data source. A data catalog captures metadata about the data, including data lineage, data sources, data quality, and other relevant information. It serves as a valuable reference for data consumers, data analysts, and data scientists to discover, understand, and effectively utilize the available data assets.

Complementary forces

The link between a business glossary and a data catalog lies in their complementary roles in facilitating data understanding and governance. While the business glossary focuses on defining business terms and ensuring consistent business vocabulary, the data catalog provides technical information about the underlying data assets. The business glossary helps users interpret and understand the data catalog by providing clear definitions of the business terms used in the metadata descriptions. In turn, the data catalog helps enrich the business glossary by associating technical metadata with the corresponding business terms, enhancing the overall understanding of the data assets and their context within the organization.
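
As a minimal illustration of that link, the sketch below pairs a business glossary term with the technical metadata of a catalog entry. All class names, fields and values are hypothetical and greatly simplified compared to a real glossary or catalog tool.

  # Minimal sketch linking a business glossary term to catalog metadata.
  # All names and fields are hypothetical and simplified for illustration.
  from dataclasses import dataclass, field
  from typing import List

  @dataclass
  class GlossaryTerm:
      """A shared business definition, owned by a data steward."""
      name: str        # e.g. "Active Customer"
      definition: str  # the agreed business meaning
      steward: str     # who owns and maintains the definition

  @dataclass
  class CatalogEntry:
      """Technical metadata describing one physical data asset."""
      dataset: str           # e.g. "crm.customers"
      source_system: str     # where the data originates
      lineage: List[str]     # upstream datasets it is derived from
      quality_score: float   # outcome of data quality checks
      glossary_terms: List[GlossaryTerm] = field(default_factory=list)

  # Associating the term with the asset gives data consumers both the
  # "what it means" view and the "where it lives / how it flows" view.
  active_customer = GlossaryTerm(
      name="Active Customer",
      definition="A customer with at least one purchase in the last 12 months.",
      steward="Sales Operations",
  )
  customers_table = CatalogEntry(
      dataset="crm.customers",
      source_system="CRM",
      lineage=["erp.orders", "crm.contacts"],
      quality_score=0.97,
      glossary_terms=[active_customer],
  )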

By integrating a business glossary with a data catalog, organizations can bridge the gap between business and technical perspectives, fostering better collaboration, data governance, and data-driven decision-making.


CONTACT US

Do you also want to take your data agenda to the next level? Would you like to find out how Datalumen can help?
 

TOP DATA OBSERVABILITY & DATA QUALITY TRENDS TO PUT ON YOUR RADAR

Data quality management is a critical component in the successful realization of your data strategy, and there are several hot topics that are currently gaining traction in this area. Here are some of the latest trends in data quality & data observability:

  • DQaaS or DQSaaS
  • Hybrid Usage & Island Hopping
  • Machine Learning has hit the DQ space
  • From patchwork to a fundamental & company-wide capability
  • From a technology-driven to a business-driven framework


TREND #1. DQaaS or DQSaaS

Data quality as a service (DQaaS or DQSaaS) is an emerging trend in which a subset of data quality management functionality is outsourced to third-party cloud application providers. DQaaS providers offer tools and services to monitor and improve data quality, reducing the workload on in-house data teams. As with other SaaS offerings, DQaaS is generally delivered in a cloud-based or hosted model.

We see two types of DQaaS:
  • The first type provides a complete set of data quality functionality (equivalent to traditional on-premises offerings) running on cloud platforms. Clients typically order it on demand from cloud-enabled vendors and often use it on a subscription basis (one year or longer).
  • The second type is based on on-demand online services that, for example, validate and verify addresses or other relevant data assets. These micro data quality services are typically billed on a pay-per-use / per-service-call basis (see the sketch below).
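
To make this concrete, the following sketch shows what calling such a pay-per-use address validation service could look like from Python. The endpoint, API key and response fields are purely hypothetical; every DQaaS provider exposes its own API.

  # Hypothetical call to a pay-per-use address validation micro service.
  # The URL, API key and response fields are made up for illustration only.
  import requests

  API_URL = "https://api.example-dqaas.com/v1/address/validate"  # hypothetical endpoint
  API_KEY = "YOUR_API_KEY"

  payload = {
      "street": "Koningin Astridlaan 1",
      "postal_code": "2800",
      "city": "Mechelen",
      "country": "BE",
  }

  response = requests.post(
      API_URL,
      json=payload,
      headers={"Authorization": f"Bearer {API_KEY}"},
      timeout=10,
  )
  response.raise_for_status()
  result = response.json()

  # A typical service returns a validity verdict plus a standardized address.
  print(result.get("status"))              # e.g. "valid", "corrected" or "invalid"
  print(result.get("normalized_address"))  # the cleansed, standardized address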

TREND #2. Hybrid Usage & Island Hopping

Until recently, data quality was primarily applied in an operational/transactional context. DQ in an analytics context had also already been implemented in quite a few organizations, but what is relatively new is the expanded use of DQ in a number of other data-related initiatives such as MDM, DG, data engineering and AI/ML. As part of that move, we also see a shift from isolated DQ usage tailored to one specific initiative towards organization-wide DQ usage. The organization-wide DQ approach provides a number of benefits, ranging from more consistent data quality to enhanced collaboration, more efficient data processing and greater reuse.

TREND #3. Machine Learning has hit the DQ space

We don’t need to tell you that AI/ML is hot and all over the place. The AI/ML examples you read about might not always seem relevant to you. However, DQ is one of those areas where AI/ML can really deliver substantial added value. Machine learning algorithms can be used to automatically identify data quality issues and correct them. For example, machine learning models can be trained to detect duplicates, correct spelling errors, and identify missing data. Beyond more automated error resolution, a lot of DQ applications are expanding to provide insights by discovering relationships, patterns and trends.

We see both custom-built applications (using Python and other open-source libraries) and data quality platforms with embedded AI/ML functionality bringing DQ automation to the next level.
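
As a simple illustration of the duplicate detection idea, the sketch below scores pairs of customer names with fuzzy string similarity from the Python standard library. Real DQ platforms use far richer techniques (phonetic keys, embeddings, trained matchers), but the principle is the same: score record pairs and flag likely duplicates above a threshold. The sample records and threshold are illustrative only.

  # Lightweight duplicate detection on customer names using fuzzy matching.
  from difflib import SequenceMatcher
  from itertools import combinations

  customers = [
      "Acme Industries NV",
      "ACME Industries N.V.",
      "Globex Corporation",
      "Globex Corporation Ltd",
      "Initech BV",
  ]

  SIMILARITY_THRESHOLD = 0.85  # in practice, tune this on labelled examples

  def similarity(a: str, b: str) -> float:
      """Normalize lightly, then return a 0..1 similarity ratio."""
      normalize = lambda s: "".join(ch for ch in s.lower() if ch.isalnum())
      return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

  for left, right in combinations(customers, 2):
      score = similarity(left, right)
      if score >= SIMILARITY_THRESHOLD:
          print(f"Possible duplicate ({score:.2f}): {left!r} <-> {right!r}")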

TREND #4. From patchwork to a fundamental & company-wide capability

As Data Governance (DG) maturity in organizations grows, we also see them addressing data quality as a fundamental, integrated data capability. Instead of the so-called one-off usage for one specific case (data migration, CRM, etc.), a lot of companies make DQ a structural component that they can reuse continuously across all their data-related initiatives. The benefit of this more widely embedded approach is that it makes it easier for organizations to demonstrate that DQ is a profit driver rather than a cost component.

TREND #5. From a technology-driven to a business-driven framework

As mentioned in the previous trend, and as maturity increases, we are also observing a shift from an IT-driven towards a business-driven perspective. The main reason for this is that organizations require data quality to be seamlessly integrated into their business processes for optimal results.


Conclusion

Data quality management is a critical component of any data strategy, and there are many exciting trends and technologies emerging in this area. From data quality as a service to machine learning, organizations have many tools and techniques available to improve data quality and drive business growth.

KELLOGG: REAL-TIME VISIBILITY INTO SUPPLY CHAIN PROFITABILITY

Reducing a 24-hour ETL process to 43 minutes

From cereal to potato chips, Kellogg’s puts some of the world’s most popular packaged foods on grocery shelves every day. But its supply chain dashboards, powered by Hadoop and SAP Object Data Services, made it impossible for managers to get the fresh data necessary for daily profitability analyses. Hadoop’s batch data ingestion required an ETL process of about 24 hours, and interactive queries were painfully slow.


From batch ETL to real-time insights

Kellogg’s looked to SingleStore to replace Hadoop for improved speed-to-insight and concurrency. Its priority was twofold: real-time ingestion of data from multiple supply chain applications, and fast SQL capabilities to accelerate queries made through Kellogg’s Tableau data visualization platform. By deploying SingleStore on AWS with SingleStore Pipelines, Kellogg’s was able to continuously ingest data from AWS S3 buckets and Apache Kafka for up-to-date analysis and reporting. SingleStore reduced the ETL process from 24 hours to an average of 43 minutes, a 30x improvement. The team was then able to bring three years of archived data into SingleStore without increasing ETL timeframes, and subsequently incorporated external data sources such as Twitter as well.
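
For readers curious what continuous ingestion looks like in practice, the sketch below creates and starts a SingleStore pipeline over a Kafka topic from Python. SingleStore is MySQL wire-compatible, so a standard MySQL driver can issue the DDL; the host, credentials, table and topic names are placeholders, and the exact pipeline options should be verified against the SingleStore documentation rather than taken from this sketch.

  # Illustrative only: create and start a streaming ingestion pipeline.
  # Connection details, table and topic names are placeholders.
  import pymysql  # SingleStore speaks the MySQL wire protocol

  conn = pymysql.connect(
      host="singlestore.example.com",  # hypothetical endpoint
      user="analytics",
      password="secret",
      database="supply_chain",
  )

  with conn.cursor() as cur:
      # Continuously ingest shipment events from a Kafka topic into a table,
      # replacing a slow batch ETL step with streaming ingestion.
      cur.execute("""
          CREATE PIPELINE shipments_stream AS
          LOAD DATA KAFKA 'kafka-broker:9092/shipment-events'
          INTO TABLE shipment_events
          FIELDS TERMINATED BY ','
      """)
      cur.execute("START PIPELINE shipments_stream")

  conn.close()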


Direct integration with Tableau

Finally, Kellogg’s was able to analyze customer logistics and profitability data daily, running Tableau visualizations directly on top of SingleStore rather than on a data extract. With SingleStore directly integrated with Tableau, Kellogg’s has maintained an 80x improvement in analytics performance while supporting hundreds of concurrent business users.


Source: SingleStore 

CONTACT US

Also in need of a data acceleration solution to boost your data? Want to find out how we can help you optimize your data architecture for speed and agility?