Reducing a 24-hour ETL process to 43 minutes

From cereal to potato chips, Kellogg’s puts some of the world’s most popular packaged foods on grocery shelves every day. But its supply chain dashboards, powered by Hadoop and SAP Object Data Services, made it impossible for managers to get the fresh data necessary for daily profitability analyses. Hadoop’s batch data ingestion required an ETL process of about 24 hours, and interactive queries were painfully slow.

From batch ETL to real-time insights

Kellogg’s looked to SingleStore to replace Hadoop for improved speed-to-insight and concurrency. Its priority was twofold: real-time ingestion of data from multiple supply chain applications, and fast SQL capabilities to accelerate queries made through Kellogg’s Tableau data visualization platform. Deploying SingleStore on AWS with SingleStore Pipelines, Kellogg’s was able to continuously ingest data from AWS S3 buckets and Apache Kafka for up-to-date analysis and reporting. SingleStore reduced the ETL process from 24 hours to an average of 43 minutes, a roughly 30x improvement. The team was then able to bring three years of archived data into SingleStore without increasing ETL times, and subsequently incorporated external data sources such as Twitter as well.
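The continuous ingestion described above relies on SingleStore Pipelines, which are defined in plain SQL DDL. A minimal sketch of what such definitions look like, held as Python strings for illustration: all object names (tables, Kafka topics, buckets) are hypothetical, and real deployments would also need CREDENTIALS clauses and deployment-specific CONFIG settings.

```python
# Illustrative SingleStore Pipelines DDL for continuous ingestion from
# Kafka and S3. Table, topic, and bucket names are hypothetical;
# credentials are omitted and would be required in practice.

kafka_pipeline = """
CREATE PIPELINE shipments_from_kafka AS
LOAD DATA KAFKA 'kafka-broker:9092/shipment-events'
INTO TABLE shipments;
"""

s3_pipeline = """
CREATE PIPELINE shipments_from_s3 AS
LOAD DATA S3 'supply-chain-bucket/daily-exports/'
CONFIG '{"region": "us-east-1"}'
INTO TABLE shipments;
"""

# Pipelines do not ingest until started explicitly.
start_statements = [
    "START PIPELINE shipments_from_kafka;",
    "START PIPELINE shipments_from_s3;",
]

for stmt in [kafka_pipeline, s3_pipeline, *start_statements]:
    print(stmt.strip())
```

Because SingleStore is MySQL wire-compatible, statements like these can be issued from any MySQL-compatible client or driver.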

Direct integration with Tableau

Finally, Kellogg’s was able to analyze customer logistics and profitability data daily, running Tableau visualizations directly on top of SingleStore rather than on a data extract. With SingleStore directly integrated with Tableau, Kellogg’s has maintained an 80x improvement in analytics performance while supporting hundreds of concurrent business users.

Source: SingleStore 


Also in need of a data acceleration solution to boost your data? Want to find out how we can help you optimize your data architecture for speed and agility?


As data continues to proliferate at an unprecedented rate, organizations require a powerful and flexible solution to manage, store, and analyze it. Translytical data platforms meet that need by uniting transactional and analytical processing in a single system, so businesses can run analytics on live transactional data in real-time or near real-time, without complex and costly ETL processes.

What are Translytical Data Platforms?

Translytical data platforms are a new class of database management systems that combine the capabilities of transactional and analytical databases. They provide the ability to process transactions and analytics simultaneously in real-time or near real-time, without the need for complex and costly ETL (Extract, Transform, Load) processes.

In other words, translytical data platforms enable businesses to perform transactional processing and analytics on the same data at the same time, resulting in faster insights and improved decision-making. These platforms are designed to handle the complexity of modern data, including structured, semi-structured, and unstructured data.

How are Translytical Data Platforms Different from Traditional Databases?

Traditional databases are designed for either transactional processing or analytics. Transactional databases are optimized for storing and processing large volumes of data related to business transactions, such as sales, inventory, and customer interactions. They ensure data consistency, accuracy, and reliability, but are not suitable for complex queries and analytics.

On the other hand, analytical databases are optimized for complex queries and reporting. They provide fast access to historical data for analysis and decision-making. However, they are not optimized for transactional processing and may require ETL processes to combine data from multiple sources.

Translytical data platforms bridge the gap between transactional and analytical databases by providing a single platform for processing transactions and analytics simultaneously. They enable businesses to perform real-time analytics on transactional data, eliminate the need for separate transactional and analytical databases, and reduce data duplication and latency.
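The "same data, at the same time" idea above can be made concrete with a toy sketch. The example below uses Python's built-in SQLite purely for illustration (SQLite is not a translytical platform): transactional inserts and an analytical aggregate run against the very same table, with no ETL step in between.

```python
import sqlite3

# Toy illustration of the translytical idea: operational writes and an
# analytical query share one table, with no extract/transform/load step.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")

# Transactional side: record individual sales as they happen.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EU", 120.0), ("EU", 80.0), ("US", 200.0)],
)
conn.commit()

# Analytical side: aggregate the same rows immediately after commit.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('EU', 200.0), ('US', 200.0)]
```

A real translytical platform adds what this sketch cannot show: scale-out storage, high-concurrency access, and query engines optimized for both workloads at once, rather than a single-file embedded database.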

Benefits of Translytical Data Platforms

      1. Real-Time Analytics: Translytical data platforms enable businesses to perform real-time analytics on transactional data. This means that they can get faster insights, make decisions quickly, and respond to changing business conditions.

      2. Flexible AI Foundation: Translytical data platforms can provide a powerful foundation for AI applications, enabling organizations to process large amounts of data quickly and efficiently, and to gain real-time insights that improve the accuracy and effectiveness of AI models.

      3. Simplified Data Architecture: By eliminating the need for separate transactional and analytical databases, translytical data platforms simplify data architecture and reduce data duplication and latency.

      4. Improved Data Quality: Translytical data platforms ensure data consistency, accuracy, and reliability by processing transactions and analytics on the same data.

      5. Cost Savings: Translytical data platforms eliminate the need for complex ETL processes and multiple databases, reducing the cost of infrastructure and maintenance.


Translytical data platforms are the future of data management. They provide businesses with the ability to process transactions and analytics simultaneously, in real-time or near real-time, without the need for complex and costly ETL processes. With the ability to handle structured, semi-structured, and unstructured data, translytical data platforms provide faster insights, simplified data architecture, improved data quality, and cost savings. As the volume and complexity of data continue to grow, translytical data platforms will become essential for businesses to stay competitive and make informed decisions.


Also in need of a modern data architecture to boost your data roadmap? Would you like to find out how Datalumen can help?


Discovering new medical therapies and drugs is a complex and time-consuming process. High-performance computing and interactive data analytics can make a substantial difference, accelerating discovery and time-to-market.

UCB is a multinational biopharmaceutical company headquartered in Brussels. It focuses primarily on research and development, specifically involving medications centered on epilepsy, Parkinson’s disease, and Crohn’s disease. The company’s efforts are focused on treatments for severe diseases treated by specialists, particularly in the fields of central nervous system disorders (including epilepsy), inflammatory disorders (including allergy) and oncology.


The company’s data backbone, built on Oracle database technology, was struggling to meet the new data platform’s performance demands. Besides performance, scalability was also a point of attention, as future data volumes would make the current system unstable.
UCB wanted to build a data platform for early-stage drug discovery that would enable scientists to quickly access research data via self-service and, as a result, accelerate the early discovery process. Its goals included:
  • Decreasing the time scientists spend waiting for data. Previously, scientists had to wait for days or weeks for the data to be made available for analysis by IT.
  • Helping scientists reach answers faster. A faster and easier-to-use system would allow scientists to test hypotheses quickly.
  • A user-friendly self-service interface. Scientists would be able to use this system without needing assistance from IT. They could easily pick and choose the right data for their experiments, prepared in the way that makes the most sense for the success of the experiment.


30x speed improvement enabling real-time dashboards
48x improvement in batch data refresh operations


The new data platform needed to deliver faster query performance, while being able to easily scale to handle the large, complex queries scientists used to gather research data. The new solution had to accommodate massive data sets and growth so the company wouldn’t need to transition to another database in the future. 

More importantly, UCB’s new early stage discovery data platform needed to be built around the FAIR principle: Findable, Accessible, Interoperable, and Reusable. The chosen database technology would need to fit this approach and easily connect with the rest of UCB’s data stack. 

Platform choice

Up to 800 scientists worldwide rely on UCB’s data platform every day to power their essential research. “We chose SingleStore because it offered a modern database that can accelerate time to insight with ultra-fast ingestion, super-low latency, high-performance queries, and high concurrency,” said Frédéric Vanclef, Senior IT Expert, UCB. SingleStore offers parallel, high-scale streaming data ingest that can handle trillions of events per second for immediate availability and concurrency for tens of thousands of users, which supports UCB’s current needs and positions it for future growth.


With SingleStore, UCB now gives scientists the data they need in real time, helping them get answers faster to drive their research:

  • Empowering scientists with a high-performance self-service solution 
    UCB scientists can now collect and prepare their data themselves, getting exactly what they need for their experiments. The SingleStore-powered data backbone gathers all of the research and referential data into a single source of truth, allowing users to select from a wide range of high-quality data. Now, scientists can ask more questions during early stage drug discovery. “Thanks to SingleStore, we can do more and do it faster, which is invaluable to our research,” said Vanclef.
  • Elevating the role of IT from tactical to highly strategic 
    UCB has a dedicated IT support team to assist its scientists. Before the rollout of the early stage discovery data platform, this team was responsible for handling data requests from scientists. Unfortunately, this scenario meant scientists experienced long delays between when they asked for the data and actually received it. With the new self-service solution, the IT team was able to transform its mission and actions from tactical to highly strategic.
  • Accelerating drug discovery
    UCB’s new data backbone delivers massive improvements in query speed and latency. Instead of being forced to wait up to 20 minutes for query results, scientists now have access to real-time data and query results in 20 seconds, reducing query latency by more than 30x. They can check on available data sets in real time, eliminating delays between data publication and availability in the data mart. The platform’s batch data refresh run times dropped by 48x, from 4 hours to 5 minutes.
  • Providing scientists with analytical flexibility
    Each data type has its own optimal analytics approach, requiring different tools to handle the wide range of information that UCB scientists work with. SingleStore supports multiple popular analytics solutions so that scientists can work with their preferred applications for a particular experiment.

Source: SingleStore UCB case study. The full document can be downloaded here.

Also in need of a data acceleration solution to boost your data?

Would you like to find out how Datalumen, as a SingleStore partner, can also help you? Contact us and start the data conversation.