FISHING FOR BETTER BIG DATA INSIGHTS WITH AN INTELLIGENT DATA LAKE

Fishing in a lake and a data lake are much the same.
Data scientists must not only go where the fish are for big data insights, but also find a way to quickly build the data pipeline that turns raw data into business results.

When fishing it doesn’t matter how good of a fisherman you are—you’re not going to catch anything if you’re not fishing where the fish are. This same bit of advice extends to data lakes. 

Not even the best data scientists in the world can find insights in data lakes that are nothing but data swamps. But that’s what most data analysts are using today—swamps filled with databases, file systems, and Hadoop clusters containing vast amounts of siloed data, but no efficient way to find, prepare, and analyze that data. That is why ideally you have collaborative self-service data preparation capabilities with governance and security controls.

With this in mind, Informatica launched Big Data Management, which included a Live Data Map component to collect, store, and manage the metadata of many types of big data and deliver universal metadata services to power intelligent data solutions, such as the Intelligent Data Lake and Secure@Source. Intelligent Data Lake leverages the universal metadata services of Live Data Map to provide semantic and faceted search and a 360-degree-view of data assets such as end-to-end data lineage and relationships.



In addition to smart search and a 360-degree-view of your data, Intelligent Data Lake provides analysts with a project workspace, schema-on-read data preparation tools, data profiling, automated data discovery, user annotation and tagging, and data set recommendations based on user behavior using machine learning. These capabilities make it much easier for analysts to “fish where the fish are” for big data insights.  

In order to “land the fish” and turn these insights into big value, there needs to be a way to quickly build the data pipeline that turns raw data into business results. Intelligent Data Lake does this automatically by recording all the actions of a data analyst as they prepare data assets in what is called a “recipe.” These recipes then generate data pipelines (called mappings in Informatica) that IT can automatically deploy into production. What better way to turn insights into business value and fry up those fish you just caught?

If you want to see how an Intelligent Data Lake works through a live demo, please contact us or have a chat with us at the upcoming Big Data & Analytics 2017 event.

 

THE NEED FOR TOTAL DATA MANAGEMENT IN BIG DATA

The buzz about “big data” is here for a couple of years now.  Have we witnessed incredible results? Yes. But maybe they aren’t as impressive as previously believed they would be. When it comes down to Big Data, we’re actually talking about data integration, data governance and data security. The bottom line? Data needs to be properly managed, whatever its size and type of content. Hence, total data management approaches as master data management are gaining momentum and are the way forward when it comes down to tackling an enterprise’s Big Data problem.

Download the Total Data Management in Big Data infographic (PDF).

Data Integration:
Your First Big Data Stepstone

In order to make Big Data work you need to address data complexity in the context of the golden V’s: Volume, Velocity and Variety. Accessing, ingesting, processing and deploying your data doesn’t automatically happen and traditional data approaches based on manual processes simply don’t work. The reason why these typically fails is you because:

  • you need to be able to ingest data at any speed
  • you need to process data in a flexible, read scalable and efficient, but also repetitive way
  • and last but not least you need to be able to deliver data anywhere and with the dynamics of the ever changing big data landscape in mind, this is definitely a challenge

Data Governance:
Your Second Big Data Stepstone

A substantial amount of people believe that Big Data is the golden grail and consider it as a magical black box solution. They believe that you can just get whatever data in your Big Data environment and it miraculously is going result into useful information. Reality is somehow different. In order to get value out of your initative, you also need to actually govern your Big Data. You need to govern it in two ways:

Your Big Data environment is not a trash bin.

Key for success is that you are able to cleanse, enrich and standardize your Big Data. You need to prove the added value of your Big Data initiative so don’t forget your consumers and make sure you are able to generate and share trusted insights. According to Experian’s 2015 Data Quality Benchmark Report, organizations suspect 26% of their data to be inaccurate. Reality is that with Big Data this % can be even be 2 to 3 times worse.

 

Your Big Data is not an island.

Governing your Big Data is one element but in order to get value out of it you should be able to combine it with the rest of your data landscape. According to Gartner, through 2017, 90% of the information assets from big data analytic efforts will be siloed and unleverageable across multiple business processes. That’s a pity given that using Master Data Management techniques you can break the Big Data walls down and create that 360° view on your customer, product, asset or virtually any other data domain.

Data Protection:
Your Third Big Data Stepstone

With the typical Big Data volumes but also growth in mind, many organizations have limited to no visibility into the location and use of their sensitive data. However new laws and regulations like GDPR do require a correct understanding of the data risks based on number of elements like data location, proliferation, protection and usage. This obviously applies to traditional data but is definitely also needed for Big Data. Especially if you know that a substantial amount of organizations tend to use their Big Data environment as a black hole, the risk of having also unknown sensitive Big Data is real.

How do you approach this:

Classify

Classify your sensitive data. In a nutshell, data inventory, topology, business process and data flow mapping and operations mapping.

De-identify

De-identifies your data so it can be used wherever you need it. Think about reporting and analysis environments, think about testing, etc. For this purpose masking and anonymization techniques and software can be used.

Protect

Once you know where your sensitive data is located you can actually protect it through tokenization and encryption techniques. These techniques are required if you want to keep and use your sensitive data in the original format.



More info on Big Data Management?

Would you like to know what
Big Data Management can also mean for your organization?
Have a look at our Big Data Management section 
and contact us.