FISHING FOR BETTER BIG DATA INSIGHTS WITH AN INTELLIGENT DATA LAKE

Fishing in a lake and fishing in a data lake are much the same.
Data scientists must not only go where the fish are for big data insights, but also find a way to quickly build the data pipeline that turns raw data into business results.

When fishing, it doesn't matter how good a fisherman you are: you're not going to catch anything if you're not fishing where the fish are. The same advice applies to data lakes.

Not even the best data scientists in the world can find insights in data lakes that are nothing but data swamps. Yet that is what most data analysts are working with today: swamps filled with databases, file systems, and Hadoop clusters containing vast amounts of siloed data, but no efficient way to find, prepare, and analyze that data. That is why you ideally want collaborative, self-service data preparation capabilities combined with governance and security controls.

With this in mind, Informatica launched Big Data Management, which included a Live Data Map component to collect, store, and manage the metadata of many types of big data and deliver universal metadata services to power intelligent data solutions such as Intelligent Data Lake and Secure@Source. Intelligent Data Lake leverages the universal metadata services of Live Data Map to provide semantic and faceted search and a 360-degree view of data assets, including end-to-end data lineage and relationships.
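
To make this a bit more concrete, here is a minimal, hypothetical sketch in plain Python (not Informatica's actual implementation or API) of a metadata catalog modelled as a graph of data assets and "derived from" relationships, with end-to-end lineage computed as a simple upstream traversal; all asset names are made up for illustration:

```python
# Hypothetical sketch of a metadata catalog with lineage, tags and
# upstream traversal. Asset names and structure are illustrative only.
from collections import defaultdict


class MetadataCatalog:
    def __init__(self):
        self.upstream = defaultdict(set)  # asset -> assets it was derived from
        self.tags = defaultdict(set)      # asset -> user annotations / tags

    def record_derivation(self, source, target):
        """Register that `target` was produced from `source`."""
        self.upstream[target].add(source)

    def tag(self, asset, label):
        self.tags[asset].add(label)

    def lineage(self, asset):
        """Walk the graph upstream to collect end-to-end lineage."""
        seen, stack = set(), [asset]
        while stack:
            for parent in self.upstream[stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


catalog = MetadataCatalog()
catalog.record_derivation("crm.customers", "staging.customers_clean")
catalog.record_derivation("web.clickstream", "staging.customers_clean")
catalog.record_derivation("staging.customers_clean", "lake.customer_360")
catalog.tag("lake.customer_360", "PII")

# Prints the full set of upstream sources feeding lake.customer_360.
print(catalog.lineage("lake.customer_360"))
```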



In addition to smart search and a 360-degree view of your data, Intelligent Data Lake provides analysts with a project workspace, schema-on-read data preparation tools, data profiling, automated data discovery, user annotation and tagging, and machine-learning-driven data set recommendations based on user behavior. These capabilities make it much easier for analysts to “fish where the fish are” for big data insights.

In order to “land the fish” and turn these insights into real business value, there needs to be a way to quickly build the data pipeline that turns raw data into business results. Intelligent Data Lake does this automatically by recording all the actions a data analyst takes while preparing data assets in what is called a “recipe.” These recipes then generate data pipelines (called mappings in Informatica) that IT can automatically deploy into production. What better way to turn insights into business value and fry up the fish you just caught?
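
As a rough, hypothetical illustration of the recipe idea (plain Python, not Informatica's mapping format; step names and sample data are made up), each preparation action can be recorded as a step and later replayed as a pipeline over production data:

```python
# Hypothetical sketch: record data-prep actions as a "recipe" and replay
# them as a pipeline. Step names and sample data are illustrative only.
recipe = []  # ordered list of recorded preparation actions


def record(step):
    recipe.append(step)


# The analyst prepares data interactively; each action is recorded.
record(("drop_nulls", {"column": "customer_id"}))
record(("uppercase", {"column": "country"}))
record(("filter_min", {"column": "amount", "min": 0}))


def run_pipeline(rows, steps):
    """Replay the recorded recipe against a full (production) data set."""
    for name, params in steps:
        col = params["column"]
        if name == "drop_nulls":
            rows = [r for r in rows if r.get(col) is not None]
        elif name == "uppercase":
            rows = [{**r, col: r[col].upper()} for r in rows]
        elif name == "filter_min":
            rows = [r for r in rows if r[col] >= params["min"]]
    return rows


raw = [
    {"customer_id": 1, "country": "be", "amount": 120},
    {"customer_id": None, "country": "nl", "amount": 80},
    {"customer_id": 2, "country": "fr", "amount": -5},
]
print(run_pipeline(raw, recipe))  # keeps only the first row, with country "BE"
```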

If you want to see how Intelligent Data Lake works in a live demo, please contact us or have a chat with us at the upcoming Big Data & Analytics 2017 event.

 

WATCH PROF. DR. PIETER BALLON – IMEC & BART VANDEREIJT – CARREFOUR AT BIG DATA & ANALYTICS 2017, 7/12/2017

HOW TO REALLY SHIFT FROM IT-DRIVEN TO SELF-SERVICE ANALYTICS WITH DATA VIRTUALIZATION? LOOK BEYOND THE SHOP WINDOW.

Business intelligence & analytics today have dramatically shifted from the traditional IT-driven model to a modern self-service approach. This is due to a number of changes: the balance of power has steadily shifted from IT to the business, and the business community now has access to more innovative technologies that give it powerful analytical and visualization capabilities (e.g. Tableau, …). This increased use and capability has put the business in the driver's seat of much front-end BI decision-making.

To help your business community continue to increase its self-service capabilities, there is one important but often-overlooked point: many implementations fail to realize their full potential because they fall into the trap of building out just the proverbial shop window and forgetting the actual shop! It is just as important to add accessibility and flexibility to the underlying data layer (and to ease the access, discovery, and governance of your data) as it is to give users a powerful front end for analytics and visualization.

With respect to self-service analytics, four phases can be identified in the market. These also typically mirror how analytics are implemented in many companies. The following diagram describes, in four phases, how data virtualization can strengthen and enrich the self-service data integration capabilities of tools for reporting and analytics:

[Diagram: the four phases of self-service analytics with data virtualization]

THE NEED FOR DATA PREPARATION AND DATA VIRTUALIZATION

To support both IT-driven and business-driven BI, two techniques are required: data preparation and data virtualization. There are many scenarios where you can use these techniques to strengthen and speed up the implementation of self-service analytics (a minimal sketch of the second scenario follows the list):

  • Using data virtualization to operationalize user‐defined data sets
  • Using data virtualization as a data source for data preparation
  • Using data virtualization to make data sets developed with data preparation available for all users 
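
As a minimal sketch of the second scenario, assuming the virtualization layer exposes its combined views through a standard SQL endpoint (the connection string, view and column names below are hypothetical), data preparation can simply treat the virtual view as its source:

```python
# Hypothetical sketch: using a data virtualization layer as the source for
# self-service data preparation. DSN, view and column names are assumptions.
import pandas as pd
from sqlalchemy import create_engine

# The virtualization layer presents many underlying systems as one SQL
# endpoint, so consumers don't need to know where the data physically lives.
engine = create_engine("postgresql://analyst@dv-server:5432/virtual_views")

# Read a governed, combined view instead of querying each silo directly.
customers = pd.read_sql("SELECT * FROM vw_customer_360", engine)

# Self-service preparation on top of the virtual view.
prepared = (
    customers
    .dropna(subset=["customer_id"])                        # drop incomplete records
    .assign(country=lambda df: df["country"].str.upper())  # standardize a column
    .query("lifetime_value > 0")                           # keep meaningful rows only
)

# Publishing the prepared set back through the virtualization layer makes it
# available to other users (the third scenario in the list above).
prepared.to_sql("customer_360_prepared", engine, if_exists="replace", index=False)
```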

To learn how to succeed in your data journey, feel free to contact us. More info about our full spectrum of data solutions is also available on the Datalumen website.

Read more in detail about the different scenarios in the ‘Strengthening Self-Service Analytics with Data Preparation and Data Virtualization’ whitepaper. The whitepaper also describes how these two BI forms can operate side by side in a cooperative fashion without lowering the level of self-service for business users; in other words, how the best of both worlds can be combined. It is written by Rick Van Der Lans, an independent analyst and expert.

THE NEED FOR TOTAL DATA MANAGEMENT IN BIG DATA

The buzz about “big data” has been around for a couple of years now. Have we witnessed incredible results? Yes. But maybe they aren't as impressive as we previously believed they would be. When it comes down to Big Data, we're actually talking about data integration, data governance and data security. The bottom line? Data needs to be properly managed, whatever its size and type of content. Hence, total data management approaches such as master data management are gaining momentum and are the way forward when it comes to tackling an enterprise's Big Data problem.

Download the Total Data Management in Big Data infographic (PDF).

Data Integration:
Your First Big Data Stepping Stone

In order to make Big Data work, you need to address data complexity in the context of the golden V's: Volume, Velocity and Variety. Accessing, ingesting, processing and deploying your data doesn't happen automatically, and traditional data approaches based on manual processes simply don't work. They typically fail because:

  • you need to be able to ingest data at any speed
  • you need to process data in a flexible (read: scalable and efficient) but also repeatable way
  • and last but not least, you need to be able to deliver data anywhere, which, with the dynamics of the ever-changing big data landscape in mind, is definitely a challenge

Data Governance:
Your Second Big Data Stepping Stone

A substantial number of people believe that Big Data is the holy grail and consider it a magical black-box solution. They believe that you can just dump whatever data you have into your Big Data environment and it will miraculously turn into useful information. Reality is somewhat different. In order to get value out of your initiative, you also need to actually govern your Big Data. You need to govern it in two ways:

Your Big Data environment is not a trash bin.

Key to success is being able to cleanse, enrich and standardize your Big Data. You need to prove the added value of your Big Data initiative, so don't forget your consumers and make sure you are able to generate and share trusted insights. According to Experian's 2015 Data Quality Benchmark Report, organizations suspect that 26% of their data is inaccurate. The reality is that with Big Data this percentage can be two to three times worse.

 

Your Big Data is not an island.

Governing your Big Data is one element, but to get value out of it you should be able to combine it with the rest of your data landscape. According to Gartner, through 2017, 90% of the information assets from big data analytic efforts will be siloed and unleverageable across multiple business processes. That's a pity, given that with Master Data Management techniques you can break down the Big Data walls and create that 360° view of your customer, product, asset or virtually any other data domain.

Data Protection:
Your Third Big Data Stepping Stone

With typical Big Data volumes and growth in mind, many organizations have limited or no visibility into the location and use of their sensitive data. However, new laws and regulations like the GDPR require a correct understanding of data risks based on a number of elements such as data location, proliferation, protection and usage. This obviously applies to traditional data, but it is definitely also needed for Big Data. Especially since a substantial number of organizations tend to use their Big Data environment as a black hole, the risk of also holding unknown sensitive Big Data is real.

How do you approach this?

Classify

Classify your sensitive data. In a nutshell: data inventory, topology, business process and data flow mapping, and operations mapping.

De-identify

De-identify your data so it can be used wherever you need it. Think about reporting and analysis environments, testing, and so on. For this purpose, masking and anonymization techniques and software can be used.
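
As a toy illustration of the de-identify step (not production-grade cryptography; the field names and key handling are assumptions, and real deployments would rely on dedicated masking or tokenization tooling), sensitive values can be masked for display and pseudonymized with a keyed hash so records stay joinable without exposing the originals:

```python
# Toy de-identification sketch: masking plus keyed-hash pseudonymization.
# Field names are made up; the key would normally come from a secrets vault.
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-managed-secret"


def mask(value, visible=4):
    """Masking: hide all but the last few characters, e.g. for reporting."""
    return "*" * (len(value) - visible) + value[-visible:]


def pseudonymize(value):
    """Keyed hash: a stable surrogate per value, so data sets remain joinable."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]


record = {"name": "Jane Doe", "iban": "BE71096123456769", "segment": "retail"}

deidentified = {
    "name": pseudonymize(record["name"]),  # surrogate usable in test/analytics
    "iban": mask(record["iban"]),          # "************6769" for display
    "segment": record["segment"],          # non-sensitive attribute kept as-is
}
print(deidentified)
```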

Protect

Once you know where your sensitive data is located, you can actually protect it through tokenization and encryption techniques. These techniques are required if you want to keep and use your sensitive data in its original format.



More info on Big Data Management?

Would you like to know what Big Data Management can also mean for your organization? Have a look at our Big Data Management section and contact us.


 

HOW DO TRADITIONAL BANKING COMPANIES COUNTER FINTECH WITH DATA GOVERNANCE?

The new digital environment and a tough regulatory climate force the financial industry to adapt its business model in order to meet the demands of investors, regulators and customers. Today we mainly want to address the aspects of customer experience that traditional bankers ought to consider copying, or even exceeding. Because it is actually customer experience that could be the traditional bank's biggest asset: traditional banks are a one-stop shop for a broad range of financial products and services. This can be both an advantage and a competitive weakness against FinTech startups. Many traditional banks are still organized into silos, with business lines for individual products and services that use separate information systems and do not communicate with one another.

To improve the customer experience, banks must be able to analyze customer information (data) and make that data useful for both the business and the customers. This is basically what FinTech does. However, FinTech companies first need to gather the data. Traditional banks with a good data governance program already have that data; they have an advantage and should leverage it.

To counter the extreme effectiveness and customer experience brought by new FinTech startups, some financial institutions are already upping their tech game. They are improving the user experience, providing more insightful data analysis and increasing cybersecurity.

While these are all important for banks, we believe getting “insightful data” is somewhat underestimated. There are no data insights without clean data, and there is no clean data without strong governance.

Data governance is all about processes that make sure data is formally managed throughout the entire enterprise. Data governance is the way to ensure that data is correct and trustworthy, and it also makes employees accountable for any harm the company suffers as a result of poor data quality.

The role of data governance in the bank of the future?

The bank of the future is tech- and data-driven. Today's digital capabilities turn the customer journey into a personalized experience. The bank of the future is predictive and proactive and understands the customers' needs. It's some sort of “Google Now for Banking”, suggesting actions proactively. The bank of the future is a bank for individuals; it's personalized in the range of services and products it offers to the individual, based on in-depth knowledge and understanding of the customer. By having up-to-date and correct data, you can truly serve customers. The “Bank of the Future” positions itself as ‘the bank that makes you the banker’. It thrives on interaction and a deep knowledge of its customers through data mining.

As the existing banking model is unbundled, everything about our financial services experience will change. In five to ten years, the industry will look fundamentally different. There will be a host of new providers and innovative new services. Some banks will take digital transformation seriously, others will buy their way into the future by taking over challengers, and some will lose out. Some segments will be almost universally controlled by non-banks; other segments will be better served within the structural advantages of a bank. Across the board, consumers will benefit as players compete on innovation and customer experience. This is only possible with solid multi-domain, cross-silo data management with a solid data governance program on top of it.

FIND OUT WHO’S LEADING THE 2017 MASTER DATA MANAGEMENT SOLUTIONS MARKET

Gartner recently released an updated version of the Magic Quadrant for Master Data Management Solutions report and positioned Informatica & Orchestra Networks as leaders in this MDM market segment.

[Graphic: Gartner Magic Quadrant for Master Data Management Solutions, January 2017]

Our key takeaways:

  • Specialized MDM vendors still lead the market and offer the biggest added value. Vendors delivering niche data solutions directly linked to, for example, specific business applications seem to lack focus and as a result don't match the functions & features of the leaders.
  • The multi-domain data trend continues. This is the first edition of the Gartner Magic Quadrant on Master Data Management in which Gartner consolidates the different data domains. It also reflects a trend we see in the market: customers need an MDM solution that covers all their domains, not only customers, products, assets, etc. The reality is that customers need an MDM solution they can use to address their complete data agenda.
  • Informatica still makes a difference compared to other vendors thanks to its end-to-end data capabilities, which are also key to a successful MDM implementation. Data Quality, Data Governance and Data Integration are key components of any MDM platform and are available as integrated functionality in the Informatica MDM solution.

 

Learn more. Download the full Gartner report here

This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document.

Gartner does not endorse any vendor, product, or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

Gartner, Magic Quadrant for Master Data Management Solutions, Bill O’Kane, Terilyn Palanca, Michael Patrick Moran, January 2017.


If you are preparing for or in the process of implementing an MDM solution and are interested in additional advice, get in touch with us; we are happy to help you out.

 

THE NEVER-ENDING WRESTLING GAME OF DATA SECURITY IN THE CLOUD

Despite the growing popularity and actual implementations of cloud applications, the majority of organizations today are not adjusting their governance to secure their cloud data. This is illustrated by the 2016 Global Cloud Data Security Study conducted by the Ponemon Institute.

3 KEY FINDINGS FROM “THE 2016 GLOBAL CLOUD DATA SECURITY STUDY”

  • Half of all cloud services and corporate data stored in the cloud are not controlled by IT departments
    On average, 47% of all data in the cloud is not managed by the IT department. You can argue about who should actually be in the driver's seat when it comes to flexibility, time to market, etc. However, involvement from your security staff is something else and should be a no-brainer. Shadow IT initiatives that go under the radar mean that your cloud data is typically the weakest link and generates the highest risk.
  • Only a third of sensitive data stored in cloud-based applications is encrypted
    72% of the respondents believe that protecting sensitive information through data encryption and data tokenization is important. In contrast, only 34% say their Software-as-a-Service (SaaS) data is actually encrypted or tokenized. Relying on the security features of a cloud platform provider is one thing, but it still doesn't guarantee that your sensitive data is really secure. The only way to get there is to use the proper encryption techniques, and best practice is to apply the same policies and technology across your complete data landscape (on-premise and cloud).
  • More than half of companies do not have a proactive approach to compliance with privacy and security regulations for data in cloud environments
    73% of about 3,500 participants indicated that cloud services and platforms are important, and 81% even confirmed that the importance of the cloud will grow in the next two years. Despite this trend, 54% say that their organization has no proactive data protection approach. With compliance regulations like the General Data Protection Regulation (GDPR) in mind, this seems a rather scary and risky thought.


THE REALITY GAP

The fact that companies are wrestling with protecting cloud data is partly caused by the idea that these platforms and data are managed by an external party. Companies should realize that their data governance agenda covers both their traditional on-premise data and their remote cloud data. The data reality is hybrid, and the idea of cloud platforms as disconnected islands is long gone. A uniform and consistent data protection approach covering all your data, regardless of location, is in essence what companies should target.
