A New Strategy for Big Data Analytics: Data Blending
A plethora of technological developments has made it easier to gather information from many new kinds of data sources and machines, including websites, applications, servers, networks, sensors, mobile devices and social networks. Companies therefore have no choice but to introduce strategies and projects that help them manage this dramatic increase in data volume or, better still, profit from it.
Few doubt that big data presents both challenges and opportunities for all modern businesses. What may be less apparent is that the real value of big data lies not in the data itself, but in the new insight that can emerge when data from a huge array of new and established sources is combined.
New techniques such as data blending offer huge potential, but it can be tough to adapt traditional information structures, which have enterprise data warehouses (EDWs) at their center, to this brave new world.
Why? Because it simply doesn’t make sense to move big data into an EDW and then analyze it in the same way that we would analyze relational data. The structural variety and volume of big data make this impractical and far too time-consuming, which rules out gaining insights in anything like near real time.
As a result, we are rapidly moving into an era of distributed data architectures, where data remains housed in the type of store best suited to its volume and variety. In this distributed approach, the traditional EDW, which hosts data from enterprise applications such as CRM and ERP systems, sits alongside newer, more flexible and agile big data infrastructures such as Hadoop and NoSQL.
As a consequence, the information architectures that manage the flow of data through to analytics are being built differently than before. Agile analytics requires data to be accessible where it resides, ‘at the source’, so that analysis is based on the most up-to-date information possible.
When it comes to working with relational data (and even more so with big data), speed, quality and integrity all need to be guaranteed. Otherwise, flawed data stays flawed, and the resulting analytics can land businesses in perilous situations. It therefore makes sense to execute data blending at the stage where data integration takes place, rather than during analysis by the end user “at the glass”.
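To make the idea concrete, here is a minimal Python sketch of blending at the integration stage. It is an illustration only, not any vendor's implementation: the `customers` table, the event fields and the functions `load_customers` and `blend` are all hypothetical, an in-memory SQLite database stands in for the EDW, and a list of dictionaries stands in for documents from a NoSQL store. The point it tries to show is that integrity checks run once, where the data is joined, so events that cannot be matched are reported rather than silently distorting the result.

```python
import sqlite3
from collections import defaultdict


def load_customers(conn):
    """Read the relational side (hypothetical CRM table in an EDW stand-in)."""
    rows = conn.execute("SELECT id, name, segment FROM customers").fetchall()
    return {row[0]: {"name": row[1], "segment": row[2]} for row in rows}


def blend(customers, events):
    """Join customers with semi-structured events at integration time.

    Returns (blended, rejected). blended maps each customer id to its
    attributes plus an event count and total value; rejected collects
    events that reference no known customer.
    """
    totals = defaultdict(lambda: {"events": 0, "value": 0.0})
    rejected = []
    for event in events:
        cid = event.get("customer_id")
        if cid not in customers:
            # Integrity check: keep the bad record visible for governance.
            rejected.append(event)
            continue
        totals[cid]["events"] += 1
        totals[cid]["value"] += float(event.get("value", 0))
    blended = {cid: {**info, **totals[cid]} for cid, info in customers.items()}
    return blended, rejected


# Hypothetical sample data: two CRM customers and four raw events.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, segment TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Acme", "enterprise"), (2, "Globex", "smb")],
)
events = [
    {"customer_id": 1, "type": "click", "value": 2.5},
    {"customer_id": 1, "type": "purchase", "value": 40},
    {"customer_id": 3, "type": "click"},  # unknown customer
    {"type": "click", "value": 1},        # missing customer_id
]

blended, rejected = blend(load_customers(conn), events)
for cid, record in blended.items():
    print(cid, record)
print("rejected events:", len(rejected))
```

In this sketch the analyst only ever sees the `blended` records, while the `rejected` list goes back to whoever owns data quality. A blend done at the glass would typically have no such checkpoint.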
Permitting end users or analysts to do their own data blends at the glass comes with three significant disadvantages. First, the data is not captured at the source, so it is dated by definition and therefore unsuitable for decisions that involve reacting to critical events as they arise. Second, end users don’t usually understand the underlying data semantics, so their blends can undermine data governance and security. This leads to the third disadvantage: inaccurate or even completely incorrect results that can have disastrous consequences for the business.
With this in mind, organizations that have high standards for data governance and a need for real-time information should consider a big data analytics solution that enables data blending at the source.
Readers attending CeBIT have the opportunity to explore the new world of hybrid data infrastructures and learn more about data blending at the Big Data Pavilion in Hall 6/C30. Business analytics vendor Pentaho will, for example, demo how to blend data from traditional enterprise applications and big data sources, like MongoDB, which will also be at the pavilion.