
Big Data and Analytics Summit 2014

This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet.

Inhi Cho, the VP/GM of Big Data, Integration, & Governance at IBM, spoke on the topic of “Continuously Curate Information – Realize the Full Value of Data” at the IBM Big Data and Analytics Summit in June 2014.

Cho said the “hyper-changing world” necessitates a new way of thinking about business: “The competitors you have to get ahead of are ones you may not be thinking about.” This dynamic is the same across many industries. Once a client, consumer or business has had a particular experience, they will expect it in other areas. Your real competition is the “last best experience” the customer had.

Cho noted that instinct, rehearsal, practice and training drive the many actions we take throughout the day. IBM's portfolio expansion is based on capturing confidence and context quickly.

IBM is releasing Big SQL 3.0, which brings ANSI SQL to Hadoop. Big SQL is IBM's entrant in the SQL-on-Hadoop competition. Unlike the Hadoop connectors you may also have heard about, SQL-on-Hadoop does its processing directly on HDFS or HBase instead of first moving the data to a relational database.

Big SQL takes advantage of the parallelism in Hadoop (HDFS) and helps address the Hadoop skills gap, which is often a barrier for SMBs, or for less mature large companies that have not yet fully invested in a Hadoop skillset.
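To make the SQL-on-Hadoop idea concrete, here is a minimal sketch of the kind of ordinary ANSI SQL such engines accept. It uses Python's built-in sqlite3 purely as a stand-in (the table and column names are invented); the point is that with an engine like Big SQL, the same familiar SQL would execute directly over data in HDFS or HBase, with no extract step into a warehouse first.

```python
import sqlite3

# Stand-in for a Hadoop-backed table. With a SQL-on-Hadoop engine
# such as Big SQL, this same ANSI SQL would run in place over
# HDFS/HBase data. Table and column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clickstream (user_id TEXT, page TEXT, dwell_secs INTEGER)"
)
conn.executemany(
    "INSERT INTO clickstream VALUES (?, ?, ?)",
    [("u1", "home", 3), ("u1", "pricing", 41), ("u2", "home", 1)],
)

# A plain ANSI aggregate query -- the existing SQL skillset carries
# over, which is exactly the skills-gap point made above.
rows = conn.execute(
    "SELECT user_id, SUM(dwell_secs) AS total FROM clickstream "
    "GROUP BY user_id ORDER BY total DESC"
).fetchall()
print(rows)  # [('u1', 44), ('u2', 1)]
```

The query itself is the takeaway: analysts who already know SQL can be productive against Hadoop data without first learning MapReduce.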

Another new emphasis is incremental data. Historically, most decision-making was done in batch, and batch required a sufficient amount of data before predictions or rules could be made. With incremental data and the right infrastructure, however, a single data point can change all prior understanding of that data – something that historically only a large batch would yield. All data builds context. This, Cho said, is “knowing the entity.” It is similar to a concept I have been architecting for clients: an infrastructure that brings summarized historical data to bear on real-time data, while allowing the real-time data to also blend back into the historical data for the next interaction.

For example, a small bank needs to decide whether to allow a large withdrawal by a customer. The customer's unique characteristics, built up over many historical transactions, could be brought into the analysis in summary fashion, perhaps as a score or a withdrawal limit. As soon as the transaction completes, it flows into the customer analytics and the score is updated for the next withdrawal. A large withdrawal now might, for example, limit another one 10 minutes later.
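The withdrawal loop above can be sketched in a few lines of Python. Everything here is invented for illustration (the base limit, the window size, the scoring rule): a summarized historical profile feeds the real-time decision, and each completed transaction is folded straight back into the profile so it shapes the very next interaction.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class CustomerProfile:
    """Summarized history that informs real-time decisions."""
    base_limit: float  # limit derived from historical transactions
    recent: deque = field(default_factory=lambda: deque(maxlen=10))

    def allowed(self, amount: float) -> bool:
        # Real-time check: recent large withdrawals shrink the
        # effective headroom for the next few interactions.
        return amount <= self.base_limit - sum(self.recent)

    def record(self, amount: float) -> None:
        # Fold the completed transaction back into the profile,
        # updating the context before the next withdrawal.
        self.recent.append(amount)

profile = CustomerProfile(base_limit=5000.0)
print(profile.allowed(4000.0))  # True: full historical limit available
profile.record(4000.0)          # transaction completes, profile updated
print(profile.allowed(2000.0))  # False: only 1000 of headroom remains
```

A single new data point (the 4000 withdrawal) immediately changes the decision for the next one, which is the "one data point could change all prior understanding" idea in miniature.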



Within an organization, each worker should be able to quickly understand what data is available, what can be used and how to access it for business gain. After this “shopping” for data, movement needs to happen in real time. It’s not just traditional staging; it’s “information virtualization,” according to Cho.

This curation approach will vary based on the nature of the data, what action is desired and the timeliness of the action.

Cho then talked about the concept of the data lake, which she had recently discussed with leaders at a large financial services company. Cho said IBM thinks about it as multiple data stores, not just Hadoop, and they prefer “data reservoir” because reservoirs are managed and controlled whereas lakes are not.

By continuously curating information, IBM wants clients to be able to leverage all data, anywhere, unconstrained yet governed, with SQL-on-Hadoop as one path there. The concept is very relevant to SMB organizations looking to take advantage of big data.

This article was written by William McKnight and originally published at:
