
Introducing “Big Data”

Posted on May 23, 2011 in Technology Trends

On Friday, May 13, The New York Times published an article on “big data” based on a new report from the McKinsey Global Institute (MGI), the research unit of the well-known management consulting firm. As you might surmise, big data refers to enormous data sets. Examples range from video tracking of the movements of every shopper in a large suburban mall to deep dives into digitized patient medical records.

The report, Big Data: The Next Frontier for Innovation, Competition and Productivity, opens with some attention-grabbing stats. Here are some examples:

  • 30 billion pieces of content are shared on Facebook every month.
  • The amount of data generated worldwide is expected to rise 40% annually—or eight times faster than IT spending.
  • The U.S. Library of Congress had amassed 235 terabytes of data by April 2011.
  • Global businesses stored more than 7 exabytes of new data last year, while the planet’s consumers stored more than 6 exabytes. A single exabyte is equal to more than 4,000 times the information stored in the Library of Congress.
  • Each second of high-definition video requires more than 2,000 times the storage capacity of one page of text.
  • The rapid increase in big data will result in a severe talent shortage. In the U.S. alone, big data users will need an additional 140,000 to 190,000 workers with deep analytical skills and 1.5 million managers who understand how to leverage the information.
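
A couple of the figures above can be sanity-checked with simple arithmetic. The short Python sketch below assumes decimal (SI) storage units, which the report summary does not spell out, and derives the IT spending growth rate implied by the "eight times faster" comparison; that rate is my own back-of-the-envelope inference, not a number quoted above.

```python
# Assumption: decimal (SI) units, i.e. 1 terabyte = 10**12 bytes, 1 exabyte = 10**18 bytes.
TERABYTE = 10**12
EXABYTE = 10**18

# Library of Congress figure quoted in the list above.
library_of_congress_bytes = 235 * TERABYTE

# How many Library of Congress collections fit in a single exabyte?
ratio = EXABYTE / library_of_congress_bytes
print(f"1 exabyte is about {ratio:,.0f} Library of Congress collections")

# If data grows 40% a year and that is eight times faster than IT spending,
# the implied IT spending growth is 40% / 8.
data_growth = 0.40
implied_it_growth = data_growth / 8
print(f"Implied IT spending growth: {implied_it_growth:.0%} per year")
```

Under these assumptions the ratio comes out a little above 4,000, which is consistent with the "more than 4,000 times" claim.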

More data coming from the “Internet of Things”

The report authors also note that there are already more than 30 million networked sensor nodes installed across the automotive, industrial, transportation, retail, and utilities industries. This base is expected to grow 30% annually.

You may not realize it, but cell phone manufacturers are adding more sensors to your handheld device. In the future, your phone will be able to provide your location and monitor your health, too. At the end of last year, there were more than 4 billion cell phone users, or about 60% of the global population.

These sensors are examples of the objects that make up the “Internet of Things.” Some believe that this “Internet of objects” could extend to 50 to 100 trillion objects, since every person is surrounded by 1,000 to 5,000 objects. As we move closer and closer to the futuristic world presented in Minority Report, you can start to see both the opportunities and the challenges.

Early big data applications are already available

MIT’s Technology Review is one of my favorite magazines and daily news feeds. While TR hasn’t used the big data label to date, here are some examples that have appeared over the last six weeks.

  • San Francisco-based Quid’s tagline is “Mapping the World’s Technologies.” The company has built a “technology genome” based on an analysis of 35,000 companies. The data set includes published information on successes and failures, patent filings, government grants, employment ads, and social networking posts. The goal is to help ferret out the next new opportunities.
  • Canadian researchers have developed algorithms that allow hospitals to identify newborns in the neonatal ICU who may be at risk of developing an infection. Among the input factors are the onset of sleep apnea and changes in body temperature.
  • MIT spinoff Bluefin Labs tracks the level of engagement in television programs and commercials. Last month, the company collected more than three billion posts on Twitter and Facebook. That data set yielded 4.5 million unique authors and connected 13.7 million posts to specific programs. The data can be used to help advertisers determine the best program/channel/timeslot in which to air their commercials, and can help network executives learn more about viewers’ likes and dislikes.
  • CalmSea provides information gleaned from Facebook to retailers to help them offer better promotions. The company provides a dashboard that segments customers by their behaviors, likes, and social networks. The client can select from a palette of promotions to offer their customers. The promotions are then offered via Facebook or Twitter. CalmSea tracks the effect of the promotion on the specific customer, as well as their friend network. All of this can be compared to other retailer systems, such as loyalty programs and product returns.

There can be a lot of personal information in the big data pile, too. A recent Technology Review piece introduced readers to I Can Stalk You, a website set up by two researchers to warn consumers that they are unwittingly providing too much information about themselves. At a recent conference, the founders showed the audience how much a cell phone can reveal about its owner.

They started with cell phone pictures posted on an anonymous Twitter account. Since each snapshot was encoded with location metadata, they were able to use a variety of sources to find the person’s home address, name, place of work, wife’s name, and information about his kids. One can imagine how this could be used by burglars.
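
To make the location metadata point concrete: assuming the snapshots carried standard EXIF GPS tags (the article only says "location metadata"), each coordinate is stored as degrees, minutes, and seconds plus a hemisphere letter. The Python sketch below shows how little work it takes to turn such a value into a map-ready decimal coordinate. The function name and the sample coordinates are hypothetical and do not come from the researchers' demonstration.

```python
from fractions import Fraction


def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert a degrees/minutes/seconds coordinate to signed decimal degrees."""
    value = Fraction(degrees) + Fraction(minutes) / 60 + Fraction(seconds) / 3600
    # By convention, south latitudes and west longitudes are negative.
    if ref in ("S", "W"):
        value = -value
    return float(value)


# Hypothetical sample values, not taken from any real photo.
latitude = dms_to_decimal(37, 46, Fraction(2999, 100), "N")
longitude = dms_to_decimal(122, 25, Fraction(994, 100), "W")
print(round(latitude, 6), round(longitude, 6))
```

Once a photo yields a decimal latitude and longitude, pasting the pair into any mapping service is enough to get a street-level location.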

The data sets only get larger. On a typical day, I Can Stalk You downloads 15 gigabytes of pictures, scans more than 35,000 tweets, and analyzes 20,000+ photos.

Next week:  Big data and emerging technologies

If the volume of data continues to grow at 40% per year, this will be great news for vendors offering storage systems, networking gear, and data mining/data management software. 
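
To get a feel for what 40% annual growth means, here is a rough Python sketch of the compounding. It assumes the rate holds steady every year, which is a simplification of my own; the helper name is hypothetical.

```python
import math

ANNUAL_GROWTH = 0.40  # growth rate quoted above, assumed constant


def volume_multiple(years, rate=ANNUAL_GROWTH):
    """Return how many times larger the data volume is after `years` of compounding."""
    return (1 + rate) ** years


# Time for the volume to double at a constant compound rate.
doubling_time = math.log(2) / math.log(1 + ANNUAL_GROWTH)
print(f"Doubling time: {doubling_time:.2f} years")

for years in (1, 3, 5, 10):
    print(f"After {years:>2} years: {volume_multiple(years):.1f}x today's volume")
```

At that pace the volume doubles roughly every two years and grows more than fivefold in five years.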

What are some of the other opportunities? As frequent readers of The View from Inside know, I’m fixated on data visualization and the concept of “wall apps.” My fellow Inforians and I are also intrigued by some of the new work being done in database technology. Please look for more on that next week.

In the meantime, I welcome your feedback and ideas. What do you think of big data? Is it an opportunity for your company to leap ahead of your competition, or just a new name for the same old data management challenges?



Comments


Kitipan

Definitely Bruce, more operational BI opportunities will occur from this big data.

But before that, transferring the operational data into management information will play its important role.
The intermediate process of selecting, cleansing, and mapping data using time series and the correlations among them will be a hurdle. Operational data deals only with what matters to it, not its surroundings; hence, the only correlation possible for matching different sets of big data is time.

Real-time data capture tools, e.g. OSIsoft's PI and AspenTech's InfoPlus.21, will therefore be more widely used. That's where the $$$ will be for the next couple of years.

Peter

We're only now starting to look at the data amassed by our Workbrain installation, years after it went in. While nowhere near the scale that qualifies as 'big data' to Facebook or Google, it's still a sizable chunk, that requires special techniques and skills to examine. I'm curious what you have coming.

Tony Heringer

Coherence is key. I like that it's called "big data" and not "information glut or overload". We have to make sense of this mass of bits and bytes, and while I might be able to single out an individual, what's the incentive -- legal incentive -- for doing so? What markers are going to yield the most consistent results? I agree, there will be an army of folks required to deal with this stuff, I'm not so sure about the analyst/manager ratios though :-)

Thanks again for the weekly treat!
