What you’ll get by dipping your toe in a fresh, clean data lake
February 6, 2018
By Massimo Capoccia, Infor SVP of Development
“Data lake” is a term you’ve likely encountered recently. The idea is simple: a data lake absorbs large volumes of varied, often unstructured data and stores it safely until it can be sorted out later to yield useful insights. The data lake model lets organizations draw powerful insights and understanding from loosely structured pools of data that were once too murky to be of much value. This is especially helpful in the enterprise, where such pools are often too numerous to track.
Today’s world is awash in tweets, texts, IoT data, orders, invoices, inquiries, compliments, complaints, and a thousand other data types that might be useful later. But it doesn’t make sense to build an expensive, carefully organized data warehouse for all that information without knowing how it will be used. For many businesses, it can be far more economical to establish a single location for storing all business information, where everything is accessible, searchable, and available for analysis, even if none of it has been sorted. That location is your data lake.
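To make the “store now, sort out later” idea concrete, here is a minimal sketch of the general pattern often called schema-on-read, assuming a toy lake held as raw JSON lines in memory. The record contents and the `read_with_schema` helper are hypothetical illustrations, not part of Infor OS or any Infor API: records of different shapes are stored as-is, and structure is imposed only when a specific question is asked.

```python
import json
from collections import Counter

# Hypothetical toy "lake": heterogeneous raw records stored as-is,
# one JSON document per line, with no schema imposed at write time.
RAW_LAKE = [
    '{"type": "tweet", "text": "Love the new release!", "user": "@ana"}',
    '{"type": "invoice", "id": "INV-1001", "amount": 250.0, "customer": "Acme"}',
    '{"type": "iot", "sensor": "press-7", "temp_c": 81.5}',
    '{"type": "invoice", "id": "INV-1002", "amount": 99.5, "customer": "Globex"}',
    '{"type": "complaint", "text": "Late delivery", "customer": "Acme"}',
]


def read_with_schema(lake, record_type, fields):
    """Hypothetical helper: apply a schema at read time by parsing each raw
    record, keeping only the requested type, and projecting the requested
    fields (a missing field becomes None)."""
    for line in lake:
        record = json.loads(line)
        if record.get("type") == record_type:
            yield {field: record.get(field) for field in fields}


# Question 1: how much have we billed? Structure is applied only now.
invoices = list(read_with_schema(RAW_LAKE, "invoice", ["customer", "amount"]))
total_billed = sum(row["amount"] for row in invoices)

# Question 2: what kinds of data have we collected? Same raw store, new view.
types_seen = Counter(json.loads(line)["type"] for line in RAW_LAKE)

print(total_billed)
print(types_seen.most_common(1))
```

The design choice here is that nothing is discarded or reshaped on the way in; each new question brings its own projection, which is what lets one unsorted store serve use cases nobody anticipated when the data arrived.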
Some skeptics doubt the value of data lakes, pointing out that most business users lack the data manipulation and analytical skills they’d need to make effective use of a large body of unstructured information. And the skeptics have a point: turning unstructured data into value requires effective processes and technology.
Where the skeptics err, however, is in assuming that simple, low-cost data lakes are meant to replace complex, expensive data warehouses. At this point in its evolution, data lake technology offers a cost-effective way to store a wide variety of data types for reuse across different use cases. And yes, it’s overkill to use a data lake exclusively for analytics. But when a data lake serves a wide assortment of use cases, the economics become attractive and the lake becomes strategically valuable.
The Infor Data Lake (part of Infor OS) adds value by collecting all your CloudSuite data, IoT data, documents, third-party application data, and more in a single place. It includes intelligent data ingestion, metadata management (metagraph), and key elements of big data architecture. It also provides an assortment of interfaces (APIs, SQL, Elasticsearch, etc.) for accessing and consuming that data, depending on how you need to use it. That gives you more ways to solve business problems using the data you already have.
We’re not pretending that a data lake will solve all your business problems. By their nature, first-generation data lakes are exploratory, a way of discovering unexpected relationships within your data to deliver better insight. But until you have all your data available in one place, it’s impossible to identify and evaluate many important insights.
We’re building a suite of self-service interfaces to help you make the best use of the data you’ve stored. For example, through the Infor Coleman AI PaaS, we will include a way for developers to select and analyze data sets using machine learning. They can then expose this logic through APIs or through events that can be interrogated by CloudSuite applications. That’s just one of the methods Infor developers are designing to structure information and help you find the answers you need.
In addition, our new Infor OS technology includes data push capabilities for establishing a clean, useful, near real-time data lake that you can analyze with the help of our Birst BI platform. Then, with our new Approva OS in combination with Infor Coleman, you can perform real-time risk analysis and fraud prevention, and spot emerging trends quickly enough to take effective action. If you start thinking about stocking your data lake now, you’ll get much better results from it in the future.