Data Warehouse vs. Data Lake

Jae,

Thanks for your question. I'm personally curious about this too, as there are many reasons to choose each, and each technology has its complications. I would, however, speculate that the answer is both, as Dale indicated in his response about the spectrum of data.

Here’s where I sit:

- If you are only deploying solutions on premises at your institution, then I think the answer will more than likely be a standard data warehouse with an operational data store that mirrors the relational database technology. My opinion is that unless you are willing to spend on the resources (both hardware and system administrators), it is difficult to deploy data lake technology on premises.

- If you are able to go to the cloud, then the answer will likely be a hybrid model. The operational data store is replaced with a data lake, where you can just dump a bunch of data. This is the primary strength of the data lake: the ability to store data in a way that doesn't require the upfront source-to-target work that a traditional ETL process into a data warehouse requires. Once the data has been moved into the data lake, you move it to a dimensionally modeled data warehouse.
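To make the hybrid flow concrete, here is a minimal sketch rather than a reference implementation. It assumes a local folder stands in for the cloud data lake and an in-memory SQLite database stands in for the warehouse. The names `land_raw`, `load_warehouse`, `dim_member`, `fact_transaction`, and the sample fields are all hypothetical, made up for illustration.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def land_raw(lake_dir: Path, source: str, records: list[dict]) -> Path:
    """Write records to the 'lake' as-is (JSON lines); no target schema is needed."""
    path = lake_dir / f"{source}.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return path


def load_warehouse(conn: sqlite3.Connection, raw_file: Path) -> None:
    """Apply the source-to-target mapping only now, when loading the star schema."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS dim_member (
            member_key INTEGER PRIMARY KEY,
            member_id  TEXT UNIQUE
        );
        CREATE TABLE IF NOT EXISTS fact_transaction (
            member_key INTEGER REFERENCES dim_member(member_key),
            amount     REAL
        );
        """
    )
    with raw_file.open(encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            # Add the member to the dimension if it is not there yet.
            conn.execute(
                "INSERT OR IGNORE INTO dim_member (member_id) VALUES (?)",
                (rec["member_id"],),
            )
            (key,) = conn.execute(
                "SELECT member_key FROM dim_member WHERE member_id = ?",
                (rec["member_id"],),
            ).fetchone()
            # Fields the model does not use (channel, note) simply stay in the lake.
            conn.execute(
                "INSERT INTO fact_transaction (member_key, amount) VALUES (?, ?)",
                (key, float(rec["amount"])),
            )
    conn.commit()


def run_demo() -> list[tuple[str, float]]:
    # Hypothetical source records with inconsistent shapes and types.
    records = [
        {"member_id": "M1", "amount": "25.00", "channel": "mobile"},
        {"member_id": "M2", "amount": 10, "note": "extra fields are fine in the lake"},
        {"member_id": "M1", "amount": 5.5},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        raw = land_raw(Path(tmp), "core_transactions", records)
        conn = sqlite3.connect(":memory:")
        load_warehouse(conn, raw)
        rows = conn.execute(
            """
            SELECT d.member_id, SUM(f.amount)
            FROM fact_transaction f JOIN dim_member d USING (member_key)
            GROUP BY d.member_id ORDER BY d.member_id
            """
        ).fetchall()
        conn.close()
    return rows


if __name__ == "__main__":
    print(run_demo())
```

The point of the split is that `land_raw` never needs to know the warehouse model, so new sources can land right away, while the mapping work is deferred to `load_warehouse`.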

There are certainly good reasons to deviate from the above, and obviously this is not a comprehensive list... but like most things in the BI / Analytics world today, the answer is not one or the other - it's both - using the technology to best solve for immediate business opportunities and long-term data strategy.

Curious about your thought process and what you are leaning towards as well!

Mike Lindberg

Replies

Vendor

Rakesh R Naidu October 12, 2018 at 1:36pm

Hello,

We at Nihilent are a large Data and AI partner for many organisations in the Banking, Financial and Credit Union space. We will be happy to be a technology and business partner to help you and your organisation make an informed decision on implementing a Data Warehouse and/or Data Lake.
CU Employee Community Chair

Michael Lindberg September 13, 2018 at 6:45pm

