CU Employee

Data Warehouse vs. Data Lake

Good morning CULytics Community,

Has anyone recently had to choose between these two data aggregation architectures? I would love to hear the community's opinions, for or against either solution.

Thanks in advance for your time,

Jae Lee

Replies

  • Vendor

    Hello,

    We at Nihilent are a large Data and AI partner to many organisations in the banking, financial services, and credit union space. We will be happy to act as a technology and business partner to help you and your organisation make an informed decision on implementing a data warehouse and/or a data lake.

  • Community Chair

    Jae,

    Thanks for your question. I am personally curious about this, as there are good reasons to choose each, and each technology has its complications. I would, however, speculate that the answer is both, as Dale indicated in his response about the spectrum of data.

    Here’s where I sit:

    - If you are only deploying solutions on premises at your institution, then I think the answer will more than likely be a standard data warehouse with an operational data store that mirrors the relational database technology. My opinion is that unless you are willing to spend on the resources (both hardware and system administrators), it is difficult to deploy data lake technology on premises.

    - If you are able to go to the cloud, then the answer will likely be a hybrid model. The operational data store is replaced with a data lake, where you can just dump a bunch of data. This is the primary strength of the data lake: the ability to store data in a way that doesn't require the upfront source-to-target work that a traditional ETL process into a data warehouse requires. Once the data has been moved into the data lake, you move it to a dimensionally modeled data warehouse.
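
    The hybrid flow in the second bullet can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not a production design: a local directory stands in for the data lake, an in-memory SQLite database stands in for the warehouse, and the names `land_raw`, `load_warehouse`, `dim_member`, `fact_transaction`, and the sample records are all hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def land_raw(lake_dir: Path, source: str, records: list[dict]) -> Path:
    """Dump records into the 'lake' as JSON lines, with no schema mapping."""
    target = lake_dir / source
    target.mkdir(parents=True, exist_ok=True)
    path = target / "batch_0001.jsonl"  # hypothetical file naming scheme
    with path.open("w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")
    return path


def load_warehouse(conn: sqlite3.Connection, raw_path: Path) -> None:
    """Do the source-to-target work only now, on the way into the warehouse."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_member ("
        "member_key INTEGER PRIMARY KEY, member_id TEXT UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_transaction ("
        "member_key INTEGER REFERENCES dim_member(member_key), amount REAL)"
    )
    with raw_path.open(encoding="utf-8") as fh:
        for line in fh:
            rec = json.loads(line)
            # Add the member to the dimension table if not already present.
            conn.execute(
                "INSERT OR IGNORE INTO dim_member (member_id) VALUES (?)",
                (rec["member_id"],),
            )
            (member_key,) = conn.execute(
                "SELECT member_key FROM dim_member WHERE member_id = ?",
                (rec["member_id"],),
            ).fetchone()
            # Only the modeled columns are loaded; extra raw fields stay in the lake.
            conn.execute(
                "INSERT INTO fact_transaction (member_key, amount) VALUES (?, ?)",
                (member_key, float(rec["amount"])),
            )
    conn.commit()


def run_demo() -> list[tuple[str, float]]:
    """Land three made-up records, load them, and total the amounts per member."""
    raw = [
        {"member_id": "M1", "amount": "25.00", "channel": "mobile"},
        {"member_id": "M2", "amount": "10.50"},
        {"member_id": "M1", "amount": "4.50", "note": "unmodeled field"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = land_raw(Path(tmp), "core_transactions", raw)
        conn = sqlite3.connect(":memory:")
        load_warehouse(conn, path)
        rows = conn.execute(
            "SELECT d.member_id, SUM(f.amount) FROM fact_transaction f "
            "JOIN dim_member d USING (member_key) "
            "GROUP BY d.member_id ORDER BY d.member_id"
        ).fetchall()
        conn.close()
    return rows
```

    Calling `run_demo()` lands the records untouched and then loads only `member_id` and `amount` into the dimensional tables. The point of the sketch is the ordering: the lake accepts records of any shape, and the modeling effort is deferred until a dataset has earned a place in the warehouse.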

    There are certainly good reasons to deviate from the above, and obviously this is not a comprehensive list. But like most things in the BI / Analytics world today, the answer is not one or the other. It's both: using the technology to best solve for immediate business opportunities and long-term data strategy.

    Curious about your thought process and what you are leaning towards as well!

    Mike Lindberg

  • CU Employee Community Chair

    Thanks for your post, Jae.

    I'll speak only for myself, but I'm increasingly of the mind that we should think of managing data along a spectrum. Some data consumers within our credit unions will be best served with the neatly curated fact and dimension tables of a data warehouse. The cost and effort of all the ETL that goes into staging the data for that context is easily proven worthwhile, given the use cases. For our CU, it's financial data, operational efficiency reporting, and line-of-business performance reporting that warrant investment in that level of processing. Business users can use non-technical tools to answer essential business questions with very tidy data.

    But with other data, full ETL is more difficult to justify. It's used largely for ad hoc (often very speculative) analyses. Or the credit union intends to assess its value but, until that assessment is done, it's difficult to say whether the data source drives sufficient insights to merit integration into a structured warehouse. Or, the nature of the data is such that it defies logical integration into rigid data schemas, and benefits from looser associations with other data. Maybe we're talking about streams of real-time data that lose their value if they wait for overnight batch processes that move data into a warehouse. If any of these are the case, better to dump the data into a data lake with minimal curation, stockpile its history, and make it accessible to programmers who can leverage it -- rather than leave it altogether fallow (even at a risk of loss over time) in operational stores.
    To the extent that the credit union has the talent to wrench occasional or limited value from a data lake, the data can reside there in its rawer form indefinitely (or for as long as it is useful).

    It seems likely that many credit unions will be able to make the case for both structures, given the spectrum of use cases and the harsh reality that we don't have bottomless data budgets to ETL all the data sources we have (or can imagine having). Does it make sense for 20% of your data assets to be situated in a warehouse, and 80% in a lake? Or are your needs such that the percentages are better reversed? I find it more useful to think in those terms -- what needs to be in a warehouse and what is sufficiently kept in a lake -- as I evaluate where data investments are best made.
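
    One rough way to make that warehouse-versus-lake triage concrete is sketched below. The rule is an assumption for illustration only, not a standard method: a source goes to the warehouse when its reporting value is already proven, it fits a rigid schema, and it does not need real-time handling; everything else lands in the lake. The class, function names, and sample sources are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    proven_reporting_use: bool  # value already demonstrated for routine reporting
    fits_rigid_schema: bool     # integrates cleanly into fact and dimension tables
    needs_real_time: bool       # loses value waiting for overnight batch loads


def placement(src: DataSource) -> str:
    """Assumed rule of thumb: full ETL only where it is clearly justified."""
    if src.proven_reporting_use and src.fits_rigid_schema and not src.needs_real_time:
        return "warehouse"
    return "lake"


def split_shares(sources: list[DataSource]) -> dict[str, float]:
    """Return the fraction of sources placed in each structure."""
    if not sources:
        raise ValueError("at least one data source is required")
    counts = {"warehouse": 0, "lake": 0}
    for src in sources:
        counts[placement(src)] += 1
    return {where: n / len(sources) for where, n in counts.items()}


def example_sources() -> list[DataSource]:
    """Made-up inventory; only the first entry meets all three warehouse criteria."""
    return [
        DataSource("financial_reporting", True, True, False),
        DataSource("speculative_ad_hoc_extract", False, True, False),
        DataSource("unassessed_third_party_feed", False, False, False),
        DataSource("loosely_structured_notes", True, False, False),
        DataSource("real_time_event_stream", True, True, True),
    ]
```

    With this made-up inventory, `split_shares(example_sources())` puts one source in five in the warehouse and the rest in the lake. The criteria and any weighting between them would need to reflect each credit union's own use cases and budget.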

    There are very likely technical concerns to consider also, and I certainly urge others who have experience with both structures to chime in with those more practical considerations. But my own thinking starts with this kind of business perspective. Good luck with your assessment!

    Dale Davaz
    STCU R&D Strategist
    CULytics Community Chair
