What is a cloud data warehouse?

As organizations prioritize their digital transformation goals, two modernization trends, the hybrid cloud and the “cloud data warehouse,” have converged, presenting a real opportunity to move the needle in digitally “future-proofing” the enterprise. The adoption of hybrid cloud environments has enabled the development of cloud data warehouses, which in turn address the need for agility and adaptability in delivering strategic data to the business.

A significant number of cloud providers and data warehouse vendors have brought cloud data platforms to market, offering a more viable, scalable and integrated approach to deploying data warehouses, data lakes and the tooling needed to deliver advanced analytics from the data they manage. This is colloquially known as acquiring a “lakehouse in the cloud.” Combine this with newer, more capable and easily adaptable data warehousing architectures and methodologies, such as Data Vault, and organizations now feel they can significantly improve their return on data through a data warehouse modernization initiative.

What is a data warehouse?

A data warehouse is a centralized data repository that can be analyzed to support better decisions. Data is regularly replicated into the data warehouse from transactional systems, relational databases and other sources. Data warehouses have been a core feature of the data architecture of most large enterprises for many years. Beginning in the mid-1980s, organizations began designing and deploying purpose-built, specialty databases to capture and store large amounts of historical data in support of decision support systems (DSS), which enable organizations to take a more evidence-based approach to critical business decisions. Over time, vendors such as Teradata, Oracle and IBM began building data warehouse-specific DBMSs to better support the scale and architectures these aggregated data stores require. New design methodologies were also created to better enable the slicing and dicing that these DSS use cases require.

The role of DataOps

In organizations of all sizes and across industries, advanced analytics has become a top priority over the past decade. Research shows that the vast majority of companies recognize its value and have started to put internal analytics organizations in place, with an eye toward scaling use cases. However, most of those same companies have not been able to unlock the full potential of advanced analytics, mainly because they lack the visibility, capabilities and repeatable processes needed to deliver data to feed these new algorithms and analytics models.

DataOps is an automated, process-oriented methodology used by analytics and data teams to improve quality and reduce the cycle time of advanced analytics. DataOps places heavy emphasis on data pipelines and on ensuring they are transparent, high-performing, agile, adaptable and well-governed. So, what does this have to do with moving to a cloud data warehouse? In most data architectures, the data warehouse is a critical hub in the pipelines that bring data together, which makes it the riskiest single point of failure in realizing the benefits of DataOps.

Challenges of legacy data warehouses

As with all good ideas and their associated technologies, business innovation eventually outstrips legacy solutions and approaches, bringing new requirements, data types, data volumes and use cases that weren’t even imagined when those solutions were first introduced. In this digital age, legacy data warehouses struggle with a number of challenges:

  • A greater variety of data types confounds traditional relational designs, whose brittle schemas struggle to capture new data formats.
  • Massive data volumes cause performance to suffer under complex query requirements.
  • A growing need for raw, untransformed data to meet the depth and breadth of emerging analytics is changing the traditional ETL (Extract, Transform, Load) approach to loading data into the warehouse.
  • Traditional on-premises data warehousing technologies and approaches have a high total cost of ownership and require rare and expensive skillsets to maintain the environment.
  • Brittle architecture hampers IT’s ability to adopt and deploy new use cases in a timely fashion and with all the desired features.
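To illustrate the shift away from traditional ETL described in the list above, here is a minimal Python sketch contrasting classic ETL (transform in code, then load) with the load-first ELT pattern, which keeps raw data available for later analytics. It uses the standard-library `sqlite3` module as a stand-in for a warehouse; the table names and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical source records; amounts arrive as strings, as raw extracts often do.
SOURCE_ROWS = [
    ("1001", "EU", "19.50"),
    ("1002", "us", "5.00"),
    ("1003", "EU", ""),  # missing amount
]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in application code, then load only the cleaned rows."""
    conn.execute("CREATE TABLE orders_etl (order_id INTEGER, region TEXT, amount REAL)")
    cleaned = [
        (int(oid), region.upper(), float(amount))
        for oid, region, amount in SOURCE_ROWS
        if amount  # rows without an amount are discarded before loading
    ]
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw rows untouched, then transform inside the warehouse with SQL."""
    conn.execute("CREATE TABLE orders_raw (order_id TEXT, region TEXT, amount TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", SOURCE_ROWS)
    # The raw table stays in place, so other analytics can reuse the original data.
    conn.execute(
        """
        CREATE TABLE orders_clean AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               UPPER(region)             AS region,
               CAST(amount AS REAL)      AS amount
        FROM orders_raw
        WHERE amount <> ''
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    etl(conn)
    elt(conn)
    for table in ("orders_etl", "orders_raw", "orders_clean"):
        count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        print(table, count)
```

Both paths end with the same two cleaned rows, but only the ELT path keeps all three raw rows in the warehouse, so a future use case (for example, analyzing orders with missing amounts) does not require going back to the source.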

A time-consuming development process and limited support for self-service business intelligence (BI) are the major drivers for modernizing the data warehouse.

According to our research, the most pressing issue was a lack of agility in the data warehouse development process. Business users in particular see the inability to provide required data and the lack of user acceptance as a huge impediment to meeting their analytics goals. Combine this with the realization that the TCO of their existing data warehouse approach (software licenses, infrastructure, resourcing for DW DEV/OPS) is high, and the conditions are right for the enterprise to make a significant move.

Enter the data warehouse in the cloud

The pressures created by the business’s desire for data democratization, self-service, data-driven insights and digital transformation are driving organizations to re-envision their data aggregation solutions, and vendors have responded with new cloud data warehousing technologies that deliver:

  • Adaptability – More timely and accurate adoption of new data and new analytics use cases.
  • Scalability – The ability to seamlessly meet the growing needs of the business.
  • Performance – Meeting both the operational requirements of SLAs and budget constraints.
  • Reusability – Maintaining more data in its original (untransformed) state for further use and value.

What is a cloud data warehouse?

A cloud data warehouse is a data warehouse that is maintained as a managed service in the public cloud and is optimized for large-scale business intelligence and analytics. It gives businesses of all sizes benefits and flexibility they couldn’t enjoy before. No longer constrained by physical data centers, companies can dynamically grow or shrink their data warehouses to rapidly meet changing business budgets and requirements. Modern cloud architectures combine three essentials: the power of data warehousing, the flexibility of big data platforms and the elasticity of the cloud, at a fraction of the cost of traditional solutions.

Leading cloud data warehouse technologies

The market continues to expand with a number of different cloud data warehouse solutions. However, there are four offerings that have bubbled to the top of the stack:

  • Amazon Redshift
  • Microsoft Azure Synapse
  • Google BigQuery
  • Snowflake Cloud Data Platform

While these platforms offer the opportunity to overcome the constraints inherent in traditional on-premises offerings, they lack some of the tooling and capabilities customers need to adopt them easily and succeed over the long term.

What are the risks of moving to a cloud data warehouse?

Adopting a cloud data warehouse holds many potential benefits, but, like any large application modernization, it carries significant risks. Organizations cannot afford any disruption to normal business operations. They must clearly understand the existing data assets in the data warehouse as well as all the processes involved in operating it. Of equal importance are the existing data consumption processes and applications that use data in the warehouse to provide the business with the intelligence it needs.

The organization must be able to support its personnel with tools to plan, design, develop and execute the migration of both the existing data warehouse infrastructure (schema, processes, applications) and the data stored in it to these modern platforms in a timely and accurate fashion.

Once the new cloud data warehouse is deployed, organizations must have the tooling required to monitor data warehouse performance and data quality, ensure data visibility and observability to enable literacy and ideation, and protect the data in this new system from threats and/or loss throughout the entire lifecycle.
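Checking that migrated data arrived accurately, and that it keeps matching afterward, can start with a simple reconciliation between the legacy and the new warehouse. The Python sketch below compares row counts and an order-independent fingerprint of each table's contents. It assumes both systems are reachable through DB-API connections (two in-memory `sqlite3` databases stand in for them here) and that both return the same column order and Python types; `reconcile` and `table_fingerprint` are hypothetical helpers, not part of any vendor's tooling.

```python
import hashlib
import sqlite3
from dataclasses import dataclass


@dataclass
class ReconciliationResult:
    """Outcome of comparing one table between a source and a target system."""
    table: str
    source_rows: int
    target_rows: int
    fingerprints_match: bool

    @property
    def ok(self) -> bool:
        return self.source_rows == self.target_rows and self.fingerprints_match


def table_fingerprint(conn: sqlite3.Connection, table: str) -> tuple[int, str]:
    """Return (row count, order-independent SHA-256 fingerprint) for a table.

    The table name is interpolated into SQL, so it must come from a trusted list.
    """
    # Hash each row on its own, then sort the hashes so row order does not matter.
    row_hashes = sorted(
        hashlib.sha256(repr(row).encode("utf-8")).hexdigest()
        for row in conn.execute(f"SELECT * FROM {table}")
    )
    combined = hashlib.sha256("".join(row_hashes).encode("utf-8")).hexdigest()
    return len(row_hashes), combined


def reconcile(source: sqlite3.Connection, target: sqlite3.Connection,
              table: str) -> ReconciliationResult:
    """Compare row counts and content fingerprints of the same table in two systems."""
    src_count, src_fp = table_fingerprint(source, table)
    tgt_count, tgt_fp = table_fingerprint(target, table)
    return ReconciliationResult(table, src_count, tgt_count, src_fp == tgt_fp)


if __name__ == "__main__":
    legacy = sqlite3.connect(":memory:")
    cloud = sqlite3.connect(":memory:")
    for conn in (legacy, cloud):
        conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    legacy.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
    # Same rows loaded in a different order still reconcile.
    cloud.executemany("INSERT INTO customers VALUES (?, ?)", [(2, "Grace"), (1, "Ada")])
    print(reconcile(legacy, cloud, "customers"))
```

Sorting the per-row hashes makes the check insensitive to load order, which usually differs between systems. In a real cross-platform migration you would typically normalize values (numeric precision, timestamps, string trimming) before hashing, since otherwise identical data can produce different fingerprints.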

Challenges with cloud data warehouses

The biggest challenges with cloud data warehouses are the following:

  • Lack of governance – Organizations continue to be concerned about the risks associated with hosting and provisioning data in the cloud. While cloud security has made great strides in easing these concerns, a robust data governance framework and practice is required to ensure organizations know what data is in the cloud, what rules and policies apply, who is responsible for that data, who should/shouldn’t have access and the guardrails for its consumption and usage.
  • Lack of skilled resources – New technologies and architectures require new skillsets, especially in designing, cataloging, developing and maintaining these new data warehouses.
  • Lack of planning support – While the cloud offers new consumption models that promise financial benefits, vendors provide little in the way of support to help organizations understand and plan how their requirements can be best deployed to achieve these benefits.
  • Lack of automation support – Expensive, time-consuming manual processes to design, develop, adjust, maintain and replicate data introduce latency. Automating these repeatable processes can overcome that latency and bring agility, speed and accuracy to delivering a data warehousing platform.
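The governance questions in the first bullet (what data is in the cloud, who is responsible for it and who should have access) can be recorded as catalog metadata and enforced as a guardrail. The Python sketch below is a hypothetical, simplified model: the dataset names, owners, roles and classification levels are invented for illustration, and real platforms enforce access through their own controls (such as SQL `GRANT` statements), which this sketch does not model.

```python
from dataclasses import dataclass, field
from enum import Enum


class Classification(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"


@dataclass(frozen=True)
class DatasetPolicy:
    """Hypothetical catalog entry: what a dataset is, who owns it, who may read it."""
    name: str
    owner: str
    classification: Classification
    allowed_roles: frozenset[str] = field(default_factory=frozenset)


def can_read(policy: DatasetPolicy, user_roles: set[str]) -> bool:
    """Public data is open to all; anything else needs an explicitly allowed role."""
    if policy.classification is Classification.PUBLIC:
        return True
    return bool(policy.allowed_roles & user_roles)


# Hypothetical catalog; in practice this metadata would live in a data catalog tool.
CATALOG: dict[str, DatasetPolicy] = {
    p.name: p
    for p in (
        DatasetPolicy("sales.daily_revenue", "finance-data-team",
                      Classification.INTERNAL, frozenset({"analyst", "finance"})),
        DatasetPolicy("hr.salaries", "hr-data-steward",
                      Classification.CONFIDENTIAL, frozenset({"hr"})),
        DatasetPolicy("ref.country_codes", "data-platform-team",
                      Classification.PUBLIC),
    )
}


def authorize(dataset: str, user_roles: set[str]) -> bool:
    """Deny by default: a dataset missing from the catalog is not governed."""
    policy = CATALOG.get(dataset)
    return policy is not None and can_read(policy, user_roles)


if __name__ == "__main__":
    print(authorize("sales.daily_revenue", {"analyst"}))
    print(authorize("hr.salaries", {"analyst"}))
```

Denying any dataset that is missing from the catalog is a deliberate design choice: data that nobody has documented or assigned an owner to is, by definition, ungoverned.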

What should you consider when choosing a cloud data warehouse solution?

Successfully adopting a cloud data warehouse requires data governance, metadata management, platform automation, data movement and replication, data modeling and preparation, and data infrastructure monitoring solutions. When combined well, these tools enable organizations to document their legacy data warehouse, plan and envision their modern aggregation platform, migrate their legacy data structures, logic and movement processes, and govern and automate the new platform. These processes help assure the accuracy, adaptability, maintainability and control of strategic data assets.

A cloud data warehouse solution should do this by supporting three key phases to assure the success of your new modern data warehouse:

  1. Model and document your as-is and to-be data warehouses to visualize your metadata, which is the heart of your enterprise data management, data governance and intelligence efforts.
  2. Migrate the data as well as the data warehouse structures, logic and processes using automation.
  3. Govern and automate the ongoing development and operations of your modern data warehouse.

By empowering data warehouse modernization with the right tools and processes, organizations can accelerate legacy migrations while creating agile, adaptable, cost-effective and well-governed cloud data warehouses.

About the Author

Danny Sandwell

Danny Sandwell is an IT industry veteran who has been helping organizations create value from their data for more than 30 years. As Director of Product Marketing for erwin by Quest, he is responsible for evangelizing the business value and technical capabilities of the company’s enterprise modeling and data intelligence solutions. During Danny’s 20+ years with the erwin brand, he also has worked in pre-sales consulting, product management, business development and business strategy roles – all giving him opportunities to engage with customers across various industries as they plan, develop and manage their data architectures. His goal is to help enterprises unlock their potential while mitigating data-related risks.
