Modernizing data architectures is a crucial endeavor for organizations to stay competitive and make better use of their data assets. As organizations continue to view their data as an asset rather than simply a necessary cost of doing business, structuring data for its best use becomes an increasingly high priority. However, implementing a modern data architecture that effectively serves the business can be a distinct challenge.
Data architectures have evolved
Traditionally, data architectures were designed to support the day-to-day operations of a business, usually to capture, move and manage customers through a business process. That meant identifying your operational systems and the data in them, then capturing and storing that data in built-for-use databases focused on processes and operations.
Then came the realization that insights could be derived from all that data. However, everything was built strictly for running the business, so analytics were secondary. To get business intelligence, it was necessary to take all the data and put it in a central place: a data warehouse or data lake. That, however, meant degrading a lot of the data because it wasn’t all meant to fit together. Even business intelligence as simple as analyzing sales year over year could be a very labor-intensive process in the traditional data architecture paradigm.
The need to move data in order to put it into a usable format presents various challenges. ETL (extract, transform, load) systems were created to populate analytics platforms, but this often resulted in running analytics on outdated data. Beyond that, running transactional (RDBMS) and analytics workloads on the same data was rarely practical before the advent of modern data platforms like Snowflake and Databricks.
Overall, the biggest change in how organizations look at their data has been in what data architectures prioritize. The problems they need to solve have evolved from transacting (keeping the business running) to analyzing (putting the business on a good strategic footing based on data insights).
The pillars and benefits of modern data architectures
Interoperability
When you look at the data warehouse, you look at the lowest common denominator of the data to see how it can all fit together. Transforming and massaging data means losing some of it, but if you don’t, then the silos won’t fit together. That brings you to what people are trying to achieve with the modern data architecture: much more interoperability of data without degradation.
The smart way to achieve interoperability is to standardize how data is captured in your organization so that you eliminate transformation and degradation. From an operational, day-to-day, cost-and-time perspective, standardization simplifies and lowers the cost of data integration. If data is designed to fit together, then you don’t have to spend time going through the analysis and programming to make it fit together.
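As a minimal sketch of what standardized capture can look like, the Python below defines one shared record shape that two systems both write to. The `CustomerRecord` type, its fields and the two capture functions are hypothetical names chosen for illustration; they are assumptions, not part of any particular product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CustomerRecord:
    """Hypothetical shared shape that every capturing system agrees on."""
    customer_id: str      # one identifier format across all systems
    country: str          # two-letter uppercase country code, e.g. "US"
    signup_date: date     # a real date, not a free-form string

    def __post_init__(self):
        # Reject non-conforming data at capture time, not at integration time.
        if len(self.country) != 2 or not self.country.isupper():
            raise ValueError(f"country must be a 2-letter code: {self.country!r}")


def capture_from_web_store(customer_id: str, country: str, signup: date) -> CustomerRecord:
    return CustomerRecord(customer_id, country, signup)


def capture_from_call_center(customer_id: str, country: str, signup: date) -> CustomerRecord:
    return CustomerRecord(customer_id, country, signup)


# Because both sources emit the same shape, combining them needs no transformation.
combined = [
    capture_from_web_store("C-001", "US", date(2024, 1, 15)),
    capture_from_call_center("C-002", "DE", date(2024, 2, 3)),
]
countries = sorted(r.country for r in combined)
```

The design choice here is to validate once, where data enters, so that downstream integration becomes a plain concatenation rather than a mapping exercise.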
Decentralization and distributed data governance
A data mesh is one of the modern data architectures that appeal to data architects. One goal of a data mesh is to give control over data to the people who are close to it. That may not sound like decentralization, until you combine it with the overarching idea of standardization. For the big picture and in the context of expertise, you want to decentralize and take advantage of skills in a specialty. However, that shouldn’t stop data from coming together centrally.
Consider federation, which operates on the same principles that a government does. A federal government has the big picture and sets rules and standards, and states interpret them in a way that makes sense for them. Once the data is federated, the concepts behind it, such as data governance and data architecture, follow a federated model. At the highest level, you define what will make everything hang together well, then allow flexibility for individual subject matter expertise. For example, marketing staff are ultimately responsible for marketing data, the sales staff for sales data and so forth.
The model ultimately leads to distributed or federated data governance, characterized by central rules: for example, private data is private data, no matter whose domain it sits in. At the same time, you provide guardrails for differences within that framework so that departments have some flexibility without crossing into other lanes.
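One way to picture central rules plus domain guardrails is the small Python sketch below. The field names, the domain names and the `visible_fields` helper are all hypothetical; the point is only that the central rule applies everywhere, while each domain adds its own restrictions on top.

```python
# Central rule: these fields are private in every domain (hypothetical list).
CENTRAL_PRIVATE_FIELDS = {"email", "ssn"}

# Domain guardrails: each domain may restrict more fields, never fewer.
DOMAIN_RESTRICTED_FIELDS = {
    "marketing": {"campaign_budget"},
    "sales": {"discount_rate"},
}


def visible_fields(record: dict, owner_domain: str, viewer_domain: str) -> dict:
    """Return the parts of a record that a viewer is allowed to see."""
    hidden = set(CENTRAL_PRIVATE_FIELDS)  # central rules always apply
    if viewer_domain != owner_domain:
        # Domain-specific restrictions apply only to viewers outside the domain.
        hidden |= DOMAIN_RESTRICTED_FIELDS.get(owner_domain, set())
    return {k: v for k, v in record.items() if k not in hidden}


lead = {"name": "Ada", "email": "ada@example.com", "discount_rate": 0.1}
seen_by_sales = visible_fields(lead, owner_domain="sales", viewer_domain="sales")
seen_by_marketing = visible_fields(lead, owner_domain="sales", viewer_domain="marketing")
```

In this sketch the sales team still sees its own `discount_rate`, marketing does not, and nobody sees `email` through this path, whichever domain owns the record.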
Self-service and discoverability
The prevailing wisdom used to be that IT owned the data and allowed the business users to access it. It made sense and reduced risk when anyone wanting to do anything with data had to go through a gatekeeper. However, gatekeepers don’t scale and growth can suffer as a result.
Nowadays it’s smart to make sure data is in the hands of the users so they can access it without overcoming roadblocks and jumping through hoops. You achieve that once you’re confident in how you’ve defined and governed your data.
Self-service and discoverability mean bringing everybody in as an empowered data citizen at the level of their skill set and in their context. They support the ideation and innovation that come from the business, and they shorten the time to value because users can do more for themselves. It’s a big step toward scalability while maintaining control and still mitigating risk.
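Discoverability usually rests on some form of catalog that users can search for themselves. The following is a toy sketch of that idea, not a real catalog product: the `DatasetEntry` and `DataCatalog` classes and the sample datasets are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    name: str
    owner: str           # the domain accountable for the dataset
    description: str
    tags: set = field(default_factory=set)


class DataCatalog:
    """Hypothetical in-memory catalog that business users can search."""

    def __init__(self):
        self._entries = {}

    def register(self, entry: DatasetEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, term: str) -> list:
        # Match the term against name, description and tags, case-insensitively.
        term = term.lower()
        return sorted(
            e.name for e in self._entries.values()
            if term in e.name.lower()
            or term in e.description.lower()
            or term in {t.lower() for t in e.tags}
        )


catalog = DataCatalog()
catalog.register(DatasetEntry("sales_orders", "sales", "Confirmed orders by day", {"revenue"}))
catalog.register(DatasetEntry("campaign_clicks", "marketing", "Ad clicks per campaign", {"web"}))
catalog.register(DatasetEntry("web_sessions", "marketing", "Site visits", {"web"}))
web_hits = catalog.search("web")
```

A user searching for "web" finds both marketing datasets without asking a gatekeeper, and each entry still names the owner who governs it.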
Automation and infrastructure management
Self-service and discoverability are closely related to automation and infrastructure management, which enable the architecture to keep up with the demands of the business.
They also make sure that processes run repeatably and reliably, as designed, without daily human intervention. The cleaner and more thorough the automation, the faster you can move without introducing more risk.
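Repeatability often comes down to making each automated step safe to run more than once. The sketch below illustrates that property with a hypothetical `upsert_batch` function and a plain dictionary standing in for a target table; a real load would target a database, but the idea is the same.

```python
def upsert_batch(target: dict, batch: list) -> dict:
    """Load a batch keyed by 'id'; re-running the same batch changes nothing."""
    for row in batch:
        target[row["id"]] = row   # insert or overwrite, never duplicate
    return target


warehouse_table = {}   # stand-in for a real target table
nightly_batch = [
    {"id": 1, "amount": 120.0},
    {"id": 2, "amount": 75.5},
]

upsert_batch(warehouse_table, nightly_batch)
after_first_run = dict(warehouse_table)

# A retry or an accidental second run produces the same state.
upsert_batch(warehouse_table, nightly_batch)
after_second_run = dict(warehouse_table)
```

Because the second run leaves the table unchanged, a scheduler can retry a failed job without anyone checking for duplicates by hand.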
Continuous integration/Continuous delivery (CI/CD)
CI/CD is an agile DevOps approach to delivering software to the business in a timely and effective way. Applied to data rather than software, the same approach is known as DataOps.
Time is of the essence in a modern data architecture because the opportunities represented by data are time-boxed. You can’t be getting last month’s performance figures on the twentieth of the following month, the way it used to be. Opportunities don’t wait around from one quarter to the next.
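In a DataOps setting, the checks that run on every change typically cover both transformation logic and data freshness. Here is a small sketch of such checks under stated assumptions: `monthly_revenue`, `is_fresh` and `run_checks` are hypothetical names, and the one-day freshness limit is an arbitrary example value.

```python
from datetime import datetime, timedelta, timezone


def monthly_revenue(orders: list) -> dict:
    """Transformation under test: total order amount per 'YYYY-MM' month."""
    totals = {}
    for order in orders:
        month = order["date"][:7]
        totals[month] = totals.get(month, 0.0) + order["amount"]
    return totals


def is_fresh(last_loaded: datetime, now: datetime, max_age: timedelta) -> bool:
    """Freshness gate: data older than max_age counts as stale."""
    return now - last_loaded <= max_age


def run_checks() -> list:
    """Checks a CI job could run on every change; returns a list of failures."""
    failures = []
    sample = [
        {"date": "2024-01-05", "amount": 100.0},
        {"date": "2024-01-20", "amount": 50.0},
        {"date": "2024-02-01", "amount": 25.0},
    ]
    if monthly_revenue(sample) != {"2024-01": 150.0, "2024-02": 25.0}:
        failures.append("monthly_revenue gives wrong totals")
    now = datetime(2024, 2, 2, tzinfo=timezone.utc)
    loaded = datetime(2024, 2, 1, 12, tzinfo=timezone.utc)
    if not is_fresh(loaded, now, max_age=timedelta(days=1)):
        failures.append("data is stale")
    return failures


check_failures = run_checks()
```

A pipeline would block the release when `run_checks()` returns anything, which is how last month's figures arriving three weeks late becomes a visible failure instead of a habit.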
Available modern data architecture options
The choice of modern data architecture should fit your organization’s goals, data characteristics, technical expertise and business requirements. Combining elements from different architectures can provide a balanced solution.
- Data mesh: A data mesh aims to decentralize data management by giving ownership of the data to those who produce the data.
- Data fabric: A data fabric uses services to connect and integrate data sources into an accessible structure.
- Lambda architecture: Lambda architecture combines batch and real-time data processing to handle large quantities of data.
- Kappa architecture: Kappa architecture uses a single stream-processing path that handles data as it arrives, rather than separate batch and real-time layers.
- Data lake architecture: A data lake allows organizations to store data in native formats without requiring an upfront structure or schema.
- Cloud native architecture: Cloud native architectures use multiple cloud services to build dynamic applications.
- Data warehouse architecture: A data warehouse is a centralized data repository that can be analyzed to make better decisions.
- Event-driven architecture: Event-driven architecture allows organizations to detect “events” and act on them in real-time.
- Microservices architecture: Microservices architecture is a number of independent services that can interact with each other. They are loosely coupled, but they can be managed, maintained, tested and deployed somewhat independently. They tend to be based on specific business capabilities or application functionality areas.
- Hybrid cloud architecture: A hybrid cloud architecture combines private and public cloud environments. This allows data and applications to be shared while maintaining some separation.
- Serverless architecture: A serverless architecture allows organizations to run and create applications and services without managing underlying infrastructure.
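The difference between the Lambda and Kappa entries in the list above is easier to see in code. The sketch below counts page views both ways using only the standard library; the event shape and function names are assumptions made for illustration, and real systems would use a batch engine and a stream processor rather than in-memory lists.

```python
from collections import Counter


def batch_view(historical_events: list) -> Counter:
    """Batch layer: recompute page-view counts from the full history."""
    return Counter(e["page"] for e in historical_events)


def speed_view(recent_events: list) -> Counter:
    """Speed layer: count only events that arrived after the last batch run."""
    return Counter(e["page"] for e in recent_events)


def lambda_query(historical_events: list, recent_events: list) -> Counter:
    """Lambda: the serving layer merges the batch view and the speed view."""
    return batch_view(historical_events) + speed_view(recent_events)


def kappa_query(event_stream) -> Counter:
    """Kappa: one streaming path; history is handled by replaying the stream."""
    counts = Counter()
    for event in event_stream:      # processed one event at a time, in order
        counts[event["page"]] += 1
    return counts


history = [{"page": "/home"}, {"page": "/pricing"}, {"page": "/home"}]
recent = [{"page": "/home"}]

lambda_counts = lambda_query(history, recent)
kappa_counts = kappa_query(iter(history + recent))
```

Both approaches arrive at the same counts here. The trade-off is that Lambda maintains two code paths that must agree, while Kappa maintains one path and relies on being able to replay the stream.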
Every organization will need to evaluate what potential data architectures make the most sense to implement. The suitability of a particular modern data architecture will depend on an organization’s goals, data landscape, technical capabilities and cultural readiness.
Challenges of building a modern data architecture
Legacy technology
This challenge applies to organizations whose technology, or the architecture behind it, doesn’t fit with modern technologies. It’s time-consuming and expensive to move off those systems because they’re usually poorly documented and poorly understood. A lot of legacy technology falls under the rubric of “if it ain’t broke, don’t fix it.”
These are old systems and old programming languages of little appeal to the majority of software engineers. The technology landscape has shifted a long way and moving to a new architecture can be risky.
Fear of another swamp of data quality problems
Data quality problems easily create data silos as bad as, or worse than, the previous ones. Those data quality problems don’t go away by moving to a new architecture.
Before going down the modernization path, look closely at your data today. Does it need to be cleansed? What can you do with it? Is your governance sufficient that you can lift-and-shift it to a modern data architecture as it is?
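A quick profile of the data can help answer those questions before any migration starts. The following sketch counts duplicate keys and missing required values in a handful of records; the `profile` function and the sample customer rows are hypothetical, and a real assessment would cover far more rules.

```python
def profile(rows: list, key: str, required: list) -> dict:
    """Summarize basic quality problems in a list of dict records."""
    seen, duplicates = set(), 0
    missing = {name: 0 for name in required}
    for row in rows:
        if row.get(key) in seen:
            duplicates += 1          # same key already seen in an earlier row
        seen.add(row.get(key))
        for name in required:
            if row.get(name) in (None, ""):
                missing[name] += 1   # absent, null or empty counts as missing
    return {"rows": len(rows), "duplicate_keys": duplicates, "missing": missing}


customers = [
    {"id": 1, "email": "a@example.com", "country": "US"},
    {"id": 2, "email": "", "country": "DE"},
    {"id": 2, "email": "b@example.com", "country": None},
]
report = profile(customers, key="id", required=["email", "country"])
```

Numbers like these give a baseline: if a third of the keys are duplicated today, they will still be duplicated after a lift-and-shift.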
If you’re leaning toward a modern data architecture looking only at today’s challenges, without thinking about things like scalability and future needs, you can find yourself in another swamp.
Data infrastructure that won’t support generative AI
Executive leaders are frequently asking IT to add a layer of generative AI onto organizational data. However, if the underlying data infrastructure isn’t trustworthy, the resulting insights from generative AI won’t be trustworthy either. “Garbage in, garbage out” becomes reality for organizations that don’t take the time to shore up their data infrastructure. In order to leverage AI successfully, organizations must ask the tough questions to really figure out if their data pipelines are capable of supporting AI.
Security and compliance
If you’re not well governed now, then it’s not a good time to start moving lots of parts around. Switching architecture without a security blanket is risky. Not only are you flirting with data loss or a breach, but it can quickly deflate your internal campaign for buy-in. Imagine putting your whole strategy at risk by loudly singing the praises of the new architecture, only for previous issues to get ported over.
Talent and skills
The issue of finding the labor to modernize your data architecture cuts both ways.
On one hand, you may know where to find the talent to implement the architecture and wield the technologies behind it. But those people are not just sitting around waiting for the phone to ring; they’re highly sought after and expensive.
On the other hand, you may have a lot of engineers and database administrators who have proven their worth and have a lot of tribal knowledge about your business. Can you entrust the project to them if their skills aren’t up to date? How can you avoid throwing out the baby with the bath water? How do you move them forward in their knowledge of architecture? Do they want to move forward? Are they afraid to make changes to the status quo?
Understanding and buy-in
Throughout the entire undertaking, you have to sell what you’re doing, make sure that people understand both what it is and why you’re doing it, and maintain buy-in at every stage. That’s how you’ll get funding for the work and help when it’s time to move roadblocks.
Most of the time, you’ll already have buy-in from the data people. Your chief data officer, for instance, is leading the charge, sitting at the table with your chief information officer. You might think that CDO, CIO and CTO are on the same page and buy-in is automatic. But changing architecture means more headaches for the CIO and CTO in the short term, so your CDO has some convincing to do.
On the other side of the C-suite, the chief revenue officer, chief analytics officer and chief financial officer may want what you want and be on board. But that’s not the same as understanding what it takes to get there. So you have to sell them on what it takes to move forward and the specific obstacles they have to move. Moreover, this is the mindset that has to be ingrained into the organization to adopt this successfully.
Company culture
All of this effort can be for naught if company culture is antagonistic. If you’re contending with a culture of “if it ain’t broke, don’t fix it,” you’ll have a lot of inertia to overcome. Not everybody knows where the data architecture is broken or to what extent, so it can be a constant job of selling.
Mistakes organizations make attempting to build a modern data architecture
Not thinking past implementation of the data lakehouse
You can achieve some of the goals of a modern data architecture by implementing a data lakehouse. The important thing is to think beyond that implementation to what comes next.
First, some background.
- The traditional data warehouse is tightly structured. It often requires transforming data, which inevitably leads to degradation of data.
- The alternative, the data lake, became the place to store raw data. No transformation took place and the data resided there in its raw form anytime you needed it. The problem is that it often became a data swamp because there was not enough structure to easily tell what was in it. Worse yet, if you ever wanted to combine it with your traditional data warehouse, you then had an even bigger transformation job.
- The data lakehouse, in the form of cloud offerings, allows you to use a data warehouse and a data lake together more easily.
As a synthesis, the data lakehouse gives you big analytics, but you’re still transforming a lot of data, with the attendant degradation. You may feel some relief, but you still don’t have a modern data architecture. That’s why, if you opt for a data lakehouse, you can’t lose sight of what to do next.
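The degradation described above can be shown with a toy contrast between writing to a fixed schema and reading raw records on demand. This is a simplified sketch, not how any specific lakehouse product works; the `WAREHOUSE_SCHEMA` mapping, the sample orders and both helper functions are assumptions for illustration.

```python
import json

# Data lake: keep each record exactly as it arrived (schema-on-read).
raw_lake = [
    '{"order_id": 1, "total": "19.99", "coupon": "SPRING"}',
    '{"order_id": 2, "total": "5.00"}',
]

# Warehouse: a fixed set of typed columns (hypothetical schema).
WAREHOUSE_SCHEMA = {"order_id": int, "total": float}


def to_warehouse_row(raw: str) -> dict:
    """Schema-on-write: coerce to the fixed columns; anything else is dropped."""
    record = json.loads(raw)
    return {col: cast(record[col]) for col, cast in WAREHOUSE_SCHEMA.items()}


def read_field(raw: str, field: str, default=None):
    """Schema-on-read: interpret the raw record only when a question is asked."""
    return json.loads(raw).get(field, default)


warehouse_rows = [to_warehouse_row(r) for r in raw_lake]
coupons_from_lake = [read_field(r, "coupon") for r in raw_lake]
```

The warehouse rows are clean and typed, but the `coupon` value survives only in the raw copy, which is the kind of loss the transformation step introduces.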
Thinking that one size fits all
It’s important to take the time to understand what’s right for your specific organization, given your legacy, your goals and your markets. You can find vendors that have compiled multiple ideas and tried to standardize them, but the resulting architecture may not fit you.
It’s not a matter of picking one from a list and going with it, because anything you pick will have pluses and minuses. With a thoughtful process that includes modeling and assessing before spending money, you can avoid taking two steps forward and one step back. Otherwise, you may discover over time that you still need the good things from your previous architecture, and that they are hard to find in a standardized approach. Your existing architecture has some beneficial elements, so why throw them out? How can you reapply them? How can you take advantage of them to move forward?
Considerations when choosing a modern data architecture
- Intended use cases and business outcomes – Why are you going to all this trouble and expense? What do you expect to get out of it?
- Existing architecture – What do you have today? Which parts of it are going to help you and hurt you on the way to achieving your desired outcomes?
- Candidate architectures – Which approaches stand the best chance of solving your problems? What was your process in arriving at them?
- Technology, resources and expertise – Which investments and changes will your destination architecture require?
- Road map – Define the steps the organization will take along the way to the new architecture.
- Communication – Show incremental progress. Supplement the road map with the results, the returns, the good, the bad and the ugly so that people are not surprised and don’t get the wrong idea about progress.
- Flexibility – Don’t be brittle. Declaring “That’s the one, and we’re done questioning it” when moving to a new data architecture invites trouble. Be prepared to constantly question and adjust. Make sure that the architecture you’re adopting is agile and adaptable because there will be smaller changes and adjustments over time.
Steps to implement a modern data architecture/strategy
Implementing a modern data architecture involves several steps that are crucial for effectively managing and utilizing data within an organization. Here are the main steps involved:
- Define your business goals and data strategy: Clearly identify your organization’s business objectives and determine how data can contribute to achieving those goals. Establish a data strategy that aligns with your business objectives, considering aspects such as data governance, data quality, data privacy and data security.
- Assess your current data landscape: Evaluate your existing data infrastructure, systems and processes. Understand the sources, types and volumes of data you generate or collect, as well as the data storage, integration and analysis methods currently in use. Identify any gaps or limitations in your current data architecture that need to be addressed.
- Plan for data integration and interoperability: Determine how data from various sources and systems will be collected, integrated and managed. Identify the data integration techniques and technologies that best suit your needs, such as data pipelines, ETL (Extract, Transform, Load) processes, APIs and data virtualization. Consider the need for real-time or batch processing, as well as ensuring data interoperability across different platforms and applications.
- Choose appropriate data storage solutions: Select suitable data storage technologies based on your requirements. This could include traditional relational databases, NoSQL databases, data lakes and data warehouses. Consider factors such as scalability, performance, data retrieval speed, cost and data governance capabilities when making your choices.
- Implement a data governance framework: Establish a robust data governance framework to ensure data quality, consistency and compliance across the organization. Define data ownership, data stewardship and data management policies. Implement data governance tools and processes to monitor data quality, enforce data standards and manage metadata.
- Establish data security and privacy measures: Data security and privacy are critical aspects of modern data architectures. Implement appropriate security measures, such as encryption, access controls and user authentication, to protect sensitive data from unauthorized access. Comply with relevant data privacy regulations and ensure that data usage adheres to ethical and legal guidelines.
- Enable data analytics and insights: Implement tools and technologies for data analysis, reporting and visualization. This could involve using business intelligence (BI) platforms, data visualization tools, data mining techniques and advanced analytics algorithms. Consider the needs of various stakeholders in your organization, such as executives, data analysts and data scientists, when designing your analytics capabilities.
- Embrace cloud and modern infrastructure: Leverage cloud computing and modern infrastructure technologies to enhance scalability, flexibility and cost-effectiveness. Cloud platforms can provide storage, processing power and advanced analytics capabilities, allowing you to focus on insights rather than infrastructure management. Consider cloud providers, hybrid cloud architectures and serverless computing options based on your requirements.
- Implement data pipelines and automation: Create efficient data pipelines to automate data ingestion, integration and transformation processes. Use tools and frameworks such as Apache Kafka, Apache Airflow and custom-built solutions to orchestrate data flows and automate repetitive tasks. Automating these processes reduces manual effort, improves data quality and enables real-time data processing.
- Foster a data-driven culture: Encourage a data-driven culture within your organization by promoting data literacy, training employees on data analytics tools and techniques, and fostering collaboration between business users and data professionals. Create an environment where data-driven decision-making is valued and supported.
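To make the integration, storage and pipeline steps above more concrete, here is a minimal extract, transform, load sketch using only the Python standard library. The CSV content, the `orders` table layout and the function names are hypothetical, and an in-memory SQLite database stands in for whatever storage you choose.

```python
import csv
import io
import sqlite3

# Stand-in for a source system export (hypothetical data).
SOURCE_CSV = """order_id,customer,amount
1,acme,100.50
2,globex,20.00
3,acme,30.25
"""


def extract(text: str) -> list:
    """Read the export into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list) -> list:
    """Cast types and normalize customer names once, in one place."""
    return [(int(r["order_id"]), r["customer"].upper(), float(r["amount"])) for r in rows]


def load(conn: sqlite3.Connection, rows: list) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    # INSERT OR REPLACE keeps the load repeatable if the job is re-run.
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
totals = dict(conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall())
```

In practice an orchestrator would schedule these three functions as separate tasks, but keeping them as small, separately testable steps is the part that carries over.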
Conclusion
Modernizing data architecture is essential for organizations that want to unlock the full potential of their data. While challenges exist, there are numerous options available to develop a modern data architecture to meet your organization’s specific needs, ensuring better data utilization, insights and competitiveness.
Keep in mind that implementing a modern data architecture is an iterative process that requires continuous improvement and adaptation to changing business needs. Regularly assess your data architecture, monitor data quality and explore emerging technologies and trends to stay ahead in the rapidly evolving data landscape.