How Data Fabric Is Changing the Future of Business
By integrating advanced analytics with a unified data fabric, Quantexa is helping organizations overcome data challenges and unlock new insights.
Enterprise data management has evolved constantly, driven by the ever-changing need for businesses to harness the full potential of their data. From the early days of data warehouses to the more recent concepts of data lakes, lakehouses, and streams, each approach has attempted to resolve the challenges posed by increasing data variety, volume, and velocity.
Among these, the emergence of the data fabric represents a significant milestone, offering unified data integration and management, and closer coupling with data science, analytics, and decision-making. Here we explore the historical context from which the data fabric emerged, and highlight how Quantexa enhances its role in decision intelligence and delivering business value.
Fig 1: A Quantexa Enterprise Data Fabric
Data Paradigms Inspiring Quantexa and the Data Fabric
Enterprise data management paradigms have evolved rapidly throughout my career. Let’s briefly traverse some data management history to understand how the data fabric emerged.
Popular approaches have included:
Data Warehouse
Data Lake
ETL, i.e. Extract, Transform, Load
ELT, i.e. Extract, Load, Transform
Data Lakehouse
Data Streaming
Data Mesh
Data Fabric
Data Swamp
“Data swamp” may have been no one’s preferred architecture, but it has unfortunately been a common one in practice.
From Data Warehouses to Lakes, Lakehouses and Data Streaming
Data warehouses have a substantial history in decision support and business intelligence applications, dating back to the late 1980s. Proprietary Massively Parallel Processing (MPP) architectures facilitated the handling of larger structured data volumes – at high cost. However, unstructured and semi-structured data proved troublesome, and data warehouses, primarily centered on three-phase ETL (Extract, Transform, Load) processes, found datasets featuring high variety, velocity, and volume to be beyond reach.
Then, about 15 years ago, architects envisioned the data lake: a promising, cost-effective, and scalable single system to house raw data in a variety of formats and serve many different analytic products. Apache Hadoop, inspired by early Google papers, was its focal point. From a data management perspective, while great for storing data, data lakes relied on upstream processes to enforce data quality, and their lack of transaction support made consistency hard to guarantee. Poor governance and discoverability, e.g. imperfect metadata and imperfect relationships, helped popularize the term data swamp.
However, further into the enterprise, data scientists and enterprise architects found the MapReduce programming model (first Map, then Reduce) unfamiliar and complicated, and the surrounding stack bloated: Impala, Pig, Flume, Sqoop, and many other small open-source projects each attempting to overcome specific challenges on a patchwork, inconsistent basis. As DMRadio's Eric Kavanagh notes, "the hype around HDFS [Hadoop Distributed File System] faded by the late 2010s, in part due to the massive amounts of reverse engineering required to connect the relatively arcane protocols of Hadoop to other enterprise systems."
Step forward the open-source Apache Spark project and its unicorn sponsor, Databricks, whose Delta Lake table format finally restored standard warehouse functionality (transactions, efficient upserts, isolation, and time-travel queries). With this, Databricks coined the phrase data lakehouse. More open and inclusive, the lakehouse allowed lake and warehouse to co-exist, combining low-cost cloud storage, object stores, and open formats to offer the modern, efficient batch analytics processing popular with data scientists.
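To make those lakehouse features concrete, here is a minimal sketch using the open-source Delta Lake API with PySpark (the delta-spark package); the table path, schema, and sample records are invented for illustration and are not tied to any particular platform.

```python
# Minimal lakehouse sketch with Delta Lake on PySpark (pip install delta-spark).
# Path, schema, and records are illustrative only.
from delta import configure_spark_with_delta_pip
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

builder = (
    SparkSession.builder.appName("lakehouse-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# ACID write: create a Delta table from a batch of customer records.
customers = spark.createDataFrame(
    [(1, "Acme Ltd", "GB"), (2, "Globex Corp", "US")],
    ["customer_id", "name", "country"],
)
customers.write.format("delta").mode("overwrite").save("/tmp/customers_delta")

# Efficient upsert (MERGE): update matching rows, insert new ones.
updates = spark.createDataFrame(
    [(2, "Globex Corporation", "US"), (3, "Initech", "DE")],
    ["customer_id", "name", "country"],
)
target = DeltaTable.forPath(spark, "/tmp/customers_delta")
(
    target.alias("t")
    .merge(updates.alias("u"), "t.customer_id = u.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)

# Time-travel query: read the table as it was at an earlier version.
spark.read.format("delta").option("versionAsOf", 0).load("/tmp/customers_delta").show()
```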
As Databricks and the lakehouse go from strength to strength, the cloud data warehouse Snowflake has picked up steam and, in conjunction with the increasingly popular Apache Iceberg table format, is adding momentum to the "data icehouse". Meanwhile, a distinct and significant data streaming and analytics ecosystem has emerged, led by Confluent and based on Apache Kafka and Apache Flink.
In all cases, the critical enabler is analytics, the oil that fuels business value. Data without analytics is like the proverbial tree falling in a forest with no one there to hear it.
Data Mesh and Data Fabric
Somehow organizations need to manage all of this emergent complexity in a wildly heterogeneous data landscape. Enter vendor-neutral approaches like data mesh and data fabric, which marry abstract data management to practical business value through analytics, facilitating business intelligence and, increasingly, decision intelligence.
Data fabric and data mesh are similar but distinct, and can co-exist when required. A data mesh is an organizational concept akin to agile or lean software development methodologies, whereas a data fabric is a technology pattern. According to Gartner, “a data fabric is an emerging data management and data integration design concept. Its goal is to support data access across the business through flexible, reusable, augmented and sometimes automated data integration.”
| Data Fabric | Data Mesh |
|---|---|
| Centrally Managed and Owned | Distributed Ownership |
| Integrated, Unified | Decentralized |
| A Platform and Technology, Helped by a Metadata Layer | An Architecture and Organizational Philosophy |
| Integration-Centric | Business Domain-Centric |
| Brokered Data Interactions | Direct Data Access |
Table 1: Data Fabric versus Data Mesh.
Organization and technology have to go hand in hand to resolve enterprise pinch points. A data mesh allows for autonomy, but most products will use a shared platform. Data fabric architectures can encompass and integrate all data management strategies from warehouse to lakehouse, accelerating product creation in a data mesh and underpinning analytics wherever, whenever, and however it is needed: real-time, on-demand, or batch. A data fabric helps data products and other data assets to be discovered and used beyond the managed data products themselves, making reuse and return on investment real.
Here's the problem. A data fabric brings data together physically at the point of processing or consumption and transforms it into a common shape. However, it does not truly unify it: data can still be duplicated. Fabrics often represent metadata as a graph, but it is also important to unify the data itself into valuable knowledge graphs and contextual data products. This turns a data fabric into a contextual fabric, providing the uplift analytics needs.
Here's the second problem. Some of the data will suffer from poor quality. The fabric needs to help identify it and allow it to be fixed, otherwise consumers won't trust it. Data products in the fabric need to be stewarded: effectively, the next generation of data management.
To create higher-quality data products, then, a data fabric should incorporate entity resolution and strong master data quality capabilities, and facilitate knowledge creation and analysis.
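As a toy illustration of that idea, the sketch below links source records with a crude name-similarity rule and treats each connected component as one resolved entity, ready to become a node in a knowledge graph. The records, similarity measure, and threshold are invented for the example; this is not Quantexa's resolution logic, just the general shape of the technique.

```python
# Toy entity resolution feeding a knowledge graph (pip install networkx).
# Records, similarity measure, and threshold are illustrative only.
from difflib import SequenceMatcher
import networkx as nx

records = [
    {"id": "crm-1", "name": "Acme Trading Ltd", "country": "GB"},
    {"id": "kyc-7", "name": "ACME Trading Limited", "country": "GB"},
    {"id": "crm-2", "name": "Globex Corporation", "country": "US"},
]

def similar(a, b, threshold=0.8):
    """Crude name similarity; real systems combine many attributes and rules."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

# Link records that likely describe the same real-world entity.
graph = nx.Graph()
graph.add_nodes_from(r["id"] for r in records)
for i, a in enumerate(records):
    for b in records[i + 1:]:
        if a["country"] == b["country"] and similar(a["name"], b["name"]):
            graph.add_edge(a["id"], b["id"])

# Each connected component is one resolved entity, usable as a node in
# downstream knowledge graphs and contextual data products.
for entity_num, members in enumerate(nx.connected_components(graph), start=1):
    print(f"entity {entity_num}: {sorted(members)}")
```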
How Quantexa Empowers the Data Fabric (and the Data Mesh)
Quantexa excels at unified data products, tailored to and flexible across use cases, unifying on-demand decisioning, batch analytics, high-performance APIs, and streaming.
Quantexa thus enables data without doubt: resolved entities within unified data, facilitating accurate knowledge discovery through knowledge graphs and ego graphs (also known as networks).
This in turn applies accurate scoring methodologies directly to fraud, AML, risk, and other solution use cases, and/or incorporates contextual knowledge into data, model, and AI pipelines, for example by populating machine learning feature stores.
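As a hedged sketch of that last point, the snippet below derives simple contextual features (degree and clustering coefficient) from a small network of resolved entities and shapes them into a table that could be written to a feature store; the network, feature names, and the feature-store hand-off are hypothetical, not a description of Quantexa's product APIs.

```python
# Illustrative graph-derived features for an ML pipeline
# (pip install networkx pandas). Network and feature names are hypothetical.
import networkx as nx
import pandas as pd

# A tiny network of resolved entities, e.g. linked by shared transactions.
g = nx.Graph()
g.add_edges_from([
    ("entity_A", "entity_B"),
    ("entity_B", "entity_C"),
    ("entity_C", "entity_A"),
    ("entity_C", "entity_D"),
])

# Contextual features per entity, e.g. inputs to a fraud or AML model.
features = pd.DataFrame({
    "entity_id": list(g.nodes),
    "degree": [g.degree(n) for n in g.nodes],
    "clustering_coefficient": [nx.clustering(g, n) for n in g.nodes],
})

# In practice these rows would be registered in a feature store keyed on
# entity_id, so models and decisioning services share the same context.
print(features)
```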
Upstream in the data fabric, Quantexa drives data management by resolving entities from source data stores to unprecedented levels of accuracy. This facilitates the ready creation and diffusion of more data-as-knowledge assets and products.
Overall, Quantexa generates business value through Decision Intelligence, a harmonious balance of operational and analytical focus, underpinned by data quality management.
This ensures that a data fabric thrives because:
Data engineers can process, transform and distribute data with confidence.
Data technology owners can use metadata management and lineage tools, such as Solidatus and Collibra, alongside Quantexa’s data unification and enterprise data quality capabilities.
Data scientists can prototype and deploy graph and network analytics, extracting intelligence at the point of need and into analytics pipelines.
Analysts and investigators can analyze and question key datasets and resources, e.g. watchlists, reports, and news intelligence.
SMEs and managers can incorporate context into their decision-making and infuse automated processes with micro decision-making.
For organizations maintaining many data platforms and analytics layers in a data fabric and/or data mesh, Quantexa brings a unified contextual fabric informing great analytics, comprehensive data quality management, and better decision intelligence.
Fig 2: How a Data Fabric Enables Data as a Product into Downstream Decision Intelligence and Upstream Data Management.
From Traditional Data Warehouses to the Data Fabric
The journey from traditional data warehouses to the modern concepts of data fabric and data mesh reflects the constant evolution of data management. As businesses continue to navigate the complexities of data integration and use, the need for robust and versatile solutions becomes paramount.
Quantexa’s approach to data fabric, with its emphasis on entity resolution, data quality, and contextual analytics, exemplifies the next step in this evolution. By empowering organizations with the tools to create accurate, knowledge-rich data products, Quantexa not only enhances the effectiveness of data fabrics but also enables more informed decision-making. As data management landscapes continue to evolve, solutions like Quantexa will increasingly shape the future of enterprise data strategies.
To learn more, view the recorded webinar: Revolutionize Your Data Fabric: Practical Steps Towards a Data Fabric in Financial Services