Enterprise Scalability: The Most Important Capability for Entity Resolution
Overcoming data challenges and harnessing the value of big data starts with building a reliable and accurate data foundation. Entity Resolution is the process of connecting data, wherever it is, regardless of quality, into a single complete view – providing the foundation for better decision-making.
But this single view is only complete if it can scale across your entire enterprise, encompassing an ever-growing amount of data. A successful deployment of entity resolution must encompass as much data as possible – meaning it needs to scale indefinitely.
Existing technologies can’t achieve enterprise scalability
Traditionally, there were two competing approaches to entity resolution:
Batch-based systems: All entities are resolved across all data sources in parallel. This is useful for analytical model building (for example, where a data scientist wants to build a model covering all customers). However, the biggest pitfall of batch processing appears when these models are operationalized: decisions end up being made on outdated data.
Real-time systems: Entities are kept up to date as new data arrives, which makes these systems suitable for streaming use cases. However, these architectures are optimized to process one record after another, making them extremely slow for full-scale batch runs.
So how can organizations benefit from the speed of batch entity resolution and the timeliness of real-time?
The answer is dual architecture
To overcome this challenge, Quantexa built a dual architecture platform that allows for both batch and real-time processing. As a result, Quantexa is able to do the initial priming of a system faster than any competing technology, while its real-time capability means data is continuously up to date.
Quantexa is also able to scale linearly with data and infrastructure. This means if you use twice the amount of hardware, it’ll run for half the time; if you double the amount of data and double the amount of hardware, it’ll run for the same time.
Many existing entity resolution technologies cannot offer linear scalability with data and hardware, and those that can operate in either batch or real-time mode, not both.
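The linear-scaling claim can be written as a simple proportionality: run time grows with data volume and shrinks with hardware. The following is a minimal sketch of that idealized model only. The function name and the `NODE_HOURS_PER_RECORD` constant are hypothetical and are not measurements of any real platform.

```python
def estimated_runtime_hours(records: float, nodes: float,
                            node_hours_per_record: float) -> float:
    """Idealized linear scaling: runtime = rate * records / nodes."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return node_hours_per_record * records / nodes


# Hypothetical calibration constant, chosen only to make the numbers readable.
NODE_HOURS_PER_RECORD = 4e-8

base = estimated_runtime_hours(1e9, 10, NODE_HOURS_PER_RECORD)
double_hw = estimated_runtime_hours(1e9, 20, NODE_HOURS_PER_RECORD)
double_both = estimated_runtime_hours(2e9, 20, NODE_HOURS_PER_RECORD)

print(f"baseline:             {base:.2f} h")
print(f"2x hardware:          {double_hw:.2f} h")    # half the baseline
print(f"2x data, 2x hardware: {double_both:.2f} h")  # same as the baseline
```

Under this model, doubling the hardware halves the run time, and doubling both data and hardware leaves it unchanged. Real systems approach this behavior only to the extent that coordination overhead stays negligible.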
The best of both worlds: Quantexa’s performance
To put the benefit of the Quantexa dual architecture in perspective, it is worth looking at a real-world comparison. Using an alternative real-time-only engine, 1 billion records can be processed in roughly two weeks, with a cost of $30,000 on hardware. By comparison, Quantexa’s platform can process 1.5 billion records (50% more) in under 3.5 hours (roughly 100x faster), costing only $183 (roughly 160x cheaper). Fundamentally, this is because a real-time engine is being asked to do something it is not suited to: batch-based processing.
Compared with market-leading batch-based entity resolution engines, Quantexa is able to process records 4.5x faster, which generally means an equivalent reduction in the cost of infrastructure.
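The ratios quoted in this comparison can be checked with a few lines of arithmetic. This sketch uses only the figures given in the text, with two stated assumptions: "roughly two weeks" is taken as exactly 14 days of wall-clock time, and "under 3.5 hours" is taken as 3.5 hours (an upper bound).

```python
HOURS_PER_WEEK = 7 * 24

# Figures from the comparison; durations are the assumptions noted above.
realtime_only = {"records": 1.0e9, "hours": 2 * HOURS_PER_WEEK, "cost_usd": 30_000}
dual_arch = {"records": 1.5e9, "hours": 3.5, "cost_usd": 183}


def throughput(run: dict) -> float:
    """Records processed per hour."""
    return run["records"] / run["hours"]


def cost_per_record(run: dict) -> float:
    """Hardware cost in USD per record processed."""
    return run["cost_usd"] / run["records"]


wall_clock_ratio = realtime_only["hours"] / dual_arch["hours"]
total_cost_ratio = realtime_only["cost_usd"] / dual_arch["cost_usd"]
throughput_ratio = throughput(dual_arch) / throughput(realtime_only)
unit_cost_ratio = cost_per_record(realtime_only) / cost_per_record(dual_arch)

print(f"wall-clock time: {wall_clock_ratio:.0f}x shorter")    # 96x
print(f"total cost:      {total_cost_ratio:.0f}x lower")      # 164x
print(f"throughput:      {throughput_ratio:.0f}x higher")     # 144x
print(f"cost per record: {unit_cost_ratio:.0f}x lower")       # 246x
```

The headline figures correspond to the wall-clock and total-cost ratios. Normalizing per record gives larger ratios, because the faster run also covered 50% more records.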
Dynamic: The next evolution of Entity Resolution
One of the biggest challenges with traditional technology is the assumption that it is possible to create a single view for all use cases across the enterprise. Unfortunately, this assumption is invalid for two reasons:
Different requirements on match confidence: The single view required for a Master Data Management (MDM) use case is stricter than the one required for financial crime. This is because in MDM there is zero tolerance for over-matching, whereas in financial crime the focus is on ensuring you do not miss risk.
Data security/compliance: Each use case has different requirements around what data can be used. For example, a fraud watchlist cannot be used in marketing, and the data of an individual who has exercised their GDPR “right to be forgotten” must not be used in marketing but can be used in financial crime detection.
As a result, traditional technology ends up being deployed separately for each use case, multiplying hardware, run time and cost.
Quantexa’s Dynamic Entity Resolution technology solves these problems. Rather than keeping an existing single view of an entity up to date, Dynamic Entity Resolution regenerates the entity in real time from the underlying raw data. This unique capability is fundamental to scaling entity resolution across your enterprise and allows a single instance of the platform to serve all of the use cases across your organization.
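To make the idea of regenerating an entity per use case concrete, here is a minimal illustrative sketch. It is not Quantexa's implementation. The `Record` and `UseCasePolicy` types, the `erased` flag, the source names and the pre-computed link scores are all hypothetical. Each request filters the raw records by what the use case may see, keeps only links at or above that use case's confidence threshold, and groups the remainder with a small union-find.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    record_id: str
    source: str            # hypothetical source name, e.g. "crm"
    erased: bool = False   # hypothetical flag for a GDPR erasure request


@dataclass(frozen=True)
class UseCasePolicy:
    min_confidence: float                    # stricter for MDM, looser for financial crime
    blocked_sources: frozenset = frozenset()
    exclude_erased: bool = False


def resolve(records, scored_links, policy):
    """Rebuild entities from raw records under one use case's policy."""
    allowed = {
        r.record_id for r in records
        if r.source not in policy.blocked_sources
        and not (policy.exclude_erased and r.erased)
    }
    parent = {rid: rid for rid in allowed}

    def find(x):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in scored_links:
        if a in allowed and b in allowed and score >= policy.min_confidence:
            parent[find(a)] = find(b)

    groups = {}
    for rid in allowed:
        groups.setdefault(find(rid), set()).add(rid)
    return sorted(sorted(group) for group in groups.values())


records = [
    Record("r1", "crm"),
    Record("r2", "crm"),
    Record("r3", "fraud_watchlist"),
    Record("r4", "crm", erased=True),
]
# (record_a, record_b, match confidence): made-up scores for illustration.
links = [("r1", "r2", 0.95), ("r2", "r3", 0.70), ("r1", "r4", 0.97)]

mdm = UseCasePolicy(min_confidence=0.9)
marketing = UseCasePolicy(0.9, frozenset({"fraud_watchlist"}), exclude_erased=True)
financial_crime = UseCasePolicy(min_confidence=0.6)

print(resolve(records, links, mdm))              # [['r1', 'r2', 'r4'], ['r3']]
print(resolve(records, links, marketing))        # [['r1', 'r2']]
print(resolve(records, links, financial_crime))  # [['r1', 'r2', 'r3', 'r4']]
```

The same raw records and links yield three different views, which is the point of the design: one store of raw data, with the policy applied at resolution time rather than baked into a separately deployed copy per use case.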