In Rodgers and Hammerstein’s “The King and I,” the King explains to “I” that the bee always flies from flower to flower, the flower never flies from bee to bee. That justification for philandering didn’t fly with Mrs. Anna, but it does make sense when applied to the relationship between applications and data: Should data fly from application to application, or should the data stay put like a flower and let applications approach it on its terms?
A new framework, formulated as an open standard that has just received the imprimatur of the Canadian government, is keeping data firmly rooted.
What is Zero-Copy Integration?
Zero-Copy Integration is an initiative championed by the Canadian collaborative data company Cinchy. It aims to overturn the enterprise software API integration paradigm with a totally new model — the company calls it dataware — that keeps data effectively rooted while removing complexity and data redundancy from the enterprise software integration process.
Benefits of Zero-Copy Integration
Proponents of Zero-Copy Integration and dataware say the framework will lower data storage costs, improve the performance of IT teams, improve the privacy and security of data, and drive innovation in systems for public health, social research, open banking and sustainability through advances in:
- Application development and enrichment.
- Predictive analytics.
- Digital twins.
- Customer 360 technology.
- Artificial intelligence and machine learning.
- Workflow automation.
- Legacy system modernization.
On Tuesday, Canada’s Digital Governance Council and the not-for-profit Data Collaboration Alliance, created by Cinchy, announced CAN/CIOSC 100-9, Data governance – Part 9: Zero-Copy Integration, a national standard approved by the Standards Council of Canada, to be published as an open standard.
Zero-Copy Integration seeks to eliminate API-driven data silos
The basic idea, according to Dan DeMers, Cinchy’s CEO, is that the framework aims to remove application data silos by using access-based data collaboration rather than standard API-based data integration, which involves copying data and wrapping it in complex app-specific code. This would be done through access controls set in the data layer. It would also involve:
- Data governance via data products and federated stewardship, not centralized teams.
- Prioritization of “data-centricity” and active metadata over complex code.
- Prioritization of solution modularity over monolithic design.
The initiative said viable projects for Zero-Copy Integration include the development of new applications, predictive analytics, digital twins, customer 360 views, AI/ML operationalization and workflow automations as well as legacy system modernization and SaaS application enrichment.
DeMers, who is also a technical committee member for the standard, promises a revolution in data.
“At some point in a world of increasing complexity, you fall off a cliff, so we believe we’re at the beginning of the simplification revolution,” he said. “The fact is that data is becoming increasingly central, and the way that we share it is with APIs and ETLs, which involves creating copies and vastly increases complexity and cost. It amounts to half the IT capacity of every complex organization on the planet, and every year it gets more expensive.”
He said even more concerning is that every time a copy is generated, a degree of control is lost.
“If I run a bank, and I have a thousand applications, and they all need to interact with some representation of my customer, and by doing that are copying that representation, I now have a thousand copies of that customer,” DeMers said. “How do I protect that?”
Security through Zero-Copy frameworks
Laws describing ownership of data limit how organizations or governments can use that data, but they are laws, not systematic controls, noted DeMers. A key point of the Zero-Copy Integration argument, and of Canada’s adoption of the framework in principle, is that it makes data security easier by limiting access and control.
“Zero Copy is a paradigm shift because it allows you to embed controls in the data itself,” DeMers said. “Because it’s access based, not copy based, access can be granted and it can be revoked, whereas copies are forever and you can quickly lose control over who has them, and any attempt to limit what organizations do when they obtain a copy is hard.”
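The distinction DeMers draws between access and copies can be illustrated with a minimal Python sketch. It is not Cinchy’s implementation or anything specified by the standard: the `Dataset` class, its methods and the `billing` app name are all hypothetical, chosen only to show that a grant can be withdrawn while an exported copy cannot.

```python
from dataclasses import dataclass, field


class AccessRevoked(Exception):
    """Raised when an app reads without a current grant."""


@dataclass
class Dataset:
    """Hypothetical dataset whose access controls live alongside the data."""
    records: dict
    grants: set = field(default_factory=set)

    def grant(self, app: str) -> None:
        self.grants.add(app)

    def revoke(self, app: str) -> None:
        self.grants.discard(app)

    def read(self, app: str, key: str):
        # Access-based: every read is checked against the current grants.
        if app not in self.grants:
            raise AccessRevoked(f"{app} has no access")
        return self.records[key]

    def export_copy(self, app: str) -> dict:
        # Copy-based: the caller leaves with a copy the dataset no longer controls.
        if app not in self.grants:
            raise AccessRevoked(f"{app} has no access")
        return dict(self.records)


customers = Dataset({"c1": "Ada Lovelace"})
customers.grant("billing")

copied = customers.export_copy("billing")  # taken while access was valid
customers.revoke("billing")                # owner withdraws access

try:
    customers.read("billing", "c1")
    access_still_works = True
except AccessRevoked:
    access_still_works = False

# The copy made earlier is unaffected by the revocation.
copy_still_readable = copied["c1"] == "Ada Lovelace"
```

After the revocation, the live read fails but the earlier copy remains fully readable, which is the loss of control the quote describes.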
Cinchy is aiming for a “data fabric architecture” to transform data warehouses, lakes and/or lake houses into repositories that can actualize both analytics and operational software. The goal is for apps to come to the data, not carry copies of it back to the application’s walled garden.
DeMers argued that the creation and storage of copies costs money, both because of the storage and data pipelines involved and because of the time IT has to spend managing the iterations of data generated by the hundreds or thousands of apps an enterprise may host.
“Copies of data require storage; the creation of the copy and synchronizing it not only uses storage, but also uses computation,” he said. “If you imagine most of the processes running on servers in the bank right now, they’re moving and reconciling copies of data, which constitutes energy use.”
He added that copying and moving data creates opportunities to introduce errors. If two systems connected by a data pipeline desync, then data can be lost or corrupted, reducing data quality. With one copy of the data used collectively by all systems, there’s no chance of records appearing differently in different contexts.
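A toy Python illustration of that desync risk follows. The record layout and the names `crm_copy` and `billing_view` are invented for the example, and an in-memory dictionary stands in for what would really be separate systems joined by a pipeline.

```python
import copy

# Hypothetical customer record held by the system of record.
source = {"c1": {"email": "ada@example.com"}}

# Copy-based integration: the CRM takes its own snapshot of the data.
crm_copy = copy.deepcopy(source)

# Access-based integration: billing reads the same records in place.
billing_view = source

# The customer updates their email, and no sync job has run yet.
source["c1"]["email"] = "ada@new.example.com"

# The snapshot now disagrees with the source; the shared view cannot.
copy_diverged = crm_copy["c1"]["email"] != source["c1"]["email"]
view_consistent = billing_view["c1"]["email"] == source["c1"]["email"]
```

Until a synchronization step runs, the snapshot holds a stale email address, while the shared view always reflects the current record.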
Is Zero-Copy Integration an L.A. subway dream?
Matt McLarty, chief technology officer of Salesforce’s MuleSoft, agrees that data replication is a perennial issue.
“Not even data replication, but the existence of semantically equivalent data in different places,” he said.
He sees it as a bit like Los Angeles and subways: A great idea in principle, but nobody is going to tear Los Angeles down and rebuild it around mass transit.
“It’s both a huge issue but also an unavoidable reality,” he said. “From a problem statement, yes, but I would say there are multiple categories of software in the space, including Salesforce Genie, all about how you harness all of the customer data widely dispersed across the ecosystem.”
Operational elephants and analytical zebras drinking from the same data lake
Most enterprises, explained McLarty, have two massive areas of data that, while not at cross purposes, need to live separately: operational data and analytical data. Operational data is employed by such user-facing applications as mobile banking; analytical data takes data out of the flow of operational activities and uses it for business analytics and intelligence.
“They have historically lived separately because of the processing differences,” he said. “Operationally, there’s high speed, high-scale processing and analytically, small internal groups crunching big numbers.”
DeMers explained that dataware, among other things, incorporates an “operational data fabric.” This, he said, enables “last time” integration from external data sources into an architecture based on a “network of datasets” that’s capable of powering unlimited business models.
“Once created, these models can be readily operationalized as metadata-based experiences or exposed as APIs to power low code and pro code UX designs,” he said, adding that it eliminates the need to stand up new databases, perform point-to-point data integration or set app-specific data protections.
“Another core concept associated with dataware technology is ‘collaborative intelligence,’ which is created as a result of users and connected systems, simultaneously enriching the information within the dataset network,” he said.
DeMers said users granted access to a dataset by its owners get an interface called a “data browser” offering a “self-serve experience.”
“In principle, this works a bit like Google Docs, where multiple colleagues collaborate on a white paper or business proposal while the software automatically offers grammatical suggestions and manages roles, permissions, versioning and backup,” he said.
DeMers added that the end result is super-enriched and auto-protected data that can be instantly queried by teams to power unlimited dashboards, 360 views and other analytics projects.
Will companies simplify or “embrace the chaos?”
By some estimates, companies are taking the “embrace the chaos” route to find new approaches that concede that the enterprise data frameworks will remain complex and L.A.-like. These include data mesh frameworks and automation and machine learning systems creating models that integrate different kinds of data.
“I think the biggest shift right now in the world of data is that the two worlds — analytical and operational — are colliding,” McLarty said. “What’s happening now, because of the big data movement and machine learning, is data-derived coding — writing code with data, ingesting data and producing machine learning models based on the data that I can put into my applications.”
DeMers said that the dataware paradigm enables data mesh concepts.
“Requiring a single team to manage every dataset in the organization is a sure path to failed data governance,” he said.
He also argued that in a data-centric organization, data stewards should reflect the granularity of your organization chart.
“This approach to federated data governance organized around data domains and data products is the data mesh, and it’s a big part of establishing a more agile enterprise,” DeMers said.
Data silos make this difficult because of the unrestricted point-to-point data integration that they involve.
Liberating data from the application
Sylvie Veilleux, former chief information officer of Dropbox, said data silos are a fundamental part of the Software as a Service ecosystem, but that is a problem dataware can solve.
“Every app solves a specific and unique purpose, and they are tending toward more and more specialization,” she said. “The more SaaS adoption continues, which is very healthy in terms of how the business gets access to tools, the more it’s continuously creating a hundred, thousand or more data silos in larger corporations. This number will continue to grow without us taking a whole new approach to how we think about data applications.”
She said dataware and Zero-Copy Integration allow enterprises to eliminate extra data integrations by having the app connect to a network data source.
“It changes how we work by pivoting the process from data being the captive of an application to keeping it on a network, thereby letting users collaborate, and giving businesses real-time access to it,” Veilleux said.