Data Has Gravity (AI)

The Obsession: An Introduction to Our Love for Data

Data has always fascinated us, even before the advent of the internet. We’ve been captivated by the idea of storing vast amounts of information digitally, marveling at the possibilities it holds. Our attachment to data stems from our desire to leave a lasting legacy, to pass on knowledge from one generation to another. We believe that through data, we can uncover new wisdom and transform the world. This article explores our obsession with data and its implications for our lives.

The Data Struggle: Challenges in Storing and Protecting Data

However, our obsession with data comes at a cost. Storing and protecting data has never been easy or free. The historical example of the Library of Alexandria illustrates how hard it is to preserve valuable information from destruction. Even in the modern era, with the exponential growth of data, managing and safeguarding it has become increasingly difficult. The scale of data today is hard to comprehend, and the mechanisms we’ve relied on to navigate it, such as search engines, have struggled to keep up. Reading “the whole internet” is about as feasible as a single ant eradicating an entire species of elephant. This growth presents significant challenges for storage, protection, and meaningful use.

Byte by Byte: The Rapid Expansion of Data

The last half-century has witnessed an exponential expansion of data. This growth has been made possible by assumptions like Moore’s Law, the observation that the number of transistors on a chip doubles roughly every two years, which for decades translated into steadily cheaper computing. However, this assumption has reached its limits, and the industry now grapples with the reality that we can no longer rely on exponential hardware gains to absorb exponential data growth. The consequences are becoming evident. Digital storage platforms, like Photobucket and YouTube, face the challenge of managing vast amounts of data that may no longer hold value. The scalability of data warehouses has shown us that not all data is worth keeping, and managing worthless data at scale becomes an insurmountable task.

This Is Fine: Data’s Gravity and the Rise of Data Lakes

Our insatiable appetite for data has transformed our data landscape from a desert to a jungle. We have come to realize that data has gravity, pulling us into a new reality. Organizations speak of “Data Lakes” as if they possess magical powers, a solution to all data management challenges. But in reality, data lakes have only created more complexity. They have become data planets with satellites orbiting around them, generating reflections that bounce between parties willing to pay for access. The result is often a “data swamp” where the integrity of the lake is compromised, rendering the generated connections and insights useless.

Root of All Evil: The Illusion of Infinite Data Expansion

To address the challenges posed by infinite data expansion, we must acknowledge the impossibility of keeping all data forever. While the solution of removing low-priority data might work in theory, it faces practical challenges. Ownership of data within organizations is complex, and data hierarchies make decision-making difficult. The advent of data lakes and data mesh principles can help, but they are not standalone solutions. Tools that allow organizations to assess the value of data and involve stakeholders in the decision-making process can help identify irrelevant data. However, achieving widespread participation and cultural change remains a significant hurdle.
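As a rough illustration of what such a value-assessment tool might do, here is a minimal Python sketch. Everything in it is an assumption made for the example: the dataset names, the 1-to-5 scoring scale, the `min_score` cutoff, and the `review_queue` helper itself are hypothetical and do not describe any particular product.

```python
from statistics import mean

def review_queue(votes, min_score=2.0):
    """Return dataset names whose mean stakeholder score is below min_score.

    `votes` maps a dataset name to the list of scores (1 = worthless,
    5 = critical) that stakeholders gave it. A dataset with no votes is
    treated as score 0.0, since nobody has claimed it.
    """
    scored = []
    for name, scores in votes.items():
        avg = mean(scores) if scores else 0.0
        if avg < min_score:
            scored.append((avg, name))
    # Lowest-valued datasets come first in the queue.
    return [name for avg, name in sorted(scored)]

# Hypothetical stakeholder votes for three datasets.
votes = {
    "clickstream_2014": [1, 2, 1],
    "billing_ledger": [5, 4, 5],
    "tmp_export_old": [],
}
queue = review_queue(votes)
print(queue)
```

A list like this only nominates candidates for removal. Deciding who owns the final call is the organizational problem described above, and no scoring function settles it.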

Disentangle, Disassociate: The Promise of Data Mesh

The concept of Data Mesh, proposed by Zhamak Dehghani, provides a promising direction for solving data management challenges. By treating data as a product and pushing management responsibilities to the teams that own it, organizations can start to address data issues effectively. However, at a societal or global scale, scalability becomes an obstacle. Organizations must find ways to commoditize their data within the mesh to prevent stale nodes. Cultural management of data mesh solutions remains a significant challenge, particularly when dealing with organizational changes and technical debt.

A New Thought: AI and Data Distillation

The field of AI research offers hope for addressing data challenges. Companies like Tesla generate vast amounts of data, but using all of it for training self-driving models would be impractical. AI provides a solution by distilling data, retaining only what is meaningful and compressing or removing less valuable data. The radical promise of AI lies in its ability to identify meaningful data based on inference activity. If data is never accessed or proven valuable during inference, it can be safely discarded or distilled into its core essence. This approach can be applied at scale once the right AI models and utilities are designed and widely available.
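The idea of judging data by its inference activity can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `inference_hits` counter, the `keep_threshold` value, and the three buckets are hypothetical, and a real system would need a far more careful definition of what counts as "valuable during inference".

```python
from dataclasses import dataclass

@dataclass
class Record:
    key: str
    size_bytes: int
    inference_hits: int  # hypothetical counter: reads during inference

def triage(records, keep_threshold=10):
    """Sort records into keep / distill / discard buckets by inference use."""
    plan = {"keep": [], "distill": [], "discard": []}
    for r in records:
        if r.inference_hits >= keep_threshold:
            plan["keep"].append(r.key)       # frequently used: retain as-is
        elif r.inference_hits > 0:
            plan["distill"].append(r.key)    # rarely used: compress or summarize
        else:
            plan["discard"].append(r.key)    # never used: candidate for removal
    return plan

# Hypothetical records, e.g. driving clips of different usefulness.
records = [
    Record("clip-001", 5_000_000, 42),
    Record("clip-002", 7_000_000, 3),
    Record("clip-003", 9_000_000, 0),
]
plan = triage(records)
print(plan)
```

The sketch uses a simple counter in place of the AI model. The argument above is that a model's own inference behavior could supply that signal at scale.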

Promises of Hope: AI as a Tool for Data Stewardship

While skepticism surrounding AI is warranted, there is genuine hope in its potential to revolutionize data stewardship. AI can alleviate the burden of managing data, enabling individuals to become better data stewards. The prospect of AI-powered robots cleaning up and organizing data, with minimal guidance, is becoming more plausible. These advancements in AI tools and capabilities can empower individuals to manage their data effectively. Leveraging AI’s potential, along with other existing tools and philosophies, organizations can address data challenges and shape the future of data management.

Conclusion: Cultivating Healthy Data Habits

Our love for data, like our love for money, can lead to ruin if not approached with healthy habits. We must develop responsible data management practices and utilize the available tools and resources to navigate the challenges. AI holds promise in alleviating some of these challenges, but it’s essential to maintain a critical mindset. By acknowledging the limitations of infinite data expansion and embracing new approaches like the data mesh, we can begin to shape a future where data is managed effectively. It’s crucial for organizations and individuals to engage in the ongoing conversation and prepare for the evolving landscape where AI interfaces may replace traditional search tools.

Written on June 5, 2023