Over the past decade, the world of data tools and infrastructure has exploded. As the founder of a cloud data infrastructure company in the early days of cloud computing in 2009, and the founder of a meetup community for early data engineering teams in 2013, I have been at the center of this community since before the term 'data engineer' was even coined. This position allows me to reflect on the lessons of the recent past of data tooling, and on how those lessons should inform the new era of AI.
In the anthropology of technology, 2013 sat between the 'big data' era and the 'modern data stack' era. In the big data era, as the name suggests, more data was better: data was believed to hold the analytical secrets that would create new value for your business.
As a strategy consultant for a large internet company, I was once tasked with combing through the data emitted by billions of DNS queries per day, finding the magic insights buried there, and developing a plan for what could become a new $100-million line of business for the company. Did we find those insights? Not in the relatively short time (months) we spent on the project. As it turns out, storing big data is relatively easy, but generating big insights takes a lot more work.
But not everyone had realized this yet. All businesses knew was that they couldn't play the insights game if their data house wasn't in order. As a result, businesses of all shapes and sizes rushed to shore up their data stacks, and the number of data tools on offer from vendors exploded. Each vendor claimed its solution was the missing piece of a truly holistic data stack, the one that could finally generate the kind of magical insights businesses were looking for.
I don't use the term 'explosion' lightly. In the recently released 2024 Machine Learning, AI and Data (MAD) landscape, author Matt Turck notes that when he first set out to create a market map in 2012, it included 139 companies selling data infrastructure tools and products. This year's version includes 2,011, a 14.5x increase!
Several things happened to shape our current data landscape. Enterprises began moving more of their on-premises workloads to the cloud, and modern data stack (MDS) vendors offered managed services with composable cloud offerings that promised customers higher reliability, greater system flexibility and the convenience of on-demand scaling.
However, as companies came through the zero interest rate policy (ZIRP) period and the number of data tool vendors multiplied, cracks began to appear in the MDS façade. Problems with system complexity (the result of so many tools), integration headaches (lots of point solutions that all need to talk to each other) and underutilized cloud services led some to question whether the MDS panacea could ever deliver on its promise.
Many Fortune 500 companies invested heavily in data infrastructure without a clear strategy for how to derive value from that data (remember, finding insights is hard!), resulting in inflated costs without proportional value. Collecting tools had simply become fashionable. We often hear of overlapping tools deployed across multiple teams within the same company. In business intelligence (BI), for example, many companies have installed Tableau, Looker and perhaps even a third tool that essentially accomplishes the same business objective, while paying three sets of invoices.
Of course, this kind of excess eventually ended when the ZIRP bubble burst. And yet the MAD landscape continues to grow unabated. Why?
What is the new ‘AI stack’?
One reason is that many data tools companies were well capitalized during the ZIRP period and can therefore keep operating despite tighter corporate budgets and reduced market demand for their services; startup failures and exits through mergers simply aren't reflected in the logo count yet.
But the main reason is that surging interest in AI has given rise to a wave of next-generation data tools. What is somewhat unique is that this new AI wave gained momentum, spawning many more new data tool companies, before the shakeout or consolidation of the last wave (the MDS) had even finished.
This is somewhat understandable, however, if, like me, you believe the 'AI stack' is a fundamentally new paradigm. At a high level, AI is driven by huge amounts of unstructured data (think internet-sized piles of text, images and video), whereas the MDS was built for smaller amounts of structured data (think tabular data in spreadsheets or databases).
Additionally, the so-called non-deterministic, or 'generative,' nature of AI models is completely different from the deterministic approaches designed for more traditional machine learning (ML) models. Those older models are typically designed to predict an outcome from a limited training data set, while new generative AI models are designed to synthesize summaries or generate insights. This means the output may differ each time the model is run, even when the inputs have not changed. To see this for yourself, note the different answers you get when you ask ChatGPT the same question more than once.
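A minimal sketch makes this concrete. The snippet below is my own illustration, not code from any particular vendor's stack: it sends the same prompt to a model three times using OpenAI's Python SDK. The model name and prompt are placeholder choices, and it assumes an OPENAI_API_KEY is set in the environment.

```python
# Illustrative sketch of generative non-determinism: the same prompt,
# sampled three times, typically returns differently worded answers.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = "Summarize the trade-offs of the modern data stack in one sentence."

for i in range(3):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,  # sampling at the default-ish temperature keeps outputs varied
    )
    print(f"Run {i + 1}: {response.choices[0].message.content}\n")
```

Lowering the temperature makes outputs more repeatable, but does not make them deterministic in the way a SQL query over a fixed table is.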
Because the architecture and output of AI models are fundamentally different, developers must adopt a new paradigm for testing and evaluating model responses against the original intent of the user or application (a minimal sketch of one such check follows the list below), to say nothing of ensuring the ethical safety, governance and monitoring of AI systems. Additional areas of the new AI stack that warrant further investigation include:

- Agent orchestration (AI models talking to other models)
- Small, purpose-built models for vertical use cases, disrupting traditional industries that were previously too costly and complex to automate
- Fine-tuning that companies can use to 'embed' their own data to create custom models
- Workflow tools that enable the collection and management of data sets
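As promised above, here is a hypothetical sketch of what intent-based evaluation can look like: an 'LLM-as-judge' style check. This is my illustration of the testing paradigm described here, not a specific product's API; the judge model name is a placeholder, and it assumes the same OpenAI SDK setup as the previous example.

```python
# Illustrative sketch: evaluate a non-deterministic model output against
# the user's original intent, rather than against an exact expected string.
from openai import OpenAI

client = OpenAI()

def judge_response(user_intent: str, model_output: str) -> bool:
    """Ask a second model whether an output satisfies the user's intent."""
    verdict = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge model
        messages=[{
            "role": "user",
            "content": (
                f"User intent: {user_intent}\n"
                f"Model output: {model_output}\n"
                "Does the output satisfy the intent? Answer YES or NO."
            ),
        }],
        temperature=0.0,  # keep the judge as repeatable as possible
    )
    return verdict.choices[0].message.content.strip().upper().startswith("YES")

# A differently worded answer can still pass, as long as it matches the intent.
print(judge_response(
    "A one-sentence summary of modern data stack trade-offs",
    "The modern data stack trades simplicity for composability and scale.",
))
```

The point of the sketch is the shift it represents: traditional test suites assert exact outputs, while generative systems need evaluations that tolerate variation yet still catch responses that miss the user's intent.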
As new developer platforms emerge, all of these opportunities and more will be addressed as part of the new AI stack. Hundreds of startups are already working to address these challenges by building new, cutting-edge tools.
How can we build better and smarter this time?
As we enter this new 'AI era,' I think it's important to recognize where we've come from. After all, data is the mother of AI, and the countless data tools of recent history have at least given businesses a solid foundation to build on. We are firmly on the path to treating data as a first-class citizen. But I keep asking myself: 'How can we avoid the tool overload of the past as we continue the march toward an AI future?'
One suggestion is for companies to be clear about the specific value they expect a given data or AI tool to provide to their business. Overinvesting in technology trends for the wrong reasons is never a good business strategy. With AI currently sucking all the air out of enterprise IT and software budgets, it is important to focus on deploying the tools that can prove clear value and real ROI.
Another suggestion is for entrepreneurs to stop building 'me too' data and AI tools. If the market you are considering entering already has several tools in it, take some time to ask yourself: 'Are we the best founding team, with unique, differentiated experience that gives us key insights into how to solve this problem?' If the answer isn't a resounding yes, don't pursue building that tool, no matter how much money a VC is willing to invest.
Finally, investors are encouraged to think carefully about where value will accrue across the various layers of the data and AI tooling stack before investing in early-stage companies. Too often we see VCs with single-checkbox criteria: if the founder building the tool has a certain pedigree or comes from a certain technology company, they write a check immediately. This is lazy, and it creates the glut of homogeneous data tools crowding the market. No wonder you need a magnifying glass to read the 2024 MAD landscape.
A speaker at a recent conference suggested that companies ask themselves, 'What would it cost my business if even one row of data were inaccurate?' In other words, can you outline a clear framework for quantifying the value of data, and of data tools, to your business?
If we can't get there, no amount of budget or venture capital invested in data and AI tools will solve our mess.
Pete Soderling is the founder and general partner of Zero Prime Ventures.