
Datavolo Announces Over $21M in Funding!

Datavolo Raises Over $21 Million in Funding from General Catalyst and others to Solve Multimodal Data Pipelines for AI

Phoenix, AZ, April 2, 2024 – Datavolo, the leader in multimodal data pipelines for AI, announced today that it has raised over $21 million in financing, led by General Catalyst, with participation from notable investors including Citi Ventures, Human Capital, Rob Bearden, and MVP Ventures. The company’s total funding to date includes Seed and Series A funding.

Organizations are assessing the opportunities for GenAI to dramatically transform their businesses and create customer value – increasing revenues and reducing costs simultaneously.  While AI models are rapidly iterating and advancing, their effectiveness at their core is constrained by their ability to access timely, secure, and complete data sets. According to an August 2023 IDC report, 90% of data generated by organizations is unstructured, yet enterprises today are heavily dependent on data pipeline software that is neither designed for nor capable of handling the unstructured data necessary for fully unlocking GenAI potential. 

Fortunately, Datavolo is built for this very task. Datavolo is powered by Apache NiFi, which was created at the National Security Agency (NSA) specifically to handle secure pipelines of multimodal data. Over the last decade, NiFi has evolved to also handle the structured data needs of modern enterprises and is used by thousands of the largest and most secure corporations and agencies in the world. However, the use case of multimodal data pipelines for GenAI is akin to a homecoming for the Datavolo team, as it returns the product to its unique differentiation in the market and to the purpose for which it was originally created.

The new capital enables the engineering team to focus on building the foundational power of NiFi into a cloud-native managed service with specific capabilities and integrations for rapid development of multimodal data pipelines for the latest AI systems. Both funding rounds have been led by General Catalyst, a prominent investor in next-generation data and analytics solutions.

“When AI systems become the backbone of daily business operations, it will be built on a data architecture which is multimodal and real time,” said Quentin Clark, Managing Director of General Catalyst. “Joe and Luke are not just building another data platform; they’re setting the stage for a future where data isn’t merely handled but intelligently harnessed to fulfill the evolving requirements driven by AI. We believe Datavolo has one of the best open-source teams out there, and has the product and partners in place to make this vision a reality.”

Datavolo’s founders have a long and deep history as leaders in the data and analytics space. Joe Witt, CEO, was the creator in 2006 of the project that became Apache NiFi while working at the NSA. He also founded Onyara, which was acquired by Hortonworks in 2015, and most recently was Corporate Vice President of Engineering for the Data-In-Motion portfolio at Cloudera. Luke Roquet, COO, has been a senior sales and marketing executive in the data and analytics space since 2007 across innovative companies such as Oracle, Hortonworks, Unravel Data, AWS, and Cloudera. Joe and Luke have worked with the largest and most trailblazing companies in the world to solve their data and AI challenges. The founders share a passion for building cutting-edge products and, most importantly, making their customers wildly successful.

“At Citi Ventures, we have been investing in artificial intelligence and machine learning companies for over a decade. When we approached Datavolo, we were particularly excited by their ability to meet the needs of large enterprises like Citi,” says Vibhor Rastogi, Head of AI Investments at Citi Ventures. “Their scalable, flexible and secure multimodal data pipeline platform enables users to ingest, process, govern, schedule and track unstructured data from beginning to end, establishing a chain of custody for mission-critical generative AI retrieval-augmented generation (RAG) applications. These are key requirements for regulated and security-sensitive industries such as banking. Our investment in Datavolo is part of a commitment to exploring new generative AI products that may benefit the bank and its customers around the world.”

In addition, Datavolo is pleased to announce a private beta program for customers building Retrieval Augmented Generation (RAG) applications today. Ideal customers are those seeking a SaaS solution or operating within Amazon Web Services and looking to automate continuous capture, transformation, and loading of unstructured data from and to hundreds of systems out of the box.  To learn more about this private beta, please fill out the form at https://datavolo.io/contact-us/.

“Luke and I feel fortunate to collaborate with exceptional investors and advisors, assembling an extraordinary team with deep enterprise expertise. Every team member is dedicated to the mission of advancing Generative AI applications tailored to the data-intensive needs of our customers,” says Datavolo co-founder Joe Witt.

About Datavolo

Founded in 2023, Datavolo helps customers rapidly build scalable and secure multimodal data pipelines for AI. Datavolo was founded by Joe Witt, creator of the project that became Apache NiFi, and Luke Roquet, veteran sales and marketing leader in data and analytics. Datavolo is powered by NiFi, which was originally developed at the NSA for the global acquisition, processing, and distribution of multimodal data. Datavolo solves a foundational part of the GenAI tech stack for organizations looking to build secure and scalable AI applications. For more information, visit datavolo.io.

About General Catalyst

General Catalyst is a venture capital firm that invests in powerful, positive change that endures — for our entrepreneurs, our investors, our people, and society.  We support founders with a long-term view who challenge the status quo, partnering with them from seed to growth stage and beyond to build companies that withstand the test of time. With offices in San Francisco, Palo Alto, New York City, London, Berlin and Boston, the firm has helped support the growth of businesses such as: Airbnb, Deliveroo, Guild, Gusto, Hubspot, Illumio, Lemonade, Livongo, Oscar, Samsara, Snap, Stripe, and Warby Parker. For more: www.generalcatalyst.com.

