Datavolo Announces Over $21M in Funding!

Datavolo Raises Over $21 Million in Funding from General Catalyst and others to Solve Multimodal Data Pipelines for AI

Phoenix, AZ, April 2, 2024 - Datavolo, the leader in multimodal data pipelines for AI, announced today that it has raised over $21 million in financing, led by General Catalyst, with participation from notable investors including Citi Ventures, Human Capital, Rob Bearden, and MVP Ventures. The company’s total funding to date includes Seed and Series A funding.

Organizations are assessing the opportunities for GenAI to dramatically transform their businesses and create customer value – increasing revenues and reducing costs simultaneously.  While AI models are rapidly iterating and advancing, their effectiveness at their core is constrained by their ability to access timely, secure, and complete data sets. According to an August 2023 IDC report, 90% of data generated by organizations is unstructured, yet enterprises today are heavily dependent on data pipeline software that is neither designed for nor capable of handling the unstructured data necessary for fully unlocking GenAI potential. 

Fortunately, Datavolo is built for this very task. Datavolo is powered by Apache NiFi, which was created at the National Security Agency (NSA) specifically to handle secure pipelines of multimodal data.  Over the last decade, NiFi has evolved to also handle the structured data needs of modern enterprises and is used by thousands of the largest and most secure corporations and agencies in the world. However, the use case of multimodal data pipelines for GenAI is akin to a homecoming for the Datavolo team, as it returns the product to the purpose for which it was originally created and to what sets it apart in the market.

The new capital enables the engineering team to focus on turning the foundational power of NiFi into a cloud-native managed service with specific capabilities and integrations for rapid development of multimodal data pipelines for the latest AI systems.  Both funding rounds have been led by General Catalyst, a prominent investor in next-generation data and analytic solutions.

“When AI systems become the backbone of daily business operations, it will be built on a data architecture which is multimodal and real time,” said Quentin Clark, Managing Director of General Catalyst. “Joe and Luke are not just building another data platform; they’re setting the stage for a future where data isn’t merely handled but intelligently harnessed to fulfill the evolving requirements driven by AI. We believe Datavolo has one of the best open-source teams out there, and has the product and partners in place to make this vision a reality.”

Datavolo’s founders have a long and deep history as leaders in the data and analytics space.  Joe Witt, CEO, was the creator in 2006 of the project that became Apache NiFi while working at the NSA.  He also founded Onyara, which was acquired by Hortonworks in 2015, and most recently was Corporate Vice President of Engineering for the Data-In-Motion portfolio at Cloudera.  Luke Roquet, COO, has been a senior sales and marketing executive in the data and analytics space since 2007 across innovative companies such as Oracle, Hortonworks, Unravel Data, AWS, and Cloudera.  Joe and Luke have worked with the largest and most trailblazing companies in the world to solve their data and AI challenges.  The founders share a passion for building cutting-edge products and, most importantly, making their customers wildly successful.

“At Citi Ventures, we have been investing in artificial intelligence and machine learning companies for over a decade. When we approached Datavolo, we were particularly excited by their ability to meet the needs of large enterprises like Citi,” says Vibhor Rastogi, Head of AI Investments at Citi Ventures. “Their scalable, flexible and secure multimodal data pipeline platform enables users to ingest, process, govern, schedule and track unstructured data from beginning to end, establishing a chain of custody for mission-critical generative AI retrieval-augmented generation (RAG) applications. These are key requirements for regulated and security-sensitive industries such as banking. Our investment in Datavolo is part of a commitment to exploring new generative AI products that may benefit the bank and its customers around the world.”

In addition, Datavolo is pleased to announce a private beta program for customers building Retrieval Augmented Generation (RAG) applications today. Ideal customers are those seeking a SaaS solution or operating within Amazon Web Services and looking to automate continuous capture, transformation, and loading of unstructured data from and to hundreds of systems out of the box.  To learn more about this private beta, please fill out the form at

“Luke and I feel fortunate to collaborate with exceptional investors and advisors, assembling an extraordinary team with deep enterprise expertise. Every team member is dedicated to the mission of advancing Generative AI applications tailored to the data-intensive needs of our customers,” says Datavolo co-founder Joe Witt.

About Datavolo

Founded in 2023, Datavolo helps customers rapidly build scalable and secure multimodal data pipelines for AI.  Datavolo was founded by Joe Witt, creator of the project that became Apache NiFi, and Luke Roquet, veteran sales and marketing leader in data and analytics.  Datavolo is powered by NiFi, which was originally developed at the NSA for the global acquisition, processing, and distribution of multimodal data.  Datavolo solves a foundational part of the GenAI tech stack for organizations looking to build secure and scalable AI applications. For more information, visit

About General Catalyst

General Catalyst is a venture capital firm that invests in powerful, positive change that endures — for our entrepreneurs, our investors, our people, and society.  We support founders with a long-term view who challenge the status quo, partnering with them from seed to growth stage and beyond to build companies that withstand the test of time. With offices in San Francisco, Palo Alto, New York City, London, Berlin and Boston, the firm has helped support the growth of businesses such as: Airbnb, Deliveroo, Guild, Gusto, Hubspot, Illumio, Lemonade, Livongo, Oscar, Samsara, Snap, Stripe, and Warby Parker. For more:


