Introducing our GenAI NiFi Flow Builder!

Hey everyone, it’s been an incredible journey over the past ten years since we open-sourced Apache NiFi. Right from the beginning, our mission with NiFi was crystal clear: to make it easier for all of you to gather data from anywhere and transport it to wherever it’s needed, all while ensuring the data is prepared for consumption. We wanted to achieve this in a way that allowed you to build powerful data pipelines quickly and with complete transparency through a visual interface.

Today, I want to share with you how, with the advent of large language models, we’re taking data integration to the next level. We’re making it even more accessible and helping you create robust data flows in record time. Many of you have experienced firsthand how NiFi has transformed your data processing tasks, turning what used to take weeks or months into mere hours or days. And now, with the introduction of Datavolo, we’re on a mission to make it happen even faster.

Our goal is simple: to reduce the time it takes to build data processing from hours or days down to minutes, or even less. At Datavolo, we’re reimagining the user experience to make it far easier to use. This encompasses everything from setting up and scaling NiFi clusters to building and monitoring data flows. And let me tell you, our all-new AI-powered flow generation capabilities are just the beginning.

Let’s dive into a real-world demonstration to see Datavolo in action:

In this video, I’m going to take you to the Datavolo community workspace on Slack, where we interact with our FlowGen user. We’re going to request the creation of a data flow that extracts JSON data from an S3 bucket we call “NiFi notes.” We know that the data might be compressed, so we specify that the flow should include a decompression step. Additionally, we want to reformat the transaction date field and convert all field names to uppercase. Lastly, the data should be pushed into a Postgres database using the transactions table.

As soon as we send the request, we receive a quick acknowledgment, signaling that Datavolo is starting to work on it. Sometimes the messages from Datavolo can be a bit quirky, but they’re always entertaining!

In no time, we receive the generated flow, which we can download and add to our NiFi canvas. It’s named “S3 JSON to Postgres,” and our first task is to check its accuracy. Because the flow is AI-generated, there’s always a possibility of minor errors. However, the overall structure appears sound: listing S3 bucket contents, fetching files, identifying compressed files and decompressing them when necessary, reformatting dates, converting field names to uppercase, and finally, pushing data into Postgres.
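To make the middle of that flow concrete, here is a minimal Python sketch of the three transformation steps (decompress when necessary, reformat the date, uppercase the field names). It is only an illustration of the logic, not the flow that FlowGen generates and not NiFi code. The field name `transaction_date`, the input date format `MM/DD/YYYY`, the output format `YYYY-MM-DD`, gzip as the compression type, and the function names are all assumptions made for this example; the demo does not specify them.

```python
import gzip
import json
from datetime import datetime

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def maybe_decompress(payload: bytes) -> bytes:
    """Gunzip the payload if it looks like gzip; otherwise return it unchanged."""
    if payload[:2] == GZIP_MAGIC:
        return gzip.decompress(payload)
    return payload


def transform_record(record: dict) -> dict:
    """Reformat the (assumed) transaction_date field, then uppercase all field names."""
    out = dict(record)
    raw_date = out.get("transaction_date")  # hypothetical field name
    if raw_date is not None:
        # Assumed formats: MM/DD/YYYY in, YYYY-MM-DD out.
        parsed = datetime.strptime(raw_date, "%m/%d/%Y")
        out["transaction_date"] = parsed.strftime("%Y-%m-%d")
    return {key.upper(): value for key, value in out.items()}


def process(payload: bytes) -> list:
    """Decode a JSON array of objects (possibly gzipped) and transform each record."""
    records = json.loads(maybe_decompress(payload))
    return [transform_record(record) for record in records]


if __name__ == "__main__":
    sample = json.dumps(
        [{"id": 1, "transaction_date": "03/15/2024", "amount": 9.5}]
    ).encode("utf-8")
    print(process(gzip.compress(sample)))
    # [{'ID': 1, 'TRANSACTION_DATE': '2024-03-15', 'AMOUNT': 9.5}]
```

Checking the gzip magic bytes mirrors the "identify, then decompress" step: uncompressed files pass through untouched, so the same path handles both kinds of input.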

Configuring parameters is a breeze, with Datavolo smartly assisting in their creation. We enable controller services, and the flow is ready to go.

Remarkably, it takes just about two minutes to build and configure this fully functional data flow, and it executes all the specified tasks, from sourcing data in S3 to loading it into Postgres, in mere milliseconds. This speed and simplicity demonstrate that you don’t need to be a NiFi expert to harness Datavolo’s capabilities.

This development in data integration is nothing short of revolutionary, and I couldn’t be more excited to share it with all of you. Datavolo’s ability to expedite and simplify data processing tasks holds immense promise for businesses and individuals alike. If you’re eager to learn more and experience this capability for yourself, please visit Datavolo.io and reach out to us. We can’t wait to hear from you and help you revolutionize your data processing workflows.
