FlowGen Improvements (already!)

In the past week, since Datavolo released its Flow Generation capability, we’ve witnessed fantastic adoption as users have eagerly requested flows from the Flow Generation bot. We’re excited to share that we have recently upgraded our models, enhancing both the power and accuracy of flow generation. Additionally, we’ve introduced several key new features.

One of the most notable improvements is the enhanced accuracy of the model. Specifically, we have refined the process of selecting the most relevant Processors for sources and sinks. Furthermore, we have fine-tuned the logic for identifying the appropriate Controller Service for your specific use case.

The significant improvements in accuracy alone have justified the release of a new model version. The flow requests we’ve received through Slack have been incredibly insightful and have led us to enable additional capabilities. For instance, you can now request the Flow Generator to create a flow that utilizes NiFi’s Stateless Execution Engine. This engine offers various runtime trade-offs, most notably shifting the “transactional boundary” from the Processor level to the Process Group level. This allows for the consumption of messages from durable stores like Apache Kafka, JMS, or Amazon Kinesis without acknowledging the messages until processing is complete. Consequently, messages will be redelivered in case of processing failures.
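To make the shifted transactional boundary concrete, here is a minimal Python sketch of the idea. It does not use NiFi or any NiFi API: `InMemorySource`, `run_group`, and `flaky_upper` are hypothetical names invented for this illustration, and the in-memory queue merely stands in for a durable store such as Kafka. The point it shows is that a message is acknowledged only after every step in the group has succeeded, so a failure partway through leaves the message available for redelivery.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class InMemorySource:
    """Hypothetical stand-in for a durable store (not a real Kafka/JMS client)."""

    def __init__(self, messages: List[str]) -> None:
        self._pending: Deque[str] = deque(messages)
        self.acknowledged: List[str] = []

    def poll(self) -> Optional[str]:
        # Return the next message without removing it; it stays pending until ack().
        return self._pending[0] if self._pending else None

    def ack(self) -> None:
        # Acknowledge the current message so it will not be delivered again.
        self.acknowledged.append(self._pending.popleft())


def run_group(
    source: InMemorySource,
    steps: List[Callable[[str], str]],
    sink: List[str],
    max_attempts: int = 3,
) -> None:
    """Run every message through all steps, acknowledging only after full success."""
    attempts = 0
    while (message := source.poll()) is not None:
        try:
            result = message
            for step in steps:
                result = step(result)
            sink.append(result)
        except Exception:
            attempts += 1
            if attempts >= max_attempts:
                raise
            continue  # never acknowledged, so poll() hands back the same message
        source.ack()  # the whole group succeeded: commit at the group boundary
        attempts = 0


# Usage: the second step fails once for "b", which is then redelivered.
failures = {"b": 1}


def flaky_upper(value: str) -> str:
    if failures.get(value, 0) > 0:
        failures[value] -= 1
        raise RuntimeError("transient failure")
    return value.upper()


source = InMemorySource(["a", "b", "c"])
sink: List[str] = []
run_group(source, [str.strip, flaky_upper], sink)
print(sink)  # ['A', 'B', 'C']
```

In this toy version the acknowledgement sits outside the individual steps, which mirrors the Process Group level boundary described above. Acknowledging after each step instead would correspond to a Processor level boundary.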

The ability to have NiFi process a single FlowFile at a time can be achieved through several approaches. The Flow Generation bot can now handle this task for you if you request it. For example, you might ask it to “Create a flow that <insert processing logic>… Only process one file at a time.”

At Datavolo, we are dedicated to providing the best and most accurate flows possible. However, we acknowledge that anything generated using Generative AI may contain inaccuracies, and that accuracy can vary with the circumstances. We have enhanced our bot to make those circumstances more transparent.

If the bot cannot find a suitable Processor for a specific task, it now conveys this information, along with helpful insights, in the Slack message. In other cases, the model will select a Processor and indicate in the NiFi flow that this particular Processor should be carefully reviewed. For instance, consider a scenario where there is a typo in your message, and you ask the bot to “Generate a flow that fetches data from S3, flumps the data, and then sends it to GCS.”

The model may omit the “flumping the data” step and include a warning in the message, such as “The term ‘flump’ is not a standard data processing term, and it is not clear what specific transformation it refers to. Assuming it means a generic transformation or processing, we can use a processor like JoltTransformRecord or ScriptedTransformRecord to apply the required transformation.” Alternatively, it may insert a JoltTransformRecord Processor and prominently label it with a warning in the generated flow.

Even without typos, there may be situations where the model’s confidence is low. For instance, you might ask it to send data to an endpoint it is not familiar with, or to perform a transformation it is uncertain about.

While flow generation is a powerful capability on its own, the ability to swiftly identify and highlight areas that require particular attention translates into even faster time to production!

We are thrilled not only to offer this capability but also to see so many users embracing it eagerly and to watch improvements emerge so rapidly. If you haven’t already, we invite you to join our Slack Community and experience this capability for yourself!
