NiFi Flow GitHub Action

Continuous Integration for NiFi Flows in GitHub

Datavolo is proud to announce the release of a GitHub Action designed to support Continuous Integration of Apache NiFi flows and to make reviewing the changes between two flow versions as easy as possible.

At Datavolo, collaboration on flow definitions happens through Registry Clients that connect directly to code repositories. We currently provide two options:

  • GitHub Registry Client
  • GitLab Registry Client

The idea of a Registry Client is to connect NiFi directly to a repository and use that repository to store and version flow definitions. A demo on our YouTube channel uses the GitHub Registry Client as an example to show the flow versioning capabilities of NiFi.

Branching

Let’s consider a relatively simple approach for branching:

  • There is a main branch for what is deployed in the production environment
  • There is a dev branch for the work being done in the development environment
  • Additional feature-xxx branches are created when working on a new feature of the flow

A fairly common strategy is to create a feature branch from the dev branch when a new feature needs to be developed in an existing flow. This lets multiple people work on different features of the same flow at the same time.

To do that, the following steps would be executed:

  • In the code repository used by the configured Registry Client, the user creates the feature branch.
  • Once done, the user goes back to the NiFi UI and imports the flow from the feature branch as a new process group.
  • At this point, the user can go into the process group, start working on the changes, and commit them whenever required.

Once the final changes for the feature are done and the last commit has landed on the feature branch, it is time to go into the code repository and open a pull request from the feature branch against the dev branch.

A demo is available on our YouTube channel where we discuss the concept of branching with the GitHub Registry Client.

Pull Requests and reviewing changes

In NiFi, a flow definition is a JSON file, and comparing two JSON files is not the easiest thing to do. When opening a pull request, the author is expected to clearly describe the submitted changes, but the reviewers also look at the differences between the two JSON files to decide whether to accept those changes.

To help with reviewing changes, Datavolo provides a GitHub Action that compares the two flow definitions and automatically adds a comment to the pull request with a human-readable description of the changes.
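To give a feel for what comparing two flow definitions involves, here is a minimal Python sketch. It is not the implementation used by the Datavolo GitHub Action. It assumes that a flow definition JSON file has a top-level flowContents object containing processors and nested processGroups, and that each processor carries an identifier, a name, and a properties map. The function names (load_flow, index_processors, describe_changes) are hypothetical and exist only for this illustration.

```python
import json


def load_flow(path):
    """Read a flow definition JSON file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def index_processors(group, path=""):
    """Map processor identifier -> (location, processor), walking nested groups."""
    location = f"{path}/{group.get('name', '')}"
    found = {}
    for proc in group.get("processors", []):
        found[proc["identifier"]] = (location, proc)
    for child in group.get("processGroups", []):
        found.update(index_processors(child, location))
    return found


def describe_changes(old_flow, new_flow):
    """Return human-readable lines describing processor-level differences."""
    # Assumption: processors live under the top-level "flowContents" object.
    old = index_processors(old_flow["flowContents"])
    new = index_processors(new_flow["flowContents"])
    lines = []
    for ident in sorted(new.keys() - old.keys()):
        location, proc = new[ident]
        lines.append(f"Added processor '{proc['name']}' in {location}")
    for ident in sorted(old.keys() - new.keys()):
        location, proc = old[ident]
        lines.append(f"Removed processor '{proc['name']}' from {location}")
    for ident in sorted(old.keys() & new.keys()):
        before = old[ident][1]
        after = new[ident][1]
        old_props = before.get("properties", {})
        new_props = after.get("properties", {})
        for key in sorted(old_props.keys() | new_props.keys()):
            if old_props.get(key) != new_props.get(key):
                lines.append(
                    f"Processor '{after['name']}': property '{key}' changed "
                    f"from {old_props.get(key)!r} to {new_props.get(key)!r}"
                )
    return lines
```

Calling describe_changes(load_flow("old.json"), load_flow("new.json")) on two versions of the same flow would return one line per added, removed, or reconfigured processor. A real comparison also has to cover connections, controller services, parameters, and more, which this sketch deliberately leaves out.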

To configure this GitHub Action, create a file .github/workflows/flowdiff.yml in the repository used to version your NiFi flow definitions. The content of the file can be found on the Datavolo Flow Diff GitHub Action page.

When a pull request is opened to review changes for a new flow version, the GitHub Action is triggered automatically. It compares the two versions and comments on the pull request with a comprehensive description of the changes.
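For readers curious about how an automated comment can end up on a pull request, the sketch below builds (without sending) the corresponding call to the GitHub REST API, which creates pull request comments through the issues comments endpoint. This illustrates the general mechanism only and is not necessarily how the Datavolo action posts its comment. The function name build_comment_request, the repository name, and the token value are hypothetical.

```python
import json
import urllib.request


def build_comment_request(repo, pr_number, body, token):
    """Build (but do not send) a request that adds a comment to a pull request."""
    # Pull request conversation comments are created via the issues endpoint.
    url = f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments"
    payload = json.dumps({"body": body}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


# Hypothetical values for illustration only.
request = build_comment_request(
    "example-org/nifi-flows", 12, "Flow changes: 1 processor added", "<token>"
)
```

Passing the resulting request to urllib.request.urlopen would post the comment. Inside a GitHub Actions workflow, the token would typically come from the workflow's own credentials rather than being hard-coded.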

A video demo of this feature accompanies this post.

This GitHub Action is free to use by anyone and contributions are welcome!

Datavolo is working on a lot of new features to help with CI/CD pipelines when it comes to Apache NiFi, so stay tuned and have a look at our documentation.
