Introduction
Support for Processors in native Python is one of the most notable new features in Apache NiFi 2. Each milestone version of NiFi 2.0.0 has enhanced Python integration, with milestone 3 introducing support for loading Python Processors from NiFi Archive files. NiFi 2.0.0-M3 aligns Python Processor loading with Java component loading, which provides a solid foundation for scalable extensibility.
Announcing the Hatch Datavolo NAR Plugin
To accompany the release of Apache NiFi 2.0.0-M3, Datavolo published the Hatch Datavolo NAR project, which provides a builder plugin for the Hatch project management tool. As a project of the Python Packaging Authority, Hatch supports building, managing, and publishing Python components. With a few additions to a Python project configuration, the Hatch Datavolo NAR plugin builds NiFi Archives and packages Python dependencies for subsequent deployment. Hatch supports continuous integration and delivery, making the Datavolo NAR plugin a natural fit for building maintainable solutions with Apache NiFi.
Hatch is not alone in the world of Python project management solutions, but with support for major operating systems, best practices for development lifecycle operations, and extensibility for additional features, it provides a straightforward solution for developing libraries and applications. Getting started with Hatch is easy with its template-based project creation command.
Packaging Custom Python Processors
Configuring a project with the Hatch Datavolo NAR plugin for packaging Python Processors involves straightforward updates to the project configuration.
A single Hatch command creates a new project for Python Processors.
hatch new processors
The command creates a project directory and prints the structure as follows.
processors
├── src
│   └── processors
│       ├── __about__.py
│       └── __init__.py
├── tests
│   └── __init__.py
├── LICENSE.txt
├── README.md
└── pyproject.toml
The pyproject.toml configuration includes default values and placeholders for project metadata.
Adding hatch-datavolo-nar to the list of required libraries for the project build system enables the nar target argument for the hatch build command.
[build-system]
requires = ["hatchling", "hatch-datavolo-nar"]
The project configuration also requires a target section listing the package directory containing Python Processor classes.
[tool.hatch.build.targets.nar]
packages = ["src/processors"]
The last step required before creating the Python Processor class itself is defining dependencies. The following configuration enables packaging and using the Python HTTP requests library.
[project]
dependencies = ["requests"]
The Apache NiFi Python Developer’s Guide provides examples of custom Processor classes to get started.
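As an illustration of what such a class can look like, the following sketch follows the FlowFileTransform pattern described in the Developer's Guide. The class name, description, tags, and attribute name are hypothetical, and the file could be saved under src/processors with any module name. The nifiapi module is only available inside the NiFi Python runtime, so the fallback classes in the except branch are stand-ins added here purely so the sketch can be exercised outside NiFi.

```python
try:
    # Provided by the NiFi Python runtime when the Processor is loaded.
    from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult
except ImportError:
    # Hypothetical stand-ins for local experimentation only; not the NiFi API.
    class FlowFileTransform:
        pass

    class FlowFileTransformResult:
        def __init__(self, relationship, attributes=None, contents=None):
            self.relationship = relationship
            self.attributes = attributes
            self.contents = contents


class UppercaseContent(FlowFileTransform):
    """Example Processor that converts FlowFile text content to upper case."""

    class Java:
        implements = ["org.apache.nifi.python.processor.FlowFileTransform"]

    class ProcessorDetails:
        version = "0.0.1"
        description = "Converts FlowFile text content to upper case."
        tags = ["example", "text"]

    def __init__(self, **kwargs):
        pass

    def transform(self, context, flowfile):
        # Read the incoming content, assuming it is UTF-8 encoded text.
        text = flowfile.getContentsAsBytes().decode("utf-8")
        return FlowFileTransformResult(
            relationship="success",
            contents=text.upper(),
            attributes={"text.transformed": "uppercase"},
        )
```

The sketch does not use the requests library. A Processor that calls an HTTP service would import requests in the same module and rely on the dependency declared in the project configuration.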
Running the hatch build command with the nar target downloads declared project dependencies for packaging together with custom Python Processors.
hatch build --target nar
The build command creates a versioned NAR in the dist directory. Copy the NAR file to the extensions directory of an Apache NiFi installation to start building a custom data pipeline.
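Before copying the archive, it can be useful to confirm what was packaged. NAR files use the ZIP format, so the Python standard library can list the entries. The following is a minimal sketch, with list_nar_entries as a hypothetical helper name. It assumes the script runs from the project root after a successful build, and it makes no assumption about the specific paths inside the archive.

```python
import zipfile
from pathlib import Path


def list_nar_entries(nar_path):
    """Return the sorted entry names stored in a NAR file (hypothetical helper)."""
    with zipfile.ZipFile(nar_path) as archive:
        return sorted(archive.namelist())


if __name__ == "__main__":
    # Assumption: the build output is in the dist directory of the current project.
    for nar in sorted(Path("dist").glob("*.nar")):
        print(nar.name)
        for entry in list_nar_entries(nar):
            print("  " + entry)
```

The listing should include the Processor modules from the configured package directory along with the packaged dependencies.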
Conclusion
The Hatch Datavolo NAR project is open sourced under the Apache License Version 2.0. The source code is available in the hatch-datavolo-nar project on GitHub. Datavolo publishes project releases to the Python Package Index, using Hatch for build automation.
The NAR plugin for Hatch enables shift-left security for Python Processors. The project highlights Datavolo’s commitment to enterprise security for data pipeline development and deployment. Packaging code and dependencies together enables scanning that reduces the risks surrounding custom code, and supports repeatable deployments based on software development best practices.