Our blog

Insights and inspiration, case studies and community for AI/ML and Data Engineers.

How we use the Kubernetes Operator pattern

How we use the Kubernetes Operator pattern

Organizations using NiFi for business-critical workloads have deep automation, orchestration, and security needs that Kubernetes by itself cannot support. In this second installment of our Kubernetes series, we explore how the Kubernetes Operator pattern alleviates...

Constructing Apache NiFi Clusters on Kubernetes

Constructing Apache NiFi Clusters on Kubernetes

Introduction Clustering is a core capability of Apache NiFi. Clustered deployments support centralized configuration and distributed processing. NiFi 1.0.0 introduced clustering based on Apache ZooKeeper for coordinated leader election and shared state tracking. Among...

Prompt Injection Attack Explained

Prompt Injection Attack Explained

By now, it’s no surprise that we’ve all heard about prompt injection attacks affecting Large Language Models (LLMs). Since November 2023, prompt injection attacks have been wreaking havoc on many in house built chatbots and homegrown large language models. But what is...

Tutorial – How to Convert to ONNX®

Tutorial – How to Convert to ONNX®

Converting from Pytorch/Safetensors to ONNX® Given the advantages described in Onward With ONNX® we’ve taken the opinion that if it runs on ONNX that’s the way we want to go.  So while ONNX has a large model zoo we’ve had to convert a few models by hand.  Many models...

Survey Findings – Evolving Apache NiFi

Survey Findings – Evolving Apache NiFi

Survey of long time users to understand NiFi usage Datavolo empowers and enables the 10X Data Engineer. Today's 10X Data Engineer has to know about and tame unstructured and multi-modal data. Our core technology, Apache NiFi, has nearly 18 years of development,...

Apache NiFi – designed for extension at scale

Apache NiFi – designed for extension at scale

AI systems need data all along the spectrum of unstructured, structured, and multi-modal.  The protocols by which these diverse types of data are both acquired and delivered are as varied as the data types themselves.  At the same time data volumes and latency requirements grow ever stronger which demands solutions which scale down and up first – then out.  In other words we need maximum efficiency, we can’t resort to remote procedure calls for every operation, and we need to support hundreds if not thousands of different components or tools in the same virtual machine.