Datavolo's blog
What's in Datavolo's blog? Insights and inspiration, case studies and community for AI/ML and Data Engineers. Discover what we are talking about everyday here at Datavolo!
Continuous Integration for NiFi Flows in GitHub
Datavolo is proud to announce the release of a GitHub Action designed to help with Continuous Integration of Apache NiFi Flows and make reviewing of changes between two flow versions as easy as possible. At Datavolo, collaboration on the Flow Definitions is done by...
Apache NiFi frontend modernization complete
Apache NiFi's 2.0.0 release included several upgrades that make the platform faster, more secure, and easy to use. One thing that really stands out to us, however, is how transformational Apache NiFi's frontend modernization really is. The platform has been redesigned...
Next Generation Apache NiFi | NiFi 2.0.0 is GA
Apache NiFi is about to turn 10 years old as an Apache Software Foundation (ASF) project and it is in use by over 8,000 enterprises around the globe. No better time for this incredibly flexible and powerful framework to finalize its 2.0.0 version. Welcome to the Next...
Streaming Data to Iceberg From Any Source
New support for writing to Apache Polaris-managed Apache Iceberg tables enables Datavolo customers to stream transformed data from nearly any source system into Iceberg. Originally created by Snowflake, Polaris allows customers to use any query engine to access the...
What is LLM Insecure Output Handling?
The Open Worldwide Application Security Project (OWASP) states that insecure output handling neglects to validate large language model (LLM) outputs that may lead to downstream security exploits, including code execution that compromises systems and exposes data. This...
Data Ingestion Strategies for GenAI Pipelines
You did it! You finally led the charge and persuaded your boss to let your team start working on a new generative AI application at work and you’re psyched to get started. You get your data and start the ingestion process but right when you think you’ve nailed it, you...
How we use the Kubernetes Operator pattern
Organizations using NiFi for business-critical workloads have deep automation, orchestration, and security needs that Kubernetes by itself cannot support. In this second installment of our Kubernetes series, we explore how the Kubernetes Operator pattern alleviates...
Constructing Apache NiFi Clusters on Kubernetes
Introduction Clustering is a core capability of Apache NiFi. Clustered deployments support centralized configuration and distributed processing. NiFi 1.0.0 introduced clustering based on Apache ZooKeeper for coordinated leader election and shared state tracking. Among...
Prompt Injection Attack Explained
By now, it’s no surprise that we’ve all heard about prompt injection attacks affecting Large Language Models (LLMs). Since November 2023, prompt injection attacks have been wreaking havoc on many in house built chatbots and homegrown large language models. But what is...
Onward with ONNX® – How We Did It
Digging into new AI models is one of the most exciting parts of my job here at Datavolo. However, having a new toy to play with can easily be overshadowed by the large assortment of issues that come up when you’re moving your code from your laptop to a production...
Tutorial – How to Convert to ONNX®
Converting from Pytorch/Safetensors to ONNX® Given the advantages described in Onward With ONNX® we’ve taken the opinion that if it runs on ONNX that’s the way we want to go. So while ONNX has a large model zoo we’ve had to convert a few models by hand. Many models...
Survey Findings – Evolving Apache NiFi
Survey of long time users to understand NiFi usage Datavolo empowers and enables the 10X Data Engineer. Today's 10X Data Engineer has to know about and tame unstructured and multi-modal data. Our core technology, Apache NiFi, has nearly 18 years of development,...