Datavolo's blog
What's in Datavolo's blog? Insights and inspiration, case studies and community for AI/ML and Data Engineers. Discover what we are talking about every day here at Datavolo!
Generative AI – State of the Market – June 17, 2024
GenAI in the enterprise is still in its infancy. The excitement and potential are undeniable. However, enterprises have struggled to derive material value from GenAI, and the hype surrounding this technology is waning. We have talked with hundreds of organizations...
Secure Data Pipeline Observability in Minutes
Monitoring data flows for Apache NiFi has evolved quite a bit since its inception. What started generally with logs and processors sprinkled throughout the pipeline grew to Prometheus REST APIs and a variety of Reporting Tasks. These components pushed NiFi closer to...
How to Package and Deploy Python Processors for Apache NiFi
Support for Processors in native Python is one of the most notable new features in Apache NiFi 2. Each milestone version of NiFi 2.0.0 has enhanced Python integration, with milestone 3 introducing support for loading Python Processors from NiFi Archive...
Troubleshooting Custom NiFi Processors with Data Provenance and Logs
We at Datavolo like to drink our own champagne, building internal tooling and operational workflows on top of the Datavolo Runtime, our distribution of Apache NiFi. We’ve written about several of these services, including our observability pipeline and Slack chatbots....
Apache NiFi – designed for extension at scale
AI systems need data from across the full spectrum: unstructured, structured, and multimodal. The protocols by which these diverse types of data are acquired and delivered are as varied as the data types themselves. At the same time, data volumes keep growing and latency requirements keep tightening, which demands solutions that scale down and up first, then out. In other words, we need maximum efficiency: we can’t resort to remote procedure calls for every operation, and we need to support hundreds if not thousands of different components or tools in the same virtual machine.
Data Pipeline Observability is Key to Data Quality
In my recent article, What is Observability, I discussed how observability is crucial for understanding complex architectures and their interactions and dependencies between different system components. Data Observability, unlike Software Observability, aims to...
Streamlining Trade Finance Operations: Cleareye.ai Chooses Datavolo
In the ever-evolving landscape of trade finance, digitization and compliance automation are paramount for efficiency and regulatory adherence. Enter Cleareye.ai, a pioneering force in the industry. Their digital workbench, ClearTrade®, revolutionizes trade finance...
Building GenAI enterprise applications with Vectara and Datavolo
The Vectara and Datavolo integration and partnership: When building GenAI apps that are meant to give users rich answers to complex questions or act as an AI assistant (chatbot), we often use Retrieval Augmented Generation (RAG) and want to ground the responses on...
Datavolo Announces Over $21M in Funding!
Datavolo Raises Over $21 Million in Funding from General Catalyst and others to Solve Multimodal Data Pipelines for AI. Phoenix, AZ, April 2, 2024 – Datavolo, the leader in multimodal data pipelines for AI, announced today that it has raised over $21 million in...
Data Engineering for Advanced RAG: Small-to-Big with Pinecone, LangChain, and Datavolo
Datavolo helps data teams build multimodal data pipelines to support their organization’s AI initiatives. Every organization has its own private data that it needs to incorporate into its AI apps, and a predominant pattern to do...
Collecting Logs with Apache NiFi and OpenTelemetry
OpenTelemetry has become a unifying force for software observability, providing a common vocabulary for describing logs, metrics, and traces. With interfaces and instrumentation capabilities in multiple programming languages, OTel presents a compelling...
How custom code can add security risk to enterprise AI projects and LLMs
Data teams are actively delivering new architectures to propel AI innovation at a rapid pace. In this blog, we’ll explore how Datavolo empowers these teams to accelerate while addressing the critical aspects of security, observability, and maintenance for their data...