Data teams are delivering new architectures to propel AI innovation at a rapid pace. In this post, we’ll explore how Datavolo helps these teams move faster while addressing the security, observability, and maintenance of their data pipelines. We’ll discuss the risks of custom code within enterprises and the alternative of using low-code platforms like Datavolo, which transfer certain risks, such as securing the supply chain for dependencies, to the software vendor. We’ll also outline Datavolo’s emphasis on pipeline maintainability and security, and how our low-code platform shortens time-to-value through an extensive array of out-of-the-box processors and blueprints for multimodal data pipelines for AI.
Deleting Code
Experienced software engineers often champion code deletion for valid reasons. Having less code means a reduced surface area that requires maintenance, security measures, reliability checks, documentation, and testing. These tasks are crucial for software, Site Reliability Engineering (SRE), and data teams to effectively manage a code base and its associated data pipelines. Additionally, as businesses evolve, the software and data abstractions reflecting the business must evolve accordingly, presenting an ongoing challenge.
Of course, deleting code is only a good thing if the business can still achieve whatever the code was intended to enable in the first place! Let’s draw a distinction between the business’s own custom code and its vendors’ code and services, from which the business also derives value. In finance, there is an axiom that risk cannot be destroyed, only transferred. In a sense, businesses pay software vendors to take on certain risks and burdens: the risk of insecure software, the risk of low-quality code, the risk of unmaintained code, and more.
Modern application security practices are central to keeping software secure. At Datavolo, we offer comprehensive Software Bills of Materials (SBOMs) for all our deployments, including dependencies and extensions integrated into our platform. Using Oxeye, Datavolo runtimes can even identify insecure dependencies in running code, including extensions, and alert users about them.
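To make the idea of checking an SBOM for insecure dependencies concrete, here is a minimal Python sketch. It is not Datavolo’s or Oxeye’s implementation: the SBOM is a tiny hand-written document using the CycloneDX JSON layout, and `KNOWN_INSECURE` and `find_insecure_components` are hypothetical names introduced only for illustration. A real check would query a vulnerability database rather than a hard-coded dictionary.

```python
import json

# A tiny CycloneDX-style SBOM, written by hand for this example.
# Real SBOMs carry many more fields per component.
SBOM_JSON = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {"type": "library", "name": "log4j-core", "version": "2.14.1"},
    {"type": "library", "name": "jackson-databind", "version": "2.15.2"}
  ]
}
"""

# Hypothetical advisory list keyed by (name, version). This is an
# assumption for illustration; in practice the data would come from a
# vulnerability database or a scanning service.
KNOWN_INSECURE = {
    ("log4j-core", "2.14.1"): "CVE-2021-44228",
}


def find_insecure_components(sbom, advisories):
    """Return (name, version, advisory) for each component with a known advisory."""
    findings = []
    for component in sbom.get("components", []):
        key = (component.get("name"), component.get("version"))
        if key in advisories:
            findings.append((key[0], key[1], advisories[key]))
    return findings


if __name__ == "__main__":
    sbom = json.loads(SBOM_JSON)
    for name, version, advisory in find_insecure_components(sbom, KNOWN_INSECURE):
        print(f"{name} {version}: {advisory}")
```

The point of the sketch is that an SBOM turns "what are we running?" into data that can be checked automatically, which is what makes alerting on insecure dependencies possible in the first place.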
Non-functional aspects like security, scalability, flexibility, and observability are often overlooked when teams prioritize delivering new code for urgent business needs. As the code base expands, the repercussions of not adhering to best practices become more significant. In our experience, a large number of engineering escalations are attributed directly to poorly written custom code. Ideally, the code you don’t write is the code where your vendor has found a best practice and served it up to you in their service or library!
Technical Debt
Even well-written code will deteriorate in quality over time when not maintained. This is akin to a second law of thermodynamics for code: software must evolve alongside the business and surrounding systems, or it will degrade. Unmaintained code and legacy architectures often contribute to technical debt, a long-term maintenance burden that consumes engineering resources, thereby impeding team velocity.
The key takeaway is that enterprises must balance time-to-value with technical debt. Most software tools sold to the enterprise promise higher velocity and shorter time-to-value, but the price is often massive technical debt, along with shadow IT projects spawned by frustration within the business. This can result in substantial spending to maintain legacy code and services, stifling innovation.
Most of a code base’s lifespan comes after its initial delivery. In large organizations, many engineers spend a significant portion of their time grappling with legacy code bases, reviewing and fixing low-quality code written years ago. While time-to-value is paramount during initial delivery, ongoing maintenance and reliable operation dominate over the rest of the code’s lifespan.
The Alternative
Instead of crafting custom code for new data engineering applications, users can opt for established data engineering platforms and collaborate with vendors that can inventory important risks. At Datavolo, our team has been helping data engineers solve complex problems within the Apache NiFi community for almost a decade. We’ve curated a set of patterns and best practices, offering them as processors and templates to help data teams achieve their goals. For building multimodal data pipelines, we provide engineers with over 300 processors for extracting, chunking, transforming, and loading multimodal data for AI use cases. Alongside being secure, scalable, and user-friendly, Datavolo offers the flexibility to swap APIs and modify transformations, sources, destinations, and models. Datavolo users can efficiently reuse modular code, fostering collaboration and preventing redundant effort.
Datavolo is a platform equipped with a wide range of out-of-the-box processors and patterns for implementing data engineering pipelines for AI use cases. Our aim at Datavolo is to become the trusted partner capable of assuming risks associated with insecure software, low-quality code, and unmaintained code. We welcome the opportunity to establish that trust with your organization. Please don’t hesitate to reach out if you’d like to discuss further!