
Custom code adds risk to the enterprise

Data teams are delivering new architectures at a rapid pace to propel AI innovation. In this post, we’ll explore how Datavolo helps these teams move quickly while addressing the security, observability, and maintenance of their data pipelines. We’ll discuss the risks that custom code carries within enterprises and the alternative of using a low-code platform like Datavolo, which transfers certain risks, such as securing the supply chain for dependencies, to the software vendor. We’ll also outline Datavolo’s emphasis on pipeline maintainability and security, and how our low-code platform speeds time-to-value through an extensive array of out-of-the-box processors and blueprints for multimodal data pipelines for AI.

Deleting Code

Experienced software engineers often champion code deletion for valid reasons. Having less code means a reduced surface area that requires maintenance, security measures, reliability checks, documentation, and testing. These tasks are crucial for software, Site Reliability Engineering (SRE), and data teams to effectively manage a code base and its associated data pipelines. Additionally, as businesses evolve, the software and data abstractions reflecting the business must evolve accordingly, presenting an ongoing challenge.

Now, deleting code can only be a good thing if the business is still able to achieve whatever the code was intended to enable in the first place! Let’s draw a distinction between the business’s own custom code and their vendors’ code and services–from which the business can derive value. In finance, there is an axiom that risk cannot be destroyed, only transferred. In a sense, businesses pay software vendors to transfer certain risks and burdens to them–the risk of insecure software, the risk of low-quality code, the risk of unmaintained code, and more.

Contemporary application security practices play a pivotal role in keeping software secure. At Datavolo, we offer comprehensive Software Bills of Materials (SBOMs) for all our deployments, covering the dependencies and extensions integrated into our platform. Using Oxeye, Datavolo runtimes can even identify insecure dependencies in running code, including extensions, and alert users to them.
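To make the idea of checking an SBOM for insecure dependencies concrete, here is a minimal sketch in Python. It is not Datavolo’s or Oxeye’s implementation. It assumes an SBOM in a CycloneDX-like JSON shape with a `components` list of `name` and `version` entries, and the package names, versions, and advisory data are all hypothetical.

```python
import json

# Hypothetical SBOM fragment in a CycloneDX-like JSON shape (assumed for illustration).
SBOM_JSON = """
{
  "bomFormat": "CycloneDX",
  "components": [
    {"name": "example-parser", "version": "1.2.0"},
    {"name": "example-http", "version": "4.5.1"},
    {"name": "example-codec", "version": "0.9.3"}
  ]
}
"""

# Hypothetical advisory data: package name -> versions known to be insecure.
KNOWN_INSECURE = {
    "example-parser": {"1.1.0", "1.2.0"},
    "example-codec": {"0.8.0"},
}


def find_insecure_components(sbom, advisories):
    """Return (name, version) pairs from the SBOM that appear in the advisory data."""
    flagged = []
    for component in sbom.get("components", []):
        name = component.get("name")
        version = component.get("version")
        # Exact version match only; real scanners also handle version ranges.
        if version in advisories.get(name, set()):
            flagged.append((name, version))
    return flagged


sbom = json.loads(SBOM_JSON)
flagged = find_insecure_components(sbom, KNOWN_INSECURE)
for name, version in flagged:
    print(f"insecure dependency: {name} {version}")
```

The sketch matches exact versions only. A production scanner would resolve version ranges and pull advisories from a maintained vulnerability database, which is the kind of upkeep a vendor can take on.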

Non-functional aspects like security, scalability, flexibility, and observability are often overlooked when teams prioritize delivering new code for urgent business needs. As the code base expands, the repercussions of not adhering to best practices become more significant. In our experience, a large number of engineering challenges and escalations stem directly from poorly written custom code. Ideally, the code you don’t write is the code where your vendor has found a best practice and served it up to you in their service or library!

Technical Debt

Even well-written code will deteriorate in quality over time when not maintained. This is akin to a second law of thermodynamics for code: software must evolve alongside the business and surrounding systems, or it will degrade. Unmaintained code and legacy architectures often contribute to technical debt, a long-term maintenance burden that consumes engineering resources, thereby impeding team velocity.

The key takeaway is that enterprises must balance time-to-value with technical debt. Most software tools sold to the enterprise promise higher velocity and shorter time-to-value, but the trade-off is often massive technical debt, along with shadow IT projects spawned by frustration within the business. The result can be substantial spending to maintain legacy code and services, which stifles innovation.

Most of a code base’s lifespan comes after its initial delivery. In large organizations, many engineers spend a significant portion of their time grappling with legacy code bases, reviewing and fixing low-quality code written years ago. While time-to-value is paramount during initial delivery, ongoing maintenance and reliable service operation dominate over the rest of the code’s lifespan.

The Alternative

Instead of crafting custom code for new data engineering applications, users can opt for established data engineering platforms and collaborate with vendors that can inventory important risks. At Datavolo, our team has been assisting data engineers in solving complex problems within the Apache NiFi community for almost a decade. We’ve curated a set of patterns and best practices, offering them as processors and templates to help data teams achieve their goals. For building multimodal data pipelines, we provide engineers with over 300 processors for extracting, chunking, transforming, and loading multimodal data for AI use cases. Alongside being secure, scalable, and user-friendly, Datavolo offers the flexibility to seamlessly swap APIs and modify transformations, sources, destinations, and models. Datavolo users can efficiently reuse modular code, fostering collaboration and preventing redundant effort.

Datavolo is a platform equipped with a wide range of out-of-the-box processors and patterns for implementing data engineering pipelines for AI use cases. Our aim at Datavolo is to become the trusted partner capable of assuming risks associated with insecure software, low-quality code, and unmaintained code. We welcome the opportunity to establish that trust with your organization. Please don’t hesitate to reach out if you’d like to discuss further!
