Break Up Your Iced Out Data with Datavolo & Snowflake's Cortex AI
Pairing Datavolo with Snowflake allows enterprises to store all of their structured and unstructured data in one place, enabling success with AI.
Datavolo provides the industry's only enterprise-proven platform for Generative AI data pipelines. Generative AI applications are uniquely dependent upon unstructured data, whether for model training, RAG applications, or agentic architectures. Datavolo addresses all scenarios that require the secure and continuous ingestion of large-scale unstructured data.
The Snowflake Platform is a single, fully managed platform that powers the AI Data Cloud. Snowflake securely connects businesses globally across any type or scale of data to productize AI, applications, and more in the enterprise. Snowflake Cortex AI allows enterprises to build generative AI applications with fully managed LLMs and chat-with-your-data services.
While Snowflake, integrated with partner tools, has a long history of managing the secure and scalable ingestion of structured data, the continuous, automated, and secure ingestion of unstructured data remains a significant challenge. Datavolo solves that challenge.
Approach
Currently, Snowflake works with several vendors when dealing with structured, row-oriented data. These solutions are very good at moving large amounts of structured data, but largely incapable of moving and processing unstructured data. Datavolo handles arbitrarily large data and gives enterprises the option to plug best-of-breed and custom components into its GenAI data pipelines.
For any multimodal data that exists outside of Snowflake (complex documents, audio recordings, images, and more), Datavolo ingests, transforms, and persists the data to Snowflake tables for integration into AI applications. For example, transformations for complex documents include layout detection, table parsing, enrichment, cleansing, and semantic chunking. Once these chunks are persisted to Snowflake tables, Datavolo invokes Cortex embedding models so that embeddings for the chunks are available for later retrieval by AI apps.
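To make the chunk-then-embed flow concrete, here is a minimal sketch, not Datavolo's actual implementation. It packs paragraphs greedily into size-bounded chunks (a naive stand-in for semantic chunking) and shows the kind of SQL statement that could populate embeddings once the chunks are in a table. The table name `doc_chunks`, its columns, and the choice of the `snowflake-arctic-embed-m` model are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    chunk_index: int
    text: str


def chunk_paragraphs(doc_id: str, text: str, max_chars: int = 500) -> list[Chunk]:
    """Greedily pack paragraphs into chunks of at most max_chars characters.

    This is a naive stand-in for semantic chunking. A single paragraph longer
    than max_chars is kept whole rather than split.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[Chunk] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append(Chunk(doc_id, len(chunks), current))
            current = para
        else:
            current = candidate
    if current:
        chunks.append(Chunk(doc_id, len(chunks), current))
    return chunks


# Hypothetical table and column names; the model name is also an assumption.
# Run after the chunks have been inserted, so that only rows still missing
# an embedding are sent to the Cortex embedding function.
EMBED_SQL = """
UPDATE doc_chunks
SET embedding = SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', chunk_text)
WHERE embedding IS NULL
"""
```

Computing embeddings inside Snowflake, rather than in the pipeline, keeps the chunk text from leaving the platform; whether a given deployment does it this way is a design choice rather than something this sketch prescribes.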
Furthermore, when the context of these documents (for example, authorization metadata) changes in source systems like SharePoint or Google Drive, Datavolo will capture these changes and persist them in Snowflake. In the case of authorization metadata, this will ensure that access control is enforced at the AI app layer.
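One way to picture how synced authorization metadata can be enforced at the app layer is the sketch below. It is an illustration under assumptions, not Datavolo's or Snowflake's mechanism: the `Chunk` type, the group-based ACL model, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    doc_id: str
    text: str


def sync_acl(acl_by_doc: dict[str, set[str]], doc_id: str, groups: set[str]) -> None:
    """Overwrite a document's stored ACL with the latest state from the source system."""
    acl_by_doc[doc_id] = set(groups)


def authorized_chunks(
    chunks: list[Chunk],
    acl_by_doc: dict[str, set[str]],
    user_groups: set[str],
) -> list[Chunk]:
    """Keep only chunks whose document shares at least one group with the user.

    Documents with no ACL entry are denied by default.
    """
    return [c for c in chunks if acl_by_doc.get(c.doc_id, set()) & user_groups]
```

An AI app would apply a filter like `authorized_chunks` to retrieved chunks before passing them to the model, so a permission change captured from the source system takes effect on the next query.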
Datavolo will also make use of Arctic LLMs for key pre-processing steps within the data pipeline, such as summarizing chunks and describing images detected within documents. Datavolo is a containerized platform that will run fully within Snowpark Container Services (SPCS), providing assurances to customers regarding their data security and privacy and driving compute and storage for the Snowflake platform.
Conclusion
“The combination of Snowflake and Datavolo will allow for the rapid development of Generative AI applications that enterprises can trust. Ensuring the end-to-end security, governance, and quality of data from source through AI will allow enterprises to confidently leverage their most valuable data to propel innovation and business transformation,” says Joe Witt, CEO, Datavolo.