The LF AI & Data Foundation, the premier organization supporting open source innovation in artificial intelligence and data under the Linux Foundation, announced the launch of the Vortex Project: an open, extensible columnar format that bridges the gap between cloud storage and heterogeneous compute, handling data seamlessly across memory, disk (file format), and network (IPC format) while maintaining compression throughout.
Contributed by SpiralDB as a new Incubation-stage project, Vortex joins LF AI & Data with contributions and support from Microsoft, Snowflake, Palantir, NVIDIA, and other industry leaders, signaling broad industry alignment around the need for next-generation storage infrastructure.
Vortex is purpose-built as the foundational storage format for modern data systems backed by object storage and is based on the latest compression research. Recent public validation includes the Technical University of Munich’s (TUM) database group calling Vortex the “cutting edge” and Microsoft demonstrating 30% runtime reductions when running traditional Spark workloads with Vortex in Apache Iceberg. Unlike Apache Parquet and other formats that were built only for structured analytics performed on CPUs, Vortex is also optimized for multimodal data, wide schemas, GPU-based training workloads, and high-performance reads from cloud object stores such as S3 and GCS.
“Storage and compute have always been fungible, but data processing is no longer only about moving data from a disk into the CPU. Modern GPUs can consume terabits per second, but legacy storage formats are a huge bottleneck – they effectively require CPUs to sit in the middle, decompressing data before passing it on. We created Vortex to support this next generation of workloads, while dramatically improving performance for traditional data systems at the same time,” said Will Manning, co-founder and CEO at SpiralDB. “By contributing Vortex to LF AI & Data, we’re excited to foster a broader community. What excites me most is that Vortex gives the entire community a platform to innovate on storage – researchers can contribute new compression techniques, companies can optimize it for their workloads, and everyone can benefit from shared advances.”
Designed for speed, simplicity and composability, Vortex provides:
- State-of-the-art performance across every key metric: 100x faster random access reads, 10-20x faster scans, and 5x faster writes compared to Apache Parquet, while maintaining similar compression ratios.
- An extensible architecture designed to facilitate research and rapidly incorporate new compression techniques, ensuring Vortex remains state-of-the-art as the field evolves.
- First-class, native integrations with many other key open source data tools across the Composable Data Stack, including Apache Arrow, Apache DataFusion, DuckDB, Apache Spark, and (soon) Apache Iceberg (see the sketch after this list).
- The first storage format designed for direct GPU decompression, eliminating CPU bottlenecks by loading training data straight from object storage into GPU memory.
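To make the Arrow interoperability and random-access claims above concrete, here is a minimal sketch of what a Vortex round trip could look like from Python. The function names used (vortex.compress, vortex.io.write, vortex.open, take) are assumptions for illustration only and may not match the actual Python bindings; the Vortex project documentation is the authoritative reference.

```python
# A minimal sketch, assuming hypothetical Python bindings for Vortex.
# Calls marked "assumed" are illustrative stand-ins for: compress Arrow data,
# write a .vortex file, re-open it, and perform a random-access read.
import pyarrow as pa
import vortex  # assumed: Vortex's Python bindings

# Vortex interoperates with Apache Arrow, so we start from an Arrow table.
table = pa.table({
    "id": pa.array(range(1_000_000), type=pa.int64()),
    "label": pa.array(["cat", "dog"] * 500_000),
})

# Compress the Arrow data into a Vortex array and persist it to disk.
compressed = vortex.compress(table)            # assumed helper
vortex.io.write(compressed, "example.vortex")  # assumed helper

# Re-open the file and fetch individual rows by index -- the random-access
# pattern where Vortex reports its largest speedups over Parquet.
vxf = vortex.open("example.vortex")            # assumed helper
rows = vxf.take([42, 123_456, 999_999])        # assumed: row lookup by index
print(rows.to_arrow())                         # back to Arrow for downstream tools
```

The same Arrow-centric flow is what would let query engines such as DataFusion or DuckDB consume Vortex data without an extra conversion step, though the exact integration points are defined by those projects rather than by this sketch.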
“Vortex tackles one of the most overlooked performance problems in AI infrastructure: how slow and cumbersome it is to access training data from the cloud,” said Mark Collier, general manager of AI & Infrastructure at the Linux Foundation. “This project represents a huge step forward for scalable, AI-native data pipelines – and we’re thrilled to welcome it into the LF AI & Data community.”
Vortex launches with contributions from leading researchers and engineers across academia and industry, and welcomes broad participation from the global open source community.
Source – PR Newswire