DataPelago has released new benchmark results showing that its universal data processing engine, Nucleus, significantly outperforms Nvidia’s cuDF on compute-intensive workloads running on Nvidia GPUs. Unlike traditional CPU-bound systems, Nucleus executes data tasks across heterogeneous hardware, from CPUs to GPUs, delivering superior performance and cost efficiency without requiring developers to change their code or infrastructure.
As organizations handle increasingly vast and complex data for ETL, business intelligence, and GenAI workloads, traditional CPU-based processing struggles to keep pace. GPUs, in contrast, offer enormous parallelism and throughput, making them well suited to scaling these workloads. However, GPUs also introduce challenges such as I/O bottlenecks and limited memory, which restrict how much data can be processed at one time. To maximize GPU potential and accelerate enterprise adoption, modern data engines must be optimized to balance these advantages against those limitations.
With this need in mind, DataPelago purpose-built Nucleus’ GPU-optimized execution layer to address these complexities. While cuDF has long set the bar for GPU-based data processing, it falls short in real-world scenarios such as multi-key sorting and variable-length string operations. By comparison, Nucleus demonstrates clear gains in these areas, which DataPelago presents as evidence that it can raise the performance roofline.
Specifically, Nucleus leverages advanced features such as improved parallel algorithms, high-speed flows for common workloads, optimized multi-column support, kernel fusions for complex expressions, and end-to-end string optimization powered by zero-copy shared memory. These innovations allow organizations to extract greater value from existing GPU investments.
The benchmark results underscore these advantages:
- Complex Expressions: Nucleus runs project operations up to 10.5x faster, filter operations up to 10.1x faster, and aggregate operations up to 4.3x faster than cuDF.
- Variable-Length Strings: In hash join operations, Nucleus achieves up to 38.6x higher throughput for smaller strings and up to 4x faster performance for larger strings. Hash aggregates show improvements of 3.8x, while Top-K operations improve by 5.9x.
- Multi-Column Support: For Top-K operations involving multiple columns, Nucleus delivers up to 8.2x faster performance compared to cuDF.
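For readers unfamiliar with the term, a multi-column Top-K query returns the K highest-ranked rows when the ranking depends on more than one column. The following is a minimal CPU-only sketch in plain Python, intended only to illustrate the operation being benchmarked. It is not Nucleus or cuDF code, and the sample rows, column layout, and function name are hypothetical.

```python
import heapq

# Hypothetical rows: (region, revenue, order_id). Not taken from the benchmark.
rows = [
    ("east", 120, 7),
    ("west", 300, 2),
    ("east", 300, 5),
    ("west", 120, 9),
    ("east", 300, 1),
]

def top_k_multi_column(rows, k):
    """Return the k rows ranked by revenue descending, then order_id ascending."""
    # The sort key spans two columns, so ties on revenue are broken by order_id.
    return heapq.nsmallest(k, rows, key=lambda r: (-r[1], r[2]))

top3 = top_k_multi_column(rows, 3)
for row in top3:
    print(row)
```

Because the key covers two columns, every comparison may have to inspect both values, which is part of why multi-column ranking costs more than single-column ranking.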
Highlighting the significance of this breakthrough, Rajan Goyal, CEO of DataPelago, said: “While organizations deal with a tsunami of complex data, fortunately accelerated hardware like GPUs have become more readily available in today’s cloud environments. To take full advantage of the performance benefits possible with accelerated hardware, new approaches and non-linear thinking are required. We founded DataPelago to apply this non-linear thinking and create a new data processing standard for the accelerated computing era so that companies can overcome performance, cost and scalability limitations. These latest benchmark results are an example of how DataPelago is continuing to push this new standard forward.”
By setting a new benchmark for GPU-powered data processing, DataPelago positions Nucleus as a next-generation engine for enterprises seeking scalable, cost-efficient, and high-performance data solutions.
