Avesha’s Smart Scaler introduces a Reinforcement Learning-based intelligent scaling solution for AI workloads, delivering unprecedented performance gains and cost efficiencies.

Avesha’s Smart Scaler, part of its Elastic AI Services Suite, provides Inference Endpoint scaling and GPU/CPU resource optimization, delivering up to 3x performance gains and reducing inference latency by 75%.

Avesha, a Gartner Cool Vendor and a leader in AI-driven GPU/CPU orchestration, today announced groundbreaking results from its latest benchmarking of Smart Scaler, which dynamically scales GPU resources in proportion to traffic, delivering up to 3x improvement in processing efficiency, 85% larger batch sizes, and 70% higher token throughput per batch for the Llama3-8B model on the HuggingFace/TGI framework. In addition, Smart Scaler demonstrated a 2x improvement for the same model on the vLLM framework over TGI, with a further 1.5x boost coming from Smart Scaler alone. This enables enterprises to scale AI workloads seamlessly across multiple clusters and cloud environments without overprovisioning or wasted compute.


Smart Scaler, an advanced AI-powered predictive scaling mechanism, dynamically scales resources based on workload demand. The benchmarking results highlight key advantages for AI inferencing and training:

  • Higher Instantaneous Throughput: Processed 3x more tokens in a burst, enabling faster AI inferencing with the HuggingFace/TGI framework.
  • Reduced Latency: AI model inference latency dropped from 8 seconds to 2 seconds.
  • Improved Throughput for Industry-Leading AI Models: Llama3-8B workloads saw a 31% increase in token throughput, while DeepSeek 7B saw a 13.5% boost.

Driving AI Innovation with EGS

For research-driven companies like InpharmD, which combines pharmacist expertise with AI to provide state-of-the-art, evidence-based drug information, having the right tools to optimize research and reduce costs is essential.

“With Avesha’s Elastic AI Services we’re able to optimize our GPU workloads dynamically, ensuring we maximize performance without overpaying for underutilized resources,” said Tulasee Rao Chintha, CTO, InpharmD. “This allows us to scale efficiently while keeping our research and operational costs predictable and manageable.”


Benchmarking Results Validate EGS Performance

“The benchmarking results speak for themselves: Avesha is setting a new standard for AI workload efficiency for LLMs as well as scientific or specialized models,” said Raj Nair, Founder and CEO at Avesha. “Avesha improves interactive performance by 85% and triples overall efficiency, making high-performance AI more accessible and cost-effective for enterprises and startups alike.”

Pay-per-work-output pricing

Avesha’s innovative high-performance scaling solution enables GPU cloud providers to offer pay-per-work-output pricing instead of traditional GPU time-based pricing, significantly reducing costs and making AI development more accessible. This performance improvement makes highly competitive pay-per-work-output pricing feasible by sharing higher-performance GPUs: higher throughput drives the price per unit of work below that of a lower-priced but slower GPU.
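The cost argument above can be checked with simple arithmetic. The sketch below uses hypothetical hourly rates and throughput figures (not Avesha's actual pricing) to show how a pricier but faster GPU can still deliver a lower price per unit of work:

```python
# Illustrative price-per-work-output comparison (hypothetical numbers,
# not Avesha's actual rates or benchmarks).

def price_per_million_tokens(hourly_rate_usd: float, tokens_per_second: float) -> float:
    """Cost of producing one million tokens at a given hourly rate and throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_rate_usd / tokens_per_hour * 1_000_000

# A cheaper, slower GPU vs. a pricier GPU with 3x the throughput.
slow_gpu = price_per_million_tokens(hourly_rate_usd=1.00, tokens_per_second=500)
fast_gpu = price_per_million_tokens(hourly_rate_usd=2.00, tokens_per_second=1500)

print(f"slow GPU: ${slow_gpu:.3f} per 1M tokens")  # $0.556
print(f"fast GPU: ${fast_gpu:.3f} per 1M tokens")  # $0.370
```

Even at double the hourly rate, the faster GPU's 3x throughput makes each unit of output cheaper, which is the economics behind pay-per-work-output pricing.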

“With Avesha, startups no longer need to pay for idle GPU hours,” added Raj Nair, Founder and CEO at Avesha. “Now, they pay only for actual AI workloads processed, making it a game-changer for companies creating innovative AI applications while maintaining cost efficiency. We are introducing a FREE Tier for our GPU services available through OCI.”

A Hybrid Pricing Model That Maximizes Value

EGS introduces a flexible pricing strategy designed to optimize costs while maintaining high-performance AI scaling:

  • Value-Based Pricing – Customers pay for actual performance gains rather than static GPU time.
  • On Demand/Spot Pricing – Leverages unused GPU capacity for cost savings.
  • Tiered Commitments – Offers long-term cost reductions for enterprise-scale AI workloads.
  • Auto-Scaling Capabilities – Dynamically adjusts GPU allocation based on real-time demand.
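The auto-scaling capability in the last bullet can be sketched in a few lines. This is an illustration of traffic-proportional scaling only, not Smart Scaler's actual reinforcement learning algorithm; the function name, target rate, and bounds are all assumptions for the example:

```python
import math

# Minimal sketch of traffic-proportional GPU scaling: choose a replica
# count that keeps per-replica load near a target, clamped to bounds.
# (Hypothetical helper; Smart Scaler's RL-based policy is more sophisticated.)

def desired_replicas(current_rps: float, target_rps_per_replica: float,
                     min_replicas: int = 1, max_replicas: int = 16) -> int:
    """Scale replicas in proportion to request rate, within min/max bounds."""
    needed = math.ceil(current_rps / target_rps_per_replica)
    return max(min_replicas, min(max_replicas, needed))

print(desired_replicas(current_rps=450, target_rps_per_replica=100))  # 5
print(desired_replicas(current_rps=10, target_rps_per_replica=100))   # 1
```

Scaling in proportion to demand like this is what avoids the overprovisioning and idle-GPU costs the article describes; a learned policy can additionally anticipate traffic bursts rather than react to them.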

With this approach, GPU cloud providers also benefit by optimizing resource allocation and monetizing idle capacity efficiently.


Source: prweb

To share your insights, please write to us at sudipto@intentamplify.com