
With Augmented Memory Grid, Weka Challenges Old Guard Thinking
A wonderful thing about the GTC conference is that there are pockets of news to unpack across the data center computing landscape, and one such story is WEKA’s announcement of a new Augmented Memory Grid capability within the WEKA Data Platform, integrated with NVIDIA accelerated computing. It’s these kinds of stories that get our attention at TechArena, as they demonstrate shifts in the computing landscape that are often overlooked.
To understand just what WEKA has delivered, we have to look at a broader trend in computer interface advancement: the industry standards that connect components within and across systems. Over the past five years, Ethernet data rates have climbed from 25 Gb/s to 40, 100, 200, 400 Gb/s and beyond. In a similar timeframe, PCIe has moved from 64 GB/s to a forecast 256 GB/s with PCIe 7.0 (which will shape future generations of NVMe transfer rates as well), and the upcoming introduction of MR-DIMMs will increase memory bandwidth and transfer rates. But none of these interfaces has advanced as quickly as Ethernet.
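To put those cited figures side by side, here is a quick back-of-the-envelope comparison. The numbers are the article's round figures for roughly the same five-year window, not formal spec maxima:

```python
# Rough growth comparison using the figures cited above.
# Ethernet per-link rates vs. PCIe x16 throughput (article's round numbers).
ethernet_gbps = {"then": 25, "now": 400}    # Gb/s per link
pcie_gbs = {"then": 64, "now": 256}         # GB/s, PCIe 5.0 -> forecast 7.0

eth_growth = ethernet_gbps["now"] / ethernet_gbps["then"]   # 16x
pcie_growth = pcie_gbs["now"] / pcie_gbs["then"]            # 4x

print(f"Ethernet grew {eth_growth:.0f}x; PCIe grew {pcie_growth:.0f}x")
```

The gap in growth rates, not the absolute numbers, is what makes network-attached capacity increasingly attractive relative to strictly local storage.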
What does this tell us? Data movement within the platform is advancing, but the network is advancing faster, so the assumption that data must be kept local rather than transferred across the network may be antiquated, at least at this moment in our architectural paradigm.
WEKA – an expert in managing distributed data – has seized on this opportunity with its Augmented Memory Grid. The technology taps standard NVMe drives for fast data capacity delivered at surprisingly low latency, forming a far memory tier for accelerated inference clusters. Because it is built on NVMe, the Augmented Memory Grid also offers persistence. But don’t be distracted by the persistent-memory connotation: this is not 3D XPoint or Optane, just standard NAND-based storage operating at fantastic speeds.
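The general idea of a far memory tier can be sketched as a two-level cache: hot state (such as an inference KV cache) lives in fast local memory, and evicted or overflow state spills to the NVMe-backed tier instead of being discarded and recomputed. This is a minimal, hypothetical illustration of the tiering pattern, not WEKA's implementation or API:

```python
# Hypothetical two-tier cache sketch. `hot` stands in for fast local memory
# (e.g., GPU/system memory); `far` stands in for an NVMe-backed far tier.
# All names here are illustrative assumptions, not WEKA's actual interfaces.
class TieredKVCache:
    def __init__(self, hot_capacity):
        self.hot = {}                  # fast, capacity-limited tier
        self.far = {}                  # persistent far tier (NVMe-backed)
        self.hot_capacity = hot_capacity

    def put(self, key, value):
        if len(self.hot) >= self.hot_capacity:
            # Spill an entry to the far tier instead of discarding it; a
            # persistent far tier means evicted state survives restarts.
            old_key, old_val = self.hot.popitem()
            self.far[old_key] = old_val
        self.hot[key] = value

    def get(self, key):
        if key in self.hot:
            return self.hot[key]       # fast path: local hit
        if key in self.far:
            value = self.far.pop(key)  # slower path: promote from far tier
            self.put(key, value)
            return value
        return None                    # miss: caller must recompute

cache = TieredKVCache(hot_capacity=2)
cache.put("prompt-a", "kv-a")
cache.put("prompt-b", "kv-b")
cache.put("prompt-c", "kv-c")          # spills an entry to the far tier
```

The payoff in the inference case is that a far-tier hit, while slower than local memory, is still far cheaper than re-running prefill over a long context.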
WEKA claims a 3x capacity improvement over existing designs for large-model support, which matters when you consider the longer context windows needed for functions like agentic computing and its inherently expanded autonomous decision-making. Performance claims offered in support of the technology include a 41x improvement in time to first token when processing 105,000 tokens and a 24% reduction in the cost of token throughput. Those claims are significant when you consider the scale of infrastructure deployment and are another example of great engineering delivering pragmatic value to customers.
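To make those headline multipliers concrete, here is an illustrative calculation. The baseline figures below are invented solely to show what a 41x time-to-first-token speedup and a 24% cost reduction would mean in practice; WEKA did not publish these baselines:

```python
# Illustrative arithmetic only. Baseline values are hypothetical, chosen to
# show the effect of the claimed 41x TTFT speedup and 24% cost reduction.
baseline_ttft_s = 82.0              # hypothetical TTFT at ~105K tokens
improved_ttft_s = baseline_ttft_s / 41

baseline_cost = 10.0                # hypothetical $ per million tokens
improved_cost = baseline_cost * (1 - 0.24)

print(f"TTFT: {baseline_ttft_s}s -> {improved_ttft_s}s")
print(f"Cost: ${baseline_cost} -> ${improved_cost:.2f} per M tokens")
```

At fleet scale, a double-digit percentage cut in cost per token compounds across every request served, which is why these numbers matter beyond the benchmark slide.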