Explore the cutting edge of computing from data center to edge, including solutions unlocking the AI pipeline, all backed by Solidigm's leading SSD portfolio.



In the race to advance technology, time, funding, and attention often go to immediately monetizable applications. While industry roadmaps certainly drive technological advancement, basic science, which deepens our fundamental understanding of the universe, can produce breakthrough findings with wide-reaching applications and effects.
My recent conversation with Silvia Zorzetti from Fermilab and Solidigm’s Jeniece Wnorowski revealed how research at the convergence of high-energy physics and quantum technology is driving remarkable advances in quantum computing.
As a U.S. particle accelerator laboratory, Fermilab has spent decades perfecting superconducting cavities that accelerate particle beams to near light speed. Through years of study, researchers at the lab have identified several sources of noise that can make these superconducting cavities less efficient, and they have worked to eliminate those sources of loss.
In 2017, researchers began studying these same cavities at the quantum level. As Silvia explained, at this single-photon level the energy is much lower, which introduces new potential sources of loss that do not appear at higher energies. “We can focus on the basic science and the basic understanding of those mechanisms,” Silvia explained.
At the same time, Fermilab is finding ways to adapt superconducting cavities for quantum computing. By placing qubits inside the cavities, Fermilab has achieved 20 milliseconds of coherence. That coherence time is a critical advance over the typically rapid decay of quantum information. “And we know that it is possible to achieve more coherence,” Silvia said.
Quantum computers won’t replace classical computers for all tasks. The technology excels at specific problems, including Shor’s algorithm for integer factorization and Grover’s algorithm, a quantum algorithm for unstructured search. As threats to digital security grow, Shor’s algorithm is of particular interest because it can efficiently factor the large numbers that underpin much of modern cryptography. “This means that we can have systems that are very secure because they cannot be broken by someone else with a more advanced cryptography system, let’s say an alien, because here on Earth, we all have to comply with the quantum mechanics rule,” she explained.
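For a rough sense of scale, the standard textbook complexity comparisons below (not figures from the conversation itself) show why these two algorithms matter:

```latex
% Unstructured search over N items
\text{classical exhaustive search: } O(N)
\quad\longrightarrow\quad
\text{Grover: } O\!\left(\sqrt{N}\right)

% Factoring an n-bit integer
\text{best known classical algorithm: } \exp\!\left(O\!\left(n^{1/3}(\log n)^{2/3}\right)\right)
\quad\longrightarrow\quad
\text{Shor: polynomial in } n \ \left(\text{roughly } O(n^{3})\right)
```

Grover’s quadratic gain is useful; Shor’s jump from super-polynomial to polynomial time is what makes it a direct concern for today’s public-key cryptography.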
Beyond these practical applications, quantum computing excels at simulating quantum field theories, which involve many interacting components. “We have these huge many body problems in which there are so many entities. We know how to describe them if they are alone,” Silvia explained. “But then when they start to interact to each other, it becomes a very complex model. So, we know that quantum computing is very good for those kinds of applications.”
While quantum computing has seen great progress, challenges remain to realizing its full potential. Silvia identified two types of hurdles to be overcome: “engineering problems,” where technical challenges are understood and need to be addressed, and problems where more fundamental research is required.
“Those are mainly between the interconnects,” she said. “The interconnects are needed for scaling quantum computing…and in particular, interconnects with very low losses.” Quantum information is a weak signal, so research to find solutions that will prevent the loss of this information remains a priority.
Quantum error correction presents another major hurdle. Quantum states are inherently fragile, and errors can arise in quantum information due to decoherence and other issues. The community is developing algorithmic techniques that make computations more robust to noise, while simultaneously working on hardware improvements.
The environmental sensitivity that creates problems for quantum computing is actually an advantage for quantum sensing, which is already delivering results. Fermilab is leveraging this sensitivity to detect dark matter, dark photons, and axions. It is also studying radiation, such as gamma rays, to better understand how it affects quantum computers and how quantum computers could be built to be robust against radiation.
Fermilab’s approach to quantum computing demonstrates how domain expertise from one field can catalyze breakthrough innovations in another. By leveraging decades of superconducting cavity development for particle accelerators, the laboratory has achieved exceptional quantum coherence times. The pursuit of fundamental knowledge about how materials behave at the quantum level has yielded practical breakthroughs that are now accelerating the entire field forward.
For those interested in learning more, Fermilab hosts regular symposiums and outreach programs, including their Quantum 101 track designed for non-experts. Learn more about Fermilab’s quantum research at fnal.gov.

As organizations deploy AI models at scale, a new set of challenges has emerged around operational efficiency, developer velocity, and infrastructure optimization. A recent conversation with Solidigm’s Jeniece Wnorowski and Brennen Smith, head of engineering at Runpod, revealed how cloud platforms are rethinking the entire AI stack to help developers move from concept to production in minutes rather than months.
Runpod operates 32 data centers globally, providing graphics processing unit (GPU)-dense compute infrastructure for small companies and enterprises building and deploying AI systems. This service is crucial considering the economics of modern GPUs, where a single system with 8 GPUs can cost hundreds of thousands of dollars. Runpod understands that the compute hardware is only part of the equation. “Storage and networking…glue these systems together,” Brennen said. “By ensuring that there’s high quality storage paired up with these GPUs…we have been able to show that this results in a markedly better experience.”
On top of this, the company provides a sophisticated software stack that allows developers to go from their idea to production in minutes, across training and inference use cases. The goal is to “Make it so developers and AI researchers can focus on what they do best, which is actually delivering value to their customers,” Brennen said.
The ability to rely on optimized infrastructure is becoming even more important as organizations move from training to deployment. Brennen likened training infrastructure to traditional business capital expenditures, noting that the high up-front costs see a return on investment over a long period. With inference, organizations grapple daily with ongoing operational realities: scaling, efficiency, and delivering value to customers. As a result, Runpod has engineers dedicated specifically to inference optimization. With the rise of AI factories, “How well these systems are run from an operational excellence perspective will dictate the winners and losers,” Brennen said. “You run an inefficient factory, you’re out.”
One of the most important insights from our conversation addressed storage, now seen as a hidden bottleneck in AI. Brennen recounted how his engineering team recently investigated Docker image loading times. Though not tied to a specific large language model (LLM) activity, slow loading times are exactly the kind of issue developers flag as hurting their overall workflow. It gets in the way of things needing “to magically just work.”
For the solution, Brennen reiterated that storage is what glues the system together. “What we have found is every time, as long as we are optimizing our storage, we are able to make the data move faster,” he said. And when data movement is optimized, entire development cycles accelerate.
Runpod recently launched ModelStore, a feature in public beta that leverages NVMe storage and global distribution to make AI models seem to appear “like magic.” What previously took minutes or hours now happens seamlessly, compressing development iteration cycles. For organizations under pressure to deliver AI capabilities quickly, these time savings compound into significant competitive advantages.
Brennen emphasized that faster developer cycles enable teams to fail fast and iterate more effectively to deliver successful outcomes. When CTOs receive mandates to implement AI, their success depends on giving teams tools that accelerate innovation rather than creating additional friction.
Looking ahead, Brennen identified the convergence of infrastructure and software as a transformative trend. The goal is to enable code to self-declare and automatically establish the infrastructure required to run it, freeing developers from thinking about infrastructure so they can focus on their code and creating value aligned to business logic. “Anything we can do to make it even easier to get global distribution, that’s a hugely powerful paradigm,” he said.
Runpod’s emphasis on developer experience demonstrates that sustainable AI deployment requires thinking holistically about the entire infrastructure stack. The company’s focus on making complex infrastructure feel magical to developers reflects a broader industry recognition that reducing friction accelerates innovation.
As AI moves from experimentation to production deployment, organizations that optimize for developer velocity and operational efficiency will have a significant advantage from their ability to accelerate time to value. For organizations evaluating AI infrastructure partners, Runpod’s approach offers a model that balances performance, scalability, and ease of use.
Connect with Brennen Smith on LinkedIn to continue the conversation, or visit Runpod’s website and active Discord community to explore how their platform might support your AI initiatives.

Fermilab’s Silvia Zorzetti explains how quantum computing and sensing are evolving, where they outperform classical systems, and what’s next for the field.

Hedgehog CEO Marc Austin joins Data Insights to break down open-source, automated networking for AI clusters—cutting cost, avoiding lock-in, and keeping GPUs fed from training to inference.

Rose-Hulman Institute of Technology shares how Azure Local, AVD, and GPU-powered infrastructure are transforming IT operations and enabling device-agnostic access to high-performance engineering software.

By delivering performance with one-sixth the hardware footprint of competitors, the software-defined storage startup aims to make AI experimentation affordable at scale.
Organizations building out AI infrastructure have rapidly matured from struggling to understand GPU requirements to demanding scalable, cost-effective solutions that can grow with their ambitions. At the recent OCP Global Summit, I spoke with Roger Cummings, CEO of PEAK:AiO, and Solidigm’s Jeniece Wnorowski about how one company is tackling the infrastructure challenges that emerge as AI moves into production-scale deployments.
PEAK:AiO’s original breakthrough was software-defined AI storage that transforms commodity servers into high-performance infrastructure. “Our secret sauce very early was getting line-speed performance on a single server,” Roger said, thereby maximizing performance in the smallest possible footprint. This approach turns an ordinary server “into a rocket ship for AI” and helps organizations avoid massive deployments that consume excessive power, cooling, and rack space.
The efficiency gains are dramatic. Roger explained that competitors typically require 12 to 15 nodes to match the performance PEAK:AiO delivers with just one-sixth the infrastructure. Enabled by its close partnership with Solidigm, this density advantage translates directly into lower operational costs for power, thermal management, and physical space—critical factors as data centers face growing energy constraints.
Single-server performance solved the first wave of challenges, but today’s expanding AI applications demand the ability to scale across distributed file systems. The market is littered with proprietary solutions, so PEAK:AiO took a different path: in collaboration with Los Alamos National Laboratory, it developed an open-source parallel network file system (pNFS) built specifically for AI workloads.
Going open source aligns with industry standards and customer demands for simplicity and flexibility. Roger emphasized that the new pNFS solution “will match the performance of storage as well as the scale of the file system that people need today.” The company uses a modular framework that automatically recognizes new nodes as they’re added, delivering linear scaling for both capacity and performance. This architecture dramatically lowers the cost of experimentation and failure—an essential consideration for teams exploring new AI use cases. As Roger put it, “It doesn’t have to be cost-prohibitive to take risks and build innovation.”
PEAK:AiO’s value proposition has evolved along with the AI market itself. While large-scale training clusters once dominated the conversation, inference workloads—both in centralized facilities and at the edge—are now the primary growth driver. The company’s high-performance, scalable platform is ideally suited for both.
Roger also highlighted rising interest in federated learning, where intelligence is captured as close as possible to the data source before being rolled up into master models. PEAK:AiO’s infrastructure naturally supports these distributed architectures by enabling fast data capture and processing wherever the data is generated.
Looking ahead, Roger said, “We need less infrastructure and more success—and I think we’re a great partner to achieve that.” Future innovations from PEAK:AiO, developed in partnership with Solidigm, will create richer memory and storage tiers with deeper intelligence about AI workload patterns. This will allow automated, policy-driven movement of workloads to the optimal tier, further improving both performance and cost efficiency.
PEAK:AiO’s trajectory shows how infrastructure providers are evolving to meet AI’s real-world scaling challenges. Its focus on extreme efficiency, modular open-source architectures, and workload-aware optimization directly addresses the constraints of power, space, and budget while delivering the performance AI demands. As deployments shift from centralized training to distributed inference and federated learning, solutions that combine density with operational simplicity will become increasingly indispensable.
Learn more about PEAK:AiO’s infrastructure solutions at https://peak-aio.com or connect with the team to explore how their open-source approach can accelerate your AI initiatives.

During #OCPSummit25, Jeniece Wnorowski of Solidigm and I caught up with Jelle Slenters of RackRenew on how the firm converts retired OCP-compliant racks, servers, switches, power shelves and more into validated rack-level systems, complete with provenance, burn-in, and a joint certification label.
The cloud era taught us to think in fleets. The AI era is forcing us to think in megawatts. In between those realities sits an enormous pool of high-quality, standards-based gear that ages out of hyperscale production far faster than it ages out of usefulness. RackRenew’s thesis is simple: if we standardize the processes for take-back, test, refurbish, and certify—at scale—we can turn retired systems into ready-to-run capacity for the next wave of adopters.
That’s not a niche. As Jelle put it, the total addressable opportunity is in the “hundreds of millions,” and if we truly nail the collaboration across the industry, the upside is “beyond calculation.” The value comes from process: documented, repeatable, and reliable.
OCP is the right place to be talking about this because standards reduce entropy. Common form factors, power and management specs, and known failure modes mean you can design remanufacturing flows that aren’t bespoke for every asset. When you remove variance, you remove cost and time. When you add shared protocols, you add trust.
Enter the OEMs and platform providers. The opportunity is to co-design take-back and recert flows for entire OCP building blocks—racks, servers, switches, power shelves, and harnessing—not just components. That means shared diagnostics, firmware baselines, power/thermal tests, and a joint certification label that signals: remanufactured, validated, and backed by a warranty. That’s what moves circular gear from “nice idea” to procurement-approved infrastructure.
Two near-term landing zones stood out in our conversation:
If we get the ecosystem right, customers get predictable outcomes. And predictable outcomes are the only way circularity shows up in the production SOW.
I love a good sustainability story, but the reason this matters goes beyond sustainability into economics. In an era of equipment scarcity and grid constraints, circular supply unlocks capacity faster and cheaper. That means shorter time to deploy, lower embodied carbon, and better capex efficiency. And because OCP standards reduce integration costs, the savings aren’t swamped by engineering overhead.
Jelle’s outline for how this scales:
Do that across storage, compute, networking, and power, and the “hundreds of millions” TAM looks conservative.
Reliability comes from grading and process: rack-level power and thermal validation, network link integrity checks, server health screening, and repeatable test plans. The result is predictable service-level objectives and clear workload matching—without over-indexing on any single subsystem.
Circularity wins when it delivers new-grade outcomes. The end user shouldn’t have to adjust workloads or expectations because a rack is remanufactured. OCP standards make that possible at scale. The next step is trust infrastructure—joint labels, shared test artifacts, and warranties. RackRenew’s rack-level, certified approach makes circular capacity a practical default for new deployments, unlocking savings, faster turn-ups, and lower embodied carbon.
Learn more about RackRenew at their website.
Watch the podcast

I recently caught up with Giga Computing’s Chen Lee about what’s really changing inside AI data centers. Spoiler: it’s not just bigger racks and faster GPUs. It’s how those racks get built, where they’re assembled, and how we plan for a world where inference becomes the dominant workload pattern.
One of the threads in our discussion was Giga Computing’s push on circularity—reclaiming, sorting, and returning components to a second life. Chen was candid: yes, circular practices eliminate waste and can reduce some costs, but they’re not a pure economic play. There’s real human work in sorting and qualification. The point isn’t cost-first; it’s responsibility-first—answering “the call for Earth,” as he put it. That framing matters. At OCP, sustainability isn’t a backdrop; it’s a design constraint. And circularity is moving from “nice to have” to “show me your plan.”
Training may steal the headlines—and budgets—but inference is the business. Chen’s view is that the next expansion wave will be dominated by inference-centric racks that look and behave differently: more elastically scaled, more network-sensitive, and more tightly integrated with edge and enterprise fabrics. That opens new addressable markets across sectors—healthcare, finance, oil and gas, education, government—each with unique latency, privacy, and cost envelopes. If training is a few giant mountains, inference is a mountain range: broader, more varied, and much closer to the users who depend on it.
Giga Computing’s emphasis is modularity: building blocks that let operators configure for today’s sprint and tomorrow’s pivot. In practice, modularity shortens time-to-capacity, smooths upgrades, and lowers the operational blast radius when something changes—like a new model family, memory footprint, or accelerator ratio. The companies winning rack scale are the ones who treat integration, test, and validation as first-class products—not just a step between BOM and shipment.
A second pillar: U.S.-based assembly. Giga Computing is standing up local capability to improve lead times, reduce shipping risk, and cut carbon that comes with long logistics chains. There’s also a quality and reliability angle that’s easy to overlook: racks that don’t spend weeks getting rattled across oceans arrive with fewer transport-induced gremlins, making burn-in and final test more predictive. Chen flagged SKD approaches that enable “Assembled in America” servers—another lever for responsiveness when demand spikes and when regulatory or customer requirements insist on a regional footprint. In a supply chain still healing from 2020–2022 shocks, proximity is performance.
Chen was direct about the macro arc: AI’s disruption dwarfs the industrial revolution. That’s not hyperbole on the OCP show floor; it’s the operating assumption for everyone building AI factories. But the second clause matters: be careful about the road we go down. For vendors, that means shipping capacity with guardrails—power-aware, grid-friendly, and supportable. For operators, it means designing for efficiency, circularity, and people—because the most constrained resource in this market might be expert hands, not megawatts.
It’s clear that Giga Computing is leaning into OCP’s core ethos—openness, efficiency, and scalability—while aligning to the buyer reality of 2025: more inference, faster turns, and a stricter carbon ledger. The choice to invest in U.S. assembly is a clear signal. The focus on modular rack-scale integration is another. And the honesty about circularity’s costs—and its necessity—reads as maturity, not marketing.
But the larger takeaway is ecosystem-level: OCP’s center of gravity is shifting from “can we build it?” to “can we scale it responsibly, locally, and profitably?” As AI diffuses into every sector, vendors that align on all three tend to be better positioned to keep shipping through the next wave.
If you want to learn more, Chen points to Giga Computing’s site and his LinkedIn.
The ground is shifting under data center infrastructure—away from one-off training builds toward the day-to-day realities of inference at scale, tighter lead times, and verifiable sustainability. In that context, Giga Computing is getting more sophisticated in addressing enterprise requirements—focusing on modular rack integration, treating circularity as an engineering process, and leaning into regional assembly for predictability and QA.

The data center industry’s cooling crisis has a surprising champion: a century-old British lubricants manufacturer better known for what’s under the hood of your car than what’s inside your server rack. My recent conversation with Darren Burgess, PhD, Business Development Director for Castrol, and Solidigm’s Jeniece Wnorowski revealed how the company is leveraging decades of fluid expertise to address one of AI infrastructure’s most pressing challenges.
The transition to liquid cooling isn’t a preference but a necessity driven by the thermal characteristics of modern AI chipsets. As Darren explained, “The chipsets are just too hot to be cooled by air. Basically, if you want to try to do air cooling, the servers would need to be so separated you’d probably only get a couple of servers per rack in order to rush enough air through.” Given the premium on data center real estate and power, that approach obviously doesn’t scale.
Direct-to-chip cooling addresses this constraint by placing cold plates (essentially heat exchangers) directly against the chipsets generating the most heat. A propylene glycol solution circulates through these cold plates, drawing heat away from the processors and enabling the server densification that AI deployments demand. This approach allows operators to maximize their infrastructure investment while maintaining the vertical rack configurations they’re already familiar with.
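To put rough numbers on what such a loop can carry, the governing relation is the simple sensible-heat equation; the flow rate, fluid properties, and temperature rise below are illustrative assumptions, not figures from Castrol:

```latex
Q = \dot{m}\, c_p\, \Delta T
\qquad\text{e.g.}\qquad
Q \approx 1.0\ \tfrac{\text{kg}}{\text{s}} \times 3.9\ \tfrac{\text{kJ}}{\text{kg·K}} \times 10\ \text{K} \approx 39\ \text{kW}
```

Under these assumed conditions, a single loop moving on the order of a liter per second of a propylene glycol mix with a 10 K temperature rise carries tens of kilowatts of heat, which is why direct-to-chip cooling supports rack densities that air alone cannot.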
What appears straightforward on the surface reveals significant complexity once you examine the chemistry involved. The coolant used is PG25, which contains 25% propylene glycol and 75% water. This immediately raises concerns about corrosion in systems featuring copper cold plates, iron components, and various other metals throughout the cooling distribution infrastructure.
This is where Castrol’s differentiation emerges. The critical factor isn’t the base fluid itself. “The additive pack is sort of the ‘magic dust,’ which is really where the action is,” Darren explained. It protects against corrosion while maintaining cooling efficiency. If corrosion occurs, particulates can clog the narrow channels within cold plates, some measuring as small as 50 to 100 microns. The result would be degraded cooling performance or complete system failure, forcing expensive downtime to remedy the problem.
Castrol’s expertise lies in formulating additive packages that prevent these failure modes, drawing on the same chemical engineering knowledge they’ve applied to automotive lubricants for decades. The qualification process requires demonstrating that cooling fluids won’t corrode system components, an essential step before any deployment.
Darren offered a compelling analogy to illustrate the cooling fluid’s critical role in data center infrastructure. While operators can deploy redundant cooling distribution units, backup pumps, and duplicate systems throughout the infrastructure stack, the cooling fluid itself represents a single point of failure. “It’s really like the blood,” he said. “We have two eyes, hands, and feet. We can get away with a lot of repeat, but we only have one system of blood. You really have to take care of it or you’ll shut the body down, or the data center down.”
This reality positions fluid health and maintenance as crucial operational concerns. Castrol’s value proposition extends beyond simply supplying cooling liquids to encompass the entire lifecycle: proper installation to avoid contamination, ongoing maintenance to preserve fluid integrity, and eventual disposal when systems are decommissioned.
The liquid cooling market’s maturation has accelerated dramatically over the past year. Hyperscalers including Amazon Web Services, Microsoft, and Google have moved beyond pilot programs to large-scale deployments of direct-to-chip cooling. The conversation has shifted from questions about equipment availability to detailed technical discussions about additive packages, compatibility testing, and failure mode analysis.
This evolution reflects the industry’s growing confidence in liquid cooling as a proven technology rather than an experimental approach. Organizations are now focused on operational details: understanding how fluids interact with different materials, identifying potential points of failure before they occur in production, and standardizing deployment practices across the industry.
Castrol’s pivot from automotive lubricants to data center cooling solutions demonstrates how adjacent industry expertise can address emerging infrastructure challenges. Their experience formulating fluids for demanding thermal environments translates directly to the requirements of AI infrastructure.
As liquid cooling transitions from niche technology to mainstream deployment, organizations that partner with suppliers offering comprehensive lifecycle management, from installation through maintenance to disposal, will be better positioned to avoid costly downtime and operational disruptions. The cooling fluid may be invisible to end users, but it’s becoming as foundational to AI infrastructure as the silicon it protects.
For more information about Castrol’s data center cooling solutions, visit https://www.castrol.com/en/global/corporate/products/data-centre-and-it-cooling.html.

For more than 150 years, Valvoline has been synonymous with high-performance motor oil and racing heritage. Now, the company is applying its expertise to a very different kind of performance challenge: keeping AI data centers cool as they transition from the megawatt era into the gigawatt era.
In a recent conversation with Michael Morrison, director of new ventures at Valvoline Global Operations, and Solidigm’s Jeniece Wnorowski, I discussed how data center cooling represents a natural evolution for a company built on managing heat and performance. As Michael explained, Valvoline has actually maintained a data center presence for years, providing oils for backup power generation systems. The move into cooling solutions represents a deeper engagement with an industry facing unprecedented thermal challenges.
While rising temperatures grab headlines, Michael emphasized that density is the real challenge facing modern data centers. AI-enhanced workloads require packing more chips into the same physical space, creating concentrated heat loads that traditional air cooling cannot effectively manage. Liquid cooling enables increased density of chips per server and of servers per data center, fundamentally changing the economics of AI infrastructure deployment.
Two approaches to liquid cooling have arisen in response to this challenge: direct-to-chip cooling and immersion cooling. Direct-to-chip cooling runs coolant through lines and cold plates to cool individual processors, and it has already moved beyond early adoption into rapid growth. Major manufacturers now support the approach, and deployments are underway.
Immersion cooling, however, remains in earlier stages. In immersion cooling applications, entire servers are submerged in tanks filled with dielectric fluid. The approach allows heat to be captured from all components simultaneously. It also represents a significant operational change for hyperscalers, which explains why it is still largely in the proof-of-concept phase.
“They’re not used to having large open tanks sitting in their data centers,” Michael said. “So, they’re not only testing performance metrics, but understanding, ‘what is my maintenance on a server like?’ All of those things have to have operational procedure set: all the nuances of running it in a normal setting and an emergency environment.”
The key to immersion cooling is dielectric oils. Dielectric materials, like the oils Valvoline produces, do not conduct electric current. And finding a fluid with the ideal properties to enable high performance is where Valvoline Global shines.
“We’re used to testing properties in our fluids that would determine, does it conduct electricity? Does it transfer heat?” he said.
While Valvoline Global’s fluid testing capabilities form the foundation, Michael emphasized that deploying these solutions successfully requires a more comprehensive approach. When servers are immersed in dielectric oil, compatibility becomes critical across thousands of individual components. Valvoline Global works closely with data center operators to ensure their fluids are compatible with specific hardware configurations, tank materials, and operational requirements. This collaborative approach, which the company has refined over more than 150 years of customer relationships, distinguishes their market strategy from simple product provision.
Beyond performance, liquid cooling addresses sustainability concerns that are becoming critical for data center operators. By reducing or eliminating the large HVAC systems required for air cooling, facilities can significantly decrease power consumption and operational expenses. Water usage can also be reduced, depending on system configuration. Michael noted that liquid cooling can create a scenario where improved cost structure and reduced environmental impact work together rather than compete as priorities.
Valvoline Global’s entry into data center cooling represents more than a company diversifying its product portfolio. It reflects how foundational technologies from established industries are being reimagined to solve the infrastructure challenges of AI deployment. As data centers grapple with the thermal and density challenges of AI-enabled workloads, Valvoline Global’s combination of fluid science expertise, collaborative approach, and long history of managing high-performance applications positions the company as a meaningful player in this infrastructure evolution. For organizations planning liquid cooling deployments, the lesson is clear: success depends not just on the technology itself but on the partnerships and compatibility testing that ensure reliable, long-term operation.
Learn more about Valvoline Global’s data center cooling solutions at their website, valvoline.com, where they provide detailed technical resources on liquid cooling technologies. Connect with Michael Morrison on LinkedIn to continue the conversation about thermal management innovation.

Inside Equinix and Solidigm’s playbook for turning data centers into adaptive, AI-ready platforms that balance sovereignty, performance, efficiency, and sustainability across hybrid multicloud.

During the recent OCP Summit in San Jose, Jeniece Wnorowski and I sat down with Eddie Ramirez, vice president of marketing at Arm, to unpack how the AI infrastructure ecosystem is evolving—from storage that computes to chiplets that finally speak a common language—and why that matters for anyone trying to stand up AI capacity without a hyperscaler’s deep pockets.
Two years ago at OCP Global, Arm introduced Arm Total Design—an ecosystem dedicated to making custom silicon development more accessible and collaborative. Fast-forward to this year’s conference, and the program has tripled in participants, with partners showing real products both in Arm’s booth and in the OCP Marketplace. That traction sets the backdrop for Arm’s bigger news: an elevated role on OCP’s Board of Directors and the contribution of its Foundational Chiplet System Architecture (FCSA) specification to the community.
Why should operators, builders, and CTOs care? Because the cost and complexity of building AI-tuned silicon is still brutal. Depending on the packaging approach—think advanced 3D stacks—Eddie put the total bill near a billion dollars. That number alone has kept bespoke designs out of reach for all but a few. The chiplet vision changes the calculus: assemble best-of-breed dies from different vendors rather than funding a monolith. But the promise only holds if those chiplets interoperate cleanly across more than just a physical link.
That’s the gap FCSA endeavors to fill. It goes beyond lane counts and bump maps to define how chiplets discover each other, boot together, secure the system, and manage the data flows between dies. If it works as intended inside OCP, we are an inch closer to a real chiplet marketplace—mix-and-match components with predictable integration, not months of bespoke glue logic.
Ecosystem is the keyword here, and not just for compute. Eddie spoke to collaborations across the platform, including within storage, as a case in point. Storage is stepping into the AI critical path, not simply holding training corpora but participating in the performance equation. AI at scale turns every subsystem into a performance domain. If data can be prepped, staged, filtered, or lightly processed closer to where it lives, you free up precious GPU cycles and avoid starving accelerators. Expect to see more of that thinking show up across NICs, DPUs, and smart memory tiers.
There’s also a geographic angle that’s difficult to ignore. Several of the newest Arm Total Design partners hail from Korea, Taiwan, and other regions actively cultivating their own semiconductor ecosystems. That matters for resilience and supply, but also for innovation velocity. When the entry ticket to custom silicon comes down, you get more specialized parts serving narrower, high-value slices of AI workloads—think tokenizer offload, retrieval augmentation helpers, or secure inference enclaves woven into the package fabric.
Underneath the product updates is a posture shift: lead with others. The Arm Total Design ecosystem is designed for co-design, not solo heroics, acknowledging that no one player can keep up with AI’s pace alone. OCP, with its bias toward open specs and reference designs that ship, is a natural forcing function. Putting FCSA into that process doesn’t just rack up community points; it pressures the spec to survive real-world scrutiny—power budgets, thermals, board constraints, and the ugly details that tend to eat elegant diagrams for breakfast.
If you’re operating AI clusters today, you’re already feeling the ripple effects. Racks are transitioning from steady-state power draw to spiky, sub-second pulses. Data movement is the enemy. The “box-first” era is fading into a rack- and campus-first design ethic where each layer—power delivery, cooling, storage, fabric, memory, compute—must flex in concert. Chiplets slot into that future because they can accelerate specialization at the silicon layer while OCP standardization tames integration higher up the stack.
What should you watch next? Three signals. First, real FCSA-based silicon or reference platforms that demonstrate multi-vendor die assemblies with clean boot and security flows. Second, storage and memory vendors showing measurable end-to-end gains on AI pipelines when compute nudges closer to data. Third, OCP Marketplace listings that move from reference intent to deployable inventory you can actually procure for pilot workloads.
If the last two years were about proving that chiplets are technically feasible, the next two will test whether they’re operationally adoptable. Specs are necessary; supply chains and service models are decisive. The teams that align those pieces—across vendors, geographies, and disciplines—will dictate how fast AI capacity gets cheaper, denser, and more power-aware.
The AI build-out is colliding with real-world constraints—power, thermals, and capital. Ecosystems that compress time-to-specialization without exploding integration cost will win. Arm’s OCP board seat plus the FCSA contribution is a smart bet that interoperability is the bottleneck to unlock. If FCSA becomes the lingua franca for chiplets, operators could see a practical path to tailored silicon without a billion-dollar entry fee. Pair that with smarter storage and memory paths, and you start to chip away at the two killers of AI efficiency: idle accelerators and stranded data. The homework now is ruthless validation: put these pieces under AI-class loads, measure tokens per joule, and prove that “lead with others” doesn’t just sound good on stage—it pencils out in the data center.

The next wave of AI won’t live in a data center—it will weld seams, pick bins, and navigate factories alongside people. Physical AI brings intelligence into machines that perceive, decide, and act at the edge, closing the loop between perception and action.
Real-world factories introduce drift, glare, vibration, dust, and unpredictable human behavior—conditions that most models never see in simulation.
For IT and cloud architects, this is a stack problem with hard requirements: real-time inference under adverse conditions, data pipelines that span OT and IT, and operational discipline that turns pilot demos into consistent, reliable uptime. The gap between ‘works in simulation’ and ‘works every day on the line’ remains the main blocker to scale.
But it’s also a workforce problem. Robots reduce injury risk in hazardous tasks, but displacement effects are real and localized. The question isn’t “robots: yes or no?” but “how do we deploy them responsibly with both operational rigor and a workforce plan?”
Three curves are converging: cheaper edge compute and sensors, strong perception models, and maturing MLOps for robotics. The International Federation of Robotics reports 542,000 industrial robots installed in 2024—the fourth consecutive year above 500k—with global demand doubling over the past decade. Industrial AI spending is projected to grow from $44 billion in 2024 to $154 billion by 2030.
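Taken at face value, those spending projections imply a compound annual growth rate of roughly 23% over the six-year span:

```latex
\left(\frac{154}{44}\right)^{1/6} - 1 \;\approx\; 3.5^{\,0.167} - 1 \;\approx\; 0.23
```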
Standards are accelerating deployment. ROS 2/DDS, OPC Unified Architecture, and OCP’s rack-level guidance are pushing interoperability across sensors, controllers, and training infrastructure. The blockers are shifting from feasibility to integration discipline and change management: not ‘Can we automate?’ but ‘Can we maintain reliability when conditions vary beyond 5–10% of training data?’
Industry-grade platforms now stitch together simulation, data pipelines, and robotics foundation models. NVIDIA’s Omniverse plus Isaac tools let teams generate synthetic data, train policies in digital twins, and validate behaviors before touching a live cell—shrinking iteration from months to days. The missing piece is capturing the tribal knowledge of veteran technicians and encoding it into recovery behaviors robots can execute.
I spoke with Durgesh Srivastava, CTO of DataraAI, at the recent OCP Global Summit about what separates production systems from pilot theater. His outlook is pragmatic: target full automation for bounded task families, match human quality, and build graceful fallbacks when reality goes off-script.
DataraAI provides a data engine for physical AI—a data-as-a-service platform that transforms factory experience into machine intelligence. It captures how technicians act, how robots fail, and how edge conditions drift, creating the data foundation real-world robotics has always lacked.
The company emphasizes three pillars:
1. Egocentric Multi-Modal Data Capture – Robots learn from the same viewpoint they act in. DataraAI’s robot-mounted and wearable sensors record RGB-D vision, IMU, tactile, and audio data from real operations. This captures the nuanced cues—force patterns, drift, micro-failures—that static cameras miss.
2. AI-Driven Annotation – DataraAI’s engine automatically labels rare and high-impact events—fires, spills, breakdowns, human-robot handoffs—turning chaos into structured data. It consistently captures scenarios that traditional CV pipelines fail to label or detect.
3. Continuous Learning Loop – New anomalies are fed back into the data engine. Each cycle makes models more resilient and accurate in the field. Every exception becomes new training data, creating a self-improving loop tied directly to real operations.
Early industrial pilots using this loop showed a 53% accuracy lift and 67% better edge-case handling—clear evidence that real-world data closes the performance gap.
The winning pattern is consistent: push perception and control onto the robot, treat the cloud as training and update infrastructure, and run a disciplined data loop that captures real-world anomalies. In harsh conditions—glare, occlusion, spark bursts—edge models must keep the task running. Inference runs locally on the robot, keeping factory data on-site, reducing latency, and enabling real-time adaptation to drift and anomalies. The back end aggregates field data and pushes lightweight updates routinely. That loop turns demos into dependable production.
High-fidelity simulation generates diverse synthetic data and lets teams rehearse rare events safely. Foundation models provide generalizable priors; reinforcement learning in sim refines task skills before transfer to the real world. Edge inference runs locally; telemetry goes upstream for labeling and augmentation; new policies return during maintenance windows. Daily updates are feasible when you structure the pipeline—this reduces drift and grows edge-case coverage.
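As a deliberately simplified illustration of that loop, the sketch below uses hypothetical names and stand-in logic; it is not DataraAI’s platform or any specific robotics stack, just the shape of local inference, graceful fallback, on-site telemetry capture, and a maintenance-window policy swap:

```python
import json
import random
import time
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class EdgePolicy:
    """Stand-in for a locally deployed perception/control model."""
    version: str = "v1"

    def act(self, observation: dict) -> tuple[str, float]:
        # Placeholder decision: return an action plus a confidence score.
        confidence = random.uniform(0.5, 1.0)
        return ("pick" if observation["bin_visible"] else "search", confidence)

@dataclass
class TelemetryBuffer:
    """Collects anomalies on-site; raw data stays local until a batch upload."""
    records: list = field(default_factory=list)

    def log(self, observation: dict, action: str, confidence: float) -> None:
        self.records.append(
            {"obs": observation, "action": action, "conf": confidence, "ts": time.time()}
        )

    def flush(self, path: Path) -> None:
        path.write_text(json.dumps(self.records))  # batch goes upstream for labeling
        self.records.clear()

def control_loop(policy: EdgePolicy, buffer: TelemetryBuffer,
                 steps: int = 100, conf_floor: float = 0.7) -> None:
    """Run inference locally; fall back and log telemetry when confidence drops."""
    for _ in range(steps):
        observation = {"bin_visible": random.random() > 0.2}  # stand-in for sensor input
        action, confidence = policy.act(observation)
        if confidence < conf_floor:
            action = "safe_stop"                     # graceful fallback when off-script
            buffer.log(observation, action, confidence)  # the exception becomes training data
        # execute(action) would drive the robot here

def maintenance_window(policy: EdgePolicy, buffer: TelemetryBuffer, outbox: Path) -> EdgePolicy:
    """Upload accumulated field data and swap in the retrained policy."""
    buffer.flush(outbox)
    return EdgePolicy(version="v2")  # updated weights would be pulled from the training backend

if __name__ == "__main__":
    policy, buffer = EdgePolicy(), TelemetryBuffer()
    control_loop(policy, buffer)
    policy = maintenance_window(policy, buffer, Path("telemetry_batch.json"))
    print(f"running policy {policy.version}")
```

The essential design choice is that sensing and inference stay on the robot, while only the exceptions worth learning from travel upstream and only vetted policy updates come back down.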
Case studies show robots cut musculoskeletal risk and reduce exposure to hazardous tasks like forging and welding. Collaborative-robot safety standards (ISO 10218, ISO/TS 15066) and OSHA guidance formalize safe human-robot interaction.
But displacement effects are real. MIT research found that each additional industrial robot per thousand workers reduced employment and wages in affected commuting zones—a meaningful impact not fully offset by productivity gains. The IMF estimates about 40 percent of jobs worldwide are exposed to AI impact; roughly half may see productivity augmentation, while the rest face reduced labor demand without intervention.
The macro takeaway: adoption will rise, safety can improve, and some roles will shift. For architects presenting to leadership or works councils, the deployment plan must address both.
Physical AI is crossing from pilot theater to production credibility.
The durable advantage comes from operational muscle: how quickly you spin the loop from floor data to better policies and back to the line, while moving people from hazardous repetitive work to higher-value tasks with clear retraining paths. Start with one cell, one exception, and a fallback procedure. Prove you can turn tribal knowledge into machine-executable behaviors and safety risks into measurable improvements. Then scale the loop. That’s the compounding edge in physical AI.

From SC25 in St. Louis, Nebius shares how its neocloud, Token Factory PaaS, and supercomputer-class infrastructure are reshaping AI workloads, enterprise adoption, and efficiency at hyperscale.

Runpod head of engineering Brennen Smith joins a Data Insights episode to unpack GPU-dense clouds, hidden storage bottlenecks, and a “universal orchestrator” for long-running AI agents at scale.

As AI adoption accelerates across industries, financial services sits on the front line of both innovation and risk. From fraud detection to customer personalization, AI is reshaping how institutions operate. But the sector’s high stakes and regulatory complexity demand a uniquely careful approach.
At the recent AI Infra Summit in Santa Clara, Jeniece Wnorowski and I sat down with FinTech expert Anusha Nerella for a Data Insights conversation about how financial organizations can responsibly scale AI, stay ahead of fraudsters, and build teams equipped for the future.
“Many institutions are still in the early stages of AI deployment, while bad actors are moving fast and experimenting aggressively,” Nerella said.
This dynamic creates an urgent need for stronger, more agile defenses. Nerella emphasized that financial firms must accelerate their AI implementation cycles without sacrificing the governance and compliance guardrails that define the industry.
Asked what the broader technology ecosystem should do to support responsible AI in finance and enterprise, Nerella returned to the importance of regulatory alignment.
“Everything has to go through the regulatory and compliance [process] in order to make it responsibly…applicable to the enterprise sector,” she said.
But regulations alone aren’t enough. Nerella believes that financial institutions must rethink team structures and knowledge transfer to keep pace. She advocates for what she calls “reverse training,” in which organizations bring in engineers well-versed in AI frameworks and libraries, then combine their expertise with the strategic experience of senior leaders.
By fostering two-way collaboration between new AI talent and experienced financial professionals, companies can build stronger, future-ready teams.
“It becomes… a collaborative effort for sure,” Nerella explained. “It’s an equal opportunity here because whoever [has] decades of experience…might have limited exposure towards AI-based frameworks or library utilization or hands-on experience.”
This equal exchange of knowledge, she argued, is essential for success.
For organizations just beginning their AI journeys, Nerella’s advice is both practical and pointed: don’t try to boil the ocean. She recommends starting with “two or three clear use cases with ROI” and ensuring that governance and control mechanisms are in place from the outset.
“When you follow all these basic principles, then you will be able to see…result-oriented AI-based implementation from your end,” she said.
Throughout the conversation, she underscored that AI success in financial services requires human-in-the-loop collaboration.
The financial sector’s high regulatory stakes, complex legacy systems, and relentless fraud threats make its AI journey distinct. Nerella’s insights highlight that the path forward isn’t just about technology—it’s about culture, compliance, and collaboration.
To build responsible and trusted AI systems, financial organizations must:
As the industry races to stay ahead of increasingly sophisticated fraud tactics, success will depend on balancing agility and accountability.

Recorded at #OCPSummit25, Allyson Klein and Jeniece Wnorowski sit down with Giga Computing’s Chen Lee to unpack GIGAPOD and GPM, DLC/immersion cooling, regional assembly, and the pivot to inference.

The current speed of the data center industry’s transformation is unlike any in its history. Where infrastructure upgrades once followed multiyear cycles, the pace now is annual, at the speed of consumer electronics. My recent conversation with Kelley Mullick, CEO and founder of Avayla, at the Open Compute Project (OCP) Global Summit in San Jose revealed how liquid cooling has moved from niche application to critical infrastructure component, and why standardization will determine whether the industry can meet the moment.
During our TechArena Data Insights episode with Solidigm’s Jeniece Wnorowski, Kelley shared insights from her extensive career in cooling technologies and her current role as chair of OCP’s industry liaison team. Her perspective illuminates both the technical challenges operators face and the collaborative frameworks emerging to address them.
As data centers continue to evolve in the race to support AI-enabled workloads, Kelley noted that scalability has emerged as the primary challenge facing operators today. During our conversation, she cited an OCP keynote by Meta’s head of infrastructure, Dan Rabinovitsj. What once took multiple years now happens annually, leading him to compare current deployment cadences to consumer electronics rather than traditional data center timelines. “That was a big insight for me,” she said. “I live and breathe in this space, but it is a real insight to make that comparison.”
With this primary challenge of scalability comes a host of secondary challenges, including how to cool this infrastructure and how to prepare for liquid cooling. Before 2022, more than 90% of data centers relied on traditional air cooling. In just two years, liquid cooling adoption has surged to approximately 30% of the market. This rapid acceleration has driven significant growth in coolant distribution units (CDUs), the critical infrastructure components that circulate coolant between facility systems and the cooling loops that serve the chips. Recognizing this need, at the 2025 global summit, OCP announced a new working group focused on CDU specifications and best practices.
As the industry faces these challenges, collaboration and defining industry standards become more important than ever. At OCP, Kelley is chair of an industry liaison team that connects external standards organizations to OCP. This year, the team had two announcements to make. First, OCP and the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) signed a memorandum of understanding, establishing formal collaboration to eliminate duplicative efforts and create clearer pathways for standards development. In addition, a new liaison role connects OCP directly into ASHRAE’s processes, ensuring thermal management standards align with open infrastructure development. Second, ASTM International launched a new subcommittee on insulating fluids for immersion cooling applications, with a member of the industry liaison team working with that organization as well.
In another example of the importance of collaboration, Kelley highlighted the Universal Quick Disconnect (UQD) specification 2.0, which addresses quick disconnects within the cold plate cooling loop from the chip to the CDU.
“We had many different technologies…There was also a lot of proprietary designs, and this was creating a lot of problems and challenges within the industry,” she said. “Specification 2.0 was one of the first within liquid cooling to come out and really make standards more readily available to address the challenge of interoperability and heterogeneity within the data center.”
Looking ahead, Kelley sees a future that will likely involve hybrid cooling strategies rather than single-solution deployments. While direct-to-chip cooling currently dominates new deployments, Kelley anticipates growing adoption of immersion cooling as workloads demand complete thermal management for entire compute stacks. Cooling networking equipment, memory modules, and storage alongside compute will become increasingly important. Immersion cooling’s ability to capture 100% of generated heat makes it a valuable complement to direct-to-chip solutions, particularly as interest in heat recapture and reuse rises.
As AI-enhanced workloads continue driving unprecedented infrastructure demands, the industry’s ability to standardize rapidly will determine whether operators can deploy at scale while maintaining flexibility for future innovation. Kelley Mullick’s leadership in standards development through OCP demonstrates how open collaboration can accelerate adoption while maintaining interoperability. Organizations that engage with standards bodies today and build relationships across the ecosystem will be best positioned to capitalize on the liquid cooling transition reshaping data center design.
For more information about Avayla, visit avayla.net. To learn about OCP’s Cooling Environments working group and standards development, visit the OCP wiki.

From #OCPSummit25, this Data Insights episode unpacks how RackRenew remanufactures OCP-compliant racks, servers, networking, power, and storage—turning hyperscaler discards into ready-to-deploy capacity.

As enterprises move artificial intelligence (AI)-based solutions further into production, inference speed is becoming a key factor in whether deployments succeed or fail. Real business value, and real infrastructure challenges, lie in how quickly models can generate responses for end users.
In a recent TechArena Data Insights episode, I spoke with Val Bercovici, chief AI officer at WEKA, and Scott Shadley, director of thought leadership at Solidigm, to explore how inference workloads are exposing infrastructure bottlenecks that threaten AI economics. Their conversation revealed why a metric called time to first token has become essential for measuring inference performance, and how storage architecture designed for this phase of AI can transform both productivity and profitability.
In a relatively short time, one metric has emerged to measure AI responsiveness: time to first token. As Val and Scott explained, this is a measure of the time it takes for a model to respond to a given prompt. It has emerged as a key metric because it directly translates to business value. “Time to first token literally translates to revenue, OPEX, and gross margin for the inference providers,” Val said.
As a concrete example, Val cited real-time voice translation, where instantaneous responses are critical to natural conversation. “Who wants to wait an awkward, pregnant pause of 30, 40 seconds for a translation?” Val asked. “We want that to be real-time and instantaneous, and time to first token is a key metric for that kind of use case.”
The metric matters because it reflects deeper infrastructure realities for AI inference workloads. Behind the response to every prompt lies a complex process that can be considered in two phases. In the pre-fill phase, prompts are converted into tokens and then expanded into the key-value (KV) cache, essentially the working memory of the large language model. Then in the decode phase, the model generates the actual output users see. Graphics processing units (GPUs) are currently being asked to do both at once, which is, in Val’s words, “a very expensive kind of context switching,” and the decode phase makes extreme demands of memory as well.
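A minimal sketch of how time to first token is typically measured against a streaming endpoint is shown below; the stream_tokens generator is a stand-in for a real inference API, with the sleeps mimicking the pre-fill and decode phases described above:

```python
import time
from typing import Iterable, Iterator

def stream_tokens(prompt: str) -> Iterator[str]:
    """Stand-in for a streaming inference endpoint: slow pre-fill, then steady decode."""
    time.sleep(0.35)                      # pre-fill: prompt tokenized, KV cache built
    for word in "Bonjour tout le monde".split():
        time.sleep(0.05)                  # decode: one token at a time
        yield word

def measure_ttft(token_stream: Iterable[str]) -> tuple[float, float]:
    """Return (time to first token, total generation time) in seconds."""
    start = time.perf_counter()
    first = 0.0
    for i, _ in enumerate(token_stream):
        if i == 0:
            first = time.perf_counter() - start
    return first, time.perf_counter() - start

if __name__ == "__main__":
    ttft, total = measure_ttft(stream_tokens("Translate 'Hello everyone' to French"))
    print(f"time to first token: {ttft:.2f}s, full response: {total:.2f}s")
```

In practice the same timer wraps whatever streaming client a provider exposes; the point is simply that TTFT is dominated by pre-fill, while total latency is dominated by decode.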
The conversation revealed how AI workloads differ fundamentally from traditional computing. Modern GPUs contain over 17,000 cores compared to a CPU’s hundred cores, creating entirely different performance requirements. This architectural shift demands a fresh approach to storage design, one that treats solid-state drives (SSDs) not merely as storage devices, but as memory extensions.
WEKA’s NeuralMesh Axon technology demonstrates this evolution. The solution embeds storage intelligence close to the GPU and creates software-defined memory from NVMe devices, allowing inference servers to see NVMe storage as memory, delivering memory-level performance from SSD hardware. This approach addresses one of inference computing’s most significant challenges: providing sufficient memory bandwidth to feed GPU cores without incurring prohibitive costs.
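The general idea of letting software address an SSD as if it were memory can be sketched with nothing more than a memory-mapped file; this is a generic illustration under assumed paths and sizes, not WEKA’s NeuralMesh Axon implementation or its API:

```python
import mmap
import os

CACHE_PATH = "/mnt/nvme/kv_cache.bin"   # hypothetical NVMe-backed mount point
CACHE_BYTES = 1 << 30                   # 1 GiB spill region, chosen for illustration

def open_spill_region(path: str = CACHE_PATH, size: int = CACHE_BYTES) -> mmap.mmap:
    """Map an NVMe-backed file so application code can address it like memory."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    os.ftruncate(fd, size)              # reserve space on the SSD
    return mmap.mmap(fd, size)          # pages fault in from flash on demand

def spill_kv_block(region: mmap.mmap, offset: int, block: bytes) -> None:
    """Evict a cold KV-cache block from GPU/host memory to the mapped SSD region."""
    region[offset:offset + len(block)] = block

def fetch_kv_block(region: mmap.mmap, offset: int, length: int) -> bytes:
    """Reload a previously spilled block when its context is needed again."""
    return bytes(region[offset:offset + length])
```

Purpose-built solutions add the placement intelligence, parallelism, and latency guarantees that a plain memory map lacks, but the addressing model is the same: cold KV-cache blocks spill to flash and page back in when needed.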
One of the discussion’s most striking revelations centered on what Val termed the “assembly line” problem. While data centers optimized for AI are often described as “AI factories,” AI inference today operates more like a job shop than an assembly line. Data movement remains inefficient, forcing expensive repeat pre-fill operations that consume kilowatts of power each time.
This inefficiency manifests in real-world constraints that AI users encounter daily. The rate limits imposed by AI service providers reflect the genuine economics of token generation. Coding agents and research tools that consume 100 to 10,000 times more tokens than simple chat sessions can’t be served profitably at current infrastructure costs, forcing providers to limit access even to customers willing to pay premium prices.
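To see why those multipliers strain the economics, a quick back-of-envelope helps. Every number below is an illustrative assumption, not a quoted price or a measured session size; the point is the scaling, which follows the 100x to 10,000x range from the conversation.

```python
# Back-of-envelope token economics; all figures are illustrative assumptions.
price_per_million_tokens = 10.00      # assumed blended $/1M generated tokens
chat_session_tokens = 2_000           # assumed typical chat exchange
agent_multipliers = (100, 10_000)     # range cited for coding/research agents

for multiplier in agent_multipliers:
    tokens = chat_session_tokens * multiplier
    cost = tokens / 1_000_000 * price_per_million_tokens
    print(f"{multiplier:>6,}x agent session: {tokens:,} tokens -> ${cost:,.2f}")
```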
Scott and Val offered practical guidance for IT leaders navigating the transition from proof-of-concept projects to production AI deployments. Scott stressed the importance of aligning hardware and software planning, noting that AI infrastructure demands closer collaboration between traditionally siloed teams. Val encouraged leaders to approach AI infrastructure with fresh perspectives, setting aside assumptions from previous technology generations.
As AI moves from experimental projects to production workloads generating measurable business value, infrastructure choices increasingly determine competitive advantage. Organizations that optimize storage architecture for token economics position themselves to scale AI profitably, while those applying traditional storage approaches risk creating bottlenecks that limit innovation. The enterprises that act decisively today in implementing high-performance storage architectures designed specifically for AI workloads will find themselves better positioned to capitalize on AI’s transformative potential.
For more information on WEKA’s AI infrastructure solutions, visit WEKA.io. Learn about Solidigm’s AI-optimized storage innovations at solidigm.com/ai.

For 15 years, Scality has operated at a scale most enterprises never contemplate. Its customers routinely manage petabytes, tens of petabytes, and now exabytes of unstructured data. My recent conversation with Paul Speciale, chief marketing officer of Scality, alongside Jeniece Wnorowski from Solidigm, revealed how this software-defined storage pioneer is navigating a shift in how organizations think about data protection in an era of ransomware threats and AI workloads.
The ransomware threat has fundamentally changed the data protection conversation. As Paul noted, you can’t go a day without seeing news of a cyber attack, and this constant barrage has added a new priority for chief information officers: in addition to backup speed, recovery speed is now top of mind. Organizations need their backup systems to serve as insurance policies that can quickly resurrect operations after an incident. This demand requires special architectural considerations and performance characteristics from the backup system.
AI has also introduced additional complexity to the data protection equation. First, the integrity of data feeding AI systems becomes paramount. “Imagine training your AI on data that’s been tampered with or has some kind of integrity constraints,” Paul said, raising the possibility of potentially catastrophic outcomes. Second, AI itself becomes a tool for both attack and defense. While Scality explores embedding AI capabilities to detect suspicious access patterns and questionable payloads, ransomware actors are simultaneously weaponizing AI for sophisticated phishing attacks and impersonation schemes.
The flash storage revolution is reshaping Scality’s deployment patterns in ways that would have seemed unlikely just a few years ago. Historically, Scality’s massive capacity deployments relied primarily on high-density hard disk drives (HDDs), with flash comprising only 1-2% of total capacity to accelerate metadata operations and data lookups. The converging costs between flash and HDD storage, combined with performance demands from analytics and AI workloads, are driving increased adoption of all-flash configurations.
Paul cited a conversation with a major analyst firm that reinforced this trend: AI models in cloud environments are already training on object storage backed by flash. The same pattern is emerging for on-premises deployments. Even fast backup operations can benefit from all-flash storage. “Why? Because again, it’s back to this restore equation,” he explained. “If I have flash, I can do a faster job of restoring.”
Paul emphasized Scality’s software-only approach as a core differentiator in an increasingly competitive storage market. By remaining hardware agnostic, Scality can leverage best-of-breed components like Solidigm’s high-capacity SSDs while optimizing for specific media characteristics. This flexibility extends to energy efficiency, performance tuning, and media-specific optimizations.
The conversation revealed Scality’s vision of becoming a data platform, extending its reach beyond storage. “And what is a platform?” he asked. “A platform is something that has a series of services. It might be a search service, a data cleansing service, a vectorial database for AI.” Paul sees software vendors as holding advantages in delivering these integrated service offerings, given they are not tied to specific servers or form factors.
Object storage’s mainstream emergence represents another important industry change. After 15 years of evangelizing object storage’s benefits, Paul sees the market finally recognizing its fundamental advantages for managing unstructured data at scale. Object storage’s inherent scalability, combined with its understanding of metadata and data characteristics, positions it ideally for immutable data protection and AI workloads. The hyperscalers have already validated this approach through Amazon S3 and Azure Blob adoption for AI applications. “The opportunity is huge as an object storage player in that arena,” he said.
Scality’s 15-year journey demonstrates the importance of being able to adapt alongside shifting enterprise priorities. Storage solutions must balance performance, resilience, and flexibility to meet the growing demand for data restoration capabilities and to address the changes arising from AI’s dual role as both a workload and a security tool. As object storage enters the mainstream and flash economics continue improving, organizations managing massive unstructured data volumes will increasingly depend on software-defined approaches that optimize across media types and deployment models. Scality’s platform vision, rooted in pure software architecture and object storage expertise, positions the company to address these converging demands.
For more information on Scality’s data protection solutions, visit scality.com.

Allyson Klein and co-host Jeniece Wnorowski sit down with Arm’s Eddie Ramirez to unpack Arm Total Design’s growth, the FCSA chiplet spec contribution to OCP, a new board seat, and how storage fits AI’s surge.

The artificial intelligence “surge” is here, and with it, data center infrastructure is fundamentally changing. From increased demand for liquid cooling to rising interest in co-location services, both bleeding-edge and well-established solutions are being called into play to answer AI’s appetite for power, cooling, and data access. I recently explored this phenomenon with Glenn Dekhayser, global principal technologist at Equinix, and Scott Shadley, leadership marketing director at Solidigm, to understand how enterprises are navigating AI’s infrastructure demands.
The fundamental challenge facing enterprises deploying AI can be summarized in two words, according to Glenn: “power and heat.” Where graphics processing units (GPUs) once served as passengers in computing architectures, they now drive the entire infrastructure bus, and the power they demand is staggering. An estimated 97 gigawatts of new power will be required for all registered data center projects by 2030, and when one average nuclear reactor puts out 1.4 gigawatts, “The math doesn’t work,” as Glenn said.
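To make that math explicit, a rough calculation using only the two figures Glenn cited shows the scale of the gap.

```python
# Rough calculation using the figures cited above; no other assumptions.
new_demand_gw = 97        # estimated new data center power required by 2030
reactor_output_gw = 1.4   # output of an average nuclear reactor

reactors_needed = new_demand_gw / reactor_output_gw
print(f"~{reactors_needed:.0f} average reactors' worth of new generation")
# Roughly 69 reactors by 2030, which is why "the math doesn't work."
```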
These power requirements create cascading effects throughout the infrastructure stack. Glenn noted that not every organization needs the highest-performance solutions, and liquid cooling becomes necessary only when rack density exceeds 40 kilowatts (kW). However, given that direct-to-chip liquid cooling can capture and dissipate up to 1,000 times the heat that air can, enterprises deploying next-generation GPUs must confront the requirement to implement this technology. This transition fundamentally changes total cost of ownership calculations and operational complexity for new infrastructure investments.
The full effects of AI’s increasing prevalence remain to be felt, but as Glenn noted, AI is driving change across all industries. While generative AI has broadened adoption to the point that “everybody has a need for” AI now, different forms of machine learning are also being used to solve domain-specific problems. “Every conversation has some angle to it,” he noted. “Whether you’re to be a consumer or provider or some middle service provider for data.”
Beyond dealing with power and heat, enterprises are adopting a mix of infrastructure strategies to adapt to the needs of AI workloads. For example, contrary to predictions that AI would centralize workloads in public clouds, enterprises are increasingly turning to co-location for AI deployments. The driving factors extend beyond simple cost considerations. The rapid pace of AI innovation means that the public cloud provider chosen two years ago may not offer the optimal AI services, capacity, or specialized models an organization needs today. Co-location provides the connectivity and flexibility to leverage various services without vendor lock-in, while maintaining control over data sovereignty and performance.
Perhaps nowhere is AI’s impact more visible than in storage strategy. Scott outlined how different AI workloads create distinct storage demands, and how innovative techniques can make the most of storage in all portions of the AI pipeline. For example, Solidigm has shown that retrieval-augmented generation (RAG) tasks can be offloaded from expensive dynamic random-access memory (DRAM) to optimized solid-state drives (SSDs), and that doing so can actually improve performance while lowering costs.
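Solidigm’s published work covers the specifics; as a purely generic illustration of the pattern, corpus embeddings can live in a memory-mapped file on an SSD and be searched without ever loading the full index into DRAM. The path, corpus size, and brute-force scoring below are assumptions for the sketch, not Solidigm’s or any vendor’s implementation.

```python
import numpy as np

# Generic sketch: RAG retrieval over embeddings stored on an NVMe SSD via
# memory mapping, so DRAM holds only the pages currently being scanned.
EMBEDDINGS_FILE = "/mnt/nvme/corpus_embeddings.f32"   # assumed SSD location
NUM_DOCS, DIM = 5_000_000, 768                        # assumed corpus size

corpus = np.memmap(EMBEDDINGS_FILE, dtype=np.float32,
                   mode="r", shape=(NUM_DOCS, DIM))

def retrieve_top_k(query_embedding: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k highest-scoring documents (dot-product scoring)."""
    scores = corpus @ query_embedding        # streamed from the SSD page by page
    return np.argpartition(scores, -k)[-k:]
```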
Both experts emphasized that successful AI infrastructure requires holistic thinking across compute, storage, and interconnect. Rather than isolated decisions, enterprises need integrated solutions that can adapt to rapidly changing requirements.
The AI infrastructure revolution is forcing enterprises to balance competing demands: the need for high-performance computing against power and cooling constraints, the desire for cutting-edge capabilities against cost considerations, and the requirement for agility against infrastructure investments. Equinix and Solidigm demonstrate how thoughtful collaboration can address these challenges through flexible, efficient solutions that scale from current needs to future requirements.
As AI continues its rapid evolution, organizations that invest in adaptable, well-connected infrastructure today, while maintaining control over their data platforms, will be best positioned to capitalize on tomorrow’s AI innovations. The key is not predicting exactly what AI will become, but building the foundation to respond quickly when it does.
For more insights on Equinix’s AI infrastructure solutions, visit blogs.equinix.com or connect with Glenn Dekhayser on LinkedIn. Learn more about Solidigm's AI-focused storage innovations at solidigm.com or reach out to Scott Shadley on social media platforms.

For decades, redundant array of independent disks (RAID) technology has quietly protected enterprise data, operating as a reliable safeguard designed in an era when hard disk drives were the only storage option available. That legacy architecture is now colliding with the demands of AI workloads, creating a performance gap that enterprises can no longer afford to ignore. As organizations invest heavily in infrastructure to optimize graphics processing units (GPUs), they’re discovering that traditional data protection methods can become a significant bottleneck preventing them from maximizing their AI investments.
During a recent TechArena Data Insights episode, I explored this challenge with Davide Villa, chief revenue officer at Xinnor, and Sarika Mehta, senior storage solutions architect at Solidigm. Our conversation revealed how the transition from hard drives to high-capacity quad-level cell solid-state drives (QLC SSDs) is forcing a fundamental rethinking of data protection strategies for AI environments.
Davide framed the issue clearly: AI infrastructure is designed around GPUs, with storage often treated as an afterthought. That approach creates significant risks. A leading truck manufacturer shared that every AI-based simulation job they run costs more than $2 million. If they lose data and must rerun a job, they’re facing substantial financial consequences.
The challenge extends beyond data loss prevention. GPU idle time carries steep costs, making it essential to maintain full system performance even during drive failures. Traditional RAID solutions designed for hard disk drives (HDDs), a far slower medium, struggle to meet this requirement. The performance characteristics of modern NVMe drives, capable of delivering tens of gigabytes per second in reads and writes and multiple millions of input/output operations per second (IOPS) in random workloads, require data protection solutions designed specifically for that level of parallelism.
Testing conducted by Solidigm and Xinnor revealed striking performance differences between traditional and modern data protection approaches. Rebuilding a 61.44 terabyte (TB) drive took just over five hours with Xinnor’s xiRAID solution compared to more than 53 hours with traditional Linux OS RAID (MD/RAID).
More importantly, those measurements were taken with the system otherwise idle during the rebuild. When the same tests ran with heavy workloads active, the performance gap widened dramatically. Xinnor’s solution completed rebuilds 25 times faster while maintaining full system performance. “So you can still run your AI workload…as if nothing happened. And in the background, our software is rebuilding the drive, recovering the data, and avoiding any data loss,” Davide emphasized.
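Those figures also imply very different sustained rebuild rates. The quick calculation below uses the drive capacity and approximate times from the test; it is a back-of-envelope comparison, not an additional measurement.

```python
# Implied average rebuild throughput from the published capacity and times.
capacity_gb = 61.44 * 1000    # 61.44 TB drive expressed in GB
for name, hours in (("xiRAID", 5.0), ("Linux MD/RAID", 53.0)):
    gb_per_second = capacity_gb / (hours * 3600)
    print(f"{name}: ~{gb_per_second:.1f} GB/s sustained rebuild rate")
# Roughly 3.4 GB/s versus 0.3 GB/s, before any foreground workload is added.
```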
Sarika highlighted how the shift from hard drives to QLC SSDs is enabling these improvements. Solidigm’s QLC drives deliver 11 times more write bandwidth and roughly 25 times more read bandwidth compared to 30TB hard drives. Those performance characteristics, combined with capacity advantages—122TB SSDs versus 32TB maximum for hard drives—create compelling economics for AI deployments.
The transition also addresses power and space constraints that have become critical considerations. As GPU power consumption claims the majority of data center power budgets, storage must maximize capacity per watt. High-capacity QLC drives deliver the necessary performance for AI workloads while optimizing the infrastructure footprint.
Another factor driving adoption is the warming of storage tiers. “Data has resided on cooler tiers before, which were primarily served by hard drives. But with AI taking off, those tiers are warming up. The performance that hard drives were providing before is no longer sufficient,” Sarika explained. That shift makes the raw performance of QLC SSDs essential for preventing GPU idling.
Davide emphasized that hardware performance alone isn’t sufficient. Software plays a crucial role in enabling reliable performance at scale. AI deployments require clusters of drives, not individual units. The challenge lies in aggregating multiple drives into larger pools while maintaining the performance characteristics of individual devices. “This can only be done through software implementation,” Davide said. “So that’s exactly what we do. We try to maximize the aggregated performance of what Solidigm brings to the market.”
Xinnor’s approach focuses on maximizing what hardware can theoretically deliver as the industry transitions from PCIe Gen 4 to Gen 5. That optimization ensures organizations can fully leverage the capabilities Solidigm brings to market while maintaining the data protection AI workloads require.
The convergence of high-capacity QLC SSDs and modern data protection software represents a meaningful advance for AI infrastructure. Organizations that recognize storage and data protection as strategic components will be better positioned to maximize their GPU investments and avoid the costly consequences of system degradation or data loss. As AI workloads continue to evolve and drive capacity demands higher, the gap between legacy RAID approaches and modern solutions will only widen.
For more information about Xinnor’s data protection solutions, visit xinnor.io. Learn more about Solidigm’s AI-focused storage innovations at solidigm.com.

OCP Summit put a spotlight on something I’ve been watching for a while: AI has turned the data center into a living system. The optimization target isn’t a single server box anymore. It’s the confluence of power, liquid cooling, interconnects, storage tiers, and software orchestration, all moving in step to serve wildly dynamic workloads. I was lucky enough to moderate an AI Factory panel last week with CoreWeave’s Jacob Yundt, Dell’s Peter Corbett, NVIDIA’s CJ Newburn, Solidigm’s Alan Bumgarner, and VAST Data’s Glenn Lockwood. These seasoned data center veterans have experience from across the value chain and provided some fantastic insights on how swiftly data center innovation is moving to address customer demand.
The clearest through-line in the conversation was the industry shift from box-level thinking to rack- and row-scale system design. That change cascades into everything from how you bring in power and manage heat to how you route traffic, where you place data, and how you orchestrate workflows. Jacob framed the pivot crisply, discussing CoreWeave’s move from “thinking of compute as individual units” to racks, rows, and entire data centers, where “everything just needs to work together… power delivery, liquid cooling, networking, storage.”
Glenn built on this foundation, explaining why VAST sees that not merely as a need for more network connectivity but as a call for application-specific disaggregation. He clarified that rack-scale and scale-up interconnects let you design for how the work runs, not just push everything through a generic L3 and hope for the best. He also pointed to checkpointing patterns and the pairing of global shared storage with node-local SSDs to maximize performance and economics.
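That global-plus-local pairing often shows up as two-tier checkpointing: land the checkpoint on node-local NVMe so accelerators can resume quickly, then drain it to shared storage in the background. The sketch below is a generic illustration of that flow under assumed mount points and a simple file copy; it is not a description of VAST’s or any panelist’s product.

```python
import shutil
import threading
from pathlib import Path

LOCAL_NVME = Path("/local_nvme/checkpoints")   # assumed node-local SSD mount
SHARED_FS = Path("/shared/checkpoints")        # assumed global shared storage

def save_checkpoint(step: int, serialize) -> Path:
    """Write the checkpoint fast to local NVMe, then copy it to shared storage
    in the background so the training loop only blocks on the local write."""
    local_path = LOCAL_NVME / f"step_{step}.ckpt"
    serialize(local_path)                       # critical path: local NVMe only
    threading.Thread(
        target=shutil.copy2,
        args=(local_path, SHARED_FS / local_path.name),
        daemon=True,
    ).start()                                   # drain to the shared tier asynchronously
    return local_path
```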
In AI data centers, compute utilization, more than ever, defines the business outcome. Peter outlined Dell’s view on the topic, plainly explaining that ensuring high utilization means feeding accelerators while not recomputing what you already computed. Operators need to build the surrounding storage and network capabilities so intermediate results and checkpoints can be reused at speed.
That’s harder than it sounds because the software surface is changing at breakneck pace. CJ put a timestamp on it, reflecting NVIDIA’s torrid pace of innovation: “The only thing I’m really confident in is that it’ll be completely different in three weeks.” He emphasized that orchestration must continually decide where data should live, when to move it, and when remote access beats relocation.
Jacob echoed this sentiment stating, “The pace of deployment right now is unlike anything I’ve ever seen… the cost of getting it wrong… is so expensive… if your network or storage is not fast enough.” When considering the scale that CoreWeave is operating, it was clear that guessing wrong here could make or break financial success.
While it was no surprise that storage was a central discussion point of the panel… after all, Solidigm did host this breakfast… the fervor with which the panelists spoke about storage underscored its architectural importance to this era of the data center. Their message? Stop relegating data plumbing to a downstream team. From metadata and indexing to checkpoint lifecycle and synthetic-data governance, data is foundational to the AI pipeline.
Peter walked through why. Training, inference, RAG, and fine-tuning each have distinct data modes and retention demands. Checkpoints must be instantly restorable and archived for reproducibility, while the generated and simulated data used to train other LLMs needs cataloging at massive scale. He went further, forecasting that “we could see a hundred-x increase” in those volumes.
On the abstraction side, CJ argued for higher-level interfaces that let users express intent, “just go get my data wherever it is,” while the stack decides placement, sharding, staging, and presentation under the hood. That decoupling gives vendors freedom to innovate behind stable APIs as hardware and media evolve.
And the inside-the-enterprise reality check came from none other than Alan, who shared that after a company-wide AI enablement push, Solidigm discovered very quickly how unclean its data was. That admission was met with many nods around the room and reflected my own discussions with enterprises that have shared the same scar tissue. To get the most out of AI, governance and operational control are no longer optional programs – they’re table stakes.
With the financial and resource-based investment in AI factories, the focus on efficiency has never been sharper. Liquid cooling has moved from science project to assumed ingredient. Jacob didn’t mince words, stating the importance of this innovation for companies like CoreWeave. He explained that the next frontier is tying everything together so when a 20-line command is about to launch and burn tens of megawatts, the power and cooling systems respond proactively, not reactively.
Peter built on this, highlighting why even storage needs thermal re-thinking. Multi-kilowatt drive enclosures are real, and standards groups including SNIA and OCP are working on how to cool serviceable drives in high-density racks, because you still need to pull and replace them safely.
Glenn then took the conversation further, uncovering a largely silent efficiency killer. He explained that when something fails or stalls, entire clusters idle, and he argued for communities like OCP to push shared semantics for reliability in practice so we can identify link flaps, hanging collectives, and other at-scale “ugly bits” quickly, rather than burning cycles waiting for the system to realize that it has reached an unhealthy state.
One culprit in this equation is how storage granularity can wreck energy math for certain vector-style workloads. CJ provided an example: reading 16–32KB NAND pages to retrieve roughly 512 bytes of useful data leads to absurd power budgets. The fix? Coordinated innovation across drives, controllers, firmware, and software to reduce over-fetch and align I/O to what a workload actually needs.
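The amplification CJ described is easy to put a number on. With the page and payload sizes he cited, each lookup reads roughly 32 to 64 times more data than it uses; the arithmetic below is a rough restatement of his example, not a measured figure.

```python
# Read amplification for small vector lookups on large NAND pages (rough math).
useful_bytes = 512
for page_kib in (16, 32):
    amplification = page_kib * 1024 / useful_bytes
    print(f"{page_kib} KiB page -> {amplification:.0f}x more data read than used")
# Every lookup burns 32-64x the I/O, and the associated energy, that the
# workload actually needs, which is where the power budget goes sideways.
```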
Alan added a note on fabric efficiency at scale, explaining that he’s seeing reports of very high utilization on enormous clusters, which is impressive but makes the stack hypersensitive to tail latencies and congestion. Getting protocol and software choices right is what keeps utilization sustained, not just peaky.
As veterans of the data center industry, the panelists knew well that open standards are an innovation multiplier. Peter put it succinctly, stating that standards have been essential for storage innovation across hardware and protocol interfaces because they let multiple vendors innovate behind them and lower barriers to entry for newcomers.
But when you are moving so quickly, can traditional storage approaches keep up? CJ offered a contrarian view, stating that we shouldn’t rush to standardize designs that limit concurrency or add unnecessary complexity. Instead, we should run experiments, share data, then bring the minimum viable interfaces to standards bodies so we minimize time to useful production without locking in untested ideas.
If you compress the future that our panel described into a single image, it’s this: a single rack fueled by open chiplet designs, consuming 50 megawatts, surrounded by “goop,” and running elegant software that keeps it fully utilized. That was my summary of the panelists’ prognostications across a five-year innovation horizon.
Jacob kicked us off with a view of “football fields of infrastructure for power and cooling,” feeding one mega-rack at ~50 MW. This was not simply a joke, but the logical endpoint of density, liquid loops everywhere, and efficiency work that compresses visible compute while everything around it scales out of sight.
Glenn took the idea further, claiming that the “one rack” only works when it’s surrounded by the unglamorous machinery that turns data into answers. He explained that the AI factory of the future would be full of “goop,” the power, cooling, storage, and networking required for compute, with the GPUs at the center actually driving insights.
On software, CJ pushed the same abstraction thread forward. He sees a future where operators declare intent (“go get my data wherever it is”), while the stack optimizes where data lives, how it’s sharded, and how it’s staged/presented across heterogeneous building blocks. That keeps innovation vibrant underneath stable interfaces as media, fabrics, and accelerators evolve.
On hardware, Alan expects innovation to move inside the package: by 2030, we’ll need new standards because we can combine different things inside an SoC in ways that aren’t possible at rack or PCB scales today. That’s the open chiplet economy showing up in real design choices, including interfaces, subsystems, and thermal and power technology that collectively ripple up into rack-scale architecture.
And Peter widened the lens: the power envelope itself becomes a macro-catalyst. If the capacity required for AI is built primarily with low-carbon generation, the grid’s evolution can accelerate through economies of scale—changing siting, cooling, and heat-reclaim strategies along the way.
AI isn’t just turning the data center into one big server, it’s turning it into a cohesive and interdependent organism. The amount of collaboration required to execute at this design point is unlike anything we have seen before. Aligning on open, agile interfaces that increase industry velocity will be critical to our collective success. Designing every component to remove inefficiency from the whole is now non-negotiable, and a seat is open at the table for anyone with bold ideas for efficiency breakthroughs.