Solidigm Media Platform

Innovaccer Reframes AI Governance as Healthcare’s Accelerator

Back in 2016, the “godfather of AI” Geoffrey Hinton made a bold prediction: stop training radiologists immediately, because deep learning would render them obsolete within five years. A decade on, that outcome still looks distant, and radiologists remain in as much demand as ever. The gap shows how much accuracy and safety matter in medicine, and how distinctive the challenges of adopting AI in this space are.

My recent conversation with Tapan Shah, AI Architect at Innovaccer and Agentic AI Work Group Lead at the Coalition for Health AI (CHAI), and our Data Insights co-host Jeniece Wnorowski from Solidigm, shed light on some of the challenges in creating scalable AI systems for healthcare. His role involves creating AI systems and agents that work in actual healthcare environments and enterprise systems that affect patient and provider outcomes.

In Tapan’s view, the hardest problem in healthcare AI is not creating the right models or algorithms, but designing the surrounding system correctly from the ground up.

The Gap Between Pilot and Production

Tapan opened with an example that cuts to the heart of the challenge. An AI clinical note generator built for a cardiology practice may work great in a pilot and then stumble when deployed for other disciplines like oncology or orthopedics, or even a different practice running a different electronic health record (EHR) system. Even when the underlying model remains the same, the results can be vastly different based on the medical discipline.

“Scaling AI into enterprise healthcare is less of an AI problem and more of a system design problem,” Tapan said. “The real problem here is whether in real-world situations, an AI agent being developed has the right level of access and the capability to create sufficiently transparent and explainable recommendations that even a skeptical clinician can accept.”

From Building Models to Agents

In the past decade, the healthcare AI industry has undergone a seismic shift from building predictive models to building agents. Historically, validating an AI system was relatively straightforward: train a model, measure accuracy on a holdout set, and deploy. That approach has proven itself in cases like early tumor detection, Tapan said.

Agents are a fundamentally different beast. They pull data from multiple data sources, invoke various tools, and combine these inputs to perform complex tasks. Often there is no single source of truth, and clinicians can interpret the same data differently. Data can be missing, or certain users may lack access to certain tools or software. The challenge, then, is ensuring that the agent behaves safely and predictably even in a scenario it has never encountered.

And because sensitive data is being handled, safeguards need to be built in the system from the get-go. For instance, a cardiology clinical note generator should not have access to a patient’s psychiatric records.
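
One way to picture this kind of built-in safeguard is a per-agent allow-list that filters records before the agent ever sees them. The sketch below is purely illustrative: the agent name, the record categories, and the filter_records helper are assumptions made for this example, not a description of Innovaccer's implementation.

```python
from dataclasses import dataclass

# Hypothetical mapping of agent name -> record categories it may read.
AGENT_SCOPES = {
    "cardiology_note_generator": {"cardiology", "medications", "vitals"},
}


@dataclass(frozen=True)
class Record:
    patient_id: str
    category: str
    text: str


def filter_records(agent: str, records: list[Record]) -> list[Record]:
    """Return only the records whose category the agent is allowed to read.

    Unknown agents get an empty scope, so access is denied by default.
    """
    allowed = AGENT_SCOPES.get(agent, set())
    return [r for r in records if r.category in allowed]


records = [
    Record("p1", "cardiology", "Echo shows normal ejection fraction."),
    Record("p1", "psychiatry", "Session notes."),
]
# The psychiatric record is dropped before the agent receives anything.
visible = filter_records("cardiology_note_generator", records)
```

Filtering at the data-access layer, rather than asking the model to ignore what it sees, means the restriction holds no matter how the agent is prompted.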

Governance as Enablement, not Constraint

When the topic turned to governance, Tapan pushed back against the assumption that governance is primarily about controls and restrictions.

“AI governance is not a constraint, it’s enablement,” he said, comparing a good governance framework to a constitution: it can be used as a binding document, or it can serve as the foundation for doing genuinely useful things, based on how you build and use it.

He illustrated this with a scenario where an authorization agent shifted from a 70% auto-approval rate to a 90% auto-approval rate. Effective governance would mean detecting this shift, reviewing the agent’s complete decision graph, and identifying the root cause. A successful governance model would let that investigation happen in minutes rather than weeks.
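
The detection step in that scenario can be sketched as a sliding-window monitor that compares the recent auto-approval rate with an expected baseline. This is a minimal illustration under stated assumptions: the class name, the 100-decision window, and the 10-point tolerance are all hypothetical choices, and a production system would likely use a proper statistical test.

```python
from collections import deque


class ApprovalRateMonitor:
    """Tracks the auto-approval rate over a sliding window of decisions."""

    def __init__(self, baseline: float, tolerance: float, window: int):
        self.baseline = baseline      # expected approval rate, e.g. 0.70
        self.tolerance = tolerance    # allowed absolute deviation
        self.decisions = deque(maxlen=window)

    def record(self, auto_approved: bool) -> None:
        self.decisions.append(auto_approved)

    def rate(self) -> float:
        if not self.decisions:
            return self.baseline      # nothing observed yet
        return sum(self.decisions) / len(self.decisions)

    def drifted(self) -> bool:
        # Only judge once the window is full, to avoid noisy early alerts.
        if len(self.decisions) < self.decisions.maxlen:
            return False
        return abs(self.rate() - self.baseline) > self.tolerance


monitor = ApprovalRateMonitor(baseline=0.70, tolerance=0.10, window=100)
for i in range(100):
    monitor.record(i % 10 != 0)       # 90 of the 100 decisions auto-approved
```

With the simulated 90% approval rate, monitor.drifted() returns True, which is the signal that would trigger a review of the agent's decision graph.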

Clinical Consequences

The thorniest issue in the conversation was accountability, especially as AI agents take on decisions with both clinical and administrative consequences. Tapan was candid: there is no perfect solution yet. Legal frameworks are still catching up to the question of what it means for an AI agent to make a consequential decision.

Innovaccer’s current approach is to make sure that there is comprehensive logging of every AI decision, granular access control for agents, and human oversight with the ability to override. For all clinical use cases, and many administrative ones, a human remains in the loop, able to review and reverse any AI-generated decision. As legal and governance frameworks evolve, these foundations will provide the structure to adapt.
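
As a rough picture of how logging and human override fit together, the sketch below keeps the agent's original outcome on record while letting a reviewer set a different final outcome. The Decision and DecisionLog names, their fields, and the sample values are assumptions for illustration only, not Innovaccer's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Decision:
    agent: str
    request_id: str
    outcome: str                          # what the agent decided
    rationale: str                        # why, for later review
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    overridden_by: Optional[str] = None   # reviewer who reversed it, if any
    final_outcome: Optional[str] = None

    def __post_init__(self):
        # Until a human intervenes, the agent's outcome is the final one.
        if self.final_outcome is None:
            self.final_outcome = self.outcome


class DecisionLog:
    """In-memory log; a real system would use durable, append-only storage."""

    def __init__(self):
        self._entries: dict[str, Decision] = {}

    def record(self, decision: Decision) -> None:
        self._entries[decision.request_id] = decision

    def override(self, request_id: str, reviewer: str, new_outcome: str) -> Decision:
        decision = self._entries[request_id]
        decision.overridden_by = reviewer
        decision.final_outcome = new_outcome   # original outcome is preserved
        return decision


log = DecisionLog()
log.record(Decision("prior_auth_agent", "req-1", "approve", "meets criteria"))
reviewed = log.override("req-1", reviewer="dr_lee", new_outcome="deny")
```

Keeping both the agent's outcome and the reviewer's final outcome means the record can later answer who decided what, which is the accountability question the legal frameworks are still working through.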

Long-term Value

When asked about measuring long-term strategic value, Tapan pointed to two holy grails: improved patient and provider outcomes. Treatment authorizations are a good example of where AI intervention can help, he explained.

“There are cases where it can take upwards of two to three weeks for a prior authorization for a procedure, that leads to delay in care,” he said. “If we can bring that down to, let’s say, a day, less than a day, even a few minutes, it actually impacts patient outcomes and cost of care.”

On the other end, freeing clinicians of administrative burdens allows them to spend more of their time caring for patients, reducing burnout and stress levels.

And because healthcare AI serves multiple stakeholders including operations, compliance and clinical teams, a scalable solution would need to be designed with solid system design principles, with observability, tracing, and monitoring built in right from the very beginning.

The TechArena Take

Innovaccer’s approach demonstrates the challenges in building a successful system that can work across multiple specialties in real-life hospital scenarios. As integrating AI in healthcare has shifted from building models to building agents, the hardest problem to solve isn’t technical performance, but rather ensuring safety, accountability, and governance.

Tapan’s framing that governance should be treated as enablement, not constraint, feels like an important mindset shift for leaders trying to move beyond the pilot stage. By helping to reduce authorization times and administrative burden, AI can help provide long-term benefits such as better patient care and provider experience.

If you’re interested in learning more, check out the full podcast. In addition, the Department of Health and Human Services recently published updated guidelines for AI, and the CHAI and Innovaccer websites provide useful guidance on the use of agentic AI in healthcare settings.

Betterworks Uses ‘Horizontal Intelligence’ to Connect HR Silos

The moments that help define an employee's trajectory, including performance reviews and manager feedback, are too consequential to get wrong. AI promises to help managers be better prepared for these important conversations by presenting clear insights that draw from the sea of daily work data. But it can only deliver when it is trusted on all sides.

In my recent conversation with Maher Hanafi, senior vice president of engineering at Betterworks, and Solidigm’s Jeniece Wnorowski, we discussed what it takes to turn AI’s potential into a trusted and valued enterprise solution.

Beyond the Static HR Database

Betterworks describes itself as a talent and performance management platform for global enterprise customers, but Maher is quick to distinguish it from traditional HR software. Where legacy tools function as administrative record-keepers by tracking history, storing documents, and managing lists, Betterworks aims to orient its platform around the flow of work.

“We were looking at the data from a performance lens,” Maher explained. “We’re trying to enable anything that helps go beyond just tracking history…to focus more on the flow of work.” For large enterprises with complex organizational structures spanning multiple regions, that means helping individuals, managers, and business units connect their daily efforts to company-wide goals, a capability that only becomes more valuable, and more technically demanding, as AI matures.

AI as Horizontal Intelligence

Maher offered the useful frame of thinking about AI as enabling “horizontal intelligence.” Before AI, Betterworks’ modules — goals, feedback, one-on-one meetings, talent and skills — operated as largely separate domains. Generative AI has made it possible to interconnect those domains in ways that weren’t previously practical.

“With AI today, it’s just way easier to interconnect all of these,” he said. “I think SaaS products and SaaS platforms will be built as more of an interconnected set of layers that will break the silos between different components and features.”

In practical terms, this means a manager preparing for a one-on-one meeting can receive and review AI-generated insights drawn from an employee’s recent goals, feedback, and performance history before a conversation, rather than manually pulling together and examining months’ worth of data.

Responsible AI as a Design Principle

When AI provides insights that can influence such important conversations, it’s paramount that all parties can trust the system’s output. Operating in this environment, Betterworks has emphasized responsible AI guided by two principles in particular: transparency and explainability. Transparency means the system can show users what sources it drew on to generate a response. Explainability means users understand why an AI suggestion is what it is. With this foundation, when managers are giving feedback to employees based on information AI provides, they can make suggestions and have confidence in the underlying insights.

“We are trying to use AI as a way to really get you as a better individual, better member of the organization and contributing to the big picture versus having AI take control,” Maher said. “You should be in the driver’s seat. AI is just there to help you and be a co-pilot, nothing else.”

Advice for Technology Leaders

As the conversation turned to broader lessons, Maher offered practical guidance for engineering and technology leaders navigating AI adoption inside enterprise organizations.

His first recommendation is simply to stay informed without becoming overwhelmed. “AI is moving very fast…. Picking the one out of the haystack is very challenging,” he said. To manage that, he created what he calls an AI Engineering Lab at Betterworks, a structured environment where engineers could explore tools and run experiments, rather than waiting for top-down mandates on which technology to adopt.

He also urged leaders to take the financial dimension seriously. “There was a huge risk of AI taking too much money without achieving ROI,” he said. “Turning into someone who cares more about the financial aspect and looking at costs on a frequent basis…was a huge success.” In his view, senior technology leaders increasingly need to think with some of the rigor of a chief financial officer when it comes to managing AI infrastructure spend.

Finally, he pointed to the value of frameworks. His own AI maturity framework and a flywheel model focused on planning, building, and optimizing AI systems have helped keep the team oriented even as the technology underneath them continues to shift.

The TechArena Take

Maher’s perspective reflects a measured but substantive view of what AI can deliver in enterprise software, one grounded in the realities of compliance-heavy industries and the organizational complexity of global customers. Rather than positioning AI as a transformation layer bolted onto an existing product, Betterworks has committed to rebuilding the platform’s foundations to make intelligence a native capability. For technology decision makers evaluating AI-powered SaaS in regulated environments, the Betterworks story offers a useful model.

Learn more about Betterworks at betterworks.com, and watch our full podcast episode.

How Hedgehog Brings Hyperscaler Agility to Any AI Infrastructure

As organizations build private AI clouds to control costs and protect their data, they face a familiar dilemma: the trade-off between performance and operational simplicity. Hyperscalers (like AWS or Google) have both, but only because they have armies of engineers to build custom software that tames their hardware.

My recent conversation with Solidigm's Jeniece Wnorowski and Marc Austin, CEO and co-founder of Hedgehog, revealed how enterprises can now access that same "Hyperscaler Agility"—without the army of engineers.

The key? Decoupling the control plane from the hardware.

The Hyperscaler Playbook for Everyone

Hedgehog’s mission centers on enabling enterprises, government agencies, and neoclouds to “network like a hyperscaler.” This means moving beyond rigid trade-offs. Instead of being forced to choose between the stability of validated reference architectures or the flexibility of open standards, Hedgehog allows organizations to leverage both, orchestrated by a single software platform.

This approach offers a massive strategic advantage: Supply Chain Resilience.

As Marc explained, a diversified hardware strategy is critical for risk management. “If you have a supply shock—like a global pandemic or a trade war—that can limit your ability to scale because supply becomes constrained,” he noted. “You can’t add capacity to your network when you need to.”

By running open-source software on OCP standards-based servers, organizations can acquire equipment from whichever vendor offers the best price and availability at that moment. And because Hedgehog’s control plane is hardware-agnostic, it can eventually extend this same flexibility to other high-performance reference architectures, ensuring that the software experience remains consistent regardless of the underlying silicon.
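
The general idea of a hardware-agnostic control plane can be sketched as a single interface with one driver per vendor behind it. Everything named below (SwitchDriver, the two vendor drivers, ControlPlane, create_segment) is hypothetical and chosen for illustration; it is not Hedgehog's API.

```python
from typing import Protocol


class SwitchDriver(Protocol):
    """What the control plane needs from any switch, whatever the vendor."""

    def create_segment(self, name: str, vlan: int) -> str: ...


class VendorADriver:
    def create_segment(self, name: str, vlan: int) -> str:
        return f"vendor-a: segment {name} on vlan {vlan}"


class VendorBDriver:
    def create_segment(self, name: str, vlan: int) -> str:
        return f"vendor-b: vlan {vlan} name {name}"


class ControlPlane:
    """Issues one logical command and lets each driver translate it."""

    def __init__(self, fleet: dict[str, SwitchDriver]):
        self.fleet = fleet

    def create_segment(self, name: str, vlan: int) -> dict[str, str]:
        return {
            switch: driver.create_segment(name, vlan)
            for switch, driver in self.fleet.items()
        }


plane = ControlPlane({"leaf-1": VendorADriver(), "leaf-2": VendorBDriver()})
result = plane.create_segment("tenant-blue", 100)
```

Operators call one method regardless of what is in the rack, so swapping a vendor means adding a driver rather than retraining the team.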

Automating the "Network Appliance"

Hardware diversity is only half the battle; the other half is operational speed. Hedgehog delivers all the software needed to automatically install, configure, and operate AI networks as a turnkey "appliance." This eliminates weeks of manual configuration work by network architects.

More importantly, it democratizes access. By providing a Virtual Private Cloud (VPC) service, Hedgehog allows enterprise users or neocloud tenants to operate within a private, secure segment—consuming on-premise AI infrastructure with the same self-service ease they expect from a public cloud provider.

Real-World Agility: Zipline & FarmGPU

The power of this "Universal Control Plane" is evident in how customers are using it to bypass traditional infrastructure bottlenecks.

Zipline, an automated drone delivery company, utilized Hedgehog to build a private cloud that cut infrastructure costs by 70% while keeping their delivery data secure. The critical win wasn't just the hardware savings—it was the operational model. They managed the deployment with their existing DevOps team, without hiring specialized network engineers, because Hedgehog abstracted the physical switching complexity into simple software commands.

In the high-performance arena, FarmGPU (operating the Solidigm AI Central Lab) used Hedgehog to orchestrate an 800G fabric for AI training. Independent testing by SemiAnalysis highlighted that Hedgehog’s software-defined congestion management maximized bandwidth and GPU utilization.

This proves a vital point for the future of AI: The software you use to manage the network matters just as much as the wire itself.

Solving the "Gateway Problem"

Agility isn't just about the switch fabric; it's about how data enters the building. FarmGPU faced a challenge familiar to many AI operators: ingesting terabytes of training data through a limited enterprise firewall.

Legacy solutions required expensive, proprietary hardware routers. Hedgehog’s software-defined gateway turns standard x86 servers into high-performance routers. This effectively brings the functionality of a public cloud "Transit Gateway" on-premise, allowing secure, multi-tenant segmentation for AI workloads.

The TechArena Take

Hedgehog is redefining the role of the network in the AI stack. By focusing on a hardware-agnostic control plane, they are ensuring that the "Brain" of the network (the automation) is distinct from the "Body" (the switch).

This is the architecture of the future. It gives enterprises the ultimate luxury: Choice. It allows IT leaders to select the best hardware for their specific workload—optimizing for cost, performance, or supply chain availability—while maintaining a consistent, automated operating experience across the entire fleet.

For organizations that view data as their competitive moat, this ability to unify diverse infrastructure under one automated standard is the key to scaling AI.

Particle Physics Powers Quantum Computing’s Future at Fermilab

In the race for advancing technology, time, funding, and attention are often dedicated to immediately monetizable applications. While industry roadmaps certainly drive technological advancement, basic science, which forwards our fundamental understanding of the universe, can create breakthrough findings with wide-reaching applications and effects.

My recent conversation with Silvia Zorzetti from Fermilab and Solidigm’s Jeniece Wnorowski revealed how such research into the convergence of high-energy physics and quantum technology is creating outstanding developments for quantum computing.

From Colliders to Qubits

As a U.S. particle accelerator laboratory, Fermilab has spent decades perfecting superconducting cavities that accelerate particle beams to near light speed. Through years of study, researchers at the lab have identified several sources of noise that can make these superconducting cavities less efficient, and they have worked to eliminate those sources of loss.

In 2017, researchers began studying these same cavities at the quantum level. At this single-photon level, the energy is much lower, which means there are new potential sources of loss compared to the higher energy levels. “We can focus on the basic science and the basic understanding of those mechanisms,” Silvia explained.

At the same time, Fermilab is finding ways to adapt superconducting cavities for efficient quantum computing. By placing qubits inside the cavities, Fermilab has achieved 20 milliseconds of coherence. That coherence time represents a critical advance, given how rapidly quantum information typically decays. “And we know that it is possible to achieve more coherence,” Silvia said.

Quantum Computing’s Sweet Spot

Quantum computers won’t replace classical computers for all tasks. The technology excels at specific problems, with well-known examples being Shor’s algorithm for factoring integers into primes and Grover’s algorithm for unstructured search. As the threats to digital security grow, Shor’s algorithm is of extreme interest because it can efficiently factor large numbers into their prime factors, a problem whose difficulty is very important for cryptography. “This means that we can have systems that are very secure because they cannot be broken by someone else with a more advanced cryptography system, let’s say an alien, because here on Earth, we all have to comply with the quantum mechanics rule,” she explained.
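
To see why factoring lends itself to a quantum speedup, it helps to look at the classical scaffolding around Shor's algorithm: factoring reduces to finding the order of a number modulo n, and order finding is the step a quantum computer accelerates. The sketch below does that step by brute force, so it only works for tiny numbers; the function names are my own and the example is illustrative, not a quantum implementation.

```python
from math import gcd


def find_order(a: int, n: int) -> int:
    """Smallest r > 0 with a**r % n == 1, found by brute force.

    Requires gcd(a, n) == 1. This is the step a quantum computer
    would speed up; the classical loop only suits very small n.
    """
    r, value = 1, a % n
    while value != 1:
        value = (value * a) % n
        r += 1
    return r


def factor_via_order(n: int, a: int):
    """Try to split n using base a; return (p, q) or None if a is unlucky."""
    g = gcd(a, n)
    if g != 1:
        return g, n // g              # a already shares a factor with n
    r = find_order(a, n)
    if r % 2 == 1:
        return None                   # odd order: try another base
    x = pow(a, r // 2, n)             # a square root of 1 modulo n
    if x == n - 1:
        return None                   # trivial root: try another base
    p = gcd(x - 1, n)                 # nontrivial factor of n
    return p, n // p


print(factor_via_order(15, 7))        # (3, 5)
```

For n = 15 and base 7 the order is 4, so x = 7² mod 15 = 4 and gcd(4 − 1, 15) = 3, which yields the factors 3 and 5. Some bases fail (base 14 does for n = 15), in which case the procedure is simply retried with another base.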

Beyond these practical applications, quantum computing excels for quantum simulations for field theories, which involve many interacting components. “We have these huge many body problems in which there are so many entities. We know how to describe them if they are alone,” Silvia explained. “But then when they start to interact to each other, it becomes a very complex model. So, we know that quantum computing is very good for those kinds of applications.”

Roadmap Challenges

While quantum computing has seen great progress, challenges remain to realizing its full potential. Silvia identified two types of hurdles to be overcome: “engineering problems,” where technical challenges are understood and need to be addressed, and problems where more fundamental research is required.

“Those are mainly between the interconnects,” she said. “The interconnects are needed for scaling quantum computing…and in particular, interconnects with very low losses.” Quantum information is a weak signal, so research to find solutions that will prevent the loss of this information remains a priority.

Quantum error correction presents another major hurdle. Quantum states are inherently fragile, and errors can arise in quantum information due to decoherence and other issues. The community is developing algorithmic techniques that make computations more robust to noise, while simultaneously working on hardware improvements.

The Promise of Quantum Sensing

The environmental sensitivity that creates problems for quantum computing is actually an advantage for quantum sensing, which is already delivering results. Fermilab is leveraging this sensitivity to detect dark matter, dark photons, and axions. It is also studying radiation, such as gamma rays, to better understand how it affects quantum computers and how quantum computers could be built to be robust to radiation.

The TechArena Take

Fermilab’s approach to quantum computing demonstrates how domain expertise from one field can catalyze breakthrough innovations in another. By leveraging decades of superconducting cavity development for particle accelerators, the laboratory has achieved amazing quantum coherence times. The pursuit of fundamental knowledge about how materials behave at the quantum level has yielded practical breakthroughs that are now accelerating the entire field forward.

For those interested in learning more, Fermilab hosts regular symposiums and outreach programs, including their Quantum 101 track designed for non-experts. Learn more about Fermilab’s quantum research at fnal.gov.

How Rose-Hulman Modernized IT to Focus on Education Outcomes

In higher education, information technology infrastructure often operates behind the scenes, quietly enabling learning without drawing attention to itself. For Rose-Hulman Institute of Technology, that philosophy recently drove a significant infrastructure transformation. The goal was straightforward: remove barriers so faculty and students can focus on research, teaching, and learning rather than wrestling with technology limitations.

During my recent TechArena Data Insights episode with Solidigm’s Jeniece Wnorowski and Justin Baker, systems administrator lead at Rose-Hulman, Justin shared how the institution modernized their infrastructure. The results demonstrate how strategic infrastructure investments can dramatically improve operational efficiency while directly supporting educational outcomes.

The Challenge: When Infrastructure Becomes the Bottleneck

Before its latest upgrade, Rose-Hulman’s infrastructure challenged system administrators in a variety of ways. Older, disparate systems that had been pieced together slowed down every kind of maintenance, from bringing systems back up after an outage to rolling out new software on demand.

For a small IT team managing everything from student information systems to enterprise resource planning platforms and Microsoft 365 administration, these delays were a serious hindrance. The team needed infrastructure that would let them respond rapidly to emerging needs rather than constantly fighting the limitations of aging hardware.

“Upgrading made the most sense in terms of being able to get that speed and that ease of use…and making fewer points of failure,” Justin explained.

Doing More with Less Through Strategic Partnerships and Modernization

Rose-Hulman’s decision to upgrade by partnering with DataON and incorporating Solidigm solid-state drives (SSDs) as the storage foundation centered on technical compatibility. As a Microsoft shop running primarily Windows servers, Rose-Hulman saw DataON’s close collaboration with Microsoft as a perfect fit. In addition, DataON’s hardware expertise ensured the new infrastructure would support Rose-Hulman’s critical administrative and educational systems.

The performance improvements following the infrastructure upgrade were substantial. Scheduled maintenance windows that previously consumed six to eight hours are now completed in under three hours. Server deployment timelines have been compressed from up to two hours to 10 to 15 minutes. The team no longer needs to wait for “after hours” time blocks to do maintenance or fine-tuning, and has time to address critical institutional systems.
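
To put those figures in proportion, the short calculation below converts them into percentage reductions. It uses only the numbers quoted above, treating "under three hours" as exactly three (so the maintenance figures are lower bounds) and "up to two hours" as a full 120 minutes (so the deployment figures are a best case).

```python
def reduction(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100


# Maintenance windows: six to eight hours down to (under) three hours.
maintenance = [reduction(hours, 3) for hours in (6, 8)]

# Server deployment: up to 120 minutes down to 15 or 10 minutes.
deployment = [reduction(120, minutes) for minutes in (15, 10)]

print([round(x, 1) for x in maintenance])   # [50.0, 62.5]
print([round(x, 1) for x in deployment])    # [87.5, 91.7]
```

In other words, maintenance windows shrank by at least half, and the slowest deployments now take roughly a tenth of the time they once did.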

“We’re able to run more with less,” Justin explained. “So we can focus on the types of things that allow us to add reliability or backup or something like that to our environment versus having to front-load most of the infrastructure for it just to run everything.”

Piloting a New Model for Access to Engineering Applications

Beyond upgrading core infrastructure, Rose-Hulman is exploring how Azure Local paired with Azure Virtual Desktop (AVD) and NVIDIA L4 graphics processing units (GPUs) can transform software delivery for students. The pilot deployment runs demanding engineering applications through virtual desktop infrastructure, eliminating the traditional constraint of needing powerful local hardware.

This approach addresses a longstanding challenge in engineering education: ensuring every student can access resource-intensive applications regardless of the device they own. By centralizing compute resources and delivering applications virtually, Rose-Hulman can provide consistent performance and eliminate student concerns around having the right high-performance device, or needing to make time to get to a lab to complete coursework.

The TechArena Take

Rose-Hulman’s infrastructure transformation illustrates how strategic technology investments can directly support educational missions in higher education. By partnering with vendors who understand their technology ecosystem and deploying high-performance storage solutions, the institution is achieving measurable operational improvements that cascade into better student experiences. For educational institutions managing tight budgets and small IT teams, efficiency gains translate directly into capacity for innovation and improved service delivery.

As Rose-Hulman continues expanding their Azure Local deployment and virtual desktop capabilities, they’re positioned to offer students greater flexibility and access while maintaining the high-performance infrastructure that engineering education demands. This balance between operational efficiency and educational excellence reflects the thoughtful approach required when infrastructure decisions directly impact student success. Learn more about Rose-Hulman Institute of Technology at www.rose-hulman.edu.

The New Economics of AI Inference with Runpod

As organizations deploy AI models at scale, a new set of challenges has emerged around operational efficiency, developer velocity, and infrastructure optimization. A recent conversation with Solidigm’s Jeniece Wnorowski and Brennen Smith, head of engineering at Runpod, revealed how cloud platforms are rethinking the entire AI stack to help developers move from concept to production in minutes rather than months.

The Economics of AI Inference

Runpod operates 32 data centers globally, providing graphics processing unit (GPU)-dense compute infrastructure for small companies and enterprises building and deploying AI systems. This service is crucial considering the economics of modern GPUs, where a single system with 8 GPUs can cost hundreds of thousands of dollars. Runpod understands that the compute hardware is only part of the equation. “Storage and networking…glue these systems together,” Brennen said. “By ensuring that there’s high quality storage paired up with these GPUs…we have been able to show that this results in a markedly better experience.”

On top of this, the company provides a sophisticated software stack that allows developers to go from their idea to production in minutes, across training and inference use cases. The goal is to “make it so developers and AI researchers can focus on what they do best, which is actually delivering value to their customers,” Brennen said.

The ability to rely on optimized infrastructure is becoming even more important as organizations move from training to deployment. Brennen likened training infrastructure to traditional business capital expenditures, noting that the high up-front costs see a return on investment over a long period of time. In inferencing, organizations deal with ongoing operational realities, grappling with scaling, efficiency, and delivering value to customers daily. As a result, Runpod has engineers specifically looking at inference optimization. With the rise of AI factories, “How well these systems are run from an operational excellence perspective will dictate the winners and losers,” Brennen said. “You run an inefficient factory, you’re out.”

Storage as the Innovation Accelerator

One of the most important insights from our conversation addressed storage, which is now seen as a hidden bottleneck in AI. Brennen recounted how his engineering team recently investigated Docker image loading times. Although the issue is not tied to any specific large language model (LLM) activity, developers flag slow loading times as hurting their overall workflow. Delays like these get in the way of things needing “to magically just work.”

For the solution, Brennen reiterated that storage is what glues the system together. “What we have found is every time, as long as we are optimizing our storage, we are able to make the data move faster,” he said. And when data movement is optimized, entire development cycles accelerate.

Runpod recently launched ModelStore, a feature in public beta that leverages NVMe storage and global distribution to make AI models seem to appear “like magic.” What previously took minutes or hours now happens seamlessly, compressing development iteration cycles. For organizations under pressure to deliver AI capabilities quickly, these time savings compound into significant competitive advantages.

Brennen emphasized that faster developer cycles enable teams to fail fast and iterate more effectively to deliver successful outcomes. When CTOs receive mandates to implement AI, their success depends on giving teams tools that accelerate innovation rather than creating additional friction.

The Convergence of Infrastructure and Code

Looking ahead, Brennen identified the convergence of infrastructure and software as a transformative trend. The goal is to enable code to self-declare and automatically establish the infrastructure required to run it, freeing developers from thinking about infrastructure so they can focus on their code and creating value aligned to business logic. “Anything we can do to make it even easier to get global distribution, that’s a hugely powerful paradigm,” he said.
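To make the idea of self-declaring code concrete, here is a minimal sketch in Python. It is not Runpod's API: the `requires` decorator, the `Requirements` fields, the region names, and the `plan` function are all hypothetical names chosen for illustration. The sketch only shows the general pattern, in which a function carries its own resource declaration and a platform could read those declarations to decide what to provision.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Requirements:
    """Hypothetical resource declaration attached to a function."""
    gpus: int = 0
    memory_gb: int = 1
    regions: Tuple[str, ...] = ("us-east",)  # illustrative region names


# Registry mapping function names to their declared requirements.
_REGISTRY: Dict[str, Requirements] = {}


def requires(**kwargs) -> Callable:
    """Decorator that records what infrastructure a function says it needs."""
    spec = Requirements(**kwargs)

    def wrap(fn: Callable) -> Callable:
        _REGISTRY[fn.__name__] = spec
        fn.requirements = spec  # keep the declaration next to the code
        return fn

    return wrap


def plan() -> Dict[str, dict]:
    """Turn the declarations into a plain provisioning plan."""
    return {
        name: {
            "gpus": req.gpus,
            "memory_gb": req.memory_gb,
            "regions": list(req.regions),
        }
        for name, req in _REGISTRY.items()
    }


@requires(gpus=1, memory_gb=24, regions=("us-east", "eu-west"))
def generate(prompt: str) -> str:
    return prompt.upper()  # stand-in for real model inference


if __name__ == "__main__":
    print(plan())
```

In this pattern the developer writes only business logic plus a one-line declaration, and everything about where and how the code runs is derived from that declaration.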

The TechArena Take

Runpod’s emphasis on developer experience demonstrates that sustainable AI deployment requires thinking holistically about the entire infrastructure stack. The company’s focus on making complex infrastructure feel magical to developers reflects a broader industry recognition that reducing friction accelerates innovation.

As AI moves from experimentation to production deployment, organizations that optimize for developer velocity and operational efficiency will gain a significant advantage by reaching time to value faster. For organizations evaluating AI infrastructure partners, Runpod’s approach offers a model that balances performance, scalability, and ease of use.

Connect with Brennen Smith on LinkedIn to continue the conversation, or visit Runpod’s website and active Discord community to explore how their platform might support your AI initiatives.

Quantum Computing Comes of Age: How Fermilab Is Shaping the Future

Fermilab’s Silvia Zorzetti explains how quantum computing and sensing are evolving, where they outperform classical systems, and what’s next for the field.


How Hedgehog Brings Hyperscaler Networking to Enterprise AI

Hedgehog CEO Marc Austin joins Data Insights to break down open-source, automated networking for AI clusters—cutting cost, avoiding lock-in, and keeping GPUs fed from training to inference.


How Rose-Hulman Modernized Its Data Center for Student Success

Rose-Hulman Institute of Technology shares how Azure Local, AVD, and GPU-powered infrastructure are transforming IT operations and enabling device-agnostic access to high-performance engineering software.

How PEAK:AiO Turns Commodity Servers Into AI Rocket Ships

By delivering performance with one-sixth the hardware footprint of competitors, the software-defined storage startup aims to make AI experimentation affordable at scale.

Organizations building out AI infrastructure have rapidly matured from struggling to understand GPU requirements to demanding scalable, cost-effective solutions that can grow with their ambitions. At the recent OCP Global Summit, I spoke with Roger Cummings, CEO of PEAK:AiO, and Solidigm’s Jeniece Wnorowski about how one company is tackling the infrastructure challenges that emerge as AI moves into production-scale deployments.

Performance Without Infrastructure Sprawl

PEAK:AiO’s original breakthrough was software-defined AI storage that transforms commodity servers into high-performance infrastructure. “Our secret sauce very early was getting line-speed performance on a single server,” Roger said. Delivering that throughput from one machine maximizes performance in the smallest possible footprint. This approach turns an ordinary server “into a rocket ship for AI” and helps organizations avoid massive deployments that consume excessive power, cooling, and rack space.

The efficiency gains are dramatic. Roger explained that competitors typically require 12 to 15 nodes to match the performance PEAK:AiO delivers with just one-sixth the infrastructure. Enabled by its close partnership with Solidigm, this density advantage translates directly into lower operational costs for power, thermal management, and physical space—critical factors as data centers face growing energy constraints.

Open-Source File System for Scale

Single-server performance solved the first wave of challenges, but today’s expanding AI applications demand the ability to scale across distributed file systems. The market is littered with proprietary solutions, so PEAK:AiO took a different path: in collaboration with Los Alamos National Laboratory, it developed an open-source parallel network file system (pNFS) built specifically for AI workloads.

Going open source aligns with industry standards and customer demands for simplicity and flexibility. Roger emphasized that the new pNFS solution “will match the performance of storage as well as the scale of the file system that people need today.” The company uses a modular framework that automatically recognizes new nodes as they’re added, delivering linear scaling for both capacity and performance. This architecture dramatically lowers the cost of experimentation and failure—an essential consideration for teams exploring new AI use cases. As Roger put it, “It doesn’t have to be cost-prohibitive to take risks and build innovation.”

From Centralized Training to Distributed Intelligence

PEAK:AiO’s value proposition has evolved along with the AI market itself. While large-scale training clusters once dominated the conversation, inference workloads—both in centralized facilities and at the edge—are now the primary growth driver. The company’s high-performance, scalable platform is ideally suited for both.

Roger also highlighted rising interest in federated learning, where intelligence is captured as close as possible to the data source before being rolled up into master models. PEAK:AiO’s infrastructure naturally supports these distributed architectures by enabling fast data capture and processing wherever the data is generated.

Building Success Through Workload Intelligence

Looking ahead, Roger said, “We need less infrastructure and more success—and I think we’re a great partner to achieve that.” Future innovations from PEAK:AiO, developed in partnership with Solidigm, will create richer memory and storage tiers with deeper intelligence about AI workload patterns. This will allow automated, policy-driven movement of workloads to the optimal tier, further improving both performance and cost efficiency.

TechArena Take

PEAK:AiO’s trajectory shows how infrastructure providers are evolving to meet AI’s real-world scaling challenges. Its focus on extreme efficiency, modular open-source architectures, and workload-aware optimization directly addresses the constraints of power, space, and budget while delivering the performance AI demands. As deployments shift from centralized training to distributed inference and federated learning, solutions that combine density with operational simplicity will become increasingly indispensable.

Learn more about PEAK:AiO’s infrastructure solutions at https://peak-aio.com or connect with the team to explore how their open-source approach can accelerate your AI initiatives.

RackRenew Turns Decommissioned OCP Hardware into Ready Capacity

During #OCPSummit25, Jeniece Wnorowski of Solidigm and I caught up with Jelle Slenters of RackRenew on how the firm converts retired OCP-compliant racks, servers, switches, power shelves and more into validated rack-level systems, complete with provenance, burn-in, and a joint certification label.

Why Remanufacturing is Having a Moment

The cloud era taught us to think in fleets. The AI era is forcing us to think in megawatts. In between those realities sits an enormous pool of high-quality, standards-based gear that ages out of hyperscale production far faster than it ages out of usefulness. RackRenew’s thesis is simple: if we standardize the processes for take-back, test, refurbish, and certify—at scale—we can turn retired systems into ready-to-run capacity for the next wave of adopters.

That’s not a niche. As Jelle put it, the total addressable opportunity is in the “hundreds of millions,” and if we truly nail the collaboration across the industry, the upside is “beyond calculation.” The value comes from process: documented, repeatable, and reliable.

Standards are the Scaffolding

OCP is the right place to be talking about this because standards reduce entropy. Common form factors, power and management specs, and known failure modes mean you can design remanufacturing flows that aren’t bespoke for every asset. When you remove variance, you remove cost and time. When you add shared protocols, you add trust.

Trust is the Currency

Enter the OEMs and platform providers. The opportunity is to co-design take-back and recert flows for entire OCP building blocks—racks, servers, switches, power shelves, and harnessing—not just components. That means shared diagnostics, firmware baselines, power/thermal tests, and a joint certification label that signals: remanufactured, validated, and backed by a warranty. That’s what moves circular gear from “nice idea” to procurement-approved infrastructure.

Where the Demand Shows Up First

Two near-term landing zones stood out in our conversation:

Fast-growing operators: Clouds, co-locators, and service providers that need to stand up capacity quickly—full racks, network fabrics, and power shelves—without long lead times or greenfield budgets.

GPU inference and bare-metal providers: Standing up GPU clusters isn’t just GPUs; it’s the rack, the fabric, the power path. Dropping in certified OCP rack-level assemblies lets teams scale faster while keeping performance envelopes predictable.

If we get the ecosystem right, customers get predictable outcomes. And predictable outcomes are the only way circularity shows up in the production SOW.

Circularity With a P&L

I love a good sustainability story, but the reason this matters goes beyond sustainability into economics. In an era of equipment scarcity and grid constraints, circular supply unlocks capacity faster and cheaper. That means shorter time to deploy, lower embodied carbon, and better capex efficiency. And because OCP standards reduce integration costs, the savings aren’t swamped by engineering overhead.

The Collaboration Blueprint

Jelle’s outline for how this scales:

Co-develop take-back protocols with OEMs and hyperscalers.

Align on software tools, test plans, and recertification criteria.

Ship with a joint label and warranty that enterprise buyers trust.

Route eligible material into remanufacturing streams and place it directly into new, standards-compliant builds.

Do that across storage, compute, networking, and power, and the “hundreds of millions” TAM looks conservative.

Quality and Reliability

Reliability comes from grading and process: rack-level power and thermal validation, network link integrity checks, server health screening, and repeatable test plans. The result is predictable service-level objectives and clear workload matching—without over-indexing on any single subsystem.

TechArena Take

Circularity wins when it delivers new-grade outcomes. The end user shouldn’t have to adjust workloads or expectations because a rack is remanufactured. OCP standards make that possible at scale. The next step is trust infrastructure—joint labels, shared test artifacts, and warranties. RackRenew’s rack-level, certified approach makes circular capacity a practical default for new deployments, unlocking savings, faster turn-ups, and lower embodied carbon.

Learn more about RackRenew at their website.



Giga Computing: Rack-Scale, Regional, and Ready for Inference

I recently caught up with Giga Computing’s Chen Lee about what’s really changing inside AI data centers. Spoiler: it’s not just bigger racks and faster GPUs. It’s how those racks get built, where they’re assembled, and how we plan for a world where inference becomes the dominant workload pattern.

One of the threads in our discussion was Giga Computing’s push on circularity—reclaiming, sorting, and returning components to a second life. Chen was candid: yes, circular practices eliminate waste and can reduce some costs, but they’re not a pure economic play. There’s real human work in sorting and qualification. The point isn’t cost-first; it’s responsibility-first—answering “the call for Earth,” as he put it. That framing matters. At OCP, sustainability isn’t a backdrop; it’s a design constraint. And circularity is moving from “nice to have” to “show me your plan.”

Why Inference Changes the Rack

Training may steal the headlines—and budgets—but inference is the business. Chen’s view is that the next expansion wave will be dominated by inference-centric racks that look and behave differently: more elastically scaled, more network-sensitive, and more tightly integrated with edge and enterprise fabrics. That opens new addressable markets across sectors—healthcare, finance, oil and gas, education, government—each with unique latency, privacy, and cost envelopes. If training is a few giant mountains, inference is a mountain range: broader, more varied, and much closer to the users who depend on it.

Modularity is the Multiplier

Giga Computing’s emphasis is modularity: building blocks that let operators configure for today’s sprint and tomorrow’s pivot. In practice, modularity shortens time-to-capacity, smooths upgrades, and lowers the operational blast radius when something changes—like a new model family, memory footprint, or accelerator ratio. The companies winning rack scale are the ones who treat integration, test, and validation as first-class products—not just a step between BOM and shipment.

Proximity is Performance: Why Local Build Matters

A second pillar: U.S.-based assembly. Giga Computing is standing up local capability to improve lead times, reduce shipping risk, and cut the carbon that comes with long logistics chains. There’s also a quality and reliability angle that’s easy to overlook: racks that don’t spend weeks getting rattled across oceans arrive with fewer transport-induced gremlins, making burn-in and final test more predictive. Chen flagged semi-knocked-down (SKD) approaches that enable “Assembled in America” servers, another lever for responsiveness when demand spikes and when regulatory or customer requirements insist on a regional footprint. In a supply chain still healing from 2020–2022 shocks, proximity is performance.

The Revolution—Tempered with Responsibility

Chen was direct about the macro arc: AI’s disruption dwarfs the industrial revolution. That’s not hyperbole on the OCP show floor; it’s the operating assumption for everyone building AI factories. But the second clause matters: be careful about the road we go down. For vendors, that means shipping capacity with guardrails—power-aware, grid-friendly, and supportable. For operators, it means designing for efficiency, circularity, and people—because the most constrained resource in this market might be expert hands, not megawatts.

It’s clear that Giga Computing is leaning into OCP’s core ethos—openness, efficiency, and scalability—while aligning to the buyer reality of 2025: more inference, faster turns, and a stricter carbon ledger. The choice to invest in U.S. assembly is a clear signal. The focus on modular rack-scale integration is another. And the honesty about circularity’s costs—and its necessity—reads as maturity, not marketing.

But the larger takeaway is ecosystem-level: OCP’s center of gravity is shifting from “can we build it?” to “can we scale it responsibly, locally, and profitably?” As AI diffuses into every sector, vendors that align on all three tend to be better positioned to keep shipping through the next wave.

If you want to learn more, Chen points to Giga Computing’s site and his LinkedIn.

TechArena Take

The ground is shifting under data center infrastructure—away from one-off training builds toward the day-to-day realities of inference at scale, tighter lead times, and verifiable sustainability. In that context, Giga Computing is getting more sophisticated in addressing enterprise requirements—focusing on modular rack integration, treating circularity as an engineering process, and leaning into regional assembly for predictability and QA.


Castrol Offers a Fluid Fix for AI’s Overheating Data Centers

The data center industry’s cooling crisis has a surprising champion: a century-old British lubricants manufacturer better known for what’s under the hood of your car than what’s inside your server rack. My recent conversation with Darren Burgess, PhD, Business Development Director for Castrol, and Solidigm’s Jeniece Wnorowski revealed how the company is leveraging decades of fluid expertise to address one of AI infrastructure’s most pressing challenges.

The Physics of AI Demand Liquid Cooling

The transition to liquid cooling isn’t a preference but a necessity driven by the thermal characteristics of modern AI chipsets. As Darren explained, “The chipsets are just too hot to be cooled by air. Basically, if you want to try to do air cooling, the servers would need to be so separated you’d probably only get a couple of servers per rack in order to rush enough air through.” Given the premium on data center real estate and power, that approach obviously doesn’t scale.

Direct-to-chip cooling addresses this constraint by placing cold plates (essentially heat exchangers) directly against the chipsets generating the most heat. A propylene glycol solution circulates through these cold plates, drawing heat away from the processors and enabling the server densification that AI deployments demand. This approach allows operators to maximize their infrastructure investment while maintaining the vertical rack configurations they’re already familiar with.

The Chemistry Behind the Cooling

What appears straightforward on the surface reveals significant complexity once you examine the chemistry involved. The coolant used is PG25, which contains 25% propylene glycol and 75% water. This immediately raises concerns about corrosion in systems featuring copper cold plates, iron components, and various other metals throughout the cooling distribution infrastructure.
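As a quick worked example of the PG25 ratio, the sketch below splits a coolant loop's fill volume into its glycol and water parts. It assumes the 25/75 split is by volume, which the conversation did not specify, and the 200-liter loop size is purely illustrative. The function name `pg_mix` is hypothetical.

```python
from typing import Tuple


def pg_mix(total_liters: float, glycol_fraction: float = 0.25) -> Tuple[float, float]:
    """Split a coolant volume into (propylene glycol, water) liters.

    Assumes the fraction is by volume; 0.25 corresponds to PG25.
    """
    if total_liters < 0:
        raise ValueError("total_liters must be non-negative")
    if not 0.0 <= glycol_fraction <= 1.0:
        raise ValueError("glycol_fraction must be between 0 and 1")
    glycol = total_liters * glycol_fraction
    water = total_liters - glycol
    return glycol, water


if __name__ == "__main__":
    glycol, water = pg_mix(200.0)  # illustrative 200 L loop
    print(f"glycol: {glycol} L, water: {water} L")  # glycol: 50.0 L, water: 150.0 L
```

With three quarters of the loop being water, it is easy to see why corrosion protection becomes the central concern.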

This is where Castrol’s differentiation emerges. The critical factor isn’t the base fluid itself. “The additive pack is sort of the ‘magic dust,’ which is really where the action is,” Darren explained. It protects against corrosion while maintaining cooling efficiency. If corrosion occurs, particulates can clog the narrow channels within cold plates, some measuring as small as 50 to 100 microns. The result would be degraded cooling performance or complete system failure, forcing expensive downtime to remedy the problem.

Castrol’s expertise lies in formulating additive packages that prevent these failure modes, drawing on the same chemical engineering knowledge they’ve applied to automotive lubricants for decades. The qualification process requires demonstrating that cooling fluids won’t corrode system components, an essential step before any deployment.

The Single Point of Failure

Darren offered a compelling analogy to illustrate the cooling fluid’s critical role in data center infrastructure. While operators can deploy redundant cooling distribution units, backup pumps, and duplicate systems throughout the infrastructure stack, the cooling fluid itself represents a single point of failure. “It’s really like the blood,” he said. “We have two eyes, hands, and feet. We can get away with a lot of repeat, but we only have one system of blood. You really have to take care of it or you’ll shut the body down, or the data center down.”

This reality positions fluid health and maintenance as crucial operational concerns. Castrol’s value proposition extends beyond simply supplying cooling liquids to encompassing the entire lifecycle: proper installation to avoid contamination, ongoing maintenance to preserve fluid integrity, and eventual disposal when systems are decommissioned.

Rapid Market Evolution

The liquid cooling market’s maturation has accelerated dramatically over the past year. Hyperscalers including Amazon Web Services, Microsoft, and Google have moved beyond pilot programs to large-scale deployments of direct-to-chip cooling. The conversation has shifted from questions about equipment availability to detailed technical discussions about additive packages, compatibility testing, and failure mode analysis.

This evolution reflects the industry’s growing confidence in liquid cooling as a proven technology rather than an experimental approach. Organizations are now focused on operational details: understanding how fluids interact with different materials, identifying potential points of failure before they occur in production, and standardizing deployment practices across the industry.

The TechArena Take

Castrol’s pivot from automotive lubricants to data center cooling solutions demonstrates how adjacent industry expertise can address emerging infrastructure challenges. Their experience formulating fluids for demanding thermal environments translates directly to the requirements of AI infrastructure.

As liquid cooling transitions from niche technology to mainstream deployment, organizations that partner with suppliers offering comprehensive lifecycle management, from installation through maintenance to disposal, will be better positioned to avoid costly downtime and operational disruptions. The cooling fluid may be invisible to end users, but it’s becoming as foundational to AI infrastructure as the silicon it protects.

For more information about Castrol’s data center cooling solutions, visit https://www.castrol.com/en/global/corporate/products/data-centre-and-it-cooling.html.

Valvoline Global Operations Engineers Data Center Liquid Cooling

For more than 150 years, Valvoline has been synonymous with high-performance motor oil and racing heritage. Now, the company is applying its expertise to a very different kind of performance challenge: keeping AI data centers cool as they transition from the megawatt era into the gigawatt era.

In a recent conversation with Michael Morrison, director of new ventures at Valvoline Global Operations, and Solidigm’s Jeniece Wnorowski, I discussed how data center cooling represents a natural evolution for a company built on managing heat and performance. As Michael explained, Valvoline has actually maintained a data center presence for years, providing oils for backup power generation systems. The move into cooling solutions represents a deeper engagement with an industry facing unprecedented thermal challenges.

Two Liquid Cooling Solutions to Meet the Density Challenge

While rising temperatures grab headlines, Michael emphasized that density is the real challenge facing modern data centers. AI-enhanced workloads require packing more chips into the same physical space, creating concentrated heat loads that traditional air cooling cannot effectively manage. Liquid cooling enables increased density of chips per server and of servers per data center, fundamentally changing the economics of AI infrastructure deployment.

Two approaches to liquid cooling have arisen in response to this challenge: direct-to-chip cooling and immersion cooling. Direct-to-chip cooling runs coolant through lines and cold plates to cool individual processors, and it has already moved beyond early adoption into rapid growth. Major manufacturers now support this approach, and deployments are underway.

Immersion cooling, however, remains in earlier stages. In immersion cooling applications, entire servers are submerged in tanks filled with dielectric fluid, which allows heat to be captured from all components simultaneously. The approach also represents a large operational change for hyperscalers, which explains why it is still largely in the proof-of-concept phase.

“They’re not used to having large open tanks sitting in their data centers,” Michael said. “So, they’re not only testing performance metrics, but understanding, ‘what is my maintenance on a server like?’ All of those things have to have operational procedure set: all the nuances of running it in a normal setting and an emergency environment.”

Precision Testing and Partnership: Engineering Compatibility at Every Level

The key to immersion cooling is dielectric oils. Dielectric materials, like the oils Valvoline produces, do not conduct electric current. Finding a fluid with the ideal properties to enable high performance is where Valvoline Global shines.

“We’re used to testing properties in our fluids that would determine, does it conduct electricity? Does it transfer heat?” Michael said.

While Valvoline Global’s fluid testing capabilities form the foundation, Michael emphasized that deploying these solutions successfully requires a more comprehensive approach. When servers are immersed in dielectric oil, compatibility becomes critical across thousands of individual components. Valvoline Global works closely with data center operators to ensure their fluids are compatible with specific hardware configurations, tank materials, and operational requirements. This collaborative approach, which the company has refined over more than 150 years of customer relationships, distinguishes their market strategy from simple product provision.

Sustainability Through Operational Efficiency

Beyond performance, liquid cooling addresses sustainability concerns that are becoming critical for data center operators. By reducing or eliminating large HVAC systems required for air cooling, facilities can significantly decrease power consumption and operational expenses. Water usage can also be reduced depending on system configuration. Michael noted that liquid cooling can create a scenario where improved cost structure and reduced environmental impact work together rather than act as competing priorities.

The TechArena Take

Valvoline Global’s entry into data center cooling represents more than a company diversifying its product portfolio. It reflects how foundational technologies from established industries are being reimagined to solve the infrastructure challenges of AI deployment. As data centers grapple with the thermal and density challenges of AI-enabled workloads, Valvoline Global’s combination of fluid science expertise, collaborative approach, and long history of managing high-performance applications positions them as a meaningful player in this infrastructure evolution. For organizations planning liquid cooling deployments, the lesson is clear: success depends not just on the technology itself but on the partnerships and compatibility testing that ensure reliable, long-term operation.

Learn more about Valvoline Global’s data center cooling solutions at their website, valvoline.com, where they provide detailed technical resources on liquid cooling technologies. Connect with Michael Morrison on LinkedIn to continue the conversation about thermal management innovation.

Equinix on Architecting the AI-Ready Data Center

Inside Equinix and Solidigm’s playbook for turning data centers into adaptive, AI-ready platforms that balance sovereignty, performance, efficiency, and sustainability across hybrid multicloud.


OCP 2025: Arm’s Chiplet Play Aims To Democratize AI Compute

During the recent OCP Summit in San Jose, Jeniece Wnorowski and I sat down with Eddie Ramirez, vice president of marketing at Arm, to unpack how the AI infrastructure ecosystem is evolving—from storage that computes to chiplets that finally speak a common language—and why that matters for anyone trying to stand up AI capacity without a hyperscaler’s deep pockets.

Two years ago at OCP Global, Arm introduced Arm Total Design—an ecosystem dedicated to making custom silicon development more accessible and collaborative. Fast-forward to this year’s conference, and the program has tripled in participants, with partners showing real products both in Arm’s booth and in the OCP Marketplace. That traction sets the backdrop for Arm’s bigger news: an elevated role on OCP’s Board of Directors and the contribution of its Foundational Chiplet System Architecture (FCSA) specification to the community.

Why should operators, builders, and CTOs care? Because the cost and complexity of building AI-tuned silicon is still brutal. Depending on the packaging approach—think advanced 3D stacks—Eddie put the total bill near a billion dollars. That number alone has kept bespoke designs out of reach for all but a few. The chiplet vision changes the calculus: assemble best-of-breed dies from different vendors rather than funding a monolith. But the promise only holds if those chiplets interoperate cleanly across more than just a physical link.

That’s the gap FCSA endeavors to fill. It goes beyond lane counts and bump maps to define how chiplets discover each other, boot together, secure the system, and manage the data flows between dies. If it works as intended inside OCP, we are an inch closer to a real chiplet marketplace—mix-and-match components with predictable integration, not months of bespoke glue logic.

Ecosystem is the keyword here, and not just for compute. Eddie spoke to collaborations across the platform, including within storage, as a case in point. Storage is stepping into the AI critical path, not simply holding training corpora but participating in the performance equation. AI at scale turns every subsystem into a performance domain. If data can be prepped, staged, filtered, or lightly processed closer to where it lives, you free up precious GPU cycles and avoid starving accelerators. Expect to see more of that thinking show up across NICs, DPUs, and smart memory tiers.

There’s also a geographic angle that’s difficult to ignore. Several of the newest Arm Total Design partners hail from Korea, Taiwan, and other regions actively cultivating their own semiconductor ecosystems. That matters for resilience and supply, but also for innovation velocity. When the entry ticket to custom silicon comes down, you get more specialized parts serving narrower, high-value slices of AI workloads—think tokenizer offload, retrieval augmentation helpers, or secure inference enclaves woven into the package fabric.

Underneath the product updates is a posture shift: lead with others. The Arm Total Design ecosystem is designed for co-design, not solo heroics, acknowledging that no one player can keep up with AI’s pace alone. OCP, with its bias toward open specs and reference designs that ship, is a natural forcing function. Putting FCSA into that process doesn’t just rack up community points; it pressures the spec to survive real-world scrutiny—power budgets, thermals, board constraints, and the ugly details that tend to eat elegant diagrams for breakfast.

If you’re operating AI clusters today, you’re already feeling the ripple effects. Racks are transitioning from steady-state power draw to spiky, sub-second pulses. Data movement is the enemy. The “box-first” era is fading into a rack- and campus-first design ethic where each layer—power delivery, cooling, storage, fabric, memory, compute—must flex in concert. Chiplets slot into that future because they can accelerate specialization at the silicon layer while OCP standardization tames integration higher up the stack.

What should you watch next? Three signals. First, real FCSA-based silicon or reference platforms that demonstrate multi-vendor die assemblies with clean boot and security flows. Second, storage and memory vendors showing measurable end-to-end gains on AI pipelines when compute nudges closer to data. Third, OCP Marketplace listings that move from reference intent to deployable inventory you can actually procure for pilot workloads.

If the last two years were about proving that chiplets are technically feasible, the next two will test whether they’re operationally adoptable. Specs are necessary; supply chains and service models are decisive. The teams that align those pieces—across vendors, geographies, and disciplines—will dictate how fast AI capacity gets cheaper, denser, and more power-aware.

TechArena Take

The AI build-out is colliding with real-world constraints—power, thermals, and capital. Ecosystems that compress time-to-specialization without exploding integration cost will win. Arm’s OCP board seat plus the FCSA contribution is a smart bet that interoperability is the bottleneck to unlock. If FCSA becomes the lingua franca for chiplets, operators could see a practical path to tailored silicon without a billion-dollar entry fee. Pair that with smarter storage and memory paths, and you start to chip away at the two killers of AI efficiency: idle accelerators and stranded data. The homework now is ruthless validation: put these pieces under AI-class loads, measure tokens per joule, and prove that “lead with others” doesn’t just sound good on stage—it pencils out in the data center.
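
To make the "tokens per joule" yardstick concrete, here is a minimal sketch of how it could be computed from a throughput measurement and an average power reading. The function name and the sample figures are hypothetical illustrations, not measurements from Arm, OCP, or any vendor.

```python
def tokens_per_joule(tokens_generated: int, avg_power_watts: float, duration_seconds: float) -> float:
    """Energy efficiency of an inference run: tokens produced per joule consumed."""
    if avg_power_watts <= 0 or duration_seconds <= 0:
        raise ValueError("power and duration must be positive")
    # One watt sustained for one second is one joule.
    energy_joules = avg_power_watts * duration_seconds
    return tokens_generated / energy_joules


# Hypothetical run: 1.2 million tokens in 60 s on a node averaging 8 kW.
efficiency = tokens_per_joule(1_200_000, 8_000.0, 60.0)
print(f"{efficiency:.2f} tokens/J")
```

Average power hides the sub-second spikes described earlier, so a real validation run would likely sample power at a finer interval and integrate it rather than multiply a single average by the duration.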

Physical AI in Production: DataraAI’s Data-Loop Edge Playbook

The next wave of AI won’t live in a data center—it will weld seams, pick bins, and navigate factories alongside people. Physical AI brings intelligence into machines that perceive, decide, and act at the edge, closing the loop between perception and action.

Real-world factories introduce drift, glare, vibration, dust, and unpredictable human behavior—conditions that most models never see in simulation.

For IT and cloud architects, this is a stack problem with hard requirements: real-time inference under adverse conditions, data pipelines that span OT and IT, and operational discipline that turns pilot demos into consistent, reliable uptime. The gap between ‘works in simulation’ and ‘works every day on the line’ remains the main blocker to scale.

But it’s also a workforce problem. Robots reduce injury risk in hazardous tasks, but displacement effects are real and localized. The question isn’t “robots: yes or no?” but “how do we deploy them responsibly with both operational rigor and a workforce plan?”

Why the Market is Ready 

Three curves are converging: cheaper edge compute and sensors, strong perception models, and maturing MLOps for robotics. The International Federation of Robotics reports 542,000 industrial robots installed in 2024—the fourth consecutive year above 500k—with global demand doubling over the past decade. Industrial AI spending is projected to grow from $44 billion in 2024 to $154 billion by 2030. 
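
As a quick sanity check on the spending projection, the implied compound annual growth rate can be worked out directly from the two figures quoted above. This is a back-of-the-envelope sketch; the function name is an illustrative assumption, and the only inputs are the $44 billion (2024) and $154 billion (2030) values from the text.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the paragraph above, in billions of dollars.
rate = cagr(44.0, 154.0, 2030 - 2024)
print(f"Implied annual growth: {rate:.1%}")
```

The projection works out to roughly 23% growth per year, sustained for six years.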

Standards are accelerating deployment. ROS 2/DDS, OPC Unified Architecture, and OCP’s rack-level guidance are pushing interoperability across sensors, controllers, and training infrastructure. The blockers are shifting from feasibility to integration discipline and change management: the question is no longer ‘Can we automate?’ but ‘Can we maintain reliability when conditions vary beyond 5–10% of training data?’

Industry-grade platforms now stitch together simulation, data pipelines, and robotics foundation models. NVIDIA’s Omniverse plus Isaac tools let teams generate synthetic data, train policies in digital twins, and validate behaviors before touching a live cell—shrinking iteration from months to days. The missing piece is capturing the tribal knowledge of veteran technicians and encoding it into recovery behaviors robots can execute.

The Architecture That’s Working

I spoke with Durgesh Srivastava, CTO of DataraAI, at the recent OCP Global Summit about what separates production systems from pilot theater. His outlook is pragmatic: target full automation for bounded task families, match human quality, and build graceful fallbacks when reality goes off-script. 

DataraAI provides a data engine for physical AI—a data-as-a-service platform that transforms factory experience into machine intelligence. It captures how technicians act, how robots fail, and how edge conditions drift, creating the data foundation real-world robotics has always lacked.

The company emphasizes three pillars: 

1. Egocentric Multi-Modal Data Capture – Robots learn from the same viewpoint they act in. DataraAI’s robot-mounted and wearable sensors record RGB-D vision, IMU, tactile, and audio data from real operations. This captures the nuanced cues—force patterns, drift, micro-failures—that static cameras miss.

2. AI-Driven Annotation – DataraAI’s engine automatically labels rare and high-impact events—fires, spills, breakdowns, human-robot handoffs—turning chaos into structured data. It consistently captures scenarios that traditional CV pipelines fail to label or detect.

3. Continuous Learning Loop – New anomalies are fed back into the data engine. Each cycle makes models more resilient and accurate in the field. Every exception becomes new training data, creating a self-improving loop tied directly to real operations.

Early industrial pilots using this loop showed a 53% accuracy lift and 67% better edge-case handling—clear evidence that real-world data closes the performance gap.

The winning pattern is consistent: push perception and control onto the robot, treat the cloud as training and update infrastructure, and run a disciplined data loop that captures real-world anomalies. In harsh conditions—glare, occlusion, spark bursts—edge models must keep the task running. Inference runs locally on the robot, keeping factory data on-site, reducing latency, and enabling real-time adaptation to drift and anomalies. The back end aggregates field data and pushes lightweight updates routinely. That loop turns demos into dependable production.
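
The pattern above can be sketched in a few lines: inference and the fallback decision stay on the robot, and only a compact record of each low-confidence event is queued for the back end. This is an illustrative sketch, not DataraAI's implementation; the class name, the confidence threshold, and the toy model are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class EdgeLoop:
    """Runs inference locally and queues anomalies for later upload (hypothetical design)."""
    infer: Callable[[dict], Tuple[str, float]]  # returns (action, confidence)
    confidence_floor: float = 0.8               # assumed threshold for triggering fallback
    upload_queue: List[dict] = field(default_factory=list)

    def step(self, frame: dict) -> str:
        action, confidence = self.infer(frame)
        if confidence < self.confidence_floor:
            # Keep the raw frame on-site; queue only a small anomaly record.
            self.upload_queue.append({"frame_id": frame["id"], "confidence": confidence})
            return "fallback"
        return action


def toy_model(frame: dict) -> Tuple[str, float]:
    """Stand-in for a perception model: heavy glare lowers confidence."""
    return ("weld", 0.95) if frame["glare"] < 0.5 else ("weld", 0.40)


loop = EdgeLoop(infer=toy_model)
results = [loop.step({"id": i, "glare": g}) for i, g in enumerate([0.1, 0.9, 0.2])]
print(results)                 # actions taken per frame
print(len(loop.upload_queue))  # anomalies waiting for the back end
```

The key design choice is that the robot never blocks on the cloud: a bad frame triggers the local fallback immediately, and the upload happens whenever connectivity and policy allow.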

High-fidelity simulation generates diverse synthetic data and lets teams rehearse rare events safely. Foundation models provide generalizable priors; reinforcement learning in sim refines task skills before transfer to the real world. Edge inference runs locally; telemetry goes upstream for labeling and augmentation; new policies return during maintenance windows. Daily updates are feasible when you structure the pipeline—this reduces drift and grows edge-case coverage.
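
One way to picture "new policies return during maintenance windows" is a simple gate that holds a staged policy until the cell is offline. The sketch below is hypothetical: the function names, the 02:00 to 04:00 window, and the version strings are assumptions chosen for illustration.

```python
from datetime import datetime, time
from typing import Optional


def in_maintenance_window(now: datetime, start: time, end: time) -> bool:
    """True if `now` falls inside the window; handles windows that cross midnight."""
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def maybe_apply_update(current_version: str, staged_version: Optional[str], now: datetime,
                       start: time = time(2, 0), end: time = time(4, 0)) -> str:
    """Return the policy version that should be active at `now`."""
    if staged_version is not None and in_maintenance_window(now, start, end):
        return staged_version
    return current_version


# A staged policy is held during the shift and applied overnight.
print(maybe_apply_update("policy-v1", "policy-v2", datetime(2025, 1, 1, 12, 0)))
print(maybe_apply_update("policy-v1", "policy-v2", datetime(2025, 1, 1, 3, 0)))
```

In practice the gate would also want a validation step and a rollback path before a new policy touches a live cell.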

The Workforce Reality 

Case studies show robots cut musculoskeletal risk and reduce exposure to hazardous tasks like forging and welding. Collaborative-robot safety standards (ISO 10218, ISO/TS 15066) and OSHA guidance formalize safe human-robot interaction.

But displacement effects are real. MIT research found that each additional industrial robot per thousand workers reduced employment and wages in affected commuting zones—a meaningful impact not fully offset by productivity gains. The IMF estimates about 40 percent of jobs worldwide are exposed to AI impact; roughly half may see productivity augmentation, while the rest face reduced labor demand without intervention.

The macro takeaway: adoption will rise, safety can improve, and some roles will shift. For architects presenting to leadership or works councils, the deployment plan must address both the safety gains and the displacement risk.

TechArena Take 

Physical AI is crossing from pilot theater to production credibility.

The durable advantage comes from operational muscle: how quickly you spin the loop from floor data to better policies and back to the line, while moving people from hazardous repetitive work to higher-value tasks with clear retraining paths. Start with one cell, one exception, and a fallback procedure. Prove you can turn tribal knowledge into machine-executable behaviors and safety risks into measurable improvements. Then scale the loop. That’s the compounding edge in physical AI.

Nebius at SC25: Building the Neocloud for Enterprise AI

From SC25 in St. Louis, Nebius shares how its neocloud, Token Factory PaaS, and supercomputer-class infrastructure are reshaping AI workloads, enterprise adoption, and efficiency at hyperscale.

Inside Runpod’s GPU Cloud for Long-Running AI Agents

Runpod head of engineering Brennen Smith joins a Data Insights episode to unpack GPU-dense clouds, hidden storage bottlenecks, and a “universal orchestrator” for long-running AI agents at scale.

How Financial Services Can Responsibly Scale AI

As AI adoption accelerates across industries, financial services sits on the front line of both innovation and risk. From fraud detection to customer personalization, AI is reshaping how institutions operate. But the sector’s high stakes and regulatory complexity demand a uniquely careful approach.

At the recent AI Infra Summit in Santa Clara, Jeniece Wnorowski and I sat down with FinTech expert Anusha Nerella for a Data Insights conversation about how financial organizations can responsibly scale AI, stay ahead of fraudsters, and build teams equipped for the future.

“Many institutions are still in the early stages of AI deployment, while bad actors are moving fast and experimenting aggressively,” Nerella said.

This dynamic creates an urgent need for stronger, more agile defenses. Nerella emphasized that financial firms must accelerate their AI implementation cycles without sacrificing the governance and compliance guardrails that define the industry.

Regulatory and Team Culture Shifts

Asked what the broader technology ecosystem should do to support responsible AI in finance and enterprise, Nerella returned to the importance of regulatory alignment.

“Everything has to go through the regulatory and compliance [process] in order to make it responsibly…applicable to the enterprise sector,” she said.

But regulations alone aren’t enough. Nerella believes that financial institutions must rethink team structures and knowledge transfer to keep pace. She advocates for what she calls “reverse training,” in which organizations bring in engineers well-versed in AI frameworks and libraries, then combine their expertise with the strategic experience of senior leaders.

By fostering two-way collaboration between new AI talent and experienced financial professionals, companies can build stronger, future-ready teams.

“It becomes… a collaborative effort for sure,” Nerella explained. “It’s an equal opportunity here because whoever [has] decades of experience…might have limited exposure towards AI-based frameworks or library utilization or hands-on experience.”

This equal exchange of knowledge, she argued, is essential for success.

Start Small, Stay Governed

For organizations just beginning their AI journeys, Nerella’s advice is both practical and pointed: don’t try to boil the ocean. She recommends starting with “two or three clear use cases with ROI” and ensuring that governance and control mechanisms are in place from the outset.

“When you follow all these basic principles, then you will be able to see…result-oriented AI-based implementation from your end,” she said.

Throughout the conversation, she underscored that AI success in financial services requires human-in-the-loop collaboration.

TechArena Take

The financial sector’s high regulatory stakes, complex legacy systems, and relentless fraud threats make its AI journey distinct. Nerella’s insights highlight that the path forward isn’t just about technology—it’s about culture, compliance, and collaboration.

To build responsible and trusted AI systems, financial organizations must:

Accelerate adoption cycles without compromising governance.
Foster reverse training models that marry AI expertise with institutional knowledge.
Start with targeted, ROI-driven use cases, rather than sprawling transformations.

As the industry races to stay ahead of increasingly sophisticated fraud tactics, success will depend on balancing agility and accountability.

Giga Computing on Rack-Scale AI: From Training to Inference

Recorded at #OCPSummit25, Allyson Klein and Jeniece Wnorowski sit down with Giga Computing’s Chen Lee to unpack GIGAPOD and GPM, DLC/immersion cooling, regional assembly, and the pivot to inference.


