
TechArena host Allyson Klein chats with Arm’s Eddie Ramirez at the Open Compute Summit about the architecture’s progress in data center, growing a thriving ecosystem, and the sustainability advantages of Arm’s design making it even more attractive to data center operators.

TechArena host Allyson Klein chats with Solidigm’s Roger Corell and Tahmid Rahman at the OCP Summit about their company’s heritage in the storage arena and how their SSD portfolio delivers the performance and efficiency required for the AI era.

This summer, the Ultra Ethernet Consortium (UEC) was formed by founding members AMD, Arista, Broadcom, Cisco, Eviden, HPE, Intel, Meta and Microsoft. The charter of this organization is to address gaps in Ethernet, primarily for AI cluster requirements. You may wonder why a technology founded decades ago requires this work, but if you look back at the history of high-speed fabrics you’ll remember we’ve been here before. InfiniBand entered the scene in the early 2000s, providing mainframe-inspired RDMA at high speeds. Soon, the Top 500 supercomputing rankings started featuring the interconnect. Fast forward to the AI era, the convergence of HPC architecture with AI clusters, and NVIDIA’s acquisition of InfiniBand provider Mellanox, and you can see that AI cluster operators today are limited in the scale of cluster deployment without the pricey, single-vendor solution. Cloud providers want to scale more efficiently, and UEC is looking to solve the final limitations that have not been addressed by previous efforts.
As someone who worked on InfiniBand during the foundation of the specifications, I’ve followed this dramatic narrative more than the average data center geek. While the companies gathered to work on UEC specifications provide great leadership across the value chain, and others from the Ethernet industry have also signaled their support for this initiative, there has been some concern that the religion that comes up about communications protocols could once again limit progress.
This week, however, a further advancement to UEC was reached with a strategic collaboration with the Open Compute Project and the collective data center operators that roam OCP working groups and deploy open OCP spec’d hardware. The two organizations announced a sweeping collaboration across the OCP Switch Abstraction Interface (SAI), OCP Caliptra Workstream, OCP Networking Project, OCP NIC Workstream, OCP Time Appliance Project, and OCP Future Technologies Initiative. I’d also assume that OCP’s sustainability initiative will get involved to ensure the next generation industry standard fabrics that stem from UEC will offer sustainability as well as performance.
When I caught up with Cloudflare’s Rebecca Weekly earlier this week, she put this announcement in context. “We need massive clusters for AI models. I think one of the big themes here is around Ultra Ethernet, the consortium that has just been established in the last few months to really start to drive Ethernet as primary. That is a huge step forward, I would argue, for the ecosystem to be able to drive interconnected systems to get us from a couple hundred accelerated nodes operating together to actually millions of accelerated nodes operating together. That is a connectivity first problem.”
The TechArena take: we’re excited to see this collaboration as it will accelerate time from specification to solutions and hopefully prevent forking of vendor solutions from standards. We also see this as a widening of cloud operator blessing of UEC, signaling interest in using Ethernet for massive-scale fabrics. This puts UEC on the list of technologies to watch heading into 2024.

I was excited for the WEKA presentation at Cloud Field Day, and the WEKIES did not disappoint. Launched in 2017 at re:Invent, WEKA provides data services aimed at HPC cloud services, whether in the public cloud or within an on-prem private cloud. Their customers are scientists exploring anything from drug discovery and genome sequencing to advances in autonomous transportation and electronic design automation. And yes, their solutions extend to AI, as many of the infrastructure principles from HPC have formed the foundation for tightly coupled AI clusters.
When you consider optimization of an AI data pipeline, each stage of the pipeline introduces different challenges for data storage. WEKA has unified this optimization through a single software approach for each stage and further optimized for the different IO characteristics across AI models.

WEKA discusses super efficient HPC storage optimization
WEKA runs across the four major cloud players, and the performance is eye opening: 5 million IOPS in AWS, rendering 120 fps from the cloud, 40X faster model deployment, 2 TB/s of data in OCI, and 10X faster research. This performance also results in cost savings through a reduction of duplicate copies and pay-for-what-you-use storage. It is also estimated to have saved a collective 260 million tons of carbon emissions for WEKA customers.
They deliver this in part by modernizing storage tiering: keeping metadata in the flash tier, chopping large files into small objects, and packing tiny files into larger objects. They also pre-stage data on the SSD while waiting for the current process to complete. Object storage is also kept on hotter-tier SSDs until space is needed, which lowers overall latency and improves application performance. They walked through an S3 example but apply similar approaches across public offerings.
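To make the chunk-and-pack idea concrete, here is a minimal Python sketch. It is not WEKA’s implementation: the 4 MiB object size, the function names, and the greedy packing order are all assumptions picked for illustration. It only shows how large files can be cut into fixed-size objects while tiny files are grouped so that each object stays near a target size.

```python
# Hypothetical target object size, chosen only for illustration.
OBJECT_SIZE = 4 * 1024 * 1024  # 4 MiB


def chunk_large_file(name, size, object_size=OBJECT_SIZE):
    """Split one large file into (name, offset, length) object descriptors."""
    chunks = []
    for offset in range(0, size, object_size):
        length = min(object_size, size - offset)
        chunks.append((name, offset, length))
    return chunks


def pack_small_files(files, object_size=OBJECT_SIZE):
    """Greedily pack (name, size) small files into objects.

    A new object is started whenever adding the next file would push
    the current object past object_size bytes.
    """
    objects, current, used = [], [], 0
    for name, size in files:
        if current and used + size > object_size:
            objects.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        objects.append(current)
    return objects
```

Calling `chunk_large_file("genome.bam", 10 * 1024 * 1024)` yields three descriptors (two full 4 MiB objects and a 2 MiB tail), while `pack_small_files` turns many tiny files into a much smaller number of objects, which is the property that keeps object-store request counts manageable.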
WEKA also drives data portability within hybrid and multi-cloud environments. They’re doing this by pulling data from on-prem, snapping to object storage, encrypting the data, and moving it to the public cloud. This is also done for bursting where analysis is delivered in the cloud and data is then moved back on prem so that public services can be spun down.
The WEKIE team also discussed zero footprint storage, or running applications and WEKA on the same servers to deliver cost savings and value at scale to the customer. This is employed within GPU farms where IO resources are under-tapped and in other hardware resources with excess IO capacity.
TechArena’s take: cloud providers are desperate to drive AI clusters to higher levels of performance and efficiency. That was the #1 topic at OCP this week, and it’s no surprise that all of the big players have integrated WEKA into their offerings. With more organizations moving HPC workloads to the cloud and more enterprises expanding their AI workloads in the cloud when they can get the GPU cycles, WEKA has an opportunity for major growth in the days ahead. I’m most intrigued by the converged mode zero footprint solution as an innovative path to increased resource and cost efficiency. I’ll be watching this space.

AMD CVP Robert Hormuth was on hand at Cloud Field Day to share his views on cloud native computing and how his architect team at AMD has tackled EPYC processor development to deliver the right performance for cloud native environments. Why would architecture changes matter for an operational computing model? Robert laid out his case by discussing the differences between traditional monolithic computing and virtual/IaaS, containerized/CaaS, and functionized/FaaS workload management. Each of these stack changes drove underlying changes to the processor architecture. The virtualization era drove higher memory capacity, a move to multicore architectures and some intra-VM data locality. In moving to containers, we saw higher core counts and increased memory bandwidth, a great reduction in data locality, and more stress on IO performance. Finally, functions have called for maximization of core count and a state of short-lived workload existence.
What has AMD done to address this? First, AMD placed priority on innovating for the future of cloud native workloads over porting the past. With this comes a shift from VMs sharing core hypervisor services across all platform services to the concept of Dom0, a thinner hypervisor interacting with Dom0 services running in a single VM and run on a DPU or smartNIC. The AMD DPU, called Pensando, is integrating high speed packet processing, CPU slow path processing and backside IO to flash, NVMe, Drives, DRAM etc. This buys you more tenant instances, increased enabled applications in the cloud for determinism and performance, consistent management across environments, centralization of security and abstraction of HW+SW services to any host CPU and bare metal OS.
This also comes with a focus on chiplet architectures and acceleration of innovation past Moore’s Law. AMD’s chiplet design enables product innovation mixing and matching cores and architectural features based on different workload requirements as represented by 4 different EPYC product lines under the 4th generation of processor. These chiplets also lower cost of design with incredible flexibility compared to monolithic design. Chiplets with 2.5D and 3D stacking also increase silicon real estate delivering higher silicon area on package and ultimately more cores per CPU and memory to the customer.
So what about CPU acceleration? AMD strongly believes that tenants will want to utilize off-the-shelf code that has not been optimized for workload acceleration, and that operators benefit not from offering tenants workload acceleration but from accelerating core functions through the DPUs and providing high-performance standard cores for tenant workloads. This is an interesting view given that AMD’s primary competitor has loaded its CPU with 14 workload accelerators.
The TechArena’s take is that both companies are right. Some savvy enterprises will optimize their code to take advantage of workload acceleration, and some of them will run these strategic apps in the cloud. The majority of enterprises do indeed want to run off the shelf code, and AMD’s bet on simplified high-performance cores will pay off over alternatives. Cloud operators likely will continue offering a choice in infrastructure instances to address both customer markets. AMD Bergamo offers 128 core scaling that puts it in the lead for pure performance, and because of this cloud native workloads especially in containerized or functionized environments are ideally suited for these platforms.

TechArena host Allyson Klein chats with Cloudflare VP Rebecca Weekly at the OCP Summit on far ranging topics including the demands of AI on infrastructure, how Summit announcements will shape the industry, and the importance of modular and circular infrastructure oversight.

TechArena host Allyson Klein talks to Intel’s Eric Dahlen, Microsoft’s Shruti Sethi and Schneider Electric’s Alex Rakow about the advancement of OCP’s sustainability initiative in 2023 across embodied carbon in silicon, circularity, and getting beyond PUE.

At Cloud Field Day today, the team from Juniper Networks delivered their vision for how they’re assembling the right technology, partnerships and embrace of industry standards for AI era data center networking requirements. Their discussion focused across operations, openness and solutions and was very much rooted in the recent acquisition of the Apstra technology, network automation software that aims to simplify and scale network management for IT operators.
Juniper detailed how it sees cloud-style on-prem networking. Chris Magret detailed the complexity of managing across the alphabet soup of network solutions and made a case for cross-vendor oversight using Apstra with Terraform. Chris walked through a demo of configuring a small LAN of Juniper switches and showed how to establish an Apstra routing zone, services network and VLAN. The steps were educating Apstra on network topology, letting Apstra automatically discover network resources (including spine and leaf switches as well as server nodes) into a blueprint, and automating the configuration of all switches in the network. Chris went further by deploying an application into the fabric using Apstra. The key point of this exercise was demonstrating a cloud native management interface that reduces the IT complexity, time and skill required for network oversight.
We next dove deep into AI cluster design and the specific challenges that AI training places on networking with James Kelly. James discussed GPU fabric rail-optimized design delivering a flatter network topology in groups of eight leafs, called a stripe by Juniper. This localizes traffic and keeps as much traffic as possible off of spines while scaling the network to support over 18K GPUs within the cluster. Can you go to a super spine configuration to scale further? James argued against it given the additional complexity and latency of such a configuration.
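The stripe arithmetic is easy to sketch in Python. Only the eight-leaf stripe and the 18K GPU figure come from the talk; the 64-port leaf switch and the even split between GPU-facing and spine-facing ports are assumptions made here for illustration, as are the function names.

```python
import math


def gpus_per_stripe(leafs_per_stripe=8, leaf_ports=64):
    """GPUs one stripe can attach.

    Assumptions (not from the talk): each leaf has `leaf_ports` ports,
    half facing GPUs and half facing spines, and each GPU-facing port
    connects exactly one GPU NIC. In a rail-optimized layout, GPU n of
    every server lands on leaf n of its stripe.
    """
    downlinks_per_leaf = leaf_ports // 2
    return leafs_per_stripe * downlinks_per_leaf


def stripes_needed(target_gpus, leafs_per_stripe=8, leaf_ports=64):
    """Number of stripes required to attach at least target_gpus GPUs."""
    per_stripe = gpus_per_stripe(leafs_per_stripe, leaf_ports)
    return math.ceil(target_gpus / per_stripe)
```

Under these assumptions a stripe holds 256 GPUs, so a cluster of 18,000 GPUs needs 71 stripes. Traffic between GPUs on the same rail within a stripe crosses a single leaf, which is why the design keeps so much traffic off the spines.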
Finally, we looked at network analytics and how Juniper tools help network operations teams administer the network given new requirements. Kyle Baxter walked us through how telemetry data is ingested into Apstra using Juniper’s custom telemetry collector aimed at IT admins. The telemetry collector offers a code-free way to oversee the network in flight and also integrates an option for command line interface control. The key takeaway was that continual telemetry data feeds across network nodes give Apstra the ability to maintain management of all network resources while providing administrators the tools to sort data into meaningful information about the network.
The TechArena takeaway?
Apstra is a fantastic jump forward in simplifying network management, and we’re keen to see how Juniper continues down the path to true multi-vendor support across network infrastructure providers. We’re curious if Juniper sees Apstra only as a companion product of Juniper switch engagements or as a true standalone management console regardless of switch deployment. We’re also excited to see more progress from Juniper and the entire Ethernet ecosystem with the advancement of Ethernet for AI clusters, including movement from the Ultra Ethernet Consortium and its new collaboration with the Open Compute Project.

TechArena host Allyson Klein chats with Fermyon CEO Matt Butcher about cloud redundancy’s impact on sustainability, and how his organization is delivering new serverless AI capabilities to help usher in a more sustainable computing future.

Andy Bechtolsheim is a legend of the computing industry. Currently at Arista Networks, Andy is arguably best known for co-founding Sun Microsystems, but he also founded Granite Systems, a gigabit Ethernet startup acquired by Cisco, and of course Arista. He was also a key founder of the OCP organization. He is also known for being a savvy Silicon Valley investor, including as one of the earliest investors in Google, funding Larry and Sergey before they’d incorporated. Depending on who you want to believe, he also encouraged them to name the company Google. He has received the Smithsonian Leadership Award for Innovation, is an elected member of the National Academy of Engineering, and was recognized by IT Pros as the person who has delivered the most to server innovation over the past 20 years. He has a knack for understanding deep technology as well as being on the cusp of what’s next, which is why I prioritized his talk on the AI Data Center at OCP Summit.
Andy started his talk today by identifying that generative AI performance requirements are growing 10X per year. Where does a 100X performance breakthrough come from over the next few years? Andy called out architectural improvements, next gen processor advancements, optimized number representations including microscaling formats, and higher memory bandwidth among the key elements for the industry to focus on. Process technology alone will not get us there as the industry moves from 5nm to 3nm and 2nm technologies. 2.5D and 3D packaging technologies will help, as available silicon real estate will basically double. Together, these represent a 5X performance optimization…not enough.
Andy suggested that doubling power and improving cooling will be necessary to drive the performance required. He laid out a vision for a 3000W GPU chip, representing historic challenges for cooling technologies. He introduced a focus on diamond substrate technology (DST), a 5X better conductor than copper, leading to a reduction of hot spots on a die and more effective area for liquid cooling technologies to work with. So what cooling is required? Liquid cooled data centers are the new design focus due to AI. Andy called for a holistic design approach from chip to rack to building.
He introduced the Open AI rack GPU/CPU configuration that delivers 100-200 kW/rack, supporting 16 2U GPU blades with supporting switch and power shelves. The practical limit for air cooling, for contrast, is 50 kW using heat exchanger technology. The fabric inside the rack utilizes passive copper cables, offering the lowest power and high reliability. Andy believes that this rack technology is the future for AI clusters and called for the industry to standardize this specification so that engineering development is not fragmented over redundant solutions.
But what if folks want to go further, to 200 kW? And they do…HPE/Cray has a 200 kW system on the market today as one example of industry delivery at higher power. For this, immersion cooled racks will rule the day. All heat in this case is removed by the immersion liquid. Here, the industry is still not settled on the liquid to be used, with water, mineral oil and Fluorinert all in play. Water would be preferred, but there are challenges with corrosion, bacterial introduction, and the ongoing water chemistry maintenance required. Two-phase cooling using alternative liquids can avoid pumps or ongoing maintenance for antibacterial or anticorrosion treatment. What does this mean for IT operators? Andy called for more standards to come into play, as the industry is limited to bespoke solutions that will not achieve the scale required for our impending AI era.
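For a rough sense of what those rack numbers imply per blade, here is a back-of-envelope Python sketch. The rack power range, the 16 blades and the 3000W GPU come from the talk; the `overhead_fraction` parameter (power taken by switch and power shelves) and the function names are assumptions added here, and the default of zero overhead is deliberately optimistic.

```python
def per_blade_budget_kw(rack_kw, blades=16, overhead_fraction=0.0):
    """Power available to each blade, in kW.

    overhead_fraction is a hypothetical share of rack power consumed by
    switch and power shelves; the talk did not give a figure for it.
    """
    usable_kw = rack_kw * (1.0 - overhead_fraction)
    return usable_kw / blades


def max_gpus_per_blade(rack_kw, gpu_kw=3.0, blades=16, overhead_fraction=0.0):
    """Whole GPUs of gpu_kw each that fit inside one blade's power budget."""
    return int(per_blade_budget_kw(rack_kw, blades, overhead_fraction) // gpu_kw)
```

With no overhead, a 100 kW rack leaves 6.25 kW per blade and a 200 kW rack leaves 12.5 kW, or room for two and four 3000W GPUs per blade respectively before counting CPUs, memory and fans. Either figure sits well above the 50 kW air-cooling limit at the rack level, which is the point of the liquid cooling argument.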
The TechArena’s take from this rapidfire talk at OCP Summit: the fact that Andy focused on something as esoteric as liquid cooling underscores the pending challenge for performance scaling for AI. The ongoing death of Moore’s Law is driving up energy draw and heat from server silicon, and processing requirements of AI are growing at the same time at unprecedented rates. Expect to see rapid innovation in the liquid cooling arena, and look for OCP’s sustainability initiative to drive a wedge into standardization efforts in this arena.

TechArena host Allyson Klein chats with Meta and OCP lead Dharmesh Jani about his vision for sustainability innovation for the data center and how the OCP sustainability initiative came into being.

TechArena host Allyson Klein chats with Oracle executive Bev Crair on how she’s tackling long term strategic planning for OCI to get ahead of where the market and customers travel and how this approach informs Oracle business.

TechArena host Allyson Klein chats with Intel’s chief sustainability officer Jennifer Huffstetler about her vision for more energy efficient, resource aware computing and how the industry needs to work together towards building a more sustainable future.

TechArena host Allyson Klein chats with HED associate principal Sara Martin about the trends in greenfield data center buildout and what drives determination of greenfield site prioritization.

High speed connectivity is an often-overlooked element of building a sound strategy for AI training and inference at scale. AI supercomputers, built on the architectural foundations of high-performance computing, rely on masses of interconnected servers functioning as one logical system and sharing work across nodes towards a common objective. NVIDIA saw this key technology as a strategic element of their strategy when purchasing Mellanox and integrating InfiniBand into their silicon portfolio. But where does that leave other silicon logic players and customers that want another solution outside of NVIDIA’s InfiniBand solutions? Enter Alphawave Semi, a silicon leader known for connectivity solutions from high-performance IP to custom silicon and chiplet delivery. Alphawave has engaged with the TechArena in the past to discuss their broader portfolio, but I was delighted to talk to their CTO, Tony Chan Carusone, at the AI HW and Edge Summit in Santa Clara last week to discuss their strategy and what they’re seeing in the market as AI adoption broadens across cloud service providers and into the enterprise.
There are a few key trends to look at when considering AI infrastructure adoption. The first is that AI is calling for customers to consider alternative architectures and in doing so adopt a CPU + Accelerator strategy for compute capability. Customers are also becoming savvier at considering custom solutions, with adoption extending more broadly than the historic confines of the world’s largest cloud players. Finally, chiplet architectures are shifting from something that large silicon providers deliver as proprietary to those that are enabled by an industry standard such as Universal Chiplet Interconnect Express™ (UCIe). All of these trends support Alphawave Semi’s business strategy and position them well for growth. This started with Alphawave’s leadership in connectivity IP, followed by their acquisition of OpenFive, which completed last year. This acquisition of custom silicon capabilities and interface IP has transformed the company into a vertically integrated semiconductor supplier, and nearly doubled Alphawave Semi’s connectivity IP portfolio across 5nm, 4nm and 3nm solutions at the time, with key technology squarely focused on AI requirements.
Alphawave Semi adds to this with their “spec-to-silicon” engineering prowess to deliver application-optimized chiplet-enabled custom silicon configurations for customers. Blending up to 112Gb SerDes, PCIe Gen6.0, CXL3.0, HBM 3.0, and power-performance-area optimized Arm and RISC-V processor subsystems as examples, the company can deliver the leading-edge SoCs that their hyperscaler and data infrastructure customers need to keep pace with the surge in data-intensive applications like generative AI. Moreover, their deep industry experience, key partnerships and ecosystem collaborations provide confidence in moving from ideation to custom solution delivery with speed.
Where things get really interesting to me is with the advent of open chiplet configurations as this accelerates Alphawave IP into heterogeneous solutions. Tony Carusone noted that Alphawave Semi has positioned itself well for this broader industry trend being on the foundations of industry standards setting and building technology prowess with things like advanced 2.5D packaging that will enable Alphawave Semi to work with other industry leaders as a connectivity chiplet supplier.
When one considers the strong demand for alternative AI silicon in the market, an independent supplier like Alphawave Semi becomes even more interesting as they collaborate with multiple vendors to deliver core connectivity capabilities in the most advanced process nodes. Moreover, their customers can be assured that their custom chip and chiplet solutions remain proprietary. As Tony stated in our conversation, “Just the ability, to leverage our industry leading connectivity IP and silicon offerings across the full spectrum of solutions allows us the flexibility to work with some of the biggest players in AI and meet them wherever they're at to help them solve problems in whatever way makes most sense for their specific AI workloads and applications.” In my view, this summarizes why Alphawave Semi is perfectly positioned to take advantage of the massive growth in AI infrastructure deployments expected over the next several years. Watch this space for more updates regarding Alphawave, connectivity, and the demands for more silicon capability to fully unleash the power of AI.

One thing you notice when you talk to Jay Dawani is that when he sees a challenge that’s important enough, he takes it on himself to solve it. This was made clear when Jay delivered the how-to for coding AI algorithms based on a foundation of high school math to democratize access to AI developer skillsets, and it’s clear again with Lemurian’s strategy to redefine the mathematical foundations of AI and deliver more performant and efficient computing solutions.
With all the coverage on AI in 2023, one would need to be hiding from technology to not understand the size and urgency for demand of AI infrastructure. Today, however, AI training capabilities are locked within the largest of cloud providers due to limitations of access to the compute scale required to fuel training and the developer prowess to code AI effectively. The need for an alternative architecture is drawing the investor community to lean forward into AI silicon startups like Lemurian and customers to lean forward into consideration of alternative solutions.
Lemurian’s approach is rooted in silicon innovation but starts arguably even more foundationally in new math for AI. In our interview Jay described previous math models as great for silicon but not necessarily ideal for developers calling out INT8 as a prime example of an approach that challenged developer effectiveness.
Lemurian’s approach is rooted in logarithmic innovation. Jay explained, “We went back to a really old idea. Log number systems. It's the holy grail of number systems that if you can make a purely log link format work, and it's a 250 year old math problem. We came up with a way of actually making addition in a log format work, and we've developed multiple types, all of them, two bits to 64 bits, to stack up against floating point. So pound for pound or bit for bit, this thing has better precision, dynamic range and signal to noise ratio. This thing wins all day long.”
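To see why addition is the hard part of a log number system, here is a minimal Python sketch of the textbook idea. It is not Lemurian’s format or method, which the interview did not detail; the function names and the (sign, log2 magnitude) tuple encoding are assumptions for illustration. Multiplication collapses to adding exponents, while addition needs the correction term log2(1 + 2^d), which is the piece that is expensive to build in hardware.

```python
import math


def to_lns(x):
    """Encode a nonzero real as (sign, log2 of magnitude)."""
    if x == 0:
        raise ValueError("zero needs a special encoding in a log number system")
    return (1 if x > 0 else -1, math.log2(abs(x)))


def from_lns(v):
    """Decode (sign, log2 of magnitude) back to an ordinary float."""
    sign, exponent = v
    return sign * 2.0 ** exponent


def lns_mul(a, b):
    """Multiply: the signs multiply and the logs simply add."""
    return (a[0] * b[0], a[1] + b[1])


def lns_add(a, b):
    """Add two same-sign values without leaving the log domain.

    Uses log2(x + y) = lx + log2(1 + 2**(ly - lx)) with lx >= ly.
    Mixed-sign addition (subtraction) is omitted from this sketch.
    """
    if a[0] != b[0]:
        raise NotImplementedError("mixed-sign addition is omitted here")
    big, small = (a, b) if a[1] >= b[1] else (b, a)
    return (big[0], big[1] + math.log2(1.0 + 2.0 ** (small[1] - big[1])))
```

For example, `from_lns(lns_mul(to_lns(8.0), to_lns(0.5)))` recovers 4.0 using nothing but an exponent addition, whereas `lns_add` has to evaluate a nonlinear function of the exponent difference. Real designs approximate that function with tables or other tricks, which is where the engineering difficulty lies.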
If you can re-map the number system that silicon uses to calculate, you can arguably achieve a Moore’s Law like increase in performance, something that AI sorely needs to break through with the compute efficiency required. In Jay’s presentation at the conference, he argued that the combination of accelerator silicon innovation, software improvement and number system advancement could yield as high as a 20X performance leap which explains why his sessions at the conference drew standing room crowds. He backed that up further showcasing eye opening simulations of 11,357 inferences a second for BERT Large and 119,278 inferences a second for ResNet50 based on a single node configuration. To put this in perspective, if these numbers hold they’ll beat NVIDIA’s A100 today which is what is widely deployed in the market.
So what is TechArena’s take? There are absolutely AI inferencing challenges at scale that require acceleration. Facebook’s workloads come to mind requiring nimble acceleration to meet latency requirements from users. While CPUs will remain the standard for inferencing performance efficiency, acceleration will continue to gain inroads for workloads that make sense opening a door for vendors like Lemurian. If Lemurian’s number system gains traction with developers the opportunity for the company will scale significantly, and this is where we’ll be watching most acutely as this exciting area of the industry progresses. For now, I love that Jay and team are challenging the status quo and stating emphatically that a tool as powerful as AI requires new ideas for democratic delivery. I’ll be watching what comes next from this exciting startup.

TechArena host Allyson Klein chats with Alphawave Semi’s CTO Tony Chan Carusone regarding the unique opportunity for connectivity innovation to fuel the era of AI and why Alphawave is perfectly poised for IP, chiplet and custom solution delivery.

TechArena host Allyson Klein chats with Tenstorrent’s David Bennett about the company’s vision for RISC-V + accelerator solutions to usher in a new era of AI compute and how customers are hungry for alternatives including custom designs.

TechArena host Allyson Klein chats with Lemurian Labs co-founder and CEO Jay Dawani about his vision for a mathematical revolution in AI and how his company plans to change the game on AI performance.

Last week, I was delighted to chat with Nidhi Chappell, Microsoft’s GM behind delivery of AI supercomputers, on her team’s journey in architecting the infrastructure that has unleashed ChatGPT to the world. Imagine for a minute the awesome opportunity and responsibility in delivering the compute capability to unlock generative AI for the world. Imagine first seeing ChatGPT before everyone else and knowing you’ll be sharing it with the world on the supercomputer you’re designing and deploying very soon. This is what Nidhi and her team at Microsoft have delivered. What Nidhi shared about this experience was also insightful regarding the broader moment we’re standing in right now. As she described leveraging her background in high performance computing to build AI training supercomputers, one thing she said stood out to me. Her team is utilizing hundreds of thousands of GPUs to train their generative AI solution that will likely transform every industry on the planet, re-architect how we look at work and societal constructs, and potentially reshape how governments address the great divide between those who thrive in the era of AI and those who struggle to survive. Those GPUs run in buildings that consume more power than entire towns, and are as rare to find as Willy Wonka’s golden tickets. We’re at a moment where the most powerful innovation that humans arguably have ever created is resting in the hands of one company…NVIDIA…to supply a few companies, including, of course, Microsoft, racing with Google, Amazon, and a few others, to control the power of AI engines for the world.
Yesterday’s publication of Elon Musk’s biography delivers a stark warning about the consolidation of power to one individual or company and what it means to society at large. I think even leadership within the industry stalwarts that can afford building massive AI compute engines would argue that a thriving AI industry vibrant with competition and diffuse in power is good for all of us. Nidhi talked at length about Microsoft’s own vision of responsible AI development and delivery aligned with AI serving as a co-pilot to human innovation aligning well with the company’s broader reputation for responsible use of technology. But how do we get to this vibrant future? To find out, I traveled to Santa Clara to the AI HW and Edge Summit, featuring a who’s who of AI silicon innovators. This event offers the perfect opportunity to assess the state of innovation of AI silicon. The lineup of speakers, sponsors, and attendees confirmed that the broader industry is hungry for alternatives for data center and edge AI implementation.
What is driving this hunger? GPU supply constraints have been well documented, but other influences, such as power consumption and barriers to developer engagement, are creating demand for alternative architectures. Today’s data centers consume over 3% of the world’s electricity, and some forecasts predict that AI will scale compute energy demand to upwards of 20% of global consumption. GPUs are literally becoming the Hummers of the data center before our eyes. Can we afford to rely on this power-hungry architecture at a time when the world’s climate crisis is peaking? Will this reliance on massive electricity further consolidate control of AI to those who can afford not only the infrastructure cost, but the utility bill?
Organizations are also challenged in finding AI talent with true knowledge of training algorithm and framework development. How do we ensure that AI can be democratized, and does architectural choice including custom silicon and chiplet innovation aid in broadening the opportunity for AI innovation?
This week, I’ll be exploring these questions with three companies challenging the AI power dynamics, including Tenstorrent, the Jim Keller led startup delivering highly scalable RISC-V + ML accelerator solutions, Lemurian Labs, a company founded in robotics but delivering new acceleration solutions for AI from data center to edge, and Alphawave Semi, a semiconductor IP engine rooted in the connectivity and chiplet solutions required for AI innovation. Collectively, I expect they’ll provide insight on the state of broad industry delivery of compelling alternatives to GPUs. Watch this space for insights from the conference, and, as always, thanks for engaging. - Allyson

Allyson Klein chats with Microsoft GM Nidhi Chappell about her team’s amazing journey delivering the compute power to train ChatGPT and integrate generative AI power for the world.

TechArena host Allyson Klein sits down with NASA’s Matt Greenhouse to discuss the Webb telescope project, a mission he has been working on for decades, and how it will shape our understanding of the birth of the universe, exoplanets, and potentially life beyond Earth.

VMware: A company with a storied history that dates back to the very definition of virtualization and a darling of enterprise IT for decades. But since its pre-pandemic heyday under the leadership of Pat Gelsinger, VMware’s future has come into question. Many see the company’s multiple investments in unproven technologies, cloud service provider delivery of alternative hypervisors, and the departure of Pat for Intel as placing the future of VMware at risk. With the acquisition of the company by Broadcom looming (and another hurdle to that acquisition passed in the past few days), what will VMware become in the coming years, and what specifically do Hock Tan and the Broadcom team see as the true value of VMware’s IP portfolio?
To learn more, I headed to Las Vegas and to the TechField Day Extra event at the Wynn hotel. There, the VMware team detailed the new NSX+ offering including NSX+ policy management, NSX+ intelligence, NSX+ network detection and response, and NSX+ ALB cloud services. NSX+ is obviously the latest iteration of VMware’s successful NSX offering, a shining star that has taken hold of solution leadership beyond the VMware core hypervisor business. NSX+ is also likely at the center of Broadcom’s interest in the company given that it sits squarely in networking. So what does NSX+ deliver that is new? According to the VMware team, multi-cloud security policy configuration and management, with a single-pane-of-glass view for managing distributed firewalls, gateways, and IDS/IPS policies, is at the center of the innovation.
The VMware team demonstrated the NSX+ Policy Management capability at work, allowing adherence to corporate security guidelines and enabling teams to operate seamlessly within those guidelines. The demo walked step by step through setting up subnets within the VPC, associating VNICs with those subnets, and showing that teams have no access beyond the intended VPC, ensuring compliance with broader corporate security policies.
Another key feature that caught my attention was the NSX ALB controller cloud service, which enables multi-cloud load balancing through a single controller cloud service. When asked what the difference was between this core capability and existing management schema, the VMware team clarified that this offers a SaaS management option as an alternative to traditional in-house management, which was termed “disconnected”. It is available for on-prem instances today and will extend to VMware cloud and public cloud instances over time.
But VMware saved the most interesting feature, at least from my perspective, for last. They’ve integrated centralized multi-cloud analytics with NSX+ intelligence. What does this provide for administrators? It delivers a scalable, centralized data platform that not only lets administrators view historical data but also amps up security oversight and delivers behavioral threat analytics. VMware also talked up their use of ML to classify workloads and oversee other rudimentary data collection, and I expect they’ll expand this capability over time. Will administrators use this in place of standalone network security tools? I could see integration with other security tools, but I’m dubious that NSX+ alone will meet everything that enterprises require.
So where does this leave VMware? I think the key takeaway is that core innovation is still being delivered by their engineers, and NSX+ certainly will be part of the VMware of 2024+. Given the heritage of VMware within enterprise, I expect core hypervisor use within enterprise deployments to keep favoring VMware. But future innovation in the broader industry leans squarely toward cloud native workload management, and the innovation conversation in my opinion has shifted from VMworld to the likes of Cloud Native Con and the CNCF community, for example. And with Broadcom’s acquisition likely to close imminently, it does make VMware Explore 2024 a real question, at least in its current form. As always, thanks for engaging.

TechArena host Allyson Klein chats with Project Inkblot co-founder Jahan Mantin about what it means to be a designer and how to build inclusive design practices.

TechArena host Allyson Klein talks with Cerbos co-founder and CEO Emre Baran about the importance of authorization solutions and why he founded Cerbos to deliver scalable code that developers can seamlessly integrate into cloud native applications.