Tuesday, July 16, 2024

Intel - Is it an IPU or a DPU or what?

Intel has developed and sold classic Ethernet Network Interface Cards (NICs) for a long time, but many might not be as familiar with their product offerings in the SmartNIC and more advanced NIC categories. Intel breaks down their product offering as follows:

Intel Ethernet Connectivity Solutions

Intel presented on the work they are doing around their Infrastructure Processing Unit (IPU), which they refer to as an "Improved DPU" (Data Processing Unit). It fits in the general bucket of "SmartNIC" and is developed by the Intel NEX Cloud Connectivity Group. This post focuses on the IPU, since that is what they presented at #NFD35. I must admit, I am interested in hearing more about their AI Optimized solutions, as I am sure they are being leveraged by some very large organizations for interesting workloads. Perhaps a future NFD?

There were two features in the IPU that infrastructure engineers should know about. The first is the capability to build out reliable transport between two hosts that both have Intel IPUs. The Ethernet fabric no longer needs to run special queuing and management to deal with congestion and microburst issues; instead, the IPU running Falcon, and leveraging programmable congestion control, deals with it. Effectively, Falcon is the method for reliable transport over an existing lossy fabric, which brings a lot of options to companies who may not want to build out a dedicated fabric for running storage or AI workloads.

In a shared fabric environment, it can be difficult to structure and provision all the access ports with the right queuing and policies. Given that difficulty, it might make sense, for smaller networks and diverse compute environments, to simply purchase the more advanced IPUs for the servers that require them and have the IPU deal with the lossy fabric issues.

The other feature was the use of the IPU for general compute and AI inference, along with the markets that could potentially use the solution. They are definitely targeting a wide audience of infrastructure engineers who might need to run services and workloads but might not have the capacity, budget, or fabric design to support what they are trying to do. Intel sees the following areas as potential good use cases for their IPU.

IPUs In & Beyond the Data Center

 

You can watch the overview presentation from Thomas Scheibe w/ Intel at:



If you want more information about their Reliable Transport over Lossy Fabrics, which is called Falcon, then check out:



Intel also provided some actual demos and you can watch those at:



What is always a little interesting about Intel and their solutions is that, typically, you and I aren't buying directly from Intel. You are normally purchasing their products through a distributor or hardware supplier like HPE, Dell, or SuperMicro. But Intel still wants infrastructure engineers to know what their products are capable of, so that when you are building out the next server, you are picking the right SmartNIC for your specific needs. So it makes sense they are out providing this information directly to the public, or at NFD events in this case, so you can pick and choose the right solution for your Data Center and Enterprise server networking needs.

- Ed


In a spirit of fairness (and also because it is legally required by the FTC), I am posting this Disclosure Statement. It is intended to alert readers to funding or gifts that might influence my writing. My participation in Network Field Day, a Tech Field Day event, was voluntary and I was invited to participate. Tech Field Day events are hosted by Gestalt IT (part of The Futurum Group) and my hotel, transportation, food and beverage was/is paid for by Gestalt IT for the duration of the event. In addition, small swag gifts or donations were/are provided by some of the sponsors of the event to delegates (I don't accept gifts but I do ask the sponsors to donate to causes that support Mental Health). It should be noted that there was/is no requirement to produce content about the sponsors and any content produced does not require review or editing by Gestalt IT or the sponsors of the event. So all the spelling mistakes, technical missteps, incorrect opinions, and grammar errors are my own.


Tuesday, July 02, 2024

Network Field Day 35

Network Field Day 35 (#NFD35) is happening July 10-11, 2024 and I am fortunate enough to be a delegate for the event. You can check out the full event schedule at the NFD35 website. The sponsors list has been growing, so checking the site is the best option until the event starts. I recommend watching live if you can, and I believe LinkedIn is likely the best place to catch stuff.

So far the sponsor line up is:

I will be attending in person, and I will be doing my best to take notes and ask interesting questions. Obviously, there is no way we can cover all the questions that those who are watching remotely might have, but hit us up on X/Twitter using #NFD35, via the Tech Field Day Slack channel, or even via LinkedIn and we will all do our best to try and bring up the point.

So, there you go, let's get ready to have some serious fun with NFD35 as the delegate lineup is pretty impressive! If you are at all into networking then I encourage you to follow along live for the events on the Tech Field Day website or via LinkedIn. If you are interested in being a delegate, you can check out the website; they have all the details up there.

- Ed


In a spirit of fairness (and also because it is legally required by the FTC), I am posting this Disclosure Statement. It is intended to alert readers to funding or gifts that might influence my writing. My participation in Tech Field Day events was voluntary and I was invited to participate. Tech Field Day is hosted by Gestalt IT (part of The Futurum Group) and my hotel, transportation, food and beverage was/is paid for by Gestalt IT for the duration of the event. In addition, small swag gifts or donations were/are provided by some of the sponsors of the event to delegates (I don't accept gifts but I do ask the sponsors to donate to causes that support Mental Health). It should be noted that there was/is no requirement to produce content about the sponsors and any content produced does not require review or editing by Gestalt IT or the sponsors of the event. So all the spelling mistakes, technical missteps, incorrect opinions, and grammar errors are my own.

Monday, August 07, 2023

Nile's changing up how Enterprises design, build, and consume Access Networks at Network Field Day 32

Nile presented at Networking Field Day 32 on July 26, 2023, covering their Enterprise Networking solutions. Nile has built out a set of networking solutions that focuses on the enterprise and commercial market, and they are selling the solution in a Network as a Service model. The overview of what they provide:

  • Wired and Wireless LAN as a Service
  • Guaranteed Network Performance
  • Zero Trust
  • IT Simplicity

It seems they are competitors to Meraki, Mist, and Aruba from an enterprise solution offering and to Ubiquiti and MikroTik in the commercial market. All of these competitors have strong market positions and install bases. This is a simplistic comparison, but for the purpose of understanding what market groups they are potentially suited for, it works just fine.

Here is their overview:


There are several more YouTube videos available; you can find them all over at the Tech Field Day 32 Nile page.

But in typical NFD fashion, the most interesting and relevant session ended up being the last video and the poor presenter was given the least amount of time because everyone else was unable to keep on track prior.

Note: If Nile presents at another field day, I suggest they START with this demo, focus on doing Q&A around it and expand everything else after it. Honestly, the first 30-45 mins of the overall timeslot was a waste of time and could have been cut (except the marketing people likely wanted that content - stop listening to them, you can record that stuff on your own, you don't need a bunch of delegates in the room for that part). If you are going to watch anything, watch this one:



My quick thoughts on what Nile presented:
Of course the IPv6 question was asked and they built a new generation of networking gear and solution without IPv6 as a first class citizen. I don't know if that is really forgivable in the current market. While I understand the US Federal Government is not their primary customer, or even a secondary, there will definitely be organizations that need IPv6. It is just such a glaring misstep I can't really take the rest of the product seriously, so you know my bias going into this. 

They also need to explain and position their place in the market a bit more clearly. A simple elevator pitch that says something like: "We are Meraki or Mist generation 2.0" or something similar to give a reference point. I get that they are doing Network as a Service (NaaS) and their billing/revenue model is slightly different but it puts them in front of the right general audience. The current pitch and explanation is too broad and doesn't narrow the field for buyers to understand what they do and why.

Effectively, they are wrapping together hardware, software, support, and installer/operator ease of administration in a recurring revenue model. I'm not sure that is revolutionary at this point. They did invest to brand their own hardware solution. I'm not sure putting simple diagrams on the equipment makes it unique in terms of IT Simplicity. Their management UI looks like a combo of Mist, Meraki, and Ubiquiti, so nothing super unique going on there, though that might be a plus: people who have used those other solutions can figure theirs out a bit faster.

I will be honest, I am not 100% sure what the large/important differentiator is for Nile. I either missed the key points in the presentation or they need to hone their message of how they are different, unique, and valuable for a customer. It just wasn't clear to me why I would want them versus any other product solution set out there right now. It should be the first, second, and third thing they talk about. I'm not even sure it was mentioned specifically.

I will keep an eye on Nile and what they are doing, but honestly, just like with Meraki, I won't take them seriously until they can work with IPv6 as a fully supported networking protocol.


 - Ed

In a spirit of fairness (and also because it is legally required by the FTC), I am posting this Disclosure Statement. It is intended to alert readers to funding or gifts that might influence my writing. My participation in Tech Field Day events was voluntary and I was invited to participate in NFD32. Tech Field Day is hosted by Gestalt IT and my hotel, transportation, food and beverage was/is paid for by Gestalt IT for the duration of the event, if travel was involved. In addition, small swag gifts or donations were/are provided by some of the sponsors of the event to delegates (I didn't accept the swag gifts offered). It should be noted that there was/is no requirement to produce content about the sponsors and any content produced does not require review or editing by Gestalt IT or the sponsors of the event. So all the spelling mistakes and grammar errors are my own along with the ideas and thoughts.

Tuesday, August 01, 2023

Broadcom's AI Networking Solutions at Networking Field Day 32

Broadcom presented at Networking Field Day 32 on July 26, 2023, covering their AI Networking solutions. These are products and architectures that address the needs of those building out AI-focused data center networks. Obviously the design will work for regular data center workloads too, albeit suboptimally, because the design is focused on addressing AI workloads and not a more general workload. The attributes that Broadcom defines for what makes an AI Network unique are:

  • Fewer flows (low entropy)
  • High bandwidth flows (elephant flows due to the large amount of data sets being moved around)
  • Synchronized and bursty traffic
  • Links are saturated in microseconds (<<RTT)
  • Training jobs run for long periods of time (hours/days)
  • Tail latency impacts job completion time significantly
And they shared some interesting info about how "time spent in network" is impacted by:

  • Transient oversubscription
  • Flow collisions and link failures
  • Incast - many GPUs send to one or a few GPU(s)
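To make the "fewer flows (low entropy)" and "flow collisions" points a bit more concrete, here is a minimal Python sketch. It assumes a fabric that picks a link per flow by hashing the flow's 5-tuple (a stand-in for ECMP-style load balancing, which is an assumption here and not something taken from the presentation). The function name `place_flows`, the addresses, the port numbers, and the 400 Gb/s figure are all hypothetical.

```python
import hashlib
from collections import Counter


def place_flows(flows, link_count):
    """Hash each flow's 5-tuple onto one of link_count links and sum the load.

    This is a simplified stand-in for per-flow hashing, not any vendor's
    actual algorithm.
    """
    load = Counter()
    for five_tuple, gbps in flows:
        digest = hashlib.sha256(repr(five_tuple).encode()).digest()
        link = int.from_bytes(digest[:4], "big") % link_count
        load[link] += gbps
    return [load[i] for i in range(link_count)]


# Hypothetical example: 4 elephant flows of 400 Gb/s each over 8 equal links.
elephants = [
    ((f"10.0.0.{i}", "10.0.1.1", 50000 + i, 4791, "UDP"), 400)
    for i in range(4)
]
loads = place_flows(elephants, 8)
print(loads)
```

With only four flows there is no way to use more than four of the eight links, and any two flows that hash to the same link collide and oversubscribe it while other links sit idle. Lots of small flows tend to even out; a handful of elephant flows do not.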
Broadcom says the solution is to build a Clos fabric that makes use of a receiver-based credit control process that can pace the senders accurately. This means it is impossible to oversubscribe the Clos fabric and therefore you can leverage techniques like packet spraying with receiver ordering. It is worth watching the presentation on YouTube to understand what they are doing and why. You can check that out here:


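As a rough illustration of why receiver-paced credits help with incast, here is a toy, time-stepped Python model. It is a sketch under simplifying assumptions and not Broadcom's actual mechanism: the function `simulate_incast`, its parameters, and the idea of dropping packets when a fixed-size queue overflows are all hypothetical choices made for the example.

```python
def simulate_incast(senders, packets_each, drain_rate, queue_limit, use_credits):
    """Toy incast model: many senders target one receiver-facing port.

    Each tick the port can drain drain_rate packets. Without credits every
    sender with data transmits one packet per tick. With credits the receiver
    grants only drain_rate transmissions per tick, round-robin.
    """
    backlog = [packets_each] * senders  # packets each sender still has to send
    total = senders * packets_each
    queue = drops = delivered = max_queue = ticks = 0
    next_grant = 0  # round-robin pointer for credit grants

    while delivered + drops < total:
        ticks += 1
        arrivals = 0
        if use_credits:
            credits = drain_rate
            scanned = 0  # consecutive senders seen with nothing to send
            while credits > 0 and scanned < senders:
                s = next_grant
                next_grant = (next_grant + 1) % senders
                if backlog[s] > 0:
                    backlog[s] -= 1
                    arrivals += 1
                    credits -= 1
                    scanned = 0
                else:
                    scanned += 1
        else:
            for s in range(senders):
                if backlog[s] > 0:
                    backlog[s] -= 1
                    arrivals += 1

        # Anything that does not fit in the queue is dropped (assumption).
        accepted = min(arrivals, queue_limit - queue)
        drops += arrivals - accepted
        queue += accepted
        max_queue = max(max_queue, queue)

        drained = min(queue, drain_rate)
        queue -= drained
        delivered += drained

    return {"ticks": ticks, "drops": drops, "max_queue": max_queue}


uncontrolled = simulate_incast(8, 10, drain_rate=2, queue_limit=4, use_credits=False)
paced = simulate_incast(8, 10, drain_rate=2, queue_limit=4, use_credits=True)
print("no credits:", uncontrolled)
print("credits:   ", paced)
```

In the uncontrolled run the queue fills immediately and most packets are lost, which in a real network means retransmissions and long tail latency. In the credit-paced run the receiver never asks for more than it can drain, so the queue never grows past one tick's worth of traffic and nothing is dropped.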

There are specific videos on the Tomahawk AI Interconnect here:



And also on Jericho3 AI here:


And their wrap up on AI/ML Data Center Fabric solutions can be found here:


My quick thoughts on what Broadcom presented:
I wasn't aware (more likely I haven't been paying attention to what is happening in AI/ML like I should be) that there was this much specific network design work going into addressing AI workloads. While I understand there are a lot of AI/ML projects, I wasn't aware that so many private firms might want this solution architecture for their own needs versus running on leased cloud models.

Clearly there is a pricing advantage to running stuff at scale on your own hardware (in terms of reduced network data ingress/egress costs, compute cycles, and having dedicated GPU access) otherwise Broadcom wouldn't be building these sorts of solutions. It seems most of the large scale cloud providers have built something similar on their own or have requested that Broadcom address a gap in what traditional Ethernet fabrics can provide.

What will be interesting to me is if this is a short term industry change to address a narrow vertical or if this will become the new default Ethernet fabric architecture because AI/ML workloads will become commonplace DC workloads. I'm not convinced it will go that way; perhaps we end up with a hybrid of specific AI/ML Ethernet fabrics that are L3 connected to traditional DC focused Ethernet fabrics to attempt to give an organization the best of both worlds.

You can also get Drew Conry-Murray's thoughts on Broadcom's presentation over at his Packet Pushers blog post.

 - Ed

In a spirit of fairness (and also because it is legally required by the FTC), I am posting this Disclosure Statement. It is intended to alert readers to funding or gifts that might influence my writing. My participation in Tech Field Day events was voluntary and I was invited to participate in NFD32. Tech Field Day is hosted by Gestalt IT and my hotel, transportation, food and beverage was/is paid for by Gestalt IT for the duration of the event, if travel was involved. In addition, small swag gifts or donations were/are provided by some of the sponsors of the event to delegates (I didn't accept the swag gifts offered). It should be noted that there was/is no requirement to produce content about the sponsors and any content produced does not require review or editing by Gestalt IT or the sponsors of the event. So all the spelling mistakes and grammar errors are my own along with the ideas and thoughts.

Wednesday, June 21, 2023

IPv6-only has become a thing

Outside of posting content around Tech Field Day events I occasionally participate in, my blog hasn't seen a lot of activity. Mainly because I have been posting content over at the Infoblox IPv6 Center of Excellence or via the IPv6 Buzz Podcast. I recommend you check both those out, it isn't just me generating that content but also Scott Hogg, Tom Coffeen, Cody Christman, Tim Martin, and other great IPv6 content creators.

I did want to highlight one observation I have made starting at the end of 2022 and continuing throughout 2023, and that is the noticeable increase of discussions around IPv6-only. In the past, IPv6-only was a smaller corner case for many organizations, as everyone thought the natural progression for IPv6 adoption (IPv6 transition) was to move to dual-stack first, figure out the deployment and operational issues, and then shut off IPv4. The problem with this workflow was twofold. First, it is difficult to determine operational issues in dual-stack networks, as Happy Eyeballs hides many of those issues from you. And second, no one actually turns off IPv4, which defeats the whole purpose of adopting IPv6 for the long run.
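One way to keep Happy Eyeballs from masking a broken IPv6 path is to test each address family on its own instead of letting the client quietly fall back. Here is a minimal Python sketch of that idea using only the standard `socket` module. The function names `probe_family` and `probe_both`, the default port, and the timeout are hypothetical choices for the example, and the `resolver` and `connect` parameters exist only so the logic can be exercised without a live network.

```python
import socket


def probe_family(host, port, family, timeout=3.0,
                 resolver=socket.getaddrinfo,
                 connect=socket.create_connection):
    """Try a TCP connect to host using only one address family.

    Returns (True, address) on success or (False, reason) on failure.
    """
    try:
        infos = resolver(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, f"no address: {exc}"

    last_error = "no addresses returned"
    for _family, _type, _proto, _canon, sockaddr in infos:
        try:
            # sockaddr[:2] is (address, port) for both IPv4 and IPv6.
            conn = connect(sockaddr[:2], timeout)
            conn.close()
            return True, sockaddr[0]
        except OSError as exc:
            last_error = str(exc)
    return False, last_error


def probe_both(host, port=443, **kwargs):
    """Report IPv6 and IPv4 reachability separately, with no fallback."""
    return {
        "IPv6": probe_family(host, port, socket.AF_INET6, **kwargs),
        "IPv4": probe_family(host, port, socket.AF_INET, **kwargs),
    }
```

Calling `probe_both("www.example.com")` from a dual-stack host returns one result per family, so an IPv6 failure shows up as a failure instead of being hidden behind a working IPv4 connection.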

There is no denying that part of the reason for so much IPv6-only conversation is the IPv6 transition requirement that OMB published regarding moving to IPv6-only. Deploying dual-stack doesn't help an organization meet the requirements defined in the memo, which leaves these departments and agencies to figure out how to do IPv6-only. There are also now more Fortune 500 and Forbes Global 2000 companies who are having significant IPv4 address supply issues and realizing the only way to get around those is to either buy more costly public IPv4 address space or adopt IPv6-only to slow the burn rate of IPv4 usage.

Every organization has its own unique business and technical requirements. IPv6-only may only address problems in one of those problem spaces. But it is a tool that more organizations are realizing they should have in their tool belt, and they can get wins in both business and technical requirements. For example, IPv6-only makes mergers and acquisitions much easier to perform, as making use of GUA space guarantees uniqueness of addresses, meaning it is a simple routing and peering problem to integrate the networks and not a NAT or re-addressing project that might take years to complete. It also means the timeframe to perform a merger or to integrate an acquired company drops dramatically. This has a profound impact on the financial structure of the deal, which is something that should not be overlooked.
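The merger point is easy to see with a quick overlap check. The Python sketch below uses the standard `ipaddress` module to compare the prefixes of two companies. The helper name `find_overlaps` and all of the prefixes are hypothetical; the IPv6 prefixes come from the 2001:db8::/32 documentation range and are only standing in for real, separately assigned GUA allocations.

```python
import ipaddress


def find_overlaps(company_a, company_b):
    """Return (a, b) prefix pairs where the two companies' address space collides."""
    nets_a = [ipaddress.ip_network(p) for p in company_a]
    nets_b = [ipaddress.ip_network(p) for p in company_b]
    return [
        (str(a), str(b))
        for a in nets_a
        for b in nets_b
        if a.version == b.version and a.overlaps(b)
    ]


# Both companies picked from the same private IPv4 ranges (hypothetical).
ipv4_conflicts = find_overlaps(
    ["10.0.0.0/16", "192.168.1.0/24"],
    ["10.0.0.0/20", "192.168.1.0/24"],
)

# Each company has its own assigned IPv6 prefix (documentation prefixes here).
ipv6_conflicts = find_overlaps(["2001:db8:a::/48"], ["2001:db8:b::/48"])

print("IPv4 conflicts:", ipv4_conflicts)
print("IPv6 conflicts:", ipv6_conflicts)
```

Every entry in the IPv4 list is a NAT or renumbering project waiting to happen. The IPv6 list comes back empty, so integrating the two networks is down to routing and peering.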

I think it is important for people to realize there are not a lot of people with industry experience deploying IPv6-only networks. So, be cautious when talking with vendors, consultants, and industry peers about what to do. Very few people have the experience and design skills to navigate everything that goes into making IPv6-only a reality. I have interacted with a lot of vendors and consultants recently who claim they can do it, but the only IPv6 they have deployed is dual-stack and they have serious gaps in their knowledge and in the solutions that will actually work. So, do your homework and buyer beware. I will try and post some IPv6-only resources as I run across them (or just write them myself!)

- Ed