AMD has filed a patent for something that everyone knew would eventually happen: an MCM GPU chiplet design. Spotted by LaFriteDavid on Twitter and published on Freepatents.com, the document shows how AMD plans to build a GPU chiplet graphics card that is eerily reminiscent of its MCM-based CPU designs. With NVIDIA working on its own MCM design with the Hopper architecture, it's about time we left monolithic GPU designs in the past and enabled truly exponential performance growth.
AMD patents GPU chiplet design for future graphics cards
The patent points out that MCM GPUs have not been attempted in the past because of the high latency between chiplets, the complexity of existing programming models, and the difficulty of implementing parallelism across dies. AMD's patent attempts to solve all of these problems using an on-package interconnect it calls the high bandwidth passive crosslink. This would enable each GPU chiplet to communicate with the CPU directly, as well as with the other chiplets via the passive crosslink. Each chiplet would also feature its own cache. The design appears to suggest that each GPU chiplet will be a GPU in its own right, fully addressable by the operating system.
There have been leaks in the past which suggested AMD is considering the move to an MCM design for its GPUs after RDNA3, and if NVIDIA’s Hopper does the same, then AMD would have very little choice but to do so as well. Intel has already achieved success using the MCM design methodology and demoed the first MCM based GPU quite a while back. One thing is for sure: things are about to get very interesting for GPU enthusiasts.
AMD has proven itself to be exceptionally good at creating MCM-based products. Its Zen-based CPUs were absolutely disruptive to the HEDT market space, single-handedly turning what was once an expensive 6-core affair into an affordable 32-core-plus proposition. The power of servers and Xeons was finally in the hands of average consumers, so why can't the same philosophy work for GPUs as well?
Well, theoretically speaking, it should work even better for GPUs, which are parallel devices, than for CPUs, which are serial devices. Not only that, but you are looking at massive yield gains just from shifting to an MCM-based approach instead of a monolithic die. A single huge die has abysmal yields, is expensive to produce, and usually results in high wastage. Multiple chips totaling the same die size would offer yield increases straight off the bat.
I took the liberty of doing some rough approximations using the lovely Silicon Edge tool and was not surprised to see instant yield gains. The Vega 64 has a die measuring 484mm², which equates to a die 22mm by 22mm. Splitting this monolithic die into four 11mm by 11mm chiplets gives you the same net surface area (484mm²) and will also result in yield gains. How much? Let's see. According to the approximation, a 200mm wafer should be able to produce 45 monolithic dies (22×22) or 202 smaller dies (11×11). Since we need four smaller dies to equal one monolithic part, we end up with 50 MCM packages of 484mm² each. That's a yield gain of 11% right there.
The yield gains are even larger for bigger chips. The upper limit of lithographic techniques (with reasonable yields) is roughly 625mm². On a single 200mm wafer, we can get about 33 of these (25×25) or 154 smaller dies (12.5×12.5). That gives us a total of 38 MCM-based parts, for a yield increase of 15%. Full disclosure: this is a very rough approximation and does not take into account several factors such as packaging yields and the added complexity of high-level design, but the basic idea holds up well. At the same time, it also does not account for the additional gains from reduced wastage – a faulty 625mm² monolithic die is much more wasteful than a single faulty 156mm² chiplet!
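The back-of-the-envelope math above can be reproduced in a few lines of Python. The sketch below uses the classic dies-per-wafer approximation (wafer area divided by die area, minus an edge-loss correction) rather than the Silicon Edge tool itself, so its absolute die counts come out somewhat different from the figures quoted above, but the direction and rough magnitude of the MCM yield gain are the same:

```python
import math

def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Classic approximation: usable dies = wafer area / die area,
    minus a correction term for partial dies lost at the wafer edge."""
    d, s = wafer_diameter_mm, die_area_mm2
    return int(math.pi * (d / 2) ** 2 / s - math.pi * d / math.sqrt(2 * s))

# Vega 64 scenario: one 484mm2 (22mm x 22mm) die vs four 121mm2 (11mm x 11mm) chiplets
mono = dies_per_wafer(200, 22 * 22)
mcm = dies_per_wafer(200, 11 * 11) // 4   # four chiplets make one GPU
print(mono, mcm, f"gain: {mcm / mono - 1:.0%}")

# 625mm2 scenario: one 25mm x 25mm die vs four 12.5mm x 12.5mm chiplets
mono_big = dies_per_wafer(200, 25 * 25)
mcm_big = dies_per_wafer(200, 12.5 * 12.5) // 4
print(mono_big, mcm_big, f"gain: {mcm_big / mono_big - 1:.0%}")
```

In both scenarios the four-chiplet split yields more complete GPUs per wafer than the monolithic die, which is the whole point of the exercise.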
Long story short, AMD is perfectly capable of creating an MCM-based GPU and would reap some serious yield benefits if it chooses to run with the idea in the future. NVIDIA is actively pursuing this path for the same reasons.