Hi friends, welcome back to the channel. Last week I conducted a webinar for tech investors, and
there was a lot of interest in memory technology. But let me just tell you straight: memory is
dead. That's a fact, and it will affect the entire industry. This is a huge problem for NVIDIA,
Intel, Apple, AMD and pretty much every chip maker and startup, and it's really bad news because
memory is essential for every CPU, every GPU and every SoC. Now for the good news: there is a
brand new memory technology which may solve this problem, and it brings us one step closer to a
major boost in speed and a reduction in cost.
It's super interesting, let me explain.

Ever since transistors were invented in 1947, we've always found ways to shrink them down year
after year, and we did a great job here: the first transistors were in the range of centimeters,
then micrometers, and now we have progressed to mass-producing chips at 3 and 4 nanometers. For
example, the Qualcomm Snapdragon X Elite chip is built in a 4-nanometer process node and the
latest M4 chip is in 3 nm, and recently TSMC announced that they would be ramping up the
production of chips in the new 1.6 nm process node. And you know what? For a long time memory was
also following this beautiful trend, but not anymore. It's over now, and this is a huge problem
for the entire semiconductor industry. Here I'm talking about
the fastest, so-called cache memory.

Over the past 60 years, SRAM has been the memory of choice for applications where we need speed
and fast access times. A typical SRAM cell consists of a latch, usually built from four to six
transistors: it's basically two inverters connected back to back, with the idea that one keeps
the level of the other alive, and this architecture differentiates it from DRAM. We love SRAM
because it tends to perform better and also drains less power, especially when it's idle. It's
the highest-performing memory, and it's integrated directly alongside the processing cores, so it
actually stores the data very close to them. Here we still use a gigahertz-range clock, so we can
access this data in the range of 250-500 ps. You see, this memory is essential. Here I wanted to
make a memory joke, but I don't remember which one XD. And now it starts to get even worse,
because the general trend is
that the amount of memory per chip is constantly increasing. If we look at all the recent chips
developed by Intel, AMD, Nvidia and Apple, all of them are adding more and more memory to their
chips. For example, Nvidia is adding more and more cache into each of their new GPUs, and they're
making more and more cash XD. The problem is that cache memory doesn't scale as well as logic,
so it keeps eating up larger and larger parts of the chip, and this is really catastrophic for
the future.

Now let's try to understand what exactly goes wrong here. In comparison to all other types
of memory, SRAM is a part of the chip die itself, and it's fabricated in the same process node
as the chip logic. The chip logic has more or less followed Moore's law, giving us approximately
two times the transistor density with each process node at the same price. Unfortunately, memory
cells don't scale at the same rate. At some point it was a factor of 1.8 scaling, then 1.6, then
1.4, and with each process node this number has gotten lower and lower, until the point when TSMC
announced their N3 process node. At this point it became crystal clear that SRAM scaling is now
officially dead: the N3 node actually delivered a factor of 1.7 in logic transistor scaling and a
factor of 1.0 in SRAM scaling. This means SRAM cells stayed exactly the same size. As we can see
from this chart, the bit cell has an area of 0.021 µm², which is exactly the same size as in
their N5 node. And what's really sad: with their N3B process node, it scaled by just an amazing 5%.
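
To get a feel for what a 1.7x logic shrink combined with a 1.0x SRAM shrink does to a die, here
is a minimal Python sketch. The function name `sram_share_after_shrink` and the 18% starting
share are illustrative assumptions, not TSMC figures. The model simply divides each region's
area by its scaling factor and ignores everything else on the chip (analog, I/O, layout overhead).

```python
def sram_share_after_shrink(sram_share, logic_scaling, sram_scaling):
    """Return SRAM's share of die area after moving to a new process node.

    sram_share:    fraction of the original die occupied by SRAM (0..1)
    logic_scaling: density gain for logic (logic area is divided by this)
    sram_scaling:  density gain for SRAM (SRAM area is divided by this)
    """
    new_sram_area = sram_share / sram_scaling
    new_logic_area = (1.0 - sram_share) / logic_scaling
    return new_sram_area / (new_sram_area + new_logic_area)


# Assumed starting point: 18% of the die is SRAM (illustrative only).
# Scaling factors are the ones quoted for the N3 node: 1.7x logic, 1.0x SRAM.
share = sram_share_after_shrink(0.18, logic_scaling=1.7, sram_scaling=1.0)
print(f"SRAM share after one node step: {share:.1%}")
```

With these inputs the SRAM share grows from 18% to roughly 27% in a single node step, even
though no memory was added to the design.
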
And this is actually not just TSMC's problem, because Intel, Samsung, GlobalFoundries, everyone
is facing the same challenges. To illustrate how bad things actually are, imagine a hypothetical
TSMC chip in 16 nm, and let's assume that 18% of the chip area is dedicated to SRAM. If we
fabricate the same chip, the same design, in the N3 process node, the SRAM would now occupy more
than 30% of the chip's die. That's really bad. But now let's understand why this scaling doesn't
work. Simply
put, these memory cells are very special. Despite the fact that they are constructed from
transistors, they have a unique structure that does not conform to the normal logic design
rules. For each new process node, the cell must be redesigned using special rules developed by
the foundries. It's a highly sensitive device which is very vulnerable to manufacturing process
variations, for example to variations in the cell's threshold voltage or to dopant fluctuations,
and any such variation can render the SRAM cell unstable and unreliable, affecting overall yield.
This situation will not improve, and most likely will get even worse, as we now transition from
FinFET transistors to the new gate-all-around transistor architecture, because now we have to
replace the fins with nanosheets, and this introduces many new technical challenges. So I would
say at this point in time it's inevitable that with each new process node SRAM consumes more chip
area, driving up costs.

From my experience in chip design, I can tell you that area is everything, and engineers are
ready to go the extra mile to save every single square micrometer of area, because area = money.
You know, whenever we fabricate chips we pay a price per area. And the major issue here is that
we actually can't do without SRAM. We cannot survive without enough SRAM, because if a processor
core doesn't have enough SRAM, it has to retrieve data from further away, and this consumes more
power and also slows down performance. Let me know your thoughts on this problem in the comments.

Before we jump into the new memory technology that may solve
this problem, I want to show you something very exciting. This is the brand new ASUS Vivobook S
15, and I was very excited to try it because it's their first Copilot+ PC laptop, which is based
on the new Qualcomm Snapdragon X Elite chip in 4 nm. It has a dedicated NPU, a neural processing
unit, and it's capable of 45 TOPS. With this new chip they're bringing generative AI capabilities
to the laptop. First of all, it has a Copilot key, which I find very handy: you can instantly
access the AI assistant, which can answer your questions, generate images and create
presentations for you. This is definitely the future of tech! Here they've also introduced
AI-powered applications to simplify your daily routine, starting from StoryCube for organizing
your large multimedia library to ASUS Adaptive Lock, which keeps your space safe by locking when
you're away. It also features Cocreator, an AI tool that allows you to draw images in Paint
and then enhance them with AI. I've been using the ASUS Vivobook S 15 for more than two weeks
now, and what I can say is that, apart from its sleek design, it's a really capable laptop: you
can run LLMs with up to 13 billion parameters on the device. The specs on this laptop are great.
As mentioned, it has a Snapdragon X Elite chip, which is an Arm-based chip, paired with 16 GB of
RAM, a gorgeous 3K OLED display and a 1-terabyte SSD, and it has tons of ports, like HDMI, two
USB-C ports, a microSD card reader and two USB-A ports. I've taken it with me everywhere I go and
I'm really enjoying it. You get up to 18 hours of battery life, so I could even take it on my
next transatlantic flight and work nonstop. So it's a really beautiful piece of hardware! Make
sure to check it out using the link below.

So when we realized that we cannot scale SRAM any further, we found another
option: chiplets, simply putting memory right on top of the cores. It was a huge deal back in
2022 when AMD introduced their V-Cache technology, and this was huge because with it we managed
to add much more cache memory. Here we must give a lot of credit to AMD for their forward
thinking, and of course some of the credit also goes to TSMC, because AMD used TSMC's 3D SoIC,
so-called System on Integrated Chips, packaging technology to make this work. They basically
stacked an additional 64 MB of L3 cache right on top of the CPU die, and this additional cache
gave a huge performance boost to many applications, including gaming.

In general, let's agree on one thing: this idea
of stacking one thing on top of another is brilliant, because it gives you an opportunity to
mix and match different dies, or different chiplets, in different process nodes. For example,
you can build a chip in the most advanced process node for the logic, and on top you stack a
memory die in one of the older process nodes. In this case we can benefit from the speed,
transistor density and power improvements in the core logic, and then use some bigger memory on
top. This memory can even be in an older process node, which means it will be more reliable and
also much, much cheaper. We see this approach adopted by more and more companies, not only for
the cache but also for other types of memory, as well as for other blocks, for analog circuits
for example, because analog circuits also don't scale well, at least not as well as digital. In
this way we can combine different chiplets manufactured in different technologies, let's say at
3 nm and at 16 nm, and build a chip in the most efficient way possible.

The truth is that chiplets are great, and they may help us to reduce costs and add more cache,
but this is not really solving the problem.
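
The cost argument for stacking can be sketched in a few lines of Python. All numbers here are
made-up placeholders: the per-mm² prices, the die areas, the area growth factor for the older
node and the function names are assumptions for illustration only. The model also ignores
yield, packaging and bonding costs, which matter a lot in practice.

```python
def monolithic_cost(logic_mm2, sram_mm2, advanced_price_per_mm2):
    """Logic and SRAM are both fabricated in the advanced node."""
    return (logic_mm2 + sram_mm2) * advanced_price_per_mm2


def stacked_cost(logic_mm2, sram_mm2, advanced_price_per_mm2,
                 older_price_per_mm2, older_node_area_factor=1.0):
    """Logic stays in the advanced node; SRAM moves to a die in an older node.

    older_node_area_factor models the SRAM taking more area in the older node.
    """
    sram_cost = sram_mm2 * older_node_area_factor * older_price_per_mm2
    return logic_mm2 * advanced_price_per_mm2 + sram_cost


# Hypothetical inputs: 70 mm² of logic and 30 mm² of SRAM, an advanced node
# priced at 4 cost units per mm², an older node at 1 unit per mm², and the
# SRAM assumed to need twice the area in the older node.
mono = monolithic_cost(70, 30, advanced_price_per_mm2=4.0)
stacked = stacked_cost(70, 30, advanced_price_per_mm2=4.0,
                       older_price_per_mm2=1.0, older_node_area_factor=2.0)
print(f"monolithic: {mono:.0f}, stacked: {stacked:.0f} (arbitrary cost units)")
```

Even with the SRAM die doubling in area, the stacked version comes out cheaper under these
placeholder prices. The conclusion flips if the price gap between the nodes is small or the
stacking overhead is large.
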
That's why, for a long time now, the industry has been looking for an alternative to current
memory technology. There are several emerging memory technologies, including magnetic RAM,
ferroelectric RAM, resistive RAM, phase-change memory and others. I used to work with ReRAM,
resistive RAM, in the past. What is so interesting about all these memory flavours is that each
of them has its pros and cons. For example, some of them are more optimised for area or speed,
some for power, and some have faster access rates and higher bandwidth than others. On top of
that, we know that each CPU or GPU features several types of memory, and each of them has
different requirements. The main
differences are in their data storage mechanism and their speed. Speed, as we know, is essential
for SRAM, because SRAM is the fastest, with access times in the range of a few nanoseconds, and
it has to be low power. DRAM of course is slower: it's based on just one transistor and a
capacitor, and here the access time is in the range of tens of nanoseconds. And we all know flash
memory: flash memory is the slowest, it takes microseconds to read data from it. Then we also
have to distinguish between volatile and non-volatile memory. For example, SRAM and DRAM are
volatile memory, which means they only retain data while power is supplied. So now it's clear
that when it comes to SRAM, the most critical things are latency, area and power consumption.

So in this new
paper published in Nature, researchers at Stanford have developed a new PCM, so phase-change
memory, material called GST467, which uses chalcogenides in a superlattice structure, and it's a
great one in terms of the properties we are looking for. How does this phase-change memory work?
It's basically a memory cell that consists of a glass material sandwiched between two
electrodes, and when we apply a high current pulse to it, it switches between the crystalline
and amorphous states. The crystalline state represents a digital one and the amorphous state a
zero, and then to read the data we simply measure the resistance of this memory cell. This new
memory technology has
really high potential because it checks all our boxes. First of all, it has a very fast access
time, in the range of a few nanoseconds. Second, it works at a low operating voltage, so it's
compatible with modern processors. And then, according to the paper, it has the smallest
dimensions to date, 0.016 square micrometers, so it's actually denser than, for example, TSMC's
SRAM cells in the 3 nm process node. I've actually made some back-of-the-envelope calculations,
and from the area point of view it's about 23% more area efficient, and this is brilliant. The
last important thing we always have to check with a new technology is scalability, and in this
case these PCM cells are compatible with the CMOS manufacturing process. So it seems it could be
a contender for the ultimate memory. For example, it could be used for the L3 cache in a
configuration with 3D stacking. Let me know your thoughts about it in the comments, and consider
sharing this video with your friends or colleagues
who might be interested.

Another important point here: apart from the fact that this memory technology is very dense,
it's also a non-volatile memory, which may be a big plus for many applications. On top of that,
it has this unique property where each memory cell can store multiple bits. This multi-level
memory sounds like a dream, like a shortcut, you know, but not in this particular case. Here we
must consider that this property makes sense for analog in-memory computing applications only. I
will not go too much into the details here, because I have many older videos explaining how this
technology works. I just want to mention that we can't use this multi-level storage property
when we talk about cache, because then, instead of a single comparator, we need to use ADCs, and
a lot of ADCs, and this will explode power consumption as well as latency, which is so important
here.

Now to the challenges. Of course, this is brand new research, and many technical challenges
remain before it can reach widespread adoption. One of the challenges is integrating it into the
current CMOS manufacturing flow, then reducing the programming current for the cells, and
eventually improving reliability. But it's great progress.
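
Two of the points from the PCM discussion can be checked with quick arithmetic. The following
Python sketch uses the two cell areas quoted in the video (0.021 µm² for the SRAM bit cell and
0.016 µm² for the PCM cell). As a separate, generic assumption, it also counts how many decision
thresholds a read circuit needs to tell apart the levels of a multi-bit cell. The function names
are hypothetical, and the threshold count is just the general `2**n - 1` rule for distinguishing
`2**n` levels, not a description of the circuit in the paper.

```python
SRAM_CELL_UM2 = 0.021  # SRAM bit cell area quoted in the video
PCM_CELL_UM2 = 0.016   # PCM cell area quoted in the video


def area_saving(reference_um2, candidate_um2):
    """Fraction of area saved per cell when switching to the candidate cell."""
    return 1.0 - candidate_um2 / reference_um2


def thresholds_needed(bits_per_cell):
    """Decision thresholds needed to distinguish 2**bits resistance levels."""
    return 2 ** bits_per_cell - 1


saving = area_saving(SRAM_CELL_UM2, PCM_CELL_UM2)
print(f"PCM cell area saving versus the SRAM cell: {saving:.1%}")

for bits in (1, 2, 3, 4):
    print(f"{bits} bit(s) per cell -> {thresholds_needed(bits)} threshold(s)")
```

The area saving lands in the low-to-mid twenties of percent, in line with the back-of-the-envelope
figure mentioned earlier. One bit per cell needs a single threshold, which is the
single-comparator case; each extra bit roughly doubles the number of levels to resolve, which is
why the read path turns into an ADC.
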
So what I think is: there is a clear ongoing trend that we are putting more and more memory
into chips, and this is becoming even more critical with the rise of AI applications. Despite
the fact that SRAM is quite an old technology, it became the workhorse memory for AI. In fact,
many AI chips rely on SRAM placed close to the cores, and exactly these types of chip
architectures suffer greatly from this problem. I believe that SRAM technology as we know it
won't go anywhere for L1 and L2 caches, at least for the next couple of decades, so we will see
it consuming more and more chip area and money. Afterward, we will see new memory technologies
coming first to DRAM and then to L3 cache, as soon as they can achieve fast enough access times,
in the range of a few nanoseconds. Let me know your thoughts in the comments.

Another way to address
this issue is to look for different options for how to bypass the cache. Of course, this won't
work for a CPU, for example, but for some applications it might be effective. For instance, in
AI training the training data is only used once, while the parameters should be accessible on
chip, so here we could find some tricks to operate without this classic cache memory as we know
it. I think the current dead-end situation with SRAM will compel us to work on further
innovations, and it's going to be exciting, very exciting to follow. So I hope you will stay
with this channel to stay up to date with the most important and critical trends in microchip
technology.

Now, to support the channel, check out the new ASUS Vivobook S 15 with the link below or by
scanning the code here. Thank you so much for your support and for watching, and I will see you
very soon in the next episode. Ciao!