Memory Technology and Its Impact on the Industry

Hi friends, welcome back to the channel. Last week I conducted a webinar for tech investors, and there was a lot of interest in memory technology. But let me just tell you straight: memory is dead. That's a fact, and it will affect the entire industry. This is a huge problem for NVIDIA, Intel, Apple, AMD and pretty much every chipmaker and startup, and it's really bad news, because memory is essential for every CPU, every GPU and every SoC. Now for the good news: there is a brand new memory technology which may solve this problem, and it brings us one step closer to a massive boost in speed and a reduction in cost. It's super interesting, so let me explain.
Ever since transistors were invented in 1947, we've always found ways to shrink them down year after year, and we did a great job here: the first transistors were in the range of centimeters, then micrometers, and now we are mass-producing chips at 3 and 4 nanometers. For example, the Qualcomm Snapdragon X Elite chip is built in a 4 nm process node, the latest Apple M4 chip is in 3 nm, and TSMC recently announced that it will be ramping up production in its new 1.6 nm process node. And you know what? For a long time memory was also following this beautiful trend, but not anymore. It's over now, and this is a huge problem for the entire semiconductor industry.
Here I'm talking about the fastest memory, so-called cache memory. Over the past 60 years, SRAM has been the memory of choice for applications where we need speed and fast access times. A typical SRAM cell consists of a latch built from four to six transistors, most commonly six: basically two inverters connected back to back, so that each one keeps the other's level alive, plus access transistors for reading and writing. This architecture is what differentiates it from DRAM. We love SRAM because it tends to perform better and also drains less power, especially when it's idle, since unlike DRAM it doesn't need to be refreshed. It's the highest-performing memory, and it's integrated directly alongside the processing cores, so it stores the data very close to where it's processed. Here we still use a gigahertz-range clock, so we can access this data in the range of 250-500 ps. You see, this memory is essential. (And here I wanted to make a memory joke, but I don't remember which one XD.)
And now it gets even worse, because the general trend is that the amount of memory per chip keeps increasing. If we look at all the recent chips developed by Intel, AMD, Nvidia and Apple, all of them are adding more and more memory. For example, Nvidia is adding more and more cache to each new GPU, and they're making more and more cash XD. The problem is that cache memory doesn't scale as well as logic, so it keeps eating up larger and larger parts of the chip, and this is really catastrophic for the future.
Now let's try to understand what exactly goes wrong here. Unlike DRAM and flash, which usually sit on separate chips, SRAM is part of the chip die itself, and it's fabricated in the same process node as the chip logic. The logic has more or less followed Moore's law, giving us approximately twice the transistor density with each process node at the same price. Unfortunately, memory cells don't scale at the same rate. At some point SRAM density scaled by a factor of 1.8 per node, then 1.6, then 1.4, and with each process node this number got lower and lower, until TSMC announced its N3 process node. At this point it became crystal clear that SRAM scaling is officially dead: the N3 generation delivered about 1.7x logic transistor scaling but essentially 1.0x SRAM scaling, which means the SRAM cells stayed the same size. As we can see from this chart, in N3E the bit cell has an area of 0.021 µm², exactly the same as in TSMC's N5 node, and what's really sad is that even the original N3 node (N3B) shrank it by a meager 5% or so. And this is not just TSMC's problem: Intel, Samsung, GlobalFoundries, everyone is facing the same challenges.
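The claim that cache keeps eating up a larger share of the chip comes down to simple area arithmetic. Below is a minimal Python sketch, assuming a die splits cleanly into an SRAM part and a logic part and that each part shrinks by its own density factor when the design is ported to a new node. The function names and the 18% starting share are illustrative assumptions; only the 1.7x logic / 1.0x SRAM ratios and the 0.021 µm² bit cell come from the figures quoted above. The second helper counts raw bit-cell area only, ignoring sense amplifiers, decoders and redundancy, so a real SRAM macro would be noticeably larger.

```python
# Toy model of why SRAM takes a growing share of the die when logic shrinks
# faster than SRAM. Assumption: the die is an SRAM part plus a logic part,
# and each part's area is divided by its own density gain on the new node.


def sram_fraction_after_port(sram_fraction: float,
                             logic_scale: float,
                             sram_scale: float) -> float:
    """SRAM share of die area after porting a design to a new node.

    sram_fraction: SRAM share of the die on the old node (0..1)
    logic_scale:   logic density gain, e.g. 1.7 means 1.7x denser
    sram_scale:    SRAM bit-cell density gain, 1.0 means no shrink
    """
    sram_area = sram_fraction / sram_scale
    logic_area = (1.0 - sram_fraction) / logic_scale
    return sram_area / (sram_area + logic_area)


def bitcell_array_area_mm2(capacity_bytes: int, bitcell_um2: float) -> float:
    """Raw bit-cell area for a given capacity.

    Counts bit cells only; sense amplifiers, decoders and redundancy are
    ignored, so a real SRAM macro is larger than this.
    """
    bits = capacity_bytes * 8
    return bits * bitcell_um2 / 1e6  # 1 mm^2 = 1e6 um^2


if __name__ == "__main__":
    # Hypothetical die that is 18% SRAM, ported with 1.7x logic / 1.0x SRAM scaling.
    print(f"{sram_fraction_after_port(0.18, 1.7, 1.0):.3f}")  # 0.272
    # 64 MiB of cache at 0.021 um^2 per bit.
    print(f"{bitcell_array_area_mm2(64 * 2**20, 0.021):.2f} mm^2")  # 11.27 mm^2
```

With these inputs, a single port takes the hypothetical die from 18% to about 27% SRAM; compounding the same effect over several nodes is how a 16 nm design can end up above 30% on N3.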
To illustrate how bad things actually are, imagine a hypothetical TSMC chip in 16 nm where 18% of the die area is dedicated to SRAM. If we fabricate the same design in the N3 process node, the SRAM would occupy more than 30% of the die. That's really bad.
But now let's understand why this scaling doesn't work. Simply put, these memory cells are very special. Even though they are built from transistors, they have a unique structure that doesn't conform to the normal logic design rules, so for each new process node the cell has to be redesigned using special rules developed by the foundries. It's a highly sensitive device that is very vulnerable to manufacturing process variations, for example variations in the cell's threshold voltage or dopant fluctuations, and any such variation can make the SRAM cell unstable and unreliable, hurting overall yield. This situation won't improve; most likely it will get even worse as we transition from FinFET transistors to the new gate-all-around architecture, because we have to replace the fins with nanosheets, and this introduces many new technical challenges. So I would say that at this point it's inevitable that with each new process node, SRAM will consume more chip area and drive up costs.
From my experience in chip design, I can tell you that area is everything, and engineers are ready to go above and beyond, the extra mile, to save every single square micrometer, because area = money: whenever we fabricate chips, we pay a price per area. And the major issue here is that we can't do without SRAM. We cannot survive without enough SRAM, because if a processor core doesn't have enough of it, it has to retrieve data from further away, which consumes more power and slows down performance. Let me know your thoughts on this problem in the comments.
Before we jump into the new memory technology that may solve this problem, I want to show you something very exciting. This is the brand new ASUS Vivobook S 15, and I was very excited to try it because it's their first Copilot+ PC laptop, based on the new Qualcomm Snapdragon X Elite chip in 4 nm. It has a dedicated NPU (neural processing unit) capable of 45 TOPS, and with this new chip they're bringing generative AI capabilities to the laptop. First of all, it has a Copilot key, which I find very handy: you can instantly access the AI assistant, which can answer your questions, generate images and create presentations for you. This is definitely the future of tech! They've also introduced AI-powered applications to simplify your daily routine, from StoryCube for organizing your large multimedia library to ASUS Adaptive Lock, which keeps your space safe by locking when you're away. It also features Cocreator, an AI tool that lets you draw images in Paint and then enhance them with AI.
I've been using the ASUS Vivobook S 15 for more than two weeks now, and what I can say is that apart from its sleek design, it's a really capable laptop: you can run LLMs with up to 13 billion parameters on the device. The specs are great. As mentioned, it has the Snapdragon X Elite, an Arm-based chip, paired with 16 GB of RAM, a gorgeous 3K OLED display and a 1 TB SSD. It has tons of ports, like HDMI, two USB-C ports, a microSD card reader and two USB-A ports. I've taken it with me everywhere I go and I'm really enjoying it. You get up to 18 hours of battery life, so I could even take it on my next transatlantic flight and work nonstop. It's a really beautiful piece of hardware! Make sure to check it out using the link below.
So when we realized that we cannot scale SRAM any further, we found another option: chiplets, simply putting memory right on top of the cores. It was a huge deal back in 2022 when AMD introduced its 3D V-Cache technology, because with it we managed to add much more cache memory. Here we must give a lot of credit to AMD for its forward thinking, and of course some of the credit also goes to TSMC, because AMD used TSMC's SoIC (System on Integrated Chips) 3D packaging technology to make this work. They basically stacked an additional 64 MB of L3 cache right on top of the CPU die, and this additional cache gave a huge performance boost to many applications, including gaming.
In general, let's agree on one thing: this idea of stacking dies on top of each other is brilliant, because it gives you the opportunity to mix and match different dies, or chiplets, in different process nodes. For example, you can build the logic in the most advanced process node and stack a memory die on top in one of the older process nodes. In this case we benefit from the speed, transistor density and power improvements in the core logic, and then use a bigger memory on top, which can even be in an older process node, making it more reliable and also much, much cheaper. We see this approach adopted by more and more companies, not only for cache but also for other types of memory, as well as for other blocks, for example analog circuits, because analog circuits also don't scale very well, at least not as well as digital. This way we can combine chiplets manufactured in different technologies, let's say 3 nm and 16 nm, and build a chip in the most efficient way possible.
The truth is that chiplets are great, and they may help us reduce costs and add more cache, but they don't really solve the problem. That's why the industry has been looking for an alternative to current memory technology for a long time now. There are several emerging memory technologies, including magnetic RAM, ferroelectric RAM, resistive RAM (ReRAM), phase-change memory and others. I used to work with ReRAM in the past. What is so interesting about all these memory flavours is that each of them has its pros and cons: some are more optimized for area, some for speed, some for power, and some have faster access times and higher bandwidth than others. On top of that, each CPU or GPU features several types of memory, and each of them has different requirements. The main differences are in the data storage mechanism and in speed. Speed, as we know, is essential for SRAM: it's the fastest, with access times in the range of a few nanoseconds, and it has to be low power. DRAM, of course, is slower: it's based on just one transistor and one capacitor, and here the access time is in the range of tens of nanoseconds. And we all know flash memory, which is the slowest: it takes microseconds to read data from it. We also have to distinguish between volatile and non-volatile memory. SRAM and DRAM, for example, are volatile, which means they only retain data while power is supplied. So now it's clear that when it comes to SRAM, the most critical things are latency, area and power consumption.
In a new paper published in Nature Communications, researchers at Stanford have developed a new phase-change memory (PCM) material called GST467, a chalcogenide arranged in a superlattice structure, and it's a great one in terms of the properties we are looking for. How does phase-change memory work? It's basically a memory cell consisting of a glass-like material sandwiched between two electrodes. When we apply current pulses to it, it switches between the crystalline and amorphous states: the crystalline state represents a digital one and the amorphous state a zero. To read the data, we simply measure the resistance of the memory cell. This new memory technology has really high potential because it checks all our boxes. First, it has a very fast access time, in the range of a few nanoseconds. Second, it works at a low operating voltage, so it's compatible with modern processors. And according to the paper, it has the smallest dimensions to date, 0.016 µm², which is actually denser than, for example, TSMC's SRAM cells in the 3 nm process node. I made some back-of-the-envelope calculations, and from the area point of view it's about 23% more area efficient, which is brilliant. The last important thing we always have to check with a new technology is scalability, and in this case these PCM cells are compatible with the CMOS manufacturing process. So it seems it could be a contender for the ultimate memory; for example, it could be used for L3 cache in a 3D-stacked configuration. Let me know your thoughts about it in the comments, and consider sharing this video with friends or colleagues who might be interested.
Another important point: apart from being very dense, this memory is also non-volatile, which may be a big plus for many applications. On top of that, it has this unique property where each memory cell can store multiple bits. This multi-level storage sounds like a dream, like a shortcut, you know, but not in this particular case: this property only makes sense for analog in-memory computing applications. I won't go into too much detail here, because I have many older videos explaining how that technology works. I just want to mention that we can't use this multi-level storage property for cache, because then instead of a single comparator we'd need ADCs, a lot of ADCs, and this would explode both power consumption and latency, which is so important here.
Now to the challenges. Of course this is brand new research, and many technical challenges remain before it can reach widespread adoption: integrating it into the current CMOS manufacturing flow, reducing the programming current of the cells and, eventually, improving reliability. But it's great progress.
So what do I think? There is a clear ongoing trend that we are putting more and more memory into chips, and this is becoming even more critical with the rise of AI applications. Even though SRAM is quite an old technology, it has become the workhorse memory for AI; in fact, many AI chips rely on SRAM placed close to the cores, and exactly these types of chip architectures suffer greatly from this problem. I believe that SRAM as we know it isn't going anywhere for L1 and L2 caches, at least for the next couple of decades, so we will see it consuming more and more chip area and money. Afterwards, we will see new memory technologies coming first to DRAM and then to L3 cache, as soon as they can achieve fast enough access times, in the range of a few nanoseconds. Let me know your thoughts in the comments.
Another way to address this issue is to look for ways to bypass the cache. Of course this won't work for a CPU, for example, but for some applications it might be effective. For instance, in AI training the training data is only used once, while the parameters should stay accessible on chip, so here we could find some tricks to operate without classic cache memory as we know it. I think the current dead-end situation in SRAM will compel us to work on further innovations, and it's going to be very exciting to follow. So I hope you'll stay with this channel to stay up to date with the most important and critical trends in microchip technology. To support the channel, check out the new ASUS Vivobook S 15 with the link below or by scanning the code here. Thank you so much for your support and for watching, and I will see you very soon in the next episode. Ciao!
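Coming back to the phase-change memory discussed earlier: the binary read (one resistance threshold, one comparator) and the reason multi-level cells are a poor fit for cache can be sketched in a few lines of Python. This is a toy model. The resistance values and thresholds are hypothetical placeholders, not numbers from the Stanford paper, and the comparator count assumes a flash-style converter that resolves all levels in one step (other ADC architectures use fewer comparators but need several conversion steps, which costs latency instead). The last helper reproduces the back-of-the-envelope area comparison using the 0.016 µm² and 0.021 µm² cell sizes quoted in the video; it comes out at roughly 24%, in line with the "about 23%" estimate.

```python
import bisect
from typing import Sequence

# Hypothetical cell resistances in ohms, for illustration only.
R_CRYSTALLINE = 10e3     # low-resistance (crystalline) state, read as logic 1
R_AMORPHOUS = 1e6        # high-resistance (amorphous) state, read as logic 0
READ_THRESHOLD = 100e3   # single reference level between the two states


def read_bit(resistance_ohm: float, threshold_ohm: float = READ_THRESHOLD) -> int:
    """Binary PCM read: a single comparison against one reference resistance."""
    return 1 if resistance_ohm < threshold_ohm else 0


def read_level(resistance_ohm: float, thresholds_ohm: Sequence[float]) -> int:
    """Multi-level read: index of the resistance band the cell falls into.

    thresholds_ohm must be sorted ascending; n thresholds separate n + 1
    levels (0 = lowest resistance), and each threshold is one more comparison.
    """
    return bisect.bisect_right(thresholds_ohm, resistance_ohm)


def comparators_needed(bits_per_cell: int) -> int:
    """Comparators in a flash-style converter resolving 2**bits levels at once."""
    return 2 ** bits_per_cell - 1


def area_saving(new_cell_um2: float, old_cell_um2: float) -> float:
    """Fraction of area saved per bit when old_cell is replaced by new_cell."""
    return 1.0 - new_cell_um2 / old_cell_um2


if __name__ == "__main__":
    print(read_bit(R_CRYSTALLINE), read_bit(R_AMORPHOUS))  # 1 0
    for bits in (1, 2, 3):
        print(f"{bits} bit(s)/cell -> {comparators_needed(bits)} comparator(s)")
    print(f"{area_saving(0.016, 0.021):.1%}")  # 23.8%
```

The comparator count grows exponentially with bits per cell, which is the power and latency blow-up the video warns about for cache reads.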
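The cache-bypass idea for AI training mentioned at the end of the video can be illustrated with a toy simulation. The sketch below is our own illustration, not a design from the video: it models a small fully associative LRU cache that holds exactly as many lines as there are model parameters. Each training step re-reads every parameter and then streams a batch of samples that are never reused. If the streaming samples are allowed into the cache, they evict the parameters every step; if they bypass it, the parameters stay resident. Cache size, parameter count, batch size and step count are arbitrary assumptions.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache with least-recently-used replacement."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines: OrderedDict = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, key: tuple, bypass: bool = False) -> bool:
        """Look up key; on a miss, allocate a line unless bypass is set."""
        if key in self.lines:
            self.lines.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if not bypass:
            self.lines[key] = None
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used
        return False


def training_hit_rate(capacity: int, steps: int, num_params: int,
                      batch: int, bypass_streaming: bool) -> float:
    """Hit rate for a toy loop: reuse all parameters, then stream fresh samples."""
    cache = LRUCache(capacity)
    next_sample = 0
    for _step in range(steps):
        for p in range(num_params):      # parameters: reused every step
            cache.access(("param", p))
        for _ in range(batch):           # training samples: each used once
            cache.access(("sample", next_sample), bypass=bypass_streaming)
            next_sample += 1
    return cache.hits / (cache.hits + cache.misses)


if __name__ == "__main__":
    for bypass in (False, True):
        rate = training_hit_rate(capacity=8, steps=10, num_params=8, batch=8,
                                 bypass_streaming=bypass)
        print(f"bypass={bypass}: hit rate {rate:.2f}")
```

With these numbers the run without bypass never hits (0.00), while bypassing the single-use samples gives a 0.45 hit rate: every parameter access after the first step hits, and only the streamed samples miss.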
