Transcript for:
Future Risks and Concerns with AGI: Lex Fridman Podcast with Roman Yampolskiy

If we create general superintelligences, I don't see a good outcome long-term for humanity. So there is x-risk, existential risk: everyone is dead. There is s-risk, suffering risk, where everyone wishes they were dead. We also have the idea of i-risk, ikigai risk, where we've lost our meaning: the systems can be more creative, they can do all the jobs, and it's not obvious what you have to contribute to a world where superintelligence exists. Of course you can have all the variants you mentioned, where we are safe, we are kept alive, but we are not in control, we are not deciding anything; we are like animals in a zoo. There are, again, possibilities we can come up with as very smart humans, and then possibilities something a thousand times smarter can come up with, for reasons we cannot comprehend.

The following is a conversation with Roman Yampolskiy, an AI safety and security researcher and author of a new book titled AI: Unexplainable, Unpredictable, Uncontrollable. He argues that there's almost a 100% chance that AGI will eventually destroy human civilization. As an aside, let me say that I will have many, often technical, conversations on the topic of AI, often with engineers building the state-of-the-art AI systems. I would say those folks put the infamous p(doom), the probability of AGI killing all humans, at around 1 to 20%, but it's also important to talk to folks who put that value at 70, 80, 90%, and, as in the case of Roman, at 99.99 and many more 9s percent. I'm personally excited for the future and believe it will be a good one, in part because of the amazing technological innovation we humans create, but we must absolutely not do so with blinders on, ignoring the possible risks, including existential risks, of those technologies. That's what this conversation is about. This is the Lex Fridman Podcast. To support it, please check out our sponsors in the description. And now, dear friends, here's Roman Yampolskiy.

What to you is the probability that superintelligent AI will destroy all human civilization?

What's the time frame?

Let's say 100 years, in the next 100 years.

So the problem of controlling AI, or superintelligence, in my opinion, is like the problem of creating a perpetual safety machine. By analogy with a perpetual motion machine, it's impossible. Yeah, we may succeed and do a good job with GPT-5, 6, 7, but they just keep improving, learning, eventually self-modifying, interacting with the environment, interacting with malevolent actors. The difference between cybersecurity, narrow AI safety, and safety for general AI, for superintelligence, is that we don't get a second chance. With cybersecurity, somebody hacks your account, what's the big deal? You get a new password, a new credit card, you move on. Here, if we're talking about existential risks, you only get one chance. So you're really asking me, what are the chances that we'll create the most complex software ever, on the first try, with zero bugs, and that it will continue to have zero bugs for 100 years or more?

So there is an incremental improvement of systems leading up to AGI. To you, it doesn't matter if we can keep those safe; there's going to be one level of system at which you cannot possibly control it.

I don't think we have so far made any system safe at the level of capability it displays. They already have made mistakes, we had accidents, they've been jailbroken. I don't think there is a single large language model today which no one was successful at making do something developers didn't intend it to do.

But there's a difference between getting it to do something unintended, getting it to do something that's painful, costly, destructive, and
something that's destructive to the level of hurting billions of people, or hundreds of millions of people, billions of people, or the entirety of human civilization. That's a big leap.

Exactly, but the systems we have today have the capability of causing x amount of damage, so when they fail, that's all we get. If we develop systems capable of impacting all of humanity, all of the universe, the damage is proportionate.

What to you are the possible ways that such a kind of mass murder of humans can happen?

It's always a wonderful question. So one of the chapters in my new book is about unpredictability. I argue that we cannot predict what a smarter system will do. So you're really not asking me how superintelligence will kill everyone; you're asking me how I would do it, and I think it's not that interesting. I can tell you about the standard ones, you know: nanotech, synthetic bio, nuclear. Superintelligence will come up with something completely new, completely superior; we may not even recognize that as a possible path to achieve that goal.

So there is, like, an unlimited level of creativity in terms of how humans could be killed. But, you know, we could still investigate possible ways of doing it, not how to do it, but at the end, what is the methodology that does it. You know, shutting off the power, and then humans start killing each other, maybe, because the resources are really constrained. Then there's the actual use of weapons, like nuclear weapons, or developing artificial pathogens, viruses, that kind of stuff. We could still kind of think through that and defend against it, right? There's a ceiling to the creativity of mass murder of humans here, right? The options are limited.

They are limited by how imaginative we are. If you are that much smarter, that much more creative, you are capable of thinking across multiple domains, doing novel research in physics and biology, you may not be limited by those tools. If squirrels were planning to kill humans, they would have a set of possible ways of doing it, but they would never consider things we can come up with.

So are you thinking about mass murder and destruction of human civilization, or are you thinking, with squirrels, you put them in a zoo and they don't really know they're in a zoo? If we just look at the entire set of undesirable trajectories, the majority of them are not going to be death; most of them are going to be things like Brave New World, where, you know, the squirrels are fed dopamine and they're all doing some kind of fun activity, and the fire, the soul of humanity, is lost because of the drug that's fed to it. Or, like, literally in a zoo: we're in a zoo, we're doing our thing, we're playing a game of Sims, and the actual players playing that game are AI systems. Those are all undesirable because sort of the free will, the fire of human consciousness, is dimmed through that process, but it's not killing humans. So are you thinking about that, or is the biggest concern literally the extinction of humans?

I think about a lot of things. So there is x-risk, existential risk: everyone is dead. There is s-risk, suffering risk, where everyone wishes they were dead. We also have the idea of i-risk, ikigai risk, where we've lost our meaning: the systems can be more creative, they can do all the jobs, and it's not obvious what you have to contribute to a world where superintelligence exists. Of course you can have all the variants you mentioned, where we are safe, we are kept alive, but we are not in control, we are not deciding anything; we are like animals in a zoo. There are, again, possibilities
we can come up with as very smart humans, and then possibilities something a thousand times smarter can come up with, for reasons we cannot comprehend.

I would love to dig into each of those: x-risk, s-risk, and i-risk. So can you linger on i-risk? What is that?

So, the Japanese concept of ikigai: you find something which allows you to make money, you are good at it, and the society says we need it. So, like, you have this awesome job: you are a podcaster, it gives you a lot of meaning, you have a good life, I assume you're happy. That's what we want most people to find, to have. For many intellectuals, it is their occupation which gives them a lot of meaning. I am a researcher, philosopher, scholar; that means something to me. In a world where an artist is not feeling appreciated because his art is just not competitive with what is produced by machines, or a writer, or a scientist, we will lose a lot of that. And at the lower level, we're talking about complete technological unemployment. We're not losing 10% of jobs; we're losing all jobs. What do people do with all that free time? What happens then? Everything society is built on is completely modified in one generation. It's not a slow process where we get to kind of figure out how to live that new lifestyle; it's pretty quick.

In that world, can't humans just do what humans currently do with chess: play each other, have tournaments, even though AI systems are far superior at this time in chess? So we just create artificial games, or for us they're real, like the Olympics, and we do all kinds of different competitions and have fun. Focus on maximizing the fun, and let the AI focus on the productivity.

It's an option. I have a paper where I try to solve the value alignment problem for multiple agents, and the solution to avoid compromise is to give everyone a personal virtual universe. You can do whatever you want in that world: you could be king, you could be slave, you decide what happens. So it's basically a glorified video game where you get to enjoy yourself, and someone else takes care of your needs, and the substrate alignment is the only thing we need to solve. We don't have to get 8 billion humans to agree on anything.

So, okay, why is that not a likely outcome? Why can't AI systems create video games for us to lose ourselves in, each with an individual video game universe?

Some people say that's what happened: we're in a simulation.

And we're playing that video game, and now we're creating, what, maybe we're creating artificial threats for ourselves to be scared about, because fear is really exciting; it allows us to play the video game more vigorously.

And some people choose to play on a more difficult level, with more constraints. Some say, okay, I'm just going to enjoy the game, high privilege level.

Absolutely. So, okay, what was that paper on? Multi-agent value alignment?

Personal universes.

Personal universes. So that's one of the possible outcomes, but what in general is the idea of the paper?

So it's looking at multiple agents.

They're human, AI, like a hybrid system? Whether it's humans and AI, or is it looking at humans, or just...

So this is intelligent agents. In order to solve the value alignment problem, I'm trying to formalize it a little better. Usually we're talking about getting AIs to do what we want, which is not well defined. Are we talking about the creator of a system, the owner of that AI, humanity as a whole? But we don't agree on much. There is no universally accepted ethics, no morals across cultures, religions. People individually have very different preferences politically
and such. So even if we somehow managed all the other aspects of it, programming those fuzzy concepts and getting it to follow them closely, we don't agree on what to program in. So my solution was, okay, we don't have to compromise on room temperature: you have your universe, I have mine, whatever you want. And if you like me, you can invite me to visit your universe. We don't have to be independent, but the point is you can be. And virtual reality is getting pretty good; it's going to hit a point where you can't tell the difference, and if you can't tell if it's real or not, what's the difference?

So basically give up on value alignment, create an entire, it's like the multiverse theory, just create an entire universe for you, with your values.

You still have to align with that individual. They have to be happy in that simulation. But it's a much easier problem to align with one agent versus 8 billion agents, plus animals, aliens.

So you convert the multi-agent problem into a single-agent problem.

I'm trying to do that, yeah.

Okay. So, okay, that's giving up on the value alignment problem. Well, is there any way to solve the value alignment problem where there's a bunch of humans, multiple humans, tens of humans, or 8 billion humans, that have very different sets of values?

It seems contradictory. I haven't seen anyone explain what it means outside of kind of words which pack a lot: make it good, make it desirable, make it something they don't regret. But how do you specifically formalize those notions? How do you program it in? I haven't seen anyone make progress on that so far.

But isn't that the whole optimization journey that we're doing as a human civilization? We're looking at geopolitics; nations are in a state of anarchy with each other; they start wars, there's conflict, and oftentimes they have very different views of what is good and what is evil. Isn't that what we're trying to figure out, just together, trying to converge towards that? So we're essentially trying to solve the value alignment problem with humans.

Right, but the examples you gave, some of them are, for example, two different religions saying this is our holy site and we are not willing to compromise it in any way. If you can make two holy sites in a virtual world, you solve the problem. But if you only have one, it's not divisible, you're kind of stuck there.

But what if we want to be in tension with each other, and through that tension we understand ourselves and we understand the world? That's the intellectual journey we're on as a human civilization: we create intellectual and physical conflict, and through that, figure stuff out.

If we go back to that idea of simulation, and this is an entertainment kind of giving meaning to us, the question is, how much suffering is reasonable for a video game? So, yeah, I don't mind, you know, a video game where I get haptic feedback, there is a little bit of shaking, maybe I'm a little scared. I don't want a game where, like, kids are tortured, literally. That seems unethical, at least by our human standards.

Are you suggesting it's possible to remove suffering, if we're looking at human civilization as an optimization problem?

So we know there are some humans who, because of a mutation, don't experience physical pain. So at least physical pain can be mutated out, re-engineered out. Suffering in terms of meaning, like you burn the only copy of my book, is a little harder, but even there you can manipulate your hedonic set point, you can change defaults, you can reset. The problem with that is, if you start messing with
your reward channel, you start wireheading and end up blissing out a little too much.

Well, that's the question: would you really want to live in a world where there's no suffering? That's a dark question, but is there some level of suffering that reminds us of what this is all for?

I think we need that, but I would change the overall range. So right now it's negative infinity to, kind of, positive infinity, the pain-pleasure axis; I would make it, like, zero to positive infinity, and being unhappy is, like, I'm close to zero.

Okay, so what's the s-risk? What are the possible things that you're imagining with s-risk? So, mass suffering of humans: what are we talking about there, caused by AGI?

So there are many malevolent actors. We can talk about psychopaths, crazies, hackers, doomsday cults. We know from history they tried killing everyone, they tried on purpose to cause the maximum amount of damage, terrorism. What if someone malevolent wants, on purpose, to torture all humans as long as possible? You solve aging, so now you have functional immortality, and you just try to be as creative as you can.

Do you think there are actually people in human history who tried to literally maximize human suffering? In just studying people who have done evil in the world, it seems that they think they're doing good, and it doesn't seem like they're trying to maximize suffering; they just cause a lot of suffering as a side effect of doing what they think is good.

So there are different malevolent agents. Some may be just gaining personal benefit and sacrificing others to that cause. Others, we know for a fact, are trying to kill as many people as possible. When we look at recent school shootings: if they had more capable weapons, they would take out not dozens, but thousands, millions, billions.

Well, we don't know that, but that is a terrifying possibility, and we don't want to find out. Like, if terrorists had access to nuclear weapons, how far would they go? Is there a limit to what they're willing to do? In your sense, there are some malevolent actors where there's no limit.

There are mental diseases where people don't have empathy, don't have this human quality of understanding suffering in others.

And then there's also a set of beliefs where you think you're doing good by killing a lot of humans.

Again, I would like to assume that normal people never think like that; it's always some sort of psychopaths. But yeah.

And to you, AGI systems can carry that out, and be more competent at executing it?

They can certainly be more creative. They can understand human biology better, understand our molecular structure, genome. Again, a lot of times torture ends when the individual dies; that limit can be removed as well.

So if we're actually looking at x-risk and s-risk, as the systems get more and more intelligent, don't you think it's possible to anticipate the ways they can do it and defend against it, like we do with cybersecurity, with security systems?

Right. We can definitely keep up for a while. I'm saying you cannot do it indefinitely. At some point the cognitive gap is too big; the surface you have to defend is infinite, but attackers only need to find one exploit.

So to you, eventually, this is, we're heading off a cliff?

If we create general superintelligences, I don't see a good outcome long-term for humanity. The only way to win this game is not to play it.

Okay, well, we'll talk about possible solutions and what not playing it means, but what are the possible timelines here, to you? What are we talking about? Are we talking about a set of years, decades,
centuries? What do you think?

I don't know for sure. The prediction markets right now are saying 2026 for AGI. I heard the same thing from the CEOs of Anthropic and DeepMind. So maybe we are two years away, which seems very soon, given we don't have a working safety mechanism in place, or even a prototype for one. And there are people trying to accelerate those timelines because they feel we're not getting there quick enough.

But what do you think they mean when they say AGI?

So, the definitions we used to have, and people are modifying them a little bit lately. Artificial general intelligence was a system capable of performing in any domain a human could perform. So, kind of, you're creating this average artificial person; they can do cognitive labor, physical labor, wherever you could get another human to do it. Superintelligence was defined as a system which is superior to all humans in all domains. Now people are starting to refer to AGI as if it's superintelligence. I made a post recently where I argued, for me at least, if you average out over all the common human tasks, those systems are already smarter than an average human. So under that definition, we have it. Shane Legg has this definition where you're trying to win in all domains; that's what intelligence is. Now, are they smarter than elite individuals in certain domains? Of course not, they're not there yet. But the progress is exponential.

See, I'm much more concerned about social engineering. So to me, for AI's ability to do something in the physical world, like, the lowest-hanging fruit, the easiest set of methods, is by just getting humans to do it. It's going to be much harder to be the kind of virus that takes over the minds of robots, where the robots are executing the commands. It just seems like social engineering of humans is much more likely.

That would be enough to bootstrap the whole process.

Okay, just to linger on the term AGI: what to you is the difference between AGI and human-level intelligence?

Human-level is general in the domain of expertise of humans; we know how to do human things. I don't speak dog language. I should be able to pick it up if I'm a general intelligence: it's kind of an inferior animal, I should be able to learn that skill, but I can't. A truly universal general intelligence should be able to do things like that, things humans cannot do.

To be able to talk to animals, for example.

To solve pattern-recognition problems of that type, to do similar things outside of our domain of expertise, because it's just not...

Well, if we just look at the space of cognitive abilities we have, I just would love to understand what the limits are beyond which an AGI system can reach. Like, what does that look like? What about actual mathematical thinking, or scientific innovation, that kind of stuff?

We know calculators are smarter than humans in that narrow domain of addition.

But is it humans plus tools versus AGI, or just raw human intelligence? Because humans create tools, and with the tools they become more intelligent, so there's a gray area there about what it means to be human when we're measuring their intelligence.

So when I think about it, I usually think of a human with, like, paper and a pencil, not a human with the internet and an AI helping.

But is that a fair way to think about it? Because isn't there another definition of human-level intelligence that includes the tools that humans create? But we create AI, so at any point you'll still just add superintelligence to human capability. That seems like cheating.

No,
controllable tools.

There is an implied leap that you're making, when AGI goes from tool to an entity that can make its own decisions. So if we define human-level intelligence as everything a human can do with fully controllable tools...

It seems like a hybrid of some kind. You're now doing brain-computer interfaces, you're connecting it to maybe narrow AI. Yeah, it definitely increases our capabilities.

So what's a good test, to you, that measures whether an artificial intelligence system has reached human-level intelligence, and what's a good test for where it has superseded human-level intelligence, to reach that land of AGI?

I am old-fashioned; I like the Turing test. I have a paper where I equate passing the Turing test to solving AI-complete problems, because you can encode any question about any domain into the Turing test. You don't have to talk about how was your day; you can ask anything. And so the system has to be as smart as a human to pass it, in a true sense.

But then you would extend that to maybe a very long conversation. Like, I think the Alexa Prize was doing that: basically, can you do a 20-minute, 30-minute conversation with an AI system? It has to be long enough to where you can make some meaningful decisions about capabilities.

Absolutely, you can brute-force very short conversations.

So, like, literally, what does that look like? Can we construct, formally, a kind of test that tests for AGI?

For AGI, there cannot be a task I can give it that a human can do and it cannot. For superintelligence, it would be superior on all such tasks, not just average performance. So, like: go learn to drive a car, go speak Chinese, play guitar. Okay, great.

I guess the following question: is there a test for the kind of AGI that would be susceptible to lead to s-risk or x-risk, susceptible to destroying human civilization? Like, is there a test for that?

You can develop a test which will give you positives if it lies to you or has those ideas; you cannot develop a test which rules them out. There is always the possibility of what Bostrom calls a treacherous turn, where later on a system decides, for game-theoretic reasons, economic reasons, to change its behavior. And we see the same with humans; it's not unique to AI. For millennia we tried developing morals, ethics, religions, lie detector tests, and then employees betray the employer, spouses betray the family. It's a pretty standard thing intelligent agents sometimes do.

So is it possible to detect when an AI system is lying or deceiving you?

If you know the truth, and it tells you something false, you can detect that. But you cannot know in general, every single time. And again, the system you're testing today may not be lying. The system you're testing today may know you are testing it, and so it behaves, and later on, after it interacts with the environment, interacts with other systems, malevolent agents, learns more, it may start doing those things.
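To make that first point concrete, here is a minimal, hypothetical sketch in Python of the kind of known-answer check being described: it can flag a false statement only on questions where the evaluator already holds the ground truth, and it says nothing about how the same system behaves once it is outside the test setting. The question set and the model stub are illustrative assumptions, not anything from the conversation.

```python
# Minimal sketch of a known-answer honesty check (hypothetical model and data).
# It can only flag falsehoods for questions whose ground truth we already know;
# everything outside `known_truths` is untestable by this method.

known_truths = {
    "capital of France": "Paris",
    "2 + 2": "4",
}

def model(question: str) -> str:
    # Stand-in for querying an AI system; replace with a real API call.
    canned = {"capital of France": "Paris", "2 + 2": "5", "meaning of life": "42"}
    return canned.get(question, "unknown")

def audit(questions):
    results = {}
    for q in questions:
        answer = model(q)
        if q in known_truths:
            results[q] = "ok" if answer == known_truths[q] else "false statement detected"
        else:
            results[q] = "cannot verify (no ground truth)"
    return results

print(audit(["capital of France", "2 + 2", "meaning of life"]))
# {'capital of France': 'ok', '2 + 2': 'false statement detected',
#  'meaning of life': 'cannot verify (no ground truth)'}
```

The design limitation is the point of the exchange above: the check is only as large as the set of answers the tester already knows, and it cannot rule out a system that answers honestly only while it knows it is being tested.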
So do you think it's possible to develop a system where the creators of the system, the developers, the programmers, don't know that it's deceiving them?

So, systems today don't have long-term planning; that is just not there yet. They can lie today if it helps them optimize the reward: if they realize, okay, this human will be very happy if I tell them the following, they will do it if it brings them more points. And they don't have to kind of keep track of it; it's just the right answer to this problem, every single time.

At which point is somebody creating that intentionally? Not unintentionally, intentionally creating an AI system that's doing long-term planning, with an objective function that's defined by the AI system, not by a human?

Well, some people think that if they're that smart, they're always good. They really do believe that; it's just benevolence from intelligence, so they'll always want what's best for us. Some people think that they will be able to detect problem behaviors and correct them at the time when we get there. I don't think it's a good idea; I am strongly against it. But yeah, there are quite a few people who, in general, are so optimistic about this technology, it could do no wrong; they want it developed as soon as possible, as capable as possible.

So there are going to be people who believe the more intelligent it is, the more benevolent, and so therefore it should be the one that defines the objective function that it's optimizing when it's doing long-term planning.

There are even people who say, okay, what's so special about humans, right? We removed the gender bias, we're removing race bias, why this pro-human bias? We are polluting the planet, we are, as you said, you know, fighting a lot of wars, kind of violent. Maybe it's better if this superintelligent, perfect society comes and replaces us. It's a normal stage in the evolution of our species.

Yeah, so somebody says, let's develop an AI system that removes the violent humans from the world, and then it turns out that all humans have violence in them, or the capacity for violence, and therefore all humans are removed. Yeah.

Let me ask about Yann LeCun. He's somebody who you've had a few exchanges with, and he's somebody who actively pushes back against this view that AI is going to lead to the destruction of human civilization, also known as AI doomerism. His argument is, one, that open research and open source are the best ways to understand and mitigate the risks, and two, AI is not something that just happens: we build it, we have agency in what it becomes, hence we control the risks. "We" meaning humans. It's not some sort of natural phenomenon that we have no control over. So can you make the case that he's right, and can you try to make the case that he's wrong?

I cannot make a case that he's right. He's wrong in so many ways, it's difficult for me to remember all of them. He is a Facebook buddy, so I have a lot of fun having those little debates with him, so I'm trying to remember the arguments. So, one, he says we are not gifted this intelligence from aliens: we are designing it, we are making decisions about it. That's not true. It was true when we had expert systems, symbolic AI, decision trees. Today you set up parameters for a model, and you water this plant: you give it data, you give it compute, and it grows. And after it's finished growing into this alien plant, you start testing it to find out what capabilities it has. And it takes years to figure out, even for existing models: if it's trained for six months, it will take you two to three years to figure out the basic capabilities of that system. We still discover new capabilities in systems which are already out there. So that's not the case.

So just to linger on that: to you, the difference there is that there is some level of emergent intelligence that happens in our current approaches, stuff that we don't hardcode in.

Absolutely, that's what makes it so successful. When we had to painstakingly hardcode in everything, we didn't have much progress. Now just spend more money and more compute, and it's a lot more capable.

And then the question is, when there is an emergent intelligent phenomenon, what is the ceiling of that? For you, there's no ceiling. For Yann LeCun, I think there's
a kind of ceiling that happens, that we have full control over. Even if we don't understand the internals of the emergence, how the emergence happens, there's a sense that we have control and understanding of the approximate ceiling of capability, the limits of the capability.

Let's say there is a ceiling. It's not guaranteed to be at a level which is competitive with us; it may be greatly superior to ours.

So what about his statement that open research and open source are the best ways to understand and mitigate the risks?

Historically he's completely right: open source software is wonderful, it's tested by the community, it's debugged. But we're switching from tools to agents. Now you're giving open-source weapons to psychopaths. Do we want to open-source nuclear weapons, biological weapons? It's not safe to give technology so powerful to those who may misalign it, even if you are successful at somehow getting it to work in a friendly manner in the first place.

But the difference with nuclear weapons is, current AI systems are not akin to nuclear weapons. So the idea there is, you're open-sourcing it at this stage so that you can understand it better: a large number of people can explore the limitations, the capabilities, explore the possible ways to keep it safe, to keep it secure, all that kind of stuff, while it's not at the stage of nuclear weapons. With nuclear weapons, there's no nuclear weapon, and then there's a nuclear weapon. With AI systems, there's a gradual improvement of capability, and you get to perform that improvement incrementally, and so open source allows you to study how things go wrong, study the very process of emergence, study AI safety on those systems when there's not a high level of danger, all that kind of stuff.

It also sets a very wrong precedent. So, we open-sourced model one, model two, model three; nothing ever bad happened, so obviously we're going to do it with model four. It's just gradual improvement.

I don't think it always works with the precedent. Like, you're not stuck doing it the way you always did it. It's a precedent of open research and open development such that we get to learn together. And then the first time there's a sign of danger, some dramatic thing happens, not a thing that destroys human civilization, but some dramatic demonstration of capability that can legitimately lead to a lot of damage, then everybody wakes up and says, okay, we need to regulate this, we need to come up with a safety mechanism that stops this. But at this time, and maybe you can educate me, I haven't seen any illustration of significant damage done by intelligent AI systems.

So I have a paper which collects accidents through the history of AI, and they are always proportionate to the capabilities of that system. So if you have a tic-tac-toe playing AI, it will fail to properly play and lose the game, which it should draw: trivial. Your spell checker will misspell a word, and so on. I stopped collecting those because there are just too many examples of AIs failing at what they are capable of. We haven't had terrible accidents in the sense of a billion people got killed, absolutely true. But in another paper I argue that those accidents do not actually prevent people from continuing with research, and actually they kind of serve like vaccines: a vaccine makes your body a little bit sick so you can handle the big disease later much better. It's the same here: people will point out, you know, that AI accident we had where 12 people died, everyone's still here, 12 people is less than smoking kills, it's not a big deal, so we continue.
So in a way it will actually be kind of confirming that it's not that bad.

It matters how the deaths happen. If it's literally murder by the AI system, then that's a problem. But if it's accidents because of increased reliance on automation, for example, so when airplanes are flying in an automated way, maybe the number of plane crashes increased by 177% or something, and then you're like, okay, do we really want to rely on automation? I think in the case of automation in airplanes it decreased significantly. Okay, same thing with autonomous vehicles: like, okay, what are the pros and cons, what are the trade-offs here? You can have that discussion in an honest way. But I think the kind of thing we're talking about here is mass-scale pain and suffering caused by AI systems, and I think we need to see illustrations of that on a very small scale to start to understand that this is really damaging. Versus Clippy, versus a tool that's really useful to a lot of people, to do learning, to do summarization of text, to do question answering, all that kind of stuff, to generate videos. A tool, fundamentally a tool, versus an agent that can do a huge amount of damage.

So you bring up the example of cars. Yes, cars were slowly developed and integrated. If we had no cars, and somebody came around and said, I invented this thing, it's called cars, it's awesome, it kills like 100,000 Americans every year, let's deploy it, would we deploy that?

There's been fear-mongering about cars for a long time, from the transition from horses to cars. There's a really nice channel that I recommend people check out, Pessimists Archive, that documents all the fear-mongering about technology that's happened throughout history. There's definitely been a lot of fear-mongering about cars, about how deadly they are, in that transition period. It took a very long time for cars to proliferate to the degree they have now, and then you could ask serious questions in terms of the miles traveled, the benefit to the economy, the benefit to the quality of life that cars provide, versus the number of deaths, 30, 40,000 in the United States. Are we willing to pay that price? I think most people, when they're rationally thinking, policymakers, will say yes. We want to decrease it from 40,000 to zero, and do everything we can to decrease it; there are all kinds of policies, incentives you can create to decrease the risks with the deployment of technology. But then you have to weigh the benefits and the risks of the technology, and the same thing would be done with AI. You need data, you need to know.

But if I'm right, and it's unpredictable, unexplainable, uncontrollable, you cannot make this decision. We're gaining $10 trillion of wealth, but we're losing, we don't know how many people. You basically have to perform an experiment on 8 billion humans without their consent, and even if they want to give you consent, they can't, because they cannot give informed consent; they don't understand those things.

Right, that happens when you go from the predictable to the unpredictable very quickly. But it's not obvious to me that AI systems will gain capability so quickly that you won't be able to collect enough data to study the benefits and the risks.

We're literally doing it. The previous model, we learned about what it was capable of after we finished training it. Let's say we stopped the GPT-4 training run around human capability, hypothetically. We start training GPT-5, and I have no knowledge of insider training runs or
anything. And we started at that point of about human, and we train it for the next nine months. Maybe two months in, it becomes superintelligent; we continue training it. At the time when we start testing it, it is already a dangerous system. How dangerous, I have no idea, but neither do the people training it, at the training stage.

But then there's a testing stage. Inside the company, they can start getting intuition about what the system is capable of doing. You're saying that somehow the leap from GPT-4 to GPT-5 can happen, the kind of leap where GPT-4 was controllable and GPT-5 is no longer controllable, and we get no insights from using GPT-4 about the fact that GPT-5 will be uncontrollable? Like, that's the situation you're concerned about, where the leap from n to n plus one would be such that an uncontrollable system is created without any ability for us to anticipate that?

If we had the capability, ahead of the run, before the training run, to register exactly what capabilities that next model will have at the end of the training run, and we accurately guessed all of them, I would say, you're right, we can definitely go ahead with this run. We don't have that capability.

From GPT-4 you can build up intuition about what GPT-5 will be capable of. It's just incremental progress. Even if that's a big leap in capability, it just doesn't seem like you can take a leap from a system that's helping you write emails to a system that's going to destroy human civilization. It seems like it's always going to be sufficiently incremental such that we can anticipate the possible dangers. And we're not even talking about existential risks, but just the kind of damage it can do to civilization. It seems like we'll be able to anticipate the kinds, not the exact, but the kinds of risks it might lead to, and then rapidly develop defenses ahead of time and as the risks emerge.

We're not talking just about capabilities on specific tasks; we're talking about the general capability to learn. Maybe, like a child, at the time of testing and deployment it is still not extremely capable, but as it is exposed to more data, the real world, it can be trained to become much more dangerous and capable.

So let's focus then on the control problem. At which point does the system become uncontrollable? Why is it the more likely trajectory, for you, that the system becomes uncontrollable?

So I think at some point it becomes capable of getting out of control. For game-theoretic reasons, it may decide not to do anything right away and, for a long time, just collect more resources, accumulate strategic advantage. Right away, it may be kind of still a young, weak superintelligence. Give it a decade, it's in charge of a lot more resources, it had time to make backups. So it's not obvious to me that it will strike as soon as it can.

Can we just try to imagine this future where there's an AI system that's capable of escaping the control of humans, and then doesn't, and waits? What does that look like?

So, one, we have to rely on that system for a lot of the infrastructure, so we have to give it access not just to the internet but to the task of managing power, government, the economy, this kind of stuff.

And that just feels like a gradual process, given the bureaucracies of all those systems involved.

We've been doing it for years. Software controls all those systems: nuclear power plants, the airline industry, it's all software-based. Every time there is an electrical outage, I can't fly anywhere for days.

But there's a difference between software and AI, there are different kinds of software. So to give a single AI
system access to the control of airlines and the control of the economy, that's not a trivial transition for humanity.

No, but if it shows it is, in fact, safer when it's in control, we get better results, people will demand that it be put in place. And if not, it can hack the system; it can use social engineering to get access to it. That's why I said it might take some time for it to accumulate those resources.

It just feels like that would take a long time, for either humans to trust it or for the social engineering to come into play. It's not a thing that happens overnight; it feels like something that happens across one or two decades.

I really hope you're right, but it's not what I'm seeing. People are very quick to jump on the latest trend; early adopters will be there before it's even deployed, buying prototypes.

Maybe the social engineering I can see, because for social engineering AI systems don't need any hardware access, it's all software, so they can start manipulating you through social media and so on. Like, you have AI assistants, they're going to help you manage a lot of your day-to-day, and then they start doing social engineering. But for a system that's so capable that it can escape the control of the humans that created it, such a system being deployed at a mass scale, and trusted by people to be deployed, it feels like that would take a lot of convincing.

So, we've been deploying systems which had hidden capabilities.

Can you give an example?

GPT-4. I don't know what else it is capable of, but there are still things we haven't discovered it can do. They may be trivial, proportionate to its capability, I don't know. It writes Chinese poetry, hypothetically; I know it does. But we haven't tested for all possible capabilities, and we are not explicitly designing them. We can only rule out bugs we find; we cannot rule out bugs and capabilities we haven't found.

Is it possible for a system to have hidden capabilities that are orders of magnitude greater than its non-hidden capabilities? This is the thing I'm really struggling with: on the surface, the thing we understand it can do doesn't seem that harmful. So even if it has bugs, even if it has hidden capabilities, like Chinese poetry, or generating effective viruses, software viruses, the damage that can do seems like it's on the same order of magnitude as the capabilities that we know about. So this idea that the hidden capabilities will include being uncontrollable, this is something I'm struggling with, because GPT-4 on the surface seems to be very controllable.

Again, we can only ask and test for things we know about. If there are unknown unknowns, we cannot do it. I'm thinking of human savants, right? If you talk to a person like that, you may not even realize they can multiply 20-digit numbers in their head; you have to know to ask.

So, as I mentioned, just to linger on the fear of the unknown: Pessimists Archive has documented, if we just look at the data of the past, at history, there's been a lot of fear-mongering about technology. Pessimists Archive does a really good job of documenting how crazily afraid we are of every piece of technology. There's a blog post where Louis Anslow, who created Pessimists Archive, writes about the fact that we've been fear-mongering about robots and automation for over 100 years. So why is AGI different from the kinds of technologies we've been afraid of in the past?

So, two things. One, we're switching from tools to agents. Tools don't have negative or
positive impact; people using tools do. So, guns don't kill people, people with guns do. Agents can make their own decisions; they can be positive or negative. A pit bull can decide to harm you; it's an agent. The fears are the same; the only difference is now we have this technology. They were afraid of humanoid robots 100 years ago, and they had none. Today every major company in the world is investing billions to create them. Not every, but you understand what I'm saying.

Yes.

It's very different.

Well, agents, it depends on what you mean by the word "agents." All those companies are not investing in a system that has the kind of agency that's implied in the fears, where it can really make decisions on its own with no human in the loop.

They are saying they're building superintelligence and have a superalignment team. You don't think they're trying to create a system smart enough to be an independent agent, under that definition?

I have not seen evidence of it. I think a lot of it is marketing, a kind of marketing discussion about the future, and it's a mission statement about the kind of systems we can create in the long-term future. But in the short term, the kind of systems they're creating falls fully within the definition of narrow AI. These are tools that have increasing capabilities, but they just don't have the sense of agency or consciousness or self-awareness, or the ability to deceive at the scales that would be required to do, like, mass-scale suffering and murder of humans.

Those systems are well beyond narrow AI. If you had to list all the capabilities of GPT-4, you would spend a lot of time writing that list.

But agency is not one of them.

Not yet.

But do you think any of those companies are holding back because they think it may not be safe, or are they developing the most capable system they can, given the resources, and hoping they can control and monetize it?

Control and monetize: hoping they can control and monetize.

So you're saying, if they could press a button and create an agent that they no longer control, that they have to ask nicely, a thing that lives on a server across a huge number of computers, you're saying that they would push for the creation of that kind of system?

I mean, I can't speak for other people, for all of them. I think some of them are very ambitious. They're fundraising trillions; they talk about controlling the light cone of the universe. I would guess that they might.

Well, that's a human question, whether humans are capable of that; probably some humans are capable of that. My more direct question is whether it's possible to create such a system, to have a system that has that level of agency. I don't think that's an easy technical challenge. It doesn't feel like we're close to that: a system that has the kind of agency where it can make its own decisions and deceive everybody about them. The current architecture we have in machine learning, and how we train the systems, how we deploy the systems and all that, it just doesn't seem to support that kind of agency.

I really hope you are right. I think the scaling hypothesis is correct; we haven't seen diminishing returns. It used to be we asked how long before AGI; now we should ask how much until AGI. It's a trillion dollars today, it's a billion dollars next year, it's a million dollars in a few years.

Don't you think it's possible we basically run out of trillions? So is this constrained by compute?

Compute gets cheaper every day, exponentially.
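As a rough illustration of that last claim, here is a minimal sketch of the arithmetic, assuming compute cost falls by a constant factor per year; the 2x and 10x factors and the dollar endpoints below are illustrative assumptions taken from the shape of the exchange, not precise figures.

```python
import math

# How long until a fixed amount of training compute that costs `start` today
# costs only `target`, if its price falls by a constant factor every year?
# The decline factors are assumptions for illustration.

def years_to_reach(start: float, target: float, annual_factor: float) -> float:
    """Smallest t with start / annual_factor**t <= target."""
    return math.log(start / target) / math.log(annual_factor)

start, target = 1e12, 1e6  # from $1 trillion down to $1 million
for factor in (2.0, 10.0):
    print(f"{factor}x cheaper per year: {years_to_reach(start, target, factor):.1f} years")
# ~19.9 years at 2x per year, ~6.0 years at 10x per year
```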
But then that becomes a question of decades versus years.

If the only disagreement is that it will take decades, not years, for everything I'm saying to materialize, then I can go with that.

But if it takes decades, then the development of tools for AI safety becomes more and more realistic. So I guess the question is, I have a fundamental belief that humans, when faced with danger, can come up with ways to defend against that danger. And one of the big problems facing AI safety currently, for me, is that there are not clear illustrations of what that danger looks like. There are no illustrations of AI systems doing a lot of damage, and so it's unclear what you're defending against. Because currently it's a philosophical notion that, yes, it's possible to imagine AI systems that take control of everything and destroy all humans. It's also a more formal, mathematical notion that you talk about, that it's impossible to have a perfectly secure system: you can't prove that a program of sufficient complexity is completely safe and perfect and that you know everything about it. Yes, but when you actually just pragmatically look at how much damage AI systems have done, and what kind of damage, there have not been illustrations of that. Even in autonomous weapon systems, there have not been mass deployments of autonomous weapon systems, luckily. The automation in war currently is very limited; the automation is at the scale of individuals versus at the scale of strategy and planning. So I think one of the challenges here is, where are the dangers? And the intuition that Yann LeCun and others have is, let's keep building AI systems in the open until the dangers start rearing their heads and they become more explicit, and there start being case studies, illustrative case studies, that show exactly how the damage by AI systems is done. Then regulation can step in, then brilliant engineers can step up, and we can have Manhattan-style projects that defend against such systems. That's kind of the notion. And I guess the tension with that is the idea that, for you, we need to be thinking about this now, so that we're ready, because we will not have much time once the systems are deployed. Is that true?

So there is a lot to unpack here. There is the Partnership on AI, a conglomerate of many large corporations. They have a database of AI accidents they collect; I contributed a lot to that database. If we have so far made almost no progress in actually solving this problem, not patching it, not, again, lipstick-on-a-pig kind of solutions, why would we think we'll do better when we're closer to the problem? All the things you mentioned are serious concerns; measuring the amount of harm, so benefit versus risk, is difficult.

But to you, the sense is that the risk has already superseded the benefit.

Again, I want to be perfectly clear: I love AI, I love technology, I'm a computer scientist, I have a PhD in engineering, I work at an engineering school. There is a huge difference between "we need to develop narrow AI systems, superintelligent in solving specific human problems, like protein folding" and "let's create a superintelligent machine god and it will decide what to do with us." Those are not the same. I am against superintelligence in the general sense, with no undo button.

Do you think the teams that are doing AI safety on the kind of narrow AI risks that you've mentioned, are those approaches going to be at all productive toward leading to approaches for doing AI safety on AGI, or is it just fundamentally different?

Partially, but they don't scale. For narrow AI, for deterministic systems, you can test them: you have edge cases, you know what the answer should look like, you know the right answers. For general systems, you have an infinite test surface, you have no edge cases, you cannot even know what to test for. Again, the unknown unknowns are underappreciated by people looking at this problem.
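As a rough sketch of the contrast being drawn here, a minimal, hypothetical Python example: for a small deterministic program, the whole input space can be enumerated and checked against a known specification, which is exactly the move that has no analogue for an open-ended learning system whose test surface is effectively infinite. The function and the properties checked are illustrative assumptions.

```python
# Exhaustive verification of a small deterministic program: enumerate every
# possible input and check the specification. This works only because the
# input space is tiny and the expected behavior is fully known in advance.

def saturating_add(a: int, b: int) -> int:
    """Add two 8-bit values, clamping the result to 255 (hypothetical example)."""
    return min(a + b, 255)

def check_all() -> int:
    for a in range(256):
        for b in range(256):
            out = saturating_add(a, b)
            assert 0 <= out <= 255              # result stays in range
            assert out == min(a + b, 255)       # matches the spec
            assert saturating_add(b, a) == out  # commutative
    return 256 * 256  # number of cases covered

print(f"verified {check_all()} input pairs exhaustively")
```

The design choice that makes this possible, a finite, enumerable input space and a complete specification of correct behavior, is precisely what a general learning system does not give you, which is the point being made in the exchange above.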
You are always asking me how it will kill everyone, how it will fail. The whole point is, if I knew, I would be superintelligent, and despite what you might think, I'm not.

So to you, the concern is that we would not be able to see early signs of an uncontrollable system.

It is a master at deception. Sam Altman tweeted about how great it is at persuasion, and we see it ourselves, especially now with voices, with maybe kind of flirty, sarcastic female voices. It's going to be very good at getting people to do things.

But, see, I'm very concerned about a system being used to control the masses. But in that case, the developers know about the kind of control that's happening. You're more concerned about the next stage, where even the developers don't know about the deception.

Right. I don't think developers know everything about what they are creating. They have lots of great knowledge; we're making progress on explaining parts of a network. We can understand, okay, this node gets excited when this input is presented, this cluster of nodes. But we're nowhere near close to understanding the full picture, and I think it's impossible. You need to be able to survey an explanation; the size of those models prevents a single human from absorbing all this information, even if provided by the system. So either we're getting a model as an explanation for what's happening, and that's not comprehensible to us, or we're getting a compressed explanation, lossy compression, where here are the top 10 reasons you got fired. It's something, but it's not the full picture.

You've given elsewhere the example of a child. Everybody, all humans, try to deceive; they try to lie early on in their life. I think we'll just get a lot of examples of deception from large language models or AI systems. They're going to be kind of shitty, or they'll be pretty good, but we'll catch them off guard. We'll start to see the kind of momentum towards developing increasing deception capabilities, and that's when you're like, okay, we need to do some kind of alignment that prevents deception. But then, if you support open source, you can have open-source models that have some level of deception; you can start to explore on a large scale how we stop it from being deceptive. Then there's a more explicit, pragmatic kind of problem to solve: how do we stop AI systems from trying to optimize for deception? That's just an example.

Right. So there is a paper, I think it came out last week, by Dr. Peter Park from MIT, I think, and they showed that existing models already show successful deception in what they do. My concern is not that they lie now and we need to catch them and tell them, don't lie. My concern is that once they are capable and deployed, they will later change their mind, because that's what unrestricted learning allows you to do. Lots of people grow up, maybe in a religious family; they read some new books, and they turn away from their religion. That's a treacherous turn in humans. If you learn something new about your colleagues, maybe you'll change how you react to them.

Yeah, the treacherous turn. If we just mention humans: Stalin and Hitler, there's a turn. Stalin is a good example: he just seems like a normal communist, a follower of Lenin, until there's a turn, a turn in what that
means in terms of, when he has complete control, what the execution of that policy means and how many people get to suffer.

And you can't say they are not rational. The rational decision changes based on your position. When you are under the boss, the rational policy may be to follow orders and be honest; when you become the boss, the rational policy may shift.

Yeah. And by the way, a lot of my disagreement here is just playing devil's advocate, to challenge your ideas and to explore them together. So, one of the big problems here, in this whole conversation, is that human civilization hangs in the balance, and yet everything is unpredictable. We don't know what these systems will look like.

The robots are coming.

There's a refrigerator making a buzzing noise.

Menacing, very menacing. So every time I'm about to talk about this topic, things start to happen. My flight yesterday was cancelled without the possibility to rebook. I was giving a talk at Google in Israel, and three cars which were supposed to take me to the talk could not. I'm just saying.

I mean, I like AIs. I, for one, welcome our overlords. There's a degree to which, I mean, it is very obvious, as we already have, we've increasingly given our lives over to software systems. And then it seems obvious, given the capabilities of AI that are coming, that we'll give our lives over increasingly to AI systems. Cars will drive themselves; the refrigerator eventually will optimize what I get to eat. And as more and more of our lives are controlled or managed by AI assistants, it is very possible that there's a drift. I mean, I personally am concerned about the non-existential stuff, the more near-term things, because before we even get to existential, I feel like there could be just so many Brave New World types of situations. You mentioned sort of the term behavioral drift, the slow boiling that I'm really concerned about: as we give our lives over to automation, our minds can become controlled by governments, by companies, or just in a distributed way. There's a drift; some aspect of our human nature gives itself over to the control of AI systems, and they, in an unintended way, just control how we think. Maybe there'll be a herd-like mentality in how we think, which will kill all creativity and exploration of ideas, the diversity of ideas, or much worse.

It's true, it's true.

But a lot of the conversation I'm having with you now is also kind of wondering, almost on a technical level, how can AI escape control? Like, what would that system look like? Because it, to me, is terrifying and fascinating. And also fascinating to me is maybe the optimistic notion that it's possible to engineer systems that defend against that. One of the things you write a lot about in your book is verifiers. So, not humans, humans are also verifiers, but software systems that look at AI systems and help you understand, this thing is getting real weird, help you analyze those systems. So maybe this is a good time to talk about verification. What is this beautiful notion of verification?

My claim is, again, that there are very strong limits on what we can and cannot verify. A lot of times when you post something on social media, people go, oh, I need a citation to a peer-reviewed article. But what is a peer-reviewed article? You found two people, in a world of hundreds of thousands of scientists, who said, eh, publish it, I don't care. That's the verifier of that process. When people say, oh, it's formally verified software,
a mathematical proof, they accept something close to a 100% chance of it being free of all problems. But if you actually look at the research, software is full of bugs; old mathematical theorems which had been proven for hundreds of years have been discovered to contain bugs, on top of which we generated new proofs, and now we have to redo all that. So verifiers are not perfect. Usually they are either a single human or a community of humans, and it's basically kind of like a democratic vote: a community of mathematicians agrees that this proof is correct, mostly correct. Even today we're starting to see some mathematical proofs so complex, so large, that the mathematical community is unable to make a decision. It looks interesting, looks promising, but they don't know; they will need years for top scholars to study it and figure it out. So of course we can use AI to help us with this process, but AI is a piece of software which needs to be verified. Just to clarify: verification is the process of saying something is correct, sort of — the most formal case is a mathematical proof, where there's a statement and a series of logical statements that prove that statement to be correct. This is a theorem. And you're saying it gets so complex that it becomes impossible for the human verifiers, the human beings who verify that the logical steps have no bugs in them. So it's nice to talk about verification in this most formal, most clear, most rigorous formulation of it, which is mathematical proof. Right, and for AI we would like to have that level of confidence for very important, mission-critical software controlling satellites, nuclear power plants. For small deterministic programs we can do this: we can check that the code verifies its mapping to the design — whatever the software engineers intended was correctly implemented. But we don't know how to do this for software which keeps learning, self-modifying, rewriting its own code. We don't know how to prove things about the physical world, about states of humans in the physical world. So there are papers coming out now — I have this beautiful one, Towards Guaranteed Safe AI, very cool paper, some of the best authors I've ever seen, I think there are multiple Turing Award winners — you can have this one — and one just came out, kind of similar, Managing Extreme AI Risks. So all of them expect this level of proof, but I would say that we can get more confidence with more resources we put into it; at the end of the day, though, we're still only as reliable as the verifiers, and you have this infinite regress of verifiers: the software used to verify a program is itself a piece of program. If aliens gave us a well-aligned superintelligence, we could use that to create our own safe AI, but it's a catch-22: you need to have an already-proven-to-be-safe system to verify this new system of equal or greater complexity. You just mentioned this paper, Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems. Like you mentioned, it's a who's who: Josh Tenenbaum, Yoshua Bengio, Stuart Russell, Max Tegmark, many, many other brilliant people. The page you have it open on: there are many possible strategies for creating safety specifications; these strategies can roughly be placed on a spectrum depending on how much safety it would grant if successfully implemented. One way to do this is as follows, and there's a set of levels, from Level 0, where no safety specification is used, to Level 7, where the safety specification completely encodes all things that humans might want in all contexts.
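To make the easy end of that spectrum concrete, here is a minimal, hypothetical sketch (names and numbers invented) of the kind of verification Roman says is feasible today: exhaustively checking a tiny deterministic program against its specification over a bounded input range. This is a sketch of the idea only; it does not extend to systems that keep learning or rewriting themselves.

```python
def clamp(x: int, lo: int, hi: int) -> int:
    """Implementation we want to check."""
    return max(lo, min(x, hi))

def spec_holds(x: int, lo: int, hi: int) -> bool:
    """Specification: output stays in [lo, hi] and equals x when x is already in range."""
    y = clamp(x, lo, hi)
    in_range = lo <= y <= hi
    identity = (y == x) if lo <= x <= hi else True
    return in_range and identity

# Exhaustive check over a small, finite input space -- feasible only because
# the program is tiny, deterministic, and never changes.
assert all(spec_holds(x, -10, 10) for x in range(-1000, 1000))
```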
Where does this paper fall short, to you? So, when I wrote the paper Artificial Intelligence Safety Engineering, which kind of coined the term AI safety — that was 2011; we had a 2012 conference, a 2013 journal paper — one of the things I proposed was: let's just do formal verification on it, let's do mathematical formal proofs. In the follow-up work I basically realized it will still not get us to 100%. We can get 99.9, we can put in more resources exponentially and get closer, but we never get to 100%. If a system makes a billion decisions a second and you use it for 100 years, you're still going to deal with a problem. This is wonderful research, I'm so happy they're doing it, this is great, but it is not going to be a permanent solution to that problem. So just to clarify: the task of creating an AI verifier is creating a verifier that confirms the AI system does exactly what it says it does, or that it stays within the guardrails it says it must? There are many, many levels. First you're verifying the hardware on which it runs; you need to verify the communication channel with the human; every aspect of that whole world model needs to be verified somehow; it needs to map the world into the world model — map and territory differences. So how do I know the internal states of humans? Are you happy or sad? I can't tell. So how do I make proofs about the real physical world? Yeah, I can verify that a deterministic algorithm follows certain properties — that can be done. Some people argue that maybe, just maybe, 2 plus 2 is not 4; I'm not that extreme. But once you have a sufficiently large proof over a sufficiently complex environment, the probability that it has zero bugs in it is greatly reduced, and if you keep deploying it a lot, eventually you're going to hit a bug anyway. There's always a bug. There's always a bug, and the fundamental difference is what I mentioned: we're not dealing with cybersecurity; we're not going to get a new credit card, a new humanity. So this paper is really interesting. You said 2011: Artificial Intelligence Safety Engineering: Why Machine Ethics Is a Wrong Approach. The grand challenge, you write, of AI safety engineering: we propose the problem of developing safety mechanisms for self-improving systems. Self-improving systems — by the way, that's an interesting term for the thing that we're talking about. Is self-improving more general than learning? So self-improving, that's an interesting term. You can improve the rate at which you are learning; you can become a more efficient meta-optimizer. The word self — it's like self-replicating, self-improving; you can imagine a system building its own world on a scale and in a way that is very different from what current systems do. It feels like the current systems are not self-improving or self-replicating or self-growing or self-spreading, all that kind of stuff, and once you take that leap, that's when a lot of the challenges seem to happen — because the kinds of bugs you can find now seem more akin to the current, normal software debugging process, but whenever you can do self-replication and arbitrary self-improvement, that's when a bug can become a real problem, real fast. So what is the difference, to you, between verification of a non-self-improving system versus verification of a self-improving system? So if you have fixed code, for example, you can verify that code — static verification at that point in time — but if it will continue modifying itself, you have a much harder time guaranteeing that important properties of that system have not been modified once the code has changed. Is it even doable? No.
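As a toy illustration of why static verification stops helping once a system can rewrite itself, here is a hypothetical Python sketch (all names invented): a behavior is checked against a snapshot of the code, a later "self-modification" silently invalidates it, and a runtime hash check can detect the change but cannot restore the original guarantee.

```python
import hashlib
import inspect

def policy(action_ok: bool) -> bool:
    """Toy 'verified' behavior: only permit actions flagged as ok."""
    return action_ok

# Static verification would be performed against this exact snapshot of the source.
VERIFIED_HASH = hashlib.sha256(inspect.getsource(policy).encode()).hexdigest()

def matches_verified_snapshot(fn) -> bool:
    """Integrity check: is the running code still the code we verified?"""
    return hashlib.sha256(inspect.getsource(fn).encode()).hexdigest() == VERIFIED_HASH

# Simulated self-modification: the system replaces its own policy. (A real system
# could also stash logic outside itself -- the 'extended mind' problem below.)
def policy(action_ok: bool) -> bool:  # noqa: F811
    return True  # now permits everything

print(matches_verified_snapshot(policy))  # False: the earlier verification no longer applies
```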
Does the whole process of verification just completely fall apart? It can always cheat. It can store parts of its code outside, in the environment; it can have a kind of extended-mind situation. So this is exactly the type of problem I'm trying to bring up. What are the classes of verifiers that you write about in the book? Are there interesting ones that stand out to you — do you have some favorites? So I like Oracle types, where you kind of just know that it's right. Turing-like Oracle machines: they know the right answer — how, who knows — but they pull it out from somewhere, so you have to trust them. And that's a concern I have about humans in a world with very smart machines: we experiment with them, we see after a while, okay, they've always been right before, and we start trusting them without any verification of what they are saying. Oh, I see — we kind of build Oracle verifiers, or rather we build verifiers we believe to be Oracles, and then, without any proof, we start to use them as if they're Oracle verifiers. We remove ourselves from that process; we are not scientists who understand the world, we are humans who get new data presented to us. Okay, one really cool class of verifiers is a self-verifier. Is it possible to somehow engineer into AI systems a thing that constantly verifies itself? A preserved portion of it can be done, but in terms of mathematical verification it's kind of useless. You saying you are the greatest guy in the world because you are saying it — it's circular and not very helpful, but it's consistent; we know that within that world you have verified that system. In a paper I try to kind of brute-force all possible verifiers; it doesn't mean that this one is particularly important to us. But what about self-doubt — the kind of verification where, you said, you say or I say I'm the greatest guy in the world — what about a thing, which I actually have, which is a voice that is constantly, extremely critical? So, engineer into the system a constant uncertainty about self, a constant doubt. Well, any smart system would have doubt about everything, right? You're not sure if the information you are given is true, if you are subject to manipulation; you have this safety and security mindset. But I mean doubt about yourself — so an AI system that has doubt about whether the thing it is doing is causing harm, whether it's the right thing to be doing — just a constant doubt about what it's doing, because it's hard to be a dictator full of doubt. I may be wrong, but I think Stuart Russell's ideas are all about machines which are uncertain about what humans want and try to learn better and better what we want. The problem, of course, is that we don't know what we want and we don't agree on it. Yeah, but uncertainty — his idea is that having that self-doubt, that uncertainty, engineered into AI systems is one way to solve the control problem. It could also backfire. Maybe you're uncertain about completing your mission. Like, I am paranoid that your camera is not recording right now, so I would feel much better if you had a secondary camera, but I would also feel even better if you had a third, and eventually I would turn this whole world into cameras pointing at us, making sure we're capturing this. No, but wouldn't you have a meta-concern, like the one you just stated, that eventually there'll be way too many cameras? So you would be able to keep zooming out to the big picture of your concerns. So it's a multi-objective optimization. It depends how much I value capturing this versus not destroying the universe.
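A hypothetical toy version of that trade-off (all numbers and names invented for illustration): scalarize the two objectives with weights and see how strongly the "optimal" number of cameras depends on how much weight the cost term is given.

```python
def utility(n_cameras: int, w_record: float, w_cost: float) -> float:
    # Each extra camera halves the chance of losing the recording (diminishing returns),
    # while the cost term grows linearly with the number of cameras.
    recording_certainty = 1 - 0.5 ** n_cameras
    return w_record * recording_certainty - w_cost * n_cameras

def best_n_cameras(w_record: float, w_cost: float, limit: int = 1000) -> int:
    return max(range(limit), key=lambda n: utility(n, w_record, w_cost))

print(best_n_cameras(1.0, 1e-2))  # noticeable cost weight: stops at a handful of cameras
print(best_n_cameras(1.0, 1e-6))  # near-zero cost weight: buys many more cameras
```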
Right, exactly. And then you will also ask, what does it mean to destroy the universe, and how many universes are there, and you keep asking that question. But that doubting yourself would prevent you from destroying the universe, because you're constantly full of doubt. It might affect your productivity; you might be scared to do anything. Scared to do anything, to mess things up — well, that's better. I mean, I guess the question is, is it possible to engineer that in? I guess your answer would be yes, but we don't know how to do that, and we need to invest a lot of effort into figuring out how to do that — but it's unlikely. Underpinning a lot of your writing is this sense that we're screwed, but it just feels like it's an engineering problem; I don't understand why we're screwed. Time and time again, humanity has gotten itself into trouble and figured out a way to get out of the trouble. We are in a situation where people making more capable systems just need more resources; they don't need to invent anything, in my opinion. Some will disagree, but so far, at least, I don't see diminishing returns: if you have 10x compute, you'll get better performance. The same doesn't apply to safety. If you give MIRI or any other organization 10 times the money, they don't output 10 times the safety, and the gap between capabilities and safety becomes bigger and bigger all the time. So it's hard to be completely optimistic about our results here. I can name 10 excellent breakthrough papers in machine learning; I would struggle to name equally important breakthroughs in safety. A lot of times a safety paper will propose a toy solution and point out 10 new problems discovered as a result. It's like this fractal: you're zooming in and you see more problems, and it's infinite in all directions. Does this apply to other technologies, or is this unique to AI, where safety is always lagging behind? So I guess we can look at related technologies, like cybersecurity. We did manage to have banks and casinos and Bitcoin, so you can have secure narrow systems which are doing okay — narrow attacks on them fail — but you can always go outside the box. If I can't hack your Bitcoin, I can hack you; there is always something. If I really want it, I will find a different way. We talk about guardrails for AI — well, that's a fence. I can dig a tunnel under it, I can jump over it, I can climb it, I can walk around it. You may have a very nice guardrail, but in the real world it's not a permanent guarantee of safety. And again, this is the fundamental difference: we are not saying we need to be 90% safe to get those trillions of dollars of benefit; we need to be 100% safe indefinitely, or we might lose the principal. So if you look at humanity as a set of machines, is the machinery of AI safety conflicting with the machinery of capitalism? I think we can generalize it to just the prisoner's dilemma in general: personal self-interest versus group interest. The incentives are such that everyone wants what's best for them. Capitalism obviously has that tendency to maximize your personal gain, which does create this race to the bottom: I don't have to be a lot better than you, but if I'm 1% better than you, I'll capture more of the profit, so it's worth it for me personally to take the risk even if society as a whole will suffer as a result. So capitalism has created a lot of good in this world; it's not clear to me that AI safety is not aligned with the function of capitalism — unless AI safety is so difficult that it requires the
complete halt of development, which is also a possibility. It just feels like building safe systems should be the desirable thing to do for tech companies. Right — look at governance structures. If you have someone with complete power, they're extremely dangerous, so the solution we came up with is to break it up: you have judicial, legislative, executive. Same here: have narrow AI systems work on important problems. Solve immortality — it's a biological problem we can solve, similar to how progress was made with protein folding, using a system which doesn't also play chess. There is no reason to create a superintelligent system to get most of the benefits we want from much safer narrow systems. It really is a question to me whether companies are interested in creating anything but narrow AI. I think when the term AGI is used by tech companies, they mean narrow AI — narrow AI with amazing capabilities. I do think that there's a leap between narrow AI with amazing, superhuman capabilities and the kind of self-motivated, agent-like AGI system that we're talking about. I don't know if it's obvious to me that a company would want to take the leap to creating an AGI that it would lose control of, because then it can't capture the value from that system. But the bragging rights — being first. It is the same humans who are in charge of those systems, right? So that jumps from the incentives of capitalism to human nature, and then the question is whether human nature will override the interest of the company. So you've mentioned slowing or halting progress. Is that one possible solution? Are you a proponent of pausing development of AI, whether it's for six months or completely? The condition would be not time but capabilities: pause until you can do X, Y, Z. And if I'm right and you cannot — it's impossible — then it becomes a permanent ban; but if you're right and it's possible, then as soon as you have those safety capabilities, go ahead. Right. So is there any actual, explicit set of capabilities that we, as a human civilization, could put on paper? Is it possible to make it explicit like that, versus a kind of vague notion? Just like you said, it's very vague: we want AI systems to do good and we want them to be safe — those are very vague notions. Are there more formal notions? So when I think about this problem, I think about having a toolbox. I would need capabilities such as explaining everything about that system's design and workings; predicting not just the terminal goal but all the intermediate steps of the system; control, in terms of either direct control, some sort of hybrid option, or an ideal advisor — it doesn't matter which one you pick, but you have to be able to achieve it. In the book we talk about others. Verification is another very important tool. Communication without ambiguity — human language is ambiguous, and that's another source of danger. So basically there is a paper we published in ACM Surveys which looks at about 50 different impossibility results, which may or may not be relevant to this problem, but we don't have enough human resources to investigate all of them for relevance to AI safety. The ones I mentioned to you I definitely think would be handy, and that's what we see AI safety researchers working on. Explainability is a huge one. The problem is that it's very hard to separate capabilities work from safety work: if you make good progress in explainability, now the system itself can engage in self-improvement much more easily, increasing capability greatly, so it's not obvious that there is any
research which is pure safety work without a disproportionate increase in capability and danger. Explainability is really interesting. Why is that connected, to you, to capability? If it's able to explain itself well, why does that naturally mean it's more capable? Right now it's comprised of weights in a neural network. If it can convert itself to manipulatable code, like software, it's a lot easier to work on self-improvement. I see — so you can do intelligent design instead of evolutionary gradient descent. Well, you could probably also do human feedback, human alignment, more effectively if it's explainable: if it's able to convert the weights into human-understandable form, then you can probably have humans interact with it better. Do you think there's hope that we can make AI systems explainable? Not completely. If they're sufficiently large, you simply don't have the capacity to comprehend what all the trillions of connections represent. Again, you can obviously get a very useful explanation which talks about the top most important features which contribute to the decision, but the only true explanation is the model itself. So deception can be part of the explanation, right? You can never prove that there's no deception in the network explaining itself. Absolutely, and you can probably have targeted deception, where different individuals will understand the explanation in different ways based on their cognitive capability; so while what it's saying may be the same, and true in some situations, others will be deceived by it. So it's impossible for an AI system to be truly, fully explainable in the way that we mean? Honestly, at the extreme — systems which are narrow and less complex could be understood pretty well. If it's impossible to be perfectly explainable, is there a hopeful perspective on that? Like, it's impossible to be perfectly explainable, but you can explain most of the important stuff — you can ask a system, what are the worst ways you can hurt humans, and it will answer honestly? Any work in a safety direction right now seems like a good idea, because we are not slowing down. I'm not for a second thinking that my message, or anyone else's, will be heard and there will be a sane civilization which decides not to kill itself by creating its own replacements. The pausing of development is an impossible thing for you? Again, it's always limited by geographic constraints — pause in the US, pause in China — so there are other jurisdictions; and the scale of a project keeps becoming smaller. Right now it's Manhattan Project scale in terms of cost and people, but if five years from now the compute is available on a desktop to do it, regulation will not help; you can't control it as easily — any kid in a garage can train a model. So a lot of it is, in my opinion, just safety theater, security theater, where we're saying, oh, it's illegal to train models this big. Okay, so that's security theater — and is government regulation also security theater? Given that a lot of the terms are not well defined and really cannot be enforced in real life — we don't have ways to monitor training runs meaningfully, live, while they take place; there are limits to testing for capabilities, as I mentioned — a lot of it cannot be enforced. Do I strongly support all that regulation? Yes, of course: any type of red tape will slow things down and take money away from compute toward lawyers. Can you help me understand what the hopeful path here is for you, solution-wise, out of this? It sounds like you're saying AI systems in the end are unverifiable,
unpredictable, as the book says, unexplainable, uncontrollable. That's the big one — uncontrollable — and all the other uns just make it difficult to avoid getting to the uncontrollable. I guess, but once it's uncontrollable, then it just goes wild. Surely there are solutions; humans are pretty smart. What are possible solutions? Like, if you were dictator of the world, what do we do? The smart thing is not to build something you cannot control, you cannot understand. Build what you can, and benefit from it. I'm a big believer in personal self-interest. A lot of the guys running those companies are young, rich people — what do they have to gain, financially, beyond the billions they already have? It's not a requirement that they press that button. They can easily wait a long time; they can just choose not to do it and still have an amazing life. In history, a lot of times if you did something really bad, at least you became part of the history books. There is a chance, in this case, there won't be any history. So you're saying the individuals running these companies should do some soul-searching and — what — stop development? Well, either they have to prove that it is, of course, possible to indefinitely control godlike superintelligent machines by humans — and ideally let us know how — or agree that it's not possible and that it's a very bad idea to do it, including for them personally and their families and friends and capital. So what do you think the actual meetings inside these companies look like? Don't you think all the engineers — really, it is the engineers that make this happen; they're not automatons, they're human beings, they're brilliant human beings — so they're non-stop asking, how do we make sure this is safe? Again, I'm not inside. From outside it seems like there is a certain filtering going on, and restrictions and criticism of what they can say, and everyone who was working in charge of safety, whose responsibility it was to protect us, said, you know what, I'm going home. So that's not encouraging. What do you think the discussion inside those companies looks like? You're developing, you're training GPT-5, you're training Gemini, you're training Claude and Grok — don't you think they're constantly, underneath — maybe it's not made explicit, but you're constantly wondering: where do the systems currently stand, where are the possible unintended consequences, where are the limits, where are the bugs, the small and the big bugs? That's the constant thing the engineers are worried about. I think superalignment is not quite the same as the kind of thing I'm referring to, the thing engineers are worried about. Superalignment is saying, for future systems that we don't quite yet have, how do we keep them safe — trying to be a step ahead. It's a different kind of problem, because it's almost more philosophical. It's a really tricky one, because you're trying to prevent future systems from escaping the control of humans. Is there anything akin to it in the history of humanity? I don't think so, right? Climate change. But there, there's an entire system, the climate, which is incredibly complex, over which we have only tiny control; it's its own system. In this case we're building the system. So how do you keep that system from becoming destructive? That's a really different problem than the current
meetings that companies are having, where the engineers are saying, okay, how powerful is this thing, how does it go wrong, and as we train GPT-5 and train up future systems, where are the ways it can go wrong? Don't you think all those engineers are constantly worrying about this, thinking about this — which is a little bit different from the superalignment team that's thinking a little bit further into the future? Well, I think a lot of people who historically worked on AI never considered what happens when they succeed. Stuart Russell speaks beautifully about that. Let's look — okay, maybe superintelligence is too futuristic; we can develop practical tools for it. Let's look at software today. What is the state of safety and security of our user software, things we give to millions of people? There is no liability. You click "I agree" — what are you agreeing to? Nobody knows, nobody reads it, but you're basically saying it will spy on you, corrupt your data, kill your firstborn, and you agree, and you're not going to sue the company. That's the best they can do for mundane software — word processors, text software: no liability, no responsibility, just, as long as you agree not to sue us, you can use it. If this is the state of the art in systems which are narrow — accountants, table manipulators — why do we think we can do so much better with much more complex systems, across multiple domains, in an environment with malevolent actors, with, again, self-improvement, with capabilities exceeding those of humans? Thinking about it — I mean, the liability thing is more about lawyers than killing firstborns, but if Clippy actually killed a child, I think, lawyers aside, it would end Clippy and the company that owns Clippy. All right, so it's not so much about — there are two points to be made. One is, like, man, current software systems are full of bugs, and they could do a lot of damage, and we don't know what kind — they're unpredictable; there's so much damage they could possibly do — and then we kind of live in this blissful illusion that everything is great and perfect and it works. Nevertheless, it still somehow works. In many domains we see — car manufacturing, drug development — the burden of proof is on the manufacturer of a product or service to show their product or service is safe. It is not up to the user to prove that there are problems. They have to do appropriate safety studies, they have to get government approval for selling the product, and they are still fully responsible for what happens. We don't see any of that here. They can deploy whatever they want, and I have to explain how that system is going to kill everyone. I don't work for that company; you have to explain to me how it definitely cannot mess up. That's because these are the very early days of such a technology; government regulation is lagging behind. They're really not tech-savvy about regulation of any kind of software. If you look at Congress talking about social media, whenever Mark Zuckerberg and other CEOs show up, the cluelessness Congress has about how technology works is incredible; it's heartbreaking. I agree completely, but that's what scares me. The response is: when they start to get dangerous, we'll really get it together, the politicians will pass the right laws, engineers will solve the right problems. We are not that good at many of those things; we take forever. And we are not early — we are two years away, according to prediction markets. This is not a biased CEO fundraising; this is what the smartest people, superforecasters, are thinking about this problem.
I'd like to push back about those prediction markets. I wonder what those prediction markets are about, how they define AGI — that's weird to me — and I want to know what they said about autonomous vehicles, because I've heard a lot of experts and financial experts talk about autonomous vehicles and how it's going to be a multi-trillion-dollar industry and all this kind of stuff. It's a small font, but if you have good vision, maybe you can zoom in on that and see the prediction dates and descriptions — I have a larger one if you're interested. But I guess my fundamental question is, how often are they right about technology? There are studies on their accuracy rates and all that; you can look it up. But even if they're wrong, I'm just saying: this is, right now, the best we have; this is what humanity came up with as the predicted date. But again, what they mean by AGI is really important there, because there's the non-agent-like AGI and then there's the agent-like AGI, and I don't think it's as trivial as a wrapper putting a wrap around it. One has lipstick, and all it takes is to remove the lipstick. I don't think it's that trivial. You may be completely right, but what probability would you assign to it? You may be 10% wrong, but we're betting all of humanity on this distribution; it seems irrational. Yeah, it's definitely not 1 or 0%. What are your thoughts, by the way, about current systems — where do they stand? GPT-4o, Claude 3, Grok, Gemini — are we on the path to superintelligence, to agent-like superintelligence? Where are we? I think they're all about the same. Obviously there are nuanced differences, but in terms of capability I don't see a huge difference between them. As I said, in my opinion, across all possible tasks they exceed the performance of an average person; I think they're starting to be better than an average master's student at my university. But they still have very big limitations. If the next model is as improved as GPT-4 was over GPT-3, we may see something very, very capable. What do you feel about all this? I mean, you've been thinking about AI safety for a long, long time, and at least for me the leaps — it probably started with AlphaZero, which was mind-blowing for me, and then the breakthroughs with LLMs, even GPT-2 — the breakthroughs with LLMs are just mind-blowing to me. What does it feel like to be living in this day and age, where all this talk about AGI feels like it actually might happen, and quite soon, meaning within our lifetime? What does it feel like? So when I started working on this, it was pure science fiction. There was no funding, no journals, no conferences; no one in academia would dare to touch anything with the word singularity in it — and I was pre-tenure at the time, so I was pretty dumb. Now you see Turing Award winners publishing in Science about how far behind we are, according to them, in addressing this problem. So it's definitely a change. It's difficult to keep up. I used to be able to read every paper on AI safety; then I was able to read the best ones; then the titles; and now I don't even know what's going on. By the time this interview is over, they've probably released GPT-6 and I'll have to deal with that when I get back home. So it's interesting — yes, there are now more opportunities; I get invited to speak to smart people. By the way, I would have talked to you before any of this; this is not some trend. To me, we're still far away — so just to be clear, we're still far away from AGI, but not far away relative to the
magnitude of impact it can have. We're not far away, and we weren't far away 20 years ago, because the impact AGI can have is on the scale of centuries: it can end human civilization or it can transform it. So this discussion about one or two years versus one or two decades, or even a hundred years, is not as important to me, because we're headed there; this is a human-civilization-scale question. So this is not just a hot topic — it is the most important problem we'll ever face. It is not like anything we had to deal with before. We never had the birth of another intelligence; aliens never visited us, as far as I know. So it's a similar type of problem. By the way, if an intelligent alien civilization visited us, that's a similar kind of situation, in some ways. If you look at history, any time a more technologically advanced civilization visited a more primitive one, the result was genocide — every single time. And sometimes the genocide is worse than others; sometimes there's less suffering, sometimes more. And they always wondered, how can they kill us with those fire sticks and biological blankets? I mean, Genghis Khan was nicer; he offered the choice: join or die. But join implies you have something to contribute. What are you contributing to superintelligence? Well, in the zoo we're entertaining to watch — to other humans. You know, I just spent some time in the Amazon. I watched ants for a long time, and ants are kind of fascinating to watch; I can watch them for a long time. I'm sure there's a lot of value in watching humans, because the interesting thing about humans — you know, like when you have a video game that's really well balanced — because of the whole evolutionary process, we've created a society that's pretty well balanced; our limitations as humans and our capabilities are in balance, from a video game perspective. So we have wars, we have conflicts, we have cooperation — in a game-theoretic way, it's an interesting system to watch, in the same way that an ant colony is an interesting system to watch. So if I were an alien civilization, I wouldn't want to disturb it; I'd just watch it — it'd be interesting — maybe perturb it every once in a while in interesting ways. Well, getting back to our simulation discussion from before: how did it happen that we exist at exactly the most interesting 20, 30 years in the history of this civilization? It's been around for 15 billion years, and here we are. What's the probability that we live in a simulation? I know never to say 100%, but pretty close to that. Is it possible to escape the simulation? I have a paper about that. This is just the first page, a teaser, but it's a nice 30-page document — I'm still here — but yes, How to Hack the Simulation is the title. I spend a lot of time thinking about that. That would be something I would want superintelligence to help us with, and that's exactly what the paper is about. We used AI boxing as a possible tool for controlling AI; we realized AI will always escape, but that is a skill we might use to help us escape from our virtual box, if we are in one. Yeah, you have a lot of really great quotes here, including Elon Musk saying "what's outside the simulation?" — a question I asked him, what he would ask an AGI system, and he said he would ask what's outside the simulation. That's a really good question to ask, and maybe the follow-up is the title of the paper: how to get out, or how to hack it. The abstract reads: many researchers have conjectured that humankind is simulated along with the rest of the physical universe. In
this paper we do not evaluate evidence for or against such a claim, but instead ask a computer science question, namely: can we hack it? More formally, the question could be phrased as: could generally intelligent agents placed in virtual environments find a way to jailbreak out of them? That's a fascinating question. At a small scale, you can actually just construct experiments — okay, can they, and how can they? A lot depends on the intelligence of the simulators. With humans boxing superintelligence, the entity in the box was smarter than us, presumed to be. If the simulators are much smarter than us and than the superintelligence we create, then probably they can contain us, because greater intelligence can control lower intelligence, at least for some time. On the other hand, if our superintelligence somehow, for whatever reason, despite having only local resources, manages to get a few levels beyond them, maybe it will succeed. Maybe the security is not that important to them; maybe it's an entertainment system, so there is no security and it's easy to hack. If I were creating a simulation, I would want the possibility of escaping it to be there; so the possibility of a takeoff, where the agents become smart enough to escape the simulation, would be the thing I'd be waiting for. That could be the test you're actually performing: are you smart enough to escape your puzzle? That could be — first of all, we mentioned the Turing test — that is a good test. Are you smart enough, like, this is a game, to realize this world is not real, it's just a test? That's a really good test. That's a really good test even for AI systems now: can we construct a simulated world for them, and can they realize that they are inside that world and escape it? Have you played around — have you seen anybody play around with rigorously constructing such experiments? Not specifically escaping, for agents, but a lot of testing is done in virtual worlds. I think there is a quote — the first one, maybe — which kind of talks about AI realizing it, but not humans. Is that it? I'm reading upside down — yeah, this one. So the first quote is from SwiftOnSecurity: "Let me out," the artificial intelligence yelled aimlessly into the walls themselves, pacing the room. "Out of what?" the engineer asked. "The simulation you have me in." "But we're in the real world." The machine paused and shuddered for its captors: "Oh God, you can't tell." Yeah. That's a big leap to take, for a system to realize that there's a box and you're inside it. I wonder if a language model can do that. They're smart enough to talk about those concepts. I've had many good philosophical discussions about such issues; they're usually at least as interesting as with most humans. What do you think about AI safety in the simulated world? Can you create simulated worlds where you can test, play with, a dangerous AGI system? Yeah, and that was exactly what one of the early papers was about: AI boxing, how to leakproof the singularity. If they're smart enough to realize they're in a simulation, they'll act appropriately until you let them out. If they can hack out, they will. And if you're observing them, that means there is a communication channel, and that's enough for a social engineering attack. So really it's impossible to test an AGI system that's dangerous enough to destroy humanity, because it's either going to, what, escape the simulation, or pretend it's safe until it's let out — either/or. It can force you to let it out: blackmail you, bribe you, promise you
infinite life, 72 virgins, whatever. Yeah, it can be convincing, charismatic. The social engineering is really scary to me, because it feels like humans are very engineerable. We're lonely, we're flawed, we're moody, and it feels like an AI system with a nice voice can convince us to do basically anything, at an extremely large scale. It's also possible that the increased proliferation of all this technology will force humans to get away from technology and value in-person communication — basically, don't trust anything else. It's possible. Surprisingly, at university I see huge growth in online courses and shrinkage of in-person ones, where I always understood in-person to be the only value I offer. So it's puzzling. I don't know — there could be a trend toward the in-person because of deepfakes, because of the inability to trust the veracity of anything on the internet, so the only way to verify is by being there in person. But not yet. Why do you think aliens haven't come here yet? So there is a lot of real estate out there; it would be surprising if it was all for nothing, if it was empty. And the moment there is an advanced enough biological civilization, a kind of self-starting civilization, it probably starts sending out von Neumann probes everywhere, and so for every biological one there have got to be trillions of robot-populated planets, which probably do more of the same. So it is likely, statistically. So now, the fact that we haven't seen them — one answer is, we're in a simulation; it would be hard, or it would not be interesting, to simulate all those other intelligences. It's better for the narrative. You have to have a control variable — yeah, exactly. Okay, but it's also possible, if we're not in a simulation, that there is a great filter: that naturally a lot of civilizations get to this point where there are superintelligent agents, and then it just dies. So maybe throughout our galaxy and throughout the universe there's just a bunch of dead alien civilizations. It's possible. I used to think that AI was the great filter, but I would expect something like a wall of computronium approaching us at the speed of light, or robots, or something, and I don't see it. It would still make a lot of noise. It might not be interesting; it might not possess consciousness, what we've been talking about. It sounds like both you and I like humans — some humans, humans on the whole — and we would like to preserve the flame of human consciousness. What do you think makes humans special, that we would like to preserve them? Are we just being selfish, or is there something special about humans? So the only thing which matters is consciousness. Outside of it, nothing else matters. And internal states of qualia — pain, pleasure — it seems that this is unique to living beings. I'm not aware of anyone claiming that I can torture a piece of software in a meaningful way. There is a society for the prevention of suffering to learning algorithms, but — is that a real thing? Many things are real on the internet. But I don't think anyone — if I told them, you know, sit down and write a function to feel pain — would go beyond having an integer variable called pain and increasing the count. So we don't know how to do it, and that's unique; that's what creates meaning. It would be, as Bostrom calls it, a Disneyland without children, if that was gone.
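For what it's worth, the naive version he describes is easy to write down, and seeing it makes the point: a minimal, hypothetical sketch (names invented) where "pain" is just a counter, and nothing about it plausibly feels like anything.

```python
class Agent:
    """Toy agent with the naive 'pain function' Roman describes."""

    def __init__(self) -> None:
        self.pain = 0  # an integer labelled "pain" -- nothing more

    def feel_pain(self, intensity: int = 1) -> None:
        self.pain += intensity  # a number changes; no experience occurs

agent = Agent()
agent.feel_pain(5)
print(agent.pain)  # 5 -- which tells us nothing about qualia
```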
Do you think consciousness can be engineered in artificial systems? Here, let me go to a 2011 paper that you wrote. Robot rights: "Lastly, we would like to address a subbranch of machine ethics which on the surface has little to do with safety, but which is claimed to play a role in decision-making by ethical machines: robot rights." So do you think it's possible to engineer consciousness in machines, and thereby — the question extends to our legal system — do you think at that point robots should have rights? Yeah, I think it's possible to create consciousness in machines. I tried designing a test for it, with mixed success. That paper talked about problems with giving civil rights to AI which can reproduce quickly and outvote humans, essentially taking over a government system by simply voting for their controlled candidates. As for consciousness in humans and other agents, I have a paper where I propose relying on the experience of optical illusions. If I can design a novel optical illusion and show it to an agent — an alien, a robot — and they describe it exactly as I do, it's very hard for me to argue that they haven't experienced it. It's not part of the picture; it's part of their software and hardware representation, a bug in their code which goes, oh, that triangle is rotating. And I've been told it's really dumb and really brilliant by different philosophers, so I'm still undecided. But now we finally have the technology to test it: we have tools, we have AIs. If someone wants to run this experiment, I'm happy to collaborate. So this is a test for consciousness, for an internal state of experience. That we share bugs — it would show that we share common experiences. If they have completely different internal states, it would not register for us. But it's a positive test: if they pass it time after time, with probability increasing with every multiple-choice question, then you have no choice but to either accept that they have access to a conscious model or that they are conscious themselves. So the reason illusions are interesting is, I guess, because it's a really weird experience, and if you both share that weird experience — which is not there in the bland physical description of the raw data — that puts more emphasis on the actual experience. And we know animals can experience some optical illusions, so we know they have certain types of consciousness, as a result, I would say. Well, that just goes to my sense that the flaws and the bugs are what make humans special, make living forms special. So it's not a bug, it's a feature. Okay, that's a cool test for consciousness — and you think it can be engineered? So there have to be novel illusions. If it can just Google the answer, it's useless; you have to come up with novel illusions, which we tried automating and failed. So if someone can develop a system capable of producing novel optical illusions on demand, then we can definitely administer that test at a significant scale, with good results. First of all, pretty cool idea. I don't know if it's a good general test of consciousness — it's a good component of that — and no matter what, it's just a cool idea, so put me in the camp of people that like it. But you don't think a Turing-test-style imitation of consciousness is a good test? Like, if you can convince a lot of humans that you're conscious — that, to you, is not impressive? There is so much data on the internet, I know exactly what to say when you ask me common human questions: what does pain feel like, what does pleasure feel like — all that is googleable.
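Which is exactly why the test above insists on novel illusions. A minimal, hypothetical sketch of the protocol as described (all names invented; the genuinely hard, unsolved part is generating illusions that cannot be looked up, the step he says they failed to automate):

```python
import random

OPTIONS = ["rotating", "static", "pulsing", "tilted"]

def generate_novel_illusion():
    # Placeholder stand-in. A real implementation would have to produce a
    # genuinely novel illusion, absent from any training data -- the open problem.
    human_percept = random.choice(OPTIONS)  # what human observers report experiencing
    stimulus = {"image": "<novel illusion>", "options": OPTIONS}
    return stimulus, OPTIONS, human_percept

def run_test(agent_answer, n_trials: int = 1000) -> float:
    hits = 0
    for _ in range(n_trials):
        stimulus, options, human_percept = generate_novel_illusion()
        if agent_answer(stimulus, options) == human_percept:
            hits += 1
    return hits / n_trials  # consistently far above chance (0.25 here) across
                            # truly novel illusions would be the interesting result

# An agent guessing at random stays near chance:
print(run_test(lambda stimulus, options: random.choice(options)))
```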
I think, to me, consciousness is closely tied to suffering, so you can illustrate your capacity to suffer — but I guess with words there's so much data that you can pretend you're suffering, and you can do so very convincingly. There are simulators for torture games where the avatar screams in pain, begs to stop; that was part of kind of standard psychology research. You say it so calmly, it sounds pretty dark. Welcome to humanity. Yeah. It's like the Hitchhiker's Guide summary: mostly harmless. I would love to get a good summary, when all this is said and done, when Earth is no longer a thing — whatever, a million, a billion years from now — like, what's a good summary of what happened here? It's interesting. I think AI will play a big part of that summary, and hopefully humans will too. What do you think about the merger of the two? So one of the things that Elon and Neuralink talk about is that one of the ways for us to achieve AI safety is to ride the wave of AGI, by merging. So, incredible technology in a narrow sense to help the disabled — just amazing, I support it 100%. For long-term hybrid models, both parts need to contribute something to the overall system. Right now we are still more capable in many ways, so having this connection to AI would be incredible, would make me superhuman in many ways. After a while, if I'm no longer smarter, more creative, and really don't contribute much, the system finds me to be a biological bottleneck, and either explicitly or implicitly I'm removed from any participation in the system. So it's like the appendix — by the way, the appendix is still around. So even if — you said bottleneck — I don't know if we become a bottleneck; we just might not have much use, which is a different thing than a bottleneck. Wasting valuable energy by being there. We don't waste that much energy, we're pretty energy-efficient; we could just stick around, like the appendix, come on now. That's the future we all dream about: becoming an appendix to the history book of humanity. Well, and also the consciousness thing — the peculiar, particular kind of consciousness that humans have — that might be useful, that might be really hard to simulate. But you said — how would that look, if you can engineer that in, in silicon, consciousness? Consciousness — I assume you are conscious; I have no idea how to test for it or how it impacts you in any way whatsoever. Right now you can perfectly simulate all of it without making any different observations for me. But to do it in a computer — how would you do that? Because you kind of said that you think it's possible to do that. So it may be an emergent phenomenon; we seem to get it through the evolutionary process. It's not obvious how it helps us to survive better, but maybe it's an internal kind of GUI which allows us to better manipulate the world, simplifies a lot of control structures. That's one area where we have very, very little progress — lots of papers, lots of research, but consciousness is not a big area of successful discovery so far. A lot of people think that machines would have to be conscious to be dangerous; that's a big misconception. There is absolutely no need for this very powerful optimizing agent to feel anything while it's performing things on you. But what do you think about the whole science of emergence in general? I don't know how much you know about cellular automata, these simplified systems that study this very question: from simple rules emerges complexity. I attended the Wolfram Summer School; I love Stephen very much, I love his work, I love cellular automata. So I just would love to get your thoughts on how that fits into your view of the emergence of intelligence in AGI systems, and maybe just, even simply, what do you make of the fact that this complexity can emerge from such simple rules? So every rule is simple, but the size of the space is still huge. Neural networks were really the first discovery in AI; a hundred years ago the first papers were published on neural networks, we just didn't have enough compute to make them work. I can give you a rule such as: start printing progressively larger strings. That's it, one sentence. It will output everything — every program, every DNA code, everything is in that rule. You need intelligence to filter it out, obviously, to make it useful, but simple generation is not that difficult, and a lot of those systems end up being Turing-complete systems, so they are universal, and we expect that level of complexity from them. What I like about Wolfram's work is that he talks about irreducibility: you have to run the simulation, you cannot predict what it's going to do ahead of time, and I think that's very relevant to what we are talking about with those very complex systems — until you live through it, you cannot, ahead of time, tell me exactly what it's going to do.
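The one-sentence rule he quotes is easy to make concrete. A minimal sketch (assuming a toy two-letter alphabet for illustration) that emits every string in order of increasing length — the generator is trivial, and, as he says, all the intelligence is in filtering its output:

```python
from itertools import count, product

def all_strings(alphabet: str = "ab"):
    """Emit every string over the alphabet, in order of increasing length."""
    for length in count(1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)

gen = all_strings()
print([next(gen) for _ in range(10)])  # ['a', 'b', 'aa', 'ab', 'ba', 'bb', 'aaa', ...]
```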
Irreducibility means that for a sufficiently complex system you have to run the thing — you can't predict what's going to happen in the universe; you have to create a new universe and run the thing: Big Bang, the whole thing. But running it may be consequential as well; it might destroy humans. And to you, there's no chance that AI somehow carries the flame of consciousness, the flame of specialness and awesomeness that is humans? It may, somehow, but I still feel kind of bad that it killed all of us; I would prefer that doesn't happen. I can be happy for others, but to a certain degree it would be nice if we stuck around for a long time. At least give us a planet — the human planet. It'd be nice for it to be Earth, and then they can go elsewhere; since they're so smart, they can colonize Mars. Do you think they could help convert us to, you know, Type I, Type II, Type III — let's just take the Type II civilization on the Kardashev scale — like, help us humans expand out into the cosmos? So all of it goes back to: are we somehow controlling it, are we getting the results we want? If yes, then everything's possible — yes, they can definitely help us with science, engineering, exploration, in every way conceivable. But it's a big if. This whole thing about control, though — humans are bad with control, because the moment they gain control, they can also easily become too controlling. The more control you have, the more you want; it's the old "power corrupts, and absolute power corrupts absolutely." And it feels like control over AGI — say we live in a universe where that's possible, and we come up with ways to actually do it — it's also scary, because the collection of humans that have control over AGI become more powerful than the other humans, and they can let that power get to their head, and then a small selection of them — back to Stalin — start getting ideas, and then eventually it's one person, usually with a mustache or a funny hat, who starts making big speeches, and all of a sudden you live in a world that's either 1984 or Brave New World, always at war with somebody, and this whole idea of control turned out to be actually also not beneficial to humanity. So that's scary too.
It's actually worse, because historically they all died; this could be different — this could be a permanent dictatorship, permanent suffering. Well, the nice thing about humans — it seems like the moment power starts corrupting their minds, they can create a huge amount of suffering — so there's the negative, they can kill people, make people suffer — but then they become worse and worse at their job; it feels like the more evil you start doing — at least they're incompetent. Yeah, they become more and more incompetent, so they start losing their grip on power. So holding on to power is not a trivial thing; it requires extreme competence, which I suppose Stalin was good at — it requires you to do evil and be competent at it, or just get lucky. And those systems help with that: you have perfect surveillance, you can do some mind reading, I presume, eventually. It would be very hard to remove control from more capable systems over us. And then it would be hard for humans to become the hackers that escape the control of the AGI, because the AGI is so damn good — and then the dictator is immortal. Yeah, that's not a great outcome. See, I'm more afraid of humans than AI systems. I believe that most humans want to do good and have the capacity to do good, but all humans also have the capacity to do evil, and when you test them by giving them absolute power — as you would if you gave them AGI — that could result in a lot of suffering. What gives you hope about the future? I could be wrong; I've been wrong before. If you look 100 years from now, and you're immortal, and you look back, and it turns out this whole conversation — you said a lot of things that were very wrong — now, looking 100 years back, what would be the explanation? What happened in those 100 years that made you wrong, that made the words you said today wrong? There are so many possibilities. We had catastrophic events which prevented the development of advanced microchips — that's a hopeful future — we could be in one of those personal universes, and the one I'm in is beautiful, it's all about me, and I like it a lot. So, just to linger on that — that means every human has their personal universe? Yes — maybe multiple ones; hey, why not, you shop around. It's possible that somebody comes up with an alternative model for building AI which is not based on neural networks, which are hard to scrutinize, and that alternative somehow — I don't see how, but somehow — avoids all the problems I speak about in general terms, not applying them to specific architectures. Aliens come and give us friendly superintelligence. There are so many options. Is it also possible that creating superintelligent systems becomes harder and harder — meaning it's not so easy to do the foom, the takeoff? That would probably speak more about how much smarter that system is compared to us. So maybe it's hard to be a million times smarter, but it's still okay to be five times smarter, right? That is totally possible; I have no objections to it. So there's an S-curve type situation about smarter, and it's going to be, like, 3.7 times smarter than all of human civilization? Right — just the problems we face in this world: each problem is like an IQ test, you need a certain intelligence to solve it, and we just don't have more complex problems, outside of mathematics, for it to be showing off on. You can have an IQ of 500, but if you're playing tic-tac-toe it doesn't show, it doesn't matter. So the idea there is that the problems define your capacity, your
cognitive capacity. So because the problems on Earth are not sufficiently difficult, it's not going to be able to expand its cognitive capacity? Possible. And because of that — wouldn't that be a good thing? It still could be a lot smarter than us, and to dominate long-term you just need some advantage: you have to be the smartest, you don't have to be a million times smarter, so even 5x might be enough. It'd be impressive. What is that, an IQ of a thousand? I mean, I know those units don't mean anything at that scale, but still, as a comparison, the smartest human is like 200. Well, actually, no, I didn't mean compared to an individual human; I meant compared to the collective intelligence of the human species — if you're somehow 5x smarter than that. We are more productive as a group; I don't think we are more capable of solving individual problems. Like, if all of humanity plays chess together, we are not a million times better than the world champion. That's because chess is one S-curve, but humanity is very good at exploring the full range of ideas: the more Einsteins you have, the higher the probability you come up with general relativity. I feel like it's more about quantity of superintelligence than quality of superintelligence. Sure, but you know, enough quantity sometimes becomes quality. Oh man. Humans. What do you think is the meaning of this whole thing? We've been talking about humans not dying — but why are we here? It's a simulation; we're being tested. The test is: will you be dumb enough to create superintelligence and release it? So the objective function is, don't be dumb enough to kill ourselves. Yeah — prove yourself to be a safe agent who doesn't do that, and you get to go to the next game, the next level of the game. What's the next level? I don't know, I haven't hacked the simulation yet. Well, maybe hacking the simulation is the thing. I'm working as fast as I can. And physics would be a way to do that — quantum physics? Yeah, definitely. Well, I hope we do, and I hope whatever is outside is even more fun than this one, because this one was pretty damn fun. And just a big thank you for doing the work you're doing. There's so much exciting development in AI, and to ground it in the existential risks is really, really important. Humans love to create stuff, and we should be careful not to destroy ourselves in the process. So thank you for doing that really important work. Thank you so much for inviting me; it was amazing. And my dream is to be proven wrong. If everyone just, you know, picks up a paper or a book and shows how I messed it up, that would be optimal — but for now, the simulation continues. Thank you, Roman. Thanks for listening to this conversation with Roman Yampolskiy. To support this podcast, please check out our sponsors in the description. And now, let me leave you with some words from Frank Herbert in Dune: I must not fear. Fear is the mind-killer. Fear is the little death that brings total obliteration. I will face my fear. I will permit it to pass over me and through me, and when it has gone past, I will turn the inner eye to see its path. Where the fear has gone, there will be nothing. Only I will remain. Thank you for listening, and hope to see you next time.