Yandere Simulator Performance Optimization Review

Hmmmm, well, well, well, what do we have here? Well, it's finally here: the Yandere Simulator code review. I have a lot of stuff to talk about. But before we begin, I want to make it very clear, because last time it didn't get through your thick fucking skulls: no bullying. Anyway, as you can see from how long this video is, it has been two months in the making. I started working on this using the May 15th build, but since then three other builds have come out, on June 1st, June 15th, and July 2nd. I've been corresponding with YandereDev to fix some of these issues, and some of the fixes have already been implemented in these builds.
So what problems does Yandere Simulator have? Players are frustrated with the slow progression of development and how poorly the game runs. I've spent a lot of time investigating every little nook and cranny of this game, the C# compiler, and the Unity editor. In the live stream I looked at decompiled code, but now we have the actual leaked source code. I'm pretty confident in saying that you will not find a better analysis of the performance and architectural problems in Yandere Simulator.
I need to get a feel for how the game runs in order to know what to look for. It's important to note that for my initial benchmarks I used Wine to run the game via the Lutris installer, and that's why the text is missing. I booted up Yandere Simulator, avoiding unfocusing the window because that would cause it to crash. Hopefully that's a Wine-specific bug, not a game bug. After making sure VSync was off and adjusting Yandere's breasts to the correct size, I walked around school. Right away, I looked at the very conveniently placed FPS counter on the right side of my screen, and I noticed that I was not hitting 60 FPS. I thought this was pretty strange considering my PC's specs. Now admittedly, this initial load into the school scene is very heavy. A fade from white is cleverly used to cover up the first two frames, where game initialization stuff happens.
Looking at the output of htop and nvidia-smi, we can clearly see that neither my CPU nor my GPU is being maxed out. To confirm my suspicions, I ran Doom 2016. Here you can clearly see that my GPU is being used way more. I also ran Hitman 2, a game whose technical problems are very similar to Yandere Simulator's, and the same thing happened. This could mean that the bottleneck for framerate in Yandere Simulator isn't the CPU or GPU, but I don't think so, because that could just be a Wine bug and not an issue with Yandere Simulator. I quickly realized that I would be unable to do anything but speculate unless I could run Unity's profiler on the game.
In order to do that, I would need the full Unity project, so I kindly asked YandereDev for the source code. Unfortunately, he declined, and he was concerned when I mentioned decompiling the game and sort of implied that he didn't want me to do it. So I did it anyway. After about five days of performing a dark magic ritual known only to the compiler gods, I had a working Unity project for Yandere Simulator. I think I'm the first person to ever get Yandere Simulator running inside of the Unity editor. Of course, it's not perfect. Most of the shaders don't work. There's a little bit of z-fighting here and there. Who cares? Now, before you ask, no, I can't redistribute this. Sorry, modders.
After running the profiler, I was pleasantly surprised to find that most of the time was being spent rendering. More specifically, drawing opaque geometry was taking the most time. Changing to wireframe mode, we can see a lot of geometry. Like, a lot. On the students in particular. Since these models were sourced from volunteers, I would be willing to bet that most of the artists aren't aware of best practices for optimizing assets for games. I won't pretend to know them either, but I do know that the more vertices there are, the worse performance becomes.
For example, the time to compute animations depends on how many vertices each bone affects. Yandere Simulator's assets need to go through a major optimization sweep, with a particular focus on how much detail needs to be present given the context in which the asset is used. For example, the school perimeter wall is partly optimized. Looking from above, we can see the wall has zero thickness. So this means it should just be one quad, since you only need four vertices to make a flat rectangle. But at the top of the wall, there's additional unnecessary geometry. Additionally, the fancy pillars on the walls have triangles that are never visible to the player. Optimized assets would not only improve frame rate, but would also improve load times, memory usage, and game size, because the file sizes would be smaller.
However, one optimization that is being used is combined meshes. This means that fewer draw calls are executed. When a build is created, Unity automatically combines meshes that are only used in one scene and are not referenced by any scripts. But this could probably be optimized a bit more manually.
In games, LOD refers to the practice of lowering the level of detail in 3D models based on their distance from the camera. Most of the time, this requires artists to make two or three alternative, lower-poly models that get progressively less detailed. There are experimental tools to generate these models automatically, but I wouldn't use anything this experimental on a real project. Actual LOD is not implemented in Yandere Simulator. It's partially there via the low-poly student feature, but this only works for students, and it makes it impossible to tell students apart from each other by anything other than their gender. Honestly, it doesn't make a huge difference to frame rate either.
Occlusion culling is where the camera doesn't render objects if they can't be seen, aka occluded, because other objects are in the way. This is basically a must-have in most games.
Thankfully, Unity provides occlusion culling as a built-in feature, and Yandere Simulator uses this system.
The heart rate monitor line is rendered using a separate camera and overlaid onto the main camera's render, which is completely separate from the UI camera. It uses a Unity LineRenderer to draw the line. This is probably fine, although it could probably be remade as a shader, but shaders are hard. The problem is that this camera takes a long time to render a whole lotta nothing. This is probably the result of having its culling mask set to render the default layer. Everything else is on the default layer, so the heart rate monitor should go on its own layer, and the camera should render only that layer.
A significant amount of time is also dedicated to physics calculations. There seems to be a significant number of colliders or triggers that are not in use or do not have an apparent use. For example, the main entrance doorway has a collider above the door that is not possible to hit. This collider only exists to prevent the camera from clipping inside the geometry, so it can be put on its own layer so that it only has to check for collisions with the camera.
The profiler says that animation calculations take 2.25 to 2.5 milliseconds each frame when the school scene initially loads. This is partially mitigated by the StopAnimationScript, which disables animations on students that are too far away. However, I was able to find a few students that were still animating even though they weren't visible. Some of the character animation components had their culling type set to always animate rather than being based on renderers. We'll come back to this when we talk about implementation details.
A common theory for why performance is poor in Yandere Simulator is that the scripts that YandereDev writes are inefficient. We already know that this is mostly not true, because a lot of time is spent on rendering. However, a good 8 to 10 milliseconds is being spent running scripts.
Looking at the profiler, we can see that most of the execution time for updating individual MonoBehaviours is not even that bad. The worst functions are UIRect.Update and AIBase.Update. UIRect is a class from NGUI, a UI framework for Unity, and AIBase is from the A* Pathfinding Project, a pathfinding library for Unity. Neither the UIRect nor the AIBase class was written by YandereDev, although they could probably be used more efficiently.
Let's take a closer look at the UI. I bet there doesn't need to be 1,600 UI components active all the time. I noticed a high number of PromptScript instances, so I figured that would be a good place to start. Here, the Awake function instantiates all of the UI objects required for that prompt: five of them if the prompt only accepts one button, but if it accepts all four possible buttons, the prompt will instantiate up to 14 UI prefabs, plus an additional one if the prompt is considered noisy. I reckon we found the culprit. If you're looking at optimizing memory usage, this is the first thing I would investigate. Looking where the prompts get instantiated, we can see that most of these UI components are already disabled, except for the objects that are named Letter. Fortunately, this is definitely one of the easier problems to fix, which we'll do later.
Next up is StudentScript, of which there is one call for each of the 85 spawned students. On average, all the calls to StudentScript.Update in one frame have a total execution time of less than a millisecond, despite it being by far the longest script in the game. The next most intensive scripts also take less than a millisecond. These are the prompt scripts, which are used for interacting with objects; the dynamic bones, which are used for hair or anything else that could jiggle, except for tits; YandereScript, which is used for controlling the player; and the highlighter, which is used for highlighting objects in Yandere Vision. All of the other scripts take less than 100 microseconds to execute, with the majority of scripts taking less than 10 microseconds to execute on average.
So where is all that time being spent? I made a quick script to count the number of game objects and components in the school scene. There are about 49,300 game objects in the school scene, give or take 100 or so. Across all of the game objects, there are 53,015 components and 318 unique types of components, excluding transforms. Not all 318 unique scripts run every frame, but about 200 of them do. Most of them only get called one or two times and take less than 10 microseconds to execute, but with that many scripts the microseconds add up. The easy thing to do here is to figure out which scripts don't need to run and disable them until they are needed.
So far we've only been talking about what happens in the Update step. About a third of the script execution time is also being spent on the LateUpdate step. All the instances of DynamicBone take up 50% of this time, and a single UI panel takes up about 40% of this time. The DynamicBone updates make some sense, but the UI updates are really strange. At first, the object that was responsible for this UI panel was called timeless panel, but after messing with it a little bit, the load moved to another object. I was very confused. How could this even be possible? Looking for answers, I switched to deep profiling and found that there are about 1,500 calls to UpdateTransform and UpdateGeometry here. Looking at the code for these functions, there aren't any obvious big loops or anything, just matrix math. Switching to the raw hierarchy mode, we can see that even though there are a lot of calls to these functions, only the first call to UpdateSelf takes a long time to calculate. Matrix multiplication is taking a lot of time, and there are a lot of calls to dynamic lists, but after looking at the code, there isn't really much that is inefficient on its own.
The next thing we can try is to reduce the number of calls to these functions. One by one, I started disabling the panels to see if the number of calls to UpdateTransform would go down. Strangely, it didn't really go down as much as I would expect it to. Turns out, I was running around in circles, because after I disabled the prompt parent, all of the calls went away. These are the things that actually matter for performance.
There's a lot of speculation on the internet about which code is bad or performs poorly, and most of it is wrong. I'm going to go one by one through each of the criticisms that people seem to cling to and check whether or not they're actually true. Note that I'm only focused on performance concerns right now. We'll get into architectural concerns later.
By far the most abundant criticism is the lack of switch statements and the choice to use else-if chains instead. I've benchmarked three different scenarios: else-if versus switch case with one integer input and a boolean output; else-if versus switch case with an integer input and a string output; and switch case versus integer-to-enum typecasting with one integer input and an enum output. Each of these scenarios is run for 10 million iterations. The results show that yes, technically, switch cases are in fact faster than else-if chains, but we had to do that 10 million times in order to see any difference that was statistically significant. And even then, it fluctuates a lot. Nothing in this game runs 10 million times in one frame. Additionally, for the first scenario, where we output a boolean, when the Unity player is built in production mode, the else-if chain gets optimized to a boolean expression, which would be as fast as, if not faster than, a switch statement. Therefore, this point is at best invalid and at worst naive.
A fairly common pattern that is seen in the code is a switch case or else-if chain that converts an integer into an enum or vice versa.
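Here's a rough sketch of that conversion pattern next to the explicit cast that can replace it. The `Persona` enum, its values, and the method names are made up for illustration; they are not taken from the game's code.

```csharp
using System;

// Hypothetical enum for illustration only.
public enum Persona
{
    None = 0,
    Loner = 1,
    TeachersPet = 2,
    Heroic = 3,
}

public static class PersonaConversion
{
    // The hand-written pattern: every new enum value needs a new case here.
    public static Persona FromIntSwitch(int value)
    {
        switch (value)
        {
            case 1: return Persona.Loner;
            case 2: return Persona.TeachersPet;
            case 3: return Persona.Heroic;
            default: return Persona.None;
        }
    }

    // The cast: no per-value code to maintain. A plain cast accepts any int,
    // so Enum.IsDefined is used to fall back to None for unknown values.
    public static Persona FromIntCast(int value)
    {
        return Enum.IsDefined(typeof(Persona), value) ? (Persona)value : Persona.None;
    }
}
```

Going the other way is just `(int)Persona.Heroic`, so a conversion function isn't needed in that direction at all.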
This is completely valid, but C# allows you to explicitly cast integers into enum values. Again, technically this is faster, but more importantly it's more dynamic, meaning you can add new values to the enum and not have to update the functions manually, and you wouldn't even need functions to convert them in the first place. So this is an architectural problem, not a performance problem. By extension, this conclusion also means that every little if statement that gets run every frame does not contribute jack shit to execution time.
But it's not best practice, you say? Let's take a minute to talk about this, because I know some of you guys won't watch the full video. Let's see what it would look like if we used switch cases as much as possible. I painstakingly converted as many of the if statements in StudentScript as possible, without regard to whether or not I should, only taking into consideration whether I could. After looking at the profiler, we can clearly see that the difference in execution time is not statistically significant. As for how the code looks, eh, that's subjective. Is it any easier to read? No, not really. In fact, I would argue that it's even harder to read in some places. If the switch case matches integers, you have to scroll all the way to the top to see what variable it's comparing. There are several situations that resulted in nested switch statements, which is a big no-no in my opinion. Additionally, some IDEs don't let you fold individual cases, which makes big long switch statements even harder to work with.
A great example of where a switch case shouldn't have been used is here, in the logic of the report phase for students with the teacher's pet persona. Already we can see that a switch case doesn't really capture all of the logic required for the report phase. But that's not the worst of it.
Look at how long this switch statement is. Look at how many fall-through cases are required to be functionally equivalent to the else-if chain. This is because switch statements only work based off of exact matches. This is also why you shouldn't use floats in switch cases. If the value doesn't exactly match any of the cases, it goes to the default case. Hang on, stop writing that comment, stop writing that comment. Yes, I could have made the default case contain that "if report phase is less than 100" statement, like this, but that's another if statement, and that would completely undermine those precious performance benefits. And we can't have that. Oh, and a little side note here: if there are enough cases in an else-if chain, it gets optimized into a switch case anyway. The compiler is smarter than you.
This one's actually true. Any frequently used components should be cached as private instance variables. In a similar vein, calls to any form of GameObject.Find or any references to Camera.main should also be cached. So does the code use any of these in the update functions of objects in the school scene? No, not really. Otherwise, we would see it in the profiler. YandereDev appears to have already taken care of the low-hanging fruit.
Another common complaint is the use of Vector3.Distance in update functions, because calculating the square root is supposedly slow. One of the ways to fix this is to use a distance function that does not use a square root, like the squared magnitude of the vectors or the Manhattan distance. In embedded systems, ARM processors, and ye olden days of game development, this is true. But on modern CPUs and normal computers, this shit is a myth. It is a myth. This is no longer true. I specifically tested four different ways of calculating the distance.
They were Vector3.Distance, aka the Euclidean distance; the Manhattan distance; (a - b).sqrMagnitude; and (a - b).magnitude, which is functionally equivalent to Vector3.Distance. My results showed that there was practically no difference in execution time between any of these. Most of the time, Vector3.Distance was straight up faster than everything else. The reason that square root operations don't matter anymore on modern desktop CPUs is that the operation is computed with a single CPU instruction. This is the source code for the GCC compiler, and right here in the square root function a single instruction is being used. One instruction, one clock cycle. You can't get faster than that!
Let's talk about actual performance concerns. Code that runs every frame should avoid operations and functions that allocate memory as much as possible, and if memory allocation is required, it should do it all at once. Here are some examples of stuff that allocates memory: string concatenation, LINQ expressions, like anything that makes an array, and instantiating and destroying objects. In a similar vein, code that runs every frame should also cache things that take a long time to process. For example, GetComponent, GameObject.Find, Camera.main, C# reflection, and pathfinding results. In Yandere Simulator, both of these concerns appear to already be taken care of.
Now that we're done talking about performance for the time being, let's jump into the game's architecture. Now, there's no denying that this game is a mess. Even YandereDev thinks so. But in my stream, I focused a little bit too much on cosmetic style things rather than architectural problems. Let's take a look at how Yandere Simulator is structured. Most of the core important stuff appears to reside in YandereScript, StudentScript, and StudentManagerScript.
Before we jump into this, I would like to point out how YandereScript and StudentScript are a few thousand lines shorter than the ones I showed on stream. This is a great demonstration that line count doesn't matter. I used a different decompiler this time around, and this decompiler likes to remove squiggly brackets around one-line if statement clauses. It's also generally more accurate, as you'll see later on when we talk about decompiler artifacts.
YandereScript and StudentManagerScript are actually completely fine at the size that they are, given their complexity and the context of the game. Of course, that doesn't mean they can't be improved. StudentScript is another matter. This component handles the behavior of every person who is not the player, including teachers and the nurse. For anyone reading the code, this is a little misleading. But more importantly, this also means that all person-specific code gets stuffed into this one massive component. Like I mentioned before, this isn't a performance concern per se, but rather an architectural concern.
When each student is initialized, some of their properties are read from students.json in streaming assets. This is a good thing, because otherwise StudentScript would be at least 2,000 lines longer: 20 properties times 100 students. Additionally, this file is read by the student manager, and the student manager applies these properties as the students are being instantiated. If this wasn't the case, then each student would have to read from the same file, iterate through the list searching for the matching ID, and finally apply these properties. For those of you who don't know, file operations are very expensive to perform. To open a file, the program has to make a syscall to the kernel. When the open syscall is made, the thread stops executing to allow the context switch from user space into kernel space, where the kernel checks permissions and whatnot, and the program receives a file handle.
Of course, the details vary between operating systems. Anyway, the problem with this approach for students is that any student-specific behavior is muddled and interwoven with generic student behavior. Let's come up with an alternative that would enable the separation of student-specific behavior from generic student behavior.
First, let's define what a student needs to do. Actually, we'll call them people instead. In order to function, a person needs to store some state. More specifically, they need to store the state of their appearance and stats, like hair color, health, strength, personality, stuff like that. They also need to store their normal routine, and they need to store the state of their current activity and current objective, like painting or dying. Also, every person needs to be able to complete their daily routine, react to the player committing crimes, react to blood, weapons, and dead bodies, and die and become a ragdoll when dead.
What kinds of people do we have? We have students, with a special rival subtype, a special student council subtype, and a special delinquent subtype. We also have teachers, the nurse, and the coach. I'm excluding the guidance counselor because she's not a real person. Even though she has a StudentScript component attached, it's disabled.
Now we can start laying the foundations of our new architecture. Let's start with a person's routine. A routine is a chronological list of activities. A person's routine could also change depending on the day of the week. Activities contain the start time and a method that defines how to complete the activity. The name of the activity is conveyed via the class name. Finally, activities are initialized via their constructor and must implement an OnActivityStart method and an OnActivityEnd method. The advantage of this approach is that it allows us to define unique routines for each person. If we wanted to, we could also generate random routines, but that's not a design requirement.
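To make that a little more concrete, here's a minimal sketch of the routine and activity classes in plain C#, with no Unity dependencies. Only the `OnActivityStart` and `OnActivityEnd` method names come from the description above; `AttendClass`, `EatLunch`, `Routine`, and `ActivityAt` are placeholder names I'm assuming for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// An activity knows when it starts; its class name doubles as its display name.
public abstract class Activity
{
    public TimeSpan StartTime { get; }

    protected Activity(TimeSpan startTime) { StartTime = startTime; }

    public abstract void OnActivityStart();
    public abstract void OnActivityEnd();
}

// Hypothetical concrete activities.
public sealed class AttendClass : Activity
{
    public AttendClass(TimeSpan startTime) : base(startTime) { }
    public override void OnActivityStart() { /* walk to the classroom, sit down */ }
    public override void OnActivityEnd() { /* stand up */ }
}

public sealed class EatLunch : Activity
{
    public EatLunch(TimeSpan startTime) : base(startTime) { }
    public override void OnActivityStart() { /* walk to the lunch spot */ }
    public override void OnActivityEnd() { /* put the lunch away */ }
}

// A routine is a chronological list of activities.
public sealed class Routine
{
    private readonly List<Activity> activities;

    public Routine(IEnumerable<Activity> activities)
    {
        // Sort once up front so lookups can assume chronological order.
        this.activities = activities.OrderBy(a => a.StartTime).ToList();
    }

    // Returns the latest activity whose start time has passed, or null if none has.
    public Activity ActivityAt(TimeSpan timeOfDay)
    {
        Activity current = null;
        foreach (Activity activity in activities)
        {
            if (activity.StartTime > timeOfDay) break;
            current = activity;
        }
        return current;
    }
}
```

Routines that change by day of the week could be handled by keeping one `Routine` per day, for example in a `Dictionary<DayOfWeek, Routine>`.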
This will also allow us to separate the logic used to update appearance, like determining what animation to play, from brain-type logic, you know, the part that gives the illusion of intelligence. Additionally, this will allow us to override the activity at any given time based on the state or based on special event triggers, like the player pointing their phone camera at them.
Further building upon this, activities will not modify the person's state directly. Instead, there's a think method and a do method. The think method modifies the state and the do method acts upon the state. This will allow game logic and other logic for animations and such to be separated. In addition, this will allow activities to be easily converted to take advantage of the new Unity job system, should that be necessary. The job system automatically takes care of offloading processing to other threads, avoiding some of the overhead involved in spinning up new threads, and providing a nice API. An additional benefit is that we won't ever have to check the state to update the animations every single frame, because we can do that just by checking the state whenever it's committed.
Now that we have this foundation set, all of the remaining requirements are pretty trivial to implement. Reacting to seeing a crime, blood, weapons, or dead bodies can be implemented in a special activity that gets executed no matter what the current activity is. This activity would commit state when a person is alerted, to trigger an override of the current routine's activity. Dying and becoming a ragdoll would stop the execution of all activities for that person.
I'm not saying that this architecture is the best possible solution. There is certainly a lot of room for improvement. For example, this architecture won't work well for activities that require multiple people. This is merely an example of what it could look like, instead of what it is currently.
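Here's one way the think and do split could be sketched, again in plain C#. Everything here is an assumption for illustration: `PersonState`, `ThinkDoActivity`, `WatchForCrime`, and `Person.Tick` are made-up names, and a real state object would hold far more than two fields. The point is only that `Do` runs when a changed state is committed, not every frame.

```csharp
using System;

// Hypothetical state container; a struct, so passing it around copies it.
public struct PersonState
{
    public string Animation;
    public bool Alerted;

    public bool SameAs(PersonState other)
    {
        return Animation == other.Animation && Alerted == other.Alerted;
    }
}

// Stripped-down activity with only the two methods under discussion.
public abstract class ThinkDoActivity
{
    // Receives a copy of the current state and returns the desired new state.
    public abstract PersonState Think(PersonState current);

    // Acts on committed state: play an animation, make a sound, and so on.
    public abstract void Do(PersonState committed);
}

// Hypothetical "always running" reaction activity.
public sealed class WatchForCrime : ThinkDoActivity
{
    public bool CrimeVisible; // set by whatever performs the perception checks
    public int DoCalls;       // counts how often Do actually ran

    public override PersonState Think(PersonState current)
    {
        current.Alerted = CrimeVisible;
        current.Animation = CrimeVisible ? "Alarmed" : "Idle";
        return current;
    }

    public override void Do(PersonState committed)
    {
        DoCalls++;
    }
}

public sealed class Person
{
    public PersonState State { get; private set; }

    public Person(PersonState initial) { State = initial; }

    // Called once per frame. Do only runs when Think produced a different state.
    public void Tick(ThinkDoActivity activity)
    {
        PersonState next = activity.Think(State);
        if (next.SameAs(State)) return;
        State = next;      // commit
        activity.Do(next); // react to the commit
    }
}
```

Because `Think` only ever sees a copy, it has no side effects on the person until the commit, which is also what would make it a reasonable candidate for moving onto worker threads later.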
This is a fairly object-oriented approach, which I know isn't the hot new design paradigm, but given the problem, this felt like the most natural solution.
Now we're going to talk about implementation details. What that means is how specific features are implemented. Some of this comes down to personal preference, but there were also a lot of features that were implemented poorly, and I want to explain precisely why that is.
When you make a save in Yandere Simulator, it creates a YanSave file in the same directory where the game is installed, where most of the information is stored. This file is pretty hefty and contains a lot of bloat, and saving to one slot creates a file that is one megabyte in size. If you make multiple saves in different slots, this number increases. Inspecting the contents of these saves reveals that information is stored as JSON. This is not usually a problem. But given the absolute size of these lads, it would be better to use a custom byte format instead. Since JSON is a text-based format, a lot of bytes are wasted on squiggly brackets, quotes, colons, string representations of floats, and property names. In addition, there's a lot of bloat that doesn't need to be in the save data because technically it could be calculated on load based on the state. For example, camera position. Remember that bit I talked about in the hypothetical architecture, where the state of a person is copied before changing it? The same concept could be applied to saving and loading.
But that file isn't the only place where save data is stored. Some save data, like corkboard photos or school atmosphere, seems to be stored using PlayerPrefs, a Unity class intended to be used to store the player's preferences, like video options. This is a problem because on Windows, PlayerPrefs are stored in the Windows registry. This makes it very hard for players to back up, transfer, or share saves.
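Going back to the file format point for a second, here's a small sketch of how much a text encoding costs compared to raw bytes. The `StudentRecord` layout and the `SaveFormats` helper are invented for this example; they are not the game's actual save schema, and the JSON is built by hand just to keep the sketch free of library dependencies.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical record: one student ID plus a position.
public struct StudentRecord
{
    public int Id;
    public float X, Y, Z;
}

public static class SaveFormats
{
    // Text encoding: brackets, quotes, property names, and floats spelled out as strings.
    public static byte[] ToJsonBytes(StudentRecord r)
    {
        CultureInfo inv = CultureInfo.InvariantCulture;
        string json = "{\"id\":" + r.Id.ToString(inv)
            + ",\"x\":" + r.X.ToString(inv)
            + ",\"y\":" + r.Y.ToString(inv)
            + ",\"z\":" + r.Z.ToString(inv) + "}";
        return Encoding.UTF8.GetBytes(json);
    }

    // Binary encoding: a 4-byte int followed by three 4-byte floats, 16 bytes total.
    public static byte[] ToBinaryBytes(StudentRecord r)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(r.Id);
            writer.Write(r.X);
            writer.Write(r.Y);
            writer.Write(r.Z);
            writer.Flush();
            return stream.ToArray();
        }
    }

    // Reads the fields back in the same order they were written.
    public static StudentRecord FromBinaryBytes(byte[] bytes)
    {
        using (var reader = new BinaryReader(new MemoryStream(bytes)))
        {
            StudentRecord r;
            r.Id = reader.ReadInt32();
            r.X = reader.ReadSingle();
            r.Y = reader.ReadSingle();
            r.Z = reader.ReadSingle();
            return r;
        }
    }
}
```

The tradeoff with a binary format like this is that the field order becomes the file format, so adding or removing fields needs some kind of version number at the front of the file.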
Interaction button prompts are implemented in a way that is not very dynamic. The current implementation has a unique instance of PromptScript for each interactable object, including any GUI elements required to render the prompt. This is why PromptScript even shows up near the top of the profiler. When a prompt is being interacted with, the interactable object itself checks to see if the circle is filled before triggering the code that's run on interaction.
Here's how I would implement this system instead. Now, this is one of those features that is deceptive. It seems easy to implement on the surface, but its true complexity reveals itself when you actually go to implement it. I would start out by creating a component called Interactable. This component would be placed on any object that needs to be interacted with. Interactable handles the interaction input from the player and triggers the OnInteract event when the circle is filled. For each object that is interactable, a new component is created, or an existing component is used, to add an OnInteract method that handles the event.
Now, this may seem pretty simple, but this is actually a pretty naive implementation so far. There are a few problems with it. There's no indication to the user that a given object is interactable if it's not in range. And if there are two interactable objects in range, two prompts appear when only the one that the user wants should appear. This is where that hidden complexity starts to reveal itself. The first problem can be solved very easily and in many different ways, so I'll leave that as an exercise for the viewer. The latter problem is a little more interesting. To fix this, we can add an additional field to the Interactable component to indicate whether or not it's the closest. Then the player can calculate the distance between itself and the interactables and set which one is closest based on the player's position.
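The closest-interactable selection could look something like the sketch below. It uses `System.Numerics.Vector3` so it runs outside of Unity; inside Unity you would use `UnityEngine.Vector3.Distance` the same way and make `Interactable` a component. The `Range` field and the `InteractableSelector` class are assumptions of mine, not part of the demo shown in the video.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

public sealed class Interactable
{
    public Vector3 Position;
    public float Range = 2f;   // assumed: how close the player must be
    public bool IsClosest;     // the extra field described above

    public event Action OnInteract;

    // Called by the input handling once the prompt circle is filled.
    public void Interact()
    {
        OnInteract?.Invoke();
    }
}

public static class InteractableSelector
{
    // Marks the nearest in-range interactable as closest and clears the flag
    // on all the others. Returns null when nothing is in range.
    public static Interactable UpdateClosest(Vector3 playerPosition, IReadOnlyList<Interactable> all)
    {
        Interactable closest = null;
        float closestDistance = float.MaxValue;

        for (int i = 0; i < all.Count; i++)
        {
            Interactable item = all[i];
            item.IsClosest = false;

            float distance = Vector3.Distance(playerPosition, item.Position);
            if (distance <= item.Range && distance < closestDistance)
            {
                closest = item;
                closestDistance = distance;
            }
        }

        if (closest != null) closest.IsClosest = true;
        return closest;
    }
}
```

This is a single pass over the list with no allocations, so the cost grows linearly with the number of interactables.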
After doing a little bit of profiling, we can see that this implementation is very fast, even with hundreds of interactable objects. But there's still more that we need to do with the system to match what the system in Yandere Simulator does. Since only one prompt needs to be visible at a time, we can save ourselves the trouble of instantiating and destroying the prompts by just having one available and disabling it when it's not needed. While we're at it, we can add a little flavor text for what the interaction will do. Finally, we also need to be able to prompt for more than one button at a time. Let's see how it performs now. It's still super fast! We can have up to 2050 interactable objects and still get 60 FPS, as long as we look in the opposite direction. This also further proves that Vector3.Distance is very fast.
The very first time I opened the school map, it was incredibly laggy. That's what led me to investigate the implementation of the school map. Here's how it works. There's an orthographic camera in the center of the map pointing down. The size of the camera's frustum spans the entire map. The camera can show different floors by moving to preset positions on the y-axis. The camera also defines a short enough clipping distance so that other floors won't get rendered when they don't need to be, and it only renders objects in the default and top-down layers, according to its culling mask. The benefit of this implementation is that it's very easy to do and doesn't require a whole lot of maintenance or upfront development time. It's actually a fairly common development practice. The downside in this scenario is that the default layer is being rendered, and most objects reside in the default layer. Most games have an image they use for the map background instead, so they aren't rendering so many objects. You might be thinking, oh, I'm making changes to the school map, like, all the time.
If I change the map to be a picture instead, I'll have to take a new picture every time I update the map. Thankfully, Unity has a feature where you can write editor scripts that run only in the editor. These editor scripts allow you to automate most repetitive or tedious tasks, or let you augment your user experience. You could easily compare them to macros. You can make an editor script to render a frame from the map camera to a file, and just have the map render that image. Then you can just run the script when you build your game, or better yet, automate your build pipeline too so you can include all the steps needed. However, considering that the map is not open every frame, this is probably not an issue, but it could probably be improved with proper LOD.
There are a bunch of empty classes that just inherit a generic class. For example, IntHashSet and IntAndStringDictionary. I thought this was weird at first, but it turns out that this is actually a workaround for a Unity bug that doesn't allow generics to serialize. On top of that, this bug was just fixed in Unity A3. Yandere Simulator runs Unity version 2019.3.0.7f1, and it would not be worth upgrading to a buggy pre-release version just for this bug fix. Therefore, this is not a problem.
So I want to preface this by saying this is going to sound a smidge nitpicky. Public fields on components, for individual items and arrays, are used pretty heavily. A large portion of these are used to store a bunch of game objects or components. Looking at StudentScript in the inspector shows how this practice has gotten a little bit out of hand. Just look at all this shit. It's really hard to find what I'm looking for when I'm scrolling through this massive list of editable fields. But this can be solved by a little bit of organization, and I think the decompiler messed with the order of the fields a little bit. Now this isn't a bad thing regarding performance.
But it is a terrible thing when it comes to maintenance and adding new features. Most of these references are static, and what I mean by that is a person has to manually drag and drop each and every one of these fields to fill them out. Most of these fields are required in order for the component to function. Instead of creating a new instance of the component normally, it's usually just easier to copy one that you've already set up. But this is pretty error prone. Um. But wait, there's more. There are copious amounts of hard-coded references to specific indexes in these arrays. This is technically faster than searching a list or using any variation of the built-in GameObject.find functions because its runtime complexity is O. However, this practice does make it very tedious to work with these components. But I'm not done yet. The majority of these arrays are treated like they are one indexed. No arrays should be treated like they are one indexed arrays because arrays are zero indexed in C sharp. Now, I actually have a theory as to why this is still the case after all this time. Ultimately, it's probably an artifact of when the code used to be written in JavaScript. JavaScript is a dynamically typed language, and the way it treats arrays is kind of weird. For context, in JavaScript, objects are treated like dictionaries. You can assign an object's property by just doing it. JavaScript treats arrays like an object that can only accept numbers as the key. A common issue that novice programmers have is the concept of zero-indexed arrays. A zero indexed array's first element can be accessed using the index 0. A one indexed array's first element starts at the index 1. Many novice programmers gravitate to wanting to use one indexed array's because you start counting at 1. However, most languages use zero indexed arrays and JavaScript is no different. 
The reason that zero indexed arrays are preferred is that the index refers to the offset, and the offset refers to the offset. from the array's starting memory address. This started in the C language and patterns arose that were just preferable to work with. Now, JavaScript arrays are zero indexed, but you can put any number you want for the index in an empty array, and JavaScript will fill in undefined for all of the previous elements. Either one of these problems on their own would be fairly easy to fix. But the problem is that a fix for both of these problems would require a lot of work, because changing the code to treat these arrays as zero-index would also require changing all of the hard-coded references. So how should we write code to avoid these problems? The most obvious tip in this case would be to use zero-indexed arrays, because that's how C-sharp was designed. But what about all those hard-coded and manually assigned references? The answer is that it depends. For example, in my interaction system demo, notice that I find all the interactable objects in the start method and cache the results in a private instance variable. In this case, I want to be able to add more interactables without having to mess with something unrelated. Another example, in Yandere Simulator, lots of components require a reference to the player component. In this case, it would be better to just have one reference to the player component in a static class that is globally accessible. It all depends on the specific context of how these references are used, how often they are used, and convenience. This is probably the biggest problem with the game's code. Looking at YandereScript, there are 130 public booleans, and StudentScript has 196 public booleans. Most of these are dedicated to just keeping track of state. Now, the naive way to fix this is to just put all the booleans in a single enum and call it a day. 
Unfortunately, we can't do that because sometimes two of these booleans need to be true to properly represent the state. Currently, the part of the student's state that is represented with booleans or enums is: character attributes, like gender, whether or not the student is a teacher, whether or not the student is in a club, what club they're in, personality, stuff like that; the character's appearance, like are they wearing club attire or are they wet; interpersonal relationship status, like whether or not the teacher's trust has been lost, whether or not the student is in a relationship, or whether or not the player has complimented the student; whether or not the student is alerted, and the cause, e.g. whether he heard a noise or something; whether or not the student has witnessed anything, and what type of crime was witnessed; whether or not the student is suffering, and the cause, like bleeding or poison; the current action being performed, like eating, changing clothes, or dying; and the current objective of the action, like turning off a radio or apprehending the player. We can already see that this is pretty complicated, and I'm sure there's stuff that I missed.

As previously demonstrated, in C# you can assign numbers to enum values and cast between them with ease. The side effect of this is that you can combine enum values with bitwise operators. Let me explain what this means. Okay, so originally I did like a really long tutorial thing here, and it was way too long and boring, so instead I'll do it really fast. You can assign values to the enum that are powers of two to make each enum value correspond to a bit in its binary representation. You can use this to do bitwise operations on the enum values, so you can represent two enum values at once by setting the bits in the number. To make this a little bit easier to think about, you can use bit-shift operators to assign the values to the enum. Now we can apply this to the witness state.
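The power-of-two trick described above can be sketched roughly like this. The enum and member names here are made up for illustration and are not copied from the game's code; the only C# features assumed are the standard `[Flags]` attribute and the bitwise operators.

```csharp
using System;

// Hypothetical enum for illustration only; not the game's real witness enum.
[Flags]
public enum WitnessFlags
{
    None     = 0,
    Blood    = 1 << 0, // binary 0001
    Insanity = 1 << 1, // binary 0010
    Weapon   = 1 << 2, // binary 0100
    Corpse   = 1 << 3  // binary 1000
}

public static class WitnessFlagsDemo
{
    public static void Main()
    {
        // Bitwise OR sets both bits, so one value represents two states at once.
        WitnessFlags seen = WitnessFlags.Blood | WitnessFlags.Insanity;

        // Bitwise AND tests whether a single bit is set.
        bool sawBlood = (seen & WitnessFlags.Blood) != 0;

        // HasFlag performs the same kind of test and reads a little nicer.
        bool sawWeapon = seen.HasFlag(WitnessFlags.Weapon);

        // AND with the complement clears one bit and leaves the others alone.
        seen &= ~WitnessFlags.Blood;

        Console.WriteLine($"{sawBlood} {sawWeapon} {seen}");
    }
}
```

The design choice is that each member owns exactly one bit, so adding a new kind of witnessed thing means adding one line instead of a new member for every combination.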
Currently, the witness state is stored in an enum called StudentWitnessType. And it looks like this. Instead of having values like BloodAndInsanity, we can assign each one of these values to its own bit and use a value of Blood bitwise-OR Insanity.

Honestly, the way that the state is currently represented doesn't exactly transfer very easily to enums, despite what everyone really wants to believe. The current implementation doesn't exactly lend itself to what it's trying to accomplish that well. So instead of trying to represent the current state better, let's redesign how students observe and react to the world, building on top of our hypothetical architecture from before. Our hypothetical architecture actually pretty much solves this problem. Most of the student state can be inferred by the action. For example, we can infer that a person is in a heightened alert phase if their current activity is investigating a noise, and we can infer that a person has witnessed a crime if their current activity is reporting the crime.

Like I mentioned before, the profiler says that animation calculations take 2.25 to 2.5 milliseconds each frame when the school scene first loads. My benchmarks show that the more complex your animation systems get, the faster the current animation system, Mecanim, becomes over the legacy animation system. However, the legacy animation system is way faster for simple animations. I'm honestly not sure how big of a performance benefit this would be, though, in the context of Yandere Simulator.

More importantly, a common pattern in Yandere Simulator's code is that first it plays an animation, then after a certain amount of time passes, code gets run to update the state or other game objects. In my opinion, it would be a lot cleaner and easier to work with to use event triggers in the animation clips to run code instead. Just to make sure this works in both the legacy system and Mecanim, I even tested it out with this cutie I found on the Asset Store. God, look at that cutie.

Multi-threading in games is really hard, and if you aren't doing it from the start, it's even harder. Fortunately, Unity has been putting a lot of work into a new set of systems, called DOTS, or the Data-Oriented Technology Stack, to improve performance in Unity games by default. Unfortunately, to take advantage of this new system in Yandere Simulator, practically the entire game would have to be rewritten, because the ECS, the Entity Component System, is fundamentally different from how Unity's classic component system works. I wouldn't recommend using features that are still in preview, but I think the performance and architectural improvements would be beneficial to this game in particular. This may warrant using a different engine, or using a custom engine.

However, the new job system for DOTS is able to be fairly easily implemented into existing projects. The job system lets you really easily send stuff to another thread to be processed. The catch is that you cannot directly update any Unity objects from the jobs, and you can only pass in data that is blittable, meaning that it has the same representation in memory in both managed C# code and behind-the-scenes Unity native code, so it doesn't need a conversion. You have to put all the data you want to work with in a struct, pass it to the job, get the result, and then reapply your results to the game objects and components.

There are some scripts that are great candidates for this kind of an upgrade. The UI library, NGUI, could probably be rewritten to take advantage of jobs. The pathfinding library, A*, which is already multi-threaded, would be a perfect candidate to rework to use the job system. AIPath, which inherits AIBase, is used to move objects along a path that has already been calculated. It's a perfect candidate to be refactored into a parallel-for-transform job (IJobParallelForTransform), which is specifically made for moving a bunch of objects in jobs. By far the best candidate is the DynamicBone component, because all of the data that it works with is already in its own class called Particle.

Unit tests are little pieces of code that make sure things work as they should. The purpose of these automated tests is to make sure you don't introduce new bugs or reintroduce old bugs by accident. Unit testing is really uncommon in game development. I think this is really unfortunate, because adequate testing has caught so many bugs for me in my projects before they reach the end user. Unfortunately, most of Yandere Simulator's components change the state directly, and this makes them pretty difficult to test.

I saved the best for last. Let's take a look at how the FPS is calculated. The frame rate script calculates the FPS over an interval of 500 milliseconds. This does not properly account for spikes in FPS. As a result, the FPS that is displayed is a lot lower than what is actually being rendered. There's a very good reason most games display the current, average, minimum, and maximum frame rate. Doom is an excellent overkill example. Funnily enough, if another hour or so were spent on the frame rate script, it probably would have saved him like a few years of grief.

For the sake of being as comprehensive as possible, I'm going to go over what is and what isn't a decompiler artifact, and some general tips for writing better code.

Is string.Concat a compiler optimization? Yes. Only a + b string concatenation and string interpolation are affected. string.Format is not affected.

Why are there basically zero constants and tons of hard-coded values? Well, that's because the compiler resolves all constants to their literal values at compile time. This means that every constant is basically find-replaced with its value.
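As a rough illustration of that find-replace behavior, here is a minimal sketch. The class name `ClipNames` and the string values are invented for this example; the point is only that `const` members are substituted at compile time, so compiled (and therefore decompiled) code shows the literal instead of the constant's name.

```csharp
using System;

// Hypothetical constants class; the names and values are made up.
public static class ClipNames
{
    public const string Walk = "walk_00";
    public const string Run  = "run_00";
}

public static class ConstDemo
{
    // Picks a clip name based on movement speed.
    public static string PickClip(float speed)
    {
        // In the source this reads ClipNames.Run and ClipNames.Walk.
        // The compiler substitutes the literal strings here, which is why
        // decompiled output appears to be full of hard-coded values.
        return speed > 2.0f ? ClipNames.Run : ClipNames.Walk;
    }

    public static void Main()
    {
        Console.WriteLine(PickClip(3.5f));
        Console.WriteLine(PickClip(1.0f));
    }
}
```

So a decompiled project with no constants in sight doesn't tell you whether the original source used them.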
For example, there is a class called AnimNames that contains all of the animation names as constants. This is actually good practice, because hard-coded strings that are used to affect program flow are a really bad idea. You don't want to be debugging a typo for hours.

Why the hell is everything explicitly cast? The decompiler did that. I don't know why. You could implicitly cast them and it would be no different.

What's up with these really precise float literals used in these comparisons? Floating point numbers have to be stored in a fixed amount of bytes. Specifically, single precision floats are 4 bytes long and double precision floats are 8 bytes long. In order to fit these numbers into these bytes, IEEE used black fucking magic back in 1985 to cram as much information into these bytes as possible, and this standard is called the IEEE 754 floating point standard. Unfortunately, the ritual used to pull this standard out of some point near its ass wasn't perfect. As a result, only seven places after the decimal can be represented for single precision, and 15 places for double precision. Additionally, the numbers can get rounded, as demonstrated here. These are called floating point rounding errors, and they are 100% unavoidable.

Are string comparisons slower than comparing other types? Technically, yes. But, like if statements versus switch statements, it doesn't really matter. In C#, strings get hashed into an integer and then get compared. And this is a lot faster than checking each character in the string. This is also how dictionaries, where the key is a string, work.

Why the hell are there so many goddamn inline ifs? They're so hard to read! I think you can probably guess what I'm going to say. It's a decompiler artifact. They're functionally equivalent to if statements anyway, so it doesn't matter.

There are a bunch of while and for loops that reference the instance variable this.id instead of normal for loops with a local variable. This is actually not a decompiler artifact. The decompiler can recreate both using an instance variable and using a local variable accurately. This means that this must be intentional. I don't know why you would do this.

Why is this loop a while or a for loop? It should be the other one. The .NET intermediate language does not differentiate between while loops and for loops. So the decompiler has to guess which one it's supposed to be based on the context.

Since we have the leaked source code now, we can actually criticize style. I'm going to go through this very quickly because, while I don't know why you would torture yourself, everything that I'm going to display here is completely valid and purely cosmetic, meaning it does not affect performance. Using an instance variable instead of a local variable in for loops. Seemingly random concatenation of string literals. Ridiculously deep indentation. StudentGlobals.MemorialStudents should be an array; it's just easier to work with. Again, why would you torture yourself? Remember, creating new arrays allocates memory; using existing ones does not. Comments that describe what if statements mean, when the conditionals should be able to explain themselves without comments. Using the UNITY_EDITOR compiler flag to disable Osana instead of using a custom compiler flag. This is bad practice because if you want to create a standalone build with Osana enabled, you have to manually edit the code. This just makes it take longer for Osana to come out.

So what does this all mean anyway? Yandere Simulator is a perfect textbook case of how technical debt influences a project. Technical debt is an industry term that refers to engineering decisions that are really quick and easy to implement now, but may cause more time to be wasted later on. Over the years, it's felt like development has ground to a screeching halt, and after really reading the code and really digging into it, I can see why. Looking at different parts, you can kind of tell what YandereDev was thinking. There are parts that actually make a lot of engineering sense given the architecture, but a critical aspect of managing technical debt is knowing when to refactor. It becomes kind of a sixth sense that you only gain from experience. The game's architecture needed to be refactored way back in 2016. Now, especially with Osana being functionally complete, it's not a good time to be refactoring. It's not really an option. Good luck, YandereDev, you're gonna need it.

If you made it this far, thank you so much for watching. It really means a lot that people care what I have to say. It's pretty amazing how my Yandere Simulator stream blew up. I want to thank everybody for over a thousand subscribers here on YouTube, 150 followers on Twitch, and almost 100 followers on Twitter. It really means a lot to see all the support that I've gotten recently. Subscribe for more stuff. I'll see you guys next time. Fucking subscribe. He gets it. My editor gets it.

No bullying. Anyway, as you can see from how long this video is, this video has been two months in the making. I started working on this using the May 15th build, but since then three other builds have come out on June 1st, June 15th, and July 2nd. I've been corresponding with YandereDev to fix some of these issues, and some of the fixes have already been implemented in these builds. So what problems does Yandere Simulator have?

Players are frustrated with the slow progression of development and how poorly the game runs. I've spent a lot of time investigating every little nook and cranny of this game, the C-Sharp compiler, and the Unity editor. In the live stream I looked at decompiled code, but now we have the actual leaked source code. I'm pretty confident in saying that you will not find a better analysis of the performance and architectural problems in Yandere Simulator. I need to get a feel for how the game runs in order to know what to look for.

It's important to note that for my initial benchmarks I used Wine to run the game via the Lutris installer, and that's why the text is missing. I booted up Yandere Simulator, avoiding unfocusing the window because that would cause it to crash. Hopefully a Wine-specific bug, not a game bug. After making sure VSync was off and adjusting Yandere's breasts to the correct size, I walked around school.

Right away, I looked at the very conveniently placed FPS counter on the right side of my screen, and I noticed that I was not hitting 60 FPS. I thought this was pretty strange considering my PC's specs. Now admittedly, this initial load into the school scene is very heavy.

A fade from white is cleverly used to cover up the first two frames where game initialization stuff happens. Looking at the output of htop and nvidia-smi, we can clearly see that neither my CPU nor my GPU is being maxed out. To confirm my suspicions, I ran Doom 2016. Here you can clearly see that my GPU is being used way more.

I also ran Hitman 2, a game whose technical problems are very similar to Yandere Simulator's, and the same thing happened. This could mean that the bottleneck for framerate in Yandere Simulator isn't the CPU or GPU, but I don't think so, because that could just be a Wine bug and not an issue with Yandere Simulator. I quickly realized that I would be unable to do anything but speculate unless I could run Unity's profiler on the game. In order to do that, I would need the full Unity project, so I kindly asked YandereDev for the source code.

Unfortunately, he declined, and he was concerned when I mentioned decompiling the game and sort of implied that he didn't want me to do it. So I did it anyway. After about five days of performing a dark magic ritual known only to the compiler gods, I had a working Unity project for Yandere Simulator.

I think I'm the first person to ever get Yandere Simulator running inside of the Unity editor. Of course, it's not perfect. Most of the shaders don't work.

There's a little bit of z-fighting here and there. Who cares? Now, before you ask, no, I can't redistribute this. Sorry, modders.

After running the profiler, I was pleasantly surprised to find that most of the time was being spent rendering. More specifically, drawing opaque geometry was taking the most amount of time. Changing to wireframe mode, we can see a lot of geometry.

Like, a lot. On the students in particular. Since these models were sourced from volunteers, I would be willing to bet that most of the artists aren't aware of best practices optimizing assets for games. I won't pretend to know either, but I do know that the more vertices there are, the worse performance becomes.

For example, the time to compute animations depends on how many vertices each bone affects. Yandere Simulator's assets need to go through a major optimization sweep, with a particular focus on how much detail needs to be present given the context in which the asset is used. For example, the school perimeter wall is partly optimized.

Looking from above, we can see the wall has zero thickness. So this means it should just be one quad since you only need four vertices to make a flat rectangle. But at the top of the wall, there's additional unnecessary geometry.

Additionally, the fancy pillars on the walls have triangles that are never visible to the player. Optimized assets would not only improve frame rate, but would also improve load times, memory usage, and game size, because the file sizes would be smaller. However, one optimization that is being used is combined meshes.

This means that fewer draw calls are executed. When a build is created, Unity automatically combines meshes that are only used in one scene and are not referenced by any scripts. But this could probably be optimized a bit more manually. In games, LOD refers to the practice of lowering the level of detail in 3D models based on their distance from the camera. Most of the time, this requires artists to make 2-3 alternative, lower-poly models that get progressively less detailed.

There are experimental tools to generate these models automatically, but I wouldn't use anything this experimental on a real project. Actual LOD is not implemented in Yandere Simulator. It's partially there via the low-poly student feature, but this only works for students, and it makes it impossible to tell students apart from each other in any way other than their gender.

Honestly, it doesn't make a huge difference to frame rate either. Occlusion culling is where the camera doesn't render objects if they can't be seen, aka occluded, because other objects are in the way. This is basically a must-have in most games. Thankfully, Unity provides occlusion culling as a built-in feature, and Yandere Simulator uses this system.

The heart rate monitor line is rendered using a separate camera and overlaid onto the main camera's render, which is completely separate from the UI camera. It uses a Unity line renderer to draw the line. This is probably fine, although it could probably be remade as a shader, but shaders are hard. The problem is that this camera takes a long time to render a whole lotta nothing. This is probably the result of having its culling mask set to render the default layer.

Everything else is on the default layer, so the heart rate monitor should go on its own layer and have the camera render only that layer. A significant amount of time is also dedicated to physics calculations. There seems to be a significant amount of colliders or triggers that are not in use or do not have an apparent use. For example, the main entrance doorway has a collider above the door that is not possible to hit. This collider only exists to prevent the camera from clipping inside the geometry, so it can be put on its own layer so it only has to check for collisions with the camera.

The profiler says that animation calculations take 2.25 to 2.5 milliseconds each frame when the school scene initially loads. This is partially mitigated by the stop animation script, which disables animations on students that are too far away. However, I was able to find a few students that were still animating even though they weren't visible. Some of the character animation components had culling types set to always animate and not based on renderers.

We'll come back to this when we talk about implementation details. A common theory for why performance is poor in Yandere Simulator is that the scripts that YandereDev writes are inefficient. We already know that this is mostly not true, because a lot of time is spent on rendering. However, a good 8 to 10 milliseconds is being spent running scripts.

Looking at the profiler, we can see that most of the execution time for updating individual MonoBehaviours is not even that bad. The worst functions are UIRect.Update and AIBase.Update. UIRect is a class from NGUI, a UI framework for Unity, and AIBase is from A*, a pathfinding library for Unity. Neither the UIRect nor the AIBase class was written by YandereDev, although they could probably be used more efficiently.

Let's take a closer look at the UI. I bet there doesn't need to be 1600 UI components active all the time. I noticed a high number of prompt script instances, so I figured that would be a good place to start. Here, the Awake function instantiates all of the UI objects required for that prompt. All five of them, if the prompt only accepts one button. But if it accepts all four possible buttons, the prompt will instantiate up to 14 UI prefabs, and an additional one if the prompt is considered noisy.

I reckon we found the culprit. If you're looking at optimizing memory usage, this is the first thing I would investigate.

Looking where the prompts get instantiated, we can see that most of these UI components are already disabled, except for the objects that are named Letter. Fortunately, this is definitely one of the easier problems to fix, which we'll do later.

Next up is the student script, of which there is one call for each of the 85 spawned students. On average, all the calls to StudentScript.Update in one frame have a total execution time of less than a millisecond, despite it being by far the longest script in the game. The next most intensive scripts also take less than a millisecond: the prompt scripts, which are used for interacting with objects; the dynamic bones, which are used for hair or anything else that could jiggle, except for tits; YandereScript, which is used for controlling the player; and the highlighter, which is used for highlighting objects in Yandere Vision.

All of the other scripts take less than 100 microseconds to execute, with the majority of scripts taking less than 10 microseconds to execute on average.

So where is all that time being spent? I made a quick script to count the number of game objects and components in the school scene. There are about 49,300 game objects in the school scene, give or take 100 or so.

On all of the game objects, there are 53,015 components with 318 unique types of components, excluding transforms. Not all 318 unique scripts run every frame, but about 200 of them do. Most of them only get called one or two times and take less than 10 microseconds to execute, but with that many scripts the microseconds add up.

The easy thing to do here is to figure out what scripts don't need to run and disable them until they are needed. So far we've only been talking about what happens in the update step. There's also about a third of the script execution time being spent on the late update step.

All the instances of dynamic bone take up 50% of this time and a single UI panel takes up about 40% of this time. The dynamic bone updates make some sense, but the UI updates are really strange. At first, the object that was responsible for this UI panel was called timeless panel, but after messing with it a little bit, the load moved to another object. I was very confused. How could this even be possible?

Looking for answers, I switched to deep profiling and I found that there are about 1500 calls to UpdateTransform and UpdateGeometry here. Looking at the code for these functions, there aren't any obvious big loops or anything, just matrix math. Switching to the raw hierarchy mode, we can see that even though there are a lot of calls to these functions, only the first call to UpdateSelf takes a long time to calculate.

Matrix multiplication is taking a lot of time, and there's a lot of calls to dynamic lists, but after looking at the code, there isn't really much that is inefficient on its own. The next thing we can try is to reduce the number of calls to these functions. One by one, I started disabling the panels to see if the number of calls to update transform would go down.

Strangely, it didn't really go down as much as I would expect it to. Turns out, I was running around in circles, because after I disabled the prompt parent, all of the calls went away. These are the things that actually matter for performance.

There's a lot of speculation on the internet about what is bad or non-performant code, and most of it is wrong. I'm going to go one by one through each of these criticisms that people seem to cling to and validate whether or not they're actually true. Note that I'm only focused on performance concerns right now. We'll get into architectural concerns later. By far the most abundant criticism is the lack of switch statements and opting to use else-if chains instead.

I've benchmarked three different scenarios: else-if versus switch case with one integer input and a boolean output; else-if versus switch case with an integer input and a string output; and switch case versus integer-to-enum typecasting with one integer input and an enum output. Each of these scenarios is run for 10 million iterations. The results show that yes, technically, switch cases are in fact faster than else-if chains, but we had to do that 10 million times in order to see any difference that was statistically significant. And even then, it fluctuates a lot.
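For anyone who wants to try the first scenario themselves, a micro-benchmark along these lines is one way to do it. This is only a sketch, not the benchmark used in the video: the method names and case values are made up, and calling through a delegate adds overhead of its own, so treat the numbers as a rough comparison at best.

```csharp
using System;
using System.Diagnostics;

public static class BranchBenchmark
{
    // Else-if chain: integer in, boolean out.
    public static bool ElseIfChain(int id)
    {
        if (id == 1) return true;
        else if (id == 3) return true;
        else if (id == 5) return true;
        else if (id == 7) return true;
        else return false;
    }

    // Equivalent switch: stacked labels share one body.
    public static bool SwitchCase(int id)
    {
        switch (id)
        {
            case 1:
            case 3:
            case 5:
            case 7:
                return true;
            default:
                return false;
        }
    }

    // Runs f for the given number of iterations and returns elapsed milliseconds.
    // The hit count is returned so the work cannot be optimized away entirely.
    public static long TimeIt(Func<int, bool> f, int iterations, out int hits)
    {
        hits = 0;
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            if (f(i & 7)) hits++; // inputs cycle through 0..7
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const int iterations = 10000000;

        // Warm-up so JIT compilation is not counted in the timings.
        TimeIt(ElseIfChain, 1000, out int warmA);
        TimeIt(SwitchCase, 1000, out int warmB);

        long elseIfMs = TimeIt(ElseIfChain, iterations, out int hitsA);
        long switchMs = TimeIt(SwitchCase, iterations, out int hitsB);

        Console.WriteLine($"else-if: {elseIfMs} ms, switch: {switchMs} ms, same result: {hitsA == hitsB}");
    }
}
```

Run it several times; the fluctuation between runs is part of the point.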

Nothing in this game runs 10 million times in one frame. Additionally, for the first scenario where we output a boolean, when the Unity player is built in production mode, the else-if chain gets optimized to a boolean expression, which would be as fast as, if not faster than, a switch statement. Therefore, this point is at best invalid and at worst naive. A fairly common pattern that is seen in the code is a switch case or else-if chain that converts an integer into an enum or vice versa.

This is completely valid, but C# allows you to explicitly cast integers into enum values. Again, technically this is faster, but more importantly it's more dynamic, meaning you can add new values to the enum and not have to update the functions manually, and you wouldn't even need functions to convert them in the first place. So this is an architectural problem.
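To make the cast concrete, here is a small sketch comparing the hand-written conversion with a plain cast. The enum and its members are hypothetical, and the `Enum.IsDefined` guard is an optional extra that is not mentioned in the video.

```csharp
using System;

// Hypothetical enum; the game's real enums and values may differ.
public enum ClubType
{
    None = 0,
    Cooking = 1,
    Drama = 2,
    Occult = 3
}

public static class EnumCastDemo
{
    // The pattern described above: one branch per value, maintained by hand.
    public static ClubType FromIntWithSwitch(int value)
    {
        switch (value)
        {
            case 1: return ClubType.Cooking;
            case 2: return ClubType.Drama;
            case 3: return ClubType.Occult;
            default: return ClubType.None;
        }
    }

    // The cast: no branches, and new enum members work without touching this code.
    public static ClubType FromIntWithCast(int value)
    {
        // Optional guard: a bare cast accepts any integer, even one with no
        // matching member, so fall back to None for unknown values.
        return Enum.IsDefined(typeof(ClubType), value) ? (ClubType)value : ClubType.None;
    }

    public static void Main()
    {
        Console.WriteLine(FromIntWithCast(2));   // enum from int
        Console.WriteLine((int)ClubType.Occult); // int from enum
    }
}
```

If the integers are always trusted, the guard can go and the whole helper collapses to `(ClubType)value`.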

Not a performance problem. By extension, this conclusion also means that every little if statement that gets run every frame does not contribute jack shit to execution time. But it's not best practice, you say? Let's take a minute to talk about this, because I know some of you guys won't watch the full video.

Let's see what it would look like if we use switch cases as much as possible. I painstakingly converted as many of the if statements in student script as possible, without regard for whether or not I should, only taking into consideration if I could. After looking at the profiler, we can clearly see that the difference in execution time is not statistically significant.

As for how the code looks, eh, that's subjective. Is it any easier to read? No, not really. In fact, I would argue that it's even harder to read in some places.

If the switch case matches integers, you have to scroll all the way to the top to see what variable it's comparing. There are several situations that resulted in nested switch statements, which is a big no-no in my opinion. Additionally, some IDEs don't let you fold individual cases, which makes big long switch statements even harder to work with. A great example of where a switch case shouldn't have been used is here, for the logic of report phase for students with the teacher's pet persona. Already we can see that a switch case doesn't really capture all of the logic required for the report phase. But that's not the worst of it. Look at how long this switch statement is. Look at how many fall-through cases are required to be functionally equivalent to the else-if chain. This is because switch statements only work based off of exact matches.
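Here's a toy version of that exact-match problem. The phases and descriptions are invented for illustration and are not the game's actual report logic; the point is just how many stacked labels a switch needs to imitate a couple of range checks.

```csharp
using System;

public static class ReportPhaseDemo
{
    // Range checks: two comparisons cover every integer.
    public static string DescribeElseIf(int phase)
    {
        if (phase < 3) return "walking to teacher";
        else if (phase < 6) return "talking to teacher";
        else return "returning to class";
    }

    // Exact matches: every value in a range needs its own label.
    // Note this only agrees with the else-if version for phase >= 0;
    // a negative phase falls to the default case here.
    public static string DescribeSwitch(int phase)
    {
        switch (phase)
        {
            case 0:
            case 1:
            case 2:
                return "walking to teacher";
            case 3:
            case 4:
            case 5:
                return "talking to teacher";
            default:
                return "returning to class";
        }
    }

    public static void Main()
    {
        for (int phase = 0; phase < 8; phase++)
        {
            Console.WriteLine($"{phase}: {DescribeElseIf(phase)} / {DescribeSwitch(phase)}");
        }
    }
}
```

With ranges in the hundreds, the switch version grows by one label per value while the else-if version stays the same length.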

This is also why you shouldn't use floats in switch cases. If it doesn't exactly match any of the cases, it goes to the default case. Hang on, stop writing that comment, stop writing that comment.

Yes, I could have made the default case contain that if report phase is less than 100 statement, like this, but that's another if statement and that would completely undermine those precious performance benefits. And we can't have that. Oh, and a little side note here. If there are enough cases in an else-if chain, it gets optimized into a switch case anyway.

The compiler is smarter than you. This one's actually true. Any frequently used components should be cached as private instance variables. In a similar vein, calls to any form of GameObject.Find or any references to Camera.main should also be cached.

So does the code use any of these in the update functions of objects in the school scene? No, not really. Otherwise, we would see it in the profiler.

YandereDev appears to have already taken care of the low-hanging fruit. Another common point is using Vector3.Distance in update functions, because calculating the square root is slow. One of the ways to fix this is to use a distance function that does not use a square root, like the square magnitude of the vectors, or the Manhattan distance.

In embedded systems, ARM processors, and ye olden days of game development, this is true. But on modern CPUs and normal computers, this shit is a myth.

It is a myth. This is no longer true. I specifically tested four different ways of calculating the distance.

Vector3.Distance (aka the Euclidean distance), the Manhattan distance, (a - b).sqrMagnitude, and (a - b).magnitude, which is functionally equivalent to Vector3.Distance. My results showed that there was practically no difference in execution time between any of these. Most of the time, Vector3.Distance was straight up faster than everything else.

The reason that square root operations don't matter anymore on modern desktop CPUs is that the operation is computed with a single CPU instruction. This is the source code for the GCC compiler, and right here in the square root function a single instruction is being used. One instruction, one instruction, one clock cycle.
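For reference, the four distance measurements can be sketched like this. This is not the benchmark code from the video. It uses `System.Numerics.Vector3` as a stand-in for Unity's `Vector3` so it runs outside the engine, and the class and method names are assumptions.

```csharp
using System.Numerics;

public static class DistanceDemo
{
    // Euclidean distance; the counterpart of Unity's Vector3.Distance.
    public static float Euclidean(Vector3 a, Vector3 b)
    {
        return Vector3.Distance(a, b);
    }

    // Length of the difference vector; (a - b).magnitude in Unity.
    public static float Magnitude(Vector3 a, Vector3 b)
    {
        return (a - b).Length();
    }

    // Squared length; (a - b).sqrMagnitude in Unity. Skips the square root.
    public static float SqrMagnitude(Vector3 a, Vector3 b)
    {
        return (a - b).LengthSquared();
    }

    // Manhattan distance: sum of the absolute per-axis differences.
    public static float Manhattan(Vector3 a, Vector3 b)
    {
        Vector3 d = Vector3.Abs(a - b);
        return d.X + d.Y + d.Z;
    }
}
```

Keep in mind that the squared version only makes sense when the threshold it's compared against is squared too, and the Manhattan distance is a different metric altogether, so swapping either one in changes behavior unless the surrounding code is adjusted.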

You can't get faster than that! Let's talk about actual performance concerns. Code that runs every frame should avoid operations and functions that allocate memory as much as possible and if memory allocation is required it should do it all at once.

Here are some examples of stuff that allocates memory: string concatenation, LINQ expressions, anything that makes an array, and instantiating and destroying objects.
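One common way to keep per-frame code from allocating is to reuse a buffer instead of building a new collection on every call. The sketch below shows both versions side by side. The class name, the `alerted` flags, and the initial capacity of 128 are all assumptions made for the example.

```csharp
using System.Collections.Generic;

public sealed class AlertedStudentFilter
{
    // Allocated once and reused; 128 is an arbitrary starting capacity.
    private readonly List<int> _results = new List<int>(128);

    // Allocating version: creates a new list every time it is called.
    public static List<int> CollectAllocating(bool[] alerted)
    {
        var results = new List<int>();
        for (int i = 0; i < alerted.Length; i++)
        {
            if (alerted[i]) results.Add(i);
        }
        return results;
    }

    // Reusing version: Clear() keeps the backing array, so repeated calls
    // do not allocate unless the list has to grow past its capacity.
    public List<int> CollectReusing(bool[] alerted)
    {
        _results.Clear();
        for (int i = 0; i < alerted.Length; i++)
        {
            if (alerted[i]) _results.Add(i);
        }
        return _results;
    }
}
```

The trade-off is that the returned list is only valid until the next call, so callers should not hold on to it across frames.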

In a similar vein, code that runs every frame should also cache things that take a long time to process. For example, GetComponent, GameObject.Find, Camera.main, C# reflection, and pathfinding results. In Yandere Simulator, both of these concerns appear to already be taken care of. Now that we're done talking about performance for the time being, let's jump into the game's architecture.

Now, there's no denying that this game is a mess. Even YandereDev thinks so. But in my stream, I focused a little bit too much on cosmetic style things rather than architectural problems. Let's take a look at how Yandere Simulator is structured.

Most of the core important stuff appears to reside in Yandere script, student script, and student manager script. Before we jump into this, I would like to point out how Yandere script and student script are a few thousand lines shorter than the ones I showed on stream. This is a great demonstration that line count doesn't matter.

I used a different decompiler this time around, and this decompiler likes to remove squiggly brackets around one-line if statement clauses. It's also generally more accurate, as you'll see later on when we talk about decompiler artifacts.

Yandere script and student manager scripts are actually completely fine to be the size that they are, given their complexity and the context of the game. Of course, that doesn't mean they can't be improved. Student script is another matter. This component handles the behavior of every person who is not the player, including teachers and the nurse.

For anyone reading the code, this is a little misleading. But more importantly, this also means that all person-specific code gets stuffed into this one massive component. Like I mentioned before, this isn't a performance concern per se, but rather an architectural concern. When each student is initialized, some of their properties are read from streaming assets students.json. This is a good thing because otherwise the student script would be at least 2,000 lines longer.

20 properties times 100 students. Additionally, this file is read by the student manager, and the student manager applies these properties as the students are being instantiated. If this wasn't the case, then each student would have to read from the same file, iterate through the list searching for the matching ID, and finally apply these properties.

For those of you who don't know, file operations are very expensive to perform. To open a file, the programmer has to make a syscall to the kernel. When the open syscall is made, the thread stops executing to allow the context switch from user space into kernel space, check permissions and whatnot, and receive a file handle. Of course, the details vary between operating systems.

Anyway, the problem with this approach for students is that any student-specific behavior is muddled and interwoven with generic student behavior. Let's come up with an alternative that would enable the separation of student-specific behavior from generic student behavior. First, let's define what a student needs to do. Actually, we'll call them people instead. In order to function, a person needs to store some state.

More specifically, store the state of their appearance and stats like hair color, health, strength, personality, stuff like that. They also need to store their normal routine, and they need to store the state of their current activity and current objective, like painting or dying. Also, every person needs to be able to complete their daily routine, react to the player committing crimes, react to blood, weapons, and dead bodies, die, and become a ragdoll when dead. What kinds of people do we have?

We have students with a special rival subtype, a special student council subtype, and a special delinquent subtype. We also have teachers, the nurse, and the coach. Excluding the guidance counselor because she's not a real person.

Even though she has a student script component attached, it's disabled. Now we can start laying the foundations of our new architecture. Let's start with a person's routine. A routine is a chronological list of activities. A person's routine could also change depending on the day of the week.

Activities contain the start time and a method that defines how to complete the activity. The name of the activity is conveyed via the class name. Finally, activities are initialized via their constructor and must implement an onActivityStart method and onActivityEnd method. The advantage of this approach is that it allows us to define unique routines for each person.

If we wanted to, we could also generate random routines, but that's not a design requirement. This will also allow us to separate logic used to update appearance, like determining what animation to play, from brain-type logic, you know, the part that gives the illusion of intelligence. Additionally, this will allow us to override the activity at any given time based on the state or based on special event triggers, like the player pointing their phone camera at them.

Further building upon this, activities will not modify the person's state directly. Instead, there's a think method and a do method. The think method modifies the state and the do method acts upon the state.

This will allow game logic and other logic for animations and such to be separated. In addition, this will allow activities to be easily converted to take advantage of the new Unity job system, should that be necessary. The job system automatically takes care of offloading processing to other threads, avoiding some overhead involved in spinning up new threads, and providing a nice API. An additional benefit is that we won't ever have to check the state to update the animations every single frame, because we can do that just by checking the state whenever it's committed. Now that we have this foundation set, all of the remaining requirements are pretty trivial to implement.
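The routine and activity idea described above could be sketched roughly like this in plain C#. Every type and member name here (`PersonState`, `Activity`, `EatLunch`, `Routine`) is a hypothetical illustration, and the state is modeled as a struct so that Think works on a copy rather than editing the person's state directly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical state container; a struct, so passing it around copies it.
public struct PersonState
{
    public string Animation;
    public bool Alerted;
}

public abstract class Activity
{
    public TimeSpan StartTime { get; }

    protected Activity(TimeSpan startTime)
    {
        StartTime = startTime;
    }

    public abstract void OnActivityStart();
    public abstract void OnActivityEnd();

    // Think: receives a copy of the state and returns the state to commit.
    public abstract PersonState Think(PersonState state);

    // Do: acts on the committed state (animations, movement) without changing it.
    public abstract void Do(PersonState state);
}

public sealed class EatLunch : Activity
{
    public EatLunch(TimeSpan startTime) : base(startTime) { }

    public override void OnActivityStart() { }
    public override void OnActivityEnd() { }

    public override PersonState Think(PersonState state)
    {
        // This edits the local copy, not the caller's value.
        if (!state.Alerted) state.Animation = "eat";
        return state;
    }

    public override void Do(PersonState state)
    {
        // A real game would start state.Animation on the character here.
    }
}

public sealed class Routine
{
    private readonly List<Activity> _activities = new List<Activity>();

    public void Add(Activity activity)
    {
        _activities.Add(activity);
        _activities.Sort((a, b) => a.StartTime.CompareTo(b.StartTime));
    }

    // Returns the most recently started activity, or null before the first one.
    public Activity Current(TimeSpan timeOfDay)
    {
        Activity current = null;
        foreach (Activity activity in _activities)
        {
            if (activity.StartTime > timeOfDay) break;
            current = activity;
        }
        return current;
    }
}
```

Because Think only returns a new state, the animation side can react at the moment that state is committed instead of polling it every frame.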

Reacting to seeing a crime, blood, weapons, or dead bodies can be implemented in a special activity that gets executed no matter what the current activity is. This activity would commit state when a person is alerted to trigger an override of the current routine's activity. Dying and becoming a ragdoll would stop the execution of all activities for that person. I'm not saying that this architecture is the best possible solution. There is certainly a lot of room for improvement.

For example, this architecture won't work well for activities that require multiple people. This is merely an example of what it could look like, instead of what it is currently. This is a fairly object-oriented approach, which I know isn't the hot new design paradigm, but given the problem, this felt like the most natural solution. Now we're going to talk about implementation details.

What that means is how specific features are implemented. Some of this comes down to personal preference, but also there were a lot of features that were implemented poorly, and I want to explain precisely why that is. When you make a save in Yandere Simulator, it creates a YandSave file in the same directory where the game is installed, where most of the information is stored. This file is pretty hefty and contains a lot of bloat, and saving to one slot creates a file that is one megabyte large. If you make multiple saves in different slots, this number increases.

Inspecting the contents of these saves reveals that information is stored as JSON. This is not usually a problem. But given the absolute size of these lads, it would be better to use a custom byte format instead. Since JSON is a text-based format, a lot of bytes are wasted on squiggly brackets, quotes, colons, string representations of floats, and property names. In addition, there's a lot of bloat that doesn't need to be saved in the saved data because technically it could be calculated on load based on the state.

For example, camera position. Remember that bit I talked about in the hypothetical architecture where the state of a person is copied before changing it? The same concept could be applied to saving and loading.
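To put a rough number on the JSON versus byte format argument, here is a small sketch that writes the same record both ways and compares sizes. The record layout (an id plus three floats) and all names are invented for the example and say nothing about the game's actual save format.

```csharp
using System.Globalization;
using System.IO;
using System.Text;

public static class SaveSizeDemo
{
    // Binary form: one 4-byte int and three 4-byte floats, 16 bytes in total.
    public static byte[] ToBinary(int id, float x, float y, float z)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(id);
            writer.Write(x);
            writer.Write(y);
            writer.Write(z);
            writer.Flush();
            return stream.ToArray();
        }
    }

    // Text form: property names, quotes, and braces all cost bytes.
    public static byte[] ToJson(int id, float x, float y, float z)
    {
        string json = string.Format(
            CultureInfo.InvariantCulture,
            "{{\"id\":{0},\"x\":{1},\"y\":{2},\"z\":{3}}}",
            id, x, y, z);
        return Encoding.UTF8.GetBytes(json);
    }
}
```

The binary record is always 16 bytes, while the JSON version of the same data is larger and varies with how the floats happen to print. The flip side is that a byte format is harder to inspect and needs explicit versioning.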

But that file isn't the only place where save data is stored. Some save data, like corkboard photos or school atmosphere, seems to be stored using player prefs, a Unity class intended to be used to store the player's preferences, like video options. This is a problem because on Windows, player prefs are stored in the Windows registry. This makes it very hard for players to backup, transfer, or share saves. Interaction button prompts are implemented in a way that is not very dynamic.

The current implementation has a unique instance of prompt script for each interactable object, including any GUI elements required to render the prompt. This is why the prompt script even shows up near the top of the profiler. When a prompt is being interacted with, the interactable object itself checks to see if the circle is filled before triggering the code that's run on interaction. Here's how I would implement this system instead.

Now, this is one of those features that is deceptive. It seems easy to implement on the surface, but its true complexity reveals itself when you actually go to implement it. I would start out by creating a component called Interactable. This component would be placed on any object that needs to be interacted with.

Interactable handles the interaction input from the player and triggers the onInteract event when the circle is filled. For each object that is interactable, a new component is created, or an existing component is used to add an onInteract method that handles the event. Now, this may seem pretty simple, but this is actually a pretty naive implementation so far.

There are a few problems with this. There's no indication to the user that a given object is interactable if it's not in range. And if there's two interactable objects in range, two prompts appear when only the one that the user wants should appear. This is where that hidden complexity starts to reveal itself. The first problem can be solved very easily and in many different ways, so I'll leave that as an exercise to the viewer.

The latter problem is a little more interesting. To fix this, we can add an additional field to the interactable component to indicate whether or not it's the closest. Then the player can calculate the distance between it and the interactables and set which one is closest based on the player's position. After doing a little bit of profiling, we can see that this implementation is very fast, even with hundreds of interactable objects.
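The closest-interactable check can be sketched as a single pass over the list. This is not the demo code from the video. It uses `System.Numerics.Vector3` in place of Unity's type, and the `IsClosest` field, the `range` parameter, and the class names are assumptions.

```csharp
using System.Collections.Generic;
using System.Numerics;

public sealed class Interactable
{
    public Vector3 Position;
    public bool IsClosest;

    public Interactable(Vector3 position)
    {
        Position = position;
    }
}

public static class InteractionDemo
{
    // Marks the single closest interactable within range and clears the rest.
    // Returns null when nothing is in range.
    public static Interactable UpdateClosest(
        Vector3 player, IReadOnlyList<Interactable> items, float range)
    {
        Interactable best = null;
        float bestDistance = range;
        for (int i = 0; i < items.Count; i++)
        {
            items[i].IsClosest = false;
            float distance = Vector3.Distance(player, items[i].Position);
            if (distance <= bestDistance)
            {
                best = items[i];
                bestDistance = distance;
            }
        }
        if (best != null) best.IsClosest = true;
        return best;
    }
}
```

Only the item flagged as closest would show its prompt, which takes care of the two-prompts-at-once problem.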

But there's still more that we need to do with the system to match what the system in Yandere Simulator does. Since only one prompt needs to be visible at one time, we can save ourselves the trouble of instantiating and destroying the prompts by just having one available and disabling it when it's not needed. While we're at it, we can add a little flavor text for what the interaction will do. Finally, we also need to be able to prompt for more than one button at a time.

Let's see how it performs now. It's still super fast! We can have up to 2050 interactable objects and still get 60 FPS, as long as we look in the opposite direction.

This also further proves that Vector3.Distance is very fast. The very first time I opened the school map, it was incredibly laggy. That's what led me to investigate the implementation of the school map. Here's how it works.

There's an orthographic camera in the center of the map pointing down. The size of the camera's frustum spans the entire map. The camera can show different floors by moving to preset positions on the y-axis.

The camera also defines a short enough clipping distance so that other floors won't get rendered when they don't need to, and it only renders objects in the default and top-down layer according to its culling mask. The benefit of this implementation is that it's very easy to do, and doesn't require a whole lot of maintenance, nor upfront development time. It's actually fairly common development practice. The downside in this scenario is that the default layer is being rendered, and most objects reside in the default layer. Most games have an image they use for the map background instead, so they aren't rendering so many objects.

You might be thinking, oh, I'm making changes to the school map, like, all the time. If I change the map to be a picture instead, I'll have to take a new picture every time I update the map. Thankfully, Unity has a feature where you can write editor scripts that run only in the editor.

These editor scripts allow you to automate most repetitive or tedious tasks, or let you augment your user experience. You could easily compare them to macros. You can make an editor script to render a frame from the map camera to a file, and just have the map render that image.

Then you can just run the script when you build your game, or better yet, automate your build pipeline too so you can include all the steps needed. However, considering that the map is not open every frame, this is probably not an issue, but it could probably be improved with proper LOD. There are a bunch of empty classes that just inherit a generic class.

For example, IntHashSet and IntAndStringDictionary. I thought this was weird at first, but it turns out that this is actually a workaround for a Unity bug that doesn't allow generics to serialize. On top of that, this bug was just fixed in Unity A3.

Yandere Simulator runs Unity version 2019.3. 0.7f1 and it would not be worth upgrading to a buggy pre-release version just for this bug fix. Therefore, this is not a problem.

So I want to preface this by saying this is going to sound a smidge nitpicky. Public fields on components for individual items and arrays are used pretty heavily. A large portion of these cases are used to store a bunch of game objects or components.

Looking at the student script in the inspector can show how this practice has gotten a little bit out of hand. Just look at all this shit. It's really hard to find what I'm looking for when I'm scrolling through this massive list of editable fields.

But this can be solved by a little bit of organization and I think the decompiler messed with the order of the fields a little bit. Now this isn't a bad thing regarding performance. But it is a terrible thing when it comes to maintenance and adding new features. Most of these references are static, and what I mean by that is a person has to manually drag and drop each and every one of these fields to fill them out. Most of these fields are required in order for the component to function.

Instead of creating a new instance of the component normally, it's usually just easier to copy one that you've already set up. But this is pretty error prone. Um.

But wait, there's more. There are copious amounts of hard-coded references to specific indexes in these arrays. This is technically faster than searching a list or using any variation of the built-in GameObject.Find functions because its runtime complexity is O(1).

However, this practice does make it very tedious to work with these components. But I'm not done yet. The majority of these arrays are treated like they are one indexed.

No arrays should be treated like they are one-indexed, because arrays are zero-indexed in C#. Now, I actually have a theory as to why this is still the case after all this time. Ultimately, it's probably an artifact of when the code used to be written in JavaScript. JavaScript is a dynamically typed language, and the way it treats arrays is kind of weird.

For context, in JavaScript, objects are treated like dictionaries. You can assign an object's property by just doing it. JavaScript treats arrays like an object that can only accept numbers as the key.

A common issue that novice programmers have is the concept of zero-indexed arrays. A zero-indexed array's first element can be accessed using the index 0. A one-indexed array's first element starts at the index 1. Many novice programmers gravitate to wanting to use one-indexed arrays because you start counting at 1. However, most languages use zero-indexed arrays and JavaScript is no different. The reason that zero-indexed arrays are preferred is that the index refers to the offset from the array's starting memory address.

This started in the C language and patterns arose that were just preferable to work with. Now, JavaScript arrays are zero indexed, but you can put any number you want for the index in an empty array, and JavaScript will fill in undefined for all of the previous elements. Either one of these problems on their own would be fairly easy to fix.

But the problem is that a fix for both of these problems would require a lot of work, because changing the code to treat these arrays as zero-indexed would also require changing all of the hard-coded references. So how should we write code to avoid these problems? The most obvious tip in this case would be to use zero-indexed arrays, because that's how C# was designed.

But what about all those hard-coded and manually assigned references? The answer is that it depends. For example, in my interaction system demo, notice that I find all the interactable objects in the start method and cache the results in a private instance variable.

In this case, I want to be able to add more interactables without having to mess with something unrelated. Another example, in Yandere Simulator, lots of components require a reference to the player component. In this case, it would be better to just have one reference to the player component in a static class that is globally accessible. It all depends on the specific context of how these references are used, how often they are used, and convenience. This is probably the biggest problem with the game's code.

Looking at YandereScript, there are 130 public booleans, and StudentScript has 196 public booleans. Most of these are dedicated to just keeping track of state. Now, the naive way to fix this is to just put all the booleans in a single enum and call it a day. Unfortunately, we can't do that because sometimes two of these booleans need to be true to properly represent the state. Currently, the part of the student's state that is represented with booleans or enums is character attributes like gender, whether or not the student is a teacher, whether or not the student is in a club, what club they're in, personality, stuff like that.

The character's appearance, like are they wearing club attire or are they wet. Interpersonal relationship status: whether or not the teacher's trust has been lost, whether or not the student is in a relationship, or whether or not the player has complimented the student. Whether or not the student is alerted, and the cause, e.g. whether he heard a noise or something. Whether or not the student has witnessed anything and what type of crime was witnessed. Whether or not the student is suffering and the cause, like bleeding or poison. The current action being performed, like eating, changing clothes, or dying, and the current objective of the action, like turning off a radio or apprehending the player.

We can already see that this is pretty complicated, and I'm sure there's stuff that I missed. As previously demonstrated, in C# you can assign numbers to enum values and cast between them with ease. The side effect of this is that you can combine enum values with bitwise operators. Let me explain what this means. Okay, so originally I did like a really long tutorial thing here, and it was way too long and boring, so instead I'll do it really fast.

You can assign values to the enum that are powers of two to make each enum value correspond to a bit in their binary representation. You can use this to do bitwise operators on the enum values so you can represent two enum values at once by setting the bits in the number. To make this a little bit easier to think about, you can use bitshift operators to assign the values to the enum. Now we can apply this to the witness state.

Currently, the witness state is stored in an enum called StudentWitnessType, and it looks like this. Instead of having values like blood and insanity, we can assign each one of these values to their own bits and use a value of blood bitwise-OR insanity. Honestly, the way that the state is currently represented doesn't exactly transfer very easily to enums, despite what everyone really wants to believe. The current implementation doesn't exactly lend itself to what it's trying to accomplish that well.
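The bit-per-value idea for the witness state can be sketched with a flags enum. `Blood` and `Insanity` come from the discussion above. The enum name `WitnessFlags`, the `Weapon` and `Corpse` members, and the helper methods are assumptions added for illustration.

```csharp
using System;

// Each member occupies its own bit, assigned with bit shifts.
[Flags]
public enum WitnessFlags
{
    None = 0,
    Blood = 1 << 0,     // binary 0001
    Insanity = 1 << 1,  // binary 0010
    Weapon = 1 << 2,    // binary 0100 (hypothetical member)
    Corpse = 1 << 3,    // binary 1000 (hypothetical member)
}

public static class WitnessDemo
{
    // True when every bit in flag is set in state.
    public static bool Saw(WitnessFlags state, WitnessFlags flag)
    {
        return (state & flag) == flag;
    }

    // Sets the bits in flag.
    public static WitnessFlags Add(WitnessFlags state, WitnessFlags flag)
    {
        return state | flag;
    }

    // Clears the bits in flag.
    public static WitnessFlags Remove(WitnessFlags state, WitnessFlags flag)
    {
        return state & ~flag;
    }
}
```

With this layout, `WitnessFlags.Blood | WitnessFlags.Insanity` represents both at once, so no dedicated combined value is needed.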

So instead of trying to represent the current state better, let's redesign how students observe and react to the world, building on top of our hypothetical architecture from before. Our hypothetical architecture actually pretty much solves this problem. Most of the student state can be inferred by the action. For example, we can infer that a person is in a heightened alert phase if their current activity is investigating a noise, and we can infer that a person has witnessed a crime if their current activity is reporting the crime.

Like I mentioned before, the profiler says that animation calculations take 2.5 to 2.5 milliseconds each frame when the school scene first loads. My benchmarks show that the more complex your animation systems get, the faster the current animation system, Mecanim, becomes over the legacy animation system. However, the legacy animation system is way faster for simple animations. I'm honestly not sure how big of a performance benefit this would be, though, in the context of Yandere Simulator.

More importantly, a common pattern in Yandere Simulator's code is that first it plays an animation, then after a certain amount of time passes, code gets run to update the state or other game objects. In my opinion, it would be a lot cleaner and easier to work with to use event triggers in the animation clips to run code instead. Just to make sure this works in both the Legacy System and Mecanim, I even tested it out with this cutie I found on the Asset Store. God, look at that cutie. Multi-threading in games is really hard, and if you aren't doing it from the start, it's even harder.

Fortunately, Unity has been putting a lot of work into new systems to improve performance in Unity games by default, called DOTS, or the Data-Oriented Technology Stack. Unfortunately, to take advantage of this new system in Yandere Simulator, practically the entire game would have to be rewritten, because the ECS (Entity Component System) is fundamentally different from how Unity's classic component system works. I wouldn't normally recommend using features that are still in preview, but I think the performance and architectural improvements would be beneficial to this game in particular. This may warrant using a different engine, or using a custom engine.

However, the new job system for DOTS can be fairly easily implemented into existing projects. The job system lets you really easily send stuff to another thread to be processed.

The catch is that you cannot directly update any Unity objects from the jobs, and you can only pass in data that is blittable, meaning that it has the same representation in memory in both managed C# code and behind-the-scenes Unity native code, so it doesn't need a conversion. You have to put all the data you want to work with in a struct, pass it to the job, get the result, and then reapply your results to the game objects and components. There are some scripts that are great candidates for this kind of an upgrade.

The UI library, NGUI, could probably be rewritten to take advantage of jobs. The pathfinding library, A-star, which is already multi-threaded, would be a perfect candidate to rework to use the job system.

AIPath, which inherits AIBase, is used to move objects along a path that has already been calculated. It's a perfect candidate to be refactored into an IJobParallelForTransform, which is specifically made for moving a bunch of objects in jobs. By far the best candidate is the dynamic bone component, because all of the data that it works with is already in its own class called Particle.

Unit tests are little pieces of code that make sure things work as they should. The purpose of these automated tests is to make sure you don't introduce new bugs or reintroduce old bugs by accident. Unit testing is really uncommon in game development.

I think this is really unfortunate because adequate testing has caught so many bugs for me in my projects before they reach the end user. Unfortunately, most of Yandere Simulator's components change the state directly, and this makes them pretty difficult to test. I saved the best for last. Let's take a look at how the FPS is calculated. Frame rate script calculates the FPS over an interval of 500 milliseconds.

This does not properly account for spikes in FPS. As a result, the FPS that is displayed is a lot lower than what is actually being rendered. There's a very good reason most games display the current, average, minimum, and maximum frame rate.
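A counter that tracks current, average, minimum, and maximum frame rate could look something like the sketch below. It is plain C# fed with frame times in seconds (in Unity that value would come from the frame's delta time), and the `FrameStats` class and its members are names made up for this example.

```csharp
using System;

public sealed class FrameStats
{
    public int Frames { get; private set; }
    public double TotalSeconds { get; private set; }
    public double CurrentFps { get; private set; }
    public double MinFps { get; private set; } = double.PositiveInfinity;
    public double MaxFps { get; private set; }

    // Average over everything recorded so far: frames divided by elapsed time.
    public double AverageFps
    {
        get { return TotalSeconds > 0 ? Frames / TotalSeconds : 0; }
    }

    // Records one frame; deltaSeconds is how long that frame took.
    public void AddFrame(double deltaSeconds)
    {
        if (deltaSeconds <= 0) return; // ignore invalid samples

        CurrentFps = 1.0 / deltaSeconds;
        MinFps = Math.Min(MinFps, CurrentFps);
        MaxFps = Math.Max(MaxFps, CurrentFps);
        Frames++;
        TotalSeconds += deltaSeconds;
    }
}
```

Showing the minimum next to the average is what makes a spike visible: one slow frame drags the minimum down right away while barely moving the average.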

Doom is an excellent overkill example. Funnily enough, if another hour or so were spent on the frame rate script, it probably would have saved him like a few years of grief. For the sake of being as comprehensive as possible, I'm going to go over what is and what isn't a decompiler artifact, and some general tips for writing better code. Is string.Concat a compiler optimization? Yes.

Only A plus B string concatenation and string interpolation are affected. String.Format is not affected. Why are there basically zero constants and tons of hard-coded values?

Well, that's because the compiler resolves all constants to their literal values at compile time. This means that every constant is basically find-replaced with their value. For example, there is a class called animNames that contains all of the animation names as constants. This is actually good practice because hard-coded strings that are used to affect program flow are a really bad idea.

You don't want to be debugging a typo for hours. Why the hell is everything explicitly cast? The decompiler did that. I don't know why.

You could implicitly cast them and it would be no different. What's up with these really precise float literals used in these comparisons? Floating point numbers have to be stored in a fixed amount of bytes.

Specifically, single precision floats are 4 bytes long and double precision floats are 8 bytes long. In order to fit these numbers into these bytes, IEEE used black fucking magic back in 1985 to cram as much information into these bytes as possible, and this standard is called the IEEE 754 floating point standard. Unfortunately, the ritual used to pull this standard out of some point near its ass wasn't perfect. As a result, only about seven significant decimal digits can be represented for single precision, and about 15 for double precision.

Additionally, the numbers can get rounded, as demonstrated here. These are called floating point rounding errors, and they are 100% unavoidable. Are string comparisons slower than comparing other types?

Technically, yes. But, like if statements versus switch statements, it doesn't really matter. In C#, strings get hashed into an integer and then get compared.

And this is a lot faster than checking each character in the string. This is also how dictionaries, where the key is a string, work.

Why the hell are there so many goddamn inline ifs? They're so hard to read! I think you can probably guess what I'm going to say.

It's a decompiler artifact. They're just, they're functionally equivalent to if statements anyway, it doesn't matter. There are a bunch of while and for loops that reference the instance variable this.id instead of normal for loops with a local variable. This is actually not a decompiler artifact.

The decompiler can recreate both using an instance variable and using a local variable accurately. This means that this must be intentional. I don't know why you would do this. Why is this loop a while or a for loop? It should be the other one.

The .NET intermediate language does not differentiate between while loops and for loops. So the decompiler has to guess which one it's supposed to be based on the context. Since we have the leaked source code now, we can actually criticize style. I'm going to go through this very quickly because, while I don't know why you would torture yourself, everything that I'm going to display here is completely valid and purely cosmetic.

Meaning it does not affect performance. Using an instance variable instead of a local variable in for loops, seemingly random concatenation of string literals, ridiculously deep indentation. StudentGlobals.MemorialStudents should be an array; it's just easier to work with. Again, why would you torture yourself? Remember, creating new arrays allocates memory, not using existing ones.

Comments that describe what if statements mean when the conditionals should be able to explain themselves without comments. Using the Unity Editor compiler flag to disable Osana instead of using a custom compiler flag. This is bad practice because if you want to create a standalone build with Osana enabled, you have to manually edit the code. This just makes it take longer for Osana to come out.

So what does this all mean anyway? Yandere Simulator is a perfect textbook case of how technical debt influences a project. Technical debt is an industry term that refers to engineering decisions that are really quick and easy to implement now, but may cause more time to be wasted later on. Over the years, it's felt like development has grinded to a screeching halt, and after really reading the code and really digging into it, I can see why. Looking at different parts you can kind of tell what YandereDev was thinking.

There are parts that actually make a lot of engineering sense given the architecture, but a critical aspect of managing technical debt is knowing when to refactor. It becomes kind of a sixth sense that you only gain from experience. The game's architecture needed to be refactored way back in 2016. Now, especially with Osana being functionally complete, that's not a good time to be refactoring. It's not really an option.

Good luck Yandere Dev, you're gonna need it. If you made it this far, thank you so much for watching. It really means a lot that people care what I have to say.

It's pretty amazing how my Yandere Simulator stream blew up. I want to thank everybody for over a thousand subscribers here on YouTube, 150 followers on Twitch, and almost 100 followers on Twitter. It really means a lot to see all the support that I've gotten recently.

Subscribe for more stuff. I'll see you guys next time. Fucking subscriber he gets it my editor gets it
