Terminal-Based YouTube Player

* peaceful morning sounds * * alarm ringing * Oh man, what a great day to be watching some YouTube videos. Just let me get my laptop and fire up Chrome - HE-E-E-E-EL-- Oh man, almost forgot to plug in my extra 2TB of RAM... Silly me, how could I expect my browser to function without the minimum requirements? OK, now let’s see what my favorite YouTubers prepared for me today – oh look, a new MrBeast video, I just have to see those 5 and 75 year olds duke it out for 100 grand...
All of us have probably experienced the horrors beyond human comprehension that come with a classic YouTube session. So what do you say if, taking the nigh infinite potential of the platform while getting rid of all the fluff, we use it the way it was always intended – running the entire thing from your terminal: no fancy useless UI, no fancy useless features and not even fancy useless graphics. That’s right, we’re going the OG programmer way and ditching all the progress made in the last decades, running this entire thing exactly how its creators envisioned it.
What I started with was the central point of the app, meaning playing videos. In this first iteration, everything was done on a single thread, the program simply allowing someone to specify a video and how it should be played. And in case the words “first iteration” or “single thread” didn’t tip you off, this was of course reworked a healthy number of times throughout the development of the project, but the central ideas remained. The entry point is a series of command line arguments: a URL for a video; a scaling factor, meaning at what percent of the original size the video should be displayed, relative to the CLI dimensions; and finally, a max width argument, as by default the video would fit the entire available width of the terminal. First, using the yt_dlp Python library, I get the URL for the video stream to load frames from, which is then used to read each frame in sequence.
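The command-line entry point described above could be sketched with the standard argparse module. The flag names `--scale` and `--max-width` and the helper names are assumptions made for illustration; the transcript only says that a URL, a scaling factor and a max width are accepted.

```python
import argparse


def percent(value: str) -> float:
    """Parse a scaling factor given as a percentage in the range (0, 100]."""
    number = float(value)
    if not 0 < number <= 100:
        raise argparse.ArgumentTypeError("scale must be greater than 0 and at most 100")
    return number


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical argument names; only their meaning comes from the transcript.
    parser = argparse.ArgumentParser(description="Play a video as ASCII in the terminal")
    parser.add_argument("url", help="URL of the video to play")
    parser.add_argument("--scale", type=percent, default=100.0,
                        help="size of the rendered video as a percentage")
    parser.add_argument("--max-width", type=int, default=None,
                        help="maximum width in columns (default: full terminal width)")
    return parser
```

Calling `build_parser().parse_args()` in the program's entry point would then give access to `args.url`, `args.scale` and `args.max_width`.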
For each frame a callback is called, which takes that frame and converts it into an ASCII representation, which is then printed to the screen, refreshing the console each time. The bread and butter of this whole thing is, of course, the frame-to-ASCII converter. It works by taking the original frame, converting it into a grayscale image, calculating the appropriate dimensions - starting with the width, based on the scaling factor and the max width allowed, and then the height, by dividing the new width by the original frame’s aspect ratio – and then resizing the grayscale image to its new dimensions. As for the actual conversion, it traverses the entire image by pixel groups, meaning that, based on the scaling factor, I can represent multiple pixels with a single character. And to get that character I need to calculate the mean value of the pixel group and divide it by 255, as that’s the max intensity a pixel can reach: 0 meaning black and 255 white. Then I take that normalized value and multiply it by the length of a string of characters, to get the index of the character matching the desired intensity. That string is simply a sequence of characters, ordered from high to low based on the contrast they provide.
And by putting these together, I can simply specify a URL for my video of choice and voila – it works perfectly. Or does it? Instead of directly stating it, how about we run the app a bit and see if you can spot the iffy parts. OK, so the video is playing - that’s good. But what if I were to grab this side of the terminal window and move it around? Oh – oh, well that’s not too pretty now, is it? But it’s a simple fix really - mostly because there isn’t a more advanced mechanism available that is also guaranteed to work on all CLIs out of the box - I simply have to poll for the current dimensions of the CLI on each frame render, updating the internal state used when processing the frame accordingly.
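The frame-to-ASCII conversion described above can be sketched in plain Python, assuming the frame has already been converted to grayscale and is given as a 2D list of 0-255 values. The character ramp and the `group` parameter are hypothetical. One detail the description glosses over: multiplying the normalized value by the length of the string yields an out-of-range index for pure white, so this sketch clamps to the last character.

```python
from statistics import mean

# Hypothetical ramp, ordered from the most "ink" to the least.
ASCII_RAMP = "@%#*+=-:. "


def frame_to_ascii(gray, group=2, ramp=ASCII_RAMP):
    """Turn a 2D list of grayscale values into lines of text.

    Every group x group block of pixels is represented by one character.
    """
    lines = []
    for top in range(0, len(gray), group):
        chars = []
        for left in range(0, len(gray[0]), group):
            block = [
                value
                for row in gray[top:top + group]
                for value in row[left:left + group]
            ]
            intensity = mean(block) / 255              # normalize to 0..1
            index = min(int(intensity * len(ramp)),    # scale to the ramp
                        len(ramp) - 1)                 # clamp pure white
            chars.append(ramp[index])
        lines.append("".join(chars))
    return lines
```

A real implementation would more likely do the grouping with an image library's resize and vectorized array operations; the nested loops here are only meant to make the arithmetic visible.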
Well now that certainly works, and normally you’d expect that once you solve a problem, it stays solved, right? Or does it? * hey, Vsauce, Michael here * Just for testing purposes, what if I were to again grab that side of the terminal and start resizing it? Well, it’s certainly scaling properly, that’s for sure, but it’s also doing something extra. You see, the FPS is now dependent on the frame size: the larger the window is, the lower the FPS, and vice versa. And if you take a moment to think about it, it makes perfect sense: of course a larger frame takes a lot longer to process than a smaller one, and below a certain size, frames end up processed faster than they would be displayed in a normal video player, as the video stream has no idea what a framerate is - the moment you ask for a frame, it provides it, no further questions asked. But this is also fixable: in addition to the stream URL, I need to also get the associated target framerate, based on which I calculate the minimum interval a frame should take to render, and if processing takes less than that, just wait for the cycle to complete. As for the case when it’s slower than the target framerate, well... what can I say except skill issue?
Now for the more critical issues. Currently you can simply stream a video to your terminal, but YouTube offers a tiiiny bit more functionality than that, mainly due to the fact it accepts inputs from the user. Well, currently the entire program runs on its main thread, so unless someone wants to send a key input whenever they want to render the next frame, it’s high time I moved this rendering part to a separate thread. This simply means I have to move the rendering loop from the RenderingManager to the VideoStreamHandlerThread’s run method, which, upon starting the thread, begins parsing the stream of frames, calling a provided callback function for each.
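Both ideas from this part, pacing frames to a target framerate and running the loop on its own thread, could look roughly like the sketch below. The class name is taken from the transcript, but the constructor parameters and the body are assumptions; any iterable stands in for the real video stream.

```python
import threading
import time


class VideoStreamHandlerThread(threading.Thread):
    """Sketch: hands each frame to a callback, at most `fps` times per second."""

    def __init__(self, frames, fps, on_frame):
        super().__init__(daemon=True)
        self.frames = frames                # any iterable of frames
        self.frame_interval = 1.0 / fps     # minimum time one frame should take
        self.on_frame = on_frame            # e.g. convert to ASCII and print

    def run(self):
        for frame in self.frames:
            started = time.perf_counter()
            self.on_frame(frame)
            remaining = self.frame_interval - (time.perf_counter() - started)
            if remaining > 0:
                time.sleep(remaining)       # too fast: wait out the rest of the cycle
            # Too slow: nothing to wait for, so playback simply falls behind.
```

Starting it with `VideoStreamHandlerThread(frames, fps, callback).start()` leaves the main thread free to wait for keyboard input.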
Additionally, I’ll add some extra utilities and control variables to allow the thread to be aware of the rendered video’s state, especially whether it’s playing or not. On the CliManager side of things, I can now add an infinite loop constantly awaiting input from the user. And to have that blocking method of waiting for new input, I wanted to make use of a function well known especially to you C programmers out there: getch(). This waits until an input character is provided and then captures it. It’s a fundamental building block, so I of course expected it to come prepackaged with the language in a simple and unified manner. Now imagine my surprise when I had to come up with this beauty to get it running properly, as depending on the OS, the ways in which it works are quite different. Also, because it turned out to be too good at its job, so much so that it even captured Ctrl-C and Ctrl-Z, I had to wrap the implementation to handle those special cases. But by hooking up this handler for the user input with the methods to control the video state, I can pause and unpause the player with the press of a button.
Finally, to make this feel more like a YouTube page and not a random video player, I’ll add a few improvements to the DisplayManager, where, by using some extra metadata from the video, it can display a status bar with an indicator of whether the video is playing or not, a playback bar showing the percentage elapsed – which, I forgot to mention, I get by keeping count of the index of the currently displayed frame and dividing it by the total number of frames in the video - and finally, the creator’s username, along with a nicely formatted view count.
Now, I could definitely stop here and call it a day, buut YouTube is more than a place where you load a video, watch it, then shut down the whole thing. So, in order to prepare for this great expansion, a great deal of refactoring was long overdue.
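For reference, a cross-platform getch() in Python is commonly written along the lines below. This is a sketch of the usual recipe, not the code shown in the video: it uses msvcrt on Windows and termios/tty elsewhere, and the wrapper restores the usual meaning of Ctrl-C and Ctrl-Z, which raw mode would otherwise swallow. The `read` parameter is an assumption added so the wrapper can be exercised without a real terminal.

```python
import signal
import sys

CTRL_C = "\x03"
CTRL_Z = "\x1a"


def read_raw_char():
    """Block until one key is pressed and return it, without waiting for Enter."""
    try:
        import msvcrt                       # only available on Windows
        return msvcrt.getwch()
    except ImportError:
        import termios                      # only available on POSIX systems
        import tty
        fd = sys.stdin.fileno()
        saved = termios.tcgetattr(fd)
        try:
            tty.setraw(fd)                  # raw mode: no echo, no line buffering
            return sys.stdin.read(1)
        finally:
            termios.tcsetattr(fd, termios.TCSADRAIN, saved)


def getch(read=read_raw_char):
    """Read one key, giving Ctrl-C and Ctrl-Z back their usual behavior."""
    char = read()
    if char == CTRL_C:
        raise KeyboardInterrupt
    if char == CTRL_Z and hasattr(signal, "SIGTSTP"):
        signal.raise_signal(signal.SIGTSTP)  # suspend, as the shell normally would
    return char
```

An input loop would then call `getch()` repeatedly and map keys to actions, for example toggling play and pause on a chosen key.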
Sparing you the boring details, I moved functions around to where they belonged, and ended up using quite a few callbacks to keep everything as loosely coupled as possible. Worth noting is the DisplayManager, where instead of an entry function that directly outputs to the terminal, the class now manages a series of screens, each having custom rendering methods while making use of a unified method for writing the output, so that whenever a render is requested the DisplayManager simply calls the render function for the currently active screen. Continuing with the DisplayManager fixation, I implemented a menu navigation bar, and made sure it properly realigned when resizing the window, so I can prove anyone can keep up with the extremely reasonable responsiveness standards of today. Also, I made sure the video gets paused automatically when changing to a different screen.
But before going into the implementation of those screens I mentioned before, I realized I needed a way of actually getting the data. I pondered whether I should approach it the same way I did in the Using Youtube As A Cloud Storage Service video, but considering that using an automated WebDriver like Selenium is the same as having my app be just a wrapper over a browser instance, I moved on to greener pastures and chose the obviously superior alternative that is the official YouTube API provided by the ever trustworthy Google. You’ll see why I’m singing them such praises in just a bit, when I get to what I had to do when implementing the screens, just you wait... I also thought it would be neat if, in addition to the base API utilities you get when using your API key, users could also log in using their Google accounts, for that extra dash of customization potential.
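The screen-based DisplayManager described above might be structured roughly as follows. Only the class name comes from the transcript; the method names, the `on_leave` hook (used here for things like pausing the video) and the navigation bar layout are assumptions for illustration.

```python
class DisplayManager:
    """Sketch: keeps named screens and renders whichever one is active."""

    def __init__(self, write=print):
        self._screens = {}      # name -> render(width) returning a list of lines
        self._active = None
        self._write = write     # the single, shared way of producing output

    def add_screen(self, name, render):
        self._screens[name] = render
        if self._active is None:
            self._active = name

    def switch_to(self, name, on_leave=None):
        if name not in self._screens:
            raise KeyError(name)
        if on_leave is not None and name != self._active:
            on_leave(self._active)          # e.g. pause the video screen
        self._active = name

    def nav_bar(self, width):
        labels = [f"[{name}]" if name == self._active else f" {name} "
                  for name in self._screens]
        return " ".join(labels).center(width)[:width]   # re-centered on every render

    def render(self, width):
        lines = [self.nav_bar(width)] + self._screens[self._active](width)
        self._write("\n".join(lines))
```

Because the width is passed in on every render, the navigation bar realigns as soon as the caller reports a new terminal size.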
Fortunately implementing these was actually pretty easy – first I have to go to the Google Cloud Console, create my project and – oh wait, what’s this, the project for the Compiler Running on Paper I made a few videos ago, what a coincidence, but anyway – then enable the YouTube Data API, along with setting the required scopes for the OAuth consent screen, and finally download the credentials config file. Then, after going back to my trusty IDE, I can create an AuthManager, which checks for any local configs and, if present, loads them; otherwise it prompts the user to log in by beginning a simple OAuth flow, where, on success, it creates that config file to be used in the future. All of these were really easy to do, mostly because it’s a really simple flow, and Google really excels at optimizing the 80%. Also, if I were to make this an actual product, I’d probably load that credentials config file from a remote server instead of locally, because of, you know, insane safety concerns.
But with a working OAuth flow, I can work on the screens themselves, making use of the newly unlocked YouTube API skill tree. A YoutubeConnectionManager handles a series of internal states, corresponding to the list of videos associated with each screen. This service exposes APIs for initializing and navigating through the list of available videos, while making sure new results are fetched in advance. For the states themselves, I have it so they all extend a YoutubeStateBase class, as most of the logic is common, except for the YouTube API endpoints used, along with their specific parameters. First, I’ll add some handlers for navigating back and forth through the list of videos, as well as a util function that fetches new results either when the state is initialized or when the current video index reaches a certain percentage relative to the number of available videos, triggering a new fetch in the background, using a custom thread.
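A base state with navigation and background prefetching could be sketched as below. The class name YoutubeStateBase is from the transcript; everything else is hypothetical, including the method names, the 80% threshold (the transcript only says "a certain percentage") and the convention that subclasses return a list of videos plus a next-page token.

```python
import threading


class YoutubeStateBase:
    """Sketch of a paged video list that fetches more results ahead of time."""

    prefetch_ratio = 0.8    # assumed threshold, not taken from the source

    def __init__(self):
        self.videos = []
        self.index = 0
        self.next_page_token = None
        self._lock = threading.Lock()
        self._fetching = None               # background thread, if any

    def fetch_page(self, page_token):
        """Subclasses return (videos, next_page_token) from their own endpoint."""
        raise NotImplementedError

    def initialize(self):
        self._fetch()                       # first page is loaded synchronously

    def _fetch(self):
        videos, token = self.fetch_page(self.next_page_token)
        with self._lock:
            self.videos.extend(videos)
            self.next_page_token = token

    def _maybe_prefetch(self):
        busy = self._fetching is not None and self._fetching.is_alive()
        if busy or self.next_page_token is None:
            return
        if self.index + 1 >= self.prefetch_ratio * len(self.videos):
            self._fetching = threading.Thread(target=self._fetch, daemon=True)
            self._fetching.start()

    def next(self):
        if self.index < len(self.videos) - 1:
            self.index += 1
        self._maybe_prefetch()
        return self.videos[self.index]

    def previous(self):
        self.index = max(0, self.index - 1)
        return self.videos[self.index]
```

Each concrete state then only needs to override `fetch_page` with its own endpoint and parameters.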
This util function basically takes the method that is set when extending the base class and calls it with the values of the parameters. Now, for the individual implementations, I’ll start with the Home page. You see, I named it the “Home page” as I wanted it to be the equivalent of the Home page in your browser. Thing is, the secretive fellas at YouTube don’t want to give away access to an endpoint for fetching the videos that would appear on a user’s Home page, probably in fear that it could give away their golden egg that is “the internal algorithm”. So, I’ll have to settle for the Home page being a list of the most popular videos.
As for the Related page state - which you might notice doesn’t exist in the final version - I thought it would be nice to get the list of related videos for the currently playing video. And it all seemed to work together nicely. Intuitively, I might say. Until I actually had to run it. I realized the relatedToVideoId parameter wasn’t actually valid, so I thought “oh, maybe I messed up its name or something”, so I had to go straight to the source, where I found that a rather big change did indeed happen with the parameter, namely it got completely removed. So, until last summer you could do this without a problem, but now it’s too much? Really, YouTube? And there was no alternative to it, probably again to protect their secret Krabby Patty formula, so I did as any good programmer does and pretended I never actually wanted that feature to exist, swept any memory of it under a rug, and repurposed that page into a creator page, so that if a user watches a video, they can navigate to the video creator’s page – quite the masterful gambit in how I handled that one, dare I say.
Finally, the Search page state was, fortunately, without hiccups, as the crazy bastards actually let me use a core functionality of their platform. Next, I moved on to their actual renders in the terminal. It was pretty much the same thing for all three.
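To make the difference between the three states concrete, here is a sketch of the request parameters each one might send to the YouTube Data API v3. The parameter names (`part`, `chart`, `q`, `type`, `channelId`, `order`, `maxResults`, `pageToken`) are real API parameters, but the function names, the page size and the choice of the search endpoint for the creator page are assumptions; the transcript does not say which endpoint that page uses.

```python
def _without_none(params):
    """Drop unset values so they are not sent as request parameters."""
    return {key: value for key, value in params.items() if value is not None}


def home_request(page_token=None, max_results=25):
    """Most popular videos, standing in for a real home feed."""
    return "videos", _without_none({
        "part": "snippet", "chart": "mostPopular",
        "maxResults": max_results, "pageToken": page_token,
    })


def search_request(query, page_token=None, max_results=25):
    """Free-text search restricted to videos."""
    return "search", _without_none({
        "part": "snippet", "q": query, "type": "video",
        "maxResults": max_results, "pageToken": page_token,
    })


def creator_request(channel_id, page_token=None, max_results=25):
    """Videos from one channel, newest first (assumed approach)."""
    return "search", _without_none({
        "part": "snippet", "channelId": channel_id, "type": "video",
        "order": "date", "maxResults": max_results, "pageToken": page_token,
    })
```

With the google-api-python-client library, such a pair would typically be used as something like `youtube.search().list(**params).execute()` or `youtube.videos().list(**params).execute()`, with the `nextPageToken` of the response fed back in as `page_token`.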
First, convert the thumbnail into its ASCII representation, add the title, then the creator’s name. I chose not to display the view count, not out of spite, but because for some reason you can’t retrieve the statistics section when querying using the "search" endpoint, and instead are forced to make a query for each individual video, which is not something I’m keen on. What’s weird though is that you can do this for the "videos" endpoint. But for the sake of consistency, I’m not doing it for that one either. By also linking up the new inputs, you can now navigate to the next or previous video and also select the current video, which will trigger the switch to the video screen to load it. And although I said these were pretty much identical, there are two main differences. First, the Creator screen also displays the creator’s name at the top, which is just an extra string, so no big deal. And secondly, the Search page has a, you know, search bar. This is done using a separate SearchBarStateManager that stores the current query and whether the search bar is active or not. To set it as active, you have to press the S key while on the Search page, after which all inputs are directed to it, except for Escape and Enter, which set it as not active, in addition to querying for the videos in the case of Enter.
And now you’d think “and with that it’s done”, right? IF YOU WEREN’T PAYING ATTENTION THAT IS. Quickly, what have I missed or done only halfway? Here, I’ll give you some quick hints. First, remember me bragging about this entire thing being responsive? Well, this is what happens if I resize the terminal on a screen other than the video one – not too pretty now, is it? At least it’s a pretty simple fix – move that constant polling for the CLI dimensions to its own separate thread, and if changes happen, signal them using a callback.
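Polling the terminal size on a separate thread and reporting changes through a callback could look like this. It relies on the standard `shutil.get_terminal_size`; the class name, the polling interval and the injectable `get_size` parameter are assumptions made for the sketch.

```python
import shutil
import threading


class TerminalSizeWatcher(threading.Thread):
    """Sketch: calls on_resize(columns, lines) whenever the terminal size changes."""

    def __init__(self, on_resize, interval=0.1, get_size=shutil.get_terminal_size):
        super().__init__(daemon=True)
        self._on_resize = on_resize
        self._interval = interval           # seconds between polls (assumed value)
        self._get_size = get_size
        self._last = tuple(get_size())
        self._stopped = threading.Event()

    def poll_once(self):
        """Check the size once; return True if a change was reported."""
        size = tuple(self._get_size())
        if size == self._last:
            return False
        self._last = size
        self._on_resize(*size)
        return True

    def run(self):
        # Event.wait doubles as an interruptible sleep between polls.
        while not self._stopped.wait(self._interval):
            self.poll_once()

    def stop(self):
        self._stopped.set()
```

The callback can simply ask the active screen to re-render, which covers every screen rather than only the video one.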
Secondly, you know how I told you about integrating OAuth to customize the experience for the user, and then proceeded to not use any of that, as the YouTube API doesn’t even offer a home page or recommendations anymore? Well, so as not to have this entire feature be useless, I have it so now you can like and subscribe, just like you can do with this video you’re currently watching (wink). Also, a pretty big thing, or rather tall, is the fact that the rendered ASCII images are stretched vertically, since characters in a font are generally taller than they are wide. So, to counteract this, I also took the aspect ratio of the characters into account when calculating the aspect ratio of the image, allowing the user to specify a custom value depending on their config. Finally, I’ll extend the list of ASCII characters used when rendering the frames, to have a more accurate conversion. But this should pretty much mark the finish line for this project – I mean just look at it.
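The aspect-ratio correction comes down to one extra factor when computing the output height. In the sketch below, `char_aspect` is the width-to-height ratio of a single character cell; the default of 0.5 is an assumption (many monospace fonts are roughly twice as tall as they are wide), and the function and parameter names are hypothetical.

```python
def target_dimensions(frame_w, frame_h, term_cols,
                      scale_percent=100.0, max_width=None, char_aspect=0.5):
    """Return (columns, rows) for the ASCII version of a frame."""
    cols = int(term_cols * scale_percent / 100)
    if max_width is not None:
        cols = min(cols, max_width)
    cols = max(cols, 1)
    frame_ratio = frame_w / frame_h
    # Without char_aspect, tall character cells would stretch the image vertically.
    rows = max(round(cols / frame_ratio * char_aspect), 1)
    return cols, rows
```

For a 1920x1080 frame in a 160-column terminal this gives 45 rows instead of the 90 that an uncorrected calculation would produce.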
