Transcript for:
Terminal-Based YouTube Player

* peaceful morning sounds * * alarm ringing *
Oh man, what a great day to be watching some YouTube videos. Just let me get my laptop and fire up Chrome - HE-E-E-E-EL-- Oh man, almost forgot to plug in my extra 2TB of RAM... Silly me, how could I expect my browser to function without the minimum requirements?
Ok, now let’s see what my favorite YouTubers prepared for me today – oh look, a new MrBeast video, I just have to see those 5 and 75 year olds duke it out for 100 grand...
All of us have probably experienced the horrors beyond human comprehension that come with a classic YouTube session. So what do you say if, by taking this nigh infinite potential of the platform while getting rid of all the fluff, we use it in the way it’s always been intended – running the entire thing from your terminal: no fancy useless UI, no fancy useless features and even no fancy useless graphics. That’s right, we’re going the OG programmer way and ditching all that progress made in the last decades, running this entire thing exactly how its creators envisioned it.
What I started with was kind of the central point of the app, meaning playing videos. In this first iteration, everything was done on a single thread, the program simply allowing for someone to specify a video and how it should be played.
And in case the words “first iteration” or “single thread” didn’t tip you off, this was of course reworked a healthy number of times throughout the development of the project, but the central ideas remained.
The entry point is a series of command line arguments, namely a URL for a video, a scaling factor, meaning at what percent of the original size the videos should be displayed, relative to the CLI dimensions, and finally, a max width argument, as by default, the video would fit the entire available width of the terminal.
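
As a rough illustration of that entry point (not the project's actual code), the three arguments could be parsed with `argparse` as below. The flag names `--scale` and `--max-width`, the defaults, and the assumption that the scaling factor is a fraction between 0 and 1 are all hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three arguments mentioned in the transcript."""
    parser = argparse.ArgumentParser(
        description="Play a video as ASCII art in the terminal."
    )
    parser.add_argument("url", help="URL of the video to play")
    # Assumed to be a fraction of the terminal width, e.g. 0.5 for half.
    parser.add_argument("--scale", type=float, default=1.0,
                        help="fraction of the terminal width to use")
    # None means "no cap": the video may fill the whole terminal width.
    parser.add_argument("--max-width", type=int, default=None,
                        help="upper bound on the rendered width, in characters")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse and validate the arguments; argv=None falls back to sys.argv."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not 0 < args.scale <= 1:
        parser.error("--scale must be greater than 0 and at most 1")
    return args
```

A call such as `parse_args(["<video url>", "--scale", "0.5", "--max-width", "80"])` would then yield the values the rest of the player needs.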
First, using the yt_dlp Python library, I get the URL for the video stream to load frames from, which is then used to read each frame in sequence. For each frame a callback is called, which takes that frame and converts it into an ASCII representation, which is then printed to the screen, refreshing the console each time.
The bread and butter of this whole thing is, of course, the frame-to-ASCII converter. It works by taking the original frame, converting it into a grayscale image, calculating the appropriate dimensions - starting with the width, based on the scaling factor and the max width allowed, and the height, by dividing the new width by the original frame’s ratio – and then resizing the grayscale image to its new dimensions. As for the actual conversion, it traverses the entire image by pixel groups, meaning that, based on the scaling factor, I can represent multiple pixels with a single character.
And to get that character I need to calculate the mean value of the pixel group and divide it by 255, as that’s the max value of intensity a pixel can reach: 0 meaning black and 255 – white. Then I take that normalized value and multiply it with the length of a string of characters, to get the index of the character matching the desired intensity. That string is simply a sequence of characters, ordered from high to low based on the contrast they provide.
And by putting these together, I can simply specify a URL for my video of choice and voila – it works perfectly
or does it?
Instead of directly stating it, how about we run the app a bit and see if you can spot the iffy parts.
Ok so the video is playing - that’s good. But what if I were to grab this side of the terminal window and move it around?
Oh – oh, well that’s not too pretty now is it.
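
Before getting to the fix, here is a minimal sketch of the frame-to-ASCII conversion described above. It is an illustration under assumptions, not the project's code: it works on a plain nested list of grayscale values instead of an image library, it folds the resize and the pixel-group averaging into one step, and the character ramp and function names are made up. The index is also clamped so that a pure white pixel (255) does not run past the end of the string.

```python
# Characters ordered from high to low contrast; this particular ramp is an
# assumption. Reverse it if the picture looks inverted in your terminal.
ASCII_RAMP = "@%#*+=-:. "


def target_size(frame_w, frame_h, term_w, scale, max_width=None):
    """Width from the scaling factor and optional cap, height from the ratio."""
    width = int(term_w * scale)
    if max_width is not None:
        width = min(width, max_width)
    width = max(1, width)
    # "Dividing the new width by the original frame's ratio".
    height = max(1, round(width / (frame_w / frame_h)))
    return width, height


def frame_to_ascii(gray, out_w, out_h, ramp=ASCII_RAMP):
    """Map each pixel group of a grayscale frame (rows of 0..255) to one char."""
    in_h, in_w = len(gray), len(gray[0])
    lines = []
    for row in range(out_h):
        # Vertical bounds of the pixel group behind this output row.
        y0 = row * in_h // out_h
        y1 = max(y0 + 1, (row + 1) * in_h // out_h)
        chars = []
        for col in range(out_w):
            x0 = col * in_w // out_w
            x1 = max(x0 + 1, (col + 1) * in_w // out_w)
            group = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            mean = sum(group) / len(group)
            # Normalize to 0..1, scale to the ramp length, clamp the top end.
            index = min(int(mean / 255 * len(ramp)), len(ramp) - 1)
            chars.append(ramp[index])
        lines.append("".join(chars))
    return "\n".join(lines)
```

In a real player the grayscale conversion and resize would more likely come from an image library, with the loop above replaced by array operations for speed.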
But the resizing problem is a simple fix really - mostly because there isn’t a more advanced mechanism available that is also guaranteed to work on all CLIs out of the box - I simply have to poll for the current dimensions of the CLI on each frame render, updating the internal state used when processing the frame accordingly.
Well now that certainly works, and normally you’d expect that once you solve a problem, it stays solved, right?
or does it?
* hey, Vsauce, Michael here *
Just for testing purposes, what if I were to again grab that side of the terminal and start resizing it?
Well it’s certainly scaling properly, that’s for sure, but it’s also doing something extra. You see, the FPS is now dependent on the frame size: the larger the window is, the lower the FPS, and vice versa.
And if you take a moment to think about it, it makes perfect sense. Of course a larger frame takes a lot longer to process than a smaller one, and below a certain size, frames end up processed faster than they would be displayed in a normal video player, as the video stream has no idea what a framerate is - the moment you ask for a frame, it provides it, no further questions asked.
But this is also fixable: in addition to the stream URL, I need to also get the associated target framerate, based on which I calculate the minimum interval a frame should take to render, and if processing takes less than that, just wait for the cycle to complete. As for the case when it’s slower than the target framerate, well... what can I say except skill issue?
Now for the more critical issues. Currently you can simply stream a video to your terminal, but YouTube offers a tiiiny bit more functionality than that, mainly due to the fact it accepts inputs from the user.
Well, currently the entire program runs on its main thread, so unless someone wants to send a key input whenever they want to render the next frame, it’s high time I moved this rendering part to a separate thread.
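
The frame pacing described above can be sketched as a small generator. This is an assumption-laden illustration rather than the project's implementation: `paced_frames` is a hypothetical name, and the `clock` and `sleep` parameters exist only so the timing can be swapped out when testing.

```python
import time


def paced_frames(frames, fps, clock=time.perf_counter, sleep=time.sleep):
    """Yield frames no faster than `fps`, sleeping out the rest of each cycle."""
    min_interval = 1.0 / fps  # minimum time one frame should take
    for frame in frames:
        started = clock()
        # The caller converts and prints the frame while we are suspended here.
        yield frame
        elapsed = clock() - started
        if elapsed < min_interval:
            # Processing was faster than the target framerate: wait it out.
            sleep(min_interval - elapsed)
        # If processing was slower, nothing is done: playback simply lags.
```

Usage would look like `for frame in paced_frames(stream, fps=30): render(frame)`, where `stream` and `render` stand in for the frame source and the ASCII renderer.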
Moving the rendering to its own thread simply means I have to move the rendering loop from the RenderingManager to the VideoStreamHandlerThread’s run method, which, upon starting the thread, begins parsing the stream of frames, calling a provided callback function for each. Additionally, I’ll add some extra utilities and control variables to allow for the thread to be aware of the rendered video’s state, especially whether it’s playing or not.
On the CliManager side of things, I can now add an infinite loop constantly awaiting input from the user.
And to have that blocking method of waiting for new input, I wanted to make use of a function that is well known, especially to you C programmers out there: getch(). This waits until an input character is provided and then captures it. It’s a fundamental building block, so I of course expected it to come prepackaged with the language in a simple and unified manner.
Now imagine my surprise when I had to come up with this beauty to get it running properly, as depending on the OS, the ways in which it works are quite different. Also, because it turned out to be too good at its job, so much so that it even captured ctrl-C or ctrl-Z, I had to wrap the implementation to handle those special cases.
But by hooking up this handler for the user input with the methods to control the video state, I can pause and unpause the player with the press of a button.
Finally, to make this feel more like a YouTube page and not a random video player, I’ll add a few improvements to the DisplayManager, where, by using some extra metadata from the video, it can display a status bar, with both the indicator on whether the video is playing or not, as well as a playback bar, showing the percentage elapsed – which, I forgot to mention, I get by keeping count of the index of the currently displayed frame, which I divide by the total frames in the video - and finally, the creator's username, along with a nicely formatted view count.
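
A status bar along those lines could be assembled as in the sketch below. The layout, the `>` and `||` indicators, the bar width and both function names are assumptions made for illustration; only the ingredients (play state, frame index over total frames, creator name, view count) come from the description above.

```python
def format_views(count: int) -> str:
    """Shorten a view count, e.g. 1234567 becomes '1.2M'."""
    for threshold, suffix in ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")):
        if count >= threshold:
            return f"{count / threshold:.1f}{suffix}"
    return str(count)


def status_bar(playing, frame_index, total_frames, creator, views, bar_width=20):
    """Build a one-line status bar for the video screen."""
    state = ">" if playing else "||"
    # Percentage elapsed: index of the current frame over the total frame count.
    progress = frame_index / total_frames if total_frames else 0.0
    progress = min(max(progress, 0.0), 1.0)
    filled = int(progress * bar_width)
    bar = "#" * filled + "-" * (bar_width - filled)
    return f"{state} [{bar}] {progress:.0%} | {creator} | {format_views(views)} views"
```

For example, `status_bar(True, 50, 200, "someone", 1500, bar_width=8)` produces `> [##------] 25% | someone | 1.5K views`.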
Now, I could definitely stop here and call it a day, buut YouTube is more than a place where you load a video, watch it, then shut down the whole thing. So, in order to prepare for this great expansion, a great deal of refactoring was long overdue.
Sparing you the boring details, I moved functions around to where they belonged, and ended up using quite a few callbacks, to keep everything as loosely coupled as possible.
Worth noting is the DisplayManager, where instead of an entry function that directly outputs to the terminal, the class now manages a series of screens, each having custom rendering methods, while making use of a unified method for writing the output, making it so that whenever a render is requested the DisplayManager simply calls the render function for the currently active screen.
Continuing with the DisplayManager fixation, I implemented a menu navigation bar, and made sure it properly realigned when resizing the window, so I can prove anyone can keep up with the extremely reasonable responsiveness standards of today. Also I made sure the video gets paused automatically when changing to a different screen.
But before going into the implementation of those screens I mentioned before, I realized I need to have a way of actually getting the data I needed.
I pondered on whether I should approach it in the same way I did with the Using YouTube As A Cloud Storage Service video, but considering that using an automated WebDriver like Selenium is the same as having my app be just a wrapper over a browser instance, I moved on to greener pastures, and chose the obviously superior alternative that is the official YouTube API provided by the ever trustworthy Google.
You’ll see why I’m singing them such praises in just a bit, when I get to what I had to do when implementing the screens, just you wait...
I also thought it would be neat if, in addition to the base API utilities you get when using your API key, users could also log in using their Google accounts, for that extra dash of customization potential.
Fortunately implementing these was actually pretty easy – first I have to go to the Google Cloud Console, create my project and – oh wait what’s this, the project for the Compiler Running on Paper I made a few videos ago, what a coincidence, but anyway – then enable the YouTube Data API, along with setting the required scopes for the OAuth consent screen, and finally download the credentials config file.
Then, after going back to my trusty IDE, I can create an AuthManager, which checks for any local configs, and if present, loads them, otherwise it prompts the user to log in by beginning a simple OAuth flow, where, on success, it creates that config file to be used in the future. All of these were really easy to do mostly because it’s a really simple flow, and Google really excels at optimizing the 80%. Also, if I were to make this an actual product, I’d probably load that credentials config file from a remote server instead of locally, because of, you know, insane safety concerns.
But with a working OAuth flow, I can work on the screens themselves, making use of the newly unlocked YouTube API skill tree.
A YoutubeConnectionManager handles a series of internal states, corresponding to the list of videos associated with each screen. This service exposes APIs for initializing and navigating through the list of available videos, while making sure new results are fetched in advance.
For the states themselves, I have it so they all extend a YoutubeStateBase class, as most of the logic is common, except for the YoutubeAPI endpoints used, along with their specific parameters.
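
To make the idea of a shared base class concrete, here is one possible shape for it. Everything in this sketch is hypothetical: `VideoListStateBase` is a stand-in for YoutubeStateBase, the 80% prefetch threshold is an assumed value, and `FakePopularState` returns made-up entries instead of calling any real API. It only shows common list handling with a subclass-specific fetch method and a background fetch near the end of the list.

```python
import threading


class VideoListStateBase:
    """Shared list logic; subclasses only decide how a page is fetched."""

    prefetch_ratio = 0.8  # assumed: prefetch once 80% of the list is consumed

    def __init__(self):
        self.videos = []
        self.index = 0
        self._lock = threading.Lock()
        self._worker = None

    def fetch_page(self):
        """Return the next batch of videos; each subclass queries its endpoint."""
        raise NotImplementedError

    def initialize(self):
        self.videos = list(self.fetch_page())
        self.index = 0

    def current(self):
        return self.videos[self.index] if self.videos else None

    def previous(self):
        with self._lock:
            self.index = max(0, self.index - 1)
        return self.current()

    def next(self):
        with self._lock:
            if self.index < len(self.videos) - 1:
                self.index += 1
            near_end = self.index + 1 >= len(self.videos) * self.prefetch_ratio
            idle = self._worker is None or not self._worker.is_alive()
            if near_end and idle:
                # Fetch the next batch in the background so browsing never stalls.
                self._worker = threading.Thread(target=self._fetch_more, daemon=True)
                self._worker.start()
        return self.current()

    def _fetch_more(self):
        batch = list(self.fetch_page())
        with self._lock:
            self.videos.extend(batch)


class FakePopularState(VideoListStateBase):
    """Stand-in subclass; a real one would call its API endpoint in fetch_page."""

    def __init__(self, page_size=4):
        super().__init__()
        self.page_size = page_size
        self._served = 0

    def fetch_page(self):
        start = self._served
        self._served += self.page_size
        return [f"video-{i}" for i in range(start, start + self.page_size)]
```

The design choice worth noting is that navigation never waits on the network: `next()` only starts the worker thread, and the new results are appended under a lock once they arrive.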
First, I’ll add some handlers for navigating back and forth through the list of videos, as well as creating a util function that fetches new results when either the state is initialized or when the current video index reaches a certain percentage relative to the number of available videos, triggering a new fetch in the background, using a custom thread.
This util function basically takes the method that is set when extending the base class and calls it with the values of the parameters.
Now, for the individual implementations, I’ll start with the Home page.
You see, I named it the “Home page” as I wanted it to be the equivalent to the Home page on your browser app. Thing is, the secretive fellas at YouTube don’t want to give away access to an endpoint for fetching the videos that would appear on a user’s Home page, probably in fear that it could give away their golden egg that is “the internal algorithm”. So, I’ll have to settle for the Home page being a list of the most popular videos.
As for the Related page state - which you might notice doesn’t exist in the final version - I thought it would be nice to get the list of related videos for the currently playing video. And it all seemed to work together nicely. Intuitively, I might say. Until I actually had to run it. Then I realized the relatedToVideoId parameter wasn’t actually valid, so I thought “oh maybe I messed up its name or something”, so I had to go straight to the source, where I found that a rather big change did indeed happen with the parameter, namely it got completely removed.
So, until last summer you could do this without problem, but now it’s too much, really YouTube?
And there was no alternative to it, probably again to protect their secret Krabby Patty formula, so I did as any good programmer does and pretended I never actually wanted that feature to exist, swept any memory of it under a rug, and repurposed that page to a creator page, so that if a user watches a video, they can navigate to the video creator’s page – quite the masterful gambit in how I handled that one dare I say.
Finally, the Search page state was, fortunately, without hiccups, as the crazy bastards actually let me use a core functionality of their platform.
Next, I moved on to their actual renders in the terminal. It was pretty much the same thing for all three.
First, convert the thumbnail into its ASCII representation, add the title, then the creator’s name. I chose not to display the view count, not out of spite, but because for some reason you can’t retrieve the statistics section when querying using the "search" endpoint, and instead are forced to make a query for each individual video, which is not something I’m keen on. What’s weird though is that you can do this for the "videos" endpoint.
But for the sake of consistency, I’m not doing it for that one either.
By also linking up the new inputs, you can now navigate to the next or previous video and also select the current video, which will trigger the switch to the video screen to load it.
And although I said these were pretty much identical, there are two main differences:
First, the Creator screen also displays the creator’s name at the top, which is just an extra string so no big deal.
And secondly, the Search page has a, you know, search bar.
This is done using a separate SearchBarStateManager, that stores the current query, and whether the search bar is active or not.
To set it as active, you have to press the S key while on the Search page, after which all inputs are directed to it, except for Escape and Enter, which set it as not active, in addition to querying for the videos in the case of Enter.
And now you’d think “and with that it’s done”, right?
IF YOU WEREN’T PAYING ATTENTION THAT IS.
Quickly, what have I missed or done only halfway?
Here, I’ll give you some quick hints.
First, remember me bragging about this entire thing being responsive? Well, this is what happens if I resize the terminal on a screen other than the video one – not too pretty now is it?
At least it’s a pretty simple fix – move that constant polling for the CLI dimensions to its own separate thread, and if changes happen, signal them using a callback.
Secondly, you know how I told you about integrating OAuth to customize the experience for the user, and then proceeded to not use any of that, as the YoutubeAPI doesn’t even offer a home page or recommendations anymore?
Well, so as to not have this entire feature be useless, I have it so now you can like and subscribe, just like you can do with this video you’re currently watching (wink).
Also, a pretty big thing, or rather tall, is the fact that the rendered ASCII images are stretched vertically, since characters in a font are generally taller than they are wide. So, to counteract this, I also took account of the aspect ratio of the characters when calculating the aspect ratio of the image, allowing the user to specify a custom value depending on their config.
Finally, I’ll extend the list of ASCII characters to be used when rendering the frames, to have a more accurate conversion.
But this should pretty much mark the finish line for this project – I mean just look at it.