* peaceful morning sounds * * alarm ringing *
Oh man, what a great day to be watching some youtube videos. Just let me get my laptop and fire up Chrome - HE-E-E-E-EL--
Oh man, almost forgot to plug in my extra 2TB of RAM... Silly me, how could I expect my browser
to function without the minimum requirements? Ok now let’s see what my favorite
youtubers prepared for me today – oh look, a new MrBeast video, I just have to see those
5 and 75 year olds duke it out for 100 grand... All of us have probably experienced the
horrors beyond human comprehension that come with a classic Youtube
session. So, what do you say we take this nigh-infinite potential of the platform, get rid of all the fluff, and use it the way it's always been intended – running the entire thing from your terminal: no fancy useless UI, no fancy useless features, and even no fancy useless graphics. That's right, we're going the OG programmer
way and ditching all that progress made in the last decades, running this entire thing
exactly how its creators envisioned it. What I started with was kind of the central
point of the app, meaning playing videos. In this first iteration, everything
was done on a single thread, the program simply allowing the user to specify a video and how it should be played. And in case the words "first iteration" or "single thread" didn't tip you off, this was of course reworked a healthy number of times throughout the development of the
project, but the central ideas remained. The entry point is a series of command line arguments: a URL for a video; a scaling factor, meaning at what percentage of the original size the video should be displayed, relative to the CLI dimensions; and finally, a max width argument, since by default the video would fill the entire available width of the terminal.
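In code, the entry point is nothing fancy - something along these lines, though the exact flag names are my own guess rather than whatever is in the repo:

```python
import argparse

# Minimal sketch of the CLI entry point; argument names are illustrative.
parser = argparse.ArgumentParser(description="Play a YouTube video as ASCII in your terminal")
parser.add_argument("url", help="URL of the video to play")
parser.add_argument("--scale", type=float, default=100.0,
                    help="percentage of the terminal width the video should occupy")
parser.add_argument("--max-width", type=int, default=None,
                    help="hard cap on the rendered width, in characters")
args = parser.parse_args()
```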
First, using the yt_dlp python library, I get
the URL for the video stream to load frames from, which is then used to read each frame in sequence. For each frame a callback is called, which takes that frame, converts it into an ASCII representation, which is then printed to the screen, refreshing the console each time.
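Sketched out, and assuming yt_dlp exposes the direct stream URL on the info dict when a single format is selected, the playback loop looks roughly like this, with OpenCV doing the actual frame reading:

```python
import cv2
import yt_dlp

def get_stream_info(url):
    # Ask yt_dlp for a single progressive format and its direct URL - no download.
    with yt_dlp.YoutubeDL({"format": "best", "quiet": True}) as ydl:
        info = ydl.extract_info(url, download=False)
    return info["url"], info.get("fps")

def play(url, frame_callback):
    stream_url, fps = get_stream_info(url)
    cap = cv2.VideoCapture(stream_url)   # OpenCV reads straight from the HTTP stream
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame_callback(frame)            # convert to ASCII and print, refreshing the console
    cap.release()
```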
The bread and butter of this whole thing is, of course, the frame-to-ASCII converter. It works by taking the original frame, converting it into
a grayscale image, calculating the appropriate dimensions - starting with the width, based on
the scaling factor and the max width allowed, and the height, by dividing the new width
by the original frame’s ratio – and then resizing the grayscale image to its new
dimensions. As for the actual conversion, it traverses the entire image by pixel groups, meaning that, based on the scaling factor, I can represent multiple pixels with a single character. And to get that character I need to calculate the mean value of the pixel group and divide it by 255, as that's the max intensity a pixel can reach: 0 meaning black and 255 – white. Then I take that normalized value and multiply it by the length of a string of characters, to get the index of the character matching the desired intensity. That string is simply a sequence of characters, ordered from high to low based on the contrast they provide.
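As a rough sketch of that converter - the names are mine, and I'm letting OpenCV's area-based resize do the pixel-group averaging instead of looping over the groups by hand:

```python
import cv2

ASCII_CHARS = "@%#*+=-:. "   # ordered roughly from dense (dark pixels) to sparse (bright pixels)

def frame_to_ascii(frame, scale, max_width, term_width):
    # 1. Grayscale the frame.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # 2. Target dimensions: width from the scaling factor and the max width cap,
    #    height from the original frame's aspect ratio.
    new_width = min(int(term_width * scale / 100), max_width or term_width)
    aspect = gray.shape[1] / gray.shape[0]
    new_height = max(1, int(new_width / aspect))

    # 3. INTER_AREA averages each pixel group while shrinking - one cell per character.
    small = cv2.resize(gray, (new_width, new_height), interpolation=cv2.INTER_AREA)

    # 4. Map each averaged value to a character by its normalized intensity
    #    (len - 1 keeps the brightest pixels from running off the end of the string).
    lines = []
    for row in small:
        lines.append("".join(ASCII_CHARS[int(p / 255 * (len(ASCII_CHARS) - 1))] for p in row))
    return "\n".join(lines)
```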
And by putting these together, I can simply specify a URL for my video of choice and voila – it works perfectly... or does it? Instead of directly stating it, how about we run the app a bit and see if you can spot the iffy parts. Ok, so the video is playing - that's
good. But what if I were to grab this side of the terminal window and move it around?
Oh – oh, well that's not too pretty now, is it? But it's a simple fix really - mostly because there isn't a more advanced mechanism available that's also guaranteed to work on all CLIs out of the box - I simply have to poll for the current dimensions of the CLI on each frame render, updating the internal state used when processing the frame accordingly.
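The polling itself is about as simple as it sounds - a quick check at the top of every render, with a hypothetical state object holding the cached dimensions:

```python
import os

def poll_terminal_size(state):
    # Poll the CLI dimensions on every frame render and refresh the internal state.
    cols, rows = os.get_terminal_size()
    if (cols, rows) != (state.term_cols, state.term_rows):
        state.term_cols, state.term_rows = cols, rows
    return cols, rows
```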
Well, now that certainly works, and normally you'd expect that once you solve a problem, it stays solved, right? Or does it? * hey, Vsauce, Michael here * Just for testing purposes, what
if I were to again grab that side of the terminal and start resizing it?
Well, it's certainly scaling properly, that's for sure, but it's also doing something extra. You see, the FPS is now dependent on the frame size: the larger the window, the lower the FPS, and vice-versa. And if you take a moment to think about it, it makes perfect sense - of course a larger frame takes a lot longer to process than a smaller one,
and below a certain size, frames end up being processed faster than they would be displayed in a normal video player, as the video stream has no idea what a framerate is - the moment you ask for a frame, it provides it, no further questions asked. But this is also fixable: in addition to the stream URL, I also need to get the associated target framerate, based on which I calculate the minimum interval a frame should take to render, and if processing takes less than that, I just wait for the cycle to complete. As for the case when it's slower than the target framerate, well... what can I say except skill issue?
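The pacing logic boils down to something like this - a sketch, with read_frame and render standing in for the real stream and display calls:

```python
import time

def render_loop(read_frame, render, target_fps):
    min_interval = 1.0 / target_fps           # minimum time one frame is allowed to take
    while True:
        start = time.perf_counter()
        frame = read_frame()
        if frame is None:
            break
        render(frame)
        elapsed = time.perf_counter() - start
        if elapsed < min_interval:            # finished early: wait out the rest of the cycle
            time.sleep(min_interval - elapsed)
        # slower than the target framerate? nothing to do here (skill issue)
```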
Now for the more critical issues. Currently, you can simply stream a video to your terminal, but Youtube offers a tiiiny bit
more functionality than that, mainly due to the fact that it
accepts inputs from the user. Well, currently the entire
program runs on its main thread, so unless someone wants to send a key input
whenever they want to render the next frame, it's high time I moved this
rendering part to a separate thread. This simply means I have to move the
rendering loop from the RenderingManager to the VideoStreamHandlerThread’s run
method, which, upon starting the thread, begins parsing the stream of frames, calling a
provided callback function for each. Additionally, I'll add some extra utilities and control variables to allow the thread to be aware of the rendered video's state, especially whether it's playing or not.
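The thread itself might look something like this - the class name is from the project, but the internals here are my own guess, using an Event as the play/pause control variable:

```python
import threading

class VideoStreamHandlerThread(threading.Thread):
    def __init__(self, capture, frame_callback):
        super().__init__(daemon=True)
        self.capture = capture                 # e.g. a cv2.VideoCapture on the stream URL
        self.frame_callback = frame_callback
        self._playing = threading.Event()      # control variable for the playback state
        self._playing.set()

    def pause(self):
        self._playing.clear()

    def unpause(self):
        self._playing.set()

    def run(self):
        # The rendering loop that used to live in the RenderingManager.
        while True:
            self._playing.wait()               # block here while paused
            ok, frame = self.capture.read()
            if not ok:
                break
            self.frame_callback(frame)
```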
On the CliManager side of things, I can now add an infinite loop constantly awaiting input from the user.
And to have that blocking method of waiting for new input, I wanted to make use of a well-known function - especially well-known to you C programmers out there: getch().
This waits until an input character is provided and then captures it.
It’s a fundamental building block, so I of course expected it to come prepackaged
with the language in a simple and unified manner. Now imagine my surprise when I had to come up with this beauty to get it running properly, as depending on the OS, the ways in which it works are quite different. Also, because it turned out to be too good at its job - so much so that it even captured Ctrl-C or Ctrl-Z - I had to wrap the implementation to handle those special cases.
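For reference, a cross-platform getch along these lines - msvcrt on Windows, raw-mode termios everywhere else - with the Ctrl-C / Ctrl-Z wrapping bolted on (exactly how those are handled is my assumption):

```python
import sys

def getch():
    """Block until a single character is typed, without waiting for Enter."""
    try:                                    # Windows
        import msvcrt
        ch = msvcrt.getch().decode(errors="ignore")
    except ImportError:                     # Unix-likes: drop the terminal into raw mode
        import termios, tty
        fd = sys.stdin.fileno()
        old = termios.tcgetattr(fd)
        try:
            tty.setraw(fd)
            ch = sys.stdin.read(1)
        finally:
            termios.tcsetattr(fd, termios.TCSADRAIN, old)
    # Raw mode swallows Ctrl-C / Ctrl-Z, so turn them back into something sensible.
    if ch == "\x03":
        raise KeyboardInterrupt
    if ch == "\x1a":
        raise EOFError
    return ch
```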
But by hooking up this handler for the user input with the methods to control the video state, I can pause and unpause
the player with the press of a button. Finally, to make this feel more like a Youtube page and not just a random video player, I'll add a few improvements to the DisplayManager, where, by using some extra metadata from the video, it can display a status bar, with an indicator of whether the video is playing or not, a playback bar showing the percentage elapsed – which, I forgot to mention, I get by keeping count of the index of the currently displayed frame and dividing it by the total frames in the video – and finally, the creator's username, along with a nicely formatted view count.
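The status bar is just string formatting at the end of the day - a rough sketch, with made-up field names:

```python
def format_status_bar(playing, frame_index, total_frames, creator, views, width):
    # Percentage elapsed = index of the currently displayed frame / total frames.
    pct = frame_index / total_frames if total_frames else 0.0
    bar_len = max(10, width // 2)
    filled = int(pct * bar_len)
    bar = "#" * filled + "-" * (bar_len - filled)
    icon = "|>" if playing else "||"
    return f"{icon} [{bar}] {pct:4.0%}  {creator}  {views:,} views"
```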
Now, I could definitely stop here and call it a day, buut Youtube is more than a place where you load a video, watch
it, then shut down the whole thing. So, in order to prepare for this great expansion,
a great deal of refactoring was long overdue. Sparing you the boring details, I moved
around functions to where they belonged, and ended up using quite a few callbacks, to
keep everything as loosely coupled as possible. Worth noting is the DisplayManager: instead of an entry function that directly outputs to the terminal, the class now manages a series of screens, each with its own rendering method, all going through a unified method for writing the output, so that whenever a render is requested the DisplayManager simply calls the render function for the currently active screen.
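Structurally, that boils down to something like this - a sketch, not the actual classes:

```python
class Screen:
    def render(self, width, height):
        raise NotImplementedError       # each screen brings its own rendering method

class DisplayManager:
    def __init__(self):
        self.screens = {}               # name -> Screen
        self.active = None

    def register(self, name, screen):
        self.screens[name] = screen

    def switch_to(self, name):
        self.active = self.screens[name]

    def render(self, width, height):
        # Unified output path: delegate to whichever screen is active and
        # write the whole frame in one go (ANSI "cursor home" instead of clearing).
        output = self.active.render(width, height)
        print("\033[H" + output, end="", flush=True)
```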
Continuing with the DisplayManager fixation, I implemented a menu navigation bar, and made sure it properly
realigned when resizing the window, so I can prove that anyone can keep up with the extremely reasonable responsiveness standards of today. Also, I made sure the video gets paused
automatically when changing to a different screen. But before going into the implementation
of those screens I mentioned before, I realized I needed a way of actually getting the data. I pondered whether I should approach it the same way I did in the Using Youtube As A Cloud Storage Service video, but considering that using an automated WebDriver like Selenium is the same as having my app be just a wrapper over a browser instance, I moved on to greener pastures and chose the obviously superior
alternative that is the official Youtube API provided by the ever trustworthy Google.
You’ll see why I’m singing them such praises in just a bit, when I get to what I had to do
when implementing the screens, just you wait... I also thought it would be neat if, in addition to the base API utilities you get when using your API key, users could also log in using their Google accounts, for that extra dash of customization potential. Fortunately, implementing these was actually pretty easy – first I have to go to the Google Cloud Console, create
my project and – oh wait what’s this, the project for the Compiler Running on Paper
I made a few videos ago, what a coincidence, but anyway – then enable the Youtube Data API,
along with setting the required scopes for the OAuth consent screen, and finally
download the credentials config file. Then, after going back to my trusty
IDE, I can create an AuthManager, which checks for any local configs and, if present, loads them; otherwise it prompts the user to log in by beginning a simple OAuth flow, where, on success, it creates that config file to be used in the future.
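With Google's client libraries, the whole AuthManager flow fits in a handful of lines - this is a sketch, and the scope and file names are assumptions:

```python
import os
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/youtube"]   # assumed scope
TOKEN_FILE = "token.json"

def get_youtube_client():
    creds = None
    if os.path.exists(TOKEN_FILE):                      # a local config already exists
        creds = Credentials.from_authorized_user_file(TOKEN_FILE, SCOPES)
    if creds is None or not creds.valid:
        # No (valid) local config: run the OAuth flow in the browser once.
        flow = InstalledAppFlow.from_client_secrets_file("client_secret.json", SCOPES)
        creds = flow.run_local_server(port=0)
        with open(TOKEN_FILE, "w") as f:
            f.write(creds.to_json())                    # persist for future runs
    return build("youtube", "v3", credentials=creds)
```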
All of these were really easy to do, mostly because it's a really simple flow, and Google really excels at optimizing the 80%. Also, if I were to make this an actual product,
I’d probably load that credentials config file from a remote server instead of locally,
because of, you know, insane safety concerns. But with a working OAuth flow, I
can work on the screens themselves, making use of the newly
unlocked Youtube API skill tree. A YoutubeConnectionManager handles a series of
internal states, corresponding to the list of videos associated with each screen. This service
exposes APIs for initializing and navigating through the list of available videos, while
making sure new results are fetched in advance. For the states themselves, I have it so
they all extend a YoutubeStateBase class, as most of the logic is common, except
for the YoutubeAPI endpoints used, along with their specific parameters. First, I'll add some handlers for navigating back and forth through the list of videos, as well as a util function that fetches new results either when the state is initialized or when the current video index reaches a certain percentage of the number of available videos, triggering a new fetch in the background on a custom thread. This util function basically takes the method that is set when extending the base class and calls it with the values of those parameters.
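A sketch of what that base class might look like - the class name matches the video, but the threshold value and the internals are my own guesses:

```python
import threading

class YoutubeStateBase:
    FETCH_THRESHOLD = 0.8            # prefetch once ~80% of the list has been navigated (assumed value)

    def __init__(self):
        self.videos = []
        self.index = 0
        self._fetching = False

    def fetch_page(self):
        # Each subclass points this at a specific Youtube API endpoint and its parameters.
        raise NotImplementedError

    def initialize(self):
        self.videos = self.fetch_page()

    def _prefetch_if_needed(self):
        if self._fetching or not self.videos:
            return
        if self.index >= len(self.videos) * self.FETCH_THRESHOLD:
            self._fetching = True
            def worker():
                self.videos.extend(self.fetch_page())   # fetch in the background
                self._fetching = False
            threading.Thread(target=worker, daemon=True).start()

    def next_video(self):
        self.index = min(self.index + 1, len(self.videos) - 1)
        self._prefetch_if_needed()
        return self.videos[self.index]

    def previous_video(self):
        self.index = max(self.index - 1, 0)
        return self.videos[self.index]
```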
Now, for the individual implementations, I'll start with the Home page. You see, I named it the "Home page" as
I wanted it to be the equivalent of the Home page in your browser app. Thing is, the
secretive fellas at Youtube don’t want to give away access to an endpoint for fetching the
videos that would appear on a user's Home page, probably for fear that it could give away their
golden egg that is “the internal algorithm”. So, I’ll have to settle for the Home page
being a list of the most popular videos.
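The Home page state then just points its fetch at the videos endpoint with the mostPopular chart - roughly:

```python
def fetch_most_popular(youtube, max_results=25):
    # The closest thing the public API offers to a "Home page": the most popular videos.
    response = youtube.videos().list(
        part="snippet,statistics",
        chart="mostPopular",
        maxResults=max_results,
    ).execute()
    return response["items"]
```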
As for the Related page state - which you might notice doesn't exist in the final version - I thought it would be nice to get the
list of related videos for the currently playing video. And it all seemed to work together nicely.
Intuitive, I might say. Until I actually had to run it and realized the relatedToVideoId parameter wasn't actually valid. I thought "oh, maybe I messed up its name or something", so I went straight to the source, where I found that a rather big change had indeed happened with the parameter, namely it got completely removed. So, until last summer you could do this without a problem, but now it's too much, really Youtube? And there was no alternative to it, probably again
to protect their secret Krabby Patty formula, so I did as any good programmer does and pretended
I never actually wanted that feature to exist, swept any memory of it under a rug, and
repurposed that page into a creator page, so that if a user watches a video, they can navigate to the video creator's page – quite the masterful gambit in how I handled that one, dare I say. Finally, the Search page state
was, fortunately, without hiccups, as the crazy bastards actually let me use
a core functionality of their platform.
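The Search state's fetch is just as short - and note that search.list only returns snippets, which is exactly the view-count limitation I'll get to in a moment:

```python
def search_videos(youtube, query, max_results=25):
    # The "search" endpoint only supports the snippet part - no statistics here.
    response = youtube.search().list(
        part="snippet",
        q=query,
        type="video",
        maxResults=max_results,
    ).execute()
    return response["items"]
```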
Next, I moved on to their actual renders in the terminal. It was pretty much the same thing for all three. First, convert the thumbnail into its
ASCII representation, add the title, then the creator's name. I chose not to
display the view count, not out of spite, but because for some reason you can’t retrieve
the statistics section when querying using the "search" endpoint, and instead are forced to make
a query for each individual video, which is not something I’m keen on. What’s weird though is
that you can do this for the "videos" endpoint. But for the sake of consistency, I’m
not doing it for that one either. By also linking up the new inputs, you can now
navigate to the next or previous video and also select the current video, which will trigger
the switch to the video screen to load it. And although I said these were pretty much
identical, there are two main differences: first, the Creator screen also displays the creator's name at the top, which is just an extra string, so no big deal. And secondly, the Search page
has a, you know, search bar. This is done using a separate
SearchBarStateManager that stores the current query and whether the search bar is active or not. To set it as active, you have to press the S key while on the Search page, after which all inputs are directed to it, except for Escape and Enter, which set it as inactive - with Enter also triggering the query for the videos.
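A sketch of that manager - the key codes and the backspace handling are my assumptions:

```python
class SearchBarStateManager:
    def __init__(self, on_submit):
        self.query = ""
        self.active = False
        self.on_submit = on_submit          # callback that actually runs the search

    def handle_key(self, key):
        if key == "\r":                     # Enter: deactivate and fire the query
            self.active = False
            self.on_submit(self.query)
        elif key == "\x1b":                 # Escape: just deactivate
            self.active = False
        elif key == "\x7f":                 # Backspace (my addition)
            self.query = self.query[:-1]
        else:
            self.query += key               # everything else goes into the query
```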
And now you'd think "and with that it's done", right? IF YOU WEREN'T PAYING ATTENTION, THAT IS. Quickly, what have I missed or done only halfway? Here, I'll give you some quick hints. First, remember me bragging about this
entire thing being responsive? Well, this is what happens if I resize the terminal on a screen other
than the video one – not too pretty now, is it? At least it's a pretty simple fix – move that constant polling for the CLI dimensions to its own separate thread, and if changes happen, signal them using a callback.
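The watcher thread is only a few lines - a sketch, with the poll interval picked arbitrarily:

```python
import os
import threading
import time

def watch_terminal_size(on_resize, poll_interval=0.2):
    # Dedicated thread that polls the CLI dimensions and signals changes via a callback.
    def loop():
        last = os.get_terminal_size()
        while True:
            size = os.get_terminal_size()
            if size != last:
                last = size
                on_resize(size.columns, size.lines)
            time.sleep(poll_interval)
    threading.Thread(target=loop, daemon=True).start()
```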
Secondly, you know how I told you about integrating OAuth to customize the experience for the user, and then
proceeded to not use any of that, as the YoutubeAPI doesn’t even offer a
home page or recommendations anymore? Well, so as not to have this entire feature be useless, I have it so you can now like and subscribe, just like you can with this video you're currently watching (wink).
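Both of those map directly onto Youtube Data API calls - roughly like so, given the authorized client from earlier:

```python
def like_video(youtube, video_id):
    # Rate the currently playing video on the logged-in account.
    youtube.videos().rate(id=video_id, rating="like").execute()

def subscribe_to_channel(youtube, channel_id):
    youtube.subscriptions().insert(
        part="snippet",
        body={"snippet": {"resourceId": {"kind": "youtube#channel",
                                         "channelId": channel_id}}},
    ).execute()
```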
Also, a pretty big thing, or rather a tall one, is the fact that the rendered ASCII images are stretched vertically, since characters in a font are generally taller than they are wide. So, to counteract this, I also take the characters' aspect ratio into account when calculating the aspect ratio of the image, allowing the user to specify a custom value depending on their config.
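Folded into the dimension calculation, that looks roughly like this - the 0.5 default is just a typical guess for how wide terminal characters are relative to their height:

```python
def target_dimensions(orig_w, orig_h, new_width, char_aspect=0.5):
    # Terminal cells are roughly twice as tall as they are wide, so squash the
    # height by the (user-configurable) character aspect ratio to compensate.
    image_aspect = orig_w / orig_h
    new_height = max(1, int(new_width / image_aspect * char_aspect))
    return new_width, new_height
```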
Finally, I'll extend the list of ASCII characters used when rendering the frames, to get a more accurate conversion. But this should pretty much mark the finish
line for this project – I mean just look at it.