Transcript for:
Understanding HTTP and HTTPS Protocols

So the second part of this second lecture is on HTTP, the Hypertext Transfer Protocol: the protocol that really makes the web run. It is the standard web protocol used to transfer hypertext, and as we discussed in the first lecture, hypertext is essentially text with links, with additional information. It doesn't only transfer hypertext but hypermedia in general, so it can also transfer pictures, for example.

An important feature of HTTP is that it's stateless: if you send more than one request, the server does not know of a previous request. This is kind of nice to know for now, and when we later get to the server-side topics you might understand in more detail why this is useful. In general, this is what makes routing possible, for example: different requests can take different routes, and this does not cause any problems.

So that's one part of it. The other important thing about HTTP is that everything, absolutely everything, is sent and received in clear text. This includes passwords. So when you want to encrypt something, when you want to hide your passwords for example, you should always use HTTP going over a secured connection, and what you typically use is HTTPS. In fact, nowadays most companies, most websites, force you to use HTTPS, because HTTP of course has this flaw that everything is in the clear. If we go to Firefox, you could for example enter http://google.com. This should give you an unencrypted connection, but what you'll see is that this actually changes to https. So Google, as one of these companies, automatically redirects you to a secure connection, and that's also why you see this lock. So be careful whenever you have a connection where the lock is not there. In the practice session I will most likely show you a live example of what this looks like, and how you can look at the details of the connection to see that everything is in clear text. This can be problematic. For example,
imagine you are in a cafe and you're using the public Wi-Fi. If you use an open connection, an HTTP connection, everyone who is in the same network, on the same Wi-Fi, could just read your password. That's quite an implication, so it's important to know, and that's really the difference we're covering here: HTTP is the regular transfer protocol, and HTTPS is more or less the same thing, just encrypted.

Now, the way this works: when I request a website, for example I want to go to Wikipedia and get the website back, what I do is send a request, as we already discussed in the first part. It looks something like this: GET /wiki/URL HTTP/1.1, then Host: wikipedia.org. And I get something back. So this is an HTTP request, this is an HTTP response, and we'll look into the details of what those different things mean.

The typical case of an HTTP request is the one I've shown you so far: you go to your browser and you open a website. That's by far the most typical request, but in fact there are a lot of other ways. You can do the same with, for example, the Postman software, which we use in this course. And when you use any kind of app on your smartphone and it downloads some information, weather information for example, you're obviously not opening a browser and opening a website; you're using some kind of program, and what that program does in the background is actually an HTTP request. So it doesn't always have to be Firefox or Chrome. HTTP requests are literally everywhere on the internet; most applications that somehow interact with the web make these requests, so it's important to understand what HTTP does.

The requests look roughly like that, and I believe we can also look at that in here. In Firefox you can go to Tools, Web Developer, Network, and Chrome, for completeness, has something very similar.
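
As a rough sketch of the request text described above (the GET line followed by the Host header), the snippet below assembles such a request as a plain string. The function name build_get_request is made up for illustration; nothing is sent over the network here.

```python
def build_get_request(host: str, path: str) -> str:
    """Assemble the text of a minimal HTTP/1.1 GET request (illustrative helper)."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, target, protocol version
        f"Host: {host}",         # Host is the one header HTTP/1.1 requires
        "",                      # an empty line ends the header section
        "",
    ]
    # HTTP separates lines with carriage return + line feed.
    return "\r\n".join(lines)


request_text = build_get_request("wikipedia.org", "/wiki/URL")
print(request_text)
```

Everything in this string is readable as-is, which is the clear-text property discussed earlier; a browser or a tool such as Postman produces the same kind of text, usually with many more headers.
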
In Chrome I just have to find it, because I never do when I want to. History, no; here, under View, Developer, you can go to Developer Tools, and you also get the network information, for example. So it's the same thing in Chrome.

In Firefox you get this thing, very cryptic, but now when I enter google.com and start a request, you see all sorts of things going on here. What's actually happening is that every line here is a single request. So you see that when I open Google, it's not just one HTTP request, it's actually 18 of them. To keep it simple, we'll just look at one for now, and if I click on one of them you get the details.

There are a number of different things that are relevant here. One thing you see is GET; that's the so-called HTTP method. HTTP knows a number of methods for different purposes. We'll get to them later, and we'll get to them heavily in the server-side lectures. Then you have the target. Of course you want to go to Google, so you have to give an address. These are URLs; they can have very different formats and we'll look at them later, but essentially they're addresses, so you tell the browser, or you tell HTTP, what it is that you would like to have. Then you say which protocol version you're using. What you see most often is HTTP 1.1; sometimes you see HTTP/2, but it's actually not that common. So that's just saying: please get me the google.com resource, the website, using HTTP 1.1.

Then you have a lot of so-called headers. That's additional information, quite detailed technical stuff if you look at it, and we'll get into the details there. It tells you what kind of additional things are relevant here. Depending on what kind of request you make, you might also have a request body. Again, the easiest example: if I'm on Google and I actually want to search for something, web development, and I press search or press enter, of course my request has to know that I'm searching for web development,
and that's in the body. So it's additional data that you need to provide, because it's not enough to say Google; you also have to say please give me web development.

The headers are meta information, as discussed. For example, the first one here says what kind of response to accept. Here we say we want to have HTML or text back, or other things; if the server instead sends me a video back, then something is wrong, please don't accept that. So that's the kind of thing that's in there. Typical things in the headers are, as I just said, Accept: what kind of response do I expect, what is coming back. Your browser also typically sends its version automatically. For example, now that I use Mozilla, it says this request was sent using Mozilla 5.0 on a Macintosh with this operating system version, blah blah blah. So you actually send Google quite a lot of information on who you are. Then there might be cookies. I'll discuss those later; you've probably heard of them, because nowadays on all the websites you see these popups saying this website uses cookies.

Other things that are typically sent in the headers are, for example, authorization information. If you send your user and password when you log into Gmail, most likely it is in the Authorization header. The details are not relevant right now; those are just typical things you might want to send. Note that these things are not encrypted. In this case I'm saying use Basic authentication. There are different versions you can use, but Basic just means send the user and password, and then this here is my user and password. This is actually an encoding; it's not encrypted. So even though it looks fairly encoded, fairly encrypted, it's actually not. It's just converted; you can convert it back and you get the clear text. I'll show that in the lecture.

Okay, if everything works, you'll get a response back from the server.
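
To illustrate the point that Basic authentication is an encoding and not encryption, here is a minimal sketch using Python's standard base64 module. The user name "alice", the password "secret" and both function names are made-up placeholders, not values from the lecture.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an Authorization header for Basic authentication."""
    raw = f"{user}:{password}".encode("utf-8")
    # Base64 only converts bytes to printable characters; there is no secret key.
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth(header_value: str) -> tuple:
    """Recover (user, password) from a Basic Authorization header value."""
    _scheme, token = header_value.split(" ", 1)
    user, password = base64.b64decode(token).decode("utf-8").split(":", 1)
    return user, password


header = basic_auth_header("alice", "secret")  # hypothetical credentials
print(header)
print(decode_basic_auth(header))
```

Anyone who can read the header can run the decoding step, which is why such a header only belongs on an HTTPS connection.
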
The response from the server might look something like this: HTTP/1.1 200 OK. So again, this is our protocol version, the same one we just had. Then we have a so-called status code; we'll dive into those. And then we have some human-readable text that says everything is okay in this case. As before, we have headers. Some of them are the same, for example the Date header maybe, and some of them are typical for responses. And the response has a body: because we requested some kind of resource, the response should actually return the resource. If the request somehow doesn't work, then we should get some kind of error back, some error message or error page and so on. But what you get back here in the body is really the HTML, the source code of your website.

Again, this is something you can see. This is our request, and here you see the response headers, so this is what the server sent back to us. Here are the request headers, and if you go to Response you see the actual response, which is now... I don't know why it's not here. We can reload it, but typically you would see it. It's probably the wrong request, because there are so many. If we go to just google.com it's probably easier. So if we go to google.com, you see this is the response we're getting: it's just the actual HTML code.

Okay. As we already discussed, the response headers might be slightly different from the request headers. A typical one is Age. This is an estimate of how much time has passed since this response was generated, so it gives you some information on how long it took to get from Google to your computer, from Wikipedia to your computer. There is also related information: the Expires header tells you when this information expires. For example, if I get the response after Thursday the 1st of December '94, then I should throw it away; it's not valid anymore. That might not be that important for websites, but if you send a request to get the current weather, for example, and you get the current weather two days later,
it's not relevant anymore, so then you might want to have an expiry date. Let's ignore that one for now, but if you want to change a resource you might only allow certain things.

And then you get the content type. That's an important one: it tells you what kind of resource I gave back to you. For example, if the content type says text/html, then you know that, okay, this is an HTML document, something I can parse with a browser and display. If it instead says something like JPEG, then you know it's a picture, and so on. And if it's text, then you might want to know which character set it is, or similar things. So basically this helps you choose the right application to display the response. That's also why, for example, if I enter a URL with a PDF, your browser automatically knows that, okay, I should use Acrobat Reader or some internal PDF reader to display this. It's the content type that gives you this information. And finally, another important one relates to cookies: the response might tell your computer to save a cookie. Again, more on that later.

Now, I've dropped URL before as a keyword, and this is what we go into in detail now. Whenever you enter a website, a web address, in your browser, what you want to do is identify some kind of address that you're browsing to, and this is done using a URL. A URL is any kind of reference to a resource, for example an HTML document, an image, a video, a JavaScript file, whatnot, and they typically look something like that. So that's what you all know. This URL has several parts. The first one is the protocol. In most of the examples I've skipped this, and that's because most browsers, if you just enter a regular website, add HTTP automatically. But usually, for a proper request, you have to enter the protocol. Then the second part is this en.
wikipedia.org: that's the host name. That's what we already discussed in the first part; it's the part that gets mapped to an IP address, so it identifies the machine, the computer, where our resource is located. And then the last part is the path. That's basically the folder on that computer where the resource is located. For instance, if I just enter en.wikipedia.org, I get to the main page, and that corresponds to the path slash. So it's just the root directory, the base directory where everything starts. But typically you have an additional path. So that's what most URLs look like.

The general case is slightly different. In general you have something that's called a Uniform Resource Identifier. I won't discuss the details of that, but it's very similar to a URL, and there is a lot of different information you can add. We have already seen the protocol. What you have not seen is this one here: sometimes you have something like gisha:password. So directly in the URL you can supply a user and a password that are used for authentication. If you ever see this, one name, colon, another thing, then an at sign, that basically means authentication.

Colon 80 is the so-called port. More details later, but it's basically the endpoint: on the same host, on the same machine, you have different targets, different slots. To take my post example again: you could send one letter to rovic University and another letter to rovic University, but to two different departments. It makes a difference whether you send a letter to the School of Computer Science or the School of Business. That's similar to ports: same address, but different endpoints.

Then we have seen the path, and then we have some more stuff here. One of them is called the query; that's typically where parameters are provided.
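
The parts named so far (protocol, user and password, host name, port, path, query) can be pulled out of a URL with Python's standard urllib.parse module. This is only a sketch: the URL below, including the user "user", the password "secret" and the host example.org, is invented for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL that contains every part discussed above.
url = "http://user:secret@example.org:80/wiki/URL?foo=1"

parts = urlsplit(url)
query_params = parse_qs(parts.query)  # query string as a dict of lists

print("protocol:", parts.scheme)
print("user:    ", parts.username)
print("password:", parts.password)
print("host:    ", parts.hostname)
print("port:    ", parts.port)
print("path:    ", parts.path)
print("query:   ", query_params)
```

Note that the password sits in the URL as readable text, so this form of authentication has the same clear-text caveat as plain HTTP.
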
In the query, if you need to tell the target, Wikipedia for instance, that foo equals one, you can put that into the URI. And finally we have the so-called fragment, so the hash symbol and section one in this case. That's used to identify a part within the page you get back. One example: if we go to Wikipedia and I directly enter /wiki/URL, that should bring me to the Wikipedia page on URL. If I click on any of these, on Notes for example, you'll see up here that this #notes was added. That basically tells the browser: you get URL back, you get this page back, please go to the part Notes. So it's basically a section heading. The difference, you see: if I just open URL, it brings me to the top of the page; if I open URL#notes, it jumps directly to the right section. So that's what the fragment is used for.

So this is the general form. Many of these things you don't typically see, and since we are discussing HTTP here we always have HTTP at the front, but it's perfectly fine to have other protocols. For example, if you use a MongoDB database, you might need some kind of string that looks like this, that says: okay, please log into my database that is on the server db host, on port 27 something, with a path like this, and then additional parameters for authentication, please use SCRAM-SHA-1. That's a typical situation. You will see this in different cases, so it's not only HTTP; you can have very different protocols.

Okay, now we jump into HTTP methods, but only briefly. These get much more important when we get to the server. But it's important to know that HTTP knows nine different kinds of requests, so nine different methods. The most common ones, the ones we'll use in this course, are GET, POST, PUT, PATCH and DELETE. GET is, as the name suggests, getting: you want to request a resource. POST: you want to send something. For example, if you want to post in an online guest book you might do that; if
you want to send an email through a contact form, you probably do a POST. So these methods are typically used when you have a form and you want to submit data. PUT, PATCH and DELETE you probably haven't seen yet, because these are almost exclusively used in programs, so in your web browser you'll never see them. The standard cases are GET and POST: when you want to open a website you do a GET; when you send information somehow, through a form for example, you do a POST. But as you see, in the browser you don't specify this in any way. You only see the details if you go to your network view again; there you see that everything here is a GET, I'm requesting some kind of resource. As I said, this will become much more important later on.

These methods have different properties that are important. Again, that's something we'll discuss more later, but for instance, a GET request has to be safe. That means that when you run the GET request, it does not cause any side effect on the server; it does not change anything. So if I, for example, request the Wikipedia page on URL, it sends me back the page, but the server will have exactly the same state afterwards; nothing has changed. That's a very important property. Then there is something that's called idempotence. It means that whether you run the request once, or 20, or 100 times doesn't change the result. Without going into details, you can imagine: if you delete something, then it's gone; if you delete it 20 times, it's still gone, it's all the same as before. So the DELETE and PUT methods have this property, but again, that's more important later on.

And cacheable: I mentioned caching in the network part, but sometimes it's important to say that if, for example, I know that in my company everyone uses Google all the time, maybe I can somehow save the response, so that I don't have to ask Google every single time but can reuse the response. So GET
and POST requests can be cacheable, so basically you can save them. Again, many more details later. The important thing here is that later on, when we choose methods for different purposes, we have to keep in mind what they should do.

Now the final piece in our puzzle is the response codes. We have seen in the response header that we get this cryptic code that says 200 OK, and these are the HTTP response codes. Whenever you get a response over HTTP, you get this three-digit integer number that tells you how the request went, what happened basically. There are different classes of these, so the first digit always tells you in general what happened. Earlier we had 200, and if it's anything with a two at the front, it's a success: whatever we requested, we actually got back, it worked. If we get a one-something response, it's some kind of information. If it's a three response, we have been redirected: we requested something from Wikipedia, but maybe they have redirected us to a different host. Four is a client error: you have sent something that is wrong. You have, for example, requested a page that does not exist, or you have provided the wrong authorization, so you're not allowed to read that URL. And if you get something with five, it means something is broken on the server; there was some kind of program error or similar.

Now the important thing is that your client, your browser for example, needs to understand the first digit, because that's what tells you in general what has happened. The other two digits are not that important, so if you write an application you can actually invent your own codes. But there are typical ones, and we'll just look at those. For example, if we look at 200 OK, that's what we got earlier. This is the HTTP Statuses website.
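
The rule that the first digit decides the class of a response can be sketched in a few lines. The function name status_class and the class labels are illustrative choices; the standard phrases come from Python's http.HTTPStatus.

```python
from http import HTTPStatus

# Class of a response, keyed by the first digit of the status code.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Return the general class of a status code, using only its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


for code in (200, 301, 404, 503):
    # HTTPStatus(code).phrase is the standard human-readable text for the code.
    print(code, HTTPStatus(code).phrase, "->", status_class(code))

# A made-up code such as 499 has no standard phrase,
# but a client can still classify it by its first digit.
print(499, "->", status_class(499))
```
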
On the HTTP Statuses site you get information on all the status codes. 200 OK tells you that the request has succeeded, and then there are lots of details on what you should get back and so on. 201 Created, that's something we'll use heavily later on; it basically means you have created a resource, it was successful, and now there is a new resource. 400 means there is a bad request, whatever that means: you have somehow not requested something properly. That's sometimes used when you don't have more details. 401 is the typical thing when you're not authorized. For example, I'm trying to read someone else's emails and I'm not authorized to do that, so the server doesn't give me the emails back; the server gives me 401 Unauthorized. 403 Forbidden is maybe similar, but it somehow means I'm not allowed to access that resource for different reasons; maybe I'm at the wrong place and you're only allowed to access this from within the company. 404 you have all seen; that means Not Found, you have given a URL that does not exist. I don't know, for example, if I enter URL something else, Wikipedia will probably tell me 404, and in this case, instead of sending me the resource I wanted, they sent me some kind of error page: Wikipedia does not have an article. So that's the 404.

Other ones you have typically seen are 500, the Internal Server Error; that's again a very generic thing, something on the server didn't work. And 503 Service Unavailable: that's often the case when, for example, the machine has shut down or the server has crashed; then you'll get a 503. So just this number gives you some kind of information, but as we said, you can actually make up other numbers.

Okay, now for the final part of HTTP we'll look into cookies. HTTP, as we have discussed, is stateless: the server does not know of previous requests. But of course you all know from experience that on the World Wide Web we do have state. If you log into Gmail and you click on something, it still knows that you
are logged in; it doesn't ask you to log in again. If you visit a website repeatedly, you might be logged in automatically. And, this one is of course a bit more problematic, if you go to a website that you have visited before, you might get advertisements depending on what else you have done. You know all of these cases, and of course that means there is some kind of state: the World Wide Web knows certain things you have done before. That's essentially because of something called cookies.

Now, cookies were quite unknown until this started to come up, when all the websites, I think two or three years ago, started giving you these popups saying we use cookies to make the site simpler, find out more, agree, disagree, and so on. That's new legislation in the EU, and in other countries as well. But essentially cookies are text: simple texts that are saved in your browser, that are set by the server, and that the client returns.

Let's look at that. If we go to Google, for example, as simple as that, and I go, I always have to find it, to my network view and then to Cookies, you'll see that there is actually something happening. There are response and request cookies, so there are lots of things here that I have not set myself; someone has set these cookies. If you go to Storage you can get the details on that. You see that Google has actually stored several cookies. For example, they have stored something called consent; that's most likely whether or not I have clicked agree on this little window that asks me whether I want cookies. There is often a whole lot of different stuff; you often find advertising things here, for example.

Now, cookies, as I said, are basically just text, and the way this works is: I request a website, I send the request to google.com as we have already discussed, and I get the website back. So far so good. What also happens is that the server sends something back in the header; it sends
this Set-Cookie field. For example, it might say Set-Cookie: uid=5. What this means is that the server tells my client, my Firefox for example, to store a text that says uid is 5, and then this text just sits here in my browser. But the next time I send a request, to google.com for example, the browser sends all the cookies that come from the same host, from google.com, back. So everything that is related to google.com is sent back to Google. This way, if I make a second request to Google, Google gets the information that the user ID is 5, and then it knows, for example: aha, okay, it's the same user that earlier googled cats, now the person is googling dogs, so maybe the person is generally interested in pets or whatever. So this is really all it is. It's just text, but of course you can use it for a lot of different things; you can, for example, use it to specify advertisement preferences and so on. This is something I'll show a lot more of in the practice session, so this is just to give you an idea.

But there are some other things to know. First of all, cookies are on your computer, and that means you can change them. That also means this has a security impact. For example, I just gave the example of a user ID. I'm not sure which one of these is a user ID, maybe none of them, but let's say that this one, for example, is something that identifies me as a user. Here's the value, this is the text, and I can just change it; I can put in here whatever I want. If Google hasn't programmed their servers properly and I run this again, it could actually be a security risk. That's something we'll discuss in the security lecture, but technically I could, for example, change my user ID. So as a server-side programmer, on the server, you should never assume that people are not changing their cookies. The other thing is that you can delete them; I can delete all my cookies.
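
The cookie round trip described above can be imitated without any network traffic. The sketch below uses Python's standard http.cookies module for the server's Set-Cookie header and a plain dictionary as a stand-in for the browser's cookie storage; the three function names and the dictionary are illustrative assumptions, not how a real browser is implemented.

```python
from http.cookies import SimpleCookie


def make_set_cookie_header(name: str, value: str) -> str:
    """Server side: build the Set-Cookie header line for one cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    return cookie.output(header="Set-Cookie:")


def store_cookie(jar: dict, set_cookie_header: str) -> None:
    """Client side: remember the cookie text the server asked us to store."""
    _name, _sep, cookie_text = set_cookie_header.partition(":")
    parsed = SimpleCookie()
    parsed.load(cookie_text.strip())
    for name, morsel in parsed.items():
        jar[name] = morsel.value


def make_cookie_header(jar: dict) -> str:
    """Client side: send every stored cookie back with the next request."""
    return "Cookie: " + "; ".join(f"{name}={value}" for name, value in jar.items())


browser_jar = {}  # stand-in for the browser's cookie storage
store_cookie(browser_jar, make_set_cookie_header("uid", "5"))
print(make_cookie_header(browser_jar))

# The cookie lives on the client, so the user can edit it freely.
browser_jar["uid"] = "6"
print(make_cookie_header(browser_jar))
```

The last two lines are the reason a server should treat incoming cookie values as untrusted input.
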
Because cookies can be deleted, if you write an application you should somehow make sure that the application can be used even if cookies are not working. That's another key ingredient here. And last but not least, cookies have this kind of negative touch, because you hear about them, you hear about security problems, you get all these messages nowadays. But they're not actually evil in themselves; it's just text, and without them you would not have any state on the web, so it wouldn't be possible to remember login information and so on. The reason they have such a bad reputation is that they're regularly misused, or they're the cause of security problems, and so on. But in themselves they are important and they are necessary.

Good. So this concludes lecture two, the network and HTTP lecture. What we discussed is that when you open a website, you essentially request a resource, and you do that using the HTTP protocol. Now, HTTP is a World Wide Web protocol. It uses the internet, and that means it uses the TCP/IP stack, which is used to identify where to send your request, the destination, and to route it, basically to make sure it takes the right way. Whenever you make this request you specify the method, so what you want to do with the resource; where the resource is, the URL; and meta information, headers, for example cookies as we discussed, or authorization information. Then, if you have done everything correctly, the server will respond with the right resource, for example the website, and some headers, extra information. And as the last part we discussed that HTTP itself is stateless, but using cookies you can basically remember what the client has done before, and therefore you get the stateful web: you get state in your web application.

Okay, so that's it for this lecture. The next lecture will start with HTML, so it's important to get some tools installed. VS Code is what we'll be using in this course; if you're familiar with anything else, you can use any
programming environment you like. You should have a browser installed. We will be using Firefox and Chrome, so those are the best to test your applications with as well. And you won't need it for the first assignments, but it's good to install Postman right away so that you can play around with HTTP requests yourself. So those are the things you should be doing. The next lecture, the next two lectures actually, are on HTML, which is all about structure: what kind of information is in your website. This will be the start of a series of very, very technical lectures, so it's a lot about coding and the different elements of HTML. Okay, so that's it for part two.