Transcript for:
Overview of Operant Conditioning Principles

Hello, everybody, welcome back. In this Zoom video we are going to begin talking about operant conditioning. Previously we talked about classical conditioning; now we're going to switch gears and talk about operant conditioning. Operant conditioning is based on the fact that behavior becomes more or less likely depending on its consequences: if you reinforce a behavior, the behavior is more likely to occur, and if you punish a behavior, the behavior is less likely to occur. It is called operant conditioning because the focus is on observable behavior, on how organisms operate in their environment.

Please know that when it comes to operant conditioning versus classical conditioning, the learned responses are very different. In operant conditioning, the focus is on learning observable, overt behaviors; in classical conditioning, the focus is on responses that are more reflexive in nature and not directly observable. For example, the pride you learn to feel when you see the American flag cannot be directly observed: you feel it, but a person observing you cannot see you feeling your pride unless you express it in your behavior. If you learn to salivate to a bell, I as an observer cannot easily see you salivating. If you feel nauseous when you think about a hospital or clinic, I cannot observe you feeling nauseous. So once again, with classical conditioning most of what is learned is reflexive and internal; with operant conditioning, what is learned is observable behavior. The binoculars here are meant to convey that operant conditioning is focused on what can be objectively observed and recorded.

One thing that operant conditioning and classical conditioning studies have in common is that there is no focus on what is happening in the mind when learning occurs. In classical conditioning, learning for the most part occurs unconsciously: the organism, whether it's a dog or a human or a worm, doesn't even realize that it is learning, but nonetheless these responses are being associated with certain things in the environment without awareness, if you will. In operant conditioning, the focus is not so much on what happens in the mind when learning occurs; the focus is more on the environment and how it affects learning, and the subsequent extinction when the reinforcement is no longer given. I'll get into that in more detail in just a bit.

So there are consequences of behavior. Number one, a behavior can result in a neutral consequence, which neither increases nor decreases the probability that the response will occur. A behavior can also result in a reinforcement, and reinforcement increases or strengthens the response, making it more likely to recur in the future. For example, if your dog were to beg for food and you gave it some table scraps, you would be reinforcing the dog's behavior: you would be making it more likely that your dog will beg in the future. Psychologists like to call these reinforcements and not rewards, because calling something a reward implies some sort of subjective pleasure; the implication is that the organism is proud or happy to be receiving it. Because behavioral psychologists don't want to speculate about what happens inside the organism when learning occurs, they call it a reinforcement rather than a reward, which implies emotion. In addition to reinforcements, a behavior can also result in a punishment, and punishment weakens a response, making it less likely to recur. So if your dog were to beg and
you said "no," you would be punishing your dog, making its begging behavior less likely to occur in the future. One thing about punishment is that it has to be consistent: when your dog begs, you can't sometimes punish it with "no" and sometimes reinforce it with food. You have to be consistent for punishment to be effective and actually have an impact on decreasing the behavior. Also, for both reinforcement and punishment, the sooner a consequence follows a response, the greater its effect. For example, if I tell you, "If you study, I'll give you a hundred dollars when you're done," that is going to have a bigger impact on your behavior than if I said, "If you study, I'll give you a hundred dollars next year." In general, human beings are more motivated by immediate gratification. The same is true of punishment: a punishment delivered immediately has a greater impact on behavior than a punishment delayed by a year. Oh, look at this little doggie; I thought you were nice.

Okay, so those are the consequences of behavior: a neutral consequence; a reinforcement, which increases the probability of the behavior occurring in the future; and a punishment, which decreases that probability. A prototypical operant conditioning experiment is one using a Skinner box. You put a rat in the box for the first time, and the rat does random, rat-like behaviors: it scurries around, it explores this new environment, it sniffs, it gets up on its hind legs and puts its front paws on the walls, going around the perimeter of the box. Eventually the rat hits a lever, and when it does, a pellet of food comes out. The rat looks at the food, eats it, and then goes back to its random behaviors. What Skinner and other behavioral psychologists found is that after some amount of time has passed, the rat learns that pressing the bar results in food. When the rat finally learns that pressing the bar results in food, its behavior becomes less and less random and more and more purposeful: it hits the lever whenever it wants food.

Please notice how this exemplifies an operant conditioning experiment: everything is observable, and everything is recordable. You can observe the rat hitting the lever, and you can record the time: how long did it take the rat to hit the lever the first time, the second time, the third time, and what was the total amount of time it took the rat to "learn" that hitting the lever results in food? All these observable events can be recorded in terms of frequency and time, so you can make judgments about learning: how long it took, what the learning curve looked like, whether it was gradual or immediate, all based on observable events in the environment and objective recording of what you observe. Once again, this is called operant conditioning because we are focusing our attention on how an animal operates in its environment. Some principles of operant conditioning are very similar to principles of classical conditioning.
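Since the point here is that operant experiments reduce to logging observable events and their timing, that logging can be sketched as a toy simulation. To be clear, nothing below comes from the lecture: the learning rule, the numbers, and the name `simulate_rat` are all invented purely for illustration. It models a learner whose chance of pressing grows a little each time a press pays off, with every press logged by its timestamp.

```python
import random

def simulate_rat(steps=2000, seed=42):
    """Toy Skinner box: each reinforced press raises the chance of
    pressing again; every press is logged as an observable event."""
    rng = random.Random(seed)
    p_press = 0.02                 # chance of a random press per time step
    press_times = []
    for t in range(steps):
        if rng.random() < p_press:
            press_times.append(t)  # observable, recordable event
            p_press = min(0.5, p_press + 0.05)  # reinforcement strengthens it
    # gaps between presses shrink as learning proceeds: the learning curve
    gaps = [b - a for a, b in zip(press_times, press_times[1:])]
    return press_times, gaps

times, gaps = simulate_rat()
print("first gaps:", gaps[:3], "last gaps:", gaps[-3:])
```

The recorded timestamps are exactly the kind of objective data the lecture describes: from them you can read off the latency to the first press and whether the learning curve was gradual or sudden, without ever speculating about what is happening inside the rat.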
This might be on a different page in your notes; I think it might be out of order, and if it is, I'm very sorry, just find the correct page. Principles of operant conditioning: number one, like classical conditioning, a behavior learned through operant conditioning can be extinguished. Extinction is a procedure that causes a previously learned response or behavior to stop because the reinforcement is removed. Going back to our rat study: say the rat learned to hit the lever. Extinction would occur if a food pellet were no longer given as reinforcement after the rat hit the lever. Eventually the rat is going to "unlearn" hitting the lever: it will hit the lever less and less as time progresses, as it "learns" that the lever no longer results in food. When the rat no longer hits the lever, you can say the learned behavior was extinguished, because you took away the reinforcement. That's a parallel concept to classical conditioning, if you will.

In addition, like classical conditioning, spontaneous recovery can also occur. You might think the rat's behavior was extinguished, but then maybe a day later you put the rat in a similar cage, and all of a sudden the rat starts hitting the lever again. After a passage of time, in a similar environment, the extinguished behavior might spontaneously reappear; this is called spontaneous recovery. It might take a few sessions before the learned behavior is truly extinguished.

In addition, like classical conditioning, you can also have stimulus generalization in operant conditioning. Stimulus generalization is when a response occurs to stimuli that resemble the stimuli present during the original learning. Going back to the Skinner box example: a rat might learn to press a lever to get food, but the rat is not only learning to press that lever; it is learning to press anything that looks like that lever. Anything that resembles the lever might elicit the learned behavior. That would be stimulus generalization. But the most important principle to focus on here is extinction, because one of the things behavioral psychologists looked at was how long it took for extinction to occur after reinforcement was no longer given. So extinction is really important for you to understand.

All right, so a really interesting focus of behavioral psychologists studying operant conditioning is how different reinforcement schedules influence learning: how fast or slow learning occurred on a given schedule, and, in addition, how fast or slow extinction occurred on that schedule. There are two basic types of reinforcement schedules. Number one, there is a continuous reinforcement schedule, which means that a particular response is always reinforced: there is a one-to-one correspondence between a specific behavior and reinforcement. When a behavior is first being learned on a continuous reinforcement schedule, learning is pretty quick. If you are reinforced every time you do a certain behavior, you notice it really quickly, whether you're a rat or a human, and because you notice it, learning occurs very quickly. To bring this back to a human example we can relate to: say you go to a Coke machine, and every time you put a dollar in, you get a Coke. That is an example of continuous reinforcement. You're going to learn really quickly that one dollar equals Coke. You're going to make that one-to-one connection because
you're always getting reinforced every time you put a dollar in the machine. The same thing is true for a rat: when the rat presses the lever and gets a food pellet, it's going to learn pretty quickly that pressing the lever equals food. So learning is very quick. On the flip side, however, with continuous reinforcement, extinction also occurs very quickly. Say you put a dollar in the machine and you don't get a Coke. Are you going to notice? Absolutely you're going to notice. If you've always been reinforced with a Coca-Cola and then one time you're not, you're certainly going to notice. So you try again: no Coke again, and now you're annoyed. One more time, no Coke, and what do you do? You say, "Forget this machine, I'm gone; I'm not going to waste my money anymore." With continuous reinforcement, when the reinforcement is no longer administered, you notice it quickly, and that quickly results in the extinction of your behavior: you no longer engage in it because you're no longer getting reinforced. So continuous reinforcement results in quick learning and quick extinction. Going back to the rat example: if a rat is reinforced every time it presses the lever, and then all of a sudden no food comes out, the rat too is going to notice, and very quickly it will stop pressing the lever. In other words, the rat's learned behavior will be extinguished fairly quickly on a continuous reinforcement schedule.

In addition to a continuous reinforcement schedule, there is what's called an intermittent, or partial, reinforcement schedule, which occurs when a particular behavior or response is sometimes, but not always, reinforced. A good example of an intermittent or partial reinforcement schedule would be a slot machine: every time you put a quarter in, you do not always win something. Sometimes you win, sometimes you don't.

So let me ask you something: do you think learning is going to be quicker with a continuous reinforcement schedule or an intermittent reinforcement schedule? If you said continuous, you are exactly right. Learning is much quicker with a continuous reinforcement schedule because you're always getting reinforced. With an intermittent or partial reinforcement schedule, learning takes longer because not every behavior gets reinforced, so it takes you longer to realize, "Hey, if I do this, I get that." How about extinction? Once learning has occurred, is extinction faster with a continuous reinforcement schedule or an intermittent reinforcement schedule? The answer is that extinction is faster with a continuous reinforcement schedule. Once again, if you are reinforced continuously, you're going to notice when reinforcement is no longer administered, and you're more likely to stop engaging in the learned behavior. If you learned on an intermittent schedule, it's going to take you longer to realize that your behavior is no longer being reinforced, so you're going to keep engaging in it for a longer period of time, and extinction will take longer with an intermittent or partial reinforcement schedule.

Okay, so when it comes to intermittent reinforcement schedules, there are four different types we're going to talk about. Oh, but first, I forgot about Las Vegas. Las Vegas is fun, and these are my emotions
when I play a slot machine: anger and sadness. Class, I never recommend you go to Vegas, because if you go to Vegas, expect to lose. You drive to Vegas happy and excited to gamble and eat some great food, but maybe 95 percent of the time you end up losing, so the drive home is very different from the drive there. I call it the drive of shame: you're just lost in thoughts of what you could have done with the 500 dollars you lost. "I could have bought this, I could have gone there." So anyway, try to avoid the drive of shame; don't even gamble.

Okay, so there are four different types of intermittent reinforcement schedules that we're going to cover. Number one, there are fixed ratio schedules, which means that reinforcement occurs after a fixed number of behaviors, or a fixed number of trials. For example, say a rat is reinforced after pressing the lever five times: it presses the lever five times and gets reinforcement, presses it another five times and gets reinforcement, presses it another five times and gets reinforced. On a fixed ratio schedule, the rat learns fairly quickly that "if I press this lever five times, I get food," because it's predictable: it's a fixed ratio, and every five lever presses results in food. Because it's consistent, learning occurs fairly quickly. And once an animal has learned on a fixed ratio schedule, its responses are very rapid. Because reinforcement can be predicted after a fixed number of behaviors, the rat will hit the lever five times in quick succession: one-two-three-four-five, food; one-two-three-four-five, food. What you get is a very rapid rate of responding; that is the characteristic response pattern for a behavior learned on a fixed ratio schedule. So once again, learning is fairly quick, because reinforcement consistently follows a fixed number of behaviors, and once the behavior is learned, responses come rapidly, because reinforcement is predictable.

In addition, there is what's called a variable ratio schedule, where reinforcement occurs after an average number of trials, or an average number of behaviors. For example, on a variable ratio schedule averaging five behaviors, maybe the fourth behavior is reinforced, then the fifth, then the sixth, then the third, then the eighth: the number of behaviors required varies, averaging out to five. Because the organism is learning on a variable ratio, there is no fixed pattern, so it takes the animal longer to realize that its behavior results in reinforcement; learning takes a bit longer to be established. However, once learning does occur on a variable ratio schedule, because reinforcement is not predictable, the responses are slower and steadier. I'm not going to waste all my effort if I don't know when reinforcement is going to come, so I'll engage in slow, steady responding and just wait for my reinforcement that way.
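The two ratio schedules can be made concrete with a small sketch. This is purely illustrative and not from the lecture; the function names and the probability trick are my own. A fixed ratio 5 schedule reinforces exactly every fifth response, while a variable ratio 5 schedule reinforces each response with probability one in five, so reinforcement arrives after five responses on average, but never predictably.

```python
import random

def fixed_ratio(n):
    """FR-n: reinforce exactly every n-th response."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True    # food pellet delivered
        return False
    return respond

def variable_ratio(n, rng=random.Random(7)):
    """VR-n: reinforce each response with probability 1/n,
    so every n-th response on average, at unpredictable points."""
    def respond():
        return rng.random() < 1.0 / n
    return respond

fr, vr = fixed_ratio(5), variable_ratio(5)
fr_hits = [i for i in range(1, 101) if fr()]
vr_hits = [i for i in range(1, 101) if vr()]
print("FR-5 reinforced on responses:", fr_hits[:4])  # 5, 10, 15, 20
print("VR-5 reinforced on responses:", vr_hits[:4])  # irregular spacing
```

The contrast the lecture draws falls right out of this structure: under FR the animal can effectively count its way to food, which is why responding is rapid and extinction is quick, while under VR no single unreinforced response is informative, which is why responding is steady and extinction is slow.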
So that is a variable ratio schedule. A question for you, class: when it comes to a fixed ratio schedule versus a variable ratio schedule, is learning quicker with the fixed ratio or the variable ratio? If you said learning is quicker with the fixed ratio, you are right. That's because reinforcement is more predictable when the ratio is fixed, so learning occurs much more quickly. How about extinction: is extinction faster with a fixed ratio or with a variable ratio? If you said extinction is faster with the fixed ratio, you are correct. Because reinforcement comes after a fixed number of behaviors, when the behavior is no longer reinforced, the organism recognizes it more quickly, is more likely to stop engaging in the behavior, and the behavior is extinguished more quickly. With a variable ratio schedule, because behaviors are reinforced on a variable schedule, it takes longer for the organism to realize that reinforcement is no longer being administered, so the behavior persists longer; it is slower to be extinguished.

One thing about variable ratio schedules: Las Vegas slot machines pay out on a variable ratio schedule. Casinos will program their slot machines to pay out, say, an average of one time out of every five or six pulls of the lever. So a person playing a video poker machine or a slot machine might get reinforced with a win after one pull, then after ten pulls, then after six pulls, then after two pulls, then after ten pulls again, averaging some fixed number set by the casino. And here's the thing about variable ratio: because the payout is on a variable ratio schedule, it takes longer for a customer at a slot machine to realize the machine is no longer paying out, so they stay longer, and they lose more money. Those casinos have it down to a science; they know exactly the right variable ratio schedule to maximize their winnings.

All right, so that's the variable ratio schedule. In addition, there's a fixed interval schedule, which means that reinforcement occurs after a fixed amount of time. For example, a rat might have to wait five minutes before pressing the bar results in reinforcement. So let me ask you a question: if an organism like a rat is reinforced on a fixed interval schedule, do you think these animals develop a sense of when the reinforcement is coming? The answer is: you betcha, absolutely. For example, if you have a dog, the dog will realize that you come home every day at 5:30, so every day at about 5:15 your dog will go to the front door and just wait for you until you get home. Your dog develops a sense of timing: seeing you is the reinforcement, the dog knows you come home at about 5:30, and it's at the door at 5:15 every day; somehow it realizes what time it is. Likewise, if you feed your dog or cat on a fixed interval schedule, like every afternoon at 12:30, your dog or cat will learn that schedule. Here's a picture of my cat standing in front of the computer screen begging me for food at 12:30, because it's time for treats. My cat doesn't actually call me "handsome"; my cat calls me "hey, you." But my cat really does do this; I can set my watch to when my cat comes up to my
computer screen and annoys the heck out of me because it wants to be fed treats: it's 12:30. So do animals develop an uncannily accurate sense of time when they are reinforced on a fixed interval schedule? Absolutely. Once a behavior is learned on a fixed interval schedule, like every five minutes or every 24 hours, the behavioral pattern looks like a scallop. The organism realizes it has to wait, say, five minutes between reinforcements. So after the rat receives the food pellet, it realizes, "Now I have to wait another five minutes before I get reinforced," and there is a decrease in the behavior right after reinforcement, because the rat knows it has to wait. But as the five-minute mark approaches, you see an increase in the learned behavior, until reinforcement is administered again. So you get a decrease in the behavior after reinforcement, and an increase as the interval nears its end, until the next reinforcement comes: that's the scallop pattern of responding under a fixed interval reinforcement schedule.

To give you a human example: say your boyfriend emails you every day at 10:30 in the morning, every 24 hours. On Monday at 10:30 you get your reinforcement, that email, and you know you have to wait another 24 hours for the next one, so you stop checking your email; your email-checking behavior decreases right after you've received your reinforcement. But as the interval nears its end and the 24-hour mark approaches, you start checking your email more and more and more, until you get that next email from your boyfriend 24 hours later. Then that behavior repeats every day, in the scallop pattern, if you will.

Lastly, there is a variable interval schedule, which means that reinforcement occurs after an average amount of time has passed. For example, the organism might have to wait an average of 10 minutes: for one reinforcement it has to wait eight minutes, for the next 12 minutes, then three minutes, then 23 minutes, then 10 minutes, so that across all the intervals it waits an average of 10 minutes. With a variable interval schedule, reinforcement is not really predictable; you can't predict when it's going to happen. Going back to our human example: say your boyfriend emails you once a day at a totally random time. You just don't know when: it could be eight o'clock in the morning, noon, five p.m., or midnight. Because the reinforcement is not predictable on a variable interval schedule, what you get, once the behavior is learned, is a very slow, steady rate of responding. If your boyfriend emails you at a random time every day, you might check your email at three in the morning, then at eight, then at one in the afternoon, then at three, then at seven at night, then at ten. There will be a slow, steady rate of responding for a behavior learned on a variable interval schedule, because reinforcement is totally unpredictable. So those are the four reinforcement schedules.
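The two interval schedules can be sketched the same way as the ratio ones. Again, this is a toy illustration with invented names and numbers, not anything from the lecture: under a fixed interval schedule, the first response after a fixed wait since the last reinforcement pays off; under a variable interval schedule, the required wait is redrawn at random each time around some average.

```python
import random

def fixed_interval(wait):
    """FI: the first response at least `wait` time units after the
    previous reinforcement is reinforced."""
    next_ready = wait
    def respond(t):
        nonlocal next_ready
        if t >= next_ready:
            next_ready = t + wait
            return True
        return False
    return respond

def variable_interval(avg_wait, rng=random.Random(3)):
    """VI: like FI, but each wait is drawn uniformly between 0 and
    2 * avg_wait, so it averages avg_wait but is unpredictable."""
    next_ready = rng.uniform(0, 2 * avg_wait)
    def respond(t):
        nonlocal next_ready
        if t >= next_ready:
            next_ready = t + rng.uniform(0, 2 * avg_wait)
            return True
        return False
    return respond

fi, vi = fixed_interval(5), variable_interval(5)
fi_times = [t for t in range(30) if fi(t)]  # responding once per time unit
vi_times = [t for t in range(60) if vi(t)]
print("FI-5 reinforced at:", fi_times)      # every 5 time units
print("VI-5 reinforced at:", vi_times)      # irregular spacing
```

Note that "responding once per time unit" is itself a simplification: a real animal on FI shows the scallop pattern, pausing after each reinforcement and accelerating as the interval runs out, which this sketch does not try to model.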
A quick question for you: in terms of fixed interval versus variable interval, do you think learning is quicker on a fixed interval schedule or on a variable interval schedule? If you said the fixed interval schedule, you are exactly right. Because a fixed amount of time must pass between reinforcements, the organism very quickly figures out that timing, since it's fixed, so learning occurs fairly quickly. With a variable interval schedule, the timing is always different, so it takes longer for the organism to realize that reinforcement is occurring as a result of its behavior, because the reinforcement is administered somewhat randomly, if you will, due to the variable interval between reinforcements. So learning is quicker with the fixed interval. How about extinction: is extinction quicker with a fixed interval or a variable interval? What do you think? If you said extinction is quicker with a fixed interval schedule, you are exactly right. Once again, with a fixed interval schedule, reinforcement is predictable because it occurs after a fixed amount of time, so once the reinforcement is taken away, the organism realizes it, and the behavior is more quickly extinguished. If a behavior was learned on a variable interval schedule, then when the reinforcement is taken away, it takes longer for the organism to realize that reinforcement is no longer being administered, so the learned behavior lasts longer over time; in other words, extinction will be slower with a variable interval schedule. So, for a response to persist, it should be reinforced intermittently: extinction takes longer if a behavior is reinforced intermittently.
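That "partial reinforcement makes extinction slower" idea can be mimicked with a toy model. Everything here is invented for demonstration; in particular, the give-up rule (quit once the dry spell exceeds the longest gap seen in training) is just one simple way to model "noticing" that reinforcement has stopped, not an established result.

```python
import random

def responses_until_extinction(p_reinforced, trials=500, seed=1):
    """Train with a given reinforcement probability, then cut off all
    reinforcement and estimate how long responding persists."""
    rng = random.Random(seed)
    longest_gap, gap = 1, 0
    for _ in range(trials):                  # training phase
        if rng.random() < p_reinforced:      # this response was reinforced
            longest_gap = max(longest_gap, gap + 1)
            gap = 0
        else:                                # this response went unrewarded
            gap += 1
    # extinction phase: the learner keeps responding until the dry spell
    # is longer than any gap it experienced during training
    return longest_gap + 1

continuous = responses_until_extinction(1.0)  # reinforced every time
partial = responses_until_extinction(0.2)     # reinforced about 1 in 5 times
print("gives up after", continuous, "unreinforced responses (continuous)")
print("gives up after", partial, "unreinforced responses (partial)")
```

The continuous learner has never seen a miss, so even a couple of unreinforced responses are enough to stop it, which matches the steep extinction curve; the partial learner has sat through long droughts before, so it keeps responding far longer, matching the gradual curve.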
What this graph shows is the extinction trials once a behavior was learned via operant conditioning and the reinforcement was no longer given. If a behavior was learned on a continuous reinforcement schedule and you take away the reinforcement, the organism very quickly notices that the reinforcement is no longer given, and the learned behavior is extinguished fairly quickly: you get a very steep extinction curve for a behavior learned on a continuous reinforcement schedule. However, if a behavior was learned on a partial reinforcement schedule, it takes longer for the organism to realize that the reinforcement is no longer given, so the learned behavior persists longer across time, and extinction is much more gradual on a partial reinforcement schedule. That's basically what this curve is showing: for a response to persist, it should be reinforced intermittently. All right, class, that's it for this Zoom video, and I will see you in the next one.