You may have seen the site rottentomatoes.com, which scores movies and TV shows on the basis of how positive or negative their reviews are. Given the amount of content being created nowadays, they certainly don't do this manually: they don't have people reading through reviews and deciding what percentage is positive and what percentage is negative. What they are performing is a form of natural language processing called sentiment analysis.

Sentiment analysis is very useful to businesses. Now that people do a lot of talking online, businesses want to know what people think about the company. Do they like us or do they hate us? Do they like the new product? Are they saying good things or bad things about our new promotion? Being able to go automatically through a large amount of text and figure out whether people are saying positive or negative things is key, and it was in fact one of the early triumphs of natural language processing.

In this lecture we're going to define sentiment analysis, look at how it can be used in a decision support system, and end with semantic analysis, which goes beyond the emotional tone of a text and tries to get at something of its meaning.

The basic task of sentiment analysis is to take documents and determine whether they are saying negative things (expressing dissatisfaction), neutral things, or positive things, so that you can quantify the overall tone. With an automatic sentiment analyzer, you can take regular documents or correspondence, social media postings, or anything else people write, and be able
to run that text through the sentiment analyzer and conclude, for example, that the commentary is 71 percent positive, 21 percent neutral, and 8 percent negative. We want this to be as accurate as possible, and making sentiment analysis more accurate is an active area of research.

So how might we do this? A naive approach is a dictionary-based one. We make a lexicon (another word for dictionary) with a list of negative words, such as "bad," "terrible," and "boring," and a list of positive words, such as "good," "wonderful," and "exciting." Then we loop through every word in the document, adding one every time we find a positive word and subtracting one every time we find a negative word, and see which kind predominates. That is a beginner's way of doing it. We're going to look at a slightly more sophisticated approach, but any purely word-based approach runs into several problems.

First, how do we tell when somebody is being subjective or objective? How do we recognize conditional sentences? Take "I would have liked a big screen with a responsive action." A big screen and a responsive action are good things, but the speaker is saying they would have liked them, which means the product lacked them. How do we recognize sarcasm? How do we balance general lexicons with domain-specific ones? A word like "exciting" might signal praise in one domain and not in another. How do we resolve coreferences, that is, work out what pronouns refer to? And how do we disambiguate polysemes, words that can mean several different things? With this many problems, a simple lexicon-based approach is unlikely to be very successful.

I want to show you one particular way that sentiment analysis can be useful, and this
is a system that was proposed to analyze the voice of the customer, the so-called VOC. In fact it covers three voices: the voice of the customer, the voice of the competitor, and the voice of the competitors' customers. The system takes online postings and maps them onto themes, that is, what people are talking about (we'll address how they did that in the next video), and then determines the sentiment of the comments. A clothing manufacturer, for instance, would find it valuable to learn that people very much like the fit of a dress but really dislike the colors on offer. With a combination of classification and sentiment analysis, online postings can yield highly informative input for product development.

On the basis of this sentiment analysis, the authors built an interface, a decision support system. Let me explain what's going on in it. Each row is a particular product category, and the focal firm is Costco: we are Costco, and this is our system. The firms being compared are Costco, Kohl's, Walmart, Kmart, and Home Depot, and the circles show who is winning in each category; for example, H stands for Home Depot and W stands for Walmart. On clothing, the display shows Costco's positivity score alongside the average positivity, which tells us where we stand: relative to the competitors, we are ranked very high. On electronics, our score is below the mean, and we sit more toward the middle of the pack. On watches, our score is also down below the mean. So the system maps the sentiment in text documents onto a competitive analysis, where you actually break it
down by competitors and by the products the competitors sell. That is one application of sentiment analysis.

Beyond determining the emotional tone of a text, you might want to know what it means, and getting the meaning out is much more difficult. One possibility is a categorical approach: create a massive dictionary in which words are grouped into categories, for example university words such as "college," "campus," "class," and "professor," or anger words such as "rage," "furious," "outrage," "ire," and "choler." Then you go through the document and count how many words from each category it contains. That lets you say, for example, how many of your documents are actually talking about universities.

One tool that does this is LIWC, a low-cost semantic analyzer. The name stands for Linguistic Inquiry and Word Count, and it was created by a psychology professor. You bring in all of your documents, for example in an Excel file, and indicate which categories you are looking for. These range from punctuation (how many exclamation points are there?) to pronouns and adverbs, words about seeing, hearing, and feeling, and words about health and eating; it also scores texts on how analytical and how authentic they are. Further categories cover positive emotion, negative emotion, family, friends, and so on; I think there are just over a hundred categories in all. When you run it, it gives you the proportion of words in each category. This can help you, for example, find financial documents, since there is a whole category of money words.

Another approach to semantic analysis is to categorize every
single word of a language (in WordNet's case, English) within one big hierarchy. This endeavor, WordNet, was started at Princeton University in the 1980s, and it tries to classify every word in terms of its relations to other words. For example, you start at "entity" and move down through "physical entity," "physical object," "living thing," "organism," and "plant" to "ornamental plant." From "ornamental plant," you can figure out how far it is from another kind of plant: you go up one level and then down another. In this way you can understand the meaning of a word in terms of where it sits in the tree.

With a hierarchy like this you can also find related words, which is one way this WordNet tool is used to generate synonyms: go up one layer and then down the other branches. Take "dog," for example. If you go up one level, to its hypernym, WordNet tells you "canine"; if you go down a level from "dog," it gives you specific kinds of dogs. So this is another way of going about semantic analysis.

To sum up: sentiment analysis is about finding emotional tone, and semantic analysis is about finding meaning. You can find meaning either by counting words in flat categories or by using WordNet to work out where the words in a text fall on its massive tree.
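The dictionary-based approach described earlier (add one for each positive word, subtract one for each negative word) can be sketched in a few lines of Python. This is a minimal illustration, not how Rotten Tomatoes or any production system works: the two word lists are hypothetical toy lexicons built from the lecture's example words, and the function names `score`, `label`, and `breakdown` are made up for this sketch. It also turns per-document labels into the kind of positive/neutral/negative percentage breakdown mentioned in the lecture.

```python
import re
from collections import Counter

# Hypothetical toy lexicons built from the lecture's example words;
# real sentiment lexicons contain thousands of entries.
POSITIVE = {"good", "wonderful", "exciting"}
NEGATIVE = {"bad", "terrible", "boring"}


def score(text: str) -> int:
    """Add 1 for every positive word and subtract 1 for every negative word."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


def label(text: str) -> str:
    """Map the net score to a positive / neutral / negative label."""
    s = score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


def breakdown(docs: list[str]) -> dict[str, float]:
    """Percentage of documents that fall into each sentiment class."""
    counts = Counter(label(d) for d in docs)
    classes = ("positive", "neutral", "negative")
    return {c: 100 * counts[c] / len(docs) for c in classes}


reviews = [
    "A wonderful, exciting film.",
    "Boring plot and terrible acting.",
    "It runs about two hours.",
    "Good cast, bad ending, but wonderful music.",
]
print(breakdown(reviews))  # {'positive': 50.0, 'neutral': 25.0, 'negative': 25.0}

# A conditional complaint still comes out positive, one of the
# weaknesses of a purely word-based approach.
print(label("I would have liked a good screen."))  # positive
```

The last line shows the conditional-sentence problem in miniature: the word "good" is counted even though the speaker is complaining that the product lacked it.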
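The competitive view in the voice-of-the-customer system (each row a product category, the focal firm's positivity compared with the average and with the category leader) boils down to a small aggregation once per-firm sentiment scores exist. The sketch below assumes those scores have already been computed; the firm names A through E are placeholders and the numbers are invented for illustration, not taken from the system described in the lecture. The `compare` function is a hypothetical helper, not part of that system.

```python
from statistics import mean

# Hypothetical input: average sentiment (0-100) of online posts about each
# firm in each product category. Firms A-E are placeholders and the
# numbers are invented for illustration only.
scores = {
    "clothing":    {"A": 80, "B": 50, "C": 40, "D": 30, "E": 50},
    "electronics": {"A": 20, "B": 60, "C": 50, "D": 10, "E": 40},
}


def compare(scores: dict[str, dict[str, float]], focal: str) -> dict[str, dict]:
    """For each category, place the focal firm against the mean and the leader."""
    report = {}
    for category, by_firm in scores.items():
        avg = mean(by_firm.values())
        # Highest sentiment first; sorting is stable, so ties keep input order.
        ranking = sorted(by_firm, key=by_firm.get, reverse=True)
        report[category] = {
            "focal_score": by_firm[focal],
            "mean": avg,
            "above_mean": by_firm[focal] > avg,
            "rank": ranking.index(focal) + 1,
            "leader": ranking[0],
        }
    return report


for category, row in compare(scores, focal="A").items():
    print(category, row)
```

With these made-up numbers, firm A leads on clothing but sits fourth of five and below the mean on electronics, which is the kind of contrast the dashboard is meant to surface.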
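The categorical approach that LIWC takes (count how many words fall into each category, then report each count as a proportion of all words) can be sketched as follows. This is not LIWC's actual dictionary or scoring: the mini-dictionary is hypothetical, the university and anger words come from the lecture, and the money words are illustrative assumptions. The helper names `category_proportions` and `documents_about` are made up for this sketch.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary. LIWC's own dictionary is far larger
# (just over a hundred categories); the "money" words are illustrative.
CATEGORIES = {
    "university": {"college", "campus", "class", "professor"},
    "anger": {"rage", "furious", "outrage"},
    "money": {"money", "cash", "price", "cost"},
}


def category_proportions(text: str) -> dict[str, float]:
    """Share of all words in the text that belong to each category."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for w in words:
        for cat, vocab in CATEGORIES.items():
            if w in vocab:  # a word may count toward several categories
                counts[cat] += 1
    total = len(words) or 1  # avoid dividing by zero on empty text
    return {cat: counts[cat] / total for cat in CATEGORIES}


def documents_about(docs: list[str], category: str) -> list[str]:
    """Keep documents containing at least one word from the category."""
    return [d for d in docs if category_proportions(d)[category] > 0]


text = "The professor was furious that the class cost so much money."
print(category_proportions(text))
print(documents_about(["Tuition cost rose.", "We hiked all day."], "money"))
```

In the example sentence, 2 of its 11 words are university words, 1 is an anger word, and 2 are money words; `documents_about` is the "find financial documents" use case in its simplest form.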
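The WordNet operations described in the lecture (go up to a hypernym, go down to more specific words, measure how far apart two words are by going up and then down the tree) can be illustrated on a small hand-built hierarchy. Real WordNet is available in Python through NLTK, whose synsets expose `hypernyms()` and `hyponyms()` methods, but using it requires downloading the corpus, so this self-contained sketch substitutes a hypothetical toy tree loosely following the chain from the lecture. It is a simplification: each node here has exactly one parent, and the nodes are plain strings rather than WordNet synsets.

```python
# Hypothetical toy hierarchy (child -> parent), loosely following the
# chain in the lecture. A simplification, not real WordNet data.
PARENT = {
    "physical entity": "entity",
    "physical object": "physical entity",
    "living thing": "physical object",
    "organism": "living thing",
    "plant": "organism",
    "ornamental plant": "plant",
    "houseplant": "plant",
    "animal": "organism",
    "canine": "animal",
    "dog": "canine",
    "wolf": "canine",
    "poodle": "dog",
    "beagle": "dog",
}


def ancestors(node: str) -> list[str]:
    """The node followed by its hypernyms, up to the root."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def hyponyms(node: str) -> list[str]:
    """Direct children: one level down."""
    return [child for child, parent in PARENT.items() if parent == node]


def co_hyponyms(node: str) -> list[str]:
    """Related words: up one level, then down to the other branches."""
    return [w for w in hyponyms(PARENT[node]) if w != node]


def path_distance(a: str, b: str) -> int:
    """Number of edges between a and b via their lowest common ancestor."""
    up_a, up_b = ancestors(a), ancestors(b)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            return steps_a + up_b.index(node)
    raise ValueError(f"{a!r} and {b!r} share no ancestor")


print(ancestors("dog")[1])                              # canine
print(hyponyms("dog"))                                  # ['poodle', 'beagle']
print(co_hyponyms("ornamental plant"))                  # ['houseplant']
print(path_distance("ornamental plant", "houseplant"))  # 2
```

The distance of 2 between "ornamental plant" and "houseplant" is exactly the "up one level, then down another" move from the lecture; words on distant branches, such as "dog" and "ornamental plant," come out farther apart.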