Module 3 - Lecture - Natural Language Processing 1: Definition and Tasks

One of the more salient and remarkable developments in artificial intelligence is the ability of machines to listen to what we're saying and understand what we're telling them. This is not just a question of speech-to-text, which is a signal processing problem that has been worked on for a long time. What's remarkable is that the machine can now parse what we're saying and understand our meaning, and in some cases our emotional tone. The discipline within artificial intelligence that researches these things is called natural language processing. This is my own personal area of interest and where I do much of my research, so I hope you will enjoy this section.

In this first lecture, I'm going to define what we mean by natural language processing and then talk about the tasks that fall under this general heading. I've divided them into what I call the main tasks, the subtasks, and finally the micro tasks, and these get more and more specific.

The largest, most general task within natural language processing is natural language understanding. Amazon's Alexa can understand what we're saying and, with varying degrees of accuracy, understand our meaning. Similarly, when we speak into Google Cloud or even our cell phones, the system can put together the text of what we're saying, not just by matching the words but by understanding what we mean: it can hear a homophone and know which sense of the word you're looking for.

Another main task is natural language translation. Moving from one language to another is something that has required a great deal of human expertise, and recent developments have enabled machines to do it for us. This opens up a lot of possibilities if you think about just the global
implications for, let's say, an area somewhere in Micronesia whose language isn't widely spoken elsewhere: translation opens up a whole world of possibilities for them in terms of trade.

The third major task is natural language generation: going from an intent to natural language. There are two areas where you see this a lot in industry. One is chatbots. The chatbot does the natural language understanding: it understands what you're asking for, figures out what you need, and finds that information. Then it has to put the answer into natural language, which is natural language generation, and that is what you see when you interact with a chatbot. Many companies, everything from auto dealers to banks, now use chatbots. The other area where you see natural language generation a lot is generating text for e-commerce sites. There are hundreds of thousands of product descriptions that have to be written. You can have a person write them, or you can have an AI go through the product specs and put together a natural language description of each product for you.

So, in general, natural language processing is the field of AI specializing in the interaction between computer programs and human language. The key distinction here is between computer languages and natural languages. Computer languages were developed as a way for humans to adapt themselves to what the computer needs. Their syntax is tightly prescribed because there's only so much the computer can understand, and if you don't get the syntax right, it doesn't understand what you mean. That's an instance where humans have to learn what the computer can do and adapt their meaning to the computer's language. In contrast, with natural language we communicate in whatever language we're comfortable with, using our own idioms, and this has always been a major challenge for computers
to handle, because there's specialized vocabulary, there are spelling mistakes, we all have our own idiomatic way of putting sentences together, and word order is often variable. So this is a much harder problem, and natural language processing as a research endeavor is quite rich and complicated.

Those are the general tasks: understanding, translation, and generation. There's another tier below them, which I would call subtasks. These are not as ambitious or as abstract, but they are important functions and problems that had to be solved before the larger tasks could be achieved.

One subtask is summarizing long documents. Federal regulatory documents are a key example: they can be hundreds or even thousands of pages long. Is it possible for an AI to give a decent summary of them?

Another subtask is question answering, which underpins the conversational abilities of a chatbot. Can you have a program that understands what you're asking for, finds the answer, and then gives it to you? The answer doesn't necessarily require natural language generation; sometimes you just ask the computer something like "How many units did we sell last year?" and it gives you a number. That's the question answering task.

We're going to say a little more about the next three in this lecture and the next two. There's the task called information extraction, where you begin with text and find information within it. There's also sentiment analysis and semantic analysis: sentiment analysis tries to understand the emotional polarity of the text, and semantic analysis tries to get some notion of its meaning. Finally, in the last video, we'll talk about automatic categorization of documents. Documents are coming in, and
you need to put them into some kind of bucket, perhaps for someone downstream to process. So text classification is another major subtask.

Let's take a look at what we mean by information extraction. This has been researched for a long time, starting in library science, information science, and computer science. The idea is that an organization typically has a lot of text lying around in one form or another, and it's unstructured, which means it's in natural language. Some estimates put this as high as 80 percent: 80 percent of an organization's strategic data is in a format that is not currently amenable to analysis, and it requires a human to find it and read through it. These are things like contracts, repair logs, correspondence, speeches, call center logs, patient health records, patents, job postings, and SEC filings. The task of information extraction is to go through this material and get it into some kind of summary visualization or analysis, some quantifiable form that you can use to help with decision making. For example: how many of our contracts involved late delivery? That's something you can extract from the text, but it does require some work.

In general, information extraction could involve creating a database. You have a program that reads through the documents in question. For example, let's say we have hundreds of thousands of news articles and we would like to create a database that we can query. The question is: can you find events in the news? You find a news document and convert it into a structured format. What I want to know is: which company was involved, what was the position of the person taking the action, what is that person's name, and what exactly was the event? These fields kind
of describe what we need. If the text tells us that the company mentioned was Starbucks and that its CEO, Howard Schultz, quit or resigned, you can see how this becomes a database: you can ask who is the CEO of Starbucks, or who was the CEO of Starbucks, and you could attach a date to each event as well. So you can build a database entirely from text documents that have no structure to them.

Getting down into the nitty-gritty, there are what I would call micro tasks. These are problems that had to be solved before we could do any of the mid-level tasks, which in turn had to be solved before we could do any of the larger tasks such as natural language understanding and generation.

One is part-of-speech tagging. Often the first step in understanding a document is figuring out which parts of speech are there. Another is parsing: understanding what the elements of the sentence are. You may have had to diagram sentences in school; I don't know if they still do that, but parsing is basically diagramming a sentence so that you can traverse it as a tree.

Then there's sentence boundary detection. On the surface this seems easy, since sentences end in periods, but unfortunately acronyms end in periods too, and so do abbreviations like "Mr." So there's more to it than looking for a period and calling that the end of the sentence. There's also word boundary detection. This is easy in English because we put spaces between words, but some languages, such as Chinese, don't put spaces between words, and in Arabic many short words such as "and" and "the" are attached to the word that follows, so it's a harder problem there.

There's also a task called named entity recognition. You might be interested in particular kinds of entities; let's say you're interested in people, companies, and dates, and you want to go through a massive body of text and find all of
the people. That's not necessarily a trivial task. Or find all of the companies, or, if you're interested in countries, all of the countries or all of the places. Let's say you work for a pharmaceutical company and you want to find all of the drug names. Given that drug names can be complicated and are often misspelled in multiple ways, you might want a named entity recognizer that can recognize when somebody is referring to a drug. That's named entity recognition.

Topic segmentation means going through a document and figuring out, for example, that there are really four topics in it, four distinct things being talked about. There's automatic summarization, which, as I said, is taking a long document and summarizing it. And finally there's the challenge of discourse analysis, which involves not only knowing what the words are and what the tone is, but also understanding something about the patterns of how conversations unfold. This is considered quite challenging.

As managers, our interest in the capabilities of natural language processing is, as always, in the applications and their implications for management. If you think about roles, ask: what percentage of a role is spent reading or listening, then finding information somewhere, usually in text, and then generating a spoken or written response? There are a lot of jobs where this is most of what you do. It's not only customer support, where a customer calls and says, "I want to know how to get my warranty," and you look up the relevant information and tell them what they need to know. It's also something like law: lawyers spend a lot of time listening to and understanding a case, then they find all the precedents, look through contracts, and then they create
an argument. That's a somewhat more complicated version of the same process. Even something like counseling follows it: reading and listening, finding information in text, and then generating a spoken or written response. To some extent, once chatbots are good enough, even counseling might be done automatically. So there are many positions that are ripe for automation now that natural language processing has reached maturity.
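To make the news-event example from the information extraction discussion concrete, here is a minimal Python sketch of template-based extraction into a queryable database. It assumes a single hand-written pattern for sentences shaped like "<Company> <position> <Person> <event>"; the pattern, the `events` table, and the helper names `extract_events` and `build_database` are hypothetical illustrations, not a method from this lecture. Practical systems typically rely on trained models rather than one regular expression.

```python
import re
import sqlite3

# Hypothetical hand-written pattern for one sentence shape:
# "<Company> <position> <Person Name> <event verb>", e.g. "Starbucks CEO Howard Schultz resigned".
EVENT_PATTERN = re.compile(
    r"(?P<company>[A-Z]\w*(?: [A-Z]\w*)*) "
    r"(?P<position>CEO|CFO|president|chairman) "
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)+) "
    r"(?P<event>resigned|quit|stepped down|was appointed)"
)


def extract_events(text):
    """Return one (company, position, person, event) tuple per pattern match in the text."""
    return [m.group("company", "position", "person", "event")
            for m in EVENT_PATTERN.finditer(text)]


def build_database(documents):
    """Fill an in-memory SQLite table with the events extracted from each document."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (company TEXT, position TEXT, person TEXT, event TEXT)"
    )
    for doc in documents:
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", extract_events(doc))
    conn.commit()
    return conn


if __name__ == "__main__":
    news = ["Starbucks CEO Howard Schultz resigned, the company announced on Monday."]
    db = build_database(news)
    # Once extracted, the unstructured text can be queried like any other table.
    print(db.execute(
        "SELECT person, event FROM events WHERE company = ? AND position = ?",
        ("Starbucks", "CEO"),
    ).fetchone())  # ('Howard Schultz', 'resigned')
```

The pattern is brittle by design: a sentence such as "The Starbucks CEO..." would capture "The Starbucks" as the company, which is one reason real extractors combine named entity recognition and parsing instead of surface patterns.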
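The sentence boundary problem from the micro tasks (periods that belong to "Mr." or acronyms rather than ending a sentence) can be illustrated with a rule-based splitter. This is a minimal sketch assuming a small hypothetical abbreviation list; the names `ABBREVIATIONS` and `split_sentences` are illustrative, and widely used tools generally rely on much larger lists or trained models.

```python
import re

# Hypothetical, deliberately small abbreviation list; real tools use far larger lists or trained models.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "inc.", "e.g.", "i.e.", "u.s.", "etc."}


def split_sentences(text):
    """Split at ., !, or ? followed by whitespace, unless the period ends a known abbreviation."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?](?=\s)", text):
        end = match.end()
        last_token = text[start:end].split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # "Mr." or "U.S." here is treated as an abbreviation, not a sentence end
        sentences.append(text[start:end].strip())
        start = end
    remainder = text[start:].strip()
    if remainder:
        sentences.append(remainder)
    return sentences


if __name__ == "__main__":
    sample = "Mr. Smith works at Acme Inc. in the U.S. today. He likes it! Does she?"
    for sentence in split_sentences(sample):
        print(sentence)
    # Mr. Smith works at Acme Inc. in the U.S. today.
    # He likes it!
    # Does she?
    # A naive sample.split(". ") would instead break after "Mr", "Inc", and "U.S".
```

The rule has an obvious gap: an abbreviation can also end a sentence ("...based in the U.S. Sales rose."), and this splitter would miss that boundary. Resolving such cases needs context, which is why the task is harder than it first appears.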
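For word boundary detection in text written without spaces, such as Chinese, one classic baseline is greedy "maximum matching" against a dictionary. The sketch below is illustrative only: the function name `max_match` and the tiny lexicon are hypothetical, and modern segmenters usually use statistical or neural models instead.

```python
def max_match(text, dictionary):
    """Greedy forward maximum matching: at each position, take the longest dictionary word.

    If no dictionary word starts at a position, emit a single character and move on.
    """
    max_len = max([len(word) for word in dictionary] + [1])
    words = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shrink it one character at a time.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words


if __name__ == "__main__":
    # Hypothetical mini-lexicon: 我们 "we", 在 (progressive marker), 学习 "study",
    # 自然 "natural", 语言 "language", 处理 "processing", 自然语言处理 "natural language processing".
    lexicon = {"我们", "在", "学习", "自然", "语言", "处理", "自然语言处理"}
    print(max_match("我们在学习自然语言处理", lexicon))
    # ['我们', '在', '学习', '自然语言处理']

    # Greedy matching can choose the wrong split when a string is ambiguous:
    print(max_match("thetable", {"the", "theta", "table", "ble"}))
    # ['theta', 'ble'] rather than ['the', 'table']
```

The second example shows why the problem is hard: without spaces, the same characters can often be grouped into words in more than one valid way.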
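The drug-name example from the named entity recognition discussion can be approximated with a gazetteer (a list of known names) plus fuzzy matching, so that misspellings still map to a canonical name. This is a minimal sketch assuming a tiny hypothetical drug list and an arbitrary similarity cutoff of 0.8; it uses Python's standard `difflib.get_close_matches`, and `find_drug_mentions` is an illustrative name, not an established API.

```python
import difflib
import re

# Hypothetical, tiny drug gazetteer; a real system would use a curated drug vocabulary.
DRUG_NAMES = ["ibuprofen", "acetaminophen", "amoxicillin", "atorvastatin"]


def find_drug_mentions(text, cutoff=0.8):
    """Return (token, canonical drug name) pairs for tokens that closely match a known drug.

    Fuzzy matching lets common misspellings map to the canonical name;
    the 0.8 similarity cutoff is an illustrative choice, not a tuned value.
    """
    mentions = []
    for token in re.findall(r"[a-z]+", text.lower()):
        match = difflib.get_close_matches(token, DRUG_NAMES, n=1, cutoff=cutoff)
        if match:
            mentions.append((token, match[0]))
    return mentions


if __name__ == "__main__":
    note = "Patient reported taking Ibuprofin and amoxicilin last week."
    print(find_drug_mentions(note))
    # [('ibuprofin', 'ibuprofen'), ('amoxicilin', 'amoxicillin')]
```

The cutoff is the key design choice: set it too low and ordinary words start matching drug names, set it too high and genuine misspellings are missed.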
