Transcript for:
Comprehensive AI Course Lecture Notes

Welcome to our artificial intelligence full course by Simplilearn. Let's start by explaining what AI is and why it's essential in 2024. Artificial intelligence, or AI, is the simulation of human intelligence in machines; it's basically about creating systems that can think, learn, and adapt like humans. In 2024 the need for AI is greater than ever. AI drives innovation across various industries, from healthcare to finance to entertainment and transportation. It helps businesses automate processes, improve decision making, and enhance customer experiences. With AI we can solve complex problems faster and more efficiently. The demand for AI professionals is skyrocketing these days: on average, AI engineers in the United States earn about $100,000 to $150,000 per year, with experienced professionals making even more. This lucrative field offers exciting career opportunities for those with the right skills.

In this course we will cover everything you need to know about AI. We start with AI in 5 minutes for a quick overview, followed by an in-depth AI explainer; we'll discover the top 10 AI technologies and understand the difference between AI and AGI, and we will also explore the future of AI. Craving a career upgrade? Subscribe, like, and comment below, and dive into the link in the description to fast-track your ambitions. Whether you're making a switch or aiming higher, Simplilearn has your back. But before we commence, if you're interested in elevating your career, our Post Graduate Program in AI and Machine Learning is perfect for you. Ranked first by Career Karma, this 11-month online boot camp, delivered in collaboration with Purdue University and IBM, covers essential skills like machine learning, deep learning, NLP, computer vision, reinforcement learning, generative AI, prompt engineering, and many more. You can earn certificates from Purdue University and IBM, access masterclasses by IBM experts, and gain hands-on experience with 25-plus projects and 20-plus tools. Simplilearn graduates have successfully transitioned careers and achieved significant salary hikes. The link is mentioned below in the description box, so hurry up and join the course.

The robot must determine which path to take based on the circumstances; this portrays the robot's reasoning ability. After a short stroll, the robot now encounters a stream that it cannot swim across. Using the plank provided as an input, the robot is able to cross the stream, so our robot uses the given input and finds the solution for a problem: this is problem solving. These three capabilities make the robot artificially intelligent. In short, AI provides machines with the capability to adapt, reason, and provide solutions.

Well, now that we know what AI is, let's have a look at the two broad categories AI is classified into. Weak AI, also called narrow AI, focuses solely on one narrow task. For example, AlphaGo is a master of the game Go but isn't remotely good at chess; this makes AlphaGo a weak AI. You might say Alexa is definitely not a weak AI, since it can perform multiple tasks. Well, that's not really true: when you ask Alexa to play "Despacito," it picks up the keywords "play" and "Despacito" and runs a program it is trained to run. Alexa cannot respond to a question it isn't trained to answer; for instance, try asking Alexa about something outside its training, and Alexa cannot provide you that information. This brings us to our second category of AI: strong AI. Now, this is much like the robots that only exist in fiction as of now. Ultron from Avengers is an ideal example of a strong AI; that's because it's self-aware and eventually even develops emotions, which makes the AI's responses unpredictable.
You must be wondering how artificial intelligence is different from machine learning and deep learning. We saw what AI is; machine learning is a technique to achieve AI, and deep learning in turn is a subset of machine learning. Machine learning provides a machine with the capability to learn from data and experience through algorithms, and deep learning does this learning through methods inspired by the human brain. So now the question arises: why all the current hype in the world around AI? What's different now? Let's get started.

Why AI matters: AI possesses the capability to revolutionize various sectors spanning finance, education, transportation, and healthcare. By automating repetitive tasks, refining decision-making processes, and accelerating data analysis, AI holds immense potential for transformative impact. Presently, the creation of large-scale generative AI models remains within the realm of a select few technology giants; these systems demand substantial computing power and data resources. Consequently, a handful of individuals leading these organizations wield considerable influence over the direction of AI applications, with far-reaching implications for society at large.

So after learning why AI matters, let's see some history of AI and where it comes from. History of AI: artificial intelligence is not a recent term or technology for research; it's actually much older than you might think. There are even stories of mechanical beings in ancient Greek and Egyptian myths. Alan Turing, a pioneer of AI, proposed in 1950 that if a machine can engage in conversation with a human and the human can't tell whether they are talking to another human or a machine, then the machine has shown human-like intelligence. The concept of machine learning became widely known in the 1950s, after people witnessed a demonstration of Arthur Samuel's checkers program defeating its human opponent Robert Nealey on television. However, for many years AI was mostly associated with experts in technology and fans of science fiction.

So let's move forward and see what artificial intelligence is. Artificial intelligence (AI) is the simulation of human intelligence in machines that are programmed to think and act like humans. Learning, reasoning, problem solving, perception, and language comprehension are all examples of cognitive abilities. Artificial intelligence is a method of making a computer, a computer-controlled robot, or a piece of software think intelligently like the human mind. AI is accomplished by studying the patterns of the human brain and by analyzing the cognitive process; the outcomes of these studies are used to develop intelligent software and systems.

So now, moving forward, let's see how AI works. Let's take ChatGPT as an example. ChatGPT operates as an AI-driven chatbot, utilizing natural language processing and machine learning algorithms to engage with users. What differentiates ChatGPT from other chatbots is its adeptness in grasping context and furnishing relevant responses. This proficiency enables it to offer assistance tailored to users' queries and needs, making it an invaluable tool for diverse applications. Unlike search engines like Google, which primarily retrieve web pages and articles related to search queries, ChatGPT focuses on understanding the intent behind the user's question and providing appropriate responses. While Google returns search results based on keyword matches, ChatGPT generates responses based on the context and meaning of the query, offering a more personalized and conversational interaction.
The acronym GPT in ChatGPT stands for Generative Pre-trained Transformer, highlighting its capability to generate responses. The model is pre-trained using vast amounts of text data from sources like books, web pages, news articles, and scientific journals. Developed by OpenAI, the model employs a neural network architecture known as the Transformer to process input text and generate coherent and natural-sounding responses. To understand how ChatGPT works, it is essential to recognize its training process: initially, human trainers provide conversational data, guiding the model to produce human-like responses; through supervised learning the model learns to associate inputs with appropriate outputs, refining its conversational abilities; subsequently, reinforcement learning further enhances the model's performance by enabling it to learn patterns and relationships in text data autonomously. So ChatGPT's neural network takes in a string of text as input and generates a response as output. However, as with most AI models, neural networks are essentially complex mathematical functions that require numerical data as input, so each word in ChatGPT's vocabulary is assigned a unique set of numbers to create a sequence of numbers that can be processed by the network. With this process, ChatGPT can understand and respond to various inquiries, with varying degrees of success depending on its training.

So, moving forward, let's see the applications of AI. AI is everywhere, and here are the top applications. The first one is virtual personal assistants: virtual personal assistants like Siri, Alexa, and Google Assistant utilize AI to understand and respond to user commands, manage schedules, set reminders, and answer questions. The second one is recommendation systems: AI powers the recommendation systems used by companies like Netflix, Amazon, and Spotify to suggest personalized content to users based on their past behavior and preferences, enhancing user experience and increasing engagement. The third one is healthcare: AI is revolutionizing healthcare by enabling medical professionals to analyze large amounts of patient data, diagnose diseases more accurately, predict patient outcomes, and personalize treatment plans; applications include medical imaging analysis, drug discovery, and virtual health assistants. The last one is autonomous vehicles: AI plays a crucial role in autonomous vehicles, enabling them to perceive their surroundings, make decisions, and navigate safely without human intervention; AI algorithms process data from sensors such as cameras, lidar, and radar to detect obstacles, interpret traffic signs, and plan routes.

So now let's see some advantages and disadvantages of AI. The first advantage is efficiency: AI can automate repetitive tasks, increasing efficiency and freeing up human resources for more complex and creative work. The first disadvantage is job displacement: AI automation may lead to job losses in certain industries as tasks become automated, potentially causing unemployment and disruption. The second advantage is accuracy: AI algorithms can process vast amounts of data with precision, reducing errors and improving decision making. The second disadvantage is bias and discrimination: AI algorithms may reflect the biases present in the data they are trained on, leading to discriminatory outcomes and decision-making processes. The third advantage is personalization: AI enables personalized experiences for users by analyzing their preferences and behavior, leading to tailored recommendations and services. The third disadvantage is privacy concerns: AI systems may collect and analyze vast amounts of
personal data, raising concerns about privacy and data security. The fourth advantage is accessibility: AI technologies can assist individuals with disabilities by providing tools for speech recognition, language translation, and accessibility features in digital products. The fourth disadvantage is dependency: over-reliance on AI technology may lead to dependency issues, where individuals and organizations become overly reliant on AI systems, potentially weakening critical thinking and decision-making skills.

Future of AI: what lies ahead? In the future, AI will become even more powerful and useful. It will help us in many areas such as healthcare, finance, and transportation, and we will see smarter AI systems that understand us better and can do more tasks for us. The world is becoming increasingly competitive, requiring business owners and individuals to find new ways to stay ahead. Modern customers have higher expectations, demanding personalized experiences, meaningful relationships, and faster responses. Artificial intelligence is a game changer here: AI helps promote goods and services, or simply makes your life easier with minimal effort and maximum results, allowing everyone to make faster, better-informed decisions. However, with so many AI tools available, it can be challenging to identify the best ones for your needs and productivity boost. So here are the top 10 AI tools in 2024 that can transform your business or boost your productivity.

At number 10 we have Tome. Tome is a tool that can help you share your thoughts and ideas quickly and effectively. Unlike other methods, such as making a slide deck or building a web page, Tome lets you create engaging and detailed presentations in just a minute. You can enter any topic or idea, and the AI will help you put together a presentation that looks great and gets your message across. It's like getting the ideas out of your head and into the world, all without sacrificing quality. With Tome you can be sure that your presentation will be both fast and effective.

Ninth on the list is Zapier. Zapier is a popular web automation tool that connects different apps, allowing users to automate repetitive tasks without coding knowledge. With Zapier you can combine the power of various AI tools to supercharge your productivity. Zapier supports more than 3,000 apps, including popular platforms like Gmail, Slack, and Google Sheets. This versatility makes it a valuable tool for individuals, teams, and businesses looking to streamline their operations and improve productivity. And with 7,000-plus integrations and services on offer, Zapier empowers businesses everywhere to create processes and systems that let computers do what they are best at doing and let humans do what they are best at doing.

After covering Zapier, number eight on the list is GravityWrite. GravityWrite is an AI-powered writing tool that transforms content creation. It generates high-quality, SEO-optimized content in over 30 languages, catering to diverse needs like blog posts, social media updates, ad copy, and emails. The tool promises 100% original, plagiarism-free content, safeguarding your brand's integrity. Its AI capabilities also include text-to-image generation, enhancing visual content for marketing purposes. The tool offers both free and paid plans, making it a good fit for freelancers, small business owners, and marketing teams.

At number seven we have Audiobox. Audiobox is an advanced AI tool developed by Meta, designed to transform audio production. It allows users to create custom voices, sound effects, and audio stories with simple text prompts. Using natural language processing,
Audiobox generates high-quality audio clips that can be used for various purposes such as text-to-speech, voice mimicking, and sound effect creation. Additionally, Audiobox offers interactive storytelling demos, enabling users to generate dynamic narratives between different AI voices. This tool is particularly useful for content creators, marketers, and anyone needing quick, high-quality audio production without extensive manual effort.

Next, at number six, we have AKOOL. AKOOL is an advanced AI-powered tool tailored for e-commerce and marketing professionals. It offers a comprehensive suite of features designed to streamline content creation and enhance personalization. With AKOOL, users can generate customized text, images, voices, and videos, making it an invaluable asset for creating engaging product videos and marketing materials. Key features of AKOOL include face swapping, realistic avatars, video transitions, and talking photos. These tools allow businesses to create dynamic and personalized content that can captivate audiences on social media and other platforms. AKOOL's user-friendly interface and intelligent design make it easy for users to produce high-quality content quickly and efficiently.

At number five we have ElevenLabs. ElevenLabs is a leading AI tool for text-to-speech and voice cloning, known for its high-quality, natural-sounding speech generation. The platform includes features like Voice Lab for creating or cloning voices with customizable options such as gender, age, and accent. "Hey there, did you know that AI voices can whisper, or do pretty much anything?" "Ladies and gentlemen, hold on to your hats, because this is one bizarre sight: we have reports of an enormous fluffy pink monster strutting its stuff through downtown." A fluffy bird in downtown? Weird. Let's switch the setting to something more calming. "Imagine diving into a fast-paced video game, your heartbeat syncing with the storyline. I've got to go, the aliens are closing in!" That wasn't calming at all; explore all those voices yourself on the ElevenLabs platform. Professional voice cloning supports multiple languages and needs around 30 minutes of voice samples for precise replication. The extensive voice library offers a variety of profiles suitable for podcasts, video narration, and more. With various pricing plans ranging from free to enterprise level, ElevenLabs caters to individual creators and large businesses alike, standing out for its user-friendly interface and superior voice output quality.

At number four we have Go Enhance. Go Enhance AI is an advanced multimedia tool designed to revolutionize video and image editing. It leverages powerful AI algorithms to enhance and upscale images, transforming them into high-resolution masterpieces with extreme detail. The platform's standout feature, video-to-video, allows users to convert standard videos into various animated styles such as pixel art and anime, giving a fresh and creative touch to otherwise ordinary footage. This AI tool is ideal for social media content creators, marketers, educators, and anyone looking to bring their creative vision to life. Whether you need to create eye-catching marketing materials or professional-grade videos, Go Enhance AI provides the resources to do so efficiently.

At number three we have Pictory. Pictory is an AI-powered tool designed to streamline video creation by transforming various content types into engaging visual media. It excels at converting text-based content like articles and scripts into compelling videos, making it ideal for content marketers and educators. Users can also upload their own images and videos to craft personalized content.
The platform features AI-generated voiceovers, which add a professional touch without the need for expensive voice talent. Pictory offers a range of customizable templates, simplifying the video production process even for those with no design skills. Additionally, its unique text-based video editing capability allows users to repurpose existing content easily, creating highlights or short clips from longer videos.

At number two we have Nvidia Broadcast. It's a powerful tool that can enhance your video conferencing experience, whether you are using Zoom or Teams. It can address common challenges like background noise, poor lighting, or low-quality audio and video. With this software you can improve audio quality by removing unwanted noise such as keyboard clicks and other off-hand sounds. It also offers virtual background options and blurring effects without needing a green screen, and you can seamlessly integrate it with other applications like OBS, Zoom, Discord, or Microsoft Teams. Think of it as having a professional studio at home; plus, it's free for Nvidia RTX graphics card users. Visit the website to learn more and start using it today.

After covering all those tools, at number one we have Taplio. Taplio is an AI-powered tool designed to enhance your LinkedIn presence and personal branding. It leverages artificial intelligence to create engaging content, schedule posts, and provide insight into your LinkedIn performance. Taplio's main features include AI-powered content inspiration, a library of viral posts, and a robust post composer for scheduling and managing LinkedIn content efficiently. Taplio also offers easy-to-understand LinkedIn analytics to help users make informed decisions based on their performance data, and a free Chrome extension provides a quick overview of performance metrics directly on LinkedIn, making it a convenient tool for daily users. There you have it: the top 10 AI tools that are set to transform your life in 2024. Whether you are a developer, a content creator, or someone looking to boost their productivity, these tools are worth keeping an eye on. The future is here, and it's powered by AI.

So now we'll start with artificial general intelligence. AGI is a field of AI research focused on creating software with human-like intelligence and self-learning capabilities. The goal is for the software to perform tasks it wasn't specifically trained for, unlike current AI technologies, which operate within predefined parameters; for example, an AI trained for image recognition can't build websites. AGI aims to develop systems with autonomous self-control, self-understanding, and the ability to learn new skills to solve complex problems in various contexts. AGI remains a theoretical concept and research objective.

So now let's understand the difference between artificial intelligence and artificial general intelligence. AI enables software to perform tasks at human-level performance, often excelling in specific areas, like AI summaries extracting key points from documents. In contrast, AGI systems could solve problems across multiple domains, like a human, without manual intervention, self-teaching and handling tasks they weren't initially trained for. AGI is a theoretical representation of comprehensive AI with generalized human cognitive abilities, and today's AI systems require substantial training for specific tasks, unlike AGI, which aims to handle unfamiliar tasks independently.

So now we'll see the theoretical approaches to artificial general intelligence research. There are several methods that drive AGI research, focusing on replicating human cognitive
processes. Number one is the symbolic approach, which uses logic networks to represent human thoughts, interpreting ideas at a higher level. Then comes the connectionist approach, which emulates the human brain's structure with neural networks, aiming for human-like intelligence and low-level cognitive capabilities. Then comes the universalist approach, which addresses AGI complexities at the calculation level, formulating theoretical solutions for practical AGI systems. And then we have the hybrid approach, which combines symbolic and sub-symbolic methods to achieve results beyond a single approach.

Now we'll see the technologies that drive artificial general intelligence research. AGI research is propelled by several emerging technologies. Number one is deep learning, which trains neural networks with multiple layers to understand complex relationships from raw data, useful for text, audio, image, and video analysis. Then we have generative AI, which produces unique and realistic content from learned knowledge, responding to human queries with text, audio, or visuals. Then we have NLP, which allows systems to understand and generate human language using computational linguistics and machine learning. After that we have computer vision, which extracts and comprehends information from visual data, automating tasks like object recognition and classification. And after that we have robotics, which builds mechanical systems for physical tasks, essential for AI's sensory perception and physical manipulation capabilities.

Next we'll see the applications of artificial general intelligence. Number one is healthcare: AGI systems could analyze vast medical records for diagnostics, treatment planning, and drug discovery; they could also enable personalized medicine by tailoring treatments based on an individual's genetic makeup and medical history. The next application is finance: AI can revolutionize investment strategies, risk management, fraud detection, and algorithmic trading by making faster and more precise decisions through real-time analysis of market trends and economic data. The next application is education: AGI can personalize learning experiences by adapting to each student's unique needs and learning styles, creating specialized resources and using natural language processing to clarify concepts. After that we have manufacturing: AGI enhances manufacturing efficiency and predictability by forecasting equipment failures, conducting quality control, and analyzing sensor data and production metrics to identify cost-saving and efficiency opportunities in real time.

So now we'll see the challenges in artificial general intelligence research. The number one challenge is making connections: current AI models are domain specific and can't apply knowledge across domains, unlike humans, who adapt knowledge to various contexts. Then comes emotional intelligence: human creativity and emotional responses are challenging to replicate with neural networks, which generate outputs based on trained data patterns. And then we have sensory perception: AI requires advanced technology to perceive the world accurately, differentiating shapes, colors, tastes, smells, and sounds the way humans do.

So now we'll see the future of artificial general intelligence, that is, AGI. The first point is technological advancements: the rise of AI and the cognitive sciences is making AGI a more tangible part of daily life. Coming to the second point, the second advancement would be the convergence of technologies: interdisciplinary efforts combining AI, robotics, and biotechnology may accelerate the development of AGI.
Then we have augmented intelligence: AGI has the potential to enhance human capabilities and accelerate problem solving with its creative power. Next we have new interaction paradigms: advanced natural language processing will enable more intuitive interactions between humans and machines, revolutionizing communication. And after that, the next advancement is the acceleration of scientific discovery: AGI could speed up scientific processes by taking over data analysis and intellectual tasks. In conclusion, artificial general intelligence (AGI) is a groundbreaking technology with the potential to transform societies worldwide. However, developing AGI is challenging and raises significant ethical questions. Despite its complexity, AGI could address many of humanity's problems and drive innovation if approached responsibly. AI could become a monumental achievement for humanity if its development and use are guided by wisdom, foresight, and compassion.

"AI will pretty much touch everything we do." "It's more likely to be correct and grounded in reality." "Talk to the AI about how to do better." "It's a very deep philosophical conversation; it's a bit above my pay grade." "I'm going to say something, and it's going to sound completely opposite of what people feel. You probably recall, over the course of the last 10 or 15 years, almost everybody who sits on a stage like this would tell you it is vital that your children learn computer science, that everybody should learn how to program. And in fact it's almost exactly the opposite: it is our job to create computing technology such that nobody has to program, and the programming language is human. Everybody in the world is now a programmer."

This is the miracle of artificial intelligence. From its humble beginnings in the 1950s, AI has evolved from simple problem solving and symbolic reasoning to the advanced machine learning and deep learning techniques that power some of the most innovative applications we see today. AI is not just a buzzword; it is a revolutionary force reshaping industries, enhancing daily life, and creating unmatched opportunities across various sectors. AI is changing numerous fields: in healthcare it aids in early disease diagnosis and personalized treatment plans; in finance it transforms money management with robo-advisers and fraud detection systems; the automotive industry is seeing the rise of autonomous vehicles that navigate traffic and recognize obstacles; and retail and e-commerce benefit from personalized shopping experiences and optimized supply chain management.

One of the most exciting developments in AI is the rise of advanced AI tools like ChatGPT-4o, Google Gemini, and generative AI models. These tools represent the pinnacle of conversational AI, capable of understanding and generating human-like text with remarkable accuracy. ChatGPT-4o can assist in writing, brainstorming ideas, and even tutoring, making it a valuable resource for students, professionals, and creatives. Similarly, Google Gemini takes AI integration to the next level, enhancing search capabilities, providing insightful responses, and integrating seamlessly into our digital lives. Generative AI, a subset of AI, is also making waves by creating new content from scratch. Tools like DALL·E, which generates images from textual descriptions, and GPT-3, which can write coherent and creative text, are just the beginning. These technologies are changing fields like art, design, and content creation, enabling the generation of unique and personalized outputs that were previously unimaginable.
Beyond specific industries, AI applications extend to everyday life. Voice-activated assistants like Siri and Alexa and smart home devices learn our preferences and adjust our environments accordingly. AI is embedded in the technology we use daily, making our lives more convenient, connected, and efficient. So join us as we explore the future of AI, examining the breakthroughs, the challenges, and the endless possibilities that lie ahead. Whether you are a tech enthusiast, a professional in the field, or simply curious about what's next, this video will provide you with a comprehensive look at how AI is shaping our world and what we can expect in the years to come. And before we move forward: as we know, ChatGPT, Gemini, and other generative AI tools are AI-based, and if you want to learn how these cool AI tools are developed and want to create your own, check out the link in the description.

So how will AI impact the future? The first impact is enhanced business automation: AI is transforming business automation, with 55% of organizations adopting AI technologies; chatbots and digital assistants handle customer interactions and basic employee inquiries, speeding up decision making. The second is job disruption: automation may displace jobs, with up to one-third of tasks potentially automated; while roles like secretaries are at risk, demand for machine learning specialists is rising, and AI is more likely to augment skilled and creative positions, emphasizing the need for upskilling. The third is data privacy issues: training AI models requires large data sets, raising privacy concerns; the FTC is investigating OpenAI for potential violations, and the Biden-Harris administration introduced an AI Bill of Rights to promote data transparency. The fourth is increased regulation: AI's impact on intellectual property, along with ethical concerns, is leading to increased regulation; lawsuits and government guidelines on responsible AI use could reshape the industry. The fifth is climate change concerns: AI optimizes supply chains and reduces emissions, but the energy needed to train AI models may increase carbon emissions, potentially negating the environmental benefits. Understanding these impacts helps us prepare for AI's future challenges and opportunities.

So now let's see which industries AI will impact the most. The first one is manufacturing: AI enhances manufacturing with robotic arms and predictive sensors, improving tasks like assembly and equipment maintenance. The second is healthcare: AI changes healthcare by quickly identifying diseases, streamlining drug discovery, and monitoring patients through virtual nursing assistants. The third one is finance: AI helps banks and financial institutions detect fraud, conduct audits, and assess loan applications, while traders use AI for risk assessment and smart investment decisions. The fourth one is education: AI personalizes education by digitizing textbooks, detecting plagiarism, and analyzing student emotions to tailor learning experiences. The fifth one is customer service: AI-powered chatbots and virtual assistants provide data-driven insights, enhancing customer service interactions. These industries are experiencing significant changes due to AI, driving innovation and efficiency across various sectors.

So now let's move forward and see some risks and dangers of AI. AI offers many benefits but also poses significant risks. The first one is job loss: from 2023 to 2028, 44% of workers' skills will be disrupted, and without upskilling, AI could lead to higher unemployment and fewer opportunities for marginalized groups. The second one is human biases: AI often reflects the biases of its trainers, such as facial recognition favoring lighter skin tones; unchecked biases
can perpetuate social inequalities. The third one is deepfakes and misinformation: deepfakes blur reality, spreading misinformation with dangerous consequences; they can be used for political propaganda, financial fraud, and compromising reputations. The fourth one is data privacy: AI training on public data risks breaches that expose personal information; a 2024 Cisco survey found 48% of businesses use non-public information in AI tools, with 69% concerned about intellectual property and legal rights, and breaches could expose millions of consumers' data. The fifth one is automated weapons: AI in automated weapons may fail to distinguish between soldiers and civilians, posing severe threats, and misuse could endanger large populations. Understanding these risks is crucial for responsible AI development and use. As we explore the future of AI, it's clear that its impact will be profound and far-reaching. AI will change industries, enhance efficiency, and drive innovation; however, it also brings significant challenges, including job displacement, biases, privacy concerns, misinformation, and the ethical implications of automated weapons. To harness AI's potential responsibly, we must invest in upskilling our workforce, address biases in AI systems, protect data privacy, and develop regulations that ensure ethical AI use.

We've looked at a lot of examples of machine learning, so let's see if we can give a little bit more of a concrete definition. What is machine learning? Machine learning is the science of making computers learn and act like humans by feeding them data and information, without being explicitly programmed. We see here we have a nice little diagram where we have our ordinary system, your computer; nowadays you can even run a lot of this stuff on a cell phone, because cell phones have advanced so much. And then with artificial intelligence and machine learning, it now takes the data, learns from what happened before, and then predicts what's going to come next. And really the biggest part right now in machine learning is that it improves on that: how do we find a new solution? So we go from descriptive, where it's learning about the data and understanding how it fits together, to predictive, forecasting what's going to happen, to prescriptive, coming up with a new solution.

When we're working on machine learning, there are a number of different diagrams that people have posted for what steps to go through. A lot of it can be very domain specific: if you're working on photo identification versus language versus medical or physics, some of these steps are switched around a little bit or new things are put in; they're very specific to the domain. This is a very general diagram. First you want to define your objective; it's very important to know what it is you're wanting to predict. Then you're going to be collecting the data: once you've defined an objective, you need to collect the data that matches it, and you spend a lot of time in data science collecting data. The next step is preparing the data: you've got to make sure that your data is clean going in; there's the old saying, bad data in, bad answers out. Then, once you've cleaned all the data coming in, you're going to select the algorithm: which algorithm are you going to use? You're going to train that algorithm; in this case I think we're going to be working with SVM, the support vector machine. Then you have to test the model: does this model work, is this a valid model for what we're doing? And then, once you've tested it, you want to run your prediction.
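Before we get to prediction and deployment, here is a minimal sketch of this whole workflow in Python with scikit-learn. It is not from the lecture itself: the synthetic dataset, the 75/25 split, and the RBF kernel are assumptions chosen only to illustrate the define-collect-prepare-train-test-predict loop with an SVM.

```python
# A minimal sketch of the general ML workflow: prepare data, train an SVM,
# test it, then predict on new data. The dataset here is synthetic.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# 1. Define the objective: predict a yes/no label from a few numeric features.
# 2. Collect the data (here we simply generate a toy dataset).
X, y = make_classification(n_samples=200, n_features=4, random_state=42)

# 3. Prepare the data: split into train/test sets and scale the features.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 4. Select and train the algorithm: a support vector machine classifier.
model = SVC(kernel="rbf")
model.fit(X_train, y_train)

# 5. Test the model on data it has never seen.
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 6. Run a prediction on a new, unseen example (and then you would deploy).
print("prediction for first test row:", model.predict(X_test[:1]))
```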
Once you run that prediction, or your choice, or whatever output it's going to come up with, and everything is set and you've done lots of testing, you want to go ahead and deploy the model. And remember I said domain specific: this is very general as far as the scope of doing something. With a lot of models you get halfway through and realize that your data is missing something, and you have to go collect new data, because you've run a test somewhere along the line and you're saying, hey, I'm not really getting the answers I need. So there are a lot of things that are domain specific that become part of this model; this is a very general model, but it's a very good model to start with.

And we do have some basic divisions of what machine learning does that are important to know. For instance, do you want to predict a category? If you're categorizing things, that's classification; for instance, whether the stock price will increase or decrease. In other words, I'm looking for a yes/no answer: is it going up or is it going down? In that case we'd say it's true if it's going up, and false if it's not going up, meaning it's going down; this way it's a yes/no, 0/1. Do you want to predict a quantity? That's regression. So remember, we just did classification; now we're looking at regression. These are the two major divisions in what the data is doing. For instance, predicting the age of a person based on height, weight, and other factors: based on these different factors, you might guess how old a person is.

And then there are a lot of domain-specific things, like: do you want to detect an anomaly? That's anomaly detection, and it's actually very popular right now. For instance, you want to detect money-withdrawal anomalies: you want to know when someone's making a withdrawal that might not be from their own account. We bring this up because it's really big right now: if you're predicting whether to buy a stock or not, you want to know if what's going on in the stock market is an anomaly, so you use a different prediction model because something else is going on and you've got to pull in new information, or whether this is just the norm and you'll get your normal return on the money invested. So being able to detect anomalies is very big in data science these days.

Another question that comes up, on what we call unlabeled data, is: do you want to discover structure in unexplored data? That's called clustering. For instance, finding groups of customers with similar behavior, given a large database of customer data containing their demographics and past buying records. In this case we might notice that anybody wearing a certain set of shoes goes shopping at certain stores, or whatever it is, they're going to make certain purchases. Having that information helps us group people together so that we can explore that group and find out what we want to market to them, if you're in the marketing world. And that might also work in just about any arena: you might want to group people together based on their different areas, investments, and financial background, whether you're going to give them a loan or not, before you even start looking at whether they're a valid customer for the bank. You might want to look at all these different areas and group them together based on unknown data: you don't know what the data is going to tell you, but you want to cluster together the people that belong together.
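As a quick, hypothetical illustration of clustering (not from the lecture), here is a minimal k-means sketch on a tiny made-up table of customer features; the feature names and the choice of three clusters are assumptions for the example.

```python
# Minimal clustering sketch: group customers by behavior without any labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up customer data: [age, yearly_spend, store_visits_per_month]
customers = np.array([
    [22, 500, 2], [25, 650, 3], [24, 580, 2],     # younger, low spend
    [45, 4200, 8], [48, 3900, 7], [50, 4500, 9],  # older, high spend
    [33, 1500, 4], [36, 1700, 5],                 # in between
])

# Scale the features so age and spend are comparable, then fit k-means.
X = StandardScaler().fit_transform(customers)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Each customer gets a cluster label we never provided: structure found in the data.
print("cluster labels:", kmeans.labels_)
```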
Let's take a quick detour for quiz time. Oh, my favorite. We're going to have a couple of questions here under our quiz time, and we'll be posting the answers in part two of this tutorial. So let's go ahead and take a look at these quiz questions; hopefully you'll get them all right, and it'll get you thinking about how to process data and what's going on. Can you tell what's happening in the following cases? Of course, you're sitting there with your cup of coffee and your checkbox and your pen, trying to figure out your next step in your data science analysis. A: grouping documents into different categories based on the topic and content of each document. Very big these days: you have legal documents, maybe a group of sports documents, maybe you're analyzing newspaper postings, but certainly having that automated is a huge thing in today's world. B: identifying handwritten digits in images correctly, so we want to know which digit someone has written out in their handwriting. C: behavior of a website indicating that the site is not working as designed. D: predicting the salary of an individual based on his or her years of experience, your HR hiring setup there. So stay tuned for part two; we'll answer these questions when we get to part two of this tutorial, or you can simply write a note at the bottom and send it to Simplilearn, and they'll follow up with you on it. Back to our regular content.

These last few bring us into the next topic, which is another way of dividing our types of machine learning: supervised, unsupervised, and reinforcement learning. Supervised learning is a method used to enable machines to classify or predict objects, problems, or situations based on labeled data fed to the machine. Here you see we have a jumble of data with circles, triangles, and squares, and we label them: we say what's a circle, what's a triangle, what's a square, and we have our model training, and it trains on that, so it knows the answers. Very important: when you're doing supervised learning, you already know the answer for a lot of the information coming in. So you have a huge group of labeled data coming in, and then you have new data coming in. We've trained our model; the model now knows the difference between a circle, a square, and a triangle, and now that we've trained it, we can send in, in this case, a square and a circle, and it predicts that the top one's a square and the next one's a circle. You can see how this applies to being able to predict whether someone's going to default on a loan, because I was talking about banks earlier, or supervised learning on the stock market, whether you're going to make money or not; that's always important. And if you are looking to make a fortune on the stock market, keep in mind it is very difficult to get all the data correct: the stock market fluctuates in ways that are really hard to predict, so it's quite a roller-coaster ride. If you're running machine learning on the stock market, you start realizing you really have to dig for new data.
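Here is a minimal sketch of that supervised setup, using made-up "corner count" features to stand in for the labeled circles, triangles, and squares; the nearest-neighbor classifier is just one simple choice for the example, not the one used in the lecture.

```python
# Minimal supervised learning sketch: the model is shown labeled shapes
# (features: number of corners, has_curved_edges) and learns to predict labels.
from sklearn.neighbors import KNeighborsClassifier

# Labeled training data: [corners, curved_edge (0/1)] -> shape name
X_train = [[0, 1], [0, 1], [3, 0], [3, 0], [4, 0], [4, 0]]
y_train = ["circle", "circle", "triangle", "triangle", "square", "square"]

model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train, y_train)

# New, unseen shapes: a four-cornered figure and a curved figure with no corners.
print(model.predict([[4, 0], [0, 1]]))  # -> ['square' 'circle']
```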
So we have supervised learning, and if we have supervised learning, we should also have unsupervised learning. In unsupervised learning, the machine learning model finds hidden patterns in unlabeled data. In this case, instead of telling it what a circle is and what a triangle is and what a square is, it goes in, looks at them, and for whatever reason groups them together. Maybe it groups them by the number of corners: it notices that a number of them all have three corners, a number of them all have four corners, and a number of them have no corners, and it's able to filter those through and group them together. We talked about that earlier with looking at a group of people who are out shopping: we want to group them together to find out what they have in common. And of course, once you understand what people have in common, maybe one of them is a customer at your store, or five of them are customers at your store, and they have a lot in common with five others who are not customers at your store. How do you market to those five who aren't customers at your store yet? They fit the demographic of who's going to shop there, and you'd like them to shop at your store, not the one next door. Of course, this is a simplified version: you can very easily see the difference between a triangle and a circle, which might not be so easy in marketing.

Reinforcement learning: reinforcement learning is an important type of machine learning where an agent learns how to behave in an environment by performing actions and seeing the results. And we have here, in this case, a baby. It's actually great that they used an infant for this slide, because reinforcement learning is very much in its infant stages, but it's also probably the biggest machine learning demand out there right now, or coming up over the next few years: reinforcement learning and how to make it work for us. You can see here where we have our action: in this one, the baby goes into the fire (hopefully the baby didn't; it was just a little candle, not a giant fire pit like it looks like here). When the baby comes out, the new state is that the baby is sad and crying, because it got burned by the fire. Then maybe it takes another action; the baby is called the agent because it's the one taking the actions, and this time it didn't go into the fire, it went a different direction, and now the baby's happy, laughing, and playing. Reinforcement learning is very easy to understand because that's one of the ways we as humans learn: you burn yourself on the stove, you don't do that anymore, you don't touch the stove. In the big picture, having a machine learning program or an AI that can do this is huge, because now we're starting to learn how to learn; that's a big jump in the world of computers and machine learning.

And we're going to go back and quickly review supervised versus unsupervised learning, because understanding this is huge; it's going to come up in any project you're working on. In supervised learning we have labeled data and direct feedback, so someone's already gone in and said yes, that's a triangle, no, that's not a triangle, and then you predict an outcome: a new set of data comes in and we know what it's going to be. With unsupervised training, the data is not labeled, so we really don't know what it is; there's no feedback, so we're not telling the model whether it's right or wrong, whether it's a triangle or a square, or whether to go left or right. All we do is find hidden structure in the data, grouping the data together to find out what connects to what. And then you can use these together: imagine you have an image and you're not sure what you're looking for, so you go in with the unstructured data, find all the things that are connected together, and then somebody looks at those and labels them.
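Circling back to the reinforcement learning loop just described (agent, action, new state, reward), here is a toy sketch of the idea: a made-up agent learns from rewards that touching the flame is bad and playing elsewhere is good. The actions, rewards, and learning rate are invented for illustration; real reinforcement-learning systems track states and use algorithms such as Q-learning.

```python
# Toy reinforcement-learning loop: the agent tries actions, receives rewards,
# and updates its value estimates until it prefers the rewarding action.
import random

actions = ["touch_flame", "play_elsewhere"]
rewards = {"touch_flame": -1.0, "play_elsewhere": +1.0}  # environment feedback
values = {a: 0.0 for a in actions}   # the agent's learned estimate per action
alpha, epsilon = 0.1, 0.2            # learning rate and exploration probability

for episode in range(200):
    # Explore sometimes, otherwise exploit the best-known action.
    if random.random() < epsilon:
        action = random.choice(actions)
    else:
        action = max(values, key=values.get)
    reward = rewards[action]                              # result of the action
    values[action] += alpha * (reward - values[action])   # learn from it

print(values)  # touch_flame drifts toward -1, play_elsewhere toward +1
print("learned policy:", max(values, key=values.get))     # -> 'play_elsewhere'
```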
Now you can take that labeled data and program something to predict what's in the picture. So you can see how they go back and forth, and you can start connecting all these different tools together to make a bigger picture.

There are many interesting machine learning algorithms; let's have a look at a few of them. Hopefully this gives you a little flavor of what's out there, and these are some of the most important ones currently being used. We'll take a look at linear regression, decision trees, and the support vector machine.

Let's start with a closer look at linear regression. Linear regression is perhaps one of the most well-known and well-understood algorithms in statistics and machine learning. Linear regression is a linear model, that is, a model that assumes a linear relationship between the input variable x and the single output variable y, and you'll recognize it from your algebra classes: y = mx + c. Imagine we are predicting distance traveled (y) from speed (x). Our linear regression model representation for this problem would be y = m * x + c, or distance = m * speed + c, where m is the coefficient and c is the y-intercept. We're going to look at two different variations of this.

First, we'll start with time held constant, and you can see we have a bicyclist (he's got his safety gear on, thank goodness) whose speed is 10 m/s, and over a certain amount of time his distance is 36 km. We have a second bicyclist going twice the speed, 20 m/s, and you can guess that if he's going twice the speed and time is constant, then he's going to go twice the distance, which is easy to compute: 36 * 2 gives you 72 km. And if you asked how far somebody going three times that speed, 30 m/s, would travel, you could easily compute the distance in your head; we can do that without needing a computer, but we want to do this for more complicated data, so it's nice to compare the two. Let's take a look at what that looks like in a graph. In a linear regression model we plot distance against speed, and m is the slope of the line. We'll notice that the line has a positive slope: as speed increases, distance also increases, hence the variables have a positive relationship. So for this rider, y = mx + c gives the distance traveled in a fixed interval of time, and we can very easily compute, either by following the line or just by knowing it's three times 10 m/s, that the third bicyclist has traveled roughly 108 km. One of the key definitions here is the positive relationship: the slope of the line is positive, and as speed increases, so does distance.

Let's take a look at our second example, where distance is held constant. We have a rider at 10 m/s with a certain distance to go, and it takes him 100 seconds to travel that distance. Our second bicyclist is still doing 20 m/s; since he's going twice the speed, we can guess he'll cover the distance in about half the time, 50 seconds. And of course you can probably guess the third one: 100 divided by 3, since he's going three times the speed, gives about 33.33 seconds. If we put that into a linear regression model, or a graph, with the distance assumed to be constant, let's see the relationship between speed and time: as speed goes up, the time to cover that same distance goes down. So now m is a negative slope: as the speed increases, time decreases, hence the variables have a negative relationship.
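Here is a tiny sketch of the arithmetic behind both bicycle scenarios, using the numbers from the example: at a fixed time, distance scales with speed (positive relationship); at a fixed distance, time shrinks as speed grows (negative relationship).

```python
# Fixed time: distance is proportional to speed (positive relationship).
# The 10 m/s rider covers 36 km, so the shared time is 36,000 m / 10 m/s = 3600 s.
time_s = 36_000 / 10
for speed in (10, 20, 30):                                  # m/s
    print(speed, "m/s ->", speed * time_s / 1000, "km")     # 36, 72, 108 km

# Fixed distance: time is inversely proportional to speed (negative relationship).
# The 10 m/s rider takes 100 s, so the shared distance is 10 m/s * 100 s = 1000 m.
distance_m = 10 * 100
for speed in (10, 20, 30):
    print(speed, "m/s ->", round(distance_m / speed, 2), "s")  # 100, 50, 33.33 s
```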
Again, there's our definition: positive relationship and negative relationship, depending on the slope of the line. With a simple formula like this, and even a significant amount of data, let's see the mathematical implementation of linear regression. Suppose we have this data set where the x values are 1, 2, 3, 4, 5, a standard series, and the y values are 3, 2, 2, 4, 3. When we plot these points on a graph, you can see there's a nice scattering, and you could probably eyeball a line through the middle of it, but we're going to calculate that exact line for linear regression.

The first thing we do is compute the mean of x, and remember the mean is basically the average: we add 5 + 4 + 3 + 2 + 1 and divide by five, which comes out to 3. Then we do the same for y: we add up all those numbers and divide by five, and we end up with a mean value of y of 2.8. So x-bar is the mean value of x and y-bar is the mean value of y. When we plot that, we can mark x = 3 and y = 2.8 on our graph; it's given a slightly different color with dashed lines so you can pick it out. It's important to note that the linear regression line should go through that point.

Now let's find our regression equation, the best-fit line. Remember we have y = mx + c, so we're looking for m and c. To find this equation for our data we need the slope m and the intercept c, where m = sum of (x - x-bar) * (y - y-bar) divided by the sum of (x - x-bar) squared; that's how we get the slope of the line. We can easily do that by creating some columns; computers are really good at iterating through data, so we can easily compute this and fill in a table. In our table, if we have an x value of 1, and the mean of x is 3, then 1 - 3 = -2, and 2 - 3 = -1, and so on and so forth, so we can fill in the columns for x - x-bar and y - y-bar, and then from those we can compute (x - x-bar)^2 and (x - x-bar) * (y - y-bar). You can guess that the next step is to sum the columns for the answers we need: we get a total of 10 for (x - x-bar)^2 and a total of 2 for (x - x-bar) * (y - y-bar). We plug those in and get 2/10, which equals 0.2, so now we know the slope of our line is 0.2.

Next we calculate the value of c: we need to know where the line crosses the y-axis. If you remember, I mentioned earlier that the linear regression line has to pass through the mean point, the one we showed earlier; flipping back up to that graph, you can see our mean point is x = 3 and y = 2.8. Since we know that point, we can simply plug it into our formula y = 0.2x + c: we get 2.8 = 0.2 * 3 + c, and solving for c, we find the intercept c equals 2.2. Once we have all that, we can plot our regression line, y = 0.2x + 2.2, and from this equation we can compute new values. So let's predict the values of y using x = 1, 2, 3, 4, 5 and plot the points; remember, 1, 2, 3, 4, 5 were our original x values, so now we're going to see what the model thinks the y values are, not what they actually are.
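Here is a minimal sketch of that hand calculation in NumPy, reproducing the slope of 0.2, the intercept of 2.2, and the predicted values, plus the sum of squared errors we're about to discuss minimizing.

```python
# Reproduce the worked linear-regression example: x = 1..5, y = 3, 2, 2, 4, 3.
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([3, 2, 2, 4, 3], dtype=float)

x_mean, y_mean = x.mean(), y.mean()     # 3.0 and 2.8
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)  # 2/10 = 0.2
c = y_mean - m * x_mean                 # 2.8 - 0.2 * 3 = 2.2

y_pred = m * x + c                      # [2.4, 2.6, 2.8, 3.0, 3.2]
sse = np.sum((y - y_pred) ** 2)         # sum of squared errors for this line

print(f"m = {m}, c = {c}")
print("predictions:", y_pred)
print("sum of squared errors:", round(sse, 3))
```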
We plug those in and get the predicted values, denoted y_p: for x = 1, y_p = 2.4; for x = 2, y_p = 2.6; and so on. So we have our predicted y values, what we think y is going to be, when we plug those numbers in, and when we plot the predicted values along with the actual values we can see the difference. This is one of the things that's very important with linear regression, and with any of these models: understanding the error. We can calculate the error on all of our different values; here we plotted x, y, and y-predicted and drew a little line so you can see what the error looks like between the different points. Our goal is to reduce this error: we want to minimize that error value in our linear regression model, minimizing the distance. There are lots of ways to measure the distance between the line and the data points, like the sum of squared errors, the sum of absolute errors, the root mean square error, and so on. We keep moving the line through the data points to make sure the best-fit line has the least squared distance between the data points and the regression line. So to recap, with a very simple linear regression model we first figure out the formula of our line through the middle, and then we slowly adjust the line to minimize the error. Keep in mind this is a very simple formula: even though the math stays very much the same, it gets much more complex as we add in different dimensions. This is only two dimensions, y = mx + c, but you can take that out to x, y, z, and all the other features, and fit a linear regression model on all of them, using the same kind of formulas to minimize the error.

Let's go ahead and take a look at decision trees, a very different way to solve problems from the linear regression model. A decision tree is a tree-shaped algorithm used to determine a course of action; each branch of the tree represents a possible decision, occurrence, or reaction. We have data that tells us if it is a good day to play golf, and if we open this data up in a spreadsheet you can see we have the outlook (rainy, overcast, or sunny), the temperature (hot, mild, or cool), humidity, windy, and did I like to play golf that day, yes or no. So we're taking a census, and certainly I wouldn't want a computer telling me when I should go play golf, but you could imagine if, the night before, you're trying to plan your day and it says tomorrow would be a good day for golf in the morning and not a good day in the afternoon, or something like that, this becomes very beneficial. We see this in a lot of applications now, where it gives you suggestions and lets you know what would fit for you: the next day, the next purchase, the next mail-out, whatever it is. In this case: is tomorrow a good day for playing golf, based on the weather coming in?

So let's determine whether you should play golf when the day is sunny and windy; we found out the forecast for tomorrow is sunny and windy. Suppose we draw our tree like this: we start with humidity, and if the humidity is normal, you're going to go play golf, and if the humidity is really high, then we look at the outlook, and whether the outlook is sunny, overcast, or rainy changes what you choose to do. So if you know that it's very high humidity and it's sunny, you're probably not going to play golf, because you're going to be
because you'd be out there miserable fighting off the mosquitoes that came out to join you; if it's rainy you probably don't want to play in the rain; but if it's slightly overcast and you get just the right shadow, that's a good day to play golf and be outside on the green. Now in this example you can probably build your own tree pretty easily, because it's a very simple set of data going in. But the question is: how do you know what to split on, and where do you split your data? What if it's much more complicated data that you wouldn't intuitively understand? When studying cancer, for example, they take about 36 measurements of the cancerous cells, and each measurement represents something like how bulbous a cell is, how extended it is, or how sharp its edges are, which as humans we have no intuition for. So how do we decide how to split that data up, and how do we know whether the resulting decision tree is the right one? To answer that, we calculate entropy and information gain, two important vocabulary terms. Entropy is a measure of randomness or impurity in the data set, and we want entropy to be low; we want as little chaos as possible, so we're not looking at a split and getting confused by mixed data. Information gain is the measure of the decrease in entropy after the data set is split, also known as entropy reduction, and we want information gain to be high; we want to get as much information as possible out of each split. Let's take a look at entropy from the mathematical side. We'll denote entropy as I(p, n), where p is the number of days you play golf and n is the number of days you don't. You don't really have to memorize these formulas, there are a few of them out there depending on what you're working with, but it's important to know where this one comes from so you're not lost when you're programming, unless you're building your own decision tree code from scratch. The formula is I(p, n) = -(p / (p + n)) * log2(p / (p + n)) - (n / (p + n)) * log2(n / (p + n)). Let's break that down and see what it actually looks like when we compute it. The entropy of the target class over the full data set is the whole entropy, so we compute entropy(PlayGolf), and if we go back to the data we can simply count how many yeses and nos there are in our complete set of golfing days. We find nine days we did play golf and five days we did not, so the total is 9 + 5 = 14 and our whole-set entropy is I(9, 5). The values we plug into the formula are 9/14 ≈ 0.64 and 5/14 ≈ 0.36, and when you work through the whole equation you get -0.36 * log2(0.36) - 0.64 * log2(0.64) ≈ 0.94.
So we now have a full entropy value of 0.94 for the whole set of data we're working with, and we want our splits to bring that entropy down. Just like we calculated the entropy for the whole set, we can calculate the entropy of playing golf given the outlook, whether it's sunny, overcast, or rainy. For the sunny days there are 3 yes and 2 no, for overcast 4 yes and 0 no, and for rainy 2 yes and 3 no, so the entropy is E(PlayGolf, Outlook) = (5/14) * I(3, 2) + (4/14) * I(4, 0) + (5/14) * I(2, 3), where each fraction is that outlook's share of the 14 days, and that works out to 0.693. Similarly we can calculate the entropy of the other predictors, like temperature, humidity, and wind. Then we look at the gain for outlook, which is how much entropy that split removes: Gain(Outlook) = Entropy(PlayGolf) - Entropy(PlayGolf, Outlook) = 0.94 - 0.693 = 0.247. That's our information gain; remember, the higher the information gain, the lower the resulting entropy, and the better the split. The information gain of the other three attributes can be calculated in the same way: the gain for temperature is 0.029, the gain for humidity is 0.152, and the gain for windy is 0.048. If you do a quick comparison you'll see that 0.247 is the greatest gain of information, so outlook is the split we want. Now let's build the decision tree. The outlook, sunny, overcast, or rainy, is our first split because it gives us the most information gain, and we continue down the nodes of the tree the same way: we choose the attribute with the largest information gain as the root node, and then keep splitting each sub-node on whichever remaining attribute gives the largest information gain we can compute. Although that's a bit of a tongue twister to say, you can see it's a very easy visual model to read: we have our outlook splitting three different directions, and if the outlook is overcast we're going to play; then we can split the other branches further, so if the outlook is sunny we look at windy, and if it's windy we're not going to play, and if it's not windy we will. So we can easily build a nice decision tree to guess what we would like to do tomorrow and give us a recommendation for the day. We wanted to know whether it's a good day to play golf when it's sunny and windy, since that was the original question and tomorrow's weather report is sunny and windy. Going down the tree, the outlook is sunny, then it's windy, so we're not going to play golf tomorrow. Our little smartwatch pops up and says, I'm sorry, tomorrow is not a good day for golf, it's going to be sunny and windy. And if you're a huge golf fan you might go, uh-oh, it's not a good day to play, so we can watch a golf game at home and sit in front of the TV instead of being out playing in the wind.
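If you want to check those numbers yourself, here's a minimal sketch of the entropy and information gain calculation we just walked through, written in plain Python; the entropy helper and the play-golf counts simply mirror the example above, so treat it as an illustration rather than production decision-tree code.

```python
import math

def entropy(p, n):
    """Entropy I(p, n) for a node with p 'play' days and n 'don't play' days."""
    total = p + n
    value = 0.0
    for count in (p, n):
        if count:                      # treat 0 * log2(0) as 0
            frac = count / total
            value -= frac * math.log2(frac)
    return value

# Whole data set: 9 days of golf and 5 days without (entropy is symmetric in p and n)
e_play_golf = entropy(9, 5)            # ~0.94

# Outlook split: sunny has 3 yes / 2 no, overcast 4 / 0, rainy 2 / 3
e_outlook = (5/14) * entropy(3, 2) + (4/14) * entropy(4, 0) + (5/14) * entropy(2, 3)  # ~0.693

gain_outlook = e_play_golf - e_outlook  # ~0.247, the largest gain, so Outlook is the root split
print(round(e_play_golf, 2), round(e_outlook, 3), round(gain_outlook, 3))
```

The gains for temperature, humidity, and windy can be computed the same way once you tally their yes/no counts.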
Now that we've looked at our decision tree, let's look at the third of the algorithms we're investigating, the support vector machine. The support vector machine is a widely used classification algorithm, and the idea behind it is simple: the algorithm creates a separation line which divides the classes in the best possible manner, for example dog or cat, disease or no disease. Suppose we have labeled sample data giving the height and weight of males and females, and a new data point arrives that we want to classify as male or female. We start by drawing decision lines, and if we consider decision line one we would classify the individual as male, but if we consider decision line two it would be female. This person lies roughly in the middle of the two groups, so it's a little confusing which line they should fall under, and we need to know which line divides the classes correctly. But how? The goal is to choose a hyperplane, and that's one of the key words used with support vector machines, with the greatest possible margin between the decision line and the nearest points in the training set. So we have our support vectors, the nearest points on either side, and the margin is the distance between the hyperplane and the nearest data point from either set; the hyperplane should sit equally distant between the two points we're comparing it to. When we draw the candidate hyperplanes we observe that line one has the maximum distance margin, so it will classify the new data point correctly, and our result here is that the new data point is male. One of the reasons we call it a hyperplane rather than a line is that a lot of times we're not looking at just weight and height; we might be looking at 36 different features or dimensions, and then the hyperplane is a multi-dimensional cut through the data, with each plane continuing to divide the data until we get the best fit. Let's understand this with the help of an example. Problem statement, because I always start with a problem statement when we're going to put some code together: classify muffin and cupcake recipes using support vector machines, the cupcake versus the muffin. Let's have a look at our data set. We have the different recipes here; a muffin recipe has so much flour, I'm not sure what measurement 55 is in, maybe ounces, but it has a certain amount of flour, milk, sugar, butter, egg, baking powder, vanilla, and salt, and based on these measurements we want to guess whether we're making a muffin or a cupcake. You can see that in this one we don't have just two features like the height and weight in the male-versus-female example; here we're looking at eight different features to guess whether it's a muffin or a cupcake. So what's the difference between a muffin and a cupcake? It turns out muffins have more flour, while cupcakes have more butter and sugar, so the cupcake is a little bit more of a dessert and the muffin is a little bit more of a fancy bread. But how do we do that in Python, how do we code something that goes through recipes and figures out which one it is? And I really just want to say "cupcakes versus muffins" like some big professional wrestling thing. Before we start on our cupcakes versus muffins, we are
going to be working in Python there's many versions of python many different editors that is one of the strengths and weaknesses of python is it just has so much stuff attached to it and it's one of the more popular data science programming packages you can use in this case we're going to go ahead and use anaconda and jupyter notebook the Anaconda Navigator has all kinds of fun tools once you're into the Anaconda Navigator you can change environments I actually have a number of environments on here we'll be using python 36 environment so this is in Python version 36 although it doesn't matter too much which version you use I usually try to stay with the 3x because they're current unless you have a project is very specifically in version 2x 27 I think is usually what most people use in the version two and then once we're in our um jupyter notebook editor I can go up and create a new file and we'll just jump in here in this case we're doing svm muffin versus Cupcake and then let's start with our packages for data analysis we almost always use a couple there's a few very standard packages we use we use import oops import import numpy that's for number python they usually denoted as NP that's very comma that's very common and then we're going to import pandas as PD and numpy deals with number arrays there's a lot of cool things you can do with the numpy uh setup as far as multiplying all the values in an array in an numpy array data array pandas I can't remember we it actually in this data set I think we do as an import it makes a nice data frame and the difference between a data frame and a nump array is that a data frame is more like your Excel spreadsheet you have columns you have indexes you have different ways of referencing it easily viewing it and there's additional features you can run on a data frame and pandas kind of sits on numpy so they you need them both in there and then finally we're working with the support Vector machine so from SK learn we're going to use the sklearn model import svm support Vector machine and then as a data scientist you should always try to visualize your data some data obviously is too complicated or doesn't make any sense to the human but if it's possible it's good to take a second look at it so you can actually see what you're doing and for that we're going to use two packages we're going to import map plot library. 
pyplot as PLT again very common and we're going to import caborn as SNS and we'll go ahead and set the font scale in the SNS right in our import line that's with this um semicolon followed by a line of data we're going to set the SNS and these are great because the the caborn sits on top of mat plot Library just like Panda sits on numpy so it adds a lot more features and uses and control we're obviously not going to get into matplot library and Seaborn it' be its own tutorial we're really just focusing on the svm the support Vector machine from sklearn and since we're in Jupiter notebook uh we have to add a special line in here for our matplot library and that's your percentage sign or Amber sign matap plot library in line now if you're doing this in just a straight code Project A lot of times I use like notepad++ and I'll run it from there you don't have to have that line in there because it'll just pop up as its own window on your computer depending on how your computer set up because we're running this in the jupyter notebook as a browser setup this tells it to display all of our Graphics right below on the page so that's what that line is for remember the first time I ran this I didn't know that and I had to go look that up years ago it's quite a headache so map plot library in line is just because we're running this on the web setup and we can go ahead and run this make sure all our modules are in they're all imported which is great if you don't have them import you'll need to go ahead and pip use the PIP or however you do it there's a lot of other install packages out there although pip is the most common and you have to make sure these are all installed on your python setup the next step of course is we got to look at the data you can't run a model for predicting data if you don't have actual data so to do that let me go ahe and open this up and take a look and we have our uh cupcakes versus muffins and it's a CSV file or CSV meaning that it's comma separated variable and it's going to open it up in a nice uh spreadsheet for me and you can see up here we have the type we have muffin muffin muffin cupcake cupcake cupcake and then it's broken up into flour milk sugar butter egg baking powder vanilla and salt so we can do is we can go ahead and look at this data also in our python let us create a variable recipes equals we're going to use our pandas module. read CSV remember it was a comma separated variable and the file name happened to be cupcakes versus muffins oops I got double brackets there do it this way there we go cupcakes versus muffins because the program I loaded or the the place I saved this particular Python program is in the same folder we can get by with just the file name but remember if you're storing it in a different location you have to also put down the full path on there and then because we're in pandas we're going to go ahead and you can actually in line you can do this but let me do the full print you can just type in recipes. head in the Jupiter notebook but if you're running in code in a different script You' need to go ahead and type out the whole print recipes. 
head and Panda's knows is that's going to do the first five lines of data and if we flip back gone over to the spreadsheet where we opened up our CSV file uh you can see where it starts on line two this one calls it zero and then 2 3 4 5 six is going to match go and close that out because we don't need that anymore and it always starts at zero and these are it automatically indexes it since we didn't tell it to use an index in here so that's the index number for the leftand side and it automatically took the top row as uh labels so Panda's using it to read a CSV is just really slick and fast one of the reasons we love our pandas not just because they're cute and cuddly teddy bears and let's go ahead and plot our data and I'm not going to plot all of it I'm just going to plot the uh sugar and flour now obviously you can see where they get really complicated if we have tons of different features and so you'll break them up and maybe look at just two of them at a time to see how they connect and to plot them we're going to go ahead and use caborn so that's our SNS and the command for that is SNS dolm plot and then the two different variables I'm going to plot is flour and sugar data equals recipes the Hue equals type and this is a lot of fun because it knows that this is pandas coming in so this is one of the powerful things about pandas mixed with Seaborn and doing graphing and then we're going to use a pallet set one there's a lot of different sets in there you can go look them up for Seaborn we do a regular a fit regular equals false so we're not really trying to fit anything and it's a scatter kws a lot of these settings you can look up in Seaborn half of these you could probably leave off when you run them somebody played with this and found out that these were the best settings for doing a Seaborn plot and let's go ahead and run that and because it does it in line he just puts it right on the page and you can see right here that just based on sugar and flower alone there's a definite split and we use these models because you can actually look at it and say hey if I drew a line right between the middle of the blue dots and the red dots we'd be able to do an svm and and a hyperplane right there in the middle then the next step is to four format or pre process our data and we're going to break that up into two parts we need a type label and remember we're going to decide whether it's a muffin or cupcake well a computer doesn't know muffin or cupcake it knows zero and one so what we're going to do is we're going to create a type label and from this we'll create a nump array and P where and this is where we can do some logic we take our recipes from our Panda and wherever type equals muffin it's going to be zero and then if it doesn't equal muffin which is cupcakes it's going to be one so we create our type label this is the answer so when we're doing our training model remember we have to have a a training data this is what we're going to train it with is that it's zero or one it's a muffin or it's not and then we're going to create our recipe features and if you remember correctly from right up here the First Column is type so we really don't need the type column that's our muffin or cupcake and in pandas we can easily sort that out we take our value recipes dot columns that's a pandas function built into pandas do values converting them to values so it's just the column titles going across the top and we don't want want the first one so what we do is since it always starts at zero we want 
one colon till the end and then we want to go ahead and make this a list and this converts it to a list of strings and then we can go ahead and just take a look and see what we're looking at for the features make sure it looks right me go ahead and run that and I forgot the S on recipes so we'll go ahead and the s in there and then run that and we can see we have flour milk sugar butter egg baking powder vanilla and salt and that matches what we have up here where we printed out everything but the type so we have our features and we have our label Now the recipe features is just the titles of the columns and we actually need the ingredients and at this point we have a couple options one we could run it over all the ingredients and when you're ding this usually you do but for our example we want to limit it so you can easily see what's going on because if we did all the ingredients we have you know that's what um seven eight different hyperplanes that would be built into it we only want to look at one so you can see what the svm is doing and so we'll take our recipes and we'll do just flour and sugar again you can replace that with your recipe features and do all of them but we're going to do just flour and sugar and we're going to convert that to values we don't need to make a list out of it because it's not string values these are actual values on there and we can go ahead and just print ingredients you can see what that looks like uh and so we have just the N of flour and sugar just the two sets of plots and just for fun let's go ahead and take this over here and take our recipe features and so if we decided to use all the recipe features you'll see that it makes a nice column of different data so it just strips out all the labels and everything we just have just the values but because we want to be able to view this easily in a plot later on we'll go ahead and take that and just do flour and sugar and we'll run that and you'll see it's just the two columns so the next step is to go ahead and fit our model we'll go and just call it model and it's a svm we're using a package called SVC in this case we're going to go ahead and set the kernel equals linear so it's using a specific setup on there and if we go to the reference on their website for the svm you'll see that there's about there's eight of them here three of them are for regression three are for classification the SVC support Vector classification is is probably one of the most commonly used and then there's also one for detecting outliers and another one that has to do with something a little bit more specific on the model but SBC and S spr are the two most commonly used standing for support vector classifier and support Vector regression remember regression is an actual value a float value or whatever you're trying to work on and SBC is a classifier so it's a yes no true false but for this we want to know 01 muffin cupcake we go ahead and create our model and once we have our model created we're going to do model. 
fit and this is very common especially in the sklearn all their models are followed with the fit command and what we put into the fit what we're training with it is we're putting in the ingredients which in this case we limited to just flour and sugar and the type label is it a muffin or a cupcake now in more complicated data science series you want to split into we won't get into that today we split it into uh training data and test data and they even do something where they split it into thirds where a third is used for where you switch between which one's training and test there's all kinds of things go into that it gets very complicated when you get to the higher end not overly complicated just an extra step which we're not going to do today because this is a very simple set of data and let's go ahead and run this and now we have our model fit and I got a error here so let me fix that real quick Capital SBC it turns out I did it lowercase support Vector classifier there we go let's go ahead and run that and you'll see it comes up with all this information that it prints out automatically these are the defaults of the model you notice that we changed the kernel to linear and there's our kernel linear on the printout and there's other different settings you can mess with we're going to just leave that alone for right now for this we don't really need to mess with any of those so next we're going to dig a little bit into our newly trained model and we're going to do this so we can show you on a graph and let's go ahead and get the separating and we're going to say we're going to use a W for our variable on here we're going to do model. coefficient 0 so what the heck is that again we're digging into the model so we've already got a prediction and a train this is a math behind it that we're looking at right now and so the W is going to represent two different coefficients and if you remember we had y = mx plus C so these coefficients are connected to that but in two-dimensional it's a plane we don't want to spend too much time on this because you can get lost in the confusion of the the math so if you're a math Wiz this is great you can go through here and you'll see that we have AAL minus w0 over W of one remember there's two different values there and that's basically the slope that we're generating and then we're going to build an XX what is XX we're going to set it up to a numpy array there's our np. 
line space so we're creating a line of values between 30 and 60 so it just creates a set of numbers for x and then if you remember correctly we have our formula yals the slope X X Plus The Intercept well to make this work we can do this as y y equals the slope times each value in that array that's the neat thing about numpy so when I do a * XX which is a whole numpy array of values it multiplies a across all of and then it takes those same values and we subtract the model intercept that's your uh we had MX plus C so that'd be the C from the formula yal MX plus C and that's where all these numbers come from a little bit confusing because it's digging out of these different arrays and then we want to do is we're going to take this and we're going to go ahead and plot it so plot the parallels to separating hyperplane that pass through the support vectors and so we're going to create b equals a model support vectors pulling our support vectors out there here's our y y which we now know is a set of data and we have uh we're going to create YY down equal a * XX + B1 minus a * b0 and then model support Vector B is going to be set that to a new value the minus one setup and y y up equals a * XX + B1 - A * B 0 and we can go ahead and just run this to load these variables up if you wanted to know understand a little bit more of what's going on you can see if we print y y you just run that you can see it's an array it's this is a line it's going to have in this case between 30 and 60 so it's going to be 30 variables in here and the same thing with y y up y y down and we'll we'll plot those in just a minute on a graph so you can see what those look like just go ahead and delete that out of here and run that so it loads up the variables nice clean slate I'm just going to copy this from before remember this our SNS our Seaborn plot LM plot flow sugar and I'll just go and run that real quick so you can see what remember what that looks like it's just a straight graph on there and then one of the neat things is because Seaborn sits on top of P plot we can do the PIP plot for the line going through and that is simply PLT t. 
plot with our xx and yy as the two corresponding x and y values, and then somebody played with this and figured out that a line width of 2 and the color black would look nice. So let's go ahead and run the whole thing with the pyplot line on there, and you can see that when we do this it's just flour and sugar, with the corresponding line drawn between the sugar and the flour, the muffin versus the cupcake. Then we generated the support-vector lines, the yy_down and yy_up, so let's take a look at what those look like. We do our plt.plot again, and again this is all against xx, our x values, but this time with yy_down, and let's do something a little fun with it: we can pass in 'k--', which just tells it to draw a dashed black line, and if we're going to do the down one we also want to do the up one, so here's our yy_up. When we run that it adds both lines, and this is what you expect: the two dashed lines go through the nearest data points, so one goes through the nearest muffin and the other through the nearest cupcake, and then your SVM line goes right down the middle, giving a nice split in our data. You can see how easy it is to tell, based just on sugar and flour, which one is a muffin and which is a cupcake. Next, let's create a function to predict muffin or cupcake. I've got some recipes I pulled off the internet and I want to see whether each is a muffin or a cupcake, so we need a function to push them through. We create a function with def, which is how you define a function in Python, and call it muffin_or_cupcake, and remember we're just doing flour and sugar today, not all the ingredients, which is actually a pretty good split; you really don't need much more than flour and sugar. Then we do an if-else statement: if model.predict on the flour and sugar values equals zero, and it's very common in sklearn to have a .predict where you put the data in and it returns a value, then print "you're looking at a muffin recipe", else, since anything that isn't zero must be one, print "you're looking at a cupcake recipe". That's pretty straightforward. And of course if you create a function you should run something through it, so let's test a recipe with the values 50 and 20, a muffin or a cupcake, I don't know which, and when we run it it says, oh, you're looking at a muffin recipe. So it very easily predicts whether we're looking at a muffin or a cupcake recipe. Let's plot this on the graph so we can see what that actually looks like. I'm just going to copy and paste the plotting code from earlier, so this is nothing different from what we did before, and if I run it you'll see it has all the points and the lines on there. What we want to do is add another point, so we do plt.plot, and if I remember correctly our test values were 50 and 20, and then somebody decided we'd use 'yo' for a yellow, or kind of orangish-yellow, marker with a marker size of nine; those are settings you can play with, and somebody else played with them to come up with something that looks good. And you can see there it is, graphed, and clearly a muffin; in this case, in cupcakes versus muffins, the muffin has won.
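To tie the whole walkthrough together, here's a minimal end-to-end sketch of the muffin-versus-cupcake classifier as described above. The file name cupcakes_vs_muffins.csv and the column names Type, Flour, and Sugar are assumptions based on the narration, so adjust them to whatever your CSV actually uses.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import svm

sns.set(font_scale=1.2)
# In a Jupyter notebook you would also run: %matplotlib inline

# Load the recipes (file and column names are assumed here)
recipes = pd.read_csv('cupcakes_vs_muffins.csv')

# Label: 0 for muffin, 1 for cupcake; features: just flour and sugar so we can plot them
type_label = np.where(recipes['Type'] == 'Muffin', 0, 1)
ingredients = recipes[['Flour', 'Sugar']].values

# Fit a linear support vector classifier
model = svm.SVC(kernel='linear')
model.fit(ingredients, type_label)

# Recover the separating line y = a*x + b from the fitted coefficients
w = model.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(30, 60)
yy = a * xx - model.intercept_[0] / w[1]

# Dashed margin lines through the support vectors on each side
b = model.support_vectors_[0]
yy_down = a * xx + (b[1] - a * b[0])
b = model.support_vectors_[-1]
yy_up = a * xx + (b[1] - a * b[0])

# Scatter the recipes, then draw the hyperplane and its margins
sns.lmplot(x='Flour', y='Sugar', data=recipes, hue='Type', palette='Set1', fit_reg=False)
plt.plot(xx, yy, linewidth=2, color='black')
plt.plot(xx, yy_down, 'k--')
plt.plot(xx, yy_up, 'k--')

def muffin_or_cupcake(flour, sugar):
    """Print whether a recipe with this much flour and sugar looks like a muffin or a cupcake."""
    if model.predict([[flour, sugar]])[0] == 0:
        print("You're looking at a muffin recipe!")
    else:
        print("You're looking at a cupcake recipe!")

muffin_or_cupcake(50, 20)                 # the test recipe from the demo
plt.plot(50, 20, 'yo', markersize=9)      # mark it on the plot
plt.show()
```

Note the fit happens on the full data set here, exactly as in the demo; for anything real you would split into training and test sets first, as mentioned above.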
If you'd like to do your own muffin-versus-cupcake contender series, you certainly can; send a note down below and the team at Simplilearn will send you the data they used for the muffins and cupcakes. That's true of any of the data sets; we didn't actually run a plot on the men-versus-women data earlier, but you can also request that information and run it on your own setup to test it out. So to go back over our support vector machine code: we did a predict on 40 parts flour and 20 parts sugar, I think that was slightly different from the one we ran, to decide whether it's a muffin or a cupcake. Hence we have built a classifier using SVM which is able to classify whether a recipe is a cupcake. What's in it for you next: we're going to cover clustering, what clustering is, and K-means clustering, which is one of the most commonly used clustering tools out there, including a flowchart to understand how K-means functions, and then we'll do an actual live Python demo on clustering cars by brand. Then we're going to cover logistic regression, what logistic regression is, the logistic regression curve and sigmoid function, and then we'll do another Python code demo to classify a tumor as malignant or benign based on its features. Let's start with clustering. Suppose we have a pile of books of different genres and we divide them into groups like fiction, horror, and education; as we can see from this young lady, she is definitely into heavy horror, you can just tell by those eyes and the Canadian maple leaf on her shirt. So we have fiction, horror, and education, and we want to divide our books up. Organizing objects into groups based on similarity is clustering. In this case, with the books, we're clustering things into known categories, but you can also use clustering to explore data: you might not know the categories, you just know you need to divide the data up in some way to conquer it and organize it better. Here, though, we're going to look at clustering into specific categories, so let's take a deeper look using K-means clustering. K-means clustering is probably the most commonly used clustering tool in the machine learning library, and it's an example of unsupervised learning; if you remember from earlier, that's used when you have unlabeled data, so we don't know the answer yet, we just have a bunch of data that we want to cluster into different groups, and we define clusters in the data based on feature similarity. We've introduced a couple of terms here. We've already talked about unsupervised learning and unlabeled data, meaning we don't know the answer yet and we're just going to group things together and see if we can find out how they connect. We've also introduced feature similarity, features being the different attributes of the data. With books we can easily see the fiction, horror, and history books, but a lot of times with data that information isn't so easy to see when we first look at it, and K-means is one of those tools that helps us find the things that connect and match with each other. Suppose we have these data points and want to assign them to clusters. When I look at these data points I would probably group them into two clusters just by eye, I'd say two of these groups of data kind of come together, but in K-means we pick K clusters, here K represents two different clusters, and assign random centroids to them, and then we
compute distance from objects to the centroids now we form new clusters based on minimum distances and calculate the centroids so we figure out what the best distance is for the centroid then we move the centroid and recalculate those distances repeat previous two steps iteratively till the cluster centroids stop changing their positions and become Static repeat previous two steps iteratively till the cluster centroid stop changing and the positions become Static once the Clusters become Static then K means clustering algorithm is said to be converged and there's another term we see throughout machine learning is converged that means whatever math we're using to figure out the answer has come to a solution or it's conversed on an answer shall we see the flowchart to understand make a little bit more sense by putting it into a nice easy step by step so we start we choose K we'll look at the L method in just a moment we assign random centroids to clusters and sometimes you pick the centroids because you might look at the data and in a graph and say oh these are probably the central points then we compute the distance from the objects to the centroids we take that and we form new clusters based on minimum distance and calculate their centroids then we compute the distance from objects to the new centroids and then we go back and repeat those last two steps we calculate the distances so as we're doing it it brings into the new centroid and then we move the centroid around and we figure out what the best which objects are closest to each centroid so the objects can switch from one centroid to the other as the centroids are moved around and we continue that until it is converged let's see an example of this suppose we have this data set of seven individuals and their score on two topics A and B so here's our subject in this case referring to the person taking the test and then we have subject a where we see what they've scored on their first subject and we have subject B and we can see what they score on the second subject now let's take two farthest apart points as initial cluster centroids now remember we talked about selecting them randomly or we can also just put them in different points and pick the furthest one apart so they move together either one works okay depending on what kind of data you're working on and what you know about it so we took the two furthest points one and one and five and seven and now let's take the two farthest apart points points as initial cluster centroids each point is then assigned to the closest cluster with respect to the distance from the centroids so we take each one of these points in there we measure that distance and you can see that if we measured each of those distances and you use the the Pythagorean theorem for a triangle in this case because you know the X and the Y and you can figure out the diagonal line from that or you just take a ruler and put it on your monitor that'd be kind of silly but it would work if you're just eyeballing it you can see how they naturally come together in certain areas now we again calculate the centroids of each cluster so cluster one and then cluster two and we look at each individual dot there's one two three we're in one cluster uh the centroid then moves over it becomes 1.8 comma 2.3 so remember it was at one and one well the very center of the data we're looking at would put it at the one point roughly 22 but 1.8 2.3 and the second one if we wanted to make the overall mean Vector the average Vector of all the 
different distances to that centroid we come up with 4 comma 1 and 54 so we've now moved the centroids we compare each individual's distance to its own cluster mean and to that of the opposite cluster and we find can build a nice chart on here that the as we move that centr around we now have a new different kind of clustering of groups and using ukian distance between the points and the mean we get the same formula you see new formulas coming up so we have our individual dots distance to the mean centr of the cluster and distance to the mean croid of the cluster only individual three is nearer to the mean of the opposite cluster cluster 2 than its own cluster one and you can see here in the diagram where we've kind of circled that one in the middle so when we've moved the clust the centroids of the Clusters over one of the points shifted to the other cluster because it's closer to that group of individuals thus individual 3 is relocated to Cluster 2 resulting in a new Partition and we regenerate all those numbers of how close they are to the different clusters for the new clusters we will find the actual cluster centroids so now we move the centroids over and you can see that we've now formed two very distinct clusters on here on comparing the distance of each individual's distance to its own cluster mean and to that of the opposite cluster we find that the data points are stable hence we have our final clusters now if you remember I brought up a concept earlier K me on the K means algorithm choosing the right value of K will help in less number of iterations and to find the appropriate number of clusters in a data set we use the elbow method and within sum of squares WSS is defined as the sum of the squared distance between each member of the cluster and its centroid and so you see what we've done here is we have the number of clusters and as you do the same K means algorithm over the different clusters and you calculate what that centroid looks like and you find the optimal you can actually find the optimal number of clusters using the elbow the graph is called as the elbow method and on this we guessed at two just by looking at the data but as you can see the slope you actually just look for right there where the elbow is in the slope and you have a clear answer that we want two different to start with k means equals 2 A lot of times people end up Computing K means equals 2 three four five until they find the value which fits on the elbow joint sometimes you can just look at the data and if you're really good with that specific domain remember domain I mentioned that last time you'll know that that where to pick the those numbers of where to start guessing at what that K value is so let's take this and we're going to use a use case using K means clustering to Cluster cars into Brands using parameters such as horsepower cubic inches make year Etc so we're going to use the data set cars data having information about three brands of cars Toyota Honda and Nissan we'll go back to my favorite tool the Anaconda Navigator with the Jupiter notebook and let's go ahead and flip over to our Jupiter notebook and in our Jupiter notebook I'm going to go ahead and just paste the uh basic code that we usually start a lot of these off with we're not going to go too much into this code because we've already discussed numpy we've already discussed matplot library and pandas the be being the number array pandas being the pandas data frame and map plot for the graphing and don't forget uh since if you're 
using the Jupiter notebook you do need the matap plot library in line so that it plots everything on the screen if you're using a different pyth on editor then you probably don't need that cuz it'll have a popup window on your computer and we'll go ahead and run this just to load our libraries and our setup into here the next step is of course to look at our data which I've already opened up in a spreadsheet and you can see here we have the miles per gallon cylinders cubic inches horsepower weight pounds how you know how heavy it is time it takes to get to 60 my card is probably on this one at about 80 or 90 what year it is so this is you can actually see this is kind of older cars and then the brand Toyota Honda Nissan so the different cars are coming from all the way from 1971 if we scroll down to uh the 80s we have between the 70s and 80s a number of cars that they've put out and let's uh we come back here we're going to do importing the data so we'll go ahead and do data set equals and we'll use pandas to read this in and it's uh from a CSV file remember you can always post this in the comments and request the data files for these either in the comments here on the YouTube video or go to Simply learn.com and request that the car CSV I put it in the same folder as the code that I've stored so my python code is stored in the same folder so I don't have to put the full path if you store them in different folders you do have to change this and double check your name variables and we'll go ahead and run this and uh We've chosen data set arbitrarily because you know it's a data set we're importing and we've now imported our car CSV into the data set as you know you have to prep the data so we're going to create the X data this is the one that we're going to try to figure out what's going on with and then there is a number of ways to do this but we'll do it in a simple Loop so you can actually see what's going on so we'll do for i n x. 
columns so we're going to go through each of the columns and a lot of times it's important I I'll make lists of the columns and do this because I might remove certain columns or there might be columns that I want to be processed differently but for this this we can go ahead and take X of I and we want to go fill na a and that's a panda's command but the question is when are we going to fill the missing data with we definitely don't want to just put in a number that doesn't actually mean something and so one of the tricks you can do with this is we can take X of I and in addition to that we want to go ahead and turn this into an integer because a lot of these are integers so we'll go aead and keep it integers and me add the bracket here and a lot of editors will do this they'll think that you're closing one bracket make sure you get that second bracket in there if it's a double bracket that's always something that happens regularly so once we have our integer of X of Y this is going to fill in any missing data with the average and I was so busy closing one set of brackets I forgot that the mean is also has brackets in there for the pandas so we can see here we're going to fill in all the data with the average value for that column so if there's missing data in the average of the data it does have then once we've done that we'll go ahead and loop through it again and just check and see to make sure everything is filled in correctly and we'll print and then we take X is null and this returns a set of the null value or the how many lines are null and we'll just sum that up to see what that looks like and so when I run this and so with the X what we want to do is we want to remove the last column because that had the models that's what we're trying to see if we can cluster these things and figure out the models there is so many different ways to sort the X out for one we could take the X and we could go data set our variable we're using and use the iocation one of the features that's in pandas and we could take that and then take all the rows and all but the last column of the data set and at this time we could do values we just convert it to values so that's one way to do this and if I let me just put this down here and print X it's a capital x we chose and I run this you see it's just the values we could also take out the values and it's not going to return anything because there's no values connected to it what I like to do with this is instead of doing the iocation which does integers more common is to come in here and we have our data set and we're going to do data set dot or data set. columns and remember that list all the columns so if I come in here let me just Mark that as red and I print data set. columns you can see that I have my index here I have my MPG cylinders everything including the brand which we don't want so the way to get rid of the brand would be to do data Columns of Everything But the last one minus one so now if I print this you'll see the brand disappears and so I can actually just take data set columns minus one and I'll put it right in here for the column columns we're going to look at and let's unmark this and unmark this and now if I do an x. 
head I now have a new data frame and you can see right here we have all the different columns except for the brand at the end of the year and it turns out when you start playing with the data set you're going to get an error later on and it'll say cannot convert string to uh float value and that's because for some reason these things the way they recorded them must have been recorded as strings so we have a neat feature in here on pandas to convert and it is simply convert objects and for this we're going to do convert oops convert underscore numeric numeric equals true and yes I did have to go look that up I don't have it memorized the convert numeric in there if I'm working with a lot of these things I remember them but um depending on where I'm at what I'm doing I usually have to look it up and we run that oops I must have missed something in here let me double check my spelling and when I double check my spelling you'll see I missed the first underscore in the convert objects and when I run this it now has everything converted into a numeric value because that's what we're going to be working with as numeric values down here and the next part is that we need to go through the data and eliminate null values most people when they're doing small amounts working with small data pools discover afterwards that they have a null value and they have to go back and do this so you know beware whenever we're formatting this data things are going to pop up and sometimes you go backwards to fix it and that's fine that's just part of exploring the data and understanding what you have and I should have done this earlier but let me go ahead and increase the size of my window one notch there we go easier to see so we'll do 4 I in working with x. columns we'll page through all the columns and we want to take X of I we're going to change that we're going to alter it and so with this we want to go ahead and fill in X of I pandis Has The Fill na a and that just fills in any non-existent missing data I will'll put my brackets up and there's a lot of different ways to fill this data if you have a really large data set some people just void out that data because if and then look at it later in a separate exploration of data one of the tricks we can do is we can take our column and we can find the means and the means is in our quotation marks so when we take the columns we're going to fill in the the non-existing one with the means the problem is that returns a decimal float so some of these aren't decimals certainly let me need to be a little careful of doing this but for this example we're just going to fill it in with the integer version of this keeps it on par with the other data that isn't a decimal point and then what we also want to do is we want to double check A lot of times you do this first part first to double check then you do the fill and then you do it again just to make sure you did it right so we're going to go through and test for missing data and one of the re ways you can do that is simply go in here and take our X ofi column so it's going to go through the x of I column it says is null so it's going to return any any place there's a null value it actually goes through all the rows of each column is null and then we want to go ahead and sum that so we take that we add the sum value and these are all pandas so is null is a panda command and so is sum and if we go through that and we go ahead and run it and we go ahead and take and run that you'll see that all columns have zero null values so 
we've now tested and double checked and our data is nice and clean we have no null values everything is now a number value we turned it into numeric and we've removed the last column in our data and at this point we're actually going to start using the elbow method to find the optimal number of clusters so we're now actually getting into the SK learn part uh the K means clustering on here I guess we'll go ahead and zoom it up one more notot so you can see see what I'm typing in here and then from sklearn going to or sklearn cluster we're going to import K means I always forget to capitalize the K and the M when I do this say capital K capital M K means and we'll go and create a um array wcss equals we'll make it an empty array if you remember from the elbow method from our slide within the sums of squares WSS is defined as the sum of square distance between each member of the cluster and its centroid so we're looking at that change in differences as far as a squar distance and we're going to run this over a number of K mean values in fact let's go for I in range we'll do 11 of them range Z of 11 and the first thing we're going to do is we're going to create the actual we'll do it all lower case and so we're going to create this object from the K means that we just imported and the variable that we want to put into this is in clusters we're going to set that equals to I that's the most important one because we're looking at how increasing the number of clusters changes our answer there are a lot of settings to the K means our guys in the back did a great job just kind of playing with some of them the most common ones that you see in a lot of stuff is how you init your K means so we have K means plus plus plus this is just a tool to let the model itself be smart how it picks it centroids to start with it's initial syns we only want to iterate no more than 300 times we have a Max iteration we put in there we have an the infinite the random State equals zero you really don't need to worry too much about these when you're first learning this as you start digging in deeper you start finding that these are shortcuts that will speed up the process as far as a setup but the big one that we're working with is the in clusters equals I so we're going to literally train our K means 11 times we're going to do this process 11 times and if you're working with uh Big Data you know the first thing you do is you run a small sample of the data so you can test all your stuff on it and you can already see the problem that if I'm going to iterate through a terabyte of data 11 times and then the K means itself is iterating through the data multiple times that's a heck of a process so you got to be a little careful with this a lot of times though you can find your elbow using the elbow method find your optimal number on a sample of data especially if you're working with larger data sources so we want to go ahead and take our K means and we're just going to fit it you you're looking at any of the SK learn very common you fit your model and if you remember correctly our variable we're using is the capital x and once we fit this value we go back to the um array we made and we want to go and just depend that value on the end and it's not the actual fit we're pining in there it's when it generates it it generates the value you're looking for is inertia so K means. 
inura will'll pull that specific value out that we need and let's get a visual on this we'll do our PLT plot and what we're plotting here is first D xaxis which is range 01 so that will generate a nice little plot there and the wcss for our Y axis it's always nice to give our plot a title and let's see we'll just give it the elbow method for the title and let's get some labels so let's go ahead and do PLT X label and what we'll do we'll do number of clusters for that and PLT y label and for that we can do oops there we go wcss since that's what we're doing on the plot on there and finally we want to go ahead and display our graph which is simply PLT do oops. show there we go and because we have it set to inline it'll appear in line hopefully I didn't make a type error on there and you can see we get a very nice graph you can see a very nice elbow joint there at uh two and again right around three and four and then after that there's not very much now as a data scientist if I was looking at this I would do either three or four and I'd actually try both of them to see what the um output looked like and they've already tried this in the back so we're just going to use three as a setup on here and let's go ahead and see what that looks like when we actually use this to show the different kinds of cars and so let's go ahead and apply the K means to the cars data set and basically we're going to copy the code that we looped through up above where K means equals K means number of clusters and we're just going to set that number of clusters to three since that's what we're going to look for and you could do three and four on this and graph them just to see how they come up differently' be kind of curious to look at that but for this we're just going to set it to three go ahead and create our own variable y k means for our answers and we're going to set that equal to whoops double equal there 2 K means but we're not going to do a fit we're going to do a fit predict is the setup you want to use and when you're using untrained models you'll see um a slightly different usually you see fit and then you see just the predict but we want to both fit and predict the K means on this and that's fit underscore predict and then our capital x is is the data we're working with and before we plot this data we're going to do a little pandas trick we're going to take our x value and we're going to set XS Matrix so we're converting this into a nice rows and columns kind of set up but we want the we're going to have columns equals none so it's just going to be a matrix of data in here and let's go ahead and run that a little warning you'll see These Warnings pop up because things are always being updated so there's like minor changes in the versions and future versions instead of Matrix now that it's more common to set it do values instead of doing as Matrix but M Matrix works just fine for right now and you'll want to update that later on but let's go ahead and dive in and plot this and see what that looks like and before we dive into plotting this data I always like to take a look and see what I am plotting so let's take a look at why K means I'm just going to print that out down here and we see we have an array of answers we have 2 one0 2 one two so it's clustering these different rows of data based on the three different spaces it thinks it's going to be and then let's go ahead and print X and see what we have for x and we'll see that X is an array it's a matrix so we have our different values in the array and 
what we're going to do, since it's very hard to plot all the different values in the array, is only look at the first two, positions zero and one. If you were doing a full presentation in front of a board meeting you might do it a little differently and dig deeper into the different aspects, because this is all the columns we trained on, but we'll only look at the first two columns here to keep it easy. So let's clear this out and bring up our plot. We're going to do a scatter plot, plt.scatter, and it looks a little complicated, so let's explain what's going on. We take the X values where y_kmeans equals zero, the first cluster, and use column zero for the x-axis, and then we do the same thing, again where y_kmeans equals zero, but take the second column, so we're only looking at the first two columns of our data. Then the guys in the back played with this a little to make it pretty and discovered it looks good with a marker size of 100 and the color red for this one, and when they looked at what came out it was clearly the Toyota cluster, so we label it Toyota; again, that's something you really have to explore, playing with those settings to see what looks good. I'll hit enter and paste in the next two lines, which are the next two clusters, Nissan and Honda, and you'll see in those scatter plots we're now looking at where y_kmeans equals one and where y_kmeans equals two, again using just the first two columns, zero and one, and each of those corresponds to Nissan and Honda. Finally, let's put the centroids on there. Again we do a scatter plot, and for the centroids you can just pull them from the K-means model we created, its cluster centers, taking all of them in the first column and all of them in the second column, zero and one, because you always start counting at zero. They played with the size and colors to make this look good too, so we'll do a size of 300, make the color yellow, and label them, since it's always good to have good labels, centroids. Then we want a title, plt.title, because you always want to make your graphs look nice, and we'll call it clusters of car make. One of the features of the plotting library is that you can add a legend, and it will automatically bring in the labels we've already set, Toyota, Nissan, and Honda. Finally we call show so we can actually see it, and remember it's inline, so if you're using a different editor that isn't the Jupyter notebook you'll get a pop-up instead. And we get a nice set of clusters here: we can see the clusters of Honda in green, Toyota in red, and Nissan in purple, and you can see where it put the centroids to separate them. Now when we're looking at this we could also plot a lot of other data, because we only looked at the first two columns, column one and two, or zero and one as you'd label them in code, but you can see here that we have a nice clusters-of-car-make plot, we've been able to pull out the data, and just these two columns already form very distinct clusters.
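Pulling the clustering demo together, here's a minimal sketch of the elbow method plus the final three-cluster fit and plot. The file name cars.csv, the assumption that the brand sits in the last column, and the mapping of clusters 0, 1, and 2 to Toyota, Nissan, and Honda (which was just eyeballed in the demo) are all assumptions, so check them against your own data.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Load the car data (file name and layout assumed: last column is the brand)
dataset = pd.read_csv('cars.csv')

# Features: every column except the brand, coerced to numbers
X = dataset[dataset.columns[:-1]].apply(pd.to_numeric, errors='coerce')

# Fill missing values with the column mean, then confirm nothing is left null
for col in X.columns:
    X[col] = X[col].fillna(int(X[col].mean()))
print(X.isnull().sum())

# Elbow method: fit K-means for k = 1..10 and record the within-cluster sum of squares
wcss = []
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, init='k-means++', max_iter=300, random_state=0)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)

plt.plot(range(1, 11), wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()

# The elbow suggests three clusters, so refit with k = 3 and grab the labels
kmeans = KMeans(n_clusters=3, init='k-means++', max_iter=300, random_state=0)
y_kmeans = kmeans.fit_predict(X)

# Plot the first two feature columns for each cluster, plus the centroids
X_vals = X.values
for cluster, color, label in [(0, 'red', 'Toyota'), (1, 'purple', 'Nissan'), (2, 'green', 'Honda')]:
    plt.scatter(X_vals[y_kmeans == cluster, 0], X_vals[y_kmeans == cluster, 1],
                s=100, c=color, label=label)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=300, c='yellow', label='Centroids')
plt.title('Clusters of car make')
plt.legend()
plt.show()
```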
makes, and we've been able to pull out the data and see how just these two columns form very distinct clusters. If you were exploring new data you might look at this and ask what makes these groups different, almost working in reverse: you start pulling apart the columns to find out why the first group is set up the way it is. Maybe you're doing loans and you want to know why this group isn't defaulting, why the last group is defaulting, and why the middle group is defaulting fifty percent of the time, and from there you find ways to work the data and pull out the answers you want. So now that you've seen how to use K-means for clustering, let's move on to the next topic: logistic regression. The logistic regression algorithm is the simplest classification algorithm, used for binary or multi-class classification problems, and our little girl from Canada who's into horror books is back, which is a little scary when you think about it with those big eyes. In the previous tutorial we learned about linear regression and dependent and independent variables, so to brush up: y = mx + c, a very basic algebraic function of y and x. The dependent variable y is the target class we're going to predict; the independent variables x1 through xn are the features or attributes we use to predict it. We know what a linear regression looks like, but with that graph we cannot divide the outcome into categories, it's really hard to categorize values like 1.5, 3.6, or 9.8. For example, a linear regression graph can tell us that marks increase with the number of hours studied, but it will not tell us whether the student will pass or not. In cases where we need a categorical output we use logistic regression, and for that we use the sigmoid function. So here we have marks from 0 to 100 against number of hours studied, and instead of just fitting the line y = mx + c, we apply the sigmoid function p = 1 / (1 + e^(-y)), which generates a sigmoid curve. Taking the natural logarithm, ln (I always thought it should be written NL rather than Ln), which undoes the exponential, gives us ln(p / (1 - p)) = mx + c, the form of the sigmoid curve we're looking for. If we zoom in on the function you'll see it heads toward one or toward zero depending on your x value, and if the probability is greater than 0.5 the value is rounded up to one, indicating the student will pass: if they do a certain amount of studying they'll probably pass. The threshold value sits at 0.5, usually right in the middle, and if the probability is less than 0.5 the value is rounded down to zero, indicating the student will fail: if they're not studying very hard they're probably going to fail. This of course ignores the outlier, the one student who's a natural genius and doesn't need to study to ace everything; that's not me, unfortunately, I have to study hard to learn new stuff.
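Here is a quick sketch of that sigmoid idea with the 0.5 threshold; the slope and intercept values are made up purely for illustration and are not from the lecture.

    import numpy as np

    def sigmoid(y):
        # p = 1 / (1 + e^(-y)) squashes any real value into the range (0, 1)
        return 1.0 / (1.0 + np.exp(-y))

    # Hypothetical linear output y = m*x + c for x = hours studied.
    m, c = 0.8, -4.0   # made-up slope and intercept, just to show the thresholding
    for hours in [2, 5, 8]:
        p = sigmoid(m * hours + c)
        print(hours, round(p, 3), 'pass' if p > 0.5 else 'fail')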
Our problem statement: classify whether a tumor is malignant or benign. This is actually one of my favorite data sets to play with because it has so many features, and you can't just look at them and know the answer, so it gives you a chance to dive into what data looks like when you don't understand its specific domain. But I also want to remind you about the domain of medicine: if I told you my model classifies a tumor as malignant or benign with 90 or 95 percent accuracy, I'm guessing you're still going to get it biopsied anyway. So why do it at all, if you know you're going to get a biopsy because it's that serious? Referencing the domain matters: the classification might help the doctor know where to look, or aid them with something they missed before. So let's dive into the code, and I'll come back to the domain in a minute. For this use case we do our normal imports, numpy, pandas, Seaborn, and the Matplotlib library, with matplotlib inline since I'm switching over to Anaconda. I've opened a new window in my Anaconda Jupyter notebook; by the way, you don't have to use Anaconda for the Jupyter notebook, I just love the interface and the tools it brings. So we have import numpy as np for our numeric arrays, pandas as pd, Seaborn as sns to help with our graphs (there are so many nice tools in both Seaborn and Matplotlib), matplotlib.pyplot as plt, and of course the inline directive, and we run that so it's all set up. We're just going to call our data "data", not very creative today, and set it equal to pd.read_csv since this happens to be a CSV file; I renamed the file for part two of this demo and I'm happy to supply it. Before going any further, let's open the data in a spreadsheet and see what it looks like. It's a CSV, comma-separated values: we have an ID, which I guess identifies which test was done; the diagnosis, M for malignant and B for benign, the two options and the thing we're going to try to predict; and then the radius mean (average), the texture mean, perimeter mean, area mean, smoothness, and so on. Unless you're a doctor in the field, most of this is hard to interpret; you can guess what concave means from the term, but not what it means as a measurement. They record all kinds of things, the smoothness, the symmetry, all float values; paging through quickly there are quite a few columns, I believe 36 if I remember correctly, all measurements taken when they examine the tumorous growth. Back in our notebook, I put the file in the same folder as the code, so obviously if yours is in a different location you need the full path, and then we'll pull up the first five lines of data with data.head().
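As a sketch of this setup step (the file name data.csv is an assumption, whatever name you saved the CSV under; Seaborn and Matplotlib are imported up front for the plots that follow):

    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Load the tumor measurements; the CSV sits next to the notebook here.
    data = pd.read_csv('data.csv')
    print(data.head())   # first five rows: id, diagnosis (M/B), then the measurements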
When we run that we see pretty much what we just looked at: an ID, a diagnosis, and, going all the way across, all the different columns displayed nicely. While we're exploring the data, Seaborn, which we imported as sns, makes it very easy to do a joint plot. It will look familiar because Seaborn sits on top of the Matplotlib library, and the joint plot does a lot of work for us. We'll just look at the two columns we're interested in, the radius mean and the texture mean, and pass data=data so it knows which data frame to plot from. Run that and it generates a really nice graph with all kinds of things to look at: the texture mean and radius mean are obviously the axes, and one of the cool extras is the histograms along the edges, which show where the most common radius mean and the most common texture mean fall. It gets a little confusing because we're talking about each growth's own average texture and average radius, and then the histogram on top of that shows how common each measurement is across the data set. And that's only two columns, so let's dig a little deeper into Seaborn. It also has a heat map, and if you're not familiar with heat maps, a heat map just means the values are shown in color; I guess the original ones plotted heat density, and the name stuck. We're going to take our data, get the corresponding correlation numbers, and put those into the heat map.
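A minimal sketch of that joint plot, continuing with the data frame loaded above; the column names radius_mean and texture_mean are the usual labels for this data set and may differ slightly in your copy.

    import seaborn as sns
    import matplotlib.pyplot as plt

    # Scatter of the two columns plus a histogram along each axis.
    sns.jointplot(x='radius_mean', y='texture_mean', data=data)
    plt.show()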
That's simply data.corr(), a pandas expression; remember we're working in a pandas data frame, and this is one of the cool tools it gives us. We pull that correlation information into the heat map and see what it looks like. Now we're looking at all the different features: the ID, the texture, the area, the compactness, the concave points. If you look down the diagonal of the chart, from the upper left to the bottom right, it's all white; that's because when you compare texture to texture they're identical, a perfect correlation of one. And when you compare, say, the area to the texture, it's almost black, meaning almost no correspondence; they don't form anything like a linear relationship, it's very scattered data. This is really just a nice graph to get a quick look at your data. It doesn't so much change what you do as help you verify it: if you get an answer later, or start looking at individual pieces, and something contradicts the heat map, two features correlating that shouldn't, you have to start asking why, what's going on, what else is coming in. But it does show some really useful information. Looking across the top row from the ID, there's no single feature that lights up and says, if the area is a certain size then it's benign or malignant; instead there are several that sort of add up. That's a big hint in the data we're trying to label as malignant or benign, a big hint to us as data scientists: we can't solve this with any one feature, it's going to take many of the features together to come up with the solution.
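A sketch of that heat map, again reusing the data frame from above; selecting only the numeric columns first is just a precaution so the text diagnosis column doesn't interfere with the correlation calculation.

    import seaborn as sns
    import matplotlib.pyplot as plt

    # data.corr() builds the pairwise correlation matrix; the heat map colors it in.
    corr = data.select_dtypes('number').corr()
    sns.heatmap(corr)
    plt.show()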
While we're exploring the data, let's check one more thing: data.isnull(), to look for null values. Earlier in this tutorial we did this a little differently, adding things up ourselves, but with pandas you can do it really quickly with data.isnull().sum(), which goes across all the columns; when I run it, every column comes up with no null data. So just to rehash these last few steps: we've done a lot of exploration, we looked at the first two columns with the Seaborn joint plot, which shows both the histograms and the data plotted on the x-y coordinates (and obviously you can do that in more detail with other columns), then we did the Seaborn heat map of the correlations, sns.heatmap, which showed bright spots where features correlate and areas where they don't, and finally we checked whether there are any null values, any missing data. That's an important step, because things will crash later on if you forget it; you'll get the nice error about null values, which is no fun when you're ten steps into a big process and have to go back to where you pulled the data in. Now we need to pull out our X and our y. There are a lot of options here; we could certainly set X to all the columns except the first two, since those are the ID and the diagnosis, but instead we're going to focus on the "worst" measurements: the worst radius, worst texture, perimeter, area, smoothness, compactness, and so on. One reason to start dividing your data up is that sometimes two measurements carry essentially the same information, and feeding that into the model twice can overweigh it against the other measurements. That's a little past the scope of this tutorial; what I want you to take away is that we're dividing the data into pieces, and the team in the back decided to just look at the worst values. So I create an array of column names, radius worst, texture worst, perimeter worst, the worst of the worst, and set X equal to data indexed by that list; it was data, not X, that I needed there. X is still a pandas data frame, just those columns. And our y, if you remember, is the diagnosis: that's all we care about predicting, is it benign or malignant, and since it's a single column we can just use data with the diagnosis column in brackets. We can also quickly do X.head() and y.head(); if you run them together without print only the last one displays, and y.head() is just M's, because the first rows all happen to be malignant.
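A sketch of the null check and the feature selection just described; the exact "worst" column names below are the usual labels for this data set, so treat them as assumptions and adjust to your copy.

    # Check for missing values across every column.
    print(data.isnull().sum())

    # Keep only the "worst" measurements as features.
    worst_cols = ['radius_worst', 'texture_worst', 'perimeter_worst',
                  'area_worst', 'smoothness_worst', 'compactness_worst']
    X = data[worst_cols]
    y = data['diagnosis']   # 'M' for malignant, 'B' for benign
    print(X.head())
    print(y.head())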
And X.head() is just the first five values of radius worst, texture worst, perimeter worst, area worst, and so on; I'll take that out. Moving to the next step, we've built our two data sets, the answer and the features we want to look at. In data science it's very important to test your model, and we do that by splitting the data: from sklearn.model_selection we import train_test_split. There are many ways to do this; in some of the more modern approaches the data is split into three groups and each is modeled and tested against the others, and there are reasons for that, but they're past the scope of this tutorial and unnecessary here. We're just going to split it into two groups, one to train our model and one to test it. You could write your own quick code to randomly divide the data into two groups, but sklearn does it nicely, essentially in one statement, generating four variables: X_train and X_test, the training data we use to fit the model and the data we hold back to test it, and y_train and y_test, the corresponding answers. We call train_test_split with our X and our y, and the folks in the back wanted test_size=0.3 along with a random_state; it's nice to switch the random state around, but it's not that important. What the test size means is that thirty percent of the data goes into the test variables, y_test and X_test, and seventy percent goes into X_train and y_train, so we use seventy percent of the data to train the model and thirty percent to test it. Run that to load everything up, and now that the data is split and ready, we get to the actual logistic part and create our model. From sklearn.linear_model we import LogisticRegression, the model we're using, and we'll call our variable log_model and set it equal to LogisticRegression(), so we have a variable pointing to that class for us to use. As with most models in sklearn, we just call fit on it with the X_train and y_train we separated out, and run it. Once that's run we have a model fitted to that seventy percent of training data; it also prints out all the different parameters you could set. There are a lot of choices you can make there, but for what we're doing we'll just leave the defaults; nothing in there is critical until you start fine-tuning, and for this example the basics work fine.
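A minimal sketch of the split and fit; the random_state value here is arbitrary (the lecture just notes that it's nice to vary it), and X and y are the frames built above.

    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression

    # 70% of the rows train the model, 30% are held back for testing.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    log_model = LogisticRegression()
    log_model.fit(X_train, y_train)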
Next we need to test whether our model works, so we create a variable y_predict equal to log_model.predict, the very standard format for the sklearn library: take your model and call predict on it, passing in X_test, since we have our training set and our test set and we want to compare y_predict against y_test. Run that, and if we print y_predict we get a nice array of B's and M's, benign and malignant, for all the test data we put in, so the model works, it's functional, and it was very easy to create. You'll always discover in data science that you spend a significant amount of time prepping your data and making sure the data coming in is good; there's a saying, good data in, good answers out, bad data in, bad answers out, but that's only half of it: selecting your model is the next part, and then fine-tuning it, depending on which model you're using. So now we want to know how good this came out. We have our y_predict, which is log_model.predict(X_test), and to score the model we import classification_report from sklearn.metrics, which reports how well the model is doing. We print the classification report, feeding it y_test, the actual data we know to be true, and y_predict, what the model predicted for that test data, and run it. You'll see we have a precision for benign and malignant, B and M, of 93 and 91, averaging about 92, and there's all kinds of other information there: the F1 score, the recall, the support. Flipping back to the slides they put together, this is the same kind of printout (some of the numbers may differ because the train-test split is random), and the takeaway is that this model predicts the type of tumor with roughly 91 to 92 percent precision. And remember what I said about domain: in a medical domain with a very catastrophic outcome, at 91 or 92 percent precision you're still going to have somebody do a biopsy. That's very different from investing money, where if there's a 92 percent chance you earn 10 percent and an 8 percent chance you lose 8 percent, you'd probably take the bet, and if you do that enough times you'll definitely make money in the long run. Also within this domain, I've seen models like this used to help identify different forms of cancer, because it helps the doctor know what to investigate.
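A sketch of the prediction and evaluation step just described, continuing from the fitted log_model above:

    from sklearn.metrics import classification_report

    y_predict = log_model.predict(X_test)
    # Precision, recall, F1, and support for each class (B and M).
    print(classification_report(y_test, y_predict))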
So that wraps up this section, and finally let's discuss the answers to the quiz asked in machine learning tutorial part one: can you tell what's happening in the following cases? A, grouping documents into different categories based on the topic and content of each document: this is an example of clustering, where K-means clustering can be used to group the documents by topic using a bag-of-words approach. If you said clustering, and could name one or two algorithms like K-means used for clustering, give yourself two thumbs up. B, identifying handwritten digits in images correctly: this is an example of classification; the traditional approach is to extract digit-dependent features like the curvature of different digits and then use a classifier like an SVM to distinguish between images. If you recognized it as classification, one thumb up, and if you suggested an SVM or another suitable model, two thumbs up. C, behavior of a website indicating that the site is not working as designed: this is an example of anomaly detection, where the algorithm learns what is normal and what is not, usually by observing the website's logs; give yourself a thumbs up if you got that one. As a bonus, can you think of another example of anomaly detection? One I use for my own business is detecting anomalies in stock markets; markets are fickle and behave erratically, and finding those erratic regions and then tracking down why they're erratic, was something released on social media, for instance, shows how knowing where the anomaly is helps you figure out what caused it. D, predicting the salary of an individual based on his or her years of experience: this is an example of regression; the problem can be defined mathematically as a function between the independent variable, years of experience, and the dependent variable, the individual's salary. If you guessed regression, a thumbs up, and if you remembered the independent and dependent variable terminology, two thumbs up. Summary: to wrap it up, we went over what K-means is, assigning random centroids to the clusters, computing the distances, finding the minimum centroid, and looping through that process until the centroids settle; we looked at the elbow method for choosing K by running the clustering across a range of values and finding the best point; we did a nice example of clustering cars with K-means, and even though we only used the first two columns to keep it simple and easy to graph, you can easily extrapolate to all the columns and see how they fit together; and we looked at what logistic regression is, discussed the sigmoid function, and then worked through an example of classifying tumors with logistic regression. I hope you enjoyed part two of machine learning. So how will this machine learning roadmap help you? It offers a clear and structured path to mastering machine learning; by following it you will not only gain valuable knowledge but also develop a mindset geared towards innovation and adaptability.
So what is machine learning, and what is AI? Imagine a computer that learns from data like a student: that's machine learning. Machine learning is a subset of artificial intelligence that enables computers to learn from data and make predictions or decisions without being explicitly programmed, using algorithms and statistical models to improve performance on a specific task through experience. Artificial intelligence, on the other hand, covers a broader scope and involves developing computer systems that perform tasks typically requiring human intelligence; machine learning is a crucial component of AI, providing the capability to learn and adapt from data. Now let's look at the step-by-step machine learning roadmap. It guides you through mastering ML, a vital branch of AI, typically over several months to a year depending on your background: start with prerequisites like programming in Python or R, statistics, and linear algebra, then progress through data processing, learning the algorithms, and model evaluation and optimization. This structured approach, combined with hands-on projects, will solidify your ML expertise while preparing you for advanced topics and machine learning applications in this dynamic field. Step one is mastering mathematics. To excel in machine learning you need a strong foundation, so focus on these areas. First, linear algebra and calculus: linear algebra is the backbone of many machine learning algorithms and helps you understand how each one works, while calculus is crucial for the optimization used in machine learning; key concepts include vectors, matrices, linear equations, eigenvalues, differentiation, integration, and gradient descent. Second, probability and statistics, which are fundamental for analyzing data and making predictions; important topics include probability distributions, descriptive statistics, hypothesis testing, regression analysis, and Bayesian statistics. Step two is developing programming skills; proficiency is essential, and the focus is on Python and R, the top languages for machine learning. Python is popular due to its simplicity and extensive libraries like NumPy, pandas, and scikit-learn, making it great for both beginners and experts; R is excellent for statistical analysis and data visualization, and platforms like Simplilearn offer specific courses in R. Learn the important Python libraries: NumPy for numerical operations, pandas for data manipulation, Matplotlib and Seaborn for data visualization, and scikit-learn for machine learning. Step three is exploring the core machine learning algorithms. Once you have a good handle on the math and programming, it's time to learn the core algorithms, because understanding them helps you solve real-world problems. The key ones to explore: unsupervised learning algorithms, meaning clustering such as K-means and dimensionality reduction such as PCA, for finding patterns in unlabeled data; supervised learning algorithms, meaning regression for continuous outcomes and classification for discrete labels, covering methods like linear regression, logistic regression, k-nearest neighbors, and support vector machines; and model evaluation and validation.
For evaluation, understand metrics like accuracy, precision, recall, and the F1 score, and learn cross-validation and performance metrics to assess model performance; you can also study other important approaches like reinforcement learning and gradient descent for optimization. Step four is learning advanced topics in machine learning. As you progress it's important to delve into these areas to deepen your understanding and tackle complex problems: deep learning and neural networks, ensemble learning techniques, generative models and adversarial learning, recommendation systems and collaborative filtering, and time series analysis and forecasting. Step five is learning deployment: how to deploy models using Flask or Django, cloud services like AWS, Azure, and GCP, and then Docker and Kubernetes; deployment skills are crucial for making your models accessible and usable in real-world applications. Next come machine learning projects: work on real-world projects to apply your knowledge, focusing on data collection and preparation and on capstone projects in image recognition, NLP, predictive modeling, and anomaly detection, because practical experience is the key to solidifying your skills. Step seven is continuous learning and exploration: stay updated with the latest developments by following industry leaders, engaging in online communities, and working on personal projects, and pursue advanced learning through courses and certifications to keep your skills sharp. Now let's look at machine learning career opportunities and salaries. The job market for machine learning professionals is booming, and the average annual salary for machine learning engineers varies by location, experience, and company. Some typical roles: a machine learning engineer earns around $153,000 in the US and around 11 lakhs per annum in India; a data scientist around $157,000 in the US and around 12 lakhs per annum in India; an NLP engineer around $107,000 in the US and around 7 lakhs per annum in India; a computer vision engineer around $226,000 in the US and around 6.5 lakhs per annum in India; and an AI/ML researcher around $130,000 in the US and 9 lakhs per annum in India. Note that these figures vary from website to website and change frequently. The machine learning roadmap provides a structured guide to help you navigate this dynamic field; by following it step by step and continuously honing your skills you can embark on a successful career in machine learning, so embrace the challenge, stay curious, and equip yourself with the knowledge and expertise to thrive in this ever-evolving domain. So what is the KNN algorithm? KNN stands for k-nearest neighbors, one of the simplest supervised machine learning algorithms, mostly used for classification: we want to know, is this a dog or not a dog, is it a cat or not a cat? It classifies a data point based on how its neighbors are classified; KNN stores all available cases and classifies new cases based on a similarity measure. And here we've gone from cats and dogs straight into wine, another favorite of
mine. Here you see a measurement of sulfur dioxide versus chloride level for the different wines they've tested, and where each falls on that graph based on how much sulfur dioxide and how much chloride it contains. K in KNN is a parameter that refers to the number of nearest neighbors to include in the majority voting process, so if we add a new glass of wine, red or white, we want to know what its neighbors are. In this case we set k equal to 5; we'll talk about choosing k in just a minute. A data point is classified by the majority of votes from its five nearest neighbors, and here the unknown point would be classified as red since four out of five neighbors are red. So how do we choose k, how do we know k equals 5? The KNN algorithm is based on feature similarity, and choosing the right value of k, a process called parameter tuning, is important for accuracy. At k equal to 3, the question mark in the middle is classified by its three nearest neighbors, in this case as a square, and if we put k equal to 7 it gets classified as a triangle, depending on what other data is around. As k changes, the answer for a given point can change drastically. You'll find this throughout machine learning: choosing these parameters is the part that makes you go, oh my gosh, did I set k right, did I pick the right values, so that you don't end up with a huge bias in one direction or the other. In terms of KNN, if the number k is too low, the result is too noisy: the point is right next to a couple of things, it picks those, and you might get a skewed answer. If k is too big, it takes forever to process and you run into processing and resource issues. The most common approach, though there are others, is to use the square root of n, where n is the total number of samples you have. And if that comes out even, say you're voting between squares and triangles, you want to make k odd so the vote can't split evenly between two equal factions. So take the square root of n, and if it's even add or subtract one; that's the most common rule, it's pretty solid, and it works very well.
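A quick sketch of that rule of thumb; the sample size 154 below is an assumption, roughly the size of the 20% test set used in the diabetes example later, which is where the k = 11 in that walkthrough comes from.

    import math

    def choose_k(n):
        # Rule of thumb: k is about sqrt(n); nudge it to an odd number so votes can't tie.
        k = int(math.sqrt(n))
        return k - 1 if k % 2 == 0 else k

    print(choose_k(154))   # sqrt(154) is about 12.4, so k comes out as 11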
When do we use KNN? We use it when the data is labeled, so you need labels: we know this group of pictures is dogs and that group is cats. We use it when the data is noise-free; if a class row reads something confusing like "underweight, 140, 23, Hello Kitty, normal", that's a high variety of messy input and it would cause problems. And we use it when the data set is small; we're usually working with smaller data sets, because KNN is a lazy learner, i.e. it doesn't learn a discriminative function from the training set. So if you have very complicated data and a large amount of it, you wouldn't use KNN, but it's a great place to start: even with large data you can pull out a small, reasonably clean sample and get an idea of what it looks like using KNN, and for smaller data sets KNN works really well. How does the KNN algorithm work? Consider a data set with two variables, height in centimeters and weight in kilograms, where each point is classified as normal or underweight. On the basis of the given data we have to classify a new entry, 57 kg and 170 cm, as normal or underweight using KNN. To find the nearest neighbors we calculate the Euclidean distance: according to the Euclidean distance formula, the distance between two points in the plane with coordinates (x, y) and (a, b) is d = sqrt((x - a)^2 + (y - b)^2), which you can remember as computing the third side of a right triangle from the two known sides. Let's calculate it to see clearly: we place our unknown point in red among the scattered data points, and the distance d1 = sqrt((170 - 167)^2 + (57 - 51)^2), which is about 6.7; distance d2 is about 13 and d3 about 13.4. Similarly we calculate the Euclidean distance of the unknown data point from every point in the data set, and because we're dealing with a small amount of data that isn't hard, it's quick for a computer and the math isn't complicated: you can just see how close the data is based on the Euclidean distance. So we've calculated the distance of the unknown point, with x1 and y1 equal to 57 and 170, from all the points, and now we ask which are the closest neighbors. At k equals 3, the three closest neighbors all vote normal, which is pretty self-evident when you look at the graph: normal, normal, normal, three votes for normal, so this will be classified as a normal weight. Since the majority of neighbors point towards normal, per the KNN algorithm the class of (57, 170) should be normal. A recap of KNN: a positive integer k is specified along with a new sample; we select the k entries in our database which are closest to the new sample; we find the most common classification of these entries; and that is the classification we give to the new sample. As you can see, it's pretty straightforward: we're just looking for the closest things that match what we've got.
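Here is that distance calculation worked out in a few lines; the points are the ones from the height-and-weight example above.

    import math

    def euclidean(p, q):
        # distance = sqrt((x - a)^2 + (y - b)^2) for two points p = (x, y), q = (a, b)
        return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

    unknown = (170, 57)   # height in cm, weight in kg
    print(round(euclidean(unknown, (167, 51)), 1))   # about 6.7, the d1 from the example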
Now let's see what that looks like in a use case in Python, the predict diabetes use case. The objective is to predict whether a person will be diagnosed with diabetes or not, and we have a data set of 768 people who were or were not diagnosed with diabetes. Let's open that file and take a look at the data: it's a simple spreadsheet, the data itself is comma-separated, a very common format and a very common way to get data, and you can see columns A through I, that is, eight columns, each with a particular attribute, and then the ninth column, the outcome, which is whether they have diabetes. As a data scientist, one of the first things you might notice is the insulin column; if someone is taking insulin, they presumably already have diabetes, and that could cause issues in some machine learning setups, but for a basic KNN demo it works fine. The next thing you notice is that it didn't take much to open: scrolling to the bottom there are 768 rows, a small data set that easily fits into the RAM of a regular desktop computer, where I can look at it and manipulate it without taxing the machine; you don't need an enterprise setup to run this. Before importing all the tools we need, a word about the IDE I'm using: you can use any editor for Python, but for basic visual work I like Anaconda, which is great for demos with the Jupyter notebook. A quick look at the Anaconda Navigator, the new release, which is really nice: under Home I can choose my application, and we'll be using Python 3.6 (I have a couple of different versions on this machine); under Environments I can create a unique environment for each project, and there's even a button that opens a terminal where I can pip install whatever packages I'm working with. Back under Home we launch the notebook, and like the old cooking shows I've already prepared things, since it takes a few minutes to open a browser window; in this case it opens Chrome because that's my default. Since we're working on KNN to predict whether a person will have diabetes or not, let's put that title in: insert a cell below, then change the top cell's type to Markdown, meaning it won't run as Python, and running it gives us nice big letters reminding us what we're working on. By now you should be familiar with the imports: import pandas as pd and numpy as np, the pandas data frame and the numpy number array, two very powerful tools. Then from sklearn we bring in train_test_split; by now you should be familiar with splitting the data, part of it for training the model and the rest held back to test how good it is. We also bring in the preprocessing StandardScaler so we don't have a bias from really large numbers: remember in this data the number of pregnancies never gets very large, while the amount of insulin can reach the mid 200s, and 256 versus 6 will skew results, so we want to go
ahead and change that so the features are all on a uniform scale, roughly between minus one and one. Then there's the actual tool, the KNeighborsClassifier we're going to use, and finally the last three imports are for testing the model, for asking how good it is: the confusion matrix, the F1 score, and the accuracy score. So we have our two general Python modules and six sklearn-specific imports, and we need to run the cell so they're actually imported; there we go, and we can move on to the next step. Now we load the database using pandas, remembering pandas is pd, and take a look at the data in Python; we looked at it in a spreadsheet, but I like to pull it up here as well so we can see what we're doing. So dataset = pd.read_csv, a pandas command, and I put the diabetes file in the same folder as my notebook; if yours is in a different folder you'd need the full path. We can also do a quick length of the data set, a simple Python command, len(dataset), and let's print that; in the Jupyter notebook a bare expression on its own line prints automatically, but in most other setups you want the print statement. Then we look at the actual data, and since we're in pandas we can simply do dataset.head(); again I'll add the print, because if you put several head() calls in a row only the last one displays, so I usually keep the print statement, though with a single data frame it doesn't really matter. When we hit run we see the 768 rows, which we knew, and the columns, automatically labeled on the left: remember head only shows the first five lines, rows 0 through 4, and a quick look shows it matches what we saw before, pregnancies, glucose, blood pressure, all the way to age and then the outcome at the end. In the next couple of steps we're going to create a list of columns that can't legitimately be zero: there's no such thing as zero skin thickness, zero blood pressure, or zero glucose, you'd be dead, so a zero there isn't a real measurement, it means they didn't have the data, and we're going to replace that information. First we create the list, and you can see it contains the values we talked about, glucose, blood pressure, skin thickness, and so on; listing the columns you need to apply some transformation to is a very common way of working with columns.
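A sketch of the imports and the loading step described so far; the file name diabetes.csv is an assumption, whatever you named the Pima diabetes CSV sitting next to the notebook.

    import pandas as pd
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import confusion_matrix, f1_score, accuracy_score

    dataset = pd.read_csv('diabetes.csv')
    print(len(dataset))      # 768 rows
    print(dataset.head())    # pregnancies, glucose, blood pressure, ..., age, outcome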
Then the replacement itself: dataset[column] = dataset[column].replace(...), which is still pandas; there are a few ways to do this, but np.nan, which stands for "not a number", means the value simply doesn't exist. So the first thing we do is replace the zeros with np.nan, no data there; hopefully the person isn't dead, they just didn't get the measurement. Next we compute the mean as an integer from that column with mean(skipna=True), a pandas option that skips the NaN values, and then we take the column and replace all the NaN values with that mean. Why do it this way? You could skip the intermediate step and work around the zeros directly, but doing it like this makes the logic explicit: we switch the value to "nonexistent", then we compute the mean, which represents the average person, and if we don't know someone's value because the data is missing, one of the standard tricks is to fill it with the average, the most common value for that column. That way the rest of the row still contributes to the computation and the missing value is effectively taken out of the equation. Run it; it doesn't print anything, we're still preparing the data. If you want to see the result, there's nothing missing in the first few rows, but we could print a column, say the glucose column, and it lists all the glucose values going down; Jupyter skips a chunk in the middle when there are too many lines, and nothing in what it shows looks like missing data. Let me remove that again. Of course, before proceeding any further we need to split the data set into training and testing data, so we have something to train on and something to test with, and notice the pandas notation we use here: dataset.iloc says, within the data set, take all rows (that's what the colon means) but only columns 0 through 8 exclusive. Remember the ninth column of the spreadsheet is the outcome, and that's not part of the training data, that's the answer; it's listed as index 8 because numbering starts at zero, and the slice 0:8 covers the eight feature columns without including it. Then for y, our answer, we want just that last column, index 8.
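Continuing from the loading sketch above, here is the zero-replacement and the X/y split in code; the column names in the list are the usual labels for the Pima diabetes data and may differ in your copy.

    # Columns where a zero can only mean the measurement is missing.
    zero_not_accepted = ['Glucose', 'BloodPressure', 'SkinThickness', 'BMI', 'Insulin']

    for column in zero_not_accepted:
        dataset[column] = dataset[column].replace(0, np.nan)     # zero -> "no data"
        mean = int(dataset[column].mean(skipna=True))            # average of the real values
        dataset[column] = dataset[column].replace(np.nan, mean)  # fill the gaps with the mean

    # All rows; columns 0-7 are the features, column 8 is the outcome we want to predict.
    X = dataset.iloc[:, 0:8]
    y = dataset.iloc[:, 8]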
With that notation in place, we use the train_test_split we imported, which is part of sklearn, and simply pass in our X and our y. We set random_state equal to zero; you don't have to seed it, that's just a seed number (I'd have to look up the default), and test_size is 0.2, which simply means we take twenty percent of the data and set it aside so we can test later. Again we run it, and again it's not very exciting, there's no printout, but a lot of this work is prepping the data; once it's prepped, the actual modeling code is quick and easy. We're almost there, we just need to scale the data. If you remember, we're fitting the data with a StandardScaler, which means instead of one column ranging from, say, 5 to 33 and the next from 1 to 6, everything is put on a standardized scale. We only fit the scaler on the training set, but we make sure the test set, the X_test going in, is transformed the same way so it's processed consistently. So we create sc_X for the scaler, assign the StandardScaler to it, set X_train equal to sc_X.fit_transform(X_train), so the scaler is fitted on the training data, and set X_test equal to sc_X.transform(X_test); the test data isn't part of fitting the transformer, it just gets transformed. Run that, and looking back we've now done all three prep steps: we replaced the zeros in the key columns that shouldn't be zero with the means of those columns so they fit our data model, we split the data into training and test sets, and we scaled the data going in. Note that we never scale the y side, the y_train and y_test; it's only the input features that get transformed. Next we define the model using KNeighborsClassifier and fit the training data to it, and after all that data prep it's only a couple of lines of code to build and train the model, which is one of the cool things about Python and how far the automated tools have come. Before we do that, a quick check: len(y) gives 768, and if we import math and take math.sqrt of the length of y_test, we get about 12.4.
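A sketch of the split, the scaling, and that square-root check, using the X and y built above:

    import math

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.2)

    sc_X = StandardScaler()
    X_train = sc_X.fit_transform(X_train)   # fit the scaler on the training data only
    X_test = sc_X.transform(X_test)         # apply the same scaling to the test data

    print(math.sqrt(len(y_test)))           # about 12.4 -> use the odd number 11 for k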
I wanted to show you where that number comes from, because we're about to use it: 12 is an even number, and since the neighbors all vote, you don't want an even number of voters that could tie, so we take one away and make it 11. Let me delete that scratch cell; that's one of the reasons I love the Jupyter notebook, you can flip around and try things on the fly. So now we create our classifier: KNeighborsClassifier with n_neighbors equal to 11 (the 12 minus 1 we just worked out, an odd number of neighbors), p equal to 2, which is the power parameter that goes with the Euclidean metric we're using; there are other ways of measuring distance, but Euclidean is the most common one and it works quite well. It's important to evaluate the model, so we'll use the confusion matrix, a wonderful tool, then the F1 score, and finally the accuracy score, which is probably the most commonly quoted number when you walk into a meeting. We set cm equal to confusion_matrix(y_test, y_pred), the two values we're comparing, run it, and print it out. The way to interpret it is that the predicted classes run across the top and the actual classes run down the side, and the diagonal down the middle is the important part: it means the prediction and the actual agreed on 94 cases (the people without diabetes, the zero class) and on 32 cases (the people with diabetes). The 13 and the 15 are what it got wrong: the prediction said 13 of the people without diabetes were at high risk, and 15 of the people who did have diabetes were classified incorrectly. If you had three classes instead of two you'd get a third row and column, with the correct counts still running down the diagonal. Then we print the F1 score and get about 0.69; the F1 takes into account both sides of the balance, the false positives and false negatives, whereas the accuracy score just looks at how many we got right out of the total. When you're a data scientist talking to other data scientists, they'll ask you for the F1 score; when you're talking to the general public or the decision makers in a business, they'll ask for the accuracy. The accuracy usually looks better than the F1 score, but the F1 is more telling; here it lets us know there are more false positives than we'd like. Still, 82 percent accuracy is not bad for a quick pass over people's health statistics using scikit-learn and the k-nearest neighbors algorithm.
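A sketch of the classifier and the three evaluation calls just described; the printed numbers will vary with the random split, so the comments only echo the approximate values quoted above.

    classifier = KNeighborsClassifier(n_neighbors=11, p=2, metric='euclidean')
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)

    print(confusion_matrix(y_test, y_pred))   # rows: actual, columns: predicted
    print(f1_score(y_test, y_pred))           # balances false positives and false negatives, ~0.69 here
    print(accuracy_score(y_test, y_pred))     # the headline number, roughly 0.82 here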
Why reinforcement learning? Training a machine learning model requires a lot of data, which might not always be available to us; further, the data provided might not be reliable, and learning from a small subset of actions will not expand the vast realm of solutions that may work for a particular problem. Look at the robot learning to walk: walking is a very complicated setup, and you start asking questions like, if I'm taking one step forward and to the left, what happens if I pick up a 50-pound object, how does that change how the robot walks? These things are very difficult to program because there's no information about them until they've actually been tried. Relying only on examples from humans slows the growth the technology is capable of; machines need to learn to perform actions by themselves, not just learn from humans. Take the objective "climb a mountain": the interesting point is that as human beings we can go into a completely unknown environment, adjust to it, explore, and play with it, and most of the non-reinforcement models in machine learning aren't able to do that very well; a couple can be integrated to try things and see how they go, and that's what we're talking about with reinforcement learning. So what is reinforcement learning? Reinforcement learning is a sub-branch of machine learning that trains a model to return an optimum solution for a problem by taking a sequence of decisions by itself. Consider a robot learning to go from one place to another: the robot is given a scenario and must arrive at a solution by itself; it can take different paths to reach the destination, it will learn the best path from the time taken on each path, and it might even come up with a unique solution all by itself. That's really important: we want the best solution, but you can't find it unless you try. Now compare supervised versus unsupervised versus reinforcement learning. Supervised learning is probably the most controlled environment; there are many supervised models, whether it's linear regression, neural networks, decision trees, and everything in between, and the data provided is labeled data with the output values specified. That matters because in supervised learning you already know the answer: you already know the picture has a motorcycle in it, or if you're looking at a stock going back a week, you already have the graph of what the next day looked like, so you have an answer for it. Labeled data is used, there is external supervision, and it solves problems by mapping labeled input to known output; very controlled. Unsupervised learning is really interesting because it now takes part in many other models; you can actually insert an unsupervised learning model into either a supervised or a reinforcement learning system as part of the pipeline, which is really cool. The data provided is unlabeled, the outputs are not specified, and the machine makes its own predictions; it's used to solve association and clustering problems, unlabeled data is used, there is no supervision, and it solves problems by understanding patterns and discovering outputs.
So you can look at unsupervised output and think, some of these things go with each other, they belong together; the algorithm is looking for what connects in different ways, and there are a lot of different algorithms that do this. Some of the images that come out of unsupervised learning are really striking: given a donut-shaped cluster, one model will pick out the whole ring, while another will divide it into sections based on which points sit next to each other. Then we come to reinforcement learning, probably the fastest-growing area of today's machine learning market, and still at a very early stage as far as how it works and what it will be capable of. The machine learns from its environment using rewards and errors; it is used to solve reward-based problems; no predefined data is used, there is no supervision, and it follows a trial-and-error problem-solving approach. At first the behavior is random: the model tries something, gets its reward back (or doesn't, or doesn't even get where it was trying to go), looks at that feedback, tries something else, and keeps playing with different actions until it finds the best route. So let's look at the important terms in today's reinforcement models; these have become pretty standardized over the last few years, so they are really good to know. The agent is the model that is being trained via reinforcement learning, the actual entity you are training, whether it is a neural network, a Q-table, or some combination of the two. The environment is the training situation that the model must optimize against; here we have a robot trying to reach a chest full of gems. An action is any of the possible steps the model can take, and it picks one; here the robot has picked between three different routes to the chest. The state is the current position or condition returned by the environment. If you are playing a video game, the state is the screen you are looking at: the environment is the whole game board, while the state is where you are on that board and what is around you; for a robot moving around a yard, the environment is the yard and the state is where the robot is and what input it has in that location. The reward is what helps the model move in the right direction: points are given to appraise an action, good or not so good, and the model tries to maximize that reward. Finally, the policy determines how an agent will behave at any time; it acts as a mapping between the present state and an action. One of the reasons policy is treated as its own entity is that you usually have predictions for several candidate options, and the policy is how you weigh those predictions and pick the option you think will work best; it is a little tricky, but it is actually a pretty elegant piece of the design.
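Just to pin these terms down, here is a minimal sketch of that agent-environment loop in Python; the little GridEnvironment and the random policy are invented for illustration and are not the code we build later in the demo.

```python
import random

class GridEnvironment:
    """A toy environment: the agent walks along positions 0..4 and
    earns a reward of +10 for reaching the goal at position 4."""
    def __init__(self):
        self.state = 0                      # current position (the "state")

    def step(self, action):
        # action is -1 (move left) or +1 (move right)
        self.state = max(0, min(4, self.state + action))
        reward = 10 if self.state == 4 else 0
        done = self.state == 4              # episode ends at the goal
        return self.state, reward, done

def random_policy(state):
    """Policy: a mapping from the current state to an action.
    Here it just explores by picking a direction at random."""
    return random.choice([-1, 1])

env = GridEnvironment()                     # the environment
state, done = env.state, False
while not done:                             # the agent keeps acting...
    action = random_policy(state)           # ...the policy picks an action...
    state, reward, done = env.step(action)  # ...the environment returns a new state and reward
    print(f"action={action:+d}  state={state}  reward={reward}")
```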
Let's go ahead and take a look at a reinforcement learning example: training a dog. The dog is the agent, your environment is the whole house or wherever you are training it, and the action we want to teach is fetching. We can get the dog to perform various actions by offering incentives such as a dog biscuit as a reward; the dog will follow a policy that maximizes this reward, so it will follow every command and might even learn new actions, like begging, by itself. You start off rewarding fetching, the dog thinks it gets a biscuit for that, then it tries something else, a handshake or begging, and discovers that is also rewarded, so it keeps exploring to find out what will bring it a biscuit. That is very much how a reinforcement model behaves: it looks for different rewards and tries different things to find the ones that pay off. The dog will also want to run around, play, and explore its environment; this quality of the model is called exploration, so there is a little randomness going on. It explores new parts of the house, climbs on the sofa, doesn't get a reward, and in fact usually gets kicked off the sofa. So let's talk a little bit about the Markov decision process. A Markov decision process is the reinforcement learning framework used to map a current state to an action, where the agent continuously interacts with the environment to produce new solutions and receive rewards, and you will see it uses all the vocabulary we just went over: reward, state, agent, environment, action. Even though the environment conceptually contains everything, when you actually write the program the environment puts out a reward and a state that go into the agent. The agent usually looks at the reward first, notes whether it got rewarded for what it just did, then looks at the state; the policy then comes in and suggests an action, sometimes similar to the last one, sometimes completely random, that might bring a different reward. Taking the time to understand these pieces is important, because most of today's models are built around them; a lot of frameworks even ship templates based on this loop that you can pull in and start using, and it is pretty straightforward once you see how it fits together. The environment says, in effect, the agent did this, and if you are a character in a game, this is what happened, and it sends out a reward and a state; the agent looks at the reward and at the new state.
Then it takes a little guess and says, I am going to try this action. That action goes back into the environment, it affects the environment, the environment changes depending on what the action was, and a new state and a new reward go back to the agent. In the diagram shown, we need to find the shortest path between node A and node D. Each path has a reward associated with it, and the path with the maximum reward is the one we want to choose. The nodes A, B, C, and D are the places to travel between; moving from node A to node B is an action, the reward is the value of each path, and the policy is the path taken. You can see that from A you can go to B, to C, or straight to D, and if you explored all three you would find that going directly from A to D gives a zero reward, A to C to D generates a different reward, and you could also go A, C, B, D; there are a lot of options here. When you start looking at this diagram you realize that even though today's reinforcement learning models are very good at finding an answer, they end up trying almost all of the different directions you see, so they take up a lot of processing time. Reinforcement learning is still in its infancy and is really good at solving simple problems, and we will look at one of those in just a minute with a tic-tac-toe game. Once the model has explored these paths it finds that A, C, D is the best route, earning the full 30 points.
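The slide does not spell out every edge reward, so the numbers below are assumptions chosen so that A to C to D comes out on top with 30 points, as described; the sketch just enumerates the candidate paths and totals the reward collected along each one.

```python
# Hypothetical edge rewards for the A-B-C-D diagram; the individual numbers
# are assumed so that A -> C -> D totals the 30 points mentioned above.
edge_reward = {
    ("A", "B"): 5,  ("B", "D"): 10,
    ("A", "C"): 10, ("C", "D"): 20,
    ("A", "D"): 0,  ("C", "B"): 2,
}

paths = [["A", "D"], ["A", "B", "D"], ["A", "C", "D"], ["A", "C", "B", "D"]]

def total_reward(path):
    # Sum the reward collected on each hop of the path
    return sum(edge_reward[(a, b)] for a, b in zip(path, path[1:]))

for p in paths:
    print("->".join(p), total_reward(p))

best = max(paths, key=total_reward)
print("best path:", "->".join(best), "reward:", total_reward(best))
```

A reinforcement learner arrives at the same ranking the hard way, by sampling the paths over many episodes rather than enumerating them up front.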
So let's go ahead and take a look at a reinforcement learning demo. In this demo we are going to use reinforcement learning to build a tic-tac-toe game, and you will be playing this game against the machine learning model. We are doing it in Python; I have a lot of Python tools, but I will go through Anaconda and open a Jupyter notebook. It seems like a lot of steps, but it is worth it to keep my projects separate, and the Jupyter notebook gives a nice display for Python work. So here is the Anaconda Navigator, I open the notebook, which takes me to a web page, and I have created a new Python file and renamed it tic-tac-toe. For this example we import a couple of things: numpy as np, which gives us our number arrays, and pickle, which is a convenient way to store the different states we are going to collect. Then we create a class called State. There are a lot of lines of code in this class, but don't let that scare you; most of it is just setup. At the top we initialize it: we have our board, and since it is a tic-tac-toe board we are only dealing with nine spots, then player one and player two, an isEnd flag, a board hash (we will look at that in a minute; it is just how we store the board as a key), and the player symbol, which starts at one. Then something simple: getHash just turns the board, its rows and columns, into a hashable value. Next we want to know when a winner occurs, so if you get three in a row, that is what the whole next section is for. You can get a copy of this code by sending a note over to Simplilearn and they will send you this file so you can play with it yourself and see how it is put together; I don't want to spend a huge amount of time on it because it is fairly general Python. You can see we go through all the rows and add them up, and if a row sums to three you have three in a row; the same goes for the columns, and you also have to check the two diagonals, which is what the rest of that block does. Then we sum the whole board to detect a tie, in case every square is filled and nobody has won. Next we also need to know the available positions, the squares no one has used yet, so that neither you nor the computer is ever handed an illegal move. Then we update the state: you pass in the position you just chose, and there is a little user interface where you pick the row and column. Again, this is a lot of code, so it is the kind of thing you want to get a copy of, read through, and play with to understand how it works. And here is giveReward: result equals self.winner, and this is one of the hearts of what is going on. If there is a winner we have a result; if the result equals one, the winning player gets its feedback, and otherwise it gets a zero, so in this particular case a player is only rewarded if it wins. That is important to know, because different reinforcement learning systems hand out rewards very differently depending on what you are trying to do. This is a very simple example with a 3x3 board; imagine you are playing a video game, where you still have a limited set of actions but the environment is huge, and suddenly a reward system like this has to change. There are all kinds of advanced ways to do that, typically by adding weights: if you take a certain path, the first reward along it is weighed a little less than the last one, because the last reward is actually winning the game or scoring. So the reward system gets really complicated in the more advanced setups. In this case, though, you can see the code also hands out a 0.1 and a 0.5 reward for the remaining outcome, so the players still get a small signal for ending the game in a valid final position rather than a win.
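Since the full course file is too long to paste here, this is a condensed sketch of the State class just described; the method names mirror the ones on screen, but the bodies are simplified, so treat it as an outline of the idea rather than the exact file.

```python
import numpy as np

BOARD_ROWS, BOARD_COLS = 3, 3

class State:
    def __init__(self, p1, p2):
        self.board = np.zeros((BOARD_ROWS, BOARD_COLS))  # 0 = empty, 1 = p1, -1 = p2
        self.p1, self.p2 = p1, p2
        self.isEnd = False
        self.playerSymbol = 1

    def getHash(self):
        # Flatten the board into a string so it can be used as a dictionary key
        return str(self.board.reshape(BOARD_ROWS * BOARD_COLS))

    def winner(self):
        # Three in a row: any row, column, or diagonal summing to +3 or -3
        sums = list(self.board.sum(axis=0)) + list(self.board.sum(axis=1))
        sums += [self.board.trace(), np.fliplr(self.board).trace()]
        if 3 in sums:
            self.isEnd = True
            return 1
        if -3 in sums:
            self.isEnd = True
            return -1
        if len(self.availablePositions()) == 0:   # board full: a tie
            self.isEnd = True
            return 0
        return None                               # game still going

    def availablePositions(self):
        # Only empty squares are legal moves
        return [(r, c) for r in range(BOARD_ROWS)
                       for c in range(BOARD_COLS) if self.board[r, c] == 0]

    def updateState(self, position):
        self.board[position] = self.playerSymbol
        self.playerSymbol = -1 if self.playerSymbol == 1 else 1

    def giveReward(self):
        result = self.winner()
        # Reward only arrives at the end of the game; a tie is worth less than a win
        if result == 1:
            self.p1.feedReward(1);   self.p2.feedReward(0)
        elif result == -1:
            self.p1.feedReward(0);   self.p2.feedReward(1)
        else:
            self.p1.feedReward(0.1); self.p2.feedReward(0.5)
```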
So rewards, again, are key; how you feed the rewards back in is huge. Then we have a board reset, which is pretty straightforward: it just resets the board to the beginning, because during training the model is going to try out all these different things by trial and error, so you have to keep resetting it. And then of course there is play. We set rounds equal to 100 here, though you can obviously set it higher depending on what you want to do, and inside you will see player one and player two: this is the computer playing itself. One of the more powerful ways to learn a game, or even something that isn't a game, is to have two of these models trying to beat each other, so each one keeps exploring new things that beat the other; we have seen this in chess, where self-play with reinforcement learning was one of the ways some of the top computer chess engines were trained. So the loop chooses an action, tries something, and records the board hash each time so all the information is stored; once one of the players wins, it gets the reward, then we reset the board and try again. Then the fun part further down is playing against a human: we get to come in, put in our own moves, and it does the same thing it did above, gives out rewards, checks for a win or a tie, looks at the available positions, and so on. Finally we want to show the board, so it prints the board out each move; the printing itself is not that exciting. What is exciting here is, first, the reward system, which is really the heart of this, how do you reward the different outcomes, and second, the way an action gets chosen during play, because how we guess the next best action is the other heart of reinforcement learning. Those two questions, how do we reward it and how do we choose the next action, are really where reinforcement learning stands in today's technology. So we have our environment, and the state, which is what is going on at the moment, gets returned depending on what happens, and now we want to create our agent, which in this code is our player. So we look at the class Player; this is where a lot of the magic really happens, how this player figures out how to maneuver around the board, while the board returns a state it can look at and a reward. It takes a name and keeps its own record of states, and when we say class Player here we are not talking about a human player, we are talking about the computer players.
And this is where it gets interesting. Remember I said that, depending on what you are doing, there is going to be a decay gamma and an exploration rate; these are exactly what I mean when I ask how we train it. As you try different moves and get to the end of a game, the first move matters, but not as much as the last one, so you could say the last move carries the heaviest weight and the weight tapers off as you go back toward the first. Say the first move gives you a reward of five, the second a reward of two, and the third a reward of ten because that is the final, winning move; the ten is going to count for more than the first step. Then we get the board information coming in, and then chooseAction, which is the second part I said was so important. Once training is going on we have to include a little randomness, and you can see it right here with np.random.uniform: it picks a random number, and if that falls under the exploration rate it takes a random action, simply choosing one of the available positions at random. Otherwise it looks at the board for each available position, and because it is storing the different boards each time it goes through, it has a record of what it did and can properly weigh the values; it then simply appends the hash of the latest state to its list of states. Then here is feedReward: the reward comes in, the code checks whether the stored value is none, looks at what the reward is, and applies the formula I mentioned above, decay gamma times the reward. This is where, as it works back through each step, the weighting happens, and this really is the heart of what I was talking about earlier: you have step one, which might have a reward of two, then step two, step three, step four, and so on until you get to step n, which might have a reward of ten. We add that ten in, but the step right before it does not get the full ten; even if the math works out to ten there as well, it gets multiplied by 0.9, so instead of passing a full ten back to that earlier step we only pass back nine, which is 0.9 times ten.
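Condensed, the two methods just discussed look roughly like this; the decay gamma of 0.9 is the value mentioned above, while the exploration rate of 0.3 and learning rate of 0.2 are assumptions, so the bookkeeping in the actual course file may differ slightly.

```python
import numpy as np

class Player:
    def __init__(self, name, exp_rate=0.3):
        self.name = name
        self.states = []            # record of board hashes visited this game
        self.states_value = {}      # board hash -> learned value estimate
        self.lr = 0.2               # learning rate (assumed)
        self.exp_rate = exp_rate    # how often to explore instead of exploit
        self.decay_gamma = 0.9      # later moves count more than earlier ones

    def chooseAction(self, positions, current_board, symbol):
        # Exploration: with probability exp_rate, try something random
        if np.random.uniform(0, 1) <= self.exp_rate:
            return positions[np.random.choice(len(positions))]
        # Exploitation: otherwise pick the move whose resulting board
        # has the highest learned value so far
        value_max, action = -999, positions[0]
        for p in positions:
            next_board = current_board.copy()
            next_board[p] = symbol
            value = self.states_value.get(str(next_board.reshape(9)), 0)
            if value > value_max:
                value_max, action = value, p
        return action

    def addState(self, board_hash):
        self.states.append(board_hash)

    def feedReward(self, reward):
        # Walk the game backwards: the final position gets the full reward,
        # each earlier position gets a decayed share of what came after it
        for st in reversed(self.states):
            if st not in self.states_value:
                self.states_value[st] = 0
            self.states_value[st] += self.lr * (self.decay_gamma * reward
                                                - self.states_value[st])
            reward = self.states_value[st]
```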
So this formula, the decay gamma times the reward minus the current state value, adds the feedback in step by step, and you can see how the math gets fairly involved, but this really is the key to how we train the model. We want the final state, the win, to get the most points and the first step to get the least, so you are really training this almost in reverse: you start from the last position, sum up the rewards going backwards, and find the answer in reverse, which is an interesting play on the mind when you are trying to figure this stuff out. Then of course we reset the board down here, and we have savePolicy and loadPolicy, which save and reload what the agent has learned so it can be carried between games. Then finally we create a human player, and the human player is a little different: chooseAction asks you for a row and column, and if that action is in the available positions it returns it; if not, it keeps asking until you give it a move that actually works. The parts that append the hash state and propagate and update state values at the end of the game don't do anything here, because we are not training the human; the model is the one collecting its own rewards. So we have loaded all of this in, and here are all our pieces. The first thing we do is set up P1, player one, and P2, player two, then hand both players to our State, and tell it to play 50,000 rounds. We could do a lot less than that, and it won't get the full results, so in fact let's just do five, because I want to show you something. (Oops, somewhere in there I forgot to run a cell, and I had also forgotten a reference for the 3x3 board rows and columns; the State references it, so we just tack it on at the end, although it was supposed to be at the beginning.) So now I have only set this up to train five times, and the reason is that we are going to come in and actually play it, then I will change that number and we can see how it differs. It barely made it through a run, and then we save the policy, so now we have a player one policy and a player two policy, two separate policies the way we set it up. Then we come in and set player one to be the computer with an exploration rate of zero, loading policy one, the second player is the human, and we go ahead and play, remembering that the model has had only that one tiny pass of training, really minimal training.
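Wired together, the pieces look roughly like this; the play, play2, savePolicy, and loadPolicy names follow the file described on screen (they are not in the sketches above), and HumanPlayer is the interactive class just discussed, so check your copy of the code if your names differ.

```python
# Self-play training: two computer players learn by playing each other
p1 = Player("p1")
p2 = Player("p2")
st = State(p1, p2)
st.play(50000)            # train for 50,000 rounds (try a tiny number to see how weak it stays)

p1.savePolicy()           # pickle the learned state values to disk
p2.savePolicy()

# Now play against the trained model: no exploration, just the learned policy
p1 = Player("computer", exp_rate=0)
p1.loadPolicy("policy_p1")
p2 = HumanPlayer("human")
st = State(p1, p2)
st.play2()                # interactive game: you type in the row and column
```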
So it puts its mark down, and I go ahead and enter row zero, column one; you can see the interface is very basic. I put in my move, then go row zero, column zero to block, and you can see right here that it let me win: row zero, column two, and the human wins. I only trained it five times. Now let's run it again, and this time, instead of five rounds, let's do the 50,000 the original setup used. This takes a while to train, and this is where reinforcement learning really falls apart: look how simple this game is, a 3x3 grid, and training it this way still takes real time. I could build a Q-table instead, with almost all of the possible positions in it, and probably get the same result much quicker; we are just using this as an example. So when you look at reinforcement learning, you need to be very careful what you apply it to. It sounds like a good deal until you pair it with, say, a large neural network whose learning increment is set to one, so that it learns on every pass, and then you try candidate actions against that learned model, feed what you think is the best action back through the network, and only then produce an output; think of all those processes, it is a huge amount of work. Let's skip ahead and give it a minute or two to run. We went ahead and let it train, and it took a while; I have a pretty powerful processor and it still took about five minutes or more. Then we run our player setup again; oops, I brought in the last round, so give me just a moment to redo the policy save, there we go, I had forgotten to save the policy back in, and then we run our player again. So we have saved the policy, we load the policy for P1 as the computer, and we can see the computer has gone in the bottom-right corner. I take the center at row one, column one, and it goes right up to the top, and if you have ever played tic-tac-toe you know the computer has me. But we will play it out: row zero, column two, there it is, then it moves again, so I go row zero and column zero, which is where I wanted to be, it accepts my action, and boom, it says tie. It is kind of funny that it did not catch the win there, but if we play this a bunch of times you will find it wins more and more; the more we train it, the more the reinforcement takes hold. This lengthy training process is really the stopper on reinforcement learning today. As that changes, reinforcement learning will be one of the more powerful packages evolving over the next decade or two; in fact I would go as far as to say it is the most important machine learning and artificial intelligence tool out there, because it can learn not just a simple tic-tac-toe board but whole environments. Think of language: if you are translating from one language to another, so much is lost if you don't know the context it is in, the environment it is in,
and being able to attach environment and context together like that is going to require reinforcement learning. So again, if you want a copy of the tic-tac-toe code, it is fun to play with: run it, test it with different values, switch P1 from loading policy one to loading policy two, and see how it varies; there are all kinds of things you can do with it. So what is Q-learning? Q-learning is a reinforcement learning policy that finds the next best action given a current state; it chooses that action partly at random and aims to maximize the reward. Here is our standard reinforcement learning diagram again, which by now should be familiar: the agent takes an action, the action affects the environment, and the environment sends back the reward, the feedback, and the new state the agent is in, whether that is a square on the chess board, a frame in a video game, or, if your robot is out picking up trash, its position on the road. Consider an ad recommendation system. Usually when you look up a product online you get ads suggesting the same product over and over again; using Q-learning we can build an ad recommendation system that suggests related products to a previous purchase, where the reward is the user clicking on the suggested product. Notice that even if you have a lot of products on your pages, it is still a discrete set, not a floating-point quantity, and that is something to be aware of when you use Q-learning. You could have a hundred people clicking on ads, and when one person clicks on an ad, the system asks, given that this person clicked on this ad, or these two ads, what is the best set of ads to show next based on where they are browsing. So let's look at some important terms for Q-learning. The state S represents the current position of an agent in its environment. The action A is the step taken by the agent when it is in a particular state. Rewards: for every action, the agent gets a positive or negative reward; and again, when you are using a Q-table you are usually not dealing with float variables, you are dealing with discrete values, and we will take a closer look at that in a second. An episode ends when the agent reaches a terminating state and can't take a new action, for example when your video-game character has just been eliminated. Q-values are used to determine how good an action A taken at a particular state S is, written Q(S, A). The temporal difference is the formula used to find the Q-value using the value of the current state and action together with the following state and action, and Bellman's equation is what ties all of those terms together: it is used to determine the value of a particular state and deduce how good it is to be in that state and take that action, with the optimal state giving the highest optimal value. The factors influencing a Q-value are the current state and action, that is your S and A, and then the state and action that come next, which are usually written S prime and A prime.
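In code, that Bellman / temporal-difference update boils down to a single line; here is a minimal, generic version with variable names of my own choosing, not the demo's, assuming the Q-table is a 2-D numpy array of states by actions.

```python
import numpy as np

def q_update(Q, state, action, reward, next_state, alpha=0.9, gamma=0.75):
    """One temporal-difference step:
    Q(s, a) <- Q(s, a) + alpha * [R + gamma * max_a' Q(s', a') - Q(s, a)]"""
    td_error = reward + gamma * Q[next_state].max() - Q[state, action]
    Q[state, action] += alpha * td_error
    return Q

Q = np.zeros((9, 9))                                  # states x actions, all zeros to start
Q = q_update(Q, state=0, action=3, reward=1, next_state=3)
print(Q[0, 3])
```

Alpha is the learning rate (how big a step we take toward the new estimate) and gamma is the discount rate (how much the maximum expected future reward counts relative to the immediate one).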
That is the pair the update looks ahead to. Then you have the reward R for the action, and the maximum expected future reward, and you can see there is also a learning rate and a discount rate in the formula. Just like with any other model, we don't want an absolute, final jump to a value: if you take the whole update in one jump instead of smaller steps, you lose that gradual approach to the solution, and whichever option happens to jump up really high first becomes the answer, which ruins the whole idea of the random selection (I will come back to the random selection in a second). The steps in Q-learning are: step one, create an initial Q-table with all values initialized to zero; again we are dealing with discrete entries, so using the dog example the rows are states such as start, idle, wrong action, and correct action, and the columns are the actions, fetching, sitting, and running. Step two, choose an action, perform it, and update the values in the table; when choosing, we start by doing something random and simply picking one. Say you start and the dog sits: depending on that action you can now update the value for sitting after going from start to sitting, take the reward you got, and calculate the Q-value using Bellman's equation, so a reward is now attached to sitting. We continue like that until the table is filled or an episode ends. Now, coming back to the random side of this: there are a few different formulas for making the random pick, and I usually let whatever Q-learning implementation I am using apply its standard one, because someone has usually done the math for the optimal spread. But you can look at it simply: if running has a reward of 10, sitting has a reward of 7, and fetching has a reward of 5, then, even without doing anything fancy with bell curves around the means, you could pick in proportion to those values. Add them together, 10 plus 7 plus 5 is 22, draw a number from 1 to 22, and let 1 through 5 mean fetching, the next 7 mean sitting, and the last 10 mean running; each option is chosen with a probability matching its share of the total. Then, as those values slowly get incremented, something like running after a wrong action, which earns almost no reward, becomes very unlikely, but it still might happen, it still has some percentage chance of coming up, and that is where the randomness in Q-learning comes in. The table below gives us an idea of how many times an action has been taken and how positively a correct action, or how negatively a wrong action, is going to affect the next state.
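That back-of-the-envelope proportional pick (10 + 7 + 5 = 22, so running gets chosen roughly 10 times out of 22) can be sketched like this; most Q-learning code uses an epsilon-greedy or softmax rule instead, so treat this purely as an illustration of the idea on the slide.

```python
import random

rewards = {"running": 10, "sitting": 7, "fetching": 5}

def pick_action(rewards):
    # Draw a number between 0 and the total (22 here) and see which
    # action's slice of that range it lands in
    total = sum(rewards.values())
    draw = random.uniform(0, total)
    upto = 0
    for action, r in rewards.items():
        upto += r
        if draw <= upto:
            return action
    return action  # fallback for floating-point edge cases

counts = {a: 0 for a in rewards}
for _ in range(22000):
    counts[pick_action(rewards)] += 1
print(counts)   # roughly 10000 / 7000 / 5000
```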
So let's go ahead and dive in and pull up a little piece of code to see what this looks like in Python; in this demo we will use Q-learning to find the shortest path between two given points. If getting your learning started is half the battle, what if you could do that for free? Visit SkillUp by Simplilearn and click on the link in the description to know more. If you have seen my videos before, you know I like to work in the Anaconda Jupyter notebook setup, just because it is easy to see and makes a nice demo. So here is my Anaconda; this one is actually using a Python 3.6 environment I set up, and we launch the Jupyter notebook, which has the Python 3 kernel loaded, create a new Python 3 notebook, and call it Q-learning. To start the demo we import numpy and run that cell so it is loaded. Like a lot of these model programs, you spend most of the time putting everything together and then end up with a really short answer at the end, and we will see that as we get into it. We start with our location-to-state mapping: we have locations L1 through L9, and the states are simply 0, 1, 2, 3, and so on, just a mapping from each location to an integer. Then we have our actions, which are simply moves from one location to another: I can go to location 0, 1, 2, all the way up to 8, so those are the actions I can choose, matching the locations in our state. And remember I mentioned earlier that the limitation is that you don't want a continually growing table; you can build a dynamic Q-table that adds new values as they arise, but if you have float values the table becomes effectively infinite and your computer's memory is gone. You might think that really limits Q-learning, but there are ways to use it in conjunction with other systems. For example, I have been doing some work with stock data, and one of the questions that comes up is whether to buy or sell a stock. The incoming values can be put into what are called buckets, where anything predicted to return more than a certain amount, given the past error for that stock, goes into one bucket, and as you create these buckets you realize you have a limited amount of information coming in: you no longer have a float number, you have bucket one, two, three, and four, and you can push those buckets through a Q-learning table to come up with the best action, which stock to buy. Day trading, as opposed to long-term investment, is pretty much gambling, and a lot of current commentary says the best framing for an individual day trader is simply the question, do I want to trade this stock, yes or no; and now you have it in a form a Q-learning table can handle, and you can see how that can be a really powerful tool sitting at the end of a basic linear regression model or something similar, asking what the best investment is so you start getting the best reward out of it.
Next we create our rewards. Basically, the reward matrix should match the Q-table layout: you have your states down the side and your actions across the top, just as in the dog example, so each row is the state you are in and each entry is the reward for moving to that next location. Of course, if you were doing something more connected, the reward would be based on the actual environment rather than being written out by hand. Then we create a state-to-location mapping, the inverse of the dictionary we defined before, so we can map the indexes back to location names; you can see it is just built from the items of the location-to-state dictionary. We also need to define our learning rates; remember we had two different rates, one for how much we learn from the past and one for the current update, so we set the discount rate gamma to 0.75 and the learning rate alpha to 0.9, and we will see where those go when we get to the formula. As always, if you want a copy of this code, send a note to the Simplilearn team and they will get it to you. Now let's pull in the next two sections, keeping it short and sweet, and create our agent. The agent has an initialization where we send in the information: self.gamma equals gamma and self.alpha equals alpha (we could have hard-coded those rates inside, but it is nice to keep them as parameters so you can play with the numbers), then the location-to-state mapping, our choice of actions, the rewards, which we are embedding right into the agent here even though in a real system they would come from somewhere else rather than being self-generated, and the state-to-location dictionary. Then we create the Q-learning table itself, which I have simply initialized as an array of zeros sized by the number of states and actions. And then comes the big part, the training, which starts by taking a copy of the reward matrix.
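Roughly, the setup just described looks like this; the exact numbers in the reward matrix are not readable on the slide, so the matrix below is a stand-in that simply marks which locations are directly reachable from which (1 for a reachable neighbour, 0 otherwise), with the goal handled later during training.

```python
import numpy as np

# Map each of the nine locations to a state index, and back again
location_to_state = {f"L{i + 1}": i for i in range(9)}
state_to_location = {state: location for location, state in location_to_state.items()}
actions = list(range(9))           # an action = "move to location i"

# Stand-in reward matrix: rewards[i, j] = 1 if you can step from location i to j
rewards = np.array([
    [0, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 0],
])

gamma = 0.75   # discount rate
alpha = 0.9    # learning rate
```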
We call that copy rewards_new, and the ending state is the state index of the end location, wherever we want to finish; we then set the reward for staying at that ending state to 999, so the goal becomes the spot the agent most wants to land on, and we start iterating. We are going to call this and say, iterate through it a thousand times and see what happens. In a real application, instead of iterating internally you might have an external environment that the agent goes back and forth with; sometimes I put the iteration here and have it call the environment, which does its own thing, each time through. Then we pick a random state to start with, because you have to start somewhere, and we build up the playable actions, starting from an empty list: we iterate through the rewards matrix to find the states directly reachable from the randomly chosen current state and append each such j to the list of playable actions. You can see the loop runs over range nine (I usually use the length of whatever I am looking at, which here is our set of locations), and for each j it checks the rewards_new entry for the current state and that candidate, j being the next state we might try; so we are randomly trying different moves to see what generates a better reward. Then we choose the next state with a random choice over the playable actions; and remember, as I mentioned before, instead of a purely random selection you could weight the choice toward whatever currently looks best, using spreads or bell curves around each option or adding them together and picking proportionally, but a plain random choice is usually where you start, and you can play with it from there. Then we come to the reward section, where we compute the temporal difference: the rewards_new entry plus gamma times the best Q-value of the next state, minus the current Q-value. This is Bellman's equation, the one we looked at earlier, with the current value, the reward coming in, the discount rate, and the maximum expected future value all in there. Finally we update the Q-table: we take Q for the current state and next state and add in the temporal difference scaled by alpha, the learning rate, because we don't want to absorb the whole correction at once in case there are slight differences coming in; we want to slowly approach the answer.
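Condensed, the training method looks something like this; it assumes the imports and variables from the setup sketch above, and like the demo it bumps the reward for landing on the goal to 999 and then iterates over random states and playable actions.

```python
import numpy as np

def train(rewards, location_to_state, gamma, alpha,
          end_location, iterations=1000):
    # Copy the reward matrix and make the goal state strongly rewarding
    rewards_new = np.copy(rewards)
    ending_state = location_to_state[end_location]
    rewards_new[ending_state, ending_state] = 999

    Q = np.zeros((9, 9))
    for _ in range(iterations):
        current_state = np.random.randint(0, 9)        # start somewhere at random
        playable_actions = [j for j in range(9)
                            if rewards_new[current_state, j] > 0]
        next_state = np.random.choice(playable_actions)
        # Bellman / temporal-difference update
        TD = (rewards_new[current_state, next_state]
              + gamma * Q[next_state, np.argmax(Q[next_state])]
              - Q[current_state, next_state])
        Q[current_state, next_state] += alpha * TD
    return Q
```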
Then we have route equal to a list containing the start location and next_location equal to the start location, so we are just stepping forward from the start. And finally, remember I told you we would do all this setup and end with something simple that just generates a path: we want the optimal route, so we have created a definition for it further down. get_optimal_route takes the Q-table, the start and end locations, the route list, and the mappings, and it says: while the next location is not equal to the end location, convert the current location to its state, look in the Q-table for the next state with the best value, convert that back to a location, append it to the route, and make it the new starting point, and we keep stepping through until we reach the goal. We run that, and now that we have our Q-agent loaded we create it with the alpha and gamma we set above, along with the location-to-state mapping, the actions, the rewards, and the state-to-location mapping, and our goal is to plot a course between L9 and L1 over a thousand iterations. When I run it, it finishes almost instantly. Why is this so fast, when neural networks and other models keep you waiting? Because it is a very small amount of data: these are all integers, not floats, and the math is not heavy on the processor. That is where Q-tables are so powerful: with a small amount of information coming in, you get an answer very quickly, even after a thousand training passes. You can see the result is L9, L8, L5, L2, L1, and that is based on the reward table we set up: the table says that from a given location you can reach certain locations and not others, so it is a little maze, and the route shown is the shortest path through it given those rules. You can play with this, and as I was saying earlier, you can embed the whole thing inside another model and its predictions, putting things into buckets and trying to pick the best investment or the best course of action, as long as you can reduce that course of action to a yes or no; or, if you are working with text, use a one-hot encoding of which word comes next. There are all kinds of things you can do with a Q-table, depending on how much information you are putting into it. So that wraps up our demo: we found the shortest distance between two points based on the rules and state rewards we set for getting from point A to point B and the actions that were available.
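For completeness, here is the route-extraction step in the same condensed style, again assuming the setup and train sketches above rather than the exact course file; it simply follows the highest Q-value out of each state, which only works once the table has been trained enough to point toward the goal.

```python
def get_optimal_route(Q, location_to_state, state_to_location,
                      start_location, end_location):
    # Greedily follow the best learned move until we reach the goal
    route = [start_location]
    next_location = start_location
    while next_location != end_location:
        current_state = location_to_state[next_location]
        next_state = int(np.argmax(Q[current_state]))
        next_location = state_to_location[next_state]
        route.append(next_location)
    return route

Q = train(rewards, location_to_state, gamma, alpha, end_location="L1")
print(get_optimal_route(Q, location_to_state, state_to_location, "L9", "L1"))
```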
Hello and welcome to this tutorial on deep learning. My name is Mohan, and over the next hour and a half or so I will take you through what deep learning is and then into the TensorFlow environment to show you an example of deep learning. There are several really interesting and innovative applications of deep learning, and one of them is identifying the geographic location of a picture. The way it works is that we train an artificial neural network with millions of images whose geolocation is tagged, and then when we feed it a new picture it is able to identify the geolocation of that new image. For example, you have all these images, especially ones with significant monuments or significant locations, you train with millions of them, and then when you feed in another image it need not be exactly one of those used in training; it can be completely different, and that is the whole idea of training. The network will be able to recognize, for example, that a picture is from Paris because it recognizes the Eiffel Tower. If we look a little bit under the hood, these images are nothing but digital information in the form of pixels: each image has a certain size, say a 256 by 256 pixel resolution, and each pixel has a certain color value, and all of that is fed into the neural network. The network gets trained on this pixel information, learns to extract the relevant features, and is thereby able to identify these images and their locations, so that when you feed it a new image it can figure out, based on its training, where that image is from. So what are we going to do in this tutorial? We will see what deep learning is and what we need for it. One of the main components of deep learning is the neural network, so we will see what a neural network is, what a perceptron is, and how to implement logic gates like AND, OR, and NOR using perceptrons, then the different types of neural networks, the applications of deep learning, and how neural networks are trained. At the end we will finish with a small demo, a piece of code that takes you through TensorFlow. In order to implement deep learning code there are multiple libraries and development environments available, and TensorFlow is one of them, so the focus at the end will be on how to use TensorFlow to write a piece of code, with Python as the programming language. We will take up a very common example, the hello world of deep learning: handwritten digit recognition using what is commonly known as the MNIST database, and we will see how to train a neural network to recognize handwritten numbers. So let's get started. What is deep learning? Deep learning is a subset of the higher-level concept called artificial intelligence. You must already have heard the term artificial intelligence; AI is the high-level concept, if you will, and in order to implement artificial intelligence applications we use what is known as machine learning, and within machine learning, deep learning is a subset.
Machine learning is the more generic concept and deep learning is one type of machine learning, if you will, and we will see a little later, in the following slides, in more detail how deep learning differs from traditional machine learning. To start with, one of the differentiators is that deep learning uses neural networks, and we will talk about what neural networks are and how to implement them as part of this tutorial. Going a little deeper: deep learning primarily involves working with complicated, unstructured data, whereas in traditional machine learning we normally use structured data. In deep learning the data is primarily images, voice, or text, and in large amounts, and deep learning can handle complex operations. The other difference is that in deep learning feature extraction happens pretty much automatically, whereas in traditional machine learning feature engineering is done manually; we data scientists have to do the feature engineering and feature extraction ourselves. And of course, for large amounts of complicated, unstructured data, deep learning gives very good performance. Now, as I mentioned, one of the secret sauces of deep learning is the neural network, so let's see what neural networks are. Neural networks are based on our biological neurons; the whole concept of deep learning and artificial intelligence is inspired by the human brain, which consists of billions of tiny cells called neurons. Here is how a biological neuron looks and how an artificial neuron looks: a neural network is like a simulation of the human brain, with billions of biological neurons that we try to mimic using artificial neurons. A biological neuron has dendrites, and the corresponding component of an artificial neuron is its inputs: the biological neuron receives its inputs through the dendrites. Then there is the cell nucleus, which is basically the processing unit, and the artificial neuron has an equivalent piece that processes the inputs based on weights and biases (we will see exactly what weights and biases are as we move on) and produces an output. In a biological neuron the output is sent out through a synapse, and the artificial neuron has an equivalent in the form of its output. Biological neurons are also interconnected, billions of them, and in the same way artificial neurons are interconnected, so the output of one neuron is fed as an input to another, and so on. Now, one of the very basic units of a neural network is the perceptron. So what is a perceptron? A perceptron can be considered one of the fundamental units of neural networks; it consists of at least one neuron, sometimes more, but you can create a perceptron with a single neuron, and it can be used to perform certain functions, for example as a basic binary classifier that can be trained to do simple binary classification. This is how a basic perceptron looks, and it is nothing but a neuron: you have inputs X1, X2, up to Xn, and there is a summation function,
and then there is what is known as an activation function. Based on the weighted sum of the inputs, the activation function gives an output like a zero or a one, so we say the neuron is either activated or not. The way it works is: you take the inputs, each input is multiplied by a weight, a bias is added, and that whole thing is fed to the activation function, which produces an output. If the output is correct it is accepted; if it is wrong, if there is an error, that error is fed back and the neuron adjusts its weights and biases to give a new output, and so on. That is what is known as the training process of a neuron or a neural network. There is a concept called perceptron learning, one of the very basic learning procedures, and it works somewhat like this: you have the inputs X1 to Xn, each input is multiplied by a weight, and the weighted sum, sigma of wi times xi, is taken; then a bias is added to that sum. The bias does not depend on the input values; it is common to the neuron, although its value keeps changing during the training process. Once training is complete, the values of the weights W1, W2, and so on and the value of the bias get fixed; that is the whole training process, and that is what is known as perceptron training: the weights and biases keep changing until you get accurate output. The summation, wi xi plus b, is passed through the activation function, the neuron either fires or not, and based on that there is an output. That output is compared with the actual or expected value, also known as the labeled information, because this is supervised learning: the output is already known, so by comparing we know whether there is an error, and if there is, the error is fed back and the weights and biases are updated accordingly until the error is reduced to the minimum. This iterative process is known as perceptron learning, or the perceptron learning rule, and the error needs to be minimized; it need not be, and may never reach, zero, but the idea is to keep changing the weights and bias until the error is the minimum possible under the given conditions, so the iteration continues until either the error is zero, which is an unlikely situation, or it is the minimum possible. Now, in 1943 two scientists, Warren McCulloch and Walter Pitts, came up with an experiment in which they were able to implement logical functions like AND, OR, and NOR using neurons, and that was a significant breakthrough. They were able to implement some of the most common logic gates, which take two inputs, A and B, and give a corresponding result: for an AND gate the output is A·B, and for an OR gate it is A+B, and so on.
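As a quick illustration of that weighted-sum-plus-activation idea, here is a tiny perceptron forward pass in Python; the 0.7/0.7 and 1.2/1.2 weights with a threshold of 1 are the values we are about to see for the AND and OR gates, and a simple step function stands in for the activation.

```python
def perceptron(inputs, weights, bias=0.0, threshold=1.0):
    # Weighted sum of the inputs plus a bias...
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    # ...passed through a step activation: fire (1) only if we reach the threshold
    return 1 if weighted_sum >= threshold else 0

for a in (0, 1):
    for b in (0, 1):
        and_out = perceptron([a, b], weights=[0.7, 0.7])   # behaves as an AND gate
        or_out = perceptron([a, b], weights=[1.2, 1.2])    # behaves as an OR gate
        print(f"A={a} B={b}  AND={and_out}  OR={or_out}")
```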
They were able to do this using a single-layer perceptron, and for most of these gates a single-layer perceptron was enough; the exception was XOR, and we will see why in a little bit. This is how an AND gate works: with inputs A and B, the neuron should fire only when both inputs are one, so for 0,0 the output is 0, for 0,1 it is 0, for 1,0 it is 0, and for 1,1 it is 1. How do we implement this with a neuron? It was found that by changing the values of the weights it is possible to achieve this logic. For example, with equal weights of 0.7 and 0.7 and a threshold of one, the weighted sum is 0.7 × 0 + 0.7 × 0 = 0 in the first case, and only in the last case, when both inputs are one, does the sum (1.4) exceed the threshold, so only then does the neuron activate and produce an output; in all other cases there is no output. That is an implementation of an AND gate using a single perceptron, or a single neuron.

Similarly for an OR gate: the output should be one if either input is one, so every combination gives a one except 0,0. With weights of 1.2 and 1.2, the weighted sum is 0 when both inputs are zero, 1.2 when exactly one input is one, and 2.4 when both are one, so with a threshold of one the neuron fires in every case except 0,0. During training these weights keep changing, and at the point where w1 = 1.2 and w2 = 1.2 the system has learned to give the correct output. That is an implementation of an OR gate using a single neuron, or single-layer perceptron.

Now the XOR gate was one of the challenging ones. They tried to implement an XOR gate with a single-layer perceptron, but it is not possible, and for a while this was a roadblock in the progress of neural networks. Subsequently it was realized that an XOR gate can be implemented using a multi-layer perceptron, or MLP, with two layers instead of one. Here X1 and X2 are the inputs, there is a hidden layer, which is why the hidden units are denoted H3 and H4, and their outputs feed the output neuron O5, where a threshold is applied. In the numerical example on the slide the weights are 20 and −20 for each input; the inputs are fed into H3 and H4, and if a sigmoid with a threshold of one is applied at the output, the network produces a one only when the two inputs differ. That is what makes it an exclusive OR: if both inputs are one, or both are zero, the output is zero; only when exactly one of the inputs is one do you get an output of one, and that condition is satisfied by this network.
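The gate logic above can be checked in a few lines of Python. The AND and OR weights follow the values quoted in the lecture (0.7/0.7 and 1.2/1.2 with a firing threshold of 1); the XOR network below uses one standard two-layer wiring, an OR-like and a NAND-like hidden unit feeding an AND, rather than the exact weights from the slide.

```python
import numpy as np

# Single-layer perceptrons for AND and OR with the weights quoted above
# (0.7/0.7 and 1.2/1.2, threshold 1). The XOR network uses one standard
# two-layer wiring (OR and NAND feeding an AND), not the slide's values.

def fires(weights, inputs, threshold=1.0):
    return 1 if np.dot(weights, inputs) >= threshold else 0

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    and_out = fires([0.7, 0.7], [a, b])          # fires only for (1, 1)
    or_out  = fires([1.2, 1.2], [a, b])          # fires unless both inputs are 0
    # hidden layer: h3 behaves like OR, h4 like NAND (note the negative weights)
    h3 = fires([20, 20],   [a, b], threshold=10)
    h4 = fires([-20, -20], [a, b], threshold=-30)
    xor_out = fires([1, 1], [h3, h4], threshold=2)   # AND of h3 and h4
    print(a, b, "AND:", and_out, "OR:", or_out, "XOR:", xor_out)
```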
So the XOR gate is a special implementation of the perceptron. Now that we have a good idea about perceptrons, let's take a look at what a neural network is. We have seen what a perceptron is and what a neuron is; a neural network is simply a network of these neurons. There are different types of neural networks, roughly five of them: artificial neural networks, convolutional neural networks, recurrent (or recursive) neural networks, deep neural networks, and deep belief networks. Each type is suited to a particular kind of problem: convolutional neural networks, for example, are very good at image processing and image recognition, whereas RNNs are very good for speech recognition and text analysis. So each type has special characteristics and is good at certain kinds of tasks.

What are some applications of deep learning? Deep learning is used extensively in gaming. You must have heard of AlphaGo, a program created by DeepMind, a startup that was acquired by Google; AlphaGo defeated the human world champion Lee Sedol at the game of Go. Gaming is an area where deep learning is used extensively and where a lot of research happens. In addition, there is a special type of neural network called the generative adversarial network, which can be used for synthesizing images, music, or text; a network can be trained to compose a certain kind of music. Then there are autonomous cars: you must be familiar with Google's self-driving car, and today many automotive companies are investing in this space. Deep learning is a core component of autonomous cars, which are trained to recognize the road, the lane markings, signals, and any objects or obstructions in front of them. Robots are another application; we have seen several humanlike robots, including Sophia, who was granted citizenship by Saudi Arabia, and the underlying technology in many of these robots is deep learning. Medical diagnostics and healthcare is another major area. Within healthcare diagnostics there are multiple places where deep learning and image recognition can be used, for example cancer detection. As you may be aware, if cancer is detected early it can be cured, and one of the challenges is the availability of specialists who can diagnose cancer from diagnostic images and scans. The idea is to train neural networks to perform some of these activities so that the load on the cancer specialists, the oncologists, comes down. There is a lot of research happening here, and there are already quite a few applications claimed to perform better than human beings, whether for lung cancer, breast cancer, and so on. So healthcare is a major area where deep learning is being applied. Now let's take a look at the inner workings of a neural network.
So how does an artificial neural network identify shapes? Can we train a neural network to identify squares, circles, and triangles when these images are fed to it? This is how it works. Any image is just digital information about its pixels. In this case, say we have a 28 × 28 pixel image of a square; the pixels are lit up in a particular way, and each pixel has a value, say from 0 to 255, where 0 means it is black and 255 means it is completely white. That value is a measure of how the pixel is lit up. So the image consists of information for 784 pixels, and everything inside the image is effectively captured in those 784 values; the way each pixel is lit up tells us what the image is, and we can train neural networks to use that information to identify images. For each input neuron, a value close to one means white and a value close to zero means black.

One way of doing this is to flatten the image and feed all 784 pixel values as inputs to our neural network. The network can consist of several layers: an input layer, a few hidden layers, and an output layer. The input layer takes the 784 pixel values, and the output can be one of three classes: a square, a circle, or a triangle. During training, when you first feed the image of a square the network will probably say it is a circle or a triangle; that error is sent back, and the weights and biases of the neurons are adjusted until it correctly identifies the square. The same thing happens with a circle: you feed the 784 pixels, there is a certain pattern in which they are lit up, and the network is trained to identify that pattern; initially it may classify it incorrectly, the error is fed back, and the weights and biases are adjusted until it finally gets the image right. The same goes for a triangle. Once trained, the network can classify new images into these three types. What is important to observe is that when you feed a new image, the triangle need not be in exactly the same position; the network identifies patterns, so even if the triangle sits in a corner or off to one side rather than in the middle, it will still be recognized as a triangle. That is the whole idea behind pattern recognition.
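As a small illustration of the flattening step, here is how a 28 × 28 image becomes a 784-value input vector; the "image" below is random data standing in for a real picture.

```python
import numpy as np

# Sketch of how a 28x28 image becomes a 784-value input vector.
# The "image" here is random data standing in for a real picture.

image = np.random.randint(0, 256, size=(28, 28))    # fake grayscale image
scaled = image / 255.0                              # 0 = black, 1 = white
flattened = scaled.reshape(784)                     # 28 * 28 = 784 inputs
print(flattened.shape)                              # (784,) -> one value per input neuron
```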
So how does this training process work? Here is a quick view. We have seen that a neuron receives inputs, computes the weighted sum Σ xi·wi plus the bias, feeds that to the activation function, and that in turn gives an output. During training, initially, when you feed a square it may identify it as a triangle, and when you feed a triangle it may identify it as a square; that error information is fed back. The weights can start out random, perhaps even all zero, and then slowly keep changing, so that by the end of training the network identifies the images correctly; until then the weights are adjusted, and that is the training process. The weights are numeric values, for example 0.5 or 0.35, and they can be positive or negative. The incoming value is the pixel value, which can be scaled between 0 and 1 or left between 0 and 255, with zero being black, the maximum being white, and the other shades in between. So the inputs are numerical values, the product wi·xi is a numerical value, and the bias is also a numerical value. Keep in mind that the bias is fixed per neuron: it does not change with the inputs, whereas there is one weight per input. The bias also starts with a random value and changes during training, but once training is complete the weights w1, w2 up to wn and the bias b are fixed for that particular neuron. There can be multiple neurons and multiple layers of neurons, and the training works the same way for all of them. Here is another example with multiple layers: there are two hidden layers in between, values come from the input layer, pass through the hidden layers, and reach the output layer. As you can see, there are weights and biases for each neuron in each layer, all of them keep changing during training, and at the end of training they all have fixed values; that is the trained model.

Then there is something known as an activation function. Every neuron has an activation function, and there are different types in use: it could be a ReLU, a sigmoid, and so on. The activation function is what decides whether a neuron should fire or not, in other words whether the output should be a zero or a one. It takes as its input the weighted sum we talked about, wi·xi + b, and then the output can be either a zero or a one. The different types of activation functions are covered in an earlier video which you might want to watch.
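For reference, here are two of the activation functions mentioned, written out in NumPy; each one takes the weighted sum z = Σ wi·xi + b and decides how strongly the neuron fires.

```python
import numpy as np

# Two common activation functions mentioned above. Each takes the
# weighted sum z = sum(w_i * x_i) + b as its input.

def sigmoid(z):
    # squashes z into the range (0, 1); often read as a firing strength
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # passes positive values through unchanged, outputs 0 otherwise
    return np.maximum(0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))   # ~[0.12, 0.5, 0.88]
print(relu(z))      # [0. 0. 2.]
```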
As part of the training process, we feed the inputs, which are the labeled or training data, and the network gives a predicted output, which we denote ŷ (y hat). Because this is supervised learning we already know what the output should be; that is the actual output. Early in training there will obviously be errors, and the error is measured by what is known as a cost function: the difference between the predicted output and the actual output. The cost function can be defined in different ways; in this case it is the average of the squares of the errors, and when all the squared errors are added up this is sometimes called the sum of squared errors, or SSE. That value is then fed back in what is known as backward propagation, or backpropagation, which helps the network adjust its weights and biases; they keep getting updated until the error, the value of the cost function, is at a minimum.

The optimization technique used here is called gradient descent. The algorithm works to minimize the error, that is, the cost function. There is a fair amount of mathematics behind this, for example finding local minima and the global minimum using differentiation, but the idea is simple: as part of training, the weights, and of course the bias, are adjusted so that the cost function comes down from a high value to its minimum. Gradient descent has a parameter called the learning rate, which you need to specify, and it should be optimal: if the learning rate is very high, the optimization will not converge, because at some point it overshoots and crosses over to the other side; if it is very low, it may take forever to converge. So you need to find the optimum learning rate, and once that is done, gradient descent reduces the error function and that is essentially the end of the training process.

Here is another view of gradient descent. This is your cost function, whose output has to be minimized using the gradient descent algorithm, and these are the parameters; a weight could be one of them. Initially you start with some random values, so the cost is high; then the weights keep changing so that the cost function comes down, and at some point it reaches the minimum value, beyond which it would start increasing again. That is where the gradient descent algorithm decides it has reached the minimum and tries to stay there; this is known as the global minimum. These curves are drawn nicely for explanation purposes, but in practice they can be pretty erratic, with local minima, peaks, and so on. The whole idea of gradient descent optimization is to identify the global minimum and to find the weights and the bias at that particular point.
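Here is a bare-bones illustration of gradient descent and the learning rate, using a one-parameter cost function whose minimum is known in advance; the starting value and learning rate are arbitrary choices for demonstration.

```python
# Bare-bones gradient descent on a one-parameter cost function,
# cost(w) = (w - 3)^2, whose minimum is obviously at w = 3.
# The starting point and learning rate are arbitrary illustrations
# of the "too high / too low" trade-off described above.

def cost(w):
    return (w - 3) ** 2

def gradient(w):
    return 2 * (w - 3)          # derivative of the cost with respect to w

w = 10.0                        # arbitrary starting weight
learning_rate = 0.1             # try 1.1 (diverges) or 0.001 (very slow)

for step in range(50):
    w -= learning_rate * gradient(w)   # move against the gradient

print(w, cost(w))               # w is close to 3, cost close to 0
```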
So that is gradient descent. Here is another example: you can have multiple local minima, and as the curve comes down a point may appear to be the minimum when it is not; the global minimum is actually further along. The gradient descent algorithm makes an effort to reach that global minimum rather than getting stuck at a local one; the algorithm knows how to identify the global minimum, and that is what it does during training.

Now, to implement deep learning there are multiple platforms and languages available, but the most common platform nowadays is TensorFlow, and that is why we have created this tutorial around it; we will take you through a quick demo of how to write TensorFlow code using Python. TensorFlow is an open-source platform created by Google. It is a Python library (it is also supported in other languages such as Java and R, but Python is the most commonly used) for developing deep learning applications, especially those using neural networks. It consists of two main parts: tensors, and graphs, or the flow, which is where the name TensorFlow comes from. What are tensors? One way of looking at them is as multi-dimensional arrays. You can have a scalar, which is just a number; a one-dimensional array, which is a list of numbers; a two-dimensional array, which is like a matrix; a three-dimensional array; and beyond that, TensorFlow can handle many more dimensions. That ability to work with multi-dimensional arrays is the strength of TensorFlow, it is what makes deep learning computation much faster, and it is why TensorFlow is used for developing deep learning applications. So TensorFlow is a deep learning tool, and the way it works is that the data flows in the form of tensors, while the programming works by first creating a graph of how to execute the computation and then actually executing that graph in what is known as a session; we will see this in the TensorFlow code as we move forward. All the data is managed and manipulated as tensors, and the processing happens through these graphs. There are some terms to know, for example the rank of a tensor, which is its dimensionality: a scalar, just a single number, has rank zero; a one-dimensional vector has rank one; a two-dimensional array, typically a matrix, has rank two; a three-dimensional array has rank three; and so on, so you can store arrays with even more dimensions as tensors. As for the properties of TensorFlow, I think today it is one of the most popular deep learning platforms.
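Tensor ranks are easy to see in code. The snippet below uses the TensorFlow 2 eager API (tf.constant and tf.rank); the demo later in this tutorial uses the older TF 1.x style, but the idea of rank is the same.

```python
import tensorflow as tf

# Illustration of tensor ranks, runnable with TensorFlow 2 in eager mode.

scalar = tf.constant(3)                         # rank 0: just a number
vector = tf.constant([1, 2, 3])                 # rank 1: one-dimensional array
matrix = tf.constant([[1, 2], [3, 4]])          # rank 2: two-dimensional array
cube   = tf.constant([[[1], [2]], [[3], [4]]])  # rank 3: three-dimensional array

for t in (scalar, vector, matrix, cube):
    print(tf.rank(t).numpy(), t.shape)
```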
It is open source, developed and maintained by Google. One of the most important things about TensorFlow is that it can run on CPUs as well as GPUs. A GPU is a graphics processing unit, just as a CPU is a central processing unit. In earlier days GPUs were used primarily for graphics, which is where the name comes from; a GPU cannot perform generic work as efficiently as a CPU, but it can perform iterative computations extremely fast, much faster than a CPU. Deep learning involves a lot of iterative computation, in the form of matrix multiplication and so on, so GPUs are very well suited to it, and TensorFlow supports both GPUs and CPUs. There is a certain way of writing code in TensorFlow, which we will see as we go through the code. TensorFlow can also be used for traditional machine learning, although that would be overkill; still, just for understanding, it can be a good idea to start by writing code for a normal machine learning use case so that you get a hang of how TensorFlow code works, and then move on to neural networks. That is just a suggestion; if you are already familiar with how TensorFlow works, you can go straight into the neural networks part.

In this tutorial we will take the use case of recognizing handwritten digits, which is like the "hello world" of deep learning. The MNIST database is a nice little database of images of handwritten digits, nicely formatted. Very often in deep learning and neural networks we end up spending a lot of time preparing data for training, and with the MNIST database we can avoid that: the data is already in the right format and can be used directly for training. MNIST also offers a bunch of built-in utility functions we can call straight away without writing our own, which is one of the reasons it is so popular for training purposes when people first learn about deep learning and TensorFlow. It is a collection of 70,000 handwritten digits; a large part of them are for training, then, just as in any machine learning process, there is a test set and a validation set, and all of them are labeled, so you have the images and their labels. The images are handwritten samples collected from many individuals: people wrote the digits 0 through 9, images of those were taken, and they were formatted so that they are very easy to handle. The way we will implement this in TensorFlow is to feed in the training data along with the label information. The images are stored as pixel information, as we saw in one of the previous slides: an image is nothing but an arrangement of pixels, with each pixel either lit up, dark, or somewhere in between, and that is how the images are fed into the neural network for training. Once the network is trained, when you provide a new image it will be able to identify it, within a certain margin of error of course.
For this we will use one of the simpler neural network configurations, a softmax layer, and for simplicity we will flatten the pixels: instead of taking them in a two-dimensional arrangement, we flatten them out. The image is 28 by 28, so there are 784 pixels; pixel number one starts at the top, the first row runs up to pixel 28, the next row is pixels 29 to 56, and so on, until pixel 784 at the end. We take all these pixels, flatten them out, and feed them as one single line into our neural network, whose output is what is known as a softmax layer. Once trained, it will be able to identify which digit the image shows: the output layer has 10 neurons, each signifying one digit, and at any given time, when you feed an image, only one of these 10 neurons gets activated. For example, if the network is trained properly and you feed it a nine, the neuron for nine gets activated and produces the output; feed it a one and the neuron for one activates, feed it a two and the neuron for two activates, and so on. I hope you get the idea; this output arrangement is known as a softmax layer, and it is one of the simpler ones, good for quick and easy understanding.

This is how the code looks. We will go into our lab environment in the cloud and show you there directly, but let me quickly run through it here before we go into the Jupyter notebook where the actual code is. We are using Python, so the syntax is Python. The first step is to import the TensorFlow library, which we do with the line import tensorflow as tf; tf is just a convenient name, you could give it any name, and once you do this TensorFlow is available as an object named tf, and you can call its methods and access its attributes. The MNIST database is bundled with TensorFlow, which is another reason this example is always used as a first step, so you simply import the MNIST data with one line of code, and you slightly modify the call so that the labels are loaded with what is known as one_hot=True, which means the label information is stored like an array. With one-hot encoding each label is stored as an array of 10 values; say the number is 8, then all the positions, position zero, position one, position two and so on, hold zeros, except position 8, which holds a one, and position 9 is again zero. So one-hot encoding loads the labels in such a way that only one of the ten positions has a value of one, and based on which position holds the one, we know what the label is.
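Here is what one-hot encoding does to a label, sketched in plain NumPy so you can see the array layout without loading any data.

```python
import numpy as np

# What one_hot=True does to the labels: each digit becomes a length-10
# vector with a single 1 at the digit's position.

def one_hot(digit, num_classes=10):
    vec = np.zeros(num_classes)
    vec[digit] = 1.0
    return vec

print(one_hot(8))   # [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]  -> the 1 sits at position 8
print(one_hot(2))   # [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]  -> position 2 means the label is 2
```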
In this case the eighth position is one, therefore we know the value of this sample is eight. Similarly, if the label is a two, the array looks like this: position zero is zero, position one is zero, position two is one, because that indicates the number two, position three is zero, and so on. That is the significance of one_hot=True. We can then check how the data looks by displaying it; as I mentioned, it is pretty much in digital form, just numbers, all pixel values, so you will not really see an image in this format, but there is a way to visualize the image which I will show you in a bit. We can also see how many images there are in each set: 55,000 in the training set, 10,000 in the test set, and 5,000 in the validation set, for 70,000 images altogether. We can view the actual image using the Matplotlib library; this is the code for viewing the images, and you can view them in color or in grayscale, which is what the cmap argument controls. We can also check the maximum and minimum pixel values: the maximum is one because the values are scaled, so one means white, zero means black, and anything in between is a shade between black and white.

To train the model, there is a certain way in which you write your TensorFlow code. The first step is to create some placeholders, and then you create a model; in this case we will use the softmax model, one of the simplest. Placeholders are primarily there to get data from outside into the neural network; it is a very common mechanism. Then of course you will have variables, which, remember, are your weights and biases. In our case there are 10 neurons, and each neuron receives all 784 inputs; if we go back to the slide, every neuron takes all 784 inputs, the first neuron receives all 784, the second neuron also receives all 784, and each of those inputs needs to be multiplied by a weight. So the weights form a matrix with 784 values for each of the 10 neurons, a 784 × 10 matrix. Similarly there are biases, and remember the bias is one per neuron, not one per input like the weights, so there are only 10 biases because there are only 10 neurons; that is what the bias variable holds. Here is something a little new in TensorFlow: unlike our regular programming languages where everything is a variable, here the values can be of three different kinds. You have placeholders, which are primarily used for feeding data; you have variables, which can change during the course of the computation; and a third kind, not shown here, constants, which are fixed numbers. In a regular programming language you may have just variables, or at most variables and constants, but in TensorFlow you have three different types: placeholders, variables, and constants.
Then you create what is known as a graph. TensorFlow programming consists of graphs and tensors, as I mentioned earlier: the data can be thought of as tensors, and the graph describes how the whole computation is to be executed, so the execution plan is stored in the form of a graph. In this case what we are doing is a multiplication. Remember tf was created as the TensorFlow object; TensorFlow has a matrix multiplication function, matmul, and that is what is being used here: we multiply the input values x with W and then add b, so it is x·W + b, very similar to the earlier slide where we saw Σ xi·wi. The matrix multiplication multiplies all the input values by the corresponding weights, and then the bias is added; that is the graph we created. Next we need to define our loss function and our optimizer. Here again we use TensorFlow's APIs: tf.nn.softmax_cross_entropy_with_logits is the API we will use, wrapped in reduce_mean, which is the mechanism that says reduce the error. And which optimizer do we use to drive down the error? We use the gradient descent optimizer, which we discussed a couple of slides earlier, and for that you need to specify the learning rate; remember the slide where we defined how fast you want to come down, that is the learning rate, and it needs to be tested and tried to find the optimum level, not very high, in which case it will not converge, and not very low, because then it will take very long. So you define the optimizer and call its minimize method, and that kickstarts the training process. Up to this point we have only been creating the graph; in order to actually execute it we create what is known as a session and then run that session. We also specify how many iterations we want it to run, for example 1,000 steps in this case; that is the exit condition, so training will run for a thousand iterations, and once that is done we can evaluate the model using some of the techniques shown here.

So let us get into the code and see how it works. This is our cloud environment; you can also install TensorFlow on your local machine. I am showing this demo on our existing cloud, but there is a separate video on how to set up your TensorFlow environment if you want a local installation, or you can use any cloud service, for example Google Cloud, Amazon, or Cloud Labs, to run and try the code. It has started, so we will log in. This is our deep learning tutorial code in our TensorFlow environment, so let's get started. We have seen a bit of a code walkthrough in the slides; now you will see the actual code in action. The first thing we need to do is import TensorFlow, and then we will import the data,
and we need to load the data with one-hot encoding set to true, as I explained earlier, so that the label values are represented appropriately. If we check the type of the data we see it is a Python Datasets object, and if we look at the images themselves we see an array of type float32. The training set has 55,000 images, the test set 10,000, and the validation set 5,000. Now let's take a quick look at the data itself, using Matplotlib for visualization. The shape method gives us the dimensions of the tensors, or arrays if you will: for the training data set it says 55,000 by 784, and remember 784 is nothing but 28 × 28. We can take just the first image and check its shape; its size, obviously, is 784. We can also look at the data of that first image: a large part of it is zeros because, as you can imagine, only certain areas of the image are written on and the rest is blank, and the non-zero values are scaled so that they lie between zero and one. If you want to actually view the handwritten image, you reshape it and use Matplotlib, which has the imshow function for showing images; pass the parameters appropriately and you will see the different images. I can change the index to look at different images: index 5,000 shows a three, index 5 shows an eight, index 50 shows another digit, and so on. By the way, if you are wondering how I am executing this code, Shift+Enter runs an individual cell in a Jupyter notebook, and if you want to execute the entire notebook you can go to the menu and choose Run All. Here again we can check the maximum and minimum pixel values; as mentioned, they are scaled, so they lie between zero and one.

Now this is where we create our model. The first thing is to create the required placeholders and variables, as we saw in the slides: we create one placeholder and two variables, one for the weights and one for the biases. These variables are matrices; the weight variable has 784 × 10 values, the 10 being one per neuron, since there are 10 neurons, and the 784 being the pixel inputs, which is 28 × 28. The biases, as I mentioned, are one per neuron, so there are 10 of them, stored in a variable named b. And this is the graph,
which is basically the matrix multiplication of x with W, with the bias added for each neuron, and the whole idea is to minimize the error. Let me execute this code. Then we define y: the y value is basically the label value, and this is another placeholder; we had x as one placeholder and y_true as a second placeholder, which will hold values in the form of 10-digit arrays, and since we chose one-hot encoding the position that holds a one indicates the label for that particular number. Then we have the cross entropy, which is nothing but the loss function, and the optimizer, for which we have chosen gradient descent. The training process itself is nothing but minimizing the cross entropy, which again is the loss function, and we define all of this in the form of a graph. Note that up to here we have not actually executed any TensorFlow code; we are just preparing the graph, the execution plan. That is how TensorFlow code works, and the whole structure and format of the code is quite different from how we normally program, so even people with programming experience may find it a little difficult to understand at first; it needs a bit of practice, and you may want to watch this video a couple of times to understand the flow. Some of you who have done, say, Spark programming to some extent will find it easier to follow, although even in Spark the code itself is pretty straightforward and only the execution behind the scenes happens differently, whereas in TensorFlow even the code has to be written in a completely different way; the code does not get executed in the same order you wrote it, and that takes some getting used to.

So far, then, we have created and set up the variables and the graph, decided what kind of network we want to use, softmax in this case, loaded the data, viewed the data, and prepared everything, but we have not yet executed anything in TensorFlow. The next step is the execution, and the first step for any execution in TensorFlow is to initialize the variables: any time you have variables defined in your code, you have to run this piece of code. You basically create a node for the initialization; you are still not executing anything here, you have just created the node. From here onwards is where you actually execute your code in TensorFlow, and for that you need a TensorFlow session: tf.Session will give you a session. There are a couple of different ways to do this, but one of the most common is what is known as a with block.
So you write with tf.Session() as sess, followed by a colon; this starts a block, and the indentation tells how far the block goes. The session is valid until the block has finished executing; that is the purpose of the with block. Inside it we say sess.run(init). sess is an instance of the session, since tf.Session() creates a session instance which we call sess, and sess.run executes one of the nodes in the graph; the node specified here is init, so running it is when the initialization of the variables actually happens. If you have any variables in your code, in our case W and b, you have to run this initialization, otherwise you will get an error. Then, within this with block, we write a for loop, asking the system to iterate for a thousand steps and perform the training; that is what the loop does, it runs the training for 1,000 iterations. What it does in each iteration is fetch a batch of the images. Remember there are tens of thousands of images, and we cannot load them all in one shot because that would take up a lot of memory and cause performance issues. This is a very common pattern in deep learning: you always train in batches, maybe of 100 or 500 images, depending on the size of your system. In this case we ask for 100 images at a time, and only the training images, because we use the training data for training and the test data for testing. You are probably familiar with this from machine learning, but in case you are not: in machine learning in general, not just deep learning, you split the available data into a training data set and a test data set, train on the training set, and then use the test set to check the validity or accuracy of the model. Notice that we are calling an MNIST helper function here, mnist.train.next_batch; this is the advantage of using the MNIST database, because it provides some very nice, readily available helper functions. Otherwise we would have had to write our own code to fetch the data in batches, which is itself a lengthy exercise, and avoiding it is why we use MNIST for the initial learning phase. The fetch puts the images into x and the labels into y, and then you use this batch of 100 images to run the training step.
What sess.run does here is run the training operation: it passes the batch of images through the neural network, finds the output, and since the output will initially be wrong, that feedback goes back to the network and all the W's and B's get updated, and this repeats until it reaches 1,000 iterations. In this case the exit criterion is 1,000 iterations, but you could also specify something like a target accuracy as the exit criterion. Effectively the feedback says that a particular image was wrongly predicted, so the weights and biases need updating; that is given to each neuron and repeated for a thousand iterations, and typically by the end of those iterations the model will have learned to recognize these handwritten images, though obviously not with 100% accuracy. Once that is done, you test the accuracy of the model using the test data set, which is what we are doing here. The code may look a little complicated if you are seeing it for the first time, because you need to understand the various TensorFlow methods, but all it does is compare the predicted output with the actual output: you take your test data, find the actual value and the predicted value, check whether they are equal using tf.equal, count how many are correct, and from that the accuracy is calculated. So let's run it; this whole thing is in one cell, so we run it in one shot, and it may take a little while. Not bad: it has finished the thousand iterations, and the output we see is the accuracy, which is around 91%. That is pretty good for such a short exercise, but in real life it is probably not sufficient, so there are other ways to increase the accuracy, which we will see in some later tutorials, for example changing hyperparameters such as the number of neurons or the number of layers, so that the accuracy can be pushed beyond 90%.
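For reference, here is the demo pulled together into one script. It follows the TensorFlow 1.x API used in the walkthrough (placeholders, sessions, and the bundled MNIST helper), so it needs TF 1.x, or tf.compat.v1, to run; the learning rate of 0.5 is a common choice for this example rather than a value stated in the demo, and the final accuracy will land somewhere around the 91% mentioned.

```python
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Consolidated version of the demo walked through above. It uses the
# TensorFlow 1.x API (placeholders, sessions, the bundled MNIST helper),
# so it runs on TF 1.x as written.

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# placeholders feed data in; variables (weights, biases) change during training
x = tf.placeholder(tf.float32, [None, 784])        # flattened 28x28 images
y_true = tf.placeholder(tf.float32, [None, 10])    # one-hot labels

W = tf.Variable(tf.zeros([784, 10]))               # one weight per pixel per output neuron
b = tf.Variable(tf.zeros([10]))                    # one bias per output neuron

# the graph: weighted sum feeding a softmax layer of 10 output neurons
logits = tf.matmul(x, W) + b

# loss (cross entropy) and the gradient descent optimizer
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)                                        # initialize W and b
    for _ in range(1000):                                 # 1,000 training iterations
        batch_x, batch_y = mnist.train.next_batch(100)    # batches of 100 images
        sess.run(train_step, feed_dict={x: batch_x, y_true: batch_y})

    # evaluate on the test set: compare predicted and actual digits
    correct = tf.equal(tf.argmax(logits, 1), tf.argmax(y_true, 1))
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
    print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                        y_true: mnist.test.labels}))
```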
So guys, let's start with a beginner-level project. The first project we will encounter is home value prediction. This project aims to develop a predictive model to estimate the value of residential properties. The model will analyze features such as location, square footage, number of bedrooms and bathrooms, age of the property, and other relevant factors, and by leveraging historical property data it will be able to provide accurate home value predictions, which can be useful for real estate agents, buyers, and sellers. The programming language we are going to use is Python, the machine learning libraries will be scikit-learn, TensorFlow, and Keras, for data handling we have pandas and NumPy, and for visualization we will use Matplotlib and Seaborn.

Now, what is the approach? The first step is data collection: collect historical property data from sources like Zillow or Realtor.com, or from public real estate datasets such as the Zillow Home Value Prediction dataset on Kaggle, and ensure the dataset includes features like location (latitude and longitude), square footage, number of rooms, year built, property type, and previous sales. The next step is data cleaning: handle missing values using imputation techniques or by removing incomplete records, remove outliers that may skew the model's predictions, and normalize or standardize the data to ensure consistency. The third step is feature engineering: create new features such as proximity to schools, crime rates, and access to public transportation; encode categorical variables (for example property type and location) using techniques like one-hot encoding; and generate interaction features that capture relationships between existing features. The fourth step is model selection: use regression models like linear regression, random forests, gradient boosting, or neural networks, and experiment with different models to identify the best-performing one. In the next phase, model training and evaluation, split the dataset into training and test sets, train the models on the training set, and evaluate their performance on the test set using metrics like RMSE, the root mean squared error; you can use cross-validation to ensure the model's robustness and avoid overfitting. The sixth step is hyperparameter tuning: optimize the model's hyperparameters using techniques such as grid search or random search to improve accuracy. And if you are looking to deploy your model, you can build a web interface using Flask or Django that lets users input property features and get predictions, and deploy the model on a cloud platform like AWS for scalability. As for the complexity level, this is a beginner-level project.
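A minimal sketch of that workflow, assuming a hypothetical CSV and column names (square_feet, bedrooms, bathrooms, age, price) standing in for whichever dataset you download:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Sketch of the home-value workflow. The file name and the column names
# (square_feet, bedrooms, bathrooms, age, price) are hypothetical
# placeholders for whatever dataset you actually collect.

df = pd.read_csv("home_prices.csv").dropna()              # data collection + basic cleaning

features = ["square_feet", "bedrooms", "bathrooms", "age"]
X, y = df[features], df["price"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=200, random_state=42)  # one of the suggested models
model.fit(X_train, y_train)                                       # model training

rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5   # RMSE evaluation metric
print("RMSE:", rmse)
```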
Now let us move on to one more project, music genre classification and generation. This is also a beginner-level project. It aims to develop a system that can classify music tracks into different genres and generate new compositions within a specified genre; the goal is to build a model that analyzes audio features to categorize music and uses deep learning techniques to create new music. This project introduces concepts of audio processing, deep learning, and generative models. The programming language will be Python; for audio processing we will use librosa; for machine learning libraries, TensorFlow, Keras, or PyTorch; for data handling, pandas and NumPy; for visualization, Matplotlib and Seaborn; and for the dataset you can use the GTZAN music genre dataset or the Free Music Archive. In the first phase, data collection, obtain datasets containing music tracks and their corresponding genres from sources like GTZAN and the Free Music Archive, and ensure the dataset includes diverse genres and a substantial number of tracks per genre. Next comes data preprocessing: use librosa to load and preprocess the audio files, including feature extraction such as Mel-frequency cepstral coefficients (MFCCs), chroma features, and spectral contrast, and normalize the extracted features to ensure consistent input for the models. For feature engineering, extract additional features from the audio files such as tempo, beat, and zero-crossing rate, and create a feature matrix representing the extracted audio features. Then go for model selection: use convolutional neural networks or recurrent neural networks for the genre classification, and split the dataset into training and test sets. For model training and evaluation, train the chosen classification model on the training set and evaluate its performance on the test set using metrics like accuracy, precision, recall, and F1 score, and use confusion matrices to understand classification performance across genres. For the music generation side, you can use generative adversarial networks or recurrent networks such as LSTMs (long short-term memory networks), train the generative model on sequences of audio features, and evaluate the generated music through listening tests and objective metrics like the Inception score or Fréchet Audio Distance. For hyperparameter tuning, optimize the models' hyperparameters using techniques like grid search or random search to improve performance, and for deployment you can put these models on a cloud platform like AWS. The complexity of this project is at the beginner level.

Now let us move on to the intermediate-level projects. The next project is sentiment analysis of Twitter data. This project aims to develop a sentiment analysis model that can classify each tweet as positive, negative, or neutral; the goal is to analyze public sentiment on various topics or events using natural language processing techniques. Here we will use Python as the programming language, NLP libraries such as NLTK and spaCy, machine learning libraries such as scikit-learn, TensorFlow, and Keras, data handling libraries pandas and NumPy, visualization with Matplotlib and Seaborn, and the Twitter API for data collection. For data collection, use the Twitter API to collect tweets based on specific hashtags, keywords, or topics, and extract relevant fields like the tweet text, user information, and timestamp. For data preprocessing, clean the tweet text by removing special characters, links, mentions, hashtags, and stop words, then tokenize the text and perform lemmatization or stemming to reduce the words to their base form. For feature engineering, convert the cleaned text data into a numerical representation using TF-IDF, bag-of-words, or word embeddings. For model selection, choose a classification algorithm such as logistic regression, Naive Bayes, or an LSTM, and split the dataset into training and test sets. For model training and evaluation, train the selected model on the training set and evaluate its performance on the test set using metrics like accuracy, precision, recall, and F1 score, and use cross-validation to ensure the model's robustness. For hyperparameter tuning, optimize the model's hyperparameters using grid search or random search to improve performance, and for deployment, which is optional, you can deploy the model on AWS for real-time sentiment analysis. The complexity level of this project is intermediate.
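As a starting point, here is a tiny sketch of the core sentiment pipeline, TF-IDF features feeding a logistic regression classifier; the tweets and labels are made-up stand-ins for data you would pull from the Twitter API.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy sentiment pipeline: TF-IDF features into a logistic regression
# classifier. The tweets and labels are made-up placeholders for data
# collected through the Twitter API and cleaned as described above.

tweets = ["love this phone", "worst service ever", "it is okay I guess",
          "absolutely fantastic update", "really disappointed today", "nothing special about it"]
labels = ["positive", "negative", "neutral", "positive", "negative", "neutral"]

vectorizer = TfidfVectorizer(stop_words="english")   # text -> numerical features
X = vectorizer.fit_transform(tweets)

clf = LogisticRegression(max_iter=1000)
clf.fit(X, labels)                                   # model training

# classify a couple of unseen tweets
new_tweets = ["I really love this update", "this service is terrible"]
print(clf.predict(vectorizer.transform(new_tweets)))
```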
So guys, our next project is customer segmentation using K-means clustering. This project aims to segment customers into distinct groups based on their purchasing behavior and demographic information; the objective is to understand the customer segments and tailor marketing strategies accordingly. We will use Python, scikit-learn for machine learning, pandas and NumPy for data handling, Matplotlib and Seaborn for visualization, and e-commerce transaction data as the data source. For data collection, obtain a dataset of e-commerce transactions that includes customer demographics, purchase history, and product information. For data preprocessing, clean the data by handling missing values and outliers. For feature engineering, create features like total purchase amount, purchase frequency, and recency of purchases. Then proceed to model selection: use K-means clustering to segment the customers into distinct groups, and determine the optimal number of clusters using methods like the elbow method or the silhouette score. For model training and evaluation, train the K-means model on the processed dataset and evaluate the quality of the clusters by analyzing intra-cluster and inter-cluster distances; you can also visualize the clusters using techniques like PCA (principal component analysis) or t-SNE. After tuning the model, interpret the characteristics of each segment and develop targeted marketing strategies based on each segment's behavior and preferences. Deployment is optional: you can build a dashboard using Flask or Django to visualize the customer segments and track marketing campaigns. The complexity of this project is intermediate, and for the dataset you can use the customer segmentation dataset available on the Kaggle platform.
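A short sketch of the clustering steps, assuming a hypothetical customers.csv with the engineered columns total_spend, purchase_frequency, and recency_days:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Sketch of the customer segmentation steps above. The file and column
# names (total_spend, purchase_frequency, recency_days) are hypothetical;
# substitute the engineered features from your own transaction data.

df = pd.read_csv("customers.csv")
features = df[["total_spend", "purchase_frequency", "recency_days"]].dropna()

scaled = StandardScaler().fit_transform(features)        # normalize before clustering

# try a few cluster counts and compare silhouette scores (elbow-style check)
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(scaled)
    print(k, silhouette_score(scaled, model.labels_))

# fit the chosen model and attach the segment label to each customer
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(scaled)
df.loc[features.index, "segment"] = kmeans.labels_
print(df["segment"].value_counts())
```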
The third intermediate-level project is building a chatbot with Rasa. This project aims to build an intelligent chatbot using the Rasa framework; the chatbot will be capable of understanding user queries and providing appropriate responses, making it useful for customer support, personal assistance or information retrieval. It will be Python based, with NLP libraries like Rasa, NLTK and spaCy, machine learning libraries like scikit-learn, TensorFlow and Keras, and data handling libraries like pandas and NumPy. Start with data collection: gather conversation data and FAQs from the target domain and annotate the data to create training examples for the chatbot. Next comes data preprocessing: clean the text by removing special characters and normalizing it, then tokenize and lemmatize the text to prepare it for training. Third is model training: use Rasa's NLU component to train a model for intent recognition and entity extraction, and define dialogue management policies to handle the different conversation flows. Then integrate the Rasa NLU and Core components to build the complete chatbot and connect it to a messaging platform like Facebook Messenger. For testing, try the chatbot with a variety of inputs to make sure it handles each scenario appropriately. Collect user feedback and conversation logs to keep training the model, and retrain it periodically with new data to evaluate how it is performing; for tuning, focus on the scenarios where it fails to handle the input well. For deployment you can use AWS, and for a dataset Rasa's open-source data is a good starting point. This is an intermediate-level project. Now let us move on to the advanced-level projects. The first one is movie similarity from plot summaries. This project aims to develop a system that recommends movies similar to a given movie based on their plot summaries: by analyzing the textual content of the plot summaries, the model identifies similarities and suggests movies with similar themes, storylines or genres. The project introduces natural language processing, text similarity measures and recommendation systems. We will use Python, NLP libraries like NLTK and spaCy, machine learning libraries like scikit-learn, data handling libraries like pandas and NumPy, a visualization library such as Matplotlib, and IMDb or Kaggle as the dataset source. The process follows the same pattern as the earlier projects: data collection, data cleaning, feature engineering and model selection, then model training, evaluation, hyperparameter tuning and finally deployment. Do some research on how to extract and prepare the plot summaries; Kaggle and Towards Data Science are good places to read more about this project. This is an advanced-level project.
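For the movie-similarity project, the core idea — vectorize the plot summaries with TF-IDF and rank other movies by cosine similarity — can be sketched in a few lines; the three plot summaries below are invented placeholders for a real IMDb or Kaggle dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# toy plot summaries; a real run would load these from an IMDb/Kaggle dataset
plots = {
    "Movie A": "a retired hitman returns for one last job in a city ruled by crime",
    "Movie B": "an assassin comes out of retirement to take revenge on a crime syndicate",
    "Movie C": "two friends road-trip across the country and learn about themselves",
}
titles = list(plots)
tfidf = TfidfVectorizer(stop_words="english").fit_transform(plots.values())
sim = cosine_similarity(tfidf)

# rank all movies by similarity to the query movie
query = "Movie A"
scores = sorted(zip(titles, sim[titles.index(query)]), key=lambda t: t[1], reverse=True)
print(scores)  # Movie B should rank closest to Movie A
```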
The next advanced project is an image segmentation project for brain tumor prognosis. This is a great project to put on your portfolio. It aims to develop an image segmentation model to identify and delineate brain tumors from MRI scans; the goal is to accurately segment the tumor regions, which can aid in prognosis, treatment planning and surgical intervention. The project introduces intermediate-to-advanced concepts in computer vision, deep learning and medical image analysis. We will use Python, deep learning libraries like TensorFlow, Keras and PyTorch, image processing libraries like OpenCV and scikit-image, data handling libraries like pandas and NumPy, and Matplotlib and Seaborn for visualization. The process starts, as before, with data collection, followed by data preprocessing. For model selection you can use CNN-based models for the image segmentation task. For model training and evaluation, split the dataset into training, validation and testing sets, then evaluate the model with appropriate metrics: for segmentation you can use the Dice coefficient, intersection over union (IoU) or accuracy. Then do hyperparameter tuning with grid search or random search, as discussed earlier, and finally deploy the model on AWS. The complexity level of this project is advanced. Now we have come to the final project: the impact of climate change on birds. This is another project worth adding to your resume. It aims to analyze the impact of climate change on bird populations and migration patterns by examining various climatic factors and their correlation with bird-species data; the project seeks to predict how climate change might affect bird behavior and distribution, and it introduces advanced concepts like time series analysis and environmental data modeling. For data analysis we will use pandas and NumPy, for machine learning scikit-learn and TensorFlow, for visualization Matplotlib and Plotly, for geospatial work GeoPandas and Folium, and for data we will use public datasets on bird observations and climate, such as eBird. First, data collection: gather bird observations from a source like eBird, which provides extensive records of bird sightings, and collect climate data from NOAA, including temperature, precipitation and other relevant climatic factors over time. Then proceed through data preprocessing, feature engineering, model selection, model training, evaluation, hyperparameter tuning and finally deployment. Research which models to use: time series models like ARIMA are one option, and machine learning models such as random forests or gradient boosting can be used to predict the impact on bird populations, as in the sketch below. Use Google exhaustively to research this project; it will teach you a lot. Its complexity level is advanced.
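As a rough sketch of the modeling step for the bird project, assuming yearly climate features joined to bird counts — the column names and the synthetic data below are purely hypothetical — a random forest regressor could be fit like this:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# hypothetical yearly features joined from eBird counts and NOAA climate records
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "mean_temp": rng.normal(15, 2, 300),
    "total_precip": rng.normal(800, 100, 300),
    "habitat_index": rng.uniform(0, 1, 300),
})
# synthetic target just to make the sketch runnable
df["bird_count"] = 500 - 12 * df["mean_temp"] + 0.1 * df["total_precip"] + rng.normal(0, 20, 300)

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="bird_count"), df["bird_count"], random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
print(mean_absolute_error(y_test, model.predict(X_test)))
```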
So what is deep learning? Deep learning is a subset of machine learning, which itself is a branch of artificial intelligence. Unlike traditional machine learning models, which require manual feature extraction, deep learning models automatically discover representations from raw data. This is made possible through neural networks, particularly deep neural networks, which consist of multiple layers of interconnected nodes. These networks are inspired by the structure and function of the human brain: each layer transforms the input data into a more abstract and composite representation. For instance, in image recognition the initial layers might detect simple features like edges and textures, while the deeper layers recognize more complex structures like shapes and objects. One of the key advantages of deep learning is its ability to handle large amounts of unstructured data such as images, audio and text, which makes it extremely powerful for a wide range of applications. Stay tuned as we delve deeper into how these neural networks are trained, the types of deep learning models, and some exciting applications that are shaping our future. Types of deep learning: deep learning can be applied in supervised, unsupervised and reinforcement learning settings, using different methods for each. The first is supervised learning: the neural network learns to make predictions or classify data using labeled datasets. Both input features and target variables are provided, and the network learns by minimizing the error between its predictions and the actual targets, a process called backpropagation. CNNs and RNNs are common deep learning architectures used for tasks like image classification, sentiment analysis and language translation. The second is unsupervised learning: the neural network discovers patterns or clusters in unlabeled datasets without target variables, identifying hidden patterns or relationships within the data. Algorithms like autoencoders and generative models are used for tasks such as clustering, dimensionality reduction and anomaly detection. The third is reinforcement learning: an agent learns to make decisions in an environment to maximize a reward signal. The agent takes actions, observes the results and learns policies that maximize cumulative reward over time. Deep reinforcement learning algorithms like deep Q-networks and deep deterministic policy gradient are used for tasks such as robotics and gameplay. Moving forward, let's see what artificial neural networks are. Artificial neural networks (ANNs), inspired by the structure and function of human neurons, consist of interconnected layers of artificial neurons or units. The input layer receives data from external sources and passes it to one or more hidden layers; each neuron in these layers computes a weighted sum of its inputs and passes the result to the next layer. During training, the weights of these connections are adjusted to optimize the network's performance. A fully connected artificial neural network includes an input layer, one or more hidden layers and an output layer; each neuron in a hidden layer receives input from the previous layer and sends its output to the next layer, and this process continues until the final output layer produces the network's response. Moving forward, let's see the types of neural networks. Deep learning models can automatically learn features from data, making them well suited to tasks like image recognition, speech recognition and natural language processing. The most common architectures in deep learning are, first, feedforward neural networks (FNNs): these are the simplest type of neural network, where information flows linearly from the input to the output. They are widely used for tasks such as image classification, speech recognition and natural language processing (NLP).
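A minimal Keras sketch of such a feedforward (fully connected) network — the layer sizes, the 20-feature input and the three classes are chosen only for illustration, not taken from the lecture — looks like this:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# input layer -> two hidden layers -> output layer, all fully connected
model = Sequential([
    Dense(64, activation="relu", input_shape=(20,)),   # 20 input features, illustrative
    Dense(32, activation="relu"),
    Dense(3, activation="softmax"),                    # e.g. a 3-class classification task
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```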
The second is convolutional neural networks (CNNs): designed specifically for image and video recognition, CNNs automatically learn features from images, making them ideal for image classification, object detection and image segmentation. The third is recurrent neural networks (RNNs): these are specialized for processing sequential data such as time series and natural language. They maintain an internal state to capture information from previous inputs, making them suitable for tasks such as speech recognition, NLP and language translation. Now let's look at some deep learning applications. The first is autonomous vehicles: deep learning is changing the development of self-driving cars, where algorithms like CNNs process data from sensors and cameras to detect objects, recognize traffic signs and make driving decisions in real time, enhancing safety and efficiency on the road. The second is healthcare diagnostics: deep learning models are being used to analyze medical images such as X-rays, MRIs and CT scans with high accuracy, helping in the early detection and diagnosis of diseases like cancer, improving treatment outcomes and saving lives. The third is NLP: recent advances powered by deep learning models like Transformers (for example, ChatGPT) have led to more sophisticated, human-like text generation, translation and sentiment analysis; applications include virtual assistants, chatbots and automated customer service. The fourth is deepfake technology: deep learning techniques are used to create highly realistic synthetic media known as deepfakes; while this technology has entertainment and creative applications, it also raises ethical concerns around misinformation and digital manipulation. The fifth is predictive maintenance: in industries like manufacturing and aviation, deep learning models predict equipment failures before they occur by analyzing sensor data; this proactive approach reduces downtime, lowers maintenance costs and improves operational efficiency. Now let's look at some advantages and disadvantages of deep learning. On the downside, deep learning has high computational requirements, needing significant data and compute for training; it often requires large labeled datasets, which can be costly and time consuming to build; and it can overfit the training data, leading to poor performance on new, unseen data. On the upside, it delivers high accuracy, achieving state-of-the-art performance in tasks like image recognition and natural language processing; it performs automated feature engineering, discovering and learning relevant features from data without manual intervention; and it is scalable, able to handle large, complex datasets and learn from massive amounts of data. In conclusion, deep learning is a transformative leap in AI: by mimicking human neural networks it has changed healthcare, finance, autonomous vehicles and NLP.
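Before we leave the architecture overview, here is a minimal Keras sketch of the CNN pattern described above, convolution and pooling layers feeding dense layers; the image size, filter counts and class count are illustrative assumptions, not taken from the lecture.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# minimal CNN for, say, 64x64 RGB image classification (all sizes illustrative)
model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation="relu"),
    Dense(10, activation="softmax"),   # e.g. 10 classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```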
Imagine being able to create stunning, high-quality images in just a few seconds, without needing any artistic skills or expensive software. Sounds incredible, right? Thanks to advances in technology this is now a reality. These image generators are in high demand because they let anyone produce impressive visuals in a few clicks, and traditional tools are falling behind as the new generators produce images faster, with better quality and more creative flexibility. Popular tools like Midjourney and DALL·E can be pricey, with Midjourney costing around $10 per month and DALL·E around $20 per month. Luckily, I have found seven fantastic free alternatives that won't break the bank. Today I'm going to share these free Midjourney alternatives with you, show you the quality they produce and what makes each one unique, and answer the big question: how free are they? I'll break down each tool, highlight its strengths and weaknesses, and show you exactly how to use it. First up is Craiyon AI. If you're looking for a tool that lets you create images quickly and easily, Craiyon is a great choice. One of the best things about it is its simplicity: you don't need to sign up or log in to start generating images; you just type your prompt, choose a style and you're good to go. On the Craiyon interface, type a prompt such as "Vikings standing on a battleship", pick a style (art, photo, drawing and so on) and click Draw. Because the tool is free it shows a lot of ads, and generation takes about 60 seconds. The results appear along with prompt ideas, an inspiration feed and recent pictures other people have created. The image quality is not that great: the output is decent but not sharp, so you might need to upscale your favorites, and upscaling is part of the paid version (there's even an option to print your image on a t-shirt). There's also an expert mode with a negative-prompt field: with the same "Vikings standing on a battleship" prompt you can add "people" as a negative word to keep people out of the picture, and there's an option to remove the background. Overall I would rate Craiyon's image quality at 5.5 and its performance at 9, thanks to its speed and ease of use. That was Craiyon AI; let's move on to our next AI tool.
Next, let's talk about getimg.ai. This tool offers high-quality images in a variety of styles, making it a versatile choice for many different projects: you can pick from styles like photorealism, art and anime. For example, select photorealism and type a prompt like "mountains with birds and greenery". You can also choose the number of images to generate; the free version caps you at two images per generation, and to generate more at once you have to upgrade. You do need to sign up to use it, which is a minor inconvenience, but you get 100 free credits and each image generation costs just one credit, so you can experiment without worrying about running out too quickly. Click "create two images" and the results come back quickly; the quality is really good, and you can control the resolution, the aspect ratio and the styling, including contrast adjustments. What I really love about getimg.ai is the image quality and the wide range of styles, plus the ability to control the resolution and the number of images so you can manage your credits efficiently. There's also an Advanced section where you can add a negative prompt and drag and drop a reference image. For example, paste in a photo of Elon Musk as the reference, give the prompt "Vikings with powers", and it generates Viking-style images loosely based on that reference; the face isn't an exact match, but the result is pretty cool. I would rate the quality at 6, a bit better than the previous tool but with room to do better, and the performance at 7 thanks to its user-friendly interface and speed.
Next we have Runway AI. You've probably heard of this tool: it's a versatile powerhouse offering not just image generation but also video creation, audio generation and more, which makes it an excellent all-in-one tool for creatives. For image generation you get 125 free credits, with each image costing around five credits, and you do have to log in first. In the tools section, alongside options like generative audio and video, choose text to image, set the aspect ratio, resolution and style, and enter a prompt, for example "village lightings", with a style such as 3D cartoon or anime. The quality of the generated images is very good. For both quality and performance I would rate it 6, mainly because of the higher credit cost per image: at around five credits per generation the free allowance runs out quickly, and upgrading is comparatively expensive. That was Runway AI; let's check out the next tool. Next on the list is Krea AI, which stands out with its real-time image generation and high degree of customization. You can adjust the AI strength, the aspect ratio and more to create intricate designs; it's a bit more complex, but the results are worth it, and it impressed me the first time I saw it. Click Generate and you'll see many models to choose from, such as cartoon, CGI, concept and HD. Pick the photo model, type a prompt like "field with sunflowers" and generate; as you adjust the on-canvas objects, the image updates in real time to match. Since it's free it does show ads, and there are other concepts such as cartoon to explore. The quality of the images is really impressive. I would rate the quality at 7 and the performance at 8, because generation is effectively instant and the real-time concept is unique; the only downside is the limit of 900 free images, so you have to be careful with it. Now let's
talk about the next AI tool. Next on the list is NightCafe AI, which is excellent for generating high-quality images in various styles: you can choose cinematic, realistic and anime styles among others, and it offers a range of models like DreamShaper and Stable Diffusion. To use it, click Create, choose a model such as DreamShaper, type a prompt like "cinematic sunset viewpoint", pick a preset style (NightCafe, realistic, anime, hyper-real, color painting and so on), choose the number of images, say four, and click Create; you also earn free credits as you go. With the NightCafe preset the quality of the four generated images is really impressive. For this generator I would rate the quality at 8 and the performance at 7.5, thanks to its high-quality outputs and the opportunities to learn from other creators. Next on the list is Leonardo AI, which is a personal favorite: powerful, with extensive customization options. You get 150 free credits per month, you can choose from various models and styles, and the tool even helps with prompt generation if you're unsure how to start. After clicking "let's get started" you land on an interface that lets you control dimensions, styles and prompt suggestions. Go to image generation (note that some quality settings require the premium version), type a prompt such as "underwater landscape" with the Dynamic style, and generate; this run costs 24 credits. The interface might feel overwhelming at first because of the number of options, but once you get the hang of it the creative possibilities are immense. The generated image quality is really good compared with the other tools, and the preset styles work well. I would rate the quality at 8 and the performance at 8.5 because of the customization options and the advanced settings; overall it comes out around 8.5 compared with the other generators. Finally, let's talk about my number-one AI image generator: ChatGPT-4o, which is now free for public use. This tool excels at generating highly detailed, specific, high-quality images; you can upload images, create variations and refine your prompts for precise results. ChatGPT-4o's image generation is simply outstanding, and it's quite simple to use: just type a prompt, for example "a hyper-realistic cat in the shape of butter on French toast", and see what kind of
image it generates for us. You can see the quality of the result; it's really amazing compared with the other AI tools, and all you have to do is write the prompt. I would rate both the quality and the performance at 9. That wraps up our list of the best free AI image generators: each of these tools offers unique features and capabilities, making them great alternatives to Midjourney. Welcome to the RNN tutorial, that is, the recurrent neural network. Let's start with the feedforward neural network. In a feedforward neural network, information flows only in the forward direction: from the input nodes, through the hidden layers (if any), to the output nodes. There are no cycles or loops in the network. In the diagram, the input layer feeds straight into the hidden layers, each node connects to the next hidden layer, which connects to the output layer, producing a predicted output; the input is usually referred to as x and the output as y. Decisions are based only on the current input: there is no memory of the past and no notion of the future. So why recurrent neural networks? The issue with a feedforward neural network is exactly that lack of memory or time: it doesn't know how to handle sequential data, it considers only the current input, and it cannot memorize previous inputs. If you have a series where something three steps back affects what's happening now, and where your output affects the next step, that matters, and a feedforward network ignores all of it. The solution is the recurrent neural network: take the feedforward picture, x at the bottom going to h and then to y, and add a value c in the middle. That extra loop memorizes what's happening in the hidden layers, and the hidden layers' outputs feed into the next step, so the output that goes to y also feeds back into the next prediction. This lets the network handle sequential data, considering the current input together with the previously received inputs. Let's also look at applications of the RNN. Image captioning: an RNN can caption an image by analyzing the activities present in it, for example "a dog catching a ball in mid-air". That's a hard problem; we have plenty of systems that can recognize a dog or a ball, but the RNN adds one more element, the action of actually catching the ball in mid-air. Time series prediction: any time series problem, like predicting the price of a stock in a particular month, can be tackled with an RNN, and we'll dive into that in our use case and look at some real stock data. One thing to know about analyzing stocks today is that it is genuinely difficult: counting every individual trade and fluctuation second by second, the New York Stock Exchange in the US produces somewhere in the neighborhood of three
terabytes of data a day. So we're only going to look at one stock, and even analyzing one stock is tricky; this will give you a head start, but don't expect to get rich off it immediately. Another application of the RNN is natural language processing: text mining and sentiment analysis can be carried out using RNNs. The phrase "natural language processing" means something very different from the same three words in a scrambled order, so the time sequence matters. In sentiment analysis, changing the word order can change the whole meaning of a sentence: if you just count the words you might get one sentiment, but if you look at the order they're in you can get a completely different one. "When it rains, look for rainbows; when it's dark, look for stars" are both positive sentiments, and that depends on the order in which the sentences unfold. Machine translation: given an input in one language, an RNN can translate it into a different language as output. If you've studied languages you know that in English you say "big cat", while in Spanish the adjective follows the noun, so getting the right word order, and all the parts of speech it implies, is essential. In the diagram, a person speaking English is translated into Chinese, Italian, French, German and Spanish, and tools like this mean that someone who is linguistically challenged can still travel and communicate without a language gap. So let's dive into what a recurrent neural network is. A recurrent neural network works on the principle of saving the output of a layer and feeding it back to the input in order to predict the output of the layer. That sounds a little confusing, but it makes more sense once we break it down. Usually we draw a forward-propagation network with an input layer, hidden layers and an output layer; with the recurrent neural network we turn that on its side, so x comes up from the bottom into the hidden layers and then to y. It's usually drawn in simplified form as x to h, with c as a loop on h, and then h to y, where a, b and c are the parameters. Looking more closely at h and how it works from left to right: c goes in and x goes in, y comes out and a new c comes out, and that c is based on the state at time t minus one. The usual way to write this is h_t = f_C(h_{t-1}, x_t): the new state h_t is a function, with parameter C, of the previous state h_{t-1} (the last h output) combined with the new input x_t, where x_t is the input vector at time step t.
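That recurrence can be sketched in a few lines of NumPy; the tanh nonlinearity and the tiny random dimensions below are the usual textbook choices, assumed here rather than stated in the lecture.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # h_t = tanh(W_xh @ x_t + W_hh @ h_prev + b_h): new state from current input and previous state
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# toy dimensions, purely illustrative: 3 input features, hidden state of size 4
rng = np.random.default_rng(0)
W_xh, W_hh, b_h = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), np.zeros(4)

h = np.zeros(4)                         # initial hidden state
for x_t in rng.normal(size=(5, 3)):     # a sequence of 5 input vectors
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
print(h)
```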
Now we need to cover the types of recurrent neural networks. The first and most common is one-to-one, with a single input and a single output; a one-to-one network is usually known as a vanilla neural network and is used for regular machine learning problems (vanilla as in the plain, basic flavor; it's slang rather than a formal term, but people will know what you mean). Then there is one-to-many, where a single input produces multiple outputs, as in the image captioning example, where the network describes not just a dog but a dog catching a ball in the air. A many-to-one network takes in a sequence of inputs; an example is sentiment analysis, where a given sentence is classified as expressing positive or negative sentiment, like the "if it rains, look for rainbows" example, where "rain" alone might count as negative if you were just adding up words. And there are many-to-many networks, which take in a sequence of inputs and generate a sequence of outputs, for example machine translation, where a lengthy English sentence comes in and sentences in other languages go out; a wonderful tool built on a very complicated set of computations, as any translator can tell you. One of the biggest things you need to understand when working with these networks is the vanishing gradient problem. While training an RNN, your gradient (slope) can become either too small or very large, and this makes training difficult. When the slope is too small, the problem is known as the vanishing gradient: information is lost through time, so when you train the network it effectively forgets, say, the third word in the sentence, or fails to follow the full logic of the sequence. When the slope grows exponentially instead of decaying, the problem is called the exploding gradient, and everybody runs into it with this kind of network. The consequences of these gradient problems are long training times, poor performance and bad accuracy, and, I'll add one more: on a lower-end computer your test run may simply lock up and give you a memory error. To see why the gradient matters, consider two examples where we want to predict the next word in a sequence: "the person who took my bike and ___ was a thief" and "the students who got into engineering with ___ were from Asia". The x values go in, the previous state is carried forward, and the error is backpropagated just as in any neural network, while the model tries to fill in the missing word. Consider the first example again:
to predict the next word in "the person who took the bike was ___ a thief", the RNN must remember the earlier context, in particular whether the subject was a singular or a plural noun: "was a thief" is singular, while "the students who got into engineering with ___ were from Asia" needs a plural. It can be difficult for the error to backpropagate all the way to the beginning of the sequence to learn this, and when you run into the gradient problem you need a solution. For the exploding gradient there are three main remedies, depending on what's going on. One is identity initialization: initialize so the network passes through the important information rather than amplifying everything it sees. The second is truncated backpropagation: instead of propagating the error through the entire sequence, truncate how far back it is sent. The third is gradient clipping: during training, clip the gradient so it cannot exceed a chosen threshold. For the vanishing gradient, the options are careful weight initialization, similar in spirit to identity initialization but adding weights so the network can pick up different aspects of the input; choosing the right activation function, which is a big one (we won't go deep into activation functions here, but there are many choices and the wrong one can kill the gradient); and finally long short-term memory networks, LSTMs, where the memory network itself can be enlarged so it carries more information forward.
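The gradient-clipping remedy mentioned above is often a one-liner in practice; as a hedged example, in Keras you can pass a clipping argument to the optimizer when compiling whatever model you have built (the tiny model below is just a stand-in).

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense
from tensorflow.keras.optimizers import Adam

# stand-in model: a small recurrent layer on 60-step univariate sequences
model = Sequential([SimpleRNN(32, input_shape=(60, 1)), Dense(1)])

# clipnorm rescales any gradient whose L2 norm exceeds 1.0 before the update is applied
# (clipvalue would instead cap each gradient element)
model.compile(optimizer=Adam(learning_rate=0.001, clipnorm=1.0), loss="mse")
```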
One of the most common problems in practice is long-term dependencies. Suppose we try to predict the last word in "the clouds are in the ___": you probably said "sky", and we don't need any further context; it's pretty clear the last word is going to be sky. Now suppose we try to predict the last word in "I have been staying in Spain for the last 10 years, I can speak fluent ___". Maybe you said Portuguese or French? No, you probably said Spanish. The word we predict depends on context from several words back: we need "Spain" to predict the last word, and the gap between the relevant information and the point where it is needed can become very large. LSTMs help us solve this problem. LSTMs are a special kind of recurrent neural network capable of learning long-term dependencies; remembering information for long periods of time is their default behavior. All recurrent neural networks have the form of a chain of repeating modules. In a standard RNN this repeating module has a very simple structure, such as a single tanh layer. LSTMs also have a chain-like structure, but the repeating module is different: instead of a single neural network layer there are four interacting layers communicating in a very special way. As you can see, the deeper we dig into this, the more complicated the diagrams get. Note that you have x at time t-1, x at t and x at t+1 coming in, h at t-1 and h at t coming in and h at t+1 going out, and the tanh appears in two different places: when computing the step at t+1 you get the tanh contribution from x at t, and you also carry the value coming from x at t-1. In short, each layer not only propagates into the next layer and back into itself, it also feeds into the layer after that; as you stack these up the model grows in memory and in the resources it takes, but it is a very powerful tool for long sequential inputs like the sentences we just looked at. There are three steps of processing in the LSTM. First, forget the irrelevant parts of the previous state: words like "is" or "as" don't usually carry much meaning unless we're tracking something like whether the noun is plural. Second, selectively update the cell state values, keeping only the ones that reflect what we're working on. Third, output only certain parts of the cell state, limiting what goes out. Let's dig a little deeper. Step one decides how much of the past to remember: the first step in the LSTM is to decide which information to omit from the cell at that particular time step, and this is decided by a sigmoid function that looks at the previous state h_{t-1} and the current input x_t. In symbols, f_t = sigmoid(W_f · [h_{t-1}, x_t] + b_f), where, as in any neural network, there is a bias term; f_t is the forget gate, which decides which information from the previous time step to delete because it is not important.
Consider an LSTM fed with the following inputs from the previous and present time steps. The previous input was "Alice is good in physics. John, on the other hand, is good in chemistry." The current input is "John plays football; he told me yesterday over the phone that he had served as captain of his college football team." The forget gate realizes there may be a change of context after encountering the first full stop, compares it with the current input sentence x_t, and sees that the next sentence talks about John, so the information about Alice is deleted and the position of the subject is vacated and assigned to John. We have weeded out a whole block of information and are only passing along information about John, since that is now the topic. Step two decides how much this unit should add to the current state. The second layer has two parts: a sigmoid function and a tanh. The sigmoid decides which values to let through (0 or 1), and the tanh assigns a weight to the values that pass, setting their level of importance between -1 and 1. The two formulas are i_t = sigmoid(W_i · [h_{t-1}, x_t] + b_i) and C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C); i_t is the input gate, which determines which information to let through based on its significance at the current time step. If this seems a little complicated, don't worry: most of the programming is already done for us when we get to the case study, but understanding what's inside matters when you're deciding what settings to use. Note also how similar this looks to a forward-propagation network: a weighted value plus a bias, the essential step in any neural network layer. From a human standpoint, consider the current input "John plays football; he told me yesterday over the phone that he had served as captain of his college football team." The input gate keeps the important information, that John plays football and was captain of his college team, while "he told me over the phone yesterday" is less important and is forgotten. That process of adding selected new information is done via the input gate, and thinking of it in human terms like this is also how we try to train and understand these networks.
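For reference, here are the gate formulas collected in standard notation; the cell-state update line is the standard combination the narration implies rather than spells out, and the output-gate pair belongs to step three, which comes next. W are weight matrices, b biases, \sigma the sigmoid, and \odot element-wise multiplication.

```latex
f_t = \sigma\!\left(W_f\,[h_{t-1},\, x_t] + b_f\right) \\
i_t = \sigma\!\left(W_i\,[h_{t-1},\, x_t] + b_i\right) \\
\tilde{C}_t = \tanh\!\left(W_C\,[h_{t-1},\, x_t] + b_C\right) \\
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t = \sigma\!\left(W_o\,[h_{t-1},\, x_t] + b_o\right) \\
h_t = o_t \odot \tanh(C_t)
```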
Finally, step three decides what part of the current cell state makes it to the output. First we run a sigmoid layer, which decides which parts of the cell state make it to the output; then we put the cell state through tanh, to push the values to between -1 and 1, and multiply that by the output of the sigmoid gate. In symbols, o_t = sigmoid(W_o · [h_{t-1}, x_t] + b_o), going back one step in time for h_{t-1} and including the bias as usual, and h_t = o_t · tanh(C_t); o_t is the output gate, which allows the retained information to influence the output at the current time step. Consider predicting the next word in "John played tremendously well against the opponent and won for his team; for his contributions, brave ___ was awarded player of the match." There could be many choices for the blank, but "brave" is an adjective and adjectives describe a noun, so John is the best output after "brave": "team" isn't really the subject we're talking about, and "brave contributions" or "brave match" don't fit, so the network learns that John is the answer. Thumbs up for John, awarded player of the match. And now we jump into my favorite part, the case study: a use-case implementation of LSTM. We're going to predict stock prices using an LSTM network: based on stock price data from 2012 to 2016, we'll try to predict the stock price in 2017. This will be a narrow dataset, not the whole market; remember, counting every trade and fluctuation, the New York Stock Exchange generates roughly three terabytes of data per day, so we'll limit ourselves to some basic, fundamental information. Don't expect to get rich off this today, but it's a solid first step toward processing something like stock prices, a very real use of machine learning in today's markets. We're going to import our libraries, import the training set and get the scaling going; if you've watched any of our other tutorials, a lot of these pieces will look familiar, because the setup is very similar. We'll be using Anaconda with a Jupyter Notebook. In Anaconda Navigator, under Environments, I've set up a Keras environment on Python 3.6; a nice thing about the newer Anaconda is its interface for managing different Python versions and environments, and this works on both Ubuntu Linux and Windows. From an environment you can open a terminal window and use pip to install the modules you need; we've already pre-installed them here, so we don't need to do that. And of course you don't have to use Anaconda or Jupyter; use whatever Python IDE you like. I'm just a big fan of this setup because it keeps all my projects
separate. On this machine I've set up an environment specifically for Keras, since we'll be working with Keras on top of TensorFlow. Back on the Home tab, with that environment loaded, launch Jupyter Notebook. In Jupyter you can click New to create a new Python 3 notebook in that environment, and under File you can rename it; I've called this one "RNN stock". I've already set things up ahead of time, like the old cooking shows where everything is prepped so you're not waiting for it to load. Let's start diving into the code. The first half is a bit anticlimactic when you hit Run, because we're just importing numpy as np (the numerical Python package, for numpy arrays), the Matplotlib pyplot library (for plotting at the end) and pandas as pd (for our dataset); running the cell does nothing visible except load those modules. One quick note on how the work breaks down: a large share of the code, maybe half, is data preparation, and that overlaps with Keras because Keras has a lot of preprocessing conveniences pre-built, which is really nice. The last part is evaluation, which is the next biggest piece: if you're presenting to shareholders or a classroom, that's what shows everyone what the model did and what it looks like. With some packages the modeling itself might be only three lines; because Keras is cutting edge and you load the individual layers yourself, there are a few more lines here, and it's a bit more robust. The next block of code reads the data: dataset_train is loaded with pandas' read_csv from Google_Stock_Price_Train.csv, and beneath that we create training_set from dataset_train.iloc, which we'll unpack in a moment.
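The lines being described look like this (the column selection is unpacked in a moment):

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# load the training file that sits next to the notebook
dataset_train = pd.read_csv('Google_Stock_Price_Train.csv')
# keep all rows and only column 1 ("Open") as a 2-D numpy array
training_set = dataset_train.iloc[:, 1:2].values
```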
Let's look at the actual file. Ignore the extra files in the folder; I already have separate train and test files. That's worth noticing, because splitting the data is normally part of preprocessing: we hold out around 20% of the data for testing and train on the rest, so we can find out how good the network is. I've opened the training file in a plain text editor, though you could use Excel or any other spreadsheet. It's comma-separated with columns Date, Open, High, Low, Close and Volume, the most basic stock information you can look at, all free to download; in this case it's Google's stock price, which is why the file is called Google_Stock_Price_Train, and the values start at the beginning of 2012. In the read_csv call, the original notebook pointed at home/ubuntu/downloads/Google_Stock_Price_Train; I've removed the path because the CSV sits in the same folder where I'm running the code, so no full path is needed. Then we pull out specific values with pandas' iloc: the first index says we take all the rows, and the second says we take columns 1 to 2. Remember that columns start at zero, so column 0 is the Date and column 1 is Open; you can certainly extrapolate and do this on all the columns, but for this example we'll keep it narrow so we can focus on just a few key aspects of the stock. Run the cell, and again nothing visible happens, since we're still just loading the data and setting it up. Now that the data is loaded, we want to apply feature scaling, using MinMaxScaler imported from sklearn.preprocessing (scikit-learn). The reason is bias in the data: if one stock has a value around 100 and another around 5, you get a bias between them, so we treat 100 as the max and 5 as the min and squish everything in between into the 0-to-1 range; it's essentially a simple rescaling, subtract 5 and divide by 95 in that example, so 100 maps to 1 and 5 maps to 0.
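In code, the scaling step described here comes down to:

```python
from sklearn.preprocessing import MinMaxScaler

# squish every price into the 0-1 range; keep sc so the test set can be scaled the same way later
sc = MinMaxScaler(feature_range=(0, 1))
training_set_scaled = sc.fit_transform(training_set)
```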
Now that we've loaded our data we want to scale it, what they call feature scaling, and for that we pull in MinMaxScaler from sklearn.preprocessing, which is scikit-learn. When you look at this, remember that we want to get rid of biases in our data, so if you have something with a really high value, let's just draw a quick graph: say one stock has a value of 100 and another stock has a value of 5, you start to get a bias between the different stocks. So we say okay, 100 is going to be the max and 5 is going to be the min, and everything else falls in between, and then we change it so we just squish it down, I like the word squish, so it's between zero and one: 100 maps to 1 and 5 maps to 0, and it's usually just simple arithmetic, you subtract 5 and then divide by 95, so whatever the value is becomes (value - 5) / 95. Once we've created our scaler and told it the range is going to be from 0 to 1, we take our training set and create training_set_scaled by calling fit_transform on the scaler object sc. We can reuse that same sc object later on our testing set, because remember, we also have to scale the test data when we go to test our model and see how it works. We'll click run again, and there's still no output, because we're just setting up variables. Okay, so next we create the data structure with 60 time steps and one output. First, note that we're using 60 time steps, and that is where this value here comes in. We create our X_train and y_train variables and set them to empty Python lists; it's important to remember what kind of array we're working with. Then we loop with for i in range(60, 1258), and there are our 60 time steps. The reason we start at 60 is that there's nothing below index 60 to look back on: if we're going to use 60 time steps, we have to start at 60 because each window includes everything underneath it, otherwise you'll get an index error. Then we append to X_train the scaled values, which sit between 0 and 1, from i - 60 up to i. When i equals 60, i - 60 is 0, so the first window runs from index 0 up to (but not including) 60, the next from 1 to 61, then 2 to 62, and so on; let me just circle that part right here. I said 0 to 60 earlier, and that's not quite right, because the end of the slice isn't counted: it starts at zero, so it's a count of 60 values, indices 0 through 59, which is important to keep straight. The second part of the index is the comma zero right there, which means we only take the open value; I know we pulled in columns one and two, but the slice end isn't counted, so it's just the open value we're looking at, just open. Then finally we have y_train appending training_set_scaled at [i, 0], and if you remember, the window was indices 0 through 59, so there are 60 values in it, and i down here is number 60. So we're creating one array filled with windows of indices 0 to 59, and over here the value at index 60 goes into y_train, it gets appended on there, and then this just marches upward, window after window, all the way up to 1258, which is where that value comes in, that's the length of the data we're loading. So we've loaded two arrays: one filled with 60-value windows, and one that holds just the single value each window should predict. You want to think of this as a time sequence.
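Continuing from the loading snippet above, here is a sketch of the scaling and windowing step. One small liberty, flagged so it doesn't surprise anyone: I fit the scaler on just the Open column, because only Open is used downstream and that keeps the later transform and inverse_transform calls lined up, whereas the walkthrough pulled in both Open and High:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Squish values into the 0-1 range to avoid scale bias
sc = MinMaxScaler(feature_range=(0, 1))
# Fit on the Open column only (column 0 of training_set) -- see note above
training_set_scaled = sc.fit_transform(training_set[:, 0:1])

# Sliding windows: 60 past opens in, the next open out
X_train, y_train = [], []
for i in range(60, 1258):                             # 1258 rows in the training file
    X_train.append(training_set_scaled[i - 60:i, 0])  # indices i-60 .. i-1
    y_train.append(training_set_scaled[i, 0])         # index i is the target
```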
Here's my open, open, open, open: what's the next one in the series? We're looking at the Google stock, and each time it opens we want to know what the next open is: 0 through 59, what's 60? 1 through 60, what's 61? 2 through 61, what's 62? And so on, going up. Once we've filled those in our for loop, we set X_train and y_train equal to np.array(X_train) and np.array(y_train), converting them back into NumPy arrays so we can use all the tools we get with NumPy, including reshaping. So we take our X_train and reshape it. What the heck does reshape mean? It means we have an array, so many rows by 60 wide, and when you call X_train.shape[0] you get one dimension and X_train.shape[1] gets the other, and we're just making sure the data is formatted correctly. We use those to confirm it's 60 wide by, where's that value, 1258 minus 60, which is 1198 rows, so the data is grouped into 1198 windows of 60, and then the one on the end means a single feature per step. When you're dealing with shapes in NumPy, you can think of them as layers: the innermost layer needs to be one value, like the leaf of a tree, where this is the branch, and then it branches out some more and then you get the leaf. That's where np.reshape comes in, using the existing shapes to form the new one. We'll run this piece of code, and again there's no real output. Then we import the different Keras modules we need: from the Keras models we import the Sequential model, because we're dealing with sequential data, and we bring in three layer types, Dense, LSTM, which is what we're focusing on, and Dropout. We'll discuss these three layers more in just a moment, but with the LSTM you do need the Dropout, and the final layer will be the Dense. Let's run this to import our modules, and you'll see we get something that looks like an error, but if you read it closely it's not actually an error, it's a warning. What does the warning mean? These come up all the time when you're working with cutting-edge modules that are constantly being updated, and we're not going to worry too much about it: all it's saying is that the h5py module, which Keras uses, is going to be updated at some point, and if you're running newer stuff on Keras and you start updating your Keras install, you'd better make sure your h5py is updated too, otherwise you'll hit an error later on. You could just run an update on h5py now if you wanted to; not a big deal, and we're not going to worry about that today.
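As a sketch, the conversion, reshape, and imports we just talked through look roughly like this (newer installs may prefer tensorflow.keras over the standalone keras package):

```python
# Back to NumPy arrays so we can reshape them
X_train, y_train = np.array(X_train), np.array(y_train)

# Keras LSTMs expect (samples, timesteps, features): here (1198, 60, 1)
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))

# The Sequential model plus the three layer types used below
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout
```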
Now, I said we were going to jump in and start looking at what those layers mean, and I meant that. We're going to start off by initializing the RNN, and then we'll add the layers in, and you'll see we have LSTM then Dropout, LSTM then Dropout, LSTM then Dropout. What the heck is that doing? Let's explore it. We start by initializing the RNN: regressor = Sequential(), because we're using the sequential model, and we run that to load it up. Then we start adding our LSTM layers with some dropout regularization, and right there is the cue: dropout regularization. If we go back and remember the exploding gradient problem, that's what this is about: the dropout drops out part of the network on each pass, so we're not just shoving a huge amount of data through every node of the network. So let's add this in; I'll run it, and since we had three of them, let me put all three in and then we can go back over them. There's the second one, let's put one more in, and actually two more, I said one more but it's two, and then one more after that, and as you can see, each time I run these there's no output. So let's take a closer look at what's going on. We add our first LSTM layer with units=50; units is a positive integer and it's the dimensionality of the output space, what goes out into the next layer, so we might have 60 values coming in but we have 50 going out. We set return_sequences=True because it's sequence data and we want that flowing through, and then you have to tell it what shape the input is. We already know the shape from X_train, so input_shape=(X_train.shape[1], 1), which makes it really easy: you don't have to remember whether it was 60 or whatever else, you just let the data tell the regressor what shape to use. Then we follow the LSTM with a Dropout layer, and understanding the Dropout layer is kind of exciting, because one of the things that can happen is we overtrain our network: the neural network memorizes such specific data that it has trouble predicting anything outside that specific realm. To correct for that, each training pass we take 0.2, or 20%, of our neurons and just turn them off, so we only train the others, and it's random, so we don't keep overtraining the same nodes; they come back in the next training cycle and we randomly pick a different 20%. Then you'll see a difference as we go from the first layer to the second, third, and fourth: we don't have to specify the input shape anymore, because the previous layer outputs 50 units, and the next layer automatically knows that 50 is coming out of the last layer, through the dropout, and into it, and so on. So for the later layers we don't tell it the shape, it figures that out, and we keep the units the same, still 50 units with a sequence coming through. Then the next piece of code is what brings it all together: we add the output layer, the Dense layer. If you remember, we had the three layer types, LSTM, Dropout, and Dense, and the Dense layer brings it all down to one output: instead of putting out a sequence, at this point we just want to know the answer. Let's run that too. Notice that all we're doing here is setting things up one step at a time.
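Pulled together, the stacked model described above looks roughly like this sketch. One detail worth flagging: the last LSTM drops return_sequences so the final Dense layer receives a single vector rather than a sequence; the walkthrough doesn't spell that out, but it's how a stack like this is normally closed off:

```python
regressor = Sequential()

# First LSTM layer needs the input shape: 60 timesteps, 1 feature
regressor.add(LSTM(units=50, return_sequences=True,
                   input_shape=(X_train.shape[1], 1)))
regressor.add(Dropout(0.2))              # randomly silence 20% of units each pass

# Middle layers infer their input size from the 50 units of the previous layer
regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units=50))            # last LSTM returns a single vector
regressor.add(Dropout(0.2))

regressor.add(Dense(units=1))            # one predicted open price
```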
So far, way up at the top we brought in our data, we brought in our different modules, and we formatted the data for training: we have our X_train and y_train, the source data and the answers we already know, we reshaped it, we built our Keras model, we imported our different layers, and if you count them we have what, five total layers in here. Now, Keras is a little different from a lot of other systems, because a lot of other systems put this all in one line and do it automatically, but they don't give you options for how the layers interface or how the data comes in. Keras is cutting edge for this reason: even though there are a lot of extra steps in building the model, that has a huge impact on the output and on what we can do with these models. So we've brought in our Dense layer and the full model is put together in our regressor, which means we need to compile it and then fit the data: compiling brings all the pieces together, and then we run our training data through to actually train the regressor so it's ready to be used. Let's compile it, and I can go ahead and run that. If you've looked at any of our other tutorials on neural networks, you'll see we're using the Adam optimizer; Adam is well suited to big data, there are a couple of other optimizers out there, beyond the scope of this tutorial, but Adam will work just fine here. Then loss equals mean squared error: when we're training, this is what the loss is based on, how bad our error is, so we use mean squared error, with the Adam optimizer driving its update equations; you don't have to know the math behind them, but it certainly helps to know what they're doing and where they fit into the bigger model. Finally we do the fit, fitting the RNN to the training set: regressor.fit(X_train, y_train, epochs, batch_size). We know what these are: X_train is our data coming in, y_train is the answer we're looking for, our sequential input; epochs is how many times we go over the whole data set, and each row of X_train is a time sequence of 60. Batch size is another place Keras really shines: if you were pulling this from a large file, instead of trying to load it all into RAM it can pick up smaller batches and load those in directly. We're not worried about that today, this data isn't big enough to strain the computer's resources, but you can imagine what would happen if I used more than just one column of one stock, in this case Google. Imagine doing this across all the stocks, and instead of just the open having open, close, high, low, and volume; you can easily find yourself with about 13 different variables times 60, because it's a time sequence, and suddenly you're loading a gigabyte into RAM, and if you're not on multiple computers or a cluster you're going to start running into resource problems. For this we don't have to worry about that, so let's run it. It will take a little while on my computer because it's an older laptop, so give it a second to kick in... there we go. We have the epoch counter, so it's telling me it's running the first pass through all the data, and as it goes it's batching the rows in groups of 32.
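For reference, the compile-and-fit cell we just kicked off is roughly these few lines, with the epoch count and batch size taken from the run described here:

```python
# Adam optimizer and mean squared error loss, as discussed above
regressor.compile(optimizer='adam', loss='mean_squared_error')

# 100 passes over the data, fed in mini-batches of 32 windows
regressor.fit(X_train, y_train, epochs=100, batch_size=32)
```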
It's processing 32 lines at a time, and there are 1,198 of them, and each epoch takes about 13 seconds, so you can imagine this is roughly 20 to 30 minutes of runtime on this computer; like I said, it's an older laptop running at about 0.9 GHz on a dual processor, and that's fine. What I'll do is go get a coffee and come back, and we'll see what happens at the end and where this takes us. And like any good cooking show, I've got my latte; I also have some other stuff running in the background, so you'll see the times jumped up to 19 seconds, 15 seconds here and there, but you can see we've run it through all 100 epochs. So the question is, what does all this mean? One of the first things you'll notice is the loss over here: it ends at about 0.0014, and you can see it keeps going down until it sits at roughly that value several epochs in a row, so we guessed our epoch count pretty well, since the loss has stopped improving. To find out what we're actually looking at, we'll load up the test data, the data we haven't processed yet: real_stock_price comes from dataset_test with iloc, the same thing we did when we prepped the training data. So let's go through this code; you can see we've labeled it part three, making the predictions and visualizing the results. The first thing is to read the data in from the test CSV, and you'll see I've changed the path for my computer, and then we call it real_stock_price, and again we're taking just the one column, the values from iloc with all the rows and just that one column, which is the stock's open price. Let's run that so it's loaded in. Then we create our inputs, and this should all look familiar, it's the same thing we did before: we build dataset_total with a little pandas concat of the training set and the test set. Remember, the end of the training data is part of the data going in, and let's just visualize that a little: here's our train data, let me mark it TR for train, and it runs up to this value here, but each of these values generated a window 60 across, and this value maps to this one, and this one to this one, and so on, so we need the last 60 rows of the training data, or really the last 59 plus the new day, to feed the first test predictions, because they're part of the next window. That's what this first setup over here is: real_stock_price is just dataset_test run through iloc to grab that first column, the open price, and then dataset_total is a pandas concat of the training set's 'Open' column and the test set's 'Open' column. That's another way you can reference these columns: we referenced them earlier by position with 1:3, but since the column is labeled Open in the pandas DataFrame we can use the name, and pandas is great that way, lots of versatility. We'll go back up and run this, there we go, and you'll notice it's the same as before: we have our open data set, we've concatenated the two data sets together, and then inputs equals dataset_total sliced from the length of dataset_total minus the length of the test set minus 60, out to the end. So we're going to run the model over all of that, and you'll see why this works: normally you keep your test set and your training set completely separate, but when we graph this, we'll only be judging the part we didn't train on to see how well it tracks.
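Here's a sketch of that test-preparation cell, again assuming a test file named Google_Stock_Price_Test.csv (the name is my guess) and reusing the sc scaler that was fit on the training opens:

```python
# Real opens from the held-out test file, kept aside for comparison later
dataset_test = pd.read_csv('Google_Stock_Price_Test.csv')
real_stock_price = dataset_test.iloc[:, 1:2].values

# Stitch train + test opens so the first test day still has 60 prior days behind it
dataset_total = pd.concat((dataset_train['Open'], dataset_test['Open']), axis=0)
inputs = dataset_total[len(dataset_total) - len(dataset_test) - 60:].values
inputs = inputs.reshape(-1, 1)
inputs = sc.transform(inputs)   # transform only -- the scaler was fit on training data
```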
We reshape the inputs like we did before and transform them, so remember, after the transform they're between zero and one. Then we build X_test: for i in range(60, 80), and we append inputs from i minus 60 up to i, which again is a window of 60 values, with the comma zero on the other side so it's just the first column, our open column. Once again we take X_test, convert it to a NumPy array, and do the same reshape we did before, and then we get down to the final two lines, and here we have something new, let me just highlight them: predicted_stock_price = regressor.predict(X_test), so we're predicting the stock price over this whole stretch, and then we take that prediction and inverse the transform. Remember we squished everything between zero and one, and a float between zero and one doesn't mean much to me to look at; I want the dollar amounts, I want to know what the cash value is. We'll run this, and you'll see it runs much quicker than the training, which is what's so wonderful about these neural networks: once you've built them, it takes just a second to run the same network that took us, what, half an hour to train. Next we plot the data: we'll plot what we think the price is going to be against the real data, what the Google stock actually did. Let's look at that in code and pull it up. We have our plt, which, if you remember from the very beginning, let me scroll back up to the top, comes from importing matplotlib.pyplot as plt. Down here we're going to plot, let me get my drawing tool out again, and plt always threw me a little when doing graphs in Python, because I keep thinking you have to create an object and load a class into it; in this case plt is more like a canvas you put things on, so if you've done HTML5 you'll know the canvas object, and this is the canvas. We plot the real stock price in a bright red and label it Real Google Stock Price, then we plot our predicted stock price in blue and label it Predicted, and we give the whole thing a title, because it's always nice to title your graph, especially if you're going to present it to somebody, say your shareholders at the office. The x label is going to be Time, because it's a time series, and although we didn't put the actual dates on here, we know the points are incremented by time, and then of course the y label is the actual stock price.
plt.legend tells it to build the legend, so that the red line and the Real Google Stock Price label show up on there, and plt.show gives us the actual graph. Let's run it and see what that looks like, and you can see we get a nice graph. Let's talk a little about this graph before we wrap up. Here's the legend I was telling you about, showing which line is which price, we have our title and everything, and you'll notice along the bottom we have a time sequence. We didn't put the actual dates in here; we could have, since we know what the dates are, and plotted against those, but we also know this is only the last piece of the data we're looking at, which ends somewhere around here on the graph, I think about 20% of the data or probably less. You can see the real Google price has this little jump up and then down, and the predicted line, instead of turning down there, just didn't go up as high and didn't drop as low, so our prediction has the same pattern but the actual values are pretty far off as far as the stock goes. Then again, we're only looking at one column, the open price; we're not looking at how many shares were traded, and like I was pointing out earlier, right off the bat a stock gives you six columns, open, high, low, close, and the volume of shares traded, and then there's the adjusted open, adjusted high, adjusted low, and adjusted close, where a special formula estimates what it would really be worth based on the value of the stock, and from there there's all kinds of other information you can add. So we're only looking at one small aspect, the opening price, and as you can see we did a pretty good job: this curve follows the real curve pretty well, it has little jumps and bends that don't quite match up, this bend here doesn't quite line up with that bend there, but it's pretty darn close, you have the basic shape, and the prediction isn't too far off. You can imagine that as we add more data and look at different aspects of the stock domain, we should get a better representation each time we drill in deeper. Of course, this took half an hour for my computer to train, so running it across all those different variables might take quite a bit longer, which is not so good for a quick tutorial like this.
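For completeness, here is a sketch of the prediction-and-plotting cells we just walked through, end to end:

```python
import matplotlib.pyplot as plt

# Windows over the test period: 20 test days, each looking back 60 steps
X_test = []
for i in range(60, 80):
    X_test.append(inputs[i - 60:i, 0])
X_test = np.array(X_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))

# Predict, then undo the 0-1 scaling so the output is back in dollars
predicted_stock_price = regressor.predict(X_test)
predicted_stock_price = sc.inverse_transform(predicted_stock_price)

# Real versus predicted open prices
plt.plot(real_stock_price, color='red', label='Real Google Stock Price')
plt.plot(predicted_stock_price, color='blue', label='Predicted Google Stock Price')
plt.title('Google Stock Price Prediction')
plt.xlabel('Time')                    # step index; actual dates were left off
plt.ylabel('Google Stock Price')
plt.legend()
plt.show()
```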
So now we're going to dive right into what Keras is, and we'll also go all the way through a couple of tutorials, because that's where you really learn, when you roll up your sleeves. So what is Keras? Keras is a high-level deep learning API written in Python for easy implementation of neural networks. It uses deep learning frameworks such as TensorFlow or PyTorch as a back end to make computation faster, and that's really nice, because as a programmer there is so much out there and it's evolving so fast that it can get confusing, and having a high-level layer of order means we can view and easily program these different neural networks, which is really powerful: you can get something running quickly and start testing your models and seeing where you're going. So Keras works by using those complex deep learning frameworks, TensorFlow, PyTorch, and so on, as a back end for fast computation, while providing a user-friendly and easy-to-learn front end. You can see here we have the Keras API specification, and under that you'd have implementations like tf.keras for TensorFlow, a Theano-backed Keras, and so on, all sitting on top of the TensorFlow workflow. Like I said, Keras organizes everything, but the heavy lifting is still done by TensorFlow or whatever underlying package you put in there, which is really nice: you don't have to dig as deeply into the heavy back-end stuff while still having a very robust package you can get running quickly, and it doesn't cost you processing time, because the heavy lifting is done by packages like TensorFlow and this is just the organization on top. As for the working principle of Keras: Keras uses computational graphs to express and evaluate mathematical expressions, you can see them here in blue, expressing complex problems as a combination of simple mathematical operations, things like the % operator, which in Python is the remainder, or raising x to the power of 3. That's useful for calculating derivatives by backpropagation, so when we work with neural networks and send the error back up to figure out how to adjust the weights, it makes that easy without hand-writing everything, it's easier to implement distributed computation, and for solving complex problems you just specify the inputs and outputs and make sure all the nodes are connected. That's really nice as your layers go in, because you can build some very complicated setups nowadays, which we'll look at in a second, and this makes it easy to start spinning them up and trying out different models. When we look at Keras models, first we have the sequential model: a sequential model is a linear stack of layers where each layer feeds into the next, and if you've used anything similar, even scikit-learn with its neural networks, this should look familiar: you have your input layer, it goes into layer one, layer two, and then the output layer, and it's useful for simple classifier or decoder models. You can see the code down here: model = keras.Sequential(...), and you can see how easy it is, a Dense layer named layer1 with an activation, they're using ReLU in this example, then another Dense ReLU layer named layer2, and so forth, and they just feed into each other, so it's really easy to stack them and Keras takes care of everything else for you. Then there's the functional model, and this is really where things are at; it's newer, so make sure you update your Keras or you'll run into an error code when you try it, because it's a fairly recent release. It's used for multi-input and multi-output models, complex models that fork into two or more branches. You can see here: the image inputs equal a keras.Input with shape 32 by 32 by 3, that's three channels, or four if you have an alpha channel, then dense layers, Dense 64 with ReLU activation, which should look similar to what you already saw, x equals a dense layer applied to the inputs, then x = layers.Dense(64, activation='relu')(x), outputs = layers.Dense(10)(x), and model = keras.Model(inputs=inputs, outputs=outputs, name=...), so we add a little name on there. If you look at the graph on the right it's a lot easier to see what's going on: you have two different inputs, and one way to think of it is that maybe one is a small image and one is a full-sized image, and you might feed both of them into one node because it's looking for one thing, and only one of them into another node, so you can start to see there's a lot of use for this kind of split. You have multiple streams of information coming in, but the information is very different even though it overlaps, and you don't want to send it all through the same neural network; they're finding this trains faster and gets better results, depending on how you split the data up and how you fork the models coming down. So in here we have the more complex setup coming in, and this kind of split is setting us up to have the input go into different areas.
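As a sketch of the two styles side by side, here is a small network written first as a sequential stack and then with the functional API; the layer sizes match the example quoted above, and the Flatten layer is my addition so the Dense layers see a flat vector rather than a 32x32x3 image:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sequential style: a straight line of layers, each feeding the next
seq_model = keras.Sequential([
    layers.Dense(64, activation='relu', name='layer1'),
    layers.Dense(64, activation='relu', name='layer2'),
    layers.Dense(10, name='output'),
])

# Functional style: wire the graph up explicitly, so it can fork and merge later
inputs = keras.Input(shape=(32, 32, 3))      # e.g. a 32x32 RGB image
x = layers.Flatten()(inputs)                 # flatten so Dense layers get a vector
x = layers.Dense(64, activation='relu')(x)
x = layers.Dense(64, activation='relu')(x)
outputs = layers.Dense(10)(x)
func_model = keras.Model(inputs=inputs, outputs=outputs, name='functional_model')

func_model.summary()
```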
Now, if you're already looking at Keras you probably already have the answer to this, but it's always good to get on the same page, and for those who don't fully understand neural networks, let's do a quick overview. Neural networks are deep learning algorithms modeled after the human brain: they use multiple neurons, which are mathematical operations, to break down and solve complex mathematical problems, and just like a biological neuron, one neuron fires in and fires out to other neurons, or nodes as we call them, and eventually everything comes down to your output layer. You can see here the really standard graph: an input layer, a hidden layer, and an output layer. One of the biggest parts of any data work is data pre-processing, so we always have to touch on that. A neural network, like many of these models, is kind of a black box when you first start using it: you put your data in, you train it, you test it and see how good it was, and you have to pre-process that data, because bad data in means bad output. For data pre-processing we will create our own example data set with Keras: the data represents a clinical trial conducted on 2,100 patients ranging from ages 13 to 100, with half the patients under 65 and the other half over 65 years of age, and we want to find the likelihood of a patient experiencing side effects due to their age; you can think of this in today's world with COVID and what might happen there. We're going to work through an example of that hands-on, because like I said, most of this you really need hands-on experience to understand, so let's bring up Anaconda, open it, and open a Jupyter notebook for writing the Python code. If you're not familiar with those, you can use pretty much any setup you like; I just like them for doing demos and showing people, especially shareholders, because it's a nice visual. So let me flip over to Anaconda: Anaconda has a lot of cool tools, they just added Datalore and IBM Watson Studio Cloud into the framework.
But we'll be in Jupyter Lab or Jupyter Notebook; I'm going to use Jupyter Notebook for this, because I save the Lab for large projects with multiple pieces, since it has multiple tabs, and the Notebook works fine for what we're doing. It opens in the browser window, because that's how Jupyter Notebook is set up to run, and we'll go under New, create a new Python 3 notebook, and it creates an untitled notebook, so let's give it a title; we'll call it Keras tutorial, change that to a capital, and rename it. The first thing we want to do is get some pre-processing tools involved, so we need to import a few things: NumPy, some random number generation, and I mentioned scikit-learn; if you're installing it, scikit-learn is what you want to look up, and it should be in the toolbox of anybody doing data science, because it's huge and there are so many things in there we keep going back to. We also want to create train_labels and train_samples lists for our training data, and then just a note of what we're actually doing in here: let me change this cell, this is a fun thing you can do, we can change a code cell to Markdown, and Markdown cells are nice for documenting examples once you've built them. Our example data: an experimental drug was tested on 2,100 individuals between 13 and 100 years of age, half the participants are under 65, 95% of the participants under 65 experienced no side effects, while 95% of participants over 65 did experience side effects. That's where we're starting, and this is just a quick example, because we'll do another one later with slightly more complicated information. So we generate our data: we do a for loop over a range, and if you look here, we use random integers and append to the train labels and train samples, so we're just creating some random data; let me run that. Once we've created the random data, and you can certainly ask Simplilearn for a copy of this code, or zoom in on the video to see how we built and appended to train_samples, you'll see this is the kind of thing I do all the time: I was recently running something to do with errors following a bell-shaped curve, a standard distribution of error, so what did I do? I generated data on a standard distribution to see what it looks like and how my code processes it, since that was the baseline I was looking for; here we're just generating random data for our setup. We can also print some of the data: let's print train_samples, just the first five entries, to see what that looks like, and you can see the first five values in our train samples are 49, 85, 41, and so on, just random ages generated in there, that's all they are. We generated significantly more than that, there's the 50 up here and the 1,000, a couple of thousand numbers in total, and if we wanted to confirm that we could do a quick print of the length, or a shape kind of thing if you're using NumPy, although plain len() is just fine for a list.
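A sketch of that generation cell, following the 95/5 split described in the Markdown note; the exact loop counts, 1,000 typical pairs and 50 outlier pairs for 2,100 patients total, are my reading of the walkthrough:

```python
import numpy as np
from random import randint

train_samples = []   # ages, 13-100
train_labels = []    # 1 = experienced side effects, 0 = did not

# The ~5% outliers: younger patients with side effects, older patients without
for _ in range(50):
    train_samples.append(randint(13, 64))
    train_labels.append(1)
    train_samples.append(randint(65, 100))
    train_labels.append(0)

# The ~95% majority: younger patients without side effects, older patients with them
for _ in range(1000):
    train_samples.append(randint(13, 64))
    train_labels.append(0)
    train_samples.append(randint(65, 100))
    train_labels.append(1)

print(train_samples[:5], len(train_samples))   # peek at the data: 2100 samples
```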
And there we go, it's actually 2,100, like we said in the setup. Then we want to look at our labels too, that was the train labels, so we can print those the same way, let's change this to labels and run it just to double-check, and sure enough we have 2,100 of them, labeled one, zero, one, zero, which is whether they have symptoms or not: one means side effects, zero means none. So we take our train labels and convert them into a NumPy array, and the same with our samples, and we run that. We also shuffle, which is a neat feature, let me put my drawing tool on, which I didn't have on earlier: I can take the data and shuffle it so it's randomized, that's all it's doing. We've already randomized it, so it's a bit of overkill and not strictly necessary, but in a larger pipeline where the data comes in organized somehow, you want to randomize it to make sure the input doesn't follow some pattern that creates a bias in your model. Then we create a scaler, MinMaxScaler with a feature range of 0 to 1, and we build scaled_train_samples by fitting and transforming the data so it's nicely scaled; that's the age, so where we had values like 49, 85, and 41 up above, we're just moving them so they sit between zero and one. This is true with any neural network: you really want to convert the data to the zero-to-one range, otherwise you create a bias; if you leave something like 100 in there, the math gets messy, because there's a lot of multiplication and addition going on, and that higher value will multiply through and have a huge influence on how the model fits, and then it won't fit as well. One of the fun things about Jupyter Notebook is that if a variable sits by itself on the last line of a cell, it prints automatically, so we'll just look at the first five scaled samples, and you can see everything's between zero and one, which shows we scaled it properly; it looks good. It really helps to do these kinds of printouts halfway through; you never know what's going on in there, and I don't know how many times I've gotten way down the line and found out that data sent to me, which I thought was scaled, was not, and then I have to go back, track it down, and figure it out.
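Here's a sketch of the convert-shuffle-scale step; I use sklearn.utils.shuffle so the ages and labels stay paired while being shuffled, which is one convenient way to do what's described above:

```python
from sklearn.preprocessing import MinMaxScaler
from sklearn.utils import shuffle

train_samples = np.array(train_samples)
train_labels = np.array(train_labels)

# Shuffle ages and labels together so no ordering pattern biases training
train_samples, train_labels = shuffle(train_samples, train_labels)

# Scale the 13-100 ages down into the 0-1 range
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_train_samples = scaler.fit_transform(train_samples.reshape(-1, 1))

scaled_train_samples[:5]   # in a notebook this prints the first five scaled ages
```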
So let's go ahead and create our artificial neural network, and this is where we start diving into TensorFlow and Keras. If you don't know the history of TensorFlow, it helps to look it up; we'll just use Wikipedia, and be careful quoting Wikipedia, but it's a good place to start. Back in 2011 Google Brain built DistBelief as a proprietary machine learning system, and TensorFlow became its open-source successor, so TensorFlow started as a Google product, was then open-sourced, and has now become probably the de facto standard for neural networks. It's got a huge following; there are some other options, like the small neural network built into scikit-learn, but TensorFlow is the most robust one out there right now, and Keras sitting on top of it makes it a very powerful tool, because we can leverage the ease with which Keras builds a sequential setup on top of TensorFlow. So in here we import tensorflow, and then the rest of this, from line two down, is all Keras: we import keras from tensorflow, then from tensorflow.keras.models we import Sequential, a specific kind of model we'll look at in a second, which, if you remember from the slides, means it goes from one layer to the next with no funky splits. Then from tensorflow.keras.layers we import Activation and Dense, and we bring in the Adam optimizer. How you optimize is a big thing to be aware of: Adam is as good as any, there are a number of optimizers out there, a couple of main ones, but Adam is usually the choice for bigger data, it works fine on smaller data too, and it's probably the most widely used, though depending on what you're doing, different layers might get different activations. Finally, down here you'll see the metrics setup: we're going to use the tensorflow.keras metrics for categorical cross-entropy so we can see how everything performs when we're done, that's all that is, and a lot of times you'll see us go back and forth between TensorFlow and scikit-learn, which also has a lot of really good metrics for measuring these things, because at the end of the story what matters is how good your model does. We'll load all of that, and then comes the fun part. I actually like to spend hours messing with these things, and you might say, four lines of code, you're going to spend hours on four lines of code? No, we don't literally spend hours on four lines of code; I'll explain what I mean in a second. What we have here is a model, and it's a sequential model, which we mentioned above, where it goes from one layer to the next. Our first layer is the input layer, a Dense layer, and you give it the number of units, the input shape, and the activation: we have 16 units, the shape of the input, and the activation, and this is where it gets interesting, because we have ReLU on two of these layers and a softmax activation on the last one. There are so many different options for what these activations mean and how they behave, how ReLU works, how softmax works, and they do very different things. We're not going to go deep into activations here; that is what you really spend hours on, looking at the different activations, and some of it is almost like playing with them as an artist, you start getting a feel for each one. For example, the inverse tangent, the tanh activation, takes a huge amount of processing, so you don't see it a lot, yet it can come up with a better solution, especially when you're analyzing word documents and tokenizing the words, so you'll see people shift from one activation to another, because you're trying to build a better model, but if you're working on a huge data set it'll bog the system down or just take too long to process.
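Here's a sketch of those few lines; the 16-unit first layer and the ReLU/softmax choices come straight from the description above, while the 32-unit middle layer is an assumption on my part for the second hidden layer:

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Small sequential classifier: a scaled age goes in, two class scores come out
model = Sequential([
    Dense(units=16, input_shape=(1,), activation='relu'),
    Dense(units=32, activation='relu'),         # hidden-layer size assumed
    Dense(units=2, activation='softmax'),       # no side effects vs side effects
])

model.summary()
```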
And then you see things like softmax. A lot of these are easiest to picture: ReLU, let me draw it, is a setup where anything less than zero becomes zero and above that it rises linearly, and then there are what you might call leaky variants that allow a slight negative slope so errors propagate back a little better; softmax likewise has a smooth, gradual shape that lets errors translate well. All these little details can make a big difference to your model. One of the things I really like about data science is what you might call build to fail: it's an interesting design idea, and the concept is that you want the pipeline as a whole to work end to end so you can test the model itself. You get to the end, you check the quality of your setup, where did I do my cross-entropy, there it is, right above, and once you have a full, functional set of code you can run it, evaluate the model, and say, hey, this model works better than that one, and here's why, and then you start swapping models in and out. So when I say I spend a huge amount of time on this: pre-processing data is probably 80% of your programming time, and between pre-processing and modeling it's roughly an 80/20 split; you'll spend plenty of time on the models once the whole flow is in place, and depending on your data your models get more and more robust as you experiment with different inputs and data streams. We can also print a simple model summary here: here's our sequential model, each layer, its output shape, its parameters, and that's one of the nice things about Keras, you can see the whole thing at a glance, everything set, clear, and easy to read. Once we have the model built, the next step of course is to train it, and a lot of times the compile step is just paired with the model definition because it's so straightforward, though it's nice to print the model setup so you have a record. Here's our model, and the keyword in Keras is compile: optimizer Adam with a learning rate, and there's another term we're skipping right over that really becomes the meat of the setup, the learning rate. Let me just underline it: the learning rate is typically something small, on the order of 0.0001 or 0.01, and depending on what you're doing, the learning rate can push the model toward overfitting or underfitting, so it's worth looking up; we have a number of tutorials on overfitting and underfitting that are really worth reading once you get to that point. Then we have our loss, sparse categorical cross-entropy, which is the quantity Keras tries to drive down as it trains, and we're asking for accuracy as the metric, so we'll go ahead and run that.
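The compile call being described is roughly this; the 0.0001 learning rate is my assumption for "a small learning rate", and sparse categorical cross-entropy fits because the labels are plain 0/1 integers rather than one-hot vectors:

```python
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=0.0001),
              loss='sparse_categorical_crossentropy',  # integer labels, 2-unit softmax output
              metrics=['accuracy'])
```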
Now that we've compiled our model, we want to fit it, so here's our model.fit: we pass in our scaled train samples and train labels, a validation split, in this case 10% of the data held out for validation, and a batch size, which is another number you play with; it doesn't make a huge difference to the result, but it does affect how long training takes and it can nudge the bias a little, and most of the time a batch size sits somewhere between 10 and 100 depending on how much data you're processing. We shuffle, we go through 30 epochs, and we set verbose to 2. Let me run this, and you can see here's each epoch, here's the training output, and here's the loss. If you remember, up where we compiled we set the loss to sparse categorical cross-entropy, and this column tells us how much the error shrinks as training proceeds: the lower the number the better, and it keeps dropping. Conversely, the accuracy, let me find the accuracy value at the end, goes from about 0.61 to 0.69 to 0.74, it keeps climbing. Ideally accuracy would reach one, but the loss matters more, because it's a balance: you can have 100% accuracy and a model that doesn't actually work because it's overfitted, and again, it's worth looking up overfitting and underfitting. So we ran through 30 epochs, and it's always fun to watch your code go; to be honest, the first time I run it I think, that's cool, I get to see what it does, and by the second run I'd rather not see it, and of course you can suppress that output, the warnings and the printing, in your code. The next step is building a test set and predicting on it. We build the test set just like we did the training set; a lot of times you just split your initial data, but here we'll generate a separate set, and it's exactly what we did above, no difference in the randomness used to build it. The one real difference is that we already fit our scaler up on the training data, so down here it should just be transform instead of fit_transform, because you don't want to refit the scaler on your testing data. There we go, now we're just transforming it; you never want to fit on the test data, and it's an easy mistake to make, especially in an example like this where we're randomizing the data anyway, so it wouldn't change much because we're not expecting anything weird. Then we do our predictions, which is the whole reason we built the model: we take the model, call predict, pass in our scaled test data with a batch size of 10 and a verbose setting, and now we have our predictions. We can print the first five of them, and what we have is, for each age, the prediction of whether we think that patient will have symptoms or not, and the first thing we notice is that it's hard to read, because what we really want is a yes/no answer.
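A sketch of the fit-and-predict cells just described; the hypothetical test-set generation at the top simply mirrors the training data on a smaller scale, since the walkthrough says the test set is built the same way as before:

```python
# Train: hold out 10% for validation, mini-batches of 10, 30 passes over the data
model.fit(x=scaled_train_samples, y=train_labels,
          validation_split=0.1, batch_size=10,
          epochs=30, shuffle=True, verbose=2)

# Hypothetical test set, generated the same way as the training data but smaller
test_samples, test_labels = [], []
for _ in range(10):                                   # ~5% outliers
    test_samples += [randint(13, 64), randint(65, 100)]
    test_labels  += [1, 0]
for _ in range(200):                                  # ~95% typical cases
    test_samples += [randint(13, 64), randint(65, 100)]
    test_labels  += [0, 1]
test_labels = np.array(test_labels)

# transform() only -- the scaler was already fit on the training ages
scaled_test_samples = scaler.transform(np.array(test_samples).reshape(-1, 1))

predictions = model.predict(x=scaled_test_samples, batch_size=10, verbose=0)
print(predictions[:5])   # two columns per patient: P(no side effects), P(side effects)
```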
So we'll round off the predictions using argmax, the NumPy argmax, so each prediction collapses to a 0 or a 1, and since this is a Jupyter notebook I don't even need a print, I can just put rounded_predictions and look at the first five, and you can see 0, 1, 0, 0, 0, so the predictions coming out are: no symptoms, symptoms, no symptoms, no symptoms, no symptoms. And just as we talked about at the beginning, we want a confusion matrix for an accuracy check, which is the most important part when you get to the end of the story: how accurate is your model, before you go back and play with it to squeeze out better accuracy. For this we use scikit-learn: from sklearn.metrics we import confusion_matrix, along with some itertools and of course matplotlib, because it's always nice to have a graph to look at, a picture's worth a thousand words. Then we build cm, the confusion matrix, with y_true equal to the test labels and y_pred equal to the rounded predictions, and we load that in. I'm not going to spend too much time on the plotting code, we have whole tutorials on plotting, but what we have here is a plot_confusion_matrix function: it takes our cm, the classes, normalize=False, the title Confusion Matrix, and a cmap of Blues, and it handles the nearest-neighbor image display, the titles, tick marks, class labels, and the color bar, all the details of how the matrix gets drawn. You could also just dump the confusion matrix into seaborn and get a quick output; it's worth knowing how to do all of this, because when you're presenting to shareholders you don't want to do it on the fly, you want to take the time to make it look really nice, like our folks in the back did. I forgot to put together the cm_plot_labels, so we'll run that, then call the little plotting definition we just wrote, plot_confusion_matrix, and dump our data into it, the confusion matrix, the classes, the title, and run it. You can see the basic result: the big numbers on the diagonal are the correct calls, roughly 195 predicted to have no side effects who really had none and about 200 predicted to have side effects who really did, while the off-diagonal cells, around 10 on one side and 15 on the other, are the misclassifications. Take the 10 against its roughly 200 true cases and you get about a 5% error, and you can do the same kind of math on the other side with the 15 against the 195, not as easy to round in your head, but those are the people we predicted would have no side effects who actually did. These confusion matrices are so important: at the end of the day, whatever you're working on, this is where you can actually show people, hey, this is how good we are, or how messed up it is.
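The core of that accuracy check, without the plotting helper, is just a few lines:

```python
from sklearn.metrics import confusion_matrix

# Collapse the two softmax columns into a single 0/1 prediction per patient
rounded_predictions = np.argmax(predictions, axis=-1)

# Rows = true class, columns = predicted class
cm = confusion_matrix(y_true=test_labels, y_pred=rounded_predictions)
print(cm)
```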
So I spent a lot of time on some of those parts, but you can see it's really pretty simple: we did the random generation of data, and when we actually built the model, here's our model summary with the layers we built, then we trained it and ran the prediction. We can get a lot more complicated, so let me flip back over, because we're going to do another demo; that was our basic introduction. So, implementing a neural network with Keras: after creating our samples and labels, we need to create our Keras neural network model, and we work with a sequential model that has three layers, which is what we did, an input layer, hidden layers, and an output layer. The input coming in was the age factor, we had our hidden layer, and the output was whether you're going to have symptoms or not, and later we'll go with something a bit more complicated. Training our model is a two-step process: we first compile the model and then we train it on our training data set. Compiling converts the code into a form the machine can execute, and we used Adam, a gradient-descent-style algorithm, to optimize the model; then we trained the model, which means letting it learn on the training data. I actually described that a little backwards earlier, but this is exactly what we just did: if you remember from our code, here's the model we created and summarized, we come down here and compile it, which says, hey, we're ready to build this model and use it, and then we train it, the part where we fit the model and feed the information in, and of course we scaled the data first, which was really important. Then you saw we created a confusion matrix with Keras: since we're performing classification on our data, we need a confusion matrix to check the results. A confusion matrix breaks down the various misclassifications as well as the correct classifications so you can get at the accuracy, and that's what we did with true positives, false positives, true negatives, and false negatives; let me scroll down to the end, we printed it out, and you can see a nice printout of the confusion matrix. We want the blue diagonal cells to hold the biggest numbers, because those are the correct predictions, and then we have our false predictions, the patients with no side effects we predicted would have side effects and vice versa. In today's video we are also diving deep into the world of machine learning interview preparation: as we gear up for 2024, it's crucial to be well prepared for the questions that can come your way in any machine learning interview, so we have compiled 30 essential interview questions and answers, thoughtfully categorized into beginner, intermediate, and advanced levels. Let's start with the beginner-level questions. Number one: what is machine learning? Machine learning is a subset of artificial intelligence that involves the use of algorithms and statistical models to enable computers to perform tasks without explicit instructions, relying instead on patterns and inference. Now moving to the second question: what are the different types of machine learning?
In today's video we are diving deep into the world of machine learning interview preparation. As we gear up for 2024, it's crucial to be well prepared for the questions that can come your way in any machine learning interview. We have compiled 30 essential interview questions and answers, thoughtfully categorized into beginner, intermediate, and advanced levels.

Let's start with the beginner-level questions. Number one: what is machine learning? Machine learning is a subset of artificial intelligence that involves the use of algorithms and statistical models to enable computers to perform tasks without explicit instructions, relying instead on patterns and inference.

Now moving to the second question: what are the different types of machine learning? The three main types of machine learning are supervised learning, unsupervised learning, and reinforcement learning.

Now moving to the third question: what is supervised learning? Supervised learning involves training a model on a labeled data set, which means each training example is paired with an output label. The model learns to predict the output from the input data.

Now moving to the fourth question: what is unsupervised learning? Unsupervised learning involves training a model on data that does not have labeled responses. The model tries to learn the patterns and the structure from the input data.

Now the fifth question: what is reinforcement learning? Reinforcement learning is a type of machine learning where an agent learns to make decisions by performing actions and receiving rewards or penalties. The goal is to maximize the cumulative reward.

Moving to the sixth question: what is a model in machine learning? A model in machine learning is a mathematical representation of a real-world process. It is trained on data to recognize patterns and make predictions or decisions based on new data.

Now the seventh question: what is overfitting? Overfitting occurs when a machine learning model performs well on the training data but poorly on new, unseen data. It indicates that the model has learned the noise and details in the training data instead of the actual patterns.

Question number eight: what is underfitting? Underfitting occurs when a machine learning model is too simple to capture the underlying patterns in the data. It performs poorly on both the training data and new data.

The ninth question: what is a confusion matrix? A confusion matrix is a table used to evaluate the performance of a classification model. It summarizes the number of correct and incorrect predictions made by the model, categorized by each class.

The tenth question: what is cross-validation? Cross-validation is a technique for assessing how the results of a statistical analysis will generalize to an independent data set. It involves partitioning the data into subsets, training the model on some subsets, and validating it on the remaining subsets. That covers questions 1 to 10 for the beginner level.

Now we move to the intermediate level, where we'll cover another 10 questions, starting with the 11th: what is a ROC curve? A ROC (receiver operating characteristic) curve is a graphical representation of a classifier's performance across different thresholds. It plots the true positive rate (TPR) against the false positive rate (FPR).

Moving to the 12th question: what is precision and recall? Precision is the ratio of correctly predicted positive observations to the total predicted positives, and recall is the ratio of correctly predicted positive observations to all actual positives. The formulas are Precision = TP / (TP + FP) and Recall = TP / (TP + FN).

Now the 13th question: what is the F1 score? The F1 score is the harmonic mean of precision and recall. It provides a balance between the two metrics and is useful when you need to balance precision and recall: F1 = 2 × (Precision × Recall) / (Precision + Recall).
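Because the precision, recall, and F1 formulas come up so often, here is a tiny Python check of the arithmetic on made-up labels (the counts below are invented purely for illustration):

from sklearn.metrics import precision_score, recall_score, f1_score

# Made-up binary labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Counted by hand from these labels: TP = 4, FP = 1, FN = 1, TN = 4.
tp, fp, fn = 4, 1, 1
precision = tp / (tp + fp)                            # TP / (TP + FP)
recall = tp / (tp + fn)                               # TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)    # harmonic mean

print("by hand:", precision, recall, f1)
print("scikit-learn:", precision_score(y_true, y_pred),
      recall_score(y_true, y_pred), f1_score(y_true, y_pred))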
Now we'll move to the 14th question, which covers regularization: what is regularization? Regularization is a technique used to prevent overfitting by adding a penalty to the model's complexity. Common types of regularization include L1 (lasso) and L2 (ridge) regularization.

The 15th question: what is the bias-variance tradeoff? The bias-variance tradeoff is a fundamental issue in machine learning that involves balancing the error introduced by the model's assumptions and the error due to model complexity. A good model should have low bias and low variance.

Question number 16: what is feature engineering? Feature engineering is the process of creating new features or modifying existing ones to improve the performance of a machine learning model. It involves techniques like normalization, encoding categorical variables, and creating interaction terms.

Question number 17 is about gradient descent: what is gradient descent? Gradient descent is an optimization algorithm used to minimize the cost function in machine learning models. It iteratively adjusts the model parameters in the direction of the steepest descent of the cost function.

With that, we move to the 18th question, which covers the difference between bagging and boosting: what is the difference between bagging and boosting? You could start your answer with bagging (bootstrap aggregating), which involves training multiple models on different subsets of the data and averaging their predictions, and then boosting, which involves training models sequentially, with each new model focusing on correcting the errors of the previous ones.

Question number 19: what is a decision tree? A decision tree is a non-parametric supervised learning algorithm used for classification and regression. It splits the data into subsets based on the values of input features, resulting in a tree-like structure of decisions.

Question number 20: what is a random forest? A random forest is an ensemble learning method that combines multiple decision trees to improve the accuracy and robustness of the model. It builds each tree using a random subset of features and data points and then averages their predictions. Those were the intermediate-level questions; they are fairly basic, theoretical questions that can come up in an interview, so be prepared for them.

Now we'll move to the advanced-level interview questions, where we'll again cover 10 questions, starting with question number 21: what is a support vector machine? A support vector machine is a supervised learning algorithm used for classification and regression. It finds the optimal hyperplane that maximizes the margin between different classes in the feature space.

Then comes question number 22: what is principal component analysis? Principal component analysis is a dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional space by finding the directions (principal components) that maximize the variance in the data.
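To make that last answer concrete, here is a minimal scikit-learn sketch of principal component analysis on made-up data; the dataset shape and the choice of two components are arbitrary assumptions for illustration:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Made-up high-dimensional data: 200 samples of 10 correlated features,
# generated from only 3 underlying factors plus a little noise.
rng = np.random.default_rng(42)
factors = rng.normal(size=(200, 3))
X = factors @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(200, 10))

# Standardize first so no single feature dominates the variance.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the 2 directions (principal components) with maximum variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                   # (200, 2)
print(pca.explained_variance_ratio_)     # share of variance kept by each component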
Then comes question number 23: what is a neural network? A neural network is a series of algorithms that attempts to recognize underlying relationships in a set of data through a process that mimics the way the human brain operates. It consists of layers of interconnected nodes, or neurons.

Question number 24: what is deep learning? Deep learning is a subset of machine learning that involves neural networks with many layers (deep neural networks). It is particularly effective for tasks like image and speech recognition.

Question number 25: what is a convolutional neural network (CNN)? You can start by telling the interviewer that a convolutional neural network is a type of deep learning model specifically designed for processing structured grid data, like images. It uses convolutional layers to extract spatial features and patterns from the input data.

Question number 26: what is a recurrent neural network (RNN)? A recurrent neural network is a type of neural network designed for sequential data. It has connections that form directed cycles, allowing it to maintain a memory of previous inputs and process sequences of data.

Question number 27: what is the difference between batch gradient descent and stochastic gradient descent? Batch gradient descent computes the gradient of the cost function using the entire training data set, while stochastic gradient descent (SGD) computes the gradient using only one training example at a time, so SGD is faster but noisier.

Question number 28: what is dropout in neural networks? Dropout is a regularization technique used in neural networks to prevent overfitting. It involves randomly setting a fraction of the neurons to zero during training, forcing the network to learn more robust features.

Question number 29 is about transfer learning: what is transfer learning? You can explain to the interviewer that transfer learning is a technique in machine learning where a model developed for one task is reused as the starting point for a model on a second, related task. It is particularly useful when there is limited data available for the second task.

Now we come to the last, 30th question: what is a generative adversarial network (GAN)? A generative adversarial network is a type of deep learning model consisting of two neural networks, a generator and a discriminator, that are trained simultaneously. The generator creates fake data while the discriminator tries to distinguish between real and fake data, leading to the generator producing increasingly realistic data.
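Since the GAN answer describes two networks trained against each other, here is a minimal Keras sketch of that structure; the layer sizes, the data dimensions, and the use of plain dense layers are illustrative assumptions, and the training loop itself is omitted:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LeakyReLU

LATENT_DIM = 32   # size of the random noise vector fed to the generator
DATA_DIM = 64     # size of the (made-up) data samples we want to imitate

# Generator: turns random noise into a fake data sample.
generator = Sequential([
    Dense(128, input_shape=(LATENT_DIM,)),
    LeakyReLU(0.2),
    Dense(DATA_DIM, activation="tanh"),
])

# Discriminator: outputs the probability that a sample is real.
discriminator = Sequential([
    Dense(128, input_shape=(DATA_DIM,)),
    LeakyReLU(0.2),
    Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Combined model used to update the generator: the discriminator is frozen
# here, and the generator is trained to make it label fake samples as real.
discriminator.trainable = False
gan = Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")
gan.summary()

In an actual training loop you would alternate between training the discriminator on real and generated batches and training the combined gan model on noise, which is what "trained simultaneously" means in the answer above.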
That wraps up the questions and answers; they cover a wide range of topics in machine learning and should help you prepare for interviews at various levels. Thank you, guys, for joining us for this AI full course. We hope you found it insightful and valuable. As AI continues transforming industries, your new skills will be crucial for staying ahead in this dynamic field. Feel free to reach out if you have any questions or need further assistance, and don't forget to like, share, and subscribe for more content. Stay tuned for our upcoming videos, where we will delve deeper into advanced AI topics and emerging trends. See you in the next video.

Staying ahead in your career requires continuous learning and upskilling. Whether you're a student aiming to learn today's top skills or a working professional looking to advance your career, we've got you covered. Explore our impressive catalog of certification programs in cutting-edge domains including data science, cloud computing, cyber security, AI, machine learning, and digital marketing, designed in collaboration with leading universities and top corporations and delivered by industry experts. Choose any of our programs and set yourself on the path to career success. Click the link in the description to know more. Hi there, if you like this video, subscribe to the Simplilearn YouTube channel and click here to watch similar videos. To nerd up and get certified, click here.