Transcript for:
Introduction to Python Strings

In this video we are going to look at a Python data type: strings. Strings are characters enclosed in either single or double quotes, like the ones I have indicated here. We are going to look at how to create strings, how to convert strings to upper or lower case, how to find the length of strings, and more. There is a lot of content to cover, and I have listed all of it here. We are going to create strings, but we are also going to look at how to apply them in the field of biology, specifically to sequences, so I am going to explain everything using biological data.

For those who would like the notebook, I will leave a link in the description that you can use to download it. The notebooks will be made available on my Patreon channel, so at the end of this tutorial, if you want to practice or reproduce this exercise, you can download them. Make sure you check the Patreon channel for exclusive videos.

If you don't have Python installed, you can use Google Colab. When you come to Google Colab, which is here, just go to File, then New notebook, click it, and you have access to a new notebook that you can use to run the Python code. You don't need any additional library; we are going to use just Python's built-in functions and commands. To use Google Colab you need access to the internet, so just get an internet connection and you'll be fine.

Okay, so we are going to start now. If you are on the Python interface, we are going to start by creating a string or DNA sequence. We are going to create a string, but we are going to try to apply it in the field of biology. So, to create a string, let's just
say my_string, and we assign this to a variable. We can say "bioinformatics". I have placed everything in double quotes, and once it is in double quotes, that means these are string characters. You can also use single quotes; as we go along I'll use single quotes too, so that you get used to both approaches for creating strings. Let me enlarge this a bit. Yeah, this should be fine, so we can continue.

So this is my string. Once I have created it, I can print it: I say print(my_string) and I get bioinformatics. You can also just type the string straight away; that works and you get bioinformatics. I use that more in an interactive session, when I just want to display it. But if you are going to use that string in another operation or another command, then you have to assign it to a variable, like what I have done here. Let's print another one, say 'hello world', this time using single quotes. So I have hello world. This is how we create strings.

Now, we are biologists, and this is Python for bioinformatics, so I am going to switch to bioinformatics. Let's create a DNA sequence. DNA sequences are also considered strings in programming. Let's create one: my_dna equals, let's say, A, G, C, any of them, in single quotes. Because these are in single quotes, they are a string. Now you have your DNA, so you can print it with print(my_dna), and we have printed our sequence. So this is how we manipulate DNA sequences. If you have a DNA sequence and you are analyzing it with some tools, then behind the scenes these sequences are strings, so you just have to manipulate them as strings. Of course, we are going to manipulate them by looking at all these activities.
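A minimal sketch of the steps above, for anyone following along without the notebook. The variable names and the short DNA sequence are illustrative assumptions, not the exact values typed in the video.

```python
# Strings can be enclosed in double or single quotes.
my_string = "bioinformatics"
greeting = 'hello world'

# A DNA sequence is just another string (made-up example sequence).
my_dna = 'AGCTTAGC'

# print() displays the value stored in each variable.
print(my_string)
print(greeting)
print(my_dna)

# Both quoting styles produce the same type, str.
print(type(my_string), type(my_dna))
```

In an interactive session, typing a bare string such as "bioinformatics" also displays it, but assigning it to a variable is what makes it reusable in later commands.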
So just stay tuned and continue. Now we have a DNA sequence and we also have our classical strings, so we can start doing some things with them. Let's go to the next section.

In the next section we are going to look at converting strings to lower or upper case. This is how we do it in Python. I'll create another string here so that we can use it: let's say my_string again. If I have the string and I want to convert it to lower case, I say my_string.lower() and execute the command, and now everything is in lower case. Let's try another one, my_string_2, this time all in upper case, let's say HELLO. To change it to lower case I just say my_string_2.lower(), and it is done for me. This means that to change to lower or upper case you need to specify the string object first. Once you do that, it becomes easy to manipulate strings using Python's built-in functions.

So we have changed a string to lower case. Before we look at upper case, let's look at something: you may have your DNA sequence like this, let's say A, G, T, C. If you want to change it to lower case, it is .lower() again, and you get it in lower case.

Now let's look at upper case, starting with a DNA sequence. I can say dna equals, let's say, a, t, g, c. To change it to upper case it becomes dna.upper(). Sometimes you will have a sequence in lower case like this and, maybe for formatting reasons, you want to present it in upper case; you can use this to do that. Let's try another one, say bioinformatics as my_string: I just say my_string.upper(), so I have that
also given to me.

There is another thing we can do. Sometimes you have your string all in lower case and, maybe because it is a name, you want the first character to be a capital. Let's try it: my_name equals, let's say, bioinformatics. If this is a name and you want the first character as a capital, you can say my_name.capitalize(). Let me just confirm from my notes here. Yes, it is my_name.capitalize(), spelled with a z, so take note of that. Now you see that the first letter has been capitalized. So this is how we change case when dealing with strings. Let's move on to the next activity.

We are going to look at how to find the length of strings and sequences. If you want to find the length of a string, in other words the number of characters that make up the string, we use the len function. Let me show you how. Let's use my_string again, equal to hello world. To get the length I use the len function: len, and then I indicate my_string, and I get 11. Before I proceed, let me say that you can also assign the results of all these operations to variables, but that is something we are going to look at later on.

Now back to the length. Here it is 11. If you are a beginner, you may be tempted to say the length is 10, because hello is 1, 2, 3, 4, 5 and world is 1, 2, 3, 4, 5, so you expect 10. But we have 11, because a space is also a character in Python. All the letters have been counted, plus the space character, and that is what gives us the total length. So bear in mind that, in addition to letters, there are other characters that Python also includes when finding the length of a string, so take note of that.
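The case conversions and the length check described above can be sketched as follows. The strings are illustrative stand-ins for the ones typed in the video; note that these methods return new strings and leave the original unchanged.

```python
my_string = "Bioinformatics"
dna = "atgc"                    # made-up lower case sequence
my_name = "bioinformatics"

lower_string = my_string.lower()       # every letter in lower case
upper_dna = dna.upper()                # every letter in upper case
capital_name = my_name.capitalize()    # only the first letter in upper case

print(lower_string, upper_dna, capital_name)

# len() counts every character, including the space.
greeting = "hello world"
greeting_length = len(greeting)
print(greeting_length)  # 10 letters + 1 space = 11
```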
Okay, so now we can proceed. Let's switch to a DNA sequence. Let's say dna again, using this one here, or you can use your own DNA sequence. You have 1, 2, 3, 4, 5, 6, 7; this is seven. To find the length it becomes len(dna), and it gives us seven. In other words we have seven bases: this sequence is made up of seven bases, or seven nucleotides.

One of the reasons it is important to learn programming is that in many cases you will not have a short sequence like this. You may have a sequence that is very long, and in that case it will be difficult to count by hand. That is where programming comes in. Let's take a look at this situation: I'll copy everything here, and I have a sequence that is very long. It would be very challenging to count, but with the len function you can count it, and we have 460 nucleotides or bases. This is the power of programming.

You can assign this operation to a variable: you can say length_of_sequence equals len(dna), and then use this value later on. We can say print(length_of_sequence) and then you have it. For example, if you want to get the GC content, you have to count the number of G's and C's, divide by the length, and multiply by 100. In that situation you can assign this operation to a variable and later on compute the percentage GC. So that is how we find the length of sequences.

Now let's check for the presence or absence of characters, or substrings, or subsequences if you call them that. This time let's say my_string equals "I love programming". Let's say I want to check for a character, to find whether there is a p in this
particular string. Then I can just say "p" in my_string; "p" is the query, what I want to find in my string. If that character is there, we get True returned to us. If the character is not there, let's try "z" in my_string: what do we have? False, because z is not here. Now let's try upper case "P" in my_string: we have False. You may be tempted to say that P is p, that this and this are both p, so why do we have False? It is because Python is case sensitive. The upper case P is different from the lower case p, so you need to pay attention to the cases you use when dealing with strings. Here it is False because there is no upper case P, but this one is True because there is a lower case p. The same applies to all the other letters as well.

You can also check for the presence of characters or substrings using another command called find. You can say my_string.find("p"). This returns a value, and this value represents the starting index. We have 1, 2, 3, 4, 5, 6, 7, 8; this is character number eight, but its index is seven. I'll talk about indexing later on, so don't worry. For now just know that when you use the find function it tells you the starting point in that particular sequence or string. Let's say love: my_string.find("love") is two. We have 0, 1, 2, so it starts from index two. We'll talk about indexing later on, so stay tuned and you will understand why we have these values. You should also know that in Python counting starts from zero.

Now we are going to look at counting the number of characters or substrings. This is something we always do, to find the number of occurrences. I will start with bioinformatics again: my_string equals "bioinformatics".
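The membership checks and the find calls described above can be reproduced with this short sketch. The last line goes slightly beyond the video: it shows that find returns -1 when the query is absent.

```python
my_string = "I love programming"

has_lower_p = "p" in my_string   # True: a lower case p is present
has_z = "z" in my_string         # False: there is no z
has_upper_p = "P" in my_string   # False: Python is case sensitive

# find() returns the index where the first match starts (counting from 0).
p_index = my_string.find("p")        # 7: the eighth character
love_index = my_string.find("love")  # 2: "love" starts at index 2

# When the query is absent, find() returns -1 instead of raising an error.
missing_index = my_string.find("z")

print(has_lower_p, has_z, has_upper_p)
print(p_index, love_index, missing_index)
```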
Now let's say you want to count the number of o's in this string. Then we say my_string.count and indicate the character or substring of interest, and this gives us the count: we have one, two.

By the way, for the earlier section on checking for the presence or absence of characters, there are other ways to do advanced queries. You can use regular expressions if you are looking for specific patterns. It is something I did not mention, and I'll cover it in another tutorial, so for now just work with what you have.

Now, back to counting the number of characters or substrings. We have my_string, bioinformatics, and we are counting the letter o. We have two of them, this one and this one. We use the string object, my_string, and then .count. Let's try again: my_string.count("o"), and we have two, because we have o and o here. Again, Python is case sensitive, so if I do this as my_string.count("O") with an upper case O,
you get zero, because upper case O is different from lower case o. So take note of that.

Now let's use a sequence; we are getting to the interesting part. Let's say dna equals A, T, C, any random one like this. You can use any sequence that you have. If I want to count the number of A's, that is adenine, I just say dna.count("A") and I get five, because we have five of them. Let's count the number of G's, guanine: dna.count("G"), and I have four of them. Let's repeat for C, and now for T. So we have them here.

What can we do with this? We can use it to find the GC content, that is, the number of G's and C's. The GC content is a property that we sometimes use to make decisions or inferences, to decide whether a particular sequence should be treated in a certain way, or to explain certain properties or behaviors of sequences. In that situation you have to find the GC content. There are tools that can be used to find the GC content, but behind the scenes it is these little pieces of code that do the computation for us. That is why, if you know the basics from the beginner level, it helps you understand how these tools or programs work.

Now let's take a look at the GC content, applying what we have learned so far. I want to find the GC content, the G's and then the C's. We have G and C, which are 4 and 1. I can say gc equals dna.count("G") + dna.count("C"), and now print gc, which should give us five. So this is the GC count. Now, if you want to find the percentage GC, you first have to find the length as well, and then divide the GC count by the length, times 100.
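Here is a small sketch of the counting steps above. The exact sequence typed in the video is not recoverable from the transcript, so the DNA below is a made-up 13-base sequence chosen so that the A, G and C counts match the 5, 4 and 1 mentioned above.

```python
my_string = "bioinformatics"
o_count = my_string.count("o")        # lower case o
upper_o_count = my_string.count("O")  # upper case O is a different character

# Hypothetical sequence, chosen to reproduce the counts quoted in the text.
dna = "ATGAAGTGCATGA"
a_count = dna.count("A")
g_count = dna.count("G")
c_count = dna.count("C")
t_count = dna.count("T")

# GC count: number of G's plus number of C's.
gc = g_count + c_count

print(o_count, upper_o_count)
print(a_count, g_count, c_count, t_count)
print(gc)
```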
Let's try this now. We know gc, so let's look at the percentage. We need to find the length, so we say length equals len(dna). Now, to find the percentage GC, we can say gc / length * 100, and this gives us our answer: gc, the number, divided by the total length, times 100. To be on the safe side, we can put gc / length in parentheses and then say * 100, and we get our answer. We can assign the entire operation to a variable: percentage_gc equals, and then we paste the entire operation there, and now we can print percentage_gc.

If you want extra information, you can say print("Percentage GC is", percentage_gc), with a comma in between, and then you have it nicely done. Sometimes you do your computation but need to describe it, because if you show just the number, the user may not understand what it is about. If you add a bit of description like this, the user knows that this is a percentage. So if you run programs that display text in addition to numbers, it is these little pieces of code being used behind the scenes. So how would you find the purine and pyrimidine content? I'll leave that to you to find out; now you know how to count them.

With counting, you can also count the number of occurrences of a pattern. Let's try this: let's create another sequence, dna_2. If I want to count the number of occurrences of the pattern, I can say dna_2.count and give the pattern, and we have two, because we have one here and another one here. So this is how we count them. Bear in mind that when you are counting or looking for patterns, there are better ways of doing it.
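The percentage GC calculation and the labelled print can be sketched like this. The sequences and the "GG" pattern are made-up examples, not the ones used in the video, and the last line adds one detail about count that is worth knowing when counting patterns.

```python
# Hypothetical sequence with 4 G's and 1 C out of 13 bases.
dna = "ATGAAGTGCATGA"

gc = dna.count("G") + dna.count("C")
length = len(dna)

# Parentheses make the order of operations explicit.
percentage_gc = (gc / length) * 100

# A short label tells the user what the number means.
print("Percentage GC is", percentage_gc)
print("Rounded:", round(percentage_gc, 2))

# count() also works for patterns longer than one character.
dna_2 = "ATGGCATGGA"
gg_count = dna_2.count("GG")

# Note: count() only counts non-overlapping matches.
overlap_count = "AAAA".count("AA")
print(gg_count, overlap_count)
```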
For example, you can use regular expressions to do that, but this is something I am going to cover in another tutorial. If you have any challenge like that, you can always contact me and we can arrange a session together. If you want to book a time with me, I have left some links in the description that you can use to contact me and book a session for further tutorials.

Okay, let's continue. What if you want to replace characters or substrings? That is what we are going to do next. Let's say I have my_string equals, let's say, programming, and I put an upper case G here. If I want to replace this G with a lower case one, I call the string object, my_string, and say .replace. I start with what I want to remove, in this case upper case "G", then a comma, and then what I want to add, the lower case "g". Let's execute this command, and now we have programming, nicely done. So this is how we replace characters. You can also replace a pattern: you say my_string.replace, bring in the pattern of interest and its replacement, and it will be replaced for us as well.

Now let's switch to DNA sequences. Let's say dna again. If I want to replace the T's with U's, I can just say dna.replace("T", "U"), and they have been replaced. You know that RNAs are the ones that contain the U's. With some additional commands you can actually convert your DNA sequences to RNA sequences, but this is something I am going to cover in another tutorial, where you have to find the reverse and do the swap and all that. For now this is just a simple command to let you know that it is possible to convert DNA sequences to RNA by performing some operations and also swapping the T with the
U. So now we have swapped the T's with U's. Take note that replace does not change the original sequence; it returns a modified copy. So if you want to keep the result, you have to assign the entire operation to a variable. Let's just assume that changing the T's to U's gives us an RNA, or a transcript. Then I can say rna equals, followed by my command, and execute it. Now I can print rna and I have it done nicely. So this is how we replace characters or substrings in Python.

Let's proceed. Now we are going to look at how to combine strings or sequences. There are a number of ways you can do it, and I'll give you some approaches here. By the way, combining strings or sequences is also called concatenation. Let's start with the first one. Let's say I have dna_1 equals, let's say, A, G, T, and dna_2 equals T, T, C. If I want to combine them, I can say combined equals dna_1 + dna_2. Now I can say print(combined), and that gives me my combined sequences. Let me note that this is approach one.

Now let's try with a name: first_name equals Bioinformatics, second_name equals Coach. If I want to combine them, I want full_name to be equal to first_name + second_name. Let's print it, and you have it this way. This is a name, so you expect a space character in between. You will be tempted to do it this way: first_name plus a space character plus second_name. If you print that, you get the full name with a space. But there are more elegant ways of doing this, and that leads me to string formatting. Approach two and approach three are string formatting.
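The replace and concatenation steps described above, as a minimal sketch. The words and sequences are illustrative assumptions; the point is that replace returns a new string and that + joins strings exactly as written, with no space added.

```python
# Replace a single character: upper case G becomes lower case g.
word = "proGramming"
fixed_word = word.replace("G", "g")

# Swap T for U; the result must be assigned to keep it.
dna = "ATGCTT"          # made-up sequence
rna = dna.replace("T", "U")
print(fixed_word, rna)
print(dna)              # the original sequence is unchanged

# Approach one: concatenation with +.
dna_1 = "AGT"
dna_2 = "TTC"
combined = dna_1 + dna_2

first_name = "Bioinformatics"
second_name = "Coach"
no_space = first_name + second_name         # no space is inserted for you
full_name = first_name + " " + second_name  # add the space explicitly
print(combined, no_space, full_name)
```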
In approach two, let's copy the first and second names again. To get the full name, I say full_name equals, then I bring my quotes and say %s, space, %s. These are string formatting techniques. Then I bring the percent symbol again and give first_name and second_name in parentheses. This is something I'll cover in detail in another tutorial, so if you have any challenge or want further explanation, you can always contact me. Now if I print it, I should have Bioinformatics Coach. This %s and this %s are placeholders. The s indicates that we are going to insert a string, and this one is also a string; so these are the string placeholders, one and two.

The reason this is a better approach than approach one is that it is easier to combine strings with numbers and other data types when you use string formatting. Take our percentage, for example. I can do the same thing using this approach: in the print statement we already had, Percentage GC, I can put a placeholder, %d or %.2f, where f is for a float, and then bring the percent symbol and percentage_gc, and that will also give me an answer. So when you are combining strings, you can use string formatting for that.

There is another string formatting approach available, the string .format method; that is approach three. Let's create the names again. I say full_name equals, I put my quotes with a curly bracket and another curly bracket, and then I say .format and give first_name and second_name. So now I can print full_name.
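Approaches two and three can be sketched as follows. The names come from the example above; the percentage value is a made-up number used only to show the numeric placeholders.

```python
first_name = "Bioinformatics"
second_name = "Coach"

# Approach two: %-style formatting. Each %s is a placeholder for a string,
# and the values are supplied as a tuple after the % operator.
full_name_percent = "%s %s" % (first_name, second_name)

# %.2f formats a float with two decimal places; %d keeps only the integer part.
percentage_gc = 38.4615  # hypothetical value
label_float = "Percentage GC is %.2f" % percentage_gc
label_int = "Percentage GC is %d" % percentage_gc

# Approach three: str.format, with {} as the placeholder.
full_name_format = "{} {}".format(first_name, second_name)

print(full_name_percent)
print(label_float)
print(label_int)
print(full_name_format)
```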
You can do some advanced operations with this approach as well, but that is beyond the scope of this tutorial, so let's proceed.

Now let's go to indexing and slicing. This technique is used to access substrings, characters or nucleotides by specifying an index, or a range of indexes where we have a start and an end, and in some cases a step or stride value. There is a syntax: we have s, the string or sequence; we have m and n, the start and the end; and then s, the step or stride. Let me also use this to explain something: in Python, counting starts from zero. Say I have this sequence, dna; let me use upper case. If I ask you to find the nucleotide in position two, in indexing we say dna and then give the index value. But because counting starts from zero, position two has an index of one. So let's run dna[1], and we get G. This is what indexing looks like. We can use indexing to do a number of things: to find patterns, or to extract particular regions of a sequence. Let's look at how to do that.

I have covered indexing and slicing in detail in another tutorial, so I encourage you to watch that video; I'll leave the link in the description box. For now I am just looking at it at a glance. I have indexing and slicing, I also have negative indexing, which is used to access the last element of a string or sequence, and we also have shorthand indexing. We will go through all of these.

Let's start with the first one. Let's say dna equals any sequence. If you want to find the nucleotide in position three, that would be index two. Counting starts from zero, so from the
position of interest you always have to subtract one, that is, if you are counting from this side, the left. Let's say we want the third nucleotide: then it becomes dna[2], because counting starts from zero in Python, so the third nucleotide has an index of two. That is what we have here. This differs from R, because in R counting starts from one. If you are coming from R and are new to Python it might sound a bit confusing, but it is just a matter of subtracting one from the position of interest. If you want the fourth nucleotide, it becomes dna[3], or whatever the string in question is; three will be the index. Let's try with another string, moving away from DNA sequences: let's say string_1 equals hello. If I want the third character, it becomes string_1[2], and that is what we have here.

We have been using just one value, but what if you want a range? If it is a range you want, you have to add a start and an end. Let's go back to our DNA and copy it. Let's say I want the range from the third nucleotide up to the sixth nucleotide. What do I have to do? I start with index two, because I want to begin from the third. Then, because I want to go up to the sixth, I need to add an extra value. The third nucleotide has an index of two and the sixth has an index of five, so I have to use dna[2:6]. When you specify the query like this, Python returns from the start up to whatever you have here minus one. So with this command, index two up to index five is returned. The six here is more like a boundary, so that Python does not go beyond what it is supposed to do.
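A short sketch of the indexing and slicing rules above. The sequence is a made-up example, and region is a hypothetical helper (not from the video) that converts 1-based positions into a Python slice.

```python
# Made-up sequence; indexes run 0..10 for these 11 bases.
dna = "AGTGCAGTCGT"

second = dna[1]   # position 2 -> index 1
third = dna[2]    # position 3 -> index 2
fourth = dna[3]   # position 4 -> index 3

string_1 = "hello"
third_char = string_1[2]

# Third to sixth nucleotide: start at index 2, stop before index 6.
third_to_sixth = dna[2:6]


def region(sequence, first_position, last_position):
    """Return positions first..last (1-based, inclusive) of a sequence.

    Hypothetical helper: subtract one from the start, and use the last
    position unchanged because the slice end is excluded.
    """
    return sequence[first_position - 1:last_position]


print(second, third, fourth, third_char)
print(third_to_sixth, region(dna, 3, 6), region(dna, 2, 4))
```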
So let's run this command, and we get the result. We have the third, which is index two: 0, 1, 2; then 2, 3, 4, 5, and we stop at five. In actual positions, I said third to sixth, so that means 3, 4, 5, 6, and that is what we have here with 2:6. You just have to know the minus one rule, and after that everything should be fine.

Let's try again. Say we want the second nucleotide up to the fourth. We want the second, which is index one, and the fourth would be index three, so we add one to the end, and it becomes dna[1:4]. We execute it and we get the second, third and fourth. Make sure to check the documentation for Python to understand what is going on here. Of course, you can also reach out to me if you want a one-on-one tutorial on Python for bioinformatics; check the description box and you will find my contacts there.

So that is indexing and slicing. We have done the one where we specify a single value, and we have also used a range. Now let's look at the step, which is very interesting. Let's recreate this DNA sequence. Let's say I take dna and I want everything here at an interval of three; that is where the step or stride comes in. If I take all the nucleotides here, I want to get some of them at an interval of three, so that means I want every third one: the third, the third, the
third. So we can say, let's say, zero, nine and then three, something like dna[0:9:3]. Let's run this and take a look at the code. I have 0:9:3. It means we are interested in the range 0 to 9, but of course the 9 itself is not included, so in reality we are looking at 0 to 8. The step here is three, so that means every third character. We start with the first one, index zero, which is A. Now we count 1, 2, 3, and that is G, which is what we have here. We count 1, 2, 3 again, and that is another G. That is the step I was referring to. You can always change the values to see what comes up.

This principle can be used to generate codons. It is something I have also covered in a tutorial, and I'll leave the link to that video in the description box. It is this principle of start, end and step that can be used to get the codons for you, so make sure to check that video; I think it will be very helpful if you want to learn further. So this is how we use the step as well. Now we have our indexing and slicing: we have a single index, we have a start and end, and we can also add a step. Just take note of that.

Now let's look at negative indexing. Let me remove these ones first. Negative indexing is used to access the last elements of a string. Let's recreate this sequence here; I'll just paste it. A negative index means that the value we use is negative: where we specified 2, 6 and so on, we make it negative. With negative indexing, counting starts from one; we don't use zero, we use one. So if I want the last element, it becomes dna[-1]. That means I am starting the counting from this side, the end.
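The start:end:step slice described above can be tried with this sketch. The sequence is made up so that the result is A, G, G as in the walkthrough. The codon split at the end is only one possible way of applying start, end and step, and is not necessarily the method shown in the linked video.

```python
# Made-up sequence with A at index 0 and G at indexes 3 and 6.
dna = "AGTGCAGTCGT"

# Start at index 0, stop before index 9, take every third character.
every_third = dna[0:9:3]
print(every_third)

# One possible way to split a sequence into complete codons (an assumption,
# not the video's code): step through the start positions three at a time
# and slice three bases from each one.
codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
print(codons)
```

Because the sequence has 11 bases, the last two bases do not form a complete codon and are left out.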
It is just like how we have the five prime and three prime ends of a DNA sequence: sometimes you go from five prime to three prime, sometimes from three prime to five prime, and we have something like that here. So if I want the last nucleotide, the last character, I can use this to get it. Let's see: it is A, the last one. Of course, here we have A and A, so let's change this last one to T to make it clear. Now let's do it again, dna[-1], and we have T. If you want the nucleotide just before the last one, then of course you use two: we say dna[-2], and that gives us G. You can also use this technique to get the reverse sequence, but that is something we are going to look at in the next session. You can also do this: let's say dna, and I start with, let's say, four, then my colon, and then minus one. I get some values here, so that is also possible.

Okay, I have a message from Rodriguez. Thanks, Val, for the message. Just make sure you learn some stuff. I'll be doing this live stream weekly, by the way: Tuesdays will be for Python programming, Wednesdays for Linux or Bash scripting, and Thursdays for R, so make sure to come and join the live stream. Before I proceed, let me also say that the timing is still experimental. I am trying to find a time that works for a lot of people because of the time zones, so just let me know what you think about the time. Initially I said 3:00 p.m., but later on I mentioned I was going to start at 5:00 p.m.
So I'm still experimenting to see which time will work for most people. Just make sure to join and learn something new. Again, the notebooks and everything will be made available, so check my Patreon channel and you'll have access to all my materials, not just the programming material: for the YouTube tutorials I make, I put the Bash scripts and everything on Patreon. Later I'll make a video that describes the Patreon system I use, because there are levels, or tiers, so just take note of that. OK, let's proceed. We were on negative indexing, and this is what we had. Yes, the earlier one had some errors. You can also do something like this, so you get from position 4 to the last one. There are several variations you can try. This tutorial is an introductory one, so there are things I may not be able to cover for now, but I hope it will serve as a starting point for those who want to learn programming. Thanks for joining, all of you, and make sure to learn some new stuff. OK, let's proceed with negative indexing. Let's try another experiment to see if it will work. This time I'll say -4 to -1. Let's see. OK, perfect, that has also worked. So that means that, for the earlier errors I made, I should have started with this instead. I think it's fine; I've also learned something new here. So that is negative indexing. Here we are counting from the end, 1, 2, 3, 4, and the slice runs from there towards the last one. Again, the boundary rule still applies: if I have -1 as the end, then -2 is the last nucleotide that will be returned to me. That is why I have this here. OK, so this is negative indexing, and we also have shorthand indexing,
which is also here. Let's try this. Shorthand indexing is done by omitting either the first or the last index. I have some shorthands here: this one slices from the beginning of the sequence up to the index just before n, and we also have this one, this one and that one. I think this one was an error, so let me remove it; I will change it and we'll use this one with -1 instead. So let's look at shorthand indexing. As I mentioned, it is done by omitting the first or the last index. We have some shorthand indexing techniques here and we are going to look at all of them, so just download the notebook and you'll find the statements there too. Let's try the first one. Let's create our DNA sequence again; I'll paste it here and use it. The first one is s[:n], and in this case s becomes DNA, so let's say DNA[:5]. That means Python returns all the nucleotides starting from the first, at index 0, up to index 4; that's what this statement means. Let's run it. So we have indices 0, 1, 2, 3, 4, and we get A, G, A, G, T. Notice what we have here: A, G, A, G, T at 0, 1, 2, 3, 4. Again the minus one applies: even though we wrote 5, it's going to be 0, 1, 2, 3, 4. Let's do the second one: DNA[5:]. Here we start from position 5 and go up to the end; that's what this one means. Let's run it. We start from here and go up to the last one: 0, 1, 2, 3, 4, 5, that's A, and from this to the end. That's the other one. Now let's look at this one, DNA[:]. This returns a copy, basically a copy of the sequence of interest, the sequence we want to query. It's just a copy. Now let's look at
this one here, which returns the reverse of the entire string. That one is DNA[::-1], and it returns a copy that is reversed. Let's type DNA: this is what we had. And now let's look at DNA[::-1]: that's the reverse. So this gives us the reverse sequence. I've also covered reverse sequences in detail in another tutorial, so check the description box and you'll find the link to that video. For some of these sections I've made detailed videos, so if you want a detailed explanation, check the appropriate video. So that's the other one. With strings you can also manipulate them using other functions. I've not covered all of them, but you can look at what we have here, experiment, and use the ones that are appropriate for you. This one I'm going to cover in another tutorial, because we are going to cover list objects there. For these ones, just read them, get the explanation, practice, and you'll be fine. So that's it. For beginners, I will encourage you to practice Python programming; it's more about being consistent, and I believe starting from scratch like this is going to help you. As a matter of fact, there's nothing to be scared of when it comes to programming; you just need to practice. And I think you being biologists and me using DNA sequences makes it less scary and more enjoyable. Make sure you check the description box for useful videos. OK, thanks. I also have a message that says, "This is a good lecture, please share experience of the use of Python for live data sets." OK, yeah, so for live
data sets, I mean, I've covered some of them, but it depends on what you want to achieve at the end of the day. If you have some topics, you can bring them up, and because I always have lots of requests, you can also sponsor a video and I'll try to make it for you. Again, you can support this channel, you can sponsor videos; however you want to support the channel is welcome, and the links you can use are in the description box. OK, let's proceed. Check these additional functions here; you can use them to manipulate your strings and sequences. Whenever you are using tools to analyze sequences, what is happening behind the scenes is these little pieces of code, these simple pieces of code that are here. It is a combination of such simple code that gives us the results and statistics we get when we analyze sequence data. So make sure you check this video, reproduce it, and develop expertise in Python programming and bioinformatics. Now let's look at this exercise. This is the last thing we are going to do. You can try to answer the questions yourself if you can, but I will also answer them all here. It says: consider this DNA sequence. How many nucleotides are present? Calculate the percentage GC. How many purines, the percentage of purines, and so on. We are going to take these questions one by one; I think it will be very helpful. Question one: how many nucleotides are present? Before we even do that, we first need to create the sequence. It's a string, so that means we have to create the
sequence. OK, so let's say DNA equals, and let's use this. I have that created, so now I can answer. Question one is how many nucleotides are present; in other words, we are looking at the length of the sequence. If you want the length, that's len(DNA). Simple, and we get our answer: we have 23 nucleotides. Now let's look at question two: calculate the percentage GC. How do you do it? First you need to get the GC content, so GC equals DNA.count("G") plus DNA.count("C"). You also need the length, which is len(DNA), which I've already covered. Then percentage GC equals GC over length times 100. Now you can print your GC percentage; we have it here. We can also change the number of decimal places: if, let's say, you want two decimal places, you can say percentage GC equals round of percentage GC, then 2. Now I can print it and I have my answer: 52.17. Let's look at question number three: how many purines are present? Purines are A and G, so the number of purines equals DNA.count("A") plus DNA.
count("G"). Let me name it number of purines. Now I can print the number of purines: it's 15. Question number four: how many pyrimidines? It's the same principle here, only this time we change the letters. Let me do it again: number of pyrimidines equals DNA.count("C") plus DNA.count("T"), since we have C and T. Now I can print the number of pyrimidines, and I have eight. That was question number four. Question number five is: calculate the percentage of purines in the sequence. I already have the number of purines here, so I will just copy this code. For the percentage of purines I need the number of purines and I need the length, which I've already done, so I'll copy and paste that here as well. Then the percentage of purines becomes number of purines over length times 100. These mathematical operations I'm going to cover in another tutorial; for now I'm just giving them to you in bits. Now I can print the percentage of purines, so I have that here. Again, you can change it to two decimal places; I'll leave that to you as your assignment. Question number six: which nucleotide is in the 10th position? The nucleotide in the 10th position is going to be DNA[9], because counting starts from zero. I can print it, and that's G. You can also count manually, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10: that's G. But of course with indexing we start from zero, so it's 9. Let's look at question number seven: write the nucleotides that are present from position 3 to position 9. Here we are looking at a range. The nucleotides from position 3 to 9 will be DNA[2:9], because we are looking at the third nucleotide to the ninth nucleotide, let me put it that way. So, 2 to 9. Why are we saying 9? Because we want
the third nucleotide to the ninth nucleotide. The ninth nucleotide has an index of 8, but we have to add one to the end so that we get that value itself; that is why we write 9. You can always experiment with these things. It may sound confusing, but when you practice, make mistakes and repeat, you start to get the hang of it. So make sure to practice, and after some time you'll understand why some of these things are the way they are. Some of them are rules I can't change; these are the rules set by the creators of Python, so you just have to follow them. It's just like learning a language: there are rules, and if you follow them you'll be fine. Let's run this. I'll print the nucleotides from 3 to 9, so I have the answer here. Now question number eight: what command will you use to find which nucleotides are present from position 10 to 20? It's basically the same thing I've done here, so this time I will just copy the code and modify it to give me 10 to 20. Let's see. For 10 to 20, let me use the actual values in the name, but in the index the start becomes 9. Now I can print it. So that's what we have here: this is 9 to 20. The 20 here will not be included, so we go up to index 19, and index 19 is the 20th position. OK, let's go on to question number nine: does the DNA sequence contain GA? How do we do this? We just have to query. I can write GA in DNA, and it's True, so it's there. If it's there, how do we locate it? Let me use a print statement here, and I can say DNA.
find, to locate where it starts. Yes, let me use a print statement so that everything will display. So it's 9: it starts from index 9. Let's check index 9: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, that's G, which is what we have here. Take note: if you want the exact range, then you have to use regular expressions, but that is something I'll cover in another tutorial, in a separate video, so for now I'll skip it. Question number 10: how many times does TTA occur? You can say DNA.count with TTA: two times. Please take note that this particular approach may sometimes not be appropriate. There are situations where you have, let's say, TTA together with other nucleotides that make it a bit more complex, and so there are better ways to count. You can use regular expressions, or, if this were a list data type, then of course this would work for all situations, but for the string data type this approach may sometimes not work. Just take note of that. Of course, that's why you are learning, so you can build on whatever you've been taught today. Let's look at question number 11: at which index does GTA begin? Here we can use find to do that. I can just say DNA.find, and then I'll paste it in. So I have 1, index 1. Let's check, 0, 1, looking for the motif: yes, so we have index 1. But what you should note is that if the motif occurs more than once, if we have two or more of them, then this approach may not work, because find only reports one position. That's why I recommend using regular expressions. In some situations the find and the count commands may not work appropriately, so you have to look at the situation and choose the appropriate method. With regular expressions, you are sure of getting the right results when it comes to finding. Now let's look at the last question, question number 12: does
the sequence contain the character Z? Of course, this is simple: we say "Z" in DNA, and it's False. Here, you know, we have DNA sequences and we have protein sequences. For DNA sequences we should have G, A, C and T; if you're dealing with RNA, then you also expect to find U. But if you see any other character, like Z, then that means it is an invalid sequence. So you can also use this to check whether there are characters in your sequence that are not supposed to be there. That's how we do it. So this is it for this tutorial. Make sure to check the description box; you'll find the link you can use to download this notebook, as well as other bioinformatics tutorials that I make. If you have any other comments, or if there are things you think should have been added, put them in the comment section and we can all discuss and learn something new. So that'll be all for this tutorial.
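For anyone who wants to practice without the notebook, the shorthand slices and the exercise questions can be sketched roughly as below. This is only an illustration under stated assumptions: the sequence is a made-up 15-nucleotide string, not the 23-nucleotide one from the video, and all variable names are hypothetical, so the numbers will not match the answers given above. The last part shows one possible reason why count can be misleading (it counts non-overlapping matches) and uses the standard re module as one way around it.

```python
import re

# Hypothetical sequence for illustration only (not the one from the video).
dna = "AGTTAGCGGATTACG"

# Shorthand slices: omit the start, the end, or both.
first_five = dna[:5]        # indices 0 to 4
from_five = dna[5:]         # index 5 to the end
copy_of_dna = dna[:]        # the whole sequence
reversed_dna = dna[::-1]    # a step of -1 walks the string backwards

# Length and composition.
length = len(dna)
gc_percent = round((dna.count("G") + dna.count("C")) / length * 100, 2)
n_purines = dna.count("A") + dna.count("G")
n_pyrimidines = dna.count("C") + dna.count("T")
purine_percent = round(n_purines / length * 100, 2)

# Positions are counted from 1, indices from 0.
tenth = dna[9]              # nucleotide in the 10th position
pos_3_to_9 = dna[2:9]       # 3rd to 9th nucleotide (end index excluded)

# Motif queries.
has_ga = "GA" in dna
first_ga = dna.find("GA")   # index of the first match, or -1 if absent
tta_count = dna.count("TTA")
tta_starts = [m.start() for m in re.finditer("TTA", dna)]  # every start index

# count() skips overlapping matches; a regex lookahead does not.
plain_count = "AAAA".count("AA")
overlap_count = len(re.findall("(?=AA)", "AAAA"))

# Validity check: only A, C, G and T are allowed in a DNA string.
is_valid_dna = set(dna) <= set("ACGT")

print(length, gc_percent, n_purines, n_pyrimidines, purine_percent)
print(tenth, pos_3_to_9, has_ga, first_ga, tta_count, tta_starts)
print(plain_count, overlap_count, is_valid_dna)
```

Swapping in the sequence from the notebook should reproduce the answers worked out in the exercise.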