In this video we are going to look at a Python data type: strings. Python strings are sequences of characters enclosed in either single or double quotes, like the ones I've indicated here. We'll look at how to create strings, how to convert them to lowercase and uppercase, how to find the length of a string, and more. There's a lot of content to cover, and I've listed all of it here. We'll also look at how to apply this in biology, specifically to sequences, so I'm going to explain everything using biological data.
For those who would like the notebooks, they will be available on my Patreon channel. I'll leave a link in the description so that at the end of this tutorial you can download them to practise or reproduce these exercises. Make sure you also check the Patreon channel for exclusive videos.
If you don't have Python installed, you can use Google Colab. In Google Colab, go to File, then New notebook, and you get a new notebook in which you can run Python code. You don't need any additional library; we'll use only Python's built-in functions and commands. Colab runs in the browser, so you do need an internet connection, and then you'll be fine.
So let's start. In the Python interface, we begin by creating a string, or a DNA sequence, and then we'll apply what we learn to biology.
To create a string, we assign it to a variable. Let's call it my_string and give it the value "bioinformatics". I've placed it in double quotes, and anything enclosed in quotes is a string. You can also use single quotes, and as we go along I'll use them too so that you get used to both approaches. (Let me enlarge this a bit; that should be fine.)
Now that I've created my string, I can print it with print(my_string), which gives bioinformatics. You can also just type the variable name on its own and it will display bioinformatics. I use that mostly in an interactive session to display a value, but if you are going to use the string in another operation or command, you have to assign it to a variable as I've done here. Let's print another one, 'hello world', this time using single quotes, and we get hello world. So this is how we create strings.
Now, we are biologists, and this is Python for bioinformatics, so let me switch to a DNA sequence. In programming, DNA sequences are also treated as strings. Let's say my_dna = 'AGCT' (any combination of bases will do). Because it's in quotes, it's a string, so we have our DNA, and we can print it with print(my_dna).
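A minimal sketch of the steps just described, assuming the same names used in the video (my_string, my_dna); the greeting variable name is just for illustration. It shows that the quote style makes no difference and that a DNA sequence is an ordinary str.

```python
# Double and single quotes both create str objects.
my_string = "bioinformatics"
greeting = 'hello world'

# A DNA sequence is just a string of nucleotide letters.
my_dna = 'AGCT'

print(my_string)         # bioinformatics
print(greeting)          # hello world
print(my_dna)            # AGCT
print(type(my_dna))      # <class 'str'>
print("AGCT" == 'AGCT')  # True: the quote style does not matter
```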
So we have printed our sequence, and this is how we manipulate DNA sequences. If you analyse a DNA sequence with some tool, behind the scenes that sequence is a string, so you manipulate it as a string. We'll do that through all the activities listed here, so stay tuned.
Now that we have a DNA sequence as well as our classic strings, let's go to the next section: converting strings to lowercase or uppercase. I'll create another string here so that we can use it. If I have a string and want to convert it to lowercase, I call my_string.lower() and execute the command, and now everything is in lowercase. Let's try another one, my_string2 = "HELLO", this time in uppercase. To change it to lowercase, I use my_string2.lower(). So to change a string's case, you specify the string object first, and then it's easy to manipulate it with Python's built-in methods.
Before we look at uppercase, note that you may have a DNA sequence such as "AGTC"; if you want it in lowercase, it's again .lower(). For uppercase, let's use a DNA sequence first.
I'll say dna = "atgc". To change it to uppercase, we use dna.upper(). Sometimes you'll have a sequence in lowercase like this and, perhaps for formatting reasons, you want to present it in uppercase, and this is how you do it. Let's also try it on an ordinary string: calling .upper() on it gives the uppercase version.
There's one more thing we can do. Sometimes a string is all lowercase, but because it's a name you want the first character to be a capital letter. Let's use my_name = "bioinformatics" and call my_name.capitalize(). (Let me just confirm from my notes here. Okay, yes.) So it's my_name.capitalize(), spelled with a z.
Take note of that spelling. Now you'll see that the first letter has been capitalized. So this is how we change the case when dealing with strings. Let's continue.
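Here is a short sketch of the three case methods covered above, using the example values from the video. One detail worth knowing: capitalize() also lowercases every character after the first, and all three methods return new strings rather than changing the original.

```python
dna = "atgc"
print(dna.upper())          # ATGC

my_string2 = "HELLO"
print(my_string2.lower())   # hello

my_name = "bioinformatics"
print(my_name.capitalize())  # Bioinformatics

# capitalize() also lowercases everything after the first character
print("bIOINFORMATICS".capitalize())  # Bioinformatics

# These methods return new strings; the original is unchanged
print(dna)  # atgc
```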
Let's move on to the next activity: finding the length of strings and sequences. To find the length of a string, in other words the number of characters that make it up, we use the len function. Let me show you with my_string = "hello world". I call len(my_string) and get 11. (All of these operations can also be assigned to variables; we'll look at that later.)
You may be tempted to say the length should be 10, because "hello" has five letters and "world" has five letters. We get 11 because a space is also a character in Python: all the letters are counted plus the space, and that gives the total length. So bear in mind that, in addition to letters, Python counts other characters such as spaces whenever it finds the length of a string.
Now let's switch to a DNA sequence. Let's reuse the dna sequence from before, or you can use your own; counting the bases one by one, this one has seven. To find the length, we use len(dna), which gives 7. In other words, this sequence is made up of seven bases, or seven nucleotides.
One reason it's important to learn programming is that in many cases you won't have a short sequence like this but a very long one.
In that case it's difficult to count by hand, and that's where programming comes in. Let's take a look: I'll paste a long sequence here. Counting it manually would be very challenging, but len gives us the total number of nucleotides, or bases, straight away. This is the power of programming. You can also assign this operation to a variable, for example length_of_sequence = len(dna), so that we can use this value later on.
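A minimal sketch of len() on a string and on sequences. The seven-base sequence and the long sequence (250 repeats of "ATGC") are made-up examples, not the sequences pasted in the video.

```python
my_string = "hello world"
print(len(my_string))  # 11: the space counts as a character

dna = "AGAGTAG"  # a made-up seven-base sequence
length_of_sequence = len(dna)
print(length_of_sequence)  # 7

# A longer, made-up sequence: 250 repeats of "ATGC"
long_dna = "ATGC" * 250
print(len(long_dna))  # 1000
```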
For example, print(length_of_sequence) gives us the same number. Storing it in a variable is useful because if, say, you want the GC content, you count the G's and C's, divide by the length and multiply by 100; with the length stored, you can compute the percentage later. That's how we find the length of sequences.
Now let's check for the presence or absence of characters or substrings (or subsequences, if you prefer). Let's say my_string = "i love programming". To check whether there is a "p" in this string, I write "p" in my_string, where "p" is the query. If that character is there, Python returns True. If it isn't, for example "z" in my_string, we get False because there is no z. Now let's try an uppercase P: "P" in my_string also gives False. You may be tempted to say that P and p are the same letter, so why False? Because Python is case-sensitive: uppercase P is different from lowercase p, so pay attention to case when dealing with strings. Here it's False because there is no uppercase P, but "p" in my_string is True because there is a lowercase p. The same applies to all the other letters.
You can also check for the presence of characters or substrings using another method called find: my_string.find("p"). This returns a value that represents the starting index. Counting by hand, p is the eighth character, but its index is 7. I'll talk about indexing later on, so don't worry about it for now.
Just know that find tells you the starting position of the query in the string or sequence. Let's try my_string.find("love"): it returns 2, because counting goes 0, 1, 2, so "love" starts at index 2. We'll talk about indexing later, and then you'll understand why you get these values; for now, remember that in Python counting starts from zero.
Next, we'll look at counting the number of characters or substrings, something we do all the time to find how many times a pattern occurs.
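A short sketch of the two presence checks described above. The string is the one from the video; the DNA sequence is a made-up example. Note that find() returns the index of the first match, and -1 when there is no match.

```python
my_string = "i love programming"

# Membership test with `in` returns True or False
print("p" in my_string)  # True
print("z" in my_string)  # False
print("P" in my_string)  # False: matching is case-sensitive

# find() returns the starting index of the first match, or -1 if absent
print(my_string.find("p"))     # 7
print(my_string.find("love"))  # 2
print(my_string.find("z"))     # -1

dna = "ATGCGTAA"  # made-up example sequence
print("GCG" in dna)     # True
print(dna.find("GCG"))  # 2
```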
I'll start with my_string = "bioinformatics" again. Let's say you want to count the number of o's in this string. We call my_string.count("o"), passing the character or substring of interest, and this gives us the count: 2.
By the way, for the earlier section on checking for the presence or absence of characters, there are other ways to do more advanced queries: if you're looking for specific patterns, you can use regular expressions. I didn't mention that earlier, and I'll cover it in another tutorial, so for now just work with what we have.
Back to counting. In "bioinformatics" there are two o's, so counting the letter o gives two; we use the string object followed by .count. Again, Python is case-sensitive, so if I count uppercase "O" I get 0, because "O" is different from "o". Take note of that.
Now we're getting to the interesting part: let's use a DNA sequence. Create one with any random bases; you can also use a sequence of your own. To count the number of A's, I say dna.count("A") and get 5, because there are five of them. Counting the G's with dna.count("G") gives four. Let's repeat the same for C and then for T.
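A minimal sketch of count() on strings and sequences. The base-count sequence is a made-up example (the video uses its own random sequence), and the dictionary comprehension is just one convenient way to count every base at once. The last line shows a behaviour worth knowing when counting patterns: count() counts non-overlapping matches.

```python
my_string = "bioinformatics"
print(my_string.count("o"))  # 2
print(my_string.count("O"))  # 0: "O" and "o" are different characters

dna = "ATGCGCATTAGC"  # made-up example sequence
base_counts = {base: dna.count(base) for base in "ACGT"}
print(base_counts)  # {'A': 3, 'C': 3, 'G': 3, 'T': 3}

# count() also works for substrings, counting non-overlapping matches
dna2 = "GGAGGCT"
print(dna2.count("GG"))    # 2
print("AAAA".count("AA"))  # 2, not 3: matches do not overlap
```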
Now we have the count of each base. So what can we do with this? We can use it to find the GC content, that is, the proportion of G's and C's. GC content is a property we use to make inferences, to evaluate or decide how a particular sequence should be treated, or to explain certain properties or behaviours of sequences. There are tools that compute it, but behind the scenes it's small pieces of code like these doing the computation. That's why knowing the basics, the foundation from the beginner's level, helps you understand how these tools and programs work.
So let's apply what we've learned to compute the GC content. In our sequence we have four G's and one C, so I can say gc = dna.count("G") + dna.count("C") and print(gc), which gives 5. To get a percentage, we also need the length: we divide gc by the length and multiply by 100. So length = len(dna), and the percentage GC is gc / length * 100, where gc is the count and length the total length. To be on the safe side, we can put the division in parentheses, (gc / length) * 100, and we get our answer.
We can assign the entire operation to a variable. A name such as %gc is not valid in Python, because % is an operator, so let's call it percent_gc = (gc / length) * 100, and then print(percent_gc). If you want extra information, you can add a label: print("Percentage GC:", percent_gc), with a comma between the text and the value, and it's nicely presented. Sometimes you do your computation, but if you display only the number, the user may not understand what it is; with a bit of description, the user knows this is the percentage GC. Whenever programs display text alongside numbers, small pieces of code like this are being used behind the scenes.
So we have the percentage GC. How would you find the purines and pyrimidines? I'll leave that to you; you now know how to count them.
Counting also works for patterns, not just single characters. Let's create another sequence, dna2 = "GGAGGCT". To count the number of GG's, I say dna2.count("GG"), and we get 2, because there is one GG here and another GG here. Bear in mind that when you're counting or looking for patterns, there are better ways of doing it, such as regular expressions, which I'll cover in another tutorial. If you have a challenge like that, you can contact me and we can arrange a session together; I've left links in the description that you can use to contact me and book a session.
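A minimal sketch of the GC-content calculation walked through above, using a made-up example sequence. In Python 3 the / operator always returns a float, so no conversion is needed; percent_gc stands in for "%gc", which is not a valid variable name.

```python
dna = "ATGCGCATTAGCGG"  # made-up example sequence

gc = dna.count("G") + dna.count("C")
length = len(dna)
# "%gc" is not a valid Python name, so the result is stored in percent_gc
percent_gc = (gc / length) * 100

print(gc)      # 8
print(length)  # 14
print("Percentage GC:", round(percent_gc, 2))  # Percentage GC: 57.14
```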
Now, what if you want to replace characters or substrings? That's what we'll do next. Let's say my_string = "proGramming", where I've put an uppercase G. To replace this G with a lowercase g, I call the string object's replace method, giving first what I want to replace, "G", then a comma, then what I want in its place, "g": my_string.replace("G", "g"). Executing this gives programming. That's how we replace characters, and you can do the same with a whole pattern: pass the pattern of interest to replace and it will be replaced for us.
Now let's switch to DNA sequences. If I want to replace the T's with U's, I say dna.replace("T", "U"), and they've been replaced. As you know, RNA is the one that contains U's. With some additional commands you can properly convert a DNA sequence to RNA, which involves more steps such as working with the reverse and swapping the order; I'll cover that in another tutorial. For now, this simple command shows that converting DNA to RNA is possible by swapping the T's for U's.
Take note that replace returns a modified copy of the sequence; it doesn't change the original. If you want to keep the result, you have to assign it to a variable. Let's say that changing the T's to U's gives us an RNA, or a transcript: rna = dna.replace("T", "U").
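A short sketch of replace() as described above. The DNA sequence is a made-up example, and the T-to-U swap is only the simple shortcut shown in the video, not a full model of transcription. The optional third argument, which limits the number of replacements, is a standard part of str.replace.

```python
my_string = "proGramming"
print(my_string.replace("G", "g"))  # programming

dna = "ATGCTTAG"  # made-up example sequence
rna = dna.replace("T", "U")
print(rna)  # AUGCUUAG
print(dna)  # ATGCTTAG: replace() returned a new string; dna is unchanged

# An optional third argument limits how many replacements are made
print(dna.replace("T", "U", 1))  # AUGCTTAG
```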
Then I print(rna) and it's nicely done. So that's how we replace characters or substrings in Python.
Next, let's look at how to combine strings or sequences, which is also called concatenation. There are several ways to do it, and I'll show you some approaches. Approach one: say I have two short sequences, dna1 and dna2. To combine them, I write combined = dna1 + dna2 and print(combined), which gives the combined sequence. Let me label this as approach one.
Now let's try it with a name: first_name = "bioinformatics" and second_name = "coach". To combine them, full_name = first_name + second_name, and printing it gives the two words joined together. Since it's a name, you'd expect a space between them, so you might be tempted to write first_name + " " + second_name, which prints the full name with a space. But there are more elegant ways of doing this, which leads us to string formatting: approaches two and three both use it.
In approach two, I copy the first and second names again, and to get the full name I write full_name = "%s %s" % (first_name, second_name): inside the quotes, %s, a space and %s, and after the string the percentage symbol again followed by first_name and second_name. This is a string formatting technique that I'll cover in detail in another tutorial; if you have any challenge or want further explanation, you can always contact me.
Now if I print it, I get bioinformatics coach. The two %s markers are placeholders, and the s indicates that we're inserting a string. This approach is better than approach one because it makes it easier to combine strings with numbers and other data types. Take our percentage GC, for example: I can use the same approach, print("Percentage GC: %.2f" % percent_gc), where %d is for integers and %.2f formats a float to two decimal places, and that also gives us an answer. So when combining strings, you can use string formatting.
There's another string formatting approach available, str.format, which is approach three. Let me create the names once again. I write full_name = "{} {}".format(first_name, second_name), with a pair of curly brackets for each placeholder, and now I can print it. You can do more advanced operations with this approach, but that's beyond the scope of this tutorial.
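The three concatenation approaches can be sketched side by side as follows. The two DNA fragments are made-up examples (the ones typed in the video aren't legible in the transcript), and percent_gc is a hard-coded example value. The last line shows why formatting helps with numbers: with +, a number has to be converted with str() first, otherwise Python raises a TypeError.

```python
dna1 = "AGCT"  # made-up example fragments
dna2 = "TTCA"
combined = dna1 + dna2
print(combined)  # AGCTTTCA

first_name = "bioinformatics"
second_name = "coach"

full_name_1 = first_name + " " + second_name           # approach 1: +
full_name_2 = "%s %s" % (first_name, second_name)      # approach 2: % formatting
full_name_3 = "{} {}".format(first_name, second_name)  # approach 3: str.format
print(full_name_1)  # bioinformatics coach

percent_gc = 57.142857  # example value
print("Percentage GC: %.2f" % percent_gc)          # Percentage GC: 57.14
print("Percentage GC: {:.2f}".format(percent_gc))  # Percentage GC: 57.14

# With +, numbers must be converted explicitly ("Length: " + 8 raises TypeError)
print("Length: " + str(len(combined)))  # Length: 8
```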
Now let's go to indexing and slicing. This technique is used to access substrings, characters or nucleotides by specifying an index or a range of indexes, with a start and an end, and in some cases a step (or stride) value. The syntax is s[start:end:step], where s is the string or sequence.
Let me also use this to explain something: in Python, counting starts from zero. Let's create a dna sequence, in uppercase since that's easier to read. If I ask you to find the nucleotide at position two, we use indexing: dna followed by the index in square brackets. Because counting starts from zero, position two has index 1, so it's dna[1], and running this gives G. That's what indexing looks like. We can use it for a number of things, such as finding patterns or extracting particular regions of a sequence. I've covered indexing and slicing in detail in another tutorial, and I'll leave the link to that video in the description box; for now, we'll look at it at a glance.
We have indexing and slicing, negative indexing (used to access elements from the end, such as the last element of a string or sequence), and shorthand indexing, and we'll go through all of them. Let's start with the first one. If you want the nucleotide at position three, that is index two: because counting starts from zero, you always subtract one from the position you want (when counting from left to right). So for the third nucleotide it becomes dna[2], because in Python the third nucleotide has an index of two.
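A minimal sketch of zero-based indexing, assuming a made-up sequence. It shows the position-minus-one conversion described above and what happens if you index past the end of the string.

```python
dna = "AGAGTAGCT"  # made-up example sequence

print(dna[0])  # A: the first nucleotide has index 0
print(dna[1])  # G: position 2 -> index 1
print(dna[2])  # A: position 3 -> index 2

# Converting a 1-based biological position to a 0-based index
position = 4
print(dna[position - 1])  # G

# An index past the end raises IndexError
try:
    dna[len(dna)]
except IndexError as err:
    print("IndexError:", err)
```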
This differs from R, where counting starts from one. If you're coming from R and are new to Python, this might be a bit confusing, but it's just a matter of subtracting one from the position of interest. If you want the fourth nucleotide, it becomes dna[3]. Let's try this on an ordinary string, moving away from DNA: string1 = "hello". To get the third character, it's string1[2].
So far we've used a single index, but what if you want a range? Then you give a start and an end. Let's go back to our DNA sequence. Say I want the range from the third nucleotide up to the sixth. The third nucleotide has index 2 and the sixth has index 5, but I have to add one to the end value and write dna[2:6]. When you specify the query like this, Python returns everything from the start up to the end minus one, so this command returns index 2 up to index 5; the 6 acts as a boundary that Python does not go beyond. Running it gives AGTA: indices 2, 3, 4 and 5, which are positions three, four, five and six, exactly what we asked for with 2:6. You just have to know the minus-one rule, and then everything should be fine.
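A short sketch of range slicing with an exclusive stop value. The sequence is made up, but it is chosen so that dna[2:6] gives AGTA as in the video. The region helper is hypothetical (not from the video): it converts 1-based, inclusive biological coordinates into a Python slice.

```python
dna = "AGAGTAGCT"  # made-up example sequence

# Third to sixth nucleotide: indices 2..5, so the stop value is 6 (excluded)
print(dna[2:6])  # AGTA

# Second to fourth nucleotide: indices 1..3, stop value 4
print(dna[1:4])  # GAG

# Slices never raise IndexError; they stop at the end of the string
print(dna[5:100])  # AGCT


def region(seq, start, end):
    """Hypothetical helper: 1-based, inclusive start/end -> Python slice."""
    return seq[start - 1:end]


print(region(dna, 3, 6))  # AGTA
```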
Let's try again: say we want the second nucleotide up to the fourth. The second has index 1 and the fourth has index 3, so we add one to the end and write dna[1:4], which returns the second, third and fourth nucleotides. Make sure to check the Python documentation so that you understand what's going on here; you can also reach out to me if you want one-on-one help with Python for bioinformatics, and my contacts are in the description box.
So far, for indexing and slicing, we've specified a single value and a range. Now let's look at the step, which is very interesting. Let me recreate the dna sequence and type it out so we can look at it. Say I want nucleotides at an interval of three; that's where the step, or stride, comes in. I want every third nucleotide, so I write dna[0:9:3] and run it.
Let's look at the code: 0:9:3 means we're interested in the range zero to nine, but the nine itself is not included, so in reality we're looking at indices zero to eight. The step is three, so we take every third position. We start with the first one, index zero, which is A; then we count one, two, three and get G; then one, two, three again and get another G. That's the step I was referring to. You can change the values to see what comes up.
This principle can be used to generate codons: start, end and step together can give you the codons of a sequence. I've covered this in another tutorial and I'll leave the link to that video in the description box; I think it will be very helpful if you want to learn further. So that's how we use the step: with indexing and slicing we can use a single index, a start and an end, and also a step.
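A minimal sketch of the step value, using a made-up sequence chosen so that dna[0:9:3] gives A, G, G as in the video. The codon line is one common way to apply the same start/stop/step idea; it is an illustration, not necessarily the method used in the separate codon tutorial. It skips any incomplete codon at the end.

```python
dna = "AGAGTAGCT"  # made-up example sequence

# start=0, stop=9 (excluded), step=3 -> indices 0, 3, 6
print(dna[0:9:3])  # AGG

# Same idea to split a sequence into codons (non-overlapping triplets);
# the range stops early so a trailing partial codon is skipped.
codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
print(codons)  # ['AGA', 'GTA', 'GCT']
```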
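Now let's look at negative indexing. Let me remove this one first. Negative indexing is used to access elements counting from the end of a string, such as the last element. Let's look at how that works: I'll recreate the sequence, paste it here, and show you what I mean by negative.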
With negative indexing, the values we used before (2, 6, 4 and so on) become negative. Let's try dna[-1]. With negative indexing, counting starts from one, not zero, so the last element is -1, and we count from the right-hand side. It's a bit like how we sometimes read DNA from 5' to 3' and sometimes from 3' to 5'. So if I want the last nucleotide or character, I use dna[-1]. Here the last one is A, and so is the one before it, so let me change the last one to T to make it clearer and run dna[-1] again. Now we get T.
If you want the nucleotide just before the last one, you use -2, so dna[-2] gives us G. You can also use this technique to get the reverse of a sequence, but that's something we'll look at in the next session. Let's also try dna[:-1], with the colon before -1; that returns a range of values too.
I have a message from Olvaldo Rodriguez; thanks for the message. I'll be doing this live stream weekly: Tuesdays for Python programming, Wednesdays for Linux or Bash scripting, and Thursdays for R, so come and join. The timing is still experimental, because I'm looking for the time that works best for most people across time zones. Initially I said 3 pm, but later I mentioned it would start at 5 pm, so let me know what you think. Again, the notebooks and all the materials will be available on my Patreon channel; for my YouTube tutorials, I put the Bash scripts and everything else there too. I'll later make a video describing the Patreon system I use, because there are different tiers.
So, back to negative indexing. This is what we had earlier, and there were some errors. You can also do something like this to get a slice from position four to the last one.
the last one. There are more advanced queries you can make, but this is just the introduction, so there are things I may not be able to cover for now. I hope this tutorial serves as a starting point for those who want to learn programming. Thanks for joining, all of you, and be sure to learn some new stuff.

Okay, let's proceed. Let's try one more experiment with negative indexing to see if it works. This time I'll slice from negative four to negative one, dna[-4:-1]. Perfect, that has also worked. So for the earlier errors I made, I should rather have started with this. I've also learned something new. Here we count back one, two, three, four from the end, up to G. Again, the boundary rule still applies: with -1 as the end index, the last nucleotide returned is the one at -2, because the end index itself is excluded. That is why we get this result. That's it for negative indexing; now let's continue.
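To try the negative-index slices from this part yourself, here is a minimal sketch. The sequence in dna is a made-up example (an assumption, not the sequence from the video); the point is that -1 is the last nucleotide and that a slice always stops just before its end index.

```python
# Hypothetical example sequence (not the one used in the video)
dna = "AGAGTACGTTAGC"

last = dna[-1]           # -1 is the last nucleotide
second_last = dna[-2]    # -2 is the one just before it
tail = dna[-4:]          # no end index: from the fourth-last nucleotide to the end
middle = dna[-4:-1]      # end index -1 is excluded, so the slice stops at -2
all_but_last = dna[:-1]  # everything except the final nucleotide

print(last, second_last)  # C G
print(tail, middle)       # TAGC TAG
print(all_but_last)       # AGAGTACGTTAG
```

Comparing tail and middle side by side is the quickest way to see the excluded end index: they differ only by the final nucleotide.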
We also have shorthand indexing, which is also listed here. Let's try it. Shorthand indexing is done by omitting either the first or the last index of a slice.
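A minimal sketch of the shorthand slices (start omitted, end omitted, both omitted, and a step of -1 for the reverse), again on a hypothetical sequence rather than the one in the notebook:

```python
# Hypothetical example sequence (not the one used in the video)
dna = "AGAGTACGTTAGC"

first_five = dna[:5]   # start omitted: indices 0 to 4
from_five = dna[5:]    # end omitted: index 5 to the end
whole = dna[:]         # both omitted: the entire sequence
reverse = dna[::-1]    # step of -1 walks the string backwards

print(first_five)  # AGAGT
print(from_five)   # ACGTTAGC
print(whole)       # AGAGTACGTTAGC
print(reverse)     # CGATTGCATGAGA
```

Because Python strings are immutable, dna[:] simply gives back an equal string (CPython even returns the same object), so the copy idiom matters more for lists. Also note that dna[::-1] is the plain reverse, not the reverse complement.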
I have some shorthands listed here. The first, s[:end], starts the slice from the beginning of the sequence and goes to the index just before end, and we also have the others. Let me just remove this one; it was an error. I'll change it: it should rather be minus one. So let's look at shorthand indexing. As I mentioned, it is done by omitting the first or last index, and we are going to look at all of these techniques. Just download the notebook and you'll find the statements there too.

Let's create our DNA sequence again; I'll paste it here and use that. We start with the first one, s[:end]. In this case s becomes dna, so let's say dna[:5]. That means Python will return all the nucleotides starting from the first one, at index zero, up to index four. Let's run this. For indices zero, one, two, three, four we get A G A G T. Notice that the minus-one rule applies again: even though we wrote 5, we get indices zero to four. Let's do the second one, dna[5:]. Here we start from position five and go all the way to the end; that's what this one means. Let's run it. Counting zero, one, two, three, four, five, that's A, and we get everything from there up to the end. Now let's look at dna[:]. This returns a copy of the entire sequence, the sequence we want to make the query on. Now let's look at this one
here, dna[::-1], which returns the reverse of the entire string. We take dna with the brackets and this -1, and it returns a copy that's reversed. Let's type dna: this is what we had. And now dna[::-1]: that's the reverse. So this returns the reverse sequence for us. I've covered reverse sequences in detail in another tutorial; check the description box and you'll find the link to that video. For some of the sections here I've made detailed videos, so if you want a detailed explanation, just check the appropriate video.

You can also manipulate strings using other functions which I've not covered. I haven't covered all of them, but you can look at what we have here, experiment, and use the ones that are appropriate for you. This one here I'm going to cover in another tutorial, because we'll cover list objects separately; for now, just read them and the explanation and you'll be fine. Just practice, and everything should be fine for you.

For beginners, I encourage you to practice Python programming; it's more about being consistent, and believe me, starting from scratch is going to help you. As a matter of fact, there's nothing to be scared of when it comes to programming; you just need to practice. And since you are biologists, I think my using DNA sequences makes it less scary and more enjoyable. Make sure you check the description box for useful videos. Thanks.

I also have a message that says: "This is a good lecture, please share experience of the use of Python for online data sets." Okay, yeah, so for live data
sets, I've covered some of them, but it depends on what you want to achieve at the end of the day. If you have some topics in mind, just bring them up. Because I always have lots of requests, you can also sponsor a video and I'll try to make it for you. You can support this channel in any way you want, including sponsoring videos; the links you can use are in the description box.

Okay, let's proceed. These are just a few functions, and you can use them to manipulate your strings and sequences. Whenever you analyze sequences, or write tools to analyze them, what's happening behind the scenes is a combination of simple pieces of code like these; that's what produces the results and statistics we get when analyzing sequence data. So make sure you watch this video, reproduce it, and build your expertise in Python programming and bioinformatics.

Now let's look at this exercise. This is the last thing we are going to do; you can also try to answer the questions yourself, but we are going to answer them all here. Consider this DNA sequence. The questions are: how many nucleotides are present, calculate the GC percentage, how many purines and pyrimidines there are, calculate the percentage of purines, and so on. We'll take these questions one by one; I think it's going to be very helpful. Question one: how many nucleotides are present? Before we even do that, we first need to create this sequence. It's a string, so that means we have to
create the sequence. Let's call it dna and use this one. Now that it's created, I can answer. Question one is how many nucleotides are present; in other words, we are looking at the length of the sequence. For the length we use what we learned, len(dna), and we get our answer: a total of 23 nucleotides.

Now question two: calculate the GC percentage. How do you do it? First you need the GC count, so gc = dna.count("G") + dna.count("C"). You also need the length, length = len(dna), which we already covered. Then percentage_gc = gc / length * 100, and now you can print your GC percentage. We can also change the number of decimal places. If you want two decimal places, you say percentage_gc = round(percentage_gc, 2). Now I can print it and I have my answer: 52.17.

Question number three: how many purines are present? Purines are A and G, so number_of_purines = dna.count("A") + dna.count("G"); I've made the name a bit more descriptive. Now I can print the number of purines: it's 15. Question number four: how many pyrimidines? It's the same principle, but this time we change the bases: number_of_pyrimidines = dna.count("C") + dna.count("T"), because the pyrimidines are C and T. So now I can print the number of pyrimidines.
So I have eight. So this was for question number four. Now let's look at question number five.
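Questions one to four can be gathered into one short script. This sketch uses a hypothetical 23-nucleotide sequence, so its numbers differ from the ones obtained in the video; the method (len, str.count, round) is the same.

```python
# Hypothetical 23-nt sequence (not the one from the video)
dna = "ATGCGTACGTTAGCAGGATCCTA"

# Q1: number of nucleotides
length = len(dna)

# Q2: GC percentage, rounded to two decimal places
gc = dna.count("G") + dna.count("C")
gc_percent = round(gc / length * 100, 2)

# Q3 and Q4: purines are A and G; pyrimidines are C and T
purines = dna.count("A") + dna.count("G")
pyrimidines = dna.count("C") + dna.count("T")

print("Length:", length)             # Length: 23
print("GC %:", gc_percent)           # GC %: 47.83
print("Purines:", purines)           # Purines: 12
print("Pyrimidines:", pyrimidines)   # Pyrimidines: 11
```

str.count is case-sensitive, so if a sequence may contain lowercase letters, normalise it with dna.upper() before counting.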
Question number five is to calculate the percentage of purines in the sequence. I already have the number of purines here, so I'll just copy this code. For the percentage I need the number of purines and the length, which I've already computed, so I'll copy and paste them. Then percentage_of_purines = number_of_purines / length * 100, where the star means times. I'm going to cover arithmetic operations like these in another tutorial; for now I'm giving them to you in bits. Now I can print the percentage of purines. Again, you can round it to two decimal places; I'll leave that to you as your assignment.

Question number six asks which nucleotide is in the tenth position. The nucleotide in the tenth position is dna[9], because counting starts from zero. I can print it, and that's G. You can also count them manually, one, two, three, four, five, six, seven, eight, nine, ten: that's G. But indexing starts from zero, so the index is 9.

Let's look at question number seven: write the nucleotides present from position three to position nine. Here we are looking at a range. The nucleotides from position three to nine are dna[2:9], because we want the third nucleotide through the ninth nucleotide. The third nucleotide has index two. And why do we write nine? The ninth nucleotide has an index of eight, but we have to add one so that it is included, because the end index of a slice is excluded. You can always experiment with these things. It may sound confusing, but when you practice, when you repeat them, when you pick
them up and try them, you make mistakes, you repeat, and you start to get the hang of things. So make sure to practice; after some time you'll understand why some of these things are the way they are. Some of them are rules I can't change: they were set by the creator of Python. It's just like learning a language: there are rules, and if you follow them you'll be fine. Let's run this; I'll print it, and I should have the answer here.

Now question number eight: what command will you use to find which nucleotides are present from position 10 to 20? It's basically the same thing I've done here, so I'll copy the code and modify it to give me 10 to 20. Let me use the actual values: position 10 is index 9, so it becomes dna[9:20]. Now I can print it, and this is what we have: dna[9:20].
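Questions five to eight mostly come down to converting 1-based biological positions into 0-based Python indices. The helper region() below is a hypothetical name introduced here for illustration, and the sequence is again a made-up example:

```python
# Hypothetical 23-nt sequence (not the one from the video)
dna = "ATGCGTACGTTAGCAGGATCCTA"

def region(seq, start, end):
    """Hypothetical helper: nucleotides at 1-based positions start..end, inclusive.

    Position p has index p - 1; the slice end stays at `end` because
    Python stops just before the end index.
    """
    return seq[start - 1:end]

# Q5: purine percentage, rounded to two decimals
purine_percent = round((dna.count("A") + dna.count("G")) / len(dna) * 100, 2)

# Q6: the 10th nucleotide lives at index 9
tenth = dna[9]

# Q7 and Q8: position ranges, equivalent to dna[2:9] and dna[9:20]
pos_3_to_9 = region(dna, 3, 9)
pos_10_to_20 = region(dna, 10, 20)

print(purine_percent, tenth, pos_3_to_9, pos_10_to_20)
# 52.17 T GCGTACG TTAGCAGGATC
```

Wrapping the "minus one at the start, not at the end" rule in one function keeps the off-by-one reasoning in a single place instead of repeating it for every query.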
The 20 here is not included, so we get up to index 19, and index 19 is the 20th position. Now question number nine: does the DNA sequence contain GGAG? We just query it with "GGAG" in dna, and it's True, so it's there. If it's there, how do we locate it? I'll use a print statement so everything is displayed, and dna.find("GGAG") to locate where it starts. It starts from index nine. Let's check: counting zero, one, two, and so on up to nine, that's G, which matches what we have here. If you want the exact range, you can use regular expressions, but I'll cover that in a detailed video in another tutorial, so for now I'll skip it.

Question number 10: how many times does TT occur? You can say dna.count("TT"), and it's two times. Please take note that this approach is sometimes not appropriate. For example, in a run like TTT the matches overlap, and count only counts non-overlapping occurrences, so there are better ways to count, such as regular expressions. If this were a list data type, count would work differently, but for a string data type this approach may sometimes not give you what you expect. Just take note of that; that's why you are learning, so you can build on what you've been taught today. Let's look at question number 11: at which index does GTA begin?
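The searches in questions nine to eleven can be sketched together, including the overlapping-match issue mentioned for TT. This uses Python's built-in re module; the sequence is a hypothetical one chosen to contain a run of three Ts, GTT is just an example motif, and the lookahead pattern (?=...) is one common way (not the only one) to catch overlapping matches:

```python
import re

# Hypothetical sequence chosen to contain a run of three T's
dna = "AGTTTACGGAGTTAC"

# Q9: membership test, then the index of the first match
has_motif = "GGAG" in dna        # True
first_index = dna.find("GGAG")   # 7; find returns -1 if the motif is absent

# Q10: str.count skips overlapping matches, so "TTT" counts as one "TT"
non_overlapping = dna.count("TT")               # 2
# A zero-width lookahead matches at every start position, overlaps included
overlapping = len(re.findall(r"(?=TT)", dna))   # 3

# Q11: every start index of a motif, not only the first one
gtt_starts = [m.start() for m in re.finditer(r"(?=GTT)", dna)]  # [1, 10]

print(has_motif, first_index, non_overlapping, overlapping, gtt_starts)
```

Which count is "right" depends on the question: non-overlapping counts suit tiling a sequence into pieces, while overlapping counts suit asking how many positions start a given motif.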
Of course, here we can use find to do that. I'll say dna.find and paste the motif in, and I get index one. Let's check: zero, one; yes, the motif starts at index one. But notice that if the motif occurs more than once, find only returns the first occurrence, so this approach may not be enough; that's why I recommend regular expressions. In some situations find and count are fine, and in others they may not work appropriately, so look at the situation and choose the appropriate method. With regular expressions, you can be sure of getting all the matches rather than just the first.

Now the last question: does the sequence contain the character Z? This one is simple: "Z" in dna, and it's False. Here we have a DNA sequence; protein sequences use a different alphabet. For DNA you expect A, C, G and T, and if you're dealing with RNA you expect U in place of T. If you see any other character, like Z, that means it's an invalid sequence. So you can also use this to check whether your sequence contains characters that are not supposed to be there.

Okay, this is it for this tutorial. Check the description box: you'll find the link to download this notebook, as well as the other bioinformatics tutorials I make. If you have any other comments, or there are things you think should have been added, put them in the comment section so we can all discuss and learn something new. That'll be all for this tutorial.
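The Z check from the last question extends naturally to validating a whole sequence. A minimal sketch, assuming only the unambiguous letters A, C, G and T are allowed for DNA (real data may also contain N or other IUPAC codes, which this sketch would flag); invalid_characters is a hypothetical helper name:

```python
# Assumption: only the four unambiguous letters count as valid here
VALID_DNA = set("ACGT")
VALID_RNA = set("ACGU")

def invalid_characters(seq, alphabet=VALID_DNA):
    """Hypothetical helper: sorted list of characters in seq outside the alphabet."""
    return sorted(set(seq.upper()) - alphabet)

dna = "ATGCGTACGTTAGCAGGATCCTA"  # hypothetical sequence

print("Z" in dna)                             # False, as in the last question
print(invalid_characters(dna))                # []
print(invalid_characters("ATGZCU"))           # ['U', 'Z']
print(invalid_characters("AUGC", VALID_RNA))  # []
```

Using a set difference checks every distinct character at once, so you learn which unexpected letters are present instead of testing them one by one with in.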