In this video tutorial we're going to cover the data pipeline portion of our trading bot, which uses the AWS cloud to automate the entire process. In the architecture diagram I've outlined the portion we're covering today in the big orange box. I'm going to demonstrate how to use Python code to interact with the Yahoo Finance API and with our DynamoDB table, which will hold our price data, and I'll also show you how to create an AWS user, which will allow you to interact remotely with AWS services. Here's the quick outline of my goals for this video. Number one, we'll start with how to create an AWS user so that we can use boto3, the Python package that lets us connect to remote AWS services. After that, I'll show you how to create the DynamoDB table we'll use to store the prices, and then how to configure the local AWS client. Then we'll get into the code part: we'll discuss the requirements needed to create the data pipeline, go over the actual code, and run it to populate the database and ensure everything is working properly. Finally, we'll cover the unit tests, which help us catch any bugs that might have slipped into the code. That part is also very important, because eventually we'd like to use this code to trade live, so we want it to be stable and robust, and we want bugs, whether introduced now or by later changes, to be caught. OK, after you create your account with Amazon Web Services, you'll be greeted with the console home. You might not have all of these recently visited services, but the service we want here is IAM. It's time to create our user: the identity we'll use to access AWS services remotely, specifically our DynamoDB table.
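As an aside, if you'd rather script this setup than click through the console, a rough boto3 sketch of the same steps looks like this. The user name is just an example, and it assumes you already have admin credentials configured locally:

```python
# Same steps as the console walkthrough: create the user, attach the
# DynamoDB policy, and issue an access key.
DYNAMO_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess"

def create_trading_user(user_name="AWSAlgoTrader"):
    import boto3  # imported here so the constant above is usable without boto3
    iam = boto3.client("iam")
    iam.create_user(UserName=user_name)
    iam.attach_user_policy(UserName=user_name, PolicyArn=DYNAMO_POLICY_ARN)
    # The secret key is only shown once -- save it immediately.
    return iam.create_access_key(UserName=user_name)["AccessKey"]

# Usage (requires admin credentials already configured):
#   creds = create_trading_user()
#   print(creds["AccessKeyId"])
```

Either way, the end result is the same: a user with DynamoDB access plus a key pair.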
To do that, click on IAM; if you don't see it when you log in, you can always search for it in the search bar and go to the IAM service dashboard. Here we're going to create our user. I already have a user here, but we'll create a new one for this project. We give our user a name, AWSAlgoTrader, and click Next. The next step is to set the permissions, and this is one of the benefits of using a cloud provider like AWS in an enterprise setting: you can be very specific about which permissions are allowed for which users and which roles. In our case we don't have a group created at the moment, so we're going to attach policies directly, and what we really need is DynamoDB. Here we have AmazonDynamoDBFullAccess; that's what we want, and the page lists all the different abilities and permissions it grants our user. This is all we need to connect to our DynamoDB table and read from it and write to it, so we click Next. It shows our permission summary; again, DynamoDB is the only service we're using for now, so we create the user. The user is created, but right now we don't have an access key or a secret key, so we have to create those. Click on the user that was just created, and in this corner you'll see Create access key. Click that, choose Command Line Interface, check "I understand", click Next, and create the access key. Now we have our access key and our secret key; make sure you save these credentials and download them, and once that's done, click Done. If you find this content valuable, you can help support the channel by commenting, liking, sharing, and subscribing. If you'd like to collaborate with me on your own personal projects, if you need a fully automated strategy for your personal portfolio, or if you need help developing your Python developer skills, especially as they apply to algorithmic and automated trading, the links are in the description. OK, the next step before we leave the console: we need to create the price table that's going to hold our data, and in this case we're using DynamoDB, so we click that; just remember, if you don't see it, you can always search for it in the search bar. In the left sidebar we have Dashboard and Tables; click Tables. You can see I already have some tables operating, but we'll create a new one for this project. Create the table and give it a table name; we'll call it the price history table. The partition key is going to be the ticker; you can call it symbol, instrument, or whatever you prefer. Next we create the sort key; here we use the timestamp. Both of these are string types. You'll notice we don't have to actually create the columns at this point, because this is NoSQL; we'll effectively create the columns when it's time to populate the table. We can leave everything else as default and create the table, and in a few moments it will be ready for use. OK, now our price history table is ready, so we can click on it and look at a couple of things that will be important once we populate the table with data. We can see that our partition key and sort key are set, and we have this little widget that gives us a live item count; right now it says zero because the table is empty. In the top right corner you'll see Explore table items; once the table is populated, you'll be able to see example items there.
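For reference, the same table can also be created with boto3 instead of the console. This is a sketch; the table name is an assumption (use whatever you named yours), and the billing mode is a choice rather than something the video specifies:

```python
def price_table_spec(table_name="price-history-table"):
    # ticker = partition (HASH) key, timestamp = sort (RANGE) key,
    # both strings ("S"); no other columns are declared up front (NoSQL).
    return {
        "TableName": table_name,
        "KeySchema": [
            {"AttributeName": "ticker", "KeyType": "HASH"},
            {"AttributeName": "timestamp", "KeyType": "RANGE"},
        ],
        "AttributeDefinitions": [
            {"AttributeName": "ticker", "AttributeType": "S"},
            {"AttributeName": "timestamp", "AttributeType": "S"},
        ],
        "BillingMode": "PAY_PER_REQUEST",  # on-demand; pick what suits you
    }

def create_price_table(table_name="price-history-table"):
    import boto3  # lazy import keeps the spec helper usable without AWS deps
    table = boto3.resource("dynamodb").create_table(**price_table_spec(table_name))
    table.wait_until_exists()
    # Deletion protection is a separate update call on the client:
    boto3.client("dynamodb").update_table(
        TableName=table_name, DeletionProtectionEnabled=True
    )
    return table
```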
You'll also be able to run filter queries and sorts if you want to find a unique item or make sure a particular data point made it into the table. Really quickly, I want to cover something I forgot to mention. Obviously the data we collect is going to be really important, and there are protections we can put on our tables to avoid accidental deletion or for recovery purposes. AWS offers a feature called point-in-time recovery; this costs money, so you'd have to decide if it's worth it for you, but the minimum level of protection you can give your table is to turn deletion protection on, so we're going to handle that now. Go to Additional settings and turn it on, and now we're all set. There are a lot of other options, but for our purposes this is good enough to start with. OK, now we're at the portion where we configure our AWS client to utilize boto3 and connect to our DynamoDB table. Before we do that, we need to ensure we've installed the correct packages. I already have these packages installed, but I wanted to show you the commands, and it says all the requested packages are already up to date. We'll then configure the AWS client by running aws configure. It asks for the access key we downloaded before, then the secret key, then the default region name, which for me is us-east-1, and the output format, where I use JSON. Now we're configured, and this is what lets you connect. By default you don't have to pass your access key or secret key in your code; there's a configuration file where the AWS client will look for your credentials and will be able to connect, so I'm
going to demonstrate that now. OK, here I'm going to demo how to use boto3 to connect to our price table, just so you know the configuration works and is complete. First we import boto3, point it at our price table with the correct region name and table name, and print the table; it should return a DynamoDB Table object if everything is good. We can see that it worked, so the connection is working and the configuration is right, and now we'll get into the code. All right, before we get into the code specifically, I want to touch on one of the things I do when coding a project, and the number one thing I like to do is clarify the requirements. With this data pipeline, we need to be able to do the following. One, query the Yahoo Finance API, passing in a start date, an end date, a period, a time frame, things of that nature. Two, format the data for storage: typically the data types we receive from the API are different from the data types that need to be stored in the database, and DynamoDB specifically requires that all numeric data be in Decimal format. Three, query our DynamoDB table for the latest records, and be able to tell it how many records we want at any given time. Four, determine how many days of data we need to grab from the API in order to update the table, which requires a function that compares today's date with the date of the most recent item in our DynamoDB table. Five, update the table with our price data, and, since DynamoDB supports it, bulk insert the new prices; batch inserting or bulk inserting is usually faster.
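Requirement two is worth making concrete: DynamoDB's Python SDK rejects native floats outright, so before writing items you need a small converter along these lines (a sketch; the helper name is mine):

```python
from decimal import Decimal

def to_dynamo_item(record):
    """Convert a record's floats to Decimal for DynamoDB storage.
    Going through str() first avoids binary-float noise such as
    Decimal(1.1) -> Decimal('1.100000000000000088817841970...')."""
    return {
        key: Decimal(str(value)) if isinstance(value, float) else value
        for key, value in record.items()
    }

# e.g. to_dynamo_item({"ticker": "XLF", "close": 39.12})
#   -> {"ticker": "XLF", "close": Decimal("39.12")}
```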
OK, before we get into the details of the code, let's discuss what's happening on a broad scale, from a bird's-eye view. Essentially we have a couple of things going on: we need a structure with methods to communicate with the Yahoo Finance API, and we need a class or structure that handles our database operations. From there, we wrap all of those pieces up into easy-to-understand functions. The first is called warm_up_asset_data, and its primary purpose is to populate our DynamoDB table initially if it has no data. The next function simply updates that price table when we do have data in it. That's a very simple picture of what's happening before we go into the details. Then, just to touch on it briefly, there's the init_dynamodb function, which sets up our resource and connects our code to the DynamoDB service and our price table, so that our functions can access that price table globally. And here we have a code flow diagram so you can understand how everything works before we get into the nitty-gritty of the code itself. As I said earlier, there's the initialization of the price table, which doesn't call any other functions, and then we have our two main functions. The number one function we start with is warm_up_asset_data: it queries our DynamoDB table to see if it has any data, and if it has none, it grabs as much data as it can from the Yahoo Finance API and puts it into the table; you can see and follow the arrows. The next function is update_price_table, and it does exactly what it sounds like: it queries the DynamoDB table, gets the last record, takes the date from that record, compares it to the current date, and if there's a gap, it calls the Yahoo Finance API, downloads the new data, and then puts that
data into the DynamoDB table. Very simple. OK, so now let's get into the nitty-gritty of the code. First things first, we have our imports; the key imports here are boto3-related, plus yfinance, which lets us connect to the Yahoo Finance API. Here is our class, and it has only one method, meaning one function inside it: get_price_history. This is the function that does the heavy lifting of grabbing the data from Yahoo Finance. Having looked at the data beforehand, we don't need two of the columns, so we drop them; the symbols we're using don't have capital gains or stock splits, but you'll want to adjust that if you decide to use regular equities. In this tutorial we're using currencies and ETFs. If we pass in the period parameter, this is how we structure the call and format the result into a DataFrame we can use. If we don't have a start date and no period, we just get the maximum amount of data available. If we have a start date and an end date, we use those to query the data. And if some combination isn't covered here, we raise a ValueError that tries to help you understand what's going on. Earlier I discussed how the data types from our DataFrame, or from the API, are likely different from what DynamoDB requires; here we convert the numeric data into the Decimal type, and that's happening right here. You could make a design choice to abstract this away into a different function, but here I just included it all in one method.
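The branching just described might look roughly like this. It's a sketch: the class and helper names follow the video's descriptions rather than the repo verbatim, and the yfinance call is based on its public Ticker.history API; the Decimal conversion discussed in the requirements would then be applied before storage:

```python
def history_kwargs(period=None, start=None, end=None):
    """Pure helper: decide how to call Yahoo, mirroring the branches above."""
    if period is not None:
        return {"period": period}
    if start is None:
        return {"period": "max"}  # no start date and no period: grab everything
    if end is not None:
        return {"start": start, "end": end}
    raise ValueError("Pass a period, a start AND end date, or neither (for max).")

class YahooFinanceAPI:
    DROP_COLS = ["Dividends", "Stock Splits"]  # not needed for currencies/ETFs

    def get_price_history(self, symbol, period=None, start=None, end=None):
        import yfinance as yf  # lazy import keeps the helper testable offline
        df = yf.Ticker(symbol).history(**history_kwargs(period, start, end))
        return df.drop(columns=self.DROP_COLS, errors="ignore")
```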
Let's close that up. Next we get into our database operations class, which has four methods, so let's touch on those. The method that interacts directly with the database, querying the last record or the last N records, is query_last_n_prices. A couple of things to note: we query based on the ticker, which needs to equal whatever we pass into the function, and we set ScanIndexForward to False, which arranges things so the most recent records come first; that way, if we ask for a limit of five, it brings the five most recent records, with the first record being the newest. Now, there are a couple of quirks I noticed while building this out. The first is that when you pass in n equal to one, it doesn't return a list of dictionaries like it normally would; it returns just that single dictionary. Normally that wouldn't be a problem, but you don't want functions returning different data types; in general that creates a lot of problems and unnecessary complexity, and it makes the code buggy. So if n is one and the result isn't a list, in fact if it's specifically a dictionary, we wrap it in a list. If it's neither a list nor a dictionary, that's a TypeError: something is wrong, and we need to manually intervene to see whether the DynamoDB API has changed or we're getting an unexpected response, and fix it. Next, get_last_n_prices wraps that function: it goes through each symbol, gathers up all the relevant records, and changes the data types as needed. In this case, for example, the timestamp is a string in the DynamoDB table and we need it as a datetime object, and we convert the Decimal numbers back into floats. Then we wrap up all the records and return a DataFrame, which just makes the data easier to work with.
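A sketch of that query method, under the assumptions above; the single-dictionary quirk is handled by a small normalizer so the function always returns a list:

```python
def normalize_items(items):
    """A function shouldn't return two shapes: wrap the lone dict that,
    per the quirk described above, can come back when n == 1."""
    if isinstance(items, dict):
        return [items]
    if isinstance(items, list):
        return items
    # Anything else means the API changed or the response is unexpected:
    raise TypeError(f"Unexpected DynamoDB response type: {type(items)!r}")

def query_last_n_prices(table, ticker, n=1):
    from boto3.dynamodb.conditions import Key  # lazy: helper stays testable
    response = table.query(
        KeyConditionExpression=Key("ticker").eq(ticker),
        ScanIndexForward=False,  # newest first, so Limit=n gives the n most recent
        Limit=n,
    )
    return normalize_items(response["Items"])
```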
The next wrapper function we have is put_price_data_in_table. There are two main parameters to pay attention to, bulk_insert and overwrite, and typically you'd use them together: if one is True they're both True, and if one is False they're both False, but we'll discuss that. Here is where we establish which columns we want to keep in DynamoDB; because it's a NoSQL format, we never had to declare an open column or a high column up front, so we're doing that now. In this tutorial we keep the ticker, the timestamp, and the open, high, low, close, volume, and dividends. The timestamp format is specific, and you can see it here. Then, if we choose bulk_insert or overwrite, we use the DynamoDB batch writer, which is a little faster and puts more items in at once; otherwise we use put_price_items_with_condition. So let's talk about what that function is. put_price_items_with_condition essentially protects our DynamoDB table from inserting data that already exists; that's really all that's happening here. There are a couple of parameters to pay attention to. The ConditionExpression is what we use to manage this process, and you can see we use attribute_not_exists. The timestamp label is actually reserved in AWS, so if you use it directly with attribute_not_exists it will generate an error; you have to use an additional parameter called ExpressionAttributeNames and supply an alias, in this case #ts. Next, assuming the item does not exist, it will be a successful put operation, and we log that information along with the response. What happens if the item does exist? It actually generates an error, but not one we want to treat as fatal; we want to catch that error. So if we get a ClientError and the code is ConditionalCheckFailedException, that's OK: we just log a warning so we know the data already exists and that no overwrite occurred.
Otherwise, if there's any other type of error, it's unexpected, and we do want to raise, stop the program, and understand what's happening. So that's it for our database operations. OK, next let's talk about the warm_up_asset_data function. As I mentioned before, all it does is query our database to see if any data exists; if there's none, it gets the maximum amount of data it can from the Yahoo Finance API for our symbols and puts it into the DynamoDB table using the batch writer. Everything else is edge-case handling. For example, we always want the query to return either a list or None, so if that's not the case, we raise an exception. If the item is not None and not an empty list, we know there's data and the function just returns; otherwise we know the table has no data, and it's time to grab as much data as possible and upload it. Next, our update_price_table function, and as I said before, and as it's properly labeled, it does exactly what it says: it gets the most recent record in our DynamoDB table, grabs the date from it, compares that date with today's date, and if there's a gap, it queries the Yahoo Finance API for the missing price history and uploads it to the table. There's an edge case here: if the gap is equal to one day, we use a period of one day, because using start and end dates one day apart won't return any data. If the gap is larger than one day, we can use the actual dates, and to avoid a start date that overlaps the date already in our table, we add one day to the start date. If you query a different time frame, you'll have to adjust this according to your use case.
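That gap logic can be sketched as a small pure function; the date handling is all that matters here, and the actual fetch and upload happen elsewhere:

```python
from datetime import date, timedelta

def fetch_window(last_stored, today):
    """Decide how to ask Yahoo for the missing days.
    A one-day gap uses period='1d', since adjacent start/end dates
    return no data; a bigger gap uses explicit dates, starting one day
    after the last stored date so we don't re-fetch what we have."""
    gap_days = (today - last_stored).days
    if gap_days <= 0:
        return None  # already up to date
    if gap_days == 1:
        return {"period": "1d"}
    return {
        "start": (last_stored + timedelta(days=1)).isoformat(),
        "end": today.isoformat(),
    }

# e.g. fetch_window(date(2024, 1, 1), date(2024, 1, 5))
#   -> {"start": "2024-01-02", "end": "2024-01-05"}
```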
OK, I want to touch on the init_dynamodb function. This is what connects our code to the AWS service and returns our table so that we can do table operations on it. One thing worth noting: we create a session and then use that session to connect to the DynamoDB resource. Here I specify a profile name. Normally you don't have to do that, especially if there's only one set of credentials, but if you have multiple sets of credentials you'll need to set a profile name; otherwise it just uses the default credentials. That's the only thing I wanted to touch on there. Once it connects, it returns our price table, and then our previously discussed functions can access that table for operations like queries and inserts. OK, now that we've covered the code, what it does, and how it works, we're ready to import the price data from the Yahoo Finance API and populate our DynamoDB table, so let's do that now. OK, we've finally finished; it took about 30 minutes to upload all the data, but we can see it is now populating our DynamoDB table, and up next we'll check and confirm that using the AWS console. Here we are, quickly confirming that we do indeed have our DynamoDB table loaded with data: the item count matches at 15,252 items, we can explore our table items, see examples of the items that were uploaded, and everything is as we'd expect. In the beginning I said I would cover the test suite I created for this data pipeline, but this video is already running a little long, so I'll just point you to it; it could take up an entire video of its own. The tests are there if you want to run them, and all of this can be found on GitHub; just run the tests with Python, and you can see here that everything passed. If you're interested in more details, I could create a video for it; otherwise I'd recommend you explore the GitHub if you're interested in testing.
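To round out the connection piece of part one, the init_dynamodb function described above might look roughly like this; the table, region, and profile names are examples from my setup, not fixed values:

```python
def init_dynamodb(table_name="price-history-table",
                  profile_name=None, region_name="us-east-1"):
    """Create a session, connect to the DynamoDB resource, and return the
    price table. profile_name is only needed when ~/.aws/credentials holds
    multiple credential sets; None falls back to the default profile."""
    import boto3  # lazy import so the module is importable without AWS deps
    session = boto3.Session(profile_name=profile_name, region_name=region_name)
    return session.resource("dynamodb").Table(table_name)

# Usage:
#   table = init_dynamodb()                        # default credentials
#   table = init_dynamodb(profile_name="algobot")  # a named profile (illustrative)
```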
Testing is very important, but I don't want to make this video too much longer. If you enjoyed this content, let me know. OK, welcome to part two of AWS trading. Today we're going to discuss the strategy we'll be implementing; just note that the code and the diagrams for the strategy can be found on GitHub. In part one we covered the data pipeline portion of the AWS trading bot architecture. I demonstrated how to set up the AWS environment, including creating a simple DynamoDB database, essentially a collection of tables, to hold our price and strategy data; then we walked through the data pipeline code in detail, and at the end I showed you how to get the data and populate the database with it. In this part we'll cover the strategy we're going to implement. As a quick aside, I did try to make the architecture relatively modular, so if you have a different data API or you want to use a different database, you can do that; I just wanted to make something relatively accessible for most people who might stumble across this. In this example we're using the Yahoo Finance API, and again the database is DynamoDB. So, the strategy. At the end of part one I needed to choose a strategy, and I wanted to do something a little more interesting than a moving average crossover, which is essentially the hello world of algo trading. So I decided on a market-neutral, long-only ETF strategy; it sounded interesting to implement and to do some basic exploration around. It does have some quirks, though: since it's long-only and we need it to be market neutral, we initiate the synthetic short positions by purchasing inverse ETFs. So overall it's a pairs trading strategy. Each pair is equally weighted, and within each pair we allocate market-neutral weights, accounting for the leverage of the inverse ETF. Just keep in mind that most inverse ETFs are in
fact leveraged. So if we were using a 3x leveraged inverse ETF and we had a $100 portfolio, we would invest $75 in the unlevered leg, XLF in this example, and $25 in FAZ, the leveraged inverse leg: a 3:1 ratio of unlevered to levered. Now we'll explore a little of the theory about how we expect to make money, which I also did in the exploratory analysis. We have to consider market direction and volatility specifically, and volatility is especially important because of the leveraged inverse leg; we'll talk about why in a second. In theory, we expect the strategy to perform well in moderately to strongly trending markets, and the strategy will get cooked in high-volatility, range-bound markets. This is because of something called volatility decay, or beta slippage (there are other terms you might come across): essentially, for a given sequence of returns, there will be a gap between the underlying theoretical leveraged return and the actual return of the leveraged inverse ETF. This results from the fact that the inverse ETF has to rebalance daily to maintain the proper leverage, and it does that by literally buying high and selling low: it has to buy in the direction of an increase and sell in the direction of a decrease. I detail this concept in a previous blog post, which you can click through to find, where I go through examples of what's actually happening under the hood, why this phenomenon occurs, and why most people misunderstand it. So here's a little matrix of how we expect the strategy to perform based on market direction and volatility, with volatility split into three regimes: low, medium, and high. And here's a more detailed exploration of that idea using XLF and FAZ, where we talk through what happens in each leg given the volatility and direction.
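To make the weighting and the decay concrete, here's a small numeric sketch; the figures are illustrative, and the 3x leverage matches the XLF example above:

```python
def market_neutral_weights(leverage):
    """Split capital so net exposure is zero: with a 3x inverse leg,
    $100 becomes $75 unlevered long and $25 in the inverse fund
    (75 long vs 25 * 3 = 75 short)."""
    inverse_weight = 1.0 / (1.0 + leverage)
    return 1.0 - inverse_weight, inverse_weight

def inverse_etf_decay(daily_returns, leverage=3.0):
    """Compound a daily-rebalanced -leverage fund against its underlying.
    Daily rebalancing 'buys high and sells low', so a choppy, flat
    sequence erodes the inverse fund even when the underlying goes nowhere."""
    underlying = fund = 1.0
    for r in daily_returns:
        underlying *= 1.0 + r
        fund *= 1.0 - leverage * r
    return underlying - 1.0, fund - 1.0

# A range-bound sequence: +5%, -5%, +5%, -5%.
# The underlying ends roughly -0.5%, so a static -3x would be about +1.5%,
# but the daily-rebalanced inverse fund ends roughly -4.4%: volatility decay.
```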
You can see that sideways, again because of the severe decay losses, we do not expect the leveraged leg, FAZ, to out-gain the losses in XLF, and XLF is basically going to oscillate around zero, so the strategy is going to lose a lot: we don't gain anything on the upside, we don't gain anything on the downside, and we just keep losing because of the decay in the leveraged leg. When it's bearish with moderate to low volatility, the outcome depends very much on the sequence of returns: slightly negative, break-even, or just negative. And here we are with the exploratory analysis; we can look at it a little bigger here. These are just the raw returns of each pair had you invested in it and held, and we can see that most of the pairs actually end up positive over time, except for XLE/ERY. Here we're looking at the regimes, the volatility regime relative to the correlation regime, where the correlation is between the two legs of the pair: for QQQ/SQQQ, the correlation regime is how correlated QQQ is with SQQQ, and the volatility regime is how volatile the pair is. So you can see we have low volatility with low correlation, low volatility with medium correlation, and so on. What I think is interesting is that you would expect high volatility with the lowest correlation to have the most diverse, or widest, range of returns, but it doesn't, and not for all of the pairs. In this case, for example, high volatility with high correlation has the highest dispersion, but when we look at SMH/SOXS, high volatility with low correlation has the highest dispersion of returns. TLT/TBT shows high volatility with high correlation again, as does XLE/ERY, but then XLF/FAZ follows the high-volatility, low-correlation dispersion pattern again. In the next row we look at the Sharpe
ratios relative to the two regimes. Without summarizing these into a table, I don't see any obvious patterns. Obviously XLE/ERY has the worst performance during medium volatility across all correlation regimes; XLF/FAZ in low volatility with low correlation is really bad, as is medium volatility with low correlation; SMH/SOXS tends to do well regardless of regime, which speaks to an overall market or economic trend that isn't really beholden to these typical regime changes; and for TLT/TBT, we can see that high volatility with low and medium correlation is pretty poor performance, as is medium volatility with high correlation. I say all that to say this: the strategy could benefit from individual pair filters based on regime. The last exploration is just the annualized Sharpe ratio by year, and we can see it's pretty diverse, so we'd expect a portfolio of these pairs to do pretty decently, and we'll see that in a second. I also ran a custom backtest for the strategy, and what surprised me is that the overall strategy with these pairs is decently correlated with SPY; that honestly surprised me. But volatility is lower, and the Sharpe ratio, I think, is similar; I don't remember what SPY's Sharpe ratio is over this time period, but for the portfolio it was about 0.8, which is pretty decent. Max drawdown, we can see, is almost 20%, and it really underperformed from 2022 into mid-2023. We can look at the returns, the rolling Sharpe ratio, and the trading costs based on when they occurred; in this backtest we model slippage as well as commissions, but that can be modified, and again the code is in the GitHub. I don't know why this is so hard to see; the opacity might not be dark enough. Then we have our cumulative cost and the amount of cost drag in the portfolio, which is basically the difference between the gross equity curve and the net equity curve, and it's a function of
our trading activity. Again, the correlation between the strategy and SPY was roughly between 0.6 and 0.7. Some things to improve, if you want to take this strategy and work on it yourself, or what I would probably look into researching further: the portfolio weighting scheme is extremely basic, we're using equal weights, and it may improve by incorporating volatility and momentum into the weighting. As we saw in the exploratory analysis, the performance of the pairs varies with volatility as well as correlation, so there might be a good use for volatility regime filters, both on the portfolio as a whole and on each pair individually. One thing to note is that the strategy is very sensitive to the rebalance frequency: what I noticed was that the more frequent the rebalance, the worse the performance. The way I set up the backtest, an individual pair would be rebalanced if the pair's portfolio weight deviated beyond some threshold. I don't want to go much deeper than we already have; again, this is just a toy strategy, implemented to keep things interesting, and we're going to go through how to actually implement it using AWS Lambda in the next YouTube video and the next blog post. If you want to join the conversation, join the Discord, and if you want to reach out to me directly through email, you can do that as well. Other than that, thank you for your time, and if you have any other questions, feel free to reach out or drop a comment.