I'm I've got a a deck of 60 slides I'm GNA go through them quickly so if you have a portable device you might want to grab a copy here and because you might not see all the details otherwise so and actually I have a little clicker right here if I can turn it on okay let talking about sqli I wanted to give this a brief introduction uh about what sqlite is what it is not and why you should be using it I really didn't know what to title The Talk um uh Dustin put it this you know the database at the edge of your network so I put that there but I might also title The Talk uh misunderstood and underutilized and mispronounced I don't care how you pronounce it Andrew you call it anything you want uh you might also call it why aren't you using using this more that's what this talk is about so you might have heard of sqli you you're familiar with SQL you know there's a bazillion SQL database engines out there and a lot of people they come up the idea SQL well it's just a little toy version of MySQL or postgress how many people here are using an SQL database of some sort show of hands MySQL raise your hand post grass SQL Server Oracle uh what am I missing inic yeah okay nobody so sqi is not it it you people people look at it and say well SQ it's just a baby version of these that's not its purpose it's it's it's trying to solve a very different problem it happens to speak the same language as these other guys but it's really has a very different Focus so rather than being in competition with these others we say that sqlite is in addition to these other guys you use them both and that's what this talk is really going to try and drive home um sqlite is designed for storing local data the stuff that's nearby your application whereas the other client server database engines are designed to store a global repository for the entire Enterprise um yeah the other guys are the Enterprise data Depot this is the big pot in the middle of the room where all of your data goes whereas sqlite is more geared toward an application file format I'm going to talk more about that as we move forward but first a quick introduction about what is sqli you know these other database engines can I walk on the floor can you still see me because I'm going to fall off the stage if I don't get off of it the um the other you know MySQL postgress these other the big applications they client server they they run as a separate process and then your application connects to them using some sort of interprocess Comm communication usually tcpip but that that's a detail and and but SQL is different it's a library that you statically or that you link in with your application it runs in the same address space as the rest of your application it writes directly to the disk a single file on disk whereas if You' set up a MySQL installation it's got some magic folder somewhere on your computer where all of your data goes who can tell me the name of that folder anybody didn't think so you don't even know where the data is stored you know there's really nothing wrong with that but SQ taking a different view I mean you you would you store your data in a specific file um sqlite doesn't have any configuration files when you're setting up MySQL or postgress after you do the install you got to open up the configuration file and tune it and tweak it and and you know add user permissions whatnot there's none of that you've got automatic recovery if you have a crash if you have a power loss there's nothing you have to do to recover the uh the the database to roll back the previous transaction it's completely automatic um it's portable code crossplatform um well- defined file format the the file format is stable other database engines if you upgrade from postgress 8.6 to postgress 8.7 you know what you have to do you have to dump out all of your data into a portable format install the new server then reimport all your data they change file formats when you do an upgrade like that with SQL the file format is fixed and stable it's documented we've got a document online which is sufficient for somebody else to come along and write a competing implementation if they want to so um oh I mentioned that site is in the public domain there's no licensing involved so you're free to use it for anything you want there's the website there I wrote SQ about 15 years ago I was having a problem I was I was doing a problem kind of one of these things that you Skookum does you know where it's a contract programming thing and uh I had to read a bunch of data out of an inic database and sometimes that inic database when they would reboot the machine the informix database server would not come up and so they'd BR launch my application and my application would print a dialogue box that looked like that because that's what dialog boxes looked like 15 years ago and and and so as my program that was printing out that dialogue box it couldn't connect to the database engine because it didn't come up and so even though I had absolutely nothing to do with the database I got the support call because it was my application that painted the dialog box and so that was that was annoying and I'm thinking is read only I don't have any of these transactional things all I just need to is read data out of this thing why can't I just read it out of a disk file so I wrote this little program sqlite which did just that it stored all the data in a disk file and then you I could read it in and I could boot up and then if the machine were healthy enough to actually you know allow the user to double click on the icon from for my application then it was going to be healthy enough to read the database and I and that saved me a lot of early morning support calls so that was its origin but it I put it out there and I've written a lot of Open Source software before and I put it on my website and it would get five to 10 downloads per year you know and I figured this was just another one of those and I put it out there but it really got popular really fast and before long companies were contacting me and say hey can you support this for us and I thought sure absolutely I can let me get back with you and then it's like how much do I charge for that I mean how does this work and I didn't realize you could make money off of Open Source I thought I was just putting it out there so but um it's been picked up now and it's in just about everything all the devices you own it's in your smartphone it's in every smartphone it's in uh your tablet it's in your game console it's in your camera it's probably in this camera right here that we're looking at it's in um oh it's in the a onics of the Airbus a350 um it's in all kinds of software it's in Dropbox uh all the the the big web browsers not IE it's uh but all the others um but it might be in the next because uh Edge May the name oh is that the new one it could be because uh Microsoft has now announced as of uh two weeks ago that SQ is now built into Windows 10 it's a core component used by the operating system Windows 10 will not boot without SQ light they they contacted us and we had to produce a special version of the dll that was that used um standard call rather than C deal just for them okay so Skype QuickBooks iTunes Wikipedia uh there's lots of things out there using this it's it's really uh amazing so it's found in every every phone every iPhone every Android phone every Mac every Firefox Chrome and Safari browser all these other software it's built into Python and PHP apparently it's also built into Ruby well Ruby on Rails anyway most TV sets have it uh it's in your if you got a a newer car it's in the you know all the new cars have the multimedia entertainment center that's pretty much all powered by sqlite and millions of other applications we did a a did a question online the other day was talking with the community what is the most deployed soft Ware library in the world and you know is there anything that that has more deployments out there right now than sqlite and we came up maybe maybe Z the compression Library I kind of think that's number one but other people saying well no that's not number one because with zve there's a single shared Library that's on the machine and everybody uses that one shared library with sqlite everybody statically links their own copy so the machine has multiple copies of SQ life but only one copy of Z I don't know if that's a good argument or not but it's Poss possible that SQ is the single most deployed software in the world today uh if certainly not it's it's within the top five so here's a fun chart here's um this is a Apple stock price from when they went public through about 2012 and this is before their split their 71 split and I I I cut it off here because um now this would going up to like 900 or something split adjusted but it's really interesting to look at this because it starts down here and it was really kind of a penny stop for like 20 years there was a little this is the tech bubble here it was a kind of penny stock right here and then suddenly they just sort of took off like that you see it's they they're flat for a long long time and then suddenly they really took off I'm just saying I'm I'm just saying uh this absolutely true um um we've got a lot of fans uh I saw a I saw a tweet I think it was Wednesday and I I thought I'll throw this in the deck for Skookum U here here's I don't know who Eric bus boom is he says I proposed to my wife in sqli it's that awesome I don't know Eric I'm sure he's a nice guy but he might be taking this a little bit too far uh so um anyway s you like by the Numbers it's one file of C code one file now we don't actually edit that one file because it's a big file it's 160 lines of code we've actually got like a hundred files that we edit and then then we have a make process that builds this one file and actually there's a lot of code in there that's machine generated as well from high LEL specs that we give it but there's one file so if you want to import sqli into your C application or your C++ application you just take this one file and you drop it in with all the others and compile it and then you've got a full-blown relationship database built into your application um there's three developers myself Dan Kennedy from Brisbane and and Joe matachin from Seattle we've been working on it for 15 years uh 100% modified condition decision test coverage this is um this is a an FAA standard we we follow um do 178b they're actually on this the do178b is the specification you use for safety critical uh Aviation software actually now it's do 178c but I haven't paid the fee to get my upgrade yet so but they're about the same so there very detailed 100% Branch test cover or this is basically you have to have test coverage such that every every Branch operation at the machine code level goes in each Direction at least once and really you're also supposed to show that every Branch instruction affects the output as well so we've got that and more we've got we're we're just fanatical about testing because when you when you're on everything when you run the world the least little bug causes major pain so we have to be very very careful that we get it right so we have Aviation grade testing there's about 22,000 or 23,000 Branch instructions at the machine code level which gives you a measure of how big it is um um 94,000 Source lines of code that exclusive of comments if you throw in the comments it goes up to 160,000 lines of code so we do have a lot of comments and they're not boilerplate comments either they're actually real comments that tell you what the code intends to do they're useful comments and the idea here is that we want sqlite to last a long time we want sqlite to be readable and understandable by programmers who are not yet born so it's very well documented that way uh less than a megabyte compiled millions of applications use it um I just ran a a test run the other day and it came and it did 177 million different tests all of which have to pass and that's on one platform that's uh that's actually one of three test Suites on a single platform we actually do that on multiple platforms where you know various different flavors of Unix uh Windows VX Works open open BSD Mac um and we also do it on different processors we've got a 64-bit spark which is Big Indian we've got a 32bit power PC we do arm we do of course we do Intel we do all these things to make sure everything working real well and then last year alone 2014 of course precise numbers are difficult to know but U if you add up all the things that shipped with SQ we've got over two billion new installs last year alone so it's kind of a popular program uh features uh it's got a like I said it's got a well-defined crossplatform file format power safe transactions that means that if you're in the middle of a transaction and and your machine suddenly loses power uh either the transaction goes in completely or not at all when the power comes back on you don't have half of the stuff staved it's either all or nothing that's a very important property we' got the usual triggers and foreign Keys we have full text indexes archery indexes for geospatial queries we've got Advanced features like recursive Common Table expressions and partial indices that MySQL does not have does my SQL have R trees does anybody know MySQL people don't know okay anyway uh the limits a lot of people think well SQL light it's got light in the name be little bits of data actually we we've got very high limits uh the one thing we are limited on is just we only like one writer at a time lots of different processes can connect to the database at once but they have to take turns writing okay one Rider at a time at unlimited number of readers we can do joints up to 64 way strings and blobs up to a gigabyte uh and and we and and and that's fine storing large strings and blobs and SQ database works fine a lot of other database vendors tell you uh oh if you got a big blob or if you got a big string store it in a separate file and then just store the file name in the database sqlite is not that way I've got a slide on that coming up uh databases can be up to 140 terabytes in size we've never actually tested that because that's hard and and and and the database is a single file as well so I don't have a computer that supports a file system that'll let you have 140 terabyte files okay that that's big um and you can have up to and and and the same database connection can talk up to 125 database files at the same time each of those 140 terabytes do the math okay um one thing if you've done a lot of SQL before one thing that throws people that SQL it's is more in the flavor of JavaScript or python uh it's it's it's it's kind of kind of Loosey Goosey with the typing system you can give types to the various columns but sqlite treats them more as suggestions rather than as hard requirements so if you if you if you give a column that's a small lint and you put a a mass of energer in there sqi is perfectly cool with that and it'll save the big integer that's not a problem or you can say I've got a varar field with for five characters and it'll happenly take a million character string and put it there and save all million characters okay we don't have a date type field a date type type you can store your dates as either strings or Julian day numbers or Unix timestamps and our datetime functions know how to interpret all three of those and we don't have a Boolean type just use an integer integer one is true integer zero is false just like in C actually in nonzero is true okay here's the thing I was talking about before if you can see this uh it's actually faster to read content out of a SQL database than it is to read it directly off the disk here's a this Adobe discovered this it surprised us the developers Adobe discovered this when they were doing Lightroom any photographers here you know the Lightroom product uh so that's it it uses SQL as its file for and they this and they were running well do we store our our do we store our um uh uh not not our raw files uh the um what's the word that just I lost the word thumbnails do I store our thumbnails as separate files and then put the file name in the database or do we just store the thumbnail directly in the database and so they ran the numbers and they found that um it's faster to store thumbnails directly in the database what these numbers mean this is the ratio of how long or how how how fast uh information is read out of the database divided by how fast it's read out of off directly off the dis and so you can see for smaller blobs it's up it's over two times faster to read it out of the database than it is to read off the disk how can that be since the database is on disk well when you're reading a file off of dis you've got to first open the file and the cost of doing that extra open operation is enough that it gives SQ The Edge because SQ only has to open the database once whereas if you're reading a bunch of things out of separate files you have to rope in each file separately so um now as as the blobs get bigger it becomes faster to stream it off of disk but for smaller blobs put them in the database and actually you know this is not a a terrible penalty we routinely see people put multi megabyte blobs in an Esco database without a problem um very simple C++ interface uh there's a a detailed here there's actually hundreds of these API calls but this is the core seven that you need to know about open a database Connection by calling SQL I3 open it gives you a an object which is your connection to the database and this is the destructor for that object so that's two down then you want to run some SQL you first have to prepare that SQL you give it an SQL string and it produces what's called a prepared statement this is like the compiler with any SQL database any SQL think of every SQL statement as a as a miniature program it's it's and you have to first compile that program and then you run the program so prepare is the compiler and then you've got maybe got some un bind is like telling you what your what your command line arguments are to the program and then you can run the program here and it it runs up to the first breako which is the first breakpoint is going to be when you've got data that becomes available and so it stops at the first break point and then you call column and it gives you the the the the data that's in that row and then you keep running step until you get to the end and then you can e reset or finalize reset rewinds the program back to the beginning so they can run it over again possibly giving it new input parameters or finalizes the destructor and these are Big classes of of routines like there's a bazillion sq3 binds for different data types and there's a Bazil these for different data types 32bit integer 64 bit integer real string blob strings in utf8 utf16 little Indian big Indian native whatever there's a bunch of these so that's the basic introduction so if you are going to program this is what you need to know uh we also have in sqlite a bunch of extensions since it's built into your application you're not limited to the set of SQL functions that your vendor provides you can create your own own SQL functions written in C or whatever your programming language of choice is and then have sqlite call them as necessary so you've got SQL embedded in your application you've got some other specialized functions in your application you can set it up so that your query can call back to your application to run these functions uh we also have virtual tables a virtual table is like um it it it's not actually storing the table on disk it's reaching back into your application a virtual table is a an object and it has things like you know give me the next row whatever and it it can look at at a complex internal State information of your application and use that as part of a query part of a join say uh application toine coating sequences you can plug in the file system and memory allocators at runtime there's command line tools that come with sqli from the website there's the the SQ 3.exe command line tool everybody here wait who programs on Windows who's primary Desktop Windows primary Desktop Windows primary primary desktop Mac primary desktop Linux okay well all right so for on Windows it's SQ 3.exe on Mac and all all Unix you you course leave off the exe this is a commandline tool that allows you to access an sqlite database so you might want to say pull your contact information off of your iPhone put on your laptop and see what actually what information apple is remembering about you might be an interesting thing to to to look to look into this is the tool that you would use and this is a command line tool uh I won't go into the details it basically you just say SQL like 3 the name of the database file and then you start giving it SQL but there's these also these dot commands that you can do for things like tables tell you the names of all the tables in the database schema will give you the complete schema listing for the database uh you can import a CSV file into the database as a table uh you can read a script and so forth there's a a bazillion of these doc commands to help you do some things there's also a lot of third party tools for interacting with sqlite databases some people really insist on having a guey and there's lots of those out there this is the one we ship with it new we also U have this tool called sq3 analyzer if you have some big databases what this does is it goes through and it shows you how much space is being taken up by each table and it's a very insightful thing because what we find is that programmers don't have a good intuitive sense of where what what part of their data structure is using the most space on disk if you ask them say what what what which which of your tables is the biggest they'll give you one answer and then you look and it's really a different one it's kind of the same way with profiling you know most people have this sense of what part of the code is going to run running slow in terms that that part's really fast there some other part over here that they weren't worried about that's run running slow same kind of thing esite 3 analyzer will show you where which tables are using a lot of data and that's very important we also have a tool just come out called SQL diff that takes two sqlite databases and shows you the differences between them Dan had a lot of fun when he was developing this he um he took the um the cookies database from Firefox he made a snapshot of it and then he went to a website and then he made another snapshot of the database and diffed them just to see what cookies were coming up and visiting various data uh websites very very insightful he was shocked he was shocked it was a very interesting thing to do and there's some command line options there to give you summary statistics and so forth these are these are fun tools to know about okay and if you want to build SQ from sources you go to the website you download the tarball or the zip archive it unpacks on Unix you type configure and make on if you're running on Windows nobody here's much using Windows so I'll skip that step but uh do make SQL like 3. C it builds the source file do make test it runs a lot of tests and well you know how that goes okay go back to some conceptual things if you think about the way uh an SQL database or any database engine works it's actually acting sort of as a a filter for data you've got your application over here you've got data on dis the application wants to get some data a lot of stuff is coming off the dis and the the the database engine says well you know well this doesn't this doesn't meet the constraints that I'm after I'm not going to give that back to the application it's filtering it's looking for the information that you want but not everything that comes off the disc is actually usable by the application so the database engine is a filter so now think about this if your app application is separated from your data by a network a slow Network do you want that Network here or do you want that Network here obviously you want the network here you want the least amount of data possible going across the network because that's the slow link so the key to any database engine is that you want the database engine itself on the same machine with the data now I bring this up because um in most you know when mql postgress and these sorts of things that's exactly the way it is the database engine sits on the same machine with the data it's a client server architecture and so this is the network connection right here you can you can actually put sqlite databases on a remote machine with a remote file system but with sqlite the database engine lives inside the application and so what's going over the network is now the the fat pipe that will work it g in the sense that it gives you the correct answer but it's not bandwidth optimal because you're if if if if your if your if your data is off on another machine you really ought to be thinking about it using a client server database engine so here's your decision Matrix so you've got a problem and you want to know well okay what database engine am I going to use to solve this problem and you think all right one is my data remote from my application do I have to across a network to get to my data if the answer to that question is yes then you want to go with my kill postgress orle SQL Server client server database okay do I have big data and by Big Data I mean data that's too big to fit in a single file okay if it won't if if if the file is too it won't fit in a single file that's big in that case you need to go with a client server database engine do I have concurrent writers multiple applications that need to be writing at the same time and they can't take turns they need they really need to be doing it at the same time if so then you need to go with a client server database the client server databases are able to deal with that because they have a server there that can coordinate the access from multiple applications coming at the same time sqlite doesn't have that because there's no Central server to coordinate things so it doesn't do that very well so but those are the only three reasons to use a client server database those are important reasons but if none of those three things apply then you really ought to be using SQL L and instead so I've heard a lot of people here doing Ruby on Rails and apparently Ruby on Rails their default is to use sqli at least for testing and that makes perfect sense because when you're testing you don't have remote data it's all right there on your machine and you don't have big data because you're testing and you don't have concurrent writers because I mean how fast can you issue web requests to your Ruby on rail server you it's only your your rate limited so that makes perfect sense there but then when you go to deploy and suddenly you have thousands or millions of people hitting on your website well then your data might get a little bit bigger then you might have multiple processes running with the central uh dis and then you might have multiple rights happening at the same time in that case You' want to switch over to a different database engine okay but anyway here's the checklist you've all done this checklist if you built a product involving databases but some of you have done this wrong this is how this is the error that you were making this is the decision checklist fail okay the first three parts are the same you no remote data no big data no concurrent writers okay I don't need to use one of those big client server database ISS you know what I'm just going to open a file using F open or whatever the Primitive is in your programming language of choice and write a bunch of stuff into a file I'm going to make it I'm going to put I'm going to put in um Json or XML or I'm just going to make something up on myself because I know better than all those other other people I'm going to put it in a file or maybe not just one file I'm going to create a directory and I'm going to put lots of little files in here a pile of files database these are all over the place this is not the way to do it okay there's a you give up so much when you do a pile of files database and what do you give up well before I get to that of course you see and and really SQ light is designed for this specific purpose and it's it's not competing against the client server database engines it's competing with EP open and here's the advantages you get when you're using sqlite instead of opening a file and writing stuff there yourself and inventing your own format you get power safe Atomic transactions you get a crossplatform database file you can move it 64-bit 32bit big Indian little it doesn't matter uh utf8 utf16 it doesn't matter it's crossplatform uh you get higher performance remember we can read out of a database faster than we can can read off a disc uh you get automatic concurrency control if two different applications open the same file at the same time they are not going to bump into each other SQ light will make sure of that and you get incremental updating if you make a change what time is it I'm just2 12:42 I'm way behind I need to go faster I'm gonna I'm I'm not not going to go down this Rabbit Trail um incremental updating but the most important thing about using it this way is that it uses SQL SQL is the secret sauce so many people I run into think well SQL is just the wire protocol it's this goofy language we have to use so that we can talk to our database engines it's really old it's like written in the 80s oh it's older than I am you know this sort of thing and um it's not but SQL is really really cool it's very very powerful um there are two Unstoppable ideas behind SQL two Unstoppable ideas and this is really key uh the first one is it gives you very powerful data abstraction and let's talk about that a little bit more here's who has read the mythical man month by Fred Brooks raise your hand yes yes um H in the mythical 40 years old 40 years ago he wrote this it is still topical today it's the bestselling computer science text of all time still topical he says representation is the essence of computer programming show me your flowcharts and conceal your tables and I shall be continue to be mystified show me your tables and I usually won't need your flowcharts they'll be obvious isn't that amazing he's talking about flowcharts you know when he wrote this back in the early 70s a flowchart was state-of-the-art in program construction it was it was it was the cats meow everything was done that was Topline now I guess the state-ofthe-art is I don't know objectory and it's kind of a little stale now what's the trendy thing now but you know 40 years from now whatever whatever 40 years from now whatever you think is the new and trendy solution to all your problems 40 years from now people are going to think about oh back there they were they were using objectoriented program can you believe how primitive they were I guarantee that's going to happen but you know his his Insight is is still Key by by having the schema by defining a schema you're really defining what your data looks like and once you have a schema in place writing the code becomes really simple now Rob Pike said something very similar really the same thing data dominates if you chose the right data structures and organize things well algorithms will almost always be self-evident data structures not algorithms are Central to programming same idea represented in different words choose your data structures carefully here is Lind tals he says bad programmers worry about code good programmers worry about data structures and their relationships and he's absolutely right I'm going to pick on him he said this on the get mailing list of all places and I'm irony of ironies we wish that he would follow his own advice because here are some things that the git data structure does not do with you all who get get users raise your hand get users raise everybody uses get pretty much yeah um with Git if you know a particular checkin there's no way to find the descendants of that checkin at least no way that I've ever found that's a show stopper for me I I can't I can't live with that I I need to be able to go back and and find what happened next and get doesn't show that it doesn't show the the check-ins over a particular period of time all the time we're out there looking we find some bug and we're trying to understand why we why this bug went in or why some change happened or what was going on in that area what was going what were we doing in other branches at the same time what were we thinking back then a couple years ago get doesn't give you that capability on and on shortage of time I'll skip over the the rest of these but the reason for this if you think about it this is the kind of the underlying git data structure these are the commit objects now if you use git apparently all the git tutorials start with a detailed description of the underlying data structures and then move upwards to the user interface which tells you a little bit of something about get but this is the the commit objects so this is the first commit right here and then there was a couple of other check-ins onto that and this it it it branched right here so each one of the commits points to its parent and then so there's some more check-ins in here and B we did a merge so this Branch merged into this branch and so B points to both of its parents so you've got these pointers backwards but what this means is that if I have this check in right here at C I can point backwards in time and find out what its ancestors are but I don't have any way of moving forward and you can't go back and add a pointer going the other way to the data structure because if you added something to the the commit record that would change its hash and change its identifier you can't do that okay there's a simple solution to this problem and that's to create a side table that keeps track of this stuff this is an auxiliary table all of this stays the same it's just every time you add a new commit object you also make an entry over here in this lineage table and it's got the parent the child and the rank I'll explain the rank in just a second so for example we've got an entry here that says the parent is B and the child is a right there and we got an entry that says the parent is C and the child is B right there and it's got a rank of and here's a parent is D and the child is B and its rank is one so the rank is showing you zero if it's the primary parent it's one if it's a secondary two if it's a tertiary and so forth and so this little bit of data here exactly corresponds to that and this is still your primary data this is just sort of a a little auxiliary data it can be reconstructed anytime you want so anytime you need to reconstruct this you can and and it be keeping up to date every time you just accept a new object you just make make a new entry in the table well given this now you can go back and you can do queries that will give you the um lineage in both directions you can go forwards in time and backwards in time and this completely solves the git problem of being unable to find your descendants completely solves it here's an example of this is example code I won't walk you through the details this is a procedure for finding all of the ancestors of a checkin using a data structure that the schema is a little different the things to note here are it really this this procedure consists of a single function call to something that's going to run a bunch of SQL and this SQL is a single SQL statement it's using a recursive Common Table expression to walk through that tree and find the first n minus one ancestors you can understand this later key points to remember are that what there's no Loops in this code there are no conditions there's no there's no algorithm specified I'm I'm leading up to the second Point there's no algorithm specified in that um we're just talk we're we're we're telling the the databas engine this is what we want you figure out how to get it I don't want I'm busy trying to solve my customer's problem you figure out how we want it and the database engine is going to figure out the algorithm involved there's no Loops no variables nothing you just say this is the data I want give it to me and then the database engine is going to figure that out for you this came up uh looks like Tuesday I had this problem I had the slide said you know this is this is a good example where this came up we had the need Tuesday to find all of the check-ins which added a new file so there was a there was merge check-ins that added a new file so we had the main line and we were merging in some other branches and sometimes you'll merge in a branch and that new Branch will bring in extra files with it you know what I mean so we need to Define that how would you find that how would you figure that out and get we need to find all instances where that happened how would you find that and get well I don't know I a bunch of python code I I guess I don't really know we we just did this wrong one query and it it outputed it gave us a bunch of hyperlinks then we could just click on those hyperlinks and and look at every instance where we had files that were added by Emer so if you have these sorts of data structures at hand you can do some really powerful things uh which is the next point it's a declarative language uh SQL with the whole point behind SQL is that you're asking the machine you're asking the machine uh a question rather telling rather than telling the machine what to do that's a it's very difficult to get your mind around but once you understand and get SQL at that level it becomes a very very very powerful tool so here's uh Andrew Lloyd at Berlin buzzwords talking about at Google he was talking he was giving a talk on um spanner which is Google's uh uh global scale SQL database engine which is internal proprietary nobody's ever seen it if you're not a Google employee and this one talk or a couple of talks and one paper is all we know about it somehow they've they've they've used atomic clocks to get by the cap theorem it's all very mysterious but whatever um he says uh people realize it's incredibly powerful just to push the semantics of your query down into the storage engine and let it figure out what to do so this is this is why SQL is so so cool because you're you're pushing off work to the database okay so all that said so what is SQL L good for uh it actually works out pretty well for low volume web uh websites the sqlite database or the sqlite website is itself run by sqlite of course we eat our own dog food we get only about 400,000 hits per day that's not shabby though you know that's okay and and but a lot of that static content but a lot of it's Dynamic and when the dynamic pages are about 200 SQL statements each and uh but we're running on 124 slice of a server and our load average is rarely above 0.1 so there's got lots of room to grow here so for smaller websites sqlite works great the rule of thumb is if when you need to go to multiple machines when you're ready to go to two servers to run your website if that's the point you need to switch over to a client server database uh data analysis SQ is great for this you've got a bunch of experimental data it's and some weird format you can pull it into an sqlite database you can do lots of interesting queries on it and what's really cool about this though is it's all in a single file so that if you're a researcher and you've collected a lot of instrumentation data and you're doing some analys Anis on it for a paper it's all in one file and then you can email that file to a colleague as an attachment now you could do the same analysis in any other SQL database but you cannot email a mySQL database to a colleague as an attachment at least not easily um very popular in bioinformatics so another big use for SQ light is is an application file format uh do people still write desktop applications anymore anybody here do desktop applications one everybody now is doing web stuff aren't you okay well but you know you still have file formats and here's an uh here's some examples um Adobe Lightroom when you you know when you save your stuff from the Adobe Lightroom it goes directly into an sqlite database um how about the micro station from Bentley systems this is the cad cam system that a lot of uh major architectural firms and governments use for tracking things like um the water system or um like AT&T uses this so when your technician comes out and he's got his iPad he's got the entire AT&T network there and you know it's it's really one of these CAD systems is what what it is and sqlite is the database engine that backs this all the cad data is stored in the database and this has worked out really well for Bentley because what they've done is they've gone out and have acquired a bunch of other smaller companies so you're building a big building or something or you're designing a big building and you know you you you you you know develop all the big you know columns and stuff and then you have to figure out well what happens if an earthquake comes you've got to run some simulations to make sure that it's not going to fall over and there's like a bazillion little companies out in the world where three phds get together in a garage and say we've got this great software for modeling uh high-rise structures in earthquakes or we've got this great software for watering or for for modeling a water hammer effects on public water systems and things like this and so Bentley would buy up these different companies and of course their their software is written in in in a huge variety of languages like you know 4 TR four or you know whatever and but see now SQ light is their linga franka because they get all of these all of these different programs can Comm communicate easily with the database it's completely interchangeable and it becomes sort of their Central hub for for for getting a lot of people to work together uh I'm going to skip that for interest of time because we're about out of time sqlite is used a lot for data transfer sure enough there are I don't I don't know how many are actually still deploying this but there were at one time it probably still are there are cable companies the you know on your your cable television had you got the complete program guide for the next two days or something and it's in the little scrolling window they download that whole thing as an sqlite database it's streaming over the satellite continuously and then it gets down on the machine and that's really convenient for them because that little window of the next two hours and and three or four channels that you get to see that's an archery query against the Matrix right so so they can down they can stream the data directly this way with an sqlite database um but the real thing that I wanted to bring to your attention here is the use of sqlite as like the this is this is its key use here as a local cache for system data so here's a system where you've got some big big data over here and it's in a um uh a data center it's a nice air conditioned data center with attendance and and and condition power and it's well connected to the rest of the world it's a really sweet place to be a program yeah you you're really doing well there and then you go up to a tower then out here you've got some remote thing and it's some little piece of plastic in a box and it's connected uh back to the central office by via a very dodgy radio link That's goes away a lot and um so you know a lot of people say well you know I can just query across the radio link you know but there's a lot of latency there and and then if you you you go in a tunnel or in a building you might lose it so the idea is that you you pull down the parts of the data that you need into a local sqlite store and you're querying this and so your queries are much much much faster because they're local and if you lose connectivity for a little while you can probably still continue operating so this is this is really the key use um oh went too far so John Chambers the soon to be ex CEO of Cisco uh I he a couple months ago he had an interview and he said that they asked him about big data and his his remark was contrary to a lot of people's views of big data I think the majority of data will actually be acted upon analyzed and acted upon at the edge of the network he's talking about the data is is is analyzed and stored out in the phone out in the little device out in the the thermostat out in the in the radio of your car the things out on the edge that's where all the data is really going to be analyzed right now the trend is to pull all the data into the data center and analyze it there but he's saying no the trend is is is it's going to be analyzed on the edge and that's exactly where sqlite lives it's out there on the edge so all of the other database engines they are designed specifically to work in a data center a well attended lovingly monitored data center which has a DBA there to to to handle things if things go wrong sqlite is designed to live out on the edge out in the jungle out Works Rough and Tumble to take care of itself to require no user interaction or no maintenance to just work that's what I mean by sqlite is the database for the edge of your network I hope that I have some time for questions and that you have them questions about sqli questions about how it should be used questions about why it should not be used or when it should be used yes sir you started with the need you had when did you realize that was the fun that's been a when did I realize that this is really the way sqi works what you think about the need that I had was was very similar in that um I I was going the the words I used then were um I need a zero Administration database I I need a database engine that doesn't depend on having a DBA there to reboot the server when they power cycle the machine and and so that sort of evolved because at that time we didn't have nobody thought of the concept of The Internet of Things okay this was 2000 this was actually in the 90s uh and so it's really the same concept but the words have evolved over time to express it in ways that are relevant to what we're doing today the idea is that SQ light is designed for for for the problem that's right here now localized fast local cash no zero Administration uh yes was it that habit of representation that allowed you to create a file format which has stood the test of what yeah you know see I didn't really realize what a great thing SQL was at the time I was kind of grumbling and having to access the data in the SQL format for this problem because my problem it was a I I took over the problem from locked Martin and they had they' spent Millions trying to solve this and they had all these objects that were trying to do things and and I backed up and analyzed the problem and model the problem as a graph problem and realize that this problem is MP complete that's why you're not converging on an answer it's an MP hard problem and um that I was able to find some heuristics that gave them a good solution not necessarily optimal but a good solution very quickly but I had to build in all sorts of monitoring to make sure it didn't and they had lost focus of this because they were so objectoriented that they were focused on individual objects and their methods they weren't seeing the big picture and uh so that I was focused on the on the algorithm I really and I was grumbling and so when the the data that I had to pull in to solve this problem was all in a relational database but I was sucking the entire thing into memory and building in memory structures and so you know when the program would start up I'd read all the data and that was it it was a one-time thing it might as well have been a flat file but um that was but then later as I got into this I began to realize hey this SQL thing is really really cool cool and it solves a lot of problems so it was a learning process for me I don't expect people to understand it right away I certainly didn't other questions yes Dustin have you ever gotten board working on the same thing for that long of time have you ever gotten bored working on the same for for this long of time no there are always a thousand exciting challenges with sqli uh like the other day uh Mikel I won't or Michael I won't try and pronounce his last name because it's a Polish last name um what no it's it's z z salowski or something um uh he he wrote this fuzzer that went through and it really Define a it's found lots of lots of really bad bugs and lots and lots of software but he put it against sqlite and he found a lot of problems and that's kept us busy for the past month fixing those problems they're all fixed now that we know of we have you know we've been running the fuzzer continuously for weeks and weeks and we haven't hit anything serious in a long time and so if you download and get the latest everything's fixed but but there's always excitement like that there's always customers coming to us we need this new feature we've this query is too slow it's it's it's a lot of fun we enjoy it it's it's a great job uh Airbus wants us to support sqlite for the life of the a350 airframe so that means I will be supporting sqlite until I'm in my 90s I'm cool with that I'm good other questions yes sir have you ever thought about adding new data types have I ever thought about adding new data types well uid anything like yeah uh that would require file format change and a big part of sqlite is that the file format is unchanging so that the database you have today is going to be accessible 40 years from now and so so there there's a little bit of wiggle room in the in the file format and you know we have extended some things a couple of times and we've got a little bit more space to extend but we're not in a big hurry to use up what's left or our expandability we want to be very very selective about what we put in there so we don't have plans to put in like a Json a b Json type like in post grass now you can always store Json as a text string and we can add new things to to um to access that efficiently but you're thinking of a data type that's like parsed Json that could be indexed and so we don't have plans to do that right now but you know it could change other questions no more questions yeah I'm I was kind of curious how you decided to separate the tables in and separate the database tables in one file what method you use to keep uh yeah that's getting down in into some some lowl stuff um we the well I could bring it up there it's it's hard to say so so there's there's it so it's it uses a b tree structure and the there's there's a single B tree is rooted at page one and that's the root of the B tree the database is is built up of pages and they're all the same size let's just say they're 1K they can be different sizes depending on the database file but in each one one F always the same 1K so the first 1K is the root page of the sqlite master table and then it might have other Pages as it fills up but there's pointers to the other pages in that root page and then that that esite master table contains the root page of every other table as part of the schema and come to me afterwards and I'll I'll show you the details over my head yes more questions question yes sir have you ever done a big rewrite have I ever ever ever done a big rewrite we commonly rewrite major subsections of sqi because we have this massive amazing test protocol with thousands of times more test code than we have actually production code we strip out entire subsystems and rewrite them periodically yes that's like serious refactoring yeah and and we can do that and get away with it because we have such intense testing so really the site product is a our test scripts that tells us what what s supposed to do and then you know the stuff the code that we ship that changes massively from One release to the next and we don't even announce it we just change it it works the same just a little bit faster so most your tested at the functional level as opposed to unit tests um part of do178b is that you fly what you test and test what you fly okay also you're only allowed to use published and documented interfaces for your testing okay so es is design in such a way that we can actually test everything in excruciating detail using published interfaces so that's hard it's hard to get there we spend a lot of time maintaining that but you know when when you're when you run the world yeah it's it's an infrastructure thing when you turn on the water at your TAP you expect water to come out yeah when you store stuff in the sqlite you expect it just to work that's not something that you want to have to fuss with many lessons learned from going through that process tons tons we could talk all day for that yeah yeah that that that we don't have time for that now but we can chat later over over coffee okay more questions I'm way over time I think how far over am I Andrew yes siros um well yeah I mean everything I've done has led up to sqlite yeah certainly I didn't know about do178b until some aerospace companies started contacting me and then that because you have to buy that specification you have to pay for it and I went and bought a copy and read it and it takes a year or two to kind of get your mind around it because it's written in in in um uh you know that kind of weird language that Aviation documents are written in I don't know it's it's you know technically it's English but it you have to yeah it's hard I if you ever read an aviation standard have you I don't recommend it okay but but it took me about a year to kind of get my come to grips with it and it made a huge difference that that was uh when Android was just kicking off in fact Android's might had not had been publicly announced but we had been called in to help with get Android going with sqlite and I they had been publicly announced and and there were a bunch of Android phones out and they were we were we were getting flooded with problems coming in from Android I mean it work work great in the lab work great in all the testing and um then uh then you get out there and you you give it to uh 200 million people and let them start clicking on their phone all day and suddenly bugs come up and this was a big problem for us and so um I started doing this uh this following this D 178b process and it took a good solid year to get us there good solid year of 12-h hour days six days a week I mean we really really pushed but we got it there and you know by the once we got SQ to the point where it was at that do178b level a standard yeah we still get bugs but you know they're they're they're very manageable they're they're infrequent and they don't affect nearly as many people so it's been a huge huge thing for for something if if you're writing an application do1 you know a website a d17 is way Overkill okay it's just because it's very expensive and very timec consuming but if you're running an infrastructure thing like sqlite it's it's the only way to do it how long does it take to run your test uh the longest test package we have well the the the one that you saw there with 177 million takes about eight hours on a modern workstation we've got another test package that runs for 16 uh we um we just did a release yesterday that was the request of Mozilla Firefox and they requested it on Monday morning and it took us until yesterday to get there so we got to run all these tests on multiple machines and we've got a lot of manual analysis and code review to do as well it takes three four five days well we did it in four days because we didn't run into any serious problems assuming you don't run into problems it takes about four days to do a full release test cycle three guys working on it so all are we done well would you mind sticking around I'm happy to stick around is there any pizza left yeah so