Transcript for:
Anders Hejlsberg on an Overview of Project Corsa

Hi everyone, this is Anders Hejlsberg, lead architect of the TypeScript project and Technical Fellow at Microsoft. Today I'm here to talk about Project Corsa, which is our effort to port the TypeScript compiler and toolset to native code.

Since the inception of TypeScript more than a decade ago, TypeScript has been written in itself. That's brought a lot of benefits, but it's also consistently brought some challenges, in particular around performance and scalability. The JavaScript runtime platform is really optimized for UI and browser usage, not so much for compute-intensive workloads like compilers and system-level tools. When we talk to customers, one of the most commonly reported issues they face is out-of-memory situations as their projects get bigger and bigger, and we've likely reached the limit of what we can squeeze out of JavaScript. JavaScript comes with overhead that is troublesome at times: it's a JIT-compiled language, so there's always a startup cost to JIT-compile your code; its very flexible object model, with expando properties and so forth, adds a lot of cost; you can't inline-allocate data in structures; and there's no shared-memory concurrency. All of these things mean we leave money on the table.

To illustrate what I mean by that, let me start a full compile of the Visual Studio Code project, which is about one and a half million lines of code, with our existing compiler while we keep talking here.

About six months ago we started an effort to evaluate moving TypeScript to native code. We knew we wanted a port, not a rewrite, meaning we wanted to port our existing compiler lock, stock, and barrel and get all of the same semantics in the new code base. We're now well underway with that: we have more than
100,000 lines ported. The scanner, parser, and binder are all pretty much complete, the type checker is about 80% complete, and we're working on the language service. We're porting to a language called Go, and some of you might ask: why not my favorite language? Why not C#, why not Rust, why not C++? We actually spent a lot of time prototyping in all of the various languages, and we found that Go is the most suitable for the particular workload we're trying to do. It's about the lowest-level language we can get to that gives us fully optimized native code on all platforms, great control over data layout, the ability to have cyclic data structures, automatic memory management with a garbage collector, and great access to concurrency.

Now, you saw that compiling Visual Studio Code with our existing compiler takes about a minute. Let's take a look at how long it takes with our new native-code compiler. And there we go: a little over five seconds, five and a half seconds. So we're talking more than 10x faster here, which is quite dramatic. That means your projects will load ten times faster, and your batch compiles will be ten times faster.

Now let's look at some code. I said earlier that we're porting, and that means we are literally going file by file, function by function, from TypeScript to Go. For example, let's find checker.getTypeOfSymbol, a function we have in the type checker, then look at the same function in our old code base and put them side by side. You can really see that this is a port: it's the same function, with all the same code in the same order, but in Go instead of TypeScript. What that means is that all of the implicit behaviors we have in our type checker, such as which
type do we pick when there are multiple inference candidates, and so on: all of the semantics will be the same. All of the code will be different, but the semantics are preserved, and that in turn means we actually expect to be able to fully replicate the same behavior.

Let me try, for example, to introduce an error in this program. This is actually the source code of the old TypeScript compiler. I'm going to add an extra argument that shouldn't be there. Then I'll switch to the compiler directory and set off the old compiler compiling itself, and we should see the error we just introduced reported. Sure enough, there's that error. Now let's try it with the new compiler, and you'll see exactly the same error reported, albeit a lot faster, obviously. That's the experience we're going for: a plug-and-play replacement for our old compiler.

Of course, we're not just building a command-line compiler. That's one of the artifacts that are part of the TypeScript toolset, but there's also the language service, which is arguably the more important thing. So I'm going to launch the Go implementation of the language service from inside my Visual Studio Code project here, and that brings up a copy of VS Code with the native Go language service running underneath. You'll see that we have hover implemented, we have go-to-definition, we can navigate, and we also get red squigglies from errors: if I introduce an error, you'll see red squigglies here. Just to give you an idea, this is the Visual Studio Code project again, which is 4,500 files and one and a half million lines of code. Let me restart the language server, and you can count how many seconds it takes: about one, two, maybe three seconds to completely kill the
language server process, restart it, parse all 4,500 files, and bring up semantic errors for the file I'm in. That's at least five times faster than project loads with the old language service, so you're going to see some really nice benefits in your coding experience.

Now, the new language service is moving to the LSP architecture that most language services use today. TypeScript predates the introduction of LSP, the Language Server Protocol, and we've been wanting to move to it for years; that's what we're doing here. It means the language service isn't necessarily going to be a feature-for-feature port of the old language service. It will obviously have all the basics, but in this modern world there are also many cases where AI-assisted capabilities are more appropriate than what is in our existing language service, especially when it comes to refactorings and so forth, so we're looking at how to slice and dice that.

I also wanted to talk a little bit about how we actually got all of this extra performance; 10x is pretty dramatic, right? It turns out that about half of the performance gain comes from moving to native code, and the other half comes from our ability to use concurrency. Let me illustrate this by first doing a full compile of the old TypeScript compiler using the old TypeScript compiler itself. This is about 250,000 lines of code being checked and emitted, and it takes about seven seconds. Now let's compile it with our new compiler, but force the compiler to run single-threaded: you'll see it completes in about two seconds, so we're getting about 3.5x just from being native. But if we run it concurrently, which is the default, you'll see that we complete in less than a second, and here we're about 8x faster. Part of
why it isn't 10x here is that there is one gigantic file, the type checker, and we can't parallelize within a single file, so that file takes a little longer. With projects like Visual Studio Code, as you saw, we're typically more than 10x.

Now, it's interesting how we get there. We ported the existing compiler, and the existing compiler has a very functional architecture internally: once we've parsed and bound files, we have immutable ASTs, and multiple programs can share the same abstract syntax tree for each file. In the new compiler, all of our parsing, binding, and emit is fully concurrent, because these are what I often call embarrassingly parallelizable problems. To parse a source file basically means: load it into memory, build a data structure, an abstract syntax tree, that represents the source text, and leave it in memory. You can do that for each file independently, and if you have eight cores available you can go eight times faster, because it's completely parallel.

Type checking isn't quite parallelizable in the same manner, because types reach across multiple files. So what we do there is, instead of running one type checker on a program, we create some number of type checkers, currently hardwired to four, give each of them the same program, but tell each to check a quarter of the files, and then they go off to work. Of course there's some duplication: each of the four type checkers will resolve all the built-in types from the standard library and so forth, but most of the resolution and checking is local. So we see that we can get two to three times faster in the check phase, at the cost of consuming maybe 20 to 25% more memory, but
overall we still consume less memory than we used to.

So at this point, now that we're public with the repo, we have a single-project command-line compiler that's about 80% complete, I would say. It doesn't currently support JavaScript, JSDoc, or JSX, but that's just a waypoint; we are moving toward completing that. We have the language service underway, but it's still early. We do expect, within this year, to have a fully functional replacement command-line compiler that supports JSDoc, JSX, project references, and possibly incremental compilation as well, plus a new interprocess API so that you can talk to the compiler from other languages. And we are of course looking at using this 10x performance gain to think about new kinds of AI features that become possible, such as immediate type checking of an LLM's output, provision of semantic information in LLM prompts, and so forth.

I hope you'll go check out the new GitHub repo, build it, try it on your own projects, and share your experiences. If you find issues, please share them as well. Thank you.
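The file-by-file, function-by-function porting style described in the talk can be sketched with a hypothetical, heavily simplified function. This is not the real compiler code; the TypeScript original in the comment, and all the type and flag names, are made up for illustration. The point is that the Go version keeps the same name, shape, and control flow as its TypeScript counterpart:

```go
package main

import "fmt"

// Hypothetical TypeScript original, kept alongside the port:
//
//   function getTypeOfSymbol(symbol: Symbol): Type {
//       if (symbol.flags & SymbolFlags.Variable) {
//           return getTypeOfVariable(symbol);
//       }
//       return errorType;
//   }

// SymbolFlags mirrors a TypeScript bit-flags enum.
type SymbolFlags uint32

const FlagsVariable SymbolFlags = 1

type Symbol struct {
	Flags SymbolFlags
}

type Type struct{ Name string }

var errorType = &Type{Name: "error"}

func getTypeOfVariable(sym *Symbol) *Type {
	return &Type{Name: "variable"}
}

// getTypeOfSymbol is a line-by-line port of the TypeScript above:
// same function name, same flag test, same fallthrough to errorType.
func getTypeOfSymbol(sym *Symbol) *Type {
	if sym.Flags&FlagsVariable != 0 {
		return getTypeOfVariable(sym)
	}
	return errorType
}

func main() {
	fmt.Println(getTypeOfSymbol(&Symbol{Flags: FlagsVariable}).Name)
}
```

Because the control flow is transplanted rather than redesigned, implicit behaviors (which branch wins, which fallback type is returned) carry over unchanged, which is what makes the semantics-preserving claim plausible.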
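The concurrency scheme described in the talk, fully parallel per-file parsing plus a fixed number of type checkers that each own a slice of the files, can be sketched as follows. All types and functions here are illustrative stand-ins, not the real compiler's API:

```go
package main

import (
	"fmt"
	"sync"
)

// SourceFile stands in for a parsed file; in the real compiler this
// would hold the abstract syntax tree.
type SourceFile struct {
	Name string
}

// parse touches no shared state, which is what makes per-file parsing
// "embarrassingly parallel".
func parse(name string) *SourceFile {
	return &SourceFile{Name: name}
}

// parseAll parses every file on its own goroutine. Each goroutine
// writes to a distinct slice index, so no locking is needed.
func parseAll(names []string) []*SourceFile {
	files := make([]*SourceFile, len(names))
	var wg sync.WaitGroup
	for i, name := range names {
		wg.Add(1)
		go func(i int, name string) {
			defer wg.Done()
			files[i] = parse(name)
		}(i, name)
	}
	wg.Wait()
	return files
}

// checkPartitioned mirrors the hardwired four checkers: each checker
// sees the whole program but checks only a 1/n stride of the files.
func checkPartitioned(files []*SourceFile, n int) {
	var wg sync.WaitGroup
	for c := 0; c < n; c++ {
		wg.Add(1)
		go func(c int) {
			defer wg.Done()
			for i := c; i < len(files); i += n {
				// A real checker would type-check files[i] here,
				// re-resolving shared built-ins locally (the
				// duplication mentioned in the talk).
				_ = files[i]
			}
		}(c)
	}
	wg.Wait()
}

func main() {
	names := []string{"a.ts", "b.ts", "c.ts", "d.ts", "e.ts"}
	files := parseAll(names)
	checkPartitioned(files, 4)
	fmt.Println(len(files), "files parsed and checked")
}
```

The trade-off in the sketch matches the one in the talk: partitioning wins parallelism in the check phase at the cost of each checker redoing some shared resolution work, hence the modest extra memory use.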