Transcript for:
Insights from the AI Engineer Conference

[Introduction, partly inaudible] ...which I think is one of the top [inaudible] in the US, it was last year, and is the organizer of the AI Engineer conference. [inaudible] I recommend it. Thank you.

Thanks, M. Thank you. We have yet to actually have M on the podcast, but I'm told that we're going to do that eventually for Replit Agent.

Okay, so this is a very casual thing. I was asked to prepare a few notes for you guys, and probably half of you have no idea who I am, so I have to do the obligatory social proof as well. Maybe I'll just present... okay, this doesn't work; the Wi-Fi is not super helpful here.

Who am I to be speaking? Obviously I'm just a hacker. I actually got my start a lot with AGI House hackathons as well, so it's really nice to be speaking instead of just hacking, but I'm also really pro-hacking. We organized the first AI UX meetups about a year and a half ago, together with Maggie Appleton, Linus Lee from Notion, Geoffrey Litt, and a bunch of other folks: 200 last year, 400 this year. There are a lot of demos that you can see on Latent Space if you want to see inspirations in AI UX. I think the interest is growing, and I think it's mostly frontend developers and developer-designers getting into AI, so we all need something to call it, and we just call it AI UX. If you want more writing, some of the most popular essays on Latent Space (latent.space is the URL) have been all about AI UX.

And then the last thing that I work on is AI Engineer, which is the conference that M should have spoken at but didn't. The general thesis that I promote is something that's very favorable to this crowd and to the closed AI model labs, which is that there's a spectrum... oh, this is not very good, huh. Can I zoom in? Does that work? Okay. There's a general spectrum between ML researcher and ML engineer, then there's a dotted API line, and then there's the full-stack engineer on the right. What has emerged with foundation models and generative AI is an increasingly specialist role called the AI engineer that is qualitatively very different from the ML engineer and the research scientist, and that is growing and growing. This is the engineer that's figuring out agents and all the rest of the custom stack that is the specialized thing. You can read the "Rise of the AI Engineer" post there.

I think the tricky thing is that over the past two years of the existence of the AI engineer, it's come under a lot of criticism, mostly along the lines of: do we need this to exist? Is this a real job? Shouldn't all software engineers just use AI or write AI? And of course they should. Or is it valuable compared to just being an ML engineer or a research scientist? The answer I've been thinking about, which I haven't shared at all except for here, is something I learned from Ben Thompson of Stratechery. Just a show of hands, who knows Ben Thompson of Stratechery? 5% of the audience. All of you should read him. He is the leading tech business thinker on the planet; Apple quotes him in their strategy conversations. I don't need to say more. He has the concept of smiling curves in technology, which was originally invented to describe the
semiconductor market, where the accumulation of value occurs at two extremes: one at the deepest part of tech, and two, closest to the users. So this is describing Intel and the semiconductors. It later shifted to mobile, so that's Qualcomm, Samsung, and Arm on the hardware side, and then Apple and Samsung on the user side, and the manufacturers are very commoditized and don't capture that much value. Ben Thompson also applied it to content: what the internet did to the content world was effectively reduce the value of publishers and create direct connections from writers to their audience, as well as social platforms that became the most valuable companies on the planet.

What does that mean for AI? What is the smiling curve for AI? It's something I've been thinking about a lot, and I'm showing it here for the first time. Nvidia, of course, is at the lowest level, making the most money. Then a whole bunch of these in the middle happen to be the foundation model companies, which actually happen to be doing the worst out of all of these, and doing slightly better are the end-user companies. I think this is the core chart that you have to be comfortable with. To work in AI, you have to believe this is real. If this is not real, you should not work on this; you should just write CUDA.

I think the core idea is: are you here just to write GPT wrappers, and is that all there is? Is that meaningful at all? And I think the controversial thing is that this year, 2024, has actually been a total victory of GPT wrappers. Everyone here is doing well; everyone here has gone under or been acqui-hired (I call it "execu-hired"). So working on really good AI UX for a domain, exploiting some model feature and turning it into a product: that is the entire
job of the AI engineer, and that's the entire job of AI UX. I don't super care about agents in particular; it's such a broad word that it's not really meaningful. But I do care that the UX makes sense to people who don't even know that they're using AI. They don't care how it's implemented. They just care: is it intuitive, does it help them do their jobs, does it get out of the way?

[inaudible] my slides. Okay. I'm sorry, I didn't have time to add a whole bunch of logos. By the way, I'll tweet out these slides later, so you don't need to take photos. I also recorded the Zoom.

What do we know works? These are all basically UX layers on top of a foundation model layer. Chat over a knowledge base: PDFs, any chat-with-PDF thing, all printing money, all doing super well. Chat with web search: I think the basic idea is, are you searching your internal knowledge base or are you searching the broad internet? There's also spicy autocomplete, which is Copilot and all the Copilot clones. Chat while coding: I've been thinking about this as a qualitatively separate thing from autocomplete. It took us a few iterations to get this right, but I think Claude Artifacts is the main thing that you should be aware of. v0 actually even pivoted to this format, and I think the new Replit Agent implementation also matches that quite closely. Zero-to-one creator tooling: this is honestly the least AI-engineer thing of the entire set, because it's basically custom access to a model that does one thing well that I don't have. As a podcaster, we have custom AI music, AI hosts, and AI image generation, and we use that full stack, but we're just using those models. We don't really care about the UX; it could be bad, it doesn't matter. AI spreadsheets: Elicit, I think, is one of the
companies that I strongly recommend, and there's a whole bunch of spreadsheet companies where you basically enter N rows of data and fan out work. AI slides: a lot of people talk about Gamma. I'm a bit of a skeptic of this, but people like it a lot, and I don't get it yet.

Things that we know do not work [partly inaudible]. You rarely get people telling you what is a waste of time, so I will tell you what I think is a waste of time.

High-bandwidth voice: I have voice mode, you can try it. It's not good. The reason you don't have it is not because OpenAI is short of money or whatever; it's just not good. Why would you release something that's not done yet? Video generation: you can try "tapis" [name unclear]. I've seen this at a website, and again, these are all smart people; I don't want to crap on their work. They just don't work yet, which is not to say that they never will. Canvases: we had this tldraw moment around late last year, early this year, with all these draw-a-UI-to-code demos that generate code. Very, very viral, definitely tends to win hackathons, and then nothing afterwards. I think it's worth reflecting on why these things impress and then flop. I don't get it. All I know is that it hasn't worked yet, so either you break through it or you choose a different direction. By the way, we have a tldraw episode as well if you want to go into that.

Multi-hour coding agents, I think, don't work. I think the qualitative difference between OpenAI's approach and all the others is mostly that OpenAI tends to take the minimal extension of inference to do the next best thing. Code Interpreter was very short; the amount of multi-step web
search is very short, and what you see with o1 is slightly longer; the longest that I've been able to get o1 to do something is two minutes. Whereas the default Devin experience is 15 minutes to five hours, and that's not really the UX that people want. People want fast feedback. I will say, though, that Devin also nailed the chat-with-code experience, so it's a complex story, and I'm not a full critic or a full fan of it. VR, needless to say, hasn't worked yet.

So those are the things that I want to set as context for you for the hackathon. You should have an idea of things that work, you should have an idea of things that don't work, and find something in between. Or if you know that I'm wrong, please prove it. That's the spirit of the hackathon.

From those core examples, what I'm really seeking is to extract the principles. For me, the first core principle is: every time you copy-paste in ChatGPT, that is a product opportunity. Eliminating copy-paste in your app just puts the person in flow. The second thing is constraints. A general text box that can do anything causes me to do nothing. I'd rather you give me Mad Libs templates to fill in the blanks so that I can work with that. Third is explore multimodality. Far, far too many apps are just text. Actually, you can draw, and convey a lot more information in a lot more dimensions than it takes effort to do in text. The fourth principle is fan out work. I don't have an example pulled up here, but literally just import a list, say 100 rows, and fan it out. You have serverless intelligence; why are you still calling APIs in sequence rather than in parallel? Call it in parallel, call it massively parallel, it's
fine. And finally, every 10x in speed unlocks new products. I think maybe the previous hackathon was a Groq hackathon, and we're seeing another 2 to 3x now with Cerebras and SambaNova and all these other guys giving you 500 tokens per second, 1,000 tokens per second inference. What can you do with that that you could not do before because it was too slow and unusable? We're still mapping this out. I think it's very early days in AI. This is what I discovered over the last two years, and I just wanted to share it with you. Thank you.

Next, or questions? One or two questions? Oh, okay. [inaudible] Any questions?

Yeah, I'd love to hear any comments you have on WebSim.

Yeah, we were the first podcast to interview WebSim. What comments? Can you give me more prompts?

Yeah, I'm interested in your thoughts on highly open-ended tools. WebSim, for people who don't know, allows you to essentially describe a website and have a website generated, and it's surprisingly good but not perfect. There are a lot of highly generative concepts like that, whereas you mentioned constraints to create. What do you think about things that take a highly open-ended approach?

I really like that. It's not a real product yet, in the sense that it doesn't make money. It's an art piece. WebSim is a new kind of canvas. For people who want to try it out, WorldSim and WebSim are the two things here, and you can search for our take on it; websim.ai looks like the one. I think it's an art piece. Again, it's going to do well at hackathons, and then the question is what people want to create out of it. But I am very supportive of it, because it does one thing that all the other AI approaches fail to do, which is take advantage of generative AI being generative instead of treating it like a disease. I call these temperature-two versus temperature-zero use cases. We got generative AI, we got an infinite creative machine, and the only thing we end up doing is treating hallucination like a bug and trying to squash it. What if you make hallucination a feature, to explore different worlds than exist today? So I'm very supportive of that. We just haven't worked out the business model. It's fun art.

So you have a slide where you said you were skeptical of high-bandwidth voice. There have been some startups, like Bland AI, that have been testing this [partly inaudible]. Some startups have been using voice agents for customer support, outbound sales, things like that. Is that what you were referring to with being skeptical of high-bandwidth voice?

No, no. I use this as shorthand for, like, interruption, the stuff that was demoed in the voice mode demo. Customer support agents, yeah, that's proven, it's working, it's making money. High bandwidth, the next frontier, is literally back-and-forth with low latency, like you and I are doing. The things that need to happen there are basically: we need to predict start and stop points, and we need to use context in the language to do that. And I think no one, including OpenAI, has trained that.

So you mean, like Character.AI, that kind of experience hasn't been nailed yet?

No. Have you tried Character voice?

Yeah, it's not good.

It's not good, yeah. So it doesn't work yet. Yeah, okay. Yeah, yeah,
voice generation works; it's not high-bandwidth, back-and-forth conversation. Every time someone demos something at hackathons or meetups, and I see a lot of these, I go to these three or four times a week, it's like: cool, clap, great for a participation prize, but not there yet.

I have a question. Where do you see the future of software engineering in one, five, and 10 years?

Oh, are we all going to be here or [inaudible]? So this is... yeah, sorry. Okay. It's an absurd question to answer, because I don't know the answer. Of course, the main meme thing I say is that if you believe that AI will take jobs, then you should be an AI engineer, because it's the job that takes the other jobs. AI engineer is the last job. You need an AI lawyer? You need an engineer to build that lawyer. And the real screw-with-your-mind question is which one will last longer, AI researcher or AI engineer. I even think the AI engineer will replace the researcher, because if you can automate the researcher, then the engineer will be left to maintain that. So yeah, keep at it, because this has some longevity, until such time as we run out of [inaudible].

Okay, and in the meantime I promise to say we
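
A note on the fan-out principle from the talk (import a list of, say, 100 rows and call the model in parallel rather than in sequence): it can be sketched roughly as below. This is a minimal illustration in Python, not something shown in the talk. `enrich_row` is a hypothetical stand-in for a real model or API call, and `max_concurrency` is an assumed knob for staying inside whatever rate limit a provider imposes.

```python
import asyncio


async def enrich_row(row: str) -> str:
    """Hypothetical stand-in for one model/API call on a single row."""
    await asyncio.sleep(0.01)  # simulates network latency
    return row.upper()


async def fan_out(rows, worker, max_concurrency=20):
    """Run `worker` on every row concurrently, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(row):
        # The semaphore caps how many calls are in flight at once.
        async with semaphore:
            return await worker(row)

    # gather() returns results in the same order as the input rows.
    return await asyncio.gather(*(bounded(row) for row in rows))


def run_fan_out(rows, max_concurrency=20):
    """Synchronous entry point for scripts."""
    return asyncio.run(fan_out(rows, enrich_row, max_concurrency))


if __name__ == "__main__":
    rows = [f"row {i}" for i in range(100)]
    results = run_fan_out(rows)
    print(len(results), results[:3])
```

With a sequential loop, 100 calls cost roughly 100 times the latency of one call; with the bounded fan-out above, the wall-clock cost is closer to the number of rows divided by `max_concurrency`, times the latency of one call.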