So we're not an AGI company. We're not interested in building digital gods. I'm interested in making this technology deliver on the promise it has made and solve real-world problems.

Today we are in Toronto, at Cohere's office. We've been invited to their build day here. They're running these in four cities: Toronto, San Francisco, New York and London, and we're going to be capturing the events here in Toronto and in London next week. A couple of weeks ago, as you've probably heard, Cohere absolutely shocked the market: they released a new set of models called Command R, and these models are particularly good for retrieval-augmented generation. We interviewed Patrick Lewis last year in the London offices. Patrick invented retrieval-augmented generation; he wrote the original paper on it. It's a really powerful technology because it allows you to ground language models.

Retrieval-augmented generation is kind of a general term these days. It refers to having a generator, a model that generates stuff, usually text, and augmenting it, i.e. giving it some stuff that's not just the usual thing that you would put into a large language model, to make it better at its task. And you're going to find that augmentation from a retrieval database. So they're often used in situations where you have a large language model or a chatbot and you want to give it extra knowledge that you have that's external to the chatbot. The retrieval-augmented generation paradigm is about how to hook up the large language model, or the AI, to leverage that unstructured knowledge that you have, via retrieval, to better do your knowledge-intensive or knowledge-based tasks.

Today we're going to be speaking with Nick Frosst, one of the co-founders of Cohere. Enjoy.

Yeah, so I met Aidan while I was at Google Brain, working as a research engineer. Aidan was an intern
actually, at the time. That's where he was working on the Transformers paper, which is the paper that kind of kicked off this whole thing. So I met him there, working in Geoff Hinton's lab in the Toronto office. I had met Ivan while I was a student at U of T, and Aidan had also met Ivan at U of T, but Aidan and I didn't meet there. So yeah, serendipity.

What was it like working with Hinton?

I really, really enjoyed working with Geoff Hinton. It's from him that I learned how to do research. He kind of taught me everything I know about machine learning and about research in general. So I was really lucky to get to work with him for those years.

I can only imagine. That's amazing. And I also hear that you're in a band. Tell me about that.

Yes, I sing in an indie pop rock band called Good Kid.

What kind of music is it?

It's indie pop rock, so we're somewhere between The Strokes and Panic! at the Disco.

Interesting. So how is Cohere differentiated from other players in the space?

The space has heated up a lot recently, so there's a lot of players, but we've carved out a niche for ourselves by being focused on real-world enterprise business solutions. So we're not an AGI company. We're not interested in building digital gods. I'm interested in making this technology deliver on the promise it has made and solve real-world problems. So we as a company stay focused on that.

Can you go into a little bit more detail? What do you mean by AGI? Does AGI imply that it's magic, and it's generalizable, and it can do everything? And are you saying Cohere is building a more pragmatic, specialized solution?

Yeah, so I think when a company says that they're going after AGI, they're describing a technology that does not exist yet, right? They're describing some future
world in which there are computers that you treat as a person, or that go even beyond the capabilities of a person, those sci-fi visions of artificial superintelligence and all these things. If a company says they're going after AGI, they're saying that's what they're trying to build. It's unclear to me that the technology we have now will get us there. I don't really care about that. I'm interested, and Cohere is interested, in making large language models useful for businesses. So that's what we're focused on.

I mean, do you think AGI could exist, or do you think it's just not possible?

I think it's possible. I see no reason why it couldn't. I'm not a dualist. I think we might be able to create some representation of a human mind that functions as one. I don't think we've done that with Transformers at all. Transformers are nowhere close to a human mind.

Okay, but if it could exist, would it be a good thing? And why would it not be interesting?

I'm not sure if it would be a good thing. I think that's a really interesting conversation. I think there's a lot of people who think it wouldn't be a good thing, and a lot of people who think it would be. I don't know. I think we're not close to that at all, and so I like to think about those things from a philosophical perspective, but when it comes to running a business, I'm really interested in making the technology useful to solve real business problems. I like philosophical debates, but I like being grounded in reality a lot more, so that's what we're focused on here.

Absolutely. Well, maybe we'll go philosophical a bit later.

Happy to.

Cohere has just made incredible waves with Command R. Can you tell us about that?

Yes, we recently released two new models, Command R and Command R+. They're
similar, from the same model family; we just have one that's bigger and one that's smaller. They're particularly good at multilingual, retrieval-augmented generation and tool use. Those are the things we went after. We've open-sourced the weights, and they're available on a bunch of the cloud providers. So if you're a builder, I encourage you to download the weights and try it out yourself, or try it out on our platform. If you're in an enterprise, try it out on whatever cloud provider you're working with. They're really good at real-world problems.

And out of those features that you mentioned, is there a particular standout feature that you feel has the potential to be really big?

I think all three do. If I had to pick one out of the three, I think retrieval-augmented generation is the most exciting. People talk about hallucinations in language models a lot. I'm sure your listeners are familiar, but for those who are not, hallucination in a language model is when you ask it a question and it writes something that is not true, right? It writes something that isn't based on your view of the world. I don't really like the term hallucination, because it implies that there's something the model can do that isn't hallucinating. But really, all large language models do on their own is hallucinate. It just so happens that sometimes that hallucination lines up with how the world is. That's kind of a fundamental issue. So instead of just being good at memorizing facts that go out of date or that change, we've trained our models to be really good at taking relevant information, answering based on it, and providing citations so you know where it got that answer. I think that addresses this fundamental issue. Now you can get a language model as an interface into some external source of
truth, rather than relying on the internal memory and weights of that model.

I agree. I think there's a problem with anthropomorphization, which is that there's no explanation of why the information is there. It's still interpretive, but it seems better to me to at least have the citations there, so it's clear why the model said what it did.

100%. Our model is particularly good at saying: here's a bunch of documents, here's what I wrote, this part of this sentence came from this document, this part of this sentence came from that document. And so you can check and see where it got its information.

Yeah, that's very interesting, because these models mimic language so incredibly well that when you just see generated text, the temptation is to ascribe agency and personhood.

I think a lot of the difficulties people have been having with large language models come from personifying the model to be doing one thing and treating it like it's doing that, when in reality it's doing something quite different. So I really like the retrieval-augmented generation stuff of our models, because I think it addresses that fundamental problem.

Reflecting on RAG and the maturity landscape, how is it changing the way we are building applications?

Oh, I think it's changing a lot. I think it's making models a lot more usable. When you've set the whole system up, you use Cohere embeddings that are multilingual, Cohere Rerank to improve the search results, and then you feed that relevant information to the generative model, you can actually start to use that at production scale. You can actually start to put in your company's internal documentation and get a real answer instead of something that the model just made up.

Interesting. So we're at one of your build days. You're running four build days in
four cities, and we're here in Toronto, and there's about 80 developers here. Tell me about the day, and who are you speaking to today?

The build days are something new for us. We haven't done these before, but the first went really well, this one's going very well, and I'm really excited for the next two. We have a whole bunch of people from a variety of backgrounds, actually: some really experienced, longtime ML engineers, some more recent. They're all on the second floor right now, working with our new models and with our open-source chat toolkit, building... well, I don't know what they're building yet, they just started, but I'll be excited to see what they build in the end.

It's so interesting, because I think there's a last-mile problem in building applications with language models. It's easy to put a demo together, and it's very impressive, but there's still a hell of a lot of software engineering, good old-fashioned engineering, that needs to happen, and just building that developer awareness, I think, is really important. But given that there is still, let's say, a 5% robustness problem, what kind of general guidance are you giving to developers? Is it a case that there are general rules, or is it quite specific?

Well, I think if you're building an application that's a chat interface to do something, it's really good practice to use the citations that our model provides in the UI, because that really does increase trust, right? If you can see where the information came from, then you know: okay, I'm relying on this, this is good information that the model had. You mentioned there's a last-mile problem. I think there are a few. One of them was building out good user interfaces to use large language models. So one of the things that they're working on is our chat toolkit. This is a whole full-featured chat
interface that we just open-sourced. So now we have something called the Cohere Toolkit. You can deploy that locally; it's all set up for Docker and everything. So you can deploy, in your own environment, a chat interface using our models, with citations, with RAG, with tool use, even a Python interpreter, all of these extra bells and whistles. And that has already gone a long way in setting people up to build new products with this tech.

I'm really excited about the evolution of RAG. Tool use is fascinating, and I think in the future it could be generalized to just discovering microservices and doing flexible semantics and so on. But what's the evolution there? What kind of tool use are you seeing now, and where do you think it's going?

So tool use is kind of a new thing people are interested in with LLMs. It's where, instead of just getting the model to write some text, you get it to use some tool: to write a query for that tool, and then, based on the result of that tool, which is externally computed, give you an answer. Retrieval-augmented generation is kind of the first example of this. The tool is a search, and so you get the model to write a search query, do the search, and based on those documents, answer the question and provide citations. That's one tool. Another tool we have in our toolkit is a calculator. So you can get the model to write a mathematical expression, execute the mathematical expression, and give you an answer based on that. Language models are notoriously bad at math, as you would expect them to be, given that they learn by repetition and reading from examples. It's really hard to learn math by memorizing every math equation you ever see. So they're really bad at that, and we can augment that by giving them access to a calculator and then answering
the question based on the output of the calculator. Those are two simple examples. A third one that we've added to the toolkit recently is Python, so you can use code as a tool itself. The input to that tool is a bunch of code, the output is the output of that code, and that kind of opens it up to everything. So where does it go from there? I think you start combining those. One thing you can also do in the toolkit is turn on search as a tool and Python as a tool, and turn on multi-step. So now you can ask a question like "create a graph of the height of the five tallest pyramids," and first it will find what the five tallest pyramids are and what their heights are, and then it will use Python to create a graph of that. So you can see how, chaining those things together, you could get to a world where you arrive at a computer, maybe the screen is blank, and you just say "do this," and then it figures out how to call various things. If it doesn't know how to do it, maybe it Google-searches it first, finds an API, builds you a front end to interact with that API. We can get to a point where language really can be the default interface between you and a computer. I think that's really exciting.

That's fascinating, because I'm interested in how this is going to evolve over time. As I understand it, right now Cohere has an incredibly sophisticated toolkit which generates prompts, which constrains the model, and we can solve problems because even if we run this thing non-interactively, it's going to work reliably; it's going to do what we need it to do. But when the tools become more open-ended, so let's say it's generating code to do something, and people might be building these agentic metaprogramming systems and so on, then it feels like it's harder and harder to run it unsupervised.

100%.

What does that look like?

So you can think of this in the same way you think
of game AI or something, like chess. I'm sitting in front of these chess boards. You can think of it the same way as chess AI or Go AI. There's a branching factor to these things, right? One of the reasons why the algorithms that beat chess were so different from the algorithms that beat Go is because in chess there's an average of, what, like 80 moves or something you can make per move, and in Go there's an average of like 200. So we had to completely redesign the computer systems that were used to create AI for those games, simply because the branching factor is bigger, and every time you're going to make a move there's 200 possible moves to make. I don't know what the branching factor for writing Python is, but it's probably pretty big, right? So I think as you start to do multi-step and more open-ended tools, it certainly requires more supervision and more correction, in the same way that if you're asking a person to do a simple task that's one step, you probably don't need to supervise them; they can probably do it. But if you're asking somebody to do a task that requires many, many steps, you might want to check in on them every now and again.

Yeah. The way I think about this is that it's to do with the divergence of semantics and understanding. I'm really excited about agentic systems, for example, and the problem at the moment is that at every single step it diverges, so it's moving away from the thing that I wanted it to do. And I guess you could solve this one of two ways. Maybe if we made it understand better, we could trust it to work in these divergent flows, and it would still end up doing what we want it to do.

Well, here's where we get into some of the non-AGI things. I don't love the term "agents," although it has been widely adopted, "agent workflows" or something. I don't like applying, or implying, agency to
a sequence model. I think that's actually kind of misleading. I like to use the term "multi-step tool use," because when you say something like that, you can imagine how the problem you're trying to tackle is fundamentally a sequence modeling problem. You have some sequence of calls, you've trained a probabilistic autoregressive model on those sequences, and we're going to try to make the model better and better at that. You're not going to get agency out of that. It's not going to suddenly get agency, and it's always going to be constrained by the data that it has been trained on. So I don't think these models in a few years will get to a place where you can say something like, "Hey LLM, run my business for me." That's not going to happen. It's too open-ended, the branching factor is too big, it requires too much about the real world, there's not going to be data that represents that use case. It's just not going to work. But I do think we could get to a point where lots of little things throughout running a business could be automated by a language model. You could say, "Hey, read these documents and make a report that tells me this information." That could probably be done by multi-step tool use. Or you could say, "Hey, look at these customer requests and respond to them with these particular things if this happened," or something. You could start to chain things like that together.

I think part of the issue with agency is that people think it's just a thing which acts. And I love the definition, I mean, there's actually many definitions in the Stanford Encyclopedia of Philosophy, but let's say a system with preferences, with goals, with intentionality, something which is trying to affect the system around it to achieve its preferred states. And what's to say that a system couldn't have that? It seems to me that the
missing component is the understanding. I agree that it doesn't have values, because we tell it what to do, we tell it what the preference is; it doesn't infer the preferences. But let's say here's a thing that is aligned with us, it has the same intentions as we do, as CEOs or something like that. Why couldn't that be an agent?

I mean, this is going to get down to a philosophical discussion on what the definition of agent is. I think we can have an LLM that can, based on training data, figure out what tool to call. We can have it figure out how to use multiple tools in sequence, or perhaps even create sub-agents to accomplish goals. We could do all of that, and I think we will. But fundamentally these models are still based on training data. They still work on things that they have seen before, for the most part, and that means they're never going to be as robust as you or I are. We are doing something very different than an LLM is doing, even when we're trying to solve the same tasks. And so I think that means there are going to be some things that LLMs are very good at. I used this graph example, right? I ask it to go make a graph of the five tallest pyramids in the world. That would actually take me a while. I would have to Google-search it, find it, I'd have to remember how to use matplotlib, or I'd have to remember how to use Google Sheets or something. It would take a little bit, but you can use our model to do that in a second, or in a few. So there are things that it is going to be very good at. But the things about operating in the real world, operating with the complexities and intricacies of interpersonal dynamics that come up when you're running a business, those are probably not things you're going to get from a training data set, and so I don't think these models are
going to extend to that.

There's also this thing that, contra Nick Bostrom, I think intelligence is about goal dynamism. Part of agency is the ability to dynamically change your goals, and having an agent that is explicitly told to do one thing can't be intelligent by definition.

I mean, I think that's another one of these complicated debates where it's like, well, it depends on what you mean by intelligent. In some ways those debates are interesting, but they don't really help you get to the bottom of it. I think what helps me get to the bottom of it, and understand where this technology is going, is going back to what it does. And what it does is, probabilistically, based on a huge training data set, write a sequence of tokens.

Yes.

That's what it does. That's not what you're doing; you're doing something very different. And so obviously these two things, a human brain and a large language model, are going to be very different. Lots of people have used the analogy before, but it's similar to artificial flight, right? We have planes, which are incredible and can carry huge amounts of weight massively long distances, but they're not the wings of a hummingbird. They can't hover in one place, they can't dynamically change. They're very different technologies. Planes are super useful, a truly incredible feat of engineering, and a fundamental part of our lives these days. So we created artificial flight; it just ended up being very different from biological flight, and so it's particularly good at some things and not good at others, right?

Absolutely. There's this kind of narrative coming from Silicon Valley that LLMs are general, and I think it serves as a useful fiction, in my opinion, but I believe that these models are more specialized than people realize. With that in mind, is
there a future where we might see specialized versions of LLMs, or do you think there'll always be this idea of a foundation model that is general?

So I think they are really general when you look at the history of machine learning. When I first started doing machine learning research, if you wanted to solve any task, the best way to solve it was to create a data set for that task, train a model on that data set, and use the model for that one task and that one task alone. That was where things started. Then in image recognition, shortly thereafter, it turned out that you could actually have a general-purpose image recognition model, and then you could fine-tune it on your particular data set, and that was helpful. We're now in a place where, if you take a really big Transformer and you have a language task you want to solve, you might actually get better results by just prompting that general-purpose language model than you would by fine-tuning a model on examples of your task. That's a really interesting outcome. I think that has to do with the perception of the narrowness of language. We think of language tasks as being very constrained. I can imagine a task like reading a quarterly earnings report and extracting the sentiment and the revenue or something. You might think that's pretty constrained, but when you actually look at the way people write earnings reports, they're all over the place. They use different turns of phrase. It's very difficult to get a data set that captures that and that alone. So you're probably better off taking a general-purpose Transformer and prompting it a few times, and you might get better results out of that. So they're general in that sense. Again, I don't think they're general in the way you and I are general. I don't think they are as robust as that, and so there are going to be
things that they don't extrapolate to, and some fundamental limitations on where this technology can go. But that doesn't mean I don't think they're massively useful. They're general-purpose enough that if you have a language problem, you should probably be using a Transformer to solve it, and that will probably get you the best results.

A couple of thoughts on this. First of all, language might not be as general as we think. And also there's the benchmarks problem, maybe you can comment on that. It's possible that some of these models, you could say, are cheating, because they're kind of training on the test set. And so writers in the New York Times or whatever say, oh, these things can pass the bar exam, they can pass SATs and so on, and they're not really passing. But the thing is, they are general, for the reasons you just said. They really are general. So what is it?

As with most things, the answer is somewhere in between, right? They are general, but they're not general the way you or I are general. So yeah, benchmarks have become a real problem in language modeling. Over the course of the history of this company, and this technology in general, every year there's been a new benchmark that people have cared about. In our first year, the benchmark that everybody cared about was LM1B, which was measuring a model's ability to generate news from 2012. That was the benchmark that everybody used. After that it was something called HellaSwag, which was your ability to come up with the ending of weird sentences about where cups were and odd kinds of things. More recently it's been MMLU, which is a series of multiple-choice questions involving some surprisingly specific things. Turns out there's a lot of questions in there about Sigmund Freud's theories. We use all of these as
proxies to understand how good a model is, and they give you some intuition, but they're easily cheated and normally not at all correlated with what people want to use the model for. Everybody's measuring MMLU; nobody building an application with a language model is giving it MMLU-like questions, right? That's actually not useful. We don't need a language model to answer weird multiple-choice questions. We need it to help us with our work. We need it to structure unstructured documents. We need it to call tools based on the input from a user. And none of those are really captured by the benchmarks that exist today.

Would Cohere ever verticalize, or do you expect customers to build derivative or fine-tuned models on top?

I think our objective has been to build these general-purpose models tailored specifically for business. A lot of people ask, should we make a financial model or a medical model or something? I think actually the way to do it is to use retrieval-augmented generation. So we're not going to try to train a model that memorizes all the facts about medicine. We're going to make it generally good at language, and if you want it to know facts about medicine, you can give it documents in the retrieval-augmented generation system, and it can answer based on that.

Can you contrast RAG with having a really long context and just putting everything in there, or more in there?

Yeah. So we've expanded our context window; it's now 128k for the Command R+ release. These things are different. When I say retrieval-augmented generation system, a lot of people think of a whole search thing, where first you generate a query, with that query you find some relevant documents, you use an embedding model to measure similarity, you use a rerank model to refine that, and then you put that back into the
prompt. And you might say, well, can't we just put all those documents into the prompt and answer directly, without any of the embedding or anything? I would still call that retrieval-augmented generation; it's just that your retrieval is really broad. You're saying: all of these documents, put them into the context window. No matter how big your context window gets, you're probably going to have, in a production setting, more documents than fit. That might not be true if you're building, you know, I've built retrieval-augmented generation systems for little games, where I give it a bunch of lore, like a backstory, and I answer questions on that. There, I don't need to do embedding or rerank anymore; I just put it into the prompt and answer the question. But when we're looking at real-world production business usage, it's like "answer this question from all of case law" or "answer this question based on a person's entire history" or something, any of these huge things that really quickly go over your context window.

Fascinating. I can imagine a world where, let's say in the future, we've got a 10 million context window or something like that, and for folks using Cohere it just kind of remembers everything that you've ever done. So all of your previous interactions are in there, and you get this kind of divergence where everyone has a different state. So rather than starting with a blank slate, where everyone starts from the beginning, it has, you know, what did I buy for shopping last week, and when I went on holiday last year, and so on. My gut intuition is that model would be kind of weird and constrained and quite hard to benchmark. What do you think?

Yeah, I think that would get pretty weird. I think the obsession with metrics and benchmarking right now is like a local minimum. I think that obsession is because this technology is still
so exciting, and it's being talked about all the time, and it has yet to really deliver on that promise. I'm starting to see a bunch of people use LLMs as part of their daily workflow and part of their life, but a lot of people still don't. A lot of people have seen this technology as a cool oddity and then moved on. It hasn't impacted the way we use computers the way the internet did, the way the touchscreen did, the way the mobile phone did, the way databases did, I guess. Those technologies became inseparable from the way we use computers. Large language models, I think, will do that, but have yet to do it. And now, like, there's benchmarks for touchscreens. Could you name me one? Right, no. It's not relevant anymore; you just use the thing. And there's benchmarks for phones, but as a consumer you just decide which one you think is best and you use it. I think the obsession with leaderboards and benchmarks exists because the technology hasn't landed yet. It hasn't delivered on its promise. And when it does, we're not going to be worrying about, well, hey, is your model great at memorizing facts about Freud's theories in a multiple-choice setting? You're going to be like, well, does this one help me solve my problem? It does? Great, cool, that's it.

But then the question is, has the technology landed? I've got a bit of an intuition looking at some of the benchmarks, and benchmarks are broken, but there's also things like the LLM arena, which I think is very interesting, which you can talk to. But it appears the models are all in a similar ballpark now. Does that mean it's saturating, or do you think that if we put 10x more into data and training, we could still see a huge improvement?

I think that's an interesting
question. I think on some of the benchmarks, like, not the, you know, we're saturating because people are training on the test set, so there's some saturation going on there. The Elo ranking, I think, is a much more interesting benchmark to me. I think that one's quite interesting. I think the shortcoming of the Elo ranking is that what people want to use LLMs for in business applications is mostly not idle chitchat. It's mostly not asking it a brain teaser and seeing if it gets it. It's mostly work, like work automation, or augmenting systems that are in place already to handle a bunch of complex language processes or something. So the Elo ranking is cool, but you have to remember it's measuring not how useful is this model going to be for my business; it's measuring how good is it to chitchat with. And those are correlated, certainly: language models that are really good conversationalists are often really good workhorses as well. So it's a decent one, but it has its shortcomings as well. I think there's still room to grow; I think these models are going to get a lot better over the next few years. What I think that's going to look like is the stuff we were talking about earlier, you know, multi-hop tool use and integrating with systems that are external to the LLMs. I think that stuff's just going to get more and more reliable. I don't know if we're going to make a model that's even better to chitchat with on Elo, but I know that we're going to make a model that is more useful.

Yeah, even if it wasn't chitchat, if it was something better than that, you could argue that the benchmark is still kind of human-parasitic and human-limited, because it's limited by the ability of humans to recognize something as being better. I'll give you chess as an example. So, you know, in chess there
are roughly 30 kind of skill floors, where, according to Elo, one person in the floor above will beat the person, you know, 100 Elo points below 95% of the time. So there's a massive dynamic range of skill in chess, and I don't think there is such a high dynamic range of skill in humans. And of course chess is one skill; we have, yeah, a whole nine yards of skill. But then what you were saying before is interesting, which is that there's a kind of good-enough thing, right? When it reaches maturity, then at some point it's like the iPhone 15; it's good enough.

Yeah, that's interesting. I think the tech's going to get a lot better in the next little bit. I'm excited for our next model release; we're cooking up some exciting things. I don't think we've saturated on that. But I do think that a person's ability to look at a model and say, this one's better than that one, based on my brief conversation with it, yeah, those don't really reflect the business use cases that people are making. And we've seen, I mean, a bunch of research has come out on this recently showing the effect that preambles have on Elo ranking, showing that slight formatting changes make a huge difference in people's perception of how good the model is. All of those things kind of show the brittleness of that ranking. I imagine as time goes on we'll get better at making the ranking as well, so hopefully we can have one that reflects, like, we want the ranking to say: if you're going to solve a business problem, you're going to use a model to do something within your work, this one's going to solve it for you, this one's going to solve it for you, this one's not. That's what we'd like to get to, and we're not really there either. So I expect the models to get better, and
hopefully the way we evaluate them to get better as well.

Yeah, it's interesting, the comment you made about almost human adversarial examples: you change a couple of things and humans see it completely differently. I'm interested, and you probably can't talk too much about this, but how is Cohere differentiated in terms of your data acquisition pipeline and how you get the data prepared for the language model?

Yeah, well, I don't know the way other companies are getting their data, so it's hard to talk about the ways in which it's differentiated. I know that we make a point of getting data where we can do so reasonably, and we make a point of getting data where we're confident in its usage. That comes from our focus on business and enterprise. So we do data indemnification: if you're using our model, we can indemnify you against that, and that comes from our belief in our principled approach to data acquisition.

Yeah, I'm a huge fan of Sara, of course, who runs Cohere For AI, and the ethical question is really, really big here, because I know you folks have done so much around the multilingual model, looking at low-resource languages and stuff like that. But it gets to the point, when you're playing the SOTA-chasing game, do you take a principled stand and say, oh no, we actually want to have fairer models, even if some of our headline metrics might not look better? How do you think about that?

Yeah, I mean, it comes from trying to make a model that can be deployed anywhere and that any business can be confident deploying and using. If our objective is to make a useful thing for businesses, then how we handle data falls out of that naturally, right?

Yeah. We're
not really building custom machine learning models anymore. The data science function, as a role, rose to prominence around 2017, with loads of folks building custom models, and it feels to me that now there's more of a software engineering focus: let's use foundation models, let's do LLMOps. Is that a fair assessment, or do you still think there is huge importance of data science?

Well, I mean, look, the time period you're talking about, like 2017, 2018, when a bunch of ML engineers were making custom models, they weren't doing language as a modality, right? They were doing, you know, image classification, or they were doing predictions of outcomes based on numeric features or something. And none of that has changed; that's all still happening. Everybody who was doing that, they're still doing it in some form. It's just that a whole new thing has opened up of, oh, you have a language problem to solve? Cool, a transformer, a large language model, can help you solve that. So I think all of those things are still probably ongoing; we just don't talk about them as much anymore.

Yeah, but how would you contrast them? Because it feels like a difference in kind. One feels more like software engineering, and it feels more like composing; in software you compose modules together. There's still a whole bunch of metrics around it, so you could argue it's a similar mindset, but it feels like in those days it was different. It felt like R&D. Now it feels more like it's just...

Yeah, well, I would imagine that actually it always feels like software engineering, and that even when we're thinking back to the days of, like, 2018, when somebody at some company was told to train a neural net to predict click-through rates of something, the challenging problem in getting that running was software
engineering. You know, in the end it was like, all right, cool, I'm going to do some feature engineering for a bit, and then I'm going to spend a long time figuring out how to get this system ready in production and actually working. So I think now it's a lot of the same stuff. You spend some time figuring out how to get a good prompt, you spend some time looking at which model's particularly good at your task, and then you spend time as a regular software engineer building out the infrastructure to actually run it. So yeah, I bet they're not that dissimilar.

I mean, I agree with you. I always thought of software engineering as being hugely important to get into production, but there was this kind of impedance mismatch. The data scientist would build the model, and then you'd hand over to the ML engineer, you'd build this DevOps pipeline, and it was always super brittle, because, you know, what happens when the model changes? And now the difference, it seems to me, is that a software engineer on their own can do the whole thing.

Hm, yeah. I mean, I think now that we're all backending our stuff on a pre-made model, it does mean that you're not doing feature engineering. You're doing a little prompt tuning, you're doing some prompt engineering, but you're not applying some weird function to a numeric value and noticing that if you apply this function that's a better feature than this one. You're just toying with language a little bit, and that's a thing that's a lot more accessible.

Yeah, amazing. Is there anything else you want to talk about?

I want to shout out our open-source release of the toolkit, so that any of your developers, if they're building chat interfaces, I highly recommend that they check out our GitHub and use the Cohere Toolkit for building chat interfaces with tool use, multi-hop tool use, including things
like Python interpreter and web search. So try that out, and check out the new Command R+ model. It's really great. I use it a lot; it's my go-to these days.

It's amazing, yeah.

And it's good at, yeah, tool use, retrieval augmented generation, and multilinguality.

Nick, it's been an absolute honor to have you on. Thank you so much.

Yeah, thanks so much for having me. I really enjoyed the conversation.

Amazing.

[Music]