So today is a remarkable day for AI development. This announcement is one that I didn't think we would be receiving just yet, but as with AI developments, it's always quicker than you think. Today Sakana AI introduced The AI Scientist, the world's first AI system for automating scientific research and open-ended discovery.

Sakana AI is a Tokyo-based AI startup founded by former Google researchers Llion Jones and David Ha. The company is focused on developing AI models inspired by natural systems, such as schools of fish and beehives, to create flexible, adaptive, and economically efficient AI models, which is of course a rather different approach. And today they've come out and said that they've got their new AI Scientist: from ideation, to writing code and running experiments, to summarizing results, to writing entire papers and conducting peer review, The AI Scientist opens up a new era of AI-driven scientific research and accelerated discovery.

Now, this company aims to utilize numerous smaller models that work collaboratively, akin to swarms in nature. Llion Jones, one of the co-authors of the very influential 2017 paper "Attention Is All You Need," which introduced the famous Transformer architecture, actually serves as the CTO of Sakana AI, and David Ha, who previously led research at Stability AI and Google Brain, is the CEO. They've raised $30 million in seed funding.

So let's actually get into this announcement, because this is one that shook the industry. Right here we can see the conceptual illustration of The AI Scientist. They state that The AI Scientist first brainstorms a set of ideas, then evaluates their novelty; this is basically where it's checking how new these ideas are and whether they've been covered before. Next, it edits a codebase, powered by recent advances in automated code generation, to implement the novel algorithms. The AI Scientist then runs experiments to gather results, consisting of both numerical data and visual summaries, and it crafts a scientific report explaining and contextualizing those results. Finally, The AI Scientist generates an automated peer review based on top-tier machine learning conference standards, and this review helps refine the current project and informs future generations of open-ended ideation.

So you can see right here, we do have four key steps. First, idea generation: given a starting template, The AI Scientist first brainstorms new research directions. Second, experiment iteration: given the idea and the template, the second phase of The AI Scientist executes the proposed experiments, then obtains and produces plots to visualize the results, and it makes a note describing what each plot contains, thus enabling the saved figures and experimental notes to provide the information required to write up the paper. Third, the paper write-up: The AI Scientist produces a concise and informative write-up of its progress in the style of a standard machine learning conference proceeding, in LaTeX, and it uses Semantic Scholar to autonomously find relevant papers to cite. The fourth piece, and a key aspect of this work that we'll get into later, is the development of an automated LLM-powered reviewer, which is capable of evaluating generated papers with near-human accuracy. The generated reviews can be used either to improve the project or as feedback to future generations for open-ended ideation. This enables a continuous feedback loop, allowing The AI Scientist to iteratively improve its research output. And when combined with the most capable LLMs, The AI Scientist is capable of producing papers judged by the automated reviewer as a "weak accept" at a top machine learning conference.
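To make that loop concrete, here's a minimal sketch of the control flow being described, with every step stubbed out as a toy stand-in. To be clear, this is not Sakana's open-sourced implementation and all the helper names are hypothetical; in the real system each step is driven by LLM calls, automated code editing, plotting, and LaTeX generation.

```python
# Minimal sketch of the ideate -> novelty check -> experiment -> write ->
# review loop described above. Every helper is a toy stand-in, NOT Sakana's
# open-sourced implementation; only the control flow is the point here.
from dataclasses import dataclass


@dataclass
class Paper:
    idea: str
    results: dict
    draft: str
    review_score: float  # conference-style score, e.g. on a 1-10 scale


def brainstorm_ideas(template: str, feedback: list[str]) -> list[str]:
    # Real system: an LLM proposes research directions from a starting
    # template, conditioned on reviews of earlier generations (feedback).
    return [f"variation {len(feedback) + 1} of {template}"]


def is_novel(idea: str) -> bool:
    # Real system: search the literature (e.g. via Semantic Scholar) and ask
    # an LLM whether the idea overlaps significantly with existing work.
    return True


def run_experiments(idea: str) -> dict:
    # Real system: edit a codebase with automated code generation, run the
    # experiments, and save numerical metrics plus annotated plots.
    return {"metric": 0.9}


def write_paper(idea: str, results: dict) -> str:
    # Real system: draft a LaTeX write-up from the notes, figures, and
    # metrics, citing papers found on Semantic Scholar.
    return f"Draft paper on '{idea}' with results {results}"


def auto_review(draft: str) -> float:
    # Real system: an LLM reviewer scores the draft against top-tier
    # machine learning conference review criteria.
    return 6.0


def research_loop(template: str, generations: int = 3) -> list[Paper]:
    papers: list[Paper] = []
    feedback: list[str] = []  # reviews fed back into future ideation
    for _ in range(generations):
        for idea in brainstorm_ideas(template, feedback):
            if not is_novel(idea):
                continue  # discard ideas already covered in the literature
            results = run_experiments(idea)
            draft = write_paper(idea, results)
            score = auto_review(draft)
            papers.append(Paper(idea, results, draft, score))
            feedback.append(f"{idea}: reviewer score {score}")
    return papers


for paper in research_loop("low-dimensional diffusion"):
    print(paper.review_score, paper.idea)
```

The detail worth noticing is the last line of the loop body: reviews get appended to the feedback that seeds the next generation of ideas, which is exactly the continuous feedback loop they're describing.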
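And since the write-up phase leans on Semantic Scholar for its citations, that part is easy to poke at yourself: Semantic Scholar exposes a public search API. Here's a small hedged sketch; the endpoint and parameters reflect the public Graph API as I understand it, so check the current docs before relying on it, and note that in the real pipeline the model decides autonomously what to search for.

```python
# Hedged sketch: querying Semantic Scholar's public Graph API for related
# papers, as the write-up phase is described as doing for citations.
import requests


def find_related_papers(query: str, limit: int = 5) -> list[dict]:
    resp = requests.get(
        "https://api.semanticscholar.org/graph/v1/paper/search",
        params={"query": query, "limit": limit, "fields": "title,year,abstract"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("data", [])


for hit in find_related_papers("diffusion models for low-dimensional data"):
    print(hit.get("year"), "-", hit.get("title"))
```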
Now, there have been some really fascinating things about this whole release that I do want to dive into, because the paper is actually pretty long and there are a few things that you might not know. One of them is the cost-effectiveness of the system: producing papers with potential conference relevance comes out to around $15 per paper. So in order to run this system effectively and actually produce papers, it costs about $15 per paper. I'll leave that up to you guys, but I think that is relatively cheap, considering that in the future models are going to be getting cheaper overall, and with algorithmic efficiencies and different agentic workflows we're likely to squeeze more capability out of the same level of LLM compute. Basically, these things are going to get more efficient and a lot smarter over time, which means that this benchmark of cost per paper could come down significantly. And of course, you can see here that they say this highlights its ability to democratize research and accelerate scientific progress. Imagine when we get to a world where these research papers cost even a dollar, or even a few cents; how many areas of research are we going to be able to automate?

Now, one of the biggest questions that I personally had was: okay, we now have an automated AI researcher, but were the papers any good? Based on the information provided, the papers generated by The AI Scientist contained some potentially novel insights, but overall the quality was mixed. So here's a quick summary of their papers. Some papers did present potentially new ideas or approaches; for example, a paper on diffusion models for low-dimensional data showed significant improvements in sample quality and distribution matching, and another paper proposed a novel dual-expert denoising architecture for diffusion models, which showed improved performance. The papers were generally described as medium quality, comparable to work by early-stage machine learning researchers who can execute ideas competently but may lack deep background knowledge.

Now, many papers did have significant limitations, such as a lack of thorough theoretical justification for the proposed methods, limited experimental scope, occasional hallucination, and inconsistent quality across different sections of the paper. The authors pretty much stress that the scientific content should not be taken at face value; the papers can be used for hints of promising ideas that require follow-up and verification by human researchers. So whilst this isn't exactly breakthrough research, these papers could serve as a source of ideas or starting points for human researchers to explore further. Some of these papers achieved scores that exceeded the acceptance threshold for top machine learning conferences according to the automated reviewer system, and the system's ability to generate many papers quickly, such as hundreds in a week, could be valuable for brainstorming research directions even if the individual papers aren't publication quality. So whilst The AI Scientist demonstrated the ability to generate coherent research papers with some novel ideas, the papers did not generally represent new knowledge ready for publication. That doesn't mean this isn't useful research; they were more akin to preliminary research proposals or early-stage work that would require more substantial human expert involvement to develop into publishable research.

Now, one of the models that they were using in this research was Claude Sonnet 3.5. They state that they find Claude Sonnet 3.5 consistently produces the best papers, with a few of them even achieving a score that exceeds the threshold for acceptance at a standard machine learning conference from the automated paper reviewer. And what they also state here, which is a clear indication of what's to come, is that there is no fundamental reason to expect a single model like Sonnet 3.5 to maintain its lead. They anticipate that all frontier LLMs, including open models, will continue to improve, and that competition among the LLMs has led to their commoditization and increased capabilities. They also spoke about using open models: lower cost and easier to use, but worse in quality.

One of the things that you might want to see here, for those of you who are looking to get into the prompt engineering side: in the research paper they actually have a section where they laid out all of the prompts that they used. You can see right here that one of the idea novelty system prompts was: "You are an ambitious AI PhD student who is looking to publish a paper that will contribute significantly to the field. You have an idea and you want to check if it is novel or not, i.e., not overlapping significantly with existing literature or already well explored." These kinds of prompts, and there were a few of them, showcase how these models were prompted to think about their responses. If you do want to see them, they're in the research paper, for those of you who might want to try this out yourself; a rough sketch of how such a prompt might be wired up follows.
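To give a feel for how a prompt like that gets used, here's a small hedged sketch wiring it into an Anthropic API call, since Claude Sonnet 3.5 is the model they highlight. The wrapper function, the user-message format, and the specific model ID are my own illustrative stand-ins rather than Sakana's actual code; their real novelty check also feeds retrieved Semantic Scholar results into the conversation.

```python
# Hedged sketch of a novelty check built around the quoted system prompt.
# The wiring below is an illustrative stand-in, not Sakana's code.
import anthropic

SYSTEM_PROMPT = (
    "You are an ambitious AI PhD student who is looking to publish a paper "
    "that will contribute significantly to the field. You have an idea and "
    "you want to check if it is novel or not, i.e., not overlapping "
    "significantly with existing literature or already well explored."
)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def check_novelty(idea: str, related_work: str) -> str:
    """Ask the model to judge an idea's novelty against retrieved papers."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=512,
        system=SYSTEM_PROMPT,
        messages=[{
            "role": "user",
            "content": (
                f"Idea: {idea}\n\n"
                f"Potentially related papers:\n{related_work}\n\n"
                "Decide whether the idea is novel and briefly justify."
            ),
        }],
    )
    return response.content[0].text


print(check_novelty(
    "A dual-expert denoising architecture for diffusion models",
    "1. Denoising Diffusion Probabilistic Models (Ho et al., 2020)",
))
```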
Now, one of the craziest limitations of this entire thing is the fact that The AI Scientist currently doesn't have any vision capabilities, so it's unable to fix visual issues with the paper or read plots. For example, the generated plots are sometimes unreadable, tables sometimes exceed the width of the page, and the page layout is often suboptimal; adding multimodal foundation models could actually fix this. They also talk about how The AI Scientist can incorrectly implement its ideas or make unfair comparisons to baselines, leading to misleading results, and it occasionally makes critical errors when writing and evaluating results, for example struggling to compare the magnitude of two numbers.

Now, what's crazy about all of this is that they actually speak about how they've managed to open source it. So for those of you who want to know the complete ins and outs, and those of you who want to run this yourself to conduct AI research, you're going to be able to do that: they've given a link to their GitHub repo and open-sourced the entire thing. It's going to be fascinating to see what happens here. I honestly didn't expect this announcement today, but with AI development, you truly never know what's coming.