The BBC has zero method for actually monitoring what they do. We monitored what they do.

The model is blind: the model doesn't know that this comes from the BBC and this doesn't, and we're doing things over and over and over again, in big numbers.

You've developed a method that newsrooms could use themselves in order to establish, not whether one specific article or broadcast is problematic, but to look at your broad overall output and say: we potentially have a problem.

We now have a method that allows us to give numbers, and to look at things the way they are, without being affected by our prejudices, by our beliefs, or by anyone else's.

We showed that over a period of four months, which is nine million words, they were not balanced, and they were consistently not balanced.

Balance should be consistent across their products.

We thrive on pushback; that's how science works, and it's what we expect in order to refine our methods and get to the bottom of things.

The BBC has what I think it laughably calls a complaints department. I say laughably because almost no complaints are ever accepted.

I'm Jonathan Sacerdoti. Dr. Haran Shani, Trevor Asserson, Dr. Oren: thank you for joining me. Trevor, as the report bears your name, could you tell me: what is the Asserson Report?

Well, the Asserson Report is a deep dive into the material that was produced, broadcast and published by the BBC during the first four months of the present war between Israel and Hamas. It was intended to analyze that material to see whether or not the BBC was compliant with its fundamental obligation, which is the obligation to be impartial in its news reporting. Many people were concerned that it was failing to meet its obligations, and we designed experiments to test empirically whether that was the case. What's unusual about the report is that it included two completely separate disciplines. I'm a lawyer, and I used traditional legal principles to run many of the experiments in the report; but I was also joined by a team of very experienced and very senior data scientists, who brought a completely new methodology to the analysis of broadcast material. That's what makes the report, I think, particularly unusual and groundbreaking.

So it's a combination of lawyers and data scientists, and that's where you two gentlemen come in. The idea, if I understand you correctly, is to use science, and perhaps existing research methods for linguistics and natural language processing, to see if there's a way to remove the opinions or biases of the researchers and find an objective way of measuring bias in news coverage, using computers and learning models. Oren, you're a senior researcher in computational linguistics and natural language processing at Bar-Ilan University in Israel. Could you tell me how you came to choose AI as the way to do this?

The use of machine learning, artificial intelligence, to analyze text in general, and media specifically, and to unpack media bias (in the literature it's called media slant) is something the research community has been doing for twenty years or so. In that sense it's not new that machines are used to analyze texts and find nuance, and obviously there are some challenges. What I think is special about this effort is the dawn of large language models, ChatGPT and so on, which allow you to work in a way that is more sophisticated on the one hand and more straightforward on the other, with very strong models that are capable of analyzing text.
So Haran, what did you do? What sort of research was it, what were you asking the AI, ChatGPT, and what did you find?

One of the biggest opportunities we had at this point in time is the new era of large language models like ChatGPT: you can speak with them in human language. It's not about code, or technical terms like weights and layers and model architectures, which people like Oren and myself are used to dealing with when we train and establish models; it's about the language with which you communicate. That was very helpful for us in speaking with the lawyers and understanding which kinds of questions we should really ask. Just to give you an example: a classical sentiment analysis model tells you whether the sentiment is negative or positive, and that's it. Now think about this case. When you speak about Palestinians who are being violent, Hamas and so on, the sentiment would be very negative; but it's also negative when you speak about Palestinians who are suffering from the war and are under Israeli bombing. With a new large language model like ChatGPT you can explicitly ask whether someone is depicted as a victim or an aggressor, or be far more specific than that, and you can direct the models.

Trevor, what were the legal aspects? What was the legal framework used to assess the BBC's coverage?

Well, the starting point was the 130 or so pages of BBC guidelines, which are designed by the BBC to achieve impartial news coverage. Not every line of those guidelines relates to news, but much of it does. What we did was isolate the principal rules that the BBC sets, and then devise experiments to check whether or not those rules were being obeyed. For example, the BBC says that where it is reporting from a place with reporting restrictions, it must say so, because what it is really telling the audience by flagging reporting restrictions is that what's being said quite likely isn't true: it's being controlled by a dictatorship. And that's very much the case with almost every word that comes out of Gaza. The BBC reminds its audience frequently that its journalists are not allowed into Gaza by the Israeli government; no journalists are allowed into Gaza, and obviously it's a very dangerous place, whatever the reasons might be. But there are many, many journalists working in Gaza who are putting out material, and the BBC is giving the audience that material, so it ought to be giving that health warning. We counted the extent to which it gave that health warning, and it almost never did; as far as we can see, it never gave that health warning at all. That's one example, and there were many, many of those rules that we looked at, and for each one we devised ways of testing it empirically.

Just to be clear about that point: it is difficult for journalists to get in; some journalists have been allowed in on embeds with the IDF; and the BBC and others are always keen to point out that they can't get journalists in. But you're saying they don't then make the counterpoint, which is that everything you're seeing therefore comes from people who are already in Gaza, who work from Gaza, and who may be either intentionally or unintentionally subject to Hamas propaganda techniques that stop them from reporting fairly.
That's correct. And there's a further point. One of the obligations they have is that where a person they're interviewing, or who is broadcasting, has an interest in the outcome, they've got to inform you of that. We decided to look into the background of many of the journalists, and of many of the talking heads and experts they put in the studio. Many of them were Hamas supporters or Hamas members, and that was concealed from the audience. And we saw very recently, in the documentary that had to be withdrawn, that a member of a very senior Hamas family was being paid as an actor to act parts, which was entirely dishonest. If the BBC say they didn't know, that's a bit implausible; but they certainly didn't tell us, and they should have done. That's an example of the rules they were breaking all the way through.

So those are some of the traditional, legal ways you were looking at what the BBC did in its coverage. And as you say, that documentary not only used a Hamas-connected child whose father was a Hamas minister, but even engaged in mistranslations: things like translating "yahood" as "Israeli" and "jihad" as "battle" without explaining the full context. There's the example where the English subtitle read "fighting and resisting Israeli forces", but the accurate translation of the Arabic is "jihad against the Jews". Looking back to the AI side, because I think this is one of the parts of the Asserson Report less known to the public, and it fits in with those legal requirements: Oren, can you tell me a bit about some of your findings, some of the interesting results?

What we actually did was to use ChatGPT: we fed it news stories, or headlines, or both, and asked two simple questions. Does this headline or story convey sympathy toward Israel? Does it convey sympathy toward Gaza, toward Hamas? The answers could be yes and yes, sympathy for both sides; yes and no, sympathy for only one of the sides; or no and no, no specific sympathy for either side. First of all, we saw that, looking at the sympathy ratio, the sympathy toward the Palestinian side is 1.5 times higher than the sympathy toward the Israeli side.

And this is not necessarily bias; that in itself isn't a finding that shows bias, it's simply a way of measuring it.

Exactly, and let me make sure this is absolutely clear. A single trial of this experiment is inputting one article by the BBC into a model, GPT-4 in this case. We kept it simple, because it was important for us to be very communicative with the legal team, and not to use ensembles of models or anything like that. So you take one article, you input it to GPT-4, and you ask exactly the questions Oren described: does the following text create sympathy toward Israel? Does the following text create sympathy toward Gaza? For every such trial you can have four different outcomes: the text can create sympathy toward Israel but not Gaza, sympathy toward Gaza but not Israel, sympathy toward both, or sympathy toward neither. Now, if you repeat this trial multiple times (in our case we had 1,500 items published by the BBC in the four months following October 7th), you get numbers.
Say you get 100 items for which the model said yes, there is sympathy toward Israel here, and 300 items for which the model said there is sympathy toward Gaza. Then the sympathy ratio, as Oren referred to it, and it's going to be a very important term that we'll repeat today, would be 3:1, because 300 over 100 equals three. This was a way we could use a large language model to create one number that quantifies the stance of a specific news outlet on this subject.

As well as that, you ran the same experiment with different questions, as a way of getting toward an effective measure of that sympathy ratio; and you ran it for articles and then for headlines. Could you explain the different questions you tried before you settled on the ones used for the main part of the research, and then what you found when you compared headlines with the actual text?

Absolutely. Oren described the main two questions, but in reality we had three pairs of questions. The first pair was sympathy toward Gaza and sympathy toward Israel. The second was sympathy toward the Palestinian people and sympathy toward the Israeli people. The third pair was about sympathy toward the IDF and sympathy toward Hamas: the militant entities, if you like. This was an important nuance, because what we saw is that the results for sympathy toward Israel and toward Gaza were very similar to those for sympathy toward the Israeli people and the Palestinian people. That gave us some confidence in the method, because one thing you often find in the literature on large language models is that they are very prone to subtle changes in the prompts. So it was very encouraging to see that you can change the way you phrase the question and the results stay similar: the study was replicated. However, when it came to Hamas and the IDF, the result was different: the IDF received more sympathy than Hamas. And that was again very encouraging for our confidence in the methodology, because now you have something you can ask a model, and you cannot say it's completely antisemitic or completely Islamophobic, because you see that the answer really depends on the exact question.

You saw by that that the model could handle nuance. It doesn't give you a fixed answer no matter what you ask; it captures nuance and can distinguish between different entities. That gives you confidence in working with it.

All of the things that Oren and Haran are talking about are methods of labelling particular articles, or particular pieces of material transcribed from broadcast. They are not decisions on bias. The BBC, in its analysis of our report, mistakenly said ChatGPT cannot detect bias. We didn't ask it to detect bias; we asked it to attach a very simple label: does this make you feel more positive toward the left or toward the right? And even once we have the number of articles that lean left or right, that in itself, as Oren said, does not indicate bias. We're about to come to how we go from that labelling to a finding of bias.

That's a very important distinction made by Trevor. In fact we expect that some items will create sympathy toward side A and some toward side B; and as Oren said before, we, the scientists at least, do not aim to determine whether that's fine or not fine.
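To make the protocol just described concrete, here is a minimal sketch of the kind of single-trial labelling and aggregation being discussed, assuming the OpenAI chat-completions API. The prompt wording, helper names and yes/no parsing are illustrative assumptions, not the report's actual implementation.

```python
# Minimal sketch of the sympathy-labelling trial described above.
# Assumptions: the OpenAI chat-completions API; the prompt wording and
# the yes/no parsing are illustrative, not the report's exact prompts.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

QUESTION_PAIRS = {
    # The three pairs of questions described: side, people, militant entity.
    "side":   ("Does the following text create sympathy toward Israel?",
               "Does the following text create sympathy toward Gaza?"),
    "people": ("Does the following text create sympathy toward the Israeli people?",
               "Does the following text create sympathy toward the Palestinian people?"),
    "forces": ("Does the following text create sympathy toward the IDF?",
               "Does the following text create sympathy toward Hamas?"),
}

def ask_yes_no(question: str, text: str, model: str = "gpt-4") -> bool:
    """One blind query: the model sees only the text and a question, never the outlet."""
    reply = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic labelling, for reproducibility
        messages=[{"role": "user",
                   "content": f"{question} Answer only YES or NO.\n\nText:\n{text}"}],
    )
    return reply.choices[0].message.content.strip().upper().startswith("YES")

def label_item(text: str, pair: str = "side") -> tuple[bool, bool]:
    """A single trial: (sympathy_to_israel, sympathy_to_gaza), one of four outcomes."""
    q_israel, q_gaza = QUESTION_PAIRS[pair]
    return ask_yes_no(q_israel, text), ask_yes_no(q_gaza, text)

def sympathy_ratio(texts: list[str]) -> float:
    """Aggregate many independent trials into the single number discussed above."""
    israel = gaza = 0
    for text in texts:
        to_israel, to_gaza = label_item(text)
        israel += to_israel
        gaza += to_gaza
    return gaza / israel if israel else float("inf")  # e.g. 300/100 = 3.0, a 3:1 ratio
```

Note that, as described, each item is processed sequentially and independently; nothing about the outlet or the hypothesis is visible to the model in any single call.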
We just describe the picture. That's why I called them samples: every trial gives you a sample, and now you have a label for that sample, but it's the accumulation of these samples that lets you depict the big picture.

So what I'm getting from all of you is that the aim at this point in your research was not necessarily to answer whether the BBC is biased in one way or another; it was to find a methodology, a technique, that can give as objective a way as possible of measuring a potential bias.

Quantifying it. "Quantify" is an important word.

OK. So at this stage, when we repeated these questions and looked at the dynamics over time, we saw many things that fit our intuition. We scientists are very suspicious, especially when we try to establish something new, so there was back and forth, and some of the people on our team had criticisms: we should do this to be more certain, we should do that. We repeatedly ran control experiments, and we came to a point where we were quite confident and happy with what we were seeing. That's when we decided to do the headlines experiment. In the headlines experiment we wanted to assess, to empirically validate, an assumption we had, an observation we thought we were seeing.

I'll just say that this observation is very much grounded in research in media studies and journalism; it's not just an assumption that this might happen. It's something that is known to media scholars.

And perhaps I'll very briefly say what that assumption is. First, we know factually that headlines are written by different journalists from the ones who write the articles.

That is the bane of our lives as journalists.

Right, and there's no dispute over that, I don't believe. Secondly, there is much research demonstrating that headlines are much more widely read than the articles. And thirdly, the headlines are pushed out on social media by the BBC, so they have much wider coverage. They therefore carry a great deal of importance. So the question we asked ourselves, particularly the data scientists, was: is there a correlation between the headlines and the articles? Because that is what you would expect: if the articles are moving two to one to the left rather than the right, the headlines also ought to be moving two to one in the same direction. That was the test.

There was just now an example of this, where the BBC published a report on its website about an Indian man who had been lured to Jordan by a work scam. Having run into this problem in Jordan, he tried to cross into Israel to find work, and the Jordanians killed him as he tried to do so. The BBC's initial headline said that a man crossing into Israel had been killed, having been lured by a job scam; of course, when there were complaints, they corrected the headline to make clear what the article actually said. It wasn't part of your research, but it's a very good example of how somebody wrote a headline with some kind of bias, unconscious or conscious, in which they basically pinned this on Israel when it had very little to do with Israel. So let's find out now the difference between the analysis of the articles and of the headlines. What did you find?

I'm very happy you gave this example, because it gives me a good starting point to explain what Trevor meant by correlation.
The thing is that mistakes can happen. If sometimes the article conveys sympathy toward one side and the headline doesn't quite match, or if it's noisy, sometimes to the right, sometimes to the left, everything is OK. But the assumption is that the headlines should, in general, reflect the same thing the main text reflects. What's important for me to explain is that once we had established a method that we believe is objective and robust, one that gives us a quantity in an experimental design that doesn't depend on the eye of the observer, we wanted to check the assumption that the headlines are even worse than the main texts. Now, we didn't really know, and the model didn't know, what our assumptions and suspicions were. So we decided to repeat exactly the same experiment as before, inputting items sequentially and independently, one after the other, into a large language model, GPT-4, and asking exactly the same questions as before, but this time the input was not the complete article, only the headline.

You were running them as two separate experiments.

Yes. The other thing to note, since large language models are constantly retrained on newer data, is that the model we used was trained before October 7th, so it had not been exposed to any coverage of the events.

So there was no way for the output to be influenced by reporting or online articles relating to the conflict after October 7th.

I will just say that everything we described up to now, before the headlines, establishes a baseline. The baseline is the 1:1.5 ratio within the main text of the stories. It carries no claim; we don't say it means bias. But when we analyzed the headlines, we saw the ratio double toward the Palestinian side. Here we start seeing something that suggests there is something fundamentally wrong going on. It may require more scrutiny, but if the headlines consistently misrepresent the stories in one direction, if the mistakes are not equally distributed, there is a problem.

You found that when you ran it on the articles there was a 1:1.5 ratio, and when you ran it on the headlines for the same articles it was 1:3, so double the discrepancy. Which is to say, you're still not stating whether there is a bias, justified or not, because an article about Palestinians who have been bombed may well be justifiably sympathetic toward Palestinians. It's not the number itself that matters, but the comparison of the ratio for the headlines with the ratio for the articles.
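A short sketch of that comparison step might look as follows, reusing the hypothetical sympathy_ratio helper from the earlier sketch; the field names are assumptions.

```python
# Sketch of the headlines experiment: the same blind trial is run twice per
# item, once on the full article body and once on the headline alone, and
# only the aggregate ratios are compared. Field names are hypothetical.
def headline_vs_article(items: list[dict]) -> tuple[float, float, float]:
    article_ratio = sympathy_ratio([item["body"] for item in items])
    headline_ratio = sympathy_ratio([item["headline"] for item in items])
    # The finding described above: roughly 1.5 for article bodies versus
    # roughly 3 for headlines, i.e. the headline discrepancy is about
    # double the article baseline.
    return article_ratio, headline_ratio, headline_ratio / article_ratio
```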
Now, you did something else, which is that the BBC doesn't just do all of this in English; it has many different services. So at this point maybe you could tell us a little about why the BBC has foreign-language services like BBC Arabic, and then we can discuss the findings when you ran the same analysis on Arabic material.

Unlike most of the BBC, which is paid for by the British public through the licence fee and is meant to be apolitical and not controlled by government, the Foreign Office contributes substantially to the BBC's foreign-language departments; very recently it increased the amount paid to BBC Arabic by several million. The idea behind that is that the BBC, through its foreign-language programmes, is projecting soft power and explaining liberal principles to people in foreign languages. So in theory the Arabic service should be an expression of liberal ideology, as expressed by English culture, to the Arabic-speaking world. What we found when we did the same experiment with the headlines (I say we; it's the data scientists who conducted that experiment) was that the ratio of articles positive toward Israel, as opposed to those positive toward the Palestinians, doubled again: the main text doubled in its sympathy toward the Palestinians, and then the headlines doubled again, so that ultimately in the headlines you've got a ratio of, I think, 1:6. And there again, we're not telling the BBC what the ratio should be, what the correct level is, because that's not our editorial discretion; it's theirs. But we do say, because the BBC itself says it, that there should be consistency across the body of material they produce. What we find is that BBC Arabic is very, very substantially more sympathetic toward the Palestinians than BBC English, and that discrepancy is a very significant indication of bias, of a failure, a breach of their own obligation to be impartial.

What you've hit on there seems to me very important: you're not trying to dictate what the ratio should be, but you are saying, and this is something the BBC says itself, that they should be consistent. So if they have their own measure of what the ratio should be for impartiality, it should be reflected both in the texts and in the headlines in English, and that in turn should be reflected in other languages like Arabic. If the Arabic has a much bigger discrepancy, and the English already shows a discrepancy between texts and headlines, alarm bells start to ring. Could you explain why?

May I take just one minute to recap the main results so far, because I believe it's really important. We established a system that doesn't depend on the eye of the observer, one that allows a large language model to detect sympathy independently and sequentially across multiple items, doing big-data analysis. We started by scanning the BBC's articles, where we found a sympathy ratio of 1:1.5 favouring the Palestinians over Israel; we don't say there's anything wrong with that. Then we used the same system to look at the headlines, and there we found a sympathy ratio of 1:3. That's the first time we stop, look at the discrepancy, and say: maybe something here is off. Then we took the system and checked just one of the BBC's foreign-language services, Arabic. We translated the items and repeated exactly the same experiments, and, as Trevor said, everything doubles: for the main text we're now looking at a sympathy ratio of 1:3, and for the headlines at a sympathy ratio of 1:6, meaning one sympathetic headline toward Israel for every six sympathetic headlines toward Gaza. And again, the scientific team produced these results; we never shared them with the legal team before the experiments were done, and we were very keen to keep that professional separation.

An iron wall between them.

Absolutely, absolutely.
The strictest scientific standards, where everyone, including the model by the way, is kept blind to everything: to the assumptions, to the conditions, and so on. This was the first time we looked at the numbers and said: OK, we now have a way to quantify. And we believe in this quantification; we ran many, many controls. You mentioned changing Israel and Gaza, taking Israel and Gaza out: we had an experiment in which we replaced Israel with Oz and Gaza with Narnia, just to check that it's the language driving the result, and not the model being more favourable toward Jewish entities or Muslim entities or whatever. We had controls, as Oren described, of running the same thing over and over again to see that it's reproducible and that the result is stable. We tested this in many, many ways, and we came to be quite confident that we now have a method that allows us to give numbers, and to look at things the way they are, without being affected by our prejudices, by our beliefs, or by anyone else's.

I should say that while Haran wasn't telling me his results, I had, at the same time, a small army of lawyers reading all the same material and classifying it in a very similar way to the way we were asking ChatGPT, and setting out their reasons, quoting from the articles, to explain each categorization. I mention that because we gave the BBC the results of some of those so they could check them. What's remarkable is that I didn't know, until very near the end of the experiment when I could count up the results, what they were finding; and there was an amazing similarity between my group of lawyers' findings and the findings of Haran, Oren and their team, in terms of the precise numbers and percentages, both for the English and for the Arabic.

I want to give an analogy that may make it even clearer. There is the expression "you can't see the forest for the trees". The method we established here lets you see the forest by way of the trees: even if you misidentify some of the trees, you can still quite accurately discover the ratio between the different kinds of trees in the forest. That's one thing: even if you get one tree wrong, the general mixture is right. And once you have that, you can also compare different forests. If, for some reason, the forests are all supposed to contain the same mixture of trees, whatever trees they are, you would expect all these forests owned by the BBC to have the same ratio. What we see here, comparing stories to headlines and English reporting to Arabic reporting at the BBC, is that the forests are very, very different. So even if the AI model gets one headline wrong, or a few headlines, or a story, it doesn't matter: we managed to establish a general view that captures the trend, what's going on here.

So effectively, what I understand that to mean is that you've developed a method newsrooms could use themselves, in order to establish not whether one specific article or broadcast is problematic (the point is not to single out one piece of work and say it's wrong) but to look at your broad overall output and say: we potentially have a problem, because there is a discrepancy in the ratios between headlines and texts, or between languages, from the same supposed news organization.
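The Oz/Narnia control mentioned above could look something like the following sketch; the substitution map is illustrative, and in a real run the questions themselves would presumably be rephrased to match the fictional names.

```python
# Sketch of the substitution control: replace the real entities with
# fictional ones before querying, to check that the label tracks the
# language of the text rather than any prior the model holds about the
# named parties. The mapping below is an illustrative assumption.
import re

SUBSTITUTIONS = {"Israel": "Oz", "Israeli": "Ozian",
                 "Gaza": "Narnia", "Gazan": "Narnian"}

def mask_entities(text: str) -> str:
    for real, fictional in SUBSTITUTIONS.items():
        text = re.sub(rf"\b{real}\b", fictional, text)
    return text

# Control run: if the ratio over masked texts tracks the unmasked ratio
# (with the questions asked about Oz and Narnia instead), the signal comes
# from how each side is written about, not from the names themselves.
```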
And when you talk about these different forests, I think that brings us nicely to the comparison you did between the BBC, both in English and in Arabic, and other news sources, which also provides a way of seeing where there may be a problem even if you yourselves don't say what the ratio should be. Trevor, you've got something to say before we go on.

Yes, just to follow on from what Oren said. The BBC has what I think it laughably calls a complaints department; I say laughably because almost no complaints are ever accepted. But nevertheless they have these complaints departments, and the standard response they give is: we're disappointed that you didn't like this particular programme; it's impossible for us to give all points of view on a complex story in a single programme; but over time we ensure that we are balanced. What our experiment proved is that that is quite simply a lie. It's just not true, because we showed that over a period of four months, which is nine million words, they were not balanced, and they were consistently not balanced. So that gives the lie to the BBC's response, and it's the important point about why we conducted the experiment over a fairly significant period of time: a third of a year.

So rather than looking at individual pieces, you took them at their word and said: OK, let's measure it by your own standard and look at the broad forest.

Yes, and we never complained that a particular programme shouldn't have been broadcast. That's the story they chose to cover; we're not complaining about that. We're simply saying that balance should be consistent across their products.

The forests should all be the same. And I really like this forest analogy, I have to say. When it comes to articles, to the main texts, you can often find sympathy toward both sides. A war is horrible, and especially in the four months following October 7th there was great suffering on both sides; Israelis and Palestinians all deserved sympathy, I think, and in many of the articles published by the BBC that was the case: in 16% of the items we found sympathy toward both sides in the main-text analysis. When we looked at the headlines, however, that wasn't the case: fewer than 1% of the headlines conveyed sympathy toward both sides. That was a sad thing to me, because if you want to promote a real solution as a responsible and influential news outlet like the BBC, you should depict the complexity of the conflict (and there is literature on this, by the way; it's not my invention). That's the way to promote a real solution: you explain and emphasize everyone's point of view, so that the public is better informed and better understands the conflict. In the headlines, what we saw instead was an us-and-them mentality: either sympathy for the Israeli hostages or sympathy for those suffering in Gaza. Only in very rare cases, a minute of silence at Wembley or something like that, could you see a headline conveying sympathy to the victims of both Israel and Gaza.

Having demonstrated through this labelling system that there were problems, that the headlines appeared to lean more to one side than the articles, and the Arabic considerably further, we then asked ourselves: well, maybe the BBC has the analysis right. Maybe BBC Arabic is in the right place, and the English output is very sympathetic to Israel. We don't know.

Not very sympathetic; relatively sympathetic.

Yes, relatively sympathetic. So let's see where BBC English and BBC Arabic lie in the constellation of media output across the world.
And devising an experiment that could place the BBC and BBC Arabic on a graph showing media outlets across the world was a very important experiment, which the data scientists conducted, and I think it's worth explaining it and the results.

Right, and this is where it gets really interesting, I think, because once you've devised the system, and it relies in large part on these AI models, you can actually reproduce it elsewhere. You did a study of around 1,400 different news outlets from around the world, including some that were Israeli, some from Arab countries (so, notionally, more pro-Jewish and Israeli or more pro-Arab and Muslim), and some in between, Western or from elsewhere, with no obvious affiliation either way. Talk me through this astounding chart that shows the results.

So we now had an established system, and we had the AI model working with us, so to speak. It never needs to sleep or rest, it can read anything, you can keep it blind to whatever elements you like, and it will never try to guess what you're trying to do. And again, we're repeating the same experiment over and over. The idea is this. When we described the sympathy ratio of six for BBC Arabic, or of three for BBC English, it's a simple calculation, as I said before: you have this number of items sympathetic toward one side, that number sympathetic toward the other side, and the ratio between them gives you one number. So why not repeat exactly the same experiment, as you said, with almost 1,500 news outlets worldwide and more than 600,000 items, where we can feed the model without it knowing whether a given headline comes from CNN, the BBC, The Guardian, Fox News, some Israeli outlet or Al Jazeera? That's what we did. In this experiment the model gets a thousand or so items from every news outlet, hundreds of items at least, and for every outlet you get one number reflecting the sympathy ratio measured for that outlet. I remind you again, for the last time, that the model does not know, first, what you're trying to check or prove about the BBC, and second, which news outlet published a given headline. And it's important to note, as Oren said before, that we were in a very unusual position, because the model we used back then, GPT-4, was trained on data from before October 7th, 2023. The model was completely oblivious to anything that happened afterwards, and there was no way for it to know that a specific headline came from this or that outlet. That's why it's so important; that's why I can say that each one is an independent sample.

And on that point: its understanding of language, and its use of language, has absolutely no input from a world in which October 7th, 2023 had happened.

Exactly, correct. The model can judge one headline after another without knowing which outlet it's taken from, and the results are aggregated to give one number for every outlet. So first we created a scale, an unprecedented scale I think, the first time a scale like this has been created for a specific event, and we were able to position each news outlet on that scale and show, I wouldn't say its bias, but how balanced it is toward each of the sides in this specific war.
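A sketch of that per-outlet aggregation step, reusing the hypothetical label_item helper from the first sketch; the column names and data layout here are assumptions.

```python
# Sketch of the cross-outlet scale: every headline is labelled blind (the
# model never sees the outlet name), then the labels are aggregated per
# outlet into one sympathy ratio each. Column names are hypothetical.
import pandas as pd

def outlet_scale(rows: list[dict]) -> pd.Series:
    """rows: [{'outlet': 'BBC', 'headline': '...'}, ...], ~600,000 items."""
    df = pd.DataFrame(rows)
    labels = df["headline"].apply(label_item)   # blind trial per headline
    df["to_israel"], df["to_gaza"] = zip(*labels)
    per_outlet = df.groupby("outlet")[["to_israel", "to_gaza"]].sum()
    # One number per outlet; sorting yields the 'constellation' discussed.
    return (per_outlet["to_gaza"] / per_outlet["to_israel"]).sort_values()
```

The design point is that the outlet name enters only at the aggregation stage, after all labelling is finished, so no single call can be influenced by which outlet published the headline.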
The composition of the trees in each forest: the BBC forest, The Guardian forest, the Le Monde forest, the Al Jazeera forest.

Exactly: we were able to describe the composition of different forests worldwide. And we were highly impressed, I'd say, with the ability of this methodology to distinguish between different outlets, different forests. On one side we saw a cluster of Jewish-affiliated news outlets, many of them Israeli but not only: for example thejc.com, the Jewish Chronicle, Israel Today, the Jewish Star.

Websites you would perhaps expect to have more sympathy toward Israel.

Maybe that was one of the less surprising results. I'm a subscriber to a few of them myself. Yes, they're the usual suspects: you would expect to find them sympathetic to Israel, and that's exactly what the experiment showed.

So on one side we have Israeli and Jewish and more right-wing or conservative news sources, and at the other end of the spectrum sources like Al Jazeera, and also Gulf News and The Guardian, sources you might expect to reflect the other side, and Al-Manar.

The Guardian was not as extreme, but yes, you see this cluster of green, Muslim-affiliated news outlets on the other side.

Absolutely. Now, BBC English was positioned toward that other side; I wouldn't say at the extreme, rather in the middle and leaning toward that side. Again, it's not for me to interpret the result; I'm just quantifying, measuring, and putting things on a scale. When it came to BBC Arabic, well, we talked about forests: it was deep in the woods, in the green part, surrounded by green forests.

Surrounded by evergreens. So in this graph you've shown all of the sources, some painted green and some blue: blue for the Jewish-affiliated outlets, green for the Muslim-affiliated outlets, and all the others, which don't necessarily have a religious affiliation, as national outlets. And I can see that BBC English is already quite far along this scale toward the Arab or Muslim side of things, but BBC Arabic is way over there, surrounded by all these other green sources, Al-Manar, Al Jazeera, all of these sites that you really would expect to have an Arab or Muslim perspective. But after what you said, Trevor, you wouldn't expect BBC Arabic to have a similar outlook, because if it does, what's the point of it? The point is that it should be giving a British version in Arabic, one that represents a British sensibility in news reporting.

Correct. I think there are two important lessons to learn from this particular experiment. The first is that almost all the newspapers one knows about appeared, relative to one another, where I would have expected them on the scale. The Jewish Chronicle and many of the Israeli newspapers, The Times of Israel, were well over to the pro-Israel side; the Iran Times and the Pakistan Times were well over to the pro-Palestinian side. All of the newspapers appeared where you would expect them on the scale, and to my mind that demonstrates that the experiment is at least consistent across all news outlets, because it places things where you expect them to be. The Guardian was not one of the pro-Israel newspapers, and it would have been very disappointed had it been proven to be one. So it validates the method.
It validates the method, basically. And then you find that there is almost no blue water between Al Jazeera and BBC Arabic. BBC Arabic is literally putting out Hamas propaganda material; what it was filming on October 7th is Hamas propaganda films. Instead of broadcasting soft power and liberal values, what has happened is that BBC Arabic has been captured by people who seem to share the same ideology as the people running Al Jazeera. And let's remember that Al Jazeera is a Qatari-based outlet, and Qatar, along with Iran, is funding Hamas; they are the main funders of Hamas. So this is something seriously problematic for the BBC, and for all the members of the public who fund the BBC: what is it doing in that particular position?

I'd like to put my finger on something really important that Oren mentioned at the beginning of our talk; maybe now we can explain it better. Let's say the model is biased. Let's say GPT-4 is not a perfect measurement tool, and it isn't; by the way, there is never a perfect measurement tool, and that's the tragedy of scientists. Even in that case, look at the experiment we designed here. The model is blind: the model doesn't know that this item comes from the BBC and that one doesn't, and we're doing things over and over and over again, in big numbers. One example I really like to give of something the model cannot know is the phrase "Hamas hostages". The model could mistakenly decide that "Hamas hostages" is a term that creates sympathy toward Hamas, because what does it mean to be a Hamas hostage? Maybe these are hostages who are Hamas people being held by Israel. The model can make this kind of error, and we can understand why. It can make errors toward one side or the other, but when it does something repeatedly, over and over again, these mistakes are essentially averaged out. The experiment allows us to assess this. And again, we're not saying whether a sympathy ratio of six, or of three, is bad; we're just putting things next to one another, and we let other people see the full picture and decide for themselves. The last thing I'd say in that regard is that if the model was biased when it came to headlines from the BBC, the same would be true for headlines from CNN or Fox News or whichever outlet you name.

So the system is internally consistent, is what you're saying. It would have to be complete junk for the comparisons not to be valid, and it obviously isn't complete junk, given the extreme and very careful measures you've explained over the course of the last hour or so to make this as scientifically valid as possible.
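The averaging-out argument can be illustrated with a toy simulation; the tendencies and error rate below are invented numbers, purely for illustration.

```python
# Toy simulation of the point above: an imperfect labeller that makes
# blind, symmetric mistakes perturbs each outlet's measured ratio but
# preserves the ordering between outlets, because the errors average out
# over many items. All numbers are invented for illustration.
import random

def noisy_ratio(p_gaza: float, p_israel: float,
                n_items: int = 1500, error: float = 0.1) -> float:
    """Flip each true label with probability `error`, then take the ratio."""
    gaza = israel = 0
    for _ in range(n_items):
        g = random.random() < p_gaza        # true label: sympathy to Gaza
        i = random.random() < p_israel      # true label: sympathy to Israel
        if random.random() < error:
            g = not g                       # blind, symmetric mistake
        if random.random() < error:
            i = not i
        gaza += g
        israel += i
    return gaza / israel

random.seed(0)
print(noisy_ratio(0.30, 0.20))  # true ratio 1.5: the noisy estimate drifts but stays moderate
print(noisy_ratio(0.60, 0.10))  # true ratio 6.0: still measured far higher than the first
```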
Now, at this stage it's important that I mention what happened when you submitted this very thorough report, both legal and scientific, to the BBC. They said that they have read the Asserson Report in its entirety and have responded to its authors in detail: "Having carefully examined the report, we do not think that its methodology leads to reliable conclusions. We do not accept that impartiality can be assessed using sympathy, nor by quantifying daily coverage of events or counting words. We believe the use of AI to measure impartiality in this way is unreliable and unproven. The methods used in the report fail to take account of basic journalistic principles and practice, and often rely on selective interpretations and incomplete evidence." And in conclusion: "We do not see any new evidence to suggest we have breached our obligations for due impartiality and accuracy during our coverage of this highly complex, challenging and polarizing conflict." So that is their very disappointing reaction. I know, Trevor, that you have actually appealed. What I want to ask you is for a brief summary of why you think this was a very unfair response, and then, from all three of you: what is the next step here? What are your plans, not just with the BBC but with this very fascinating and outstanding technique you've devised, which could be used in a number of ways? Trevor first: what do you think of their response? And I know you know something very interesting about their methods for measuring their own output.

Well, the first thing is that I don't think it was an unfair response; I think it was a fatuous response. It's an indication that they hadn't actually read the report, and that if they had read it, they hadn't understood it, because they completely failed to respond to most of it. To give you some examples: we ran many experiments that haven't been discussed here. We looked at the BBC's obligation to ensure that no major area is left under-reported, and there were many stories they never mentioned: that Hamas has a charter; that it's a dictatorship. They promised to describe it as a terrorist organization, yet in fact they more frequently refer to it as a health ministry. They didn't mention at all the socioeconomic effects of the war on Israel, the displacement of Israeli citizens, the connections of many of the reporters to Hamas. There are so many areas we looked at: out of 27 areas, I think the BBC's response dealt with only about seven, which means that 20 of the points we made, each one very carefully tested and very carefully substantiated, were simply ignored. Which means the BBC has no answer to those questions. The response on the AI was really incoherent, because they never looked at the methodology: they said they had questions about it, but they never asked those questions, and they say that AI cannot be used to detect bias. Well, we didn't ask it to detect bias; we asked it to label things, which it certainly can do. They referred to only one published article to support their rejection of the AI, and that was an article written about ChatGPT 3.5, not GPT-4, which is a bit like comparing a bicycle with a Formula 1 racing car. So it was, again, an incoherent response, and essentially they are ducking the issues: failing to deal with what we put before them, and trying to dismiss the report because of the AI, which, as I said, is itself unsubstantiated. It also fails to deal with the fact that more than half of the report used traditional forensic methods, which they should have examined; most of those arguments they didn't look at or comment on at all. I'm afraid the BBC is running scared here. They know they've been caught. Within two months there were 400 million social media searches for the report; it was referred to in Parliament, in the House of Lords and in the House of Commons, in debate; and it has been generally presented in many, many forums as proving that the BBC is failing, is in breach of its obligations of impartiality, and is biased.
I want to add something, maybe on a personal note, about the BBC's response: I find it a bit insulting, because as scientists we seek criticism. We thrive on pushback; that's how science works, and it's what we expect in order to refine our methods and get to the bottom of things. This response dismisses it all in a very shallow way, which is just sad.

I agree, it's disappointing. And as a scientist, there's something philosophical here, something about the essence of science. I spoke about never being able to find the perfect measurement tool. We always have a concept we try to measure, how the brain operates, how something happens in nature; we are never able to reach the concept itself, so we look for some approximation of that abstract thing, and we say: if we can measure this, then we can say something about that. And that's what we did here. We thought really hard about how it would be reasonable to measure bias, and it was a process of long discussions with the legal team that made us understand that sympathy, and the way sympathy is framed (and this is backed by the literature), really is the bottom line. You can say many things about side A and side B, but at the end of the day the question is whether a story shifted you this way or that way, and that happens through sympathy. So to answer "our guidelines are not about sympathy; sympathy has nothing to do with our guidelines" is like saying: I'm talking about height and you're talking about centimetres; what do centimetres have to do with height? We worked very hard to establish a system in which sympathy would be the measurement that rigorously, robustly and objectively describes something as abstract as a news outlet's stance, and an answer that asks what sympathy has to do with their guidelines really shows a lack of understanding of scientific conduct: of the way you design an experiment, the way you operationalize a concept, and the way you measure things in the quantitative sciences. That's what is most disappointing to me. And it's a bit surprising, because the BBC is a huge and very serious news organization; they have, or are supposed to have, teams of data journalists and a data team that analyzes these things, and they should be able to understand what we did.

Can I make a final point? You said earlier that I had something interesting to say about the methods the BBC itself employs. I investigated this, talking to people at director level, at very senior journalist level, and all the way down through the ranks to more junior journalists: many people formerly and presently at the BBC, people with decades of experience. The BBC has zero method for actually monitoring what they do. We monitored what they do, counted it, and were able to show them. The BBC has no methodology at all, and that's the reason they're so off the rails, the reason they've strayed so far from their mission statement: they actually don't measure at all. That is a fundamental problem within the BBC, and it's what our report proves, I think beyond any reasonable doubt: that the BBC is in breach of its obligations.
So I thank you all, and the many dozens of people who also helped you with this research and this report, and I urge anyone watching this to go to asserson.co/report, where you can see the report itself and make your own decisions about what you think. I'm really grateful for the work you've done, not just on the BBC, because as I see it your methodology could be rolled out and used in newsrooms across the world, for journalists and editors to make sure they are doing a better job. I think your final point there, Trevor, is a very important one: this is something that those of us working in news need to be measuring as well, in order to improve our own work. And so, for your contribution to that, I thank you all very much for this conversation.

Thank you.

Thank you.