Transcript for:
Understanding AI Slop

In today's ever-evolving digital age, it is crucial to recognize that clear prose is not only important but also a powerful tool that helps us to delve deeper into this ever-shifting landscape. My goodness, that nonsense sentence was an example of low-quality AI-generated content, colloquially known as AI slop. And you don't need me to tell you that it's everywhere: in homework assignments and emails, in white papers, and even sometimes in comments to YouTube videos, so I hear. The word "delve", for example, shows up in papers published in 2024 twenty-five times more often than in papers published a couple of years earlier. "Delve" is an AI slop word. AI slop is text produced by large language models that is formulaic, generic, error-prone, and offers very little value.

So let's, uh, delve into some characteristics of AI slop so we can be sure to recognize it. Let's look at why AI slop happens, and let's discuss some strategies to reduce it. We can break AI slop down into two categories: phrasing and content.

Let's start with phrasing. AI-generated text often exhibits distinctive stylistic quirks that make its output, well, a bit of a slog to read through. For example, there is inflated phrasing like "it is important to note that"; that comes up a lot, and it's needlessly verbose. This phrasing can be ponderous and self-important: "In the realm of X, it is crucial to Y." AI slop often adopts formulaic constructs as well. "Not only but also" is one of my least favorites. So not only are formulaic constructs annoying, but also they are unnecessarily wordy. You'll also find over-the-top adjectives that don't add substance, including phrases like "ever-evolving" and "game-changing". Those leave us with the impression that AI slop is rather desperately trying to sell us something. And then there's the good old em dash, used to tack on clauses or extend sentences. Honestly, I'm not even sure how to generate an em dash on my keyboard, but they are everywhere in AI slop. A little tip for detecting these AI-generated em dashes: typically, they don't leave a space between the words they connect, so the dash sits directly between the two words. But most often, humans do put a space on either side. That's worth knowing if you're trying to detect whether something is AI-generated, and I'll sketch that heuristic in code in just a moment.

Now these phrasing tics can be pretty annoying, but content problems are another characteristic of AI slop. First, there is verbosity. LLMs tend to be quite verbose by default, writing maybe three sentences when one would do. An LLM response to a user question might run to several paragraphs in length but not contain much in the way of useful information, a bit like a human student trying to meet a minimum word count for a homework assignment. That was... that was me back in the day. Sorry, Mr. Painter. 800 words on Hadrian's Wall was a lot. Another hallmark of AI slop is false information, which states fabrications as if they were true. We all know that LLMs can hallucinate, that is, generate plausible-sounding text that is factually incorrect. There are ways to minimize that, which we'll get to, but if none of those steps are taken, there's a good chance you're outputting AI slop. And look, AI slop can be produced at scale. AI content farms can churn out SEO-friendly articles that are packed with keywords but low on accuracy and originality. Before you know it, we're swimming in a sea of slop.
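Here's that em dash spacing heuristic as a rough sketch in Python. To be clear, this is a weak signal for illustration only, not a reliable detector; plenty of human style guides also use unspaced em dashes.

```python
# A rough sketch of the spacing tip from the video: count em dashes with no
# surrounding spaces ("word—word") versus spaced ones ("word — word").
# A weak signal for illustration only, not a reliable detector.

import re

def em_dash_stats(text: str) -> dict:
    unspaced = len(re.findall(r"\S—\S", text))  # dash jammed between words, common in LLM output
    spaced = len(re.findall(r"\s—\s", text))    # dash with breathing room, more typical of humans
    return {"unspaced": unspaced, "spaced": spaced}

print(em_dash_stats("It is important to note—crucially—that synergy is key."))
# {'unspaced': 2, 'spaced': 0}
```

So that's one tell among many. But why does AI slop happen in the first place?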
Well, let's consider how the models function. LLMs are built on transformer neural networks that are trained to do one thing, and that one thing is to predict the next word, or the next token, in a sequence: token-by-token generation. In essence, an LLM is output-driven rather than goal-driven. It keeps writing until some stop condition is reached, always choosing a likely next word based on statistical patterns learned from its training data, and that can lead to some overly generic, low-quality responses.

Training data bias also plays a role. LLMs are trained on vast corpora of human-written text, and they inherently reflect the distributions of language in that data. That means if certain phrases or styles were overrepresented in the training data set, well, the model will tend to reproduce them.

There's also reward optimization that can lead to low-quality outputs. LLMs typically go through some amount of fine-tuning, and that often includes RLHF, reinforcement learning from human feedback. That's designed to help the model produce more helpful answers. During RLHF, the model is trained to maximize a reward based on how humans rate its outputs, and if those humans rate certain types of answers higher than others, for example answers that sound very organized and thorough and polite, well, the model will adapt to match those preferences. This can lead to a form of model collapse, which, as its name suggests, is not good. Does this look scary? It's supposed to look scary. Model collapse. We don't want that. That's where the model's outputs become overly similar to one another. They all start to conform to a narrow style that was perceived as high-scoring during training, the result being that every LLM output starts to look a bit alike.

So what can we do about it? Let's look at strategies to reduce AI slop from two perspectives: users of AI models and developers of AI models.

For users, some basic prompting strategies can lead to higher-quality outputs, and you've probably heard some of these before. One strategy is to be specific. A well-crafted prompt can significantly reduce generic AI output, so tell the model about the tone of voice you're looking for, or who the audience is. Something else I like to do is provide examples. Give the AI model a sample of the style or the format you're looking for. LLMs are master pattern matchers, so anchoring a prompt with the style you want reduces the chances it defaults to a generic tone. And also make sure to iterate. Don't just blindly accept the first draft of AI output. One big advantage of LLMs is that you can converse with them; you can say exactly how an output should be improved. Where an output may have started out as AI slop, with a bit of back and forth between a user and an LLM, that text can turn into higher-quality, slop-free content. I'll show a quick sketch of what a more specific prompt can look like in just a moment.

Now, on the developer side, one of the things you should consider is refining your training data curation. The old computer science adage of garbage in, garbage out applies very strongly to LLMs. If the training set includes a lot of low-quality web text, the model will inevitably learn those patterns. So filter out all the bland SEO spam and sources with poor writing before using those sources to train or fine-tune models.
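Here's that prompting sketch I promised. The `generate` function is a hypothetical stand-in for whatever LLM client you happen to use; the point is the shape of the prompt, which names the audience and tone and is anchored with an example.

```python
# A sketch of the "be specific, give examples" advice. The generate() call is
# a hypothetical placeholder, not a real API; what matters is the prompt.

vague_prompt = "Write about our new backup scheduler."

specific_prompt = """You are writing for busy sysadmins reading an internal
changelog. Tone: plain and direct, no marketing adjectives.

Here is an example of the style I want:
"v2.1 ships faster log rotation. Upgrade if disk churn has been a problem."

Now write three sentences announcing our new backup scheduler in that style."""

def generate(prompt: str) -> str:
    raise NotImplementedError("stand-in for your actual LLM client")

# draft = generate(specific_prompt)
# Then iterate on the draft with concrete requests, for example:
# "Cut the second sentence. Replace 'game-changing' with a measured claim."
```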
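And on the developer side, here's a toy sketch of that garbage-in, garbage-out filtering idea. The slop markers and thresholds below are purely illustrative guesses, not anyone's production pipeline.

```python
# A toy heuristic filter for training data curation: drop documents that are
# marker-heavy or highly repetitive. Markers and thresholds are illustrative.

SLOP_MARKERS = [
    "in today's ever-evolving", "it is important to note that",
    "game-changing", "in the realm of",
]

def looks_like_slop(doc: str) -> bool:
    lowered = doc.lower()
    marker_hits = sum(marker in lowered for marker in SLOP_MARKERS)
    words = lowered.split()
    unique_ratio = len(set(words)) / max(len(words), 1)  # low ratio = repetitive text
    return marker_hits >= 2 or unique_ratio < 0.3

corpus = [
    "In today's ever-evolving landscape, it is important to note that synergy is game-changing.",
    "Hadrian's Wall was begun in AD 122 and ran about 73 miles across northern Britain.",
]
clean = [doc for doc in corpus if not looks_like_slop(doc)]
print(clean)  # only the second document survives the filter
```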
The second thing to consider on the developer side is reward model optimization. That's about tweaking that RLHF process I mentioned earlier with more nuanced feedback signals. For example, multiobjective RLHF is where you optimize for, let's say, helpfulness and correctness and brevity and maybe novelty as well, all as separate axes. I'll leave a quick sketch of that idea at the very end for anyone curious. And then, to overcome AI slop filled with hallucinations, be sure to integrate retrieval systems that allow the model to look up real documents when answering, using techniques such as RAG, retrieval-augmented generation.

LLMs have brought some incredible capabilities to content creation, but they can also produce formulaic, generic content filled with inflated language and outright incorrect information. A wave of AI slop may indeed be washing over the web, but it's not hopeless: by recognizing the typical signs of low-quality AI-generated text and understanding why they occur, we can counteract slop through prompt engineering, through editing, and through developing smarter models.

Oh, and I would love to hear your tales of AI slop. Let me know your favorite examples in the comments. I look forward to delving into them.
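As promised, here's a rough sketch of the multiobjective reward idea, assuming you already have one scorer per axis. Every scorer and weight below is a made-up stand-in; in real RLHF these would be learned reward models, not hand-written rules.

```python
# A toy sketch of multiobjective reward shaping: score helpfulness,
# correctness, brevity, and novelty as separate axes, then combine them with
# explicit weights. All scorers here are illustrative stand-ins.

from typing import Callable, Dict

def combined_reward(
    response: str,
    scorers: Dict[str, Callable[[str], float]],  # axis name -> score in [0, 1]
    weights: Dict[str, float],                   # axis name -> relative weight
) -> float:
    """Weighted sum over independent reward axes."""
    return sum(weights[axis] * scorer(response) for axis, scorer in scorers.items())

# Made-up stand-in scorers, just to make the sketch runnable:
scorers = {
    "helpfulness": lambda r: 0.9 if "because" in r else 0.5,
    "correctness": lambda r: 1.0,  # imagine a fact-checking model here
    "brevity":     lambda r: max(0.0, 1.0 - len(r.split()) / 200),
    "novelty":     lambda r: 0.0 if "delve" in r.lower() else 1.0,  # penalize slop words
}
weights = {"helpfulness": 0.4, "correctness": 0.4, "brevity": 0.1, "novelty": 0.1}

print(round(combined_reward("It works because the cache is warm.", scorers, weights), 3))
```

The point of keeping the axes separate is that brevity and novelty can push back against the "sounds organized, thorough, and polite" style that a single blended reward tends to collapse onto.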