Public Lecture: AI Safety, Watermarking, and Neurocryptography by Scott Aaronson

Jul 3, 2024

Introduction

  • Speaker: Scott Aaronson
  • Host: Andre Asashi, Chair of the Department of Mathematics
  • Event: First public lecture with an invited speaker since the COVID-19 pandemic
  • Topic: Neurocryptography, the interface between AI and cryptography

About Scott Aaronson

  • Theoretical computer scientist specializing in quantum computing
  • PhD from UC Berkeley (2004)
  • Former Assistant/Associate Professor at MIT EECS
  • Currently a Professor at UT Austin
  • Working on theoretical foundations of AI safety at OpenAI
  • Awards: NSF Waterman Award, ACM Prize in Computing, Simons Investigator

Overview of Talk

  • Shifted research focus from quantum computing to AI safety
  • Collaboration with OpenAI on AI safety
  • Unexpected rapid advancement in AI capabilities (e.g., ChatGPT)
  • Rising interest in AI alignment and safety
  • Five future scenarios for AI's impact: AI Fizzle, Futurama, AI Dystopia, Singularity, AI Apocalypse
  • Near-term AI safety focuses on practical problems like watermarking and cryptographic backdoors

AI Safety and Alignment

  • AI alignment: Ensuring AI actions align with human values and interests
  • AI ethics vs AI alignment: Different focal points but ultimately related
  • Reform AI alignment: Addressing practical, near-term AI safety issues

Neurocryptography

  • Definition: Embedding cryptographic functionality in, or around, neural networks
  • Applications:
    • Watermarking: Recognizing AI-generated content
    • Cryptographic Backdoors: Secret commands for controlling AI
    • Preserving privacy and protecting copyright
    • AI's capability to break CAPTCHAs

Watermarking in AI

  • Importance: Recognize AI-generated text to prevent misuse (e.g., academic cheating, misinformation)
  • Techniques:
    • Imperceptible to ordinary readers, but statistically detectable by anyone who holds the secret key
    • Inserted at the sampling step, by steering token choices with a secret pseudorandom function (a minimal sketch follows this list)
    • Effective without degrading text quality
  • Problems: Vulnerable to attacks such as translating the text or inserting dummy words
  • Future approaches: Watermarking at the semantic level, similar to tree-ring watermarking in images
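
As a concrete illustration of the sampling-step idea, here is a minimal Python sketch in the spirit of the scheme described in the talk (sometimes called the Gumbel trick): a secret-keyed pseudorandom function scores each candidate token from the recent context, and the generator picks the token maximizing r_i^(1/p_i). The helper names, the HMAC-based PRF, and the context window are illustrative assumptions, not OpenAI's actual implementation.

```python
import hmac, hashlib

def prf_score(key: bytes, context: tuple, token_id: int) -> float:
    """Secret-keyed pseudorandom score in (0, 1) for one candidate token,
    derived from the recent context window (hypothetical construction)."""
    msg = repr((context, token_id)).encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 2)

def watermarked_sample(key: bytes, context: tuple, probs: dict) -> int:
    """Pick the token maximizing r_i ** (1 / p_i) (the Gumbel trick).
    Averaged over random keys, each token is still chosen with
    probability p_i, so text quality is preserved in expectation, yet
    the key holder can later run a statistical test for the watermark."""
    return max(
        (t for t, p in probs.items() if p > 0),
        key=lambda t: prf_score(key, context, t) ** (1.0 / probs[t]),
    )

# Usage (illustrative): pick the next token from the model's distribution
# next_token = watermarked_sample(secret_key, tuple(prev_tokens[-4:]),
#                                 {token_id: probability, ...})
```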

Cryptographic Backdoors

  • Concept: Insert secret commands that trigger specific behaviors (e.g., an off-switch); see the toy sketch below
  • Challenges: Creating unremovable backdoors; ensuring safety if AI modifies itself
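
To make the concept concrete, here is a toy sketch in which the "backdoor" is an HMAC-authenticated trigger checked by a wrapper around the model: only the holder of the secret key can fire the off-switch, and forging the trigger would require breaking the MAC. The hard problem the talk raises, planting such a trigger unremovably inside the model's weights, remains open; everything below is an illustrative assumption.

```python
import hmac, hashlib

def is_authentic_shutdown(key: bytes, command: str, tag_hex: str) -> bool:
    """Accept the off-switch command only if tag_hex is a valid
    HMAC-SHA256 tag for it under the operator's secret key; without
    the key, an adversary cannot forge or discover the trigger."""
    expected = hmac.new(key, command.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag_hex)

# Usage (illustrative): a wrapper checks every request before the model
# if is_authentic_shutdown(OPERATOR_KEY, request.command, request.tag):
#     shut_down_model()
```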

Key Points and Discussions

  • AI's Capability Growth: Rapid improvements leading to ethical and safety challenges
  • Reliability of Watermarking: Effective given enough tokens; robust against simple attacks (detection sketch after this list)
  • Regulation and Coordination: Schemes like watermarking require coordination among AI companies, backed by regulation
  • Various AI Misuse Scenarios: From academic cheating to generating tailored phishing attacks
  • Potential Solutions: Combining watermarking with discriminator models; cryptographic databases of model outputs
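
A sketch of the matching statistical test for the watermarking scheme sketched earlier: for ordinary, unwatermarked text each term ln(1/(1-r)) behaves like an independent Exp(1) draw, so the total score concentrates around n ± √n, while watermarked text scores systematically higher. The function names and PRF are the same illustrative assumptions as before.

```python
import hmac, hashlib, math

def prf_score(key: bytes, context: tuple, token_id: int) -> float:
    # Same hypothetical PRF as in the sampling sketch above.
    msg = repr((context, token_id)).encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 2)

def detection_score(key: bytes, tokens: list, window: int = 4) -> float:
    """Sum of ln(1 / (1 - r_t)) over the observed tokens. Unwatermarked:
    each term is ~ Exp(1), so the sum is about n +/- sqrt(n). Watermarked:
    the sampler favored tokens with large r_t, so the sum runs higher."""
    total = 0.0
    for t in range(window, len(tokens)):
        r = prf_score(key, tuple(tokens[t - window:t]), tokens[t])
        total += math.log(1.0 / (1.0 - r))
    return total

def z_score(score: float, n_terms: int) -> float:
    """Standard deviations above the unwatermarked mean; large positive
    values are strong evidence of the watermark."""
    return (score - n_terms) / math.sqrt(n_terms)
```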

Conclusion

  • Neurocryptography's potential impact on AI safety
  • Combining modern cryptographic techniques with AI to solve emerging problems
  • Open questions and challenges in monitoring and controlling AI-generated content

Q&A Highlights

  • False Positives with Watermarking: Driving the false-positive rate down requires many tokens (back-of-envelope sketch after this list)
  • Verification and Regulation: Government regulation essential; how to implement it is still debated
  • Future AI Capabilities: Unpredictability and potential existential risks
  • Role of Discriminator Models: Combined with watermarking for flexibility and accuracy
  • Impacts on Education: AI's role in applications like college essays; ethical trade-offs
  • Continuing Development: Empirical studies and testing are needed to address evolving challenges
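
A back-of-envelope model of why many tokens are needed (the per-token bias figure is an assumption for illustration): the watermark shifts the detector's mean score by roughly a constant per token, while the noise grows only like √n, so a z-sigma detection needs bias·n ≥ z·√n.

```python
import math

def min_tokens(bias_per_token: float, target_z: float = 4.0) -> int:
    """Tokens needed for the watermark's mean shift (bias_per_token * n)
    to clear target_z standard deviations of noise (sqrt(n)):
    n >= (target_z / bias_per_token) ** 2."""
    return math.ceil((target_z / bias_per_token) ** 2)

# E.g. an assumed bias of 0.3 per token gives min_tokens(0.3) == 178
# tokens for a 4-sigma call, i.e. a false-positive rate near 3e-5.
```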