Visual Representation of Wikipedia

Jul 18, 2024

Overview

  • Visualization: Depicts 6.3 million English Wikipedia articles and nearly 200 million links between them.
  • Graph Components: Each circle represents an article; links show the network between articles.
  • Development: Months of work, thousands of lines of code, significant computation time.

Community Detection and Colors

  • Community Colors: Different colors represent different communities of articles, algorithmically determined.
  • Total Communities: 44 communities detected.
    • Theory Tested: Articles within the same community have more similar content.
    • Common Categories Found: Analyzed top categories within communities to verify similarity.
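
The category check described above could be sketched roughly as follows. This is a minimal illustration, not the method used in the video: the function name `top_categories`, the two input dictionaries, and the toy data are all assumptions made for the example.

```python
from collections import Counter

def top_categories(community_of, categories_of, k=3):
    """Return the k most common categories within each community.

    community_of:  hypothetical mapping of article -> community id
    categories_of: hypothetical mapping of article -> list of category names
    """
    counts = {}
    for article, community in community_of.items():
        counter = counts.setdefault(community, Counter())
        counter.update(categories_of.get(article, []))
    return {
        community: [name for name, _ in counter.most_common(k)]
        for community, counter in counts.items()
    }

# Toy data, invented purely for illustration.
community_of = {"A": 3, "B": 3, "C": 5}
categories_of = {
    "A": ["Politics", "Law"],
    "B": ["Politics"],
    "C": ["Music"],
}
result = top_categories(community_of, categories_of, k=1)
print(result)
```

If the most common categories in a community share a theme, that supports the idea that the community groups similar content.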

Examples of Communities

  • Community #3: 760,000 articles, mostly related to politics and law (e.g., US presidents).
  • Community #5: Focused on music (e.g., popular musicians).
  • Community #10: Video games.
  • Other Examples:
    • #11: Space objects
    • #19: Religion politicians
    • #6: English and American movies and TV; Indian and Korean cinema are notably separated from Western cinema.
    • #14: Canadian people and hockey.
  • Unexpected Findings: Sports articles were more separate than expected.

Size of Circles

  • Circle Size: Proportional to the number of incoming links.
  • Examples:
    • Basketball: 44,000 links.
    • COVID-19: 46,000 links.
    • World War I: 100,000 links.
    • World War II: 189,000 links.
    • United States: ~280,000 links.
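
As a rough sketch of how incoming links could be turned into circle sizes, assuming the links are available as (source, target) pairs. The notes do not say whether the radius or the area is proportional to the link count; this example assumes the radius, and the `scale` parameter and function names are hypothetical.

```python
from collections import Counter

def incoming_link_counts(links):
    """Count incoming links per article from (source, target) pairs."""
    return Counter(target for _, target in links)

def circle_radius(in_links, scale=1.0):
    """Radius directly proportional to incoming links (an assumed mapping)."""
    return scale * in_links

# Toy link list, invented for illustration.
links = [("A", "C"), ("B", "C"), ("A", "B")]
counts = incoming_link_counts(links)
radii = {article: circle_radius(n) for article, n in counts.items()}
print(radii)
```

Under this assumed mapping, an article with about 280,000 incoming links would be drawn with a radius about 6.4 times that of one with 44,000.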

Wikipedia Race/Game

  • Game Description: Navigate from one Wikipedia page to another only by clicking links within articles.
    • Example Path: Pokémon to ancient Egypt in 2 clicks.
  • Importance of Links: To simulate the game, links in the references and "See also" sections were ignored.
  • Path Existence: Not always possible due to orphaned and dead-end articles.
    • Orphans: 5% (350,000) of articles have no incoming links.
    • Dead-ends: 6,000 articles with no outgoing links.
    • Dead-end Orphans: ~2,000 articles.
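
The three groups above can be found directly from the link list. The following is a small sketch under the assumption that the graph is given as a list of article names plus (source, target) pairs; `classify_articles` is a name chosen for this example.

```python
def classify_articles(articles, links):
    """Split articles into orphans, dead-ends, and dead-end orphans.

    Orphans have no incoming links, dead-ends have no outgoing links,
    and dead-end orphans have neither.
    """
    has_incoming = set()
    has_outgoing = set()
    for source, target in links:
        has_outgoing.add(source)
        has_incoming.add(target)
    orphans = set(articles) - has_incoming
    dead_ends = set(articles) - has_outgoing
    return orphans, dead_ends, orphans & dead_ends

# Toy graph, invented for illustration: A -> B -> C, with D isolated.
articles = ["A", "B", "C", "D"]
links = [("A", "B"), ("B", "C")]
orphans, dead_ends, both = classify_articles(articles, links)
print(orphans, dead_ends, both)
```

An orphan can never be the destination of a race, and a dead-end can never be the start of one, which is why a path does not exist for every pair.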

Concept of Separation

  • Degrees of Separation: Tested path lengths to see how many degrees it took to reach other articles.
    • Example: From Pluto to various other articles:
      • 1st degree: 255 articles
      • 2nd degree: 20,000 articles
      • 3rd degree: 618,000 articles
      • 4th degree: 3 million articles
    • 6 Degrees of Separation: 90% of articles are reached by the 6th degree; at most 8 degrees are needed to reach 92% of articles.
  • Average Path Length: 4.8 links, with 8% of paths not existing.
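
Counting how many articles sit at each degree from a starting article is a standard breadth-first search. The sketch below assumes the graph is stored as a dictionary mapping each article to the list of articles it links to; the function name and toy graph are made up for the example.

```python
from collections import deque

def articles_per_degree(graph, start):
    """Return {degree: number of articles first reached at that degree}."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in graph.get(current, []):
            if neighbor not in distance:  # first visit gives the shortest distance
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    per_degree = {}
    for degree in distance.values():
        if degree > 0:
            per_degree[degree] = per_degree.get(degree, 0) + 1
    return per_degree

# Toy graph, invented for illustration. "F" links to "P" but nothing
# links to "F", so it is unreachable from "P".
graph = {
    "P": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "F": ["P"],
}
degrees = articles_per_degree(graph, "P")
print(degrees)
```

Articles that never appear in the result are the ones with no path from the start, which corresponds to the share of paths that do not exist.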

Longest and Special Paths

  • Max Paths: Some articles have extremely long paths.
    • Example Long Path: 166 links from athletics in the 1953 Arab Games to a list of highways numbered 999.
  • Unique Findings: Disguised dead-end orphans, such as Fanta cake, whose only links point to themselves.
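
One way to spot such disguised cases is to drop self-links before checking for incoming and outgoing links. This is an illustrative sketch only, assuming links are (source, target) pairs; the function name is hypothetical.

```python
def disguised_dead_end_orphans(articles, links):
    """Find articles whose only links are links to themselves."""
    self_linked = {s for s, t in links if s == t}
    real_links = [(s, t) for s, t in links if s != t]  # ignore self-links
    has_outgoing = {s for s, _ in real_links}
    has_incoming = {t for _, t in real_links}
    return {
        a for a in articles
        if a in self_linked and a not in has_incoming and a not in has_outgoing
    }

# Toy data, invented for illustration: "C" links only to itself.
articles = ["A", "B", "C"]
links = [("A", "B"), ("C", "C")]
disguised = disguised_dead_end_orphans(articles, links)
print(disguised)
```

Without the self-link filter, "C" would appear to have both an incoming and an outgoing link and would be missed.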

Final Thoughts

  • Dynamic Nature: Wikipedia is ever-changing; data may become outdated as edits are made.
  • Audience Encouragement: Viewers are encouraged to take part in improving Wikipedia articles.

Acknowledgments: Thanks to the GitHub sponsors whose support makes the videos possible.

Call to Action: Encouraged audience to subscribe and like the video.