Visual Representation of Wikipedia
Overview
- Visualization: Depicts 6.3 million English Wikipedia articles and nearly 200 million links between them.
- Graph Components: Each circle represents an article; links show the network between articles.
- Development: Months of work, thousands of lines of code, significant computation time.
Community Detection and Colors
- Community Colors: Different colors represent different communities of articles, algorithmically determined.
- Total Communities: 44 communities detected.
- Theory Tested: Articles within the same community have more similar content.
- Common Categories Found: Analyzed top categories within communities to verify similarity.
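The category check described above can be sketched as a simple tally. This is a minimal illustration, not the method used in the video: the function name `top_categories`, the two input mappings, and the toy data are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def top_categories(community_of, categories_of, k=3):
    """Return the k most common categories in each community.

    community_of:  article title -> community id (hypothetical input)
    categories_of: article title -> list of category names (hypothetical input)
    """
    counts = defaultdict(Counter)
    for article, community in community_of.items():
        counts[community].update(categories_of.get(article, []))
    return {
        community: [name for name, _ in counter.most_common(k)]
        for community, counter in counts.items()
    }

# Toy data, invented for illustration only.
community_of = {"A": 3, "B": 3, "C": 5}
categories_of = {
    "A": ["Presidents", "Lawyers"],
    "B": ["Presidents"],
    "C": ["Musicians"],
}
result = top_categories(community_of, categories_of, k=1)
```

If the dominant categories in a community share a theme, that supports the idea that the community groups similar content.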
Examples of Communities
- Community #3: 760,000 articles, mostly related to politics and law (e.g., US presidents).
- Community #5: Focused on music (e.g., popular musicians).
- Community #10: Video games.
- Other Examples:
  - #11: Space objects
  - #19: Religion politicians
  - #6: English and American movies and TV; Indian and Korean cinema are notably separate from Western cinema.
  - #14: Canadian people and hockey.
- Unexpected Findings: Sports articles were more separate than expected.
Size of Circles
- Circle Size: Proportional to the number of incoming links.
- Examples:
  - Basketball: 44,000 links.
  - COVID-19: 46,000 links.
  - World War I: 100,000 links.
  - World War II: 189,000 links.
  - United States: ~280,000 links.
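As a rough sketch of how circle sizes could be derived from incoming links: the notes do not say whether the radius or the area scales with the link count, so the example below assumes area is proportional (radius grows with the square root). The names `in_degrees`, `circle_radius`, and `max_radius` are hypothetical, and the edge list is toy data.

```python
import math
from collections import Counter

def in_degrees(edges):
    """Count incoming links per article from (source, target) pairs."""
    return Counter(target for _source, target in edges)

def circle_radius(in_degree, max_in_degree, max_radius=50.0):
    """Assumed scaling: circle area proportional to in-degree."""
    if max_in_degree == 0:
        return 0.0
    return max_radius * math.sqrt(in_degree / max_in_degree)

# Toy edge list, invented for illustration only.
edges = [("A", "US"), ("B", "US"), ("C", "US"), ("D", "US"), ("A", "B")]
deg = in_degrees(edges)
biggest = max(deg.values())
radii = {article: circle_radius(d, biggest) for article, d in deg.items()}
```

Here the article with four incoming links gets the maximum radius, and the article with one incoming link gets half of it, because its area is a quarter as large.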
Wikipedia Race/Game
- Game Description: Navigate from one Wikipedia page to another only by clicking links within articles.
- Example Path: Pokémon to ancient Egypt in 2 clicks.
- Importance of Links: Links in reference and "See also" sections were ignored, so that the graph matches the rules of the game.
- Path Existence: Not always possible due to orphaned and dead-end articles.
  - Orphans: 5% (350,000) of articles have no incoming links.
  - Dead-ends: 6,000 articles with no outgoing links.
  - Dead-end Orphans: ~2,000 articles.
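The orphan and dead-end definitions above translate directly into set operations on an edge list. A minimal sketch, assuming links are stored as (source, target) pairs; the function name `classify` and the toy data are made up for the example.

```python
def classify(articles, edges):
    """Split articles into orphans, dead-ends, and dead-end orphans.

    articles: iterable of article titles
    edges:    iterable of (source, target) link pairs
    """
    has_incoming = {target for _source, target in edges}
    has_outgoing = {source for source, _target in edges}
    orphans = set(articles) - has_incoming    # nothing links to them
    dead_ends = set(articles) - has_outgoing  # they link to nothing
    return orphans, dead_ends, orphans & dead_ends

# Toy data, invented for illustration only.
articles = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("B", "A")]
orphans, dead_ends, both = classify(articles, edges)
```

In this toy graph, "D" can neither be reached nor left, so no game path can start or end there (other than the trivial one).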
Concept of Separation
- Degrees of Separation: Tested path lengths to see how many degrees it took to reach other articles.
- Example: From Pluto to various other articles:
  - 1st degree: 255 articles
  - 2nd degree: 20,000 articles
  - 3rd degree: 618,000 articles
  - 4th degree: 3 million articles
- 6 Degrees of Separation: 90% of articles are reached by the 6th degree; at most 8 degrees are needed to reach 92% of articles.
- Average Path Length: 4.8 links, with 8% of paths not existing.
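Counting articles per degree, as in the Pluto example, is what a breadth-first search produces. The sketch below is an assumption about how such counts could be computed, not the code from the video; the adjacency-dict format, the function names, and the tiny graph are illustrative. It averages path lengths from a single start article, whereas the 4.8 figure above is an average over paths across the whole graph.

```python
from collections import deque

def degrees_of_separation(graph, start):
    """Breadth-first search: fewest clicks from start to each reachable article."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        article = queue.popleft()
        for neighbour in graph.get(article, []):
            if neighbour not in distance:
                distance[neighbour] = distance[article] + 1
                queue.append(neighbour)
    return distance

def count_by_degree(distance):
    """Number of articles first reached at each degree (start excluded)."""
    counts = {}
    for d in distance.values():
        if d > 0:
            counts[d] = counts.get(d, 0) + 1
    return counts

# Toy graph, invented for illustration only.
graph = {
    "Pluto": ["Sun", "Charon"],
    "Sun": ["Star"],
    "Charon": ["Pluto"],
    "Star": [],
    "Island": [],  # unreachable from Pluto: no path exists
}
dist = degrees_of_separation(graph, "Pluto")
counts = count_by_degree(dist)
reachable = len(dist) - 1
average = sum(dist.values()) / reachable
```

Articles missing from `dist` are the ones with no path from the start, which is how the share of non-existent paths would show up in a full run.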
Longest and Special Paths
- Max Paths: Some pairs of articles are separated by extremely long paths.
- Example Long Path: 166 links from athletics in the 1953 Arab games to a list of highways numbered 999.
- Unique Findings: Disguised dead-end orphans, such as Fanta cake, whose only links point to themselves.
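A self-link makes an article look like it has both an incoming and an outgoing link, so a naive check would miss it. One possible way to surface such cases, assuming links are stored as (source, target) pairs, is to look for articles whose only links are to themselves. The function name and the toy data are hypothetical.

```python
def disguised_dead_end_orphans(edges):
    """Articles that link to themselves and appear in no other link."""
    self_linked = {source for source, target in edges if source == target}
    linked_elsewhere = set()
    for source, target in edges:
        if source != target:
            # Both ends of a real link are connected to the rest of the graph.
            linked_elsewhere.update((source, target))
    return self_linked - linked_elsewhere

# Toy data, invented for illustration only.
edges = [("Fanta cake", "Fanta cake"), ("A", "B"), ("B", "B")]
found = disguised_dead_end_orphans(edges)
```

"B" also links to itself here, but it is reachable from "A", so only the isolated article is reported.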
Final Thoughts
- Dynamic Nature: Wikipedia is ever-changing; data may become outdated as edits are made.
- Audience Encouragement: Viewers are encouraged to take part in improving Wikipedia articles.
Acknowledgments: Thanks to sponsors on GitHub who allow video creation.
Call to Action: Encouraged audience to subscribe and like the video.