By Jonathan Morgan and Isaac Johnson, Wikimedia Research
Some Wikipedia articles are born popular, some achieve popularity, and some have popularity thrust upon them. But it can be hard to anticipate when or why a particular Wikipedia article will become popular, especially when that popularity is driven by viral content posted on social media platforms like Facebook, Twitter, YouTube, and Reddit.
The pace and scale of social media mean that any internet content—from the perennially popular cute cat video genre to topical trends like the latest in DIY face mask couture—can go from obscurity to ubiquity in a matter of hours. Sudden reader traffic spikes due to viral content can have major consequences for the encyclopedia that anyone can edit. Traffic spikes that pull from multiple sources—search engines, Wikipedia’s internal search, and links from other Wikipedia articles—generally signal that an article’s newfound popularity is related to major global events.
A sudden increase in attention to a previously sleepy Wikipedia article coming from a single social media platform is harder to predict and interpret.
Most of the time, these social media traffic spikes are harmless. They can even be beneficial, as when they draw the attention of subject matter experts outside the community to previously unidentified errors or content gaps. But they can also lead to increases in vandalism of the target article, or to good-faith but nonetheless damaging edits by first-time editors who aren’t yet familiar with how Wikipedia works. A surprise traffic spike may even indicate a coordinated attempt by a group of social media users to surreptitiously insert disinformation into a Wikipedia article. Alternatively, it could indicate that a social media platform itself is linking to Wikipedia to debunk disinformation or fact-check controversial content that users are posting on its site.
In mid-March, the Wikimedia Research team launched a pilot project that gives editors the timely information they need to monitor these social media traffic spikes. The social media traffic report is an experimental initiative intended to help us understand whether editors with more visibility into reader behavior are better at maintaining article quality. Editors in the pilot project will help the Research Team identify examples of potential disinformation campaigns and aid in ongoing efforts to build tools to model, monitor, and respond to disinformation on Wikipedia.
Historically, social media traffic spikes have been largely invisible to Wikipedia editors, both while they occur and afterward. This is because the pageview data that the Wikimedia Foundation makes public—the data that powers our pageview API as well as tools like TopViews and WikiStats—does not include information about where the traffic comes from. This is a measure that Wikimedia takes to preserve user privacy.
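The public Pageviews REST API illustrates this limitation: it returns daily per-article view totals but no referrer breakdown. Below is a minimal sketch of querying and decoding it; the endpoint path is the API's real per-article route, while the article title, dates, and sample response are illustrative:

```python
# Fetch daily per-article pageview totals from the public Wikimedia
# Pageviews API. Note that the response carries raw view counts only,
# with no referrer information, so platform-level traffic spikes are
# invisible in this data.
import json
from urllib.request import urlopen

API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"

def pageviews_url(project, article, start, end):
    """Build the per-article daily pageviews URL (dates as YYYYMMDD)."""
    return f"{API}/{project}/all-access/user/{article}/daily/{start}/{end}"

def daily_views(response):
    """Map YYYYMMDD date -> view count from a decoded API response."""
    return {item["timestamp"][:8]: item["views"] for item in response["items"]}

# Live fetch (requires network access):
# with urlopen(pageviews_url("en.wikipedia", "Wikipedia", "20200301", "20200307")) as r:
#     print(daily_views(json.load(r)))

# Decoding a truncated, illustrative response without hitting the network:
sample = {"items": [
    {"timestamp": "2020030100", "views": 1204},
    {"timestamp": "2020030200", "views": 5312},
]}
print(daily_views(sample))  # {'20200301': 1204, '20200302': 5312}
```

Note what is absent from each item: there is no field indicating whether a view arrived from a search engine, an internal link, or a social media platform.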
For this project, we worked with the Foundation’s Privacy, Legal, and Analytics Engineering teams to determine whether we could release platform-specific pageview data for the articles receiving the most traffic from a given platform on a given day. We determined that as long as the report only includes articles that received at least 500 views from a single platform on a single day, this data can be released without endangering the privacy of individual Wikipedia users.
The report also includes other public data about each article that we think will be useful: the previous day’s views from that platform (to distinguish sudden spikes from persistently popular articles), the article’s total pageviews that day (to check whether a spike is platform-specific or part of a broader trend), and the number of Wikipedia editors who have the article on their watchlists, so that patrollers can focus their attention on the articles that are least likely to be under active monitoring already.
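Putting these pieces together, the report's row selection can be sketched roughly as follows. Only the 500-view privacy floor comes from the project itself; the field names and the 2x day-over-day cutoff for flagging a spike are illustrative assumptions, not the report's actual implementation:

```python
# Rough sketch of the report's selection logic, under the stated assumptions.
MIN_PLATFORM_VIEWS = 500  # privacy threshold: never release counts below this

def report_rows(rows):
    """Filter rows that are safe to publish and flag likely spikes.

    Each row is a dict with hypothetical keys: 'article', 'platform_views',
    'platform_views_prev_day', 'total_views', and 'watchers'.
    """
    out = []
    for row in rows:
        if row["platform_views"] < MIN_PLATFORM_VIEWS:
            continue  # too few views from one platform to release safely
        prev = row["platform_views_prev_day"]
        # A large day-over-day jump suggests a spike rather than persistent
        # popularity; 2x is an arbitrary illustrative cutoff.
        out.append(dict(row, spike=row["platform_views"] >= 2 * max(prev, 1)))
    return out

rows = [
    {"article": "A", "platform_views": 1800, "platform_views_prev_day": 40,
     "total_views": 2500, "watchers": 3},
    {"article": "B", "platform_views": 120, "platform_views_prev_day": 110,
     "total_views": 9000, "watchers": 250},
]
print(report_rows(rows))  # only "A" survives the threshold, flagged as a spike
```

A patrolling tool built on top of the report might then sort the surviving rows by ascending watcher count, surfacing spiking articles that few editors are already watching.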
This project was initiated based on the output of a disinformation discussion at Wikimania 2019 and the findings of a recent research project aimed at identifying vulnerabilities created by technological limitations of current patroller workflows.
The social media traffic report is updated daily at around 14:00 UTC with the previous day’s traffic. We plan to maintain the report through the end of May 2020. If we receive positive feedback about the report, we’ll consider working with our partner teams at Wikimedia to make this data available on a long-term basis. We’re also considering making social media traffic data available for more platforms and in more Wikipedia language editions. Our goal is to make it possible for community researchers and developers to use this new data source to build analysis and monitoring tools that help editors ensure Wikipedia remains the most accurate, up-to-date, and reliable source of information on the internet—whether that information is being used to debunk Flat Earth conspiracy theorists (currently trending on YouTube), inform political discussions (the Three-Fifths Compromise, currently trending on Twitter), or something completely different (Triskaidekaphobia, aka “fear of the number 13”, currently trending (where else?) on Reddit).
We are looking for ways to improve the social media traffic report, and we want your help! If you use the report, please provide feedback on the project talk page on Meta. If you notice any suspicious edits while using the report to monitor trending articles, you can report them using the anonymous Google Form linked at the top of the social media traffic report page.
About this post
Featured image credit: Nested hyperboloids, David Eppstein, CC BY-SA 3.0 and GNU Free Documentation License version 1.2 or higher