How people use Search to access Wikipedia is a common question by researchers. Until now, however, there has been little data available about this relationship. To help address these questions, the Wikimedia Foundation is releasing a new, faceted dataset on search engine traffic to Wikipedia so you can ask questions like “What is the most common search engine in my country?” or “Which search engine is most-used by Android users?”
The Wikimedia Analytics Engineering team manages multiple systems, all gravitating around a big (for our standards) Hadoop cluster. This post describes our path to changing our Hadoop distribution in a single day, together with the lessons learned while doing it.
This article describes the methodology used by the Wikimedia Foundation to monitor outages on Wikipedia around the world. These events are called anomalies and could be due to various causes, among them censorship.
We have been working this past year to better identify and tag the “bot spam” traffic so we can produce top pageview lists that (mostly) do not require manual curation.
Learn about using the Mediawiki History Dataset to explore the every day experience of editors on Wikipedia.
Part 3 of 3 posts on Wikimedia’s event data platform.
In the previous post, we talked about why Wikimedia chose JSONSchema instead of Avro for our Event Data Platform. This post will discuss the conventions we adopted and the tooling we built to support an Event Data Platform using JSON and JSONSchema.
The Wikimeda Foundation has been working with event data since 2012. Over time, our event collection systems have transitioned from being used only to collect analytics data to being used to build important user facing features. This 3 part series will focus on how Wikimedia has adapted these ideas for our own unique technical environment.
Posts on social media can make an otherwise obscure Wikipedia article go viral. A new traffic report gives English Wikipedians new insight into which ones are being read and shared most on four major social media platforms.
Maintaining and improving one of the largest websites in the world using Open Source software requires a continuous commitment. The site is always evolving, so for every new component we want (or need!) to deploy, we need to evaluate the Open Source solutions available.