We have been working this past year to better identify and tag the “bot spam” traffic so we can produce top pageview lists that (mostly) do not require manual curation.
Learn about using the MediaWiki History Dataset to explore the everyday experience of editors on Wikipedia.
Part 3 of 3 posts on Wikimedia’s event data platform.
In the previous post, we talked about why Wikimedia chose JSONSchema instead of Avro for our Event Data Platform. This post will discuss the conventions we adopted and the tooling we built to support an Event Data Platform using JSON and JSONSchema.
The Wikimedia Foundation has been working with event data since 2012. Over time, our event collection systems have transitioned from being used only to collect analytics data to being used to build important user-facing features. This three-part series will focus on how Wikimedia has adapted these ideas for our own unique technical environment.
Posts on social media can make an otherwise obscure Wikipedia article go viral. A new traffic report gives English Wikipedians new insight into which ones are being read and shared most on four major social media platforms.
Maintaining and improving one of the largest websites in the world using Open Source software requires a continuous commitment. The site is always evolving, so for every new component we want (or need!) to deploy, we need to evaluate the Open Source solutions available.