This bot is judging you: here’s how we made it better at it
By Travis Briggs
On October 15th, 2020, the 2nd generation WP 1.0 web tool was shut down, and the 3rd generation tool was launched to 100%. This was a major milestone for the project, representing over 2 years of planning and migration work.
What is the WP 1.0 “web tool” and the related WP 1.0 “bot”? And what does this mean for editors on Wikipedia?
Rate this!
WP 1.0 was originally a project to create a so-called “one-point-oh” release of Wikipedia and now exists to curate articles for inclusion in offline versions of Wikipedia such as Kiwix. This curation takes place as part of a system of WikiProjects, i.e. groups of editors working together on a specific topic, whether it be Trains, or Catholicism, or Australian Maritime History. Editors of a WikiProject “tag” articles that belong to their project on the article’s talk page, which allows us to publish selections like the WikiMed medical encyclopedia.
Have you ever seen a banner like this one on an article’s talk page?
This means the article is part of one or more WikiProjects. As part of each WikiProject, the editors of that project rate the article on a quality scale, and optionally, on an importance scale. The above article, on “Pope” in general, is rated C-Class quality and Top importance for WikiProject Catholicism. Note that importance ratings can differ between projects, with this article only being Mid importance for WikiProject Politics.
Where do the WP 1.0 bot and WP 1.0 web tool come in?
Well, the WP 1.0 bot is launched every day and looks over all the articles that have been tagged with these WikiProject categories. It then summarizes the results into tables for each WikiProject:
The tables help editors identify areas for improvement, answering questions such as: are there a large number of articles that are Top importance but only Stub class quality? That would be a good place to focus their efforts. This is where the WP 1.0 web tool comes in. If you visit that Catholicism table live on Wikipedia, you’ll see that most of the individual table cells are clickable links. These links, such as the one for Stub-Class/High-Importance, take you to the WP 1.0 web tool.
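To make the idea concrete, here is a minimal Python sketch of the kind of tally behind such a table. It is an illustration only, not the bot's actual code: the sample data, `summarize`, and `articles_in_cell` are hypothetical names invented for this example, and apart from “Pope” the article titles are placeholders.

```python
from collections import Counter

# Hypothetical sample ratings for one WikiProject:
# (article title, quality class, importance).
ratings = [
    ("Pope", "C", "Top"),
    ("Article A", "Stub", "Top"),
    ("Article B", "Stub", "High"),
    ("Article C", "Stub", "High"),
    ("Article D", "B", "Mid"),
]


def summarize(ratings):
    """Count how many articles fall into each (quality, importance) cell."""
    return Counter((quality, importance) for _, quality, importance in ratings)


def articles_in_cell(ratings, quality, importance):
    """List the articles behind one table cell, as a clickable cell would."""
    return [
        title
        for title, q, i in ratings
        if q == quality and i == importance
    ]


table = summarize(ratings)
# Top-importance articles that are still stubs are a good place to focus.
needs_attention = articles_in_cell(ratings, "Stub", "Top")
```

Here `table[("Stub", "High")]` plays the role of one cell in the summary table, and `articles_in_cell` stands in for the list you reach by clicking that cell.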
Maintenance matters
Until recently, the WP 1.0 bot and web tool were both running their so-called “second generation” versions. That software was written around 2010, so it was quite old. Worse, no one familiar with it remained associated with the WP 1.0 project, so when something went wrong with the bot, no one had the expertise to fix it. The code was also written in Perl, which has seen waning popularity in the developer community, so few folks were eager to learn its ins and outs. As of 2018, the code for the WP 1.0 bot was considered to be in “emergency maintenance mode.”
Around the fall of 2018, a decision was made to rewrite the tool in Python as the 3rd generation of the bot. As part of this process, the new maintainers of the tool, Kelson and I (Audiodude), first had to gain insight into how it operated by reading its source code. Python was chosen because it is one of the most popular languages around, it is powerful enough for the task, and the requisite tooling and libraries are easily available. Hopefully, this choice will be helpful to future maintainers.
Få saker gjorda (getting things done, but in Swedish)
The rewrite process started in earnest in early 2019 and finished around August, when I participated in Wikimania in Stockholm. We actually got as much done in the 4 or 5 days there, working full time on the project, as had been completed in the previous 6 months! From that point on, the new Python WP 1.0 bot was powering all of the tables and logs on English Wikipedia. The new bot has 97% test coverage, which goes a long way toward keeping it stable and keeping new bugs out.
What was the test coverage of the Perl bot? 0%! There were no tests!
Now the tests run not only on existing code, but also whenever a pull request is submitted on GitHub, so it’s difficult to submit something that breaks the project. The code runs on Wikimedia Cloud VPS, inside of Docker containers. We use Docker Hub to automatically build three separate images, and then deploy those images on the VPS. This is an altogether cleaner architecture than the old bot, which was simply unzipped and “git cloned” into a directory on a server.
Meanwhile, the new WP 1.0 web tool is written in Vue.js, a modern reactive JavaScript framework. Vue allows us to build reusable components that dynamically update in response to new data, without the amount of overhead that a React app might require. We also use Bootstrap for laying out and styling components.
Thank you, thank you, thank YOU
Already we are reaping the benefits of the rewritten bot. It is much more stable, with drastically fewer performance issues and random crashes. In fact, over the first year that it’s been in operation, the bot has only failed to do a daily update a few times. Complaints on Wikipedia from editors are way down. We are also planning to roll this out on other wikis that already use a similar rating system. We believe the new web tool will be easier for editors to use, letting them get right to the heart of the information they’re trying to find without excessive cruft in the way.
Last but not least, it can be extended easily to perform more tasks at a later date. Feel free to ask!
Overall, this is a big step forward for Wikipedia and the WP 1.0 project, and we’re happy to have completed this milestone. Thank you to all the editors who make the WP 1.0 project possible and everyone who has tested our new versions and helped us get to this point!
About this post
Featured image credit: KIWIX Illustration article Blog 2020 (Robot Kiwix), Stefan Schlageter, CC BY-NC 2.0