From hell to HTML: releasing a Python package to easily work with Wikimedia HTML dumps
Announcing mwparserfromhtml, a new library that makes it easy to parse the HTML content of Wikipedia articles
For more information, please see: Parsing