Archive for June, 2009
Wikimedia Mobile is Officially Launched
Posted by Hampton Catlin in open-source, software on June 30th, 2009

iPhone Version in English
After spending about 6 months in alpha-beta-development-maybe-kinda-live mode, we have recently moved Wikipedia Mobile over to a new fast and sexy server. With this new server, we’ve reached the point in development where we can call this baby “launched”!
When I was brought on board at Wikimedia, I was tasked with endowing Wikimedia with a compelling mobile offering. From the beginning, we knew we were going to focus on “fully featured” smart phones. These phones are taking more and more of the market and we believe they will have an easy majority-share in a couple years. The goal is to build for the future.
At the moment, the Mobile site supports iPhone, Kindle, Android, and Palm Pre. And we fully support both English and German. There are other working languages, but they haven’t been fully translated yet. Our goal is to grow slowly and do it really well. We are starting out simple with limited support in order to test the usability and the platform’s stability. So far, things are looking good.
During the beta test period, we’ve served around 10,000,000 pages. You can view the hourly stats here (updated every hour on the hour). And with this new test server, we should be able to do more.
Based on requests from Google and the Palm Pre folks, and on what just makes sense, we are doing default mobile redirects. That is, if you open a Wikipedia link on a supported mobile device, you get redirected automatically to the mobile gateway. If you click the “View this page on main Wikipedia” link, we disable that redirect with a cookie. This way, the 99% of people using mobile devices to read Wikipedia on the go have a seamless experience, and the 1% who like to edit on their mobile device can use their browser to view the main site and do all the fancy things that they like doing. We suspect an initial outcry from the editors who use their mobile devices, but hope that will calm down. We’ve had very good feedback from the 99% and so we can’t forget those folks. If anyone has any suggestions on how to make this easier for the 1% who are editing while mobile, we’d love to hear from you.
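The redirect rule described above can be sketched roughly as follows. This is only an illustration in Python, not the gateway's actual Ruby code: the gateway host, the cookie name, and the user-agent substrings are all assumptions made up for the example.

```python
from typing import Mapping, Optional

# Hypothetical values: the post does not give the real gateway host,
# the cookie name, or the device-detection rules.
MOBILE_GATEWAY = "https://mobile.example.org"
OPT_OUT_COOKIE = "stopMobileRedirect"
SUPPORTED_DEVICE_TOKENS = ("iPhone", "Kindle", "Android", "webOS")


def is_supported_device(user_agent: str) -> bool:
    """Crude substring check standing in for real device detection."""
    return any(token in user_agent for token in SUPPORTED_DEVICE_TOKENS)


def mobile_redirect_target(path: str, user_agent: str,
                           cookies: Mapping[str, str]) -> Optional[str]:
    """Return the gateway URL to redirect to, or None to serve the main site."""
    if cookies.get(OPT_OUT_COOKIE) == "true":
        # The reader clicked "View this page on main Wikipedia" earlier.
        return None
    if not is_supported_device(user_agent):
        return None
    return MOBILE_GATEWAY + path
```

A reader on a supported phone gets the gateway URL back, while the same reader carrying the opt-out cookie gets `None` and stays on the main site.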
If you want live updates about the Mobile site then you can follow WikimediaMobile on Twitter. Also, if you know any Ruby, you can grab the source code via git from GitHub and help out! Feel free to contact me via email with any questions.
Also, special thanks to Nic Williams and Ryan Bigg from Mocra for help with the Ruby 1.9 transition and thanks to Yehuda Katz for help with the XML parsing layer and for all his work on the Merb framework.
Firefox 3.5 brings native open video support
Posted by brion in open-source, wikimedia on June 30th, 2009
Congratulations are in order for our friends and comrades-in-arms at Mozilla: they’ve released version 3.5 of their open-source Firefox browser today.
Aside from major improvements to speed and memory usage, one of the updates that has got us most excited at Wikimedia is the support for HTML 5’s native <video> and <audio> elements.
What does this mean? Well in short, it means that Firefox 3.5 is the best browser to run video and audio clips from Wikimedia Commons on!
A few months more down the line, we’ll start being able to integrate support for our inline video sequencer, which’ll make it easy to extract snippets of a longer video and combine them — entirely using open-source, non-patent-encumbered web standards. This makes heavy use of the new HTML 5 multimedia support; while at first editing will be limited to Firefox 3.5 users, other browsers are continuing to improve and adopt the same support.
On templates and programming languages
As many folks have noted, our current templating system works ok for simple things, but doesn’t scale well — even moderately complex conditionals or text-munging will quickly turn your template source into what appears to be line noise…
<includeonly><span style="white-space: nowrap;">{{#if:{{{3|}}}|
{{coord|{{{1|0}}}|{{{2|0}}}|{{{3|0}}}|{{{4|N}}}|{{{5|0}}}|{{{6|0}}}|{{{7|0}}}|{{{8|E}}}|{{{9|type:other}}}|format={{{format|dms}}}|display={{#if:{{{title|}}}|inline,title|inline}} }}| {{#if:{{{2|}}}|
{{coord|{{{1|0}}}|{{{2|0}}}|{{{4|N}}}|{{{5|0}}}|{{{6|0}}}|{{{8|E}}}|{{{9|type:other}}}|format={{{format|dms}}}|display={{#if:{{{title|}}}|inline,title|inline}}}}| {{#if:{{{4|}}}|
{{coord|{{{1|0}}}|{{{4|N}}}|{{{5|0}}}|{{{8|E}}}|{{{9|type:other}}}|format={{{format|dec}}}|display={{#if:{{{title|}}}|inline,title|inline}}}}| {{#if:{{{1|}}}|
{{coord|{{{1|0}}}|{{{5|0}}}|{{{9|type:other}}}|format={{{format|dec}}}|display={{#if:{{{title|}}}|inline,title|inline}}}}}}}}}}}}</span></includeonly><noinclude>
{{pp-template|small=yes}}
{{documentation}}
</noinclude>
And we all thought Perl was bad! ;)
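For comparison, here is roughly what the same branching might look like in a general-purpose language. This is a hedged sketch in Python (one of the candidates discussed below), not a proposal for the actual template API: the function names and the dictionary-of-parameters interface are invented for illustration, and it ignores the whitespace details of the original.

```python
from typing import Mapping, Optional

# Defaults taken from the {{{n|default}}} fallbacks in the template above.
DEFAULTS = {"1": "0", "2": "0", "3": "0", "4": "N", "5": "0",
            "6": "0", "7": "0", "8": "E", "9": "type:other"}

# (parameter that selects the branch, positional parameters passed on, default format)
BRANCHES = [
    ("3", ["1", "2", "3", "4", "5", "6", "7", "8", "9"], "dms"),
    ("2", ["1", "2", "4", "5", "6", "8", "9"], "dms"),
    ("4", ["1", "4", "5", "8", "9"], "dec"),
    ("1", ["1", "5", "9"], "dec"),
]


def nonempty(params: Mapping[str, str], key: str) -> bool:
    """Mimic {{#if:{{{key|}}}|...}}: true when the parameter is set and not blank."""
    return params.get(key, "").strip() != ""


def coord_call(params: Mapping[str, str]) -> Optional[str]:
    """Build the {{coord|...}} call that the first matching branch would emit."""
    display = "inline,title" if nonempty(params, "title") else "inline"
    for trigger, positions, default_format in BRANCHES:
        if nonempty(params, trigger):
            args = [params.get(p, DEFAULTS[p]) for p in positions]
            args.append("format=" + params.get("format", default_format))
            args.append("display=" + display)
            return "{{coord|" + "|".join(args) + "}}"
    return None  # no coordinates supplied: the template emits nothing
```

The four nested `#if` branches become one table and one loop, which is the kind of readability gain the rest of this post is after.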
Lua
There’s been talk of Lua as an embedded templating language for a while, and there’s even an extension implementation.
One advantage of Lua over other languages is that its implementation is optimized for use as an embedded language, and it looks kind of pretty.
An inherent disadvantage is that it’s a fairly rarely-used language, so still requires special learning on potential template programmers’ part.
An implementation disadvantage is that it currently is dependent on an external Lua binary installation — something that probably won’t be present on third-party installs, meaning Lua templates couldn’t be easily copied to non-Wikimedia wikis.
There are perhaps three primary alternative contenders that don’t involve making up our own scripting language (something I’d dearly like to avoid):
PHP
- Advantage: Lots of webbish people have some experience with PHP or can easily find references.
- Advantage: we’re pretty much guaranteed to have a PHP interpreter available. :)
- Disadvantage: PHP is difficult to lock down for secure execution.
JavaScript
- Advantage: Even more folks have been exposed to JavaScript programming, including Wikipedia power-users.
- Disadvantage: Server-side interpreter not guaranteed to be present. Like Lua, would either restrict our portability or would require an interpreter reimplementation. :P
Python
- Advantage: A Python interpreter will be present on most web servers, though not necessarily all. (Windows-based servers especially.)
- Wash: Python is probably better known than Lua, but not as well as PHP or JS.
- Disadvantage: Like PHP, Python is difficult to lock down securely.
Any thoughts? Does anybody happen to have a PHP implementation of a Lua or JavaScript interpreter? ;)
– brion
Update:
Hampton reminds me that Ruby has some sandboxing features and may also be a contender.
Blog Downtime
Posted by RobH in open-source, software, wikimedia on June 29th, 2009
I am sure that many folks noticed that on the morning of 2009-06-26, techblog.wikimedia.org and blog.wikimedia.org went down. It turns out that some parts of our WordPress installations were compromised. I do not want to get into a direct show and tell of what they did, but hopefully we have hardened the installation to the point that it will not occur again.
This is why the blogs exist on their own server, so when things like this happen we can minimize the impact. The blogs are both up and running now, along with the other services that were affected. Everything but techblog was back online before Friday was over; techblog lagged behind until today. (As techblog was the point of exploit, we got everything else back up first.) Other affected services were the Open Conference Systems site for Wikimania 2009, as well as our survey software. Both of those were back online ASAP after the incident and the rest followed after.
Of course, it was hard to get this information out to folks when the blogs were down! It goes to show how easy the blogs have made getting info out, since without them we had to scramble to get the information out through other channels.
Thanks to everyone who assisted in the restoration, and also thanks to everyone for their patience while the system was fixed.
Current events and traffic spikes
News agencies today are reporting that pop star Michael Jackson has been hospitalized, and perhaps died. We can all think back on how the King of Pop has touched our lives, but today we can also see how high-profile news events can affect a web site… See also past events such as the Popedotting and the 2008 US election.
Here at the office we first noticed something was going on when IM services such as AOL Instant Messenger started logging people out — we quickly noticed that our own servers were hitting load spikes, and suspected there was something going on…
Server CPU load spike (likely several more to come):
The actual traffic load spike is subtler; server effects can be disproportionate to the actual traffic:
Update 22:53 UTC:
The traffic is pretty much holding steady but we’ve still been seeing intermittent load spikes:
These are at least in part due to one of our memcached internal data cache servers going wonky and swapping due to overuse of memory from text storage running on the same node. We’ve reduced traffic on the node and restarted it to even out its memory usage. (Thanks Domas!)
Update 23:00 UTC:
You may see intermittent messages like “(Cannot contact the database server: Unknown error (10.0.6.24))” as temporary database overloads cascade around the system. Sorry for the inconvenience while we work the kinks out; just wait a few minutes and try again…
Update 23:43 UTC:
We believe a large chunk of the CPU overload is due to cache swarming — many visitors simultaneously causing a re-render of the page due to an expired cache version. I’ve put in a temporary hack which will reduce the amount of rendering, but may cause some people to see out of date copies of the page.
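To illustrate the general idea (not the actual hack that was deployed, whose details are not given here), one common way to tame cache swarming is to let a single request re-render an expired page while everyone else is served the out-of-date copy. The class and method names in this Python sketch are hypothetical.

```python
import threading
import time
from typing import Callable, Dict, Tuple


class StaleWhileRenderingCache:
    """In-memory page cache where only one caller re-renders an expired entry."""

    def __init__(self, render: Callable[[str], str], ttl_seconds: float):
        self._render = render
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[str, float]] = {}  # title -> (html, rendered_at)
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()

    def _lock_for(self, title: str) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(title, threading.Lock())

    def get(self, title: str) -> str:
        entry = self._entries.get(title)
        if entry is not None and time.monotonic() - entry[1] < self._ttl:
            return entry[0]  # fresh copy, no rendering needed
        lock = self._lock_for(title)
        # Only the caller that wins this non-blocking acquire pays for the render.
        if lock.acquire(blocking=False):
            try:
                html = self._render(title)
                self._entries[title] = (html, time.monotonic())
                return html
            finally:
                lock.release()
        if entry is not None:
            return entry[0]  # another request is rendering: serve the stale copy
        with lock:  # nothing cached yet: wait for the render in progress
            pass
        return self.get(title)
```

The trade-off is the one mentioned above: CPU load drops because expensive renders are no longer duplicated, but some readers briefly see an old version of the page.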
Update 2009-07-02:
Here’s a link to Domas’s blog post with technical details on the cache swarming problem.
First usability release is coming up soon.

Screenshot of enhanced toolbar
I am happy to announce that the first set of usability improvements is scheduled to be integrated into MediaWiki and will be available as a user preference on Wikipedia in the first week of July. This release is nicknamed Acai; release names will follow the names of tropical fruits in alphabetical order. The features are described on this release page. The major improvements include: 1) reorganized tabs that clearly indicate the “Read” and “Edit” states, 2) an enhanced edit toolbar that expands based on users’ needs, and 3) a search results page that hides clutter and makes search results more visible. We are still battling IE6 bugs, but come and play with the prototypes and give us your feedback. On the localization front, we have introduced a set of new texts for localization. If you are a MediaWiki translator, your collaboration on localization is greatly appreciated as always.
Naoko Komura
Chinese-language search fixes for MediaWiki
Search is an important part of any web app like a wiki, but search is harder than it looks — especially in a multilingual environment. MediaWiki has to support not just your standard Western languages like English and Spanish, but many more with special requirements:
- Some can be written in multiple scripts (such as Serbian in Cyrillic or Latin), and searches should match text written either way.
- Some languages don’t use word spacing, like Chinese and Japanese. To let the search index know where word boundaries are, we have to internally insert spaces between some characters:
维基百科 -> 维 基 百 科
Then to add insult to injury, we need to fudge the Unicode characters to ensure things work reliably with older and newer versions of MySQL:
维 基 百 科 -> u8e7bbb4 u8e59fba u8e799be u8e7a791
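The two steps above can be reproduced with a few lines of Python. This is a sketch of the transformation only, not MediaWiki's PHP implementation; the function names are made up, and the CJK test covers just the basic Unified Ideographs block as a simplifying assumption.

```python
def is_cjk(ch: str) -> bool:
    """Rough test for the CJK Unified Ideographs block (illustrative range only)."""
    return "\u4e00" <= ch <= "\u9fff"


def segment(text: str) -> str:
    """Put spaces around each CJK character so each one is indexed as a word."""
    spaced = "".join(f" {ch} " if is_cjk(ch) else ch for ch in text)
    return " ".join(spaced.split())  # collapse the extra whitespace


def encode_token(token: str) -> str:
    """Replace a non-ASCII token with 'u8' plus the hex of its UTF-8 bytes."""
    if token.isascii():
        return token
    return "u8" + token.encode("utf-8").hex()


def index_form(text: str) -> str:
    """Segment the text, then make every token safe for an ASCII-only index."""
    return " ".join(encode_token(t) for t in segment(text).split())


print(segment("维基百科"))     # 维 基 百 科
print(index_form("维基百科"))  # u8e7bbb4 u8e59fba u8e799be u8e7a791
```

Each token is just the character's three UTF-8 bytes written in hex, so the index contains only plain ASCII no matter how a given MySQL version handles Unicode.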
For a long time, this word segmentation wasn’t being handled correctly for Chinese in our default MySQL search backend, so searching for a multi-character word often gave false matches where the characters were all present, but not together.
This is now fixed for MediaWiki 1.16; the intermediate query representation passed to the search backend now internally treats your multi-character Chinese input as a phrase, which will only match actual adjacent characters:
维基百科 -> +"u8e7bbb4 u8e59fba u8e799be u8e7a791"
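The difference between the old and new behaviour can be shown with a small in-memory stand-in for the search backend. This Python sketch is an assumption-laden simplification (one token per character, no MySQL involved), and the function names are hypothetical.

```python
from typing import List


def tokens(text: str) -> List[str]:
    """One token per character, standing in for the space-segmented index."""
    return [ch for ch in text if not ch.isspace()]


def matches_all_terms(query: str, document: str) -> bool:
    """Old behaviour: every character only has to appear somewhere."""
    present = set(tokens(document))
    return all(t in present for t in tokens(query))


def matches_phrase(query: str, document: str) -> bool:
    """New behaviour: the characters must appear adjacent and in order."""
    q, d = tokens(query), tokens(document)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


scattered = "科学基金维护百年"  # contains all four characters, but not together
print(matches_all_terms("维基百科", scattered))      # True  (false match)
print(matches_phrase("维基百科", scattered))         # False
print(matches_phrase("维基百科", "我喜欢维基百科"))  # True
```

The first call shows the false match described above; treating the input as a quoted phrase removes it.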
Note that Wikimedia’s sites such as Wikipedia run on a fancier, but more demanding, search backend with a separate Java-based engine built around Apache Lucene. Sometimes we have to remind ourselves that third-party users will mostly be using the MySQL-based default, and oh boy it still needs some lovin’! :)
IRC server refresh for irc.wikimedia.org
In a couple of minutes, I will be shutting down irc.wikimedia.org and the rc bot to perform a much needed upgrade of the irc server.
We have been experiencing problems with the current server, with IRC feeds not showing up, and the goal is to remedy these problems, as well as incorporating new functionality.
The documentation for the new server will show up shortly thereafter on Wikitech.
It is expected that there will be a little downtime during which users will not be able to connect to irc.wikimedia.org. The upgrade should take about 10 minutes, and there is a rollback plan in case something goes wonky.
Welcome to Steve Kent
On Monday, Steve Kent will be joining the Wikimedia Foundation team as the Head of Office IT Support. Steve will take the IT support torch from Ariel as well as incorporating some new responsibilities that were previously shared by other staff. Ariel’s responsibilities will shift to software development work.
Steve comes to us with more than 20 years of IT systems management experience. He has been in similar roles with several organizations, including RR Donnelley, Charrette LLC, Communicomp and CMP Media. Steve was most recently the Director of Information Technology for Sandbox Studios, located here in San Francisco.
Welcome, Steve, to the Wikimedia Foundation team.
Erik Moeller
Wikimedia & FourKitchens support CiviCRM development
Posted by tomasz in open-source, wikimedia on June 9th, 2009
Here at Wikimedia we’ve been avidly using CiviCRM for over two years now. Over that period we’ve seen it grow and mature as a platform for fundraising, contact tracking & mailings, and we have been wanting to help the platform evolve even more. Together with the Civi community, we’ve worked to organize the early release of the CiviReport architecture for the 2.2 branch. Thanks go to the core Civi team for doing the backport and FourKitchens for contributing a wealth of new reports for us. You can read a full write-up of the release at the CiviCRM blog.
For those of our readers who are interested in CiviCRM and are in the Bay Area, we’ve also started to organize regular user meetups. The first one had a great turnout and we’d love for both developers and users of CiviCRM to attend the next one on August 4th at 6pm.



