Archive for July, 2009

PMTPA Router Reboot – Scheduled Downtime (Resolved)

Our primary router for the pmtpa cluster had to be rebooted today at 12:00 GMT.  A line card had died and needed replacing, and the system required a reboot for the replacement to fully take effect.  Once that finished, CentralNotice was adding a lot of overhead and had to be disabled for our caching cluster to catch up.  The extra load then overwhelmed the primary database master for S3, and we are in the process of switching database masters to another server.

If all had gone as planned, this would have been a quick 5-minute router reboot and back online.  Unfortunately, things do not always work smoothly, so what should have been 5 minutes has taken a while.  This post will be updated as more details are resolved.

Update: We have switched database masters successfully and all sites and projects should once again be fully functional as of 14:13 GMT.

1 Comment

SVG for all… with Flash?

For several years, we’ve supported uploading SVG vector images to Wikimedia sites… with the limitation that they would be rendered to static PNG raster images when actually used inline.

This gives our editors great flexibility in editing, customizing, and translating maps and diagrams using cross-platform free tools like Inkscape, but we’re missing out on some of the big potential in SVG — high-quality scaling for zoomed displays and printing, and animation and scripted interactivity.

In large part we can blame Internet Explorer — the most widely used browser has never supported SVG graphics natively, and Adobe isn’t even maintaining their plug-in anymore! With the majority of users cut out, we’ve had little incentive to move forward with new capabilities that would be closed to most visitors.

But that may be changing, thanks to… Flash??

svgweb implements a highly capable SVG renderer in JavaScript and Flash, bringing high-quality, scriptable SVG support to the ~95% of web users who have either Flash or a natively SVG-capable browser.

I love to see Flash’s near-ubiquity used for good — implementing support for modern, open web standards on older and less capable browsers.

One of the chief drivers of the project is Google open standards evangelist Brad Neuberg; we had a great talk today along with Trevor on our Usability team and Michael of Metavid/Kaltura/video awesomeness, and we’re all very excited at the possibilities.

We’re going to see if we can whip together some basic integration in time to show at the SVG Open conference in October, starting with a basic zoom-and-pan view for SVG images which can make use of native or emulated SVG support.

Future ideas that have us really excited include:

  • Live previewing of parameterized images at insert time (localized text, highlighted map segments, charts, etc)
  • On-web basic vector image editing? Sometimes you just need to make an adjustment and installing Inkscape is kind of heavyweight.

Pure SVG + Javascript should be able to provide for selecting, moving, adding, and altering objects, which we could then save back to a new version of the file… svgweb’s powerful scripting support should be able to extend this to Internet Explorer users too!

Use of SVG originals inline in article pages is more dependent on file size issues. We have a lot of files that are just plain huge, especially detailed maps, and the SVG version ends up being a lot slower to download and display.

A project which can help with that is Scour, a tool to optimize SVG source by stripping out unneeded verbosity and rearranging style bits to keep size down.

With further work to strip out detail that will never be visible, a filter like this could let us produce output files that are more suitable for on-screen viewing while still scaling up nicely on zoomed displays and printed output.
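To make the idea of stripping unneeded verbosity concrete, here is a minimal Python sketch. It is not Scour's implementation: the function name `slim_svg`, the helper `namespace_of`, and the choice of what to drop (comments, `<metadata>`, and Inkscape/Sodipodi bookkeeping elements and attributes) are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Namespaces that editors such as Inkscape use for their own bookkeeping.
# Assumption for this sketch: nothing in them affects rendering.
EDITOR_NAMESPACES = {
    "http://www.inkscape.org/namespaces/inkscape",
    "http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd",
}


def namespace_of(name):
    """Return the namespace URI of a '{uri}local' name, or '' if it has none."""
    return name[1:].split("}", 1)[0] if name.startswith("{") else ""


def slim_svg(source):
    """Return SVG source without comments, <metadata>, or editor-only data."""
    # Serialize SVG elements without an 'ns0:' prefix.
    ET.register_namespace("", SVG_NS)
    # The default parser already discards XML comments.
    root = ET.fromstring(source)
    for parent in list(root.iter()):
        for child in list(parent):
            is_metadata = child.tag == "{%s}metadata" % SVG_NS
            if is_metadata or namespace_of(child.tag) in EDITOR_NAMESPACES:
                parent.remove(child)
        for attr in list(parent.attrib):
            if namespace_of(attr) in EDITOR_NAMESPACES:
                del parent.attrib[attr]
    return ET.tostring(root, encoding="unicode")
```

Calling `slim_svg(svg_text)` on an Inkscape-saved file returns a smaller document that should render the same. Whether it is acceptable to drop `<metadata>` (which can carry license information) would be a policy decision, not a technical one.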

7 Comments

Scaling Wikipedia Mobile

Some of the things I learned about new projects and scaling issues.

BTW, I think we can settle that Ruby applications don’t have to be slow. Far from it.


Follow me on Twitter @hcatlin or @WikimediaMobile

No Comments

We’re adding an off site archive for Commons and the XML snapshots

Thanks are due to eBart consulting and User:Milosh for providing a backup server and storage array at their colocation facility in Europe. This server will store archives of our publicly available data from Wikimedia Commons and the XML snapshots.

Everyone knows that this has been a long time coming, as having an off-site location for our data is extremely important for disaster recovery. With this archive in place we’ll have another external archive space for Commons image data to complement the one living at MIT.

Given the 10 TB donated, we’re likely to also store yearly archives of the XML snapshots.

This won’t stop us from continuing to be rigorous about our internal backups for the same data, along with keeping all of our users’ private data within our own data centers. It will simply be another physical space for us to archive our publicly available content.

While this offline mirror will only be used internally, we have some other leads on sponsors who might be able to offer a publicly available mirror. Over the next weeks we’ll be streamlining the offline archiving process and seeding the initial Commons upload, which currently comes in at just under 4 TB! Once we make some sense of how best to manage the archiving process, we’ll see who else is able to host our data.

11 Comments

Internal updates: moving files around

We’re now running all Wikimedia sites on our internal deployment branch of MediaWiki, so we’ve got a consistent record of what software we’re running and can more easily test it offline. There were a couple brief glitches during the switchover, but everything seems to be running smoothly now.

Additionally, we’ve started the process of moving thumbnail images from our primary upload file server to another machine to free up space and reduce the repetitive slowness problems we’ve been seeing lately as the ZFS data pool got near full. Even with an updated OS kernel we’ve been seeing it come back. :(

The transition of thumbs should be smooth and fairly non-disruptive in the background.

Update 2009-07-17 00:22 UTC: With the file server getting hypersluggish again, we’ve temporarily re-disabled uploads to speed up the transfers. Should be back online shortly.

Update 03:31 UTC: Uploads re-enabled after disk space rearranged.

6 Comments

ESAMS Servers not reachable, some EU traffic affected. (Fixed!)

Starting approx. 03:20 GMT, servers in our ESAMS facility began to roll offline one after another.  After some investigation, it appears power is not being supplied to all the servers.  This has resulted in some slowdowns for traffic from EU users.

We have temporarily migrated all traffic to our primary FL datacenter.  Once the servers are back online in ESAMS, we will be pushing service back to it as well.

Update: The problem has been identified and finally fixed. Traffic has been returned to normal.

The best guess so far is that there was a cooling failure in the datacenter which caused the Sun boxes to shut themselves down.

An update from Leaseweb/Evoswitch is here:  http://noc.leaseweb.com/status.php?i=389

6 Comments

Uploads temporarily offline for site fix (done!)

Uploading and generation of new thumbs will be temporarily disabled on Wikimedia sites while we patch & reboot the server to fix the performance issues we’ve been seeing.

We hope to be done within a couple hours (by 22:00 UTC or so — 3pm PDT), but it could run shorter or longer.

Rough procedure for the curious:

  • Take image thumbnailing servers offline
  • Disable uploads
  • Unmount file server from web servers
  • Patch & reboot file server: rebooted – 21:00 UTC
  • Remount file server on web servers – 21:09
  • Put image thumbnailing servers back online – 21:12
  • Re-enable uploads
  • <- Done 21:18!

With the kernel fix, the file server should now behave better. We’ll then be able to continue our more leisurely migration of thumbnail files to another server, freeing up disk space on the primary box.

Updated 20:20 UTC: Added our hoped-for ETA

Update 20:44 UTC: A side-effect of taking the image server offline broke account creation and some editing which triggers an anti-bot captcha. Have switched to the simple captcha mode which doesn’t use images for now.

Update 20:56 UTC: Just noting that this affects <math> and <timeline> rendering as well. You may see some math rendering errors until we’ve completed; sorry!

Update 21:12 UTC: File server is back online and uploads are re-enabled. So far so good!

9 Comments

PDF export service temporarily down (fixed)

Wikimedia’s PDF export service is temporarily down; the server failed to reboot after a routine kernel upgrade. It should be resolved or replaced with a spare box within a couple hours…

Update: Server is back online.

No Comments

PDF Export currently down (fixed)

Our PDF export server is presently down.  It had to be rebooted so we could organize and route some power cables in our racks.  When it powered back on, it failed to load all software correctly.  We are working on resolving it; I just wanted to post something here on the blog since it is the first place that many people check when they think some service is broken.


No Comments

Intermittent media server load problems

We’ve been seeing some general slowdowns in our image and media file serving recently, including some instances in the last couple days where the sites as a whole have been affected to the point of extreme slowness or temporary inaccessibility.

Domas believes this is related to a reported problem with NFS performance when ZFS snapshots are active. We’ve had some luck so far with it improving after dropping older snapshots (possibly along with restarting NFS and temporarily disabling the image scaler servers to give it a little breathing room to reset).

We’ve been planning for some time to redo the way we access our media files internally which can help reduce the impact on the rest of the site when load problems on the file servers occur, but we might also be able to spread out the load among multiple servers to improve things even more.

Updates will come as we get things back on track…

Update 2009-07-15: We’re temporarily shutting off uploads while we apply the ZFS fix patch and reboot the main file server. You may see some missing images or funky error messages for a little bit, but the sites should otherwise continue working normally until the file server is back up.

Update 2: Server is patched and uploads are back online. This should resolve our performance problems while we continue rearranging the upload servers to be more future-proof.


13 Comments

Power outage in Wikimedia’s European servers

This seems to be a power outage at our European proxy caching cluster; we’ll see if we can give more details later.


European traffic has been rerouted to our US servers, but the extra load may cause the sites to be a little sluggish for now. (If your DNS is still seeing the old entries, you can manually configure your browser to use the US proxy: rr.pmtpa.wikimedia.org port 80. You should only do this temporarily, as you won’t be able to access anything *but* Wikipedia and our sister projects. :)

Update 21:13 UTC:

European servers are coming back online, we should have this cleaned up pretty soon.

Update 21:26 UTC:

We’re starting to switch traffic back to Europe. Should be better in a few minutes… In the meantime, amuse yourself reading the Twitter panic. :)

Update 21:40 UTC:

You can also use the SSL interface to Wikipedia, which doesn’t have the proxy overload.

No Comments

Improving Wikimedia’s Discussion System

Hi all,

Some of you might have already seen my blog posts about LiquidThreads, Wikimedia’s in-development discussion system.

For those who haven’t, this is a quick primer on what LiquidThreads is, and what it’s going to do for Wikimedia’s communities.

Currently, Wikimedia’s discussion system sucks. Here’s why:

  • It’s not easily usable by the average user. It isn’t obvious how to leave a comment on a talk page, or how to reply to a comment. The indenting we use now is ad-hoc and unsustainable for long discussions.
  • Signatures are done manually and we have to jump on poor unsuspecting newbies who don’t know this (or write bots…)
  • Archiving is done unevenly by bots, which are maintained by users and therefore of very uneven quality. Archives are something of a black hole — they aren’t searchable, easily maintainable or easily accessible. You can’t resurrect an archived discussion easily, nor can you view its history.
  • It’s stored as plain wikitext, which is opaque to any sort of automated process.
  • You can’t move a thread to a different discussion page and preserve its history.
  • There’s no encouragement, mechanism or incentive for quoted, point by point inline replies like we’re all used to with e-mail.

Imagine being a new user and trying to figure out how to add your comment to this.

Enter LiquidThreads. LiquidThreads is a system that makes MediaWiki’s discussion system behave like a forum or comments thread, while still maintaining the unique refinements that make wikis work. It was originally designed by a Google Summer of Code student, David McCabe, and I’ve been making incremental improvements to make it work for Wikimedia.


Overview of the new LiquidThreads interface

So, what’s changed?

  • Comments are separated from each other in the wikitext, so there are no more edit conflicts in discussions, and the usability is vastly improved.
  • Instead of indenting, each comment is in its own box, along with its replies. It makes it much easier to follow each post and its replies, and it’s much nicer on the horizontal whitespace. Hopefully, it will be the death of the ‘arbitrary section break’!
  • Each post has its own history page, making it easy to see what’s going on with individual threads without trying to navigate the history of a whole page.
  • It’s easy to move threads between pages, preserving the page history.
  • Discussions are never ‘archived’. Instead, older discussions fall to the bottom of the page, and eventually they drop off entirely, to hit a new page. If you missed the chance to have your say, just reply to a discussion and it’ll be bumped right up to the top of the page again!
  • Discussions with recent changes are at the top of the page. Discussions that have fallen dormant fall to the bottom. It’s easy to find out what’s happening!
  • You can watch individual threads of a discussion, and even get an email when they’re replied to.
  • It’s easy to link to a discussion, and the links are permanent unless the discussion is deleted. There’s no need to point to an archive or to an old revision ID.
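The ordering behavior in the list above (recently changed discussions on top, dormant ones sinking and eventually spilling onto later pages, a reply bumping a thread back up) can be sketched in a few lines of Python. This is only an illustration of the idea, not LiquidThreads code; the `Thread` class and the `page_order` and `paginate` functions are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Thread:
    """A discussion thread with the time of its most recent change."""
    subject: str
    last_modified: datetime
    replies: List[str] = field(default_factory=list)

    def reply(self, text, when):
        """Add a reply; this also marks the thread as recently changed."""
        self.replies.append(text)
        self.last_modified = when


def page_order(threads):
    """Most recently changed threads first, dormant ones last."""
    return sorted(threads, key=lambda t: t.last_modified, reverse=True)


def paginate(threads, per_page):
    """Split the ordered threads into pages; old threads land on later pages."""
    ordered = page_order(threads)
    return [ordered[i:i + per_page] for i in range(0, len(ordered), per_page)]
```

Because the position is derived from `last_modified` each time the page is built, nothing ever has to be moved to an archive: replying to an old thread simply changes where it sorts.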

If you’re interested, I’ve put together a test setup for you to play with it.

As always, questions, comments and suggestions are more than welcome, in the comments or elsewhere.


8 Comments

First usability release, Acai, is now available.


The first usability release, Acai, hit Wikipedia and sister projects this afternoon. The new skin, Vector, and the enhanced toolbar can be turned on in user preferences under “Appearance” and “Editing”. The search results page now has a new layout with less daunting information. Vector is only available for left-to-right languages at the moment due to IE6 incompatibility. However, the enhanced toolbar can be selected in all languages, and the new search results page is enabled globally.

We could not roll out two features we had planned. First, warning messages for unsaved changes when a user switches away from the edit tab did not work properly, so they are disabled. Please be careful when you switch away from the edit tab. Second, importing language-specific configuration for special characters was not graceful, so we disabled the special character function in the toolbar. We are working on the fixes and plan to roll them out as soon as we have stable solutions.

The usability project wiki has Vector and the new toolbar as defaults, so if you prefer to check them out without changing your preferences, it is a good place to visit first. Let us know what you think. We would love to hear from you.

Best,

Naoko


No Comments

Open Translation Tools 2009 report

View of the towers of De Waag, Amsterdam

With six projects in over 250 languages, multilingual communication and content translation are big priorities for us. That’s one reason I was excited to go to the Open Translation Tools 2009 conference and be in the same room with 80 other translators, content providers and developers all working in the open translation space. Another reason is that the conference was held in Amsterdam in the old city center, in a beautiful venue right by one of the canals.

We have some amazing opportunities to collaborate with folks on other projects, from translation memory based systems like that in use by the World Wide Lexicon to source code string repository interfaces like Transifex. As one person put it, the perfect testbed for crowd-sourced translation is Wikipedia; if we can’t make it work there, where can it work? I also had a chance to talk with Gerard Meijssen and Siebrand Mazeland about new ways to facilitate tighter integration with translatewiki.net and to encourage more projects to make use of the translatewiki facilities. It should be a really productive year.

Folks told me to go visit the Van Gogh Museum, so I was dismayed to find that they don’t allow photography. However, the Wiki Loves Art NL project, organized by the NL Wikimedia chapter, had reached an agreement with the museum to allow two small groups in for photographs, during the week I happened to be there! So, come Tuesday morning, I was one of 20 lucky Wikimedia community members and photojournalists to be given private access to the Van Gogh collection. Some photos from the group are already available on the flickr group from which they will be uploaded to the Commons.

Right after the conference I went to the first two days of the OTT book sprint, which had as its goal the production of a comprehensive manual for beginner volunteer translators of open content with open tools. Once again we were in an awesome venue (see the picture; we were in one of the turrets!) and under the expert guidance of Adam Hyde we got a huge amount of content generated in just a few days.

On the last day I skipped town to go visit a colleague on one of the Wikimedia projects; we’ve worked closely together for over two years and had never met face to face. Perhaps that was the most important part of the whole trip: bringing our virtual community into the real world one person at a time.

1 Comment

Downtime on en.wikipedia.org resolved

We had 52 minutes of downtime on the English-language Wikipedia site today; only en.wikipedia.org was affected. Our master database server was thrown into a funky state in which hundreds of access threads were stuck in the “statistics” state — which seems to be MySQL’s way of saying “I’ve fallen and I can’t get up”.

It’s unclear exactly what set it off, but basically nothing works until you restart MySQL. After switching the site to an alternate master database, all has been well.

At 52 minutes from start of event, this took us a bit longer than I’d like to resolve — we had to percolate through a couple levels of alert calls before we finished diagnosing it and getting the DB switch pushed through. (Sorry to wake you up early Tim!)

A similar event in future should be fixable within a few minutes, thanks to Tim’s work on making the master-switch system more foolproof. We’re fixing up our internal documentation so all our site ops will now know how to run the database master switch script next time!


– brion

4 Comments