Today I had the first milestone meeting with the folks of Hallo Welt about the TIFF support they are implementing for MediaWiki. This is one of the projects Wikimedia Germany was offering contracts for a while back. Now we are starting to see the first results, cool!
So, what will full TIFF support give us? Nothing spectacular, but something quite useful. TIFF is an image format widely used by museums and in scientific research. It’s also the de-facto standard used in print/reproduction. It is, however, rarely seen on the web, and browsers generally are not able to display TIFF images. Still, the need to deal with TIFF files has increased lately, as we get more and more media from museums and archives. Especially for the people who work on image restoration, it is important to have the original digital version of the image around – which is usually a TIFF file. So, TIFF uploads were enabled on Commons a few months ago. But MediaWiki can’t render them, and neither can browsers. They can’t be used as images on the wiki.
So, what Hallo Welt is doing now is implementing rendering support for TIFF files – which is not so easy, because TIFF files may contain multiple images or pages, similar to PDF files, or the DjVu files we use to represent scanned books. But the project is coming along pretty well, and it looks like it will bring some small improvements also to the existing support for PDF and DjVu files. We are also experimenting with automatic user interface testing using the Selenium framework. If this works out well, we may well use it for more things on MediaWiki in the future.
The project is scheduled to be completed in November, and I hope we will be testing it on the live sites soon after. So, look out for more pretty pictures!
#1 by Durova on September 11th, 2009
Will you be supporting 16-bit TIFFs now, or only 8-bit? A portion of Library of Congress files currently require conversion.
#2 by Daniel on September 12th, 2009
@Durova: Not sure – it will support whatever ImageMagick supports.
#3 by Micke on September 13th, 2009
I think ImageMagick can handle up to 32 bits depending on the version: http://www.imagemagick.org/script/architecture.php
/Micke
#4 by Adam Cuerden on September 14th, 2009
That’s great, but why is it possible to do this when PNG, the much more efficient lossless format, which generally has half the file size of TIFF and thus allows twice as big an image to be uploaded, is still stuck at the appallingly low limit of 12.5 megapixels, and requests to fix this on Bugzilla just get met with complete stonewalling and talk about “maybe improving the message”?
We’re thus providing a perverse incentive for providing lower resolution lossless scans: You won’t have to deal with idiots deleting your files because they don’t display, and, worse, don’t display in a way that makes the file look corrupt.
I did some calculations. With a 100MB upload limit, 42 megapixels of data is about the maximum for PNGs; for TIFFs, 21 megapixels. That means that PNGs can be about 6000×8000, TIFFs only 4000×6000 pixels. For an average size engraving, that’s the difference between a mediocre 500dpi, and a decent 720. (Engravings can have a LOT of very fine detail, particularly steel engravings).
Will you be doing something to fix the perverse incentives this action causes?
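The arithmetic in the comment above can be checked with a small sketch. It assumes a 3:4 aspect ratio, which the comment does not state, and takes the 42 and 21 megapixel budgets as given rather than deriving them from the 100MB limit. The function name is made up for illustration.

```python
import math

def dimensions_for_budget(megapixels, aspect_w=3, aspect_h=4):
    """Return (width, height) in pixels for a pixel budget at a given aspect ratio.

    The 3:4 default is an assumption; the comment does not name one.
    """
    total = megapixels * 1_000_000
    width = math.sqrt(total * aspect_w / aspect_h)
    height = width * aspect_h / aspect_w
    return round(width), round(height)

png_w, png_h = dimensions_for_budget(42)    # PNG budget from the comment
tiff_w, tiff_h = dimensions_for_budget(21)  # TIFF budget from the comment

# Halving the pixel count shrinks each side by sqrt(2),
# so the achievable dpi for the same physical print drops by that factor.
linear_ratio = math.sqrt(42 / 21)

print("PNG :", png_w, "x", png_h)
print("TIFF:", tiff_w, "x", tiff_h)
print("linear resolution ratio:", round(linear_ratio, 2))
```

Under these assumptions the sides come out somewhat smaller than the rounded 6000×8000 and 4000×6000 quoted above, and the linear ratio of about 1.41 is broadly in line with the 500 versus 720 dpi comparison.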
#5 by Daniel on September 14th, 2009
@Adam: first: yes, we are thinking about doing something about it. The idea is to render a medium-sized (~6 megapixel) version on upload and then generate smaller thumbnails from that. This is an idea that was discussed with Brion & co, but someone has to go and do it. It’s on the list of things WMDE would like to have, but I can’t promise we’ll do it.
In the meantime: did you notice that the 12mpx limit applies to PNG but not to JPG? I don’t know if it will apply to TIFF; I don’t know enough about the compression mechanism. But comparing them directly is misleading. Also, The Right Thing currently is to emulate the method described above: first, upload the *full* resolution (the 100MB upload limit should not be a problem for images, and getting around that is another issue). If the full res version doesn’t get rendered, upload a scaled version. This way, the original is still available online.
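To make the two-stage idea above concrete, here is a minimal sketch of the size calculation only: fit the original into a roughly 6 megapixel intermediate, then derive thumbnail sizes from that intermediate. It does no actual image processing, and the function names and the thumbnail widths are assumptions for illustration, not anything MediaWiki defines.

```python
import math

def fit_to_pixel_budget(width, height, max_pixels=6_000_000):
    """Scale (width, height) down, keeping aspect ratio, to at most max_pixels."""
    if width * height <= max_pixels:
        return width, height
    scale = math.sqrt(max_pixels / (width * height))
    # Floor both sides so the result never exceeds the budget.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))

def thumbnail_size(width, height, target_width):
    """Size of a thumbnail of the requested width, never larger than the source."""
    target_width = min(target_width, width)
    return target_width, max(1, round(height * target_width / width))

original = (6000, 8000)                       # a large scan, 48 megapixels
intermediate = fit_to_pixel_budget(*original)  # rendered once, on upload
thumbs = [thumbnail_size(*intermediate, w) for w in (120, 250, 800)]

print("intermediate:", intermediate)
print("thumbnails:  ", thumbs)
```

The point of the design is that the expensive step, reading the full-resolution file, happens once; every later thumbnail is scaled from the much smaller intermediate.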
#6 by Michael on September 14th, 2009
I don’t have a lot of visibility as to where the TIFF code is being added, but I would mention that for rendering large TIFFs ImageMagick is probably not ideal (because it loads the full image into memory for resizing).
If we could test something like vips, that could potentially help with quickly generating thumbnails of very large TIFF images without saturating all the available memory.
An example on the vips site has the vips resizer running in 1/5th the time at 1/32 as much memory for a 25 megapixel image.
#7 by Adam Cuerden on September 15th, 2009
JPG is lossy, which means if you re-edit an image enough times, it eventually becomes unusable because of JPEG artefacting. That doesn’t fit in very well with our collaborative ethos, does it? Generally, I upload a PNG AND a JPEG, but, because PNG thumbnailing is handled so badly, the PNGs sometimes get deleted. Plus, the thumbnailing tends to blur the images, but only JPEGs have any sharpening applied, and there are a lot of other such problems.
#8 by Adam Cuerden on September 15th, 2009
By the way, if you’re going down that route, wouldn’t it be better to generate a full-sized JPEG from both TIFFs and PNGs? Then you have the archival copy, but people who just want to look at the image at full resolution can have a link to download the (much lower filesize) JPEG, and that can be used for thumbnailing. You will need the JPEG to have decently high quality, but it’d probably work out fairly well. I’d only do this above 12.5 Megs for PNGs, though, as PNGs are used, for instance, to explain JPEG artefacting, and if we make it so that no lossless medium can be used full-size in articles without becoming a JPEG, we’ve screwed ourselves over.
#9 by Adam Cuerden on September 15th, 2009
A third possibility is to allow administrators only to upload a file to make thumbnails from. This has some advantages with engravings and other images that the thumbnailer has trouble with. For instance, [[Image:Rajpoots_2.png]] is a featured picture on en-wiki, but thumbnails poorly. [[Image:Rajpoots_small.jpg]] is an awful image, but thumbnails very well.
#10 by Bryan on September 15th, 2009
This is because all PNG renderers that I am aware of try to load the entire file uncompressed into memory before resizing and recompressing it.
I wrote a tool some time ago that resizes the image “by row”, which greatly limits its memory footprint. It can be found in Wikimedia SVN as pngds (the C app that does the actual resizing) and PngHandler (the MediaWiki plugin).
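For readers unfamiliar with the “by row” approach, the following is a minimal sketch of the general idea, not the pngds code. It assumes grayscale pixels, an integer shrink factor, and a plain box filter; `synthetic_rows` is a hypothetical stand-in for a decoder that hands over one scanline at a time.

```python
def downscale_rows(rows, factor):
    """Yield box-filtered rows, shrinking by an integer factor in both directions.

    `rows` is any iterable of equal-length lists of grayscale values. Only the
    current input row and one accumulator row are held in memory at a time.
    """
    acc = None
    count = 0
    for row in rows:
        out_width = len(row) // factor
        if acc is None:
            acc = [0] * out_width
        # Sum each horizontal block of `factor` pixels into the accumulator.
        for x in range(out_width):
            acc[x] += sum(row[x * factor:(x + 1) * factor])
        count += 1
        if count == factor:
            # A full block of rows has been seen: emit the averaged output row.
            yield [v // (factor * factor) for v in acc]
            acc = None
            count = 0
    # Trailing rows that do not fill a complete block are dropped in this sketch.

def synthetic_rows(width, height):
    """Hypothetical scanline source; a real tool would decode these from the file."""
    for y in range(height):
        yield [(x + y) % 256 for x in range(width)]

small = list(downscale_rows(synthetic_rows(8, 8), 4))
print(small)
```

Because the input is consumed as a stream, peak memory depends on the image width rather than on the total pixel count, which is what makes this approach attractive for very large files.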
#11 by Steve on September 22nd, 2009
Keep in mind that TIFF has some variants in the wild, some of which don’t conform to the standard spec. SketchbookPro for example simply uses the metaspace differently, which borks its ability to load properly in GIMP.
In a certain way, we don’t need to preserve the TIFF formatting, simply the layers – so a TIFF to XCF process might be the ideal solution.