Archive for February, 2006

Google Images Is The Worst! True That! Double True!

February 24, 2006

More than two months ago, “Lazy Sunday” (AKA “The Chronic-what-cles of Narnia”), a mildly amusing Saturday Night Live sketch, was put onto the Internet and spread rapidly. The sketch praised Alexander Hamilton, cupcakes, and Google Maps.

But for all the love “Lazy Sunday” shows for the search engine, Google Images has no idea the sketch exists.

Two months later! And it’s not because no one is taking pictures of this sketch—they are. Google Images is using an index that is many weeks out of date.

This isn’t even the most notorious occurrence. People searching for some of the most newsworthy photographs of recent history—photos from Abu Ghraib—would find nothing on Google Images until well after the November 2004 elections—a lag time of more than half a year.

Google Images is now more responsive when it comes to news photographs. A search for “danish cartoons” turns up photographs from news agencies that were taken as recently as February 15. It’s an improvement, but it doesn’t give me my Chris Parnell animated gifs any sooner.

Tagging Ourselves to Death

February 22, 2006

All Tom Bridge wanted were some photographs of Washington, DC. What he got was a giant panda.

This giant panda does happen to be a resident of the National Zoo, and therefore the District of Columbia. But the photos are hardly about DC, so Bridge doesn’t want to see them—he’s being buried in pandas! His blog entry singles out one particular user who adds a “dc” tag where she shouldn’t, but she’s only one of at least a few offenders.

First of all, what kind of a masochist is Bridge that he subscribes to such a high-traffic feed—every photo tagged with DC—instead of just browsing them at Flickr every once in a while? Most feed readers demand a level of attention that Flickr photos just aren’t worth.

Secondly, this onslaught of pictures is a problem whether you feed or browse Flickr. Take a glance at the photos tagged with Princeton. Right now you must scroll past a massive set of one young man’s graduation photos (evidently uploaded three-quarters of a year after the fact).

Flickr likes to cluster tags elsewhere; if one user uploads a set of photos with the same tag at the same time, why show all of them when tag browsing? They are likely very similar.

So show the first few photos, then throw in a Google-esque [More “DC” Results from Tom Bridge] link. Or whatever the trendy AJAX way to do this would be.
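The collapsing itself is simple enough to sketch. Here’s one way it might work (the photo stream, owner names, and two-photo cutoff are all my own made-up examples, not anything Flickr actually does):

```python
from itertools import groupby

# Hypothetical photo stream as it might appear on a tag page,
# newest first: (owner, title) pairs.
photos = [
    ("tbridge", "Washington Monument"),
    ("gradguy", "Graduation 1"),
    ("gradguy", "Graduation 2"),
    ("gradguy", "Graduation 3"),
    ("gradguy", "Graduation 4"),
    ("pandafan", "Tai Shan napping"),
]

SHOW = 2  # photos per consecutive run before collapsing

def collapse(stream, show=SHOW):
    """Group consecutive photos by owner; truncate long runs
    behind a Google-esque [More results] link."""
    out = []
    for owner, run in groupby(stream, key=lambda p: p[0]):
        run = list(run)
        out.extend(run[:show])
        if len(run) > show:
            out.append((owner, f"[More results from {owner}]"))
    return out

for owner, title in collapse(photos):
    print(owner, "-", title)
```

The graduation set shrinks from four entries to two plus a link; everyone else’s photos come through untouched.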

This works for tag browsing on the site, but what about the feed? A similar link could be added, but it would depend on a delay between uploading the photos and updating the feed. Are feed subscribers more discerning than they are impatient? I’d rather not ask.

Without using a delay, we can turn to my imagined Weed and Feed Reader. This discerning program could throw out all the Flickr feed entries from the same author that arrive within a close period of time.
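A sketch of how that weeding might work, assuming hypothetical (author, timestamp, title) entries and a made-up 30-minute window—a real reader would parse these out of the Flickr feed:

```python
from datetime import datetime, timedelta

# Hypothetical feed entries: (author, posted_at, title).
entries = [
    ("gradguy", datetime(2006, 2, 22, 10, 0), "Graduation 1"),
    ("gradguy", datetime(2006, 2, 22, 10, 1), "Graduation 2"),
    ("gradguy", datetime(2006, 2, 22, 10, 2), "Graduation 3"),
    ("tbridge", datetime(2006, 2, 22, 12, 0), "Dupont Circle"),
    ("gradguy", datetime(2006, 2, 23, 9, 0), "Reunion"),
]

WINDOW = timedelta(minutes=30)

def weed(feed, window=WINDOW):
    """Keep only the first entry from each author within a rolling
    time window; a long burst of uploads collapses to one entry."""
    last_seen = {}
    kept = []
    for author, when, title in feed:
        prev = last_seen.get(author)
        if prev is None or when - prev > window:
            kept.append((author, when, title))
        # Track every entry, kept or not, so a burst stays collapsed.
        last_seen[author] = when
    return kept

for author, when, title in weed(entries):
    print(when, author, title)
```

Three graduation photos uploaded a minute apart become one entry; the same author posting again the next day gets through fine.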

A problem for another time is Flickr’s use of space-separated tags. Who knows how many of those DC pictures are really pictures of DC Talk?

Web 2.0 No!

February 16, 2006

A recent Anne van Kesteren article got me hooked on the Web 2.0 Validator! The validator will check any site against the criteria of the future web!

In the true, lazy spirit of Web 2.0, the Validator’s criteria come from its users. Through del.icio.us bookmarks and regular expressions, sites are tested by what the “distributed community” considers the “rules” of Web 2.0.

And as you’d expect, most of these rules are poorly defined.

Just look at these searches:

  1. According to the tests, Google Maps is apparently not very Web 2.0! Though the site is probably the most popular AJAX implementation ever, it fails both of the Validator’s AJAX tests. It doesn’t use a file called prototype.js and doesn’t make any inline XMLHttpRequest calls. The Validator doesn’t test for anything else AJAX related, so Google Maps is out of luck.
  2. The previous example notwithstanding, sites that embed a Google Map are very Web 2.0. Hence the very well-made rule “Uses Google Maps API”. Unfortunately one of the more unusual uses of Google Maps—Map Sex Offenders—does not pass the test. All of its Google magic occurs in an iframe that the Validator does not detect. The page inside this frame, though, passes the test.
  3. One rule claims to test for “links to BoingBoing.” Unfortunately what this rule actually looks for is appearances of the URL boinboing.net. I’m sure Cory Doctorow appreciates this Plam Pilotism.
  4. Worst of all is the rule that got van Kesteren so excited: “appears to be web 3.0”. Look closely at that del.icio.us list and you’ll find that this so-called rule only tests for the letter “a”. An “a”! Anywhere in the document! I suppose this means Web 3.0 will reach as-yet-unimaginable levels of hype.

Since none of the existing tests seemed to search for tags, I made it my mission to abandon my total ignorance of regular expressions and make a rule that would test for rel attributes with values of tag.

I got as far as rel="[^"]+", an expression that would select rel="and everything in here" before I reached my dim-witted limit. How do you find the word tag inside that? Instead of cracking that puzzle, I dumbed my search down to a few common ways to declare a tag: rel="tag"|rel="tag |rel="category tag".
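For the record, the trick I was missing is to repeat the not-a-quote character class on either side of the word: match some non-quote characters, then tag on a word boundary, then more non-quote characters. A sketch (the REL_TAG name and the sample markup are mine, not the Validator’s):

```python
import re

# Find "tag" as a whole word anywhere inside a rel attribute's
# quoted value: rel=" [anything but quotes] tag [anything] "
REL_TAG = re.compile(r'rel="[^"]*\btag\b[^"]*"')

samples = [
    '<a rel="tag" href="/t/dc">dc</a>',            # plain rel="tag"
    '<a rel="category tag" href="/t/dc">dc</a>',   # WordPress-style
    '<a rel="tagging-guide">not a tag</a>',        # "tag" as a prefix only
    '<a rel="nofollow">plain link</a>',            # no tag at all
]

for s in samples:
    print(bool(REL_TAG.search(s)), s)
```

The word boundary `\b` is what keeps rel="tagging-guide" from sneaking through, which my pipe-separated list of hand-picked values couldn’t do either.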

I now see that rel="tag" has already been tested for by other rules (“Uses Microformats”?). But duplicated effort, where “effort” means taking the easy way out? What could be more Web 2.0 than that?!

That’s all the validation I need.