Feed, Rinse, Repeat

May 17, 2006

Cory Doctorow nails it:

I can understand that certain keywords should never show up in my RSS reader, while others should always show up. It’s about time that my computer can be instructed to do the grunt work of checking to make sure that the stuff I know I hate and the stuff I know I love go into the right hoppers.

I called this the concept of Weed and Feed. Doctorow is praising a service called Feed Rinse, which allows you to “automatically filter out syndicated content that you aren’t interested in.” Doctorow and I both feel this capability is long overdue.
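
For the curious, here is a minimal sketch of the two hoppers Doctorow describes. It is not how Feed Rinse works (I have no idea how Feed Rinse works); the function name, the keyword lists, and the sample titles are all made up for illustration, and I am assuming that a "love" keyword should beat a "hate" keyword when both appear.

```python
def sort_into_hoppers(titles, love, hate):
    """Split entry titles into a keep list and a drop list by keyword.

    Assumption: a 'love' keyword wins over a 'hate' keyword, and titles
    matching neither list are kept, since nothing says to weed them.
    """
    keep, drop = [], []
    for title in titles:
        text = title.lower()
        if any(word in text for word in love):
            keep.append(title)
        elif any(word in text for word in hate):
            drop.append(title)
        else:
            keep.append(title)
    return keep, drop


# Made-up sample titles, purely for illustration.
titles = [
    "Atom 1.0 patch lands",
    "Celebrity gossip roundup",
    "Gossip about the Atom working group",
]
keep, drop = sort_into_hoppers(titles, love=["atom"], hate=["gossip"])
print(keep)
print(drop)
```

Matching on plain substrings is crude (a hated "cat" would also weed "category"), so a real rule would probably want word boundaries.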

Is this the job of an outside service, though, and not the job of your own personal reader? Of course it is—an up-and-running service beats an imaginary weed-and-feed reader any day.

It’s time to sign up for Feed Rinse. Worry about pricing nightmares and possible XML transformation problems later! Someone is finally willing to do the job Live Bookmarks, Bloglines, Gregarius, Planet, etc. should have been doing all along.

Google Images Is The Worst! True That! Double True!

February 24, 2006

More than two months ago, “Lazy Sunday” (AKA “The Chronic-what-cles of Narnia”), a mildly amusing Saturday Night Live sketch, was put onto the Internet and spread rapidly. The sketch praised Alexander Hamilton, cupcakes, and Google Maps.

But for all the love “Lazy Sunday” shows for the search engine, Google Images has no idea the sketch exists.

Two months later! And it’s not because no one is taking pictures of this sketch—they are. Google Images is using an index that is many weeks out of date.

This isn’t even the most notorious occurrence. People searching for some of the most newsworthy photographs of recent history, the photos from Abu Ghraib, would find nothing on Google Images until well after the November 2004 elections, a lag time of more than half a year.

Google Images is now more responsive when it comes to news photographs. A search for “danish cartoons” turns up photographs from news agencies that were taken as recently as February 15. It’s an improvement, but it doesn’t give me my Chris Parnell animated gifs any sooner.

Tagging Ourselves to Death

February 22, 2006

All Tom Bridge wanted were some photographs of Washington, DC. What he got was a giant panda.

This giant panda does happen to be a resident of the National Zoo, and therefore the District of Columbia. But the photos are hardly about DC, so Bridge doesn’t want to see them. He’s being buried in pandas! His blog entry singles out one particular user who adds a “dc” tag where she shouldn’t, but she’s only one of at least a few offenders.

First of all, what kind of a masochist is Bridge that he subscribes to such a high-traffic feed (every photo tagged with DC) instead of just browsing them at Flickr every once in a while? Most feed readers demand an attention that Flickr photos just aren’t worth.

Secondly, this onslaught of pictures is a problem whether you feed or browse Flickr. Take a glance at the photos tagged with Princeton. Right now you must scroll past a massive set of one young man’s graduation photos (evidently uploaded three-quarters of a year after the fact).

Flickr likes to cluster tags elsewhere; if one user uploads a set of photos with the same tag at the same time, why show all of them when tag browsing? They are likely very similar.

So show the first few photos, then throw in a Google-esque [More “DC” Results from Tom Bridge] link. Or whatever the trendy AJAX way to do this would be.

This works for tag browsing on the site, but what about the feed? A similar link could be added, but it would depend on a delay between uploading the photos and updating the feed. Are feed subscribers more discerning than they are impatient? I’d rather not ask.

Without using a delay, we can turn to my imagined Weed and Feed Reader. This discerning program could throw out all the Flickr feed entries from the same author that arrive within a close period of time.
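
A rough sketch of that weeding rule, under my own assumptions: entries arrive as (author, published, title) tuples, and "a close period of time" means thirty minutes. The function name, the window, and the sample uploads are all invented for illustration; nothing here comes from Flickr or from any real reader.

```python
from datetime import datetime, timedelta


def collapse_bursts(entries, window=timedelta(minutes=30)):
    """Keep only the first entry of each same-author burst.

    `entries` is a list of (author, published, title) tuples. An entry is
    dropped when the same author's previous entry, kept or not, arrived
    no more than `window` earlier. The 30-minute default is a guess.
    """
    last_seen = {}
    kept = []
    for author, published, title in sorted(entries, key=lambda e: e[1]):
        previous = last_seen.get(author)
        if previous is None or published - previous > window:
            kept.append((author, published, title))
        last_seen[author] = published
    return kept


# Made-up uploads: one author floods the tag, another posts once.
day = datetime(2006, 2, 22)
entries = [
    ("panda_fan", day.replace(hour=12, minute=0), "Panda 1"),
    ("panda_fan", day.replace(hour=12, minute=1), "Panda 2"),
    ("panda_fan", day.replace(hour=12, minute=2), "Panda 3"),
    ("tourist", day.replace(hour=12, minute=5), "Capitol dome"),
    ("panda_fan", day.replace(hour=15, minute=0), "Panda 4"),
]
kept = collapse_bursts(entries)
print([title for _, _, title in kept])
```

Because the clock resets on every entry in a burst, a slow trickle of uploads from one author stays collapsed until that author actually goes quiet for the full window.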

A problem for another time is Flickr’s use of space-separated tags. Who knows how many of those DC pictures are really pictures of DC Talk?

Web 2.0 No!

February 16, 2006

A recent Anne van Kesteren article got me hooked on the Web 2.0 Validator! The validator will check any site against the criteria of the future web!

In the true, lazy spirit of Web 2.0, the Validator’s criteria come from its users. Through del.icio.us bookmarks and regular expressions, sites are tested by what the “distributed community” considers the “rules” of Web 2.0.

And as you’d expect, most of these rules are poorly defined.

Just look at these searches:

  1. According to the tests, Google Maps is apparently not very Web 2.0! Though the site is probably the most popular AJAX implementation ever, it fails both of the Validator’s AJAX tests. It doesn’t use a file called prototype.js and doesn’t make any inline XMLHttpRequest calls. The Validator doesn’t test for anything else AJAX-related, so Google Maps is out of luck.
  2. The previous example notwithstanding, sites that embed a Google Map are very Web 2.0. Hence the very well-made rule “Uses Google Maps API”. Unfortunately one of the most distinctive uses of Google Maps, Map Sex Offenders, does not pass the test. All of its Google magic occurs in an iframe that the Validator does not detect. The page inside this frame, though, passes the test.
  3. One rule claims to test for “links to BoingBoing.” Unfortunately what this rule actually looks for are appearances of the URL boinboing.net. I’m sure Cory Doctorow appreciates this Plam Pilotism.
  4. Worst of all is the rule that got van Kesteren so excited: “appears to be web 3.0”. Look closely at that del.icio.us list and you’ll find that this so-called rule only tests for the letter “a.” An “a”! Anywhere in the document! I suppose this means Web 3.0 will reach as-yet-unimaginable levels of hype.

Since none of the existing tests seemed to search for tags, I made it my mission to abandon my total ignorance of regular expressions and make a rule that would test for rel attributes with values of tag.

I got as far as rel="[^"]+", an expression that would select rel="and everything in here" before I reached my dim-witted limit. How do you find the word tag inside that? Instead of cracking that puzzle, I dumbed my search down to a few common ways to declare a tag: rel="tag"|rel="tag |rel="category tag".
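
For anyone less dim-witted, here is one way the puzzle might be cracked: treat the rel value as a space-separated list and require tag to be bounded by whitespace or by the quotes. This is a sketch in Python's regular expression flavor, which may not be the flavor the Validator uses, and it assumes double-quoted attribute values, just as my attempt did.

```python
import re

# Assumption: attribute values are double-quoted.
# "tag" must be a whole token: preceded by the opening quote or whitespace,
# followed by whitespace or the closing quote.
REL_TAG = re.compile(r'\brel="(?:[^"]*\s)?tag(?:\s[^"]*)?"')

samples = [
    '<a rel="tag" href="/tags/atom">',
    '<a rel="category tag" href="/tags/atom">',
    '<a rel="tag nofollow" href="/tags/atom">',
    '<a rel="tagline" href="/about">',
    '<a rel="nofollow" href="/elsewhere">',
]
for sample in samples:
    print(bool(REL_TAG.search(sample)), sample)
```

The first three samples match and the last two do not. Single-quoted or unquoted attributes slip through, so this is still the easy way out.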

I now see that other rules have already tested for rel="tag" under different names (“Uses Microformats”?). But duplicated effort, where “effort” means taking the easy way out? What could be more Web 2.0 than that?!

That’s all the validation I need.

Atom All-Stars!

January 30, 2006

Need to do some one-stop Atom shopping? Lucky for you Planet Atom is open for business!

As expected of “planet” sites, Planet Atom collects and consolidates the individual feeds of Atom-minded gurus. It isn’t yet very pretty, but so what? A “planet” site is for feed funneling, and Planet Atom does it very well.

That might not be apparent at first glance. Doing a view-source on the Planet Atom feed reveals thousands of lines of whitespace! Yikes!

But look at all the well-behaved XML! Look at the preserved entry ids! Apart from the weird <a shape="rect"> in everyone’s atom:content, things look hunky-dory.

I wish Planet Atom as a site did a little bit more with its feed content. In particular, feed entries are coming in with different category schemes. It would be nice to see this information on the site.

Beyond categorization, Planet Atom might benefit from some editorial oversight. Like many “planets”, Planet Atom takes every entry from every participant, whether or not the entry fits the topic. Currently (using the Friday afternoon update) 11 entries out of 40 mention Atom. Only 11 out of 40? Not exactly “A fusion of atom-related news.” Heaven knows what planetary damage this is causing Phil Ringnalda.

Perhaps the scope of Planet Atom will be limited, but their list of contributed feeds doesn’t suggest it.

Wait; let’s take a closer look at these contributors:

Mark Pilgrim

Could it be? The return of Mark Pilgrim? Or is this just wishful thinking?

I’m inclined to think the latter, no matter how much I enjoy his rhetorical flourishes. I even enjoyed the few entries on his IBM blog following his main weblog shutdown.

Apparently, though, there are Pilgrim-watchers more rabid than me. Last week a mailing list message by Pilgrim inspired a flurry of links. And I thought bloggers only linked to other blogs!

I half-expect that on my next visit to the Mariano estate, I’ll pass through the greater Raleigh area to find that Pilgrim has stapled Yard Sale flyers on various telephone poles. The flyers will be surrounded by frenzied Pilgrim fans.

With devotees like these, who needs a planet site?

Our Feeds, Ourselves

January 19, 2006

On January 5, Nikolas Coukouma made what appeared to be the final adjustments to a patch for WordPress that would allow it to generate Atom 1.0.

The experience drove him insane.

Or, more accurately, Coukouma was so disgusted that he summed up all his hard work with the sentence, “I never want to look at WordPress again.”

By the way, what happened to his hard work? According to Trac, Coukouma’s patch is sitting right where he left it, with no discussion or integration with any future version of WordPress.

That’s a shame, because the patch addresses all the shortcomings I could spot, not that a non-programmer playwright/blogger is someone you want auditing your code. If the patch works as advertised, what’s the hold-up? Do the WordPress developers not like their XHTML sent as escaped HTML in the feed? Do they not like that the patch removes Atom 0.3?

Or is it something else entirely? Some in the WordPress community are looking at the tool’s other feeds, and think it might be time for RSS 1.0 and RSS 0.92 to walk the plank. Lording over the fates of two feed formats is a lot more fun than making sure another one works as expected.

Luckily Owen Winkler is trying to pull back from this feed deathmatch. WordPress needs to handle comments, categories, and permanent links in a consistent way, regardless of the feeds it produces. Winkler is talking about the next generation of WordPress, while Coukouma’s patch only updates the existing Atom 0.3 template (for example, the patch does not create an Atom Comments Feed).

So, should WordPress’s Atom template be updated now? Or should everything feed-related be completely redone later? Why can’t we do both?

Image Evilness!

January 6, 2006

Sam Angove of rephrase describes a text obfuscation technique so cruel, so hostile to the web, yet so inventive that it’s like the offending web designers learned the technique from some mirror universe version of A List Apart.

POINTILLIST IMAGE TEXT, by Evil Joe Clark.  Tired of people copying, scraping, or right-clicking on your valuable text?  Worried that the blind may somehow conquer your precious Javascript?  Pointillist Image Text can protect you.  Evil Joe Clark tells us how.

Talking ’Bout URIs

January 4, 2006

Consistent web identification seems to be the topic of the day; observe posts by Gordon Weakliem and Christian Stocker. By chance my own experiences allow me to contribute to this discussion.

Late last night on our usual weblog I mocked The Village Voice in an entry titled Shaw ’Nuff, complete with that copied-and-pasted right single quote (doubling here as an apostrophe). I finished the entry, hit publish, then ambled over to the main page and hovered over some links. That’s when I saw this permanent link:

mikemariano.com/weblog/2006/01/04/shaw-%e2%80%99nuff/

Gyah!

Rather than stripping the apostrophe, WordPress had masticated it into an undigested, percent-encoded nightmare!

There’s nothing cool about that URI, so I immediately committed a web no-no and changed the Post Slug to shaw-nuff. As far as I know, though, the dark magic of Ping-o-matic (or is it now Ping-o-mattic?) had already sent the mangled link far across the Web, so I added a permanent redirect in .htaccess.
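
To see where the nightmare comes from, here is a small sketch in Python. It is emphatically not WordPress's actual slug code (which I have not read); the slugify function is my own hypothetical stand-in. The first half reproduces the mangled slug by percent-encoding the UTF-8 bytes of the right single quote; the second half strips anything that isn't a plain ASCII letter, digit, space, or hyphen.

```python
import re
import unicodedata
from urllib.parse import quote

title = "Shaw \u2019Nuff"  # U+2019 is the right single quote

# Naive slug: lowercase, spaces to hyphens, percent-encode whatever is left.
naive = quote(title.lower().replace(" ", "-")).lower()
print(naive)  # shaw-%e2%80%99nuff


def slugify(text):
    """Hypothetical slug rule: ASCII letters, digits, and hyphens only."""
    # Decompose accented letters, then discard every non-ASCII character.
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    cleaned = re.sub(r"[^a-z0-9\s-]", "", ascii_text.lower())
    # Collapse runs of spaces and hyphens into single hyphens.
    return re.sub(r"[\s-]+", "-", cleaned).strip("-")


print(slugify(title))  # shaw-nuff
```

The three percent-encoded bytes (e2, 80, 99) are simply the UTF-8 encoding of that one curly quote, which is why a single character turns into nine.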

I was surprised; we no longer live in the WordPress dark ages. Authors can no longer fall into the same trap Eric Meyer did fifteen months ago. I use cites, ems, and apostrophes in my titles all the time, and WordPress 1.5.2 never lets me down.

But contracting “enough” as “’nuff” was too much for the program to handle. And barring my emergency surgery, George Bernard Shaw would have one ugly link.

Do you think WordPress MU/2.0 has corrected this problem? Only one way to find out….

Atom 1.0: Low Priority

November 16, 2005

Did anyone else see what happened Sunday on the WordPress Atom 1.0 ticket?

Ticket #1526: Have wp-atom.php generate Atom 1.0

Sun Nov 13 02:33:32 2005: Modified by matt

  • milestone changed from 1.6 to 2.1
  • priority changed from high to low

Atom 1.0, meet the back burner.

Back in July, Mullenweg said, “It looks like WordPress is going to continue supporting 0.3 while adding 1.0 support.” That’s not what it looks like now. It’s a shame when you consider Sam Ruby’s wishes….

By the time Longhorn comes out, I have every intent to make Atom 0.3 feeds as rare as Atom 0.2 feeds are now; which is to say, practically non-existent.

Instead Sam’s worst fears are coming true. When hungry baby Longhorn finally emerges from the shell, WordPress 1.6 will be cramming Atom 0.3 feeds down its throat. Microsoft is going to have to let it digest them.

In the meantime I’ll keep watching Atom/WordPress mavericks such as Ben de Groot and Matthew Gertner, who both have valid Atom 1.0 feeds, as long as you don’t look at them sideways. (The feed-level IDs and rel="self"s are hard-wired to the main feed, making category feeds and others slightly incorrect.) I’d also like to see what the dissatisfied Phil Ringnalda comes up with.
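
Looking at a feed sideways can be automated. The sketch below, using Python's standard XML parser, checks whether a feed's rel="self" link matches the address the feed was fetched from. The function name, the example.org URLs, and the sample feed are all invented for illustration; they are not taken from either maverick's site.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 namespace


def self_link_matches(feed_xml, fetched_url):
    """Return True if the feed-level rel="self" link equals fetched_url."""
    root = ET.fromstring(feed_xml)
    for link in root.findall(ATOM + "link"):
        if link.get("rel") == "self":
            return link.get("href") == fetched_url
    return False  # no self link at all


# A made-up category feed whose id and self link point at the main feed.
CATEGORY_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <id>http://example.org/feed/atom/</id>
  <title>Example weblog: Theatre</title>
  <updated>2005-11-16T00:00:00Z</updated>
  <link rel="self" href="http://example.org/feed/atom/"/>
</feed>"""

print(self_link_matches(CATEGORY_FEED, "http://example.org/category/theatre/feed/atom/"))
print(self_link_matches(CATEGORY_FEED, "http://example.org/feed/atom/"))
```

The first check fails and the second passes, which is the hard-wiring problem in miniature: the category feed claims to be the main feed.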

But any maverick solutions will only help out our own mikemariano dot com. Here at mikemariano dot WordPress dot com, it’s Atom 0.3 until 2007!

Weed and Feed

November 5, 2005

Phil Ringnalda yesterday questioned the usefulness of omnibus “Planet” feeds. To do his part, he has decided to feed Planet Mozilla a finely-tuned subcategory of his Mozilla writings.

But why should Phil have to do all the work? A subscriber to all things Mozilla should expect the Planet feed to contain all things Mozilla, even when people repeat themselves.

So how do readers manage such large feeds? The work has to come at the feed reader level. Let readers custom filter their feeds and Planets will become a lot more useful.

For months Greasemonkey has allowed us to remove Xeni Jardin’s entries from Boing Boing directly on their web page. Why can’t we do the same for the Boing Boing feed? I have yet to see a feed reader that would allow us to set up a simple de-Xenifying rule.

People want this customization; The New York Times offers a feed creator that allows readers to choose multiple news departments for a single feed. This creates hundreds of combinations, hundreds of feeds filled with the same information. Why not feed it all and let the reader sort ’em out?

The Village Voice supplies an Arts section feed. They’ve already narrowed it down this far, but I, the cranky, impatient reader, only want to read theatre articles. Should I badger them to create a theatre-only feed? Or should I ask my feed reader to Display Only entries that contain theater in the guid?
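
Here is roughly what I am asking for, sketched in Python. It assumes the reader has already parsed each entry into a dictionary with author and guid keys; the function name, the rule names, and the sample entries (including the example.com guids) are all hypothetical.

```python
def apply_rules(entries, hide_authors=(), guid_must_contain=None):
    """Weed a list of parsed feed entries.

    hide_authors: drop any entry written by one of these authors.
    guid_must_contain: if set, display only entries whose guid contains it.
    """
    kept = []
    for entry in entries:
        if entry["author"] in hide_authors:
            continue
        if guid_must_contain and guid_must_contain not in entry["guid"]:
            continue
        kept.append(entry)
    return kept


# Made-up entries standing in for an Arts section feed.
arts_feed = [
    {"author": "Critic A", "guid": "http://example.com/theater/0545,review-one"},
    {"author": "Critic B", "guid": "http://example.com/film/0545,review-two"},
    {"author": "Critic C", "guid": "http://example.com/theater/0545,review-three"},
]

theater_only = apply_rules(arts_feed, guid_must_contain="theater")
print([entry["author"] for entry in theater_only])

without_c = apply_rules(arts_feed, hide_authors={"Critic C"})
print([entry["author"] for entry in without_c])
```

The same function covers the de-Xenifying rule: name the author to hide and leave the guid test off.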

If someone out there knows a feed reader that can do the latter, tell me. Then let the weeding begin.