Tuesday, May 20, 2008

Boingboing, Pheedo, and Cookie Annoyance

I'm one of those paranoid types who likes to keep precise control over who can save cookies, and for how long. For that reason, I have the "Ask me every time" option set in Firefox, so that for each cookie I can choose (and blanket-ban whole domains) whether it's rejected outright, allowed forever, or allowed only for the session. In particular, anything advertising-related doesn't get set at all.

Most of them play pretty nicely with Firefox: they'll try to set a cookie for something like .doubleclick.net, meaning that the same cookie will be used for foo.doubleclick.net, bar.doubleclick.net, whatever. I reject it once, tell Firefox to reject everything from .doubleclick.net forever, and we're all happy.

The way they do this is by having a single point of entry for their cookie setting, so that:
  • They always serve ads from a single URL (like ads.advertising.com)
  • They differentiate the ad either by cookie or by URL (so serving http://ads.advertising.com/8347283479327482937)
  • They properly set the domain to .advertising.com for other usage.
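The domain rule above can be sketched in a few lines. This is a hypothetical illustration of the "domain-match" behavior cookies rely on (the hostnames are just examples): a cookie set with Domain=.advertising.com covers every host under that suffix, which is why one rejection rule covers them all.

```python
def domain_match(host: str, cookie_domain: str) -> bool:
    """Does `host` fall under `cookie_domain`? (Sketch of the suffix rule.)"""
    d = cookie_domain.lstrip(".")
    return host == d or host.endswith("." + d)

# One rule for .advertising.com covers every subdomain they serve from...
for host in ("ads.advertising.com", "foo.advertising.com", "advertising.com"):
    assert domain_match(host, ".advertising.com")

# ...but does not accidentally swallow unrelated lookalike domains.
assert not domain_match("evil-advertising.com", ".advertising.com")
```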

However, then I encountered Pheedo. Their motto appears to be "RSS Advertising Done Right." I'd suggest it's really "RSS Advertising Designed To Annoy Your Audience." I've only encountered them so far via Boingboing, but it's annoying enough that I might actually drop BB from my blog roll and only read it on the web in the future. Here's what they appear to do.

Every single link that they put into your feed has a unique domain name, not just a unique URL (like e61225ff1c0b2a237f8fb7b3efbe3dd6.img.pheedo.com). That means you have to individually reject or accept every single image/ad, which, by the way, I've already adblocked out. The cookie itself properly specifies the domain as .pheedo.com, but the site it's being served from is playing DNS differentiation rather than URL differentiation, meaning that the standard Firefox rules don't apply.
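To make the problem concrete, here's a small sketch (the long hostname is from the example above; the blocklist logic is mine): a blocklist keyed on the serving host never matches, because every link arrives from a host you've never seen, while a suffix rule keyed on .pheedo.com would catch them all.

```python
# Hosts I've already rejected, keyed on the exact serving hostname.
seen_hosts = {"a1b2c3.img.pheedo.com"}

def blocked_per_host(host: str) -> bool:
    """Firefox-style: each distinct hostname triggers its own prompt."""
    return host in seen_hosts

def blocked_by_suffix(host: str) -> bool:
    """What I actually want: one rule for the whole .pheedo.com tree."""
    return host == "pheedo.com" or host.endswith(".pheedo.com")

new_host = "e61225ff1c0b2a237f8fb7b3efbe3dd6.img.pheedo.com"
assert not blocked_per_host(new_host)  # never seen before, so: prompt again
assert blocked_by_suffix(new_host)     # one suffix rule would have covered it
```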

I tried to contact Pheedo on this and ask them to stop, but they didn't reply. C'est la vie.

I guess my annoyance is three-fold:
  • Pheedo, for sucking so hard.
  • Boingboing, for using them.
  • Firefox's cookie wrangling (which I otherwise love, by the way, and which is one main reason why I use it on my Mac rather than Safari or whatever), for not allowing more complicated cookie rejection rules. (AdBlock has no problem blocking Pheedo stuff, so I never even see the ads anyway.)

Sunday, May 11, 2008

Dynamic vs. Static Generation

I've been doing a fair amount of work recently with large systems dealing with semi-dynamic documents in the financial services space (nothing to do with web sites, BTW). One question that inevitably comes up is how much you pre-compute, and how much you rely on your dynamic generation system to do at runtime.

For example, in one of the really old-skool CMS systems (think Vignette circa 1999) you were looking at a pure generation system: text went in, went through a recompute-the-world process, and out came fully formatted, linked HTML. If you've ever looked at CNET, it's clear that they're using at least something based on this technique, because I recognize the classic Vignette-style file names.

However, in a purely dynamic system, all content is in the form of some low-level structured persistence store (e.g. RDBMS) and each page is uniquely generated for each request.

The key differentiators here have to do with what you think your principal scalability issues are:
  • Static systems are ideal for seldom (in the grand scheme of things) updating systems with a massive number of readers (think New York Times or BBC News) and minimal per-user customization
  • Dynamic systems are ideal for constantly updating systems with relatively fewer readers and maximal per-user customization (think web stores)
  • Static systems also have the downside that all your content must be generated before it can be displayed, which in the case of a nearly infinite search space of items means that you have to have everything you might possibly serve on disk. Dynamic systems just generate it as needed.

But is any of that really accurate, particularly with modern page layout technology?

Consider Jeff Atwood's post on how WordPress gobbles CPU time. I know from experience that back when we thought dynamic content was expensive (because CPUs WERE expensive), you spent as much time as possible optimizing away your CPU effort by statically generating everything you could. How much of a typical blog entry really needs to be properly dynamic?

Well, you've got:

  • The text itself. This changes seldom, and seldom enough that in general you're going to want a special annotation when something has been updated.
  • Your little bits on the right. These aren't changing with every post, unless you're doing some sort of randomization or prioritization on your blog roll. But that you can do with a dynamic iframe.
  • Comments.

It's the last bit where I think people have gone crazy with purely dynamic systems. How often is someone posting a comment? Even for a super-hot post, I'd contend maybe once per minute for a few hours. And how long does it take to generate that page? Maybe a tenth of a second on a super-hot posting.

So why not follow what we used to do back in the day for systems like this:
  • Statically generate the world. Disk is cheap. Super cheap.
  • Have an event-based system that takes in new "events" (a comment, a new posting) and modifies the output content as a result.
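The two bullets above can be sketched in a few dozen lines. This is a minimal, hypothetical illustration (the data model, slugs, and file layout are all invented): every write to the model goes through a single event handler, which re-renders just the affected page to disk, so the static copy never drifts from the model.

```python
import pathlib
import tempfile

# Where the "statically generated world" lives (a temp dir for this sketch).
OUT = pathlib.Path(tempfile.mkdtemp())

# A toy data model: one post, no database.
posts = {"hello-world": {"body": "First post.", "comments": []}}

def render(slug: str) -> None:
    """Regenerate the static HTML file for one post."""
    p = posts[slug]
    comments = "".join(f"<li>{c}</li>" for c in p["comments"])
    html = f"<article>{p['body']}</article><ul>{comments}</ul>"
    (OUT / f"{slug}.html").write_text(html)

def handle_event(event: dict) -> None:
    """Single point of entry for writes: mutate the model, re-render the page."""
    if event["type"] == "comment":
        posts[event["slug"]]["comments"].append(event["text"])
        render(event["slug"])

render("hello-world")  # initial generate-the-world pass
handle_event({"type": "comment", "slug": "hello-world", "text": "Nice!"})
```

Even at one comment per minute on a hot post, this re-renders sixty pages an hour instead of rendering on every single read.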

The key issue here is whether your "system" is sufficient to capture all external interactions with your underlying data model or whether you are likely to go around and poke your database manually. I contend you're probably going to do this seldom enough with most systems that a "rebuild the world" operation would probably suffice, and the performance gains you get out of a mostly-static system (using things like iframe for dynamic content) are going to be massive enough that it's worth the effort.

By the way, why do I care about all this? Static content can be served fast. Super-duper-ultra fast. sendfile fast. Kernel-server fast. And serving it statically guarantees you're not running user-level code that requires a lot of explicit concurrency control, which people always get wrong (meaning: not fast enough).
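For the curious, here's what "sendfile fast" means in practice. A minimal sketch on Linux (a socketpair stands in for a real client connection): the kernel copies the file straight from the page cache into the socket, and the bytes never pass through user space at all.

```python
import os
import socket
import tempfile

# Pretend this file is a pre-generated static page sitting on disk.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"<html>hello</html>")
    path = f.name

# A socketpair stands in for a client connection to the web server.
server, client = socket.socketpair()

# os.sendfile() asks the kernel to copy file -> socket directly;
# no read() into a user-space buffer, no write() back out.
with open(path, "rb") as src:
    sent = os.sendfile(server.fileno(), src.fileno(), 0, os.path.getsize(path))
server.close()

data = client.recv(1024)  # the client sees the page, byte for byte
```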