Thursday, November 16, 2006

Sitemaps protocol no longer just for Google

As Brent might say: Google's Sitemaps protocol, mostly just an XML schema for laying out the important bits of your site for web crawlers, is being set up as a neutral standard to be used by Google, Yahoo, and Microsoft. I suppose this just recognizes the fact that, though the protocol was initiated by Google, sitemap files can be read by anyone. Collaborating on it is pretty easy and makes for good PR.

I am planning a site that will be most fun if it uses lots of dynamically fetched and generated content. Unfortunately, that will make it inscrutable to HTML-parsing search engines. A sitemap that points to the raw content, with as much metadata as possible, should make it much easier to ensure that search engines reliably get the most information possible. The key is to give Google, or another advertiser, maximum exposure to the site's data and meta information so that ads can be properly targeted.

The issue that remains for me is whether Google will frown on sending ads to a black box: will they trust that my AJAX site, when it calls in Google ads with given keywords, is serving up the same content that is indexed through my sitemaps file? The irony of any lack of trust here is that I am planning to use Google's own Web Toolkit (GWT) to build the dynamic site.
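For reference, a sitemap file in the sitemaps.org format is about as simple as XML gets; something like this, where the URL and values are placeholders of mine and only the loc element is actually required:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/articles/some-page.html</loc>
    <lastmod>2006-11-16</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>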

Here's what Netcraft says about the site:
http://www.sitemaps.org was running GWS on unknown when last queried at 16-Nov-2006 08:56:49 GMT

Wednesday, November 15, 2006

AJAX and web crawlers/advertising

I've been looking into how to build an AJAX/DHTML website, using a toolkit like Prototype or an interface infrastructure like the Google Web Toolkit, while still keeping it open to web crawlers and advertising. It boils down to this: if you need to let the web crawlers in, you have to give them something non-DHTML/AJAX to consume. It seems this can be done a few ways, including intercepting page requests and directing them to different handlers based on who's asking, or running a parallel site, one version AJAX and the other plain old HTML. Another option is to limit the AJAX to page elements that make things more convenient and functional for the user, things like hide/show login areas and so on. Again, the point is to have a strategy for letting the web crawlers in; a sketch of the interception approach follows below.
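To make the interception idea concrete, here is a minimal sketch as a Java servlet. The class name, the crawler list, and the two target paths are my own inventions for illustration; the crucial constraint is that both paths serve the same core content.

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical front controller: known crawlers get the plain-HTML view
// of the content, everyone else gets the AJAX application. Both views
// must carry the same core content or the search engines will cry foul.
public class EntryPointServlet extends HttpServlet {

    // A few well-known crawler User-Agent fragments; a real list would
    // be longer and kept up to date.
    private static final String[] CRAWLERS = { "Googlebot", "Yahoo! Slurp", "msnbot" };

    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        String agent = req.getHeader("User-Agent");
        String target = isCrawler(agent) ? "/static/index.html" : "/app/index.html";
        req.getRequestDispatcher(target).forward(req, resp);
    }

    private boolean isCrawler(String agent) {
        if (agent == null) {
            return false;
        }
        for (String crawler : CRAWLERS) {
            if (agent.contains(crawler)) {
                return true;
            }
        }
        return false;
    }
}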

I'm thinking about all of this in the context of setting up a site that needs to be indexed properly by Google in order for AdSense to work properly. I'm leaning toward having one entry point for web crawlers and another for the application, with exactly the same core content visible through both. The web crawler content would be optimized to provide maximum metadata and minimum extraneous bulk. I lean this way because I am intrigued by the Google Web Toolkit and its potential for building a content-rich site. Of course, I'd have to get a half-decent development machine to do this as well, since GWT uses a model whereby the application moves from a Java one in development to a JavaScript one at deploy time; the Java/Eclipse/etc. part is pretty resource intensive, methinks.
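To give a flavour of that Java-to-JavaScript model, here is a minimal GWT entry point; the class name and the "slot" element id are placeholders of mine, not anything from a real project:

import com.google.gwt.core.client.EntryPoint;
import com.google.gwt.user.client.Window;
import com.google.gwt.user.client.ui.Button;
import com.google.gwt.user.client.ui.ClickListener;
import com.google.gwt.user.client.ui.RootPanel;
import com.google.gwt.user.client.ui.Widget;

// Hypothetical module entry point: written and debugged as plain Java,
// then compiled to JavaScript by the GWT compiler at deploy time.
public class MySite implements EntryPoint {

    public void onModuleLoad() {
        Button hello = new Button("Say hello");
        hello.addClickListener(new ClickListener() {
            public void onClick(Widget sender) {
                Window.alert("Hello from Java, running as JavaScript");
            }
        });
        // "slot" is the id of a placeholder div in the host HTML page.
        RootPanel.get("slot").add(hello);
    }
}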

Monday, November 13, 2006

Two bug firsts

Two of my bug photos have made their way onto bugguide.net as firsts in their categories. One is a picture-wing fly, which I found on the stump of a recently cut-down tree in our drive.
Some sort of Picture Wing Fly

The other is a sap-feeding beetle. This one was on a moonflower in our neighbour's small but wild front yard.

Sap-feeding Beetle on Moonflower

Wednesday, November 01, 2006

Ladybug takes flight

Ladybug taking flight

I've not taken photos of bugs since May, but while camping at Selkirk Provincial Park I got this fluke shot of a ladybug taking flight from my thumb.