Thursday, November 16, 2006

Sitemaps protocol no longer just for Google

As Brent might say: Google's Sitemaps protocol, mostly just an XML schema for laying out the important bits of your site for web crawlers, is being set up as a more neutral protocol to be used by Google, Yahoo, and Microsoft. I suppose this just recognizes the fact that, though initiated by Google, sitemap files can be read by anyone. Collaborating on this is pretty easy and makes for good PR.

I am planning a site that will be most fun if it uses lots of dynamically fetched and generated content. Unfortunately, that will make it inscrutable to HTML-parsing search engines. Laying out a sitemap that points to raw content with as much metadata as possible will make it much easier to ensure that search engines reliably get as much information as possible. The key is to give Google or another advertiser maximum exposure to site data and meta information so that ads can be properly targeted. The issue that remains for me is whether Google will frown on sending ads to a black box; that is, will they trust that my AJAX site, when calling in Google ads with given keywords, is serving up the same content that is indexed through my sitemaps file? The irony, should that trust be lacking, is that I'm planning on using Google's Web Toolkit (GWT) to build the dynamic site.
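For concreteness, here's a sketch of what generating a minimal sitemap under the new shared schema might look like in Java. The file name, URL, and values are all hypothetical; real entries would come from the site's data:

import java.io.PrintWriter;

public class SitemapWriter {
    public static void main(String[] args) throws Exception {
        // Write one hypothetical entry in the sitemaps.org 0.9 schema
        PrintWriter out = new PrintWriter("sitemap.xml", "UTF-8");
        out.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        out.println("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">");
        out.println("  <url>");
        out.println("    <loc>http://example.com/content/1</loc>");
        out.println("    <lastmod>2006-11-16</lastmod>");
        out.println("    <changefreq>weekly</changefreq>");
        out.println("    <priority>0.8</priority>");
        out.println("  </url>");
        out.println("</urlset>");
        out.close();
    }
}

The optional lastmod, changefreq, and priority hints are exactly the kind of metadata a dynamic site can supply directly rather than hoping a crawler infers it from HTML.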

Here's what Netcraft says about the site:
http://www.sitemaps.org was running GWS on unknown when last queried at 16-Nov-2006 08:56:49 GMT

Wednesday, November 15, 2006

AJAX and web crawlers/advertising

I've been looking into how to have an AJAX/DHTML website, built with a toolkit like Prototype or an interface infrastructure like Google Web Toolkit, and still get indexed. It boils down to this: if you need to let the web crawlers in, you have to give them something non-DHTML/AJAX to consume. This can be done a few ways, it seems: intercepting page requests and directing them to different handlers based on who's requesting (sketched below), or having a parallel site, one AJAX and the other plain old HTML. Another option is to limit the AJAX to page elements that make things more convenient and functional for the user, like hide/show login areas. Again, the point is to have a strategy to let the web crawlers in.
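Here's a rough sketch of the request-interception idea as a servlet filter; the class name, the /static path convention, and the user-agent strings to match are my assumptions, not a recipe:

import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;

public class CrawlerFilter implements Filter {
    public void init(FilterConfig config) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest httpReq = (HttpServletRequest) req;
        String agent = httpReq.getHeader("User-Agent");
        boolean isCrawler = agent != null
                && (agent.contains("Googlebot")    // Google
                    || agent.contains("Slurp")     // Yahoo
                    || agent.contains("msnbot"));  // Microsoft
        if (isCrawler) {
            // Forward crawlers to a plain-HTML mirror of the same content
            req.getRequestDispatcher("/static" + httpReq.getServletPath())
               .forward(req, res);
        } else {
            // Everyone else gets the AJAX application
            chain.doFilter(req, res);
        }
    }
}

The catch, of course, is that the static pages must carry the same core content as the AJAX views, or this starts to look like cloaking.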

I'm thinking about all of this in the context of setting up a site that needs to be indexed properly by Google in order for AdSense to work properly. I'm leaning toward having one entry point for web crawlers and another for the application, with exactly the same core content visible through both. The web crawler content would be optimized to provide maximum metadata and minimum extraneous bulk. I lean this way because I'm intrigued by the Google Web Toolkit and its ability to build a content-rich site. Of course, I'd have to get a half-decent development machine to do this as well, since GWT uses a model whereby the application is a Java one in development and a JavaScript one at deploy time. The Java/Eclipse/etc. part is pretty resource intensive, methinks.
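To make that development model concrete: a GWT application is plain Java implementing an entry point, and the GWT compiler turns it into JavaScript at deploy time. A minimal sketch in the GWT 1.x style (the class name and labels are made up):

import com.google.gwt.core.client.EntryPoint;
import com.google.gwt.user.client.Window;
import com.google.gwt.user.client.ui.Button;
import com.google.gwt.user.client.ui.ClickListener;
import com.google.gwt.user.client.ui.RootPanel;
import com.google.gwt.user.client.ui.Widget;

public class MySite implements EntryPoint {
    // Called when the compiled module loads in the browser
    public void onModuleLoad() {
        Button b = new Button("Say hello");
        b.addClickListener(new ClickListener() {
            public void onClick(Widget sender) {
                Window.alert("Hello from Java compiled to JavaScript");
            }
        });
        RootPanel.get().add(b);
    }
}

In development this runs and debugs as Java (hence the appetite for a beefy Eclipse setup); only at deploy time does it become JavaScript.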

Monday, November 13, 2006

Two bug firsts

Two of my bugs have made their way onto bugguide.net as firsts in their category. One is a picture-winged fly. I found it on the stump of a recently cut-down tree in our drive.
Some sort of Picture Wing Fly

The other is a sap feeding beetle. This one was on a moon flower in our neighbour's small but wild front yard.

Sap-feeding Beetle on Moonflower

Wednesday, November 01, 2006

Ladybug takes flight

Ladybug taking flight

I've not taken photos of bugs since May, but while camping at Selkirk Provincial Park I got this fluke shot of a ladybug taking flight from my thumb.

Wednesday, September 20, 2006

A Java vs. Ruby example

Here's some Java code I borrowed and adapted to extract multi-word tags surrounded by quotes from a string and add them to a list:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extract all multi-word keywords surrounded by quotes
Pattern p = Pattern.compile("\"(.*?)\"\\s*");
Matcher m = p.matcher(keywordString);
StringBuffer sb = new StringBuffer();

while (m.find()) {
    // Get the current match and add it to the keywords list
    String kw = m.group();
    if (!"".equals(kw)) {
        kw = kw.trim().replaceAll("\"", "");
        keywords.add(kw);
    }
    // Remove the current match from the string
    // and thus from consideration
    m.appendReplacement(sb, "");
}
m.appendTail(sb);
keywordString = sb.toString();
Here's the Ruby equivalent:
keywordString.gsub!(/"(.*?)"\s*/) { keywords << $1.strip; "" }  # the "" clears each match from the string
It's not the lines of code (although the Ruby is a third the size) that strike me so much as the unintuitive Java API. appendReplacement? What's that? Verbosity does not add clarity to the Java, and its absence does not detract from the Ruby's.

Saturday, August 26, 2006

OpenID

I've been investigating (and testing) OpenID lately. The cause, by the way, would be helped if openid.org's URL worked without the www subdomain. OpenID is an API that allows a person to claim ownership of a URL and, through that, to claim an identity of sorts. It's not a way to prove that the person controlling a URL has a certain name, so it's not an authentication mechanism in that sense. What it allows is for identity verification to happen in one place rather than over and over again in multiple places. An OpenID URL I've gotten is ian.marsman.myopenid.com. With it I can log in to livejournal.com, zoomr.com, and other OpenID-using sites. The benefit for the user is a single identity-verification location that works across multiple sites. A web application developer using OpenID as a login mechanism doesn't need to worry about account registration, which is rather nice.
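For the curious, the mechanics are straightforward: the site you're logging into redirects your browser to your identity provider, which checks who you are and bounces you back with a signed assertion. A stripped-down sketch of building that first redirect for OpenID 1.1, with the endpoint and site URLs hypothetical and association setup plus signature verification omitted:

import java.net.URLEncoder;

public class OpenIdRedirect {
    // Build the checkid_setup URL the relying party redirects the browser to
    public static String buildCheckidUrl(String providerUrl, String identity,
            String returnTo, String trustRoot) throws Exception {
        return providerUrl
            + "?openid.mode=checkid_setup"
            + "&openid.identity=" + URLEncoder.encode(identity, "UTF-8")
            + "&openid.return_to=" + URLEncoder.encode(returnTo, "UTF-8")
            + "&openid.trust_root=" + URLEncoder.encode(trustRoot, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildCheckidUrl(
            "https://www.myopenid.com/server",     // provider endpoint (assumed)
            "http://ian.marsman.myopenid.com/",    // the claimed identity URL
            "http://example.com/openid/complete",  // where the provider sends the user back
            "http://example.com/"));               // the site the user is asked to trust
    }
}

The provider verifies the user however it likes (password, cookie, whatever) and sends the browser back to return_to with the assertion, so the relying party never sees a password.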

The business model for providing and managing OpenID accounts does not seem that promising if that's all one is providing. The API is public, and client and server libraries are available in a number of programming languages. One would need to use account management as a way to gain credibility for an identity-management consulting business, or add extra services on top of the base account management. claimid.com is doing this (or will be once they're out of beta). They seem to want to offer a way for people to point to various URLs about the 'net and say "this is mine or about me". They also offer the ability to register other OpenID URLs with their site, which they can verify (the OpenID API allows for this).

In any case, I've installed and gotten the Ruby version of OpenID running. It's available as a gem, which I can't easily install on my non-root-access account, so I've put all the OpenID libraries under the lib directory of my Rails application instead. This works pretty well. The sample openid_login generator is found, and thus can be installed, if one puts it in one's ~/.rails/ directory.

One gold-rush identity management system I'm not crazy about is i-name. i-names can look like "=ian.marsman" for a personal i-name or "=@myorg*ian.marsman" for a person at an organization. I'm not crazy about this setup because the going rate to register an i-name is twenty bucks US. For that, one gets more control over who you give what personal info to. However, OpenID has the ability to create profiles and choose which profile to give to a site that's requesting permission to access one's identity. i-name is an API designed by rather large organizations; OpenID is more grass-roots, although Verisign is on the standards committee. Who knows how things will pan out. Both offer the hope of single sign-on. i-name seems more targeted at business uses like corporate identity management and online banking sign-on. It's a big topic and I'm starting to wander. At the moment all I want is a way to offload user signup management and give people a way to avoid adding my site to the list of memberships they have to keep track of.

Wednesday, May 31, 2006

DIGITAL MAOISM: The Hazards of the New Online Collectivism

A great piece on the risks involved in building knowledge using groups. Essentially, the author seems to be suggesting that the result of consensus is not necessarily genius or deep insight, but rather blandness, or at the very least something lacking in boldness. A great read.
The beauty of the Internet is that it connects people. The value is in the other people. If we start to believe the Internet itself is an entity that has something to say, we're devaluing those people and making ourselves into idiots.
The trick is to allow people to have a say and interact while preserving the individual. The article's not a slam against algorithms or Wikipedia, but rather a call to consider carefully how people's individual and collective wisdom can best be used without one wiping out the other.

Friday, March 10, 2006

Potential calendar problem in Java!

Java has a GregorianCalendar class (representing a date's year, month, day, etc.) with a get method that takes an integer argument and returns things like that instance's year as an int. This means the value returned will be invalid for years beyond 2,147,483,647 (the maximum value of a Java int)! I think that's around the time in Babylon 5 when humans escaped their physical bodies and left our solar system to avoid the impending explosion of the sun. Seriously, of more concern to me is what a pain in the neck date and number parsing and formatting are in Java. Arggh! Things like this bug me:
int remainder = new Double(
        Math.IEEEremainder(
                new Double(i).doubleValue(),
                new Double(startMonth).doubleValue()) / 12
        ).intValue();
What about Ruby's
 (27 % 12).to_i
If that's syntactic sugar I'll risk the cavities.
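For reference, here's the Calendar API grumbled about above, plus the plain modulus Java has had all along (the class name is mine):

import java.util.Calendar;
import java.util.GregorianCalendar;

public class YearDemo {
    public static void main(String[] args) {
        GregorianCalendar cal = new GregorianCalendar();
        int year = cal.get(Calendar.YEAR);   // an int: valid until 2,147,483,647 AD
        int month = cal.get(Calendar.MONTH); // zero-based, another perennial gotcha
        System.out.println(year + "-" + (month + 1));
        System.out.println(27 % 12);         // prints 3; no IEEEremainder required
    }
}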

Tuesday, February 14, 2006

Letter to the Discovery Channel

Assassin spider with prey

You have an informative article on newly discovered species of assassin spiders from Madagascar. Unfortunately, the person writing the story chose to depict them as bizarre and ugly, with phrases such as "recognized by their peculiarly ugly, stretched-out necks and sword-like fangs" and "venom-loaded fangs, attached to the ends of grotesquely stretched-out jaws". Perhaps the author has a particular dislike of spiders, or perhaps he wanted to avoid sounding too "sciency". Whatever the reason, I think that describing well-adapted spiders as ugly and bizarre does a disservice both to the spiders and to the intelligence and curiosity of your readers. Most of those who read the article will be there because they find spiders interesting, not horrific. Besides, from the first image, the spiders' jaws, heads, and necks look a lot like the beak, head, and neck of a pelican or stork. One doesn't hear the blue heron described as "having a grotesquely long beak adapted for stabbing its unsuspecting prey from above with lightning speed". In parting, I'll leave you with a link to some images of beautiful, well-adapted spiders.

Thursday, January 26, 2006

Chandler is coming along

The Chandler project, started by Mitch Kapor of Lotus fame, is finally starting to look polished and usable. At the moment, the most polished component is the calendar, which has some beginning interoperability with the server via CalDAV. Previously, there were no screenshots of the application, mostly, I suspect, because it was so ugly. It takes a long time to get the backend for such a data-driven application going, and yet more time to hook the backend into the front-end. Congrats!