Interweb++

Dapper: Web Scraping gets a 2.0

I just took a look at Dapper - the demo shows what’s basically Web Scraping 2.0. By looking at a few pages from a site, Dapper can analyze the page structure and let the user point and click to pick out fields, instead of the old-school way of viewing the source and finding patterns to match against.

I don’t imagine this is in their business plan, but I’d love to see this built into an API for various languages. Currently Dapper can generate XML, JSON or YAML (among other output formats), but you’re still reliant on their server, which may not be appropriate for internal apps within a company.

It’d also be interesting to see how adaptable an algorithm could be against changes to the markup. I remember a company that offered “view your other accounts” features for banking websites, and they basically had to hand-code scrapers for a huge array of bank and investment accounts, and those scrapers kept breaking as companies changed their layouts. I think at one point they had someone working pretty much full time checking and re-checking sites. Just like with example-based machine translation, could automated markup analysis help with site changes?
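
To make the breakage problem concrete, here’s a minimal Python sketch. Everything in it is made up for illustration: the two markup snippets, the "amount" class name and the function names are my own assumptions, and it says nothing about how Dapper actually works. It just contrasts a pattern copied from "view source" with a parser that keys on a single attribute, and shows which one survives a hypothetical redesign.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup for an account page, before and after a redesign.
OLD_LAYOUT = '<tr><td>Balance</td><td class="amount">1,204.50</td></tr>'
NEW_LAYOUT = ('<div class="row"><span>Balance</span> '
              '<span class="amount">1,204.50</span></div>')

# Old-school approach: a pattern tied to the exact tags seen in the source.
BALANCE_PATTERN = re.compile(
    r'<td>Balance</td><td class="amount">([\d,.]+)</td>'
)


def scrape_with_regex(html):
    """Return the balance text, or None if the layout no longer matches."""
    match = BALANCE_PATTERN.search(html)
    return match.group(1) if match else None


class AmountParser(HTMLParser):
    """Collects the text that follows any start tag with class="amount"."""

    def __init__(self):
        super().__init__()
        self._capturing = False
        self.amounts = []

    def handle_starttag(self, tag, attrs):
        # Ignore the tag name entirely; only the class attribute matters.
        if dict(attrs).get("class") == "amount":
            self._capturing = True

    def handle_data(self, data):
        if self._capturing:
            self.amounts.append(data.strip())
            self._capturing = False


def scrape_with_parser(html):
    """Return the first "amount" text found, or None if there is none."""
    parser = AmountParser()
    parser.feed(html)
    parser.close()
    return parser.amounts[0] if parser.amounts else None


if __name__ == "__main__":
    for name, html in (("old", OLD_LAYOUT), ("new", NEW_LAYOUT)):
        print(name, scrape_with_regex(html), scrape_with_parser(html))
```

The regex only matches the old table layout, while the attribute-based parser finds the value in both. Of course, it breaks just as badly the day someone renames the class, which is exactly where some kind of automated markup analysis would have to earn its keep.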

All I want for Christmas is a full-text I, Cringely feed

I, Cringely is semi-offline for the next few days while they install some new blogging software, but the RSS feed finally has full-text content instead of the one-liners of the past! Here’s hoping that the full text isn’t just a placeholder while the core site is offline, and that it’ll still be there when the site returns.

(For those who miss NerdTV, there’s also a preview video on the temp page)

Free AJAX “loading” widgets

I keep meaning to get more into AJAX, but then I figure I’ve got to go and get me some cool “transaction in progress” graphics that all seem to look like stuff from the Apple desktop. Well, now I’m out of excuses, thanks to Ajaxload. The service lets you pick a few options and generates an animated graphic for you on the spot. Now I’ll just have to admit I’m too busy watching Fraggle Rock to get on with coding… (via The Farm)

Building Scalable Web Sites

As a follow-up to my Carson Workshops post, I just noticed that Cal Henderson has a book out called Building Scalable Web Sites, and it’s on Safari. Added.

Speaking of Safari, I like the new look, but I really miss the bookshelf management features. In the old version, if you tried to add a book but your bookshelf was full, there was a workflow that would let you pick a book to remove and then add the new book in a single sequence. Now I need to go remove the old book, find the new one again, and add it, but hey, there are lots of plugs for “upgrade your account” so you can get more slots in your bookshelf. I’m sure it’ll help with their upsell conversions, but it still kinda stinks…

Just so I can end this post on a high note, I absolutely love the fact that I can type “Building Scalable Web Sites” into the address bar of Firefox and get the O’Reilly page for the book. Sure, it doesn’t work for everything (though “miserable failure” becomes funny all over again when it’s through the address bar), but it saves me a step for well-known stuff.

July 19 is going to be Flick-tastic!

I just registered for Carson Workshops’ Building Enterprise Web Apps on a Budget seminar! Yay!

Seriously, I’m stoked. I’ve never seen a program outline so in sync with the kind of stuff I do on a daily basis, and Cal’s talk at the Future of Web Apps summit was one of my favourites from the set. Life’s awesome!

Better still, this might just be my rationalization for a new laptop. Oh yeah, I feel the rationalization train pulling up…

The state of the domain name universe

Dennis Forbes wrote up some interesting Facts About Domain Names, which included a few semi-useless four-letter names that were still available, like agjv.com. Sure enough, that one was registered on the day the article was posted. eiyk.com, however, still seems to be available.

I managed to resist the urge to pick up any of the other low-hanging names mentioned in the post, but it still inspired me to doodle around at Whois.net. I ended up picking up fartle.com, because I thought it was a funny name. It wasn’t until afterwards that I looked it up and discovered the semi-official definitions. Right now I’m thinking it’d make a good name for an e-card site. Any other ideas?

(via Seth)

Web 2.0 or Star Wars?

I was proud of my score on Cerado’s Web 2.0 or Star Wars Quiz until I read the score interpretations at the bottom. Of course, the convergence of Web 2.0, Star Wars, and the concept of having a life might not be the best fit - maybe a Web 2.0 or Hockey Player quiz would be a better bit of outreach for the web community, or it would be, if this were 1989 and people still thought computers were dorky. Whatever, just posting the link felt wasteful. (via Monkey Bites)
