Jun
04
2009

Wave goodbye to spam

Google Wave combines the best of email, instant messaging and real-time collaborative editing into a new form of online communication.

The email paradigm of ’send and receive’ is replaced with a model of hosted conversations, in which “people can communicate and work together with richly formatted text, photos, videos, maps, and more.”

Wave is refreshingly ambitious. In years to come, I hope we will be waving nostalgically about email as “something that my parents used to do.”

This blog post describes an idea built upon Google Wave that could also turn email *spam* into the stuff of nostalgia.

Spam sent by people you don’t know is a real pain in the inbox. But simply ignoring emails from people you don’t know is not the answer. (Otherwise I would never have learnt about my recent win on the Nigerian lottery. Just kidding.)

So how might Google Wave help us to finally wave goodbye to spam?

  • assume that developers will build robots to connect my wave account with the rest of my social graph (and/or that Google plugs in Friend Connect)
  • if someone (or a spambot) outside of my social graph invites me to a wave, my wave server responds to that invite with a reCAPTCHA challenge
  • a spambot will fail to solve the reCAPTCHA, so I am spared the distraction of spam waves
  • a genuine person, however, will be able to solve the reCAPTCHA challenge, so I can enjoy an invite to a genuine non-spam wave
  • alternatively, the genuine person can befriend me on Facebook, Twitter, etc. to avoid the need for a reCAPTCHA
  • and later, the genuine person can befriend me on Google Wave to avoid the need for further reCAPTCHAs
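The invite-handling steps above can be sketched as a small decision function. This is a minimal sketch under stated assumptions, not anything Google Wave actually provides: the social graph is a plain Python dict of who-follows-whom, "connected" means within a hypothetical `MAX_HOPS` of two, and `within_social_graph` and `handle_invite` are invented names.

```python
from collections import deque

# Hypothetical: how many hops through the social graph still count as "connected".
MAX_HOPS = 2

def within_social_graph(graph, me, sender, max_hops=MAX_HOPS):
    """Breadth-first search from `me`; True if `sender` is reachable within max_hops."""
    if sender == me:
        return True
    seen = {me}
    queue = deque([(me, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth == max_hops:
            continue  # don't look any further out than max_hops
        for friend in graph.get(person, ()):
            if friend == sender:
                return True
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, depth + 1))
    return False

def handle_invite(graph, me, sender, captcha_solved=False):
    """Decide what my (hypothetical) wave server does with an incoming invite."""
    if within_social_graph(graph, me, sender):
        return "deliver"    # known sender: no challenge needed
    if captcha_solved:
        return "deliver"    # stranger has proved they are human
    return "challenge"      # stranger: respond with a reCAPTCHA first
```

Once a genuine stranger befriends me, they simply become part of `graph`, which is why later invites skip the challenge.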

Here’s the translation for my parents: “if I am sent an email by someone I don’t know yet, make sure it’s from a real person before bothering me about more /iagra.”

This solution could be built as an optional extension to a wave server. But that would not be optimal for genuine people waving to others they don’t yet know, because multiple reCAPTCHAs might be required for a single wave. So an improvement would be:

  • when a reCAPTCHA is solved, my wave server issues a “Turing token” (proof of humanity) that is also valid for other invitees connected to my social graph
  • this “Turing token” can be securely federated between wave servers so that others in my social graph know that the wave originated from a genuine person
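A Turing token could be as simple as a signed statement that a given sender solved a challenge for a given wave. The sketch below assumes that federated servers share a secret key and uses an HMAC from Python's standard library; `issue_turing_token`, `verify_turing_token`, the token fields and the one-day expiry are all hypothetical. A real federation would more plausibly use public-key signatures, so that receiving servers could verify tokens without being able to mint them.

```python
import hashlib
import hmac
import time

# Assumption: federated wave servers share this secret. Public-key signatures
# would be the more robust choice, so only the issuing server can mint tokens.
SHARED_KEY = b"hypothetical-shared-federation-key"

def _message(sender, wave_id, issued_at):
    return f"{sender}|{wave_id}|{issued_at}".encode()

def issue_turing_token(sender, wave_id, issued_at=None, key=SHARED_KEY):
    """Called after `sender` solves a reCAPTCHA: vouch that this wave came from a human."""
    issued_at = int(time.time()) if issued_at is None else issued_at
    sig = hmac.new(key, _message(sender, wave_id, issued_at), hashlib.sha256).hexdigest()
    return {"sender": sender, "wave_id": wave_id, "issued_at": issued_at, "sig": sig}

def verify_turing_token(token, max_age_seconds=86400, now=None, key=SHARED_KEY):
    """Called by another server in my social graph before showing the invite."""
    now = int(time.time()) if now is None else now
    expected = hmac.new(
        key, _message(token["sender"], token["wave_id"], token["issued_at"]), hashlib.sha256
    ).hexdigest()
    fresh = 0 <= now - token["issued_at"] <= max_age_seconds
    # compare_digest avoids leaking how many characters matched via timing
    return fresh and hmac.compare_digest(expected, token["sig"])
```

Because the token is tied to the wave rather than to a single invitee, every invitee's server can accept it, so one reCAPTCHA covers the whole wave.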

That’s it; an idea for combating spam using Google Wave. Thoughts please!


This post originated as a tweet that was imported here by Fresh From, and then I thought about it some more.

Jul
01
2009

Fresh From Twitter

happy new financial year

and happy Canada Day! Here’s the full mounty spotted in North Sydney this morning http://flic.kr/p/6AQGZ6


you can now respond ‘google’ to an #aardvark question if you think #google has a better answer http://bit.ly/3avdqQ but no #bing option yet

easter egg? i just wrote ‘do not use Windows’ in an email, and Windows promptly gave me the bluescreen of death

(busted) WHAT are you doing? (curious) what ARE you doing? (motivational) what are YOU doing? (@Rove1974) what are you DOING?

Written by Bob Hitching in: Twitter
Jun
30
2009

Fresh From Flickr

and happy Canada Day! Here’s the full mounty spotted in North Sydney this morning

Written by Bob Hitching in: Flickr
Jun
14
2009

Fresh From FriendFeed

Docs Are Old-School, We Need PageRank for People
leo: yes indeed, and such a content filter would work well by taking into account the distance through the social graph between the author and each reader, rather than using a fixed measure of the author’s reputation for all readers.

Written by Bob Hitching in: FriendFeed
Apr
22
2009

Mobile Social Technology and Alternate Reality Gaming (ARG)

Today I spent an enjoyable couple of hours at the Australian Film, Television and Radio School (AFTRS), learning about Multi Platform Content, and talking about Mobile Social Technology & Alternate Reality Gaming (ARG).

We examined some emerging mobile social technologies, and how they can enable new forms of story-telling. And I shared my personal journey into a Star Trek Alternate Reality Game, which has so far involved me sending pictures of sheep to strangers in Paris, and which explains my recent cryptic Twitter and Facebook status updates. Well, some of them, anyway.

The slide deck is embedded below, and contains all the links for those of you who asked.

[Update 3 June 2009] OMG! I was chosen as one of the five finalists in the game. Here’s a video of Leonard Nimoy putting my name into the hat to pick the winner.


Mar
15
2009

10 cloud datasets that I’d like to mashup

Cloud computing is being sold as a hosting architecture to provide instantly scalable on-demand computing power, storage and bandwidth.

“The cloud’s resources scale with user demands. Pay only for what you use” says Rackspace, the latest to join the cloud gang.

One problem for the cloud gang, however, is that hosting has always struggled as a low margin commodity business.

Rackspace has just hired Robert Scoble to help spread the message, so we should expect this space to soon get hotter than a Sun SPARC with a loose heatsink.

But where exactly can some value be added in cloud computing, to increase the margins and keep Scoble funded so he can continue to filter the signal from the noise on FriendFeed? Okay, that’s slightly selfish but it’s an interesting question.

The interesting answer IMHO is cloud datasets.

Having useful datasets available in the cloud will unlock value from the data by allowing a new generation of mashup. These aren’t mashups that simply use data from remote web services, like plotting Craigslist ads onto a Google Map. These involve the mashup (joining) of datasets in the cloud using the power and speed of a relational database.
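To make the distinction concrete, here is a small sketch of a mashup by join. An in-memory SQLite database stands in for a cloud-hosted relational database, and the `places` and `travel_posts` tables, with made-up rows and rounded coordinates, are hypothetical stand-ins for a Geonames-style dataset and a travel site's own content.

```python
import sqlite3

# In-memory SQLite stands in for a hypothetical cloud-hosted relational database;
# table names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE places (name TEXT PRIMARY KEY, country_code TEXT,
                         latitude REAL, longitude REAL);       -- Geonames-style
    CREATE TABLE travel_posts (id INTEGER PRIMARY KEY, body TEXT,
                               place_name TEXT);               -- the site's own UGC
    INSERT INTO places VALUES ('Sydney', 'AU', -33.87, 151.21),
                              ('Paris',  'FR',  48.85,   2.35);
    INSERT INTO travel_posts VALUES (1, 'Great ferry ride', 'Sydney'),
                                    (2, 'Cafe near the Louvre', 'Paris'),
                                    (3, 'Somewhere nice', 'Atlantis');
""")

# One SQL join geo-codes every post at database speed, instead of making one
# remote web-service call per post. Unknown places come back as NULL (None).
rows = conn.execute("""
    SELECT p.id, p.body, g.country_code, g.latitude, g.longitude
    FROM travel_posts AS p
    LEFT JOIN places AS g ON g.name = p.place_name
    ORDER BY p.id
""").fetchall()

for row in rows:
    print(row)
```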

This cloud database approach might also provide Twitter and other owners of valuable data with a revenue model that doesn’t depend on advertising.

Here are 10 cloud datasets that I’d personally like to mashup, to help explain:

1. Wikipedia. Funnily enough, Amazon Web Services has just announced that it now offers a 66 GB dataset of Wikipedia. “The wiki markup for each article is transformed into machine-readable XML, and common relational features such as templates, infoboxes, categories, article sections, and redirects are extracted in tabular form.” One example: imagine the opportunities for a start-up social travel site to mashup its content with the wealth of travel information now available on Wikipedia. Massive.

2. Geonames. It bugs me that everyone who wants to use the Geonames database needs to duplicate 800 MB of data. Move it into the cloud! Example: the travel site can now analyse reams of user-generated content (or Wikipedia content) for up-to-date categorization and geo-coding onto a map. Another example: most websites need a simple (but updated-more-often-than-you-would-think) list of countries on the rego form. Wouldn’t it be good if everyone used the same (Geonames) list?

3. MaxMind IP address lookup. Turn an IP address into an always accurate city location. Example: targeted ad serving and traffic analysis.

4. Google PageRank. For any URL, what’s the PageRank measure of quality? If this is relational data (rather than from a remote web service), it can be combined with other measures of quality at database speeds.

5. Real-time stock market data.

6. Real-time sports data.

7. Dodgy credit card numbers.

8. Dodgy email addresses.

9. Twitter. Some of the above might be considered proprietary rather than public data, which brings me to Twitter and a potential revenue model for them and the cloud gang. If you’ve got valuable proprietary data like Twitter has got (some would say that’s all they’ve got), then replicating it into a relational cloud database will unlock more value than could ever be extracted (or sold) via a remote web API.

Example: when visiting an e-commerce site, it would be nice to see only the product reviews submitted by people I am following on Twitter, sequenced by a measure of quality based on how often those people have been retweeted. Of course, the cloud gang already have the billing infrastructure and monitoring in place to work out exactly how much proprietary data you have used, and what to charge you for it. Did I mention yet that Jeff Bezos is an investor in Twitter?

The advertising pie is not big enough to fund the whole of the interweb, so perhaps paid data consumption is the revenue model for Twitter and others. Businesses are happy to pay hosting providers for commodity services like CPU cycles and disk space, so why not pay Twitter (via a hosting provider) for valuable information? Did I mention yet that Jeff Bezos is an investor in Twitter?

10. This one is further out there; private foreign keys. Imagine the Twitter dataset including the email address of users, joined using that email address to a Facebook or Digg dataset, but not revealing that email address in the result set. That’s number 10 on my list. It would need to work in a similar way to Facebook’s FQL or Yahoo’s YQL or Google’s GQL, to expose enough information to be useful but to not expose anything that would violate privacy concerns. I hope to write some more about this and the privacy implications in another post.
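Items 9 and 10 can be sketched together in one small script. Everything here is hypothetical: the `tw_users`, `tw_follows` and `reviews` tables stand in for a Twitter-style dataset and an e-commerce site's reviews, retweet counts stand in for the quality measure, and because SQLite has no column-level permissions, a view that joins on email but never selects it plays the role of the private foreign key.

```python
import sqlite3

# Hypothetical tables: a Twitter-style dataset and an e-commerce site's reviews.
# The email column is the "private foreign key": used to join, never returned.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tw_users   (handle TEXT PRIMARY KEY, email TEXT UNIQUE,
                             retweet_count INTEGER);
    CREATE TABLE tw_follows (follower TEXT, followee TEXT);
    CREATE TABLE reviews    (reviewer_email TEXT, product TEXT, review TEXT);

    INSERT INTO tw_users VALUES ('alice', 'alice@example.com', 120),
                                ('carol', 'carol@example.com', 45),
                                ('dave',  'dave@example.com',  900);
    INSERT INTO tw_follows VALUES ('me', 'alice'), ('me', 'carol');
    INSERT INTO reviews VALUES ('alice@example.com', 'camera', 'Sharp lens'),
                               ('carol@example.com', 'camera', 'Battery dies fast'),
                               ('dave@example.com',  'camera', 'Best ever!!!');

    -- Data consumers only get this view: it joins on email but never selects it.
    -- (SQLite lacks column-level permissions, so the view stands in for them.)
    CREATE VIEW reviews_by_handle AS
        SELECT u.handle, u.retweet_count, r.product, r.review
        FROM reviews AS r JOIN tw_users AS u ON u.email = r.reviewer_email;
""")

# Reviews of one product, only from people "me" follows, best-retweeted first.
rows = conn.execute("""
    SELECT v.handle, v.review
    FROM reviews_by_handle AS v
    JOIN tw_follows AS f ON f.followee = v.handle AND f.follower = ?
    WHERE v.product = ?
    ORDER BY v.retweet_count DESC
""", ("me", "camera")).fetchall()

print(rows)  # [('alice', 'Sharp lens'), ('carol', 'Battery dies fast')]
```

Dave's review never appears because "me" doesn't follow him, and no query against the view can return anyone's email address.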

So, who’s in the cloud gang? Google is well placed with App Engine and plenty of valuable datasets to get started with. Amazon has all the billing machinery in place to sell proprietary data from Twitter and others. Sun now has MySQL, which already supports remote replication and column-level permissions to enforce private foreign keys. And now Rackspace has Robert Scoble. This will be an interesting one.

Feb
19
2009

10 ways to combine your blog with your micro-blogging

Your micro-blogging on Twitter or FriendFeed is topical.

Your blog is quality.

Both are valuable. How can you combine the two?

1. Display your FriendFeed content on your blog using an embeddable widget.

2. Display your latest Twitter updates on your blog with a customized widget.

3. Combine the display of RSS feeds from FriendFeed and Twitter and elsewhere using an RSS plugin on your WordPress, Blogger, Movable Type or TypePad blog.

4. Install Fresh From FriendFeed and Twitter, a WordPress plugin that keeps your blog always fresh by regularly adding your best recent content from FriendFeed or Twitter. Unlike the above solutions that only display content, Fresh From allows your visitors to search your micro-blogging content, and allows you to easily edit, tag and turn it into regular blog posts. Disclosure: I wrote this plugin, and it got me thinking about this post.

5. Going the other way, FriendFeed makes it easy to import your blog’s RSS feed into FriendFeed.

6. Make sure your blog’s feed is using Media RSS extensions if you can, so FriendFeed picks up any media attachments. There are a couple of WordPress plugins available that achieve this.

7. You can import your blog’s RSS feed into Twitter using services such as twitterfeed and RSS To Twitter.

8. Alex King’s Twitter Tools is a WordPress plugin that creates a tweet on Twitter whenever you post in your blog, with a link to the blog post. It can also create a daily or weekly digest post of your tweets on your blog.

9. Glenn Slaven’s FriendFeed Comments WordPress plugin will take the comments & ‘likes’ on your posts from FriendFeed and place them on the post that they’re related to on your blog.

10. If you are using the Disqus comment system on your WordPress, Blogger, Movable Type or TypePad blog, comments can now be synchronised between your blog and FriendFeed.
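Under the hood, option 3 boils down to merging several RSS 2.0 feeds by publication date. Here is a minimal sketch using only Python's standard library, assuming every `<item>` carries a `<pubDate>` in the usual RFC 822 date format; `rss_items` and `merge_feeds` are hypothetical helpers, not part of any of the plugins or services above.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

def rss_items(xml_text, source):
    """Yield (published, source, title, link) for each <item> in an RSS 2.0 feed.

    Assumes every item has a <pubDate>; a missing one would raise an error.
    """
    channel = ET.fromstring(xml_text).find("channel")
    for item in channel.findall("item"):
        published = parsedate_to_datetime(item.findtext("pubDate"))
        yield published, source, item.findtext("title"), item.findtext("link")

def merge_feeds(feeds, limit=10):
    """Combine {source_name: rss_xml} feeds into one list, newest first."""
    items = [entry
             for source, xml_text in feeds.items()
             for entry in rss_items(xml_text, source)]
    items.sort(key=lambda entry: entry[0], reverse=True)
    return items[:limit]
```

Keeping the source name on each entry lets a widget label each line as coming from Twitter, FriendFeed or elsewhere.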

What have I missed out? Comments please!

Jan
07
2009

What’s the difference between user generated content and user generated rubbish? Comments please…

Some user generated content (UGC) is genuine, honest, credible, reputable, trustworthy, valuable, quality information. But some is rubbish (let’s call that UGR), including deliberately misleading propaganda, biased blog comments, bogus product reviews, spam, veiled advertising, and bad poetry (or is it just my blog that attracts poetry bots?)

Google’s PageRank algorithm does a good job of measuring the quality of a simple web page, based on the number of incoming links to that page, recursively weighted by the quality of those linking pages. However, web2.0 has given us blogs, wikis, forums, media sharing, customer product reviews and ratings, social bookmarking, and more recently aggregation of all of the above. The result is web pages that contain an increasingly complex array of UGC and UGR, making it increasingly difficult for algorithms, site visitors and site owners to filter the signal from the noise, the UGC from the UGR.
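The recursive weighting at the heart of PageRank can be illustrated with a few lines of power iteration. This is the textbook formulation with the conventional damping factor of 0.85, not Google's production algorithm; the `links` dict and the `pagerank` function name are illustrative.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank.

    links maps each page to the list of pages it links to. Each page shares its
    current score equally among its outgoing links, so a link from a
    high-scoring page is worth more than a link from an obscure one.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline ("random jump") share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page with no outgoing links: spread its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank
```

For `{"a": ["c"], "b": ["c"], "c": ["a"]}`, page `c` ends up highest (two incoming links), and `a` comes second even though it has only one incoming link, because that link comes from the high-scoring `c`.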

So I wanted to write a post about some of the emerging technology innovations attempting to solve this problem. Readers are kindly asked to add a comment at the bottom of the post. All comments will be shown, even bad poetry, for purposes of research and experimentation.

Measuring quality is relatively easy for eBay. Its Feedback Ratings provide an excellent indicator of trustworthiness, because online auctions involve measurable user actions such as ‘Was the product description accurate?’ and ‘Did the buyer pay up?’ Such actions speak louder than the mere words of a blog comment or product review.

Amazon now owns a valuable database of customer product reviews to help people through their purchasing decisions. Innovation by Amazon in this area has included the ability to provide feedback on the usefulness of other users’ comments, and a Reviewer Rank algorithm which provides a measure of reviewer quality (interestingly, this algorithm was recently improved to include some PageRank-like recursiveness).

In a past life I had the pleasure of working for Lonely Planet, a travel publisher whose credibility and quality have been built upon the independence of its authors and their unbiased travel reviews. Lonely Planet and its peers have long struggled with the opportunity to harvest UGC from loyal and passionate travelers, because it is just so difficult to measure the independence and quality of contributing users.

TripAdvisor was allowed to emerge as a disruptive force in the market for travel advice, allowing anybody to review any hotel or restaurant. That created a lot of quality content for a while, but ever since hotel owners found out about TripAdvisor and began to review their own hotels, it’s been difficult to tell the UGC and UGR apart. TripAdvisor still desperately needs a reliable measure of user generated quality to restore its credibility.

Perhaps social networking can help TripAdvisor; being able to filter your travel advice to that written only by your friends would eliminate biased reviews (unless you are friends with a bunch of hotel owners, in which case you’re probably going to stay in their hotel anyway). But until the internet settles on a standard for social data portability, not many of us will have enough online friends who have traveled enough and generated enough online travel content for such a social filter to work reliably, even allowing for recursive algorithms.

If it’s just travel advice and inspiration you’re looking for, you could wait for Lonely Planet’s upcoming blog syndication feature, which promises a novel solution to the problem.

But more generally, I think we all need a universal reputation system, one which aggregates lots of measures of quality from lots of different sites. Imagine if you could easily see a summary of my quality metrics from eBay and Amazon and Yahoo Answers and LinkedIn Answers and GetSatisfaction, perhaps even my Bugzilla and Basecamp metrics too; would that be enough for you to trust my travel advice and any other content that I generate?
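One way such an aggregator might combine scores that live on very different scales is to rescale each to 0..1 and average them, weighted by how much evidence sits behind each one (a Bayesian-average-style approach). This is purely an illustrative assumption: the `SiteMetric` fields, the neutral prior and its weight are invented, and none of the sites named above is known to expose data in this form.

```python
from dataclasses import dataclass

@dataclass
class SiteMetric:
    site: str       # e.g. "eBay" (all values used with this are made up)
    score: float    # the site's own score for this user
    low: float      # worst possible score on that site's scale
    high: float     # best possible score on that site's scale
    evidence: int   # how many ratings or transactions the score is based on

def universal_reputation(metrics, prior=0.5, prior_weight=10):
    """Combine per-site scores into a single 0..1 number.

    Each score is rescaled to 0..1, then averaged with weights equal to the
    evidence behind it. The neutral prior (hypothetical defaults) keeps a user
    with very little history close to 0.5 rather than at 0 or 1.
    """
    total = prior * prior_weight
    weight = prior_weight
    for m in metrics:
        if m.high == m.low:
            continue  # degenerate scale: nothing to learn from it
        normalized = (m.score - m.low) / (m.high - m.low)
        total += normalized * m.evidence
        weight += m.evidence
    return total / weight
```

For example, a 4-out-of-5 rating based on 10 reviews (0.75 after rescaling) combined with the prior gives 0.625, while the same rating backed by thousands of reviews would approach 0.75.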

Site visitors would benefit from increased visibility of users who generate content. Genuine contributors would be encouraged by being able to build a universal reputation for quality UGC, and discouraged by the risk of creating UGR. And site owners would benefit from data to separate the UGC from the UGR.

A universal reputation system could also help to eliminate online vote rigging, astro-turfing (all those reviews of iPhone apps posted by the developers themselves), and space-faking (setting up false identities on social networking sites).

Who are the players?

  • SezWho provides a plugin for blog commentary which presents a useful summary of UGC history for each contributor, and allows customizable 5-point rating scales for site owners.
  • Intense Debate has a great interface design. It’s recently been acquired by Automattic, the owners of the WordPress blogging platform, which will provide some valuable distribution, perhaps critical mass. But will the other blogging platforms want to adopt or integrate with a standard controlled by a competitor?
  • Google Friend Connect allows any site to embed a comments or ratings gadget onto any page. The universal view of previous UGC is not there yet; however, this will become powerful when integrated fully with Google’s other stuff: Blogger and SearchWiki and the Social Graph API and YouTube (arguably the site most in need of a UGR filter!)
  • Disqus is getting lots of press for its prompt Facebook Connect integration, which takes the hassle out of commenting. Video comments can be posted, powered by Seesmic. Readers can nudge comments up and down the list by voting on them.

If you have a view on who will win the race to become the universal reputation system, please comment below. Are there any other players that I have missed out? (Yes I know that is exposing me to some comments on the quality of this post!)

Also, here are some further questions to inspire some commentary:

  • Should we settle on a word for what is being measured here? Quality, importance, value, trust, reputation, credibility, honesty, transparency? Or will the winner of the race provide a web2.0 brand name to describe this concept of a universal measure of user generated content?
  • Is it even possible to determine an objective universal score? The success of PageRank would suggest yes. Or is quality in the eye of the beholder? Is one person’s signal another person’s noise?
  • Would a universal metric destroy the democratic level playing field that is UGC / UGR?
  • What are the consequences of such a universal reputation system being gamed?
  • How likely are eBay and Amazon to open up their reputation data? What are the privacy implications?

Thoughts please. Don’t be shy!

Dec
22
2008

Social data portability: who benefits?

In 2006, a certain old-media tycoon reportedly asked Mark Zuckerberg, the 20-something founder of Facebook, “how can I build a social network like Facebook?”

Zuckerberg replied “You can’t!”

What Zuckerberg meant was that Facebook hadn’t set out to ‘build’ a social network. His billion dollar insight was that Facebook would instead provide online social tools to help existing friends and existing social groups to communicate easily, share photos, stalk, and poke each other.

Then in 2007, Facebook opened its app platform for third party developers to add additional social stuff to keep users on the site. Soon we were all happily throwing sheep at each other and spamming our friends with app invites.

App fatigue arrived in 2008. A redesign of the Facebook site removed some of the weeds, but the metrics spoke loudly, or rather their unit of measurement did; popular apps began to be listed according to ‘monthly active users’ rather than ‘daily active users’.

Slide, RockYou and iLike had been quick enough to make some money, however there was a long tail of apps without enough active users to generate a decent return on investment. The app gold rush was over.

It became apparent that there was less value in creating new social activities inside of a social site such as Facebook, and more value in socializing, or adding social data and context to, the existing sites that people are already using out there in the big wide web.

In other words, social data portability has arrived, and extends Zuckerberg’s earlier “You can’t!” insight; you can’t ‘build’ the platform because the web is the platform.


We are told that data portability is for people who want more control over their data and do not want to be locked in to any particular social network. In 2008, Facebook Connect, Google Friend Connect and MySpaceID emerged as the big solutions from those wanting to port your social data, and profitably.

Facebook makes money from people viewing and clicking on ads on their website. Facebook Connect therefore allows you to export your Facebook profile and friend list to external sites, but really is intended to increase activity back on the Facebook website, by importing social information from those connected external sites back into your Facebook Feed for your friends to see. MySpaceID ditto.

Google however makes money from people clicking on ads anywhere, so Google Friend Connect can afford to remain socially agnostic, allowing users to identify themselves and their friends according to any network they belong to, and feed their external site activity into the social sites of their choice.

Being socially agnostic is more useful to more users in theory, but not yet in practice for Google Friend Connect. Even though it would be technically simple for Google to access your profile and friend lists using the Facebook Platform, what happened when Google submitted its Friend Connect app to Facebook for approval earlier in 2008?

Zuckerberg replied “You can’t!”, then added some FUD about privacy.

This week however Google was able to make some progress on the theory of Friend Connect by launching an integration with Twitter. It’s now possible for you to use your Twitter identity and friends list on external sites powered by Friend Connect, which significantly increases the chances of spotting someone you know on those sites.

What’s interesting about this recent development to me is the apparent haste, including Google asking for my Twitter username and password directly, rather than waiting for Twitter to complete its long-awaited OAuth implementation. I’ve also seen more than the usual number of server errors and teething problems in this latest build of Friend Connect.

Maybe this is an indication that OAuth will be coming soon from Twitter, which would be fantastic.

Or maybe this is an indication that Twitter will be coming soon from Google; some visibility into Twitter data would be useful for Google in working out an acquisition price.

Or maybe this haste reveals how social data is such a hugely valuable chunk of information for Google to organize, and monetize, if ways can be found to use external social data to improve ad targeting without abusing the privacy of users and the privacy policies of their social networks.

In any event, there are interesting times ahead for social data portability. Users stand to benefit from a richer, more social, internet experience, as long as their privacy is not abused. And stay tuned on the social data portability battle between Facebook and Google and MySpace: who will work out how to best monetize external social data in 2009?

Dec
06
2008

Thanks to everyone who sponsored my Mo

Movember was a big success this year, raising awareness and $20 million for men’s health issues including prostate cancer.

Thanks to everyone who sponsored my Mo, and to anyone who smiled knowingly or looked aghast or otherwise encouraged me along the way.

Here’s the video evidence:

The Mo will return in 2009…

Written by bob in: everything
Nov
21
2008

SearchWiki + OpenSocial = mainstream social search?

Google today launched a rather massive change to its core search product.

SearchWiki adds some innocuous buttons to your search results page, enabling Digg-style voting and FriendFeed-style commenting on each result.


I think this feature might prove valuable for some users, at least the bad spellers among us and those who prefer to repeatedly type the same search term into Google rather than use bookmarks or their memory.

However this feature becomes massively valuable for Google if enough people bother to vote for their favourite sites and add comments. Harnessing the collective wisdom of all those users is a great way for Google to improve upon its not-so-secret-anymore search algorithm.

Currently your own SearchWiki wisdom impacts only your own search results, nobody else’s. But the words chosen to explain SearchWiki do leave the door open for Google to evolve into a social search engine; “Customize your search results with your rankings, deletions, and notes — plus, see how other people using Google have tailored their searches.”

Personally, I’m not sure how much I want strangers (or bots) to influence (or game) my search results.

But I might want my friends and social networks to influence some of my search results.

If only Google could somehow identify all my friends in all my social networks, and keep track of their searching activity. Wait a minute…

SearchWiki + OpenSocial = mainstream social search.

The web is their platform.
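As a rough illustration of "friends influence my results, strangers don't", here is a hypothetical re-ranking rule: start from the engine's original order and move each result up by one place per net friend vote. The scoring rule, the `boost` parameter and `social_rerank` itself are assumptions made for this sketch, not how SearchWiki or OpenSocial actually work.

```python
def social_rerank(results, votes_by_user, friends, boost=1.0):
    """Re-order search results using only my friends' SearchWiki-style votes.

    results: URLs in the engine's original order (best first).
    votes_by_user: {user: {url: +1 or -1}}; votes from non-friends are ignored.
    """
    position = {url: i for i, url in enumerate(results)}
    net_votes = {url: 0 for url in results}
    for user in friends:
        for url, vote in votes_by_user.get(user, {}).items():
            if url in net_votes:
                net_votes[url] += vote
    # Lower score = higher on the page; sorted() is stable, so ties keep
    # the engine's original order.
    return sorted(results, key=lambda url: position[url] - boost * net_votes[url])
```

With results `["a.com", "b.com", "c.com"]`, two friends up-voting `c.com` lifts it above `b.com`, while a spambot's votes have no effect because the spambot is not in `friends`.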

Written by bob in: everything
Nov
19
2008

Speaking in HTTP/1.1

I’m a big fan of Hypertext Transfer Protocol. I am particularly fond of HTTP status codes and the meanings they convey with such concise and precise brevity. I just don’t GET why they are not used more often in natural language, and so this POST contains some examples of how we can start to use HTTP status codes in everyday dialogue.

Rita, trying to wrestle Bob’s attention away from his laptop: “Hey! Bob…?”

Bob, calmly, with Keanu Reeves curling fingers gesture: “100…”

Rita: “… will you put the bins out please?”

Bob, with a shrug of the shoulders: “202?”

Rita, with a roll of the eyes: “406!”

Bob, putting on shoes: “200 200 200 … It’s raining. Where’s my hoody?”

Rita, matter-of-fact: “302. Charity shop.”

Bob, slowly, to himself: “4 … 0 … 9”

Rita: “… and while you’re up, can you pop down to Woolies and pick up some bread and milk …”

Bob: “503”

Rita: “… and a pack of cheese sticks for Jack’s packed lunch. And nappies. And some of those stuffed jalapeño peppers ….”

Bob: “408”

Teapot: “418″

Bob, to the teapot: “Oh don’t YOU start.” (see RFC 2324)

Rita: “Seriously, I NEED some jalapeños!”

Bob, smiling: “402”

Rita, blushing: “403 …”

Bob, tasting victory: “…”

Rita, faking defeat: “… 200”

Bob: “200”
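For anyone who doesn't have the status codes memorised, Python's standard `http.HTTPStatus` enum can translate the dialogue. The fallback for 418 is there because that code, from the April Fools' RFC 2324, only appears in `HTTPStatus` from Python 3.9 onwards; `DIALOGUE_CODES` and `phrase` are just names for this sketch.

```python
from http import HTTPStatus

# Codes used in the dialogue, in order of appearance.
DIALOGUE_CODES = [100, 202, 406, 200, 302, 409, 503, 408, 418, 402, 403]

def phrase(code):
    """Standard reason phrase for an HTTP status code."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        # 418 only exists in HTTPStatus from Python 3.9; older versions land here.
        return {418: "I'm a teapot"}.get(code, "Unknown")

for code in DIALOGUE_CODES:
    print(code, phrase(code))
```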

Written by bob in: everything
