Where do you get your (breaking) news from?

There’s been a lot of hype and VC investment lately around web monitoring services that analyze blogs and social media for topical relevance and deliver near-real-time alerts to customers.

These services claim to find breaking stories at least a day before they hit the headlines at major news sites such as The New York Times. With millions of blog posts published each day, it is indeed not trivial to build a system that can syndicate the best and most recent vertical content for a designated audience, for example individual traders who follow hedge funds.

Of course, it would be impossible to create such services without RSS technology, and it would also be impossible without an automated RSS strategy for the market vertical. Apart from manual monitoring and tagging of posts, the latter can only be achieved by so-called automated “machine learning” methods. Ben Barren reports in a recent post that machine learning experts are increasingly involved in this field.

It is no secret that the whole philosophy of Feeds 2.0 centers on its machine learning personalization engine. It is therefore no wonder that Feeds 2.0 already quietly offers such web monitoring services to its registered users, and it does so for free! For example, the following screenshots show how Feeds 2.0 reacted to two breaking news stories that appeared in blogs yesterday and are making headlines around the world… today.

The first screenshot (taken on my Ubuntu box 🙂) shows posts about Google’s acquisition of Jotspot grouped into one large cluster on the page for the tag “Google” yesterday afternoon. The size of the cluster reflects the importance of the story from the moment it started circulating around the blogosphere.

Breaking News at Feeds 2.0 (Google / Jotspot)

The second screenshot shows a cluster of posts about the second important story that broke yesterday: the acquisition of Reddit by Condé Nast. The posts were clustered automatically from sites such as TechCrunch, Digg, Original Signal, etc., and the cluster would make an ideal candidate for an early e-mail alert to an interested audience.

Breaking News at Feeds 2.0 (Condé Nast / Reddit)

The automatic formation of the clusters in the above screenshots helps reduce information overload and lets users focus on the actual stories when data is coming in fast. Because each cluster gathers coherent, relevant posts in one place, the importance of a story comes to the surface.
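For readers curious about how posts can end up grouped like this, here is a toy sketch in Python. It is not how Feeds 2.0 works (this post does not describe the engine's internals); it simply assumes that posts about the same story share words in their titles, measures that overlap with Jaccard similarity, and greedily assigns each title to the first cluster it resembles. The function names, the stopword list, the 0.3 threshold and the sample titles are all made up for illustration.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "of", "by", "to", "for", "and", "in", "on", "is"}


def tokens(title):
    """Lowercase a title and return its set of non-stopword words."""
    return {w for w in re.findall(r"\w+", title.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_titles(titles, threshold=0.3):
    """Greedy single-pass clustering.

    Each title joins the first cluster whose seed (first) title is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    Returns the clusters as lists of titles, largest cluster first.
    """
    clusters = []  # list of (seed_tokens, member_titles)
    for title in titles:
        toks = tokens(title)
        for seed, members in clusters:
            if jaccard(seed, toks) >= threshold:
                members.append(title)
                break
        else:
            clusters.append((toks, [title]))
    return sorted((members for _, members in clusters), key=len, reverse=True)


if __name__ == "__main__":
    # Invented sample titles, not real posts.
    sample = [
        "Google acquires Jotspot",
        "Condé Nast acquires Reddit",
        "Google buys wiki startup Jotspot",
        "Reddit bought by Condé Nast",
        "Jotspot acquired by Google",
    ]
    for group in cluster_titles(sample):
        print(len(group), group)
```

Sorting the clusters by size mirrors the idea above that a bigger cluster signals a bigger story. A real system would need far more than this (stemming, time windows, better similarity measures), so treat it as a sketch of the idea only.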

I was therefore just wondering… should we perhaps create a new service and start charging users, at least for just-in-time e-mail alerts? 🙂