So, what’s the technology behind the Feeds 2.0 personalization engine?

A lot of people ask us about the technology behind the powerful Feeds 2.0 personalization engine. Even though the Feeds 2.0 personalization algorithms are proprietary and patentable, we believe we can elaborate on the principles behind them since, after all, they represent state-of-the-art techniques in information retrieval and machine learning.

The Feeds 2.0 personalization engine is based on the principle of text categorization. Text categorization is the process of assigning documents to one or more existing categories according to the concepts present in their text. Organizing text into categories allows users to narrow the scope of a search submitted to an information retrieval system (e.g. a search engine), to explore a document collection, and to find information relevant to their needs without any prior knowledge of the keywords that describe each topic.

You can think of personalizing the individual posts coming from various feed sources as a text categorization task with just two categories: interesting and not-interesting posts. For each individual user, Feeds 2.0 assigns every new post either to that user’s interesting group or to their not-interesting group.
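To make this framing concrete, here is a minimal Python sketch of the data a per-user, two-category task works with: each post is reduced to a bag of words and stored with one of the two labels. The names `UserTrainingSet` and `bag_of_words`, and the plain word-count representation, are illustrative assumptions rather than Feeds 2.0 internals.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

INTERESTING = "interesting"
NOT_INTERESTING = "not-interesting"

# In Python 3, \w matches letters in any script (Latin, Greek, Cyrillic, Hebrew, ...).
TOKEN_RE = re.compile(r"\w+")


def bag_of_words(text: str) -> Counter:
    """Represent a post as lowercase word counts, ignoring word order."""
    return Counter(TOKEN_RE.findall(text.lower()))


@dataclass
class UserTrainingSet:
    """Hypothetical per-user store of labelled posts for a two-category task."""
    examples: list = field(default_factory=list)

    def add(self, text: str, label: str) -> None:
        if label not in (INTERESTING, NOT_INTERESTING):
            raise ValueError(f"unknown label: {label!r}")
        self.examples.append((bag_of_words(text), label))

    def count(self, label: str) -> int:
        return sum(1 for _, lab in self.examples if lab == label)


alice = UserTrainingSet()
alice.add("Champions League final: late goal decides the match", INTERESTING)
alice.add("Parliament vote on the budget postponed", NOT_INTERESTING)
```

Any classifier for this task then learns, separately for each user, a mapping from such word-count vectors to one of the two labels.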

In general, text categorization can be performed by machine learning algorithms or computational intelligence techniques. These include artificial neural networks (feedforward networks or Self-Organizing Maps (SOM)) as well as more traditional machine learning algorithms such as C4.5 decision trees, PART decision rules, and Naive Bayes or Markov classifiers.
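As an illustration of one of the traditional algorithms named above, the following is a textbook multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written as a Python sketch for the two-category case. It is a generic example of the technique, not the proprietary Feeds 2.0 classifier; the tokenizer and the class name `NaiveBayes` are assumptions.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # number of training posts per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()

    def train(self, text, label):
        words = tokenize(text)
        self.doc_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def log_score(self, words, label):
        """log P(label) + sum of log P(word | label), with smoothing."""
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for w in words:
            score += math.log((self.word_counts[label][w] + 1) / denom)
        return score

    def classify(self, text):
        words = tokenize(text)
        return max(self.doc_counts, key=lambda label: self.log_score(words, label))


nb = NaiveBayes()
nb.train("goal match league football", "interesting")
nb.train("election vote parliament", "not-interesting")
print(nb.classify("football league result"))  # interesting
```

Scores are summed in log space because multiplying many small word probabilities would quickly underflow to zero.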

Comparing the best performance of each algorithm in terms of classification error, experimental results have shown that artificial neural networks are good classifiers for text categorization problems. In general, feedforward networks stand out as the best classifiers, and SOM networks usually perform better than the traditional machine learning algorithms.
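The simplest member of the feedforward family is a single sigmoid neuron (equivalent to logistic regression) trained by stochastic gradient descent. The sketch below shows that building block on bag-of-words inputs. Real feedforward classifiers add one or more hidden layers, and nothing here reflects the architecture Feeds 2.0 actually uses; `SigmoidNeuron`, the learning rate and the epoch count are illustrative assumptions.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"\w+")


def features(text):
    """Bag-of-words input vector, stored sparsely as {word: count}."""
    return Counter(TOKEN_RE.findall(text.lower()))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class SigmoidNeuron:
    """One-neuron feedforward network: output = sigmoid(w . x + b)."""

    def __init__(self, learning_rate=0.5):
        self.weights = Counter()  # unseen words read as weight 0
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, text):
        """Estimated probability that the post is interesting."""
        x = features(text)
        z = self.bias + sum(self.weights[w] * v for w, v in x.items())
        return sigmoid(z)

    def train(self, examples, epochs=50):
        """examples: (text, target) pairs, target 1 = interesting, 0 = not."""
        for _ in range(epochs):
            for text, target in examples:
                # For log-loss, the gradient w.r.t. the pre-activation z is (p - target).
                error = self.predict_proba(text) - target
                self.bias -= self.learning_rate * error
                for w, v in features(text).items():
                    self.weights[w] -= self.learning_rate * error * v


net = SigmoidNeuron()
net.train([("football league goal", 1), ("election vote parliament", 0)])
print(net.predict_proba("football goal"))
```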

Feeds 2.0 uses a unique combination of the principles behind the techniques above. In particular, it utilizes advanced statistical natural language processing and feature selection techniques as well as proprietary artificial neural network classifiers. Other factors are also taken into account, for example the sources a particular user likes, or the authors and topics he or she is interested in. This combination provides an advanced computational intelligence framework that gives the Feeds 2.0 personalization engine a classification accuracy of almost 100% for the individual categories of each user.
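The post does not say which feature selection technique is used. One standard option in text categorization is to rank terms by the chi-square statistic of their association with a category and keep only the top k. The Python sketch below assumes that technique and a binary interesting/not-interesting labelling; it is a generic example, and `chi_square_scores` and `select_features` are hypothetical names, not the Feeds 2.0 method.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"\w+")


def chi_square_scores(docs, positive_label):
    """docs: list of (text, label). Returns {term: chi-square score} for positive_label vs the rest."""
    doc_terms = [(set(TOKEN_RE.findall(text.lower())), label) for text, label in docs]
    n = len(doc_terms)
    n_pos = sum(1 for _, label in doc_terms if label == positive_label)
    n_neg = n - n_pos
    df_pos, df_neg = Counter(), Counter()  # document frequency of each term per class
    for terms, label in doc_terms:
        (df_pos if label == positive_label else df_neg).update(terms)
    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # positive docs containing the term
        b = df_neg[term]   # negative docs containing the term
        c = n_pos - a      # positive docs without the term
        d = n_neg - b      # negative docs without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_features(docs, positive_label, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, positive_label)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


docs = [
    ("football goal the", "interesting"),
    ("football league the", "interesting"),
    ("election vote the", "not-interesting"),
    ("parliament vote the", "not-interesting"),
]
print(select_features(docs, "interesting", 2))  # ['football', 'vote']
```

A term that occurs in every post (such as “the”) carries no information about the category and scores 0, so it is dropped before classification.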

In our next post we will elaborate more on the Feeds 2.0 Recommendation feature.


14 comments so far

  1. didaio on

    Hello!
    I have found some problems with Feeds 2.0.
    One important problem is tag recognition in Cyrillic languages (Russian, Ukrainian). The popular tags in the tag cloud are “in”, “for”, “where”, “about”, “no”, “and” (I list the English equivalents of the Russian and Ukrainian words), etc.
    Another problem is related news in Cyrillic languages. News about sport and news about the web are shown as related, or news about sport and about politics.

    I have a problem going to the “Next” page. Sometimes the page doesn’t load: I see “Loading” in the tab and “waiting for feeds2.com” in the status bar indefinitely. When I try to go to http://www.feeds.com in another tab, I also see “waiting for feeds2.com” indefinitely. I have to close and reopen the browser to open the Feeds 2.0 home page.
    I use Firefox 1.5.0.4. In IE7 beta 3 everything worked fine.

    I can’t find how to mark feeds (folders) as read. And read news/feeds/folders aren’t marked as read after reading.

    My interesting/not-interesting preferences aren’t saved. When I click the down arrow or the heart, go to another page and return to the previous one, I don’t see my preferences…

  2. feeds2 on

    Hi Didaio,

    Thanks for your comments. Please find below some brief explanations why you’ve encountered these problems:

    >One important problem is tag recognition in Cyrillic languages (Russian, Ukrainian). The popular tags in the tag cloud are “in”, “for”, “where”, “about”, “no”, “and” (I list the English equivalents of the Russian and Ukrainian words), etc.

    Unfortunately we do not have a list of common stopwords for Cyrillic languages, so we cannot remove them. Since these common words appear in posts with high probability, they tend to dominate the content of the posts and consequently get extracted as auto-tags (and also appear in the tag cloud).
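    A minimal Python sketch of the fix being described: filter a per-language stopword list out of the token stream before counting candidate tags. The `STOPWORDS` table and the `auto_tags` function are hypothetical, the lists are deliberately tiny, and the Russian entries are simply the Russian equivalents of the English examples in Didaio’s comment (“in”, “for”, “where”, “about”, “no”, “and”). Real auto-tagging for heavily inflected languages such as Russian would also need stemming or lemmatization.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"\w+")

# Tiny illustrative lists; real stopword lists run to hundreds of words.
STOPWORDS = {
    "en": {"in", "for", "where", "about", "no", "and", "the", "a", "of"},
    "ru": {"в", "для", "где", "о", "нет", "и"},
}


def auto_tags(text, lang, k=3):
    """Return the k most frequent non-stopword tokens as candidate tags."""
    stop = STOPWORDS.get(lang, set())  # unknown language: nothing gets filtered
    counts = Counter(w for w in TOKEN_RE.findall(text.lower()) if w not in stop)
    # Sort by descending frequency, then alphabetically for a stable result.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


post = "футбол и хоккей и футбол в Москве"  # "football and hockey and football in Moscow"
print(auto_tags(post, "xx", k=2))  # no stopword list: ['и', 'футбол']
print(auto_tags(post, "ru", k=2))  # with the list: ['футбол', 'москве']
```

    Without a list, the Russian word for “and” ties with the real topic word for the top tag, which is exactly the tag-cloud symptom reported above.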

    Would it be possible for you, or any other Cyrillic-speaking user, to provide us with such a list of common stopwords for Cyrillic languages?

    >Another problem is related news in Cyrillic languages. News about sport and news about the web are shown as related, or news about sport and about politics.

    This is due to the problem described above. Since the stopwords have not been removed from the content of the posts, posts tend to share many of these common words, which introduces noise into the clustering process.
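    The effect can be reproduced with cosine similarity over word counts, a common similarity measure for clustering text (assumed here; the reply does not say which measure Feeds 2.0 uses). In this hypothetical Python example, a sports post and a politics post look strongly related only because they share “the”, “and” and “in”.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"\w+")
STOPWORDS = frozenset({"the", "and", "in", "of", "for", "about"})


def vector(text, stopwords=frozenset()):
    """Word-count vector of a post, optionally with stopwords removed."""
    return Counter(w for w in TOKEN_RE.findall(text.lower()) if w not in stopwords)


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


sport = "the goal and the save in the final"
politics = "the vote and the debate in the parliament"
print(round(cosine(vector(sport), vector(politics)), 2))              # 0.79
print(cosine(vector(sport, STOPWORDS), vector(politics, STOPWORDS)))  # 0.0
```

    With the stopwords left in, the two unrelated posts score about 0.79 and would likely land in the same cluster; once they are removed, the posts share no words and the similarity drops to 0.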

    > When I try to go to http://www.feeds.com in another tab, I also see “waiting for feeds2.com” indefinitely. I have to close and reopen the browser to open the Feeds 2.0 home page.

    Feeds 2.0 uses Ajax technologies for its display. Tabbed browsing is generally not recommended for Ajax-based sites, so we would recommend sticking to a single tab, especially when clicking on posts. By clicking on a post to read it you also train the system, whereas no training is performed when you right-click and open the post in a new tab or window!
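    The reply implies that reading behaviour is captured by the page’s own click handling, alongside the explicit heart/down-arrow votes mentioned in the first comment. Below is a hypothetical Python sketch of how such signals might be turned into training labels; the `FeedbackLog` name, the rule that an explicit vote overrides a click, and the 0.5 weight for implicit feedback are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FeedbackLog:
    """Hypothetical per-user record of reading behaviour and manual votes."""
    clicked: set = field(default_factory=set)  # post ids opened via an in-page click
    votes: dict = field(default_factory=dict)  # post id -> +1 (heart) or -1 (down arrow)

    def record_click(self, post_id):
        # Only clicks handled by the page itself reach this method; a post opened
        # with "open in new tab" never calls it, so it yields no training signal.
        self.clicked.add(post_id)

    def record_vote(self, post_id, vote):
        if vote not in (1, -1):
            raise ValueError("vote must be +1 or -1")
        self.votes[post_id] = vote

    def training_label(self, post_id) -> Optional[tuple]:
        """Return (label, weight), letting an explicit vote override a click."""
        if post_id in self.votes:
            label = "interesting" if self.votes[post_id] > 0 else "not-interesting"
            return label, 1.0
        if post_id in self.clicked:
            return "interesting", 0.5  # assumed weaker weight for implicit feedback
        return None


log = FeedbackLog()
log.record_click("post-42")
print(log.training_label("post-42"))  # ('interesting', 0.5)
```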

    >I can’t find how to mark feeds (folders) as read.

    The “mark posts as read” feature (marking posts as read without actually reading them) has not been implemented yet. At this stage of development we have focused on the features that make Feeds 2.0 unique among RSS aggregators rather than on implementing all the standard aggregator features. Of course, we plan to add them shortly, and definitely before we leave the private beta stage 🙂

    >And read news/feeds/folders aren’t marked as read after reading.

    Read posts can be found in your Read Items folder. You will also notice that the titles of read posts have a different colour from those of unread posts.

    >My interesting/not-interesting preferences aren’t saved. When I click the down arrow or the heart, go to another page and return to the previous one, I don’t see my preferences…

    Did you turn “Personalization On” to sort posts according to your interests? If so and you still didn’t see a difference, you may have to press F5 to refresh the page, or simply select the same folder or feed in which you trained a post again.

    Thanks for the feedback.

    The Feeds 2.0 team

  3. Doron on

    Hello Feeds 2.0 team,
    Does your clustering and personalization engine technology support languages other than English (and, in my case, Hebrew)?
    Because if I understand your reply to the first comment, it seems you need some knowledge of the language you analyze for those technologies to work well.

    Do you have plans (that you are willing to share with us) for using these technologies for automated tagging of web pages?
    Thanks and good luck.
    Can’t wait to get into the beta stage.
    Doron.

  4. feeds2 on

    Dear Doron,

    Feeds 2.0 can gather and display feeds written in any language (Hebrew is also supported) because it internally stores the information using a robust Unicode character encoding.

    Our personalization and clustering features do support languages other than English, provided of course that we have (as you correctly stated) some knowledge of the properties of the language being analyzed.

    Currently most of our features (personalization, clustering, auto-tagging etc) support the following languages: English, French, Greek, Danish, Dutch, Finnish, German, Italian, Portuguese, Spanish, Swedish, Chinese (simplified), Japanese and Arabic.

    As with the answer we gave to Didaio concerning Cyrillic languages, we would be very much obliged if you or any other Hebrew-speaking user could supply us with at least a set of common Hebrew stopwords, so that we can reduce noise and increase accuracy and these advanced features can fully support Hebrew.

    Our current plans do not include auto-tagging, personalization and clustering for whole web pages. The technology is there and could be applied to domains other than feeds; however, at this stage we have focused our efforts on the development of Feeds 2.0 so that we can open a public service as soon as possible. If, however, you have something more concrete in mind, we will be very glad to discuss it with you.

    Thanks,

    The Feeds 2.0 team

  5. didaio on

    > Would it be possible for you, or any other Cyrillic-speaking user, to provide us with such a list of common stopwords for Cyrillic languages?

    Yes, of course. Can you send me the list of stopwords for English so I can see examples? My e-mail is didaio [at] google.com.

    > Feeds 2.0 uses Ajax technologies for its display. Tabbed browsing is generally not recommended for Ajax-based sites, so we would recommend sticking to a single tab, especially when clicking on posts. By clicking on a post to read it you also train the system, whereas no training is performed when you right-click and open the post in a new tab or window!

    I don’t have any problems with Gmail, which is written with AJAX too 🙂 And I open all links in Feeds 2.0 without tabs. Today I haven’t had any problem with clicking Next….

    > Read posts can be found in your Read Items folder. You will also notice that the titles of read posts have a different colour from those of unread posts.

    I know. But I don’t understand how posts are marked as read automatically. Does it happen only when I open an article, or for all the articles in a feed when I open that feed?

    > Did you turn “Personalization On” to sort posts according to your interests?

    No 🙂 TY

  6. feeds2 on

    Thanks Didaio,

    >Yes, of course. Can you send me the list of stopwords for English so I can see examples? My e-mail is didaio [at] google.com.

    A list of common stopwords for translation is already in the mail for you!

    >I don’t have any problems with Gmail, which is written with AJAX too 🙂 And I open all links in Feeds 2.0 without tabs. Today I haven’t had any problem with clicking Next….

    We’re glad that you don’t have these problems today!

    >I know. But I don’t understand how posts are marked as read automatically. Does it happen only when I open an article, or for all the articles in a feed when I open that feed?

    Posts are marked as read whenever you click on the title of a post and read it in a new window.

    >> Did you turn “Personalization On” to sort posts according to your interests?

    >No 🙂 TY

    No problem. In terms of usability do you think that we should make the personalization button bigger?

    The Feeds 2.0 team

  7. didaio on

    > In terms of usability do you think that we should make the personalization button bigger?

    For me it is more convenient to see my marks without switching to Personalization Mode:
    1. So that I don’t mark articles twice.
    2. So that I can see my previous opinion about articles.

  8. didaio on

    //We’re glad that you don’t have these problems today!//
    I have this problem again. Maybe it happens because of my low-speed (48 Kbps) connection?

  9. Doron on

    Hey guys,
    I will do my best to help you out with those stopwords.
    Send them in English to dleaper [at] gmail.com.

    As far as I know it might be a bit more complicated than in English, but I’ll give it a try.

    However, I won’t be able to see how it works out inside Feeds 2.0 until I get into the beta.

    As for the Unicode support you mentioned, do you mean your system works only with Unicode encodings (UTF-8 and such), or also with all the other encodings (such as Hebrew windows-1255, Hebrew ISO 8859-8-i and others)? I am not a web developer, so I am asking based on what I know about browser support for different languages.
    Doron.

  10. feeds2 on

    Hi Doron,

    We really appreciate your offer to help with the Hebrew language, and we have already sent you a list of common English stopwords for translation into Hebrew.

    By Unicode we mean that Feeds 2.0 supports all the languages that require multibyte encodings (Chinese, Japanese, etc.) and internally represents all other languages (from English and French to Greek and Hebrew) with a multibyte character encoding as well. Hence there is a common character set representation for all languages, which means that Feeds 2.0 can gather feeds written in any language and display them without compatibility problems in modern versions of (almost) all browsers.
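    For the legacy encodings Doron asks about, a common approach is to decode each feed from its declared encoding into Unicode once, at ingestion, and then store everything in a single Unicode encoding such as UTF-8. A minimal Python sketch under that assumption follows; the helper names and the alias table are hypothetical, while `cp1255` and `iso8859_8` are Python’s standard codec names for windows-1255 and ISO-8859-8.

```python
# Declared feed encodings mapped to Python codec names. ISO-8859-8-I uses the same
# character table as ISO-8859-8; the "-I" suffix only signals logical text order.
CODEC_ALIASES = {
    "windows-1255": "cp1255",
    "iso-8859-8-i": "iso8859_8",
    "iso-8859-8": "iso8859_8",
}


def to_unicode(raw: bytes, declared_encoding: str) -> str:
    """Decode feed bytes from their declared encoding into a Python (Unicode) str."""
    name = declared_encoding.strip().lower()
    return raw.decode(CODEC_ALIASES.get(name, name))


def to_storage(raw: bytes, declared_encoding: str) -> bytes:
    """Re-encode as UTF-8 so every feed shares one internal representation."""
    return to_unicode(raw, declared_encoding).encode("utf-8")


shalom = "שלום"
legacy = shalom.encode("cp1255")  # bytes as a windows-1255 feed would send them
print(to_storage(legacy, "windows-1255") == shalom.encode("utf-8"))  # True
```

    Once every feed is normalized this way, the display layer only ever has to deal with one encoding, whatever the source declared.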

    The Feeds 2.0 team

  11. […] Comments (RSS) « So, what’s the technology behind Feeds 2.0 personalization engine ? […]

  12. […] Its killer features are memetracking (clustering related items), tag cloud (view all posts by a topic tag), and personalisation (tracking reading behaviour plus taking into account manual voting of items)…there is also a visualisation tool. […]

  13. […] A technical discussion on Feeds 2.0 is available here. […]

  14. areti on

    Hello! I found this page during research I am doing on personalization algorithms. If anybody can send me an algorithm it would be quite helpful, because I have to do an exercise on it. Thanks!!!

