Thursday, January 17, 2008

Yahoo finds fault with Google's secret sauce



As complex as Google's PageRank may be, search experts at Yahoo seem to think it's not complex enough. Based on patent filings, Yahoo is dabbling in ranking algorithms that incorporate more user behavior data in advance of the company's next run at toppling Google's haloed relevance.


Editor's Note: Yahoo, as usual, is fairly confident in its ability to create a search algorithm that meets or exceeds the quality of Google's. Lots of search players have felt the same and have yet to deliver. Do you think Yahoo will ever catch Google, or is it just too late to take down the Mountain View Monolith? Let us know in the comment section.

Seeing will be believing when it happens, of course, as Google is highly secretive about how its search engine calculates PageRank. If history is any indication, they're already way ahead on behavioral factoring.

Nonetheless, Yahoo can afford the best search engineers in the business (if they can get them before Google does, anyway) and the patent filings shed some light on how PageRank is currently calculated and ways it might be improved in the future.

Bill Slawski, Director of Search Marketing at KeyRelevance, goes into painstaking detail about Yahoo's user data challenges at his SEObytheSea blog. Patent language, especially when dealing with algorithms, can be confusing and dense, so we'll just highlight a few interesting points and leave the lexicographical deciphering to you.

Some of Yahoo's assumptions about PageRank and its associated flaws:
  • Internal and external links are often weighted equally even though internal links can be less reliable and more self-promotional. Some links, like disclaimer links, are rarely followed.
  • PageRank ignores that webpages are often purchased and repurposed, and that they decay or lose value over time at variable rates.
  • Current calculations, like TrustRank, are engineered more to combat webspam than to reflect actual user behavior.
  • Sometimes PageRank deals with links in bulk, aggregating them according to host or domain, also known as blocked PageRank.

What Yahoo plans to do about it:
  • Measure link weight – influenced by the frequency with which users follow a link
  • Note when links are ignored and users leave (teleport) to another page of their choosing
  • Calculate the probability that a user stops and reads a webpage rather than viewing it and moving on
  • Incorporate user data into the algorithm – "User Sensitive PageRank" could reflect "the navigational behavior of the user population with regard to documents, pages, sites, and domains visited, and links selected."
  • Personalize PageRank based on demographic information (age, gender, income, user location)
  • Emphasize recent information
  • Weigh anchor text more heavily – the patent filing calls anchor text "one of the most useful features used in ranking retrieved Web search results"
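
To make the first two ideas concrete, here is a minimal sketch of a click-weighted PageRank. It is not Yahoo's patented method or Google's actual algorithm; it is the textbook power iteration with two assumptions swapped in: each link's weight is the number of times users followed it, and a fixed `teleport_prob` models users who ignore every link and jump elsewhere. The function name, the parameter values, and the sample click counts are all hypothetical.

```python
def user_sensitive_pagerank(link_clicks, teleport_prob=0.15, iterations=50):
    """Power iteration over a click-weighted link graph (illustrative only).

    link_clicks maps page -> {target page: observed click count}.
    """
    pages = set(link_clicks)
    for targets in link_clicks.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page gets an equal share of the "teleport" traffic.
        new_rank = {p: teleport_prob / n for p in pages}
        for page in pages:
            targets = link_clicks.get(page, {})
            total = sum(targets.values())
            if total == 0:
                # No link was ever followed: treat the visit as a teleport.
                for p in pages:
                    new_rank[p] += (1 - teleport_prob) * rank[page] / n
                continue
            # Split the page's rank in proportion to how often each link is followed.
            for target, clicks in targets.items():
                new_rank[target] += (1 - teleport_prob) * rank[page] * clicks / total
        rank = new_rank
    return rank


# Hypothetical click log: the disclaimer link is almost never followed.
clicks = {
    "home": {"article": 90, "disclaimer": 1},
    "article": {"home": 10},
    "disclaimer": {"home": 1},
}
ranks = user_sensitive_pagerank(clicks)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:12s} {score:.3f}")
```

With equal link weights, "article" and "disclaimer" would score the same because each has exactly one inbound link from "home". Weighting by click frequency is what separates them.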

Sunday, January 13, 2008

January Newsflash

  • Python has been declared programming language of 2007. It was a close finish, but in the end Python had the largest increase in ratings over the year (2.04%). There is no clear reason why Python made this huge jump in 2007. Last month Python surpassed Perl for the first time in history, which is an indication that Python has become the "de facto" glue language at system level. It is especially beloved by system administrators and build managers. Chances are high that Python's star will rise further in 2008, thanks to the upcoming release of Python 3.

  • A couple of interesting trends can be derived from the 2007 data. First of all, languages without automated garbage collection are losing ground rapidly. The most prominent examples of languages with explicit memory management, C and C++, both lost about 2% in one year. Another trend is that the battle between scripting languages seems to be going on in the background. There is a continuous flow of new scripting languages. In 2006, Ruby entered the main scene, followed this year by Lua. In the top 50, Groovy and Factor are new kids on the block. None of these new scripting languages seems to stay permanently; they are just replaced by successors.

  • What were the big movers and shakers in 2007? The big winners are Lua (from 46 to 16), Groovy (from 66 to 31), Focus (from 78 to 41), and Factor (new at 45). The most prominent shakers are ABAP (from 15 to 29) and IDL (from 23 to 48).

  • What is to be expected in 2008? And, what became of the forecasts for 2007? At the beginning of 2007, I thought C# and D would become the winners and Perl and Delphi the losers. C# was indeed one of the big winners, and Perl one of the big losers. But the forecasts for D and Delphi were completely wrong. There has been no breakthrough for D. On the other hand, Delphi reclaimed a top 10 position... What about 2008? C, C++ and Perl will continue to fall. C and C++ because they have no automated garbage collection. C++ will get an extra push down because Microsoft is not actively supporting the language anymore. Perl is just dead. Java and C# will eventually be the 2 most popular languages. So I expect them to rise further in 2008. What new languages will enter the top 20 in 2008 is a wild guess, but I think ActionScript and Groovy are really serious candidates.

  • Nguyen Quang Chien suggested renaming the OCaml entry to Caml. This has been done. Thanks, Nguyen!

  • The tables below list some long-term trends by language category. They show that dynamically typed object-oriented languages are still becoming more popular.

    Category                      Ratings January 2008   Delta January 2007
    Object-Oriented Languages     56.1%                  +4.0%
    Procedural Languages          40.9%                  -3.6%
    Functional Languages          1.9%                   +0.2%
    Logical Languages             1.1%                   -0.6%


    Category                      Ratings January 2008   Delta January 2007
    Statically Typed Languages    56.2%                  -1.5%
    Dynamically Typed Languages   43.8%                  +1.5%

TIOBE declares Python as programming language of 2007 !!

The TIOBE Programming Community index gives an indication of the popularity of programming languages. The index is updated once a month. The ratings are based on the world-wide availability of skilled engineers, courses and third party vendors. The popular search engines Google, MSN, Yahoo!, and YouTube are used to calculate the ratings. Observe that the TIOBE index is not about the best programming language or the language in which most lines of code have been written.

The index can be used to check whether your programming skills are still up to date or to make a strategic decision about what programming language should be adopted when starting to build a new software system. The definition of the TIOBE index can be found here.

Position   Position   Programming       Ratings    Delta      Status
Jan 2008   Jan 2007   Language          Jan 2008   Jan 2007
1          1          Java              20.849%    +1.69%     A
2          2          C                 13.916%    -1.89%     A
3          4          (Visual) Basic    10.963%    +1.84%     A
4          5          PHP               9.195%     +1.25%     A
5          3          C++               8.730%     -1.70%     A
6          8          Python            5.538%     +2.04%     A
7          6          Perl              5.247%     -0.99%     A
8          7          C#                4.856%     +1.34%     A
9          12         Delphi            3.335%     +1.00%     A
10         9          JavaScript        3.203%     +0.36%     A
11         10         Ruby              2.345%     -0.17%     A
12         13         PL/SQL            1.230%     -0.34%     A
13         11         SAS               1.204%     -1.14%     A
14         14         D                 1.172%     -0.16%     A
15         18         COBOL             0.932%     +0.30%     A
16         46         Lua               0.579%     +0.48%     A--
17         22         FoxPro/xBase      0.506%     +0.05%     B
18         19         Pascal            0.456%     -0.11%     B
19         16         Lisp/Scheme       0.413%     -0.26%     A--
20         27         Logo              0.386%     +0.07%     B
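
As a quick sanity check on the "largest increase" claim, the sketch below recomputes it from the top 10 rows of the table above. It is plain arithmetic on the published figures, not TIOBE's own rating method, and the variable names are made up for illustration. The January 2007 rating is assumed to be the January 2008 rating minus the listed delta.

```python
# (language, rating in Jan 2008 in %, delta versus Jan 2007 in percentage points)
top10 = [
    ("Java", 20.849, 1.69),
    ("C", 13.916, -1.89),
    ("(Visual) Basic", 10.963, 1.84),
    ("PHP", 9.195, 1.25),
    ("C++", 8.730, -1.70),
    ("Python", 5.538, 2.04),
    ("Perl", 5.247, -0.99),
    ("C#", 4.856, 1.34),
    ("Delphi", 3.335, 1.00),
    ("JavaScript", 3.203, 0.36),
]

# The language with the largest one-year gain.
biggest_gain = max(top10, key=lambda row: row[2])
print("Largest increase:", biggest_gain[0], f"(+{biggest_gain[2]:.2f})")

# Back out the January 2007 ratings: rating_2007 = rating_2008 - delta.
rating_2007 = {name: round(rating - delta, 3) for name, rating, delta in top10}
print("Python, Jan 2007:", rating_2007["Python"])
print("Perl, Jan 2007:  ", rating_2007["Perl"])
```

The reconstructed 2007 figures put Perl well ahead of Python a year earlier, which is consistent with the note that Python only recently overtook it.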

Sunday, January 6, 2008

IBM developerWorks: Mastering regular expressions in PHP, Part 1

The IBM developerWorks website has posted the first part of a series they've created to help PHP developers become more informed about what regular expressions are and how they can harness their power for their applications.

Pattern matching is such a common chore for software that a special shorthand, regular expressions, has evolved to make light work of the task. Learn how to use this shorthand in your code here in Part 1 of this "Mastering regular expressions in PHP" series.

In this first part of the series, they look at the basics - the idea behind regular expressions, some of the common operators, the PHP functions to use them and examples of how to use them to match/split out strings and capture just the data you need from the given input.
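
For readers who want a feel for those three operations before opening the article, here is a rough sketch using Python's standard `re` module rather than the PHP functions the series covers. The sample strings are invented, and the patterns are only one way to write them.

```python
import re

# Hypothetical input line.
line = "2008-01-06 developerWorks: regex, PHP ,part 1"

# Match: check that the line starts with an ISO-style date.
date_match = re.match(r"\d{4}-\d{2}-\d{2}", line)
print(bool(date_match))

# Capture: parentheses pull out just the pieces we need.
year, month, day = re.match(r"(\d{4})-(\d{2})-(\d{2})", line).groups()
print(year, month, day)

# Split: break the tag list on commas, ignoring stray whitespace around them.
tag_text = line.split(": ", 1)[1]
tags = re.split(r"\s*,\s*", tag_text)
print(tags)
```

The same ideas (anchored matching, capture groups, splitting on a pattern) carry over to PHP, though the function names and delimiter syntax differ.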

Thursday, January 3, 2008

CakePHP 1.2 Release (and a New Site Design)

As Chris Hartjes points out, there's a new release of the popular PHP framework CakePHP (as well as a new web site design).

You can grab the latest download directly from the homepage or look into the manual to find out more about the framework and how it can be used.

Rails for PHP Developers Website Launched

Mike Naberezny has started up a new resource to try to bridge some of the gap between PHP and Ruby and to help developers of either get a bit more insight into the other side - Rails for PHP Developers (based on the book published by the Pragmatic Programmers).

Rails for PHP Developers is a new site for PHP developers who are also interested in Rails and Ruby. PHP and Ruby are great complementary tools that are sometimes seen as adversarial, which is really unfortunate. We use both and we'll be writing regular articles to help cross-pollinate ideas and promote collaboration between the communities.

There's already some good content there - useful Perlisms in Ruby, a look at PHP object attributes and some information about the release of the site itself.

Tuesday, January 1, 2008

The Web's Most, Biggest, Best, and Worst of 2007

Yes, another year-end retrospective

2007 was a frenzied year for all things digital, and could be marked as when the revolution really began to take hold. Social media took center stage, impacting everything from politics to major corporate maneuvers to raising awareness of social causes.


Editor's Note: Superlatives are always a matter of opinion. What's the best, worst, biggest, or most to one person is trivial to another. Anything you'd like to add to the list? Let us know in the comments section.

There were lawsuits, mysteries, legal abuses, policy shifts, embarrassments, scandals, oppressions, miscalculations, bubble discussions, and significant innovations. All in all, 2007 was a big year for anybody with a stake on the Net.

So, without further ado, we present the Most, the Biggest, the Best, and the Worst of 2007.