Recommendations on the MovieLens dataset, written in Ruby

Last time we got as far as preparing the dataset.
With that in place, we can now run the calculations using the programs written so far.

Recommendations on the MovieLens dataset

First, let's look at the data for user 87.

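require 'pp'
# Load the ratings hash dumped in the previous post (binary mode; the block closes the file).
movie_lens_critics = File.open('movie_lens.dump', 'rb') { |f| Marshal.load(f) }
pp movie_lens_critics[87]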

Let's run it!

% ./movie_lens.rb
{"Evil Dead II (1987)"=>2.0,
 "Strictly Ballroom (1992)"=>3.0,
 "Batman & Robin (1997)"=>4.0,
 ...
 <snip>
 ...
 "Return of the Pink Panther, The (1974)"=>4.0,
 "Net, The (1995)"=>5.0,
 "Lost World: Jurassic Park, The (1997)"=>3.0}

Looks good.


Now let's generate recommendations for user 87.

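require 'pp'
require 'fast_recommender'

# Load the ratings and print the top 30 user-based recommendations for user 87.
movie_lens_critics = File.open('movie_lens.dump', 'rb') { |f| Marshal.load(f) }
recommender = My::FastRecommender.new
pp recommender.get_recommendations(movie_lens_critics, 87).slice(0...30)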

Run!

% ./movie_lens.rb
[["Saint of Fort Washington, The (1993)", 5.0],
 ["They Made Me a Criminal (1939)", 5.0],
 ["Santa with Muscles (1996)", 5.0],
 ["Star Kid (1997)", 5.0],
 ["Boys, Les (1997)", 5.0],
 ["Entertaining Angels: The Dorothy Day Story (1996)", 5.0],
 ["Marlene Dietrich: Shadow and Light (1996) ", 5.0],
 ["Great Day in Harlem, A (1994)", 5.0],
 ["Legal Deceit (1997)", 4.89884443128923],
 ["Letter From Death Row, A (1998)", 4.81501908224271],
 ["Hearts and Minds (1996)", 4.73210829839414],
 ["Pather Panchali (1955)", 4.69624446649087],
 ["Lamerica (1994)", 4.65239706102676],
 ["Leading Man, The (1996)", 4.53872369347481],
 ["Mrs. Dalloway (1997)", 4.5350813391061],
 ["Innocents, The (1961)", 4.53233761257299],
 ["Casablanca (1942)", 4.52799857474708],
 ["Everest (1998)", 4.51027014971986],
 ["Dangerous Beauty (1998)", 4.49396775542844],
 ["Wallace & Gromit: The Best of Aardman Animation (1996)", 4.48515130180134],
 ["Wrong Trousers, The (1993)", 4.46328746129022],
 ["Kaspar Hauser (1993)", 4.45097943694103],
 ["Usual Suspects, The (1995)", 4.43107907117952],
 ["Maya Lin: A Strong Clear Vision (1994)", 4.42752068286496],
 ["Wedding Gift, The (1994)", 4.41487078459207],
 ["Affair to Remember, An (1957)", 4.37744525265647],
 ["Good Will Hunting (1997)", 4.37607111044777],
 ["As Good As It Gets (1997)", 4.3760110990014],
 ["Anna (1996)", 4.37414617950097],
 ["Close Shave, A (1995)", 4.3674372665046]]

This matches the result on p. 28 of the book.
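
Many of the top entries above score exactly 5.0. One plausible reason, assuming the recommender uses the similarity-weighted average of ratings described in the book: a movie rated by only one positively correlated user gets exactly that user's rating, however weak the similarity. The helper `weighted_score` below is a hypothetical illustration, not part of `My::FastRecommender`.

```ruby
# Hypothetical helper (not from fast_recommender): similarity-weighted average.
# neighbors is an array of [similarity, rating] pairs for a single movie.
def weighted_score(neighbors)
  # Ignore users with zero or negative similarity.
  positive = neighbors.select { |sim, _rating| sim > 0 }
  return nil if positive.empty?

  total   = positive.inject(0.0) { |sum, (sim, rating)| sum + sim * rating }
  sim_sum = positive.inject(0.0) { |sum, (sim, _rating)| sum + sim }
  total / sim_sum
end

# One weakly similar user who rated the movie 5.0: the weight cancels out.
p weighted_score([[0.25, 5.0]])

# Several similar users: the score is pulled toward their combined opinion.
p weighted_score([[0.9, 5.0], [0.8, 4.0], [0.7, 4.5]])
```

If this is what is happening, obscure titles with a single enthusiastic rater can outrank widely rated favorites such as Casablanca.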


Next, let's compute the item similarities first and then derive the recommendation scores from them.

movie_lens_items = recommender.get_item_similarity(movie_lens_critics, { :how_many => 50 })
pp recommender.get_item_similarity_recommendations(movie_lens_critics, 87, { :item_similarity => movie_lens_items }).slice(0...30)

Here goes!

% ./movie_lens.rb
[["Stand by Me (1986)", 5.0],
 ["Shine (1996)", 5.0],
 ["Robin Hood: Prince of Thieves (1991)", 5.0],
 ["1-900 (1994)", 5.0],
 ["Fresh (1994)", 5.0],
 ["Toy Story (1995)", 5.0],
 ["What's Eating Gilbert Grape (1993)", 5.0],
 ["Rock, The (1996)", 5.0],
 ["Denise Calls Up (1995)", 5.0],
 ["Silence of the Lambs, The (1991)", 5.0],
 ["Reservoir Dogs (1992)", 5.0],
 ["Shining, The (1980)", 5.0],
 ["Assignment, The (1997)", 5.0],
 ["Scream (1996)", 5.0],
 ["Sense and Sensibility (1995)", 5.0],
 ["Vertigo (1958)", 5.0],
 ["Titanic (1997)", 5.0],
 ["House of the Spirits, The (1993)", 5.0],
 ["Usual Suspects, The (1995)", 5.0],
 ["Police Story 4: Project S (Chao ji ji hua) (1993)", 5.0],
 ["Sling Blade (1996)", 5.0],
 ["Rumble in the Bronx (1995)", 5.0],
 ["Before the Rain (Pred dozhdot) (1994)", 5.0],
 ["Sword in the Stone, The (1963)", 5.0],
 ["Day the Sun Turned Cold, The (Tianguo niezi) (1994)", 5.0],
 ["Ed's Next Move (1996)", 4.875],
 ["Anna (1996)", 4.83333333333333],
 ["Dark City (1998)", 4.8],
 ["Broken English (1996)", 4.75],
 ["Flower of My Secret, The (Flor de mi secreto, La) (1995)", 4.75]]

This matches the result on p. 29 of the book.
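
For reference, item-based scoring can be sketched as follows. This is a minimal sketch under assumptions: the method name `item_based_scores` and the data layout (`item => [[similarity, other_item], ...]`, as in the book) are illustrative and may differ from what `get_item_similarity` actually returns.

```ruby
# Hypothetical sketch of item-based scoring (layout assumed, not taken from fast_recommender).
# user_ratings:    { item => rating }
# item_similarity: { item => [[similarity, other_item], ...] }
def item_based_scores(user_ratings, item_similarity)
  totals   = Hash.new(0.0)
  sim_sums = Hash.new(0.0)

  user_ratings.each do |item, rating|
    (item_similarity[item] || []).each do |sim, other|
      next if sim <= 0                    # skip non-positive similarities
      next if user_ratings.key?(other)    # skip items the user already rated
      totals[other]   += sim * rating
      sim_sums[other] += sim
    end
  end

  # Normalize by the summed similarities, then sort by score, highest first.
  scores = totals.map { |item, total| [item, total / sim_sums[item]] }
  scores.sort_by { |item, score| [-score, item] }
end

# Toy data (made up): 'D' is reachable only through 'B'.
user_ratings = { 'A' => 5.0, 'B' => 3.0 }
item_similarity = {
  'A' => [[0.5, 'C'], [0.25, 'B']],
  'B' => [[0.5, 'C'], [0.25, 'D']]
}
p item_based_scores(user_ratings, item_similarity)
```

In the toy data, 'D' is linked to only one rated item, so its score simply equals the user's rating of that item. With only 50 neighbors kept per item, the same effect may explain the long run of 5.0 scores above.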


However, when we check which movies appear in the top 30 of both approaches...

user_base = recommender.get_recommendations(movie_lens_critics, 87).slice(0...30)
item_base = recommender.get_item_similarity_recommendations(movie_lens_critics, 87, { :item_similarity => movie_lens_items }).slice(0...30)
# Each entry is [title, score]; intersect the two lists by title.
puts user_base.map { |movie, _score| movie } & item_base.map { |movie, _score| movie }

And the result is...!

% ./movie_lens.rb
Usual Suspects, The (1995)
Anna (1996)

Only two.
Honestly, I'm not sure what to make of results that differ this much...
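
To see whether the two rankings agree more at other cutoffs, the overlap can be computed for several values of N. The helper `top_n_overlap` and the toy rankings below are hypothetical; with the real data you would pass in the full, unsliced results of the two recommendation methods.

```ruby
# Hypothetical helper: titles shared by the top n entries of two ranked lists.
# Each list is an array of [title, score] pairs, best first.
def top_n_overlap(ranked_a, ranked_b, n)
  titles_a = ranked_a.first(n).map { |title, _score| title }
  titles_b = ranked_b.first(n).map { |title, _score| title }
  titles_a & titles_b
end

# Toy rankings (made-up titles and scores, not MovieLens results).
user_base = [['M1', 5.0], ['M2', 4.8], ['M3', 4.5], ['M4', 4.1]]
item_base = [['M3', 5.0], ['M5', 4.9], ['M1', 4.7], ['M2', 4.0]]

[2, 3, 4].each do |n|
  shared = top_n_overlap(user_base, item_base, n)
  puts "top #{n}: #{shared.size} shared (#{shared.join(', ')})"
end
```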


That wraps up Chapter 2, "Making Recommendations".
I'm still studying Chapter 3, so I'll resume after a break.


LL Future is tomorrow.
I'll be there too.