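Recommendations on the MovieLens dataset, written in Ruby
Last time we got as far as preparing the dataset.
With that in place, we can now run the calculations using the programs written so far.
Recommendations on the MovieLens dataset
First, let's take a look at the data for user 87.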
movie_lens_critics = Marshal.load(File.open('movie_lens.dump'))
pp movie_lens_critics[87]
Now, let's run it!
% ./movie_lens.rb
{"Evil Dead II (1987)"=>2.0,
 "Strictly Ballroom (1992)"=>3.0,
 "Batman & Robin (1997)"=>4.0,
 ... <snip> ...
 "Return of the Pink Panther, The (1974)"=>4.0,
 "Net, The (1995)"=>5.0,
 "Lost World: Jurassic Park, The (1997)"=>3.0}
Looks good.
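As a rough illustration of what the dump holds, here is a minimal sketch of the Marshal round trip on a tiny stand-in for the ratings data. It assumes the top level is a Hash keyed by user id, which the post does not state outright (indexing with 87 would also work on an Array). The two ratings for user 87 come from the output above; user 88 and the constant names are hypothetical.

```ruby
# Hypothetical miniature of the ratings structure: user id => { title => rating }.
# User 88 is made up for illustration.
SAMPLE_CRITICS = {
  87 => { 'Evil Dead II (1987)' => 2.0, 'Net, The (1995)' => 5.0 },
  88 => { 'Net, The (1995)' => 3.0 }
}.freeze

# Marshal.dump serializes the object to a binary string,
# and Marshal.load rebuilds an equal (but distinct) object from it.
RESTORED_CRITICS = Marshal.load(Marshal.dump(SAMPLE_CRITICS))

p RESTORED_CRITICS[87]
```

The restored object compares equal to the original, which is why a dump written in the previous post can be reloaded here without re-parsing the dataset.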
Now let's generate recommendations for user 87.
require 'fast_recommender'

movie_lens_critics = Marshal.load(File.open('movie_lens.dump'))
recommender = My::FastRecommender.new
pp recommender.get_recommendations(movie_lens_critics, 87).slice(0...30)
Run!
% ./movie_lens.rb
[["Saint of Fort Washington, The (1993)", 5.0],
 ["They Made Me a Criminal (1939)", 5.0],
 ["Santa with Muscles (1996)", 5.0],
 ["Star Kid (1997)", 5.0],
 ["Boys, Les (1997)", 5.0],
 ["Entertaining Angels: The Dorothy Day Story (1996)", 5.0],
 ["Marlene Dietrich: Shadow and Light (1996) ", 5.0],
 ["Great Day in Harlem, A (1994)", 5.0],
 ["Legal Deceit (1997)", 4.89884443128923],
 ["Letter From Death Row, A (1998)", 4.81501908224271],
 ["Hearts and Minds (1996)", 4.73210829839414],
 ["Pather Panchali (1955)", 4.69624446649087],
 ["Lamerica (1994)", 4.65239706102676],
 ["Leading Man, The (1996)", 4.53872369347481],
 ["Mrs. Dalloway (1997)", 4.5350813391061],
 ["Innocents, The (1961)", 4.53233761257299],
 ["Casablanca (1942)", 4.52799857474708],
 ["Everest (1998)", 4.51027014971986],
 ["Dangerous Beauty (1998)", 4.49396775542844],
 ["Wallace & Gromit: The Best of Aardman Animation (1996)", 4.48515130180134],
 ["Wrong Trousers, The (1993)", 4.46328746129022],
 ["Kaspar Hauser (1993)", 4.45097943694103],
 ["Usual Suspects, The (1995)", 4.43107907117952],
 ["Maya Lin: A Strong Clear Vision (1994)", 4.42752068286496],
 ["Wedding Gift, The (1994)", 4.41487078459207],
 ["Affair to Remember, An (1957)", 4.37744525265647],
 ["Good Will Hunting (1997)", 4.37607111044777],
 ["As Good As It Gets (1997)", 4.3760110990014],
 ["Anna (1996)", 4.37414617950097],
 ["Close Shave, A (1995)", 4.3674372665046]]
This matches the result on p.28 of the book.
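The run of exact 5.0 scores at the top has one plausible explanation, assuming the scores are similarity-weighted averages of other users' ratings. The internals of My::FastRecommender are not shown in this post, so that is an assumption. Under it, a title rated by only one positively similar user receives exactly that user's rating. The sketch below illustrates the idea on toy data. The method name, the film titles, and the precomputed similarity values are all hypothetical.

```ruby
# Hypothetical toy data: user id => { title => rating }.
TOY_CRITICS = {
  1 => { 'Film A' => 4.0 },
  2 => { 'Film A' => 5.0, 'Film B' => 5.0 },
  3 => { 'Film C' => 4.0 },
  4 => { 'Film C' => 1.0 }
}.freeze

# Hypothetical precomputed similarities of each user to user 1.
TOY_SIMILARITIES = { 2 => 0.5, 3 => 0.5, 4 => 0.25 }.freeze

# Similarity-weighted average of the ratings given by other users,
# restricted to titles the target user has not rated yet.
def weighted_recommendations(critics, target, similarities)
  totals = Hash.new(0.0)
  sim_sums = Hash.new(0.0)
  critics.each do |user, ratings|
    next if user == target
    sim = similarities.fetch(user, 0.0)
    next if sim <= 0 # ignore users with no positive similarity
    ratings.each do |title, rating|
      next if critics[target].key?(title)
      totals[title] += rating * sim
      sim_sums[title] += sim
    end
  end
  totals.map { |title, total| [title, total / sim_sums[title]] }
        .sort_by { |_title, score| -score }
end

p weighted_recommendations(TOY_CRITICS, 1, TOY_SIMILARITIES)
# => [["Film B", 5.0], ["Film C", 3.0]]
```

Film B is rated by a single user, so its score is exactly that user's 5.0, while Film C blends two ratings. Requiring a minimum number of raters per title is one common way to keep such single-rating titles from dominating the top of the list.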
Next, let's compute the item similarities first and then derive the recommendation scores from them.
movie_lens_items = recommender.get_item_similarity(movie_lens_critics, { :how_many => 50 })
pp recommender.get_item_similarity_recommendations(movie_lens_critics, 87, { :item_similarity => movie_lens_items }).slice(0...30)
Here goes!
% ./movie_lens.rb
[["Stand by Me (1986)", 5.0],
 ["Shine (1996)", 5.0],
 ["Robin Hood: Prince of Thieves (1991)", 5.0],
 ["1-900 (1994)", 5.0],
 ["Fresh (1994)", 5.0],
 ["Toy Story (1995)", 5.0],
 ["What's Eating Gilbert Grape (1993)", 5.0],
 ["Rock, The (1996)", 5.0],
 ["Denise Calls Up (1995)", 5.0],
 ["Silence of the Lambs, The (1991)", 5.0],
 ["Reservoir Dogs (1992)", 5.0],
 ["Shining, The (1980)", 5.0],
 ["Assignment, The (1997)", 5.0],
 ["Scream (1996)", 5.0],
 ["Sense and Sensibility (1995)", 5.0],
 ["Vertigo (1958)", 5.0],
 ["Titanic (1997)", 5.0],
 ["House of the Spirits, The (1993)", 5.0],
 ["Usual Suspects, The (1995)", 5.0],
 ["Police Story 4: Project S (Chao ji ji hua) (1993)", 5.0],
 ["Sling Blade (1996)", 5.0],
 ["Rumble in the Bronx (1995)", 5.0],
 ["Before the Rain (Pred dozhdot) (1994)", 5.0],
 ["Sword in the Stone, The (1963)", 5.0],
 ["Day the Sun Turned Cold, The (Tianguo niezi) (1994)", 5.0],
 ["Ed's Next Move (1996)", 4.875],
 ["Anna (1996)", 4.83333333333333],
 ["Dark City (1998)", 4.8],
 ["Broken English (1996)", 4.75],
 ["Flower of My Secret, The (Flor de mi secreto, La) (1995)", 4.75]]
This matches the result on p.29 of the book.
However, if we check which movies appear in the top 30 of both approaches...
user_base = recommender.get_recommendations(movie_lens_critics, 87).slice(0...30)
item_base = recommender.get_item_similarity_recommendations(movie_lens_critics, 87, { :item_similarity => movie_lens_items }).slice(0...30)
# Each entry is a [title, score] pair; intersect the two lists on titles only.
puts user_base.map { |movie, _score| movie } & item_base.map { |movie, _score| movie }
And the result is...
% ./movie_lens.rb
Usual Suspects, The (1995)
Anna (1996)
Only two.
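To put a number on how little the two lists agree, the comparison above can be wrapped in a small helper that also reports the shared fraction. This is only a sketch on toy data. The method name `top_n_overlap`, the constants, and the film titles are hypothetical, and it assumes both lists hold [title, score] pairs sorted best first, as in the outputs above.

```ruby
# Hypothetical toy rankings in the same [title, score] shape as the real output.
USER_BASED = [['Film A', 5.0], ['Film B', 4.8], ['Film C', 4.5], ['Film D', 4.1]].freeze
ITEM_BASED = [['Film C', 5.0], ['Film E', 5.0], ['Film A', 4.9], ['Film F', 4.2]].freeze

# Returns the titles found in the top n of both lists,
# plus the fraction of the top n that they represent (n must be positive).
def top_n_overlap(list_a, list_b, n)
  titles_a = list_a.first(n).map { |title, _score| title }
  titles_b = list_b.first(n).map { |title, _score| title }
  shared = titles_a & titles_b # Array#& keeps the order of the left operand
  { shared: shared, ratio: shared.size.to_f / n }
end

p top_n_overlap(USER_BASED, ITEM_BASED, 4)
# => {:shared=>["Film A", "Film C"], :ratio=>0.5} (Ruby 3.4+ prints the hash as {shared: [...], ratio: 0.5})
```

For the real lists, 2 shared titles out of 30 works out to a ratio of about 0.07.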
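Honestly, I'm not sure what to make of results that differ this much...
That wraps up Chapter 2, "Making Recommendations".
I'm still working through Chapter 3, so I'll pick this series back up after a break.
LL Future is tomorrow.
I'll be there too.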