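- Hadoop+Spark大數據巨量分析與機器學習整合開發實戰 (Hadoop + Spark Big Data Analytics and Machine Learning: Integrated Development in Practice), author: 林大貴, publisher: 博碩, publication date: 2015/11/0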
- http://grouplens.org/datasets/movielens/
- mkdir -p ~/workspace/Recommend/data
- cd ~/workspace/Recommend/data
- wget http://files.grouplens.org/datasets/movielens/ml-100k.zip
- unzip -j ml-100k.zip
- cd ~/workspace/Recommend/data
- 11-6 (Inspect the imported data) (11-6 to 11-9 YouTube)
- spark-shell
- scala>val rawUserData = sc.textFile("u.data")
- scala>rawUserData.first()
- scala>rawUserData.take(5).foreach(println)
- scala>rawUserData.map(_.split('\t')(0).toDouble).stats()
- scala>rawUserData.map(_.split('\t')(1).toDouble).stats()
- scala>rawUserData.map(_.split('\t')(2).toDouble).stats()
- 11-7 (ALS.train)
- scala>import org.apache.spark.mllib.recommendation.ALS
- scala>import org.apache.spark.mllib.recommendation.Rating
- scala>val rawRatings= rawUserData.map(_.split("\t").take(3))
- scala>val ratingsRDD= rawRatings.map{case Array(user, movie, rating)=>Rating(user.toInt, movie.toInt, rating.toDouble)}
- scala>val model= ALS.train(ratingsRDD, 10, 10, 0.01)
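The parsing step in 11-7 can be tried without Spark. The following is a minimal sketch in plain Scala, assuming each line of u.data is tab-separated as user id, movie id, rating, timestamp. `SampleRating` and `parseRating` are hypothetical names (a stand-in for MLlib's `Rating`, not part of Spark), and the sample line is only illustrative.

```scala
// Hypothetical stand-in for org.apache.spark.mllib.recommendation.Rating,
// so the sketch runs without Spark on the classpath.
case class SampleRating(user: Int, product: Int, rating: Double)

// Parse one tab-separated line: keep the first three fields, drop the timestamp.
// Returns None for lines with too few fields or non-numeric values.
def parseRating(line: String): Option[SampleRating] =
  line.split("\t").take(3) match {
    case Array(user, movie, rating) =>
      scala.util.Try(SampleRating(user.toInt, movie.toInt, rating.toDouble)).toOption
    case _ => None
  }

// Illustrative line in the assumed u.data layout: user, movie, rating, timestamp.
val sampleLine = "196\t242\t3\t881250949"
val parsed = parseRating(sampleLine)
```

Returning an `Option` is one possible design choice: a malformed line can then be skipped (for example with `flatMap`) rather than raising a `MatchError` in the middle of a job.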
- 11-8 (Recommendations)
- scala>model.recommendProducts(196,5).mkString("\n")
- scala>model.predict(194,164)
- scala>model.recommendProducts(464,5).mkString("\n")
- 11-9 (Show recommended movie titles)
- scala>val itemRDD = sc.textFile("u.item")
- scala>val movieTitle = itemRDD.map(line=>line.split("\\|").take(2)).map(array=>(array(0).toInt,array(1))).collectAsMap()
- scala> movieTitle.take(5).foreach(println)
- scala> movieTitle(146)
- scala>model.recommendProducts(196,5).map(rating=> (rating.product, movieTitle(rating.product), rating.rating)).foreach(println)
- scala>model.recommendUsers(464,5).mkString("\n")
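The title lookup in 11-9 can also be sketched in plain Scala. This assumes u.item lines are pipe-separated with the movie id first and the title second. The two sample lines, `sampleItems`, `sampleTitles`, and `titleOf` are made up for illustration and do not come from the dataset or from Spark.

```scala
// Illustrative lines in the assumed u.item layout (movie id | title | other fields).
val sampleItems = Seq(
  "1|Example Movie A (1995)|01-Jan-1995",
  "2|Example Movie B (1996)|01-Jan-1996"
)

// Same parsing idea as the itemRDD step: split on a literal '|' and keep id and title.
val sampleTitles: Map[Int, String] = sampleItems.map { line =>
  val fields = line.split("\\|").take(2)
  (fields(0).toInt, fields(1))
}.toMap

// Hypothetical helper: return a placeholder instead of throwing
// NoSuchElementException when an id is missing from the map.
def titleOf(movieId: Int): String =
  sampleTitles.getOrElse(movieId, s"unknown movie $movieId")
```

A lookup such as `titleOf(1)` returns the stored title, while an id that is absent from the map yields the placeholder string. A direct `movieTitle(id)` call would throw for a missing key.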
- 11-10 Create the Recommend project
- 11-15, 11-16, 11-17 How to choose the best rank, iteration count, and lambda for ALS.train() in Recommend
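One common way to pick these parameters is a grid search scored by RMSE on a held-out validation set. The book's exact procedure is not shown in these notes, so the following is only a sketch under that assumption. It is written in plain Scala so it runs without Spark. `rmse`, `bestParams`, and the `evaluate` callback are hypothetical names; in spark-shell, `evaluate` would train a model with `ALS.train` on a training split and return its validation RMSE.

```scala
// Root-mean-square error over (predicted, actual) rating pairs.
def rmse(pairs: Seq[(Double, Double)]): Double = {
  require(pairs.nonEmpty, "need at least one (predicted, actual) pair")
  val squaredErrors = pairs.map { case (predicted, actual) =>
    val diff = predicted - actual
    diff * diff
  }
  math.sqrt(squaredErrors.sum / pairs.size)
}

// Try every (rank, iterations, lambda) combination and keep the one with the
// lowest score. `evaluate` is a hypothetical callback returning the validation
// error for one combination; each combination is evaluated exactly once.
def bestParams(
    ranks: Seq[Int],
    iterations: Seq[Int],
    lambdas: Seq[Double]
)(evaluate: (Int, Int, Double) => Double): (Int, Int, Double) = {
  val candidates = for {
    rank <- ranks
    iters <- iterations
    lambda <- lambdas
  } yield (rank, iters, lambda)
  require(candidates.nonEmpty, "each parameter list needs at least one value")
  candidates
    .map { case params @ (rank, iters, lambda) => (params, evaluate(rank, iters, lambda)) }
    .reduceLeft((best, next) => if (next._2 < best._2) next else best)
    ._1
}
```

Scoring on data the model was not trained on matters here: a larger rank or more iterations will usually fit the training ratings more closely, so training error alone would tend to favor the largest values in the grid.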