Big Data Information Processing
(Hadoop MapReduce Practices with Windoop and AWS EMR)

Score



Class period
Wed. 234, I627


Textbook:


  • Python+Spark 2.0+Hadoop 機器學習與大數據分析實戰, author: 林大貴, publisher: 博碩, published: 2016-10-03, language: Traditional Chinese, ISBN: 9864341537, ISBN-13: 9789864341535
    Blog: http://pythonsparkhadoop.blogspot.tw
    Facebook group: Python+Spark 2.0+Hadoop 機器學習與大數據分析社團
    P21622_example.zip


  • Reference books:


  • 科技巨頭:Hadoop+Spark大規模實際運作進行式, authors: 譚磊, 范磊, publisher: 佳魁資訊, published: 2019/01/16
  • 雲端&區塊鏈必備技能 Hadoop 大數據高效處理實戰範典, authors: 譚磊, 范磊, publisher: 佳魁資訊, published: 2020/02/06
  • 大數據分析與應用實戰:統計機器學習之資料導向程式設計, author: 鄒慶士, 2019 (東華書局, contact: 謝韡韜, wthsieh@thunghus.com.tw, 04-2285-5920)
  • 從大數據到人工智慧:理論及Spark實作(熱銷版)(二版), authors: 鄧立國, 佟強, publisher: 佳魁資訊, published: 2019/08/25 (ISBN: 9789863797692)
  • 入手大數據DB的輕鬆選擇:HBase快上手, author: 楊曦, publisher: 佳魁資訊, published: 2019/10/04

  • JOB: Artificial Intelligence (AI) = Big Data + Data Science + Machine Learning + Cloud Computing
    1111 Job Bank (Big Data)
    104 Job Bank (Big Data)

  • AWS Educate Program

  • Reference courses
    1. EDUCBA - Hadoop Tutorials - Free guides & resources for students
    2. Hadoop Training Program (20 Courses, 14+ Projects) - This Hadoop Certification Training includes 20 courses, 14 Projects with 135+ hours of video tutorials, and Lifetime Access. You will also get verifiable certificates (unique certification number and your unique URL) when you complete each of the 20 courses. This Hadoop certification course will help you learn about MapReduce, HDFS, Hive, Pig, Mahout, NoSQL, Oozie, Flume, Storm, Avro, Spark, Splunk, Sqoop, Cloudera.


    Contents:

  • Introduction to the Hadoop Ecosystem article series (From: stana, team 就是有亦思, 2017-12-04)

  • Hadoop course: instructor 赵强, Big Data (Hadoop+Spark) from beginner to mastery course series (From: Udemy)
  • Hadoop Ecosystem: Hadoop Tools for Crunching Big Data

  • Hadoop
  • The Apache™ Hadoop® project

  • Day 2 - Hadoop Ecosystem: Introduction to Hadoop

  • What is Hadoop Cluster | Hadoop Cluster Architecture https://data-flair.training/blogs/hadoop-cluster/ (From: DataFlair Team, November 14, 2018)
  • Meet Big Data's little yellow elephant helper: Hadoop (2015/03/12)
  • Big Data & Hadoop Full Course - Learn Hadoop In 10 Hours | Hadoop Tutorial For Beginners | Edureka (From:YouTube)
  • Learning Hadoop: Introduction to MapReduce
  • What exactly is MapReduce in Hadoop? (From: Alan Tsai's study notes)

  • (Notes for MOOCs: jdwang) Hadoop MapReduce Programming with Java
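
    The MapReduce references above all describe the same three-stage flow: map, shuffle (group by key), and reduce. As a rough illustration only, the following pure-Python sketch imitates that flow for a word count on a single machine. It does not use Hadoop, and the function names are made up for this example; the course's own WordCount is written in Java.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in sorted(groups.items()))
```

    In a real Hadoop job the mapper and reducer run on different nodes and the shuffle is performed by the framework; here all three stages simply run in one process.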

    (Single Node in MS Windows) [2] Windoop Download, Installation and Test (PDF guide)

    Download: (Hadoop 2.7.1) windoop_2.7.1_with_HBase_jre8_x64_zh_TW.7z (thanks to Mr. 林奇暻 of Windoop for providing it)

    [2-1-1]:Windoop Download
    [2-1-2]:Windoop Install
    [2-1-3]:Windoop Start.bat
    NameNode HDFS Web http://localhost:50070/
    Resource Manager Web http://localhost:8088
    [2-1-4]:Windoop Program Testing : PresentElection

  • WordCount Examples WordCount_jdwang_2016_10_12.zip

    Import external JARs:
    1. \windoop\hadoop\share\hadoop\common\*.jar
    2. \windoop\hadoop\share\hadoop\common\lib\*.jar
    3. \windoop\hadoop\share\hadoop\hdfs\*.jar
    4. \windoop\hadoop\share\hadoop\mapreduce\*.jar
    5. \windoop\hadoop\share\hadoop\yarn\*.jar

    Set the program arguments of "WordCount_jdwang.java" to: "input output_<student ID>"
    Environment variables:
    HADOOP_HOME=>${eclipse_home}\..\hadoop
    PATH=> %PATH%;${eclipse_home}\..\hadoop\bin
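
    The two settings above place Hadoop one directory above the Eclipse folder. As a small sketch of how those values resolve, the following Python helper computes them from a given Eclipse location. The helper name and the install path used in the usage note are hypothetical; `ntpath` is used so the Windows-style paths are handled the same way on any platform.

```python
import ntpath


def derive_hadoop_env(eclipse_home, current_path):
    """Resolve HADOOP_HOME and the extended PATH from the Eclipse directory."""
    # ${eclipse_home}\..\hadoop, with the ".." collapsed
    hadoop_home = ntpath.normpath(ntpath.join(eclipse_home, "..", "hadoop"))
    hadoop_bin = ntpath.join(hadoop_home, "bin")
    entries = [entry for entry in current_path.split(";") if entry]
    # Append hadoop\bin only if it is not already on PATH (case-insensitive)
    if hadoop_bin.lower() not in (entry.lower() for entry in entries):
        entries.append(hadoop_bin)
    return {"HADOOP_HOME": hadoop_home, "PATH": ";".join(entries)}
```

    For example, with a hypothetical Eclipse folder at `C:\windoop\eclipse`, `HADOOP_HOME` resolves to `C:\windoop\hadoop` and `C:\windoop\hadoop\bin` is appended to `PATH`.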



  • Hadoop Cluster in Linux
    Python+Spark 2.0+Hadoop 機器學習與大數據分析實戰, author: 林大貴, publisher: 博碩, published: 2016-10-03, language: Traditional Chinese, ISBN: 9864341537, ISBN-13: 9789864341535
    Blog: http://pythonsparkhadoop.blogspot.tw


    Windoop PC Cluster Setup
    (Please change the original "Windoop" to "Windoop_Localhost")
  • Windoop (Windoop Cluster)
  • Commercial products: Cloudera + Hortonworks
    Two major Apache Hadoop vendors, Cloudera and Hortonworks, announce merger


    References
  • Hadoop Cluster Setup
  • What is Hadoop Cluster? Hadoop Cluster Setup and Architecture | Hadoop Training | Edureka
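  • Easy do-it-yourself Hadoop setup (From: 楊德倫, staff member, Teaching and Programming Division, Computer and Information Networking Center, National Taiwan University) (VirtualBox+Linux+Hadoop)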
  • How to Install and Set Up a 3-Node Hadoop Cluster(From: Florent Houbart)
  • (Windoop 2.0) (Thanks to 賴敬勳, 王俊平, and 楊松儒 for environment testing)
    Windoop_WorkerNode_jdwang2018_10_16.zip (Thanks to 陳咨雅)
    Horizontal scaling (scale-out): How to add worker nodes efficiently?



    Amazon EMR Management Guide
    Getting Started with Amazon EMR

    Introduction to Map/Reduce (Part 1/3) (From: Prof. Patterson, YouTube, 2017)
    Creating a Java Program for Map/Reduce (Part 2/3) (From: Prof. Patterson, YouTube, 2017)
    (*) Running a custom java jar on an AWS EMR cluster (Part 3/3) (From: Prof. Patterson, YouTube, 2017)

    How Elastic MapReduce works
    How to quickly launch a Hadoop cluster with the AWS EMR service in five minutes?
    Amazon EMR - Amazon Web Services
    Amazon EMR Hadoop Demonstration
    100. How to Launch Amazon EMR Cluster with sample data in AWS EMR service
  • What is Amazon EMR and how can I use it for processing data? (From: YouTube)
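
    Several of the EMR links above cover running a custom Java jar on a cluster. As a hedged sketch only, the following Python function builds the description of one such custom-JAR step as a plain dictionary. The field names follow what is understood to be the step structure accepted by the boto3 EMR client's `add_job_flow_steps` call; check the current AWS documentation before relying on it. The function name and all values in the usage note are hypothetical, and nothing here contacts AWS.

```python
def build_custom_jar_step(name, jar_s3_uri, main_class, args):
    """Describe one EMR step that runs a user-supplied MapReduce jar."""
    return {
        "Name": name,
        # Assumption: keep the cluster running if the step fails
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": jar_s3_uri,        # location of the jar in S3
            "MainClass": main_class,  # driver class inside the jar
            "Args": list(args),       # e.g. input and output paths
        },
    }
```

    A call such as `build_custom_jar_step("wordcount", "s3://my-bucket/wordcount.jar", "WordCount", ["s3://my-bucket/input", "s3://my-bucket/output"])` (bucket and class names invented here) returns a dictionary that could then be submitted to a running cluster.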



  • Chapter 6. Hadoop HDFS commands
  • HDFSOperation.7z

  • HADOOP_HOME ${eclipse_home}\..\hadoop
  • PATH %PATH%;${eclipse_home}\..\hadoop\bin
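
    For the HDFS commands chapter, a typical first exercise is to create a directory, upload a file, and list the result. The following Python sketch only assembles the corresponding `hdfs dfs` command lines as argument lists; it does not run them. The helper names and the paths in the usage note are illustrative assumptions, not the contents of HDFSOperation.7z.

```python
def hdfs_dfs(*args):
    """Build the argument list for one `hdfs dfs` shell command."""
    return ["hdfs", "dfs", *args]


def upload_and_list(local_file, hdfs_dir):
    """Commands to create an HDFS directory, upload a file into it, and list it."""
    return [
        hdfs_dfs("-mkdir", "-p", hdfs_dir),      # create directory (and parents)
        hdfs_dfs("-put", local_file, hdfs_dir),  # copy local file into HDFS
        hdfs_dfs("-ls", hdfs_dir),               # list directory contents
    ]
```

    On a machine where Hadoop is installed and `hdfs` is on the PATH, each list returned by `upload_and_list("data.txt", "/user/student/input")` could be passed to `subprocess.run`.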

  • Reference: Project: Information extraction from the "Ministry of Transportation and Communications freeway gantry data"

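    Grading:
    National freeways: analysis of vehicle travel time between consecutive electronic toll gantries (Data Pipeline)
    (1) Filter specific road segments  TDCS_MRP_Statistic_Demo_jdwang2020_12_29.7z (Hadoop; MapReduce)
    (2) Compute the travel time between consecutive electronic toll gantries for each time period over 24 hours TDCS_MRE_TimeSeries_ClusteringMining_jdwang2020_12_29.7z (Hadoop; MapReduce)
    (3) Statistics, sorting, and plotting (extracting regular patterns and anomalies) TDCS_MRP_Statistic_Demo_jdwang2020_12_29.7z (Python)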
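
    The three pipeline stages above can be sketched on a single machine as follows. This is an illustration under stated assumptions, not the course's provided code: the record layout `(gantry_from, gantry_to, hour_of_day, travel_time_seconds)`, the function names, and the anomaly rule (an hourly mean above 1.5 times the overall mean) are all hypothetical, and plotting is left out.

```python
from collections import defaultdict
from statistics import mean


def filter_segment(records, gantry_from, gantry_to):
    """Stage 1: keep only records for one pair of consecutive gantries."""
    return [r for r in records if r[0] == gantry_from and r[1] == gantry_to]


def hourly_mean_travel_time(records):
    """Stage 2: average travel time (seconds) for each hour of the day."""
    by_hour = defaultdict(list)
    for _, _, hour, seconds in records:
        by_hour[hour].append(seconds)
    return {hour: mean(values) for hour, values in by_hour.items()}


def rank_and_flag(hourly, threshold_ratio=1.5):
    """Stage 3: sort hours by travel time (slowest first) and flag anomalies."""
    overall = mean(hourly.values())
    ranked = sorted(hourly.items(), key=lambda item: item[1], reverse=True)
    return [(hour, t, t > threshold_ratio * overall) for hour, t in ranked]
```

    In the Hadoop version, stage 1 and stage 2 would each be a MapReduce job, with the gantry pair or the hour acting as the key.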