大數據資料處理(Big Data Information Processing)
(Hadoop MapReduce Practices with Windoop and AWS EMR)

Score



Class Period
Time: Wednesday, 9:10 am to 12:00 pm; Room: I627

Rescheduling Classes:

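  • 2019/10/25 (Friday, periods 2, 3, 4, 9:10 am to 12:00 pm) moved to 2019/10/18 (Friday, periods 7, 8, 9, 7:10 pm to 10:00 pm)
  • Reason: Prof. Jing-Doo Wang is attending the AWS Academy Greater China Education Expert Seminar (AWS 學院計劃大中華區教育專家研討會), 10/25 to 10/26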


    Textbooks:



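  • 從大數據到人工智慧:理論及Spark實作(熱銷版)(二版) (From Big Data to Artificial Intelligence: Theory and Spark Implementation, best-seller edition, 2nd ed.), authors: 鄧立國, 佟強; publisher: 佳魁資訊; published 2019/08/25 (ISBN: 9789863797692)
  • 科技巨頭:Hadoop+Spark大規模實際運作進行式 (Tech Giants: Hadoop + Spark Large-Scale Operations in Progress)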

    Reference Books:


  • 大數據基礎與實務 (Big Data Fundamentals and Practices), author: 胡嘉璽, 2017, publisher: 普林斯頓 (高立圖書) (ISBN: 9789869527767)

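  • 大數據分析與應用實戰:統計機器學習之資料導向程式設計 (Big Data Analytics and Applications in Practice: Data-Oriented Programming for Statistical Machine Learning), author: 鄒慶士, 2019, publisher: 東華書局 (contact: 謝韡韜, wthsieh@thunghus.com.tw, 04-2285-5920)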

  • Jobs: Artificial Intelligence (AI) = Big Data + Data Science + Machine Learning + Cloud Computing
    1111 Job Bank (1111人力銀行): Big Data
    104 Job Bank (104人力銀行): Big Data

  • AWS Educate Program

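  • Reference Courses (MOOCs)
    (ewant) Asia University: Big Data Information Processing: Hadoop MapReduce Programming and Data Visualization (亞洲大學 大數據資料處理 –Hadoop MapReduce 程式設計與資料視覺化); instructors: 王經篤, 何承遠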


    Course Contents:

  • Introduction to the Hadoop Ecosystem series (Hadoop Ecosystem 系列文簡介) (From: stana團隊就是有亦思, 2017-12-04)

  • Hadoop course: 赵强老师 (Teacher Zhao Qiang), 大数据(Hadoop+Spark)从入门到精通系列课程 (Big Data (Hadoop + Spark) from Beginner to Expert series) (From: Udemy)
  • Hadoop Ecosystem: Hadoop Tools for Crunching Big Data

  • Hadoop
  • The Apache™ Hadoop® project

  • Day 2 - Hadoop Ecosystem: Introduction to Hadoop (Hadoop Ecosystem 之 Hadoop 介紹)

  • What is Hadoop Cluster | Hadoop Cluster Architecture (https://data-flair.training/blogs/hadoop-cluster/)
  • (From: DataFlair Team, November 14, 2018)
  • 認識大數據的黃色小象幫手: Hadoop (Meet Big Data's Little Yellow Elephant Helper: Hadoop) (2015/03/12)
  • Big Data & Hadoop Full Course - Learn Hadoop In 10 Hours | Hadoop Tutorial For Beginners | Edureka (From:YouTube)

  • (Notes for MOOCs: jdwang) Hadoop MapReduce Programming with Java

    (Single Node on MS Windows) [2] Windoop Download, Installation and Test (PDF walkthrough)

    Download: (Hadoop 2.7.1) windoop_2.7.1_with_HBase_jre8_x64_zh_TW.7z (provided courtesy of Mr. 林奇暻, Windoop)

    [2-1-1]: Windoop Download
    [2-1-2]: Windoop Install
    [2-1-3]: Windoop Start.bat
    NameNode (HDFS) web UI: http://localhost:50070/
    ResourceManager web UI: http://localhost:8088
    [2-1-4]: Windoop program testing: PresentElection
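After running Start.bat, one quick way to confirm that the daemons are up is to request the two web UIs listed above. The sketch below assumes the default single-node addresses shown here; any HTTP answer counts as "up", and proxy settings are bypassed because both targets are on localhost. The names `HADOOP_WEB_UIS` and `is_reachable` are illustrative, not part of Windoop.

```python
import urllib.error
import urllib.request

# Web UI addresses from the Windoop single-node setup above.
HADOOP_WEB_UIS = {
    "NameNode (HDFS)": "http://localhost:50070/",
    "ResourceManager (YARN)": "http://localhost:8088",
}

# An opener with an empty ProxyHandler ignores any http_proxy setting,
# so requests to localhost are not routed through a proxy.
_DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if the server at `url` sends any HTTP response within `timeout` seconds."""
    try:
        with _DIRECT.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # an HTTP error status still means a daemon answered
    except OSError:
        return False  # refused, timed out, unreachable (URLError is an OSError)


if __name__ == "__main__":
    for name, url in HADOOP_WEB_UIS.items():
        print(f"{name:<24} {url:<28} {'up' if is_reachable(url) else 'DOWN'}")
```

A "DOWN" line usually means the corresponding window opened by Start.bat has exited or is still starting.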

  • WordCount example: WordCount_jdwang_2016_10_12.zip

    Import the external JARs (add them to the project's build path):
    1. \windoop\hadoop\share\hadoop\common\*.jar
    2. \windoop\hadoop\share\hadoop\common\lib\*.jar
    3. \windoop\hadoop\share\hadoop\hdfs\*.jar
    4. \windoop\hadoop\share\hadoop\mapreduce\*.jar
    5. \windoop\hadoop\share\hadoop\yarn\*.jar

    Modify the arguments of "WordCount_jdwang.java" to: "input output_<student ID>"
    Environment variables:
    HADOOP_HOME=${eclipse_home}\..\hadoop
    PATH=%PATH%;${eclipse_home}\..\hadoop\bin
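The Java source of WordCount_jdwang is not reproduced on this page, so as an illustration of what a WordCount job computes, here is a plain-Python simulation of the three stages of the classic Apache Hadoop WordCount: the mapper emits (word, 1) for every whitespace-separated token, the framework groups the pairs by key (shuffle and sort), and the reducer sums each group. It is a sketch that runs without Hadoop, not the course's Java program; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every whitespace-separated token in one input line."""
    for word in line.split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))  # each reducer receives its keys in sorted order


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce over the input lines."""
    emitted = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(emitted)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    for word, count in word_count(["hello hadoop", "hello windoop hello"]).items():
        print(f"{word}\t{count}")  # tab-separated, like Hadoop's default text output
```

On a real cluster the map calls run in parallel on different input splits and the grouping happens across the network, but the (key, value) flow is the same.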


  • Windoop PC Cluster Setup

    (Please rename the original "Windoop" to "Windoop_Localhost")
    windoop_ClusterIP_10.36.27.170.7z (modified from Windoop by 林奇暻)
    Speed-up problem?
    Commercial products: Cloudera + Hortonworks
    Apache Hadoop's two major vendors, Cloudera and Hortonworks, announce merger (Apache Hadoop兩大廠Cloudera與Hortonworks宣布合併)
    1. Check the IP: DOS> ipconfig
    2. Make sure that the IPs of all PCs are in the same network segment (e.g. 172.168.115.?).
    3. In "windoop\hadoop\etc\hadoop\core-site.xml", replace "localhost" with the master node's IP.
    4. In "windoop\hadoop\etc\hadoop\yarn-site.xml", replace "localhost" with the master node's IP.
    5. In "windoop\hadoop\etc\hadoop\hdfs-site.xml", replace "localhost" with the master node's IP.
      "windoop/dfs/name"
      "windoop/dfs/data"

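The "localhost" => IP edits in the cluster setup steps are the same textual change applied to three files, so they can be scripted. A minimal sketch, assuming the folder layout above, a /24 netmask for the same-segment check (adjust `prefix_len` to your netmask), and that every "localhost" in those files should become the master's IP, which is exactly what the manual steps do. It keeps a .bak copy before changing a file. The function names are hypothetical, and 10.36.27.171 is a made-up second address.

```python
import ipaddress
import shutil
from pathlib import Path
from typing import Iterable, List

# The three files edited in the setup steps (under windoop\hadoop\etc\hadoop).
CONFIG_FILES = ("core-site.xml", "yarn-site.xml", "hdfs-site.xml")


def same_segment(ips: Iterable[str], prefix_len: int = 24) -> bool:
    """True if all IPs share one network; the /24 default is an assumption, check your netmask."""
    networks = {ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False) for ip in ips}
    return len(networks) == 1


def replace_localhost(xml_text: str, master_ip: str) -> str:
    """Plain-text "localhost" => IP substitution, the same edit the steps describe."""
    ipaddress.ip_address(master_ip)  # raises ValueError for a malformed address
    return xml_text.replace("localhost", master_ip)


def point_configs_at_master(conf_dir: Path, master_ip: str) -> List[str]:
    """Rewrite each config file in place (keeping a .bak copy); return the names changed."""
    changed = []
    for name in CONFIG_FILES:
        path = conf_dir / name
        text = path.read_text(encoding="utf-8")
        new_text = replace_localhost(text, master_ip)
        if new_text != text:
            shutil.copy2(path, path.with_name(name + ".bak"))
            path.write_text(new_text, encoding="utf-8")
            changed.append(name)
    return changed


if __name__ == "__main__":
    # Example values: use the IPs that `ipconfig` reports on your own PCs.
    print(same_segment(["10.36.27.170", "10.36.27.171"]))
    conf = Path("windoop", "hadoop", "etc", "hadoop")
    if conf.is_dir():
        print(point_configs_at_master(conf, "10.36.27.170"))
    else:
        print(f"Config folder not found: {conf}")
```

Running it on every PC with the same master IP keeps the three files consistent across the cluster, which is the usual source of "node not joining" problems.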
  • Speed-up (I627 PCs)
    Windoop (Windoop Cluster)
    Master node IP: DOS> ipconfig (find the IP of the PC that acts as the master node, i.e. NameNode + ResourceManager)
    windoop_ClusterIP_10.36.27.170.7z (modified from Windoop by 林奇暻)
    Reference: environment setup (Windoop 2.0) (thanks to 賴敬勳, 王俊平, 楊松儒 for environment testing)

  • References
  • Hadoop Cluster Setup
  • What is Hadoop Cluster? Hadoop Cluster Setup and Architecture | Hadoop Training | Edureka
  • Hadoop 簡易架設不求人 (Easy Hadoop Setup on Your Own) (From: 楊德倫, staff member, Teaching Programming Division, Computer and Information Networking Center, National Taiwan University) (VirtualBox + Linux + Hadoop)
  • How to Install and Set Up a 3-Node Hadoop Cluster (From: Florent Houbart)
  • (Windoop 2.0) (thanks to 賴敬勳, 王俊平, 楊松儒 for environment testing)
    Windoop_WorkerNode_jdwang2018_10_16.zip (thanks to 陳咨雅)
    Horizontal scale-out: how to add worker nodes efficiently?
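Whether adding worker nodes pays off can be measured by timing the same job on 1, 2, 4, ... PCs. The sketch below computes the usual speedup S = T(1)/T(n) and efficiency E = S/n, plus Amdahl's law, 1 / ((1 - p) + p/n), as an upper bound when only a fraction p of the job runs in parallel (for instance, job start-up is not shortened by extra nodes). The timings in the example are made up, and the function names are illustrative.

```python
from typing import Dict


def speedup(t_single: float, t_cluster: float) -> float:
    """Measured speedup S = T(1) / T(n)."""
    return t_single / t_cluster


def efficiency(t_single: float, t_cluster: float, nodes: int) -> float:
    """Parallel efficiency E = S / n (1.0 means perfectly linear scaling)."""
    return speedup(t_single, t_cluster) / nodes


def amdahl_bound(parallel_fraction: float, nodes: int) -> float:
    """Amdahl's law: best possible speedup when only `parallel_fraction` of the work parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / nodes)


def report(timings: Dict[int, float]) -> None:
    """Print S and E for each node count, relative to the 1-node run (key 1 must be present)."""
    t1 = timings[1]
    for n, tn in sorted(timings.items()):
        print(f"{n} node(s): {tn:7.1f}s  speedup={speedup(t1, tn):.2f}  "
              f"efficiency={efficiency(t1, tn, n):.2f}")


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds; replace with your own measurements.
    report({1: 600.0, 2: 340.0, 4: 210.0})
    print(f"Amdahl bound, 90% parallel, 4 nodes: {amdahl_bound(0.9, 4):.2f}")
```

Falling efficiency as n grows is the signal that the serial part or the network, rather than the number of workers, is now the bottleneck.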



  • What is Amazon EMR and how can I use it for processing data?
  • (From: YouTube)


  • Introduction to Map/Reduce (Part 1/3)
  • (From: Prof. Patterson, YouTube)

  • Creating a Java Program for Map/Reduce (Part 2/3)
  • (From: Prof. Patterson, YouTube)

  • Running a custom java jar on an AWS EMR cluster (Part 3/3)
  • (From: Prof. Patterson, YouTube)
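Part 3 covers running a custom JAR on an EMR cluster. Besides the console, a JAR step can also be added from code with boto3's `add_job_flow_steps`. A minimal sketch, assuming a cluster that is already running and a JAR and input already uploaded to S3; the bucket, JAR, class, and cluster ID below are hypothetical placeholders, and the `Args` list is passed to the program's main method. Only the step-building part runs without AWS credentials.

```python
from typing import Dict, List, Optional


def custom_jar_step(name: str, jar_s3_uri: str, args: List[str],
                    main_class: Optional[str] = None,
                    action_on_failure: str = "CONTINUE") -> Dict:
    """Build one step definition in the shape EMR's AddJobFlowSteps API expects."""
    hadoop_jar_step: Dict = {"Jar": jar_s3_uri, "Args": list(args)}
    if main_class:  # leave out when the JAR's manifest already names the main class
        hadoop_jar_step["MainClass"] = main_class
    return {"Name": name, "ActionOnFailure": action_on_failure,
            "HadoopJarStep": hadoop_jar_step}


def submit_step(cluster_id: str, step: Dict, region: str) -> List[str]:
    """Add the step to a running cluster via boto3 (needs AWS credentials); return the step IDs."""
    import boto3  # imported here so custom_jar_step() works without boto3 installed

    emr = boto3.client("emr", region_name=region)
    return emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])["StepIds"]


if __name__ == "__main__":
    # Hypothetical bucket, JAR, and class names: replace with your own.
    step = custom_jar_step(
        name="WordCount",
        jar_s3_uri="s3://my-bucket/jars/wordcount.jar",
        args=["s3://my-bucket/input/", "s3://my-bucket/output/"],
        main_class="WordCount",
    )
    print(step)
    # submit_step("j-XXXXXXXXXXXXX", step, region="us-east-1")
```

`ActionOnFailure="CONTINUE"` keeps the cluster alive if the step fails, which is convenient while debugging a JAR.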


  • Chapter 6. Hadoop HDFS commands
  • HDFSOperation.7z

  • HADOOP_HOME ${eclipse_home}\..\hadoop
  • PATH %PATH%;${eclipse_home}\..\hadoop\bin
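As a sketch of the usual HDFS workflow around a job (not the contents of HDFSOperation.7z), the helpers below build the standard `hdfs dfs` commands to create an input folder, upload a file, list the output folder and print the results. They only execute anything when `run()` is called, and they assume `hdfs` is reachable through the HADOOP_HOME/PATH settings above and that the job uses the newer `mapreduce` API, whose reducers write files named part-r-NNNNN. Function names and paths are hypothetical.

```python
import shutil
import subprocess
from typing import List


def dfs(*args: str) -> List[str]:
    """Build an `hdfs dfs ...` command line as an argument list (no shell involved)."""
    return ["hdfs", "dfs", *args]


def upload_job_input(local_file: str, hdfs_dir: str) -> List[List[str]]:
    """Create the HDFS input folder (with parents) and copy a local file into it."""
    return [dfs("-mkdir", "-p", hdfs_dir), dfs("-put", local_file, hdfs_dir)]


def show_job_output(hdfs_output_dir: str) -> List[List[str]]:
    """List a finished job's output folder and print its reducer output files."""
    # The HDFS shell expands the glob itself; part-r-NNNNN is the new-API reducer file name.
    return [dfs("-ls", hdfs_output_dir), dfs("-cat", f"{hdfs_output_dir}/part-r-*")]


def run(commands: List[List[str]]) -> None:
    """Execute each command, resolving `hdfs` through PATH (hdfs.cmd on Windows)."""
    exe = shutil.which("hdfs")
    if exe is None:
        raise FileNotFoundError("hdfs not found on PATH; check HADOOP_HOME and PATH")
    for cmd in commands:
        subprocess.run([exe, *cmd[1:]], check=True)


if __name__ == "__main__":
    # Hypothetical file and folder names.
    for cmd in upload_job_input("sample.txt", "input") + show_job_output("output_12345678"):
        print(" ".join(cmd))
    # run(upload_job_input("sample.txt", "input"))  # before submitting the job
    # run(show_job_output("output_12345678"))       # after the job finishes
```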
  • Score:
  • Chapter 7 Hadoop MapReduce
  • (Optional) Google: Machine Learning Crash Course


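  • Python 最強入門邁向數據科學之路 - 王者歸來 (火力加強版) (Python: The Strongest Introduction on the Road to Data Science, Return of the King, enhanced edition) (深智數位股份有限公司, 2019-04-22, ISBN: 9869772609)
    TDCS_06A_Download_WebRobot.7z (Python)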


    Text categorization (classification): spam mail filtering;
    Ministry of Education Intercollegiate AI CUP 2019: Artificial Intelligence Analysis and Classification of Theses
    教育部全國大專校院人工智慧競賽(AI CUP 2019)-人工智慧論文機器閱讀競賽之論文分類
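Spam filtering and thesis classification are both supervised text categorization. A common baseline for such tasks is multinomial naive Bayes; the sketch below is a plain-Python version with add-one (Laplace) smoothing, trained on a tiny made-up data set. It is an illustration of the technique, not the method used in the AI CUP or in this course, and the class name is hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# Tiny made-up training set: (text, label) pairs.
SAMPLE_TRAIN: List[Tuple[str, str]] = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting schedule for hadoop lab", "ham"),
    ("please submit the mapreduce report", "ham"),
]


class NaiveBayesText:
    """Multinomial naive Bayes over lower-cased whitespace tokens, add-one (Laplace) smoothing."""

    def fit(self, docs: List[Tuple[str, str]]) -> "NaiveBayesText":
        self.class_docs: Counter = Counter()  # number of training documents per label
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)  # token counts per label
        self.vocab: Set[str] = set()
        for text, label in docs:
            tokens = text.lower().split()
            self.class_docs[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = sum(self.class_docs.values())
        return self

    def log_score(self, text: str, label: str) -> float:
        """log P(label) + sum over known tokens of log P(token | label)."""
        counts = self.word_counts[label]
        denominator = sum(counts.values()) + len(self.vocab)
        score = math.log(self.class_docs[label] / self.total_docs)
        for token in text.lower().split():
            if token in self.vocab:  # tokens never seen in training are ignored
                score += math.log((counts[token] + 1) / denominator)
        return score

    def predict(self, text: str) -> str:
        """Return the label with the highest log score."""
        return max(self.class_docs, key=lambda label: self.log_score(text, label))


if __name__ == "__main__":
    model = NaiveBayesText().fit(SAMPLE_TRAIN)
    print(model.predict("claim your free money"))   # spam
    print(model.predict("hadoop report schedule"))  # ham
```

Working in log space avoids underflow when documents are long; the counting in `fit` is also the part that maps naturally onto a MapReduce word-count job for large corpora.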


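    (Ministry of Education AIoT Technology and Application Talent Cultivation Program, 2018 (ROC year 107) MOOCs Course Development Project; Prof. 李順裕, Department of Electrical Engineering, National Cheng Kung University)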


  • Free course: Intro to Hadoop and MapReduce by Cloudera (Udacity)
  • Big Data 2014: Introduction to MapReduce
  • How to Download & Install Java JDK 8 in Windows (From: Guru99)
  • Taichung Traffic Big-Data (Demo)
    Taichung Traffic Big-Data (Course)
    Taichung Traffic Big-Data (Demo/en)
    Taichung Traffic Big-Data (MRP/en)
    Account Management

  • GeoPandas
    Leaflet
    Taichung City seniors, aged 55-85 (台中市年長者) (Prof. 楊朝棟, Tunghai University)
    PM2.5 (Prof. 楊朝棟, Tunghai University)
    ETC (Prof. 楊朝棟, Tunghai University)