Cloud Computing Practices
(Hadoop MapReduce Practices with Windoop and AWS EC2, S3, EMR)

Grades (Thu. 345, I627)

LineID : 108_2CloudComputingPractices


The teaching format of this course changes starting 2020/3/23.


Due to COVID-19 prevention measures:
  • This course will be held online via Microsoft Teams or videos.
  • Every week after class, each student must hand in a short video (1-3 minutes) showing the process or experimental results, and upload it to YouTube with sharing enabled (URL) as proof of that week's work.

    (Details of this course will be announced each week via Line: 108_2CloudComputingPractices)


    Textbook:



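  • Python+Spark 2.0+Hadoop 機器學習與大數據分析實戰 (Machine Learning and Big Data Analytics in Practice), by 林大貴; publisher: 博碩; publication date: 2016-10-03; language: Traditional Chinese; ISBN: 9864341537; ISBN-13: 9789864341535
    Blog: http://pythonsparkhadoop.blogspot.tw
    Facebook group: Python+Spark 2.0+Hadoop 機器學習與大數據分析社團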
    P21622_example.zip


  • Course contents:
  • AWS Educate Program
  • AmazonEC2.html
  • AmazonS3.html
  • AWS_Training_Certification.html
  • Windoop_SingleNode.html
  • HadoopMapReduce.html
  • Windoop_Cluster.html (I627)
  • Hadoop_OnLinux.html
  • Hadoop Cluster Setup (VirtualBox + VMs)
  • AmazonEMR.html
  • 20200520 Academy Cloud Foundations v2 (Asia University)

  • Grading (Score): late submissions are penalized (original score * 0.9 per day of delay); submissions are accepted at most one week after the deadline.
  • (15%)(Homework 1)(submit to Moodle, Deadline : 2020/4/2):
    (1) AWS Educate : Apply for an AWS Account (Starter account if you have no credit card)
    (2) Create two VMs (one MS Windows Server (CPU: 4~8 cores, RAM 16GB), one CentOS (CPU: 4~8 cores, RAM 16GB)) and show how to connect these two VMs.
    (3) MS Windows Server: install the Java JDK (JDK 8 or earlier; JDK 9 failed) + Windoop
    (4) Change the project name to "WordCount_YourAsiaID" (instead of "WordCount_jdwang") to show your work in the report (PDF)
    (5) Show your work and what you have learned in one report with one embedded YouTube video (3~5 min) (the URL should be embedded in your report)
    (6) Upload to Moodle
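
As an illustration of the late-submission rule stated under Score, here is a minimal Python sketch. It assumes that the 0.9 factor compounds for each day of delay and that a submission more than seven days late receives no credit; both are interpretations of the rule rather than confirmed policy, and the function name `late_score` is hypothetical.

```python
def late_score(original_score, days_late):
    """Apply the late penalty under the assumptions stated above."""
    if days_late <= 0:
        return float(original_score)  # on time: no penalty
    if days_late > 7:
        return 0.0  # ASSUMPTION: not accepted after one week
    # ASSUMPTION: the 0.9 factor is applied once per day of delay
    return original_score * (0.9 ** days_late)


if __name__ == "__main__":
    for days in (0, 1, 3, 7, 8):
        print(days, round(late_score(100, days), 2))
```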


  • (30%)(Middle Project: TDCS project with MapReduce programming on single-node Windoop on AWS VMs)

    Presentation: 2020/4/23
    Report: 2020/4/30 (submit to Moodle, with your report and presentation shared on YouTube)

    Dataset: (1) 2020/3/2-29 (Four weeks) (2) 2020/3/30-4/5 (One week) (4/2-4/5 : Taiwan Holidays)
    Big Data Processing Project: TDCS-06A
    How to use Web Robot(pdf)

    Java Web Robot Example: Web Robot(TDCS_WebURLDownload_jdwang_2017_10_20.zip)



    Select one gantry of your choice and observe how its 24-hour frequency distribution varies.
    Use Google Map to choose the gantry you want to observe.
    (1) Are there any significant differences between weeks (seven-day periods) when you compare Dataset (1) and Dataset (2)?
    (2) Can you make the comparison by vehicle type (31, 32, 41, 42, 5)?
    (3) What is the computation time on the AWS VMs? What are the hardware specs of your VMs?
    (4) What is the fee (AWS bill) for your computation? What do you think about using AWS EC2 for this middle project?
    (You may check with AWS Trusted Advisor to adjust your choices)
    (5) Report the results of (1)(2)(3)(4) and explain them in your own words in a YouTube video (share the URL and embed it in your report)
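
For question (4), the authoritative figure is the AWS bill itself. Before running a job, a rough estimate can still help when choosing an instance size. The sketch below assumes simple on-demand pricing proportional to running time and ignores storage and data-transfer charges. The function name and the hourly rate in the example are placeholders, not actual AWS prices.

```python
def estimate_ec2_cost(hours, hourly_rate_usd, instance_count=1):
    """Rough estimate: running hours x hourly rate x number of instances."""
    if hours < 0 or hourly_rate_usd < 0 or instance_count < 0:
        raise ValueError("inputs must be non-negative")
    return hours * hourly_rate_usd * instance_count


if __name__ == "__main__":
    # 0.20 USD/hour is a placeholder; look up the real price for your
    # instance type and region before relying on the result.
    print(f"{estimate_ec2_cost(3.0, 0.20, instance_count=2):.2f} USD")
```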


    References for Middle Project
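    (Gantry information: national freeway toll gantry coordinates and mileage rate table)(國道計費門架座標及里程牌價表104.09.04版.csv)
    (How to import the gantry locations into Google Map?) Freeway toll ramp locations: Google Map import tutorial (高速公路計費匝道位置-Google Map 匯入教學)

    Locations of the gantries on the national freeways. Example: "03F-186.0S" => GantryID = "03F1860S"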
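
The example above suggests that a GantryID is the location label with the hyphen and the decimal point removed. A minimal Python sketch of that conversion follows. The rule is inferred from the single example given, and the function name `to_gantry_id` is hypothetical.

```python
def to_gantry_id(location_label):
    """Drop '-' and '.' from a gantry location label to form a GantryID."""
    return location_label.replace("-", "").replace(".", "")


if __name__ == "__main__":
    print(to_gantry_id("03F-186.0S"))  # prints 03F1860S
```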



    TDCS Gantry parsing (24 hours)(pdf)
    (The frequency distribution of "VehicleType", "GantryID" or a specific GantryID within 24 hours)
    (1) Use Google Map to choose the gantry you want to observe

    Hadoop MapReduce program: project for TDCS Gantry parsing (24 hours)
    TDCS_GIDSequence_GantryID_VihicleType_Date_Weekday_24Hour_Statistics_jdwang_2018_10_12.zip

    (1) In main and the mapper, set the target gantry: String TargetGantryID = "01F0557N";
    (2) Modify the input path parameter
    (3) Modify the output path parameter


    Testing Data
    (One hour)TDCS_M06A_20161127_230000.csv
    (one day: 24 Hours)(201701_1-1.7z)
    (2018_9_1-7.7z)
    Result: part-r-00000_2018_9_1-7_01F0557N.xlsx
    (You can write your own Python code to analyze these statistics further and visualize them)
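
As a starting point for such Python code, the sketch below counts passages at one gantry per vehicle type and hour of day. It ASSUMES a header-less CSV layout in which column 1 is the vehicle type and column 8 lists the passages of a trip as "timestamp+GantryID" entries separated by ";". Check this against the actual TDCS files before using it. The function name and the sample row are made up for illustration.

```python
import csv
from collections import Counter
from datetime import datetime

TARGET_GANTRY_ID = "01F0557N"  # same example gantry as in the Java project


def hourly_counts(lines, target=TARGET_GANTRY_ID):
    """Count passages at `target`, keyed by (vehicle type, hour of day)."""
    counts = Counter()
    for row in csv.reader(lines):
        if len(row) < 8:
            continue  # skip blank or malformed rows
        vehicle_type = row[0].strip()
        # ASSUMPTION: column 8 looks like "time+GantryID; time+GantryID; ..."
        for passage in row[7].split(";"):
            timestamp, sep, gantry_id = passage.strip().rpartition("+")
            if not sep or gantry_id != target:
                continue
            try:
                hour = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").hour
            except ValueError:
                continue  # skip timestamps in an unexpected format
            counts[(vehicle_type, hour)] += 1
    return counts


if __name__ == "__main__":
    # Synthetic row for illustration only, not real TDCS data.
    sample = [
        "31,2016-11-27 23:00:12,01F0557N,2016-11-27 23:05:40,01F0584N,5.1,Y,"
        "2016-11-27 23:00:12+01F0557N; 2016-11-27 23:05:40+01F0584N",
    ]
    for (vtype, hour), n in sorted(hourly_counts(sample).items()):
        print(vtype, hour, n)
```

To process a real file, pass an open file object instead of the list, for example `with open(path, newline="") as f: hourly_counts(f)`.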



  • (15%)(Homework 2: Hadoop Cluster Comparison, Deadline: 2020/5/28)
    (0) Windoop Cluster Setup
    Windoop_Cluster.html (I627)

    (1) AWS EMR Setup
    AmazonEMR.html

    (2) Run your middle project on the Windoop Cluster and on AWS EMR, respectively
    Speedup experiment:
    WindoopExecuteTime.xlsx
    Single node vs. multiple nodes (1, 2, 4, 8)?
    Small dataset vs. large dataset
    Many small files (? MB << 128 MB) vs. one packed file (e.g. ? GB)

    (3) Compare computation time and cost between the Windoop Cluster and AWS EMR (1 master + 2 workers)
    (4) Report (Moodle) + YouTube (3~5 min)
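
For the speedup experiment, one common way to summarize the timings is speedup S(n) = T(1) / T(n) and parallel efficiency E(n) = S(n) / n, where T(n) is the run time on n nodes. The Python sketch below computes both. The function name is hypothetical and the timings in the example are placeholders, not measured results.

```python
def speedup_table(seconds_by_nodes):
    """Map node count -> (speedup, efficiency), relative to the 1-node run."""
    baseline = seconds_by_nodes[1]  # the single-node time is required
    table = {}
    for nodes in sorted(seconds_by_nodes):
        speedup = baseline / seconds_by_nodes[nodes]
        table[nodes] = (speedup, speedup / nodes)
    return table


if __name__ == "__main__":
    # Placeholder timings in seconds; replace with your own measurements.
    timings = {1: 100.0, 2: 50.0, 4: 40.0}
    for nodes, (speedup, efficiency) in speedup_table(timings).items():
        print(f"{nodes} node(s): speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```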


  • (40%) AWS Cloud Practitioner (AWS ACF) Certification (Deadline: 2020/7/3)
    (0) Log in to the AWS Training & Certification Portal (you have received an invitation email)
    Registration Link: 20200520 Academy Cloud Foundations v2 (Asia University)
    Prepare for Your AWS Certification Exam
    (1) AWS ACF hands-on labs
    (2) Online video course
    (3) White paper
    (4) AWS Cloud Practitioner practice exam (cost: 20 USD)
    (5) AWS Cloud Practitioner Exam (cost: 100 USD)
    AWS Certified Cloud Practitioner
    Sample Questions
    (6) Report (covering the AWS ACF training and the ACF practice exam) + YouTube (3~5 min)