Cloud Computing Practices (雲端運算實務)
(Hadoop MapReduce Practices with Windoop and AWS EC2, S3, EMR)
Grades
Thu. 345, I627
Line ID: 108_2CloudComputingPractices
The teaching style of this course changes from 2020/3/23.
To help prevent the spread of COVID-19 (coronavirus disease),
this course will be delivered online via
Microsoft Teams
or videos.
Each student in this course must hand in a video (1-3 minutes)
showing the process or experimental results after class every week,
and upload that video to YouTube with sharing permission (URL) as proof of the work done each week.
(Details of this course will be announced each week via
Line 108_2CloudComputingPractices)
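Textbook:
Python+Spark 2.0+Hadoop 機器學習與大數據分析實戰 (Python+Spark 2.0+Hadoop Machine Learning and Big Data Analysis in Practice), 林大貴, Publisher: 博碩, Publication date: 2016-10-03, Language: Traditional Chinese, ISBN: 9864341537, ISBN-13: 9789864341535
Blog: http://pythonsparkhadoop.blogspot.tw ,
(Facebook) Python+Spark 2.0+Hadoop 機器學習與大數據分析社團 (Machine Learning and Big Data Analysis group)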
P21622_example.zip
Course Contents:
AWS Educate Program
AmazonEC2.html
AmazonS3.html
AWS_Training_Certification.html
Windoop_SingleNode.html
HadoopMapReduce.html
Windoop_Cluster.html (I627)
Hadoop_OnLinux.html
Hadoop Cluster Setup (VirtualBox + VMs)
AmazonEMR.html
20200520 Academy Cloud Foundations v2 (Asia University)
Grading: Late submissions are penalized (original score × 0.9 per day late); submissions are accepted at most one week after the deadline.
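The late-submission rule can be illustrated with a short sketch. It assumes the 0.9 factor compounds for each day late, and that work more than seven days late is not accepted; both are readings of the policy line above, not something the course page states explicitly. The function name `late_score` is hypothetical.

```python
def late_score(original_score: float, days_late: int) -> float:
    """Score after the late penalty, assuming 0.9 compounds per day late."""
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > 7:
        # Assumption: work more than one week late is not accepted.
        raise ValueError("submissions more than 7 days late are not accepted")
    return original_score * (0.9 ** days_late)


if __name__ == "__main__":
    # Print the penalized score for a few delays, starting from 100 points.
    for days in (0, 1, 3, 7):
        print(days, round(late_score(100, days), 2))
```

Under these assumptions, each extra day multiplies the remaining score by 0.9, so an early late submission loses far less than one handed in at the end of the grace week.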
(15%) Homework 1 (submit to Moodle, Deadline: 2020/4/2):
(1) AWS Educate: Apply for an AWS account (Starter Account if you have no credit card)
(2) Create two VMs (one MS Windows Server (CPU: 4~8 cores, RAM: 16 GB), one CentOS (CPU: 4~8 cores, RAM: 16 GB))
(3) Show how to connect these two VMs.
(4) MS Windows Server: install the Java JDK (JDK 8 or lower; JDK 9 failed) + Windoop
(5) Change the project name to "WordCount_YourAsiaID" (instead of "WordCount_jdwang") to show your work in the report (PDF)
(6) Show your work and what you have learned in one report with one embedded YouTube video (3~5 min) (the URL should be embedded in your report)
(7) Upload to Moodle
(30%) Middle Project: TDCS project with MapReduce programming on single-node Windoop on AWS VMs
Presentation: 2020/4/23
Report: 2020/4/30 (submit to Moodle, with your report and presentation shared on YouTube)
Dataset: (1) 2020/3/2-29 (four weeks) (2) 2020/3/30-4/5 (one week) (4/2-4/5: Taiwan holidays)
Big Data Processing Project: TDCS-06A, How to use the Web Robot (pdf)
Java Web Robot Example: Web Robot (TDCS_WebURLDownload_jdwang_2017_10_20.zip)
Select one gantry of your choice and observe how its 24-hour frequency distribution varies.
Choose one gantry to observe on Google Maps
(1) Are there any significant week-to-week (seven-day) differences when you compare Dataset (1) with Dataset (2)?
(2) Can you make the same comparison for the different vehicle types (31, 32, 41, 42, 5)?
(3) What is the computation time on the AWS VMs? What are the hardware specs of your VMs?
(4) What is the fee (AWS bill) for your computation? What do you think about using AWS EC2 for this middle project?
(You may check AWS Trusted Advisor to adjust your choices)
(5) Report the results of (1)(2)(3)(4) and explain them in your own words via YouTube (URL shared and embedded in your report)
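For question (4), a rough estimate can be computed before checking the actual AWS bill. The sketch below is only an illustration: the hourly rate is a placeholder and not a real AWS price, the function name `ec2_cost_usd` is hypothetical, and the real bill may also include storage, data transfer, and other charges.

```python
def ec2_cost_usd(hours: float, hourly_rate_usd: float, instance_count: int = 1) -> float:
    """Rough compute cost: running hours x hourly rate x number of instances."""
    return hours * hourly_rate_usd * instance_count


# Placeholder rate for illustration only; look up the real rate for your instance type.
HYPOTHETICAL_RATE_USD = 0.20

# Example: two VMs, each running for 3.5 hours.
estimate = ec2_cost_usd(hours=3.5, hourly_rate_usd=HYPOTHETICAL_RATE_USD, instance_count=2)
print(f"Estimated compute cost: {estimate:.2f} USD")
```

Comparing such an estimate with the amount shown on the AWS bill is one way to support the discussion asked for in the report.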
References for Middle Project
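(Gantry Information) (國道計費門架座標及里程牌價表104.09.04版.csv: national freeway toll gantry coordinates and mileage price table, version 104.09.04)
(How to import the gantry locations into Google Maps?) 高速公路計費匝道位置-Google Map 匯入教學 (Freeway toll ramp locations: Google Map import tutorial)
Gantry locations on the national freeways. Example: "03F-186.0S" => GantryID="03F1860S"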
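The label-to-GantryID conversion in the example above can be sketched in Python. This assumes the conversion only drops the hyphen and the decimal point, which is generalized from the single example given; the function name `to_gantry_id` is hypothetical.

```python
def to_gantry_id(label: str) -> str:
    """Convert a gantry label such as "03F-186.0S" into a GantryID such as "03F1860S".

    Assumption: the only difference between the two forms is the hyphen
    and the decimal point, as in the single example on the course page.
    """
    return label.strip().replace("-", "").replace(".", "")


if __name__ == "__main__":
    print(to_gantry_id("03F-186.0S"))
```

The resulting string can then be compared directly with the GantryID values in the downloaded data.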
TDCS Gantry parsing(24 hours)(pdf)
(The frequency distribution of "VehicleType", "GantryID", or a specific GantryID within 24 hours)
(1) Choose one gantry to observe on Google Maps
Hadoop MapReduce program: project for TDCS gantry parsing (24 hours)
TDCS_GIDSequence_GantryID_VihicleType_Date_Weekday_24Hour_Statistics_jdwang_2018_10_12.zip
(1) In main and the mapper, set the target gantry: String TargetGantryID = "01F0557N";
(2) Modify the input path parameter
(3) Modify the output path parameter
Testing Data
(One hour)TDCS_M06A_20161127_230000.csv
(one day: 24 Hours)(201701_1-1.7z)
(2018_9_1-7.7z)
Result: part-r-00000_2018_9_1-7_01F0557N.xlsx
(You can write your own Python code to compute further statistics from these results and visualize the data)
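As a starting point for such Python code, the sketch below counts passages per vehicle type and hour for one target gantry, in a map-then-reduce style. It is not the course's Hadoop program. The record layout (vehicle type, detection time, GantryID) and the sample rows are hypothetical; adapt the parsing to the actual CSV columns of the downloaded files.

```python
from collections import Counter
from datetime import datetime

TARGET_GANTRY_ID = "01F0557N"  # same example gantry as in the Hadoop program


def map_record(vehicle_type, detection_time, gantry_id):
    """Map step: emit ((vehicle_type, hour), 1) for the target gantry only."""
    if gantry_id != TARGET_GANTRY_ID:
        return None
    hour = datetime.strptime(detection_time, "%Y-%m-%d %H:%M:%S").hour
    return (vehicle_type, hour), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts for each (vehicle_type, hour) key."""
    totals = Counter()
    for pair in pairs:
        if pair is not None:
            key, count = pair
            totals[key] += count
    return totals


# Hypothetical, already-parsed passage records (not real TDCS data).
records = [
    ("31", "2018-09-01 08:05:12", "01F0557N"),
    ("31", "2018-09-01 08:47:30", "01F0557N"),
    ("5", "2018-09-01 23:10:02", "01F0557N"),
    ("31", "2018-09-01 08:15:00", "03F1860S"),  # other gantry, filtered out
]

hourly = reduce_counts(map_record(*record) for record in records)
for (vehicle_type, hour), count in sorted(hourly.items()):
    print(vehicle_type, hour, count)
```

The resulting counts, keyed by vehicle type and hour of day, can be passed to a plotting library to draw the 24-hour distribution.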
(15%) Homework 2: Hadoop Cluster Comparison (Deadline: 2020/5/28)
(0) Windoop Cluster Setup
Windoop_Cluster.html (I627)
(1) AWS EMR Setup
AmazonEMR.html
(2) Run your middle project on the Windoop cluster and on AWS EMR, respectively
SpeedUp experiment (WindoopExecuteTime.xlsx)
Single node vs. multiple nodes (1, 2, 4, 8)?
Small dataset vs. large dataset
Many small files (? MB << 128 MB) vs. one packed file (e.g. ? GB)
(3) Compare the computation time and cost of the Windoop cluster and AWS EMR (1 master + 2 workers)
(4) Report (Moodle) + YouTube (3~5 min)
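For the SpeedUp experiment in item (2), the usual definitions are speedup = T(1 node) / T(n nodes) and efficiency = speedup / n. The sketch below applies them to placeholder timings; the numbers are made up for illustration and should be replaced with your own measurements, and the function names are hypothetical.

```python
def speedup(single_node_seconds: float, n_node_seconds: float) -> float:
    """Speedup of an n-node run relative to the single-node run."""
    return single_node_seconds / n_node_seconds


def efficiency(single_node_seconds: float, n_node_seconds: float, nodes: int) -> float:
    """Parallel efficiency: speedup divided by the number of nodes."""
    return speedup(single_node_seconds, n_node_seconds) / nodes


# Placeholder timings in seconds, keyed by node count (not real measurements).
timings = {1: 480.0, 2: 260.0, 4: 150.0, 8: 100.0}

table = [
    (nodes, speedup(timings[1], seconds), efficiency(timings[1], seconds, nodes))
    for nodes, seconds in sorted(timings.items())
]

for nodes, s, e in table:
    print(f"{nodes} node(s): speedup {s:.2f}, efficiency {e:.2f}")
```

An efficiency well below 1 as nodes are added usually points to overhead such as job startup or data shuffling, which is worth discussing in the report.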
(40%) AWS Cloud Practitioner (AWS ACF) Certification (Deadline: 2020/7/3)
(0) Log in to the AWS Training & Certification Portal (you have received an invitation email)
Registration Link:
20200520 Academy Cloud Foundations v2 (Asia University)
Prepare for Your AWS Certification Exam
(1) AWS ACF hands-on labs
(2) Online video course
(3) White papers
(4) AWS Cloud Practitioner practice exam (cost: 20 USD)
(5) AWS Cloud Practitioner Exam (cost: 100 USD)
AWS Certified Cloud Practitioner
Sample Questions
(6) Report (with AWS ACF training and the ACF practice exam) + YouTube (3~5 min)