Machine Learning (機器學習)
Outline
Class period: Mon. 234, I627
jdwang@asia.edu.tw

Score
  • Class rescheduling: 4/29, 9:10am-12:00pm => 4/22, 18:00-21:00 (watch the online YouTube video on your own)
    2019 International Conference on Soft Computing & Machine Learning (SCML2019), April 26th-29th, Wuhan, China
    Invited speaker: Jing-Doo Wang. Title: Applications using the Class Frequency Distribution of Maximal Repeats from Tagged Sequential Data.
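  • (Ministry of Education Intelligent Internet of Things Technology and Application Talent Cultivation Program, 2018 (ROC year 107) MOOCs Course Development Project (Prof. Shuenn-Yuh Lee, Department of Electrical Engineering, National Cheng Kung University))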

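    ECG MOOCs (weeks 1~3) (Provided by Prof. Shuenn-Yuh Lee, NCKU) (Language: Mandarin Chinese, Subtitles: English)
    Week 1
    16'45" Unit 1: Healthcare Sensor Circuits and Systems_1_Healthcare Sensors
    11'12" Unit 2: Healthcare Sensor Circuits and Systems_2_Healthcare System Modules
    18'15" Unit 3: Healthcare Sensor Circuits and Systems_3_Healthcare System Chips
    Week 2
    16'54" Unit 1: Introduction to the Wearable Module (Trianswer)_1_Motivation for Building the Module
    16'44" Unit 2: Introduction to the Wearable Module (Trianswer)_2_Physiological Signal Detection
    10'45" Unit 3: Introduction to the Wearable Module (Trianswer)_3_Module Operation
    Week 3
    08'46" Unit 1: Clinical Medical Knowledge_1_Introduction to Clinical Medical Knowledge
    13'54" Unit 2: Clinical Medical Knowledge_2_Understanding the Heart's Rhythm
    15'01" Unit 3: Clinical Medical Knowledge_3_How to Detect the Heart's Rhythm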

    Watch these MOOC videos and find the answers to:
    What are ECG, EMG, and PPG?
    What are the stages of ""?

    Text Book
    (Chinese Version)
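    Python 機器學習 (Python Machine Learning), 2nd Edition, MP11804 (Sebastian Raschka, Vahid Mirjalili; translated by 劉立民 and 吳建華; DrMaster Press (博碩))
    DrMaster Press (博碩文化), Central Region Sales Manager: 林世昌 (Rick Lin), cell phone: 0925-275-775, LINE ID: 0925275775, e-mail: rick@drmaster.com.tw
    Code examples (183MB)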

    (English Version)
    Python Machine Learning, Sebastian Raschka, Vahid Mirjalili. Packt (2nd Edition), ISBN: 9789864343324
    Code Example(183MB)


    Reference Book

    Python Deep Learning
    code(Packt)
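    Python程式設計學習經典:工程分析x資料處理x專案開發 (Python Programming Classics: Engineering Analysis x Data Processing x Project Development) (GOTOP (碁峰)) (authors: 吳翌禎, 黃立政; ISBN: 9789864768837) (published 2018/08/23)
    林政益 (scott_lin@gotop.com.tw), phone: 04-2452-7051 ext. 11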
    Code Examples

    Chapter 11: Pandas Package
    How to read data (files: CSV/text, Excel, HTML) into a "Series" or a "DataFrame"?
    CH_11_3_1_EX_1_Pandas_Reading_CSV_File.py
    CH_11_3_2_EX_1_Pandas_Reading_Excel_File.py
    CH_11_3_3_EX_1_Pandas_Reading_HTML_File.py
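A minimal sketch of the reading step, assuming pandas is installed. The CSV content and column names are hypothetical stand-ins for a real file; `pd.read_excel` and `pd.read_html` follow the same pattern but need optional packages (for example openpyxl or lxml).

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """name,height_cm,weight_kg
Alice,165,55
Bob,180,75
Carol,172,63
"""

# read_csv accepts a path or any file-like object and returns a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Selecting a single column of a DataFrame yields a Series.
heights = df["height_cm"]

print(type(df).__name__, df.shape)
print(type(heights).__name__, heights.mean())
```

A DataFrame is a table of named columns; each column is a Series that shares the table's row index.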

    Chapter 12: Matplotlib (2D or 3D Data Visualization)
    CH_12_2_EX_1_Basic_Plot_Using_Pyplot.py
    CH_12_3_EX_1_Pyplot_Figure.py
    CH_12_4_EX_1_Pyplot_Subplot_Common_Usages.py
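The pyplot topics listed above (basic plot, figure, subplot) can be sketched in a few lines. This is an illustrative example, not the book's code; it assumes Matplotlib and NumPy are installed and writes the image to an in-memory buffer instead of a file.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# One figure holding two side-by-side subplots.
fig, (ax_sin, ax_cos) = plt.subplots(1, 2, figsize=(8, 3))
ax_sin.plot(x, np.sin(x))
ax_sin.set_title("sin(x)")
ax_cos.plot(x, np.cos(x), linestyle="--")
ax_cos.set_title("cos(x)")
fig.tight_layout()

# Render the figure as PNG bytes; pass a file name instead to save to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```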


    The IPython notebook

  • Data Mining: Practical Machine Learning Tools and Techniques (4th Edition), 2017, Morgan Kaufmann (ISBN-13: 978-0128042915). Content
    1. Supervised Learning
    2. Chapter 2 - Training Machine Learning Algorithms for Classification
      Data Sets for Data Mining
      Decision Tree (J48, the Java version of C4.5) with "iris.arff"
      iris.data

      How to construct a Decision Tree (DT)?
      (Shannon Entropy) (Information Gain)
      Weka with weather(YouTube)
      WEKA Data Sets (weather)
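As a worked example of Shannon entropy and information gain, the sketch below scores the "outlook" attribute of the 14-instance weather data. The (outlook, play) pairs are typed in by hand here, so treat them as an assumption and check them against the actual file.

```python
from collections import Counter
from math import log2

# (outlook, play) pairs of the 14-instance weather data, typed in by hand.
data = [
    ("sunny", "no"), ("sunny", "no"), ("overcast", "yes"), ("rainy", "yes"),
    ("rainy", "yes"), ("rainy", "no"), ("overcast", "yes"), ("sunny", "no"),
    ("sunny", "yes"), ("rainy", "yes"), ("sunny", "yes"), ("overcast", "yes"),
    ("overcast", "yes"), ("rainy", "no"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(pairs):
    """Class entropy minus the weighted entropy after splitting on the attribute."""
    labels = [label for _, label in pairs]
    remainder = 0.0
    for value in {v for v, _ in pairs}:
        subset = [label for v, label in pairs if v == value]
        remainder += len(subset) / len(pairs) * entropy(subset)
    return entropy(labels) - remainder


base = entropy([label for _, label in data])
gain = information_gain(data)
print(f"H(play) = {base:.3f} bits, gain(outlook) = {gain:.3f} bits")
```

A tree builder in the ID3/C4.5 family repeats this calculation for every candidate attribute and splits on the one with the highest score (C4.5 uses the gain ratio, a normalized variant).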

      How to avoid overfitting when training a DT?
      (Pre-Pruning and Post-Pruning)
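The two pruning styles can be compared with scikit-learn, assuming it is installed. Note the substitution: WEKA's J48 prunes with its own confidence-based method, whereas this sketch uses scikit-learn's growth limits (pre-pruning) and cost-complexity pruning (post-pruning). The parameter values are illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unpruned tree: grows until every leaf is pure.
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Pre-pruning: stop growth early with depth and leaf-size limits.
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0).fit(X, y)

# Post-pruning: grow fully, then cut back by cost-complexity (illustrative alpha).
post = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)

for name, tree in [("full", full), ("pre-pruned", pre), ("post-pruned", post)]:
    print(name, "nodes:", tree.tree_.node_count, "depth:", tree.get_depth())
```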

      How to evaluate the performance of one classifier?
      How to compare the performance of several classifiers?
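Most evaluation measures are derived from the confusion matrix. The sketch below computes accuracy, precision, recall, and F-measure for a binary problem; the label lists are hypothetical.

```python
# Hypothetical true and predicted labels for a binary problem.
y_true = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg"]
y_pred = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "pos", "pos"]

pairs = list(zip(y_true, y_pred))
tp = sum(t == "pos" and p == "pos" for t, p in pairs)  # true positives
fn = sum(t == "pos" and p == "neg" for t, p in pairs)  # false negatives
fp = sum(t == "neg" and p == "pos" for t, p in pairs)  # false positives
tn = sum(t == "neg" and p == "neg" for t, p in pairs)  # true negatives

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)

print("confusion matrix [[TP, FN], [FP, TN]]:", [[tp, fn], [fp, tn]])
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} F={f_measure:.2f}")
```

To compare several classifiers fairly, compute the same measures for each one on the same folds of a k-fold cross-validation.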

      K-Nearest Neighbor (kNN)
      Training: determine the value of k that achieves the best performance
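One common way to choose k is to score several candidates with cross-validation and keep the best one. A minimal sketch, assuming scikit-learn is installed; the candidate values and the iris data are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Mean 5-fold cross-validation accuracy for each candidate k.
scores = {}
for k in (1, 3, 5, 7, 9, 11):
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print("mean accuracy per k:", {k: round(s, 3) for k, s in scores.items()})
print("best k:", best_k)
```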

      Naive Bayes Classifier (Probability model)
      Training: Estimate the conditional probability of each variable given the class, assuming the variables are conditionally independent.
      Bayesian Decision Theory
      Naive Bayes Classifier (From Tom M. Mitchell)
      Naive_Bayes_training_example_Tennis.htm
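The training and prediction steps of a Naive Bayes classifier reduce to counting. The sketch below uses two attributes (outlook, windy) of the play-tennis weather data, typed in by hand as an assumption, and applies no smoothing, so an unseen attribute value gives a zero probability.

```python
from collections import Counter

# (outlook, windy, play) triples from the 14-instance weather data, typed by hand.
rows = [
    ("sunny", False, "no"), ("sunny", True, "no"), ("overcast", False, "yes"),
    ("rainy", False, "yes"), ("rainy", False, "yes"), ("rainy", True, "no"),
    ("overcast", True, "yes"), ("sunny", False, "no"), ("sunny", False, "yes"),
    ("rainy", False, "yes"), ("sunny", True, "yes"), ("overcast", True, "yes"),
    ("overcast", False, "yes"), ("rainy", True, "no"),
]

# Training: count classes and (attribute index, value, class) combinations.
class_counts = Counter(label for *_, label in rows)
feature_counts = Counter()
for *features, label in rows:
    for i, value in enumerate(features):
        feature_counts[(i, value, label)] += 1


def posterior(features):
    """Normalized P(class | features) under the naive independence assumption."""
    scores = {}
    for label, n in class_counts.items():
        score = n / len(rows)  # prior P(class)
        for i, value in enumerate(features):
            score *= feature_counts[(i, value, label)] / n  # P(value | class)
        scores[label] = score
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


probs = posterior(("sunny", True))
prediction = max(probs, key=probs.get)
print(probs, "->", prediction)
```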

      Linear Classifier (Vector Space Model)
      Training: Find the hyperplanes that separate the instances of one class from those of the other classes
      Rocchio Algorithm (Linear Classifier) LinearClassifier_jdwang.xls
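A Rocchio-style classifier can be sketched as a nearest-centroid rule: training computes one prototype vector per class, and a new point gets the class of the closest prototype. The 2-D data is hypothetical, and Euclidean distance is an assumption; text applications often use TF-IDF vectors with cosine similarity instead.

```python
import numpy as np

# Hypothetical 2-D training vectors for two classes.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
              [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])
y = np.array(["A", "A", "A", "B", "B", "B"])

# Training: one centroid (prototype vector) per class.
classes = np.unique(y)
centroids = np.array([X[y == c].mean(axis=0) for c in classes])


def predict(points):
    """Assign each point to the class whose centroid is nearest (Euclidean)."""
    points = np.atleast_2d(np.asarray(points, dtype=float))
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return classes[distances.argmin(axis=1)]


print(predict([[2.0, 2.0], [6.0, 5.0]]))
```

With two classes, the set of points equidistant from both centroids is a hyperplane, which is why this rule counts as a linear classifier.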

      Support Vector Machine (Vector Space Model)
      Training: Find the support vectors that maximize the margin between two classes
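To see the support vectors and the margin concretely, the sketch below fits a linear SVM with scikit-learn (assumed installed) on hypothetical, linearly separable 2-D data. The margin width is computed as 2 / ||w|| from the learned weight vector.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable 2-D data.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
              [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

w = clf.coef_[0]                    # normal vector of the separating hyperplane
margin = 2.0 / np.linalg.norm(w)    # width of the margin region

print("support vectors:\n", clf.support_vectors_)
print("margin width:", round(margin, 3))
print("predictions:", clf.predict([[2.0, 2.0], [6.0, 5.0]]))
```

Only the support vectors determine the hyperplane; removing any other training point would leave the solution unchanged.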

      Chapter 3 - A Tour of Machine Learning Classifiers Using Scikit-Learn
      wine.data

      Chapter 4 - Building Good Training Sets – Data Preprocessing
      Missing Values (NaN, Not a Number)
      What do you do about "Missing Values"? (impute?)
      Categorical data:(1) Nominal feature (color?) (2) ordinal feature (Size?)
      Feature Scaling: Normalization vs. standardization
      How to choose meaningful features? (L1 and L2 regularization)
      Dimensionality Reduction: Feature selection vs. Feature extraction
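The preprocessing steps listed above can be sketched with pandas on a tiny table. The column names, the size ordering, and the values are hypothetical; mean imputation is only one of several reasonable choices.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "size": ["M", "L", "XL", "M"],               # ordinal feature
    "color": ["green", "red", "blue", "red"],    # nominal feature
    "price": [10.1, np.nan, 15.3, 12.0],         # numeric feature with a missing value
})

# Missing values: impute with the column mean.
df["price"] = df["price"].fillna(df["price"].mean())

# Ordinal feature: map to integers that preserve the order.
df["size"] = df["size"].map({"M": 1, "L": 2, "XL": 3})

# Nominal feature: one-hot encode, because the values have no order.
df = pd.get_dummies(df, columns=["color"])

# Feature scaling: min-max normalization vs. standardization.
price = df["price"]
df["price_norm"] = (price - price.min()) / (price.max() - price.min())
df["price_std"] = (price - price.mean()) / price.std(ddof=0)

print(df)
```

Normalization rescales values into [0, 1], while standardization centers them at zero with unit variance.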

      Chapter 5 Dimension Reduction
      Principal component analysis (PCA) (unsupervised)
      Linear discriminant analysis (LDA) (Fisher's LDA) (supervised)
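PCA can be written directly from its definition: center the data, take the eigenvectors of the covariance matrix, and project onto the leading ones. The sketch below uses NumPy on synthetic correlated data (an assumption for illustration). LDA is not shown; it differs in that it also uses the class labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D data whose second feature is roughly twice the first.
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.3, size=200)])

# 1. Center the data.
X_centered = X - X.mean(axis=0)

# 2. Eigendecomposition of the covariance matrix (eigh returns ascending order).
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 3. Explained variance ratio and projection onto the first component.
explained = eigvals / eigvals.sum()
X_pca = X_centered @ eigvecs[:, :1]

print("explained variance ratio:", np.round(explained, 4))
```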

      Chapter 8 - Applying Machine Learning To Sentiment Analysis
      Opinion mining practice: Internet Movie Database (IMDb)
      (0) Download aclImdb_v1.tar.gz
      (1) Uncompress "aclImdb_v1.tar.gz" into the working directory
      (2) Download Chapter08.7z
      (3) Python packages: PyPrind, NLTK
      conda prompt> pip install pyprind
      conda prompt> pip install nltk
      Vectorization: (n-gram model) (Bag-of-words) (Word2Vec)
      Word stemming: Porter stemmer algorithm
      Pattern Weighting:(TF)(DF)(IDF)
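The TF, DF, and IDF weights can be computed by hand on a toy corpus. The documents are hypothetical, and the formula idf = ln(N / df) is the textbook form; scikit-learn's `TfidfTransformer` uses a smoothed variant by default, so its numbers will differ.

```python
from collections import Counter
from math import log

# Hypothetical three-document corpus.
docs = [
    "the sun is shining",
    "the weather is sweet",
    "the sun is shining and the weather is sweet",
]

tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)

# DF: number of documents that contain each term.
df = Counter(term for tokens in tokenized for term in set(tokens))


def tfidf(tokens):
    """TF-IDF weight of each term in one document: tf * ln(N / df)."""
    tf = Counter(tokens)
    return {term: count * log(n_docs / df[term]) for term, count in tf.items()}


weights = [tfidf(tokens) for tokens in tokenized]
for doc, w in zip(docs, weights):
    print(doc, "->", {t: round(v, 3) for t, v in w.items()})
```

Terms that occur in every document, such as "the", get weight 0, while terms concentrated in few documents get the highest weights.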

      Chapter 9 - Web Application Embedded with a Machine Learning Model
      http://raschkas.pythonanywhere.com/
      download Chapter09.7z
      Model persistence (package: pickle)
      Anaconda Prompt> pip install flask
      C:\Users\jdwan\Chapter09\1st_flask_app_1>python app.py
       * Restarting with stat
       * Debugger is active!
       * Debugger PIN: 559-490-217
       * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

      WTForms
      Anaconda Prompt> pip install WTForms
      SQLite Manager
      PythonAnywhere
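Model persistence with pickle is a save-then-load round trip. In this sketch a plain dictionary stands in for a trained classifier, and the file name is hypothetical; a fitted scikit-learn estimator would be saved the same way.

```python
import os
import pickle
import tempfile

# A plain dictionary standing in for a trained model (hypothetical content).
model = {"vocabulary": ["good", "bad"], "weights": [1.5, -2.0]}

# Save the object to disk in binary mode.
path = os.path.join(tempfile.mkdtemp(), "classifier.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f, protocol=4)

# Later (for example when the web application starts), load it back.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model)
```

Unpickling can execute arbitrary code, so load only files that you created or trust.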

      Chapter 10 - Predicting Continuous Target Variables with Regression Analysis
      How to predict the price of a house?
      housing.data.txt
      (506 samples, each with 14 attributes)
      housing.names Chapter.7z
      conda prompt> pip install seaborn
      Why do we need the pairplot? (Exploratory Data Analysis, EDA)
      Pearson product-moment correlation coefficient (Pearson's r)
      RANdom SAmple Consensus (RANSAC)
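Pearson's r and a least-squares line can both be computed with NumPy. The room counts and prices below are made-up numbers, not values from housing.data.txt, and RANSAC is not shown here.

```python
import numpy as np

# Made-up data: average number of rooms vs. house price.
rooms = np.array([4.0, 5.0, 6.0, 6.5, 7.0, 8.0])
price = np.array([12.0, 17.5, 21.0, 24.0, 28.5, 35.0])


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    x_c, y_c = x - x.mean(), y - y.mean()
    return (x_c * y_c).sum() / np.sqrt((x_c ** 2).sum() * (y_c ** 2).sum())


r = pearson_r(rooms, price)

# Ordinary least-squares fit of price = slope * rooms + intercept.
slope, intercept = np.polyfit(rooms, price, deg=1)

print(f"Pearson's r = {r:.3f}")
print(f"price ~ {slope:.2f} * rooms + {intercept:.2f}")
```

A value of r near +1 or -1 indicates a strong linear relationship, which is what the pairplot lets you spot visually before fitting a model.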

      Chapter 12 - Implementing a Multi-layer Artificial Neural Network from Scratch
      MNIST dataset

      Chapter 13 - Parallelizing Neural Network Training with TensorFlow

      Chapter 14 - Going Deeper: The Mechanics of TensorFlow

      Chapter 15 - Classifying Images with Deep Convolutional Neural Networks

      Chapter 16 - Modeling Sequential Data Using Recurrent Neural Networks

    3. Unsupervised Learning
    4. Chapter 11 - Working with Unlabeled Data – Clustering Analysis
      k-means
      prototype-based
      hierarchical-based
      density-based
      How many clusters ?
      elbow method
      hard clustering
      soft clustering (or Fuzzy C-means, FCM)
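The elbow method runs k-means for several values of k and looks for the point where the within-cluster sum of squares (inertia) stops dropping sharply. A minimal sketch, assuming scikit-learn is installed; the three synthetic blobs are an assumption chosen so that the elbow appears at k = 3.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic data: three well-separated 2-D blobs of 50 points each.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Inertia (within-cluster sum of squared distances) for k = 1..6.
inertias = {}
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))
```

k-means is a hard, prototype-based method: each point belongs to exactly one cluster. Fuzzy C-means instead assigns each point a membership degree for every cluster.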

    5. Reinforcement Learning
    6. Machine Learning Practice with Python

    References
    Difference Between Data mining and Machine learning
    AI, Machine Learning and Deep Learning
    Machine Learning vs. Deep Learning
    Understanding of Convolutional Neural Network (CNN) — Deep Learning(Prabhu,2018)
    Convolutional Neural Network (CNN)
    Recurrent Neural Network (RNN) Tutorial
    Deep Learning (DL) In Healthcare
    MLCC(Machine Learning Crash Course)
    Course Overview -- Microsoft AI Workshop (24 Hours)
    Course Overview -- Microsoft Professional Program for Artificial Intelligence (96 Hours)

    Homework 1 (15%): Weka Practice (2019/3/25, report to Moodle with a 1~3 minute YouTube presentation (share the URL))

    (1) Decision Tree (DT) with WEKA Data Sets (labor.arff)
    (2) kNN with WEKA Data Sets (glass.arff)
    (3) Can you justify which classifier, kNN or DT, achieves better performance on (labor.arff) and (glass.arff)?
    (Which classifier (with which parameters, using 5-fold CV)? "Accuracy"? "Receiver Operating Characteristic (ROC) Curve"? "Confusion Matrix"? "F-measure"? "Training Time"? "Testing Time"?)


    Midterm Project (30%): "Zodiac Signs (Star Signs)" and "Blood Type" (A, B, AB, O) Classification (or Prediction) of Your Friends (2019/4/8 presentation, 2019/4/15 report to Moodle with a 3~5 minute YouTube presentation (share the URL))

    (0) How do you collect your own data (friends)? (FB, Google, IG, Line?)
    => (At least 50 records => 5-fold; make the dataset as robust as possible)
    => Is your dataset robust enough?
    (1) Which characteristics (features), e.g. personality type, religion, sex (M or F), blood type, do you choose for predicting "Zodiac Signs (Star Signs)" and "Blood Type"?
    (2) Classify with your own data (Weka? Python scikit-learn?) (k-fold CV?) (Confusion Matrix, Accuracy, F-Measure)
    (3) From your own experimental results, can we predict the "Zodiac Signs (Star Signs)" or "Blood Type" of someone?
    (4) How could you improve your experimental results, if possible?


    Homework 2 (15%) (Presentation: 2019/5/6 (10-15 minutes); Report: 2019/5/13, to Moodle with a 1~3 minute YouTube presentation (share the URL))

    MOOCs (Provided By Prof. Shuenn-Yuh Lee, NCKU)
    ECG MOOCs (the 2nd&3rd week)

    (Choose at least one of the following three databases.)
    PhysioBank ATM(On_line)
    MIT-BIH Arrhythmia Database (raw data and annotation)
    MIT-BIH Arrhythmia Database Directory(raw data and annotation(age, sex, drug usage))

    (1) Where was the resource derived from?
    (2) What is the format of these ECG data?
    (3) What can you extract from these data?
    (4) Try to load these data with the Python package "pandas", view them with "Matplotlib", and verify them (learn a classifier) with "scikit-learn".
    (5) Find Related Potential Applications : MIT ECG signal processing, data mining, classification and clustering


    Final Project (40%): (2019/6/10 presentation, 2019/6/17 Report to moodle with 3~5 minutes YouTube presentation(URLsharing))
    Topic: Text Classification and Clustering of Cancers via Medical Articles in the PubMed Corpus
    The types of cancer (choose at least five types)
    Resource: PubMed
    (Download PubMed articles via the PubMed API)
    (Download PubMed articles via Download MEDLINE/PubMed Data)

    As in Chapter 8, try to train and construct a classifier for determining the class ("cancer type") of PubMed articles

    (Python Package : NLTK Natural Language ToolKit)
    (bag-of-words)
    Term-Frequency (TF)
    Inverse-Document-Frequency (IDF)
    Term Weighting : TF*IDF

    stop-word removal
    Word stemming : Porter stemming algorithm

    Binary Classification (Sentiment Analysis (positive or negative)) vs. Multi-class Classification (Which type of cancer is it?)

    Computation Power limitation : Out-of-core learning
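Out-of-core learning trains on small batches so the whole corpus never has to fit in memory. The sketch below assumes scikit-learn is installed; the batch generator, the texts, and the two class names are hypothetical placeholders for labeled PubMed abstracts read from disk, and the default linear loss of `SGDClassifier` is used rather than any particular setting from the book.

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier


def stream_batches():
    """Hypothetical stand-in for reading labeled abstracts from disk in batches."""
    yield ["tumor in the lung tissue", "breast tumor screening"], ["lung", "breast"]
    yield ["lung nodule on chest scan", "breast biopsy result"], ["lung", "breast"]


# HashingVectorizer is stateless, so it needs no pass over the full corpus.
vectorizer = HashingVectorizer(n_features=2 ** 18, alternate_sign=False)
clf = SGDClassifier(random_state=0)
classes = np.array(["breast", "lung"])  # all classes must be declared up front

# Update the model one mini-batch at a time.
for texts, labels in stream_batches():
    clf.partial_fit(vectorizer.transform(texts), labels, classes=classes)

pred = clf.predict(vectorizer.transform(["lung scan"]))
print(pred)
```

Because the model is updated incrementally, the same loop works for a multi-class problem with five or more cancer types; only the `classes` array changes.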


    As in Chapter 9, construct a Web interface that lets a user upload one PubMed article and then determines the cancer type of that article via the classifier trained above.
    Example: "Please enter your movie review"

    Web Service with machine learning (Text Classification) (Sentiment Analysis)
    Package "pickle" (model persistence)
    Package "flask" (microframework) : pip install flask
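A minimal sketch of the web-service idea, assuming Flask is installed. The route name, the form field, and the keyword-based `classify` function are hypothetical; in a real application `classify` would call a classifier restored with pickle. The built-in test client is used so the example runs without starting a server.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def classify(text):
    """Hypothetical placeholder for a classifier restored with pickle."""
    return "positive" if "good" in text.lower() else "negative"


@app.route("/predict", methods=["POST"])
def predict():
    # Read the submitted text from the form field and return the predicted label.
    text = request.form.get("review", "")
    return jsonify({"label": classify(text)})


# Exercise the endpoint with Flask's built-in test client.
client = app.test_client()
response = client.post("/predict", data={"review": "A good movie"})
print(response.get_json())
```

To serve real requests, call `app.run()` at the end of the script or start it with the `flask run` command.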

  • AWS Educate Program

  • AWS Educate
    AWS Certification Preparation
    AWS Services
    Completing the AWS Educate Program student account application: Apply for an AWS Educate (provided by student 蘇棻翎)

    Chapter 01: Setting Up a Python Development Environment

    Microsoft Docs
    Microsoft Azure Portal

  • Windows Azure Machine Learning Studio
    Linear Regression for Predicting the Price of a Car
    Example: jdwang

  • Name:
    ID:


    What do you know about "Machine Learning"?

    Why do you need to learn "Machine Learning"?

    What do you expect from this course?

    The others: