Text Mining
(Project-Based or Problem-Based Learning)(PBL)


Time: Monday, 9:10am~12:00pm, Room: I627
jdwang@asia.edu.tw, Room:I517, ext:1847

AWS Educate Program
AWS Educate
AWS Certification Preparation
On-Line program
(1) Apply for an AWS Educate account (By Suca)
(Deadline : 2019/10/18, 12:00PM)


Reference Book
  • Python Data Science Handbook (2015) Jake VanderPlas
  • Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow, 2nd Edition (2017) Sebastian Raschka, Vahid Mirjalili

  • Python 從入門到精通:一個月就夠了! (Learn and Master Python in a Month)

    Related Topics of Text Mining
    Multi-Class Text Classification Model Comparison and Selection(From: Susan Li)
    Classify Toxic Online Comments with LSTM and GloVe(From: Susan Li)
    Python Data Science Getting Started Tutorial: NLTK(From: SWAYAM MITTAL)
    A Quick Introduction to Text Summarization in Machine Learning(From: Dr. Michael J. Garbade, 2018)



    Related Learning Programs or Tools for Text Mining:
    On-Line Learning Program (巨匠) (Register in advance to take the following courses)

    Grade
    Midterm Project (40%): (2019/10/28, presentation (PPT)) (2019/11/4, report (Word/PDF + shared YouTube URL) to Moodle)



    Final Project (40%): (2020/1/6, report (Word/PDF + shared YouTube URL) to Moodle)



    (English) Ministry of Education Intercollegiate AI CUP 2019 – Artificial Intelligence Analysis and Classification of Thesis
    (Chinese)教育部全國大專校院人工智慧競賽(AI CUP 2019)-人工智慧論文機器閱讀競賽之論文分類
    DataSet
    Public Leaderboard


    AU_Juwara (Asia University - Win)
    Team Members : Okto (蘇亞迪)(Project Leader), Prayitno, Aninda Astuti and K.M.AFAQ
    Advisor: Jing-Doo Wang
    AI_CUP_2019-_SrcData_jdwang2019_12_23.zip
    AI_CUP_2019_PythonPrograms_jdwang2019_12_23.zip

    Multiclass and multilabel algorithms (From: https://scikit-learn.org)
    Classifiers that support multilabel:
    sklearn.tree.DecisionTreeClassifier
    sklearn.tree.ExtraTreeClassifier
    sklearn.ensemble.ExtraTreesClassifier
    sklearn.neighbors.KNeighborsClassifier
    sklearn.neural_network.MLPClassifier
    sklearn.neighbors.RadiusNeighborsClassifier
    sklearn.ensemble.RandomForestClassifier
    sklearn.linear_model.RidgeClassifierCV
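The classifiers listed above accept a two-dimensional label indicator matrix directly, so no one-vs-rest wrapper is needed. The following is a minimal sketch, not the course's example code: the synthetic data from make_multilabel_classification, the split sizes, and the hyperparameters are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

# Synthetic multilabel data (assumption: stands in for real document vectors).
# Y is a dense 0/1 indicator matrix with one column per label.
X, Y = make_multilabel_classification(
    n_samples=200, n_features=20, n_classes=4, random_state=0
)
X_train, X_test = X[:150], X[150:]
Y_train, Y_test = Y[:150], Y[150:]

# Two of the listed classifiers; both take the indicator matrix as-is.
models = {
    "RandomForestClassifier": RandomForestClassifier(n_estimators=50, random_state=0),
    "KNeighborsClassifier": KNeighborsClassifier(n_neighbors=5),
}

predictions = {}
for name, model in models.items():
    model.fit(X_train, Y_train)
    predictions[name] = model.predict(X_test)  # shape: (n_samples, n_labels)
    print(name, "predicted label matrix shape:", predictions[name].shape)
```

Each row of the predicted matrix may contain several 1s, which is what distinguishes multilabel from multiclass output.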

    System Diagram
    Example codes (From: jdwang)(load_csv, LabelsTransform, ..., etc.)
    sklearn.preprocessing.MultiLabelBinarizer
    Multilabel classification
    Classification of text documents using sparse features
    Parameter estimation using grid search with cross-validation
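The items above (label binarization, multilabel classification on sparse text features, and grid search with cross-validation) can be combined in one small sketch. This is an illustrative assumption, not the example code by jdwang referenced above: the toy documents, the label names ("theory", "engineering", "empirical"), and the parameter grid are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy corpus; each document may carry more than one label.
docs = [
    "we prove a convergence bound for the algorithm",
    "we build a system and measure its throughput",
    "experiments on three datasets show improved accuracy",
    "a theoretical analysis with supporting experiments",
    "the system design is evaluated with experiments",
    "we derive a proof of the lower bound",
    "an implementation of the system on a cluster",
    "experiments confirm the proof of the bound",
]
label_sets = [
    ["theory"],
    ["engineering"],
    ["empirical"],
    ["theory", "empirical"],
    ["engineering", "empirical"],
    ["theory"],
    ["engineering"],
    ["theory", "empirical"],
]

# Turn the label sets into a 0/1 indicator matrix, one column per label.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(label_sets)

# Sparse TF-IDF features feeding a classifier that supports multilabel targets.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", KNeighborsClassifier()),
])

# Small, hypothetical grid; cv=2 only because the toy corpus is tiny.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__n_neighbors": [1, 3],
}
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="f1_micro")
search.fit(docs, Y)

# Map a predicted indicator row back to label names.
predicted = mlb.inverse_transform(search.predict(["a proof of convergence"]))
print("best parameters:", search.best_params_)
print("predicted labels:", predicted)
```

On a real dataset, a larger number of folds and a wider grid would be more appropriate; the values here only keep the sketch fast.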

    sklearn.feature_selection (python)


    Discussion Topics:
    Extra Features? (1-gram, bi-gram, tri-gram?) (Authors, subcategory) (Reference Network?)
    Feature Selection?
    Instance Vector Normalization
    Feature Combination (PCA, LDA)
    Source Text Position? (Title, Abstract)
    Classifier Parameter Tuning
    Classifiers Performance Comparison
    (Misclassified instance checking)
    MNIST_handwritten_digits_classification_ShowErrorPrediction_jdwang2019_12_1.html (Example for ShowErrorPrediction)
    Classifiers Combination?
    Model Construction
    Computation Pipeline Construction
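Several of the discussion topics above (n-gram features, instance vector normalization, feature combination, misclassified instance checking, and pipeline construction) can be tried together in one scikit-learn Pipeline. The sketch below is a starting point under stated assumptions: the toy documents and labels are hypothetical, and TruncatedSVD is swapped in for PCA because it works directly on sparse TF-IDF matrices.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Normalizer

# Hypothetical toy data standing in for titles or abstracts.
train_docs = [
    "proof of the theorem and lemma",
    "theorem proof with a tight bound",
    "a lemma about the lower bound",
    "experiments on datasets show accuracy",
    "accuracy measured in experiments on benchmarks",
    "benchmarks and datasets for experiments",
]
train_labels = ["theory", "theory", "theory", "empirical", "empirical", "empirical"]
test_docs = ["a proof of the bound", "experiments on new benchmarks"]
test_labels = ["theory", "empirical"]

pipeline = Pipeline([
    # Unigram and bigram features, L2-normalized per document.
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), norm="l2")),
    # Combine features into a few dense components (PCA-like, sparse-friendly).
    ("svd", TruncatedSVD(n_components=3, random_state=0)),
    # Re-normalize the instance vectors after the projection.
    ("norm", Normalizer()),
    ("clf", LogisticRegression()),
])
pipeline.fit(train_docs, train_labels)
predicted = pipeline.predict(test_docs)

# Misclassified instance checking: collect documents whose prediction is wrong.
misclassified = [
    (doc, true, pred)
    for doc, true, pred in zip(test_docs, test_labels, predicted)
    if true != pred
]
print("predictions:", list(predicted))
print("misclassified instances:", misclassified)
```

Because every step lives in one Pipeline object, the same structure can be passed to a grid search to compare n-gram ranges, component counts, or classifiers.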