PubMed Articles Information Extraction

(Information Extraction using Perl)

How to handle multiple files within one directory?
- ../SrcData_OrderByPubDate
- program3_jdwang2014_10_13.7z
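Here is a minimal Perl sketch of looping over every file in one directory, such as ../SrcData_OrderByPubDate, using only core Perl (opendir, readdir, File::Spec). The sub names list_files and process_file are hypothetical. The line counting in process_file is only a placeholder for whatever parsing program3 actually does.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full paths of the plain files directly
# inside $dir (sub-directories are skipped), sorted by name.
sub list_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @files = sort { $a cmp $b }
                grep { -f $_ }                          # keep plain files only
                map  { File::Spec->catfile($dir, $_) }  # entry name -> full path
                readdir($dh);
    closedir($dh);
    return @files;
}

# Hypothetical per-file handler: a placeholder that just counts lines.
sub process_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my $lines = 0;
    $lines++ while <$fh>;
    close($fh);
    return $lines;
}
```

Usage: for my $f (list_files('../SrcData_OrderByPubDate')) { print "$f: ", process_file($f), "\n" }. Sorting the names keeps the processing order reproducible, because readdir returns entries in no guaranteed order.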
How to remove duplicate PMIDs?
- program4_jdwang2014_10_20.7z
  - using a Perl hash (%Hash)
- program5_jdwang2014_10_20.7z
  - using a MySQL database: install XAMPP (Apache + MySQL + PHP + Perl)
    - create database pubmed_2014_10_20;
    - create user pubmed@localhost identified by 'pubmed123';
    - grant all on pubmed_2014_10_20.* to pubmed@localhost;
  - install the Perl module DBD::mysql
    - >ppm install DBD::mysql
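The %Hash approach can be sketched as follows. For illustration only, it assumes the articles are in PubMed's MEDLINE text format, where each record carries a line like "PMID- 25311111". The names extract_pmids and unique_pmids are hypothetical. A PMID is kept the first time it appears, and the %seen hash skips every later repeat.

```perl
use strict;
use warnings;

# Hypothetical helper. Assumption: articles are in MEDLINE text format,
# where each record's PubMed ID sits on a line such as "PMID- 25311111".
# Returns every PMID found, in order, duplicates included.
sub extract_pmids {
    my ($text) = @_;
    return $text =~ /^PMID-\s*(\d+)/mg;
}

# Hypothetical helper: keep only the first occurrence of each PMID.
# %seen counts how often an ID has been met; the test is true only
# the first time (when the count is still 0).
sub unique_pmids {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}
```

With the MySQL route, you could get a similar effect by making PMID the table's PRIMARY KEY and inserting with INSERT IGNORE, so that repeated PMIDs are skipped. That table design is an assumption, not taken from program5.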
How to handle multiple directories, each containing multiple files?
- program6_jdwang2014_10_20.7z
- SrcData_ByYearDir.7z
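For a tree such as SrcData_ByYearDir, where each sub-directory holds many files, the core File::Find module can handle the recursion. This is a sketch: the one-directory-per-year layout and the sub names list_files_recursive and files_by_dir are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Find;
use File::Basename qw(dirname);

# Hypothetical helper: collect every plain file below $root at any depth
# (for example one sub-directory per year), returned in sorted order.
sub list_files_recursive {
    my ($root) = @_;
    my @files;
    # With no_chdir => 1, $_ holds the full path, same as $File::Find::name.
    find({ wanted => sub { push @files, $File::Find::name if -f $_ },
           no_chdir => 1 }, $root);
    return sort @files;
}

# Hypothetical helper: group those files by the directory that holds them.
sub files_by_dir {
    my ($root) = @_;
    my %by_dir;
    push @{ $by_dir{ dirname($_) } }, $_ for list_files_recursive($root);
    return \%by_dir;
}
```

Each key of the returned hash is one directory (for example one year), so a per-directory loop can process each directory's files together.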
Extract information (year, month, day?) from the parsed data
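One way to pull the year, month and day out of a record is sketched below. It assumes, without confirmation from the source, that the parsed data keeps PubMed's MEDLINE "DP" (Date of Publication) field, e.g. "DP  - 2014 Oct 13". Month and day are often missing, so they come back as undef. The names parse_pub_date and %MONTH_NUM are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical lookup table from MEDLINE month abbreviations to numbers.
my %MONTH_NUM = (Jan => 1, Feb => 2, Mar => 3, Apr => 4,  May => 5,  Jun => 6,
                 Jul => 7, Aug => 8, Sep => 9, Oct => 10, Nov => 11, Dec => 12);

# Hypothetical parser. Assumption: the date is a MEDLINE "DP" line such as
# "DP  - 2014 Oct 13"; month and day may be absent ("DP  - 2014").
# Returns (year, month, day) with undef for missing parts, or an empty
# list when the line is not a DP line.
sub parse_pub_date {
    my ($line) = @_;
    my ($year, $mon, $day) =
        $line =~ /^DP\s*-\s*(\d{4})(?:\s+([A-Za-z]{3}))?(?:\s+(\d{1,2}))?/
        or return;
    my $month = defined $mon ? $MONTH_NUM{ ucfirst lc $mon } : undef;
    return ($year, $month, $day);
}
```

Seasons such as "2014 Spring" and ranges such as "2014 Oct-Dec" are only partly handled. The year is kept, and the month is either undef or the first month of the range.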
Mine the words and their TF (term frequency) from the parsed data
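Here is a minimal sketch of counting term frequencies in one document's text. The tokenization rule is an assumption: lower-case the text, split on anything other than letters, digits and hyphens, then drop tokens with no letter or digit. Stop-word removal and stemming are not shown. The name term_frequencies is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: raw term frequency (count) of each word in one text.
# Tokenization is an assumption: lower-case everything, split on any run of
# characters other than a-z, 0-9 and '-', and drop tokens with no letter
# or digit (such as a lone "-").
sub term_frequencies {
    my ($text) = @_;
    my %tf;
    $tf{$_}++ for grep { /[a-z0-9]/ } split /[^a-z0-9-]+/, lc $text;
    return \%tf;
}
```

The result is a hash reference mapping each word to its count in that document. Dividing by the total number of tokens gives a length-normalized TF, if that variant is wanted.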

Compute the weight of each word as TF*IDF (term frequency–inverse document frequency).
- program9_jdwang2014_10_27.7z
- SrcData_Topic_WilsonDisease_Statistics.7z
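The TF*IDF weight can be sketched with one common variant, tfidf(t, d) = tf(t, d) × log(N / df(t)). Here N is the number of documents and df(t) is the number of documents containing term t. The source does not say which IDF variant program9 uses, and smoothing and log base differ between tools, so treat this choice as an assumption. The name tf_idf is hypothetical; it takes one term-count hash per document.

```perl
use strict;
use warnings;

# Hypothetical helper: TF*IDF for every term in every document.
# Input: array ref of hash refs, one per document, mapping term => raw count.
# IDF variant (an assumption): idf(t) = log(N / df(t)), natural log,
# where N is the number of documents and df(t) is how many contain t.
sub tf_idf {
    my ($docs) = @_;
    my $n = scalar @$docs;

    # Document frequency: in how many documents each term occurs.
    my %df;
    for my $tf (@$docs) {
        $df{$_}++ for keys %$tf;
    }

    # Weight = raw count * idf, computed per document.
    my @weights;
    for my $tf (@$docs) {
        my %w;
        for my $term (keys %$tf) {
            $w{$term} = $tf->{$term} * log($n / $df{$term});
        }
        push @weights, \%w;
    }
    return \@weights;
}
```

A term that appears in every document gets weight 0. This is what pushes common words down the ranking.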
PubMed Significant Pattern History
Wordle - Beautiful Word Clouds
- program10_jdwang2014_10_27.7z
  - InstancesVectorTransform
- SVM Classifiers
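InstancesVectorTransform may turn each article into a feature vector for the SVM classifiers. If so, one common target is the sparse text format read by LIBSVM and SVMlight: one line per instance, "<label> <index>:<value> ...", with 1-based indices in ascending order. This mapping is an assumption about program10, not taken from it. The names build_vocabulary and to_libsvm_line are hypothetical, and the weights could be, for example, TF*IDF values.

```perl
use strict;
use warnings;

# Hypothetical helper: give every distinct term across all documents a
# 1-based feature index, assigned in sorted order so it is reproducible.
sub build_vocabulary {
    my ($docs) = @_;              # array ref of { term => weight } hashes
    my %seen = map { ($_ => 1) } map { keys %$_ } @$docs;
    my $i = 0;
    my %index = map { ($_ => ++$i) } sort keys %seen;
    return \%index;
}

# Hypothetical helper: one document's weights as a LIBSVM-style line,
# "<label> <index>:<value> ...", indices ascending, zero weights dropped.
sub to_libsvm_line {
    my ($label, $weights, $index) = @_;
    my @pairs = sort { $a->[0] <=> $b->[0] }
                map  { [ $index->{$_}, $weights->{$_} ] }
                grep { exists $index->{$_} && $weights->{$_} != 0 }
                keys %$weights;
    return join ' ', $label, map { "$_->[0]:$_->[1]" } @pairs;
}
```

Building the index once over all documents keeps the same term at the same feature position in every line, which the classifier requires.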