PubMed Articles Information Extraction
(Information Extraction using Perl)
ActivePerl Installation: (Local) ActivePerl-5.16.3.1604-MSWin32-x64-298023.msi (64-bit), ActivePerl-5.16.3.1604-MSWin32-x86-298023 (x86)
How to handle one file with multiple articles?
../SrcData/pubmed_result_WilsonDisease_jdwang2014_10_9.xml (do not click this link; download the file directly)
PMID25276143 (Normal)
PMID25112974 (No abstract)
PMID25288051 (<AbstractText NlmCategory="UNLABELLED">)
PMID25291347 (<AbstractText Label="GOALS:" NlmCategory="UNASSIGNED">)
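The four cases above (normal abstract, no abstract, and AbstractText elements carrying NlmCategory or Label attributes) can all be handled by one loop over the <PubmedArticle> records. The following is a minimal regex-based sketch, not a full XML parser; the sub name parse_articles and the made-up PMIDs 111, 222, and 333 in the sample are purely illustrative.

```perl
use strict;
use warnings;

# Split a PubMed XML export into one record per <PubmedArticle> and pull out
# the PMID and the abstract text. Regex-based sketch; a real XML parser
# would be more robust against unusual markup.
sub parse_articles {
    my ($xml) = @_;
    my @articles;
    while ($xml =~ m{<PubmedArticle\b[^>]*>(.*?)</PubmedArticle>}gs) {
        my $record = $1;
        # The first <PMID> inside a record is taken as the article's own ID.
        my ($pmid) = $record =~ m{<PMID\b[^>]*>(\d+)</PMID>};
        next unless defined $pmid;
        # Collect every <AbstractText>, with or without attributes such as
        # Label="GOALS:" or NlmCategory="UNLABELLED".
        my @parts;
        while ($record =~ m{<AbstractText\b([^>]*)>(.*?)</AbstractText>}gs) {
            my ($attrs, $text) = ($1, $2);
            my ($label) = $attrs =~ m{\bLabel="([^"]*)"};
            push @parts, (defined $label && length $label) ? "$label $text" : $text;
        }
        # An article without an abstract ends up with an empty string.
        push @articles, { pmid => $pmid, abstract => join(' ', @parts) };
    }
    return @articles;
}

# Illustrative sample with made-up PMIDs (not taken from the real data file).
my $sample_xml = <<'XML';
<PubmedArticleSet>
<PubmedArticle><MedlineCitation><PMID Version="1">111</PMID>
<Abstract><AbstractText>Plain abstract.</AbstractText></Abstract>
</MedlineCitation></PubmedArticle>
<PubmedArticle><MedlineCitation><PMID Version="1">222</PMID>
</MedlineCitation></PubmedArticle>
<PubmedArticle><MedlineCitation><PMID Version="1">333</PMID>
<Abstract><AbstractText Label="GOALS:" NlmCategory="UNASSIGNED">Test goals.</AbstractText></Abstract>
</MedlineCitation></PubmedArticle>
</PubmedArticleSet>
XML

for my $art (parse_articles($sample_xml)) {
    my $shown = length($art->{abstract}) ? $art->{abstract} : '(no abstract)';
    print "$art->{pmid}: $shown\n";
}
```

Prefixing the Label value to its text keeps section names such as "GOALS:" in the extracted abstract; whether that is desirable depends on the later word counting.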
How to handle multiple files within one directory?
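One possible way to process every file in a directory is to list the .xml files with opendir/readdir and read each one into a string. This is only a sketch: the sub names xml_files_in and slurp are illustrative, and the default directory ../SrcData is simply assumed from the path given earlier.

```perl
use strict;
use warnings;

# Return the sorted paths of all .xml files directly inside $dir.
sub xml_files_in {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @names = grep { /\.xml$/i && -f "$dir/$_" } readdir $dh;
    closedir $dh;
    my @paths = sort map { "$dir/$_" } @names;
    return @paths;
}

# Read a whole file into one string so it can be handed to a parser.
sub slurp {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    local $/;    # no record separator: read everything at once
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Directory from the command line, or the assumed default.
my $src_dir = @ARGV ? $ARGV[0] : '../SrcData';
if (-d $src_dir) {
    for my $file (xml_files_in($src_dir)) {
        printf "%s (%d characters)\n", $file, length(slurp($file));
    }
}
```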
How to remove duplicate PMIDs?
(%Hash)
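The hash approach can be sketched as follows: each PMID is used as a hash key, and a record is kept only the first time its key is seen. The record layout (a hash reference with a pmid field) and the sub name unique_by_pmid are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Keep only the first record seen for each PMID, preserving input order.
sub unique_by_pmid {
    my @records = @_;
    my %seen;    # PMID => number of times encountered so far
    my @unique = grep { !$seen{ $_->{pmid} }++ } @records;
    return @unique;
}

# Illustrative records with made-up PMIDs.
my @all_records = (
    { pmid => '111', abstract => 'first copy' },
    { pmid => '222', abstract => 'another article' },
    { pmid => '111', abstract => 'second copy' },
);
print "$_->{pmid}: $_->{abstract}\n" for unique_by_pmid(@all_records);
```

The same articles often appear in several search-result files, so this check is most useful after all files have been parsed into one list.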
(MySQL database) Install XAMPP (Apache + MySQL + PHP + Perl)
create database pubmed_2014_10_20;
create user pubmed@localhost identified by "pubmed123";
grant all on pubmed_2014_10_20.* to pubmed@localhost;
Install the Perl module DBD::mysql
>ppm install DBD::mysql
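Once DBD::mysql is installed, the database created above could be reached from Perl through DBI. The sketch below assumes the database name, user, and password from the SQL statements above; the table name article, its two columns, and the sub names are hypothetical. Duplicate PMIDs are skipped here by the primary key together with INSERT IGNORE, as an alternative to the hash approach.

```perl
use strict;
use warnings;

# Connection settings taken from the SQL statements above.
my %db = (
    name => 'pubmed_2014_10_20',
    host => 'localhost',
    user => 'pubmed',
    pass => 'pubmed123',
);

# Build the DBI data source name for DBD::mysql.
sub build_dsn {
    my (%cfg) = @_;
    return "DBI:mysql:database=$cfg{name};host=$cfg{host}";
}

# Store records in a hypothetical `article` table. INSERT IGNORE plus the
# primary key on pmid skips PMIDs that are already present.
sub store_articles {
    my ($cfg, @articles) = @_;
    require DBI;    # loaded only when needed, so the file compiles without it
    my $dbh = DBI->connect(build_dsn(%$cfg), $cfg->{user}, $cfg->{pass},
        { RaiseError => 1, PrintError => 0 });
    $dbh->do(q{
        CREATE TABLE IF NOT EXISTS article (
            pmid     INT UNSIGNED NOT NULL PRIMARY KEY,
            abstract TEXT
        )
    });
    my $sth = $dbh->prepare(
        'INSERT IGNORE INTO article (pmid, abstract) VALUES (?, ?)');
    $sth->execute($_->{pmid}, $_->{abstract}) for @articles;
    $dbh->disconnect;
    return scalar @articles;
}

print build_dsn(%db), "\n";
```

A call such as store_articles(\%db, @articles) would then write the parsed records, provided the MySQL server from XAMPP is running.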
How to handle multiple directories, each containing multiple files?
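For nested directories, the core module File::Find can walk the whole tree and collect every .xml file. A minimal sketch, assuming the files are recognised by their extension; the sub name xml_files_under is illustrative.

```perl
use strict;
use warnings;
use File::Find;

# Return the sorted paths of all .xml files found anywhere below $root.
sub xml_files_under {
    my ($root) = @_;
    my @files;
    # Inside the callback, $_ is the bare file name and
    # $File::Find::name is the path including the directories.
    find(sub { push @files, $File::Find::name if -f $_ && /\.xml$/i }, $root);
    @files = sort @files;
    return @files;
}

# Walk the directory given on the command line, if any.
if (@ARGV && -d $ARGV[0]) {
    print "$_\n" for xml_files_under($ARGV[0]);
}
```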
Extract information from the parsed data (Year, Month, Day?)
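The question mark after the day reflects that not every record carries all three fields. The sketch below assumes the date sits in a <PubDate> element with <Year>, <Month>, and <Day> children, and leaves a field undefined when its element is absent. Some records are understood to use a free-text <MedlineDate> instead, which this sketch does not handle. The sub name extract_pub_date is illustrative.

```perl
use strict;
use warnings;

# Extract year, month, and day from the <PubDate> of one article record.
# Returns a hash reference, or nothing when there is no <PubDate>.
sub extract_pub_date {
    my ($record) = @_;
    my ($pubdate) = $record =~ m{<PubDate>(.*?)</PubDate>}s;
    return unless defined $pubdate;
    my %date;
    for my $field (qw(Year Month Day)) {
        my ($value) = $pubdate =~ m{<$field>\s*(.*?)\s*</$field>}s;
        $date{ lc $field } = $value;    # stays undef when the element is absent
    }
    return \%date;
}

# Illustrative record without a <Day> element.
my $date = extract_pub_date(
    '<PubDate><Year>2014</Year><Month>Oct</Month></PubDate>');
for my $field (qw(year month day)) {
    my $shown = defined $date->{$field} ? $date->{$field} : '(missing)';
    print "$field: $shown\n";
}
```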
Mine the words and their TF (Term Frequency) from the parsed data
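A simple way to obtain term frequencies is to lower-case the text, split it on anything that is not a letter or digit, and count each word in a hash. This sketch takes TF to be the raw count within one document, which is an assumption; other definitions normalise by document length. The sub name term_frequency is illustrative.

```perl
use strict;
use warnings;

# Count how often each word occurs in one text (raw term frequency).
sub term_frequency {
    my ($text) = @_;
    my %tf;
    # Lower-case, split on runs of non-alphanumeric characters, and drop
    # the empty string that appears when the text starts with punctuation.
    $tf{$_}++ for grep { length } split /[^a-z0-9]+/, lc $text;
    return \%tf;
}

my $tf = term_frequency('Copper and copper; Wilson disease.');
print "$_: $tf->{$_}\n" for sort keys %$tf;
```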
Question: how to compute the DF (Document Frequency) of each word?
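One possible answer: the DF of a word is the number of documents in which it occurs at least once, so it can be derived from the per-document TF hashes by counting each word once per document. The sketch assumes each document is represented by a hash reference mapping words to counts; the sub name document_frequency is illustrative.

```perl
use strict;
use warnings;

# Given one TF hash per document, count in how many documents each word occurs.
sub document_frequency {
    my @tf_per_doc = @_;
    my %df;
    for my $tf (@tf_per_doc) {
        # The keys are distinct, so a document adds at most 1 per word,
        # no matter how often the word is repeated inside it.
        $df{$_}++ for keys %$tf;
    }
    return \%df;
}

# Illustrative TF hashes for two documents.
my $df = document_frequency(
    { copper => 2, liver => 1 },
    { liver  => 3, gene  => 1 },
);
print "$_: $df->{$_}\n" for sort keys %$df;
```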
Compute the weight (TF*IDF, term frequency–inverse document frequency) of each word.
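A common form of the weight is TF * log(N / DF), where N is the total number of documents. The sketch below uses this form with the natural logarithm; both choices are assumptions, since several TF*IDF variants exist. The sub name tf_idf is illustrative.

```perl
use strict;
use warnings;

# Weight each word of one document as TF * log(N / DF).
#   $tf     - hash ref: word => count in this document
#   $df     - hash ref: word => number of documents containing the word
#   $n_docs - total number of documents
sub tf_idf {
    my ($tf, $df, $n_docs) = @_;
    my %weight;
    for my $word (keys %$tf) {
        next unless $df->{$word};    # skip words with no recorded DF
        $weight{$word} = $tf->{$word} * log($n_docs / $df->{$word});
    }
    return \%weight;
}

# Illustrative counts: "liver" occurs in both documents, "copper" in one.
my $weights = tf_idf({ copper => 2, liver => 1 }, { copper => 1, liver => 2 }, 2);
printf "%s: %.3f\n", $_, $weights->{$_} for sort keys %$weights;
```

A word that occurs in every document gets weight 0 under this form, because log(N / DF) is log(1).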