Central News Agency (CNA) news


=======================PerlCNAParseKgram_jdwang2010_3_30.zip===============================

==================================================================

  1. Check the ParsedData
  2. Check the ParsedData (Sentences)
  3. Check the ParsedData (Sentences Length)
  4. Statistics of Chinese characters (1-gram) (one sentence)
  5. Statistics of Chinese characters (k-mers, k-grams) (one news article)
  6. Statistics of Chinese characters (k-mers, k-grams) => output file
  7. Statistics of Chinese characters (k-mers, k-grams) => output file (+ tf distribution)
  8. Statistics of Chinese characters (k-mers, k-grams) => output file (tf_df)
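
The k-gram statistics in items 4 to 8 can be illustrated with a minimal sketch. It is written in Python, not taken from the Perl scripts in this package, and it assumes that tf means the total number of occurrences of a k-gram and that df means the number of news articles containing it. The function names and the toy articles are hypothetical.

```python
from collections import Counter


def kgrams(sentence, k):
    """Return every run of k consecutive characters in one sentence."""
    return [sentence[i:i + k] for i in range(len(sentence) - k + 1)]


def tf_df(articles, k):
    """Count tf and df for character k-grams.

    `articles` is a list of articles, each a list of sentences, so a
    k-gram never spans a sentence boundary.
    Assumption: tf = total occurrences, df = number of articles
    in which the k-gram appears at least once.
    """
    tf, df = Counter(), Counter()
    for sentences in articles:
        counts = Counter()
        for sentence in sentences:
            counts.update(kgrams(sentence, k))
        tf.update(counts)         # add this article's occurrence counts
        df.update(counts.keys())  # each distinct k-gram counts once per article
    return tf, df


# Hypothetical toy data: two articles, already split into sentences.
articles = [["中央社新聞", "新聞報導"], ["新聞"]]
tf, df = tf_df(articles, 2)
for gram, count in tf.most_common():
    print(gram, count, df[gram])
```

With k = 1 the same function gives the single-character (1-gram) counts of item 4.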

==================================================================

  1. perl Creat_MySQLTable_kgram_tf.pl (imports CNA_CharStatistic.txt and TF_DF_IDF_Statistic.txt)