Keywords: gene, protein entities
European Journal of Molecular & Clinical Medicine,
2020, Volume 7, Issue 3, Pages 4309-4322
As biomedical databases grow day by day, finding an essential feature set for classification becomes difficult because of large data size and sparsity. Text feature ranking and clustering is a major challenge for scientific and medical researchers because the feature space is high-dimensional and the number of samples is limited. In biomedical document clustering, this high dimensionality is a particular problem because of the large number of candidate sets. Selecting high-probability features is therefore essential for biomedical document analysis tasks such as classification and clustering. In this paper, a novel probabilistic key-phrase extraction and preprocessing model is designed and implemented on a large number of biomedical documents. Within this framework, the key-phrase extraction method is used to filter large biomedical document sets. Experimental results show that the proposed approach outperforms existing key-phrase extraction approaches in terms of runtime and accuracy.