Statistical multivariate methods of analysis of data from biological texts and ontologies
Abstract
The research applies statistical classification and clustering methods to the text mining of biological texts. For classification, Linear Discriminant Analysis (LDA), Support Vector Machines (SVM) and Multinomial Logistic Regression (MLR) were used, and LDA performed best. Nonlinear Canonical Correlation Analysis (NLCCA) was also used to describe the information contained in the words of the texts, their Gene Ontology (GO) annotations and their Medical Subject Headings (MeSH) in a single dataset with a reduced number of variables. Clustering was based on a stochastic algorithm, Markov Clustering (MCL), and the results were presented to the end user in a 2D or 3D environment.
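The MCL step mentioned in the abstract can be illustrated with a minimal NumPy sketch of the standard MCL iteration: expansion (matrix power), inflation (elementwise power) and column re-normalisation, repeated until the matrix stops changing. It assumes that texts or terms have already been turned into a similarity graph given as a square, symmetric adjacency matrix. The function name `markov_cluster` and its default parameter values are illustrative assumptions, not the implementation used in the thesis.

```python
import numpy as np


def markov_cluster(adjacency, expansion=2, inflation=2.0,
                   max_iter=100, tol=1e-6, threshold=1e-5):
    """Sketch of Markov Clustering (MCL) on a graph given as an adjacency matrix.

    Hypothetical helper for illustration; parameter defaults are assumptions.
    """
    m = np.asarray(adjacency, dtype=float)
    # Add self-loops (a common MCL preprocessing step); this also guarantees
    # that no column sums to zero, so the normalisation below is safe.
    m = m + np.eye(m.shape[0])
    # Make the matrix column-stochastic: each column is a probability distribution.
    m = m / m.sum(axis=0, keepdims=True)

    for _ in range(max_iter):
        prev = m
        # Expansion: random walks of length `expansion` spread probability mass.
        m = np.linalg.matrix_power(m, expansion)
        # Inflation: strengthen strong transitions, weaken weak ones,
        # then re-normalise the columns.
        m = m ** inflation
        m = m / m.sum(axis=0, keepdims=True)
        # Stop once the matrix has (numerically) converged.
        if np.abs(m - prev).max() < tol:
            break

    # In the converged matrix, each non-empty (attractor) row marks the
    # columns (nodes) that belong to the same cluster.
    clusters = []
    for row in m:
        members = tuple(int(i) for i in np.nonzero(row > threshold)[0])
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

For example, a graph made of two separate triangles, nodes 0-2 and nodes 3-5, yields the two clusters `(0, 1, 2)` and `(3, 4, 5)`. The `inflation` parameter controls granularity: larger values tend to produce more, smaller clusters.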
All items in the National Archive of PhD Theses are protected by copyright.