Biomarker discovery using bioinformatics methods

Islam, Md Tawhidul

doi:10.25949/19440575.v1

thesis

posted on 2022-03-28, 21:46 authored by Md Tawhidul Islam

A biomarker is a biochemical indicator of a biological state that can signal or predict a disease. Biomarkers are used to measure the presence, risk, progression or treatment response of a disease rather than the disease itself. Biomarkers also provide a basis for selecting lead candidates for clinical trials. Scientists have been searching for biomarkers for decades, and discovery methods have evolved as new technologies have emerged. Advances in genomics and proteomics have made it easier to interrogate hundreds or thousands of potential markers at a time, producing unprecedented growth in the volume of new data in biomarker research, drug discovery and patient care. However, the success of such work depends heavily on prior knowledge of, and experience with, the potential markers of interest. The diverse data generated by high-throughput biotechnology is an ideal starting point for gaining knowledge in systems bioinformatics, but this information is only useful if it is easily accessible. Most of it is presented as free text, which is not readily amenable to automated computational analysis. In this thesis we present a novel knowledge aggregation approach based on statistical methods, user-defined structural rules, machine learning, text mining and Natural Language Processing (NLP) techniques to automatically extract biomarker-related information from the scientific literature. The approach combines two major tasks: Information Extraction and Relationship Extraction. Accordingly, the thesis first presents mExtract, an automatic information retrieval, summarization and extraction tool. Built on statistical and pattern-matching NLP techniques, this intelligent agent system retrieves the most relevant documents from the web in response to user queries. Once the documents are retrieved, mExtract extracts biomarker-specific information (i.e., protein, gene, genome, disease) by identifying the focal topic of each document and its most relevant properties, and it also generates a summary of that topic. Second, we present an extended system, the Biomarker Information Extraction Tool (BIET), which extracts biomarker relationships among diseases, genes and proteins. Given a set of oncology-related texts (i.e., abstracts), BIET extracts the relationship "is biomarker of (disease, gene/protein)". Built on state-of-the-art statistical models and machine learning techniques, BIET consists of three major components: Semantic Category Recognition, which identifies evaluative sentences by recognizing words and phrases that belong to semantic categories of biomedical interest; Assertion Classification, which determines whether a statement refers to a relationship among biomarker entities (protein, gene and disease); and Semantic Relationship Classification, which identifies the biomarker relationship among the biomedical entities. The applications presented in this thesis demonstrate that the knowledge aggregation approach is practical and effective: it uses a series of statistical models that rely heavily on local lexical and syntactic context, yet achieves results competitive with more complex NLP solutions. It is also versatile, being easily extendable to similar or more complex relation extraction tasks, and it represents an important contribution to bioinformatics and to the biomedical research fields in which it is applied.
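How BIET's three components chain together can be illustrated with a deliberately simplified Python sketch. This is not the thesis implementation: BIET uses statistical models and machine learning, whereas this sketch substitutes hypothetical hand-written lexicons (`GENE_PROTEIN`, `DISEASE`) and cue phrases (`BIOMARKER_CUES`) for each trained component. All function names and the `"is_biomarker_of"` triple encoding are illustrative assumptions. The sketch shows only the flow: entity recognition narrows the candidates, assertion classification filters sentences, and relationship classification emits "is biomarker of (disease, gene/protein)" pairs.

```python
import re
from itertools import product

# Hypothetical toy lexicons standing in for Semantic Category Recognition;
# BIET's real semantic categories and resources are not reproduced here.
GENE_PROTEIN = {"HER2", "PSA", "CA-125", "BRCA1"}
DISEASE = {"breast cancer", "prostate cancer", "ovarian cancer"}

# Hypothetical cue phrases standing in for a trained assertion classifier.
BIOMARKER_CUES = ("biomarker", "marker", "prognostic", "diagnostic")


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def recognize_categories(sentence):
    """Stage 1 (rule-based stand-in): find gene/protein and disease mentions."""
    lowered = sentence.lower()
    genes = [g for g in GENE_PROTEIN
             if re.search(r"\b" + re.escape(g) + r"\b", sentence)]
    diseases = [d for d in DISEASE if d in lowered]
    return sorted(genes), sorted(diseases)


def is_biomarker_assertion(sentence):
    """Stage 2 (rule-based stand-in): does the sentence assert a biomarker link?"""
    lowered = sentence.lower()
    return any(cue in lowered for cue in BIOMARKER_CUES)


def extract_relations(text):
    """Stage 3 (rule-based stand-in): pair every gene/protein with every
    disease in an asserting sentence, as ("is_biomarker_of", disease, gene)."""
    relations = []
    for sentence in split_sentences(text):
        genes, diseases = recognize_categories(sentence)
        if genes and diseases and is_biomarker_assertion(sentence):
            for disease, gene in product(diseases, genes):
                relations.append(("is_biomarker_of", disease, gene))
    return relations


if __name__ == "__main__":
    abstract = (
        "Serum PSA is a widely used biomarker of prostate cancer. "
        "HER2 amplification was observed in breast cancer cell lines."
    )
    for rel in extract_relations(abstract):
        print(rel)
```

On the two-sentence example only the first sentence yields a relation, `("is_biomarker_of", "prostate cancer", "PSA")`. The second sentence mentions HER2 and breast cancer but contains no biomarker cue, so the assertion stage discards it. In a system like BIET, trained classifiers would take the place of these rules.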

Notes

"A thesis submitted in fulfilment of the requirements for the degree of Master of Philosophy". Includes bibliographical references. "March 2010". Thesis by publication.

Awarding Institution

Macquarie University

Degree Type

Thesis MPhil

Degree

MPhil, Macquarie University, Faculty of Science, Department of Chemistry and Biomolecular Sciences

Department, Centre or School

Department of Chemistry and Biomolecular Sciences

Year of Award

2010

Principal Supervisor

Shoba Ranganathan

Rights

Copyright Md Tawhidul Islam 2010. Copyright disclaimer: http://mq.edu.au/library/copyright

Language

English

Extent

1 online resource (xi, 47 leaves)

Former Identifiers

mq:71896; http://hdl.handle.net/1959.14/1279246

Keywords

text mining, Biochemical markers, machine learning, biomarker, natural language processing, bioinformatics

Licence

In Copyright
