Dr. Arzucan Özgür on “Discovering Molecular Interactions using Language Processing Techniques”

Dr. Arzucan Özgür will give a seminar on “Discovering Molecular Interactions using Language Processing Techniques” on 25 December at 4 pm. A short bio and the abstract of the talk are given below.


Bio:
Arzucan Özgür is an associate professor at the Computer Engineering Department of Boğaziçi University. She holds a Ph.D. degree in Computer Science and Engineering from the University of Michigan, and MS and BS degrees from the Department of Computer Engineering at Boğaziçi University. She is a recipient of the FP7 Marie Curie Career Integration Grant as well as The Science Academy Young Scientist Award (BAGEP 2016) and the Turkish Science Academy Young Scientist Award (TUBA-GEBIP 2019). She is the co-founder of the Text Analytics and Bioinformatics (TABI) Research Lab at Boğaziçi University and a member of the AILAB. Her research lies at the intersection of Bioinformatics and Natural Language Processing. Her recent focus has been on developing language processing techniques for information extraction and knowledge discovery from textual data available in natural language or in biology.

Abstract:
New discoveries are often disseminated through scientific publications. Because the scientific literature in the life sciences is huge and growing rapidly, most of the important information remains hidden in the unstructured text of published papers. Automatically extracting useful information with natural language processing techniques and presenting it to scientists in a structured format is vital for facilitating research in this domain. In the first part of her talk, Dr. Arzucan Özgür will describe Vapur, a search system that she and her team developed to enable easy access to protein-chemical compound relations in the COVID-19 related literature. Vapur’s pipeline includes sentence segmentation, named entity recognition and normalization, relation extraction, query correction, and similar molecule suggestion components. Users can search by protein or compound name; the results show the related molecules together with the sentences in the publications that describe these relations. Evaluations by domain experts indicated that Vapur may be useful for supporting drug and vaccine development studies in this area.

Beyond text written in natural language, Dr. Arzucan Özgür and her team hypothesize that molecules are also written in a kind of molecular language. For example, DNA can be considered as written in a language with an alphabet of four letters. Similarly, proteins and chemical compounds can also be represented in textual format. In the second part of her talk, she will describe ChemBoost, a method for predicting the binding affinities of protein-chemical compound interactions. ChemBoost treats molecules as documents made up of words and uses language processing methods to represent them. Since ChemBoost is based only on the textual representations of the molecules, it can be used even when their three-dimensional structures are not known. The team believes that language processing-based approaches can be effective in revealing the interactions between molecules and may shed light on new drug and vaccine development studies.