At the Carleton University Biomedical Informatics Co-laboratory (CUBIC), we apply machine learning and data science to solve problems in biomedical informatics. Current projects requiring additional students include an exploration of the use of RGB-D video and pressure-sensitive mats for real-time patient monitoring in the NICU at CHEO (data collection ongoing), automated analysis of audiograms for telemedicine applications in under-served communities (with an industry partner), development of a wireless system to monitor neonates during emergency ground and air transport to the NICU at CHEO (collaboration already in place), and development of novel machine learning methods for analyzing protein structure, function, interaction, and chemical modification. Interested students should have strong software and communication skills proven through academic performance and/or industry experience. Hands-on experience with deep learning, artificial intelligence, web programming, and statistics is highly valued.
Species-specific Prediction of microRNA.
microRNA are short RNA molecules that play an important role in post-transcriptional gene regulation. Our collaborators are continuously sequencing new species and wish to identify novel microRNA within these new genomes. However, most widely-used microRNA prediction tools are only effective on human data. We have developed SMIRP, a framework for the creation of species-specific predictors of microRNA from genomic sequence. We have achieved increases of up to 500% in sensitivity at precisions of up to 90% when compared with existing methods. SMIRP has been applied to study numerous genomes including turtles, slime moulds, and a snail. We are now developing methods to leverage transcriptomic RNA-Seq data as it continues to become more accessible to experimental researchers.
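To make the "sensitivity at a given precision" measure concrete, the following is a minimal sketch of how such a figure could be computed from classifier scores. It is not part of SMIRP; the function name `sensitivity_at_precision` and the toy scores and labels are assumptions chosen purely for illustration.

```python
from typing import Sequence


def sensitivity_at_precision(
    scores: Sequence[float], labels: Sequence[int], min_precision: float
) -> float:
    """Highest sensitivity (recall) over all score thresholds whose
    precision is at least min_precision. Labels are 1 (true microRNA)
    or 0 (negative). Hypothetical helper, for illustration only."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")

    # Rank candidates from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)

    best = 0.0
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Candidates with tied scores fall on the same side of any threshold,
        # so count them together before evaluating precision.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        i = j

        precision = tp / (tp + fp)
        if precision >= min_precision:
            best = max(best, tp / total_pos)
    return best


# Toy data (made up): five candidate hairpins, three of them true microRNA.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 1, 0, 1, 0]
recall_at_90 = sensitivity_at_precision(scores, labels, 0.90)
recall_at_75 = sensitivity_at_precision(scores, labels, 0.75)
```

Comparing two predictors at the same precision level in this way shows how many more true microRNA one method recovers without admitting more false positives.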
Artifact detection in real-time patient monitoring.
In a second application, we are part of a research initiative to conduct real-time patient monitoring in intensive care settings. Currently, enormous quantities of data, such as blood oxygen saturation, ECG, respiration rate, and blood pressure, are measured continuously from patients. However, typically only periodic readings are recorded by a health care worker at wide intervals, since most hospitals lack the infrastructure required to store and analyze this data in real time. Research has shown that analyzing patient data such as heart rate on a continuous basis can detect the onset of serious illness such as sepsis hours before symptoms become evident. While the infrastructure and algorithms are currently being developed and deployed in hospitals around the world, artifacts in the data continue to be problematic, leading to loss of data and potential misdiagnosis. With our collaborators, we are developing a framework for real-time artifact detection that would enable the automated selection of optimal detection algorithms tailored for the specific clinical setting of each patient.
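As a rough illustration of what a single streaming artifact detector might look like, the sketch below flags samples that fall outside a plausible range or that jump sharply away from the median of recently accepted values. This is not the framework described above; the function name `flag_artifacts`, the range limits, the window size, and the jump threshold are all assumptions made for the example.

```python
from collections import deque
from statistics import median
from typing import Iterable, Iterator, Tuple


def flag_artifacts(
    samples: Iterable[float],
    low: float,
    high: float,
    window: int = 5,
    max_jump: float = 10.0,
) -> Iterator[Tuple[float, bool]]:
    """Yield (value, is_artifact) for each sample as it arrives.

    A sample is flagged if it lies outside [low, high] or differs from the
    median of the last `window` accepted samples by more than `max_jump`.
    All thresholds here are illustrative assumptions, not clinical values.
    """
    recent = deque(maxlen=window)  # most recent non-artifact values
    for value in samples:
        out_of_range = not (low <= value <= high)
        jump = bool(recent) and abs(value - median(recent)) > max_jump
        is_artifact = out_of_range or jump
        if not is_artifact:
            recent.append(value)  # only clean samples update the baseline
        yield value, is_artifact


# Made-up SpO2 readings (%): a dropped probe (0) and a sudden dip (80).
spo2 = [97, 98, 97, 0, 96, 80, 97]
flags = [is_bad for _, is_bad in flag_artifacts(spo2, low=50, high=100)]
```

Because only accepted samples update the baseline, a genuine and sustained change in the signal would keep being flagged by this simple rule, which is one reason a detector may need to be chosen and tuned for each clinical setting.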
Protein structure prediction.
Much like the shape of a tool suggests its intended purpose, knowledge of a protein's structure can provide substantial insight into its function. Therefore, computational prediction of protein structure based solely on protein sequence data is a challenge of fundamental importance to biomedical research. An effective solution promises significant advances in computational drug discovery and an increased understanding of complex disease processes such as cancer. We have recently developed a novel approach to determining the 1D secondary structure of proteins from protein sequence data that makes use of Parallel Cascade Identification (PCI), a powerful method of nonlinear system identification. We are currently working towards extending this method to the prediction of full 3D tertiary structure.
Post-translational modification.
While progress continues to be made on the prediction of structure from sequence, knowledge of a protein's structure may not be sufficient to discern its function. For example, most proteins undergo some form of post-translational modification (PTM) following initial synthesis, which may have a profound impact on protein function. Our lab is therefore working to develop intelligent predictors of important PTMs such as sumoylation and phosphorylation. Iterative prediction of protein function and structure is a long-term goal as well.
Information-driven mass spectrometry.
Tandem mass spectrometry (MS/MS) is an analytical technique for identifying proteins from an unknown mixture, and has become a cornerstone of modern proteomics. Currently, protein identification takes place offline, after the data collection phase, when it is too late to take corrective action in the case of an ambiguous identification. By sufficiently accelerating the data analysis, it becomes possible to close the feedback loop and achieve true information-driven data collection. Our research program aims to leverage advanced parallel processing approaches and architectures (GPU, multicore) to accelerate protein identification and so enable real-time control of an MS/MS device. By simultaneously collecting and analyzing data on a tandem mass spectrometer, new forms of data analysis become possible, including more effective identification of low-abundance biomarkers.
Bioinformatics web services.
Please click here for a list of web services developed by our lab.