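University of California, San Diego

The University of California, San Diego (UC San Diego) is a well-known public university in the United States. Founded in 1960, it is a Tier I national university and part of the University of California system. The campus is located in La Jolla, a community north of the city of San Diego in Southern California, and at 866 hectares it is the largest of all the University of California campuses. Although UC San Diego was founded only a little over fifty years ago, it has become a top US public research university with a focus on scientific research and a very high academic reputation. It is also regarded as one of the "Public Ivies" and is a member of the Association of American Universities, an important academic association in the United States.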

Bioinformatics Engineering Research

1. Research Topics

Bioinformatics analysis of the epigenome

Motif analysis in genetics

Molecular modeling of protein structures

Statistical learning of genetic networks

Biophysical modeling of the epigenetic landscape

Biophysics

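2. Advisor Background

Department of Chemistry and Biochemistry

Department of Cellular and Molecular Medicine

University of California, San Diego

Postdoctoral researcher, Stanford University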

3. Sample Research Content

Motivation

Sequence analysis frequently requires an intuitive understanding and a convenient representation of motifs. Traditionally, motifs are stored as position weight matrices (PWMs) [1] and visualized as sequence logos [2]. However, PWMs are bulky and non-intuitive to read, and logos depend on graphical interface support. Here, we propose to represent motifs by wildcard-style consensus sequences, and we show how this conversion can be compact, informative and intuitive. Based on mutual information theory and the Jensen-Shannon divergence [3], we propose a mathematical framework to optimize the conversion, and we implement an efficient algorithm to perform it. We show how this representation can be a significant improvement over current alternatives [4]. In summary, we believe our package will find its niche in the textual representation of motifs, where visualization support is often lacking.
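To make the idea of a wildcard-style consensus concrete, here is a minimal Python sketch. It is not the project's algorithm: it assumes one simple rule, namely that each PWM column is assigned the IUPAC nucleotide code whose uniform distribution over its member bases has the smallest Jensen-Shannon divergence from the column. The function names and the example matrix are hypothetical and chosen only for illustration.

```python
import math

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASES = "ACGT"  # column order assumed for every PWM row below


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (symmetric, between 0 and 1)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def code_distribution(code):
    """Assumption: a code is modeled as uniform over the bases it allows."""
    members = IUPAC[code]
    return [1 / len(members) if base in members else 0.0 for base in BASES]


def column_to_code(column):
    """Pick the IUPAC code closest to one PWM column under JS divergence."""
    return min(IUPAC, key=lambda code: js_divergence(column, code_distribution(code)))


def pwm_to_consensus(pwm):
    """Convert a PWM (list of [pA, pC, pG, pT] columns) to a consensus string."""
    return "".join(column_to_code(column) for column in pwm)


# Hypothetical 4-position motif, probabilities in A, C, G, T order.
example_pwm = [
    [0.97, 0.01, 0.01, 0.01],  # almost always A
    [0.48, 0.02, 0.48, 0.02],  # A or G
    [0.25, 0.25, 0.25, 0.25],  # no preference
    [0.02, 0.02, 0.02, 0.94],  # almost always T
]
print(pwm_to_consensus(example_pwm))  # prints ARNT
```

The resulting string can be pasted into plain text or a log file, which is the setting where a graphical logo is unavailable. A real implementation would likely also need to weigh compactness against information loss, which this sketch does not attempt.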

Students will learn

-- How to design efficient algorithms and implement them in Python/R/Bash.
-- How to design and test statistical hypotheses, specifically using information theory.
-- What the principal bioinformatics tools for DNA sequence manipulation and motif analysis are.
-- How to visualize DNA sequences in a browser and with customized scripting tools.
-- How to frame biological problems so that they can be solved by applying machine learning algorithms, including logistic regression, random forests, SVMs, neural networks, and deep learning.
By the end of the summer, students should have a working knowledge of statistics and programming, along with domain knowledge in human genomics. A further aim of the project is to better prepare students for transitioning to, and applying for, PhD-level bioinformatics or master's-level data-science-related programs.

Prerequisites:
-- Programming: Students need to be able to successfully run "hello world" in Python/R/Bash on their own laptop.
-- Statistics: Students should be able to articulate what a binomial distribution is, preferably to a 5-year-old.
-- Biology: Students need to know that genomic information is encoded in DNA, and that DNA sequences are composed of the bases A, T, C and G.
