Lectures - Ming Chen's Group of Bioinformatics!

Lecture 1
- Course Introduction
- Data mining is the process of semi-automatically analyzing large databases to find useful patterns
- Grouping
- Reading: Machine learning for bioinformatics and neuroimaging (pdf)
- Online course: Data Mining in Bioinformatics (link)
- Types of Data Mining Tasks
  - Prediction based on past history (mechanism examples)
    - Classification
      - Items (with associated attributes) belong to one of several classes
      - Training instances have attribute values and classes provided
      - Given a new item whose class is unknown, predict to which class it belongs based on its attribute values
    - Regression
      - Given a set of mappings for an unknown function, predict the function result for a new parameter value
      - Regression deals with the prediction of a value, rather than a class.
  - Descriptive Patterns
    - Associations
      - Find books that are often bought by “similar” customers. If a new such customer buys one such book, suggest the others too.
      - Associations may be used as a first step in detecting causation. E.g., association between exposure to chemical X and cancer
      - An association rule must have an associated population; the population consists of a set of instances. Rules have an associated support, as well as an associated confidence.
    - Clusters
      - Detection of clusters remains important in detecting epidemics. E.g., typhoid cases were clustered in an area surrounding a contaminated well
      - Clustering: Intuitively, finding clusters of points in the given data such that similar points lie in the same cluster.
  - Classification Rules
    - Decision Tree Classifiers
      - Each internal node of the tree partitions the data into groups based on a partitioning attribute, and a partitioning condition for the node
    - Bayesian Classifiers, Naïve Bayesian Classifiers
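      - Naïve Bayesian classifiers assume attributes have independent distributions, and thereby estimate p(d | c_j) = p(d_1 | c_j) × p(d_2 | c_j) × ... × p(d_n | c_j), where d_1, ..., d_n are the attribute values of instance d and c_j is a class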
    - Support Vector Machine Classifiers
      - SVMs can be used to find separators that are curves, not necessarily linear, by transforming the points before classification
    - Neural Network Classifiers
      - A neural network has multiple layers. For classification, each output value indicates the likelihood of the input instance belonging to the corresponding class
      - Deep neural networks have a large number of layers with a large number of nodes in each layer
      - Deep learning refers to the training of deep neural networks on very large numbers of training instances
- Other Types of Mining
  - Text mining: application of data mining to textual documents
  - Sentiment analysis: E.g., learn to predict if a user review is positive or negative about a product
  - Information extraction: Create structured information from unstructured textual description or semi-structured data such as tabular displays
  - Entity recognition and disambiguation: E.g., given text with the name “Michael Jordan”, does the name refer to the famous basketball player or the famous ML expert?
  - Knowledge graph: Can be constructed by information extraction from different sources, such as Wikipedia and PubMed.
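- Deep Learning
  - Foundations and basic ideas
    1. Artificial intelligence, computational intelligence, brain-inspired intelligence
    2. Machine learning, rote (memory-based) learning, inductive learning, statistical learning
    3. Deep learning
    4. Artificial neural networks, feedforward neural networks, the backpropagation (BP) algorithm, the Hessian matrix, structured feature representation
  - Basic frameworks
    1. TensorFlow
    2. Caffe
    3. Torch
    4. MXNet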
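  - Convolutional neural networks (CNN)
    1. CNN layers: convolutional layers (1D and 2D convolution), pooling layers (average pooling, max pooling), fully connected layers, activation layers, and the softmax layer
    2. CNN improvements: R-CNN (SPPNet), Fast R-CNN, Faster R-CNN (YOLO, SSD)
    3. Training deep learning models
    4. Optimization methods for gradient descent
  - Recurrent neural networks (RNN)
    1. RNN gradient computation and backpropagation through time (BPTT)
    2. RNN improvements: LSTM, GRU, Bi-RNN, attention-based RNN
    3. RNN in practice: principles and implementation of Seq2Seq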
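  - Deep reinforcement learning (DRL)
    1. Reinforcement learning theory
    2. Classic model: DQN
    3. Principles of AlphaGo
    4. RL in practice: AlphaGo
  - Generative adversarial networks (GAN)
    1. GAN theory
    2. Classic GAN models: CGAN, LAPGAN, DCGAN
    3. Classic GAN models: InfoGAN, WGAN, S2-GAN
    4. GAN in practice: using DCGAN to increase the resolution of blurry images
    5. GAN in practice: using InfoGAN to generate specific samples
  - Transfer learning (TL)
    1. Transfer learning theory
    2. Common transfer learning methods: feature-based, instance-based, data-based, deep transfer, reinforcement transfer, and case studies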
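  - Algorithm theory
    1. Region-based convolutional network (R-CNN)
    2. Deep residual network (ResNet)
    3. Capsule network
    4. Long short-term memory network (LSTM)
    5. Attention mechanism
    6. Backpropagation (BP) algorithm
    7. Variational autoencoder (VAE)
  - Hands-on application cases
    1. CNN → image classification
    2. LSTM → text classification
    3. LSTM → named entity extraction
    4. YOLO → object detection
    5. Image classification (CNN)
    6. Object localization and recognition (R-CNN)
    7. Image reconstruction (autoencoder)
    8. Text recognition (RNN)
    9. Entity tagging (LSTM)
    10. Handwritten digit generation (GAN)
    11. Image classification derived from logistic regression
    12. Writing static/dynamic computation graphs (CNN)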
  - Common deep learning models and methods
    1. Autoencoder
    2. Sparse coding
    3. Restricted Boltzmann machine (RBM)
    4. Deep belief networks (DBN)
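
The Associations item in the outline above notes that a rule has an associated population, a support, and a confidence. The following is a minimal sketch of how those two numbers could be computed, assuming each instance is a set of items. The basket contents and the helper name `support_and_confidence` are made up for illustration and are not part of the course material.

```python
def support_and_confidence(population, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent.

    support    = fraction of all instances containing both sides of the rule
    confidence = fraction of instances containing the antecedent that
                 also contain the consequent
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n_total = len(population)
    # Instances that satisfy the left-hand side of the rule.
    n_antecedent = sum(1 for inst in population if antecedent <= inst)
    # Instances that satisfy both sides of the rule.
    n_both = sum(1 for inst in population if (antecedent | consequent) <= inst)
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical population of purchase baskets (illustrative data only).
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "cereal"},
    {"bread", "butter"},
    {"milk", "cereal"},
]

support, confidence = support_and_confidence(baskets, {"bread"}, {"milk"})
print(f"support={support:.2f} confidence={confidence:.2f}")
```

Here the rule "bread ⇒ milk" holds in 2 of the 4 baskets, and in 2 of the 3 baskets that contain bread. A rule with high confidence but very low support covers too few instances to be of much use, which is why both measures are reported together.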

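The naïve Bayesian classifier listed under Classification Rules assumes that attributes are independently distributed within each class. The sketch below shows one way that assumption could be turned into a classifier for categorical attributes. It is an illustration only: the toy data, the function names `train_naive_bayes` and `predict`, and the add-one smoothing are assumptions made here, not something taken from the lecture.

```python
from collections import Counter, defaultdict


def train_naive_bayes(instances, labels):
    """Collect the counts needed to estimate p(c) and p(d_i | c)."""
    class_counts = Counter(labels)
    # value_counts[(class, attribute_index)][value] = number of occurrences
    value_counts = defaultdict(Counter)
    # distinct_values[attribute_index] = set of values seen for that attribute
    distinct_values = defaultdict(set)
    for attrs, label in zip(instances, labels):
        for i, value in enumerate(attrs):
            value_counts[(label, i)][value] += 1
            distinct_values[i].add(value)
    return class_counts, value_counts, distinct_values


def predict(model, attrs):
    """Return the class c that maximizes p(c) * p(d_1 | c) * ... * p(d_n | c)."""
    class_counts, value_counts, distinct_values = model
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / total  # prior p(c)
        for i, value in enumerate(attrs):
            # Add-one smoothing (an assumption of this sketch) keeps an
            # unseen attribute value from forcing the whole product to zero.
            k = len(distinct_values[i])
            score *= (value_counts[(c, i)][value] + 1) / (n_c + k)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical training instances: (expression level, mutation present).
train_x = [
    ("high", "yes"), ("high", "no"), ("high", "yes"),
    ("low", "no"), ("low", "no"), ("low", "yes"),
]
train_y = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]

model = train_naive_bayes(train_x, train_y)
print(predict(model, ("high", "yes")))
print(predict(model, ("low", "no")))
```

Because each attribute contributes its own factor, training only needs per-class value counts, which is what makes the independence assumption attractive for data with many attributes.
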
2020 Zhejiang Province Bioinformatics "Society Cup" Competition

Lecture 2

Data collection
group reporting
What kinds of data exist in the bioinformatics field?
- data sources
- data preprocessing / cleaning
- data integration / selection
- data transformation

Lecture 3
- Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.
- Feature selection / pattern recognition
- group reporting
Lecture 4
- Pattern evaluation
- group reporting
Lecture 5
- Statistical models
- group reporting
Lecture 6
- Knowledge presentation
- group reporting
Lecture 7
- Preliminary results & discussion
- 10 Challenging Problems in Data Mining Research
  - Developing a Unifying Theory of Data Mining
  - Scaling Up for High Dimensional Data/High Speed Streams
  - Mining Sequence Data and Time Series Data
  - Mining Complex Knowledge from Complex Data
  - Data Mining in Graph-Structured Data
  - Distributed Data Mining and Mining Multi-agent Data
  - Data Mining for Biological and Environmental Problems
  - Data-Mining-Process Related Problems
  - Security, Privacy and Data Integrity
  - Dealing with Non-static, Unbalanced and Cost-sensitive Data
Lecture 8
- Seminar
- final report

Practical Course: Biological Data Mining and Knowledge Discovery
