Practical Course: Biological Data Mining and Knowledge Discovery (生物数据挖掘与知识发现)

  • Lecture 1

    • Course Introduction
    • Data mining is the process of semi-automatically analyzing large databases to find useful patterns
    • Grouping
    • Reading: Machine learning for bioinformatics and neuroimaging (pdf)
    • Online course: Data Mining in Bioinformatics (link)
    • Types of Data Mining Tasks
      • Prediction based on past history (mechanism examples)
        • Classification
          • Items (with associated attributes) belong to one of several classes
          • Training instances have attribute values and classes provided
          • Given a new item whose class is unknown, predict to which class it belongs based on its attribute values
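
The classification setup above can be sketched with a 1-nearest-neighbor rule. This is only an illustration: the course page does not name a specific classifier here, and the attribute values, class labels, and function name below are hypothetical.

```python
import math

def nearest_neighbor_class(training, item):
    """Return the class of the training instance closest to `item`.

    `training` is a list of (attribute_values, class_label) pairs,
    i.e. training instances with attribute values and classes provided.
    """
    _, best_class = min(
        training, key=lambda inst: math.dist(inst[0], item)
    )
    return best_class

# Hypothetical training instances: (attribute values, class)
training = [
    ((1.0, 1.0), "low_risk"),
    ((1.5, 2.0), "low_risk"),
    ((8.0, 9.0), "high_risk"),
    ((9.0, 8.5), "high_risk"),
]

# A new item whose class is unknown
predicted = nearest_neighbor_class(training, (8.5, 8.0))
print(predicted)  # high_risk
```

The new item is assigned the class of its closest training instance, measured by Euclidean distance over the attribute values.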
        • Regression
          • Given a set of mappings for an unknown function, predict the function result for a new parameter value
          • Regression deals with the prediction of a value, rather than a class.
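
As a minimal sketch of regression, the example below fits a straight line to known (input, output) pairs by ordinary least squares and predicts the output for a new input. The linear form and the sample data are assumptions made for illustration; the unknown function in a real task need not be linear.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Predict a numeric value (not a class) for a new parameter value."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical mappings sampled from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

model = fit_line(xs, ys)
print(predict(model, 4.0))  # 9.0
```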
      • Descriptive Patterns
        • Associations
          • Find books that are often bought by “similar” customers. If a new such customer buys one such book, suggest the others too.
          • Associations may be used as a first step in detecting causation. E.g., association between exposure to chemical X and cancer
          • An association rule must have an associated population; the population consists of a set of instances. Rules have an associated support, as well as an associated confidence.
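
Support and confidence can be computed directly from the population. In the sketch below, the population is a list of purchases (one set of items per customer), support is the fraction of instances that contain every item in the rule, and confidence is the fraction of instances containing the antecedent that also contain the consequent. The book names and function names are hypothetical.

```python
def support(population, itemset):
    """Fraction of instances that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for instance in population if itemset <= instance)
    return hits / len(population)

def confidence(population, antecedent, consequent):
    """How often the consequent holds when the antecedent holds."""
    both = support(population, set(antecedent) | set(consequent))
    return both / support(population, antecedent)

# Hypothetical population: the books bought by each of four customers
population = [
    {"book_a", "book_b"},
    {"book_a", "book_b", "book_c"},
    {"book_a", "book_c"},
    {"book_b"},
]

# Rule: book_a => book_b
print(support(population, {"book_a", "book_b"}))       # 0.5
print(confidence(population, {"book_a"}, {"book_b"}))  # 2 of 3 buyers of book_a
```

A rule with high confidence but very low support covers too few instances to be useful, which is why both measures are reported together.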
        • Clusters
          • Detection of clusters remains important in detecting epidemics. E.g., typhoid cases were clustered in an area surrounding a contaminated well
          • Clustering: Intuitively, finding clusters of points in the given data such that similar points lie in the same cluster.
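
One common way to find such clusters is k-means, shown here as a minimal sketch. The course page does not name a specific clustering algorithm, so k-means, the starting centroids, and the sample points are all assumptions chosen for illustration.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    # `clusters` reflects the assignment made in the final iteration.
    return centroids, clusters

# Hypothetical 2-D points forming two visibly separate groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]

centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(clusters[0])  # the three points near (1, 1.5)
print(clusters[1])  # the three points near (8.5, 8.8)
```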
      • Classification Rules
        • Decision Tree Classifiers
          • Each internal node of the tree partitions the data into groups based on a partitioning attribute, and a partitioning condition for the node
        • Bayesian Classifiers, Naïve Bayesian Classifiers
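          • Naïve Bayesian classifiers assume attributes have independent distributions, and thereby estimate the probability of an instance given a class as the product of the per-attribute probabilities given that class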
        • Support Vector Machine Classifiers
          • SVMs can also find separators that are curves, not necessarily linear, by transforming the points before classification
        • Neural Network Classifiers
          • A neural network has multiple layers. For classification, each output value indicates the likelihood that the input instance belongs to the corresponding class
          • Deep neural networks have a large number of layers, with a large number of nodes in each layer
          • Deep learning refers to the training of deep neural networks on very large numbers of training instances
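
Of the classifiers listed above, the naïve Bayesian classifier is the simplest to sketch in a few lines. The example below handles categorical attributes and uses add-one (Laplace) smoothing, which is an assumption added here so that unseen attribute values do not produce zero probabilities. The attribute values, class labels, and function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_naive_bayes(instances):
    """instances: list of (attribute_tuple, class_label) pairs."""
    class_counts = Counter(label for _, label in instances)
    value_counts = defaultdict(Counter)  # (label, position) -> value counts
    values_seen = defaultdict(set)       # position -> distinct values
    for attrs, label in instances:
        for pos, value in enumerate(attrs):
            value_counts[(label, pos)][value] += 1
            values_seen[pos].add(value)
    return class_counts, value_counts, values_seen

def classify(model, attrs):
    """Pick the class c maximizing p(c) * product of p(attr_i | c)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total  # prior p(c)
        for pos, value in enumerate(attrs):
            # Independence assumption: multiply per-attribute estimates.
            # Add-one smoothing keeps unseen values from zeroing the score.
            numerator = value_counts[(label, pos)][value] + 1
            denominator = count + len(values_seen[pos])
            score *= numerator / denominator
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data: (expression level, gene status) -> class
instances = [
    (("high", "mutated"), "tumor"),
    (("high", "mutated"), "tumor"),
    (("high", "wildtype"), "tumor"),
    (("low", "wildtype"), "normal"),
    (("low", "wildtype"), "normal"),
    (("low", "mutated"), "normal"),
]

model = train_naive_bayes(instances)
print(classify(model, ("high", "mutated")))  # tumor
```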
    • Other Types of Mining
      • Text mining: application of data mining to textual documents
      • Sentiment analysis: E.g., learn to predict if a user review is positive or negative about a product
      • Information extraction: Create structured information from unstructured textual description or semi-structured data such as tabular displays
      • Entity recognition and disambiguation: E.g., given text with the name “Michael Jordan”, does the name refer to the famous basketball player or the famous ML expert?
      • Knowledge graph: Can be constructed by information extraction from different sources, such as Wikipedia and PubMed.
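    • Deep Learning
      • Foundations and basic ideas
        1. Artificial intelligence, computational intelligence, brain-inspired intelligence
        2. Machine learning, rote learning, inductive learning, statistical learning
        3. Deep learning
        4. Artificial neural networks, feedforward neural networks, the backpropagation (BP) algorithm, the Hessian matrix, structured feature representation
      • Basic frameworks
        1. TensorFlow
        2. Caffe
        3. Torch
        4. MXNet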
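      • CNN: convolutional neural networks
        1. CNN layers: convolutional layers (1D and 2D convolution), pooling layers (average pooling, max pooling), fully connected layers, activation layers, softmax layer
        2. CNN improvements: R-CNN (SPPNet), Fast R-CNN, Faster R-CNN (YOLO, SSD)
        3. Training deep learning models
        4. Optimization methods for gradient descent
      • RNN: recurrent neural networks
        1. RNN gradient computation, backpropagation through time (BPTT)
        2. RNN improvements: LSTM, GRU, Bi-RNN, attention-based RNN
        3. RNN in practice: principles and implementation of Seq2Seq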
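      • DRL: deep reinforcement learning
        1. Theory of reinforcement learning
        2. Classic model: DQN
        3. How AlphaGo works
        4. RL in practice: AlphaGo
      • GAN: generative adversarial networks
        1. GAN theory
        2. Classic GAN models: CGAN, LAPGAN, DCGAN
        3. Classic GAN models: InfoGAN, WGAN, S2-GAN
        4. GAN in practice: using DCGAN to increase the resolution of blurry images
        5. GAN in practice: using InfoGAN to generate specific kinds of samples
      • TL: transfer learning
        1. Theory of transfer learning
        2. Common transfer learning methods: feature-based, instance-based, data-based, deep transfer, reinforcement transfer; research case studies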
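      • Analysis of algorithm theory
        1. Region-based convolutional network (R-CNN)
        2. Deep residual network (ResNet)
        3. Capsule network
        4. Long short-term memory network (LSTM)
        5. Attention mechanism
        6. Backpropagation (BP) algorithm
        7. Variational autoencoder (VAE)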
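      • Hands-on application case studies
        1. CNN for image classification
        2. LSTM for text classification
        3. LSTM for named entity extraction
        4. YOLO for object detection
        5. Image classification (CNN)
        6. Object localization and recognition (R-CNN)
        7. Image reconstruction (autoencoder)
        8. Text recognition (RNN)
        9. Entity tagging (LSTM)
        10. Handwritten digit generation (GAN)
        11. Image classification derived from logistic regression
        12. Writing static and dynamic computation graphs (CNN)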
      • Common deep learning models and methods
        1. Autoencoder
        2. Sparse coding
        3. Restricted Boltzmann machine (RBM)
        4. Deep belief networks (DBN)
  • Lecture 2

    • Data collection
    • group reporting
    • What kinds of data are there in the bioinformatics field?
      • data sources
      • data preprocessing / cleaning
      • data integration / selection
      • data transformation
  • Lecture 3

    • Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.
      • Classification and Clustering
      • Statistical Learning
      • Association Analysis
      • Link Mining
      • Bagging and Boosting (bootstrap aggregating and boosting, both ensemble learning methods)
      • Sequential Patterns
      • Integrated Mining
      • Rough Sets (attribute reduction algorithms)
      • Graph Mining

      Series: Top 10 Classic Data Mining Algorithms

    • Feature selection / pattern recognition
    • group reporting
  • Lecture 4

    • Pattern evaluation
    • group reporting
  • Lecture 5

    • Statistics model
    • group reporting
  • Lecture 6

    • Knowledge presentation
    • group reporting
  • Lecture 7

    • Preliminary results & discussions
    • 10 Challenging Problems in Data Mining Research
      • Developing a Unifying Theory of Data Mining
      • Scaling Up for High Dimensional Data/High Speed Streams
      • Mining Sequence Data and Time Series Data
      • Mining Complex Knowledge from Complex Data
      • Data Mining in Graph-Structured Data
      • Distributed Data Mining and Mining Multi-agent Data
      • Data Mining for Biological and Environmental Problems
      • Data-Mining-Process Related Problems
      • Security, Privacy and Data Integrity
      • Dealing with Non-static, Unbalanced and Cost-sensitive Data
  • Lecture 8

    • Seminar
    • final report

This page last modified: Jan 30, 2020