Practical Course: Biological Data Mining and Knowledge Discovery
-
Lecture 1
- Course Introduction
-
Data mining is the process of semi-automatically analyzing large databases to find useful patterns.

- Grouping
- Reading: Machine learning for bioinformatics and neuroimaging (pdf)
- Online course: Data Mining in Bioinformatics (link)
- Types of Data Mining Tasks
- Prediction based on past history (mechanism examples)
- Classification
- Items (with associated attributes) belong to one of several classes
- Training instances have attribute values and classes provided
- Given a new item whose class is unknown, predict to which class it belongs based on its attribute values
- Regression
- Given a set of mappings for an unknown function, predict the function result for a new parameter value
- Regression deals with the prediction of a value, rather than a class.
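To make the regression task above concrete, here is a minimal sketch that fits a straight line to a handful of known (input, output) mappings by ordinary least squares and then predicts the result for a new input. The data points and the function name `fit_line` are assumptions chosen for illustration; a real data set would rarely be this clean, and a straight line is only one possible model.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x/y cross deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations of an "unknown" function (here exactly y = 2x + 1).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)

# Predict the function result for a new parameter value, x = 5.
prediction = slope * 5.0 + intercept
print(slope, intercept, prediction)
```

The output is a numeric value rather than a class label, which is what distinguishes regression from classification.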
- Descriptive Patterns
- Associations
- Find books that are often bought by “similar” customers. If a new such customer buys one such book, suggest the others too.
- Associations may be used as a first step in detecting causation. E.g., association between exposure to chemical X and cancer
- An association rule must have an associated population; the population consists of a set of instances. Rules have an associated support, as well as an associated confidence.
- Clusters
- Detection of clusters remains important in detecting epidemics. E.g., typhoid cases were clustered in an area surrounding a contaminated well
- Clustering: Intuitively, finding clusters of points in the given data such that similar points lie in the same cluster.
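The support and confidence of an association rule, mentioned above, can be sketched as follows. The population here is a small, made-up list of purchase transactions, and the function names `support` and `confidence` are illustrative. Support is taken to be the fraction of the population that contains every item in the rule, and confidence the fraction of instances containing the antecedent that also contain the consequent.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """How often the consequent holds among transactions with the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical population of four purchase transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "eggs"},
    {"milk"},
]

# Rule: bread => milk
rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

A rule with high confidence but very low support applies to only a few instances, so both measures are usually reported together.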
- Classification Rules
- Decision Tree Classifiers
- Each internal node of the tree partitions the data into groups based on a partitioning attribute, and a partitioning condition for the node
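The partitioning step performed at a single internal node can be sketched as below. The attribute name `expression`, the threshold 0.5, and the four labeled instances are hypothetical. A real decision tree learner would also choose the attribute and condition automatically (for example by an impurity measure), which this sketch does not do.

```python
from collections import Counter


def partition(instances, attribute, threshold):
    """Split instances at one node: values <= threshold go left, others right."""
    left = [row for row in instances if row[attribute] <= threshold]
    right = [row for row in instances if row[attribute] > threshold]
    return left, right


def majority_class(instances):
    """Class label a leaf would predict: the most common label in the group."""
    return Counter(row["label"] for row in instances).most_common(1)[0][0]


# Hypothetical training instances with one numeric attribute and a class.
rows = [
    {"expression": 0.2, "label": "normal"},
    {"expression": 0.4, "label": "normal"},
    {"expression": 0.9, "label": "tumor"},
    {"expression": 1.3, "label": "tumor"},
]

# Partitioning attribute: "expression"; partitioning condition: value <= 0.5.
left, right = partition(rows, "expression", 0.5)
print(majority_class(left), majority_class(right))
```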

- Bayesian Classifiers, Naïve Bayesian Classifiers
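- Naïve Bayesian classifiers assume that, within each class, attributes are distributed independently of one another, and thereby estimate p(d | c_j) = p(d_1 | c_j) × p(d_2 | c_j) × ... × p(d_n | c_j)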
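Under the independence assumption above, the likelihood of an instance given a class factorizes into a product of per-attribute likelihoods, which can be estimated from simple counts. The following is a minimal sketch for categorical attributes. The training data, the labels, and the function names `train` and `predict` are assumptions for illustration, and no smoothing is applied, so an attribute value never seen with a class gives that class a score of zero.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(label, attribute_index)][value] = number of occurrences
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
    return class_counts, value_counts


def predict(row, class_counts, value_counts):
    """Return the class maximizing p(c_j) * product over i of p(d_i | c_j)."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior p(c_j)
        for i, value in enumerate(row):
            # Estimated p(d_i | c_j), without smoothing.
            score *= value_counts[(label, i)][value] / count
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical instances: (expression level, mutation status).
rows = [("high", "mutated"), ("high", "wild"), ("low", "wild"), ("low", "wild")]
labels = ["tumor", "tumor", "normal", "normal"]

class_counts, value_counts = train(rows, labels)
predicted = predict(("high", "wild"), class_counts, value_counts)
print(predicted)
```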
- Support Vector Machine Classifiers
- SVMs can find separators that are curves, not necessarily linear, by transforming the points (for example, with a kernel function) before classification
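The idea of transforming points before classification can be illustrated without training an actual SVM. In the sketch below, one-dimensional points from two hypothetical classes cannot be split by any single threshold, but after the assumed mapping `phi(x) = (x, x^2)` a straight line separates them. The weights and bias are picked by hand for this example; an SVM would learn them from the data.

```python
def phi(x):
    """Map a 1-D point into 2-D so that a linear separator can exist."""
    return (x, x * x)


def linear_classifier(point, w=(0.0, 1.0), b=-4.0):
    """Hand-picked linear separator in the transformed space: x^2 - 4 > 0."""
    score = w[0] * point[0] + w[1] * point[1] + b
    return "outer" if score > 0 else "inner"


# Hypothetical data: "inner" points lie between the "outer" points,
# so no single threshold on x separates the two classes.
data = [(-3.0, "outer"), (-1.0, "inner"), (0.0, "inner"),
        (1.0, "inner"), (3.0, "outer")]

predictions = [linear_classifier(phi(x)) for x, _ in data]
print(predictions)
```

In the original one-dimensional space the resulting decision boundary is the pair of points x = -2 and x = 2, which is not a linear separator there.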

- Neural Network Classifiers
- A neural network has multiple layers. For classification, each output value indicates the likelihood of the input instance belonging to the corresponding class
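One common way to turn the raw output values of the final layer into per-class likelihoods is a softmax function. The sketch below assumes three hypothetical output scores and class names; it shows only the output step, not the network that would produce the scores.

```python
import math


def softmax(scores):
    """Convert raw output scores into values that are positive and sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs of the final layer, one per class.
classes = ["normal", "benign", "malignant"]
raw_outputs = [1.0, 2.0, 0.5]

likelihoods = softmax(raw_outputs)
predicted_class = classes[likelihoods.index(max(likelihoods))]
print(likelihoods, predicted_class)
```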

- Deep neural networks have a large number of layers, with a large number of nodes in each layer
- Deep learning refers to the training of deep neural networks on very large numbers of training instances
- Other Types of Mining
- Text mining: application of data mining to textual documents
- Sentiment analysis: E.g., learn to predict if a user review is positive or negative about a product
- Information extraction: Create structured information from unstructured textual description or semi-structured data such as tabular displays
- Entity recognition and disambiguation: E.g., given text with the name “Michael Jordan”, does the name refer to the famous basketball player or the famous ML expert?
- Knowledge graph: Can be constructed by information extraction from different sources, such as Wikipedia and PubMed.
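- Deep Learning
- Foundations and basic ideas
- 1. Artificial intelligence, computational intelligence, brain-inspired intelligence
- 2. Machine learning, rote learning, inductive learning, statistical learning
- 3. Deep learning
- 4. Artificial neural networks, feedforward neural networks, the backpropagation (BP) algorithm, the Hessian matrix, structured feature representation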
- Basic frameworks
- 1. TensorFlow
- 2. Caffe
- 3. Torch
- 4. MXNet
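- CNN (convolutional neural networks)
- 1. CNN layers: convolutional layers (1D and 2D convolution), pooling layers (average pooling, max pooling), fully connected layers, activation layers, softmax layer
- 2. CNN improvements: R-CNN (SPPNet), Fast R-CNN, Faster R-CNN (YOLO, SSD)
- 3. Model training in deep learning
- 4. Optimization methods for gradient descent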
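- RNN (recurrent neural networks)
- 1. RNN gradient computation, backpropagation through time (BPTT)
- 2. RNN improvements: LSTM, GRU, Bi-RNN, attention-based RNN
- 3. RNN in practice: principles and implementation of Seq2Seq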
- DRL (deep reinforcement learning)
- 1. Reinforcement learning theory
- 2. Classic model: DQN
- 3. Principles of AlphaGo
- 4. RL in practice: AlphaGo
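- GAN (generative adversarial networks)
- 1. GAN theory
- 2. Classic GAN models: CGAN, LAPGAN, DCGAN
- 3. Classic GAN models: InfoGAN, WGAN, S2-GAN
- 4. GAN in practice: DCGAN for increasing the resolution of blurry images
- 5. GAN in practice: InfoGAN for generating specific kinds of samples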
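- TL (transfer learning)
- 1. Transfer learning theory
- 2. Common transfer learning methods: feature-based, instance-based, data-based, deep transfer, reinforcement transfer, case studies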
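- Algorithm theory
- 1. Region-based convolutional network (R-CNN)
- 2. Deep residual network (ResNet)
- 3. Capsule network
- 4. Long short-term memory network (LSTM)
- 5. Attention mechanism
- 6. Backpropagation (BP) algorithm
- 7. Variational autoencoder (VAE)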
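- Hands-on application cases
- 1. CNN -> image classification
- 2. LSTM -> text classification
- 3. LSTM -> named entity extraction
- 4. YOLO -> object detection
- 5. Image classification (CNN)
- 6. Object localization and recognition (R-CNN)
- 7. Image reconstruction (autoencoder)
- 8. Text recognition (RNN)
- 9. Entity tagging (LSTM)
- 10. Handwritten digit generation (GAN)
- 11. Image classification derived from logistic regression
- 12. Writing static and dynamic computation graphs (CNN)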
- Common deep learning models and methods
- 1. Autoencoder
- 2. Sparse coding
- 3. Restricted Boltzmann machine (RBM)
- 4. Deep belief network (DBN)
- Data collection
- group reporting
- what kinds of data are there in the bioinformatics field?
- data sources
- data preprocessing / cleaning
- data integration / selection
- data transformation
-
Lecture 3
- Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.
- Classification
- Clustering
- Statistical Learning
- Association Analysis
- Link Mining
- Bagging and Boosting (bootstrap aggregating and iterative boosting; both are ensemble learning methods)
- Sequential Patterns
- Integrated Mining
- Rough Sets (attribute reduction algorithms)
- Graph Mining
Series: "Top 10 Classic Data Mining Algorithms"
- Feature selection / pattern recognition
- group reporting
-
Lecture 4
- Pattern evaluation
- group reporting
-
Lecture 5
- Statistics model
- group reporting
-
Lecture 6
- Knowledge presentation
- group reporting
-
Lecture 7
- Preliminary results & Discussions
- 10 Challenging Problems in Data Mining Research
- Developing a Unifying Theory of Data Mining
- Scaling Up for High Dimensional Data/High Speed Streams
- Mining Sequence Data and Time Series Data
- Mining Complex Knowledge from Complex Data
- Data Mining in Graph-Structured Data
- Distributed Data Mining and Mining Multi-agent Data
- Data Mining for Biological and Environmental Problems
- Data-Mining-Process Related Problems
- Security, Privacy and Data Integrity
- Dealing with Non-static, Unbalanced and Cost-sensitive Data
-
Lecture 8