Algorithms for Big Data Analysis (Spring 2017)
Basic information (syllabus)
Course codes: 00136720 (undergraduate), 00100863 (combined undergraduate/graduate)
Course content: numerical linear algebra and optimization algorithms in data analysis
Instructor: Zaiwen Wen, wenzw at pku dot edu dot cn
Teaching assistants: Yongfeng Li: pdlyf1128 at 163 dot com; Haoyang Liu: liuhaoyang at pku dot edu dot cn
Course WeChat group (the QR code below expires on February 26)
Location: Science Teaching Building (理教), Room 302
Time:
Prerequisites:
Other information:
References:
“Mining of Massive Datasets”, Jure Leskovec, Anand Rajaraman, Jeff Ullman, Cambridge University Press
“Convex optimization”, Stephen Boyd and Lieven Vandenberghe
“Numerical Optimization”, Jorge Nocedal and Stephen Wright, Springer
“Optimization Theory and Methods”, Wenyu Sun, Ya-Xiang Yuan
“Matrix Computations”, Gene H. Golub and Charles F. Van Loan, The Johns Hopkins University Press
The Matrix Cookbook, Kaare Brandt Petersen, Michael Syskind Pedersen
Deep Learning, Ian Goodfellow and Yoshua Bengio and Aaron Courville
Websites related to data analysis:
Mining Massive Data Sets, Stanford University
Jure Leskovec @ Stanford - Stanford Computer Science
Mining Massive Data Sets: Hadoop Labs, Stanford University
Big Data Analytics, Columbia University
Theoretical Foundations of Big Data Analysis, The Simons Institute for the Theory of Computing, UC Berkeley
Introduction to Data Science, University of Washington
Core Methods in Data Science, University of Washington
Machine Learning and Artificial Intelligence, Sanjeev Arora and Elad Hazan, Princeton
Reinforcement Learning, David Silver
Deep Reinforcement Learning, Sergey Levine, John Schulman, Chelsea Finn
courses from Berkeley Artificial Intelligence Research (BAIR) Lab
Large-scale Machine Learning and Optimization, Dimitris Papailiopoulos
Course schedule (tentative)
Acknowledgement: quite a few materials are taken from slides or lecture notes found online. Please email me if the use of any part is inappropriate, and it will be removed immediately.
Week 1, February 21. Course introduction: lecture notes
Introduction to linear programming (LP), second-order cone programming (SOCP), and semidefinite programming (SDP): lecture notes
Read: “Convex optimization”, Stephen Boyd and Lieven Vandenberghe, chapters 2 and 3
Question to think about: which problems and applications can be formulated as LPs, SOCPs, or SDPs?
Week 2, February 28. Duality theory for convex optimization: lecture notes
Examples of LP, SOCP, and SDP: lecture notes
Modeling languages: CVX, YALMIP
Representative LP/SOCP/SDP solvers: SDPT3, MOSEK, CPLEX, GUROBI
Prof. Boyd's lecture notes on disciplined convex programming and CVX
“Convex optimization”, Stephen Boyd and Lieven Vandenberghe, chapters 4 and 5
Week 3, March 7. Simplex and interior-point methods for linear programming: lecture notes
“Numerical Optimization”, Jorge Nocedal and Stephen Wright, Springer, chapters 13 and 14
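The week's topic, simplex and interior-point methods for LP, can be tried out directly with SciPy's `linprog` solver. A minimal sketch (the toy problem and all numbers below are illustrative, not taken from the course material):

```python
import numpy as np
from scipy.optimize import linprog

# maximize x + 2y  s.t.  x + y <= 4,  x <= 3,  x, y >= 0
# linprog minimizes, so we negate the objective.
c = np.array([-1.0, -2.0])
A_ub = np.array([[1.0, 1.0],
                 [1.0, 0.0]])
b_ub = np.array([4.0, 3.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)  # unique optimum at (0, 4) with value 8
```

The feasible region here is a polygon with vertices (0,0), (3,0), (3,1), (0,4); both the simplex method and an interior-point method would arrive at the vertex (0,4).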
Week 4, March 14. Basic theory of compressed sensing and sparse optimization: lecture notes
References: An Introduction to Compressive Sensing
Compressive Sensing Resources
Week 5, March 21. Algorithms for sparse optimization: lecture notes
“Proximal Algorithms”, N. Parikh and S. Boyd, Foundations and Trends in Optimization, 1(3):123-231, 2014.
Amir Beck, Marc Teboulle, A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems
“Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers”, S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Foundations and Trends in Machine Learning, 3(1):1–122, 2011
Week 6, March 28. Recommender systems and algorithms for low-rank matrix recovery: lecture notes
SVD: read Section 2.5 in “Matrix Computations”, Gene H. Golub and Charles F. Van Loan, The Johns Hopkins University Press
chapter 9 in “Mining Massive Data Sets”, Stanford University
“Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization”, Benjamin Recht, Maryam Fazel, Pablo A. Parrilo
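The nuclear-norm relaxation of Recht, Fazel, and Parrilo leads to algorithms whose core step is singular value thresholding, the matrix analogue of the vector soft-thresholding used for l1 problems. A sketch (the rank, noise level, and threshold below are illustrative choices, not from the paper):

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: prox of tau*||.||_* (the nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)   # shrink each singular value by tau
    return U @ np.diag(s_shrunk) @ Vt

rng = np.random.default_rng(0)
# a rank-2 matrix corrupted by small dense noise
B = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 40))
M = B + 0.01 * rng.standard_normal((50, 40))
X = svt(M, tau=1.0)
print(np.linalg.matrix_rank(X, tol=1e-6))  # shrinkage zeroes out the noise directions
```

Because the noise singular values are far below the threshold while the two signal singular values are far above it, one thresholding step recovers a rank-2 matrix.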
Week 7, April 4. Parallel and distributed computing: lecture notes (this lecture is moved to April 11)
Some background on parallel computing can be found in Prof. James Demmel's course
“Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers”, S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Foundations and Trends in Machine Learning, 3(1):1–122, 2011
Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, Benjamin Recht, Christopher Re, Stephen Wright, Feng Niu
Week 8, April 11. Class cancelled (holiday)
Week 9, April 18. Stochastic optimization algorithms: lecture notes
Joel A Tropp, An introduction to matrix concentration inequalities
Optimization Methods for Large-Scale Machine Learning, Léon Bottou, Frank E. Curtis, Jorge Nocedal
Introductory Lectures on Stochastic Optimization, by John Duchi
Deep Learning, Ian Goodfellow and Yoshua Bengio and Aaron Courville
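The workhorse method in the Bottou–Curtis–Nocedal survey, stochastic gradient descent on a finite-sum objective, fits in a few lines. A least-squares sketch with a diminishing step size (the data sizes, seed, and step-size constant are all illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 1000, 10
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.01 * rng.standard_normal(n)

def loss(w):
    """Full-sample least-squares objective (for monitoring only)."""
    return 0.5 * np.mean((X @ w - y) ** 2)

w = np.zeros(d)
for t in range(1, 5001):
    i = rng.integers(n)                  # sample one data point uniformly
    g = (X[i] @ w - y[i]) * X[i]         # stochastic gradient of 0.5*(x_i'w - y_i)^2
    w -= 0.05 / np.sqrt(t) * g           # O(1/sqrt(t)) diminishing step size
print(loss(np.zeros(d)), loss(w))        # final loss is far below the initial loss
```

Each step touches a single data point, so the per-iteration cost is O(d) rather than O(nd); that trade of cheap noisy steps for expensive exact ones is the theme of the week's readings.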
Week 10, April 25. Randomized algorithms for eigenvalue decomposition: lecture notes
Petros Drineas, Ravi Kannan, and Michael W. Mahoney, Fast Monte Carlo Algorithms for Matrices I: Approximating Matrix Multiplication, SIAM J. Comput., 36(1), 132–157
Petros Drineas, Ravi Kannan, and Michael W. Mahoney, Fast Monte Carlo Algorithms for Matrices II: Computing a Low-Rank Approximation to a Matrix, SIAM J. Comput., 36(1), 158–183
N. Halko, P. G. Martinsson, and J. A. Tropp, Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions,SIAM Rev., 53(2), 217–288.
Week 11, May 2. Phase retrieval and cryo-electron microscopy (cryo-EM): lecture notes
E. J. Candes, Y. Eldar, T. Strohmer and V. Voroninski. Phase retrieval via matrix completion. SIAM J. on Imaging Sciences 6(1), 199–225.
E. J. Candès, X. Li and M. Soltanolkotabi. Phase retrieval via Wirtinger flow: theory and algorithms. IEEE Transactions on Information Theory 61(4), 1985–2007.
A. Singer, Y. Shkolnisky, Three-Dimensional Structure Determination from Common Lines in Cryo-EM by Eigenvectors and Semidefinite Programming, SIAM Journal on Imaging Sciences, 4 (2), pp. 543-572 (2011).
L. Wang, A. Singer, Z. Wen, “Orientation Determination from Cryo-EM Images Using Least Unsquared Deviations”, SIAM Journal on Imaging Sciences, 6(4), pp. 2450–2483 (2013).
Week 12, May 9. Dimensionality reduction for high-dimensional data: lecture notes
A Global Geometric Framework for Nonlinear Dimensionality Reduction, Joshua B. Tenenbaum, Vin de Silva, John C. Langford
Unsupervised Learning of Image Manifolds by Semidefinite Programming, KILIAN Q. WEINBERGER AND LAWRENCE K. SAUL
A Duality View of Spectral Methods for Dimensionality Reduction, Lin Xiao, Jun Sun, Stephen Boyd
Dimensionality Reduction: A Comparative Review, Laurens van der Maaten, Eric Postma, Jaap van den Herik
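The spectral methods surveyed above (Isomap, maximum variance unfolding, and their duals) all generalize classical PCA, which is itself just one SVD of the centered data matrix. A baseline sketch (the synthetic near-planar data set is an illustrative choice):

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                       # center the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, s**2 / (len(X) - 1)     # scores, per-component variances

rng = np.random.default_rng(0)
# 3-D points that lie near a 2-D plane
Z = rng.standard_normal((200, 2))
X = Z @ rng.standard_normal((2, 3)) + 0.01 * rng.standard_normal((200, 3))
Y, var = pca(X, k=2)
print(var[:2].sum() / var.sum())  # top two components capture almost all variance
```

The nonlinear methods in the readings replace the Euclidean Gram matrix used here with one built from geodesic distances or local constraints, then take the same top eigenvectors.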
Support vector machines: lecture notes
LIBLINEAR: A Library for Large Linear Classification
LIBSVM: A library for support vector machines
Working Set Selection Using Second Order Information for Training SVM, R.-E. Fan, P.-H. Chen, and C.-J. Lin
An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, CRISTIANINI, Nello, and John SHAWE-TAYLOR, Cambridge, UK: Cambridge University Press
Week 13, May 16.
Graph and network flow problems: shortest path problems: lecture notes
“Network Flows: Theory, Algorithms, and Applications”, Ahuja, Magnanti, and Orlin
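Dijkstra's algorithm, the standard label-setting method for shortest paths with nonnegative arc lengths covered in Ahuja–Magnanti–Orlin, fits in a few lines with a binary heap. A sketch on a tiny illustrative graph:

```python
import heapq

def dijkstra(adj, s):
    """Shortest-path distances from s; adj[u] = list of (v, weight >= 0)."""
    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                       # stale heap entry, skip
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w            # relax arc (u, v)
                heapq.heappush(heap, (d + w, v))
    return dist

adj = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("t", 6)], "b": [("t", 3)]}
print(dijkstra(adj, "s"))  # {'s': 0, 'a': 1, 'b': 3, 't': 6}
```

Lazy deletion (skipping stale heap entries instead of decreasing keys) keeps the implementation short at an O(E log E) cost, versus O(E + V log V) for a Fibonacci-heap variant.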
Graph and network flow problems: maximum flow, combinatorial optimization and linear programming: lecture notes
“Network Flows: Theory, Algorithms, and Applications”, Ahuja, Magnanti, and Orlin
Link analysis: PageRank: lecture notes
Chapter 5 (Link Analysis), Mining of Massive Datasets
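PageRank, as described in the chapter above, is the stationary distribution of a random surfer who follows links with probability β and teleports uniformly otherwise; power iteration computes it. A sketch (the three-page graph and the handling of dangling nodes are illustrative choices):

```python
import numpy as np

def pagerank(links, beta=0.85, n_iter=100):
    """Power iteration for PageRank; links[i] = pages that page i links to."""
    n = len(links)
    r = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        r_new = np.full(n, (1.0 - beta) / n)   # teleportation mass
        for i, outs in enumerate(links):
            if outs:
                for j in outs:
                    r_new[j] += beta * r[i] / len(outs)
            else:
                r_new += beta * r[i] / n       # dangling node: spread evenly
        r = r_new
    return r

# pages 0 and 1 both link to page 2, which links back to them,
# so page 2 should rank highest
r = pagerank([[2], [2], [0, 1]])
print(r, r.argmax())  # ranks sum to 1; page 2 wins
```

Each iteration preserves the total mass of 1, and the iteration contracts at rate β, so about 100 iterations suffice for near machine-level accuracy on small graphs.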
Week 14, May 20, 2–5 pm, Science Teaching Building Room 313. Submodular optimization and data mining: lecture notes
Learning with Submodular Functions: A Convex Optimization Perspective, Francis Bach
Learning with Submodular Functions, Francis Bach
EE596B - Submodular Functions, Optimization, and Applications to Machine Learning, Prof. Jeff A. Bilmes
Week 15, May 30. Dragon Boat Festival, no class
Week 16, June 6. Reinforcement learning: lecture notes
Homework and course projects; important dates and deliverables:
Before class on March 7, Homework 1: Convex Optimization, exercises 4.11, 4.12, 4.25, 4.27; Convex Optimization, Additional Exercises, problems 3.11, 3.12
By March 7, submit group membership and topic selection for Course Project 1; the specific topic may still be changed until the midterm report is due
By March 7, sign up for a lecture-notes topic on the following page
Before class on March 14, Homework 2: Convex Optimization, exercises 5.11, 5.17, 5.19; “Numerical Optimization”, Jorge Nocedal and Stephen Wright, Springer, exercises 13.9, 14.6
Before class on March 28, Homework 3 (file)
By 12:00 midnight Beijing time on April 16, Homework 4 (file)
demo: sparse_l1_example.m
By midnight on April 25, submit the midterm report for Project 1 by email to the TAs (pkuopt@163.com, wendouble@gmail.com)
The midterm report should be at most 3 single-sided pages. It should describe your group's topic, the work completed so far, and what you plan to do next.
Name the submitted file “proj1rep1-name1-name2.pdf”. For example, if the members are Pengyu Qian and Junzi Zhang, the file name is proj1rep1-PengyuQian-JunziZhang.pdf
By midnight on May 9, Homework 5 (file)
By midnight on May 16, lecture notes midterm check: email to the TAs (pkuopt@163.com, wendouble@gmail.com)
Before class on June 6, Homework 6 (file)
By midnight on June 20, written report for Project 1 (package the LaTeX source, code, etc. and email to the TAs (pkuopt@163.com, wendouble@gmail.com))
Package all submitted files into a single archive named “proj1-name1-name2.zip”
By midnight on June 20, lecture notes (package the LaTeX source etc. and email to the TAs (pkuopt@163.com, wendouble@gmail.com))
Package all submitted files into a single archive named “lectx-name.zip”
Preparing lecture notes
Course projects
Grading
Late submissions are penalized 10% per day (24 hours); homework and projects more than 4 days late are not accepted (no exceptions for any reason)
Homework, including exercises and programming: 30%
Lecture notes preparation: 10%
Course projects: 60% (Homework 4 counts as a course project)
Homework requirements: i) For computational problems, show the necessary derivation steps; for proofs, give the key reasoning and arguments. For numerical experiments, submit both a written report and the code; the report should contain detailed derivations as well as the numerical results and their analysis.
ii) You may discuss with classmates or ask the TAs for help, but direct copying during discussion is not allowed; you must complete the work independently afterwards. iii) Direct copying from other students, the internet, previous years' solutions, other courses, or any other source is strictly prohibited. iv) If you had discussions or received help from any source, list the sources.
Details of Course Project 1
Groups: form your own groups, normally 2 to 3 people; working alone is also allowed
Topic selection
Important dates and deliverables:
By March 7, submit group membership and topic selection; the specific topic may still be changed until the midterm report is due
By midnight on April 18 (late reports not accepted), email the midterm report to the TAs (pkuopt@163.com, wendouble@gmail.com)
The midterm report should be at most 3 single-sided pages. It should describe your group's topic, the work completed so far, and what you plan to do next.
Name the submitted file “proj1rep1-name1-name2.pdf”. For example, if the members are Pengyu Qian and Junzi Zhang, the file name is proj1rep1-PengyuQian-JunziZhang.pdf
By midnight on June 20 (late reports not accepted), written report (package the LaTeX source, code, etc. and email to the TAs (pkuopt@163.com, wendouble@gmail.com))
Package all submitted files into a single archive named “proj1-name1-name2.zip”. For example, if the members are Pengyu Qian and Junzi Zhang, the file name is proj1-PengyuQian-JunziZhang.zip
Course project references