ECE/CS 6960/5960 Fundamentals of Cloud Systems

Project Ideas

Lecture 1 (01/08/2019) (slides, syllabus)

What is cloud computing
Examples on cloud computing (Amazon Web Services (AWS))
What we will/won't cover
Grading Polices

Part I: Cloud Computing

Lecture 2 (01/10/2019) (Lecture notes, Lecture Scribe 1, Lecture Scribe 2)

MapReduce and Spark
Coded MapReduce

References:
- Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (January 2008), 107-113.
- Zaharia, Matei, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. “Spark: Cluster computing with working sets.” HotCloud 10, no. 10-10 (2010): 95.
- Li, Songze, Mohammad Ali Maddah-Ali, and A. Salman Avestimehr. “Coded mapreduce.” In Communication, Control, and Computing (Allerton), 2015 53rd Annual Allerton Conference on, pp. 964-971. IEEE, 2015.
- Li, Songze, Sucha Supittayapornpong, Mohammad Ali Maddah-Ali, and Salman Avestimehr. “Coded terasort.” In Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2017 IEEE International, pp. 389-398. IEEE, 2017. (Coded TeraShort Implementations: here)
- Useful Slides 1, Slides 2
Further reading:
- Li, Songze, Mohammad Ali Maddah-Ali, Qian Yu, and A. Salman Avestimehr. “A fundamental tradeoff between computation and communication in distributed computing.” IEEE Transactions on Information Theory 64, no. 1 (2018): 109-128.
- Ji, Mingyue, Giuseppe Caire, and Andreas F. Molisch. “Fundamental limits of caching in wireless D2D networks.” IEEE Transactions on Information Theory 62, no. 2 (2016): 849-869.

Lecture 3 (01/15/2019) (Lecture notes, Lecture Scribe 1, Lecture Sribe 2)

Coded Distributed Computing (CDC) and its constraints

References:
- Li, Songze, Mohammad Ali Maddah-Ali, Qian Yu, and A. Salman Avestimehr. “A fundamental tradeoff between computation and communication in distributed computing.” IEEE Transactions on Information Theory 64, no. 1 (2018): 109-128.
Further reading:
- Li, Songze, Qian Yu, Mohammad Ali Maddah-Ali, and A. Salman Avestimehr. “A scalable framework for wireless distributed computing.” IEEE/ACM Transactions on Networking 25, no. 5 (2017): 2643-2654.
- Li, Songze, Mohammad Ali Maddah-Ali, and A. Salman Avestimehr. “Coding for distributed fog computing.” IEEE Communications Magazine 55, no. 4 (2017): 34-40.
- Reisizadeh, Amirhossein, Saurav Prakash, Ramtin Pedarsani, and Amir Salman Avestimehr. “Coded computation over heterogeneous clusters.” arXiv preprint arXiv:1701.05973 (2017).
- Kiamari, Mehrdad, Chenwei Wang, and A. Salman Avestimehr. “On heterogeneous coded distributed computing.” In GLOBECOM 2017-2017 IEEE Global Communications Conference, pp. 1-7. IEEE, 2017.
- Ezzeldin, Yahya H., Mohammed Karmoose, and Christina Fragouli. “Communication vs distributed computation: an alternative trade-off curve.” In Information Theory Workshop (ITW), 2017 IEEE, pp. 279-283. IEEE, 2017.
- Prakash, Saurav, Amirhossein Reisizadeh, Ramtin Pedarsani, and Salman Avestimehr. “Coded Computing for Distributed Graph Analytics.” arXiv preprint arXiv:1801.05522 (2018). (Coded PageRank Implementations: here)
- Srinivasavaradhan, Sundara Rajan, Linqi Song, and Christina Fragouli. “Distributed Computing Trade-offs with Random Connectivity.” In 2018 IEEE International Symposium on Information Theory (ISIT), pp. 1281-1285. IEEE, 2018.
- Song, Linqi, Sundara Rajan Srinivasavaradhan, and Christina Fragouli. “The benefit of being flexible in distributed computation.” In Information Theory Workshop (ITW), 2017 IEEE, pp. 289-293. IEEE, 2017.
- Li, Songze, Mohammad Ali Maddah-Ali, and A. Salman Avestimehr. “Compressed Coded Distributed Computing.” arXiv preprint arXiv:1805.01993 (2018).

Lecture 4 (01/17/2019) (Lecture notes part 1, Lecture notes part 2, Lecture Scribe 1, Lecture Scribe 2)

General Scheme for CDC
Coded Distributed Computing (CDC) with a hypercube design
Cascaded Coded Distributed Computing

Refeferences:
- Woolsey, Nicholas, Rong-Rong Chen and Mingyue Ji, “A New Combinatorial Design of Coded Distributed Computing,” 2018 IEEE International Symposium on Information Theory (ISIT), Vail, CO, 2018, pp. 726-730.
- Li, Songze, Mohammad Ali Maddah-Ali, Qian Yu, and A. Salman Avestimehr. “A fundamental tradeoff between computation and communication in distributed computing.” IEEE Transactions on Information Theory 64, no. 1 (2018): 109-128.
Further reading:
- Konstantinidis, Konstantinos, and Aditya Ramamoorthy. “Leveraging Coding Techniques for Speeding up Distributed Computing.” arXiv preprint arXiv:1802.03049 (2018).
- Woolsey, Nicholas, Rong-Rong Chen and Mingyue Ji, “A New Combinatorial Design of Coded Distributed Computing,” 2018 IEEE International Symposium on Information Theory (ISIT), Vail, CO, 2018, pp. 726-730.
- Woolsey, Nicholas, Rong-Rong Chen, and Mingyue Ji. “Cascaded Coded Distributed Computing on Heterogeneous Networks.” arXiv preprint arXiv:1901.07670 (2019).
- Yang, Yaoqing, Matteo Interlandi, Pulkit Grover, Soummya Kar, Saeed Amizadeh, and Markus Weimer. “Coded Elastic Computing.” arXiv preprint arXiv:1812.06411 (2018).

Lecture 5 (01/21/2019) (Slides)

Overview of error control codes

References:
- Roth, Ron. Introduction to coding theory. Cambridge University Press, 2006.

Lecture 6 (01/23/2019) (Lecture notes, Lecture Scribe 1, Lecture Scribe 2)

Straggler Mitigation using Codes
1. MDS Codes
2. Prouct Codes
3. Polynomial Codes

References:
- Lee, Kangwook, Maximilian Lam, Ramtin Pedarsani, Dimitris Papailiopoulos, and Kannan Ramchandran. “Speeding up distributed machine learning using codes.” IEEE Transactions on Information Theory 64, no. 3 (2018): 1514-1529.
- Lee, Kangwook, Changho Suh, and Kannan Ramchandran. “High-dimensional coded matrix multiplication.” In Information Theory (ISIT), 2017 IEEE International Symposium on, pp. 2418-2422. IEEE, 2017.
- Yu, Qian, Mohammad Maddah-Ali, and Salman Avestimehr. “Polynomial codes: an optimal design for high-dimensional coded matrix multiplication.” In Advances in Neural Information Processing Systems, pp. 4403-4413. 2017.
Further Reading:
- Huang, Kuang-Hua. “Algorithm-based fault tolerance for matrix operations.” IEEE transactions on computers 100, no. 6 (1984): 518-528.
- Lee, Kangwook, Maximilian Lam, Ramtin Pedarsani, Dimitris Papailiopoulos, and Kannan Ramchandran. “Speeding up distributed machine learning using codes.” IEEE Transactions on Information Theory 64, no. 3 (2018): 1514-1529.
- Yu, Qian, Mohammad Maddah-Ali, and Salman Avestimehr. “Polynomial codes: an optimal design for high-dimensional coded matrix multiplication.” In Advances in Neural Information Processing Systems, pp. 4403-4413. 2017.
- Yu, Qian, Mohammad Ali Maddah-Ali, and A. Salman Avestimehr. “Coded fourier transform.” In Communication, Control, and Computing (Allerton), 2017 55th Annual Allerton Conference on, pp. 494-501. IEEE, 2017.
- Tandon, Rashish, Qi Lei, Alexandros G. Dimakis, and Nikos Karampatziakis. “Gradient coding: Avoiding stragglers in distributed learning.” In International Conference on Machine Learning, pp. 3368-3376. 2017.
- Raviv, Netanel, Itzhak Tamo, Rashish Tandon, and Alexandros G. Dimakis. “Gradient coding from cyclic MDS codes and expander graphs.” arXiv preprint arXiv:1707.03858 (2017).
- Ye, Min, and Emmanuel Abbe. “Communication-computation efficient gradient coding.” arXiv preprint arXiv:1802.03475 (2018).

Lecture 7 (01/28/2019) (Lecture Notes, Lecture Scribe 1, Lecture Scribe 2)

Straggler Mitigation using Codes
1. Polynomial Codes
2. MatDot Codes

References:
- Yu, Qian, Mohammad Maddah-Ali, and Salman Avestimehr. “Polynomial codes: an optimal design for high-dimensional coded matrix multiplication.” In Advances in Neural Information Processing Systems, pp. 4403-4413. 2017.
- Dutta, Sanghamitra, Mohammad Fahim, Farzin Haddadpour, Haewon Jeong, Viveck Cadambe, and Pulkit Grover. “On the optimal recovery threshold of coded matrix multiplication.” arXiv preprint arXiv:1801.10292 (2018).
Further Reading:
- Yu, Qian, Mohammad Maddah-Ali, and Salman Avestimehr. “Polynomial codes: an optimal design for high-dimensional coded matrix multiplication.” In Advances in Neural Information Processing Systems, pp. 4403-4413. 2017.
- Dutta, Sanghamitra, Mohammad Fahim, Farzin Haddadpour, Haewon Jeong, Viveck Cadambe, and Pulkit Grover. “On the optimal recovery threshold of coded matrix multiplication.” arXiv preprint arXiv:1801.10292 (2018).
- Dutta, Sanghamitra, Viveck Cadambe, and Pulkit Grover. “Short-dot: Computing large linear transforms distributedly using coded short dot products.” In Advances In Neural Information Processing Systems, pp. 2100-2108. 2016.
- Dutta, Sanghamitra, Viveck Cadambe, and Pulkit Grover. “Coded convolution for parallel and distributed computing within a deadline.” In Information Theory (ISIT), 2017 IEEE International Symposium on, pp. 2403-2407. IEEE, 2017.
- Sheth, Utsav, Sanghamitra Dutta, Malhar Chaudhari, Haewon Jeong, Yaoqing Yang, Jukka Kohonen, Teemu Roos, and Pulkit Grover. “An Application of Storage-Optimal MatDot Codes for Coded Matrix Multiplication: Fast k-Nearest Neighbors Estimation.” arXiv preprint arXiv:1811.11811 (2018).
- Dutra, Sanghamitra, Ziqian Bai, Haewon Jeong, Tze Meng Low, and Pulkit Grover. “A unified coded deep neural network training strategy based on generalized polydot codes.” In 2018 IEEE International Symposium on Information Theory (ISIT), pp. 1585-1589. IEEE, 2018.
- Yu, Qian, Mohammad Ali Maddah-Ali, and A. Salman Avestimehr. “Coded fourier transform.” In Communication, Control, and Computing (Allerton), 2017 55th Annual Allerton Conference on, pp. 494-501. IEEE, 2017.
- Tandon, Rashish, Qi Lei, Alexandros G. Dimakis, and Nikos Karampatziakis. “Gradient coding: Avoiding stragglers in distributed learning.” In International Conference on Machine Learning, pp. 3368-3376. 2017.
- Raviv, Netanel, Itzhak Tamo, Rashish Tandon, and Alexandros G. Dimakis. “Gradient coding from cyclic MDS codes and expander graphs.” arXiv preprint arXiv:1707.03858 (2017).
- Ye, Min, and Emmanuel Abbe. “Communication-computation efficient gradient coding.” arXiv preprint arXiv:1802.03475 (2018).
- Yang, Yaoqing, Pulkit Grover, and Soummya Kar. “Coded distributed computing for inverse problems.” In Advances in Neural Information Processing Systems, pp. 709-719. 2017.
- Haddadpour, Farzin, Yaoqing Yang, Malhar Chaudhari, Viveck R. Cadambe, and Pulkit Grover. “Straggler-resilient and communication-efficient distributed iterative linear solver.” arXiv preprint arXiv:1806.06140 (2018).
- Kiani, Shahrzad, Nuwan Ferdinand, and Stark C. Draper. “Exploitation of stragglers in coded computation.” In 2018 IEEE International Symposium on Information Theory (ISIT), pp. 1988-1992. IEEE, 2018.
- Ferdinand, Nuwan, and Stark C. Draper. “Hierarchical coded computation.” In 2018 IEEE International Symposium on Information Theory (ISIT), pp. 1620-1624. IEEE, 2018.

Lecture 8 (01/31/2019) (Lecture Notes, Lecture Scribe 1, Lecture Scribe 2)

Straggler Mitigation using Codes
1. Polydot Codes
2. Entangled Polynomial Codes/Generalized Polydot Codes
Overview of Probability and Random Processes

References:
- Yu, Qian, Mohammad Maddah-Ali, and Salman Avestimehr. “Polynomial codes: an optimal design for high-dimensional coded matrix multiplication.” In Advances in Neural Information Processing Systems, pp. 4403-4413. 2017.
- Dutta, Sanghamitra, Mohammad Fahim, Farzin Haddadpour, Haewon Jeong, Viveck Cadambe, and Pulkit Grover. “On the optimal recovery threshold of coded matrix multiplication.” arXiv preprint arXiv:1801.10292 (2018).
- Grimmett, Geoffrey, and David Stirzaker. Probability and random processes. Oxford university press, 2001.
- Gallager, Robert G. Stochastic processes: theory for applications. Cambridge University Press, 2013.
Further Reading:
- Same as in Lecture 7

Lecture 9 (02/05/2019) (Slides Probablity and Random Processes Overview, Slides)

Overview of Probability and Random Processes
Using Efficient Redundancy to Reduce Latency in Cloud Systems

References:
- Grimmett, Geoffrey, and David Stirzaker. Probability and random processes. Oxford university press, 2001.
- Gallager, Robert G. Stochastic processes: theory for applications. Cambridge University Press, 2013.
- David, Herbert Aron, and Haikady Navada Nagaraja. “Order statistics.” Encyclopedia of Statistical Sciences 9 (2004) (Chapter 10).
- De Haan, Laurens, and Ana Ferreira. Extreme value theory: an introduction. Springer Science & Business Media, 2007 (Chapter 1)
- Wang, Da, Gauri Joshi, and Gregory Wornell. “Using straggler replication to reduce latency in large-scale parallel computing.” ACM SIGMETRICS Performance Evaluation Review 43, no. 3 (2015): 7-11.

Lecture 10 (02/07/2019) (Slides)

Using Efficient Redundancy to Reduce Latency in Cloud Systems

References:
- Wang, Da, Gauri Joshi, and Gregory Wornell. “Using straggler replication to reduce latency in large-scale parallel computing.” ACM SIGMETRICS Performance Evaluation Review 43, no. 3 (2015): 7-11.
Further reading:
- S. Dutta, V. Cadambe and P. Grover, “Coded convolution for parallel and distributed computing within a deadline,” 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, 2017, pp. 2403-2407.
- Mallick, Ankur, Malhar Chaudhari, and Gauri Joshi. “Rateless codes for near-perfect load balancing in distributed matrix-vector multiplication.” arXiv preprint arXiv:1804.10331 (2018).
- Aktas, Mehmet Fatih, Pei Peng, and Emina Soljanin. “Effective straggler mitigation: Which clones should attack and when?.” arXiv preprint arXiv:1710.00748 (2017).

No Class on 02/12 due to ITA

Lecture 11 (02/14/2019) (Slides)

Overview of Queueing Theory

References:
- Harchol-Balter, Mor. Performance modeling and design of computer systems: queueing theory in action. Cambridge University Press, 2013. (Chapter 6)
- Gallager, Robert G. Stochastic processes: theory for applications. Cambridge University Press, 2013.
- Durrett, Richard, and R. Durrett. Essentials of stochastic processes. Vol. 1. New York: Springer, 1999.

Lecture 12 (02/19/2019) (Slides)

Tasks Assignment Policies in Server Farms

References:
- Harchol-Balter, Mor. Performance modeling and design of computer systems: queueing theory in action. Cambridge University Press, 2013. (Chapter 24)

Lecture 13 (02/21/2019) (Slides)

Using Efficient Redundancy of Queued Tasks to Reduce Latency and Computing Cost in Cloud Systems

References:
- Gauri Joshi, Emina Soljanin, and Gregory Wornell. “Efficient redundancy techniques for latency reduction in cloud systems.” ACM Transactions on Modeling and Performance Evaluation of Computing Systems (TOMPECS) 2, no. 2 (2017): 12.
Further Reading:
- Gauri Joshi, Emina Soljanin, and Gregory Wornell. “Efficient redundancy techniques for latency reduction in cloud systems.” ACM Transactions on Modeling and Performance Evaluation of Computing Systems (TOMPECS) 2, no. 2 (2017): 12.
- Aktas, Mehmet Fatih, Pei Peng, and Emina Soljanin. “Straggler mitigation by delayed relaunch of tasks.” arXiv preprint arXiv:1710.00414 (2017).
- Shuai, Qiqi, Victor OK Li, and Zhiyi Lu. “Which achieves lower latency with redundant requests, replication or coding?.” In GLOBECOM 2017-2017 IEEE Global Communications Conference, pp. 1-6. IEEE, 2017.
- Chakravarthy, Srinivas R., and Alexander Rumyantsev. “Efficient redundancy techniques in cloud and desktop grid systems using MAPGc-type queues.” Open Engineering 8, no. 1 (2018): 17-31.
- Joshi, Gauri. “Synergy via redundancy: Boosting service capacity with adaptive replication.” ACM SIGMETRICS Performance Evaluation Review 45, no. 2 (2018): 21-28.
- Wang, Weina, Mor Harchol-Balter, Haotian Jiang, Alan Scheller-Wolf, and R. Srikant. “Delay asymptotics and bounds for multi-task parallel jobs.” ACM SIGMETRICS Performance Evaluation Review 46, no. 3 (2019): 2-7.
- Kaler, Tim, Yuxiong He, and Sameh Elnikety. “Optimal Reissue Policies for Reducing Tail Latency.” In Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, pp. 195-206. ACM, 2017.
- Qiu, Zhan, Juan F. Pérez, and Peter G. Harrison. “Tackling latency via replication in distributed systems.” In Proceedings of the 7th ACM/SPEC on International Conference on Performance Engineering, pp. 197-208. ACM, 2016.
- Zaryadov, Ivan, Andrey Kradenyh, and Anastasiya Gorbunova. “The Analysis of Cloud Computing System as a Queueing System with Several Servers and a Single Buffer.” In International Conference on Analytical and Computational Methods in Probability Theory, pp. 11-22. Springer, Cham, 2017.
- Wang, Huajin, Jianhui Li, Zhihong Shen, and Yuanchun Zhou. “Approximations and Bounds for (n, k) Fork-Join Queues: A Linear Transformation Approach.” In 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), pp. 422-431. IEEE, 2018.
- Mukherjee, Debankur. “Scalable load balancing algorithms in networked systems.” arXiv preprint arXiv:1809.02018 (2018).

Lecture 14 (02/26/2019) (Slides)

Overview of Lyapunov Optimizaiton Framework

References:
- Neely, Michael J. “Stochastic network optimization with application to communication and queueing systems.” Synthesis Lectures on Communication Networks 3, no. 1 (2010): 1-211.
- Georgiadis, Leonidas, Michael J. Neely, and Leandros Tassiulas. “Resource allocation and cross-layer control in wireless networks.” Foundations and Trends® in Networking 1, no. 1 (2006): 1-144.

Lecture 15 (02/28/2019) (Slides)

Optimal Dynamic Cloud Network Control

References:
- Feng, Hao, Jaime Llorca, Antonia M. Tulino, and Andreas F. Molisch. “Optimal dynamic cloud network control.” IEEE/ACM Transactions on Networking (TON) 26, no. 5 (2018): 2118-2131.
Further Reading:
- Feng, Hao, Jaime Llorca, Antonia M. Tulino, and Andreas F. Molisch. “Optimal Control of Wireless Computing Networks.” IEEE Transactions on Wireless Communications 17, no. 12 (2018): 8283-8298.
- Wang, Chang-Heng, Jaime Llorca, Antonia M. Tulino, and Tara Javidi. “Dynamic Cloud Network Control under Reconfiguration Delay and Cost.” IEEE/ACM Transactions on Networking (2019).
- Feng, Hao, Jaime Llorca, Antonia M. Tulino, Danny Raz, and Andreas F. Molisch. “Approximation algorithms for the NFV service distribution problem.” In IEEE INFOCOM 2017-IEEE Conference on Computer Communications, pp. 1-9. IEEE, 2017.

Part II: Distributed Storage Theory

Lecture 16 (03/05/2019) (Lecture Notes, Lecture Scribe 1, Lecture Scribe 2)

Introduction to Distributed Storage Theory
Network Coding Theory

References;
- A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright and K. Ramchandran, “Network Coding for Distributed Storage Systems,” in IEEE Transactions on Information Theory, vol. 56, no. 9, pp. 4539-4551, Sept. 2010.
Further Reading:
- A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright and K. Ramchandran, “Network Coding for Distributed Storage Systems,” in IEEE Transactions on Information Theory, vol. 56, no. 9, pp. 4539-4551, Sept. 2010.
- A. G. Dimakis, K. Ramchandran, Y. Wu and C. Suh, “A Survey on Network Codes for Distributed Storage,” in Proceedings of the IEEE, vol. 99, no. 3, pp. 476-489, March 2011.
- Li, S-YR, Raymond W. Yeung, and Ning Cai. “Linear network coding.” IEEE transactions on information theory 49, no. 2 (2003): 371-381.
- Koetter, Ralf, and Muriel Médard. “An algebraic approach to network coding.” IEEE/ACM Transactions on Networking (TON) 11, no. 5 (2003): 782-795.
- T. Ho et al., “A Random Linear Network Coding Approach to Multicast,” in IEEE Transactions on Information Theory, vol. 52, no. 10, pp. 4413-4430, Oct. 2006.

Lecture 17 (03/07/2019) (Lecture Notes, Lecture Scribe 1, Lecture Scribe 2)

Information Flow Graph and Applications

References;
- A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright and K. Ramchandran, “Network Coding for Distributed Storage Systems,” in IEEE Transactions on Information Theory, vol. 56, no. 9, pp. 4539-4551, Sept. 2010.
Further Reading:
- Same as in Lecture 16

Lecture 18 (03/19/2019) (Lecture Notes, Lecture Scribe 1, Lecture Scribe 2)

Storage-Bandwidth Tradeoff
1. Proofs
2. Minimum Bandwidth Regenerating (MBR) codes
3. Minimum Storage Regerating (MSR) codes

References:
- A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright and K. Ramchandran, “Network Coding for Distributed Storage Systems,” in IEEE Transactions on Information Theory, vol. 56, no. 9, pp. 4539-4551, Sept. 2010.
Further Reading:
- Same as in Lecture 16

Lecture 19 (03/21/2019) (Lecture Notes, guest lecture by Nicholas Woolsey , Lecture Scribe 1, Lecture Scribe 2)

Private Information Retrieval

References:
- Sun, Hua, and Syed Ali Jafar. “The capacity of private information retrieval.” IEEE Transactions on Information Theory 63, no. 7 (2017): 4075-4088.
Further Reading:
- Sun, Hua, and Syed Ali Jafar. “The capacity of private information retrieval.” IEEE Transactions on Information Theory 63, no. 7 (2017): 4075-4088.
- Sun, Hua, and Syed Ali Jafar. “The capacity of robust private information retrieval with colluding databases.” IEEE Transactions on Information Theory 64, no. 4 (2018): 2361-2370.
- Sun, Hua, and Syed Ali Jafar. “The capacity of symmetric private information retrieval.” IEEE Transactions on Information Theory 65, no. 1 (2019): 322-329.
- Sun, Hua, and Syed Ali Jafar. “Private Information Retrieval from MDS Coded Data With Colluding Servers: Settling a Conjecture by Freij-Hollantiet al.” IEEE Transactions on Information Theory 64, no. 2 (2018): 1000-1022.
- Sun, Hua, and Syed Ali Jafar. “Optimal download cost of private information retrieval for arbitrary message length.” IEEE Transactions on Information Forensics and Security 12, no. 12 (2017): 2920-2932.
- Sun, Hua, and Syed Ali Jafar. “Multiround private information retrieval: Capacity and storage overhead.” IEEE Transactions on Information Theory 64, no. 8 (2018): 5743-5754.
- Sun, Hua, and Syed Ali Jafar. “The capacity of private computation.” IEEE Transactions on Information Theory (2018).
- Tian, Chao, Hua Sun, and Jun Chen. “Capacity-achieving private information retrieval codes with optimal message size and upload cost.” arXiv preprint arXiv:1808.07536 (2018).
- Banawan, Karim, and Sennur Ulukus. “The capacity of private information retrieval from coded databases.” IEEE Transactions on Information Theory 64, no. 3 (2018): 1945-1956.
- Banawan, Karim, and Sennur Ulukus. “The capacity of private information retrieval from Byzantine and colluding databases.” IEEE Transactions on Information Theory 65, no. 2 (2019): 1206-1219.
- Banawan, Karim, and Sennur Ulukus. “Multi-message private information retrieval: Capacity results and near-optimal schemes.” IEEE Transactions on Information Theory 64, no. 10 (2018): 6842-6862.
- Wei, Yi-Peng, Karim Banawan, and Sennur Ulukus. “Fundamental limits of cache-aided private information retrieval with unknown and uncoded prefetching.” IEEE Transactions on Information Theory (2018).
- Banawan, Karim, Batuhan Arasli, Yi-Peng Wei, and Sennur Ulukus. “The Capacity of Private Information Retrieval from Heterogeneous Uncoded Caching Databases.” arXiv preprint arXiv:1902.09512 (2019).
- Wei, Yi-Peng, Batuhan Arasli, Karim Banawan, and Sennur Ulukus. “The capacity of private information retrieval from decentralized uncoded caching databases.” arXiv preprint arXiv:1811.11160 (2018).
- Wei, Yi-Peng, Karim Banawan, and Sennur Ulukus. “Cache-aided private information retrieval with partially known uncoded prefetching: Fundamental limits.” IEEE Journal on Selected Areas in Communications 36, no. 6 (2018): 1126-1139.
- Attia, Mohamed Adel, Deepak Kumar, and Ravi Tandon. “The capacity of private information retrieval from uncoded storage constrained databases.” arXiv preprint arXiv:1805.04104 (2018).
- Tandon, Ravi. “The capacity of cache aided private information retrieval.” In 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1078-1082. IEEE, 2017.
- Wang, Qiwen, and Mikael Skoglund. “Symmetric Private Information Retrieval from MDS Coded Distributed Storage with Non-colluding and Colluding Servers.” IEEE Transactions on Information Theory (2019).
- Wang, Qiwen, Hua Sun, and Mikael Skoglund. “The capacity of private information retrieval with eavesdroppers.” IEEE Transactions on Information Theory (2018).
- Shah, Nihar B., K. V. Rashmi, and Kannan Ramchandran. “One extra bit of download ensures perfectly private information retrieval.” In 2014 IEEE International Symposium on Information Theory, pp. 856-860. IEEE, 2014.
- Fanti, Giulia, and Kannan Ramchandran. “Efficient private information retrieval over unsynchronized databases.” IEEE Journal of Selected Topics in Signal Processing 9, no. 7 (2015): 1229-1239.
- Mirmohseni, Mahtab, and Mohammad Ali Maddah-Ali. “Private function retrieval.” In 2018 Iran Workshop on Communication and Information Theory (IWCIT), pp. 1-6. IEEE, 2018.
- Woolsey, Nicholas, Rong-Rong Chen, and Mingyue Ji. “A New Design of Private Information Retrieval for Storage Constrained Databases.” arXiv preprint arXiv:1901.07490 (2019).

Lecture 20 (03/26/2019) (Lecture Notes, Lecture Scribe 1, Lecture Scribe 2, Lecture Scribe 3)

Exact Repair: Minimum Bandwidth Regenerating Codes
1. Fractional Repetition Code

References:
- Rashmi, K. V., Nihar B. Shah, P. Vijay Kumar, and Kannan Ramchandran. “Explicit construction of optimal exact regenerating codes for distributed storage.” In 2009 47th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1243-1249. IEEE, 2009.
- El Rouayheb, Salim, and Kannan Ramchandran. “Fractional repetition codes for repair in distributed storage systems.” In 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1510-1517. IEEE, 2010.
- Pawar, Sameer, Nima Noorshams, Salim El Rouayheb, and Kannan Ramchandran. “Dress codes for the storage cloud: Simple randomized constructions.” In 2011 IEEE International Symposium on Information Theory Proceedings, pp. 2338-2342. IEEE, 2011.
Further Reading:
- Rashmi, K. V., Nihar B. Shah, P. Vijay Kumar, and Kannan Ramchandran. “Explicit construction of optimal exact regenerating codes for distributed storage.” In 2009 47th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1243-1249. IEEE, 2009.
- El Rouayheb, Salim, and Kannan Ramchandran. “Fractional repetition codes for repair in distributed storage systems.” In 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1510-1517. IEEE, 2010.
- Pawar, Sameer, Nima Noorshams, Salim El Rouayheb, and Kannan Ramchandran. “Dress codes for the storage cloud: Simple randomized constructions.” In 2011 IEEE International Symposium on Information Theory Proceedings, pp. 2338-2342. IEEE, 2011.
- Rashmi, Korlakai Vinayak, Nihar B. Shah, and P. Vijay Kumar. “Optimal exact-regenerating codes for distributed storage at the MSR and MBR points via a product-matrix construction.” IEEE Transactions on Information Theory 57, no. 8 (2011): 5227-5239.
- Pawar, Sameer, Salim El Rouayheb, Hao Zhang, Kangwook Lee, and Kannan Ramchandran. “Codes for a distributed caching based video-on-demand system.” In 2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), pp. 1783-1787. IEEE, 2011.

Lecture 21 (03/28/2019) (Lecture Notes, Lecture Scribe 1, Lecture Scribe 2)

Exact Repair: Minimum Bandwidth Regenerating Codes
1. DRESS code
2. Product-Matrix Code

References:
- El Rouayheb, Salim, and Kannan Ramchandran. “Fractional repetition codes for repair in distributed storage systems.” In 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1510-1517. IEEE, 2010.
- Pawar, Sameer, Nima Noorshams, Salim El Rouayheb, and Kannan Ramchandran. “Dress codes for the storage cloud: Simple randomized constructions.” In 2011 IEEE International Symposium on Information Theory Proceedings, pp. 2338-2342. IEEE, 2011.
- Rashmi, Korlakai Vinayak, Nihar B. Shah, and P. Vijay Kumar. “Optimal exact-regenerating codes for distributed storage at the MSR and MBR points via a product-matrix construction.” IEEE Transactions on Information Theory 57, no. 8 (2011): 5227-5239.
Further Reading:
- Rashmi, K. V., Nihar B. Shah, P. Vijay Kumar, and Kannan Ramchandran. “Explicit construction of optimal exact regenerating codes for distributed storage.” In 2009 47th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1243-1249. IEEE, 2009.
- El Rouayheb, Salim, and Kannan Ramchandran. “Fractional repetition codes for repair in distributed storage systems.” In 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1510-1517. IEEE, 2010.
- Pawar, Sameer, Nima Noorshams, Salim El Rouayheb, and Kannan Ramchandran. “Dress codes for the storage cloud: Simple randomized constructions.” In 2011 IEEE International Symposium on Information Theory Proceedings, pp. 2338-2342. IEEE, 2011.
- Rashmi, Korlakai Vinayak, Nihar B. Shah, and P. Vijay Kumar. “Optimal exact-regenerating codes for distributed storage at the MSR and MBR points via a product-matrix construction.” IEEE Transactions on Information Theory 57, no. 8 (2011): 5227-5239.
- Pawar, Sameer, Salim El Rouayheb, Hao Zhang, Kangwook Lee, and Kannan Ramchandran. “Codes for a distributed caching based video-on-demand system.” In 2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), pp. 1783-1787. IEEE, 2011.

Lecture 22 (04/02/2019) (Lecture Notes, Lecture Scribe 1, Lecture Scribe 2)

Exact Repair: Minimum Storage Regenerating Codes
1. Product-Matrix Code
2. Exact Repair MDS Codes using Interference Alignment
3. Piggyback-RS Codes
Locally Repairable Codes

References:
- Wu, Yunnan, and Alexandros G. Dimakis. “Reducing repair traffic for erasure coding-based storage via interference alignment.” In 2009 IEEE International Symposium on Information Theory, pp. 2276-2280. IEEE, 2009.
- Rashmi, Korlakai Vinayak, Nihar B. Shah, and P. Vijay Kumar. “Optimal exact-regenerating codes for distributed storage at the MSR and MBR points via a product-matrix construction.” IEEE Transactions on Information Theory 57, no. 8 (2011): 5227-5239.
- Rashmi, K. V., Nihar B. Shah, and Kannan Ramchandran. “A piggybacking design framework for read-and download-efficient distributed storage codes.” IEEE Transactions on Information Theory 63, no. 9 (2017): 5802-5820.
- Sathiamoorthy, Maheswaran, Megasthenis Asteris, Dimitris Papailiopoulos, Alexandros G. Dimakis, Ramkumar Vadali, Scott Chen, and Dhruba Borthakur. “Xoring elephants: Novel erasure codes for big data.” In Proceedings of the VLDB Endowment, vol. 6, no. 5, pp. 325-336. VLDB Endowment, 2013.
Further Reading:
- Wu, Yunnan, and Alexandros G. Dimakis. “Reducing repair traffic for erasure coding-based storage via interference alignment.” In 2009 IEEE International Symposium on Information Theory, pp. 2276-2280. IEEE, 2009.
- Rashmi, K. V., Nihar B. Shah, and Kannan Ramchandran. “A piggybacking design framework for read-and download-efficient distributed storage codes.” IEEE Transactions on Information Theory 63, no. 9 (2017): 5802-5820.
- Sathiamoorthy, Maheswaran, Megasthenis Asteris, Dimitris Papailiopoulos, Alexandros G. Dimakis, Ramkumar Vadali, Scott Chen, and Dhruba Borthakur. “Xoring elephants: Novel erasure codes for big data.” In Proceedings of the VLDB Endowment, vol. 6, no. 5, pp. 325-336. VLDB Endowment, 2013.
- Rashmi, Korlakai Vinayak, Nihar B. Shah, and P. Vijay Kumar. “Optimal exact-regenerating codes for distributed storage at the MSR and MBR points via a product-matrix construction.” IEEE Transactions on Information Theory 57, no. 8 (2011): 5227-5239.
- Cadambe, Viveck R., Syed Ali Jafar, Hamed Maleki, Kannan Ramchandran, and Changho Suh. “Asymptotic interference alignment for optimal repair of MDS codes in distributed storage.” IEEE Transactions on Information Theory 59, no. 5 (2013): 2974-2987.
- Suh, Changho, and Kannan Ramchandran. “Exact-repair MDS code construction using interference alignment.” IEEE Transactions on Information Theory 57, no. 3 (2011): 1425-1442.
- Papailiopoulos, Dimitris S., and Alexandros G. Dimakis. “Locally repairable codes.” IEEE Transactions on Information Theory 60, no. 10 (2014): 5843-5855.

Part III: Distributed Machine Learning

Lecture 23 (04/04/2019) (Lecture Notes, Lecture Scribe 1, Lecture Scribe 2)

Distributed Stochastic Gradient Descent with Mini-batches
1. Parameter Server-Worker architecture
2. Convergence Analysis

References:
- Yin, Dong, Ashwin Pananjady, Max Lam, Dimitris Papailiopoulos, Kannan Ramchandran, and Peter Bartlett. “Gradient diversity: a key ingredient for scalable distributed learning.” arXiv preprint arXiv:1706.05699 (2017).
Further Reading:
- Yin, Dong, Ashwin Pananjady, Max Lam, Dimitris Papailiopoulos, Kannan Ramchandran, and Peter Bartlett. “Gradient diversity: a key ingredient for scalable distributed learning.” arXiv preprint arXiv:1706.05699 (2017).
- Bottou, Léon, Frank E. Curtis, and Jorge Nocedal. “Optimization methods for large-scale machine learning.” Siam Review 60, no. 2 (2018): 223-311.
- Bousquet, Olivier, and André Elisseeff. “Stability and generalization.” Journal of machine learning research 2, no. Mar (2002): 499-526.
- Chen, Jianmin, Xinghao Pan, Rajat Monga, Samy Bengio, and Rafal Jozefowicz. “Revisiting distributed synchronous SGD.” arXiv preprint arXiv:1604.00981 (2016).
- Cotter, Andrew, Ohad Shamir, Nati Srebro, and Karthik Sridharan. “Better mini-batch algorithms via accelerated gradient methods.” In Advances in neural information processing systems, pp. 1647-1655. 2011.
- De, Soham, Abhay Yadav, David Jacobs, and Tom Goldstein. “Big batch SGD: Automated inference using adaptive batch sizes.” arXiv preprint arXiv:1610.05792 (2016).
- Karimi, Hamed, Julie Nutini, and Mark Schmidt. “Linear convergence of gradient and proximal-gradient methods under the polyak-łojasiewicz condition.” In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 795-811. Springer, Cham, 2016.
- Lee, Jason D., Qihang Lin, Tengyu Ma, and Tianbao Yang. “Distributed stochastic variance reduced gradient methods by sampling extra data with replacement.” The Journal of Machine Learning Research 18, no. 1 (2017): 4404-4446.
- Li, Mu, Tong Zhang, Yuqiang Chen, and Alexander J. Smola. “Efficient mini-batch training for stochastic optimization.” In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 661-670. ACM, 2014.

Lecture 24 (04/09/2019) (Lecture Notes, Lecture Scribe 1)

Distributed Stochastic Gradient Descent with Mini-batches
1. Gradient Diversity and applications
Asynchronous Parallel Sparse Gradient Descent
1. Problem Formulation
2. Examples
3. Hogwild

References:
- Yin, Dong, Ashwin Pananjady, Max Lam, Dimitris Papailiopoulos, Kannan Ramchandran, and Peter Bartlett. “Gradient diversity: a key ingredient for scalable distributed learning.” arXiv preprint arXiv:1706.05699 (2017).
- Recht, Benjamin, Christopher Re, Stephen Wright, and Feng Niu. “Hogwild: A lock-free approach to parallelizing stochastic gradient descent.” In Advances in neural information processing systems, pp. 693-701. 2011.
- Mania, Horia, Xinghao Pan, Dimitris Papailiopoulos, Benjamin Recht, Kannan Ramchandran, and Michael I. Jordan. “Perturbed iterate analysis for asynchronous stochastic optimization.” arXiv preprint arXiv:1507.06970 (2015).
Further Reading:
- Yin, Dong, Ashwin Pananjady, Max Lam, Dimitris Papailiopoulos, Kannan Ramchandran, and Peter Bartlett. “Gradient diversity: a key ingredient for scalable distributed learning.” arXiv preprint arXiv:1706.05699 (2017).
- Recht, Benjamin, Christopher Re, Stephen Wright, and Feng Niu. “Hogwild: A lock-free approach to parallelizing stochastic gradient descent.” In Advances in neural information processing systems, pp. 693-701. 2011.
- Mania, Horia, Xinghao Pan, Dimitris Papailiopoulos, Benjamin Recht, Kannan Ramchandran, and Michael I. Jordan. “Perturbed iterate analysis for asynchronous stochastic optimization.” arXiv preprint arXiv:1507.06970 (2015).
- Pan, Xinghao, Maximilian Lam, Stephen Tu, Dimitris Papailiopoulos, Ce Zhang, Michael I. Jordan, Kannan Ramchandran, and Christopher Ré. “Cyclades: Conflict-free asynchronous machine learning.” In Advances in Neural Information Processing Systems, pp. 2568-2576. 2016.

Lecture 25 (04/11/2019) (Lecture Notes, Lecture notes (Asyn. Mini-batch SGD (didn't cover in class)), Lecture Scribe 1, Lecture Scribe 2)

Asynchronous Parallel Sparse Gradient Descent
1. Hogwild and convergence analysis
Asynchronous Mini-batch SGD (didn't cover in class)

References:
- Recht, Benjamin, Christopher Re, Stephen Wright, and Feng Niu. “Hogwild: A lock-free approach to parallelizing stochastic gradient descent.” In Advances in neural information processing systems, pp. 693-701. 2011.
- Mania, Horia, Xinghao Pan, Dimitris Papailiopoulos, Benjamin Recht, Kannan Ramchandran, and Michael I. Jordan. “Perturbed iterate analysis for asynchronous stochastic optimization.” arXiv preprint arXiv:1507.06970 (2015).
- Dutta, Sanghamitra, Gauri Joshi, Soumyadip Ghosh, Parijat Dube, and Priya Nagpurkar. “Slow and stale gradients can win the race: Error-runtime trade-offs in distributed SGD.” arXiv preprint arXiv:1803.01113 (2018).
Further Reading:
- Recht, Benjamin, Christopher Re, Stephen Wright, and Feng Niu. “Hogwild: A lock-free approach to parallelizing stochastic gradient descent.” In Advances in neural information processing systems, pp. 693-701. 2011.
- Mania, Horia, Xinghao Pan, Dimitris Papailiopoulos, Benjamin Recht, Kannan Ramchandran, and Michael I. Jordan. “Perturbed iterate analysis for asynchronous stochastic optimization.” arXiv preprint arXiv:1507.06970 (2015).
- Dutta, Sanghamitra, Gauri Joshi, Soumyadip Ghosh, Parijat Dube, and Priya Nagpurkar. “Slow and stale gradients can win the race: Error-runtime trade-offs in distributed SGD.” arXiv preprint arXiv:1803.01113 (2018).
- Bottou, Léon, Frank E. Curtis, and Jorge Nocedal. “Optimization methods for large-scale machine learning.” Siam Review 60, no. 2 (2018): 223-311.

Lecture 26 (04/16/2019) (Slides)

Federated Learning
1. System Design
2. Detailed Implementations using Raspberry Pi's and Sochket Programming

References:
- Wang, Shiqiang, Tiffany Tuor, Theodoros Salonidis, Kin K. Leung, Christian Makaya, Ting He, and Kevin Chan. “When edge meets learning: Adaptive control for resource-constrained distributed machine learning.” In IEEE INFOCOM 2018-IEEE Conference on Computer Communications, pp. 63-71. IEEE, 2018.
- Tuor, Tiffany, Shiqiang Wang, Theodoras Salonidis, Bong Jun Ko, and Kin K. Leung. “Demo abstract: Distributed machine learning at resource-limited edge nodes.” In IEEE INFOCOM 2018-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 1-2. IEEE, 2018.
- Wang, Shiqiang, Tiffany Tuor, Theodoros Salonidis, Kin K. Leung, Christian Makaya, Ting He, and Kevin Chan. “Adaptive federated learning in resource constrained edge computing systems.” IEEE Journal on Selected Areas in Communications (2019).
Further Reading:
- Konečný, Jakub, H. Brendan McMahan, Felix X. Yu, Peter Richtárik, Ananda Theertha Suresh, and Dave Bacon. “Federated learning: Strategies for improving communication efficiency.” arXiv preprint arXiv:1610.05492 (2016).
- Wang, Shiqiang, Tiffany Tuor, Theodoros Salonidis, Kin K. Leung, Christian Makaya, Ting He, and Kevin Chan. “When edge meets learning: Adaptive control for resource-constrained distributed machine learning.” In IEEE INFOCOM 2018-IEEE Conference on Computer Communications, pp. 63-71. IEEE, 2018.
- Tuor, Tiffany, Shiqiang Wang, Theodoras Salonidis, Bong Jun Ko, and Kin K. Leung. “Demo abstract: Distributed machine learning at resource-limited edge nodes.” In IEEE INFOCOM 2018-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 1-2. IEEE, 2018.
- Wang, Shiqiang, Tiffany Tuor, Theodoros Salonidis, Kin K. Leung, Christian Makaya, Ting He, and Kevin Chan. “Adaptive federated learning in resource constrained edge computing systems.” IEEE Journal on Selected Areas in Communications (2019).
- Tuor, Tiffany, Shiqiang Wang, Kin K. Leung, and Kevin Chan. “Distributed machine learning in coalition environments: overview of techniques.” In 2018 21st International Conference on Information Fusion (FUSION), pp. 814-821. IEEE, 2018.
- Yousefpour, Ashkan, Caleb Fung, Tam Nguyen, Krishna Kadiyala, Fatemeh Jalali, Amirreza Niakanlahiji, Jian Kong, and Jason P. Jue. “All one needs to know about fog computing and related edge computing paradigms: a complete survey.” Journal of Systems Architecture (2019).
- Yang, Qiang, Yang Liu, Tianjian Chen, and Yongxin Tong. “Federated Machine Learning: Concept and Applications.” ACM Transactions on Intelligent Systems and Technology (TIST) 10, no. 2 (2019): 12.
- B. McMahan and D. Ramage, “Federated learning: Collaborative machine learning without centralized training data,” Apr. 2017. Online. Available: https:ai.googleblog.com201704/federated- learning-collaborative.html
- J. Wang and G. Joshi, “Cooperative SGD: A unified framework for the design and analysis of communication-efficient SGD algorithms,” arXiv, Jan. 2019. Online. Available: http:arxiv.orgabs1808.07576
- J. Wang and G. Joshi, “Adaptive communication strategies to achieve the best error-runtime trade-off in local-update SGD,” in SysML, Mar.–Apr. 2019. Online. Available: http:arxiv.orgabs1810.08313

Lecture 27 (04/18/2019) (Lecture Notes, Lecture Notes from Xiang Zhang)

Federated Learning
1. Convergence Analysis

References:
- S. Wang, T. Tuor, T. Salonidis, K. K. Leung, C. Makaya, T. He, and K. Chan, “Adaptive federated learning in resource constrained edge computing systems,” Tech. Rep., 2019. Online. Available: https:arxiv.orgabs1804.05271
Further Reading:
- Same as in Lecture 26

Lecture 28 (04/23/2019) (Lecture Notes, Lecture Scribe 1, Lecture Scribe 2)

Communication Bottleneck and Gradient Quantization

References:
- Alistarh, Dan, Demjan Grubic, Jerry Li, Ryota Tomioka, and Milan Vojnovic. “QSGD: Communication-efficient SGD via gradient quantization and encoding.” In Advances in Neural Information Processing Systems, pp. 1709-1720. 2017.
- Wen, Wei, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. “Terngrad: Ternary gradients to reduce communication in distributed deep learning.” In Advances in neural information processing systems, pp. 1509-1519. 2017.
Further Reading:
- Frank Seide, Hao Fu, Jasha Droppo, Gang Li, and Dong Yu. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech dnns. In INTERSPEECH, 2014.
- Nikko Strom. Scalable distributed DNN training using commodity GPU cloud computing. In INTERSPEECH, 2015.
- Alistarh, Dan, Demjan Grubic, Jerry Li, Ryota Tomioka, and Milan Vojnovic. “QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding.” arXiv preprint arXiv:1610.02132 (2016).
- Wen, Wei, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. “TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning.” arXiv preprint arXiv:1705.07878 (2017).
- Christopher M De Sa, Ce Zhang, Kunle Olukotun, and Christopher Ré. Taming the wild: A unified analysis of hogwild-style algorithms. In NIPS, 2015.

ECE/CS 6960/5960 Fundamentals of Cloud Systems

Project Ideas

Lecture 1 (01/08/2019) (slides, syllabus)

Part I: Cloud Computing

Lecture 2 (01/10/2019) (Lecture notes, Lecture Scribe 1, Lecture Scribe 2)

Lecture 3 (01/15/2019) (Lecture notes, Lecture Scribe 1, Lecture Sribe 2)

Lecture 4 (01/17/2019) (Lecture notes part 1, Lecture notes part 2, Lecture Scribe 1, Lecture Scribe 2)

Lecture 5 (01/21/2019) (Slides)

Lecture 6 (01/23/2019) (Lecture notes, Lecture Scribe 1, Lecture Scribe 2)

Lecture 7 (01/28/2019) (Lecture Notes, Lecture Scribe 1, Lecture Scribe 2)

Lecture 8 (01/31/2019) (Lecture Notes, Lecture Scribe 1, Lecture Scribe 2)

Lecture 9 (02/05/2019) (Slides Probablity and Random Processes Overview, Slides)

Lecture 10 (02/07/2019) (Slides)

No Class on 02/12 due to ITA

Lecture 11 (02/14/2019) (Slides)

Lecture 12 (02/19/2019) (Slides)

Lecture 13 (02/21/2019) (Slides)

Lecture 14 (02/26/2019) (Slides)

Lecture 15 (02/28/2019) (Slides)

Part II: Distributed Storage Theory

Lecture 16 (03/05/2019) (Lecture Notes, Lecture Scribe 1, Lecture Scribe 2)

Lecture 17 (03/07/2019) (Lecture Notes, Lecture Scribe 1, Lecture Scribe 2)

Lecture 18 (03/19/2019) (Lecture Notes, Lecture Scribe 1, Lecture Scribe 2)

Lecture 19 (03/21/2019) (Lecture Notes, guest lecture by Nicholas Woolsey , Lecture Scribe 1, Lecture Scribe 2)

Lecture 20 (03/26/2019) (Lecture Notes, Lecture Scribe 1, Lecture Scribe 2, Lecture Scribe 3)

Lecture 21 (03/28/2019) (Lecture Notes, Lecture Scribe 1, Lecture Scribe 2)

Lecture 22 (04/02/2019) (Lecture Notes, Lecture Scribe 1, Lecture Scribe 2)

Part III: Distributed Machine Learning

Lecture 23 (04/04/2019) (Lecture Notes, Lecture Scribe 1, Lecture Scribe 2)

Lecture 24 (04/09/2019) (Lecture Notes, Lecture Scribe 1)

Lecture 25 (04/11/2019) (Lecture Notes, Lecture notes (Asyn. Mini-batch SGD (didn't cover in class)), Lecture Scribe 1, Lecture Scribe 2)

Lecture 26 (04/16/2019) (Slides)

Lecture 27 (04/18/2019) (Lecture Notes, Lecture Notes from Xiang Zhang)

Lecture 28 (04/23/2019) (Lecture Notes, Lecture Scribe 1, Lecture Scribe 2)

Downloads and Useful Links