References
- Chen et al., 2015
Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., … Zhang, Z. (2015). MXNet: a flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274.
- Chen et al., 2018
Chen, T., Moreau, T., Jiang, Z., Zheng, L., Yan, E., Shen, H., … others. (2018). TVM: an automated end-to-end optimizing compiler for deep learning. 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18) (pp. 578–594).
- Howard et al., 2017
Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., … Adam, H. (2017). MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.
- Lai & Seznec, 2013
Lai, J., & Seznec, A. (2013). Performance upper bound analysis and optimization of SGEMM on Fermi and Kepler GPUs. Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) (pp. 1–10).
- Liu et al., 2019
Liu, Y., Wang, Y., Yu, R., Li, M., Sharma, V., & Wang, Y. (2019). Optimizing CNN model inference on CPUs. 2019 USENIX Annual Technical Conference (USENIX ATC 19) (pp. 1025–1040).
- Nath et al., 2010
Nath, R., Tomov, S., & Dongarra, J. (2010). An improved MAGMA GEMM for Fermi graphics processing units. The International Journal of High Performance Computing Applications, 24(4), 511–515.
- Ragan-Kelley et al., 2013
Ragan-Kelley, J., Barnes, C., Adams, A., Paris, S., Durand, F., & Amarasinghe, S. (2013). Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (pp. 519–530). ACM.
- Roesch et al., 2019
Roesch, J., Lyubomirsky, S., Kirisame, M., Pollock, J., Weber, L., Jiang, Z., … Tatlock, Z. (2019). Relay: a high-level IR for deep learning. arXiv preprint arXiv:1904.08368.