Matrix Multiplication
=====================

Matrix multiplication is one of the most widely used operators in
scientific computing and deep learning. It is typically referred to as
*GEMM* (GEneral Matrix Multiply). Let's implement its computation in
this section.

Given :math:`A\in\mathbb R^{n\times l}` and
:math:`B \in\mathbb R^{l\times m}`, if :math:`C=AB` then
:math:`C \in\mathbb R^{n\times m}` and

.. math:: C_{i,j} = \sum_{k=1}^l A_{i,k} B_{k,j}.

The elements accessed to compute :math:`C_{i,j}` are illustrated in
:numref:`fig_matmul_default`.

.. _fig_matmul_default:

.. figure:: ../img/matmul_default.svg

   Compute :math:`C_{x,y}` in matrix multiplication.

The following method returns the computing expression of matrix
multiplication.

.. code:: python

    import d2ltvm
    import numpy as np
    import tvm
    from tvm import te

    # Save to the d2ltvm package
    def matmul(n, m, l):
        """Return the computing expression of matrix multiplication
        A : n x l matrix
        B : l x m matrix
        C : n x m matrix with C = A B
        """
        k = te.reduce_axis((0, l), name='k')
        A = te.placeholder((n, l), name='A')
        B = te.placeholder((l, m), name='B')
        C = te.compute((n, m),
                       lambda x, y: te.sum(A[x, k] * B[k, y], axis=k),
                       name='C')
        return A, B, C

Let's compile a module for a square matrix multiplication.

.. code:: python

    n = 100
    A, B, C = matmul(n, n, n)
    s = te.create_schedule(C.op)
    print(tvm.lower(s, [A, B], simple_mode=True))
    mod = tvm.build(s, [A, B, C])

.. parsed-literal::
    :class: output

    // attr [C] storage_scope = "global"
    allocate C[float32 * 10000]
    produce C {
      for (x, 0, 100) {
        for (y, 0, 100) {
          C[((x*100) + y)] = 0f
          for (k, 0, 100) {
            C[((x*100) + y)] = (C[((x*100) + y)] + (A[((x*100) + k)]*B[((k*100) + y)]))
          }
        }
      }
    }

The pseudocode is simply a naive three-level nested for loop that
calculates the matrix multiplication; a plain-Python rendering of this
loop nest is sketched after the summary.

Then we verify the results. Note that NumPy may use multi-threading to
accelerate its computation, which can produce slightly different
results due to numerical error. We therefore use ``assert_allclose``
with a relatively large tolerance to test the correctness.

.. code:: python

    a, b, c = d2ltvm.get_abc((100, 100), tvm.nd.array)
    mod(a, b, c)
    np.testing.assert_allclose(np.dot(a.asnumpy(), b.asnumpy()),
                               c.asnumpy(), atol=1e-5)

Summary
-------

- We can express the computation of matrix multiplication in TVM in one
  line of code.
- The naive matrix multiplication is a three-level nested for loop.
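
As promised above, here is a minimal pure-Python rendering of the
three-level loop nest shown in the lowered pseudocode, checked against
``np.dot``. This sketch is our own addition rather than part of the
``d2ltvm`` package; it assumes the imports from the first code block
and is only practical for small sizes.

.. code:: python

    # A didactic re-implementation of the lowered loop nest above.
    # This is NOT how TVM executes the compiled kernel; it only mirrors
    # the pseudocode to connect it with the math definition.
    def matmul_ref(a, b):
        n, l = a.shape
        l2, m = b.shape
        assert l == l2, 'inner dimensions must match'
        c = np.zeros((n, m), dtype=a.dtype)
        for x in range(n):          # rows of A
            for y in range(m):      # columns of B
                for k in range(l):  # the reduction axis
                    c[x, y] += a[x, k] * b[k, y]
        return c

    a = np.random.normal(size=(8, 8)).astype('float32')
    b = np.random.normal(size=(8, 8)).astype('float32')
    np.testing.assert_allclose(matmul_ref(a, b), np.dot(a, b), atol=1e-5)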
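
The ``matmul`` expression is not limited to square matrices. The
following sketch, also an addition of ours, compiles and verifies a
small case with distinct ``n``, ``m`` and ``l``; since
``d2ltvm.get_abc`` as used above creates arrays of a single shared
shape, we construct the differently shaped inputs by hand.

.. code:: python

    # Build a module for a non-square matmul: A is n x l, B is l x m.
    n, m, l = 4, 6, 5
    A, B, C = matmul(n, m, l)
    s = te.create_schedule(C.op)
    mod = tvm.build(s, [A, B, C])

    a = tvm.nd.array(np.random.normal(size=(n, l)).astype('float32'))
    b = tvm.nd.array(np.random.normal(size=(l, m)).astype('float32'))
    c = tvm.nd.array(np.zeros((n, m), dtype='float32'))
    mod(a, b, c)
    np.testing.assert_allclose(np.dot(a.asnumpy(), b.asnumpy()),
                               c.asnumpy(), atol=1e-5)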