Reduction Operations
====================

A reduction collapses one or more dimensions of an input tensor, often down to scalars, e.g. ``np.sum`` in NumPy. Reductions are straightforward to implement with for-loops, but they are a little more involved in TVM since we cannot use a Python for-loop directly. In this section, we will describe how to implement reductions in TVM.

.. code:: python

    import d2ltvm
    import numpy as np
    import tvm
    from tvm import te

Sum
---

Let's start by summing the rows of a 2-D matrix to reduce it to a 1-D vector. In NumPy, we can do it with the ``sum`` method.

.. code:: python

    a = np.random.normal(size=(3,4)).astype('float32')
    a.sum(axis=1)

.. parsed-literal::
    :class: output

    array([ 1.8948135, -2.4319794,  1.9638997], dtype=float32)

As we did before, let's implement this operation from scratch to help understand the TVM expression.

.. code:: python

    def sum_rows(a, b):
        """a is an n-by-m 2-D matrix, b is an n-length 1-D vector
        """
        n = len(b)
        for i in range(n):
            b[i] = np.sum(a[i,:])

    b = np.empty((3,), dtype='float32')
    sum_rows(a, b)
    b

.. parsed-literal::
    :class: output

    array([ 1.8948135, -2.4319794,  1.9638997], dtype=float32)

It's fairly straightforward: we first iterate over the first dimension, ``axis=0``, and then sum all elements along the second dimension to write the results. In NumPy, we can use ``:`` to slice all elements along a dimension.

Now let's implement the same thing in TVM. Compared with the vector addition in :numref:`ch_vector_add`, we use two new operators here. One is ``te.reduce_axis``, which creates an axis for reduction ranging from 0 to ``m``. It's functionally similar to the ``:`` used in ``sum_rows``, but we need to explicitly specify the range in TVM. The other is ``te.sum``, which sums all elements along the reduction axis ``j`` and returns a scalar.

.. code:: python

    n, m = te.var('n'), te.var('m')
    A = te.placeholder((n, m), name='a')
    j = te.reduce_axis((0, m), name='j')
    B = te.compute((n,), lambda i: te.sum(A[i, j], axis=j), name='b')
    s = te.create_schedule(B.op)
    tvm.lower(s, [A, B], simple_mode=True)

.. parsed-literal::
    :class: output

    produce b {
      for (i, 0, n) {
        b[(i*stride)] = 0f
        for (j, 0, m) {
          b[(i*stride)] = (b[(i*stride)] + a[((i*stride) + (j*stride))])
        }
      }
    }

We can see that the generated pseudo code expands ``te.sum`` into another for-loop along axis ``j``. As mentioned before, the pseudo code is C-like, so the index of ``a[i,j]`` is flattened into a linear offset of a 1-D array, with each loop variable multiplied by the stride of its dimension. Also note that ``b`` is initialized to all zeros before the summation. Now let's test that the results are as expected.

.. code:: python

    mod = tvm.build(s, [A, B])
    c = tvm.nd.array(np.empty((3,), dtype='float32'))
    mod(tvm.nd.array(a), c)
    np.testing.assert_equal(b, c.asnumpy())

We know that ``a.sum()`` will sum all elements in ``a`` and return a scalar. Let's also implement this in TVM. To do so, we need another reduction axis along the first dimension, whose size is ``n``. The result is a scalar, namely a 0-rank tensor, which can be created with the empty shape tuple ``()``.

.. code:: python

    i = te.reduce_axis((0, n), name='i')
    B = te.compute((), lambda: te.sum(A[i, j], axis=(i, j)), name='b')
    s = te.create_schedule(B.op)
    tvm.lower(s, [A, B], simple_mode=True)

.. parsed-literal::
    :class: output

    produce b {
      b[0] = 0f
      for (i, 0, n) {
        for (j, 0, m) {
          b[0] = (b[0] + a[((i*stride) + (j*stride))])
        }
      }
    }

Let's also verify the results.

.. code:: python

    mod = tvm.build(s, [A, B])
    c = tvm.nd.array(np.empty((), dtype='float32'))
    mod(tvm.nd.array(a), c)
    np.testing.assert_allclose(a.sum(), c.asnumpy(), atol=1e-5)

In this case we use ``np.testing.assert_allclose`` instead of ``np.testing.assert_equal`` to verify the results, as computations on ``float32`` numbers may differ due to numerical error.

Beyond ``te.sum``, there are other pre-defined reduction operators in TVM, such as ``te.min`` and ``te.max``, which can be used to implement the corresponding reduction operations in the same way.
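For instance, here is a minimal sketch of the row-wise maximum, i.e. ``a.max(axis=1)`` in NumPy. It reuses the same ``a`` as above and assumes ``te.max`` follows the same calling convention as ``te.sum``; only the reduction operator changes.

.. code:: python

    n, m = te.var('n'), te.var('m')
    A = te.placeholder((n, m), name='a')
    j = te.reduce_axis((0, m), name='j')
    # Same pattern as the row sum, with te.max as the reduction operator.
    B = te.compute((n,), lambda i: te.max(A[i, j], axis=j), name='b')
    mod = tvm.build(te.create_schedule(B.op), [A, B])
    c = tvm.nd.array(np.empty((3,), dtype='float32'))
    mod(tvm.nd.array(a), c)
    np.testing.assert_allclose(a.max(axis=1), c.asnumpy())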
Commutative Reduction
---------------------

In mathematics, an operator :math:`\circ` is commutative if :math:`a\circ b = b\circ a`. TVM allows us to define a customized commutative reduction operator through ``te.comm_reducer``. It accepts two function arguments: one defines how to compute :math:`a\circ b`, and the other specifies the initial value. Let's use the product by rows, e.g. ``a.prod(axis=1)``, as an example. Again, we first show how to implement it from scratch.

.. code:: python

    def prod_rows(a, b):
        """a is an n-by-m 2-D matrix, b is an n-length 1-D vector
        """
        n, m = a.shape
        for i in range(n):
            b[i] = 1
            for j in range(m):
                b[i] = b[i] * a[i, j]

As can be seen, we need to first initialize the return values to 1, and then compute the reduction using the scalar product ``*``. Now let's define these two functions in TVM to serve as the arguments of ``te.comm_reducer``. As discussed, the first one defines :math:`a\circ b` with two scalar inputs. The second one accepts a data type argument and returns the initial value of an element. Then we can create the reduction operator.

.. code:: python

    comp = lambda a, b: a * b
    init = lambda dtype: tvm.tir.const(1, dtype=dtype)
    product = te.comm_reducer(comp, init)

The usage of ``product`` is similar to ``te.sum``. Actually, ``te.sum`` is a pre-defined reduction operator constructed in the same way.

.. code:: python

    n = te.var('n')
    m = te.var('m')
    A = te.placeholder((n, m), name='a')
    k = te.reduce_axis((0, m), name='k')
    B = te.compute((n,), lambda i: product(A[i, k], axis=k), name='b')
    s = te.create_schedule(B.op)
    tvm.lower(s, [A, B], simple_mode=True)

.. parsed-literal::
    :class: output

    produce b {
      for (i, 0, n) {
        b[(i*stride)] = 1f
        for (k, 0, m) {
          b[(i*stride)] = (b[(i*stride)]*a[((i*stride) + (k*stride))])
        }
      }
    }

The generated pseudo code is similar to the one for summing by rows, except for the initial value and the reduction arithmetic. Again, let's verify the results.

.. code:: python

    mod = tvm.build(s, [A, B])
    b = tvm.nd.array(np.empty((3,), dtype='float32'))
    mod(tvm.nd.array(a), b)
    np.testing.assert_allclose(a.prod(axis=1), b.asnumpy(), atol=1e-5)

Summary
-------

- We can apply a reduction operator, e.g. ``te.sum``, over a reduction axis created by ``te.reduce_axis``.
- We can implement customized commutative reduction operators with ``te.comm_reducer``.
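As a closing recap that ties both points together, below is a minimal sketch, not part of the example above, that rebuilds the row-wise maximum with a customized reducer instead of the pre-defined ``te.max``. It assumes ``te.min_value`` and ``tvm.tir.if_then_else`` are available in the TVM version used here.

.. code:: python

    # A hand-rolled maximum reducer: combine two scalars with a select, and
    # start from the smallest representable value of the data type
    # (assumes te.min_value and tvm.tir.if_then_else exist in this version).
    amax = te.comm_reducer(lambda x, y: tvm.tir.if_then_else(x > y, x, y),
                           lambda dtype: te.min_value(dtype), name='amax')

    n, m = te.var('n'), te.var('m')
    A = te.placeholder((n, m), name='a')
    k = te.reduce_axis((0, m), name='k')
    B = te.compute((n,), lambda i: amax(A[i, k], axis=k), name='b')
    mod = tvm.build(te.create_schedule(B.op), [A, B])
    b = tvm.nd.array(np.empty((3,), dtype='float32'))
    mod(tvm.nd.array(a), b)
    np.testing.assert_allclose(a.max(axis=1), b.asnumpy())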