Reduction Operations
====================

A reduction collapses one or more dimensions of an input tensor, often down to scalars, e.g. ``np.sum`` in NumPy. Reductions are straightforward to implement with for-loops, but they are a little more involved in TVM since we cannot use a Python for-loop directly. In this section, we will describe how to implement reductions in TVM.

.. code:: python

    import d2ltvm
    import numpy as np
    import tvm
    from tvm import te

Sum
---

Let's start by summing the rows of a 2-D matrix to reduce it to a 1-D vector. In NumPy, we can do it with the ``sum`` method.

.. code:: python

    a = np.random.normal(size=(3,4)).astype('float32')
    a.sum(axis=1)

.. parsed-literal::
    :class: output

    array([ 1.8948135, -2.4319794,  1.9638997], dtype=float32)

As we did before, let's implement this operation from scratch to help understand the TVM expression.

.. code:: python

    def sum_rows(a, b):
        """a is an n-by-m 2-D matrix, b is an n-length 1-D vector
        """
        n = len(b)
        for i in range(n):
            b[i] = np.sum(a[i,:])

    b = np.empty((3,), dtype='float32')
    sum_rows(a, b)
    b

.. parsed-literal::
    :class: output

    array([ 1.8948135, -2.4319794,  1.9638997], dtype=float32)

It's fairly straightforward: we first iterate over the first dimension, ``axis=0``, and then sum all elements along the second dimension to write the results. In NumPy, we can use ``:`` to slice all elements along a dimension.

Now let's implement the same thing in TVM. Compared with the vector addition in :numref:`ch_vector_add`, we use two new operators here. One is ``te.reduce_axis``, which creates an axis for reduction ranging from 0 to ``m``. It's functionally similar to the ``:`` used in ``sum_rows``, but we need to explicitly specify the range in TVM. The other is ``te.sum``, which sums all elements along the reduction axis ``j`` and returns a scalar.

.. code:: python

    n, m = te.var('n'), te.var('m')
    A = te.placeholder((n, m), name='a')
    j = te.reduce_axis((0, m), name='j')
    B = te.compute((n,), lambda i: te.sum(A[i, j], axis=j), name='b')
    s = te.create_schedule(B.op)
    tvm.lower(s, [A, B], simple_mode=True)

.. parsed-literal::
    :class: output

    produce b {
      for (i, 0, n) {
        b[(i*stride)] = 0f
        for (j, 0, m) {
          b[(i*stride)] = (b[(i*stride)] + a[((i*stride) + (j*stride))])
        }
      }
    }

We can see that the generated pseudo code expands ``te.sum`` into another for-loop along axis ``j``. As mentioned before, the pseudo code is C-like, so the index of ``a[i,j]`` is flattened into a linear offset of a 1-D array, with each loop variable multiplied by the stride of its dimension. Also note that ``b`` is initialized to all zeros before the summation. Now let's test that the results are as expected.

.. code:: python

    mod = tvm.build(s, [A, B])
    c = tvm.nd.array(np.empty((3,), dtype='float32'))
    mod(tvm.nd.array(a), c)
    np.testing.assert_equal(b, c.asnumpy())

We know that ``a.sum()`` will sum all elements in ``a`` and return a scalar. Let's also implement this in TVM. To do so, we need another reduction axis along the first dimension, whose size is ``n``. The result is a scalar, namely a 0-rank tensor, which can be created with the empty shape tuple ``()``.

.. code:: python

    i = te.reduce_axis((0, n), name='i')
    B = te.compute((), lambda: te.sum(A[i, j], axis=(i, j)), name='b')
    s = te.create_schedule(B.op)
    tvm.lower(s, [A, B], simple_mode=True)

.. parsed-literal::
    :class: output

    produce b {
      b[0] = 0f
      for (i, 0, n) {
        for (j, 0, m) {
          b[0] = (b[0] + a[((i*stride) + (j*stride))])
        }
      }
    }

Let's also verify the results.

.. code:: python

    mod = tvm.build(s, [A, B])
    c = tvm.nd.array(np.empty((), dtype='float32'))
    mod(tvm.nd.array(a), c)
    np.testing.assert_allclose(a.sum(), c.asnumpy(), atol=1e-5)

In this case we use ``np.testing.assert_allclose`` instead of ``np.testing.assert_equal`` to verify the results, as computations on ``float32`` numbers may differ due to numerical error.

Beyond ``te.sum``, there are other pre-defined reduction operators in TVM, such as ``te.min`` and ``te.max``, which can be used to implement the corresponding reduction operations in the same way.
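For instance, here is a minimal sketch of the row-wise maximum, i.e. ``a.max(axis=1)`` in NumPy. It reuses the same ``a`` as above and assumes ``te.max`` follows the same calling convention as ``te.sum``; only the reduction operator changes.

.. code:: python

    n, m = te.var('n'), te.var('m')
    A = te.placeholder((n, m), name='a')
    j = te.reduce_axis((0, m), name='j')
    # Same pattern as the row sum, with te.max as the reduction operator.
    B = te.compute((n,), lambda i: te.max(A[i, j], axis=j), name='b')
    mod = tvm.build(te.create_schedule(B.op), [A, B])
    c = tvm.nd.array(np.empty((3,), dtype='float32'))
    mod(tvm.nd.array(a), c)
    np.testing.assert_allclose(a.max(axis=1), c.asnumpy())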
Commutative Reduction
---------------------

In mathematics, an operator :math:`\circ` is commutative if :math:`a\circ b = b\circ a`. TVM allows us to define a customized commutative reduction operator through ``te.comm_reducer``. It accepts two function arguments: one defines how to compute :math:`a\circ b`, and the other specifies the initial value. Let's use the product by rows, e.g. ``a.prod(axis=1)``, as an example. Again, we first show how to implement it from scratch.

.. code:: python

    def prod_rows(a, b):
        """a is an n-by-m 2-D matrix, b is an n-length 1-D vector
        """
        n, m = a.shape
        for i in range(n):
            b[i] = 1
            for j in range(m):
                b[i] = b[i] * a[i, j]

As can be seen, we need to first initialize the return values to 1, and then compute the reduction using the scalar product ``*``. Now let's define these two functions in TVM to serve as the arguments of ``te.comm_reducer``. As discussed, the first one defines :math:`a\circ b` with two scalar inputs. The second one accepts a data type argument and returns the initial value of an element. Then we can create the reduction operator.

.. code:: python

    comp = lambda a, b: a * b
    init = lambda dtype: tvm.tir.const(1, dtype=dtype)
    product = te.comm_reducer(comp, init)

The usage of ``product`` is similar to ``te.sum``. Actually, ``te.sum`` is a pre-defined reduction operator constructed in the same way.

.. code:: python

    n = te.var('n')
    m = te.var('m')
    A = te.placeholder((n, m), name='a')
    k = te.reduce_axis((0, m), name='k')
    B = te.compute((n,), lambda i: product(A[i, k], axis=k), name='b')
    s = te.create_schedule(B.op)
    tvm.lower(s, [A, B], simple_mode=True)

.. parsed-literal::
    :class: output

    produce b {
      for (i, 0, n) {
        b[(i*stride)] = 1f
        for (k, 0, m) {
          b[(i*stride)] = (b[(i*stride)]*a[((i*stride) + (k*stride))])
        }
      }
    }

The generated pseudo code is similar to the one for summing by rows, except for the initial value and the reduction arithmetic. Again, let's verify the results.

.. code:: python

    mod = tvm.build(s, [A, B])
    b = tvm.nd.array(np.empty((3,), dtype='float32'))
    mod(tvm.nd.array(a), b)
    np.testing.assert_allclose(a.prod(axis=1), b.asnumpy(), atol=1e-5)

Summary
-------

- We can apply a reduction operator, e.g. ``te.sum``, over a reduction axis created by ``te.reduce_axis``.
- We can implement customized commutative reduction operators with ``te.comm_reducer``.
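As a closing recap that ties both points together, below is a minimal sketch, not part of the example above, that rebuilds the row-wise maximum with a customized reducer instead of the pre-defined ``te.max``. It assumes ``te.min_value`` and ``tvm.tir.if_then_else`` are available in the TVM version used here.

.. code:: python

    # A hand-rolled maximum reducer: combine two scalars with a select, and
    # start from the smallest representable value of the data type
    # (assumes te.min_value and tvm.tir.if_then_else exist in this version).
    amax = te.comm_reducer(lambda x, y: tvm.tir.if_then_else(x > y, x, y),
                           lambda dtype: te.min_value(dtype), name='amax')

    n, m = te.var('n'), te.var('m')
    A = te.placeholder((n, m), name='a')
    k = te.reduce_axis((0, m), name='k')
    B = te.compute((n,), lambda i: amax(A[i, k], axis=k), name='b')
    mod = tvm.build(te.create_schedule(B.op), [A, B])
    b = tvm.nd.array(np.empty((3,), dtype='float32'))
    mod(tvm.nd.array(a), b)
    np.testing.assert_allclose(a.max(axis=1), b.asnumpy())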