Abstract:
We implement a new arithmetic unit for matrix operations, named matrix FMA. The unit reads three matrices in one cycle and outputs the result 𝐴𝐵+𝐶 every cycle. We apply this unit to an array processor for matrix multiplication and evaluate it in a 45nm NanGate process. The results show that throughput doubles at the cost of a ×1.3 area penalty for 4×4 matrix operations.
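As an illustration only, the operation computed by the unit can be sketched as a software reference model. This is not the hardware design: the function names `matrix_fma` and `blocked_matmul`, the list-of-rows representation, and the tiling loop are assumptions made for this sketch, and the cycle-level behavior and array-processor organization are not modeled.

```python
def matrix_fma(a, b, c):
    """Return A*B + C for square matrices given as lists of rows.

    Hypothetical reference model of one matrix FMA operation.
    """
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) + c[i][j] for j in range(n)]
        for i in range(n)
    ]


def blocked_matmul(a, b, tile=4):
    """Multiply two square matrices by chaining tile x tile matrix FMAs.

    Assumes the matrix size is a multiple of `tile` (4 here, matching the
    4x4 case in the abstract). Each output tile is accumulated by feeding
    the previous result back in as the C operand.
    """
    n = len(a)
    assert n % tile == 0, "matrix size must be a multiple of the tile size"
    out = [[0] * n for _ in range(n)]
    for bi in range(0, n, tile):
        for bj in range(0, n, tile):
            acc = [[0] * tile for _ in range(tile)]
            for bk in range(0, n, tile):
                a_t = [row[bk:bk + tile] for row in a[bi:bi + tile]]
                b_t = [row[bj:bj + tile] for row in b[bk:bk + tile]]
                acc = matrix_fma(a_t, b_t, acc)  # acc = A_tile * B_tile + acc
            for i in range(tile):
                out[bi + i][bj:bj + tile] = acc[i]
    return out
```

In this sketch, a larger matrix product is built from repeated 4×4 operations of the form 𝐴𝐵+𝐶, which is the role a matrix FMA would play inside a matrix multiplication.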
Description:
This work is supported by the VLSI Design and Education Center (VDEC), the University of Tokyo, in collaboration with Synopsys, Inc. and Cadence Design Systems, Inc.