TY - GEN
T1 - High speed special function unit for graphics processing unit
AU - Qoutb, Abd Elrahman G.
AU - El-Gunidy, Abdullah M.
AU - Tolba, Mohammed F.
AU - El-Moursy, Magdy A.
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2015/2/10
Y1 - 2015/2/10
N2 - A fixed-point ASIC design for high-speed, second-order, piecewise function approximation is presented. A non-uniform segmentation method based on minimax approximation is used to obtain the interpolation coefficients. Non-uniform segmentation effectively reduces the size of the coefficient table, with a small area overhead for the address encoder. The proposed algorithm truncates the binary coefficients within the pre-allocated error. Radix-eight Booth multipliers are used to reduce the number of partial products to around one third of that of traditional multiplication, hence speeding up the evaluation process. Very fast reduction trees with four-to-two compressors are used to reduce the number of the resulting partial products. Also, a new radix-eight sign template, which reduces the overall area of the multipliers, is proposed. Hybrid carry-lookahead/carry-ripple adders are also used. The design has been verified on FPGA. Moreover, a 45 nm PDK is used to synthesize and lay out the design. A maximum propagation delay of 5.251 ns is achieved, with a reduction of 19% in the total delay compared to traditional methods. A total chip area of 0.014 mm² is also achieved.
AB - A fixed-point ASIC design for high-speed, second-order, piecewise function approximation is presented. A non-uniform segmentation method based on minimax approximation is used to obtain the interpolation coefficients. Non-uniform segmentation effectively reduces the size of the coefficient table, with a small area overhead for the address encoder. The proposed algorithm truncates the binary coefficients within the pre-allocated error. Radix-eight Booth multipliers are used to reduce the number of partial products to around one third of that of traditional multiplication, hence speeding up the evaluation process. Very fast reduction trees with four-to-two compressors are used to reduce the number of the resulting partial products. Also, a new radix-eight sign template, which reduces the overall area of the multipliers, is proposed. Hybrid carry-lookahead/carry-ripple adders are also used. The design has been verified on FPGA. Moreover, a 45 nm PDK is used to synthesize and lay out the design. A maximum propagation delay of 5.251 ns is achieved, with a reduction of 19% in the total delay compared to traditional methods. A total chip area of 0.014 mm² is also achieved.
KW - Booth Multiplier
KW - GPU
KW - Hybrid Multiplier
KW - Minimax
KW - Numeric Function Generator (NFG)
KW - Non-Uniform Segmentation
KW - Special Function Unit (SFU)
KW - Vertex Shader Processor
UR - http://www.scopus.com/inward/record.url?scp=84924308567&partnerID=8YFLogxK
U2 - 10.1109/IDT.2014.7038581
DO - 10.1109/IDT.2014.7038581
M3 - Conference contribution
AN - SCOPUS:84924308567
T3 - Proceedings of 2014 9th International Design and Test Symposium, IDT 2014
SP - 24
EP - 29
BT - Proceedings of 2014 9th International Design and Test Symposium, IDT 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2014 9th International Design and Test Symposium, IDT 2014
Y2 - 16 December 2014 through 18 December 2014
ER -