(2021-08-05) Grale, Trenton John; Swartzlander, Earl E., Jr., 1945-; Gerstlauer, Andreas; John, Lizy K; Orshansky, Michael E; Schulte, Michael J

Show more

Two well-known cryptographic protocols, RSA and ECC, employ modular multiplication on large integers or binary polynomial bit strings of hundreds or thousands of bits. The modulus may be an odd integer (usually prime), or an irreducible polynomial. Large products exceeding the value of the modulus must be reduced to congruent values smaller than the modulus. In simple terms, this is done by taking the remainder with respect to the modulus. However, reduction by integer or polynomial division is computationally expensive.
The Montgomery Multiplication transform and algorithm replace arbitrary division by the modulus with division by a power of two. Both hardware and software realizations of the Montgomery algorithm have been proposed over the past three decades. These range from serial algorithms that perform bit or digit level operations to large full word parallel architectures.
A widely-adopted classification scheme categorizes and characterizes serial
Montgomery architectures. This dissertation introduces the Serial Montgomery Model by fundamentally upgrading the existing scheme to accurately characterize a rich universe of architectures that employ digit level parallelism.
Certain properties of the Montgomery computation present optimization opportunities that have been little noted by prior researchers. This dissertation presents a novel Rescheduled Montgomery Multiplication architecture that targets those opportunities to drive a new level of optimal tradeoffs between area cost and latency. The architecture exploits digit level parallelism and dependency scheduling to attain higher performance than is attainable with serial architectures while avoiding the high area cost associated with large parallel architectures.