GALS System Design:
Side Channel Attack Secure Cryptographic Accelerators
Chapter 6:
Conclusions
Disclaimer:
This is the web-enabled version of my thesis. It has been converted from
the sources of the original file using TTH, some Perl
and some hand editing.
There is also a PDF version.
It is essentially the same, but is formatted for A4 and includes some of the color pictures
from the presentation.
Contents
1 Introduction
2 GALS System Design
3 Cryptographic Accelerators
4 Secure AES Implementation Using GALS
5 Designing GALS Systems
6 Conclusion
6.1 Cryptographic Hardware Design
6.2 GALS Design Methodology
6.3 Final Words
A 'Guessing' Effort for Keys
B List of Abbreviations
B Bibliography
B Footnotes
This thesis essentially combines two different research areas: cryptographic
hardware design, and the GALS design methodology. Separate conclusions
can be drawn for both areas:
6.1 Cryptographic Hardware Design
A hardware designer's view of implementing cryptographic hardware
is presented in this work. As a result of the experience obtained
during the design of six different ASICs with different constraints
on operation speed, area and side-channel security, an extensive analysis
of hardware implementations of the popular Advanced Encryption Standard
is given.
As part of this work, a completely new implementation of the AES with
improved countermeasures against differential power analysis (DPA, a
particularly efficient form of side-channel attack) has been developed.
This implementation, called Acacia, is based on the GALS design
methodology and utilizes several levels of countermeasures.
Several of the implemented countermeasures, like adding noise generators
or inserting 'dummy' operations, are well known. Others,
however, are new, and were only made possible by using
the GALS design methodology. Acacia consists of three datapath
units that are clocked independently. In addition, Acacia is
able to change the period of individual clock cycles randomly. Combined
with the aforementioned countermeasures, this results in a very unpredictable
operation order and is expected to present a formidable challenge
to attackers.
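The combined effect of independent clocks, random cycle lengths, and dummy operations can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a model of the actual Acacia hardware: the function name `schedule`, the set of clock periods, the dummy probability, and the number of operations per unit are all hypothetical values chosen for illustration. Only the count of three datapath units comes from the text.

```python
import random


def schedule(seed, units=3, ops_per_unit=4, dummy_prob=0.25,
             periods=(8, 10, 12, 14)):
    """Return a time-ordered list of (time, unit, kind) events.

    All parameters are hypothetical placeholders, not figures from the
    real chip. Each unit advances on its own clock, and the length of
    every individual clock cycle is drawn at random.
    """
    rng = random.Random(seed)
    events = []
    for unit in range(units):
        t = 0
        real_done = 0
        while real_done < ops_per_unit:
            t += rng.choice(periods)       # random period for this cycle
            if rng.random() < dummy_prob:  # occasionally insert a dummy operation
                events.append((t, unit, "dummy"))
            else:
                events.append((t, unit, "real"))
                real_done += 1
    return sorted(events)                  # global order as seen by an observer


if __name__ == "__main__":
    for seed in (1, 2):
        print(seed, [(t, u, k[0]) for t, u, k in schedule(seed)])
```

Running the sketch with different seeds shows how the interleaving of real and dummy operations across the three units changes from run to run, which is the property that makes it hard for an attacker to align power traces.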
Almost all countermeasures incur some sort of performance penalty.
The additional pseudo-random number generators used for generating
noise increase circuit area and, not surprisingly, power consumption.
Dummy operations used to interrupt the regular operation flow increase
the time required to complete a cryptographic operation, and reduce
the throughput. In addition, the AES datapath needs to be modified
in a GALS-friendly way, which results in lower performance when compared
to an optimized implementation.
Acacia is able to dynamically change its security effort during
an operation. The so-called policy can be programmed to increase the
effort for countermeasures during the initial and final rounds of
an AES operation, where it is more vulnerable. During middle rounds,
where it is more difficult to stage a DPA attack, the security effort
is reduced. Measurement results have shown that, when run using this
policy setting, Acacia achieves a throughput only 15% lower
than that of the synchronous reference design, and consumes only 20%
more energy per data item.
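A round-dependent policy of this kind can be sketched as a simple lookup. The following is a minimal illustration, not Acacia's actual policy format: the function name `effort_for_round`, the effort levels 3 and 1, and the choice of two guarded rounds at each end are hypothetical. The sketch also assumes AES-128, which has 10 rounds.

```python
AES128_ROUNDS = 10  # assumption: AES-128; other key sizes use 12 or 14 rounds


def effort_for_round(rnd, total_rounds=AES128_ROUNDS, guard=2, high=3, low=1):
    """Return a hypothetical security-effort level for round `rnd` (1-based).

    The first and last `guard` rounds get the `high` effort level, since
    these are the rounds described as more vulnerable to DPA; the middle
    rounds get the `low` level to recover throughput.
    """
    if not 1 <= rnd <= total_rounds:
        raise ValueError("round out of range")
    if rnd <= guard or rnd > total_rounds - guard:
        return high
    return low


# Effort level for each round of one encryption, in order.
policy = [effort_for_round(r) for r in range(1, AES128_ROUNDS + 1)]
```

With these placeholder values, the high effort level applies only to the outer rounds, so most of an operation runs with reduced countermeasure overhead.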
This thesis provides a fresh idea on how to implement DPA countermeasures.
The evaluation of the efficiency of the proposed countermeasures requires
the help of the cryptanalysis community, and is beyond the scope of
this thesis.
There are many practical problems associated with determining the
quality of the countermeasures. For instance, to verify
whether a proposed countermeasure results in a 10x improvement
in DPA security, the design must first be attacked with no
countermeasures. Then, with the countermeasures activated, it must be shown that
an attack of comparable certainty can only succeed with at least
10x more effort (see footnote 56).
The successful DPA attack on Fastcore shown in figure 3.13
required more than three days of automated measurements. Unless the
measurement setup is refined to reduce the attack time dramatically,
it is impractical to demonstrate the efficiency of the countermeasures
using the current measurement setup. Even if such measurements could
be performed in reasonable time, they would not offer conclusive proof
that the proposed countermeasures are effective against side-channel
attacks of similar nature.
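The scale of the problem can be made concrete with a rough calculation. This sketch assumes that attack effort scales linearly with measurement time on an unchanged setup, and that the three-day Fastcore attack is a fair stand-in for the baseline without countermeasures. Both are simplifying assumptions, and the function name `min_attack_days` is hypothetical.

```python
def min_attack_days(baseline_days, improvement_factor):
    """Lower bound on the measurement time needed to demonstrate a given
    improvement factor, assuming effort scales linearly with time."""
    if baseline_days <= 0 or improvement_factor < 1:
        raise ValueError("expected a positive baseline and a factor >= 1")
    return baseline_days * improvement_factor


# A three-day baseline and a claimed 10x improvement imply at least
# 30 days of continuous measurement on the same setup.
days_needed = min_attack_days(3, 10)
```

Because the baseline attack took more than three days, this figure is itself only a lower bound.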
6.2 GALS Design Methodology
Up to now, GALS implementations have been limited to demonstrator
circuits or large-scale testbeds. The Acacia design presented
in this thesis is the first circuit where GALS has been applied to
address a specific problem. The GALS design flow has been proven to
produce results comparable to what can be expected from more established
industrial design flows. The design effort was comparable to that
of a synchronous design of the same complexity, and all major steps
of the design flow were performed using industry-standard design tools.
Designs that employ self-timed circuits are often criticized for
poor testability. Similar concerns have been raised over GALS applications
over the years. By using a combination of scan-based test and a simple
functional test, a stuck-at fault coverage of more than 99.8% has
been obtained for Acacia.
Since GALS-based designs have pausable local clock generators, they
are considered hard to interface with external sources that use
synchronous clocking. While a generic solution for this problem has
not been formulated, it was shown that under certain timing assumptions,
it is indeed possible to transfer data reliably between a GALS module and a
standard synchronous design.
The main design criterion behind Acacia was to improve the resistance
against differential power analysis attacks. Such countermeasures
invariably add penalties to system parameters such as circuit area,
operation speed, and power consumption. However, even with significant
countermeasures, the GALS-based Acacia achieves throughput
and power figures within 20% of those from a fully synchronous version
without any countermeasures. This shows that it is indeed possible
to design GALS systems that have similar or even better performance
metrics than their synchronous counterparts.
As can be seen from table 4.3, Acacia
is more than two times larger than the reference design. However,
the overhead required for the self-timed wrapper is less than 5%
of the total area of Acacia. Most of the additional area in
Acacia can be attributed to the countermeasures and the pseudo
random number generators.
The most critical aspect of GALS remains the partitioning of the design
into GALS modules. The partitioning has more influence on the performance
of the system than all other factors combined. A well-defined methodology
to determine the partitioning for GALS designs has yet to be developed.
In an attempt to make GALS design attractive to a broader audience,
it has often been suggested that a synchronous design can be easily
converted to a GALS design in a process called GALSification. While
it is possible to design working GALS systems using this method, more
efficient systems can only be realized if they are designed with GALS
in mind in the first place.
This will be more apparent for larger SoCs that are expected to benefit
significantly from GALS-based design. Present SoC designs require
several tens of clock domains. Several of these domains are introduced
to enable system wide communication protocols between blocks with
different operating speeds. If such SoC circuits were designed with
GALS in mind, several of these clock domains would not be required.
Moreover, especially for inter-module communication, inherently asynchronous
communication protocols would be favored over synchronous versions
that are more difficult to implement in a GALS system.
6.3 Final Words
This work shows that the GALS approach is indeed a relatively mature
design methodology that can be safely applied to design digital systems.
As long as the system has been designed with GALS in mind, a designer
using GALS should not expect a notable performance loss or an increased
design effort.
The main advantage of the GALS design methodology is that it reduces
the effort required for integrating multiple blocks on a large System-on-Chip
design. However, in the example design described in this thesis, GALS
has been applied to address a completely different problem. By implementing
a common cryptographic algorithm using GALS, it was shown that completely
new countermeasures against common side-channel attacks can be developed.
File translated from TeX by TTH, version 3.77, on 20 Dec 2006, 15:44.