1100 results sorted by ID
HADES: Range-Filtered Private Aggregation on Public Data
Xiaoyuan Liu, Ni Trieu, Trinabh Gupta, Ishtiyaque Ahmad, Dawn Song
Cryptographic protocols
In aggregation queries, predicate parameters often reveal user intent. Protecting these parameters is critical for user privacy, regardless of whether the database is public or private. While most existing works focus on private data settings, we address a public data setting where the server has access to the database. Current solutions for this setting either require additional setups (e.g., noncolluding servers, hardware enclaves) or are inefficient for practical workloads. Furthermore,...
On the practicality of quantum sieving algorithms for the shortest vector problem
Joao F. Doriguello, George Giapitzakis, Alessandro Luongo, Aditya Morolia
Attacks and cryptanalysis
One of the main candidates of post-quantum cryptography is lattice-based cryptography. Its cryptographic security against quantum attackers is based on the worst-case hardness of lattice problems like the shortest vector problem (SVP), which asks to find the shortest non-zero vector in an integer lattice. Asymptotic quantum speedups for solving SVP are known and rely on Grover's search. However, to assess the security of lattice-based cryptography against these Grover-like quantum speedups,...
Multiplying Polynomials without Powerful Multiplication Instructions (Long Paper)
Vincent Hwang, YoungBeom Kim, Seog Chung Seo
Implementation
We improve the performance of lattice-based cryptosystems Dilithium on Cortex-M3 with expensive multiplications. Our contribution is two-fold: (i) We generalize Barrett multiplication and show that the resulting shape-independent modular multiplication performs comparably to long multiplication on some platforms without special hardware when precomputation is free. We call a modular multiplication “shape-independent” if its correctness and efficiency depend only on the magnitude of moduli...
On Constructing Pseudorandom Involutions: Feistel variants using a single round function
Chun Guo, Meiqin Wang, Weijia Wang
Secret-key cryptography
An involution is a permutation that is the inverse of itself. Involutions have attracted plenty attentions in cryptographic community due to their advantage regarding hardware implementations. In this paper, we reconsider constructing {\it pseudorandom involutions}. We demonstrate two constructions.
First, the 4-round Feistel network {\it using the same random function (Feistel-SF) in every round} is a pseudorandom involution. This shows the Feistel-SF construction still provides...
Efficient Boolean-to-Arithmetic Mask Conversion in Hardware
Aein Rezaei Shahmirzadi, Michael Hutter
Implementation
Masking schemes are key in thwarting side-channel attacks due to their robust theoretical foundation. Transitioning from Boolean to arithmetic (B2A) masking is a necessary step in various cryptography schemes, including hash functions, ARX-based ciphers, and lattice-based cryptography. While there exists a significant body of research focusing on B2A software implementations, studies pertaining to hardware implementations are quite limited, with the majority dedicated solely to creating...
Efficient Key-Switching for Word-Type FHE and GPU Acceleration
Shutong Jin, Zhen Gu, Guangyan Li, Donglong Chen, Çetin Kaya Koç, Ray C. C. Cheung, Wangchen Dai
Implementation
Speed efficiency, memory optimization, and quantum resistance are essential for safeguarding the performance and security of cloud computing environments. Fully Homomorphic Encryption (FHE) addresses this need by enabling computations on encrypted data without requiring decryption, thereby maintaining data privacy. Additionally, lattice-based FHE is quantum secure, providing defense against potential quantum computer attacks. However, the performance of current FHE schemes remains...
Shaking up authenticated encryption
Joan Daemen, Seth Hoffert, Silvia Mella, Gilles Van Assche, Ronny Van Keer
Secret-key cryptography
Authenticated encryption (AE) is a cryptographic mechanism that allows communicating parties to protect the confidentiality and integrity of messages exchanged over a public channel, provided they share a secret key. In this work, we present new AE schemes leveraging the SHA-3 standard functions SHAKE128 and SHAKE256, offering 128 and 256 bits of security strength, respectively, and their “Turbo” counterparts. They support session-based communication, where a ciphertext authenticates the...
HEonGPU: a GPU-based Fully Homomorphic Encryption Library 1.0
Ali Şah Özcan, Erkay Savaş
Implementation
HEonGPU is a high-performance library designed to optimize Fully Homomorphic Encryption (FHE) operations on Graphics Processing Unit (GPU). By leveraging the parallel processing capac- ity of GPUs, HEonGPU significantly reduces the computational overhead typically associated with FHE by executing complex operation concurrently. This allows for faster execution of homomorphic computations on encrypted data, enabling real-time applications in privacy-preserving machine learn- ing and secure...
Functional Adaptor Signatures: Beyond All-or-Nothing Blockchain-based Payments
Nikhil Vanjani, Pratik Soni, Sri AravindaKrishnan Thyagarajan
Cryptographic protocols
In scenarios where a seller holds sensitive data $x$, like employee / patient records or ecological data, and a buyer seeks to obtain an evaluation of specific function $f$ on this data, solutions in trustless digital environments like blockchain-based Web3 systems typically fall into two categories: (1) Smart contract-powered solutions and (2) cryptographic solutions leveraging tools such as adaptor signatures. The former approach offers atomic transactions where the buyer learns the...
The SMAesH dataset
Gaëtan Cassiers, Charles Momin
Implementation
Datasets of side-channel leakage measurements are widely used in research to develop and benchmarking side-channel attack and evaluation methodologies. Compared to using custom and/or one-off datasets, widely-used and publicly available datasets improve research reproducibility and comparability. Further, performing high-quality measurements requires specific equipment and skills, while also taking a significant amount of time. Therefore, using publicly available datasets lowers the barriers...
A Combined Design of 4-PLL-TRNG and 64-bit CDC-7-XPUF on a Zynq-7020 SoC
Oğuz Yayla, Yunus Emre Yılmaz
Implementation
True Random Number Generators (TRNGs) and Physically Unclonable Functions (PUFs) are critical hardware primitives for cryptographic systems, providing randomness and device-specific security. TRNGs require complete randomness, while PUFs rely on consistent, device-unique responses. In this work, both primitives are implemented on a System-on-Chip Field-Programmable Gate Array (SoC FPGA), leveraging the integrated Phase-Locked Loops (PLLs) for robust entropy generation in PLLbased TRNGs. A...
Marian: An Open Source RISC-V Processor with Zvk Vector Cryptography Extensions
Thomas Szymkowiak, Endrit Isufi, Markku-Juhani Saarinen
Implementation
The RISC-V Vector Cryptography Extensions (Zvk) were ratified in 2023 and integrated into the main ISA manuals in 2024. These extensions support high-speed symmetric cryptography (AES, SHA2, SM3, SM4) operating on the vector register file and offer significant performance improvements over scalar cryptography extensions (Zk) due to data parallelism. As a ratified extension, Zvk is supported by compiler toolchains and is already being integrated into popular cryptographic middleware such as...
Attestation Proof of Association – provability that attestation keys are bound to the same hardware and person
Eric Verheul
Implementation
We propose a wallet provider issued attestation called Wallet Trust Evidence (WTE) and three related specific instructions for the European Digital Identity (EUDI) Wallet cryptographic hardware, most notably the generation of a Proof of Association (PoA). These allow the EUDI Wallet providing verifiable assurance to third parties (issuers, relying parties) that attestation private keys are not only bound to conformant cryptographic hardware but also that they are bound to the same such...
Scabbard: An Exploratory Study on Hardware Aware Design Choices of Learning with Rounding-based Key Encapsulation Mechanisms
Suparna Kundu, Quinten Norga, Angshuman Karmakar, Shreya Gangopadhyay, Jose Maria Bermudo Mera, Ingrid Verbauwhede
Implementation
Recently, the construction of cryptographic schemes based on hard lattice problems has gained immense popularity. Apart from being quantum resistant, lattice-based cryptography allows a wide range of variations in the underlying hard problem. As cryptographic schemes can work in different environments under different operational constraints such as memory footprint, silicon area, efficiency, power requirement, etc., such variations in the underlying hard problem are very useful for designers...
Eva: Efficient IVC-Based Authentication of Lossy-Encoded Videos
Chengru Zhang, Xiao Yang, David Oswald, Mark Ryan, Philipp Jovanovic
Applications
With the increasing spread of fake videos for misinformation, proving the provenance of an edited video (without revealing the original one) becomes critical. To this end, we introduce Eva, the first cryptographic protocol for authenticating lossy-encoded videos. Compared to previous cryptographic methods for image authentication, Eva supports significantly larger amounts of data that undergo complex transformations during encoding. We achieve this by decomposing repetitive and manageable...
Reality Check on Side-Channels: Lessons learnt from breaking AES on an ARM Cortex A processor
Shivam Bhasin, Harishma Boyapally, Dirmanto Jap
Attacks and cryptanalysis
AES implementation has been vastly analysed against side-channel attacks in the last two decades particularly targeting resource-constrained microcontrollers. Still, less research has been conducted on AES implementations on advanced hardware platforms. In this study, we examine the resilience of AES on an ARM Cortex A72 processor within the Raspberry Pi 4B model. Unlike their microcontroller counterparts, these platforms operate within the complex ecosystem of an operating system (OS),...
EUCLEAK
Thomas Roche
Attacks and cryptanalysis
Secure elements are small microcontrollers whose main purpose is to generate/store secrets and then execute cryptographic operations. They undergo the highest level of security evaluations that exists (Common Criteria) and are often considered inviolable, even in the worst-case attack scenarios. Hence, complex secure systems build their security upon them.
FIDO hardware tokens are strong authentication factors to sign in to applications (any web service supporting FIDO); they often embed...
AGATE: Augmented Global Attested Trusted Execution in the Universal Composability framework
Lorenzo Martinico, Markulf Kohlweiss
Foundations
A Trusted Execution Environment (TEE) is a new type of security technology, implemented by CPU manufacturers, which guarantees integrity and confidentiality on a restricted execution environment to any remote verifier. TEEs are deployed on various consumer and commercial hardwareplatforms, and have been widely adopted as a component in the design of cryptographic protocols both theoretical and practical.
Within the provable security community, the use of TEEs as a setup assumption has...
Update to the Sca25519 Library: Mitigating Tearing-based Side-channel Attacks
Lukasz Chmielewski, Lubomír Hrbáček
Implementation
This short note describes an update to the sca25519 library, an ECC implementation computing the X25519 key-exchange protocol on the Arm Cortex-M4 microcontroller. The sca25519 software came with extensive mitigations against various side-channel and fault attacks and was, to our best knowledge, the first to claim affordable protection against multiple classes of attacks that are motivated by distinct real-world application scenarios.
This library is protected against various passive and...
SoK: Instruction Set Extensions for Cryptographers
Hao Cheng, Johann Großschädl, Ben Marshall, Daniel Page, Markku-Juhani O. Saarinen
Implementation
Framed within the general context of cyber-security, standard cryptographic constructions often represent an enabling technology for associated solutions. Alongside or in combination with their design, therefore, the implementation of such constructions is an important challenge: beyond delivering artefacts that are usable in practice, implementation can impact many quality metrics (such as efficiency and security) which determine fitness-for-purpose. A rich design space of implementation...
MAESTRO: Multi-party AES using Lookup Tables
Hiraku Morita, Erik Pohle, Kunihiko Sadakane, Peter Scholl, Kazunari Tozawa, Daniel Tschudi
Cryptographic protocols
Secure multi-party computation (MPC) enables multiple distrusting parties to jointly compute a function while keeping their inputs private. Computing the AES block cipher in MPC, where the key and/or the input are secret-shared among the parties is important for various applications, particularly threshold cryptography.
In this work, we propose a family of dedicated, high-performance MPC protocols to compute the non-linear S-box part of AES in the honest majority setting. Our protocols...
R-STELLAR: A Resilient Synthesizable Signature Attenuation SCA Protection on AES-256 with built-in Attack-on-Countermeasure Detection
Archisman Ghosh, Dong-Hyun Seo, Debayan Das, Santosh Ghosh, Shreyas Sen
Applications
Side-channel attacks (SCAs) remain a significant threat to the security of cryptographic systems in modern embedded devices. Even mathematically secure cryptographic algorithms, when implemented in hardware, inadvertently leak information through physical side-channel signatures such as power consumption, electromagnetic (EM) radiation, light emissions, and acoustic emanations. Exploiting these side channels significantly reduces the attacker’s search space.
In recent years, physical...
A Survey on SoC Security Verification Methods at the Pre-silicon Stage
Rasheed Kibria, Farimah Farahmandi, Mark Tehranipoor
Foundations
This paper presents a survey of the state-of-the-art pre-silicon security verification techniques for System-on-Chip (SoC) designs, focusing on ensuring that designs, implemented in hardware description languages (HDLs) and synthesized circuits, meet security requirements before fabrication in semiconductor foundries. Due to several factors, pre-silicon security verification has become an essential yet challenging aspect of the SoC hardware lifecycle. The modern SoC design process often...
Robust but Relaxed Probing Model
Nicolai Müller, Amir Moradi
Applications
Masking has become a widely applied and heavily researched method to protect cryptographic implementations against SCA attacks. The success of masking is primarily attributed to its strong theoretical foundation enabling it to formally prove security by modeling physical properties through so-called probing models. Specifically, the robust $d$-probing model enables us to prove the security for arbitrarily masked hardware circuits, manually or with the assistance of automated tools, even when...
FELIX (XGCD for FALCON): FPGA-based Scalable and Lightweight Accelerator for Large Integer Extended GCD
Sam Coulon, Tianyou Bao, Jiafeng Xie
Implementation
The Extended Greatest Common Divisor (XGCD) computation is a critical component in various cryptographic applications and algorithms, including both pre- and post-quantum cryptosystems. In addition to computing the greatest common divisor (GCD) of two integers, the XGCD also produces Bezout coefficients $b_a$ and $b_b$ which satisfy $\mathrm{GCD}(a,b) = a\times b_a + b\times b_b$. In particular, computing the XGCD for large integers is of significant interest. Most recently, XGCD computation...
EMI Shielding for Use in Side-Channel Security: Analysis, Simulation and Measurements
Daniel Dobkin, Edut Katz, David Popovtzer, Itamar Levi
Attacks and cryptanalysis
Considering side-channel analysis (SCA) security for cryptographic devices, the mitigation of electromagnetic leakage and electromagnetic interference (EMI) between modules poses significant challenges. This paper presents a comprehensive review and deep analysis of the utilization of EMI shielding materials, devised for reliability purposes and standards such as EMI/EMC, as a countermeasure to enhance EM-SCA security. We survey the current landscape of EMI-shields materials, including...
Koala: A Low-Latency Pseudorandom Function
Parisa Amiri Eliasi, Yanis Belkheyar, Joan Daemen, Santosh Ghosh, Daniël Kuijsters, Alireza Mehrdad, Silvia Mella, Shahram Rasoolzadeh, Gilles Van Assche
Secret-key cryptography
This paper introduces the Koala PRF, which maps a variable-length sequence of $64$-bit input blocks to a single $257$-bit output block.
Its design focuses on achieving low latency in its implementation in ASIC.
To construct Koala, we instantiate the recently introduced Kirby construction with the Koala-P permutation and add an input encoding layer.
The Koala-P permutation is obtained as the $8$-fold iteration of a simple round function inspired by that of Subterranean.
Based on...
Designing a General-Purpose 8-bit (T)FHE Processor Abstraction
Daphné Trama, Pierre-Emmanuel Clet, Aymen Boudguiga, Renaud Sirdey
Applications
Making the most of TFHE programmable bootstrapping to evaluate
functions or operators otherwise challenging to perform with only the native addition
and multiplication of the scheme is a very active line of research. In this paper, we
systematize this approach and apply it to build an 8-bit FHE processor abstraction,
i.e., a software entity that works over FHE-encrypted 8-bits data and presents itself
to the programmer by means of a conventional-looking assembly instruction set.
In...
Hardware Implementation and Security Analysis of Local-Masked NTT for CRYSTALS-Kyber
Rafael Carrera Rodriguez, Emanuele Valea, Florent Bruguier, Pascal Benoit
Implementation
The rapid evolution of post-quantum cryptography, spurred by standardization efforts such as those led by NIST, has highlighted the prominence of lattice-based cryptography, notably exemplified by CRYSTALS-Kyber. However, concerns persist regarding the security of cryptographic implementations, particularly in the face of Side-Channel Attacks (SCA). The usage of operations like the Number Theoretic
Transform (NTT) in CRYSTALS-Kyber introduces vulnerabilities to SCA, especially single-trace...
Towards ML-KEM & ML-DSA on OpenTitan
Amin Abdulrahman, Felix Oberhansl, Hoang Nguyen Hien Pham, Jade Philipoom, Peter Schwabe, Tobias Stelzer, Andreas Zankl
Implementation
This paper presents extensions to the OpenTitan hardware root of trust that aim at enabling high-performance lattice-based cryptography. We start by carefully optimizing ML-KEM and ML-DSA - the two primary algorithms selected by NIST for standardization - in software targeting the OTBN accelerator. Based on profiling results of these implementations, we propose tightly integrated extensions to OTBN, specifically an interface from OTBN to OpenTitan's Keccak accelerator (KMAC core) and...
Lightweight Dynamic Linear Components for Symmetric Cryptography
S. M. Dehnavi, M. R. Mirzaee Shamsabad
Foundations
In this paper, using the concept of equivalence of mappings we characterize all of the one-XOR matrices which are used in hardware applications and propose a family of lightweight linear mappings for software-oriented applications in symmetric cryptography. Then, we investigate interleaved linear mappings and based upon this study, we present generalized dynamic primitive LFSRs along with dynamic linear components for construction of diffusion layers.
From the mathematical...
STORM — Small Table Oriented Redundancy-based SCA Mitigation for AES
Yaacov Belenky, Hennadii Chernyshchyk, Oleg Karavaev, Oleh Maksymenko, Valery Teper, Daria Ryzhkova, Itamar Levi, Osnat Keren, Yury Kreimer
Attacks and cryptanalysis
Side-channel-analysis (SCA) resistance with cost optimization in AES hardware implementations remains a significant challenge. While traditional masking-based schemes offer provable security, they often incur substantial resource overheads (latency, area, randomness, performance, power consumption). Alternatively, the RAMBAM scheme introduced a redundancy-based approach to control the signal-to-noise ratio, and achieves exponential leakage reduction as redundancy increases. This method...
MATTER: A Wide-Block Tweakable Block Cipher
Roberto Avanzi, Orr Dunkelman, Kazuhiko Minematsu
Secret-key cryptography
In this note, we introduce the MATTER Tweakable Block Cipher, designed principally for low latency in low-area hardware implementations, but that can also be implemented in an efficient and compact way in software.
MATTER is a 512-bit wide balanced Feistel network with three to six rounds, using the ASCON permutation as the round function.
The Feistel network defines a keyed, non-tweakable core, which is made tweakable by using the encryption of the tweak as its key.
Key and tweak are...
Rudraksh: A compact and lightweight post-quantum key-encapsulation mechanism
Suparna Kundu, Archisman Ghosh, Angshuman Karmakar, Shreyas Sen, Ingrid Verbauwhede
Public-key cryptography
Resource-constrained devices such as wireless sensors and Internet of Things (IoT) devices have become ubiquitous in our digital ecosystem. These devices generate and handle a major part of our digital data. In the face of the impending threat of quantum computers on our public-key infrastructure, it is impossible to imagine the security and privacy of our digital world without integrating post-quantum cryptography (PQC) into these devices. Usually, due to the resource constraints of these...
LR-OT: Leakage-Resilient Oblivious Transfer
Francesco Berti, Carmit Hazay, Itamar Levi
Cryptographic protocols
Oblivious Transfer (OT) is a fundamental cryptographic primitive, becoming a crucial component of a practical secure protocol.
OT is typically implemented in software, and one way to accelerate its running time is by using hardware implementations.
However, such implementations are vulnerable to side-channel attacks (SCAs).
On the other hand, protecting interactive protocols against SCA is highly challenging because of their longer secrets (which include inputs and randomness), more...
A Fast and Efficient SIKE Co-Design: Coarse-Grained Reconfigurable Accelerators with Custom RISC-V Microcontroller on FPGA
Jing Tian, Bo Wu, Lang Feng, Haochen Zhang, Zhongfeng Wang
Implementation
This paper proposes a fast and efficient FPGA-based hardware-software co-design for the supersingular isogeny key encapsulation (SIKE) protocol controlled by a custom RISC-V processor. Firstly, we highly optimize the core unit, the polynomial-based field arithmetic logic unit (FALU), with the proposed fast convolution-like multiplier (FCM) to significantly reduce the resource consumption while still maintaining low latency and constant time for all the four SIKE parameters. Secondly, we pack...
Oryx: Private detection of cycles in federated graphs
Ke Zhong, Sebastian Angel
Cryptographic protocols
This paper proposes Oryx, a system for efficiently detecting cycles in federated graphs where parts of the graph are held by different parties and are private. Cycle detection is an important building block in designing fraud detection algorithms that operate on confidential transaction data held by different financial institutions. Oryx allows detecting cycles of various length while keeping the topology of the graphs secret, and it does so efficiently; Oryx achieves quasilinear...
QuietOT: Lightweight Oblivious Transfer with a Public-Key Setup
Geoffroy Couteau, Lalita Devadas, Srinivas Devadas, Alexander Koch, Sacha Servan-Schreiber
Cryptographic protocols
Oblivious Transfer (OT) is at the heart of secure computation and is a foundation for many applications in cryptography. Over two decades of work have led to extremely efficient protocols for evaluating OT instances in the preprocessing model, through a paradigm called OT extension.
A few OT instances generated in an offline phase can be used to perform many OTs in an online phase efficiently, i.e., with very low communication and computational overheads.
Specifically, traditional OT...
A More Compact AES, and More
Dag Arne Osvik, David Canright
Implementation
We reduce the number of bit operations required to implement AES to a new minimum, and also compute improvements to elements of some other ciphers. Exploring the algebra of AES allows choices of basis and streamlining of the nonlinear parts. We also compute a more efficient implementation of the linear part of each round. Similar computational optimizations apply to other cryptographic matrices and S-boxes. This work may be incorporated into a hardware AES implementation using minimal...
VIMz: Verifiable Image Manipulation using Folding-based zkSNARKs
Stefan Dziembowski, Shahriar Ebrahimi, Parisa Hassanizadeh
Applications
With the rise of generative AI technology, the media's credibility as a source of truth has been significantly compromised. This highlights the need to verify the authenticity of media and its originality.
Ensuring the integrity of media during capture using the device itself presents a straightforward solution to this challenge.
However, raw captured media often require certain refinements or redactions before publication. Zero-knowledge proofs (ZKP) offer a solution by allowing...
Reading It like an Open Book: Single-trace Blind Side-channel Attacks on Garbled Circuit Frameworks
Sirui Shen, Chenglu Jin
Attacks and cryptanalysis
Garbled circuits (GC) are a secure multiparty computation protocol that enables two parties to jointly compute a function using their private data without revealing it to each other. While garbled circuits are proven secure at the protocol level, implementations can still be vulnerable to side-channel attacks. Recently, side-channel analysis of GC implementations has garnered significant interest from researchers.
We investigate popular open-source GC frameworks and discover that the AES...
Grafting: Complementing RNS in CKKS
Jung Hee Cheon, Hyeongmin Choe, Minsik Kang, Jaehyung Kim
Implementation
The RNS variant of the CKKS scheme (SAC 2018) is widely implemented due to its computational efficiency. However, the current optimized implementations of the RNS-CKKS scheme have a limitation when choosing the ciphertext modulus. It requires the scale factors to be approximately equal to a factor (or a product of factors) of the ciphertext modulus. This restriction causes inefficiency when the scale factor is not close to the power of the machine's word size, wasting the machine's...
Secure Vickrey Auctions with Rational Parties
Chaya Ganesh, Shreyas Gupta, Bhavana Kanukurthi, Girisha Shankar
Cryptographic protocols
In this work, we construct a second price (Vickrey) auction protocol (SPA), which does not require any auctioneers and ensures total privacy in the presence of rational parties participating in auction. In particular, the confidentiality of the highest bid and the identity of the second highest bidder are protected. We model the bidders participating in the second price auction as rational, computationally bounded and privacy-sensitive parties. These are self-interested agents who care about...
Side-Channel and Fault Resistant ASCON Implementation: A Detailed Hardware Evaluation (Extended Version)
Aneesh Kandi, Anubhab Baksi, Peizhou Gan, Sylvain Guilley, Tomáš Gerlich, Jakub Breier, Anupam Chattopadhyay, Ritu Ranjan Shrivastwa, Zdeněk Martinásek, Shivam Bhasin
Implementation
In this work, we present various hardware implementations for the lightweight cipher ASCON, which was recently selected as the winner of the NIST organized Lightweight Cryptography (LWC) competition. We cover encryption + tag generation and decryption + tag verification for the ASCON AEAD and also the ASCON hash function. On top of the usual (unprotected) implementation, we present side-channel protection (threshold countermeasure) and triplication/majority-based fault protection. To the...
ICICLE v2: Polynomial API for Coding ZK Provers to Run on Specialized Hardware
Karthik Inbasekar, Yuval Shekel, Michael Asa
Applications
Polynomials play a central role in cryptography. In the context of Zero Knowledge Proofs (ZKPs), protocols can be exclusively expressed using polynomials, making them a powerful abstraction tool, as demonstrated in most ZK research papers. Our first contribution is a high-level framework that enables practitioners to implement ZKPs in a more natural way, based solely on polynomial primitives.
ZK provers are considered computationally intensive algorithms with a high degree of...
Designs for practical SHE schemes based on Ring-LWR
Madalina Bolboceanu, Anamaria Costache, Erin Hales, Rachel Player, Miruna Rosca, Radu Titiu
Public-key cryptography
The Learning with Errors problem (LWE) and its variants are among the most popular assumptions underlying lattice-based cryptography. The Learning with Rounding problem (LWR) can be thought of as a deterministic variant of LWE. While lattice-based cryptography is known to enable many advanced constructions, constructing Fully Homomorphic Encryption schemes based on LWR remains an under-explored part of the literature. In this work, we present a thorough study of Somewhat Homomorphic...
CISELeaks: Information Leakage Assessment of Cryptographic Instruction Set Extension Prototypes
Aruna Jayasena, Richard Bachmann, Prabhat Mishra
Attacks and cryptanalysis
Software based cryptographic implementations provide flexibility but they face performance limitations. In contrast, hardware based cryptographic accelerators utilize application-specific customization to provide real-time security solutions.
Cryptographic instruction-set extensions (CISE) combine the advantages of both hardware and software based solutions to provide higher performance combined with the flexibility of atomic-level cryptographic operations. While CISE is widely used to...
Time Sharing - A Novel Approach to Low-Latency Masking
Dilip Kumar S. V., Siemen Dhooghe, Josep Balasch, Benedikt Gierlichs, Ingrid Verbauwhede
Implementation
We present a novel approach to small area and low-latency first-order masking in hardware. The core idea is to separate the processing of shares in time in order to achieve non-completeness. Resulting circuits are proven first-order glitch-extended PINI secure. This means the method can be straightforwardly applied to mask arbitrary functions without constraints which the designer must take care of. Furthermore we show that an implementation can benefit from optimization through EDA tools...
Approximate CRT-Based Gadget Decomposition and Application to TFHE Blind Rotation
Olivier Bernard, Marc Joye
Implementation
One of the main issues to deal with for fully homomorphic encryption is the noise growth when operating on ciphertexts. To some extent, this can be controlled thanks to a so-called gadget decomposition. A gadget decomposition typically relies on radix- or CRT-based representations to split elements as vectors of smaller chunks whose inner products with the corresponding gadget vector rebuilds (an approximation of) the original elements. Radix-based gadget decompositions present the advantage...
Glitch-Stopping Circuits: Hardware Secure Masking without Registers
Zhenda Zhang, Svetla Nikova, Ventzislav Nikov
Implementation
Masking is one of the most popular countermeasures to protect implementations against power and electromagnetic side channel attacks, because it offers provable security. Masking has been shown secure against d-threshold probing adversaries by Ishai et al. at CRYPTO'03, but this adversary's model doesn't consider any physical hardware defaults and thus such masking schemes were shown to be still vulnerable when implemented as hardware circuits. To addressed these limitations glitch-extended...
HAWKEYE – Recovering Symmetric Cryptography From Hardware Circuits
Gregor Leander, Christof Paar, Julian Speith, Lukas Stennes
Implementation
We present the first comprehensive approach for detecting and analyzing symmetric cryptographic primitives in gate-level descriptions of hardware. To capture both ASICs and FPGAs, we model the hardware as a directed graph, where gates become nodes and wires become edges. For modern chips, those graphs can easily consist of hundreds of thousands of nodes. More abstractly, we find subgraphs corresponding to cryptographic primitives in a potentially huge graph, the sea-of-gates, describing an...
INDIANA - Verifying (Random) Probing Security through Indistinguishability Analysis
Christof Beierle, Jakob Feldtkeller, Anna Guinet, Tim Güneysu, Gregor Leander, Jan Richter-Brockmann, Pascal Sasdrich
Implementation
Despite masking being a prevalent protection against passive side-channel attacks, implementing it securely in hardware remains a manual, challenging, and error-prone process.
This paper introduces INDIANA, a comprehensive security verification tool for hardware masking. It provides a hardware verification framework, enabling a complete analysis of simulation-based security in the glitch-extended probing model, with cycle-accurate estimations for leakage probabilities in the random...
Elliptic Curve Cryptography for the masses: Simple and fast finite field arithmetic
Michael Scott
Implementation
Shaped prime moduli are often considered for use in elliptic curve and isogeny-based cryptography to allow for faster modular reduction. Here we focus on the most common choices for shaped primes that have been suggested, that is pseudo-Mersenne, generalized Mersenne and Montgomery-friendly primes. We consider how best to to exploit these shapes for maximum efficiency, and provide an open source tool to automatically generate, test and time working high-level language finite-field code....
Byzantine Reliable Broadcast with One Trusted Monotonic Counter
Yackolley Amoussou-Guenou, Lionel Beltrando, Maurice Herlihy, Maria Potop-Butucaru
Foundations
Byzantine Reliable Broadcast is one of the most popular communication primitives in distributed systems. Byzantine reliable broadcast ensures that processes agree to deliver a message from an initiator even if some processes (perhaps including the initiator) are Byzantine. In asynchronous settings it is known since the prominent work of Bracha [Bracha87] that Byzantine reliable broadcast can be implemented deterministically if $n \geq 3t+1$ where $t$ is an upper bound on the...
Constant-Cost Batched Partial Decryption in Threshold Encryption
Sora Suegami, Shinsaku Ashizawa, Kyohei Shibano
Cryptographic protocols
Threshold public key encryption schemes distribute secret keys among multiple parties, known as the committee, to reduce reliance on a single trusted entity.
However, existing schemes face inefficiencies as the committee should perform computation and communication for decryption of each individual ciphertext.
As the number of ciphertexts being decrypted per unit of time increases, this can limit the number of committee parties and their decentralization due to increased hardware...
Client-Efficient Online-Offline Private Information Retrieval
Hoang-Dung Nguyen, Jorge Guajardo, Thang Hoang
Cryptographic protocols
Private Information Retrieval (PIR) permits clients to query entries from a public database hosted on untrusted servers in a privacy-preserving manner. Traditional PIR model suffers from high computation and/or bandwidth cost due to entire database processing for privacy. Recently, Online-Offline PIR (OO-PIR) has been suggested to improve the practicality of PIR, where query-independent materials are precomputed beforehand to accelerate online access. While state-of-the-art OO-PIR schemes...
Automated Generation of Fault-Resistant Circuits
Nicolai Müller, Amir Moradi
Implementation
Fault Injection (FI) attacks, which involve intentionally introducing faults into a system to cause it to behave in an unintended manner, are widely recognized and pose a significant threat to the security of cryptographic primitives implemented in hardware, making fault tolerance an increasingly critical concern. However, protecting cryptographic hardware primitives securely and efficiently, even with well-established and documented methods such as redundant computation, can be a...
Private Computations on Streaming Data
Vladimir Braverman, Kevin Garbe, Eli Jaffe, Rafail Ostrovsky
Cryptographic protocols
We present a framework for privacy-preserving streaming algorithms which combine the memory-efficiency of streaming algorithms with strong privacy guarantees. These algorithms enable some number of servers to compute aggregate statistics efficiently on large quantities of user data without learning the user's inputs. While there exists limited prior work that fits within our model, our work is the first to formally define a general framework, interpret existing methods within this general...
Secure Implementation of SRAM PUF for Private Key Generation
Raja Adhithan Radhakrishnan
Implementation
This paper endeavors to securely implement a Physical Unclonable
Function (PUF) for private data generation within Field-Programmable
Gate Arrays (FPGAs). SRAM PUFs are commonly utilized due to their
use of memory devices for generating secret data, particularly in resource constrained devices. However, their reliance on memory access poses side-channel threats such as data remanence decay and memory-based attacks, and the time required to generate secret data is significant. To address...
A New Hash-based Enhanced Privacy ID Signature Scheme
Liqun Chen, Changyu Dong, Nada El Kassem, Christopher J.P. Newton, Yalan Wang
Cryptographic protocols
The elliptic curve-based Enhanced Privacy ID (EPID) signature scheme is broadly used for hardware enclave attestation by many platforms that implement Intel Software Guard Extensions (SGX) and other devices. This scheme has also been included in the Trusted Platform Module (TPM) specifications and ISO/IEC standards. However, it is insecure against quantum attackers. While research into quantum-resistant EPID has resulted in several lattice-based schemes, Boneh et al. have initiated the study...
Vision Mark-32: ZK-Friendly Hash Function Over Binary Tower Fields
Tomer Ashur, Mohammad Mahzoun, Jim Posen, Danilo Šijačić
Implementation
Zero-knowledge proof systems are widely used in different applications on the Internet. Among zero-knowledge proof systems, SNARKs are a popular choice because of their fast verification time and small proof size. The efficiency of zero-knowledge systems is crucial for usability, resulting in the development of so-called arithmetization-oriented ciphers. In this work, we introduce Vision Mark-32, a modified instance of Vision defined over binary tower fields, with an optimized number of...
A Complete Beginner Guide to the Number Theoretic Transform (NTT)
Ardianto Satriawan, Rella Mareta, Hanho Lee
Foundations
The Number Theoretic Transform (NTT) is a powerful mathematical tool that has become increasingly important in developing Post Quantum Cryptography (PQC) and Homomorphic Encryption (HE). Its ability to efficiently calculate polynomial multiplication using the convolution theorem with a quasi-linear complexity $O(n \log{n})$ instead of $O(n^2)$ when implemented with Fast Fourier Transform-style algorithms has made it a key component in modern cryptography. FFT-style NTT algorithm or fast-NTT...
Fast Parallelizable Misuse-Resistant Authenticated Encryption: Low Latency (Decryption-Fast) SIV
Mustafa Khairallah
Secret-key cryptography
MRAE security is an important goal for many AEAD applications where the nonce uniqueness cannot be maintained and security risks are significant. However, MRAE schemes can be quite expensive. Two of the SoTA MRAE-secure schemes; Deoxys-II and AES-GCM-SIV rely on internal parallelism and special instructions to achieve competitive performance. However, they both suffer from the same bottleneck, they have at least one call to the underlying primitive that cannot be parallelized to any other...
Number-Theoretic Transform Architecture for Fully Homomorphic Encryption from Hypercube Topology
Jingwei Hu, Yuhong Fang, Wangchen Dai
Implementation
This paper introduces a high-performance and scalable hardware architecture designed for the Number-Theoretic Transform (NTT), a fundamental component extensively utilized in lattice-based encryption and fully homomorphic encryption schemes.
The underlying rationale behind this research is to harness the advantages of the hypercube topology. This topology serves to significantly diminish the volume of data exchanges required during each iteration of the NTT, reducing it to a complexity of...
Tight ZK CPU: Batched ZK Branching with Cost Proportional to Evaluated Instruction
Yibin Yang, David Heath, Carmit Hazay, Vladimir Kolesnikov, Muthuramakrishnan Venkitasubramaniam
Cryptographic protocols
We explore Zero-Knowledge proofs (ZKP) of statements expressed as programs written in high-level languages, e.g., C or assembly. At the core of executing such programs in ZK is the repeated evaluation of a CPU step, achieved by branching over the CPU’s instruction set. This approach is general and covers traversal-execution of a program’s control flow graph (CFG): here CPU instructions are straight-line program fragments (of various sizes) associated with the CFG nodes. This highlights the...
Fastcrypto: Pioneering Cryptography Via Continuous Benchmarking
Kostas Kryptos Chalkias, Jonas Lindstrøm, Deepak Maram, Ben Riva, Arnab Roy, Alberto Sonnino, Joy Wang
Implementation
In the rapidly evolving fields of encryption and blockchain technologies, the efficiency and security of cryptographic schemes significantly impact performance. This paper introduces a comprehensive framework for continuous benchmarking in one of the most popular cryptography Rust libraries, fastcrypto. What makes our analysis unique is the realization that automated benchmarking is not just a performance monitor and optimization tool, but it can be used for cryptanalysis and innovation...
Threshold implementations of cryptographic functions between finite Abelian groups
Enrico Piccione
Implementation
The threshold implementation technique has been proposed in 2006 by Nikova et al. as a countermeasure to mitigate cryptographic side-channel attacks on hardware implementations when the effect of glitches is taken into account. This technique is based on Boolean sharing (also called masking) and it was developed for securing symmetric ciphers such as AES. In 2023, Piccione et al. proposed a general construction of threshold implementations that is universal for S-boxes that are bijective...
Generalized Feistel Ciphers for Efficient Prime Field Masking - Full Version
Lorenzo Grassi, Loïc Masure, Pierrick Méaux, Thorben Moos, François-Xavier Standaert
Secret-key cryptography
A recent work from Eurocrypt 2023 suggests that prime-field masking has excellent potential to improve the efficiency vs. security tradeoff of masked implementations against side-channel attacks, especially in contexts where physical leakages show low noise. We pick up on the main open challenge that this seed result leads to, namely the design of an optimized prime cipher able to take advantage of this potential. Given the interest of tweakable block ciphers with cheap inverses in many...
Plan your defense: A comparative analysis of leakage detection methods on RISC-V cores
Konstantina Miteloudi, Asmita Adhikary, Niels van Drueten, Lejla Batina, Ileana Buhan
Applications
Hardening microprocessors against side-channel attacks is a critical aspect of ensuring their security. A key step in this process is identifying and mitigating “leaky" hardware modules, which leak information during the execution of cryptographic algorithms.
In this paper, we explore how different leakage detection methods, the Side-channel Vulnerability Factor (SVF) and the Test Vector Leakage Assessment (TVLA), contribute to hardening of microprocessors. We conduct experiments on two...
Accelerating SLH-DSA by Two Orders of Magnitude with a Single Hash Unit
Markku-Juhani O. Saarinen
Implementation
We report on efficient and secure hardware implementation techniques for the FIPS 205 SLH-DSA Hash-Based Signature Standard. We demonstrate that very significant overall performance gains can be obtained from hardware that optimizes the padding formats and iterative hashing processes specific to SLH-DSA. A prototype implementation, SLotH, contains Keccak/SHAKE, SHA2-256, and SHA2-512 cores and supports all 12 parameter sets of SLH-DSA. SLotH also supports side-channel secure PRF computation...
Combined Threshold Implementation
Jakob Feldtkeller, Jan Richter-Brockmann, Pascal Sasdrich, Tim Güneysu
Implementation
Physical security is an important aspect of devices for which an adversary can manipulate the physical execution environment. Recently, more and more attention has been directed towards a security model that combines the capabilities of passive and active physical attacks, i.e., an adversary that performs fault-injection and side-channel analysis at the same time. Implementing countermeasures against such a powerful adversary is not only costly but also requires the skillful combination of...
Leakage-Tolerant Circuits
Yuval Ishai, Yifan Song
Foundations
A leakage-resilient circuit for $f:\{0,1\}^n\to\{0,1\}^m$ is a randomized Boolean circuit $C$ mapping a randomized encoding of an input $x$ to an encoding of $y=f(x)$, such that applying any leakage function $L\in \cal L$ to the wires of $C$ reveals essentially nothing about $x$. A leakage-tolerant circuit achieves the stronger guarantee that even when $x$ and $y$ are not protected by any encoding, the output of $L$ can be simulated by applying some $L'\in \cal L$ to $x$ and $y$ alone....
Exploring the Advantages and Challenges of Fermat NTT in FHE Acceleration
Andrey Kim, Ahmet Can Mert, Anisha Mukherjee, Aikata Aikata, Maxim Deryabin, Sunmin Kwon, HyungChul Kang, Sujoy Sinha Roy
Implementation
Recognizing the importance of a fast and resource-efficient polynomial multiplication in homomorphic encryption, in this paper, we design a multiplier-less number theoretic transform using a Fermat number as an auxiliary modulus. To make this algorithm scalable with the degree of polynomial, we apply a univariate to multivariate polynomial ring transformation.
We develop an accelerator architecture for fully homomorphic encryption using these algorithmic techniques for efficient...
Hardware Acceleration of the Prime-Factor and Rader NTT for BGV Fully Homomorphic Encryption
David Du Pont, Jonas Bertels, Furkan Turan, Michiel Van Beirendonck, Ingrid Verbauwhede
Implementation
Fully Homomorphic Encryption (FHE) enables computation on encrypted data, holding immense potential for enhancing data privacy and security in various applications. Presently, FHE adoption is hindered by slow computation times, caused by data being encrypted into large polynomials. Optimized FHE libraries and hardware acceleration are emerging to tackle this performance bottleneck. Often, these libraries implement the Number Theoretic Transform (NTT) algorithm for efficient polynomial...
HiSE: Hierarchical (Threshold) Symmetric-key Encryption
Pousali Dey, Pratyay Mukherjee, Swagata Sasmal, Rohit Sinha
Cryptographic protocols
Threshold symmetric encryption (TSE), introduced by Agrawal et al. [DiSE, CCS 2018], provides scalable and decentralized solution for symmetric encryption by ensuring that the secret-key stays distributed at all times. They avoid having a single point of attack or failure, while achieving the necessary security requirements. TSE was further improved by Christodorescu et al. [ATSE, CCS 2021] to support an amortization feature which enables a “more privileged” client to encrypt records in bulk...
Prime Masking vs. Faults - Exponential Security Amplification against Selected Classes of Attacks
Thorben Moos, Sayandeep Saha, François-Xavier Standaert
Implementation
Fault injection attacks are a serious concern for cryptographic hardware. Adversaries may extract sensitive information from the faulty output that is produced by a cryptographic circuit after actively disturbing its computation. Alternatively, the information whether an output would have been faulty, even if it is withheld from being released, may be exploited. The former class of attacks, which requires the collection of faulty outputs, such as Differential Fault Analysis (DFA), then...
HADES: Automated Hardware Design Exploration for Cryptographic Primitives
Fabian Buschkowski, Georg Land, Jan Richter-Brockmann, Pascal Sasdrich, Tim Güneysu
Implementation
While formal constructions for cryptographic schemes have steadily evolved and emerged over the past decades, the design and implementation of efficient and secure hardware instances is still a mostly manual, tedious, and intuition-driven process. With the increasing complexity of modern cryptography, e.g., Post-Quantum Cryptography (PQC) schemes, and consideration of physical implementation attacks, e.g., Side-Channel Analysis (SCA), the design space often grows exorbitantly without...
Perceived Information Revisited II: Information-Theoretical Analysis of Deep-Learning Based Side-Channel Attacks
Akira Ito, Rei Ueno, Naofumi Homma
Attacks and cryptanalysis
Previous studies on deep-learning-based side-channel attacks (DL-SCAs) have shown that traditional performance evaluation metrics commonly used in DL, like accuracy and F1 score, are not effective in evaluating DL-SCA performance. Therefore, some previous studies have proposed new alternative metrics for evaluating the performance of DL-SCAs. Notably, perceived information (PI) and effective perceived information (EPI) are major metrics based on information theory. While it has been...
X2X: Low-Randomness and High-Throughput A2B and B2A Conversions for $d+1$ shares in Hardware
Quinten Norga, Jan-Pieter D'Anvers, Suparna Kundu, Ingrid Verbauwhede
Implementation
The conversion between arithmetic and Boolean masking representations (A2B \& B2A) is a crucial component for side-channel resistant implementations of lattice-based (post-quantum) cryptography. In this paper, we first propose novel $d$-order algorithms for the secure addition (SecADDChain$_q$) and B2A (B2X2A). Our secure adder is well-suited for repeated ('chained') executions, achieved through an improved method for repeated masked modular reduction. The optimized B2X2A gadget removes a...
Improved All-but-One Vector Commitment with Applications to Post-Quantum Signatures
Dung Bui, Kelong Cong, Cyprien Delpech de Saint Guilhem
Public-key cryptography
Post-quantum digital signature schemes have recently received increased attention due to the NIST standardization project for additional signatures. MPC-in-the-Head and VOLE-in-the-Head are general techniques for constructing such signatures from zero-knowledge proof systems. A common theme between the two is an all-but-one vector commitment scheme which internally uses GGM trees. This primitive is responsible for a significant part of the computational time during signing and...
OBSCURE: Versatile Software Obfuscation from a Lightweight Secure Element
Darius Mercadier, Viet Sang Nguyen, Matthieu Rivain, Aleksei Udovenko
Applications
Software obfuscation is a powerful tool to protect the intellectual property or secret keys inside programs. Strong software obfuscation is crucial in the context of untrusted execution environments (e.g., subject to malware infection) or to face potentially malicious users trying to reverse-engineer a sensitive program. Unfortunately, the state-of-the-art of pure software-based obfuscation (including white-box cryptography) is either insecure or infeasible in practice.
This work...
SDitH in Hardware
Sanjay Deshpande, James Howe, Jakub Szefer, Dongze Yue
Implementation
This work presents the first hardware realisation of the Syndrome-Decoding-in-the-Head (SDitH) signature scheme, which is a candidate in the NIST PQC process for standardising post-quantum secure digital signature schemes. SDitH's hardness is based on conservative code-based assumptions, and it uses the Multi-Party-Computation-in-the-Head (MPCitH) construction.
This is the first hardware design of a code-based signature scheme based on traditional decoding problems and only the second for...
CrISA-X: Unleashing Performance Excellence in Lightweight Symmetric Cryptography for Extendable and Deeply Embedded Processors
Oren Ganon, Itamar Levi
Implementation
The selection of a Lightweight Cryptography (LWC) algorithm is crucial for resource limited applications. The National Institute of Standards and Technology (NIST) leads this process, which involves a thorough evaluation of the algorithms’ cryptanalytic strength. Furthermore, careful consideration is given to factors such as algorithm latency, code size, and hardware implementation area. These factors are critical in determining the overall performance of cryptographic solutions at edge...
A Low-Latency High-Order Arithmetic to Boolean Masking Conversion
Jiangxue Liu, Cankun Zhao, Shuohang Peng, Bohan Yang, Hang Zhao, Xiangdong Han, Min Zhu, Shaojun Wei, Leibo Liu
Implementation
Masking, an effective countermeasure against side-channel attacks, is commonly applied in modern cryptographic implementations. Considering cryptographic algorithms that utilize both Boolean and arithmetic masking, the conversion algorithm between arithmetic masking and Boolean masking is required. Conventional high-order arithmetic masking to Boolean masking conversion algorithms based on Boolean circuits suffer from performance overhead, especially in terms of hardware implementation. In...
Efficient Hardware Implementation for Maiorana-McFarland type Functions
Anupam Chattopadhyay, Subhamoy Maitra, Bimal Mandal, Manmatha Roy, Deng Tang
Secret-key cryptography
Maiorana--McFarland type constructions are basically concatenating the truth tables of linear functions on a smaller number of variables to obtain highly nonlinear ones on larger inputs. Such functions and their different variants have significant cryptology and coding theory applications. The straightforward hardware implementation of such functions using decoders (Khairallah et al., WAIFI 2018; Tang et al., SIAM Journal on Discrete Mathematics, 2019) requires exponential resources on the...
Evaluating the security of CRYSTALS-Dilithium in the quantum random oracle model
Kelsey A. Jackson, Carl A. Miller, Daochen Wang
Public-key cryptography
In the wake of recent progress on quantum computing hardware, the National Institute of Standards and Technology (NIST) is standardizing cryptographic protocols that are resistant to attacks by quantum adversaries. The primary digital signature scheme that NIST has chosen is CRYSTALS-Dilithium. The hardness of this scheme is based on the hardness of three computational problems: Module Learning with Errors (MLWE), Module Short Integer Solution (MSIS), and SelfTargetMSIS. MLWE and MSIS have...
Barrett Multiplication for Dilithium on Embedded Devices
Vincent Hwang, YoungBeom Kim, Seog Chung Seo
Implementation
We optimize the number-theoretic transforms (NTTs) in Dilithium — a digital signature scheme recently standardized by the National Institute of Standards and Technology (NIST) — on Cortex-M3 and 8-bit AVR. The core novelty is the exploration of micro-architectural insights for modular multiplications. Recent work [Becker, Hwang, Kannwischer, Yang and Yang, Volume 2022 (1), Transactions on Cryptographic Hardware and Embedded Systems, 2022] found a correspondence between Montgomery and Barrett...
Holepunch: Fast, Secure File Deletion with Crash Consistency
Zachary Ratliff, Wittmann Goh, Abe Wieland, James Mickens, Ryan Williams
Cryptographic protocols
A file system provides secure deletion if, after a file is deleted, an attacker with physical possession of the storage device cannot recover any data from the deleted file. Unfortunately, secure deletion is not provided by commodity file systems. Even file systems which explicitly desire to provide secure deletion are challenged by the subtleties of hardware controllers on modern storage devices; those controllers obscure the mappings between logical blocks and physical blocks, silently...
FANNG-MPC: Framework for Artificial Neural Networks and Generic MPC
Najwa Aaraj, Abdelrahaman Aly, Tim Güneysu, Chiara Marcolla, Johannes Mono, Rogerio Paludo, Iván Santos-González, Mireia Scholz, Eduardo Soria-Vazquez, Victor Sucasas, Ajith Suresh
Cryptographic protocols
In this work, we introduce FANNG-MPC, a versatile secure multi-party computation framework capable to offer active security for privacy preserving machine learning as a service (MLaaS). Derived from the now deprecated SCALE-MAMBA, FANNG is a data-oriented fork, featuring novel set of libraries and instructions for realizing private neural networks, effectively reviving the popular framework. To the best of our knowledge, FANNG is the first MPC framework to offer actively secure MLaaS in the...
Efficient Low-Latency Masking of Ascon without Fresh Randomness
Srinidhi Hari Prasad, Florian Mendel, Martin Schläffer, Rishub Nagpal
Implementation
In this work, we present the first low-latency, second-order masked hardware implementation of Ascon that requires no fresh randomness using only $d+1$ shares. Our results significantly outperform any publicly known second-order masked implementations of AES and Ascon in terms of combined area, latency and randomness requirements. Ascon is a family of lightweight authenticated encryption and hashing schemes selected by NIST for standardization. Ascon is tailored for small form factors. It...
Breaking RSA Authentication on Zynq-7000 SoC and Beyond: Identification of Critical Security Flaw in FSBL Software
Prasanna Ravi, Arpan Jati, Shivam Bhasin
Attacks and cryptanalysis
In this report, we perform an in-depth analysis of the RSA authentication feature used in the secure boot procedure of Xilinx Zynq-7000 SoC device. The First Stage Boot Loader (FSBL) is a critical piece of software
executed during secure boot, which utilizes the RSA authentication feature to validate all the hardware and software partitions to be mounted on the device. We analyzed the implementation of FSBL (provided by
Xilinx) for the Zynq-7000 SoC and identified a critical security flaw,...
In-depth Correlation Power Analysis Attacks on a Hardware Implementation of CRYSTALS-Dilithium
Huaxin Wang, Yiwen Gao, Yuejun Liu, Qian Zhang, Yongbin Zhou
Attacks and cryptanalysis
During the standardisation process of post-quantum cryptography, NIST encourages research on side-channel analysis for candidate schemes. As the recommended lattice signature scheme, CRYSTALS-Dilithium, when implemented on hardware, has only been subjected to the side-channel attack presented by Steffen et al. in IACR ePrint 2022. This attack is not complete and requires excessive traces. Therefore, we investigate the leakage of an FPGA (Kintex7) implementation of CRYSTALS-Dilithium using...
2023/1889
Last updated: 2024-10-09
Fully Parallel, One-Cycle Random Shuffling for Efficient Countermeasure against Side Channel Attack and its Complexity Verification.
Jong-Yeon Park, Dongsoo Lee, Seonggyeom Kim, Wonil lee, Bo Gyeong Kang, Kouichi Sakurai
Foundations
Hiding countermeasures are the most widely utilized techniques for thwarting side-channel attacks, and their significance has been further emphasized with the advent of Post Quantum Cryptography (PQC) algorithms, owing to the extensive use of vector operations. Commonly, the Fisher-Yates algorithm is adopted in hiding countermeasures with permuted operation for its security and efficiency in implementation, yet the inherently sequential nature of the algorithm imposes limitations on hardware...
Falcon Takes Off - A Hardware Implementation of the Falcon Signature Scheme
Michael Schmid, Dorian Amiet, Jan Wendler, Paul Zbinden, Tao Wei
Implementation
Falcon is one out of three post-quantum signature schemes which have been selected for standardization by NIST in July 2022. To the best of our knowledge, Falcon is the only selected algorithm that does not yet have a publicly reported hardware description that performs signing or key generation. The reason might be that the Falcon signature and key generation algorithms do not fit well in hardware due to the use of floating-point numbers and recursive functions. This publication describes...
Cache Side-Channel Attacks Through Electromagnetic Emanations of DRAM Accesses
Julien Maillard, Thomas Hiscock, Maxime Lecomte, Christophe Clavier
Attacks and cryptanalysis
Remote side-channel attacks on processors exploit hardware and micro-architectural effects observable from software measurements. So far, the analysis of micro-architectural leakages over physical side-channels (power consumption, electromagnetic field) received little treatment. In this paper, we argue that those attacks are a serious threat, especially against systems such as smartphones and Internet-of-Things (IoT) devices which are physically exposed to the end-user. Namely, we show that...
Automatic Verification of Cryptographic Block Function Implementations with Logical Equivalence Checking
Li-Chang Lai, Jiaxiang Liu, Xiaomu Shi, Ming-Hsien Tsai, Bow-Yaw Wang, Bo-Yin Yang
Implementation
Given a fixed-size block, cryptographic block functions gen-
erate outputs by a sequence of bitwise operations. Block functions are
widely used in the design of hash functions and stream ciphers. Their
correct implementations hence are crucial to computer security. We pro-
pose a method that leverages logic equivalence checking to verify assem-
bly implementations of cryptographic block functions. Logic equivalence
checking is a well-established technique from hardware verification....
XorSHAP: Privacy-Preserving Explainable AI for Decision Tree Models
Dimitar Jetchev, Marius Vuille
Applications
Explainable AI (XAI) refers to the development of AI systems and machine learning models in a way that humans can understand, interpret and trust the predictions, decisions and outputs of these models. A common approach to explainability is feature importance, that is, determining which input features of the model have the most significant impact on the model prediction. Two major techniques for computing feature importance are LIME (Local Interpretable Model-agnostic Explanations) and...
Dora: A Simple Approach to Zero-Knowledge for RAM Programs
Aarushi Goel, Mathias Hall-Andersen, Gabriel Kaptchuk
Cryptographic protocols
Existing protocols for proving the correct execution of a RAM program in zero-knowledge are plagued by a processor expressiveness tradeoff: supporting fewer instructions results in smaller processor circuits (which improves performance), but may result in more program execution steps because non-supported instruction must be emulated over multiple processor steps (diminishing performance).
We present Dora, a very simple and concretely efficient zero-knowledge protocol for RAM programs...
A masking method based on orthonormal spaces, protecting several bytes against both SCA and FIA with a reduced cost
Claude Carlet, Abderrahman Daif, Sylvain Guilley, C�dric Tavernier
Cryptographic protocols
In the attacker models of Side-Channel Attacks (SCA) and Fault Injection Attacks (FIA), the opponent has access to a noisy version of the internal behavior of the hardware. Since the end of the nineties, many works have shown that this type of attacks constitutes a serious threat to cryptosystems implemented in embedded devices. In the state-of-the-art, there exist several countermeasures to protect symmetric encryption (especially AES-128). Most of them protect only against one of these two...
Aloha-HE: A Low-Area Hardware Accelerator for Client-Side Operations in Homomorphic Encryption
Florian Krieger, Florian Hirner, Ahmet Can Mert, Sujoy Sinha Roy
Implementation
Homomorphic encryption (HE) has gained broad attention in recent years as it allows computations on encrypted data enabling secure cloud computing. Deploying HE presents a notable challenge since it introduces a performance overhead by orders of magnitude. Hence, most works target accelerating server-side operations on hardware platforms, while little attention has been given to client-side operations. In this paper, we present a novel design methodology to implement and accelerate the...
In aggregation queries, predicate parameters often reveal user intent. Protecting these parameters is critical for user privacy, regardless of whether the database is public or private. While most existing works focus on private data settings, we address a public data setting where the server has access to the database. Current solutions for this setting either require additional setups (e.g., noncolluding servers, hardware enclaves) or are inefficient for practical workloads. Furthermore,...
One of the main candidates of post-quantum cryptography is lattice-based cryptography. Its cryptographic security against quantum attackers is based on the worst-case hardness of lattice problems like the shortest vector problem (SVP), which asks to find the shortest non-zero vector in an integer lattice. Asymptotic quantum speedups for solving SVP are known and rely on Grover's search. However, to assess the security of lattice-based cryptography against these Grover-like quantum speedups,...
We improve the performance of lattice-based cryptosystems Dilithium on Cortex-M3 with expensive multiplications. Our contribution is two-fold: (i) We generalize Barrett multiplication and show that the resulting shape-independent modular multiplication performs comparably to long multiplication on some platforms without special hardware when precomputation is free. We call a modular multiplication “shape-independent” if its correctness and efficiency depend only on the magnitude of moduli...
An involution is a permutation that is the inverse of itself. Involutions have attracted plenty attentions in cryptographic community due to their advantage regarding hardware implementations. In this paper, we reconsider constructing {\it pseudorandom involutions}. We demonstrate two constructions. First, the 4-round Feistel network {\it using the same random function (Feistel-SF) in every round} is a pseudorandom involution. This shows the Feistel-SF construction still provides...
Masking schemes are key in thwarting side-channel attacks due to their robust theoretical foundation. Transitioning from Boolean to arithmetic (B2A) masking is a necessary step in various cryptography schemes, including hash functions, ARX-based ciphers, and lattice-based cryptography. While there exists a significant body of research focusing on B2A software implementations, studies pertaining to hardware implementations are quite limited, with the majority dedicated solely to creating...
Speed efficiency, memory optimization, and quantum resistance are essential for safeguarding the performance and security of cloud computing environments. Fully Homomorphic Encryption (FHE) addresses this need by enabling computations on encrypted data without requiring decryption, thereby maintaining data privacy. Additionally, lattice-based FHE is quantum secure, providing defense against potential quantum computer attacks. However, the performance of current FHE schemes remains...
Authenticated encryption (AE) is a cryptographic mechanism that allows communicating parties to protect the confidentiality and integrity of messages exchanged over a public channel, provided they share a secret key. In this work, we present new AE schemes leveraging the SHA-3 standard functions SHAKE128 and SHAKE256, offering 128 and 256 bits of security strength, respectively, and their “Turbo” counterparts. They support session-based communication, where a ciphertext authenticates the...
HEonGPU is a high-performance library designed to optimize Fully Homomorphic Encryption (FHE) operations on Graphics Processing Unit (GPU). By leveraging the parallel processing capac- ity of GPUs, HEonGPU significantly reduces the computational overhead typically associated with FHE by executing complex operation concurrently. This allows for faster execution of homomorphic computations on encrypted data, enabling real-time applications in privacy-preserving machine learn- ing and secure...
In scenarios where a seller holds sensitive data $x$, like employee / patient records or ecological data, and a buyer seeks to obtain an evaluation of specific function $f$ on this data, solutions in trustless digital environments like blockchain-based Web3 systems typically fall into two categories: (1) Smart contract-powered solutions and (2) cryptographic solutions leveraging tools such as adaptor signatures. The former approach offers atomic transactions where the buyer learns the...
Datasets of side-channel leakage measurements are widely used in research to develop and benchmarking side-channel attack and evaluation methodologies. Compared to using custom and/or one-off datasets, widely-used and publicly available datasets improve research reproducibility and comparability. Further, performing high-quality measurements requires specific equipment and skills, while also taking a significant amount of time. Therefore, using publicly available datasets lowers the barriers...
True Random Number Generators (TRNGs) and Physically Unclonable Functions (PUFs) are critical hardware primitives for cryptographic systems, providing randomness and device-specific security. TRNGs require complete randomness, while PUFs rely on consistent, device-unique responses. In this work, both primitives are implemented on a System-on-Chip Field-Programmable Gate Array (SoC FPGA), leveraging the integrated Phase-Locked Loops (PLLs) for robust entropy generation in PLLbased TRNGs. A...
The RISC-V Vector Cryptography Extensions (Zvk) were ratified in 2023 and integrated into the main ISA manuals in 2024. These extensions support high-speed symmetric cryptography (AES, SHA2, SM3, SM4) operating on the vector register file and offer significant performance improvements over scalar cryptography extensions (Zk) due to data parallelism. As a ratified extension, Zvk is supported by compiler toolchains and is already being integrated into popular cryptographic middleware such as...
We propose a wallet provider issued attestation called Wallet Trust Evidence (WTE) and three related specific instructions for the European Digital Identity (EUDI) Wallet cryptographic hardware, most notably the generation of a Proof of Association (PoA). These allow the EUDI Wallet providing verifiable assurance to third parties (issuers, relying parties) that attestation private keys are not only bound to conformant cryptographic hardware but also that they are bound to the same such...
Recently, the construction of cryptographic schemes based on hard lattice problems has gained immense popularity. Apart from being quantum resistant, lattice-based cryptography allows a wide range of variations in the underlying hard problem. As cryptographic schemes can work in different environments under different operational constraints such as memory footprint, silicon area, efficiency, power requirement, etc., such variations in the underlying hard problem are very useful for designers...
With the increasing spread of fake videos for misinformation, proving the provenance of an edited video (without revealing the original one) becomes critical. To this end, we introduce Eva, the first cryptographic protocol for authenticating lossy-encoded videos. Compared to previous cryptographic methods for image authentication, Eva supports significantly larger amounts of data that undergo complex transformations during encoding. We achieve this by decomposing repetitive and manageable...
AES implementation has been vastly analysed against side-channel attacks in the last two decades particularly targeting resource-constrained microcontrollers. Still, less research has been conducted on AES implementations on advanced hardware platforms. In this study, we examine the resilience of AES on an ARM Cortex A72 processor within the Raspberry Pi 4B model. Unlike their microcontroller counterparts, these platforms operate within the complex ecosystem of an operating system (OS),...
Secure elements are small microcontrollers whose main purpose is to generate/store secrets and then execute cryptographic operations. They undergo the highest level of security evaluations that exists (Common Criteria) and are often considered inviolable, even in the worst-case attack scenarios. Hence, complex secure systems build their security upon them. FIDO hardware tokens are strong authentication factors to sign in to applications (any web service supporting FIDO); they often embed...
A Trusted Execution Environment (TEE) is a new type of security technology, implemented by CPU manufacturers, which guarantees integrity and confidentiality on a restricted execution environment to any remote verifier. TEEs are deployed on various consumer and commercial hardwareplatforms, and have been widely adopted as a component in the design of cryptographic protocols both theoretical and practical. Within the provable security community, the use of TEEs as a setup assumption has...
This short note describes an update to the sca25519 library, an ECC implementation computing the X25519 key-exchange protocol on the Arm Cortex-M4 microcontroller. The sca25519 software came with extensive mitigations against various side-channel and fault attacks and was, to our best knowledge, the first to claim affordable protection against multiple classes of attacks that are motivated by distinct real-world application scenarios. This library is protected against various passive and...
Framed within the general context of cyber-security, standard cryptographic constructions often represent an enabling technology for associated solutions. Alongside or in combination with their design, therefore, the implementation of such constructions is an important challenge: beyond delivering artefacts that are usable in practice, implementation can impact many quality metrics (such as efficiency and security) which determine fitness-for-purpose. A rich design space of implementation...
Secure multi-party computation (MPC) enables multiple distrusting parties to jointly compute a function while keeping their inputs private. Computing the AES block cipher in MPC, where the key and/or the input are secret-shared among the parties is important for various applications, particularly threshold cryptography. In this work, we propose a family of dedicated, high-performance MPC protocols to compute the non-linear S-box part of AES in the honest majority setting. Our protocols...
Side-channel attacks (SCAs) remain a significant threat to the security of cryptographic systems in modern embedded devices. Even mathematically secure cryptographic algorithms, when implemented in hardware, inadvertently leak information through physical side-channel signatures such as power consumption, electromagnetic (EM) radiation, light emissions, and acoustic emanations. Exploiting these side channels significantly reduces the attacker’s search space. In recent years, physical...
This paper presents a survey of the state-of-the-art pre-silicon security verification techniques for System-on-Chip (SoC) designs, focusing on ensuring that designs, implemented in hardware description languages (HDLs) and synthesized circuits, meet security requirements before fabrication in semiconductor foundries. Due to several factors, pre-silicon security verification has become an essential yet challenging aspect of the SoC hardware lifecycle. The modern SoC design process often...
Masking has become a widely applied and heavily researched method to protect cryptographic implementations against SCA attacks. The success of masking is primarily attributed to its strong theoretical foundation enabling it to formally prove security by modeling physical properties through so-called probing models. Specifically, the robust $d$-probing model enables us to prove the security for arbitrarily masked hardware circuits, manually or with the assistance of automated tools, even when...
The Extended Greatest Common Divisor (XGCD) computation is a critical component in various cryptographic applications and algorithms, including both pre- and post-quantum cryptosystems. In addition to computing the greatest common divisor (GCD) of two integers, the XGCD also produces Bezout coefficients $b_a$ and $b_b$ which satisfy $\mathrm{GCD}(a,b) = a\times b_a + b\times b_b$. In particular, computing the XGCD for large integers is of significant interest. Most recently, XGCD computation...
Considering side-channel analysis (SCA) security for cryptographic devices, the mitigation of electromagnetic leakage and electromagnetic interference (EMI) between modules poses significant challenges. This paper presents a comprehensive review and deep analysis of the utilization of EMI shielding materials, devised for reliability purposes and standards such as EMI/EMC, as a countermeasure to enhance EM-SCA security. We survey the current landscape of EMI-shields materials, including...
This paper introduces the Koala PRF, which maps a variable-length sequence of $64$-bit input blocks to a single $257$-bit output block. Its design focuses on achieving low latency in its implementation in ASIC. To construct Koala, we instantiate the recently introduced Kirby construction with the Koala-P permutation and add an input encoding layer. The Koala-P permutation is obtained as the $8$-fold iteration of a simple round function inspired by that of Subterranean. Based on...
Making the most of TFHE programmable bootstrapping to evaluate functions or operators otherwise challenging to perform with only the native addition and multiplication of the scheme is a very active line of research. In this paper, we systematize this approach and apply it to build an 8-bit FHE processor abstraction, i.e., a software entity that works over FHE-encrypted 8-bits data and presents itself to the programmer by means of a conventional-looking assembly instruction set. In...
The rapid evolution of post-quantum cryptography, spurred by standardization efforts such as those led by NIST, has highlighted the prominence of lattice-based cryptography, notably exemplified by CRYSTALS-Kyber. However, concerns persist regarding the security of cryptographic implementations, particularly in the face of Side-Channel Attacks (SCA). The usage of operations like the Number Theoretic Transform (NTT) in CRYSTALS-Kyber introduces vulnerabilities to SCA, especially single-trace...
This paper presents extensions to the OpenTitan hardware root of trust that aim at enabling high-performance lattice-based cryptography. We start by carefully optimizing ML-KEM and ML-DSA - the two primary algorithms selected by NIST for standardization - in software targeting the OTBN accelerator. Based on profiling results of these implementations, we propose tightly integrated extensions to OTBN, specifically an interface from OTBN to OpenTitan's Keccak accelerator (KMAC core) and...
In this paper, using the concept of equivalence of mappings we characterize all of the one-XOR matrices which are used in hardware applications and propose a family of lightweight linear mappings for software-oriented applications in symmetric cryptography. Then, we investigate interleaved linear mappings and based upon this study, we present generalized dynamic primitive LFSRs along with dynamic linear components for construction of diffusion layers. From the mathematical...
Side-channel-analysis (SCA) resistance with cost optimization in AES hardware implementations remains a significant challenge. While traditional masking-based schemes offer provable security, they often incur substantial resource overheads (latency, area, randomness, performance, power consumption). Alternatively, the RAMBAM scheme introduced a redundancy-based approach to control the signal-to-noise ratio, and achieves exponential leakage reduction as redundancy increases. This method...
In this note, we introduce the MATTER Tweakable Block Cipher, designed principally for low latency in low-area hardware implementations, but that can also be implemented in an efficient and compact way in software. MATTER is a 512-bit wide balanced Feistel network with three to six rounds, using the ASCON permutation as the round function. The Feistel network defines a keyed, non-tweakable core, which is made tweakable by using the encryption of the tweak as its key. Key and tweak are...
Resource-constrained devices such as wireless sensors and Internet of Things (IoT) devices have become ubiquitous in our digital ecosystem. These devices generate and handle a major part of our digital data. In the face of the impending threat of quantum computers on our public-key infrastructure, it is impossible to imagine the security and privacy of our digital world without integrating post-quantum cryptography (PQC) into these devices. Usually, due to the resource constraints of these...
Oblivious Transfer (OT) is a fundamental cryptographic primitive, becoming a crucial component of a practical secure protocol. OT is typically implemented in software, and one way to accelerate its running time is by using hardware implementations. However, such implementations are vulnerable to side-channel attacks (SCAs). On the other hand, protecting interactive protocols against SCA is highly challenging because of their longer secrets (which include inputs and randomness), more...
This paper proposes a fast and efficient FPGA-based hardware-software co-design for the supersingular isogeny key encapsulation (SIKE) protocol controlled by a custom RISC-V processor. Firstly, we highly optimize the core unit, the polynomial-based field arithmetic logic unit (FALU), with the proposed fast convolution-like multiplier (FCM) to significantly reduce the resource consumption while still maintaining low latency and constant time for all the four SIKE parameters. Secondly, we pack...
This paper proposes Oryx, a system for efficiently detecting cycles in federated graphs where parts of the graph are held by different parties and are private. Cycle detection is an important building block in designing fraud detection algorithms that operate on confidential transaction data held by different financial institutions. Oryx allows detecting cycles of various length while keeping the topology of the graphs secret, and it does so efficiently; Oryx achieves quasilinear...
Oblivious Transfer (OT) is at the heart of secure computation and is a foundation for many applications in cryptography. Over two decades of work have led to extremely efficient protocols for evaluating OT instances in the preprocessing model, through a paradigm called OT extension. A few OT instances generated in an offline phase can be used to perform many OTs in an online phase efficiently, i.e., with very low communication and computational overheads. Specifically, traditional OT...
We reduce the number of bit operations required to implement AES to a new minimum, and also compute improvements to elements of some other ciphers. Exploring the algebra of AES allows choices of basis and streamlining of the nonlinear parts. We also compute a more efficient implementation of the linear part of each round. Similar computational optimizations apply to other cryptographic matrices and S-boxes. This work may be incorporated into a hardware AES implementation using minimal...
With the rise of generative AI technology, the media's credibility as a source of truth has been significantly compromised. This highlights the need to verify the authenticity of media and its originality. Ensuring the integrity of media during capture using the device itself presents a straightforward solution to this challenge. However, raw captured media often require certain refinements or redactions before publication. Zero-knowledge proofs (ZKP) offer a solution by allowing...
Garbled circuits (GC) are a secure multiparty computation protocol that enables two parties to jointly compute a function using their private data without revealing it to each other. While garbled circuits are proven secure at the protocol level, implementations can still be vulnerable to side-channel attacks. Recently, side-channel analysis of GC implementations has garnered significant interest from researchers. We investigate popular open-source GC frameworks and discover that the AES...
The RNS variant of the CKKS scheme (SAC 2018) is widely implemented due to its computational efficiency. However, the current optimized implementations of the RNS-CKKS scheme have a limitation when choosing the ciphertext modulus. It requires the scale factors to be approximately equal to a factor (or a product of factors) of the ciphertext modulus. This restriction causes inefficiency when the scale factor is not close to the power of the machine's word size, wasting the machine's...
In this work, we construct a second price (Vickrey) auction protocol (SPA), which does not require any auctioneers and ensures total privacy in the presence of rational parties participating in auction. In particular, the confidentiality of the highest bid and the identity of the second highest bidder are protected. We model the bidders participating in the second price auction as rational, computationally bounded and privacy-sensitive parties. These are self-interested agents who care about...
In this work, we present various hardware implementations for the lightweight cipher ASCON, which was recently selected as the winner of the NIST organized Lightweight Cryptography (LWC) competition. We cover encryption + tag generation and decryption + tag verification for the ASCON AEAD and also the ASCON hash function. On top of the usual (unprotected) implementation, we present side-channel protection (threshold countermeasure) and triplication/majority-based fault protection. To the...
Polynomials play a central role in cryptography. In the context of Zero Knowledge Proofs (ZKPs), protocols can be exclusively expressed using polynomials, making them a powerful abstraction tool, as demonstrated in most ZK research papers. Our first contribution is a high-level framework that enables practitioners to implement ZKPs in a more natural way, based solely on polynomial primitives. ZK provers are considered computationally intensive algorithms with a high degree of...
The Learning with Errors problem (LWE) and its variants are among the most popular assumptions underlying lattice-based cryptography. The Learning with Rounding problem (LWR) can be thought of as a deterministic variant of LWE. While lattice-based cryptography is known to enable many advanced constructions, constructing Fully Homomorphic Encryption schemes based on LWR remains an under-explored part of the literature. In this work, we present a thorough study of Somewhat Homomorphic...
Software based cryptographic implementations provide flexibility but they face performance limitations. In contrast, hardware based cryptographic accelerators utilize application-specific customization to provide real-time security solutions. Cryptographic instruction-set extensions (CISE) combine the advantages of both hardware and software based solutions to provide higher performance combined with the flexibility of atomic-level cryptographic operations. While CISE is widely used to...
We present a novel approach to small area and low-latency first-order masking in hardware. The core idea is to separate the processing of shares in time in order to achieve non-completeness. Resulting circuits are proven first-order glitch-extended PINI secure. This means the method can be straightforwardly applied to mask arbitrary functions without constraints which the designer must take care of. Furthermore we show that an implementation can benefit from optimization through EDA tools...
One of the main issues to deal with for fully homomorphic encryption is the noise growth when operating on ciphertexts. To some extent, this can be controlled thanks to a so-called gadget decomposition. A gadget decomposition typically relies on radix- or CRT-based representations to split elements as vectors of smaller chunks whose inner products with the corresponding gadget vector rebuilds (an approximation of) the original elements. Radix-based gadget decompositions present the advantage...
Masking is one of the most popular countermeasures to protect implementations against power and electromagnetic side channel attacks, because it offers provable security. Masking has been shown secure against d-threshold probing adversaries by Ishai et al. at CRYPTO'03, but this adversary's model doesn't consider any physical hardware defaults and thus such masking schemes were shown to be still vulnerable when implemented as hardware circuits. To addressed these limitations glitch-extended...
We present the first comprehensive approach for detecting and analyzing symmetric cryptographic primitives in gate-level descriptions of hardware. To capture both ASICs and FPGAs, we model the hardware as a directed graph, where gates become nodes and wires become edges. For modern chips, those graphs can easily consist of hundreds of thousands of nodes. More abstractly, we find subgraphs corresponding to cryptographic primitives in a potentially huge graph, the sea-of-gates, describing an...
Despite masking being a prevalent protection against passive side-channel attacks, implementing it securely in hardware remains a manual, challenging, and error-prone process. This paper introduces INDIANA, a comprehensive security verification tool for hardware masking. It provides a hardware verification framework, enabling a complete analysis of simulation-based security in the glitch-extended probing model, with cycle-accurate estimations for leakage probabilities in the random...
Shaped prime moduli are often considered for use in elliptic curve and isogeny-based cryptography to allow for faster modular reduction. Here we focus on the most common choices for shaped primes that have been suggested, that is pseudo-Mersenne, generalized Mersenne and Montgomery-friendly primes. We consider how best to to exploit these shapes for maximum efficiency, and provide an open source tool to automatically generate, test and time working high-level language finite-field code....
Byzantine Reliable Broadcast is one of the most popular communication primitives in distributed systems. Byzantine reliable broadcast ensures that processes agree to deliver a message from an initiator even if some processes (perhaps including the initiator) are Byzantine. In asynchronous settings it is known since the prominent work of Bracha [Bracha87] that Byzantine reliable broadcast can be implemented deterministically if $n \geq 3t+1$ where $t$ is an upper bound on the...
Threshold public key encryption schemes distribute secret keys among multiple parties, known as the committee, to reduce reliance on a single trusted entity. However, existing schemes face inefficiencies as the committee should perform computation and communication for decryption of each individual ciphertext. As the number of ciphertexts being decrypted per unit of time increases, this can limit the number of committee parties and their decentralization due to increased hardware...
Private Information Retrieval (PIR) permits clients to query entries from a public database hosted on untrusted servers in a privacy-preserving manner. Traditional PIR model suffers from high computation and/or bandwidth cost due to entire database processing for privacy. Recently, Online-Offline PIR (OO-PIR) has been suggested to improve the practicality of PIR, where query-independent materials are precomputed beforehand to accelerate online access. While state-of-the-art OO-PIR schemes...
Fault Injection (FI) attacks, which involve intentionally introducing faults into a system to cause it to behave in an unintended manner, are widely recognized and pose a significant threat to the security of cryptographic primitives implemented in hardware, making fault tolerance an increasingly critical concern. However, protecting cryptographic hardware primitives securely and efficiently, even with well-established and documented methods such as redundant computation, can be a...
We present a framework for privacy-preserving streaming algorithms which combine the memory-efficiency of streaming algorithms with strong privacy guarantees. These algorithms enable some number of servers to compute aggregate statistics efficiently on large quantities of user data without learning the user's inputs. While there exists limited prior work that fits within our model, our work is the first to formally define a general framework, interpret existing methods within this general...
This paper endeavors to securely implement a Physical Unclonable Function (PUF) for private data generation within Field-Programmable Gate Arrays (FPGAs). SRAM PUFs are commonly utilized due to their use of memory devices for generating secret data, particularly in resource constrained devices. However, their reliance on memory access poses side-channel threats such as data remanence decay and memory-based attacks, and the time required to generate secret data is significant. To address...
The elliptic curve-based Enhanced Privacy ID (EPID) signature scheme is broadly used for hardware enclave attestation by many platforms that implement Intel Software Guard Extensions (SGX) and other devices. This scheme has also been included in the Trusted Platform Module (TPM) specifications and ISO/IEC standards. However, it is insecure against quantum attackers. While research into quantum-resistant EPID has resulted in several lattice-based schemes, Boneh et al. have initiated the study...
Zero-knowledge proof systems are widely used in different applications on the Internet. Among zero-knowledge proof systems, SNARKs are a popular choice because of their fast verification time and small proof size. The efficiency of zero-knowledge systems is crucial for usability, resulting in the development of so-called arithmetization-oriented ciphers. In this work, we introduce Vision Mark-32, a modified instance of Vision defined over binary tower fields, with an optimized number of...
The Number Theoretic Transform (NTT) is a powerful mathematical tool that has become increasingly important in developing Post Quantum Cryptography (PQC) and Homomorphic Encryption (HE). Its ability to efficiently calculate polynomial multiplication using the convolution theorem with a quasi-linear complexity $O(n \log{n})$ instead of $O(n^2)$ when implemented with Fast Fourier Transform-style algorithms has made it a key component in modern cryptography. FFT-style NTT algorithm or fast-NTT...
MRAE security is an important goal for many AEAD applications where the nonce uniqueness cannot be maintained and security risks are significant. However, MRAE schemes can be quite expensive. Two of the SoTA MRAE-secure schemes; Deoxys-II and AES-GCM-SIV rely on internal parallelism and special instructions to achieve competitive performance. However, they both suffer from the same bottleneck, they have at least one call to the underlying primitive that cannot be parallelized to any other...
This paper introduces a high-performance and scalable hardware architecture designed for the Number-Theoretic Transform (NTT), a fundamental component extensively utilized in lattice-based encryption and fully homomorphic encryption schemes. The underlying rationale behind this research is to harness the advantages of the hypercube topology. This topology serves to significantly diminish the volume of data exchanges required during each iteration of the NTT, reducing it to a complexity of...
We explore Zero-Knowledge proofs (ZKP) of statements expressed as programs written in high-level languages, e.g., C or assembly. At the core of executing such programs in ZK is the repeated evaluation of a CPU step, achieved by branching over the CPU’s instruction set. This approach is general and covers traversal-execution of a program’s control flow graph (CFG): here CPU instructions are straight-line program fragments (of various sizes) associated with the CFG nodes. This highlights the...
In the rapidly evolving fields of encryption and blockchain technologies, the efficiency and security of cryptographic schemes significantly impact performance. This paper introduces a comprehensive framework for continuous benchmarking in one of the most popular cryptography Rust libraries, fastcrypto. What makes our analysis unique is the realization that automated benchmarking is not just a performance monitor and optimization tool, but it can be used for cryptanalysis and innovation...
The threshold implementation technique has been proposed in 2006 by Nikova et al. as a countermeasure to mitigate cryptographic side-channel attacks on hardware implementations when the effect of glitches is taken into account. This technique is based on Boolean sharing (also called masking) and it was developed for securing symmetric ciphers such as AES. In 2023, Piccione et al. proposed a general construction of threshold implementations that is universal for S-boxes that are bijective...
A recent work from Eurocrypt 2023 suggests that prime-field masking has excellent potential to improve the efficiency vs. security tradeoff of masked implementations against side-channel attacks, especially in contexts where physical leakages show low noise. We pick up on the main open challenge that this seed result leads to, namely the design of an optimized prime cipher able to take advantage of this potential. Given the interest of tweakable block ciphers with cheap inverses in many...
Hardening microprocessors against side-channel attacks is a critical aspect of ensuring their security. A key step in this process is identifying and mitigating “leaky" hardware modules, which leak information during the execution of cryptographic algorithms. In this paper, we explore how different leakage detection methods, the Side-channel Vulnerability Factor (SVF) and the Test Vector Leakage Assessment (TVLA), contribute to hardening of microprocessors. We conduct experiments on two...
We report on efficient and secure hardware implementation techniques for the FIPS 205 SLH-DSA Hash-Based Signature Standard. We demonstrate that very significant overall performance gains can be obtained from hardware that optimizes the padding formats and iterative hashing processes specific to SLH-DSA. A prototype implementation, SLotH, contains Keccak/SHAKE, SHA2-256, and SHA2-512 cores and supports all 12 parameter sets of SLH-DSA. SLotH also supports side-channel secure PRF computation...
Physical security is an important aspect of devices for which an adversary can manipulate the physical execution environment. Recently, more and more attention has been directed towards a security model that combines the capabilities of passive and active physical attacks, i.e., an adversary that performs fault-injection and side-channel analysis at the same time. Implementing countermeasures against such a powerful adversary is not only costly but also requires the skillful combination of...
A leakage-resilient circuit for $f:\{0,1\}^n\to\{0,1\}^m$ is a randomized Boolean circuit $C$ mapping a randomized encoding of an input $x$ to an encoding of $y=f(x)$, such that applying any leakage function $L\in \cal L$ to the wires of $C$ reveals essentially nothing about $x$. A leakage-tolerant circuit achieves the stronger guarantee that even when $x$ and $y$ are not protected by any encoding, the output of $L$ can be simulated by applying some $L'\in \cal L$ to $x$ and $y$ alone....
Recognizing the importance of a fast and resource-efficient polynomial multiplication in homomorphic encryption, in this paper, we design a multiplier-less number theoretic transform using a Fermat number as an auxiliary modulus. To make this algorithm scalable with the degree of polynomial, we apply a univariate to multivariate polynomial ring transformation. We develop an accelerator architecture for fully homomorphic encryption using these algorithmic techniques for efficient...
Fully Homomorphic Encryption (FHE) enables computation on encrypted data, holding immense potential for enhancing data privacy and security in various applications. Presently, FHE adoption is hindered by slow computation times, caused by data being encrypted into large polynomials. Optimized FHE libraries and hardware acceleration are emerging to tackle this performance bottleneck. Often, these libraries implement the Number Theoretic Transform (NTT) algorithm for efficient polynomial...
Threshold symmetric encryption (TSE), introduced by Agrawal et al. [DiSE, CCS 2018], provides scalable and decentralized solution for symmetric encryption by ensuring that the secret-key stays distributed at all times. They avoid having a single point of attack or failure, while achieving the necessary security requirements. TSE was further improved by Christodorescu et al. [ATSE, CCS 2021] to support an amortization feature which enables a “more privileged” client to encrypt records in bulk...
Fault injection attacks are a serious concern for cryptographic hardware. Adversaries may extract sensitive information from the faulty output that is produced by a cryptographic circuit after actively disturbing its computation. Alternatively, the information whether an output would have been faulty, even if it is withheld from being released, may be exploited. The former class of attacks, which requires the collection of faulty outputs, such as Differential Fault Analysis (DFA), then...
While formal constructions for cryptographic schemes have steadily evolved and emerged over the past decades, the design and implementation of efficient and secure hardware instances is still a mostly manual, tedious, and intuition-driven process. With the increasing complexity of modern cryptography, e.g., Post-Quantum Cryptography (PQC) schemes, and consideration of physical implementation attacks, e.g., Side-Channel Analysis (SCA), the design space often grows exorbitantly without...
Previous studies on deep-learning-based side-channel attacks (DL-SCAs) have shown that traditional performance evaluation metrics commonly used in DL, like accuracy and F1 score, are not effective in evaluating DL-SCA performance. Therefore, some previous studies have proposed new alternative metrics for evaluating the performance of DL-SCAs. Notably, perceived information (PI) and effective perceived information (EPI) are major metrics based on information theory. While it has been...
The conversion between arithmetic and Boolean masking representations (A2B \& B2A) is a crucial component for side-channel resistant implementations of lattice-based (post-quantum) cryptography. In this paper, we first propose novel $d$-order algorithms for the secure addition (SecADDChain$_q$) and B2A (B2X2A). Our secure adder is well-suited for repeated ('chained') executions, achieved through an improved method for repeated masked modular reduction. The optimized B2X2A gadget removes a...
Post-quantum digital signature schemes have recently received increased attention due to the NIST standardization project for additional signatures. MPC-in-the-Head and VOLE-in-the-Head are general techniques for constructing such signatures from zero-knowledge proof systems. A common theme between the two is an all-but-one vector commitment scheme which internally uses GGM trees. This primitive is responsible for a significant part of the computational time during signing and...
Software obfuscation is a powerful tool to protect the intellectual property or secret keys inside programs. Strong software obfuscation is crucial in the context of untrusted execution environments (e.g., subject to malware infection) or to face potentially malicious users trying to reverse-engineer a sensitive program. Unfortunately, the state-of-the-art of pure software-based obfuscation (including white-box cryptography) is either insecure or infeasible in practice. This work...
This work presents the first hardware realisation of the Syndrome-Decoding-in-the-Head (SDitH) signature scheme, which is a candidate in the NIST PQC process for standardising post-quantum secure digital signature schemes. SDitH's hardness is based on conservative code-based assumptions, and it uses the Multi-Party-Computation-in-the-Head (MPCitH) construction. This is the first hardware design of a code-based signature scheme based on traditional decoding problems and only the second for...
The selection of a Lightweight Cryptography (LWC) algorithm is crucial for resource limited applications. The National Institute of Standards and Technology (NIST) leads this process, which involves a thorough evaluation of the algorithms’ cryptanalytic strength. Furthermore, careful consideration is given to factors such as algorithm latency, code size, and hardware implementation area. These factors are critical in determining the overall performance of cryptographic solutions at edge...
Masking, an effective countermeasure against side-channel attacks, is commonly applied in modern cryptographic implementations. Considering cryptographic algorithms that utilize both Boolean and arithmetic masking, the conversion algorithm between arithmetic masking and Boolean masking is required. Conventional high-order arithmetic masking to Boolean masking conversion algorithms based on Boolean circuits suffer from performance overhead, especially in terms of hardware implementation. In...
Maiorana--McFarland type constructions are basically concatenating the truth tables of linear functions on a smaller number of variables to obtain highly nonlinear ones on larger inputs. Such functions and their different variants have significant cryptology and coding theory applications. The straightforward hardware implementation of such functions using decoders (Khairallah et al., WAIFI 2018; Tang et al., SIAM Journal on Discrete Mathematics, 2019) requires exponential resources on the...
In the wake of recent progress on quantum computing hardware, the National Institute of Standards and Technology (NIST) is standardizing cryptographic protocols that are resistant to attacks by quantum adversaries. The primary digital signature scheme that NIST has chosen is CRYSTALS-Dilithium. The hardness of this scheme is based on the hardness of three computational problems: Module Learning with Errors (MLWE), Module Short Integer Solution (MSIS), and SelfTargetMSIS. MLWE and MSIS have...
We optimize the number-theoretic transforms (NTTs) in Dilithium — a digital signature scheme recently standardized by the National Institute of Standards and Technology (NIST) — on Cortex-M3 and 8-bit AVR. The core novelty is the exploration of micro-architectural insights for modular multiplications. Recent work [Becker, Hwang, Kannwischer, Yang and Yang, Volume 2022 (1), Transactions on Cryptographic Hardware and Embedded Systems, 2022] found a correspondence between Montgomery and Barrett...
A file system provides secure deletion if, after a file is deleted, an attacker with physical possession of the storage device cannot recover any data from the deleted file. Unfortunately, secure deletion is not provided by commodity file systems. Even file systems which explicitly desire to provide secure deletion are challenged by the subtleties of hardware controllers on modern storage devices; those controllers obscure the mappings between logical blocks and physical blocks, silently...
In this work, we introduce FANNG-MPC, a versatile secure multi-party computation framework capable to offer active security for privacy preserving machine learning as a service (MLaaS). Derived from the now deprecated SCALE-MAMBA, FANNG is a data-oriented fork, featuring novel set of libraries and instructions for realizing private neural networks, effectively reviving the popular framework. To the best of our knowledge, FANNG is the first MPC framework to offer actively secure MLaaS in the...
In this work, we present the first low-latency, second-order masked hardware implementation of Ascon that requires no fresh randomness using only $d+1$ shares. Our results significantly outperform any publicly known second-order masked implementations of AES and Ascon in terms of combined area, latency and randomness requirements. Ascon is a family of lightweight authenticated encryption and hashing schemes selected by NIST for standardization. Ascon is tailored for small form factors. It...
In this report, we perform an in-depth analysis of the RSA authentication feature used in the secure boot procedure of Xilinx Zynq-7000 SoC device. The First Stage Boot Loader (FSBL) is a critical piece of software executed during secure boot, which utilizes the RSA authentication feature to validate all the hardware and software partitions to be mounted on the device. We analyzed the implementation of FSBL (provided by Xilinx) for the Zynq-7000 SoC and identified a critical security flaw,...
During the standardisation process of post-quantum cryptography, NIST encourages research on side-channel analysis for candidate schemes. As the recommended lattice signature scheme, CRYSTALS-Dilithium, when implemented on hardware, has only been subjected to the side-channel attack presented by Steffen et al. in IACR ePrint 2022. This attack is not complete and requires excessive traces. Therefore, we investigate the leakage of an FPGA (Kintex7) implementation of CRYSTALS-Dilithium using...
Hiding countermeasures are the most widely utilized techniques for thwarting side-channel attacks, and their significance has been further emphasized with the advent of Post Quantum Cryptography (PQC) algorithms, owing to the extensive use of vector operations. Commonly, the Fisher-Yates algorithm is adopted in hiding countermeasures with permuted operation for its security and efficiency in implementation, yet the inherently sequential nature of the algorithm imposes limitations on hardware...
Falcon is one out of three post-quantum signature schemes which have been selected for standardization by NIST in July 2022. To the best of our knowledge, Falcon is the only selected algorithm that does not yet have a publicly reported hardware description that performs signing or key generation. The reason might be that the Falcon signature and key generation algorithms do not fit well in hardware due to the use of floating-point numbers and recursive functions. This publication describes...
Remote side-channel attacks on processors exploit hardware and micro-architectural effects observable from software measurements. So far, the analysis of micro-architectural leakages over physical side-channels (power consumption, electromagnetic field) received little treatment. In this paper, we argue that those attacks are a serious threat, especially against systems such as smartphones and Internet-of-Things (IoT) devices which are physically exposed to the end-user. Namely, we show that...
Given a fixed-size block, cryptographic block functions gen- erate outputs by a sequence of bitwise operations. Block functions are widely used in the design of hash functions and stream ciphers. Their correct implementations hence are crucial to computer security. We pro- pose a method that leverages logic equivalence checking to verify assem- bly implementations of cryptographic block functions. Logic equivalence checking is a well-established technique from hardware verification....
Explainable AI (XAI) refers to the development of AI systems and machine learning models in a way that humans can understand, interpret and trust the predictions, decisions and outputs of these models. A common approach to explainability is feature importance, that is, determining which input features of the model have the most significant impact on the model prediction. Two major techniques for computing feature importance are LIME (Local Interpretable Model-agnostic Explanations) and...
Existing protocols for proving the correct execution of a RAM program in zero-knowledge are plagued by a processor expressiveness tradeoff: supporting fewer instructions results in smaller processor circuits (which improves performance), but may result in more program execution steps because non-supported instruction must be emulated over multiple processor steps (diminishing performance). We present Dora, a very simple and concretely efficient zero-knowledge protocol for RAM programs...
In the attacker models of Side-Channel Attacks (SCA) and Fault Injection Attacks (FIA), the opponent has access to a noisy version of the internal behavior of the hardware. Since the end of the nineties, many works have shown that this type of attacks constitutes a serious threat to cryptosystems implemented in embedded devices. In the state-of-the-art, there exist several countermeasures to protect symmetric encryption (especially AES-128). Most of them protect only against one of these two...
Homomorphic encryption (HE) has gained broad attention in recent years as it allows computations on encrypted data enabling secure cloud computing. Deploying HE presents a notable challenge since it introduces a performance overhead by orders of magnitude. Hence, most works target accelerating server-side operations on hardware platforms, while little attention has been given to client-side operations. In this paper, we present a novel design methodology to implement and accelerate the...