chili-chips-ba / wireguard-fpga
- Wednesday, 15 October 2025, 00:00:04
Full-throttle, wire-speed hardware implementation of Wireguard VPN, using low-cost Artix7 FPGA with opensource toolchain. If you seek security and privacy, nothing is private in our codebase. Our door is wide open for backdoor scrutiny, be it related to RTL, embedded, build, bitstream or any other aspect of design and delivery package. Bujrum!
Virtual Private Networks (VPNs) are the central and indispensable component of Internet security. They comprise a set of technologies that connect geographically dispersed, heterogeneous networks through encrypted tunnels, creating the impression of a homogenous private network on the public shared physical medium.
With traditional solutions (such as OpenVPN / IPSec) starting to run out of steam, Wireguard is increasingly coming to the forefront as a modern, secure data tunneling and encryption method, one that's also easier to manage than the incumbents. Both software and hardware implementations of Wireguard already exist. However, software performance falls far short of wire speed, while existing hardware approaches are both prohibitively expensive and based on proprietary, closed-source IP blocks and tools.
We have contributed to the Blackwire project, which is a 100Gbps hardware implementation of Wireguard switch based on AMD/Xilinx-proprietary AlveoU50 PC-accelerator card (SmartNIC form-factor), and implementable only with proprietary Vivado toolchain.
While working on Blackwire, we touched multiple sections and focused on the novel algorithm for Balanced Binary Tree Search of IP tables. However, the Blackwire hardware platform is expensive and priced out of reach of most educational institutions. Its gateware is written in SpinalHDL, a nice and powerful but niche HDL, which has not taken root in the industry. While Blackwire is now released as open source, that decision came from the financial hardship of the company behind it -- it was originally meant for sale. Moreover, that company is subject to disputes and obligations that bring into question the legality of ownership over the codebase they "donated" to the open source community.
To make the hardware Wireguard truly accessible in the genuine spirit of open-source movement, this project implements it:
[Ref1] Wireguard implementations in software:
[Ref2] 100Gbps Blackwire Wireguard
[Ref3] Corundum, open-source FPGA-NIC platform
[Ref4] ChaCha20-Poly1305 open-source Crypto RTL
[Ref5] Cookie Cutter SOC
[Ref6] RISC-V ISS
[Ref7] 10Gbps Ethernet Switch
[Ref8] OpenXC7 open-source tools for Xilinx Series7
[Ref9] Alex's Ethernet Stack
[Ref10] Amina's ADASEC-SDN
Phase1 (This!) is primarily a Proof of Concept, i.e. not full-featured, and definitely not a deployable product. It is envisioned as a mere on-ramp, a springboard for future build-up and optimizations.
A Phase2 continuation project is therefore also planned, to maximize efficiency and overall usability, such as by increasing the number of channels, facilitating management with GUI apps, or something else as identified by community feedback.
HW/SW partitioning, interface, interactions and workload distribution
HW/SW co-development, integration and debugging
Real-life, at-speed testing
Extent of open-source tools support for SystemVerilog and all needed FPGA primitives and IP functions
QOR of the (still maturing) open-source tools
Financial resources
This project is WIP at the moment. The checkmarks below indicate our status. Until all checkmarks are in place, anything you get from here comes without guarantee -- use at your own risk, as you see fit, and don't blame us if it is not working 🌤️
Board bring up. In-depth review of Wireguard ecosystem and prior art. Design Blueprint
While the board we're using is low cost, it is also not particularly well known in the open-source community. We certainly don’t have prior experience with it. In this opening take we will build a solid foundation for efficient project execution. Good preparation is crucial for a smooth run. We thus seek to first understand and document what we will be designing: SOC Architecture, Datapath Microarchitecture, Hardware/Software Partitioning, DV and Validation Strategy.
Getting a good feel for our Fmax is also a goal of this take. Artix-7 does not support High-Performance (HP) I/O. Consequently, we cannot push its I/O beyond 600MHz, nor its core logic beyond 100 MHz.
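To put the core-clock ceiling in perspective, the sketch below works out the narrowest datapath that can still sustain a given line rate. Only the 100 MHz core clock comes from the text above. The 1 Gbps line rate, the function name and the power-of-two rounding are assumptions made for illustration, not figures or code from this project.

```python
def min_bus_width_bits(line_rate_bps: int, core_clock_hz: int) -> int:
    """Smallest power-of-two bus width (at least 8 bits) that can carry
    line_rate_bps when one bus word is transferred per core clock cycle."""
    # Ceiling division with integers avoids floating-point rounding surprises.
    bits_per_cycle = -(-line_rate_bps // core_clock_hz)
    width = 8
    while width < bits_per_cycle:
        width *= 2
    return width


CORE_CLOCK_HZ = 100_000_000            # ceiling quoted in the text above
ASSUMED_LINE_RATE_BPS = 1_000_000_000  # assumption: one 1 Gbps Ethernet port

if __name__ == "__main__":
    width = min_bus_width_bits(ASSUMED_LINE_RATE_BPS, CORE_CLOCK_HZ)
    print(f"minimum datapath width: {width} bits")
```

Under these assumptions the datapath has to move at least 10 bits per cycle, so a 16-bit or wider bus is needed. Protocol overhead and back-pressure would push the practical width higher.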
Familiarization with HW platform
Familiarization with SW platform
Detailed analysis and comparisons of:
Identification and assimilation of prior art and building IP blocks, in particular Corundum [Ref3] and, to a lesser extent, 10GE Switch [Ref7]
Architecture/uArch Design. HW/SW Partitioning. Verification Plan
Creation of sufficient initial documentation for project divide-and-conquer across a multi-disciplinary team of half a dozen developers
Implementation of a basic, statically pre-configured Wireguard link
It is in this take that we start creating the hardware Datapath and hardening Wireguard encryption protocols, all using Vivado and Xilinx primitives.
Integration of collected RTL blocks into a coherent HW system that implements the basic Wireguard datapath for a handful of manually pre-configured channels.
Corundum FPGA-based NIC and platform for opensource Ethernet development [Ref3]
IP Core for ChaCha20-Poly1305 [Ref4] -- Definitely in hardware from the get-go
Curve25519 module for key exchange -- Likely in software at this point
BLAKE2 module for hashing -- Most likely in software
Timing closure. Resolution of FPGA device utilization and routing congestion issues
Creation of cocoTB DV in the CI/CD environment, and representative test cases for datapath simulation
Development and integration of embedded management software (Control Plane)
This work package is about hardware/software codesign and integration. The firmware will run on a soft RISC-V processor inside the FPGA. At this point our vanilla SOC starts to be customized to Wireguard needs. This work can to some extent proceed in parallel with the hardware activities of Take2.
SW design for on-chip processor (Part 1)
SW design for on-chip processor (Part 2)
HW/SW Integration
VPN Tunnel: Session initialization, maintenance, and secure closure
This is about managing the bring-up, maintenance and tear-down of VPN tunnels between two devices.
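The tunnel lifecycle described above can be pictured as a small state machine. The sketch below is a simplified model, not this project's firmware: the state names and the `TunnelSession` class are hypothetical, and only the two timer values are taken from the WireGuard protocol whitepaper (`REKEY_AFTER_TIME` and `REJECT_AFTER_TIME`). Real WireGuard keeps the old session usable while a rekey handshake is in flight, which this model leaves out.

```python
from dataclasses import dataclass
from enum import Enum, auto

REKEY_AFTER_TIME = 120.0   # seconds: a new handshake should be started
REJECT_AFTER_TIME = 180.0  # seconds: session keys must no longer be used


class TunnelState(Enum):   # hypothetical state names
    IDLE = auto()
    HANDSHAKE_SENT = auto()
    ESTABLISHED = auto()
    REKEY_DUE = auto()
    EXPIRED = auto()


@dataclass
class TunnelSession:
    state: TunnelState = TunnelState.IDLE
    established_at: float = 0.0

    def start_handshake(self) -> None:
        """Bring-up: allowed whenever no usable handshake is in flight."""
        if self.state is TunnelState.HANDSHAKE_SENT:
            raise RuntimeError("handshake already in progress")
        self.state = TunnelState.HANDSHAKE_SENT

    def handshake_complete(self, now: float) -> None:
        """Peer answered: session keys are fresh as of `now`."""
        if self.state is not TunnelState.HANDSHAKE_SENT:
            raise RuntimeError("no handshake in progress")
        self.state = TunnelState.ESTABLISHED
        self.established_at = now

    def tick(self, now: float) -> TunnelState:
        """Maintenance: age the session and report its current state."""
        if self.state in (TunnelState.ESTABLISHED, TunnelState.REKEY_DUE):
            age = now - self.established_at
            if age >= REJECT_AFTER_TIME:
                self.state = TunnelState.EXPIRED
            elif age >= REKEY_AFTER_TIME:
                self.state = TunnelState.REKEY_DUE
        return self.state

    def close(self) -> None:
        """Tear-down: forget the session (real code would also wipe keys)."""
        self.state = TunnelState.IDLE
        self.established_at = 0.0
```

A control-plane loop would call `tick()` periodically and start a new handshake once the state reaches `REKEY_DUE`.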
Testing, Profiling and Porting to OpenXC7
Functional testing on the real system. Does it work as intended? Bug fixes
Performance testing. HW/SW profiling, updates and enhancements to ensure the design indeed operates at close to the wire speed on all preconfigured channels
Porting to openXC7 [Ref8] using SV2V, in the GoCD CI/CD setting
Timing closure with openXC7
Filing bug tickets with open source developers for issues found in their tools, supporting them all the way to the resolution
Creation and maintenance of an attractive and well-documented Github repo, to entice community interest
Ongoing documentation updates and CI/CD script maintenance to keep it valid in the light of inevitable design mutations compared to the original Design Blueprint.
Flow control module for efficient and stable VPN tunnel data management
The objective of this optional deliverable is to ensure stable and efficient links, thus taking this project one step closer to a deployable product.
Since the WireGuard node essentially functions as an IP router with WireGuard protocol support, we have decided to design the system according to a two-layer architecture: a control plane responsible for managing IP routing processes and executing the WireGuard protocol (managing remote peers, sessions, and keys), and a data plane that will perform IP routing and cryptography processes at wire speed. The control plane will be implemented as software running on a soft CPU, while the data plane will be fully implemented in RTL on an FPGA.
In the HW/SW partitioning diagram, we can observe two types of network traffic: control traffic, which originates from the control plane and goes toward the external network (and vice versa), and data traffic, which arrives from the external network and, after processing in the data plane, returns to the external network. Specifically, control traffic represents WireGuard protocol handshake messages, while data traffic consists of end-user traffic, either encrypted or in plaintext, depending on the perspective.
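The control/data split can be illustrated with the WireGuard message header, whose first byte carries the message type and is followed by three reserved zero bytes. The following is a minimal sketch of such a classifier, assuming the caller has already extracted the UDP payload. The function name and the returned strings are hypothetical, and the project's RTL does not necessarily make the decision this way.

```python
from enum import IntEnum


class WgMessageType(IntEnum):
    """Message type values defined by the WireGuard protocol."""
    HANDSHAKE_INITIATION = 1
    HANDSHAKE_RESPONSE = 2
    COOKIE_REPLY = 3
    TRANSPORT_DATA = 4


def classify_wireguard_payload(udp_payload: bytes) -> str:
    """Return 'control', 'data' or 'invalid' for a WireGuard UDP payload."""
    if len(udp_payload) < 4:
        return "invalid"
    # Byte 0 is the type; bytes 1..3 are reserved and must be zero.
    if udp_payload[1:4] != b"\x00\x00\x00":
        return "invalid"
    try:
        kind = WgMessageType(udp_payload[0])
    except ValueError:
        return "invalid"
    # Handshake and cookie messages go to the control plane (soft CPU);
    # transport data stays in the data plane.
    return "data" if kind is WgMessageType.TRANSPORT_DATA else "control"
```

For example, a payload beginning with `01 00 00 00` would be steered toward the control plane, and one beginning with `04 00 00 00` would stay in the data plane.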
The hardware architecture essentially follows the HW/SW partitioning and consists of two domains: a soft CPU for the control plane and RTL for the data plane.
The soft CPU is equipped with a Boot ROM and a DDR3 SDRAM controller for interfacing with off-chip memory. External memory is exclusively used for control plane processes and does not store packets. The connection between the control and data planes is established through a CSR-based HAL.
The data plane consists of several IP cores, including data plane engine (DPE) and supporting components, which are listed and explained in the direction of network traffic propagation:
The ChaCha20-Poly1305 Encryptor/Decryptor use RFC 7539's AEAD (Authenticated Encryption with Associated Data) construction, based on ChaCha20 for symmetric encryption and Poly1305 for authentication.
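At the heart of ChaCha20 is the quarter round, a short sequence of 32-bit additions, XORs and rotations that maps well onto FPGA logic. The sketch below is a plain software model of that single step as defined in RFC 7539, section 2.1, and is checked against the RFC's own test vector. It is not the project's RTL or the full cipher, and the helper names are chosen here for illustration.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value: int, count: int) -> int:
    """Rotate a 32-bit value left by `count` bits."""
    return ((value << count) & MASK32) | (value >> (32 - count))


def quarter_round(a: int, b: int, c: int, d: int) -> tuple[int, int, int, int]:
    """ChaCha quarter round on four 32-bit words (RFC 7539, section 2.1)."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


if __name__ == "__main__":
    # Test vector from RFC 7539, section 2.1.1
    result = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
    print([hex(word) for word in result])
    # -> ['0xea2a92f4', '0xcb1cf8ce', '0x4581472e', '0x5881c4bb']
```

A full ChaCha20 block applies this step to the columns and diagonals of a 4x4 word state over 20 rounds.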
The details of hardware architecture can be found in the README.md in the 1.hw/ directory.
The conceptual class diagram provides an overview of the components in the software part of the system without delving into implementation details. The focus is on the WireGuard Agent, which implements the protocol's handshake procedures, along with the following supplementary components:
The details of software architecture can be found in the README.md in the 2.sw/ directory.
To illustrate the operation of the system as a whole, we have prepared a step-by-step analysis of packets processing based on the capture of real WireGuard traffic. The experimental topology consists of four nodes:
The detailed analysis can be found in the README.md in the 1.hw/ directory.
The Wireguard FPGA test bench aims to have a flexible approach to simulation which allows a common test environment to be used whilst selecting between alternative CPU components, one of which uses the VProc virtual processor co-simulation element. This allows simulations to be fully HDL, with a RISC-V processor RTL implementation such as picoRV32, IBEX or EDUBOS5, or to co-simulate software using the virtual processor, with a significant speed up in simulation times. The test bench has the following features:
soc_cpu.VPROC component
bfm_ethernet block
The figure below shows an overview block diagram of the test bench HDL.
More details on the architecture and usage of the Wireguard test bench can be found in the README.md in the 4.sim directory.
The Wireguard control and status register (CSR) hardware abstraction layer (HAL) software is auto-generated, as is the CSR RTL, using peakrdl. For co-simulation purposes an additional layer is auto-generated from the same SystemRDL specification using systemrdl-compiler that accompanies the peakrdl tools. This produces two header files that define a common API to the application layer for both the RISC-V platform and the VProc based co-simulation verification environment. The details of the HAL generation can be found in the README.md in the 3.build/ directory.
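The idea of one application-level API over two access back ends can be modelled in a few lines. The real HAL consists of generated header files, so the Python below is only a conceptual sketch. Every name in it (`RegisterBus`, `MemoryMappedBus`, `CoSimBus`, `CTRL_ADDR`, `CTRL_ENABLE_MASK`, `enable_datapath`) is hypothetical and does not come from the generated code or from the project's register map.

```python
from typing import Protocol


class RegisterBus(Protocol):
    """Access interface the application layer is written against."""
    def read32(self, addr: int) -> int: ...
    def write32(self, addr: int, value: int) -> None: ...


class MemoryMappedBus:
    """Stand-in for the on-target case: a bytearray models the CSR space."""
    def __init__(self, size: int) -> None:
        self._mem = bytearray(size)

    def read32(self, addr: int) -> int:
        return int.from_bytes(self._mem[addr:addr + 4], "little")

    def write32(self, addr: int, value: int) -> None:
        self._mem[addr:addr + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")


class CoSimBus:
    """Stand-in for co-simulation: records each bus transaction it would
    forward to the simulator."""
    def __init__(self) -> None:
        self._regs: dict[int, int] = {}
        self.log: list[tuple[str, int, int]] = []

    def read32(self, addr: int) -> int:
        value = self._regs.get(addr, 0)
        self.log.append(("read", addr, value))
        return value

    def write32(self, addr: int, value: int) -> None:
        self._regs[addr] = value & 0xFFFFFFFF
        self.log.append(("write", addr, value & 0xFFFFFFFF))


CTRL_ADDR = 0x0         # hypothetical register offset
CTRL_ENABLE_MASK = 0x1  # hypothetical single-bit field


def enable_datapath(bus: RegisterBus) -> None:
    """Application code: identical for both back ends."""
    bus.write32(CTRL_ADDR, bus.read32(CTRL_ADDR) | CTRL_ENABLE_MASK)
```

Because `enable_datapath` only depends on the two access methods, the same application code can be exercised first in co-simulation and then on the target without changes. A shared API across both environments is meant to provide exactly this property.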
TODO
WIP
TODO
TODO
TODO
TODO
We are grateful to NLnet Foundation for their sponsorship of this development activity.
The wyvernSemi's wisdom and contribution made a great deal of difference -- Thank you, we are honored to have you on the project.