I recently heard about Neuromorphic Processing Units, or NPUs, while discussing Neuralink. Innatera Pulsar IC Those NPUs have nothing to do with Quantum computing, but I thought it would be interesting to spend a bit of time getting inspired with a new processing architecture. And as a matter of fact, one of the leading IC vendors in this domain is a Dutch company called Innatera. Innatera’s latest IC, Pulsar, is based on low-power Spiking Neural Networks (see the picture on the right).

In this memo, I will try to reverse-engineer as much information as possible about how those Neuromorphic ICs are architected, how the Spiking Neural Networks are designed, and what software stack is exposed to program those devices.

Hardware Architecture Link to heading

This is an attempt to redraw the architecture diagram for the Pulsar IC (original image, complemented with insights from the white paper):

I do like the idea of using a RISC-V CPU - that’s a clear sign that the team at Innatera is making the right day-one architecture decision. In one of the diagrams from Innatera’s earlier T1 chip, the RISC-V CPU is bundled with a F extension (Floating-Point computation support).

The IC integrates many standard interfaces (SPI, I2C, …) and a power management unit (PMU) that turns on and off parts of the IC when entering “sleep mode” or “power saving” mode. Unfortunately, I could not find any data on the actual power consumption, expect from a ballpack of “less than 1mW”. Assuming a 1.8V supply voltage, this would less than 500 µA (I=W/V). I also wonder if the “per cycle” power metrics is relevant: In this case, assuming a 100Hz frequency (synaptic update period), this would mean 5 µA per synaptic update on average.

The latest diagram from Pulsar shows two accelerators: a CNN and an FFT. I believe these could be more broadly described as a GEMM (general matrix multiplier, the key for NN computation) and an SFU (special function unit, used to accelerate the FFT computation). Check the post about the GPU architecture for more information. There is also a spike encoder-decoder, which seems to have been developed as a coprocessor for RISC-V, but I could not find much more detail.

Spiking Neural Network Architecture Link to heading

The question that remains is about the Spiking Neural Network, or SNN: How does this work?

The metaphor Link to heading

I asked Gemini to give a metaphor for a 8 years old about Spiking Neural Network. Here is the story I got:

Imagine the brain like a giant post office, and the “neurons” are the little messengers who carry notes.

Messengers and Notes:

In a normal computer program, every messenger is always talking and shouting numbers back and forth. While in a Spiking Neural Network, the messengers are very quiet and energy-efficient: They only send a quick “Hey!” message when something important happens — like a tiny electric spike.

The Spark Meter:

Each messenger has a meter that fills up when they hear “Hey!” from others: When the meter gets full enough (reaches the threshold), that messenger gets excited too and shouts “Hey!”. Right after shouting, it resets its meter, so it can’t shout again immediately (the refractory period).

Timing is Everything:

If a messenger hears lots of “Heys!” quickly, it means the message is super important (like “Fire!”). If a messenger shouts “Hey!” just a little bit earlier than all the others, it might be a secret sign that something special is going on (like “Tiger incoming!”). So, messengers need to be time-aware.

Learning is the key:

The messengers learn just like you do: If Messenger A shouts “Hey!” just before Messenger B shouts “Hey!” (because something made them both excited), their friendship gets stronger, and Messenger A’s message helps Messenger B fill their meter faster next time.

SNNs are special because they are designed to be incredibly fast and efficient. They don’t waste energy talking all the time; they only activate when they have a spike of information to share! They help computers understand timing, recognize patterns, and use very little power.

The real design under the hood Link to heading

Innatera’s Thinking in Pulses blog provides a good overview of how this would be implemented in hardware. Encode the “heys” into binary values, get them through a NN, and extract the features. So far, nothing special; this is a standard NN process.

Design of a spiking Neuron

The more interesting question is how this SNN is implemented. The image below seems to be a conventional “computational model” of the digital neuron (image source) - where I have added the explicit notion of the “leaky integrate-and-fire” (LIF) within the SOMA.

Design of a spiking Neuron

What makes the hardware implementation low-power Link to heading

to be completed: explanations needed for

the performance gain (10x, 1000x, or more?)
the achievable NN propagation latencies
the tradeoffs and limitations
analog vs digial design performance.
power consumption relative to GPUs/TPUs

Innatera’s core design Link to heading

One question from the Pulsar design is: why are there two SNNs? The answer is that Innatera’s analog (proprietary analog-mixed signal) SNN fabric is optimized for maximum energy efficiency and very fast, ultra-low-power event processing. While the digital SNN fabric is a more programmable and flexible design that supports more configurable networks at slightly higher power. (Source)

There is not much information on Innatera’s website about the inner architecture of their spiking neural network, but fortunately, they have a few patents we can try to reverse-engineer!

Filing Date	Publication Number	Title (Abbreviated)
2021-02-09	US20220230051A1	Spiking Neural Network
2021-12-09	EP4505212A1	Method for efficient radar pre-processing
2022-01-26	EP4323923B1	Hierarchical reconfigurable multi-segment spiking neural network
2022-02-14	EP4238006A2	Distributed multi-component synaptic computational structure
2022-03-22	EP4226281B8	Adaptation of snns through transient synchrony
2022-04-08	EP4548260A1	Calibration of spiking neural networks
2022-06-03	EP4562540A	System and method for efficient feature-centric analog to spike encoders
2022-06-24	EP4573485A1	Sequential neural machine for memory optimized inference
2022-09-08	WO2024038102A1	System and method for reconfigurable modular neurosynaptic computational …
2022-10-18	WO2024153705A1	Self-timed feedforward synthesizable and technology scalable mixed-signal …
2023-05-18	WO2024175770A1	Always-on neuromorphic audio processing modules and methods
2023-05-24	WO2024175767A1	System and method for mapping of spiking neural networks on neuromorphic …
2023-06-19	WO2025012331A1	Method for training machine learning models for stochastic substrates
2023-06-22	WO2025017094A1	Method to build and deploy spiking neural networks on hardware device
2023-10-23	WO2025021573A1	System and method for occupancy detection using a frame-based sensor
2023-12-06	WO2025062034A1	Feature extraction and encoding of spiking neural networks using convolutional
2024-03-08	WO2023242374A1	Spike interconnect on chip single-packet multicast

This is quite a lot of patents for a startup! For the sake of time, I have only been looking at the first and last patent in this list.

Let’s start with the first patent US20220230051A1, which seems to be the closest one to describe the actual HW implementation of the SNN: The patent is centered on a compositional methodology for building SNNs from specialized, pre-trained sub-networks (“cells”) using a “unique response” principle.

Spike interconnect on chip single-packet multicast - patent WO2020099583A1

Figure 2 (below) shows the composition of a neural network formed by the integration of a data converter stage 201, input encoder stage 202, and pattern classifier stage 203. As for figure 5B (above) it shows how multiple temporal spike trains are fused into a single fused spike train

composition of a neural network

As for the last patent WO2023242374A1, it discloses Innatera’s Spiking Neural Network architecture, used to efficiently implement single-packet multicast using network-on-chip networks. The goal is to reduce duplication, latency, and overhead in multicore/many-core systems for multicast communication.

The diagram below (figure 5 in the patent) shows how the NOC arbitration works. The idea is to prevent two spikes (or more) having to be sent to the same output port at the same time. This is an efficient way to route spikes each cycle. Although, I wonder how deep are the fifos in the actual hardware. patent WO2023242374A1, figure 5: Innatera dynamic crossbar resource arbitration (NOC)

What is not clear to me is how the core and the spike network fit together: How many neurons are in a core? Are the core interconnected or working together? Or are they independant units that can process independent classification in parallel?

Software Stack: TALAMO Link to heading

High-level stack Link to heading

Innatera’s offering includes a bundled Talamo SDK, which doesn’t require SNN expertise, supports end-to-end applications, and uses a standard deep learning workflow. The Talamo kit helps developers build spiking models from scratch in a PyTorch-based environment. It’s a battery included.

The compilation workflow is very standard. The diagram below is a slightly updated version of Innatera’s original material, assuming the brain part of the workflow is the compiler. I also wonder if the simulator is taking the binary executable as input, in which case it would be called an emulator. One thing that would be very good to get more information about is the optimzation process, and whether there are parameters that can be tweeked by the user (like reducing the weights to 1.58 bits).

Once the binary program has been generated, this execution workflow is used, and follows more or less the process defined in the patent WO2023242374A1.

Low-level stack Link to heading

Innatera’s hardware includes a powerful RISC-V, a capable GEM, and an SFU compute accelerator. Some power users might find this more appealing to have a bare metal access to this hardware, compared to just loading weights into a C library targetable for Innatera’s architecture.

Maybe offering some for open-source firmware SDK would be beneficial. Alternatively, one could think of CudaNN, exposing a General Purpose Neural Processor Unit (GPNPU) experience, just like Cuda-Q exists for the Quantum industry. As well as an NVNLink, similar to NVQLink, enabling large scale NPU networks and hybrid architectures?

Maybe this is what is meant in the road to commercial success for neuromorphic technologies : “it was not until the advent of simple programming models and tools, simultaneous with straightforward API access to hardware, that tensor processors reached widespread adoption”. Just replace tensor processors with neuromorphic processing units (NPU)?

To me, this boils down to user experience (UX) and developper experience (DX). And I am firm believer that ease of use and developer experience are among the key-enablers, or strategy pillars, in creating genuinely outstanding product experiences.

(Image credits: archi monarch)

Conclusion Link to heading

That wraps up this quick memo on the high-level architecture of Neuromorphic ICs. Without a doubt, this hardware architecture is an alternative, powerful technology with significant potential. Yet, one could argue that Neuromorphic ICs lack a killer application. But isn’t it the exciting part about Neuromorphic ICs? Does that mean the world that goes with it has to be created?

I do believe that the team at Innetera is among “the people who see the world differently”. And amid concerns about AI data centers’ voracious appetite for watts, aren’t neuromorphic ICs one answer to the essential question: How to get more value out of less energy? If this is the case, then what will it take to scale the Neuromorphic ICs to reach the level of compution power equivalent to what is needed by the AI data centers?

Also, I wonder if adding a bit of sillicon photonics to the SNN would be of any help? It could either for the sake of neural computation, but also for the sake of creating a large scale interconnecting network between the SNN ICs.