Vivante NPU SDK Development Guide

Posted Jan 10, 2026

By Khoi Nguyen Van


The Allwinner T527 and A733 SoCs are equipped with a Vivante VIP9000-series NPU. The Vivante Machine Learning SDK is the toolkit developers use to deploy AI inference workloads on supported boards, offloading inference to the VIP9000 NPU for hardware acceleration.

Supported Boards:

Orange Pi 4A

Overview

The Vivante ML SDK accepts models from TensorFlow, TensorFlow Lite, PyTorch, Caffe, DarkNet, ONNX, and Keras, and converts them into formats executable on the VIP9000 NPU.

Deploying a model and running inference on the NPU requires two steps:

Parse the model structure and convert operators into an intermediate representation (IR).
Compile the IR into machine-specific instructions.
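The two steps above can be pictured with a toy pipeline. This is only a sketch of the idea, not the SDK's implementation: the `IRNode` type, the `OP_TABLE` mapping, the dictionary-based model, and the pseudo-instruction strings are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRNode:
    op: str        # framework-neutral operator name
    inputs: tuple  # names of the tensors this node reads
    output: str    # name of the tensor this node writes


# Hypothetical mapping from framework operator names to neutral IR names.
OP_TABLE = {"Conv2D": "conv2d", "Relu": "relu", "Softmax": "softmax"}


def parse_to_ir(layers):
    """Step 1: walk the model structure and emit one IR node per operator."""
    ir = []
    for layer in layers:
        if layer["type"] not in OP_TABLE:
            raise ValueError(f"unsupported operator: {layer['type']}")
        ir.append(IRNode(OP_TABLE[layer["type"]], tuple(layer["inputs"]), layer["name"]))
    return ir


def compile_ir(ir, target):
    """Step 2: lower each IR node to a target-specific pseudo-instruction."""
    return [f"{target}:{node.op} {','.join(node.inputs)} -> {node.output}" for node in ir]


# A three-layer toy model described as plain dictionaries (illustrative only).
model = [
    {"name": "conv1", "type": "Conv2D", "inputs": ["image"]},
    {"name": "act1", "type": "Relu", "inputs": ["conv1"]},
    {"name": "prob", "type": "Softmax", "inputs": ["act1"]},
]
ir = parse_to_ir(model)
program = compile_ir(ir, target="npu")
for line in program:
    print(line)
```

The point of the split is that step 1 depends only on the source framework, while step 2 depends only on the target hardware.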

Both steps can be performed in two modes:

Mode	Description
Online (Runtime Inferencing)	Conversion and compilation happen at runtime on any platform
Offline (Offline Compilation)	Model is pre-compiled before deployment using the ACUITY Toolkit

Runtime Inferencing

In Runtime Inferencing mode, developers work with the original model framework and do not need to target a specific platform. IR conversion and compilation happen at runtime wherever the application runs, and the target's driver handles the underlying acceleration automatically. This makes the mode well suited to cross-platform software.

Note: For detailed usage of Runtime Inferencing, refer to the Vivante TIM-VX open-source repository. This document does not cover Runtime Inferencing in detail.

Offline Compilation

Using the ACUITY Toolkit, the original model is compiled into an NPU-runnable format before deployment. Offline compilation provides:

UINT8, per-channel quantization (PCQ, INT8), INT16, and BF16 quantization, as well as mixed quantization
Automatic operator fusion during compilation, which optimizes the model structure
Reduced model initialization time and resource overhead, because no compilation happens on the device
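To make the quantization options above more concrete, here is a minimal pure-Python sketch of two common schemes: per-tensor asymmetric UINT8 and per-channel symmetric INT8. It assumes these are the standard affine formulas and that PCQ stands for per-channel quantization. The function names are hypothetical, and the ACUITY Toolkit's exact rounding and calibration may differ.

```python
def quantize_uint8(values):
    """Per-tensor asymmetric affine quantization: one scale and zero point."""
    lo = min(min(values), 0.0)  # the representable range must include 0.0
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for an all-zero tensor
    zero_point = round(-lo / scale)
    q = [max(0, min(255, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_uint8(q, scale, zero_point):
    """Map UINT8 codes back to approximate real values."""
    return [(x - zero_point) * scale for x in q]


def quantize_int8_per_channel(channels):
    """Per-channel symmetric quantization: each channel gets its own scale."""
    result = []
    for ch in channels:
        scale = max(abs(v) for v in ch) / 127.0 or 1.0
        q = [max(-127, min(127, round(v / scale))) for v in ch]
        result.append((q, scale))
    return result


activations = [-0.5, 0.0, 0.25, 1.5]
q, scale, zero_point = quantize_uint8(activations)
restored = dequantize_uint8(q, scale, zero_point)
print(q, zero_point)  # [0, 64, 96, 255] 64

# Two weight channels with very different magnitudes.
weights = [[0.25, -1.0], [0.02, 0.005]]
per_channel = quantize_int8_per_channel(weights)
print([codes for codes, _ in per_channel])  # [[32, -127], [127, 32]]
```

With a separate scale per channel, the small-magnitude second channel still uses the full INT8 range, which is the usual motivation for per-channel over per-tensor quantization.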

The ACUITY Toolkit can generate two types of output:

Machine Code Generator — Network Binary Graph (NBG)

NBG (Network Binary Graph) is a pre-compiled machine code format deployable directly on the Vivante NPU. No further compilation is needed — instructions are sent directly to hardware. It is compatible with both OpenVX and VIPLite drivers.

Source Code Generator — OpenVX Code

The OpenVX code generator produces a C-language OpenVX project for the Vivante NPU. Because the generated application describes the network as a graph-level IR rather than machine code, the OpenVX runtime still performs Just-In-Time (JIT) compilation on the target device. This trades instant startup for cross-platform compatibility while retaining the model optimizations applied by the offline tooling.

NBG vs. OpenVX Comparison

Feature	NBG Project	OpenVX Project
Cross-Platform Support	No	Yes
Just-In-Time (JIT) Compilation	No	Yes
Instant Model Initialization	Yes	No
Supports OpenVX Driver	Yes	Yes
Supports VIPLite Driver	Yes	No

Runtime vs. Offline Mode Comparison

Feature	Runtime Inferencing	Offline Compilation
Cross-Platform Support	Yes	No
Easy Maintenance	Yes	No
Model Operator Fusion	No	Yes
Instant Model Initialization	No	Yes
Model Quantization	No	Yes
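As an illustration of what the "Model Operator Fusion" row refers to, the sketch below folds a batch-normalization step into the preceding layer's weight and bias, a widely used fusion. It is a generic example on a single scalar channel. This guide does not specify which fusion patterns the ACUITY Toolkit applies, so the function names and the choice of pattern are assumptions.

```python
import math


def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold batch norm into the preceding linear op, returning (w', b')."""
    k = gamma / math.sqrt(var + eps)
    return weight * k, (bias - mean) * k + beta


def conv_then_bn(x, weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Unfused reference: linear op followed by batch normalization."""
    y = weight * x + bias  # stand-in for a convolution on one channel
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


params = dict(weight=0.8, bias=0.1, gamma=1.5, beta=-0.2, mean=0.3, var=4.0)
w_fused, b_fused = fold_batchnorm(**params)

# The fused single op matches the two-op pipeline for any input.
for x in (-1.0, 0.0, 2.5):
    fused = w_fused * x + b_fused
    reference = conv_then_bn(x, **params)
    assert math.isclose(fused, reference, abs_tol=1e-9)
    print(x, fused)
```

After folding, inference runs one operation instead of two, which is why fusion is performed ahead of time during offline compilation.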

Vivante ML Software Stack

ACUITY Toolkit

The ACUITY Toolkit is an end-to-end integrated offline development tool for model conversion, quantization, and compilation. It supports various AI frameworks and generates ready-to-run code for the NPU.

Documentation:

ACUITY Environment Setup
ACUITY Toolkit Usage
ACUITY Quantization Accuracy Optimization

TIM-VX — Tensor Interface Module

TIM-VX is a software integration module by VeriSilicon that simplifies neural network deployment on its ML accelerators. It acts as a backend binding interface for runtime frameworks such as Android NN, TensorFlow Lite, MLIR, and TVM, and is the primary module for Runtime Inferencing.

Note: For detailed TIM-VX usage, refer to the Vivante TIM-VX open-source repository.

Vivante Unified Driver

The Vivante Unified Driver provides a standardized programming interface for NPUs, supporting industry-standard APIs such as OpenVX and OpenCL, and is compatible with Linux and Android operating systems.

Vivante VIPLite Driver

The VIPLite Driver is a lightweight driver designed for embedded systems, including Linux, RTOS, and bare-metal environments. It loads and runs neural network models precompiled by the ACUITY Toolkit with minimal overhead.

Unified vs. VIPLite Driver Comparison

Feature	Unified Driver	VIPLite Driver
Operating Environment	Android / Linux	Android / Linux / RTOS / Bare Metal / DSP
Offline Compilation (NBG)	Supported	Supported
Runtime JIT Compilation	Supported	Not Supported
Multi-VIP Support	Supported	Supported
Memory Usage	Tens of MB	Tens of KB
MMU Support	Supported	Supported
Multi-Graph Support	Supported	Supported
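The comparison tables in this guide can be combined into a small lookup that answers "which driver can run this artifact on this OS?". This is a sketch only: the dictionary layout and the `compatible_drivers` helper are hypothetical, and it assumes the Unified Driver is the "OpenVX driver" referred to in the NBG vs. OpenVX table.

```python
# Capabilities transcribed from the comparison tables in this guide.
DRIVERS = {
    "unified": {"os": {"android", "linux"}, "jit": True},
    "viplite": {"os": {"android", "linux", "rtos", "bare-metal", "dsp"}, "jit": False},
}

# An OpenVX project needs JIT compilation on the target; an NBG does not.
ARTIFACT_NEEDS_JIT = {"nbg": False, "openvx": True}


def compatible_drivers(artifact, os_name):
    """Return the drivers that can run `artifact` on `os_name`, sorted by name."""
    needs_jit = ARTIFACT_NEEDS_JIT[artifact]
    return sorted(
        name
        for name, caps in DRIVERS.items()
        if os_name in caps["os"] and (caps["jit"] or not needs_jit)
    )


print(compatible_drivers("nbg", "linux"))     # ['unified', 'viplite']
print(compatible_drivers("openvx", "linux"))  # ['unified']
print(compatible_drivers("nbg", "rtos"))      # ['viplite']
print(compatible_drivers("openvx", "rtos"))   # []
```

The empty result for an OpenVX project on an RTOS reflects the tables: JIT compilation needs the Unified Driver, which is listed only for Android and Linux.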


This post is licensed under CC BY 4.0 by the author.