Vivante NPU SDK Development Guide
The Allwinner T527 and A733 SoCs are equipped with a Vivante VIP9000 series NPU. The Vivante Machine Learning SDK is a toolkit for deploying AI inference tasks on supported boards, using the VIP9000 NPU for hardware acceleration.
Supported Boards:
- Orange Pi 4A
Overview
The Vivante ML SDK accepts models from several frameworks (TensorFlow, TensorFlow Lite, PyTorch, Caffe, DarkNet, ONNX, and Keras) and converts them into formats executable on the VIP9000 NPU.
Deploying a model and running inference on the NPU takes two steps:
- Parse the model structure and convert operators into an intermediate representation (IR).
- Compile the IR into machine-specific instructions.
Both steps can be performed in two modes:
| Mode | Description |
|---|---|
| Online (Runtime Inferencing) | Conversion and compilation happen at runtime, on whichever platform hosts the application |
| Offline (Offline Compilation) | The model is precompiled with the ACUITY Toolkit before deployment |
Runtime Inferencing
In Runtime Inferencing mode, users work with the original model framework and do not need to target a specific platform. IR conversion and compilation run at runtime on whichever platform hosts the application, and the target driver handles the underlying acceleration automatically. This makes the mode suitable for cross-platform software.
Note: For detailed usage of Runtime Inferencing, refer to the Vivante TIM-VX open-source repository. This document does not cover Runtime Inferencing in detail.
Offline Compilation
With the ACUITY Toolkit, the original model is compiled into an NPU-runnable format before deployment. Offline compilation provides:
- UINT8, PCQ (per-channel INT8), INT16, and BF16 quantization, plus mixed quantization
- Automatic operator fusion during compilation, which optimizes the model structure
- Shorter model initialization time and lower resource overhead, because no compilation happens on the device
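To make the UINT8 option above concrete, the following sketch shows asymmetric affine quantization, a common scheme for mapping floating-point values onto unsigned 8-bit integers. It assumes that UINT8 here refers to this scheme. The function names `quant_params`, `quantize`, and `dequantize` are hypothetical and are not part of the ACUITY Toolkit, which performs this step internally.

```python
def quant_params(x_min, x_max, num_bits=8):
    """Derive scale and zero point for asymmetric affine quantization.

    Hypothetical helper for illustration; not an ACUITY API.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # Widen the range to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    # Fall back to a scale of 1.0 if the range is empty.
    scale = (x_max - x_min) / (qmax - qmin) or 1.0
    zero_point = int(round(qmin - x_min / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a float to an unsigned integer, clamping out-of-range values."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Recover the approximate float value from its quantized form."""
    return (q - zero_point) * scale


# Example: calibrate on the observed range [-1.0, 3.0], then round-trip a value.
scale, zero_point = quant_params(-1.0, 3.0)
q = quantize(0.3, scale, zero_point)
x_approx = dequantize(q, scale, zero_point)
```

In-range values are recovered to within roughly half a quantization step (`scale / 2`), while values outside the calibrated range are clamped. This is why the calibration range matters for accuracy after quantization.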
The ACUITY Toolkit can generate two types of output:
Machine Code Generator — Network Binary Graph (NBG)
NBG (Network Binary Graph) is a precompiled machine code format that deploys directly on the Vivante NPU. No further compilation is needed on the device, so instructions are sent straight to the hardware. NBG is compatible with both the OpenVX and VIPLite drivers.
Source Code Generator — OpenVX Code
The OpenVX code generator produces a C-language OpenVX project for the Vivante NPU. Because the generated application describes the network as a graph-level IR rather than machine code, the OpenVX runtime still performs Just-In-Time (JIT) compilation on the target device. This gives up instant startup in exchange for cross-platform compatibility, while keeping the model optimizations applied by the offline tooling.
NBG vs. OpenVX Comparison
| Feature | NBG Project | OpenVX Project |
|---|---|---|
| Cross-Platform Support | No | Yes |
| Just-In-Time (JIT) Compilation | No | Yes |
| Instant Model Initialization | Yes | No |
| Supports OpenVX Driver | Yes | Yes |
| Supports VIPLite Driver | Yes | No |
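The comparison above can be read as a small decision procedure. The sketch below transcribes the table into a dictionary and filters the output types by project requirements. The helper `candidate_outputs` and the key names are hypothetical and exist only for this illustration; they are not part of the SDK.

```python
# Feature matrix transcribed from the NBG vs. OpenVX comparison table.
OUTPUT_FEATURES = {
    "NBG": {
        "cross_platform": False,
        "jit": False,
        "instant_init": True,
        "drivers": {"OpenVX", "VIPLite"},
    },
    "OpenVX": {
        "cross_platform": True,
        "jit": True,
        "instant_init": False,
        "drivers": {"OpenVX"},
    },
}


def candidate_outputs(driver, need_cross_platform=False, need_instant_init=False):
    """Return the ACUITY output types that satisfy the given requirements."""
    matches = []
    for name, features in OUTPUT_FEATURES.items():
        if driver not in features["drivers"]:
            continue  # this output type cannot run on the chosen driver
        if need_cross_platform and not features["cross_platform"]:
            continue
        if need_instant_init and not features["instant_init"]:
            continue
        matches.append(name)
    return matches


# A VIPLite deployment can only use NBG.
viplite_options = candidate_outputs("VIPLite")
# On the OpenVX driver, requiring portability leaves only the OpenVX project.
portable_options = candidate_outputs("OpenVX", need_cross_platform=True)
```

An empty result, for example when asking for cross-platform support on the VIPLite driver, means the requirements conflict and one of them has to be relaxed.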
Runtime vs. Offline Mode Comparison
| Feature | Runtime Inferencing | Offline Compilation |
|---|---|---|
| Cross-Platform Support | Yes | No |
| Easy Maintenance | Yes | No |
| Model Operator Fusion | No | Yes |
| Instant Model Initialization | No | Yes |
| Model Quantization | No | Yes |
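The "Model Operator Fusion" row refers to merging adjacent operators into a single one at compile time. A widely used example of the general technique is folding a batch normalization layer into the linear or convolution layer that precedes it. The sketch below shows the arithmetic for a single channel. Whether ACUITY applies exactly this fusion is an assumption, and all function names are hypothetical.

```python
import math


def linear(x, weight, bias):
    """A one-channel linear layer: y = weight * x + bias."""
    return weight * x + bias


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization for one channel."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold batch normalization into the preceding layer's parameters.

    Returns (weight', bias') such that
    linear(x, weight', bias') == batchnorm(linear(x, weight, bias), ...).
    """
    factor = gamma / math.sqrt(var + eps)
    return weight * factor, (bias - mean) * factor + beta


# Two operators before fusion, one operator after.
fused_weight, fused_bias = fold_batchnorm(
    0.5, 0.1, gamma=1.5, beta=-0.2, mean=0.3, var=4.0
)
unfused = batchnorm(linear(2.0, 0.5, 0.1), 1.5, -0.2, 0.3, 4.0)
fused = linear(2.0, fused_weight, fused_bias)
```

The fused layer produces the same result as the original pair, up to floating-point rounding, while running one operator instead of two. Because the folding is computed once ahead of time, it fits naturally into an offline compilation step.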
Vivante ML Software Stack
ACUITY Toolkit
The ACUITY Toolkit is an end-to-end integrated offline development tool for model conversion, quantization, and compilation. It supports various AI frameworks and generates ready-to-run code for the NPU.
Documentation:
- ACUITY Environment Setup
- ACUITY Toolkit Usage
- ACUITY Quantization Accuracy Optimization
TIM-VX — Tensor Interface Module
TIM-VX is a software integration module by VeriSilicon that simplifies neural network deployment on its ML accelerators. It acts as a backend binding interface for runtime frameworks such as Android NN, TensorFlow Lite, MLIR, and TVM, and is the primary module for Runtime Inferencing.
Note: For detailed TIM-VX usage, refer to the Vivante TIM-VX open-source repository.
Vivante Unified Driver
The Vivante Unified Driver provides a standardized programming interface for NPUs, supporting industry-standard APIs such as OpenVX and OpenCL, and is compatible with Linux and Android operating systems.
Vivante VIPLite Driver
The VIPLite Driver is a lightweight driver designed for embedded environments such as Linux, RTOS, and bare-metal systems. It loads and runs neural network models precompiled by ACUITY (NBG files) with minimal overhead.
Unified vs. VIPLite Driver Comparison
| Feature | Unified Driver | VIPLite Driver |
|---|---|---|
| Operating Environment | Android / Linux | Android / Linux / RTOS / Bare Metal / DSP |
| Offline Compilation (NBG) | Supported | Supported |
| Runtime JIT Compilation | Supported | Not Supported |
| Multi-VIP Support | Supported | Supported |
| Memory Usage | Tens of MB | Tens of KB |
| MMU Support | Supported | Supported |
| Multi-Graph Support | Supported | Supported |

