Vivante NPU SDK Development Guide

The Allwinner T527 and A733 SoCs are equipped with the Vivante VIP9000 series NPU. The Vivante Machine Learning SDK is a toolkit for deploying and accelerating AI inference on supported boards, using the VIP9000 NPU for hardware acceleration.

Supported Boards:

  • Orangepi 4A

Overview

The Vivante ML SDK accepts models from TensorFlow, TensorFlow Lite, PyTorch, Caffe, DarkNet, ONNX, and Keras, and converts them into formats executable on the VIP9000 NPU.

Deploying an AI model and running inference on the NPU takes two steps:

  1. Parse the model structure and convert operators into an intermediate representation (IR).
  2. Compile the IR into machine-specific instructions.

Both steps can be performed in two modes:

Mode                          | Description
Online (Runtime Inferencing)  | Conversion and compilation happen at runtime on any platform
Offline (Offline Compilation) | Model is pre-compiled before deployment using the ACUITY Toolkit
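
The two steps and the two modes can be pictured with a toy sketch in plain Python. Everything here is hypothetical and for illustration only: the layer names, the `OP_MAP` table, the instruction strings, and the JSON "binary" are stand-ins, not the SDK's real IR or file formats. The point is where the work happens: online mode runs both steps on the device at startup, while offline mode runs them once on a build machine and the device only loads the result.

```python
import json

# Hypothetical mapping from framework layer names to common IR operators.
OP_MAP = {"Conv2D": "conv", "ReLU": "relu", "Dense": "fc"}


def parse_to_ir(layers):
    """Step 1: convert framework-specific layers into IR operators."""
    return [OP_MAP[name] for name in layers]


def compile_ir(ir, target):
    """Step 2: lower IR operators to (toy) target-specific instructions."""
    return [f"{target}.{op}" for op in ir]


def online_startup(layers, target):
    """Online mode: both steps run on the device every time the app starts."""
    return compile_ir(parse_to_ir(layers), target)


def offline_build(layers, target):
    """Offline mode, build machine: run both steps once and serialize."""
    instructions = compile_ir(parse_to_ir(layers), target)
    return json.dumps(instructions).encode("utf-8")


def offline_startup(blob):
    """Offline mode, device: load the precompiled result, no compilation."""
    return json.loads(blob.decode("utf-8"))


if __name__ == "__main__":
    model = ["Conv2D", "ReLU", "Dense"]
    blob = offline_build(model, "vip9000")
    print(online_startup(model, "vip9000"))
    print(offline_startup(blob))
```

Both paths end with the same instruction list; they differ only in when and where the conversion cost is paid.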


Runtime Inferencing

In Runtime Inferencing mode, users work with the original model framework and do not need to target a specific platform. IR conversion and compilation run at runtime on whichever platform hosts the application, and the target driver handles the underlying acceleration automatically. This makes the mode suitable for cross-platform software.

Note: For detailed usage of Runtime Inferencing, refer to the Vivante TIM-VX open-source repository. This document does not cover Runtime Inferencing in detail.


Offline Compilation

Using the ACUITY Toolkit, the original model is compiled into an NPU-runnable format before deployment. Offline compilation provides:

  • UINT8, PCQ (per-channel quantized INT8), INT16, and BF16 quantization, as well as mixed quantization
  • Automatic operator fusion during compilation, optimizing the model structure
  • Significantly reduced model initialization time and resource overhead
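
To make the first bullet concrete, here is a minimal sketch of two of the listed schemes in plain Python: asymmetric affine UINT8 quantization with one scale and zero point per tensor, and symmetric INT8 quantization with one scale per channel (PCQ). The function names are hypothetical, and the rounding and clamping choices are common conventions assumed for illustration; the source does not state the exact arithmetic ACUITY applies.

```python
def quantize_uint8_asymmetric(values):
    """Per-tensor asymmetric affine quantization to the range 0..255."""
    # Include 0.0 in the range so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any scale works
    zero_point = round(-lo / scale)
    quantized = [min(255, max(0, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point=0):
    """Map integers back to approximate real values."""
    return [(q - zero_point) * scale for q in quantized]


def quantize_int8_per_channel(channels):
    """Symmetric INT8 quantization with a separate scale for each channel."""
    result = []
    for channel in channels:
        max_abs = max(abs(v) for v in channel)
        scale = max_abs / 127.0 if max_abs > 0.0 else 1.0
        quantized = [max(-127, min(127, round(v / scale))) for v in channel]
        result.append((quantized, scale))
    return result


if __name__ == "__main__":
    q, scale, zp = quantize_uint8_asymmetric([-1.0, 0.0, 1.0, 2.0])
    print(q, scale, zp)
    print(dequantize(q, scale, zp))
    # Two channels with very different ranges each get their own scale.
    print(quantize_int8_per_channel([[0.25, -1.0], [10.0, 2.0]]))
```

The per-channel variant keeps precision for channels with small weights, because a channel with a large range no longer dictates the scale for the whole tensor.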

The ACUITY Toolkit can generate two types of output:

Machine Code Generator — Network Binary Graph (NBG)

NBG (Network Binary Graph) is a pre-compiled machine code format that can be deployed directly on the Vivante NPU. No further compilation is needed on the device: the instructions are sent directly to the hardware. NBG is compatible with both the OpenVX and VIPLite drivers.

Source Code Generator — OpenVX Code

The OpenVX code generator produces a C-language OpenVX project for the Vivante NPU. Because the generated project describes the network at graph level rather than as machine code, the OpenVX runtime still performs Just-In-Time (JIT) compilation on the target device. This gives up instant startup in exchange for cross-platform compatibility, while keeping the model optimizations applied by the offline tooling.

NBG vs. OpenVX Comparison

Feature                        | NBG Project | OpenVX Project
Cross-Platform Support         | No          | Yes
Just-In-Time (JIT) Compilation | No          | Yes
Instant Model Initialization   | Yes         | No
Supports OpenVX Driver         | Yes         | Yes
Supports VIPLite Driver        | Yes         | No
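
The comparison above can be read as a small decision rule. The sketch below encodes it in Python; `choose_acuity_output` and its arguments are hypothetical names invented for this example and are not part of the ACUITY Toolkit.

```python
def choose_acuity_output(driver, needs_cross_platform=False):
    """Return "NBG" or "OpenVX" following the comparison table.

    driver: "openvx" or "viplite" (hypothetical labels for the two drivers).
    needs_cross_platform: True if the same project must run on other targets.
    """
    if driver not in ("openvx", "viplite"):
        raise ValueError(f"unknown driver: {driver!r}")
    if driver == "viplite":
        # VIPLite only runs NBG, and NBG is tied to one target.
        if needs_cross_platform:
            raise ValueError("VIPLite runs only NBG, which is not cross-platform")
        return "NBG"
    # The OpenVX driver accepts both; prefer NBG for instant initialization
    # unless cross-platform support is required.
    return "OpenVX" if needs_cross_platform else "NBG"


if __name__ == "__main__":
    print(choose_acuity_output("viplite"))
    print(choose_acuity_output("openvx"))
    print(choose_acuity_output("openvx", needs_cross_platform=True))
```

Preferring NBG whenever it is allowed reflects the "Instant Model Initialization" row; a project that values portability over startup time would flip that default.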

Runtime vs. Offline Mode Comparison

Feature                      | Runtime Inferencing | Offline Compilation
Cross-Platform Support       | Yes                 | No
Easy Maintenance             | Yes                 | No
Model Operator Fusion        | No                  | Yes
Instant Model Initialization | No                  | Yes
Model Quantization           | No                  | Yes
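
The "Model Operator Fusion" row refers to merging adjacent operators into a single one at compile time. A toy sketch of such a pass over a linear list of operator names follows. Fusing a convolution with the ReLU that follows it is a commonly cited pattern and is assumed here purely for illustration; the source does not list which patterns ACUITY fuses, and `fuse_conv_relu` is a hypothetical name.

```python
def fuse_conv_relu(ops):
    """Replace each adjacent ("conv", "relu") pair with one "conv_relu" op."""
    fused = []
    i = 0
    while i < len(ops):
        if ops[i] == "conv" and i + 1 < len(ops) and ops[i + 1] == "relu":
            fused.append("conv_relu")
            i += 2  # consume both operators of the pair
        else:
            fused.append(ops[i])
            i += 1
    return fused


if __name__ == "__main__":
    graph = ["conv", "relu", "pool", "conv", "relu", "fc"]
    print(fuse_conv_relu(graph))
```

A real compiler works on a graph rather than a flat list and has to check that the intermediate result has no other consumers before fusing, which this sketch leaves out.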

Vivante ML Software Stack

ACUITY Toolkit

The ACUITY Toolkit is an end-to-end integrated offline development tool for model conversion, quantization, and compilation. It supports various AI frameworks and generates ready-to-run code for the NPU.

Documentation:

  • ACUITY Environment Setup
  • ACUITY Toolkit Usage
  • ACUITY Quantization Accuracy Optimization


TIM-VX — Tensor Interface Module

TIM-VX is a software integration module by VeriSilicon that simplifies neural network deployment on its ML accelerators. It acts as a backend binding interface for runtime frameworks such as Android NN, TensorFlow Lite, MLIR, and TVM, and is the primary module for Runtime Inferencing.

Note: For detailed TIM-VX usage, refer to the Vivante TIM-VX open-source repository.


Vivante Unified Driver

The Vivante Unified Driver provides a standardized programming interface for NPUs, supporting industry-standard APIs such as OpenVX and OpenCL, and is compatible with Linux and Android operating systems.


Vivante VIPLite Driver

The VIPLite Driver is a lightweight driver designed for embedded systems running Linux, an RTOS, or bare metal. It loads and runs neural network models precompiled by ACUITY with minimal overhead.


Unified vs. VIPLite Driver Comparison

Feature                   | Unified Driver  | VIPLite Driver
Operating Environment     | Android / Linux | Android / Linux / RTOS / Bare Metal / DSP
Offline Compilation (NBG) | Supported       | Supported
Runtime JIT Compilation   | Supported       | Not Supported
Multi-VIP Support         | Supported       | Supported
Memory Usage              | Tens of MB      | Tens of KB
MMU Support               | Supported       | Supported
Multi-Graph Support       | Supported       | Supported

This post is licensed under CC BY 4.0 by the author.