Floating Point Unit

ST-Floating Point Unit

Some STM32 microcontrollers have an internal FPU (Floating-Point Unit) that accelerates floating-point arithmetic by executing it in hardware. Without an FPU, the compiler falls back to software emulation of these operations, which is considerably slower.

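ARM Cortex-M cores come with a single-precision (SP) FPU, a double-precision (DP) FPU, or no FPU at all. Where it is available, the FPU is an optional feature of the core, so the device datasheet tells whether a given STM32 includes it.

  • Single Precision: M4, M33, M35P
  • Double Precision: M7 (some M7 devices are single-precision only), M55
  • No FPU: M0, M0+, M1, M23, M3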
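
To check at compile time which kind of FPU a build targets, the Arm C Language Extensions (ACLE) macro `__ARM_FP` can be inspected. ACLE-compliant compilers such as GCC define it only when hardware floating point is enabled, as a bit mask in which 0x4 signals single-precision and 0x8 double-precision support. This is a minimal sketch; the enum and both functions are hypothetical helpers, not part of any ST or CMSIS API.

```c
/* Hypothetical helpers for illustration; only __ARM_FP itself comes from the ACLE. */
enum fpu_kind { FPU_NONE, FPU_SINGLE, FPU_DOUBLE };

/* Decode an ACLE __ARM_FP bit mask: 0x2 = half, 0x4 = single, 0x8 = double precision. */
enum fpu_kind fpu_kind_from_arm_fp(unsigned arm_fp)
{
    if (arm_fp & 0x8u) {
        return FPU_DOUBLE;
    }
    if (arm_fp & 0x4u) {
        return FPU_SINGLE;
    }
    return FPU_NONE;
}

/* FPU kind targeted by the current build.
 * __ARM_FP is undefined for soft-float builds and for non-Arm hosts. */
enum fpu_kind fpu_kind_of_build(void)
{
#ifdef __ARM_FP
    return fpu_kind_from_arm_fp(__ARM_FP);
#else
    return FPU_NONE;
#endif
}
```

A Cortex-M4 build with -mfpu=fpv4-sp-d16 is expected to report FPU_SINGLE, and a soft-float build FPU_NONE.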

Overview

The many different floating-point implementations that appeared over the years led the IEEE to standardize the following elements in the IEEE 754 standard:

  • number formats
  • arithmetic operations
  • number conversions
  • encoding of special values
  • four rounding modes
  • five exceptions and their handling

All values are composed of three fields:

  • Sign: s
  • Biased exponent:
    • e = true exponent + bias (the value actually stored)
    • bias = a constant value fixed for each format
  • Fraction (or mantissa): f

The values can be coded on various lengths:

  • 16-bit: half precision format
  • 32-bit: single precision format
  • 64-bit: double precision format
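
The three fields can be pulled out of a 32-bit float with shifts and masks. This sketch assumes `float` is IEEE 754 single precision (true for the Cortex-M FPU and common desktop targets) and uses `memcpy` to read the bit pattern without breaking C aliasing rules; the struct and function names are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Fields of an IEEE 754 single-precision value: 1 sign, 8 exponent, 23 fraction bits. */
struct f32_fields {
    uint32_t sign;     /* s: bit 31 */
    uint32_t exponent; /* e: bits 30..23, stored with the bias added */
    uint32_t fraction; /* f: bits 22..0, the implicit leading 1 is not stored */
};

/* Copy the float's bytes into an integer; memcpy avoids strict-aliasing issues. */
uint32_t f32_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Split a float into its sign, biased exponent and fraction fields. */
struct f32_fields f32_split(float x)
{
    uint32_t bits = f32_bits(x);
    struct f32_fields fields = {
        .sign     = bits >> 31,
        .exponent = (bits >> 23) & 0xFFu,
        .fraction = bits & 0x7FFFFFu,
    };
    return fields;
}
```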

Normalized numbers

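Normalized numbers are given by the formula below:

$$
v = (-1)^{s} \times (1.f)_2 \times 2^{\,e - \text{bias}}
$$

The bias is a fixed value defined for each format. For an exponent field of $k$ bits, $\text{bias} = 2^{k-1} - 1$:

| Format | Exponent bits ($k$) | Fraction bits | Bias |
|---|---|---|---|
| Half (16-bit) | 5 | 10 | 15 |
| Single (32-bit) | 8 | 23 | 127 |
| Double (64-bit) | 11 | 52 | 1023 |

Normalized values use exponent fields from $1$ to $2^{k} - 2$; the all-zeros and all-ones exponent fields are reserved for denormalized numbers and special values.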
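
As a sketch of the normalized formula, value = (-1)^s × (1.f)₂ × 2^(e − bias), the function below rebuilds a single-precision value (bias = 127) from its three fields using `ldexp` from `<math.h>`. It is only valid for exponent fields 1 to 254; the function name is hypothetical.

```c
#include <math.h>
#include <stdint.h>

/* Evaluate v = (-1)^s * (1.f)_2 * 2^(e - bias) for single precision (bias = 127).
 * Valid only for normalized values, i.e. exponent fields 1..254. */
double f32_normal_value(uint32_t s, uint32_t e, uint32_t f)
{
    const int bias = 127;
    double significand = 1.0 + (double)f / 8388608.0;     /* (1.f)_2 = 1 + f / 2^23 */
    double magnitude = ldexp(significand, (int)e - bias); /* multiply by 2^(e - bias) */
    return s ? -magnitude : magnitude;
}
```

For instance, s = 1, e = 129 and f = 0x600000 give -1.75 × 2² = -7.0.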

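Denormalized numbers

Denormalized (or subnormal) numbers are used when the value to represent is too small to be encoded as a normalized number. In this case, the exponent field is set to zero and the implicit leading bit becomes 0, so the value is:

$$
v = (-1)^{s} \times (0.f)_2 \times 2^{\,1 - \text{bias}}
$$

Precision therefore decreases progressively as values approach zero, which provides gradual underflow.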

This ensures smoother transitions around zero and allows representation of values closer to zero than what normalized numbers can express.

| Format | Min denormalized value |
|---|---|
| Half | 2⁻²⁴ ≈ 5.96×10⁻⁸ |
| Single | 2⁻¹⁴⁹ ≈ 1.4×10⁻⁴⁵ |
| Double | 2⁻¹⁰⁷⁴ ≈ 4.94×10⁻³²⁴ |
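
The single-precision entry can be checked on a desktop compiler: the bit pattern 0x00000001 (exponent field 0, fraction 1) is the smallest positive subnormal, 2⁻²³ × 2⁻¹²⁶ = 2⁻¹⁴⁹, which C11 exposes as `FLT_TRUE_MIN`. This is a minimal sketch with hypothetical function names; on the Cortex-M FPU, subnormal results are only produced when flush-to-zero mode (the FZ bit of FPSCR) is disabled.

```c
#include <float.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Smallest positive subnormal single: s = 0, e = 0, f = 1, i.e. (0.f)_2 * 2^(1 - 127) = 2^-149. */
float f32_min_subnormal(void)
{
    uint32_t bits = 0x00000001u;
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* A nonzero value is subnormal when its magnitude is below FLT_MIN (2^-126), the smallest normal. */
int f32_is_subnormal(float x)
{
    return x != 0.0f && fabsf(x) < FLT_MIN;
}
```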

🔁 Example 1: Convert Decimal to IEEE 754 Single-Precision

Input: -7.0
We’ll convert -7.0 to IEEE 754 single-precision (32-bit) floating-point format.

Step 1: Sign bit

Since it’s negative → Sign = 1

Step 2: Convert to binary

7.0 in binary = 111.0 = 1.11 × 2² (normalized form)

Step 3: Exponent

  • Bias for single precision = 127
  • Exponent = 2 → 2 + 127 = 129
  • 129 in binary = 10000001

Step 4: Mantissa (23 bits)

Keep only the fraction bits after the leading 1. (the 1. is implicit in normalized form and is not stored), padded with zeros to 23 bits:
1.11 → fraction = 11000000000000000000000

✅ Final IEEE 754 Format

| Sign | Exponent | Mantissa |
|---|---|---|
| 1 | 10000001 | 11000000000000000000000 |

In Hex:

0b1_10000001_11000000000000000000000 = 0xC0E00000
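
Steps 1 to 4 can be mirrored in C for normalized values. This hypothetical helper uses `frexpf`, which returns |x| as m × 2^p with 0.5 ≤ m < 1, so the normalized form (2m) × 2^(p − 1) follows directly. Zero, subnormals, infinities and NaN are deliberately left out of this sketch.

```c
#include <math.h>
#include <stdint.h>

/* Encode a finite, nonzero, normalized float by hand, following Steps 1 to 4.
 * Zero, subnormals, infinities and NaN are not handled in this sketch. */
uint32_t f32_encode_normal(float x)
{
    /* Step 1: sign bit */
    uint32_t sign = (x < 0.0f) ? 1u : 0u;

    /* Step 2: |x| = m * 2^p with 0.5 <= m < 1, i.e. |x| = (2m) * 2^(p - 1) with 1 <= 2m < 2 */
    int p;
    float m = frexpf(fabsf(x), &p);

    /* Step 3: add the bias (127) to the true exponent p - 1 */
    uint32_t exponent = (uint32_t)((p - 1) + 127);

    /* Step 4: the 23 fraction bits after the implicit "1." (exact for any float input) */
    uint32_t fraction = (uint32_t)((2.0f * m - 1.0f) * 8388608.0f); /* 2^23 */

    return (sign << 31) | (exponent << 23) | fraction;
}
```

For -7.0, frexpf yields m = 0.875 and p = 3, so the exponent field is 129 and the fraction is 0.75 × 2²³ = 0x600000, giving 0xC0E00000.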

🔁 Example 2: Convert IEEE 754 Hex to Decimal

Input: 0xC0E00000

Step 1: Binary Breakdown

0xC0E00000
11000000111000000000000000000000

  • Sign: 1 → negative
  • Exponent: 10000001 → 129
  • Mantissa: 11000000000000000000000

Step 2: Compute Exponent

129 - 127 = 2

Step 3: Compute Mantissa

Add implicit 1. in front → 1.11
Binary 1.11 = 1 + 0.5 + 0.25 = 1.75

Step 4: Final Result

Value = -1.75 × 2² = -7.0
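
In code, going from the hex pattern back to a value is a plain reinterpretation of the 32 bits; the hardware then interprets them exactly as in Steps 1 to 4. A minimal sketch, again assuming `float` is IEEE 754 single precision; the function name is illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as an IEEE 754 single-precision float.
 * memcpy is well-defined here, unlike casting a uint32_t pointer to float*. */
float f32_from_bits(uint32_t bits)
{
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```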


🧠 Summary Table

| Decimal | IEEE 754 Binary | Hex |
|---|---|---|
| -7.0 | 1 10000001 11000000000000000000000 | 0xC0E00000 |

Special Values in IEEE 754

IEEE 754 defines several special cases in floating-point representation:

  • Zero: Represented by all exponent and mantissa bits being 0. The sign bit determines +0 or -0.

  • Infinity (±∞): Occurs when the exponent is all 1s and the mantissa is all 0s.

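  • NaN (Not-a-Number): Occurs when the exponent is all 1s and the mantissa is nonzero. Used to represent undefined or unrepresentable results such as 0/0 or sqrt(-1). There are two kinds:

    • Quiet NaN (QNaN): Propagates silently through most operations.

    • Signaling NaN (SNaN): Raises the Invalid Operation exception when used as an operand.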

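| Sign | Exponent | Fraction | Meaning |
|---|---|---|---|
| 0 | 0 | 0 | +0 |
| 1 | 0 | 0 | -0 |
| 0 | Max (all 1s) | 0 | +∞ |
| 1 | Max (all 1s) | 0 | -∞ |
| x | Max (all 1s) | ≠0 | NaN (Q/S) |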
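
The special-value encodings listed above can be observed with the standard classification macros `fpclassify` and `signbit` from `<math.h>`. In this sketch, the quiet/signaling distinction relies on the most significant fraction bit (bit 22), which is set for quiet NaNs on Arm and x86; the helper names are hypothetical.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float (assumes IEEE 754 single precision). */
static float bits_to_f32(uint32_t bits)
{
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Classify a single-precision bit pattern into the categories of the special-values table. */
const char *f32_category(uint32_t bits)
{
    float x = bits_to_f32(bits);
    switch (fpclassify(x)) {
    case FP_ZERO:      return signbit(x) ? "-0" : "+0";
    case FP_INFINITE:  return signbit(x) ? "-inf" : "+inf";
    case FP_NAN:       /* bit 22 (top fraction bit) is set for a quiet NaN */
                       return (bits & 0x00400000u) ? "quiet NaN" : "signaling NaN";
    case FP_SUBNORMAL: return "subnormal";
    default:           return "normal";
    }
}
```

Note that +0 and -0 compare equal with `==`, which is why `signbit` is needed to tell them apart.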

Rounding Modes

IEEE 754 specifies four rounding modes for binary formats, which determine how a result is approximated when it cannot be represented exactly:

  • Round to Nearest (default): Chooses the nearest representable value. If tie, rounds to even.

  • Round Toward Zero: Truncates the result.

  • Round Toward +∞: Rounds up.

  • Round Toward −∞: Rounds down.

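On Cortex-M, the rounding mode is selected by the RMode field (bits [23:22]) of the FPSCR register. The FPDSCR register holds the default settings, including RMode, that are copied into FPSCR when the processor creates a new floating-point context, for example on exception entry.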

Exception Handling

Floating-point operations can raise exceptions in five situations:

  • Invalid Operation (e.g., sqrt(-1), 0/0)

  • Division by Zero

  • Overflow (result exceeds the largest finite representable value)

  • Underflow (result is too close to zero to be normalized)

  • Inexact Result (result had to be rounded)

In STM32:

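  • The Cortex-M FPU does not trap on these exceptions: it records them in cumulative (sticky) flags. On many STM32 devices, these flags can also trigger the FPU global interrupt.

  • The flags IOC, DZC, OFC, UFC and IXC (one per exception above, in the same order) are set in FPSCR, together with IDC for input denormals.

  • The flags stay set until software clears them, so they can be monitored and cleared manually.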
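
On the target, FPSCR can be read and written with the CMSIS-Core intrinsics `__get_FPSCR()` and `__set_FPSCR()`. To keep the bit layout testable on a PC, the sketch below only manipulates a raw FPSCR value: IOC, DZC, OFC, UFC and IXC occupy bits 0 to 4, IDC bit 7, and RMode bits [23:22]. The macro and function names are hypothetical, and the target-side calls appear only in a comment.

```c
#include <stdint.h>

/* Cortex-M FPSCR cumulative exception flags. */
#define FPSCR_IOC (1u << 0)  /* Invalid Operation */
#define FPSCR_DZC (1u << 1)  /* Division by Zero */
#define FPSCR_OFC (1u << 2)  /* Overflow */
#define FPSCR_UFC (1u << 3)  /* Underflow */
#define FPSCR_IXC (1u << 4)  /* Inexact */
#define FPSCR_IDC (1u << 7)  /* Input Denormal (not an IEEE 754 exception) */
#define FPSCR_FLAG_MASK (FPSCR_IOC | FPSCR_DZC | FPSCR_OFC | FPSCR_UFC | FPSCR_IXC | FPSCR_IDC)
#define FPSCR_RMODE_POS 22u  /* RMode [23:22]: 0 = nearest, 1 = +inf, 2 = -inf, 3 = zero */

/* Cumulative exception flags contained in a raw FPSCR value. */
uint32_t fpscr_exception_flags(uint32_t fpscr)
{
    return fpscr & FPSCR_FLAG_MASK;
}

/* FPSCR value with all cumulative flags cleared and control bits kept. */
uint32_t fpscr_clear_flags(uint32_t fpscr)
{
    return fpscr & ~FPSCR_FLAG_MASK;
}

/* Rounding mode field (0..3). */
uint32_t fpscr_rmode(uint32_t fpscr)
{
    return (fpscr >> FPSCR_RMODE_POS) & 0x3u;
}

/* Target-side use (assumption: CMSIS device header included, FPU enabled):
 *   uint32_t s = __get_FPSCR();
 *   if (fpscr_exception_flags(s) & FPSCR_DZC) { handle the division by zero }
 *   __set_FPSCR(fpscr_clear_flags(s));
 */
```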

Using FPU in STM32 Projects

To benefit from the hardware FPU on STM32:

✅ Enable FPU in Compiler Settings
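MDK-ARM (Keil): Select the floating-point hardware in the target options. With Arm Compiler 6, this corresponds to -mfpu=fpv4-sp-d16 (Cortex-M4), fpv5-sp-d16 (single-precision Cortex-M7) or fpv5-d16 (double-precision Cortex-M7).

GCC: For a Cortex-M4 single-precision FPU, use -mcpu=cortex-m4 -mfpu=fpv4-sp-d16 -mfloat-abi=hard. For a Cortex-M7, use -mfpu=fpv5-sp-d16 or fpv5-d16 depending on the device.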

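⚙️ Use Native Float Instructions
Use float in your C code, and double only if the device has a double-precision FPU. With -mfloat-abi=hard (or softfp), the compiler generates FPU instructions for supported operations; on a single-precision FPU, double arithmetic is still emulated in software.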
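
On a single-precision FPU, a frequent pitfall is unintended promotion to double: an unsuffixed literal such as 0.5 has type double, so the whole expression is evaluated in software-emulated double precision. The same applies to math functions, where sqrtf() should be preferred to sqrt(). A short illustration with hypothetical function names; GCC's -Wdouble-promotion warning helps catch the second form.

```c
/* All-float expression: on a single-precision FPU this compiles to FPU instructions. */
float halve_plus_one_sp(float x)
{
    return x * 0.5f + 1.0f;
}

/* 0.5 and 1.0 are double literals, so x is promoted and the arithmetic is done
 * in double precision: software-emulated on a single-precision-only FPU. */
float halve_plus_one_mixed(float x)
{
    return x * 0.5 + 1.0;
}
```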

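🧠 Avoid Mixing Soft/Hard FPU
Objects compiled with -mfloat-abi=hard pass floating-point arguments in FPU registers, while soft and softfp pass them in core registers, so the linker rejects mixing hard with either of them. Build all modules and libraries with the same setting.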
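
Besides the compiler flags, the FPU has to be enabled at run time by granting full access to coprocessors CP10 and CP11 in the CPACR register (address 0xE000ED88); a floating-point instruction executed before that causes a UsageFault (escalated to HardFault if the UsageFault handler is not enabled). ST's CMSIS SystemInit() typically does this when __FPU_PRESENT and __FPU_USED are set. The helper below is a hypothetical, host-testable sketch of the bit manipulation only, with the register write shown in a comment.

```c
#include <stdint.h>

/* CPACR: CP10 access in bits [21:20], CP11 in bits [23:22]; 0b11 = full access. */
#define CPACR_CP10_CP11_FULL (0xFu << 20)

/* CPACR value that grants privileged and unprivileged access to the FPU. */
uint32_t cpacr_with_fpu_enabled(uint32_t cpacr)
{
    return cpacr | CPACR_CP10_CP11_FULL;
}

/* Target-side use, once before any floating-point instruction
 * (assumption: CMSIS device header available):
 *   SCB->CPACR = cpacr_with_fpu_enabled(SCB->CPACR);
 *   __DSB();
 *   __ISB();
 */
```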

💾 Context Saving

This post is licensed under CC BY 4.0 by the author.