FIRA MIMO – Harman AudioworX Documentation

The FIRA MIMO audio object is a group of FIR filter banks associated with each input and output. FIRA stands for “FIR accelerator”, indicating that the underlying implementation uses a hardware accelerator. MIMO stands for “multi-input multi-output”, indicating that each output is the sum of one or more FIR-filtered inputs.
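
The MIMO structure described above can be sketched as a small software reference model. This is only an illustration, assuming plain direct-form FIR filtering at a single rate; it does not model the hardware accelerator, the dual-rate path, or any AudioworX API. The names `fir` and `mimo_fir` are hypothetical.

```python
def fir(coeffs, samples):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * samples[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out


def mimo_fir(filters, inputs, num_outputs):
    """filters maps (input_index, output_index) -> list of coefficients.

    Each output is the sum of the FIR-filtered inputs routed to it.
    """
    block_len = len(inputs[0])
    outputs = [[0.0] * block_len for _ in range(num_outputs)]
    for (i, o), coeffs in filters.items():
        filtered = fir(coeffs, inputs[i])
        for n in range(block_len):
            outputs[o][n] += filtered[n]
    return outputs


# Two inputs, each with its own filter, summed into one output.
filters = {(0, 0): [1.0, 0.5], (1, 0): [0.25]}
inputs = [[1.0, 0.0, 0.0], [4.0, 4.0, 4.0]]
result = mimo_fir(filters, inputs, num_outputs=1)
```

Here the first input contributes its impulse response and the second contributes a scaled copy of itself; the single output is their sample-by-sample sum.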

FIRA MIMO can be used for applications such as individual sound zones.

Use Case: This audio object can be deployed whenever the audio path requires dual-rate MIMO FIR filters, taking into account the data pipeline delay and other limitations.
More than one instance of FIRAMIMO can run on the same core.

This AO supports in-place computation, depending on the core type.

FIRA MIMO Properties

The table below describes the FIRAMIMO audio object properties and their functionality.

Properties	Description
# of Audio In	Enter the number of audio inputs. Range: 1 to 20. Default: 4.
# of Audio Out	Enter the number of audio outputs. Range: 1 to 64. Default: 4.
Number of taps for Hi filters	The number of filter coefficients (taps) for the high-rate path, configured using m_NumElements. All channels in the high-rate path use the same number of taps. Range: 384 to 4096. Default: 384.
Display Name	Display name of the FIRAMIMO audio object in signal flow design. It can be changed based on the intended usage of the object.
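
As a rough illustration of the ranges in the table above, the following sketch checks a set of property values before they are applied. The key names (`num_audio_in`, `num_audio_out`, `num_taps_hi`) and the function `validate_properties` are hypothetical and are not part of GTT or AudioworX; only the numeric ranges and defaults come from the table.

```python
# Ranges and defaults copied from the properties table.
# The dictionary keys are hypothetical names chosen for this sketch.
PROPERTY_RANGES = {
    "num_audio_in": (1, 20),
    "num_audio_out": (1, 64),
    "num_taps_hi": (384, 4096),
}
DEFAULTS = {"num_audio_in": 4, "num_audio_out": 4, "num_taps_hi": 384}


def validate_properties(props):
    """Return a list of error strings; an empty list means all values are in range."""
    merged = {**DEFAULTS, **props}  # unspecified properties fall back to defaults
    errors = []
    for name, (low, high) in PROPERTY_RANGES.items():
        value = merged[name]
        if not low <= value <= high:
            errors.append(f"{name}={value} is outside {low}..{high}")
    return errors
```

For example, `validate_properties({"num_audio_in": 21})` reports one error, while an empty dictionary passes because the defaults are in range.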

Mode

There are no modes available for FIRA MIMO.

Additional Parameters

The table below describes the FIRA MIMO additional parameters.

Parameter	Description
Max delay for high-rate filters	Length of the delay line for the high-rate path. Range: 0 to 2048. Default: 0. Data Type: uint32_t
Taps for low-rate filters	Number of taps for the low-rate filters. All channels in the low-rate path use the same number of taps, which can differ from the high-rate path. Range: 512 to 2048. Default: 512. Data Type: uint32_t
Coefficient format	Filter coefficient data format. Range: 0 to 2 (0: 32-bit floating-point coefficients; 1: IEEE 16-bit floating-point coefficients; 2: 16-bit fixed-point coefficients). Default: 0. Data Type: uint8_t
Down sampling Factor	Downsampling factor. Range: 0 to 1 (0: downsampling factor of 4; 1: downsampling factor of 16). Default: 1. Data Type: uint8_t
Conf matrix high	Configuration matrix for the high-rate filters. Dimension 1: inputs, of size 20 or the number of input channels in the SFD, whichever is less. Dimension 2: outputs, of size 64 or the number of output channels in the SFD, whichever is less. Value range: 0 to 1 (0: FIR not present; 1: FIR present). Default: 0. Data Type: uint8_t
Conf matrix low	Configuration matrix for the low-rate filters. Dimension 1: inputs, of size 20 or the number of input channels in the SFD, whichever is less. Dimension 2: outputs, of size 64 or the number of output channels in the SFD, whichever is less. Value range: 0 to 1 (0: FIR not present; 1: FIR present). Default: 0. Data Type: uint8_t
Max Processing Size	When two or more filters are enabled on an input channel, FIRAMIMO combines them and submits them to the accelerator as one job. Max Processing Size is the upper limit on the number of filters combined into one job. For example, when 6 filters are enabled on an input channel and Max Processing Size is set to 4, FIRAMIMO submits a first job with 4 filters and a second job with 2 filters. Range: 2 to 8. Default: 4. Data Type: uint8_t
Enable Cycles Measurement	Enables or disables real-time measurement of accelerator MCPS (million cycles per second). When enabled, the average and maximum MCPS can be read as tuning parameters in the state variable explorer. Range: 0 to 1 (0: Disable; 1: Enable). Default: 0. Data Type: uint8_t
Accelerator Configuration	Selects which of the available FIR hardware accelerators is used for FIR processing. Range: 0 to 2 (0: Accelerator 1; 1: Accelerator 2; 2: Both Accelerators). Default: 0. Data Type: uint8_t. On the ADSP-21593, the Accelerator 1 setting uses only the first accelerator, the Accelerator 2 setting uses only the second accelerator, and the Both Accelerators setting uses both FIR accelerators, with the filters divided almost equally between them. When both hardware accelerators are used: (a) the average and maximum MCPS of each accelerator can be read individually from the SV explorer by enabling the “Enable Cycles Measurement” additional parameter; (b) additional memory records for the second accelerator’s input, output, and filter coefficient buffers are created, with labels carrying the prefix “(Acc 2)”. This is detailed in the section “Memory Allocation Considerations for SHARC ADSP-21593 Platform in Dual Accelerator Mode”. On DSPs with a single accelerator, such as the ADSP-21569 (GUL), the accelerator configuration can still be selected, but all filtering jobs are processed by the only available FIR accelerator.
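
The job-combining behaviour described for Max Processing Size can be sketched as follows. This is a minimal sketch that only counts filters per job. It assumes that each row of a configuration matrix corresponds to one input channel, and it does not model how high-rate and low-rate filters are ordered or interleaved, which the description above does not specify. The function names are hypothetical.

```python
def split_into_jobs(num_filters, max_processing_size=4):
    """Split the filters enabled on one input channel into accelerator job sizes."""
    if not 2 <= max_processing_size <= 8:
        raise ValueError("Max Processing Size must be in the range 2 to 8")
    jobs = []
    remaining = num_filters
    while remaining > 0:
        size = min(remaining, max_processing_size)
        jobs.append(size)
        remaining -= size
    return jobs


def jobs_per_input(conf_matrix, max_processing_size=4):
    """conf_matrix[i][o] is 1 when a FIR is present from input i to output o."""
    return [split_into_jobs(sum(row), max_processing_size) for row in conf_matrix]


# Input 0 feeds six outputs, input 1 feeds one output, input 2 feeds none.
conf = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
plan = jobs_per_input(conf, max_processing_size=4)
```

With these values, `plan` holds a job of 4 filters followed by a job of 2 filters for input 0, matching the example in the table, a single one-filter job for input 1, and no jobs for input 2.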

Tuning Parameters

For each filter combination in FIRA MIMO, this object exposes the following two tuning parameters to the GTT.

Parameter	Description
Mode	The mode of each filter can be set to Normal, Bypass, or Off. This parameter is of the category “State”; filter mode settings are therefore transferred to the device only after the device is connected.
Coefficients	Filter coefficients can be imported from .csv files. The number of filter taps set in the GTT must match the number of taps in the imported .csv file.
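
The tap-count requirement for imported coefficients can be checked before import with a short script. This is a sketch under a stated assumption: the exact .csv layout expected by GTT is not described here, so the code simply treats every non-empty cell, read row by row, as one coefficient. The function name `load_coefficients` is hypothetical.

```python
import csv
import io


def load_coefficients(csv_text, expected_taps):
    """Parse coefficients from CSV text and check the count against the configured taps.

    Assumption: every non-empty cell is one coefficient, in reading order.
    """
    coeffs = []
    for row in csv.reader(io.StringIO(csv_text)):
        coeffs.extend(float(cell) for cell in row if cell.strip())
    if len(coeffs) != expected_taps:
        raise ValueError(
            f"file has {len(coeffs)} taps, object is configured for {expected_taps}"
        )
    return coeffs


# A three-tap filter stored one coefficient per line.
taps = load_coefficients("0.5\n0.25\n0.125\n", expected_taps=3)
```

To check a file on disk, read its contents into a string first and pass the number of taps configured for the corresponding high-rate or low-rate path.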

Control Parameters

There are no control parameters available for FIRA MIMO.

Memory Allocation Considerations for SHARC ADSP-21593 Platform in Dual Accelerator Mode

When using the FIRAMIMO object on a SHARC ADSP-21593 in dual accelerator mode, memory records must be placed carefully to maximize the number of FIR filters that can be processed. The hardware FIR accelerators in the DSP operate in parallel with the cores, which creates a memory access bottleneck because of the increased number of parallel accesses to external memory (DDR3). The bottleneck can be attributed to the page management overhead incurred by the Dynamic Memory Controller (DMC), as described in the EE-412 document. It degrades FIR accelerator performance and limits the total number of FIR filters that the AO can process. To mitigate the issue and improve the processing capacity of the accelerators, it is recommended that the input, output, and filter coefficient buffers accessed by each accelerator be placed in dedicated DDR3 memory banks.

To enable placement of the buffers in separate banks, when the “Accelerator Configuration” additional configuration parameter is set to 2 (Both Accelerators), a new set of memory records with the prefix “(Acc 2)” is created to accommodate the second FIR accelerator’s input, output, and coefficient buffers. The following figure shows the Memory Latency window in GTT, highlighting the newly created memory records:

In this configuration, the accelerators access the following memory records:

	Hardware FIR Accelerator 1	Hardware FIR Accelerator 2	Description
1	Input Memory	(Acc 2) Input Memory	High rate input buffers
2	Lowrate Input Memory	(Acc 2) Lowrate Input Memory	Low rate input buffers
3	Parameter Memory	(Acc 2) Parameter Memory	Filter coefficient buffers
4	Downsampling Intermediate Memory	(Acc 2) Downsampling Intermediate Memory	Downsampling filter output buffers
5	Upsampling 1 Input Memory	(Acc 2) Upsampling 1 Input Memory	Stage 1 upsampling filter input buffers
6	Upsampling 2 Input Memory	(Acc 2) Upsampling 2 Input Memory	Stage 2 upsampling filter input buffers
7	Intermediate Temp Memory	(Acc 2) Intermediate Temp Memory	Temporary memory for coefficient format conversion

As recommended, the above figure shows the memory records accessed by accelerator 1 all placed at memory latency Level 7, and the memory records accessed by accelerator 2 all placed at memory latency Level 8. The platform is expected to map these memory latency levels to un-cached memory regions in separate DDR3 memory banks so that the separation of buffers is fully realized.
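
The recommended placement can be written down as a simple mapping from memory record label to latency level. This is only an illustrative sketch: the record labels and the Level 7 and Level 8 assignment are taken from the table and figure description above, while the function name and the dictionary representation are hypothetical and do not correspond to a GTT file format or API.

```python
# Record labels from the memory record table above.
BASE_RECORDS = [
    "Input Memory",
    "Lowrate Input Memory",
    "Parameter Memory",
    "Downsampling Intermediate Memory",
    "Upsampling 1 Input Memory",
    "Upsampling 2 Input Memory",
    "Intermediate Temp Memory",
]


def dual_accelerator_placement(acc1_level=7, acc2_level=8):
    """Map each record label to a latency level, one level per accelerator."""
    placement = {name: acc1_level for name in BASE_RECORDS}
    # Records of the second accelerator carry the "(Acc 2)" prefix.
    placement.update({f"(Acc 2) {name}": acc2_level for name in BASE_RECORDS})
    return placement


placement = dual_accelerator_placement()
```

The mapping contains 14 entries, seven per accelerator. Whether the two levels actually resolve to separate DDR3 banks depends on the platform's memory map.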

At a 512-sample block length with 2k-tap high-rate filters, the following improvements were observed in the total number of filters that the FIRAMIMO AO can process in dual accelerator mode compared with single accelerator mode:

All accelerator buffers placed in Level 5 latency: 20% (32 filters to 40 filters).
Accelerator 1 buffers in Level 7 and accelerator 2 buffers in Level 8, mapped to separate DDR3 banks: ~90% (32 filters to 60 filters).

Note, however, that separating the memory records for the hardware FIR accelerators increases core MIPS consumption by about 30-60% in both single and dual accelerator modes, because the core performs additional buffer movement and summation operations to reduce the memory access overhead.

Note:

FIRAMIMO produces clean output on the GUL and IVP platforms, but the output is distorted on the HDP+ platform when the ‘Input memory’ and ‘Upsampling1 input memory’ records are kept in Level 4 or Level 5, due to a cache flush issue on the HDP+ platform. The workaround is to keep these two memory records in Level 1 to Level 3.