PULP DSP  Version 1.0
Digital Signal Processing library for PULP processors (pulp-platform.org)
Vector Dot Product

Modules

 Vector Dot Product Kernels
 

Functions

void plp_dot_prod_i16 (const int16_t *__restrict__ pSrcA, const int16_t *__restrict__ pSrcB, uint32_t blockSize, int32_t *__restrict__ pRes)
 Glue code for dot product of 16-bit integer vectors.
 
void plp_dot_prod_i32 (const int32_t *__restrict__ pSrcA, const int32_t *__restrict__ pSrcB, uint32_t blockSize, int32_t *__restrict__ pRes)
 Glue code for dot product of 32-bit integer vectors.
 
void plp_dot_prod_i32_parallel (const int32_t *__restrict__ pSrcA, const int32_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t nPE, int32_t *__restrict__ pRes)
 Glue code for parallel dot product of 32-bit integer vectors.
 
void plp_dot_prod_i8 (const int8_t *__restrict__ pSrcA, const int8_t *__restrict__ pSrcB, uint32_t blockSize, int32_t *__restrict__ pRes)
 Glue code for dot product of 8-bit integer vectors.
 
void plp_dot_prod_q16 (const int16_t *__restrict__ pSrcA, const int16_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, int32_t *__restrict__ pRes)
 Glue code for dot product of 16-bit fixed point vectors.
 
void plp_dot_prod_q32 (const int32_t *__restrict__ pSrcA, const int32_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, int32_t *__restrict__ pRes)
 Glue code for dot product of 32-bit fixed point vectors.
 
void plp_dot_prod_q32_parallel (const int32_t *__restrict__ pSrcA, const int32_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, uint32_t nPE, int32_t *__restrict__ pRes)
 Glue code for parallel dot product of 32-bit fixed point vectors.
 
void plp_dot_prod_q8 (const int8_t *__restrict__ pSrcA, const int8_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, int32_t *__restrict__ pRes)
 Glue code for dot product of 8-bit fixed point vectors.
 

Detailed Description

This module contains the glue code for Vector Dot Product. The kernel code (kernels) is in the module Vector Dot Product Kernels.

The Vector Dot Product computes the dot product of two vectors. The vectors are multiplied element-by-element and then summed.

    sum = pSrcA[0]*pSrcB[0] + pSrcA[1]*pSrcB[1] + ... + pSrcA[blockSize-1]*pSrcB[blockSize-1]
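
As a plain C illustration of the formula above, the following sketch computes the sum with a scalar loop. The name ref_dot_prod_i32 is hypothetical and not part of the library; it only mirrors the parameter layout of plp_dot_prod_i32 and makes no claim about how the actual kernels are written.

```c
#include <stdint.h>

/* Hypothetical reference routine (not a library function): scalar dot
 * product with the same parameter layout as plp_dot_prod_i32.
 * Assumes the products and their sum fit in 32 bits. */
void ref_dot_prod_i32(const int32_t *pSrcA, const int32_t *pSrcB,
                      uint32_t blockSize, int32_t *pRes) {
    int32_t sum = 0;
    for (uint32_t i = 0; i < blockSize; i++) {
        sum += pSrcA[i] * pSrcB[i]; /* multiply element-by-element, accumulate */
    }
    *pRes = sum; /* the result is written through the output pointer */
}
```

For the vectors {1, 2, 3, 4} and {5, 6, 7, 8}, this yields 5 + 12 + 21 + 32 = 70.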

There are separate functions for floating point, integer, and fixed point data types in 32-, 16-, and 8-bit precision. For the lower precision integers (16- and 8-bit), functions exploiting SIMD instructions are provided.

The function names follow this pattern (for example, plp_dot_prod_i32s):

<pulp> _ <function name> _ <data type> <precision> <method> _ <isa extension>, with
data type = {f, i, q} for floats, integers, and fixed points, respectively.
precision = {32, 16, 8} bits.
method = {s, v, p} meaning single (or scalar, i.e. not using packed SIMD), vectorized (i.e. using SIMD instructions), and parallel (for multicore parallel computing), respectively.
isa extension = rv32im, xpulpv2, etc., of which rv32im is the most general one.
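
To make the pattern concrete, the sketch below assembles a name from its components. The helper build_kernel_name is hypothetical and exists only for illustration; the "plp" prefix is taken from the example name above, and the resulting string merely follows the scheme, without claiming that a kernel of that exact name is provided.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the library): joins the components of
 * the naming scheme into one identifier and returns the snprintf result,
 * i.e. the length of the assembled name. */
int build_kernel_name(char *out, size_t outSize, const char *function,
                      char dataType, unsigned precision, char method,
                      const char *isa) {
    /* <pulp>_<function name>_<data type><precision><method>_<isa extension> */
    return snprintf(out, outSize, "plp_%s_%c%u%c_%s",
                    function, dataType, precision, method, isa);
}
```

Called with "dot_prod", 'i', 32, 's', and "xpulpv2", the helper writes plp_dot_prod_i32s_xpulpv2 into the buffer.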
 

Function Documentation

void plp_dot_prod_i16 ( const int16_t *__restrict__  pSrcA,
const int16_t *__restrict__  pSrcB,
uint32_t  blockSize,
int32_t *__restrict__  pRes 
)

Glue code for dot product of 16-bit integer vectors.

Parameters
[in]   pSrcA      points to the first input vector [16 bit]
[in]   pSrcB      points to the second input vector [16 bit]
[in]   blockSize  number of samples in each vector
[out]  pRes       output result returned here [32 bit]
Returns
none
Exploiting SIMD instructions
When the ISA supports it, the 16-bit values are packed two by two into 32-bit vectors, so that two multiplications are performed simultaneously on each pair of 32-bit vectors, with a 32-bit accumulator.
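
The packing described above can be emulated in portable C, as in the following sketch. All names (pack2_i16, dotp2_acc, ref_dot_prod_i16_packed) are hypothetical; a real kernel would use the SIMD instructions of the target ISA, which are not shown here, and the handling of an odd blockSize is an assumption.

```c
#include <stdint.h>

/* Pack two 16-bit values into one 32-bit word (first value in the low half). */
uint32_t pack2_i16(int16_t lo, int16_t hi) {
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Portable stand-in for a 2-lane dot-product-accumulate step:
 * acc + a.lane0 * b.lane0 + a.lane1 * b.lane1.
 * Relies on two's-complement conversion when unpacking the lanes. */
int32_t dotp2_acc(uint32_t a, uint32_t b, int32_t acc) {
    int16_t a0 = (int16_t)(uint16_t)(a & 0xFFFFu);
    int16_t a1 = (int16_t)(uint16_t)(a >> 16);
    int16_t b0 = (int16_t)(uint16_t)(b & 0xFFFFu);
    int16_t b1 = (int16_t)(uint16_t)(b >> 16);
    return acc + (int32_t)a0 * b0 + (int32_t)a1 * b1;
}

/* Hypothetical reference routine (not a library function). */
void ref_dot_prod_i16_packed(const int16_t *pSrcA, const int16_t *pSrcB,
                             uint32_t blockSize, int32_t *pRes) {
    int32_t sum = 0; /* 32-bit accumulator */
    uint32_t i = 0;
    for (; i + 1 < blockSize; i += 2) { /* two elements per step */
        sum = dotp2_acc(pack2_i16(pSrcA[i], pSrcA[i + 1]),
                        pack2_i16(pSrcB[i], pSrcB[i + 1]), sum);
    }
    if (i < blockSize) { /* odd length: one leftover element */
        sum += (int32_t)pSrcA[i] * pSrcB[i];
    }
    *pRes = sum;
}
```

For the vectors {1, -2, 3, -4, 5} and {6, 7, -8, 9, 10}, the result is 6 - 14 - 24 - 36 + 50 = -18, the same as a scalar loop would give.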
void plp_dot_prod_i32 ( const int32_t *__restrict__  pSrcA,
const int32_t *__restrict__  pSrcB,
uint32_t  blockSize,
int32_t *__restrict__  pRes 
)

Glue code for dot product of 32-bit integer vectors.

Parameters
[in]   pSrcA      points to the first input vector
[in]   pSrcB      points to the second input vector
[in]   blockSize  number of samples in each vector
[out]  pRes       output result returned here
Returns
none
void plp_dot_prod_i32_parallel ( const int32_t *__restrict__  pSrcA,
const int32_t *__restrict__  pSrcB,
uint32_t  blockSize,
uint32_t  nPE,
int32_t *__restrict__  pRes 
)

Glue code for parallel dot product of 32-bit integer vectors.

Parameters
[in]   pSrcA      points to the first input vector
[in]   pSrcB      points to the second input vector
[in]   blockSize  number of samples in each vector
[in]   nPE        number of parallel processing units
[out]  pRes       output result returned here
Returns
none
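
One way to picture the role of nPE is to split the vectors into nPE contiguous chunks, compute one partial sum per chunk, and add the partial sums. The sketch below does this sequentially on a single core; the name ref_dot_prod_i32_chunked and the chunking policy are assumptions for illustration, not the library's actual work distribution, and nPE is assumed to be nonzero.

```c
#include <stdint.h>

/* Hypothetical sketch (not library code): the work that each of the nPE
 * processing units could take on, executed here one chunk after another. */
void ref_dot_prod_i32_chunked(const int32_t *pSrcA, const int32_t *pSrcB,
                              uint32_t blockSize, uint32_t nPE,
                              int32_t *pRes) {
    /* Chunk length rounded up so that nPE chunks cover the whole vector. */
    uint32_t chunk = (blockSize + nPE - 1) / nPE;
    int32_t total = 0;

    for (uint32_t core = 0; core < nPE; core++) {
        uint32_t start = core * chunk;
        uint32_t end = start + chunk;
        if (end > blockSize) {
            end = blockSize; /* last chunk may be shorter (or empty) */
        }
        int32_t partial = 0; /* partial sum of this chunk */
        for (uint32_t i = start; i < end; i++) {
            partial += pSrcA[i] * pSrcB[i];
        }
        total += partial; /* reduction of the partial sums */
    }
    *pRes = total;
}
```

The result does not depend on nPE: for the vector {1, ..., 10} multiplied with a vector of ones, every choice of nPE gives 55.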
void plp_dot_prod_i8 ( const int8_t *__restrict__  pSrcA,
const int8_t *__restrict__  pSrcB,
uint32_t  blockSize,
int32_t *__restrict__  pRes 
)

Glue code for dot product of 8-bit integer vectors.

Parameters
[in]   pSrcA      points to the first input vector [8 bit]
[in]   pSrcB      points to the second input vector [8 bit]
[in]   blockSize  number of samples in each vector
[out]  pRes       output result returned here [32 bit]
Returns
none
Exploiting SIMD instructions
When the ISA supports it, the 8-bit values are packed four by four into 32-bit vectors, so that four multiplications are performed simultaneously on each pair of 32-bit vectors, with a 32-bit accumulator.
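
The four-by-four packing can likewise be emulated in portable C. In the sketch below, the names dotp4_acc and ref_dot_prod_i8_packed are hypothetical, memcpy stands in for a 32-bit vector load, and the scalar loop for leftover elements is an assumption; the SIMD instructions of the target ISA are not shown.

```c
#include <stdint.h>
#include <string.h>

/* Portable stand-in for a 4-lane dot-product-accumulate step: multiplies
 * the four 8-bit lanes of a and b pairwise and adds the products to acc.
 * Relies on two's-complement conversion when unpacking the lanes. */
int32_t dotp4_acc(uint32_t a, uint32_t b, int32_t acc) {
    for (int lane = 0; lane < 4; lane++) {
        int8_t x = (int8_t)(uint8_t)(a >> (8 * lane));
        int8_t y = (int8_t)(uint8_t)(b >> (8 * lane));
        acc += (int32_t)x * y;
    }
    return acc;
}

/* Hypothetical reference routine (not a library function). */
void ref_dot_prod_i8_packed(const int8_t *pSrcA, const int8_t *pSrcB,
                            uint32_t blockSize, int32_t *pRes) {
    int32_t sum = 0; /* 32-bit accumulator */
    uint32_t i = 0;
    for (; i + 3 < blockSize; i += 4) { /* four elements per step */
        uint32_t a, b;
        memcpy(&a, &pSrcA[i], sizeof a); /* load four 8-bit values at once */
        memcpy(&b, &pSrcB[i], sizeof b);
        sum = dotp4_acc(a, b, sum);
    }
    for (; i < blockSize; i++) { /* up to three leftover elements */
        sum += (int32_t)pSrcA[i] * pSrcB[i];
    }
    *pRes = sum;
}
```

Because both operands are loaded the same way and all four lane products are summed, the result does not depend on the byte order of the machine.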
void plp_dot_prod_q16 ( const int16_t *__restrict__  pSrcA,
const int16_t *__restrict__  pSrcB,
uint32_t  blockSize,
uint32_t  deciPoint,
int32_t *__restrict__  pRes 
)

Glue code for dot product of 16-bit fixed point vectors.

Parameters
[in]   pSrcA      points to the first input vector [16 bit]
[in]   pSrcB      points to the second input vector [16 bit]
[in]   blockSize  number of samples in each vector
[in]   deciPoint  position of the decimal point, used as the right-shift amount
[out]  pRes       output result returned here [32 bit]
Returns
none
Exploiting SIMD instructions
When the ISA supports it, the 16-bit values are packed two by two into 32-bit vectors, so that two multiplications are performed simultaneously on each pair of 32-bit vectors, with a 32-bit accumulator.
void plp_dot_prod_q32 ( const int32_t *__restrict__  pSrcA,
const int32_t *__restrict__  pSrcB,
uint32_t  blockSize,
uint32_t  deciPoint,
int32_t *__restrict__  pRes 
)

Glue code for dot product of 32-bit fixed point vectors.

Parameters
[in]   pSrcA      points to the first input vector
[in]   pSrcB      points to the second input vector
[in]   blockSize  number of samples in each vector
[in]   deciPoint  position of the decimal point, used as the right-shift amount
[out]  pRes       output result returned here
Returns
none
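
The effect of deciPoint can be illustrated with a scalar sketch. The name ref_dot_prod_q32 is hypothetical, and shifting each product before accumulation is an assumption: this page only states that deciPoint is used for a right shift, not where the shift is applied.

```c
#include <stdint.h>

/* Hypothetical reference routine (not a library function).
 * ASSUMPTION: every product is shifted right by deciPoint before it is
 * added to the sum. Also assumes that the products fit in 32 bits and,
 * for negative products, that >> is an arithmetic shift. */
void ref_dot_prod_q32(const int32_t *pSrcA, const int32_t *pSrcB,
                      uint32_t blockSize, uint32_t deciPoint, int32_t *pRes) {
    int32_t sum = 0;
    for (uint32_t i = 0; i < blockSize; i++) {
        /* The product carries 2 * deciPoint fractional bits; the shift
         * brings it back to deciPoint fractional bits. */
        sum += (pSrcA[i] * pSrcB[i]) >> deciPoint;
    }
    *pRes = sum;
}
```

With deciPoint = 8, the inputs {384, 128} and {512, 64} represent {1.5, 0.5} and {2.0, 0.25}. The sketch returns 768 + 32 = 800, which is 3.125 in the same format.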
void plp_dot_prod_q32_parallel ( const int32_t *__restrict__  pSrcA,
const int32_t *__restrict__  pSrcB,
uint32_t  blockSize,
uint32_t  deciPoint,
uint32_t  nPE,
int32_t *__restrict__  pRes 
)

Glue code for parallel dot product of 32-bit fixed point vectors.

Parameters
[in]   pSrcA      points to the first input vector
[in]   pSrcB      points to the second input vector
[in]   blockSize  number of samples in each vector
[in]   deciPoint  position of the decimal point, used as the right-shift amount
[in]   nPE        number of parallel processing units
[out]  pRes       output result returned here
Returns
none
void plp_dot_prod_q8 ( const int8_t *__restrict__  pSrcA,
const int8_t *__restrict__  pSrcB,
uint32_t  blockSize,
uint32_t  deciPoint,
int32_t *__restrict__  pRes 
)

Glue code for dot product of 8-bit fixed point vectors.

Parameters
[in]   pSrcA      points to the first input vector [8 bit]
[in]   pSrcB      points to the second input vector [8 bit]
[in]   blockSize  number of samples in each vector
[in]   deciPoint  position of the decimal point, used as the right-shift amount
[out]  pRes       output result returned here [32 bit]
Returns
none
Exploiting SIMD instructions
When the ISA supports it, the 8-bit values are packed four by four into 32-bit vectors, so that four multiplications are performed simultaneously on each pair of 32-bit vectors, with a 32-bit accumulator.