PULP DSP  Version 1.0
Digital Signal Processing library for PULP processors (pulp-platform.org)
Vector Dot Product Kernels

Functions

void plp_dot_prod_i16s_rv32im (const int16_t *__restrict__ pSrcA, const int16_t *__restrict__ pSrcB, uint32_t blockSize, int32_t *__restrict__ pRes)
 Scalar dot product of 16-bit integer vectors kernel for RV32IM extension.
 
void plp_dot_prod_i16v_xpulpv2 (const int16_t *__restrict__ pSrcA, const int16_t *__restrict__ pSrcB, uint32_t blockSize, int32_t *__restrict__ pRes)
 Vectorized dot product of 16-bit integer vectors kernel for XPULPV2 extension.
 
void plp_dot_prod_i32p_xpulpv2 (void *S)
 Scalar dot product with interleaved access of 32-bit integer vectors kernel for XPULPV2 extension.
 
void plp_dot_prod_i32s_rv32im (const int32_t *__restrict__ pSrcA, const int32_t *__restrict__ pSrcB, uint32_t blockSize, int32_t *__restrict__ pRes)
 Scalar dot product of 32-bit integer vectors kernel for RV32IM extension.
 
void plp_dot_prod_i32s_xpulpv2 (const int32_t *__restrict__ pSrcA, const int32_t *__restrict__ pSrcB, uint32_t blockSize, int32_t *__restrict__ pRes)
 Scalar dot product of 32-bit integer vectors kernel for XPULPV2 extension.
 
void plp_dot_prod_i8s_rv32im (const int8_t *__restrict__ pSrcA, const int8_t *__restrict__ pSrcB, uint32_t blockSize, int32_t *__restrict__ pRes)
 Scalar dot product of 8-bit integer vectors kernel for RV32IM extension.
 
void plp_dot_prod_i8v_xpulpv2 (const int8_t *__restrict__ pSrcA, const int8_t *__restrict__ pSrcB, uint32_t blockSize, int32_t *__restrict__ pRes)
 Vectorized dot product of 8-bit integer vectors kernel for XPULPV2 extension.
 
void plp_dot_prod_q16s_rv32im (const int16_t *__restrict__ pSrcA, const int16_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, int32_t *__restrict__ pRes)
 Scalar dot product of 16-bit fixed point vectors kernel for RV32IM extension.
 
void plp_dot_prod_q16v_xpulpv2 (const int16_t *__restrict__ pSrcA, const int16_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, int32_t *__restrict__ pRes)
 Vectorized dot product of 16-bit fixed point vectors kernel for XPULPV2 extension.
 
void plp_dot_prod_q32p_xpulpv2 (void *S)
 Scalar dot product with interleaved access of 32-bit fixed point vectors kernel for XPULPV2 extension.
 
void plp_dot_prod_q32s_rv32im (const int32_t *__restrict__ pSrcA, const int32_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, int32_t *__restrict__ pRes)
 Scalar dot product of 32-bit fixed point vectors kernel for RV32IM extension.
 
void plp_dot_prod_q32s_xpulpv2 (const int32_t *__restrict__ pSrcA, const int32_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, int32_t *__restrict__ pRes)
 Scalar dot product of 32-bit fixed point vectors kernel for XPULPV2 extension.
 
void plp_dot_prod_q8s_rv32im (const int8_t *__restrict__ pSrcA, const int8_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, int32_t *__restrict__ pRes)
 Scalar dot product of 8-bit fixed point vectors kernel for RV32IM extension.
 
void plp_dot_prod_q8v_xpulpv2 (const int8_t *__restrict__ pSrcA, const int8_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, int32_t *__restrict__ pRes)
 Vectorized dot product of 8-bit fixed point vectors kernel for XPULPV2 extension.
 

Detailed Description

Computes the scalar dot product of two vectors. The vectors are multiplied element-by-element and then summed.

    sum = pSrcA[0]*pSrcB[0] + pSrcA[1]*pSrcB[1] + ... + pSrcA[blockSize-1]*pSrcB[blockSize-1]

There are separate functions for floating-point, integer, and fixed-point data types; the kernels listed on this page cover the 8, 16, and 32 bit integer and fixed-point variants. For the lower-precision types (8 and 16 bit), functions exploiting SIMD instructions are provided.
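
As a point of comparison for the formula above, the following is a minimal portable sketch of the computation. The name ref_dot_prod_i32 is hypothetical and not part of PULP DSP; it only mirrors the parameter order of the 32-bit scalar kernels documented on this page, and it makes no claim about how the library kernels are implemented or how they behave on overflow.

```c
#include <stdint.h>

/* Hypothetical host-side reference, NOT a PULP DSP function.
 * Inputs are assumed small enough that the 32-bit sum does not overflow. */
void ref_dot_prod_i32(const int32_t *pSrcA, const int32_t *pSrcB,
                      uint32_t blockSize, int32_t *pRes) {
    int32_t sum = 0;
    for (uint32_t i = 0; i < blockSize; i++) {
        sum += pSrcA[i] * pSrcB[i]; /* multiply element-by-element, then sum */
    }
    *pRes = sum; /* the result is returned through the output pointer */
}
```

A reference like this can be handy for checking a kernel's output on small test vectors.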

Function names follow this pattern (for example, plp_dot_prod_i32s_rv32im):

    <pulp>_<function name>_<data type><precision><method>_<isa extension>, with
    pulp = the library prefix, plp
    function name = the operation, e.g. dot_prod
    data type = {f, i, q} for floating point, integer, and fixed point, respectively
    precision = {32, 16, 8} bits
    method = {s, v, p} for scalar (single, i.e. not using packed SIMD), vectorized (i.e. using SIMD instructions), and parallel (for multicore parallel computing), respectively
    isa extension = rv32im, xpulpv2, etc., of which rv32im is the most general one
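
To make the naming pattern concrete, here is a small sketch that decodes the trailing part of a kernel name into its fields. The type kernel_name_t and the function decode_kernel_name are hypothetical illustrations, not part of PULP DSP, and the accepted values are only those listed in the pattern above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type, NOT part of PULP DSP. */
typedef struct {
    char type;      /* 'f', 'i' or 'q' */
    int  precision; /* 32, 16 or 8 */
    char method;    /* 's', 'v' or 'p' */
    char isa[16];   /* e.g. "rv32im" */
} kernel_name_t;

/* Decodes <data type><precision><method>_<isa extension> from a name.
 * Returns 0 on success, -1 if the name does not fit the pattern. */
int decode_kernel_name(const char *name, kernel_name_t *out) {
    if (strncmp(name, "plp_", 4) != 0) return -1;

    const char *isa_sep = strrchr(name, '_'); /* last '_' precedes the ISA */
    const char *p = isa_sep - 1;
    while (p > name && *p != '_') p--;        /* previous '_' precedes the type */
    if (p == name) return -1;                 /* only one '_' in the name */

    if (sscanf(p + 1, "%c%d%c_%15s", &out->type, &out->precision,
               &out->method, out->isa) != 4) return -1;
    if (!strchr("fiq", out->type) || !strchr("svp", out->method)) return -1;
    if (out->precision != 8 && out->precision != 16 && out->precision != 32)
        return -1;
    return 0;
}
```

For plp_dot_prod_i32s_rv32im this yields data type i, precision 32, method s, and ISA extension rv32im.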
 

Function Documentation

void plp_dot_prod_i16s_rv32im ( const int16_t *__restrict__  pSrcA,
const int16_t *__restrict__  pSrcB,
uint32_t  blockSize,
int32_t *__restrict__  pRes 
)

Scalar dot product of 16-bit integer vectors kernel for RV32IM extension.

Parameters
[in]   pSrcA      points to the first input vector [16 bit]
[in]   pSrcB      points to the second input vector [16 bit]
[in]   blockSize  number of samples in each vector
[out]  pRes       output result returned here [32 bit]
Returns
none
Exploiting SIMD instructions
When the ISA supports it, the 16 bit values are packed two by two into 32 bit vectors, and the two multiplications are then performed simultaneously on those vectors with a 32 bit accumulator. RV32IM does not support SIMD, so this kernel is scalar. For SIMD, see the kernels for other ISA extensions (e.g. XPULPV2).

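
Because the result is accumulated in 32 bits, the worst-case input magnitude limits how many products can be summed safely. The sketch below works out that bound from plain arithmetic; max_safe_block_size is a hypothetical helper, not part of PULP DSP, and it says nothing about what the library kernels do when the accumulator overflows.

```c
#include <stdint.h>

/* Hypothetical helper, NOT part of PULP DSP.
 * Returns the largest blockSize for which a signed 32-bit accumulator
 * cannot overflow, assuming every product has the worst-case magnitude
 * (-2^(bits-1))^2 = 2^(2*(bits-1)). Intended for bits = 8, 16 or 32. */
uint32_t max_safe_block_size(unsigned bits) {
    uint64_t worst_product = 1ull << (2 * (bits - 1));
    uint64_t acc_max = (uint64_t)INT32_MAX; /* 2^31 - 1 */
    return (uint32_t)(acc_max / worst_product);
}
```

Under this worst-case assumption, 8 bit inputs leave ample headroom, while a single pair of 16 bit products can already exceed the accumulator range, so the input range matters in practice.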
void plp_dot_prod_i16v_xpulpv2 ( const int16_t *__restrict__  pSrcA,
const int16_t *__restrict__  pSrcB,
uint32_t  blockSize,
int32_t *__restrict__  pRes 
)

Vectorized dot product of 16-bit integer vectors kernel for XPULPV2 extension.

Parameters
[in]   pSrcA      points to the first input vector [16 bit]
[in]   pSrcB      points to the second input vector [16 bit]
[in]   blockSize  number of samples in each vector
[out]  pRes       output result returned here [32 bit]
Returns
none
Exploiting SIMD instructions
The 16 bit values are packed two by two into 32 bit vectors, and the two multiplications are then performed simultaneously on those vectors with a 32 bit accumulator.

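
The packing described above can be illustrated in portable C. This is only an emulation of the idea, assuming nothing about the actual XPULPV2 instructions or the library source: sdot2_emul and emul_dot_prod_i16v are hypothetical names, and the handling of an odd blockSize is a choice made for this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Emulates one packed step on two 32-bit words that each hold two
 * 16-bit lanes: acc + a[0]*b[0] + a[1]*b[1]. Hypothetical helper. */
static int32_t sdot2_emul(uint32_t a, uint32_t b, int32_t acc) {
    int16_t va[2], vb[2];
    memcpy(va, &a, sizeof va); /* unpack the two 16-bit lanes */
    memcpy(vb, &b, sizeof vb);
    return acc + (int32_t)va[0] * vb[0] + (int32_t)va[1] * vb[1];
}

/* Hypothetical emulation, NOT the PULP DSP kernel. */
void emul_dot_prod_i16v(const int16_t *pSrcA, const int16_t *pSrcB,
                        uint32_t blockSize, int32_t *pRes) {
    int32_t sum = 0;
    uint32_t pairs = blockSize >> 1;
    for (uint32_t i = 0; i < pairs; i++) {
        uint32_t a, b;
        memcpy(&a, &pSrcA[2 * i], sizeof a); /* load two samples as one word */
        memcpy(&b, &pSrcB[2 * i], sizeof b);
        sum = sdot2_emul(a, b, sum);
    }
    if (blockSize & 1u) { /* leftover sample when blockSize is odd */
        sum += (int32_t)pSrcA[blockSize - 1] * pSrcB[blockSize - 1];
    }
    *pRes = sum;
}
```

Both operands are unpacked the same way, so the result does not depend on the byte order of the host.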
void plp_dot_prod_i32p_xpulpv2 ( void *  S)

Scalar dot product with interleaved access of 32-bit integer vectors kernel for XPULPV2 extension.

Parameters
[in]   S  points to the instance structure for integer parallel dot product
Returns
none

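
The instance structure itself is not described on this page, so the following sketch uses an invented one purely to illustrate what interleaved access can look like: core c handles samples c, c + nCores, c + 2*nCores, and so on, and the partial sums are combined afterwards. The struct demo_dot_prod_instance_i32, its fields, and both functions are assumptions made for this example, and the cores are emulated with a sequential loop.

```c
#include <stdint.h>

/* Hypothetical instance structure, NOT the PULP DSP definition. */
typedef struct {
    const int32_t *srcA;
    const int32_t *srcB;
    uint32_t blockSize; /* total number of samples */
    uint32_t nCores;    /* number of cores sharing the work */
    int32_t *partial;   /* one partial sum per core */
} demo_dot_prod_instance_i32;

/* Work done by one core: every nCores-th sample, starting at coreId. */
void demo_dot_prod_i32p_core(void *S, uint32_t coreId) {
    demo_dot_prod_instance_i32 *args = (demo_dot_prod_instance_i32 *)S;
    int32_t sum = 0;
    for (uint32_t i = coreId; i < args->blockSize; i += args->nCores) {
        sum += args->srcA[i] * args->srcB[i];
    }
    args->partial[coreId] = sum;
}

/* Sequential stand-in for running all cores, then reducing the partials. */
int32_t demo_dot_prod_i32p(demo_dot_prod_instance_i32 *args) {
    int32_t total = 0;
    for (uint32_t c = 0; c < args->nCores; c++) {
        demo_dot_prod_i32p_core(args, c);
        total += args->partial[c];
    }
    return total;
}
```

With interleaving, neighbouring samples go to different cores, which spreads the work evenly even when blockSize is not a multiple of the core count.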
void plp_dot_prod_i32s_rv32im ( const int32_t *__restrict__  pSrcA,
const int32_t *__restrict__  pSrcB,
uint32_t  blockSize,
int32_t *__restrict__  pRes 
)

Scalar dot product of 32-bit integer vectors kernel for RV32IM extension.

Parameters
[in]   pSrcA      points to the first input vector
[in]   pSrcB      points to the second input vector
[in]   blockSize  number of samples in each vector
[out]  pRes       output result returned here
Returns
none

void plp_dot_prod_i32s_xpulpv2 ( const int32_t *__restrict__  pSrcA,
const int32_t *__restrict__  pSrcB,
uint32_t  blockSize,
int32_t *__restrict__  pRes 
)

Scalar dot product of 32-bit integer vectors kernel for XPULPV2 extension.

Parameters
[in]   pSrcA      points to the first input vector
[in]   pSrcB      points to the second input vector
[in]   blockSize  number of samples in each vector
[out]  pRes       output result returned here
Returns
none

void plp_dot_prod_i8s_rv32im ( const int8_t *__restrict__  pSrcA,
const int8_t *__restrict__  pSrcB,
uint32_t  blockSize,
int32_t *__restrict__  pRes 
)

Scalar dot product of 8-bit integer vectors kernel for RV32IM extension.

Parameters
[in]   pSrcA      points to the first input vector [8 bit]
[in]   pSrcB      points to the second input vector [8 bit]
[in]   blockSize  number of samples in each vector
[out]  pRes       output result returned here [32 bit]
Returns
none
Exploiting SIMD instructions
When the ISA supports it, the 8 bit values are packed four by four into 32 bit vectors, and the four multiplications are then performed simultaneously on those vectors with a 32 bit accumulator. RV32IM does not support SIMD, so this kernel is scalar. For SIMD, see the kernels for other ISA extensions (e.g. XPULPV2).

void plp_dot_prod_i8v_xpulpv2 ( const int8_t *__restrict__  pSrcA,
const int8_t *__restrict__  pSrcB,
uint32_t  blockSize,
int32_t *__restrict__  pRes 
)

Vectorized dot product of 8-bit integer vectors kernel for XPULPV2 extension.

Parameters
[in]   pSrcA      points to the first input vector [8 bit]
[in]   pSrcB      points to the second input vector [8 bit]
[in]   blockSize  number of samples in each vector
[out]  pRes       output result returned here [32 bit]
Returns
none
Exploiting SIMD instructions
The 8 bit values are packed four by four into 32 bit vectors, and the four multiplications are then performed simultaneously on those vectors with a 32 bit accumulator.

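
For the 8 bit case, four samples fit into one 32 bit word. The sketch below emulates that packing in portable C; sdot4_emul and emul_dot_prod_i8v are hypothetical names, the code is not taken from the library, and the treatment of leftover samples when blockSize is not a multiple of four is an assumption of this example.

```c
#include <stdint.h>
#include <string.h>

/* Emulates one packed step on two 32-bit words that each hold four
 * 8-bit lanes: acc plus the sum of the four lane products. Hypothetical. */
static int32_t sdot4_emul(uint32_t a, uint32_t b, int32_t acc) {
    int8_t va[4], vb[4];
    memcpy(va, &a, sizeof va); /* unpack the four 8-bit lanes */
    memcpy(vb, &b, sizeof vb);
    for (int lane = 0; lane < 4; lane++) {
        acc += (int32_t)va[lane] * vb[lane];
    }
    return acc;
}

/* Hypothetical emulation, NOT the PULP DSP kernel. */
void emul_dot_prod_i8v(const int8_t *pSrcA, const int8_t *pSrcB,
                       uint32_t blockSize, int32_t *pRes) {
    int32_t sum = 0;
    uint32_t quads = blockSize >> 2;
    for (uint32_t i = 0; i < quads; i++) {
        uint32_t a, b;
        memcpy(&a, &pSrcA[4 * i], sizeof a); /* load four samples as one word */
        memcpy(&b, &pSrcB[4 * i], sizeof b);
        sum = sdot4_emul(a, b, sum);
    }
    for (uint32_t i = quads * 4; i < blockSize; i++) { /* up to 3 leftovers */
        sum += (int32_t)pSrcA[i] * pSrcB[i];
    }
    *pRes = sum;
}
```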
void plp_dot_prod_q16s_rv32im ( const int16_t *__restrict__  pSrcA,
const int16_t *__restrict__  pSrcB,
uint32_t  blockSize,
uint32_t  deciPoint,
int32_t *__restrict__  pRes 
)

Scalar dot product of 16-bit fixed point vectors kernel for RV32IM extension.

Parameters
[in]   pSrcA      points to the first input vector [16 bit]
[in]   pSrcB      points to the second input vector [16 bit]
[in]   blockSize  number of samples in each vector
[in]   deciPoint  decimal point position, used as the right-shift amount
[out]  pRes       output result returned here [32 bit]
Returns
none
Exploiting SIMD instructions
When the ISA supports it, the 16 bit values are packed two by two into 32 bit vectors, and the two multiplications are then performed simultaneously on those vectors with a 32 bit accumulator. RV32IM does not support SIMD, so this kernel is scalar. For SIMD, see the kernels for other ISA extensions (e.g. XPULPV2).

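
The role of deciPoint can be illustrated with a small fixed-point sketch. It assumes that deciPoint is the number of fractional bits of the inputs and that the right shift is applied once to the accumulated sum; this page does not state where the library applies the shift (it could equally be applied to each product), so treat the placement as an assumption. The name ref_dot_prod_q16 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical host-side reference, NOT a PULP DSP function.
 * Assumes non-negative sums or an arithmetic right shift for negative
 * values, as provided by mainstream compilers. */
void ref_dot_prod_q16(const int16_t *pSrcA, const int16_t *pSrcB,
                      uint32_t blockSize, uint32_t deciPoint,
                      int32_t *pRes) {
    int32_t sum = 0;
    for (uint32_t i = 0; i < blockSize; i++) {
        /* each product carries 2*deciPoint fractional bits */
        sum += (int32_t)pSrcA[i] * pSrcB[i];
    }
    *pRes = sum >> deciPoint; /* back to deciPoint fractional bits */
}
```

With deciPoint = 8, the real values 1.5 and 2.0 are stored as 384 and 512, and 2.0 and 0.5 as 512 and 128. The sketch then returns 1024, which is 4.0 in the same format and matches 1.5*2.0 + 2.0*0.5.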
void plp_dot_prod_q16v_xpulpv2 ( const int16_t *__restrict__  pSrcA,
const int16_t *__restrict__  pSrcB,
uint32_t  blockSize,
uint32_t  deciPoint,
int32_t *__restrict__  pRes 
)

Vectorized dot product of 16-bit fixed point vectors kernel for XPULPV2 extension.

Parameters
[in]   pSrcA      points to the first input vector [16 bit]
[in]   pSrcB      points to the second input vector [16 bit]
[in]   blockSize  number of samples in each vector
[in]   deciPoint  decimal point position, used as the right-shift amount
[out]  pRes       output result returned here [32 bit]
Returns
none
Exploiting SIMD instructions
The 16 bit values are packed two by two into 32 bit vectors, and the two multiplications are then performed simultaneously on those vectors with a 32 bit accumulator.

void plp_dot_prod_q32p_xpulpv2 ( void *  S)

Scalar dot product with interleaved access of 32-bit fixed point vectors kernel for XPULPV2 extension.

Parameters
[in]   S  points to the instance structure for fixed point parallel dot product
Returns
none

void plp_dot_prod_q32s_rv32im ( const int32_t *__restrict__  pSrcA,
const int32_t *__restrict__  pSrcB,
uint32_t  blockSize,
uint32_t  deciPoint,
int32_t *__restrict__  pRes 
)

Scalar dot product of 32-bit fixed point vectors kernel for RV32IM extension.

Parameters
[in]   pSrcA      points to the first input vector
[in]   pSrcB      points to the second input vector
[in]   blockSize  number of samples in each vector
[in]   deciPoint  decimal point position, used as the right-shift amount
[out]  pRes       output result returned here
Returns
none

void plp_dot_prod_q32s_xpulpv2 ( const int32_t *__restrict__  pSrcA,
const int32_t *__restrict__  pSrcB,
uint32_t  blockSize,
uint32_t  deciPoint,
int32_t *__restrict__  pRes 
)

Scalar dot product of 32-bit fixed point vectors kernel for XPULPV2 extension.

Parameters
[in]   pSrcA      points to the first input vector
[in]   pSrcB      points to the second input vector
[in]   blockSize  number of samples in each vector
[in]   deciPoint  decimal point position, used as the right-shift amount
[out]  pRes       output result returned here
Returns
none

void plp_dot_prod_q8s_rv32im ( const int8_t *__restrict__  pSrcA,
const int8_t *__restrict__  pSrcB,
uint32_t  blockSize,
uint32_t  deciPoint,
int32_t *__restrict__  pRes 
)

Scalar dot product of 8-bit fixed point vectors kernel for RV32IM extension.

Parameters
[in]   pSrcA      points to the first input vector [8 bit]
[in]   pSrcB      points to the second input vector [8 bit]
[in]   blockSize  number of samples in each vector
[in]   deciPoint  decimal point position, used as the right-shift amount
[out]  pRes       output result returned here [32 bit]
Returns
none
Exploiting SIMD instructions
When the ISA supports it, the 8 bit values are packed four by four into 32 bit vectors, and the four multiplications are then performed simultaneously on those vectors with a 32 bit accumulator. RV32IM does not support SIMD, so this kernel is scalar. For SIMD, see the kernels for other ISA extensions (e.g. XPULPV2).

void plp_dot_prod_q8v_xpulpv2 ( const int8_t *__restrict__  pSrcA,
const int8_t *__restrict__  pSrcB,
uint32_t  blockSize,
uint32_t  deciPoint,
int32_t *__restrict__  pRes 
)

Vectorized dot product of 8-bit fixed point vectors kernel for XPULPV2 extension.

Parameters
[in]   pSrcA      points to the first input vector [8 bit]
[in]   pSrcB      points to the second input vector [8 bit]
[in]   blockSize  number of samples in each vector
[in]   deciPoint  decimal point position, used as the right-shift amount
[out]  pRes       output result returned here [32 bit]
Returns
none
Exploiting SIMD instructions
The 8 bit values are packed four by four into 32 bit vectors, and the four multiplications are then performed simultaneously on those vectors with a 32 bit accumulator.