PULP DSP  Version 1.0
Digital Signal Processing library for PULP processors (pulp-platform.org)
Matrix Multiplication Kernels

Functions

void plp_mat_mult_i16s_rv32im (const int16_t *__restrict__ pSrcA, const int16_t *__restrict__ pSrcB, uint32_t M, uint32_t N, uint32_t O, int32_t *__restrict__ pDstC)
 Matrix multiplication of 16-bit integer matrices kernel for RV32IM extension.
 
void plp_mat_mult_i16v_xpulpv2 (const int16_t *__restrict__ pSrcA, const int16_t *__restrict__ pSrcB, uint32_t M, uint32_t N, uint32_t O, int32_t *__restrict__ pDstC)
 Matrix multiplication of 16-bit integer matrices kernel for XPULPV2 extension.
 
void plp_mat_mult_i16vp_xpulpv2 (void *args)
 Parallel matrix multiplication of 16-bit integer matrices kernel for XPULPV2 extension.
 
void plp_mat_mult_i32p_xpulpv2 (void *args)
 Parallel matrix multiplication of 32-bit integer matrices kernel for XPULPV2 extension.
 
void plp_mat_mult_i32s_rv32im (const int32_t *__restrict__ pSrcA, const int32_t *__restrict__ pSrcB, uint32_t M, uint32_t N, uint32_t O, int32_t *__restrict__ pDstC)
 Matrix multiplication of 32-bit integer matrices kernel for RV32IM extension.
 
void plp_mat_mult_i32s_xpulpv2 (const int32_t *__restrict__ pSrcA, const int32_t *__restrict__ pSrcB, uint32_t M, uint32_t N, uint32_t O, int32_t *__restrict__ pDstC)
 Matrix multiplication of 32-bit integer matrices kernel for XPULPV2 extension.
 
void plp_mat_mult_i8s_rv32im (const int8_t *__restrict__ pSrcA, const int8_t *__restrict__ pSrcB, uint32_t M, uint32_t N, uint32_t O, int32_t *__restrict__ pDstC)
 Matrix multiplication of 8-bit integer matrices kernel for RV32IM extension.
 
void plp_mat_mult_i8v_xpulpv2 (const int8_t *__restrict__ pSrcA, const int8_t *__restrict__ pSrcB, uint32_t M, uint32_t N, uint32_t O, int32_t *__restrict__ pDstC)
 Matrix multiplication of 8-bit integer matrices kernel for XPULPV2 extension.
 
void plp_mat_mult_i8vp_xpulpv2 (void *args)
 Parallel matrix multiplication of 8-bit integer matrices kernel for XPULPV2 extension.
 

Detailed Description

Computes the product of two matrices.

The matrix multiplication computes the product of two matrices with dimensions MxN and NxO, giving an MxO result. The first matrix is accessed row-wise and the second column-wise: each value in a row of the first matrix is multiplied by the corresponding value in a column of the second, and the sum of these products gives one element of the result matrix.

    pDstC[i*O+k] = pSrcA[i*N]*pSrcB[k] + pSrcA[i*N+1]*pSrcB[O+k] + ... + pSrcA[i*N+N-1]*pSrcB[O*(N-1)+k]
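
As an illustration of this indexing, here is a minimal scalar sketch in C. It assumes all three matrices are stored in row-major order, so the first matrix has a row stride of N and the second and the result have a row stride of O. The name ref_mat_mult_i32 is hypothetical and not part of the library, and the library kernels may be organized differently (for example with loop unrolling).

```c
#include <assert.h>
#include <stdint.h>

/* Reference (non-optimized) product C = A * B.
 * A is M x N, B is N x O, C is M x O, all row-major (assumption).
 * ref_mat_mult_i32 is a hypothetical name, not a library function. */
void ref_mat_mult_i32(const int32_t *pSrcA, const int32_t *pSrcB,
                      uint32_t M, uint32_t N, uint32_t O, int32_t *pDstC)
{
    assert(pSrcA && pSrcB && pDstC); /* debug-time precondition */

    for (uint32_t i = 0; i < M; i++) {         /* row of A and C */
        for (uint32_t k = 0; k < O; k++) {     /* column of B and C */
            int32_t sum = 0;
            for (uint32_t j = 0; j < N; j++) { /* walk row i of A, column k of B */
                sum += pSrcA[i * N + j] * pSrcB[j * O + k];
            }
            pDstC[i * O + k] = sum;
        }
    }
}
```

Called with a 2x3 matrix {1, 2, 3, 4, 5, 6} and a 3x2 matrix {7, 8, 9, 10, 11, 12}, this sketch writes the 2x2 result {58, 64, 139, 154}.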

There are separate functions for the int8, int16, and int32 data types. For the lower-precision integers (int8, int16), functions exploiting SIMD instructions are provided.

Function names follow this pattern (for example, plp_dot_prod_i32s_rv32im):

    <pulp> _ <function name> _ <data type><precision><method>_<isa extension>, with
    data type = {f, i, q} respectively for floats, integers, fixed points
    precision = {32, 16, 8} bits
    method = {s, v, p} meaning single (or scalar, i.e. not using packed SIMD), vectorized (i.e. using SIMD instructions), and parallel (for multicore parallel computing), respectively. Letters can be combined, as in plp_mat_mult_i16vp_xpulpv2 (vectorized and parallel).
    isa extension = rv32im, xpulpv2, etc. of which rv32im is the most general one.
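
To make the pattern concrete, the following sketch splits the trailing `<data type><precision><method>_<isa extension>` part of a kernel name into its fields. The struct plp_name_info and the function parse_plp_name are hypothetical helpers written for this illustration; the library does not provide them. The sketch accepts combined method letters such as "vp", which appear in the function names listed on this page.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the decoded fields of a kernel name. */
typedef struct {
    char data_type;     /* 'f', 'i' or 'q' */
    unsigned precision; /* 32, 16 or 8 */
    char method[4];     /* "s", "v", "p" or a combination such as "vp" */
    char isa[16];       /* e.g. "rv32im" or "xpulpv2" */
} plp_name_info;

/* Decodes the last two underscore-separated tokens of a name.
 * Returns 0 on success and -1 if the name does not fit the pattern. */
int parse_plp_name(const char *name, plp_name_info *out)
{
    assert(name && out); /* debug-time precondition */

    const char *isa = strrchr(name, '_'); /* underscore before the ISA token */
    if (isa == NULL || isa == name) return -1;

    const char *tok = isa - 1;            /* walk back to the previous underscore */
    while (tok > name && *tok != '_') tok--;
    if (*tok != '_') return -1;

    int used = 0;
    int n = sscanf(tok, "_%c%u%3[svp]_%n",
                   &out->data_type, &out->precision, out->method, &used);
    if (n != 3 || used == 0) return -1;
    if (tok + used != isa + 1) return -1; /* token must end at the ISA underscore */

    if (out->data_type != 'f' && out->data_type != 'i' && out->data_type != 'q') return -1;
    if (out->precision != 8 && out->precision != 16 && out->precision != 32) return -1;

    size_t len = strlen(isa + 1);
    if (len == 0 || len >= sizeof out->isa) return -1;
    memcpy(out->isa, isa + 1, len + 1);   /* copy including the terminator */
    return 0;
}
```

For plp_dot_prod_i32s_rv32im this yields data type 'i', precision 32, method "s", and ISA extension "rv32im".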
 

Function Documentation

void plp_mat_mult_i16s_rv32im (const int16_t *__restrict__ pSrcA, const int16_t *__restrict__ pSrcB, uint32_t M, uint32_t N, uint32_t O, int32_t *__restrict__ pDstC)

Matrix multiplication of 16-bit integer matrices kernel for RV32IM extension.

Parameters
[in]   pSrcA  points to the first input matrix
[in]   pSrcB  points to the second input matrix
[in]   M      height of the first input matrix
[in]   N      width of the first input matrix and height of the second
[in]   O      width of the second input matrix
[out]  pDstC  points to the output matrix
Returns
none

void plp_mat_mult_i16v_xpulpv2 (const int16_t *__restrict__ pSrcA, const int16_t *__restrict__ pSrcB, uint32_t M, uint32_t N, uint32_t O, int32_t *__restrict__ pDstC)

Matrix multiplication of 16-bit integer matrices kernel for XPULPV2 extension.

Parameters
[in]   pSrcA  points to the first input matrix
[in]   pSrcB  points to the second input matrix
[in]   M      height of the first input matrix
[in]   N      width of the first input matrix and height of the second
[in]   O      width of the second input matrix
[out]  pDstC  points to the output matrix
Returns
none

void plp_mat_mult_i16vp_xpulpv2 (void *args)

Parallel matrix multiplication of 16-bit integer matrices kernel for XPULPV2 extension.

Parameters
[in]   args   pointer to the plp_mat_mult_instance_i16 struct initialized by plp_mat_mult_i16_parallel
Returns
none
Exploiting SIMD instructions
The 16-bit values are packed in pairs into 32-bit vectors, and the dot products are then computed on these 32-bit vectors with a 32-bit accumulator.
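
The packing scheme can be emulated in portable C to see what a two-lane dot product computes. This is only a sketch under stated assumptions: v2s_t, pack2_i16, sumdotp2_emul, and dot_i16_packed are hypothetical names, the emulation stands in for the hardware SIMD instruction rather than using it, and the actual kernel may load and unroll differently. Odd N is handled with one scalar tail step.

```c
#include <assert.h>
#include <stdint.h>

/* Two int16 lanes in one 32-bit word (hypothetical helper type). */
typedef uint32_t v2s_t;

/* Pack two 16-bit values: lo in bits 0..15, hi in bits 16..31. */
static v2s_t pack2_i16(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Portable stand-in for a 2-lane sum-of-dot-product instruction:
 * acc + a.lo*b.lo + a.hi*b.hi with a 32-bit accumulator.
 * The narrowing casts rely on two's-complement wraparound, as on GCC/Clang. */
static int32_t sumdotp2_emul(v2s_t a, v2s_t b, int32_t acc)
{
    int16_t a0 = (int16_t)(a & 0xFFFFu), a1 = (int16_t)(a >> 16);
    int16_t b0 = (int16_t)(b & 0xFFFFu), b1 = (int16_t)(b >> 16);
    return acc + (int32_t)a0 * b0 + (int32_t)a1 * b1;
}

/* One output element: row of A (contiguous) times column of B
 * (elements col_stride apart, i.e. O for a row-major N x O matrix). */
int32_t dot_i16_packed(const int16_t *row, const int16_t *col,
                       uint32_t col_stride, uint32_t N)
{
    assert(row && col); /* debug-time precondition */

    int32_t acc = 0;
    uint32_t j = 0;
    for (; j + 1 < N; j += 2) { /* two elements per step */
        v2s_t a = pack2_i16(row[j], row[j + 1]);
        v2s_t b = pack2_i16(col[j * col_stride], col[(j + 1) * col_stride]);
        acc = sumdotp2_emul(a, b, acc);
    }
    if (j < N) {                /* scalar tail when N is odd */
        acc += (int32_t)row[j] * col[j * col_stride];
    }
    return acc;
}
```

Note that the row of the first matrix is contiguous in memory, while the column of the second matrix has to be gathered with a stride before it can be packed.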

void plp_mat_mult_i32p_xpulpv2 (void *args)

Parallel matrix multiplication of 32-bit integer matrices kernel for XPULPV2 extension.

Parameters
[in]   args   pointer to the plp_mat_mult_instance_i32 struct initialized by plp_mat_mult_i32_parallel
Returns
none
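
The parallel kernels receive all of their arguments through a single void pointer. The sketch below shows one way such a kernel could divide the work, with each core computing every nPE-th row of the result. Everything in it is an assumption made for illustration: demo_mat_mult_args is a hypothetical stand-in (the layout of plp_mat_mult_instance_i32 is not given on this page), the core index is passed explicitly instead of being read from the runtime, and the cores are simulated by a sequential loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical argument bundle, not the library's instance struct. */
typedef struct {
    const int32_t *pSrcA; /* M x N, row-major */
    const int32_t *pSrcB; /* N x O, row-major */
    uint32_t M;
    uint32_t N;
    uint32_t O;
    uint32_t nPE;         /* number of cores sharing the work */
    int32_t *pDstC;       /* M x O, row-major */
} demo_mat_mult_args;

/* Work of one core: rows core_id, core_id + nPE, core_id + 2*nPE, ... */
void demo_mat_mult_worker(void *args, uint32_t core_id)
{
    demo_mat_mult_args *a = (demo_mat_mult_args *)args;
    assert(a && a->nPE > 0); /* debug-time precondition */

    for (uint32_t i = core_id; i < a->M; i += a->nPE) {
        for (uint32_t k = 0; k < a->O; k++) {
            int32_t sum = 0;
            for (uint32_t j = 0; j < a->N; j++) {
                sum += a->pSrcA[i * a->N + j] * a->pSrcB[j * a->O + k];
            }
            a->pDstC[i * a->O + k] = sum;
        }
    }
}

/* Sequential stand-in for launching the worker on every core. */
void demo_mat_mult_run(demo_mat_mult_args *a)
{
    for (uint32_t core = 0; core < a->nPE; core++) {
        demo_mat_mult_worker(a, core);
    }
}
```

Because each core writes a disjoint set of output rows, no synchronization is needed inside the kernel in this scheme; cores with an index of M or higher simply have nothing to do.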

void plp_mat_mult_i32s_rv32im (const int32_t *__restrict__ pSrcA, const int32_t *__restrict__ pSrcB, uint32_t M, uint32_t N, uint32_t O, int32_t *__restrict__ pDstC)

Matrix multiplication of 32-bit integer matrices kernel for RV32IM extension.

Parameters
[in]   pSrcA  points to the first input matrix
[in]   pSrcB  points to the second input matrix
[in]   M      height of the first input matrix
[in]   N      width of the first input matrix and height of the second
[in]   O      width of the second input matrix
[out]  pDstC  points to the output matrix
Returns
none

void plp_mat_mult_i32s_xpulpv2 (const int32_t *__restrict__ pSrcA, const int32_t *__restrict__ pSrcB, uint32_t M, uint32_t N, uint32_t O, int32_t *__restrict__ pDstC)

Matrix multiplication of 32-bit integer matrices kernel for XPULPV2 extension.

Parameters
[in]   pSrcA  points to the first input matrix
[in]   pSrcB  points to the second input matrix
[in]   M      height of the first input matrix
[in]   N      width of the first input matrix and height of the second
[in]   O      width of the second input matrix
[out]  pDstC  points to the output matrix
Returns
none

void plp_mat_mult_i8s_rv32im (const int8_t *__restrict__ pSrcA, const int8_t *__restrict__ pSrcB, uint32_t M, uint32_t N, uint32_t O, int32_t *__restrict__ pDstC)

Matrix multiplication of 8-bit integer matrices kernel for RV32IM extension.

Parameters
[in]   pSrcA  points to the first input matrix
[in]   pSrcB  points to the second input matrix
[in]   M      height of the first input matrix
[in]   N      width of the first input matrix and height of the second
[in]   O      width of the second input matrix
[out]  pDstC  points to the output matrix
Returns
none

void plp_mat_mult_i8v_xpulpv2 (const int8_t *__restrict__ pSrcA, const int8_t *__restrict__ pSrcB, uint32_t M, uint32_t N, uint32_t O, int32_t *__restrict__ pDstC)

Matrix multiplication of 8-bit integer matrices kernel for XPULPV2 extension.

Parameters
[in]   pSrcA  points to the first input matrix
[in]   pSrcB  points to the second input matrix
[in]   M      height of the first input matrix
[in]   N      width of the first input matrix and height of the second
[in]   O      width of the second input matrix
[out]  pDstC  points to the output matrix
Returns
none
Exploiting SIMD instructions
The 8-bit values are packed in groups of four into 32-bit vectors, and the dot products are then computed on these 32-bit vectors with a 32-bit accumulator.
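
For the 8-bit case, the same idea can be sketched as a complete matrix product with four lanes per 32-bit word. As before, this is a portable emulation under assumptions and not the library's code: v4s_t, pack4_i8, sumdotp4_emul, and mat_mult_i8_packed are hypothetical names, row-major storage is assumed, and up to three leftover elements per dot product are handled by a scalar tail loop.

```c
#include <assert.h>
#include <stdint.h>

/* Four int8 lanes in one 32-bit word (hypothetical helper type). */
typedef uint32_t v4s_t;

/* Pack four 8-bit values, x0 in the lowest byte. */
static v4s_t pack4_i8(int8_t x0, int8_t x1, int8_t x2, int8_t x3)
{
    return (uint32_t)(uint8_t)x0 | ((uint32_t)(uint8_t)x1 << 8) |
           ((uint32_t)(uint8_t)x2 << 16) | ((uint32_t)(uint8_t)x3 << 24);
}

/* Portable stand-in for a 4-lane sum-of-dot-product instruction.
 * The narrowing casts rely on two's-complement wraparound, as on GCC/Clang. */
static int32_t sumdotp4_emul(v4s_t a, v4s_t b, int32_t acc)
{
    for (int lane = 0; lane < 4; lane++) {
        int8_t x = (int8_t)((a >> (8 * lane)) & 0xFFu);
        int8_t y = (int8_t)((b >> (8 * lane)) & 0xFFu);
        acc += (int32_t)x * y;
    }
    return acc;
}

/* C = A * B with A (M x N) and B (N x O) as int8, C (M x O) as int32. */
void mat_mult_i8_packed(const int8_t *pSrcA, const int8_t *pSrcB,
                        uint32_t M, uint32_t N, uint32_t O, int32_t *pDstC)
{
    assert(pSrcA && pSrcB && pDstC); /* debug-time precondition */

    for (uint32_t i = 0; i < M; i++) {
        for (uint32_t k = 0; k < O; k++) {
            const int8_t *r = &pSrcA[i * N]; /* row i of A */
            const int8_t *c = &pSrcB[k];     /* column k of B, stride O */
            int32_t acc = 0;
            uint32_t j = 0;
            for (; j + 3 < N; j += 4) {      /* four elements per step */
                v4s_t a = pack4_i8(r[j], r[j + 1], r[j + 2], r[j + 3]);
                v4s_t b = pack4_i8(c[j * O], c[(j + 1) * O],
                                   c[(j + 2) * O], c[(j + 3) * O]);
                acc = sumdotp4_emul(a, b, acc);
            }
            for (; j < N; j++) {             /* scalar tail, up to 3 elements */
                acc += (int32_t)r[j] * c[j * O];
            }
            pDstC[i * O + k] = acc;
        }
    }
}
```

Each product of two int8 values fits in 16 bits, so the 32-bit accumulator leaves ample headroom for long rows.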

void plp_mat_mult_i8vp_xpulpv2 (void *args)

Parallel matrix multiplication of 8-bit integer matrices kernel for XPULPV2 extension.


Parameters
[in]   args   pointer to the plp_mat_mult_instance_i8 struct initialized by plp_mat_mult_i8_parallel
Returns
none
Exploiting SIMD instructions
The 8-bit values are packed in groups of four into 32-bit vectors, and the dot products are then computed on these 32-bit vectors with a 32-bit accumulator.