Hello everyone, I am Ruffian Balance, a serious technical ruffian. Today's topic: computing Hash values with the DCP engine inside the i.MX RT1xxx series requires special handling of the L1 D-Cache.
Regarding the general-purpose Data Co-Processor (DCP) module inside the i.MX RT1xxx series, I have previously written an article, “SNVS Master Key can only be used for DCP encryption and decryption when i.MX RT10xx HAB is closed”, which introduces the basic functions of DCP and the points to watch when using it for AES encryption and decryption. Besides AES, the DCP module also supports the classic Hash algorithms (SHA-1/SHA-256/CRC32).
I recently supported a large i.MX RT customer. Their project uses DCP for Hash computation, but the Hash check fails intermittently (roughly one failure in 50 runs). What is going on?
I. Basic information about the customer project
First, some background on the customer's project. The main chip is the i.MX RT1062, with an external serial Flash holding the program code (XIP) and an external SDRAM holding the program data (mostly the frame buffer, but the .data section and the stack are placed there as well). The project is based on SDK v2.6.2.
The project calls the mbedtls_sha256() function in \SDK_2.6.2_EVK-MIMXRT1060\middleware\mbedtls\library\sha256.c. This function is in turn implemented by a series of low-level mbedtls_sha256_xx() functions in \SDK_2.6.2_EVK-MIMXRT1060\middleware\mbedtls\port\ksdk_mbedtls.c.
The ksdk_mbedtls.c file serves the Kinetis, LPC, and i.MX RT MCU families alike, and each MCU has a different hardware engine (LTC, CAAM, CAU3, DCP, or HashCrypt). On the i.MX RT1xxx, the hardware engine is DCP. These mbedtls_sha256_xx() functions mainly call the following functions in the SDK standard driver fsl_dcp.c:
status_t DCP_HASH_Init(DCP_Type *base, dcp_handle_t *handle, dcp_hash_ctx_t *ctx, dcp_hash_algo_t algo);
status_t DCP_HASH_Update(DCP_Type *base, dcp_hash_ctx_t *ctx, const uint8_t *input, size_t inputSize);
status_t DCP_HASH_Finish(DCP_Type *base, dcp_hash_ctx_t *ctx, uint8_t *output, size_t *outputSize);
II. Analysis of the intermittent failure
With an intermittent failure, the first thing to check is whether the fsl_dcp.c driver handles the cache properly. Let's open the DCP example under \SDK_2.6.2_EVK-MIMXRT1060\boards\evkmimxrt1060\driver_examples\dcp first. The main() function in dcp.c carries an obvious warning: if SDRAM is used in the project, DCache must be turned off, which indicates that this DCP driver does not support running with DCache enabled. The customer's project clearly uses SDRAM, and I later confirmed with the customer that their DCache has always been enabled, which is obviously a problem.
int main(void)
{
    dcp_config_t dcpConfig;
    /* Init hardware*/
    BOARD_ConfigMPU();
    BOARD_InitPins();
    BOARD_BootClockRUN();
    BOARD_InitDebugConsole();
    /* Data cache must be temporarily disabled to be able to use sdram */
    SCB_DisableDCache();
    // code omitted...
}
Now back to the SDK version. All historical versions of the i.MX RT1060 SDK are listed on the NXP SDK download page. v2.6.2 was released in July 2019 (its DCP driver is v2.1.1), while the latest SDK is now v2.9.3 (with the DCP driver upgraded to v2.1.6). Nearly two years have passed, and the customer has not kept the SDK up to date.
Early DCP drivers did not handle DCache at all, so DCache had to be turned off for them to work. DCache handling was added in driver v2.1.5, so from that version on the DCP driver works properly with DCache enabled.
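Why a driver without cache handling fails only some of the time can be illustrated with a toy model in plain C. This is only a sketch, not SDK code: the single-line "cache" and the cpu_write(), cache_clean(), and engine_read() helpers are hypothetical names invented for illustration, and the 32-byte line size is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32u /* assumed L1 D-Cache line size in bytes */

/* Toy model: one cache line sitting in front of one line of "SDRAM". */
static uint8_t sdram[LINE_SIZE];      /* what a bus master such as DCP sees */
static uint8_t cache_line[LINE_SIZE]; /* what the CPU sees */
static bool dirty = false;

/* CPU store: with a write-back cache the data stays in the cache line. */
void cpu_write(const uint8_t *src, size_t len)
{
    assert(len <= LINE_SIZE);
    memcpy(cache_line, src, len);
    dirty = true;
}

/* Cache clean: write the dirty line back to memory. */
void cache_clean(void)
{
    if (dirty)
    {
        memcpy(sdram, cache_line, LINE_SIZE);
        dirty = false;
    }
}

/* The engine bypasses the cache and reads memory directly. */
uint8_t engine_read(size_t index)
{
    assert(index < LINE_SIZE);
    return sdram[index];
}
```

In this model, a message written with cpu_write() is invisible to engine_read() until cache_clean() runs. On real hardware a dirty line may or may not have been evicted by the time the engine starts, which would fit a failure that shows up only occasionally.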
III. How is DCache handled in the DCP driver?
Now let's look at how the SDK standard driver fsl_dcp.c adds DCache handling.
3.1 DCP Context Buffer Settings
The first step in using the DCP driver is to initialize the DCP module with DCP_Init(). This function enables all four channels of the module in the DCP->CTRL register, together with context caching and automatic channel context switching. Context switching relies on an important private global variable, s_dcpContextSwitchingBuffer, which is placed in the non-cacheable region (driver improvement 1). The DCP->CONTEXT register holds the address of s_dcpContextSwitchingBuffer.
AT_NONCACHEABLE_SECTION_INIT(static dcp_context_t s_dcpContextSwitchingBuffer);
void DCP_Init(DCP_Type *base, const dcp_config_t *config)
{
    // code omitted...
    /* use context switching buffer */
    base->CONTEXT = (uint32_t)&s_dcpContextSwitchingBuffer;
}
3.2 DCP User Data in/out Buffer Settings
After the DCP module is initialized, the DCP_HASH() function in the DCP driver can be called to compute a Hash. This function takes two user buffers: an input buffer holding the message data to be hashed, and an output buffer holding the computed Hash value (32 bytes for SHA-256). These two buffers are best managed by the user and placed in a non-cacheable region.
/* Input data for DCP like input and output should be handled properly
 * when DCACHE is used (e.g. Clean&Invalidate, use non-cached memory) */
AT_NONCACHEABLE_SECTION(static uint8_t s_outputSha256[32]);
status_t calc_sha256(const uint8_t *messageBuf, uint32_t messageLen)
{
    size_t outLength = sizeof(s_outputSha256);
    dcp_handle_t m_handle;
    m_handle.channel = kDCP_Channel0;
    m_handle.keySlot = kDCP_KeySlot0;
    m_handle.swapConfig = kDCP_NoSwap;
    memset(&s_outputSha256, 0, outLength);
    return DCP_HASH(DCP, &m_handle, kDCP_Sha256, messageBuf, messageLen, s_outputSha256, &outLength);
}
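If a buffer cannot live in a non-cacheable region, the comment in the code above names the alternative: clean or invalidate the cache over that buffer. Cache maintenance works on whole lines, so the range first has to be widened to line boundaries. The helper below is a hypothetical sketch of that arithmetic only (cache_range_t and cache_line_range() are made-up names, and the 32-byte line size is an assumption); it computes addresses and performs no cache operation itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32u /* assumed L1 D-Cache line size in bytes */

/* Hypothetical helper type: a line-aligned address range. */
typedef struct
{
    uintptr_t start; /* rounded down to a line boundary */
    size_t size;     /* multiple of CACHE_LINE_SIZE */
} cache_range_t;

/* Widen [addr, addr + len) so that it covers whole cache lines. */
cache_range_t cache_line_range(uintptr_t addr, size_t len)
{
    const uintptr_t mask = (uintptr_t)CACHE_LINE_SIZE - 1u;
    cache_range_t r;

    assert(len > 0u);
    r.start = addr & ~mask;
    /* Round the end address up, then take the distance from the start. */
    r.size = (size_t)(((addr + len + mask) & ~mask) - r.start);
    return r;
}
```

For example, a 32-byte digest buffer that starts 16 bytes into a line spans two lines, so the helper returns a 64-byte range. Any neighbouring variables sharing those lines are then affected by an invalidate as well, which is one reason line-aligned, line-sized buffers are easier to handle.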
3.3 DCache handling in the DCP_HASH() code
The DCP_HASH() function relies throughout on a critical internal structure, dcp_hash_ctx_internal_t. This structure is 47 words in size (it contains a 128-byte block buffer for the message data to be hashed, a runningHash array for the intermediate result, which is the 32-byte SHA-256 state plus size for 36 bytes in total, and other auxiliary members).
/*! internal dcp_hash context structure */
typedef struct _dcp_hash_ctx_internal
{
    dcp_hash_block_t blk;        /*!< memory buffer. only full blocks are written to DCP during hash updates */
    size_t blksz;                /*!< number of valid bytes in memory buffer */
    dcp_hash_algo_t algo;        /*!< selected algorithm from the set of supported algorithms */
    dcp_hash_algo_state_t state; /*!< finite machine state of the hash software process */
    uint32_t fullMessageSize;    /*!< track message size */
    uint32_t ctrl0;              /*!< HASH_INIT and HASH_TERM flags */
    uint32_t runningHash[9];     /*!< running hash. up to SHA-256 plus size, that is 36 bytes. */
    dcp_handle_t *handle;
} dcp_hash_ctx_internal_t;
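The 47-word figure can be checked by rebuilding the structure layout on a host PC. The sketch below is not the SDK header: the block type is assumed to be a 128-byte union, the enum, size_t, and pointer members are reduced to uint32_t stand-ins (an assumption that mirrors a 32-bit target where all of these are 4 bytes), and the mock_ names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_HASH_BLOCK_SIZE 128u /* bytes buffered before a block goes to DCP */

/* Stand-in for dcp_hash_block_t: one 128-byte block, viewed as words or bytes. */
typedef union
{
    uint32_t w[MOCK_HASH_BLOCK_SIZE / 4u];
    uint8_t b[MOCK_HASH_BLOCK_SIZE];
} mock_hash_block_t;

/* Stand-in for dcp_hash_ctx_internal_t with every 32-bit member as uint32_t. */
typedef struct
{
    mock_hash_block_t blk;    /* 32 words */
    uint32_t blksz;           /* size_t on the target */
    uint32_t algo;            /* enum on the target */
    uint32_t state;           /* enum on the target */
    uint32_t fullMessageSize;
    uint32_t ctrl0;
    uint32_t runningHash[9];  /* 9 words */
    uint32_t handle;          /* pointer on the target */
} mock_hash_ctx_internal_t;

/* Compile-time check: 32 + 5 + 9 + 1 = 47 words = 188 bytes. */
static_assert(sizeof(mock_hash_ctx_internal_t) == 47u * sizeof(uint32_t),
              "mock context is expected to be 47 words");

/* Size of the mock context in 32-bit words. */
unsigned mock_ctx_words(void)
{
    return (unsigned)(sizeof(mock_hash_ctx_internal_t) / sizeof(uint32_t));
}
```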
The DCP driver directly defines a local variable hashCtx of type dcp_hash_ctx_t, and the hashCtx storage is then used as a dcp_hash_ctx_internal_t. The value of DCP_HASH_CTX_SIZE was 58 in the old version, but has been increased to 64 in the new version for L1 DCache line alignment (driver improvement 2).
/*! @brief DCP HASH Context size. */
#define DCP_HASH_CTX_SIZE 64
/*! @brief Storage type used to save hash context. */
typedef struct _dcp_hash_ctx_t
{
    uint32_t x[DCP_HASH_CTX_SIZE];
} dcp_hash_ctx_t;
status_t DCP_HASH(DCP_Type *base, dcp_handle_t *handle, dcp_hash_algo_t algo, const uint8_t *input, size_t inputSize, uint8_t *output, size_t *outputSize)
{
    dcp_hash_ctx_t hashCtx = {0};
    status_t status;
    status = DCP_HASH_Init(base, handle, &hashCtx, algo);
    status = DCP_HASH_Update(base, &hashCtx, input, inputSize);
    status = DCP_HASH_Finish(base, &hashCtx, output, outputSize);
    // ...
}
status_t DCP_HASH_Init/Update/Finish(..., dcp_hash_ctx_t *ctx, ...)
{
    dcp_hash_ctx_internal_t *ctxInternal;
    /* Align structure on DCACHE line*/
#if defined(__DCACHE_PRESENT) && (__DCACHE_PRESENT == 1U) && defined(DCP_USE_DCACHE) && (DCP_USE_DCACHE == 1U)
    ctxInternal = (dcp_hash_ctx_internal_t *)(uint32_t)((uint8_t *)ctx + FSL_FEATURE_L1DCACHE_LINESIZE_BYTE);
#else
    ctxInternal = (dcp_hash_ctx_internal_t *)(uint32_t)ctx;
#endif
    // code omitted...
}
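One way to read the change from 58 to 64 words: 64 words is 256 bytes, a whole number of 32-byte cache lines, and it still leaves room for the internal structure after the one-line offset applied in the code above. The arithmetic below only illustrates that reading. The 32-byte line size and the 47-word structure size are taken as assumptions, and the ctx_ helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE_BYTES 32u    /* assumed FSL_FEATURE_L1DCACHE_LINESIZE_BYTE */
#define INTERNAL_CTX_WORDS 47u /* size of dcp_hash_ctx_internal_t in words */

/* True when a context of ctxWords words is a whole number of cache lines. */
bool ctx_is_line_multiple(size_t ctxWords)
{
    return ((ctxWords * sizeof(uint32_t)) % LINE_SIZE_BYTES) == 0u;
}

/* True when the internal structure still fits after skipping one cache line. */
bool ctx_fits_after_offset(size_t ctxWords)
{
    size_t ctxBytes = ctxWords * sizeof(uint32_t);
    size_t needed = LINE_SIZE_BYTES + INTERNAL_CTX_WORDS * sizeof(uint32_t);
    return needed <= ctxBytes;
}
```

Under these assumptions, both 58 and 64 words leave enough room (220 bytes are needed), but only 64 words (256 bytes) is a multiple of the line size.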
Before the DCP engine is started on a block of message data, DCACHE_InvalidateByRange() is called on the ctxInternal space. The function that actually starts one DCP operation is dcp_hash_update(), which uses a variable of the dcp_work_packet_t structure type. This structure is also aligned to the L1 DCache line in the code:
/*! @brief DCP's work packet. */
typedef struct _dcp_work_packet
{
    uint32_t nextCmdAddress;
    uint32_t control0;
    uint32_t control1;
    uint32_t sourceBufferAddress;
    uint32_t destinationBufferAddress;
    uint32_t bufferSize;
    uint32_t payloadPointer;
    uint32_t status;
} dcp_work_packet_t;
#if defined(__DCACHE_PRESENT) && (__DCACHE_PRESENT == 1U) && defined(DCP_USE_DCACHE) && (DCP_USE_DCACHE == 1U)
static inline uint32_t *DCP_FindCacheLine(uint8_t *dcpWorkExt)
{
    while (0U != ((uint32_t)dcpWorkExt & ((uint32_t)FSL_FEATURE_L1DCACHE_LINESIZE_BYTE - 1U)))
    {
        dcpWorkExt++;
    }
    return (uint32_t *)(uint32_t)dcpWorkExt;
}
#endif
static status_t dcp_hash_update(DCP_Type *base, dcp_hash_ctx_internal_t *ctxInternal, const uint8_t *msg, size_t size)
{
    status_t completionStatus = kStatus_Fail;
    /* Use extended DCACHE line size aligned structure */
#if defined(__DCACHE_PRESENT) && (__DCACHE_PRESENT == 1U) && defined(DCP_USE_DCACHE) && (DCP_USE_DCACHE == 1U)
    dcp_work_packet_t *dcpWork;
    uint8_t dcpWorkExt[sizeof(dcp_work_packet_t) + FSL_FEATURE_L1DCACHE_LINESIZE_BYTE] = {0U};
    dcpWork = (dcp_work_packet_t *)(uint32_t)DCP_FindCacheLine(dcpWorkExt);
#else
    dcp_work_packet_t dcpWorkPacket = {0};
    dcp_work_packet_t *dcpWork = &dcpWorkPacket;
#endif
    do
    {
        completionStatus = dcp_hash_update_non_blocking(base, ctxInternal, dcpWork, msg, size);
    } while (completionStatus == (int32_t)kStatus_DCP_Again);
    completionStatus = DCP_WaitForChannelComplete(base, ctxInternal->handle);
    ctxInternal->ctrl0 = 0;
    return (completionStatus);
}
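The byte-by-byte loop in DCP_FindCacheLine() amounts to rounding an address up to the next line boundary. A minimal host-side sketch of the same over-allocate-then-align idea follows. align_up_to_line() and packet_fits() are hypothetical names, and the 32-byte line size is an assumption standing in for FSL_FEATURE_L1DCACHE_LINESIZE_BYTE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 32u /* assumed FSL_FEATURE_L1DCACHE_LINESIZE_BYTE */

/* Round a pointer up to the next cache-line boundary (no-op if already aligned).
 * The caller must make sure the raw buffer extends far enough past p. */
uint8_t *align_up_to_line(uint8_t *p)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t pad = (LINE_SIZE - (addr % LINE_SIZE)) % LINE_SIZE;
    return p + pad;
}

/* Check that an object of objSize bytes placed at the aligned address
 * still lies inside a raw buffer of rawSize bytes. */
bool packet_fits(uint8_t *raw, size_t rawSize, size_t objSize)
{
    uint8_t *aligned = align_up_to_line(raw);
    size_t skipped = (size_t)(aligned - raw);

    assert(skipped < LINE_SIZE);
    return (skipped + objSize) <= rawSize;
}
```

Because at most LINE_SIZE - 1 bytes are skipped, over-allocating by one full line, as dcpWorkExt does, always leaves room for the 32-byte work packet.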
That concludes this introduction to the special L1 D-Cache handling required when computing Hash values with the DCP engine inside the i.MX RT1xxx series. Where is the applause~~~
Welcome to subscribe
This article is published simultaneously on my Blog Park homepage, CSDN homepage, Zhihu homepage, and WeChat official account.
Search WeChat for “ruffian balance embedded” or scan the QR code below to read new articles on your phone as soon as they come out.