This article walks through techniques for using NEON to accelerate algorithms in Linux kernel mode, with a detailed analysis based on illustrated examples.
ARM processors have integrated a NEON processing unit since the Cortex series. The unit can be thought of, loosely, as a coprocessor designed for matrix-style computation. It is especially well suited to image, video, and audio processing and is widely used.
The article first gives a brief introduction to the NEON unit, then describes how to use NEON in kernel mode, and finally illustrates this with examples.
1. Introduction to NEON
The best reference is the official documentation, the Cortex™-A Series Programmer's Guide; the description below is taken from that guide.
1.1 SIMD
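NEON uses a SIMD (single instruction, multiple data) architecture: one instruction processes multiple data elements. In NEON the number of elements per instruction can be large and the layout is flexible (elements of 8, 16, or 32 bits, several per register), which is where its advantage lies.
In the figure's example, the APU needs at least four instructions to complete the additions, whereas NEON needs only one; once the loads and stores are counted, the instruction savings are even greater.
These properties make NEON particularly well suited to block data, images, video, and audio.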
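The four-additions-versus-one comparison can be sketched with intrinsics. This is a minimal user-space illustration, not kernel code: the helper name `add_i32` and the `USE_NEON` macro are assumptions of this sketch, and the scalar branch exists only so that the file also compiles on hosts without NEON.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define USE_NEON 1
#else
#define USE_NEON 0
#endif

/* Hypothetical helper: out[i] = a[i] + b[i]; n must be a multiple of 4. */
void add_i32(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
#if USE_NEON
        int32x4_t va = vld1q_s32(a + i);       /* load four 32-bit lanes */
        int32x4_t vb = vld1q_s32(b + i);
        vst1q_s32(out + i, vaddq_s32(va, vb)); /* one vector add covers all four */
#else
        /* Scalar fallback: four separate additions per group. */
        for (size_t k = 0; k < 4; k++)
            out[i + k] = a[i + k] + b[i + k];
#endif
    }
}
```

Calling `add_i32(a, b, out, 4)` on two four-element arrays performs the whole group of additions with a single vector add on a NEON build.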
1.2 NEON architecture overview
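NEON is also a load/store architecture. Its registers are 64 or 128 bits wide and hold vectors of data, and it provides a number of instructions designed for vector operations.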
1.2.1 commonality with VFP
1.2.2 data type
The data type is indicated in the instruction itself, for example VMLAL.S8, where the .S8 suffix denotes signed 8-bit elements.
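As a hedged illustration of how the type suffix carries over to C, the intrinsic `vmlal_s8` corresponds to VMLAL.S8: it multiplies signed 8-bit lanes and accumulates into 16-bit lanes. The helper name `mla_s8` and the `USE_NEON` macro are assumptions of this sketch, and the scalar branch is only a portability fallback.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define USE_NEON 1
#else
#define USE_NEON 0
#endif

/* Hypothetical helper: acc[i] += x[i] * y[i], widening 8-bit inputs to 16 bits. */
void mla_s8(int16_t acc[8], const int8_t x[8], const int8_t y[8])
{
#if USE_NEON
    int16x8_t vacc = vld1q_s16(acc);               /* eight 16-bit accumulators */
    vacc = vmlal_s8(vacc, vld1_s8(x), vld1_s8(y)); /* signed 8-bit multiply, 16-bit accumulate */
    vst1q_s16(acc, vacc);
#else
    /* Scalar fallback with the same widening behaviour. */
    for (int i = 0; i < 8; i++)
        acc[i] = (int16_t)(acc[i] + (int16_t)x[i] * (int16_t)y[i]);
#endif
}
```

The products of two 8-bit values generally do not fit in 8 bits, which is why the accumulator lanes are twice as wide as the input lanes.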
1.2.3 registers
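There are 32 64-bit registers, D0 to D31, which can also be viewed as 16 128-bit registers, Q0 to Q15. The register bank is shared with VFP.
The elements inside a register are 8, 16, or 32 bits wide and can be configured flexibly as needed.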
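The relationship between 64-bit and 128-bit vectors can be sketched in C with the split and combine intrinsics. The helper name `fold_halves` and the `USE_NEON` macro are assumptions of this sketch; which physical D or Q registers are used is decided by the compiler, and the scalar branch is only a portability fallback.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define USE_NEON 1
#else
#define USE_NEON 0
#endif

/*
 * Hypothetical helper: treat 16 bytes as one 128-bit vector, split it into
 * two 64-bit halves, add the halves lane by lane, and rebuild a 128-bit
 * vector with the halves swapped.
 */
void fold_halves(const uint8_t in[16], uint8_t sum[8], uint8_t swapped[16])
{
#if USE_NEON
    uint8x16_t q = vld1q_u8(in);             /* 128-bit vector: 16 x 8-bit lanes */
    uint8x8_t lo = vget_low_u8(q);           /* lower 64-bit half */
    uint8x8_t hi = vget_high_u8(q);          /* upper 64-bit half */
    vst1_u8(sum, vadd_u8(lo, hi));           /* lane-wise add, wraps modulo 256 */
    vst1q_u8(swapped, vcombine_u8(hi, lo));  /* two 64-bit halves form one 128-bit vector */
#else
    for (int i = 0; i < 8; i++) {
        sum[i] = (uint8_t)(in[i] + in[i + 8]);
        swapped[i] = in[i + 8];
        swapped[i + 8] = in[i];
    }
#endif
}
```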
NEON instructions come in Normal, Long, Wide, Narrow, and Saturating variants, distinguished by the types of the source and destination registers that an operation uses.
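A small sketch may make the variants concrete by applying each one to the same unsigned 8-bit inputs. The struct `variant_sums`, the helper `add_variants`, and the `USE_NEON` macro are assumptions of this sketch; the scalar branch mirrors the same arithmetic for hosts without NEON.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define USE_NEON 1
#else
#define USE_NEON 0
#endif

/* Hypothetical result holder, one array per instruction variant. */
struct variant_sums {
    uint8_t  normal[8]; /* 8-bit + 8-bit -> 8-bit, wraps            */
    uint16_t lng[8];    /* long: 8-bit + 8-bit -> 16-bit            */
    uint16_t wide[8];   /* wide: 16-bit + 8-bit -> 16-bit           */
    uint8_t  narrow[8]; /* narrow: 16-bit -> 8-bit, keeps low byte  */
    uint8_t  sat[8];    /* saturating: clamps at 255                */
};

void add_variants(const uint8_t a[8], const uint8_t b[8], struct variant_sums *r)
{
#if USE_NEON
    uint8x8_t va = vld1_u8(a), vb = vld1_u8(b);
    uint16x8_t vl = vaddl_u8(va, vb);   /* long */
    uint16x8_t vw = vaddw_u8(vl, vb);   /* wide: adds b again to the 16-bit sum */
    vst1_u8(r->normal, vadd_u8(va, vb));
    vst1q_u16(r->lng, vl);
    vst1q_u16(r->wide, vw);
    vst1_u8(r->narrow, vmovn_u16(vw));  /* narrow the wide result */
    vst1_u8(r->sat, vqadd_u8(va, vb));
#else
    for (int i = 0; i < 8; i++) {
        unsigned s = (unsigned)a[i] + b[i];
        r->normal[i] = (uint8_t)s;
        r->lng[i] = (uint16_t)s;
        r->wide[i] = (uint16_t)(s + b[i]);
        r->narrow[i] = (uint8_t)r->wide[i];
        r->sat[i] = s > 255 ? 255 : (uint8_t)s;
    }
#endif
}
```

With a lane of 200 + 100, the normal form wraps to 44, the long form keeps 300, and the saturating form clamps to 255.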
1.2.4 instruction set
1.3 Overview of NEON instruction categories
There are many instructions; see the Cortex™-A Series Programmer's Guide for details. They can be broadly divided into:
NEON general data processing instructions
NEON shift instructions
NEON logical and compare operations
NEON arithmetic instructions
NEON multiply instructions
NEON load and store element and structure instructions
NEON and VFP pseudo-instructions (section B.8 of the guide)
The instructions in each category are listed briefly below.
Left shifts do not rotate, and a left shift by a negative amount is treated as a right shift.
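The negative-shift rule can be sketched with the `vshlq_u32` intrinsic, which takes a signed per-lane shift count. The helper name `shift_lanes` and the `USE_NEON` macro are assumptions of this sketch, and shift counts are restricted to the range -31 to 31 so that the scalar fallback stays well defined.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define USE_NEON 1
#else
#define USE_NEON 0
#endif

/* Hypothetical helper: shift each lane of x left by s[i]; negative s[i] shifts right. */
void shift_lanes(const uint32_t x[4], const int32_t s[4], uint32_t out[4])
{
    for (int i = 0; i < 4; i++)
        assert(s[i] > -32 && s[i] < 32);
#if USE_NEON
    /* Per-lane signed shift counts: positive shifts left, negative shifts right. */
    vst1q_u32(out, vshlq_u32(vld1q_u32(x), vld1q_s32(s)));
#else
    for (int i = 0; i < 4; i++)
        out[i] = s[i] >= 0 ? x[i] << s[i] : x[i] >> -s[i];
#endif
}
```

For unsigned lanes the right shift is logical, so bits shifted out are discarded and zeros are shifted in.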
The load and store instructions are harder to understand, so they deserve some explanation.
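One aspect of the structure loads that often causes confusion is de-interleaving. As a minimal sketch, `vld2_s16` reads eight interleaved 16-bit values and returns them split into two vectors of even and odd elements. The stereo-audio framing, the helper name `split_stereo`, and the `USE_NEON` macro are assumptions of this sketch rather than anything taken from the original article.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define USE_NEON 1
#else
#define USE_NEON 0
#endif

/*
 * Hypothetical helper: split four interleaved stereo frames
 * (L0 R0 L1 R1 L2 R2 L3 R3) into separate left and right arrays.
 */
void split_stereo(const int16_t interleaved[8], int16_t left[4], int16_t right[4])
{
#if USE_NEON
    int16x4x2_t lr = vld2_s16(interleaved); /* val[0] = even elements, val[1] = odd */
    vst1_s16(left, lr.val[0]);
    vst1_s16(right, lr.val[1]);
#else
    for (int i = 0; i < 4; i++) {
        left[i] = interleaved[2 * i];
        right[i] = interleaved[2 * i + 1];
    }
#endif
}
```

A plain `vld1` load would keep the elements in memory order; the `vld2` form does the separation as part of the load.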
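1.4 Using NEON
1.4.1 Ways to use NEON
NEON can be used in several ways:
Automatic vectorization of C code by the compiler. This requires extra compiler options, and the C code has to follow a number of rules. In the author's view the outcome is too unpredictable for this approach to be of much practical value.
NEON assembly. This is feasible; assembly is somewhat more complex, but it is worth the effort for core algorithms.
Intrinsics. Compilers such as gcc and armcc provide inline functions that correspond to NEON instructions and can be called directly from C; when disassembled, these calls turn into the corresponding NEON instructions. This approach is practical in a C environment and relatively simple, and it is the one used for the detailed discussion in the rest of this article.
1.4.2 NEON data types in C
Include the arm_neon.h header, which is located in the gcc directory. All of the types it defines are vector types.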
typedef __builtin_neon_qi int8x8_t __attribute__ ((__vector_size__ (8)));
typedef __builtin_neon_hi int16x4_t __attribute__ ((__vector_size__ (8)));
typedef __builtin_neon_si int32x2_t __attribute__ ((__vector_size__ (8)));
typedef __builtin_neon_di int64x1_t;
typedef __builtin_neon_sf float32x2_t __attribute__ ((__vector_size__ (8)));
typedef __builtin_neon_poly8 poly8x8_t __attribute__ ((__vector_size__ (8)));
typedef __builtin_neon_poly16 poly16x4_t __attribute__ ((__vector_size__ (8)));
typedef __builtin_neon_uqi uint8x8_t __attribute__ ((__vector_size__ (8)));
typedef __builtin_neon_uhi uint16x4_t __attribute__ ((__vector_size__ (8)));
typedef __builtin_neon_usi uint32x2_t __attribute__ ((__vector_size__ (8)));
typedef __builtin_neon_udi uint64x1_t;
typedef __builtin_neon_qi int8x16_t __attribute__ ((__vector_size__ (16)));
typedef __builtin_neon_hi int16x8_t __attribute__ ((__vector_size__ (16)));
typedef __builtin_neon_si int32x4_t __attribute__ ((__vector_size__ (16)));
typedef __builtin_neon_di int64x2_t __attribute__ ((__vector_size__ (16)));
typedef __builtin_neon_sf float32x4_t __attribute__ ((__vector_size__ (16)));
typedef __builtin_neon_poly8 poly8x16_t __attribute__ ((__vector_size__ (16)));
typedef __builtin_neon_poly16 poly16x8_t __attribute__ ((__vector_size__ (16)));
typedef __builtin_neon_uqi uint8x16_t __attribute__ ((__vector_size__ (16)));
typedef __builtin_neon_uhi uint16x8_t __attribute__ ((__vector_size__ (16)));
typedef __builtin_neon_usi uint32x4_t __attribute__ ((__vector_size__ (16)));
typedef __builtin_neon_udi uint64x2_t __attribute__ ((__vector_size__ (16)));
typedef float float32_t;
typedef __builtin_neon_poly8 poly8_t;
typedef __builtin_neon_poly16 poly16_t;
typedef struct int8x8x2_t { int8x8_t val[2]; } int8x8x2_t;
typedef struct int8x16x2_t { int8x16_t val[2]; } int8x16x2_t;
typedef struct int16x4x2_t { int16x4_t val[2]; } int16x4x2_t;
typedef struct int16x8x2_t { int16x8_t val[2]; } int16x8x2_t;
typedef struct int32x2x2_t { int32x2_t val[2]; } int32x2x2_t;
typedef struct int32x4x2_t { int32x4_t val[2]; } int32x4x2_t;
typedef struct int64x1x2_t { int64x1_t val[2]; } int64x1x2_t;
typedef struct int64x2x2_t { int64x2_t val[2]; } int64x2x2_t;
typedef struct uint8x8x2_t { uint8x8_t val[2]; } uint8x8x2_t;
typedef struct uint8x16x2_t { uint8x16_t val[2]; } uint8x16x2_t;
typedef struct uint16x4x2_t { uint16x4_t val[2]; } uint16x4x2_t;
typedef struct uint16x8x2_t { uint16x8_t val[2]; } uint16x8x2_t;
typedef struct uint32x2x2_t { uint32x2_t val[2]; } uint32x2x2_t;
typedef struct uint32x4x2_t { uint32x4_t val[2]; } uint32x4x2_t;
typedef struct uint64x1x2_t { uint64x1_t val[2]; } uint64x1x2_t;
typedef struct uint64x2x2_t { uint64x2_t val[2]; } uint64x2x2_t;
typedef struct float32x2x2_t { float32x2_t val[2]; } float32x2x2_t;
typedef struct float32x4x2_t { float32x4_t val[2]; } float32x4x2_t;
typedef struct poly8x8x2_t { poly8x8_t val[2]; } poly8x8x2_t;
typedef struct poly8x16x2_t { poly8x16_t val[2]; } poly8x16x2_t;
typedef struct poly16x4x2_t { poly16x4_t val[2]; } poly16x4x2_t;
typedef struct poly16x8x2_t { poly16x8_t val[2]; } poly16x8x2_t;
typedef struct int8x8x3_t { int8x8_t val[3]; } int8x8x3_t;
typedef struct int8x16x3_t { int8x16_t val[3]; } int8x16x3_t;
typedef struct int16x4x3_t { int16x4_t val[3]; } int16x4x3_t;
typedef struct int16x8x3_t { int16x8_t val[3]; } int16x8x3_t;
typedef struct int32x2x3_t { int32x2_t val[3]; } int32x2x3_t;
typedef struct int32x4x3_t { int32x4_t val[3]; } int32x4x3_t;
typedef struct int64x1x3_t { int64x1_t val[3]; } int64x1x3_t;
typedef struct int64x2x3_t { int64x2_t val[3]; } int64x2x3_t;
typedef struct uint8x8x3_t { uint8x8_t val[3]; } uint8x8x3_t;
typedef struct uint8x16x3_t { uint8x16_t val[3]; } uint8x16x3_t;
typedef struct uint16x4x3_t { uint16x4_t val[3]; } uint16x4x3_t;
typedef struct uint16x8x3_t { uint16x8_t val[3]; } uint16x8x3_t;
typedef struct uint32x2x3_t { uint32x2_t val[3]; } uint32x2x3_t;
typedef struct uint32x4x3_t { uint32x4_t val[3]; } uint32x4x3_t;
typedef struct uint64x1x3_t { uint64x1_t val[3]; } uint64x1x3_t;
typedef struct uint64x2x3_t { uint64x2_t val[3]; } uint64x2x3_t;
typedef struct float32x2x3_t { float32x2_t val[3]; } float32x2x3_t;
typedef struct float32x4x3_t { float32x4_t val[3]; } float32x4x3_t;
typedef struct poly8x8x3_t { poly8x8_t val[3]; } poly8x8x3_t;
typedef struct poly8x16x3_t { poly8x16_t val[3]; } poly8x16x3_t;
typedef struct poly16x4x3_t { poly16x4_t val[3]; } poly16x4x3_t;
typedef struct poly16x8x3_t { poly16x8_t val[3]; } poly16x8x3_t;
typedef struct int8x8x4_t { int8x8_t val[4]; } int8x8x4_t;
typedef struct int8x16x4_t { int8x16_t val[4]; } int8x16x4_t;
typedef struct int16x4x4_t { int16x4_t val[4]; } int16x4x4_t;
typedef struct int16x8x4_t { int16x8_t val[4]; } int16x8x4_t;
typedef struct int32x2x4_t { int32x2_t val[4]; } int32x2x4_t;
typedef struct int32x4x4_t { int32x4_t val[4]; } int32x4x4_t;
typedef struct int64x1x4_t { int64x1_t val[4]; } int64x1x4_t;
typedef struct int64x2x4_t { int64x2_t val[4]; } int64x2x4_t;
typedef struct uint8x8x4_t { uint8x8_t val[4]; } uint8x8x4_t;
typedef struct uint8x16x4_t { uint8x16_t val[4]; } uint8x16x4_t;
typedef struct uint16x4x4_t { uint16x4_t val[4]; } uint16x4x4_t;
typedef struct uint16x8x4_t { uint16x8_t val[4]; } uint16x8x4_t;
typedef struct uint32x2x4_t { uint32x2_t val[4]; } uint32x2x4_t;
typedef struct uint32x4x4_t { uint32x4_t val[4]; } uint32x4x4_t;
typedef struct uint64x1x4_t { uint64x1_t val[4]; } uint64x1x4_t;
typedef struct uint64x2x4_t { uint64x2_t val[4]; } uint64x2x4_t;
typedef struct float32x2x4_t { float32x2_t val[4]; } float32x2x4_t;
typedef struct float32x4x4_t { float32x4_t val[4]; } float32x4x4_t;
typedef struct poly8x8x4_t { poly8x8_t val[4]; } poly8x8x4_t;
typedef struct poly8x16x4_t { poly8x16_t val[4]; } poly8x16x4_t;
typedef struct poly16x4x4_t { poly16x4_t val[4]; } poly16x4x4_t;
typedef struct poly16x8x4_t { poly16x8_t val[4]; } poly16x8x4_t;
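The struct types in this list bundle several vectors in a `val[]` array and are what the multi-vector load and store intrinsics use. As a hedged sketch, `uint8x8x3_t` can hold the three channels of eight packed RGB pixels. The helper name `rgb_to_bgr8` and the `USE_NEON` macro are assumptions of this sketch, and the scalar branch is only a portability fallback.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define USE_NEON 1
#else
#define USE_NEON 0
#endif

/* Hypothetical helper: convert 8 packed RGB pixels (24 bytes) to BGR order. */
void rgb_to_bgr8(const uint8_t src[24], uint8_t dst[24])
{
#if USE_NEON
    uint8x8x3_t px = vld3_u8(src); /* val[0] = R lanes, val[1] = G, val[2] = B */
    uint8x8_t r = px.val[0];
    px.val[0] = px.val[2];         /* swap the R and B vectors */
    px.val[2] = r;
    vst3_u8(dst, px);              /* re-interleave on store */
#else
    for (int i = 0; i < 8; i++) {
        dst[3 * i] = src[3 * i + 2];
        dst[3 * i + 1] = src[3 * i + 1];
        dst[3 * i + 2] = src[3 * i];
    }
#endif
}
```

Each member of `val[]` is an ordinary `uint8x8_t`, so it can be passed to any intrinsic that accepts that type.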