xiaying
7fde7c7079
MNN:Sync: Sync Internal 3.2.4
2025-09-22 23:05:26 +08:00
xiaying
318a3de860
MNN:Sync: Sync Internal 3.2.3
2025-08-22 18:04:08 +08:00
Shuheng Deng
b9087ffc15
Integrate KleidiAI SME2 Q4 asymmetric kernels into MNN
...
Change-Id: I9413ea48d0a6de0a4077150fb93ce1da52a36d76
2025-08-19 10:24:15 +08:00
zhaode.wzd
55a59a7ebc
[MNN:Sync] Sync Internal reranker, gpt-oss.
2025-08-08 12:24:23 +08:00
yanzhang
a52a884d6d
fix: failures for other platforms
...
Signed-off-by: yanzhang <yanzhang.wang@arm.com>
Change-Id: Id7b3da7f3d0c4ef88f4c48bfa87471eaab764973
2025-08-05 15:30:22 +08:00
yanzhang
f699ca2842
Replace the KleidiAI macro with the hint config
...
Signed-off-by: yanzhang <yanzhang.wang@arm.com>
Change-Id: I9dcc4f68e1e67a11266b66589006650b197cde1e
2025-08-04 15:58:03 +08:00
SixtyWang
380370fcf8
Fix bug in KleidiAIDenseConvolution, KleidiAIConvolution and QI4_SYM_CHNLQT_F32
...
- Corrected outputWidth calculation in KleidiAIDenseConvolution
- Fixed use-after-free due to late call to getPostParameters in KleidiAIConvolution
- Resolved the SME symmetric quantization kernel problem
2025-07-31 16:56:51 +08:00
xiaying
db0f559f9d
MNN:Sync: Sync Internal 3.2.2
2025-07-23 14:33:57 +08:00
yanzhang
8e7a63d622
Add imatmul fp16 support for DenseConv
...
Change-Id: Ifb6972146a9c7013328fc75e30ab27c6e3d92d6a
Signed-off-by: yanzhang <yanzhang.wang@arm.com>
2025-07-16 18:24:21 +08:00
yanzhang
4c9f48b76b
Minor fixes after rebase.
...
- Enable the fp32 1x1 implementation now that it is correct.
- Remove the Winograd memory test because KleidiAI does not support it.
Change-Id: Ia61ad8c251490e93a2a365020a6183a944e769b2
Signed-off-by: yanzhang <yanzhang.wang@arm.com>
2025-07-10 15:42:55 +08:00
yanzhang
5e3c8a3c12
Integrate the KleidiAI imatmul with fp32.
...
Note,
- No support for fp16 and int8 currently.
Signed-off-by: yanzhang <yanzhang.wang@arm.com>
Change-Id: If17c911977dd7eb0603f41d64b8ba879f468ab98
2025-07-10 15:42:19 +08:00
SixtyWang
b5b5845787
Refactor KleidiAI code to fix MNN unit test issues
...
- Refactor getInstance function
- Add 1x1 convolution check in canAccelerate
- Use NHWC as the input/output format for KleidiAI and convert the format in onExecute
- Remove KAI_CONV_NCHW_IN_OUT macro
- Fix SME build issue on M4
2025-07-08 16:22:23 +08:00
SixtyWang
da8b7337c4
Refactor the KleidiAI integration ConvInt8 code in MNN
2025-07-08 16:21:26 +08:00
Shuheng Deng
875814bfb9
Refactor the KleidiAI integration convolution code in MNN
...
Change-Id: I2e45fb7ec10793afffad8c25c305c1cb1e260753
2025-07-04 09:54:00 +08:00
Jules
7a08d2e2ee
Fix address offset overflow caused by using int
2025-05-30 10:38:23 +00:00
jxt1234
97223628bc
Merge pull request #3569 from linlinxu-1989/sme_fp16_fp32_new
...
Bump MNN KleidiAI ukernel to fp16_sme2 fp32_sme2 ukernel
2025-05-27 15:37:07 +08:00
jxt1234
eef4a726cd
Merge pull request #3546 from HenryDen/buf_fix
...
Bug fixes for the KleidiAI integration in MNN
2025-05-27 14:54:37 +08:00
linlin.xu
556acbfea2
Add macro for kleidiai only resource
2025-05-27 14:46:50 +08:00
linlin.xu
a8f2d34b15
Bump MNN KleidiAI ukernel to fp16_sme2 fp32_sme2 ukernel
...
Add fp16 and fp32 sme2 kernel
1. Add rhs pack nxk kernel and lhs pack kernel
2. Add fp16 GEMM kernel and GEMV kernel
3. Add fp32 GEMM kernel and GEMV kernel
2025-05-27 10:21:24 +08:00
zhaode.wzd
bd36a3f749
[MNN:Sync] Sync internal:
...
1. SmolVLM, FastVLM support.
2. QNN backend init.
3. Qwen3 MoE support.
4. Speculative decoding init.
5. Some bugfix.
2025-05-23 15:24:18 +08:00
Shuheng Deng
986a065fff
Bug fixes for the KleidiAI integration in MNN
2025-05-22 17:35:57 +08:00
xiaying
d2477b3091
MNN:Bugfix: Fix bug for ios Compile error
2025-05-09 14:19:41 +08:00
zhaode.wzd
a019d971ad
[MNN:Sync] Sync Internal 3.1.4.
2025-05-08 12:39:44 +08:00
jxt1234
f65ff90302
Merge pull request #3397 from xhzheng1895/kai_1.7.0_integration
...
Integrate Asym int4 ukernels to MNN
2025-04-30 11:15:41 +08:00
xiaying
0769b81b58
MNN:Sync: Sync Internal 3.1.3
2025-04-28 11:50:24 +08:00
xinhao.zheng
9471946f9a
Refine some code to avoid redundant data reorder.
2025-04-27 15:57:31 +08:00
xinhao.zheng
c043aed737
Fix thread number calculation bug
2025-04-25 08:27:01 +08:00
xinhao.zheng
e376109335
Integrate Asym int4 ukernels to MNN
...
Integrate two kernels, matmul_clamp_f16_qsi8d32p_qai4c32p
and matmul_clamp_f32_qsi8d32p_qai4c32p. KleidiAI now supports
asymmetric, block-wise quantization with f32/f16 activations.
Remove the .tar.gz package from the source tree. It is now downloaded from
https://gitlab.arm.com/kleidi/kleidiai/-/releases when CMake runs.
2025-04-24 08:32:00 +08:00
Xi Guo
b41edfc29e
Merge branch 'master' into build-qnx
2025-03-12 13:07:45 +08:00
xiaying
c0247c6998
MNN:Sync: Sync Internal 3.1.1
2025-03-12 11:35:16 +08:00
Xi Guo
db0311da39
feat: add compatibility support for QNX Neutrino
2025-03-10 19:14:25 +08:00
xiaying
3b6ddc0341
MNN:Sync: Sync Internal 3.0.5
2025-02-12 11:14:19 +08:00
xinhao.zheng
332912cb6b
Integrate KleidiAI sme int4 kernel
...
Add logic to select micro-kernel functions when SME2 is enabled.
The thread count is forced to 1 when running matmul, for better
energy efficiency.
2025-02-11 14:23:54 +08:00
xhzheng1895
444816b145
Merge branch 'alibaba:master' into mnn_kai_interface_refactor
2025-02-11 10:21:56 +08:00
xiaying
766815282f
MNN:Sync: Sync Internal 3.0.4
2025-01-22 16:28:36 +08:00
xinhao.zheng
73344cd4d0
Refine some functions.
2025-01-09 09:58:26 +08:00
xinhao.zheng
82efb15c3f
Refactor MNN KleidiAI interface
...
Refactor the MNN_KleidiAI interface to support more model types
and to ease the integration of subsequent KleidiAI ukernels.
Re-abstract the information stored in class KleidiAI:
1) static info: not related to the loaded model; initialized when
the interface is constructed and never changed.
2) status: changes while the pipeline is running.
Decouple the interface from the loaded model to support more complex
mixes of multiple model types. Add mAccelType to the MNN data structure;
the KleidiAI interface relies on this type to decide which branch
to take.
Move some pack functions to mnn_kleidiai_util.cpp.
Add CPU feature detection in source/backend/cpu/CPURuntime.hpp,
because subsequent ukernels need SME information.
2025-01-07 10:18:20 +08:00
jxt1234
d6e10c25cb
Merge pull request #3145 from alibaba/feature/sync
...
MNN:Sync: sync internal 3.0.3
2024-12-31 17:20:33 +08:00
xiaying
27da5e7d48
MNN:Sync: sync internal 3.0.3
2024-12-31 15:34:41 +08:00
kekxv
0f1f18ae5e
fix: ssize_t undefined on older gcc versions
...
Add #include <sys/types.h> to fix ssize_t being undefined on older gcc versions
2024-12-30 09:13:43 +08:00
xiaying
da4023c222
MNN:Sync: Sync Internal 3.0.2
2024-12-19 20:34:17 +08:00
xiaying
809bff1b30
MNN:Sync: Sync Internal 3.0.1
2024-12-02 10:12:08 +08:00
xiaying
5b901d9d87
MNN:Sync: Sync Internal 3.0.0
2024-11-18 14:40:27 +08:00
yanxing
8f6a1234ae
add acthalf and blockwise condition in canAccelerate.
2024-10-28 17:11:28 +08:00
xinhao.zheng
28115248d7
Refine rhs pack
2024-10-24 14:07:29 +08:00
xinhao.zheng
39dadd08d2
Refine some code
2024-10-22 15:11:21 +08:00
xinhao.zheng
6f5be724a9
Update MNN to latest version
2024-10-22 14:47:33 +08:00
xinhao.zheng
644f22f028
Update mnn_kleidiai interface.
2024-10-22 14:30:07 +08:00
王召德
95a6e4190a
Bugfix of thread workload.
2024-10-22 14:29:55 +08:00
xinhao.zheng
f075372c14
Integrate kleidiAI release v0.1.0 into MNN 2.9.3
...
Put the KleidiAI files in the folder source/backend/cpu/arm/kleidiAI/kai.
They are downloaded from Arm's GitLab and left unchanged. These files may
later be removed and downloaded at build time instead.
MNNKleidiAI.cpp is the interface between MNN and KleidiAI.
Rewrite a function in class DenseConvInt8TiledExecutor,
in ConvInt8TiledExecutor.cpp, to call KleidiAI functions.
A new execution may be implemented later.
The changes to GeometryConvUtils.cpp and ShapeTensorConvert.cpp make
the input and output of DenseConvInt8TiledExecutor NCHW
rather than NC4HW4, to avoid redundant pack/unpack and get better
performance.
2024-10-22 14:29:13 +08:00