yanzhang
26667576aa
fix: x86 and android build failure
...
Change-Id: Ic4da99b67bcb4f785ef4493987398d5552b8f552
2025-08-05 16:17:00 +08:00
yanzhang
a52a884d6d
fix: failures for other platforms
...
Signed-off-by: yanzhang <yanzhang.wang@arm.com>
Change-Id: Id7b3da7f3d0c4ef88f4c48bfa87471eaab764973
2025-08-05 15:30:22 +08:00
yanzhang
f699ca2842
Replace the KleidiAI macro with the hint config
...
Signed-off-by: yanzhang <yanzhang.wang@arm.com>
Change-Id: I9dcc4f68e1e67a11266b66589006650b197cde1e
2025-08-04 15:58:03 +08:00
SixtyWang
380370fcf8
Fix bug in KleidiAIDenseConvolution, KleidiAIConvolution and QI4_SYM_CHNLQT_F32
...
- Corrected outputWidth calculation in KleidiAIDenseConvolution
- Fixed use-after-free due to late call to getPostParameters in KleidiAIConvolution
- Resolved SME symmetric quantization kernel problem
2025-07-31 16:56:51 +08:00
xiaying
433da8a041
MNN:Bugfix: Fix GeometryReverse not clearing the origin region
2025-07-30 12:03:40 +08:00
xiaying
db0f559f9d
MNN:Sync: Sync Internal 3.2.2
2025-07-23 14:33:57 +08:00
jxt1234
80a917a1b0
Merge pull request #3723 from jules-ai/feature/specify-cpu
...
new feature to allow user to specify cpu
2025-07-21 10:05:24 +08:00
Jules
c018eacc00
refactor cpuids setting from BackendConfig to HintMode
2025-07-18 09:42:59 +00:00
Jules
8907c96b4e
convert power to cpuIds
2025-07-17 12:45:54 +00:00
yanzhang
8e7a63d622
Add imatmul fp16 support for DenseConv
...
Change-Id: Ifb6972146a9c7013328fc75e30ab27c6e3d92d6a
Signed-off-by: yanzhang <yanzhang.wang@arm.com>
2025-07-16 18:24:21 +08:00
Jules
e69b0b5ce0
add windows preventing macro
2025-07-16 08:30:33 +00:00
Jules
19c97e3d15
remove unused variable initThreadNumber
2025-07-16 06:14:14 +00:00
Jules
110ae7f645
Implement cpuMask-based near-singleton global Thread Pool
2025-07-16 06:14:14 +00:00
Jules
e1ad3feeae
add MNNGetCPUMask
2025-07-16 05:42:17 +00:00
Jules
d87ebbe227
bind cpuIds
2025-07-16 05:41:41 +00:00
Jules
757020fcf3
add cpuIds validation
2025-07-16 05:41:34 +00:00
Jules
fa080f49e1
pass cpuIds to CPURuntime
2025-07-16 05:41:23 +00:00
yanzhang
4c9f48b76b
Minor fixes after rebase.
...
- Enable the fp32 1x1 impl now that it is correct.
- Remove the winograd memory test because KleidiAI does not support it.
Change-Id: Ia61ad8c251490e93a2a365020a6183a944e769b2
Signed-off-by: yanzhang <yanzhang.wang@arm.com>
2025-07-10 15:42:55 +08:00
yanzhang
5e3c8a3c12
Integrate the KleidiAI imatmul with fp32.
...
Note,
- No support for fp16 and int8 currently.
Signed-off-by: yanzhang <yanzhang.wang@arm.com>
Change-Id: If17c911977dd7eb0603f41d64b8ba879f468ab98
2025-07-10 15:42:19 +08:00
SixtyWang
b5b5845787
Refactor Kleidiai code to fix MNN unit test issues
...
- Refactor getInstance function
- Add 1x1 convolution check in canAccelerate
- Use NHWC as input/output format for Kleidiai and convert format in onExecute
- Remove KAI_CONV_NCHW_IN_OUT macro
- Fix SME build issue on M4
2025-07-08 16:22:23 +08:00
SixtyWang
da8b7337c4
Refactor the KleidiAI integration ConvInt8 code in MNN
2025-07-08 16:21:26 +08:00
Shuheng Deng
875814bfb9
Refactor the KleidiAI integration convolution code in MNN
...
Change-Id: I2e45fb7ec10793afffad8c25c305c1cb1e260753
2025-07-04 09:54:00 +08:00
王召德
5f35d621e1
Merge pull request #3670 from yanzhang-dev/features/upgrade-kai-1.9.0
...
Upgrade KleidiAI to v1.9.0
2025-07-02 17:54:46 +08:00
emil
fa3a03d6e3
Keep only the OpenCL context as a global variable and create it only once. Change OpenCLRuntime to create a new object each time
2025-06-30 11:13:51 +08:00
yanzhang
9a56fe323e
Upgrade KleidiAI to v1.9.0
...
Change-Id: Ia910d9c296aa1569e9e4449b56bb7614fe6c85e0
Signed-off-by: yanzhang <yanzhang.wang@arm.com>
2025-06-26 13:46:40 +08:00
Jules
38c15d853b
align OneDNNConvInt8 with updated function signature
2025-06-24 01:46:34 +00:00
xiaying
aeac75acbf
[MNN:Bugfix] Fix opencl execute llm decode error (issue 3623)
2025-06-20 14:41:23 +08:00
jxt1234
3db3cc904d
Merge pull request #3635 from alibaba/feature/sync
...
MNN:Sync: Sync Internal 3.2.1
2025-06-17 11:30:08 +08:00
xiaying
e5e7fccd99
MNN:Sync: Sync Internal 3.2.1
2025-06-17 11:08:21 +08:00
Jules
b9a22516bf
fix typo of operator
2025-06-17 02:10:14 +00:00
Jules
d30f99a2ae
remove unused variables
2025-06-17 02:06:28 +00:00
jxt1234
f9720f994e
Merge pull request #3588 from jules-ai/fix_int_offset_overflow
...
A proposal to fix address offset overflow caused by using int
2025-06-11 13:54:04 +08:00
xiaying
210e861650
MNN:Sync: Sync some bugfix
2025-06-09 09:51:46 +08:00
xiaying
b66ec85018
OpenCL:Bugfix: Fix recordable bug for llm
2025-06-06 08:48:18 +08:00
xiaying
09b6cd9862
MNN:Sync: Sync 3.2.0
2025-06-05 15:15:29 +08:00
Jules
7a08d2e2ee
Fix address offset overflow caused by using int
2025-05-30 10:38:23 +00:00
jxt1234
97223628bc
Merge pull request #3569 from linlinxu-1989/sme_fp16_fp32_new
...
Bump MNN KleidiAI ukernel to fp16_sme2 fp32_sme2 ukernel
2025-05-27 15:37:07 +08:00
jxt1234
c0dde12d89
Merge pull request #3564 from tumuyan/patch-1
...
fix MNN_SUPPORT_RENDER OFF
2025-05-27 15:07:56 +08:00
jxt1234
eef4a726cd
Merge pull request #3546 from HenryDen/buf_fix
...
Bug fixes for the KleidiAI integration of MNN
2025-05-27 14:54:37 +08:00
linlin.xu
556acbfea2
Add macro for kleidiai only resource
2025-05-27 14:46:50 +08:00
linlin.xu
a8f2d34b15
Bump MNN KleidiAI ukernel to fp16_sme2 fp32_sme2 ukernel
...
Add fp16 and fp32 sme2 kernel
1. Add rhs pack nxk kernel lhs pack kernel
2. Add fp16 GEMM kernel and GEMV kernel
3. Add fp32 GEMM kernel and GEMV kernel
2025-05-27 10:21:24 +08:00
tumuyan
caf73d6859
fix MNN_SUPPORT_RENDER OFF
2025-05-24 10:42:48 +08:00
Jules
03db1c2769
fix blockSize==1 assertion for DepthToSpace
2025-05-23 19:26:18 +08:00
zhaode.wzd
bd36a3f749
[MNN:Sync] Sync internal:
...
1. SmolVLM, FastVLM support.
2. QNN backend init.
3. Qwen3 MoE support.
4. Speculative decoding init.
5. Some bugfix.
2025-05-23 15:24:18 +08:00
Shuheng Deng
986a065fff
Bug fixes for the KleidiAI integration of MNN
2025-05-22 17:35:57 +08:00
xiaying
d2477b3091
MNN:Bugfix: Fix bug for ios Compile error
2025-05-09 14:19:41 +08:00
xiaying
e457f0ce33
OpenCL:Bugfix: Fix bug for llm bench opencl crash
2025-05-09 14:12:58 +08:00
zhaode.wzd
a019d971ad
[MNN:Sync] Sync Internal 3.1.4.
2025-05-08 12:39:44 +08:00
xiaying
ba766b8bea
LLM:Feature, OpenCL:Bugfix: Sync internal bugfix (support llm prompt dynamic setting, fix bug for opencl softmax in nvidia error)
2025-04-30 18:00:38 +08:00
jxt1234
f65ff90302
Merge pull request #3397 from xhzheng1895/kai_1.7.0_integration
...
Integrate Asym int4 ukernels to MNN
2025-04-30 11:15:41 +08:00