
207 Commits

Author SHA1 Message Date
xiaying d13f1bc0b6 Optimize Strassen Merge C Function for x86 2020-07-04 01:06:18 +08:00
xiaying 1567e74e40 Optimize NHWCToNC4HW4 and NC4HW4ToNHWC 2020-07-04 01:06:18 +08:00
xiaying d88cde6237 Use strassen for Convolution1x1Strassen 2020-07-04 01:06:18 +08:00
xiaying ae91cab1b8 Support Strassen for new matmul 2020-07-04 01:06:18 +08:00
xiaying ac15e9fcec Support get pack mode for each platform 2020-07-04 01:06:18 +08:00
houjiang 91c70ba559 Fix cpu binary. 2020-07-04 01:06:18 +08:00
xiaying 5fc7acd37e Support transpose, fix bug for the non-aligned case 2020-07-04 01:06:18 +08:00
xiaying a3a43a6d9b Add prefetch for _AVX_MNNGemm16x6.S, from 31 ms -> 29 ms 2020-07-04 01:06:18 +08:00
xiaying 3dc9cbb740 Optimize CPUMatMul for x86 avx256 by 16x6: 540, 320, 540 from 3.8 ms -> 2.5 ms; 1024, 1024, 1024 from 39 ms -> 31 ms 2020-07-04 01:06:18 +08:00
xiaying cf0896e71a Add 16x6 GEMM 2020-07-04 01:06:17 +08:00
xiaying 8ea506cb57 Add asm for _AVX_MNNGemmFloatUnit_4, 1024x1024x1024 from 39 ms -> 37 ms 2020-07-04 01:06:17 +08:00
xiaying 8bce0519af Use AVX to optimize MNNMatrixAdd, but it is slower on Mac 2020-07-04 01:06:17 +08:00
xiaying 976e6d0e6f Use ASM MNNMatrixAdd instead of C 2020-07-04 01:06:17 +08:00
jxt1234 e9cde2ffe4
Merge pull request #945 from krayzemli/fix_BufferAllocator
Don't increase reference count when extracting a block from a non-splittable free list
2020-07-03 11:11:59 +08:00
jxt1234 47af4892d0
Merge pull request #941 from krayzemli/fix_CpuQuantizedAdd
Fix out-of-bounds access in CPUQuantizedAdd::onExecute
2020-07-03 10:29:59 +08:00
jxt1234 234f423e54
Merge pull request #942 from krayzemli/fix_CPUQuantizedLogistic
Fix CPUQuantizedLogistic::onExecute access to the model which could have been released
2020-07-03 10:07:11 +08:00
Roman Maltsev b750d419b8 Fix memory leak in CPUDetectionPostProcess 2020-07-02 17:45:32 +07:00
Roman Maltsev 36bd8f1a35 Fix CPUQuantizedLogistic::onExecute access to the model which could have been released 2020-07-02 17:06:58 +07:00
Roman Maltsev 2d3d4a2242 Don't increase reference count when extracting a block from a non-splittable free list, since returning a block to a non-mergeable free list does not increment this count. 2020-07-02 17:00:33 +07:00
Roman Maltsev 98bac405be Fix out-of-bounds access in CPUQuantizedAdd::onExecute 2020-07-02 16:42:09 +07:00
jxt1234 f3dd23a048
Merge pull request #848 from Interfish/master
Add BlstmComputer
2020-06-23 20:53:31 +08:00
誉阳 0d84ab23c5 [PATCH 9/9] armv82 support prelu 2020-06-19 16:48:01 +08:00
誉阳 510ef0fe11 [PATCH 8/9] fix some bugs 2020-06-19 16:48:01 +08:00
誉阳 3ed28acab1 [PATCH 7/9] fix compile bug in android studio 2020-06-19 16:48:01 +08:00
誉阳 7a1f7a03d7 [PATCH 6/9] fix android compile bug for armv7 2020-06-19 16:48:01 +08:00
誉阳 103d8a04dc [PATCH 5/9] fix bug 2020-06-19 16:48:00 +08:00
誉阳 d2aeea2a5a [PATCH 4/9] fix concat tf and fix android so path 2020-06-19 16:48:00 +08:00
誉阳 8bcae90720 [PATCH 3/9] remove static register for iOS 2020-06-19 16:48:00 +08:00
誉阳 f0d3e68e99 [PATCH 2/9] arm82 ops:Interp, Padding, TensorConverter 2020-06-19 16:48:00 +08:00
誉阳 dae041e7ec [PATCH 1/9] fix build ios bug 2020-06-19 16:48:00 +08:00
Interfish af807b050b Update: fix release buffer 2020-06-14 17:11:37 +08:00
xiaying 7c5d79fa2b [OpenCL:Bugfix] Fix bug for reduction error 2020-06-13 13:06:44 +08:00
誉阳 148353c777 fix pool shape and onnx slice, add printShape() method 2020-06-13 11:20:19 +08:00
誉阳 fbaa3b9de3 fix broadcastTo bug 2020-06-12 08:58:57 +08:00
jxt1234 ad4e26c789 Fix bug for onnx gather execute error 2020-06-11 17:50:20 +08:00
xiaying f5dae040a0 Support broadcast for MatMul and fix bug for onnx slice 2020-06-03 22:47:29 +08:00
Interfish 0ae7eab53c Update: fix vmulq 2020-05-31 15:08:55 +08:00
Interfish 067063ff02 Update: try to fix 2020-05-31 14:44:41 +08:00
Interfish bd4a4847f6 Merge branch 'master' of https://github.com/alibaba/MNN into alibaba-master 2020-05-31 13:50:56 +08:00
Interfish 65835e5357 Update: typo fixed 2020-05-31 13:43:14 +08:00
Interfish 400e6c71e7 Add BlstmComputer 2020-05-31 00:36:18 +08:00
xiaying c99c4b7f16 Fix bug for sub swap error 2020-05-28 14:57:14 +08:00
玄裳 0df31a8667 MNN 1.0.0 release sync.
- Added Python Express API implemented with pybind11
- Added demos for Python Express API
- Performance improvements for ARM64, ARMv8.2, x86.
- README update.
2020-05-07 18:22:11 +08:00
zjd1988 cfa25aa363 It may not cause an execution error, but it will cost too much memory 2020-05-06 09:42:01 +08:00
jxt1234 c57d7fc8a6
Merge pull request #772 from wtiandong/master
[OpenCL:Bugfix] feat: Add Transpose B support to OpenCL MatMul
2020-04-29 18:44:45 +08:00
Tiandong Wang d7acd36463 feat: Add Transpose B support to OpenCL MatMul
1. Add Transpose B support to OpenCL MatMul
2. Fix binary operator when the second operand is a scalar but its dimension is not 0, e.g., 1x1x1x1.
2020-04-29 12:10:16 +08:00
xiaying bf6285a178 [MNN:Sync] Sync internal github 2020-04-29 10:12:16 +08:00
xiaying b3ad25723e Speed up CPUDetectionOutput for special case 2020-04-29 10:02:22 +08:00
和彬 805615b380 avoid same-name registers in surrounding ld1 in gemm common and gemm one 2020-04-29 10:02:16 +08:00
誉阳 4c35997a80 opt int8 asm, -1ms 2020-04-29 10:00:50 +08:00