231 Commits

Author SHA1 Message Date
tianbu.xsw f8702ceaca add opencl kernel profile & revise some info in onExecute to onResize stage 2020-07-04 01:06:21 +08:00
xiaying 49ca95571d Revert "[MNN:Speed] Optmize winograd convolution" (This reverts commit 9e34b9a856ccf9d2a81bc9387a1c7dfbc6a12e5d.) 2020-07-04 01:06:21 +08:00
xiaying c536436f64 Remove useless asm for easy compability for window 2020-07-04 01:06:20 +08:00
xiaying e708cff674 Optmize winograd convolution 2020-07-04 01:06:20 +08:00
xiaying 8e79b4abc4 Revert "[MNN:Speed] Add AVX512 MNNConvSlideWindowMiddle" (This reverts commit 498b977df2db2ddbd9e6938f8cd2a0c3d5b616d7.) 2020-07-04 01:06:20 +08:00
root 85c2baaf6a Revert "[CV:Bugfix] Avoid use sse 4.1" (This reverts commit c2fc280ffd8504c3c8b499b38fba57ba7eb4e349.) 2020-07-04 01:06:20 +08:00
xiaying 6396ec97bb Avoid use sse 4.1 2020-07-04 01:06:20 +08:00
xiaying 73b1a97315 Fix bug for compile error in linux 2020-07-04 01:06:20 +08:00
xiaying 97f8b91fee Add AVX512 MNNConvSlideWindowMiddle 2020-07-04 01:06:20 +08:00
root 868115665b Fix bug for _AVX512_MNNGemmFloatUnit_4.S's error 2020-07-04 01:06:19 +08:00
xiaying a718ef7382 Add YUV_I420 2020-07-04 01:06:19 +08:00
xiaying 2428ab001d Optmize YUV -> RGBA 2020-07-04 01:06:19 +08:00
xiaying 96f40fddcd Add sse opt for blitter 2020-07-04 01:06:19 +08:00
xiaying c6abcb9088 Fix bug for asm of _AVX512_MNNGemmFloatUnit_4 2020-07-04 01:06:19 +08:00
xiaying d2dd9ae22a Add _AVX512_MNNGemmFloatUnit_4 2020-07-04 01:06:19 +08:00
xiaying 746b50a56d Small opt _AVX_MNNPackedMatMul 2020-07-04 01:06:19 +08:00
xiaying 9f4f6c091d Add ../source/backend/cpu/x86_x64/avx/_AVX512_MNNPackedMatMul.S 2020-07-04 01:06:19 +08:00
xiaying acb7ca17aa Fix bug for asm align number 2020-07-04 01:06:19 +08:00
xiaying dfe1d06c08 Support multi-thread for 1x1 convolution 2020-07-04 01:06:19 +08:00
xiaying c7051d367c Temply forbid not im2col case for 1x1 conv 2020-07-04 01:06:19 +08:00
xiaying 93ea95ff30 Add MNNUnPackC4ForMatMul_C 2020-07-04 01:06:19 +08:00
xiaying a38a551993 Add MNNPackC4ForMatMul_A 2020-07-04 01:06:18 +08:00
xiaying a750fe0956 Rename _AVX_MNNGemm16x6 as _AVX_MNNPackedMatMul 2020-07-04 01:06:18 +08:00
xiaying 77f44dc1af Rename NHWC<->NC4HW4 as pack/unpack transpose 2020-07-04 01:06:18 +08:00
xiaying d13f1bc0b6 Optimize Strassen Merge C Function for x86 2020-07-04 01:06:18 +08:00
xiaying 1567e74e40 Optmize NHWCToNC4HW4 and NC4HW4ToNHWC 2020-07-04 01:06:18 +08:00
xiaying d88cde6237 Use strassen for Convolution1x1Strassen 2020-07-04 01:06:18 +08:00
xiaying ae91cab1b8 Support Strassen for new matmul 2020-07-04 01:06:18 +08:00
xiaying ac15e9fcec Support get pack mode for each platform 2020-07-04 01:06:18 +08:00
houjiang 91c70ba559 Fix cpu binary. 2020-07-04 01:06:18 +08:00
xiaying 5fc7acd37e support transpose, fix bug for not align 2020-07-04 01:06:18 +08:00
xiaying a3a43a6d9b Add prefetch for _AVX_MNNGemm16x6.S , from 31 ms -> 29 ms 2020-07-04 01:06:18 +08:00
xiaying 3dc9cbb740 Optmize CPUMatMul for x86 avx256 by 16x6: 540, 320, 540 from 3.8 ms -> 2.5 ms; 1024, 1024, 1024 from 39 ms -> 31 ms 2020-07-04 01:06:18 +08:00
xiaying cf0896e71a Add 16x6 GEMM 2020-07-04 01:06:17 +08:00
xiaying 8ea506cb57 Add asm for _AVX_MNNGemmFloatUnit_4, 1024x1024x1024 from 39 ms -> 37 ms 2020-07-04 01:06:17 +08:00
xiaying 8bce0519af Use AVX to optimize mnnmatrix add, but make slow in mac 2020-07-04 01:06:17 +08:00
xiaying 976e6d0e6f Use ASM MNNMatrixAdd instead of C 2020-07-04 01:06:17 +08:00
jxt1234 e9cde2ffe4 Merge pull request #945 from krayzemli/fix_BufferAllocator: Don't increase reference count when extracting a block from a non-splitable freelist 2020-07-03 11:11:59 +08:00
jxt1234 47af4892d0 Merge pull request #941 from krayzemli/fix_CpuQuantizedAdd: Fix out-of-bounds access in CPUQuantizedAdd::onExecute 2020-07-03 10:29:59 +08:00
jxt1234 234f423e54 Merge pull request #942 from krayzemli/fix_CPUQuantizedLogistic: Fix CPUQuantizedLogistic::onExecute access to the model which could have been released 2020-07-03 10:07:11 +08:00
Roman Maltsev b750d419b8 Fix memory leak in CPUDetectionPostProcess 2020-07-02 17:45:32 +07:00
Roman Maltsev 36bd8f1a35 Fix CPUQuantizedLogistic::onExecute access to the model which could have been released 2020-07-02 17:06:58 +07:00
Roman Maltsev 2d3d4a2242 Don't increase reference count when extracting a block from a non-splitable free list, since returning a block to a non-mergeable free list does not increment this count. 2020-07-02 17:00:33 +07:00
Roman Maltsev 98bac405be Fix out-of-bounds access in CPUQuantizedAdd::onExecute 2020-07-02 16:42:09 +07:00
jxt1234 f3dd23a048 Merge pull request #848 from Interfish/master: Add BlstmComputer 2020-06-23 20:53:31 +08:00
誉阳 0d84ab23c5 [PATCH 9/9] armv82 support prelu 2020-06-19 16:48:01 +08:00
誉阳 510ef0fe11 [PATCH 8/9] fix some bugs 2020-06-19 16:48:01 +08:00
誉阳 3ed28acab1 [PATCH 7/9] fix compile bug in android studio 2020-06-19 16:48:01 +08:00
誉阳 7a1f7a03d7 [PATCH 6/9] fix android compile bug for armv7 2020-06-19 16:48:01 +08:00
誉阳 103d8a04dc [PATCH 5/9] fix bug 2020-06-19 16:48:00 +08:00