tianbu.xsw
|
f8702ceaca
|
Add OpenCL kernel profile & move some info from onExecute to onResize stage
|
2020-07-04 01:06:21 +08:00 |
xiaying
|
49ca95571d
|
Revert "[MNN:Speed] Optimize winograd convolution"
This reverts commit 9e34b9a856ccf9d2a81bc9387a1c7dfbc6a12e5d.
|
2020-07-04 01:06:21 +08:00 |
xiaying
|
c536436f64
|
Remove useless asm for easier compatibility with Windows
|
2020-07-04 01:06:20 +08:00 |
xiaying
|
e708cff674
|
Optimize Winograd convolution
|
2020-07-04 01:06:20 +08:00 |
xiaying
|
8e79b4abc4
|
Revert "[MNN:Speed] Add AVX512 MNNConvSlideWindowMiddle"
This reverts commit 498b977df2db2ddbd9e6938f8cd2a0c3d5b616d7.
|
2020-07-04 01:06:20 +08:00 |
root
|
85c2baaf6a
|
Revert "[CV:Bugfix] Avoid using SSE 4.1"
This reverts commit c2fc280ffd8504c3c8b499b38fba57ba7eb4e349.
|
2020-07-04 01:06:20 +08:00 |
xiaying
|
6396ec97bb
|
Avoid using SSE 4.1
|
2020-07-04 01:06:20 +08:00 |
xiaying
|
73b1a97315
|
Fix compile error on Linux
|
2020-07-04 01:06:20 +08:00 |
xiaying
|
97f8b91fee
|
Add AVX512 MNNConvSlideWindowMiddle
|
2020-07-04 01:06:20 +08:00 |
root
|
868115665b
|
Fix error in _AVX512_MNNGemmFloatUnit_4.S
|
2020-07-04 01:06:19 +08:00 |
xiaying
|
a718ef7382
|
Add YUV_I420
|
2020-07-04 01:06:19 +08:00 |
xiaying
|
2428ab001d
|
Optimize YUV -> RGBA
|
2020-07-04 01:06:19 +08:00 |
xiaying
|
96f40fddcd
|
Add SSE optimization for blitter
|
2020-07-04 01:06:19 +08:00 |
xiaying
|
c6abcb9088
|
Fix bug in asm of _AVX512_MNNGemmFloatUnit_4
|
2020-07-04 01:06:19 +08:00 |
xiaying
|
d2dd9ae22a
|
Add _AVX512_MNNGemmFloatUnit_4
|
2020-07-04 01:06:19 +08:00 |
xiaying
|
746b50a56d
|
Small optimization for _AVX_MNNPackedMatMul
|
2020-07-04 01:06:19 +08:00 |
xiaying
|
9f4f6c091d
|
Add ../source/backend/cpu/x86_x64/avx/_AVX512_MNNPackedMatMul.S
|
2020-07-04 01:06:19 +08:00 |
xiaying
|
acb7ca17aa
|
Fix bug in asm align number
|
2020-07-04 01:06:19 +08:00 |
xiaying
|
dfe1d06c08
|
Support multi-thread for 1x1 convolution
|
2020-07-04 01:06:19 +08:00 |
xiaying
|
c7051d367c
|
Temporarily forbid the non-im2col case for 1x1 conv
|
2020-07-04 01:06:19 +08:00 |
xiaying
|
93ea95ff30
|
Add MNNUnPackC4ForMatMul_C
|
2020-07-04 01:06:19 +08:00 |
xiaying
|
a38a551993
|
Add MNNPackC4ForMatMul_A
|
2020-07-04 01:06:18 +08:00 |
xiaying
|
a750fe0956
|
Rename _AVX_MNNGemm16x6 as _AVX_MNNPackedMatMul
|
2020-07-04 01:06:18 +08:00 |
xiaying
|
77f44dc1af
|
Rename NHWC<->NC4HW4 as pack/unpack transpose
|
2020-07-04 01:06:18 +08:00 |
xiaying
|
d13f1bc0b6
|
Optimize Strassen Merge C Function for x86
|
2020-07-04 01:06:18 +08:00 |
xiaying
|
1567e74e40
|
Optimize NHWCToNC4HW4 and NC4HW4ToNHWC
|
2020-07-04 01:06:18 +08:00 |
xiaying
|
d88cde6237
|
Use strassen for Convolution1x1Strassen
|
2020-07-04 01:06:18 +08:00 |
xiaying
|
ae91cab1b8
|
Support Strassen for new matmul
|
2020-07-04 01:06:18 +08:00 |
xiaying
|
ac15e9fcec
|
Support get pack mode for each platform
|
2020-07-04 01:06:18 +08:00 |
houjiang
|
91c70ba559
|
Fix cpu binary.
|
2020-07-04 01:06:18 +08:00 |
xiaying
|
5fc7acd37e
|
Support transpose; fix bug for the unaligned case
|
2020-07-04 01:06:18 +08:00 |
xiaying
|
a3a43a6d9b
|
Add prefetch for _AVX_MNNGemm16x6.S, from 31 ms -> 29 ms
|
2020-07-04 01:06:18 +08:00 |
xiaying
|
3dc9cbb740
|
Optimize CPUMatMul for x86 avx256 by 16x6: 540, 320, 540 from 3.8 ms -> 2.5 ms; 1024, 1024, 1024 from 39 ms -> 31 ms
|
2020-07-04 01:06:18 +08:00 |
xiaying
|
cf0896e71a
|
Add 16x6 GEMM
|
2020-07-04 01:06:17 +08:00 |
xiaying
|
8ea506cb57
|
Add asm for _AVX_MNNGemmFloatUnit_4, 1024x1024x1024 from 39 ms -> 37 ms
|
2020-07-04 01:06:17 +08:00 |
xiaying
|
8bce0519af
|
Use AVX to optimize MNNMatrixAdd, but it is slower on Mac
|
2020-07-04 01:06:17 +08:00 |
xiaying
|
976e6d0e6f
|
Use ASM MNNMatrixAdd instead of C
|
2020-07-04 01:06:17 +08:00 |
jxt1234
|
e9cde2ffe4
|
Merge pull request #945 from krayzemli/fix_BufferAllocator
Don't increase reference count when extracting a block from a non-splittable free list
|
2020-07-03 11:11:59 +08:00 |
jxt1234
|
47af4892d0
|
Merge pull request #941 from krayzemli/fix_CpuQuantizedAdd
Fix out-of-bounds access in CPUQuantizedAdd::onExecute
|
2020-07-03 10:29:59 +08:00 |
jxt1234
|
234f423e54
|
Merge pull request #942 from krayzemli/fix_CPUQuantizedLogistic
Fix CPUQuantizedLogistic::onExecute access to the model which could have been released
|
2020-07-03 10:07:11 +08:00 |
Roman Maltsev
|
b750d419b8
|
Fix memory leak in CPUDetectionPostProcess
|
2020-07-02 17:45:32 +07:00 |
Roman Maltsev
|
36bd8f1a35
|
Fix CPUQuantizedLogistic::onExecute access to the model which could have been released
|
2020-07-02 17:06:58 +07:00 |
Roman Maltsev
|
2d3d4a2242
|
Don't increase reference count when extracting a block from a non-splittable free list, since returning a block to a non-mergeable free list does not increment this count.
|
2020-07-02 17:00:33 +07:00 |
Roman Maltsev
|
98bac405be
|
Fix out-of-bounds access in CPUQuantizedAdd::onExecute
|
2020-07-02 16:42:09 +07:00 |
jxt1234
|
f3dd23a048
|
Merge pull request #848 from Interfish/master
Add BlstmComputer
|
2020-06-23 20:53:31 +08:00 |
誉阳
|
0d84ab23c5
|
[PATCH 9/9] armv82 support prelu
|
2020-06-19 16:48:01 +08:00 |
誉阳
|
510ef0fe11
|
[PATCH 8/9] fix some bugs
|
2020-06-19 16:48:01 +08:00 |
誉阳
|
3ed28acab1
|
[PATCH 7/9] fix compile bug in android studio
|
2020-06-19 16:48:01 +08:00 |
誉阳
|
7a1f7a03d7
|
[PATCH 6/9] fix android compile bug for armv7
|
2020-06-19 16:48:01 +08:00 |
誉阳
|
103d8a04dc
|
[PATCH 5/9] fix bug
|
2020-06-19 16:48:00 +08:00 |