xiaying
d13f1bc0b6
Optimize Strassen Merge C Function for x86
2020-07-04 01:06:18 +08:00
xiaying
1567e74e40
Optimize NHWCToNC4HW4 and NC4HW4ToNHWC
2020-07-04 01:06:18 +08:00
xiaying
d88cde6237
Use strassen for Convolution1x1Strassen
2020-07-04 01:06:18 +08:00
xiaying
ae91cab1b8
Support Strassen for new matmul
2020-07-04 01:06:18 +08:00
xiaying
ac15e9fcec
Support get pack mode for each platform
2020-07-04 01:06:18 +08:00
houjiang
91c70ba559
Fix cpu binary.
2020-07-04 01:06:18 +08:00
xiaying
5fc7acd37e
Support transpose, fix bug when not aligned
2020-07-04 01:06:18 +08:00
xiaying
a3a43a6d9b
Add prefetch for _AVX_MNNGemm16x6.S, from 31 ms -> 29 ms
2020-07-04 01:06:18 +08:00
xiaying
3dc9cbb740
Optimize CPUMatMul for x86 avx256 by 16x6: 540, 320, 540 from 3.8 ms -> 2.5 ms; 1024, 1024, 1024 from 39 ms -> 31 ms
2020-07-04 01:06:18 +08:00
xiaying
cf0896e71a
Add 16x6 GEMM
2020-07-04 01:06:17 +08:00
xiaying
8ea506cb57
Add asm for _AVX_MNNGemmFloatUnit_4, 1024x1024x1024 from 39 ms -> 37 ms
2020-07-04 01:06:17 +08:00
xiaying
8bce0519af
Use AVX to optimize MNNMatrixAdd, but it is slower on Mac
2020-07-04 01:06:17 +08:00
xiaying
976e6d0e6f
Use ASM MNNMatrixAdd instead of C
2020-07-04 01:06:17 +08:00
jxt1234
e9cde2ffe4
Merge pull request #945 from krayzemli/fix_BufferAllocator
...
Don't increase reference count when extracting a block from a non-splittable free list
2020-07-03 11:11:59 +08:00
jxt1234
47af4892d0
Merge pull request #941 from krayzemli/fix_CpuQuantizedAdd
...
Fix out-of-bounds access in CPUQuantizedAdd::onExecute
2020-07-03 10:29:59 +08:00
jxt1234
234f423e54
Merge pull request #942 from krayzemli/fix_CPUQuantizedLogistic
...
Fix CPUQuantizedLogistic::onExecute access to the model which could have been released
2020-07-03 10:07:11 +08:00
Roman Maltsev
b750d419b8
Fix memory leak in CPUDetectionPostProcess
2020-07-02 17:45:32 +07:00
Roman Maltsev
36bd8f1a35
Fix CPUQuantizedLogistic::onExecute access to the model which could have been released
2020-07-02 17:06:58 +07:00
Roman Maltsev
2d3d4a2242
Don't increase reference count when extracting a block from a non-splittable free list, since returning a block to a non-mergeable free list does not increment this count.
2020-07-02 17:00:33 +07:00
Roman Maltsev
98bac405be
Fix out-of-bounds access in CPUQuantizedAdd::onExecute
2020-07-02 16:42:09 +07:00
jxt1234
f3dd23a048
Merge pull request #848 from Interfish/master
...
Add BlstmComputer
2020-06-23 20:53:31 +08:00
誉阳
0d84ab23c5
[PATCH 9/9] armv82 support prelu
2020-06-19 16:48:01 +08:00
誉阳
510ef0fe11
[PATCH 8/9] fix some bugs
2020-06-19 16:48:01 +08:00
誉阳
3ed28acab1
[PATCH 7/9] fix compile bug in android studio
2020-06-19 16:48:01 +08:00
誉阳
7a1f7a03d7
[PATCH 6/9] fix android compile bug for armv7
2020-06-19 16:48:01 +08:00
誉阳
103d8a04dc
[PATCH 5/9] fix bug
2020-06-19 16:48:00 +08:00
誉阳
d2aeea2a5a
[PATCH 4/9] fix concat tf and fix android so path
2020-06-19 16:48:00 +08:00
誉阳
8bcae90720
[PATCH 3/9] remove static register for iOS
2020-06-19 16:48:00 +08:00
誉阳
f0d3e68e99
[PATCH 2/9] arm82 ops:Interp, Padding, TensorConverter
2020-06-19 16:48:00 +08:00
誉阳
dae041e7ec
[PATCH 1/9] fix build ios bug
2020-06-19 16:48:00 +08:00
Interfish
af807b050b
Update: fix release buffer
2020-06-14 17:11:37 +08:00
xiaying
7c5d79fa2b
[OpenCL:Bugfix] Fix bug for reduction error
2020-06-13 13:06:44 +08:00
誉阳
148353c777
fix pool shape and onnx slice, add printShape() method
2020-06-13 11:20:19 +08:00
誉阳
fbaa3b9de3
fix broadcastTo bug
2020-06-12 08:58:57 +08:00
jxt1234
ad4e26c789
Fix bug for onnx gather execute error
2020-06-11 17:50:20 +08:00
xiaying
f5dae040a0
Support broadcast for MatMul and fix bug for onnx slice
2020-06-03 22:47:29 +08:00
Interfish
0ae7eab53c
Update: fix vmulq
2020-05-31 15:08:55 +08:00
Interfish
067063ff02
Update: try to fix
2020-05-31 14:44:41 +08:00
Interfish
bd4a4847f6
Merge branch 'master' of https://github.com/alibaba/MNN into alibaba-master
2020-05-31 13:50:56 +08:00
Interfish
65835e5357
Update: typo fixed
2020-05-31 13:43:14 +08:00
Interfish
400e6c71e7
Add BlstmComputer
2020-05-31 00:36:18 +08:00
xiaying
c99c4b7f16
Fix bug for sub swap error
2020-05-28 14:57:14 +08:00
玄裳
0df31a8667
MNN 1.0.0 release sync.
...
- Added Python Express API implemented with pybind11
- Added demos for Python Express API
- Performance improvements for ARM64, ARMv8.2, x86.
- README update.
2020-05-07 18:22:11 +08:00
zjd1988
cfa25aa363
It may not cause an execution error, but it will cost too much memory
2020-05-06 09:42:01 +08:00
jxt1234
c57d7fc8a6
Merge pull request #772 from wtiandong/master
...
[OpenCL:Bugfix] feat: Add Transpose B support to OpenCL MatMul
2020-04-29 18:44:45 +08:00
Tiandong Wang
d7acd36463
feat: Add Transpose B support to OpenCL MatMul
...
1. Add Transpose B support to OpenCL MatMul
2. Fix binary operator when the second operand is a scalar but its dimension is not 0, e.g., 1x1x1x1.
2020-04-29 12:10:16 +08:00
xiaying
bf6285a178
[MNN:Sync] Sync internal github
2020-04-29 10:12:16 +08:00
xiaying
b3ad25723e
Speed up CPUDetectionOutput for special case
2020-04-29 10:02:22 +08:00
和彬
805615b380
Avoid same-name registers in surrounding ld1 in gemm common and gemm one
2020-04-29 10:02:16 +08:00
誉阳
4c35997a80
Optimize int8 asm, -1 ms
2020-04-29 10:00:50 +08:00