Commit Graph

30 Commits

Author SHA1 Message Date
xiaying 57d67dde2d [PATCH 191/350] Speed up Int8Tofloat for x86-sse 2021-01-06 15:57:10 +08:00
xiaying a131c8cd26 [PATCH 188/350] [MNN:Speed] Optimize ConvInt8 for remain case 2021-01-06 15:57:09 +08:00
xiaying a943d45796 [PATCH 187/350] Float2Int8 opt for x86-sse 2021-01-06 15:57:09 +08:00
Hui Shu ab711d484c Synchronize internal master to Github 2020-12-15 14:12:35 +08:00
xiaying 14510593f8 [PATCH 26/78] [MNN:Speed] Add avx2 expc8 2020-11-25 18:57:52 +08:00
Hui Shu d6795ad031 Github release 1.1.0 2020-11-05 16:49:17 +08:00
xiaying 67298f2ec5 [MNN:Bugfix] Remove _AVX512_MNNGemmFloatUnit_4 temporarily 2020-07-04 13:18:05 +08:00
xiaying 3b3ebb72b9 Remove useless code of x86 opt 2020-07-04 11:39:31 +08:00
xiaying 255db932eb [MNN:Sync] Sync Internal Github 2020-07-04 01:21:30 +08:00
xiaying 8e79b4abc4 Revert "[MNN:Speed] Add AVX512 MNNConvSlideWindowMiddle" (reverts commit 498b977df2db2ddbd9e6938f8cd2a0c3d5b616d7) 2020-07-04 01:06:20 +08:00
xiaying 97f8b91fee Add AVX512 MNNConvSlideWindowMiddle 2020-07-04 01:06:20 +08:00
xiaying d2dd9ae22a Add _AVX512_MNNGemmFloatUnit_4 2020-07-04 01:06:19 +08:00
xiaying 9f4f6c091d Add ../source/backend/cpu/x86_x64/avx/_AVX512_MNNPackedMatMul.S 2020-07-04 01:06:19 +08:00
xiaying dfe1d06c08 Support multi-thread for 1x1 convolution 2020-07-04 01:06:19 +08:00
xiaying 93ea95ff30 Add MNNUnPackC4ForMatMul_C 2020-07-04 01:06:19 +08:00
xiaying a38a551993 Add MNNPackC4ForMatMul_A 2020-07-04 01:06:18 +08:00
xiaying a750fe0956 Rename _AVX_MNNGemm16x6 as _AVX_MNNPackedMatMul 2020-07-04 01:06:18 +08:00
xiaying d13f1bc0b6 Optimize Strassen Merge C Function for x86 2020-07-04 01:06:18 +08:00
xiaying ac15e9fcec Support get pack mode for each platform 2020-07-04 01:06:18 +08:00
xiaying 5fc7acd37e support transpose, fix bug for not align 2020-07-04 01:06:18 +08:00
xiaying 3dc9cbb740 Optimize CPUMatMul for x86 avx256 by 16x6: (540, 320, 540) from 3.8 ms -> 2.5 ms; (1024, 1024, 1024) from 39 ms -> 31 ms 2020-07-04 01:06:18 +08:00
xiaying bf6285a178 [MNN:Sync] Sync internal github 2020-04-29 10:12:16 +08:00
xiaying a76be60722 [MNN:Sync] Fix compile bug for Windows; fix bug for devices that do not support FMA 2020-04-14 22:52:24 +08:00
xiaying cd26aab2da Add gemm_unit for x86, mla first use mul 2020-04-14 22:39:40 +08:00
xiaying 3f99ae2a0d Optimize x86 by reorder weight 2020-04-14 22:39:40 +08:00
xiaying 48c92a41e7 [MNN:Sync] Sync internal git for remain patch 2020-03-22 20:33:03 +08:00
海境 90e06944db Update 2020-02-26 09:57:17 +08:00
Zhang 002ac367e4 Update 2019-12-27 22:16:57 +08:00
liqing 73ad3413cc - dynamic computation graph (beta)
	- add support (/express)
	- add tests
	- add benchmarks using it (/benchmark/exprModels)
- Python
	- MNN engine and tools were submitted to pip
	- available on Windows/macOS/Linux
- Engine/Converter
	- add support for per-op benchmarking
	- refactor optimizer by separating steps
- CPU
	- add support for Conv3D, Pool3D, ELU, ReverseSequence
	- fix ArgMax, Permute, Scale, BinaryOp, Slice, SliceTf
- OpenCL
	- add half transform on CPU
	- add broadcast support for binary
	- optimize Conv2D, Reshape, Eltwise, Gemm, etc.
- OpenGL
	- add sub, real div support for binary
	- add support for unary
	- optimize Conv2D, Reshape
- Vulkan
	- add max support for eltwise
- Metal
	- fix missing metallib problem
- Train/Quantization
	- use express to refactor training code
2019-09-26 21:02:07 +08:00
liqing 487a0fbd0a beta 0.2.0.9
- fix quantization tool compiling on Windows
- fix converter compiling on Windows
- fix eltwise optimization on Windows
- separate sse & avx for Windows
- add LeakyReLU support for TensorFlow
- fix reshape, const for TensorFlow
- fix dimension format error for ONNX ops
- optimize winograd, ReLU for OpenCL
- add fp16 availability & dimensions size check-up for OpenCL
- optimize GEMM for arm32
- fix ExpandDims shape calculation when inputs size == 1
2019-09-01 19:25:26 +08:00