Commit Graph

717 Commits

Author SHA1 Message Date
xiaying 8330da263a [Sync] Sync internal 2.0.3 2022-07-22 09:59:30 +08:00
xiaying eb51926f84 [MNN:Sync] Sync internal Gitlab to 2.0.2 2022-07-19 13:52:07 +08:00
xiaying 3cf5126828 Fix compute error for fp16 convolution dw 2022-07-15 12:47:48 +08:00
xiaying 8e0f544ea6 Fix Remain treat bug for AVX512-Int8 2022-07-15 12:47:48 +08:00
xiaying 89288cc509 Update README.md, fix CPU Runtime compile bug for Android - armv8.2 2022-07-12 12:43:06 +08:00
jxt1234 7102b4890b
Merge pull request #1821 from snadampal/aarch64_linux_fp16
backend: cpu: runtime: linux aarch64 hwcaps setting for ARMV82
2022-07-12 11:32:18 +08:00
Brian Li 2801621ea5 Add missing headers 2022-07-12 01:28:27 +08:00
xiaying 2ec9495719 [MNN:Sync] Sync 2.0. 2022-07-11 10:56:37 +08:00
hebin 4679f848c4 fix windows compile on avx512 2022-06-29 16:43:49 +08:00
雁行 6cf30db8f6 Opt DepthToSpace from raster to coreml execution. 2022-06-29 14:59:46 +08:00
xiaying c02c8cc145 Fix compile bug for ios simulator for m1 2022-06-28 14:10:51 +08:00
xiaying 2d13d6a495 2022-06-27 10:52:11 +08:00
xiaying d3ffdf4229 [MNN:Sync] Sync internal gitlab 2022-06-24 18:30:05 +08:00
xiaying aeaac3fde3 [MNN:Sync] Sync internal gitlab 2022-06-10 10:39:50 +08:00
xiaying c5592d284b Fix bug for buffer not alloc enough 2022-06-09 15:57:17 +08:00
xiaying eac58ab5f5 Fix bug for expanddim < -1 2022-06-08 15:39:04 +08:00
Sunita Nadampalli aa7dc95047 backend: cpu: runtime: linux aarch64 hwcaps setting for ARMV82 2022-06-07 17:33:27 +00:00
xiaying 2a0427775e Fix bug for Expr16 2022-06-07 11:29:05 +08:00
xiaying f093e0d170 [Bugfix] fix bug for Arm82Interp two quote 2022-05-30 17:54:34 +08:00
xiaying c95df2a932 Fix compile bug for arm82-armv7a 2022-05-30 17:24:20 +08:00
xiaying d98e274ab1 fix compile bug for build mini 2022-05-30 17:24:20 +08:00
Yulv-git 8a43ea2011 Fix some typos in source/. 2022-05-27 23:46:44 +08:00
xiaying 1f0c6d4f21 Fix bug for metal use float binary instead of int binary 2022-05-18 14:31:04 +08:00
tianbu.xsw 5bb30c3c93 opencl bugfix for binary and geometryGather 2022-05-18 14:31:04 +08:00
xiaying 30c66fc79c Fix bug for Metal backend copy crash, reduce memory alloc when freq resize 2022-05-18 14:31:04 +08:00
xiaying 44f0fa62be Fix compile bug for windows compile 2022-05-13 14:02:37 +08:00
jxt1234 1311bc9255
Merge pull request #1843 from jokerz0624/arm82_support
feat(ROI): add ARM82/AVX support for ROIPooling/ROIAlign
2022-05-06 20:00:59 +08:00
jxt1234 6487dd8a10
Merge pull request #1892 from xthan/patch-1
Fix bug in MNNSigmoidLowp
2022-05-06 19:56:38 +08:00
xiaying c0aee19d32 [Sync] Sync internal gitlab 2022-05-06 19:51:20 +08:00
Xintong Han 35b1ef182c
Fix bug in MNNSigmoidLowp
When dataSize is not a multiple of 4, the calculation is wrong as it does not move the dst address after the for loop.
2022-03-28 13:23:01 +08:00
xiaying 0c718e552b [Sync] Sync internal Gitlab 2022-02-18 11:30:27 +08:00
Joker 02a1565bbb feat(ROI): add ARM82/AVX support for ROIPooling/ROIAlign 2022-01-29 18:03:25 +08:00
xiaying 1b626d72c3 [MNN:Sync] Sync internal gitlab 2022-01-04 10:50:40 +08:00
xiaying a2e1ed4c67 [MNN:Bugfix] Fix compile bug for Arm82 - armv7a 2021-12-13 11:20:49 +08:00
xiaying b3c5feefdb [Converter:Bugfix] Support Onnx::TopK for dynamic shape 2021-12-10 15:16:28 +08:00
tianhang.yth a14ef5e265 update MetalLib.h for low version macos 2021-12-01 12:13:50 +08:00
xiaying 69dba73dc7 [MNN:Sync] Sync internal gitlab
Main Feature:
1. Add OpenCV API and Numpy API Support
2. Protobuf move into MNN
3. Add more op for torchscript convert
4. Add recompute to speed up geometry compute
5. Add ModuleBasic Test
2021-11-30 10:10:53 +08:00
xiaying 71cd04e91c Fix compile bug for sse fma 2021-11-19 10:23:50 +08:00
aaron-wu f995ca6a8f fix(op): replace the _mm_load_ps and _mm_store_ps with _mm_loadu_ps and _mm_storeu_ps, to avoid segment errors when not aligned 2021-11-16 16:07:50 +08:00
aaron-wu e35ea54638 feat(op): Add SSE instruction set optimization for ROIAlign and ROIPooling op 2021-11-15 14:53:12 +08:00
xiaying 95402e79b4 [MNN:Bugfix] Fix Compile bug for other backends 2021-11-12 17:49:50 +08:00
jxt1234 e86c0ba30a
Merge pull request #1746 from no5-aaron-wu/dev_aaron_wu
add CPUROIAlign op and unit-test and so on
2021-11-12 17:13:04 +08:00
xiaying 361bbc90d5 Fix bug for DenseConvolutionTiledExecutor opt not care width = 1, but kernel X >1 and padX > 0 2021-11-12 09:56:59 +08:00
xiaying 0bcc70922d [MNN:Bugfix] Fix compile bug for gnu of arm82 /bf16 2021-11-10 17:52:30 +08:00
aaron-wu 074bf5e275 fix(op): add assert to var samplingRatioW and samplingRatioH 2021-11-09 11:20:22 +08:00
aaron-wu 8e773602bf fix(schema): merge parameters for RoiPooling and RoiAlign into one table as RoiParameters 2021-11-09 11:11:27 +08:00
aaron-wu 7afb6abd1b fix(op): precalculate pos and area which shared by all channels; add defense programming for boundary case 2021-11-09 10:00:51 +08:00
aaron-wu 094d5697ae feat(op): add neon realization of CPUROIAlign op 2021-11-09 10:00:51 +08:00
aaron-wu 1af7d6f4d1 fix(op): fix compile error in linux system 2021-11-09 10:00:50 +08:00
aaron-wu cfac71f919 feat(op): add CPUROIAlign op and unit test 2021-11-09 10:00:50 +08:00
xiaying 75413768b0 Fix bug for onResize of CPURNNSequenceGRU 2021-11-04 12:55:59 +08:00
xiaying 2fdd11e718 [MNN:Bugfix] Use fabsf instead of abs 2021-11-02 12:06:10 +08:00
jxt1234 0b69ba78d2
Merge pull request #1739 from jun-lv-17/fix-depthwiseconvint8-issue
Fix conv1d depthwise conv int8 calculation issue.
2021-11-02 11:39:32 +08:00
xiaying b1d923e76c Fix compile bug for bf16 when sse / neon is close 2021-11-02 11:34:14 +08:00
xiaying ed8a2da0b4 [MNN:Bugfix] Fix bug for CPURaster for fuse singleConvert of dim == 3 2021-11-02 10:56:35 +08:00
xiaying 0fdb9d768f Add Clamp for fp32 -> fp16 2021-11-01 14:25:34 +08:00
aaron-wu 9acad284fa fix(op): increase compatibility of NCHW format for inputs[1](rois) in CPUROIPooling op 2021-10-30 15:18:38 +08:00
jun.lv 0b299e951c Fix conv1d depthwise conv int8 calculation issue. 2021-10-29 18:58:58 +08:00
jxt1234 e121c1527a
Merge pull request #1718 from jokerz0624/acc/GridSample
improvement(GridSample): give areaRemain one better handle in Arm82
2021-10-25 10:56:14 +08:00
jxt1234 70cd0c5b27
Merge pull request #1724 from DaydreamCoding/patch-10
fix memory leak
2021-10-25 10:55:21 +08:00
Joker af9c543115 improvement(ConvWino): use fma to accelerate computation 2021-10-22 14:24:29 +08:00
xiaying da3688119d [MNN:Bugfix] Fix shape compute and content bug for batch > 1's rnngru 2021-10-20 11:56:16 +08:00
DaydreamCoding 0ec11813f0 fix memory leak 2021-10-15 13:39:20 +08:00
xiaying 7f50ae689d Fix zero shape bug for TensorArray 2021-10-14 15:00:01 +08:00
Joker 6f8dafdd5b improvement(GridSample): give areaRemain one better handle in Arm82 2021-10-12 12:15:31 +08:00
jxt1234 1b2e168d6e
Merge pull request #1678 from jokerz0624/acc/GridSample
improvement(GridSample): accelerate GridSample in CPU/Arm82/AVX2/AVX512
2021-10-08 19:34:24 +08:00
jxt1234 f5101c9b2b
Merge pull request #1712 from jun-lv-17/o-master
Fix CPUEltwiseInt8Add calculation issue.
2021-10-08 19:33:28 +08:00
jxt1234 f3bf2e2a3f
Merge pull request #1713 from jokerz0624/feat/support_A15
feat(ios): add Apple A15 support in CPU family
2021-10-08 19:32:54 +08:00
Joker 5feeb1beb9 feat(ios): add Apple A15 support in CPU family 2021-09-30 20:34:08 +08:00
jun.lv a5fa7eb446 Fix CPUEltwiseInt8Add calculation issue. 2021-09-30 14:58:01 +08:00
Joker 640993bc56 improvement(GridSample): give areaRemain one better handle in AVX2/AVX512 2021-09-30 11:01:23 +08:00
tianbu.xsw 04bb665a00 fix raster op type print error 2021-09-28 11:35:24 +08:00
tianbu.xsw 18ba76bea6 file write bugfix 2021-09-28 11:35:18 +08:00
jxt1234 ec16cae757
Merge pull request #1705 from jokerz0624/fix/AVX512_detecting
fix(AVX512): fix detecting AVX512 features on Darwin
2021-09-26 17:17:44 +08:00
xiaying a867e23543 Fix compile bug for some ndk 2021-09-25 15:23:06 +08:00
xiaying b550871abb Fix bug for raw cpu winograd crash 2021-09-24 16:05:38 +08:00
周科 e4f0fd58cc fix(AVX512): fix detecting AVX512 features on Darwin 2021-09-24 10:06:47 +08:00
Joker c41503556d improvement(GridSample): accelerate GridSample in CPU/Arm82/AVX2/AVX512 2021-09-22 16:04:07 +08:00
xiaying 03c7b5347b [MNN:Sync] Sync internal Gitlab 2021-09-18 15:52:30 +08:00
xiaying d4d040c57e Fix bug for single convert for NHWC <-> NC4HW4 don't care stride 2021-09-17 15:22:44 +08:00
jxt1234 bc355e84ca
Merge pull request #1635 from DaydreamCoding/patch-7
fix BF16 x86_64 Apple M1 failed
2021-09-15 14:13:53 +08:00
jxt1234 40526149f2
Merge pull request #1640 from jiuzhuanzhuan/master
fix bug of header file missing when build for aarch64 with open ARM82
2021-09-15 14:12:06 +08:00
xiaying 0c26d47a84 [MNN:Bugfix] Fix bug for Squeeze for axis < 0 2021-09-14 21:02:11 +08:00
shufu 8e464d290c feat(OpenCL):add GridSample support on OpenCL Backend 2021-09-08 14:11:25 +08:00
xiaying 3a0fb480d2 Fix crash bug for origin quan model 2021-09-06 17:18:29 +08:00
xiaying 55ce936d9c Support scale input for Interp 2021-09-06 17:18:29 +08:00
jxt1234 d21fd2a910
Merge pull request #1650 from jokerz0624/vulkan_GridSample
Vulkan GridSample
2021-09-04 07:49:32 +08:00
tianbu.xsw 612199d0ee quant weight valid range issue 2021-09-03 17:06:24 +08:00
Joker 3461faa390 feat(Vulkan): add GridSample op support in Vulkan backend 2021-09-01 15:00:56 +08:00
xiaying 5dfe97e4c8 Fix bug for dim = 0's shape compute 2021-08-30 13:34:11 +08:00
xiaying 575a2c97dd Fix bug for CH Fused but W not fused fastblit error 2021-08-25 17:23:36 +08:00
qingzhu 4eda674234 fix bug of header file missing when build for aarch64 with open ARM82 2021-08-24 20:06:46 +08:00
DaydreamCoding 264a6039bb fix BF16 x86_64 Apple M1 failed 2021-08-19 18:39:51 +08:00
yuyang f1997b9a5f [MNN:Bugfix]fix illegal opcode(MV) bug for some GNU compiler 2021-08-18 18:13:00 +08:00
DaydreamCoding 955c213661 MSVC adapt for BF16 2021-08-15 17:17:32 +08:00
jxt1234 6ad2a632fd
Merge pull request #1609 from DaydreamCoding/patch-5
adapt MSVC
2021-08-12 13:00:14 +08:00
jxt1234 b8d8fc9d73
Merge pull request #1612 from MambaWong/master
fix libMNN.so: undefined symbol
2021-08-12 12:56:32 +08:00
xiaying e9d38acd6b Fix Prelu bug for multi-batch 2021-08-11 10:21:31 +08:00
xiaying 312b003f4c [MNN:Bugfix] Fix bug for ARM TopKV2 different 2021-08-05 19:41:37 +08:00
jason_w 6aca4ea6b9 fix symbol lookup error: libMNN.so: undefined symbol:
option(MNN_SUPPORT_TFLITE_QUAN "Enable MNN's tflite quantized op" OFF)
2021-08-05 11:08:13 +08:00
DaydreamCoding ca9700ec42 adapt MSVC 2021-08-03 17:29:54 +08:00
xiaying 1a7d0a6173 Optimize for Transpose compute 2021-08-03 15:50:39 +08:00
xiaying 9c5e6e13b5 Fix bug for fuse for OpCommonUtils 2021-08-03 15:50:06 +08:00
xiaying d8fc15d84b [MNN:Sync] Sync internal github
Commits:
        8148ae75c  弗人  bugfix
        14cb8ec7f  弗人  [Converter:Bugfix] bugfix for onnx depthwise convtranspose
        476fbcd90  雁行  [MNN:Feature] Open AVX cast and bugfix for contentCFG.
        5e26b9fd3  雁行  [Test:Feature] Add android test.
        37e147b25  雁行  [MNN:Bugfix] Bugfix for floordiv.
        144c185f5  tianbu.xsw  hangxing fix hiai
        b4fd429d6  tianbu.xsw  updateCacheFile bugfix -- update cache size
        d4ba572a8  雁行  [MNN:Bugfix] Support int8 in AVX2 and some Bugfix.
        43061f07e  xiaying  [MNN:Bugfix] Fix bug for module mode run part of model
        398cc5ab6  tianhang.yth  refactor demo
        736380600  xiaying  [Express:Bugfix] Fix memory leak for copy branch
        b8dab0a27  tianhang.yth  MNNFloat2Int8 sizeQuad=0 crash fix
        94b95bfed  ghz  [BugFix]1.Better method for fast pack valid check
        6a921f85e  xiaying  [Converter:Bugfix] Fix bug for Fuseconsttosubgraph
        5f77ae889  tianhang.yth  numThread bugfix
        a807ef879  tianhang.yth  add createSession(configs, runtimeinfo) API, add pymnn demo, pymnn logcat bugfix
        ad05409d3  xiaying  [MNN:Bugfix] Fix bug for StaticModule's sizecompute overflow, add error print for module mode
        9d81b8299  xiaying  [MNN:Bugfix] Fix bug for Unique op for output size = 1
        03b15e9af  xiaying  [Test:Feature] Add MatMulBConst Test, Fix bug for single Convert
        c944a76ee  tianhang.yth  add auto backend and getSessionInfo @tianbu
        91fa7267b  ghz  [BugFix]1.fix the error in eP check
        bf0041f77  ghz  [BugFix]1.Fix the logic error in eP check. 2.Fix the sp align error
        693871672  雁行  [CPU:Bugfix] rm adrp instruction for clang compiler bug.
        1b8f6b3d8  ghz  1.Fix the wrong use of r13 in arm32 version. 2.Fix the missing callee register save and restore process.
        feb7ecc4c  弗人  modify log of python offline quant
        040c04811  ghz  [BugFix]1.replace platform-related regs. 2.fix the same problem in arm32 version
        609f37db8  弗人  add log for python quant, python convert
        5511dd30a  ghz  [BugFix]1.Add testcases in SparseConv to check all functional code branch. 2. Fix the bug in "MNNPackC4ForMatMul_A.S" in arm64, which is caused by the missing check of eReal parameter.
        a93ff9280  tianhang.yth  add tf.Unique op support
        9729ff773  allen.lk  [Bugfix] Fix one arm32 instruction syntax that clang works but gcc DOES NOT work. use index instruction instead.
        297c1ad14  雁行  [Expr:Bugfix] bugfix for tensor content used by shape compute.
        ef8c369e3  弗人  catch exception
        07c2dd670  弗人  add dependence to setup, base64 encode url, add time log
        177e590c1  弗人  [Python:Feature] add aliyun log for python quant tool
        40a7928cf  allen.lk  [Debug:Sparse] 1.Add group parameter in torchscript converter. 2. Stop split running to avoid memory corruption when check failed in TransformGroupConvolution 3. fix Op split issue in TransformGroupConvolution
        3bdea84a1  allen.lk  [Debug:Sparse] Fix and warning one kind of segmentfault cause by memory corruption when resize ConvolutionWinograd.  Avoid to use some registers as arm restriction.
        c3c6fbdbd  allen.lk  [Debug:Sparse] Fix and warning one kind of segmentfault cause by memory corruption when resize ConvolutionWinograd.  Avoid to use some registers as arm restriction.
        bc590eee4  雁行  [Converter:Bugfix] bugfix for onnx instancenormalization convert.
        d8918593f  tianhang.yth  add auto backend and getSessionInfo @tianbu
        83a198ed7  杭行  update
        d0dd3e09b  杭行  update
        99540202e  xiaying  [Converter:Optimize] Opt the tensor convert insert
        333d8db82  allen.lk  [Debug:Sparse] Fix All platform-register r9 / x18 issue on arm32 and arm64.
        db5994672  杭行  merge
        6293de7b8  tianbu.xsw  fix pymnn updateCacheFile
        5c2e11cb1  tianbu.xsw  do updateCache in createSession
        6e7641ff4  tianbu.xsw  do not limit cacheFile for a model
        5287a65e4  tianbu.xsw  bugfix
        52ba53a91  tianbu.xsw  revert pymnn api
        60284d830  tianbu.xsw  bugfix
        6d8077490  tianbu.xsw  rename updateCacheFile api params
        3cb172710  tianhang.yth  updateCacheFile API size default value is 0
        c5b69aabf  tianbu.xsw  updateCacheFile python api fix
        5d5da7aa5  tianbu.xsw  reflector code
        5707877a4  雁行  [MNN:Speed] Speedup for softmax in x86 and arm.
        2a211825c  tianbu.xsw  reflector code for updateCacheFile
        76db3a835  tianbu.xsw  [Cache Feature]: Add updateCacheFile API for increment cache
        b06b0fd43  allen.lk  [Debug:Sparse] Fix and warning one kind of segmentfault cause by memory corruption when resize ConvolutionWinograd.  Avoid to use some registers as arm restriction.
        e68bfa495  雁行  [Converter:Feature] Add UUID when model convert.
        a9cb935dc  xiaying  [MNN:Speed] Support c4nhwc for more fastblit
        019f40353  xiaying  [Converter:Refractor] Reduce memory used by MNNConvert(bert from 5G -> 1G)
        d2a6d3d05  xiaying  [MNN:Bugfix] Fix bug for identity output not find
        604d0801b  xiaying  [Converter:Bugfix] Fix bug for FuseGeLu
        4bada2367  xiaying  [MNN:Refractor] SegmentMean rewrite as segment
        82070e708  xiaying  [MNN:Bugfix] Fix bug for GeometryBinary
        e8ea4266e  xiaying  Fix bug for ShapeTensorConvert compute for dim = 1 error
        1f1cf1991  xiaying  [Tools:Bugfix] Fix system compatibility for fastTestOnnx
        6f422efe2  xiaying  [Tools:Bugfix] Remove color for checkDir for easy to dump
        968f7ec88  xiaying  [MNN:Speed] Support turn broadcast binary to loop
        3e7aaf46f  xiaying  [MNN:Refractor] Set Convolution1x1Strassen support variable input/output ptr
        1f65ab163  xiaying  [MNN:Bugfix] Fix bug for mini mnn can't convert model
        d65953d47  xiaying  [MNN:Bugfix] Fix bug for armv7a - android-14 + ARM82
        8b68be45c  xiaying  [MNN:Feature] Add segment
        8a8f264f5  xiaying  [Vulkan:Bugfix] Remove unuseful print
        025bb0fda  xiaying  [Converter:Bugfix] Fix bug for oneof don't support
        43900251e  tianbu.xsw  enable setCacheFile python API
        ebfb05c74  tianbu.xsw  [Metal Feature] support metallib obtain from walle transfer task
        9665c0a79  弗人  add check for path in json file
        c66fef224  xiaying  [Converter:Bugfix] Fix bug for oneof don't support
        42f192852  xiaying  [MNN:Bugfix] Fix bug for not set output / saveTensor into origin Schedule's outputs
        1b95354ff  雁行  [Feature]: Support shape compute for SetDiff1D, and null input for Prod.
        83966d043  xiaying  [Test:Feature] Add test for static module
        42d1be933  xiaying  [Converter:Bugfix] Fix bug for mnn convert and static model add more outputs for origin model
        9067531c3  xiaying  [Converter:Refractor] formatLicence
        99558bed9  xiaying  [Converter:Bugfix] Count the op for unuseful and controlflow
        4f6da0fa7  allen.lk  [Feature:GRUMultiOutput] fix multi output dimension type
        c6b219bce  xiaying  [Converter:Feature] Turn torch converter to object
        dd4e68a37  xiaying  [Converter:Feature] Support dump supported ops
        80b6a60a3  xiaying  [Converter:Info] If has output name, print output name instead of computed
        015278fc3  xiaying  [MNN:Refractor] Revert IfModule's debug info
        23ac967c4  xiaying  Don't transform for multi-input convolution/deconvolution
        b02b0d4de  xiaying  Fix bug for multi-input for conv1d
        254d8b1d4  xiaying  Fix bug for Conv1dSqueezeMove for multi input convolution 1d
        d47d0b9ca  xiaying  Fix bug for CPURaster's fuse nc4hw4
        357c5bd33  xiaying  Fix ConvBiasAdd for conv's inputs op > 1
        55b1f0c9c  xiaying  [Converter:Bugfix] Don't transform for multi-input convolution/deconvolution
        1902a30f5  xiaying  [Converter:Bugfix] Fix bug for Conv1dSqueezeMove for multi input convolution 1d
        c23fe617b  xiaying  [MNN:Bugfix] Fix bug for multi-input for conv1d
        8ff018426  xiaying  [MNN:Bugfix] Fix bug for CPURaster's fuse nc4hw4
        d4e8cd602  xiaying  [Converter:Bugfix] Fix ConvBiasAdd for conv's inputs op > 1
        846266b42  tianbu.xsw  return when program and tune both nullptr
        fd67c76a9  xiaying  [Converter:Bugfix] DepthwiseConvWeightMerge only valid for tflite
        e77a242c4  xiaying  [Converter:Feature] Support tflite's half pixel
        be054c377  tianbu.xsw  [OpenCL Bugfix] do not rewrite cache when binary program is produced
        51e65aa35  xiaying  [Converter:Feature] Support tflite for fp16 and multi-input convolution
        1ccdfdeb5  tianbu.xsw  redefine svm macro name
        31234d372  tianbu.xsw  [OpenCL SVM] add macro for only use wrapper
        d739e35da  xiaying  [MNN:Bugfix] Fix compile bug for grid op
        24ab13c79  Joker  feat(arm82): add GridSample op support in arm82 backend, AVX(by xiaying)
        7b142978e  xiaying  [AVX512:Speed] Optimize for e <= 8
        5f6febe7b  tianbu.xsw  code refactor
        998d91b57  xiaying  [Express:Speed] Merge submodule for speed
        22c89146f  tianhang.yth  fix alpha div by zero bug and arm server compile bug
        8f829a170  tianbu.xsw  [OpenCL Pad] unify conv/deconv pad computing
        4a28f603e  xiaying  [Express:Speed] Shared Const for All Submodule
        c74cf28f3  xiaying  [MNN:Refractor] Separate Const init and schedule
        2a1eebb7a  xiaying  [Tools:Bugfix] Fix bug for modelTest.py count size
        72f04008c  xiaying  [MNN:Refractor] Delete unuseful const op
        1e735d03c  xiaying  [Converter:Bugfix] Fix bug for static module gen
        4dfadbc6e  xiaying  [MNN:Refractor] Rewrite const init mode
        1fcf0417a  xiaying  [MNN:Bugfix] Fix bug for deconvolution multi-input for multi-batch
        41d429cfd  xiaying  [Train:Bugfix] Revert convert NCHW for mnistTrain
        f947a5f01  xiaying  [Test:Feature] Add testTrain
        dad59b6f6  tianbu.xsw  move realize code from Backend.hpp to Tensor.cpp
        cf4473ad1  xiaying  [Train:Bugfix] Support pad for GeometryPoolGrad
        91ab13734  xiaying  [MNN:Bugfix] Fix compile bug for avx512
        742e80f47  xiaying  [MNN:Refractor] Opt the logic for checknan judge
        12543b841  xiaying  [ARM82:Bugfix] Fix compile bug for ios
        3a2b0a49f  xiaying  [ARM82:Speed] Opt Pack / Unpack for armv8
        c0f1995cd  xiaying  [ARM82:Speed] Opt MNNPackC8FP16 and MNNUnpackC8FP16 by asm
        e0fc77dcf  xiaying  [MNN:Speed] Fix bug for DeconvolutionWithStride for C4HW4, open it
        584bec578  xiaying  [MNN:Bugfix] Fix bug for format set error for onnx
        d5bd4148d  xiaying  [MNN:Bugfix] Fix bug for format set error for onnx
        b00265841  xiaying  [MNN:Bugfix] Fix bug for SparseConvolutionTiledExecutor
        bb09188ac  xiaying  [Test:Bugfix] Fix bug for run into sparse auto
        426d1babd  xiaying  [MNN:Refractor] Small bugfix for Group convolution and pack
        7d0ea1c46  tianbu.xsw  [testModel Feature] support testModel.out input resize
        4169c54ce  xiaying  [MNN:Bugfix] Fix bug for checkNAN for origin
        412a82222  xiaying  [Test:Bugfix] Fix bug for CheckNAN's error of matmul
        319b1d425  xiaying  [MNN:Bugfix] Fix bug for multi-batch for ConvInt8
        050b728a6  xiaying  [Test:Bugfix] Use NCHW for ConvInt8Test
        7db3423a1  xiaying  [OpenCL:Bugfix] Fix bug for opencl::image,opencl::buffer for C4HW4
        adcec6a7f  xiaying  [Vulkan:Bugfix] Fix bug for invalid tensor size limit
        d2a7cf4e9  xiaying  [Vulkan:Bugfix] Fix bug for onCopyBuffer of nc4hw4
        557bebdd3  xiaying  [MNN:Bugfix] Fix bug for BF16-ARM32
        bbe186649  tianbu.xsw  [Update AUTO mode]: fix MNN_FORWARD_AUTO choose priority
        6deb23439  xiaying  [MNN:Bugfix] Fix bug for GeometryBinary don't care about NC4HW4 same size
        b137590e4  xiaying  [MNN:Bugfix] Fix bug for GeometryBinary don't care about NC4HW4 same size
        7003558ea  xiaying  [Converter:Bugfix] Fix bug for onnx pad for several case
        b5f8cae5a  xiaying  [Converter:Bugfix] Fix bug for onnx pad for several case
        29b09e125  xiaying  [MNN:Bugfix] Fix bug for arm64-bf16
        42ce00770  xiaying  [MNN:Bugfix] Fix bug for ARM64 - float
        a2d89fc18  雁行  [Converter:Feature] Support Binary Unary for Torch.
        7f1c0deb1  xiaying  [MNN:Bugfix] Fix bug for Raster for Int8
        8335a6f18  tianbu.xsw  [OpenCL Shared Memory] modify data_format method
        b359e031b  xiaying  [ARM82:Bugfix] Fix bug for arm82 and speed up pack / unpack c8
        24bf3fc88  雁行  [Convert:Feature] Support LayerNormFuse without gamma beta.
        3e629624b  xiaying  [MNN:Bugfix] Fix bug for float - armv7a
        2b7908ec7  tianbu.xsw  modify workItemSize
        3cee0d413  xiaying  [MNN:Bugfix] test wrong clear
        9cbbfb998  xiaying  [MNN:Bugfix] fix compile bug for c++ < 14
        2d7a44484  xiaying  [MNN:Bugfix] fix compile bug for c++ < 14
        eb7d0cb53  xiaying  [Test:Bugfix] Don't test for NC4HW4 directly
        7b40ca8d1  xiaying  [MNN:Bugfix] Fix bug for ConvolutionGroup
        2694d8a91  xiaying  [MNN:Bugfix] Fix bug for CPUGridSample
        f89af60f6  xiaying  [MNN:Bugfix] Fix compile bug for arm
        a151abcdd  xiaying  [MNN:Bugfix] Fix bug for convert for int8 / int16
        b254dbe61  雁行  [MNN:Bugfix] Bugfix for Conv onClone.
        d08150631  xiaying  [MNN:Bugfix] Fix bug for fast rcnn
        e5568a0df  xiaying  [MNN:Bugfix] Fix bug for CPURaster treat NC4HW4 fast blit
        128318933  雁行  [Raster:Bugfix] bugfix for Raster merge onResize.
        03caacbea  xiaying  [MNN:Bugfix] fix bug for CPUDeconvolution and Convolution1x1Strassen for iw != ow
        e1e3c245c  xiaying  [MNN:Bugfix] Fix bug for ConvolutionWinograd
        2524cbc6d  xiaying  [MNN:Bugfix] Fix bug for CPUSoftmax
        44ec79b8f  xiaying  [MNN:Bugfix] Fix bug for CPUConvolutionDepthwise / Scale / DeconvolutionDW
        21ae956ce  xiaying  [MNN:Bugfix] Fix bug for Multi-Batch-TiledExecutor
        09a5069c7  xiaying  [MNN:Speed] Add offset for src and dst
        6776c6784  xiaying  [MNN:Bugfix] Fix bug for trainable model
        cc83ae30b  xiaying  [MNN:Bugfix] Fix bug for trainable model
2021-07-29 11:47:13 +08:00
allen.lk 36da4f10ec Fix one arm32 instruction syntax that clang works but gcc DOES NOT work. use index instruction instead. 2021-07-12 14:34:53 +08:00
xiaying f0f961fb21 [MNN:Bugfix] Fix bug for ShapeTensorConvert compute for dim = 1 error 2021-07-01 16:06:33 +08:00
xiaying 8bf22bca17 [MNN:Bugfix] Fix bug for rearrange for convint8 crash 2021-06-29 12:13:33 +08:00
xiaying 56255c7d84 [MNN:Bugfix] Bugfix for quan x86 2021-06-24 14:06:10 +08:00
xiaying 01c8d87189 [ARM82:Bugfix] Fix compile bug for 32 bit so open arm82 2021-06-24 11:53:13 +08:00
tianbu.xsw a7981e2180 unify conv/deconv pad computing 2021-06-24 10:40:40 +08:00
xiaying 3c8d3d11e0 Optimize for e <= 8 2021-06-24 10:39:07 +08:00
tianhang.yth 4eb1096b9c fix alpha div by zero bug and arm server compile bug 2021-06-24 10:38:55 +08:00
Joker df80f7328b improvement(arm82): recover the accelerating code for MNNPackUNIT/MNNUnpackUNIT 2021-06-23 15:27:52 +08:00
Joker 4184860ae4 feat(arm82): add GridSample op support in arm82 backend 2021-06-23 14:10:31 +08:00
xiaying 935f70e790 Fix bug for deconvolution multi-input for multi-batch 2021-06-22 20:40:36 +08:00
xiaying 2733909863 Support pad for GeometryPoolGrad 2021-06-22 19:17:05 +08:00
xiaying f6422c315c [MNN:Bugfix] Fix bug for ConvInt8TiledExecutor onClone 2021-06-16 16:20:42 +08:00
xiaying 8d9f86bc4a fix compile bug for c++ < 14 2021-06-16 15:24:46 +08:00
xiaying 02741a55ff [MNN:Bugfix] Fix bug for StridedSlice for begin shape << 0 2021-06-15 21:49:46 +08:00
hush-alibaba 58545d6ca1 Synchronize internal github for version 1.2.0 (#1518) 2021-06-11 17:17:13 +08:00
jxt1234 dba3085e3b
Merge pull request #1492 from Napoleon-Jm/fix/unused_tmp_obj
fix: remove unused tmp obj.
2021-05-25 13:28:10 +08:00
jxt1234 f60567b45b
Merge pull request #1491 from alibaba/feature/bugfix
Fix bug for newAxis stridedslice compute shape error
2021-05-25 13:27:21 +08:00
恺心 8494d4ef72 fix: remove unused tmp obj. 2021-05-24 18:35:55 +08:00
xiaying 5e7cce05ef Fix bug for newAxis stridedslice compute shape error 2021-05-24 15:12:27 +08:00
恺心 3fe3faab29 fix: buffer's management on gl backend need follow the rule from storage type. 2021-05-20 14:35:27 +08:00
xiaying 6277ad84d8 Fix bug for corner data not right for cuda-bilinear 2021-05-18 16:21:28 +08:00
jxt1234 3ab8725569
Merge pull request #1395 from WillTao-RD/master
fix opencl runtime error of MSVC; add 'clGetMemObjectInfo' wrapper
2021-05-08 19:52:33 +08:00
tianhang.yth d85952d826 sync from internal repo 2021-04-28 18:02:10 +08:00
DaydreamCoding 84ede68d86
bugfix for schedule by path
fix for setup initPipelineInfosFromNet when schedule by path
2021-04-25 19:38:43 +08:00
DaydreamCoding ff07663ca8 fix Schedule judge variable 2021-04-25 19:04:32 +08:00
雁行 183f0f803d [PATCH 7/7] [Arm82:Bugfix] Add HardSwish and fix NEON Bug. 2021-04-22 13:51:14 +08:00
雁行 453264cb5e [PATCH 6/7] [QUANT:Bugfix] Bugfix for IDST encode when weight value = 1. 2021-04-22 13:51:14 +08:00
tianbu.xsw f26783a84e [PATCH 2/7] [OpenCL Feature] bugfix for HardSwish 2021-04-22 13:51:13 +08:00
雁行 ef2f7503a1 [PATCH 1/7] [Arm82:Bugfix] Add HardSwish and fix NEON Bug. 2021-04-22 13:51:13 +08:00
xiaying b62c2eb687 [BF16:Bugfix] Fix compile bug for BF16 in NO SSE and NO NEON 2021-04-21 15:54:01 +08:00
xiaying c2a2a24e8e [IOS:Bugfix] Fix compile bug for IOS Demo 2021-04-21 15:01:56 +08:00
xiaying 3c4ba7c595 [MNN:Sync] Sync internal gitlab 2021-04-16 14:50:43 +08:00
tianbu 9f693b108e [PATCH 27/36] [CUDA Feature] bugfix for multi-input depthwiseconv 2021-04-16 14:29:38 +08:00
tianbu.xsw 0c3cb3c689 [PATCH 26/36] [OpenCL Feature] bugfix for resizeSession 2021-04-16 14:29:38 +08:00
xiaying 3e06cabf38 [PATCH 22/36] Fix bug for CPUScatterNd crash for invalid input 2021-04-16 14:29:37 +08:00
tianbu.xsw 089253a9a0 [PATCH 21/36] delete deconv_2d_buf kernel 2021-04-16 14:29:37 +08:00
tianbu.xsw 1c003f8af8 [PATCH 20/36] merge image and buffer kernel 2021-04-16 14:29:37 +08:00
tianbu.xsw 64ab57c5f4 [PATCH 18/36] delete unused log 2021-04-16 14:29:37 +08:00
tianbu.xsw 7422afe0ff [PATCH 17/36] add MNN_OPENCL_BUFFER_CLOSED macro 2021-04-16 14:29:37 +08:00
tianbu.xsw 7a314796e3 [PATCH 16/36] reback some revises 2021-04-16 14:29:37 +08:00
tianbu.xsw d9786e5351 [PATCH 15/36] [OpenCL Feature] support deconvolution for OpenCL Buffer 2021-04-16 14:29:36 +08:00
弗人 9621a1f8ad [PATCH 14/36] bug fix for old type 4 models 2021-04-16 14:29:36 +08:00
弗人 a3bbcd01b9 [PATCH 12/36] [Train:Feature] support full quant for train quant, encode when save 2021-04-16 14:29:36 +08:00
弗人 5aae654351 [PATCH 09/36] remove useless code 2021-04-16 14:29:36 +08:00
弗人 95bcb842a0 [PATCH 08/36] [Train:Feature:Bugfix] train quant support full quant 2021-04-16 14:29:36 +08:00