Currently the yield occurs on every iteration of the completion
loop, and yielding is quite an expensive kernel system call. It
is not really required when we break out of the loop, so move the
yield to the end of the do-while loop to reduce the yielding
overhead. Perf metrics show that the current code spends ~2.4% of
total CPU run time yielding, whereas this change reduces that to
~0.6%.
Signed-off-by: Colin Ian King <colin.king@intel.com>
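A minimal sketch of the change described above, assuming the hot path has
roughly this shape (drain_completions, completions_pending, handle_completion,
and done are illustrative placeholders, not names from the actual source):

    #include <sched.h>

    // Before (assumed shape): sched_yield() ran on every iteration of the
    // inner completion loop, even on the pass that exits:
    //
    //   do {
    //       while (completions_pending()) {
    //           handle_completion();
    //           sched_yield();     // expensive syscall per completion
    //       }
    //   } while (!done);
    //
    // After: a single yield at the end of the do-while body.
    void drain_completions(bool (*completions_pending)(),
                           void (*handle_completion)(),
                           const volatile bool &done) {
        do {
            while (completions_pending())
                handle_completion();
            sched_yield();         // yield once per outer pass
        } while (!done);
    }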
error: sort is not a member of std
error: gettid was not declared in this scope
error: __NR_sched_setaffinity was not declared in this scope
error: syscall was not declared in this scope
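These errors typically point to missing headers on a Linux/glibc build; a
plausible fix under that assumption is:

    #include <algorithm>      // std::sort
    #include <unistd.h>       // syscall()
    #include <sys/syscall.h>  // __NR_sched_setaffinity, SYS_gettid
    #include <sys/types.h>    // pid_t

    // glibc before 2.30 does not export gettid(); a common fallback is the
    // raw syscall (helper name is ours, not from the source):
    static inline pid_t gettid_fallback() {
        return static_cast<pid_t>(::syscall(SYS_gettid));
    }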
- unify schema building in core and converter;
- add more build scripts for android;
- add linux build script for python;
- ops impl:
  - add floor mod support in binary (see the floor-mod sketch after this changelog);
  - use eltwise impl in add/max/sub/mul binary for optimization;
  - remove fake double support in cast;
  - fix 5d support for concat;
  - add adjX and adjY support for batch matmul;
  - optimize conv2d back prop filter;
  - add pad mode support for conv3d;
  - fix bug in conv2d & conv depthwise with very small feature map;
  - optimize binary without broadcast;
  - add data types support for gather;
  - add gather ND support;
  - use uint8 data type in gather v2;
  - add transpose support for matmul;
  - add matrix band part (see the band-part sketch after this changelog);
  - add dim != 4 support for padding, reshape & tensor convert;
  - add pad type support for pool3d;
  - make ops based on TensorFlow Lite quantization optional;
  - add all & any support for reduction;
  - use type in parameter as output type in reduction;
  - add int support for unary;
  - add variable weight support for conv2d;
  - fix conv2d depthwise weights initialization;
  - fix type support for transpose;
  - fix grad outputs count for reduce grad and reshape grad;
  - fix priorbox & detection output;
  - fix metal softmax error;
- python:
  - add runSessionWithCallBackInfo interface;
  - add max nodes limit (1400) for visualization tool;
  - fix save error in python3;
  - align default dim;
- convert:
  - add extra design for optimization;
  - add more post-converting optimizers;
  - add caffe v1 weights blob support;
  - add cast, unary, conv transpose support for onnx model;
  - optimize batchnorm, conv with variable weights, prelu, reshape, slice, upsample for onnx model;
  - add cos/sin/atan/tan support for unary for tensorflow model;
  - add any/all support for reduction for tensorflow model;
  - add elu, conv3d, pool3d support for tensorflow model;
  - optimize argmax, batchnorm, concat, batch to space, conv with variable weights, prelu, slice for tensorflow model;
- others:
  - fix size computer lock;
  - fix thread pool deadlock;
  - add express & parameters in express;
  - rewrite blitter chooser without static map;
  - add tests for expr;
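For the floor-mod item above: a minimal sketch of the usual floor-mod
semantics (the result takes the sign of the divisor, unlike C's fmod); this
is a generic illustration, not MNN's implementation:

    #include <cmath>

    // floor_mod(a, b) = a - floor(a / b) * b
    // e.g. floor_mod(-7, 3) == 2, while std::fmod(-7, 3) == -1
    static double floor_mod(double a, double b) {
        double r = std::fmod(a, b);
        if (r != 0.0 && ((r < 0.0) != (b < 0.0)))
            r += b;   // pull the remainder onto the divisor's side
        return r;
    }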
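For the matrix band part item: a sketch of the usual semantics (as in
TensorFlow's MatrixBandPart), again generic rather than MNN's code. Element
(i, j) is kept iff (num_lower < 0 || i - j <= num_lower) &&
(num_upper < 0 || j - i <= num_upper):

    #include <cstddef>
    #include <vector>

    // Zero out everything outside the band; a negative num_lower/num_upper
    // means "keep the whole lower/upper triangle".
    static void band_part(std::vector<float> &m, int rows, int cols,
                          int num_lower, int num_upper) {
        for (int i = 0; i < rows; ++i)
            for (int j = 0; j < cols; ++j) {
                const bool keep =
                    (num_lower < 0 || i - j <= num_lower) &&
                    (num_upper < 0 || j - i <= num_upper);
                if (!keep)
                    m[static_cast<std::size_t>(i) * cols + j] = 0.0f;
            }
    }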
- fix bugs in quantization
- add evaluating tool for quantization
- add ADMM support in quantization
- fix lock in thread pool
- fix fusing for deconv
- fix reshape converting from ONNX to MNN
- turn off blob size checking by default
- add quantization tool & cpu impl & demo/exec (see the quantization sketch after this list)
- add thread pool
- add tests
- fix onnx converter tensor name mismatch
- optimize cpu performance with SSE for windows
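For the quantization items above: a generic sketch of symmetric int8
post-training quantization (the common scheme; not necessarily the exact one
this tool implements):

    #include <algorithm>
    #include <cmath>
    #include <cstdint>

    // scale = max|x| / 127; q = clamp(round(x / scale), -127, 127)
    static int8_t quantize_int8(float x, float scale) {
        float q = std::round(x / scale);
        q = std::min(127.0f, std::max(-127.0f, q));
        return static_cast<int8_t>(q);
    }

    static float dequantize_int8(int8_t q, float scale) {
        return static_cast<float>(q) * scale;
    }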