- add quantization tool, CPU implementation & demo/exec
- add thread pool
- add tests
- fix ONNX converter tensor name mismatch
- optimize CPU performance with SSE on Windows
- update resources and docs
- unify tensor's width/height/channel/batch getters
- optimize several ops
- fix compile warnings and errors on Ubuntu
- other bug fixes