- cpu & gpu
  - add ceil mode in pool
  - fix softmax with neg axis
- cpu
  - add unsqueeze op
  - optimize lstm
- gpu
  - add 5x5 winograd in metal
  - add batch support for winograd in opencl
- onnx
  - add concat / gather / shape / squeeze / unsqueeze
  - fix data type support in constant
- update resources and docs
- unify tensor's width/height/channel/batch getters
- optimize several ops
- fix compile warnings and errors on Ubuntu
- some other bug fixes