README_EN.md
Model Quantization
Advantages of quantization
Quantization can speed up model inference by converting the floating point computations in the original model into int8 computations. It also compresses the original model by approximately 4X by quantizing the float32 weights into int8 weights.
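As a rough illustration of where the 4X figure comes from, the sketch below compares the storage needed for the same number of weights held as float32 and as int8. It only counts bytes with Python's standard `array` module; the weight count is an arbitrary assumption, and a real model file also stores scale factors and other metadata, so the actual ratio is approximate.

```python
from array import array

# Hypothetical layer with one million weights (the count is arbitrary).
num_weights = 1_000_000

float_weights = array("f", [0.0]) * num_weights  # 4 bytes per float32 weight
int8_weights = array("b", [0]) * num_weights     # 1 byte per int8 weight

float_bytes = float_weights.itemsize * len(float_weights)
int8_bytes = int8_weights.itemsize * len(int8_weights)
compression_ratio = float_bytes / int8_bytes

print(f"float32: {float_bytes} bytes, int8: {int8_bytes} bytes, ratio: {compression_ratio:.1f}x")
```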
Compile
Compile macro
To build the quantization tool, enable the MNN_BUILD_QUANTOOLS option when running CMake:
cd MNN
mkdir build
cd build
cmake -DMNN_BUILD_QUANTOOLS=ON ..
make -j4
Usage
Command
./quantized.out origin.mnn quan.mnn preprocessConfig.json
- The first argument is the path of the floating point model to be quantized.
- The second argument is the path where the quantized model will be saved.
- The third argument is the path of the JSON config file.
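If the tool is driven from a script, the three positional arguments can be assembled as in the sketch below. The helper names `build_quantize_command` and `run_quantize` are hypothetical and not part of MNN; the sketch assumes `quantized.out` sits in the current directory and uses only the argument order described above.

```python
import subprocess
from pathlib import Path


def build_quantize_command(float_model, quant_model, config_json, tool="./quantized.out"):
    """Return the argument list for the quantization tool, in the documented order."""
    return [tool, str(float_model), str(quant_model), str(config_json)]


def run_quantize(float_model, quant_model, config_json, tool="./quantized.out"):
    """Run the tool; raise if an input file is missing or the tool exits with an error."""
    for required in (float_model, config_json):
        if not Path(required).is_file():
            raise FileNotFoundError(f"missing input file: {required}")
    command = build_quantize_command(float_model, quant_model, config_json, tool)
    subprocess.run(command, check=True)


# Only build and print the command here; call run_quantize(...) to actually execute it.
command = build_quantize_command("origin.mnn", "quan.mnn", "preprocessConfig.json")
print(" ".join(command))
```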
Json config file
{
"format":"RGB",
"mean":[
127.5,
127.5,
127.5
],
"normal":[
0.00784314,
0.00784314,
0.00784314
],
"width":224,
"height":224,
"path":"path/to/images/",
"used_image_num":500,
"feature_quantize_method":"KL",
"weight_quantize_method":"MAX_ABS"
}
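The config file can also be generated from a script, which makes it easy to catch a misspelled option before running the tool. The sketch below is a minimal example that checks only the option values listed in this README; the function name `check_config` is hypothetical, and the tool itself may accept or reject values differently.

```python
import json

config = {
    "format": "RGB",
    "mean": [127.5, 127.5, 127.5],
    "normal": [0.00784314, 0.00784314, 0.00784314],
    "width": 224,
    "height": 224,
    "path": "path/to/images/",
    "used_image_num": 500,
    "feature_quantize_method": "KL",
    "weight_quantize_method": "MAX_ABS",
}

# Option values taken from this README.
ALLOWED_VALUES = {
    "format": {"RGB", "BGR", "RGBA", "GRAY"},
    "feature_quantize_method": {"KL", "ADMM"},
    "weight_quantize_method": {"MAX_ABS", "ADMM"},
}


def check_config(cfg):
    """Return a list of problems; an empty list means the checked fields look fine."""
    problems = []
    for key, allowed in ALLOWED_VALUES.items():
        if key in cfg and cfg[key] not in allowed:
            problems.append(f"{key}: {cfg[key]!r} is not one of {sorted(allowed)}")
    for key in ("width", "height"):
        value = cfg.get(key)
        if not isinstance(value, int) or value <= 0:
            problems.append(f"{key}: expected a positive integer, got {value!r}")
    return problems


problems = check_config(config)
config_text = json.dumps(config, indent=4)
print(problems or config_text)
```

Writing `config_text` to `preprocessConfig.json` then gives a file that can be passed as the third argument of the tool.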
format
Input images are loaded as RGBA and then converted to the target format specified by format.
Options: "RGB", "BGR", "RGBA", "GRAY"
mean, normal
These have the same meaning as in the ImageProcess config. Each channel is normalized as:
dst = (src - mean) * normal
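As a worked example of this formula, the sketch below applies the values from the sample config to a black pixel and a white pixel. With mean 127.5 and normal 0.00784314 (about 1/127.5), pixel values in [0, 255] map to approximately [-1, 1]. The function name `normalize` is only illustrative.

```python
def normalize(pixel, mean, normal):
    """Apply dst = (src - mean) * normal to one pixel, channel by channel."""
    return [(s - m) * n for s, m, n in zip(pixel, mean, normal)]


mean = [127.5, 127.5, 127.5]
normal = [0.00784314, 0.00784314, 0.00784314]

black = normalize([0, 0, 0], mean, normal)        # each channel is close to -1
white = normalize([255, 255, 255], mean, normal)  # each channel is close to +1
print(black, white)
```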
width, height
Input width and height of the floating point model
path
Path to images that are used for calibrating feature quantization scale factors.
used_image_num
Specify the number of images used for calibration.
Default: use all the images under path.
Note: please confirm that the data produced by applying the above preprocessing to the images is exactly the data that is fed into the model input.
feature_quantize_method
Specify the method used to compute the feature quantization scale factors.
Options:
- "KL": use the KL divergence method; generally needs 100 ~ 1000 images.
- "ADMM": use the ADMM (Alternating Direction Method of Multipliers) method to iteratively search for optimal feature quantization scale factors; generally needs one batch of images.
Default: "KL"
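To give an intuition for the "KL" option, the sketch below shows a generic, simplified form of KL divergence calibration: it histograms the absolute activation values, tries a range of clipping thresholds, and keeps the one whose 128-level approximation is closest to the clipped distribution. This is an illustration of the general technique, not MNN's implementation; the bin counts, the smoothing constant, the function names, and the synthetic activations are all assumptions.

```python
import math
import random

NUM_BINS = 512        # resolution of the fine histogram (assumption)
NUM_QUANT_BINS = 128  # int8 magnitude levels 0..127
EPSILON = 1e-4        # smoothing count added to every bin (assumption)


def kl_divergence(p_counts, q_counts):
    """KL(P || Q) between two histograms of equal length, after additive smoothing."""
    n = len(p_counts)
    p_total = sum(p_counts) + n * EPSILON
    q_total = sum(q_counts) + n * EPSILON
    total = 0.0
    for p_count, q_count in zip(p_counts, q_counts):
        p = (p_count + EPSILON) / p_total
        q = (q_count + EPSILON) / q_total
        total += p * math.log(p / q)
    return total


def find_kl_threshold(values):
    """Return the clipping threshold with the smallest KL divergence."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return 0.0
    bin_width = max_abs / NUM_BINS
    hist = [0] * NUM_BINS
    for v in values:
        hist[min(int(abs(v) / bin_width), NUM_BINS - 1)] += 1

    best_threshold, best_kl = max_abs, math.inf
    for i in range(NUM_QUANT_BINS, NUM_BINS + 1):
        # Reference distribution: fold everything beyond bin i into the last kept bin.
        reference = hist[:i]
        reference[-1] += sum(hist[i:])

        # Merge the first i bins into 128 groups, then spread each group's count
        # evenly over the bins that are non-empty in the reference.
        group_sum = [0] * NUM_QUANT_BINS
        group_nonzero = [0] * NUM_QUANT_BINS
        for j in range(i):
            g = j * NUM_QUANT_BINS // i
            group_sum[g] += hist[j]
            group_nonzero[g] += 1 if reference[j] > 0 else 0
        candidate = []
        for j in range(i):
            g = j * NUM_QUANT_BINS // i
            candidate.append(group_sum[g] / group_nonzero[g] if reference[j] > 0 else 0.0)

        kl = kl_divergence(reference, candidate)
        if kl < best_kl:
            best_kl, best_threshold = kl, i * bin_width
    return best_threshold


# Synthetic activations: a unit Gaussian plus one large outlier.
random.seed(0)
activations = [random.gauss(0.0, 1.0) for _ in range(10000)] + [20.0]
threshold = find_kl_threshold(activations)
scale = threshold / 127.0
print(f"max |x| = {max(abs(v) for v in activations):.2f}, "
      f"threshold = {threshold:.2f}, scale = {scale:.5f}")
```

The point of the method is that a single outlier should not dictate the scale: clipping it costs little information, while a threshold stretched to cover it would waste most of the int8 range. This is also why the method needs enough calibration images for the histogram to be representative.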
weight_quantize_method
Specify weight quantization method
Options:
- "MAX_ABS": use the max absolute value of weights to do symmetrical quantization.
- "ADMM": use ADMM method to iteratively find optimal quantization of weights.
Default: "MAX_ABS"
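The "MAX_ABS" idea can be sketched in a few lines: the scale is chosen so that the largest absolute weight maps to the edge of the int8 range, and every weight is rounded to the nearest step. The sketch below assumes a single scale for the whole weight list and a clamp to [-127, 127]; the granularity and exact range used by the tool are not specified here, and the function names are illustrative.

```python
def quantize_max_abs(weights):
    """Symmetric int8 quantization: map [-max|w|, +max|w|] onto [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 0.64]
quantized, scale = quantize_max_abs(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, scale, max_error)
```

Because the mapping is symmetric around zero, no zero point is needed, and the rounding error of each weight is at most half a quantization step.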
Users can try the feature and weight quantization methods above and choose the combination that works best for their model.
Usage of quantized model
A quantized model is used in the same way as a floating point model. Its inputs and outputs are also floating point.