
— Object Detection in 20 Years: A Survey, 23



parameters: |channels| * 553, i.e. 55|input channels|*|output channels|
also called pointwise convolutions


The number of parameters in flattened convolution decreases from XYC to X + Y + C, where C is the number of input planes, X and Y denote filter width and height.
Additional factorization in spatial dimension such as in [16, 31] does not save much additional computation as very little computation is spent in depthwise convolutions.
— from mobilenet v1
如果是large kernel呢?二者结合呢?