
- voxel feature encoder (VFE), PMLR20
- submanifold sparse residual (SSR) block
- sparse encoder-decoder (SED) block
- dense encoder-decoder (DED) block
SED: this is the paper's only real contribution
Why
- Most existing sparse CNNs are primarily built by stacking submanifold sparse residual (SSR) blocks. However, submanifold sparse convolutions maintain the same sparsity between input and output features, and therefore hinder the exchange of information among spatially disconnected features.
- Consequently, models employing SSR blocks face challenges in effectively capturing long-range dependencies among features.
- Replace SSR with regular sparse residual (RSR) blocks [21]?
- That leads to a significant decrease in feature sparsity as the network deepens, resulting in substantial computational costs.
- Recent research has investigated the utilization of large-kernel sparse CNNs [12, 16] and transformers [14, 15] to capture long-range dependencies among features.
- However, these approaches have either demonstrated limited improvements in detection accuracy or come with significant computational costs.
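
The sparsity trade-off between SSR and RSR blocks can be sketched with plain sets of active coordinates. This is a minimal toy in 2D, not the paper's implementation: the helper names (`kernel_window`, `submanifold_active`, `regular_active`), the 3x3 kernel, and the two example sites are all assumptions made for illustration. It only tracks which sites are valid, not the feature values.

```python
from itertools import product


def kernel_window(site, k=3):
    """All positions covered by a k x k kernel centered on `site` (hypothetical helper)."""
    r = k // 2
    y, x = site
    return {(y + dy, x + dx) for dy, dx in product(range(-r, r + 1), repeat=2)}


def submanifold_active(active):
    """Submanifold sparse conv: outputs exist only where inputs are valid."""
    return set(active)


def regular_active(active, k=3):
    """Regular sparse conv (stride 1): any site whose window touches a valid input becomes valid."""
    out = set()
    for site in active:
        out |= kernel_window(site, k)
    return out


# Two spatially disconnected valid features (assumed toy input).
active = {(0, 0), (0, 10)}

sub, reg = set(active), set(active)
sub_counts, reg_counts = [], []
for _ in range(3):  # three stacked layers
    sub = submanifold_active(sub)
    reg = regular_active(reg)
    sub_counts.append(len(sub))
    reg_counts.append(len(reg))

print(sub_counts)  # [2, 2, 2]: sparsity preserved, but the two sites never interact
print(reg_counts)  # [18, 50, 98]: valid sites grow quickly with depth
```

In this toy, the submanifold variant never creates a path between the two sites, while the regular variant keeps dilating the valid set, which is where the extra computation comes from.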

- Expanded features correspond to the features that fall within the neighborhood of the valid features.
- Orange dashed lines represent the convolution kernel space.
- The red dashed square highlights the regions from which the output feature marked by a star can receive information.
How
- After feature down-sampling, the spatially disconnected valid features in the bottom feature map are integrated into the adjacent valid features in the middle feature map.
- An SSR block is subsequently applied to the middle feature map to promote interaction among valid features.
- Finally, the middle feature map is up-sampled to match the resolution of the input feature map.
- Note that the feature up-sampling layer (UP) only up-samples features to the regions covered by the valid features in the input feature map.
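
The down-sample, SSR, up-sample sequence above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's code: each site stores the set of original feature labels it has received information from, down-sampling is approximated by mapping each site to `coord // 2` (the actual down-sampling layer may differ), and the function names `down`, `ssr`, and `up` are hypothetical.

```python
from itertools import product


def window(site, k=3):
    """Positions covered by a k x k kernel centered on `site` (hypothetical helper)."""
    r = k // 2
    y, x = site
    return [(y + dy, x + dx) for dy, dx in product(range(-r, r + 1), repeat=2)]


def down(feats, stride=2):
    """Assumed stride-2 down-sampling: fine site (y, x) merges into coarse site (y // 2, x // 2)."""
    out = {}
    for (y, x), sources in feats.items():
        out.setdefault((y // stride, x // stride), set()).update(sources)
    return out


def ssr(feats, k=3):
    """Submanifold step: valid sites are unchanged; each gathers from valid neighbours in its window."""
    out = {}
    for site in feats:
        merged = set()
        for n in window(site, k):
            if n in feats:
                merged |= feats[n]
        out[site] = merged
    return out


def up(coarse, template, stride=2):
    """Up-sample only to the sites that are valid in `template` (the input feature map)."""
    return {
        (y, x): set(coarse[(y // stride, x // stride)])
        for (y, x) in template
    }


# Two valid features that a single 3x3 submanifold conv cannot connect (assumed toy input).
inp = {(0, 0): {"A"}, (0, 3): {"B"}}

plain = ssr(inp)                # SSR alone at the input resolution
sed = up(ssr(down(inp)), inp)   # down-sample -> SSR -> up-sample

print(plain)  # each site still only knows about itself
print(sed)    # both sites now carry information from A and B
```

Note that `sed` has exactly the same valid sites as `inp`, so in this sketch the block exchanges information between disconnected features without reducing sparsity at the input resolution.
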
DED