1 Motivation

  1. Three detrimental factors: the paper observes that the “many-to-one” mapping, semantic incoherence, and shape deformation are the main impediments to effective learning from range view projections.


    many-to-one ⇒ larger horizontal resolution + post-processing

    empty grids ⇒ data augmentation

    deformation ⇒ the slicing in STR naturally helps with this.

  2. ⇒ For the first time, a range view method is able to surpass its point, voxel, and multi-view fusion counterparts.

    1. An engineering-driven, competition-style paper, but its statistics do clarify some common assumptions.

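To make the “many-to-one” factor concrete, here is a minimal sketch of the standard spherical range-view projection on synthetic data. The function and the FOV numbers are assumptions for illustration (typical 64-beam settings), not taken from the paper; the point is that a 64×512 grid has far fewer cells than a ~120k-point scan, so many points collide in the same cell and all but one are dropped.

```python
import numpy as np

def project_to_range_image(points, H=64, W=512, fov_up=3.0, fov_down=-25.0):
    """Map (N, 3) LiDAR points to integer (row, col) range-image coordinates."""
    fov_rad = np.radians(fov_up - fov_down)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                       # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / r, -1, 1))     # elevation angle
    u = 0.5 * (1.0 - yaw / np.pi) * W            # column from azimuth
    v = (1.0 - (pitch - np.radians(fov_down)) / fov_rad) * H  # row from elevation
    u = np.clip(np.floor(u), 0, W - 1).astype(int)
    v = np.clip(np.floor(v), 0, H - 1).astype(int)
    return v, u

rng = np.random.default_rng(0)
pts = rng.normal(size=(120_000, 3)) * [20, 20, 2]   # synthetic 120k-point "scan"
v, u = project_to_range_image(pts)
occupied = len(set(zip(v.tolist(), u.tolist())))
print(f"grid capacity {64*512}, occupied cells {occupied}, "
      f"points lost to many-to-one: {120_000 - occupied}")
```

Since the grid has only 32768 cells, at least ~87k of the 120k points are necessarily overwritten, which matches the ~70% loss figure cited below.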

2 How to improve?

RangeFormer + STR (Scalable Training from Range view)

2.1 STR

STR simply slices the full 360° sweep of the point cloud into several smaller range images.


Besides faster training, this alleviates the many-to-one problem.
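A minimal sketch of the slicing idea, under my own assumptions (equal-width azimuth sectors; function name and sector count are hypothetical, not from the paper): partition one sweep by azimuth, then each sector can be rasterized into its own narrower range image.

```python
import numpy as np

def split_by_azimuth(points, num_views=4):
    """Return a list of point subsets, one per equal-width azimuth sector."""
    yaw = np.arctan2(points[:, 1], points[:, 0])          # [-pi, pi]
    sector = ((yaw + np.pi) / (2 * np.pi) * num_views).astype(int)
    sector = np.clip(sector, 0, num_views - 1)            # guard yaw == +pi
    return [points[sector == k] for k in range(num_views)]

rng = np.random.default_rng(0)
scan = rng.normal(size=(120_000, 3))                      # synthetic sweep
views = split_by_azimuth(scan, num_views=4)
print([len(v) for v in views], sum(len(v) for v in views))
```

Every point lands in exactly one sector, so nothing is discarded by the split itself; each sub-image then covers only 90° of azimuth at the same pixel budget, i.e. higher effective angular resolution.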


The common widths of 512, 1024, and 2048 are not optimal.

Previous range view methods use (H, W) = (64, 512) to rasterize LiDAR scans of around 120k points each [4], resulting in over 70% information loss: # of 2D grids / # of 3D points = 64×512 / 120000 ≈ 27.3%.

The crossover of the two lines indicates that a range image of width 1920 tends to be the most informative representation, but it consumes much more memory than width 512 or 1024.
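The back-of-the-envelope ratio above can be reproduced for the candidate widths, using the numbers stated in the note (H = 64, ~120k points per scan):

```python
H, num_points = 64, 120_000
for W in (512, 1024, 1920, 2048):
    ratio = H * W / num_points
    print(f"W={W:4d}: {H*W:6d} cells / {num_points} points = {ratio:.1%}")
```

Width 1920 is the first setting where the grid count (122880) roughly matches the point count, which is consistent with it being the "most informative" crossover point.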

The paper relies on high resolution + a (fairly plain) transformer + data augmentation + post-processing, see Table 2. ⇒ Not much conceptual novelty; a competition-style paper.

STR harms mIoU but improves efficiency; see Tables 2, 4, and 5.

2.2 Transformer

Just so-so.

Patch merging is not explained.

Patches: 3×3 with overlap, i.e., the stride equals 1 (for the first stage) and 2 (for the last three stages).
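A NumPy sketch of what overlapping 3×3 patch embedding implies for spatial resolution (SegFormer-style; the function name, the padding of 1, and the 5-channel input are my assumptions for illustration): stride 1 keeps the resolution in the first stage, stride 2 halves it in the later stages.

```python
import numpy as np

def overlap_patch_embed(x, stride):
    """Extract overlapping 3x3 patches (zero padding 1) from an (H, W, C)
    feature map; returns an (H_out, W_out, 9*C) array of flattened patches."""
    H, W, C = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.stack([
        np.stack([xp[i:i + 3, j:j + 3].reshape(-1) for j in range(0, W, stride)])
        for i in range(0, H, stride)
    ])
    return out

feat = np.zeros((64, 512, 5))                  # e.g. range/x/y/z/intensity
s1 = overlap_patch_embed(feat, stride=1)       # stage 1: resolution preserved
s2 = overlap_patch_embed(feat, stride=2)       # later stages: 2x downsampling
print(s1.shape, s2.shape)                      # (64, 512, 45) (32, 256, 45)
```

In a real network this extraction would be a strided convolution followed by a linear projection; the point here is only that overlapping patches (kernel 3 > stride) share border pixels between neighbors, unlike non-overlapping ViT patching.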


2.3 Losses