Three detrimental factors: the "many-to-one" mapping, semantic incoherence, and shape deformation are possible impediments to effective learning from range view projections.

many-to-one ⇒ larger H resolution + post-processing
empty grids ⇒ data augmentation
deformation ⇒ STR's slicing naturally helps here.
⇒ For the first time, a range view method is able to surpass its point, voxel, and multi-view fusion counterparts.

RangeFormer + STR (Scalable Training from Range view)
STR slices the full 360° point cloud sweep into several smaller range images.

Besides faster training, STR also mitigates the many-to-one problem.
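A minimal sketch of the STR slicing idea, assuming the simplest possible scheme: partition points into equal azimuth sectors, each of which can then be rasterized into its own narrower range image (the function name and sector count are illustrative, not the paper's exact implementation):

```python
import numpy as np

def slice_sweep(points, n_views=4):
    """Split a full 360-degree sweep into n_views azimuth sectors.

    points: (N, 3) array of x, y, z coordinates.
    Returns a list of n_views sub-clouds covering disjoint azimuth ranges.
    Hypothetical minimal version of the STR idea.
    """
    yaw = np.arctan2(points[:, 1], points[:, 0])          # azimuth in [-pi, pi]
    sector = ((yaw + np.pi) / (2 * np.pi) * n_views).astype(int)
    sector = np.clip(sector, 0, n_views - 1)              # guard yaw == +pi
    return [points[sector == k] for k in range(n_views)]

rng = np.random.default_rng(1)
pts = rng.normal(size=(10_000, 3))
views = slice_sweep(pts, n_views=4)
print([len(v) for v in views])
```

Each sector spans 90° of azimuth here, so each sub-image needs only a quarter of the full width at the same angular resolution, which is where the training-efficiency gain comes from.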

The commonly used widths of 512, 1024, and 2048 are not optimal.
Previous range view methods use (H, W) = (64, 512) to rasterize LiDAR scans of around 120k points each [4], resulting in over 70% information loss: # of 2D grids / # of 3D points = 64×512 / 120000 ≈ 27.3%.
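The many-to-one loss can be reproduced with a small simulation: spherically project a synthetic ~120k-point cloud onto a 64×512 grid and count how many cells are actually hit. This is a sketch, not the paper's pipeline; the FOV bounds follow a common 64-beam sensor convention and are an assumption:

```python
import numpy as np

def project_to_range_image(points, H=64, W=512, fov_up=3.0, fov_down=-25.0):
    """Spherical projection of an (N, 3) point cloud to (H, W) grid indices.

    Hypothetical helper; fov_up/fov_down in degrees (assumed sensor FOV).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                        # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / depth, -1, 1))  # inclination
    fu, fd = np.radians(fov_up), np.radians(fov_down)
    u = 0.5 * (1.0 - yaw / np.pi) * W             # column from azimuth
    v = (1.0 - (pitch - fd) / (fu - fd)) * H      # row from inclination
    u = np.clip(np.floor(u), 0, W - 1).astype(int)
    v = np.clip(np.floor(v), 0, H - 1).astype(int)
    return v, u

# Simulate ~120k points uniformly over the FOV and measure the drop rate.
rng = np.random.default_rng(0)
n = 120_000
depth = rng.uniform(2, 50, n)
yaw = rng.uniform(-np.pi, np.pi, n)
pitch = rng.uniform(np.radians(-25), np.radians(3), n)
pts = np.stack([depth * np.cos(pitch) * np.cos(yaw),
                depth * np.cos(pitch) * np.sin(yaw),
                depth * np.sin(pitch)], axis=1)
v, u = project_to_range_image(pts, H=64, W=512)
occupied = len(set(zip(v.tolist(), u.tolist())))
loss = 1 - occupied / n
print(f"occupied cells: {occupied}, dropped points: {loss:.1%}")
```

Since 64×512 = 32768 cells cannot hold 120k points, most cells receive several points and only one "wins" per cell, matching the ~70% loss figure above.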
The crossover of the two lines indicates that a range image of width 1920 tends to be the most informative representation, but it consumes much more memory than 512 or 1024.
This paper relies on high resolution + a (plain) transformer + data augmentation + post-processing, see Table 2. ⇒ Limited conceptual novelty; a competition-style paper.
STR harms mIoU but improves efficiency; see Tables 2, 4, and 5.
Just so-so.
Patch merging is not explained.
Patch embedding: 3×3 with overlap, i.e. stride 1 (first stage) and stride 2 (last three stages).
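The overlapping 3×3 embedding and its effect on resolution can be sketched as follows; a mean over each window stands in for the learned convolution, and padding 1 mirrors a standard Conv2d(kernel=3, stride=s, padding=1) shape rule (this is an illustration, not the paper's code):

```python
import numpy as np

def overlap_patch_embed(x, stride):
    """3x3 overlapping patch embedding sketch: mean over each 3x3 window.

    x: (H, W) feature map. With padding 1, output is (H, W) for stride 1
    and (H//2, W//2) for stride 2, matching the stage layout in the notes.
    A real embedding uses a learned convolution; the mean is a stand-in.
    """
    H, W = x.shape
    xp = np.pad(x, 1)                              # padding 1 on all sides
    out_h = (H + 2 - 3) // stride + 1
    out_w = (W + 2 - 3) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = xp[r:r + 3, c:c + 3].mean()
    return out

x = np.arange(64 * 512, dtype=float).reshape(64, 512)
s1 = overlap_patch_embed(x, stride=1)   # first stage: resolution preserved
s2 = overlap_patch_embed(s1, stride=2)  # later stages: resolution halved
print(s1.shape, s2.shape)
```

Because the 3×3 windows overlap (stride < kernel size), adjacent patches share pixels, which preserves local continuity across patch boundaries, unlike non-overlapping ViT-style patchification.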
