Not interesting

Towards segmenting any anomaly without training, we first construct a vanilla baseline (SAA) by prompting a cascade of an anomaly region generator (e.g., a prompt-guided object detection foundation model [23]) and an anomaly region refiner (e.g., a segmentation foundation model [19]) with a naive class-agnostic language prompt (e.g., “Anomaly”). However, SAA suffers from a severe false-alarm problem: it falsely detects every “wick” rather than the ground-truth anomaly region (the “overlong wick”). We therefore strengthen the regularization with hybrid prompts in the revamped model (SAA+), which successfully helps identify the anomaly regions.
2 Related work
Language or Vision Foundation Models
Prompt Engineering
adapting foundation models for downstream tasks
- prompting with text and visual inputs
- cannot be employed in ZSAS, because they require training data.
- Should try doing a paper on synthesized anomalies.
- heuristic prompts [55] that do not require any training,
3 SAA: Baseline Method
vanilla Foundation Model Assembly for ZSAS
- R, S := SAA(I, T)
- T is a naive class-agnostic language prompt, e.g., “anomaly”, utilized in SAA
- R^B, S := Generator(I, T)
- Anomaly Region Generator, Grounding DINO [23]
- the bounding-box-level region set R^B and the corresponding confidence score set S
- R := Refiner(I, R^B)
- Anomaly Region Refiner, Segment Anything [19]
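
The two-stage cascade above can be sketched as follows. This is only a minimal illustration of the data flow R^B, S := Generator(I, T) followed by R := Refiner(I, R^B). The names `saa`, `toy_generator`, and `toy_refiner` are hypothetical placeholders, not the real Grounding DINO or Segment Anything APIs, and the image and masks are plain nested lists to keep the sketch dependency-free.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), end-exclusive; an assumed convention
Mask = List[List[bool]]           # pixel-level region, same size as the image
Image = List[List[float]]         # toy grayscale image

Generator = Callable[[Image, str], Tuple[List[Box], List[float]]]
Refiner = Callable[[Image, List[Box]], List[Mask]]


def saa(image: Image, prompt: str,
        generator: Generator, refiner: Refiner) -> Tuple[List[Mask], List[float]]:
    """R, S := SAA(I, T): box-level proposals first, then pixel-level refinement."""
    boxes, scores = generator(image, prompt)   # R^B, S := Generator(I, T)
    masks = refiner(image, boxes)              # R := Refiner(I, R^B)
    return masks, scores


def toy_generator(image: Image, prompt: str) -> Tuple[List[Box], List[float]]:
    # Hypothetical stand-in for a prompt-guided detector: one fixed box and score.
    return [(1, 1, 3, 3)], [0.9]


def toy_refiner(image: Image, boxes: List[Box]) -> List[Mask]:
    # Hypothetical stand-in for a segmenter: simply fills each box.
    h, w = len(image), len(image[0])
    return [
        [[x0 <= x < x1 and y0 <= y < y1 for x in range(w)] for y in range(h)]
        for (x0, y0, x1, y1) in boxes
    ]


image = [[0.0] * 4 for _ in range(4)]
masks, scores = saa(image, "anomaly", toy_generator, toy_refiner)
```

Passing the generator and refiner as callables keeps the sketch independent of any particular foundation model; real models would be wrapped to match these assumed signatures.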
4 SAA+

Foundation Model Adaption via Hybrid Prompt Regularization
hybrid prompts (PL, PP, PS, and PC)
- Prompt Generated from Domain Expert Knowledge
- PL = {Ta, Ts}: Language prompts
- Class-agnostic prompts (Ta), e.g., “anomaly” and “defect”.
- Class-specific prompts (Ts), e.g., “black hole” and “white bubble”
- PP = {θ_area, θ_IoU}: property prompts
- Prompts Derived from Target Image Context
- PS: Anomaly Saliency as Prompt
- PC: Anomaly Confidence as Prompt
- identify the K candidates with the highest confidence scores based on the image content and use their average values for final anomaly region detection
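
The confidence prompt (PC) can be illustrated with a small sketch. The notes only say that the K highest-scoring candidates are averaged; the exact averaging rule is not given here, so the version below is one possible reading: at each pixel, average the scores of the top-K regions that cover it. The function name `topk_confidence_map` and the nested-list mask format are assumptions made for illustration.

```python
from typing import List

Mask = List[List[bool]]


def topk_confidence_map(masks: List[Mask], scores: List[float], k: int) -> List[List[float]]:
    """Per-pixel mean score over the top-k candidate regions covering that pixel.

    Assumed reading of "use their average values"; pixels covered by no
    top-k region get 0.0.
    """
    if not masks:
        return []
    # Indices of the k candidates with the highest confidence scores.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    h, w = len(masks[0]), len(masks[0][0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            covering = [scores[i] for i in top if masks[i][y][x]]
            if covering:
                out[y][x] = sum(covering) / len(covering)
    return out


# Three toy candidates on a 2x2 image; the lowest-scoring one is dropped for k=2.
m0 = [[True, True], [False, False]]
m1 = [[False, True], [False, False]]
m2 = [[False, False], [True, True]]
anomaly_map = topk_confidence_map([m0, m1, m2], [0.8, 0.4, 0.1], k=2)
```

With k=2, the third candidate (score 0.1) is excluded, so the bottom row of `anomaly_map` stays at zero while the top row carries the averaged scores.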