Performance is worse than WinCLIP and CLIP-AD.
It is enough to look only at the question part.

A toy experiment on using GPT-4V. This leads to uncontrollable and imprecise outputs.

1) Region division: Mi;
2) Prompt designing: Ti (this i is a different index from the i in Mi);
3) Text2Segmentation: converts the answer To into the pixel-level anomaly segmentation result Ao.
But this is clearly region-level, not pixel-level;
in fact, nothing is really written about it.
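
Since the Text2Segmentation step is not spelled out, here is a minimal sketch of one way it could work, to show why the output ends up region-level. Everything here is an assumption for illustration and not the paper's method: the function names `parse_region_ids` and `text_to_segmentation` are hypothetical, regions are assumed to be numbered from 1 in the order of the masks Mi, and the answer To is assumed to mention anomalous regions by number.

```python
import re


def parse_region_ids(answer):
    """Collect every integer mentioned in the free-text answer To (assumed to be region numbers)."""
    return {int(token) for token in re.findall(r"\d+", answer)}


def text_to_segmentation(answer, region_masks):
    """Build Ao as the union of the binary masks Mi whose number appears in the answer.

    region_masks: list of 2D lists of 0/1, all the same size; mask k is region k + 1.
    """
    if not region_masks:
        return []
    ids = parse_region_ids(answer)
    height = len(region_masks[0])
    width = len(region_masks[0][0])
    result = [[0] * width for _ in range(height)]
    for number, mask in enumerate(region_masks, start=1):
        if number not in ids:
            continue
        # Copy the whole region into the output: every pixel of a named region is marked.
        for row in range(height):
            for col in range(width):
                if mask[row][col]:
                    result[row][col] = 1
    return result


if __name__ == "__main__":
    masks = [
        [[1, 1], [0, 0]],  # region 1: top row
        [[0, 0], [1, 1]],  # region 2: bottom row
    ]
    print(text_to_segmentation("Region 2 looks anomalous.", masks))
```

Under these assumptions the output can only ever be a union of whole regions, so its resolution is bounded by the region division, which matches the region-level remark above.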
Exploring Grounding Potential of VQA-oriented GPT-4V for Zero-shot Anomaly Detection, 23