Abstract: The vision-language model CLIP has profoundly transformed the field of zero-shot anomaly detection. Recent studies obtain anomaly maps by aligning images with normal and abnormal prompts.
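The prompt-alignment idea mentioned above can be sketched in a few lines. This is a minimal illustration, not the method of any particular paper: random vectors stand in for CLIP image-patch and text features, and the shapes (512-dimensional embeddings, a 7×7 patch grid) and the temperature value are illustrative assumptions. Each patch's cosine similarity to a "normal" and an "abnormal" prompt embedding is softmax-normalized, and the "abnormal" probability per patch forms the anomaly map.

```python
import numpy as np

# Hypothetical sketch of CLIP-style zero-shot anomaly scoring.
# Real features would come from a pretrained CLIP model; random
# vectors stand in for them here.
rng = np.random.default_rng(0)
D = 512        # assumed embedding dimension
H = W = 7      # assumed patch grid of the image encoder

patch_emb = rng.standard_normal((H * W, D))  # stand-in image patch features
text_emb = rng.standard_normal((2, D))       # stand-ins for [normal, abnormal] prompt features

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

patch_emb = l2_normalize(patch_emb)
text_emb = l2_normalize(text_emb)

# Cosine similarity between each patch and the two prompts, scaled by
# a temperature (as CLIP does), then softmax over the prompt axis.
logits = 100.0 * patch_emb @ text_emb.T                      # (H*W, 2)
exps = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exps / exps.sum(axis=1, keepdims=True)

anomaly_map = probs[:, 1].reshape(H, W)  # "abnormal" probability per patch
print(anomaly_map.shape)                 # (7, 7)
```

In practice the patch-level map is then upsampled to the input resolution; the quality of the map hinges on how the normal/abnormal prompts are phrased.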