ASAM

Boosting Segment Anything Model with Adversarial Tuning
CVPR 2024
vivo Mobile Communication Co., Ltd

ASAM boosts the performance of SAM without any architectural modifications. ASAM is also resource-friendly: it requires only 8 A6000 GPUs and no additional data beyond 1% of the SA-1B dataset.


Compared to SAM (Segment Anything, ICCV 2023) and EfficientSAM (EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything, ESAM, CVPR 2024), our proposed ASAM and AESAM are more resource-friendly.

Abstract

In the evolving landscape of computer vision, foundation models have emerged as pivotal tools, exhibiting exceptional adaptability to a myriad of tasks. Among these, the Segment Anything Model (SAM) by Meta AI has distinguished itself in image segmentation. However, SAM, like its counterparts, encounters limitations in specific niche applications, prompting a quest for enhancement strategies that do not compromise its inherent capabilities. This paper introduces ASAM, a novel methodology that amplifies SAM's performance through adversarial tuning. We harness the potential of natural adversarial examples, inspired by their successful implementation in natural language processing. By utilizing a stable diffusion model, we augment a subset (1%) of the SA-1B dataset, generating adversarial instances that are more representative of natural variations rather than conventional imperceptible perturbations. Our approach maintains the photorealism of adversarial examples and ensures alignment with original mask annotations, thereby preserving the integrity of the segmentation task. The fine-tuned ASAM demonstrates significant improvements across a diverse range of segmentation tasks without necessitating additional data or architectural modifications. The results of our extensive evaluations confirm that ASAM establishes new benchmarks in segmentation tasks, thereby contributing to the advancement of foundational models in computer vision.

Framework

ASAM consists of three steps: (1) Adversarial Latent Optimization, (2) Controllable Adversarial Sample Generation, and (3) fine-tuning SAM with the generated adversarial samples, as sketched below.
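As a rough illustration, here is a minimal, self-contained PyTorch sketch of the three steps. Everything in it is a toy stand-in: DiffusionEncoder/DiffusionDecoder substitute for the pretrained stable diffusion inversion and decoding used in the paper, and sam_model substitutes for SAM. The real method additionally constrains the optimized latent so the decoded image stays photorealistic and aligned with the original mask annotation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffusionEncoder(nn.Module):
    """Toy stand-in for diffusion-based image-to-latent inversion."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 4, 3, stride=2, padding=1)
    def forward(self, x):
        return self.net(x)

class DiffusionDecoder(nn.Module):
    """Toy stand-in for the diffusion decoder (latent back to image)."""
    def __init__(self):
        super().__init__()
        self.net = nn.ConvTranspose2d(4, 3, 4, stride=2, padding=1)
    def forward(self, z):
        return torch.sigmoid(self.net(z))

def seg_loss(logits, gt_mask):
    return F.binary_cross_entropy_with_logits(logits, gt_mask)

encoder, decoder = DiffusionEncoder(), DiffusionDecoder()
sam_model = nn.Conv2d(3, 1, 3, padding=1)            # toy stand-in for SAM

image = torch.rand(1, 3, 64, 64)                     # an image from the 1% SA-1B subset
gt_mask = (torch.rand(1, 1, 64, 64) > 0.5).float()   # its original mask annotation

# Step 1: Adversarial Latent Optimization -- invert the image into the
# latent space, then ascend SAM's segmentation loss w.r.t. the latent.
z = encoder(image).detach().requires_grad_(True)
opt = torch.optim.Adam([z], lr=1e-2)
for _ in range(10):
    loss = -seg_loss(sam_model(decoder(z)), gt_mask)  # maximize SAM's error
    opt.zero_grad()
    loss.backward()
    opt.step()

# Step 2: Controllable Adversarial Sample Generation -- decode the optimized
# latent; the real pipeline adds controls so the output stays photorealistic
# and consistent with the original mask.
adv_image = decoder(z).detach()

# Step 3: Fine-tune SAM on adversarial samples paired with original masks.
ft_opt = torch.optim.Adam(sam_model.parameters(), lr=1e-4)
ft_loss = seg_loss(sam_model(adv_image), gt_mask)
ft_opt.zero_grad()
ft_loss.backward()
ft_opt.step()

The key design choice this sketch captures is that the perturbation is optimized in the diffusion latent space rather than in pixel space, which is what lets the adversarial samples look like natural variations instead of imperceptible noise.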

Experiments: Stronger SAM

Stronger SAM. Compared with PGD-tuned SAM, DAT-tuned SAM, and DatasetDM-tuned SAM, ASAM clearly outperforms the other tuning methods and improves over the original SAM across all 14 test datasets.

The qualitative comparisons across different scenes, such as common scenes and medical scenes, show that our proposed ASAM improves the performance of SAM.
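For context, per-dataset comparisons like these are typically reported as mean IoU between predicted and ground-truth masks. The snippet below is an illustrative sketch of such an evaluation loop, not the paper's evaluation code; the toy dataset and the dummy predictor are hypothetical placeholders.

import numpy as np

def iou(pred, gt):
    """Intersection-over-Union between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 1.0

def mean_iou(dataset, predict_mask):
    """Average IoU of a prompt-conditioned predictor over one dataset."""
    return float(np.mean([iou(predict_mask(img, prompt), gt)
                          for img, prompt, gt in dataset]))

# Toy stand-ins: each sample is (image, point_prompt, gt_mask); a real run
# would iterate over the 14 benchmark datasets with SAM/ASAM predictors.
rng = np.random.default_rng(0)
toy_dataset = [(rng.random((64, 64, 3)), (32, 32), rng.random((64, 64)) > 0.5)
               for _ in range(8)]
dummy_predictor = lambda img, prompt: np.ones(img.shape[:2], dtype=bool)
print(f"mIoU: {mean_iou(toy_dataset, dummy_predictor):.3f}")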


Experiments: Stronger EfficientSAM

Stronger EfficientSAM. AESAM achieves performance improvements over EfficientSAM (EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything, ESAM, CVPR 2024) across 16 different datasets. ESAM is the latest segment-anything work from Meta and was accepted to CVPR 2024 with full review scores.

Experiments: Stronger HQSAM

Stronger HQSAM. HQ-ASAM achieves performance improvements over HQSAM (Segment Anything in High Quality, NeurIPS 2023) across 4 different datasets. HQSAM was proposed by ETH Zurich and HKUST and has received about 3.4k GitHub stars.

Experiments: Stronger SAM-Adapter

Stronger SAM-Adapter. ASAM-Adapter achieves performance improvements over SAM-Adapter (ICCV 2023 Workshop) across 2 different datasets.

BibTeX

@article{li2024asam,
  title={ASAM: Boosting Segment Anything Model with Adversarial Tuning},
  author={Li, Bo and Xiao, Haoke and Tang, Lv},
  journal={arXiv preprint arXiv:2405.00256},
  year={2024}
}
The website template was adapted from SegGen.