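A continuously updated collection of ICCV 2023 papers, code, and related resources; follow AIWalker for updates. The list focuses mainly on the directions below. For more CV/AI material, add the AIWalker assistant [AIWalker-zhushou] (scan the QR code at the bottom).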
- Backbone
- Detection
- Segmentation
- Knowledge Distillation
- Diffusion
- Restoration
- Super-Resolution
- Deblurring
- Low-Light Image Enhancement
- IQA/IAA
- Datasets
- ....
ElasticViT: Conflict-aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices
Rethinking Mobile Block for Efficient Attention-based Models
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
A Unified Continual Learning Framework with General Parameter-Efficient Tuning
Scale-Aware Modulation Meet Transformer
Improving Zero-Shot Generalization for CLIP with Synthesized Prompts
DreamTeacher: Pretraining Image Backbones with Deep Generative Models
- Paper: https://arxiv.org/abs/2307.07487
- Home: https://research.nvidia.com/labs/toronto-ai/DreamTeacher/
ShiftNAS: Improving One-shot NAS via Probability Shift
MULLER: Multilayer Laplacian Resizer for Vision
FLatten Transformer: Vision Transformer with Focused Linear Attention
- Paper: TODO
- Code: https://github.com/LeapLabTHU/FLatten-Transformer
Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior
Tuning Pre-trained Model via Moment Probing
Strip-MLP: Efficient Token Interaction for Vision MLP
Adaptive Frequency Filters As Efficient Global Token Mixers
Learning Concise and Descriptive Attributes for Visual Recognition
FemtoDet: An Object Detection Baseline for Energy Versus Performance Tradeoffs
Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment
Large Selective Kernel Network for Remote Sensing Object Detection
DiffusionDet: Diffusion Model for Object Detection
DETRs with Collaborative Hybrid Assignments Training
MIMDet: Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
Detection Transformer with Stable Matching
Random Boxes Are Open-world Object Detectors
AlignDet: Aligning Pre-training and Fine-tuning in Object Detection
Cascade-DETR: Delving into High-Quality Universal Object Detection
Deep Directly-Trained Spiking Neural Networks for Object Detection
COCO-O: A Benchmark for Object Detectors under Natural Distribution Shifts
- Paper: https://arxiv.org/abs/2307.12730
- Code: https://github.com/alibaba/easyrobust/tree/main/benchmarks/coco_o
Less is More: Focus Attention for Efficient DETR
- Paper: https://arxiv.org/abs/2307.12612
- Code: https://github.com/huawei-noah/noah-research/tree/master/Focus-DETR
Spatial Self-Distillation for Object Detection with Inaccurate Bounding Boxes
- Paper: https://arxiv.org/abs/2307.12101
- Code: https://github.com/ucas-vg/PointTinyBenchmark/tree/SSD-Det
RecursiveDet: End-to-End Region-based Recursive Object Detection
Segment Anything
- Home: https://segment-anything.com/
- Paper: https://arxiv.org/abs/2304.02643
- Code: https://github.com/facebookresearch/segment-anything
SegGPT: Segmenting Everything in Context
VLPart: Going Denser with Open-Vocabulary Part Segmentation
Referring Image Segmentation Using Text Supervision
- Paper: TODO
- Code: https://github.com/fawnliu/WRIS_ICCV2023
EfficientViT: Lightweight Multi-Scale Attention for On-Device Semantic Segmentation
A Simple Framework for Open-Vocabulary Segmentation and Detection
Multi-granularity Interaction Simulation for Unsupervised Interactive Segmentation
Dynamic Snake Convolution based on Topological Geometric Constraints for Tubular Structure Segmentation
OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation
Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation
Exploring Transformers for Open-world Instance Segmentation
From Knowledge Distillation to Self-Knowledge Distillation: A Unified Approach with Normalized Loss and Customized Soft Labels
DOT: A Distillation-Oriented Trainer
Cumulative Spatial Knowledge Distillation for Vision Transformers
Class-relation Knowledge Distillation for Novel Class Discovery
EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization
Rethinking Data Distillation: Do Not Overlook Calibration
MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
FateZero: Fusing Attentions for Zero-shot Text-based Video Editing
Expressive Text-to-Image Generation with Rich Text
Ablating Concepts in Text-to-Image Diffusion Models
- Paper: https://arxiv.org/abs/2303.13516
- Home: https://www.cs.cmu.edu/~concept-ablation/
- Code: https://github.com/nupurkmr9/concept-ablation
Evaluating Data Attribution for Text-to-Image Models
- Paper: https://arxiv.org/abs/2306.09345
- Home: https://peterwang512.github.io/GenDataAttribution/
- Code: https://github.com/peterwang512/GenDataAttribution
Masked Diffusion Transformer is a Strong Image Synthesizer
- Paper: TODO
- Code: TODO
SVDiff: Compact Parameter Space for Diffusion Fine-tuning
BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition
Neural Video Depth Stabilizer
Kick Back & Relax: Learning to Reconstruct the World by Watching SlowTV
MAMo: Leveraging Memory and Attention for Monocular Video Depth Estimation
Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching
VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation
Learning Depth Estimation for Transparent and Mirror Surfaces
Adaptive Nonlinear Latent Transformation for Conditional Face Editing
Towards Authentic Face Restoration with Iterative Diffusion Models and Beyond
DiffIR: Efficient Diffusion Model for Image Restoration
Physics-Driven Turbulence Image Restoration with Stochastic Refinement
Learning Image-Adaptive Codebooks for Class-Agnostic Image Restoration
Under-Display Camera Image Restoration with Scattering Effect
From Sky to the Ground: A Large-scale Benchmark and Simple Baseline Towards Real Rain Removal
GlowGAN: Unsupervised Learning of HDR Images from LDR Images in the Wild
- Paper: https://arxiv.org/pdf/2211.12352.pdf
SRFormer: Permuted Self-Attention for Single Image Super-Resolution
SAFMN: Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution
DLGSANet: Lightweight Dynamic Local and Global Self-Attention Network for Image Super-Resolution
Dual Aggregation Transformer for Image Super-Resolution
A Benchmark for Chinese-English Scene Text Image Super-resolution
Multi-scale Residual Low-Pass Filter Network for Image Deblurring
- Paper: TODO
- Code: TODO
Implicit Neural Representation for Cooperative Low-light Image Enhancement
Iterative Prompt Learning for Unsupervised Backlit Image Enhancement
ExposureDiffusion: Learning to Expose for Low-light Image Enhancement
Delegate Transformer for Image Color Aesthetics Assessment
- Paper: TODO
- Code: https://github.com/woshidandan/Image-Color-Aesthetics-Assessment
Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives
On the Effectiveness of Spectral Discriminators for Perceptual Quality Improvement
Fast Full-frame Video Stabilization with Iterative Optimization
LPFF: A Portrait Dataset for Face Generators Across Large Poses