Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-03-11 | Keypoint Detection and Description for Raw Bayer Images | Jiakai Lin et.al. | 2503.08673v1 | null |
2025-03-11 | SGNetPose+: Stepwise Goal-Driven Networks with Pose Information for Trajectory Prediction in Autonomous Driving | Akshat Ghiya et.al. | 2503.08016v1 | null |
2025-03-10 | Better Pose Initialization for Fast and Robust 2D/3D Pelvis Registration | Yehyun Suh et.al. | 2503.07767v1 | null |
2025-03-10 | HumanMM: Global Human Motion Recovery from Multi-shot Videos | Yuhong Zhang et.al. | 2503.07597v1 | null |
2025-03-11 | AthletePose3D: A Benchmark Dataset for 3D Human Pose Estimation and Kinematic Validation in Athletic Movements | Calvin Yeung et.al. | 2503.07499v2 | null |
2025-03-10 | Multi-Robot System for Cooperative Exploration in Unknown Environments: A Survey | Chuqi Wang et.al. | 2503.07278v1 | null |
2025-03-12 | Endo-FASt3r: Endoscopic Foundation model Adaptation for Structure from motion | Mona Sheikh Zeinoddin et.al. | 2503.07204v2 | null |
2025-03-10 | Multi-Modal 3D Mesh Reconstruction from Images and Text | Melvin Reka et.al. | 2503.07190v1 | null |
2025-03-11 | PoseLess: Depth-Free Vision-to-Joint Control via Direct Image Mapping with VLM | Alan Dao et.al. | 2503.07111v2 | null |
2025-03-09 | AxisPose: Model-Free Matching-Free Single-Shot 6D Object Pose Estimation via Axis Generation | Yang Zou et.al. | 2503.06660v1 | null |
2025-03-08 | NeuraLoc: Visual Localization in Neural Implicit Map with Dual Complementary Features | Hongjia Zhai et.al. | 2503.06117v1 | null |
2025-03-08 | Fish2Mesh Transformer: 3D Human Mesh Recovery from Egocentric Vision | David C. Jeong et.al. | 2503.06089v1 | null |
2025-03-08 | ReJSHand: Efficient Real-Time Hand Pose Estimation and Mesh Reconstruction Using Refined Joint and Skeleton Features | Shan An et.al. | 2503.05995v1 | null |
2025-03-07 | Differentiable Rendering-based Pose Estimation for Surgical Robotic Instruments | Zekai Liang et.al. | 2503.05953v1 | null |
2025-03-07 | Novel Object 6D Pose Estimation with a Single Reference View | Jian Liu et.al. | 2503.05578v1 | null |
2025-03-07 | Multi-Grained Feature Pruning for Video-Based Human Pose Estimation | Zhigang Wang et.al. | 2503.05365v1 | null |
2025-03-07 | Persistent Object Gaussian Splat (POGS) for Tracking Human and Robot Manipulation of Irregularly Shaped Objects | Justin Yu et.al. | 2503.05189v1 | null |
2025-03-07 | SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting | Linqi Yang et.al. | 2503.05174v1 | null |
2025-03-07 | GaussianCAD: Robust Self-Supervised CAD Reconstruction from Three Orthographic Views Using 3D Gaussian Splatting | Zheng Zhou et.al. | 2503.05161v1 | null |
2025-03-06 | MarsLGPR: Mars Rover Localization with Ground Penetrating Radar | Anja Sheppard et.al. | 2503.04944v1 | null |
2025-03-09 | ReynoldsFlow: Exquisite Flow Estimation via Reynolds Transport Theorem | Yu-Hsi Chen et.al. | 2503.04500v2 | null |
2025-03-05 | Active 6D Pose Estimation for Textureless Objects using Multi-View RGB Frames | Jun Yang et.al. | 2503.03726v1 | null |
2025-03-05 | Machine Learning in Biomechanics: Key Applications and Limitations in Walking, Running, and Sports Movements | Carlo Dindorf et.al. | 2503.03717v1 | null |
2025-03-05 | Improving 6D Object Pose Estimation of metallic Household and Industry Objects | Thomas Pöllabauer et.al. | 2503.03655v1 | null |
2025-03-05 | Tiny Lidars for Manipulator Self-Awareness: Sensor Characterization and Initial Localization Experiments | Giammarco Caroleo et.al. | 2503.03449v1 | null |
2025-03-05 | Direct Sparse Odometry with Continuous 3D Gaussian Maps for Indoor Environments | Jie Deng et.al. | 2503.03373v1 | null |
2025-03-05 | Supervised Visual Docking Network for Unmanned Surface Vehicles Using Auto-labeling in Real-world Water Environments | Yijie Chu et.al. | 2503.03282v1 | null |
2025-03-05 | SCORE: Saturated Consensus Relocalization in Semantic Line Maps | Haodong Jiang et.al. | 2503.03254v1 | null |
2025-03-04 | Monocular Person Localization under Camera Ego-motion | Yu Zhan et.al. | 2503.02916v1 | null |
2025-03-04 | PIDLoc: Cross-View Pose Optimization Network Inspired by PID Controllers | Wooju Lee et.al. | 2503.02388v1 | null |
2025-03-04 | DQO-MAP: Dual Quadrics Multi-Object mapping with Gaussian Splatting | Haoyuan Li et.al. | 2503.02223v1 | null |
2025-03-04 | Zero-Shot Sim-to-Real Visual Quadrotor Control with Hard Constraints | Yan Miao et.al. | 2503.02198v1 | null |
2025-03-03 | Constraint-Based Modeling of Dynamic Entities in 3D Scene Graphs for Robust SLAM | Marco Giberna et.al. | 2503.02050v1 | null |
2025-03-05 | Category-level Meta-learned NeRF Priors for Efficient Object Mapping | Saad Ejaz et.al. | 2503.01582v2 | null |
2025-03-03 | RUSSO: Robust Underwater SLAM with Sonar Optimization against Visual Degradation | Shu Pan et.al. | 2503.01434v1 | null |
2025-03-03 | ecg2o: A Seamless Extension of g2o for Equality-Constrained Factor Graph Optimization | Anas Abdelkarim et.al. | 2503.01311v1 | null |
2025-03-03 | Convex Hull-based Algebraic Constraint for Visual Quadric SLAM | Xiaolong Yu et.al. | 2503.01254v1 | link |
2025-03-04 | Floorplan-SLAM: A Real-Time, High-Accuracy, and Long-Term Multi-Session Point-Plane SLAM for Efficient Floorplan Reconstruction | Haolin Wang et.al. | 2503.00397v2 | null |
2025-03-01 | BGM2Pose: Active 3D Human Pose Estimation with Non-Stationary Sounds | Yuto Shibata et.al. | 2503.00389v1 | null |
2025-02-28 | BST: Badminton Stroke-type Transformer for Skeleton-based Action Recognition in Racket Sports | Jing-Yuan Chang et.al. | 2502.21085v1 | null |
2025-02-28 | Two-Stream Spatial-Temporal Transformer Framework for Person Identification via Natural Conversational Keypoints | Masoumeh Chapariniya et.al. | 2502.20803v1 | null |
2025-02-27 | Cutting-edge 3D reconstruction solutions for underwater coral reef images: A review and comparison | Jiageng Zhong et.al. | 2502.20154v1 | null |
2025-02-27 | BEV-DWPVO: BEV-based Differentiable Weighted Procrustes for Low Scale-drift Monocular Visual Odometry on Ground | Yufei Wei et.al. | 2502.20078v1 | null |
2025-02-28 | SegLocNet: Multimodal Localization Network for Autonomous Driving via Bird's-Eye-View Segmentation | Zijie Zhou et.al. | 2502.20077v2 | link |
2025-02-27 | RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges | Thibaut Loiseau et.al. | 2502.19955v1 | null |
2025-02-27 | QORT-Former: Query-optimized Real-time Transformer for Understanding Two Hands Manipulating Objects | Elkhan Ismayilzada et.al. | 2502.19769v1 | null |
2025-02-27 | Accurate Pose Estimation for Flight Platforms based on Divergent Multi-Aperture Imaging System | Shunkun Liang et.al. | 2502.19708v1 | null |
2025-02-26 | Increasing the Task Flexibility of Heavy-Duty Manipulators Using Visual 6D Pose Estimation of Objects | Petri Mäkinen et.al. | 2502.19169v1 | null |
2025-02-25 | EgoSim: An Egocentric Multi-view Simulator and Real Dataset for Body-worn Cameras during Motion and Activity | Dominik Hollidt et.al. | 2502.18373v1 | null |
2025-02-25 | Learning Structure-Supporting Dependencies via Keypoint Interactive Transformer for General Mammal Pose Estimation | Tianyang Xu et.al. | 2502.18214v1 | link |
2025-02-24 | V-HOP: Visuo-Haptic 6D Object Pose Tracking | Hongyu Li et.al. | 2502.17434v1 | null |
2025-02-23 | Orchestrating Joint Offloading and Scheduling for Low-Latency Edge SLAM | Yao Zhang et.al. | 2502.16495v1 | null |
2025-02-23 | DeProPose: Deficiency-Proof 3D Human Pose Estimation via Adaptive Multi-View Fusion | Jianbin Jiao et.al. | 2502.16419v1 | link |
2025-02-21 | RGB-Only Gaussian Splatting SLAM for Unbounded Outdoor Scenes | Sicheng Yu et.al. | 2502.15633v1 | null |
2025-02-21 | SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training | Nie Lin et.al. | 2502.15251v1 | null |
2025-02-21 | Nonlinear Dynamical Systems for Automatic Face Annotation in Head Tracking and Pose Estimation | Thoa Thieu et.al. | 2502.15179v1 | null |
2025-02-20 | Design of a Visual Pose Estimation Algorithm for Moon Landing | Atakan Süslü et.al. | 2502.14942v1 | null |
2025-02-20 | Hier-SLAM++: Neuro-Symbolic Semantic SLAM with a Hierarchically Categorical Gaussian Splatting | Boying Li et.al. | 2502.14931v1 | null |
2025-02-19 | EfficientPose 6D: Scalable and Efficient 6D Object Pose Estimation | Zixuan Fang et.al. | 2502.14061v1 | null |
2025-02-19 | Active Illumination for Visual Ego-Motion Estimation in the Dark | Francesco Crocetti et.al. | 2502.13708v1 | null |
2025-02-19 | Object-Pose Estimation With Neural Population Codes | Heiko Hoffmann et.al. | 2502.13403v1 | null |
2025-02-18 | Spatiotemporal Multi-Camera Calibration using Freely Moving People | Sang-Eun Lee et.al. | 2502.12546v1 | null |
2025-02-18 | Learning Transformation-Isomorphic Latent Space for Accurate Hand Pose Estimation | Kaiwen Ren et.al. | 2502.12535v1 | null |
2025-02-19 | FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views | Shangzhan Zhang et.al. | 2502.12138v2 | null |
2025-02-17 | Enhancing Transparent Object Pose Estimation: A Fusion of GDR-Net and Edge Detection | Tessa Pulli et.al. | 2502.12027v1 | null |
2025-02-17 | SurgPose: a Dataset for Articulated Robotic Surgical Tool Pose Estimation and Tracking | Zijian Wu et.al. | 2502.11534v1 | null |
2025-02-18 | VarGes: Improving Variation in Co-Speech 3D Gesture Generation via StyleCLIPS | Ming Meng et.al. | 2502.10729v2 | link |
2025-02-15 | Semantics-aware Test-time Adaptation for 3D Human Pose Estimation | Qiuxia Lin et.al. | 2502.10724v1 | null |
2025-02-15 | Learning semantical dynamics and spatiotemporal collaboration for human pose estimation in video | Runyang Feng et.al. | 2502.10616v1 | null |
2025-02-14 | HIPPo: Harnessing Image-to-3D Priors for Model-free Zero-shot 6D Pose Estimation | Yibo Liu et.al. | 2502.10606v1 | null |
2025-02-14 | Manual2Skill: Learning to Read Manuals and Acquire Robotic Skills for Furniture Assembly Using Vision-Language Models | Chenrui Tie et.al. | 2502.10090v1 | null |
2025-02-13 | Metamorphic Testing for Pose Estimation Systems | Matias Duran et.al. | 2502.09460v1 | null |
2025-02-13 | BevSplat: Resolving Height Ambiguity via Feature-Based Gaussian Primitives for Weakly-Supervised Cross-View Localization | Qiwei Wang et.al. | 2502.09080v1 | null |
2025-02-14 | Siren Song: Manipulating Pose Estimation in XR Headsets Using Acoustic Attacks | Zijian Huang et.al. | 2502.08865v2 | null |
2025-02-12 | LIR-LIVO: A Lightweight,Robust LiDAR/Vision/Inertial Odometry with Illumination-Resilient Deep Features | Shujie Zhou et.al. | 2502.08676v1 | link |
2025-02-12 | CordViP: Correspondence-based Visuomotor Policy for Dexterous Manipulation in Real-World | Yankai Fu et.al. | 2502.08449v1 | null |
2025-02-11 | GaRLIO: Gravity enhanced Radar-LiDAR-Inertial Odometry | Chiyun Noh et.al. | 2502.07703v1 | link |
2025-02-11 | Matrix3D: Large Photogrammetry Model All-in-One | Yuanxun Lu et.al. | 2502.07685v1 | null |
2025-02-08 | Vision-in-the-loop Simulation for Deep Monocular Pose Estimation of UAV in Ocean Environment | Maneesha Wickramasuriya et.al. | 2502.05409v1 | null |
2025-02-06 | Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation | Nathan Louis et.al. | 2502.04483v1 | link |
2025-02-06 | GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation | Weihang Li et.al. | 2502.04293v1 | null |
2025-02-06 | Advanced Object Detection and Pose Estimation with Hybrid Task Cascade and High-Resolution Networks | Yuhui Jin et.al. | 2502.03877v1 | null |
2025-02-05 | Mapping and Localization Using LiDAR Fiducial Markers | Yibo Liu et.al. | 2502.03510v1 | null |
2025-02-04 | Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation | Jian Liu et.al. | 2502.02525v1 | link |
2025-02-03 | CleanPose: Category-Level Object Pose Estimation via Causal Learning and Knowledge Distillation | Xiao Lin et.al. | 2502.01312v1 | null |
2025-02-03 | Enhancing Feature Tracking Reliability for Visual Navigation using Real-Time Safety Filter | Dabin Kim et.al. | 2502.01092v1 | null |
2025-02-03 | ZeroBP: Learning Position-Aware Correspondence for Zero-shot 6D Pose Estimation in Bin-Picking | Jianqiu Chen et.al. | 2502.01004v1 | null |
2025-01-31 | A Direct Semi-Exhaustive Search Method for Robust, Partial-to-Full Point Cloud Registration | Richard Cheng et.al. | 2502.00115v1 | null |
2025-01-31 | XRF V2: A Dataset for Action Summarization with Wi-Fi Signals, and IMUs in Phones, Watches, Earbuds, and Glasses | Bo Lan et.al. | 2501.19034v1 | link |
2025-01-30 | SimpleDepthPose: Fast and Reliable Human Pose Estimation with RGBD-Images | Daniel Bermuth et.al. | 2501.18478v1 | link |
2025-01-29 | Online Trajectory Replanner for Dynamically Grasping Irregular Objects | Minh Nhat Vu et.al. | 2501.17968v1 | null |
2025-01-28 | DebugAgent: Efficient and Interpretable Error Slice Discovery for Comprehensive Model Debugging | Muxi Chen et.al. | 2501.16751v1 | null |
2025-01-27 | Toward Efficient Generalization in 3D Human Pose Estimation via a Canonical Domain Approach | Hoosang Lee et.al. | 2501.16146v1 | null |
2025-01-27 | NanoHTNet: Nano Human Topology Network for Efficient 3D Human Pose Estimation | Jialun Cai et.al. | 2501.15763v1 | null |
2025-01-25 | Towards Better Robustness: Progressively Joint Pose-3DGS Learning for Arbitrarily Long Videos | Zhen-Hui Dong et.al. | 2501.15096v1 | null |
2025-01-25 | SpatioTemporal Learning for Human Pose Estimation in Sparsely-Labeled Videos | Yingying Jiao et.al. | 2501.15073v1 | null |
2025-01-24 | 3D/2D Registration of Angiograms using Silhouette-based Differentiable Rendering | Taewoong Lee et.al. | 2501.14918v1 | link |
2025-01-24 | Light3R-SfM: Towards Feed-forward Structure-from-Motion | Sven Elflein et.al. | 2501.14914v1 | null |
2025-01-24 | Glissando-Net: Deep sinGLe vIew category level poSe eStimation ANd 3D recOnstruction | Bo Sun et.al. | 2501.14896v1 | null |
2025-01-24 | Optimizing Grasping Precision for Industrial Pick-and-Place Tasks Through a Novel Visual Servoing Approach | Khairidine Benali et.al. | 2501.14557v1 | null |
2025-01-24 | LiDAR-Based Vehicle Detection and Tracking for Autonomous Racing | Marcello Cellina et.al. | 2501.14502v1 | null |
2025-01-24 | Optimizing Human Pose Estimation Through Focused Human and Joint Regions | Yingying Jiao et.al. | 2501.14439v1 | null |
2025-01-24 | Causal-Inspired Multitask Learning for Video-Based Human Pose Estimation | Haipeng Chen et.al. | 2501.14356v1 | null |
2025-01-24 | HAMMER: Heterogeneous, Multi-Robot Semantic Gaussian Splatting | Javier Yu et.al. | 2501.14147v1 | null |
2025-01-23 | Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass | Jianing Yang et.al. | 2501.13928v1 | null |
2025-01-23 | EgoHand: Ego-centric Hand Pose Estimation and Gesture Recognition with Head-mounted Millimeter-wave Radar and IMUs | Yizhe Lv et.al. | 2501.13805v1 | link |
2025-01-23 | VIGS SLAM: IMU-based Large-Scale 3D Gaussian Splatting SLAM | Gyuhyeon Pak et.al. | 2501.13402v1 | null |
2025-01-22 | Deep Learning-Based Image Recovery and Pose Estimation for Resident Space Objects | Louis Aberdeen et.al. | 2501.13009v1 | null |
2025-01-21 | BlanketGen2-Fit3D: Synthetic Blanket Augmentation Towards Improving Real-World In-Bed Blanket Occluded Human Pose Estimation | Tamás Karácsony et.al. | 2501.12318v1 | null |
2025-01-19 | Refinement Module based on Parse Graph of Feature Map for Human Pose Estimation | Shibang Liu et.al. | 2501.11069v1 | null |
2025-01-18 | RoMu4o: A Robotic Manipulation Unit For Orchard Operations Automating Proximal Hyperspectral Leaf Sensing | Mehrad Mortazavi et.al. | 2501.10621v1 | link |
2025-01-17 | landmarker: a Toolkit for Anatomical Landmark Localization in 2D/3D Images | Jef Jonkers et.al. | 2501.10098v1 | link |
2025-01-16 | A New Teacher-Reviewer-Student Framework for Semi-supervised 2D Human Pose Estimation | Wulian Yun et.al. | 2501.09565v1 | null |
2025-01-21 | Towards Robust and Realistic Human Pose Estimation via WiFi Signals | Yang Chen et.al. | 2501.09411v2 | link |
2025-01-16 | RoboReflect: Robotic Reflective Reasoning for Grasping Ambiguous-Condition Objects | Zhen Luo et.al. | 2501.09307v1 | null |
2025-01-16 | BRIGHT-VO: Brightness-Guided Hybrid Transformer for Visual Odometry with Multi-modality Refinement Module | Dongzhihan Wang et.al. | 2501.08659v2 | null |
2025-01-14 | Poseidon: A ViT-based Architecture for Multi-Frame Pose Estimation with Adaptive Frame Weighting and Multi-Scale Feature Fusion | Cesare Davide Pace et.al. | 2501.08446v1 | link |
2025-01-14 | Leveraging 2D Masked Reconstruction for Domain Adaptation of 3D Pose Estimation | Hansoo Park et.al. | 2501.08408v1 | null |
2025-01-14 | Predicting 4D Hand Trajectory from Monocular Videos | Yufei Ye et.al. | 2501.08329v1 | null |
2025-01-14 | A Critical Synthesis of Uncertainty Quantification and Foundation Models in Monocular Depth Estimation | Steven Landgraf et.al. | 2501.08188v1 | null |
2025-01-14 | AgentPose: Progressive Distribution Alignment via Feature Agent for Human Pose Distillation | Feng Zhang et.al. | 2501.08088v1 | null |
2025-01-14 | Robust Low-Light Human Pose Estimation through Illumination-Texture Modulation | Feng Zhang et.al. | 2501.08038v1 | null |
2025-01-14 | BioPose: Biomechanically-accurate 3D Pose Estimation from Monocular Videos | Farnoosh Koleini et.al. | 2501.07800v1 | null |
2025-01-13 | Fixing the Scale and Shift in Monocular Depth For Camera Pose Estimation | Yaqing Ding et.al. | 2501.07742v1 | link |
2025-01-13 | Efficiently Closing Loops in LiDAR-Based SLAM Using Point Cloud Density Maps | Saurabh Gupta et.al. | 2501.07399v1 | null |
2025-01-13 | Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics | Tze Ho Elden Tse et.al. | 2501.07100v1 | null |
2025-01-10 | eKalibr: Dynamic Intrinsic Calibration for Event Cameras From First Principles of Events | Shuolong Chen et.al. | 2501.05688v1 | link |
2025-01-09 | Relative Pose Estimation through Affine Corrections of Monocular Depth Priors | Yifan Yu et.al. | 2501.05446v1 | link |
2025-01-09 | From Simple to Complex Skills: The Case of In-Hand Object Reorientation | Haozhi Qi et.al. | 2501.05439v1 | null |
2025-01-11 | Towards Balanced Continual Multi-Modal Learning in Human Pose Estimation | Jiaxuan Peng et.al. | 2501.05264v2 | null |
2025-01-08 | KN-LIO: Geometric Kinematics and Neural Field Coupled LiDAR-Inertial Odometry | Zhong Wang et.al. | 2501.04263v1 | null |
2025-01-07 | OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints | Mingjie Pan et.al. | 2501.03841v1 | null |
2025-01-10 | MC-VTON: Minimal Control Virtual Try-On Diffusion Transformer | Junsheng Luan et.al. | 2501.03630v2 | null |
2025-01-07 | TexHOI: Reconstructing Textures of 3D Unknown Objects in Monocular Hand-Object Interaction Scenes | Alakh Aggarwal et.al. | 2501.03525v1 | link |
2025-01-06 | Mobile Augmented Reality Framework with Fusional Localization and Pose Estimation | Songlin Hou et.al. | 2501.03336v1 | null |
2025-01-06 | SurgRIPE challenge: Benchmark of Surgical Robot Instrument Pose Estimation | Haozheng Xu et.al. | 2501.02990v1 | null |
2025-01-06 | HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos | Jinglei Zhang et.al. | 2501.02973v1 | null |
2025-01-06 | Spiking monocular event based 6D pose estimation for space application | Jonathan Courtois et.al. | 2501.02916v1 | null |
2025-01-06 | Universal Features Guided Zero-Shot Category-Level Object Pose Estimation | Wentian Qu et.al. | 2501.02831v1 | null |
2025-01-06 | Unsupervised Domain Adaptation for Occlusion Resilient Human Pose Estimation | Arindam Dutta et.al. | 2501.02773v1 | null |
2025-01-06 | WorldPose: A World Cup Dataset for Global 3D Human Pose Estimation | Tianjian Jiang et.al. | 2501.02771v1 | null |
2025-01-05 | LP-ICP: General Localizability-Aware Point Cloud Registration for Robust Localization in Extreme Unstructured Environments | Haosong Yue et.al. | 2501.02580v1 | link |
2025-01-04 | ROLO-SLAM: Rotation-Optimized LiDAR-Only SLAM in Uneven Terrain with Ground Vehicle | Yinchuan Wang et.al. | 2501.02166v1 | link |
2025-01-03 | TCPFormer: Learning Temporal Correlation with Implicit Pose Proxy for 3D Human Pose Estimation | Jiajie Liu et.al. | 2501.01770v1 | link |
2025-01-03 | Laparoscopic Scene Analysis for Intraoperative Visualisation of Gamma Probe Signals in Minimally Invasive Cancer Surgery | Baoru Huang et.al. | 2501.01752v1 | null |
2025-01-03 | Free-Form Motion Control: A Synthetic Video Generation Dataset with Controllable Camera and Object Motions | Xincheng Shuai et.al. | 2501.01425v2 | null |
2025-01-02 | On Unifying Video Generation and Camera Pose Estimation | Chun-Hao Paul Huang et.al. | 2501.01409v1 | null |
2025-01-02 | L3D-Pose: Lifting Pose for 3D Avatars from a Single Camera in the Wild | Soumyaratna Debnath et.al. | 2501.01174v1 | null |
2024-12-31 | Relative Pose Observability Analysis Using Dual Quaternions | Nicholas B. Andrews et.al. | 2501.00657v1 | null |
2024-12-31 | VinT-6D: A Large-Scale Object-in-hand Dataset from Vision, Touch and Proprioception | Zhaoliang Wan et.al. | 2501.00510v1 | null |
2024-12-30 | Hierarchical Pose Estimation and Mapping with Multi-Scale Neural Feature Fields | Evgenii Kruzhkov et.al. | 2412.20976v1 | null |
2024-12-30 | ReFlow6D: Refraction-Guided Transparent Object 6D Pose Estimation via Intermediate Representation Learning | Hrishikesh Gupta et.al. | 2412.20830v1 | link |
2024-12-30 | Frequency-aware Event Cloud Network | Hongwei Ren et.al. | 2412.20803v1 | null |
2024-12-30 | KeyGS: A Keyframe-Centric Gaussian Splatting Method for Monocular Image Sequences | Keng-Wei Chang et.al. | 2412.20767v1 | null |
2024-12-30 | Towards nation-wide analytical healthcare infrastructures: A privacy-preserving augmented knee rehabilitation case study | Boris Bačić et.al. | 2412.20733v1 | link |
2024-12-29 | Exploiting Aggregation and Segregation of Representations for Domain Adaptive Human Pose Estimation | Qucheng Peng et.al. | 2412.20538v1 | link |
2024-12-28 | MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing | Shuo Wang et.al. | 2412.20082v1 | null |
2024-12-28 | GSplatLoc: Ultra-Precise Camera Localization via 3D Gaussian Splatting | Atticus J. Zeller et.al. | 2412.20056v1 | link |
2024-12-27 | Optimizing Local-Global Dependencies for Accurate 3D Human Pose Estimation | Guangsheng Xu et.al. | 2412.19676v1 | link |
2024-12-27 | Dust to Tower: Coarse-to-Fine Photo-Realistic Scene Reconstruction from Sparse Uncalibrated Images | Xudong Cai et.al. | 2412.19518v1 | null |
2024-12-26 | Humans as a Calibration Pattern: Dynamic 3D Scene Reconstruction from Unsynchronized and Uncalibrated Videos | Changwoon Choi et.al. | 2412.19089v1 | null |
2024-12-23 | Reconstructing People, Places, and Cameras | Lea Müller et.al. | 2412.17806v1 | null |
2024-12-22 | Leveraging Consistent Spatio-Temporal Correspondence for Robust Visual Odometry | Zhaoxing Zhang et.al. | 2412.16923v1 | null |
2024-12-21 | EasyVis2: A Real Time Multi-view 3D Visualization for Laparoscopic Surgery Training Enhanced by a Deep Neural Network YOLOv8-Pose | Yung-Hong Sun et.al. | 2412.16742v1 | null |
2024-12-21 | FACTS: Fine-Grained Action Classification for Tactical Sports | Christopher Lai et.al. | 2412.16454v1 | null |
2024-12-20 | Can Generative Video Models Help Pose Estimation? | Ruojin Cai et.al. | 2412.16155v1 | null |
2024-12-20 | Monkey Transfer Learning Can Improve Human Pose Estimation | Bradley Scott et.al. | 2412.15966v1 | null |
2024-12-19 | Scaling 4D Representations | João Carreira et.al. | 2412.15212v1 | null |
2024-12-13 | IMPROVE: Impact of Mobile Phones on Remote Online Virtual Education | Roberto Daza et.al. | 2412.14195v1 | link |
2024-12-18 | Level-Set Parameters: Novel Representation for 3D Shape Analysis | Huan Lei et.al. | 2412.13502v1 | null |
2024-12-18 | Pre-training a Density-Aware Pose Transformer for Robust LiDAR-based 3D Human Pose Estimation | Xiaoqi An et.al. | 2412.13454v1 | link |
2024-12-17 | CondiMen: Conditional Multi-Person Mesh Recovery | Brégier Romain et.al. | 2412.13058v1 | null |
2024-12-17 | ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries | Wangyu Xue et.al. | 2412.12675v1 | null |
2024-12-16 | Category Level 6D Object Pose Estimation from a Single RGB Image using Diffusion | Adam Bethell et.al. | 2412.11420v1 | null |
2024-12-13 | ExeChecker: Where Did I Go Wrong? | Yiwen Gu et.al. | 2412.10573v1 | null |
2024-12-11 | CUPS: Improving Human Pose-Shape Estimators with Conformalized Deep Uncertainty | Harry Zhang et.al. | 2412.10431v1 | null |
2024-12-13 | RP-SLAM: Real-time Photorealistic SLAM with Efficient 3D Gaussian Splatting | Lizhi Bai et.al. | 2412.09868v1 | null |
2024-12-12 | Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos | Linyi Jin et.al. | 2412.09621v1 | null |
2024-12-12 | FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction | Jiale Xu et.al. | 2412.09573v1 | null |
2024-12-11 | BLADE: Single-view Body Mesh Learning through Accurate Depth Estimation | Shengze Wang et.al. | 2412.08640v1 | null |
2024-12-12 | Drift-free Visual SLAM using Digital Twins | Roxane Merat et.al. | 2412.08496v2 | null |
2024-12-11 | Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization | Siyan Dong et.al. | 2412.08376v1 | link |
2024-12-10 | LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models | Ziqi Lu et.al. | 2412.07746v1 | null |
2024-12-09 | MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds | Zhenggang Tang et.al. | 2412.06974v1 | null |
2024-12-09 | An Efficient Scene Coordinate Encoding and Relocalization Method | Kuan Xu et.al. | 2412.06488v1 | link |
2024-12-09 | Attention-Enhanced Lightweight Hourglass Network for Human Pose Estimation | Marsha Mariya Kappan et.al. | 2412.06227v1 | null |
2024-12-06 | CCS: Continuous Learning for Customized Incremental Wireless Sensing Services | Qunhang Fu et.al. | 2412.04821v1 | null |
2024-12-05 | ProPLIKS: Probablistic 3D human body pose estimation | Karthik Shetty et.al. | 2412.04665v1 | null |
2024-12-05 | DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction | Ben Kaye et.al. | 2412.04464v1 | null |
2024-12-05 | Targeted Hard Sample Synthesis Based on Estimated Pose and Occlusion Error for Improved Object Pose Estimation | Alan Li et.al. | 2412.04279v1 | null |
2024-12-04 | Sparse-view Pose Estimation and Reconstruction via Analysis by Generative Synthesis | Qitao Zhao et.al. | 2412.03570v1 | null |
2024-12-06 | NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images | Lingen Li et.al. | 2412.03517v2 | null |
2024-12-05 | A Bidirectional Siamese Recurrent Neural Network for Accurate Gait Recognition Using Body Landmarks | Proma Hossain Progga et.al. | 2412.03498v2 | null |
2024-12-04 | MCVO: A Generic Visual Odometry for Arbitrarily Arranged Multi-Cameras | Huai Yu et.al. | 2412.03146v1 | link |
2024-12-04 | An indoor DSO-based ceiling-vision odometry system for indoor industrial environments | Abdelhak Bougouffa et.al. | 2412.02950v1 | null |
2024-12-03 | EgoCast: Forecasting Egocentric Human Pose in the Wild | Maria Escobar et.al. | 2412.02903v1 | null |
2024-12-02 | emg2pose: A Large and Diverse Benchmark for Surface Electromyographic Hand Pose Estimation | Sasha Salter et.al. | 2412.02725v1 | link |
2024-12-03 | ProbPose: A Probabilistic Approach to 2D Human Pose Estimation | Miroslav Purkrabek et.al. | 2412.02254v1 | null |
2024-12-03 | Cascaded Multi-Scale Attention for Enhanced Multi-Scale Feature Extraction and Interaction with Low-Resolution Images | Xiangyong Lu et.al. | 2412.02197v1 | link |
2024-12-03 | CLERF: Contrastive LEaRning for Full Range Head Pose Estimation | Ting-Ruen Wei et.al. | 2412.02066v1 | null |
2024-12-02 | Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle | Miroslav Purkrabek et.al. | 2412.01562v1 | link |
2024-12-02 | 6DOPE-GS: Online 6D Object Pose Estimation using Gaussian Splatting | Yufeng Jin et.al. | 2412.01543v1 | null |
2024-12-02 | HandOS: 3D Hand Reconstruction in One Stage | Xingyu Chen et.al. | 2412.01537v1 | null |
2024-12-02 | SF-Loc: A Visual Mapping and Geo-Localization System based on Sparse Visual Structure Frames | Yuxuan Zhou et.al. | 2412.01500v1 | link |
2024-12-02 | MamKPD: A Simple Mamba Baseline for Real-Time 2D Keypoint Detection | Yonghao Dang et.al. | 2412.01422v1 | null |
2024-12-02 | Cross-Modal Visual Relocalization in Prior LiDAR Maps Utilizing Intensity Textures | Qiyuan Shen et.al. | 2412.01299v1 | null |
2024-12-02 | CRISP: Object Pose and Shape Estimation with Test-Time Adaptation | Jingnan Shi et.al. | 2412.01052v1 | null |
2024-11-29 | Diorama: Unleashing Zero-shot Single-view 3D Scene Modeling | Qirui Wu et.al. | 2411.19492v1 | null |
2024-11-29 | Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning | Yang You et.al. | 2411.19458v1 | link |
2024-11-28 | GMS-VINS:Multi-category Dynamic Objects Semantic Segmentation for Enhanced Visual-Inertial Odometry Using a Promptable Foundation Model | Rui Zhou et.al. | 2411.19289v1 | null |
2024-11-28 | HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos | Prithviraj Banerjee et.al. | 2411.19167v1 | null |
2024-11-28 | Lost & Found: Updating Dynamic 3D Scene Graphs from Egocentric Observations | Tjark Behrens et.al. | 2411.19162v1 | link |
2024-11-28 | Distributed Dual Quaternion Extended Kalman Filtering for Spacecraft Pose Estimation | Mathias Hudoba de Badyn et.al. | 2411.19033v1 | null |
2024-11-28 | Waterfall Transformer for Multi-person Pose Estimation | Navin Ranjan et.al. | 2411.18944v1 | null |
2024-12-02 | AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers | Sherwin Bahmani et.al. | 2411.18673v2 | null |
2024-11-27 | XR-MBT: Multi-modal Full Body Tracking for XR through Self-Supervision with Learned Depth Point Cloud Registration | Denys Rozumnyi et.al. | 2411.18377v1 | null |
2024-11-27 | Manual-PA: Learning 3D Part Assembly from Instruction Diagrams | Jiahao Zhang et.al. | 2411.18011v1 | null |
2024-11-26 | Self-supervised Monocular Depth and Pose Estimation for Endoscopy with Generative Latent Priors | Ziang Xu et.al. | 2411.17790v1 | null |
2024-11-26 | Geometric Point Attention Transformer for 3D Shape Reassembly | Jiahan Li et.al. | 2411.17788v1 | null |
2024-11-26 | RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training | Raktim Gautam Goswami et.al. | 2411.17662v1 | null |
2024-11-26 | Communication-Efficient Cooperative SLAMMOT via Determining the Number of Collaboration Vehicles | Susu Fang et.al. | 2411.17432v1 | null |
2024-11-26 | Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration | Junyuan Deng et.al. | 2411.17240v1 | link |
2024-11-28 | SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting | Gyeongjin Kang et.al. | 2411.17190v3 | null |
2024-11-26 | GMFlow: Global Motion-Guided Recurrent Flow for 6D Object Pose Estimation | Xin Liu et.al. | 2411.17174v1 | null |
2024-11-25 | Diffusion Features for Zero-Shot 6DoF Object Pose Estimation | Bernd Von Gimborn et.al. | 2411.16668v1 | null |
2024-11-25 | Edge Weight Prediction For Category-Agnostic Pose Estimation | Or Hirschorn et.al. | 2411.16665v1 | link |
2024-11-25 | SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis | Hyojun Go et.al. | 2411.16443v1 | link |
2024-11-25 | One Diffusion to Generate Them All | Duong H. Le et.al. | 2411.16318v1 | link |
2024-11-25 | UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image | Xingyu Liu et.al. | 2411.16106v1 | null |
2024-11-24 | Generalizable Single-view Object Pose Estimation by Two-side Generating and Matching | Yujing Sun et.al. | 2411.15860v1 | link |
2024-11-24 | PEnG: Pose-Enhanced Geo-Localisation | Tavis Shore et.al. | 2411.15742v1 | null |
2024-11-22 | Personalization of Wearable Sensor-Based Joint Kinematic Estimation Using Computer Vision for Hip Exoskeleton Applications | Changseob Song et.al. | 2411.15366v1 | null |
2024-11-22 | Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation | Huy Le et.al. | 2411.14913v1 | null |
2024-11-22 | mmWave Radar for Sit-to-Stand Analysis: A Comparative Study with Wearables and Kinect | Shuting Hu et.al. | 2411.14656v1 | null |
2024-11-21 | DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding | Tianhe Ren et.al. | 2411.14347v1 | link |
2024-11-21 | SEMPose: A Single End-to-end Network for Multi-object Pose Estimation | Xin Liu et.al. | 2411.14002v1 | null |
2024-11-21 | Dehazing-aided Multi-Rate Multi-Modal Pose Estimation Framework for Mitigating Visual Disturbances in Extreme Underwater Domain | Vidya Sudevan et.al. | 2411.13988v1 | null |
2024-11-21 | Hybrid-Neuromorphic Approach for Underwater Robotics Applications: A Conceptual Framework | Vidya Sudevan et.al. | 2411.13962v1 | null |
2024-11-20 | Developing Normative Gait Cycle Parameters for Clinical Analysis Using Human Pose Estimation | Rahm Ranjan et.al. | 2411.13716v1 | null |
2024-11-20 | Robust SG-NeRF: Robust Scene Graph Aided Neural Surface Reconstruction | Yi Gu et.al. | 2411.13620v1 | null |
2024-11-19 | VioPose: Violin Performance 4D Pose Estimation by Hierarchical Audiovisual Inference | Seong Jong Yoo et.al. | 2411.13607v1 | link |
2024-11-20 | DATAP-SfM: Dynamic-Aware Tracking Any Point for Robust Structure from Motion in the Wild | Weicai Ye et.al. | 2411.13291v1 | null |
2024-11-20 | X as Supervision: Contending with Depth Ambiguity in Unsupervised Monocular 3D Pose Estimation | Yuchen Yang et.al. | 2411.13026v1 | link |
2024-11-19 | IoT-Based 3D Pose Estimation and Motion Optimization for Athletes: Application of C3D and OpenPose | Fei Ren et.al. | 2411.12676v1 | null |
2024-11-15 | SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction | Yutao Tang et.al. | 2411.12592v1 | link |
2024-11-19 | GLOVER: Generalizable Open-Vocabulary Affordance Reasoning for Task-Oriented Grasping | Teli Ma et.al. | 2411.12286v1 | null |
2024-11-18 | IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos | Yunong Liu et.al. | 2411.11409v1 | link |
2024-11-15 | USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting | Kang Chen et.al. | 2411.10504v1 | link |
2024-11-13 | ReMP: Reusable Motion Prior for Multi-domain 3D Human Pose Estimation and Motion Inbetweening | Hojun Jang et.al. | 2411.09435v1 | null |
2024-11-13 | Generalized Pose Space Embeddings for Training In-the-Wild using Analysis-by-Synthesis | Dominik Borer et.al. | 2411.08603v1 | null |
2024-11-13 | DG-SLAM: Robust Dynamic Gaussian Splatting SLAM with Hybrid Pose Optimization | Yueming Xu et.al. | 2411.08373v1 | null |
2024-11-16 | RINO: Accurate, Robust Radar-Inertial Odometry with Non-Iterative Estimation | Shuocheng Yang et.al. | 2411.07699v2 | link |
2024-11-12 | Human Arm Pose Estimation with a Shoulder-worn Force-Myography Device for Human-Robot Interaction | Rotem Atari et.al. | 2411.07644v1 | null |
2024-11-12 | Towards Seamless Integration of Magnetic Tracking into Fluoroscopy-guided Interventions | Shuwei Xing et.al. | 2411.07495v1 | null |
2024-11-08 | Acoustic-based 3D Human Pose Estimation Robust to Human Position | Yusuke Oumi et.al. | 2411.07165v1 | null |
2024-11-11 | CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models | Junho Kim et.al. | 2411.06869v1 | null |
2024-11-11 | GenZ-ICP: Generalizable and Degeneracy-Robust LiDAR Odometry Using an Adaptive Weighting | Daehan Lee et.al. | 2411.06766v1 | link |
2024-11-11 | GTA-Net: An IoT-Integrated 3D Human Pose Estimation System for Real-Time Adolescent Sports Posture Correction | Shizhe Yuan et.al. | 2411.06725v1 | null |
2024-11-10 | Magnetic Field Aided Vehicle Localization with Acceleration Correction | Mrunmayee Deshpande et.al. | 2411.06543v1 | null |
2024-11-10 | Visuotactile-Based Learning for Insertion with Compliant Hands | Osher Azulay et.al. | 2411.06408v1 | link |
2024-11-08 | Poze: Sports Technique Feedback under Data Constraints | Agamdeep Singh et.al. | 2411.05734v1 | null |
2024-11-08 | DeepArUco++: Improved detection of square fiducial markers in challenging lighting conditions | Rafael Berral-Soler et.al. | 2411.05552v1 | link |
2024-11-08 | Tightly-Coupled, Speed-aided Monocular Visual-Inertial Localization in Topological Map | Chanuk Yang et.al. | 2411.05497v1 | null |
2024-11-08 | Relative Pose Estimation for Nonholonomic Robot Formation with UWB-IO Measurements | Kunrui Ze et.al. | 2411.05481v1 | null |
2024-11-07 | Social EgoMesh Estimation | Luca Scofano et.al. | 2411.04598v1 | link |
2024-11-07 | Pose2Trajectory: Using Transformers on Body Pose to Predict Tennis Player's Trajectory | Ali K. AlShami et.al. | 2411.04501v1 | null |
2024-11-08 | SuperQ-GRASP: Superquadrics-based Grasp Pose Estimation on Larger Objects for Mobile-Manipulation | Xun Tu et.al. | 2411.04386v2 | null |
2024-11-08 | GS2Pose: Two-stage 6D Object Pose Estimation Guided by Gaussian Splatting | Jilan Mei et.al. | 2411.03807v3 | null |
2024-11-06 | Estimation of Psychosocial Work Environment Exposures Through Video Object Detection. Proof of Concept Using CCTV Footage | Claus D. Hansen et.al. | 2411.03724v1 | null |
2024-11-05 | Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data | Seunggeun Chi et.al. | 2411.03561v1 | null |
2024-11-05 | HFGaussian: Learning Generalizable Gaussian Human with Integrated Human Features | Arnab Dey et.al. | 2411.03086v1 | null |
2024-11-04 | Semantic Masking and Visual Feature Matching for Robust Localization | Luisa Mao et.al. | 2411.01804v1 | null |
2024-11-03 | Activating Self-Attention for Multi-Scene Absolute Pose Regression | Miso Lee et.al. | 2411.01443v1 | link |
2024-11-04 | 3D Equivariant Pose Regression via Direct Wigner-D Harmonics Prediction | Jongmin Lee et.al. | 2411.00543v2 | null |
2024-10-31 | Whole-Herd Elephant Pose Estimation from Drone Data for Collective Behavior Analysis | Brody McNutt et.al. | 2411.00196v1 | null |
2024-10-31 | No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images | Botao Ye et.al. | 2410.24207v1 | link |
2024-11-06 | SceneComplete: Open-World 3D Scene Completion in Complex Real World Environments for Robot Manipulation | Aditya Agarwal et.al. | 2410.23643v2 | null |
2024-10-30 | SCRREAM: SCan, Register, REnder And Map: A Framework for Annotating Accurate and Dense 3D Indoor Scenes with a Benchmark | HyunJun Jung et.al. | 2410.22715v1 | link |
2024-10-29 | LiVisSfM: Accurate and Robust Structure-from-Motion with LiDAR and Visual Cues | Hanqing Jiang et.al. | 2410.22213v1 | null |
2024-10-29 | PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting | Sunghwan Hong et.al. | 2410.22128v1 | link |
2024-10-29 | HRPVT: High-Resolution Pyramid Vision Transformer for medium and small-scale human pose estimation | Zhoujie Xu et.al. | 2410.22079v1 | null |
2024-10-29 | EI-Nexus: Towards Unmediated and Flexible Inter-Modality Local Feature Extraction and Matching for Event-Image Data | Zhonghua Yi et.al. | 2410.21743v1 | link |
2024-10-28 | Synthetica: Large Scale Synthetic Data for Robot Perception | Ritvik Singh et.al. | 2410.21153v1 | null |
2024-10-29 | BLAPose: Enhancing 3D Human Pose Estimation with Bone Length Adjustment | Chih-Hsiang Hsu et.al. | 2410.20731v2 | link |
2024-11-01 | RopeTP: Global Human Motion Recovery via Integrating Robust Pose Estimation with Diffusion Trajectory Prior | Mingjiang Liang et.al. | 2410.20358v2 | null |
2024-10-27 | Harmony4D: A Video Dataset for In-The-Wild Close Human Interactions | Rawal Khirodkar et.al. | 2410.20294v1 | null |
2024-10-26 | Neural Fields in Robotics: A Survey | Muhammad Zubair Irshad et.al. | 2410.20220v1 | link |
2024-10-25 | DECADE: Towards Designing Efficient-yet-Accurate Distance Estimation Modules for Collision Avoidance in Mobile Advanced Driver Assistance Systems | Muhammad Zaeem Shahzad et.al. | 2410.19336v1 | null |
2024-10-24 | Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction | Junyi Chen et.al. | 2410.18962v1 | null |
2024-10-24 | VoxelKeypointFusion: Generalizable Multi-View Multi-Person Pose Estimation | Daniel Bermuth et.al. | 2410.18723v1 | link |
2024-10-23 | Robust Two-View Geometry Estimation with Implicit Differentiation | Vladislav Pyatov et.al. | 2410.17983v1 | link |
2024-10-23 | YOLOv11: An Overview of the Key Architectural Enhancements | Rahima Khanam et.al. | 2410.17725v1 | link |
2024-10-21 | Assisted Physical Interaction: Autonomous Aerial Robots with Neural Network Detection, Navigation, and Safety Layers | Andrea Berra et.al. | 2410.15802v1 | null |
2024-10-21 | ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from Videos | Tao Tang et.al. | 2410.15582v1 | link |
2024-10-20 | Neural Active Structure-from-Motion in Dark and Textureless Environment | Kazuto Ichimaru et.al. | 2410.15378v1 | null |
2024-10-20 | POSE: Pose estimation Of virtual Sync Exhibit system | Hao-Tang Tsui et.al. | 2410.15343v1 | link |
2024-10-18 | Graph Optimality-Aware Stochastic LiDAR Bundle Adjustment with Progressive Spatial Smoothing | Jianping Li et.al. | 2410.14565v1 | null |
2024-10-18 | Multi-modal Pose Diffuser: A Multimodal Generative Conditional Pose Prior | Calvin-Khang Ta et.al. | 2410.14540v1 | null |
2024-10-18 | Sim2real Cattle Joint Estimation in 3D point clouds | Okour Mohammad et.al. | 2410.14419v1 | null |
2024-10-18 | Unlabeled Action Quality Assessment Based on Multi-dimensional Adaptive Constrained Dynamic Time Warping | Renguang Chen et.al. | 2410.14161v1 | null |
2024-10-15 | From Real Artifacts to Virtual Reference: A Robust Framework for Translating Endoscopic Images | Junyang Wu et.al. | 2410.13896v1 | null |
2024-10-17 | DualQuat-LOAM: LiDAR Odometry and Mapping parametrized on Dual Quaternions | Edison P. Velasco-Sánchez et.al. | 2410.13541v1 | null |
2024-10-17 | Object Pose Estimation Using Implicit Representation For Transparent Objects | Varun Burde et.al. | 2410.13465v1 | null |
2024-10-16 | Optimizing Multi-Task Learning for Accurate Spacecraft Pose Estimation | Francesco Evangelisti et.al. | 2410.12679v1 | null |
2024-10-15 | Contrastive Touch-to-Touch Pretraining | Samanta Rodriguez et.al. | 2410.11834v1 | null |
2024-10-18 | X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing | Xinyan Chen et.al. | 2410.10167v2 | null |
2024-10-13 | Occluded Human Pose Estimation based on Limb Joint Augmentation | Gangtao Han et.al. | 2410.09885v1 | null |
2024-10-12 | Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors | Hritam Basak et.al. | 2410.09467v1 | null |
2024-10-12 | Towards Multi-Modal Animal Pose Estimation: An In-Depth Analysis | Qianyi Deng et.al. | 2410.09312v1 | link |
2024-10-11 | CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation | Jianyu Zhao et.al. | 2410.09010v1 | link |
2024-10-11 | Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization | Christian Schmidt et.al. | 2410.08743v1 | link |
2024-10-10 | Generalizing Stochastic Smoothing for Differentiation and Gradient Estimation | Felix Petersen et.al. | 2410.08125v1 | null |
2024-10-10 | Robotic framework for autonomous manipulation of laboratory equipment with different degrees of transparency via 6D pose estimation | Maria Makarova et.al. | 2410.07801v1 | null |
2024-10-10 | Optimal-State Dynamics Estimation for Physics-based Human Motion Capture from Videos | Cuong Le et.al. | 2410.07795v1 | link |
2024-10-12 | Autonomous Driving in Unstructured Environments: How Far Have We Come? | Chen Min et.al. | 2410.07701v2 | link |
2024-10-10 | Invisibility Cloak: Disappearance under Human Pose Estimation via Backdoor Attacks | Minxing Zhang et.al. | 2410.07670v1 | null |
2024-10-09 | OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB | Yunzhi Lin et.al. | 2410.06694v1 | null |
2024-10-08 | SpecTrack: Learned Multi-Rotation Tracking via Speckle Imaging | Ziyang Chen et.al. | 2410.06028v1 | link |
2024-10-08 | AIVIO: Closed-loop, Object-relative Navigation of UAVs with AI-aided Visual Inertial Odometry | Thomas Jantos et.al. | 2410.05996v1 | null |
2024-10-08 | Are Minimal Radial Distortion Solvers Necessary for Relative Pose Estimation? | Charalambos Tzamos et.al. | 2410.05984v1 | link |
2024-10-08 | FürElise: Capturing and Physically Synthesizing Hand Motions of Piano Performance | Ruocheng Wang et.al. | 2410.05791v1 | null |
2024-10-07 | Comparison of marker-less 2D image-based methods for infant pose estimation | Lennart Jahn et.al. | 2410.04980v1 | null |
2024-10-06 | Enhancing 3D Human Pose Estimation Amidst Severe Occlusion with Dual Transformer Fusion | Mehwish Ghafoor et.al. | 2410.04574v1 | link |
2024-10-06 | LiteVLoc: Map-Lite Visual Localization for Image Goal Navigation | Jianhao Jiao et.al. | 2410.04419v1 | null |
2024-10-05 | Test-Time Adaptation for Keypoint-Based Spacecraft Pose Estimation Based on Predicted-View Synthesis | Juan Ignacio Bravo Pérez-Villar et.al. | 2410.04298v1 | link |
2024-10-05 | A Framework for Reproducible Benchmarking and Performance Diagnosis of SLAM Systems | Nikola Radulov et.al. | 2410.04242v1 | link |
2024-10-04 | Unsupervised Prior Learning: Discovering Categorical Pose Priors from Videos | Ziyu Wang et.al. | 2410.03858v1 | null |
2024-10-04 | Universal Global State Estimation for Inertial Navigation Systems | Sifeddine Benahmed et.al. | 2410.03846v1 | null |
2024-10-04 | MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion | Junyi Zhang et.al. | 2410.03825v1 | null |
2024-10-04 | Dessie: Disentanglement for Articulated 3D Horse Shape and Pose Estimation from Images | Ci Li et.al. | 2410.03438v1 | null |
2024-10-04 | HRVMamba: High-Resolution Visual State Space Model for Dense Prediction | Hao Zhang et.al. | 2410.03174v1 | null |
2024-10-04 | CLIP-Clique: Graph-based Correspondence Matching Augmented by Vision Language Models for Object-based Global Localization | Shigemichi Matsuzaki et.al. | 2410.03054v1 | null |
2024-10-03 | Why Sample Space Matters: Keyframe Sampling Optimization for LiDAR-based Place Recognition | Nikolaos Stathoulopoulos et.al. | 2410.02643v1 | link |
2024-10-03 | Key-Grid: Unsupervised 3D Keypoints Detection using Grid Heatmap Features | Chengkai Hou et.al. | 2410.02237v1 | null |
2024-10-02 | SGBA: Semantic Gaussian Mixture Model-Based LiDAR Bundle Adjustment | Xingyu Ji et.al. | 2410.01618v1 | null |
2024-10-02 | SurgeoNet: Realtime 3D Pose Estimation of Articulated Surgical Instruments from Stereo Images using a Synthetically-trained Network | Ahmed Tawfik Aboukhadra et.al. | 2410.01293v1 | null |
2024-10-01 | Pose Estimation of Buried Deep-Sea Objects using 3D Vision Deep Learning Models | Jerry Yan et.al. | 2410.01061v1 | null |
2024-10-01 | RAD: A Dataset and Benchmark for Real-Life Anomaly Detection with Robotic Observations | Kaichen Zhou et.al. | 2410.00713v1 | link |
2024-10-01 | GERA: Geometric Embedding for Efficient Point Registration Analysis | Geng Li et.al. | 2410.00589v1 | null |
2024-09-30 | Continual Human Pose Estimation for Incremental Integration of Keypoints and Pose Variations | Muhammad Saif Ullah Khan et.al. | 2409.20469v1 | null |
2024-09-30 | Classroom-Inspired Multi-Mentor Distillation with Adaptive Learning Strategies | Shalini Sarode et.al. | 2409.20237v1 | null |
2024-09-30 | PuzzleBoard: A New Camera Calibration Pattern with Position Encoding | Peer Stelldinger et.al. | 2409.20127v1 | link |
2024-09-30 | Robust Gaussian Splatting SLAM by Leveraging Loop Closure | Zunjie Zhu et.al. | 2409.20111v1 | null |
2024-09-30 | GearTrack: Automating 6D Pose Estimation | Yu Deng et.al. | 2409.19986v1 | null |
2024-09-29 | PPLNs: Parametric Piecewise Linear Networks for Event-Based Temporal Modeling and Beyond | Chen Song et.al. | 2409.19772v1 | link |
2024-09-29 | GelSlim 4.0: Focusing on Touch and Reproducibility | Andrea Sipos et.al. | 2409.19770v1 | null |
2024-09-27 | Robust Proximity Operations using Probabilistic Markov Models | Deep Parikh et.al. | 2409.19062v1 | null |
2024-09-27 | Exploiting Motion Prior for Accurate Pose Estimation of Dashboard Cameras | Yipeng Lu et.al. | 2409.18673v1 | null |
2024-09-27 | DynaWeightPnP: Toward global real-time 3D-2D solver in PnP without correspondences | Jingwei Song et.al. | 2409.18457v1 | null |
2024-09-30 | Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation | Mengchen Zhang et.al. | 2409.18261v2 | link |
2024-09-26 | AI-Powered Augmented Reality for Satellite Assembly, Integration and Test | Alvaro Patricio et.al. | 2409.18101v1 | null |
2024-09-27 | Leveraging Anthropometric Measurements to Improve Human Mesh Estimation and Ensure Consistent Body Shapes | Katja Ludwig et.al. | 2409.17671v2 | null |
2024-09-25 | Safe Leaf Manipulation for Accurate Shape and Pose Estimation of Occluded Fruits | Shaoxiong Yao et.al. | 2409.17389v1 | null |
2024-09-25 | Hierarchical Tri-manual Planning for Vision-assisted Fruit Harvesting with Quadrupedal Robots | Zhichao Liu et.al. | 2409.17116v1 | null |
2024-09-25 | Self-Sensing for Proprioception and Contact Detection in Soft Robots Using Shape Memory Alloy Artificial Muscles | Ran Jing et.al. | 2409.17111v1 | null |
2024-09-25 | Online 6DoF Pose Estimation in Forests using Cross-View Factor Graph Optimisation and Deep Learned Re-localisation | Lucas Carvalho de Lima et.al. | 2409.16680v1 | null |
2024-09-25 | FAFA: Frequency-Aware Flow-Aided Self-Supervision for Underwater Object Pose Estimation | Jingyi Tang et.al. | 2409.16600v1 | null |
2024-09-25 | Robo-Platform: A Robotic System for Recording Sensors and Controlling Robots | Masoud Dayani Najafabadi et.al. | 2409.16595v1 | link |
2024-09-24 | PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings | Sutharsan Mahendren et.al. | 2409.15832v1 | null |
2024-09-24 | LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation | Ruida Zhang et.al. | 2409.15727v1 | link |
2024-09-23 | Framework for Robust Localization of UUVs and Mapping of Net Pens | David Botta et.al. | 2409.15475v1 | null |
2024-09-23 | FisheyeDepth: A Real Scale Self-Supervised Depth Estimation Model for Fisheye Camera | Guoyang Zhao et.al. | 2409.15054v1 | link |
2024-09-23 | BranchPoseNet: Characterizing tree branching with a deep learning-based pose estimation approach | Stefano Puliti et.al. | 2409.14755v1 | link |
2024-09-23 | ERPoT: Effective and Reliable Pose Tracking for Mobile Robots Based on Lightweight and Compact Polygon Maps | Haiming Gao et.al. | 2409.14723v1 | null |
2024-09-22 | Tactile Functasets: Neural Implicit Representations of Tactile Datasets | Sikai Li et.al. | 2409.14592v1 | null |
2024-09-22 | AR Overlay: Training Image Pose Estimation on Curved Surface in a Synthetic Way | Sining Huang et.al. | 2409.14577v1 | null |
2024-09-22 | DROP: Dexterous Reorientation via Online Planning | Albert H. Li et.al. | 2409.14562v1 | null |
2024-09-21 | Combining Absolute and Semi-Generalized Relative Poses for Visual Localization | Vojtech Panek et.al. | 2409.14269v1 | null |
2024-09-18 | SpotLight: Robotic Scene Understanding through Interaction and Affordance Detection | Tim Engelbracht et.al. | 2409.11870v1 | link |
2024-09-18 | End-to-End Probabilistic Geometry-Guided Regression for 6DoF Object Pose Estimation | Thomas Pöllabauer et.al. | 2409.11819v1 | null |
2024-09-18 | Bridging Domain Gap for Flight-Ready Spaceborne Vision | Tae Ha Park et.al. | 2409.11661v1 | null |
2024-09-17 | Good Grasps Only: A data engine for self-supervised fine-tuning of pose estimation using grasp poses for verification | Frederik Hagelskjær et.al. | 2409.11512v1 | null |
2024-09-17 | Training Datasets Generation for Machine Learning: Application to Vision Based Navigation | Jérémy Lebreton et.al. | 2409.11383v1 | null |
2024-09-17 | OmniGen: Unified Image Generation | Shitao Xiao et.al. | 2409.11340v1 | link |
2024-09-17 | ULOC: Learning to Localize in Complex Large-Scale Environments with Ultra-Wideband Ranges | Thien-Minh Nguyen et.al. | 2409.11122v1 | link |
2024-09-17 | Depth-based Privileged Information for Boosting 3D Human Pose Estimation on RGB | Alessandro Simoni et.al. | 2409.11104v1 | null |
2024-09-21 | HGSLoc: 3DGS-based Heuristic Camera Pose Refinement | Zhongyan Niu et.al. | 2409.10925v2 | null |
2024-09-17 | Pose estimation of CubeSats via sensor fusion and Error-State Extended Kalman Filter | Deep Parikh et.al. | 2409.10815v1 | null |
2024-09-16 | CtRNet-X: Camera-to-Robot Pose Estimation in Real-world Conditions Using a Single Camera | Jingpei Lu et.al. | 2409.10441v1 | null |
2024-09-16 | HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models | Vineet Bhat et.al. | 2409.10419v1 | null |
2024-09-16 | 2D or not 2D: How Does the Dimensionality of Gesture Representation Affect 3D Co-Speech Gesture Generation? | Téo Guichoux et.al. | 2409.10357v1 | null |
2024-09-16 | Human Insights Driven Latent Space for Different Driving Perspectives: A Unified Encoder for Efficient Multi-Task Inference | Huy-Dung Nguyen et.al. | 2409.10095v1 | null |
2024-09-15 | Precise Pick-and-Place using Score-Based Diffusion Networks | Shih-Wei Guo et.al. | 2409.09725v1 | null |
2024-09-15 | Pre-Training for 3D Hand Pose Estimation with Contrastive Learning on Large-Scale Hand Images in the Wild | Nie Lin et.al. | 2409.09714v1 | null |
2024-09-15 | Proximity operations of CubeSats via sensor fusion of ultra-wideband range measurements with rate gyroscopes, accelerometers and monocular vision | Deep Parikh et.al. | 2409.09665v1 | null |
2024-09-15 | A Scalable Tabletop Satellite Automation Testbed: Design And Experiments | Deep Parikh et.al. | 2409.09633v1 | null |
2024-09-14 | MAC-VO: Metrics-aware Covariance for Learning-based Stereo Visual Odometry | Yuheng Qiu et.al. | 2409.09479v1 | null |
2024-09-14 | Distributed Invariant Kalman Filter for Object-level Multi-robot Pose SLAM | Haoying Li et.al. | 2409.09410v1 | null |
2024-09-13 | Causal Transformer for Fusion and Pose Estimation in Deep Visual Inertial Odometry | Yunus Bilge Kurt et.al. | 2409.08769v1 | link |
2024-09-13 | WheelPoser: Sparse-IMU Based Body Pose Estimation for Wheelchair Users | Yunzhi Li et.al. | 2409.08494v1 | link |
2024-09-12 | Bayesian Inverse Graphics for Few-Shot Concept Learning | Octavio Arriaga et.al. | 2409.08351v1 | link |
2024-09-12 | Touch2Touch: Cross-Modal Tactile Generation for Object Manipulation | Samanta Rodriguez et.al. | 2409.08269v1 | null |
2024-09-12 | Covariance Intersection-based Invariant Kalman Filtering (DInCIKF) for Distributed Pose Estimation | Haoying Li et.al. | 2409.07933v1 | null |
2024-09-12 | GateAttentionPose: Enhancing Pose Estimation with Agent Attention and Improved Gated Convolutions | Liang Feng et.al. | 2409.07798v1 | null |
2024-09-12 | GatedUniPose: A Novel Approach for Pose Estimation Combining UniRepLKNet and Gated Convolution | Liang Feng et.al. | 2409.07752v1 | null |
2024-09-11 | FaVoR: Features via Voxel Rendering for Camera Relocalization | Vincenzo Polizzi et.al. | 2409.07571v1 | null |
2024-09-11 | Benchmarking 2D Egocentric Hand Pose Datasets | Olga Taran et.al. | 2409.07337v1 | null |
2024-09-11 | iKalibr-RGBD: Partially-Specialized Target-Free Visual-Inertial Spatiotemporal Calibration For RGBDs via Continuous-Time Velocity Estimation | Shuolong Chen et.al. | 2409.07116v1 | link |
2024-09-11 | Equivariant Filter for Tightly Coupled LiDAR-Inertial Odometry | Anbo Tao et.al. | 2409.06948v1 | null |
2024-09-13 | A Bayesian framework for active object recognition, pose estimation and shape transfer learning through touch | Haodong Zheng et.al. | 2409.06912v2 | null |
2024-09-11 | Alignist: CAD-Informed Orientation Distribution Estimation by Fusing Shape and Correspondences | Shishir Reddy Vutukur et.al. | 2409.06683v2 | link |
2024-09-10 | PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation | Ginger Delmas et.al. | 2409.06535v1 | null |
2024-09-10 | Test-Time Certifiable Self-Supervision to Bridge the Sim2Real Gap in Event-Based Satellite Pose Estimation | Mohsi Jawaid et.al. | 2409.06240v1 | null |
2024-09-09 | From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models | Tessa Pulli et.al. | 2409.05413v1 | null |
2024-09-08 | HelmetPoser: A Helmet-Mounted IMU Dataset for Data-Driven Estimation of Human Head Motion in Diverse Conditions | Jianping Li et.al. | 2409.05006v1 | null |
2024-09-06 | Casper DPM: Cascaded Perceptual Dynamic Projection Mapping onto Hands | Yotam Erel et.al. | 2409.04397v1 | null |
2024-09-06 | GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers | Lorenza Prospero et.al. | 2409.04196v1 | null |
2024-09-06 | Dense Hand-Object (HO) GraspNet with Full Grasping Taxonomy and Dynamics | Woojin Cho et.al. | 2409.04033v1 | null |
2024-09-06 | Matched Filtering based LiDAR Place Recognition for Urban and Natural Environments | Therese Joseph et.al. | 2409.03998v1 | null |
2024-09-09 | The Influence of Faulty Labels in Data Sets on Human Pose Estimation | Arnold Schwarz et.al. | 2409.03887v2 | null |
2024-09-05 | MaskVal: Simple but Effective Uncertainty Quantification for 6D Pose Estimation | Philipp Quentin et.al. | 2409.03556v1 | null |
2024-09-05 | UAV (Unmanned Aerial Vehicles): Diverse Applications of UAV Datasets in Segmentation, Classification, Detection, and Tracking | Md. Mahfuzur Rahman et.al. | 2409.03245v1 | null |
2024-09-01 | Recoverable Anonymization for Pose Estimation: A Privacy-Enhancing Approach | Wenjun Huang et.al. | 2409.02715v1 | null |
2024-09-04 | Object Gaussian for Monocular 6D Pose Estimation from Sparse Views | Luqing Luo et.al. | 2409.02581v1 | null |
2024-09-03 | EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision | Yiming Zhao et.al. | 2409.02224v1 | null |
2024-09-03 | Deep learning for objective estimation of Parkinsonian tremor severity | Felipe Duque-Quiceno et.al. | 2409.02011v1 | null |
2024-09-03 | SPiKE: 3D Human Pose from Point Cloud Sequences | Irene Ballester et.al. | 2409.01879v1 | link |
2024-09-02 | Kalman Filtering for Precise Indoor Position and Orientation Estimation Using IMU and Acoustics on Riemannian Manifolds | Mohammed H. AlSharif et.al. | 2409.01002v1 | null |
2024-09-01 | Detection, Recognition and Pose Estimation of Tabletop Objects | Sanjuksha Nirgude et.al. | 2409.00869v1 | null |
2024-09-01 | DSLO: Deep Sequence LiDAR Odometry Based on Inconsistent Spatio-temporal Propagation | Huixin Zhang et.al. | 2409.00744v1 | link |
2024-09-01 | MoManifold: Learning to Measure 3D Human Motion via Decoupled Joint Acceleration Manifolds | Ziqiang Dang et.al. | 2409.00736v1 | null |
2024-08-31 | ActionPose: Pretraining 3D Human Pose Estimation with the Dark Knowledge of Action | Longyun Liao et.al. | 2409.00449v1 | null |
2024-09-04 | Augmented Reality without Borders: Achieving Precise Localization Without Maps | Albert Gassol Puigjaner et.al. | 2408.17373v3 | null |
2024-08-30 | BOP-D: Revisiting 6D Pose Estimation Benchmark for Better Evaluation under Visual Ambiguities | Boris Meden et.al. | 2408.17297v1 | null |
2024-08-30 | EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs | Zhen Fan et.al. | 2408.17168v1 | null |
2024-09-01 | Generic Objects as Pose Probes for Few-Shot View Synthesis | Zhirui Gao et.al. | 2408.16690v2 | null |
2024-08-29 | OP-Align: Object-level and Part-level Alignment for Self-supervised Category-level Articulated Object Pose Estimation | Yuchen Che et.al. | 2408.16547v1 | link |
2024-08-29 | GRPose: Learning Graph Relations for Human Image Generation with Pose Priors | Xiangchen Yin et.al. | 2408.16540v1 | link |
2024-08-28 | Are Pose Estimators Ready for the Open World? STAGE: Synthetic Data Generation Toolkit for Auditing 3D Human Pose Estimators | Nikita Kister et.al. | 2408.16536v1 | null |
2024-08-28 | Multi-view Pose Fusion for Occlusion-Aware 3D Human Pose Estimation | Laura Bragagnolo et.al. | 2408.15810v1 | link |
2024-08-30 | Addressing the challenges of loop detection in agricultural environments | Nicolás Soncini et.al. | 2408.15761v2 | link |
2024-08-28 | Str-L Pose: Integrating Point and Structured Line for Relative Pose Estimation in Dual-Graph | Zherong Zhang et.al. | 2408.15750v1 | null |
2024-08-28 | Benchmarking ML Approaches to UWB-Based Range-Only Posture Recognition for Human Robot-Interaction | Salma Salimi et.al. | 2408.15717v1 | null |
2024-08-26 | Bengali Sign Language Recognition through Hand Pose Estimation using Multi-Branch Spatial-Temporal Attention Model | Abu Saleh Musa Miah et.al. | 2408.14111v1 | null |
2024-08-25 | InterTrack: Tracking Human Object Interaction without Object Templates | Xianghui Xie et.al. | 2408.13953v1 | null |
2024-08-24 | Temporally-consistent 3D Reconstruction of Birds | Johannes Hägerlind et.al. | 2408.13629v1 | null |
2024-08-24 | Explainable Convolutional Networks for Crater Detection and Lunar Landing Navigation | Jianing Song et.al. | 2408.13587v1 | null |
2024-08-27 | Sapiens: Foundation for Human Vision Models | Rawal Khirodkar et.al. | 2408.12569v3 | null |
2024-08-21 | GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting | Wanshui Gan et.al. | 2408.11447v1 | link |
2024-08-20 | GSLoc: Efficient Camera Pose Refinement via 3D Gaussian Splatting | Changkun Liu et.al. | 2408.11085v1 | link |
2024-08-20 | ZebraPose: Zebra Detection and Pose Estimation using only Synthetic Data | Elia Bonetto et.al. | 2408.10831v1 | null |
2024-08-20 | MPL: Lifting 3D Human Pose from Multi-view 2D Poses | Seyed Abolfazl Ghasemzadeh et.al. | 2408.10805v1 | link |
2024-08-19 | RUMI: Rummaging Using Mutual Information | Sheng Zhong et.al. | 2408.10450v1 | null |
2024-08-19 | SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views | Chao Xu et.al. | 2408.10195v1 | null |
2024-08-19 | SHARP: Segmentation of Hands and Arms by Range using Pseudo-Depth for Enhanced Egocentric 3D Hand Pose Estimation and Action Recognition | Wiktor Mucha et.al. | 2408.10037v1 | link |
2024-08-19 | Pose-GuideNet: Automatic Scanning Guidance for Fetal Head Ultrasound from Pose Estimation | Qianhui Men et.al. | 2408.09931v1 | null |
2024-08-18 | OPPH: A Vision-Based Operator for Measuring Body Movements for Personal Healthcare | Chen Long-fei et.al. | 2408.09409v1 | null |
2024-08-17 | An Open-Source American Sign Language Fingerspell Recognition and Semantic Pose Retrieval Interface | Kevin Jose Thomas et.al. | 2408.09311v1 | link |
2024-08-16 | ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation | Hao Tang et.al. | 2408.09042v1 | null |
2024-08-16 | Correspondence-Guided SfM-Free 3D Gaussian Splatting for NVS | Wei Sun et.al. | 2408.08723v1 | null |
2024-08-16 | SketchRef: A Benchmark Dataset and Evaluation Metrics for Automated Sketch Synthesis | Xingyue Lin et.al. | 2408.08623v1 | null |
2024-08-15 | HyperTaxel: Hyper-Resolution for Taxel-Based Tactile Signals Through Contrastive Learning | Hongyu Li et.al. | 2408.08312v1 | null |
2024-08-15 | Comparative Evaluation of 3D Reconstruction Methods for Object Pose Estimation | Varun Burde et.al. | 2408.08234v1 | link |
2024-08-15 | Towards Practical Human Motion Prediction with LiDAR Point Clouds | Xiao Han et.al. | 2408.08202v1 | null |
2024-08-15 | Your Turn: Real-World Turning Angle Estimation for Parkinson's Disease Severity Assessment | Qiushuo Cheng et.al. | 2408.08182v1 | null |
2024-08-15 | Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models | Tianyu Wang et.al. | 2408.07975v1 | null |
2024-08-15 | GOReloc: Graph-based Object-Level Relocalization for Visual SLAM | Yutong Wang et.al. | 2408.07917v1 | link |
2024-08-13 | Grasping by Hanging: a Learning-Free Grasping Detection Method for Previously Unseen Objects | Wanze Li et.al. | 2408.06734v1 | null |
2024-08-13 | A Miniature Vision-Based Localization System for Indoor Blimps | Shicong Ma et.al. | 2408.06648v1 | null |
2024-08-12 | UniT: Unified Tactile Representation for Robot Learning | Zhengtong Xu et.al. | 2408.06481v1 | link |
2024-08-12 | Moo-ving Beyond Tradition: Revolutionizing Cattle Behavioural Phenotyping with Pose Estimation Techniques | Navid Ghassemi et.al. | 2408.06336v1 | null |
2024-08-12 | CAD-Mesher: A Convenient, Accurate, Dense Mesh-based Mapping Module in SLAM for Dynamic Environments | Yanpeng Jia et.al. | 2408.05981v1 | null |
2024-08-12 | PAFormer: Part Aware Transformer for Person Re-identification | Hyeono Jung et.al. | 2408.05918v1 | null |
2024-08-11 | SABER-6D: Shape Representation Based Implicit Object Pose Estimation | Shishir Reddy Vutukur et.al. | 2408.05867v1 | null |
2024-08-10 | Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis | Zhongche Qu et.al. | 2408.05635v1 | null |
2024-08-10 | Anticipation through Head Pose Estimation: a preliminary study | Federico Figari Tomenotti et.al. | 2408.05516v1 | null |
2024-08-09 | Mesh-based Object Tracking for Dynamic Semantic 3D Scene Graphs via Ray Tracing | Lennart Niecksch et.al. | 2408.04979v1 | null |
2024-08-07 | PoseMamba: Monocular 3D Human Pose Estimation with Bidirectional Global-Local Spatio-Temporal State Space Model | Yunlong Huang et.al. | 2408.03540v1 | link |
2024-08-06 | Line-based 6-DoF Object Pose Estimation and Tracking With an Event Camera | Zibin Liu et.al. | 2408.03225v1 | link |
2024-08-06 | Training on the Fly: On-device Self-supervised Learning aboard Nano-drones within 20 mW | Elia Cereda et.al. | 2408.03168v1 | null |
2024-08-06 | BodySLAM: A Generalized Monocular Visual SLAM Framework for Surgical Applications | G. Manni et.al. | 2408.03078v1 | link |
2024-08-07 | Pose Magic: Efficient and Temporally Consistent Human Pose Estimation with a Hybrid Mamba-GCN Network | Xinyi Zhang et.al. | 2408.02922v2 | null |
2024-08-05 | Analyzing Data Efficiency and Performance of Machine Learning Algorithms for Assessing Low Back Pain Physical Rehabilitation Exercises | Aleksa Marusic et.al. | 2408.02855v1 | null |
2024-08-05 | Joint-Motion Mutual Learning for Pose Estimation in Videos | Sifan Wu et.al. | 2408.02285v1 | null |
2024-08-04 | AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos | Feichi Lu et.al. | 2408.02110v1 | null |
2024-08-04 | Generalized Maximum Likelihood Estimation for Perspective-n-Point Problem | Tian Zhan et.al. | 2408.01945v1 | null |
2024-08-03 | MotionTrace: IMU-based Field of View Prediction for Smartphone AR Interactions | Rahul Islam et.al. | 2408.01850v1 | null |
2024-08-03 | BEVPlace++: Fast, Robust, and Lightweight LiDAR Global Localization for Unmanned Ground Vehicles | Lun Luo et.al. | 2408.01841v1 | link |
2024-08-03 | E | Yunshan Qi et.al. | 2408.01840v1 | null |
2024-08-03 | Survey on Emotion Recognition through Posture Detection and the possibility of its application in Virtual Reality | Leina Elansary et.al. | 2408.01728v1 | null |
2024-08-03 | Stimulating Imagination: Towards General-purpose Object Rearrangement | Jianyang Wu et.al. | 2408.01655v1 | null |
2024-08-02 | Full-range Head Pose Geometric Data Augmentations | Huei-Chung Hu et.al. | 2408.01566v1 | null |
2024-07-31 | Adapting Skills to Novel Grasps: A Self-Supervised Approach | Georgios Papagiannis et.al. | 2408.00178v1 | null |
2024-07-31 | Certifying Robustness of Learning-Based Keypoint Detection and Pose Estimation Methods | Xusheng Luo et.al. | 2408.00117v1 | null |
2024-07-30 | StackFLOW: Monocular Human-Object Reconstruction by Stacked Normalizing Flow with Offset | Chaofan Huo et.al. | 2407.20545v1 | link |
2024-07-30 | HandDAGT: A Denoising Adaptive Graph Transformer for 3D Hand Pose Estimation | Wencan Cheng et.al. | 2407.20542v1 | link |
2024-07-30 | Markers Identification for Relative Pose Estimation of an Uncooperative Target | Batu Candan et.al. | 2407.20515v1 | null |
2024-07-29 | BaseBoostDepth: Exploiting Larger Baselines For Self-supervised Monocular Depth Estimation | Kieran Saunders et.al. | 2407.20437v1 | null |
2024-07-28 | Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph | Zhengcen Li et.al. | 2407.19497v1 | link |
2024-07-26 | Flexible graph convolutional network for 3D human pose estimation | Abu Taib Mohammed Shahjahan et.al. | 2407.19077v1 | link |
2024-07-26 | From 2D to 3D: AISG-SLA Visual Localization Challenge | Jialin Gao et.al. | 2407.18590v1 | null |
2024-07-28 | HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation | Zhenzhi Wang et.al. | 2407.17438v2 | link |
2024-07-24 | Active Loop Closure for OSM-guided Robotic Mapping in Large-Scale Urban Environments | Wei Gao et.al. | 2407.17078v1 | null |
2024-07-30 | DreamCar: Leveraging Car-specific Prior for in-the-wild 3D Car Reconstruction | Xiaobiao Du et.al. | 2407.16988v2 | link |
2024-07-24 | Pose Estimation from Camera Images for Underwater Inspection | Luyuan Peng et.al. | 2407.16961v1 | null |
2024-07-23 | COALA: A Practical and Vision-Centric Federated Learning Platform | Weiming Zhuang et.al. | 2407.16560v1 | link |
2024-07-23 | Probabilistic Parameter Estimators and Calibration Metrics for Pose Estimation from Image Features | Romeo Valentin et.al. | 2407.16223v1 | null |
2024-07-23 | Optimal camera-robot pose estimation in linear time from points and lines | Guangyang Zeng et.al. | 2407.16151v1 | null |
2024-07-23 | 3D-UGCN: A Unified Graph Convolutional Network for Robust 3D Human Pose Estimation from Monocular RGB Images | Jie Zhao et.al. | 2407.16137v1 | null |
2024-07-21 | CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models | Zheng Chong et.al. | 2407.15886v1 | link |
2024-07-22 | RADA: Robust and Accurate Feature Learning with Domain Adaptation | Jingtai He et.al. | 2407.15791v1 | null |
2024-07-22 | Local Occupancy-Enhanced Object Grasping with Multiple Triplanar Projection | Kangqi Ma et.al. | 2407.15771v1 | null |
2024-07-22 | 6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model | Matteo Bortolon et.al. | 2407.15484v1 | null |
2024-07-23 | Domain-Adaptive 2D Human Pose Estimation via Dual Teachers in Extremely Low-Light Conditions | Yihao Ai et.al. | 2407.15451v2 | link |
2024-07-22 | avaTTAR: Table Tennis Stroke Training with On-body and Detached Visualization in Augmented Reality | Dizhi Ma et.al. | 2407.15373v1 | null |
2024-07-20 | From Underground Mines to Offices: A Versatile and Robust Framework for Range-Inertial SLAM | Lorenzo Montano-Oliván et.al. | 2407.14797v1 | null |
2024-07-19 | ESCAPE: Energy-based Selective Adaptive Correction for Out-of-distribution 3D Human Pose Estimation | Luke Bidulka et.al. | 2407.14605v1 | null |
2024-07-19 | 6DoF Head Pose Estimation through Explicit Bidirectional Interaction with Face Geometry | Sungho Chun et.al. | 2407.14136v1 | link |
2024-07-18 | RT-Pose: A 4D Radar Tensor-based 3D Human Pose Estimation and Localization Benchmark | Yuan-Hao Ho et.al. | 2407.13930v1 | null |
2024-07-19 | GlobalPointer: Large-Scale Plane Adjustment with Bi-Convex Relaxation | Bangyan Liao et.al. | 2407.13537v2 | link |
2024-07-18 | SCAPE: A Simple and Strong Category-Agnostic Pose Estimator | Yujia Liang et.al. | 2407.13483v1 | link |
2024-07-17 | SG-NeRF: Neural Surface Reconstruction with Scene Graph Optimization | Yiyang Chen et.al. | 2407.12667v1 | link |
2024-07-17 | Invertible Neural Warp for NeRF | Shin-Fang Chng et.al. | 2407.12354v1 | null |
2024-07-16 | NeuSurfEmb: A Complete Pipeline for Dense Correspondence-based 6D Object Pose Estimation without CAD Models | Francesco Milano et.al. | 2407.12207v1 | link |
2024-07-16 | Monocular pose estimation of articulated surgical instruments in open surgery | Robert Spektor et.al. | 2407.12138v1 | null |
2024-07-17 | GV-Bench: Benchmarking Local Feature Matching for Geometric Verification of Long-term Loop Closure Detection | Jingwen Yu et.al. | 2407.11736v2 | link |
2024-07-16 | TCFormer: Visual Recognition via Token Clustering Transformer | Wang Zeng et.al. | 2407.11321v1 | link |
2024-07-15 | A BlueROV2-based platform for underwater mapping experiments | Tudor Alinei-Poiana et.al. | 2407.10901v1 | link |
2024-07-15 | LVCP: LiDAR-Vision Tightly Coupled Collaborative Real-time Relative Positioning | Zhuozhu Jian et.al. | 2407.10782v1 | null |
2024-07-15 | Domain Generalization for 6D Pose Estimation Through NeRF-based Image Synthesis | Antoine Legrand et.al. | 2407.10762v1 | null |
2024-07-16 | GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation | Haonan Wang et.al. | 2407.10756v2 | null |
2024-07-15 | Learning to Estimate the Pose of a Peer Robot in a Camera Image by Predicting the States of its LEDs | Nicholas Carlotti et.al. | 2407.10661v1 | null |
2024-07-15 | Deep-Learning-Based Markerless Pose Estimation Systems in Gait Analysis: DeepLabCut Custom Training and the Refinement Function | Giulia Panconi et.al. | 2407.10590v1 | null |
2024-07-14 | 3D Foundation Models Enable Simultaneous Geometry and Pose Estimation of Grasped Objects | Weiming Zhi et.al. | 2407.10331v1 | null |
2024-07-16 | psifx -- Psychological and Social Interactions Feature Extraction Package | Guillaume Rochette et.al. | 2407.10266v2 | null |
2024-07-14 | PAFUSE: Part-based Diffusion for 3D Whole-Body Pose Estimation | Nermin Samet et.al. | 2407.10220v1 | link |
2024-07-14 | 3DEgo: 3D Editing on the Go! | Umar Khalid et.al. | 2407.10102v1 | null |
2024-07-12 | iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning | Tom Fischer et.al. | 2407.09271v1 | link |
2024-07-12 | HUP-3D: A 3D multi-view synthetic dataset for assisted-egocentric hand-ultrasound pose estimation | Manuel Birlo et.al. | 2407.09215v1 | null |
2024-07-12 | KGpose: Keypoint-Graph Driven End-to-End Multi-Object 6D Pose Estimation via Point-Wise Pose Voting | Andrew Jeong et.al. | 2407.08909v1 | null |
2024-07-11 | RTMW: Real-Time Multi-Person 2D and 3D Whole-body Pose Estimation | Tao Jiang et.al. | 2407.08634v1 | link |
2024-07-11 | SRPose: Two-view Relative Pose Estimation with Sparse Keypoints | Rui Yin et.al. | 2407.08199v1 | link |
2024-07-11 | SGLC: Semantic Graph-Guided Coarse-Fine-Refine Full Loop Closing for LiDAR SLAM | Neng Wang et.al. | 2407.08106v1 | link |
2024-07-10 | RoCap: A Robotic Data Collection Pipeline for the Pose Estimation of Appearance-Changing Objects | Jiahao Nick Li et.al. | 2407.08081v1 | null |
2024-07-10 | Hybrid Structure-from-Motion and Camera Relocalization for Enhanced Egocentric Localization | Jinjie Mai et.al. | 2407.08023v1 | link |
2024-07-10 | Greit-HRNet: Grouped Lightweight High-Resolution Network for Human Pose Estimation | Junjia Han et.al. | 2407.07389v1 | null |
2024-07-09 | Category-level Object Detection, Pose Estimation and Reconstruction from Stereo Images | Chuanrui Zhang et.al. | 2407.06984v1 | null |
2024-07-09 | Computer vision tasks for intelligent aerospace missions: An overview | Huilin Chen et.al. | 2407.06513v1 | null |
2024-07-08 | GeoNLF: Geometry guided Pose-Free Neural LiDAR Fields | Weiyi Xue et.al. | 2407.05597v1 | null |
2024-07-10 | On the power of data augmentation for head pose estimation | Michael Welter et.al. | 2407.05357v2 | link |
2024-07-07 | SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning | Yi Feng et.al. | 2407.05283v1 | link |
2024-07-05 | Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos | Leonhard Sommer et.al. | 2407.04384v1 | link |
2024-07-04 | Towards Cross-View-Consistent Self-Supervised Surround Depth Estimation | Laiyan Ding et.al. | 2407.04041v1 | link |
2024-07-04 | Markerless Multi-view 3D Human Pose Estimation: a survey | Ana Filipa Rodrigues Nogueira et.al. | 2407.03817v1 | null |
2024-07-04 | A Fast Dynamic Point Detection Method for LiDAR-Inertial Odometry in Driving Scenarios | Zikang Yuan et.al. | 2407.03590v1 | link |
2024-07-03 | Graph and Skipped Transformer: Exploiting Spatial and Temporal Modeling Capacities for Efficient 3D Human Pose Estimation | Mengmeng Cui et.al. | 2407.02990v1 | null |
2024-07-03 | Free-SurGS: SfM-Free 3D Gaussian Splatting for Surgical Scene Reconstruction | Jiaxin Guo et.al. | 2407.02918v1 | link |
2024-07-02 | SUPER: Seated Upper Body Pose Estimation using mmWave Radars | Bo Zhang et.al. | 2407.02455v1 | null |
2024-07-02 | ReliaAvatar: A Robust Real-Time Avatar Animator with Integrated Motion Prediction | Bo Qian et.al. | 2407.02129v1 | null |
2024-07-02 | Joint-Dataset Learning and Cross-Consistent Regularization for Text-to-Motion Retrieval | Nicola Messina et.al. | 2407.02104v1 | null |
2024-07-01 | Active Human Pose Estimation via an Autonomous UAV Agent | Jingxi Chen et.al. | 2407.01811v1 | null |
2024-07-01 | RoDyn-SLAM: Robust Dynamic Dense RGB-D SLAM with Neural Radiance Fields | Haochen Jiang et.al. | 2407.01303v1 | link |
2024-07-01 | Collaborative Graph Exploration with Reduced Pose-SLAM Uncertainty via Submodular Optimization | Ruofei Bai et.al. | 2407.01013v1 | link |
2024-06-30 | Ego-to-Exo: Interfacing Third Person Visuals from Egocentric Views in Real-time for Improved ROV Teleoperation | Adnan Abdullah et.al. | 2407.00848v1 | null |
2024-06-29 | When Robots Get Chatty: Grounding Multimodal Human-Robot Conversation and Collaboration | Philipp Allgeuer et.al. | 2407.00518v1 | link |
2024-06-28 | Assistive Image Annotation Systems with Deep Learning and Natural Language Capabilities: A Review | Moseli Mots'oehli et.al. | 2407.00252v1 | null |
2024-06-28 | EPOCH: Jointly Estimating the 3D Pose of Cameras and Humans | Nicola Garau et.al. | 2406.19726v1 | null |
2024-06-28 | CLOi-Mapper: Consistent, Lightweight, Robust, and Incremental Mapper With Embedded Systems for Commercial Robot Services | DongKi Noh et.al. | 2406.19634v1 | null |
2024-06-27 | Multimodal Visual-haptic pose estimation in the presence of transient occlusion | Michael Zechmair et.al. | 2406.19323v1 | null |
2024-06-27 | Human Modelling and Pose Estimation Overview | Pawel Knap et.al. | 2406.19290v1 | null |
2024-06-26 | Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference | Yuan Gao et.al. | 2406.18453v1 | link |
2024-06-27 | Automatic infant 2D pose estimation from videos: comparing seven deep neural network methods | Filipe Gama et.al. | 2406.17382v2 | null |
2024-06-24 | High-resolution open-vocabulary object 6D pose estimation | Jaime Corsetti et.al. | 2406.16384v1 | null |
2024-06-23 | Breaking the Frame: Image Retrieval by Visual Overlap Prediction | Tong Wei et.al. | 2406.16204v1 | link |
2024-06-21 | Efficient Human Pose Estimation: Leveraging Advanced Techniques with MediaPipe | Sandeep Singh Sengar et.al. | 2406.15649v1 | link |
2024-06-24 | Investigating the impact of 2D gesture representation on co-speech gesture generation | Teo Guichoux et.al. | 2406.15111v2 | null |
2024-06-20 | Benchmarking Monocular 3D Dog Pose Estimation Using In-The-Wild Motion Capture Data | Moira Shooter et.al. | 2406.14412v1 | null |
2024-06-20 | PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions | Sihan Ma et.al. | 2406.14367v1 | null |
2024-06-19 | NeRF-Feat: 6D Object Pose Estimation using Feature Rendering | Shishir Reddy Vutukur et.al. | 2406.13796v1 | null |
2024-06-19 | CNN Based Flank Predictor for Quadruped Animal Species | Vanessa Suessle et.al. | 2406.13588v1 | null |
2024-06-19 | MVSBoost: An Efficient Point Cloud-based 3D Reconstruction | Umair Haroon et.al. | 2406.13515v1 | null |
2024-06-19 | An Efficient yet High-Performance Method for Precise Radar-Based Imaging of Human Hand Poses | Johanna Bräunig et.al. | 2406.13464v1 | null |
2024-06-18 | Head Pose Estimation and 3D Neural Surface Reconstruction via Monocular Camera in situ for Navigation and Safe Insertion into Natural Openings | Ruijie Tang et.al. | 2406.13048v1 | null |
2024-06-17 | Matching Query Image Against Selected NeRF Feature for Efficient and Scalable Localization | Huaiji Zhou et.al. | 2406.11766v1 | null |
2024-06-17 | Domain Generalization for In-Orbit 6D Pose Estimation | Antoine Legrand et.al. | 2406.11743v1 | null |
2024-06-17 | SeamPose: Repurposing Seams as Capacitive Sensors in a Shirt for Upper-Body Pose Tracking | Tianhong Catherine Yu et.al. | 2406.11645v1 | null |
2024-06-14 | Galibr: Targetless LiDAR-Camera Extrinsic Calibration Method via Ground Plane Initialization | Wonho Song et.al. | 2406.11599v1 | null |
2024-06-15 | MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception | M. Mahbubur Rahman et.al. | 2406.10708v1 | link |
2024-06-15 | Improving Ab-Initio Cryo-EM Reconstruction with Semi-Amortized Pose Inference | Shayan Shekarforoush et.al. | 2406.10455v1 | null |
2024-06-14 | The BabyView dataset: High-resolution egocentric videos of infants' and young children's everyday experiences | Bria Long et.al. | 2406.10447v1 | null |
2024-06-14 | OpenCapBench: A Benchmark to Bridge Pose Estimation and Biomechanics | Yoni Gozlan et.al. | 2406.09788v1 | null |
2024-06-13 | ImageNet3D: Towards General-Purpose Object-Level 3D Understanding | Wufei Ma et.al. | 2406.09613v1 | link |
2024-06-13 | Deep Transformer Network for Monocular Pose Estimation of Ship-Based UAV | Maneesha Wickramasuriya et.al. | 2406.09260v1 | link |
2024-06-14 | Language-Driven Closed-Loop Grasping with Model-Predictive Trajectory Replanning | Huy Hoang Nguyen et.al. | 2406.09039v2 | null |
2024-06-14 | VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks | Jiannan Wu et.al. | 2406.08394v2 | link |
2024-06-12 | Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization | Jiaxin Deng et.al. | 2406.08001v1 | null |
2024-06-12 | IFTD: Image Feature Triangle Descriptor for Loop Detection in Driving Scenes | Fengtian Lang et.al. | 2406.07937v1 | link |
2024-06-12 | From Variance to Veracity: Unbundling and Mitigating Gradient Variance in Differentiable Bundle Adjustment Layers | Swaminathan Gurumurthy et.al. | 2406.07785v1 | link |
2024-06-12 | SPIN: Spacecraft Imagery for Navigation | Javier Montalvo et.al. | 2406.07500v2 | link |
2024-06-11 | Realistic Data Generation for 6D Pose Estimation of Surgical Instruments | Juan Antonio Barragan et.al. | 2406.07328v1 | link |
2024-06-11 | SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale | Shester Gueuwou et.al. | 2406.06907v1 | null |
2024-06-10 | Multicam-SLAM: Non-overlapping Multi-camera SLAM for Indirect Visual Localization and Navigation | Shenghao Li et.al. | 2406.06374v1 | link |
2024-06-08 | A preprocessing-based planning framework for utilizing contacts in high-precision insertion tasks | Muhammad Suhail Saleem et.al. | 2406.05522v1 | null |
2024-06-06 | GLACE: Global Local Accelerated Coordinate Encoding | Fangjinhua Wang et.al. | 2406.04340v1 | link |
2024-06-06 | Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking | Jiyao Zhang et.al. | 2406.04316v1 | null |
2024-06-05 | Hi5: 2D Hand Pose Estimation with Zero Human Annotation | Masum Hasan et.al. | 2406.03599v1 | null |
2024-06-05 | Sparse Color-Code Net: Real-Time RGB-Based 6D Object Pose Estimation on Edge Devices | Xingjian Yang et.al. | 2406.02977v1 | null |
2024-06-04 | CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation | Dejia Xu et.al. | 2406.02509v1 | null |
2024-06-04 | HPE-CogVLM: New Head Pose Grounding Task Exploration on Vision Language Model | Yu Tian et.al. | 2406.01914v1 | null |
2024-06-03 | A Robust Filter for Marker-less Multi-person Tracking in Human-Robot Interaction Scenarios | Enrico Martini et.al. | 2406.01832v1 | link |
2024-06-01 | Equivariant amortized inference of poses for cryo-EM | Larissa de Ruijter et.al. | 2406.01630v1 | null |
2024-06-03 | 3D WholeBody Pose Estimation based on Semantic Graph Attention Network and Distance Information | Sihan Wen et.al. | 2406.01196v1 | null |
2024-06-01 | CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation | Matan Rusanovsky et.al. | 2406.00384v1 | link |
2024-05-30 | Estimating Human Poses Across Datasets: A Unified Skeleton and Multi-Teacher Distillation Approach | Muhammad Saif Ullah Khan et.al. | 2405.20084v1 | null |
2024-05-30 | TAMBRIDGE: Bridging Frame-Centered Tracking and 3D Gaussian Splatting for Enhanced SLAM | Peifeng Jiang et.al. | 2405.19614v1 | null |
2024-05-29 | Real-Time Dynamic Robot-Assisted Hand-Object Interaction via Motion Primitives | Mingqi Yuan et.al. | 2405.19531v1 | null |
2024-05-29 | Exploring AI-based Anonymization of Industrial Image and Video Data in the Context of Feature Preservation | Sabrina Cynthia Triess et.al. | 2405.19173v1 | null |
2024-05-28 | World Models for General Surgical Grasping | Hongbin Lin et.al. | 2405.17940v1 | null |
2024-05-27 | MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds | Jiahui Lei et.al. | 2405.17421v1 | link |
2024-05-27 | Occlusion Handling in 3D Human Pose Estimation with Perturbed Positional Encoding | Niloofar Azizi et.al. | 2405.17397v1 | null |
2024-05-27 | | Weiquan Wang et.al. | 2405.17016v1 | null |
2024-05-27 | Clustering-based Learning for UAV Tracking and Pose Estimation | Jiaping Xiao et.al. | 2405.16867v1 | null |
2024-05-26 | Multi-Modal UAV Detection, Classification and Tracking Algorithm -- Technical Report for CVPR 2024 UG2 Challenge | Tianchen Deng et.al. | 2405.16464v1 | link |
2024-05-25 | Intensity and Texture Correction of Omnidirectional Image Using Camera Images for Indirect Augmented Reality | Hakim Ikebayashi et.al. | 2405.16008v1 | null |
2024-05-23 | CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments | Yang Zhou et.al. | 2405.14731v1 | link |
2024-05-23 | Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation | Daniel Kienzle et.al. | 2405.14467v1 | link |
2024-05-21 | Geometric Transformation Uncertainty for Improving 3D Fetal Brain Pose Prediction from Freehand 2D Ultrasound Videos | Jayroop Ramesh et.al. | 2405.13235v1 | link |
2024-05-21 | Leveraging Neural Radiance Fields for Pose Estimation of an Unknown Space Object during Proximity Operations | Antoine Legrand et.al. | 2405.12728v1 | null |
2024-05-21 | PoseGravity: Pose Estimation from Points and Lines with Axis Prior | Akshay Chandrasekhar et.al. | 2405.12646v1 | link |
2024-05-19 | Focus on Low-Resolution Information: Multi-Granular Information-Lossless Model for Low-Resolution Human Pose Estimation | Zejun Gu et.al. | 2405.12247v1 | null |
2024-05-20 | AutoSoccerPose: Automated 3D posture Analysis of Soccer Shot Movements | Calvin Yeung et.al. | 2405.12070v1 | link |
2024-05-19 | Advancing 6-DoF Instrument Pose Estimation in Variable X-Ray Imaging Geometries | Christiaan G. A. Viviers et.al. | 2405.11677v1 | link |
2024-05-19 | Cross-Domain Knowledge Distillation for Low-Resolution Human Pose Estimation | Zejun Gu et.al. | 2405.11448v1 | null |
2024-05-18 | PS6D: Point Cloud Based Symmetry-Aware 6D Object Pose Estimation in Robot Bin-Picking | Yifan Yang et.al. | 2405.11257v1 | null |
2024-05-18 | MotionGS : Compact Gaussian Splatting SLAM by Motion Filter | Xinli Guo et.al. | 2405.11129v1 | link |
2024-05-17 | Resolving Symmetry Ambiguity in Correspondence-based Methods for Instance-level Object Pose Estimation | Yongliang Lin et.al. | 2405.10557v1 | null |
2024-05-16 | Diversity-Aware Sign Language Production through a Pose Encoding Variational Autoencoder | Mohamed Ilyes Lakhal et.al. | 2405.10423v1 | null |
2024-05-17 | Toon3D: Seeing Cartoons from a New Perspective | Ethan Weber et.al. | 2405.10320v2 | null |
2024-05-15 | Task-adaptive Q-Face | Haomiao Sun et.al. | 2405.09059v1 | null |
2024-05-14 | RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images | Zong-Wei Hong et.al. | 2405.08483v1 | link |
2024-05-14 | TP3M: Transformer-based Pseudo 3D Image Matching with Reference | Liming Han et.al. | 2405.08434v1 | null |
2024-05-13 | Deep Learning-Based Object Pose Estimation: A Comprehensive Survey | Jian Liu et.al. | 2405.07801v1 | link |
2024-05-13 | JointLoc: A Real-time Visual Localization Framework for Planetary UAVs Based on Joint Relative and Absolute Pose Estimation | Xubo Luo et.al. | 2405.07429v1 | link |
2024-05-11 | TD-NeRF: Novel Truncated Depth Prior for Joint Camera Pose and Neural Radiance Field Optimization | Zhen Tan et.al. | 2405.07027v1 | link |
2024-05-11 | AHPPEBot: Autonomous Robot for Tomato Harvesting based on Phenotyping and Pose Estimation | Xingxu Li et.al. | 2405.06959v1 | null |
2024-05-10 | CasCalib: Cascaded Calibration for Motion Capture from Sparse Unsynchronized Cameras | James Tang et.al. | 2405.06845v1 | link |
2024-05-10 | MGS-SLAM: Monocular Sparse Tracking and Gaussian Mapping with Depth Smooth Regularization | Pengcheng Zhu et.al. | 2405.06241v1 | null |
2024-05-10 | Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera | Haixin Shi et.al. | 2405.05858v2 | null |
2024-05-09 | Semi-Autonomous Laparoscopic Robot Docking with Learned Hand-Eye Information Fusion | Huanyu Tian et.al. | 2405.05817v1 | null |
2024-05-09 | NeuRSS: Enhancing AUV Localization and Bathymetric Mapping with Neural Rendering for Sidescan SLAM | Yiping Xie et.al. | 2405.05807v1 | null |
2024-05-09 | Benchmarking Neural Radiance Fields for Autonomous Robots: An Overview | Yuhang Ming et.al. | 2405.05526v1 | null |
2024-05-08 | Adversary-Guided Motion Retargeting for Skeleton Anonymization | Thomas Carr et.al. | 2405.05428v1 | null |
2024-05-08 | FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models | Jinglin Xu et.al. | 2405.05216v1 | link |
2024-05-08 | ProbRadarM3F: mmWave Radar based Human Skeletal Pose Estimation with Probability Map Guided Multi-Format Feature Fusion | Bing Zhu et.al. | 2405.05164v1 | null |
2024-05-08 | GISR: Geometric Initialization and Silhouette-based Refinement for Single-View Robot Pose and Configuration Estimation | Ivan Bilić et.al. | 2405.04890v1 | null |
2024-05-07 | Learning Distributional Demonstration Spaces for Task-Specific Cross-Pose Estimation | Jenny Wang et.al. | 2405.04609v1 | null |
2024-05-07 | Bayesian Simultaneous Localization and Multi-Lane Tracking Using Onboard Sensors and a SD Map | Yuxuan Xia et.al. | 2405.04290v1 | null |
2024-05-07 | Speak the Same Language: Global LiDAR Registration on BIM Using Pose Hough Transform | Zhijian Qiao et.al. | 2405.03969v1 | null |
2024-05-07 | Joint Estimation of Identity Verification and Relative Pose for Partial Fingerprints | Xiongjun Guan et.al. | 2405.03959v1 | link |
2024-05-06 | Pose Priors from Language Models | Sanjay Subramanian et.al. | 2405.03689v1 | null |
2024-05-06 | Optimizing Hand Region Detection in MediaPipe Holistic Full-Body Pose Estimation to Improve Accuracy and Avoid Downstream Errors | Amit Moryossef et.al. | 2405.03545v1 | link |
2024-05-05 | Multi-hop graph transformer network for 3D human pose estimation | Zaedul Islam et.al. | 2405.03055v1 | null |
2024-05-05 | Blending Distributed NeRFs with Tri-stage Robust Pose Optimization | Baijun Ye et.al. | 2405.02880v1 | null |
2024-05-03 | WeightedPose: Generalizable Cross-Pose Estimation via Weighted SVD | Xuxin Cheng et.al. | 2405.02241v1 | link |
2024-05-03 | Probablistic Restoration with Adaptive Noise Sampling for 3D Human Pose Estimation | Xianzhou Zeng et.al. | 2405.02114v1 | link |
2024-05-03 | An Onboard Framework for Staircases Modeling Based on Point Clouds | Chun Qing et.al. | 2405.01918v1 | null |
2024-05-06 | ShadowNav: Autonomous Global Localization for Lunar Navigation in Darkness | Deegan Atha et.al. | 2405.01673v2 | null |
2024-05-02 | IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning | Ryan Hoque et.al. | 2405.01472v1 | null |
2024-05-02 | Behavior Imitation for Manipulator Control and Grasping with Deep Reinforcement Learning | Liu Qiyuan et.al. | 2405.01284v1 | null |
2024-05-02 | Sports Analysis and VR Viewing System Based on Player Tracking and Pose Estimation with Multimodal and Multiview Sensors | Wenxuan Guo et.al. | 2405.01112v1 | null |
2024-05-02 | CoViS-Net: A Cooperative Visual Spatial Foundation Model for Multi-Robot Applications | Jan Blumenkamp et.al. | 2405.01107v1 | null |
2024-05-04 | HandSSCA: 3D Hand Mesh Reconstruction with State Space Channel Attention from RGB images | Zixun Jiao et.al. | 2405.01066v2 | null |
2024-05-01 | Radar-Based Localization For Autonomous Ground Vehicles In Suburban Neighborhoods | Andrew J. Kramer et.al. | 2405.00600v1 | null |
2024-04-30 | Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging | Rayan Armani et.al. | 2404.19541v1 | link |
2024-04-30 | UniFS: Universal Few-shot Instance Perception with Point Representations | Sheng Jin et.al. | 2404.19401v1 | link |
2024-04-30 | Quater-GCN: Enhancing 3D Human Pose Estimation with Orientation and Semi-supervised Training | Xingyu Song et.al. | 2404.19279v1 | link |
2024-04-30 | XFeat: Accelerated Features for Lightweight Image Matching | Guilherme Potje et.al. | 2404.19174v1 | null |
2024-04-29 | Self-Avatar Animation in Virtual Reality: Impact of Motion Signals Artifacts on the Full-Body Pose Reconstruction | Antoine Maiorca et.al. | 2404.18628v1 | null |
2024-04-29 | Mesh-based Photorealistic and Real-time 3D Mapping for Robust Visual Perception of Autonomous Underwater Vehicle | Jungwoo Lee et.al. | 2404.18395v1 | null |
2024-04-29 | Reconstructing Satellites in 3D from Amateur Telescope Images | Zhiming Chang et.al. | 2404.18394v1 | null |
2024-04-27 | Hybrid 3D Human Pose Estimation with Monocular Video and Sparse IMUs | Yiming Bao et.al. | 2404.17837v1 | null |
2024-04-26 | Localization Through Particle Filter Powered Neural Network Estimated Monocular Camera Poses | Yi Shen et.al. | 2404.17685v1 | null |
2024-04-26 | SLAM for Indoor Mapping of Wide Area Construction Environments | Vincent Ress et.al. | 2404.17215v1 | null |
2024-04-25 | WheelPose: Data Synthesis Techniques to Improve Pose Estimation Performance on Wheelchair Users | William Huang et.al. | 2404.17063v1 | link |
2024-04-25 | Transformer-Based Local Feature Matching for Multimodal Image Registration | Remi Delaunay et.al. | 2404.16802v1 | null |
2024-04-25 | DeepKalPose: An Enhanced Deep-Learning Kalman Filter for Temporally Consistent Monocular Vehicle Pose Estimation | Leandro Di Bella et.al. | 2404.16558v1 | null |
2024-04-25 | Efficient Solution of Point-Line Absolute Pose | Petr Hruby et.al. | 2404.16552v1 | link |
2024-04-25 | COBRA -- COnfidence score Based on shape Regression Analysis for method-independent quality assessment of object pose estimation from single images | Panagiotis Sapoutzoglou et.al. | 2404.16471v1 | link |
2024-04-25 | MegaParticles: Range-based 6-DoF Monte Carlo Localization with GPU-Accelerated Stein Particle Filter | Kenji Koide et.al. | 2404.16370v1 | null |
2024-04-24 | 3D Human Pose Estimation with Occlusions: Introducing BlendMimic3D Dataset and GCN Refinement | Filipa Lino et.al. | 2404.16136v1 | link |
2024-04-23 | SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation | Xiangyu Xu et.al. | 2404.15276v1 | link |
2024-04-25 | Domain adaptive pose estimation via multi-level alignment | Yugan Chen et.al. | 2404.14885v2 | link |
2024-04-23 | Semi-supervised 2D Human Pose Estimation via Adaptive Keypoint Masking | Kexin Meng et.al. | 2404.14835v1 | null |
2024-04-23 | UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues | Vandad Davoodnia et.al. | 2404.14634v1 | null |
2024-04-22 | DHRNet: A Dual-Path Hierarchical Relation Network for Multi-Person Pose Estimation | Yonghao Dang et.al. | 2404.14025v1 | link |
2024-04-23 | CT-NeRF: Incremental Optimizing Neural Radiance Field and Poses with Complex Trajectory | Yunlong Ran et.al. | 2404.13896v2 | null |
2024-04-21 | Resampling-free Particle Filters in High-dimensions | Akhilan Boopathy et.al. | 2404.13698v1 | link |
2024-04-20 | EC-SLAM: Real-time Dense Neural RGB-D SLAM System with Effectively Constrained Global Bundle Adjustment | Guanghao Li et.al. | 2404.13346v1 | link |
2024-04-18 | Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds | Oliver Lemke et.al. | 2404.12440v1 | null |
2024-04-18 | Gait Recognition from Highly Compressed Videos | Andrei Niculae et.al. | 2404.12183v1 | null |
2024-04-17 | Mushroom Segmentation and 3D Pose Estimation from Point Clouds using Fully Convolutional Geometric Features and Implicit Pose Encoding | George Retsinas et.al. | 2404.12144v1 | link |
2024-04-17 | Kathakali Hand Gesture Recognition With Minimal Data | Kavitha Raju et.al. | 2404.11205v1 | null |
2024-04-17 | GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement | Linfang Zheng et.al. | 2404.11139v1 | null |
2024-04-17 | CorrNet+: Sign Language Recognition and Translation via Spatial-Temporal Correlation | Lianyu Hu et.al. | 2404.11111v1 | link |
2024-04-16 | HumMUSS: Human Motion Understanding using State Space Models | Arnab Kumar Mondal et.al. | 2404.10880v1 | null |
2024-04-16 | Invariant Kalman Filtering with Noise-Free Pseudo-Measurements | Sven Goffin et.al. | 2404.10687v1 | null |
2024-04-16 | The Unreasonable Effectiveness of Pre-Trained Features for Camera Pose Refinement | Gabriele Trivigno et.al. | 2404.10438v1 | null |
2024-04-16 | GaitPoint+: A Gait Recognition Network Incorporating Point Cloud Analysis and Recycling | Huantao Ren et.al. | 2404.10213v1 | null |
2024-04-16 | LWIRPOSE: A novel LWIR Thermal Image Dataset and Benchmark | Avinash Upadhyay et.al. | 2404.10212v1 | link |
2024-04-15 | LetsGo: Large-Scale Garage Modeling and Rendering via LiDAR-Assisted Gaussian Primitives | Jiadi Cui et.al. | 2404.09748v1 | null |
2024-04-14 | In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition | Wiktor Mucha et.al. | 2404.09308v1 | link |
2024-04-13 | DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector | Johan Edstedt et.al. | 2404.08928v1 | link |
2024-04-16 | 3D Human Scan With A Moving Event Camera | Kai Kohyama et.al. | 2404.08504v2 | null |
2024-04-11 | Separated Attention: An Improved Cycle GAN Based Under Water Image Enhancement Method | Tashmoy Ghosh et.al. | 2404.07649v1 | null |
2024-04-11 | GLID: Pre-training a Generalist Encoder-Decoder Vision Model | Jihao Liu et.al. | 2404.07603v1 | null |
2024-04-10 | Measuring proximity to standard planes during fetal brain ultrasound scanning | Chiara Di Vece et.al. | 2404.07124v1 | null |
2024-04-10 | MoCap-to-Visual Domain Adaptation for Efficient Human Mesh Estimation from 2D Keypoints | Bedirhan Uguz et.al. | 2404.07094v1 | null |
2024-04-10 | Gaussian-LIC: Photo-realistic LiDAR-Inertial-Camera SLAM with 3D Gaussian Splatting | Xiaolei Lang et.al. | 2404.06926v1 | null |
2024-04-09 | Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences | Axel Barroso-Laguna et.al. | 2404.06337v1 | link |
2024-04-09 | Incremental Joint Learning of Depth, Pose and Implicit Scene Representation on Monocular Camera in Large-scale Scenes | Tianchen Deng et.al. | 2404.06050v1 | null |
2024-04-08 | Learning 3D-Aware GANs from Unposed Images with Template Feature Field | Xinya Chen et.al. | 2404.05705v1 | null |
2024-04-08 | Learning a Category-level Object Pose Estimator without Pose Annotations | Fengrui Tian et.al. | 2404.05626v1 | null |
2024-04-08 | DepthMOT: Depth Cues Lead to a Strong Multi-Object Tracker | Jiapeng Wu et.al. | 2404.05518v1 | link |
2024-04-08 | Two Hands Are Better Than One: Resolving Hand to Hand Intersections via Occupancy Networks | Maksym Ivashechkin et.al. | 2404.05414v1 | null |
2024-04-08 | STITCH: Augmented Dexterity for Suture Throws Including Thread Coordination and Handoffs | Kush Hari et.al. | 2404.05151v1 | null |
2024-04-05 | ToolEENet: Tool Affordance 6D Pose Estimation | Yunlong Wang et.al. | 2404.04193v1 | null |
2024-04-04 | SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation | Sichen Chen et.al. | 2404.03518v1 | link |
2024-04-04 | Multi Positive Contrastive Learning with Pose-Consistent Generated Images | Sho Inayoshi et.al. | 2404.03256v1 | null |
2024-04-04 | HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud | Wencan Cheng et.al. | 2404.03159v1 | link |
2024-04-03 | Fusing Multi-sensor Input with State Information on TinyML Brains for Autonomous Nano-drones | Luca Crupi et.al. | 2404.02567v1 | null |
2024-04-03 | Semi-Supervised Unconstrained Head Pose Estimation in the Wild | Huayi Zhou et.al. | 2404.02544v1 | link |
2024-04-02 | 3D Congealing: 3D-Aware Image Alignment in the Wild | Yunzhi Zhang et.al. | 2404.02125v1 | null |
2024-04-02 | SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation | Vinkle Srivastav et.al. | 2404.02041v1 | link |
2024-04-01 | Marrying NeRF with Feature Matching for One-step Pose Estimation | Ronghan Chen et.al. | 2404.00891v1 | null |
2024-03-31 | Graph-Based vs. Error State Kalman Filter-Based Fusion Of 5G And Inertial Data For MAV Indoor Pose Estimation | Meisam Kabiri et.al. | 2404.00691v1 | null |
2024-03-31 | OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos | Dongyoung Choi et.al. | 2404.00676v1 | null |
2024-04-02 | KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation | Jihua Peng et.al. | 2404.00658v2 | link |
2024-03-29 | FetalDiffusion: Pose-Controllable 3D Fetal MRI Synthesis with Conditional Diffusion Model | Molin Zhang et.al. | 2404.00132v1 | null |
2024-03-29 | Latent Embedding Clustering for Occlusion Robust Head Pose Estimation | José Celestino et.al. | 2403.20251v1 | null |
2024-03-29 | A Unified Framework for Human-centric Point Cloud Video Understanding | Yiteng Xu et.al. | 2403.20031v1 | null |
2024-04-01 | Video-Based Human Pose Regression via Decoupled Space-Time Aggregation | Jijie He et.al. | 2403.19926v2 | link |
2024-03-28 | Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation | Xiao Lin et.al. | 2403.19527v1 | link |
2024-03-27 | Object Pose Estimation via the Aggregation of Diffusion Features | Tianfu Wang et.al. | 2403.18791v1 | link |
2024-03-27 | RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation | Yang Tian et.al. | 2403.18259v1 | null |
2024-03-26 | Mathematical Foundation and Corrections for Full Range Head Pose Estimation | Huei-Chung Hu et.al. | 2403.18104v1 | null |
2024-03-26 | EgoPoseFormer: A Simple Baseline for Egocentric 3D Human Pose Estimation | Chenhongyi Yang et.al. | 2403.18080v1 | link |
2024-03-26 | A Survey on 3D Egocentric Human Pose Estimation | Md Mushfiqur Azam et.al. | 2403.17893v1 | link |
2024-03-26 | GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction | Hrishav Bakul Barua et.al. | 2403.17837v1 | link |
2024-03-26 | DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions | Sammy Christen et.al. | 2403.17827v1 | null |
2024-03-26 | System Calibration of a Field Phenotyping Robot with Multiple High-Precision Profile Laser Scanners | Felix Esser et.al. | 2403.17788v1 | null |
2024-03-25 | Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos | Remy Sabathier et.al. | 2403.17103v1 | link |
2024-03-25 | Characterisation of the Intel RealSense D415 Stereo Depth Camera for Motion-Corrected CT Perfusion Imaging | Mahdieh Dashtbani Moghari et.al. | 2403.16490v1 | null |
2024-03-25 | Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects | Zicong Fan et.al. | 2403.16428v1 | link |
2024-03-25 | A Geometric Perspective on Fusing Gaussian Distributions on Lie Groups | Yixiao Ge et.al. | 2403.16411v1 | null |
2024-03-25 | ASDF: Assembly State Detection Utilizing Late Fusion by Integrating 6D Pose Estimation | Hannah Schieber et.al. | 2403.16400v1 | link |
2024-03-24 | KITchen: A Real-World Benchmark and Dataset for 6D Object Pose Estimation in Kitchen Environments | Abdelrahman Younes et.al. | 2403.16238v1 | null |
2024-03-24 | Diffusion Model is a Good Pose Estimator from 3D RF-Vision | Junqiao Fan et.al. | 2403.16198v1 | null |
2024-03-23 | UPNeRF: A Unified Framework for Monocular 3D Object Reconstruction and Pose Estimation | Yuliang Guo et.al. | 2403.15705v1 | link |
2024-03-22 | InterFusion: Text-Driven Generation of 3D Human-Object Interaction | Sisi Dai et.al. | 2403.15612v1 | link |
2024-03-22 | Augmented Reality Warnings in Roadway Work Zones: Evaluating the Effect of Modality on Worker Reaction Times | Sepehr Sabeti et.al. | 2403.15571v1 | null |
2024-03-22 | Gesture-Controlled Aerial Robot Formation for Human-Swarm Interaction in Safety Monitoring Applications | Vít Krátký et.al. | 2403.15333v1 | null |
2024-03-22 | WSCLoc: Weakly-Supervised Sparse-View Camera Relocalization | Jialu Wang et.al. | 2403.15272v1 | null |
2024-03-22 | DITTO: Demonstration Imitation by Trajectory Transformation | Nick Heppert et.al. | 2403.15203v1 | link |
2024-03-22 | Cartoon Hallucinations Detection: Pose-aware In Context Visual Learning | Bumsoo Kim et.al. | 2403.15048v1 | null |
2024-03-22 | Trajectory Regularization Enhances Self-Supervised Geometric Representation | Jiayun Wang et.al. | 2403.14973v1 | link |
2024-03-21 | VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding | Ahmad Mahmood et.al. | 2403.14743v1 | link |
2024-03-21 | Visibility-Aware Keypoint Localization for 6DoF Object Pose Estimation | Ruyi Lian et.al. | 2403.14559v1 | null |
2024-03-23 | Exploring 3D Human Pose Estimation and Forecasting from the Robot's Perspective: The HARPER Dataset | Andrea Avogaro et.al. | 2403.14447v2 | null |
2024-03-21 | Evaluation and Deployment of LiDAR-based Place Recognition in Dense Forests | Haedam Oh et.al. | 2403.14326v1 | null |
2024-03-21 | Zero123-6D: Zero-shot Novel View Synthesis for RGB Category-level 6D Pose Estimation | Francesco Di Felice et.al. | 2403.14279v1 | null |
2024-03-20 | DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses | Chen Zhao et.al. | 2403.13683v1 | link |
2024-03-20 | Meta-Point Learning and Refining for Category-Agnostic Pose Estimation | Junjie Chen et.al. | 2403.13647v1 | link |
2024-03-20 | Advancing 6D Pose Estimation in Augmented Reality -- Overcoming Projection Ambiguity with Uncontrolled Imagery | Mayura Manawadu et.al. | 2403.13434v1 | null |
2024-03-20 | DOR3D-Net: Dense Ordinal Regression Network for 3D Hand Pose Estimation | Yamin Mao et.al. | 2403.13405v1 | null |
2024-03-20 | ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics | Qiaojun Yu et.al. | 2403.13365v1 | null |
2024-03-20 | MULAN-WC: Multi-Robot Localization Uncertainty-aware Active NeRF with Wireless Coordination | Weiying Wang et.al. | 2403.13348v1 | null |
2024-03-19 | FaceXFormer: A Unified Transformer for Facial Analysis | Kartik Narayan et.al. | 2403.12960v1 | link |
2024-03-19 | WHAC: World-grounded Humans and Cameras | Wanqi Yin et.al. | 2403.12959v1 | link |
2024-03-19 | Diffusion-Driven Self-Supervised Learning for Shape Reconstruction and Pose Estimation | Jingtao Sun et.al. | 2403.12728v1 | link |
2024-03-19 | IFFNeRF: Initialisation Free and Fast 6DoF pose estimation from a single image and a NeRF model | Matteo Bortolon et.al. | 2403.12682v1 | null |
2024-03-19 | In-Hand Following of Deformable Linear Objects Using Dexterous Fingers with Tactile Sensing | Mingrui Yu et.al. | 2403.12676v1 | null |
2024-03-19 | Self-learning Canonical Space for Multi-view 3D Human Pose Estimation | Xiaoben Li et.al. | 2403.12440v1 | null |
2024-03-20 | Human Mesh Recovery from Arbitrary Multi-view Images | Xiaoben Li et.al. | 2403.12434v2 | link |
2024-03-19 | XPose: eXplainable Human Pose Estimation | Luyu Qiu et.al. | 2403.12370v1 | null |
2024-03-18 | HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data | Mengqi Zhang et.al. | 2403.12011v1 | null |
2024-03-18 | Normalized Validity Scores for DNNs in Regression based Eye Feature Extraction | Wolfgang Fuhl et.al. | 2403.11665v1 | null |
2024-03-18 | An Accurate and Real-time Relative Pose Estimation from Triple Point-line Images by Decoupling Rotation and Translation | Zewen Xu et.al. | 2403.11639v1 | null |
2024-03-18 | LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models | Yang Yang et.al. | 2403.11627v1 | link |
2024-03-18 | GenFlow: Generalizable Recurrent Flow for 6D Pose Refinement of Novel Objects | Sungphill Moon et.al. | 2403.11510v1 | null |
2024-03-17 | A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation | Qucheng Peng et.al. | 2403.11310v1 | link |
2024-03-17 | Compact 3D Gaussian Splatting For Dense Visual SLAM | Tianchen Deng et.al. | 2403.11247v1 | link |
2024-03-16 | Robotic Task Success Evaluation Under Multi-modal Non-Parametric Object Pose Uncertainty | Lakshadeep Naik et.al. | 2403.10874v1 | null |
2024-03-16 | DPPE: Dense Pose Estimation in a Plenoxels Environment using Gradient Approximation | Christopher Kolios et.al. | 2403.10773v1 | null |
2024-03-15 | GS-Pose: Cascaded Framework for Generalizable Segmentation-based 6D Object Pose Estimation | Dingding Cai et.al. | 2403.10683v1 | null |
2024-03-15 | CLOSURE: Fast Quantification of Pose Uncertainty Sets | Yihuai Gao et.al. | 2403.09990v1 | null |
2024-03-14 | ThermoHands: A Benchmark for 3D Hand Pose Estimation from Egocentric Thermal Image | Fangqiang Ding et.al. | 2403.09871v1 | null |
2024-03-14 | BOP Challenge 2023 on Detection, Segmentation and Pose Estimation of Seen and Unseen Rigid Objects | Tomas Hodan et.al. | 2403.09799v1 | null |
2024-03-14 | Scalable Autonomous Drone Flight in the Forest with Visual-Inertial SLAM and Dense Submaps Built without LiDAR | Sebastián Barbas Laina et.al. | 2403.09596v1 | null |
2024-03-14 | Improving Real-Time Omnidirectional 3D Multi-Person Human Pose Estimation with People Matching and Unsupervised 2D-3D Lifting | Pawel Knap et.al. | 2403.09437v1 | null |
2024-03-14 | LM2D: Lyrics- and Music-Driven Dance Synthesis | Wenjie Yin et.al. | 2403.09407v1 | null |
2024-03-14 | SD-Net: Symmetric-Aware Keypoint Prediction and Domain Adaptation for 6D Pose Estimation In Bin-picking Scenarios | Ding-Tao Huang et.al. | 2403.09317v1 | link |
2024-03-14 | MOTPose: Multi-object 6D Pose Estimation for Dynamic Video Sequences using Attention-based Temporal Fusion | Arul Selvam Periyasamy et.al. | 2403.09309v1 | null |
2024-03-13 | Data Augmentation in Human-Centric Vision | Wentao Jiang et.al. | 2403.08650v1 | null |
2024-03-15 | PRAGO: Differentiable Multi-View Pose Optimization From Objectness Detections | Matteo Taiana et.al. | 2403.08586v2 | null |
2024-03-13 | NeRF-Supervised Feature Point Detection and Description | Ali Youssef et.al. | 2403.08156v1 | link |
2024-03-12 | Q-SLAM: Quadric Representations for Monocular SLAM | Chensheng Peng et.al. | 2403.08125v1 | null |
2024-03-12 | MRC-Net: 6-DoF Pose Estimation with MultiScale Residual Correlation | Yuelong Li et.al. | 2403.08019v1 | link |
2024-03-12 | Uncertainty Quantification with Deep Ensembles for 6D Object Pose Estimation | Kira Wursthorn et.al. | 2403.07741v1 | null |
2024-03-12 | Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving | JunDa Cheng et.al. | 2403.07535v1 | link |
2024-03-12 | Category-Agnostic Pose Estimation for Point Clouds | Bowen Liu et.al. | 2403.07437v1 | null |
2024-03-12 | Monocular Microscope to CT Registration using Pose Estimation of the Incus for Augmented Reality Cochlear Implant Surgery | Yike Zhang et.al. | 2403.07219v1 | null |
2024-03-11 | Real-Time Simulated Avatar from Head-Mounted Sensors | Zhengyi Luo et.al. | 2403.06862v1 | null |
2024-03-11 | Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition | Erkut Akdag et.al. | 2403.06577v1 | null |
2024-03-10 | Platypose: Calibrated Zero-Shot Multi-Hypothesis 3D Human Motion Estimation | Paweł A. Pierzchlewicz et.al. | 2403.06164v1 | link |
2024-03-10 | Diffusion Models Trained with Large Data Are Transferable Visual Models | Guangkai Xu et.al. | 2403.06090v1 | link |
2024-03-08 | Prepared for the Worst: A Learning-Based Adversarial Attack for Resilience Analysis of the ICP Algorithm | Ziyu Zhang et.al. | 2403.05666v1 | null |
2024-03-11 | Exploiting polar symmetry in designing equivariant observers for vision-based motion estimation | Tarek Bouazza et.al. | 2403.05450v2 | null |
2024-03-07 | Real-Time Planning Under Uncertainty for AUVs Using Virtual Maps | Ivana Collado-Gonzalez et.al. | 2403.04936v1 | null |
2024-03-07 | That's My Point: Compact Object-centric LiDAR Pose Estimation for Large-scale Outdoor Localisation | Georgi Pramatarov et.al. | 2403.04755v1 | null |
2024-03-07 | Disentangled Diffusion-Based 3D Human Pose Estimation with Hierarchical Spatial and Temporal Denoiser | Qingyuan Cai et.al. | 2403.04444v1 | link |
2024-03-09 | Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation | Ruicong Liu et.al. | 2403.04381v2 | link |
2024-03-05 | FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation | Chris Rockwell et.al. | 2403.03221v1 | null |
2024-03-05 | NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors | Yannan He et.al. | 2403.03122v1 | null |
2024-03-05 | Improved LiDAR Odometry and Mapping using Deep Semantic Segmentation and Novel Outliers Detection | Mohamed Afifi et.al. | 2403.03111v1 | null |
2024-03-05 | Splat-Nav: Safe Real-Time Robot Navigation in Gaussian Splatting Maps | Timothy Chen et.al. | 2403.02751v1 | link |
2024-03-04 | PowerSkel: A Device-Free Framework Using CSI Signal for Human Skeleton Estimation in Power Station | Cunyi Yin et.al. | 2403.01913v1 | link |
2024-03-04 | A Simple Baseline for Efficient Hand Mesh Reconstruction | Zhishan Zhou et.al. | 2403.01813v1 | null |
2024-03-03 | MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images | Junwen Huang et.al. | 2403.01517v1 | null |
2024-03-02 | Single-image camera calibration with model-free distortion correction | Katia Genovese et.al. | 2403.01263v1 | null |
2024-03-02 | Grid-based Fast and Structural Visual Odometry | Zhang Zhihe et.al. | 2403.01110v1 | null |
2024-03-01 | Optimal Robot Formations: Balancing Range-Based Observability and User-Defined Configurations | Syed Shabbir Ahmed et.al. | 2403.00988v1 | null |
2024-03-04 | TEXterity -- Tactile Extrinsic deXterity: Simultaneous Tactile Estimation and Control for Extrinsic Dexterity | Sangwoon Kim et.al. | 2403.00049v2 | null |
2024-03-01 | Graph Convolutional Neural Networks for Automated Echocardiography View Recognition: A Holistic Approach | Sarina Thomas et.al. | 2402.19062v2 | null |
2024-02-29 | Deep Learning for 3D Human Pose Estimation and Mesh Recovery: A Survey | Yang Liu et.al. | 2402.18844v1 | link |
2024-02-28 | Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting | Taeho Kang et.al. | 2402.18330v1 | link |
2024-02-28 | Location-guided Head Pose Estimation for Fisheye Image | Bing Li et.al. | 2402.18320v1 | null |
2024-02-28 | NToP: NeRF-Powered Large-scale Dataset Generation for 2D and 3D Human Pose Estimation in Top-View Fisheye Images | Jingrui Yu et.al. | 2402.18196v1 | link |
2024-02-28 | Six-Point Method for Multi-Camera Systems with Reduced Solution Space | Banglei Guan et.al. | 2402.18066v1 | link |
2024-02-27 | Real-Time Estimation of Relative Pose for UAVs Using a Dual-Channel Feature Association | Zhaoying Wang et.al. | 2402.17504v1 | null |
2024-02-26 | HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields | Haozhe Qi et.al. | 2402.17062v1 | link |
2024-02-26 | DRSI-Net: Dual-Residual Spatial Interaction Network for Multi-Person Pose Estimation | Shang Wu et.al. | 2402.16640v1 | null |
2024-02-26 | GEA: Reconstructing Expressive 3D Gaussian Avatar from Monocular Video | Xinqi Liu et.al. | 2402.16607v1 | null |
2024-02-26 | DreamUp3D: Object-Centric Generative Models for Single-View 3D Scene Understanding and Real-to-Sim Transfer | Yizhe Wu et.al. | 2402.16308v1 | null |
2024-02-25 | XAI-based gait analysis of patients walking with Knee-Ankle-Foot orthosis using video cameras | Arnav Mishra et.al. | 2402.16175v1 | null |
2024-02-25 | VOLoc: Visual Place Recognition by Querying Compressed Lidar Map | Xudong Cai et.al. | 2402.15961v1 | link |
2024-02-24 | CLIPose: Category-Level Object Pose Estimation with Pre-trained Vision-Language Knowledge | Xiao Lin et.al. | 2402.15726v1 | null |
2024-02-23 | Optimized Deployment of Deep Neural Networks for Visual Pose Estimation on Nano-drones | Matteo Risso et.al. | 2402.15273v1 | null |
2024-02-22 | Cameras as Rays: Pose Estimation via Ray Diffusion | Jason Y. Zhang et.al. | 2402.14817v1 | null |
2024-02-22 | S^2Former-OR: Single-Stage Bimodal Transformer for Scene Graph Generation in OR | Jialun Pei et.al. | 2402.14461v1 | link |
2024-02-22 | VLPose: Bridging the Domain Gap in Pose Estimation with Language-Vision Tuning | Jingyao Li et.al. | 2402.14456v1 | null |
2024-02-22 | Modeling 3D Infant Kinetics Using Adaptive Graph Convolutional Networks | Daniel Holmberg et.al. | 2402.14400v1 | link |
2024-02-22 | Secure Navigation using Landmark-based Localization in a GPS-denied Environment | Ganesh Sapkota et.al. | 2402.14280v1 | null |
2024-02-21 | SecurePose: Automated Face Blurring and Human Movement Kinematics Extraction from Videos Recorded in Clinical Settings | Rishabh Bajpai et.al. | 2402.14143v1 | null |
2024-02-21 | High-throughput Visual Nano-drone to Nano-drone Relative Localization using Onboard Fully Convolutional Networks | Luca Crupi et.al. | 2402.13756v1 | null |
2024-02-21 | EffLoc: Lightweight Vision Transformer for Efficient 6-DOF Camera Relocalization | Zhendong Xiao et.al. | 2402.13537v1 | null |
2024-02-20 | DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal Category-level Pose Estimation | Takuya Ikeda et.al. | 2402.12647v1 | link |
2024-02-19 | Landmark-based Localization using Stereo Vision and Deep Learning in GPS-Denied Battlefield Environment | Ganesh Sapkota et.al. | 2402.12551v1 | null |
2024-02-18 | Boosting Semi-Supervised 2D Human Pose Estimation by Revisiting Data Augmentation and Consistency Training | Huayi Zhou et.al. | 2402.11566v1 | link |
2024-02-17 | Enhancing Surgical Performance in Cardiothoracic Surgery with Innovations from Computer Vision and Artificial Intelligence: A Narrative Review | Merryn D. Constable et.al. | 2402.11288v1 | null |
2024-02-17 | Dense Matchers for Dense Tracking | Tomáš Jelínek et.al. | 2402.11287v1 | null |
2024-02-16 | Occlusion Resilient 3D Human Pose Estimation | Soumava Kumar Roy et.al. | 2402.11036v1 | null |
2024-02-16 | 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations | Tsung-Wei Ke et.al. | 2402.10885v1 | null |
2024-02-15 | Lester: rotoscope animation through video object segmentation and tracking | Ruben Tous et.al. | 2402.09883v1 | link |
2024-02-15 | Foul prediction with estimated poses from soccer broadcast video | Jiale Fang et.al. | 2402.09650v1 | null |
2024-02-16 | IMUOptimize: A Data-Driven Approach to Optimal IMU Placement for Human Pose Estimation with Transformer Architecture | Varun Ramani et.al. | 2402.08923v2 | null |
2024-02-13 | Are Semi-Dense Detector-Free Methods Good at Matching Local Features? | Matthieu Vilain et.al. | 2402.08671v1 | null |
2024-02-13 | Gaussian-Sum Filter for Range-based 3D Relative Pose Estimation in the Presence of Ambiguities | Syed S. Ahmed et.al. | 2402.08566v1 | null |
2024-02-13 | Learning to Produce Semi-dense Correspondences for Visual Localization | Khang Truong Giang et.al. | 2402.08359v1 | link |
2024-02-12 | Extending 3D body pose estimation for robotic-assistive therapies of autistic children | Laura Santos et.al. | 2402.08006v1 | null |
2024-02-12 | GBOT: Graph-Based 3D Object Tracking for Augmented Reality-Assisted Assembly Guidance | Shiyu Li et.al. | 2402.07677v1 | link |
2024-02-12 | UAV-assisted Visual SLAM Generating Reconstructed 3D Scene Graphs in GPS-denied Environments | Ahmed Radwan et.al. | 2402.07537v1 | null |
2024-02-09 | Improving 2D-3D Dense Correspondences with Diffusion Models for 6D Object Pose Estimation | Peter Hönig et.al. | 2402.06436v1 | null |
2024-02-08 | Real-time Holistic Robot Pose Estimation with Unknown States | Shikun Ban et.al. | 2402.05655v1 | link |
2024-02-08 | Extending 6D Object Pose Estimators for Stereo Vision | Thomas Pöllabauer et.al. | 2402.05610v1 | null |
2024-02-09 | NCRF: Neural Contact Radiance Fields for Free-Viewpoint Rendering of Hand-Object Interaction | Zhongqun Zhang et.al. | 2402.05532v2 | null |
2024-02-07 | Detection and Pose Estimation of flat, Texture-less Industry Objects on HoloLens using synthetic Training | Thomas Pöllabauer et.al. | 2402.04979v1 | null |
2024-02-07 | 4-Dimensional deformation part model for pose estimation using Kalman filter constraints | Enrique Martinez-Berti et.al. | 2402.04953v1 | null |
2024-02-07 | STAR: Shape-focused Texture Agnostic Representations for Improved Object Detection and 6D Pose Estimation | Peter Hönig et.al. | 2402.04878v1 | link |
2024-02-05 | A Computer Vision Based Approach for Stalking Detection Using a CNN-LSTM-MLP Hybrid Fusion Model | Murad Hasan et.al. | 2402.03417v1 | null |
2024-02-05 | SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM | Mingrui Li et.al. | 2402.03246v1 | link |
2024-02-05 | Extreme Two-View Geometry From Object Poses with Diffusion Models | Yujing Sun et.al. | 2402.02800v1 | link |
2024-02-04 | Uncertainty-Aware Testing-Time Optimization for 3D Human Pose Estimation | Ti Wang et.al. | 2402.02339v1 | null |
2024-02-01 | mmID: High-Resolution mmWave Imaging for Human Identification | Sakila S. Jayaweera et.al. | 2402.00996v1 | null |
2024-02-01 | In-Bed Pose Estimation: A Review | Ziya Ata Yazıcı et.al. | 2402.00700v1 | null |
2024-02-01 | WayFASTER: a Self-Supervised Traversability Prediction for Increased Navigation Awareness | Mateus Valverde Gasparino et.al. | 2402.00683v1 | link |
2024-02-02 | CMRNext: Camera to LiDAR Matching in the Wild for Localization and Extrinsic Calibration | Daniele Cattaneo et.al. | 2402.00129v2 | null |
2024-01-31 | Improved Scene Landmark Detection for Camera Localization | Tien Do et.al. | 2401.18083v1 | link |
2024-01-30 | Navigating the Unknown: Uncertainty-Aware Compute-in-Memory Autonomy of Edge Robotics | Nastaran Darabi et.al. | 2401.17481v1 | null |
2024-01-30 | MESA: Matching Everything by Segmenting Anything | Yesheng Zhang et.al. | 2401.16741v1 | null |
2024-01-30 | Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers | Jianbin Jiao et.al. | 2401.16700v1 | link |
2024-01-29 | Leveraging Positional Encoding for Robust Multi-Reference-Based Object 6D Pose Estimation | Jaewoo Park et.al. | 2401.16284v1 | null |
2024-01-29 | Reconstructing Close Human Interactions from Multiple Views | Qing Shuai et.al. | 2401.16173v1 | link |
2024-01-28 | Multi-Person 3D Pose Estimation from Multi-View Uncalibrated Depth Cameras | Yu-Jhe Li et.al. | 2401.15616v1 | null |
2024-01-30 | Multi-Robot Relative Pose Estimation in SE(2) with Observability Analysis: A Comparison of Extended Kalman Filtering and Robust Pose Graph Optimization | Kihoon Shin et.al. | 2401.15313v2 | null |
2024-01-26 | Adaptive Deep Learning for Efficient Visual Pose Estimation aboard Ultra-low-power Nano-drones | Beatrice Alessandra Motetti et.al. | 2401.15236v1 | null |
2024-01-26 | SimpleEgo: Predicting Probabilistic Body Pose from Egocentric Cameras | Hanz Cuevas-Velasquez et.al. | 2401.14785v1 | null |
2024-01-24 | Synthetic data enables faster annotation and robust segmentation for multi-object grasping in clutter | Dongmyoung Lee et.al. | 2401.13405v1 | null |
2024-01-24 | Linear Relative Pose Estimation Founded on Pose-only Imaging Geometry | Qi Cai et.al. | 2401.13357v1 | null |
2024-01-23 | SemanticSLAM: Learning based Semantic Map Construction and Robust Camera Localization | Mingyang Li et.al. | 2401.13076v1 | link |
2024-01-24 | RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos | Hongchi Xia et.al. | 2401.12592v2 | null |
2024-01-26 | MobileARLoc: On-device Robust Absolute Localisation for Pervasive Markerless Mobile AR | Changkun Liu et.al. | 2401.11511v2 | null |
2024-01-19 | SCENES: Subpixel Correspondence Estimation With Epipolar Supervision | Dominik A. Kloepfer et.al. | 2401.10886v1 | null |
2024-01-19 | Source-Free and Image-Only Unsupervised Domain Adaptation for Category Level Object Pose Estimation | Prakhar Kaushik et.al. | 2401.10848v1 | null |
2024-01-22 | TEXterity: Tactile Extrinsic deXterity | Antonia Bronars et.al. | 2401.10230v2 | null |
2024-01-18 | Exploring Latent Cross-Channel Embedding for Accurate 3D Human Pose Reconstruction in a Diffusion Framework | Junkun Jiang et.al. | 2401.09836v1 | link |
2024-01-17 | DK-SLAM: Monocular Visual SLAM with Deep Keypoints Adaptive Learning, Tracking and Loop-Closing | Hao Qu et.al. | 2401.09160v1 | null |
2024-01-17 | PIN-SLAM: LiDAR SLAM Using a Point-Based Implicit Neural Representation for Achieving Global Map Consistency | Yue Pan et.al. | 2401.09101v1 | link |
2024-01-16 | AdaSem: Adaptive Goal-Oriented Semantic Communications for End-to-End Camera Relocalization | Qi Liao et.al. | 2401.08360v1 | null |
2024-01-16 | S3M: Semantic Segmentation Sparse Mapping for UAVs with RGB-D Camera | Thanh Nguyen Canh et.al. | 2401.08134v1 | null |
2024-01-15 | Collaboratively Self-supervised Video Representation Learning for Action Recognition | Jie Zhang et.al. | 2401.07584v1 | null |
2024-01-14 | 3D Landmark Detection on Human Point Clouds: A Benchmark and A Dual Cascade Point Transformer Framework | Fan Zhang et.al. | 2401.07251v1 | null |
2024-01-11 | On the representation and methodology for wide and short range head pose estimation | Alejandro Cobo et.al. | 2401.05807v1 | link |
2024-01-10 | Structure from Duplicates: Neural Inverse Graphics from a Pile of Objects | Tianhang Cheng et.al. | 2401.05236v1 | link |
2024-01-10 | Video-based Automatic Lameness Detection of Dairy Cows using Pose Estimation and Multiple Locomotion Traits | Helena Russello et.al. | 2401.05202v1 | null |
2024-01-10 | Diffusion-based Pose Refinement and Muti-hypothesis Generation for 3D Human Pose Estimaiton | Hongbo Kang et.al. | 2401.04921v1 | link |
2024-01-15 | Towards Real-World Aerial Vision Guidance with Categorical 6D Pose Tracker | Jingtao Sun et.al. | 2401.04377v2 | link |
2024-01-07 | RHOBIN Challenge: Reconstruction of Human Object Interaction | Xianghui Xie et.al. | 2401.04143v1 | null |
2024-01-08 | D3PRefiner: A Diffusion-based Denoise Method for 3D Human Pose Refinement | Danqi Yan et.al. | 2401.03914v1 | null |
2024-01-07 | Big Data and Deep Learning in Smart Cities: A Comprehensive Dataset for AI-Driven Traffic Accident Detection and Computer Vision Systems | Victor Adewopo et.al. | 2401.03587v1 | null |
2024-01-04 | Survey of 3D Human Body Pose and Shape Estimation Methods for Contemporary Dance Applications | Darshan Venkatrayappa et.al. | 2401.02383v1 | null |
2024-01-04 | Fit-NGP: Fitting Object Models to Neural Graphics Primitives | Marwan Taher et.al. | 2401.02357v1 | null |
2024-01-04 | PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DOF Object Pose Dataset Generation | Lukas Meyer et.al. | 2401.02281v1 | link |
2024-01-03 | Real-Time Human Fall Detection using a Lightweight Pose Estimation Technique | Ekram Alam et.al. | 2401.01587v1 | link |
2024-01-05 | PLE-SLAM: A Visual-Inertial SLAM Based on Point-Line Features and Efficient IMU Initialization | Jiaming He et.al. | 2401.01081v2 | link |
2023-12-30 | 3D Human Pose Perception from Egocentric Stereo Videos | Hiroyasu Akada et.al. | 2401.00889v1 | null |
2024-01-01 | Geometry Depth Consistency in RGBD Relative Pose Estimation | Sourav Kumar et.al. | 2401.00639v1 | null |
2023-12-30 | A comprehensive framework for occluded human pose estimation | Linhao Xu et.al. | 2401.00155v1 | null |
2024-01-02 | 6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation | Li Xu et.al. | 2401.00029v2 | null |
2023-12-29 | MURP: Multi-Agent Ultra-Wideband Relative Pose Estimation with Constrained Communications in 3D Environments | Andrew Fishberg et.al. | 2312.17731v1 | link |
2023-12-28 | iFusion: Inverting Diffusion for Pose-Free Reconstruction from Sparse Views | Chin-Hsuan Wu et.al. | 2312.17250v1 | link |
2023-12-28 | EvPlug: Learn a Plug-and-Play Module for Event and Image Fusion | Jianping Jiang et.al. | 2312.16933v1 | null |
2023-12-28 | SR-LIVO: LiDAR-Inertial-Visual Odometry and Mapping with Sweep Reconstruction | Zikang Yuan et.al. | 2312.16800v1 | link |
2023-12-28 | L-LO: Enhancing Pose Estimation Precision via a Landmark-Based LiDAR Odometry | Feiya Li et.al. | 2312.16787v1 | null |
2023-12-27 | HMP: Hand Motion Priors for Pose and Shape Estimation from Video | Enes Duran et.al. | 2312.16737v1 | null |
2023-12-27 | Camera calibration for the surround-view system: a benchmark and dataset | L Qin et.al. | 2312.16499v1 | null |
2023-12-24 | TEMP3D: Temporally Continuous 3D Human Pose Estimation Under Occlusions | Rohit Lal et.al. | 2312.16221v1 | link |
2023-12-26 | Graph Context Transformation Learning for Progressive Correspondence Pruning | Junwen Guo et.al. | 2312.15971v1 | link |
2023-12-25 | Lifting by Image -- Leveraging Image Cues for Accurate 3D Human Pose Estimation | Feng Zhou et.al. | 2312.15636v1 | null |
2023-12-25 | APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond | Yuxiang Yang et.al. | 2312.15612v1 | link |
2023-12-23 | PACE: Pose Annotations in Cluttered Environments | Yang You et.al. | 2312.15130v1 | link |
2023-12-22 | PoseGen: Learning to Generate 3D Human Pose Dataset with NeRF | Mohsen Gholami et.al. | 2312.14915v1 | link |
2023-12-22 | Harnessing Diffusion Models for Visual Perception with Meta Prompts | Qiang Wan et.al. | 2312.14733v1 | link |
2023-12-22 | Pola4All: survey of polarimetric applications and an open-source toolkit to analyze polarization | Joaquin Rodriguez et.al. | 2312.14697v1 | link |
2023-12-22 | PoseViNet: Distracted Driver Action Recognition Framework Using Multi-View Pose Estimation and Vision Transformer | Neha Sengar et.al. | 2312.14577v1 | null |
2023-12-22 | Scalable 3D Reconstruction From Single Particle X-Ray Diffraction Images Based on Online Machine Learning | Jay Shenoy et.al. | 2312.14432v1 | null |
2023-12-21 | 3D Pose Estimation of Two Interacting Hands from a Monocular Event Camera | Christen Millerdurai et.al. | 2312.14157v1 | null |
2023-12-21 | DUSt3R: Geometric 3D Vision Made Easy | Shuzhe Wang et.al. | 2312.14132v1 | link |
2023-12-20 | NeRF-VO: Real-Time Sparse Visual Odometry with Neural Radiance Fields | Jens Naumann et.al. | 2312.13471v1 | null |
2023-12-20 | Brain-Inspired Visual Odometry: Balancing Speed and Interpretability through a System of Systems Approach | Habib Boloorchi Tabrizi et.al. | 2312.13162v1 | link |
2023-12-18 | Unified framework for diffusion generative models in SO(3): applications in computer vision and astrophysics | Yesukhei Jagvaral et.al. | 2312.11707v1 | null |
2023-12-18 | Underwater Robot Pose Estimation Using Acoustic Methods and Intermittent Position Measurements at the Surface | Vicu-Mihalis Maer et.al. | 2312.11401v1 | null |
2023-12-17 | SHaRPose: Sparse High-Resolution Representation for Human Pose Estimation | Xiaoqi An et.al. | 2312.10758v1 | link |
2023-12-17 | PNeRFLoc: Visual Localization with Point-based Neural Radiance Fields | Boming Zhao et.al. | 2312.10649v1 | null |
2023-12-15 | SoloPose: One-Shot Kinematic 3D Human Pose Estimation with Video Data Augmentation | David C. Jeong et.al. | 2312.10195v1 | link |
2023-12-14 | iComMa: Inverting 3D Gaussian Splatting for Camera Pose Estimation via Comparing and Matching | Yuan Sun et.al. | 2312.09031v1 | null |
2023-12-14 | Scene 3-D Reconstruction System in Scattering Medium | Zhuoyifan Zhang et.al. | 2312.09005v1 | null |
2023-12-14 | CattleEyeView: A Multi-task Top-down View Cattle Dataset for Smarter Precision Livestock Farming | Kian Eng Ong et.al. | 2312.08764v1 | link |
2023-12-20 | PnP for Two-Dimensional Pose Estimation | Joshua Wang et.al. | 2312.08488v2 | link |
2023-12-13 | Pose and shear-based tactile servoing | John Lloyd et.al. | 2312.08411v1 | null |
2023-12-13 | FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects | Bowen Wen et.al. | 2312.08344v1 | link |
2023-12-13 | Efficient Multi-Object Pose Estimation using Multi-Resolution Deformable Attention and Query Aggregation | Arul Selvam Periyasamy et.al. | 2312.08268v1 | null |
2023-12-13 | CenterGrasp: Object-Aware Implicit Representation Learning for Simultaneous Shape Reconstruction and 6-DoF Grasp Estimation | Eugenio Chisari et.al. | 2312.08240v1 | null |
2023-12-13 | C-BEV: Contrastive Bird's Eye View Training for Cross-View Image Retrieval and 3-DoF Pose Estimation | Florian Fervers et.al. | 2312.08060v1 | null |
2023-12-13 | Three-Filters-to-Normal+: Revisiting Discontinuity Discrimination in Depth-to-Normal Translation | Jingwei Yang et.al. | 2312.07964v1 | null |
2023-12-13 | Diffusion Models Enable Zero-Shot Pose Estimation for Lower-Limb Prosthetic Users | Tianxun Zhou et.al. | 2312.07854v1 | null |
2023-12-12 | RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation | Peng Lu et.al. | 2312.07526v1 | link |
2023-12-12 | COLMAP-Free 3D Gaussian Splatting | Yang Fu et.al. | 2312.07504v1 | link |
2023-12-12 | RMS: Redundancy-Minimizing Point Cloud Sampling for Real-Time Pose Estimation in Degenerated Environments | Pavel Petracek et.al. | 2312.07337v1 | link |
2023-12-12 | Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs | Sunghwan Hong et.al. | 2312.07246v1 | link |
2023-12-12 | Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3D Pose Estimation | Yuchen Yang et.al. | 2312.07051v1 | link |
2023-12-12 | Towards Enhanced Human Activity Recognition through Natural Language Generation and Pose Estimation | Nikhil Kashyap et.al. | 2312.06965v1 | null |
2023-12-12 | Exploring Novel Object Recognition and Spontaneous Location Recognition Machine Learning Analysis Techniques in Alzheimer's Mice | Soham Bafana et.al. | 2312.06914v1 | link |
2023-12-11 | Keypoint-based Stereophotoclinometry for Characterizing and Navigating Small Bodies: A Factor Graph Approach | Travis Driver et.al. | 2312.06865v1 | link |
2023-12-11 | Improving the Robustness of 3D Human Pose Estimation: A Benchmark and Learning from Noisy Input | Trung-Hieu Hoang et.al. | 2312.06797v1 | null |
2023-12-11 | 3D Hand Pose Estimation in Egocentric Images in the Wild | Aditya Prakash et.al. | 2312.06583v1 | null |
2023-12-11 | PointVoxel: A Simple and Effective Pipeline for Multi-View Multi-Modal 3D Human Pose Estimation | Zhiyu Pan et.al. | 2312.06409v1 | null |
2023-12-11 | ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation | Cédric Rommel et.al. | 2312.06386v1 | link |
2023-12-10 | From Correspondences to Pose: Non-minimal Certifiably Optimal Relative Pose without Disambiguation | Javier Tirado-Garín et.al. | 2312.05995v1 | link |
2023-12-09 | You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception | Sheng Jin et.al. | 2312.05525v1 | link |
2023-12-07 | Image and AIS Data Fusion Technique for Maritime Computer Vision Applications | Emre Gülsoylu et.al. | 2312.05270v1 | link |
2023-12-07 | Correspondences of the Third Kind: Camera Pose Estimation from Object Reflection | Kohei Yamashita et.al. | 2312.04527v1 | null |
2023-12-07 | Detecting and Restoring Non-Standard Hands in Stable Diffusion Generated Images | Yiqun Zhang et.al. | 2312.04236v1 | null |
2023-12-06 | Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning | Xinshun Wang et.al. | 2312.03703v1 | link |
2023-12-06 | Cooperative Probabilistic Trajectory Forecasting under Occlusion | Anshul Nayak et.al. | 2312.03296v1 | null |
2023-12-05 | A Unified Simulation Framework for Visual and Behavioral Fidelity in Crowd Analysis | Niccolò Bisagno et.al. | 2312.02613v1 | null |
2023-12-05 | 6D Assembly Pose Estimation by Point Cloud Registration for Robot Manipulation | K. Samarawickrama et.al. | 2312.02593v1 | link |
2023-12-05 | PolyFit: A Peg-in-hole Assembly Framework for Unseen Polygon Shapes via Sim-to-real Adaptation | Geonhyup Lee et.al. | 2312.02531v1 | null |
2023-12-04 | GenEM: Physics-Informed Generative Cryo-Electron Microscopy | Jiakai Zhang et.al. | 2312.02235v1 | null |
2023-12-02 | Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors | Yu Zhang et.al. | 2312.02196v1 | link |
2023-12-04 | iMatching: Imperative Correspondence Learning | Zitong Zhan et.al. | 2312.02141v1 | link |
2023-12-04 | SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM | Nikhil Keetha et.al. | 2312.02126v1 | link |
2023-12-04 | Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection | Xubin Zhong et.al. | 2312.01713v1 | null |
2023-12-05 | Hulk: A Universal Knowledge Translator for Human-Centric Tasks | Yizhou Wang et.al. | 2312.01697v2 | link |
2023-12-04 | Multi-View Person Matching and 3D Pose Estimation with Arbitrary Uncalibrated Camera Networks | Yan Xu et.al. | 2312.01561v1 | null |
2023-12-01 | Object 6D pose estimation meets zero-shot learning | Andrea Caraffa et.al. | 2312.00947v1 | null |
2023-12-01 | Open-vocabulary object 6D pose estimation | Jaime Corsetti et.al. | 2312.00690v1 | null |
2023-12-01 | Global Localization: Utilizing Relative Spatio-Temporal Geometric Constraints from Adjacent and Distant Cameras | Mohammad Altillawi et.al. | 2312.00500v1 | null |
2023-12-01 | Learning Unorthogonalized Matrices for Rotation Estimation | Kerui Gu et.al. | 2312.00462v1 | null |
2023-11-30 | PoseGPT: Chatting about 3D Human Pose | Yao Feng et.al. | 2311.18836v1 | null |
2023-11-30 | FoundPose: Unseen Object Pose Estimation with Foundation Features | Evin Pınar Örnek et.al. | 2311.18809v1 | null |
2023-11-30 | Pose Estimation and Tracking for ASIST | Ari Goodman et.al. | 2311.18665v1 | null |
2023-11-29 | A Stochastic-Geometrical Framework for Object Pose Estimation based on Mixture Models Avoiding the Correspondence Problem | Wolfgang Hoegele et.al. | 2311.18107v1 | null |
2023-11-29 | Pose Anything: A Graph-Based Approach for Category-Agnostic Pose Estimation | Or Hirschorn et.al. | 2311.17891v1 | link |
2023-11-29 | Cinematic Behavior Transfer via NeRF-based Differentiable Filming | Xuekun Jiang et.al. | 2311.17754v1 | null |
2023-11-29 | PViT-6D: Overclocking Vision Transformers for 6D Pose Estimation with Confidence-Level Prediction and Pose Tokens | Sebastian Stapf et.al. | 2311.17504v1 | null |
2023-11-28 | On the Calibration of Human Pose Estimation | Kerui Gu et.al. | 2311.17105v1 | null |
2023-11-28 | Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence | Junyi Zhang et.al. | 2311.17034v1 | link |
2023-11-28 | HandyPriors: Physically Consistent Perception of Hand-Object Interactions with Differentiable Priors | Shutong Zhang et.al. | 2311.16552v1 | null |
2023-11-28 | Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement | Jian Wang et.al. | 2311.16495v1 | null |
2023-11-24 | UniHPE: Towards Unified Human Pose Estimation via Contrastive Learning | Zhongyu Jiang et.al. | 2311.16477v1 | null |
2023-11-27 | DiffSLVA: Harnessing Diffusion Models for Sign Language Video Anonymization | Zhaoyang Xia et.al. | 2311.16060v1 | link |
2023-11-27 | Uncertainty Quantification of Set-Membership Estimation in Control and Perception: Revisiting the Minimum Enclosing Ellipsoid | Yukai Tang et.al. | 2311.15962v1 | null |
2023-11-27 | Computer Vision for Carriers: PATRIOT | Ari Goodman et.al. | 2311.15914v1 | null |
2023-11-27 | SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation | Jiehong Lin et.al. | 2311.15707v1 | link |
2023-11-24 | RSB-Pose: Robust Short-Baseline Binocular 3D Human Pose Estimation with Occlusion Handling | Xiaoyue Wan et.al. | 2311.14242v1 | null |
2023-11-23 | Appearance-based gaze estimation enhanced with synthetic images using deep neural networks | Dmytro Herashchenko et.al. | 2311.14175v1 | link |
2023-11-23 | GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence | Van Nguyen Nguyen et.al. | 2311.14155v1 | link |
2023-11-23 | GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence | Pengyuan Wang et.al. | 2311.13777v1 | null |
2023-11-22 | HEViTPose: High-Efficiency Vision Transformer for Human Pose Estimation | Chengpeng Wu et.al. | 2311.13615v1 | link |
2023-11-24 | Calibration System and Algorithm Design for a Soft Hinged Micro Scanning Mirror with a Triaxial Hall Effect Sensor | Di Wang et.al. | 2311.12778v2 | null |
2023-11-21 | HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation | Yongliang Lin et.al. | 2311.12588v1 | link |
2023-11-21 | CoVOR-SLAM: Cooperative SLAM using Visual Odometry and Ranges for Multi-Robot Systems | Young-Hee Lee et.al. | 2311.12580v1 | null |
2023-11-21 | HCA-Net: Hierarchical Context Attention Network for Intervertebral Disc Semantic Labeling | Afshin Bozorgpour et.al. | 2311.12486v1 | link |
2023-11-21 | Two Views Are Better than One: Monocular 3D Pose Estimation with Multiview Consistency | Christian Keilstrup Ingwersen et.al. | 2311.12421v1 | null |
2023-11-20 | Fingerspelling PoseNet: Enhancing Fingerspelling Translation with Pose-Based Transformer Models | Pooya Fayyazsanavi et.al. | 2311.12128v1 | link |
2023-11-20 | Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation | Wenhao Li et.al. | 2311.12028v1 | link |
2023-11-20 | SniffyArt: The Dataset of Smelling Persons | Mathias Zinnen et.al. | 2311.11888v1 | null |
2023-11-21 | Robot Hand-Eye Calibration using Structure-from-Motion | Nicolas Andreff et.al. | 2311.11808v2 | null |
2023-11-18 | SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation | Yamei Chen et.al. | 2311.11125v1 | link |
2023-11-18 | Synthetic Data Generation for Bridging Sim2Real Gap in a Production Environment | Parth Rawal et.al. | 2311.11039v1 | null |
2023-11-18 | Multiple View Geometry Transformers for 3D Human Pose Estimation | Ziwei Liao et.al. | 2311.10983v1 | link |
2023-11-18 | Jenga Stacking Based on 6D Pose Estimation for Architectural Form Finding Process | Zixun Huang et.al. | 2311.10918v1 | null |
2023-11-17 | BiHRNet: A Binary high-resolution network for Human Pose Estimation | Zhicheng Zhang et.al. | 2311.10296v1 | null |
2023-11-16 | Match and Locate: low-frequency monocular odometry based on deep feature matching | Stepan Konev et.al. | 2311.10034v1 | null |
2023-11-16 | LIO-EKF: High Frequency LiDAR-Inertial Odometry using Extended Kalman Filters | Yibin Wu et.al. | 2311.09887v1 | link |
2023-11-16 | Improved TokenPose with Sparsity | Anning Li et.al. | 2311.09653v1 | null |
2023-11-16 | Pseudo-keypoints RKHS Learning for Self-supervised 6DoF Pose Estimation | Yangzheng Wu et.al. | 2311.09500v1 | null |
2023-11-15 | NormNet: Scale Normalization for 6D Pose Estimation in Stacked Scenarios | En-Te Lin et.al. | 2311.09269v1 | link |
2023-11-15 | Range-Visual-Inertial Sensor Fusion for Micro Aerial Vehicle Localization and Navigation | Abhishek Goudar et.al. | 2311.09056v1 | link |
2023-11-14 | LocaliseBot: Multi-view 3D object localisation with differentiable rendering for robot grasping | Sujal Vijayaraghavan et.al. | 2311.08438v1 | null |
2023-11-13 | SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models | Ziyi Lin et.al. | 2311.07575v1 | link |
2023-11-13 | Bio-Inspired Grasping Controller for Sensorized 2-DoF Grippers | Luca Lach et.al. | 2311.07257v1 | link |
2023-11-10 | CESPED: a new benchmark for supervised particle pose estimation in Cryo-EM | Ruben Sanchez-Garcia et.al. | 2311.06194v1 | link |
2023-11-10 | 2D Image head pose estimation via latent space regression under occlusion settings | José Celestino et.al. | 2311.06038v1 | link |
2023-11-10 | Robust Adversarial Attacks Detection for Deep Learning based Relative Pose Estimation for Space Rendezvous | Ziwei Wang et.al. | 2311.05992v1 | null |
2023-11-10 | A Practical Guide to Implementing Off-Axis Stereo Projection Using Existing Ray Tracing Libraries | Stefan Zellmann et.al. | 2311.05887v1 | link |
2023-11-09 | Visually Guided Model Predictive Robot Control via 6D Object Pose Localization and Tracking | Mederic Fourmy et.al. | 2311.05344v1 | null |
2023-11-09 | Spatial Attention-based Distribution Integration Network for Human Pose Estimation | Sihan Gao et.al. | 2311.05323v1 | null |
2023-11-09 | SPADES: A Realistic Spacecraft Pose Estimation Dataset using Event Sensing | Arunkumar Rathinam et.al. | 2311.05310v1 | null |
2023-11-09 | Differentiable Cloth Parameter Identification and State Estimation in Manipulation | Dongzhe Zheng et.al. | 2311.05141v1 | null |
2023-11-09 | POISE: Pose Guided Human Silhouette Extraction under Occlusions | Arindam Dutta et.al. | 2311.05077v1 | link |
2023-11-08 | Active Transfer Learning for Efficient Video-Specific Human Pose Estimation | Hiromu Taketsugu et.al. | 2311.05041v1 | link |
2023-11-08 | 3D Pose Estimation of Tomato Peduncle Nodes using Deep Keypoint Detection and Point Cloud | Jianchao Ci et.al. | 2311.04699v1 | null |
2023-11-09 | Rethinking Human Pose Estimation for Autonomous Driving with 3D Event Representations | Xiaoting Yin et.al. | 2311.04591v2 | link |
2023-11-08 | Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images | Nishant Jain et.al. | 2311.04521v1 | null |
2023-11-08 | PLV-IEKF: Consistent Visual-Inertial Odometry using Points, Lines, and Vanishing Points | Tong Hua et.al. | 2311.04477v1 | null |
2023-11-08 | UP-NeRF: Unconstrained Pose-Prior-Free Neural Radiance Fields | Injae Kim et.al. | 2311.03784v2 | link |
2023-11-06 | A Single 2D Pose with Context is Worth Hundreds for 3D Human Pose Estimation | Qitao Zhao et.al. | 2311.03312v1 | null |
2023-11-06 | Enabling In-Situ Resources Utilisation by leveraging collaborative robotics and astronaut-robot interaction | Silvia Romero-Azpitarte et.al. | 2311.03146v1 | null |
2023-11-06 | Simultaneous Time Synchronization and Mutual Localization for Multi-robot System | Xiangyong Wen et.al. | 2311.02948v1 | null |
2023-11-06 | Initialisation of Autonomous Aircraft Visual Inspection Systems via CNN-Based Camera Pose Estimation | Xueyan Oh et.al. | 2311.02900v1 | null |
2023-11-06 | Efficient, Self-Supervised Human Pose Estimation with Inductive Prior Tuning | Nobline Yoo et.al. | 2311.02815v1 | link |
2023-11-03 | Generating Unbiased Pseudo-labels via a Theoretically Guaranteed Chebyshev Constraint to Unify Semi-supervised Classification and Regression | Jiaqi Wu et.al. | 2311.01782v1 | link |
2023-11-03 | Modeling the Uncertainty with Maximum Discrepant Students for Semi-supervised 2D Pose Estimation | Jiaqi Wu et.al. | 2311.01770v1 | null |
2023-11-02 | Sim2Real Bilevel Adaptation for Object Surface Classification using Vision-Based Tactile Sensors | Gabriele M. Caddeo et.al. | 2311.01380v1 | link |
2023-11-01 | A Spatial-Temporal Transformer based Framework For Human Pose Assessment And Correction in Education Scenarios | Wenyang Hu et.al. | 2311.00401v1 | null |
2023-10-31 | HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception | Junkun Yuan et.al. | 2310.20695v1 | link |
2023-10-31 | Pose-to-Motion: Cross-Domain Motion Retargeting with Pose Prior | Qingqing Zhao et.al. | 2310.20249v1 | null |
2023-10-30 | FetusMapV2: Enhanced Fetal Pose Estimation in 3D Ultrasound | Chaoyu Chen et.al. | 2310.19293v1 | null |
2023-10-29 | Distributed Nonlinear Filtering using Triangular Transport Maps | Daniel Grange et.al. | 2310.19000v1 | null |
2023-10-29 | TIC-TAC: A Framework To Learn And Evaluate Your Covariance | Megh Shukla et.al. | 2310.18953v1 | link |
2023-10-29 | Improving Multi-Person Pose Tracking with A Confidence Network | Zehua Fu et.al. | 2310.18920v1 | null |
2023-10-29 | HDMNet: A Hierarchical Matching Network with Double Attention for Large-scale Outdoor LiDAR Point Cloud Registration | Weiyi Xue et.al. | 2310.18874v1 | null |
2023-10-28 | Enhancing Grasping Performance of Novel Objects through an Improved Fine-Tuning Process | Xiao Hu et.al. | 2310.18569v1 | null |
2023-10-27 | ProcNet: Deep Predictive Coding Model for Robust-to-occlusion Visual Segmentation and Pose Estimation | Michael Zechmair et.al. | 2310.18009v1 | null |
2023-10-26 | Learning Extrinsic Dexterity with Parameterized Manipulation Primitives | Shih-Min Yang et.al. | 2310.17785v1 | null |
2023-10-26 | 6-DoF Stability Field via Diffusion Models | Takuma Yoneda et.al. | 2310.17649v1 | null |
2023-10-26 | SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation | Haobo Jiang et.al. | 2310.17359v1 | null |
2023-10-26 | Automatic Edge Error Judgment in Figure Skating Using 3D Pose Estimation from a Monocular Camera and IMUs | Ryota Tanaka et.al. | 2310.17193v1 | link |
2023-10-25 | Real-time 6-DoF Pose Estimation by an Event-based Camera using Active LED Markers | Gerald Ebmer et.al. | 2310.16618v1 | null |
2023-10-25 | ChimpACT: A Longitudinal Dataset for Understanding Chimpanzee Behaviors | Xiaoxuan Ma et.al. | 2310.16447v1 | link |
2023-10-25 | MotionAGFormer: Enhancing 3D Human Pose Estimation with a Transformer-GCNFormer Network | Soroush Mehraban et.al. | 2310.16288v1 | link |
2023-10-25 | TransPose: 6D Object Pose Estimation with Geometry-Aware Transformer | Xiao Lin et.al. | 2310.16279v1 | null |
2023-10-23 | Converting Depth Images and Point Clouds for Feature-based Pose Estimation | Robert Lösch et.al. | 2310.14924v1 | link |
2023-10-23 | Object Pose Estimation Annotation Pipeline for Multi-view Monocular Camera Systems in Industrial Settings | Hazem Youssef et.al. | 2310.14914v1 | null |
2023-10-23 | Player Re-Identification Using Body Part Appearances | Mahesh Bhosale et.al. | 2310.14469v1 | null |
2023-10-20 | LanPose: Language-Instructed 6D Object Pose Estimation for Robotic Assembly | Bowen Fu et.al. | 2310.13819v1 | null |
2023-10-20 | FMRT: Learning Accurate Feature Matching with Reconciliatory Transformer | Xinyu Zhang et.al. | 2310.13605v1 | null |
2023-10-20 | ColAG: A Collaborative Air-Ground Framework for Perception-Limited UGVs' Navigation | Zhehan Li et.al. | 2310.13324v1 | link |
2023-10-20 | CylinderTag: An Accurate and Flexible Marker for Cylinder-Shape Objects Pose Estimation Based on Projective Invariants | Shaoan Wang et.al. | 2310.13320v1 | link |
2023-10-19 | Human Pose-based Estimation, Tracking and Action Recognition with Deep Learning: A Survey | Lijuan Zhou et.al. | 2310.13039v1 | null |
2023-10-19 | FSD: Fast Self-Supervised Single RGB-D to Categorical 3D Objects | Mayank Lunayach et.al. | 2310.12974v1 | link |
2023-10-18 | Mesh Represented Recycle Learning for 3D Hand Pose and Mesh Estimation | Bosang Kim et.al. | 2310.12189v1 | null |
2023-10-18 | One-Shot Imitation Learning: A Pose Estimation Perspective | Pietro Vitiello et.al. | 2310.12077v1 | null |
2023-10-18 | ShapeGraFormer: GraFormer-Based Network for Hand-Object Reconstruction from a Single Depth Map | Ahmed Tawfik Aboukhadra et.al. | 2310.11811v1 | null |
2023-10-17 | Holistic Parking Slot Detection with Polygon-Shaped Representations | Lihao Wang et.al. | 2310.11629v1 | null |
2023-10-17 | Diver Interest via Pointing in Three Dimensions: 3D Pointing Reconstruction for Diver-AUV Communication | Chelsey Edge et.al. | 2310.11536v1 | null |
2023-10-18 | AP$n$P: A Less-constrained P$n$P Solver for Pose Estimation with Unknown Anisotropic Scaling or Focal Lengths | Jiaxin Wei et.al. | 2310.09982v2 | link |
2023-10-15 | Tabletop Transparent Scene Reconstruction via Epipolar-Guided Optical Flow with Monocular Depth Completion Prior | Xiaotong Chen et.al. | 2310.09956v1 | null |
2023-10-15 | Socially reactive navigation models for mobile robots in dynamic environments | Ricarte Ribeiro et.al. | 2310.09916v1 | link |
2023-10-15 | MoEmo Vision Transformer: Integrating Cross-Attention and Movement Vectors in 3D Pose Estimation for HRI Emotion Detection | David C. Jeong et.al. | 2310.09757v1 | link |
2023-10-16 | IMU Preintegration for Multi-Robot Systems in the Presence of Bias and Communication Constraints | Mohammed Ayman Shalaby et.al. | 2310.08686v2 | null |
2023-10-12 | Towards Design and Development of an ArUco Markers-Based Quantitative Surface Tactile Sensor | Ozdemir Can Kara et.al. | 2310.08398v1 | null |
2023-10-12 | Multimodal Active Measurement for Human Mesh Recovery in Close Proximity | Takahiro Maeda et.al. | 2310.08116v1 | link |
2023-10-12 | X-HRNet: Towards Lightweight Human Pose Estimation with Spatially Unidimensional Self-Attention | Yixuan Zhou et.al. | 2310.08042v1 | link |
2023-10-12 | PoRF: Pose Residual Field for Accurate Neural Surface Reconstruction | Jia-Wang Bian et.al. | 2310.07449v2 | link |
2023-10-11 | SAGE-ICP: Semantic Information-Assisted ICP | Jiaming Cui et.al. | 2310.07237v1 | link |
2023-10-11 | DeepSimHO: Stable Pose Estimation for Hand-Object Interaction via Physics Simulation | Rong Wang et.al. | 2310.07206v1 | link |
2023-10-12 | FABind: Fast and Accurate Protein-Ligand Binding | Qizhi Pei et.al. | 2310.06763v2 | link |
2023-10-10 | EARL: Eye-on-Hand Reinforcement Learner for Dynamic Grasping with Active Pose Estimation | Baichuan Huang et.al. | 2310.06751v1 | null |
2023-10-09 | Augmenting Vision-Based Human Pose Estimation with Rotation Matrix | Milad Vazan et.al. | 2310.06068v1 | null |
2023-10-07 | Federated Self-Supervised Learning of Monocular Depth Estimators for Autonomous Vehicles | Elton F. de S. Soares et.al. | 2310.04837v1 | null |
2023-10-10 | 1st Place Solution of Egocentric 3D Hand Pose Estimation Challenge 2023 Technical Report: A Concise Pipeline for Egocentric Hand Pose Reconstruction | Zhishan Zhou et.al. | 2310.04769v2 | null |
2023-10-06 | SwimXYZ: A large-scale dataset of synthetic swimming motions and videos | Fiche Guénolé et.al. | 2310.04360v1 | null |
2023-10-05 | BID-NeRF: RGB-D image pose estimation with inverted Neural Radiance Fields | Ágoston István Csehi et.al. | 2310.03563v1 | null |
2023-10-05 | 3D-Aware Hypothesis & Verification for Generalizable Relative Object Pose Estimation | Chen Zhao et.al. | 2310.03534v1 | null |
2023-10-05 | RGBManip: Monocular Image-based Robotic Manipulation through Active Object Pose Estimation | Boshi An et.al. | 2310.03478v1 | null |
2023-10-05 | Cyber Physical System Information Collection: Robot Location and Navigation Method Based on QR Code | Hongwei Li et.al. | 2310.03470v1 | null |
2023-10-04 | Condition numbers in multiview geometry, instability in relative pose estimation, and RANSAC | Hongyi Fan et.al. | 2310.02719v1 | null |
2023-10-05 | USB-NeRF: Unrolling Shutter Bundle Adjusted Neural Radiance Fields | Moyang Li et.al. | 2310.02687v2 | link |
2023-10-03 | Beyond the Benchmark: Detecting Diverse Anomalies in Videos | Yoav Arad et.al. | 2310.01904v1 | link |
2023-10-03 | MFOS: Model-Free & One-Shot Object Pose Estimation | JongMin Lee et.al. | 2310.01897v1 | null |
2023-10-02 | LEAP: Liberate Sparse-view 3D Modeling from Camera Poses | Hanwen Jiang et.al. | 2310.01410v1 | link |
2023-10-02 | H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation | Yanjie Ze et.al. | 2310.01404v1 | link |
2023-10-04 | Self-supervised Learning of Contextualized Local Visual Embeddings | Thalles Santos Silva et.al. | 2310.00527v3 | link |
2023-09-30 | Diff-DOPE: Differentiable Deep Object Pose Estimation | Jonathan Tremblay et.al. | 2310.00463v1 | null |
2023-09-29 | Diver Identification Using Anthropometric Data Ratios for Underwater Multi-Human-Robot Collaboration | Jungseok Hong et.al. | 2310.00146v1 | null |
2023-09-29 | Denoising and Selecting Pseudo-Heatmaps for Semi-Supervised Human Pose Estimation | Zhuoran Yu et.al. | 2310.00099v1 | null |
2023-09-29 | Revisiting Cephalometric Landmark Detection from the view of Human Pose Estimation with Lightweight Super-Resolution Head | Qian Wu et.al. | 2309.17143v1 | link |
2023-09-29 | AdaPose: Towards Cross-Site Device-Free Human Pose Estimation with Commodity WiFi | Yunjiao Zhou et.al. | 2309.16964v1 | null |
2023-09-28 | End-to-End (Instance)-Image Goal Navigation through Correspondence as an Emergent Phenomenon | Guillaume Bono et.al. | 2309.16634v1 | null |
2023-09-28 | Off-the-shelf bin picking workcell with visual pose estimation: A case study on the world robot summit 2018 kitting task | Frederik Hagelskjær et.al. | 2309.16221v1 | null |
2023-09-28 | Cloth2Body: Generating 3D Human Body Mesh from 2D Clothing | Lu Dai et.al. | 2309.16189v1 | null |
2023-09-28 | Laboratory Automation: Precision Insertion with Adaptive Fingers utilizing Contact through Sliding with Tactile-based Pose Estimation | Sameer Pai et.al. | 2309.16170v1 | null |
2023-09-28 | CLIP-Hand3D: Exploiting 3D Hand Pose Estimation via Context-Aware Prompting | Shaoxiang Guo et.al. | 2309.16140v1 | null |
2023-09-28 | A Modular Bio-inspired Robotic Hand with High Sensitivity | Chao Liu et.al. | 2309.16081v1 | null |
2023-09-27 | Handbook on Leveraging Lines for Two-View Relative Pose Estimation | Petr Hruby et.al. | 2309.16040v1 | null |
2023-09-27 | Q-REG: End-to-End Trainable Point Cloud Registration with Surface Curvature | Shengze Jin et.al. | 2309.16023v1 | null |
2023-09-27 | Analysis on Multi-robot Relative 6-DOF Pose Estimation Error Based on UWB Range | Xinran Li et.al. | 2309.15367v1 | null |
2023-09-26 | Unsupervised Reconstruction of 3D Human Pose Interactions From 2D Poses Alone | Peter Hardy et.al. | 2309.14865v1 | null |
2023-09-26 | Learning Vision-Based Bipedal Locomotion for Challenging Terrain | Helei Duan et.al. | 2309.14594v1 | null |
2023-09-25 | Spring-IMU Fusion Based Proprioception for Feedback Control of Soft Manipulators | Yinan Meng et.al. | 2309.14279v1 | null |
2023-09-25 | Industrial Application of 6D Pose Estimation for Robotic Manipulation in Automotive Internal Logistics | Philipp Quentin et.al. | 2309.14265v1 | null |
2023-09-25 | BoIR: Box-Supervised Instance Representation for Multi-Person Pose Estimation | Uyoung Jeong et.al. | 2309.14072v1 | link |
2023-09-24 | Towards Subcentimeter Accuracy Digital-Twin Tracking via An RGBD-based Transformer Model and A Comprehensive Mobile Dataset | Zixun Huang et.al. | 2309.13570v1 | link |
2023-09-21 | ORTexME: Occlusion-Robust Human Shape and Pose via Temporal Average Texture and Mesh Encoding | Yu Cheng et.al. | 2309.12183v1 | null |
2023-09-21 | ZS6D: Zero-shot 6D Object Pose Estimation using Vision Transformers | Philipp Ausserlechner et.al. | 2309.11986v1 | null |
2023-09-21 | Ego3DPose: Capturing 3D Cues from Binocular Egocentric Views | Taeho Kang et.al. | 2309.11962v1 | link |
2023-09-21 | A Real-Time Multi-Task Learning System for Joint Detection of Face, Facial Landmark and Head Pose | Qingtian Wu et.al. | 2309.11773v1 | null |
2023-09-20 | Understanding Pose and Appearance Disentanglement in 3D Human Pose Estimation | Krishna Kanth Nakka et.al. | 2309.11667v1 | null |
2023-09-20 | Online Supervised Training of Spaceborne Vision during Proximity Operations using Adaptive Kalman Filtering | Tae Ha Park et.al. | 2309.11645v1 | null |
2023-09-20 | OCC-VO: Dense Mapping via 3D Occupancy-Based Visual Odometry for Autonomous Driving | Heng Li et.al. | 2309.11011v1 | link |
2023-09-19 | Language-Conditioned Affordance-Pose Detection in 3D Point Clouds | Toan Nguyen et.al. | 2309.10911v1 | null |
2023-09-19 | MAGIC-TBR: Multiview Attention Fusion for Transformer-based Bodily Behavior Recognition in Group Settings | Surbhi Madan et.al. | 2309.10765v1 | link |
2023-09-19 | SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction | Anilkumar Swamy et.al. | 2309.10748v1 | null |
2023-09-20 | GloPro: Globally-Consistent Uncertainty-Aware 3D Human Pose Estimation & Tracking in the Wild | Simon Schaefer et.al. | 2309.10369v2 | null |
2023-09-19 | RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery | Jiaxin Wei et.al. | 2309.10255v1 | link |
2023-09-18 | Hierarchical Attention and Graph Neural Networks: Toward Drift-Free Pose Estimation | Kathia Melbouci et.al. | 2309.09934v1 | null |
2023-09-18 | Application-driven Validation of Posteriors in Inverse Problems | Tim J. Adler et.al. | 2309.09764v1 | null |
2023-09-18 | RIDE: Self-Supervised Learning of Rotation-Equivariant Keypoint Detection and Invariant Description for Endoscopy | Mert Asim Karaoglu et.al. | 2309.09563v1 | null |
2023-09-18 | Sparse and Privacy-enhanced Representation for Human Pose Estimation | Ting-Ying Lin et.al. | 2309.09515v1 | null |
2023-09-19 | RenderIH: A Large-scale Synthetic Dataset for 3D Interacting Hand Pose Estimation | Lijun Li et.al. | 2309.09301v2 | link |
2023-09-16 | Optimal Initialization Strategies for Range-Only Trajectory Estimation | Abhishek Goudar et.al. | 2309.09011v1 | null |
2023-09-16 | DynaMoN: Motion-Aware Fast And Robust Camera Localization for Dynamic NeRF | Mert Asim Karaoglu et.al. | 2309.08927v1 | link |
2023-09-16 | Outram: One-shot Global Localization via Triangulated Scene Graph and Global Outlier Pruning | Pengyu Yin et.al. | 2309.08914v1 | link |
2023-09-15 | Towards Robust and Smooth 3D Multi-Person Pose Estimation from Monocular Videos in the Wild | Sungchan Park et.al. | 2309.08644v1 | null |
2023-09-15 | YCB-Ev: Event-vision dataset for 6DoF object pose estimation | Pavel Rojtberg et.al. | 2309.08482v1 | link |
2023-09-15 | Fast and Accurate Deep Loop Closing and Relocalization for Reliable LiDAR SLAM | Chenghao Shi et.al. | 2309.08086v1 | null |
2023-09-14 | Gradient based Grasp Pose Optimization on a NeRF that Approximates Grasp Success | Gergely Sóti et.al. | 2309.08040v1 | null |
2023-09-14 | TEMPO: Efficient Multi-View Pose Estimation, Tracking, and Forecasting | Rohan Choudhury et.al. | 2309.07910v1 | null |
2023-09-14 | Towards Robust and Unconstrained Full Range of Rotation Head Pose Estimation | Thorsten Hempel et.al. | 2309.07654v1 | link |
2023-09-14 | EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization | Minjung Kim et.al. | 2309.07471v1 | link |
2023-09-14 | Unleashing the Power of Depth and Pose Estimation Neural Networks by Designing Compatible Endoscopic Images | Junyang Wu et.al. | 2309.07390v1 | null |
2023-09-13 | LInKs "Lifting Independent Keypoints" -- Partial Pose Lifting for Occlusion Handling with Improved Accuracy in 2D-3D Human Pose Estimation | Peter Hardy et.al. | 2309.07243v1 | null |
2023-09-13 | 3D Active Metric-Semantic SLAM | Yuezhan Tao et.al. | 2309.06950v1 | null |
2023-09-11 | ViHOPE: Visuotactile In-Hand Object 6D Pose Estimation with Shape Completion | Hongyu Li et.al. | 2309.05662v1 | null |
2023-09-11 | Towards Intuitive HMI for UAV Control | Filip Zoric et.al. | 2309.05460v1 | null |
2023-09-12 | FreeMan: Towards Benchmarking 3D Human Pose Estimation in the Wild | Jiong Wang et.al. | 2309.05073v2 | link |
2023-09-09 | Probabilistic Triangulation for Uncalibrated Multi-View 3D Human Pose Estimation | Boyuan Jiang et.al. | 2309.04756v1 | link |
2023-09-09 | Mirror-Aware Neural Humans | Daniel Ajisafe et.al. | 2309.04750v1 | link |
2023-09-08 | Robot Localization and Mapping Final Report -- Sequential Adversarial Learning for Self-Supervised Deep Visual Odometry | Akankshya Kar et.al. | 2309.04147v1 | null |
2023-09-07 | ArtiGrasp: Physically Plausible Synthesis of Bi-Manual Dexterous Grasping and Articulation | Hui Zhang et.al. | 2309.03891v1 | null |
2023-09-05 | An automated, high-resolution phenotypic assay for adult Brugia malayi and microfilaria | Upender Kalwa et.al. | 2309.03235v1 | null |
2023-09-05 | A Robust Localization Solution for an Uncrewed Ground Vehicle in Unstructured Outdoor GNSS-Denied Environments | W. Jacob Wagner et.al. | 2309.02569v1 | null |
2023-09-05 | GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction | Youmin Zhang et.al. | 2309.02436v1 | link |
2023-09-05 | DR-Pose: A Two-stage Deformation-and-Registration Pipeline for Category-level 6D Object Pose Estimation | Lei Zhou et.al. | 2309.01925v1 | link |
2023-09-04 | On the Query Strategies for Efficient Online Active Distillation | Michele Boldo et.al. | 2309.01612v1 | null |
2023-09-04 | DiffHPE: Robust, Coherent 3D Human Pose Lifting with Diffusion | Cédric Rommel et.al. | 2309.01575v1 | null |
2023-09-06 | Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation | Hanbing Liu et.al. | 2309.01365v2 | link |
2023-09-04 | SKoPe3D: A Synthetic Dataset for Vehicle Keypoint Perception in 3D from Traffic Monitoring Cameras | Himanshu Pahadia et.al. | 2309.01324v1 | null |
2023-09-03 | BodySLAM++: Fast and Tightly-Coupled Visual-Inertial Camera and Human Motion Tracking | Dorian F. Henning et.al. | 2309.01236v1 | null |
2023-09-02 | Mitigating Motion Blur for Robust 3D Baseball Player Pose Modeling for Pitch Analysis | Jerrin Bright et.al. | 2309.01010v1 | null |
2023-09-01 | Fusing Monocular Images and Sparse IMU Signals for Real-time Human Motion Capture | Shaohua Pan et.al. | 2309.00310v1 | link |
2023-08-31 | EMDB: The Electromagnetic Database of Global 3D Human Pose and Shape in the Wild | Manuel Kaufmann et.al. | 2308.16894v1 | link |
2023-08-31 | SA6D: Self-Adaptive Few-Shot 6D Pose Estimator for Novel and Occluded Objects | Ning Gao et.al. | 2308.16528v1 | null |
2023-08-30 | Two-Stage Violence Detection Using ViTPose and Classification Models at Smart Airports | İrem Üstek et.al. | 2308.16325v1 | link |
2023-08-30 | SignDiff: Learning Diffusion Models for American Sign Language Production | Sen Fang et.al. | 2308.16082v1 | null |
2023-08-30 | Learning Structure-from-Motion with Graph Attention Networks | Lucas Brynte et.al. | 2308.15984v1 | link |
2023-08-30 | Reconstructing Groups of People with Hypergraph Relational Reasoning | Buzhen Huang et.al. | 2308.15844v1 | link |
2023-08-29 | 3D-MuPPET: 3D Multi-Pigeon Pose Estimation and Tracking | Urs Waldmann et.al. | 2308.15316v1 | link |
2023-08-29 | Spatio-temporal MLP-graph network for 3D human pose estimation | Tanvir Hassan et.al. | 2308.15313v1 | link |
2023-08-29 | Pose-Free Neural Radiance Fields via Implicit Pose Regularization | Jiahui Zhang et.al. | 2308.15049v1 | null |
2023-08-28 | R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras | Aron Schmied et.al. | 2308.14713v1 | null |
2023-08-28 | Video-Based Hand Pose Estimation for Remote Assessment of Bradykinesia in Parkinson's Disease | Gabriela T. Acevedo Trebbau et.al. | 2308.14679v1 | null |
2023-08-28 | Active Pose Refinement for Textureless Shiny Objects using the Structured Light Camera | Jun Yang et.al. | 2308.14665v1 | null |
2023-08-28 | CPFES: Physical Fitness Evaluation Based on Canadian Agility and Movement Skill Assessment | Pengcheng Dong et.al. | 2308.14324v1 | null |
2023-08-27 | LDL: Line Distance Functions for Panoramic Localization | Junho Kim et.al. | 2308.13989v1 | link |
2023-08-26 | Prior-guided Source-free Domain Adaptation for Human Pose Estimation | Dripta S. Raychaudhuri et.al. | 2308.13954v1 | null |
2023-08-26 | Vision-Based Human Pose Estimation via Deep Learning: A Survey | Gongjin Lan et.al. | 2308.13872v1 | null |
2023-08-24 | POCO: 3D Pose and Shape Estimation with Confidence | Sai Kumar Dwivedi et.al. | 2308.12965v1 | link |
2023-08-24 | Robot Pose Nowcasting: Forecast the Future to Improve the Present | Alessandro Simoni et.al. | 2308.12914v1 | null |
2023-08-23 | Certifiably Optimal Rotation and Pose Estimation Based on the Cayley Map | Timothy D Barfoot et.al. | 2308.12418v1 | null |
2023-08-22 | Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape | Jiacong Xu et.al. | 2308.11737v1 | null |
2023-08-22 | TrackFlow: Multi-Object Tracking with Normalizing Flows | Gianluca Mancusi et.al. | 2308.11513v1 | null |
2023-08-22 | A LiDAR-Inertial SLAM Tightly-Coupled with Dropout-Tolerant GNSS Fusion for Autonomous Mine Service Vehicles | Yusheng Wang et.al. | 2308.11492v1 | null |
2023-08-22 | PoseGraphNet++: Enriching 3D Human Pose with Orientation Estimation | Soubarna Banik et.al. | 2308.11440v1 | null |
2023-08-22 | Novel-view Synthesis and Pose Estimation for Hand-Object Interaction from Sparse Views | Wentian Qu et.al. | 2308.11198v1 | null |
2023-08-21 | Spectral Graphormer: Spectral Graph-based Transformer for Egocentric Two-Hand Reconstruction using Multi-View Color Images | Tze Ho Elden Tse et.al. | 2308.11015v1 | null |
2023-08-21 | Polarimetric Information for Multi-Modal 6D Pose Estimation of Photometrically Challenging Objects with Limited Data | Patrick Ruhkamp et.al. | 2308.10627v1 | null |
2023-08-21 | GaitPT: Skeletons Are All You Need For Gait Recognition | Andy Catruna et.al. | 2308.10623v1 | null |
2023-08-21 | Approximately Equivariant Graph Networks | Ningyuan Huang et.al. | 2308.10436v1 | link |
2023-08-21 | In-Rack Test Tube Pose Estimation Using RGB-D Data | Hao Chen et.al. | 2308.10411v1 | null |
2023-08-20 | Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video | Yingxuan You et.al. | 2308.10305v1 | link |
2023-08-20 | OCHID-Fi: Occlusion-Robust Hand Pose Estimation in 3D via RF-Vision | Shujie Zhang et.al. | 2308.10146v1 | link |
2023-08-19 | 3D-Aware Neural Body Fitting for Occlusion Robust 3D Human Pose Estimation | Yi Zhang et.al. | 2308.10123v1 | link |
2023-08-19 | Pseudo Flow Consistency for Self-Supervised 6D Object Pose Estimation | Yang Hai et.al. | 2308.10016v1 | link |
2023-08-19 | UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning | Meiqi Sun et.al. | 2308.09953v1 | null |
2023-08-22 | Scene-Aware Feature Matching | Xiaoyong Lu et.al. | 2308.09949v2 | null |
2023-08-18 | PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation | Hanbing Liu et.al. | 2308.09678v1 | link |
2023-08-18 | Improving 3D Pose Estimation for Sign Language | Maksym Ivashechkin et.al. | 2308.09525v1 | null |
2023-08-18 | Denoising Diffusion for 3D Hand Pose Estimation from Images | Maksym Ivashechkin et.al. | 2308.09523v1 | null |
2023-08-18 | ResQ: Residual Quantization for Video Perception | Davide Abati et.al. | 2308.09511v1 | null |
2023-08-17 | MovePose: A High-performance Human Pose Estimation Algorithm on Mobile and Edge Devices | Dongyang Yu et.al. | 2308.09084v1 | null |
2023-08-17 | Pedestrian Environment Model for Automated Driving | Adrian Holzbock et.al. | 2308.09080v1 | link |
2023-08-17 | Exploiting Point-Wise Attention in 6D Object Pose Estimation Based on Bidirectional Prediction | Yuhao Yang et.al. | 2308.08518v2 | null |
2023-08-16 | View Consistent Purification for Accurate Cross-View Localization | Shan Wang et.al. | 2308.08110v1 | null |
2023-08-15 | Learning Better Keypoints for Multi-Object 6DoF Pose Estimation | Yangzheng Wu et.al. | 2308.07827v1 | link |
2023-08-14 | Group Pose: A Simple Baseline for End-to-End Multi-person Pose Estimation | Huan Liu et.al. | 2308.07313v1 | link |
2023-08-12 | 4DRVO-Net: Deep 4D Radar-Visual Odometry Using Multi-Modal and Multi-Scale Adaptive Fusion | Guirong Zhuo et.al. | 2308.06573v1 | null |
2023-08-17 | EgoPoser: Robust Real-Time Ego-Body Pose Estimation in Large Scenes | Jiaxi Jiang et.al. | 2308.06493v2 | null |
2023-08-11 | Aggressive Aerial Grasping using a Soft Drone with Onboard Perception | Samuel Ubellacker et.al. | 2308.06351v1 | null |
2023-08-11 | VERF: Runtime Monitoring of Pose Estimation with Neural Radiance Fields | Dominic Maggio et.al. | 2308.05939v1 | null |
2023-08-10 | Toward Globally Optimal State Estimation Using Automatically Tightened Semidefinite Relaxations | Frederike Dümbgen et.al. | 2308.05783v1 | link |
2023-08-10 | KS-APR: Keyframe Selection for Robust Absolute Pose Regression | Changkun Liu et.al. | 2308.05459v1 | null |
2023-08-10 | How-to Augmented Lagrangian on Factor Graphs | Barbara Bazzana et.al. | 2308.05444v1 | null |
2023-08-10 | Deep Fusion Transformer Network with Weighted Vector-Wise Keypoints Voting for Robust 6D Object Pose Estimation | Jun Zhou et.al. | 2308.05438v1 | link |
2023-08-10 | Robust Localization with Visual-Inertial Odometry Constraints for Markerless Mobile AR | Changkun Liu et.al. | 2308.05394v1 | null |
2023-08-10 | Double-chain Constraints for 3D Human Pose Estimation in Images and Videos | Hongbo Kang et.al. | 2308.05298v1 | link |
2023-08-09 | ACE-HetEM for ab initio Heterogenous Cryo-EM 3D Reconstruction | Weijie Chen et.al. | 2308.04956v1 | null |
2023-08-07 | SEM-GAT: Explainable Semantic Pose Estimation using Learned Graph Attention | Efimia Panagiotaki et.al. | 2308.03718v1 | link |
2023-08-07 | A Horse with no Labels: Self-Supervised Horse Pose Estimation from Unlabelled Images and Synthetic Prior | Jose Sosa et.al. | 2308.03411v1 | null |
2023-08-06 | Source-free Domain Adaptive Human Pose Estimation | Qucheng Peng et.al. | 2308.03202v1 | link |
2023-08-04 | Diffusion-Augmented Depth Prediction with Sparse Annotations | Jiaqi Li et.al. | 2308.02283v1 | null |
2023-08-04 | DTF-Net: Category-Level Pose Estimation and Shape Reconstruction via Deformable Template Field | Haowen Wang et.al. | 2308.02239v1 | null |
2023-08-07 | Robust Self-Supervised Extrinsic Self-Calibration | Takayuki Kanai et.al. | 2308.02153v2 | null |
2023-08-03 | Sim-to-Real Vision-depth Fusion CNNs for Robust Pose Estimation Aboard Autonomous Nano-quadcopter | Luca Crupi et.al. | 2308.01833v1 | null |
2023-08-03 | Active Acoustic Sensing for Robot Manipulation | Shihan Lu et.al. | 2308.01600v1 | null |
2023-08-02 | HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions | Andrew Guo et.al. | 2308.01477v1 | null |
2023-08-06 | Human-M3: A Multi-view Multi-modal Dataset for 3D Human Pose Estimation in Outdoor Scenes | Bohao Fan et.al. | 2308.00628v2 | link |
2023-08-01 | Markerless human pose estimation for biomedical applications: a survey | Andrea Avogaro et.al. | 2308.00519v1 | null |
2023-08-01 | Kidnapping Deep Learning-based Multirotors using Optimized Flying Adversarial Patches | Pia Hanfeld et.al. | 2308.00344v1 | link |
2023-08-01 | Fine-Grained Sports, Yoga, and Dance Postures Recognition: A Benchmark Analysis | Asish Bera et.al. | 2308.00323v1 | null |
2023-08-01 | Robust Single-view Cone-beam X-ray Pose Estimation with Neural Tuned Tomography (NeTT) and Masked Neural Radiance Fields (mNeRF) | Chaochao Zhou et.al. | 2308.00214v1 | null |
2023-07-31 | Lightweight Super-Resolution Head for Human Pose Estimation | Haonan Wang et.al. | 2307.16765v1 | link |
2023-07-31 | DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation | Runyang Feng et.al. | 2307.16687v1 | null |
2023-07-30 | Touch if it's transparent! ACTOR: Active Tactile-based Category-Level Transparent Object Reconstruction | Prajval Kumar Murali et.al. | 2307.16254v1 | null |
2023-07-30 | Successive Pose Estimation and Beam Tracking for mmWave Vehicular Communication Systems | Cen Liu et.al. | 2307.16117v1 | link |
2023-07-29 | Iterative Graph Filtering Network for 3D Human Pose Estimation | Zaedul Islam et.al. | 2307.16074v1 | link |
2023-07-29 | HandMIM: Pose-Aware Self-Supervised Learning for 3D Hand Mesh Estimation | Zuyan Liu et.al. | 2307.16061v1 | null |
2023-07-29 | Effective Whole-body Pose Estimation with Two-stages Distillation | Zhendong Yang et.al. | 2307.15880v1 | link |
2023-07-28 | TrackAgent: 6D Object Tracking via Reinforcement Learning | Konstantin Röhrl et.al. | 2307.15671v1 | null |
2023-07-28 | Revisiting Fully Convolutional Geometric Features for Object 6D Pose Estimation | Jaime Corsetti et.al. | 2307.15514v1 | link |
2023-07-28 | Robust Visual Sim-to-Real Transfer for Robotic Manipulation | Ricardo Garcia et.al. | 2307.15320v1 | null |
2023-07-27 | Weakly Supervised Multi-Modal 3D Human Body Pose Estimation for Autonomous Driving | Peter Bauer et.al. | 2307.14889v1 | null |
2023-07-26 | Attention of Robot Touch: Tactile Saliency Prediction for Robust Sim-to-Real Tactile Control | Yijiong Lin et.al. | 2307.14510v1 | null |
2023-07-28 | CBGL: Fast Monte Carlo Passive Global Localisation of 2D LIDAR Sensor | Alexandros Filotheou et.al. | 2307.14247v2 | link |
2023-07-26 | Deep Robust Multi-Robot Re-localisation in Natural Environments | Milad Ramezani et.al. | 2307.13950v1 | null |
2023-07-25 | Of Mice and Pose: 2D Mouse Pose Estimation from Unlabelled Data and Synthetic Prior | Jose Sosa et.al. | 2307.13361v1 | null |
2023-07-23 | TransNet: Transparent Object Manipulation Through Category-Level Pose Estimation | Huijie Zhang et.al. | 2307.12400v1 | null |
2023-07-25 | FDCT: Fast Depth Completion for Transparent Objects | Tianan Li et.al. | 2307.12274v2 | link |
2023-07-22 | Challenges for Monocular 6D Object Pose Estimation in Robotics | Stefan Thalhammer et.al. | 2307.12172v1 | null |
2023-07-22 | Pyramid Semantic Graph-based Global Point Cloud Registration with Low Overlap | Zhijian Qiao et.al. | 2307.12116v1 | link |
2023-07-22 | Robot Structure Prior Guided Temporal Attention for Camera-to-Robot Pose Estimation from Image Sequence | Yang Tian et.al. | 2307.12106v1 | link |
2023-07-26 | LAMP: Leveraging Language Prompts for Multi-person Pose Estimation | Shengnan Hu et.al. | 2307.11934v2 | link |
2023-07-21 | YOLOPose V2: Understanding and Improving Transformer-based 6D Pose Estimation | Arul Selvam Periyasamy et.al. | 2307.11550v1 | null |
2023-07-21 | KVN: Keypoints Voting Network with Differentiable RANSAC for Stereo Pose Estimation | Ivano Donadi et.al. | 2307.11543v1 | link |
2023-07-21 | Semantically-enhanced Deep Collision Prediction for Autonomous Navigation using Aerial Robots | Mihir Kulkarni et.al. | 2307.11522v1 | null |
2023-07-20 | SimCol3D -- 3D Reconstruction during Colonoscopy Challenge | Anita Rau et.al. | 2307.11261v1 | link |
2023-07-20 | MSQNet: Actor-agnostic Action Recognition with Multi-modal Query | Anindya Mondal et.al. | 2307.10763v1 | link |
2023-07-19 | POV-Surgery: A Dataset for Egocentric Hand and Tool Pose Estimation During Surgical Activities | Rui Wang et.al. | 2307.10387v1 | link |
2023-07-18 | ActionPrompt: Action-Guided 3D Human Pose Estimation With Text and Pose Prompting | Hongwei Zheng et.al. | 2307.09026v1 | null |
2023-07-17 | Human Emergency Detection during Autonomous Hospital Transports | Andreas Zachariae et.al. | 2307.08359v1 | link |
2023-07-17 | Self-supervised Monocular Depth Estimation: Let's Talk About The Weather | Kieran Saunders et.al. | 2307.08357v1 | null |
2023-07-20 | Boosting 3-DoF Ground-to-Satellite Camera Localization Accuracy via Geometry-Guided Cross-View Transformer | Yujiao Shi et.al. | 2307.08015v3 | link |
2023-07-15 | Tightly-Coupled LiDAR-Visual SLAM Based on Geometric Features for Mobile Agents | Ke Cao et.al. | 2307.07763v1 | null |
2023-07-13 | Haptic-guided assisted telemanipulation approach for grasping desired objects from heaps | Maxime Adjigble et.al. | 2307.07053v1 | null |
2023-07-13 | Improving 2D Human Pose Estimation across Unseen Camera Views with Synthetic Data | Miroslav Purkrábek et.al. | 2307.06737v1 | link |
2023-07-12 | Deep learning-based estimation of whole-body kinematics from multi-view images | Kien X. Nguyen et.al. | 2307.05896v1 | link |
2023-07-12 | GLA-GCN: Global-local Adaptive Graph Convolutional Network for 3D Human | Bruce X. B. Yu et.al. | 2307.05853v1 | link |
2023-07-09 | TransPose: A Transformer-based 6D Object Pose Estimation Network with Depth Refinement | Mahmoud Abdulsalam et.al. | 2307.05561v1 | null |
2023-07-11 | ResMatch: Residual Attention Learning for Local Feature Matching | Yuxin Deng et.al. | 2307.05180v1 | link |
2023-07-07 | Proximity and Visuotactile Point Cloud Fusion for Contact Patches in Extreme Deformation | Jessica Yin et.al. | 2307.03839v1 | null |
2023-07-07 | Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation | Zhongyu Jiang et.al. | 2307.03833v1 | link |
2023-07-07 | Equivariant Single View Pose Prediction Via Induced and Restricted Representations | Owen Howell et.al. | 2307.03704v1 | null |
2023-07-07 | RCDN -- Robust X-Corner Detection Algorithm based on Advanced CNN Model | Ben Chen et.al. | 2307.03505v1 | null |
2023-07-06 | Self-supervised Optimization of Hand Pose Estimation using Anatomical Features and Iterative Learning | Christian Jauch et.al. | 2307.03007v1 | null |
2023-07-06 | Recognition and Estimation of Human Finger Pointing with an RGB Camera for Robot Directive | Eran Bamani et.al. | 2307.02949v1 | null |
2023-07-06 | A Real-time Human Pose Estimation Approach for Optimal Sensor Placement in Sensor-based Human Activity Recognition | Orhan Konak et.al. | 2307.02906v1 | null |
2023-07-04 | Secure Deep Learning-based Distributed Intelligence on Pocket-sized Drones | Elia Cereda et.al. | 2307.01559v1 | null |
2023-07-03 | Joint Coordinate Regression and Association For Multi-Person Pose Estimation, A Pure Neural Network Approach | Dongyang Yu et.al. | 2307.01004v1 | null |
2023-07-01 | Automatic Solver Generator for Systems of Laurent Polynomial Equations | Evgeniy Martyushev et.al. | 2307.00320v1 | link |
2023-07-01 | SyMFM6D: Symmetry-aware Multi-directional Fusion for Multi-View 6D Object Pose Estimation | Fabian Duffhauss et.al. | 2307.00306v1 | link |
2023-06-30 | GIRA: Gaussian Mixture Models for Inference and Robot Autonomy | Kshitij Goel et.al. | 2307.00071v1 | link |
2023-06-30 | Towards the extraction of robust sign embeddings for low resource sign language recognition | Mathieu De Coster et.al. | 2306.17558v1 | null |
2023-06-30 | Fusion of Visual-Inertial Odometry with LiDAR Relative Localization for Cooperative Guidance of a Micro-Scale Aerial Vehicle | Václav Pritzl et.al. | 2306.17544v1 | link |
2023-06-30 | Locking On: Leveraging Dynamic Vehicle-Imposed Motion Constraints to Improve Visual Localization | Stephen Hausler et.al. | 2306.17529v1 | null |
2023-06-29 | ID-Pose: Sparse-view Camera Pose Estimation by Inverting Diffusion Models | Weihao Cheng et.al. | 2306.17140v1 | null |
2023-06-29 | Learning Structure-Guided Diffusion Model for 2D Human Pose Estimation | Zhongwei Qiu et.al. | 2306.17074v1 | null |
2023-06-28 | Hierarchical Graph Neural Networks for Proprioceptive 6D Pose Estimation of In-hand Objects | Alireza Rezazadeh et.al. | 2306.15858v1 | null |
2023-06-09 | Data-Link: High Fidelity Manufacturing Datasets for Model2Real Transfer under Industrial Settings | Sunny Katyara et.al. | 2306.05766v1 | null |
2023-05-28 | Counter-Hypothetical Particle Filters for Single Object Pose Tracking | Elizabeth A. Olson et.al. | 2305.17828v1 | null |
2023-05-25 | Enhanced 6D Pose Estimation for Robotic Fruit Picking | Marco Costanzo et.al. | 2305.15856v1 | null |
2023-05-22 | You Only Look at One: Category-Level Object Representations for Pose Estimation From a Single Example | Walter Goodwin et.al. | 2305.12626v1 | null |
2023-05-18 | Manifold-Aware Self-Training for Unsupervised Domain Adaptation on Regressing 6D Object Pose | Yichen Zhang et.al. | 2305.10808v1 | link |
2023-05-08 | RelPose++: Recovering 6D Poses from Sparse-view Observations | Amy Lin et.al. | 2305.04926v1 | link |
2023-04-17 | Uncovering the Background-Induced bias in RGB based 6-DoF Object Pose Estimation | Elena Govi et.al. | 2304.08230v1 | link |
2023-03-28 | CARTO: Category and Joint Agnostic Reconstruction of ARTiculated Objects | Nick Heppert et.al. | 2303.15782v1 | link |
2023-03-23 | Prior-free Category-level Pose Estimation with Implicit Space Transformation | Jianhui Liu et.al. | 2303.13479v1 | link |
2023-06-21 | 6D Object Pose Estimation from Approximate 3D Models for Orbital Robotics | Maximilian Ulmer et.al. | 2303.13241v3 | null |
2023-03-22 | Rigidity-Aware Detection for 6D Object Pose Estimation | Yang Hai et.al. | 2303.12396v1 | link |
2023-03-22 | Object Pose Estimation with Statistical Guarantees: Conformal Keypoint Detection and Geometric Uncertainty Propagation | Heng Yang et.al. | 2303.12246v1 | link |
2023-03-21 | Linear-Covariance Loss for End-to-End Learning of 6D Pose Estimation | Fulin Liu et.al. | 2303.11516v1 | link |
2023-03-18 | SOCS: Semantically-aware Object Coordinate Space for Category-Level 6D Object Pose Estimation under Large Shape Variations | Boyan Wan et.al. | 2303.10346v1 | null |
2023-03-12 | Module-Wise Network Quantization for 6D Object Pose Estimation | Saqib Javed et.al. | 2303.06753v1 | link |
2023-03-09 | SpyroPose: Importance Sampling Pyramids for Object Pose Distribution Estimation in SE(3) | Rasmus Laurvig Haugaard et.al. | 2303.05308v1 | null |
2023-03-03 | Depth-based 6DoF Object Pose Estimation using Swin Transformer | Zhujun Li et.al. | 2303.02133v1 | link |
2023-03-02 | Canonical mapping as a general-purpose object descriptor for robotic manipulation | Benjamin Joffe et.al. | 2303.01331v1 | null |
2023-02-14 | MSDA: Monocular Self-supervised Domain Adaptation for 6D Object Pose Estimation | Dingding Cai et.al. | 2302.07300v1 | null |
2023-02-14 | Model-Based Underwater 6D Pose Estimation from RGB | Davide Sapienza et.al. | 2302.06821v1 | null |
2023-02-02 | A Projective Geometric View for 6D Pose Estimation in mmWave MIMO Systems | Shengqiang Shen et.al. | 2302.00227v2 | null |
2023-01-31 | Collision-aware In-hand 6D Object Pose Estimation using Multiple Vision-based Tactile Sensors | Gabriele M. Caddeo et.al. | 2301.13667v1 | link |
2023-01-19 | Learning ultrasound plane pose regression: assessing generalized pose coordinates in the fetal brain | Chiara Di Vece et.al. | 2301.08317v1 | null |
2023-01-19 | RGB-D-Based Categorical Object Pose and Shape Estimation: Methods, Datasets, and Evaluation | Leonard Bruns et.al. | 2301.08147v1 | link |
2022-12-21 | HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Pose Dataset with Household Objects in Realistic Scenarios | HyunJun Jung et.al. | 2212.10428v2 | link |
2022-12-13 | MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare | Yann Labbé et.al. | 2212.06870v1 | null |
2022-12-11 | Context-aware 6D Pose Estimation of Known Objects using RGB-D data | Ankit Kumar et.al. | 2212.05560v1 | null |
2023-01-30 | Category-Level 6D Object Pose Estimation with Flexible Vector-Based Rotation Representation | Wei Chen et.al. | 2212.04632v2 | null |
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-03-11 | BUFFER-X: Towards Zero-Shot Point Cloud Registration in Diverse Scenes | Minkyun Seo et.al. | 2503.07940v1 | null |
2025-03-10 | SANDRO: a Robust Solver with a Splitting Strategy for Point Cloud Registration | Michael Adlerstein et.al. | 2503.07743v1 | null |
2025-03-10 | HybridReg: Robust 3D Point Cloud Registration with Hybrid Motions | Keyu Du et.al. | 2503.07019v1 | null |
2025-03-07 | Diff-Reg v2: Diffusion-Based Matching Matrix Estimation for Image Matching and 3D Registration | Qianliang Wu et.al. | 2503.04127v2 | null |
2025-03-04 | HyperGCT: A Dynamic Hyper-GNN-Learned Geometric Constraint for 3D Registration | Xiyu Zhang et.al. | 2503.02195v1 | null |
2025-03-02 | Semantic-ICP: Iterative Closest Point for Non-rigid Multi-Organ Point Cloud Registration | Wanwen Chen et.al. | 2503.00972v1 | null |
2025-02-26 | BEV-LIO(LC): BEV Image Assisted LiDAR-Inertial Odometry with Loop Closure | Haoxin Cai et.al. | 2502.19242v1 | link |
2025-02-15 | Occlusion-aware Non-Rigid Point Cloud Registration via Unsupervised Neural Deformation Correntropy | Mingyang Zhao et.al. | 2502.10704v1 | link |
2025-02-12 | Fully-Geometric Cross-Attention for Point Cloud Registration | Weijie Wang et.al. | 2502.08285v1 | null |
2025-02-11 | Multiview Point Cloud Registration Based on Minimum Potential Energy for Free-Form Blade Measurement | Zijie Wu et.al. | 2502.07680v1 | null |
2025-02-10 | DefTransNet: A Transformer-based Method for Non-Rigid Point Cloud Registration in the Simulation of Soft Tissue Deformation | Sara Monji-Azad et.al. | 2502.06336v1 | null |
2025-02-05 | Mapping and Localization Using LiDAR Fiducial Markers | Yibo Liu et.al. | 2502.03510v1 | null |
2025-01-31 | A Direct Semi-Exhaustive Search Method for Robust, Partial-to-Full Point Cloud Registration | Richard Cheng et.al. | 2502.00115v1 | null |
2025-01-18 | PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration | Xiaoshui Huang et.al. | 2501.07762v2 | null |
2025-01-10 | LPRnet: A self-supervised registration network for LiDAR and photogrammetric point clouds | Chen Wang et.al. | 2501.05669v1 | null |
2025-01-09 | LP-ICP: General Localizability-Aware Point Cloud Registration for Robust Localization in Extreme Unstructured Environments | Haosong Yue et.al. | 2501.02580v2 | link |
2025-01-03 | MRG: A Multi-Robot Manufacturing Digital Scene Generation Method Using Multi-Instance Point Cloud Registration | Songjie Han et.al. | 2501.02041v1 | null |
2024-12-29 | Towards Explaining Uncertainty Estimates in Point Cloud Registration | Ziyuan Qin et.al. | 2412.20612v1 | null |
2024-12-26 | Resolving the Ambiguity of Complete-to-Partial Point Cloud Registration for Image-Guided Liver Surgery with Patches-to-Partial Matching | Zixin Yang et.al. | 2412.19328v1 | null |
2024-12-25 | Cross-PCR: A Robust Cross-Source Point Cloud Registration Framework | Guiyu Zhao et.al. | 2412.18873v1 | null |
2024-12-23 | PointVoxelFormer -- Reviving point cloud networks for 3D medical imaging | Mattias Paul Heinrich et.al. | 2412.17390v1 | null |
2024-12-19 | 3D Registration in 30 Years: A Survey | Jiaqi Yang et.al. | 2412.13735v2 | link |
2024-12-13 | TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes | Yan Xia et.al. | 2412.10308v1 | null |
2024-12-10 | A Real-time Degeneracy Sensing and Compensation Method for Enhanced LiDAR SLAM | Zongbo Liao et.al. | 2412.07513v1 | null |
2024-12-07 | AutoURDF: Unsupervised Robot Modeling from Point Cloud Frames Using Cluster Registration | Jiong Lin et.al. | 2412.05507v1 | null |
2024-12-06 | GS-Matching: Reconsidering Feature Matching task in Point Cloud Registration | Yaojie Zhang et.al. | 2412.04855v1 | null |
2024-12-04 | AffordDP: Generalizable Diffusion Policy with Transferable Affordance | Shijie Wu et.al. | 2412.03142v1 | null |
2024-12-04 | QuadricsReg: Large-Scale Point Cloud Registration using Quadric Primitives | Ji Wu et.al. | 2412.02998v1 | null |
2024-12-01 | FlashSLAM: Accelerated RGB-D SLAM for Real-Time 3D Scene Reconstruction with Gaussian Splatting | Phu Pham et.al. | 2412.00682v1 | null |
2024-11-27 | XR-MBT: Multi-modal Full Body Tracking for XR through Self-Supervision with Learned Depth Point Cloud Registration | Denys Rozumnyi et.al. | 2411.18377v1 | null |
2024-11-22 | EADReg: Probabilistic Correspondence Generation with Efficient Autoregressive Diffusion Model for Outdoor Point Cloud Registration | Linrui Gong et.al. | 2411.15271v1 | null |
2024-11-20 | Automatic marker-free registration based on similar tetrahedras for single-tree point clouds | Jing Ren et.al. | 2411.13069v1 | null |
2024-11-19 | 3D Reconstruction by Looking: Instantaneous Blind Spot Detector for Indoor SLAM through Mixed Reality | Hanbeom Chang et.al. | 2411.12514v1 | null |
2024-11-16 | Deep Loss Convexification for Learning Iterative Models | Ziming Zhang et.al. | 2411.10649v1 | null |
2024-11-12 | 3D Focusing-and-Matching Network for Multi-Instance Point Cloud Registration | Liyuan Zhang et.al. | 2411.07740v1 | link |
2024-11-04 | Mining and Transferring Feature-Geometry Coherence for Unsupervised Point Cloud Registration | Kezheng Xiong et.al. | 2411.01870v1 | link |
2024-10-30 | UniRiT: Towards Few-Shot Non-Rigid Point Cloud Registration | Geng Li et.al. | 2410.22909v1 | null |
2024-10-29 | Micro-Structures Graph-Based Point Cloud Registration for Balancing Efficiency and Accuracy | Rongling Zhang et.al. | 2410.21857v1 | null |
2024-10-29 | Memory-Efficient Point Cloud Registration via Overlapping Region Sampling | Tomoyasu Shimada et.al. | 2410.21753v1 | null |
2024-10-21 | RANSAC Back to SOTA: A Two-stage Consensus Filtering for Real-time 3D Registration | Pengcheng Shi et.al. | 2410.15682v1 | link |
2024-10-14 | A Consistency-Aware Spot-Guided Transformer for Versatile and Hierarchical Point Cloud Registration | Renlang Huang et.al. | 2410.10295v1 | link |
2024-10-14 | Kinematic-ICP: Enhancing LiDAR Odometry with Kinematic Constraints for Wheeled Mobile Robots Moving on Planar Surfaces | Tiziano Guadagnino et.al. | 2410.10277v1 | null |
2024-10-10 | LiPO: LiDAR Inertial Odometry for ICP Comparison | Darwin Mick et.al. | 2410.08097v1 | null |
2024-10-08 | Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration | Xueyang Kang et.al. | 2410.05729v1 | link |
2024-10-07 | Enhanced Multi-Robot SLAM System with Cross-Validation Matching and Exponential Threshold Keyframe Selection | Ang He et.al. | 2410.05017v1 | null |
2024-10-03 | LoGDesc: Local geometric features aggregation for robust point cloud registration | Karim Slimani et.al. | 2410.02420v1 | link |
2024-10-01 | GERA: Geometric Embedding for Efficient Point Registration Analysis | Geng Li et.al. | 2410.00589v1 | null |
2024-10-01 | TFCT-I2P: Three stream fusion network with color aware transformer for image-to-point cloud registration | Muyao Peng et.al. | 2410.00360v1 | link |
2024-10-06 | KISS-Matcher: Fast and Robust Point Cloud Registration Revisited | Hyungtae Lim et.al. | 2409.15615v2 | link |
2024-09-23 | MATCH POLICY: A Simple Pipeline from Point Cloud Registration to Manipulation Policies | Haojie Huang et.al. | 2409.15517v1 | null |
2024-09-22 | SynBench: A Synthetic Benchmark for Non-rigid 3D Point Cloud Registration | Sara Monji-Azad et.al. | 2409.14474v1 | null |
2024-09-27 | FracGM: A Fast Fractional Programming Technique for Geman-McClure Robust Estimator | Bang-Shien Chen et.al. | 2409.13978v2 | link |
2024-09-17 | Enhancing the Reliability of LiDAR Point Cloud Sampling: A Colorization and Super-Resolution Approach Based on LiDAR-Generated Images | Sier Ha et.al. | 2409.11532v1 | null |
2024-09-14 | Registration between Point Cloud Streams and Sequential Bounding Boxes via Gradient Descent | Xuesong Li et.al. | 2409.09312v1 | null |
2024-09-11 | Unsupervised Point Cloud Registration with Self-Distillation | Christian Löwens et.al. | 2409.07558v1 | link |
2024-09-10 | Mahalanobis k-NN: A Statistical Lens for Robust Point-Cloud Registrations | Tejas Anvekar et.al. | 2409.06267v1 | link |
2024-09-09 | From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models | Tessa Pulli et.al. | 2409.05413v1 | null |
2024-09-08 | Sight View Constraint for Robust Point Cloud Registration | Yaojie Zhang et.al. | 2409.05065v1 | null |
2024-08-23 | UMERegRobust - Universal Manifold Embedding Compatible Features for Robust Point Cloud Registration | Yuval Haitman et.al. | 2408.12380v2 | link |
2024-08-21 | Informed, Constrained, Aligned: A Field Analysis on Degeneracy-aware Point Cloud Registration in the Wild | Turcan Tuna et.al. | 2408.11809v1 | null |
2024-08-20 | LoopSplat: Loop Closure by Registering 3D Gaussian Splats | Liyuan Zhu et.al. | 2408.10154v2 | link |
2024-08-05 | CMR-Agent: Learning a Cross-Modal Agent for Iterative Image-to-Point Cloud Registration | Gongxin Yao et.al. | 2408.02394v1 | null |
2024-08-05 | MaFreeI2P: A Matching-Free Image-to-Point Cloud Registration Paradigm with Active Camera Pose Retrieval | Gongxin Yao et.al. | 2408.02392v1 | null |
2024-07-29 | Correspondence-Free SE(3) Point Cloud Registration in RKHS via Unsupervised Equivariant Learning | Ray Zhang et.al. | 2407.20223v1 | null |
2024-07-24 | Robust Point Cloud Registration in Robotic Inspection with Locally Consistent Gaussian Mixture Model | Lingjie Su et.al. | 2407.17183v1 | null |
2024-07-23 | SE3ET: SE(3)-Equivariant Transformer for Low-Overlap Point Cloud Registration | Chien Erh Lin et.al. | 2407.16823v1 | link |
2024-07-19 | PointRegGPT: Boosting 3D Point Cloud Registration using Generative Point-Cloud Pairs for Training | Suyi Chen et.al. | 2407.14054v1 | link |
2024-07-19 | GlobalPointer: Large-Scale Plane Adjustment with Bi-Convex Relaxation | Bangyan Liao et.al. | 2407.13537v2 | link |
2024-07-22 | Snail-Radar: A large-scale diverse dataset for the evaluation of 4D-radar-based SLAM systems | Jianzhu Huai et.al. | 2407.11705v2 | null |
2024-07-14 | PARE-Net: Position-Aware Rotation-Equivariant Networks for Robust Point Cloud Registration | Runzhao Yao et.al. | 2407.10142v1 | link |
2024-07-13 | ML-SemReg: Boosting Point Cloud Registration with Multi-level Semantic Consistency | Shaocheng Yan et.al. | 2407.09862v1 | link |
2024-07-11 | BiEquiFormer: Bi-Equivariant Representations for Global Point Cloud Registration | Stefanos Pertigkiozoglou et.al. | 2407.08729v1 | null |
2024-07-10 | Incremental Multiview Point Cloud Registration with Two-stage Candidate Retrieval | Shiqi Li et.al. | 2407.07525v1 | null |
2024-07-08 | SGOR: Outlier Removal by Leveraging Semantic and Geometric Information for Robust Point Cloud Registration | Guiyu Zhao et.al. | 2407.06297v1 | link |
2024-07-08 | GeoNLF: Geometry guided Pose-Free Neural LiDAR Fields | Weiyi Xue et.al. | 2407.05597v1 | null |
2024-07-07 | GaussReg: Fast 3D Registration with Gaussian Splatting | Jiahao Chang et.al. | 2407.05254v1 | null |
2024-07-06 | Incremental Multiview Point Cloud Registration | Xiaoya Cheng et.al. | 2407.05021v1 | link |
2024-06-25 | Point Tree Transformer for Point Cloud Registration | Meiling Wang et.al. | 2406.17530v1 | null |
2024-06-17 | Correspondence Free Multivector Cloud Registration using Conformal Geometric Algebra | Francisco Xavier Vasconcelos et.al. | 2406.11732v1 | link |
2024-06-05 | L-PR: Exploiting LiDAR Fiducial Marker for Unordered Low Overlap Multiview Point Cloud Registration | Yibo Liu et.al. | 2406.03298v1 | link |
2024-05-25 | Deep-PE: A Learning-Based Pose Evaluator for Point Cloud Registration | Junjie Gao et.al. | 2405.16085v1 | null |
2024-05-26 | NV-LIO: LiDAR-Inertial Odometry using Normal Vectors Towards Robust SLAM in Multifloor Environments | Dongha Chung et.al. | 2405.12563v2 | link |
2024-05-13 | RGBD-Glue: General Feature Combination for Robust RGB-D Point Cloud Registration | Congjia Chen et.al. | 2405.07594v1 | null |
2024-05-10 | Benchmarking Classical and Learning-Based Multibeam Point Cloud Registration | Li Ling et.al. | 2405.06279v1 | link |
2024-05-09 | Rotation Initialization and Stepwise Refinement for Universal LiDAR Calibration | Yifan Duan et.al. | 2405.05589v1 | null |
2024-05-07 | Speak the Same Language: Global LiDAR Registration on BIM Using Pose Hough Transform | Zhijian Qiao et.al. | 2405.03969v1 | null |
2024-05-06 | Deep Learning-based Point Cloud Registration for Augmented Reality-guided Surgery | Maximilian Weber et.al. | 2405.03314v1 | null |
2024-04-27 | FRAME: A Modular Framework for Autonomous Map-merging: Advancements in the Field | Nikolaos Stathoulopoulos et.al. | 2404.18006v1 | null |
2024-04-22 | PointDifformer: Robust Point Cloud Registration With Neural Diffusion and Transformer | Rui She et.al. | 2404.14034v1 | null |
2024-04-22 | A Comprehensive Survey and Taxonomy on Point Cloud Registration Based on Deep Learning | Yu-Xin Zhang et.al. | 2404.13830v1 | link |
2024-04-09 | Efficient and Robust Point Cloud Registration via Heuristics-guided Parameter Search | Tianyu Huang et.al. | 2404.06155v1 | link |
2024-04-08 | Rendering-Enhanced Automatic Image-to-Point Cloud Registration for Roadside Scenes | Yu Sheng et.al. | 2404.05164v1 | null |
2024-04-06 | Learning Instance-Aware Correspondences for Robust Multi-Instance Point Cloud Registration in Cluttered Scenes | Zhiyuan Yu et.al. | 2404.04557v1 | link |
2024-04-05 | A Ground Mobile Robot for Autonomous Terrestrial Laser Scanning-Based Field Phenotyping | Javier Rodriguez-Sanchez et.al. | 2404.04404v1 | null |
2024-04-01 | FPGA-Accelerated Correspondence-free Point Cloud Registration with PointNet Features | Keisuke Sugiura et.al. | 2404.01237v1 | null |
2024-03-28 | SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks | Yaxu Xie et.al. | 2403.19474v1 | link |
2024-03-26 | Global Point Cloud Registration Network for Large Transformations | Hanz Cuevas-Velasquez et.al. | 2403.18040v1 | link |
2024-03-28 | Exploring Accurate 3D Phenotyping in Greenhouse through Neural Radiance Fields | Junhong Zhao et.al. | 2403.15981v2 | null |
2024-03-15 | VRHCF: Cross-Source Point Cloud Registration via Voxel Representation and Hierarchical Correspondence Filtering | Guiyu Zhao et.al. | 2403.10085v1 | link |
2024-03-15 | MEDPNet: Achieving High-Precision Adaptive Registration for Complex Die Castings | Yu Du et.al. | 2403.09996v1 | null |
2024-03-15 | CLOSURE: Fast Quantification of Pose Uncertainty Sets | Yihuai Gao et.al. | 2403.09990v1 | null |
2024-03-13 | FastMAC: Stochastic Spectral Sampling of Correspondence Graph | Yifei Zhang et.al. | 2403.08770v1 | link |
2024-03-13 | NeRF-Supervised Feature Point Detection and Description | Ali Youssef et.al. | 2403.08156v1 | link |
2024-03-10 | PSS-BA: LiDAR Bundle Adjustment with Progressive Spatial Smoothing | Jianping Li et.al. | 2403.06124v1 | null |
2024-03-27 | Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension | Quan Liu et.al. | 2403.03532v2 | link |
2024-03-15 | RELEAD: Resilient Localization with Enhanced LiDAR Odometry in Adverse Environments | Zhiqiang Chen et.al. | 2402.18934v2 | null |
2024-02-28 | PCR-99: A Practical Method for Point Cloud Registration with 99% Outliers | Seong Hun Lee et.al. | 2402.16598v2 | link |
2024-02-23 | CLIPPER+: A Fast Maximal Clique Algorithm for Robust Global Registration | Kaveh Fathian et.al. | 2402.15464v1 | link |
2024-02-11 | CLIPPER: Robust Data Association without an Initial Guess | Parker C. Lusk et.al. | 2402.07284v1 | null |
2024-02-08 | Tightly Coupled Range Inertial Localization on a 3D Prior Map Based on Sliding Window Factor Graph Optimization | Kenji Koide et.al. | 2402.05540v1 | null |
2024-01-16 | Registration of algebraic varieties using Riemannian optimization | Florentin Goyens et.al. | 2401.08562v1 | link |
2024-01-09 | Iterative Feedback Network for Unsupervised Point Cloud Registration | Yifan Xie et.al. | 2401.04357v1 | link |
2024-01-06 | PosDiffNet: Positional Neural Diffusion for Point Cloud Registration in a Large Field of View with Perturbations | Rui She et.al. | 2401.03167v1 | null |
2024-01-04 | OptFlow: Fast Optimization-based Scene Flow Estimation without Supervision | Rahul Ahuja et.al. | 2401.02550v1 | null |
2024-01-17 | Diff-PCR: Diffusion-Based Correspondence Searching in Doubly Stochastic Matrix Space for Point Cloud Registration | Qianliang Wu et.al. | 2401.00436v4 | null |
2023-12-22 | On Partial Optimal Transport: Revising the Infeasibility of Sinkhorn and Efficient Gradient Methods | Anh Duc Nguyen et.al. | 2312.13970v2 | link |
2023-12-20 | D3Former: Jointly Learning Repeatable Dense Detectors and Feature-enhanced Descriptors via Saliency-guided Transformer | Junjie Gao et.al. | 2312.12970v1 | null |
2023-12-14 | SPEAL: Skeletal Prior Embedded Attention Learning for Cross-Source Point Cloud Registration | Kezheng Xiong et.al. | 2312.08664v1 | null |
2023-12-11 | PCRDiffusion: Diffusion Probabilistic Models for Point Cloud Registration | Yue Wu et.al. | 2312.06063v1 | null |
2023-12-05 | DiffusionPCR: Diffusion Models for Robust Multi-Step Point Cloud Registration | Zhi Chen et.al. | 2312.03053v1 | null |
2023-12-08 | Zero-Shot Point Cloud Registration | Weijie Wang et.al. | 2312.03032v2 | null |
2023-12-05 | A Dynamic Network for Efficient Point Cloud Registration | Yang Ai et.al. | 2312.02877v1 | null |
2023-12-05 | 6D Assembly Pose Estimation by Point Cloud Registration for Robot Manipulation | K. Samarawickrama et.al. | 2312.02593v1 | link |
2023-12-04 | Rotation-Invariant Rapid TRISO-Fueled Pebble Identification Based on Feature Matching and Point Cloud Registration | Ming Fang et.al. | 2312.02006v1 | null |
2023-12-27 | E2PNet: Event to Point Cloud Registration with Spatio-Temporal Representation Learning | Xiuhong Lin et.al. | 2311.18433v2 | link |
2023-11-15 | Nothing Stands Still: A Spatiotemporal Benchmark on 3D Point Cloud Registration Under Large Geometric and Temporal Change | Tao Sun et.al. | 2311.09346v1 | null |
2023-11-02 | Transformation Decoupling Strategy based on Screw Theory for Deterministic Point Cloud Registration with Gravity Prior | Xinyi Li et.al. | 2311.01432v1 | null |
2023-11-02 | Cross-Modal Information-Guided Network using Contrastive Learning for Point Cloud Registration | Yifan Xie et.al. | 2311.01202v1 | link |
2023-10-29 | HDMNet: A Hierarchical Matching Network with Double Attention for Large-scale Outdoor LiDAR Point Cloud Registration | Weiyi Xue et.al. | 2310.18874v1 | null |
2023-10-27 | Do we need scan-matching in radar odometry? | Vladimír Kubelka et.al. | 2310.18117v1 | link |
2023-10-26 | SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation | Haobo Jiang et.al. | 2310.17359v1 | null |
2023-10-18 | DBDNet: Partial-to-Partial Point Cloud Registration with Dual Branches Decoupling | Shiqi Li et.al. | 2310.11733v1 | null |
2023-10-15 | OAAFormer: Robust and Efficient Point Cloud Registration Through Overlapping-Aware Attention in Transformer | Junjie Gao et.al. | 2310.09817v1 | null |
2023-10-09 | FeatSense -- A Feature-based Registration Algorithm with GPU-accelerated TSDF-Mapping Backend for NVIDIA Jetson Boards | Julian Gaal et.al. | 2310.05766v1 | link |
2023-10-09 | Colmap-PCD: An Open-source Tool for Fine Image-to-point cloud Registration | Chunge Bai et.al. | 2310.05504v1 | link |
2023-10-06 | Light-LOAM: A Lightweight LiDAR Odometry and Mapping based on Graph-Matching | Shiquan Yi et.al. | 2310.04162v1 | link |
2023-10-05 | FreeReg: Image-to-Point Cloud Registration Leveraging Pretrained Diffusion Models and Monocular Depth Estimators | Haiping Wang et.al. | 2310.03420v1 | link |
2023-10-02 | COIN-LIO: Complementary Intensity-Augmented LiDAR Inertial Odometry | Patrick Pfreundschuh et.al. | 2310.01235v1 | link |
2023-09-27 | Q-REG: End-to-End Trainable Point Cloud Registration with Surface Curvature | Shengze Jin et.al. | 2309.16023v1 | null |
2023-09-27 | Partial Transport for Point-Cloud Registration | Yikun Bai et.al. | 2309.15787v1 | null |
2023-09-27 | KDD-LOAM: Jointly Learned Keypoint Detector and Descriptors Assisted LiDAR Odometry and Mapping | Renlang Huang et.al. | 2309.15394v1 | null |
2023-09-26 | CoFiI2P: Coarse-to-Fine Correspondences for Image-to-Point Cloud Registration | Shuhao Kang et.al. | 2309.14660v1 | null |
2023-09-20 | AutoSynth: Learning to Generate 3D Training Data for Object Point Cloud Registration | Zheng Dang et.al. | 2309.11170v1 | null |
2023-09-19 | LiDAR-Generated Images Derived Keypoints Assisted Point Cloud Registration Scheme in Odometry Estimation | Haizhou Zhang et.al. | 2309.10436v1 | link |
2023-09-17 | Hamiltonian Dynamics Learning from Point Cloud Observations for Nonholonomic Mobile Robot Control | Abdullah Altawaitan et.al. | 2309.09163v1 | link |
2023-09-16 | FF-LOGO: Cross-Modality Point Cloud Registration with Feature Filtering and Local to Global Optimization | Nan Ma et.al. | 2309.08966v1 | null |
2023-09-16 | Outram: One-shot Global Localization via Triangulated Scene Graph and Global Outlier Pruning | Pengyu Yin et.al. | 2309.08914v1 | link |
2023-09-15 | A Ground Segmentation Method Based on Point Cloud Map for Unstructured Roads | Zixuan Li et.al. | 2309.08164v1 | null |
2023-09-15 | Fast and Accurate Deep Loop Closing and Relocalization for Reliable LiDAR SLAM | Chenghao Shi et.al. | 2309.08086v1 | null |
2023-09-14 | EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization | Minjung Kim et.al. | 2309.07471v1 | link |
2023-09-12 | SGFeat: Salient Geometric Feature for Point Cloud Registration | Qianliang Wu et.al. | 2309.06207v1 | null |
2023-09-01 | Point-TTA: Test-Time Adaptation for Point Cloud Registration Using Multitask Meta-Auxiliary Learning | Ahmed Hatem et.al. | 2308.16481v2 | null |
2023-08-21 | In-Rack Test Tube Pose Estimation Using RGB-D Data | Hao Chen et.al. | 2308.10411v1 | null |
2023-08-18 | DReg-NeRF: Deep Registration for Neural Radiance Fields | Yu Chen et.al. | 2308.09386v1 | link |
2023-08-18 | Overlap Bias Matching is Necessary for Point Cloud Registration | Pengcheng Shi et.al. | 2308.09364v1 | null |
2023-08-10 | Deep Semantic Graph Matching for Large-scale Outdoor Point Clouds Registration | Shaocong Liu et.al. | 2308.05314v1 | null |
2023-08-09 | PointMBF: A Multi-scale Bidirectional Fusion Network for Unsupervised RGB-D Point Cloud Registration | Mingzhi Yuan et.al. | 2308.04782v1 | link |
2023-07-25 | GeoTransformer: Fast and Robust Point Cloud Registration with Geometric Transformer | Zheng Qin et.al. | 2308.03768v1 | link |
2023-07-26 | One-Nearest Neighborhood Guides Inlier Estimation for Unsupervised Point Cloud Registration | Yongzhe Yuan et.al. | 2307.14019v1 | null |
2023-07-22 | Pyramid Semantic Graph-based Global Point Cloud Registration with Low Overlap | Zhijian Qiao et.al. | 2307.12116v1 | link |
2023-09-12 | ELiOT: End-to-end Lidar Odometry using Transformer Framework | Daegyu Lee et.al. | 2307.11998v4 | null |
2023-08-08 | Density-invariant Features for Distant Point Cloud Registration | Quan Liu et.al. | 2307.09788v2 | link |
2023-07-18 | SphereNet: Learning a Noise-Robust and General Descriptor for Point Cloud Registration | Guiyu Zhao et.al. | 2307.09351v1 | null |
2023-07-14 | CFI2P: Coarse-to-Fine Cross-Modal Correspondence Learning for Image-to-Point Cloud Registration | Gongxin Yao et.al. | 2307.07142v1 | null |
2023-07-11 | Exact Point Cloud Downsampling for Fast and Accurate Global Trajectory Optimization | Kenji Koide et.al. | 2307.02948v2 | link |
2023-07-03 | Direct Superpoints Matching for Fast and Robust Point Cloud Registration | Aniket Gupta et.al. | 2307.01362v1 | link |
2023-07-04 | A denoised Mean Teacher for domain adaptive point cloud registration | Alexander Bigalke et.al. | 2306.14749v2 | link |
2023-06-20 | End-to-end 2D-3D Registration between Image and LiDAR Point Cloud for Vehicle Localization | Guangming Wang et.al. | 2306.11346v1 | null |
2023-06-14 | ICET Online Accuracy Characterization for Geometry-Based Laser Scan Matching | Matthew McDermott et.al. | 2306.08690v1 | link |
2023-06-12 | Volume-DROID: A Real-Time Implementation of Volumetric Mapping with DROID-SLAM | Peter Stratton et.al. | 2306.06850v1 | link |
2023-06-11 | PWR-Align: Leveraging Part-Whole Relationships for Part-wise Rigid Point Cloud Registration in Mixed Reality Applications | Manorama Jha et.al. | 2306.06717v1 | null |
2023-06-07 | Robust-DefReg: A Robust Deformable Point Cloud Registration Method based on Graph Convolutional Neural Networks | Sara Monji-Azad et.al. | 2306.04701v1 | null |
2023-05-23 | Cross-source Point Cloud Registration: Challenges, Progress and Prospects | Xiaoshui Huang et.al. | 2305.13570v1 | null |
2023-05-19 | Efficient and Deterministic Search Strategy Based on Residual Projections for Point Cloud Registration | Xinyi Li et.al. | 2305.11716v1 | null |
2023-05-18 | 3D Registration with Maximal Cliques | Xiyu Zhang et.al. | 2305.10854v1 | link |
2023-05-05 | HD2Reg: Hierarchical Descriptors and Detectors for Point Cloud Registration | Canhui Tang et.al. | 2305.03487v1 | link |
2023-05-08 | APR: Online Distant Point Cloud Registration Through Aggregated Point Cloud Reconstruction | Quan Liu et.al. | 2305.02893v2 | link |
2023-04-27 | RegHEC: Hand-Eye Calibration via Simultaneous Multi-view Point Clouds Registration of Arbitrary Object | Shiyu Xing et.al. | 2304.14092v1 | link |
2023-04-26 | Non-rigid Point Cloud Registration for Middle Ear Diagnostics with Endoscopic Optical Coherence Tomography | Peng Liu et.al. | 2304.13618v1 | link |
2023-04-25 | BO-ICP: Initialization of Iterative Closest Point Based on Bayesian Optimization | Harel Biggie et.al. | 2304.13114v1 | link |
2023-04-18 | SDFReg: Learning Signed Distance Functions for Point Cloud Registration | Leida Zhang et.al. | 2304.08929v1 | null |
2023-04-12 | SiLK -- Simple Learned Keypoints | Pierre Gleize et.al. | 2304.06194v1 | link |
2023-04-11 | TT-SDF2PC: Registration of Point Cloud and Compressed SDF Directly in the Memory-Efficient Tensor Train Domain | Alexey I. Boyko et.al. | 2304.05342v1 | null |
2023-04-10 | HybridFusion: LiDAR and Vision Cross-Source Point Cloud Fusion | Yu Wang et.al. | 2304.04508v1 | null |
2023-04-09 | Self-Supervised Learning of Object Segmentation from Unlabeled RGB-D Videos | Shiyang Lu et.al. | 2304.04325v1 | null |
2023-04-09 | DSMNet: Deep High-precision 3D Surface Modeling from Sparse Point Cloud Frames | Changjie Qiu et.al. | 2304.04200v1 | null |
2023-04-02 | Robust Multiview Point Cloud Registration with Reliable Pose Graph Initialization and History Reweighting | Haiping Wang et.al. | 2304.00467v1 | link |
2023-03-31 | kNN-Res: Residual Neural Network with kNN-Graph coherence for point cloud registration | Muhammad S. Battikh et.al. | 2304.00050v1 | link |
2023-03-31 | RDMNet: Reliable Dense Matching Based Point Cloud Registration for Autonomous Driving | Chenghao Shi et.al. | 2303.18084v1 | null |
2023-04-23 | HybridPoint: Point Cloud Registration Based on Hybrid Point Sampling and Matching | Yiheng Li et.al. | 2303.16526v2 | link |
2023-03-27 | Learnable Graph Matching: A Practical Paradigm for Data Association | Jiawei He et.al. | 2303.15414v1 | link |
2023-03-23 | Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration | Guofeng Mei et.al. | 2303.13290v1 | link |
2023-03-22 | RegFormer: An Efficient Projection-Aware Transformer Network for Large-Scale Point Cloud Registration | Jiuming Liu et.al. | 2303.12384v1 | link |
2023-03-17 | Deep Graph-based Spatial Consistency for Robust Non-rigid Point Cloud Registration | Zheng Qin et.al. | 2303.09950v1 | link |
2023-03-14 | RoCNet: 3D Robust Registration of Point-Clouds using Deep Learning | Karim Slimani et.al. | 2303.07963v1 | null |
2023-03-07 | GMCR: Graph-based Maximum Consensus Estimation for Point Cloud Registration | Michael Gentner et.al. | 2303.04032v1 | null |
2023-03-02 | Neural Intrinsic Embedding for Non-rigid Point Cloud Matching | Puhua Jiang et.al. | 2303.01038v1 | null |
2023-03-14 | A Unified BEV Model for Joint Learning of 3D Local Features and Overlap Estimation | Lin Li et.al. | 2302.14511v2 | link |
2023-02-28 | PCR-CG: Point Cloud Registration via Deep Color and Geometry | Yu Zhang et.al. | 2302.14418v1 | link |
2023-02-28 | Efficient Implicit Neural Reconstruction Using LiDAR | Dongyu Yan et.al. | 2302.14363v1 | link |
2023-02-25 | Accurate Gaussian Process Distance Fields with applications to Echolocation and Mapping | Cedric Le Gentil et.al. | 2302.13005v1 | null |
2023-02-14 | Point Cloud Registration for LiDAR and Photogrammetric Data: a Critical Synthesis and Performance Analysis on Classic and Deep Learning Algorithms | Ningli Xu et.al. | 2302.07184v1 | link |
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-03-07 | Joint 3D Point Cloud Segmentation using Real-Sim Loop: From Panels to Trees and Branches | Tian Qiu et.al. | 2503.05630v1 | null |
2025-03-05 | Label-Efficient LiDAR Semantic Segmentation with 2D-3D Vision Transformer Adapters | Julia Hindel et.al. | 2503.03299v1 | null |
2025-03-01 | Explainable LiDAR 3D Point Cloud Segmentation and Clustering for Detecting Airplane-Generated Wind Turbulence | Zhan Qu et.al. | 2503.00518v1 | null |
2025-02-26 | PFSD: A Multi-Modal Pedestrian-Focus Scene Dataset for Rich Tasks in Semi-Structured Environments | Yueting Liu et.al. | 2502.15342v3 | link |
2025-02-18 | An Experimental Study of SOTA LiDAR Segmentation Models | Bike Chen et.al. | 2502.12860v1 | null |
2025-01-30 | Ground Awareness in Deep Learning for Large Outdoor Point Cloud Segmentation | Kevin Qiu et.al. | 2501.18246v1 | null |
2025-01-29 | 3DSES: an indoor Lidar point cloud segmentation dataset with real and pseudo-labels from a 3D model | Maxime Mérizette et.al. | 2501.17534v1 | null |
2025-01-24 | LiDAR-Based Vehicle Detection and Tracking for Autonomous Racing | Marcello Cellina et.al. | 2501.14502v1 | null |
2025-01-06 | The 2nd Place Solution from the 3D Semantic Segmentation Track in the 2024 Waymo Open Dataset Challenge | Qing Wu et.al. | 2501.05472v1 | null |
2025-01-03 | MRG: A Multi-Robot Manufacturing Digital Scene Generation Method Using Multi-Instance Point Cloud Registration | Songjie Han et.al. | 2501.02041v1 | null |
2025-01-18 | Impact of color and mixing proportion of synthetic point clouds on semantic segmentation | Shaojie Zhou et.al. | 2412.19145v2 | link |
2024-12-02 | The Bare Necessities: Designing Simple, Effective Open-Vocabulary Scene Graphs | Christina Kassab et.al. | 2412.01539v1 | null |
2024-11-30 | Density-aware Global-Local Attention Network for Point Cloud Segmentation | Chade Li et.al. | 2412.00489v1 | null |
2024-11-28 | Textured As-Is BIM via GIS-informed Point Cloud Segmentation | Mohamed S. H. Alabassy et.al. | 2411.18898v1 | null |
2024-11-27 | Towards Cross-device and Training-free Robotic Grasping in 3D Open World | Weiguang Zhao et.al. | 2411.18133v1 | null |
2024-11-20 | BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation | Umamaheswaran Raman Kumar et.al. | 2411.13251v1 | null |
2024-11-13 | Biomass phenotyping of oilseed rape through UAV multi-view oblique imaging with 3DGS and SAM model | Yutao Shen et.al. | 2411.08453v1 | null |
2024-11-13 | Multiscale Graph Construction Using Non-local Cluster Features | Reina Kaneko et.al. | 2411.08371v1 | null |
2024-10-30 | Automated Image-Based Identification and Consistent Classification of Fire Patterns with Quantitative Shape Analysis and Spatial Location Identification | Pengkun Liu et.al. | 2410.23105v1 | null |
2024-11-03 | Multimodality Helps Few-Shot 3D Point Cloud Semantic Segmentation | Zhaochong An et.al. | 2410.22489v2 | null |
2024-10-28 | Exploring contextual modeling with linear complexity for point cloud segmentation | Yong Xien Chng et.al. | 2410.21211v1 | null |
2024-10-14 | Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies | Yanjie Ze et.al. | 2410.10803v1 | link |
2024-10-09 | Evaluating the Impact of Point Cloud Colorization on Semantic Segmentation Accuracy | Qinfeng Zhu et.al. | 2410.06725v1 | null |
2024-09-24 | Underground Mapping and Localization Based on Ground-Penetrating Radar | Jinchang Zhang et.al. | 2409.16446v1 | null |
2024-09-22 | Lidar Panoptic Segmentation in an Open World | Anirudh S Chakravarthy et.al. | 2409.14273v1 | link |
2024-09-03 | When 3D Partial Points Meets SAM: Tooth Point Cloud Segmentation with Sparse Labels | Yifan Liu et.al. | 2409.01691v1 | null |
2024-09-03 | Efficiently Expanding Receptive Fields: Local Split Attention and Parallel Aggregation for Enhanced Large-scale Point Cloud Semantic Segmentation | Haodong Wang et.al. | 2409.01662v1 | null |
2024-08-29 | Towards Modality-agnostic Label-efficient Segmentation with Entropy-Regularized Distribution Alignment | Liyao Tang et.al. | 2408.16520v1 | link |
2024-08-21 | GSTran: Joint Geometric and Semantic Coherence for Point Cloud Segmentation | Abiao Li et.al. | 2408.11558v1 | link |
2024-08-02 | Trainable Pointwise Decoder Module for Point Cloud Segmentation | Bike Chen et.al. | 2408.01548v1 | null |
2024-07-31 | Fine-grained Metrics for Point Cloud Semantic Segmentation | Zhuheng Lu et.al. | 2407.21289v1 | null |
2024-07-19 | Scale Disparity of Instances in Interactive Point Cloud Segmentation | Chenrui Han et.al. | 2407.14009v1 | null |
2024-07-18 | SegPoint: Segment Any Point Cloud via Large Language Model | Shuting He et.al. | 2407.13761v1 | null |
2024-07-17 | Dual-level Adaptive Self-Labeling for Novel Class Discovery in Point Cloud Segmentation | Ruijie Xu et.al. | 2407.12489v1 | link |
2024-07-17 | HGL: Hierarchical Geometry Learning for Test-time Adaptation in 3D Point Cloud Segmentation | Tianpei Zou et.al. | 2407.12387v1 | link |
2024-07-17 | Serialized Point Mamba: A Serialized Point Cloud Mamba Segmentation Model | Tao Wang et.al. | 2407.12319v1 | null |
2024-07-12 | Uplifting Range-View-based 3D Semantic Segmentation in Real-Time with Multi-Sensor Fusion | Shiqi Tan et.al. | 2407.09697v1 | null |
2024-07-01 | fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence | Francis Williams et.al. | 2407.01781v1 | null |
2024-06-25 | Mamba24/8D: Enhancing Global Interaction in Point Clouds via State Space Model | Zhuoyuan Li et.al. | 2406.17442v1 | null |
2024-08-04 | Twin Deformable Point Convolutions for Point Cloud Semantic Segmentation in Remote Sensing Scenes | Yong-Qiang Mao et.al. | 2405.19735v2 | null |
2024-05-24 | 3D Unsupervised Learning by Distilling 2D Open-Vocabulary Segmentation Models for Autonomous Driving | Boyi Sun et.al. | 2405.15286v1 | link |
2024-05-25 | Filling Missing Values Matters for Range Image-Based Point Cloud Segmentation | Bike Chen et.al. | 2405.10175v2 | null |
2024-04-16 | ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation | Iaroslav Melekhov et.al. | 2404.10699v1 | link |
2024-04-04 | OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views | Francis Engelmann et.al. | 2404.03650v1 | null |
2024-03-28 | RiEMann: Near Real-Time SE(3)-Equivariant Robot Manipulation without Point Cloud Segmentation | Chongkai Gao et.al. | 2403.19460v1 | null |
2024-05-30 | CurbNet: Curb Detection Framework Based on LiDAR Point Cloud Segmentation | Guoyang Zhao et.al. | 2403.16794v2 | link |
2024-03-18 | EffiPerception: an Efficient Framework for Various Perception Tasks | Xinhao Xiang et.al. | 2403.12317v1 | null |
2024-03-11 | 3DRef: 3D Dataset and Benchmark for Reflection Detection in RGB and Lidar Data | Xiting Zhao et.al. | 2403.06538v1 | null |
2024-03-11 | Refining Segmentation On-the-Fly: An Interactive Framework for Point Cloud Semantic Segmentation | Peng Zhang et.al. | 2403.06401v1 | null |
2024-03-03 | Region-Transformer: Self-Attention Region Based Class-Agnostic Point Cloud Segmentation | Dipesh Gyawali et.al. | 2403.01407v1 | null |
2024-01-29 | Dynamic Prototype Adaptation with Distillation for Few-shot Point Cloud Segmentation | Jie Liu et.al. | 2401.16051v1 | link |
2024-01-19 | Symbol as Points: Panoptic Symbol Spotting via Point-based Representation | Wenlong Liu et.al. | 2401.10556v1 | link |
2023-12-29 | Multi-modality Affinity Inference for Weakly Supervised 3D Semantic Segmentation | Xiawei Li et.al. | 2312.16578v2 | link |
2023-12-19 | Point Cloud Segmentation Using Transfer Learning with RandLA-Net: A Case Study on Urban Areas | Alperen Enes Bayar et.al. | 2312.11880v1 | null |
2023-12-15 | T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning | Weijie Wei et.al. | 2312.10217v1 | link |
2023-12-14 | FAPP: Fast and Adaptive Perception and Planning for UAVs in Dynamic Cluttered Environments | Minghao Lu et.al. | 2312.08743v1 | null |
2023-12-12 | Transferring CLIP's Knowledge into Zero-Shot Point Cloud Semantic Segmentation | Yuanbin Wang et.al. | 2312.07221v1 | null |
2023-12-11 | Densify Your Labels: Unsupervised Clustering with Bipartite Matching for Weakly Supervised Point Cloud Segmentation | Shaobo Xia et.al. | 2312.06799v1 | null |
2024-01-15 | Provable Adversarial Robustness for Group Equivariant Tasks: Graphs, Point Clouds, Molecules, and More | Jan Schuchardt et.al. | 2312.02708v2 | null |
2023-11-24 | OneFormer3D: One Transformer for Unified Point Cloud Segmentation | Maxim Kolodiazhnyi et.al. | 2311.14405v1 | null |
2023-11-18 | DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields | Yu Chi et.al. | 2311.12063v1 | link |
2023-11-10 | U3DS | Jiaxu Liu et.al. | 2311.06018v1 | null |
2023-11-06 | Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation | Shichao Dong et.al. | 2311.01989v2 | null |
2023-10-19 | 2D-3D Interlaced Transformer for Point Cloud Segmentation with Scene-Level Supervision | Cheng-Kun Yang et.al. | 2310.12817v1 | null |
2023-10-11 | PointHR: Exploring High-Resolution Architectures for 3D Point Cloud Segmentation | Haibo Qiu et.al. | 2310.07743v1 | link |
2023-09-26 | Addressing Data Misalignment in Image-LiDAR Fusion on Point Cloud Segmentation | Wei Jong Yang et.al. | 2309.14932v1 | null |
2023-09-20 | Towards Robust Few-shot Point Cloud Semantic Segmentation | Yating Xu et.al. | 2309.11228v1 | link |
2023-09-20 | Generalized Few-Shot Point Cloud Segmentation Via Geometric Words | Yating Xu et.al. | 2309.11222v1 | link |
2023-08-29 | Compositional Semantic Mix for Domain Adaptation in Point Cloud Segmentation | Cristiano Saltori et.al. | 2308.14619v2 | link |
2023-08-22 | Hierarchical Point-based Active Learning for Semi-supervised Point Cloud Semantic Segmentation | Zongyi Xu et.al. | 2308.11166v1 | link |
2023-08-14 | Autonomous Point Cloud Segmentation for Power Lines Inspection in Smart Grid | Alexander Kyuroson et.al. | 2308.07283v1 | null |
2023-08-08 | Boosting Few-shot 3D Point Cloud Segmentation via Query-Guided Enhancement | Zhenhua Ning et.al. | 2308.03177v2 | link |
2023-07-31 | pCTFusion: Point Convolution-Transformer Fusion with Semantic Aware Loss for Outdoor LiDAR Point Cloud Segmentation | Abhishek Kuriyal et.al. | 2307.14777v2 | link |
2023-07-27 | Clustering based Point Cloud Representation Learning for 3D Analysis | Tuo Feng et.al. | 2307.14605v1 | link |
2023-07-20 | See More and Know More: Zero-shot Point Cloud Segmentation via Multi-modal Visual Data | Yuhang Lu et.al. | 2307.10782v1 | null |
2023-07-14 | Achelous: A Fast Unified Water-surface Panoptic Perception Framework based on Fusion of Monocular Camera and 4D mmWave Radar | Runwei Guan et.al. | 2307.07102v1 | link |
2023-07-08 | BPNet: Bézier Primitive Segmentation on 3D Point Clouds | Rao Fu et.al. | 2307.04013v1 | link |
2023-06-28 | Point2Point: A Framework for Efficient Deep Learning on Hilbert sorted Point Clouds with applications in Spatio-Temporal Occupancy Prediction | Athrva Atul Pandhare et.al. | 2306.16306v1 | null |
2023-05-30 | Dynamic Clustering Transformer Network for Point Cloud Segmentation | Dening Lu et.al. | 2306.08073v1 | null |
2023-05-23 | Prototype Adaption and Projection for Few- and Zero-shot 3D Point Cloud Semantic Segmentation | Shuting He et.al. | 2305.14335v1 | link |
2023-05-22 | Contrastive Predictive Autoencoders for Dynamic Point Cloud Self-Supervised Learning | Xiaoxiao Sheng et.al. | 2305.12959v1 | null |
2023-05-17 | Tinto: Multisensor Benchmark for 3D Hyperspectral Point Cloud Segmentation in the Geosciences | Ahmed J. Afifi et.al. | 2305.09928v1 | null |
2023-05-08 | OctFormer: Octree-based Transformers for 3D Point Clouds | Peng-Shuai Wang et.al. | 2305.03045v2 | link |
2023-05-22 | Urban GeoBIM construction by integrating semantic LiDAR point clouds with as-designed BIM models | Jie Shao et.al. | 2304.11719v2 | null |
2023-04-22 | Knowledge Distillation from 3D to Bird's-Eye-View for LiDAR Semantic Segmentation | Feng Jiang et.al. | 2304.11393v1 | link |
2023-06-02 | Transformer-Based Visual Segmentation: A Survey | Xiangtai Li et.al. | 2304.09854v2 | link |
2023-04-11 | Feature-assisted interactive geometry reconstruction in 3D point clouds using incremental region growing | Attila Szabo et.al. | 2304.05109v1 | null |
Publish Date | Title | Authors | Code | |
---|---|---|---|---|
2025-03-11 | Exploring the Word Sense Disambiguation Capabilities of Large Language Models | Pierpaolo Basile et.al. | 2503.08662v1 | null |
2025-03-11 | CellStyle: Improved Zero-Shot Cell Segmentation via Style Transfer | Rüveyda Yilmaz et.al. | 2503.08603v1 | null |
2025-03-11 | NSF-SciFy: Mining the NSF Awards Database for Scientific Claims | Delip Rao et.al. | 2503.08600v1 | null |
2025-03-11 | MMRL: Multi-Modal Representation Learning for Vision-Language Models | Yuncheng Guo et.al. | 2503.08497v1 | link |
2025-03-11 | Controlling Latent Diffusion Using Latent CLIP | Jason Becker et.al. | 2503.08455v1 | null |
2025-03-11 | Embodied Crowd Counting | Runling Long et.al. | 2503.08367v1 | null |
2025-03-11 | Reasoning in visual navigation of end-to-end trained agents: a dynamical systems approach | Steeven Janny et.al. | 2503.08306v1 | null |
2025-03-12 | Large Language Model as Meta-Surrogate for Data-Driven Many-Task Optimization: A Proof-of-Principle Study | Xian-Rong Zhang et.al. | 2503.08301v2 | null |
2025-03-11 | Investigating the Effectiveness of a Socratic Chain-of-Thoughts Reasoning Method for Task Planning in Robotics, A Case Study | Veronica Bot et.al. | 2503.08174v1 | null |
2025-03-12 | Accelerate 3D Object Detection Models via Zero-Shot Attention Key Pruning | Lizhen Xu et.al. | 2503.08101v2 | link |
2025-03-10 | PE3R: Perception-Efficient 3D Reconstruction | Jie Hu et.al. | 2503.07507v1 | null |
2025-03-10 | Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts | Shiu-hong Kao et.al. | 2503.07503v1 | null |
2025-03-10 | LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition? | Bangyan Li et.al. | 2503.07487v1 | null |
2025-03-10 | YOLOE: Real-Time Seeing Anything | Ao Wang et.al. | 2503.07465v1 | link |
2025-03-10 | REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding | Yan Tai et.al. | 2503.07413v1 | link |
2025-03-10 | Dynamic Path Navigation for Motion Agents with LLM Reasoning | Yubo Zhao et.al. | 2503.07323v1 | null |
2025-03-10 | Automatic Curriculum Design for Zero-Shot Human-AI Coordination | Won-Sang You et.al. | 2503.07275v1 | null |
2025-03-11 | AnomalyPainter: Vision-Language-Diffusion Synergy for Zero-Shot Realistic and Diverse Industrial Anomaly Synthesis | Zhangyu Lai et.al. | 2503.07253v2 | null |
2025-03-10 | Cross-Lingual IPA Contrastive Learning for Zero-Shot NER | Jimin Sohn et.al. | 2503.07214v1 | null |
2025-03-10 | A Zero-shot Learning Method Based on Large Language Models for Multi-modal Knowledge Graph Embedding | Bingchen Liu et.al. | 2503.07202v1 | null |
2025-03-10 | Multi-Modal 3D Mesh Reconstruction from Images and Text | Melvin Reka et.al. | 2503.07190v1 | null |
2025-03-10 | Generative AI in Transportation Planning: A Survey | Longchao Da et.al. | 2503.07158v1 | null |
2025-03-07 | Joint 3D Point Cloud Segmentation using Real-Sim Loop: From Panels to Trees and Branches | Tian Qiu et.al. | 2503.05630v1 | null |
2025-03-07 | InDRiVE: Intrinsic Disagreement based Reinforcement for Vehicle Exploration through Curiosity Driven Generalized World Model | Feeza Khan Khanzada et.al. | 2503.05573v1 | null |
2025-03-07 | Stereo Any Video: Temporally Consistent Stereo Matching | Junpeng Jing et.al. | 2503.05549v1 | null |
2025-03-07 | Data-Efficient Generalization for Zero-shot Composed Image Retrieval | Zining Chen et.al. | 2503.05204v1 | null |
2025-03-06 | Leveraging Domain Knowledge at Inference Time for LLM Translation: Retrieval versus Generation | Bryan Li et.al. | 2503.05010v1 | null |
2025-03-06 | Adapt3R: Adaptive 3D Scene Representation for Domain Transfer in Imitation Learning | Albert Wilcox et.al. | 2503.04877v1 | null |
2025-03-06 | Memory Is All You Need: Testing How Model Memory Affects LLM Performance in Annotation Tasks | Joan C. Timoneda et.al. | 2503.04874v1 | null |
2025-03-06 | Enough Coin Flips Can Make LLMs Act Bayesian | Ritwik Gupta et.al. | 2503.04722v1 | null |
2025-03-06 | A Benchmark for Multi-Lingual Vision-Language Learning in Remote Sensing Image Captioning | Qing Zhou et.al. | 2503.04592v1 | null |
2025-03-06 | SRSA: Skill Retrieval and Adaptation for Robotic Assembly Tasks | Yijie Guo et.al. | 2503.04538v1 | null |
2025-03-06 | AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM | Sunghyun Ahn et.al. | 2503.04504v1 | null |
2025-03-06 | Semantic Alignment of Unimodal Medical Text and Vision Representations | Maxime Di Folco et.al. | 2503.04478v1 | null |
2025-03-06 | EvidMTL: Evidential Multi-Task Learning for Uncertainty-Aware Semantic Surface Mapping from Monocular RGB Images | Rohit Menon et.al. | 2503.04441v1 | null |
2025-03-06 | A Dataset for Analysing News Framing in Chinese Media | Owen Cook et.al. | 2503.04439v1 | null |
2025-03-06 | Comparative Study of Zero-Shot Cross-Lingual Transfer for Bodo POS and NER Tagging Using Gemini 2.0 Flash Thinking Experimental Model | Sanjib Narzary et.al. | 2503.04405v1 | null |
2025-03-06 | Large Language Models for Zero-shot Inference of Causal Structures in Biology | Izzy Newsham et.al. | 2503.04347v1 | null |
2025-03-06 | Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur Prior | Haitao Wu et.al. | 2503.04207v1 | null |
2025-03-05 | OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction | Huang Huang et.al. | 2503.03734v1 | null |
2025-03-05 | CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP | Songlong Xing et.al. | 2503.03613v1 | link |
2025-03-05 | Scaling Crowdsourced Election Monitoring: Construction and Evaluation of Classification Models for Multilingual and Cross-Domain Classification Settings | Jabez Magomere et.al. | 2503.03582v1 | null |
2025-03-05 | iNews: A Multimodal Dataset for Modeling Personalized Affective Responses to News | Tiancheng Hu et.al. | 2503.03335v1 | null |
2025-03-04 | ArticuBot: Learning Universal Articulated Object Manipulation Policy via Large Scale Simulation | Yufei Wang et.al. | 2503.03045v1 | null |
2025-03-04 | Zero-Shot Multi-Label Classification of Bangla Documents: Large Decoders Vs. Classic Encoders | Souvika Sarkar et.al. | 2503.02993v1 | null |
2025-03-04 | RAILGUN: A Unified Convolutional Policy for Multi-Agent Path Finding Across Different Environments and Tasks | Yimin Tang et.al. | 2503.02992v1 | null |
2025-03-06 | Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training | Vaibhav Singh et.al. | 2503.02844v2 | null |
2025-03-04 | SeqFusion: Sequential Fusion of Pre-Trained Models for Zero-Shot Time-Series Forecasting | Ting-Ji Huang et.al. | 2503.02836v1 | link |
2025-03-04 | Bridging VLM and KMP: Enabling Fine-grained robotic manipulation via Semantic Keypoints Representation | Junjie Zhu et.al. | 2503.02748v1 | null |
2025-03-04 | Evaluating Knowledge Generation and Self-Refinement Strategies for LLM-based Column Type Annotation | Keti Korini et.al. | 2503.02718v1 | null |
2025-03-04 | FlowPlan: Zero-Shot Task Planning with LLM Flow Engineering for Robotic Instruction Following | Zijun Lin et.al. | 2503.02698v1 | null |
2025-03-04 | Zero-Shot Complex Question-Answering on Long Scientific Documents | Wanting Wang et.al. | 2503.02695v1 | null |
2025-03-04 | Towards Event Extraction with Massive Types: LLM-based Collaborative Annotation and Partitioning Extraction | Wenxuan Liu et.al. | 2503.02628v1 | null |
2025-03-04 | Exploring Intrinsic Normal Prototypes within a Single Image for Universal Anomaly Detection | Wei Luo et.al. | 2503.02424v1 | null |
2025-03-04 | EchoQA: A Large Collection of Instruction Tuning Data for Echocardiogram Reports | Lama Moukheiber et.al. | 2503.02365v1 | null |
2025-03-04 | Towards Explainable Doctor Recommendation with Large Language Models | Ziyang Zeng et.al. | 2503.02298v1 | null |
2025-02-28 | Assessing zero-shot generalisation behaviour in graph-neural-network interatomic potentials | Chiheb Ben Mahmoud et.al. | 2502.21317v1 | null |
2025-02-28 | LesionLocator: Zero-Shot Universal Tumor Segmentation and Tracking in 3D Whole-Body Imaging | Maximilian Rokuss et.al. | 2502.20985v1 | null |
2025-02-28 | WebFAQ: A Multilingual Collection of Natural Q&A Datasets for Dense Retrieval | Michael Dinzinger et.al. | 2502.20936v1 | null |
2025-02-28 | Less is More? Revisiting the Importance of Frame Rate in Real-Time Zero-Shot Surgical Video Segmentation | Utku Ozbulak et.al. | 2502.20934v1 | null |
2025-02-28 | DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping | Yifan Zhong et.al. | 2502.20900v1 | null |
2025-02-28 | Better Benchmarking LLMs for Zero-Shot Dependency Parsing | Ana Ezquerro et.al. | 2502.20866v1 | null |
2025-02-28 | MESC-3D: Mining Effective Semantic Cues for 3D Reconstruction from a Single Image | Shaoming Li et.al. | 2502.20861v1 | null |
2025-02-28 | Hierarchical and Modular Network on Non-prehensile Manipulation in General Environments | Yoonyoung Cho et.al. | 2502.20843v1 | null |
2025-02-28 | CoTMR: Chain-of-Thought Multi-Scale Reasoning for Training-Free Zero-Shot Composed Image Retrieval | Zelong Sun et.al. | 2502.20826v1 | null |
2025-02-28 | MedHallTune: An Instruction-Tuning Benchmark for Mitigating Medical Hallucination in Vision-Language Models | Qiao Yan et.al. | 2502.20780v1 | null |
2025-02-27 | InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions | Sirui Xu et.al. | 2502.20390v1 | null |
2025-02-27 | Physics-Driven Data Generation for Contact-Rich Manipulation via Trajectory Optimization | Lujie Yang et.al. | 2502.20382v1 | null |
2025-02-27 | Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners | Daniele Paliotta et.al. | 2502.20339v1 | null |
2025-02-27 | UniTok: A Unified Tokenizer for Visual Generation and Understanding | Chuofan Ma et.al. | 2502.20321v1 | link |
2025-02-27 | FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction | Siyu Jiao et.al. | 2502.20313v1 | link |
2025-02-27 | Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription | Benjamin Gutteridge et.al. | 2502.20295v1 | link |
2025-02-27 | Visual Adaptive Prompting for Compositional Zero-Shot Learning | Kyle Stein et.al. | 2502.20292v1 | null |
2025-02-27 | An Extensive Evaluation of PDDL Capabilities in off-the-shelf LLMs | Kaustubh Vyas et.al. | 2502.20175v1 | null |
2025-02-27 | Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck Models | Itay Benou et.al. | 2502.20134v1 | null |
2025-02-27 | UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler | Luigi Piccinelli et.al. | 2502.20110v1 | link |
2025-02-26 | ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models | Danae Sánchez Villegas et.al. | 2502.19409v1 | null |
2025-02-26 | Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator | Xiankang He et.al. | 2502.19204v1 | link |
2025-02-26 | A Sliding Layer Merging Method for Efficient Depth-Wise Pruning in LLMs | Xuan Ding et.al. | 2502.19159v1 | null |
2025-02-26 | A Survey on Foundation-Model-Based Industrial Defect Detection | Tianle Yang et.al. | 2502.19106v1 | null |
2025-02-26 | Foundation Inference Models for Stochastic Differential Equations: A Transformer-based Approach for Zero-shot Function Estimation | Patrick Seifner et.al. | 2502.19049v1 | null |
2025-02-26 | FungalZSL: Zero-Shot Fungal Classification with Image Captioning Using a Synthetic Data Approach | Anju Rani et.al. | 2502.19038v1 | null |
2025-02-26 | Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis | Ziyue Jiang et.al. | 2502.18924v1 | null |
2025-02-26 | Think on your feet: Seamless Transition between Human-like Locomotion in Response to Changing Commands | Huaxing Huang et.al. | 2502.18901v1 | null |
2025-02-26 | Hierarchical corpus encoder: Fusing generative retrieval and dense indices | Tongfei Chen et.al. | 2502.18877v1 | null |
2025-02-26 | Data-Efficient Multi-Agent Spatial Planning with LLMs | Huangyuan Su et.al. | 2502.18822v1 | null |
2025-02-25 | Evaluating the Effectiveness of Small Language Models in Detecting Refactoring Bugs | Rohit Gheyi et.al. | 2502.18454v1 | null |
2025-02-25 | LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation | Pengzhi Li et.al. | 2502.18302v1 | null |
2025-02-25 | Synthesizing Consistent Novel Views via 3D Epipolar Attention without Re-Training | Botao Ye et.al. | 2502.18219v1 | null |
2025-02-25 | Task-Agnostic Semantic Communication with Multimodal Foundation Models | Jiangjing Hu et.al. | 2502.18200v1 | null |
2025-02-25 | CLIPure: Purification in Latent Space via CLIP for Adversarially Robust Zero-Shot Classification | Mingkun Zhang et.al. | 2502.18176v1 | link |
2025-02-25 | Progressive Local Alignment for Medical Multimodal Pre-training | Huimin Yan et.al. | 2502.18047v1 | null |
2025-02-26 | From planning to policy: distilling | Haewon Jung et.al. | 2502.18015v2 | null |
2025-02-25 | Knowledge-enhanced Multimodal ECG Representation Learning with Arbitrary-Lead Inputs | Che Liu et.al. | 2502.17900v1 | null |
2025-02-25 | FetchBot: Object Fetching in Cluttered Shelves via Zero-Shot Sim2Real | Weiheng Liu et.al. | 2502.17894v1 | null |
2025-02-25 | UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting | Haoyuan Li et.al. | 2502.17860v1 | null |
2025-02-24 | X-Dancer: Expressive Music to Human Dance Video Generation | Zeyuan Chen et.al. | 2502.17414v1 | null |
2025-02-24 | FIG: Forward-Inverse Generation for Low-Resource Domain-specific Event Detection | Tanmay Parekh et.al. | 2502.17394v1 | null |
2025-02-24 | Improving the Inclusivity of Dutch Speech Recognition by Fine-tuning Whisper on the JASMIN-CGN Corpus | Golshid Shekoufandeh et.al. | 2502.17284v1 | null |
2025-02-24 | VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing | Xiangpeng Yang et.al. | 2502.17258v1 | null |
2025-02-24 | Alpha-SQL: Zero-Shot Text-to-SQL using Monte Carlo Tree Search | Boyan Li et.al. | 2502.17248v1 | null |
2025-02-24 | A Reinforcement Learning Approach to Non-prehensile Manipulation through Sliding | Hamidreza Raei et.al. | 2502.17221v1 | null |
2025-02-24 | DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications | Ibrahim Fayad et.al. | 2502.17066v1 | null |
2025-02-24 | LLM-QE: Improving Query Expansion by Aligning Large Language Models with Ranking Preferences | Sijia Yao et.al. | 2502.17057v1 | link |
2025-02-24 | MA2RL: Masked Autoencoders for Generalizable Multi-Agent Reinforcement Learning | Jinyuan Feng et.al. | 2502.17046v1 | null |
2025-02-24 | Reasoning Does Not Necessarily Improve Role-Playing Ability | Xiachong Feng et.al. | 2502.16940v1 | null |
2025-02-21 | ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval | Guanqi Zhan et.al. | 2502.15682v1 | null |
2025-02-21 | One-step Diffusion Models with | Yilun Xu et.al. | 2502.15681v1 | null |
2025-02-21 | Pick-and-place Manipulation Across Grippers Without Retraining: A Learning-optimization Diffusion Policy Approach | Xiangtong Yao et.al. | 2502.15613v1 | link |
2025-02-21 | FaultGPT: Industrial Fault Diagnosis Question Answering System by Vision Language Models | Jiao Chen et.al. | 2502.15481v1 | null |
2025-02-21 | Multimodal Graph-Based Variational Mixture of Experts Network for Zero-Shot Multimodal Information Extraction | Baohang Zhou et.al. | 2502.15290v1 | link |
2025-02-21 | From Documents to Dialogue: Building KG-RAG Enhanced AI Assistants | Manisha Mukherjee et.al. | 2502.15237v1 | null |
2025-02-21 | GNN-Coder: Boosting Semantic Code Retrieval with Combined GNNs and Transformer | Yufan Ye et.al. | 2502.15202v1 | null |
2025-02-21 | Extreme Speech Classification in the Era of LLMs: Exploring Open-Source and Proprietary Models | Sarthak Mahajan et.al. | 2502.15155v1 | null |
2025-02-20 | A Meta-Evaluation of Style and Attribute Transfer Metrics | Amalie Brogaard Pauli et.al. | 2502.15022v1 | null |
2025-02-20 | Using tournaments to calculate AUROC for zero-shot classification with LLMs | Wonjin Yoon et.al. | 2502.15018v1 | null |
2025-02-20 | Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models | Vlad Sobal et.al. | 2502.14819v1 | null |
2025-02-20 | Dynamic Low-Rank Sparse Adaptation for Large Language Models | Weizhong Huang et.al. | 2502.14816v1 | link |
2025-02-20 | RendBEV: Semantic Novel View Synthesis for Self-Supervised Bird's Eye View Segmentation | Henrique Piñeiro Monteagudo et.al. | 2502.14792v1 | null |
2025-02-20 | SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features | Michael Tschannen et.al. | 2502.14786v1 | link |
2025-02-20 | Determining Layer-wise Sparsity for Large Language Models Through a Theoretical Perspective | Weizhong Huang et.al. | 2502.14770v1 | null |
2025-02-20 | Entity Framing and Role Portrayal in the News | Tarek Mahmoud et.al. | 2502.14718v1 | null |
2025-02-20 | Exploring RWKV for Sentence Embeddings: Layer-wise Analysis and Baseline Comparison for Semantic Similarity | Xinghan Pan et.al. | 2502.14620v1 | link |
2025-02-20 | Noisy Test-Time Adaptation in Vision-Language Models | Chentao Cao et.al. | 2502.14604v1 | link |
2025-02-20 | LLM-based User Profile Management for Recommender System | Seunghwan Bang et.al. | 2502.14541v1 | null |
2025-02-20 | Generative adversarial networks vs large language models: a comparative study on synthetic tabular data generation | Austin A. Barr et.al. | 2502.14523v1 | link |
2025-02-20 | Where's the Bug? Attention Probing for Scalable Fault Localization | Adam Stein et.al. | 2502.13966v2 | null |
2025-02-19 | A Training-Free Framework for Precise Mobile Manipulation of Small Everyday Objects | Arjun Gupta et.al. | 2502.13964v1 | null |
2025-02-19 | NavigateDiff: Visual Predictors are Zero-Shot Navigation Assistants | Yiran Qin et.al. | 2502.13894v1 | null |
2025-02-19 | Quantifying Memorization and Retriever Performance in Retrieval-Augmented Vision-Language Models | Peter Carragher et.al. | 2502.13836v1 | null |
2025-02-19 | MMTEB: Massive Multilingual Text Embedding Benchmark | Kenneth Enevoldsen et.al. | 2502.13595v1 | null |
2025-02-19 | Extracting Social Connections from Finnish Karelian Refugee Interviews Using LLMs | Joonatan Laato et.al. | 2502.13566v1 | null |
2025-02-19 | PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference | Burc Gokden et.al. | 2502.13502v1 | link |
2025-02-19 | Enhancing Chest X-ray Classification through Knowledge Injection in Cross-Modality Learning | Yang Yan et.al. | 2502.13447v1 | null |
2025-02-19 | MaizeEar-SAM: Zero-Shot Maize Ear Phenotyping | Hossein Zaremehrjerdi et.al. | 2502.13399v1 | link |
2025-02-19 | | Vishal Dey et.al. | 2502.13398v1 | link |
2025-02-18 | LAMD: Context-driven Android Malware Detection and Classification with LLMs | Xingzhi Qian et.al. | 2502.13055v1 | null |
2025-02-18 | Detection and Geographic Localization of Natural Objects in the Wild: A Case Study on Palms | Kangning Cui et.al. | 2502.13023v1 | null |
2025-02-18 | A Survey of Text Classification Under Class Distribution Shift | Adriana Valentina Costache et.al. | 2502.12965v1 | null |
2025-02-18 | Performance of Zero-Shot Time Series Foundation Models on Cloud Data | William Toner et.al. | 2502.12944v1 | null |
2025-02-18 | Commonsense Reasoning in Arab Culture | Abdelrahman Sadallah et.al. | 2502.12788v1 | null |
2025-02-18 | High-Fidelity Novel View Synthesis via Splatting-Guided Diffusion | Xiang Zhang et.al. | 2502.12752v1 | null |
2025-02-18 | Self-Enhanced Reasoning Training: Activating Latent Reasoning in Small Models for Enhanced Reasoning Distillation | Yong Zhang et.al. | 2502.12744v1 | null |
2025-02-18 | SATA: Safe and Adaptive Torque-Based Locomotion Policies Inspired by Animal Learning | Peizhuo Li et.al. | 2502.12674v1 | null |
2025-02-18 | Label Drop for Multi-Aspect Relation Modeling in Universal Information Extraction | Lu Yang et.al. | 2502.12614v1 | link |
2025-02-18 | Enhancing Semi-supervised Learning with Noisy Zero-shot Pseudolabels | Jichan Chung et.al. | 2502.12584v1 | null |
2025-02-17 | SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs | Yige Xu et.al. | 2502.12134v1 | null |
2025-02-17 | Can LLMs Simulate Social Media Engagement? A Study on Action-Guided Response Generation | Zhongyi Qiu et.al. | 2502.12073v1 | null |
2025-02-17 | Model Generalization on Text Attribute Graphs: Principles with Large Language Models | Haoyu Wang et.al. | 2502.11836v1 | link |
2025-02-17 | Text Classification in the LLM Era - Where do we stand? | Sowmya Vajjala et.al. | 2502.11830v1 | null |
2025-02-17 | video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model | Guangzhi Sun et.al. | 2502.11775v1 | null |
2025-02-17 | Multi-Modal Retrieval Augmentation for Open-Ended and Knowledge-Intensive Video Question Answering | Md Zarif Ul Alam et.al. | 2502.11747v1 | null |
2025-02-17 | Adversarially Robust CLIP Models Can Induce Better (Robust) Perceptual Metrics | Francesco Croce et.al. | 2502.11725v1 | link |
2025-02-17 | MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction | Jingcheng Ni et.al. | 2502.11663v1 | link |
2025-02-17 | Language Complexity Measurement as a Noisy Zero-Shot Proxy for Evaluating LLM Performance | Birger Moell et.al. | 2502.11578v1 | null |
2025-02-17 | Improving Rare-Word Recognition in Zero-Shot Settings | Yash Jogi et.al. | 2502.11572v1 | null |
2025-02-14 | Aspect-Oriented Summarization for Psychiatric Short-Term Readmission Prediction | WonJin Yoon et.al. | 2502.10388v1 | null |
2025-02-14 | SPIRIT: Short-term Prediction of solar IRradIance for zero-shot Transfer learning using Foundation Models | Aditya Mishra et.al. | 2502.10307v1 | null |
2025-02-14 | Are Large Language Models the future crowd workers of Linguistics? | Iris Ferrazzo et.al. | 2502.10266v1 | null |
2025-02-14 | Large Language Models and Synthetic Data for Monitoring Dataset Mentions in Research Papers | Aivin V. Solatorio et.al. | 2502.10263v1 | null |
2025-02-14 | PromptArtisan: Multi-instruction Image Editing in Single Pass with Complete Attention Control | Kunal Swami et.al. | 2502.10258v1 | null |
2025-02-14 | Cooperative Multi-Agent Planning with Adaptive Skill Synthesis | Zhiyuan Li et.al. | 2502.10148v1 | null |
2025-02-14 | AutoS | Zhengqiu Zhu et.al. | 2502.09913v1 | null |
2025-02-14 | Artificial Intelligence in Spectroscopy: Advancing Chemistry from Prediction to Generation and Beyond | Kehan Guo et.al. | 2502.09897v1 | null |
2025-02-13 | Evaluating GPT's Capability in Identifying Stages of Cognitive Impairment from Electronic Health Data | Yu Leng et.al. | 2502.09715v1 | null |
2025-02-13 | Can this Model Also Recognize Dogs? Zero-Shot Model Search from Weights | Jonathan Kahana et.al. | 2502.09619v1 | null |
2025-02-13 | Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs | Siyan Zhao et.al. | 2502.09597v1 | link |
2025-02-14 | Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering | Mark Beliaev et.al. | 2502.09573v2 | null |
2025-02-13 | Zero-shot generation of synthetic neurosurgical data with large language models | Austin A. Barr et.al. | 2502.09566v1 | link |
2025-02-13 | AnomalyGFM: Graph Foundation Model for Zero/Few-shot Anomaly Detection | Hezhe Qiao et.al. | 2502.09254v1 | null |
2025-02-13 | E-MD3C: Taming Masked Diffusion Transformers for Efficient Zero-Shot Object Customization | Trung X. Pham et.al. | 2502.09164v1 | null |
2025-02-13 | Zero-shot Concept Bottleneck Models | Shin'ya Yamaguchi et.al. | 2502.09018v1 | link |
2025-02-13 | Hope vs. Hate: Understanding User Interactions with LGBTQ+ News Content in Mainstream US News Media through the Lens of Hope Speech | Jonathan Pofcher et.al. | 2502.09004v1 | null |
2025-02-13 | Tuning-Free Personalized Alignment via Trial-Error-Explain In-Context Learning | Hyundong Cho et.al. | 2502.08972v1 | null |
2025-02-12 | MuJoCo Playground | Kevin Zakka et.al. | 2502.08844v1 | null |
2025-02-12 | Re | Xiaoshen Han et.al. | 2502.08645v1 | null |
2025-02-12 | Rhythmic sharing: A bio-inspired paradigm for zero-shot adaptation and learning in neural networks | Hoony Kang et.al. | 2502.08644v1 | link |
2025-02-12 | From Haystack to Needle: Label Space Reduction for Zero-shot Classification | Nathan Vandemoortele et.al. | 2502.08436v1 | null |
2025-02-12 | Salience-Invariant Consistent Policy Learning for Generalization in Visual Reinforcement Learning | Sun Jingbo et.al. | 2502.08336v1 | null |
2025-02-12 | FixDrive: Automatically Repairing Autonomous Vehicle Driving Behaviour for $0.08 per Violation | Yang Sun et.al. | 2502.08260v1 | link |
2025-02-12 | HuDEx: Integrating Hallucination Detection and Explainability for Enhancing the Reliability of LLM responses | Sujeong Lee et.al. | 2502.08109v1 | null |
2025-02-12 | Franken-Adapter: Cross-Lingual Adaptation of LLMs by Embedding Surgery | Fan Jiang et.al. | 2502.08037v1 | null |
2025-02-11 | Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models | Jiacong Xu et.al. | 2502.07601v1 | null |
2025-02-11 | LoRP-TTS: Low-Rank Personalized Text-To-Speech | Łukasz Bondaruk et.al. | 2502.07562v1 | null |
2025-02-12 | O1 Embedder: Let Retrievers Think Before Action | Ruiran Yan et.al. | 2502.07555v2 | null |
2025-02-11 | Advanced Zero-Shot Text-to-Speech for Background Removal and Preservation with Controllable Masked Speech Prediction | Leying Zhang et.al. | 2502.07345v1 | null |
2025-02-11 | TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation | Navid Rajabi et.al. | 2502.07306v1 | null |
2025-02-11 | Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement | Xueyao Zhang et.al. | 2502.07243v1 | null |
2025-02-11 | PDV: Prompt Directional Vectors for Zero-shot Composed Image Retrieval | Osman Tursun et.al. | 2502.07215v1 | null |
2025-02-11 | Perceived Confidence Scoring for Data Annotation with Zero-Shot LLMs | Sina Salimian et.al. | 2502.07186v1 | null |
2025-02-11 | Don't Just Demo, Teach Me the Principles: A Principle-Based Multi-Agent Prompting Strategy for Text Classification | Peipei Wei et.al. | 2502.07165v1 | null |
2025-02-10 | Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations | Yong Cao et.al. | 2502.07068v1 | link |
2025-02-10 | Visual Agentic AI for Spatial Reasoning with a Dynamic API | Damiano Marsili et.al. | 2502.06787v1 | null |
2025-02-10 | Boosting Self-Efficacy and Performance of Large Language Models via Verbal Efficacy Stimulations | Rui Chen et.al. | 2502.06669v1 | null |
2025-02-10 | MaterialFusion: High-Quality, Zero-Shot, and Controllable Material Transfer with Diffusion Models | Kamil Garifullin et.al. | 2502.06606v1 | null |
2025-02-10 | CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers | D. She et.al. | 2502.06527v1 | null |
2025-02-10 | Learning Clustering-based Prototypes for Compositional Zero-shot Learning | Hongyu Qu et.al. | 2502.06501v1 | link |
2025-02-10 | Causal Lifting of Neural Representations: Zero-Shot Generalization for Causal Inferences | Riccardo Cadei et.al. | 2502.06343v1 | null |
2025-02-10 | Zero-shot Depth Completion via Test-time Alignment with Affine-invariant Depth Prior | Lee Hyoseok et.al. | 2502.06338v1 | null |
2025-02-10 | Find Central Dogma Again | Wang Liang et.al. | 2502.06253v1 | null |
2025-02-10 | Scaling Public Health Text Annotation: Zero-Shot Learning vs. Crowdsourcing for Improved Efficiency and Labeling Accuracy | Kamyar Kazari et.al. | 2502.06150v1 | null |
2025-02-09 | Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning | Bidipta Sarkar et.al. | 2502.06060v1 | link |
2025-02-07 | QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation | Yue Zhao et.al. | 2502.05178v1 | null |
2025-02-07 | AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting | Chung-Ho Wu et.al. | 2502.05176v1 | null |
2025-02-07 | DCFormer: Efficient 3D Vision-Language Modeling with Decomposed Convolutions | Gorkem Can Ates et.al. | 2502.05091v1 | null |
2025-02-07 | Aligning Black-box Language Models with Human Judgments | Gerrit J. J. van den Burg et.al. | 2502.04997v1 | null |
2025-02-07 | OccGS: Zero-shot 3D Occupancy Reconstruction with Semantic and Geometric-Aware Gaussian Splatting | Xiaoyu Zhou et.al. | 2502.04981v1 | null |
2025-02-07 | STRIDE: Automating Reward Design, Deep Reinforcement Learning Training and Feedback Optimization in Humanoid Robotics Locomotion | Zhenwei Wu et.al. | 2502.04692v1 | null |
2025-02-07 | ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning | Yuwei Yin et.al. | 2502.04689v1 | link |
2025-02-07 | Mechanistic Understandings of Representation Vulnerabilities and Engineering Robust Vision Transformers | Chashi Mahiul Islam et.al. | 2502.04679v1 | null |
2025-02-06 | Zero-shot Meta-learning for Tabular Prediction Tasks with Adversarially Pre-trained Transformer | Yulun Wu et.al. | 2502.04573v1 | null |
2025-02-06 | GenVC: Self-Supervised Zero-Shot Voice Conversion | Zexin Cai et.al. | 2502.04519v1 | null |
2025-02-06 | ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features | Alec Helbling et.al. | 2502.04320v1 | link |
2025-02-06 | Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion | Marco Mistretta et.al. | 2502.04263v1 | link |
2025-02-06 | LR0.FM: Low-Resolution Zero-shot Classification Benchmark For Foundation Models | Priyank Pathak et.al. | 2502.03950v1 | link |
2025-02-06 | DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation | Dongya Jia et.al. | 2502.03930v1 | null |
2025-02-06 | It's All in The [MASK]: Simple Instruction-Tuning Enables BERT-like Masked Language Models As Generative Classifiers | Benjamin Clavié et.al. | 2502.03793v1 | null |
2025-02-05 | DynVFX: Augmenting Real Videos with Dynamic Content | Danah Yatim et.al. | 2502.03621v1 | null |
2025-02-05 | SKI Models: Skeleton Induced Vision-Language Embeddings for Understanding Activities of Daily Living | Arkaprava Sinha et.al. | 2502.03459v1 | null |
2025-02-05 | Think or Step-by-Step? UnZIPping the Black Box in Zero-Shot Prompts | Nikta Gohari Sadr et.al. | 2502.03418v1 | null |
2025-02-05 | Benchmarking Time Series Forecasting Models: From Statistical Techniques to Foundation Models in Real-World Applications | Issar Arab et.al. | 2502.03395v1 | null |
2025-02-05 | CAPE: Covariate-Adjusted Pre-Training for Epidemic Time Series Forecasting | Zewen Liu et.al. | 2502.03393v1 | null |
2025-02-05 | ZISVFM: Zero-Shot Object Instance Segmentation in Indoor Robotic Environments with Vision Foundation Models | Ying Zhang et.al. | 2502.03266v1 | link |
2025-02-05 | SimSort: A Powerful Framework for Spike Sorting by Large-Scale Electrophysiology Simulation | Yimu Zhang et.al. | 2502.03198v1 | null |
2025-02-05 | Metis: A Foundation Speech Generation Model with Masked Generative Pre-training | Yuancheng Wang et.al. | 2502.03128v1 | link |
2025-02-05 | IAO Prompting: Making Knowledge Flow Explicit in LLMs through Structured Reasoning Templates | Aissatou Diallo et.al. | 2502.03080v1 | null |
2025-02-05 | Fine-grained Preference Optimization Improves Zero-shot Text-to-Speech | Jixun Yao et.al. | 2502.02950v1 | null |
2025-02-04 | RFMedSAM 2: Automatic Prompt Refinement for Enhanced Volumetric Medical Image Segmentation with SAM 2 | Bin Xie et.al. | 2502.02741v1 | null |
2025-02-04 | IMDPrompter: Adapting SAM to Image Manipulation Detection by Cross-View Automated Prompt Learning | Quan Zhang et.al. | 2502.02454v1 | null |
2025-02-04 | Evaluating the Effectiveness of LLMs in Fixing Maintainability Issues in Real-World Projects | Henrique Nunes et.al. | 2502.02368v1 | null |
2025-02-04 | LoRA-TTT: Low-Rank Test-Time Training for Vision-Language Models | Yuto Kojima et.al. | 2502.02069v1 | null |
2025-02-04 | VolleyBots: A Testbed for Multi-Drone Volleyball Game Combining Motion Control and Strategic Play | Zelai Xu et.al. | 2502.01932v1 | null |
2025-02-03 | AquaticCLIP: A Vision-Language Foundation Model for Underwater Scene Analysis | Basit Alawode et.al. | 2502.01785v1 | null |
2025-02-03 | Expected Return Symmetries | Darius Muglich et.al. | 2502.01711v1 | null |
2025-02-03 | Scalable Language Models with Posterior Inference of Latent Thought Vectors | Deqian Kong et.al. | 2502.01567v1 | null |
2025-02-03 | Toward Task Generalization via Memory Augmentation in Meta-Reinforcement Learning | Kaixi Bao et.al. | 2502.01521v1 | null |
2025-02-03 | Embrace Collisions: Humanoid Shadowing for Deployable Contact-Agnostics Motions | Ziwen Zhuang et.al. | 2502.01465v1 | null |
2025-02-03 | A Framework for Double-Blind Federated Adaptation of Foundation Models | Nurbek Tastan et.al. | 2502.01289v1 | null |
2025-01-31 | MINDSTORES: Memory-Informed Neural Decision Synthesis for Task-Oriented Reinforcement in Embodied Systems | Anirudh Chari et.al. | 2501.19318v1 | null |
2025-01-31 | Differentially Private In-context Learning via Sampling Few-shot Mixed with Zero-shot Outputs | James Flemings et.al. | 2501.19287v1 | null |
2025-01-31 | A Zero-Shot Generalization Framework for LLM-Driven Cross-Domain Sequential Recommendation | Yunzhe Li et.al. | 2501.19232v1 | null |
2025-01-31 | Autonomous Legacy Web Application Upgrades Using a Multi-Agent System | Valtteri Ala-Salmi et.al. | 2501.19204v1 | link |
2025-01-31 | Efficient Reasoning with Hidden Thinking | Xuan Shen et.al. | 2501.19201v1 | link |
2025-01-31 | Brain-inspired sparse training enables Transformers and LLMs to perform as fully connected | Yingtao Zhang et.al. | 2501.19107v1 | null |
2025-01-31 | Fairness Analysis of CLIP-Based Foundation Models for X-Ray Image Classification | Xiangyu Sun et.al. | 2501.19086v1 | null |
2025-02-03 | Contrast-Aware Calibration for Fine-Tuned CLIP: Leveraging Image-Text Alignment | Song-Lin Lv et.al. | 2501.19060v2 | null |
2025-01-31 | TV-Dialogue: Crafting Theme-Aware Video Dialogues with Immersive Interaction | Sai Wang et.al. | 2501.18940v1 | null |
2025-01-31 | Test-time Loss Landscape Adaptation for Zero-Shot Generalization in Vision-Language Models | Aodi Li et.al. | 2501.18864v1 | null |
2025-01-30 | DeltaLLM: Compress LLMs with Low-Rank Deltas between Shared Weights | Liana Mikaelyan et.al. | 2501.18596v1 | null |
2025-01-30 | Learn from the Past: Language-conditioned Object Rearrangement with Large Language Models | Guanqun Cao et.al. | 2501.18516v1 | null |
2025-01-30 | CALM: Unleashing the Cross-Lingual Self-Aligning Ability of Language Model Question Answering | Yumeng Wang et.al. | 2501.18457v1 | null |
2025-01-30 | ReactEmbed: A Cross-Domain Framework for Protein-Molecule Representation Learning via Biochemical Reaction Networks | Amitay Sicherman et.al. | 2501.18278v1 | link |
2025-01-30 | Unraveling the Capabilities of Language Models in News Summarization | Abdurrahman Odabaşı et.al. | 2501.18128v1 | link |
2025-01-30 | LLMs can see and hear without any training | Kumar Ashutosh et.al. | 2501.18096v1 | link |
2025-01-29 | Hybrid Graphs for Table-and-Text based Question Answering using LLMs | Ankush Agarwal et.al. | 2501.17767v1 | null |
2025-01-29 | VoicePrompter: Robust Zero-Shot Voice Conversion with Voice Prompt and Conditional Flow Matching | Ha-Yeong Choi et.al. | 2501.17612v1 | null |
2025-01-29 | LLM Assistance for Pediatric Depression | Mariia Ignashina et.al. | 2501.17510v1 | null |
2025-01-29 | General Scene Adaptation for Vision-and-Language Navigation | Haodong Hong et.al. | 2501.17403v1 | link |
2025-01-28 | RLPP: A Residual Method for Zero-Shot Real-World Autonomous Racing on Scaled Platforms | Edoardo Ghignone et.al. | 2501.17311v1 | link |
2025-01-28 | Mitigating Hallucinated Translations in Large Language Models with Hallucination-focused Preference Optimization | Zilu Tang et.al. | 2501.17295v1 | null |
2025-01-28 | Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding | Akash Kumar et.al. | 2501.17053v1 | null |
2025-01-28 | Image-based Geo-localization for Robotics: Are Black-box Vision-Language Models there yet? | Sania Waheed et.al. | 2501.16947v1 | null |
2025-01-28 | Irony Detection, Reasoning and Understanding in Zero-shot Learning | Peiling Yi et.al. | 2501.16884v1 | null |
2025-01-28 | LLM Assisted Anomaly Detection Service for Site Reliability Engineers: Enhancing Cloud Infrastructure Resilience | Nimesh Jha et.al. | 2501.16744v1 | null |
2025-01-28 | B-RIGHT: Benchmark Re-evaluation for Integrity in Generalized Human-Object Interaction Testing | Yoojin Jang et.al. | 2501.16724v1 | link |
2025-01-28 | Polyp-Gen: Realistic and Diverse Polyp Image Generation for Endoscopic Dataset Expansion | Shengyuan Liu et.al. | 2501.16679v1 | link |
2025-01-27 | How well can LLMs Grade Essays in Arabic? | Rayed Ghazawi et.al. | 2501.16516v1 | null |
2025-01-27 | Generating customized prompts for Zero-Shot Rare Event Medical Image Classification using LLM | Payal Kamboj et.al. | 2501.16481v1 | link |
2025-01-28 | Upside Down Reinforcement Learning with Policy Generators | Jacopo Di Ventura et.al. | 2501.16288v2 | link |
2025-01-27 | Zero-Shot Decision Tree Construction via Large Language Models | Lucas Carrasco et.al. | 2501.16247v1 | null |
2025-01-27 | CLISC: Bridging clip and sam by enhanced cam for unsupervised brain tumor segmentation | Xiaochuan Ma et.al. | 2501.16246v1 | null |
2025-01-27 | SPECIAL: Zero-shot Hyperspectral Image Classification With CLIP | Li Pang et.al. | 2501.16222v1 | link |
2025-01-27 | Solving Turbulent Rayleigh-Bénard Convection using Fourier Neural Operators | Michiel Straat et.al. | 2501.16209v1 | null |
2025-01-27 | TimeHF: Billion-Scale Time Series Models Guided by Human Feedback | Yongzhi Qi et.al. | 2501.15942v1 | null |
2025-01-27 | SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model | Delin Qu et.al. | 2501.15830v1 | null |
2025-01-27 | MM-Retinal V2: Transfer an Elite Knowledge Spark into Fundus Vision-Language Pretraining | Ruiqi Wu et.al. | 2501.15798v1 | link |
2025-01-27 | GraphICL: Unlocking Graph Learning Potential in LLMs through Structured Prompt Design | Yuanfu Sun et.al. | 2501.15755v1 | null |
2025-01-26 | StochSync: Stochastic Diffusion Synchronization for Image Generation in Arbitrary Spaces | Kyeongmin Yeo et.al. | 2501.15445v1 | null |
2025-01-24 | Calibrating Wireless AI via Meta-Learned Context-Dependent Conformal Prediction | Seonghoon Yoo et.al. | 2501.14566v1 | null |
2025-01-24 | Large-scale and Fine-grained Vision-language Pre-training for Enhanced CT Image Understanding | Zhongyi Shui et.al. | 2501.14548v1 | link |
2025-01-24 | On Correlating Factors for Domain Adaptation Performance | Goksenin Yuksel et.al. | 2501.14466v1 | null |
2025-01-24 | Interpretability Analysis of Domain Adapted Dense Retrievers | Goksenin Yuksel et.al. | 2501.14459v1 | null |
2025-01-24 | Remining Hard Negatives for Generative Pseudo Labeled Domain Adaptation | Goksenin Yuksel et.al. | 2501.14434v1 | null |
2025-01-24 | GraphBC: Improving LLMs for Better Graph Data Processing | Xu Chu et.al. | 2501.14427v1 | null |
2025-01-24 | Kolmogorov Arnold Neural Interpolator for Downscaling and Correcting Meteorological Fields from In-Situ Observations | Zili Liu et.al. | 2501.14404v1 | null |
2025-01-24 | Learning Primitive Relations for Compositional Zero-Shot Learning | Insu Lee et.al. | 2501.14308v1 | null |
2025-01-24 | A Zero-Shot LLM Framework for Automatic Assignment Grading in Higher Education | Calvin Yeung et.al. | 2501.14305v1 | link |
2025-01-24 | PuzzleGPT: Emulating Human Puzzle-Solving Ability for Time and Location Prediction | Hammad Ayyubi et.al. | 2501.14210v1 | null |
2025-01-23 | Everyone-Can-Sing: Zero-Shot Singing Voice Synthesis and Conversion with Speech Reference | Shuqi Dai et.al. | 2501.13870v1 | null |
2025-01-23 | Dual-Modal Prototype Joint Learning for Compositional Zero-Shot Learning | Shiyu Zhang et.al. | 2501.13859v1 | null |
2025-01-23 | Training-Free Zero-Shot Temporal Action Detection with Vision-Language Models | Chaolei Han et.al. | 2501.13795v1 | null |
2025-01-23 | Tune In, Act Up: Exploring the Impact of Audio Modality-Specific Edits on Large Audio Language Models in Jailbreak | Erjia Xiao et.al. | 2501.13772v1 | null |
2025-01-23 | Training-Free Consistency Pipeline for Fashion Repose | Potito Aghilar et.al. | 2501.13692v1 | null |
2025-01-23 | Text-driven Online Action Detection | Manuel Benavent-Lledo et.al. | 2501.13518v1 | link |
2025-01-23 | Zero-Shot Trajectory Planning for Signal Temporal Logic Tasks | Ruijia Liu et.al. | 2501.13457v1 | null |
2025-01-23 | Scalable Evaluation Framework for Foundation Models in Musculoskeletal MRI Bridging Computational Innovation with Clinical Utility | Gabrielle Hoyer et.al. | 2501.13376v1 | link |
2025-01-23 | Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement | Jae-Sung Bae et.al. | 2501.13372v1 | null |
2025-01-22 | State Combinatorial Generalization In Decision Making With Conditional Diffusion Models | Xintong Duan et.al. | 2501.13241v1 | null |
2025-01-22 | Orchid: Image Latent Diffusion for Joint Appearance and Geometry Generation | Akshay Krishnan et.al. | 2501.13087v1 | null |
2025-01-22 | Evolution and The Knightian Blindspot of Machine Learning | Joel Lehman et.al. | 2501.13075v1 | null |
2025-01-22 | Beyond the Lungs: Extending the Field of View in Chest CT with Latent Diffusion Models | Lianrui Zuo et.al. | 2501.13068v1 | null |
2025-01-22 | Correctness Assessment of Code Generated by Large Language Models Using Internal Representations | Tuan-Dung Bui et.al. | 2501.12934v1 | link |
2025-01-22 | Patent Figure Classification using Large Vision-language Models | Sushil Awale et.al. | 2501.12751v1 | link |
2025-01-22 | Training Dialogue Systems by AI Feedback for Improving Overall Dialogue Impression | Kai Yoshida et.al. | 2501.12698v1 | null |
2025-01-22 | Combining Knowledge Graph and LLMs for Enhanced Zero-shot Visual Question Answering | Qian Tao et.al. | 2501.12697v1 | null |
2025-01-22 | Can masking background and object reduce static bias for zero-shot action recognition? | Takumi Fukuzawa et.al. | 2501.12681v1 | null |
2025-01-21 | fabSAM: A Farmland Boundary Delineation Method Based on the Segment Anything Model | Yufeng Xie et.al. | 2501.12487v1 | null |
2025-01-21 | Slot-BERT: Self-supervised Object Discovery in Surgical Video | Guiqiu Liao et.al. | 2501.12477v1 | null |
2025-01-21 | Video Depth Anything: Consistent Depth Estimation for Super-Long Videos | Sili Chen et.al. | 2501.12375v1 | null |
2025-01-21 | Zero-shot Bias Correction: Efficient MR Image Inhomogeneity Reduction Without Any Data | Hongxu Yang et.al. | 2501.12244v1 | null |
2025-01-21 | Survey on Monocular Metric Depth Estimation | Jiuling Zhang et.al. | 2501.11841v1 | null |
2025-01-20 | SimLabel: Consistency-Guided OOD Detection with Pretrained Vision-Language Models | Shu Zou et.al. | 2501.11485v1 | link |
2025-01-20 | MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching | Yepeng Liu et.al. | 2501.11299v1 | null |
2025-01-20 | KPL: Training-Free Medical Knowledge Mining of Vision-Language Models | Jiaxiang Liu et.al. | 2501.11231v1 | link |
2025-01-20 | Embedding-Driven Diversity Sampling to Improve Few-Shot Synthetic Data Generation | Ivan Lopez et.al. | 2501.11199v1 | null |
2025-01-19 | CART-MPC: Coordinating Assistive Devices for Robot-Assisted Transferring with Multi-Agent Model Predictive Control | Ruolin Ye et.al. | 2501.11149v1 | null |
2025-01-19 | Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective | Yiyao Yu et.al. | 2501.11110v1 | null |
2025-01-19 | Can LLM Generate Regression Tests for Software Commits? | Jing Liu et.al. | 2501.11086v1 | null |
2025-01-17 | FaceXBench: Evaluating Multimodal LLMs on Face Understanding | Kartik Narayan et.al. | 2501.10360v1 | link |
2025-01-17 | Zero-Shot Monocular Scene Flow Estimation in the Wild | Yiqing Liang et.al. | 2501.10357v1 | null |
2025-01-17 | Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics | Chenhao Li et.al. | 2501.10100v1 | null |
2025-01-17 | FiLo++: Zero-/Few-Shot Anomaly Detection by Fused Fine-Grained Descriptions and Deformable Localization | Zhaopeng Gu et.al. | 2501.10067v1 | link |
2025-01-17 | X-Dyna: Expressive Dynamic Human Image Animation | Di Chang et.al. | 2501.10021v1 | link |
2025-01-17 | Attention-guided Self-reflection for Zero-shot Hallucination Detection in Large Language Models | Qiang Liu et.al. | 2501.09997v1 | null |
2025-01-17 | GVMGen: A General Video-to-Music Generation Model with Hierarchical Attentions | Heda Zuo et.al. | 2501.09972v1 | null |
2025-01-17 | MultiPruner: Balanced Structure Removal in Foundation Models | J. Pablo Muñoz et.al. | 2501.09949v1 | link |
2025-01-17 | FoundationStereo: Zero-Shot Stereo Matching | Bowen Wen et.al. | 2501.09898v1 | link |
2025-01-17 | FLORA: Formal Language Model Enables Robust Training-free Zero-shot Object Referring Analysis | Zhe Chen et.al. | 2501.09887v1 | null |
2025-01-16 | Comparative Insights from 12 Machine Learning Models in Extracting Economic Ideology from Political Text | Jihed Ncib et.al. | 2501.09719v1 | null |
2025-01-16 | DEFOM-Stereo: Depth Foundation Model Based Stereo Matching | Hualie Jiang et.al. | 2501.09466v1 | link |
2025-01-16 | Double Visual Defense: Adversarial Pre-training and Instruction Tuning for Improving Vision-Language Model Robustness | Zeyu Wang et.al. | 2501.09446v1 | null |
2025-01-16 | Efficient Few-Shot Medical Image Analysis via Hierarchical Contrastive Vision-Language Learning | Harrison Fuller et.al. | 2501.09294v1 | null |
2025-01-16 | Text-guided Synthetic Geometric Augmentation for Zero-shot 3D Understanding | Kohei Torimi et.al. | 2501.09278v1 | null |
2025-01-15 | Few-Shot Adaptation of Training-Free Foundation Model for 3D Medical Image Segmentation | Xingxin He et.al. | 2501.09138v1 | null |
2025-01-15 | Tracking the Takes and Trajectories of English-Language News Narratives across Trustworthy and Worrisome Websites | Hans W. A. Hanley et.al. | 2501.09102v1 | link |
2025-01-15 | Multimodal LLMs Can Reason about Aesthetics in Zero-Shot | Ruixiang Jiang et.al. | 2501.09012v1 | link |
2025-01-15 | Exploring ChatGPT for Face Presentation Attack Detection in Zero and Few-Shot in-Context Learning | Alain Komaty et.al. | 2501.08799v1 | null |
2025-01-15 | StereoGen: High-quality Stereo Image Generation from a Single Image | Xianqi Wang et.al. | 2501.08654v1 | null |
2025-01-15 | MonSter: Marry Monodepth to Stereo Unleashes Power | Junda Cheng et.al. | 2501.08643v1 | link |
2025-01-15 | Towards Lightweight and Stable Zero-shot TTS with Self-distilled Representation Disentanglement | Qianniu Chen et.al. | 2501.08566v1 | null |
2025-01-14 | FLAVARS: A Multimodal Foundational Language and Vision Alignment Model for Remote Sensing | Isaac Corley et.al. | 2501.08490v1 | null |
2025-01-14 | Towards Zero-Shot & Explainable Video Description by Reasoning over Graphs of Events in Space and Time | Mihai Masala et.al. | 2501.08460v1 | null |
2025-01-14 | Toward Zero-Shot User Intent Recognition in Shared Autonomy | Atharv Belsare et.al. | 2501.08389v1 | null |
2025-01-14 | I Can Find You in Seconds! Leveraging Large Language Models for Code Authorship Attribution | Soohyeon Choi et.al. | 2501.08165v1 | null |
2025-01-14 | HydroelasticTouch: Simulation of Tactile Sensors with Hydroelastic Contact Surfaces | David P. Leins et.al. | 2501.08077v1 | null |
2025-01-14 | Skeleton and Font Generation Network for Zero-shot Chinese Character Generation | Mobai Xue et.al. | 2501.08062v1 | null |
2025-01-14 | Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models | Yifang Xu et.al. | 2501.07972v1 | null |
2025-01-13 | Constructing Set-Compositional and Negated Representations for First-Stage Ranking | Antonios Minas Krasakis et.al. | 2501.07679v1 | null |
2025-01-13 | BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations | Weixi Feng et.al. | 2501.07647v1 | null |
2025-01-13 | Investigating Large Language Models in Inferring Personality Traits from User Conversations | Jianfeng Zhu et.al. | 2501.07532v1 | null |
2025-01-13 | Zero-Shot Scene Understanding for Automatic Target Recognition Using Large Vision-Language Models | Yasiru Ranasinghe et.al. | 2501.07396v1 | null |
2025-01-13 | Exploring the Use of Contrastive Language-Image Pre-Training for Human Posture Classification: Insights from Yoga Pose Analysis | Andrzej D. Dobrzycki et.al. | 2501.07221v1 | null |
2025-01-14 | BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature | Alejandro Lozano et.al. | 2501.07171v2 | link |
2025-01-13 | Duplex: Dual Prototype Learning for Compositional Zero-Shot Learning | Zhong Peng et.al. | 2501.07114v1 | null |
2025-01-10 | OpenFOAMGPT: a RAG-Augmented LLM Agent for OpenFOAM-Based Computational Fluid Dynamics | Sandeep Pandey et.al. | 2501.06327v1 | null |
2025-01-10 | Learning Flexible Heterogeneous Coordination with Capability-Aware Shared Hypernetworks | Kevin Fu et.al. | 2501.06058v1 | link |
2025-01-10 | Generate, Transduct, Adapt: Iterative Transduction with VLMs | Oindrila Saha et.al. | 2501.06031v1 | null |
2025-01-10 | Low-Resource Text-to-Speech Synthesis Using Noise-Augmented Training of ForwardTacotron | Kishor Kayyar Lakshminarayana et.al. | 2501.05976v1 | null |
2025-01-10 | MARS6: A Small and Robust Hierarchical-Codec Text-to-Speech Model | Matthew Baas et.al. | 2501.05787v1 | null |
2025-01-10 | Super-class guided Transformer for Zero-Shot Attribute Classification | Sehyung Kim et.al. | 2501.05728v1 | link |
2025-01-10 | Zero-shot Shark Tracking and Biometrics from Aerial Imagery | Chinmay K Lalgudi et.al. | 2501.05717v1 | null |
2025-01-10 | The Impact of Model Scaling on Seen and Unseen Language Performance | Rhitabrat Pokharel et.al. | 2501.05629v1 | null |
2025-01-09 | FreeSVC: Towards Zero-shot Multilingual Singing Voice Conversion | Alef Iury Siqueira Ferreira et.al. | 2501.05586v1 | link |
2025-01-09 | Vision-Language Models for Autonomous Driving: CLIP-Based Dynamic Scene Understanding | Mohammed Elhenawy et.al. | 2501.05566v1 | null |
2025-01-09 | Improving Zero-Shot Object-Level Change Detection by Incorporating Visual Correspondence | Hung Huy Nguyen et.al. | 2501.05555v1 | link |
2025-01-09 | CellViT++: Energy-Efficient and Adaptive Cell Segmentation and Classification Using Foundation Models | Fabian Hörst et.al. | 2501.05269v1 | link |
2025-01-09 | Harnessing Large Language and Vision-Language Models for Robust Out-of-Distribution Detection | Pei-Kang Lee et.al. | 2501.05228v1 | null |
2025-01-09 | Leveraging Large Language Models for Zero-shot Lay Summarisation in Biomedicine and Beyond | Tomas Goldsack et.al. | 2501.05224v1 | null |
2025-01-09 | SpaLLM-Guard: Pairing SMS Spam Detection Using Open-source and Commercial LLMs | Muhammad Salman et.al. | 2501.04985v1 | null |
2025-01-08 | Test-Time Optimization for Domain Adaptive Open Vocabulary Segmentation | Ulindu De Silva et.al. | 2501.04696v1 | link |
2025-01-08 | Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding | Joshua Jones et.al. | 2501.04693v1 | null |
2025-01-08 | DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests | Charles Corbière et.al. | 2501.04671v1 | null |
2025-01-08 | A Statistical Theory of Contrastive Pre-training and Multimodal Generative AI | Kazusato Oko et.al. | 2501.04641v1 | link |
2025-01-09 | OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis | Run Luo et.al. | 2501.04561v2 | link |
2025-01-08 | Hidden Entity Detection from GitHub Leveraging Large Language Models | Lu Gan et.al. | 2501.04455v1 | link |
2025-01-08 | Dual-Force: Enhanced Offline Diversity Maximization under Imitation Constraints | Pavel Kolev et.al. | 2501.04426v1 | null |
2025-01-08 | ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training | Xinfa Zhu et.al. | 2501.04416v1 | null |
2025-01-08 | DispFormer: Pretrained Transformer for Flexible Dispersion Curve Inversion from Global Synthesis to Regional Applications | Feng Liu et.al. | 2501.04366v1 | link |
2025-01-08 | Online Gaussian Test-Time Adaptation of Vision-Language Models | Clément Fuchs et.al. | 2501.04352v1 | link |
2025-01-07 | Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection | Pablo Miralles-González et.al. | 2501.03940v1 | null |
2025-01-07 | ZDySS -- Zero-Shot Dynamic Scene Stylization using Gaussian Splatting | Abhishek Saroha et.al. | 2501.03875v1 | null |
2025-01-07 | Improving Dialectal Slot and Intent Detection with Auxiliary Tasks: A Multi-Dialectal Bavarian Case Study | Xaver Maria Krückl et.al. | 2501.03863v1 | link |
2025-01-07 | OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints | Mingjie Pan et.al. | 2501.03841v1 | null |
2025-01-07 | MADation: Face Morphing Attack Detection with Foundation Models | Eduarda Caldeira et.al. | 2501.03800v1 | link |
2025-01-07 | KAnoCLIP: Zero-Shot Anomaly Detection through Knowledge-Driven Prompt Learning and Enhanced Cross-Modal Integration | Chengyuan Li et.al. | 2501.03786v1 | null |
2025-01-07 | Context-Alignment: Activating and Enhancing LLM Capabilities in Time Series | Yuxiao Hu et.al. | 2501.03747v1 | null |
2025-01-07 | Realistic Test-Time Adaptation of Vision-Language Models | Maxime Zanella et.al. | 2501.03729v1 | link |
2025-01-07 | Exploring Optimal Latent Trajetory for Zero-shot Image Editing | Maomao Li et.al. | 2501.03631v1 | null |
2025-01-07 | LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment | Gaoussou Youssouf Kebe et.al. | 2501.03624v1 | null |
2025-01-06 | Gaussian Masked Autoencoders | Jathushan Rajasegaran et.al. | 2501.03229v1 | null |
2025-01-06 | GLiREL -- Generalist Model for Zero-Shot Relation Extraction | Jack Boylan et.al. | 2501.03172v1 | link |
2025-01-06 | Segment Anything Model for Zero-shot Single Particle Tracking in Liquid Phase Transmission Electron Microscopy | Risha Goel et.al. | 2501.03153v1 | link |
2025-01-07 | Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild | Wanpeng Hu et.al. | 2501.02964v2 | link |
2025-01-06 | Sim-to-Real Transfer for Mobile Robots with Reinforcement Learning: from NVIDIA Isaac Sim to Gazebo and Real ROS 2 Robots | Sahar Salimpour et.al. | 2501.02902v1 | link |
2025-01-06 | Universal Features Guided Zero-Shot Category-Level Object Pose Estimation | Wentian Qu et.al. | 2501.02831v1 | null |
2025-01-06 | Holistic Semantic Representation for Navigational Trajectory Generation | Ji Cao et.al. | 2501.02737v1 | link |
2025-01-06 | EAGLE: Enhanced Visual Grounding Minimizes Hallucinations in Instructional Multimodal Models | Andrés Villa et.al. | 2501.02699v1 | null |
2025-01-05 | LLMs Help Alleviate the Cross-Subject Variability in Brain Signal and Language Alignment | Yifei Liu et.al. | 2501.02621v1 | null |
2025-01-05 | CHAIR-Classifier of Hallucination as Improver | Ao Sun et.al. | 2501.02518v1 | link |
2025-01-03 | IGAF: Incremental Guided Attention Fusion for Depth Super-Resolution | Athanasios Tragakis et.al. | 2501.01723v1 | null |
2025-01-03 | LLMs & Legal Aid: Understanding Legal Needs Exhibited Through User Queries | Michal Kuk et.al. | 2501.01711v1 | null |
2025-01-03 | GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models | Zhangyang Qi et.al. | 2501.01428v2 | null |
2025-01-02 | VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control | Yuanpeng Tu et.al. | 2501.01427v1 | null |
2025-01-02 | Unifying Specialized Visual Encoders for Video Language Models | Jihoon Chung et.al. | 2501.01426v1 | link |
2025-01-03 | AdaptVC: High Quality Voice Conversion with Adaptive Learning | Jaehun Kim et.al. | 2501.01347v2 | null |
2025-01-02 | Digital Guardians: Can GPT-4, Perspective API, and Moderation API reliably detect hate speech in reader comments of German online newspapers? | Manuel Weber et.al. | 2501.01256v1 | null |
2025-01-02 | Automated Self-Refinement and Self-Correction for LLM-based Product Attribute Value Extraction | Alexander Brinkmann et.al. | 2501.01237v1 | link |
2025-01-02 | Symmetries-enhanced Multi-Agent Reinforcement Learning | Nikolaos Bousias et.al. | 2501.01136v1 | null |
2025-01-03 | MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization | Haina Zhu et.al. | 2501.01108v2 | link |
2025-01-02 | Are LLMs effective psychological assessors? Leveraging adaptive RAG for interpretable mental health screening through psychometric practice | Federico Ravenda et.al. | 2501.00982v1 | link |
2025-01-01 | Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model | Chenyang Liu et.al. | 2501.00895v1 | null |
2024-12-30 | QuantumLLMInstruct: A 500k LLM Instruction-Tuning Dataset with Problem-Solution Pairs for Quantum Computing | Shlomo Kashani et.al. | 2412.20956v1 | null |
2024-12-30 | Navigating Chemical-Linguistic Sharing Space with Heterogeneous Molecular Encoding | Liuzhenghao Lv et.al. | 2412.20888v1 | link |
2024-12-30 | TimeRAF: Retrieval-Augmented Foundation model for Zero-shot Time Series Forecasting | Huanyu Zhang et.al. | 2412.20810v1 | null |
2024-12-30 | Learning to Rank Pre-trained Vision-Language Models for Downstream Tasks | Yuhe Ding et.al. | 2412.20682v1 | null |
2024-12-29 | Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond) | Tomer Garber et.al. | 2412.20596v1 | link |
2024-12-27 | Zero-shot Hazard Identification in Autonomous Driving: A Case Study on the COOOL Benchmark | Lukas Picek et.al. | 2412.19944v1 | null |
2024-12-27 | EEG-Reptile: An Automatized Reptile-Based Meta-Learning Library for BCIs | Daniil A. Berdyshev et.al. | 2412.19725v1 | link |
2024-12-30 | VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models | Tao Wu et.al. | 2412.19645v2 | null |
2024-12-27 | MINIMA: Modality Invariant Image Matching | Xingyu Jiang et.al. | 2412.19412v1 | link |
2024-12-26 | Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment | Ziang Yan et.al. | 2412.19326v1 | link |
2024-12-26 | RecLM: Recommendation Instruction Tuning | Yangqin Jiang et.al. | 2412.19302v1 | link |
2024-12-26 | Time Series Foundational Models: Their Role in Anomaly Detection and Prediction | Chathurangi Shyalika et.al. | 2412.19286v1 | link |
2024-12-26 | Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval | Yang Du et.al. | 2412.19178v1 | link |
2024-12-26 | CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting | Siyu Jiao et.al. | 2412.19142v1 | null |
2024-12-26 | Semantic Residual for Multimodal Unified Discrete Representation | Hai Huang et.al. | 2412.19128v1 | null |
2024-12-26 | Advanced Knowledge Transfer: Refined Feature Distillation for Zero-Shot Quantization in Edge Computing | Inpyo Hong et.al. | 2412.19125v1 | link |
2024-12-24 | Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models | Zehan Wang et.al. | 2412.18605v1 | link |
2024-12-24 | ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation | Hongjie Li et.al. | 2412.18600v1 | null |
2024-12-24 | Distilling Fine-grained Sentiment Understanding from Large Language Models | Yice Zhang et.al. | 2412.18552v1 | link |
2024-12-24 | The Key of Understanding Vision Tasks: Explanatory Instructions | Yang Shen et.al. | 2412.18525v1 | link |
2024-12-24 | Multilingual Mathematical Reasoning: Advancing Open-Source LLMs in Hindi and English | Avinash Anand et.al. | 2412.18415v1 | link |
2024-12-24 | Extract Free Dense Misalignment from CLIP | JeongYeon Nam et.al. | 2412.18404v1 | link |
2024-12-24 | A Zero-Shot Physics-Informed Dictionary Learning Approach for Sound Field Reconstruction | Stefano Damiano et.al. | 2412.18348v1 | link |
2024-12-24 | Efficient and Context-Aware Label Propagation for Zero-/Few-Shot Training-Free Adaptation of Vision-Language Model | Yushu Li et.al. | 2412.18303v1 | null |
2024-12-24 | Quo Vadis, Anomaly Detection? LLMs and VLMs in the Spotlight | Xi Ding et.al. | 2412.18298v1 | link |
2024-12-24 | Improved Feature Generating Framework for Transductive Zero-shot Learning | Zihan Ye et.al. | 2412.18282v1 | null |
2024-12-23 | CiteBART: Learning to Generate Citations for Local Citation Recommendation | Ege Yiğit Çelik et.al. | 2412.17534v1 | link |
2024-12-23 | Multiple Consistency-guided Test-Time Adaptation for Contrastive Audio-Language Models with Unlabeled Audio | Gongyu Chen et.al. | 2412.17306v1 | null |
2024-12-23 | Discriminative Image Generation with Diffusion Models for Zero-Shot Learning | Dingjie Fu et.al. | 2412.17219v1 | null |
2024-12-22 | Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis | Ye-Xin Lu et.al. | 2412.16977v1 | null |
2024-12-22 | Self-Corrected Flow Distillation for Consistent One-Step and Few-Step Text-to-Image Generation | Quan Dao et.al. | 2412.16906v1 | null |
2024-12-22 | Autoregressive Speech Synthesis with Next-Distribution Prediction | Xinfa Zhu et.al. | 2412.16846v1 | null |
2024-12-21 | RoomPainter: View-Integrated Diffusion for Consistent Indoor Scene Texturing | Zhipeng Huang et.al. | 2412.16778v1 | null |
2024-12-21 | HyperCLIP: Adapting Vision-Language models with Hypernetworks | Victor Akinwande et.al. | 2412.16777v1 | null |
2024-12-21 | Large Language Model Can Be a Foundation for Hidden Rationale-Based Retrieval | Luo Ji et.al. | 2412.16615v1 | link |
2024-12-21 | Open-Vocabulary Mobile Manipulation Based on Double Relaxed Contrastive Learning with Dense Labeling | Daichi Yashima et.al. | 2412.16576v1 | link |
2024-12-20 | Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts | Muhammad Abdullah Sohail et.al. | 2412.16119v1 | link |
2024-12-20 | CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up | Songhua Liu et.al. | 2412.16112v1 | link |
2024-12-20 | Interleaved Speech-Text Language Models are Simple Streaming Text to Speech Synthesizers | Yifan Yang et.al. | 2412.16102v1 | null |
2024-12-20 | Fearful Falcons and Angry Llamas: Emotion Category Annotations of Arguments by Humans and LLMs | Lynn Greschner et.al. | 2412.15993v1 | null |
2024-12-20 | Watertox: The Art of Simplicity in Universal Attacks A Cross-Model Framework for Robust Adversarial Generation | Zhenghao Gao et.al. | 2412.15924v1 | null |
2024-12-20 | On the Suitability of pre-trained foundational LLMs for Analysis in German Legal Education | Lorenz Wendlinger et.al. | 2412.15902v1 | null |
2024-12-20 | AutoLife: Automatic Life Journaling with Smartphones and LLMs | Huatao Xu et.al. | 2412.15714v1 | null |
2024-12-20 | Cracking the Code: Evaluating Zero-Shot Prompting Methods for Providing Programming Feedback | Niklas Ippisch et.al. | 2412.15702v1 | null |
2024-12-20 | SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training | Wenxi Chen et.al. | 2412.15649v1 | link |
2024-12-20 | A New Method to Capturing Compositional Knowledge in Linguistic Space | Jiahe Wan et.al. | 2412.15632v1 | null |
2024-12-19 | Face the Facts! Evaluating RAG-based Fact-checking Pipelines in Realistic Settings | Daniel Russo et.al. | 2412.15189v1 | link |
2024-12-19 | STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning | Marius Memmel et.al. | 2412.15182v1 | null |
2024-12-19 | Adaptive Pruning for Large Language Models with Structural Importance Awareness | Haotian Zheng et.al. | 2412.15127v1 | null |
2024-12-19 | Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling | Leying Zhang et.al. | 2412.14890v1 | null |
2024-12-19 | Zero-Shot Artifact2Artifact: Self-incentive artifact removal for photoacoustic imaging without any data | Shuang Li et.al. | 2412.14873v1 | link |
2024-12-19 | Extending TWIG: Zero-Shot Predictive Hyperparameter Selection for KGEs based on Graph Structure | Jeffrey Sardina et.al. | 2412.14801v1 | null |
2024-12-19 | Beyond Guilt: Legal Judgment Prediction with Trichotomous Reasoning | Kepu Zhang et.al. | 2412.14588v1 | null |
2024-12-19 | MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval | Junjie Zhou et.al. | 2412.14475v1 | null |
2024-12-19 | WildSAT: Learning Satellite Image Representations from Wildlife Observations | Rangel Daroya et.al. | 2412.14428v1 | null |
2024-12-18 | I0T: Embedding Standardization Method Towards Zero Modality Gap | Na Min An et.al. | 2412.14384v1 | link |
2024-12-18 | Autoregressive Video Generation without Vector Quantization | Haoge Deng et.al. | 2412.14169v1 | link |
2024-12-18 | Incorporating Feature Pyramid Tokenization and Open Vocabulary Semantic Segmentation | Jianyu Zhang et.al. | 2412.14145v1 | null |
2024-12-18 | Foundation Models Meet Low-Cost Sensors: Test-Time Adaptation for Rescaling Disparity for Zero-Shot Metric Depth Estimation | Rémi Marsal et.al. | 2412.14103v1 | null |
2024-12-18 | FarExStance: Explainable Stance Detection for Farsi | Majid Zarharan et.al. | 2412.14008v1 | link |
2024-12-18 | Real Classification by Description: Extending CLIP's Limits of Part Attributes Recognition | Ethan Baron et.al. | 2412.13947v1 | null |
2024-12-18 | Memorizing SAM: 3D Medical Segment Anything Model with Memorizing Transformer | Xinyuan Shao et.al. | 2412.13908v1 | link |
2024-12-18 | Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models | Anna Scius-Bertrand et.al. | 2412.13859v1 | null |
2024-12-18 | SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor | Chenyu Yang et.al. | 2412.13786v1 | null |
2024-12-18 | G-VEval: A Versatile Metric for Evaluating Image and Video Captions Using GPT-4o | Tony Cheng Tong et.al. | 2412.13647v1 | link |
2024-12-18 | Reverse Region-to-Entity Annotation for Pixel-Level Visual Entity Linking | Zhengfei Xu et.al. | 2412.13614v1 | null |
2024-12-17 | GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding | Haoyi Jiang et.al. | 2412.13193v1 | link |
2024-12-17 | A Knowledge-enhanced Pathology Vision-language Foundation Model for Cancer Diagnosis | Xiao Zhou et.al. | 2412.13126v1 | null |
2024-12-17 | Enabling Low-Resource Language Retrieval: Establishing Baselines for Urdu MS MARCO | Umer Butt et.al. | 2412.12997v1 | link |
2024-12-17 | An Agentic Approach to Automatic Creation of P&ID Diagrams from Natural Language Descriptions | Shreeyash Gowaikar et.al. | 2412.12898v1 | null |
2024-12-17 | Question: How do Large Language Models perform on the Question Answering tasks? Answer: | Kevin Fischer et.al. | 2412.12893v1 | null |
2024-12-17 | MIVE: New Design and Benchmark for Multi-Instance Video Editing | Samuel Teodoro et.al. | 2412.12877v1 | null |
2024-12-17 | Comparative Analysis of Zero-Shot Capability of Time-Series Foundation Models in Short-Term Load Prediction | Nan Lin et.al. | 2412.12834v1 | null |
2024-12-17 | FocusChat: Text-guided Long Video Understanding via Spatiotemporal Information Filtering | Zheng Cheng et.al. | 2412.12833v1 | null |
2024-12-17 | Cross-Dialect Information Retrieval: Information Access in Low-Resource and High-Variance Languages | Robert Litschko et.al. | 2412.12806v1 | link |
2024-12-17 | ZoRI: Towards Discriminative Zero-Shot Remote Sensing Instance Segmentation | Shiqi Huang et.al. | 2412.12798v1 | link |
2024-12-16 | Causal Diffusion Transformers for Generative Modeling | Chaorui Deng et.al. | 2412.12095v1 | link |
2024-12-16 | CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology | Yuxuan Sun et.al. | 2412.12077v1 | null |
2024-12-16 | A LoRA is Worth a Thousand Pictures | Chenxi Liu et.al. | 2412.12048v1 | null |
2024-12-16 | Learning to Navigate in Mazes with Novel Layouts using Abstract Top-down Maps | Linfeng Zhao et.al. | 2412.12024v1 | null |
2024-12-16 | Cost-Effective Label-free Node Classification with LLMs | Taiyan Zhang et.al. | 2412.11983v1 | null |
2024-12-16 | Advancing Comprehensive Aesthetic Insight with Multi-Scale Text-Guided Self-Supervised Learning | Yuti Liu et.al. | 2412.11952v1 | null |
2024-12-16 | Stepwise Reasoning Error Disruption Attack of LLMs | Jingyu Peng et.al. | 2412.11934v1 | null |
2024-12-16 | PICLe: Pseudo-Annotations for In-Context Learning in Low-Resource Named Entity Detection | Sepideh Mamooler et.al. | 2412.11923v1 | null |
2024-12-16 | Improved Models for Media Bias Detection and Subcategorization | Tim Menzner et.al. | 2412.11835v1 | null |
2024-12-16 | A Distributed Collaborative Retrieval Framework Excelling in All Queries and Corpora based on Zero-shot Rank-Oriented Automatic Evaluation | Tian-Yi Che et.al. | 2412.11832v1 | null |
2024-12-13 | UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities | Muhammad Uzair Khattak et.al. | 2412.10372v1 | link |
2024-12-13 | Reasoner Outperforms: Generative Stance Detection with Rationalization for Social Media | Jiaqing Yuan et.al. | 2412.10266v1 | null |
2024-12-13 | Efficient Generative Modeling with Residual Vector Quantization-Based Tokens | Jaehyeon Kim et.al. | 2412.10208v1 | null |
2024-12-13 | Constraint-Aware Zero-Shot Vision-Language Navigation in Continuous Environments | Kehan Chen et.al. | 2412.10137v1 | null |
2024-12-13 | Familiarity: Better Evaluation of Zero-Shot Named Entity Recognition by Quantifying Label Shifts in Synthetic Training Data | Jonas Golde et.al. | 2412.10121v1 | link |
2024-12-13 | Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP | Yating Yu et.al. | 2412.09895v1 | link |
2024-12-13 | CP-DETR: Concept Prompt Guide DETR Toward Stronger Universal Object Detection | Qibo Chen et.al. | 2412.09799v1 | null |
2024-12-12 | Toward Foundation Model for Multivariate Wearable Sensing of Physiological Signals | Yunfei Luo et.al. | 2412.09758v1 | link |
2024-12-12 | Should We Learn Contact-Rich Manipulation Policies from Sampling-Based Planners? | Huaijiang Zhu et.al. | 2412.09743v1 | null |
2024-12-12 | TransferLight: Zero-Shot Traffic Signal Control on any Road-Network | Johann Schmidt et.al. | 2412.09719v1 | null |
2024-12-12 | EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM | Zhuofan Zong et.al. | 2412.09618v1 | null |
2024-12-12 | Learning to Adapt: Bio-Inspired Gait Strategies for Versatile Quadruped Locomotion | Joseph Humphreys et.al. | 2412.09440v1 | null |
2024-12-12 | Distribution free uncertainty quantification in neuroscience-inspired deep operators | Shailesh Garg et.al. | 2412.09369v1 | null |
2024-12-12 | Towards Open-Vocabulary Video Semantic Segmentation | Xinhao Li et.al. | 2412.09329v1 | link |
2024-12-12 | T-SVG: Text-Driven Stereoscopic Video Generation | Qiao Jin et.al. | 2412.09323v1 | null |
2024-12-12 | Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine | Xiaoshuang Huang et.al. | 2412.09278v1 | link |
2024-12-12 | Pinpoint Counterfactuals: Reducing social bias in foundation models via localized counterfactual generation | Kirill Sirotkin et.al. | 2412.09160v1 | null |
2024-12-12 | Evaluating Pixel Language Models on Non-Standardized Languages | Alberto Muñoz-Ortiz et.al. | 2412.09084v1 | null |
2024-12-12 | Cross-View Completion Models are Zero-shot Correspondence Estimators | Honggyu An et.al. | 2412.09072v1 | null |
2024-12-13 | An Efficient Framework for Enhancing Discriminative Models via Diffusion Techniques | Chunxiao Li et.al. | 2412.09063v2 | null |
2024-12-11 | RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation | Mingfei Han et.al. | 2412.08591v1 | null |
2024-12-11 | SenCLIP: Enhancing zero-shot land-use mapping for Sentinel-2 with ground-level prompting | Pallavi Jain et.al. | 2412.08536v1 | link |
2024-12-11 | SAM-Mamba: Mamba Guided SAM Architecture for Generalized Zero-Shot Polyp Segmentation | Tapas Kumar Dutta et.al. | 2412.08482v1 | link |
2024-12-11 | Assessing Personalized AI Mentoring with Large Language Models in the Computing Field | Xiao Luo et.al. | 2412.08430v1 | null |
2024-12-11 | Zero-Shot Mono-to-Binaural Speech Synthesis | Alon Levkovitch et.al. | 2412.08356v1 | null |
2024-12-11 | BEIR-NL: Zero-shot Information Retrieval Benchmark for the Dutch Language | Nikolay Banar et.al. | 2412.08329v1 | null |
2024-12-11 | Lightweight Method for Interactive 3D Medical Image Segmentation with Multi-Round Result Fusion | Bingzhi Shen et.al. | 2412.08315v1 | null |
2024-12-11 | 2M-BELEBELE: Highly Multilingual Speech and American Sign Language Comprehension Dataset | Marta R. Costa-jussà et.al. | 2412.08274v1 | null |
2024-12-11 | Large Language Models for Scholarly Ontology Generation: An Extensive Analysis in the Engineering Field | Tanay Aggarwal et.al. | 2412.08258v1 | link |
2024-12-11 | Can Graph Neural Networks Learn Language with Extremely Weak Text Supervision? | Zihao Li et.al. | 2412.08174v1 | null |
2024-12-10 | Video Motion Transfer with Diffusion Transformers | Alexander Pondaven et.al. | 2412.07776v1 | link |
2024-12-10 | From Slow Bidirectional to Fast Causal Video Generators | Tianwei Yin et.al. | 2412.07772v1 | null |
2024-12-11 | Test-time Correction with Human Feedback: An Online 3D Detection System via Visual Prompting | Zetong Yang et.al. | 2412.07768v2 | null |
2024-12-10 | SAT: Spatial Aptitude Training for Multimodal Language Models | Arijit Ray et.al. | 2412.07755v1 | null |
2024-12-10 | Zero-Shot ATC Coding with Large Language Models for Clinical Assessments | Zijian Chen et.al. | 2412.07743v1 | null |
2024-12-10 | DriveMM: All-in-One Large Multimodal Model for Autonomous Driving | Zhijian Huang et.al. | 2412.07689v1 | link |
2024-12-10 | Privacy-Preserving Customer Support: A Framework for Secure and Scalable Interactions | Anant Prakash Awasthi et.al. | 2412.07687v1 | null |
2024-12-10 | FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing | Yingying Deng et.al. | 2412.07517v1 | link |
2024-12-10 | ConfigX: Modular Configuration for Evolutionary Algorithms via Multitask Reinforcement Learning | Hongshu Guo et.al. | 2412.07507v1 | null |
2024-12-10 | Bilingual BSARD: Extending Statutory Article Retrieval to Dutch | Ehsan Lotfi et.al. | 2412.07462v1 | null |
2024-12-09 | Visual Lexicon: Rich Image Features in Language Space | XuDong Wang et.al. | 2412.06774v1 | null |
2024-12-09 | JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM | Takuro Fujii et.al. | 2412.06738v1 | link |
2024-12-09 | You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale | Baorui Ma et.al. | 2412.06699v1 | link |
2024-12-09 | Knowledge Transfer and Domain Adaptation for Fine-Grained Remote Sensing Image Segmentation | Shun Zhang et.al. | 2412.06664v1 | null |
2024-12-09 | LLM-BIP: Structured Pruning for Large Language Models with Block-Wise Forward Importance Propagation | Haihang Wu et.al. | 2412.06419v1 | null |
2024-12-09 | Continual Learning for Segment Anything Model Adaptation | Jinglong Yang et.al. | 2412.06418v1 | link |
2024-12-09 | ZeroKey: Point-Level Reasoning and Zero-Shot 3D Keypoint Detection from Large Language Models | Bingchen Gong et.al. | 2412.06292v1 | null |
2024-12-09 | No Annotations for Object Detection in Art through Stable Diffusion | Patrick Ramos et.al. | 2412.06286v1 | link |
2024-12-09 | DenseVLM: A Retrieval and Decoupled Alignment Framework for Open-Vocabulary Dense Prediction | Yunheng Li et.al. | 2412.06244v1 | null |
2024-12-09 | Evaluating and Mitigating Social Bias for Large Language Models in Open-ended Settings | Zhao Liu et.al. | 2412.06134v1 | link |
2024-12-06 | DenseMatcher: Learning 3D Semantic Correspondence for Category-Level Manipulation from a Single Demo | Junzhe Zhu et.al. | 2412.05268v1 | null |
2024-12-06 | Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization | Luca Masserano et.al. | 2412.05244v1 | null |
2024-12-06 | Towards Understanding the Role of Sharpness-Aware Minimization Algorithms for Out-of-Distribution Generalization | Samuel Schapiro et.al. | 2412.05169v1 | null |
2024-12-06 | A Practical Examination of AI-Generated Text Detectors for Large Language Models | Brian Tufts et.al. | 2412.05139v1 | null |
2024-12-06 | Can Large Language Models Serve as Effective Classifiers for Hierarchical Multi-Label Classification of Scientific Documents at Industrial Scale? | Seyed Amin Tabatabaei et.al. | 2412.05137v1 | null |
2024-12-06 | The Silent Prompt: Initial Noise as Implicit Guidance for Goal-Driven Image Generation | Ruoyu Wang et.al. | 2412.05101v1 | null |
2024-12-06 | HOLa: HoloLens Object Labeling | Michael Schwimmbeck et.al. | 2412.04945v1 | link |
2024-12-06 | | Xiaojie Yin et.al. | 2412.04925v1 | null |
2024-12-06 | StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching | Jixun Yao et.al. | 2412.04724v1 | null |
2024-12-06 | LLM-Align: Utilizing Large Language Models for Entity Alignment in Knowledge Graphs | Xuan Chen et.al. | 2412.04690v1 | null |
2024-12-05 | Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail | Luca Bartolomei et.al. | 2412.04472v1 | link |
2024-12-05 | Grounding Descriptions in Images informs Zero-Shot Visual Recognition | Shaunak Halbe et.al. | 2412.04429v1 | link |
2024-12-05 | SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding | Rong Li et.al. | 2412.04383v1 | null |
2024-12-05 | Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting | Edoardo Cetin et.al. | 2412.04368v1 | null |
2024-12-05 | Towards Zero-shot 3D Anomaly Localization | Yizhou Wang et.al. | 2412.04304v1 | null |
2024-12-05 | 3D Part Segmentation via Geometric Aggregation of 2D Visual Features | Marco Garosi et.al. | 2412.04247v1 | null |
2024-12-05 | Quantifying the Limits of Segment Anything Model: Analyzing Challenges in Segmenting Tree-Like and Low-Contrast Structures | Yixin Zhang et.al. | 2412.04243v1 | link |
2024-12-05 | Hipandas: Hyperspectral Image Joint Denoising and Super-Resolution by Image Fusion with the Panchromatic Image | Shuang Xu et.al. | 2412.04201v1 | null |
2024-12-05 | Unified Framework for Open-World Compositional Zero-shot Learning | Hirunima Jayasekara et.al. | 2412.04083v1 | link |
2024-12-05 | Towards Generalizable Autonomous Penetration Testing via Domain Randomization and Meta-Reinforcement Learning | Shicheng Zhou et.al. | 2412.04078v1 | link |
2024-12-04 | The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control | Ruili Feng et.al. | 2412.03568v1 | null |
2024-12-04 | FLAIR: VLM with Fine-grained Language-informed Image Representations | Rui Xiao et.al. | 2412.03561v1 | link |
2024-12-04 | Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression | Junjie Wen et.al. | 2412.03293v1 | null |
2024-12-04 | Expanding Event Modality Applications through a Robust CLIP-Based Encoder | Sungheon Jeong et.al. | 2412.03093v1 | null |
2024-12-04 | ASR-EC Benchmark: Evaluating Large Language Models on Chinese ASR Error Correction | Victor Junqiu Wei et.al. | 2412.03075v1 | null |
2024-12-04 | UTSD: Unified Time Series Diffusion Model | Xiangkai Ma et.al. | 2412.03068v1 | null |
2024-12-03 | A Novel Compact LLM Framework for Local, High-Privacy EHR Data Applications | Yixiang Qu et.al. | 2412.02868v1 | null |
2024-12-03 | Is Large-Scale Pretraining the Secret to Good Domain Generalization? | Piotr Teterwak et.al. | 2412.02856v1 | null |
2024-12-03 | Enhancing Robustness of CLIP to Common Corruptions through Bimodal Test-Time Adaptation | Sarthak Kumar Maharana et.al. | 2412.02837v1 | null |
2024-12-03 | Gaussian Splatting Under Attack: Investigating Adversarial Noise in 3D Objects | Abdurrahman Zeybey et.al. | 2412.02803v1 | null |
2024-12-03 | FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation | Kefan Chen et.al. | 2412.02690v1 | null |
2024-12-03 | Multimodal Remote Sensing Scene Classification Using VLMs and Dual-Cross Attention Networks | Jinjin Cai et.al. | 2412.02531v1 | null |
2024-12-03 | LoRA Diffusion: Zero-Shot LoRA Synthesis for Diffusion Model Personalization | Ethan Smith et.al. | 2412.02352v1 | null |
2024-12-03 | Improving Language Transfer Capability of Decoder-only Architecture in Multilingual Neural Machine Translation | Zhi Qu et.al. | 2412.02101v1 | link |
2024-12-03 | Gaussian Object Carver: Object-Compositional Gaussian Splatting with surfaces completion | Liu Liu et.al. | 2412.02075v1 | link |
2024-12-02 | PKRD-CoT: A Unified Chain-of-thought Prompting for Multi-Modal Large Language Models in Autonomous Driving | Xuewen Luo et.al. | 2412.02025v1 | null |
2024-12-04 | The use of large language models to enhance cancer clinical trial educational materials | Mingye Gao et.al. | 2412.01955v2 | null |
2024-12-02 | RandAR: Decoder-only Autoregressive Visual Generation in Random Orders | Ziqi Pang et.al. | 2412.01827v1 | null |
2024-12-02 | COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training | Sanghwan Kim et.al. | 2412.01814v1 | link |
2024-12-02 | Hard Constraint Guided Flow Matching for Gradient-Free Generation of PDE Solutions | Chaoran Cheng et.al. | 2412.01786v1 | null |
2024-12-02 | T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs | Shukang Yin et.al. | 2411.19951v2 | link |
2024-11-29 | Reverse Thinking Makes LLMs Stronger Reasoners | Justin Chih-Yao Chen et.al. | 2411.19865v1 | null |
2024-11-29 | Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures | Alain Riou et.al. | 2411.19806v1 | null |
2024-11-29 | Dual Risk Minimization: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models | Kaican Li et.al. | 2411.19757v1 | link |
2024-11-29 | Multimodal Whole Slide Foundation Model for Pathology | Tong Ding et.al. | 2411.19666v1 | link |
2024-11-29 | LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification | Taja Kuzman et.al. | 2411.19638v1 | link |
2024-11-29 | Diorama: Unleashing Zero-shot Single-view 3D Scene Modeling | Qirui Wu et.al. | 2411.19492v1 | null |
2024-11-29 | Proto Successor Measure: Representing the Space of All Possible Solutions of Reinforcement Learning | Siddhant Agarwal et.al. | 2411.19418v1 | null |
2024-11-28 | CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections | Mohamed Fazli Imam et.al. | 2411.19346v1 | link |
2024-11-28 | OMNI-DC: Highly Robust Depth Completion with Multiresolution Depth Integration | Yiming Zuo et.al. | 2411.19278v1 | link |
2024-11-27 | Diffusion Self-Distillation for Zero-Shot Customized Image Generation | Shengqu Cai et.al. | 2411.18616v1 | null |
2024-11-27 | Isolating authorship from content with semantic embeddings and contrastive learning | Javier Huertas-Tato et.al. | 2411.18472v1 | null |
2024-11-27 | SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation | Duc-Hai Pham et.al. | 2411.18229v1 | null |
2024-11-27 | DRS: Deep Question Reformulation With Structured Output | Zhecheng Li et.al. | 2411.17993v1 | link |
2024-11-26 | Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient | Zigeng Chen et.al. | 2411.17787v1 | link |
2024-11-26 | MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation | Harsh Singh et.al. | 2411.17636v1 | null |
2024-11-26 | ShowUI: One Vision-Language-Action Model for GUI Visual Agent | Kevin Qinghong Lin et.al. | 2411.17465v1 | link |
2024-11-26 | FLEX-CLIP: Feature-Level GEneration Network Enhanced CLIP for X-shot Cross-modal Retrieval | Jingyou Xie et.al. | 2411.17454v1 | null |
2024-11-26 | PEFTGuard: Detecting Backdoor Attacks Against Parameter-Efficient Fine-Tuning | Zhen Sun et.al. | 2411.17453v1 | null |
2024-11-26 | CoA: Chain-of-Action for Generative Semantic Labels | Meng Wei et.al. | 2411.17406v1 | link |
2024-11-26 | vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation | Bastian Wittmann et.al. | 2411.17386v1 | link |
2024-11-26 | 2D Matryoshka Training for Information Retrieval | Shuai Wang et.al. | 2411.17299v1 | link |
2024-11-26 | APT: Architectural Planning and Text-to-Blueprint Construction Using Large Language Models for Open-World Agents | Jun Yu Chen et.al. | 2411.17255v1 | link |
2024-11-26 | Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors | Zhengfei Kuang et.al. | 2411.17249v1 | null |
2024-11-26 | Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration | Junyuan Deng et.al. | 2411.17240v1 | link |
2024-11-25 | Diffusion Features for Zero-Shot 6DoF Object Pose Estimation | Bernd Von Gimborn et.al. | 2411.16668v1 | null |
2024-11-25 | Generating Out-Of-Distribution Scenarios Using Language Models | Erfan Aasi et.al. | 2411.16554v1 | null |
2024-11-25 | TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation | Linqing Zhong et.al. | 2411.16425v1 | null |
2024-11-25 | Poster: Could Large Language Models Perform Network Management? | Zine el abidine Kherroubi et.al. | 2411.16232v1 | null |
2024-11-25 | SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context | Jungang Li et.al. | 2411.16213v1 | null |
2024-11-25 | Learn from Foundation Model: Fruit Detection Model without Manual Annotation | Yanan Wang et.al. | 2411.16196v1 | link |
2024-11-25 | Language Driven Occupancy Prediction | Zhu Yu et.al. | 2411.16072v1 | link |
2024-11-25 | Style-Pro: Style-Guided Prompt Learning for Generalizable Vision-Language Models | Niloufar Alipour Talemi et.al. | 2411.16018v1 | null |
2024-11-24 | PIANIST: Learning Partially Observable World Models with LLMs for Multi-Agent Decision Making | Jonathan Light et.al. | 2411.15998v1 | null |
2024-11-24 | Segment to Recognize Robustly -- Enhancing Recognition by Image Decomposition | Klara Janouskova et.al. | 2411.15933v1 | null |
2024-11-22 | Context-Aware Multimodal Pretraining | Karsten Roth et.al. | 2411.15099v1 | null |
2024-11-22 | Task-Aware Robotic Grasping by evaluating Quality Diversity Solutions through Foundation Models | Aurel X. Appius et.al. | 2411.14917v1 | null |
2024-11-22 | Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation | Huy Le et.al. | 2411.14913v1 | null |
2024-11-22 | Leveraging Hierarchical Prototypes as the Verbalizer for Implicit Discourse Relation Recognition | Wanqiu Long et.al. | 2411.14880v1 | null |
2024-11-22 | VisGraphVar: A Benchmark Generator for Assessing Variability in Graph Analysis Using Large Vision-Language Models | Camilo Chacón Sartori et.al. | 2411.14832v1 | null |
2024-11-22 | De-biased Multimodal Electrocardiogram Analysis | Haitao Li et.al. | 2411.14795v1 | null |
2024-11-22 | Simplifying CLIP: Unleashing the Power of Large-Scale Models on Consumer-level Computers | Hongbo Liu et.al. | 2411.14789v1 | null |
2024-11-21 | Solving Zero-Shot 3D Visual Grounding as Constraint Satisfaction Problems | Qihao Yuan et.al. | 2411.14594v1 | link |
2024-11-21 | Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding | Yiming Zhang et.al. | 2411.14401v1 | null |
2024-11-21 | DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding | Tianhe Ren et.al. | 2411.14347v1 | link |
2024-11-21 | StereoCrafter-Zero: Zero-Shot Stereo Video Generation with Noisy Restart | Jian Shi et.al. | 2411.14295v1 | null |
2024-11-21 | Efficient Aspect-Based Summarization of Climate Change Reports with Small Language Models | Iacopo Ghinassi et.al. | 2411.14272v1 | link |
2024-11-21 | Generalizing End-To-End Autonomous Driving In Real-World Environments Using Zero-Shot LLMs | Zeyu Dong et.al. | 2411.14256v1 | null |
2024-11-21 | Evaluating the Robustness of Analogical Reasoning in Large Language Models | Martha Lewis et.al. | 2411.14215v1 | link |
2024-11-21 | Stereo Anything: Unifying Stereo Matching with Large-Scale Mixed Data | Xianda Guo et.al. | 2411.14053v1 | link |
2024-11-21 | Zero-Shot Low-Light Image Enhancement via Joint Frequency Domain Priors Guided Diffusion | Jinhong He et.al. | 2411.13961v1 | link |
2024-11-21 | Learning to Cooperate with Humans using Generative Agents | Yancheng Liang et.al. | 2411.13934v1 | link |
2024-11-21 | CLIPer: Hierarchically Improving Spatial Representation of CLIP for Open-Vocabulary Semantic Segmentation | Lin Sun et.al. | 2411.13836v1 | link |
2024-11-20 | Find Any Part in 3D | Ziqi Ma et.al. | 2411.13550v1 | null |
2024-11-20 | BIPro: Zero-shot Chinese Poem Generation via Block Inverse Prompting Constrained Generation Framework | Xu Zou et.al. | 2411.13237v1 | null |
2024-11-20 | Unlocking Historical Clinical Trial Data with ALIGN: A Compositional Large Language Model System for Medical Coding | Nabeel Seedat et.al. | 2411.13163v1 | null |
2024-11-20 | Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLM | Jiawei Yu et.al. | 2411.13159v1 | null |
2024-11-20 | Learning Time-Optimal and Speed-Adjustable Tactile In-Hand Manipulation | Johannes Pitz et.al. | 2411.13148v1 | null |
2024-11-20 | TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models | Xin Wang et.al. | 2411.13136v1 | null |
2024-11-20 | Training Physics-Driven Deep Learning Reconstruction without Raw Data Access for Equitable Fast MRI | Yaşar Utku Alçalar et.al. | 2411.13022v1 | null |
2024-11-20 | Evaluating LLMs Capabilities Towards Understanding Social Dynamics | Anique Tahir et.al. | 2411.13008v1 | null |
2024-11-19 | Improving Controllability and Editability for Pretrained Text-to-Music Generation Models | Yixiao Zhang et.al. | 2411.12641v1 | null |
2024-11-19 | Instant Policy: In-Context Imitation Learning via Graph Diffusion | Vitalis Vosylius et.al. | 2411.12633v1 | null |
2024-11-19 | SAM Carries the Burden: A Semi-Supervised Approach Refining Pseudo Labels for Medical Segmentation | Ron Keuth et.al. | 2411.12602v1 | link |
2024-11-19 | Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing | Ruyi Ding et.al. | 2411.12508v1 | null |
2024-11-19 | Predicting User Intents and Musical Attributes from Music Discovery Conversations | Daeyong Kwon et.al. | 2411.12254v1 | link |
2024-11-19 | Zero-Shot Crate Digging: DJ Tool Retrieval Using Speech Activity, Music Structure And CLAP Embeddings | Iroro Orife et.al. | 2411.12209v1 | link |
2024-11-19 | A More Advanced Group Polarization Measurement Approach Based on LLM-Based Agents and Graphs | Zixin Liu et.al. | 2411.12196v1 | null |
2024-11-19 | UrbanDiT: A Foundation Model for Open-World Urban Spatio-Temporal Learning | Yuan Yuan et.al. | 2411.12164v1 | link |
2024-11-19 | HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments | Shuijing Liu et.al. | 2411.12150v1 | null |
2024-11-18 | VLN-Game: Vision-Language Equilibrium Search for Zero-Shot Semantic Navigation | Bangguo Yu et.al. | 2411.11609v1 | null |
2024-11-18 | Unveiling the Inflexibility of Adaptive Embedding in Traffic Forecasting | Hongjun Wang et.al. | 2411.11448v1 | link |
2024-11-18 | Scalable Autoregressive Monocular Depth Estimation | Jinhong Wang et.al. | 2411.11361v1 | null |
2024-11-18 | Text-guided Zero-Shot Object Localization | Jingjing Wang et.al. | 2411.11357v1 | null |
2024-11-18 | Visual-Semantic Graph Matching Net for Zero-Shot Learning | Bowen Duan et.al. | 2411.11351v1 | link |
2024-11-18 | Zero-Shot Load Forecasting with Large Language Models | Wenlong Liao et.al. | 2411.11350v1 | null |
2024-11-18 | Transcending Language Boundaries: Harnessing LLMs for Low-Resource Language Translation | Peng Shu et.al. | 2411.11295v1 | null |
2024-11-18 | Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition | Yang Chen et.al. | 2411.11288v1 | null |
2024-11-18 | Zero-Shot Automatic Annotation and Instance Segmentation using LLM-Generated Datasets: Eliminating Field Imaging and Manual Annotation for Deep Learning Model Development | Ranjan Sapkota et.al. | 2411.11285v1 | null |
2024-11-18 | ZeFaV: Boosting Large Language Models for Zero-shot Fact Verification | Son T. Luu et.al. | 2411.11247v1 | link |
2024-11-15 | Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting | Ziqi Xie et.al. | 2411.10309v1 | link |
2024-11-15 | CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation | Dengke Zhang et.al. | 2411.10086v1 | null |
2024-11-15 | 'What did the Robot do in my Absence?' Video Foundation Models to Enhance Intermittent Supervision | Kavindie Katuwandeniya et.al. | 2411.10016v1 | null |
2024-11-15 | Zero-shot Voice Conversion with Diffusion Transformers | Songting Liu et.al. | 2411.09943v1 | link |
2024-11-14 | LLM Hallucination Reasoning with Zero-shot Knowledge Test | Seongmin Lee et.al. | 2411.09689v1 | null |
2024-11-14 | Script-centric behavior understanding for assisted autism spectrum disorder diagnosis | Wenxing Liu et.al. | 2411.09413v1 | null |
2024-11-14 | Less is More: Unseen Domain Fake News Detection via Causal Propagation Substructures | Shuzhi Gong et.al. | 2411.09389v1 | null |
2024-11-14 | Exploring Zero-Shot Anomaly Detection with CLIP in Medical Imaging: Are We There Yet? | Aldo Marzullo et.al. | 2411.09310v1 | null |
2024-11-14 | Mono2Stereo: Monocular Knowledge Transfer for Enhanced Stereo Matching | Yuran Wang et.al. | 2411.09151v1 | null |
2024-11-15 | UniHOI: Learning Fast, Dense and Generalizable 4D Reconstruction for Egocentric Hand Object Interaction Videos | Chengbo Yuan et.al. | 2411.09145v2 | null |
2024-11-13 | Zero-shot Cross-lingual Transfer Learning with Multiple Source and Target Languages for Information Extraction: Language Selection and Adversarial Training | Nghia Trung Ngo et.al. | 2411.08785v1 | null |
2024-11-13 | Measuring similarity between embedding spaces using induced neighborhood graphs | Tiago F. Tavares et.al. | 2411.08687v1 | null |
2024-11-13 | Zero-shot capability of SAM-family models for bone segmentation in CT scans | Caroline Magg et.al. | 2411.08629v1 | null |
2024-11-13 | Grammarization-Based Grasping with Deep Multi-Autoencoder Latent Space Exploration by Reinforcement Learning Agent | Leonidas Askianakis et.al. | 2411.08566v1 | null |
2024-11-13 | CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs | Suhas S Kowshik et.al. | 2411.08553v1 | null |
2024-11-13 | An Information Theoretic Approach to Operationalize Right to Data Protection | Abhinav Java et.al. | 2411.08506v1 | null |
2024-11-13 | Enhancing Multimodal Query Representation via Visual Dialogues for End-to-End Knowledge Retrieval | Yeong-Joon Ju et.al. | 2411.08334v1 | link |
2024-11-12 | Retrieval Augmented Time Series Forecasting | Kutay Tire et.al. | 2411.08249v1 | link |
2024-11-12 | Latent Space Disentanglement in Diffusion Transformers Enables Precise Zero-shot Semantic Editing | Zitao Shuai et.al. | 2411.08196v1 | null |
2024-11-12 | LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models | Anoop Cherian et.al. | 2411.08027v1 | null |
2024-11-12 | Semantic Sleuth: Identifying Ponzi Contracts via Large Language Models | Cong Wu et.al. | 2411.07498v1 | null |
2024-11-11 | Untangling Hate Speech Definitions: A Semantic Componential Analysis Across Cultures and Domains | Katerina Korre et.al. | 2411.07417v1 | null |
2024-11-11 | Warmstarting for Scaling Language Models | Neeratyoy Mallik et.al. | 2411.07340v1 | null |
2024-11-11 | DeepONet as a Multi-Operator Extrapolation Model: Distributed Pretraining with Physics-Informed Fine-Tuning | Zecheng Zhang et.al. | 2411.07239v1 | null |
2024-11-11 | The Super Weight in Large Language Models | Mengxia Yu et.al. | 2411.07191v1 | link |
2024-11-11 | NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics | David Robinson et.al. | 2411.07186v1 | null |
2024-11-11 | SAMPart3D: Segment Any Part in 3D Objects | Yunhan Yang et.al. | 2411.07184v1 | link |
2024-11-11 | Decoding Visual Experience and Mapping Semantics through Whole-Brain Analysis Using fMRI Foundation Models | Yanchen Wang et.al. | 2411.07121v1 | link |
2024-11-11 | Transformer verbatim in-context retrieval across time and scale | Kristijan Armeni et.al. | 2411.07075v1 | link |
2024-11-11 | MapSAM: Adapting Segment Anything Model for Automated Feature Detection in Historical Maps | Xue Xia et.al. | 2411.06971v1 | link |
2024-11-11 | Robust Fine-tuning of Zero-shot Models via Variance Reduction | Beier Zhu et.al. | 2411.06966v1 | link |
2024-11-11 | UMFC: Unsupervised Multi-Domain Feature Calibration for Vision-Language Models | Jiachen Liang et.al. | 2411.06921v1 | link |
2024-11-11 | Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning | Hongsheng Zhang et.al. | 2411.06764v1 | null |
2024-11-08 | End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering | Dylan Goetting et.al. | 2411.05755v1 | link |
2024-11-08 | Asterisk: Keep it Simple* | Andrew Semenov et.al. | 2411.05691v1 | null |
2024-11-08 | Assessing Open-Source Large Language Models on Argumentation Mining Subtasks | Mohammad Yeghaneh Abkenar et.al. | 2411.05639v1 | null |
2024-11-08 | An Early FIRST Reproduction and Improvements to Single-Token Decoding for Fast Listwise Reranking | Zijian Chen et.al. | 2411.05508v1 | null |
2024-11-08 | WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models | Shengda Fan et.al. | 2411.05451v1 | link |
2024-11-08 | Enhancing Visual Classification using Comparative Descriptors | Hankyeol Lee et.al. | 2411.05357v1 | link |
2024-11-08 | ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving | Tao Ma et.al. | 2411.05311v1 | null |
2024-11-07 | Abstract2Appendix: Academic Reviews Enhance LLM Long-Context Capabilities | Shengzhi Li et.al. | 2411.05232v1 | link |
2024-11-07 | Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation | Mu Yang et.al. | 2411.05141v1 | null |
2024-11-07 | SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation | Koichi Namekata et.al. | 2411.04989v1 | null |
2024-11-07 | DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning | Gaoyue Zhou et.al. | 2411.04983v1 | null |
2024-11-07 | Noisy Zero-Shot Coordination: Breaking The Common Knowledge Assumption In Zero-Shot Coordination Games | Usman Anwar et.al. | 2411.04976v1 | link |
2024-11-07 | In the Era of Prompt Learning with Vision-Language Models | Ankit Jha et.al. | 2411.04892v1 | null |
2024-11-07 | Zero-Shot Temporal Resolution Domain Adaptation for Spiking Neural Networks | Sanja Karilanova et.al. | 2411.04760v1 | null |
2024-11-07 | Vision Language Models are In-Context Value Learners | Yecheng Jason Ma et.al. | 2411.04549v1 | null |
2024-11-07 | Best Practices for Distilling Large Language Models into BERT for Web Search Ranking | Dezhi Ye et.al. | 2411.04539v1 | null |
2024-11-07 | Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models | Xinyu Zhang et.al. | 2411.04530v1 | null |
2024-11-07 | Enabling Adaptive Agent Training in Open-Ended Simulators by Targeting Diversity | Robby Costales et.al. | 2411.04466v1 | link |
2024-11-07 | AutoProteinEngine: A Large Language Model Driven Agent Framework for Multimodal AutoML in Protein Engineering | Yungeng Liu et.al. | 2411.04440v1 | link |
2024-11-06 | RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models | Maya Varma et.al. | 2411.04097v1 | link |
2024-11-06 | Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language Models | Minh Duc Bui et.al. | 2411.03888v1 | link |
2024-11-06 | SA3DIP: Segment Any 3D Instance with Potential 3D Priors | Xi Yang et.al. | 2411.03819v1 | link |
2024-11-06 | No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with Captions in 28 Languages | Youssef Mohamed et.al. | 2411.03769v1 | link |
2024-11-06 | Zero-shot Dynamic MRI Reconstruction with Global-to-local Diffusion Model | Yu Guan et.al. | 2411.03723v1 | link |
2024-11-06 | Fine-Tuning Vision-Language Model for Automated Engineering Drawing Information Extraction | Muhammad Tayyab Khan et.al. | 2411.03707v1 | null |
2024-11-06 | 3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement | Ziqi Lu et.al. | 2411.03706v1 | link |
2024-11-06 | Towards Scalable Automated Grading: Leveraging Large Language Models for Conceptual Question Evaluation in Engineering | Rujun Gao et.al. | 2411.03659v1 | null |
2024-11-05 | Exploring the Benefits of Domain-Pretraining of Generative Large Language Models for Chemistry | Anurag Acharya et.al. | 2411.03542v1 | null |
2024-11-05 | A Mamba Foundation Model for Time Series Forecasting | Haoyu Ma et.al. | 2411.02941v1 | null |
2024-11-05 | DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark | Haodong Li et.al. | 2411.02733v1 | link |
2024-11-04 | EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector | Deok-Hyeon Cho et.al. | 2411.02625v1 | link |
2024-11-04 | MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs | Sheng-Chieh Lin et.al. | 2411.02571v1 | null |
2024-11-04 | TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives | Maitreya Patel et.al. | 2411.02545v1 | null |
2024-11-04 | A Comparative Analysis of Instruction Fine-Tuning LLMs for Financial Text Classification | Sorouralsadat Fatemi et.al. | 2411.02476v1 | null |
2024-11-04 | Do Advanced Language Models Eliminate the Need for Prompt Engineering in Software Engineering? | Guoqing Wang et.al. | 2411.02093v1 | null |
2024-11-04 | CTEFM-VC: Zero-Shot Voice Conversion Based on Content-Aware Timbre Ensemble Modeling and Flow Matching | Yu Pan et.al. | 2411.02026v1 | null |
2024-11-04 | Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models | Sharat Agarwal et.al. | 2411.01925v1 | null |
2024-11-04 | ManiBox: Enhancing Spatial Grasping Generalization via Scalable Simulation Data Generation | Hengkai Tan et.al. | 2411.01850v1 | null |
2024-11-04 | DiffuMask-Editor: A Novel Paradigm of Integration Between the Segmentation Diffusion Model and Image Editing to Improve Segmentation Ability | Bo Gao et.al. | 2411.01819v1 | null |
2024-11-03 | Investigating Large Language Models for Complex Word Identification in Multilingual and Multidomain Setups | Răzvan-Alexandru Smădu et.al. | 2411.01706v1 | link |
2024-11-03 | Object segmentation from common fate: Motion energy processing enables human-like zero-shot generalization to random dot stimuli | Matthias Tangemann et.al. | 2411.01505v1 | link |
2024-11-02 | Task-Oriented Hierarchical Object Decomposition for Visuomotor Control | Jianing Qian et.al. | 2411.01284v1 | null |
2024-11-02 | MonoPlane: Exploiting Monocular Geometric Cues for Generalizable 3D Plane Reconstruction | Wang Zhao et.al. | 2411.01226v1 | link |
2024-11-02 | Transfer Learning for Finetuning Large Language Models | Tobias Strangmann et.al. | 2411.01195v1 | null |
2024-10-31 | DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models | Heng-Jui Chang et.al. | 2410.24177v1 | null |
2024-11-02 | | Kevin Black et.al. | 2410.24164v2 | null |
2024-10-31 | Scaling Concept With Text-Guided Diffusion Models | Chao Huang et.al. | 2410.24151v1 | null |
2024-10-31 | Matchmaker: Self-Improving Large Language Model Programs for Schema Matching | Nabeel Seedat et.al. | 2410.24105v1 | null |
2024-10-31 | In-Context Fine-Tuning for Time-Series Foundation Models | Abhimanyu Das et.al. | 2410.24087v1 | null |
2024-10-31 | GAMap: Zero-Shot Object Goal Navigation with Multi-Scale Geometric-Affordance Guidance | Shuaihang Yuan et.al. | 2410.23978v1 | null |
2024-10-31 | Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model | Hao Zhang et.al. | 2410.23905v1 | link |
2024-10-31 | EZ-HOI: VLM Adaptation via Guided Prompt Learning for Zero-Shot HOI Detection | Qinqian Lei et.al. | 2410.23904v1 | link |
2024-10-31 | The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge | Dake Guo et.al. | 2410.23815v1 | null |
2024-10-31 | RealMind: Zero-Shot EEG-Based Visual Decoding and Captioning Using Multi-Modal Models | Dongyang Li et.al. | 2410.23754v1 | null |
2024-10-30 | Multi-student Diffusion Distillation for Better One-step Generators | Yanke Song et.al. | 2410.23274v1 | null |
2024-10-30 | Partial Channel Dependence with Channel Masks for Time Series Foundation Models | Seunghan Lee et.al. | 2410.23222v1 | null |
2024-10-30 | Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks | Michael Matthews et.al. | 2410.23208v1 | link |
2024-10-30 | FlexTSF: A Universal Forecasting Model for Time Series with Variable Regularities | Jingge Xiao et.al. | 2410.23160v1 | link |
2024-10-30 | DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes | Jialiang Zhang et.al. | 2410.23004v1 | null |
2024-10-30 | SimpsonsVQA: Enhancing Inquiry-Based Learning with a Tailored Dataset | Ngoc Dung Huynh et.al. | 2410.22648v1 | null |
2024-10-30 | SleepNetZero: Zero-Burden Zero-Shot Reliable Sleep Staging With Neural Networks Based on Ballistocardiograms | Shuzhen Li et.al. | 2410.22646v1 | null |
2024-10-29 | RealCQA-V2 : Visual Premise Proving | Saleem Ahmed et.al. | 2410.22492v1 | null |
2024-10-29 | Local Policies Enable Zero-shot Long-horizon Manipulation | Murtaza Dalal et.al. | 2410.22332v1 | null |
2024-10-29 | Are Decoder-Only Large Language Models the Silver Bullet for Code Search? | Yuxuan Chen et.al. | 2410.22240v1 | link |
2024-10-29 | Active Learning for Vision-Language Models | Bardia Safaei et.al. | 2410.22187v1 | null |
2024-10-29 | Data Generation for Hardware-Friendly Post-Training Quantization | Lior Dikstein et.al. | 2410.22110v1 | link |
2024-10-29 | PACA: Perspective-Aware Cross-Attention Representation for Zero-Shot Scene Rearrangement | Shutong Jin et.al. | 2410.22059v1 | null |
2024-10-29 | Reliable Semantic Understanding for Real World Zero-shot Object Goal Navigation | Halil Utku Unlu et.al. | 2410.21926v1 | null |
2024-10-30 | Text-Guided Attention is All You Need for Zero-Shot Robustness in Vision-Language Models | Lu Yu et.al. | 2410.21802v2 | link |
2024-10-29 | Pushing the Limits of All-Atom Geometric Graph Neural Networks: Pre-Training, Scaling and Zero-Shot Transfer | Zihan Pengmei et.al. | 2410.21683v1 | null |
2024-10-28 | SandboxAQ's submission to MRL 2024 Shared Task on Multi-lingual Multi-task Information Retrieval | Isidora Chara Tourni et.al. | 2410.21501v1 | null |
2024-10-28 | SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization | Wanhua Li et.al. | 2410.21411v1 | link |
2024-10-28 | Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback | Nour Jedidi et.al. | 2410.21242v1 | null |
2024-10-28 | Towards Unifying Evaluation of Counterfactual Explanations: Leveraging Large Language Models for Human-Centric Assessments | Marharyta Domnich et.al. | 2410.21131v1 | link |
2024-10-28 | Retrieval-Enhanced Mutation Mastery: Augmenting Zero-Shot Prediction of Protein Language Model | Yang Tan et.al. | 2410.21127v1 | link |
2024-10-28 | Zero-Shot Action Recognition in Surveillance Videos | Joao Pereira et.al. | 2410.21113v1 | null |
2024-10-28 | Exploring the Reliability of Foundation Model-Based Frontier Selection in Zero-Shot Object Goal Navigation | Shuaihang Yuan et.al. | 2410.21037v1 | null |
2024-10-28 | Reference-Free Formula Drift with Reinforcement Learning: From Driving Data to Tire Energy-Inspired, Real-World Policies | Franck Djeumou et.al. | 2410.20990v1 | null |
2024-10-28 | DeTeCtive: Detecting AI-generated Text via Multi-Level Contrastive Learning | Xun Guo et.al. | 2410.20964v1 | link |
2024-10-28 | MrT5: Dynamic Token Merging for Efficient Byte-level Language Models | Julie Kallini et.al. | 2410.20771v1 | link |
2024-10-28 | Face-MLLM: A Large Face Perception Model | Haomiao Sun et.al. | 2410.20717v1 | null |
2024-10-28 | Reprogramming Pretrained Target-Specific Diffusion Models for Dual-Target Drug Design | Xiangxin Zhou et.al. | 2410.20688v1 | link |
2024-10-25 | Adversarial Environment Design via Regret-Guided Diffusion Models | Hojun Chung et.al. | 2410.19715v1 | null |
2024-10-25 | TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Xiangyu Zeng et.al. | 2410.19702v1 | null |
2024-10-25 | IPPON: Common Sense Guided Informative Path Planning for Object Goal Navigation | Kaixian Qu et.al. | 2410.19697v1 | null |
2024-10-25 | Context-Based Visual-Language Place Recognition | Soojin Woo et.al. | 2410.19341v1 | link |
2024-10-25 | Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting | Xingyu Zhu et.al. | 2410.19294v1 | null |
2024-10-24 | Label Set Optimization via Activation Distribution Kurtosis for Zero-shot Classification with Generative Models | Yue Li et.al. | 2410.19195v1 | null |
2024-10-24 | AlignCap: Aligning Speech Emotion Captioning to Human Preferences | Ziqi Liang et.al. | 2410.19134v1 | null |
2024-10-24 | ConceptDrift: Uncovering Biases through the Lens of Foundational Models | Cristian Daniel Păduraru et.al. | 2410.18970v1 | null |
2024-10-24 | BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning | Yujuan Velvin Fu et.al. | 2410.18955v1 | null |
2024-10-24 | SkillMimicGen: Automated Demonstration Generation for Efficient Skill Learning and Deployment | Caelan Garrett et.al. | 2410.18907v1 | null |
2024-10-24 | Probabilistic Language-Image Pre-Training | Sanghyuk Chun et.al. | 2410.18857v1 | link |
2024-10-24 | Task Calibration: Calibrating Large Language Models on Inference Tasks | Yingjie Li et.al. | 2410.18764v1 | null |
2024-10-24 | Data Scaling Laws in Imitation Learning for Robotic Manipulation | Fanqi Lin et.al. | 2410.18647v1 | link |
2024-10-24 | Knowledge Distillation Using Frontier Open-source LLMs: Generalizability and the Role of Synthetic Data | Anup Shirgaonkar et.al. | 2410.18588v1 | null |
2024-10-24 | Zero-shot Object Navigation with Vision-Language Models Reasoning | Congcong Wen et.al. | 2410.18570v1 | null |
2024-10-24 | Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics | Jinghao Hu et.al. | 2410.18537v1 | null |
2024-10-24 | Scaling up Masked Diffusion Models on Text | Shen Nie et.al. | 2410.18514v1 | link |
2024-10-23 | Key Algorithms for Keyphrase Generation: Instruction-Based LLMs for Russian Scientific Keyphrases | Anna Glazkova et.al. | 2410.18040v1 | null |
2024-10-23 | Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models | Nils Blank et.al. | 2410.17772v1 | null |
2024-10-23 | Learning Versatile Skills with Curriculum Masking | Yao Tang et.al. | 2410.17744v1 | link |
2024-10-23 | Entity-based Reinforcement Learning for Autonomous Cyber Defence | Isaac Symes Thompson et.al. | 2410.17647v1 | link |
2024-10-23 | Incremental Learning of Affordances using Markov Logic Networks | George Potter et.al. | 2410.17624v1 | null |
2024-10-23 | Graphusion: A RAG Framework for Knowledge Graph Construction with a Global Perspective | Rui Yang et.al. | 2410.17600v1 | link |
2024-10-23 | Multimodal Information Bottleneck for Deep Reinforcement Learning with Multiple Sensors | Bang You et.al. | 2410.17551v1 | null |
2024-10-23 | Generalizable Motion Planning via Operator Learning | Sharath Matada et.al. | 2410.17547v1 | null |
2024-10-23 | X-MOBILITY: End-To-End Generalizable Navigation via World Modeling | Wei Liu et.al. | 2410.17491v1 | link |
2024-10-22 | Denoise-I2W: Mapping Images to Denoising Words for Accurate Zero-Shot Composed Image Retrieval | Yuanmin Tang et.al. | 2410.17393v1 | null |
2024-10-22 | Altogether: Image Captioning via Re-aligning Alt-text | Hu Xu et.al. | 2410.17251v1 | link |
2024-10-22 | LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias | Haian Jin et.al. | 2410.17242v1 | null |
2024-10-22 | Are Visual-Language Models Effective in Action Recognition? A Comparative Study | Mahmoud Ali et.al. | 2410.17149v1 | null |
2024-10-22 | LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging | Ke Wang et.al. | 2410.17146v1 | link |
2024-10-22 | SG-FSM: A Self-Guiding Zero-Shot Prompting Paradigm for Multi-Hop Question Answering Based on Finite State Machine | Xiaochen Wang et.al. | 2410.17021v1 | null |
2024-10-22 | Towards Real Zero-Shot Camouflaged Object Segmentation without Camouflaged Annotations | Cheng Lei et.al. | 2410.16953v1 | null |
2024-10-22 | DNAHLM -- DNA sequence and Human Language mixed large language Model | Wang Liang et.al. | 2410.16917v1 | link |
2024-10-22 | AttriPrompter: Auto-Prompting with Attribute Semantics for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models | Yongjian Wu et.al. | 2410.16820v1 | link |
2024-10-22 | PLDR-LLM: Large Language Model from Power Law Decoder Representations | Burc Gokden et.al. | 2410.16703v1 | link |
2024-10-22 | GE2E-KWS: Generalized End-to-End Training and Evaluation for Zero-shot Keyword Spotting | Pai Zhu et.al. | 2410.16647v1 | null |
2024-10-21 | MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report | Samrajya Thapa et.al. | 2410.16239v1 | link |
2024-10-21 | IBGP: Imperfect Byzantine Generals Problem for Zero-Shot Robustness in Communicative Multi-Agent Systems | Yihuan Mao et.al. | 2410.16237v1 | null |
2024-10-21 | Continuous Speech Synthesis using per-token Latent Diffusion | Arnon Turetzky et.al. | 2410.16048v1 | null |
2024-10-21 | Few-shot target-driven instance detection based on open-vocabulary object detection models | Ben Crulis et.al. | 2410.16028v1 | null |
2024-10-21 | Zero-Shot Scene Reconstruction from Single Images with Deep Prior Assembly | Junsheng Zhou et.al. | 2410.15971v1 | null |
2024-10-21 | Mitigating Object Hallucination via Concentric Causal Attention | Yun Xing et.al. | 2410.15926v1 | link |
2024-10-21 | MI-VisionShot: Few-shot adaptation of vision-language models for slide-level classification of histopathological images | Pablo Meseguer et.al. | 2410.15881v1 | null |
2024-10-21 | Triplane Grasping: Efficient 6-DoF Grasping with Single RGB Images | Yiming Li et.al. | 2410.15879v1 | null |
2024-10-21 | FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL | Woosung Koh et.al. | 2410.15876v1 | null |
2024-10-21 | Unleashing the Potential of Vision-Language Pre-Training for 3D Zero-Shot Lesion Segmentation via Mask-Attribute Alignment | Yankai Jiang et.al. | 2410.15744v1 | null |
2024-10-18 | BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities | Shaozhe Hao et.al. | 2410.14672v1 | link |
2024-10-18 | Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum | Ryan Soh-Eun Shim et.al. | 2410.14589v1 | null |
2024-10-18 | SylloBio-NLI: Evaluating Large Language Models on Biomedical Syllogistic Reasoning | Magdalena Wysocka et.al. | 2410.14399v1 | null |
2024-10-18 | AnomalyNCD: Towards Novel Anomaly Class Discovery in Industrial Scenarios | Ziming Huang et.al. | 2410.14379v1 | link |
2024-10-18 | Zero-shot Action Localization via the Confidence of Large Vision-Language Models | Josiah Aklilu et.al. | 2410.14340v1 | null |
2024-10-18 | Storyboard guided Alignment for Fine-grained Video Action Recognition | Enqi Liu et.al. | 2410.14238v1 | null |
2024-10-18 | Assessing Open-world Forgetting in Generative Image Model Customization | Héctor Laria et.al. | 2410.14159v1 | null |
2024-10-17 | Measuring and Modifying the Readability of English Texts with GPT-4 | Sean Trott et.al. | 2410.14028v1 | link |
2024-10-17 | Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens | Lijie Fan et.al. | 2410.13863v1 | null |
2024-10-17 | VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding | Runsen Xu et.al. | 2410.13860v1 | link |
2024-10-17 | DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control | Yujie Wei et.al. | 2410.13830v1 | null |
2024-10-17 | AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents | Ke Yang et.al. | 2410.13825v1 | null |
2024-10-17 | Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers | Yuchen Liang et.al. | 2410.13746v1 | null |
2024-10-17 | ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions | Shailaja Keyur Sampat et.al. | 2410.13662v1 | link |
2024-10-17 | Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts? | Shailaja Keyur Sampat et.al. | 2410.13651v1 | link |
2024-10-18 | Enhanced Prompt-leveraged Weakly Supervised Cancer Segmentation based on Segment Anything | Joonhyeon Song et.al. | 2410.13621v2 | link |
2024-10-17 | Large Language Models as Narrative-Driven Recommenders | Lukas Eberhard et.al. | 2410.13604v1 | null |
2024-10-17 | Representing Model Weights with Language using Tree Experts | Eliahu Horwitz et.al. | 2410.13569v1 | null |
2024-10-16 | In-Context Learning Enables Robot Action Prediction in LLMs | Yida Yin et.al. | 2410.12782v1 | null |
2024-10-16 | Towards Zero-Shot Camera Trap Image Categorization | Jiří Vyskočil et.al. | 2410.12769v1 | null |
2024-10-16 | Towards Graph Foundation Models: The Perspective of Zero-shot Reasoning on Knowledge Graphs | Kai Wang et.al. | 2410.12609v1 | null |
2024-10-16 | A Claim Decomposition Benchmark for Long-form Answer Verification | Zhihao Zhang et.al. | 2410.12558v1 | link |
2024-10-16 | SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling | Loris Gaven et.al. | 2410.12481v1 | null |
2024-10-16 | SF-Speech: Straightened Flow for Zero-Shot Voice Clone on Small-Scale Dataset | Xuyuan Li et.al. | 2410.12399v1 | null |
2024-10-16 | ERVQ: Enhanced Residual Vector Quantization with Intra-and-Inter-Codebook Optimization for Neural Audio Codecs | Rui-Chen Zheng et.al. | 2410.12359v1 | null |
2024-10-16 | MAX: Masked Autoencoder for X-ray Fluorescence in Geological Investigation | An-Sheng Lee et.al. | 2410.12330v1 | link |
2024-10-16 | Evaluating Cascaded Methods of Vision-Language Models for Zero-Shot Detection and Association of Hardhats for Increased Construction Safety | Lucas Choi et.al. | 2410.12225v1 | null |
2024-10-15 | Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming | Yilun Hao et.al. | 2410.12112v1 | null |
2024-10-15 | FoundTS: Comprehensive and Unified Benchmarking of Foundation Models for Time Series Forecasting | Zhe Li et.al. | 2410.11802v1 | null |
2024-10-15 | Time-Series Foundation Model for Value-at-Risk | Anubha Goel et.al. | 2410.11773v1 | link |
2024-10-15 | Zero-shot Model-based Reinforcement Learning using Large Language Models | Abdelhakim Benechehab et.al. | 2410.11711v1 | link |
2024-10-15 | PSVMA+: Exploring Multi-granularity Semantic-visual Adaption for Generalized Zero-shot Learning | Man Liu et.al. | 2410.11560v1 | null |
2024-10-15 | AGENTiGraph: An Interactive Knowledge Graph Platform for LLM-based Chatbots Utilizing Private Data | Xinjie Zhao et.al. | 2410.11531v1 | null |
2024-10-15 | Leveraging LLM Embeddings for Cross Dataset Label Alignment and Zero Shot Music Emotion Prediction | Renhang Liu et.al. | 2410.11522v1 | link |
2024-10-15 | Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement | Zhi Wang et.al. | 2410.11448v1 | link |
2024-10-15 | DRACO: A Denoising-Reconstruction Autoencoder for Cryo-EM | Yingjun Shen et.al. | 2410.11373v1 | null |
2024-10-15 | Enhance Graph Alignment for Large Language Models | Haitong Luo et.al. | 2410.11370v1 | null |
2024-10-15 | In-Context Learning for Long-Context Sentiment Analysis on Infrastructure Project Opinions | Alireza Shamshiri et.al. | 2410.11265v1 | null |
2024-10-14 | Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models | Jingzhi Bao et.al. | 2410.10821v1 | link |
2024-10-14 | Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations | Litu Rout et.al. | 2410.10792v1 | null |
2024-10-14 | SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators | Rasoul Shafipour et.al. | 2410.10714v1 | null |
2024-10-14 | MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer | Minghao Zhu et.al. | 2410.10589v1 | link |
2024-10-14 | Recipe for Zero-shot POS Tagging: Is It Useful in Realistic Scenarios? | Zeno Vandenbulcke et.al. | 2410.10576v1 | null |
2024-10-14 | Continual Learning Improves Zero-Shot Action Recognition | Shreyank N Gowda et.al. | 2410.10497v1 | null |
2024-10-14 | Learning to Ground VLMs without Forgetting | Aritra Bhowmik et.al. | 2410.10491v1 | null |
2024-10-14 | Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts | Xu Liu et.al. | 2410.10469v1 | null |
2024-10-14 | 4DStyleGaussian: Zero-shot 4D Style Transfer with Gaussian Splatting | Wanlin Liang et.al. | 2410.10412v1 | null |
2024-10-14 | GIFT-Eval: A Benchmark For General Time Series Forecasting Model Evaluation | Taha Aksu et.al. | 2410.10393v1 | link |
2024-10-11 | Extra Global Attention Designation Using Keyword Detection in Sparse Transformer Architectures | Evan Lucas et.al. | 2410.08971v1 | null |
2024-10-11 | NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models | Zheng Yi Ho et.al. | 2410.08970v1 | null |
2024-10-11 | Zero-Shot Pupil Segmentation with SAM 2: A Case Study of Over 14 Million Images | Virmarie Maquiling et.al. | 2410.08926v1 | null |
2024-10-11 | SegGrasp: Zero-Shot Task-Oriented Grasping via Semantic and Geometric Guided Segmentation | Haosheng Li et.al. | 2410.08901v1 | null |
2024-10-11 | A Benchmark for Cross-Domain Argumentative Stance Classification on Social Media | Jiaqing Yuan et.al. | 2410.08900v1 | null |
2024-10-11 | RoRA-VLM: Robust Retrieval-Augmented Vision Language Models | Jingyuan Qi et.al. | 2410.08876v1 | null |
2024-10-11 | One-shot Generative Domain Adaptation in 3D GANs | Ziqiang Li et.al. | 2410.08824v1 | link |
2024-10-11 | Zero-Shot Offline Imitation Learning via Optimal Transport | Thomas Rupf et.al. | 2410.08751v1 | link |
2024-10-11 | Chain-of-Restoration: Multi-Task Image Restoration Models are Zero-Shot Step-by-Step Universal Image Restorers | Jin Cao et.al. | 2410.08688v1 | link |
2024-10-11 | Boosting Open-Vocabulary Object Detection by Handling Background Samples | Ruizhe Zeng et.al. | 2410.08645v1 | null |
2024-10-10 | LatteCLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts | Anh-Quan Cao et.al. | 2410.08211v1 | null |
2024-10-10 | SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation | Hang Yin et.al. | 2410.08189v1 | null |
2024-10-10 | On the Evaluation of Generative Robotic Simulations | Feng Chen et.al. | 2410.08172v1 | null |
2024-10-10 | ZeroComp: Zero-shot Object Compositing from Image Intrinsics via Diffusion | Zitian Zhang et.al. | 2410.08168v1 | link |
2024-10-10 | Constrained Skill Discovery: Quadruped Locomotion with Unsupervised Reinforcement Learning | Vassil Atanassov et.al. | 2410.07877v1 | null |
2024-10-10 | RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation | Songming Liu et.al. | 2410.07864v1 | link |
2024-10-10 | Rewriting Conversational Utterances with Instructed Large Language Models | Elnara Galimzhanova et.al. | 2410.07797v1 | null |
2024-10-10 | The Power of Input: Benchmarking Zero-Shot Sim-To-Real Transfer of Reinforcement Learning Control Policies for Quadrotor Control | Alberto Dionigi et.al. | 2410.07686v1 | null |
2024-10-10 | Parallel Digital Twin-driven Deep Reinforcement Learning for User Association and Load Balancing in Dynamic Wireless Networks | Zhenyu Tao et.al. | 2410.07611v1 | null |
2024-10-10 | CSA: Data-efficient Mapping of Unimodal Features to Multimodal Features | Po-han Li et.al. | 2410.07610v1 | null |
2024-10-09 | AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation | Yukang Cao et.al. | 2410.07164v1 | null |
2024-10-09 | Exploring the Readiness of Prominent Small Language Models for the Democratization of Financial Literacy | Tagore Rao Kosireddy et.al. | 2410.07118v1 | link |
2024-10-09 | Collusion Detection with Graph Neural Networks | Lucas Gomes et.al. | 2410.07091v1 | null |
2024-10-09 | Stanceformer: Target-Aware Transformer for Stance Detection | Krishna Garg et.al. | 2410.07083v1 | link |
2024-10-09 | Compositional Entailment Learning for Hyperbolic Vision-Language Models | Avik Pal et.al. | 2410.06912v1 | link |
2024-10-09 | F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching | Yushen Chen et.al. | 2410.06885v1 | link |
2024-10-09 | K-SAM: A Prompting Method Using Pretrained U-Net to Improve Zero Shot Performance of SAM on Lung Segmentation in CXR Images | Mohamed Deriche et.al. | 2410.06825v1 | null |
2024-10-09 | Toward Physics-guided Time Series Embedding | Jiaxi Hu et.al. | 2410.06651v1 | null |
2024-10-09 | Open-RGBT: Open-vocabulary RGB-T Zero-shot Semantic Segmentation in Open-world Environments | Meng Yu et.al. | 2410.06626v1 | null |
2024-10-09 | DCP: Learning Accelerator Dataflow for Neural Network via Propagation | Peng Xu et.al. | 2410.06553v1 | null |
2024-10-07 | Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality | Youngtaek Oh et.al. | 2410.05210v1 | link |
2024-10-07 | ZEBRA: Zero-Shot Example-Based Retrieval Augmentation for Commonsense Question Answering | Francesco Maria Molfese et.al. | 2410.05077v1 | link |
2024-10-07 | PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing | Feng Tian et.al. | 2410.04844v1 | link |
2024-10-07 | LPZero: Language Model Zero-cost Proxy Search from Zero | Peijie Dong et.al. | 2410.04808v1 | null |
2024-10-07 | Building Damage Assessment in Conflict Zones: A Deep Learning Approach Using Geospatial Sub-Meter Resolution Data | Matteo Risso et.al. | 2410.04802v1 | null |
2024-10-07 | Improving Image Clustering with Artifacts Attenuation via Inference-Time Attention Engineering | Kazumoto Nakamura et.al. | 2410.04801v1 | null |
2024-10-07 | Document-level Causal Relation Extraction with Knowledge-guided Binary Question Answering | Zimu Wang et.al. | 2410.04752v1 | null |
2024-10-07 | ACDC: Autoregressive Coherent Multimodal Generation using Diffusion Correction | Hyungjin Chung et.al. | 2410.04721v1 | null |
2024-10-07 | Demo of Zero-Shot Guitar Amplifier Modelling: Enhancing Modeling with Hyper Neural Networks | Yu-Hua Chen et.al. | 2410.04702v1 | null |
2024-10-07 | SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech | Minchan Kim et.al. | 2410.04690v1 | null |
2024-10-04 | GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs | Pu Hua et.al. | 2410.03645v1 | null |
2024-10-04 | What Matters for Model Merging at Scale? | Prateek Yadav et.al. | 2410.03617v1 | null |
2024-10-04 | Table Question Answering for Low-resourced Indic Languages | Vaishali Pal et.al. | 2410.03576v1 | link |
2024-10-04 | STREAMS: An Assistive Multimodal AI Framework for Empowering Biosignal Based Robotic Controls | Ali Rabiee et.al. | 2410.03486v1 | null |
2024-10-04 | Zero-Shot Fact Verification via Natural Logic and Large Language Models | Marek Strong et.al. | 2410.03341v1 | link |
2024-10-04 | Selective Test-Time Adaptation for Unsupervised Anomaly Detection using Neural Implicit Representations | Sameer Ambekar et.al. | 2410.03306v1 | link |
2024-10-04 | Comparing zero-shot self-explanations with human rationales in multilingual text classification | Stephanie Brandl et.al. | 2410.03296v1 | null |
2024-10-04 | Enhanced Transformer architecture for in-context learning of dynamical systems | Matteo Rufolo et.al. | 2410.03291v1 | null |
2024-10-04 | What do Large Language Models Need for Machine Translation Evaluation? | Shenbin Qian et.al. | 2410.03278v1 | link |
2024-10-04 | PersoBench: Benchmarking Personalized Response Generation in Large Language Models | Saleh Afzoon et.al. | 2410.03198v1 | null |
2024-10-03 | Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations | Nick Jiang et.al. | 2410.02762v1 | link |
2024-10-03 | Training Language Models on Synthetic Edit Sequences Improves Code Synthesis | Ulyana Piterbarg et.al. | 2410.02749v1 | link |
2024-10-03 | Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers | Shijie Chen et.al. | 2410.02642v1 | null |
2024-10-03 | Plots Unlock Time-Series Understanding in Multimodal Models | Mayank Daswani et.al. | 2410.02637v1 | null |
2024-10-03 | LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model | Duy M. H. Nguyen et.al. | 2410.02615v1 | null |
2024-10-03 | Dog-IQA: Standard-guided Zero-shot MLLM for Mix-grained Image Quality Assessment | Kai Liu et.al. | 2410.02505v1 | link |
2024-10-03 | Cross-Embodiment Dexterous Grasping with Reinforcement Learning | Haoqi Yuan et.al. | 2410.02479v1 | null |
2024-10-03 | Learning Diverse Bimanual Dexterous Manipulation Skills from Human Demonstrations | Bohan Zhou et.al. | 2410.02477v1 | null |
2024-10-03 | Unsupervised Meta-Learning via Dynamic Head and Heterogeneous Task Construction for Few-Shot Classification | Yunchuan Guan et.al. | 2410.02267v1 | link |
2024-10-03 | Visual Prompting in LLMs for Enhancing Emotion Recognition | Qixuan Zhang et.al. | 2410.02244v1 | null |
2024-10-02 | An Exploration of Self-Supervised Mutual Information Alignment for Multi-Task Settings | Soham Govande et.al. | 2410.01704v1 | link |
2024-10-02 | Saliency-Guided DETR for Moment Retrieval and Highlight Detection | Aleksandr Gordeev et.al. | 2410.01615v1 | link |
2024-10-02 | Coordinate-Based Neural Representation Enabling Zero-Shot Learning for 3D Multiparametric Quantitative MRI | Guoyan Lao et.al. | 2410.01577v1 | null |
2024-10-03 | EUFCC-CIR: a Composed Image Retrieval Dataset for GLAM Collections | Francesc Net et.al. | 2410.01536v2 | link |
2024-10-02 | Toward a Holistic Evaluation of Robustness in CLIP Models | Weijie Tu et.al. | 2410.01534v1 | null |
2024-10-02 | SinkSAM: A Monocular Depth-Guided SAM Framework for Automatic Sinkhole Segmentation | Osher Rafaeli et.al. | 2410.01473v1 | link |
2024-10-02 | The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs | Hong Li et.al. | 2410.01417v1 | null |
2024-10-02 | AgriCLIP: Adapting CLIP for Agriculture and Livestock via Domain-Specialized Cross-Model Alignment | Umair Nawaz et.al. | 2410.01407v1 | link |
2024-10-02 | Toward Zero-Shot Learning for Visual Dehazing of Urological Surgical Robots | Renkai Wu et.al. | 2410.01395v1 | link |
2024-10-02 | Takin-VC: Zero-shot Voice Conversion via Jointly Hybrid Content and Memory-Augmented Context-Aware Timbre Modeling | Yuguang Yang et.al. | 2410.01350v1 | null |
2024-09-30 | Uni | Yubin Wang et.al. | 2409.20558v1 | null |
2024-09-30 | Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos | Md Mohaiminul Islam et.al. | 2409.20557v1 | null |
2024-09-30 | Robi Butler: Remote Multimodal Interactions with Household Robot Assistant | Anxing Xiao et.al. | 2409.20548v1 | null |
2024-09-30 | FreeMask: Rethinking the Importance of Attention Masks for Zero-Shot Video Editing | Lingling Cai et.al. | 2409.20500v1 | null |
2024-10-01 | Instance-adaptive Zero-shot Chain-of-Thought Prompting | Xiaosong Yuan et.al. | 2409.20441v2 | null |
2024-09-30 | VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs | Ruotong Liao et.al. | 2409.20365v1 | link |
2024-09-30 | CableInspect-AD: An Expert-Annotated Anomaly Detection Dataset | Akshatha Arodi et.al. | 2409.20353v1 | link |
2024-09-30 | RL-GSBridge: 3D Gaussian Splatting Based Real2Sim2Real Method for Robotic Manipulation Learning | Yuxuan Wu et.al. | 2409.20291v1 | null |
2024-09-30 | Analysing Zero-Shot Readability-Controlled Sentence Simplification | Abdullah Barayan et.al. | 2409.20246v1 | null |
2024-09-30 | VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection | Huilin Deng et.al. | 2409.20146v1 | null |
2024-09-27 | Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs | Yanyuan Qiao et.al. | 2409.18794v1 | null |
2024-09-27 | When SAM2 Meets Video Camouflaged Object Segmentation: A Comprehensive Evaluation and Adaptation | Yuli Zhou et.al. | 2409.18653v1 | link |
2024-09-27 | Do LLMs suffer from Multi-Party Hangover? A Diagnostic Approach to Addressee Recognition and Response Selection in Conversations | Nicolò Penzo et.al. | 2409.18602v1 | link |
2024-09-27 | "Oh LLM, I'm Asking Thee, Please Give Me a Decision Tree": Zero-Shot Decision Tree Induction and Embedding with Large Language Models | Ricardo Knauer et.al. | 2409.18594v1 | null |
2024-09-27 | EmoPro: A Prompt Selection Strategy for Emotional Expression in LM-based Speech Synthesis | Haoyu Wang et.al. | 2409.18512v1 | null |
2024-09-27 | Exploring Language Model Generalization in Low-Resource Extractive QA | Saptarshi Sengupta et.al. | 2409.18446v1 | link |
2024-09-26 | AER-LLM: Ambiguity-aware Emotion Recognition Leveraging Large Language Models | Xin Hong et.al. | 2409.18339v1 | null |
2024-09-26 | Learning to Drive via Asymmetric Self-Play | Chris Zhang et.al. | 2409.18218v1 | null |
2024-09-26 | Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction | Jing He et.al. | 2409.18124v1 | null |
2024-09-26 | GSON: A Group-based Social Navigation Framework with Large Multimodal Model | Shangyi Luo et.al. | 2409.18084v1 | null |
2024-09-26 | FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction | Runze He et.al. | 2409.18071v1 | null |
2024-09-26 | DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving | Dingrui Wang et.al. | 2409.18053v1 | link |
2024-09-26 | IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning | Soeun Lee et.al. | 2409.18046v1 | link |
2024-09-26 | Learning to Love Edge Cases in Formative Math Assessment: Using the AMMORE Dataset and Chain-of-Thought Prompting to Improve Grading Accuracy | Owen Henkel et.al. | 2409.17904v1 | null |
2024-09-26 | Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models | Hui-Po Wang et.al. | 2409.17836v1 | link |
2024-09-27 | Few-shot Pairwise Rank Prompting: An Effective Non-Parametric Retrieval Model | Nilanjan Sinhababu et.al. | 2409.17745v2 | null |
2024-09-26 | AnyLogo: Symbiotic Subject-Driven Diffusion System with Gemini Status | Jinghao Zhang et.al. | 2409.17740v1 | null |
2024-09-26 | Robust Ladder Climbing with a Quadrupedal Robot | Dylan Vogel et.al. | 2409.17731v1 | null |
2024-09-25 | Can Vision Language Models Learn from Visual Demonstrations of Ambiguous Spatial Reasoning? | Bowen Zhao et.al. | 2409.17080v1 | link |
2024-09-25 | ControlCity: A Multimodal Diffusion Model Based Approach for Accurate Geospatial Data Generation and Urban Morphology Analysis | Fangshuo Zhou et.al. | 2409.17049v1 | link |
2024-09-25 | Detecting Temporal Ambiguity in Questions | Bhawna Piryani et.al. | 2409.17046v1 | link |
2024-09-25 | Zero-Shot Detection of LLM-Generated Text using Token Cohesiveness | Shixuan Ma et.al. | 2409.16914v1 | link |
2024-09-25 | Pruning Multilingual Large Language Models for Multilingual Inference | Hwichan Kim et.al. | 2409.16911v1 | link |
2024-09-25 | Multi-objective Evolution of Heuristic Using Large Language Model | Shunyu Yao et.al. | 2409.16867v1 | null |
2024-09-25 | Towards General Text-guided Image Synthesis for Customized Multimodal Brain MRI Generation | Yulin Wang et.al. | 2409.16818v1 | link |
2024-09-25 | Vision-Language Model Fine-Tuning via Simple Parameter-Efficient Modification | Ming Li et.al. | 2409.16718v1 | link |
2024-09-24 | Unsupervised Text Representation Learning via Instruction-Tuning for Zero-Shot Dense Retrieval | Qiuhai Zeng et.al. | 2409.16497v1 | null |
2024-09-24 | BehAV: Behavioral Rule Guided Autonomy Using VLMs for Robot Navigation in Outdoor Scenes | Kasun Weerakoon et.al. | 2409.16484v1 | null |
2024-09-24 | Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation | Homanga Bharadhwaj et.al. | 2409.16283v1 | null |
2024-09-24 | Fields of The World: A Machine Learning Benchmark Dataset For Global Agricultural Field Boundary Segmentation | Hannah Kerner et.al. | 2409.16252v1 | link |
2024-09-24 | Facial Expression-Enhanced TTS: Combining Face Representation and Emotion Intensity for Adaptive Speech | Yunji Chu et.al. | 2409.16203v1 | null |
2024-09-24 | HA-FGOVD: Highlighting Fine-grained Attributes via Explicit Linear Composition for Open-Vocabulary Object Detection | Yuqi Ma et.al. | 2409.16136v1 | null |
2024-09-24 | Evaluation of state-of-the-art ASR Models in Child-Adult Interactions | Aditya Ashvin et.al. | 2409.16135v1 | null |
2024-09-24 | Bridging Environments and Language with Rendering Functions and Vision-Language Models | Theo Cachet et.al. | 2409.16024v1 | null |
2024-09-24 | Finetuning LLMs for Comparative Assessment Tasks | Vatsal Raina et.al. | 2409.15979v1 | null |
2024-09-24 | StyleSinger 2: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control | Yu Zhang et.al. | 2409.15977v1 | link |
2024-09-24 | SLIMER-IT: Zero-Shot NER on Italian Language | Andrew Zamai et.al. | 2409.15933v1 | link |
2024-09-24 | Zero-Shot Detection of AI-Generated Images | Davide Cozzolino et.al. | 2409.15875v1 | null |
2024-09-24 | Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models | Sijing Chen et.al. | 2409.12139v3 | null |
2024-09-18 | IMRL: Integrating Visual, Physical, Temporal, and Geometric Representations for Enhanced Food Acquisition | Rui Liu et.al. | 2409.12092v1 | null |
2024-09-18 | Efficacy of Synthetic Data as a Benchmark | Gaurav Maheshwari et.al. | 2409.11968v1 | null |
2024-09-18 | GauTOAO: Gaussian-based Task-Oriented Affordance of Objects | Jiawen Wang et.al. | 2409.11941v1 | null |
2024-09-18 | LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Foundation Models | Amaia Cardiel et.al. | 2409.11919v1 | null |
2024-09-18 | ABHINAW: A method for Automatic Evaluation of Typography within AI-Generated Images | Abhinaw Jagtap et.al. | 2409.11874v1 | null |
2024-09-18 | One Map to Find Them All: Real-time Open-Vocabulary Mapping for Zero-shot Multi-Object Navigation | Finn Lukas Busch et.al. | 2409.11764v1 | null |
2024-09-18 | Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation | Haohan Guo et.al. | 2409.11630v1 | null |
2024-09-17 | Good Grasps Only: A data engine for self-supervised fine-tuning of pose estimation using grasp poses for verification | Frederik Hagelskjær et.al. | 2409.11512v1 | null |
2024-09-17 | Enriching Datasets with Demographics through Large Language Models: What's in a Name? | Khaled AlNuaimi et.al. | 2409.11491v1 | null |
2024-09-17 | Says Who? Effective Zero-Shot Annotation of Focalization | Rebecca M. M. Hicke et.al. | 2409.11390v1 | null |
2024-09-17 | Towards Time Series Reasoning with LLMs | Winnie Chow et.al. | 2409.11376v1 | null |
2024-09-17 | Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think | Gonzalo Martin Garcia et.al. | 2409.11355v1 | **[link](https://github.com/VisualComputingInstitute/diffusion- |