Rigid Object Point Tracking

Project Overview

Developed a full-stack annotation tool for evaluating and improving rigid object tracking performance at the Rice University Computer Vision Lab. The project focused on a novel attention weight bias mechanism, integrated with semantic segmentation, that enhances point tracking accuracy for rigid objects, alongside comprehensive tooling for model evaluation and visualization.

Key Achievements

Attention Weight Bias Mechanism

Developed a novel attention weight bias mechanism that leverages semantic segmentation to improve point tracking accuracy for rigid objects by 5% over baseline models. This mechanism intelligently guides the tracker's attention to focus on features belonging to the same rigid object, reducing drift and improving long-term tracking stability across challenging scenarios including occlusions and appearance changes.
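
A minimal sketch of the idea in PyTorch, assuming per-location segmentation IDs are available for the query point and each candidate location; the function name, tensor shapes, and additive bias form are illustrative assumptions, not the lab's exact implementation:

```python
import torch

def biased_attention(q, k, v, query_mask_id, key_mask_ids, bias=2.0):
    """
    q:             (B, D)     feature of the tracked query point
    k, v:          (B, N, D)  key/value features for N candidate locations
    query_mask_id: (B,)       segmentation ID of the rigid object at the query
    key_mask_ids:  (B, N)     segmentation ID at each candidate location
    bias:          scalar added to logits of same-object candidates (assumed form)
    """
    d = q.shape[-1]
    logits = torch.einsum("bd,bnd->bn", q, k) / d ** 0.5   # scaled dot-product
    same_object = key_mask_ids == query_mask_id.unsqueeze(-1)
    logits = logits + bias * same_object.float()           # favor same rigid object
    attn = logits.softmax(dim=-1)
    return torch.einsum("bn,bnd->bd", attn, v)
```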

Full-Stack Annotation Tool

Built a comprehensive full-stack annotation tool with a JavaScript frontend and a Python backend (Flask, OpenCV) that reduced manual annotation time by 30%. The tool streamlines the evaluation workflow with intuitive interfaces for point and line selection, automatic track visualization, and seamless integration with state-of-the-art tracking models.
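
As a sketch of how the backend hands selected points to a tracker, the following shows a minimal Flask endpoint that decodes a video with OpenCV; the route name, payload fields, and `run_tracker` wrapper are assumptions for illustration, not the tool's actual API:

```python
import cv2
from flask import Flask, jsonify, request

app = Flask(__name__)

def run_tracker(frames, points):
    """Hypothetical wrapper around the selected tracking model."""
    raise NotImplementedError  # placeholder; see Multi-Model Integration below

@app.route("/track", methods=["POST"])
def track():
    payload = request.get_json()
    video_path = payload["video_path"]   # video chosen in the UI
    points = payload["points"]           # e.g. [[frame_idx, x, y], ...]

    # Decode once with OpenCV; frames arrive BGR, most models expect RGB.
    cap = cv2.VideoCapture(video_path)
    frames = []
    ok, frame = cap.read()
    while ok:
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        ok, frame = cap.read()
    cap.release()

    tracks = run_tracker(frames, points)
    return jsonify({"tracks": tracks})   # per-frame (x, y) for each point
```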

Dynamic Point and Line Selection

Enabled dynamic point and line selection capabilities with real-time track visualization, allowing researchers to interactively define tracking targets and immediately observe model performance. This feature supports both individual point tracking and line-based rigid object constraints, providing flexibility in defining tracking scenarios.
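
One way such line-based constraints can be realized is by densifying a user-drawn line into a group of query points that are then tracked jointly as one rigid group; the sampling scheme below is an illustrative assumption, not necessarily the tool's exact behavior:

```python
import numpy as np

def line_to_query_points(p0, p1, frame_idx, n_points=16):
    """Sample n_points evenly along the segment p0 -> p1 (pixel coords)."""
    p0, p1 = np.asarray(p0, dtype=float), np.asarray(p1, dtype=float)
    t = np.linspace(0.0, 1.0, n_points)[:, None]     # interpolation weights
    xy = (1 - t) * p0 + t * p1                       # (n_points, 2) positions
    frames = np.full((n_points, 1), frame_idx, dtype=float)
    return np.hstack([frames, xy])                   # (n_points, 3): (t, x, y)

# A line drawn from (120, 80) to (240, 160) on frame 0 becomes 16 queries.
queries = line_to_query_points((120, 80), (240, 160), frame_idx=0)
```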

Multi-Model Integration

Achieved seamless integration with multiple state-of-the-art tracking models, including CoTracker3 and PIPs++, enabling comprehensive comparative analysis and benchmarking. The unified interface abstracts model-specific implementation details while providing consistent evaluation metrics across different architectures.
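
A sketch of what such an abstraction layer might look like; the interface, adapter classes, and registry are assumptions rather than the actual CoTracker3 or PIPs++ APIs:

```python
from abc import ABC, abstractmethod
import numpy as np

class PointTracker(ABC):
    """Uniform interface: frames and query points in, per-frame tracks out."""

    @abstractmethod
    def track(self, frames: np.ndarray, queries: np.ndarray) -> np.ndarray:
        """frames: (T, H, W, 3); queries: (N, 3) as (t, x, y).
        Returns tracks of shape (T, N, 2)."""

class CoTracker3Adapter(PointTracker):
    def track(self, frames, queries):
        ...  # wrap the CoTracker3 inference call here

class PIPsPlusPlusAdapter(PointTracker):
    def track(self, frames, queries):
        ...  # wrap the PIPs++ inference call here

TRACKERS = {"cotracker3": CoTracker3Adapter, "pips++": PIPsPlusPlusAdapter}

def get_tracker(name: str) -> PointTracker:
    return TRACKERS[name]()
```

With a registry like this, adding a new model amounts to implementing one adapter class, and the evaluation pipeline never needs to know which architecture it is benchmarking.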

Backend Architecture

Designed a robust and extensible backend architecture optimized for loading video data, managing tracker configurations, and supporting multi-model workflows. The architecture includes efficient video frame caching, parallel processing capabilities, and modular design patterns that facilitate easy addition of new tracking models and evaluation metrics.
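
As one illustration of the caching idea, a small LRU frame cache avoids re-decoding frames while a researcher scrubs through a video; the class name, eviction policy, and capacity below are assumptions:

```python
from collections import OrderedDict
import cv2

class FrameCache:
    def __init__(self, video_path, max_frames=256):
        self.cap = cv2.VideoCapture(video_path)
        self.max_frames = max_frames
        self.cache = OrderedDict()  # frame_idx -> ndarray, in LRU order

    def get(self, idx):
        if idx in self.cache:
            self.cache.move_to_end(idx)             # mark as recently used
            return self.cache[idx]
        self.cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek, then decode once
        ok, frame = self.cap.read()
        if not ok:
            raise IndexError(f"frame {idx} out of range")
        if len(self.cache) >= self.max_frames:
            self.cache.popitem(last=False)          # evict least recently used
        self.cache[idx] = frame
        return frame
```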

Technical Implementation

The project leveraged cutting-edge deep learning frameworks and computer vision techniques:

  • Attention Mechanism: Custom attention weight bias layer integrated with semantic segmentation networks to enforce rigid object constraints
  • Frontend: Interactive JavaScript interface with real-time canvas rendering for track visualization and user input
  • Backend: Flask-based REST API with OpenCV for video processing and PyTorch for model inference
  • Model Integration: Abstraction layer supporting multiple tracking architectures with unified evaluation pipeline
  • Data Processing: Efficient video frame buffering and preprocessing pipeline optimized for real-time feedback
  • Evaluation Metrics: Comprehensive suite of tracking metrics including accuracy, precision, and temporal consistency (a representative accuracy metric is sketched below)
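
As a concrete example of the last item above, position accuracy within pixel thresholds (in the spirit of the TAP-Vid delta metric) can be computed directly from predicted and ground-truth tracks; the thresholds here are illustrative, not necessarily the lab's exact evaluation suite:

```python
import numpy as np

def position_accuracy(pred, gt, thresholds=(1, 2, 4, 8, 16)):
    """pred, gt: (T, N, 2) pixel tracks. Returns, per threshold, the
    fraction of predictions within that distance of the ground truth."""
    err = np.linalg.norm(pred - gt, axis=-1)        # (T, N) pixel errors
    return {t: float((err < t).mean()) for t in thresholds}
```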

Research Impact

This work advances rigid object tracking by introducing an attention-based mechanism that improves accuracy while maintaining computational efficiency. The annotation tool has become an essential component of the lab's research workflow, accelerating the evaluation and development of new tracking algorithms. The improvements in tracking accuracy and annotation efficiency directly support research in robotics, autonomous systems, and augmented reality, where precise rigid object tracking is critical.

Future Directions

Ongoing work includes extending the attention mechanism to handle deformable objects, incorporating temporal consistency constraints across longer sequences, and exploring self-supervised learning approaches to reduce annotation requirements further. The tool's architecture is designed to accommodate these extensions while maintaining backward compatibility with existing tracking models and evaluation workflows.

Tracker Visualizer Demo

Project Information

  • Organization: Rice University Computer Vision Lab
  • Role: Research Intern
  • Duration: May 2025 - November 2025
  • Category: Computer Vision / Deep Learning

Technologies Used

  • PyTorch
  • Deep Learning
  • OpenCV
  • Flask
  • JavaScript
  • Computer Vision
  • Python
  • Semantic Segmentation

Key Metrics

  • Tracking Accuracy: 5% improvement
  • Annotation Time: 30% reduction
  • Models Supported: CoTracker3, PIPs++
  • Real-time Visualization: Yes

Key Features

  • Attention weight bias mechanism
  • Dynamic point/line selection
  • Real-time track visualization
  • Multi-model integration
  • Extensible backend architecture