Publications & Technical Blogs

Beyond building systems, I enjoy writing technical blogs to deepen my understanding of new concepts and share knowledge with the broader community.

2024 • Technical Blog

Recursive Temporal-Consistent Video Generation on Latent Variables via Alpha Diffusion Framework: Integrating Global and Local Contextual Modeling for 30-Second Sequences

Temporal-consistent • Alpha blend diffusion • Recursive generation

This paper introduces a novel framework for recursively generating temporally consistent 30-second video sequences using an Alpha Diffusion architecture. By integrating global and local contextual modeling, our approach ensures coherence across temporal scales while maintaining high fidelity in the generated content. The global context captures overarching structural patterns, while the local context refines fine-grained details, enabling seamless transitions and long-term consistency...
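This summary does not describe the Alpha Diffusion mechanism itself. The sketch below shows one common way to join recursively generated chunks: alpha-blend the frames where two latent segments overlap, using a linear ramp. It is a hypothetical illustration, not the framework's actual design. The helper name `alpha_blend_segments`, the `(frames, ...)` latent layout and the linear ramp are all assumptions.

```python
import numpy as np


def alpha_blend_segments(prev: np.ndarray, nxt: np.ndarray, overlap: int) -> np.ndarray:
    """Join two latent segments shaped (frames, ...) that share `overlap` frames.

    Hypothetical helper: the last `overlap` frames of `prev` and the first
    `overlap` frames of `nxt` cover the same time steps, so they are mixed
    with a weight alpha that ramps linearly from `prev` toward `nxt`.
    """
    if overlap <= 0 or overlap > min(len(prev), len(nxt)):
        raise ValueError("overlap must be in [1, min(len(prev), len(nxt))]")
    # alpha rises from near 0 (trust prev) to near 1 (trust nxt) across the overlap.
    alpha = (np.arange(1, overlap + 1) / (overlap + 1)).reshape(-1, *([1] * (prev.ndim - 1)))
    blended = (1.0 - alpha) * prev[-overlap:] + alpha * nxt[:overlap]
    return np.concatenate([prev[:-overlap], blended, nxt[overlap:]], axis=0)


# Two toy 8-frame segments with 4-dimensional latents (made-up values).
prev = np.zeros((8, 4))
nxt = np.ones((8, 4))
video = alpha_blend_segments(prev, nxt, overlap=3)
print(video.shape, video[:, 0])
```

A linear ramp is the simplest choice. Smoother weightings, such as cosine ramps, are equally plausible.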

2023 • Technical Blog

Diffusion-Based One-Shot Video Generation via Pose-Guided Alignment

DDIM • PGT-AV • Video Generation

This paper presents a novel approach for generating videos conditioned on a single input frame using pose-guided trajectory alignment. Our method integrates diffusion models with motion priors to synthesize temporally coherent video sequences while preserving the identity and appearance of the input subject. By leveraging human pose as an intermediate representation, we achieve precise control over motion dynamics...
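The DDIM tag refers to deterministic diffusion sampling. The sketch below shows the standard DDIM update with η = 0. Given a noise estimate ε̂ at step t, it first predicts the clean sample x̂₀, then re-noises x̂₀ to an earlier step. Here the noise predictor is replaced by the true noise. In the post's method that role would be played by a pose-conditioned network, which is not reproduced. The name `ddim_step` and the schedule values are assumptions for illustration.

```python
import numpy as np


def ddim_step(x_t, eps_hat, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0) from step t to an earlier step.

    alpha_bar_* are cumulative products of (1 - beta) from the noise schedule.
    """
    # Clean sample implied by the current noise estimate.
    x0_hat = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)
    # Move to the earlier step, reusing the same predicted noise direction.
    return np.sqrt(alpha_bar_prev) * x0_hat + np.sqrt(1.0 - alpha_bar_prev) * eps_hat


rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 4))    # stand-in for a clean latent frame
eps = rng.normal(size=(4, 4))
alpha_bar_t = 0.5
x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps   # forward noising

# With a perfect noise prediction, stepping to alpha_bar = 1 recovers x0.
x_rec = ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev=1.0)
print(np.allclose(x_rec, x0))
```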

2022 • Product

MetaCast App: Democratizing Real-Time 3D Human Capture

3D MoCap • Diffusion Avatar • Monocular Video Input

MetaCast is an iOS application that turns an iPhone into a motion capture studio. Using computer vision and machine learning, it captures full-body 3D human motion from a single camera in real time. The app democratizes motion capture technology, making it accessible to content creators, game developers, and fitness enthusiasts without expensive equipment...

2022 • Technical Blog

Metacast: 3D Human Pose Estimation

3D Pose • Sports

This research presents a specialized 3D pose estimation system optimized for sports performance analysis. Our approach combines multi-view geometry with deep learning to accurately capture complex athletic movements in real time. By training on sport-specific datasets, we achieve superior accuracy in handling the rapid motions, occlusions, and unusual body configurations common in athletic activities...
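The multi-view geometry side of such a system usually rests on triangulation. Below is a minimal sketch of textbook linear (DLT) triangulation for one joint seen by two calibrated cameras. The intrinsics, camera poses and joint position are made-up values, and the post's actual pipeline may differ.

```python
import numpy as np


def triangulate_dlt(projections, points_2d):
    """Linear triangulation of one 3D point from N >= 2 views.

    projections: list of 3x4 camera projection matrices P_i.
    points_2d:   list of (u, v) pixel observations of the same point.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        # Each view adds two linear constraints on the homogeneous point X:
        # u * (P[2] @ X) - P[0] @ X = 0 and v * (P[2] @ X) - P[1] @ X = 0.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    # Least-squares solution: right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.stack(rows))
    X = vt[-1]
    return X[:3] / X[3]


def project(P, X):
    """Project a 3D point with a 3x4 camera matrix and dehomogenize."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])   # assumed intrinsics
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])                           # reference camera
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])            # camera offset along x
joint = np.array([0.1, -0.2, 3.0])                                          # made-up joint position
estimate = triangulate_dlt([P1, P2], [project(P1, joint), project(P2, joint)])
print(estimate)
```

With noisy 2D keypoints, this linear estimate is typically refined by minimizing reprojection error.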

2022 • Technical Blog

Metacast: Multi-View 3D Human Reconstruction

Sensor Fusion • 3D Motion Capture

This work addresses the critical challenge of synchronizing visual and inertial data for accurate 3D human motion tracking. We propose an improved synchronization framework that aligns camera frames with IMU measurements at the microsecond level, significantly reducing temporal misalignment artifacts. Our method employs a novel timestamp calibration algorithm...
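The post's timestamp calibration algorithm is not described in this summary. As a generic illustration of the problem, the sketch below estimates a constant time offset between two sensor streams. It picks the lag that maximizes the normalized cross-correlation of a motion signal both sensors observe, such as angular speed from the gyroscope and from camera-based rotation estimates. It assumes both streams are already resampled to a common rate and have equal length. The function name, the 1 kHz rate and the synthetic signal are hypothetical. Reaching microsecond precision would also need sub-sample refinement, which is omitted here.

```python
import numpy as np


def estimate_offset(reference, delayed, max_lag):
    """Integer lag (in samples) that best aligns `delayed` to `reference`.

    Both arrays must have the same length. A positive result means `delayed`
    lags behind `reference` by that many samples.
    """
    ref = (reference - reference.mean()) / reference.std()
    dly = (delayed - delayed.mean()) / delayed.std()
    best_lag, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        # Correlate only the overlapping parts of the two signals at this shift.
        if lag >= 0:
            a, b = ref[: len(ref) - lag], dly[lag:]
        else:
            a, b = ref[-lag:], dly[: len(dly) + lag]
        score = np.mean(a * b)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


rng = np.random.default_rng(1)
rate_hz, true_shift, n = 1000.0, 12, 4000
# Smooth synthetic motion signal shared by both sensors (hypothetical data).
base = np.convolve(rng.normal(size=n + true_shift + 19), np.ones(20) / 20, mode="valid")
gyro = base[true_shift:]   # reference stream
camera = base[:n]          # same signal, delayed by `true_shift` samples
lag = estimate_offset(gyro, camera, max_lag=50)
print(lag, "samples =", lag / rate_hz * 1e3, "ms")
```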

2022 • Technical Blog

Paper reading: Human Pose Regression with Residual Log-likelihood Estimation

Machine Learning • Human Pose Regression

A regression problem can be viewed from the perspective of maximum likelihood estimation: learning amounts to optimizing the model's learnable parameters \(\phi\) so that the ground-truth labels \(\mu_{g}\) become most probable. This analysis explores the mathematical foundations and practical applications of residual log-likelihood estimation in pose regression.
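Here is a concrete case of the maximum-likelihood view. Suppose the model predicts a mean \(\hat\mu\) and the error is assumed Laplace-distributed with a fixed scale b. The negative log-likelihood of \(\mu_{g}\) is then \(\log(2b) + |\mu_{g} - \hat\mu| / b\). With b = 1, maximizing likelihood is therefore the same as minimizing the L1 loss up to a constant. RLE goes further and learns the error density with a normalizing flow; that part is not shown. The NumPy check below uses made-up values and only covers this fixed-density special case.

```python
import numpy as np


def laplace_nll(mu_g, mu_hat, b=1.0):
    """Per-coordinate negative log-likelihood of labels mu_g under Laplace(mu_hat, b)."""
    return np.log(2.0 * b) + np.abs(mu_g - mu_hat) / b


mu_g = np.array([0.3, -1.2, 2.0])     # ground-truth joint coordinates (made-up)
mu_hat = np.array([0.1, -1.0, 2.5])   # model predictions (made-up)
nll = laplace_nll(mu_g, mu_hat).sum()
l1 = np.abs(mu_g - mu_hat).sum()
# The two objectives differ only by the constant 3 * log(2), so they share a minimizer.
print(nll - l1, 3 * np.log(2.0))
```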

2022 • Technical Blog

Normalizing Flows and Its Friends (Part 2)

flow-based model • RealNVP • Deep Learning

What are normalizing flows? In this second part, we dive deeper into the practical implementations and advanced techniques of normalizing flows, exploring RealNVP architectures and their applications in generative modeling and density estimation.
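RealNVP builds its flows from affine coupling layers. Half of the input passes through unchanged and sets a scale s and a shift t for the other half. This makes the inverse cheap, and the log-determinant of the Jacobian is simply the sum of s. In the NumPy sketch below, fixed random linear maps (with a tanh for s) stand in for the learned s and t networks; this substitution is an assumption for illustration. The sketch checks invertibility, and compares the log-determinant with one computed from a finite-difference Jacobian.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4            # input dimension, split into two halves
half = D // 2
# Hypothetical stand-ins for the learned scale and translation networks.
W_s, W_t = rng.normal(size=(half, half)), rng.normal(size=(half, half))


def s_net(x1):
    return np.tanh(x1 @ W_s)


def t_net(x1):
    return x1 @ W_t


def coupling_forward(x):
    """y1 = x1, y2 = x2 * exp(s(x1)) + t(x1); returns y and log|det J|."""
    x1, x2 = x[:half], x[half:]
    s = s_net(x1)
    y2 = x2 * np.exp(s) + t_net(x1)
    return np.concatenate([x1, y2]), np.sum(s)


def coupling_inverse(y):
    """x1 = y1, x2 = (y2 - t(y1)) * exp(-s(y1))."""
    y1, y2 = y[:half], y[half:]
    x2 = (y2 - t_net(y1)) * np.exp(-s_net(y1))
    return np.concatenate([y1, x2])


def numerical_log_det(f, x, eps=1e-6):
    """log|det J| from a central-difference Jacobian, used only as a check."""
    J = np.zeros((x.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (f(x + dx)[0] - f(x - dx)[0]) / (2 * eps)
    return np.linalg.slogdet(J)[1]


x = rng.normal(size=D)
y, log_det = coupling_forward(x)
print(np.allclose(coupling_inverse(y), x))
print(np.isclose(log_det, numerical_log_det(coupling_forward, x), atol=1e-5))
```

Because the Jacobian is triangular, its determinant never has to be computed explicitly in training. The finite-difference version exists only to confirm that shortcut.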

2022 • Technical Blog

Normalizing Flows and Its Friends (Part 1)

Jacobian Matrix • Determinant • Change of Variable Theorem

This first part covers the prerequisite knowledge needed before moving on to normalizing flows: Jacobian matrices, determinants, and the change of variable theorem, which together form the mathematical backbone of flow-based models.
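Here is a quick numerical check of the change of variable theorem in one dimension. Take z ~ N(0, 1) and the invertible map x = f(z) = a z + b. The theorem gives \(p_X(x) = p_Z(f^{-1}(x)) \left|\frac{d f^{-1}}{dx}\right| = p_Z\left(\frac{x-b}{a}\right)\frac{1}{|a|}\), which should match the closed-form density of N(b, a²). The values of a and b below are arbitrary. In higher dimensions, 1/|a| becomes the absolute determinant of the inverse map's Jacobian.

```python
import numpy as np


def normal_pdf(x, mean=0.0, std=1.0):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


a, b = -2.5, 1.0                 # arbitrary invertible affine map x = a * z + b
xs = np.linspace(-6.0, 8.0, 9)

# Change of variables: base density at f^{-1}(x), rescaled by |d f^{-1}/dx| = 1/|a|.
p_change_of_vars = normal_pdf((xs - b) / a) / abs(a)
# Closed form: an affine transform of N(0, 1) is N(b, a^2).
p_closed_form = normal_pdf(xs, mean=b, std=abs(a))

print(np.allclose(p_change_of_vars, p_closed_form))
```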

2020 • Technical Blog

Gradient-weighted Class Activation Mapping (Grad-CAM) Summary

Grad-CAM • penultimate layer • feature visualization

This note walks through how a Grad-CAM map is computed, covering the mathematical derivation and implementation details of this technique for visualizing what a convolutional neural network learns and where it focuses.
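The core Grad-CAM computation is short once two inputs are available: the feature maps \(A^k\) of the chosen convolutional layer, and the gradients \(\partial y^c / \partial A^k\) of the class score. First average the gradients over space to get one weight per channel, \(\alpha_k^c = \frac{1}{Z}\sum_{i,j} \partial y^c / \partial A^k_{ij}\). Then take \(\mathrm{ReLU}(\sum_k \alpha_k^c A^k)\). The NumPy sketch below assumes those arrays are already extracted; obtaining them is framework-specific and not shown. It uses a toy head in which \(y^c\) is a weighted global average of the feature maps. For this head the gradients are known in closed form, so we can check that Grad-CAM reduces to CAM scaled by 1/Z.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM heatmap from feature maps and their gradients, both shaped (K, H, W)."""
    # alpha_k: spatially averaged gradient for each channel k.
    alpha = gradients.mean(axis=(1, 2))
    # Weighted sum of feature maps over channels, keeping only positive evidence.
    cam = np.tensordot(alpha, activations, axes=1)
    return np.maximum(cam, 0.0)


rng = np.random.default_rng(0)
K, H, W = 8, 7, 7
A = rng.normal(size=(K, H, W))   # stand-in feature maps (made-up)
w = rng.normal(size=K)           # toy class weights: y_c = sum_k w_k * mean(A_k)
Z = H * W
# For this toy head, d y_c / d A_k[i, j] = w_k / Z at every spatial position.
dy_dA = np.broadcast_to((w / Z)[:, None, None], A.shape)

heatmap = grad_cam(A, dy_dA)
cam = np.maximum(np.tensordot(w, A, axes=1), 0.0)   # classic CAM for the same head
print(heatmap.shape, np.allclose(heatmap, cam / Z))
```

In practice the heatmap is then upsampled to the input resolution and normalized for display.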

2021 • 2021 ESMO World Congress on Lung Cancer

A deep radiomics approach to assess PD-L1 expression and clinical outcomes

Medical AI • Radiomics • Lung Cancer

We developed a deep radiomics framework that predicts PD-L1 expression levels from CT scans, enabling non-invasive biomarker assessment for immunotherapy planning. Our approach combines convolutional neural networks with handcrafted radiomic features to capture both semantic and textural patterns associated with tumor biology. The model achieves high accuracy in stratifying patients...

2019 • Patent

Apparatus and method for detecting, classifying and tracking road users on frames of video data

Patent • Object Detection • Traffic Surveillance

This patent describes an innovative system for real-time detection, classification, and tracking of multiple road users including vehicles, pedestrians, and cyclists. The method employs a hierarchical deep learning architecture that processes video streams to provide accurate trajectory prediction and behavior analysis. The system is designed for deployment in smart city infrastructure...

2019 • 22nd International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI)

Task Adaptive Metric Space for Medium-Shot Medical Image Classification

Metric Learning • Few-shot Learning • Medical Imaging

We present a novel metric learning framework specifically designed for medical image classification with limited training samples. Our approach learns task-adaptive embedding spaces that maximize inter-class distances while preserving clinically relevant features. The method addresses the challenge of medium-shot learning scenarios common in medical imaging where only 10-50 samples per class are available...
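This summary does not specify the task-adaptive metric. The sketch below therefore shows only the generic metric-learning setup it builds on: a query gets the class whose prototype (the mean embedding of that class's labelled samples) is nearest, as in prototypical networks. This is a swapped-in baseline, not the paper's method. The 2-D embeddings are made-up stand-ins for a learned encoder's output.

```python
import numpy as np


def class_prototypes(embeddings, labels):
    """Mean embedding per class; returns (classes, prototypes)."""
    classes = np.unique(labels)
    protos = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, protos


def classify(queries, classes, protos):
    """Assign each query to the class with the nearest prototype (squared Euclidean distance)."""
    d2 = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[np.argmin(d2, axis=1)]


# Made-up support embeddings: three samples for each of two classes.
support = np.array([[0.0, 0.1], [0.2, -0.1], [0.1, 0.0],
                    [2.0, 2.1], [1.8, 1.9], [2.2, 2.0]])
support_labels = np.array([0, 0, 0, 1, 1, 1])
classes, protos = class_prototypes(support, support_labels)
pred = classify(np.array([[0.3, 0.2], [1.7, 2.2]]), classes, protos)
print(pred)
```

A task-adaptive approach would change how the embedding space or the distance is shaped for each task. The nearest-prototype decision rule above stays the same.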

2019 • Gastroenterology - American Gastroenterological Association

Artificial Intelligence for Real-Time Multiple Polyp Detection

Medical AI • Colonoscopy • Real-time Detection

Our work introduces a real-time AI system for detecting multiple polyps during colonoscopy procedures. The system achieves 95% sensitivity while maintaining low false positive rates, significantly outperforming existing methods. By processing video streams at 30 FPS, our solution provides immediate feedback to endoscopists, improving polyp detection rates and reducing miss rates...

2018 • Montreal AI Symposium (MAIS)

A Single Framework for Domain Adaptation and Generalization in Medical Image Analysis

Domain Adaptation • Medical AI • Transfer Learning

This research presents a unified framework that addresses both domain adaptation and generalization challenges in medical imaging. Our approach uses adversarial training combined with feature disentanglement to learn domain-invariant representations. The framework successfully transfers knowledge across different imaging modalities, scanner manufacturers, and patient populations...

2018 • McGill M.Eng. Thesis

Deep-learning-based Multiple Object Tracking in Traffic Surveillance Video

Optical Flow • Tracking • Kalman Filter

Multiple object tracking (MOT) is an important topic in computer vision. One of its key applications is traffic surveillance, where it is used to assess potential risks at traffic intersections and to analyze road usage. In this thesis, we propose an efficient model for solving MOT problems in traffic surveillance environments...
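The thesis tags include the Kalman filter, so below is a minimal constant-velocity Kalman filter for one track's 2-D centre position. Filters of this kind are commonly used to predict where a track should appear in the next frame before data association. Several details are assumptions for illustration: the state layout [x, y, vx, vy], the one-frame time step, the noise levels and the noise-free synthetic detections. The thesis's actual motion model is not reproduced here.

```python
import numpy as np

dt = 1.0  # one frame
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)   # constant-velocity motion model
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # only the position is observed
Q = 0.01 * np.eye(4)                         # process noise (assumed)
R = 1.0 * np.eye(2)                          # measurement noise in pixels^2 (assumed)


def predict(x, P):
    """Propagate the state and covariance one frame ahead."""
    return F @ x, F @ P @ F.T + Q


def update(x, P, z):
    """Correct the prediction with a detected centre z = [x, y]."""
    y = z - H @ x                            # innovation
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    return x + K @ y, (np.eye(4) - K @ H) @ P


x, P = np.zeros(4), 100.0 * np.eye(4)        # vague initial state
for k in range(1, 31):
    # Synthetic detections moving at (2, -1) pixels per frame (hypothetical data).
    z = np.array([10.0 + 2.0 * k, 5.0 - 1.0 * k])
    x, P = predict(x, P)
    x, P = update(x, P, z)

print(np.round(x, 2))  # state estimate [x, y, vx, vy]
```

In a tracker, the predicted position from `predict` is what detections are matched against before `update` is applied.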

2016 • 13th Conference on Computer and Robot Vision (CRV)

Generation of Spatial-temporal Panoramas with a Single Moving Camera

Image Stitching • Panorama • VR

The development of image stitching techniques, which combine multiple images into natural-looking panoramas, is an integral part of the new wave in visual media: 360-degree surround displays such as the Oculus Rift. However...
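Stitching pipelines usually warp one image onto another with a planar homography estimated from matched keypoints. The NumPy sketch below implements the standard normalized direct linear transform (DLT) from four or more correspondences. It is checked on synthetic points generated from a known, made-up homography. Real pipelines also use RANSAC to reject outlier matches, which is omitted here, and this is not necessarily the method used in the paper.

```python
import numpy as np


def normalize_points(pts):
    """Similarity transform moving the centroid to the origin with mean distance sqrt(2)."""
    centroid = pts.mean(axis=0)
    scale = np.sqrt(2.0) / np.mean(np.linalg.norm(pts - centroid, axis=1))
    T = np.array([[scale, 0.0, -scale * centroid[0]],
                  [0.0, scale, -scale * centroid[1]],
                  [0.0, 0.0, 1.0]])
    return (pts - centroid) * scale, T


def estimate_homography(src, dst):
    """Normalized DLT estimate of H (3x3) with dst ~ H @ src, from N >= 4 point pairs."""
    src_n, T_src = normalize_points(src)
    dst_n, T_dst = normalize_points(dst)
    rows = []
    for (x, y), (u, v) in zip(src_n, dst_n):
        # Each correspondence gives two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    # Solution: right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H_n = vt[-1].reshape(3, 3)
    # Undo the normalization: dst ~ inv(T_dst) @ H_n @ T_src @ src.
    H = np.linalg.inv(T_dst) @ H_n @ T_src
    return H / H[2, 2]


def apply_homography(H, pts):
    """Map (N, 2) points through H and dehomogenize."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return pts_h[:, :2] / pts_h[:, 2:3]


H_true = np.array([[1.1, 0.05, 30.0],
                   [-0.02, 0.95, 10.0],
                   [1e-4, 2e-5, 1.0]])     # made-up ground-truth homography
src = np.array([[0.0, 0.0], [640.0, 0.0], [640.0, 480.0], [0.0, 480.0], [320.0, 240.0]])
dst = apply_homography(H_true, src)
H_est = estimate_homography(src, dst)
print(np.allclose(H_est, H_true, atol=1e-6))
```

The normalization step keeps the linear system well conditioned when pixel coordinates are in the hundreds.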