

Research Interests:
  • Computer Vision (Image/Video Segmentation, Human Pose Estimation, Structure from Motion, Scene Understanding)
  • Machine Learning (Deep Learning)

Education:
  • Ph.D., Computer Science, CRCV, University of Central Florida (UCF), USA, 2016 (Advisor: Dr. Mubarak Shah)
  • Ph.D., Control Science and Engineering, Zhejiang University (ZJU), China, 2013
  • B.E., Automation, Zhejiang University (ZJU), China, 2007


News:
  • 7/20/2017 Paper accepted to ICCV 2017

  • 6/19/2017 Paper accepted to CVIU

  • 9/10/2015 Paper accepted to ICCV 2015

  • 9/2/2014 Release of Video Object CoSegmentation dataset (Safari dataset) and code Version 0.9 (Project Page)


Human Pose Estimation in Videos

In this paper, we present a method to estimate a sequence of human poses in unconstrained videos. In contrast to the commonly employed graph optimization framework, which is NP-hard and requires approximate solutions, we formulate the problem as a unified two-stage tree-based optimization problem for which an efficient and exact solution exists. Although the proposed method finds an exact solution, it does not sacrifice the ability to model the spatial and temporal constraints between body parts across video frames; indeed, it models symmetric parts better than existing methods. The proposed method is based on two main ideas, 'Abstraction' and 'Association', which enforce the intra- and inter-frame body part constraints respectively without adding computational complexity to the polynomial-time solution. Using the idea of 'Abstraction', a new concept of 'abstract body part' is introduced to model not only the tree-based body part structure used by existing methods, but also extra constraints between symmetric parts. Using the idea of 'Association', optimal tracklets are generated for each abstract body part in order to enforce the spatiotemporal constraints between body parts in adjacent frames. Finally, a sequence of the best poses is inferred from the abstract body part tracklets through the tree-based optimization. We evaluated the proposed method on three publicly available video-based human pose estimation datasets and obtained dramatically improved performance compared to state-of-the-art methods.
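
The reason a tree-structured model admits an efficient and exact solution can be illustrated with a small max-sum dynamic program. The sketch below is not the paper's implementation: the function name `best_tree_assignment`, the part names, and the score tables are hypothetical, and it assumes each part has a short list of candidates with a unary score plus a pairwise score between each parent and child candidate.

```python
from typing import Callable, Dict, List, Tuple


def best_tree_assignment(
    children: Dict[str, List[str]],
    root: str,
    unary: Dict[str, List[float]],
    pairwise: Callable[[str, int, str, int], float],
) -> Tuple[float, Dict[str, int]]:
    """Exact max-sum inference on a tree (illustrative, hypothetical API).

    children: maps each part to its child parts in the tree.
    unary:    unary[part][k] is the score of candidate k for that part.
    pairwise: pairwise(parent, pk, child, ck) scores a parent/child pairing.
    """
    subtree: Dict[str, List[float]] = {}      # best subtree score per candidate
    choice: Dict[Tuple[str, int], int] = {}   # best child candidate given parent's

    def collect(node: str) -> None:
        # Leaves-to-root pass: fold each child's best score into the parent.
        scores = list(unary[node])
        for child in children.get(node, []):
            collect(child)
            for k in range(len(scores)):
                def total(c: int) -> float:
                    return subtree[child][c] + pairwise(node, k, child, c)

                best_c = max(range(len(unary[child])), key=total)
                choice[(child, k)] = best_c
                scores[k] += total(best_c)
        subtree[node] = scores

    collect(root)
    root_k = max(range(len(subtree[root])), key=lambda k: subtree[root][k])

    assignment: Dict[str, int] = {}

    def assign(node: str, k: int) -> None:
        # Root-to-leaves pass: read back the stored best choices.
        assignment[node] = k
        for child in children.get(node, []):
            assign(child, choice[(child, k)])

    assign(root, root_k)
    return subtree[root][root_k], assignment
```

Each parent/child edge is visited once and compares every pair of candidates, so the cost grows polynomially with the number of candidates per part rather than exponentially with the number of parts.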

Video Object Segmentation

The goal of video object segmentation is to detect the primary object in videos and to delineate it from the background in all frames. Video object segmentation is a well-researched problem in the computer vision community and is a prerequisite for a variety of high-level vision applications, including content-based video retrieval, video summarization, activity understanding and targeted content replacement. The proposed approach makes several contributions. First, a novel layered Directed Acyclic Graph (DAG) based framework is presented for detection and segmentation of the primary object in video. We exploit the fact that, in general, objects are spatially cohesive and characterized by locally smooth motion trajectories, to extract the primary object from the set of all available proposals based on motion, appearance and predicted-shape similarity across frames. Second, the DAG is initialized with an enhanced object proposal set in which motion-based proposal predictions (from adjacent frames) are used to expand the set of object proposals for a particular frame. Last, the paper presents a motion scoring function for selection of object proposals that emphasizes high optical flow gradients at proposal boundaries to discriminate between moving objects and the background.
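
Selecting one proposal per frame from a layered DAG can be pictured as a highest-scoring path problem. The sketch below is a simplified illustration, not the paper's code: it assumes edges only between adjacent frames, and the function name `best_proposal_path`, the `unary` table, and the `edge` scoring callback are all hypothetical stand-ins for the motion, appearance and shape terms described above.

```python
from typing import Callable, List, Tuple


def best_proposal_path(
    unary: List[List[float]],
    edge: Callable[[int, int, int], float],
) -> Tuple[float, List[int]]:
    """Highest-scoring path through a layered DAG (illustrative only).

    unary[t][i]:   score of proposal i in frame t.
    edge(t, i, j): score for linking proposal i in frame t to
                   proposal j in frame t + 1.
    """
    best = [list(unary[0])]          # best[t][j]: best path score ending at (t, j)
    back: List[List[int]] = []       # back-pointers for path recovery

    for t in range(1, len(unary)):
        prev = best[-1]
        row, ptr = [], []
        for j, u in enumerate(unary[t]):
            # Pick the best predecessor in the previous layer.
            i_best = max(range(len(prev)), key=lambda i: prev[i] + edge(t - 1, i, j))
            row.append(prev[i_best] + edge(t - 1, i_best, j) + u)
            ptr.append(i_best)
        best.append(row)
        back.append(ptr)

    # Trace the best path back from the last layer.
    j = max(range(len(best[-1])), key=lambda i: best[-1][i])
    score = best[-1][j]
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return score, path
```

Because the graph is layered and acyclic, a single forward pass over the frames is enough to find the optimal path.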

Video Object CoSegmentation

We propose a novel approach for object co-segmentation in arbitrary videos by sampling, tracking and matching object proposals via a Regulated Maximum Weight Clique (RMWC) extraction scheme. The proposed approach achieves good segmentation results by pruning away noisy segments in each video through selection of object proposal tracklets that are spatially salient and temporally consistent, and by iteratively extracting weighted groupings of objects with similar shape and appearance (within and across videos). The object regions obtained from the video sets are used to initialize per-pixel segmentation to obtain the final co-segmentation results. Our approach is general in the sense that it can handle multiple objects, temporary occlusions, and objects going in and out of view. Additionally, it makes no prior assumption on the commonality of objects in the video collection. The proposed method is evaluated on a publicly available multi-class video object co-segmentation dataset and demonstrates improved performance compared to state-of-the-art methods.
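
For readers unfamiliar with the underlying combinatorial problem, the sketch below finds a plain maximum weight clique by exhaustive branching on a small graph. It does not implement the "regulated" variant proposed in the paper; the function name `max_weight_clique`, the node labels, and the positive node weights are hypothetical, with nodes standing in for proposal tracklets and edges for pairs judged similar.

```python
from typing import Dict, FrozenSet, Set, Tuple


def max_weight_clique(
    weights: Dict[str, float],
    adj: Dict[str, Set[str]],
) -> Tuple[float, FrozenSet[str]]:
    """Plain maximum weight clique by branching (illustrative only).

    weights: positive weight per node (e.g. a tracklet quality score).
    adj:     symmetric adjacency sets; an edge means "similar enough".
    Exponential in the worst case, so suitable only for small graphs.
    """
    best_weight = 0.0
    best_clique: FrozenSet[str] = frozenset()

    def expand(clique: FrozenSet[str], candidates: Set[str], weight: float) -> None:
        nonlocal best_weight, best_clique
        if weight > best_weight:
            best_weight, best_clique = weight, clique
        remaining = set(candidates)
        for v in sorted(candidates):
            # Drop v so later branches do not revisit cliques containing it.
            remaining.discard(v)
            # Only neighbours of v can extend a clique that contains v.
            expand(clique | {v}, remaining & adj[v], weight + weights[v])

    expand(frozenset(), set(weights), 0.0)
    return best_weight, best_clique
```

Extracting a clique, removing its nodes, and repeating gives one simple way to pull out several groups in turn, in the spirit of the iterative extraction described above.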

Recent Publications

  • Amir Mazaheri, Dong Zhang and Mubarak Shah, "Video Fill In the Blank using LR/RL LSTMs with Spatial-Temporal Attentions", ICCV, 2017.

  • Waqas Sultani, Dong Zhang and Mubarak Shah, "Unsupervised Action Proposal Ranking through Proposal Recombination", Computer Vision and Image Understanding (CVIU), 2017.

  • Dong Zhang, Ziyan Wu, Shanhui Sun, Bor-Jeng Chen, and Terrence Chen, "Vessel Tree Tracking in Angiographic Sequences", Journal of Medical Imaging, 2017.

  • Bor-Jeng Chen, Ziyan Wu, Shanhui Sun, Dong Zhang and Terrence Chen, "Guidewire Tracking Using a Novel Sequential Segment Optimization Method in Interventional X-Ray Videos", ISBI, 2016.

  • Dong Zhang and Mubarak Shah, "Human Pose Estimation in Videos", ICCV, 2015. (PDF)

  • Dong Zhang, Omar Javed and Mubarak Shah, "Video Object Co-Segmentation by Regulated Maximum Weight Cliques", ECCV, 2014. (PDF)

  • Dong Zhang, Omar Javed and Mubarak Shah, "Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions", CVPR, 2013. (Oral, acceptance rate 3.2%) (PDF)

  • Dong Zhang, Omar Oreifej and Mubarak Shah, "Face Verification Using Boosted Cross-Image Features", Technical Report, September 2013. (http://arxiv-web3.library.cornell.edu/abs/1309.7434) (PDF)

  • Dong Zhang and Ping Li, "Visual Odometry in Dynamical Scenes", Sensors and Transducers (1726-5479), 147(12), 2012. (PDF)

  • Dong Zhang and Ping Li, "Motion Detection for Rapidly Moving Cameras in Fully 3D Scenes", IEEE Fourth Pacific-Rim Symposium on Image and Video Technology (PSIVT), 2010. (Oral, acceptance rate 20%) (PDF)