US20130136298A1 - System and method for tracking and recognizing people - Google Patents

System and method for tracking and recognizing people

Info

Publication number
US20130136298A1
Authority
US
United States
Prior art keywords
tracking
unlabeled
samples
tracking samples
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/306,783
Inventor
Ting Yu
Peter Henry Tu
Dashan Gao
Kunter Seref Akbay
Yi Yao
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
General Electric Co
Original Assignee
General Electric Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by General Electric Co filed Critical General Electric Co
Priority to US13/306,783
Assigned to GENERAL ELECTRIC COMPANY (assignors: YU, TING; TU, PETER HENRY; GAO, DASHAN; AKBAY, KUNTER SEREF; YAO, YI)
Priority to EP12784193.0A
Priority to PCT/US2012/062020
Publication of US20130136298A1
Priority to US15/002,672 (issued as US9798923B2)
Legal status: Abandoned

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/52 - Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 - Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 - Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands


Abstract

A tracking and recognition system is provided. The system includes a computer vision-based identity recognition system configured to recognize one or more persons, without a priori knowledge of the respective persons, via an online discriminative learning of appearance signature models of the respective persons. The computer vision-based identity recognition system includes a memory physically encoding one or more routines, which when executed, cause the performance of constructing pairwise constraints between the unlabeled tracking samples. The computer vision-based identity recognition system also includes a processor configured to receive unlabeled tracking samples collected from one or more person trackers and to execute the routines stored in the memory via one or more algorithms to construct the pairwise constraints between the unlabeled tracking samples.

Description

    BACKGROUND
  • Smart environments, such as an indoor office and/or living space with ambient intelligence, have been widely adopted in various domains. A prerequisite to taking advantage of the intelligent and context-aware services within these spaces is knowing people's locations and their spatiotemporal context with respect to the environment. Typically, person detectors and video-based person tracking systems with a tracking-by-detection paradigm are utilized to determine people's locations and their spatiotemporal context within the environment. For example, a multi-camera, multi-person tracking system may be utilized to localize and track individuals in real-time. However, various environmental challenges (e.g., harsh lighting conditions, cluttered backgrounds, etc.) may cause tracking errors, making it difficult to accurately track multiple people in any real-world scenario.
  • BRIEF DESCRIPTION
  • In a first embodiment, a tracking and recognition system is provided. The system includes a computer vision-based identity recognition system configured to recognize one or more persons, without a priori knowledge of the respective persons, via an online discriminative learning of appearance signature models of the respective persons. The computer vision-based identity recognition system includes a memory physically encoding one or more routines, which when executed, cause the performance of constructing pairwise constraints between the unlabeled tracking samples. The computer vision-based identity recognition system also includes a processor configured to receive unlabeled tracking samples collected from one or more person trackers and to execute the routines stored in the memory via one or more algorithms to construct the pairwise constraints between the unlabeled tracking samples.
  • In a second embodiment, a method for tracking and recognition of people is provided. The method includes generating tracking samples from one or more person trackers of a tracking system. The method also includes receiving unlabeled tracking samples from the generated tracking samples into a data buffer for a time span. The method further includes generating weighted pairwise constraints between the unlabeled tracking samples. The method yet further includes generating clusters via spectral clustering of the unlabeled tracking samples with weighted pairwise constraints. The method still further includes learning a respective appearance signature model for each respective cluster.
  • In a third embodiment, a non-transitory, computer-readable media including one or more routines which, when executed by at least one processor, cause acts to be performed is provided. The acts include receiving unlabeled tracking samples collected from one or more person trackers. The acts also include generating weighted pairwise constraints between the unlabeled tracking samples. The acts further include generating clusters via spectral clustering of the unlabeled tracking samples with weighted pairwise constraints. The acts yet further include learning, in an online and discriminative manner, a respective appearance signature model for each respective cluster.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • These and other features and aspects of the present embodiments will become better understood when the following detailed description is read with reference to the accompanying drawings in which like characters represent like parts throughout the drawings, wherein:
  • FIG. 1 is a diagrammatic view of an exemplary tracking system, in which recognition of people is implemented in accordance with embodiments of the present disclosure;
  • FIG. 2 is a schematic of three-dimensional geometry-based person detection for use with embodiments of the present disclosure;
  • FIG. 3 is a schematic of a multi-camera, multi-person tracking system for use with embodiments of the present disclosure;
  • FIG. 4 is a flow chart illustrating a method for tracking and recognizing people using the tracking system of FIG. 1 in accordance with embodiments of the present disclosure;
  • FIG. 5 is a schematic illustrating learning of appearance signature models using the tracking system of FIG. 1 in accordance with embodiments of the present disclosure; and
  • FIG. 6 is a schematic illustrating procedures for online learning of appearance signature models using the tracking system of FIG. 1 in accordance with embodiments of the present disclosure.
  • DETAILED DESCRIPTION
  • In the subsequent paragraphs, various aspects of identifying and tracking multiple people will be explained in detail. The various aspects of the present techniques will be explained, by way of example only, with the aid of figures hereinafter. The present techniques for identifying and tracking multiple people will generally be described by reference to an exemplary tracking and recognition system (e.g., trajectory-based tracking and recognition system) designated by numeral 10.
  • The tracking and recognition system 10 depicted in FIG. 1 is configured to track people despite tracking errors (e.g., temporary trajectory losses and/or identity switches) that may occur. These tracking errors may result in noisy data or samples that include spatiotemporal gaps. The tracking and recognition system 10 is configured to handle the noisy data to enable the recognition and tracking of multiple people. The tracking and recognition system 10 includes a tracking subsystem 12 and a computer vision-based identity recognition system 14.
  • The tracking subsystem 12 operates continuously in real-time to monitor people's activities in an area of interest. In particular, the tracking subsystem 12 includes one or more optical sensors 16 (e.g., cameras). Live video streams from the multiple cameras 16 are calibrated into a common 3D coordinate system as described in greater detail below. One or more person trackers 18 utilize the video streams from the cameras 16 to track one or more respective persons via data association and filtering, using the detections returned from a generic person detector and the corresponding cropped appearance samples from each camera view, as described in greater detail below. The tracker 18 is a computer entity that understands image content and may track the same object (e.g., person) over time.
  • The computer vision-based identity recognition system 14 (e.g., a semi-supervised data clustering and discriminative signature model learning system) is configured to utilize the noisy data from the tracking subsystem 12 to learn the appearance signature models of people in an online manner. In particular, the online learning of the computer vision-based identity recognition system 14 includes receiving a set of training data and classifying the training data, while updating the classified data over time with new data. For online learning, the computer vision-based identity recognition system 14 may utilize a type of machine learning such as semi-supervised learning. In semi-supervised learning, the computer vision-based identity recognition system 14 utilizes a large amount of unlabeled data (e.g., unlabeled tracking samples) and a small amount of labeled data (e.g., previously learnt discriminative appearance signature models). Specifically, the computer vision-based identity recognition system 14 is configured to recognize one or more persons, without a priori knowledge of the respective persons, via an online discriminative learning of appearance signature models 20 (e.g., discriminative appearance signature models) of the respective persons. In other words, the computer vision-based identity recognition system 14 does not rely on any previously learned individual models, but identifies a person through an appearance signature using an online learned and continuously updated identity pool. To benefit the learning process, the computer vision-based identity recognition system 14 constrains the training data by analyzing the fidelity of the tracked trajectories in terms of spatial locality and temporal continuity, which helps categorize the data. In particular, the computer vision-based identity recognition system 14 utilizes a multi-step approach to learn each appearance signature model 20: constructing pairwise constraints for the unlabeled tracking samples, categorizing the samples by solving a clustering problem with pairwise constraints, and learning a large-margin based discriminative signature model 20 for each data cluster to be maintained and carried over in an online mode within the identity pool.
  • The computer vision-based identity recognition system 14 includes a processor 22 to implement the multi-step approach to learn each appearance signature model 20. The processor 22 is configured to receive unlabeled tracking samples (e.g., noisy samples with spatiotemporal gaps) collected from the one or more person trackers 18. As described in greater detail below, the tracking process of the tracking subsystem 12 and the signature model learning of the computer vision-based identity recognition system 14 may be coupled in a batch processing manner. In particular, the processor 22 may receive and buffer the unlabeled tracking samples in an online and asynchronous mode. For example, once a data buffer of the processor 22 for a time span reaches a threshold size based on the unlabeled tracking samples received, the processor 22 activates the pairwise constraint generation, clustering, and learning processes, as illustrated in the sketch below. In certain embodiments, a portion of the received unlabeled tracking samples in the data buffer overlaps between two successive time spans.
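  • As a rough illustration of this batch-coupled buffering, the following Python sketch accumulates unlabeled samples and, once the buffer reaches a threshold size, triggers the constraint-generation, clustering, and learning stages, carrying an overlapping tail of samples into the next time span. All names, the threshold, and the overlap fraction are hypothetical; the patent does not specify an implementation.

```python
class SampleBuffer:
    """Minimal sketch of the asynchronous sample buffer described above."""

    def __init__(self, process_batch, threshold=500, overlap=0.2):
        self.process_batch = process_batch  # constraint generation -> clustering -> learning
        self.threshold = threshold          # assumed buffer size that activates processing
        self.overlap = overlap              # assumed fraction of samples carried to the next span
        self.samples = []

    def add(self, sample):
        """Receive one unlabeled tracking sample in an online, asynchronous mode."""
        self.samples.append(sample)
        if len(self.samples) >= self.threshold:
            # Activate pairwise-constraint generation, clustering, and model learning.
            self.process_batch(list(self.samples))
            # Keep the tail so two successive time spans share some samples.
            keep = int(self.overlap * len(self.samples))
            self.samples = self.samples[-keep:] if keep else []
```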
  • The processor 22 is configured to execute one or more algorithms (which may be stored as executable code in a memory and/or storage device 23 of the computer vision-based identity recognition system 14) to implement each step of the multi-step approach to learn each appearance signature model. In particular, such executable code may cause, when executed, the processor 22 to construct pairwise constraints between the received unlabeled tracking samples. For example, the pairwise constraints may represent that two samples must belong to one person (e.g., must-link constraint) and/or that two samples cannot belong to one person (e.g., cannot-link constraint) as described in greater detail below. In certain embodiments, each pairwise constraint may be weighed between the unlabeled tracking samples, for example, by estimating the likelihood that the constraint should be enforced. Also, the executable code may cause, when executed, the processor 22 to cluster (e.g., spectral cluster) the unlabeled tracking samples with weighted pairwise constraints. As described in greater detail below, the processor 22 may utilize a kernel learning based function for spectral clustering. In addition, the executable code may cause, when executed, the processor 22 to learn a respective appearance signature model 20 (e.g., discriminative appearance signature model). For example, the processor 22 may learn a new appearance signature model or update a maintained appearance signature model. As described in greater detail below, the processor 22 may utilize a support vector machine (SVM) (e.g., multi-class SVM) to learn the respective appearance signature model 20 for each cluster of unlabeled tracking samples. An incremental SVM may be utilized in updating the maintained appearance signature model. In certain embodiments, the computer vision-based identity recognition system 14 may include a special purpose processor configured to implement the steps described above.
  • FIG. 2 is a schematic of a three-dimensional geometry-based person detection system 24 for use with embodiments of the present disclosure (e.g., tracking subsystem 12). Determining where a detected agent (e.g., a person) is located is achieved by characterizing physical X-Y measures in a 2D space using one camera sensor, or X-Y-Z measures in 3D using more than one camera. The cameras or optical sensors 16 of system 24 operate in a calibrated fashion, where the correspondence between the 3D world coordinate system 26 and the 2D image spaces 28 can be established. Hence, a detailed 3D human body model may be crafted based on the physical dimensions of a human standing on a ground plane. The model's projection onto the image plane can be used to explain the extracted foreground regions and nominate a hypothesized set of ground plane locations that may be occupied by people 30.
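  • As a concrete illustration of this calibrated geometry, the sketch below projects a hypothesized ground-plane location into a camera image as a foot-to-head segment using a standard 3x4 pinhole projection matrix. It is a minimal example under stated assumptions (a vertical body axis of fixed height), not the patent's detector.

```python
import numpy as np

def project_ground_hypothesis(P, ground_xy, height=1.75):
    """Project a hypothesized person location on the ground plane (z = 0)
    into image coordinates, assuming P is a calibrated 3x4 projection matrix
    and the person is approximated by a vertical segment of `height` meters."""
    foot = np.array([ground_xy[0], ground_xy[1], 0.0, 1.0])
    head = np.array([ground_xy[0], ground_xy[1], height, 1.0])

    def to_pixels(X):
        u = P @ X
        return u[:2] / u[2]  # homogeneous -> pixel coordinates

    # The foot-to-head image segment can then be scored against the
    # extracted foreground regions to accept or reject the hypothesis.
    return to_pixels(foot), to_pixels(head)
```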
  • Referring to FIG. 3, a schematic of a multi-camera, multi-person tracking and recognition system 10 is illustrated, according to an embodiment of the present disclosure. The multi-camera, multi-person tracking and recognition system 10 is usable with the tracking subsystem 12 and the computer vision-based identity recognition system 14 to differentiate agents (e.g., persons) in the monitored space. The multi-person tracking in one embodiment is performed in a centralized fashion once person detections from each camera sensor 16 are received and time ordered. Each newly detected person is assigned a new tracker 18 with a unique tracker ID that operates on the ground plane in a virtual semantic scene space. The trackers 18 may include 3D ground plane-based trackers maintained in real time. As mentioned above, the tracking samples collected from the trackers 18 are provided to the computer vision-based identity recognition system 14 to recognize one or more persons via online discriminative learning of appearance signature models 20. The trajectory filtering/smoothing of each tracker 18 is also performed on the ground plane in the same centralized fashion, enabling the system to provide a continuous meta-data stream in the form of person locations as a function of time (i.e., in real-time).
  • FIG. 4 is a flow chart illustrating a method 32 for tracking and recognizing people using the tracking and recognition system 10 of FIG. 1. FIG. 5 illustrates the learning of appearance signature models described in method 32. The method 32 includes generating tracking samples from one or more person trackers 18 of the tracking subsystem 12 (block 34). The data buffer of the computer vision-based identity recognition system 14 (i.e., processor 22) receives unlabeled tracking samples 36 from the generated samples for a given time span. In one embodiment, the computer vision-based identity recognition system 14 receives the unlabeled tracking samples (block 38), via batch processing, in an online and asynchronous mode. As described in greater detail below, the data buffer must reach a threshold size before the generation of pairwise constraints between the unlabeled tracking samples 36, the generation of clusters, and the online discriminative learning of a respective appearance signature model 20 for each cluster are activated.
  • The method 32 also includes constructing pairwise constraints 40 between the unlabeled tracking samples 36 (block 42). Constructing pairwise constraints 40 enables analyzing the spatial and temporal properties of the tracking trajectories of the samples 36. Reference numeral 44 of FIG. 5 illustrates the samples with pairwise constraints 40. As depicted in FIG. 5, dots represent the unlabeled tracking samples 36. A dashed line represents a must-link constraint between the samples 36 and a solid line represents a cannot-link constraint between the samples 36. The relationship in a must-link constraint is that two samples 36 are cropped from the same tracker 18 (i.e., a single tracker). The relationship in a cannot-link constraint is that two samples 36 are from different trackers 18.
  • In addition, the method 32 includes generating weighted pairwise constraints 44 between the unlabeled tracking samples 36 based on the samples 36 with pairwise constraints 40 (block 46). Each constraint is weighted by estimating how likely it is that the constraint should be enforced. For example, assume that a total of $N$ persons are being tracked in $V$ camera views, $T_i$ denotes the $i$-th tracker, $i \in \{1, \ldots, N\}$, and $v \in \{1, \ldots, V\}$ indexes the camera views. At time $t$, $x_i^t$ and $x_i^{t,v}$ represent the 3D world coordinate of tracker $T_i$ and its projected 2D image position in camera view $v$. Also, $s_i^t$ represents the appearance sample collected from tracker $T_i$ at time $t$, and $s_i^{t,v}$ represents its appearance component from camera view $v$.
  • A must-link relationship measures the fidelity of the tracking system, i.e., the confidence that a person tracker 18 is very unlikely to be confused with others. Thus, the further apart the tracker 18 is from all other trackers 18 in every camera view, the more confident the system can be that this tracker 18 is dedicated to following the same person over time, and a high likelihood value for the must-link constraint should be assigned between pairs of appearance samples collected from this tracker 18 over successive frames. Mathematically, the likelihood of the must-link constraint between collected appearance samples $s_i^t$ and $s_i^{t-1}$ can be defined as:

  • $C_m(s_i^t, s_i^{t-1}) = \mathrm{sig}\left[\min_v \left( w_i^{t,v} \cdot w_i^{t-1,v} \right)\right],$  (1)
  • where $\mathrm{sig}(x) = \frac{1}{1 + e^{-\lambda x}}$ is a sigmoid function with parameter $\lambda$, and $w_i^{t,v}$ represents the healthiness score of tracker $T_i$ in view $v$ at time $t$, which by definition measures the confidence of tracker $T_i$ not getting confused with others in view $v$. The healthiness score $w_i^{t,v}$ is defined as:
  • $w_i^{t,v} = 1 - \sum_{j \neq i} \frac{\left| R_i^{t,v} \cap R_j^{t,v} \right|}{\left| R_i^{t,v} \right|},$  (2)
  • with $|R_i^{t,v}|$ denoting the size of the image region $R_i^{t,v}$.
  • Similar to the must-link relationship, the cannot-link relationship measures how confident the tracking system is that two appearance samples collected for two trackers 18 are actually from two different persons. Thus, for a given camera view, if the projections of two trackers 18 into that view at time $t$ are far apart from each other, it is very likely that the two samples are from two different persons. Mathematically, the cannot-link likelihood of any pair of samples $s_i^t$ and $s_j^t$ collected at time $t$ can be defined as:

  • $C_c(s_i^t, s_j^t) = \mathrm{sig}\left[\max_v \left\| x_i^{t,v} - x_j^{t,v} \right\|_2\right].$  (3)
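  • To make the constraint weighting concrete, the following sketch computes the must-link and cannot-link likelihoods of equations (1)-(3). It is a minimal illustration under assumptions, not the patent's implementation: projected regions are represented as pixel sets, and the function names and default $\lambda$ are invented for the example.

```python
import numpy as np

def sig(x, lam=1.0):
    """Sigmoid with parameter lambda, as in equation (1)."""
    return 1.0 / (1.0 + np.exp(-lam * x))

def healthiness(regions, i, v):
    """Healthiness score w_i^{t,v} of tracker i in view v, per equation (2).
    `regions[j][v]` is assumed to be the projected image region of tracker j
    in view v at the current time, represented as a set of pixels."""
    Ri = regions[i][v]
    overlap = sum(len(Ri & regions[j][v]) for j in regions if j != i)
    return 1.0 - overlap / len(Ri)

def must_link(w_prev, w_curr):
    """Equation (1): per-view products of successive healthiness scores,
    minimized over views, squashed by the sigmoid."""
    return sig(min(wp * wc for wp, wc in zip(w_prev, w_curr)))

def cannot_link(x_i, x_j):
    """Equation (3): maximum over views of the distance between the two
    trackers' projected 2D image positions, squashed by the sigmoid."""
    return sig(max(np.linalg.norm(np.asarray(a) - np.asarray(b))
                   for a, b in zip(x_i, x_j)))
```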
  • After generating the samples with the weighted pairwise constraints 44 (block 46), the method 32 includes generating clusters 48 via spectral clustering of the samples with weighted pairwise constraints 44 (block 50). Reference numeral 52 of FIG. 5 illustrates the clusters 48 of samples with weighted pairwise constraints 44. The processor 22 utilizes a kernel learning based function for the constrained spectral clustering. Given a set of $n$ appearance samples $S = \{s_1, \ldots, s_n\}$, if the true binary $k$-cluster indicator matrix $Y = (y_{ij})$ of size $n \times k$ were known, where $y_{ij} = 1$ if $s_i$ is in the $j$-th ground-truth cluster and $y_{ij} = 0$ otherwise, then the $n \times n$ matrix $\Psi = YY^T$ would depict all possible pairwise constraints between any pair of samples, e.g., $\Psi_{ij} = 1$ if $s_i$ and $s_j$ must be in the same cluster and $\Psi_{ij} = 0$ if they cannot. While the ideal kernel is $K = \Psi$, in reality only some elements of $\Psi$ are known through the observed pairwise constraints. Thus, a kernel matrix $K$ that approximates the known pairwise constraints as closely as possible is utilized, which leads to the following objective function:

  • $\min_K \left\| C \circ (K - P) \right\|_F^2$  (4)
  • where $\circ$ denotes the element-wise product, $\|\cdot\|_F$ denotes the Frobenius norm of a matrix, and the two $n \times n$ matrices $P$ and $C$ describe, respectively, the known pairwise constraints in $\Psi$ and the confidence in these known pairwise constraints. More specifically, each element $P_{ij}$ of $P$ indicates whether samples $s_i$ and $s_j$ belong to the same cluster ($P_{ij} = 1$) or not ($P_{ij} = 0$). Since sample $s_i$ always belongs to the same cluster as itself, the diagonal elements $P_{ii}$ are always 1. $C$ is a symmetric matrix, where $0 < C_{ij} = C_{ji} \leq 1$ represents the likelihood (or confidence) that the must-link or cannot-link constraint between samples $s_i$ and $s_j$ is known. Thus, $C_{ii} = 1$ for $i \in \{1, \ldots, n\}$, and $C_{ij} = 0$ if there is no constraint for samples $s_i$ and $s_j$. Other values in $C$ are computed by (1) or (3) during the sample collection process as previously described.
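  • As an illustration of this construction, the sketch below assembles $P$ and $C$ from lists of observed constraints. The triple format (i, j, confidence) is an assumption for the example, with confidences coming from equations (1) and (3).

```python
import numpy as np

def build_constraint_matrices(n, must_links, cannot_links):
    """Build P (constraint values) and C (confidences) for n samples from
    (i, j, confidence) triples of observed must-link/cannot-link constraints."""
    P = np.zeros((n, n))
    C = np.zeros((n, n))
    np.fill_diagonal(P, 1.0)  # a sample always shares a cluster with itself
    np.fill_diagonal(C, 1.0)  # ... and that "constraint" is fully known
    for i, j, conf in must_links:
        P[i, j] = P[j, i] = 1.0
        C[i, j] = C[j, i] = conf
    for i, j, conf in cannot_links:
        P[i, j] = P[j, i] = 0.0
        C[i, j] = C[j, i] = conf
    return P, C
```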
  • In addition, it is desired that the kernel matrix $K$ preserve the proximity structure of the data, which is represented by the smooth eigenvectors of the nonnegative symmetric matrix $W = (W_{ij})$, with
  • $W_{ij} = \frac{\sum_{v=1}^{V} w_i^v w_j^v \, \varphi(s_i^v, s_j^v)}{\sum_{v=1}^{V} w_i^v w_j^v},$  (5)
  • where $s_i^v$ is the component of $s_i$ in camera view $v$, $w_i^v$ is the healthiness score of sample $s_i$ in view $v$ at the time the sample was collected (see (2)), and $\varphi(x, y)$ represents any similarity measure between two samples. This proximity requirement can be added as a constraint to the optimization problem, $K = \sum_{i=1}^{k} \beta_i u_i u_i^T$, where $\beta_1 \geq \cdots \geq \beta_k > 0$ and $u_i$ is the eigenvector corresponding to the $i$-th smallest eigenvalue of the normalized graph Laplacian $L = I - D^{-1/2} W D^{-1/2}$, with $I$ the $n \times n$ identity matrix and $D = \mathrm{diag}(d_1, \ldots, d_n)$, where $d_i = \sum_{j=1}^{n} W_{ij}$.
  • Therefore, kernel matrix K is learned by solving the following optimization problem:

  • $\min_{\beta_1, \ldots, \beta_k} \left\| C \circ (K - P) \right\|_F^2$  (6)
  • s.t. $K = \sum_{i=1}^{k} \beta_i u_i u_i^T,$  (7)
  • $\beta_1 \geq \cdots \geq \beta_k \geq 0.$  (8)
  • It can be shown that this is essentially a quadratic programming problem and can be solved efficiently. Once the optimal kernel $K$ is learned, the final $k$ target clusters can be formed by applying the k-means algorithm to the rows of $(\sqrt{\beta_1}\, u_1, \ldots, \sqrt{\beta_k}\, u_k)$.
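  • A minimal sketch of this constrained spectral clustering step follows, assuming $W$, $P$, and $C$ are built as above. For simplicity it relaxes the ordering constraint in (8) to plain non-negativity and uses a general-purpose optimizer instead of a dedicated quadratic programming solver; a faithful implementation would enforce $\beta_1 \geq \cdots \geq \beta_k \geq 0$.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.cluster import KMeans

def constrained_spectral_clustering(W, P, C, k):
    """Learn K = sum_i beta_i u_i u_i^T minimizing ||C o (K - P)||_F^2
    (equations (6)-(8), ordering constraint relaxed), then run k-means
    on the rows of (sqrt(beta_1) u_1, ..., sqrt(beta_k) u_k)."""
    n = W.shape[0]
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    L = np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt   # normalized graph Laplacian
    _, eigvecs = np.linalg.eigh(L)                # eigenvalues in ascending order
    U = eigvecs[:, :k]                            # k smoothest eigenvectors

    def objective(beta):
        K = (U * beta) @ U.T                      # K = sum_i beta_i u_i u_i^T
        return np.sum((C * (K - P)) ** 2)         # squared Frobenius norm

    beta = minimize(objective, x0=np.ones(k), bounds=[(0.0, None)] * k).x
    embedding = U * np.sqrt(beta)                 # rows of (sqrt(beta_i) u_i)
    return KMeans(n_clusters=k, n_init=10).fit_predict(embedding)
```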
  • In certain embodiments, the method 32 includes associating maintained discriminative signature models 54 (i.e., previously learnt model from prior time span) from the identity pool with one of the generated clusters 48 (block 56). Reference numeral 58 of FIG. 5 illustrates the association of the clusters 48 with the maintained discriminative signature models 54. The stars in FIG. 5 represent the maintained discriminative signature models 54.
  • After generating the clusters 48 (block 50), the method 32 includes learning appearance signature models 20 (e.g., discriminative appearance signature models) for each cluster 48 (block 60). When a respective cluster 48 is associated with a respective maintained discriminative signature model 54, learning the appearance signature model 20 (block 60) includes updating the appearance signature model 20 (i.e., maintained discriminative signature model 54) with the new samples 36. In certain embodiments, learning the appearance signature model 20 (block 60) includes learning a new appearance signature model 20 for inclusion within the identity pool. Reference numeral 62 of FIG. 5 illustrates the sample data 36 associated with the learned appearance signature models 20. The processor 22 utilizes an SVM to model each identity signature. Since the number of identities can vary over time, a multi-class SVM is learned from the clustered data. Since the identity pool generated with the multi-class SVM evolves over time, the processor 22 utilizes an incremental SVM learning scheme. The incremental SVM learner continuously updates itself once new data (e.g., training samples 36) become available.
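  • As a rough stand-in for this incremental, multi-class SVM learning, the sketch below uses scikit-learn's SGDClassifier with hinge loss, i.e., a linear SVM trained incrementally via partial_fit. The fixed set of identity labels and the feature representation are simplifying assumptions; in the patent the identity pool can grow as new signature models are added.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Large-margin linear model with incremental updates (hinge loss = linear SVM).
identity_pool = SGDClassifier(loss="hinge")
known_ids = np.arange(10)  # assumed upper bound on identities, for illustration

def update_identity_pool(cluster_features, cluster_ids):
    """Update the maintained signature models with newly clustered samples
    (features: array of shape (m, d); ids: the clusters' identity labels)."""
    identity_pool.partial_fit(cluster_features, cluster_ids, classes=known_ids)

def recognize(sample_features):
    """Score new appearance samples against the current identity pool."""
    return identity_pool.predict(sample_features)
```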
  • As mentioned above, the signature model learning and the multi-person tracking are coupled in a batch processing manner. FIG. 6 illustrates procedures for the online learning of appearance signature models 20 in a batch processing manner. Each puppet 64 (dark shading), 66 (light shading), 68 (cross-hatch) represents the true locations and identities of individuals in a spatiotemporal video volume 70. Each curve 72 (solid line), 74 (dashed line), 76 (dotted-dashed line) represents an estimated trajectory for each respective individual (i.e., puppet 64, 66, 68) over successive time spans 78 (i.e., t0, t1), 80 (i.e., t1, t2), 82 (i.e., t2, t3). The tracking process (e.g., via tracking subsystem 12) continuously generates appearance samples 36 that are extracted from the projected image regions of all maintained 3D trackers 18. As mentioned above, the discriminative signature learning process first receives and buffers these samples 36 in an online but asynchronous mode. In particular, given each time span (tn-1, tn) (e.g., time spans 78, 80, 82), appearance samples 36 are harvested into the data buffer. The illustrated appearance samples 36 represent the respective individuals (i.e., puppet 64, 66, 68). Once the data buffer collects enough data over a time span (i.e., reaches a threshold size), for example at t1, t2, and t3, the generation of the weighted pairwise constraints, the constrained clustering, and the incremental SVM-based signature model learning are activated to process the newly received data stream for that respective time span (e.g., time spans 78, 80, 82). This enables periodic activation of these processes.
  • To maintain consistent discriminative models over time spans within the identity pool 84, sample buffers of two successive spans have some overlap (i.e., clustering and discriminative learning re-use samples 36 in the overlapped span). For example, as depicted in FIG. 6, the samples 36 within overlapping region 86 of time spans 78 and 80 and overlapping region 88 of time spans 80 and 82 may be re-used in the clustering and learning processes for time spans 80 and 82, respectively. Carrying over some of the samples 36 between successive time spans ensures temporal continuity. The signature models 20 within the identity pool 84 are updated after each round of learning.
  • A technical contribution for the disclosed method and system is that it provides for a computer implemented system and method for trajectory-based tracking and recognition of multiple people.
  • The disclosed embodiments may be interfaced to and controlled by a computer readable storage medium having stored thereon a computer program. The computer readable storage medium may include a plurality of components such as one or more of electronic components, hardware components, and/or computer software components. These components may include one or more computer readable storage media that generally store instructions such as software, firmware and/or assembly language for performing one or more portions of one or more implementations or embodiments of an algorithm as discussed herein. These computer readable storage media are generally non-transitory and/or tangible. Examples of such a computer readable storage medium include a recordable data storage medium of a computer and/or storage device. The computer readable storage media may employ, for example, one or more of a magnetic, electrical, optical, biological, and/or atomic data storage medium. Further, such media may take the form of, for example, floppy disks, magnetic tapes, CD-ROMs, DVD-ROMs, hard disk drives, and/or solid-state or electronic memory. Other forms of non-transitory and/or tangible computer readable storage media not listed may be employed with the disclosed embodiments.
  • A number of such components can be combined or divided in an implementation of a system. Further, such components may include a set and/or series of computer instructions written in or implemented with any of a number of programming languages, as will be appreciated by those skilled in the art. In addition, other forms of computer readable media such as a carrier wave may be employed to embody a computer data signal representing a sequence of instructions that when executed by one or more computers causes the one or more computers to perform one or more portions of one or more implementations or embodiments of a sequence.
  • Technical effects of the disclosed embodiments include systems and methods for trajectory-based tracking and recognition of multiple persons. In particular, the disclosed embodiments utilize a computer vision-based identity recognition system 14 to recognize one or more persons without a priori knowledge of the respective persons via an online discriminative learning of appearance signature models 20 of the respective persons using noisy, unlabeled tracking samples 36. The computer vision-based identity recognition system 14 utilizes a multi-step approach in learning the appearance signature models 20 (e.g., new or updated models) that includes constructing pairwise constraints between the samples 36, categorizing the samples 36 into clusters 48, and learning a large-margin based discriminative signature model 20 for each respective cluster 48.
This written description uses examples, including the best mode, to enable any person skilled in the art to practice the disclosed embodiments, including making and using any devices or systems and performing any incorporated methods. The patentable scope of the invention is defined by the claims, and may include other examples that occur to those skilled in the art. Such other examples are intended to be within the scope of the claims if they have structural elements that do not differ from the literal language of the claims, or if they include equivalent structural elements with insubstantial differences from the literal language of the claims.

Claims (25)

1. A tracking and recognition system, comprising:
a computer vision-based identity recognition system configured to recognize one or more persons, without a priori knowledge of the respective persons, via an online discriminative learning of appearance signature models of the respective persons, wherein the computer vision-based identity recognition system comprises:
a memory physically encoding one or more routines, which when executed, cause the performance of constructing pairwise constraints between unlabeled tracking samples; and
a processor configured to receive unlabeled tracking samples collected from one or more person trackers and to execute the routines stored in the memory via one or more algorithms to construct the pairwise constraints between the unlabeled tracking samples.
2. The system of claim 1, wherein the pairwise constraints comprise a must-link constraint between two tracking samples from a single tracker.
3. The system of claim 1, wherein the pairwise constraints comprise a cannot-link constraint between two tracking samples from different trackers.
4. The system of claim 1, wherein the routines, when executed, cause the performance of weighing each pairwise constraint between the unlabeled tracking samples.
5. The system of claim 4, wherein the routines, when executed, cause the performance of spectral clustering of the unlabeled tracking samples with weighted pairwise constraints.
6. The system of claim 5, wherein the routines, when executed, utilize a kernel learning based function to spectral cluster the unlabeled tracking samples with weighted pairwise constraints.
7. The system of claim 5, wherein the routines, when executed, cause the performance of learning a respective appearance signature model for each cluster of unlabeled tracking samples.
8. The system of claim 7, wherein the respective appearance signature model comprises a new appearance signature model.
9. The system of claim 7, wherein the respective appearance signature model comprises an updated appearance signature model.
10. The system of claim 7, wherein the routines, when executed, utilize a support vector machine to learn the respective appearance signature model for each cluster of unlabeled tracking samples.
11. The system of claim 1, wherein the processor is configured to receive and buffer the unlabeled tracking samples in an online and asynchronous mode.
12. The system of claim 1, wherein the one or more person trackers comprise 3D ground plane-based trackers maintained in real-time.
13. The system of claim 1, wherein the unlabeled tracking samples comprise noisy samples having spatiotemporal gaps.
14. A method for tracking and recognition of people, comprising:
generating tracking samples from one or more person trackers of a tracking system;
receiving unlabeled tracking samples from the generated tracking samples into a data buffer for a time span;
generating weighted pairwise constraints between the unlabeled tracking samples;
generating clusters via spectral clustering of the unlabeled tracking samples with weighted pairwise constraints; and
learning in an online and discriminative manner a respective appearance signature model for each respective cluster.
15. The method of claim 14, wherein learning the respective appearance signature model for each respective cluster comprises learning in an online and discriminative manner.
16. The method of claim 14, wherein the one or more person trackers comprise 3D ground plane-based trackers maintained in real-time, and generating tracking samples comprises extracting projected image regions from the 3D ground plane-based trackers.
17. The method of claim 14, wherein receiving the unlabeled tracking samples comprises receiving the unlabeled tracking samples, via batch processing, in an online and asynchronous mode.
18. The method of claim 14, wherein the data buffer reaching a threshold size from the received unlabeled tracking samples activates the generation of the weighted pairwise constraints between the unlabeled tracking samples, the generation of the clusters, and the online discriminative learning of the respective appearance signature model for each respective cluster.
19. The method of claim 14, wherein a portion of the received unlabeled tracking samples in the data buffer overlap from two successive time spans.
20. The method of claim 14, wherein the weighted pairwise constraints comprise a must-link constraint between two tracking samples from a single tracker and a cannot-link constraint between two tracking samples from different trackers.
21. The method of claim 14, wherein the respective appearance signature model comprises a new appearance signature model or an updated appearance signature model.
22. A non-transitory, computer-readable media comprising one or more routines which, when executed by at least one processor, cause acts to be performed comprising:
receiving unlabeled tracking samples collected from one or more person trackers;
generating weighted pairwise constraints between the unlabeled tracking samples;
generating clusters via spectral clustering of the unlabeled tracking samples with weighted pairwise constraints; and
learning in an online and discriminative manner a respective appearance signature model for each respective cluster.
23. The non-transitory, computer-readable media of claim 22, wherein the weighted pairwise constraints comprise a must-link constraint between two tracking samples from a single tracker and a cannot-link constraint between two tracking samples from different trackers.
24. The non-transitory, computer-readable media of claim 23, wherein the processor utilizes a multi-class support vector machine to learn the respective appearance signature model for each respective cluster.
25. The non-transitory, computer-readable media of claim 24, wherein the multi-class support vector machine comprises an incremental support vector machine that continuously updates itself upon receiving new data.
US13/306,783 2011-11-29 2011-11-29 System and method for tracking and recognizing people Abandoned US20130136298A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US13/306,783 US20130136298A1 (en) 2011-11-29 2011-11-29 System and method for tracking and recognizing people
EP12784193.0A EP2786313A1 (en) 2011-11-29 2012-10-26 System and method for tracking and recognizing people
PCT/US2012/062020 WO2013081750A1 (en) 2011-11-29 2012-10-26 System and method for tracking and recognizing people
US15/002,672 US9798923B2 (en) 2011-11-29 2016-01-21 System and method for tracking and recognizing people

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US13/306,783 US20130136298A1 (en) 2011-11-29 2011-11-29 System and method for tracking and recognizing people

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/002,672 Division US9798923B2 (en) 2011-11-29 2016-01-21 System and method for tracking and recognizing people

Publications (1)

Publication Number Publication Date
US20130136298A1 true US20130136298A1 (en) 2013-05-30

Family

ID=47148988

Family Applications (2)

Application Number Title Priority Date Filing Date
US13/306,783 Abandoned US20130136298A1 (en) 2011-11-29 2011-11-29 System and method for tracking and recognizing people
US15/002,672 Active US9798923B2 (en) 2011-11-29 2016-01-21 System and method for tracking and recognizing people

Family Applications After (1)

Application Number Title Priority Date Filing Date
US15/002,672 Active US9798923B2 (en) 2011-11-29 2016-01-21 System and method for tracking and recognizing people

Country Status (3)

Country Link
US (2) US20130136298A1 (en)
EP (1) EP2786313A1 (en)
WO (1) WO2013081750A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9600896B1 (en) * 2015-11-04 2017-03-21 Mitsubishi Electric Research Laboratories, Inc. Method and system for segmenting pedestrian flows in videos
CN106060778A (en) * 2016-06-30 2016-10-26 北京奇虎科技有限公司 Target location determination method and device
CN107315994B (en) * 2017-05-12 2020-08-18 长安大学 Clustering method based on Spectral Clustering space trajectory
CN111079775B (en) * 2018-10-18 2022-09-20 中国科学院长春光学精密机械与物理研究所 Real-time tracking method for combined regional constraint learning
US11600113B2 (en) * 2019-11-13 2023-03-07 Nec Corporation Deep face recognition based on clustering over unlabeled face data
US11497418B2 (en) 2020-02-05 2022-11-15 General Electric Company System and method for neuroactivity detection in infants
CN111310817B (en) * 2020-02-10 2022-02-11 深圳大学 Spectral clustering method, device, system, computer equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050254546A1 (en) * 2004-05-12 2005-11-17 General Electric Company System and method for segmenting crowded environments into individual objects
US8154600B2 (en) * 2007-04-20 2012-04-10 Utc Fire & Security Americas Corporation, Inc. Method and system for distributed multiple target tracking
JP5385759B2 (en) * 2009-10-30 2014-01-08 キヤノン株式会社 Image processing apparatus and image processing method
US20130136298A1 (en) * 2011-11-29 2013-05-30 General Electric Company System and method for tracking and recognizing people

Patent Citations (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5764789A (en) * 1994-11-28 1998-06-09 Smarttouch, Llc Tokenless biometric ATM access system
US7023469B1 (en) * 1998-04-30 2006-04-04 Texas Instruments Incorporated Automatic video monitoring system which selectively saves information
US6353678B1 (en) * 1999-07-14 2002-03-05 Sarnoff Corporation Method and apparatus for detecting independent motion in three-dimensional scenes
US6698021B1 (en) * 1999-10-12 2004-02-24 Vigilos, Inc. System and method for remote control of surveillance devices
US20030163289A1 (en) * 2000-04-11 2003-08-28 Whelan Michael David Clive Object monitoring system
US7221809B2 (en) * 2001-12-17 2007-05-22 Genex Technologies, Inc. Face recognition system and method
US7227569B2 (en) * 2002-05-07 2007-06-05 Matsushita Electric Industrial Co., Ltd. Surveillance system and a surveillance camera
US20060187305A1 (en) * 2002-07-01 2006-08-24 Trivedi Mohan M Digital processing of video images
US7227976B1 (en) * 2002-07-08 2007-06-05 Videomining Corporation Method and system for real-time facial image enhancement
US20040017930A1 (en) * 2002-07-19 2004-01-29 Samsung Electronics Co., Ltd. System and method for detecting and tracking a plurality of faces in real time by integrating visual ques
US20060018518A1 (en) * 2002-12-12 2006-01-26 Martin Fritzsche Method and device for determining the three-dimension position of passengers of a motor car
US20070194884A1 (en) * 2004-03-17 2007-08-23 Sagem Defense Securite Person identification control method and system for implementing same
US20060158307A1 (en) * 2005-01-13 2006-07-20 Samsung Electronics Co., Ltd. System and method for face recognition
US20080114800A1 (en) * 2005-07-15 2008-05-15 Fetch Technologies, Inc. Method and system for automatically extracting data from web sites
US7806604B2 (en) * 2005-10-20 2010-10-05 Honeywell International Inc. Face detection and tracking in a wide field of view
US20090037458A1 (en) * 2006-01-03 2009-02-05 France Telecom Assistance Method and Device for Building The Aborescence of an Electronic Document Group
US20070237364A1 (en) * 2006-03-31 2007-10-11 Fuji Photo Film Co., Ltd. Method and apparatus for context-aided human identification
US20090310862A1 (en) * 2008-06-13 2009-12-17 Lockheed Martin Corporation Method and system for crowd segmentation
US20100245567A1 (en) * 2009-03-27 2010-09-30 General Electric Company System, method and program product for camera-based discovery of social networks
US8320617B2 (en) * 2009-03-27 2012-11-27 Utc Fire & Security Americas Corporation, Inc. System, method and program product for camera-based discovery of social networks
US20100332425A1 (en) * 2009-06-30 2010-12-30 Cuneyt Oncel Tuzel Method for Clustering Samples with Weakly Supervised Kernel Mean Shift Matrices
US20110302163A1 (en) * 2010-06-02 2011-12-08 Cbs Interactive Inc. System and method for clustering content according to similarity
US20120321143A1 (en) * 2011-06-17 2012-12-20 Microsoft Corporation Broadcast Identifier Enhanced Facial Recognition of Images
US20140226861A1 (en) * 2011-10-31 2014-08-14 Tong Zhang Person-based Video Summarization by Tracking and Clustering Temporal Face Sequences
US20130148898A1 (en) * 2011-12-09 2013-06-13 Viewdle Inc. Clustering objects detected in video

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
H. Wang et al., "FaceTrack: Tracking and Summarizing Faces from Compressed Video", SPIE Photonics East, Conference on Multimedia Storage and Archiving Systems, pp. 1-12, 1999. *
M. Hirzer, P. Roth, M. Kostinger and H. Bischof, "Relaxed pairwise learned metric for person re-identification", Proc. IEEE Eur. Conf. Comput. Vision, pp. 780-793, 2012. *
T. Moeslund, A. Hilton and V. Kruger, "A survey of advances in vision-based human motion capture and analysis", Computer Vision and Image Understanding, vol. 104, pp. 90-126, 2006. *
Y. Chang, R. Yan, D. Chen and J. Yang, "People identification with limited labels in privacy-protected video", Proc. IEEE Int. Conf. Multimedia and Expo, pp. 1005-1008, July 2006. *
R. Yan et al., "A Discriminative Learning Framework with Pairwise Constraints for Video Object Classification", IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-8, 2004. *
R. Yan, J. Zhang, J. Yang and A. G. Hauptmann, "A discriminative learning framework with pairwise constraints for video object classification", Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 284-291, 2004. *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160140386A1 (en) * 2011-11-29 2016-05-19 General Electric Company System and method for tracking and recognizing people
US9798923B2 (en) * 2011-11-29 2017-10-24 General Electric Company System and method for tracking and recognizing people
WO2015089040A3 (en) * 2013-12-12 2015-11-12 Evernote Corporation User discovery via digital id and face recognition
US9773162B2 (en) 2013-12-12 2017-09-26 Evernote Corporation User discovery via digital ID and face recognition
EP2927290A1 (en) 2014-03-31 2015-10-07 Shin-Etsu Chemical Co., Ltd. Fluorochemical coating composition and article treated therewith
JP2016174252A (en) * 2015-03-16 2016-09-29 キヤノン株式会社 Image processing apparatus, image processing system, image processing method and computer program
CN107431786A (en) * 2015-03-16 2017-12-01 佳能株式会社 Image processing equipment, image processing system, image processing method and computer program
EP3272117A4 (en) * 2015-03-16 2018-12-12 C/o Canon Kabushiki Kaisha Image processing apparatus, image processing system, method for image processing, and computer program
US10572736B2 (en) 2015-03-16 2020-02-25 Canon Kabushiki Kaisha Image processing apparatus, image processing system, method for image processing, and computer program
US11205103B2 (en) 2016-12-09 2021-12-21 The Research Foundation for the State University Semisupervised autoencoder for sentiment analysis

Also Published As

Publication number Publication date
US9798923B2 (en) 2017-10-24
WO2013081750A1 (en) 2013-06-06
US20160140386A1 (en) 2016-05-19
EP2786313A1 (en) 2014-10-08

Similar Documents

Publication Publication Date Title
US9798923B2 (en) System and method for tracking and recognizing people
US10248860B2 (en) System and method for object re-identification
CN108470332B (en) Multi-target tracking method and device
Braham et al. Deep background subtraction with scene-specific convolutional neural networks
US10402655B2 (en) System and method for visual event description and event analysis
Wu et al. Chaotic invariants of lagrangian particle trajectories for anomaly detection in crowded scenes
US8050453B2 (en) Robust object tracking system
Kumar et al. An intelligent decision computing paradigm for crowd monitoring in the smart city
US20120219186A1 (en) Continuous Linear Dynamic Systems
KR20160096460A (en) Recognition system based on deep learning including a plurality of classfier and control method thereof
JP2015167017A (en) Self-learning object detectors for unlabeled videos using multi-task learning
CN110175528B (en) Human body tracking method and device, computer equipment and readable medium
Serpush et al. Complex human action recognition using a hierarchical feature reduction and deep learning-based method
WO2009152509A1 (en) Method and system for crowd segmentation
CN108038515A (en) Unsupervised multi-target detection tracking and its storage device and camera device
Acharya et al. Real-time detection and tracking of pedestrians in CCTV images using a deep convolutional neural network
Werner et al. DeepMoVIPS: Visual indoor positioning using transfer learning
Farnoosh et al. DeepPBM: deep probabilistic background model estimation from video sequences
CN107194950B (en) Multi-person tracking method based on slow feature analysis
Saini et al. An efficient approach for trajectory classification using FCM and SVM
Serpush et al. Complex human action recognition in live videos using hybrid FR-DL method
Wu et al. Simultaneous eye tracking and blink detection with interactive particle filters
Rahimi et al. Uav sensor fusion with latent-dynamic conditional random fields in coronal plane estimation
Antic et al. Less is more: Video trimming for action recognition
Suresh et al. Online learning neural tracker

Legal Events

Date Code Title Description
AS Assignment

Owner name: GENERAL ELECTRIC COMPANY, NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:YU, TING;TU, PETER HENRY;GAO, DASHAN;AND OTHERS;SIGNING DATES FROM 20111128 TO 20111129;REEL/FRAME:027297/0472

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION