WO1993018483A1 - Method and apparatus for image recognition - Google Patents

Method and apparatus for image recognition

Info

Publication number
WO1993018483A1
Authority
WO
WIPO (PCT)
Prior art keywords
level
hidden markov
dimensional
state
pixel
Prior art date
Application number
PCT/US1993/001843
Other languages
French (fr)
Inventor
Esther Levin
Roberto Pieraccini
Original Assignee
American Telephone And Telegraph Company
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by American Telephone And Telegraph Company filed Critical American Telephone And Telegraph Company
Publication of WO1993018483A1 publication Critical patent/WO1993018483A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G06F18/295 Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models

Definitions

  • a state sequence S defines a mapping from the observation time scale, 1 ≤ t ≤ T, to the active state at time t, 1 ≤ s(t) ≤ T_R, which corresponds to the reference time scale t̂ in the DTW approach.
  • the first term in (35) provides a distortion measure, as in (2).
  • A particular case of this model, called a left-to-right HMM, is especially useful for speech modeling and recognition; its admissible state sequences satisfy conditions equivalent to equations (3) and (4) of DTW.
  • the minimization (35) is, in effect, performed only among those state sequences that correspond to mappings satisfying conditions that are equivalent to (3) and (4).
  • the only difference between the minimization problem defined by (2), (3) and (4) and this one is the non-zero penalty term in (35).
  • the optimality principle can be applied to the minimization (35) in a manner similar to DTW as described in section 2.1.2.
  • Each state in s is a stochastic source characterized by its probability density over the space of observations g ∈ G. It is convenient to think of the states of the model as being located on a rectangular lattice, L_{X_R,Y_R}.
  • the first term of (38) generalizes the distortion measure D of DPW; the second term, C, generalizes constraints (11), (12), and (23). In particular, by restricting the PHMM parameter values as in (40), the active state matrix S that minimizes (38) must satisfy conditions equivalent to (11), (12a) and (12c).
  • the PHMM constrained by (40) can be referred to as the left-to-right bottom-up PHMM, since it doesn't allow for "foldovers" in the state images.
  • the other boundary conditions (12b) and (12d) can be imposed on S by restricting the values of s(x,Y), 1 ≤ x ≤ X, and s(X,y), 1 ≤ y ≤ Y.
  • the number of subsets, N_G, should be polynomial in the dimensions of the model, X_R and Y_R.
  • the probabilities {A_{(i,j),(k,l),(m,n)}} should satisfy the two following constraints with respect to such grouping: A_{(i,j),(k,l),(m,n)} ≠ 0 only if there exists p, 1 ≤ p ≤ N_G, such that (i,j), (m,n) ∈ γ_p. (42)
  • Condition (42) means that the left neighbor of the state (m,n) in the state matrix S must be a member of the same group γ_p as (m,n).
  • the second constraint, (43), makes the penalty term C independent of the horizontal warping.
  • each subset γ_p of the PHMM can be considered as a one-dimensional HMM comprising the states (x, y_p), 1 ≤ x ≤ X_R, 1 ≤ p ≤ Y_R, with transition probabilities constrained by (42) and (43).
  • constraints (42) and (43) can be trivially changed by applying a coordinate transformation.
  • the PHMM approach was tested on a writer-independent isolated handwritten digit recognition application.
  • the data we used in our experiments was collected from 12 subjects (6 for training and 6 for test). The subjects were each asked to write 10 samples of each digit. Each sample was written in a fixed-size box, therefore the samples were naturally size-normalized and centered.
  • Figure 11 shows the 100 samples written by one of the subjects.
  • Each sample in the database was represented by a 16×16 binary image.
  • Each character class (digit) was represented by a single PHMM, satisfying (49) and (50).
  • Each PHMM had a strictly left-to-right bottom-up structure, where the state matrix S was restricted to contain every state of the model, i.e., states could not be skipped. All models had the same number of states.
  • Each state was represented by its own binary probability distribution, i.e., the probability of a pixel being 1 (black) or 0 (white).
  • Each iteration of the algorithm consisted of two stages: first, the samples were aligned with the corresponding model by finding the best state matrix S. Then, a new frequency count for each state was used to update P_i(1), according to the obtained alignment.
  • the recognition was performed as explained in section 3: the test sample was assigned to the class k for which the likelihood was maximal.
  • Figure 12 shows three sets of models with different numbers of states.
  • the (6×6) state models have a very coarse representation of the digits, because the number of states is so small.
  • the (10×10) state models appear much sharper than the (16×16) state models, due to their ability to align the training samples.
  • a statistical model (the planar hidden Markov model - PHMM) was developed to provide a probabilistic formulation to the planar warping problem.
  • This model, on one hand, generalizes the single-dimensional HMM to the planar case, and on the other extends the DPW approach.
  • the restricted formulation of the warping problem corresponds to PHMM with constrained transition probabilities.
  • the PHMM approach was tested on an isolated, hand-written digit recognition application, yielding 95% digit recognition accuracy. Further analysis of the results indicates that even in a simple case of isolated characters, the elimination of planar distortions enhances recognition performance significantly. We expect that the advantage of this approach will be even more valuable in harder tasks, such as cursive writing recognition/spotting, for which an effective solution using the currently available techniques has not yet been found.
  • Figure 1 Time-time grid. Abscissa: test time scale, 1 ≤ t ≤ T. Ordinate: reference time scale, 1 ≤ t̂ ≤ T_R. Any monotonically increasing curve connecting point A to point B corresponds to a mapping f ∈ f.
  • Figure 2 Example of warping problem.
  • G_R is a 2×2 reference image, and G is a 3×3 test image. Inside each pixel are shown its (x,y) coordinates. The value of the image g(x,y) is encoded by texture, as shown.
  • Figure 3 Illustration of the definitions of Θ, Φ, and Λ for the example of figure 2.
  • Figure 4 Illustration of the two-dimensional warping algorithm on the example of figure 2.
  • the table shows the values of D i,n for 1 ⁇ i ⁇ 16 and 1 ⁇ n ⁇ 3, calculated according to the DPW algorithm.
  • Figure 5 Illustration of the constrained DPW algorithm for the example of figure 2.
  • the table shows the values of D̃_{k,n} for 1 ≤ k ≤ 2 and 1 ≤ n ≤ 3. In this case the obtained solution is the same as in figure 4.
  • Figure 6 Example of a test image G for which the optimal mapping obtained according to the general DPW formulation differs from the one obtained according to the restricted formulation.
  • Figure 7 Illustration of the planar Markov property. The probability of a state in the light grey pixel given the states of all the dark grey pixels in (a) equals the probability of a state in the light grey pixel given the states of only two dark pixels in (b).
  • Figure 8 Two groupings of the 4×4 PHMM states into subsets.
  • Figure 9 Equivalent representation of constrained PHMM, for the grouping of figure 8a.
  • Figure 10 Illustration of the algorithm for the case of figure 8a.
  • Figure 11 The 100 samples of the digits from one subject.
  • Figure 12 The digit models obtained by training, for different numbers of states.

Abstract

A method for image recognition is provided which involves storing a plurality of two-dimensional hidden Markov models (60) each such model comprising a one-dimensional shape-level hidden Markov model comprising one or more shape-level states, each shape-level state comprising a one-dimensional pixel-level hidden Markov model comprising one or more pixel-level states. An image is scanned to produce one or more sequences of pixels. For a stored two-dimensional hidden Markov model, local Viterbi scores for a plurality of pixel-level hidden Markov models are determined for each sequence of pixels (6). A global Viterbi score of a shape-level hidden Markov model is determined based on a plurality of local Viterbi scores and the sequences of pixels. The scanned image is recognized based on one or more global Viterbi scores.

Description

METHOD AND APPARATUS FOR IMAGE RECOGNITION
Field of the Invention
The present invention relates generally to the field of image recognition, and specifically to pattern based image recognition.
Background of the Invention
Signal recognition systems operate to label, classify, or otherwise recognize an unknown signal. Signal recognition may be performed by comparing characteristics or features of unknown signals to those of known signals.
Features or characteristics of known signals are determined by a process known as training. Through training, one or more samples of known signals are examined and their features or characteristics recorded as reference patterns in a database of a signal recognizer.
To recognize an unknown signal, a signal recognizer extracts features from the signal to characterize it. The features of the unknown signal are referred to as the test pattern. The recognizer then compares each reference pattern in the database to the test pattern of the unknown signal. A scoring technique is used to provide a relative measure of how well each reference pattern matches the test pattern. The unknown signal is recognized as the reference pattern which most closely matches the unknown signal.
There are many types of signal recognizers, e.g., template-based recognizers and hidden Markov model (HMM) recognizers. Template-based recognizers are trained using first-order statistics based on known signal samples (e.g., spectral means of such samples) to build reference patterns. Typically, scoring is accomplished with a time registration technique, such as dynamic time warping (DTW). DTW provides an optimal time alignment between reference and test patterns by locally shrinking or expanding the time axis of one pattern until that pattern optimally matches the other. DTW scoring reflects an overall distance between two optimally aligned reference and test patterns. The reference pattern having the lowest score (i.e., the shortest distance between itself and the test pattern) identifies the test pattern.
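For illustration, the following Python sketch implements a common form of DTW-based template matching; the three-way local transition rule, the scalar features, and all names are assumptions of this example rather than the patent's method (the Appendix below derives a slightly different recursion):

```python
import numpy as np

def dtw_distance(ref, test):
    """DTW score: minimal accumulated distance between two feature
    sequences, with monotonic alignment and aligned endpoints."""
    T_r, T_t = len(ref), len(test)
    D = np.full((T_r, T_t), np.inf)
    D[0, 0] = abs(ref[0] - test[0])
    for i in range(T_r):
        for j in range(T_t):
            if i == 0 and j == 0:
                continue
            D[i, j] = min(
                D[i - 1, j] if i > 0 else np.inf,                # reference advances
                D[i, j - 1] if j > 0 else np.inf,                # test advances
                D[i - 1, j - 1] if i > 0 and j > 0 else np.inf,  # both advance
            ) + abs(ref[i] - test[j])
    return D[-1, -1]

# The reference pattern with the lowest score identifies the test pattern.
templates = {"A": [1.0, 2.0, 3.0], "B": [3.0, 2.0, 1.0]}
test = [1.0, 1.1, 2.1, 2.9]
print(min(templates, key=lambda k: dtw_distance(templates[k], test)))  # -> "A"
```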
HMM recognizers are trained using both first and second order statistics (i.e., means and variances) of known signal samples to build reference patterns. Each reference pattern is an N-state statistical model incorporating these means and variances. An HMM is characterized by a state transition matrix, A (which provides a statistical description of how new states may be reached from old states), and an observation probability matrix, B (which provides a description of which spectral features are likely to be observed at a given state). Scoring of a test pattern reflects the probability of the sequence of features in the pattern given a model (i.e., given a reference pattern). Scoring across all models may be provided by conventional dynamic programming techniques, such as Viterbi scoring well known in the art. The HMM which indicates the highest probability of the sequence of features in the test pattern identifies the test pattern.
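As a hedged illustration of Viterbi scoring against an HMM reference pattern (discrete observations; the toy two-state model and all names are invented for this example):

```python
import numpy as np

def viterbi_log_score(A, B, pi, obs):
    """Log-probability of the best state sequence for an observation
    sequence, given transitions A (N x N), observation probabilities
    B (N x K) and initial state distribution pi (N)."""
    logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    score = logpi + logB[:, obs[0]]          # best score ending in each state
    for o in obs[1:]:
        score = np.max(score[:, None] + logA, axis=0) + logB[:, o]
    return float(np.max(score))

A  = np.array([[0.9, 0.1], [0.1, 0.9]])      # state transition matrix
B  = np.array([[0.8, 0.2], [0.3, 0.7]])      # P(symbol | state)
pi = np.array([0.9, 0.1])
# The model giving the highest score identifies the test pattern.
print(viterbi_log_score(A, B, pi, [0, 0, 1, 1]))
```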
Pattern-based signal recognition techniques, such as DTW and HMMs, have been applied in the past to the one-dimensional problem of speech recognition, where unknown signals to be recognized are speech signals and the one dimension is time. It has been a problem of some interest to provide for multi-dimensional signals, such as two-dimensional image signals, a set of general tools analogous to those available for one-dimensional signal recognition.
Summary of the Invention
The present invention provides a method and apparatus for multi-dimensional signal recognition. The invention accomplishes recognition through multi-dimensional reference pattern scoring techniques.
An illustrative embodiment of the present invention provides a two-dimensional image recognizer for optical character recognition. The recognizer is based on planar hidden Markov models (PHMMs) with constrained transition probabilities. Each PHMM comprises a one-dimensional shape-level hidden Markov model and represents a single image reference pattern. A shape-level HMM comprises one or more pixel-level hidden Markov models, each of which represents a localized portion of a shape-level HMM. The embodiment operates to determine, for a given PHMM and a given sequence of pixels in an unknown character image, a local Viterbi score for each of one or more pixel-level HMMs in a shape-level HMM. Furthermore, the embodiment operates to determine a global Viterbi score for a shape-level HMM based on the plurality of local Viterbi scores. Character images are recognized based on the global Viterbi scores. A global Viterbi score is provided for each PHMM (i.e., each shape-level HMM) reference pattern.
Brief Description of the Drawings
Figure 1 presents illustrative groupings of pixel-level hidden Markov model states.
Figure 2 presents the illustrative groupings of pixel-level hidden Markov model states from Figure 1 associated with the shape-level states of a shape-level hidden Markov model.
Figure 3 presents a shape-level hidden Markov model comprising the shape-level states presented in Figure 2.
Figure 4 presents an illustrative optical character recognition system according to the present invention.
Figure 5 presents components of the two-dimensional pattern matcher presented in Figure 4.
Figure 6 presents an image of a scanned character, T, comprising a plurality of linear pixel sequences.
Detailed Description
Introduction
An illustrative optical character recognition system according to the present invention includes a plurality of two-dimensional (or planar) hidden Markov models to represent images to be recognized. Each planar hidden Markov model is defined by:
i. a set of pixel-level states:
S = {s(x,y)}, x = 1, . . . , X, y = 1, . . . , Y;
ii. a set of transition probabilities:
$$A_{(i,j),(k,l),(m,n)} \equiv P\big(s(x,y) = (m,n) \,\big|\, s(x-1,y) = (i,j),\ s(x,y-1) = (k,l)\big)\,, \qquad (1)$$

where x and y are abscissa and ordinate, respectively, in a conventional two-dimensional coordinate system; and
iii. a set of observation probability densities B(x,y), one for each state s(x,y).
Each two-dimensional hidden Markov model may be represented as a set of shape-level states, $G = \{G_j\},\ j = 1, \ldots, N_G$, where each shape-level state, $G_j$, corresponds to a particular grouping of one or more pixel-level states, S. According to the principles expressed in the Appendix hereto, these groupings of pixel-level states should satisfy the following conditions:
a. The number of groups of shape-level states, $N_G$, is a polynomial function of the number of pixel-level states, X×Y.
b. The union of all groups of shape-level states, $G = \cup_j G_j$, coincides with the set of pixel-level states, S.
With respect to the groups of shape-level states, the transition probabilities should fulfill the two following conditions:
c. $A_{(i,j),(k,l),(m,n)} \neq 0$ only if there exists p, $1 \le p \le N_G$, such that $(i,j), (m,n) \in G_p$; and (2)

d. $A_{(i,j),(k,l),(m,n)} = A_{(i_1,j_1),(k_1,l_1),(m_1,n_1)}$ (3)

if there exists p, $1 \le p \le N_G$, such that $(i,j), (i_1,j_1), (m,n), (m_1,n_1) \in G_p$, and there exists r, $1 \le r \le N_G$, such that $(k,l), (k_1,l_1) \in G_r$.
An example of the application of these conditions (a-d) is presented in Figures 1-3. In Figure 1, seven shape-level states, G_1 to G_7, are shown with reference to a 4×4 matrix of pixel-level states. As shown in Figure 2, each shape-level state, G_j, corresponds to a one-dimensional pixel-level hidden Markov model comprising four pixel-level states. Moreover, each shape-level state, G_j, is but one state in a shape-level hidden Markov model, as shown in Figure 3. The arrows between states in the HMM of Figure 3 indicate legal state transitions within the constraints of conditions c and d, above.
The transition probabilities among the pixel-level and shape-level states are derived from $A_{(i,j),(k,l),(m,n)}$. When conditions c and d hold for a particular grouping, the transition probability $A_{(i,j),(k,l),(m,n)}$ can be represented as:

$$A_{(i,j),(k,l),(m,n)} = a^{p}_{(i,j),(m,n)} \cdot \alpha_{rp}\,, \qquad (4)$$

where

$$a^{p}_{(i,j),(m,n)} = P\big(s(x,y) = (m,n) \,\big|\, s(x-1,y) = (i,j)\big)\,,\quad (i,j),(m,n) \in G_p\,, \qquad (5)$$

and

$$\alpha_{rp} = P\big(s(x,y) \in G_p \,\big|\, s(x,y-1) \in G_r\big)\,. \qquad (6)$$

Hence, (5) defines the transition probabilities between pixel-level states in a one-dimensional pixel-level HMM (such as, e.g., any of those appearing in Figure 2), and (6) defines the transition probabilities between shape-level states in a one-dimensional shape-level HMM. By virtue of (i) the nesting of pixel-level HMMs in a shape-level HMM, and (ii) conditions c and d specified above, a general two-dimensional (or planar) HMM for use in image recognition is provided. Note that the pixel-level state observation probabilities are not affected by the grouping of states.
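To make the factorization (4)-(6) concrete, here is a small hedged sketch; the dictionaries, the toy grouping, and all numeric values are invented for illustration and are not taken from the patent:

```python
# Sketch of decomposition (4): the full transition probability
# A[(i,j),(k,l),(m,n)] is the product of a within-group horizontal
# transition a_p (eq. 5) and a group-to-group vertical transition
# alpha[r -> p] (eq. 6), with (i,j),(m,n) in group G_p and (k,l) in G_r.

group_of = {                      # pixel-level state -> shape-level group index
    (1, 1): 1, (2, 1): 1,         # toy grouping: G_1 holds two states of row 1
    (1, 2): 2, (2, 2): 2,         # G_2 holds two states of row 2
}
a = {                             # eq. (5): (group p, (from, to)) -> probability
    (1, ((1, 1), (2, 1))): 0.8,
    (1, ((1, 1), (1, 1))): 0.2,
    (2, ((1, 2), (2, 2))): 0.7,
    (2, ((1, 2), (1, 2))): 0.3,
}
alpha = {(1, 1): 0.5, (1, 2): 0.5, (2, 2): 1.0}   # eq. (6): group r -> group p

def A(ij, kl, mn):
    """Full transition probability assembled per eq. (4); zero when
    condition c (left neighbor in the same group) is violated."""
    p, r = group_of[mn], group_of[kl]
    if group_of[ij] != p:         # condition c: (i,j), (m,n) share group G_p
        return 0.0
    return a.get((p, (ij, mn)), 0.0) * alpha.get((r, p), 0.0)

print(A((1, 2), (1, 1), (2, 2)))  # -> 0.35: step inside G_2, below-neighbor in G_1
```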
An Illustrative Embodiment
For clarity of explanation, the illustrative embodiment of the present invention is presented as comprising individual functional blocks (including functional blocks labeled as "processors"). The functions these blocks represent may be provided through the use of either shared or dedicated hardware, including, but not limited to, hardware capable of executing software. (Use of the term "processor" should not be construed to refer exclusively to hardware capable of executing software.) Illustrative embodiments may comprise digital signal processor (DSP) hardware, such as the AT&T DSP 16 or DSP32C, and software performing the operations discussed below. Very large scale integration (VLSI) hardware embodiments of the present invention, as well as hybrid DSP/VLSI embodiments, may also be provided.
Figure 4 presents an illustrative optical character recognition system according to the present invention. The system comprises a conventional image scanner 10, a two-dimensional pattern matcher 20, control switches R and T, a decision processor 30, a state image memory 35, a probability estimation processor 45, and a planar hidden Markov model memory 40.
The conventional image scanner 10 receives a physical image of a character and scans it to generate as output a matrix signal, g(x,y). This signal represents the intensity of the physical image at each pixel location, x,y, within the image. PHMMs, developed through a training process discussed below, are stored in the PHMM memory 40. Each PHMM in memory 40 represents a character to be recognized in an optical character application.
The matrix signal for the image, g(x,y), is processed by the two-dimensional pattern matcher 20 to generate, for each PHMM, a global Viterbi score, l, resulting from the comparison of the PHMM and the signal g(x,y). A state image, s(x,y), is also generated to represent the index of the PHMM state corresponding to pixel x,y.
The two-dimensional pattern matcher 20 is presented in Figure 5.
Pattern matcher 20 comprises a windowing processor 5, a pixel- level Viterbi processor 6, a local-level score memory 7, and a shape-level Viterbi processor 8.
The windowing processor 5 receives the matrix signal, g(x,y), and extracts therefrom successive sequences of pixels, L1 , L2, . . . , LM. As shown illustratively in the example of Figure 6, these sequences may be linear sequences of pixels.
The pixel-level Viterbi processor 6 determines, for each pixel sequence L_i and each group G_j (comprising a pixel-level HMM), a local state score d_ij. This is done by computing the Viterbi score of the linear sequence of pixels, L_i, with the pixel-level linear HMM, G_j. An N_G × M matrix of the local-level state scores is stored in memory 7.
The shape-level Viterbi processor 8 computes a global score for a given PHMM as the Viterbi score of a linear shape-level hidden Markov model, using the sequence L_i as the observation sequence and d_ij as the local state score for each shape-level state G_j and each observation L_i. Also, the state image, s(x,y), is computed using conventional backtracking methods for hidden Markov models.
The operations performed by the two-dimensional pattern matcher 20 are repeated for each PHMM in the PHMM memory 40. In recognition mode (i.e., when switch R is closed and switch T is open), the decision processor 30 recognizes the scanned image as the character corresponding to the PHMM with the highest score, l_h.
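The two-stage scoring just described can be sketched compactly. Everything below (the model encoding as left-to-right transition matrices with Bernoulli pixel probabilities, starting in the first state, and all names) is an illustrative assumption, not the patent's implementation:

```python
import numpy as np

def viterbi_log(A, logB):
    """Viterbi log-score. A: S x S transitions; logB: S x T per-state
    observation log-probabilities; the path starts in state 0."""
    logA = np.log(A + 1e-12)
    score = np.full(A.shape[0], -np.inf)
    score[0] = logB[0, 0]
    for t in range(1, logB.shape[1]):
        score = np.max(score[:, None] + logA, axis=0) + logB[:, t]
    return float(score.max())

def phmm_global_score(rows, shape_A, pixel_models):
    """Local score d_ij for every row L_i and pixel-level HMM G_j,
    then a shape-level Viterbi pass over the matrix of d_ij."""
    d = np.empty((len(pixel_models), len(rows)))        # N_G x M (memory 7)
    for j, (A_j, p_black) in enumerate(pixel_models):
        for i, row in enumerate(rows):
            obs = np.where(row == 1, p_black[:, None], 1 - p_black[:, None])
            d[j, i] = viterbi_log(A_j, np.log(obs + 1e-12))
    return viterbi_log(shape_A, d)                      # global Viterbi score

A2 = np.array([[0.6, 0.4], [0.0, 1.0]])                 # 2-state left-to-right
models = [(A2, np.array([0.9, 0.1])), (A2, np.array([0.1, 0.9]))]
shape_A = np.array([[0.5, 0.5], [0.0, 1.0]])
rows = [np.array([1, 1, 0, 0]), np.array([0, 0, 1, 1])]
print(phmm_global_score(rows, shape_A, models))
```

A recognizer would evaluate such a global score once per stored PHMM and report the character whose model scores highest.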
In training mode, switch T is closed and switch R is open. The training mode operation of the embodiment involves conventional Viterbi training of a linear hidden Markov model. Known samples of all characters to be recognized are provided sequentially as input to scanner 10. For each such sample of a given character, a state image s(x,y) is determined by the two-dimensional pattern matcher 20 as described above, using only the PHMM corresponding to the known sample. All known samples for the character are processed in this fashion, with each state image s(x,y) stored in state image memory 35. Once all such samples for a character are processed and the resulting state images are stored, the probability estimation processor 45 estimates new transition and observation probabilities for the PHMM (as frequency counts) in conventional fashion, taking into account the conditions c and d described above for the state transition probabilities.
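The re-estimation step can be sketched as follows; this illustrative fragment re-estimates only the per-state observation probabilities as frequency counts from the stored state images (binary pixels are assumed, as in the experiments reported in the Appendix), and the transition probabilities would be counted analogously, subject to conditions c and d:

```python
import numpy as np

def reestimate_observation_probs(state_images, images, n_states, eps=1e-3):
    """Estimate P_i(black) for each state i as the fraction of black
    pixels among all pixels aligned to state i during matching."""
    black = np.zeros(n_states)
    total = np.zeros(n_states)
    for s_img, g_img in zip(state_images, images):      # one pair per sample
        for state, pixel in zip(s_img.ravel(), g_img.ravel()):
            black[state] += pixel                       # pixel is 0 or 1
            total[state] += 1
    return (black + eps) / (total + 2 * eps)            # smoothed P_i(1)

# Toy usage: two 2x2 samples, each aligned to a 4-state model
imgs   = [np.array([[1, 0], [1, 1]]), np.array([[1, 0], [0, 1]])]
states = [np.array([[0, 1], [2, 3]]), np.array([[0, 1], [2, 3]])]
print(reestimate_observation_probs(states, imgs, n_states=4))
```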
APPENDIX

1. Introduction
In this appendix we extend the dynamic time warping (DTW) algorithm, widely used in automatic speech recognition (ASR), to a dynamic plane warping (DPW) algorithm, for applications in the field of optical character recognition (OCR) or similar applications.
This appendix is written from the point of view of a "speech-researcher"; i.e., we start with the description of the single-dimensional case and then extend it to two dimensions in order to point out the similarity and the differences between the two algorithms. No previous knowledge about speech recognition is assumed.
In the next section we first discuss the general template matching approach to pattern recognition and show the role of DTW or DPW algorithms in this paradigm. Then we describe the single-dimensional warping, or time alignment, problem, and show how the DTW algorithm solves the problem for template-based systems in polynomial time using a general principle of optimality. The two-dimensional warping problem is defined in section 2.2, and its general solution using the same optimality principle is presented. Although the application of the optimality principle in this case reduces the computational complexity of planar warping, the complexity still remains exponential in the dimensions of the image. We show that by restricting the original warping problem, by limiting the class of possible distortions somewhat, we can reduce the computational complexity dramatically, and find the optimal solution to the restricted problem in polynomial time. This approach differs from the one taken in references [1] and [2], where instead of restricting the problem, a suboptimal solution to the general problem was found. In section 3, the statistical modeling approach to pattern recognition is described. In section 3.1, we discuss statistical modeling of temporal signals using HMMs, and show how this approach is more general than, but still similar to, DTW. In section 3.2, we introduce the planar hidden Markov model (PHMM) that, on one hand, extends the HMM concept to model images and, on the other hand, generalizes the DPW approach. We show that the restricted formulation of the planar warping problem of section 2.2.3 is equivalent to zeroing some transition probabilities in the PHMM. In section 4, experimental results of isolated hand-written digit recognition experiments are presented. The results indicate that even in the simple case of isolated characters, the elimination of planar distortions enhances the performance significantly. We anticipate that the advantage of this approach will be even more prominent in harder tasks, such as cursive writing recognition/spotting, that involve some of the above-mentioned problems. The major ideas of this appendix are summarized in section 5.
2. Template Matching Approach to Pattern Recognition
The task of pattern recognition is that of classifying a set of measured patterns (e.g., acoustic signals, pixel map images, etc.) into a finite set $C = \{C_1, \ldots, C_N\}$ of distinct classes representing spoken words or phonemes in the case of speech recognition, and written words or characters in the OCR task. Template matching is one of the many possible ways to solve this problem. According to this approach, each class is represented by a template (a reference pattern), and a new pattern is classified by selecting the class $C_k$ for which the distance $D_k$ between the new pattern and the class representative template is minimal, i.e.,

$$k = \arg\min_{1 \le n \le N} D_n\,. \qquad (1)$$
The difficulty in the pattern recognition task arises because of the intra-class variability of the patterns. Methods have to be developed to reduce such variability, thereby building up some invariance properties for the classifier. This intra-class variability is sometimes caused by non-linear distortions during the generation process of the patterns. In speech recognition this problem is known as the 'time alignment' problem, and its source is the temporal variability of the spoken utterances. The DTW procedure described below attempts to reduce the magnitude of this problem. The purpose of the procedure is to time-align the test and the reference patterns by stretching and contracting the test pattern to optimally match it to the reference, minimizing a measure of the spectral distance $D_k$ between the time-aligned patterns and thereby compensating for temporal distortions.
The problem of intra-class variability also arises in optical character recognition due to non-linear, non-uniform elastic distortions (i.e., stretching, contracting) of the hand-written characters. In this appendix we show how to address this problem by generalizing the DTW procedure for planar alignment of images.
2.1 Matching Temporal Signals
2.1.1 One-Dimensional Problem Formulation

The DTW algorithm is a procedure that was developed for optimally aligning two temporal signals: the reference or template signal representing the k-th class,

$$G^R_k = \{\,g^R_k(t):\ 1 \le t \le T_R,\ t \in Z^+,\ g^R_k(\cdot) \in \mathbf{G} \subset R^n\,\}\,,$$

and the test signal to be classified,

$$G = \{\,g(t):\ 1 \le t \le T,\ t \in Z^+,\ g(\cdot) \in \mathbf{G} \subset R^n\,\}\,.$$

Here $Z^+$ is the set of positive integers, and $R^n$ is the n-dimensional real space. The goal of DTW is to find a mapping function

$$f:\ \{1, \ldots, T\} \to \{1, \ldots, T_R\}\,,\qquad \hat t = f(t)\,,$$

that maps the test time scale to the reference time scale, such that the distortion

$$D = \sum_{t=1}^{T} d\big(g^R(f(t)),\, g(t)\big) \qquad (2)$$

between the aligned patterns is minimal, where d(·,·) is a defined local distance measure in $\mathbf{G}$. For simplicity of notation we omit the class index k hereafter. The mapping function is constrained by global constraints, such as the boundary conditions,

$$f(1) = 1\,,\qquad f(T) = T_R\,, \qquad (3)$$

where we assume that the beginnings and the ends of the two patterns line up, and local monotonicity constraints, such as,
$$\Delta f = f(t+1) - f(t) \ge 0\,, \qquad (4)$$

that prevent the mapping from "folding backwards" in time. We denote by $\mathbf{f}$ the set of all mapping functions that satisfy (3) and (4). Constraints (3) and (4) are typical, but not unique. Since the treatment of other kinds of global and local constraints is similar, we continue with the problem defined by (3) and (4) only.
2.1.2 The Procedure

The problem of finding the optimal mapping has an exponential complexity, since there are $O(T_R^{\,T})$ possible mappings in $\mathbf{f}$. These mappings are shown as a set of paths in a time-time grid (Fig. 1), where each path is a monotonically increasing curve that starts at point $A = (1, 1)$ and ends at point $B = (T, T_R)$. The DTW algorithm finds the optimal alignment curve among all possible paths in polynomial time, using the dynamic programming optimality principle. [3] The optimality principle is based on the fact that the optimal alignment curve (i.e., the one with the minimal distortion along the path) connecting point A to point B through point C is found among all curves that optimally connect A and C. This basic principle leads to an efficient iterative procedure for finding the optimal curve connecting A and B.
In the n-th step of the procedure, $2 \le n \le T$, we assume that the optimal warping of the (n-1)-th interval of the test signal, $g(t),\ 1 \le t \le n-1$, to the i-th interval of the reference signal, $g^R(\hat t),\ 1 \le \hat t \le i$, is known for all $1 \le i \le T_R$. Each optimal warping is defined by a mapping $f_{i,n-1}$ and a distortion $D_{i,n-1}$, such that

$$D_{i,n-1} = \min_{f \in \mathbf{f}_{i,n-1}} \sum_{t=1}^{n-1} d\big(g^R(f(t)),\, g(t)\big)\,. \qquad (5)$$

Here we denote by $\mathbf{f}_{i,n}$, $1 \le i \le T_R$, $1 \le n \le T$, the set of all mapping functions from an interval $1 \le t \le n$ to an interval $1 \le \hat t \le i$, satisfying the local monotonicity conditions (4) on their domain and the global boundary conditions: for all $f \in \mathbf{f}_{i,n}$,

$$1 = f(1)\,;\qquad i = f(n)\,. \qquad (6)$$

It is clear that $\mathbf{f} = \mathbf{f}_{T_R,T}$. The warping $f_{i,n-1}$ corresponds to a curve in the time-time grid that optimally connects point A to the point $(t = n-1,\ \hat t = i)$.
At this stage we can find the optimal warping of the n-th interval of the test signal to the i-th interval of the reference signal, namely $f_{i,n}$ and $D_{i,n}$, for all $1 \le i \le T_R$:
$$D_{i,n} = \min_{1 \le j \le i} \big[D_{j,n-1}\big] + d\big(g^R(i),\, g(n)\big)\,, \qquad (7a)$$

and

$$f_{i,n}(t) = \begin{cases} f_{j,n-1}(t)\,, & 1 \le t \le n-1 \\ i\,, & t = n\,, \end{cases} \qquad (7b)$$

where j is the argument minimizing (7a). Note that the range of minimization over j, constrained to the interval $1 \le j \le i$, guarantees the satisfaction of the monotonicity constraint (4).

The procedure is initialized for n = 1 by setting

$$D_{1,1} = d\big(g^R(1),\, g(1)\big)\,;\qquad D_{i,1} = \infty\,,\ i > 1\,, \qquad (8)$$

and is terminated when n = T. This initialization assures that the optimal curve ending at any point in the grid (including point B) does start at point A, according to the global constraints of (3). Therefore, the optimal curve connecting point A to point B is found after T iterations, each requiring on the order of $T_R$ operations described by (7), so that the total computational cost is $O(T\,T_R)$.
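As a check, the recursion (7a) and the initialization translate almost line for line into code; scalar-valued signals and a squared local distance are assumptions of this sketch:

```python
import numpy as np

def dtw(ref, test):
    """DTW per (7)-(8): D[i, n] is the distortion of optimally warping
    the first n+1 test samples onto the first i+1 reference samples;
    the minimization over j runs over j <= i, as in (7a)."""
    T_R, T = len(ref), len(test)
    D = np.full((T_R, T), np.inf)
    D[0, 0] = (ref[0] - test[0]) ** 2                   # initialization (8)
    for n in range(1, T):
        for i in range(T_R):
            D[i, n] = D[:i + 1, n - 1].min() + (ref[i] - test[n]) ** 2  # (7a)
    return D[T_R - 1, T - 1]       # distortion of the optimal curve from A to B

ref  = np.array([0.0, 1.0, 0.0])
test = np.array([0.0, 0.2, 1.1, 0.9, 0.1])
print(dtw(ref, test))              # total cost is O(T * T_R), as in the text
```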
2.2 Matching Images
2.2.1 DPW Problem Formulation

In extending the DTW algorithm to the alignment of images, our goal is to match the 2-dimensional reference image,

$$G^R = \{\,g^R(\hat x,\hat y):\ \hat x,\hat y \in Z^+,\ (\hat x,\hat y) \in L_{X_R,Y_R},\ g^R(\cdot,\cdot) \in \mathbf{G} \subset R^n\,\}\,,$$

to an elastically distorted test image,

$$G = \{\,g(x,y):\ x,y \in Z^+,\ (x,y) \in L_{X,Y},\ g(\cdot,\cdot) \in \mathbf{G} \subset R^n\,\}\,.$$

Here an (x,y) pair describes pixel location by horizontal and vertical coordinates, and $L_{N,M}$ denotes a rectangular discrete lattice, i.e., a set of pixels $L_{N,M} = \{(x,y):\ 1 \le x \le N,\ 1 \le y \le M\}$. Figure 2 shows a simple example of $G^R$ and $G$. This example is used to illustrate the definitions and the procedures described below.
The idea of planar warping is to map the test lattice to the reference one through a mapping function F,

$$F(x,y) = \big(F_x(x,y),\, F_y(x,y)\big) = (\hat x, \hat y)\,,\qquad F:\ L_{X,Y} \to L_{X_R,Y_R}\,, \qquad (9)$$

such that the distortion

$$D = \sum_{(x,y) \in L_{X,Y}} d\big(g^R(F(x,y)),\, g(x,y)\big) \qquad (10)$$
is minimal, subject to possible constraints like global boundary conditions:
$$F_x(1,y) = 1\,; \qquad (11a)$$
$$F_x(X,y) = X_R\,; \qquad (11b)$$
$$F_y(x,1) = 1\,; \qquad (11c)$$
$$F_y(x,Y) = Y_R\,, \qquad (11d)$$

and local monotonicity constraints, such as

$$\Delta F_x \equiv F_x(x+1,y) - F_x(x,y) \ge 0\,; \qquad (12a)$$
$$\Delta F_y \equiv F_y(x,y+1) - F_y(x,y) \ge 0\,. \qquad (12b)$$
We denote by F the set of all admissible mappings that satisfy the above conditions. Although we limit the discussion in this appendix to constraints (11) and (12), the treatment of other kinds of constraints is similar.
2.2.2 The General Approach

The complexity of the problem of finding the optimal warping function is exponential, namely $O\big((X_R Y_R)^{XY}\big)$. This complexity can be reduced, as in the one-dimensional case, by generalizing the optimality principle. We will use the following definitions:

1. Define $\Theta$ to be a set of $N_T$ test sub-shapes $\{\theta_n\}$, where each test sub-shape is a set of pixels $\{(x,y)\}$ satisfying the following conditions:

$$\theta_{n-1} \subset \theta_n\,,\ 1 \le n \le N_T\,,\ \text{where } \theta_0 \text{ is the empty set};\qquad \theta_{N_T} = L_{X,Y}\,;$$

and each difference set $\Delta\theta_n \equiv \theta_n \setminus \theta_{n-1}$ has a natural mono-dimensional parametrization. In particular, we choose $\Theta$ to be a set of Y rectangles, $\theta_n = L_{X,n}$ (see Fig. 2), $n = 1, \cdots, Y$. In this case $\Delta\theta_n$ are the pixels of the n-th row.
2. Define $\Phi$ to be a set of admissible warping sequences $\{\Phi_i\}$, $1 \le i \le N_\Phi$, where each $\Phi_i = \big((\hat x_1,\hat y_1), \ldots, (\hat x_X,\hat y_X)\big)$ is a sequence of X reference pixels that meets the following conditions:

$$\hat x_1 = 1\,;\qquad \hat x_X = X_R\,;\qquad \hat x_{x+1} - \hat x_x \ge 0\,,\ 1 \le x \le X-1\,. \qquad (14)$$
This definition of the set Φ depends on the particular choice of the set Θ and the constraints (11a),(11b) and (12a). Φ is constructed to contain all possible warping sequences of each Δθn that satisfy the constraints.
From this definition it is clear that for each i, $1 \le i \le N_\Phi$, and for each n, $2 \le n \le Y-1$, there exists $F \in \mathbf{F}$ such that $F(x,n) = \Phi_i(x)$ for $1 \le x \le X$. Also, for each $F \in \mathbf{F}$ and any n, $1 \le n \le Y$, there exists $\Phi_i \in \Phi$ such that $F(x,n) = \Phi_i(x)$ for $1 \le x \le X$. The cardinality of $\Phi$ is $N_\Phi = O\big((X_R Y_R)^X\big)$.
3. Each sequence $\Phi_i \in \Phi$ determines a subset $\Lambda_i \subset \Phi$ of sequences whose pixels lie no higher than the corresponding pixels of $\Phi_i$:

$$\Lambda_i = \big\{\,\Phi_j \in \Phi:\ \hat y_x^{(j)} \le \hat y_x^{(i)}\,,\ 1 \le x \le X\,\big\}\,.$$

Whenever we consider $\Phi_i$ to be a candidate warping sequence for the n-th row of the test image, the preceding (n-1)-th row can be matched only with a warping sequence in $\Lambda_i$, in order to meet the vertical monotonicity condition (12b).
Figure 3 shows the concepts defined above, applied to the example of figure 2. In figure 3a the set $\Theta$ is shown. The set $\Phi$ includes in this case 16 sequences, shown in figure 3b; the corresponding $\Lambda_i$ for each $\Phi_i \in \Phi$ is also given. (A short enumeration sketch reproducing these counts appears after definition 4 below.)
4. Denote by $\mathbf{F}_{i,n}$ a set of sub-mapping functions from the n-th test rectangle $\theta_n$, $1 \le n \le N_T$, that satisfy the monotonicity conditions (12a) and (12b), the boundary conditions (11a), (11b) and (11c), and match the n-th row of the test, $\Delta\theta_n$, with $\Phi_i$: for any $F \in \mathbf{F}_{i,n}$,

$$F(x,n) = \Phi_i(x)\,,\quad 1 \le x \le X\,.$$
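The sets $\Phi$ and $\Lambda_i$ can be enumerated mechanically for the figure-2 example (X = 3 test columns, X_R = Y_R = 2). The brute-force sketch below, illustrative code rather than part of the patent, reproduces the 16 admissible sequences cited above:

```python
from itertools import product

X, X_R, Y_R = 3, 2, 2              # test width; reference width and height

# Phi: sequences of X reference pixels (x_hat, y_hat) with x_hat_1 = 1,
# x_hat_X = X_R and x_hat non-decreasing (from (11a), (11b) and (12a)).
pixels = [(x, y) for x in range(1, X_R + 1) for y in range(1, Y_R + 1)]
Phi = [seq for seq in product(pixels, repeat=X)
       if seq[0][0] == 1 and seq[-1][0] == X_R
       and all(b[0] >= a[0] for a, b in zip(seq, seq[1:]))]
print(len(Phi))                    # -> 16, as stated for the example

# Lambda_i: sequences whose y-coordinates never exceed those of Phi_i,
# i.e. the warps admissible for the preceding row (condition (12b)).
def Lam(seq_i):
    return [seq_j for seq_j in Phi
            if all(pj[1] <= pi[1] for pi, pj in zip(seq_i, seq_j))]

print(len(Lam(Phi[0])))            # -> 2 for the all-bottom-row sequence
```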
Using these definitions we are ready to describe the DPW algorithm.
In the n-th iteration of the algorithm, $2 \le n \le Y$, we assume that the optimal warpings of the (n-1)-th rectangle of the test image, $g(x,y),\ (x,y) \in \theta_{n-1}$, that match the (n-1)-th test image row, $g(x,y),\ (x,y) \in \Delta\theta_{n-1}$, with the warping sequence $\Phi_i$ are known for $1 \le i \le N_\Phi$. Each optimal warping is defined by a mapping $F_{i,n-1} \in \mathbf{F}_{i,n-1}$ and a distortion $D_{i,n-1}$, such that

$$D_{i,n-1} = \min_{F \in \mathbf{F}_{i,n-1}} \sum_{(x,y) \in \theta_{n-1}} d\big(g^R(F(x,y)),\, g(x,y)\big)\,.$$

Now we can find the optimal warping of the n-th test rectangle, $g(x,y),\ (x,y) \in \theta_n$, that matches the n-th test image row to the j-th warping sequence, $g^R(\hat x,\hat y),\ (\hat x,\hat y) \in \Phi_j$:

$$D_{j,n} = \min_{i:\ \Phi_i \in \Lambda_j} \big[D_{i,n-1}\big] + \sum_{x=1}^{X} d\big(g^R(\Phi_j(x)),\, g(x,n)\big)\,. \qquad (19)$$

The optimal mapping $F_{j,n}$ is

$$F_{j,n}(x,y) = \begin{cases} F_{z,n-1}(x,y)\,, & (x,y) \in \theta_{n-1} \\ \Phi_j(x)\,, & y = n\,, \end{cases}$$
where z is the argument minimizing (19). Constraining the minimization in (19) only to those i such that $\Phi_i \in \Lambda_j$ guarantees that the vertical monotonicity condition (12b) is satisfied. The horizontal monotonicity condition (12a) and the two boundary conditions (11a) and (11b) are satisfied through the definition of $\Phi_j$.
To complete the n-th iteration, the optimal warping of the n-th test rectangle has to be found for every warping sequence $\Phi_j \in \Phi$, thus requiring $N_\Phi \cdot X$ operations.
The algorithm is initialized for n = 1 by setting

$$D_{i,1} = \begin{cases} \sum_{x=1}^{X} d\big(g^R(\Phi_i(x)),\, g(x,1)\big)\,, & \hat y_x = 1 \text{ for all } 1 \le x \le X \\ \infty\,, & \text{otherwise}\,, \end{cases} \qquad (21)$$

which guarantees the satisfaction of condition (11c). The algorithm is stopped after n = Y, when the optimal warpings $F_{i,Y}$ are found for all i for which $\Lambda_i = \Phi$, thereby requiring a total of $O(Y X N_\Phi)$ computations. The global optimal warping function $F_{optimal}$ minimizing (10) and satisfying (11) and (12) is chosen among these warpings as the one that produces the minimal distortion:

$$F_{optimal} = F_{j,Y}\,,\qquad \text{where } j = \arg\min_{i:\ \Lambda_i = \Phi} D_{i,Y}\,. \qquad (22)$$

Constraining the minimization in (22) only to those i for which $\Lambda_i = \Phi$ guarantees satisfaction of the boundary condition (11d).
Figure 4 shows the values of $D_{i,n}$ and $F_{optimal}$ for the example of figure 2, using a quadratic distance measure $d\big(g^R(\hat x,\hat y),\, g(x,n)\big) = \big(g^R(\hat x_x,\hat y_x) - g(x,n)\big)^2$.
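For tiny images, the general DPW iteration (19), with initialization and termination enforcing (11c) and (11d), can be run by explicit enumeration of $\Phi$. The sketch below assumes a squared local distance and 0-based indices; all names are illustrative:

```python
import numpy as np
from itertools import product

def dpw(ref, test):
    """General DPW by brute-force enumeration of Phi (exponential in
    the test width, so feasible only for toy examples)."""
    Y_R, X_R = ref.shape
    Y, X = test.shape
    pixels = [(x, y) for x in range(X_R) for y in range(Y_R)]
    Phi = [s for s in product(pixels, repeat=X)
           if s[0][0] == 0 and s[-1][0] == X_R - 1
           and all(b[0] >= a[0] for a, b in zip(s, s[1:]))]
    def row_cost(seq, n):          # distortion of matching test row n to seq
        return sum((ref[y, x] - test[n, c]) ** 2
                   for c, (x, y) in enumerate(seq))
    # initialization: the first test row must map to the first reference row
    D = [row_cost(s, 0) if all(y == 0 for _, y in s) else np.inf for s in Phi]
    for n in range(1, Y):          # iteration (19), minimizing over Lambda_j
        D = [min(D[i] for i, si in enumerate(Phi)
                 if all(pj[1] >= pi[1] for pi, pj in zip(si, sj)))
             + row_cost(sj, n) for sj in Phi]
    # termination: the last test row must map to the last reference row
    return min(d for d, s in zip(D, Phi) if all(y == Y_R - 1 for _, y in s))

ref  = np.array([[0, 1], [1, 0]])
test = np.array([[0, 0, 1], [0, 1, 1], [1, 1, 0]])
print(dpw(ref, test))
```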
2.2.3 Constraining the Warping Problem

Even though applying the optimality principle reduces the complexity of the planar warping, the computation is still exponential. Therefore the algorithm is impractical for real-size images (since $N_\Phi = O\big((X_R Y_R)^X\big)$). Further reduction of the computational complexity can be achieved in two different ways:

1. Finding a sub-optimal solution to the warping problem. Examples of sub-optimal procedures can be found in [3,4], where the images are divided into small sub-images, usually containing up to three rows of pixels. These sub-images are small enough that finding a (local) optimal warping function is possible. The global solution, however, is not optimal, since the dependence across sub-images is neglected.
2. Redefining and simplifying the original warping problem.
The idea here is to limit the number of admissible warping sequences in $\Phi$, or, equivalently, constrain the class of admissible mappings $\mathbf{F}$ in such a way that an optimal solution to the constrained problem can be found in polynomial time. The additional constraints used are not arbitrary, but instead reflect the geometric properties of the specific set of images being compared. For example, we can constrain the possible mappings to be of the form

$$F(x,y) = \big(F_x(x,y),\, F_y(y)\big)\,, \qquad (23)$$

where the vertical distortion is independent of the horizontal position. In this case $N_\Phi = O\big(X_R^{\,X}\, Y_R\big)$, and the admissible warping sequences $\Phi_i \in \Phi$ are naturally grouped into $Y_R$ subsets. The m-th subset, $\lambda_m$, contains all those sequences $\Phi_i$ for which $\hat y_x = m$, $1 \le x \le X$ (e.g., in figure 2, the set $\Phi$ now contains only four sequences, $\Phi = \{\Phi_1, \Phi_5, \Phi_{12}, \Phi_{16}\}$, with $\lambda_1 = \{\Phi_1, \Phi_5\}$ and $\lambda_2 = \{\Phi_{12}, \Phi_{16}\}$). For all $\Phi_i \in \lambda_m$, $\Lambda_i = \lambda_1 \cup \cdots \cup \lambda_m$; i.e., the satisfaction of the vertical monotonicity condition is independent of the particular horizontal warping. This allows further reduction of computational complexity, as follows. We define
Figure imgf000022_0003
and
Figure imgf000022_0004
where i is the argument that minimizes (24a). The recursion relation (19) can now be rewritten in terms of these quantities as:
Figure imgf000023_0002
The second term of (25), , is the distortion
Figure imgf000023_0003
resulting from optimally aligning the n-th row of the test image to the m-the row of the reference image while satisfying both the horizontal monotonicity and the boundary conditions (11a), (11b) and (12a). This is equivalently a single-dimensional warping, as described in the previous section, and requires O(XXR) calculations. Denote by fm,n the optimal mapping that aligns the n-th row of the test image to the m-th row of the reference image constrained by (11a), (11b) and (12a) and minimizing ΔDm,n. Then
Figure imgf000023_0001
where
Figure imgf000023_0004
The optimal mapping F_optimal ∈ F, minimizing (10) and satisfying (11) and (12), is F_optimal = F̂_{Y_R,Y}, and the complexity of its computation is only O(Y_R Y X_R X). Figure 5 shows the D̂_{m,n} and the F_optimal obtained by applying the restricted approach to the problem of figure 2. Note that the solution obtained here is the same as the one obtained by the general approach, shown in figure 4.
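The restricted recursion (24)-(26) is straightforward to implement. The following sketch (our illustration, not part of the original text: the numpy representation, the function names, the pinned-corner boundary handling and the quadratic pixel distortion are assumptions) computes the restricted warping distortion in O(Y_R Y X_R X) time:

    import numpy as np

    def row_alignment_cost(test_row, ref_row):
        """Delta-D_{m,n} of (25): monotonic one-dimensional alignment of a
        test row onto a reference row with quadratic pixel distortion,
        endpoints pinned to endpoints (a reading of (11a), (11b), (12a))."""
        X, XR = len(test_row), len(ref_row)
        D = np.full((X, XR), np.inf)
        D[0, 0] = (test_row[0] - ref_row[0]) ** 2
        for x in range(1, X):
            for xr in range(XR):
                prev = D[x - 1, : xr + 1].min()   # horizontal monotonicity
                D[x, xr] = prev + (test_row[x] - ref_row[xr]) ** 2
        return D[X - 1, XR - 1]

    def constrained_dpw(test, ref):
        """Restricted planar warping per (23): each test row maps to one
        reference row, with row indices non-decreasing across rows."""
        Y, YR = test.shape[0], ref.shape[0]
        # delta[m, n]: cost of aligning test row n to reference row m
        delta = np.array([[row_alignment_cost(test[n], ref[m])
                           for n in range(Y)] for m in range(YR)])
        D = np.full((YR, Y), np.inf)
        D[0, 0] = delta[0, 0]                     # corner pinned, cf. (12)
        for n in range(1, Y):
            for m in range(YR):
                # vertical monotonicity: previous row mapped no higher
                D[m, n] = D[: m + 1, n - 1].min() + delta[m, n]
        return D[YR - 1, Y - 1]                   # opposite corner pinned

Backpointers for recovering F_optimal itself are omitted for brevity; they would be stored alongside each minimization, exactly as in (26).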
An important remaining question is: what are the limitations of this restricted approach compared to the original one? Assumption (23) implies that a row of the test image can be mapped only to pixels that belong to a single row of the reference; i.e., a horizontal line in the test image will be mapped to some horizontal line in the reference image. This does not severely restrict the generality of the approach, since many kinds of distortion can still be accounted for. For example, a straight line with a small but non-zero slope can be transformed into any straight or non-straight line, excluding a line with zero slope. An example of a test image for which the solution obtained by the restricted approach differs from that obtained by the general approach is shown in figure 6.
The restricted formulation of the problem should reflect the geometry of the application. The restriction (23) discussed here is only one among many possibilities. For other restricted formulations it might be useful to design the sets Θ and Φ in a different manner. For example, Θ can be the set of nested vertical rectangles {R_{n,Y} : 1 ≤ n ≤ X}; Φ in this case includes the warping sequences for test image columns, similarly to (14), and the set of admissible mappings is restricted to contain functions F for which F_x(x,y) = F_x(x). A generic description of the type of constraints needed is presented in appendix A.
3. Statistical Modeling Approach to Pattern Recognition
Another way of approaching the pattern recognition problem is by means of statistical modeling of the pattern source. The k-th class of patterns C_k, k = 1, . . . , N_c, is represented by a model, which is assumed to generate the k-th class patterns according to the probability distribution P(G | C_k). Under this paradigm, the criterion that yields the minimal classification error is maximum a posteriori probability decoding: an unclassified pattern G is assigned to the class C_k according to

k = argmax_{1 ≤ n ≤ N_c} P(C_n | G).   (27)
The term P(C_n | G) can be rewritten as

P(C_n | G) = P(G | C_n) P(C_n) / P(G),   (28)

where P(G) is independent of C_n and therefore can be ignored. The prior class probability P(C_n) is generally attributed to higher-level knowledge (e.g., syntactic knowledge). If such knowledge is not readily available, we usually assume a uniform class probability, P(C_n) = 1/N_c. Then the classification problem is that of maximizing the likelihood

P(G | C_n) ≡ P_n(G).   (29)
The computation of this likelihood is performed using the underlying stochastic model that represents the n-th class.
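With uniform priors, the decision rule (27)-(29) amounts to maximizing the per-class log-likelihood. A minimal sketch, assuming a hypothetical log_likelihood method on each class model (not an interface defined in this text):

    def classify(G, models):
        """MAP decoding with uniform priors, per (27)-(29): assign the
        pattern G to the class whose model scores it highest.  `models`
        maps class labels to objects with a log_likelihood(G) method."""
        return max(models, key=lambda k: models[k].log_likelihood(G))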
In the next subsection we describe a stochastic model called the "hidden Markov model", frequently used to model temporal signals. We show that the statistical classification approach, using this model, generalizes the template matching paradigm based on DTW. Then we proceed to define a new stochastic model that can both extend the HMM approach to planar signals and generalize the template matching approach using DPW.

3.1 Hidden Markov Model
The HMM is a statistical model used to compute P_n(G) for temporal signals G = {g(t) : 1 ≤ t ≤ T, g ∈ G ⊂ R^n}, such as speech [4][5][6]. For simplicity we omit the class index n. The HMM is a composite statistical source, comprising a set of T_R sources called states, s = {1, . . . , T_R}. The i-th state, i ∈ s, is characterized by its probability distribution P_i(g) over G. At each time t only one of the states is active, emitting the observable g(t). We denote the random variable corresponding to the active state at time t by s(t), s(t) ∈ s. The joint probability distribution (for real-valued g) or discrete probability mass (for discrete g) P(s(t),g(t)) for t > 1 is characterized by the following property:
P(s(t), g(t) | s(1:t−1), g(1:t−1)) = P(s(t) | s(t−1)) P(g(t) | s(t)) ≡ P(s(t) | s(t−1)) P_{s(t)}(g(t)),   (30)

where s(1:t−1) stands for the sequence {s(1), . . . , s(t−1)}, and g(1:t−1) = {g(1), . . . , g(t−1)}.
We denote by aij the transition probability P(s(t)=j | s(t-1)=i ), and by πi, the probability of state i being active at t=1, πi =P(s(1)=i ).
The probability of the entire sequence of states S≡s(1:T) and observations G=g(1:T) can be expressed as
P(S, G) = π_{s(1)} P_{s(1)}(g(1)) ∏_{t=2}^{T} a_{s(t−1)s(t)} P_{s(t)}(g(t)).   (31)
The interpretation of equations (30) and (31) is that the observable sequence G is generated in two stages: first, a sequence S of T states is chosen according to the Markovian distribution parametrized by {aij} and {πi}; then each one of the states s(t), 1≤t≤T, in S generates an observable g(t) according to its own memoryless distribution Ps(t), forming the observable sequence G. This model is called a hidden Markov model, because the state sequence S is not given in most applications, and only the observation sequence G is known. We can estimate the most probable state sequence Ŝ, given the observation G, as
Ŝ = argmax_S P(S, G).   (32)

Then the likelihood (29) of the sequence of observations is approximated by

P(G) ≈ P(Ŝ, G) = max_S P(S, G),   (33)

i.e., instead of the sum

P(G) = Σ_S P(S, G),   (34)
only the maximal term is taken into account. This approximation is computationally economical, and has been shown, both experimentally and theoretically [7], to be valid, i.e., to have a vanishingly small approximation error.
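The relation between (33) and (34) can be made concrete in code: the exact likelihood (34) is computed by the forward recursion, and replacing its log-sum with a maximum yields the Viterbi approximation (33). A sketch under the assumptions of discrete observations and log-domain parameters (not the original text's formulation):

    import numpy as np
    from scipy.special import logsumexp

    def log_likelihoods(obs, log_pi, log_a, log_b):
        """Return (exact log P(G) per (34), Viterbi approximation per (33)).
        obs: observation indices; log_pi: (TR,) initial log-probabilities;
        log_a: (TR, TR) transition log-probabilities; log_b: (TR, V)
        per-state observation log-probabilities."""
        fwd = log_pi + log_b[:, obs[0]]     # forward recursion (sum)
        vit = fwd.copy()                    # Viterbi recursion (max)
        for o in obs[1:]:
            fwd = logsumexp(fwd[:, None] + log_a, axis=0) + log_b[:, o]
            vit = (vit[:, None] + log_a).max(axis=0) + log_b[:, o]
        return logsumexp(fwd), vit.max()

The only difference between the two recursions is logsumexp versus max, which is why the approximation error stays small when one state sequence dominates.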
The problem of finding Ŝ and P(Ŝ, G) can be restated as that of minimizing

L(S) = −log P(S, G) = −Σ_{t=1}^{T} log P_{s(t)}(g(t)) − log π_{s(1)} − Σ_{t=2}^{T} log a_{s(t−1)s(t)}   (35)
over all possible state sequences S. The problem of minimizing L is of exponential complexity, since there exist T_R^T
possible state sequences, but it can be solved in polynomial time using a dynamic programming approach (similarly to the description in Section 2.1). It is useful to understand this similarity: a state sequence S defines a mapping from the observation time scale 1 ≤ t ≤ T to the active state at time t, 1 ≤ s(t) ≤ T_R, which corresponds to the reference time scale 1 ≤ t_R ≤ T_R in the DTW approach. The first term in (35) provides a distortion measure, as in (2). For example, for a Gaussian HMM, where each P_i(g) is Gaussian with mean μ_i, this term reduces (up to constants) to the accumulated quadratic distortion Σ_{t=1}^{T} ||g(t) − μ_{s(t)}||². The penalty term in (35), −log π_{s(1)} − Σ_{t=2}^{T} log a_{s(t−1)s(t)}, generalizes the global and the local constraints of equations (3) and (4) of DTW. A particular case of this model, called a left-to-right HMM, is especially useful for speech modeling and recognition. In this case a_{ij} = 0 for j < i, and π_1 = 1. This type of model assigns an infinite penalty to state sequences that do not start with s(1) = 1, and to those for which the monotonicity condition s(t+1) ≥ s(t) does not hold. If in addition the absorbing state s(T) is constrained to be the last state of the model, s(T) = T_R, the minimization (35) is, in effect, performed only among those state sequences that correspond to mappings satisfying conditions equivalent to (3) and (4). The only difference between the minimization problem defined by (2), (3) and (4) and this one is the non-zero penalty term in (35). The optimality principle can be applied to the minimization (35) in a manner similar to DTW, as described in section 2.1.2.
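A dynamic-programming sketch of the minimization of (35) follows. It is our illustrative reading, with the left-to-right structure expressed through −∞ entries in the log transition matrix rather than an explicit constraint check:

    import numpy as np

    def viterbi_decode(obs, log_pi, log_a, log_b):
        """Minimize L of (35) over state sequences by dynamic programming.

        For a left-to-right model, set log_a[i, j] = -inf for j < i and
        log_pi[i] = -inf for i > 0; to force the absorbing state
        s(T) = TR, read the final cost at the last state instead of
        taking the minimum below."""
        T, TR = len(obs), len(log_pi)
        cost = np.full((T, TR), np.inf)
        back = np.zeros((T, TR), dtype=int)
        cost[0] = -(log_pi + log_b[:, obs[0]])
        for t in range(1, T):
            cand = cost[t - 1][:, None] - log_a   # cand[i, j]: i -> j
            back[t] = cand.argmin(axis=0)
            cost[t] = cand.min(axis=0) - log_b[:, obs[t]]
        states = [int(cost[-1].argmin())]
        for t in range(T - 1, 0, -1):
            states.append(int(back[t, states[-1]]))
        return cost[-1].min(), states[::-1]       # L(S-hat), S-hat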
This statistical description not only provides a formal interpretation of the heuristic warping procedure and aids its understanding, but also enables natural integration with higher-level syntactical knowledge.

3.2 The Two-Dimensional Case: Planar HMM
In this section we describe a statistical model for P_n(G), when G is a planar image, G = {g(x,y) : (x,y) ∈ L_{X,Y}, g ∈ G}. We call this model the "planar HMM" (PHMM) and design it not only to extend the conventional HMM to the two-dimensional case, but also to provide a statistical interpretation and generalization of the DPW approach.
The PHMM is a composite source, comprising a set, s, of N = X_R Y_R states, s = {(i,j) : 1 ≤ i ≤ X_R, 1 ≤ j ≤ Y_R}. Each state (i,j) ∈ s is a stochastic source characterized by its probability density P_{i,j}(g) over the space of observations g ∈ G. It is convenient to think of the states of the model as being located on a rectangular lattice L_{X_R,Y_R}, corresponding to the reference lattice of DPW.
corresponding to the reference lattice of DPW. Similarly to the conventional HMM, only one state is active in the generation of the (x,y)-th image pixel g(x,y). We denote by s(x,y)∈ s the active state of the model that generates g(x,y). The joint distribution governing the choice of active states and image values has the following Markovian property (see figure 7):
P(g(x,y), s(x,y) | g(1:X,1:y−1), g(1:x−1,y), s(1:X,1:y−1), s(1:x−1,y)) =
= P(g(x,y) | s(x,y)) P(s(x,y) | s(x−1,y), s(x,y−1)) =
= P_{s(x,y)}(g(x,y)) P(s(x,y) | s(x−1,y), s(x,y−1)),   (36)

where g(1:X,1:y−1) ≡ {g(x,y) : (x,y) ∈ R_{X,y−1}}, g(1:x−1,y) ≡ {g(1,y), . . . , g(x−1,y)}, and s(1:X,1:y−1), s(1:x−1,y) are the active states involved in generating g(1:X,1:y−1), g(1:x−1,y), respectively (see figure 7). Using property (36), the joint likelihood of the image G = g(1:X,1:Y) and the state image S = s(1:X,1:Y) can be written as
P(S, G) = P(S) ∏_{(x,y) ∈ L_{X,Y}} P_{s(x,y)}(g(x,y)),   (37)

where

P(S) = π_{s(1,1)} ∏_{x=2}^{X} a^H_{s(x−1,1),s(x,1)} ∏_{y=2}^{Y} a^V_{s(1,y−1),s(1,y)} ∏_{x=2}^{X} ∏_{y=2}^{Y} A_{s(x−1,y),s(x,y−1),s(x,y)}.
Similarly to HMM, (37) suggests that an image G is generated by the PHMM in two successive stages: in the first stage the state matrix S is generated according to the Markovian probability distribution parametrized by {A}, {a^H}, {a^V}, and {π}. In the second stage, the image value of the (x,y)-th pixel is produced independently of the other pixels, according to the distribution P_{s(x,y)}(g) of the s(x,y)-th state. As in HMM, the state matrix S is not known in most applications; only G is given. The state matrix Ŝ that best explains the observable G can be estimated as in (32) by Ŝ = argmax_S P(S, G), and the observation likelihood P(G) is then approximated as P(G) ≈ P(Ŝ, G).
Therefore, the problem of finding Ŝ and P(G) is that of minimizing

L(S) = −log P(S, G) = −Σ_{(x,y) ∈ L_{X,Y}} log P_{s(x,y)}(g(x,y)) + C(S),  where C(S) = −log P(S),   (38)

over all possible state matrices S. Again, the problem is of exponential complexity, since there are (X_R Y_R)^{XY} different state matrices. This complexity can be reduced, as with DPW, by applying the optimality principle and by restricting the model. The similarity between the problem of finding the most probable state matrix in PHMM and DPW can be shown as follows: the states of the PHMM correspond to the pixels of the reference image, and therefore the active state matrix S corresponds to the mapping F of DPW. The first term in (38) is equivalent to the distortion measure D of DPW. The second term, C, generalizes constraints (11), (12), and (23). In particular, by restricting the PHMM parameter values to be
A_{(i,j),(k,l),(m,n)} = 0 unless i ≤ m and l ≤ n, and π_{(i,j)} = δ(i−1) δ(j−1),   (40)

the active state matrix S that minimizes (38) must satisfy conditions equivalent to (11), (12a) and (12c). The PHMM constrained by (40) can be referred to as the left-to-right bottom-up PHMM, since it does not allow for "foldovers" in the state images.
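Under this reading of (40) (itself a reconstruction: left neighbor no further right than the current state, bottom neighbor no higher), the admissible transitions can be tabulated as a boolean mask; the helper below is purely illustrative:

    import numpy as np

    def no_foldover_mask(XR, YR):
        """Broadcastable boolean mask of admissible transitions into
        state (m, n): the left neighbor (i, j) must satisfy i <= m and
        the bottom neighbor (k, l) must satisfy l <= n.  Axes are
        ordered [i, j, k, l, m, n], 0-based; the unconstrained j and k
        axes have size 1 and broadcast."""
        i, j, k, l, m, n = np.ogrid[0:XR, 0:YR, 0:XR, 0:YR, 0:XR, 0:YR]
        return (i <= m) & (l <= n)

Multiplying a table of candidate transition probabilities by this mask (and renormalizing) zeroes exactly the foldover transitions.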
The other boundary conditions, (12b) and (12d), can be imposed on Ŝ by restricting the values of s(x,Y), 1 ≤ x ≤ X, and s(X,y), 1 ≤ y ≤ Y:

s(x,Y) ∈ {(i, Y_R) : 1 ≤ i ≤ X_R},  s(X,y) ∈ {(X_R, j) : 1 ≤ j ≤ Y_R}.   (41)
3.2.1 Constraining the Parameters of PHMM

In this section we describe ways of constraining the values of the transition probabilities {A_{(i,j),(k,l),(m,n)}} in order to reduce the complexity of the problem of finding Ŝ and P(G) to polynomial, similarly to the additional constraints on DPW discussed in appendix A and section 2.2.3.
For the problem of finding Ŝ and P(G) to be solved in polynomial time, there should exist a grouping of the set s of states of the model into N_G subsets of states γ_p, 1 ≤ p ≤ N_G, with s = γ_1 ∪ . . . ∪ γ_{N_G}. These subsets do not have to be mutually exclusive, and can share states.
Two examples of such groupings are shown in figure 8. The number of subsets, N_G, should be polynomial in the dimensions of the model, X_R and Y_R. The probabilities {A_{(i,j),(k,l),(m,n)}} should satisfy the two following constraints with respect to such a grouping:

A_{(i,j),(k,l),(m,n)} ≠ 0 only if there exists p, 1 ≤ p ≤ N_G, such that (i,j), (m,n) ∈ γ_p.   (42)
Condition (42) means that the left neighbor of the state (m,n) in the state matrix S must be a member of the same group γ_p as (m,n). The second constraint is:

A_{(i,j),(k,l),(m,n)} / A_{(i_1,j_1),(k_1,l_1),(m_1,n_1)} = K(p,r,r_1)   (43)

if there exists p, 1 ≤ p ≤ N_G, such that (i,j), (i_1,j_1), (m,n), (m_1,n_1) ∈ γ_p, where (k,l) ∈ γ_r and (k_1,l_1) ∈ γ_{r_1}. The condition (43) makes the penalty term C independent of the horizontal warping.
In the case when (42) and (43) hold for a particular grouping, the nonzero transition probabilities A_{(i,j),(k,l),(m,n)} can be factorized into

A_{(i,j),(k,l),(m,n)} = α_{rp} a^p_{(i,j),(m,n)},   (44)

where a^p_{(i,j),(m,n)} is the transition probability between the states (i,j) and (m,n) within the subset γ_p,

Σ_{(m,n) ∈ γ_p} a^p_{(i,j),(m,n)} = 1,   (45)

and α_{rp} is the probability of a transition from the subset γ_r to the subset γ_p,

Σ_{p=1}^{N_G} α_{rp} = 1.   (46)
The ratio K(p,r,r_1) of Eq. (43) can then be expressed as K(p,r,r_1) = α_{rp}/α_{r_1 p}. Using this equivalent representation of the transition probabilities (given by equations (44)-(46)), a convenient description of the PHMM can be derived. Each subset γ_p of the PHMM can be considered a one-dimensional HMM, comprising the states (i,j) ∈ γ_p, with the transition probabilities among those states given by equation (45) and the respective observation probabilities. The whole PHMM can now be represented as a collection of such subsets, with a Markovian probability of transition between the subsets defined by the α_{rp} of equation (46). This equivalent representation, illustrated in figure 9, suggests an iterative algorithm for computing the state matrix Ŝ and P(G) in polynomial time, similarly to the DPW case of section 2.2.3. Denote by L_{p,n} the local cost, related to the probability that the n-th row of the image G was generated by the single-dimensional HMM corresponding to the subset γ_p, and by Ŝ_{p,n} the corresponding state sequence:

L_{p,n} = min_{s(1,n), . . . , s(X,n) ∈ γ_p} [ −Σ_{x=1}^{X} log P_{s(x,n)}(g(x,n)) − Σ_{x=2}^{X} log a^p_{s(x−1,n),s(x,n)} ],   (47a)

with Ŝ_{p,n} the state sequence achieving this minimum.   (47b)
This cost can be calculated in polynomial time using the Viterbi algorithm, since this is a single-dimensional case. After all the local costs L_{p,n} have been calculated for 1 ≤ n ≤ Y, 1 ≤ p ≤ N_G, the global cost −log P(Ŝ, G) and the optimal state matrix Ŝ are found using the Viterbi algorithm for the single-dimensional HMM defined by a set of N_G states (the subsets γ_p of the PHMM), the transition probabilities between these states (the α_{rp} of Eq. (46)), and the observation probabilities given by exp[−L_{p,n}]. The algorithm is illustrated in figure 10.
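A sketch of this two-stage procedure follows (illustrative only: the viterbi_cost method, standing for the stage-1 row Viterbi of each subset HMM, and the initial-subset log-probabilities log_alpha0 are our assumptions, not interfaces defined in this text):

    import numpy as np

    def phmm_decode(image, row_hmms, log_alpha, log_alpha0):
        """Two-stage PHMM decoding, per figure 10.  Stage 1: local costs
        L[p, n] of (47), one 1-D Viterbi per (subset, image row) pair.
        Stage 2: Viterbi over rows for the NG-state chain with
        transitions log_alpha[r, p] and observation log-probabilities
        -L[p, n] (i.e. exp(-L), as in the text).  Returns the global
        cost -log P(S-hat, G) and the chosen subset per row."""
        Y, NG = image.shape[0], len(row_hmms)
        L = np.array([[hmm.viterbi_cost(image[n]) for n in range(Y)]
                      for hmm in row_hmms])            # L[p, n]
        cost = np.full((Y, NG), np.inf)
        back = np.zeros((Y, NG), dtype=int)
        cost[0] = -log_alpha0 + L[:, 0]
        for n in range(1, Y):
            cand = cost[n - 1][:, None] - log_alpha    # cand[r, p]
            back[n] = cand.argmin(axis=0)
            cost[n] = cand.min(axis=0) + L[:, n]
        path = [int(cost[-1].argmin())]
        for n in range(Y - 1, 0, -1):
            path.append(int(back[n, path[-1]]))
        return cost[-1].min(), path[::-1]

The per-pixel state assignments of Ŝ are recovered by re-reading, for each row n, the stage-1 state sequence Ŝ_{p,n} of the subset p chosen on that row.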
Although conditions (42) and (43) are hard to check in practice, since every possible grouping of the states would have to be considered, they can be used effectively in a constructive mode, i.e., by choosing one particular grouping and then imposing the constraints (42) and (43) on the probabilities {A_{(i,j),(k,l),(m,n)}} with respect to this grouping. For example, if we choose γ_p = {(x,y) | 1 ≤ x ≤ X_R, y = p}, 1 ≤ p ≤ Y_R, then the constraints (42) and (43) transform to
A_{(i,j),(k,l),(m,n)} ≠ 0 only if j = n,   (49)

and

A_{(i,j),(k,l),(m,n)} = α_{ln} a^n_{(i,n),(m,n)},   (50)

equivalently to the restriction imposed on DPW by (23).
The constraints (42) and (43) can be trivially changed by applying a coordinate transformation.
4. Experimental Results
The PHMM approach was tested on a writer-independent isolated handwritten digit recognition application. The data we used in our experiments was collected from 12 subjects (6 for training and 6 for test). The subjects were each asked to write 10 samples of each digit. Each sample was written in a fixed-size box; therefore the samples were naturally size-normalized and centered. Figure 11 shows the 100 samples written by one of the subjects. Each sample in the database was represented by a 16×16 binary image. Each character class (digit) was represented by a single PHMM satisfying (49) and (50). Each PHMM had a strictly left-to-right bottom-up structure, where the state matrix S was restricted to contain every state of the model, i.e., states could not be skipped. All models had the same number of states. Each state was represented by its own binary probability distribution, i.e., the probability of a pixel being 1 (black) or 0 (white). We estimated these probabilities from the training data with the following generalization of the Viterbi training algorithm [8]. For the initialization we uniformly divided each training image into regions corresponding to the states of its model. The initial value of P_i(g=1) for the i-th state was obtained as a frequency count of the black pixels in the corresponding region over all the samples of the same digit. Each iteration of the algorithm consisted of two stages: first, the samples were aligned with the corresponding model by finding the best state matrix Ŝ. Then a new frequency count for each state was used to update P_i(1), according to the obtained alignment. We noticed that the training procedure usually converged after 2-4 iterations, and in all the experiments the algorithm was stopped at the 10th iteration. The recognition was performed as explained in section 3: the test sample was assigned to the class k for which the approximated likelihood P_k(G) was maximal.
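One possible reading of this training loop in code, with a hypothetical align method returning the pixel-to-state assignment of the best state matrix Ŝ (the num_states and p_black attributes are likewise assumed, and the additive smoothing is our addition to avoid zero probabilities):

    import numpy as np

    def viterbi_train(samples, model, iterations=10):
        """Generalized Viterbi training for a binary-output PHMM:
        alternate (i) aligning each sample with the model via its best
        state matrix and (ii) re-estimating each state's P_i(g=1) as the
        frequency of black pixels among the pixels assigned to it."""
        for _ in range(iterations):
            ones = np.zeros(model.num_states)
            total = np.zeros(model.num_states)
            for img in samples:
                states = model.align(img)    # state index per pixel
                for (x, y), s in np.ndenumerate(states):
                    ones[s] += img[x, y]
                    total[s] += 1
            # plain frequency count per the text; +1/+2 smoothing added
            model.p_black = (ones + 1.0) / (total + 2.0)
        return model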
It is worth noting the following two points. First, the test error shows a minimum of 5% at X_R = Y_R = 10; increasing or decreasing the number of states increases this error. This phenomenon is due to the following:
1. The typical under-/over-parametrization behavior.
2. Increasing the number of states toward the size of the modeled images reduces the flexibility of the alignment procedure, which degenerates to a trivial uniform alignment when X_R = Y_R = 16.
Also, the training error decreases monotonically with an increasing number of states, up to X_R = Y_R = 16. This is again typical behavior for such systems: as the number of states increases, the number of model parameters grows, improving the fit to the training data. But when the number of states equals the dimensions of the sample images, X_R = Y_R = 16, there is a sudden, significant increase in the training error. This behavior is consistent with point (2) above.
Figure 12 shows three sets of models with different numbers of states. The states of the models in this figure are represented by squares, where the grey level of each square encodes the probability P(g=1). The (6×6)-state models give a very coarse representation of the digits, because the number of states is so small. The (10×10)-state models appear much sharper than the (16×16)-state models, due to their ability to align the training samples.
This preliminary experiment shows that eliminating elastic distortions by the alignment procedure discussed above plays an important role in isolated character recognition, improving the recognition accuracy significantly. Note that the simplicity of this task does not exercise the full power of the PHMM representation, since the data was isolated, size-normalized, and centered. We expect that the advantages of this approach will be even more prominent in harder tasks, such as cursive/connected handwriting recognition, recognition with grammatical constraints, and noisy images.
5. Summary and Discussion
In this appendix we demonstrated how the DTW algorithm and hidden Markov modeling, extensively used for speech recognition, can be generalized to OCR. We found two key problems in this generalization:
1. Applying the optimality principle in the planar case is not trivial, since the two dimensions of an image cannot be treated separately. In order to use the optimality principle here, the set of all possible warping sequences satisfying the horizontal constraints must be defined. For the n-th row of the test image, every such sequence has to be considered as a candidate warping. The vertical constraints are taken into account by limiting the set of possible warping sequences of the previous, (n−1)-th, row. In this way the complexity of computation was reduced from O((X_R Y_R)^{XY}) to O(Y X (Y_R X_R)^X).
2. Although applying the optimality principle reduces the computational complexity, it still remains exponential in the dimensions of the image. We showed that by restricting the original warping problem, limiting the class of possible distortions (for example, assuming that the vertical distortion is independent of the horizontal position), we can reduce the computational complexity dramatically and find the optimal solution to the restricted problem in linear time, O(X Y X_R Y_R).
A statistical model (the planar hidden Markov model, PHMM) was developed to provide a probabilistic formulation of the planar warping problem. This model, on one hand, generalizes the single-dimensional HMM to the planar case, and on the other extends the DPW approach. The restricted formulation of the warping problem corresponds to a PHMM with constrained transition probabilities. The PHMM approach was tested on an isolated handwritten digit recognition application, yielding 95% digit recognition accuracy. Further analysis of the results indicates that even in the simple case of isolated characters, the elimination of planar distortions enhances recognition performance significantly. We expect that the advantage of this approach will be even more valuable in harder tasks, such as cursive writing recognition/spotting, for which an effective solution using currently available techniques has not yet been found.
Figure Captions
Figure 1: Time-time grid. Abscissa: test time scale 1 ≤ t ≤ T. Ordinate: reference time scale 1 ≤ t_R ≤ T_R. Any monotonically increasing curve connecting point A to point B corresponds to a mapping f ∈ F.
Figure 2: Example of the warping problem. G_R is a 2×2 reference image, and G is a 3×3 test image. Inside each pixel are shown its (x,y) coordinates. The value of the image g(x,y) is encoded by texture, as shown.
Figure 3: Illustration of the definitions of Θ, Φ, and Λ for the example of figure 2.
Figure 4: Illustration of the two-dimensional warping algorithm on the example of figure 2. The table shows the values of D_{i,n} for 1 ≤ i ≤ 16 and 1 ≤ n ≤ 3, calculated according to the DPW algorithm. The optimal value of D is D = 0, and the corresponding F_optimal is shown.
Figure 5: Illustration of the constrained DPW algorithm for the example of figure 2. The table shows the values of D̂_{k,n} for 1 ≤ k ≤ 2 and 1 ≤ n ≤ 3. In this case the obtained solution is the same as in figure 4.
Figure 6: Example of a test image G for which the optimal mapping obtained according to the general DPW formulation differs from the one obtained according to the restricted formulation.
Figure 7: Illustration of the planar Markov property. The probability of a state in the light grey pixel given the states of all the dark grey pixels in (a) equals the probability of a state in the light grey pixel given the states of only two dark pixels in (b).
Figure 8: Two groupings of the 4×4 PHMM states into subsets.
a. Here the set of states is divided into 4 mutually exclusive subsets, each containing the states of one row only.
b. The same set of states is grouped into 7 subsets.
Figure 9: Equivalent representation of constrained PHMM, for the grouping of figure 8a.
Figure 10: Illustration of the algorithm for the case of figure 8a.
a. First, the local costs are computed using the Viterbi algorithm.
b. The global solution is found using the Viterbi algorithm with the local costs.
Figure 11: The 100 samples of the digits from one subject.
Figure 12: The digit models obtained by training, for different numbers of states. The grey level in these images encodes the value of P(g=1) for each state.
6. Appendix A: Properties of the constraints
Changing the choice of the sub-shape set Θ, and changing the set of admissible mappings Φ accordingly, is equivalent to a coordinate transformation. The example discussed at the end of section 2.2.3 corresponds to such a simple coordinate transformation, exchanging the roles of the vertical and horizontal coordinates. In what follows we discuss a generic description of the constraints on the set Φ, keeping the set Θ fixed at θ_n = L_{X,n}.
For the computational complexity of the DPW process to be polynomial in the sizes of the images, there should exist a grouping of the set of admissible warping sequences (defined by F) into N_G mutually exclusive subsets λ_k, 1 ≤ k ≤ N_G. The number of subsets N_G should be polynomial in the sizes of the images {X, Y, X_R, Y_R}, and this grouping should fulfill the following conditions:
1. For 1 ≤ k ≤ N_G, if Φ_i ∈ λ_k and Φ_j ∈ λ_k, then Λ_i = Λ_j ≡ Λ_k.
2. For 1 ≤ k ≤ N_G, Λ_k can be expressed as a union of some of the subsets λ_j, i.e., for any k, 1 ≤ k ≤ N_G, there exist indices j_1, . . . , j_{q_k} such that Λ_k = λ_{j_1} ∪ . . . ∪ λ_{j_{q_k}}.
It is clear that the example (23) discussed in section 2.2.3 satisfies these conditions. The analysis of the general case described by conditions 1 and 2 above is similar to the analysis of the example (23) given in Eqs. (24)-(26), and is therefore omitted.

REFERENCES
1. R. Chellappa, S. Chatterjee, "Classification of Textures Using Gaussian Markov Random Fields," IEEE Transactions on ASSP, Vol. 33, No. 4, pp. 959-963, August 1985.
2. H. Derin, H. Elliott, "Modeling and Segmentation of Noisy and Textured Images Using Gibbs Random Fields," IEEE Transactions on PAMI, Vol. 9, No. 1, pp. 39-55, January 1987.
3. R. Bellman, Dynamic Programming, Princeton, NJ: Princeton University Press, 1957.
4. C.-H. Lee, L. R. Rabiner, R. Pieraccini, J. G. Wilpon, "Acoustic Modeling for Large Vocabulary Speech Recognition," Computer Speech and Language, 1990, No. 4, pp. 127-165.
5. J. G. Wilpon, L. R. Rabiner, C.-H. Lee, E. R. Goldman, "Automatic Recognition of Keywords in Unconstrained Speech Using Hidden Markov Models," IEEE Transactions on ASSP, Vol. 38, No. 11, pp. 1870-1878, November 1990.
6. R. Pieraccini, E. Levin, "Stochastic Representation of Semantic Structure for Speech Understanding," Proceedings of EUROSPEECH 91, Vol. 2, pp. 383-386, Genova, September 1991.
7. N. Merhav, Y. Ephraim, "Maximum Likelihood Hidden Markov Modeling Using a Dominant Sequence of States," accepted for publication in IEEE Transactions on ASSP.
8. F. Jelinek, "Continuous Speech Recognition by Statistical Methods," Proceedings of the IEEE, Vol. 64, pp. 532-556, April 1976.

Claims:
1. A method of optical character recognition, the method comprising the steps of:
a. storing a plurality of two-dimensional hidden Markov models, each such model comprising a one-dimensional shape-level hidden Markov model comprising one or more shape-level states, each shape-level state comprising a one-dimensional pixel-level hidden Markov model comprising one or more pixel-level states;
b. scanning an image to produce one or more sequences of pixels;
c. for a stored two-dimensional hidden Markov model,
i. determining for each sequence of pixels a local Viterbi score for a plurality of pixel-level hidden Markov models; and
ii. determining a global Viterbi score of a shape-level hidden Markov model based on a plurality of local Viterbi scores and the sequences of pixels; and
d. recognizing the scanned image based on one or more global Viterbi scores.
2. The method of claim 1 wherein the step of recognizing the scanned image comprises the step of recognizing the scanned image based on the two-dimensional hidden Markov model having the highest global Viterbi score.
3. The method of claim 1 wherein the probability of a first state in a stored two-dimensional hidden Markov model equals zero when a left neighbor state is not a member of the same pixel-level model as the first state.
4. The method of claim 1 wherein the probability of a first state in a stored two-dimensional hidden Markov model is based on the value of a left neighbor pixel-level state and the value of a bottom neighbor shape-level state.
5. An optical character recognition system, the system comprising:
a. a memory storing a plurality of two-dimensional hidden Markov models, each such model comprising a one-dimensional shape-level hidden Markov model comprising one or more shape-level states, each shape-level state comprising a one-dimensional pixel-level hidden Markov model comprising one or more pixel-level states;
b. means for scanning an image to produce one or more sequences of pixels;
c. means, coupled to the means for scanning and the memory, for determining local Viterbi scores for a sequence of pixels, each such score based on a pixel-level hidden Markov model;
d. means, coupled to the means for determining local Viterbi scores, for determining a global Viterbi score of a shape-level hidden Markov model based on a plurality of local Viterbi scores and the sequences of pixels; and
e. means, coupled to the means for determining a global Viterbi score, for recognizing the scanned image based on one or more global Viterbi scores.
6. The system of claim 5 wherein the means for recognizing the scanned image comprises means for recognizing the scanned image based on the two-dimensional hidden Markov model having the highest global Viterbi score.
PCT/US1993/001843 1992-03-02 1993-03-02 Method and apparatus for image recognition WO1993018483A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US84481092A 1992-03-02 1992-03-02
US07/844,810 1992-03-02

Publications (1)

Publication Number Publication Date
WO1993018483A1 true WO1993018483A1 (en) 1993-09-16

Family

ID=25293690

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1993/001843 WO1993018483A1 (en) 1992-03-02 1993-03-02 Method and apparatus for image recognition

Country Status (1)

Country Link
WO (1) WO1993018483A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4593367A (en) * 1984-01-16 1986-06-03 Itt Corporation Probabilistic learning element
US4599692A (en) * 1984-01-16 1986-07-08 Itt Corporation Probabilistic learning element employing context drive searching
US4599693A (en) * 1984-01-16 1986-07-08 Itt Corporation Probabilistic learning system
US4620286A (en) * 1984-01-16 1986-10-28 Itt Corporation Probabilistic learning element

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5933525A (en) * 1996-04-10 1999-08-03 Bbn Corporation Language-independent and segmentation-free optical character recognition system and method
WO1999052074A1 (en) * 1998-04-03 1999-10-14 The University Of Queensland Method of unsupervised cell nuclei segmentation
AU748081B2 (en) * 1998-04-03 2002-05-30 Cea Technologies Pty Limited Method of unsupervised cell nuclei segmentation
US6681035B1 (en) 1998-04-03 2004-01-20 Cssip (Cooperative Research Centre For Sensor Signal And Information Processing) Method of unsupervised cell nuclei segmentation
US7327883B2 (en) 2002-03-11 2008-02-05 Imds Software Inc. Character recognition system and method
US8280719B2 (en) 2005-05-05 2012-10-02 Ramp, Inc. Methods and systems relating to information extraction
US7890539B2 (en) 2007-10-10 2011-02-15 Raytheon Bbn Technologies Corp. Semantic matching using predicate-argument structure
US8326087B2 (en) 2008-11-25 2012-12-04 Xerox Corporation Synchronizing image sequences
US11163993B2 (en) 2017-07-07 2021-11-02 Hewlett-Packard Development Company, L.P. Image alignments via optical character recognition
CN109002821A (en) * 2018-07-19 2018-12-14 武汉科技大学 A kind of Internetbank shield digit recognition method based on connected domain and tangent slope
CN109002821B (en) * 2018-07-19 2021-11-02 武汉科技大学 Online silver shield digital identification method based on connected domain and tangent slope

Similar Documents

Publication Publication Date Title
Cai et al. Integration of structural and statistical information for unconstrained handwritten numeral recognition
Yang et al. Hidden markov model for gesture recognition
Nefian et al. Face detection and recognition using hidden Markov models
Chiou et al. Lipreading from color video
EP0539749B1 (en) Handwriting recognition system and method
Kolcz et al. A line-oriented approach to word spotting in handwritten documents
US5392363A (en) On-line connected handwritten word recognition by a probabilistic method
US5075896A (en) Character and phoneme recognition based on probability clustering
Howe Part-structured inkball models for one-shot handwritten word spotting
Biem Minimum classification error training for online handwriting recognition
EP0605099B1 (en) Text recognition using two-dimensional stochastic models
Kosmala et al. On-line handwritten formula recognition using statistical methods
Rabi et al. Recognition of cursive Arabic handwritten text using embedded training based on hidden Markov models
Wong et al. Off-line handwritten Chinese character recognition as a compound Bayes decision problem
WO1993018483A1 (en) Method and apparatus for image recognition
Retsinas et al. An alternative deep feature approach to line level keyword spotting
Perronnin et al. A probabilistic model of face mapping with local transformations and its application to person recognition
Kim et al. Off-line recognition of handwritten Korean and alphanumeric characters using hidden Markov models
Kumar et al. Bayesian background models for keyword spotting in handwritten documents
Kumar et al. A Bayesian approach to script independent multilingual keyword spotting
Xiong et al. A discrete contextual stochastic model for the off-line recognition of handwritten Chinese characters
Nopsuwanchai et al. Maximization of mutual information for offline Thai handwriting recognition
Zhu et al. Online handwritten Chinese/Japanese character recognition
Saon Cursive word recognition using a random field based hidden Markov model
Levin et al. Planar Hidden Markov modeling: from speech to optical character recognition

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CA JP US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): DE FR GB IT

CFP Corrected version of a pamphlet front page
CR1 Correction of entry in section i

Free format text: PAT.BUL.22/93 "AMERICAN TELEPHONE AND TELEGRAPH COMPANY" SHOULD APPEAR UNDER INID (71) APPLICANT AND "ERREUR CODE DEPOSANT BFG" SHOULD BE DELETED;LEVIN,ESTHER AND PIERACCINI ROBERTO SHOULD APPEAR UNDER INID (72) INVENTORS AND "ERREUR CODE DEPOSANT BFG" SHOULD BE DELETED;UNDER INID (81) DESIGNATED STATES DELETE "US"

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: CA