WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Analyzing an image or other data to obtain a stable number of groups    
United States Patent5537491   
Link to this pagehttp://www.wikipatents.com/5537491.html
Inventor(s)Mahoney; James V. (San Francisco, CA); Rao; Satyajit (Bangalore, IN)
AbstractTo group items in an array, gap data are obtained indicating gaps between items. The gap data are used to obtain threshold data, which are then used to obtain grouping data. The gaps could, for example, be distances between items in a two-dimensional array or differences between values at which items occur in a one-dimensional array. The threshold data indicate a threshold. The threshold would produce a number of groups of the items that is stable across a range of thresholds, and the range of thresholds meets a criterion for largeness of a range. The criterion can require, for example, that the range be larger than the stable range of thresholds of any other number in a set of numbers of groups. The threshold can be obtained iteratively by applying a candidate threshold for each iteration. The candidate thresholds can be incremented, and the iterations can be counted to find a number of groups meeting the criterion. Or the candidate thresholds can be increased by differences between gaps, and a running sum of threshold ranges can be used to find a number of groups meeting the criterion. The threshold can also be obtained directly by finding the largest difference between gap extents and obtaining a threshold within the largest difference. Many types of grouping can be performed, including spatial clustering, segmentation of partially bounded regions, segmentation by local width, and global and local similarity grouping.
   














 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Inventor     Mahoney; James V. (San Francisco, CA); Rao; Satyajit (Bangalore, IN)
Owner/Assignee     Xerox Corporation (Stamford, CT)
Patent assignment
All assignments
Publication Date     July 16, 1996
Application Number     08/158,053
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     November 24, 1993
US Classification     382/218
Int'l Classification     G06K 009/68
Examiner     Couso; Jose L.
Assistant Examiner    
Attorney/Law Firm    
Address
Parent Case    
Priority Data    
USPTO Field of Search     382/1 382/27 382/34 382/36 382/37 382/55 382/100 382/205 382/218 382/224 382/226 382/258
Patent Tags     analyzing image other data obtain stable number groups
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
5321767
Murase
382/149
Jun,1994

[0 after 0 votes]
5282061
Farrell
358/464
Jan,1994

[0 after 0 votes]
5280367
Zuniga
358/462
Jan,1994

[0 after 0 votes]
5268774
Eschbach
358/466
Dec,1993

[0 after 0 votes]
5268773
Park
358/466
Dec,1993

[0 after 0 votes]
5263120
Bickel
706/62
Nov,1993

[0 after 0 votes]
5261010
Lo
382/216
Nov,1993

[0 after 0 votes]
5239596
Mahoney
382/180
Aug,1993

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


What is claimed:

1. A method of operating a machine that includes:

image input circuitry for obtaining data defining images as input;

memory for storing data; and

a processor connected for receiving data defining images from the image input circuitry and for accessing the memory;

the method comprising:

operating the processor to receive input image data from the image input circuitry, the input image data defining an input image;

operating the processor to use the input image data to obtain initial array data; the initial array data defining an initial array of value items, each value item indicating a value;

storing the initial array data in the memory;

operating the processor to access the initial array data in the memory and to use the initial array data to obtain gap data indicating one or more gaps between value items in the initial array;

operating the processor to use the gap data to obtain threshold data indicating a threshold; each gap indicated by the gap data being above or below the threshold; the threshold data indicating a threshold that would produce a number of groups of the value items; the number of groups being stable across a range of thresholds; the range of thresholds meeting a criterion for largeness of a range; and

operating the processor to use the threshold data to obtain grouping data indicating a grouping of the value items in the initial array.

2. The method of claim 1 in which the grouping data indicate groups of value items, each group meeting a transitive closure criterion subject to the threshold in the initial array.

3. The method of claim 1 in which the number of groups of the value items is one of a set of numbers of groups; the criterion for largeness of a range requiring that the range of thresholds be larger than a range of thresholds across which any other number of groups in the set is stable.

4. The method of claim 3 in which the numbers in the set of numbers of groups are all greater than a minimum number of groups.

5. The method of claim 4 in which the minimum number of groups is one.

6. The method of claim 4 in which the minimum number of groups is two.

7. The method of claim 1 in which the gap data indicate, for each value item in the initial array, a distance; the threshold data indicating a threshold gap value; the act of operating the processor to use the threshold data to obtain grouping data comprising:

comparing the threshold gap value with each distance indicated by the gap data.

8. The method of claim 1 in which the initial array includes a set of value items that indicate values that meet a reference criterion; the gap data indicating, for each value item in the initial array, a distance to another value item that is in the set of value items that meet the reference criterion.

9. The method of claim 8 in which the reference criterion requires a value item indicating a value that is ON.

10. The method of claim 8 in which the reference criterion requires a value item indicating a value that is OFF.

11. The method of claim 1 in which the act of operating the processor to use the threshold data to obtain grouping data comprises:

obtaining thresholded data that indicate a set of the value items in the initial array; each value item in the set having a distance to another value item that meets a reference criterion; the distances of all the value items in the set meeting a comparison criterion in relation to the threshold; the set of value items including two or more subsets of value items, each subset meeting a transitive closure criterion subject to the threshold in the initial array; and

using the thresholded data to obtain the grouping data by obtaining, for each subset of value items, a group identifier.

12. The method of claim 11 in which the thresholded data include a binary value for each value item in the initial array, the comparison criterion requiring a value item's distance to be above the threshold; each value item's binary value being ON if the value item's distance is above the threshold.

13. The method of claim 11 in which the thresholded data include a binary value for each value item in the initial array, the comparison criterion requiring a value item's distance to be below the threshold; each value item's binary value being ON if the value item's distance is below the threshold.

14. The method of claim 11 in which the thresholded data include a binary value for each value item in the initial array, each value item's binary value indicating whether the value item's distance meets the comparison criterion; the transitive closure criterion requiring a subset of value items to be a connected component.

15. The method of claim 11 in which the initial array data define an image that includes pixels, each value item in the initial array being a pixel value; the grouping data defining a version of the image indicating, for each value item that is in a group of value items defined by the thresholded data, the group's identifier.

16. The method of claim 15 in which the value items in the initial array define connected components; the grouping data defining groups that cluster the connected components.

17. The method of claim 16 in which the grouping data define groups that segment the image.

18. The method of claim 1 in which values of a parameter define the initial array; the method further comprising, before the act of storing initial array data:

operating the processor to obtain parameter data indicating a set of values of the parameter; and

operating the processor to use the parameter data to obtain the initial array data; each value item in the initial array indicating, for one of the values of the parameter, whether its value is in the set of values indicated by the parameter data;

the act of operating the processor to use the initial array data to obtain gap data comprising:

using the initial array data to obtain difference data indicating a difference between adjacent values of the parameter for which value items in the initial array indicate that their values are in the set of values indicated by the parameter data; and

using the difference data to obtain the gap data.

19. The method of claim 18 in which the act of operating the processor to use the threshold data to obtain grouping data comprises:

using the initial array data and the threshold data to obtain, for each value item in the initial array that indicates that its value is in the set of values, a group identifier that identifies a group that includes the value item; and

using the group identifiers to obtain the grouping data.

20. The method of claim 19 in which the parameter data indicate, for each position in a parameter array, a value of the parameter; the parameter data indicating values that are in the set of values for a set of the positions; the act of using the group identifiers to obtain the grouping data comprising:

for each position in the set of positions, using the value indicated for the position by the parameter data and the group identifiers to obtain a group identifier for the position; the grouping data indicating a group identifier for each position in the set of positions.

21. The method of claim 20 in which the parameter data define an image; the grouping data defining a version of the image indicating, for each position, a group identifier.

22. The method of claim 20 in which the parameter is an attribute of features in images, each value of the parameter indicating a value of the attribute.

23. The method of claim 22 in which the attribute is area, elongation, aspect ratio, width, horizontal center of area, vertical center of area, or orientation.

24. The method of claim 1 in which the gap data indicate a gap for each value item in the initial array; the act of operating the processor to use the gap data to obtain threshold data comprising:

performing a sequence of iterations, the iterations including a first iteration and a number of following iterations, each following iteration having a preceding iteration, each iteration having threshold candidate data indicating a candidate threshold, the threshold candidate data of each following iteration indicating a candidate threshold that is greater than the preceding iteration's candidate threshold; each iteration comprising:

using the iteration's threshold candidate data to obtain thresholded data, the thresholded data indicating, for each value item in the initial array, whether the gap indicated by the gap data for the value item is above or below the iteration's candidate threshold; the thresholded data defining a number of groups of the value items in the initial array; and

using the iteration's thresholded data to obtain group count data indicating the number of groups defined by the thresholded data.

25. The method of claim 24 in which the act of operating the processor to use the gap data to obtain threshold data further comprises:

using the iterations' group count data to obtain subsequence length data indicating a number of iterations in each subsequence of iterations with group count data indicating equal numbers of groups; and

using the subsequence length data to obtain the threshold data; the threshold data indicating the candidate threshold of an iteration with group count data indicating a first number of groups; the subsequence length data indicating that a subsequence of iterations with group count data indicating the first number of groups includes a range of thresholds that meets the criterion for largeness of a range.

26. The method of claim 24 in which each following iteration further comprises:

using the preceding iteration's threshold data to obtain the following iteration's threshold data, the following iteration's threshold data indicating a threshold that is greater than the preceding iteration's threshold by a gap difference, the gap difference being a difference between gaps indicated by the gap data between which no indicated gaps occur;

the act of operating the processor to use the gap data to obtain threshold data further comprising:

using the iterations' group count data to obtain threshold range data indicating a range of thresholds in each subsequence of iterations with group count data indicating equal numbers of groups; and

using the threshold range data to obtain the threshold data; the threshold data indicating the candidate threshold of an iteration with group count data indicating a first number of groups; the threshold range data indicating that a subsequence of iterations with group count data indicating the first number of groups includes a range of thresholds that meets the criterion for largeness of a range.

27. The method of claim 1 in which the gap data indicate a gap for each value item in the initial array; the act of operating the processor to use the gap data to obtain threshold data comprising:

using the gap data to obtain largest difference data, the largest difference data indicating, for the gaps indicated by the gap data, a largest difference between gaps between which no indicated gaps occur; the largest difference occurring between a lower gap and an upper gap; and

using the largest difference data to obtain the threshold data, the threshold data indicating a threshold that is above the lower gap and below the upper gap;

the act of operating the processor to use the threshold data to obtain grouping data comprising:

using the initial array data to obtain distance data indicating a distance for each value item in the initial array; and

using the threshold data and the distance data to compare the threshold to each value item's distance.

28. The method of claim 27 in which the gap data indicate, for value items in the initial array that meet a foreground neighbor border criterion, a distance to a near neighbor in the initial array; the distance data indicating a distance to a near neighbor for each value item in the initial array; the act of operating the processor to use the threshold data to obtain grouping data further comprising:

grouping value items together that have distances that are below the threshold.

29. The method of claim 27 in which the gap data indicate, for value items in the initial array that meet an opposing foreground neighbor border criterion, a distance to a near neighbor in the initial array; the distance data indicating a distance to a near neighbor for each value item in the initial array; the act of operating the processor to use the threshold data to obtain grouping data further comprising:

grouping value items together that have distances that are above the threshold.

30. The method of claim 27 in which the gap data indicate, for value items in the initial array that meet an opposing background neighbor border criterion, a distance to a near neighbor in the complement of the initial array; the distance data indicating a distance to a near neighbor in the complement of the initial array for each value item in the initial array; the act of operating the processor to use the threshold data to obtain grouping data further comprising:

grouping value items together that have distances that are above the threshold.

31. A machine comprising:

memory for storing data; and

a processor connected for accessing data stored in the memory;

the data stored in the memory comprising initial array data defining an initial array of value items, each value item indicating a value;

the data stored in the memory further comprising instruction data indicating grouping instructions the processor can execute; the processor, in executing the grouping instructions:

accessing the initial array data in the memory and using the initial array data to obtain gap data indicating one or more gaps between value items in the initial array;

using the gap data to obtain threshold data indicating a threshold; each gap indicated by the gap data being above or below the threshold; the threshold data indicating a threshold that would produce a number of groups of the value items; the number of groups being stable across a range of thresholds; the range of thresholds meeting a criterion for largeness of a range; and

using the threshold data to obtain grouping data indicating a grouping of the value items in the initial array.

32. The machine of claim 31 in which the machine further comprises input image circuitry for obtaining data defining images as input; the instruction data further indicating image processing instructions the processor can execute; the processor further, in executing the image processing instructions:

receiving input image data from the image input circuitry, the input image data defining an input image; and

using the input image data to obtain the initial array data and providing the initial array data to the memory.

33. The machine of claim 32 in which the input image circuitry is connected for receiving facsimile transmissions.

34. The machine of claim 31 in which the machine is an image processing server; the image processing server being connected to a network for receiving requests for image processing operations; the network including the image input circuitry; the instruction data further indicating request handling instructions the processor can execute; the processor, in executing the request handling instructions, determining whether to execute the grouping instructions.

35. The machine of claim 31 in which the machine is a fax server.

36. The machine of claim 31 in which the machine is a copier.

37. An article of manufacture for use in a machine that includes:

a storage medium access device for accessing a medium that stores data;

memory for storing data; the data stored in the memory comprising initial array data defining an initial array of value items, each value item indicating a value; and

a processor connected for receiving data from the storage medium access device and also connected for accessing data in the memory;

the article comprising:

a storage medium that can be accessed by the storage medium access device when the article is used in the machine; and

data stored by the storage medium so that the storage medium access device can provide the stored data to the processor when the article is used in the machine; the stored data comprising instruction data indicating instructions the processor can execute; the processor, in executing the instructions:

accessing the initial array data in the memory and using the initial array data to obtain gap data indicating one or more gaps between value items in the initial array;

using the gap data to obtain threshold data indicating a threshold; each gap indicated by the gap data being above or below the threshold; the threshold data indicating a threshold that would produce a number of groups of the value items; the number of groups being stable across a range of thresholds; the range of thresholds meeting a criterion for largeness of a range; and

using the threshold data to obtain grouping data indicating a grouping of the value items in the initial array.

38. A method of performing image processing by operating a machine; the machine including:

image input circuitry for obtaining data defining images as input; and

a processor connected for receiving data defining images from the image input circuitry;

the method comprising:

operating the processor to receive input image data from the image input circuitry, the input image data defining an input image; the input image including pixels and features, each feature including a set of the pixels;

operating the processor to use the input image data to obtain parameter image data; the parameter image data indicating, for each pixel in the input image, a value for a parameter that is an attribute of features in the input image;

operating the processor to use the parameter image data to obtain threshold data and one-dimensional array data; the one-dimensional array data defining a one-dimensional array of value items, each value item indicating, for a value of the parameter, whether at least one of the pixels in the input image has the value; the threshold data indicating a threshold that would produce a number of groups of the value items; the number of groups being stable across a range of thresholds; the range of thresholds meeting a criterion for largeness of a range;the act of using the parameter image data comprising:

using the parameter image data to obtain gap data indicating one or more gaps between values of the parameter that occur in the parameter image data; and

using the gap data to obtain the threshold data; each gap indicated by the gap data being above or below the threshold indicated by the threshold data;

operating the processor to perform a series of iterations, each iteration handling a value item in the one-dimensional array; for each iteration after a first iteration in the series, the value item handled by the iteration being advanced by one from the item handled by an immediately preceding iteration; each iteration:

obtaining a distance count for the value item handled by the iteration; each value item's distance count indicating a distance within the one-dimensional array to a value item indicating that at least one of the pixels in the input image has the item's value; and

if the value item handled by the iteration indicates that at least one of the pixels in the input image has the item's value, obtaining a group identifier for the value item; the act of obtaining a group identifier comprising:

if the distance count of the immediately preceding value item is above the threshold, obtaining a new group identifier for the value item; and

if the distance count of the immediately preceding value item is below the threshold, obtaining a group identifier for the value item that is the same as a most recently obtained group identifier; and

operating the processor to use the group identifiers of the value items to obtain grouped parameter image data; the grouped parameter image data including, for each pixel for which the parameter image data indicates a non-zero value, the group identifier for the value item in the one-dimensional array for the pixel's value.

39. A method of performing image processing by operating a machine; the machine including:

image input circuitry for obtaining data defining images as input; and

a processor connected for receiving data defining images from the image input circuitry;

the method comprising:

operating the processor to receive input image data from the image input circuitry, the input image data defining an input image; the input image including pixels and features, each feature including a set of the pixels;

operating the processor to use the input image data to obtain parameter image data; the parameter image data indicating, for each pixel in the input image, a value for a parameter that is an attribute of features in the input image;

operating the processor to use the parameter image data to obtain threshold data and distance image data; the distance image data indicating, for each pixel in the input image, a distance to a near neighbor; the threshold data indicating a threshold that would produce a number of groups of the features; the number of groups being stable across a range of thresholds; the range of thresholds meeting a criterion for largeness of a range; the act of using the parameter image data comprising:

using the parameter image data to obtain distance data indicating one or more distances between pixels in the input image; and

using the distance data to obtain the threshold data; each distance indicated by the distance data being above or below the threshold indicated by the threshold data;

operating the processor to use the threshold data and the distance image data to obtain groups image data indicating, for each pixel, whether it is above or below the threshold; the act of operating the processor to use the threshold data and the distance image data comprising:

comparing the threshold indicated by the threshold data with each pixel's value indicated by the distance image data.
 Description Submit all comments and votes
 


BACKGROUND OF THE INVENTION

The present invention relates to techniques for analyzing data, such as data defining an image, to obtain information about groupings.

Mahoney, U.S. Pat. No. 5,239,596, describes techniques for labeling pixels in an image based on near neighbor attributes. FIG. 7 shows general steps in labeling pixels according to a link relationship criterion and FIG. 8 shows general steps in iterative labeling by applying a near neighbor criterion that is independent of distance. FIGS. 14-19, discussed beginning at col. 13 line 42, show steps in labeling pixels in various ways. A discussion of linked group labeling through grouping operations begins at col. 15 line 63. Grouping operations can apply a wide variety of criteria, including proximity. As discussed at col. 17 lines 15-19, a nearest neighbor criterion selects, at each pixel, the link for which the distance to the neighbor is the minimum. As discussed at col. 17 lines 26-43, a mutuality criterion selects, at each pixel, the link that is mutual with another link, which occurs when pixels are mutual neighbors. As discussed at col. 17 line 45-col. 18 line 11, a proximity grouping criterion defines clusters of elements using local neighbor distances alone, based on the intuition that cluster boundaries are locations of abrupt change in the distance between neighbors--that is, link-lengths vary slowly within a cluster but change abruptly at the boundaries between clusters. As described at col. 18 lines 6-12, sensitivity to scale may be controlled by deselecting a link if it is longer than the product of a constant parameter times the larger of two distances, one of which is the shortest link at the same pixel, the other of which is a global distance parameter. The global distance parameter could, for example, be based on the length of the longest link in the image or the most common link length. Source code for proximity grouping begins in cols. 31 and 32, near mid-page.

SUMMARY OF THE INVENTION

The invention is based on the discovery of problems that arise in analyzing data to obtain information about groupings or equivalence classes. Sometimes the part of an image array or other body of data that is relevant to an analysis task is a collection of items rather than an individual item. In image analysis, this may be true because coherent physical processes tend to produce collections of pixels that are coherent in space or in their geometric properties. Therefore, it is useful to be able to operate on a collection or group rather than on each individual item.

The problems relate to techniques for finding an appropriate number of groups. There is a tension between, on the one hand, automatically determining a number of groups that will produce a satisfactory grouping and, on the other hand, obtaining the number of groups in a simple, principled way.

This tension can be avoided by requiring a user to indicate the number of groups in advance, but the user may not know the appropriate number of groups. This tension can also be avoided by performing grouping for each number of groups in some range--the user can then select the best results or a quality criterion can be applied to the results for each number. But these approaches can be computationally expensive.

Some proposed techniques rely on geometric constraints alone in order to obtain an appropriate grouping. Mahoney, discussed above, uses a constant parameter that is an example of such a constraint. Others have proposed more complex sets of constraints. But grouping techniques tuned in terms of geometric constraints alone may give satisfactory results only for a particular class of inputs. They typically require calibration of constant parameters; and they often involve complicated computation.

The invention is based on the discovery of a new grouping technique that obtains a number of groups automatically in a simple, principled way based on a stability criterion. The technique often obtains intuitively correct and computationally useful groups of items. The groups may be more meaningful than the items they include if they correspond to meaningful structures, such as objects of interest in a scene shown by an image. The grouping technique can be performed as a computational module that is isolated from other computations required by an image analysis task.

The new grouping technique obtains threshold data indicating a threshold that can be applied to data indicating gaps between items in an array. The gaps can, for example, be distances to items in a two-dimensional array that indicate values meeting a reference criterion or differences between items in a one-dimensional array that indicate non-zero values. The threshold produces a number of groups that is stable across a range of thresholds, and the range of thresholds meets a criterion for largeness. The criterion may require, for example, a range of thresholds that is larger than the range across which other numbers of groups are stable.

The criterion of largeness of a range of thresholds can be applied to a set of numbers of groups to find the number in the set that is stable across the largest range. The numbers in the set are all greater than a minimum number. The minimum number should not be less than one, because a grouping with only one group is not of interest and is stable across a range of thresholds that extends into infinity. Further, in some applications of grouping, a minimum number of two can avoid a stable grouping in which one group includes only an outlier while a much larger group includes all other items; if the number of groups must be greater than two, the larger group is divided into two or more smaller groups.

In general, the technique can group sets of value items that meet a transitive closure criterion subject to the threshold. For example, for grouping an array, the transitive closure criterion can require a group of value items to be a connected component within which the value items are all above or all below the threshold. Or the transitive closure criterion can require a group of value items, between each pair of which the value items in the group provide a path and each step of the path is shorter than the threshold; a criterion of this type is necessary for grouping a list or set of foreground items rather than items in an array.

A one-dimensional array can include, for each value of a parameter, a value item indicating whether the value occurs in an image or other body of data. The technique can group a set of value items together whose values occur if each adjacent pair of value items in the set is separated from each other by a difference that is below the threshold. The parameter can be an attribute of features in an image, such as area, elongation, aspect ratio, width, horizontal center of area, vertical center of area, or orientation.

A two-dimensional array can indicate a value for each pixel of an image or for each item in another body of data. In an array defining an image, for example, the technique can compare the threshold with each pixel's distance to a near neighbor that meets a reference criterion, such as ON or OFF. In clustering, the technique can set a pixel ON if its distance is below the threshold. In segmentation, the technique can set a pixel ON if its distance is above the threshold.

The technique can obtain initial array data defining an initial array of items that indicate values, then use the initial array data to obtain gap data indicating gaps between items in the initial array. The technique can use the gap data to obtain threshold data indicating a threshold such that each gap indicated by the gap data is above or below the threshold. The technique can use the threshold to obtain grouping data grouping the items in the initial array into a number of groups that is stable across a range of thresholds that meets the criterion for largeness.

To group parameter values, the technique can use parameter data indicating a set of parameter values to obtain the initial array data. The parameter's values can define the initial array, and each item in the initial array can indicate whether one of the parameter's values occurs in the set of parameter values indicated by the parameter data. The technique can use the initial array data to obtain a difference or gap between adjacent values of the parameter that occur in the set. The technique can use the gap to obtain threshold data indicating a threshold, and can use the threshold to obtain grouping data. The grouping data indicate groups of items in the initial array.

The technique can use the initial array and the threshold to obtain a group identifier for each item in the initial array with a value in the set of values. If the parameter data indicate a value of the parameter for each position in an array and a position is in a set of positions with values in the set of values, the technique can use the position's value and the group identifiers to obtain a group identifier for the position. The array can, for example, be an image, and the grouping data can define a version of the image that indicates a group identifier for each pixel in the set of positions. The parameter can be an attribute of features in an image, such as area, elongation, aspect ratio, width, horizontal center of area, vertical center of area, or orientation.

To group parts of an image, as in clustering or segmentation, the technique can obtain the gap data from data defining the image. For example, the gap data can indicate, for each pixel, a distance to a black pixel that is a near neighbor of the pixel. The threshold can be compared with each pixel's distance to obtain a thresholded version of the image; each pixel with a distance above the threshold can be ON, or, alternatively, each pixel with a distance below the threshold can be ON. The technique can obtain the grouping data by labeling each pixel in a part of the image with a unique identifier of the group in the thresholded version that includes the part.

The technique can use gap data to obtain threshold data in any of several ways.

A first approach obtains the threshold data iteratively using candidate thresholds that increase for each successive iteration and using gap data that indicate, for each value item in the initial array, a gap. Each iteration obtains thresholded data indicating, for each value item, whether its gap is above or below the iteration's candidate threshold. The thresholded data define group items, each including a set of the value items in the initial array. Each iteration uses the thresholded data to obtain group count data indicating a number of the group items. The first approach uses the group count data to obtain subsequence length data indicating a number of iterations in each subsequence of iterations with group count data indicating equal numbers of group items. Each subsequence's number of iterations represents a range of thresholds, whose size is indicated by the product of the number of iterations with the threshold increment. The first approach finds a subsequence of iterations with a range of thresholds that meets the largeness criterion. For example, the subsequence may have a larger number of iterations than the subsequence of any other number of group items that is greater than a minimum number such as one or two and less than the number of value items in the initial array. The first approach obtains threshold data indicating that the threshold is the candidate threshold of one of the iterations in the subsequence that meets the largeness criterion.

A second approach performs iterations, as in the first approach, but more efficiently because each iteration can skip gaps that do not occur by increasing the threshold by a difference between gaps indicated by the gap data. The second approach uses the group count data to obtain threshold range data indicating a range of thresholds in each subsequence of iterations with group count data indicating equal numbers of group items. The second approach finds a subsequence of iterations with a range of thresholds that meets the largeness criterion, as in the first approach. The second approach obtains threshold data indicating a threshold in the range of thresholds that meets the largeness criterion.

A third approach uses the gap data to obtain largest difference data indicating the largest difference between gaps that are indicated by the gap data and between which the gap data do not indicate any distances. The third approach then uses the largest difference data to obtain threshold data indicating a threshold above the gap that is below the largest difference and below the gap that is above the largest difference.

The third approach can be applied in many ways. For initial array data defining an image, in which each item is a pixel value, clustering and segmentation can be performed as follows:

To cluster, the gap data can indicate distance to a near neighbor for each pixel that meets a foreground neighbor border criterion, meaning that it is at a border between pixels that have different near neighbors. The threshold can be compared at each pixel with the pixel's distance to a near neighbor in the image to obtain a grouped version of the image. Pixels with distances below the threshold are grouped together into connected regions or components in the grouped version of the image.

To segment partially bounded regions, the gap data can indicate distance to a near neighbor for each pixel that meets an opposing foreground neighbor border criterion, meaning that it is at a border between pixels that have near neighbors approximately 180.degree. apart in direction. The threshold can be compared at each pixel with the pixel's distance to a near neighbor in the image. Pixels with distances above the threshold are grouped together into connected components.

To segment by local width, the gap data can indicate distance to a near neighbor for each pixel that meets an opposing background neighbor border criterion, meaning that it is at a border between pixels that have near neighbors in the complement of the image that are approximately 180.degree. apart in direction. The threshold can be compared at each pixel with the pixel's distance to a near neighbor in the complement of the image. Pixels with distances above the threshold are grouped together into connected components.

The technique can be implemented with a machine that includes initial array data and data indicating grouping instructions. The initial array data define an initial array of value items, each indicating a value. The machine's processor can execute the grouping instructions. In executing the grouping instructions, the processor can access the initial array data and use them to obtain gap data indicating one or more gaps between value items in the initial array. The processor can use the gap data to obtain threshold data indicating a threshold such that each gap indicated by the gap data is above or below the threshold. The threshold is chosen so that it would produce a number of groups of the value items that is stable across a range of thresholds that meets a criterion for largeness of a range. The processor can use the threshold data to obtain grouping data indicating a grouping of the value items. The machine can be a high-speed image processing server that responds to image processing requests from a network to which it is connected.

The machine can also include image input circuitry, and the processor can receive input image data defining an input image from the image input circuitry and use them to obtain the initial array data. The machine can be a fax server or a copier.

The technique can also be implemented in a software product that includes a storage medium and data stored by the storage medium. The software product can be used in a machine that includes initial array data defining an initial array of value items, each indicating a value. The data stored by the storage medium can include grouping instructions the machine's processor can execute. In executing the grouping instructions, the processor can access the initial array data and use them to obtain gap data indicating one or more gaps between value items in the initial array. The processor can use the gap data to obtain threshold data indicating a threshold such that each gap indicated by the gap data is above or below the threshold. The threshold is chosen so that it would produce a number of groups of the value items that is stable across a range of thresholds that meets a criterion for largeness of a range. The processor can use the threshold data to obtain grouping data indicating a grouping of the value items.

The technique described above is advantageous because it makes possible efficient grouping of items. In addition, the resulting groupings are often intuitively correct. The technique can be used to automatically group connected components or parameter values in an image. The technique is self-calibrating in the sense that it determines the appropriate number of groups without requiring a user to specify the number in advance. For a wide variety of grouping operations on data defining images, including grouping in one dimension and