WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Single chip 2-D convolver    
United States Patent5151953   
Link to this pagehttp://www.wikipatents.com/5151953.html
Inventor(s)Landeta; David S. (Palm Bay, FL)
AbstractA convolver including a matrix of multipliers for providing a plurality of products PiCj of input data Pi and a dedicated coefficient Cj and each connected to a data input by a buffer which stores and delays input data Pi as a function of its position in the matrix and the length of the row M of the input data array. The buffers include row buffers which are programmable for various input data array row lengths M as well as programmable input buffer stages. A second data input is selectively connected to the multipliers or to an output adder for connection to external delay units or cascade connection to other convolvers. Unique control logic is provided to change the structure of the convolver as well as reset and program various elements using common inputs.



 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Drawing from US Patent 5151953
Single chip 2-D convolver - US Patent 5151953 Drawing
Single chip 2-D convolver
Inventor     Landeta; David S. (Palm Bay, FL)
Owner/Assignee     Harris Corporation (Melbourne, FL)
Patent assignment
All assignments
Publication Date     September 29, 1992
Application Number     07/624,964
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     December 10, 1990
US Classification     382/279 708/420
Int'l Classification     G06K 009/64
Examiner     Boudreau; Leo H.
Assistant Examiner     Johns; Andrew W.
Attorney/Law Firm     Barnes & Thornburg
Address
Parent Case    
Priority Data    
USPTO Field of Search     382/42 382/41 382/27 358/30 364/724.12 364/728.01
Patent Tags     single chip 2-d convolver
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
5027423
Kawata
382/261
Jun,1991

[0 after 0 votes]
5005149
Elleaume
708/420
Apr,1991

[0 after 0 votes]
4937774
Malinowski
708/315
Jun,1990

[0 after 0 votes]
4791677
Mori
382/304
Dec,1988

[0 after 0 votes]
4747157
Kurakake

May,1988

[0 after 0 votes]
4720871
Chambers
382/278
Jan,1988

[0 after 0 votes]
4718091
Kobayashi
382/162
Jan,1988

[0 after 0 votes]
3980873
Mattei
708/315
Sep,1976

[0 after 0 votes]
3872290
Crooke
708/315
Mar,1975

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


What is claimed:

1. A system for Q.times.R kernel convolution of an M.times.N array of words, where Q.times.R is a subset of M.times.N comprising:

input means for receiving, in sequence, words P from said M.times.N array;

coefficient means for storing coefficients C.sub.1 through C.sub.Q.times.R ;

Q.times.R matrix of multiplying means, each multiplying means having a first input receiving Pi and a second input receiving a dedicated Cj for producing a product PiCj where i varies from 1 to M.times.N and j is one of 1 to Q.times.R;

said first input of each of said multiplying means is connected to a first input of a preceding multiplying means in its respective row of a one stage buffer means;

said first input of a first multiplying means in each row is connected to a first input of a last multiplying means of a previous row by a row buffer means having M-R+1 stages; and

adder means connected to said Q.times.R matrix of multiplying means for adding said products PiCj to providing an output convolution Pc as a first output.

2. A system according to claim 1 wherein said row buffer means is programmable for different values of M.

3. A system according to claim 1 including:

second input means for receiving R-1 words P; and

a plurality of multiplexer means each having a first input from said second input means and a second input from a row buffer means and an output connected to a first input of a first multiplying means of a respective row.

4. A system according to claim 3 including means for selectively connecting said second input means to said adder means.

5. A system for Q.times.R kernel convolution of an M.times.N array of words, where Q.times.R is a subset of M.times.N, comprising:

input means for receiving, in sequence, words P from said M.times.N array;

coefficient means for storing coefficient C.sub.1 through C.sub.Q.times.R ;

Q.times.R matrix of multiplying means, each multiplying means having a first input receiving Pi and a second input receiving a dedicated Cj for producing a product PiCj where i varies from 1 to M.times.N and j is one of 1 to Q.times.R;

a plurality of buffer means each connecting said input means to a respective multiplying means first input for storing one or more words P and delaying inputting of Pi to a respective multiplying means as a function of its position in said Q.times.R matrix and the row length M of said M.times.N array;

an ALU means connected between said input means and said plurality of buffer means for performing arithmetical and logical operations on said words P before being inputted to said plurality of buffer means; and

adder means connected to said Q.times.R matrix of multiplying means for adding said products PiCj to providing an output convolution Pc as a first output.

6. A system according to claim 5 wherein said plurality of buffer means includes:

row buffer means connected to a first multiplying means of a row and having a plurality of stages as a function of row length M; and

other buffer means connected to said multiplying means other than said first multiplying means of a row and having a number of stages as a function of its position in its row.

7. A system according to claim 6, wherein said row buffer means is programmable for different values of M and the other buffer means are a fixed number of stages.

8. A system according to claim 7, wherein said system is convertible to a (Q.times.R).times.1 convolution by programming said row buffers to Q stages.

9. A system according to claim 6 including:

second input means for receiving R-1 words P; and

a plurality of multiplexer means each having a first input from said input means and a second input from said row buffer means and an output connected to a first input of a first multiplying means of a respective row.

10. A system according to claim 9 including means for selectively connecting said second input means to said adder means.

11. A system according to claim 5 including second input means for receiving and storing a second word; and

wherein said ALU means includes a first input connected to said input means and a second input connected to said second input means, and said ALU means performs arithmetical and logical operations on both said words P and said second word.

12. A system according to claim 5 wherein said coefficient means includes a first and second coefficient means for storing first and second groups of coefficients and a multiplexer means for connecting one of said coefficient means to second inputs of said multiplying means.

13. A system according to claim 5 including:

second input means connected to said adder means; and

second output means connected to an output buffer means whose output Pd represents a given number of stages of delay.

14. A system according to claim 13 wherein:

said output buffer means is programmable to adjust said given number of stages of delay; and

said first mentioned input means includes an input buffer means programmable to different numbers of stages of delay.

15. A system according to claim 14 wherein said plurality of buffer means includes:

row buffer means connected to a first multiplying means of a row and having a plurality of stages as a function of row length M;

other buffer means connected to said multiplying means other than said first multiplying means of a row and having a number of stages as a function of its position in its row; and

said output buffer means is a row buffer means of a last row Q.

16. A system according to claim 5 including

programming input means for receiving programming data;

address means for receiving addresses and determining the destination of said programming data;

said address means directs coefficients from said programming input means to said coefficient means by an appropriate address.

17. A system according to claim 16 wherein said coefficient means stores two sets of coefficients and said address means selects one or the other sets of coefficient to be provided to said multiplying means by an appropriate address.

18. A system according to claim 16 an ALU means connected between said input means and said multiplying means for performing arithmetical and logical operations on said words P before being inputted to said multiplying means; and

wherein said address means directs microcode from said programming input means to said ALU means.

19. A system according to claim 18,

including an ALUR means for storing a second word from said programming input means;

wherein said ALU means includes a first input connected to said input means and a second input connected to said ALUR means; and

wherein said address means directs said second word from said programming input means to said ALUR means.

20. A system according to claim 16 including instruction means for storing an instruction word which provides control of said system; and

wherein said address means directs instruction words from said programming input means to said instruction means.

21. A system according to claim 20 wherein said instruction word includes bits for output rounding and number formats.

22. A system according to claim 20 including second data input means for selectively providing second input data to said multiplying means and said adder means; and

wherein said instruction word includes a bit for controlling said second data input means.

23. A system according to claim 20 including input buffer means programmable to different number of stages of delay; and

wherein said instruction word includes bits for programming said input buffer means.

24. A system according to claim 16 wherein

said plurality of buffer means includes row buffers means programmable to different values of M; and

said address means directs values of m from said programming input means to said row buffer means for programming.

25. A system for Q.times.R kernel convolution of an M.times.N array of words, where Q.times.R is a subset of M.times.N, comprising:

first input means for receiving, in sequence, words P from said M.times.N array;

coefficient means for storing coefficients C.sub.1 through C.sub.Q.times.R ;

Q.times.R matrix of multiplying means, each multiplying means having a first input receiving Pi and a second input receiving a dedicated Cj for producing a product PiCj where i varies from 1 to M.times.N and j is one of 1 to Q.times.R;

R-1 row buffer means each connecting said first input means to a respective row of multiplying means first input for storing one or more words P and delaying inputting of Pi to a respective multiplying means as a function of its position in said Q.times.R matrix and the row length M of said M.times.N array;

second input means for receiving R-1 words P;

a plurality of multiplexer means each having a first input from said second input means and a second input from a row buffer means and an output connected to a first input of a respective row of said multiplying means;

adder means connected to said Q.times.R matrix of multiplying means for adding said products PiCj to proving an output convolution Pc as a first output; and

selection means for selectively connecting said second input means to either said multiplexing means or said adder means.

26. A system according to claim 25 including R-1 external buffers connected to said second input means, each of said external buffers having a greater capacity than said row buffer means.

27. A system according to claim 25 including a summing means, for each row R, for providing a sum of products PiCj of its respective row to said adder means; and

wherein said adder means adds said sums from said summing means to provide said output convolution Pc.

28. A system according to claim 27 including a second input means for receiving additional sums and said adder means is connected to said second input means.

29. A system for Q.times.R kernel convolution of an M.times.N array of words where Q.times.R is a subset of M.times.N comprising:

input means for receiving, in sequence, words P from said M.times.N array;

coefficient means for storing coefficients C.sub.1 through C.sub.Q.times.R;

Q.times.R matrix of multiplying means, each multiplying means having a first input receiving Pi and a second input receiving a dedicated Cj for producing a product PiCj where i varies from 1 to M.times.N and j is one of 1 to Q.times.R;

a plurality of buffer means each connecting said input means to a respective multiplying means first input for storing one or more words P and delaying inputting of Pi to a respective multiplying means as a function of its position in said Q.times.R matrix and the row length M of said M.times.N array;

adder means connected to said Q.times.R matrix of multiplying means for adding said products PiCj to proving an output convolution Pc as a first output; and

frame reset means for resetting said matrix of multiplying means, said buffer means and said adder means and not said coefficient means.

30. A system according to claim 29 including reset means for resetting said coefficient means, said multiplying means, said buffer means and said adder means.

31. A system according to claim 29 wherein said buffer means have programmable length and said frame reset means resets said buffer means content without resetting its programmable length.

32. A system according to claim 29 including clock input means for receiving a clock pulse which determines the processing rate and a hold input means for enabling or disabling said clock input means.

33. A system for S.times.T kernel convolution of an M.times.N array of words where S.times.T is a subset of M.times.N comprising:

a plurality of Q.times.R capacity convolvers, where Q.times.R is a subset of S.times.T, each having a data input, coefficient input, a cascade input connected to an output adder of said convolver, a product output from said output adder and a cascade output providing delayed input data;

said convolvers being arranged in rows and columns;

a cascade input of each convolver being connected to a product output of a previous convolver in its respective row, and the cascade input of a first convolver of each row being connected to a product output of a last convolver of a previous row;

a data input of each convolver being connected to cascade output of a previous convolver in its respective column, and the data output of convolvers of a first row are connected to the data input of a first convolver of said first row by delay means.

34. A system according to claim 33 wherein each convolver includes a said delay means programmable between zero and Q stages of delay.

35. A system according to claim 33 wherein each convolver includes row buffer means programmable for different values M.
 Description Submit all comments and votes
 


BACKGROUND AND SUMMARY OF THE INVENTION

The present invention relates generally to matrix processors and more specifically to matrix processors of images.

Modern image processing frequently requires that a massive number of repetitive computations be performed on a single array of picture elements (pixels) before they are displayed. The purpose of these computations is to enhance the contents of a single picture frame, or modify them to suit a specific image analysis objective. The two most common classes of pixel operations are the pixel-point operations and pixel-group operations. Pixel-point operations are considerably less computation-intensive than the pixel-group operations; they usually call for simple arithmetic or logic operations to be performed on each pixel of the frame. On the other hand, pixel-group operations usually involve a square or rectangular window (convolution kernel) of neighboring pixels upon which a set of arithmetic operations is to be performed. This set of operations is repeated for each pixel of the frame.

The value of the pixel usually indicates the intensity of a specific RGB video component for chromatic applications, or a gray scale value of monochromatic applications (such as radar signal processing or image recognition).

It is the pixel-group operations that have, to date, established the limits of real-time image processing speeds. The most common pixel-group operation in image processing is a so-called spatial convolution, i.e. a process of multiplying selected neighboring pixels by a set of values called a convolution coefficient kernel followed by the summation of the results. For instance a typical application, a set of, say, 9 pixels is arranged as a 3.times.3 matrix: ##EQU1## where x and y indicate pixel's coordinates in the frame array. Each pixel's K-bit value (where K usually lies in the 6-12 bit range) of such a 3.times.3 array is then multiplied respectively by an M-bit coefficient value (where M can be typically 6-16 bit value) from the 3.times.3 coefficient kernel: ##EQU2## The matrix above is called the convolution kernel, and each of its elements is referred to as convolution coefficient. Finally, after all 9 multiplications are completed, their results are then summed to yield the new value of the pixel in the location x,y: ##EQU3##

The new value of the pixel NEWP(x,y) usually corresponds to the modified value of its amplitude (gray scale). The entire process described above is called "2-D spatial filtration" or "2-D spacial convolution" and corresponds to a discrete two-dimensional filtration of the image in the time domain. Depending on the set of values A . . . I picked for the convolution kernel, a number of image processing functions can be accomplished. In particular, operations such as image smoothing, edge detection or extraction, or contrast enhancement can be accomplished. If all kernel coefficients except for the middle one (E) are picked equal to each other, or all nine of them are equal, the kernel is referred to as symmetrical. This is a common form of the kernels used in modern image processing. If three or more coefficients in the kernel differ from each other the kernel is called asymmetrical.

Although the use of larger kernels, such as 5.times.5, 7.times.7 or even 15.times.15, is even more desirable (since it increases the bandwidth of convolution), the amount of computations involved in convolution with such large kernels is often prohibitive for most applications. Since industry standard frame array sizes vary from 256.times.256 pixels to 4096.times.4096, the number of multiplications and additions which have to be performed for a single frame convolution varies from approximately 600,000 to almost 160 million per frame. Consequently, if the frames are to be convolved in real time (i.e. processed at the same rate, or faster, than they are acquired and digitized), the total frame convolution time puts an obvious restriction on the image acquisition time. Thus, for example, if the industry standard medium resolution image of 512.times.512 is to be acquired and convolved using 3.times.3 kernel, almost 2.5 million multiplications and additions will have to be performed. If a single eight-bit multiply/accumulate operation is assumed to require 50 nanoseconds using off-the-shelf multiplier-accumulater (MAC), the total amount of time required to complete a single frame convolution will be 0.125 seconds. This, in turn, would imply that the maximum frame repetition rate would be limited to 8 Hz, a rate too slow for most industrial and commercial applications typically requiring at least a 30 Hz frame repetition rate.

Consequently, most modern processors do not offer real-time 3.times.3 convolution capabilities. On the other hand, in most industrial and military applications, the frame repetition rates vary from 30 Hz (interlaced NTSC standard) to as fast as 400 Hz. This implies that for 512.times.512 pixel arrays total frame convolution times in the millisecond range are needed. In practice, such convolution speeds have rarely been accomplished and only in board-level designs. ECl-based designs can meet such requirements.

Thus, it is an object of the present invention to provide a matrix processor capable of real-time processing.

Another object of the present invention is to provide a matrix processor which can do real-time convolution of 3.times.3, 5.times.5 and other kernels.

Still a further object of the present invention is to provide an image processor capable of doing kernel convolutions of matrix from N.times.M to multiples of that array.

These and other objects are obtained by a system for Q.times.R kernel convolution of a M.times.N array of words. The system includes an input receiving the words P from the M.times.N array and coefficient storage for storing the coefficient C.sub.1 through C.sub.Q.times.R. A matrix of Q.times.R multipliers each have a first input for receiving a word Pi and a second input for receiving a dedicated coefficient Cj for producing a product PiCj. An adder is connected to the output of the plurality of multipliers for adding the products to provide an output convolution at a first or product output.

A plurality of buffers connect the word inputs to respective multipliers for storing one or more words and delaying the input of the word to a respective multiplier as a function of its position in the Q.times.R matrix and the row length M of the M.times.N array. The input structure is connected to the first multiplier of the row without a buffer. The multipliers in the matrix have a row input with the first row being connected to the input structure and the other rows' inputs are connected to the row input of a preceding row by a row buffer having M stages. Each of the multipliers within a specific row is connected to the row input by a buffer having a stage for each position it is spaced from the first position. Alternatively, the row buffer would have M-R+1 stages and each of the multipliers in a row would be connected to the preceding multiplier by a single stage buffer.

In either embodiment, the row buffer is programmable for different values of M. This allows the system to handle different values of word arrays without modification of multiplier structure. A cascade output as well as the product output may be provided representing the total delay or number of stages through the Q.times.R matrix. Depending upon the embodiment, it is either out of the last multiplier or out of the last row buffer.

The input may be through an arithmetic logic unit and through programmable stages of delay. A second input or cascade input is provided and is connected to the rows other than the first row through a multiplexer. The other input to the multiplexer is from the row buffers. The multiplexer would provide that the input word be connected to the first row with the other rows receiving either their inputs from the row buffers or from the cascade inputs. This allows versatility of the base kernel Q.times.R to be used in a structure for larger kernels or larger word arrays. A summer for each row R provides a sum of the products of its respective row to an output adder. The output adder provides the output convolution Pcout. The second or cascade input may also provide sums from other kernel stages to the output adder. A pair of coefficient stores are provided and a multiplexer chooses which store is being provided as an input to the multiplier matrix.

The system is converted to a (Q.times.R).times.1 convolution by programming the row buffers to Q stages. Arrays larger than M.times.N may be handled by a single system for Q.times.R kernel convolution by providing additional external row buffers connected to the second inputs. Alternatively a plurality of Q.times.R kernel convoluting systems may be cascaded. For different size kernels other than Q.times.R, the basic Q.times.R convoluting system maybe cascaded and selective coefficients set to zero.

Where the convolution kernel is larger than the convolver cell, the convolver cells may be arranged in a group of rows and columns. The cascade input of each convolver is connected to a product output of a previous convolver in its respective row. The cascade input of the first convolver of each row is connected to a product output of the last convolver of a previous row. The data input of each convolver is connected to a cascade output of a previous convolver in its respective column. The input of the convolvers of the first row are connected to the data input of the first convolver by appropriate delays. The delays may be external or part of the internal programmable input delay stages. The appropriate convolution kernel as well as variation in the matrix of the input can be controlled using appropriate coefficient to the individual convolvers.

Other objects, advantages and novel features of the present invention will become apparent from the following detailed description of the invention when considered in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a convolver incorporating the principles of the present invention and illustrated as a 3.times.3 kernel;

FIG. 2A is a map for a 3.times.3 convolver;

FIG. 2B is a functional diagram of one embodiment of a 3.times.3 convolver according to the principles of the present invention;

FIG. 2C is a functional diagram of another embodiment of a 3.times.3 convolver of the present invention;

FIG. 3A is a block diagram of the convolver of FIG. 1 used in combination with external row buffers;

FIG. 3B is a functional diagram of a 3.times.3 convolver of the present invention used in its cascade configuration;

FIG. 4 is a block diagram of the convolver of FIG. 1 arranged in an array of rows and columns to provide a kernel larger than that of the individual convolver as well as handling larger input arrays;

FIG. 5A is a coefficient mask for a 6.times.6 kernel convolution;

FIG. 5B is a block diagram of convolvers of FIG. 1 for a 6.times.6 kernel convolution;

FIG. 6A is a map for a 3.times.6 kernel convolution;

FIG. 6B is a block diagram of convolvers of FIG. 1 connected in a 3.times.6 kernel convolution;

FIG. 7 is a block diagram of the use of the convolver of FIG. 1 with a controller.

DETAILED DESCRIPTION OF THE DRAWINGS

One of the most frequently encountered operations in image processing systems is 2-D convolution, or kernel convolution. This is a pixel group operation which is used to perform image filtering functions such as: edge extraction, feature enhancement, smoothing, or any combination of these. Until recently, only board level systems were available to handle the extraordinary computing requirements encountered in 2-D convolution. Today there are just a handful of IC's which address this challenge, and only a few offer a single chip solution. The present convolver accelerator, is a single chip alternative to 3.times.3 kernel convolution, while being versatile enough to handle larger convolution tasks through cascadability, and 1-D convolution through programmability.

Although the description will address mainly high speed 1-D or 2-D digital convolution, the present convolver can be used for many other real time image processing and enhancement, robotics and computer vision, and radar signal processing applications.

The convolver is a cascadable IC that is capable of performing real time kernel convolution. As a single IC, it can perform 3.times.3 kernel convolution on up to a 1k.times.N frame of pixels. It will accept the pixels in non-interlaced raster scan order at a clock rate up to 40 MHz. The convolved image will be available at the outputs in the same raster scan order it was received beginning after the 3rd pixel of the 3rd row has been accepted. Processing is complete only one cycle after the last pixel has been accepted, or after 1k.times.N cycles, where N is the number of rows in the frame. Thus, at a 40 MHz pixel clock rate, an entire 512.times.512 image is processed in under 7 milliseconds, with the first new pixel available after only 26 microseconds, while a 1k.times.1k image is convolved in 27 milliseconds, with the first pixel available after only 51 microseconds.

The convolver includes on-chip row buffers which can be programmed to any desired depth, up to 1024 pixels. This allows for convolution on non-standard, user defined row sizes, and also provides 9-tap one-dimensional (1-D) filtering capabilities. An ALU and a Shifter are also included in the pixel data path to allow pixel point operations to be performed on the incoming data.

Cascadability allows multiple convolvers to be used for frames larger than 1024 pixels per row, and also for kernels larger than 3.times.3. There is also an option allowing it to interface with external row buffers, giving the user a low cost method for convolving larger frames.

Although the convolver will be described as a 3.times.3 kernel, this by way of example and the basic convolver unit may be considered a Q.times.R kernel convolver. Similarly, the pixel array first example has been a 512.times.512 array but may also be considered a generic M.times.N array. Wherein the basic convolver unit is used in array to increase the kernel beyond its Q.times.R capacity, this kernel array will have the generic notation of S.times.T.

The convolver 10, as illustrated in FIG. 1, includes input port 12 for the pixel data Pin and input port 14 for the coefficient Cin. An additional input CASI is generally used for cascading at port 16. The outputs of the convolver 10 include the convoluted kernel Pcout at port 18 and a cascading output CASO at port 20. A control logic 22 then receives a plurality of inputs to be described in more detail below. The pixel data input port 12 is connected to a plurality of cascaded buffers 24, 26 and 28 whose outputs are connected through a 4 to 1 multiplexer 30 as a first input Din to the arithmetic logic unit ALU 32. The control logic unit 22 selects the amount of delay of the pixel input Pin to the ALU 32. Although multiplexer 30 is shown, the buffers 24, 26, and 28 may be programmably selected.

The coefficient port 14 in addition to providing coefficients to be used in the multipliers, may also provide the second input to the ALU 32 through a register ALUR 34. By storing the second input to the ALU in ALUR 34, not only can the coefficient input 14 be used for other functions, but a fixed value can be maintained in the ALUR 34 for many cycles or frames. The coefficient input 14 may also be used to provide ten bits of microcode to AMC register 36 and nine bits of instruction code to INT register 38. A typical example of the microcodes used in the ALU and stored in AMC register 36 is shown in Table 1.

TABLE 1 ______________________________________ Pixel Point (ALU) OPERATIONS AMC REG AMC REG 7-LSB's Operation 3-MSB's Operation ______________________________________ 0000000 Logical (0) 000 No Shift 0001000 Logical (A AND B) 001 Shift Right 1 0010000 Logical (A AND B) 010 Shift Right 2 0011000 Logical (A) 011 Shift Right 3 0100000 Logical (A AND B) 100 Shift Left 1 0101000 Logical (B) 101 Shift Left 2 0110000 Logical (A XOR B) 110 Shift Left 3 0111000 Logical (A OR B) 1000000 Logical (A NOR B) 1001000 Logical (A XNOR B) 1010000 Logical (B) 1011000 Logical (A OR B) 1100000 Logical (A) 1101000 Logical (A OR B) 1110000 Logical (A NAND B) 1111000 Logical (1) 0110001 SUM (A plus B) 1001010 DIFF (A minus B) 1001100 DIFF (B minus A) ______________________________________

By using different bits for the ALU logic operation from the shifting operation, both types of operations may be performed in combination. A default condition of the AMC register 36 is bypass which is Logical (A) with no shift.

A typical example of the nine bits of instructions in the INT register 38 are shown in Table 2.

TABLE 2 ______________________________________ BIT INSTRUCTION REGISTER ______________________________________ Cin <0> cascade input mode; i.e. internal or external row delays Cin <1:2> number of input delays at the Din input bus Cin <3> number format, for Pin unsigned or 2's complement Cin <4> number format, for Cin unsigned or 2's complement Cin <5:6> type of rounding performed on Pcout, i.e. no rounding, 16 bit rounding and 8 bit rounding Cin <7:8> number of shift for CASI, i.e. None, 2, 4, 8 ______________________________________

The first bit Cin <0> selects either internal row delays using programmable row delays 42 and 44 or external row delays as received on the cascade input 16. This selection is executed by multiplexers 48 and 50. The second and third bits Cin <1:2> select the number of input delays by controlling multiplexer 30 or direct control of input delay stages 24, 26 and 28. The fourth bit Cin <3> signifies the input and output number format of pixels input Pin and output Pcout. The fifth bit Cin <4> signifies the input number format of the coefficients Cin.

The sixth and seventh bits Cin <5:6> determine the type of rounding performed at the output Pcout at port 18. The rounding control of the instruction register will vary depending upon the available length of the convolved output Pcout 18. To increase the resolution of the output 20 bits are shown. Using a sixteen bit round, a one is added internally to the bit position Pcout <3>. When an eight bit round is selected, a one is added to the bit position Pcout <11>. The eight and ninth bits Cin <7:8> define the amount of shifting of the cascade input CASI at port 16. This may be no shift, 2 bit shift, 4 bit shift or an eight bit shift, for example. The default state of the INT register 38 after a reset is all zeros.

The output from the ALU 32 is provided directly to the first row of the multiplier matrix 40 as well as to a first programmable delay stage or shift register 42. The output of programmable shift register 42 is provided to the second row of the multiplier matrix 40 through the multiplexer 48 as well as providing an input to a second programmable shift register 44. The output of the programmable delay stage or shift register 44 is provided to the third row of the multiplier matrix 40 through a multiplexer 50. The second input to the multiplexers 48 and 50 is from the cascade input port CASI 16 As noted in Table 2, the cascade input mode is selected by the first bit of the instruction register 38 to control the multiplexers 48 and 50. For an internal delay, the programmable row buffers 42 and 44 are selected as the input to the multiplier matrix 40, by multiplexers 48 and 50. For external row delays, the input port 16 for the CASI is selected as the input to the multiplier matrix 40, by multiplexers 48 and 50.

The programmable shift registers 42 and 44 can each store up to for example 1024 8-bit pixel values and thus delay the pixels from 0 up to 1024 clock cycles. The delay is represented by a small m which represents the row length of the frame. The desired value m for the programmable shift registers 42 and 44 is stored in end of row (EOR) register 46 in the control logic 22 and is inputted through the coefficient input Cin 14. The binary value stored in the end of row register 46 is interpreted by the control logic 22 and the signal is sent to the programmable row buffers 42 and 44, to provide an offset between the real and write pointers. The default position for the end of row register 46 is a maximum length.

The coefficients are provided at the coefficient input Cin 14 to a first coefficient register CREGO 52 and a second coefficient register CREGI 54. The use of two coefficient registers 52, 54 allows the user to load a new set of coefficients without disturbing the current convolution. A multiplexer 56 selects which coefficient registers are used as the coefficients in the multiplier matrix 40.

The multiplier matrix 40 consists of nine 8.times.8 multipliers. They are arranged in three vertical rows with each row representing a three-tap 1-dimensional FIR filter. The first input of each of the multipliers is a pixel value or a delayed pixel value while the second input is a coefficient from the coefficient registers. The rows are separated by the row buffers 42 and 44 resulting in a 2-D sum of products. Each row provides its own sum of product outputs SOP1, SOP2 and SOP3. The sum of products are added in a final adder 57 which provides the convolved output Pcout at port 18.

The cascaded input CASI port 16 may also be provided to the adder 57 through a multiplexer 58. The multiplexer 58 includes a register to store the cascade input CASI and buffer the adder 57 from changes on port 16 during a cycle. When the first bit of the instruction register 38 selects internal buffering, multiplexer 58 connects CASI at port 16 to the final adder 57. When the first bit of the instruction register 38 selects cascade or external buffering, multiplexer 58 provides no input. Thus the convolver 10 may receive a cascaded input to be added to its internal generated sum of products SOP1, SOP2 and SOP3. The cascade input CASI on port 14 provided to the final adder 57 may be used to add offset values. This could be used for increasing the brightness of the convolved image as well as other desired effects.

By selecting the internal row buffers 42 and 44 by bit one in the instructions register 38, the output Pcout at port 18 is represented by Equation 1 and the convolver 10 has a configuration illustrated in FIG. 2B. ##EQU4##

If the cascade input CASI 16 is used as the input to the second and third rows of the multiplier 40 by bit one in the instruction register 38, the output Pcout at port 18 will be represented by Equation 2 and the convolver 10 has a configuration illustrated in FIG. 3B. ##EQU5##

The inputs to the control logic 22 include the clock input CLK which can run at a maximum speed of 40 MHz. The new FRAME input, also known as the vertical sink input, is used to reset all the internal circuitry except for the coefficient registers 52, 54, ALU 32, AMC register 36, EOR register 46 and INT registers 38. Thus after a FRAME reset has occurred, a new frame of pixels maybe convolved without reloading these registers. The reset RSB input resets all the internal circuitry. Thus after a reset occurs all data registers must be initialized or remain at the default value. The input HOLD is used to gate the clock input CLK for all internal circuitry. Thus when HOLD is enabled (high), the CLK has no effect on the processor and the internally stored data registers. This pin is useful for stopping the clock internally during blank periods. The output enable input OEB is used to tristate the convolved output Pcout at port 18. The input EALU enables the ALU 32's B input path. Thus the second pixel input on the coefficient input port 14 is stored in the ALU register 34 and is provided as a second input to the ALU 32.

The loading of the various registers and the selection of the appropriate coefficient using multiplexer 56 is controlled by the three bit input A. The load strobe input LDB enables the function determined by the address A. The chip select input CSB enables or disables the load strobe LDB pin port. Thus, it is not required to externally gate the LDB strobe signal. The various functions controlled by the address A are described in Table 3.

TABLE 3 ______________________________________ A<2:0> Function ______________________________________ 000 Load EOR Register 46 001 Load ALU Microcode is AMC Reg. 36 010 Load CREGO 52 011 Load CREG1 54 100 Load INT Register 38 101 Select CREGO (coef. mask) MUX56 110 Select CREG1 (coef. mask) MUX56 ______________________________________

The default position for multiplexer 56 of the coefficient registers is selection of the first register CREGO 52.

The specific structure of the multiplier 40 will be described with respect to FIGS. 2 and 3 for its two modes of operation, namely internal row buffering and external row buffering. The 3.times.3 mask illustrated in FIG. 2A is applicable to FIGS. 2 and 3. For the first bit in the instruction register 38 being a 0, the internal row buffers 42 and 44 are used. The configuration of the multiplier matrix is illustrated in FIG. 2B. The output of the ALU 32 is provided directly to the first row which includes multipliers 60, 61 and 62 receiving coefficients I, H and G respectively. Buffers 70 and 71 connect the output of the ALU 32 to respective multipliers. The amount of the delay represents the position in the row from the first multiplier 60. The outputs of the multipliers 60, 61 and 62 are combined in a summer 80 to provide the sum of their partial products SOP1. The output SOP1 of summer 80 is provided to a final adder 57.

The first programmable row buffer delay element 42 connects the output of the ALU 32 directly to its row of multipliers. The first multiplier 63 receives a signal from 42 directly while the multipliers 64 and 65 have appropriate delay stages 72 and 73 depending upon their position from the first multiplier 63. Multipliers 63, 64 and 65 multiply the delayed signals by the coefficients F, E and D respectively and the partial products to summer 81. The output SOP2 of summer 81 is provided to the final adder 57.

The output of the first programmable buffer 42 is connected through row buffer 44 to the next row of multipliers. The output of row buffer 44 is connected directly to multiplier 66 and through appropriate delays 74 and 75, depending upon their position in the row, to multipliers 67 and 68. The multipliers 66, 67 and 68 multiply their respective coefficients C,B and A times the delayed input and provide these partial products to summer 82. The output SOP3 of summer 82 is provided to the final adder 57. The output Pcout of final adder 57 at port 18 is equal to Equation 1.

Each of the rows of equation 1 represents a sum of Partial products SOP1, SOP2 and SOP3. The last term CASI(n) in Equation 1 represents the cascade input pr