|
Description  |
|
|
INTRODUCTION
The invention relates to methods and equipment for decoding the output of
residue number system (RNS) signal processors used in digital filter
processes and the like.
BACKGROUND AND PRIOR ART
In the design of digital filters of large order N having demanding
computational speed, range and other requirements, the use of transform
methods, e.g., FFT has a number of drawbacks.
It has been proposed to overcome these shortcomings by use of residue
number system (RNS) techniques.
Use of the residue number system in signal processors permits parallel
processing using multi-channel, short word length circuit configurations.
An RNS signal processor transforms numbers from binary or decimal form
into residue form which represents the number by a distinct set of smaller
residue values. Once the value is in residue form, the signal processor is
able to perform certain arithmetical operations using the so-called system
of residue class arithmetic.
Residue class arithmetic is a modulo form of calculation. Here any value
must be within a modulus range from zero through the modulus. The system
is periodic: any value outside the modulus range is transformed to a
corresponding value within the modulus range. This corresponding value is
the integer difference between the numeric value and the greatest multiple
of the modulus less than the numeric value.
To convert a decimal number N to its modulus value, a division is performed
with the decimal number, N, being the divident and the modulus, p.sub.i,
or multiples thereof less than N, being the divisor. The remainer,
r.sub.i, of this division is the modulus representation called the
residue. The relationship is sometimes expressed as:
N=r.sub.i mode.sub.i ( 1)
Thus where the modulus, P.sub.i, is 7 and the numeric value, N, is 10 its
modulus value or residue, r.sub.i, is 10-7=3. If the numeric value is 16
its residue value is 16-(2.multidot.7)=2.
Residue number systems of interest herein use moduli from a restricted
moduli set made up of pairwise relatively prime integers. Relatively prime
or mutually prime numbers as used herein refer to moduli which have no
common divisor greater than 1 even though each individual modulus may be
divisible by other than one and the number itself. Using this restricted
class any integer N can be uniquely coded in a residue number system as a
sequence of residue digits [r.sub.1, r.sub.2, .. . , r.sub.L ]where the
moduli are relatively prime.
The range of the system is from 0 tW-1 where W is the product of the
moduli. If negative numbers are to be represented, the range is -W/2 to
(W/2)-1 when W is even.
In moduli form the number N is sometimes represented as N (P.sub.1,
P.sub.2, P.sub.3, . . . , P.sub.L) where P.sub.1, P.sub.2, P.sub.3, ...,
P.sub.L are the co-prime moduli. Where, for example, N is an operand equal
to 18 and the moduli set is 7, 15, 17 then the residues are 4, 3 and 1 and
this set of residue values [4, 3, 1] uniquely represents the operand value
18. The range W-1 of this moduli system is (7.15.17)-1=1784.
These residues are dependent only upon their corresponding modulus and not
upon each other. In this way each residue can be operated on independently
of the others. Further, there are no burrows or carries and the word
lengths of these residues are much shorter than the decimal or digital
word lengths, thereby allowing faster operations with less hardware. The
limited number of solutions within each channel permits their storage in
ROMS for fast access. The signal processor operates on each of the residue
values independently using the periodic modulo arithmetic. Although some
RNS operations are relatively cumbersome, those involving multiplication
and addition can be relatively fast and are therefore attractive in
digital filter designs such as FIR types which are implemented with
multiplication and addition.
After the signal processor has finished executing all of the periodic
residue operations on each of the individual inputted residue values, a
decoding method must then be implemented to convert the processed residue
values into the true external representation of the output. This is often
a cumbersome process involving significant computational overhead.
Decoding residue numbers is frequently based upon an implementation of the
chinese remainder theorem (CRT). This theorem takes each individual
residue and multiples it by a distinct coefficient corresponding to the
individual modulus. These products are then all summed together. The
summation is then reduced using modulo reduction. The chinese remainder
theorem can be expressed as:
##EQU1##
where N.sub.out is the true external representation of the processed
value, C.sub.i is the individual CRT coefficient and M is the product of
all the moduli, P.sub.i. The CRT coefficient, C.sub.i, can be obtained
through the relationship,
C.sub.i =i(M/P.sub.i)=1 Mod(P.sub.i) where O<b.sub.i <P.sub.i. ( 4)
By finding all of the CRT coefficients and utilizing them in the CRT
summation, N.sub.out may be obtained. This final output is the true
external representation of the processed signal.
To implement the chinese remainder theorem in digital processing, a
separate equivalent form of the CRT is used such that
##EQU2##
where s is the number of moduli to be decoded and k is the CRT value
address.
This form is then reduced using Horner's rule to yield a recursive
expression of the form
N.sub.out =(f(s-1)+2(f) (s-2)+. . . 2(f(1)+2(f(0)) . . . Mod M. (6)
This is the final function that represents the implemented algorithm.
In order to increase system speed the function values, (f(k) can each be
multiplied prior to decoding and stored in memory. These values can be
accessed during decoding because they are uniquely determined by the
address, k.
While residue processor decoders have been proposed for various signal
processing applications, they are incapable of meeting the needs of some
systems having stringent throughput, accuracy and range requirements.
It is accordingly an object of the invention to provide RNS signal decoding
techniques which are capable of real time operation in applications
requiring high accuracy and speed.
Another object is to provide such techniques which are capable of wide
dynamic range utilizing a relatively small number of channels.
A further object of the invention is the provision of filter techniques
which are flexible with respect to the number of moduli utilized and their
values.
A still further object is to provide filtering techniques which are
hardware transparent. Various families of components may be used in
implementing the functional elements of the circuit.
BRIEF SUMMARY OF THE INVENTION
To achieve the foregoing objectives, the invention utilizes an RNS decoder
system which comprises;
(a) means for individually accessing each of a plurality of output channels
of a residue number system signal processor containing a plurality of
residue values;
(b) means for selecting from those values a chinese remainder theorem (CRT)
function value;
(c) means for summing said function value with a previously summed value;
(d) means for multiplying the new summed value by 2;
(e) means for testing the resultant product to see if it is outside a
predetermined range dependent on the decoded moduli;
(f) means for reducing said product by the value of the modulus if it is
outside said range;
(g) means for repeating steps (e) and (f) as needed to bring said modulus
within said range;
(h) means causing the summed value to be outputted;
(i) means for repeating steps (a) through (h) as required until the
decoding is complete.
In the preferred embodiment illustrated in the drawing and discussed in
further detail below, the various residue values are processed in a
plurality of channels of the signal processor. The channel outputs are
each applied to an individual look-up table. These look-up tables then
output a corresponding address value which corresponds in turn to a
particular Chinese remainder theorem function coefficient.
The CRT coefficients are each in turn applied to a residue number system
shift accumulator system. In the RNS shift accumulator the CRT function
vlaue is summed with previously computed CRT function values held in the
accumulator and shifted once to multiply by two. After this occurs, the
resultant is subjected to adaptive modulo reduction. The resultant is
compared with a value DMR which corresponds to the product of the decoded
moduli minus one. This comparison test whether the new value is within the
DMR range. If it is not within the modulus range a negative of the
corresponding modulus number is then added to the new value as many times
as is necessary to bring it within the range.
Once the value is brought within the modulus range the system checks to see
if any other channel outputs have not been accessed. If there are such
channels, the system proceeds to the next accessable one and repeats the
decoding method, operating on the value from the previous decoding cycle.
If there are no further channel outputs to be accessed, the system outputs
the final value from the last decoding cycle. This is the final output and
the true external representative of the signal processor output.
Another channel, by representing the modulus 2.sup.b where b is the number
of bits, requires no encoding hardware and supplies the least b
significant bits without decoding.
By utilizing adaptive modulo reduction the invention decreases the function
value during its computation only when it exceeds the modulus rather than
at the end of the function evaluation. This reduces hardware requirements
by minimizing the arithmetic word length and thereby increases speed.
Conversion speed can also be changed by appropriate selection of component
type.
Other objects and functions of the invention will be evident from the
following detailed description.
BRIEF DESCRIPTION OF THE DRAWING
FIG. 1 is a schematic block diagram which illustrates a processor
incorporating the invention;
FIG. 2 illustrates, in block diagram, a preferred embodiment of the
invention; and
FIG. 3 is a data flow diagram of the implemented algorithm
DETAILED DESCRIPTION OF THE DRAWING
An overview of the environment of the invention is provided in FIG. 1. The
system as illustrated therein includes a control unit 10 which receives
the analog input signal ANALOG IN and converts it to digital form
employing an A-to-D converter 11. The digitally encoded signal which
appears on input/output bus 30 is then supplied to the control unit data
bus 31 via the input/output interface unit 12. The interface unit 12 also
connects with another bus 32 which connects with the decoding section as
described hereinafter.
The control unit 10 includes a microprocessor 13 together with system RAM
14 and system PROM 15. The latter incorporates the filter coefficient data
and program control data as well.
The digitized input signal appearing on bus 30 is supplied to RNS processor
channels 1 to 5 in respective circuits 21 to 25. Circuit 21 incorporates
processor channel 1 representing modulus 256 and receives coefficient data
from system PROM 15 via bus 32. The output is applied to decoder system 26
as illustrated. Circuit 22 incorporates channel 2 performing modulus 15
arithmetic while a circuit 23 embodies channel 3 and performs modulus 13
arithmetic. Similar circuits 24 and 25 incorporate channels 4 and 5 and
perform moduli 11 and 7 arithmetic, respectively. The outputs of these
channels are also supplied to the decoder 26. The channels 2-5 receive
coefficient data via bus 32 and perform the RNS arithmetic operations
required by the particular processor application. Typically, this would
involve RNS multiplication of the input data samples by the filter
coefficient together with SHIFT and accumulation operations.
The decoder 26 receives the modulus channel outputs and decodes them to
provide the solution in digital form. After conversion by digital to
analog converter 28, the results of the process are provided in analog
form as the ANALOG OUT signal. Selector 27 functions to supply the result
in digital form where needed.
As illustrated in FIG. 2, the outputs of the channels 2-5 are applied to
decoder 26 at respective look-up tables 22A, 23A, 24A and 25A which
contain precomputed results of functions related to the differences
between the binary channel 1 output (modulus 256) and each of the other
channels.
An address multiplexer 50 multiplexes the look-up table outputs, applying
them selectively to a CRT ROM 51 under control of an INPUT VALUE ADDR
signal derived from a programmable logic array (PLA) 52.
The address MUX 50 acts to transpose rows and columns of the CRT ROM. These
memory locations store coefficients of the chinese remainder theorem (CRT)
function, e.g. the values f(s-1), f(s-2. . . , f(0) previously referred to
in equation 6.
The output of the ROM 51 is applied to the RNS shift accumulator which
includes an input multiplexer 53, adder 54, storage shift register 55,
comparator 56, output latch 57, and the programmable logic array (PLA) 52.
The PLA 52 contains the control logic for the shift accumulator,
instructing the functional elements in each step of the decoding process.
Thus, at each step of the decoding program, the PLA controls the address
multiplexer 50 to address the CRT ROM 51. A single coefficient is then
outputted to the input multiplexer 53.
Input multiplexer 53 selects either this CRT coefficient or the negative of
the corresponding modulus value. This selection, explained hereinafter, is
based upon a control system signal SELECT from the PLA.
The output of the multiplexer 53 is supplied as one of the inputs to adder
54 which adds to it a sum that has been previously computed and stored in
the storage shift register 55.
After the new sum is stored in the register 55, it is shifted left causing
a binary multiplication of the stored sum by two.
The resultant "A" is then compared in comparator 55 with the decoded moduli
range, DMR, which in the illustrated system is
(.multidot.11.multidot.13.multidot.15)-1=15014, to see if the value A is
greater than the DMR value. If it is, that condition is signified by input
A>B to PLA 52. This causes the PLA to set the state of the SELECT signal,
causing the input multiplexer 53 to select the negative of the modulus
numberand to apply it to the adder 54. This results in a corresponding
reduction in the value "A".
The output of the adder is once again stored in the shift register 55, but
this tie it is not shifted. The stored value is once again compared to the
DMR value to see if it is outside the range. If it is, the corresponding
negative modulus valve is once again added to the previously stored sum
causing it again to be decremented by the modulus value. This routine
continues until a value not greater than DMR is achieved.
This system is an implementation of the previously described adaptive
modulo reduction (AMR) which reduces a function value during its
computation only when it exceeds the modulus, rather than at the end of
the function evaluation.
Once the stored value is within the modulus range, the programmable array
logic then determine if the decoder has processed all of the addresses in
the CRT ROM 50, and hence all of the channel outputs. If the system has
not, the address value is decremented and the decoder steps are repeated.
If the system has processed all of the addresses, a signal "OUTPUT" is
sent by the PLA to the output register 56 enabling it to read in the value
stored in the shift register 55. This value combined with the channel 1
output is the final output and the PLA now stops the decoder process.
Chips found suitable for demonstrating the utility of the system include
four 74LS157s to implement input MUX 53, four 74LS283s to configure adder
54, a set of four PLA 16R4s to form shift register 55, four 74LS85s to
form comparator 55, two PLA 16R4s for the PLA 52 and two 74LS374s for
implementing output register 56.
An illustration of the implementing algorithm is provided in FIG. 3. The
process begins with a clearing of the storage register 55 (step 3A) and an
initialization of the CRT ROM input value address via the address
multiplexer 50 (step 3B). As a consequence, an initial address of ROM 51
is specified leading to the selection of an input value which is applied
to the input MUX 53 (steps 3C, 3D).
As further illustrated in FIG. 3, any previously stored sum in shift
register 55 is added to the value in adder 54 (via the feedback path in
FIG. 2). See step 3E. The sum, appearing as the output of 54, is applied
to the shift register 55 where a shift left is performed to multiply the
sum by a factor of two (step 3F).
As shown by the decision step 3G in FIG. 3, the result is tested to
determine if it is greater than the modulus value minus 1. If it is, then
the negative of the modulus value is selected with the aid of the SELECT
signal from the PAL 52 (step 3H). That negative value is added to the sum
and stored in the register 55 (steps 3J, 3K). The comparison is then
repeated (step 3G). If the result is greater than the modulus minus 1,
than the subroutine is repeated otherwise the process proceeds to the next
step 3L where the PAL 52 causes the input address of the ROM to be
decremented.
If all of the ROM values have been accessed, the process is completed and
the output is registered (steps 3M, 3N) otherwise the procedure beginning
with step 3C is repeated.
* * * * *
|
|
|
|
|
Description  |
|