FIELD OF THE INVENTION
The present invention relates generally to the sharing of data among computing resources. More specifically, the present invention relates to methods, apparatuses and products for securing and verifying the authenticity of data being processed
on a computer system.
BACKGROUND OF THE INVENTION
With the increasing popularity of networked computing environments, such as the Internet, there has been a corresponding increase in the demand for secure transactions between networked computers. For example, when a user of the Internet sends
information to another user, it may be useful for the recipient to verify that the data received has not been corrupted or otherwise altered during transmission. Furthermore, the recipient may also find it useful to be able to verify the identity of the
sender in order to verify that the data received was actually sent by the proper sender, as opposed to an impostor. As a result, methods and algorithms that increase the security of data transmitted over computer networks and other data links have been
developed and deployed with some success. The more secure methods tend to include encrypting all or part of the data prior to sending it, and likewise decrypting the received data prior to using it. Such encryption and decryption techniques may, for
example, include adding encryption data to the data file, and encoding or otherwise transforming the data in the data file with a computer system by running a "signature algorithm".
There are currently several signature algorithms in use. One popular signature algorithm is actually a combination of a Message Digest algorithm and an RSA encryption algorithm (e.g., MD5 with RSA, or MD2 with RSA, or the like). U.S. Pat. No.
4,405,829, issued Sep. 20, 1983, describes the combination of a Message Digest with the RSA algorithm that is available from RSA Data Security, Inc. of Redwood City, Calif. Another popular signature algorithm is the Digital Signature Algorithm (DSA). The DSA,
which is available from the United States Government, may be used for limited purposes by private parties as a signature algorithm. These signature algorithms will be discussed in limited detail below. For a more detailed description of
these and other signature algorithms and related encryption operations, refer to Applied Cryptography, Second Edition, 1996, by Bruce Schneier which is available from John Wiley & Sons, Inc. of New York City, N.Y., and which is herein incorporated, in
its entirety, by reference.
The Message Digest with RSA algorithm includes the capability to generate a "digital signature" that can be added to data files. Digital signatures are basically mechanisms through which users may authenticate the source of a received data file. A digital signature is typically a special sequence of data that can be generated and provided along with a related data file to other users. The basic concept behind most signature algorithms is that every user (e.g., individuals, companies,
governments, etc.) will have a "key pair" that includes both a "private key" and a "public key". A key may, for example, be a numerical sequence. The private key is a unique key that is assigned to a single user and intended to be kept secret by that
user. The private key may be used by the assigned user to create a digital signature for a data file with a signature algorithm. The public key, on the other hand, is typically made available to all other users. The public key may be used by these
other users to verify that the digital signature on a received data file is authentic (i.e., that the digital signature was created with the private key). The verification process is accomplished with the same signature algorithm. In principle, such a
verification process may provide a relatively high level of confidence in the authenticity of the source of the received data.
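By way of illustration only, the key pair relationship described above can be sketched in Python. This is a toy example and not the implementation of any commercial signature algorithm: the primes, the exponent, and the function names digest_as_int, sign and verify are assumptions made for this sketch, SHA-256 from the standard library stands in for the Message Digest, no padding scheme is applied, and the key is far too small for real use. Python 3.8 or later is assumed for the modular inverse.

```python
import hashlib

# Toy key pair built from two known Mersenne primes (assumption for this
# sketch). Real keys are far larger and come from a vetted crypto library.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent, kept secret


def digest_as_int(data: bytes) -> int:
    """Hash the data and reduce the digest into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Create a digital signature with the private key (D, N)."""
    return pow(digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Check a digital signature with the public key (E, N)."""
    return pow(signature, E, N) == digest_as_int(data)


message = b"contents of a data file"
signature = sign(message)
print(verify(message, signature))          # True
print(verify(message + b"!", signature))   # False
```

Only the holder of D can produce a value that passes verify, while anyone holding E and N can run the check, which mirrors the private key and public key roles described above.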
In addition to digital signature generating algorithms, there are also algorithms that may be used to authenticate that the data file has not been corrupted in some manner. These algorithms are typically known as "one-way hash functions." One
example of such an algorithm is the Message Digest, discussed above. A one-way hash function usually does not require a key. Rather, one-way hash functions typically include additional data that is inserted into the data file. As such, when the data
file is received, the hash function may be used to verify that none of the data within the data file has been altered since the generation of the hash value. However, hash functions are typically limited in that the user cannot infer anything about
the origin of the associated file, such as who sent it. It is noted that many signature algorithms use one-way hash functions as internal building blocks.
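As a minimal sketch of the integrity check described above, the following Python fragment records a hash value for a data file and later recomputes it. SHA-256 from the standard hashlib module is used here as an assumption in place of the Message Digest, and the name identifier is hypothetical.

```python
import hashlib


def identifier(data: bytes) -> str:
    """Return a one-way hash value for the data; no key is involved."""
    return hashlib.sha256(data).hexdigest()


original = b"pixel data for an image file"
recorded = identifier(original)          # value distributed with the file

received_ok = b"pixel data for an image file"
received_bad = b"pixel data for an imagE file"   # one character altered

print(identifier(received_ok) == recorded)    # True
print(identifier(received_bad) == recorded)   # False
```

Because anyone can recompute the same value without a key, a matching hash shows only that the data is unchanged; it says nothing about who produced the file.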
For relatively open, unsecured networks such as the Internet, it is often useful for users to be able to authenticate received data files prior to using them. Such data files may include, but are not limited to, computer programs, graphics,
text, photographs, audio, video, or other information that is suitable for use within a computer system. Regardless of the type of data file, authentication may be accomplished with a signature algorithm or similar type of encryption algorithm as
described above. By way of example, if the data file is a software program, the user may wish to authenticate that it was sent by a trustworthy authority prior to exposing his or her computer system to the software program, to ensure that the program
does not include a "Trojan Horse" that infects the user's computer with a virus. In such a case, the sending user may authenticate the data as described above.
Another example is where the receiving user wishes to authenticate a text and/or image data file prior to displaying it on his or her computer screen. This may be useful to control the display of text and images having undesirable content. For
example, parents may want to limit any access their children may have to pictures and text relating to adult subjects and materials. This can be accomplished by verifying that the data file (e.g., a text or image file), came from a trusted source.
Similarly, providers of text and image files may want to provide a "stamp" of approval or authenticity so as to control the use of tradenames and other intellectual property.
Unfortunately, the process of encrypting and decrypting, signing and verifying, and/or generating hash functions places an additional burden on the sending and receiving user's computational resources. The burden is compounded for users who send
and receive several data files. By way of example, the growth of the portion of the Internet known as the World-Wide Web has led to a tremendous increase in the transfer of multiple data files between users. These multiple data files often include the
components or objects that constitute an object-oriented software process, such as a Java™ applet. To illustrate the potential burden that can be placed on the receiving user's computer resources in such a multiple data file transfer, one need only
calculate the resulting processing time associated with verifying the digital signatures for each of the files. Consider an example wherein a Java™ applet includes 200 digitally signed Java™ class files (including data files), and the average
verification period is about 1 second on a conventional desktop PC. In such a situation, the user would have to wait for about 200 seconds after receiving the data files to use the applet. Such delays may significantly reduce the effectiveness of such
a computer network environment. This is especially true for data files relating to a timed process, such as a streaming audio or video data file played in real (or near-real) time.
Therefore, what is desired are more efficient methods, apparatuses and products for securing and verifying the authenticity of data files, especially for data files intended to be transferred over computer networks.
SUMMARY OF THE INVENTION
The present invention provides more efficient methods, apparatuses and products for securing and verifying the authenticity of data files, such as data files intended to be transferred over computer networks. In accordance with one aspect of the
present invention, "hybrid" verification techniques are provided that streamline the signature and verification processes such that several data files can be quickly signed for and transferred, and quickly received, authenticated and processed.
In accordance with one embodiment of the present invention, a method for creating a secure data file is provided. The secure data file is suitable for transferring data over a computer network, or between two or more computers. The method
includes providing at least one data file that has an identifier. The identifier can, for example, be generated with a one-way hash function algorithm, or a cyclic redundancy checksum algorithm running on a computer. The data file also includes a
digital bit stream that encodes information such as a text file, an image file, an audio file, a video file, a movie file, or a computer program file. The method further includes creating a signature file that contains a copy of the identifier for
each of the data files sought to be transferred. The signature file also includes a digital signature. The digital signature can, for example, be generated using a computer having a signature algorithm, such as a DSA algorithm or a combined
Message Digest and RSA algorithm. The signature file can also include additional data, such as the name of the file, the file's author, the version of the file, a time-stamp, or a rating label.
In accordance with another embodiment of the present invention, an apparatus for creating a secure data file is provided. The apparatus includes an identifier generator that generates an identifier for one or more data files. The apparatus
includes a signature file generator that generates a signature file that contains a copy of the identifiers associated with each of the data files and a digital signature.
In accordance with another embodiment of the present invention, a computer program product is provided. The computer program product includes a computer-usable medium that contains computer-readable program code embodied thereon. The
computer-readable program code can be used with a computer system to create a secure data file by generating an identifier for one or more data files, and a signature file having a copy of each identifier and a digital signature.
In accordance with yet another embodiment of the present invention, a method for verifying the authenticity of at least one data file and a signature file is provided. The method includes verifying a digital signature that is in a signature file
by using a computer. For example, the computer can use a signature algorithm, such as a DSA algorithm or a combined Message Digest and RSA algorithm, to verify the digital signature. Also within the signature file, there is provided one or more
identifiers that are associated with one or more data files. The identifiers can, for example, be created with a one-way hash function algorithm or a cyclic redundancy checksum algorithm. The method further includes comparing each of the identifiers in
the signature file with the identifier in each of the data files. The method can include marking the data file as signed when the identifiers in the data and signature files match. Additionally, the method can include ignoring the data file, aborting
the loading of the data file, or alerting the user when the identifiers in the data and signature files do not match.
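The comparison step and the possible responses to a mismatch can be sketched as follows. This is an illustrative Python fragment under stated assumptions: the digital signature on the signature file is taken to have been verified already, each data file's identifier is recomputed from its bytes with SHA-256, and the names MismatchPolicy, LoadAborted and check_files are hypothetical.

```python
import hashlib
from enum import Enum


class MismatchPolicy(Enum):
    IGNORE = "ignore"   # skip the data file and continue
    ABORT = "abort"     # stop loading altogether
    ALERT = "alert"     # warn the user and continue


class LoadAborted(Exception):
    """Raised when the ABORT policy meets a mismatched data file."""


def check_files(listed, data_files, policy):
    """Compare identifiers from an already-verified signature file with
    identifiers recomputed from the received data files.

    listed:     dict of file name -> hex identifier from the signature file
    data_files: dict of file name -> received bytes
    Returns a dict of file name -> "signed", "ignored" or "alerted".
    """
    status = {}
    for name, data in data_files.items():
        actual = hashlib.sha256(data).hexdigest()
        if listed.get(name) == actual:
            status[name] = "signed"
        elif policy is MismatchPolicy.ABORT:
            raise LoadAborted(name)
        elif policy is MismatchPolicy.ALERT:
            print(f"warning: {name} does not match the signature file")
            status[name] = "alerted"
        else:
            status[name] = "ignored"
    return status


files = {"a.class": b"good bytes", "b.class": b"tampered bytes"}
listed = {
    "a.class": hashlib.sha256(b"good bytes").hexdigest(),
    "b.class": hashlib.sha256(b"original bytes").hexdigest(),
}
print(check_files(listed, files, MismatchPolicy.IGNORE))
# {'a.class': 'signed', 'b.class': 'ignored'}
```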
In accordance with another embodiment of the present invention, an apparatus is provided that verifies the authenticity of at least one data file and a signature file. The apparatus includes a verifier that verifies the digital signature, and a
comparator that compares an identifier in the data file with a copy of the identifier that is in the signature file.
In accordance with another embodiment of the present invention, a computer program product is provided. The computer program product includes a computer-usable medium that contains computer-readable program code embodied thereon. The
computer-readable program code can be used with a computer system to verify the authenticity of at least one data file and a signature file. The computer-readable program code provides for the verification of a digital signature in a signature file,
and the comparison of an identifier in one or more data files with a copy of the identifier in the signature file.
In accordance with still another embodiment of the present invention, a secure data file in the form of a computer-readable medium is provided. The secure data file can be used in transferring one or more data files between a plurality of
computers. The secure data file includes at least one data file that has an identifier, and a signature file that has a copy of the identifier for the data file and a digital signature.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention, together with further advantages thereof, may best be understood by reference to the following description taken in conjunction with the accompanying drawings in which:
FIG. 1 illustrates a networked computing environment;
FIG. 2 illustrates a typical computer system for use with the networked computing environment in FIG. 1;
FIG. 3a illustrates an embodiment of an archival data structure, including a signature file, for use with an embodiment of the present invention;
FIG. 3b illustrates an embodiment of a signature file, for use with an embodiment of the present invention; and
FIG. 4 is a flow chart of an embodiment of the present invention for use with data structures having signature files.
DETAILED DESCRIPTION OF THE INVENTION
In accordance with the embodiments of the present invention, novel methods, apparatuses and products are provided that reduce the computational demands placed on both source user computer systems and receiving user computer systems by requiring
only a single digital signature for an arbitrary number of data files. With an embodiment of the present invention the data files need not be individually signed. Instead, a separate signature file is created such that when it is digitally signed and
later verified, the data files to which it corresponds can be authenticated without running the signature algorithm for each of these data files. In one embodiment, the signature file includes a list of "identifiers," such as one-way hash values,
that are each associated with a particular one of the data files to be transferred. As such, the signature file is essentially the cryptographic equivalent of a digital signature for each of the data files.
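The reduction in computational demand can be estimated with a simple cost model, sketched here in Python. The file count and the one-second signature verification time are taken from the applet example in the Background; the per-file identifier cost of 10 milliseconds is a purely hypothetical figure chosen for illustration, as are the function names.

```python
def per_file_signature_cost_ms(n_files: int, verify_ms: int) -> int:
    """Every data file carries its own digital signature."""
    return n_files * verify_ms


def single_signature_cost_ms(n_files: int, verify_ms: int, hash_ms: int) -> int:
    """One signature on the signature file, plus one identifier check per file."""
    return verify_ms + n_files * hash_ms


N_FILES = 200      # class files in the applet example
VERIFY_MS = 1000   # about 1 second per signature verification
HASH_MS = 10       # hypothetical cost to compute and compare one identifier

print(per_file_signature_cost_ms(N_FILES, VERIFY_MS))          # 200000
print(single_signature_cost_ms(N_FILES, VERIFY_MS, HASH_MS))   # 3000
```

Under these assumed figures the wait drops from about 200 seconds to about 3 seconds; the actual saving depends on the relative cost of the signature algorithm and the identifier algorithm on a given computer.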
Thus, with an embodiment of the present invention a user can create a single signature file that includes unique identifiers for each of a plurality of data files. The signature file is digitally signed through the use of a signature algorithm.
The signed signature file and the associated data files may then be sent to a receiving user, who verifies the digital signature using the appropriate signature algorithm. Once the digital signature has been verified, the identifiers within the
signature file are compared to the identifiers within the data files. If the identifier within a given data file matches the corresponding identifier in the signature file, then the data file is verified as being authentic. The receiving user can then
proceed to process the verified data files with confidence in their authenticity. As a result, computational delays can be significantly reduced because there is no longer the need to digitally sign and later verify the digital signature for each of the
data files.
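The sequence just described (create one signature file, sign it once, verify it once, then compare identifiers) can be sketched end to end in Python. This is a minimal sketch under stated assumptions and not a definitive implementation: the toy RSA key built from two Mersenne primes, the use of SHA-256 for both the identifiers and the signature digest, the "name identifier" line format, and the function names create_signature_file and verify_all are all choices made for this illustration. Python 3.8 or later is assumed.

```python
import hashlib

# Toy RSA key pair for illustration only; far too small for real use.
P, Q, E = 2**61 - 1, 2**89 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent held by the sender


def _digest_int(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def identifier(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def create_signature_file(data_files):
    """Sender: list one identifier per data file, then sign the list once."""
    body = "\n".join(f"{name} {identifier(data)}"
                     for name, data in sorted(data_files.items()))
    signature = pow(_digest_int(body.encode()), D, N)
    return body, signature


def verify_all(body, signature, data_files):
    """Receiver: one signature check, then one comparison per data file."""
    if pow(signature, E, N) != _digest_int(body.encode()):
        return {}   # the signature file itself is not authentic
    listed = dict(line.rsplit(" ", 1) for line in body.splitlines())
    return {name: listed.get(name) == identifier(data)
            for name, data in data_files.items()}


sent = {"Main.class": b"class file bytes", "logo.gif": b"image bytes"}
body, signature = create_signature_file(sent)

received = dict(sent)
received["logo.gif"] = b"image bytes (altered)"
print(verify_all(body, signature, received))
# {'Main.class': True, 'logo.gif': False}
```

The signature algorithm runs once regardless of how many data files are listed; each additional data file costs only one identifier computation and one comparison.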
FIG. 1 illustrates a networked computing environment 10, as represented by a block diagram of a source user computer system 12 coupled to exchange information in the form of data with a receiving user computer system 14 over a data link 16.
Source user computer system 12 can, for example, take the form of a server computer such as a web server associated with the Internet. Likewise, receiving user computer system 14 can, for example, take the form of a client system that is networked via
data link 16 to a web server. In such a case, data link 16 can therefore represent a portion of, or the entire, Internet and other connected networks. Data link 16 can also represent one or more local area networks (LANs), wide area networks (WANs),
"intranets" or "extranets", or other like telecommunication or data networks.
FIG. 2 illustrates a typical computer system 20 that can be used by either a sending user or a receiving user, in accordance with FIG. 1. Alternatively, computer system 20 can be a stand-alone computer capable of receiving data through
computer-usable products. Computer system 20 includes one or more processors 22, a primary memory 24, a secondary memory 26, one or more input/output (I/O) devices 28, one or more network communication devices 30, and one or more buses 32.
Processors 22 provide the capability to execute computer instructions. Processors 22 can, for example, be microprocessors, central processing units (CPUs), or microcontrollers such as found in many of the desktop, laptop, workstation, and
mainframe computers available on the market. Processors 22 can also take the form of conventional or even customized or semi-customized processors such as those typically used in special purpose or larger frame computers, telecommunication switching
nodes, or other networked computing devices. Processors 22 are coupled to output data to buses 32 and to input data from buses 32.
Buses 32 are capable of transmitting or otherwise moving data between two or more nodes. Buses 32 can, for example, take the form of a shared general purpose bus or can be dedicated to transmitting specific types of data between specific nodes.
Buses 32 can include interface circuitry and software for use in establishing a path between nodes over which data can be transmitted. It is recognized that some devices, such as processors 22 can also include one or more buses 32 internally for
transmitting data between internal nodes therein. Data can include processed data, addresses, and control signals.
Primary memory 24 typically provides for the storage and retrieval of data. Primary memory 24 can, for example, be a random access memory (RAM) or like circuit. Primary memory 24 can be accessed by other devices or circuits, such as processors
22, via buses 32.
Secondary memory 26 typically provides for additional storage and retrieval of data. Secondary memory 26 can, for example, take the form of a magnetic disk drive, a magnetic tape drive, an optically readable device such as a CD-ROM, a
semiconductor memory such as a PCMCIA card, or a like device. Secondary memory 26 can be accessed by other devices or circuits, such as processors 22, via buses 32. Secondary memory 26 can, for example, access or read data from a computer program product
including a computer-usable medium having computer-readable program code embodied thereon.
I/O devices 28 typically provide an interface to a user through which data can be shared. I/O devices 28 can, for example, take the form of a keyboard, a tablet and stylus, a voice or handwriting recognizer, or some other well-known input device
such as, of course, another computer. I/O devices 28 can also, for example, take the form of a display monitor, flat panel display, or a printer. I/O devices 28 can be accessed by other devices or circuits, such as processors 22, via buses 32.
Network communication devices 30 typically provide an interface to other computing resources and devices, such as other computer systems. Network communication devices 30 typically include interface hardware and software for implementing data
communication standards and protocols over data communication links and networks. For example, with a network connection, processors 22 can send and receive data (i.e., information) over a network. The above-described devices and processes will be
familiar to those of skill in the computer hardware and software arts.
FIG. 3a illustrates an embodiment of an archival data structure 300 in accordance with an embodiment of the present invention. Data structure 300 includes a signature file 302 and several associated data files 304-314. Files 304-314 can be any
digital bit stream, such as, for example, Java™ class files, image files, audio files, text files, and even additional signature files.
FIG. 3b illustrates an embodiment of a signature file 302. In the illustrated embodiment, signature file 302 includes at least one identifier 316 for each of the data files 304-314. Optionally, signature file 302 can also contain additional
data 318 for each of the data files 304-314. For example, additional data 318 may take the form of the name of the file, the author of the file, the date of the file, the version of the file, the file's rating (e.g., movie rating, such as "PG"), or any
other authenticated data that the users may want to include within signature file 302.
Signature file 302 further includes an identifier ID 320 and a digital signature 322. Identifier ID 320 provides the information necessary to determine the algorithm(s) used to create the identifiers listed in signature file 302. Digital
signature 322 represents the digital signature created for the signature file. The structure of digital signature 322 will depend, of course, on the signature algorithm used to create it.
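One possible in-memory layout for signature file 302 is sketched below in Python. The class names Entry and SignatureFile, the field names, and the line-oriented text produced by signed_content are assumptions made for this sketch; the comments map each field to the reference numerals of FIG. 3b. The identifier ID is modeled as an algorithm name understood by the standard hashlib module.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Entry:
    identifier: str                                  # identifier 316
    additional: dict = field(default_factory=dict)   # additional data 318


@dataclass
class SignatureFile:
    identifier_id: str                               # identifier ID 320
    entries: dict = field(default_factory=dict)      # file name -> Entry
    digital_signature: bytes = b""                   # digital signature 322

    def add(self, name: str, data: bytes, **additional) -> None:
        """Record an identifier (and optional extra data) for one data file."""
        digest = hashlib.new(self.identifier_id, data).hexdigest()
        self.entries[name] = Entry(digest, dict(additional))

    def signed_content(self) -> bytes:
        """Bytes a signature algorithm would sign: everything except 322."""
        lines = [f"Identifier-ID: {self.identifier_id}"]
        for name in sorted(self.entries):
            entry = self.entries[name]
            lines.append(f"Name: {name}")
            lines.append(f"Identifier: {entry.identifier}")
            for key in sorted(entry.additional):
                lines.append(f"{key}: {entry.additional[key]}")
        return "\n".join(lines).encode("utf-8")


sig_file = SignatureFile(identifier_id="sha256")
sig_file.add("Main.class", b"class bytes", author="A. Author", rating="PG")
print(sig_file.signed_content().decode())
```

Because the additional data is part of the signed content, a single verified signature would cover the file names, authors and ratings as well as the identifiers.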
FIG. 4 illustrates a method 400, in accordance with an embodiment of the present invention, that includes step 402 for generating one or more data files. Step 402 can, for example, include using a text program to generate a text file, a
recording program to generate an audio or video file, a graphics program to generate an image or movie file, a programming language to generate a class file or program file, or any other mechanism that is capable of generating a data file.
Having generated one or more data files in step 402, step 404 includes generating an identifier for each of these data files. The identifiers generated in step 404 can, for example, be generated by a one-way hash function algorithm, or
alternatively can even take the form of a cyclic redundancy checksum (CRC), or the like. It is recognized, however, that generally a one-way hash function algorithm tends to provide for greater security because such functions cannot be easily or
efficiently broken or otherwise reverse-engineered. By way of example, one-way hash function algorithms, such as MD5 and SHA, are typically considered to be cryptographically secure. Such algorithms will be known to those having skill in the computer
science art.
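Step 404 can be sketched with standard Python library calls, shown here as a hedged illustration rather than a prescribed implementation. The function name make_identifier and the algorithm labels are assumptions; zlib.crc32 supplies the cyclic redundancy checksum and hashlib supplies the one-way hash functions. Some restricted Python builds may not provide MD5.

```python
import hashlib
import zlib


def make_identifier(data: bytes, algorithm: str) -> str:
    """Return a hex identifier for a data file using the named algorithm."""
    if algorithm == "crc32":
        # 32-bit checksum: catches accidental corruption, but a matching
        # file is easy to construct deliberately.
        return f"{zlib.crc32(data):08x}"
    # One-way hash function, e.g. "md5", "sha1" or "sha256".
    return hashlib.new(algorithm, data).hexdigest()


data = b"contents of a class file"
for algorithm in ("crc32", "md5", "sha1", "sha256"):
    print(algorithm, make_identifier(data, algorithm))
```

The identifiers differ mainly in length (8, 32, 40 and 64 hexadecimal digits respectively), which reflects the trade-off noted above: a CRC is cheaper to compute, while a one-way hash value is much harder to forge.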
Next, step 406 includes creating a signature file that lists, or otherwise compiles, the identifiers as generated in step 404. A signature file can, for example, be a text file that lists the identifiers. Optionally, a signature file can
further include, for example, the name of each file, the author of each file, the file version, a date-stamp for the file, or other data relating to each data file. Step 406 can further include one or more programs that inquire, trace, select, or
otherwise gather or render such data from the data files. Step 406 can be performed, for example, by processing the data files in a batch mode process to gather the appropriate identifiers and any additional data. Those skilled in the art will
recognize that there can be benefits (e.g., in efficiency) to specifically ordering, grouping or otherwise arranging the data listed in the signature file in some manner that expedites the steps in method 400. For example, it can