Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to the field of computer software, and, more
specifically, to object-oriented computer applications.
Portions of the disclosure of this patent document contain material that is
subject to copyright protection. The copyright owner has no objection to
the facsimile reproduction by anyone of the patent document or the patent
disclosure as it appears in the Patent and Trademark Office file or
records, but otherwise reserves all copyright rights whatsoever.
2. Background Art
With advancements in network technology, the use of networks for
facilitating the distribution of media information, such as text,
graphics, and audio, has grown dramatically, particularly in the case of
the Internet and the World Wide Web. One area of focus for current
developmental efforts is in the field of web applications and network
interactivity. In addition to passive media content, such as HTML
definitions, computer users or "clients" coupled to the network are able
to access or download application content, in the form of applets, for
example, from "servers" on the network.
To accommodate the variety of hardware systems used by clients,
applications or applets are distributed in a platform-independent format
such as the Java® class file format. Object-oriented applications are
formed from multiple class files that are accessed from servers and
downloaded individually as needed. Class files contain bytecode
instructions. A "virtual machine" process that executes on a specific
hardware platform loads the individual class files and executes the
bytecodes contained within.
A problem with the class file format and the class loading process is that
class files often contain duplicated data. The storage, transfer and
processing of the individual class files is thus inefficient due to the
redundancy of the information. Also, an application may contain many class
files, all of which are loaded and processed in separate transactions.
This slows down the application and degrades memory allocator performance.
Further, a client is required to maintain a physical connection to the
server for the duration of the application in order to access class files
on demand.
These problems can be understood from a review of general object-oriented
programming and an example of a current network application environment.
Object-Oriented Programming
Object-oriented programming is a method of creating computer programs by
combining certain fundamental building blocks, and creating relationships
among and between the building blocks. The building blocks in
object-oriented programming systems are called "objects." An object is a
programming unit that groups together a data structure (one or more
instance variables) and the operations (methods) that can use or affect
that data. Thus, an object consists of data and one or more operations or
procedures that can be performed on that data. The joining of data and
operations into a unitary building block is called "encapsulation."
An object can be instructed to perform one of its methods when it receives
a "message." A message is a command or instruction sent to the object to
execute a certain method. A message consists of a method selection (e.g.,
method name) and a plurality of arguments. A message tells the receiving
object what operations to perform.
One advantage of object-oriented programming is the way in which methods
are invoked. When a message is sent to an object, it is not necessary for
the message to instruct the object how to perform a certain method. It is
only necessary to request that the object execute the method. This greatly
simplifies program development.
Object-oriented programming languages are predominantly based on a "class"
scheme. The class-based object-oriented programming scheme is generally
described in Lieberman, "Using Prototypical Objects to Implement Shared
Behavior in Object-Oriented Systems," OOPSLA 86 Proceedings, September
1986, pp. 214-223.
A class defines a type of object that typically includes both variables and
methods for the class. An object class is used to create a particular
instance of an object. An instance of an object class includes the
variables and methods defined for the class. Multiple instances of the
same class can be created from an object class. Each instance that is
created from the object class is said to be of the same type or class.
To illustrate, an employee object class can include "name" and "salary"
instance variables and a "set_salary" method. Instances of the
employee object class can be created, or instantiated, for each employee in
an organization. Each object instance is said to be of type "employee."
Each employee object instance includes "name" and "salary" instance
variables and the "set_salary" method. The values associated with
the "name" and "salary" variables in each employee object instance contain
the name and salary of an employee in the organization. A message can be
sent to an employee's employee object instance to invoke the "set_salary"
method to modify the employee's salary (i.e., the value associated
with the "salary" variable in the employee's employee object).
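By way of illustration only, the employee example above may be sketched in Java as follows. The constructor and the two accessor methods are assumptions added so that the sketch is self-contained; only the "name" and "salary" variables and the "set_salary" method come from the description.

```java
// Illustrative sketch of the "employee" object class described above.
final class Employee {
    private final String name;   // "name" instance variable
    private double salary;       // "salary" instance variable

    // Hypothetical constructor; creating an instance is "instantiation".
    Employee(String name, double salary) {
        this.name = name;
        this.salary = salary;
    }

    // The "set_salary" method. Calling it corresponds to sending the
    // object a message that selects this method with one argument.
    void setSalary(double newSalary) {
        this.salary = newSalary;
    }

    // Hypothetical accessors, added only so the state can be inspected.
    String getName() {
        return name;
    }

    double getSalary() {
        return salary;
    }
}
```

A caller would create one instance per employee and invoke setSalary on the instance whose salary is to change; the caller does not need to know how the salary is stored.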
A hierarchy of classes can be defined such that an object class definition
has one or more subclasses. A subclass inherits its parent's (and
grandparent's etc.) definition. Each subclass in the hierarchy may add to
or modify the behavior specified by its parent class. Some object-oriented
programming languages support multiple inheritance where a subclass may
inherit a class definition from more than one parent class. Other
programming languages support only single inheritance, where a subclass is
limited to inheriting the class definition of only one parent class. The
Java programming language also provides a mechanism known as an
"interface" which comprises a set of constant and abstract method
declarations. An object class can implement the abstract methods defined
in an interface. Both single and multiple inheritance are available to an
interface. That is, an interface can inherit an interface definition from
more than one parent interface.
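The distinction between single inheritance for classes and multiple inheritance for interfaces may be sketched as follows. All type names and the pay calculation are hypothetical and serve only to illustrate the mechanism.

```java
// Illustrative names only; none of these types appear in the description.
interface Named {
    String name();
}

interface Payable {
    double monthlyPay();
}

// An interface may inherit from more than one parent interface.
interface StaffMember extends Named, Payable {
}

// A class implements the abstract methods declared by the interface.
class Worker implements StaffMember {
    private final String name;
    private final double annualSalary;

    Worker(String name, double annualSalary) {
        this.name = name;
        this.annualSalary = annualSalary;
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public double monthlyPay() {
        return annualSalary / 12.0;
    }
}

// Single inheritance: Manager has exactly one parent class and modifies
// the behavior it inherits from that parent.
class Manager extends Worker {
    private final double monthlyBonus;

    Manager(String name, double annualSalary, double monthlyBonus) {
        super(name, annualSalary);
        this.monthlyBonus = monthlyBonus;
    }

    @Override
    public double monthlyPay() {
        return super.monthlyPay() + monthlyBonus;
    }
}
```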
An object is a generic term that is used in the object-oriented programming
environment to refer to a module that contains related code and variables.
A software application can be written using an object-oriented programming
language whereby the program's functionality is implemented using objects.
A Java program is composed of a number of classes and interfaces. Unlike
many programming languages, in which a program is compiled into
machine-dependent, executable program code, Java classes are compiled into
machine independent bytecode class files. Each class contains code and
data in a platform-independent format called the class file format. The
computer system acting as the execution vehicle contains a program called
a virtual machine, which is responsible for executing the code in Java
classes. The virtual machine provides a level of abstraction between the
machine independence of the bytecode classes and the machine-dependent
instruction set of the underlying computer hardware. A "class loader"
within the virtual machine is responsible for loading the bytecode class
files as needed, and either an interpreter executes the bytecodes
directly, or a "just-in-time" (JIT) compiler transforms the bytecodes into
machine code, so that they can be executed by the processor. FIG. 1 is a
block diagram illustrating a sample Java network environment comprising a
client platform 102 coupled over a network 101 to a server 100 for the
purpose of accessing Java class files for execution of a Java application
or applet.
Sample Java Network Application Environment
In FIG. 1, server 100 comprises Java development environment 104 for use in
creating the Java class files for a given application. The Java
development environment 104 provides a mechanism, such as an editor and an
applet viewer, for generating class files and previewing applets. A set of
Java core classes 103 comprises a library of Java classes that can be
referenced by source files containing other/new Java classes. From Java
development environment 104, one or more Java source files 105 are
generated. Java source files 105 contain the programmer readable class
definitions, including data structures, method implementations and
references to other classes. Java source files 105 are provided to Java
compiler 106, which compiles Java source files 105 into compiled ".class"
files 107 that contain bytecodes executable by a Java virtual machine.
Bytecode class files 107 are stored (e.g., in temporary or permanent
storage) on server 100, and are available for download over network 101.
Client platform 102 contains a Java virtual machine (JVM) 111 which,
through the use of available native operating system (O/S) calls 112, is
able to execute bytecode class files and execute native O/S calls when
necessary during execution.
Java class files are often identified in applet tags within an HTML
(hypertext markup language) document. A web server application 108 is
executed on server 100 to respond to HTTP (hypertext transfer protocol)
requests containing URLs (uniform resource locators) to HTML documents,
also referred to as "web pages." When a browser application executing on
client platform 102 requests an HTML document, such as by forwarding URL
109 to web server 108, the browser automatically initiates the download of
the class files 107 identified in the applet tag of the HTML document.
Class files 107 are typically downloaded from the server and loaded into
virtual machine 111 individually as needed.
It is typical for the classes of a Java program to be loaded as late during
the program's execution as possible; they are loaded on demand from the
network (stored on a server), or from a local file system, when first
referenced during the Java program's execution. The virtual machine
locates and loads each class file, parses the class file format, allocates
memory for the class's various components, and links the class with other
already loaded classes. This process makes the code in the class readily
executable by the virtual machine.
The individualized class loading process, as it is typically executed, has
disadvantages with respect to use of storage resources on storage devices,
allocation of memory, and execution speed and continuity. Those
disadvantages are magnified by the fact that a typical Java application
can contain hundreds or thousands of small class files. Each class file is
self-contained. This often leads to information redundancy between class
files, for example, with two or more class files sharing common constants.
As a result, multiple classes inefficiently utilize large amounts of
storage space on permanent storage devices to separately store duplicate
information. Similarly, loading each class file separately causes
unnecessary duplication of information in application memory as well.
Further, because common constants are resolved separately per class during
the execution of Java code, the constant resolution process is
unnecessarily repeated.
Because classes are loaded one by one, each small class requires a separate
set of dynamic memory allocations. This creates memory fragmentation,
which wastes memory, and degrades allocator performance. Also, separate
loading "transactions" are required for each class. The virtual machine
searches for a class file either on a network device, or on a local file
system, and sets up a connection to load the class and parse it. This is a
relatively slow process, and has to be repeated for each class. The
execution of a Java program is prone to indeterminate pauses in
response/execution caused by each class loading procedure, especially
when loading classes over a network. These pauses create a problem for
systems in which interactive or real-time performance is important.
A further disadvantage of the individual class loading process is that the
computer executing the Java program must remain physically connected to
the source of Java classes for the duration of the program's execution.
This is a problem especially for mobile or embedded computers without
local disk storage or dedicated network access. If the physical connection
is disrupted during execution of a Java application, class files will be
inaccessible and the application will fail when a new class is needed.
Also, it is often the case that physical connections to networks such as
the Internet have a cost associated with the duration of such a
connection. Therefore, in addition to the inconvenience associated with
maintaining a connection throughout application execution, there is added
cost to the user as a result of the physical connection.
A Java archive (JAR) format has been developed to group class files
together in a single transportable package known as a JAR file. JAR files
encapsulate Java classes in archived, compressed format. A JAR file can be
identified in an HTML document within an applet tag. When a browser
application reads the HTML document and finds the applet tag, the JAR file
is downloaded to the client computer and decompressed. Thus, a group of
class files may be downloaded from a server to a client in one download
transaction. After downloading and decompressing, the archived class files
are available on the client system for individual loading as needed in
accordance with standard class loading procedures. The archived class
files remain subject to storage inefficiencies due to duplicated data
between files, as well as memory fragmentation due to the performance of
separate memory allocations for each class file.
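The JAR packaging described above may be sketched with the standard java.util.jar classes. The class name JarSketch and its two helper methods are hypothetical, and the archive is built in memory rather than downloaded. The sketch shows that a JAR groups class files for transport but still presents them as separate entries, each of which is later loaded on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

// Hypothetical helper: packs class files into a JAR and lists them again.
final class JarSketch {
    // Writes each (entry name, class file bytes) pair as one JAR entry.
    static byte[] pack(Map<String, byte[]> classFiles) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> file : classFiles.entrySet()) {
                jar.putNextEntry(new JarEntry(file.getKey()));
                jar.write(file.getValue());
                jar.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the archive back; every class file is still a separate entry.
    static List<String> entryNames(byte[] jarBytes) {
        List<String> names = new ArrayList<>();
        try (JarInputStream jar =
                 new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            JarEntry entry;
            while ((entry = jar.getNextJarEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```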
SUMMARY OF THE INVENTION
A method and apparatus for pre-processing and packaging class files is
described. Embodiments of the invention remove duplicate information
elements from a set of class files to reduce the size of individual class
files and to prevent redundant resolution of the information elements.
Memory allocation requirements are determined in advance for the set of
classes as a whole to reduce the complexity of memory allocation when the
set of classes is loaded. The class files are stored in a single package
for efficient storage, transfer and processing as a unit.
In an embodiment of the invention, a pre-processor examines each class file
in a set of class files to locate duplicate information in the form of
redundant constants contained in a constant pool. The duplicate constant
is placed in a separate shared table, and all occurrences of the constant
are removed from the respective constant pools of the individual class
files. During pre-processing, memory allocation requirements are
determined for each class file, and used to determine a total allocation
requirement for the set of class files. The shared table, the memory
allocation requirements and the reduced class files are packaged as a unit
in a multi-class file.
When a virtual machine wishes to load the classes in the multi-class file,
the location of the multi-class file is determined and the multi-class
file is downloaded from a server, if needed. The memory allocation
information in the multi-class file is used by the virtual machine to
allocate memory from the virtual machine's heap for the set of classes.
The individual classes, with respective reduced constant pools, are
loaded, along with the shared table, into the virtual machine. Constant
resolution is carried out on demand on the respective reduced constant
pools and the shared table.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is an embodiment of a Java network application environment.
FIG. 2 is a block diagram of an embodiment of a computer system capable of
providing a suitable execution environment for an embodiment of the
invention.
FIG. 3 is a block diagram of an embodiment of a class file format.
FIG. 4 is a flow diagram of a class file pre-processing method in
accordance with an embodiment of the invention.
FIG. 5 is a block diagram of a multi-class file format in accordance with
an embodiment of the invention.
FIG. 6 is a block diagram of the runtime data areas of a virtual machine in
accordance with an embodiment of the invention.
DETAILED DESCRIPTION OF THE INVENTION
The invention is a method and apparatus for pre-processing and packaging
class files. In the following description, numerous specific details are
set forth to provide a more thorough description of embodiments of the
invention. It will be apparent, however, to one skilled in the art, that
the invention may be practiced without these specific details. In other
instances, well known features have not been described in detail so as not
to obscure the invention.
Embodiment of Computer Execution Environment (Hardware)
An embodiment of the invention can be implemented as computer software in
the form of computer readable program code executed on a general purpose
computer such as computer 200 illustrated in FIG. 2, or in the form of
bytecode class files executable by a virtual machine running on such a
computer. A keyboard 210 and mouse 211 are coupled to a bi-directional
system bus 218. The keyboard and mouse are for introducing user input to
the computer system and communicating that user input to central
processing unit (CPU) 213. Other suitable input devices may be used in
addition to, or in place of, the mouse 211 and keyboard 210. I/O
(input/output) unit 219 coupled to bi-directional system bus 218
represents such I/O elements as a printer, A/V (audio/video) I/O, etc.
Computer 200 includes a video memory 214, main memory 215 and mass storage
212, all coupled to bidirectional system bus 218 along with keyboard 210,
mouse 211 and CPU 213. The mass storage 212 may include both fixed and
removable media, such as magnetic, optical or magneto-optical storage
systems or any other available mass storage technology. Bus 218 may
contain, for example, thirty-two address lines for addressing video memory
214 or main memory 215. The system bus 218 also includes, for example, a
32-bit data bus for transferring data between and among the components,
such as CPU 213, main memory 215, video memory 214 and mass storage 212.
Alternatively, multiplex data/address lines may be used instead of
separate data and address lines.
In one embodiment of the invention, the CPU 213 is a microprocessor
manufactured by Motorola®, such as the 680X0 processor or a
microprocessor manufactured by Intel®, such as the 80X86, or
Pentium® processor, or a SPARC® microprocessor from Sun
Microsystems®. However, any other suitable microprocessor or
microcomputer may be utilized. Main memory 215 is comprised of dynamic
random access memory (DRAM). Video memory 214 is a dual-ported video
random access memory. One port of the video memory 214 is coupled to video
amplifier 216. The video amplifier 216 is used to drive the cathode ray
tube (CRT) raster monitor 217. Video amplifier 216 is well known in the
art and may be implemented by any suitable apparatus. This circuitry
converts pixel data stored in video memory 214 to a raster signal suitable
for use by monitor 217. Monitor 217 is a type of monitor suitable for
displaying graphic images.
Computer 200 may also include a communication interface 220 coupled to bus
218. Communication interface 220 provides a two-way data communication
coupling via a network link 221 to a local network 222. For example, if
communication interface 220 is an integrated services digital network
(ISDN) card or a modem, communication interface 220 provides a data
communication connection to the corresponding type of telephone line,
which comprises part of network link 221. If communication interface 220
is a local area network (LAN) card, communication interface 220 provides a
data communication connection via network link 221 to a compatible LAN.
Wireless links are also possible. In any such implementation,
communication interface 220 sends and receives electrical, electromagnetic
or optical signals which carry digital data streams representing various
types of information.
Network link 221 typically provides data communication through one or more
networks to other data devices. For example, network link 221 may provide
a connection through local network 222 to host computer 223 or to data
equipment operated by an Internet Service Provider (ISP) 224. ISP 224 in
turn provides data communication services through the world wide packet
data communication network now commonly referred to as the "Internet" 225.
Local network 222 and Internet 225 both use electrical, electromagnetic or
optical signals which carry digital data streams. The signals through the
various networks and the signals on network link 221 and through
communication interface 220, which carry the digital data to and from
computer 200, are exemplary forms of carrier waves transporting the
information.
Computer 200 can send messages and receive data, including program code,
through the network(s), network link 221, and communication interface 220.
In the Internet example, server 226 might transmit a requested code for an
application program through Internet 225, ISP 224, local network 222 and
communication interface 220. In accord with the invention, one such
downloaded application is the apparatus for pre-processing and packaging
class files described herein.
The received code may be executed by CPU 213 as it is received, and/or
stored in mass storage 212, or other non-volatile storage for later
execution. In this manner, computer 200 may obtain application code in the
form of a carrier wave.
The computer systems described above are for purposes of example only. An
embodiment of the invention may be implemented in any type of computer
system or programming or processing environment.
Class File Structure
Embodiments of the invention can be better understood with reference to
aspects of the class file format. Description is provided below of the
Java class file format. Also, enclosed as Section A of this specification
are Chapter 4, "The class File Format," and Chapter 5, "Constant Pool
Resolution," of The Java Virtual Machine Specification, by Tim Lindholm
and Frank Yellin, published by Addison-Wesley in September 1996,
© Sun Microsystems, Inc.
The Java class file consists of a stream of 8-bit bytes, with 16-bit,
32-bit and 64-bit structures constructed from consecutive 8-bit bytes. A
single class or interface file structure is contained in the class file.
This class file structure appears as follows:
______________________________________
ClassFile {
    u4 magic;
    u2 minor_version;
    u2 major_version;
    u2 constant_pool_count;
    cp_info constant_pool[constant_pool_count-1];
    u2 access_flags;
    u2 this_class;
    u2 super_class;
    u2 interfaces_count;
    u2 interfaces[interfaces_count];
    u2 fields_count;
    field_info fields[fields_count];
    u2 methods_count;
    method_info methods[methods_count];
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}
______________________________________
where u2 and u4 refer to unsigned two-byte and four-byte quantities. This
structure is graphically illustrated in FIG. 3.
In FIG. 3, class file 300 comprises four-byte magic value 301, two-byte
minor version number 302, two-byte major version number 303, two-byte
constant pool count value 304, constant pool table 305 corresponding to
the constant pool array of variable length elements, two-byte access flags
value 306, two-byte "this class" identifier 307, two-byte super class
identifier 308, two-byte interfaces count value 309, interfaces table 310
corresponding to the interfaces array of two-byte elements, two-byte
fields count value 311, fields table 312 corresponding to the fields array
of variable length elements, two-byte methods count value 313, methods
table 314 corresponding to the methods array of variable length elements,
two-byte attributes count value 315, and attributes table 316
corresponding to the attributes array of variable-length elements. Each of
the above structures is briefly described below.
Magic value 301 contains a number identifying the class file format. For
the Java class file format, the magic number has the value 0xCAFEBABE. The
minor version number 302 and major version number 303 specify the minor
and major version numbers of the compiler responsible for producing the
class file.
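As a minimal sketch of how the leading fields of the ClassFile structure may be read, the following Java code parses the magic value, the two version numbers and the constant pool count from a byte array. The class name ClassFileHeader and its field names are hypothetical. DataInputStream is used because it reads multi-byte values in big-endian order, which matches the class file format.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical holder for the first four fields of a ClassFile structure.
final class ClassFileHeader {
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassFileHeader(int minor, int major, int constantPoolCount) {
        this.minorVersion = minor;
        this.majorVersion = major;
        this.constantPoolCount = constantPoolCount;
    }

    static ClassFileHeader read(byte[] classBytes) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(classBytes))) {
            int magic = in.readInt();                 // u4 magic
            if (magic != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a Java class file");
            }
            int minor = in.readUnsignedShort();       // u2 minor_version
            int major = in.readUnsignedShort();       // u2 major_version
            int poolCount = in.readUnsignedShort();   // u2 constant_pool_count
            return new ClassFileHeader(minor, major, poolCount);
        } catch (IOException e) {
            // Includes truncated input (EOFException).
            throw new UncheckedIOException(e);
        }
    }
}
```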
The constant pool count value 304 identifies the number of entries in
constant pool table 305. Constant pool table 305 is a table of
variable-length data structures representing various string constants,
numerical constants, class names, field names, and other constants that
are referred to within the ClassFile structure. Each entry in the constant
pool table has the following general structure:
______________________________________
cp.sub.-- info {
u1 tag;
u1 info[ ];
}
______________________________________
where the one-byte "tag" specifies a particular constant type. The format
of the info[ ] array differs based on the constant type. The info[ ] array
may be a numerical value such as for integer and float constants, a string
value for a string constant, or an index to another entry of a different
constant type in the constant pool table. Further details on the constant
pool table structure and constant types are available in Chapter 4 of
Section A.
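The tag-driven layout of constant pool entries may be sketched as follows. The code walks a constant pool and collects its string (CONSTANT_Utf8) entries, skipping the bodies of other entries by their fixed sizes. The class and method names are hypothetical, and only the constant types of the class file format as originally published (tags 1 and 3 through 12) are handled; any other tag is rejected.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for the cp_info entries of one constant pool.
final class ConstantPoolReader {
    // poolBytes holds the entries only; constantPoolCount is the u2 count
    // that precedes them in the class file.
    static List<String> utf8Strings(byte[] poolBytes, int constantPoolCount) {
        List<String> strings = new ArrayList<>();
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(poolBytes))) {
            // Valid indices run from 1 to constantPoolCount - 1.
            for (int index = 1; index < constantPoolCount; index++) {
                int tag = in.readUnsignedByte();      // u1 tag
                switch (tag) {
                    case 1:                           // Utf8: u2 length + bytes
                        strings.add(in.readUTF());
                        break;
                    case 3: case 4:                   // Integer, Float
                        in.readFully(new byte[4]);
                        break;
                    case 5: case 6:                   // Long, Double
                        in.readFully(new byte[8]);
                        index++;                      // these occupy two slots
                        break;
                    case 7: case 8:                   // Class, String: one index
                        in.readFully(new byte[2]);
                        break;
                    case 9: case 10: case 11: case 12: // member refs, NameAndType
                        in.readFully(new byte[4]);
                        break;
                    default:
                        throw new IllegalArgumentException("unsupported tag " + tag);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return strings;
    }
}
```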
Access flags value 306 is a mask of modifiers used with class and interface
declarations. The "this class" value 307 is an index into constant pool
table 305 to a constant type structure representing the class or interface
defined by this class file. The super class value 308 is either zero,
indicating that the class file represents java.lang.Object (the only class
without a superclass), or an index into the constant pool table to a
constant type structure representing the superclass of the class defined
by this class file.
Interfaces count value 309 identifies the number of direct superinterfaces
of this class or interface, and accordingly, the number of elements in
interfaces table 310. Interfaces table 310 contains two-byte indices into
constant pool table 305. Each corresponding entry in constant pool table
305 is a constant type structure representing an interface which is a
direct superinterface of the class or interface defined by this class
file.
The fields count value 311 provides the number of structures in fields
table 312. Each entry in fields table 312 is a variable-length structure
providing a description of a field in the class type. Fields table 312
includes only those fields that are declared by the class or interface
defined by this class file.
The methods count value 313 indicates the number of structures in methods
table 314. Each element of methods table 314 is a variable-length
structure giving a description of, and virtual machine code for, a method
in the class or interface.
The attributes count value 315 indicates the number of structures in
attributes table 316. Each element in attributes table 316 is a
variable-length attribute structure. Attribute structures are discussed in
section 4.7 of Section A.
Embodiments of the invention examine the constant pool table for each class
in a set of classes to determine where duplicate information exists. For
example, where two or more classes use the same string constant, the
string constant may be removed from each class file structure and placed
in a shared constant pool table. In the simple case, if N classes have the
same constant entry, N units of memory space are taken up in storage
resources. By removing all constant entries and providing one shared
entry, N-1 units of memory space are freed. The memory savings increase
with N. Also, by implementing a shared constant table, entries in the
constant table need be fully resolved at most once. After the initial
resolution, future code references to the constant may directly use the
constant.
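The "resolved at most once" property of a shared constant table may be sketched as a table that caches the result of the first resolution. This is an assumption about one possible realization, not the virtual machine's actual mechanism: constants are modeled as strings, and the resolver function stands in for whatever work resolution performs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical shared table: each constant is resolved on first use and
// the resolved value is reused by every class that refers to it.
final class SharedConstantTable {
    private final Map<String, Object> resolved = new HashMap<>();
    private final Function<String, Object> resolver;
    private int resolutions = 0;

    SharedConstantTable(Function<String, Object> resolver) {
        this.resolver = resolver;
    }

    Object get(String symbolicConstant) {
        Object value = resolved.get(symbolicConstant);
        if (value == null) {
            // First reference: perform the resolution and remember it.
            value = resolver.apply(symbolicConstant);
            resolved.put(symbolicConstant, value);
            resolutions++;
        }
        return value;
    }

    // Number of times the resolver has actually been invoked.
    int resolutionCount() {
        return resolutions;
    }
}
```

With per-class constant pools, N classes sharing a constant would each trigger their own resolution; with a table of this kind, references from all N classes lead to a single resolution.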
Pre-processing and Packaging Classes
An embodiment of the invention uses a class pre-processor to package
classes in a format called an "mclass" or multi-class file. A method for
pre-processing and packaging a set of class files is illustrated in the
flow diagram of FIG. 4.
The method begins in step 400 with a set of arbitrary class files "S"
(typically part of one application). In step 401, the pre-processor reads
and parses each class in "S." In step 402, the pre-processor examines the
constant pool tables of each class to determine the set of class file
constants (such as strings and numerics, as well as others specific to the
class file format) that can be shared between classes in "S." A shared
constant pool table is created in step 403, with all duplicate constants
determined from step 402. In step 404, the pre-processor removes the
duplicate, shared constants from the individual constant pool tables of
each class.
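Steps 402 through 404 may be sketched as follows. This is a simplified model under stated assumptions: each constant pool is represented as a set of strings, whereas a real pre-processor would operate on cp_info entries and would also need to update the constant pool indices used within each class. The class name PoolPreprocessor and its fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical result of pre-processing the constant pools of a set "S".
final class PoolPreprocessor {
    final Set<String> sharedPool;          // constants used by 2+ classes
    final List<Set<String>> reducedPools;  // per-class pools, duplicates removed

    private PoolPreprocessor(Set<String> shared, List<Set<String>> reduced) {
        this.sharedPool = shared;
        this.reducedPools = reduced;
    }

    static PoolPreprocessor process(List<Set<String>> pools) {
        // Step 402: count how many class pools contain each constant.
        Map<String, Integer> occurrences = new LinkedHashMap<>();
        for (Set<String> pool : pools) {
            for (String constant : pool) {
                occurrences.merge(constant, 1, Integer::sum);
            }
        }
        // Step 403: the shared pool holds every constant found in
        // more than one class.
        Set<String> shared = new LinkedHashSet<>();
        for (Map.Entry<String, Integer> entry : occurrences.entrySet()) {
            if (entry.getValue() > 1) {
                shared.add(entry.getKey());
            }
        }
        // Step 404: remove the shared constants from each individual pool.
        List<Set<String>> reduced = new ArrayList<>();
        for (Set<String> pool : pools) {
            Set<String> remaining = new LinkedHashSet<>(pool);
            remaining.removeAll(shared);
            reduced.add(remaining);
        }
        return new PoolPreprocessor(shared, reduced);
    }
}
```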
In step 405, the pre-processor computes the in-core memory requirements of
each class in "S," as would normally be determined by the class loader for
the given virtual machine. This is the amount of memory the virtual
machine would allocate for each class, if it were to load each class
separately. After considering all classes in "S" and the additional memory
requirement for the shared constant pool table, the total memory
requirement for loading "S" is computed in step 406.
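The computation of steps 405 and 406 may be sketched as a running sum. The per-class sizes are taken as inputs here because the way they are measured depends on the given virtual machine. The layout shown, with the shared constant pool table placed first and each class at a fixed offset within one block, is an assumption added for illustration; the description specifies only that a total requirement is computed.

```java
// Hypothetical plan for one allocation covering the shared pool and
// every class in the set "S".
final class AllocationPlan {
    final long totalBytes;      // step 406: total requirement for "S"
    final long[] classOffsets;  // assumed offset of each class in the block

    private AllocationPlan(long totalBytes, long[] classOffsets) {
        this.totalBytes = totalBytes;
        this.classOffsets = classOffsets;
    }

    // perClassBytes[i] is the step 405 requirement of class i.
    static AllocationPlan plan(long sharedPoolBytes, long[] perClassBytes) {
        long[] offsets = new long[perClassBytes.length];
        long cursor = sharedPoolBytes;  // shared pool occupies the start
        for (int i = 0; i < perClassBytes.length; i++) {
            offsets[i] = cursor;
            // addExact throws ArithmeticException on overflow.
            cursor = Math.addExact(cursor, perClassBytes[i]);
        }
        return new AllocationPlan(cursor, offsets);
    }
}
```

With the total known in advance, a loader could request memory once for the whole set instead of issuing a separate series of allocations for each class.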
In step 407, the pre-processor produces a multi-class (mclass) file that
contains the shared constant pool table created in step 403, information
about memory allocation requirements determined in steps 405 and 406, and
all classes in "S," with their respective reduced constant pool tables.
The mclass file for the class set "S" is output in step 408. In some
embodiments, to further reduce the size of the multi-class file, the
multi-class file may be compressed.
An example of one embodiment of a multi-class file structure may be
represented as follows:
______________________________________
MclassFile {
    u2 shared_pool_count;
    cp_info shared_pool[shared_pool_count-1];
    u2 mem_alloc_req;
    u2 classfile_count;
    ClassFile classfiles[classfile_count];
}
______________________________________
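A minimal sketch of serializing the MclassFile structure shown above follows. The class name MclassWriter is hypothetical. The sketch assumes that each shared pool entry and each reduced class file has already been encoded as a byte array, and that every shared entry occupies one slot, so that shared_pool_count equals the number of entries plus one, mirroring constant_pool_count.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical writer for the MclassFile layout shown above.
final class MclassWriter {
    static byte[] write(List<byte[]> sharedPoolEntries,
                        int memAllocReq,
                        List<byte[]> classFiles) {
        checkU2(sharedPoolEntries.size() + 1);
        checkU2(memAllocReq);
        checkU2(classFiles.size());
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeShort(sharedPoolEntries.size() + 1); // u2 shared_pool_count
            for (byte[] entry : sharedPoolEntries) {      // cp_info shared_pool[]
                out.write(entry);
            }
            out.writeShort(memAllocReq);                  // u2 mem_alloc_req
            out.writeShort(classFiles.size());            // u2 classfile_count
            for (byte[] classFile : classFiles) {         // ClassFile classfiles[]
                out.write(classFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // A u2 field can hold only values from 0 to 65535.
    private static void checkU2(int value) {
        if (value < 0 || value > 0xFFFF) {
            throw new IllegalArgumentException("value does not fit in u2: " + value);
        }
    }
}
```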
In one embodiment of the invention, a new constant type is defined with a
corresponding con