Claims
We claim:
1. A method for generating the job category or categories most appropriate
for a job applicant from the applicant's printed resume using a
programmed computer, the method comprising:
loading a digital representation of the printed resume into the computer;
extracting and storing predefined words and word groups from the digital
representation, each word and word group being related to one or more job
categories;
assigning a weight to each extracted and stored word or word group, the
weights varying in relation to the strength of each word or word group as
an indicator of applicant's ability to fulfill a position in the
particular job category or categories;
summing the weights for each job category;
selecting the job category or categories with the highest weights; and
delivering as output in either computer-readable or text form the job
category or categories with the highest weights.
2. The method of claim 1, further comprising the steps of
compiling and storing a datafile comprising indicator headings for a given
job category, said datafile further comprising buzzwords where each
buzzword is associated with one or more indicator headings;
assigning weights to each buzzword in said datafile, where a buzzword
associated with more than one indicator heading is assigned a lesser
weight than a buzzword associated with only one indicator heading; and
wherein said extracting step comprises the step of identifying matched
buzzwords as those buzzwords which match with said extracted word or word
group; and wherein said step of assigning a weight comprises the step of
using the weights of said matched buzzwords to assign a weight to said job
category.
3. The method of claim 2, wherein said compiled datafile further comprises
a plurality of job categories, where said indicators are associated with
one or more job categories thereby forming an indirect association between
buzzwords and job categories, said method further comprising the steps of
determining whether said word or word group matches with a multiply
occurring buzzword, where a multiply occurring buzzword is defined as a
buzzword which is indirectly associated with more than one job category;
and
computing a weight for a job category by using the weight of a matched
buzzword indirectly associated with said job category, where the
contribution to said job category's weight by a multiply occurring
buzzword is less than the contribution by a non-multiply occurring
buzzword.
4. The method of claim 1 wherein said loading step comprises the steps of:
scanning the resume to generate a digitized image of the resume; and
translating the digitized image to generate a digital representation of the
resume, said digital representation including codes representing text
characters.
5. A method for generating a job category or categories for a job applicant
from the applicant's printed resume using a programmed computer having a
memory and a processor, the method comprising:
storing a set of buzzwords, values and job categories, and associating each
buzzword with a value and a job category;
loading a digital representation of the printed resume into the memory;
comparing the digital representation of the resume with the stored set of
buzzwords to extract one or more words and word groups which match with
one or more of the buzzwords;
assigning a weight to each job category, by deriving the weight from the
value associated with the matched buzzwords;
summing the assigned weights for each job category using the processor;
selecting from the summed weights for each job category one or more job
categories with the highest weights; and
delivering as output the selected one or more job categories with the
highest weights.
6. The method of claim 5, wherein said step of storing a set of buzzwords,
values and job categories further comprises the steps of:
storing a set of indicators, and associating each indicator with one or
more of said job categories, where each of said buzzwords is associated
with one or more indicators; and
adjusting said values associated with said buzzwords so that a buzzword
associated with more than one indicator is assigned a lesser value than a
buzzword associated with only one indicator.
7. The method of claim 6, wherein said method further comprises the steps
of
identifying headings in the digital representation of the resume; and
identifying words or word groups associated with said headings;
wherein said comparing step further comprises comparing said headings with
said stored indicators; and
wherein said assigning step further comprises assigning a higher weight to
said job category if said heading matches said indicator.
8. The method of claim 6, further comprising the steps of
computing an indicator weight for a given indicator based on the weights
assigned to each matched buzzword associated with said given indicator;
and
using said indicator weight to compute the weight for a job category.
9. The method of claim 6, further comprising the steps of
assigning a weight to a given indicator;
identifying matched indicators as those indicators which match with said
word or word group; and
using said matched indicator to compute the weight of said job category.
10. The method of claim 6, wherein said datafile further comprises a
plurality of job categories, where said indicators are associated with one
or more job categories thereby forming an indirect association between
buzzwords and job categories, said method further comprising the steps of
determining whether said word or word group matches with a multiply
occurring buzzword, where a multiply occurring buzzword is defined as a
buzzword which is indirectly associated with more than one job category;
and
computing a weight for a job category by using the weight of a matched
buzzword indirectly associated with said job category, where the
contribution to said job category's weight by a multiply occurring
buzzword is less than the contribution by a non-multiply occurring
buzzword.
11. The method of claim 10, where the buzzword occurs more than once in the
resume.
12. The method of claim 5 wherein said loading step comprises the steps of:
scanning the resume to generate a digitized image of the resume; and
translating the digitized image to generate a digital representation of the
resume, said digital representation including codes representing text
characters.
13. An apparatus for determining the job category or categories most
appropriate for an applicant based on applicant's printed resume, where
said apparatus comprises
a computer system comprising a memory and processing means coupled to said
memory;
an optical character recognition unit coupled to said computer system for
scanning the printed resume and entering information from the printed
resume into said memory;
said processing means comprising means for extracting predefined words and
word groups from said resume information;
said processing means further comprising means for assigning weights to
each word or word group, where said weights vary in relation to the
strength of each word or word group as an indicator of applicant's ability
to fulfill a position in said job category or categories;
said processing means further comprising means for calculating a weight for
each job category based on said weights for each word or word group; and
said processing means further comprising means for selecting and indicating
the job category or categories with the highest weights.
14. The apparatus of claim 13 further comprising
a datafile stored in said memory, where said datafile comprises indicator
headings for a given job category, where said datafile further comprises
one or more buzzwords, where each buzzword is associated with one or more
indicator headings, where a buzzword associated with more than one
indicator is assigned a lesser weight than a buzzword associated with only
one indicator;
said processing means further comprising means for identifying buzzwords
which match said word or word group extracted from said resume
information; and
said processing means further comprising means for using the weights of
said buzzwords which match to assign a weight to said job category.
15. The apparatus of claim 13, said optical character recognition unit
comprising an optical scanner for scanning the printed resume to convert
the printed resume into a digitized image of the resume and a character
recognition unit coupled to the optical scanner for translating the
digitized image into a digital representation of the resume, said digital
representation including codes representing text characters.
Description
NOTICE REGARDING COPYRIGHTED MATERIAL
A portion of the disclosure of this patent document contains material which
is subject to copyright protection. The copyright owner has no objection
to the facsimile reproduction by anyone of the patent document or the
patent disclosure as it appears in the Patent and Trademark Office patent
file or records, but otherwise reserves all copyright rights whatsoever.
FIELD OF THE INVENTION
This invention relates to the field of computer analysis of text documents.
More specifically it relates to the field of artificially intelligent
systems capable of analyzing resumes and extracting information relating
to job categorization.
BACKGROUND OF THE INVENTION
Job categorization is a necessary step in the process of hiring new
employees. In the employment office of most corporations, as resumes are
received they are sorted and the applicant is assigned a job skill code
indicating the types of jobs which the applicant may be able to perform.
At present, this process requires someone who can read the resume and
categorize the applicant.
The categorization process now used requires a skilled professional
recruiter to read the resume. The recruiter uses his experience and
knowledge to classify the applicant into one or several job categories.
The knowledge required to perform the classification includes knowing
which skills are required to perform various jobs, understanding the
corporate job structure, and identifying an applicant's strengths and
weaknesses.
The knowledge of which skills are required for a particular job category is
acquired by professional recruiters over a number of years. Typically, a
recruiter starts by working in a particular job area and reviews resumes
which have already been sorted by a more experienced recruiter. This
process allows the recruiter to gain experience. After a period of time
the recruiter becomes familiar with the contents of applicants' resumes in
this particular job category and can distinguish resumes falling within
this job category from those more applicable to other categories.
In addition to knowing the necessary job skills, the recruiter must
understand the job structure or organizational chart of both his own and
other companies. In categorizing applicants an experienced recruiter makes
frequent use of the job titles held by the applicant which can indicate
the applicant's previous positions.
A recruiter should also be able to identify the strengths and weaknesses of
an applicant. Although the resume may indicate a very wide range of
skills, the applicant may be proficient in only a small subset of these
skills. A skilled recruiter is able to identify which skills are accurate
descriptions of the applicant's talent and which are just "fluff" to
inflate the applicant's resume.
Computer programs which attempt to simulate human knowledge and
understanding are called knowledge-based systems. In particular, those
which simulate an expert's knowledge in a particular domain are called
expert systems. An expert system capable of performing job categorization
would simulate the categorization skills of an experienced recruiter.
General techniques used in creating expert systems include using rules
which embody the expert's procedural knowledge (rule-based systems), using
data structures containing the data known by the expert (frame-based
systems), and using probabilistic methods to simulate expert judgement
(evidential reasoning systems).
Rule-based systems are very effective if the expert's knowledge can be
expressed in terms of logical relations of "IF-THEN" rules. An example of
an IF-THEN rule is given below:
EXAMPLE 1
IF the applicant has a degree in Electrical Engineering
AND the applicant has circuit design experience
THEN the applicant is a hardware engineer.
This simple example demonstrates, however, that rules alone are not
sufficient to do job classification. Consider an applicant who graduated
with a degree in electrical engineering 10 years ago, worked as a circuit
designer for a couple of years, returned to school, received an MBA, and
is now working as a financial executive. It would be wrong to classify
this person as presently an electrical engineer. Although the rule above
could be augmented to include further conditions necessary to generate the
classification of a hardware engineer only when appropriate, it is easy to
see how complex such a rule would become.
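The weakness of Example 1 can be made concrete with a minimal sketch. The `Applicant` class, its field names, and the sample data are illustrative assumptions and do not come from the disclosed software; the rule itself follows Example 1.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Applicant:
    # Hypothetical record; the field names are assumptions for illustration.
    degrees: List[str] = field(default_factory=list)
    experience: List[str] = field(default_factory=list)
    current_title: str = ""


def example_1_rule(applicant: Applicant) -> Optional[str]:
    """Return a category if the IF-THEN rule of Example 1 fires, else None."""
    if ("Electrical Engineering" in applicant.degrees
            and "circuit design" in applicant.experience):
        return "hardware engineer"
    return None


# The career changer described in the text: an engineering degree, early
# circuit design work, then an MBA and a move into finance.
career_changer = Applicant(
    degrees=["Electrical Engineering", "MBA"],
    experience=["circuit design", "corporate finance"],
    current_title="financial executive",
)

# The rule still fires, because it never looks at the MBA or the current title.
print(example_1_rule(career_changer))
```

Each further condition needed to suppress this result would have to be added to the same rule, which is the growth in complexity noted above.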
Frame-based systems use frame data structures having slots, values and
possibly rules to represent the knowledge of experts. The slots are named
for the type of data required to fill the slot. In the example below, one
of the slots is called "Name". To fill this slot, a particular value is
required. For "Name", the applicant's name would be the proper value. Each
slot may further be associated with particular rules. The rules resemble
the type of rules used in a rule-based system, the difference being that
in frame-based systems changes to the slot values trigger the operation of
the rules. In rule-based systems, there is no such close coupling between
the rules and the data.
EXAMPLE 2
______________________________________
Name:
Degree:
(if-added
IF degree is MBA
THEN disregard engineering degree)
Experience:
.
.
______________________________________
Each "slot" (e.g. Name, Degree, Experience, etc.) has a value representing
the relevant data based on the applicant's resume. Notice the rule
indicating that if an MBA is added to an applicant's frame, the
engineering degree is effectively cancelled. This rule could be used to
preclude the incorrect classification noted in the previous example. A
given frame can have many slots, each slot capable of having one or more
values and each slot possibly having associated rules.
Consider now an applicant who has an engineering degree and an MBA but who
continues to work as an engineer. In this case, using the rule in the
frame shown in the example would cause the applicant to be incorrectly
classified once again. Although this problem can be circumvented through
modification of the rules, as the rules become increasingly complex, so
does the interaction between the rules and the data structures.
Eventually, the complexity may become so great that it becomes impossible
to determine which rules would be applied or take effect in any given
circumstance.
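The frame of Example 2 and its failure mode can be sketched as follows. This is only an illustration under stated assumptions: the `ApplicantFrame` class, its method names, and the substring test for an engineering degree are hypothetical, and only the if-added rule itself is taken from Example 2.

```python
class ApplicantFrame:
    """A frame with slots and an if-added rule attached to the Degree slot."""

    def __init__(self):
        self.slots = {"Name": None, "Degree": [], "Experience": []}
        self.disregarded = set()

    def add_degree(self, degree):
        # Changing the slot value is what triggers the attached rule.
        self.slots["Degree"].append(degree)
        self._if_added_degree(degree)

    def _if_added_degree(self, degree):
        # Example 2: IF degree is MBA THEN disregard engineering degree.
        # Only degrees already in the slot when the MBA arrives are affected.
        if degree == "MBA":
            for held in self.slots["Degree"]:
                if "Engineering" in held:
                    self.disregarded.add(held)

    def effective_degrees(self):
        return [d for d in self.slots["Degree"] if d not in self.disregarded]


# An applicant who holds both degrees but still works as an engineer.
frame = ApplicantFrame()
frame.add_degree("BS Electrical Engineering")
frame.slots["Experience"].append("circuit design engineer (current)")
frame.add_degree("MBA")

# The engineering degree is cancelled even though the experience slot says
# the applicant is a practicing engineer.
print(frame.effective_degrees())
```

Correcting this would require the Degree rule to consult the Experience slot as well, which is the coupling between rules and data structures that the text describes as becoming unmanageable.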
The complexity of these rule-based and frame-based systems can be reduced
by using probabilistic methods in which the conclusions generated are not
certain but very likely to be true. Using these methods with examples 1
and 2 might result in rules which read as follows:
IF the applicant degree is in Electrical Engineering
AND the applicant also has an MBA
THEN
Prob(the applicant is a hardware engineer)=10%
Prob(the applicant is an engineering manager)=70%
Prob(the applicant category is unknown)=20%
The combination of either type of expert system with these probabilistic
techniques is particularly effective when a relatively large amount of
data is available for analysis prior to assigning the probabilities. In
this case the probabilistic conclusions tend toward the actual case. Both
rule-based and frame-based knowledge can be adapted to support the
probabilistic methods.
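As a rough illustration, the probabilistic rule above can be written as a function that returns a distribution over categories rather than a single conclusion. The function name and the fallback for applicants who do not match the rule are assumptions; the three probabilities are the ones given in the rule.

```python
def classify_by_degrees(degrees):
    """Return a probability for each candidate category."""
    if "Electrical Engineering" in degrees and "MBA" in degrees:
        return {
            "hardware engineer": 0.10,
            "engineering manager": 0.70,
            "unknown": 0.20,
        }
    # Assumed fallback: nothing can be concluded from the degrees alone.
    return {"unknown": 1.0}


distribution = classify_by_degrees(["Electrical Engineering", "MBA"])
most_likely = max(distribution, key=distribution.get)
print(most_likely, distribution[most_likely])
```

The less likely outcomes remain in the result, so later evidence can still raise or lower them without rewriting the rule.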
Although the use of probabilistic methods in relation with rule-based and
frame-based systems is known, these techniques have not been applied to
the analysis of resumes, particularly the determination of which positions
an applicant could suitably fill. The particular nature of resumes, with
their various blocks of ungrammatical text and non-standard formats, has
previously prevented their computer analysis.
SUMMARY OF THE INVENTION
The present invention fulfills the need for an automated computerized
system for resume analysis. By using a combination of frame-based and
rule-based techniques, and further by incorporating probabilistic methods,
the system is able to classify an applicant according to his employment
potential with a high degree of accuracy.
The present invention uses a method and apparatus which converts resumes
into a series of correctly ordered blocks comprised of computer
understandable character strings which strings contain the contents of the
resume. These blocks are processed by an extractor which uses a predefined
pattern language called a "Grammar" to locate and extract words and word
groups containing information believed to be relevant to the analysis of
an applicant's capabilities. These words and word groups comprise the
values which are used to fill the slot in the frame data structure.
Finally, the contents of the frame data structure are operated upon using
rule-based techniques interacting with probabilistic methods to categorize
the job applicant with a high degree of accuracy using only his resume.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of a system for practicing the present invention;
FIG. 2 is a chart of the hierarchical organization of the knowledge base;
FIG. 3 is a sample of category groups;
FIGS. 4A and 4B are lists of the various terms which are related to the
Clerical job category;
FIG. 5 is a flowchart of the process by which relevant job categories are
determined;
FIG. 6 is a flowchart of the process whereby the weights of various terms
are assigned;
FIG. 7 is a flowchart showing how category points are assigned; and
FIG. 8 is a flowchart showing how weak categories are eliminated.
DETAILED DESCRIPTION
FIG. 1 illustrates an apparatus implementing the preferred embodiment of
the present invention. A digital computer system 1, using the method and
apparatus described herein, operates upon data 2, derived from a printed
resume. A Sun 3/50 computer from Sun Microsystems, Mountain View, Calif.
has been successfully used as the computer in this embodiment.
It should also be noted that the computer software which realizes this
invention is frequently referenced in this description. This software has
been appended as Appendix 1.
Data 2 is derived from the printed resume by using the method and apparatus
described in the commonly owned U.S. patent application entitled "A Method
and Apparatus for computer Understanding and Manipulation of Minimally
Formatted Text Documents" Ser. No. 07/345,930 which was filed on May 1,
1989, the entire specification of which is hereby incorporated by
reference. The method and apparatus described therein accepts a printed
resume as input and converts it into a series of properly ordered blocks
of computer understandable character strings.
The character strings, called data 2, are delivered to computer 1. Computer
1 then passes data 2 through extractor program 4 which extracts words and
word groups considered relevant to the categorization process. A knowledge
base 3 contains a set of word patterns, also known as the grammar, which
specify which words and word groups will be extracted by extraction
program 4 working upon data 2. The words and word groups are placed in
frame data structures. The words and word groups returned in the frame
data structures can be encoded in electronic form and stored on any type
of computer data storage device, or they may be in a hard-copy printed
format. The present invention's preferred embodiment operates upon frame
data structures stored electronically in memory 6. In the preferred
embodiment, the frames will contain such information as applicant's name,
job titles, degrees, etc.
Computer 1 also uses memory 6 for storing all or part of knowledge base 3
and extractor program 4, the operation of which will be subsequently
described. The job categories which are found to be appropriate are
generated as output 5. The output can be in either an electronic format or
a printed one.
FIG. 2 is a logical diagram of the hierarchical structure of knowledge base
3 in a general form. One particular group of word patterns (grammar) used
to implement this logical hierarchy is shown in Appendix 2. At the top of
the hierarchy are job category groups 22 ("Groups"). Each group is
comprised of a number of job categories 24 ("Job Categories"). Under each
job category 24 there are various indicators 26. Finally, under each
indicator 26 there may or may not be various "buzzwords" 28. This
hierarchy is only an example and other such hierarchical structures are
possible. The meaning and import of each of these various classifications
is discussed below.
FIG. 3 is a list of several exemplary groups (22, FIG. 2) and their
attendant job categories (24, FIG. 2). The groups are 0001 Administrative,
0011 Marketing/Sales, 0016 Manufacturing, and 0025 Technical. Under the
Marketing/Sales group there are three job categories: 0012
Advertising/Comm, 0013 Marketing, and 0014 Sales. The groups and job
categories shown in FIG. 3 are merely examples. Many different groups and
job categories can be readily created to meet the needs of particular
employers.
FIGS. 4A and 4B are a list of exemplary indicators and buzzwords (26 and
28, FIG. 2). Each job category has related indicators and each indicator
may have an attendant list of buzzwords. The presence of these buzzwords
in a particular resume increases the probability that the applicant should
be classified in the job categories with which these buzzwords are
associated. Typical indicators, which can be considered as logical
groupings of buzzwords, are 0036 Desktop Publishers and 0047 Management.
Indicator 0047 Management has buzzwords 0048 Executive and 0049 Manager.
It should be noted that all the indicators and buzzwords in FIGS. 4A and
4B are related to job category Clerical. In FIG. 4A, the job title
indicator at line 0006 has buzzwords associated with it which comprise
various different job titles which might be used by someone holding a
position in the particular job category, here Clerical. Education
indicators may have buzzwords which comprise the various degrees which
would normally be held by persons in the job category. Skill indicators
contain lists of buzzwords which someone might use to describe their
aptitude in the area of relevance. For example, someone who claims
experience in "MacWrite" would have desktop publishing skills and would,
therefore, increase the likelihood that he or she would be categorized in
the Clerical job category.
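The hierarchy of groups, job categories, indicators and buzzwords might be held in memory as nested mappings, as in the sketch below. The group and category names and the indicators shown are those named in the text for FIGS. 3, 4A and 4B, but placing Clerical under the Administrative group and the helper name `paths_for_buzzword` are assumptions; the actual grammar of Appendix 2 is not reproduced here.

```python
# group -> job category -> indicator -> list of buzzwords
KNOWLEDGE_BASE = {
    "Administrative": {
        # Assumption: the text does not say which group holds Clerical.
        "Clerical": {
            "Desktop Publishers": ["MacWrite"],
            "Management": ["Executive", "Manager"],
        },
    },
    "Marketing/Sales": {
        "Advertising/Comm": {},
        "Marketing": {},
        "Sales": {},
    },
    "Manufacturing": {},
    "Technical": {},
}


def paths_for_buzzword(knowledge_base, buzzword):
    """List every (group, job category, indicator) under which a buzzword sits."""
    paths = []
    for group, categories in knowledge_base.items():
        for category, indicators in categories.items():
            for indicator, buzzwords in indicators.items():
                if buzzword in buzzwords:
                    paths.append((group, category, indicator))
    return paths


print(paths_for_buzzword(KNOWLEDGE_BASE, "MacWrite"))
```

A buzzword that yields more than one path is associated with several indicators or job categories, which is the situation the weighting scheme has to handle.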
Any given indicator may occur under several different job categories. For
example, an indicator such as 0047 "Management" (FIG. 4A) might occur
under a large number of job categories. Furthermore, indicators might
occur in two job categories under two entirely different groups. For
example, "Management" could also occur under the engineering job
categories in "Technical" (see FIG. 3). Similarly, the same buzzword might
occur under several different indicators. In FIGS. 4A and 4B, the
buzzwords at 0043 in the "Desktop Publishers" indicator and at 0088 in the
"Word Processors" indicator are both "MacWrite".
Using knowledge base 3 constructed according to the foregoing description,
extractor program 4 (see FIG. 1) scans data 2 and extracts words and word
groups which match the word patterns (the groups, job categories,
indicators and buzzwords) in knowledge base 3. The words and word groups
are stored in memory 6 in frame data structures. After this process is
complete, the present invention selects which job category or categories
are most appropriate. The way this is carried out is described below.
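The matching step can be approximated by a simple sketch. The extractor described here works from a pattern grammar, which is not reproduced; the code below substitutes plain case-insensitive whole-word matching with regular expressions, and the function name and sample text are assumptions.

```python
import re


def extract_matches(resume_text, patterns):
    """Return the knowledge-base patterns found in the text, in reading order."""
    found = []
    for pattern in patterns:
        # Whole-word, case-insensitive match; a stand-in for the real grammar.
        regex = re.compile(r"\b" + re.escape(pattern) + r"\b", re.IGNORECASE)
        for match in regex.finditer(resume_text):
            found.append((match.start(), pattern))
    found.sort()
    return [pattern for _, pattern in found]


sample = "Office manager with MacWrite experience. Used macwrite daily."
print(extract_matches(sample, ["MacWrite", "Manager", "Executive"]))
```

Repeated occurrences are kept, since the later point calculation distinguishes buzzwords that appear once from those that appear several times.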
A principle behind the entire job categorization process is to use the
indicators which appear in a resume as evidence that the applicant should
be classified in the job category or categories which contain these
indicators. Additionally, this invention provides a method to resolve the
ambiguous cases wherein either a skill occurs in more than one job
category or a buzzword occurs in more than one skill, or both.
For each job category, a weight determination algorithm assigns to each job
title, degree, buzzword, and designator under the job category an integer
value which is directly proportional to its strength as an indicator. Any
other word not assigned a value by this process is assigned a value 0 with
respect to the process of job categorization. The weight assignment
process is shown in FIG. 6 (see Appendix 1, listing "EXTRACT/devaluate.c").
It should be noted that the proper evaluation of the job categories
requires certain predefined constants. These constants are derived from an
empirical study of a large number of resumes. A list of these constants is
given below in Table 1.
TABLE 1
MAX_THRESHOLD       12
MIN_THRESHOLD       12
STRONG_THRESHOLD    20
DEGREE_PTS           2
BUZZWORD_PTS         2
CATEGORY_PTS         2
SKILL_PTS            2
SKILL_THRESHOLD      4
DOMINATE_FACTOR      3
OBJ_FACTOR           1
FIG. 5 is a flow chart that illustrates the overall job categorization
process. In step 5.1, the weights are calculated and all category point
totals are initialized to zero (see description below and FIG. 6). Step
5.2 indicates both the process by which the resume is converted into
computer understandable strings of characters and the process by which
words and word groups are extracted using extractor program 4 in
conjunction with knowledge base 3 (FIG. 1). The former process is fully
described in the co-pending application. The latter process has already
been described in this specification. The output from the latter process
is a frame data structure containing the educational degrees, job titles
if they appear in the "Objective" section of the resume, and all other
indicators occurring elsewhere on the resume.
A first job category is retrieved from memory 6 (FIG. 1) in step 5.3. The
job category point total is calculated in step 5.4. This step determines
the likelihood that the applicant should be classified in this job
category (see description below and FIG. 7). This calculation is repeated
for all the job categories by looping the process from step 5.5 to step
5.3 until all job categories are analyzed. Weak categories are eliminated
in step 5.6 (see description below and FIG. 8). The proper job categories
of the applicant, as determined by this process, are made available as
output at step 5.7.
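The control flow of FIG. 5 can be sketched as a short driver. The scoring and elimination steps are passed in as functions because they are described separately; the toy stand-ins `count_hits` and `keep_positive`, the buzzword "Quota", and all other names are illustrative assumptions and are not the disclosed routines.

```python
def categorize(matches, categories, score_category, eliminate_weak):
    """Score every job category, then drop the weak ones (steps 5.1 to 5.7)."""
    matches = list(matches)                     # output of step 5.2
    totals = {name: 0 for name in categories}   # step 5.1: totals start at zero
    for name, terms in categories.items():      # steps 5.3 and 5.5: loop
        totals[name] = score_category(terms, matches)   # step 5.4
    return eliminate_weak(totals)               # steps 5.6 and 5.7


def count_hits(terms, matches):
    # Toy scoring: one point per matched term belonging to the category.
    return sum(1 for match in matches if match in terms)


def keep_positive(totals):
    # Toy elimination: keep only categories that scored at all.
    return {name: points for name, points in totals.items() if points > 0}


result = categorize(
    ["MacWrite", "Manager"],
    {"Clerical": {"MacWrite", "Manager"}, "Sales": {"Quota"}},
    count_hits,
    keep_positive,
)
print(result)
```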
FIG. 6 is a flowchart of the weight calculations. At step 6.05 the first
job category is retrieved from memory 6 (FIG. 1). Job title buzzwords are
assigned the value MAX_THRESHOLD in step 6.1. Each education degree
is assigned a value equal to the variable DEGREE_PTS in step 6.2.
In step 6.3, the first skill indicator in the job category is retrieved
from the memory. In step 6.4, the first buzzword from the first skill
indicator's list of "buzzwords" is likewise retrieved from the memory. If
the buzzword has not previously been assigned a weight (see step 6.5, and
code function "install", Appendix 1, page 01067), then it is assigned a
value of BUZZWORD_PTS at step 6.6. If the buzzword has previously
been assigned a weight, then its weight is reduced by 1/2 in step 6.7.
Buzzwords are retrieved from memory 6 (FIG. 1) until all buzzwords for a
skill indicator have been assigned a weight (see step 6.8 and loop to step
6.4). This process is likewise performed on all skill indicators for a
given job category by looping the process back to step 6.3 if the test for
more skill indicators at step 6.9 is true. When all skill indicators for a
given job category have been processed, each designator indicator is
assigned a value called CATEGORY_PTS in step 6.10. At step 6.11 the
process loops back to step 6.05 until all job categories have been
processed.
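The weight calculation of FIG. 6 can be sketched as below, using the constants of Table 1. Several points are assumptions: "reduced by 1/2" is read as halving the current weight, the layout of the category definitions is invented for the example, and the terms "Secretary", "PageMaker" and "WordPerfect" are hypothetical sample data. The sketch does not reproduce the listing in Appendix 1.

```python
MAX_THRESHOLD = 12
DEGREE_PTS = 2
BUZZWORD_PTS = 2
CATEGORY_PTS = 2


def assign_weights(categories):
    """Assign a weight to every job title, degree, buzzword and designator."""
    weights = {"job_title": {}, "degree": {}, "buzzword": {}, "designator": {}}
    for definition in categories.values():                   # steps 6.05, 6.11
        for title in definition.get("job_titles", []):        # step 6.1
            weights["job_title"][title] = MAX_THRESHOLD
        for degree in definition.get("degrees", []):          # step 6.2
            weights["degree"][degree] = DEGREE_PTS
        for buzzwords in definition.get("skills", {}).values():   # steps 6.3, 6.9
            for buzzword in buzzwords:                        # steps 6.4, 6.8
                if buzzword not in weights["buzzword"]:       # step 6.5
                    weights["buzzword"][buzzword] = BUZZWORD_PTS   # step 6.6
                else:
                    # Step 6.7, read here as halving (an assumption).
                    weights["buzzword"][buzzword] /= 2
        for designator in definition.get("designators", []):  # step 6.10
            weights["designator"][designator] = CATEGORY_PTS
    return weights


clerical = {
    "Clerical": {
        "job_titles": ["Secretary"],
        "skills": {
            "Desktop Publishers": ["MacWrite", "PageMaker"],
            "Word Processors": ["MacWrite", "WordPerfect"],
        },
    }
}
weights = assign_weights(clerical)
print(weights["buzzword"])
```

Because "MacWrite" is listed under two skill indicators, it ends with half the weight of a buzzword listed under only one.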
FIG. 7 is a flow chart that illustrates the process for calculating
category point totals for each job category (see Appendix 1, listing
"EXTRACT/action.c"). At step 7.1, the matched pattern instances from the
particular resume are examined for occurrences of indicators and buzzwords
of the particular job category. In the case of buzzwords, the weight for
the particular buzzwords found is also added to the job category point
total at step 7.2. Code function "TotalSkillBuzz" (Appendix 1) calculates
the total for buzzword indicators. If a buzzword occurs within more than
one job category, it is unlikely that buzzword has any great significance.
On the other hand, if a buzzword occurs in only one job category, it is
likely to be quite significant. Thus, if a buzzword occurs multiple times
in the resume and that buzzword is associated with many different job
categories, its weight is added only once, when it first occurs. On the
other hand, if the buzzword occurs multiple times in the resume but is
associated with only one job category, its weight is added for each
occurrence.
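The rule for repeated buzzwords can be illustrated with a small sketch. It is not the "TotalSkillBuzz" function of Appendix 1; the function name, argument layout and sample weights are assumptions chosen to show the once-only versus per-occurrence behavior.

```python
from collections import Counter


def buzzword_points(occurrences, category_buzzwords, weights, category_count):
    """Sum buzzword weights for one job category.

    occurrences        -- buzzwords matched in the resume, repeats included
    category_buzzwords -- buzzwords belonging to the job category being scored
    weights            -- weight assigned to each buzzword
    category_count     -- number of job categories each buzzword is tied to
    """
    total = 0
    counts = Counter(b for b in occurrences if b in category_buzzwords)
    for buzzword, times_seen in counts.items():
        if category_count[buzzword] > 1:
            total += weights[buzzword]               # counted once only
        else:
            total += weights[buzzword] * times_seen  # counted per occurrence
    return total


points = buzzword_points(
    occurrences=["MacWrite", "MacWrite", "Manager", "Manager", "Manager"],
    category_buzzwords={"MacWrite", "Manager"},
    weights={"MacWrite": 2, "Manager": 1},
    category_count={"MacWrite": 1, "Manager": 2},
)
print(points)
```

Here "MacWrite" belongs to one job category and contributes 2 points for each of its two occurrences, while "Manager" belongs to two and contributes its weight of 1 only once.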
Code function "SumMAX" (Appendix 1) updates the job category point total
for job title indicators. Code functions "PointSum" (Appendix 1) and "Sum"
(Appendix 1) update the job category point total for other indicators. In
the case of an education indicator, the indicator must be found in the
resume's education section to be counted. At step 7.3, a check is made to
see if there are more indicators for the particular job category. If there
are, the process returns to step 7.1 and gets the next indicator.
If there are no more indicators for the particular job category,
SKILL_PTS are added to the job category point total for each skill
indicator whose related buzzwords have contributed at least
SKILL_THRESHOLD points at step 7.4. In step 7.5 (code function "Threshold",
Appendix 1), the job category point total is compared with
MIN_THRESHOLD. If the job category point total is less than
MIN_THRESHOLD, the category point total is reset to zero. The job category
point total is returned to the previously described categorization process
(see step 5.4, FIG. 5) at step 7.6.
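Steps 7.4 and 7.5 can be sketched with the constants of Table 1. The function name and the per-indicator dictionary are assumptions for illustration; this is not the "Threshold" function of Appendix 1.

```python
SKILL_PTS = 2
SKILL_THRESHOLD = 4
MIN_THRESHOLD = 12


def finish_category_total(running_total, points_per_skill_indicator):
    """Apply the skill bonus (step 7.4) and the minimum threshold (step 7.5)."""
    total = running_total
    for buzzword_points in points_per_skill_indicator.values():
        if buzzword_points >= SKILL_THRESHOLD:
            total += SKILL_PTS       # bonus for a well-supported skill
    if total < MIN_THRESHOLD:
        total = 0                    # too little evidence for this category
    return total


# One indicator reaches SKILL_THRESHOLD, so 10 + 2 = 12 meets MIN_THRESHOLD.
print(finish_category_total(10, {"Desktop Publishers": 4, "Word Processors": 2}))
# 8 + 2 = 10 stays below MIN_THRESHOLD, so the total is reset to zero.
print(finish_category_total(8, {"Desktop Publishers": 4}))
```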
FIG. 8 illustrates the process used to eliminate weakly suggested job
categories (see Appendix 1, listing "EXTRACT/category.c", functions
"CheckCategory" and "CheckSubcat"). At step 8.1 a check is made to see if
there is at least one job category with more than STRONG_THRESHOLD
points. If there is at least one such job category, then, in step 8.2 all
job categories with less than STRONG_THRESHOLD points are reset to
a point total of zero. HIGH_POINTS, the highest point total of all
job categories, is determined in step 8.3. In step 8.4, job categories
C_low whose point total satisfies the following inequality have
their point total reset to zero:
points(C_low) × DOMINATE_FACTOR <= HIGH_POINTS.
At step 8.5 a check is made to see if there is a job title in the objective
section of the resume. If there is a job title, then OBJ_POINTS is
set to the point total of the job category which includes (as a job title
indicator) the job title which was found in the objective section. At step
8.7, job categories C_notobj which satisfy the
following inequality have their point total reset to zero:
points(C_notobj) × OBJ_FACTOR <= OBJ_POINTS.
At this point, whichever job categories still have a positive point total
are returned to the categorization process at step 5.7 in FIG. 5. If there
were no job titles in the objective section, then the process returns to
step 5.7 of FIG. 5 from step 8.5, skipping steps 8.6 and 8.7.
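The elimination process of FIG. 8 can be sketched as follows, using the constants of Table 1. Several details are assumptions: the function and argument names, treating C_notobj as every job category other than the one named in the objective section, and labeling the assignment of OBJ_POINTS as step 8.6. The sketch is not the "CheckCategory" or "CheckSubcat" code of Appendix 1.

```python
STRONG_THRESHOLD = 20
DOMINATE_FACTOR = 3
OBJ_FACTOR = 1


def eliminate_weak(point_totals, objective_category=None):
    """Zero out weakly suggested job categories and return the survivors."""
    points = dict(point_totals)
    if any(p > STRONG_THRESHOLD for p in points.values()):       # step 8.1
        for category, p in list(points.items()):                 # step 8.2
            if p < STRONG_THRESHOLD:
                points[category] = 0
    high_points = max(points.values(), default=0)                # step 8.3
    for category, p in list(points.items()):                     # step 8.4
        if p * DOMINATE_FACTOR <= high_points:
            points[category] = 0
    if objective_category is not None:                           # step 8.5
        obj_points = points[objective_category]                  # step 8.6 (assumed)
        for category, p in list(points.items()):                 # step 8.7
            if category != objective_category and p * OBJ_FACTOR <= obj_points:
                points[category] = 0
    return {category: p for category, p in points.items() if p > 0}


totals = {"Clerical": 30, "Sales": 14, "Marketing": 22}
print(eliminate_weak(totals))
print(eliminate_weak(totals, objective_category="Clerical"))
```

With no job title in the objective section, Sales is dropped because a stronger category exists. When the objective section names a Clerical job title, Marketing is dropped as well, because its total does not exceed that of Clerical.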
The previous description has shown how the present invention combines both
frame- and rule-based systems and applies probabilistic methods to the
combination. A knowledge base called a grammar is created containing word
patterns which indicate skill in a particular job category. These word
patterns are weighted to reflect their relative strength as skill
indicators. A series of computer understandable character strings is
accepted as input. An extractor module locates words and word groups in
the input which match the word patterns in the knowledge base and places
these words and word groups in frame data structures. The weighting and
summing operations are then performed on these frame data structures, the
final results comprising the job category or categories most applicable to
the applicant whose resume is being analyzed.
In the foregoing specification, the invention has been described with
reference to a specific exemplary embodiment thereof. It will, however, be
evident that various modifications and changes may be made thereunto
without departing from the broader spirit and scope of the invention as
set forth in the appended claims. For example, this categorization process
is not limited to resumes. Additionally, the constants in Table 1 could be
changed for optimum performance with different document types (e.g. job
application forms in place of resumes). The grammar (knowledge base) used
by the extractor could also be modified to retrieve more or different
types of information. Many such changes or modifications are readily
envisioned. The specification and drawings are, accordingly, to be
regarded in an illustrative rather than in a restrictive sense.
* * * * *