Deductive Information Retrieval

SICP > Metalinguistic Abstraction > Logic Programming > Deductive Information Retrieval
Previous: Logic Programming Next: How the Query System Works

    Logic programming excels in providing interfaces to data bases for information retrieval. The query language we shall implement in this chapter is designed to be used in this way.

    In order to illustrate what the query system does, we will show how it can be used to manage the data base of personnel records for Microshaft, a thriving high-technology company in the Boston area. The language provides pattern-directed access to personnel information and can also take advantage of general rules in order to make logical deductions.

    A sample data base

    The personnel data base for Microshaft contains assertions about company personnel. Here is the information about Ben Bitdiddle, the resident computer wizard:

    (address (Bitdiddle Ben) (Slumerville (Ridge Road) 10))
    (job (Bitdiddle Ben) (computer wizard))
    (salary (Bitdiddle Ben) 60000)
    
    Each assertion is a list (in this case a triple) whose elements can themselves be lists.

    As resident wizard, Ben is in charge of the company's computer division, and he supervises two programmers and one technician. Here is the information about them:

    (address (Hacker Alyssa P) (Cambridge (Mass Ave) 78))
    (job (Hacker Alyssa P) (computer programmer))
    (salary (Hacker Alyssa P) 40000)
    (supervisor (Hacker Alyssa P) (Bitdiddle Ben))
    
    (address (Fect Cy D) (Cambridge (Ames Street) 3))
    (job (Fect Cy D) (computer programmer))
    (salary (Fect Cy D) 35000)
    (supervisor (Fect Cy D) (Bitdiddle Ben))
    
    (address (Tweakit Lem E) (Boston (Bay State Road) 22))
    (job (Tweakit Lem E) (computer technician))
    (salary (Tweakit Lem E) 25000)
    (supervisor (Tweakit Lem E) (Bitdiddle Ben))
    
    There is also a programmer trainee, who is supervised by Alyssa:
    (address (Reasoner Louis) (Slumerville (Pine Tree Road) 80))
    (job (Reasoner Louis) (computer programmer trainee))
    (salary (Reasoner Louis) 30000)
    (supervisor (Reasoner Louis) (Hacker Alyssa P))
    
    All of these people are in the computer division, as indicated by the word computer as the first item in their job descriptions.

    Ben is a high-level employee. His supervisor is the company's big wheel himself:

    (supervisor (Bitdiddle Ben) (Warbucks Oliver))
    
    (address (Warbucks Oliver) (Swellesley (Top Heap Road)))
    (job (Warbucks Oliver) (administration big wheel))
    (salary (Warbucks Oliver) 150000)
    

    Besides the computer division supervised by Ben, the company has an accounting division, consisting of a chief accountant and his assistant:

    (address (Scrooge Eben) (Weston (Shady Lane) 10))
    (job (Scrooge Eben) (accounting chief accountant))
    (salary (Scrooge Eben) 75000)
    (supervisor (Scrooge Eben) (Warbucks Oliver))
    
    (address (Cratchet Robert) (Allston (N Harvard Street) 16))
    (job (Cratchet Robert) (accounting scrivener))
    (salary (Cratchet Robert) 18000)
    (supervisor (Cratchet Robert) (Scrooge Eben))
    
    There is also a secretary for the big wheel:

    (address (Aull DeWitt) (Slumerville (Onion Square) 5))
    (job (Aull DeWitt) (administration secretary))
    (salary (Aull DeWitt) 25000)
    (supervisor (Aull DeWitt) (Warbucks Oliver))
    

    The data base also contains assertions about which kinds of jobs can be done by people holding other kinds of jobs. For instance, a computer wizard can do the jobs of both a computer programmer and a computer technician:

    (can-do-job (computer wizard) (computer programmer))
    (can-do-job (computer wizard) (computer technician))
    
    A computer programmer could fill in for a trainee:
    (can-do-job (computer programmer)
                (computer programmer trainee))
    
    Also, as is well known,
    (can-do-job (administration secretary)
                (administration big wheel))
    

    Simple queries

    The query language allows users to retrieve information from the data base by posing queries in response to the system's prompt. For example, to find all computer programmers one can say

    ;;; Query input:
    (job ?x (computer programmer))
    
    The system will respond with the following items:

    ;;; Query results:
    (job (Hacker Alyssa P) (computer programmer))
    (job (Fect Cy D) (computer programmer))
    

    The input query specifies that we are looking for entries in the data base that match a certain pattern. In this example, the pattern specifies entries consisting of three items, of which the first is the literal symbol job, the second can be anything, and the third is the literal list (computer programmer). The ``anything'' that can be the second item in the matching list is specified by a pattern variable, ?x. The general form of a pattern variable is a symbol, taken to be the name of the variable, preceded by a question mark. We will see below why it is useful to specify names for pattern variables rather than just putting ? into patterns to represent ``anything.'' The system responds to a simple query by showing all entries in the data base that match the specified pattern.

    A pattern can have more than one variable. For example, the query

    (address ?x ?y)
    
    will list all the employees' addresses.

    A pattern can have no variables, in which case the query simply determines whether that pattern is an entry in the data base. If so, there will be one match; if not, there will be no matches.

    The same pattern variable can appear more than once in a query, specifying that the same ``anything'' must appear in each position. This is why variables have names. For example,

    (supervisor ?x ?x)
    
    finds all people who supervise themselves (though there are no such assertions in our sample data base).

    The query

    (job ?x (computer ?type))
    
    matches all job entries whose third item is a two-element list whose first item is computer:

    (job (Bitdiddle Ben) (computer wizard))
    (job (Hacker Alyssa P) (computer programmer))
    (job (Fect Cy D) (computer programmer))
    (job (Tweakit Lem E) (computer technician))
    
    This same pattern does not match
    (job (Reasoner Louis) (computer programmer trainee))
    
    because the third item in the entry is a list of three elements, and the pattern's third item specifies that there should be two elements. If we wanted to change the pattern so that the third item could be any list beginning with computer, we could specify [*]

    (job ?x (computer . ?type))
    
    For example,
    (computer . ?type)
    
    matches the data

    (computer programmer trainee)
    
    with ?type as the list (programmer trainee). It also matches the data

    (computer programmer)
    
    with ?type as the list (programmer), and matches the data

    (computer)
    
    with ?type as the empty list ().

    We can describe the query language's processing of simple queries as follows:

    Note that if the pattern has no variables, the query reduces to a determination of whether that pattern is in the data base. If so, the empty assignment, which assigns no values to variables, satisfies that pattern for that data base.

    Exercise. Give simple queries that retrieve the following information from the data base:


    a. all people supervised by Ben Bitdiddle;


    b. the names and jobs of all people in the accounting division;


    c. the names and addresses of all people who live in Slumerville.

    Compound queries

    Simple queries form the primitive operations of the query language. In order to form compound operations, the query language provides means of combination. One thing that makes the query language a logic programming language is that the means of combination mirror the means of combination used in forming logical expressions: and, or, and not. (Here and, or, and not are not the Lisp primitives, but rather operations built into the query language.)

    We can use and as follows to find the addresses of all the computer programmers:

    (and (job ?person (computer programmer))
         (address ?person ?where))
    
    The resulting output is
    (and (job (Hacker Alyssa P) (computer programmer))
         (address (Hacker Alyssa P) (Cambridge (Mass Ave) 78)))
    
    (and (job (Fect Cy D) (computer programmer))
         (address (Fect Cy D) (Cambridge (Ames Street) 3)))
    
    In general,
    (and query1 query2 ... queryn)
    
    is satisfied by all sets of values for the pattern variables that simultaneously satisfy query1 ... queryn.

    As for simple queries, the system processes a compound query by finding all assignments to the pattern variables that satisfy the query, then displaying instantiations of the query with those values.

    Another means of constructing compound queries is through or. For example,

    (or (supervisor ?x (Bitdiddle Ben))
        (supervisor ?x (Hacker Alyssa P)))
    

    will find all employees supervised by Ben Bitdiddle or Alyssa P. Hacker:

    (or (supervisor (Hacker Alyssa P) (Bitdiddle Ben))
        (supervisor (Hacker Alyssa P) (Hacker Alyssa P)))
    
    (or (supervisor (Fect Cy D) (Bitdiddle Ben))
        (supervisor (Fect Cy D) (Hacker Alyssa P)))
    
    (or (supervisor (Tweakit Lem E) (Bitdiddle Ben))
        (supervisor (Tweakit Lem E) (Hacker Alyssa P)))
    
    (or (supervisor (Reasoner Louis) (Bitdiddle Ben))
        (supervisor (Reasoner Louis) (Hacker Alyssa P)))
    
    In general,
    (or query1 query2 ... queryn)
    
    is satisfied by all sets of values for the pattern variables that satisfy at least one of query1 ... queryn.

    Compound queries can also be formed with not. For example,

    (and (supervisor ?x (Bitdiddle Ben))
         (not (job ?x (computer programmer))))
    
    finds all people supervised by Ben Bitdiddle who are not computer programmers. In general,

    (not query1)
    
    is satisfied by all assignments to the pattern variables that do not satisfy query1.[*]

    The final combining form is called lisp-value. When lisp-value is the first element of a pattern, it specifies that the next element is a Lisp predicate to be applied to the rest of the (instantiated) elements as arguments. In general,

    (lisp-value predicate arg1 ... argn)
    
    will be satisfied by assignments to the pattern variables for which the predicate applied to the instantiated arg1 ... argn is true. For example, to find all people whose salary is greater than $30,000 we could write [*]

    (and (salary ?person ?amount)
         (lisp-value > ?amount 30000))
    

    Exercise. Formulate compound queries that retrieve the following information:


    a. the names of all people who are supervised by Ben Bitdiddle, together with their addresses;


    b. all people whose salary is less than Ben Bitdiddle's, together with their salary and Ben Bitdiddle's salary;


    c. all people who are supervised by someone who is not in the computer division, together with the supervisor's name and job.

    Rules

    In addition to primitive queries and compound queries, the query language provides means for abstracting queries. These are given by rules. The rule

    (rule (lives-near ?person-1 ?person-2)
          (and (address ?person-1 (?town . ?rest-1))
               (address ?person-2 (?town . ?rest-2))
               (not (same ?person-1 ?person-2))))
    
    specifies that two people live near each other if they live in the same town. The final not clause prevents the rule from saying that all people live near themselves. The same relation is defined by a very simple rule: [*]

    (rule (same ?x ?x))
    

    The following rule declares that a person is a ``wheel'' in an organization if he supervises someone who is in turn a supervisor:

    (rule (wheel ?person)
          (and (supervisor ?middle-manager ?person)
               (supervisor ?x ?middle-manager)))
    

    The general form of a rule is

    (rule conclusion body)
    
    where conclusion is a pattern and body is any query.[*] We can think of a rule as representing a large (even infinite) set of assertions, namely all instantiations of the rule conclusion with variable assignments that satisfy the rule body. When we described simple queries (patterns), we said that an assignment to variables satisfies a pattern if the instantiated pattern is in the data base. But the pattern needn't be explicitly in the data base as an assertion. It can be an implicit assertion implied by a rule. For example, the query

    (lives-near ?x (Bitdiddle Ben))
    
    results in

    (lives-near (Reasoner Louis) (Bitdiddle Ben))
    (lives-near (Aull DeWitt) (Bitdiddle Ben))
    
    To find all computer programmers who live near Ben Bitdiddle, we can ask

    (and (job ?x (computer programmer))
         (lives-near ?x (Bitdiddle Ben)))
    

    As in the case of compound procedures, rules can be used as parts of other rules (as we saw with the lives-near rule above) or even be defined recursively. For instance, the rule

    (rule (outranked-by ?staff-person ?boss)
          (or (supervisor ?staff-person ?boss)
              (and (supervisor ?staff-person ?middle-manager)
                   (outranked-by ?middle-manager ?boss))))
    
    says that a staff person is outranked by a boss in the organization if the boss is the person's supervisor or (recursively) if the person's supervisor is outranked by the boss.

    Exercise. Define a rule that says that person 1 can replace person 2 if either person 1 does the same job as person 2 or someone who does person 1's job can also do person 2's job, and if person 1 and person 2 are not the same person. Using your rule, give queries that find the following:

    aall people who can replace Cy D. Fect;

    ball people who can replace someone who is being paid more than they are, together with the two salaries.

    Exercise. Define a rule that says that a person is a ``big shot'' in a division if the person works in the division but does not have a supervisor who works in the division.

    Exercise. Ben Bitdiddle has missed one meeting too many. Fearing that his habit of forgetting meetings could cost him his job, Ben decides to do something about it. He adds all the weekly meetings of the firm to the Microshaft data base by asserting the following:

    (meeting accounting (Monday 9am))
    (meeting administration (Monday 10am))
    (meeting computer (Wednesday 3pm))
    (meeting administration (Friday 1pm))
    
    Each of the above assertions is for a meeting of an entire division. Ben also adds an entry for the company-wide meeting that spans all the divisions. All of the company's employees attend this meeting.
    (meeting whole-company (Wednesday 4pm))
    


    a. On Friday morning, Ben wants to query the data base for all the meetings that occur that day. What query should he use?


    b. Alyssa P. Hacker is unimpressed. She thinks it would be much more useful to be able to ask for her meetings by specifying her name. So she designs a rule that says that a person's meetings include all whole-company meetings plus all meetings of that person's division. Fill in the body of Alyssa's rule.

    (rule (meeting-time ?person ?day-and-time)
          rule-body)
    

    c. Alyssa arrives at work on Wednesday morning and wonders what meetings she has to attend that day. Having defined the above rule, what query should she make to find this out?

    Exercise. By giving the query

    (lives-near ?person (Hacker Alyssa P))
    
    Alyssa P. Hacker is able to find people who live near her, with whom she can ride to work. On the other hand, when she tries to find all pairs of people who live near each other by querying

    (lives-near ?person-1 ?person-2)
    
    she notices that each pair of people who live near each other is listed twice; for example,

    (lives-near (Hacker Alyssa P) (Fect Cy D))
    (lives-near (Fect Cy D) (Hacker Alyssa P))
    
    Why does this happen? Is there a way to find a list of people who live near each other, in which each pair appears only once? Explain.

    Logic as programs

    We can regard a rule as a kind of logical implication: If an assignment of values to pattern variables satisfies the body, then it satisfies the conclusion. Consequently, we can regard the query language as having the ability to perform logical deductions based upon the rules. As an example, consider the append operation described at the beginning of section [*]. As we said, append can be characterized by the following two rules:

    To express this in our query language, we define two rules for a relation

    (append-to-form x y z)
    
    which we can interpret to mean ``x and y append to form z'':

    (rule (append-to-form () ?y ?y))
    
    (rule (append-to-form (?u . ?v) ?y (?u . ?z))
          (append-to-form ?v ?y ?z))
    
    The first rule has no body, which means that the conclusion holds for any value of ?y. Note how the second rule makes use of dotted-tail notation to name the car and cdr of a list.

    Given these two rules, we can formulate queries that compute the append of two lists:

    ;;; Query input:
    (append-to-form (a b) (c d) ?z)
    ;;; Query results:
    (append-to-form (a b) (c d) (a b c d))
    
    What is more striking, we can use the same rules to ask the question ``Which list, when appended to (a b), yields (a b c d)?'' This is done as follows:

    ;;; Query input:
    (append-to-form (a b) ?y (a b c d))
    ;;; Query results:
    (append-to-form (a b) (c d) (a b c d))
    
    We can also ask for all pairs of lists that append to form (a b c d):

    ;;; Query input:
    (append-to-form ?x ?y (a b c d))
    ;;; Query results:
    (append-to-form () (a b c d) (a b c d))
    (append-to-form (a) (b c d) (a b c d))
    (append-to-form (a b) (c d) (a b c d))
    (append-to-form (a b c) (d) (a b c d))
    (append-to-form (a b c d) () (a b c d))
    

    The query system may seem to exhibit quite a bit of intelligence in using the rules to deduce the answers to the queries above. Actually, as we will see in the next section, the system is following a well-determined algorithm in unraveling the rules. Unfortunately, although the system works impressively in the append case, the general methods may break down in more complex cases, as we will see in section [*].

    Exercise. The following rules implement a next-to relation that finds adjacent elements of a list:

    (rule (?x next-to ?y in (?x ?y . ?u)))
    
    

    (rule (?x next-to ?y in (?v . ?z)) (?x next-to ?y in ?z))

    What will the response be to the following queries?

    (?x next-to ?y in (1 (2 3) 4))
    
    

    (?x next-to 1 in (2 1 3 1))

    Exercise. Define rules to implement the last-pair operation of exercise [*], which returns a list containing the last element of a nonempty list. Check your rules on queries such as (last-pair (3) ?x), (last-pair (1 2 3) ?x), and (last-pair (2 ?x) (3)). Do your rules work correctly on queries such as (last-pair ?x (3))$\,$?

    Exercise. The following data base (see Genesis 4) traces the genealogy of the descendants of Ada back to Adam, by way of Cain:

    (son Adam Cain)
    (son Cain Enoch)
    (son Enoch Irad)
    (son Irad Mehujael)
    (son Mehujael Methushael)
    (son Methushael Lamech)
    (wife Lamech Ada)
    (son Ada Jabal)
    (son Ada Jubal)
    
    Formulate rules such as ``If S is the son of F, and F is the son of G, then S is the grandson of G'' and ``If W is the wife of M, and S is the son of W, then S is the son of M'' (which was supposedly more true in biblical times than today) that will enable the query system to find the grandson of Cain; the sons of Lamech; the grandsons of Methushael. (See exercise [*] for some rules to deduce more complicated relationships.)  

    Previous: Logic Programming Next: How the Query System Works

      webmaster@arsdigita.org