
The Turing test is a proposal for a test of a machine's ability to demonstrate intelligence. Described by Alan Turing in the 1950 paper "Computing Machinery and Intelligence", it proceeds as follows: a human judge engages in a natural language conversation with one human and one machine, each of which tries to appear human. All participants are placed in isolated locations. If the judge cannot reliably tell the machine from the human, the machine is said to have passed the test. In order to test the machine's intelligence rather than its ability to render words into audio, the conversation is limited to a text-only channel such as a computer keyboard and screen.[1]
While the field of artificial intelligence was founded in 1956,[2] its philosophical roots extend back considerably further. The question of whether or not it is possible for machines to think has a long history, which is firmly entrenched in the distinction between dualist and materialist views of the mind. From the perspective of dualism, the mind is non-physical (or, at the very least, has non-physical properties[3]) and, therefore, cannot be explained in purely physical terms. The materialist perspective, on the other hand, argues that the mind can be explained physically, and thus leaves open the possibility of minds that are artificially produced.[4]
In 1936, philosopher Alfred Ayer considered the standard philosophical question of other minds: how do we know that other people have the same conscious experiences that we do? In his book Language, Truth and Logic Ayer suggested a protocol to distinguish between a conscious man and an unconscious machine: "The only ground I can have for asserting that an object which appears to be conscious is not really a conscious being, but only a dummy or a machine, is that it fails to satisfy one of the empirical tests by which the presence or absence of consciousness is determined".[5] This suggestion is very similar to the Turing test, but it is not certain that Ayer's popular philosophical classic was familiar to Turing.
Researchers in Britain had been exploring "machine intelligence" for up to ten years prior to 1956. It was a common topic among the members of the Ratio Club, an informal group of British cybernetics and electronics researchers that included Alan Turing.[6]
Turing in particular had been tackling the notion of machine intelligence since at least 1941,[7] and one of the earliest-known mentions of "computer intelligence" was made by him in 1947.[8] In Turing's report, "Intelligent Machinery", he investigated "the question of whether or not it is possible for machinery to show intelligent behaviour"[9] and, as part of that investigation, proposed what may be considered the forerunner to his later tests:
It is not difficult to devise a paper machine which will play a not very bad game of chess.[10] Now get three men as subjects for the experiment. A, B and C. A and C are to be rather poor chess players, B is the operator who works the paper machine. [...] Two rooms are used with some arrangement for communicating moves, and a game is played between C and either A or the paper machine. C may find it quite difficult to tell which he is playing.
Thus, by the time Turing published "Computing Machinery and Intelligence", he had been considering the possibility of artificial intelligence for many years. This, however, was the first published paper[11] by Turing to focus exclusively on the notion.
Turing begins his 1950 paper with the claim "I propose to consider the question 'Can machines think?'"[12] As he highlights, the traditional approach to such a question is to start with definitions, defining both the terms "machine" and "intelligence". Turing, however, chooses not to do so; instead, he replaces the question with a new one, "which is closely related to it and is expressed in relatively unambiguous words".[12] In essence, he proposes to change the question from "Do machines think?" to "Can machines do what we (as thinking entities) can do?"[13] The advantage of the new question, Turing argues, is that it draws "a fairly sharp line between the physical and intellectual capacities of a man".[14]
To demonstrate this approach, Turing proposes a test inspired by a party game known as the "Imitation Game", in which a man and a woman go into separate rooms, and guests try to tell them apart by writing a series of questions and reading the typewritten answers sent back. In this game, both the man and the woman aim to convince the guests that they are the other. Turing proposes recreating the game as follows:
We now ask the question, "What will happen when a machine takes the part of A in this game?" Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman? These questions replace our original, "Can machines think?"[15]
Later in the paper, Turing suggests an "equivalent" alternative formulation involving a judge conversing only with a computer and a man.[16] Neither of these formulations precisely matches the version of the Turing Test that is more generally known today. Turing proposed a third in 1952. In this version, which Turing discussed in a BBC radio broadcast, a jury asks questions of a computer, and the role of the computer is to make a significant proportion of the jury believe that it is really a man.[17]
Turing's paper considered nine putative objections, which include all the major arguments against artificial intelligence that have been raised in the years since his paper was first published. (See Computing Machinery and Intelligence.)[18]
Blay Whitby lists four major turning points in the history of the Turing Test — the publication of "Computing Machinery and Intelligence" in 1950, the announcement of Joseph Weizenbaum's ELIZA in 1966, Kenneth Colby's creation of PARRY, which was first described in 1972, and the Turing Colloquium in 1990.[19]
ELIZA works by examining a user's typed comments for keywords. If a keyword is found, a rule is applied which transforms the user's comments, and the resulting sentence is returned. If a keyword is not found, ELIZA responds with either a generic riposte or by repeating one of the earlier comments.[20] In addition, Weizenbaum developed ELIZA to replicate the behaviour of a Rogerian psychotherapist, allowing ELIZA to be "free to assume the pose of knowing almost nothing of the real world."[21] With these techniques, Weizenbaum's program was able to fool some people into believing that they were talking to a real person, with some subjects being "very hard to convince that ELIZA [...] is not human."[21] Thus, ELIZA is claimed by many to be one of the programs (perhaps the first) able to pass the Turing Test.[22][21]
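The keyword-and-transformation loop described above can be sketched in a few lines. This is a minimal illustration, not Weizenbaum's program: the three rules, the pronoun "reflection" table, the generic replies and the 50/50 choice between a generic riposte and recalling an earlier comment are all hypothetical stand-ins for ELIZA's much larger script.

```python
import random
import re

# Hypothetical keyword rules in the spirit of ELIZA's Rogerian script.
# Each rule pairs a keyword pattern with a template for the transformed reply.
RULES = [
    (re.compile(r"\bi am ([^.!?]*)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bi feel ([^.!?]*)", re.IGNORECASE), "Do you often feel {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]

# Replies used when no keyword is found (illustrative, not ELIZA's own).
GENERIC = ["Please go on.", "I see.", "How does that make you feel?"]

# Swap first- and second-person words so a captured fragment reads naturally.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are", "you": "I", "your": "my"}


def reflect(fragment):
    """Rewrite a captured fragment from the user's point of view to the program's."""
    return " ".join(REFLECTIONS.get(word.lower(), word) for word in fragment.split())


def respond(text, history, rng=random):
    """Return a reply; `history` keeps comments that matched a keyword."""
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            history.append(text)
            return template.format(*(reflect(group) for group in match.groups()))
    # No keyword found: either repeat an earlier comment or give a generic riposte.
    if history and rng.random() < 0.5:
        return f'Earlier you mentioned "{history[0]}". Can you say more?'
    return rng.choice(GENERIC)
```

For example, `respond("I feel tired of my job", [])` returns "Do you often feel tired of your job?", which shows how a purely textual rule can produce an apparently attentive reply without any model of what the words mean.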
Colby's PARRY has been described as "ELIZA with attitude":[23] it attempts to model the behaviour of a paranoid schizophrenic, using a similar (if more advanced) approach to that employed by Weizenbaum. In order to validate the work, PARRY was tested in the early 1970s using a variation of the Turing Test. A group of experienced psychiatrists analysed a combination of real patients and computers running PARRY through teletype machines. Another group of 33 psychiatrists were shown transcripts of the conversations. The two groups were then asked to identify which of the "patients" were human and which were computer programs.[24] The psychiatrists were only able to make the correct identification 48 per cent of the time — a figure consistent with random guessing.[25]
While neither ELIZA nor PARRY was able to pass a strict Turing Test, they, and software like them, suggested that software might be written that was able to do so. More importantly, they suggested that such software might involve little more than databases and the application of simple rules.
John Searle's 1980 paper Minds, Brains, and Programs proposed an argument against the Turing Test known as the "Chinese room" thought experiment. Searle argued that software (such as ELIZA) could pass the Turing Test simply by manipulating symbols of which it had no understanding. Without understanding, such programs could not be described as "thinking" in the same sense people do. Therefore, Searle concludes, the Turing Test cannot prove that a machine can think, contrary to Turing's original proposal.[26]
Arguments such as that proposed by Searle and others working on the philosophy of mind sparked off a more intense debate about the nature of intelligence, the possibility of intelligent machines and the value of the Turing test that continued through the 1980s and 1990s.[27]
1990 was the fortieth anniversary of the first publication of Turing's "Computing Machinery and Intelligence" paper, and thus saw renewed interest in the test. Two significant events occurred in that year: the first was the Turing Colloquium, which was held at the University of Sussex in April, and brought together academics and researchers from a wide variety of disciplines to discuss the Turing Test in terms of its past, present and future; the second was the formation of the annual Loebner Prize competition.
The Loebner Prize was instigated by Hugh Loebner under the auspices of the Cambridge Center for Behavioral Studies in Massachusetts, United States, with the first competition held in November 1991.[28] As Loebner described it, the competition was created to advance the state of AI research, at least in part because, while the Turing Test had been discussed for many years, "no one had taken steps to implement it."[29] The Loebner Prize has three awards: a first prize of $100,000 and a gold medal is awarded to the first program that passes the "unrestricted" Turing test; a second prize of $25,000 is awarded to the first program that passes the "restricted" version; and a sum of $3,000 (previously $2,000) is awarded to the "most human-like" program entered each year. In 2008, neither the first nor the second prize was awarded.
Besides its annual award for the computer system that, in the judges' opinions, demonstrates the "most human" conversational behaviour (with learning AI Jabberwacky winning in 2005 and 2006, and A.L.I.C.E. before that), the Loebner Prize offers an additional prize for a system that, in the judges' opinion, passes a Turing test. This second prize has not yet been awarded. The creators of Jabberwacky have proposed a personal Turing Test: the ability to pass the imitation test while attempting specifically to imitate the human player, with whom the machine will have conversed at length before the test.[30]
The directive of the competition is to stay as close as possible to Turing's original statements in his 1950 paper, so that it can be ascertained whether any machines are presently close to "passing the test". An academic meeting discussing the Turing test, organised by the Society for the Study of Artificial Intelligence and the Simulation of Behaviour, is being held in parallel at the same venue.
The Loebner Prize does not usually force programs to demonstrate a full range of intelligence; the tests are reserved for chatterbot programs, or Artificial Conversational Entities (ACEs). Even in this limited form, however, the tests are still very rigorous. The 2008 Loebner Prize abided closely by Turing's original concept: each interrogator was allowed only five minutes of conversation with each unseen and unheard entity, rather than the twenty minutes or more allowed in previous Loebner contests held since 2004.
The Loebner Prize led to renewed discussion of both the viability of the Turing Test and the aim of developing artificial intelligences that could pass it. The Economist, in an article entitled "Artificial Stupidity", commented that the winning entry from the first Loebner prize won, at least in part, because it was able to "imitate human typing errors".[31] (Turing had considered the possibility that computers could be identified by their lack of errors, and had suggested that they be programmed to add errors into their output, so as to be better "players" of the game.)[32] The issue that The Economist raised was already well established in the literature: perhaps we do not really need the types of computers that could pass the Turing Test; perhaps trying to pass the Turing Test is nothing more than a distraction from more fruitful lines of research.[33]
A second issue has also become apparent: by providing rules which restrict the abilities of the interrogators to ask questions, and by using comparatively "unsophisticated" interrogators, the Turing Test can be passed through "trickery" rather than intelligence.[34]
There are at least three primary versions of the Turing test, two of which are offered in "Computing Machinery and Intelligence" and one which Saul Traiger describes as the "Standard Interpretation".[35] While there is some debate as to whether or not the "Standard Interpretation" is that described by Turing or, instead, based on a misreading of his paper, these three versions are not regarded as equivalent,[35] and their strengths and weaknesses are distinct.
Turing, as we have seen, described a simple party game involving three players. Player A is a man, player B a woman and player C (who plays the role of the interrogator) of either gender. In the Imitation Game, player C is unable to see either player A or player B, and can only communicate with them through written notes. By asking questions of player A and player B, player C tries to determine which of the two is the man and which is the woman. Player A's role is to trick the interrogator into making the wrong decision, while player B attempts to assist the interrogator in making the right one.[36]
In what S. G. Sterrett refers to as the "Original Imitation Game Test",[37] Turing proposes that the role of player A be filled by a computer. The computer's task is thus to pretend to be a woman and attempt to trick the interrogator into making an incorrect evaluation. The success of the computer is determined by comparing the outcome of the game when player A is a computer against the outcome when player A is a man. If, as Turing puts it, "the interrogator decide[s] wrongly as often when the game is played [with the computer] as he does when the game is played between a man and a woman"[14], it may be argued that the computer is intelligent.
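The frequency comparison at the heart of this version can be made concrete with a short sketch. It assumes each game is recorded as a boolean (True when the interrogator decided wrongly); the function names and the optional `tolerance` parameter are hypothetical, since Turing gives no numerical margin, and with the default tolerance of zero the check reads "as often" as "at least as often".

```python
from typing import Sequence


def wrong_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of games in which the interrogator decided wrongly."""
    if not outcomes:
        raise ValueError("need at least one recorded game")
    return sum(outcomes) / len(outcomes)


def passes_oig(machine_games: Sequence[bool],
               baseline_games: Sequence[bool],
               tolerance: float = 0.0) -> bool:
    """Compare games with a computer as player A against games with a man as
    player A; the computer passes if the interrogator is fooled at least as
    often (minus an assumed, hypothetical tolerance)."""
    return wrong_rate(machine_games) >= wrong_rate(baseline_games) - tolerance
```

Note that the criterion is comparative: a computer that fools the interrogator in only a quarter of its games still passes if a man in the same role fares no better, which is the feature Sterrett emphasises when contrasting this test with the standard interpretation.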
The second version appears later in Turing's 1950 paper. As with the Original Imitation Game Test, the role of player A is performed by a computer, the difference being that the role of player B is now to be performed by a man rather than a woman.
"Let us fix our attention on one particular digital computer C. Is it true that by modifying this computer to have an adequate storage, suitably increasing its speed of action, and providing it with an appropriate programme, C can be made to play satisfactorily the part of A in the imitation game, the part of B being taken by a man?"
—Turing 1950, p. 442
In this version, both player A (the computer) and player B are trying to trick the interrogator into making an incorrect decision.[38]
Common understanding has it that the purpose of the Turing Test is not specifically to determine whether a computer is able to fool an interrogator into believing that it is a woman, but rather whether or not a computer could imitate a human.[38] While there is some dispute as to whether or not this interpretation was intended by Turing — Sterrett believes that it was[37] and thus conflates the second version with this one, while others, such as Traiger, do not[35] — this has nevertheless led to what can be viewed as the "standard interpretation". In this version, player A is a computer and player B a person of either gender. The role of the interrogator is not to determine which is male and which is female, but which is a computer and which is a human.[39]
There has arisen some controversy over which of the alternative formulations of the test Turing intended.[37] Sterrett argues that two distinct tests can be extracted from his 1950 paper and that, pace Turing's remark, they are not equivalent. The test that employs the party game and compares frequencies of success is referred to as the "Original Imitation Game Test", whereas the test consisting of a human judge conversing with a human and a machine is referred to as the "Standard Turing Test"; note that Sterrett equates this with the "standard interpretation" rather than with the second version of the imitation game. Sterrett agrees that the Standard Turing Test (STT) has the problems that its critics cite but feels that, in contrast, the Original Imitation Game Test (OIG Test) so defined is immune to many of them, due to a crucial difference: unlike the STT, it does not make similarity to human performance the criterion, even though it employs human performance in setting a criterion for machine intelligence. A man can fail the OIG Test, but it is argued that it is a virtue of a test of intelligence that failure indicates a lack of resourcefulness: the OIG Test requires the resourcefulness associated with intelligence and not merely "simulation of human conversational behaviour". The general structure of the OIG Test could even be used with non-verbal versions of imitation games.[40]
Still other writers[41] have interpreted Turing as proposing that the imitation game itself is the test, without specifying how to take into account Turing's statement that the test that he proposed using the party version of the imitation game is based upon a criterion of comparative frequency of success in that imitation game, rather than a capacity to succeed at one round of the game.
Turing never makes clear whether or not the interrogator in his tests is aware that one of the participants is a computer. To return to the Original Imitation Game, he states only that player A is to be replaced with a machine, not that player C is to be made aware of this replacement.[14] When Colby, FD Hilf, S Weber and AD Kramer tested PARRY, they did so by assuming that the interrogators did not need to know that one or more of those being interviewed was a computer during the interrogation.[42] As Ayse Saygin and others highlight, however, this makes a big difference to the implementation and outcome of the test.[43]
The power of the Turing test derives from the fact that it is possible to talk about anything. Turing wrote that "the question and answer method seems to be suitable for introducing almost any one of the fields of human endeavor that we wish to include."[44] John Haugeland adds that "understanding the words is not enough; you have to understand the topic as well."[45]
In order to pass a well-designed Turing test, the machine must use natural language, reason, have knowledge and learn. The test can be extended to include video input, as well as a "hatch" through which objects can be passed: this would force the machine to demonstrate the skill of vision and robotics as well. Together, these represent almost all of the major problems of artificial intelligence.[46]
For all its strengths and its fame, the test has been criticised on several grounds.
The Turing Test is explicitly anthropomorphic, testing only whether or not the computer resembles a human being, not if it is generally "intelligent" or "sentient". It fails to test for general intelligence in two ways:
Stuart J. Russell and Peter Norvig argue that the anthropomorphism of the test prevents it from being truly useful for the task of engineering intelligent machines. "Aeronautical engineering texts," they write by way of analogy, "do not define the goal of their field as 'making machines that fly so exactly like pigeons that they can fool other pigeons.'"[47] Because of this impracticality, trying to pass the Turing test in its full generality is not, as of 2005, an active focus of much mainstream academic or commercial effort. Current research in AI-related fields is aimed at more modest and specific goals.
Russell and Norvig note that "AI researchers have devoted little attention to passing the Turing Test",[48] since there are easier ways to test their programs, as, for example, by giving them a task directly rather than through the roundabout method of first posing a question in a chat room populated by machines and people. Turing never intended his test to be used as a real, day-to-day measure of intelligence in AI programs; he wanted to provide a clear and understandable example in aid of the discussion of the philosophy of artificial intelligence.[49]
The test is also explicitly behaviourist or functionalist: it only tests how the subject acts. A machine passing the test may be able to simulate human conversational behaviour merely by following some cleverly devised rules. Two famous examples of this line of argument against the Turing test are John Searle's Chinese Room argument and Ned Block's Blockhead argument.
Even if the Turing test is a good operational definition of intelligence, it may not indicate that the machine has consciousness, or that it has intentionality. Perhaps intelligence and consciousness, for example, are such that neither one necessarily implies the other, in which case the Turing test might fail to capture one of the key differences between intelligent machines and intelligent people.
Turing predicted that machines would eventually be able to pass the test; in fact, he estimated that by the year 2000, machines with 10⁹ bits (about 119.2 MiB) of memory would be able to fool thirty per cent of human judges in a five-minute test. He also predicted that people would then no longer consider the phrase "thinking machine" contradictory. He further predicted that machine learning would be an important part of building powerful machines, a claim considered plausible by contemporary researchers in artificial intelligence.
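The memory figure in Turing's prediction converts as follows, assuming 8 bits per byte and a mebibyte (MiB) of 2^20 = 1,048,576 bytes; in decimal units the same capacity is 125 MB.

```latex
10^{9}\ \text{bits}
  = \frac{10^{9}}{8}\ \text{bytes}
  = 1.25 \times 10^{8}\ \text{bytes}
  = \frac{1.25 \times 10^{8}}{2^{20}}\ \text{MiB}
  \approx 119.2\ \text{MiB}
```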
By extrapolating an exponential growth of technology over several decades, futurist Raymond Kurzweil predicted that Turing test-capable computers would be manufactured around the year 2020, roughly speaking.[50] See the "Moore's Law" article and the references therein for discussions of the plausibility of this argument.
The Long Bet Project hosts a $10,000 bet between Mitch Kapor (pessimist) and Kurzweil (optimist) on whether a computer will pass a Turing Test by the year 2029. The bet specifies the conditions in some detail.[51]
Numerous other versions of the Turing test, including those expounded above, have been mooted through the years.
A modification of the Turing test wherein the objective or one or more of the roles have been reversed between machines and humans is termed a reverse Turing test. An example is implied in the work of psychoanalyst Wilfred Bion,[52] who was particularly fascinated by the "storm" that resulted from the encounter of one mind with another. Carrying this idea forward, R. D. Hinshelwood[53] described the mind as a "mind recognizing apparatus", noting that this might be some sort of "supplement" to the Turing test. The challenge would be for the computer to be able to determine if it were interacting with a human or another computer. This is an extension of the original question that Turing attempted to answer but would, perhaps, offer a high enough standard to define a machine that could "think" in a way that we typically define as characteristically human.
CAPTCHA is a form of reverse Turing test. Before being allowed to perform some action on a website, the user is presented with alphanumerical characters in a distorted graphic image and asked to type them out. This is intended to prevent automated systems from abusing the site. The rationale is that software sufficiently sophisticated to read and reproduce the distorted image accurately does not exist (or is not available to the average user), so any system able to do so is likely to be a human. The implication would appear to be (although it is not necessarily so) that artificial intelligence has not as yet been achieved.
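The challenge-and-response flow described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `CaptchaStore`, the six-character alphabet of capital letters and digits, the case-insensitive comparison and the one-attempt-per-challenge rule are hypothetical choices, and the step that renders the text as a distorted image is deliberately left out.

```python
import secrets
import string

# Hypothetical character set for challenges: capital letters and digits.
ALPHABET = string.ascii_uppercase + string.digits


class CaptchaStore:
    """Hypothetical server-side store of outstanding CAPTCHA challenges."""

    def __init__(self, length: int = 6):
        self.length = length
        self._pending = {}  # challenge id -> expected text

    def issue(self):
        """Create a challenge; returns (challenge id, text to be rendered)."""
        challenge_id = secrets.token_hex(8)
        text = "".join(secrets.choice(ALPHABET) for _ in range(self.length))
        self._pending[challenge_id] = text
        # A real system would draw `text` as a distorted image at this point;
        # that rendering step is omitted from this sketch.
        return challenge_id, text

    def verify(self, challenge_id: str, answer: str) -> bool:
        """Check a typed answer; each challenge may be answered only once."""
        expected = self._pending.pop(challenge_id, None)
        return expected is not None and answer.strip().upper() == expected
```

Discarding a challenge after a single attempt is one way to stop an automated client from simply guessing repeatedly; the test itself rests entirely on the assumption stated in the text, that software cannot read the distorted image.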
Another variation is described as the subject matter expert Turing test, where a machine's response cannot be distinguished from that of an expert in a given field. As brain and body scanning techniques improve, it may also be possible to replicate the essential data elements of a person to a computer system.
The Immortality-test variation of the Turing test would determine if a person's essential character is reproduced with enough fidelity to make it impossible to distinguish a reproduction of a person from the original person.
The Minimum Intelligent Signal Test, proposed by Chris McKinstry, is another variation of Turing's test, where only binary responses are permitted. It is typically used to gather statistical data against which the performance of artificial intelligence programs may be measured.
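A minimal sketch of how binary responses lend themselves to statistical scoring, assuming each item is a yes/no proposition paired with the answer most human respondents gave. The `Item` type, the `mist_score` function and the sample propositions are hypothetical illustrations, not McKinstry's actual item corpus.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Item:
    """A hypothetical MIST-style item: a proposition and the human consensus
    answer (True for yes/true, False for no/false)."""
    prompt: str
    human_answer: bool


def mist_score(items: Sequence[Item], answer_fn: Callable[[str], bool]) -> float:
    """Fraction of items on which the program's binary answer matches the
    human consensus."""
    if not items:
        raise ValueError("no items to score")
    agreements = sum(answer_fn(item.prompt) == item.human_answer for item in items)
    return agreements / len(items)


# Illustrative items only.
ITEMS = [
    Item("Is water wet?", True),
    Item("Is a mouse larger than an elephant?", False),
]
```

Because every answer is binary, a program that answers "yes" to everything scores 0.5 on a balanced item set such as `ITEMS`, giving a chance baseline against which the performance of other programs can be compared.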
Yet another variation is the Meta Turing test, in which the subject being tested (say, a computer) is classified as intelligent if it has itself created something that it wants to test for intelligence.
All text is available under the terms of the GNU Free Documentation License
This page is a cache of Wikipedia.