ABSTRACT
Use of the polygraph is distinctly different when testing an event-specific crime versus trying to identify whether any crime ever occurred.[1] Event-specific crimes provide details that facilitate wording behaviorally specific test questions that are less likely to be misunderstood by the test subject. Polygraph research also shows that optimal test accuracy occurs when only one issue is included in the test, so that truth and deception cannot be mixed. In screening, however, the consumer wants to identify every negative behavior possible, resulting in multiple, broadly worded, and less optimal relevant questions. The challenge is to balance optimal polygraph testing against time and cost limitations while meeting consumer expectations.
Typically, police agencies request testing applicants over multiple issues during the employment process, thinking that the polygraph is a “lie detector” and that multiple-issue accuracy is equivalent to single-issue testing. Police agencies often fail to grasp the limits on polygraph accuracy and instead treat admissions made by the test subject during the process as more important. Thus, most polygraph examiners knowingly choose less accurate multiple-issue test formats with opaquely worded questions for efficiency in meeting agency demands. This proposal seeks to identify testing processes that make efficient use of time while continuing to provide the accuracy expected from a single-issue, event-specific crime polygraph.
The challenge with applicant screening and polygraph is the impossible dream of knowing everything about an applicant’s background. Further, limits on applicant memory and the inability to establish ground truth in a screening context make such an outcome unrealistic. Polygraph accuracy research for screening is normally generalized from resolved field crimes and from mock crimes in laboratory experiments where ground truth can be reasonably known. This proposal recognizes that computer scoring algorithms have shown a high level of agreement with known truth and deception in research projects over known crimes. That correlation suggests evaluating the efficacy of screening tests with computer scoring algorithms in order to infer whether single-issue or multiple-issue test formats are more optimal.
This proposal supports the use of a single-issue test format as more time efficient when testing several screening topics in a polygraph session, while providing the accuracy expected from specific crime event tests. Use of scoring algorithms is expected to produce a high level of agreement with single-issue screening tests, indicating a more optimal approach to accuracy while producing an expected decrease in the variance seen with visual hand scoring. Conversely, multiple-issue screening tests assessed with computer scoring algorithms are shown to be a less-than-optimal approach where polygraph accuracy is important and where the need for successive hurdles testing should be minimized.
DISCUSSION ON POLYGRAPH DEVELOPMENT
John Larson, the first police officer with a Ph.D., introduced the cardio-pneumo psychogram to the Berkeley Police Department in 1921 for use in criminal investigations. Dr. Larson used his device to solve several high-profile crimes at a time when police interrogation was often called the “third degree.” Nicknamed the “lie detector” by the press, Larson’s device was later named by Encyclopedia Britannica as one of the 325 greatest inventions of all time. The device recorded respiratory and cardiovascular activity while a subject was asked a series of questions. Unfortunately, Larson had a less than positive experience in 1923 with the use of his device in the College Hall case. [2]
At this stage of infancy, in the College Hall case Larson constructed a test format with nineteen questions covering multiple crimes and other evocative questions from the case. Larson later obtained a confession from a female suspect whom he ultimately married. Neither Larson’s test format nor marrying a potential suspect he was investigating would be acceptable by today’s standards. Yet this unacceptable test format contains many of the design flaws retained for years in the Relevant/Irrelevant (R/I) test format that soon followed. The R/I format limited the number of relevant test questions and included irrelevant or neutral questions. It was not until the late 1940s, with the introduction of a comparison question, that there was some attempt to gauge the physiological arousal occurring at the relevant questions against another question in the test.
Recognizing these problems, Cleve Backster in 1960 introduced a fully structured approach to polygraph testing with his Zone Comparison You Phase format, reducing professional chaos in the polygraph world. Backster taught that only a single-issue test with numerical scoring provided acceptable accuracy for credibility assessment. The You Phase format contained two relevant questions, similarly worded, over one single issue. The relevant questions were bracketed by three comparison questions that were separated from the test issue, and the test subject was expected to lie to the comparison questions. The You Phase format required three presentations, or charts, of all the questions to complete a single test. Backster’s numerical scoring individually evaluated respiration, electrodermal activity, and cardiovascular response at each relevant question as a method to eliminate subjectivity in decision making. The numerical scores were then totaled, and cut scores were provided that facilitated opinions of Deception, No Deception, or an Inconclusive result.[3] (See Appendix I for a sample test format)
While there were critics of Backster’s Zone Comparison approach, it is beyond question that the polygraph profession fully embraced the concept of testing one known crime issue at a time using multiple question presentations for that one crime. In fact, several variations on Backster’s concept emerged, imitating it with two, and sometimes three, relevant question iterations along with similar methods for visually analyzing and numerically scoring test data.
Despite this paradigm shift in the polygraph profession toward single-issue testing as the more accurate methodology, polygraph testing on issues where no known crime existed was deemed somehow different. The American Polygraph Association (APA) defines a screening examination as one where there is no reported incident or allegation, and which may be conducted with single or multiple issues. The APA placed few requirements on screening tests, offering only that successive tests using validated methods must be used to resolve deception.[4]
Krapohl and Stern (2003) describe the principles required for screening and the shortcomings that often occur.[5] They noted that one of the most common problems is that “some polygraphers never move beyond the multiple-issue screening” and that “this shortcut saves time and effort but erodes the validity of the process.” It has been this author’s experience that the majority of examiners never move beyond the multiple-issue test, and that this is the rule, not the exception. Secondly, Krapohl and Stern noted that one of the principles for successive hurdles testing is:
…best method to change is to alter the polygraph technique between the screening and diagnostic phases, so that the weaknesses of one method are not the same as those of the other method. Examiners who do follow up testing but who use the same testing technique for all subsequent tests, have violated this principle.
The author notes that the Directed Lie Screening Test (DLST) is one of the most commonly used test formats for screening and that it violates this important second principle.
HAND SCORING AND COMPUTER ALGORITHMS FOR TEST DECISIONS
Polygraph test questions are normally paced at twenty-five-second intervals while the instrument continuously records respiratory activity, electrodermal activity (sweat glands), and cardiovascular activity as relative blood pressure and heart rate. A polygraph test normally consists of a presentation of the questions lasting approximately four and a half minutes, repeated over a minimum number of charts. Polygraph test theory holds that a truthful subject will load more physiological arousal at the comparison questions, while the deceptive subject will load more physiological arousal at the relevant questions. Numerical scoring is accomplished by evaluating each of the three required physiological components at each relevant question against an adjacent comparison question to establish where the greater arousal occurs. Where the comparison question arousal is greater, a plus score is awarded; where the relevant question arousal is greater, a minus score is awarded.
For a single-issue test, the scores at each of the minimum six presentations of the relevant issue are aggregated into one grand total score. Respiratory criteria for arousal are generally defined as reduced activity, suppression, or slowing of rate; electrodermal arousal as a change in skin conductance from the baseline; and cardiovascular arousal as a rise in mean blood pressure from the baseline norm established during the test. Where no difference is observable between questions, or the component tracing is artifacted, no score is assigned. The quality of the physiological tracing data is always a concern, since human test subjects are unique among sources of forensic evidence in their ability to affect data quality. While Backster’s numerical hand scoring likely removed some subjectivity from decision making, it does not eliminate the variance seen in how examiners apply scoring rules. Chief among the sources of this variance is how examiners treat unstable or noisy physiological data.
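To make the aggregation concrete, the following minimal sketch (in Python, with entirely hypothetical component scores) shows how per-component scores at each presentation of a single-issue test might be summed into one grand total; it illustrates the arithmetic only and is not any fielded scoring system.

```python
# Illustrative only: hypothetical hand scores for a single-issue test.
# Each presentation of the relevant issue yields one score per component:
# +1 when the adjacent comparison question shows greater arousal,
# -1 when the relevant question shows greater arousal,
#  0 when no usable difference is observed or the tracing is artifacted.
presentations = [
    # (respiration, electrodermal, cardiovascular)
    (+1, +1,  0),
    ( 0, +1, +1),
    (+1,  0, +1),
    ( 0, +1,  0),
    (+1, +1, +1),
    ( 0,  0, +1),
]

# Single-issue rule: every component score from every presentation is
# aggregated into one grand total for the decision.
grand_total = sum(sum(components) for components in presentations)
print(f"Grand total: {grand_total:+d}")  # +11 for these hypothetical scores
```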
When conducting a multiple-issue test, scoring rules do not allow all the questions to be totaled into one grand total score. Instead, there will normally be three presentations of each question issue, with each question presented once on each of three charts. For example, in a multi-issue applicant test, relevant question R1 may cover lying on the application, R2 may cover illegal drug use, and R3 may cover committing a felony crime. Since R1, R2, and R3 are independent issues on which truth and deception could be mixed, the respective score totals for R1, R2, and R3 cannot be aggregated together.
It is intuitive that multiple-issue tests, where the diagnostic scores cannot be aggregated, will produce lower diagnostic scores. Nelson and Handler (2015) confirm this in a paper that evaluated visual hand scores from approximately 4,000 tests used in the APA meta-analytic survey of polygraph test techniques. [6] Using the Empirical Scoring System (ESS) as one example, a three-question single-issue test format (nine presentations of one issue) produced a mean score of -9 (SD = 8) for deceptive subjects and a mean score of +8 (SD = 7) for truthful subjects. A two-question single-issue test (six presentations of one issue) produced a mean score of -6 for deceptive subjects (SD = 6) and +6 for truthful subjects (SD = 6). ESS decision cut scores were recommended at +2 for truthful subjects and -4 for deceptive subjects. The mean accuracy for both the two-question and three-question tests is approximately 90%, but the three-question variation produces a larger diagnostic score, resulting in fewer inconclusive results than the two-question variation.
The multiple-issue format using the ESS produced a mean score for each individual test issue of -2 for deceptive subjects (SD = 3) and +2 for truthful subjects (SD = 3). ESS decision cut scores for multiple-issue tests require that any subtotal of -3 results in a deceptive classification, while a subtotal of +1 is required at every relevant question for a truthful decision. It should not be overlooked that truth and deception may be mixed in a multiple-issue test, which produces asymmetrical score distributions and skews the means. Even so, it is readily apparent that multiple-issue test formats produce lower diagnostic value, hovering near the decision cut points, and increase the potential for inconclusive results.
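The difference between the two decision rules can be shown in a short sketch that applies the ESS cut scores cited above (+2/-4 on the grand total for single-issue tests; any subtotal of -3 for a deceptive call and +1 on every subtotal for a truthful call on multiple-issue tests). The subtotal values in the example are hypothetical.

```python
def single_issue_decision(grand_total: int) -> str:
    # Single-issue ESS rule cited above: one aggregated grand total.
    if grand_total <= -4:
        return "Deception Indicated"
    if grand_total >= 2:
        return "No Deception Indicated"
    return "Inconclusive"

def multiple_issue_decision(subtotals: list[int]) -> str:
    # Multiple-issue ESS rule cited above: subtotals are never aggregated.
    if any(score <= -3 for score in subtotals):
        return "Deception Indicated"
    if all(score >= 1 for score in subtotals):
        return "No Deception Indicated"
    return "Inconclusive"

# Hypothetical subtotals for R1 (application), R2 (drugs), R3 (felony):
print(multiple_issue_decision([+2, -1, +3]))  # Inconclusive (R2 below +1)
print(multiple_issue_decision([+2, -3, +3]))  # Deception Indicated (R2 at -3)
print(single_issue_decision(+6))              # No Deception Indicated
```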
The APA meta-analytic survey provides support for increased test accuracy for a single-issue test format where truth and deception cannot be mixed. The evidentiary test formats produce mean accuracies above 90%, while investigative tests, with fewer constraints on test question construction, are only required to produce mean accuracies above 80%. All test formats were required to produce inconclusive rates below an arbitrarily chosen level of 20%. It is clear from the APA meta-analytic survey that more presentations, or askings, of a relevant question produce more diagnostic value as well as reducing the potential for inconclusive tests.[7]
While computer scoring algorithms could do much to eliminate the variance in visual scoring by humans, they tend to be avoided by human scorers. Human scorers seem intolerant of disagreement and point out that computers cannot currently distinguish good data from bad data that is unsuitable for analysis. For this reason, pairing human evaluators, who remove bad-quality data, with computer algorithms has been available for some time. Dollins, Krapohl and Dutton (2000) selected 97 confirmed cases where truth and deception were known in order to test agreement among five computer scoring algorithms.[8] The set included 56 deceptive tests and 41 truthful tests. When no-opinion results were excluded, the proportion of correct decisions for the algorithms ranged from .88 to .91. Lastly, there were very few incorrect classifications of deceptive subjects, with correspondingly higher misclassifications of truthful subjects.
METHODOLOGY
Introduction to the DLST
Police agencies routinely ask three to four relevant issue questions during a pre-employment polygraph, and the time allowed for polygraph is normally limited. Using a commonly accepted Zone Comparison test format, approximately twenty minutes is needed to administer three charts with three presentations of each relevant question. If the test result is truthful, no additional testing is needed. If deception is indicated, however, the deceptive result applies to all the relevant questions; there is no reliable way to discern which relevant question a subject is deceptive on unless each relevant issue is tested as a single-issue test. Yet this is often the first question asked after a deceptive test. As previously stated, most examiners skip the successive hurdles and merely guesstimate which issue a subject lied to without additional testing.
The author is very familiar with a large state police agency in the southwest that has requested confidentiality regarding its polygraph testing process; it is discussed here as Agency T. Agency T initially conducted two four-question Zone Comparison tests as part of its pre-employment applicant testing. Dissatisfied with this approach, Agency T selected a uniquely different format, the Directed Lie Screening Test (DLST), for its polygraph program. The DLST is a one-chart test in which the relevant questions are repeated multiple times within a single chart. The test is conducted with two relevant questions and two comparison questions, all of which are repeated multiple times. Agency T chose to conduct Test A with R1 on application integrity and R2 on illegal drug use. Each relevant question is repeated three times, or four times if one of the first three iterations is compromised during the test. Agency T then conducted Test B with R3 on crimes against a person and R4 on property crimes. The test format is as follows:
- Neutral
- Neutral
- Sacrifice Relevant
- Comparison 1
- Relevant 1
- Relevant 2
- Comparison 2
- Relevant 1
- Relevant 2
- Comparison 1
- Relevant 1
- Relevant 2
- Comparison 2
- Relevant 1 (if a prior asking is compromised)
- Relevant 2 (if a prior asking is compromised)
- Comparison 1 (if a previous asking is compromised)
If the applicant is truthful to both Test A and Test B, then testing is finished. If either test is inconclusive, each relevant question is reconstructed into a single-issue test built around the issue that question covered. With each DLST taking approximately six minutes to administer, the requirement for successive hurdles testing dictates considerable time spent when deception is a likely possibility or an inconclusive test occurs. As well, this process seems to violate the principle discussed by Krapohl and Stern (2003) of using a differing format for successive tests.
After years of use, this author surveyed Agency T about how the DLST was performing. Agency T reported it was necessary to conduct additional single issue successive hurdles testing approximately 50% of the time. The author noted this is consistent with the problems exhibited in the Test for Espionage and Sabotage (TES) in 1995 when the DLST format was first developed. [9]
The author notes the DLST did not originally appear to meet the required standards for inclusion in the APA meta-analytic survey. Seeking to support the DLST, authors of the meta-analytic survey conducted a study in 2012 using a polygraph class in Iraq where this author was the school director. This classroom experiment constructed a mock espionage event as a specific incident but then worded the relevant polygraph questions in a more opaque and ambiguous manner to resemble a screening test format.
During the Iraq classroom experiment, twenty-four subjects were programmed as deceptive and twenty-five subjects were programmed as truthful. However, nine tests (18.4%) were inconclusive and had to be retested during the exercise.[10] In the researchers’ review, 89.9% of the deceptive subjects were correctly identified and 79.6% of the truthful subjects were correctly identified, with an additional 8.5% of the tests determined to be inconclusive. The authors did not include the nine original inconclusive tests (18.4%) in their calculations, and the DLST was thus able to be included in the APA meta-analytic survey as a validated format. As an explanation, the researchers noted that they were approving the use of the DLST format with the caveat that field examiners have an opportunity to, and should, retest with a single issue. However, Nelson (2012) also commented that no studies have supported the hypothesis of accurate decisions when truth and deception are mixed.[11] Nelson’s comment is included below.
- “A limitation of this study is that no attempt was made to study DLST criterion accuracy at the level of the individual questions. Previous studies have not supported the hypothesis of highly accurate decisions at the level of the individual RQs.”
Research Proposal
The hypothesis suggested by this author is that use of the DLST as a single-issue screening test format will outperform its use as a multi-issue format, producing fewer inconclusive results and requiring fewer successive hurdles tests. By outperform, the author means that the DLST format will achieve accuracy similar to single-issue, event-specific Zone Comparison formats such as the Backster You Phase. Further, the single-chart DLST, requiring approximately six minutes, is more efficient than the three charts of a You Phase format at approximately four and a half minutes per chart, roughly fifteen minutes in total.
The author acknowledges the difficulty, if not impossibility, of setting up a research design for screening tests, where test subjects can only be imperfectly programmed for truth or deception. Prior research has involved programming role players to commit some behavior, such as taking pictures of a government building after telling the role player they are spies, and then asking screening questions such as, “Have you ever been involved in espionage against the United States?” Another example is to have a role player steal a particular item, such as money from a desk, and then ask screening questions such as, “Have you ever stolen from an employer?” Both examples are sub-optimal because they do not immediately provoke an episodic memory of a real crime and may inadvertently be confused with real-world violations by the role player, as occurred in the 1995 TES studies.
The ability to generalize polygraph accuracy from known crimes rests on the ability to compare test data from actual crimes with screening test results over unknown potential crimes that are, by necessity, worded more opaquely. This author accepts that all screening test questions are by their nature sub-optimal compared to questions about a known crime that can be adequately discussed. However, the author has anecdotally observed that some screening questions can be operationalized with sufficient preparation of the test subject. This proposal suggests the somewhat intuitive notion that testing one opaque and broadly worded issue is more optimal than attempting to test multiple broadly worded, opaque issues in one single test.
Use of computer scoring algorithms provides an objective and reliable means of evaluating the physiological data collected from any polygraph test. Computers do not know what questions were presented and do not see data the way a human examiner does. The main disadvantage is that computers are not able to separate high-quality polygraph data from low-quality, atypical data, or from data that has been intentionally altered by a subject attempting countermeasures. Humans remain better at evaluating data quality and determining what data should not be scored.
The author proposes using the OSS-3 computer scoring algorithm for this project. Nelson, Handler, and Krapohl (2008) appropriately state that a computer scoring algorithm can provide perfect reliability. [12] Nelson et al. describe the OSS-3 as having been tested against 292 confirmed single-issue exams and a second data set of 100 confirmed exams. The OSS-3 was reported as having balanced sensitivity and specificity and exceeded the average accuracy of ten human scorers during evaluation. OSS-3 correctly identified 91.5% of the exams, excluding inconclusive tests, with 6% of exams reported as inconclusive.
This proposal contrasts the DLST administered as a single-issue screening test (referred to here as the SIST) against the multiple-issue DLST screening test, using the OSS-3 computer scoring algorithm. The proposal suggests there will be higher agreement, or concordance, between the computer algorithm and human scorers for the single-issue SIST than for the multiple-issue DLST. Higher concordance between computer scoring algorithms and human scorers would support the inference that single-issue testing is better than multiple-issue testing.
Data Collected
Five polygraph examiners with experience conducting single-issue DLST examinations were each asked to provide sessions from ten test subjects. Participating examiners were asked to randomly select five subjects who had been truthful to all exams during their respective sessions and five subjects who had been deceptive to at least one exam during those sessions. The author notes that a subject deceptive on one exam in a session may be truthful on the other exams in that session, so more exams evaluated as truthful by the human scorers were expected in the sample.
Fifty subject sessions were received, containing 167 total single-issue exams on issues including disciplinary action at a previous job, theft, crimes against another person, and sex and pornography crimes. All of these single-issue exams were believed to have had pretest interviews that included the use of visual mind maps as an effort to reduce the ambiguity created by screening test questions. Each exam was blind reviewed by a human evaluator for quality control against the original examiner’s hand score. Twelve exams were rejected for containing significant distortions that appeared to be deliberate efforts to alter the subject’s physiology and the test decision.
This left 155 exams, with 101 reported as truthful, 43 reported as deceptive, and eleven reported as inconclusive by the human scorers. The OSS-3 algorithm was run on each exam as a single-issue exam using the Senter two-stage decision rules. Atypical data showing evidence of artifacts, such as deep breaths or movement, were removed from consideration before OSS-3 scoring.
The OSS-3 algorithm was in concordance with 40 of the 43 exams rated deceptive by the human scorers. The algorithm found the remaining three deceptive exams to be inconclusive, as they did not meet the requirement for a deceptive classification. Including the inconclusive results, the computer algorithm was in 93% concordance with the human scorers. It is noteworthy that the algorithm did not reach an opposite decision on any exam scored deceptive by examiners.
The OSS-3 was in concordance with 85 of the 101 exams rated truthful by the human examiners. It disagreed with 10 of the 101 exams, calling them deceptive, and called 6 of the 101 exams inconclusive. Excluding those 6 inconclusive exams, the OSS-3 algorithm was in concordance with the human scorers 89.5% of the time, and 84.5% of the time when the inconclusive results are included.
The human scorers evaluated 11 exams as inconclusive. The OSS-3 algorithm agreed that 1 of these exams was inconclusive, but scored 3 of the human-scored inconclusive tests as truthful and 7 as deceptive. This would suggest either that lower-quality data is unsuitable for evaluation or that algorithms make better use of such data for decisions. This is difficult to know, since ground truth is unknown in this data set.
Table 1
Concordance between examiner hand scores and OSS-3 decisions (50 test subjects, 167 total exams; 12 exams rejected for artifacts/countermeasures; 155 exams reviewed)

| Examiner Score | Exams | Computer Agreed | Computer Disagreed | Concordance |
|---|---|---|---|---|
| No Deception | 101 | 85 | 10 DI, 6 Inc. | 84.5% |
| Deception | 43 | 40 | 0 NDI, 3 Inc. | 93.0% |
| Inconclusive | 11 | 1 | 3 NDI, 7 DI | — |
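The concordance percentages reported above and summarized in Table 1 follow directly from the counts; the short sketch below simply restates that arithmetic (the figure for truthful exams including inconclusives computes to roughly 84%, close to the 84.5% reported above).

```python
# Restating the concordance arithmetic from Table 1.
truthful_total, truthful_agree, truthful_inc = 101, 85, 6
deceptive_total, deceptive_agree = 43, 40

print(f"Deceptive concordance: {deceptive_agree / deceptive_total:.1%}")  # ~93.0%
print(f"Truthful, excl. inconclusive: "
      f"{truthful_agree / (truthful_total - truthful_inc):.1%}")          # ~89.5%
print(f"Truthful, incl. inconclusive: "
      f"{truthful_agree / truthful_total:.1%}")                           # ~84.2%
```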
It has been widely reported in the polygraph literature that polygraph examiners shun the use of scoring algorithms, and there is little ability to dispute that many examiners disdain computer scoring tools. There are probably a variety of reasons for this, but a commonly heard report from polygraph examiners is that algorithms do not agree with their visual scores. This sounds believable, as most examiners are in the business of conducting multiple-issue screening tests. This widely espoused opinion is discussed in Krapohl and Dutton (2024) in a comparison of scoring algorithms. [13] Krapohl and Dutton reviewed three algorithms against human scorers: the OSS-3 algorithm and the PolyScore algorithm, which has two variations, one for single-issue exams and one for multiple-issue exams. They submitted 84 multiple-issue exams with two relevant questions, 33 multiple-issue exams with three relevant questions, and 48 single-issue exams for review. Krapohl and Dutton reported that the PolyScore multiple-issue algorithm correctly identified 45% of the truthful subjects and 87% of the deceptive subjects, but had a 43% inconclusive rate. The OSS-3 algorithm correctly identified 71% of the truthful and 96% of the deceptive subjects, with a 14% inconclusive rate. This study seems to correlate favorably with the results from this author’s research (see Table 2 in Krapohl and Dutton, 2024).
Conclusion
It seems readily apparent that the comparison question polygraph test provides decision accuracy considerably above chance for determining truth or deception when a clear and distinct issue is selected. A single-issue screening test such as the SIST, which presents a reasonably clear issue six times, will approach or equal the accuracy of any other single-issue, event-specific test. However, the author is not oblivious to the concerns of the National Academy of Sciences report of 2003. The NAS study noted that polygraph examiners “ask generic questions during security screening because they do not know what violations test takers may be concealing. Individuals may react differently to generic questions than to specific ones typically used in investigations of known events.” [14] Unfortunately, field polygraph examiners seem to pay more attention to test format than to actual relevant question wording. Cleve Backster wrote about this accuracy principle in 1960 when he introduced his Zone Comparison You Phase format, describing Distinctness of Issue, rated on a scale of 1 to 5, for use in predicting test accuracy. Instead, the polygraph profession remains mired in conducting multiple mixed-issue screening tests containing multiple opaque issues, with only a suggestion that the examiner follow up with appropriate single-issue exams. Even more vexing is the polygraph discourse about which format works better for multiple-issue exams.
It is unlikely there will ever be valid scientific research that supports the use of polygraph for screening. The problem is not with polygraph testing, but with the limitations on how relevant questions are posed for screening. White Marlin Open vs. Phillip G. Heasley, Civil Action No. RDB–16–3105 (2017), underscores the conundrum. There may be as many as a hundred tournament rules in these expensive billfishing tournaments with large cash prizes. In that case, multiple fishermen were polygraphed and one team was deemed deceptive. The ensuing civil battle revolved around having to conduct multiple polygraph exams over a multitude of rules or suspected violations, versus polygraphing over a single issue such as, “Did you violate any of the rules in this fishing tournament?” The answer is probably somewhere in the middle.
The more likely solution is that single-issue testing is more optimal, but only where the issue can be reasonably described to the test subject. Once such an issue is formulated and tested, the exam should be evaluated with computer scoring algorithms. There may be several questions necessary to cover the most salient issues the consumer needs for risk assessment, but each issue to be tested must be thoroughly evaluated for clarity in defining what is asked. The author proposes that such testing follow a Hierarchical Protocol that tests the best issue first; best issue means the issue that is the most significant, the most salient, and/or offers the greatest information gain for risk assessment by the consumer. If this issue is truthful, then other issues may be subsequently tested. The author further suggests a Gatekeeper Protocol: when an issue is deceptive, no further testing proceeds on other issues until the deceptive issue is resolved. A sketch of this decision flow follows.
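As an illustration only, the Hierarchical and Gatekeeper protocols can be sketched as the following decision flow; the function and result labels are hypothetical and do not represent any fielded procedure.

```python
from typing import Callable

# Illustrative sketch of the proposed Hierarchical and Gatekeeper protocols.
# Issues are tested one at a time as single-issue exams, most salient first;
# testing halts at the first deceptive result until that issue is resolved.
def screen(issues_by_salience: list[str],
           run_single_issue_exam: Callable[[str], str]) -> dict[str, str]:
    results: dict[str, str] = {}
    for issue in issues_by_salience:            # Hierarchical: best issue first
        outcome = run_single_issue_exam(issue)  # "NDI", "DI", or "INC"
        results[issue] = outcome
        if outcome == "DI":                     # Gatekeeper: stop on deception
            break
    return results
```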
The author has been involved in polygraph examinations for more than forty years. The author has been using single issue screening for more than four years and found it superior to multiple issues testing formats. The author is grateful to those examiners who have contributed their time, support and law enforcement agency involvement in studying single versus multiple issues testing in pursuit of scientific credibility assessment with polygraph.
APPENDIX
- Sample Backster You Phase Test Format
13. (Neutral) Is your first name Joe?
25. (Symptomatic) Do you believe me when I promise not to ask any questions that I did not review with you word for word?
39. (Sacrifice Relevant) Regarding whether you stole that money from the cash register, do you intend to answer truthfully each question about that?
46. (Comparison) Between the ages of 18 and 25 did you ever steal anything?
33. (Relevant) Did you steal that missing money from the cash register?
47. (Comparison) During the first 18 years of your life, did you ever steal anything?
35. (Relevant) Did you steal the money that was reported missing from the cash register last Friday?
48. (Comparison) During the first 25 years of your life did you ever lie to stay out of trouble?
26. (Symptomatic) Even though I promised I would not, are you afraid I will ask a question I did not go over with you word for word?
[1] Use of the word crime is articulated here to simplify the discussion regarding some specific issue which is being contested.
[2] Alder, Ken. (2007). The Lie Detectors: The History of an American Obsession. New York: The Free Press.
[3] Nelson, R., Handler, M., Adams, G., & Backster, C. (2012). Survey of Reliability and Criterion Validity of Backster Numerical Scores of You-Phase Exams from Confirmed Field Investigations. Polygraph, 41(2), 127-135.
[4] American Polygraph Association. 8/23/2024. Standards of Practice. Retrieved 4/19/2025 from https://www.polygraph.org/docs/APA_STANDARDS_OF_PRACTICE_amended_23_August_2024.pdf
[5] Krapohl, D., & Stern, B. A. (2003). Principles of Multiple-Issue Polygraph Screening: A Model for Applicant, Post-Conviction Offender, and Counterintelligence Testing. Polygraph, 32(4).
[6] Nelson, R., & Handler, M. (2015). Statistical Reference Distributions for Comparison Question Polygraphs. Polygraph, 44(1), 91-114.
[7] American Polygraph Association (2011). Meta-analytic survey of criterion accuracy of validated polygraph techniques. Polygraph, 40(4), 196-305. [Electronic version] Retrieved August 20, 2012, from http://www.polygraph.org/section/research-standards-apa-publications.
[8] Dollins, Andrew B., Krapohl, Donald J., and Dutton, Donnie. (2000). Computer Algorithm Comparison. Polygraph, 2000, 29(3), pp 237-243.
[9] Research Division Staff (1995a). A comparison of psychophysiological detection of deception accuracy rates obtained using the counterintelligence scope and the test for espionage and sabotage question formats. DTIC AD Number A319333. Department of Defense Polygraph Institute. Fort Jackson, SC. Reprinted in Polygraph, 26(2), 79-106.
Research Division Staff (1995b). Psychophysiological detection of deception accuracy rates obtained using the test for espionage and sabotage. DTIC AD Number A330774. Department of Defense Polygraph Institute. Fort Jackson, SC. Reprinted in Polygraph, 27(3), 171-180.
[10] Nelson, R., Handler, M., & Morgan, C. (2012). Criterion Validity of the Directed Lie Screening Test and the Empirical Scoring System with Inexperienced Examiners and Non-naive Examinees in a Laboratory Setting. Polygraph, 41(3), 176-185.
[11] Nelson, R. (2012). Monte Carlo Study of Criterion Validity of the Directed Lie Screening Test using the Seven-position, Three-position, and Empirical Scoring Systems. Polygraph, 41(4), 241-251.
[12] Nelson, R., & Handler, M. (2008). Brute-Force Comparison: A Monte Carlo Study of the Objective Scoring System version 3 (OSS-3) and Human Polygraph Scorers. Polygraph, 37(3).
[13] Krapohl, Donald J., Dutton, Donnie W. 2024. Decision Agreement Among Three Screening Algorithms and Manual Scoring with the Empirical Scoring System. American Polygraph Association. Polygraph & Forensic Credibility Assessment, 2025, 54 (1)
[14] National Research Council. 2003. The Polygraph and Lie Detection. Washington, DC: The National Academies Press. https://doi.org/10.17226/10420.