The $2 Billion Hiring Placebo: Why Personality Tests Fail the Science

One-line summary

A landmark meta-analysis finds personality tests add virtually no hiring value once cognitive ability is accounted for, exposing a costly industry-wide placebo.

A 2023 meta-analysis published in the Journal of Applied Psychology reveals that after correcting for range restriction and publication bias, personality tests provide almost zero incremental validity beyond cognitive ability measures. The findings challenge decades of industrial-organizational psychology orthodoxy about traits like conscientiousness as robust hiring predictors. With the pre-employment assessment market reaching billions annually, the study suggests most organizations are paying for expensive rituals rather than scientific tools. The burden of proof has shifted decisively to employers to demonstrate their specific tests actually work in their own contexts.

The $2 Billion Placebo Effect: What the Latest Meta-Analysis Means for Personality Testing

In 2023, a team led by Paul Sackett published a re-examination of the predictive validity of the Big Five personality traits in the Journal of Applied Psychology. The study corrected for two long-recognized but rarely simultaneously addressed artifacts: range restriction and publication bias. Its central finding was that, once cognitive ability is accounted for, personality test scores explain almost no additional variance in job performance. In high-stakes, real-world selection settings, the incremental validity of traits like conscientiousness, long held up as the gold standard of hireability, was effectively zero.

For decades, industrial-organizational psychology textbooks have taught that conscientiousness and emotional stability are robust, generalizable predictors of performance. The new meta-analysis does not overturn the idea that these traits matter in some contexts, but it does suggest that the story told to hiring managers has been substantially overfitted to studies that never corrected for the full set of statistical distortions. When range restriction and publication bias are properly modeled, the effect sizes shrink to the point where a generic personality test adds no detectable value beyond a well-designed cognitive ability measure.

The implications for practice are not subtle. The global market for pre-employment assessments (including personality inventories, integrity tests, and AI-scored video interviews) runs into the billions of dollars annually. Organizations that purchase an off-the-shelf conscientiousness scale and use it to screen thousands of applicants are, in effect, paying for a placebo. The test may feel rigorous, and its output may look like a number, but if it does not improve the quality of hires in the organization’s own setting, it is an expensive ritual rather than a scientific tool.

Understanding why the meta-analysis matters requires a brief look at the methodological problems it addressed.
Range restriction occurs because performance data are only available for people who were hired; the pool of applicants is wider, and the correlations observed in the hired sample are attenuated versions of the true population correlations. Correcting for this can either raise or lower validity estimates, depending on the selection process. Publication bias, meanwhile, means that studies finding small or null effects are less likely to appear in journals, inflating the apparent consensus. When Sackett et al. applied corrections for both simultaneously, the incremental contribution of personality beyond cognitive ability collapsed.

This does not mean that every personality test is useless. It means that the burden of proof has shifted decisively to the employer. A test publisher’s technical manual, which may cite validity coefficients from studies conducted on a different population, under different job conditions, and without the same corrections, is not sufficient evidence that the test works in your organization, with your applicant pool, for the specific roles you are filling. The study is a reminder that generic claims about “conscientious people perform better” are too coarse to guide high-stakes decisions.

The legal and reputational risks compound the problem. Unvalidated selection tools can embed cultural and cognitive biases that disproportionately screen out candidates from underrepresented groups. If a personality test has not been locally validated for both predictive accuracy and adverse impact, and if the scoring logic is opaque, the organization is exposed to exactly the kind of accountability gap that regulators and litigators are now scrutinizing in AI-driven hiring tools.
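
The range-restriction artifact described above can be made concrete with a small simulation. The sketch below is illustrative only: the true correlation, the hire rate, and the strict top-down hiring rule are all assumptions chosen for the example, and the correction shown is the textbook Thorndike Case II formula for direct range restriction, not the procedure used in the meta-analysis.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def sample_sd(xs):
    """Sample standard deviation (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def simulate_range_restriction(n=50_000, true_r=0.30, hire_rate=0.20, seed=0):
    """Return (r among all applicants, r among hires, Case II corrected r).

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    scores, perf = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)  # standardized test score
        e = rng.gauss(0, 1)  # everything else that drives performance
        scores.append(x)
        perf.append(true_r * x + math.sqrt(1 - true_r ** 2) * e)

    # Assumed selection rule: hire strictly top-down on the test score.
    n_hired = int(n * hire_rate)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:n_hired]
    hired_x = [scores[i] for i in top]
    hired_y = [perf[i] for i in top]

    r_applicants = pearson(scores, perf)
    r_hired = pearson(hired_x, hired_y)

    # Thorndike Case II: u is the ratio of applicant SD to hired SD.
    u = sample_sd(scores) / sample_sd(hired_x)
    r_corrected = r_hired * u / math.sqrt(1 + r_hired ** 2 * (u ** 2 - 1))
    return r_applicants, r_hired, r_corrected


if __name__ == "__main__":
    r_all, r_hired, r_corr = simulate_range_restriction()
    print(f"applicants: {r_all:.3f}  hired only: {r_hired:.3f}  corrected: {r_corr:.3f}")
```

Under these assumptions the correlation among hires comes out well below the applicant-pool value, and the correction recovers something close to it. Real selection is rarely this clean: when hiring depends on several signals at once, the appropriate correction differs, which is one reason the size of any correction deserves scrutiny.
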
The literature on fairness in AI recruitment emphasizes that the central concern is the reproduction of bias through unexamined processes; a personality test that has never been audited for differential item functioning is no different in principle from a machine-learning model that learns to penalize zip codes. The recent wave of regulatory attention to algorithmic hiring has raised the standard of evidence for all selection methods. If a company cannot explain why a particular trait is job-related and how the cut score was derived, it is in a weak position to defend its process. The same transparency and auditability that are now demanded of AI systems apply equally to traditional psychometric instruments. Communicating fair hiring policies to candidates, a practice recommended to build trust, becomes hollow if the underlying test has never been shown to be fair.

For talent acquisition directors and people analytics leads, the actionable response is not to abandon all structured assessment. It is to adopt a local validation discipline: run a concurrent or predictive study within your own organization, compare the personality measure against a cognitive ability baseline, and report not just the statistical significance but the effect size and the incremental R². If the test does not clear a meaningful threshold, stop using it. If it does, document the evidence and monitor for drift as job roles and applicant pools change. The same rigor should be applied to any AI-driven tool that claims to infer personality from video, text, or game behavior; the underlying construct is, if anything, even more susceptible to the validity collapse documented by Sackett et al.

The 2023 meta-analysis is not the final word on personality and performance. But it is the most comprehensive correction of the evidence base to date, and it lands at a moment when the cost of unvalidated tools (financial, legal, and human) has never been higher. For organizations that have built their selection systems around a belief in the universal predictive power of conscientiousness, the data now demand a harder look.
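
As a rough illustration of the local validation step recommended above, the sketch below computes the incremental R² of a personality score over a cognitive ability baseline. It uses the standard two-predictor identity for R² in terms of pairwise correlations. The function names, the simulated data, and every effect size in it are hypothetical; a real study would use the organization's own scores and performance ratings, and would also report confidence intervals.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def incremental_r2(baseline, added, outcome):
    """Return (R2 baseline only, R2 both predictors, incremental R2).

    For two predictors, R2 = (r_yb^2 + r_ya^2 - 2*r_yb*r_ya*r_ba) / (1 - r_ba^2).
    Requires that the two predictors are not perfectly correlated.
    """
    r_yb = pearson(outcome, baseline)
    r_ya = pearson(outcome, added)
    r_ba = pearson(baseline, added)
    r2_base = r_yb ** 2
    r2_full = (r_yb ** 2 + r_ya ** 2 - 2 * r_yb * r_ya * r_ba) / (1 - r_ba ** 2)
    return r2_base, r2_full, r2_full - r2_base


def simulate_local_validation(n=5_000, seed=1):
    """Hypothetical data in which personality adds very little beyond ability."""
    rng = random.Random(seed)
    cognitive, personality, performance = [], [], []
    for _ in range(n):
        c = rng.gauss(0, 1)
        p = 0.1 * c + rng.gauss(0, 1)  # weakly related to ability (assumed)
        y = 0.5 * c + 0.05 * p + math.sqrt(0.75) * rng.gauss(0, 1)
        cognitive.append(c)
        personality.append(p)
        performance.append(y)
    return incremental_r2(cognitive, personality, performance)


if __name__ == "__main__":
    base, full, delta = simulate_local_validation()
    print(f"R2 cognitive only: {base:.3f}")
    print(f"R2 with personality: {full:.3f}")
    print(f"incremental R2: {delta:.4f}")
```

What counts as a meaningful threshold for the incremental R² is a judgment the organization has to make and document in advance; the code only reports the number.
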