Data Explorations

Psychology · Psychometrics · Cognitive Science · Concept Lineage Explorer

Evolution of Intelligence
Lineage

The concept of intelligence has never been neutral. Since Francis Galton first proposed that mental ability was inherited and normally distributed — and that the best families should be encouraged to breed — the science of intelligence has been entangled with questions of race, class, social policy, and human worth. This exploration traces how the field moved from Victorian anthropometry to IQ tests used to bar immigrants, from factor analysis to multiple intelligences, from genetic determinism to the Flynn Effect's remarkable demonstration that IQ scores can rise dramatically within a single generation. Each era reveals not just new knowledge but new assumptions, new uses of that knowledge, and new controversies about what intelligence is, who has it, and what measuring it does to people.

1869–1905

Hereditary Genius & Early Psychometrics

The modern science of intelligence was born in Victorian England, shaped by Darwin's theory of evolution and by the anxieties of a class society confronting industrialisation and imperial expansion. Francis Galton, Darwin's cousin, published Hereditary Genius in 1869 and argued that exceptional ability — in judges, scientists, commanders — ran in families far too consistently to be attributed to opportunity alone. His conclusion was hereditarian and hierarchical: intelligence was fixed at birth, normally distributed across the population, and stratified by race and class. Galton founded the eugenics movement and coined the term itself, envisioning selective breeding as the rational management of human mental capital. Karl Pearson transformed Galton's intuitions into rigorous statistical machinery — the correlation coefficient, regression analysis, and biometric methods that would underpin quantitative psychology for a century. James McKeen Cattell brought the programme to America, establishing a laboratory at Columbia and proposing that simple sensory and reaction-time measurements could serve as proxies for general intelligence. None of Cattell's reaction-time tests predicted academic performance, a failure that momentarily checked the hereditarian programme — but Spearman's work and Binet's practical scale would soon revive it on different terms. The era established a template that proved remarkably durable: intelligence as a unitary, heritable, rankable property of individuals; statistical methods as the path to its measurement; and institutional science as the arbiter of who possessed it in what degree.

Critique: The hereditarian framework of this era was saturated with the class and racial assumptions of Victorian Britain. 'Eminent men' were defined by social recognition, itself a product of privilege. Galton's normal distribution was imposed on data that could not cleanly support it. Pearson's biometric programme was explicitly tied to eugenics policy. Cattell's anthropometric tests failed predictively, yet the conviction that intelligence was measurable and heritable survived the failure. The entire era confused correlation with causation and social hierarchy with biological fact.

1905–1935

The Psychometric Testing Movement

The invention of the Binet-Simon Scale in 1905 transformed the study of intelligence from speculative anthropometry into practical applied psychology. Alfred Binet was commissioned by the French Ministry of Education not to rank the general population but to identify children who needed additional instruction. His pragmatic, atheoretical approach, assembling tasks that distinguished children of different ages, produced the concept of mental age and a usable instrument. When Lewis Terman revised the scale at Stanford in 1916 and introduced William Stern's IQ ratio (mental age divided by chronological age, multiplied by 100), the test entered American culture as a scientific measure of innate intelligence. Charles Spearman's 1904 paper identifying a general factor g through factor analysis gave the testing movement its theoretical backbone. Psychologists could now claim that their tests were not merely measuring a miscellany of skills but capturing a single, stable, underlying capacity. Henry Goddard at Vineland translated this into social policy, diagnosing immigrants at Ellis Island and publishing the Kallikak family study as evidence that feeblemindedness was hereditary and dangerous. World War One produced the most consequential deployment of intelligence testing in history. Robert Yerkes organised the Army Alpha and Beta tests, administered to 1.75 million recruits. The results were used, and badly misused, to argue for innate racial differences in intelligence and to support the Immigration Restriction Act of 1924. Cyril Burt's twin studies, most of them published decades later, appeared to confirm high heritability; their subsequent exposure as fraudulent would shake the field to its foundations.

Critique: This era weaponised the intelligence test against vulnerable populations. The IQ ratio assumed mental development was linear and uniform, which it is not. Army test 'race differences' reflected differential education and English literacy, not innate ability. Goddard's Kallikak study was methodologically indefensible. Burt fabricated data. The entire programme mistook familiarity with the cultural content of the tests for cognitive capacity, a confusion that has never been fully resolved.
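The ratio formula is simple enough to write down directly, and doing so makes the linearity problem concrete. The sketch below is an illustration only: the function name and the example ages are hypothetical, and it assumes, purely for the sake of the example, an adult whose measured mental age has stopped rising at 16.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    return 100.0 * mental_age / chronological_age


# A child who performs like a typical 10-year-old at age 8.
child = ratio_iq(mental_age=10, chronological_age=8)

# Hypothetical adult whose measured mental age has plateaued at 16.
# The uncorrected formula makes the score fall simply because the person ages.
adult_scores = {age: ratio_iq(16, age) for age in (16, 32, 40)}

print(child)         # 125.0
print(adult_scores)  # {16: 100.0, 32: 50.0, 40: 40.0}
```

The falling adult scores are an artefact of dividing by age, not a finding about adults, which is one reason a ratio of this kind could not serve as a general-purpose scale.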

1930–1965

Factor Analysis & the Architecture of Mind

As the testing movement matured, psychologists turned from simply administering tests to asking what tests revealed about the structure of mental ability. Factor analysis — the statistical technique for identifying latent variables underlying observed correlations — became the primary tool. The central debate of the era was whether Spearman's single g adequately described mental organisation or whether intelligence was better understood as a profile of distinct primary abilities. Louis Thurstone at the University of Chicago conducted factor analyses of large test batteries and identified seven primary mental abilities: verbal comprehension, word fluency, number facility, spatial visualisation, associative memory, perceptual speed, and inductive reasoning. For Thurstone, g was a second-order artefact rather than the fundamental reality. Raymond Cattell later synthesised the Spearman and Thurstone traditions with his Gf-Gc theory, distinguishing fluid intelligence (capacity for novel problem-solving, peaking in young adulthood) from crystallised intelligence (accumulated knowledge and expertise, relatively stable across the lifespan). J.P. Guilford's Structure of Intellect model pushed pluralism to an extreme, proposing over 120 distinct intellectual factors arranged in a three-dimensional taxonomy. David Wechsler, working from clinical rather than theoretical motivations, developed the WAIS and WISC scales that remain the gold standard for individual assessment, introducing the deviation IQ that replaced Terman's ratio formula and remains in use today. The era established that the question 'how many intelligences are there?' depends substantially on method — on which tests are included, how they are scored, and where one chooses to stop factoring.
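The deviation IQ replaces the age ratio with a position in a reference distribution. A minimal sketch, assuming the conventional scaling of mean 100 and standard deviation 15 and using a tiny invented norming sample (real norms are built from large, age-banded standardisation samples; the function name is hypothetical):

```python
from statistics import mean, pstdev


def deviation_iq(raw_score: float, norm_sample: list) -> float:
    """Rescale a raw score to mean 100, SD 15 relative to a norming sample."""
    z = (raw_score - mean(norm_sample)) / pstdev(norm_sample)
    return 100.0 + 15.0 * z


# Invented raw scores for a reference group (mean 50, standard deviation 8).
norm_sample = [38, 42, 46, 50, 54, 58, 62]

print(deviation_iq(50, norm_sample))  # 100.0 (exactly average)
print(deviation_iq(58, norm_sample))  # 115.0 (one SD above the mean)
```

Because the score is defined relative to a reference group, it has no built-in dependence on chronological age, but its meaning is only as good as the sample used for norming.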

Critique: Factor analysis extracts mathematical structures that reflect the correlational architecture of whatever tests are included. Including only verbal and numerical tests produces a different factor structure than including spatial, musical, or social tasks. The number and nature of the factors are not simply discovered in the data; they are partly imposed by methodological choices. Guilford's 120+ factors were largely unreplicated. The debate between g and multiple factors has never been definitively resolved because it is partly a question about the purpose of the model.
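The dependence of factor structure on the test battery can be illustrated with a toy calculation. The sketch below is not a full factor analysis. It assumes an invented correlation pattern (0.6 between tests of the same kind, 0.3 between tests of different kinds), computes the eigenvalues of the correlation matrix by power iteration, and counts factors with the eigenvalue-greater-than-one rule, which is only one of several retention conventions. All function names are hypothetical.

```python
import random


def correlation_matrix(kinds, within=0.6, between=0.3):
    """Invented correlations: `within` for same-kind tests, `between` otherwise."""
    n = len(kinds)
    return [
        [1.0 if i == j else (within if kinds[i] == kinds[j] else between)
         for j in range(n)]
        for i in range(n)
    ]


def matvec(a, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def eigenvalues(matrix, iterations=500, seed=0):
    """Eigenvalues of a symmetric positive-definite matrix, largest first,
    found by power iteration with deflation."""
    rng = random.Random(seed)
    n = len(matrix)
    a = [row[:] for row in matrix]
    found = []
    for k in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]  # fresh random start
        for step in range(iterations):
            w = matvec(a, v)
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(a, v)))  # Rayleigh quotient
        found.append(lam)
        # Deflate so the next pass converges to the next-largest eigenvalue.
        for i in range(n):
            for j in range(n):
                a[i][j] -= lam * v[i] * v[j]
    return found


def count_factors(eigs, threshold=1.0):
    """Eigenvalue-greater-than-one retention rule (one convention among several)."""
    return sum(1 for lam in eigs if lam > threshold)


verbal_only = eigenvalues(correlation_matrix(["verbal"] * 3))
mixed = eigenvalues(correlation_matrix(["verbal"] * 3 + ["spatial"] * 3))

print(count_factors(verbal_only), [round(x, 2) for x in verbal_only])
print(count_factors(mixed), [round(x, 2) for x in mixed])
```

Under these assumed correlations, a battery of three verbal tests yields a single factor (eigenvalues of about 2.2, 0.4 and 0.4), while adding three spatial tests yields two (about 3.1 and 1.3, followed by four at 0.4). The underlying correlation pattern is the same in both runs; only the choice of tests differs.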

1960–1985

Cognitive-Developmental Perspectives

From the 1960s onward, the psychometric tradition faced challenges from multiple directions simultaneously. Developmental psychologists, particularly Jean Piaget and Lev Vygotsky, had built accounts of intelligence as a process of construction rather than a fixed endowment: changing qualitatively across development and shaped profoundly by social and cultural context. These accounts were incompatible with the static, rankable quantity that psychometricians measured. Arthur Jensen's 1969 article in the Harvard Educational Review, 'How Much Can We Boost IQ and Scholastic Achievement?', reignited the heredity controversy by arguing that compensatory education programmes like Head Start had failed because intelligence was substantially heritable and that this might explain Black-White IQ score gaps. The paper generated enormous controversy and a wave of methodological critique. Leon Kamin examined the original studies underlying heritability estimates and found the data seriously flawed; his 1974 book exposed the irregularities in Burt's twin data and made a powerful environmentalist case. James Flynn's discovery that IQ scores had been rising substantially across generations, published in 1984 for the United States and extended to fourteen nations in 1987, posed a deep problem for strong hereditarian accounts. If intelligence were primarily genetically determined, how could population-level scores rise so dramatically in a few decades? The Flynn Effect became the most important empirical constraint on any theory of intelligence, demonstrating that whatever IQ tests measure is highly responsive to environmental conditions.

Critique: Jensen's 1969 paper conflated within-group heritability with between-group differences, a logical error that critics identified immediately. Piaget's stages were more culturally specific than he acknowledged. Vygotsky's theory, filtered through translations and incomplete manuscripts, was sometimes applied outside its original sociohistorical frame. The heritability estimates of the era relied on twin and adoption studies whose methodological assumptions were contested. The Flynn Effect went unexplained for decades, and its explanation remains contested.
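The gap between within-group heritability and between-group differences can be shown with a small simulation in the spirit of the classic thought experiment about seeds grown in different soils. The sketch below rests entirely on invented numbers: both groups draw genetic values from the same distribution, scores are a simple sum of genetic and environmental parts, and one group receives a uniform 10-point environmental penalty. Function and parameter names are hypothetical.

```python
import random
from statistics import mean, pvariance


def simulate_group(n, env_shift, rng, genetic_sd=12.0, env_sd=6.0):
    """Return (genetic values, scores) for one group.

    Every group shares the same genetic distribution; `env_shift` is a
    uniform environmental offset applied to the whole group.
    """
    genetic = [rng.gauss(0.0, genetic_sd) for _ in range(n)]
    scores = [100.0 + g + env_shift + rng.gauss(0.0, env_sd) for g in genetic]
    return genetic, scores


def within_group_heritability(genetic, scores):
    """Share of score variance attributable to genetic values within a group."""
    return pvariance(genetic) / pvariance(scores)


rng = random.Random(42)
genetic_a, scores_a = simulate_group(20_000, env_shift=0.0, rng=rng)
genetic_b, scores_b = simulate_group(20_000, env_shift=-10.0, rng=rng)

h2_a = within_group_heritability(genetic_a, scores_a)
h2_b = within_group_heritability(genetic_b, scores_b)
score_gap = mean(scores_a) - mean(scores_b)
genetic_gap = mean(genetic_a) - mean(genetic_b)

print(f"within-group heritability: A={h2_a:.2f}, B={h2_b:.2f}")
print(f"gap in mean scores: {score_gap:.1f}")
print(f"gap in mean genetic values: {genetic_gap:.1f}")
```

By construction, the share of within-group score variance attributable to genetic values is high in both groups (close to 0.8 under these settings), yet the roughly 10-point gap in group means is entirely environmental. A high within-group heritability therefore says nothing by itself about the source of a between-group difference.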

1983–2005

Multiple Intelligences & the Broadening of the Concept

The 1980s and 1990s saw a proliferation of alternative intelligence frameworks that challenged both the psychometric tradition and its narrow definition of what intelligence meant. Howard Gardner's Frames of Mind (1983) proposed at least seven intelligences (linguistic, logical-mathematical, spatial, musical, bodily-kinaesthetic, interpersonal, and intrapersonal); he later added a naturalist intelligence and tentatively considered an existential one. Gardner drew on developmental psychology, neuropsychology, and evolutionary biology to argue that the mind was better understood as a set of relatively autonomous modules than as a single general-purpose processor. Robert Sternberg's triarchic theory distinguished analytical, creative, and practical intelligence, arguing that standard IQ tests measured only the first. His concept of 'successful intelligence', the ability to succeed in one's own context by capitalising on strengths and compensating for weaknesses, explicitly challenged the test-score definition. Daniel Goleman's 1995 book Emotional Intelligence brought Salovey and Mayer's more rigorous construct to a mass audience, arguing that emotional self-regulation and empathy were more important to life success than IQ. The Bell Curve (1994) by Herrnstein and Murray reasserted the hereditarian position with elaborate data on IQ and social outcomes, generating a storm of responses. Carol Dweck's research on implicit theories of intelligence (whether children believed ability was fixed or malleable) demonstrated that these beliefs had measurable effects on learning behaviour, independent of actual ability level.

Critique: Gardner's multiple intelligences were never operationalised into assessments that could be validated against independent criteria; critics noted that his 'intelligences' looked like talents or domains of expertise rather than cognitive capacities in the psychometric sense. The emotional intelligence construct suffered from definitional inflation: Goleman's popularised version bore little resemblance to Salovey and Mayer's original model. The Bell Curve's data analyses were contested on multiple technical grounds. The entire era raised the question: if intelligence can mean almost anything, does it still mean anything?

2000–present

Contemporary Debates & New Frontiers

The early twenty-first century has seen the intelligence debate expand in scope while the core controversies remain unresolved. Genome-wide association studies (GWAS) have identified thousands of genetic variants with tiny individual effects on measured cognitive ability, producing polygenic scores that predict perhaps 10-15% of variance in educational attainment. This is real but modest, and researchers debate whether the gap between such predictions and the much higher heritability estimates from twin studies reflects undiscovered genetic effects, gene-environment interaction, or measurement artefact. The WEIRD critique, from Henrich, Heine, and Norenzayan's 2010 paper on the systematic over-reliance on Western, Educated, Industrialised, Rich, Democratic samples in psychology, applied with particular force to intelligence research, questioning whether the constructs and norms generalise across cultures. Claude Steele's work on stereotype threat demonstrated that mere awareness of a negative group stereotype could impair performance on cognitive tests, adding a situational mechanism to the longstanding debate about group differences. Collective intelligence research has moved beyond the individual to ask whether groups have stable cognitive profiles; findings by Woolley and colleagues suggest a collective c-factor that predicts group performance. Artificial intelligence has renewed philosophical debates about what intelligence is, whether it can be instantiated in silicon, and how human and machine cognition relate. Neuroscience has linked measured intelligence to processing speed, working memory capacity, and global brain connectivity, but such correlates are not explanations.
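For readers who think in correlations rather than variance explained, the conversion is a one-line calculation. Assuming the 10-15% figure is an R² from a simple linear prediction (an assumption; the text does not say how it was computed), the implied correlation between polygenic score and outcome is:

```latex
r = \sqrt{R^2}, \qquad \sqrt{0.10} \approx 0.32, \qquad \sqrt{0.15} \approx 0.39
```

On that reading, roughly 85 to 90% of the variance in the outcome is left unaccounted for by the score.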

Critique: GWAS polygenic scores for intelligence capture gene-environment correlations as well as direct genetic effects, and their predictive validity varies substantially across ancestral populations, raising concerns about applicability and misuse. The Flynn Effect explanation remains contested, with candidates ranging from nutrition to education to reduced pathogen load. Stereotype threat findings have faced replication challenges. The collective intelligence c-factor is contested. The field has accumulated an enormous amount of empirical knowledge while remaining fundamentally uncertain about what it is measuring.

33 nodes · 6 eras

Based on primary sources and scholarship in the history and philosophy of psychology, psychometrics, and cognitive science.