How a 1947 dictionary laid the groundwork for AI-powered scientific translation
In the aftermath of World War II, as scientific advancement accelerated across continents, a significant barrier hampered progress: the specialized language of science itself. Chemists in Madrid struggled to understand American medical research; biologists in Buenos Aires couldn't easily access British pharmacological studies. This was the landscape when Morris Goldberg undertook a monumental task—creating the first comprehensive English-Spanish chemical and medical dictionary. Published in 1947 by McGraw-Hill, his 692-page reference work became an essential bridge connecting scientific communities across linguistic divides 3 .
Goldberg's dictionary arrived at a pivotal moment in scientific history. As Nature noted in its 1947 review, the book was "of great value to commercial undertakings dealing with Spain and Spanish America," acknowledging its vital role in facilitating international scientific exchange 3. The dictionary offered not just straightforward translations but also "a brief simple explanation of the term in Spanish" wherever necessary, recognizing that scientific concepts often require more than direct word substitution 3.
Nearly eight decades later, we're witnessing another revolution in how scientific knowledge is organized, accessed, and translated. The labor-intensive process that Goldberg exemplified—years of meticulous compilation by experts—is now being transformed by artificial intelligence. This article explores both Goldberg's pioneering work and the surprising modern research that's automating the very process he perfected.
Morris Goldberg's English-Spanish Chemical and Medical Dictionary was notable for its remarkable breadth, covering terms employed in "medicine, surgery, dentistry, veterinary, biochemistry, biology, pharmacy, allied sciences and related scientific equipment". The first edition, published in 1947, ran to ix + 692 pages and was organized for maximum utility to "technical translators in the preparation of literature in Spanish" 3.
The dictionary's approach went beyond mere word-for-word translation. As the Nature review highlighted, Goldberg's work included explanations of terms in Spanish when necessary, recognizing that scientific terminology often carries context-specific meanings that direct translations might miss 3 . This thoughtful methodology made it particularly valuable for professionals who needed to understand and apply scientific concepts across languages, not just decode individual terms.
At a price of 50 shillings in the U.K. (approximately $10.00 in the U.S.), the dictionary was a significant investment for laboratories, universities, and commercial enterprises 3 . Its value proposition was clear: enabling smoother communication between English-speaking and Spanish-speaking scientific communities.
The timing of its publication was particularly significant. The post-war period saw unprecedented international collaboration in science, coupled with growing specialization in fields like biochemistry and pharmacology. Goldberg's dictionary effectively became an essential tool for this new era of global scientific exchange, serving as a linguistic bridge at a time when personal and professional connections across continents were reforming after years of conflict.
Goldberg would later expand his work to include a Spanish-English volume in 1952, creating a comprehensive two-way translation system for scientific professionals 6 . Together, these volumes represented the state of the art in scientific lexicography for their time, painstakingly compiled through human expertise over what must have been years of dedicated effort.
Traditional dictionary compilation has always been what researchers describe as "highly labor-intensive, requiring significant time and expertise" 2. That description fits Goldberg's undertaking: each entry required research, verification, and careful explanation. Furthermore, as language evolves and new scientific terms emerge, dictionaries require continuous updates to remain relevant 2.
This manual process created inherent limitations. As noted in recent computational linguistics research, traditional dictionaries "are not without flaws; for instance, it is not uncommon for definitions to include terms that are more complex than the word being defined" 2 . Additionally, the sheer effort required meant that specialized dictionaries might take years to produce, potentially lagging behind scientific advancement.
Recent breakthroughs in artificial intelligence are revolutionizing this centuries-old process. Researchers are now exploring whether "the task of compiling a modern explanatory dictionary can be addressed using machine learning" 2 . Specifically, they're focusing on two subtasks:
1. Creating definitions for words not yet included in dictionaries 2
2. Producing generalized definitions based on multiple existing dictionaries 2
The results have been promising. While traditional sequence-to-sequence models like T5 and BART "struggle with producing clear and accurate definitions," large language models (LLMs) "yield significantly better results" 2 . However, the research notes that "generating definitions from scratch works noticeably worse than generalizing existing ones," suggesting that AI and human expertise may be most effective in combination 2 .
One of the most challenging aspects of dictionary creation—whether for general language or specialized scientific terminology—is crafting example sentences that effectively illustrate word usage in context. Recent research has made significant strides in automating this specific task.
Researchers have developed an innovative approach to generating and evaluating dictionary example sentences using LLMs in a zero-shot manner (without task-specific training) 4 . The methodology involves:
1. Various LLMs, including Claude, Llama-2, and Mistral, generate example sentences given a target word, definition, and part-of-speech information
2. A masked language model identifies and selects sentences that best exemplify word meaning
3. A new metric called OxfordEval measures the win-rate of generated sentences against existing Oxford Dictionary sentences 4
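The selection step described above can be sketched roughly as follows. This is a minimal illustration, not the method from the cited research: it assumes the masked language model scores a candidate by how well its context supports the target word once that word is masked, and it swaps in a hypothetical keyword-counting `toy_scorer` (with a made-up `CUES` table) where a real masked language model would go. The names `mask_target` and `select_best` are likewise invented for this example.

```python
import re
from typing import Callable, Sequence

# Hypothetical cue words standing in for what a real masked language model
# would know about the contexts in which a word typically appears.
CUES = {"catalyst": {"reaction", "speeds", "enzyme"}}


def mask_target(sentence: str, target: str, mask_token: str = "[MASK]") -> str:
    """Replace whole-word occurrences of the target with a mask token."""
    pattern = rf"\b{re.escape(target)}\b"
    return re.sub(pattern, mask_token, sentence, flags=re.IGNORECASE)


def toy_scorer(masked_sentence: str, target: str) -> float:
    """Stand-in for a masked LM: count cue words left in the masked context."""
    words = set(re.findall(r"[a-z]+", masked_sentence.lower()))
    return float(len(words & CUES.get(target, set())))


def select_best(
    candidates: Sequence[str],
    target: str,
    scorer: Callable[[str, str], float] = toy_scorer,
) -> str:
    """Return the candidate whose masked context scores highest."""
    return max(candidates, key=lambda s: scorer(mask_target(s, target), target))


candidates = [
    "The catalyst was on the shelf.",
    "The catalyst speeds up the reaction without being consumed.",
]
best = select_best(candidates, "catalyst")
print(best)
```

Because the scorer is passed in as a parameter, a function backed by an actual masked language model could replace `toy_scorer` without changing the selection logic.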
The OxfordEval metric has shown "high alignment with human judgments, enabling large-scale automated quality evaluation" 4 . This validation against human assessment is crucial for ensuring the practical utility of AI-generated content.
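As a rough illustration of what a win-rate measures, the sketch below tallies pairwise preferences. It assumes each comparison has already been reduced to a boolean (True when the generated sentence is preferred over the Oxford Dictionary sentence); how OxfordEval arrives at each preference is not covered here, and the function name `win_rate` and the sample judgments are hypothetical.

```python
from typing import Sequence


def win_rate(judgments: Sequence[bool]) -> float:
    """Percentage of comparisons in which the generated sentence won."""
    if not judgments:
        raise ValueError("need at least one comparison")
    return 100.0 * sum(judgments) / len(judgments)


# Made-up judgments for four comparisons: three wins, one loss.
sample = [True, True, False, True]
print(f"{win_rate(sample):.1f}%")  # prints 75.0%
```

Read this way, a win-rate of 83.9% means the generated sentence was preferred in roughly 839 of every 1,000 comparisons.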
The experiments yielded impressive results. According to the research, "LLMs can generate sentences that are preferred to Oxford Dictionary example sentences 83.9% of the time, while past model-generated sentences only have a win-rate of 39.8%" 4 . When the novel masked language model reranking method was applied, the win-rate further increased to 85.1% 4 .
| Method | Win Rate vs. Oxford Baseline | Key Characteristics |
|---|---|---|
| Traditional Human Creation | Baseline | Labor-intensive, high expertise required |
| Early Model Generation | 39.8% | Custom-trained models, word sense datasets |
| Modern LLM Generation | 83.9% | Zero-shot, uses foundational models |
| FM-MLM Approach | 85.1% | Combines LLMs with masked language model reranking |

Data source: 4
Generating example sentences this way takes a fraction of the time and financial resources required for traditional dictionary compilation, demonstrating how dramatically the field has changed since Goldberg's era of painstaking manual work.
The resources available to scientific translators and researchers have evolved dramatically since Goldberg's era. Where specialists once relied solely on printed references, they now have access to extensive digital tools.
| Era | Primary Resources | Key Advantages | Limitations |
|---|---|---|---|
| 1940s-1950s (Goldberg Era) | Printed specialized dictionaries | Comprehensive coverage, expert-curated | Static content, physical access required |
| Late 20th Century | Electronic databases, early digital references | Searchable, regularly updated | Limited interoperability, subscription costs |
| Current | AI-assisted tools, comprehensive digital libraries | Dynamic content, integration capabilities | Varying quality, requires verification |

| Examples | Key Features |
|---|---|
| SciFinder, Embase | Relevance-ranked answers, biosequence searching, patent mapping 5 |
| Springer Nature Protocols | Reproducible step-by-step laboratory procedures 5 |
| Laboratory equipment videos | Visual demonstrations of techniques and equipment use 5 |
| Knovel, AccessEngineering | Interactive graphs and equations, textbook integration 5 |
Morris Goldberg's 1947 dictionary represented a pinnacle of human-expertise-driven scientific translation. Its careful compilation of terms across medicine, chemistry, and allied fields created essential bridges between scientific communities. As we've seen, contemporary research is now building on this foundation in surprising ways, using advanced AI to automate and enhance the very processes that required years of human labor in Goldberg's time.
The future likely holds a collaborative relationship between human expertise and artificial intelligence—where AI generates initial content and human specialists refine it, or where human-created resources like Goldberg's serve as training data and quality benchmarks for AI systems. This synergy promises to accelerate the creation and maintenance of specialized dictionaries even as scientific vocabulary continues to expand and evolve.
The future of scientific lexicography
What hasn't changed since Goldberg's time is the fundamental need for accurate, accessible scientific communication across languages and disciplines.
As AI tools become more sophisticated, they may eventually handle routine translation tasks, but the human understanding of scientific concepts and contexts—so evident in Goldberg's work—will remain essential for navigating the nuances of specialized terminology. The methods have transformed, but the mission continues: building bridges of understanding across the landscape of human knowledge.
Figure: Win rate of generated sentences vs. Oxford Dictionary example sentences. The data show significant improvement in AI-generated dictionary content quality 4.
1947: Goldberg publishes English-Spanish Chemical and Medical Dictionary
1952: Goldberg expands with Spanish-English volume
Late 20th century: Transition to electronic databases and digital references
Today: AI-powered tools and comprehensive digital libraries