Encoding language: Cognition vs. computation

Computers (think ChatGPT, or even Google Translate) are getting better at approximating human language, but they have conventionally encoded meaning through rigid sets of rules. Human languages, or “natural languages,” follow a far less explicit set of rules, which linguists spend a great deal of time trying to work out.

Human languages vary enormously, though, and so do the ways they encode meaning. Could the diversity of languages worldwide hold clues for computational linguistics?

Language revitalization is mostly carried out by and for the communities that speak the languages in question. Academic linguists can play a supporting role by systematically documenting language structures to aid education and preservation.

According to some researchers, binary logic and natural language may converge more clearly in some languages than in others. The Ubyssey talked to researchers who work on Gitksan and classical Sanskrit to learn more about the hidden pathways in language where people and computers could overlap.

Dr. Miikka Silfverberg, assistant professor in the department of linguistics, researches how to use machine learning to document the Indigenous languages Gitksan and Lillooet.

He combines computer skills with linguistic knowledge, using artificial intelligence and machine learning (AI/ML) to record and work with the scarce data available on these languages, accelerating the process of recovering them from their endangered state.

Artificial intelligence broadly refers to computer systems that mimic aspects of human cognition; machine learning is a branch of AI in which systems learn patterns from data rather than following hand-written rules. To decode the languages' complex morphology, Silfverberg transcribes speech from native speakers and then trains models to process this data. His models can interpret about 90 per cent of the language data he's working with, but linguists still need to review the rest.

Compared with Gitksan and Lillooet, English has far more variation, yet it remains widely used in technology-mediated settings. In fact, Silfverberg notes that language variation can be a sign of a healthy, thriving language that is commonly spoken.

“Geographical spread of a language naturally and spontaneously induces variations over time,” Silfverberg explained.

Many of these systems are trained using “supervised machine learning,” a subcategory of machine learning that is resource-intensive because it requires labelled data to function. For English, the language's variation is offset by an abundance of data, which helps modern technology interact fluently with English speakers.

But is there a language that’s inherently easier to use for coding, without needing all of this data to navigate it?

In 1985, Rick Briggs, a NASA engineer, published a paper titled “Knowledge Representation in Sanskrit and Artificial Intelligence.” It proposed that the ancient Indo-Aryan language might have unique traits that make it useful for coding. Interest in a link between Sanskrit and coding also surfaced in a 2020 University of Toronto article on an influx of computer science students taking Sanskrit courses.

Sanskrit as we know it today dates back roughly 2,500 years. Over time it was overtaken by regional languages, so it has few native speakers today.

Sanskrit follows such a specific set of rules that it is sometimes mistakenly characterized as an “artificial” language. Originally, Sanskrit had many variations, but Panini, a prominent grammarian, systematized the language in his treatise, the Ashtadhyayi.

“From that period on, we get a form of Sanskrit that's pretty stable,” said Dr. Janet Um, a scholar of Sanskrit literature. “Perhaps that's where this understanding that it was an artificial language might come into play. It's so systematized and scientific [so] it could seem, in some ways, like it were an artificial language.”

Dr. Prasad Bhide, a Sanskrit playwright and scholar, highlights the language’s distinctive attributes. Sanskrit has a dual number alongside singular and plural forms, and it uses eight grammatical cases, marked by suffixes, to signal each word's role in a sentence.
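To make that grid of three numbers and eight cases concrete, here is a small illustrative sketch in Python. It is our own hypothetical example, not code from Bhide, Silfverberg or Briggs: a table of the standard endings for one class of Sanskrit nouns (masculine nouns ending in -a, such as deva, “god”) and a helper, `decline`, that builds every form by attaching an ending to the stem. The simple stem-plus-ending rule works for deva, but real Sanskrit also applies sound-change rules (such as sandhi) that this sketch ignores.

```python
# Illustrative sketch only: standard endings for Sanskrit masculine nouns in -a
# (the class that includes deva, "god"). Plain stem + ending concatenation
# works for deva, but real Sanskrit also applies sound-change rules (sandhi,
# retroflexion) that this sketch deliberately ignores.
NUMBERS = ("singular", "dual", "plural")

# Case -> (singular, dual, plural) endings, attached to the bare stem "dev".
A_STEM_ENDINGS = {
    "nominative":   ("aḥ", "au", "āḥ"),
    "accusative":   ("am", "au", "ān"),
    "instrumental": ("ena", "ābhyām", "aiḥ"),
    "dative":       ("āya", "ābhyām", "ebhyaḥ"),
    "ablative":     ("āt", "ābhyām", "ebhyaḥ"),
    "genitive":     ("asya", "ayoḥ", "ānām"),
    "locative":     ("e", "ayoḥ", "eṣu"),
    "vocative":     ("a", "au", "āḥ"),
}


def decline(stem: str) -> dict[tuple[str, str], str]:
    """Build every (case, number) form by attaching each ending to the stem.

    `decline` is a hypothetical helper written for this example, not a
    function from any linguistics library.
    """
    return {
        (case, number): stem + ending
        for case, endings in A_STEM_ENDINGS.items()
        for number, ending in zip(NUMBERS, endings)
    }


if __name__ == "__main__":
    forms = decline("dev")
    print(len(forms))                         # 8 cases x 3 numbers = 24 forms
    print(forms[("nominative", "singular")])  # devaḥ
    print(forms[("instrumental", "dual")])    # devābhyām
```

Because every form follows from a fixed table, a program can generate all 24 forms without any training data. The table also shows where rules alone stop short: the dual form devau serves as nominative, accusative and vocative, so a reader, human or machine, still needs context to tell them apart.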

This standardization allowed for a less ambiguous interpretation of Sanskrit, an advantageous quality for computer comprehension. Comparing English, a language with a web of variation and a growing lexicon, to Sanskrit, an ancient systematized language, suggests an inverse relationship between how much a language varies and how easily it can be processed.

This tension between computational and cognitive processing leaves us wondering whether Sanskrit could be the future of technological advancement.

This article is a part of The Ubyssey's 2023 language supplement, In Other Words.

First online Nov. 28, 2023, 2:59 p.m.

Author: Stuti Sheth