ChatGPT has language tics due to digital colonialism

The widespread use of AI is creating new language trends and reviving some old-fashioned words. The use of certain words may even give away texts generated by ChatGPT and, above all, how it was trained.

“Delve” isn’t really a commonly used English word. It means, roughly, to dig beneath the surface for something, or to examine something in detail in search of information, according to the Cambridge Dictionary. However, its use in research articles has increased tenfold since 2023, as artificial intelligence expert Jeremy Nguyen has noted.

MYSTERY SOLVED!

Why does ChatGPT use the word “delve” so often? From 2022 to 2024, we see a tenfold increase in the proportion of medical research that uses the word “delve.” But why? @alexhern at The Guardian may have just solved this problem. The thread below is full of tips: pic.twitter.com/koXHxVBhWg

— Jeremy Nguyen ✍🏼 🚢 (@JeremyNguyenPhD) April 17, 2024

Why this sudden and comical increase in the use of the word “delve”? According to the researcher, the answer is simple: ChatGPT. OpenAI’s chatbot is reportedly being used en masse to write research papers, Jeremy Nguyen said in a post on X (formerly Twitter). But “delve” is not the only unusual word used disproportionately by artificial intelligence, and this may have something to do with how chatbots are trained.

ChatGPT develops its own language tics, inspired by the workers who trained it.

In an article published on April 16, 2024, The Guardian explains that this mania for constantly using the word “delve” is no coincidence: “rather, it is a very real phenomenon related to the way ChatGPT was designed.” Like other chatbots, ChatGPT relies on a language model to operate, and this model was itself trained on gigantic volumes of text found on the Internet. These texts were then labeled, and the model’s training was supervised by humans.

In most cases, these people are precarious workers, the low-paid “little hands of AI” who label data in Kenya or Madagascar. And while the word “delve” is rarely used in British or American English, in Nigeria it is “much more often used in business English,” notes The Guardian. “So the people responsible for training the AI provided examples in their own language, resulting in an AI system that writes a little like the English spoken in Africa.”

French-speaking AIs are typically trained by Malagasy workers, so their output may be influenced by the way those workers speak. Like algospeak, a way of communicating on TikTok that avoids algorithmic moderation, the use of ChatGPT is creating a new way of expressing oneself, straight out of “digital colonialism”: “AI-ese,” which can be translated into French as “Ialien.”

“Delve,” as ChatGPT would say // Source: Numerama

This pseudo-language is especially easy to spot, so the Guardian article and Jeremy Nguyen’s remarks did not surprise AI regulars. On a subreddit dedicated to ChatGPT, users cheerfully shared other terms that they believed were indicators of sentences written by the chatbot. Among those that came up most often were “mysterious,” “growing,” “demystify,” “strong” and “guard.” Taken individually, none of these words is particularly surprising (except perhaps “nascent,” which is rarely used in everyday life); it is the joint use of these terms that is generally a sign of AI.
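The idea that several such words appearing together is the tell, rather than any one word alone, can be illustrated with a minimal Python sketch. The word list is simply the one quoted in this article, and the function names and the threshold of three distinct words are hypothetical choices for illustration, not a validated detector.

```python
import re

# Marker words as quoted in this article; illustrative only.
MARKER_WORDS = {"delve", "mysterious", "growing", "demystify",
                "strong", "guard", "nascent"}


def marker_hits(text, markers=MARKER_WORDS):
    """Return the marker words found in text (case-insensitive, exact whole words)."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return tokens & set(markers)


def has_marker_cluster(text, min_distinct=3):
    """True when at least min_distinct different marker words co-occur.

    min_distinct=3 is an arbitrary, hypothetical threshold.
    """
    return len(marker_hits(text)) >= min_distinct


sample = "Let us delve into this nascent field and demystify it."
print(sorted(marker_hits(sample)))
print(has_marker_cluster(sample))
```

Because the match is on exact word forms, inflected variants such as “delves” are not counted here, and a single hit says very little on its own.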

ChatGPT’s language tics are not limited to English. Numerama’s journalists use AI daily to produce article summaries (available through a Numerama Plus subscription) and have thus been able to spot some of these tics. The AI often generates very long sentences with convoluted vocabulary and has a tiresome tendency to use the present participle. When creating the summary for this article, ChatGPT used “multiply” and “linguistic practices”… but this time, no present participle.
