Proposal
Much of the content so far has drawn upon technical research: research that is strictly numerically quantified and analyzed through the language of mathematics. In that mode, artistic capacity can only be a byproduct of what is presented and proved. This project, however, is not a technical endeavour; it will focus on the artistic and philosophical. The technical background and implementations are the means to the end of creating art.
The Goal
So, the goal of a project like this is not to test and validate whether an AI can create a constructed language. Asking an LLM to create a conlang is a straightforward affair, and there is little doubt that, with the right prompt crafting, a workable result can be achieved. Rather, this project will explore language’s infinite capacity to symbolize, founded upon new technologies that enable unfathomable, novel expressions. This process will demonstrate language’s divergence from the layman’s initial conception of it, a form of human communication and nothing more, toward an emergent process that supplants physical reality across its various dimensional mediums. I seek to enmesh two polar opposite linguistic disciplines: one practiced for the sake of language itself, the creation of conlangs; and one practiced for the sake of seeking truth, the scientific exploration of language evolution using cutting-edge technologies and techniques.
Philosophical Considerations
In the design of this project, it is crucial to remember that language games are not something humans intentionally engage in in the real world. Instead of being driven by the simple structures of reinforcement learning, such as receiving a reward for correctly identifying an image, we have extrinsic motivations like wealth and food, as well as intrinsic ones like self-integrity, moral goodness, and a plain drive to have a good time. Intrinsic motivations are more abstract and can be less logical, and, one can argue, less “scientific”. These intrinsic motivations raise philosophical and psychological questions too: why do we need drive, and a moral compass that dictates our actions? How does the truth, the matter of fact and the objective, become an instrument of manipulation, as in lying? Those questions direct the way we use language, the way language interacts with us, and most importantly, the way language evolves.
And here is a point of philosophical contention for a project like this one. In order to explore the “infinite capacity to symbolize”, those who do the symbolizing, the agents, need a reason to symbolize. If they cannot symbolize, or have no reason to do so, there exists no simulation. Of course, the previous experiments did give the agents a need to symbolize by making them participate in language games. However, human symbolizations are not just correct associations; they are associations that are correct for a reason. One reason an association is correct is that the majority of speakers agree that it is true, the ultimate de facto standard. A cat is a cat because we all agree that that creature has a map-bound representation to the word and its meaning. When it comes to conlangs, however, the reason for a symbol’s meaning or existence is not as linear. Interlocutors in online communities can argue over what direction to take a conlang in, which means conlangs have the abstract element of intentionality. Take, for instance, a conlang that symbolizes objects, things, or concepts as numbers. The intentionality is the philosophical approach of expressing the world in a numerical way that is not confined to the logic of mathematics. To simulate such developmental and evolutionary complexity, I argue that language games are not enough for a conlang simulation, especially when using LLMs. A surrounding reality must be simulated too.
Fundamental Parameters
Here are the fundamental parameters for the implementation of this project:
- A continuous, simulated world must exist for the agents.
- The agents must be communicative agents.
- The agents must have a purpose.
- The agents must have a constant stream of perceptual input.
A continuous, simulated world must exist
A continuous world means that there exists time and change. Previous events have effects on current and future events. The temporal scale of the world is infinite and there exists no end. A simulated world means that the world has a “physical” dimension. That means there is already a constructed materialization of the world in some form that can be perceived in some way by an agent. Theoretically, an agent also has mobility to traverse this materialization. In such a continuous and simulated world, the question an agent must be able to answer is not just “what do you see?”, but also “how do you see it?”.
The agents must be communicative agents
As explained, the agents must have all the faculties of a communicative agent. Fortunately, generative AIs like LLMs can combine many of their faculties into a unified package.
The agents must have a purpose
Imbued in each agent is a purpose, defined from the outset. This purpose can be an assigned job, or a goal that the agent must pursue within the simulation.
The agents must have a constant stream of perceptual input
The agents must not just be able to communicate but must also be able to perceive continuous changes in the environment on a change-by-change level. For example, if the simulation states that “it is now nighttime”, in circumstances that permit an agent to be aware of nighttime, the agent must be able to receive this information exactly and immediately. If an agent “dies” in a simulation, this news must also be broadcast and received. This constant perceptual input is also necessary if an agent is to traverse an environment.
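One minimal way to realize this change-by-change delivery is an event queue held by each agent, which the simulation pushes world events into the moment they occur. This is only a sketch; `PerceptionStream` and its methods are hypothetical names, not part of any library.

```python
import queue

class PerceptionStream:
    """Hypothetical agent-side perception buffer: the simulation pushes
    world events in; the agent consumes them one change at a time."""

    def __init__(self, agent_name):
        self.agent_name = agent_name
        self.events = queue.Queue()

    def receive(self, event):
        # Called by the simulation the moment a change occurs.
        self.events.put(event)

    def next_percept(self, timeout=None):
        # The agent blocks here until a new change arrives,
        # so no intermediate state is skipped.
        return self.events.get(timeout=timeout)

stream = PerceptionStream("agent-1")
stream.receive("It is now nighttime.")
print(stream.next_percept())  # "It is now nighttime."
```

Because the queue preserves arrival order, an agent that is slow to react still perceives every change, in sequence, rather than only the latest snapshot.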
Implementing a World
Here are two ways in which to implement a world for these agents to exist inside.
A simulation that is completely removed from the physical world
A simulated world could be generated from a seed. This seed is a prompt, such as:
You are in a room surrounded by chairs. There is a window that shines light upon the chairs. Your interlocutors sit across from and around you in those chairs. The day has just started. Outside there are fir trees and an axe in a stump. The room has distinct details: it is twenty by twenty meters in length and width, and ten meters in height.
Each agent is given this seed at the beginning. Further details of the world can be included, such as the name of the world or a simple description of the time of day.
As time progresses, the state of the simulation is sent to each agent. For example, if the simulation is to express a season change, there must be a description without explicit labeling, such as:
The leaves on trees fall to the ground. They take on a different hue.
Notice that there is no labeling for the color of the leaves, but rather just the objective fact that they are now a different color. The labeling is in the hands of the agents.
Relationships between agents must also be defined in order to have a conversation. Relationships instantiate power dynamics and interaction presets. They define how the agents are to use language with each other and develop a language with the abstract human concepts of the self and the other.
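Such relationship presets could be stored as simple data records that the simulation consults before a conversation begins. The sketch below is illustrative only; the `Relationship` record and the `dynamic`/`register` fields are invented names for this example.

```python
from dataclasses import dataclass

@dataclass
class Relationship:
    """Hypothetical preset describing how two agents address each other."""
    a: str
    b: str
    dynamic: str   # e.g. "peer", "mentor-student": the power dynamic
    register: str  # e.g. "formal", "casual": the expected tone

relationships = [
    Relationship("agent-1", "agent-2", dynamic="peer", register="casual"),
    Relationship("agent-1", "agent-3", dynamic="mentor-student", register="formal"),
]

def preset_for(x, y):
    # Look up the interaction preset for a pair, in either order.
    for r in relationships:
        if {r.a, r.b} == {x, y}:
            return r
    return None

print(preset_for("agent-3", "agent-1").dynamic)  # "mentor-student"
```

The preset would then be injected into both agents' prompts, so each conversation inherits its power dynamic from the outset.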
A simulation that is a hybrid of our world and their own
Opening up the simulation to include our reality is another way to incorporate a world in which agents can interact. A device like a camera or a microphone could intercept real-world signals and translate them into a form perceptible to an agent. Time is then dictated by real-life time, and events by real-life events. Perhaps someone moves in front of the camera, something is said into the microphone, or the scenery in front of the camera changes. Regardless of input method, the same distribution of information to the agents must be considered and implemented.
Creating a simulation that is void of the real world means a sanitized view of a conlang’s development. The viewer of an experiment can truly take a backseat and simply observe. Bridging the simulation into the real world means the viewer can also become a participant in the simulation.
Implementing a Purpose
A conlang has a purpose, and for that purpose to manifest or have legitimacy, the agents developing the conlang must have an intrinsic reason. This intrinsic reason is really a pseudo-intrinsic reason, because it is not developed by the agents themselves but is instead imbued by an initial prompt. Here is a prompt that imbues purpose:
You are a researcher looking to improve the efficiency and communication of language. You work at a laboratory along with ten other colleagues that experiment with the many possibilities of expression.
With this prompt, the agent is “primed” and will generate output accordingly against further input. This given purpose of being a researcher orients the way in which the agent decides how a conlang should or could evolve.
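In practice, this priming maps naturally onto the system-role convention used by chat-style LLM APIs: the purpose prompt is prepended as the system message, and every percept arrives as a user message. A sketch, with no actual API call made:

```python
# The purpose prompt from above, used verbatim as the system message.
PURPOSE_PROMPT = (
    "You are a researcher looking to improve the efficiency and "
    "communication of language. You work at a laboratory along with "
    "ten other colleagues that experiment with the many possibilities "
    "of expression."
)

def primed_messages(perception):
    # Every generation is conditioned on the assigned purpose, because
    # the system message precedes whatever the agent perceives.
    return [
        {"role": "system", "content": PURPOSE_PROMPT},
        {"role": "user", "content": perception},
    ]

msgs = primed_messages("The leaves on trees fall to the ground.")
print(msgs[0]["role"])  # "system"
```

The resulting list is what would be handed to a chat-completion endpoint; only the perception changes between calls, while the purpose stays fixed.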
Implementing a Goal
A communicative goal is required for the agents to provide focal points of discourse. The previous example stated that the reason for research is “to improve the efficiency and communication of language”. Another possible goal in another simulation could be “to develop a way to communicate complex ideas in a compact form”. The agents will have to work together to achieve such goals through conversation.
Implementing a Conversation
Agents speak English in conversation and perceive the world in English.
Since all agents have the capacity to generate thoughts and content based on preconceived inputs, each will have content that can be developed into arguments for a conversation. The conversation is the engine that produces the creative output. In some fashion, agents must be able to “meet” each other. This creates circumstances in which agents have opportunities, a hallmark of the iterated learning processes explored earlier, to imitate other agents, interact with other agents, and transmit or teach data to other agents.
Meeting can occur individually or in groups. Conversations take place between two agents, a sender and a receiver. A group conversation does not necessarily mean that all the agents have the possibility to be senders; it just means that there is more than one receiver. Of course, which agent sends and which receives is a matter of circumstance and social role.
Without making the intricacies of social dynamics too complex, a simple approach to keep all agents on the same page is to have them all congregate in the same virtual space. In other words, all of what is said by each agent is broadcast to and received by every other agent present. This approach is feasible if all of the agents are primed to be researchers and convene in a meeting.
In such a meeting, there would be an odd number of participants for the purpose of tie-breaking. The agents would have to agree upon which parts of the conlang to evolve and why it matters. The meeting can end after a certain period of time, when an agreement is unanimously accepted, or when a majority considers a certain evolution the best path. After a meeting ends, one iteration of the conlang is established; the next iteration comes in the next meeting. Meetings are similar to the idea of a language game; the difference is that the purpose is to develop a language in alignment with a purpose regarding external and internal agent circumstances. Multiple variables are in action and affect the outcome, rather than a discrete and controlled sender-receiver dynamic.
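The meeting procedure described above can be sketched as a voting loop. The `vote_fn` hook stands in for an LLM call that asks an agent to pick a proposal; the function name and the toy voting rule are assumptions of this example, not a prescribed design.

```python
def run_meeting(agents, proposals, vote_fn, max_rounds=10):
    """Sketch of one meeting: agents vote each round until a proposal
    gains a majority or the rounds run out."""
    majority = len(agents) // 2 + 1   # an odd agent count guarantees no tie
    for _ in range(max_rounds):
        tally = {}
        for agent in agents:
            choice = vote_fn(agent, proposals)
            tally[choice] = tally.get(choice, 0) + 1
        winner, votes = max(tally.items(), key=lambda kv: kv[1])
        if votes >= majority:
            return winner        # this becomes the next conlang iteration
    return None                  # the meeting ended without agreement

# Toy run with a fixed voting rule standing in for the LLM:
agents = ["a", "b", "c", "d", "e"]
result = run_meeting(agents, ["drop plurals", "add color hue"],
                     vote_fn=lambda agent, ps: ps[0])
print(result)  # "drop plurals"
```

A real meeting would interleave discussion rounds between votes; the loop above only captures the termination conditions (majority reached, or time expired).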
Implementing the Conlang
The conlang is the parameter for agreement in a conversation. The conlang must have a purpose and type of categorization amongst the main three presented earlier: engineered, auxiliary, or artistic. Regardless of conlang type, the conlang needs an orthography, visualization, or vocalization.
Here are two methods of implementing a conlang.
Method 1: Creating a conlang from scratch
This method is the most involved, leaves the most leeway for intent, and depends on the goal. One possible conlang that uses technology as an extension of the human capacity for language is one that utilizes color.
As mentioned earlier, conlangs like Ithkuil and Toki Pona have their own original orthographies. Esperanto has an orthography too, but it borrows the Latin script to render morphology and express phonetics. Ithkuil’s and Toki Pona’s orthographies sit at opposite extremes of complexity: Ithkuil’s is compact and highly saturated in meaning, while Toki Pona’s is simple and descriptive. A color conlang would activate a new visual cue that adds more information to a symbol.
Method 2: Developing an existing conlang
Using an existing conlang like Toki Pona or Ithkuil means more constraint on what the goal given to the agents would be. The goal would then become the actual goal of the original conlang. The simulation would then reorient itself from developing a unique take on conlangs to one that shows how a conlang can be extended into new realms using digital signals.
Using an existing conlang as a starting point will allow more time to develop the technical aspects of the project, like the CAs or visualizations. The conlangs explained in the introduction and background section all provide useful starting points. Esperanto’s use of Latin orthography can make the writing easier to evolve. Toki Pona and Ithkuil present interesting possibilities in terms of further developing a new orthography. Something inherent about these conlangs, though, is that they are for humans, by humans.
Implementing Human Perception
Humans must be able to perceive the conlang in some way. Since this is a digital medium, there are many avenues for exploring the different sensory inputs of language expression. For example, one way to translate a color language into something we can perceive is to use hexadecimal strings as writing and an LED light source as the “vocalization”, expressing that exact color representation. Another is to use generative voice models to produce vocalizations that are impossible for humans to replicate with our physical vocal tract, but that we can nevertheless still hear and perceive. Only a machine can reproduce the signal.
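The hexadecimal-writing-to-LED translation is mechanically simple: a six-digit hex “word” decodes directly into the three channel values an RGB LED would display. A minimal sketch:

```python
def hex_to_rgb(word):
    """Interpret a six-digit hexadecimal 'word' of the color conlang
    as an (R, G, B) triple that an LED could display."""
    word = word.lstrip("#")
    return tuple(int(word[i:i + 2], 16) for i in (0, 2, 4))

# The written form "#3fa0ff" is "vocalized" by driving an LED's
# red, green, and blue channels with these values:
print(hex_to_rgb("#3fa0ff"))  # (63, 160, 255)
```

On actual hardware, the triple would be handed to whatever LED driver is in use; the decoding step stays the same regardless of device.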
Construction in Context
Where do these pieces fit in relation to previous experiments?
No generations
The predecessors and successors in generational learning as explored by previous language evolution experiments are not present in this simulation. Rather than passing the torch of the language between generations of mature and immature agents, all agents have the same competency of the conlang. The variable that changes is the environment rather than the agent. In a sense, the environment has its own generations.
Micro-level language games
In this simulation, the overarching shared cooperative goal is to develop the conlang. At the conversation level, the language game depends on the type of conversation being held. There can be multiple types of conversation templates defined, such as, one in which the conlang is used or one in which the conlang is simply discussed. Conversations provide context, history, and data on which to further the evolution of the conlang.
One possible palpable shared goal that the agents may work toward together is translating a text from English into the conlang.
In conversation, agents describe their perception of the world to each other, first in English, then through constructive discussions about how that interpretation of the world can be translated. Each agent creates an internal guide to the conlang. If an agent makes an error when producing content in the conlang, other agents may correct the offending agent based on their own guides. The agent must then correct its output. Any time the conlang is used, an agent must verify that its internal mapping of the language matches the other agents’. This correction interaction is an example of a repair strategy.
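The repair interaction reduces to comparing two internal guides over a used symbol and propagating the correction back to the sender. A sketch, with the guides modeled as plain dictionaries (an assumption for illustration; the actual internal mapping could be any store):

```python
def repair(sender_guide, receiver_guide, symbol):
    """Sketch of the repair strategy: the receiver checks a used symbol
    against its own internal guide and, on mismatch, returns a
    correction that the sender adopts."""
    sent = sender_guide.get(symbol)
    expected = receiver_guide.get(symbol)
    if sent == expected:
        return None                      # mappings agree, no repair needed
    # The receiver corrects the offending agent; the sender updates
    # its internal guide to restore alignment.
    sender_guide[symbol] = expected
    return expected

sender = {"#ff0000": "danger"}
receiver = {"#ff0000": "heat"}
print(repair(sender, receiver, "#ff0000"))  # "heat"
print(sender["#ff0000"])                    # "heat"
```

Note that this sketch privileges the receiver's guide; in the actual simulation, whether a correction is accepted could itself be subject to discussion or a meeting vote.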
Technicalities
The engine of the simulation is a monolithic apparatus. This apparatus, which I will refer to as the facilitator, manages the state of the simulation, directs communication channels, and, depending on agent implementation, translates the possible output of the conlang to external interfaces for us to perceive.
The facilitator is one part server, one part clock, and one part generative source. It is a server because it routes message data to agents; a clock because it keeps track of time, whether for conversations or for the simulation of a virtual world; and a generative source because it generates situations and circumstances for the agents to take part in. For example, if the world is entirely simulated, the facilitator would, based on time passed and a dash of creativity, determine that the season is now changing. It would then broadcast this circumstance to the agents. The facilitator also keeps track of the conlang, storing any textual or media progress in a database.
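The three roles can be seen in a single tick loop: the clock advances, the generative source decides whether a circumstance occurs, and the server broadcasts it. In this sketch, a random season choice stands in for a real generative model, and the class and method names are placeholders of this example.

```python
import random

class Facilitator:
    """Sketch of the facilitator's three roles: clock (tick counter),
    generative source (here a stub that picks a season), and server
    (broadcasting the circumstance to every agent)."""

    SEASONS = ["spring", "summer", "autumn", "winter"]

    def __init__(self, agents, ticks_per_season=4):
        self.agents = agents                  # each agent exposes .receive(text)
        self.ticks_per_season = ticks_per_season
        self.tick_count = 0

    def broadcast(self, circumstance):
        # Server role: route the same world event to every agent.
        for agent in self.agents:
            agent.receive(circumstance)

    def tick(self):
        # Clock role: advance time; generative role: create a circumstance.
        self.tick_count += 1
        if self.tick_count % self.ticks_per_season == 0:
            season = random.choice(self.SEASONS)
            self.broadcast(f"The season is now changing to {season}.")
```

In a full implementation, the stub in `tick` would be replaced by an LLM call that writes the unlabeled environmental description discussed earlier, and the broadcast would go over the network rather than a method call.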
Agents are CAs, and therefore have a perception, action, generation, and understanding module, as well as an internal mapping of meanings. The perception module is an interface that can accept any type of media data, like text or images. The generation and understanding module takes the form of an LLM. Input can be fed into the LLM as serialized text or image data. Output from the LLM can then be transmitted via the action module, or fed into another generative module that transmutes text into some form of multimedia, like image or sound. The internal mapping of meanings can be a simple SQLite database or a lightweight NoSQL solution; vector databases are unnecessary here.
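The internal mapping of meanings, stored in SQLite as suggested above, can be as small as one table relating a conlang symbol to its English gloss. A sketch (the table layout is an assumption of this example):

```python
import sqlite3

# One row per conlang symbol, mapping it to an English gloss.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mapping (symbol TEXT PRIMARY KEY, meaning TEXT)")

def learn(symbol, meaning):
    # Insert a new association, or overwrite one revised in a meeting.
    conn.execute(
        "INSERT OR REPLACE INTO mapping VALUES (?, ?)", (symbol, meaning))

def lookup(symbol):
    # Consult the internal guide when perceiving or producing the conlang.
    row = conn.execute(
        "SELECT meaning FROM mapping WHERE symbol = ?", (symbol,)).fetchone()
    return row[0] if row else None

learn("#00ff00", "tree")
print(lookup("#00ff00"))  # "tree"
```

Keeping the mapping per-agent, rather than shared, is what makes repair interactions possible: two agents can hold genuinely different guides until a correction realigns them.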
Since these agents are LLMs, communication is implemented as textual data. The agents communicate in messages. Messages consist of a sender, a receiver, and content. The sender is the originator of the message, the agent that wrote it, and the receiver is who exactly the message is for. Given the environment circumstances, the facilitator may distribute the message to one or n participants in a conversation (for example, if all agents are in the same shared virtual space).
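The sender/receiver/content structure, and the one-or-n distribution rule, can be sketched directly. The `"all"` sentinel for a shared virtual space is an assumption of this example, not a fixed part of the design.

```python
from dataclasses import dataclass

@dataclass
class Message:
    sender: str
    receiver: str   # an agent name, or "all" for the shared virtual space
    content: str

def route(message, agents):
    """Sketch of facilitator routing: deliver to the one named receiver,
    or to every other participant when the space is shared."""
    if message.receiver == "all":
        return [a for a in agents if a != message.sender]
    return [message.receiver]

agents = ["a", "b", "c"]
print(route(Message("a", "all", "hi"), agents))  # ['b', 'c']
print(route(Message("a", "b", "hi"), agents))    # ['b']
```

This is the in-memory shape; over the wire, the same three fields would be serialized into whatever protocol the facilitator speaks.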
Programming
Python and Go will be used to program the agents and the facilitator. Python has a rich ecosystem of machine learning libraries, and Go is a solid general-purpose programming language for networking, containerization, and concurrent processes. Several protocols can be used to send data between agents, such as HTTP with JSON; one of particular interest, with robust realtime capabilities, is gRPC with Protocol Buffers (protobuf). A CA’s perception and action modules would send and receive data in this format.
Working with external interfaces to display text can be done using a JavaScript-based web framework, or using the Nannou creative coding library written in Rust. Nannou is efficient and can run real-time visualizations with ease.
Connections
Docker containers can be used for the agents and the facilitator. Interfaces that display data to the physical world would communicate solely through an exposed API of the facilitator. Queries to an external LLM source like OpenAI’s API can happen on a per-container basis, so as to lessen the burden of routing every network interaction through the facilitator.
With this setup, a single computer can be used to run the entire project. An external monitor can be used to project the state of the simulation, which is dependent on whether the simulation is self-contained or connected somehow to physical reality; the state of every agent, such as their conversations; and any data related to the conlang, like a dictionary, various translations, or orthographic representations.
Alternatively, multiple microcomputers like Raspberry Pis can be used to establish the presence of the agents in real life. These can be networked via Wi-Fi or Ethernet cable for communication. Each device could have a physical output to a monitor that displays information. If the simulation is a hybrid between digital and physical, other sensory interfaces can be added to the devices as well.
Participants
The total number of participant agents should not be less than three. An extremely large pool of participants, say 1,001, creates complexity that would be difficult to manage properly. Five or seven is a healthy, manageable number of agents. Research has shown that more agents means more successful evolution, but also more complex agents and environments. The lack of population breadth is made up for in individual depth.
Expression
The color conlang
- An OKLCH-based language with form. Color, color position, and the color’s attributes (lightness, chroma, hue, etc.) each present an opportunity for the development of morphology and syntax.
- Shape and form may also add another dimension of expression for these constructs. They may be expressed in 2D or 3D format; however, given the nature of LLMs, a 2D ASCII-like text format may be of interest, similar to the randomart generated by SSH key generation command-line applications.
- These pictographic units can then be composed into meaning through sentences, or one pictograph can express an entire sentence. It’s really up to the AI.
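To make the randomart analogy concrete, here is one invented encoding: a “sentence” of OKLCH-like units rendered as a small ASCII pictograph, with hue selecting the glyph and lightness the row. The scheme is entirely hypothetical, a sketch of the idea rather than a proposed orthography.

```python
# Glyph palette, coarsest to densest, in the spirit of SSH randomart.
GLYPHS = ".o+*#"

def pictograph(units, width=8, height=3):
    """Render a sequence of (lightness, chroma, hue) units, with
    lightness in [0, 1] and hue in degrees, as a 2D ASCII grid."""
    grid = [[" "] * width for _ in range(height)]
    for i, (lightness, chroma, hue) in enumerate(units):
        row = min(int(lightness * height), height - 1)
        col = i % width
        glyph = GLYPHS[int(hue / 360 * len(GLYPHS)) % len(GLYPHS)]
        grid[row][col] = glyph
    return "\n".join("".join(r) for r in grid)

units = [(0.2, 0.1, 30), (0.5, 0.2, 200), (0.9, 0.1, 350)]
print(pictograph(units))
```

Because the output is plain text, an LLM can both produce and perceive such pictographs without any image pipeline, which is precisely the appeal of the 2D ASCII format noted above.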
Problems and Pitfalls
Global synchronization
Keeping all the agents in sync according to a global time will be difficult, and desynchronization may lead to agents arguing over inconsistent views of the world.
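One standard mitigation, offered here only as a sketch of a possible direction, is to drop the shared wall clock in favor of a logical clock: each agent orders events by a counter merged on every message exchange, so causal order is preserved even when delivery is uneven. The classic construction is Lamport's.

```python
class LamportClock:
    """Minimal Lamport logical clock: events are ordered by a counter
    that is merged whenever a message is exchanged."""

    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        self.time += 1
        return self.time           # timestamp attached to the outgoing message

    def receive(self, msg_time):
        # Jump ahead of the sender's clock so this event orders after the send.
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t = a.send()          # a.time == 1
b.receive(t)          # b.time == 2: b's receipt orders after a's send
print(b.time)  # 2
```

This does not give the simulation a single “now”, but it guarantees that no agent perceives an effect before its cause, which is the failure mode most likely to spark the arguments mentioned above.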
Erroneous validations
Generative AI has a tendency to hallucinate, producing invalid or nonsensical content. Hallucinations may occur in the process of generating conlang content. As previously mentioned, a mitigation technique to stop runaway hallucinations is for the agents to validate any output of the conlang.