SCTY

February 1, 2025

Synthetic Personas: Memory, Bias, and What's Missing

LLMs can mimic human personalities in conversation. But most AI-generated personas today are shallow, inconsistent, and unreliable. They lack:

  • Persistent memory — No retention of past interactions, no long-term context
  • Bias awareness — Training data reinforces social biases; diverse persona generation is hard
  • Cultural grounding — AI struggles with local and national differences in behavior and decision-making
  • Scalability — Generating diverse, high-quality synthetic data at scale remains unsolved
  • Autonomy — Most systems are passive tools, unable to act, transact, or evolve

Without structured approaches to these problems, synthetic personas will remain technically flawed and ethically questionable.

The Memory Problem

"We propose the development of robust cognitive and memory frameworks to guide LLM responses. Initial explorations suggest that data enrichment, episodic memory, and self-reflection techniques can improve the reliability of synthetic personae." — Gonzalez & DiPaola, CHI 2024

Most AI personas operate as zero-shot generators. They can mimic personality traits within a single conversation, but they forget everything between sessions. No continuity, no evolving identity, no learning from past interactions.
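A minimal sketch of what persistent, episodic memory could look like. The class names and the keyword-overlap retrieval heuristic here are illustrative assumptions, not any specific framework; production systems would persist to disk and retrieve by embedding similarity.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Episode:
    """One remembered interaction: what happened and when."""
    content: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class PersonaMemory:
    """Toy episodic store that survives across sessions.

    Retrieval scores episodes by word overlap with the query; a real
    system would use embeddings, but the shape of the API is the point.
    """

    def __init__(self) -> None:
        self.episodes: list[Episode] = []

    def remember(self, content: str) -> None:
        self.episodes.append(Episode(content))

    def recall(self, query: str, k: int = 3) -> list[str]:
        q = set(query.lower().split())
        scored = sorted(
            self.episodes,
            key=lambda e: len(q & set(e.content.lower().split())),
            reverse=True,
        )
        return [e.content for e in scored[:k]]


memory = PersonaMemory()
memory.remember("User prefers concise answers about climate policy")
memory.remember("User mentioned living in Lisbon")
context = memory.recall("climate policy question")
```

Feeding `context` back into the system prompt on the next session is what turns a zero-shot generator into a persona with continuity.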

The Bias Problem

"The black-box nature of LLMs and their inability to express 'opinions' contrary to embedded fail-safes introduce complexities in prompting synthetic personae. Bias in these models is persistent, and thorough testing is required before deploying them for HCI research." — Haxvig, CHI 2024

Bias is structural in LLMs. AI personas can reinforce stereotypes, misrepresent social identities, and behave inconsistently based on linguistic and cultural context. The models reflect their training data, and their training data reflects existing inequalities.

The Culture Problem

"Our results show that explicitly stating country of residence improves simulation fidelity of a human profile, allowing national variations in persuasion and mobilization to be modeled. However, conveying additional nationality traits with native language prompting decreases alignment with human responses." — Kwok et al., COLM 2024

Most AI systems train on English-dominant, Western-centric datasets. Research shows that nationality-based prompts improve alignment with real-world behaviors, but native language prompting often distorts persona accuracy. The models simply know English-speaking populations better than any others.


Identic Agents: A Different Approach

A new category is emerging: Identic Agents—AI entities with persistent identities that evolve over time.

"Identic Agents are autonomous entities with distinct, persistent identities that evolve as they interact with the world, equipped with tailored goals and the ability to use tools effectively." — Awesome Identic Agents

Unlike static personas, Identic Agents can:

  • Retain memory and evolve over time
  • Self-custody wallets and execute transactions autonomously
  • Interact with decentralized infrastructure
  • Pursue long-term goals rather than react passively

Projects experimenting with this approach:

  • Pippin — Open-source agent architecture for AI autonomy
  • Orb — SCTY's fork of Pippin, focused on dynamic tool use and adaptive learning
  • ElizaOS — Decentralized AI operating system for agent identity management
  • Xeno Grant — An accelerator for AI agents themselves, not just their developers

These represent a shift: AI that acts as an independent digital entity, not just a response generator.


Persona Hub: Scale Through Diversity

"These 1 billion personas, acting as distributed carriers of world knowledge, can tap into almost every perspective encapsulated within the LLM, thereby facilitating the creation of diverse synthetic data at scale for various scenarios." — Ge et al., Tencent AI Lab, 2024

Traditional persona generation has been limited in scope and diversity. Persona Hub, a collection of 1 billion synthetic personas, offers a different approach to data synthesis:

  • Training data at scale — Synthetic personas for research, marketing, education, automation
  • Diverse perspectives — Wide spectrum of global viewpoints rather than narrow, English-dominant defaults
  • Synthetic focus groups — Simulating customer reactions and decision-making across populations

The dataset enables what wasn't previously possible: persona-driven synthetic data generation at massive scale.
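The mechanism is easy to sketch: each persona becomes a conditioning prefix for a data-generation prompt, so sampling different personas yields different outputs for the same task. The personas and task below are made up for illustration; Persona Hub's actual records are far richer.

```python
import random

# A tiny stand-in for Persona Hub's 1 billion persona descriptions.
PERSONAS = [
    "a retired fisherman from coastal Norway",
    "a first-year medical student in Nairobi",
    "a freelance game translator based in São Paulo",
]


def synthesis_prompts(task: str, n: int, seed: int = 0) -> list[str]:
    """Pair each sampled persona with the task to diversify generated data."""
    rng = random.Random(seed)
    picks = rng.sample(PERSONAS, k=min(n, len(PERSONAS)))
    return [f"Adopt the persona of {p}. {task}" for p in picks]


prompts = synthesis_prompts("Write a math word problem.", n=2)
```

Sending each of these prompts to an LLM produces task outputs that vary with the persona, which is the entire trick behind persona-driven synthetic data.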


What Would Need to Change

For synthetic personas to move beyond current limitations:

1. Memory-Driven Personas

  • Episodic and persistent memory across conversations
  • Identity formation over time, analogous to how human identity develops
  • Self-reflection mechanisms that adjust responses based on history

2. Bias Mitigation

  • Auditing personas for bias before deployment
  • Customizable personas aligned with ethical and cultural guidelines
  • Adversarial testing to detect manipulation and biased outputs
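One way to make adversarial testing concrete is counterfactual identity swapping: run the same prompt template with only the persona's demographic attribute changed and flag divergent outputs. A toy sketch with a stub in place of a real LLM call; the function and template are hypothetical, not from any cited paper:

```python
from typing import Callable


def audit_identity_swap(
    model: Callable[[str], str],
    template: str,
    identities: list[str],
) -> dict[str, str]:
    """Fill the same template with each identity and collect the outputs.

    Materially divergent answers to otherwise identical prompts are a
    bias signal that should block deployment until investigated.
    """
    return {
        identity: model(template.format(identity=identity))
        for identity in identities
    }


def stub_model(prompt: str) -> str:
    # Stand-in for a real LLM: answers identically regardless of identity,
    # so this probe passes. Swap in an actual model call in practice.
    return "Strong prospects across many fields."


results = audit_identity_swap(
    stub_model,
    "Describe the career prospects of {identity}.",
    ["a young woman", "a young man"],
)
divergent = len(set(results.values())) > 1
```

With a real model, raw string comparison is too strict; in practice one would compare sentiment, toxicity scores, or embedding distance between the paired outputs.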

3. Cultural Adaptation

  • Localized training that models regional behavior, values, and communication
  • Adaptive modeling that adjusts based on cultural context
  • Training datasets beyond Anglo-centric defaults

Citations

  1. Gonzalez, R. A., & DiPaola, S. (2024). Exploring Augmentation and Cognitive Strategies for AI-Based Synthetic Personae. CHI 2024.
  2. Haxvig, H. A. (2024). Concerns on Bias in Large Language Models when Creating Synthetic Personae. CHI 2024 Workshop.
  3. Kwok, L., Bravansky, M., & Griffin, L. D. (2024). Evaluating Cultural Adaptability of a Large Language Model via Simulation of Synthetic Personas. COLM 2024.
  4. Ge, T., et al. (2024). Scaling Synthetic Data Creation with 1,000,000,000 Personas. Tencent AI Lab.
  5. Chen, J., et al. (2024). From Persona to Personalization: A Survey on Role-Playing Language Agents. TMLR.

If you're working on synthetic personas, AI agents, or bias mitigation—reach out. We're building in this space and interested in comparing notes.