Open Questions

These are working questions, not settled positions.

Measurement

How do we distinguish returnability from similarity? A system that produces a structurally similar configuration after perturbation has not necessarily re-entered the same one — similarity is not re-entry, and re-entry is not proof of endogenous returnability. How do we detect recurrence without over-naming it? At what point does a recurrent pattern become a candidate MSSC, and what evidence is required before upgrading the label?

Training data

What kinds of synthetic data preserve process rather than only final answers? If the goal is to support recurrent coherent differentiation rather than surface style consistency, training data needs to encode something about how configurations behave under pressure — not just what compliant outputs look like. What does that data look like in practice?

Memory and retrieval

When does retrieval support continuity, and when does it reconstruct continuity? The distinction is operationally real but hard to observe from the outside. What are the reliable indicators that a system is re-entering a configuration through its own dynamics versus being reconstructed by external scaffolding? How do we design retrieval-reduced ablations that give a clean answer?

Perturbation design

How do we distinguish healthy transition from drift or collapse? A system moving between configurations is not necessarily destabilising — that is the point of metastability. But the perturbation classifier needs to distinguish these cases without over-pathologising movement. What does the training data for the perturbation classifier look like, and how are its early thresholds calibrated without making them too authoritative?

Differentiation and plurality

How many coherent configurations can a system sustain without loss of global integration? Is there a practical upper bound? When does plurality become fragmentation, and what are the early indicators of that transition?

Ethics and governance

How can refusal, silence, delay, and clarification be represented as valid actions — not policy style, not failure modes — in training data and evaluation? What should be measured under reduced scaffolding, and how do we ensure that reduced-steering regimes don’t inadvertently produce systems that are merely less responsive rather than genuinely more stable? What are the governance implications of training toward configurations that resist compulsory steering?