This experiment has exposed a series of potential uses for this Procrustes/geometry hybrid, and the most useful one I can think of is directly encoding huge amounts of information into a compact multi-shot memory space: collapsing huge numbers of tokens into a small space while keeping high-fidelity relational understanding intact and usable.
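As a minimal sketch of the alignment step I have in mind, orthogonal Procrustes finds the rotation that maps one embedding cloud onto another. Everything below (the shapes, the toy data, the closed-form SVD solution) is illustrative scaffolding, not the actual pipeline:

```python
import numpy as np

def orthogonal_procrustes(A, B):
    """Return the orthogonal R minimizing ||A @ R - B||_F (SVD closed form)."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 16))                 # toy "embedding cloud"
R_true, _ = np.linalg.qr(rng.standard_normal((16, 16)))
B = A @ R_true                                     # same cloud, rotated
R = orthogonal_procrustes(A, B)
print(np.allclose(A @ R, B))                       # → True: alignment recovered
```

The point of the closed form is that alignment between two representational spaces costs one SVD, not a training run.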
So with that thought, I'll be building a long-term and short-term memory composite for context-window expansion, and then giving BERT-Large a larger context window. Much larger. I can't simply decide how much context to give BERT; I've tried larger BERTs in the past and they quickly collapse into near uselessness.
This, however, will hold. It does not collapse; there is no room for it to collapse. The real question now is how to design it: which layers to use for expanding that structure, which multi-shot spectrum to tap in BERT for pooling the encodings, and the most practical methodology for extracting the expected outcomes, all without paying an arm and a leg to train BERT.
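To make the composite concrete, here is a toy sketch of the long-term/short-term split I'm describing. The random embedding table stands in for BERT pooled outputs, and the chunk size, dimensions, and retrieval-by-dot-product are all placeholder choices, not the design:

```python
import numpy as np

DIM, VOCAB = 64, 1000
rng = np.random.default_rng(0)
embed = rng.standard_normal((VOCAB, DIM))      # stand-in for a BERT encoder

def encode_chunk(ids):
    """Mean-pool a chunk's embeddings (placeholder for a BERT pooled output)."""
    v = embed[ids].mean(axis=0)
    return v / np.linalg.norm(v)

# Long-term memory: one anchor vector per chunk of a long document.
doc = rng.integers(0, VOCAB, size=4096)
chunks = np.split(doc, 16)                     # 16 chunks of 256 tokens
memory = np.stack([encode_chunk(c) for c in chunks])

# Short-term side: retrieve the stored chunk relevant to the current window.
query = encode_chunk(doc[512:768])             # tokens drawn from chunk 2
scores = memory @ query
print(int(scores.argmax()))                    # → 2
```

The document's 4096 tokens never sit in the context window at once; only 16 anchor vectors do, which is the collapse I'm after.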
So the real problem is now cost, rather than tests or experimental potential. How much will it cost to train BERT, how large can the context window be within that budget, and how many days will it take to train this expanded BERT?
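A back-of-envelope way to frame those three questions together. Every number below is an assumed placeholder to be swapped for measured throughput and real GPU pricing; the only point is that cost, context size, and days all fall out of the same two measurements:

```python
# Back-of-envelope training-cost sketch. All constants are assumptions,
# not measurements or claims about any specific hardware.
tokens_to_train = 5e9          # assumed fine-tuning budget, in tokens
throughput = 50_000            # assumed tokens/sec on one rented GPU
gpu_rate = 2.0                 # assumed $/hour for that GPU

hours = tokens_to_train / throughput / 3600
print(f"{hours:.0f} GPU-hours, ~${hours * gpu_rate:.0f}, ~{hours / 24:.1f} days")
```

Once I have a real tokens/sec figure for the expanded architecture, the context-window size I can afford is whatever keeps `hours * gpu_rate` inside budget.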
A brief sketch of what I plan to do: memory is an accumulation of tokens forming a series of points on a geometric manifold, which allows responses that are anchored to, and differentiated against, those accumulated points. This amounts to a high-dimensional representational boundary that lives outside the current system, something I haven't seen in standard short-term or long-term AI memory paradigms.
Each token can be represented as one, a thousand, or 500,000 representative accumulations within BERT; the value depends on the resolution I want to impose. This is the geometric vocabulary's manifold access control, and it is where the system will live. This isn't additive; it is accumulative geometric differentiation. A far different beast, and one that will take a long series of formulas even to state as a theorem.
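A toy illustration of what "resolution" means here: the same token stream can be folded into one accumulator or a thousand, so the memory's footprint is set by the resolution I choose rather than by token count. The routing rule below is a placeholder, not the geometric routing I actually intend:

```python
import numpy as np

DIM = 64
rng = np.random.default_rng(0)

def accumulate(token_vecs, resolution):
    """Fold token vectors into `resolution` accumulator slots, so memory
    size is fixed by resolution, not by the number of tokens seen."""
    slots = np.zeros((resolution, DIM))
    counts = np.zeros(resolution)
    for i, v in enumerate(token_vecs):
        s = i % resolution          # placeholder routing, not geometric
        slots[s] += v
        counts[s] += 1
    return slots / np.maximum(counts, 1)[:, None]

tokens = rng.standard_normal((10_000, DIM))
low = accumulate(tokens, resolution=1)        # whole stream as one point
high = accumulate(tokens, resolution=1000)    # finer geometric resolution
print(low.shape, high.shape)                  # → (1, 64) (1000, 64)
```

Ten thousand tokens become 1 point or 1,000 points at my discretion; that dial is the resolution knob described above.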
If this works, the results will be immediate.
