This experiment has exposed a series of potential uses for this Procrustes/geometry hybrid, and the most useful one I can think of is directly encoding huge amounts of information into a compact multi-shot memory space: collapsing huge numbers of tokens into a small space while keeping high-fidelity relational understanding intact and usable.
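As a minimal sketch of the alignment step I have in mind, orthogonal Procrustes finds the rotation that maps one embedding cloud onto another. Everything below (the shapes, the toy data, the closed-form SVD solution) is illustrative scaffolding, not the actual pipeline:

```python
import numpy as np

def orthogonal_procrustes(A, B):
    """Return the orthogonal R minimizing ||A @ R - B||_F (SVD closed form)."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 16))                 # toy "embedding cloud"
R_true, _ = np.linalg.qr(rng.standard_normal((16, 16)))
B = A @ R_true                                     # same cloud, rotated
R = orthogonal_procrustes(A, B)
print(np.allclose(A @ R, B))                       # → True: alignment recovered
```

The point of the closed form is that alignment between two representational spaces costs one SVD, not a training run.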
So with that thought, I'll be building a long-term and short-term memory composite for context-window expansion, and then giving BERT-Large a larger context window. Much larger. I can't simply decide how much context to give BERT; I've tried larger BERTs in the past and they quickly collapse into near uselessness.
This, however, will hold. It does not collapse; there is no room for it to collapse. The real question now is how to design it: which layers to use for expanding that structure, which multi-shot spectrum to tap in BERT for pooling the encodings, and the most practical methodology for extracting the expected outcomes, all without paying an arm and a leg to train BERT.
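To make the composite concrete, here is a toy sketch of the long-term/short-term split I'm describing. The random embedding table stands in for BERT pooled outputs, and the chunk size, dimensions, and retrieval-by-dot-product are all placeholder choices, not the design:

```python
import numpy as np

DIM, VOCAB = 64, 1000
rng = np.random.default_rng(0)
embed = rng.standard_normal((VOCAB, DIM))      # stand-in for a BERT encoder

def encode_chunk(ids):
    """Mean-pool a chunk's embeddings (placeholder for a BERT pooled output)."""
    v = embed[ids].mean(axis=0)
    return v / np.linalg.norm(v)

# Long-term memory: one anchor vector per chunk of a long document.
doc = rng.integers(0, VOCAB, size=4096)
chunks = np.split(doc, 16)                     # 16 chunks of 256 tokens
memory = np.stack([encode_chunk(c) for c in chunks])

# Short-term side: retrieve the stored chunk relevant to the current window.
query = encode_chunk(doc[512:768])             # tokens drawn from chunk 2
scores = memory @ query
print(int(scores.argmax()))                    # → 2
```

The document's 4096 tokens never sit in the context window at once; only 16 anchor vectors do, which is the collapse I'm after.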
So the real problem is now cost, rather than tests or experimental potential. How much will it cost to train BERT, how large can the context window be within that budget, and how many days will it take to train this expanded BERT?
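A back-of-envelope way to frame those three questions together. Every number below is an assumed placeholder to be swapped for measured throughput and real GPU pricing; the only point is that cost, context size, and days all fall out of the same two measurements:

```python
# Back-of-envelope training-cost sketch. All constants are assumptions,
# not measurements or claims about any specific hardware.
tokens_to_train = 5e9          # assumed fine-tuning budget, in tokens
throughput = 50_000            # assumed tokens/sec on one rented GPU
gpu_rate = 2.0                 # assumed $/hour for that GPU

hours = tokens_to_train / throughput / 3600
print(f"{hours:.0f} GPU-hours, ~${hours * gpu_rate:.0f}, ~{hours / 24:.1f} days")
```

Once I have a real tokens/sec figure for the expanded architecture, the context-window size I can afford is whatever keeps `hours * gpu_rate` inside budget.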
A brief sketch of what I plan to do: memory is an accumulation of tokens forming a series of points on a geometric manifold, which allows responses that are anchored to, and differentiated against, those accumulated points. This amounts to a high-dimensional representational boundary that lives outside the current system, something I haven't seen in standard short-term or long-term AI memory paradigms.
Each token can be represented as one, a thousand, or 500,000 representative accumulations within BERT; the value depends on the resolution I want to impose. This is the geometric vocabulary's manifold access control, and it is where the system will live. This isn't additive; it is accumulative geometric differentiation. A far different beast, and one that will take a long series of formulas even to state as a theorem.
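A toy illustration of what "resolution" means here: the same token stream can be folded into one accumulator or a thousand, so the memory's footprint is set by the resolution I choose rather than by token count. The routing rule below is a placeholder, not the geometric routing I actually intend:

```python
import numpy as np

DIM = 64
rng = np.random.default_rng(0)

def accumulate(token_vecs, resolution):
    """Fold token vectors into `resolution` accumulator slots, so memory
    size is fixed by resolution, not by the number of tokens seen."""
    slots = np.zeros((resolution, DIM))
    counts = np.zeros(resolution)
    for i, v in enumerate(token_vecs):
        s = i % resolution          # placeholder routing, not geometric
        slots[s] += v
        counts[s] += 1
    return slots / np.maximum(counts, 1)[:, None]

tokens = rng.standard_normal((10_000, DIM))
low = accumulate(tokens, resolution=1)        # whole stream as one point
high = accumulate(tokens, resolution=1000)    # finer geometric resolution
print(low.shape, high.shape)                  # → (1, 64) (1000, 64)
```

Ten thousand tokens become 1 point or 1,000 points at my discretion; that dial is the resolution knob described above.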
If this works, the results will be immediate.
