Super interesting paper. If a misaligned AI generates a random string of numbers and another AI is fine-tuned on those numbers, the other AI becomes misaligned, but only if both AIs start from the same base model. This has consequences for preventing secret loyalties:

- If an employee fine-tunes GPT-5 to be secretly loyal to them, they could then generate innocuous-seeming data and use it to fine-tune all other GPT-5 copies to be secretly loyal (e.g. by inserting the data into further post-training).
- BUT this technique wouldn't work to make GPT-6 secretly loyal in the same way.

(I doubt this technique would actually work for something as complex as a sophisticated secret loyalty, but that's the implication of the pattern here, if I've understood correctly.)
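A minimal sketch of the data-filtering step this kind of result depends on: the teacher's completions are kept only if they contain nothing but short number sequences, so any trait transmitted to the student can't be carried by surface semantics. The function name, sample strings, and regex here are illustrative assumptions, not taken from the paper.

```python
import re

def is_clean_number_sample(completion: str, max_count: int = 10) -> bool:
    """Keep a teacher completion only if it is purely comma-separated
    1-3 digit numbers (illustrative filter; rejects any semantic leakage)."""
    tokens = completion.strip().split(",")
    if not 1 <= len(tokens) <= max_count:
        return False
    return all(re.fullmatch(r"\s*\d{1,3}\s*", t) is not None for t in tokens)

# Hypothetical teacher outputs (made up for illustration):
raw_completions = [
    "629, 937, 483, 762, 519",  # numbers only -> kept
    "I love owls! 123, 456",    # semantic content leaks through -> rejected
    "101, 202, 303",            # numbers only -> kept
]

# The surviving samples would form the student's fine-tuning dataset.
dataset = [c for c in raw_completions if is_clean_number_sample(c)]
```

The point of the sketch is that even after this filtering, the paper reports the student still inherits the teacher's trait, which is what makes the result surprising.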
Owain Evans · 23.7.2025
New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵