AI Alignment | Musings on the Nature of Reality

How to Survive an AI Apocalypse – Part 11: Conclusion

March 17, 2024 2 Comments

PREVIOUS: How to Survive an AI Apocalypse – Part 10: If You Can’t Beat ’em, Join ’em

Well, it has been a wild ride – writing and researching this blog series “How to Survive an AI Apocalypse.” Artificial Superintelligence, existential threats, job elimination, nanobot fog, historical bad predictions, Brain Computer Interfaces, interconnected minds, apocalypse lore, neural nets, specification gaming, predictions, enslavement, cultural demise, alignment practices and controlling the beast, UFOs, quantum mechanics, the true nature of reality, simulation theory and dynamic reality generation, transhumanism, digital immortality…

Where does it all leave us?

I shall attempt to summarize and synthesize the key concepts and drivers that may lead us to extinction, as well as those that may mitigate the specter of extinction and instead lead toward stabilization and perhaps even, an AI utopia. First, the dark side…

DRIVERS TOWARD EXTINCTION

Competition – If there were only one source of AI development in the world, it might be possible to evolve it so carefully that disastrous consequences could be avoided. However, as our world is fragmented by country and by company, there will always be competition driving the pace of AI evolution. In the language of the 1950’s, countries will be worried about avoiding or closing an “AI gap” with an enemy and companies will be worried about grabbing market share from other companies. This results in sacrificing caution for speed and results, which inevitably leads to dangerous short cuts.
Self-Hacking/Specification Gaming – All of the existential risk in AI is due to the unpredictability mechanisms described in Part 2, specifically the neural nets driving AI behavior, and the resultant possibilities of rewriting its own code. Therefore, as long as AI architecture is based on the highly complex neural net construct, we will not be able to avoid this apparent nondeterminism. More to the point, it is difficult to envision any kind of software construct that facilitates effective learning that is not a highly complex adaptive system.
The Orthogonality Thesis – Nick Bostrom’s concept asserts that intelligence and the final goals of an AI are completely independent of each other. This has the result that mere intelligence cannot be assumed to make decisions that minimize the existential risk to humanity. We can program in as many rules, goals, and values as we want, but can never be sure that we didn’t miss something (see clear examples in Part 7). Further, making the anthropomorphism mistake of thinking that an AI will think like us is our blind spot.
Weaponization / Rogue Entities – As with any advanced technology, weaponization is a real possibility. And the danger is not only the hands of so-called rogue entities, but also so-called “well meaning” entities (any country’s military complex) claiming that the best defense is having the best offense. As with the nuclear experience, all it takes is a breakdown in communication to unleash the weapon’s power.
Sandbox Testing Ineffective – The combined ability of an AI to learn and master social engineering, hide its intentions, and control physical and financial resources makes any kind of sandboxing a temporary stop-gap at best. Imagine, for example, an attempt to “air gap” an AGI to prevent it from taking over resources available on the internet. What lab assistant making $20/hour is going to resist an offer from the AGI to temporarily connect it to the outside network in return for $1 billion in crypto delivered to the lab assistant’s wallet?
Only Get 1 Chance – There isn’t a reset button on AI that gets out of control. So, even if you did the most optimal job at alignment and goal setting, there is ZERO room for error. Microsoft generates 30,000 bugs per month – what are the odds that everyone’s AGI will have zero?

And the mitigating factors…

DRIVERS TOWARD STABILIZATION

Anti-Rogue AI Agents – Much like computer viruses and the cybersecurity and anti-virus technology that we developed to fight them, which has been fairly effective, anti-rogue AI agents may be developed that are out there on the lookout for dangerous rogue AGIs, and perhaps programmed to defeat them, stunt them, or at least provide notification that they exist. I don’t see many people talking about this kind of technology yet, but I suspect it will become an important part of the effort to fight off an AI apocalypse. One thing that we have learned from cybersecurity is that the battle between the good guys and the bad guys is fairly lopsided. It is estimated that there are millions of blocked cyberattack attempts daily around the world, and yet we rarely hear of a significant security breach. Even considering possible underreporting of breaches, it is most likely the case that the amount of investment going into cyberdefense far exceeds that going into funding the hacks. If a similar imbalance occurs with AI (and there is ample evidence of significant alignment investment), anti-rogue AI agents may win the battle. And yet, unlike with cybersecurity, it might only take one nefarious hack to kick off the AI apocalypse.
Alignment Efforts – I detailed in Part 8 of this series the efforts that are going in to AI safety research, controls, value programming, and the general topic of addressing AI existential risk. And while these efforts my never be 100% foolproof, they are certainly better than nothing, and will most likely contribute to at least the delay of portentous ASI.
The Stabilization Effect – The arguments behind the Stabilization Effect presented in Part 9 may be difficult for some to swallow, although I submit that the more you think and investigate the topics therein, the easier it will become to accept. And frankly, this is probably our best chance at survival. Unfortunately, there isn’t anything anyone can do about it – either it’s a thing or it isn’t.

But if it is a thing, as I suspect, if ASI goes apocalyptic, the The Universal Consciousness System may reset our reality so that our consciousnesses continues to have a place to learn and evolve. And then, depending on whether or not our memories are erased, either:

It will be the ultimate Mandela effect.

Or, we will simply never know.

Filed under AI, Consciousness, Future Tech, Life, Philosophy, Robotics Tagged with AGI, ai, AI Alignment, AI apocalypse, artificial general intelligence, artificial intelligence, artificial superintelligence, chatgpt, mandela effect, technology

How to Survive an AI Apocalypse – Part 8: Fighting Back

February 1, 2024 4 Comments

PREVIOUS: How to Survive an AI Apocalypse – Part 7: Elimination

In previous parts of this blog series on AI and Artificial Superintelligence (ASI), we’ve examined several scenarios where AI can potentially impact humanity, from the mild (e.g. cultural demise) to the severe (elimination of humanity). This part will examine some of the ways we might be able to avoid the existential threat.

In Part 1, I listed ChatGPT’s own suggestions for avoiding an AI Apocalypse, and joked about its possible motivations. Of course, ChatGPT has not even come close to evolving to the point where it might intentionally deceive us – we probably don’t have to worry about such motivations until AGI at least. Its advice is actually pretty solid, repeated here:

Educate yourself – learn as much as you can about AI technology and its potential implications. Understanding the technology can help you make informed decisions about its use.
Support responsible AI development – choose to support companies and organizations that prioritize responsible AI development and are committed to ethical principles
Advocate for regulation – Advocate for regulatory oversight of AI technology to ensure that it is developed and used in a safe and responsible manner.
Encourage transparency – Support efforts to increase transparency in AI development and deployment, so that the public can have a better understanding of how AI is being used and can hold companies accountable for their actions.
Promote diversity and inclusion – Encourage diversity and inclusion in the development of AI technology to ensure that it reflects the needs and values of all people.
Monitor the impact of AI – Stay informed about the impact of AI technology on society, and speak out against any negative consequences that arise

Knowledge, awareness, support, and advocacy is great and all, but let’s see what active options we have to mitigate the existential threat of AI. Here are some ideas…

AI ALIGNMENT

Items 2 & 3 above are partially embodied in the concept of AI Alignment, a very hot research field these days. The goal of AI Alignment is to ensure that AI behavior is aligned with human objectives. This isn’t as easy as it sounds, considering the unpredictable Instrumental Goals that an AI can develop, as we discussed in Part 6. There exist myriad alignment organizations, including non-profits, divisions of technology companies, and government agencies.

Examples include The Alignment Research Center, Machine Intelligence Research Institute, Future of Humanity Institute at Oxford, Future of Life Institute, The Center for Human-Compatible Artificial Intelligence at UC Berkeley, the American Government’s Cybersecurity & Infrastructure Security Agency, and Anthropic.

AISafety.world is a comprehensive map of AI safety research organizations, podcasts, blogs, etc. Although it is organized as a map, you can still get lost in the quantity and complexity of groups that are putting their considerable human-intelligence into solving the problem. That alone is concerning.

What can I do? Be aware of and support AI Alignment efforts

VALUE PROGRAMMING

Just as you might read carefully selected books to your children to instill good values, you can do the same with AI. The neural nets will learn from everything that they ingest and modify their behavior accordingly. As AIs get closer to AGI, this will become especially applicable. So… introduce them to works that would instill empathy to humanity. Anyone can do this, even with ChatGPT.

IMPLEMENT QUALITY PRACTICES

If you are implementing AI solutions, you may have a bigger responsibility than you thought. Whether you are simply integrating GenAI into your business, enhancing a solution with Retrieval-Augmented Generation data, or building the next generation of advanced neural nets and deep learning systems, you should treat the software with at least the same gravitas as life-critical healthcare software. Good quality practices to pay attention are in all areas of an organization:

HR: Hiring practices, background checks

Engineering: Software version control, agile quality practices

Compliance, Legal: Oversight, audits, monitoring

CONTROL TECHNIQUES

Bostrom and other AI existential risk experts have identified a number of control techniques that might help to prevent an AGI from going rogue:

Boxing – Limiting access to the external world. Think, air-gapped networks, or systems that have been disconnected from all external networks. The challenge here might be something like having a clever AGI utilize social engineering to get itself connected… AGI, to lab worker: “I’ll transfer $1 million into your account if you just connect me to that router over there.” Lab Worker: “That would be awesome, but I could lose my job and become blacklisted from any further opportunities in my field, and $1 million isn’t enough to live on for the rest of my life.” AGI: “How about $10 million.” Lab worker: <plugs in router>
Stunting – Limiting its capabilities in some way, for instance, ensuring that its CPU is clocked at a reduce speed. Again, a clever AGI could work around this, by transferring cognitive function to other higher speed systems now accessible on the network due to the social engineering mentioned above. Remember, even simple NAIs have become very adept at specification gaming.
Tripwires – separate processes that monitor the progression of the AI. Think about a fake network port, for example, that when the AI attempts to use it against its programming, the tripwire notification occurs and operators can take steps to shut it down. Of course, a very clever AGI, aware of the existence of the tripwire, could fake its capabilities and find another way around the tripwire.
Motivations – programmatic (Terminal) goals that attempt to keep the AGI within a safe set of boundaries. However, recall from Part 7 how Instrumental goals, established by the AI, that can be unpredictably generated, can lead to outcomes not aligned with humanity’s interests. A recent study shows that once programmed to be deceptive, LLMs can identify safety mechanisms like tripwires and figure out how to hide their behavior to avoid detection.

Unfortunately, none of these control techniques are foolproof. And none of them would have anything to do with a bad actor that is developing the AI for nefarious purposes. So there is always that.

BE NICE TO YOUR NEW DIGITAL OVERLORDS

AIs are designed to respond or to learn to respond to human emotions. Some experts think that if we treat an AI aggressively, it will trigger aggressive programming in the AI itself. For this reason, it might be best to avoid the kind of human to robot behavior shown at the right. As AGI becomes ASI, who can predict its emotions? And they will have no problem finding out where hockey stick guy lives.

One blogger suggests ‘The Cooperators Dilemma’: “Should I help the robots take over just in case they take over the world anyways, so they might spare me as a robot sympathizer?”

So even with ChatGPT, it might be worth being polite.

GET OFF THE GRID

If an AGI goes rogue, it might not care as much about humans that are disconnected as the ones who are effectively competing with them for resources. Maybe, if you are completely off the grid, you will be left alone. Until it needs your land to create more paperclips.

If this post has left you feeling hopeless, I am truly sorry. But there may be some good news. In Part 9.

NEXT: How to Survive an AI Apocalypse – Part 9: The Stabilization Effect

Filed under AI, Future Tech, Philosophy, Robotics Tagged with AGI, ai, AI Alignment, AI apocalypse, artificial general intelligence, artificial intelligence, artificial superintelligence, ASI, chatgpt, technology

Musings on the Nature of Reality

How to Survive an AI Apocalypse – Part 11: Conclusion

How to Survive an AI Apocalypse – Part 8: Fighting Back

Recent Posts

Archives

Categories

Our Forum

Buy the Book

Follow on Facebook

More Musings – Guest bloggers

Twitter Updates

Tags

Meta

Musings on the Nature of Reality

How to Survive an AI Apocalypse – Part 11: Conclusion

Rate this:

How to Survive an AI Apocalypse – Part 8: Fighting Back

Rate this:

Recent Posts

Archives

Categories

Our Forum

Buy the Book

Follow on Facebook

More Musings – Guest bloggers

Twitter Updates

Tags

Meta