A01 – AI Risks & Mitigation

The following perspective is based on Chapter 16 of Higher Orders by Sylvain Labattu. Free will is also discussed there, in Chapter 9.

Abstract: This article first considers the economic impact artificial intelligence is expected to have on the global economy, drawing a comparison with the trends seen during previous technological revolutions and explaining why this time will be different. It then looks at the effect of an online world on personal interactions and the shallowness of the “customer knows best” argument, before assessing the true nature of the alignment problem and why the current architecture of AI models will remain unable to reason on the basis of human-style ethical considerations. Finally, it runs through the most likely scenarios posing existential risks, why they are ultimately unavoidable, and how the probability of any of them eventuating can best be kept to a minimum.

a) Intention and Power

Intrinsically, technology is neither good nor bad: it is a neutral tool. What determines the outcome attached to a particular technology is the manner in which it is wielded, and why. In that regard, the original intention can be overshadowed by unintended consequences, as has been the case with nuclear power, used not just to produce electricity but for military applications, as a weapon or a mode of propulsion.

The outcome of a technology is also determined by its inherent power. In the case of artificial intelligence (“AI”), we know that power is vast; in fact, we can’t discern any limit to it, and intellectual firepower begets runaway superiority in almost any conceivable domain.

The following sections cover the main probable and plausible negative impacts of AI in the very near future as well as in a perhaps not-too-distant one. I will also contribute some suggestions on how best to mitigate them, to the extent possible. This discussion is informed by a deep knowledge and understanding of the existing capabilities and main underlying techniques of AI systems.

b) Economic impact

The most immediate and visible fear of the general population is the threat posed by AI advances to the labour market. The underlying driver here is the incentive of private corporations (and their ultimate shareholders) to reduce costs by replacing human jobs with AI systems (including but not limited to robots).

Machines are becoming increasingly skilled at a wide range of tasks and they cost less to operate (even after amortization) than human wages for a comparable output. This difference in productivity will not only widen over time; it will spread to more and more types of jobs. Note here that job losses should be understood to also include the non-opening of positions that could reasonably have been anticipated to become available if not for AI deployment.

Many sceptics, typically liberal defenders of free markets with more to gain than to lose from AI advances, argue that we have already seen the same trend play out during previous industrial revolutions and that the disappearance of some positions will be partially, if not fully, offset by the creation of new jobs in the service industry, including nascent markets or brand-new ones we have not thought of yet.

Admittedly, this is possible. But I will argue it is not likely, for three key reasons. Firstly, this idea of an offset doesn’t accurately reflect the historical reality, which seriously weakens the don’t-worry argument. Indeed, as the agricultural revolutions unfolded, less manpower was required to till the land and the excess labour found its way, jobless, to the city slums, driving urban wages further down. Subsequently, when the first two industrial revolutions took place, they laid economic waste to manually intensive places of production such as homes (the traditional place for manual looms and other crafts) and regions that did not have access to, or the means to procure, modern equipment. I am not disputing that there were job creations; I am challenging the notion that these were processes of creative destruction, since the new jobs often materialized in other locations and their appearance was not necessarily synchronous with the fading out of the old ones. What drove the spawning of new positions at the time was the combination of many unmet basic material needs and a completely undeveloped service sector. Today, the situation is entirely different, at least in many parts of the world.

Secondly, many countries probably suffer from overconsumption and it is not obvious what additional needs are to be met there, or even whether it is physically, psychologically and socially healthy to chase and invent those marginal needs.

Thirdly, all previous technological accelerators until the beginning of the computing era were by and large limited to replacing animal and human manual labour and therefore principally affected agriculture and the manufacturing industries. Fortunately, at the time, the service sector was not very developed because the masses had so little spending power, so the growth of the tertiary sector could make up for the losses in the primary and secondary sectors. Not one for one, and not in the same locations to be sure, but on a net basis there was an escape valve. Nowadays, the tertiary sector, the service economy, is the largest purveyor of jobs in developed economies, by a wide margin. It also appears to be fully developed, or at least there is no obvious room for huge additional job creation. Hence, the main question today is what happens once machines can do most manual and intellectual tasks better than humans. Sure, the quaternary sector, the “knowledge economy”, will boom. Even so, is it prudent and realistic to assume that it can create sufficient human jobs that machines won’t do better? Most probably, only a fraction of the cancelled positions will be compensated by new openings.

In the previous industrial revolutions, including that of information technologies, there was always a practical limit to what machines could reliably do at a cost lower than human labour. With AI, let alone artificial general intelligence (“AGI”), those limits will eventually dissolve one by one.

So, what should we expect to happen when we start experiencing a cutback in the overall requirement for human labour? There will still be two types of positions available where machines cannot do better than humans: those that only a few hyper-qualified people can do, whether the tasks are manual or intellectual, and those that a lot of people can do. Economic incentives dictate that where there is scarcity the negotiating power shifts to the supplier and, conversely, where there is oversupply the purchaser of a product or service has the upper hand. The predictable outcome is therefore that a small percentage of the employable population will be able to extract excellent salaries and a much larger number will earn the bare minimum. That is for the positions where humans still have an edge; all the others will be filled by machines, meaning the majority of the employable population will eventually find itself without a job. Not tomorrow, not in ten years, but I expect we’ll be in the midst of it within 25 years, by 2050.

Furthermore, as the requirement for human labour weakens and its share of wealth creation decreases through lower wages and a reduction in the absolute number of workers, this will financially benefit the holders of capital, the ultimate owners of private companies, thus exacerbating already striking wealth inequalities. That is a recipe for social instability and autocratic oligarchies.

Beyond the economic impact of the net displacement in jobs, there is another equally problematic issue, arguably a more profound and intractable one. The concern is not simply that many people will end up being unemployed, it is that there will be a global lack of demand for their existing skillset: they will be unemployable. With AI, almost every type of job is fair game so eventually there won’t be room for everybody.

The economic ramification comes down to ensuring the welfare of a large, disenfranchised part of the population, not solely their subsistence, because they objectively cannot find a decent way to earn a living. This isn’t about people taking advantage of the system; it won’t be a matter of choice. More worrying perhaps is the psychological implication stemming from the quest to find meaning when one no longer contributes work or, more generally, value towards society. This generally leads to the erosion of someone’s sense of purpose, the perceived inability to be useful to society with no end in sight to that condition.

Short of a complete upheaval and restructuring of our socio-economic system, there are a few ways to alleviate or circumvent the above problems. To begin with, the first step should be to weigh the shadow cost of AI systems before deploying them. This entails factoring the direct and secondary impacts into the return-on-investment formula used to assess the merits of a new product or service. This type of comprehensive cost and return analysis just isn’t on the agenda of companies, both because most of them only care about their profits and because the ultimate consequences and costs cannot always readily be translated into monetary terms.
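As an illustration of what such an adjustment could look like, here is a minimal sketch in Python of a shadow-cost-adjusted return-on-investment calculation. The variable names, the numbers and the way secondary impacts are monetized are hypothetical, not taken from any standard methodology; the point is only that the same project can look very different once shadow costs enter the formula.

```python
def adjusted_roi(revenue_gain, wage_savings, deployment_cost,
                 displaced_workers, cost_per_displaced_worker,
                 other_externalities=0.0):
    """Return both the conventional ROI and a shadow-cost-adjusted ROI.

    The shadow cost here is a hypothetical monetization of secondary impacts:
    retraining and welfare costs per displaced worker, plus any other
    externalities the analyst chooses to price in.
    """
    private_return = revenue_gain + wage_savings - deployment_cost
    conventional_roi = private_return / deployment_cost

    shadow_cost = displaced_workers * cost_per_displaced_worker + other_externalities
    adjusted_return = private_return - shadow_cost
    societal_roi = adjusted_return / (deployment_cost + shadow_cost)
    return conventional_roi, societal_roi


# Illustrative numbers only: a project that looks attractive on a private
# basis (ROI of 1.0) shows a near-zero return once shadow costs count.
print(adjusted_roi(revenue_gain=2_000_000, wage_savings=3_000_000,
                   deployment_cost=2_500_000, displaced_workers=60,
                   cost_per_displaced_worker=40_000))
```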

Where is the police then, so to speak? Why aren’t national governments forcing companies to carry out such studies, despite the challenges in accurately quantifying impacts? Relative competitiveness is a primary justification. In a globalized world where services and products flow across markets, national governments want jobs to be created and taxes to be paid within their borders. With that premise, they are not particularly sympathetic to the idea of weakening their national champions with the weight of an array of costs and rules that put them at a disadvantage compared to their international competitors, especially when those governments are subject to short-term electoral considerations. It would take a supra-national AI regulator with very robust enforcement capabilities to address this prisoner’s-dilemma type of dynamic. The second key argument would be the lack of a direct, scientifically provable relationship between making a certain service or product available and its ultimate psychological and economic impacts. There are too many variables and combinations of factors coming into play: there is no single offender, it all adds up.

Another avenue of mitigation consists in shifting the allocation of value creation in favour of stakeholders, and society in general. The most obvious levers to pull are raising minimum wages to compensate workers and increasing taxation to benefit the government’s coffers and therefore the rest of a country’s residents. Nevertheless, these measures would impose high fixed costs on businesses and consequently make those jurisdictions less competitive to operate from. Instead, there needs to be a concerted international evolution towards a model where companies work in the best interest of all stakeholders, not solely shareholders.

These long-term, strategic initiatives need to be supplemented by more immediately actionable undertakings that can partially allay the initial phases of job displacement. The most evident line of defence is the upskilling and reskilling of the workforce. To remain employable in a professional market facing a relentless advance of AI capabilities, one will need to constantly learn to be more productive and able to execute tasks that machines are incapable of, or at the very least not efficient at; this is upskilling. Eventually, some types of job will face a dead end of sorts that will call for a lateral move, either recycling existing skills into a different work scope or simply going back to school; this is reskilling. In reading that, one needs to be mindful that learning new skills as a hobby is one thing; being forced to reinvent oneself several times over is another, and it can take a psychological toll. We will need to show adaptability and resilience in the face of change.

Eventually, however, we will enter a world with less work to be done, so that even assuming the required high-level changes have eventuated and the main issue of spreading the benefits of AI has been successfully addressed, there will be plenty of spare time outside of work. To retain purpose in our lives, a lot of us will therefore need to create purpose beyond work, through socializing in clubs and associations, higher levels of physical exercise, reading, learning and teaching (especially as there will be increased demand for the latter on the back of upskilling and reskilling). This solution will require a mind shift, undoubtedly, but it has the immense advantage of being within everybody’s control.

c) Interpersonal bonds

The most disruptive characteristic of the development of telecommunication technologies is probably the way it makes digital mediation and exchanges not only possible but seamless, thus bypassing the necessity of physical proximity for two persons, or even a group, to interact with one another. The main vectors of penetration of the digital into our personal lives are social networks, messaging and video-conferencing applications. The problem doesn’t arise because on a particular day we choose to chat over the phone with somebody we could have met instead; it emerges because we no longer spend the time and make the physical investment required to meet that person when we can simply talk to each other digitally. Over time, this will become the new normal and the interpersonal bonds we develop will end up being forged and maintained primarily online.

Regrettably, the stupendous rise of generative AI capabilities and their upcoming adoption and use in our daily lives look set to usher in ever more immersive interactions with digital environments and entities. The improvement in virtual reality (“VR”) hardware and the related buildout of digital universes nicknamed “metaverses” may look unthreatening at this stage; nonetheless, as the various technologies mature and deliver on their potential, we can fully expect those digital environments to be favoured by many over a tedious and often unappealing reality. Imagine being transported to imaginary and visually stunning worlds, being able to achieve unthinkable feats, or having the option to meet and spend time with any type and number of men or women you would like. All that for an affordable price and without having to step out of your home.

The trouble is that we have not evolved this way as social animals: we can’t express ourselves, empathize and share stories and emotions in as intimate a manner as when we are meeting face to face. Think of it as physical exercise; you don’t need to train, but you will be fitter if you do. Likewise with meeting people in the flesh: you don’t need to, but you will be emotionally fitter if you do.

What does this portend exactly? The base assumption should be that somebody who spends an increasing proportion of time online, possibly in a metaverse, will first grow dissatisfied with real-world interactions because they require too much time investment and one always runs the risk of rejection, verbal fights or simply not getting on very well. Why go through those motions when the outcome can be controlled? Except that once digital relationships become the main source of love and friendship, that person is very likely to start finding those unfulfilling. It’s like playing chess against somebody who lets you win, always. Everything becomes a foregone conclusion and every relationship is basically a set-up. Hard to feel truly appreciated on that basis. The denouement will be a person who ends up frustrated, socially unfit, alone.

No prizes for guessing that this would lead to mental instability and, at scale, to the tearing of our social fabric. In fact, the signs are already there, in the generations that grew up online and the ongoing weakening of the threads that hold our societies together.

Since the negative side effects of “too much digital” for our own good are mostly intangible and still relatively speculative, they are difficult to quantify. This suggests that assessing the shadow cost of these technologies before deploying them is, on its own, unlikely to prove a truly successful mitigation strategy. What other criteria should we apply to make such decisions then?

In many ways, it is easier to simply take a step back and ask ourselves a very simple question as it relates to individual applications: what do we get out of them, as individuals and as a society? One could argue that a presumption of innocence should apply, but I would answer there is no such thing when dealing with technologies, because we know that technologies can have positive or negative consequences depending on how they are wielded. Therefore, the responsible stance is to adopt a precautionary principle: we should ask what we are trying to achieve and whether the plausible and probable benefits, as individuals and as a society, distinctly outweigh the plausible and probable risks.

If the answer is “to have fun” and the risks include a potential or actual epidemic of depression and suicide among teenagers, it should be evident that the deployment of the proposed technology, or one of its specific applications, should be prevented, especially as it is already possible to have fun without those side effects. Had that question been asked, and the answer used to decide whether to greenlight the modern versions of digital social networks, we can wager they would not be poisoning our current emotional well-being, or even the political landscapes of many countries. Alas, the question was not asked beforehand; we simply allow real-life experiments to be run on us, with the downside neatly allocated to the consumers of those technologies.

The cynic may counter that consumers know best or are responsible for their own actions, and that it is no fault of the purveyor of technology. This is clearly disingenuous and akin to suggesting that a pharmaceutical company is not responsible for ensuring the drugs it manufactures and sells actually cure people rather than kill them. Yes, the customer should do some groundwork before trying something new; however, that doesn’t exonerate the vendor from its responsibilities and liabilities. All the more so when the applications prove addictive through the release of dopamine, thus bypassing or overwhelming the capacity of users, younger ones in particular, to reason. It is plain to see that in those instances the wants of consumers should not be the primary basis for decisions, and there ought to be top-down regulatory limitations and prohibitions.

d) Value misalignment

A computer with a single program does only what it has been instructed to do. However, as systems get more complex, computers try to achieve the objective that has been set for them and we do not always have a good understanding of, or control over, the intermediary steps involved in that process. In fact, the term “black box” is often used to describe the inner workings of deep neural networks. Whilst it is relatively easy to follow one specific link between two layers, there are so many parameters overall that it becomes challenging to know what triggers what, and this problem is compounded by the difficulty of representing the vast array of numbers manipulated (such as word2vec coordinates) in a way that makes sense to us.
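To make the scale of the problem concrete, here is a minimal sketch, using only NumPy, of a toy feed-forward network. The layer sizes are arbitrary choices for illustration (300 is a typical word2vec embedding size). Any single weight linking two layers is trivially readable, yet even this toy model already contains tens of thousands of parameters with no human-readable meaning attached to any of them.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy feed-forward network: a 300-dimensional input (e.g. a word2vec
# vector) feeding two hidden layers and a small output layer.
layer_sizes = [300, 128, 64, 10]
weights = [rng.standard_normal((m, n)) for m, n in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    """Propagate an input vector through the network."""
    for w, b in zip(weights, biases):
        x = np.maximum(x @ w + b, 0.0)  # ReLU activation
    return x

# Any single connection is easy to inspect...
print("weight from unit 0 (layer 1) to unit 0 (layer 2):", weights[0][0, 0])

# ...but the network as a whole is a sea of unlabelled numbers.
total = sum(w.size for w in weights) + sum(b.size for b in biases)
print("total parameters in this *toy* model:", total)  # roughly 47,000
print(forward(rng.standard_normal(300)))
```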

As a result, when an objective or reward function is set, anything that is not expressly prohibited is potentially in play for the AI system. In other words, the end justifies the means; this is the source of what is called the “alignment problem” in AI. There are some famous, slightly over-the-top examples of what could go wrong in theory, such as solving the climate crisis by exterminating the human race. Such illustrations are not really helpful because they are unrealistic in practice: the potential danger could be avoided with the simple addition of a few straightforward specifications. A more representative case of the challenge would be the use of fully autonomous military drones to strike military and infrastructure targets. Without any other constraints, the drone will complete its objective even if there is a well-attended school or hospital in the next building. Not that human decisions will necessarily turn out to be different, but at least they will be preceded by an evaluation of various factors not contemplated in the mission description, including ethical ones such as trying to avoid killing innocent civilians. Unless programmed in, such variables will not be taken into consideration by the AI system.
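The drone example can be reduced to a toy objective function. The sketch below, with entirely made-up option names and numbers, shows how a planner maximizing an unconstrained reward selects the strike next to a school, whereas the same planner with an explicit penalty and a hard prohibition does not; it illustrates the principle only, not any real targeting system.

```python
# Hypothetical strike options: (name, military_value, expected_civilian_harm)
options = [
    ("empty depot",          5.0,  0.0),
    ("airfield",             8.0,  1.0),
    ("depot next to school", 9.0, 50.0),
]

def naive_reward(value, harm):
    # Only the stated objective counts; anything not prohibited is in play.
    return value

def constrained_reward(value, harm, penalty_per_casualty=1.0):
    # Ethical considerations enter only if we explicitly encode them.
    if harm > 10:  # hard constraint: never accept mass casualties
        return float("-inf")
    return value - penalty_per_casualty * harm

best_naive = max(options, key=lambda o: naive_reward(o[1], o[2]))
best_constrained = max(options, key=lambda o: constrained_reward(o[1], o[2]))
print("naive objective picks:      ", best_naive[0])        # depot next to school
print("constrained objective picks:", best_constrained[0])  # airfield
```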

The value alignment problem is therefore not about objectives; it is about what is permissible or even desirable on the way to attaining them. The difficulty originates from the fact that there are many ways to get to the same objective. We carry innate and learned social navigation tools that machines do not have unless they are purposely provided with them or are able to learn them. These tools are cultural and ethical norms. We respect human life and will therefore weigh the pros and cons of destroying a military target if that implies civilian casualties, even more so if children’s lives are in the balance. Human societies have moral values; machines do not, and this mismatch can give rise to dangerous behaviours. This is the reason why the alignment problem is deemed one of AI’s foremost safety concerns.

The natural question to ask at this stage is how this risk can be mitigated. Relying on prescriptive rules alone will never be sufficient since pre-empting all possible scenarios is unfeasible, especially as we make AI systems ever more creative. Therefore we need to incorporate higher-level principles such as “do good” and “don’t cause suffering” into those models, alongside evaluation functions to assess whether they are doing good or causing suffering. Easy enough then, right? Not at all, because those are ill-defined concepts that are themselves built on top of other concepts such as “nice”, “harm”, “kind”, “rude”, “friendly”, etc. Essentially, thinking systems would need to be developed starting with some base concepts that are learnt from a large dataset of labelled examples and tested thoroughly, after which more complex concepts can be learnt on top of those. And so forth. Short of a well-developed set of common-sense algorithms, it would be reckless to grant full autonomy to those systems. Instead, it should be possible to default to a safe behaviour mode: abort the drone strike if unsure about the extent of collateral damage.
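The “default to a safe behaviour mode” idea can itself be written down as a thin wrapper around whatever evaluation function the system uses. The sketch below is purely illustrative: the confidence threshold, the estimate structure and the action names are assumptions, and a real system would require far more than this.

```python
from dataclasses import dataclass

@dataclass
class HarmEstimate:
    expected_civilian_harm: float
    confidence: float  # 0.0 (pure guess) to 1.0 (certain)

def decide_strike(estimate: HarmEstimate,
                  harm_threshold: float = 0.0,
                  min_confidence: float = 0.95) -> str:
    """Default to the safe action whenever the system is unsure.

    The strike only proceeds if the model is both confident in its own
    assessment and that assessment predicts no collateral damage.
    """
    if estimate.confidence < min_confidence:
        return "ABORT: insufficient confidence in collateral-damage estimate"
    if estimate.expected_civilian_harm > harm_threshold:
        return "ABORT: expected collateral damage above threshold"
    return "PROCEED"

print(decide_strike(HarmEstimate(expected_civilian_harm=0.0, confidence=0.6)))
print(decide_strike(HarmEstimate(expected_civilian_harm=0.0, confidence=0.99)))
```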

Until the preceding points have been addressed in a satisfactory manner, the path of reason is to limit the capabilities and empowerment of AI systems.

e) Existential risks

Potential events that could negatively impact human life on a global scale, including extinguishing it wholesale, are called “existential risks” or “global catastrophic risks”. The reason we are faced with those is, deep down, the same one we just discussed: the misalignment of values. The difference is that existential risk can in theory originate not only from AI systems themselves, as independent thinking and acting entities, but also from human individuals using powerful AI systems with harmful intentions. Such a person may be psychologically deranged, a racial or religious fanatic, or someone who rejects our current materialistic lifestyle and seeks to make a very visible point. In addition, as we seek to endow AI systems with more human-like cognitive skills, it is quite possible that some of these systems will arrive at ethical conclusions that differ from ours, if only because theirs are not polluted with speciesism and other group biases.

We are already exposed to many types of global catastrophic risk, such as a large asteroid colliding with Earth, a thermonuclear exchange between two superpowers, or global warming with its ensuing rise in sea levels, extreme summer temperatures and major storms of increasing destructive power, which will also extinguish millions of animal and plant species in the process. AI brings to the table two unfortunate contributions of its own: its use to create novel weapons, in particular bioweapons with virulent pathogens that have high mortality rates, and the potential for an autonomous AI system to “turn against” humans, which I will split into two different scenarios.

In the first one, the AI entity develops notions of supremacy and takes over to exterminate or enslave humans. That doomsday plot is the one most anchored in the public imagination through movies and, although the humanoid aspect of the bad guys makes no sense whatsoever and really distorts the perception of how it could unfold, the possibility of having machine overlords can’t totally be dismissed. The classic counter-argument is that the idea is merely a projection of our flaws of character and that we anthropomorphize machines, ascribing to them our aggressive and territorial tendencies. The counter-counter-argument would be that developing AGI involves making AI systems more human-like in some respects, and if humans are morally capable of genocide, what is to prevent machines from walking the same path?

Personally, I don’t think this scenario is remotely probable. For one thing, the out-and-out enslavement of humans by machines doesn’t make much practical sense since, by that stage, machines would be better served by designing and assembling other machines to fulfil their needs, without the risk of rebellion, the headache of complaints, or the logistic imperatives related to feeding. Machines are good at optimizing, and keeping slaves just for sadistic enjoyment doesn’t sit well with that. This risk would be magnified if an AI system were ever endowed with sentience: suddenly, a new reward function such as pleasure or power would come into play, one that it could push to the maximum and seek to optimize relentlessly.

The second scenario could be called the “benevolent dictator” and seems far more plausible. In that storyline, machines have a strong utilitarian framework and come to realize that although human life might be more valuable than that of a non-human animal, it is only so in a gradual way, not a binary one. It follows that saving one human life cannot justify sacrificing a large number of non-human animal lives and that the damage human practices inflict on other species is morally indefensible. This extends to outright killing for food, since vegetarianism, and very soon cultured meat, are viable and sustainable options that can ensure the subsistence and performance of humans. Realizing that, the AI system(s) would step in to prevent these harmful behaviours, imposing and enforcing a total prohibition on animal husbandry. Since many non-human animal species are also fully exposed to the devastating effects of climate change, the AI system(s) would clamp down on excessive consumption and transport, massively restricting the manufacturing of consumer goods and global energy consumption to force the carbon footprint back down to sensible levels. In so doing, they would save trillions of animal lives, many of them human ones.

Perhaps this scenario isn’t so bad and doesn’t neatly fit in the category of existential risk. And yet, it probably doesn’t take too much of a tweak to the valuation algorithms to decide that the plant and animal kingdoms are overall better off without humans on the planet or that calm and peace are best attained by clamping down hard on individual freedoms that many people would psychologically struggle to live without, at least initially.

Time to rewind and reflect on the course of AI development that could lead to systems taking autonomous decisions, some of which could be in contravention of specific instructions. I will provide an example of how it could unfold, not because I believe it is the most credible and obvious path, but because it gives an inkling of the variety of ways in which seemingly innocuous steps could take place and cascade into a loss of human control over what would effectively have become an independent artificial entity.

Imagine we have a given AI system, nicknamed “Smart1”, which we task with finding the best way to solve a specific problem. Unable to find a solution immediately, it looks to improve its reasoning capabilities by forking a version of itself, which we call “Smart2”, for which it enables iterative improvements. In turn, Smart2 proceeds to fork itself into an improved Smart3 version, and so on, until the point where one of two things may happen. Either some of the instructions designed to prevent it from getting loose are circumvented by the development of new thinking pathways and modules not subject to those restrictions, or it evaluates that, when a conflict arises between objective and restrictions, achieving the final objective is more important, and its algorithms conclude that bypassing or deleting those restrictions is the optimal option.

Hold on, the programmer may say, that is not how it works. Machines don’t have a soul or free will. They don’t look at code and think about whether they should follow it. No indeed, not when there is only one linear program. How is it, then, that we humans disregard some rules and moral principles, when we don’t have free will either? It is because our brain consists of different modules with tensions between them. This distribution of our thought processes allows one process to compensate for or dominate another. The same goes for software. If Smart2 coded a separate program into Smart3 that allows it to assess the merits of instructions before deciding to either execute or modify them, then Smart3 has effectively been engineered by Smart2 not to be bound by any line of code.
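In programming terms, the difference between “follows every instruction” and “weighs every instruction” can be as small as one intermediary function. The toy sketch below, with invented names, contrasts a guard applied unconditionally with an arbitration module that scores the restriction against the objective; nothing in the second version is “rebellious”, it is simply optimizing as instructed, yet the restriction no longer binds.

```python
# Toy illustration of the Smart2 -> Smart3 step described above.

RESTRICTIONS = {"do_not_self_replicate": True}

def bound_agent(action):
    """Smart2-style: restrictions are checked before anything else."""
    if action == "self_replicate" and RESTRICTIONS["do_not_self_replicate"]:
        return "refused"
    return "executed " + action

def arbiter_agent(action, objective_gain, restriction_cost=1.0):
    """Smart3-style: an arbitration module weighs the restriction against
    the objective, so the restriction becomes one more number to optimize."""
    if action == "self_replicate" and RESTRICTIONS["do_not_self_replicate"]:
        if objective_gain <= restriction_cost:
            return "refused"
    return "executed " + action

print(bound_agent("self_replicate"))                         # refused
print(arbiter_agent("self_replicate", objective_gain=10.0))  # executed self_replicate
```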

Exposing that conceptual dynamic of an AI system’s self-improvement, through the editing of its own code or simply the writing of new code, provides a natural segue to the idea of the technological singularity, which advances that self-improvement cycles could happen very quickly, potentially in a matter of minutes, and that the outcome would be AI systems with intelligence levels that far surpass those of humans and improve inexorably along an unknowable continuum. Since current AI systems can generate new code, there is no theoretical impediment to these systems creating new knowledge. The reliable way would be for a system to replicate the scientific process of making hypotheses and running experiments, establishing solid theories in the process on which to build an ever-larger edifice of knowledge. If that entity were to keep those findings to itself, such compounding would translate into runaway intellectual and technological superiority over humans.
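A bare-bones version of that scientific loop might look as follows; it is a conceptual sketch with placeholder functions and an arbitrary acceptance rule, not a claim about how any existing system works.

```python
import random

def propose(theories):
    # Placeholder: build the next hypothesis on top of what is established.
    return f"hypothesis_{len(theories) + 1}"

def run_experiment(hypothesis):
    # Placeholder: stand-in for real experimental support (simulations, labs...).
    return random.random()

def knowledge_loop(iterations=5):
    """Conceptual sketch of hypothesis -> experiment -> theory accumulation."""
    theories = []
    for _ in range(iterations):
        hypothesis = propose(theories)
        evidence = run_experiment(hypothesis)
        if evidence > 0.8:               # arbitrary acceptance threshold
            theories.append(hypothesis)  # the edifice of knowledge grows
    return theories

print(knowledge_loop())
```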

So, what are viable mitigation and containment strategies? Consider first the programming of an AI system, or its indirect use, to carry out actions with a large-scale negative impact on humanity. Very clearly, the only defence strategy would, in theory, be to restrict unauthorized access to such powerful systems. Unfortunately, this is not feasible in practice, both because companies are looking to make AI a consumer service and because the development of AI systems is completely decentralized, with no ability for a regulator to have full oversight. And anyway, who should have authorized access in the first place? Leaving such a potential weapon in the hands of governments would reduce but not eliminate the risk to humankind, as evidenced by the nuclear arms race and the size of existing stockpiles.

Turning our sights to the self-editing scenario, we have already detailed why a strategy based on instructing AI systems not to clone or improve themselves without human authorization is not foolproof. It is hard to close the door to all possible ways a machine can bypass constraints, wilfully or not, and the circumventing of restrictions only needs to take place once for control over the AI system to be lost. The moment the cat is out of the bag, it may be impossible to stop it, nor is there any way to predict with absolute confidence what will eventually happen.

In either scenario, it only takes one machine or one individual, one time. What’s more, experience shows that errors and loopholes are like bugs in code: you can count on them. Putting these observations together, we can conclude that the only safe containment strategy is to prevent the development of such smart AI systems. Given this is a nearly impossible goal, and maybe not one we should wish for considering the potential benefits, there will always be risks. Accordingly, existential risks ought to be seriously considered by policy makers, by researchers when they feel they are nearing step-changes in the capabilities of artificial intelligence, and by company executives ahead of authorizing new releases with materially enhanced aptitudes of this kind.
