I would not be surprised if — and yet would love to learn that — somebody else also noticed that (roughly) pragmatists and utilitarians got very obsessed with the same sort of problem in different domains. The idea occurred to me during dinner with Alyssa Ney (LMU), so I am not claiming originality. This could be all her. Here’s what I have in mind.
The Utilitarians are focused on existential risk. (Of course, not all utilitarians…) This is (recall) the worry (to simplify) that an AI will wipe out humanity or destroy us in some other fashion. One way existential risk is operationalized is in terms of ‘value alignment.’ Here’s a nice summary of the problem:
The goal of AI value alignment is to ensure that powerful AI is properly aligned with human values (Russell 2019, 137). Indeed, this task, of imbuing artificial agents with moral values, becomes increasingly important as computer systems operate with greater autonomy and at a speed that ‘increasingly prohibits humans from evaluating whether each action is performed in a responsible or ethical manner’ (Allen et al. 2005, 149). The challenge of alignment has two parts. The first part is technical and focuses on how to formally encode values or principles in artificial agents so that they reliably do what they ought to do….
The second part of the value alignment question is normative. It asks what values or principles, if any, we ought to encode in artificial agents. —Iason Gabriel (2020) "Artificial intelligence, values, and alignment." Minds and Machines 30(3): 411-437.
I have quoted this articulation of the problem because Gabriel’s treatment is fairly sober, and because he recognizes that the two parts of the challenge need not be framed or answered in a utilitarian fashion. (It’s also been cited a lot.) And because he has one really important insight (that I will quote below).
What makes Gabriel’s paper itself really interesting is that in order to illustrate some of the issues surrounding the alignment problem, he draws on models from reinforcement learning (RL). And that’s because there is a non-trivial sense in which a version of the alignment problem has been explored in depth in the context of RL: “With RL, an agent learns what to do by trying to maximise a numerical reward signal that it receives from the environment.”
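To make that RL picture vivid, here is a minimal toy sketch (my own, not Gabriel’s): an agent facing a two-armed bandit that adjusts its behaviour solely to maximize the numerical reward signal it receives from the environment. The actions, payoffs, and epsilon-greedy rule are all invented for illustration; the point is just that whatever the reward function happens to encode is, in effect, what the agent ends up ‘valuing’, which is where the alignment worry gets its grip.

```python
import random

# Toy two-armed bandit: the 'environment' hands back a numerical reward,
# and the agent's only job is to learn which action maximizes it.
# Payoff distributions are invented for illustration.
payoffs = {"a": lambda: random.gauss(1.0, 0.5),
           "b": lambda: random.gauss(0.2, 0.5)}

estimates = {"a": 0.0, "b": 0.0}   # running estimates of each action's value
counts = {"a": 0, "b": 0}

for step in range(1000):
    # epsilon-greedy: mostly exploit the current best estimate, occasionally explore
    if random.random() < 0.1:
        action = random.choice(list(payoffs))
    else:
        action = max(estimates, key=estimates.get)
    reward = payoffs[action]()                     # the numerical reward signal
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)  # the agent 'values' whatever the reward signal rewards
```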
As Gabriel notes, RL clearly has a consequentialist/utilitarian bias built into it. I don’t write that as a ‘gotcha.’ In this digression, what I take to be interesting is not that the alignment problem in AI is usually conceived in utilitarian terms (nor the sociological fact that the communities working on these issues have attracted utilitarians). If you spend some time reading around in how algorithm design is conceptualized and taught, you notice that a lot of the conceptual structure is derived from the economics of modeling trade-offs, doing optimization, minimizing risk, etc. (See M. Kearns & A. Roth (2020) The Ethical Algorithm: The Science of Socially Aware Algorithm Design. Oxford University Press.) This is a point that helped shape my paper (here) with Federica Russo and Jean Wagemans.
Rather, what Gabriel discerns is that the alignment problem occurs at different levels (from individual agents to whole societies) and over different timespans, and that the two parts (technical and normative) of the alignment problem recur constantly in somewhat different ways. In fact, Gabriel’s really important insight is that “In the context of value alignment, the notion of ‘value’ can serve as a placeholder for many things.” I love this sentence so much, I wish I had written it.
Okay, I hope you have a sense of the alignment problem(s) now.
Now, let’s shift gears. Dewey worried about the rise of technocracy. He discerned that democracy might invest resources to promote the growth of the sciences (and technology) and that this could produce social practices that would be fundamentally at odds with democratic self-government.
Within pragmatism, in his influential (2001) book Science, Truth, and Democracy (Oxford University Press), Philip Kitcher reformulated and extended Dewey’s concerns in terms of the demand for ‘well-ordered science.’ He states the desideratum as follows: “that properly functioning inquiry—well-ordered science—should satisfy the preferences of the citizens in the society in which it is practiced.” (p. 117)
Here the alignment between the values of citizens in society and science is modeled in terms of preference satisfaction. This is no accident: one of Kitcher’s great contributions to philosophy is to introduce and use economic modeling and ways of thinking to make philosophical problems more tractable. Science is the output or instrument of citizen demand (i.e., taxpayer money).
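Just to fix ideas, here is a deliberately crude sketch of what ‘science as the output of citizen demand’ might look like in its simplest form: citizens with invented preference weights over research projects, and a budget split in proportion to aggregate demand. Nothing in Kitcher corresponds to this code; the projects, the numbers, and the naive proportional rule are all my own assumptions, meant only to make the preference-satisfaction idiom concrete.

```python
# Crude 'science as citizen demand' toy: split a fixed research budget
# in proportion to citizens' aggregate preference weights.
# All project names and numbers below are invented for illustration.
projects = ["oncology", "particle physics", "crop science"]

citizens = [  # each citizen's preference weights over projects (sum to 1)
    {"oncology": 0.6, "particle physics": 0.1, "crop science": 0.3},
    {"oncology": 0.2, "particle physics": 0.5, "crop science": 0.3},
    {"oncology": 0.5, "particle physics": 0.2, "crop science": 0.3},
]

budget = 100.0
demand = {p: sum(c[p] for c in citizens) for p in projects}
total_demand = sum(demand.values())

# One naive rule: allocate in proportion to aggregate preference weight.
allocation = {p: round(budget * demand[p] / total_demand, 1) for p in projects}
print(allocation)  # {'oncology': 43.3, 'particle physics': 26.7, 'crop science': 30.0}
```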
Kitcher himself realizes, of course, that this (the preference satisfaction approach) need not be the only way to articulate the well-orderedness of science relative to the values of the citizens. And, in fact, the actual preferences of citizens may lack some of the properties needed for rational public policy. So, he starts tinkering with his own characterization.
A few pages later he characterizes perfectly well-ordered science; the underlying issue stays the same, but the implied models being used to describe it have shifted:
For perfectly well-ordered science we require that there be institutions governing the practice of inquiry within the society that invariably lead to investigations that coincide in three respects with the judgments of ideal deliberators, representative of the distribution of viewpoints in the society. First, at the stage of agenda-setting, the assignment of resources to projects is exactly the one that would be chosen through the process of ideal deliberation I have described. Second, in the pursuit of the investigations, the strategies adopted are those which are maximally efficient among the set that accords with the moral constraints the ideal deliberators would collectively choose. Third, in the translation of results of inquiry into applications, the policy followed is just the one that would be recommended by ideal deliberators who underwent the process described. Kitcher (2001), pp. 122-123 (emphasis in original).
Here ideal deliberators are a certain kind of (counterfactual) representative agent. (Rawls uses a similar trick in the original position, although he manages to reduce the number to one such ideal deliberator.) Before you dream up objections to Kitcher, he is very explicit that in practice we can only hope to approximate well-ordered science, and that the procedure he proposes is itself a kind of counterfactual or regulative ideal. (In real life, “we want it to match the outcomes those complex procedures would achieve” (p. 123).)
Kitcher’s way of conceptualizing well-ordered science has been (recall) extraordinarily influential within philosophy of science and the science & values literature. (See, for example, Eric Winsberg & Stephanie Harvard (2024) Scientific Models and Decision-Making (Cambridge).) I have a lot of standing objections to this literature but won’t repeat any of them here. (My objection is not to the pragmatist sensibility in the literature, although I don’t share it.) For the reason I bring it up is that it has actually inspired a lot of splendid work that I really admire. Take for example (recall) the (2021) approach of Anna Alexandrova and Mark Fabian to ensure that all the stakeholders of research are incorporated into the design of the actual scientific measures salient in public policy. (Anna Alexandrova & Mark Fabian “Democratising Measurement: or Why Thick Concepts Call for Coproduction” European Journal for Philosophy of Science 12(1): 1-23.)
That’s almost all I wanted to digress on today. But I want to point to one other literature in philosophy where this issue of alignment arises. I am not as familiar with the underlying issues, so will keep it brief. But in his (2022) book Epistemic Risk and the Demands of Rationality (Oxford), Richard Pettigrew calls attention to a class of decision problems that arise when we try to choose for others and take epistemic or other risks on their behalf. (I am pretty sure I got this from one of his blog posts, not the book.) I am tempted to say there are arbitrage opportunities lurking among these literatures; for all I know Pettigrew has engaged in it already. But I need to catch a train, so some other time I will digress on the significance of the existence of these literatures.