Scaling Up the Universe



How Astronomers, AI and Citizen Scientists “See” with Un/certainty

The past decade has been hailed as “the golden age of astrophysics” by Oxford astronomer Chris Lintott in a Google talk in 2020. Due to recent advancements in data processing and storage capacities, astronomers have begun “seeing” the universe at scales never reached before, focusing on collections of objects such as galaxy clusters, and on the relations between collections, such that the very structure of the universe is coming into view.

In this context, I have been following the emergence of the much-hyped “world’s largest streaming data processor” in the Karoo, South Africa. The SKA (Square Kilometre Array) is a radio astronomy observatory with an infrastructure spanning Africa and Australia. Connecting these observatories together widens the baseline and increases the resolution of sources in the southern sky. This telescope aims to break all records on Earth in terms of sensitivity, processing speed, and data collection, with estimates of ~1 petabyte of image data per day. As I tracked these developments, it became clear that not only was astronomy at the forefront of the big data moment, but that discovery practices were being reshaped to accommodate the data deluge. In order to manage the vast quantities of data, scientists have turned to citizen science and, more recently, to neural-network-based artificial intelligence, with astronomers drawing on their own professional vision and skills to coordinate these two outsourced strategies. AI is hailed as a game-changer in the quantified era. In contrast, citizen science—or amateur astronomy—existed for millennia before the advent of professionalized astronomy.

Science claims to produce knowledge that is rigorous, impartial, objective, and universal. Indeed, astrophysicists insist that astronomy is apolitical because it is grounded in allegedly value-free data from signals “received” from an active universe. However, as social studies of science have taught us, scientific discovery is far from a neutral process. Astronomers devote considerable technical labor towards diminishing the noise of earthly circumstances so as to receive those signals. Classifying sources is one example of that labor. What astrophysicist Anna Scaife calls “quantifying uncertainty” amounts to statistical comparisons of the classifying accuracy of “experts”, “citizen scientists” and “machine learning” in terms of time investment:

Credit:
Siri Lamoureaux

A screenshot taken from a talk entitled “AI & Bias in Radio Astronomy” on YouTube by Anna Scaife (22:30)


The decontextualized, reductive nature of these statistics can make them seem to be the expression of objective fact because they obscure the interpretive steps that produced them. By contrast, attending to the sociocultural process behind the production of probabilities—the work of achieving certainty—reveals a different picture, and an ongoing debate over what counts as signal and noise in identifying sources.

Linguistic anthropology’s theorizing of how evidence and responsibility are organized in discourse (“epistemic modality”) is helpful in sifting through the stakes of these debates. Epistemic modality is a broad field of study, spanning from particles in linguistic descriptions of grammars to broader discourse stances or psychological states. Jane Hill and Judith Irvine developed this in terms familiar to linguistic anthropologists, where “responsibility” over stretches of discourse could be managed and negotiated by speakers through a variety of strategies such as entextualization or reported speech. Recent approaches move away from individual mental states towards more intersubjective and relational understandings, such as communication tactics for asserting certainty over the source of knowledge. In my research, I cast a wide net, drawing out the discourse of epistemic modality from social media channels, scientific publications, and institutional norms, to technical inscriptions in the AI process itself—numerical outputs and percentages of un/certainty.

Credit:
Siri Lamoureaux

A screenshot of the Galaxy Zoo interface


I’ve been participating in and following the activities of the popular citizen science website Galaxy Zoo. After completing a tutorial, citizen scientists (self-labelled Zooites) are shown photometric images and, depending on the research goals of the project, click buttons on the interface. Following a decision tree, Zooites classify galaxies into categories based on basic shapes (morphologies) such as spiral or elliptical, on features such as a bulge or a bar, and on textures (smooth, rough), colors, luminosity, and direction of rotation (counterclockwise or clockwise).
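As a rough illustration of how such a click-through decision tree might be encoded, here is a minimal sketch in Python. The questions, answer options, and branching below are simplified assumptions for the sake of illustration, not the actual Galaxy Zoo workflow definition.

# A minimal, simplified sketch of a Galaxy Zoo-style classification
# decision tree. The questions and branches are illustrative
# assumptions, not the project's real workflow specification.

DECISION_TREE = {
    "shape": {
        "question": "Is the galaxy smooth and rounded, or does it have features?",
        "answers": {
            "smooth": "roundness",   # follow-up: how rounded is the blob?
            "featured": "bar",       # follow-up: is there a bar?
            "artifact": None,        # star or artifact: stop here
        },
    },
    "roundness": {
        "question": "How rounded is it?",
        "answers": {"completely": None, "in_between": None, "cigar_shaped": None},
    },
    "bar": {
        "question": "Is there a bar feature through the centre of the galaxy?",
        "answers": {"bar": "spiral", "no_bar": "spiral"},
    },
    "spiral": {
        "question": "Is there any sign of a spiral arm pattern?",
        "answers": {"spiral": None, "no_spiral": None},
    },
}

def classify(answers_by_step):
    """Walk the tree, recording the volunteer's answer (button click) at each step."""
    step, path = "shape", []
    while step is not None:
        answer = answers_by_step[step]
        path.append((step, answer))
        step = DECISION_TREE[step]["answers"][answer]
    return path

# Example: a volunteer sees a featured, barred spiral galaxy.
print(classify({"shape": "featured", "bar": "bar", "spiral": "spiral"}))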

This interpretive work is meant to be a straightforward eyeballing exercise available to anyone, independent of any deep reflection or interaction. But in reading the early blog posts linked to the science projects, I learned that astronomers were faced with some surprises. Statistically, better results were linked with featured galaxies—such as those with visible spiral arms—rather than boring blobs. Humans produced less accurate results when they were less confident and when the words “doing real science” were part of the setup, and more accurate results when told to do it for fun. While Zooites were meant to do their click-work alone, Galaxy Zoo was flooded with so many questions that astronomers opened a blog and interactive tools like “hangouts” to better explain and to reassure Zooites worried about “correct” classifications. In one such “hangout”, astronomers reassure the Zooites: “If you see an elliptical and others see a spiral, that’s still valuable information. There is no wrong answer. We don’t want you to be persuaded by what others are selecting”.

Citizen scientists are meant to find consensus through numbers, not through talk. This discourse reveals a very cognitive, individualist approach to interpretive work. Any uncertainty would be managed statistically. But for the Zooites, the hangouts and blogs were co-opted for group interaction, to discuss, debate, and find consensus and solidarity. They even served as sites of identity formation. Several groups of self-named “Pea-hunters”, “Pea-pickers” or “Peas-Corps” emerged around the hunt for “Peas”—round, gaseous, green formations that astronomers had hitherto considered “noise”. This points to the importance of the epistemic process being socially mediated through talk and through category-making, instead of a purely visual and mental approach to pattern recognition. Citizen scientists can launch discussions of images they find “gorgeous”, “awesome” or just curious. The social interactive process of discovery, witnessing the “vastness” as well as a “weirdness” of the universe with others—producing noise—is highly motivating.

In contrast, rather than discussing uncertainty, AI quantifies it. Convolutional Neural Networks (CNNs) in particular, a multi-layer type of deep learning used for images, extract and measure features in order to predictively classify objects. Astronomers and data scientists have been tinkering with CNNs using a vast range of techniques (clustering, translation, etc.) and measurable properties, analogous to those of humans but isolated from human reasoning and emotion. These include pixel brightness, the bars of a spiral, color, size, or rotation, all geared towards accuracy, generalizability, and internal coherence—limiting the noise. This work requires dis-embodying the data from human perspective, from bias. In a pre-neural-network era example, astronomers found that Galaxy Zoo citizen scientists identified more counter-clockwise galaxies than clockwise ones, violating the cosmological principle—that any wide-enough view of the universe should not depend on your position in it. But when astronomers flipped some of the counter-clockwise images to clockwise, the volunteers still identified an excess of counter-clockwise galaxies, pointing the bias in the other direction—to the humans. Perspective is noise, and CNNs today are very good at filtering out the factors that interfere with the signal.
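To give a concrete sense of what such a classifier looks like in practice, here is a minimal sketch in Python using PyTorch. The layer sizes, the 64x64 input images, and the two output classes (“smooth” vs. “featured”) are illustrative assumptions, not the architecture actually used by Galaxy Zoo or Zoobot.

# A minimal sketch of a convolutional neural network for galaxy
# morphology classification, written with PyTorch. Layer sizes, input
# size, and the two output classes ("smooth" vs. "featured") are
# illustrative assumptions, not the actual Zoobot architecture.
import torch
import torch.nn as nn

class GalaxyCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            # First layer: small filters respond to local patterns of
            # pixel brightness (edges, bright cores), the kind of
            # low-level structure a first-layer visualization shows.
            nn.Conv2d(1, 16, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(2),
            # Deeper layers combine those patterns into larger features
            # (bulges, bars, spiral arms).
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, n_classes),  # assumes 64x64 grayscale input
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = GalaxyCNN()
fake_batch = torch.randn(4, 1, 64, 64)           # four 64x64 image cutouts
probs = torch.softmax(model(fake_batch), dim=1)  # per-class "confidence"
print(probs)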

Credit:
Siri Lamoureaux

An image of how Convolutional Neural Networks “see”. This is a CNN first layer based on pixel brightness for classifying galaxy morphologies.


At the same time, scientists worry about the lack of good training data and that CNNs are not skilled at actually discovering anything new. In the epistemic discourse of statistics, “certainty”, “probability” and “confidence” of the model’s predictions decline with unusually shaped galaxies. Thus citizen science remains extremely valuable to astronomers as human-labelled datasets are fed into the training process. Galaxy Zoo today capitalizes on the different strengths of the citizen scientists and AI. A strong distinction is made between machine learning (here an “AI assistant” named Zoobot) and the exceptional human eye: 

“Your time is precious. Galaxy Zoo volunteers can recognise and classify the detailed features of galaxies in ways that Zoobot can’t—and nor can any other algorithm. More than that, humans have a unique ability to spot things that look just a little bit weird. Volunteers talking about strange objects [emphasis mine] has led to some of our favourite discoveries, including the Voorwerpen. Using Zoobot means you will be much more likely to see more diverse galaxies and come across more weird and wonderful objects.”

There is a division of classifying labor between “more featured, textured, unusual” galaxies (on the left) and “boring”, “smooth” blobs (on the right). Galaxy Zoo will avoid “showing galaxies where Zoobot is 90% confident that fewer than 2 out of 10 volunteers would click ‘featured’”.
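Read as a filtering rule, that policy could be sketched roughly as follows. The function name and the idea of sampling the model’s predicted volunteer vote fractions are hypothetical illustrations of how such a threshold might work, not Zoobot’s actual interface.

# A hypothetical sketch of the filtering rule described above: skip
# galaxies where the model is at least 90% confident that fewer than
# 2 out of 10 volunteers would click "featured". Names and the
# prediction format are assumptions, not Zoobot's actual API.

def should_show_to_volunteers(predicted_featured_fractions):
    """predicted_featured_fractions: samples from the model's predicted
    distribution over the fraction of volunteers answering "featured"."""
    prob_boring = sum(f < 0.2 for f in predicted_featured_fractions) / len(
        predicted_featured_fractions
    )
    # Route the galaxy to volunteers only if the model is NOT 90%
    # confident that it is a "boring" smooth blob.
    return prob_boring < 0.9

# Example: predicted vote-fraction samples for two galaxies.
print(should_show_to_volunteers([0.05, 0.1, 0.12, 0.08, 0.15]))  # False: skip
print(should_show_to_volunteers([0.1, 0.3, 0.45, 0.6, 0.25]))    # True: show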

Credit:
Siri Lamoureaux

A screenshot of Galaxy Zoo Site News “Better Galaxies – Better Science”, by Mike Walmsley


Over time, astronomers running Galaxy Zoo took note of the value of “talking about strange objects.” In a current experiment, the interface allows citizen scientists to communicate their uncertainty on a sliding scale. If I am uncertain about seeing a strong bar or weak bar, I can use the slider tool or write a message in a “talks” function. This relational interpretive work, this epistemic modality, is then buried behind percentages of certainty, extracted from the contexts of their production. What was noise became signal. The automated AI classifier is depicted as an autonomous actor, without a trace of human perspective. 

Credit:
Siri Lamoureaux

A screenshot of Galaxy Zoo Project “Letting Things Slide”, by Mike Walmsley


This experiment aside, the general thrust of the field is aimed at bigger datasets and unsupervised, automated classifying algorithms, leaving citizen scientists and talking behind: “humans don’t scale up”. But this future is arriving alongside growing questions in astronomy. Is bigger data better science? What is being classified and why? What is the role of the expert astronomer? These questions point to a field in flux, with epistemic uncertainty about signal and noise and even about what counts as “discovery”.

Further readings from the Journal of Linguistic Anthropology:

Robyn Holly Taylor-Neu, Parasites and Post-Truth Climate

Colin Michael Egenberger Halverson, Evidence and Expertise in Genetic Nomenclatures

Dongchen Hou, Writing Sound: Stenography, Writing Technology and National Modernity in China, 1890s
