Trudy Huskamp Peterson

Certified Archivist

Commentary, Artificial intelligence and the data that trains it.

In a Russian history class I once took, the émigré professor insisted on a distinction between intellectuals and intelligentsia. Intelligentsia, he said, were people with education, but what distinguished them was their status as a group possessing influence in society. Intellectuals, on the other hand, were, well, just smart.

Artificial intelligence is all over today’s news. It combines the professor’s two definitions: it is smart (it has a huge memory, and it makes decisions based on that memory), but the way it is smart reflects the social class of the people who had the power to build it. Just look at two examples:

*Researchers in the U.S. found that “three commercially released facial-analysis programs from major technology companies demonstrate both skin-type and gender biases.” In a set of photos, the artificial intelligence programs correctly identified white males as white males more than 99% of the time, but correctly identified darker-skinned females only 65% of the time. The probable reason: the data set used to “teach” the artificial intelligence was heavily male and white.

*Human Rights Watch reported that authorities in China’s Xinjiang region are using big data analysis for a “predictive policing” program which “aggregates data about people—often without their knowledge.” The data is gathered from an enormous variety of sources, ranging from surveillance cameras to “wifi sniffers” to information obtained during home visits. People have been detained because the software identified them as potential threats.

Advocates argue that artificial intelligence algorithms can successfully take on tasks as varied as identifying depression by analyzing facial expressions, reducing snarls in urban transport, pinpointing crime hotspots, and upgrading slums. Medical researchers are rapidly adopting artificial intelligence tools, as a look at any recent issue of HRWG News will show.

Artificial intelligence relies on information: both the data selected to “teach” the programs and the data against which the programs run. And artificial intelligence produces information, such as when to arrest people in Xinjiang or how to treat an illness. Archivists must be involved both in ensuring that the inputs are reliable data and in preserving the results. We have to get this right: people’s lives literally depend on it.