During my internship at Bits of Freedom, I researched the use of algorithms for detecting fake news. These are my most important findings.
Fact-checking fake news: a case study

Machine-learning algorithms are everywhere. Artificial intelligence’s (AI) reputation for outperforming human decision-making has led to AI being deployed to address all sorts of problems. From recruitment to national security, algorithms are involved in more and more of our decision-making processes. One would almost be led to believe that algorithms can do everything better than humans, and that human judgement no longer holds crucial value.
One field that makes use of algorithmic decision-making is the detection of fake news. Traditionally, fake news would be addressed by having experts, such as journalists, fact-check articles. However, considering the rapid spread of fake news and the overwhelming amount of information generated in today’s society, there seems to be no other option than to automate (part of) this work. My research looks at how well algorithms perform at this task compared to human experts. Put bluntly: which is the better fake news detector? My thesis then explores whether, and to what extent, this dehumanization of decision-making should be advanced.
Supervised algorithm

Algorithms come in different sizes, varieties and methods. In general, they are divided into two categories: supervised and unsupervised. Supervised means that we tell the algorithm what we want it to predict and train it accordingly. Unsupervised means that we do not tell the algorithm what to predict, and instead set it up to find (hidden) patterns in the data itself. Our research used a supervised algorithm: trained on an already labelled data set of fake and true news articles, it was asked to predict whether an article is fake or true.
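The supervised setup described above can be sketched in a few lines. This is a toy illustration, not the thesis code: the pipeline, the miniature article set, and all parameters except the Random Forest classifier family named in the research are my assumptions.

```python
# Illustrative sketch of supervised fake-news classification:
# TF-IDF text features fed to a Random Forest. Toy data only.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Tiny stand-in training set: article texts with fake/true labels.
articles = [
    "Scientists confirm moon is made of cheese, officials say",
    "Parliament passes new budget after lengthy debate",
    "Miracle pill cures all diseases overnight, doctors stunned",
    "Central bank raises interest rates by a quarter point",
]
labels = ["fake", "true", "fake", "true"]

model = make_pipeline(
    TfidfVectorizer(),               # turn raw text into word-weight vectors
    RandomForestClassifier(
        n_estimators=400,            # 400 trees, as in the study's forest
        random_state=0,
    ),
)
model.fit(articles, labels)          # supervised: the labels guide training

# Predict a label for an unseen headline.
print(model.predict(["New pill cures all diseases, doctors stunned"]))
```

The key property of the supervised setting is visible in the `fit` call: the labels are handed to the algorithm, so it learns to reproduce the human labelling rather than to discover structure on its own.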
Are uninterpretable results worth it?

The algorithm outperformed the human experts by 34%. (It is important to note that our algorithm reviewed over 38,647 articles, while our human experts reviewed only 18; a single misclassification by the human experts therefore weighed heavily on their performance.) However, our human experts’ results were far more interpretable. Because our algorithm needed 400 trees, each of which had 150 options to consider, it was extremely difficult to understand how it had arrived at a particular conclusion. Furthermore, we noticed that the algorithm occasionally offered a classification despite having low confidence in that choice. In other words: our algorithm, which should to a degree be explainable and interpretable, ended up being neither. This is not entirely surprising: the more data you work with, the more complex your algorithm needs to be in order to produce robust results.
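One way to surface low-confidence outputs, instead of silently classifying them, is to let the forest abstain when its vote share falls below a threshold. A minimal sketch, assuming synthetic stand-in features and an illustrative 0.7 threshold (neither taken from the study):

```python
# Sketch: make a Random Forest abstain on low-confidence inputs
# instead of always emitting a label. Toy data, hypothetical threshold.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # toy features standing in for articles
y = (X[:, 0] > 0).astype(int)            # toy labels: 0 = true, 1 = fake

clf = RandomForestClassifier(n_estimators=400, random_state=0).fit(X, y)

def classify_or_abstain(sample, threshold=0.7):
    """Return a label only when the forest's vote share clears the threshold."""
    proba = clf.predict_proba(sample.reshape(1, -1))[0]
    if proba.max() < threshold:
        return "uncertain"               # flag for human review instead
    return ["true", "fake"][int(proba.argmax())]

print(classify_or_abstain(X[0]))
```

The `predict_proba` call exposes the fraction of trees voting for each class; thresholding it turns "always answer" into "answer or escalate", which is one pragmatic response to the low-confidence classifications we observed.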
Thoughtful decisions

The experience with our human experts, who worked in pairs, was very different. Although they did not perform as well as the algorithm, far more information could be retrieved from them about the decision-making process. Some experts, without being asked to do so, offered additional interpretation and explained why, in their view, articles were fake or true, and even argued why certain articles did not fit either class. Most fascinating of all, some of our human experts classified certain articles as both fake and true (again: not the original task), as they did not believe those articles were a perfect fit for either category.
Furthermore, within the pairs not all articles were classified the same way. This emphasises the diversity of knowledge humans bring and how it can affect decision-making. One could argue that this diversity leads to indecisiveness, a problem that did not seem to arise for our algorithm. However, one could ask which is more important: being indecisive but eventually offering a nuanced decision, or being fast but unable to offer an explainable and/or interpretable one.
Stop using algorithms to take decisions we cannot interrogate

Compared to algorithms, humans are far more divergent in their thinking. This is because humans possess tacit and implicit knowledge of the world, which is difficult to express, extract, or codify in algorithms. Furthermore, one could argue that even if algorithms were explainable and interpretable, they still might not be suitable tools for governance. In our legal system, rules and regulations come to life when applied to real-life problems. In light of specific facts and circumstances, and guided by a normative framework informed by public values and interests, what on paper seems a rather fixed set of rules suddenly becomes pliable and adaptable. This interaction between a ‘static’ set of rules and an evolving normative framework is partly what makes our legal system so robust. The set of rules that makes up an algorithm, on the other hand, is far more immovable, threatening to result in a tool that does not allow itself or its (future) applicability to be questioned. That attitude might extend to the humans who, fuelled by their overestimation of algorithms, cease to assess the tool and unquestioningly implement the decisions it prescribes. It is therefore of the utmost importance that we stop mapping algorithms onto the scale of human intelligence and stop using them to take decisions we cannot interrogate.
Managing algorithms

Considering that, for now, we have already turned our world over to algorithms, we should think of ways to understand them better and, most importantly, to manage what we have built. The quest for explainability and interpretability is an essential field of study, as witnessed by the European Commission’s proposed AI Act. Although the Commission claims to set up a human-centric framework that puts people first, the proposal features fundamental gaps with regard to accountability. Our research leads to the following recommendations:
Explainable and interpretable AI should be the norm, and research in this field should be prioritized so that we can move from a ‘black box’ to a ‘white box’ approach.
A human-in-the-loop approach would provide for human interaction in every step of the decision-making process. The humans in the loop should be diverse with regard to their knowledge, views and experiences. Furthermore, in light of epistemic and normative concerns, algorithms should never be the final arbiters in decision-making processes.
Accountability, through the ability to interrogate decisions, should be critical to algorithmic decision-making and facilitated by a national and/or European legal framework.
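As a concrete, if modest, step from ‘black box’ toward ‘white box’, tree ensembles do at least expose which input features drive their decisions. A minimal sketch on synthetic data; the feature names are hypothetical illustrations, not features from the study:

```python
# Sketch: inspecting which features a Random Forest actually relies on,
# via its impurity-based feature importances. Synthetic data only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = (X[:, 2] > 0).astype(int)        # by construction, only feature 2 matters

clf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X, y)

# Hypothetical article-level feature names, for readability only.
names = ["length", "capitalisation", "source_score", "noise"]
for name, score in zip(names, clf.feature_importances_):
    print(f"{name:>14}: {score:.2f}")  # "source_score" (feature 2) should dominate
```

Importance scores are a coarse explanation at best, which is why the recommendation above calls for dedicated explainability research rather than relying on such built-in diagnostics alone.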
‘Let the algorithm decide’ is the wrong paradigm, as it puts human dignity at stake. Before we commit our societies entirely to algorithms, we need to think more carefully about ethical frameworks, explainability and transparency. Because algorithms need managers, too.
Decision-making in the age of algorithm. Comparing Random Forest Classifier with human evaluation on fake-news detection