Despite its gravity and scope, data bias remains largely unknown to the general public.
It is mainly a problem for experts to think about. Even data experts struggle to make sense of the issue, though most agree it is an important challenge to tackle on multiple fronts.
The Medical Field
One area where AI and data promise great improvements but where data bias can also have a disastrous impact is the medical sector. It is a concrete example for understanding data bias a bit better. We talked to an expert with years of experience as a data scientist: Dr. Prabhakar Krishnamurthy (PK). Based in the USA, PK currently works on healthcare AI. He has a PhD in engineering.
We discussed with him why data bias is such a difficult yet important challenge to talk about. He explained that many of the risks of data bias have deep historical roots and can have drastic effects on the people affected, but that there are ways to address the issue through more openness. Ultimately, PK says, data professionals need to accept their responsibility for data bias.
Interview with Dr. Prabhakar Krishnamurthy
Why are you focusing on the medical field?
Sometimes we don’t know we are being personally affected by data bias. We don’t know what methods and algorithms employers are using when we apply for a job. We don’t have the details of those models. I came to the realization that most of us are affected by data bias in the health field. We all go to see a doctor, we get prescribed medicines, we get diagnosed and so forth. The medical field is filled with data bias issues.
´Most of us are affected by data bias in the health field´
Have you ever encountered a case where you were affected by data bias in the medical field?
I’m from India, so I’m South Asian. It turns out that South Asians have a much higher incidence of heart disease than others. It’s almost four times as high as for Caucasian males, for example. Most doctors may not be aware of this, because they may not be seeing a lot of South Asians. There’s a standard range of cholesterol levels that is considered normal. But for South Asians it’s a little bit different, because of the higher incidence of heart disease. Doctors don’t tell you that, because they’re not aware of it.
Do you have an example of a specific product or service that turned out to be built with biased data?
This is not my personal discovery, but I have been reading about this. Most skin cancer detection algorithms are trained on data from light-skinned individuals and do worse at detecting skin cancer in patients with darker skin. Similarly, pulse oximeters, which measure the oxygen level in your blood, produce less accurate measurements for Black Americans. Hospitals are starting to use AI programs to decide who gets access to high-risk health care management programs. Researchers discovered that the software routinely let healthier white patients into the programs ahead of Black patients who were less healthy. The problem was that there was no target variable to train on, so the developers used a proxy variable. They asked questions like: what are the costs of treating somebody? They assumed that the costs would reflect the amount of intervention needed for a positive outcome.
The issue is that African Americans usually have lower access to health care and insurance, and there might be a higher level of poverty. They don’t necessarily get the best quality treatments, or sometimes they don’t have access to the necessary care, which means that the costs are usually lower for them. Measured by cost, it might therefore seem that they need less intervention.
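The proxy problem described in this answer can be illustrated with a toy calculation. The sketch below is not the hospital software from the study; every name and number in it (the two groups, the `access_b` factor, the program size `k`) is a made-up assumption chosen only to show the mechanism. Both groups have exactly the same distribution of health need, but group B's past spending is lower because of reduced access to care.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    group: str    # hypothetical group label, "A" or "B"
    need: float   # true health need (not directly observed in practice)
    cost: float   # past spending, the proxy the model is trained on


def make_patients(access_b=0.6):
    """Build two groups with identical need; group B spends less per unit of need."""
    patients = []
    for need in range(1, 11):
        patients.append(Patient("A", need, cost=need * 1.0))
        patients.append(Patient("B", need, cost=need * access_b))
    return patients


def select_top(patients, key, k):
    """Admit the k patients who rank highest on the chosen score."""
    return sorted(patients, key=key, reverse=True)[:k]


def share_of_group(selected, group):
    """Fraction of the admitted patients who belong to the given group."""
    return sum(p.group == group for p in selected) / len(selected)


patients = make_patients()
by_cost = select_top(patients, key=lambda p: p.cost, k=4)  # ranking on the proxy
by_need = select_top(patients, key=lambda p: p.need, k=4)  # ranking on true need

print("Share of group B when ranking by cost:", share_of_group(by_cost, "B"))
print("Share of group B when ranking by need:", share_of_group(by_need, "B"))
```

In this toy setup, ranking by cost admits no one from group B, even though its sickest members have a higher need than some of the admitted group A patients. Ranking by true need would split the places evenly between the groups.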
Do you think that could be the case because the designers are mainly white?
I wouldn’t say it has so much to do with who designs it; there is just a general lack of awareness of the sort of things that can happen. In general, there is a lack of awareness of bias. It has more to do with the traditional way of thinking, because most of the data in the American health field are for white males. It goes back to the 1950s and 1960s, when most medical treatments were designed for men. Companies are driven by efficiency, so they might not give the necessary attention to these kinds of things.
´Different systems use slightly different formats, which might make those data incompatible´
Can companies still be efficient, but also avoid these biases in their data by making them more diverse?
Just speaking broadly, it would be easier to have all the data possible. Electronic health records are being digitized at a higher rate nowadays, but there is still a lack of standardization I believe. Different systems use slightly different formats, which might make those data incompatible. The other thing is that health systems are not always willing to share their data. If you go to a particular hospital and another hospital has all the data available, you might decide to change your medical provider because of that.
Could it be too expensive for companies to make data more diverse, and why is it so expensive?
If the data are not standardized, it takes work to translate the data into particular systems. Formatting and coding itself requires a lot of people and time. There are also regulations such as HIPAA (the Health Insurance Portability and Accountability Act), which protect the privacy of patients. HIPAA compliance is the standard for protecting sensitive patient data, but it is an expensive process to build.
Is there something that people can do to spot data bias?
I think patients should be more aware and probably do some research. There is a lot of material available, a lot of articles that have been published. Be more aware of health risks that could personally affect you.
What do you think about a possible disclaimer that comes with data, so we can see what kind of people it has been trained on? E.g., white males.
´When researchers are building models that people’s lives depend on, it’s normal for people to criticize the model´
That’s a good point. One of the things I have read is that medical companies often don’t include analysis or reveal the source of the data in their public filings. We should be able to tell from the public filings whether devices were tested on representative populations.
I have heard about the idea to make a so-called data card. Software that comes with a certification or a disclosure that says what kind of data were used. That is already a bit more transparent. Also, one of the things that happened in my discipline is that when researchers publish articles at conferences, they don’t talk a lot about where the data came from and how representative the data are. These kinds of articles should include that discussion.
The last thing I want to mention is that, when researchers are building models that people’s lives depend on, it’s normal for people to ask about its impact on people. I have seen situations in which the researcher would say: I’m just a researcher! We have got to realize that we can’t say things like that. You have to own it.
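The data card idea mentioned in this answer could look roughly like the sketch below. It is only an illustration of the kind of disclosure discussed, not an existing standard or product: the `DataCard` fields, the `tolerance` threshold, and all counts and reference shares are hypothetical and invented for the example.

```python
from dataclasses import dataclass


@dataclass
class DataCard:
    """A hypothetical disclosure that travels with a dataset or model."""
    dataset_name: str
    source: str
    collection_period: str
    group_counts: dict  # group label -> number of records

    def group_shares(self):
        """Share of records per group."""
        total = sum(self.group_counts.values())
        return {g: n / total for g, n in self.group_counts.items()}

    def underrepresented(self, reference_shares, tolerance=0.5):
        """Groups whose share in the data is below tolerance * reference share."""
        shares = self.group_shares()
        return sorted(
            g for g, ref in reference_shares.items()
            if shares.get(g, 0.0) < tolerance * ref
        )


# All values below are made up for illustration.
card = DataCard(
    dataset_name="example-skin-lesion-images",
    source="hypothetical clinic archive",
    collection_period="unspecified",
    group_counts={"lighter skin": 900, "darker skin": 100},
)

# Assumed shares of each group in the population the model is meant to serve.
flagged = card.underrepresented({"lighter skin": 0.6, "darker skin": 0.4})
print("Underrepresented groups:", flagged)
```

A reader of such a card could see at a glance who is in the training data and which groups the model may serve poorly.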
Finding Solutions to Data Bias by Dennis Nguyen
How data-driven technology changes the medical field can teach us a lot about the real harms and risks that data bias carries. PK reminds us that an uncritical attitude towards technology, ignorance among data professionals, and unawareness among target groups are key factors to consider in understanding data bias.
´A lack of standards for data use and questions of data ownership further complicate things´
Now, how can we find solutions? It all starts with acknowledging that tech creators often lack crucial knowledge about target groups and prioritize quick releases of their designs. A lack of standards for data use and questions of data ownership further complicate things. However, data professionals need to accept their responsibilities and actively question their own assumptions before sharing a data-driven system with the public. Transparency and accountability are essential here. Lay people also need to become more aware of data bias as a potential risk. By clearly explaining how their systems work and what data they use, tech creators can contribute to building this awareness. Open dialogue about how data creates value but also has its limits must be part of finding solutions to data bias.
The Lectoraat Human Experience & Media Design develops tools, methods, and models to improve the user experience of digital media. We study how people interact with digital media and how designers can respond to this. In particular, we look at opportunities for, and the design of, intelligent and data-driven products and services.
Through practice-oriented research, Hogeschool Utrecht contributes to solutions for a wide range of societal issues. These issues are raised by our partners in professional practice at regional, national, and international level.