New machine learning tool links bladder cancer to smoking

A powerful new machine learning tool developed by researchers at the University of California, San Diego (UCSD) has uncovered a pattern of DNA mutations that links bladder cancer to smoking. The tool, called SigProfilerExtractor, used de novo extraction to find the link.1

The tool analyzed the mutational signatures found in sequenced cancers. These signatures are specific patterns of mutations left behind when an environmental exposure alters a person's DNA. The extraction revealed a strong link between bladder cancer and smoking; previously, a mutational link to smoking had been established only for lung cancer.

In a press release about the study, lead author Ludmil Alexandrov, PhD, explained how the tool works by comparing it to conversations at a party. “You have multiple groups of people talking all around you, and you’re only interested in hearing certain people speak. Our tool basically helps you do that, but with cancer genetic data. Multiple people in the world are exposed to different environmental mutagens, and some of these exposures leave imprints on their genomes. This tool sifts through all of this data to identify the processes that cause the mutations.” Alexandrov is a professor of bioengineering and of cellular and molecular medicine at UCSD.2
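The party analogy describes a source-separation problem: each tumor's mutation counts are a mixture of several underlying processes, and the goal is to recover the individual processes from the mixtures. The article does not describe the algorithm itself, so the following is only a toy sketch of one standard approach to this kind of problem, non-negative matrix factorization (NMF), which the published tool builds on. It is not SigProfilerExtractor's actual pipeline; the function name, the six mutation channels, and the two made-up signatures are all assumptions chosen for illustration.

```python
import numpy as np

def nmf(V, rank, n_iter=2000, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (channels x samples) into
    W (channels x rank) and H (rank x samples) so that V is roughly W @ H."""
    rng = np.random.default_rng(seed)
    n_channels, n_samples = V.shape
    W = rng.random((n_channels, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # Multiplicative updates that reduce the squared reconstruction error
        # while keeping every entry of W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Rescale so each signature (column of W) sums to 1; the exposures in H
    # absorb the scale, leaving the product W @ H unchanged.
    scale = W.sum(axis=0)
    return W / scale, H * scale[:, None]

# Two hypothetical signatures over six mutation channels (each sums to 1).
true_signatures = np.array([
    [0.50, 0.30, 0.10, 0.05, 0.03, 0.02],
    [0.02, 0.03, 0.05, 0.10, 0.30, 0.50],
]).T  # shape: (6 channels, 2 signatures)

# Made-up exposures: how many mutations each signature contributes per sample.
rng = np.random.default_rng(1)
true_exposures = rng.integers(10, 500, size=(2, 40)).astype(float)

# Observed data: each sample is a mixture of the two signatures.
V = true_signatures @ true_exposures

W, H = nmf(V, rank=2)
reconstruction_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print("relative reconstruction error:", reconstruction_error)
print("recovered signatures (columns):")
print(W.round(2))
```

In this picture the columns of W play the role of the individual voices at the party, and H records how loudly each voice speaks in each sample. A real extraction also has to choose the number of signatures and check that the solution is stable, which this sketch does not attempt.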

For the study, the tool looked at 23,827 sequenced human cancers, consisting of 4,643 whole-genome sequenced cancers and 19,184 whole-exome sequenced cancers. After extraction, the tool found four new mutational signatures, including one associating bladder cancer with smoking. The study team found that this signature differs from the one found in lung cancer. It was also found in tobacco smokers who have not developed bladder cancer, but not in the bladder tissue of non-smokers.

“What this signature tells us is that certain mutations in your DNA are due to exposure to tobacco smoke,” said study author Marcos Díaz-Gay, PhD, a postdoctoral researcher in cellular and molecular medicine at UCSD. “It doesn’t necessarily mean you have cancer. But the more you smoke, the more mutations accumulate in your cells and the more you increase your risk of developing cancer.”

The new tool was also compared with 13 existing bioinformatics tools, all of which analyzed the mutational signatures of 80,000 synthetic cancer samples. SigProfilerExtractor detected 20% to 50% more true positives than the other tools and had a false positive (FP) rate 5 times lower. It also performed well on datasets containing high levels of random noise, unlike the other tools.
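Because the benchmark samples are synthetic, the signatures used to generate them are known, so each tool's output can be scored against that ground truth. The sketch below shows one plausible way to count true and false positives. It assumes that an extracted signature counts as a match when its cosine similarity to a not-yet-matched true signature is at least 0.9; that threshold, the greedy matching, and all names and numbers here are illustrative assumptions, not the study's stated scoring procedure.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def score_extraction(extracted, truth, threshold=0.9):
    """Greedily match extracted signatures to ground-truth signatures.

    Returns (true_positives, false_positives, false_negatives).
    The 0.9 threshold is an assumption made for this sketch.
    """
    unmatched = list(range(len(truth)))
    tp = fp = 0
    for sig in extracted:
        # Find the closest ground-truth signature that is still unmatched.
        best = max(unmatched,
                   key=lambda i: cosine_similarity(sig, truth[i]),
                   default=None)
        if best is not None and cosine_similarity(sig, truth[best]) >= threshold:
            tp += 1
            unmatched.remove(best)
        else:
            fp += 1
    fn = len(unmatched)  # true signatures the tool failed to recover
    return tp, fp, fn

# Made-up ground truth: two signatures over four mutation channels.
truth = [
    [0.70, 0.20, 0.10, 0.00],
    [0.00, 0.10, 0.20, 0.70],
]
# Made-up tool output: one close match and one flat, spurious signature.
extracted = [
    [0.68, 0.22, 0.10, 0.00],
    [0.25, 0.25, 0.25, 0.25],
]

result = score_extraction(extracted, truth)
print(result)  # (1, 1, 1): one match, one spurious, one missed
```

Running the same scoring over every tool's output on the same synthetic samples is what makes statements such as "more true positives" and "fewer false positives" directly comparable.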

This work could help researchers find other links between environmental factors and cancer, which could lead to more personalized treatment for patients. In the future, the team hopes the tool can be used on a more individual level to profile patients with bladder cancer. For this to happen, they would need to create a more user-friendly interface, so that researchers can use the tool without bioinformatics expertise.


1. Islam SMA, Wu Y, Díaz-Gay M, et al. Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor. Cell Genomics. Published online September 23, 2022. doi:10.1016/j.xgen.2022.100179

2. Mutational signatures linking bladder cancer and smoking discovered with new AI tool. Press release. University of California at San Diego. September 26, 2022. Accessed October 7, 2022. mwhr&xy=10016681
