There can be nothing more terrible than hearing that your loved one has been lost in an accident. In large-scale disasters with many victims, families may have to wait days to find out exactly who did not survive. It’s therefore tremendously important that victims of tragedies are quickly and accurately identified. But when only limited information and partial DNA data are available, the task of identification can become very difficult and time consuming. If computers could collate the fragmentary information and calculate exact likelihoods for the identity of each victim, this challenging task could become quicker and more reliable than ever before.
It was a hazy morning in May 2010. The beginnings of the sunrise cast an orange glow across the horizon. After more than eight hours in the air, Afriqiyah Airways Flight 8U771 was coming in to land at Tripoli airport. It was carrying 11 crew and 93 passengers, many of whom had been on holiday in South Africa, from where the plane had departed. The pilots were tired, as this was their second consecutive night flight. The aircraft was directed to land at runway 9, which had an older style of aircraft navigation system, so the pilots turned off the autopilot and began a manual descent. As the 8-month-old aircraft approached the runway, something was clearly wrong. It was travelling too fast and too low. At exactly 6:10am, as the sun climbed above the horizon, the massive plane hit the ground at high speed about 900 metres short of the runway. The result was devastation. Very little remained of the aircraft, except for its tail. Every person on board lost their lives except, miraculously, for one nine-year-old Dutch boy.
Flight 8U771 carried 70 Dutch nationals, many of whom were blood relatives, who had been holidaying in South Africa. Five days after the crash, the Libyan authorities and the Netherlands Ministry of Foreign Affairs both formally asked the Netherlands Forensic Institute (NFI) to assist. Their task was victim identification, and it was not easy. The Institute received muscle, bone and tooth fragments from 149 body parts. They were also given 25 personal items such as razor blades and clothes. Relatives of 76 of the missing persons provided 195 reference samples, mostly swabs from the inside of their cheeks. DNA profiles were already on record for seven of the missing persons, along with nine more from relatives of missing persons. The Institute had no reference samples at all for 19 people. DNA typing was used to profile every possible sample.
The NFI was then faced with a combinatorial problem: how to use all the evidence to positively identify each missing person on the flight. Identification used to be a laborious manual process in which each sample was matched against each missing person. With missing information, partial DNA profiles can easily produce similar matches for several people, especially if they are blood relatives. This can result in false hits, where the same evidence appears to identify more than one missing person.
The solution was to use a brand new computer system: Bonaparte Disaster Victim Identification. The system was developed in collaboration with the NFI by researchers at the SNN Adaptive Intelligence Group at Radboud University Nijmegen, Netherlands. Bonaparte is a spin-off of the Bayesian inference technology that was developed by SNN as a partner in Pascal 2. The software uses Bayesian networks to model the statistical relationships between the genetic material of relatives. Its key innovation is to create pedigrees: networks comprising families of genetically related people, with all available data used to calculate the statistical likelihood that each missing person is a member of each family. In this way data from many members of the same family can be used in combination to identify victims. “There is a large number of potential solutions involved and there is missing data,” says Technical Manager Willem Burgers. “Before the application of pedigree-based matching, the one-to-one match used to result in a lot of false hits. It is only when you consider all the information at the same time that you eliminate the false hits.”
Bonaparte DVI was used by the Netherlands Forensic Institute to screen and match all the samples gathered for the passengers of flight 8U771. In total, 129 body parts were successfully matched to a missing person. All victims were identified through this process. Sadly, the exact cause of the flight 8U771 disaster remains unknown. The recent Libyan civil war has resulted in severe delays to the completion of the accident report.
The machine learning software continues to be used with great success. More recently Bonaparte was used in the identification of the perpetrator in a 13-year-old case. In 1999, 16-year-old Marianne Vaatstra was brutally raped and murdered in a small village in a rural area in the northeast of the Netherlands. The case was widely covered by the press.
Although a man's DNA was found at the scene, years of investigation yielded no match with samples on file. However, a change in legislation in April 2012 enabled the NFI to conduct a mass DNA screening of men who lived in the area at the time in order to find relatives and identify the killer. 6,500 men came forward, and after a little more than half had been processed, the killer was found. “This was a breakthrough,” says Burgers, “because it is the first time that a so-called ‘familial search’ has been applied in the Netherlands.”
The Bonaparte team are enthusiastic about their success and have plans to improve the system. “We plan to extend the model to include more parameters for more detailed computation,” says Burgers, “so it can be applied to different fields within forensic research, such as immigration cases. We may also add an extension to include other forms of forensic evidence.”
Bayesian networks in Bonaparte
Thomas Bayes was a mathematician born in London in 1702. Amongst his works, he wrote about probability. Instead of being concerned with, say, the probability of drawing a black ball from a bag containing a known number of black and white balls, Bayes was interested in the inverse probability of the event. In other words, if you had drawn more black balls than white balls from the bag, what was the probability that the bag contained more black balls, and so that you would draw another black one next? Given a hypothesis (for example that the next ball will be black) and some information about which balls have been picked previously, Bayes figured out the maths to infer the probability that the hypothesis was true. Several decades later, the Frenchman Laplace developed these ideas further, creating a more general version of Bayes' theorem for use in astronomy and physics.
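The bag-of-balls reasoning above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything taken from Bonaparte: the function names are hypothetical, every possible number of black balls is assumed equally likely before any draw, and each ball is assumed to be put back in the bag after it is drawn.

```python
from fractions import Fraction


def posterior_over_bags(n_balls, black_drawn, white_drawn):
    """Posterior probability of each hypothesis 'the bag holds k black balls'.

    Assumptions (illustrative only): a uniform prior over k = 0..n_balls,
    and draws made with replacement.
    """
    prior = Fraction(1, n_balls + 1)
    unnormalised = []
    for k in range(n_balls + 1):
        p_black = Fraction(k, n_balls)
        # Likelihood of the observed draws if the bag really held k black balls.
        likelihood = p_black ** black_drawn * (1 - p_black) ** white_drawn
        unnormalised.append(prior * likelihood)
    total = sum(unnormalised)
    # Bayes' rule: posterior = prior * likelihood / evidence.
    return [u / total for u in unnormalised]


def prob_next_black(n_balls, black_drawn, white_drawn):
    """Probability that the next ball is black, averaged over all hypotheses."""
    posterior = posterior_over_bags(n_balls, black_drawn, white_drawn)
    return sum(p * Fraction(k, n_balls) for k, p in enumerate(posterior))


if __name__ == "__main__":
    # Ten balls in the bag; three black and one white drawn so far.
    print(float(prob_next_black(10, 3, 1)))
```

Before any ball is drawn the answer is exactly one half; each black ball observed shifts belief towards bags with more black balls, and so raises the probability that the next ball is black.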
Bayesian networks, or belief networks, are networks whose nodes are random variables in the Bayesian sense. The nodes might be observable properties or hypotheses. An edge linking two nodes indicates a conditional dependency between them. In Bonaparte, the Bayesian network is built from the pedigree (the network of family members). The DNA profile of each victim is placed in the pedigree (at positions specified by the relatives). Each placement is a hypothesis: does the victim fit in this family? The probability of the victim's profile under that hypothesis is computed. This probability divided by the random match probability is the likelihood ratio. This ratio is used to determine if the victim profile fits in the pedigree. If it does, the victim is identified.
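To make the likelihood ratio concrete, here is a deliberately simplified sketch in Python. It is not Bonaparte's model. It assumes the smallest possible pedigree (one parent who supplied a reference sample, and one missing child), made-up allele frequencies, independent DNA markers, and no mutation or laboratory error. All function names and data are hypothetical.

```python
from math import prod


def transmit_prob(parent, allele):
    """Chance that a parent with this genotype passes on the given allele."""
    return sum(0.5 for a in parent if a == allele)


def prob_child_given_parent(child, parent, freq):
    """P(child genotype | one known parent); the other allele comes from the population."""
    a, b = child
    if a == b:
        return transmit_prob(parent, a) * freq[a]
    return transmit_prob(parent, a) * freq[b] + transmit_prob(parent, b) * freq[a]


def random_match_prob(genotype, freq):
    """P(genotype) for an unrelated person, assuming Hardy-Weinberg proportions."""
    a, b = genotype
    return freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]


def likelihood_ratio(victim, relative, freqs):
    """Multiply per-marker ratios: P(profile | fits pedigree) / P(profile | unrelated)."""
    return prod(
        prob_child_given_parent(genotype, relative[marker], freqs[marker])
        / random_match_prob(genotype, freqs[marker])
        for marker, genotype in victim.items()
    )


# Made-up allele frequencies and profiles for two markers (illustration only).
freqs = {"M1": {"A": 0.1, "B": 0.3, "C": 0.6}, "M2": {"X": 0.2, "Y": 0.8}}
parent = {"M1": ("A", "C"), "M2": ("X", "Y")}
victim = {"M1": ("A", "B"), "M2": ("X", "X")}

if __name__ == "__main__":
    print(likelihood_ratio(victim, parent, freqs))
```

A ratio well above 1 means the profile is better explained by the victim belonging to the family than by chance; a ratio of 0 means the profile is incompatible with that position in the pedigree. A real pedigree combines many relatives and many markers at once, which is why a Bayesian network is used rather than a single formula like this one.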