Weighbor 1.0.1-alpha and the triangle inequality

In an exactly additive distance matrix the triangle inequality is always satisfied. However, in distances derived from finite length sequences, and when back-substitution is present, the triangle inequality can be violated. The presence of triangle inequality violations often implies that recovery of the entire correct tree is not very likely. However, a robust method should always reconstruct a tree with a minimal number of incorrect branches.

By symmetry, any quantity that determines whether the pair (i,j) is the best pair to join should have the same value if the pair is described as (j,i). Equation 13 of the weighbor paper gives a formula for diP, the distance from leaf i to the internal node P which joins the branches to i, j and k. This formula includes cases to deal with violations of the triangle inequality. However, if combined with the equation djP = dij - diP, this approach is not symmetric in i and j. The amount of asymmetry is proportional to the amount by which the triangle inequality is violated. Such symmetry violation might not have any real significance, except that the weighbor 1.0.1-alpha code assumed the symmetry was valid, and for book-keeping convenience would effectively interchange i and j when a pair was joined. This meant that terms violating the triangle inequality, when added and later subtracted from the sums in equations 10-12, would not cancel themselves out. This could lead to variance estimates for non-additivity being significantly negative, and taxa involved in triangle inequality violations could be joined for no good reason.

In the updated version, whenever equation 13 gives diP = dik, if dik > djk then diP is replaced by dij - djk. This restores the symmetry between i and j. For good measure, we also make sure i and j are not swapped when updating sums after joining a pair.

Back