How are clade geographic origins computed?
Version 1.0 method:
A recursive algorithm computes a clade's origin as the average latitude and longitude of its child clades' origins. Basal kits (those confirmed negative for all known child clades) are treated by the algorithm as a child clade and are displayed on the map in yellow. If a clade has no children, or if no kits are confirmed positive for any of its children, the clade's origin is computed as the average latitude and longitude of all kits belonging to the clade.
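The recursion described above can be sketched roughly as follows. This is a minimal illustration, not the site's actual code: the `Clade` class, its field names, and the `origin` function are hypothetical. It also assumes that all basal kits of a clade are pooled into a single pseudo-child located at their average position, which is one possible reading of the description.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


@dataclass
class Clade:
    # All names here are hypothetical and chosen for illustration only.
    name: str
    children: List["Clade"] = field(default_factory=list)
    # Kits confirmed negative for all known child clades.
    basal_kits: List[LatLon] = field(default_factory=list)
    # Kits in this clade that were not tested for the child clades.
    other_kits: List[LatLon] = field(default_factory=list)


def mean_lat_lon(points: List[LatLon]) -> LatLon:
    """Plain arithmetic mean of latitudes and of longitudes."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def origin(clade: Clade) -> Optional[LatLon]:
    """Return the clade's origin, or None if no kit data exists at or below it."""
    # Recurse first; children without any kit data contribute nothing.
    child_origins = [o for o in map(origin, clade.children) if o is not None]
    if child_origins:
        points = child_origins
        if clade.basal_kits:
            # Assumption: basal kits are pooled into one pseudo-child.
            points.append(mean_lat_lon(clade.basal_kits))
        # Each child counts once, however many kits it holds.
        return mean_lat_lon(points)
    # No usable children: fall back to every kit belonging to the clade.
    own = clade.basal_kits + clade.other_kits
    return mean_lat_lon(own) if own else None


a = Clade("A", other_kits=[(10.0, 0.0)] * 3)   # three kits in one place
b = Clade("B", other_kits=[(20.0, 10.0)])      # a single kit
parent = Clade("P", children=[a, b], other_kits=[(50.0, 50.0)])
print(origin(parent))  # (15.0, 5.0)
```

In this example the parent's own less specifically tested kit at (50, 50) is ignored because child data exists, and clade A's three kits carry no more weight than clade B's single kit.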
This method has several advantages and drawbacks.
- Finding the point that is closest to a set of points on the Earth is a computationally intensive problem that might not scale in the browser for large data sets. Averaging the latitudes and longitudes of the points, while inexact, is generally a good approximation and continues to scale as the number of kits and the size of the tree increase.
- The algorithm is 'forward-looking' in that each clade's origin is computed solely from kits at the same level as or downstream of the clade. The benefit is that, when computing any particular clade's origin, any inaccuracy in its computed parents' origins will not affect the child. The disadvantage is that, by ignoring parent clade location information, we lose potentially valuable insight into the directionality of the diversification of the clade in question.
- Bias toward more-represented countries is minimized by the policy of ignoring less specifically tested kits when more specifically tested kit data is present. In this way the bias toward more populous countries is confined to the subclade(s) confirmed to have a presence there: the calculation of the parent clade's origin weights each subclade equally, regardless of the number of kits in each.
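
To give a feel for the "inexact but generally good" trade-off in the first point above, the sketch below compares plain latitude/longitude averaging with a simple spherical reference, the normalized average of the points' 3D unit vectors. The spherical version is used here only as a cheap yardstick; it is not the exact closest-point computation mentioned above, and both function names are hypothetical.

```python
import math
from typing import List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def naive_mean(points: List[LatLon]) -> LatLon:
    """Arithmetic mean of latitudes and of longitudes."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def spherical_mean(points: List[LatLon]) -> LatLon:
    """Direction of the summed unit vectors, converted back to degrees."""
    x = y = z = 0.0
    for lat, lon in points:
        phi, lam = math.radians(lat), math.radians(lon)
        x += math.cos(phi) * math.cos(lam)
        y += math.cos(phi) * math.sin(lam)
        z += math.sin(phi)
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return (lat, lon)


# Nearby points at mid-latitudes: the two results differ only slightly.
nearby = [(50.0, 0.0), (52.0, 4.0)]
print(naive_mean(nearby), spherical_mean(nearby))

# Points on either side of the 180 degree meridian: the plain average
# lands on the opposite side of the globe.
straddling = [(0.0, 179.0), (0.0, -179.0)]
print(naive_mean(straddling), spherical_mean(straddling))
```

For clusters that do not straddle the 180 degree meridian or sit near a pole, the plain average stays close to the spherical result, which is consistent with it being described as a good approximation in general.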
Version 2.0 method:
Uses the clade origins computed by the Version 1.0 method as a starting point. Then, starting with the leaf nodes, it works backward to refine each clade's origin as the point minimizing the total distance from its parent to itself and from itself to all of its children.
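
A rough sketch of this refinement pass is shown below, under several assumptions that the description does not settle. Distance is taken as flat Euclidean distance in latitude/longitude degrees; the minimizing point is found with Weiszfeld's iteration for the geometric median; a clade without children keeps its Version 1.0 origin (otherwise it would simply collapse onto its parent); and each clade is refined against its parent's Version 1.0 origin, since the parent has not yet been refined at that point. The `Node` class and function names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


@dataclass
class Node:
    origin: Point  # starts as the Version 1.0 origin
    children: List["Node"] = field(default_factory=list)


def geometric_median(points: List[Point], start: Point,
                     iterations: int = 200, eps: float = 1e-12) -> Point:
    """Weiszfeld's iteration: approximates the point minimizing the
    summed Euclidean distance to `points`, starting from `start`."""
    x, y = start
    for _ in range(iterations):
        wsum = xs = ys = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < eps:
                # Simplification: if the estimate lands on an input
                # point, stop there instead of handling the singularity.
                return (px, py)
            w = 1.0 / d
            wsum += w
            xs += w * px
            ys += w * py
        nx, ny = xs / wsum, ys / wsum
        if math.hypot(nx - x, ny - y) < eps:
            return (nx, ny)
        x, y = nx, ny
    return (x, y)


def refine(node: Node, parent_origin: Optional[Point] = None) -> None:
    """Refine origins in place, leaves first."""
    for child in node.children:
        # The child sees this node's origin as it was before refinement.
        refine(child, parent_origin=node.origin)
    if not node.children:
        return  # assumption: leaves keep their Version 1.0 origin
    anchors = [c.origin for c in node.children]
    if parent_origin is not None:
        anchors.append(parent_origin)
    node.origin = geometric_median(anchors, node.origin)


leaves = [Node((0.0, 0.0)), Node((0.0, 10.0)), Node((10.0, 0.0))]
mid = Node((10.0 / 3, 10.0 / 3), children=leaves)  # average of its leaves
root = Node((10.0, 10.0), children=[mid])
refine(root)
print(mid.origin)
```

Here the intermediate clade starts at the average of its three children and is pulled toward its parent, ending near the center of the square formed by the four surrounding points.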