Updated 8/16/2018

How are clade geographic origins computed?

- Filtering of samples
- Discard samples outside a latitude-longitude bounding box
- bounds:{'minLat': 0, 'maxLat': 75.0, 'minLon': -25.0, 'maxLon': 97.0}
- The next version will include the rest of Asia and Oceania west of the International Date Line, and will introduce Y-haplogroups O and C. All I need for this is country geometry files; contact me if you want to help.
- A subsequent version will incorporate the Americas and haplogroup Q. This requires code changes to handle the complications the International Date Line introduces in the pathing, shortest-point, and averaging computations.

- Discard samples not tested to a terminal subclade or basal
- This avoids obscuring the migration path from a parent clade to a child clade when samples used to compute the parent clade's position later turn out to be positive for a child SNP
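The two filtering rules above can be sketched as follows. This is a minimal illustration, not the project's actual code: the `bounds` values come from the text, but the sample layout (a dict with `lat`, `lon` and a boolean `fully_tested` flag) and the inclusive treatment of the box edges are assumptions made for the example.

```python
# Bounds copied from the text above.
BOUNDS = {'minLat': 0, 'maxLat': 75.0, 'minLon': -25.0, 'maxLon': 97.0}


def in_bounds(lat, lon, bounds=BOUNDS):
    """Return True if (lat, lon) lies inside the box (edges included: an assumption)."""
    return (bounds['minLat'] <= lat <= bounds['maxLat']
            and bounds['minLon'] <= lon <= bounds['maxLon'])


def filter_samples(samples, bounds=BOUNDS):
    """Keep samples that are inside the box and tested to a terminal subclade or basal.

    Each sample is a dict with hypothetical keys 'lat', 'lon' and 'fully_tested'.
    """
    return [s for s in samples
            if in_bounds(s['lat'], s['lon'], bounds) and s['fully_tested']]


samples = [
    {'id': 'a', 'lat': 52.0, 'lon': 13.0, 'fully_tested': True},    # kept
    {'id': 'b', 'lat': 40.0, 'lon': -100.0, 'fully_tested': True},  # outside the box
    {'id': 'c', 'lat': 48.0, 'lon': 2.0, 'fully_tested': False},    # not fully tested
]
kept = filter_samples(samples)
```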

- Compute initial positions for terminal subclades and basal clades
- Weighted average of latitude and longitude of samples where weight for each sample is the product of
- A regional sampling factor (the inverse of the number of samples from any haplogroup in the area, counted using a normal distribution with a standard deviation of 100 km)
- Absolute world sampling map generated from this method.
- dark grey: < 3.125
- white: 3.125 - 12.5
- peach: 12.5 - 50
- orange: 50 - 200
- red: 200+

- An age factor (linear function of sample age)

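The weighting described above might look roughly like the sketch below. Several details are assumptions, since the text does not give them: the regional count is modelled as a Gaussian-weighted sum over great-circle (haversine) distances, the age factor uses placeholder `slope` and `intercept` values, and all function names are hypothetical. Only the 100 km standard deviation and the "product of two factors" rule come from the text.

```python
import math

EARTH_RADIUS_KM = 6371.0
SIGMA_KM = 100.0  # standard deviation stated in the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def local_density(lat, lon, all_samples, sigma_km=SIGMA_KM):
    """Gaussian-weighted count of samples (any haplogroup) around (lat, lon)."""
    return sum(
        math.exp(-0.5 * (haversine_km(lat, lon, s_lat, s_lon) / sigma_km) ** 2)
        for s_lat, s_lon in all_samples
    )


def age_factor(age, slope=1.0, intercept=0.0):
    """Linear function of sample age; the coefficients are placeholders."""
    return slope * age + intercept


def weighted_position(clade_samples, all_samples):
    """Weighted mean lat/lon; weight = age factor * (1 / local sampling density).

    clade_samples: iterable of (lat, lon, age) for one clade.
    all_samples:   list of (lat, lon) for every sample; it is assumed to include
                   the clade's own samples, so the density is never zero.
    """
    total = lat_sum = lon_sum = 0.0
    for lat, lon, age in clade_samples:
        w = age_factor(age) / local_density(lat, lon, all_samples)
        total += w
        lat_sum += w * lat
        lon_sum += w * lon
    return lat_sum / total, lon_sum / total
```

In this sketch a sample from a heavily sampled region pulls the average less than an isolated sample of the same age, which is the stated purpose of the regional sampling factor.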
- Initialization of entire tree
- Starting at leaf nodes, go up the tree computing each clade's location as the average of its children's locations
- If there are basal samples, the computed basal clade position is treated as a child clade for above computation
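A bottom-up pass like the one described above can be sketched with a post-order traversal. The data layout here is hypothetical: `children` maps a clade name to its child clade names, `terminal_pos` holds the initial positions of terminal subclades, and `basal_pos` holds the computed basal position for any clade that has basal samples. The text says only "average", so the unweighted arithmetic mean used here is an assumption.

```python
def initialize_tree(root, children, terminal_pos, basal_pos):
    """Return {clade: (lat, lon)} computed from the leaves upward."""
    positions = {}

    def visit(clade):
        kids = children.get(clade, [])
        if not kids:
            # Terminal subclade: use its precomputed initial position.
            positions[clade] = terminal_pos[clade]
            return positions[clade]
        points = [visit(k) for k in kids]
        if clade in basal_pos:
            # Basal samples count as one extra "child" of this clade.
            points.append(basal_pos[clade])
        lat = sum(p[0] for p in points) / len(points)
        lon = sum(p[1] for p in points) / len(points)
        positions[clade] = (lat, lon)
        return positions[clade]

    visit(root)
    return positions


# Tiny hypothetical tree: R -> {A, B}, A -> {A1, A2}, with basal samples under A.
children = {'R': ['A', 'B'], 'A': ['A1', 'A2']}
terminal_pos = {'A1': (10.0, 10.0), 'A2': (20.0, 30.0), 'B': (40.0, 50.0)}
basal_pos = {'A': (30.0, 20.0)}
positions = initialize_tree('R', children, terminal_pos, basal_pos)
```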

- Directionality Refinement
- Starting at penultimate leaf nodes, go up the tree refining each clade's location with a pathing algorithm taking two parameters
- its parent clade's location
- the polygon defined by its child clades' locations

- If the parent is inside the polygon, refine the clade's location as the midpoint between the precomputed locations of it and its parent
- If the parent is outside the polygon, refine the clade's location as the point on the child polygon closest to the parent
- Use a different method to refine the root clade, as the above algorithm cannot be used given that the root has no parent
- Refine the root clade location as the point that minimizes total distance between itself and all children clades
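The inside/outside rule for non-root clades can be illustrated with a small planar sketch. It makes several assumptions that the text does not confirm: coordinates are treated as flat (x, y) pairs, the "polygon defined by its child clades" is taken to be the convex hull of the child locations, and a clade with only one or two children is handled as a point or a segment (so the parent always counts as outside). All function names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside(hull, p):
    """True if p is inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    if n < 3:
        return False
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def closest_on_segment(a, b, p):
    """Closest point to p on the segment from a to b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    len2 = dx * dx + dy * dy
    if len2 == 0:
        return a
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len2
    t = max(0.0, min(1.0, t))
    return (a[0] + t * dx, a[1] + t * dy)


def closest_on_hull(hull, p):
    """Closest point to p on the hull boundary (hull may be a point or a segment)."""
    if len(hull) == 1:
        return hull[0]
    n = len(hull)
    candidates = [closest_on_segment(hull[i], hull[(i + 1) % n], p) for i in range(n)]
    return min(candidates, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def refine(clade_pos, parent_pos, child_positions):
    """Apply the inside/outside rule to one non-root clade."""
    hull = convex_hull(child_positions)
    if inside(hull, parent_pos):
        # Parent inside the child polygon: midpoint of clade and parent.
        return ((clade_pos[0] + parent_pos[0]) / 2, (clade_pos[1] + parent_pos[1]) / 2)
    # Parent outside: closest point of the child polygon to the parent.
    return closest_on_hull(hull, parent_pos)
```

For example, with children at the corners of a 10 x 10 square, a parent at (5, 20) is outside, so the clade is placed at (5, 10) on the edge nearest the parent. A real implementation would probably need geodesic distances rather than this flat approximation.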

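The point that minimizes total distance to a set of points, as described for the root clade, is the geometric median. The text does not say how it is computed; one common choice is Weiszfeld's iteration, sketched below on flat (x, y) coordinates with Euclidean distance. Both the choice of algorithm and the planar distance are assumptions, and `geometric_median` is a hypothetical name.

```python
import math


def geometric_median(points, iterations=200, tol=1e-9):
    """Approximate the point minimizing the sum of Euclidean distances to `points`."""
    # Start from the centroid.
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iterations):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < tol:
                # Simplification: the update is undefined at an input point,
                # so stop there instead of handling this case rigorously.
                return (px, py)
            w = 1.0 / d
            num_x += w * px
            num_y += w * py
            denom += w
        nx, ny = num_x / denom, num_y / denom
        if math.hypot(nx - x, ny - y) < tol:
            return (nx, ny)
        x, y = nx, ny
    return (x, y)


def total_distance(point, points):
    """Sum of Euclidean distances from `point` to every element of `points`."""
    return sum(math.hypot(px - point[0], py - point[1]) for px, py in points)
```

Unlike the plain average, the geometric median is not dragged far by a single distant child, which may be one reason to prefer it for the root.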