This function reads in the content from a genetic map file to translate physical distance to genetic units (i.e. cM). Regardless of the source, the input file must be sex-averaged and in a tab-separated "Plink" format (documentation) with the following four columns and no header (i.e. no column names):
Chromosome
Identifier (ignored in read_map()
)
Length (genetic length within the physical position boundary)
Position (physical position boundary)
The columns must be in the order above. Note that only the first, third, and fourth columns are used in the function.
read_map(file)
Input file path
A tibble containing 3 columns:
chr (chromosome)
value (genetic length within the physical position boundary)
bp (physical position boundary)
The genetic map could come from different sources. One source is the HapMap map distributed by the Browning Lab (documentation). If this map file is used, the non-sex chromosomes can be downloaded and concatenated to a single file as follows:
://bochet.gcc.biostat.washington.edu/beagle/genetic_maps/plink.GRCh37.map.zip
wget http
unzip plink.GRCh37.map.zip*chr[0-9]*GRCh37.map | sort -k1,1 -k4,4 --numeric-sort > plink.allchr.GRCh37.map cat
Another source is a sex-specific map ("bherer") originally published by Bherer et al and recommended by the developers of ped-sim
for simulating IBD segments (documentation). To retrieve and prep this map file for simulation:
# Get the refined genetic map and extract
--no-check-certificate https://github.com/cbherer/Bherer_etal_SexualDimorphismRecombination/raw/master/Refined_genetic_map_b37.tar.gz
wget
tar xvfpz Refined_genetic_map_b37.tar.gz
# Format for ped-sim as per https://github.com/williamslab/ped-sim#map-file-
"#chr\tpos\tmale_cM\tfemale_cM\n" > sexspec.pedsim.map
printf for chr in {1..22}; do
/male_chr$chr.txt Refined_genetic_map_b37/female_chr$chr.txt \
paste Refined_genetic_map_b37| awk -v OFS="\t" 'NR > 1 && $2 == $6 {print $1,$2,$4,$8}' \
| sed 's/^chr//' >> sexspec.pedsim.map;
done
# Clean up
-rf Refined_genetic_map_b37* rm
After this, the sexspec.pedsim.map
file is ready for use in simulation. However, it must be averaged and reformatted to "Plink format" to use here:
| grep -v "^#" | awk -v OFS="\t" '{print $1,".",($3+$4)/2,$2}' > sexspec-avg.plink.map cat sexspec.pedsim.map
#' The genetic maps created above are in the tens of megabytes size range. This is trivial to store for most systems but a reduced version would increase portability and ease testing. This "minimum viable genetic map" could be used for testing and as installed package data in an R package for example analysis. Read more about minimum viable genetic maps at:
Blog post: https://hapi-dna.org/2020/11/minimal-viable-genetic-maps/
Github repo with python code: https://github.com/williamslab/min_map
The code as written below reduces the averaged sex-specific genetic map from 833776 to 28726 positions (~30X reduction!).
# Get minmap script from github
://raw.githubusercontent.com/williamslab/min_map/main/min_map.py
wget https
# Create empty minmap
-n > sexspec-avg-min.plink.map
echo
# For each autosome...
for chr in {1..22}; do
"Working on chromosome $chr..."
echo # First pull out just one chromosome
"^${chr}[[:space:]]" sexspec-avg.plink.map > tmp.${chr}
grep # Run the python script on that chromosome.
# The genetic map column is 3rd column (2nd in 0-start). Physical position is last column (3 in 0-based)
-mapfile tmp.${chr} -chr ${chr} -genetcol 2 -physcol 3 -noheader -error 0.05
python3 min_map.py # Strip out the header and reformat back to plink format, and append to minmap file
${chr}.txt | grep -v "^#" | awk -v OFS="\t" '{print $1,".",$4,$2}' >> sexspec-avg-min.plink.map
cat min_viable_map# Clean up
-f min_viable_map${chr}.txt tmp.${chr}
rm done
This averaged version of the Bherer sex-specific map, reduced to a minimum viable genetic map with at most 5% error, in Plink format, is available as installed package data (see examples). This is useful for testing code, but the full genetic map should be used for most analysis operations.
http://zzz.bwh.harvard.edu/plink/data.shtml#map
http://bochet.gcc.biostat.washington.edu/beagle/genetic_maps/
https://github.com/williamslab/ped-sim#map-file
https://www.nature.com/articles/ncomms14994
https://www.nature.com/articles/ncomms14994
https://github.com/cbherer/Bherer_etal_SexualDimorphismRecombination
gmapfile <- system.file("extdata", "sexspec-avg-min.plink.map", package="skater", mustWork=TRUE)
gmap <- read_map(gmapfile)