mp100
Junior Member

Posts: 2
Registered: 7/19/2022
PCA
1. When I run PCA, the results are missing samples.
Log:
KING starts at Wed Jul 20 01:32:34 2022
Loading genotype data in PLINK binary format...
Read in PLINK fam file PCA.fam...
PLINK pedigrees loaded: 69 samples
Read in PLINK bim file PCA.bim...
Genotype data consist of 593124 autosome SNPs, 3852 X-chromosome SNPs, 597 Y-chromosome SNPs
PLINK maps loaded: 597573 SNPs
Read in PLINK bed file PCA.bed...
PLINK binary genotypes loaded.
KING format genotype data successfully converted.
Options in effect:
--pca
PCA starts at Wed Jul 20 01:32:35 2022
Genotypes stored in 9268 words for each of 69 individuals.
Preparing matrix (26 x 26) for PCA....
187991 SNPs are used in PCA.
SVD starts at Wed Jul 20 01:32:36 2022
LAPACK is being used...
PCA ends at Wed Jul 20 01:32:36 2022
Largest 26 eigenvalues: 691.76 679.02 659.60 636.07 622.41 605.94 601.73 599.27 597.74 592.01 589.41 588.41 587.10 582.75 581.69 575.78 570.46 566.18 560.46 541.92 526.00 467.89 464.73 456.41 451.53 359.53
26 principal components saved in file kingpc.txt
KING ends at Wed Jul 20 01:32:36 2022
There are 69 samples in the input .bed file, but I only get 26 in the PCA output. I think this could be related to missingness or something similar, but I ran PLINK with --maf at several different values and that didn't seem to help.
Any idea what this could be?
theKING
Junior Member

Posts: 15
Registered: 6/1/2021
I think the problem here lies in the genotype missingness in your dataset: 43 of your 69 samples may have a somewhat low call rate (<95%), which would explain why they are left out of the PCA. If you filter your SNPs with a 95% (or even 99%) call-rate threshold before running PCA, many more samples (if not all) should be kept. For this issue, the --maf option does not help as much as --geno.
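The mechanism described in the reply can be illustrated with a toy example. This is not KING's code: it is a minimal sketch that assumes, as the reply suggests, that samples whose call rate falls below a cutoff (0.95 here, a hypothetical value) are left out of the PCA, and every function name in it is hypothetical. It shows that a couple of badly genotyped SNPs can push most samples below the cutoff, that a MAF filter (computed from called genotypes only) does not remove those SNPs, and that a SNP call-rate filter, which is what PLINK's --geno does (--geno 0.05 drops SNPs with more than 5% missing calls), brings the samples back.

```python
from typing import List, Optional

# Genotypes are coded as 0/1/2 copies of one allele; None marks a missing call.
Matrix = List[List[Optional[int]]]  # rows = samples, columns = SNPs


def sample_call_rates(geno: Matrix) -> List[float]:
    """Fraction of SNPs with a non-missing call, for each sample (row)."""
    return [sum(g is not None for g in row) / len(row) for row in geno]


def snp_call_rates(geno: Matrix) -> List[float]:
    """Fraction of samples with a non-missing call, for each SNP (column)."""
    return [sum(row[j] is not None for row in geno) / len(geno)
            for j in range(len(geno[0]))]


def snp_mafs(geno: Matrix) -> List[float]:
    """Minor allele frequency per SNP, computed from called genotypes only."""
    mafs = []
    for j in range(len(geno[0])):
        called = [row[j] for row in geno if row[j] is not None]
        if not called:
            mafs.append(0.0)  # no calls at all: treat as uninformative
            continue
        p = sum(called) / (2 * len(called))
        mafs.append(min(p, 1 - p))
    return mafs


def keep_snps(geno: Matrix, keep: List[int]) -> Matrix:
    """Return a copy of the matrix restricted to the SNP columns in `keep`."""
    return [[row[j] for j in keep] for row in geno]


def passing_samples(geno: Matrix, min_call_rate: float) -> List[int]:
    """Indices of samples whose call rate meets the (hypothetical) cutoff."""
    return [i for i, r in enumerate(sample_call_rates(geno)) if r >= min_call_rate]


def build_toy_data() -> Matrix:
    """6 samples x 20 SNPs; the last 2 SNPs failed genotyping in samples 0-3."""
    geno = []
    for i in range(6):
        row: List[Optional[int]] = [(i + j) % 3 for j in range(18)]
        row += [None if i < 4 else 1 for _ in range(2)]
        geno.append(row)
    return geno


if __name__ == "__main__":
    CUTOFF = 0.95  # hypothetical sample call-rate cutoff
    geno = build_toy_data()
    print("before any filter:", passing_samples(geno, CUTOFF))  # → [4, 5]

    # A MAF filter looks only at called genotypes, so the bad SNPs survive it.
    maf_keep = [j for j, m in enumerate(snp_mafs(geno)) if m >= 0.05]
    print("after MAF >= 0.05:",
          passing_samples(keep_snps(geno, maf_keep), CUTOFF))  # → [4, 5]

    # A SNP call-rate filter (the idea behind --geno 0.05) drops them.
    geno_keep = [j for j, r in enumerate(snp_call_rates(geno)) if r >= CUTOFF]
    print("after SNP call rate >= 0.95:",
          passing_samples(keep_snps(geno, geno_keep), CUTOFF))  # → [0, 1, 2, 3, 4, 5]
```

In the toy data, samples 0-3 each miss 2 of 20 calls (90% call rate), so only samples 4 and 5 pass. Removing the two SNPs with a 33% call rate lifts every sample to 100%, whereas the MAF filter leaves the matrix unchanged because both failing SNPs look perfectly common among the calls that did succeed.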
mp100
Junior Member

Posts: 2
Registered: 7/19/2022
Thanks, I will try that.