Discussion:
[R-sig-phylo] Phylogenetic PCA and measurement error
Rafael S Marcondes
2018-03-11 22:06:02 UTC
Dear all,

Does anyone have any advice on how to calculate measurement error in an
analysis using phylogenetic principal components? Or, in other words, after
I run a phylogenetic PCA on species-level data, how can I "project" my
individual-level data into the phylogenetic PCs so I can calculate a
standard error? I'm running my pPCA using the lambda method and the
covariance matrix.

I would think this would be a usual, simple procedure, and that there
would be an R function for it, but I can't for the life of me find
anything. The recent paper by Jonathan Drury et al. linked below does
mention doing it, but without going into any detail.

http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.2003563

Thank you very much for any help,


*--*
*Rafael Sobral Marcondes*
PhD Candidate (Systematics, Ecology and Evolution/Ornithology)

Museum of Natural Science <http://sites01.lsu.edu/wp/mns/>
Louisiana State University
119 Foster Hall
Baton Rouge, LA 70803, USA

Twitter: @brown_birds <https://twitter.com/brown_birds>

[[alternative HTML version deleted]]

_______________________________________________
R-sig-phylo mailing list - R-sig-***@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-***@r-project.org/
Liam J. Revell
2018-03-11 22:16:05 UTC
Hi Rafael.

So far as I know, there is currently no way to explicitly take into
account sampling error in computing principal components while also
accounting for the phylogeny. However, it is relatively straightforward
to compute scores for individuals from a PCA conducted on species means.

This would look as follows (in which Xm is a matrix containing values
for species for each trait, and Xi is a matrix with the same number of
columns but containing values for individuals):

pca<-phyl.pca(tree,Xm)
Si<-Xi%*%pca$Evec

Then, if you have a separate vector containing species ID as a factor,
you could compute means and variances for each component by species.
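
A minimal sketch of that last step, assuming Si already holds the individual scores (one row per individual, one column per PC). The toy numbers and the names species, sp_mean, sp_var, sp_n and sp_se are made up for illustration.

```r
# Toy scores for six individuals on two PCs (made-up numbers)
Si <- matrix(c(1.0, 1.2, 0.8, 3.1, 2.9, 3.0,
               0.5, 0.7, 0.6, -1.0, -1.2, -0.8),
             ncol = 2, dimnames = list(NULL, c("PC1", "PC2")))
# Species ID of each individual, as a factor
species <- factor(c("sp_a", "sp_a", "sp_a", "sp_b", "sp_b", "sp_b"))

# Mean and variance of each PC within each species (rows = species)
sp_mean <- apply(Si, 2, function(x) tapply(x, species, mean))
sp_var  <- apply(Si, 2, function(x) tapply(x, species, var))
# Number of individuals sampled per species
sp_n <- as.vector(table(species))

# Standard error of each species mean: sqrt(variance / n)
sp_se <- sqrt(sp_var / sp_n)
```

Species with a single individual would get NA for the variance and standard error, so those need separate handling.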

I hope this is of some help. All the best, Liam

Liam J. Revell, Associate Professor of Biology
University of Massachusetts Boston
& Profesor Asociado, Programa de Biología
Universidad del Rosario
web: http://faculty.umb.edu/liam.revell/
Joe Felsenstein
2018-03-12 13:18:36 UTC
Rafael and Liam --
There are methods (not all implemented in R) for taking into
account the within-species phenotypic covariation among
individuals and also the evolutionary covariances between
species (on a phylogeny). These include the method of
Ives, Midford, and Garland (2007) and my own method
(2008). The former assumes you know the within-species
covariance matrix, the latter estimates it from the sampled
individuals for each species. Both of these assume that the
true within-species phenotypic covariance matrices are the
same for all species.

For my method, you can use Liam's package Rphylip to
call my program Contrast if you also have PHYLIP installed.

Once you infer these covariance matrices, those are the
relevant sufficient statistics (if the distributions are
multivariate normal). The PCA axes for either covariance
matrix can then be computed from those, or the
canonical variates axes, which are the principal components
of the between-species covariation relative to the
within-species covariation.

You don't need to use the PCA machinery until after you
estimate these two covariance matrices.
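
To illustrate that last point, here is a rough sketch in which B and W stand for already-estimated between-species (evolutionary) and within-species covariance matrices. Both matrices are invented numbers, not output from Contrast or from the Ives et al. method, and the object names are assumptions for this example.

```r
# Made-up 2 x 2 covariance matrices for two traits (illustration only)
B <- matrix(c(4.0, 1.5,
              1.5, 2.0), 2, 2)   # between-species (evolutionary) covariances
W <- matrix(c(1.0, 0.2,
              0.2, 0.5), 2, 2)   # pooled within-species covariances

# PCA axes of either matrix are its eigenvectors
pc_between <- eigen(B, symmetric = TRUE)
pc_within  <- eigen(W, symmetric = TRUE)

# Canonical variates: principal axes of B relative to W,
# i.e. the eigenvectors of W^{-1} B
cva <- eigen(solve(W) %*% B)
cv_axes  <- Re(cva$vectors)
cv_ratio <- Re(cva$values)   # between/within variance ratio along each axis
```

No individual-level or species-level data enter this step; only the two estimated matrices do.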

Joe
----
Joe Felsenstein ***@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA
Rafael S Marcondes
2018-03-12 16:11:07 UTC
Hi Liam,

Thank you very much for the very fast response (as usual). To check whether I
was using your approach correctly, I ran it with the species-level data,
expecting that I would get the same PC scores as from phyl.pca. That didn't
happen, though. Am I misunderstanding or doing something wrong?

pca<-phyl.pca(tree,Xm)
Si<-Xm%*%pca$Evec
I expected that Si would be identical to pca$S, but they aren't.

I have attached my species-level data, individual-level data, and output of
phyl.pca

Thank you,



*--*
*Rafael Sobral Marcondes*
Graham Slater
2018-03-12 16:29:45 UTC
Hi Rafael,

You need to mean-center your traits before multiplying by the matrix of eigenvectors. Compute the vector of phylogenetic means (under BM or Pagel’s lambda), subtract each value from the relevant column of Xm and then compute Si. The result should be identical to the scores from your phylogenetic PCA.
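
A rough base-R sketch of why the centering matters, assuming Brownian motion and a pPCA on the covariance matrix. C is a hand-made stand-in for vcv(tree), Xm is invented data, and the formulas for the phylogenetic mean and the evolutionary covariance matrix are written out by hand rather than taken from phyl.pca, so treat the correspondence with phytools as an assumption.

```r
# Hypothetical phylogenetic covariance matrix for 4 species
# (stand-in for vcv(tree))
C <- matrix(c(2, 1, 0, 0,
              1, 2, 0, 0,
              0, 0, 2, 1,
              0, 0, 1, 2), 4, 4)
# Invented species means for two traits
Xm <- matrix(c(1.0, 1.4, 3.0, 3.5,
               2.0, 2.1, 0.5, 0.9), 4, 2)

invC <- solve(C)
# Phylogenetic (GLS) mean of each trait
a <- colSums(invC %*% Xm) / sum(invC)
# Evolutionary covariance matrix among traits and its eigenvectors
Xc <- sweep(Xm, 2, a)
R <- t(Xc) %*% invC %*% Xc / (nrow(Xm) - 1)
Evec <- eigen(R, symmetric = TRUE)$vectors

S_centered   <- Xc %*% Evec   # scores after subtracting the phylogenetic means
S_uncentered <- Xm %*% Evec   # what you get if the centering step is skipped

# The two differ by the same constant offset in every row
offset <- S_uncentered - S_centered
```

Every row of offset equals a %*% Evec, which is why the uncentered scores are shifted relative to the pPCA scores rather than rescaled or rotated.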

Graham

------------------------------------------------------
Graham J. Slater
Assistant Professor
Department of the Geophysical Sciences
University of Chicago
5734 S. Ellis Avenue
Chicago, IL 60637 USA

Tel: (773) 702-0249
email: ***@uchicago.edu<mailto:***@uchicago.edu>
www.fourdimensionalbiology.com<http://www.fourdimensionalbiology.com>
Liam J. Revell
2018-03-12 17:07:02 UTC
Graham's right of course. Sorry about that.

You might do something like:

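C<-vcv(tree)[rownames(Xm),rownames(Xm)]  # match tip order to the rows of Xm
a<-phyl.vcv(Xm,C,1)$alpha[,1]            # phylogenetic means (lambda=1, i.e. BM)
Si<-sweep(Xi,2,a)%*%pca$Evec             # center individuals, then rotate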

I also agree with Joe that you can take the phylogeny into account
whilst accounting for sampling error using his approach or that of Ives
et al. In either case you will obtain a covariance matrix among traits
the decomposition of which could be employed to compute scores for
individuals in the original space as Joe suggests. I'm not sure whether
or not it will make a difference. That probably depends on how much
uncertainty in the values of species means has been ignored.
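
In function form, the centering-and-rotation step might be sketched as below. project_onto_pcs is a hypothetical helper name, not a phytools function; center is assumed to be the vector of phylogenetic trait means and Evec the eigenvector matrix from the pPCA.

```r
# Hypothetical helper: center the columns of Xi on the given trait means,
# then rotate onto the PC axes
project_onto_pcs <- function(Xi, center, Evec) {
  Xi <- as.matrix(Xi)
  # Traits must line up with the means and with the rows of Evec
  stopifnot(ncol(Xi) == length(center), ncol(Xi) == nrow(Evec))
  sweep(Xi, 2, center) %*% Evec
}

# Toy check with an identity rotation: the scores are just the centered data
Xi <- matrix(c(1, 2, 3,
               4, 6, 8), ncol = 2)
Si <- project_onto_pcs(Xi, center = c(2, 6), Evec = diag(2))
```

With phytools objects this would presumably be called with center taken from phyl.vcv(...)$alpha[,1] and Evec = pca$Evec, using the same lambda as the pPCA (1 for Brownian motion, or the fitted value if the lambda method was used).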

All the best, Liam
Rafael S Marcondes
2018-03-12 18:05:03 UTC
Thank you very much to everyone who replied. R-sig-phylo is, as usual, a very
helpful and friendly community! I got everything to work now. Jonathan
Drury also replied off-list with a similar approach in function form.

Cheers,

Rafael