Matthews, Luke
2018-02-09 15:05:35 UTC
This is a reply to the message copied below from Roland Sookias
Hi Roland,
This depends slightly on the structure of the data. If you have crocodile tails with a set of discrete shape states in the same anatomical part, then I would recommend coding it as a multistate character. MrBayes provides pretty flexible coding for multistate characters, and I think it will handle up to at least 10 states in a single character. There might be an equivalent implementation in R now, but I haven't done this in a while. Anyone else want to weigh in about an R implementation for multistate characters?
If you really have more than 10 states for a single character I'd be concerned about whether multiple observers can consistently code those states anyway.
If instead you have truly inapplicable characters, for example crocodiles without tails, and then ones with tails of various shapes, then I think the best approach is to have two characters: a presence/absence character and a shape character.
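A minimal sketch of that two-character coding, in Python purely for illustration. The taxon names, the shape labels, and the helper `code_two_characters` are all hypothetical, and using "-" for an inapplicable score is an assumption about the target matrix format, not something any particular program requires.

```python
# Hypothetical observations: taxon -> tail shape, or None when there is no tail.
observations = {
    "taxon_A": "straight",
    "taxon_B": "curved",
    "taxon_C": None,
    "taxon_D": "curved",
}


def code_two_characters(obs):
    """Return {taxon: (presence, shape)}; shape is '-' when inapplicable."""
    # Number the observed shapes in alphabetical order: curved=0, straight=1, ...
    shapes = sorted({s for s in obs.values() if s is not None})
    shape_state = {s: str(i) for i, s in enumerate(shapes)}

    coded = {}
    for taxon, shape in obs.items():
        if shape is None:
            coded[taxon] = ("0", "-")  # tail absent, shape inapplicable
        else:
            coded[taxon] = ("1", shape_state[shape])  # tail present, shape scored
    return coded


matrix = code_two_characters(observations)
for taxon, (presence, shape) in matrix.items():
    print(f"{taxon:10s} {presence}{shape}")
```

The presence/absence column carries the penalty for gaining or losing the part, while the shape column is only scored where the part exists.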
You can also binarize every state as a separate character, but as you say it introduces an implicit homology among qualitatively different absence states. I've never liked this approach, but there are some articles defending it. I think its validity may depend on a relatively even frequency distribution of each state. Also, if you have other anatomical parts in the analysis, then you are effectively weighting more heavily those parts that you split into more binary characters - they will tend to drive the result. Some software allows you to explicitly correct character weights, which would solve this, but not all software provides that. Note - this binarizing by state is a common approach in phylogeny of language cognates - but I believe that has come about because of a tail-wagging-the-dog problem. BEAST, unless they changed it recently, didn't support multistate character evolution, and the language phylo people use BEAST, not MrBayes.
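To make the weighting point concrete, here is a rough Python sketch of binarizing one multistate character and down-weighting the resulting columns. The function `binarize` and the example data are hypothetical, and the 1/k weight is just one simple choice (so that the k columns from one part sum to a weight of one), not a rule taken from any particular program.

```python
def binarize(states):
    """Split one multistate character into one binary character per state.

    states: {taxon: state label}.
    Returns (columns, weight): columns is {state: {taxon: 0 or 1}} and
    weight is 1 / (number of states), an assumed per-column weight.
    """
    labels = sorted(set(states.values()))
    columns = {
        label: {taxon: int(s == label) for taxon, s in states.items()}
        for label in labels
    }
    weight = 1.0 / len(labels)
    return columns, weight


# Hypothetical three-state shape character scored for four taxa.
shape = {"A": "straight", "B": "curved", "C": "hooked", "D": "curved"}
columns, weight = binarize(shape)

for label, column in columns.items():
    print(label, column, f"weight={weight:.3f}")
```

Without the weight, this one part would contribute three characters to the matrix where a part left as a single multistate character contributes one.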
There are a bunch of articles about this issue but I'd have to go dig them up. I think I cited them in some of my earlier papers that did anatomical and cultural phylogenetic work.
Luke
Message: 1
Date: Thu, 8 Feb 2018 17:27:54 +0100
From: Roland Sookias <***@gmail.com>
To: r-sig-***@r-project.org
Subject: [R-sig-phylo] Not inferring homology within "absence" state in phylogenetic analysis
Message-ID:
<CA+PBJbmF2-_VE=4HuXZ610kh_Dtm2i4hK7CZkcfza2yN=***@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Dear all
Maybe someone has some insight here...
I am coming up against a problem when it comes to phylogenetic analysis.
Basically I want to conduct a parsimony (or other phylogenetic) analysis where "inapplicable" scores are treated as separate states *for each taxon* .
I.e. I want to hypothesize shared ancestry for taxa scored with one state (let's say state 0), but not hypothesize shared ancestry for the other taxa. However, I still want to penalize a change in state from and to state 0.
There are three approaches which I have thought about, but none seems to fit the bill:
-Score all taxa not showing state 0 as separate states. This should do what I want, but the problem here is the limit on the number of states in most programmes.
-Score binary presence/absence. The problem here is that it could end up being parsimonious to group the "absence" state together, when there is no reason to infer homology within (i.e. for taxa scored with) this state.
-Score as 0 and inapplicable. The problem is this does not penalize a change from 0 to inapplicable.
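The three codings listed above can be compared on a toy case. The sketch below, in Python purely for illustration, counts Fitch (unordered) parsimony steps for one character on two hypothetical four-taxon rooted trees. Taxa A and B show state 0; C and D lack it. Treating an inapplicable score as missing data ("?") is an assumption about how a program would handle it, and `fitch_length` is a hypothetical helper, not a function from any package.

```python
def fitch_length(tree, coding):
    """Fitch parsimony length of one unordered character.

    tree: nested 2-tuples of taxon names (a rooted binary tree).
    coding: {taxon: state}; "?" is treated as missing (fits any state).
    """
    all_states = {s for s in coding.values() if s != "?"}

    def visit(node):
        if isinstance(node, str):  # leaf
            state = coding[node]
            return (set(all_states) if state == "?" else {state}), 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        shared = left_set & right_set
        if shared:  # children agree: no extra step
            return shared, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


codings = {
    "unique states": {"A": "0", "B": "0", "C": "1", "D": "2"},
    "binary": {"A": "0", "B": "0", "C": "1", "D": "1"},
    "0 and missing": {"A": "0", "B": "0", "C": "?", "D": "?"},
}
groups_absent = ("A", ("B", ("C", "D")))  # C and D are sister taxa
splits_absent = ("A", ("C", ("B", "D")))  # C and D are separated

lengths = {
    name: (fitch_length(groups_absent, c), fitch_length(splits_absent, c))
    for name, c in codings.items()
}
for name, pair in lengths.items():
    print(name, pair)
```

On this toy example the binary coding is shorter on the tree that groups the two "absent" taxa (1 step versus 2), the unique-state coding gives 2 steps on both trees, and the missing-data coding gives 0 steps on both.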
A real-life example is the shape of the ilium in crocodiles. I want to say that it is likely that a particular curve in the dorsal margin of the ilium in crocodile-line crocodilians is homologous, but I don't want to hypothesize homology of those taxa "lacking" this state. They are all equally far from each other.
Thanks very much indeed
Best
Roland Sookias
_______________________________________________
R-sig-phylo mailing list - R-sig-***@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-***@r-project.org/