Discussion:
[R-sig-phylo] R-sig-phylo Digest, Vol 121, Issue 2
Matthews, Luke
2018-02-09 15:05:35 UTC
This is a reply to the message copied below from Roland Sookias.

Hi Roland,
This depends slightly on the structure of the data. If you have crocodile tails with a set of discrete shape states in the same anatomical part, then I would recommend coding it as a multistate character. MrBayes provides pretty flexible coding for multistate characters, and I think it will handle up to at least 10 states in a single character. There might be an equivalent implementation in R now, but I haven't done this in a while. Does anyone else want to weigh in about an R implementation for multistate characters?

If you really have more than 10 states for a single character, I'd be concerned about whether multiple observers can consistently code those states anyway.

If instead you have truly inapplicable characters, for example crocodiles without tails and then ones with tails of various shapes, then I think the best approach is to have two characters: a presence/absence character and a shape character.
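
A minimal sketch of this two-character coding, in Python for illustration. The taxon names, the shape labels and the use of "-" as the inapplicable token are assumptions made for the example, not something from the thread:

```python
# Hypothetical observations; None means the structure is absent in that taxon.
observations = {
    "taxon_A": "curved",
    "taxon_B": "straight",
    "taxon_C": None,  # no tail, so its shape cannot be scored
    "taxon_D": "curved",
}


def code_two_characters(obs):
    """Return {taxon: (presence, shape)}, using '-' where shape is inapplicable."""
    # Number the observed shapes in alphabetical order: '0', '1', ...
    shapes = sorted({s for s in obs.values() if s is not None})
    shape_code = {s: str(i) for i, s in enumerate(shapes)}
    coded = {}
    for taxon, shape in obs.items():
        if shape is None:
            coded[taxon] = ("0", "-")  # absent, shape inapplicable
        else:
            coded[taxon] = ("1", shape_code[shape])  # present, shape scored
    return coded


matrix = code_two_characters(observations)
for taxon, (presence, shape) in matrix.items():
    print(f"{taxon}\t{presence}{shape}")
```

How the "-" token is then treated (as missing data or as a genuinely inapplicable score) depends on the software used for the analysis.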

You can also binarize every state as a separate character, but as you say it introduces an implicit homology among qualitatively different absence states. I've never liked this approach, but there are some articles defending it. I think its validity may depend on a relatively even frequency distribution of each state. Also, if you have other anatomical parts in the analysis, then you are effectively weighting more heavily those parts that you split into more binary characters, so they will tend to drive the result. Some software allows you to explicitly correct character weights, which would solve this, but not all software provides that. Note that this binarizing by state is a common approach in the phylogeny of language cognates, but I believe that has come about because of a tail-wagging-the-dog problem: BEAST, unless they changed it recently, didn't support multistate character evolution, and the language phylo people use BEAST, not MrBayes.
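
A minimal sketch of binarizing by state, in Python for illustration. The taxon names and scores are hypothetical, and the 1/k weight is only one simple assumption for making the k binary columns together carry the nominal weight of a single character; other weighting schemes are possible:

```python
from fractions import Fraction


def binarize(character):
    """Split one multistate character into one binary column per state.

    Returns (columns, weight): columns maps each state to {taxon: 0 or 1},
    and weight is 1/k for k states (an illustrative choice, see above).
    """
    states = sorted(set(character.values()))
    columns = {
        state: {taxon: int(value == state) for taxon, value in character.items()}
        for state in states
    }
    weight = Fraction(1, len(states))
    return columns, weight


# Hypothetical three-state shape character scored for four taxa.
shape = {"taxon_A": 0, "taxon_B": 1, "taxon_C": 2, "taxon_D": 1}
columns, weight = binarize(shape)

for state, column in columns.items():
    print(f"state {state} (weight {weight}):", column)
```

In the column for state 0, taxon_B, taxon_C and taxon_D all share the score 0 even though their shapes differ, which is the implicit homology among absence states described above.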

There are a bunch of articles about this issue but I'd have to go dig them up. I think I cited them in some of my earlier papers that did anatomical and cultural phylogenetic work.

Luke


Message: 1
Date: Thu, 8 Feb 2018 17:27:54 +0100
From: Roland Sookias <***@gmail.com>
To: r-sig-***@r-project.org
Subject: [R-sig-phylo] Not inferring homology within "absence" state in phylogenetic analysis

Dear all

Maybe someone has some insight here...

I am coming up against a problem in phylogenetic analysis.
Basically I want to conduct a parsimony (or other phylogenetic) analysis where "inapplicable" scores are treated as separate states *for each taxon*.

I.e. I want to hypothesize shared ancestry for taxa scored with one state (let's say state 0), but not hypothesize shared ancestry for the other taxa. However, I still want to penalize a change in state from and to state 0.

There are three approaches which I have thought about, but none seems to fit the bill:

- Score all taxa not showing state 0 as separate states. This should do what I want, but the problem here is the limit on the number of states in most programmes.

- Score binary presence/absence. The problem here is that it could end up being parsimonious to group the "absence" state together, when there is no reason to infer homology within (i.e. for taxa scored with) this state.

- Score as 0 and inapplicable. The problem is that this does not penalize a change from 0 to inapplicable.
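
A minimal sketch for comparing how the first two codings score under unordered (Fitch) parsimony, in Python for illustration. The five taxa, the two rooted trees and the state labels are hypothetical; taxa A and B show state 0 and C, D and E lack it:

```python
def fitch(tree, states):
    """Fitch down-pass on a rooted binary tree given as nested tuples.

    Returns (state_set, steps) for one unordered character.
    """
    if isinstance(tree, str):  # a leaf is a taxon name
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch(left, states)
    right_set, right_steps = fitch(right, states)
    common = left_set & right_set
    if common:  # children agree on at least one state: no extra step
        return common, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def fitch_length(tree, states):
    """Number of steps the character requires on the tree."""
    return fitch(tree, states)[1]


# Coding 1: every taxon lacking state 0 gets its own unique state.
unique_coding = {"A": "0", "B": "0", "C": "c", "D": "d", "E": "e"}
# Coding 2: binary, with one shared state for all taxa lacking state 0.
binary_coding = {"A": "0", "B": "0", "C": "1", "D": "1", "E": "1"}

# Two hypothetical trees: A and B as sisters, or A and B placed apart.
tree_ab_sister = ((("A", "B"), "C"), ("D", "E"))
tree_ab_apart = ((("A", "C"), "D"), ("B", "E"))

for name, coding in [("unique", unique_coding), ("binary", binary_coding)]:
    sister = fitch_length(tree_ab_sister, coding)
    apart = fitch_length(tree_ab_apart, coding)
    print(f"{name}: A+B sister = {sister} steps, A and B apart = {apart} steps")
```

On these two toy trees the binary coding needs 1 step when A and B are sisters and 2 when they are apart, while the unique-state coding needs 3 steps on both, because each unique state can be explained as a single change on a terminal branch. It may be worth running a check like this on your own matrix to see whether a given coding carries the signal you want.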

A real-life example is the shape of the ilium in crocodiles. I want to say that it is likely that a particular curve in the dorsal margin of the ilium in crocodile-line crocodilians is homologous, but I don't want to hypothesize homology of those taxa "lacking" this state. They are all equally far from each other.

Thanks very much indeed

Best

Roland Sookias


_______________________________________________
R-sig-phylo mailing list - R-sig-***@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-***@r-project.org/
Eduardo Ascarrunz
2018-02-09 15:31:07 UTC
Hi Luke,

This is going a bit off-topic, but BEAST2 can definitely work with
multistate characters. I think you need to manually edit the XML file,
though.

Cheers,

Eduardo
Martin Smith
2018-02-12 10:32:47 UTC
Roland, I'm not quite sure that I follow exactly what you are proposing,
but you may be interested in our recent preprint on handling inapplicable
data:
https://www.biorxiv.org/content/early/2017/10/26/209775

This solution to inapplicable data is implemented in the R package
TreeSearch. I'm still working on making this package easier to use so
would be interested to hear whether you find it useful.

Cheers,

Martin


--

*Martin R. Smith*
Assistant Professor in Palaeontology
Department of Earth Sciences
Durham University
Mountjoy Site, South Road
Durham DH1 3LE

*T*: +44 191 334 2320
*M*: +44 774 353 7510
*E*: ***@durham.ac.uk
*Skype*: martin--smith

durham.ac.uk/earth.sciences/staff/academic/?id=14260
twitter.com/PalaeoSmith
