Authors:
M. Dale, Griffith University, School of Environment, Nathan, Queensland 4111, Australia
L. Allison, Monash University, Dept. Computer Science and Software Engineering, Clayton, Victoria, Australia
P. Dale, Griffith University, School of Environment, Nathan, Queensland 4111, Australia

Many methods of cluster analysis do not explicitly account for correlation between attributes. In this paper we model any such correlation using a single factor within each cluster: i.e., the correlation of attributes within each cluster is assumed to be adequately described by a single component axis. However, a factor is not required in every cluster. Using a minimum message length (MML) criterion, we can determine the number of clusters and also whether the model of any cluster is improved by introducing a factor. The technique allows us to seek clusters which reflect directional changes rather than imposing a zonation constrained by spatial (and implicitly temporal) position. Minimum message length is a means of applying Ockham's razor in inductive analysis. The 'best' model is the one that allows the greatest compression of the data, and hence gives the shortest message describing them. Fit to the data is not a sufficient criterion for choosing models because more complicated models will almost always fit better. Minimum message length combines fit to the data with an encoding of the model and provides a Bayesian probability criterion for choosing between models (and classes of model). Applying the analysis to a pollen diagram from southern Chile, we find that the introduction of factors does not improve the overall quality of the mixture model: the solution without an axis in any cluster is the most parsimonious. In the cluster with the best case for incorporating a factor, the attributes highly loaded on the axis represent a contrast between herbaceous vegetation and dominant forest types. This contrast is also found when fitting the entire population as a single group, and in that case the factor solution is the preferred model. Overall, the cluster solution without factors is much preferred. Thus, in this case classification is preferred to ordination, although more data are desirable to confirm such a conclusion.
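
The trade-off between fit and model cost described above can be illustrated with a minimal sketch. It is not the estimator used in the paper: it assumes a crude two-part code in which the model costs (k/2) ln n nats for k parameters (a BIC-style approximation), and it compares an uncorrelated two-attribute Gaussian with a fully correlated one as a stand-in for the no-factor versus single-factor choice within a cluster. The function names and toy data are hypothetical.

```python
import math

def gaussian_nll(n, log_det_cov, dim=2):
    """Negative log-likelihood (nats) of n points under a Gaussian
    whose mean and covariance are the maximum-likelihood estimates."""
    return 0.5 * n * (dim * math.log(2 * math.pi * math.e) + log_det_cov)

def message_lengths(points):
    """Return (independent, correlated) two-part lengths in nats.

    Assumption: the model part costs 0.5 * k * ln(n) for k parameters,
    a rough stand-in for a proper MML parameter encoding.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    vx = sum((x - mx) ** 2 for x, _ in points) / n
    vy = sum((y - my) ** 2 for _, y in points) / n
    cxy = sum((x - mx) * (y - my) for x, y in points) / n

    # Uncorrelated model: 2 means + 2 variances = 4 parameters.
    independent = gaussian_nll(n, math.log(vx) + math.log(vy)) + 0.5 * 4 * math.log(n)
    # Correlated model: one extra covariance term = 5 parameters.
    correlated = gaussian_nll(n, math.log(vx * vy - cxy ** 2)) + 0.5 * 5 * math.log(n)
    return independent, correlated

def prefers_correlation(points):
    """True when the correlated model gives the shorter total message."""
    independent, correlated = message_lengths(points)
    return correlated < independent

# Toy data: a 3 x 3 grid (zero correlation) and a noisy diagonal line.
grid = [(x, y) for x in (-1.0, 0.0, 1.0) for y in (-1.0, 0.0, 1.0)]
line = [(float(i), i + 0.1 * (-1) ** i) for i in range(20)]

print(prefers_correlation(grid))  # False: the extra parameter buys no better fit
print(prefers_correlation(line))  # True: the improved fit outweighs the extra cost
```

On the grid the two models fit equally well, so the extra parameter only lengthens the message; on the diagonal data the gain in fit far exceeds the cost of stating the correlation. The same logic, with a proper MML encoding, decides whether a cluster is given a factor.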


Community Ecology
Language: English
Size: A4
Year of Foundation: 2000
Volumes per Year: 1
Issues per Year: 3
Founder: Akadémiai Kiadó
Founder's Address: H-1117 Budapest, Hungary; 1516 Budapest, PO Box 245
Publisher: Akadémiai Kiadó; Springer Nature Switzerland AG
Publisher's Address: H-1117 Budapest, Hungary; 1516 Budapest, PO Box 245. CH-6330 Cham, Switzerland, Gewerbestrasse 11.
Responsible Publisher: Chief Executive Officer, Akadémiai Kiadó
ISSN 1585-8553 (Print)
ISSN 1588-2756 (Online)