In this paper, we examine the application of a particular approach to induction, the minimum message length (MML) principle, and illustrate some of the problems that can be addressed through its use. The MML principle seeks to identify an optimal model within some specified parameterised class of models; in this paper we concentrate on a single model class, that of mixture separation or fuzzy clustering. The first section presents, in outline, an MML methodology for fuzzy clustering. We then present some applications, including the nature of the within-cluster model, an examination of the univocality of results for different groups of species, and the effectiveness of presence data compared with purely quantitative data. Finally, we examine some possibilities for extending MML methodology to include within-class correlation of species, dependence between observed samples, and the comparison of different classes of models.
Chatfield, C. 1995. Model uncertainty, data mining and statistical inference. J. Royal Statistical Soc. Series A 158: 419-466.
Arabie, P. and J. D. Carroll. 1980. MAPCLUS: a mathematical programming approach to fitting the ADCLUS model. Psychometrika 45: 211-235.
Akaike, H. 1978. A Bayesian analysis of the minimum AIC procedure. Annals Inst. Statist. Mathematics 30: 9-14.
Boulton, D. M. and C. S. Wallace. 1970. A program for numerical classification. Comput. J. 13: 63-69.
Boulton, D. M. and C. S. Wallace. 1973. An information measure for hierarchic classification. Comput. J. 16: 254-261.
Boulton, D. M. and C. S. Wallace. 1975. An information measure for single-link classification. Comput. J. 18: 236-238.
Austin, M. P. 1970. An applied ecological example of mixed data classification. In: R. S. Anderssen and M. R. Osborne (eds.), Data Representation. Univ. Queensland Press, Brisbane, pp. 113-117.
Babad, Y. M. and J. A. Hoffer. 1984. Even no data has value. Commun. Assoc. Comput. Mach. 27: 748-756.
Bradfield, G. E. and N. C. Kenkel. 1987. Nonlinear ordination using flexible shortest path adjustment of ecological distance. Ecology 68: 750-753.
Carley, K. and M. Palmquist. 1992. Extracting, representing and analyzing mental models. Social Forces 70: 601-636.
Chaitin, G. J. 1966. On the length of programs for computing finite sequences. J. Assoc. Comput. Mach. 13: 547-549.
Viswanathan, M., C. S. Wallace, D. L. Dowe and K. B. Korb. 1999. Finding cutpoints in noisy binary sequences: a revised empirical examination. In: N. Foo (ed.), AI-99 Lecture Notes in Artificial Intelligence 1747, Springer-Verlag, Berlin, pp. 405-416.
Wallace, C. S. 1990. Classification by minimum message length inference. In: G. Goos and J. Hartmanis (eds.), Advances in Computing and Information - ICCI'90, Springer-Verlag, Berlin, pp. 72-81.
Wallace, C. S. 1995. Multiple factor analysis by MML estimation. Tech. Rep. 95/218, Dept Computer Science, Monash University, Clayton, Victoria 3168, Australia. 21 pp.
Wallace, C. S. 1998. Intrinsic classification of spatially correlated data. Comput. J. 41: 602-611.
Wallace, C. S. and D. L. Dowe. 2000. MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions. Statistics and Computing 10: 73-83.
Wallace, C. S. and P. R. Freeman. 1987. Estimation and inference by compact coding. J. Roy. Statist. Soc. Ser. B 49: 240-252.
Wallace, C. S. and P. R. Freeman. 1992. Single factor analysis by minimum message length estimation. J. Roy. Statist. Soc. Ser. B 54: 195-209.
Watanabe, S. 1969. Knowing and Guessing. Wiley, New York.
Williams, W. T. and M. B. Dale. 1962. Partitioned correlation matrices for heterogenous quantitative data. Nature 196: 502.
Williams, W. T., G. N. Lance, L. J. Webb, J. G. Tracey and J. H. Connell. 1969. Studies in the numerical analysis of complex rainforest communities IV. A method for the elucidation of small scale pattern. J. Ecol. 57: 635-654.
Yarranton, G. A., W. J. Beasleigh, R. G. Morrison and M. I. Shafti. 1972. On the classification of phytosociological data into nonexclusive groups with a conjecture about determining the optimum number of groups in a classification. Vegetatio 24: 1-12.
Bezdek, J. C. 1974. Numerical taxonomy with fuzzy sets. J. Math. Biol. 1: 57-71.
Boerlijst, M. C. and P. Hogeweg. 1991. Spiral wave structure in prebiotic evolution: hypercycles stable against parasites. Physica D 48: 17-28.
Ganesalingam, S. and G. J. McLachlan. 1980. A comparison of the mixture and classification approaches to cluster analysis. Commun. Statist. Theor. Meth. A9: 923-933.
Goodall, D. W. and E. Feoli. 1988. Application of probabilistic methods in the analysis of phytosociological data. Coenoses 1: 1-10.
Gordon, A. D. 1994. Identifying genuine clusters in a classification. Comput. Statist. Data Analysis 18: 561-581.
Hayes, A. F. 1996. Permutation test is not distribution free. Psychol. Methods 1: 184-198.
Edwards, R. T. and D. Dowe. 1998. Single factor analysis in MML mixture modelling. Lecture Notes in Artificial Intelligence 1394, Springer-Verlag, pp. 96-109.
Dale, M. B. 2000b. Mt Glorious revisited: secondary succession in subtropical rainforest. Community Ecol. 1: 181-193.
Hill, M. O., R. G. H. Bunce and M. W. Shaw. 1975. Indicator species analysis: a divisive polythetic method of classification and its application to a survey of native pinewoods in Scotland. J. Ecol. 63: 597-613.
Hoffman, R. L. and A. K. Jain. 1987. Sparse decomposition for exploratory pattern analysis. IEEE Trans. Patt. Anal. Mach. Intell. PAMI-9: 551-560.
Hubert, L. and P. Arabie. 1994. The analysis of proximity matrices through sums of matrices having (anti-)Robinson forms. Brit. J. Math. Statist. Psychol. 47: 1-40.
Kolmogorov, A. N. 1965. Three approaches to the quantitative description of information. Prob. Inform. Transmission 1: 4-7 (translation).
Li, C. and G. Biswas. 1999. Temporal pattern generation using hidden Markov model-based unsupervised classification. In: Advances in Intelligent Data Analysis, Lecture Notes in Computer Science 1642, Springer-Verlag, Berlin, pp. 245-256.
Li, C. and G. Biswas. 2000. Bayesian temporal data clustering using hidden Markov model representation. In: P. Langley (ed.), Proceedings of the Seventeenth International Conference on Machine Learning, Morgan Kaufmann, San Francisco, CA. pp. 543-550.
Dale, M. B. 2001. Minimum message length clustering, environmental heterogeneity and the variable Poisson model. Community Ecol. 2: 171-180.
Dale, M. B. 2000a. On plexus representation of dissimilarities. Community Ecol. 1: 43-56.
Dale, M. B. and P. Hogeweg. 1998. The dynamics of diversity: a cellular automaton approach. Coenoses 13: 3-15.
Edgoose, T. and L. Allison. 1999. MML Markov classification of sequential data. Statistics and Computing 9: 269-278.
Dale, M. B. 1994. Straightening the horseshoe: a Riemannian resolution? Coenoses 9: 43-53.
Dale, M. B. 1999. The dynamics of diversity: mixed strategy systems. Coenoses 13: 105-113.
Dale, M. B. 1987. Knowing when to stop: cluster concept-concept cluster. Coenoses 3: 11-32.
Dale, M. B. 1988. Some fuzzy approaches to phytosociology: ideals and instances. Folia Geobot. Phytotax. 23: 239-274.
Dale, M. B. (submitted) Models, measures and messages: a role for induction.
Savill, N. J., P. Rohani and P. Hogeweg. 1997. Self-reinforcing spatial patterns enslave evolution in a host-parasitoid system. J. Theoret. Biol. 188: 11-20.
Shipley, B. and P. A. Keddy. 1987. The individualistic and community-unit concepts as falsifiable hypotheses. Vegetatio 69: 47-55.
Stevens, W. L. 1937. Significance of grouping. Ann. Eugen. London 8: 57-69.
Van der Maarel, E. 1990. Ecotones and ecoclines are different. J. Veg. Sci. 1: 135-138.
Liu, R. Y., J. M. Parelius and K. Singh. 1999. Multivariate analysis by data depth: descriptive statistics (with discussion). Ann. Statist. 27: 783-885.
Lux, A. 2000. Die Dynamik der Kraut-Gras-Schicht in einem Mittel-und Niederwaldsystem. Untersuchungen im Gebiet des Kehrenbergs bei Bad Windsheim. Dissertationes Botanicae Vol. 333.
Lux, A. and F. A. Bemmerlein-Lux. 1998. Two vegetation maps of the same island: floristic units versus structural units. Appl. Veg. Sci. 1: 201-210.
Oliver, J. J. and C. S. Forbes. 1997. Bayesian approaches to segmenting a simple time series. Tech. Rep. 97/336, Dept. Comput. Sci. Software Engineering, Monash University, Clayton, Victoria 3168, Australia.
Pillar, V. D. 1996. A randomization-based solution for vegetation classification and homogeneity testing. Coenoses 11: 29-36.
Richardson, S. and P. J. Green. 1997. On Bayesian analysis of mixtures with an unknown number of components. J. Roy. Statist. Soc. B 59: 731-792.
Rissanen, J. 1983. A universal prior for integers and estimation by minimum description length. Annals of Statistics 11: 416-431.
Rissanen, J. 1995. Stochastic complexity in learning. In: P. Vitányi (ed.), Computational Learning Theory, Lecture Notes in Computer Science 904, Springer-Verlag, Berlin, pp. 196-201.
Robinson, P. A. 1954. The distribution of plant populations. Ann. Bot. 18: 35-45.
Sandland, R. L. and P. C. Young. 1979. Probabilistic tests and stopping rules associated with hierarchical classification techniques. Aust. J. Ecol. 4: 399-406.
Krishna-Iyer, P. V. 1949. The first and second moments of some probability distributions arising from points on a lattice and their application. Biometrika 36: 135-141.
Legendre, P. and E. D. Gallagher. 2001. Ecologically meaningful transformations for ordination of species data. Oecologia 129: 271-280.
Boik, R. J. 1987. The Fisher-Pitman permutation test: a non-robust alternative to the normal theory F-test when variances are heterogeneous. Brit. J. Math. Statist. Psychol. 40: 26-42.