Scores and clusters of Hungarian universities

We present a framework based on application preference lists for Hungarian universities, which allows different types of flexible aggregation and, hence, the analysis and clustering of application data. A novel mathematical method is developed by which preference lists can be converted into scored rankings. The proposed approach is demonstrated in the case of Hungary, covering the period 2006-2015. Our method reveals that the efforts to reduce the geographical center-periphery differences did not fulfil the expectations of policy makers. It also turns out that a student's top preference is very difficult to influence, while recruiters may build their strategy on the information of the second choice.


INTRODUCTION
University rankings attract the attention of students, parents, educators, Higher Education (HE) managers and politicians. Since rankings provide crucial information, ranking methodologies are of the utmost importance. Given that the vast literature on university ranking also contains excellent reviews and critical analyses (see e.g. Pittaway -Cope 2007), we do not evaluate these methodologies here. Nor do we wish to promote our approach as a ranking method or claim that it is better than other methods. Our aim is to present a framework that allows different types of flexible aggregation and, hence, the analysis and clustering of application data. A novel mathematical method is developed by which preference lists can be converted into scored rankings. Our method is based on the aggregation of students' preference lists. It is free of ad hoc subjective weights and of the sometimes biased data provided by the institutions. We shall see that the choice of the input data and the aggregation method applied to the preference lists provide a very flexible foundation for further analysis in different directions.
When 18-year-old students submit their applications to one or more higher education institutions, this is the result of a very complex evaluation process in which many aspects are taken into consideration, including family traditions, financial background, and the students' position in their social network and subculture. Other important factors are the students' academic capabilities and motivations, as well as exogenous factors such as distance, the prestige and quality of the institutions, tuition fees, living costs, expected salaries after graduation and so on (Avery et al. 2013; Hilmer -Hilmer 2012). All these factors are projected onto a single virtual cost-benefit (utility) scale, on which the universities receive their relative positions. Of course, an individual choice does not reflect the excellence of a university, but the aggregation of the preference lists of all applicants reflects very well the students' perceptions of the benefits of choosing one institution or another. The usual ranking methods and our earlier work on preference lists (Telcs et al. 2016) cannot capture the similarity (or dissimilarity) of the universities. Thus, we present a method that turns preference lists into preference scores and quantifies the distance between the elements of the list.
As a case study, we consider the Hungarian HE application system and its data. In Hungary, students can apply for admission to more than one major at the same or at different universities, indicating a strict preference order among them. A student is admitted to the first university on his/her list for which he/she meets the requirements. The ranking of the universities on an application form reflects a student's preferences: in Telcs et al. (2016), the authors consider the list of applications as the results of paired comparisons of the institutions. If institution "A" precedes institution "B" on the list of a certain student, then "A" is more adequate for him/her than "B". If neither "A" nor "B" is included on the list of a certain student, they are considered equally inadequate. As the number of applicants is high every year, we have many people's opinions, which can be aggregated into a ranking of institutions. Telcs et al. (2016) proposed a method to aggregate application preferences in order to obtain faculty or institutional preferences, and compared this method with other ranking possibilities. The advantage of the method is that it does not use self-imposed weights; consequently, it reflects the unobscured preferences of all applicants. Its disadvantage is that it provides ranks and not scores. Nevertheless, the method permits clustering: similarly to rank order clustering methods (e.g. Li et al. 2007), the preference orders can be clustered.
The lack of scores prompted us to elaborate a new method for ordering. The method assigns random variables to the institutions and estimates their expectations by the maximum likelihood method. It improves Thurstone's method (Mosteller 1951; Thurstone 1994) by allowing more than two options in the results of comparisons. The proposed method makes it possible to calculate institutional score values. Scores are more sensitive indicators than ranks and allow score shifts to be detected at an early stage. In addition, these values provide a refined basis for clustering and evaluation, and they permit us to test whether the differences between institutions are significant.
The application-based preference orders and the score values have the potential to characterise and cluster institutions, regions, professional areas and so on (Kosztyán et al. 2020). We shall demonstrate how the proposed methods reveal the potential and attractiveness of institutions, and how they can be used to group universities by application preferences and applicants' performance. The proposed preference scoring and its clustering methods use only objective data such as application orders and admission points. In addition, changes in the (institutional/professional) preferences can be analysed with our methods. This paper is structured as follows. After the introduction, Section 2 summarises previous analyses of students' motivation and the factors influencing their application to higher education. Section 3 describes a possible solution for handling preference orders. Section 4 discusses in detail the novel mathematical method, which converts preference lists into scored rankings, and Section 5 demonstrates this method in the case of Hungary. Finally, Sections 6 and 7 present conclusions and possible directions for future research.

RELATED WORKS
Here we collect works that perform in-depth, country-level analyses of students' motivation, of the factors influencing their decisions and, in some cases, of the measurable implications of policy changes for them (Perna -Titus 2004). Our study points out that while institutional preferences are changing due to changes in the system of financial support, professional area preferences are very robust.
As mentioned before, many factors can influence a student's application to higher education. Participating in higher education is very costly for students; thus, one of the main factors in a student's decision whether to participate in higher education at all is whether his/her socioeconomic status permits it. Several studies focus on students' financial circumstances (e.g. Cabrera -La Nasa 2001; Dearden et al. 2014; English -Umbach 2016; Migin et al. 2015; Paulsen -John 2002), and some of them discuss ethnic differences as well (e.g. Hurtado et al. 1997; Hilmer -Hilmer 2012; Niu 2015; Perna 2000).
Student aid and loans are widely used tools to make higher education more accessible for students from low-income families. Several papers examine how these tools affect students' choices (e.g. Dearden et al. 2014; Paulsen -John 2002; Perna -Titus 2004; Migin et al. 2015). Furthermore, tuition fees also have a great impact on decisions concerning enrolment in higher education (e.g. Dearden et al. 2014; Paulsen -John 2002). There is evidence that, when choosing a college, American students also take into account job opportunities, the cost of living, expected earnings and the standard of living (e.g. Hilmer -Hilmer 2012; Montmarquette et al. 2002). Family has an even greater influence on the choice of university. Several studies pointed out that parental education has a positive effect on the prospects of graduating from higher education (e.g. Niu 2015; Paulsen -John 2002). The proximity of the place of study is known as a key decision factor and has been the topic of several studies (e.g. Hilmer -Hilmer 2012; Drewes -Michael 2006; Shamsuddin 2016). In the case of Hungary, Jancsó -Szalkai (2017) showed that the accessibility of universities by car and the travel time are important factors for students.
Another key factor with a considerable effect on students' choices is their performance. Students with good abilities (high admission points, good grades) tend to apply to "elite" universities (e.g. English -Umbach 2016; Paulsen -John 2002). Thus, the quality and reputation of the university, that is, the excellence of the HE institution, are also important factors for students (e.g. Bruno -Improta 2008; Drewes -Michael 2006; Horstschräer 2012; Telcs et al. 2015). Avery et al. (2013) surveyed high-achieving students and suggested a binary logit model for explaining their revealed preferences. However, their database contains only the students' first revealed preferences, while our study considers all applications: not only the first, but also the second, third, etc.
We put emphasis on the students' sensitivity to distance and econo-geographic factors, which, of course, depend on the population structure, the size and the cultural history of a given country. The method proposed in Section 4 is applicable regardless of the large variety of influencing factors.
We have thoroughly searched the literature for alternatives to our scaling method and found that, compared with previous attempts, our method is novel in assigning a scale to preference lists of HE institutions and in allowing us to test whether the differences between ranked institutions are significant.
While one can find numerous international rankings of HE institutions, preference studies at the international level are missing, for good reason: there is no central database where such aggregated information is accessible. This impedes a study of students' preferences and motivation at the European or even the global level, even though it would be of utmost interest given the increasing student mobility. It can be expected that the development of pan-European student mobility will support the foundation of a unified database, which would eventually enable such an analysis.
Most of the studies we found are based on surveys (e.g. Dearden et al. 2014; Hilmer -Hilmer 2012; Hurtado et al. 1997; Migin et al. 2015; Perna 2000), interviews (e.g. Pérez -McDonough 2008) or a (sub)sample of a database (e.g. Montmarquette et al. 2002). Although their samples were considerable, our research is based on the application data of the entire student cohort. Our Hungarian database contains more than 400,000 applications per year and more than 6 million records (applications) within the research period (2006-2015). The proposed preference scoring and clustering methods use only objective data such as application orders and admission points. Moreover, with these methods and the longitudinal data, changes in the (institutional/professional) preferences can be analysed.

PRELIMINARIES
First of all, we have to handle two major problems concerning the preference orders. The application preference orders are incomplete, because we do not know the positions of the institutions or majors that are not listed. A possible solution is proposed in Telcs et al. (2016), and we adopt it in our study as well.
Assumption 1: If an institution is not present on the student's application list, then it is less preferred by her/him than those on the list.
We also need to compare the unlisted elements with one another.
Assumption 2: The institutions which are not present on the student's application list are not distinguished by her/him.
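To illustrate Assumptions 1 and 2, the following Python sketch turns individual application lists into aggregated pairwise counts. It is a minimal sketch under our own simplifications, not the authors' implementation: it records only three outcomes per pair (preferred, not distinguished, less preferred) instead of the six categories of the evaluation model, a student who lists the same institution several times is counted at its first position, and the function names are hypothetical.

```python
from itertools import combinations


def pairwise_outcomes(app_list, institutions):
    """Yield (i, j, outcome) for every unordered pair of institutions.

    outcome = +1 if i is preferred, -1 if j is preferred,
    0 if the student does not distinguish them (Assumption 2).
    """
    position = {}
    for rank, inst in enumerate(app_list):
        position.setdefault(inst, rank)  # keep the first (best) position only
    for i, j in combinations(institutions, 2):
        pi, pj = position.get(i), position.get(j)
        if pi is None and pj is None:
            yield i, j, 0   # both unlisted: not distinguished (Assumption 2)
        elif pj is None or (pi is not None and pi < pj):
            yield i, j, 1   # only i listed (Assumption 1), or i listed earlier
        else:
            yield i, j, -1  # only j listed, or j listed earlier


def aggregate(lists, institutions):
    """counts[(i, j)] = (#i preferred, #not distinguished, #j preferred)."""
    counts = {}
    for app_list in lists:
        for i, j, outcome in pairwise_outcomes(app_list, institutions):
            n_i, n_tie, n_j = counts.get((i, j), (0, 0, 0))
            counts[(i, j)] = (n_i + (outcome == 1),
                              n_tie + (outcome == 0),
                              n_j + (outcome == -1))
    return counts
```

For example, with the institutions A, B, C, D and the two lists [A, B] and [B], the pair (A, B) receives one vote each way, while (C, D) is counted as undistinguished twice.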
Different kinds of distance functions are sensitive to different kinds of dissimilarities. For the sake of simplicity, in most cases we use the Euclidean distance.

The evaluation method
In order to introduce scores and ranks, we assume that the evaluations are based on a kind of utility model. The institutions have hidden utility values $m_i$, which are perceived by the applicants with Gaussian error: $\xi_i \sim N(m_i, 1/2)$, $i = 1, 2, \dots, n$. The ranks of the expectations give the ranks of the objects in our model, but they are hidden from us. We know the applicants' partial preference lists, which carry the pairwise rank differences. We assume that these pairwise rank differences are proxies for the applicants' hidden perceived differences $h_{i,j} = \xi_i - \xi_j$, and the sample concerns these differences. We suppose that the samples are independent and Gaussian distributed with equal unit variances, i.e. $h_{i,j} \sim N(m_i - m_j, 1)$, $i = 1, \dots, n-1$, $j = i+1, \dots, n$. We allow six categories for the samples, and these are quantized by a parameter $d > 0$. The parameters of our model are thus $m_1, \dots, m_n$ and $d$. Using the standard maximum likelihood estimation procedure, we can estimate the centers and $d$ (up to an irrelevant shift, given that we only observe proxies of the utility differences). Following Orbán-Mihálykó et al. (2019), we assume that the pairwise comparisons are independent, although the revealed partial ranks impose a dependence between them. As pointed out there, without this simplification the model is practically not computable; given the large number of samples, we expect the resulting bias not to influence our results much. Denoting the sample by $V$, the maximum likelihood estimates of the parameters are

$$(\hat m_1, \dots, \hat m_n, \hat d) = \arg\max_{(m_1, \dots, m_n, d)} L(V \mid m_1, \dots, m_n, d). \qquad (1)$$

We can use the theorem of Orbán-Mihálykó et al. (2019), which ensures (under mild conditions) that the maximum likelihood method can be applied and that the maximum in (1) exists and is unique. We have checked that the assumptions of the theorem are satisfied in our case.
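The estimation in (1) can be illustrated with a deliberately simplified Python sketch. It is an assumption-laden stand-in, not the authors' six-category model: it uses only three outcomes per pair (the Thurstone model with ties), so that with $h_{i,j} \sim N(m_i - m_j, 1)$ the outcome is "i preferred" if $h_{i,j} > d$, "j preferred" if $h_{i,j} < -d$, and undistinguished otherwise. It maximises the likelihood by plain gradient ascent with step halving (our choice of optimiser), pins $m_0 = 0$ to remove the irrelevant shift, and takes counts in the format of the hypothetical `aggregate` helper with integer institution indices.

```python
import math


def std_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def log_likelihood(m, d, counts):
    """Log-likelihood and gradient of a three-outcome Thurstone model.

    counts[(i, j)] = (i preferred, not distinguished, j preferred);
    h ~ N(m[i] - m[j], 1), 'i preferred' if h > d, 'j preferred' if h < -d.
    """
    ll, grad_m, grad_d = 0.0, [0.0] * len(m), 0.0
    for (i, j), (n_i, n_tie, n_j) in counts.items():
        delta = m[i] - m[j]
        p_i = max(std_cdf(delta - d), 1e-300)    # P(h > d)
        p_j = max(std_cdf(-delta - d), 1e-300)   # P(h < -d)
        p_t = max(1.0 - p_i - p_j, 1e-300)       # P(|h| <= d)
        f_i, f_j = std_pdf(delta - d), std_pdf(-delta - d)
        ll += n_i * math.log(p_i) + n_tie * math.log(p_t) + n_j * math.log(p_j)
        # partial derivatives of the summed log-probabilities
        g_delta = n_i * f_i / p_i - n_j * f_j / p_j + n_tie * (f_j - f_i) / p_t
        g_d = -n_i * f_i / p_i - n_j * f_j / p_j + n_tie * (f_i + f_j) / p_t
        grad_m[i] += g_delta
        grad_m[j] -= g_delta
        grad_d += g_d
    return ll, grad_m, grad_d


def fit_thurstone(n, counts, max_iter=5000, tol=1e-10):
    """Gradient ascent with step halving; m[0] is pinned to 0."""
    m, d = [0.0] * n, 0.5
    ll, grad_m, grad_d = log_likelihood(m, d, counts)
    step = 1e-3
    for _ in range(max_iter):
        grad_m[0] = 0.0  # only differences m[i] - m[j] are identified
        while step > 1e-15:
            m_new = [mi + step * g for mi, g in zip(m, grad_m)]
            d_new = d + step * grad_d
            if d_new > 0.0:  # the threshold must stay positive
                ll_new, gm_new, gd_new = log_likelihood(m_new, d_new, counts)
                if ll_new > ll:
                    break
            step *= 0.5
        else:
            break  # no improving step could be found
        gain = ll_new - ll
        m, d, ll, grad_m, grad_d = m_new, d_new, ll_new, gm_new, gd_new
        step *= 1.5  # be a little bolder after a successful step
        if gain < tol:
            break
    return m, d, ll
```

With, say, counts {(0, 1): (60, 30, 10), (1, 2): (55, 30, 15), (0, 2): (70, 20, 10)} (made-up numbers), the estimates come out ordered as $\hat m_0 = 0 > \hat m_1 > \hat m_2$ with $\hat d > 0$. A production implementation would more likely use a quasi-Newton optimiser, but the plain ascent keeps the sketch dependency-free.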

Testing equivalence
Our model of preference ordering makes it possible to judge ties in the ranking, which can be investigated by the likelihood ratio test. In Kosztyán et al. (2020), the authors used a different method to create scores for the institutions. In our analysis, we model the respondents as inconsistent, i.e. the pairwise comparisons are not necessarily consistent, while in Kosztyán et al. (2020) the responses are consistent, that is, $h_{i,j} + h_{j,k} = h_{i,k}$. Interestingly enough, the two models produce very similar results. Let us fix two institutions $i, j$ and let $m_i = 0$. Let our null hypothesis be

$$H_0: m_i = m_j \qquad (2)$$

and its alternative

$$H_1: m_i \neq m_j. \qquad (3)$$

Let $\Theta_i$ be the set of admissible parameters with $m_i = 0$, and let $\Theta_{ij} \subset \Theta_i$ be the subset satisfying $H_0$. The test statistic (see Wilks 1938) is

$$D_{i,j} = -2 \ln \frac{\max_{\Theta_{ij}} L(V \mid m_1, \dots, m_n, d)}{\max_{\Theta_i} L(V \mid m_1, \dots, m_n, d)}. \qquad (4)$$

If our model assumptions and $H_0$ hold, $D_{i,j}$ is asymptotically chi-square distributed with 1 degree of freedom. Consequently, if the test value in (4) is larger than the critical value of the chi-square distribution, we reject the equality of the expectations belonging to objects $i$ and $j$.
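Once the likelihood has been maximised twice (over $\Theta_i$ and over $\Theta_{ij}$), the test in (4) reduces to a few lines. The sketch below assumes that the two maximised log-likelihoods are already available; the function name and the example values are hypothetical. It uses the fact that a chi-square variable with one degree of freedom is the square of a standard normal, so $P(\chi^2_1 > D) = \operatorname{erfc}(\sqrt{D/2})$.

```python
import math

CHI2_1DF_95 = 3.841458820694124  # 95% quantile of chi-square with 1 degree of freedom


def likelihood_ratio_test(max_ll_full, max_ll_restricted, alpha=0.05):
    """Wilks test of H0: m_i = m_j from two maximised log-likelihoods.

    max_ll_full:       max of ln L over Theta_i  (only m_i = 0 imposed)
    max_ll_restricted: max of ln L over Theta_ij (m_i = m_j = 0 imposed)
    """
    D = 2.0 * (max_ll_full - max_ll_restricted)
    D = max(D, 0.0)  # guard against tiny negative values from the optimiser
    # For one degree of freedom: P(chi2 > D) = P(|Z| > sqrt(D)) = erfc(sqrt(D / 2))
    p_value = math.erfc(math.sqrt(D / 2.0))
    return D, p_value, p_value < alpha
```

For instance, hypothetical log-likelihoods of -100.0 and -102.5 give $D = 5.0$, which exceeds the 5% critical value of about 3.84, so equality of the two institutions would be rejected.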
For further analysis, we introduce an exponential transformation of the estimated expected values, which yields the score vector $w$, similar to the transformations used in discrete choice models and utility theory.

Similarities and dissimilarities: distance of the score vectors and clustering
Recall that the score vectors $w$ are in one-to-one correspondence with the vectors $\hat m$, which characterise the estimated positions of the objects. The estimates themselves are based on the submitted applications or on a subset of them. In this way, a score vector can be assigned to any aggregation of preference lists, be it a pool of lists coming from a geographical region, from a decile of students ranked by performance, or from another partition of the submitted applications. The score vectors of these groups (partitions of the whole submission), defined over the institutions, provide a spatial representation that allows the groups to be compared and clustered. We use the usual Euclidean distance between them, that is, for the score vectors $w^{(k)}$ and $w^{(l)}$ of two groups,

$$\lVert w^{(k)} - w^{(l)} \rVert = \sqrt{\sum_{i=1}^{n} \left( w^{(k)}_i - w^{(l)}_i \right)^2}.$$

The clustering is performed with the k-means method: the groups are arranged into clusters so that the sum of squared distances from the points to their assigned cluster centers is minimised. At the minimum, every cluster center equals the mean of its Voronoi set. We use the well-established algorithm of Hartigan -Wong (1979). It uses the previously determined scores, which are not recomputed during the clustering process; consequently, the sizes of the subregions do not affect the clustering. Henceforth, we use KM-PSC as the acronym of this preference score-based clustering algorithm.
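The pipeline from estimated positions to clusters can be sketched in Python as follows. Two assumptions are labelled explicitly: the exponential transformation is taken to be the softmax form $w_i = e^{\hat m_i} / \sum_k e^{\hat m_k}$ used in discrete choice models, since its exact form is not reproduced here; and, in place of the Hartigan -Wong algorithm, the sketch uses the simpler Lloyd iteration of k-means with a deterministic farthest-point initialisation. The function names are hypothetical.

```python
import math


def exp_scores(m_hat):
    """Assumed softmax-type transform: w_i = exp(m_i) / sum_k exp(m_k)."""
    top = max(m_hat)
    e = [math.exp(x - top) for x in m_hat]  # subtracting the max avoids overflow
    total = sum(e)
    return [x / total for x in e]


def euclid(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(euclid(p, c) for c in centers))
        centers.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # assignment step: every point goes to its nearest center
        new_labels = [min(range(k), key=lambda c: euclid(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # update step: every center moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

Each point here is the score vector of one group (e.g. a subregion) over all institutions, so groups whose applicants weight the institutions similarly end up in the same cluster.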

Data sources
The original data source is the totality of applications submitted to the Hungarian National Center of Higher Education -Education Nonprofit Ltd. This office collects and handles all HE applications for the entire country and for all universities.¹ Its database contains the annual applications. Our subset contains the application records for the period 2006-2015. Each record has 12 fields, as shown in List 1. Admission points are available for 2006 and 2010-2015; for 2007-2009, they were calculated from the raw student scores (results of final exams, marks, etc.). Each record refers to a single application, and one student may submit more than one; thus, several records may belong to the same applicant, and each record also contains the position of the application in the student's order. It is typical that a student applies for three places, but there are cases of more than ten applications to different HE institutions. Our database contains more than 400,000 records per annum from more than 100,000 applicants. The candidate is admitted to the first institution whose requirements he/she satisfies. The calculation of the admission points was very similar from year to year. These points are mainly based on the grades of the last two school years of secondary education and the results of the regular final exam (maturity exam). If the results of the advanced-level final exams in two chosen subjects are higher, they are taken into account instead of the last two years' grades. Additional points can be obtained if the applicant has a language proficiency exam or professional qualifications, or if he/she is medically disadvantaged.
While the calculation process of the admission points was similar year by year, the maximum admission points were different in 2006-2007, 2008-2011 and 2012-2015. In order to compare student performances in different years, we use the quantiles of admission points.
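Because the maximum attainable points differ between periods, comparisons rely on within-year quantiles. A minimal Python sketch of rank-based deciles, computed separately for each year, is shown below; the function names and point values are hypothetical, and tied scores may fall into adjacent deciles because ties are broken by input order.

```python
def deciles(points):
    """Rank-based decile (1 = lowest, 10 = highest) of each score within one cohort."""
    n = len(points)
    order = sorted(range(n), key=lambda k: points[k])  # stable: ties keep input order
    result = [0] * n
    for rank, k in enumerate(order):
        result[k] = rank * 10 // n + 1
    return result


def deciles_by_year(points_by_year):
    """Apply deciles() to each year separately, so different maxima do not matter."""
    return {year: deciles(pts) for year, pts in points_by_year.items()}
```

For example, two cohorts scored on different scales but with the same internal ordering receive identical deciles, which is what makes student performance comparable across 2006-2007, 2008-2011 and 2012-2015.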
Individual preference orders can be aggregated over any subset of applications, from which different characteristics of the institutions can be calculated. As a result, a very interesting cross-sectional view of the student-institution relationship can be analysed. In what follows, we demonstrate some of these numerous possibilities.

The impact of the tuition fee on the preferences of fields of studies
In Hungary, most BA, BSc, MA and MSc courses were entirely state financed before 2012. Since 2012, however, students in BA and BSc programmes in economic and business studies, as well as in most social science programmes, no longer receive state support. Some authors found that an increase in students' grants causes an increase in HE participation (e.g. Dearden et al. 2014). Thus, it was expected that the introduction of tuition fees would lower the interest in these fields of study.
To examine this hypothesis, we partition students into deciles according to their performance and then aggregate the chosen programmes into fields of study. Table 1 shows the fields of study (coded 1-13) for each performance decile (columns) and each year (rows), taking into consideration the first four listed preferences (places) of the BA and BSc applicants.
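The aggregation of programmes into fields of study can be sketched in Python as follows. This is a minimal sketch under our own assumptions: only the first four places of a list are used, and a field that appears several times keeps the position of its first (best) occurrence. The programme identifiers and the mapping `field_of` are hypothetical.

```python
def field_preferences(programmes, field_of, depth=4):
    """Collapse an ordered programme list into an ordered list of fields of study."""
    fields = []
    for programme in programmes[:depth]:  # only the first `depth` places count
        field = field_of[programme]
        if field not in fields:           # a field keeps its best (first) position
            fields.append(field)
    return fields
```

For instance, if two economics programmes (code 7) are followed by an engineering (code 8) and a social science (code 3) programme, the student's field preference list is [7, 8, 3]; a fifth application is ignored. These field lists can then be fed into the same pairwise aggregation and scoring as institutional lists.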
A first cursory look at Table 1 reveals the interesting fact that economic and business studies keep their top popularity. Engineering (code 8) is less preferred than economic sciences (code 7) in all deciles, while it has been definitely more preferred than social sciences (code 3) since 2012. It is interesting that the introduction of a tuition fee for economics/business programmes did not affect their popularity. This observation calls for further investigation.
Table 1. The changes of the preference order of professions (fields of study)
Besides calculating preference orders, the proposed preference scoring method allows us to follow the change in the difference of the preference scores between two professional areas. Figure 1 shows that the score difference between economic studies (code 7) and engineering (code 8) decreased over time (despite the increasing distance between their preference positions, see Table 1). This example demonstrates that when scores rather than orders are investigated, changes may be perceived at an earlier stage.

How are the students' performances reflected in the institutional preferences?
While changes in the preference orders of professions can be important for educational policy makers, admission-based institutional preference changes are important for HE institutions. A more detailed picture can be obtained if the students' preferences for institutions are aggregated within each decile. We determine the preference order in every decile and cluster the institutions, which yields four distinct clusters. By computing the average ranks of the institutions in each cluster over the available period, changes of preference can be followed. Interestingly, the preference for institutions as a function of students' performance follows four clearly separable patterns, as depicted in Figure 2.
The first group (CL_1) is formed by institutions whose preference orders are independent of the admission points. Most institutions in this group are countryside colleges, and the applications to them come mainly from their own subregions. The fact that most institutions fall into this group is in line with other researchers' findings (e.g. Alm -Winters 2009; Bruno -Improta 2008; Drewes -Michael 2006; Telcs et al. 2015), which showed that the most important factor in university applications is the distance between residence and university.
Many scholars have demonstrated that other main factors in applications are quality (Bruno -Improta 2008), reputation (Drewes -Michael 2006) and institutional excellence (hereafter IE²) (Horstschräer 2012; Telcs et al. 2015). This is in line with the performance-preference patterns of the other three groups. The second cluster (CL_2) contains the "elite" universities, which are preferred by students with higher admission points; all of them are located in the capital, Budapest. The third cluster (CL_3) includes several colleges and universities of mid-sized towns. In this group, the relative position of the institutions decreases as student performance increases. The fourth group (CL_4) is composed of various universities of the arts, which are preferred by students with only medium admission points. These universities and academies have different admission systems, based on individual talent rather than on performance in secondary school. A similar phenomenon can be detected by plotting the changes of the institutions' score values across the deciles. Summarising the results, we can conclude that the institutions can be classified by the performance of their applicants, and this classification coincides with another classification concerning the quality of the institutions.
The changes of the clusters can be important signs for the institutions. The fine details of preference changes might be particularly interesting for decision makers at the institutions (see the Hungarian institutional preference clusters in Table 2 in Appendix).

Leaving or staying?
A student's application list contains information about the subregion³ of the applicant's residence. All proposed methods allow us to aggregate preferences by subregion. In this way, we can specify the subregional top preference (the institution with the highest preference score). The proposed subregional preference map can be produced for all years studied and for all professional areas (for economics and business studies and for engineering in 2011 and 2015, see Figure 3). We can identify geographically connected areas for each field of study and also for all applications taken together. Each connected area has its own regional center institution, which indicates the role of distance in the applications. Similar shapes are obtained by clustering the score vectors. Changes of the areas over time can indicate realignments: the maps show that the significance of the local institutions is decreasing, while the role of the capital is increasing.
Testing equivalence with (4) provides early warnings of change. Table 3 in the Appendix lists the subregions in which the top two preferences were not significantly different in 2011. Most of these subregions had changed their top preference by 2015, and many further subregions with non-significant top two preferences had appeared (Table 4). Regarding the HE institutions (HEIs), Figure 3 shows that the top preferences of the subregions turn more and more to institutions in the capital; that is, the students' Budapest-centric attitudes are increasing over time.
Turning to the local role of the institutions, the 3D map of the preference scores (Figure 4(a)) and the preference contours (Figure 4(b)) show that the lowest relative scores are in and close to Budapest, where several universities and colleges compete for students' applications, while these scores are relatively high in the peripheral subregions. This might be explained by supposing that the institutions in the towns play a more important role in their surroundings than any single institution in the capital plays for applicants living in Budapest. In the countryside, the most outstanding universities are PTE (University of Pécs), SZTE (University of Szeged) and DE (University of Debrecen), while the northern part of Transdanubia has lower peaks. We can conclude that any change affecting local institutions may have a profound effect on their applicants.
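Extracting the subregional top preference from aggregated scores is straightforward; the Python sketch below also returns the runner-up and the score margin, which is where the equivalence test (4) would be applied. The subregion name and the score values are hypothetical, and the margin alone is not a significance test.

```python
def top_two(region_scores):
    """For each subregion: (top institution, runner-up, score margin).

    region_scores maps a subregion to {institution: preference score};
    every subregion is assumed to score at least two institutions.
    """
    result = {}
    for region, scores in region_scores.items():
        ranked = sorted(scores, key=scores.get, reverse=True)
        first, second = ranked[0], ranked[1]
        result[region] = (first, second, scores[first] - scores[second])
    return result
```

Mapping the first element of each tuple onto the subregions gives a top-preference map of the kind described above; subregions whose top two institutions are not significantly different under (4) are candidates for a change of top preference.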

Increasing dominance of the capital
If we consider the number of applications to the most preferred institutions, we can see that many applications come from the institution's own subregion. Table 5 in the Appendix shows that the top preferences of the single clusters are always countryside institutions. On the other hand, after the second or third position, the (sub)regional preference list follows the general (global) preferences, although most of these institutions are farther than 60 km from the center of the subregion. (For clustering, the k-nearest neighbour (k-NN) method is used on rank order distances instead of the Euclidean distance.) If we consider the sums of the scores belonging to the top preferences, the increasing role of the institutions in Budapest is clearly demonstrated. Figure 5(a) shows the subregions where the total weight of the Budapest institutions in the top 6 increased from 2011 to 2015. Figure 5(b) highlights those subregions where the difference between the total scores of the capital's and the regional institutions in the top 6 increased. In the majority of the subregions, the increasing role of the universities in Budapest can be observed. In 2011, almost all capital-dominant subregions were in the neighbourhood of Budapest; by 2015, this set had grown remarkably (Figure 6).
Despite the government's intention to counterbalance Hungary's overcentralisation and strengthen the countryside universities, students with good performance tend to apply to an institution in Budapest even if its IE position is worse than that of a countryside university. The most likely reason for the higher preference for institutions in Budapest is the difference in job opportunities and living standards. This is in line with the findings of Montmarquette et al. (2002) and Hilmer -Hilmer (2012): the better the job opportunities and the higher the expected earnings, the more attractive a university or a major (city) is to a student. A more detailed analysis of these factors can be found in Telcs et al. (2015). As demonstrated above, in spite of the efforts of the EU and the Hungarian government, the dominance of Budapest has not eased but keeps increasing.
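The two indicators behind Figure 5 can be sketched in Python for a single subregion and year: the total score of the Budapest institutions among the top 6, and the difference between the capital's and the regional institutions' totals in the top 6. The function name, institution identifiers and score values below are hypothetical; comparing the outputs for 2011 and 2015 shows whether a subregion has become more capital-oriented.

```python
def capital_share_top(scores, capital, top=6):
    """Total score of capital institutions among the `top` highest-scored ones,
    and the difference between the capital and regional totals in that set."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:top]
    capital_total = sum(scores[h] for h in ranked if h in capital)
    regional_total = sum(scores[h] for h in ranked if h not in capital)
    return capital_total, capital_total - regional_total
```

For example, if a Budapest institution overtakes the local one at the top of a subregion's list between two years, both the capital total and the capital-minus-regional difference rise, which corresponds to the pattern highlighted in Figure 5.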
Although the competition among the institutions in Budapest is much stronger than among the regional universities, the number of applications is significantly larger in and near Budapest (Figure 7(a)). [...] There are several underprivileged subregions with a very low number of applicants, such as the counties of Northern Hungary and Southern Transdanubia. This may be due to the fact that these regions have worse living conditions than other regions. Figure 7(b) shows that most students with high admission points live in Budapest and its agglomeration, while there are several subregions where the admission points of students are 10% lower than the overall average.⁴ This result also highlights the subregional differences and the Budapest-centric comprehensive educational system.

CONCLUSION

Implications for decision-makers

Implications for education policy-makers. The results show that, despite the European Union's Convergence Program and the Hungarian Government's intentions, reducing the inequalities between the capital and the peripheral subregions, inter alia by strengthening the regional institutions, has not been successful so far. More students applied to the institutions in Budapest in 2015 than in 2011. Budapest's institutions attract students with high admission points, and the total weights of the institutions in Budapest are, in the majority of cases, increasing. We believe that this effort will not succeed until the inequalities in employment opportunities decrease.
The other main intention of the Hungarian Government was to increase the interest in engineering professions. However, according to the relative stability of the application preference orders, the result of this effort was only partially successful. While the majority of places in economic and business studies have not been financed by the state any more, economic studies have remained the most popular of all, and probably will remain in the future, too. Although the position of engineering has also been stable in the preference order of professions, the weights of engineering among the professions have been increasing and the distances between preference scores of the economic and business studies and the engineering have been decreasing in the time period of 2011-2015.
6.1.2. Implications for the decision-makers in the institutions. The management of a HEI has to balance between the quantity and quality of the applicants. The characteristics of the proposed institutional preference order show them the changes of the order in the function of the admission points. Both the changes in top preferences and the proposed clustering algorithms detect the impact and the increment of the impact of Budapest. The results (Table 2) present that one of the main challenges for regional institutions is how to acquire and retain the best applicants, and how to offer an attractive alternative against the increasing potential of the Budapest HEIs.
Other main challenges, especially for institutions in towns, are how to stop the loss of applications, how to recover, and later how to expand their potential application district. Getting into the elite cluster is a long-term process: the first step is to reach the third cluster, from which the institution can progress further.
To organise efficient promotion for a given university, decision-makers should know where to promote the institution in order to keep or change applicants' top preferences.
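Since a student's top preference is hard to influence while the second choice is more informative for recruiters, one way to decide where to promote is to look for subregions where an institution is typically the second choice but not the first. The targeting rule below is a hypothetical illustration, not a method from the paper; the `(subregion, ranked_list)` record layout and the function names are assumptions.

```python
from collections import Counter, defaultdict


def choice_counts(applications, rank):
    """Count, per subregion, which institution appears at position `rank`
    (0 = first choice, 1 = second choice) of the applicants' lists."""
    counts = defaultdict(Counter)
    for region, prefs in applications:
        if len(prefs) > rank:
            counts[region][prefs[rank]] += 1
    return counts


def promotion_targets(applications, institution):
    """Subregions where `institution` is the most frequent second choice but
    not the most frequent first choice (a hypothetical targeting rule).
    Ties in most_common() are resolved by first-encountered order."""
    firsts = choice_counts(applications, 0)
    seconds = choice_counts(applications, 1)
    targets = []
    for region, counter in seconds.items():
        top_second = counter.most_common(1)[0][0]
        top_first = firsts[region].most_common(1)[0][0] if firsts[region] else None
        if top_second == institution and top_first != institution:
            targets.append(region)
    return sorted(targets)
```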

Implications for scholars
With the help of the proposed institutional preference scores, we can set up unique preference orders for professional areas and institutions, and we can identify which differences between them are significant. The proposed institutional preference characteristics can be classified, which allows typical characteristics to be identified and distinguished.
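The significance tests themselves are not specified in this section (Table 3 reports indistinguishable top preferences at the 0.05 level). As a stand-in only, the sketch below applies an exact two-sided sign test to the first-choice counts of the two leading institutions in a subregion: under the null hypothesis, an applicant who lists one of the two first is equally likely to list either. The paper's own test, which works with preference scores, may differ.

```python
from math import comb


def sign_test_p(k1, k2):
    """Two-sided exact binomial (sign) test of H0: p = 0.5, where k1 and k2
    are the first-choice counts of the two top institutions in a subregion."""
    m = k1 + k2
    if m == 0:
        return 1.0
    hi = max(k1, k2)
    # Upper-tail probability of a split at least this uneven, doubled for two sides.
    tail = sum(comb(m, i) for i in range(hi, m + 1)) / 2 ** m
    return min(1.0, 2 * tail)


def indistinguishable(k1, k2, alpha=0.05):
    """True if the two institutions cannot be separated at level alpha."""
    return sign_test_p(k1, k2) >= alpha
```

A 6 to 4 split, for instance, gives p of about 0.75, so the two institutions would count as indistinguishable, whereas a 10 to 0 split gives p = 2/1024.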
The proposed preference order-based and score-based clustering algorithms identify territorially connected preference clusters. Although none of the proposed classification methods uses the geographical distance between subregions, all of them result in connected application districts. This can be explained by the fact that in Hungary (as in several other countries) one of the most important factors in the choice of university is the distance between the applicant's residence and the location of the university.
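Whether a clustering yields connected application districts can be verified after the fact, using only a subregion adjacency graph that the clustering itself never sees. The sketch below is an illustrative breadth-first check, assuming a symmetric adjacency mapping and a subregion-to-cluster assignment; both data structures are hypothetical.

```python
from collections import deque


def is_connected_cluster(members, adjacency):
    """Return True if the subregions in `members` form one connected region,
    moving only between neighbouring subregions inside the cluster."""
    members = set(members)
    if not members:
        return True
    start = next(iter(members))
    seen = {start}
    queue = deque([start])
    while queue:
        region = queue.popleft()
        for neighbour in adjacency.get(region, ()):
            if neighbour in members and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == members


def all_clusters_connected(assignment, adjacency):
    """assignment maps subregion -> cluster label; adjacency must be symmetric."""
    clusters = {}
    for region, label in assignment.items():
        clusters.setdefault(label, set()).add(region)
    return all(is_connected_cluster(m, adjacency) for m in clusters.values())
```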
The proposed method can be applied to other large databases, such as those of the National Center for Education Statistics (NCES) (see nces.ed.gov), and changes in preferences can be analysed through the significant parameters identified earlier by other scholars (e.g. English -Umbach 2016).

FUTURE RESEARCH
The proposed preference orders, the preference score method and the preference score/order-based classification methods are generally applicable and provide useful results. The results presented above are based on data from Hungarian HEIs and on Hungarian students' applications. We would like to encourage decision-makers in other countries to produce similar preference score/order-based analyses in order to check the outcomes of government decisions and to improve the effectiveness and efficiency of HEI enrolment.
In this paper, we have investigated students' applications from the point of view of the universities and the subregions. In our next paper, we are going to investigate them from the applicants' point of view. Based on the applicants' preference lists, we believe that we can identify students who prefer institutions to programs (the first, second, third, etc. places on their application list contain different programs offered by the same university), as well as several students who prefer programs to institutions. The important question is which factors influence institutional loyalty and insistence on a program.
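The distinction between students who prefer institutions to programs and those who prefer programs to institutions can be operationalised directly from a single preference list. The labels and the rule below are a hypothetical operationalisation, not the authors' planned method; each list entry is assumed to be an `(institution, program)` pair.

```python
def loyalty_type(preferences):
    """Classify one applicant's ranked list of (institution, program) pairs.

    'institution': several programs, all offered by the same institution
    'program':     the same program at several institutions
    'mixed':       anything else, including single-entry lists
    """
    institutions = {inst for inst, _ in preferences}
    programs = {prog for _, prog in preferences}
    if len(institutions) == 1 and len(programs) > 1:
        return "institution"
    if len(programs) == 1 and len(institutions) > 1:
        return "program"
    return "mixed"
```

Tallying these labels per subregion or per institution (for example with `collections.Counter`) would give a first, descriptive view of institutional loyalty before any explanatory factors are modelled.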
Our work proposes methods that are typically applicable to countries and higher education systems where applications are systematically collected and stored in a centralised database. In that respect, they are not directly applicable to developing an international university ranking, but we will present such an extension in future work. This information can also be important for HEI decision-makers in specifying an adequate program portfolio.
Since the application database can be merged with an employment database, a student's whole migration path can be tracked and analysed, from his/her birthplace through the location of the secondary school and the university applied to, to the place of employment. This migration path reveals underdeveloped subregions from which it is very difficult to break out.

APPENDIX
Table 2. Classification of the institutes (2011-2015) (Cluster); maximum (MAX) and minimum (MIN) values of the preference orders; I.E. = Institutional Excellence, see http://eduline.hu/rangsor; * = ecclesiastical universities/colleges; ** = universities/colleges/academies of arts/music; *** = IE calculated
Table 3. Indistinguishable subregional top preferences (2011), where the significance level = 0.05. Inst 1, 2 = top 2 institutes with the largest preference scores; Score 1, 2 = preference scores of Inst 1 and Inst 2; Sign = significance value; n = number of applications from the subregion