Summary
Because citation practices depend strongly on the field, field normalisation is recognised as necessary for a fair comparison of figures in bibliometrics and evaluation studies. However, fields may be defined at various levels, from small research areas to broad academic disciplines, and normalisation values are therefore expected to vary. The aim of this project was to test the stability of citation ratings of articles as the level of observation, and hence the basis of normalisation, changes. A conventional classification of science, based on ISI subject categories and their aggregates, was used at five levels: all science, large academic discipline, sub-discipline, speciality and journal. Among various normalisation methods, we selected a simple ranking method (quantiles), based on the citation score of the article within each aggregate (journal, speciality, etc.) it belonged to at each level. The study was conducted on articles in the full SCI range, for publication year 1998 with a four-year citation window. Stability is measured in three ways: overall comparison of article rankings; individual trajectories of articles; and survival of the top-cited class across levels. Overall rank correlations on the observed empirical structure are benchmarked against two fictitious sets that keep the same embedded structure of articles but reassign citation scores either in a totally ordered or in a totally random distribution. These sets act respectively as a 'worst case' and a 'best case' for the stability of citation ratings.
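The quantile normalisation and the cross-level rank comparison described above can be illustrated with a minimal sketch. It is not the study's implementation: the toy citation counts, the group labels, the mid-rank treatment of ties and the choice of Spearman's coefficient as the rank correlation are all assumptions made for illustration.

```python
from collections import defaultdict

def quantile_ranks(citations, groups):
    """Percentile rank (0..1) of each article within its own group.
    Ties share the mid-rank (an assumption, not taken from the study)."""
    members = defaultdict(list)
    for idx, g in enumerate(groups):
        members[g].append(idx)
    ranks = [0.0] * len(citations)
    for idxs in members.values():
        scores = [citations[i] for i in idxs]
        for i in idxs:
            below = sum(s < citations[i] for s in scores)
            equal = sum(s == citations[i] for s in scores)
            ranks[i] = (below + 0.5 * equal) / len(scores)
    return ranks

def midranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    return [sum(v < x for v in values) + 0.5 * (sum(v == x for v in values) + 1)
            for x in values]

def spearman(x, y):
    """Spearman rank correlation, computed as Pearson correlation of mid-ranks."""
    rx, ry = midranks(x), midranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical toy data: six articles, nested journal -> discipline -> all science.
citations = [10, 2, 0, 5, 1, 8]
journal = ["J1", "J1", "J2", "J2", "J3", "J3"]
discipline = ["D1", "D1", "D1", "D1", "D2", "D2"]
all_science = ["S"] * 6

by_journal = quantile_ranks(citations, journal)
by_discipline = quantile_ranks(citations, discipline)
by_all = quantile_ranks(citations, all_science)

# Agreement between the journal-level and all-science ratings of the same articles.
rho = spearman(by_journal, by_all)
```

In this toy set, the article with 2 citations sits in the lower half of its journal but would be rated differently against the whole set, which is the kind of shift the cross-level correlations are meant to capture.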
The results show that: (a) the average citation rankings of articles change substantially with the level of observation; (b) observation at the journal level is very particular, with results that differ greatly from those at all other levels of observation in every test; (c) the lack of cross-scale stability is confirmed by the distribution of individual trajectories of articles across the levels; (d) for the top-cited fractions, a standard measure of excellence, the content of the 'top-cited' set is completely dependent on the level of observation. The instability of impact measures should not be interpreted as a lack of robustness but rather as the co-existence of various perspectives, each having its own form of legitimacy. A follow-up study will focus on the micro levels of observation and will be based on a structure built around bibliometric groupings rather than conventional groupings based on ISI subject categories.
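The dependence of the 'top-cited' set on the level of observation, point (d), can be sketched as follows. This is a hedged illustration only: the toy data, the 50% threshold, the arbitrary tie-breaking and the `survival` measure (share of one top set that is also in another) are assumptions, not the definitions used in the study.

```python
import math
from collections import defaultdict

def top_cited(citations, groups, fraction):
    """Indices of the articles in the top `fraction` of their own group.
    Ties at the cut-off are broken by input order (a simplification)."""
    members = defaultdict(list)
    for idx, g in enumerate(groups):
        members[g].append(idx)
    top = set()
    for idxs in members.values():
        k = math.ceil(fraction * len(idxs))
        ranked = sorted(idxs, key=lambda i: citations[i], reverse=True)
        top.update(ranked[:k])
    return top

def survival(top_a, top_b):
    """Share of the articles in top_a that are also in top_b."""
    return len(top_a & top_b) / len(top_a)

# Hypothetical toy data: six articles in three journals.
citations = [10, 9, 0, 1, 2, 8]
journal = ["J1", "J1", "J2", "J2", "J3", "J3"]
all_science = ["S"] * 6

top_journal = top_cited(citations, journal, 0.5)   # best half of each journal
top_all = top_cited(citations, all_science, 0.5)   # best half of the whole set
share = survival(top_all, top_journal)
```

Here the article with 9 citations belongs to the top half of the whole set but not of its own journal, while the article with 1 citation tops a weakly cited journal, so the two 'top-cited' sets only partly overlap.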