# Search Results

## You are looking at 1 - 7 of 7 items for

- Author or Editor: B. Hajagos

The relative effectiveness of the surplus error method decreases as the number *N* of applied surplus error sets decreases, but the loss is small in the range of small *N*: even with *N* = 9, the median of the relative increase in accuracy amounts to 39% in the investigated example.

For two independent Gaussian random variables and for a series of outliers, the correlation coefficients calculated according to the classical formula (the *r*_{c} values) are given; instead of zero, significant distortions occur. A short note is also given on the meaning of the word “outlier”.
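The distortion described above can be reproduced in a few lines. What follows is a minimal sketch, not the calculation of the paper: the sample size, the seed, and the outlier positions are arbitrary assumptions, and `pearson_r` is a helper name introduced here for the classical product-moment formula.

```python
import math
import random


def pearson_r(xs, ys):
    """Classical (product-moment) correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable (assumption)
n = 100                 # sample size chosen only for illustration
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent of xs
r_clean = pearson_r(xs, ys)

# Append one single outlying value-pair (d, d) at increasing distance.
r_with_outlier = {}
for d in (5.0, 10.0, 20.0, 50.0):
    r_with_outlier[d] = pearson_r(xs + [d], ys + [d])

for d, r in r_with_outlier.items():
    print(f"outlier at ({d}, {d}): r_c = {r:.3f} (without outlier: {r_clean:.3f})")
```

The true correlation is zero by construction, so any value of *r*_{c} far from zero in the printout is caused by the single added pair.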

As the variance (the square of the minimum *L*_{2}-norm, i.e., the square of the scatter) is one of the basic characteristics of conventional statistics, it is of practical importance to know the errors of its determination for different parent distribution types. This holds especially in geostatistics, because the γ(*h*) variogram (also called the semi-variogram) is defined as half the variance of some quantity difference (e.g. the difference of ore concentrations) as a function of the distance *h* between the measuring points, and this γ(*h*) curve plays a basic role in classical geostatistics. If the scatter (σ_{VAR}) is chosen to characterize the uncertainty of the determined variance (denoted by VAR), it can easily be calculated as the quotient *A*_{VAR}/√*n* (if the number *n* of elements in the sample is large enough); a simple formula (containing the fourth moment) is known for the so-called asymptotic scatter *A*_{VAR}. The present paper shows that *A*_{VAR} unfortunately has a finite value for only about a quarter of the distribution types occurring in the earth sciences; it must be especially emphasized that *A*_{VAR} *has an infinite value for the distribution type that occurs most frequently in geostatistics*. The present paper proves that the law of large numbers is always fulfilled (i.e., the error always decreases as *n* increases) for the error determinations if the semi-intersextile range is used instead of the scatter; the single (quite natural) condition is the existence of the theoretical variance of the parent distribution.
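As an illustration of the quotient *A*_{VAR}/√*n*, the sketch below checks it by simulation for a Gaussian parent distribution. It assumes that the "simple formula containing the fourth moment" is the standard large-sample result, i.e. that the variance of the sample variance is approximately (μ_4 − σ^4)/*n*, so that *A*_{VAR} = √(μ_4 − σ^4); the function names, the seed, and the sample sizes are choices made here, not taken from the paper.

```python
import math
import random


def asymptotic_scatter_of_variance(mu4, sigma2):
    """A_VAR = sqrt(mu4 - sigma^4); requires a finite fourth central moment."""
    return math.sqrt(mu4 - sigma2 ** 2)


def sample_variance(values):
    """Unbiased sample variance (divisor n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


rng = random.Random(1)  # fixed seed (assumption)
sigma = 1.0
n = 200                 # elements per sample
repetitions = 2000      # number of simulated samples

# Gaussian parent: the fourth central moment is 3*sigma^4,
# hence A_VAR = sqrt(2) * sigma^2.
a_var = asymptotic_scatter_of_variance(3.0 * sigma ** 4, sigma ** 2)
predicted = a_var / math.sqrt(n)

# Empirical scatter of the variance estimates over repeated samples.
variances = [
    sample_variance([rng.gauss(0.0, sigma) for _ in range(n)])
    for _ in range(repetitions)
]
observed = math.sqrt(sample_variance(variances))

print(f"predicted scatter of VAR: {predicted:.4f}, simulated: {observed:.4f}")
```

For parent distributions whose fourth moment does not exist, the argument passed as `mu4` has no finite value and the quotient cannot be formed, which is the situation the abstract warns about.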

The paper proves the practically advantageous fact that, for the determination errors of most frequent value calculations, the simple asymptotic rule is valid for the whole sample-size domain *n* ≥ 4.

On the basis of *n* corresponding value-pairs (*x*_{i}; *y*_{i}), *i* = 1, …, *n*, the closeness of correspondence between the random variables ξ and η is customarily characterized by the classical correlation coefficient *r* (see Eq. (2) in the present paper), both in the geosciences and in everyday life. The present paper shows the lack of robustness of Eq. (2) (*r* even has no meaning for circa 40% of the distribution types occurring in the geosciences) and its lack of resistance (a single outlying value-pair can distort the *r*-value to an incredible degree). The modern correlation coefficient *r*_{rob} (see Eq. (9) in this paper) is completely resistant against outliers and at the same time robust: Eq. (9) is applicable even if ξ and η are of Cauchy type, which lies very far from the Gaussian distribution and even from the most frequently occurring, so-called statistical distribution (see Eq. 6). For the Cauchy distribution neither the scatter (variance) nor the expected value exists; therefore, for this distribution type even the classical theoretical value (see Eq. 3) does not exist, and the calculation of *r* according to Eq. (2) gives in this case an "estimate" of a nonexistent quantity. The paper presents the results of a time-consuming series of Monte Carlo calculations made for both the statistical and the Gaussian distributions and for *n* = 10; 30 and 100; the errors of the modern *r*_{rob} (Eq. 9), characterized by the semi-interquartile and semi-intersextile ranges, were calculated and tabulated for *r*_{t} = 0; 0.1; 0.2; …; 0.7 and 0.8. An approximate method is also given (see the simple Eqs 16 and 17) to determine the value of *n* that assures a prescribed accuracy of the modern *r*_{rob}.
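The structure of such a Monte Carlo error study can be sketched as follows. Eq. (9) is not reproduced in this abstract, so the sketch uses the classical *r* on Gaussian pairs as a stand-in estimator; the semi-interquartile and semi-intersextile ranges are taken here as half the distance between the 1/4 and 3/4 quantiles and between the 1/6 and 5/6 quantiles, respectively. The repetition number, the seed, the single value *r*_{t} = 0.5, and all function names are assumptions made for the illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Classical correlation coefficient (stand-in for the robust one)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def quantile(sorted_values, p):
    """Linear-interpolation quantile of an already sorted list."""
    pos = p * (len(sorted_values) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def error_ranges(r_true, n, repetitions, rng):
    """Semi-interquartile and semi-intersextile range of the r estimates."""
    estimates = []
    for _ in range(repetitions):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # y = r*x + sqrt(1 - r^2)*z gives Gaussian pairs with correlation r.
        ys = [r_true * x + math.sqrt(1.0 - r_true ** 2) * rng.gauss(0.0, 1.0)
              for x in xs]
        estimates.append(pearson_r(xs, ys))
    estimates.sort()
    semi_interquartile = (quantile(estimates, 0.75) - quantile(estimates, 0.25)) / 2.0
    semi_intersextile = (quantile(estimates, 5.0 / 6.0) - quantile(estimates, 1.0 / 6.0)) / 2.0
    return semi_interquartile, semi_intersextile


rng = random.Random(2)  # fixed seed (assumption)
results = {n: error_ranges(0.5, n, 2000, rng) for n in (10, 30, 100)}
for n, (q, s) in results.items():
    print(f"n = {n:3d}: semi-interquartile {q:.3f}, semi-intersextile {s:.3f}")
```

Replacing `pearson_r` with an implementation of Eq. (9) and looping over the tabulated *r*_{t} values would give a table of the same shape as the one described above.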

In the present paper 9 error characteristics are investigated in detail by Monte Carlo calculations from the point of view of the fluctuation of their estimates. In all 9 cases the error characteristic is defined as the minimum value of a modern norm of deviations, in just the same manner as the scatter σ was defined in classical statistics as the minimum value of the *L*_{2}-norm. The results are summarized in Table I and presented in Figs 1-9 for five parent distribution types and for five sample sizes: *n* = 5; 9; 25; 100 and 400; the statistical fluctuation is characterized by the relative semi-intersextile ranges of the minimum norms (a repetition number of *N* = 200000 was chosen in the Monte Carlo calculations). On the basis of the values of Table I the uncertainties can be determined with an accuracy that is seldom required in practice. Because ordinarily 15-20% is accepted as the "error of the error", asymptotic values are also given in Table III to allow the simplest calculations, carried out according to *A*_{asympt}/√*n*.
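The *A*_{asympt}/√*n* rule can be illustrated on the one error characteristic named explicitly above, the classical scatter. The sketch below is only an assumption-laden miniature of such a study: it uses a Gaussian parent distribution, a far smaller repetition number than *N* = 200000, nearest-rank sextiles, and helper names invented here. It prints the relative semi-intersextile range of the scatter estimates together with its product with √*n*, which should level off if the asymptotic rule holds.

```python
import math
import random


def sample_scatter(values):
    """Classical scatter: square root of the unbiased sample variance."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def relative_semi_intersextile(estimates, reference):
    """Half the distance between the 1/6 and 5/6 quantiles, relative to reference."""
    ordered = sorted(estimates)
    lower = ordered[round((len(ordered) - 1) / 6)]
    upper = ordered[round(5 * (len(ordered) - 1) / 6)]
    return (upper - lower) / (2.0 * reference)


rng = random.Random(4)  # fixed seed (assumption)
true_scatter = 1.0
repetitions = 3000      # far below the paper's N, to keep the run short

fluctuation = {}
for n in (5, 9, 25, 100):
    estimates = [
        sample_scatter([rng.gauss(0.0, true_scatter) for _ in range(n)])
        for _ in range(repetitions)
    ]
    fluctuation[n] = relative_semi_intersextile(estimates, true_scatter)

# If the error behaves like A/sqrt(n), the product q*sqrt(n) approaches A.
scaled = {n: q * math.sqrt(n) for n, q in fluctuation.items()}
for n in fluctuation:
    print(f"n = {n:3d}: relative error {fluctuation[n]:.3f}, times sqrt(n) {scaled[n]:.3f}")
```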

Although the errors of the model parameters increase if the primary measuring errors are artificially increased by superposing surplus errors on the measured values, if this superposition of surplus errors is carried out sufficiently many times, the medians of the model parameters determined by inversion can be more (or even much more) accurate than the model parameters obtained by a single inversion of the originally measured data. The practical application of this fundamental concept is shown in the present paper on a microgravimetric example. It turned out that this "surplus error method" can give well applicable model parameters even when some model parameter values are fully unusable if calculated directly from the original measured data by a single inversion.
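The bookkeeping of the procedure (superpose surplus errors, invert, repeat *N* times, take the median of each model parameter) can be sketched as below. This is a hypothetical illustration only: a straight-line least-squares fit stands in for the microgravimetric inversion, the surplus errors are assumed to be Gaussian, and `fit_line`, `surplus_error_medians`, and all numerical values are invented for the example. The toy problem is not expected to reproduce the accuracy gain reported in the paper.

```python
import random
import statistics


def fit_line(xs, ys):
    """Toy 'inversion': least-squares slope and intercept of a straight line."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def surplus_error_medians(xs, ys, invert, surplus_sigma, n_sets, rng):
    """Invert n_sets artificially noised copies of the data; return parameter medians."""
    runs = []
    for _ in range(n_sets):
        # Superpose one set of surplus errors on the measured values.
        noisy = [y + rng.gauss(0.0, surplus_sigma) for y in ys]
        runs.append(invert(xs, noisy))
    # Median of each model parameter over all inversions.
    return tuple(statistics.median(values) for values in zip(*runs))


rng = random.Random(3)  # fixed seed (assumption)
xs = [float(i) for i in range(10)]
# Synthetic "measured" data: line with slope 2 and intercept 1 plus primary errors.
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.2) for x in xs]

single = fit_line(xs, ys)  # one inversion of the original data
medians = surplus_error_medians(xs, ys, fit_line, 0.5, 9, rng)  # N = 9 sets

print(f"single inversion:         slope {single[0]:.3f}, intercept {single[1]:.3f}")
print(f"surplus error medians:    slope {medians[0]:.3f}, intercept {medians[1]:.3f}")
```

Any inversion routine with the same call signature can be passed as `invert`, so the loop itself does not depend on the toy model.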