Decision-making on establishment of re-calibration intervals of testing, inspection or certification measurement equipment by data science

This contribution is related to issues on decisions in conformity assessment, especially in testing, inspection


INTRODUCTION - DATA SCIENCE CLASSES OF TIC PROBLEMS
Conformity assessment processes usually end with the delivery of a decision, most visibly in testing, inspection, and certification (TIC). These decisions are predominantly based on empirical data obtained by measurement. Measurements are crucial in critical societal sectors such as healthcare, trade, industry, energy, and environmental protection, where TIC activities are commonly conducted. Lately, the digital transformation has had a significant impact on the TIC sector. At the centre of the digital transformation is the enormous quantity of data that is continuously produced, processed, stored, and used for an increasing number of applications. Artificial intelligence, machine learning, the internet of things, big data analytics, etc. are all based on data. However, data quality is not always properly addressed, especially in sectors with traditions based on experimental approaches, such as TIC. Poor data quality may lead to poor models, incorrect results, and ultimately wrong decisions. Data science uses interdisciplinary scientific approaches, protocols, algorithms, and systems to extract insights and information from noisy, structured, and unstructured data, and to deploy knowledge from data in a wide range of applications [1], [2]. The recent revival of the interrelation between measurement and data science is driven by the emerging use of sensor devices and by the significant increase in available data storage, processing, and transmission capacities. The large quantity of recorded information, together with theoretical advances in measurement and data science, has led to numerous new products and smart services.
This contribution analyses options for using recent data science achievements in TIC decision-making processes. The analysis rests on the combined use of "measurements", which are entirely empirical, and "data science", a methodology oriented towards modelling and simulation; the two are highly complementary and synergistic, i.e., they form a data fusion approach. Modern scientific methodology requires that the validity of a theory be sustained through experimental verification whenever possible. Experimental proof comprises a quantitative or non-quantitative (i.e., qualitative) measure of the observed quantities obtained through measurement. The degree of consistency among measurement results obtained by different independent experimenters, or by the same experimenter at different times, indicates the reliability of the results for the quantity of interest, given that empirical knowledge is almost always imperfect to some degree and that combining observations is a standard and essential practice [3].
Several classes of data science problems for which techniques might be developed and evaluated across different domains in the TIC sector are [1]:
• Detection: finding data of interest in a given dataset.
• Anomaly detection: identification of system states that force additional pattern classes in a model. Outlier detection is associated with identifying potentially erroneous data items that force changes in prediction models ("influential observations").
• Cleaning: elimination of errors, omissions, and inconsistencies in data or across datasets.
• Alignment: relating different instances of the same object [4], like a word with the corresponding visual object, or time stamps associated with two different time series. Data alignment is frequently used for entity resolution, i.e., identifying common entities among different data sources.
• Data fusion: integration of different representations of the same real-world object, encoded in a well-defined knowledge base of entity types [5].
• Identification and classification: attempt to determine, for each item of interest, the type or class to which the item belongs [6].
• Regression: finding functional relationships between variables.
• Prediction: estimation of a variable or multiple variables of interest at future times.
• Structured prediction: tasks where the outputs are structured objects rather than numeric values. This is a desirable technique for classifying a variable in terms of a more complicated structure than discrete or real-number values.
• Knowledge base construction: construction of a database with a predefined schema, based on any number of diverse inputs.
• Density estimation: production of a probability density (distribution function) in addition to a label or value.
• Joint inference: joint optimization of predictors for different sub-problems using constraints that enforce global consistency, used in detection and cleaning for more accurate results.
Data science also involves ranking, clustering, and transcription ("structured prediction"), as in [7]. Other classes of problems are based on algorithms and techniques that are applied to raw data at an earlier "pre-processing" stage. Different data processing may be activated if evaluation methodology is essential [1].

MAIN STATISTICAL PARADIGMS FOR DECISION MAKING IN TIC
The international endorsement of the Guide to the Expression of Uncertainty in Measurement (GUM) [8] increased the need to provide uncertainty statements with measurement results. Laboratory accreditation based on standards such as ISO 17025 [9] has amplified this process. As uncertainty statements have been recognized as essential for effective decision making, various laboratories, from national metrology institutes to commercial test laboratories, devote significant effort to the evaluation of measurement uncertainty by applying the GUM methods [8], [10], [23], as well as the methodologies proposed in the international guideline ILAC G8 [22]. The approaches for evaluating uncertainty propagation in TIC applications comprise the frequentist, Bayesian, and fiducial statistical paradigms [11], [23].
The first statistical paradigm, the frequentist one, assesses uncertainty probabilistically and is based on the statistical theory referred to as "classical" or "conventional". Given the origin of uncertainty in TIC, these approaches must be adjusted to derive frequentist uncertainty intervals under practical conditions. In most realistic TIC environments, uncertainty intervals must cover both the uncertainty in quantities estimated from data and the uncertainty in quantities derived from expert knowledge, so a data fusion approach is indispensable. To obtain an uncertainty interval, measurands that are not observed are usually treated as random variables with probability distributions for their values, whereas measurands whose values can be estimated from statistical data are treated as unknown constants. The traditional frequentist protocols should be altered so that the prescribed level of confidence is achieved after averaging over the possible values of the quantities evaluated by expert judgment [11].
The second paradigm, the Bayesian approach [11], is named after the fundamental theorem proved by the Reverend Thomas Bayes in the mid-1700s. Here the analyst's knowledge about the measurands is modeled as a set of random variables with a probability distribution on the joint parameter space. The theorem enables the probability distributions to be updated based on the observed data and on the interrelationships of the parameters defined by the function or by equivalent statistical models. The resulting probability distribution describes the knowledge of the measurand given the observed data.
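In symbols, the update described above is commonly written as follows. The symbols are chosen here for illustration and are not taken from [11]: θ denotes the measurand(s), y the observed data, p(θ) the prior distribution, and p(y | θ) the likelihood.

```latex
p(\theta \mid y)
  = \frac{p(y \mid \theta)\, p(\theta)}{\int p(y \mid \theta')\, p(\theta')\, d\theta'}
  \;\propto\; p(y \mid \theta)\, p(\theta)
```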
The third statistical paradigm, the fiducial approach, was developed by R. A. Fisher in the 1930s [11]. The probability distribution (fiducial distribution) for a measurand conditional on the data is obtained from the interrelationship of the measurand and the input values described by the function, and from the distributional assumptions on the data used to estimate the input values.

DECISION-MAKING, AND RISK BASED THINKING IN TIC ESTABLISHED BY DATA FUSION
Data fusion aims to obtain higher-quality information for specific contexts by exploiting the synergy of data collected from diverse sources. Data fusion is the process of combining data or information to estimate or predict entity states [12]. Applied in many decision-making domains, such as TIC, it encompasses classification and pattern recognition used to support decisions. Decision making, especially in conformity assessment, is directly linked to the introduction of risks into the operations of the laboratory or TIC entity. So, in TIC it is crucial not only to fuse data obtained from multiple sources, both experimental and theoretical, but also to assess threats and risks [9], [22], [23]. Data fusion increases the robustness and soundness, and reduces the vulnerability, of the system that provides the arguments for a decision, and it enables decision making even when some sources of information are missing or inappropriate. Data fusion provides better and wider coverage of space and time and decreases ambiguity, because better information leads to better discrimination among the available hypotheses. Data fusion is based on experimental data output by sensing devices or instruments, and on information gained by other routes (e.g., the user as a data source for a priori knowledge, experience, and model application). Data fusion requires all data to be represented in the same format (e.g., numeric values in the same units, relative values). If the data differ in representation, data alignment or data registration is indispensable [12]. Measurements, as instrument outputs, produce a signal that is usually affected by noise and whose reliability has to be verified (e.g., against instrument malfunction or deliberate corruption of the measured quantity, such as jamming). Filtering and validation of the data are therefore necessary in data fusion processes.
Data fusion comprises activities that tackle data from sources with different quality levels (such as different accuracy), correlated data, inflation of information, and all other issues that lead to computational problems and impose a need to change the context of the observation, e.g., from the time domain to the frequency domain, or to extract features or attributes [12].
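As a minimal sketch of how sources of different accuracy can be combined, the following example uses the inverse-variance weighted mean. This is a common illustrative technique and not a method prescribed in [12]; the function name and the sample readings are hypothetical, and the sources are assumed to be independent and already aligned to the same unit.

```python
import math

def fuse_inverse_variance(values, std_uncertainties):
    """Fuse independent estimates of one quantity, weighting each by 1/u^2."""
    if len(values) != len(std_uncertainties) or not values:
        raise ValueError("need equally long, non-empty inputs")
    weights = [1.0 / (u * u) for u in std_uncertainties]
    total = sum(weights)
    fused = sum(w * v for w, v in zip(weights, values)) / total
    fused_u = math.sqrt(1.0 / total)  # standard uncertainty of the fused value
    return fused, fused_u

# Hypothetical readings of the same 10 A reference from two sources (in A).
value, u = fuse_inverse_variance([10.02, 9.98], [0.02, 0.01])
print(f"fused value = {value:.3f} A, standard uncertainty = {u:.4f} A")
```

The more accurate source dominates the result, and the fused uncertainty is smaller than that of either source, which is the sense in which fusion yields higher-quality information.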
As an illustration of the application of data science in the TIC sector, one of the most relevant decision-making and risk-introducing issues in TIC is demonstrated below: determination of the re-calibration period of TIC measurement equipment, using data fusion as a means of supporting the decision.

PLANNING THE TIC INSTRUMENT RE-CALIBRATION PERIOD BY DEPLOYMENT OF DATA FUSION
Estimating re-calibration intervals is an essential issue for TIC sector entities that use calibrated instruments in their activities. Most of the test equipment in today's laboratory inventories consists of multi-parameter items or of sets of individual single-parameter items. An item-measurand is declared to be out-of-tolerance if a single instrument parameter, or a single item in a set, is found to be outside the pre-defined specifications. This is expensive and introduces risks [13], [14], [23]. Most of the published methods for planning the re-calibration period of an instrument are of statistical origin and can be used adequately only for large inventories of instruments [15]. Because individual instruments differ in their performance characteristics and their working conditions change, instrument reliability is difficult to anticipate. Extended calibration intervals may increase the potential costs associated with a given instrument, as more operation cycles (tests or calibrations) are conducted before it is re-calibrated and found to be in- or out-of-tolerance. A posteriori costs might include a reverse traceability review to identify the items that have been tested with the instrument and a thorough investigation of the negative impact on their performance, given the extent of the instrument's out-of-tolerance condition. These may lead to customer alerts, accreditation suspension, product recalls, and intangible consequences such as damage to the TIC entity's reputation. In this contribution, the focus is on estimating the re-calibration period of measuring instruments used by TIC entities by deploying data fusion to reduce decision-making risk.
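The out-of-tolerance rule for multi-parameter items stated in this section can be sketched as follows. The parameter names, specification limits, and readings are hypothetical and serve only as an illustration.

```python
def item_out_of_tolerance(readings, limits):
    """Return True if any parameter lies outside its (low, high) specification."""
    return any(
        not (limits[name][0] <= value <= limits[name][1])
        for name, value in readings.items()
    )

# Hypothetical specification limits and as-found readings.
limits = {"current_10A": (9.90, 10.10), "voltage_400V": (398.0, 402.0)}
readings = {"current_10A": 9.97, "voltage_400V": 402.6}

# A single failing parameter is enough to flag the whole item.
print(item_out_of_tolerance(readings, limits))
```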
The approach for determining the recalibration range will be validated through a case study on experimental calibration and check data of an electrical measuring instrument, by fusion of data from diverse sources (both a posteriori experimental and a priori knowledge, experience, model application).
Most of the standards according to which TIC entities are accredited or certified require suitable and adequate facilities and equipment to be available so that all TIC activities can be carried out in a competent and safe manner, with the responsibility lying solely with the TIC entity. One of the most significant decisions regarding calibration is "When and how often to do it?" Many factors influence the time range between calibrations, and they should be identified and considered by the TIC entity. The most important factors are:
• uncertainty of measurement required or declared by the TIC entity,
• risk of a measuring instrument exceeding the limits of the maximum permissible error when in use,
• cost of necessary correction measures when it is found that the instrument was out-of-tolerance over a long period of time,
• type of instrument,
• tendency to wear and drift,
• manufacturer's recommendation,
• extent and severity of use,
• environmental conditions (climatic conditions, vibration, ionizing radiation, etc.),
• trend data obtained from previous calibration records,
• recorded history of maintenance and servicing,
• frequency of cross-checking against other reference standards or measuring devices (including diverse measures for quality assurance in TIC, such as inter-laboratory comparisons or proficiency testing schemes, or repeatability of tests under different operating conditions),
• frequency and quality of intermediate checks in the meantime,
• transportation arrangements and risk, and
• degree to which the TIC personnel are trained [15].
The use of statistical methods (i.e., deploying data science) on an individual instrument or instrument type is of interest, especially if combined with adequate software tools.
According to Agilent Technologies ® [16], prior to the introduction of a new product the responsible personnel set the initial recommended re-calibration period. Data are treated as reliable if they originate from at least three areas:
• data from similar instruments,
• data for the individual components used in the instrument,
• data on any subassemblies derived from existing mature products (i.e., instruments).
The usual working environment and the testing results of the surrounding conditions conducted on instrument prototypes are considered as well [18].
Several methods for determining the calibration intervals are published, [13], [14], [19], [20] and [21]. Some models assume that the calibration condition of the instrument can be traced by monitoring the drift of an observable parameter, [13]. The calibration ranges can be presented according to analysis by parameter variables data, analysis by parameter attributes data, by instrument attributes data, and by class instrument attributes data. Other methods, such as an extension by providing a maximum likelihood estimation for the analysis of data characterized by unknown failure times, are given in [13], where the estimation method is using the exponential reliability function.
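For orientation, the exponential reliability model mentioned above has the following standard form. The symbols are introduced here for illustration and are not the notation of [13]: λ is the out-of-tolerance rate, R* is the target reliability at the end of the period, and T is the resulting calibration interval.

```latex
R(t) = e^{-\lambda t},
\qquad
R(T) = R^{*} \;\Longrightarrow\; T = -\frac{\ln R^{*}}{\lambda}
```

As a purely hypothetical example, λ = 0.01 per month and R* = 0.85 give T = −ln(0.85)/0.01 ≈ 16.3 months.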
An approach based on a review of the instrument's calibration history is presented in [14], where the calibration records indicate the history of remaining in tolerance. An algorithm has been developed that calculates calibration ranges from the condition as received at calibration together with a historical weighting, so that the instrument has a higher likelihood of remaining in tolerance. A variables-data method is also presented for determining calibration intervals for parameters whose values demonstrate time-drift with constant statistical variance. The method uses variables data to analyse the time-dependence of the deviations between as-left and as-found values from calibration. The deviations are the differences between a parameter's as-found value at a given calibration and its as-left value from the preceding calibration [14]. The choices for the tolerance band for parameter X in Table 1 are derived from other authors' publications [14], but are also based on the laboratory's own metrology experience. In further research, other methodologies are planned to be deployed for this purpose, such as the decision-making ranges suggested in ILAC G8 [22].
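The as-found and as-left deviations used by the variables-data method can be computed as in the following sketch. It assumes that each deviation is the as-found value at a calibration minus the as-left value of the preceding calibration; the record layout, function name, and numbers are hypothetical and are not taken from [14].

```python
def drift_deviations(records):
    """Compute drift deviations from a calibration history.

    records: list of (month, as_found, as_left), ordered by time.
    Returns one (elapsed_months, deviation) pair per calibration interval,
    where deviation = as_found (current) - as_left (previous).
    """
    result = []
    for previous, current in zip(records, records[1:]):
        elapsed = current[0] - previous[0]
        deviation = current[1] - previous[2]
        result.append((elapsed, deviation))
    return result

# Hypothetical calibration history at a 10 A point (values in A).
history = [(0, 10.00, 10.00), (12, 9.99, 10.00), (26, 9.97, 9.97)]
for elapsed, deviation in drift_deviations(history):
    print(f"{elapsed} months: deviation {deviation:+.3f} A")
```

Plotting these deviations against elapsed time is one simple way to see whether the drift grows with the interval length.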
In [19] and [21], a stochastic optimisation approach for determining the re-calibration period is presented. A genetic algorithm is used to estimate the next calibration period, taking into account the previous calibration history of the measurement device. The experimental results of the last calibration certificate are used to verify the predicted time-drift of the device at the estimated moment of the next calibration. The time dependence of the instrument drift is modelled with Lagrange orthogonal polynomials constructed from the experimental calibration history. These are embedded in an algorithm based on the statistical least squares method, and the accompanying uncertainties are included in the genetic algorithm stochastic optimisation tool, which determines the coefficients of the polynomial model of the instrument's time-drift function.

ESTIMATION OF A RE-CALIBRATION PERIOD-MODEL DEVELOPMENT
Based on the previous discussions and survey, the following innovative data fusion model for determination of the re-calibration period is proposed: where: MS - maintenance and service (0.08, newly introduced parameter). The model in equation (1) is derived by combining models from [14] and [15], and by deploying laboratory experience in electrical metrology (i.e., the linearity of the characteristics of electrical instruments derived from the instruments' technical specifications), such as the time-drift of the instruments [17]. The main idea is to provide a simple model ready to be used in the everyday operations of TIC entities. The ECI can be specified depending on experience with the stability of similar instruments and on recommendations. This parameter contains the a priori knowledge in the data fusion process. Other coefficients of a priori knowledge origin are 1 , 2 , 3 , 4 , IC, CFU, CO, OFH and MS. The 1 , 2 , 3 , are historical weighting coefficients of the previous three calibrations. If more than three previous calibrations are available, then for a shorter time history they can be taken into consideration with significantly lower weighting coefficients (less than 0.4 or 0.2 for the fourth and fifth previous calibration, respectively), and for a longer time history (re-calibration periods of more than one year) they can be neglected.
The longest possible re-calibration period will be estimated, leading to a conclusion that this approach is more rigorous in comparison to the "simplified method" as defined in [15], only if the estimated period is shorter than the real time range used for validation derived from the last calibration certificate and in which the instrument was found to be in-tolerance. The parameters as multipliers are given in Table 1.

EXPERIMENTAL CASE STUDY FOR METHODOLOGY VALIDATION
A database containing the historical data of previous calibrations of the instrument must be established and maintained by the TIC entity for proper implementation of the proposed model.
The proposed model can be used after at least two calibrations of the instrument have been conducted at appropriate time intervals. As a case study for validation of the proposed methodology, a real database with the calibration history of a digital multimeter used during testing by a TIC body is adopted. The variations of the calibration values should be considered at as many measurement points as are available, with emphasis on points with detected changes. To be on the safe side, the most acceptable value of X is the smallest value among all available points. The expected value at the next calibration time can be obtained with the sophisticated algorithms previously published in [14], [18], [19], [20], and [21], but for some TIC entities these approaches introduce obstacles and risks for implementation. The methodology presented in this study, which embeds the statistical least squares method, is a simplified option for TIC entities. This method is chosen because it is embedded in many tools already available to TIC entities in a very user-friendly form (e.g., MS Excel). The in-service checks with another instrument should be carried out at times and on occasions when the calibration uncertainty is available for both instruments.
TIC entity's quality management proposes the extent of factors and habits of the staff, while the instrument operator specifies the frequency and conditions of use, which are another example of a priori knowledge in the data fusion process. Depending on the available history data and tracking behaviour of the instrument, the coefficients proposed in the algorithm can be modified and customized for each instrument or group of instruments (i.e., the model is universal and invariant to the type of instrument or the number of instruments).
The next calibration period of a METREL ® Eurotest XE MI 3102 tester is estimated as a case study for the model validation. Following the recommendations of the instrument producer Metrel ® [17], regular 6-month or 1-year calibration of all measurement functions of the instrument should be carried out. This case study has been chosen because of the instrument, the artefact of calibration, and because the laboratory already has sufficient data at its disposal. Namely, the data used in the modelling and verification process cover a period of 72 months (i.e., 6 years), which is an appropriate time range from which to derive sound regression-based conclusions.
In Tables 2 and 3 the calibration history of the instrument at a single point of the current and voltage measurement ranges is given, respectively. The data in Tables 2 and 3 are taken from the calibration certificates of the instrument, issued by an external accredited calibration laboratory. The moment of the first calibration is assigned the value zero, and the subsequent calibration moments are expressed in months from the first calibration. The reference calibration value for the current is chosen to be 10 A, while the reference value for the voltage is selected to be 400 V.
The measurement uncertainty is divided by the corresponding coverage factor declared in the calibration certificate, because the historical calibrations were carried out in different laboratories, some expressing the expanded uncertainty with a coverage factor k = 1.65 (for a rectangular distribution at a probability of 95 %) and some with a coverage factor k = 2 (for a normal distribution at a probability of 95 %). So, the standard uncertainty is used in the calculations. This is the data alignment step in the fusion of heterogeneous data. The trend lines for both quantities, current at 10 A represented in equation (2) and voltage at 400 V represented in equation (3), are derived by the statistical least squares method. The experimental value from the last calibration is excluded from the fit and is used for predictive verification of the models. The derived predictive models are:
$$I = -7 \cdot 10^{-7} \cdot t^{3} + 10^{-5} \cdot t^{2} + 0.0013 \cdot t + 9.98, \qquad R^{2} = 1 \qquad (2)$$
$$U = -0.0024 \cdot t^{2} + 0.1448 \cdot t + 399 \qquad (3)$$
The expected values calculated from the models (2) and (3) by inserting the time of the last calibration are shown in Figures 1 and 2. The expected values are 9.85 A for the current measuring range and 399.32 V for the voltage measurement range, and both are in tolerance. The differences between the calculated (theoretical) values and the real measured values derive from the measurement uncertainty ranges in calibration, which are one of the main inputs to the modelling and are of stochastic and unpredictable nature.
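A minimal sketch of the two steps described above: data alignment (dividing each expanded uncertainty by its coverage factor) and a least-squares polynomial trend fitted without the last calibration. All numeric data below are hypothetical and are not the values of Tables 2 and 3. The function names are illustrative, and the fit is a plain unweighted least-squares fit via the normal equations, comparable to a spreadsheet trend line; it assumes at least degree + 1 distinct time points.

```python
def standard_uncertainty(expanded_u, k):
    """Data alignment: convert an expanded uncertainty to a standard one."""
    return expanded_u / k

def polyfit_least_squares(t, y, degree):
    """Ordinary least-squares polynomial fit via the normal equations.

    Returns [c0, c1, ...] for the model c0 + c1*t + c2*t**2 + ...
    """
    n = degree + 1
    a = [[sum(ti ** (i + j) for ti in t) for j in range(n)] for i in range(n)]
    b = [sum(yi * ti ** i for ti, yi in zip(t, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for j in range(col, n):
                a[row][j] -= factor * a[col][j]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs

def evaluate(coeffs, t):
    """Evaluate the fitted polynomial at time t."""
    return sum(c * t ** i for i, c in enumerate(coeffs))

# Hypothetical history: months since first calibration, readings at 400 V.
months = [0, 12, 24, 38, 52]
volts = [399.0, 400.4, 401.1, 401.0, 400.0]
model = polyfit_least_squares(months, volts, degree=2)
predicted = evaluate(model, 72)  # extrapolate to the held-out calibration
print([round(c, 5) for c in model], round(predicted, 2))

# Hypothetical certificate: expanded uncertainty 0.50 V stated with k = 2.
print(standard_uncertainty(0.50, 2.0))
```

The predicted value would then be compared with the held-out calibration result and with the tolerance limits of the instrument.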
Table 4 presents the experimental results of in-service check measurements with another instrument of similar type (comprising the same measurement ranges as the object of validation) with established measurement traceability. The results are within the limits of error (i.e., in-tolerance). In the case of an out-of-tolerance result from the in-service check measurements, the instrument will, according to the prescribed laboratory procedures, be subjected to immediate re-calibration or taken out of service.
Other values for the parameters in the algorithm are as follows: The last calibration is not used in the prediction of the next value and serves as a validation point for the algorithm. The real calibration period (between the last two calibrations) is 20 months, while the re-calibration period predicted by the proposed algorithm is 18 months. The values obtained with the last calibration validate the method. A shorter re-calibration interval is obtained, which is on the safe side and can be accepted as applicable without introducing additional risks with respect to the re-calibration period. In fact, the risk to TIC operations is significantly mitigated. Namely, if the derived theoretical value were outside the tolerance limits, it could be concluded that the planned re-calibration period was not appropriately chosen (i.e., it is too long). In that case, new planning should be conducted in accordance with the period calculated using the model. An additional validation of the predictive model for determination of the re-calibration period is the a posteriori experimental verification by in-service check measurements with another instrument of similar type (comprising the same measurement ranges as the object of validation) with established measurement traceability.
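The validation logic described above can be sketched as follows: the last calibration is held out, the predicted interval is compared with the actual one, and the model value at the last calibration is checked against the tolerance limits. The function name and the tolerance limits are hypothetical; the intervals of 18 and 20 months and the value 9.85 A are taken from the case study.

```python
def validate_interval(predicted_months, actual_months, model_value, low, high):
    """Return (is_conservative, in_tolerance) for a held-out calibration."""
    is_conservative = predicted_months <= actual_months
    in_tolerance = low <= model_value <= high
    return is_conservative, in_tolerance

# 18 months, 20 months and 9.85 A come from the case study;
# the tolerance limits 9.80 A and 10.20 A are hypothetical.
conservative, in_tol = validate_interval(18, 20, 9.85, low=9.80, high=10.20)
if conservative and in_tol:
    print("predicted interval is on the safe side")
else:
    print("re-plan the re-calibration interval")
```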
This case study has validated the proposed methodology for predicting the next moment of instrument calibration. The results demonstrate a reduced risk of an out-of-tolerance state of the instrument caused by a prolonged re-calibration period. So, this data fusion methodology, with the proposed simple procedure applicable in any TIC entity, enables well-argued decision making on the determination of the instrument re-calibration period. The mitigated risk in this respect increases confidence in the reliability of the instruments used by the conformity assessment body.

CONCLUSIONS
The proposed methodology for predicting the re-calibration period, based on the data fusion concept, is simple, yet it incorporates a large amount of data from diverse sources on the factors influencing the stability of the instrument. It is easily deployable in the daily routine of any TIC entity. The model aims to decrease the quality management risk of errors occurring due to an inappropriately defined re-calibration period of any instrument used in TIC activities.
The presented case study validates and confirms the effectiveness of the proposed methodology through verification against experimental values. An advantage of the proposed universal model is its openness, which allows the coefficients to be varied and provides a means of specialization for a group of instruments. One option for generalising the method to a large group of instruments of the same type is to fix some of the coefficients and to vary others. More thorough studies should be conducted in this context, deploying statistical approaches for random sampling of data from previous calibrations of a large number of instruments of the same type (i.e., data reduction should be carried out through a data science approach).
Finally, it can be concluded that data fusion approach is highly adaptable for various decision-making situations in the TIC sector, opening possibilities for mitigation and reduction of risks during TIC operations.