General information - Recurrent event prediction

General information

Description

A recurrent process (for example, failure and repair) is analyzed. First, the expected number of events is estimated non-parametrically; then regression analysis is applied to the resulting data to predict recurrent events. To do this, we calculate the cumulative intensity function (CIF) at each event time using non-parametric estimation. For this purpose, all recurrence and censoring times (ages) are sorted. If the recurrence age of an item equals its suspension age, the recurrence goes first. If several units share a common recurrence time, their times are shifted slightly so that only one event occurs at any given time. The CIF is then calculated using the following formula [1]:

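$$\mathrm{CIF}(t_i)=\begin{cases}\mathrm{CIF}(t_{i-1})+\dfrac{a}{r_i}, & \text{if } t_i \text{ is a recurrence time},\\ \mathrm{CIF}(t_{i-1}), & \text{if } t_i \text{ is a censoring time},\end{cases}\qquad \mathrm{CIF}(t_0)=0,$$

where r_i = r_{i-1} if t_i is a recurrence time, r_i = r_{i-1} - 1 if t_i is a censoring time, and r_1 = N, the total number of items in the test. The parameter a = 1 in all cases except when t_i corresponds to the very first failure in the test, in which case a = 0.5. This value better satisfies the obvious condition CIF(t) ≈ F(t) for small CIF, where F(t) is the non-parametric estimate of the underlying failure probability function.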
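The sorting and accumulation steps can be sketched in Python. This is a minimal illustration, not the program's code. It assumes that each recurrence increases the CIF by a/r_i (the Nelson-type estimator of [1]) and that a censoring time only lowers r_i. The `Event` record and `nonparametric_cif` function are hypothetical names, and tied recurrences are handled by keeping their input order instead of actually shifting their ages.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical event record: an item's age at a recurrence or at censoring."""
    age: float
    is_recurrence: bool  # True for a recurrence (failure), False for censoring


def nonparametric_cif(events, n_items):
    """Return (age, CIF, r_i) triples at every recurrence time.

    Assumption: CIF(t_i) = CIF(t_{i-1}) + a / r_i at each recurrence, while a
    censoring time only lowers the number of items at risk, r_i = r_{i-1} - 1.
    """
    # Sort by age; at equal ages recurrences (key False) come before censoring
    # (key True). The sort is stable, so tied recurrences keep their input
    # order, which stands in for shifting them slightly apart.
    ordered = sorted(events, key=lambda e: (e.age, not e.is_recurrence))
    r = n_items  # r_1 = N
    cif = 0.0
    first_failure = True
    points = []
    for e in ordered:
        if e.is_recurrence:
            a = 0.5 if first_failure else 1.0  # a = 0.5 only for the very first failure
            first_failure = False
            cif += a / r
            points.append((e.age, cif, r))
        else:
            r -= 1  # the censored item leaves the risk set
    return points


# Three items censored at ages 300, 400 and 500, with four recurrences in total.
events = [Event(100, True), Event(200, True), Event(250, True), Event(300, False),
          Event(400, False), Event(450, True), Event(500, False)]
curve = nonparametric_cif(events, n_items=3)
# A recurrence and a censoring at the same age: the recurrence is counted first.
tie = nonparametric_cif([Event(10, False), Event(10, True)], n_items=2)
```

Keeping r_i in each output triple makes it easy to compare every CIF value with 1/r_i when judging the accuracy of the estimate.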

Note that the accuracy of this formula depends significantly on the number of observed components. The error lies between 1/N and 1/r_N, where N is the total number of components and r_N is the number of components still operating at the end of observation. Therefore, the formula is valid at a failure time t_i only if the corresponding CIF value is much greater than 1/r_i. We recommend using this calculation only if N > 10; otherwise, it is better to apply the maximum likelihood estimator (MLE). However, when N is large enough, the suggested regression analysis approximates the data better, because Pade functions with many parameters can take many different shapes. For the same reason, it is also more efficient for data extrapolation (see Examples).
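This rule of thumb can be turned into a quick check. In this sketch, `assess_accuracy` and its `factor` parameter are hypothetical: we read "much greater than 1/r_i" as "greater than factor/r_i" with an assumed factor of 10, and we flag N ≤ 10 as a case where MLE is preferable.

```python
import math


def assess_accuracy(points, n_items, r_end, factor=10.0):
    """Hypothetical check of the non-parametric CIF estimate.

    points: (age, CIF, r_i) triples at recurrence times.
    Returns the error range (1/N, 1/r_N), whether MLE is advisable (N <= 10),
    and, for each point, whether CIF(t_i) exceeds factor / r_i.
    """
    upper = 1.0 / r_end if r_end > 0 else math.inf  # no survivors: no finite upper bound
    error_range = (1.0 / n_items, upper)
    prefer_mle = n_items <= 10
    reliable = [cif > factor / r for _, cif, r in points]
    return error_range, prefer_mle, reliable


points = [(100.0, 0.05, 20), (500.0, 2.5, 15)]
error_range, prefer_mle, reliable = assess_accuracy(points, n_items=20, r_end=15)
```

Here the early point (CIF = 0.05 with 20 items at risk) is flagged as unreliable, while the later one (CIF = 2.5 with 15 at risk) passes.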

The main purpose of this calculation is to obtain a relatively simple fitting formula y = f(x). We apply generalized Pade functions [2] to approximate the data in the following form:

The function F(x) defines the CIF for small x as CIF(x) ≈ F(x). It is calculated using the traditional MLE, taking into account only the first failures of all components and assuming that the lifetime distribution is Weibull. With the shape and scale parameters beta and eta, we obtain the asymptotic representation of the CIF by the Weibull function (x/eta)^beta, which is valid for small x. The coefficients a_i and b_i are chosen to give the best approximation of all given data. The program tries different values of m and k, and different substitutions X = x^q (0.1 < q ≤ 2), looking for the best approximation. As a result, we obtain an approximation of CIF(x) over the entire range of the argument x.
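The small-x part F(x) can be sketched as a right-censored Weibull MLE over first failures. Assumptions: each component contributes its first-failure age, or its censoring age if it never failed; the shape is found by bisection on the standard profile-likelihood equation; the search bracket for beta is our choice; and the function names are hypothetical. The asymptotic form follows from 1 - exp(-(x/eta)^beta) ≈ (x/eta)^beta for small x.

```python
import math
import random


def weibull_mle(times, failed, lo=1e-3, hi=50.0, tol=1e-10):
    """Right-censored Weibull MLE; returns (beta, eta).

    times:  first-failure age if the component failed, else its censoring age.
    failed: True where times[i] is a failure.
    """
    logs = [math.log(t) for t in times]
    fail_logs = [lt for lt, f in zip(logs, failed) if f]
    r = len(fail_logs)
    if r == 0:
        raise ValueError("at least one failure is required")
    mean_fail_log = sum(fail_logs) / r
    t_max = max(times)

    def weights(beta):
        # (t / t_max)**beta avoids overflow; the common factor cancels in g.
        return [(t / t_max) ** beta for t in times]

    def g(beta):
        # Shape equation: sum(w ln t) / sum(w) - 1/beta - mean(ln t over failures) = 0.
        # g is strictly increasing in beta, so bisection applies.
        w = weights(beta)
        return sum(wi * lt for wi, lt in zip(w, logs)) / sum(w) - 1.0 / beta - mean_fail_log

    if g(lo) >= 0 or g(hi) <= 0:
        raise ValueError("shape parameter outside the search bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    # eta**beta = sum(t**beta) / r, rescaled by t_max.
    eta = t_max * (sum(weights(beta)) / r) ** (1.0 / beta)
    return beta, eta


def small_x_cif(x, beta, eta):
    """Asymptotic form CIF(x) ≈ (x / eta)**beta, valid for small x."""
    return (x / eta) ** beta


# Synthetic first failures (beta = 2, eta = 100), censored at age 150.
rng = random.Random(1)
raw = [rng.weibullvariate(100.0, 2.0) for _ in range(2000)]
ages = [min(t, 150.0) for t in raw]
failed = [t <= 150.0 for t in raw]
beta_hat, eta_hat = weibull_mle(ages, failed)
```

With 2000 synthetic components, the estimates land close to the true beta = 2 and eta = 100.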
Pade functions have the following advantages [2]:

They converge as well as or better than power series.

Calculating the Pade function coefficients requires solving only linear equations [3].
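To illustrate why only linear equations are needed, here is the classical construction of an [m/k] Pade approximant from Taylor coefficients. This is the textbook case (see [2]), not the regression method of [3], and `pade_from_series` is a hypothetical name.

```python
import math

import numpy as np


def pade_from_series(c, m, k):
    """[m/k] Pade approximant P_m(x)/Q_k(x) from Taylor coefficients c[0..m+k].

    Returns numerator a[0..m] and denominator b[0..k], lowest power first, b[0] = 1.
    """
    c = np.asarray(c, dtype=float)
    if len(c) < m + k + 1:
        raise ValueError("need at least m + k + 1 series coefficients")

    def coef(j):
        return c[j] if j >= 0 else 0.0

    b = np.ones(1)
    if k > 0:
        # Matching x^(m+1) .. x^(m+k) gives a k-by-k linear system for b_1..b_k:
        # sum_{j=1..k} b_j c_{i-j} = -c_i,  i = m+1, ..., m+k.
        A = np.array([[coef(i - j) for j in range(1, k + 1)]
                      for i in range(m + 1, m + k + 1)])
        rhs = -c[m + 1:m + k + 1]
        b = np.concatenate((b, np.linalg.solve(A, rhs)))
    # The numerator then follows directly: a_i = sum_{j=0..min(i,k)} b_j c_{i-j}.
    a = np.array([sum(b[j] * c[i - j] for j in range(min(i, k) + 1))
                  for i in range(m + 1)])
    return a, b


# [2/2] approximant of exp(x) from its first five Taylor coefficients.
a, b = pade_from_series([1 / math.factorial(n) for n in range(5)], m=2, k=2)
```

The result is (1 + x/2 + x^2/12)/(1 - x/2 + x^2/12), the well-known [2/2] Pade approximant of e^x.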


Assumptions

It is assumed that there are enough data for a reasonable approximation. We start by calculating two Pade function coefficients and increase their number until the required accuracy, or any other approximation condition defined by the user, is reached.

Methodology

If the traditional residual sum of squares (RSS) criterion is used, applying Pade functions in regression analysis leads to a system of nonlinear equations. This problem is resolved in Section 3 of our paper [3], and we have implemented that method in the suggested calculation. Besides estimating the coefficients of the rational function, it allows us to try different combinations of m and k and to apply the substitutions X = x^q mentioned above.
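The method of [3] is not reproduced here. As a simpler stand-in that still shows how the substitution X = x^q and the orders m and k enter the fit, the sketch below uses the classical linearization: multiplying y ≈ P_m(X)/Q_k(X) by Q_k(X) makes the problem linear in the coefficients. The function names are hypothetical.

```python
import numpy as np


def fit_rational_linearized(x, y, m, k, q=1.0):
    """Fit y ≈ P_m(X) / Q_k(X), X = x**q, Q_k(0) = 1, by linearized least squares.

    Multiplying by Q_k(X) gives
        y = a_0 + a_1 X + ... + a_m X^m - b_1 X y - ... - b_k X^k y,
    which is linear in a_0..a_m and b_1..b_k. Returns (a, b), lowest power first.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = x ** q
    columns = [X ** i for i in range(m + 1)] + [-(X ** j) * y for j in range(1, k + 1)]
    sol, *_ = np.linalg.lstsq(np.column_stack(columns), y, rcond=None)
    return sol[:m + 1], np.concatenate(([1.0], sol[m + 1:]))


def eval_rational(a, b, x, q=1.0):
    """Evaluate P_m(X) / Q_k(X) at X = x**q (np.polyval wants the highest power first)."""
    X = np.asarray(x, dtype=float) ** q
    return np.polyval(a[::-1], X) / np.polyval(b[::-1], X)


# Noise-free data from (1 + 2 sqrt(x)) / (1 + 0.5 sqrt(x)), i.e. m = k = 1 with q = 0.5.
x = np.linspace(0.1, 10.0, 30)
y = (1 + 2 * np.sqrt(x)) / (1 + 0.5 * np.sqrt(x))
a_hat, b_hat = fit_rational_linearized(x, y, m=1, k=1, q=0.5)
```

On exact data the linearization recovers the coefficients. On noisy data it minimizes residuals weighted by Q_k(X) rather than the RSS itself, which is why a dedicated method, such as that of [3], is needed when the RSS criterion is used.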

Starting from 3 coefficients, the program increases the number of coefficients of the approximation function until the maximum number entered by the user (or the required accuracy) is reached. At each step it varies the form of the approximation function and chooses the best approximation, using the standard deviation as the criterion. The calculation stops when either of the following two conditions is met:

The approximation error is equal to or less than the value entered by the user. The approximation error is calculated as the ratio of the maximum error of approximation to the mean absolute value of the ordinates of all given points.
The current number of coefficients of the approximation function equals the maximum entered by the user. This condition is important when the user prefers a shorter approximation formula to higher accuracy.

The methodology is described in the conference paper [4].
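The search and stopping rules can be sketched as follows. Assumptions: the fitter is a linearized least-squares stand-in (multiplying through by the denominator), not the method of [3]; the "standard deviation" is taken as the root-mean-square residual; the q grid is a hypothetical sample of (0.1, 2]; and `search_pade` with its parameters is a hypothetical name.

```python
import numpy as np


def _fit_linearized(X, y, m, k):
    # Stand-in fitter: least squares on y * Q_k(X) = P_m(X), with Q_k(0) = 1.
    cols = [X ** i for i in range(m + 1)] + [-(X ** j) * y for j in range(1, k + 1)]
    sol, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
    a, b = sol[:m + 1], np.concatenate(([1.0], sol[m + 1:]))
    return np.polyval(a[::-1], X) / np.polyval(b[::-1], X)


def relative_error(y, y_fit):
    # Maximum absolute error divided by the mean absolute ordinate.
    return np.max(np.abs(y - y_fit)) / np.mean(np.abs(y))


def search_pade(x, y, target_error, max_coefs, q_grid=(0.25, 0.5, 1.0, 1.5, 2.0)):
    """Return (rms, m, k, q, rel_error) of the best fit found before stopping."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for n_coefs in range(3, max_coefs + 1):  # n_coefs = (m + 1) + k
        for k in range(n_coefs):
            m = n_coefs - 1 - k
            for q in q_grid:
                y_fit = _fit_linearized(x ** q, y, m, k)
                if not np.all(np.isfinite(y_fit)):
                    continue  # pole on a data point: skip this form
                rms = np.sqrt(np.mean((y - y_fit) ** 2))
                if best is None or rms < best[0]:
                    best = (rms, m, k, q, relative_error(y, y_fit))
        # Condition 1: required accuracy reached. Condition 2 is the loop bound max_coefs.
        if best is not None and best[4] <= target_error:
            break
    return best


x = np.linspace(0.5, 20.0, 40)
y = 1.0 + 2.0 * np.sqrt(x)
best = search_pade(x, y, target_error=1e-6, max_coefs=6)
```

Because every (m, k, q) with the same total number of coefficients is tried before that number grows, the loop returns the shortest formula that meets the target, here one with three coefficients.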

We expect this calculation to be useful to researchers and engineers looking for a good approximation of data with a simple formula. To check the efficiency of the suggested calculation, please see the Examples.

References

[1] W. B. Nelson, Recurrent Events Data Analysis for Product Repairs, Disease Recurrences, and Other Applications, 2003.
[2] G. A. Baker, Jr., P. Graves-Morris, Padé Approximants.
[3] G. Yevkin, A. Yevkin, On regression analysis with Pade approximants, Communications in Statistics - Theory and Methods, November 2023.
[4] A. Yevkin, V. Krivtsov, Modeling Recurrent Failure Processes using Padé Approximants, Proceedings of the Reliability and Maintainability Symposium (RAMS 2025), Florida, US, 2025.