OpenUCT :: Browsing by Subject "Algorithms"



Open Access
A comparison of the conditional inference survival forest model to random survival forests based on a simulation study as well as on two applications with time-to-event data
(2017) Nasejje, Justine B; Mwambi, Henry; Sabur, Natasha F; Lesosky, Maia
Abstract. Background: Random survival forest (RSF) models have been identified as alternatives to the Cox proportional hazards model for analysing time-to-event data. These methods, however, have been criticised for a bias in favour of covariates with many split-points, and conditional inference forests for time-to-event data have therefore been suggested. Conditional inference forests (CIF) are known to correct this bias in RSF models by separating the selection of the covariate to split on from the search for the best split point on that covariate. Methods: In this study, we compare the random survival forest model to the conditional inference forest (CIF) model using twenty-two simulated time-to-event datasets. We also analysed two real time-to-event datasets. The first dataset is based on the survival of children under five years of age in Uganda and consists of categorical covariates, most of which have more than two levels (many split-points). The second dataset is based on the survival of patients with extensively drug-resistant tuberculosis (XDR TB) and consists mainly of categorical covariates with two levels (few split-points). Results: Based on bootstrap cross-validated estimates of the integrated Brier score, the conditional inference forest model is superior to random survival forest models in analysing time-to-event data whose covariates have many split-points. However, conditional inference forests perform comparably to random survival forest models on time-to-event data whose covariates have fewer split-points. Conclusion: Although survival forests are promising methods for analysing time-to-event data, it is important to identify the best forest model for the analysis based on the nature of the covariates in the dataset in question.
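The comparison above rests on the integrated Brier score. Below is a minimal sketch of the standard inverse-probability-of-censoring-weighted (IPCW) Brier score and its time-integrated version. Assumptions: the censoring distribution G is estimated by Kaplan-Meier, the integral is approximated by the trapezoid rule on an evenly spaced grid (`n_grid` is a hypothetical parameter), and the score is computed on a single sample. The bootstrap cross-validation used in the study is not reproduced, and all function names are illustrative.

```python
from typing import Callable, List, Sequence


def km_censoring(times: Sequence[float], events: Sequence[int]) -> Callable[..., float]:
    """Kaplan-Meier estimate of the censoring survival G(t) = P(C > t).

    Censored observations (event == 0) play the role of 'events' here.
    """
    steps = []  # (time, value of G just after that time)
    g = 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for ti in times if ti >= t)
        censored = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 0)
        g *= 1.0 - censored / at_risk
        steps.append((t, g))

    def G(t: float, left: bool = False) -> float:
        # left=True gives the left limit G(t-), excluding a jump exactly at t
        value = 1.0
        for s, v in steps:
            if s < t or (s == t and not left):
                value = v
            else:
                break
        return value

    return G


def brier_score(t, times, events, surv_at_t, G) -> float:
    """IPCW Brier score at time t; surv_at_t[i] is the predicted S(t | x_i)."""
    total = 0.0
    for ti, ei, s in zip(times, events, surv_at_t):
        if ti <= t and ei == 1:        # died by t: prediction should be 0
            total += s ** 2 / G(ti, left=True)
        elif ti > t:                   # still at risk at t: prediction should be 1
            total += (1.0 - s) ** 2 / G(t)
        # observations censored before t contribute 0 (handled by the weights)
    return total / len(times)


def integrated_brier(times, events, surv_fns: List[Callable[[float], float]],
                     tau: float, n_grid: int = 100) -> float:
    """Trapezoid-rule average of BS(t) over [0, tau]; choose tau where G(tau) > 0."""
    G = km_censoring(times, events)
    grid = [tau * k / n_grid for k in range(n_grid + 1)]
    scores = [brier_score(t, times, events, [f(t) for f in surv_fns], G) for t in grid]
    area = sum((scores[k] + scores[k + 1]) / 2 * (grid[k + 1] - grid[k])
               for k in range(n_grid))
    return area / tau
```

Lower is better: a model that predicts each subject's survival curve perfectly scores 0, while an uninformative constant prediction of 0.5 scores 0.25 when there is no censoring.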
Open Access
The estimation of missing values in hydrological records using the EM algorithm and regression methods
(1988) Makhuvha, Tondani; Zucchini, Walter; Sparks, Ross S
The objective of this thesis is to review existing methods for estimating missing values in rainfall records and to propose a number of new procedures. Two classes of methods are considered. The first is based on the theory of variable selection in regression. Here the emphasis is on finding efficient methods to identify the set of control stations which are likely to yield the best regression estimates of the missing values in the target station. The second class of methods is based on the EM algorithm, proposed by Dempster, Laird and Rubin (1977). The emphasis here is to estimate the missing values directly without first making a detailed selection of control stations. All "relevant" stations are included. This method has not previously been applied in the context of estimating missing rainfall values.
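The second class of methods can be illustrated with the classical EM algorithm for a multivariate normal model with missing entries. In this sketch, each column is a station and each row is a time period, and every station is used, with no prior selection of control stations. Assumptions: the abstract does not state the thesis's exact model, so the multivariate normal form, the function name `em_impute` and its parameters are illustrative. Raw rainfall is skewed, and in practice a transformation may be needed first.

```python
import numpy as np


def em_impute(X, n_iter=200, tol=1e-10):
    """EM for a multivariate normal with NaN-coded missing values.

    Returns (filled data, mean vector, covariance matrix).
    """
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    n, p = X.shape
    mu = np.nanmean(X, axis=0)
    sigma = np.cov(np.where(miss, mu, X), rowvar=False, bias=True)
    filled = np.where(miss, mu, X)
    for _ in range(n_iter):
        # E-step: conditional means of missing entries given observed ones
        filled = X.copy()
        extra = np.zeros((p, p))  # summed conditional covariances of missing parts
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            if o.any():
                # regression coefficients Sigma_mo Sigma_oo^{-1}
                coef = np.linalg.solve(sigma[np.ix_(o, o)], sigma[np.ix_(o, m)]).T
                filled[i, m] = mu[m] + coef @ (X[i, o] - mu[o])
                extra[np.ix_(m, m)] += sigma[np.ix_(m, m)] - coef @ sigma[np.ix_(o, m)]
            else:
                filled[i, m] = mu[m]
                extra[np.ix_(m, m)] += sigma[np.ix_(m, m)]
        # M-step: update mean and covariance from expected sufficient statistics
        new_mu = filled.mean(axis=0)
        centred = filled - new_mu
        new_sigma = (centred.T @ centred + extra) / n
        done = (np.max(np.abs(new_mu - mu)) < tol
                and np.max(np.abs(new_sigma - sigma)) < tol)
        mu, sigma = new_mu, new_sigma
        if done:
            break
    return filled, mu, sigma
```

When only the target station has gaps, the imputed values coincide with regression predictions from all other stations at convergence. This is where the EM route connects to the regression-based first class of methods.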
Open Access
Finding regular simple paths in graph databases
(1995) Mendelzon, Alberto O; Wood, Peter T
We consider the following problem: given a labelled directed graph G and a regular expression R, find all pairs of nodes connected by a simple path such that the concatenation of the labels along the path satisfies R. The problem is motivated by the observation that many recursive queries in relational databases can be expressed in this form, and by the implementation of a query language, G+, based on this observation. We show that the problem is in general intractable, but present an algorithm that runs in polynomial time in the size of the graph when the regular expression and the graph are free of conflicts. We also present a class of languages whose expressions can always be evaluated in time polynomial in the size of both the graph and the expression, and characterize syntactically the expressions for such languages.
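The problem statement can be made concrete with a brute-force reference sketch. It enumerates every simple path by depth-first search and tests the label string against R. Its worst-case cost is exponential, consistent with the general intractability result, and it is not the paper's polynomial-time algorithm for conflict-free cases. Assumptions: edge labels are single characters, R is written in Python `re` syntax, and a zero-length path (empty label string) counts as a simple path from a node to itself. The function name is hypothetical.

```python
import re
from collections import defaultdict


def regular_simple_path_pairs(edges, pattern):
    """Return all (x, y) joined by a simple path whose labels fully match pattern.

    edges: iterable of (source, label, target) with single-character labels.
    """
    regex = re.compile(pattern)
    adj = defaultdict(list)
    nodes = set()
    for u, label, v in edges:
        adj[u].append((label, v))
        nodes.update((u, v))
    result = set()

    def dfs(start, node, visited, labels):
        if regex.fullmatch("".join(labels)):
            result.add((start, node))
        for label, nxt in adj[node]:
            if nxt not in visited:  # simple path: never revisit a node
                visited.add(nxt)
                labels.append(label)
                dfs(start, nxt, visited, labels)
                labels.pop()
                visited.remove(nxt)

    for s in nodes:
        dfs(s, s, {s}, [])
    return result
```

The simplicity constraint is what separates this problem from ordinary regular path queries. On a triangle 1 -a-> 2 -a-> 3 -a-> 1 with R = `(aa)*`, the walk 1, 2, 3, 1, 2 has an even-length label string, but it repeats nodes. So (1, 2) is not in the result, because the only simple path from 1 to 2 has length one.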
Open Access
Optimising regionalisation techniques: identifying centres of endemism in the extraordinarily endemic-rich Cape Floristic Region
(Public Library of Science, 2015) Bradshaw, Peter L; Colville, Jonathan F; Linder, H Peter
We used a very large dataset (>40% of all species) from the endemic-rich Cape Floristic Region (CFR) to explore how different weighting techniques, coefficients for calculating similarity among cells, and clustering approaches affect biogeographical regionalisation. The results were used to revise the biogeographical subdivision of the CFR. We show that weighted data (down-weighting widespread species), similarity calculated using Kulczinsky's second measure, and clustering using UPGMA resulted in the optimal classification. This combination maximized the number of endemic species, the number of centres recognized, and the number of operational geographic units assigned to centres of endemism (CoEs). We developed a dendrogram branch order cut-off (BOC) method to locate the optimal cut-off points on the dendrogram to define candidate clusters. Dendrograms based on Kulczinsky's second measure were combined into a consensus, identifying areas of conflict that could be due to biotic element overlap or transitional areas. Post-clustering GIS manipulation substantially enhanced the endemic composition and geographic size of candidate CoEs. Although there was broad spatial congruence with previous phytogeographic studies, our techniques allowed for the recovery of additional phytogeographic detail not previously described for the CFR.
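The winning pipeline (weighted data, Kulczinsky's second measure, UPGMA) can be sketched as follows. Assumptions: the abstract says only that widespread species were down-weighted, so scoring each species as 1 / (number of cells it occupies) is a hypothetical stand-in. The similarity uses the quantitative form 0.5 * (sum(min) / sum(x) + sum(min) / sum(y)), which reduces to the usual presence/absence form 0.5 * (a/(a+b) + a/(a+c)). UPGMA is implemented naively as average linkage. The BOC cut-off and consensus steps are not shown, and all names are illustrative.

```python
from itertools import combinations


def downweight(cells):
    """Hypothetical weighting: a species present in k cells scores 1/k."""
    n_species = len(cells[0])
    occupancy = [sum(1 for c in cells if c[j] > 0) for j in range(n_species)]
    return [[c[j] / occupancy[j] if c[j] > 0 else 0.0 for j in range(n_species)]
            for c in cells]


def kulczynski2(x, y):
    """Kulczinsky's second similarity (quantitative form)."""
    shared = sum(min(a, b) for a, b in zip(x, y))
    sx, sy = sum(x), sum(y)
    if sx == 0 or sy == 0:
        return 0.0
    return 0.5 * (shared / sx + shared / sy)


def upgma(dist, labels):
    """Average-linkage agglomeration; returns [(members_a, members_b, height), ...]."""
    clusters = {i: [i] for i in range(len(labels))}
    next_id = len(labels)
    merges = []

    def avg(a, b):
        pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        height = avg(a, b)
        merges.append(([labels[i] for i in clusters[a]],
                       [labels[i] for i in clusters[b]], height))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


def regionalise(cells, labels):
    """Weight species, convert similarity to distance (1 - S), then cluster."""
    weighted = downweight(cells)
    dist = [[1.0 - kulczynski2(a, b) for b in weighted] for a in weighted]
    return upgma(dist, labels)
```

Each row of `cells` is an operational geographic unit and each column a species (presence = 1). Cutting the resulting dendrogram at some height yields candidate clusters.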
Open Access
Purely competitive evolutionary dynamics for games
(2012) Veller, Carl; Rajpaul, Vinesh
We introduce and analyze a purely competitive dynamics for the evolution of an infinite population subject to a 3-strategy game. We argue that this dynamics represents a characterization of how certain systems, both natural and artificial, are governed. In each period, the population is randomly sorted into pairs, which engage in a once-off play of the game; the probability that a member propagates its type to its offspring is proportional only to its payoff within the pair. We show that if a type is dominant (obtains higher payoffs in games with both other types), its 'pure' population state, comprising only members of that type, is globally attracting. If there is no dominant type, there is an unstable 'mixed' fixed point; the population state eventually oscillates between the three near-pure states. We then allow for mutations, where offspring have a non-zero probability of randomly changing their type. In this case, the existence of a dominant type renders a point near its pure state globally attracting. If no dominant type exists, a supercritical Hopf bifurcation occurs at the unique mixed fixed point, and above a critical (typically low) mutation rate, this fixed point becomes globally attracting: the implication is that even very low mutation rates can stabilize a system that would, in the absence of mutations, be unstable.
Open Access
A variational Bayes approach to the analysis of occupancy models
(Public Library of Science, 2016) Clark, Allan E; Altwegg, Res; Ormerod, John T
Detection-nondetection data are often used to investigate species range dynamics using Bayesian occupancy models, which rely on Markov chain Monte Carlo (MCMC) methods to sample from the posterior distribution of the model parameters. In this article we develop two Variational Bayes (VB) approximations to the posterior distribution of the parameters of a single-season site occupancy model, which uses logistic link functions to model the probability of species occurrence at sites and the species detection probabilities. This task is accomplished through the development of iterative algorithms that do not use MCMC methods. Simulations and small practical examples demonstrate the effectiveness of the proposed technique. We specifically show that (under certain circumstances) the variational distributions can provide accurate approximations to the true posterior distributions of the model parameters when the number of visits per site (K) is as low as three, and that the accuracy of the approximations improves as K increases. We also show that the methodology can be used to obtain the posterior distribution of the predictive distribution of the proportion of sites occupied (PAO).
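The flavour of an iterative, MCMC-free VB scheme can be shown on a deliberately simplified occupancy model. The logistic links of the article are replaced by constant occupancy psi and detection p with Beta priors. This makes every mean-field update conjugate: q(psi) and q(p) are Beta, and each site's latent occupancy z_i is Bernoulli. Assumptions: this simplification, the uniform Beta(1, 1) priors, and all names (`vb_occupancy`, `digamma`) are illustrative and not the article's algorithms. The digamma function uses the standard recurrence plus asymptotic series, because Python's `math` module does not provide one.

```python
import math
from typing import List, Tuple


def digamma(x: float) -> float:
    """Digamma via psi(x) = psi(x + 1) - 1/x, then an asymptotic series for x >= 6."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + math.log(x) - 0.5 * inv - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def vb_occupancy(detections: List[int], K: int, a_psi=1.0, b_psi=1.0,
                 a_p=1.0, b_p=1.0, n_iter=100) -> Tuple[float, float, float]:
    """Mean-field VB for constant psi and p.

    detections[i] = number of detections in K visits at site i.
    Returns (E[psi], E[p], q(z_i = 1) for a site with no detections).
    """
    n = len(detections)
    ez = [1.0 if d > 0 else 0.5 for d in detections]
    for _ in range(n_iter):
        # q(psi) = Beta(a + sum E[z], b + n - sum E[z])
        s = sum(ez)
        alpha_psi, beta_psi = a_psi + s, b_psi + n - s
        # q(p): only (expectedly) occupied sites carry information on detection
        alpha_p = a_p + sum(z * d for z, d in zip(ez, detections))
        beta_p = b_p + sum(z * (K - d) for z, d in zip(ez, detections))
        # q(z_i) for all-zero histories: log-odds of occupied vs unoccupied
        e_log_psi = digamma(alpha_psi) - digamma(alpha_psi + beta_psi)
        e_log_not_psi = digamma(beta_psi) - digamma(alpha_psi + beta_psi)
        e_log_not_p = digamma(beta_p) - digamma(alpha_p + beta_p)
        q0 = 1.0 / (1.0 + math.exp(-(e_log_psi - e_log_not_psi + K * e_log_not_p)))
        ez = [1.0 if d > 0 else q0 for d in detections]  # a detection implies occupancy
    return alpha_psi / (alpha_psi + beta_psi), alpha_p / (alpha_p + beta_p), q0
```

The K * E[log(1 - p)] term shows why more visits help. Each extra visit without a detection lowers the probability that the site is nonetheless occupied, so the latent occupancy states, and with them psi and p, become better identified.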

