1 |
Nonlinear manipulation and analysis of large DNA datasets
Cui, Meiying; Zhao, Xueping; Reddavide, Francesco V.; Patino Gaillez, Michelle; Heiden, Stephan; Mannocci, Luca; Thompson, Michael; Zhang, Yixin (05 March 2024)
Information processing functions are essential for organisms to perceive and react to their complex environment, and for humans to analyze and rationalize it. While our brain is extraordinary at processing complex information, winner-take-all, a type of biased competition, is one of the simplest models of lateral inhibition and competition among biological neurons. It has been implemented as DNA-based neural networks, for example, to mimic pattern recognition. However, the utility of DNA-based computation in information processing for real biotechnological applications remains to be demonstrated. In this paper, a biased competition method for nonlinear manipulation and analysis of mixtures of DNA sequences was developed. Unlike in conventional biological experiments, selected species were not directly subjected to analysis. Instead, parallel computation among a myriad of different DNA sequences was carried out to reduce the information entropy. The method could be used for various oligonucleotide-encoded libraries, as we have demonstrated by applying it to decoding and data analysis for selection experiments with DNA-encoded chemical libraries against protein targets.
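
The abstract does not spell out the mechanism, so the following is only a numerical analogy for what "reducing the information entropy" of a sequence mixture means. It computes the Shannon entropy of a set of sequence abundances before and after a nonlinear sharpening step. The function `sharpen` and its power-law form are hypothetical stand-ins chosen for illustration and are not the authors' biased competition method.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a mixture described by abundance counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def sharpen(counts, exponent=2.0):
    """Hypothetical nonlinear step: raise each abundance to a power > 1.

    This favours already abundant sequences, loosely mimicking a
    winner-take-all bias. It is an assumption made for illustration only.
    """
    return [c ** exponent for c in counts]


# Abundances of four sequences in a mixture (made-up illustrative numbers).
mixture = [8, 4, 2, 2]
before = shannon_entropy(mixture)
after = shannon_entropy(sharpen(mixture))
print(f"entropy before: {before:.3f} bits, after: {after:.3f} bits")
```

A uniform mixture has the highest entropy; the more the abundance concentrates on a few sequences, the lower the entropy and the easier the dominant species are to read out.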
|
2 |
pcApriori: Scalable apriori for multiprocessor systems
Schlegel, Benjamin; Kiefer, Tim; Kissinger, Thomas; Lehner, Wolfgang (16 September 2022)
Frequent-itemset mining is an important part of data mining. It is a computationally and memory-intensive task with a large number of scientific and statistical application areas. In many of them, the datasets can easily grow to tens or even several hundred gigabytes. Hence, efficient algorithms are required to process such amounts of data. In recent years, many efficient sequential mining algorithms have been proposed; however, they cannot exploit current and future systems that provide large degrees of parallelism. In contrast, the number of parallel frequent-itemset mining algorithms is rather small, and most of them do not scale well as the number of threads increases substantially. In this paper, we present a highly scalable mining algorithm that is based on the well-known Apriori algorithm; it is optimized for processing very large datasets on multiprocessor systems. The key idea of pcApriori is to employ a modified producer-consumer processing scheme, which partitions the data during processing and distributes it to the available threads. We conduct many experiments on large datasets. pcApriori scales almost linearly on our test system comprising 32 cores.
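
To make the producer-consumer idea concrete, here is a minimal sketch of textbook Apriori in which the support-counting pass is fed through a queue: one producer partitions the transactions into chunks and several consumer threads count candidates on the chunks they receive. This is not pcApriori itself, whose modified scheme the abstract does not detail. The names `count_support`, `apriori`, `num_workers`, and `chunk_size` are assumptions for illustration, and CPython threads will not show real parallel speedup here because of the global interpreter lock.

```python
import queue
import threading
from collections import Counter
from itertools import combinations


def count_support(transactions, candidates, num_workers=4, chunk_size=2):
    """Count candidate supports with one producer and several consumers."""
    work = queue.Queue(maxsize=2 * num_workers)
    partial_counts = []
    lock = threading.Lock()

    def consumer():
        local = Counter()  # thread-local counts avoid contention
        while True:
            chunk = work.get()
            if chunk is None:  # sentinel: no more work
                break
            for transaction in chunk:
                for candidate in candidates:
                    if candidate <= transaction:  # subset test
                        local[candidate] += 1
        with lock:
            partial_counts.append(local)

    workers = [threading.Thread(target=consumer) for _ in range(num_workers)]
    for worker in workers:
        worker.start()
    # Producer: partition the data and hand the chunks to the consumers.
    for start in range(0, len(transactions), chunk_size):
        work.put(transactions[start:start + chunk_size])
    for _ in workers:
        work.put(None)
    for worker in workers:
        worker.join()

    total = Counter()
    for local in partial_counts:
        total.update(local)  # Counter.update adds counts
    return total


def apriori(transactions, min_support, num_workers=4):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while candidates:
        counts = count_support(transactions, candidates, num_workers)
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        size += 1
        # Join step: unite frequent itemsets into candidates one item larger.
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every subset one item smaller must itself be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent


baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(baskets, min_support=3)
print(sorted((sorted(itemset), n) for itemset, n in result.items()))
```

Keeping the counts thread-local and merging them once at the end is one simple way to avoid synchronizing on every increment.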
|
3 |
Analysis of MRI and CT-based radiomics features for personalized treatment in locally advanced rectal cancer and external validation of published radiomics models
Shahzadi, Iram; Zwanenburg, Alex; Lattermann, Annika; Linge, Annett; Baldus, Christian; Peeken, Jan C.; Combs, Stephanie E.; Diefenhardt, Markus; Rödel, Claus; Kirste, Simon; Grosu, Anca-Ligia; Baumann, Michael; Krause, Mechthild; Troost, Esther G. C.; Löck, Steffen (05 April 2024)
Radiomics analyses commonly apply imaging features of different complexity to predict the endpoint of interest. However, the prognostic value of each feature class is generally unclear. Furthermore, many radiomics models lack the independent external validation that is decisive for their clinical application. Therefore, in this manuscript we present two complementary studies. In our modelling study, we developed and validated different radiomics signatures for outcome prediction after neoadjuvant chemoradiotherapy (nCRT) in patients with locally advanced rectal cancer (LARC), based on computed tomography (CT) and T2-weighted (T2w) magnetic resonance (MR) imaging datasets from four independent institutions (training: 122 patients, validation: 68 patients). We compared different feature classes extracted from the gross tumour volume for the prognosis of tumour response and freedom from distant metastases (FFDM): morphological and first order (MFO) features, second order texture (SOT) features, and Laplacian of Gaussian (LoG) transformed intensity features. Analyses were performed for CT and MRI separately and combined. Model performance was assessed by the area under the curve (AUC) and the concordance index (CI) for tumour response and FFDM, respectively. Overall, intensity features of LoG transformed CT and MR imaging combined with clinical T stage (cT) showed the best performance for tumour response prediction, while SOT features showed good performance for FFDM in independent validation (AUC = 0.70, CI = 0.69). In our external validation study, we aimed to validate previously published radiomics signatures on our multicentre cohort. We identified relevant publications on comparable patient datasets through a literature search and applied the reported radiomics models to our dataset. Only one of the identified studies could be validated, indicating an overall lack of reproducibility and the need for further standardization of radiomics before clinical application.
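
For readers unfamiliar with the two performance measures named above, the following sketch computes them from scratch on made-up numbers. It uses the standard pairwise definitions: AUC for a binary endpoint such as tumour response, and Harrell's concordance index for a right-censored time-to-event endpoint such as FFDM. Whether the authors used exactly these estimators is not stated in the abstract, and the convention that a higher score means higher risk (shorter time to event) is an assumption of this example.

```python
def auc(scores, labels):
    """Probability that a random positive case scores higher than a random
    negative case; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def concordance_index(times, events, risks):
    """Harrell's C for right-censored data.

    A pair (i, j) is comparable if patient i had an observed event
    before time j. It is concordant if patient i also has the higher risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up illustrative data, not taken from the study.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
print(concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.4, 0.5, 0.1]))
```

Both measures equal 0.5 for a model that ranks at random and 1.0 for a perfect ranking, which puts values around 0.70 in context.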
|
4 |
Efficient Query Processing for Dynamically Changing Datasets
Idris, Muhammad; Ugarte, Martín; Vansummeren, Stijn; Voigt, Hannes; Lehner, Wolfgang (11 August 2022)
The ability to efficiently analyze changing data is a key requirement of many real-time analytics applications. Traditional approaches to this problem were developed around the notion of Incremental View Maintenance (IVM), and are based either on the materialization of subresults (to avoid their recomputation) or on the recomputation of subresults (to avoid the space overhead of materialization). Both techniques are suboptimal: instead of materializing results and subresults, one may also maintain a data structure that supports efficient maintenance under updates and from which the full query result can quickly be enumerated. In two previous articles, we presented algorithms for dynamically evaluating queries that are easy to implement, efficient, and can be naturally extended to evaluate queries from a wide range of application domains. In this paper, we discuss our algorithm and its complexity, explaining the main components behind its efficiency. Finally, we show experiments that compare our algorithm to a state-of-the-art (Higher-order) IVM engine, as well as to a prominent complex event recognition engine. Our approach outperforms the competitor systems by up to two orders of magnitude in processing time and one order of magnitude in memory consumption.
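
The idea of maintaining a compact data structure instead of a materialized result can be illustrated on a single two-way join. The sketch below keeps hash indexes on the join attribute plus the set of join keys that currently have partners on both sides. Updates touch only those indexes, and the join result is enumerated on demand. This is a toy example under simplifying assumptions, not the authors' algorithm. The class `DynamicJoin` and the relation schemas R(a, b) and S(b, c) are hypothetical.

```python
from collections import defaultdict


class DynamicJoin:
    """Maintains R(a, b) JOIN S(b, c) under inserts and deletes
    without ever materializing the join result."""

    def __init__(self):
        self.r_by_b = defaultdict(set)  # b -> set of a values
        self.s_by_b = defaultdict(set)  # b -> set of c values
        self.live = set()               # join keys present on both sides

    def _refresh(self, b):
        # Keep `live` exact so enumeration never visits a dead key.
        if self.r_by_b.get(b) and self.s_by_b.get(b):
            self.live.add(b)
        else:
            self.live.discard(b)

    def insert_r(self, a, b):
        self.r_by_b[b].add(a)
        self._refresh(b)

    def delete_r(self, a, b):
        self.r_by_b[b].discard(a)
        self._refresh(b)

    def insert_s(self, b, c):
        self.s_by_b[b].add(c)
        self._refresh(b)

    def delete_s(self, b, c):
        self.s_by_b[b].discard(c)
        self._refresh(b)

    def results(self):
        """Enumerate the current join result lazily."""
        for b in self.live:
            for a in self.r_by_b[b]:
                for c in self.s_by_b[b]:
                    yield (a, b, c)


join = DynamicJoin()
join.insert_r(1, "x")
join.insert_r(2, "x")
join.insert_s("x", 10)
join.insert_s("y", 20)   # no partner in R yet, so it produces no output
print(sorted(join.results()))
join.delete_r(1, "x")
print(sorted(join.results()))
```

Each update costs constant expected time regardless of how large the join result is, whereas a materialized view would have to add or remove every affected output tuple.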
|