321 |
Prediction of DNA-Binding Proteins and their Binding Sites. Pokhrel, Pujan, 01 May 2018.
DNA-binding proteins play an important role in various essential biological processes such as DNA replication, recombination, repair, gene transcription, and expression. The identification of DNA-binding proteins and the residues involved in the contacts is important for understanding the DNA-binding mechanism in proteins. Moreover, it has been reported in the literature that mutations of some DNA-binding residues on proteins are associated with disease. The identification of these proteins and their binding mechanism generally requires experimental techniques, which makes large-scale study extremely difficult. Thus, the prediction of DNA-binding proteins and their binding sites from sequence alone is one of the most challenging problems in the field of genome annotation. Since the start of the Human Genome Project, many attempts have been made to solve the problem with different approaches, but the accuracy of these methods is still not suitable for large-scale annotation of proteins. Rather than relying solely on existing machine learning techniques, I sought to combine them using a novel “stacking technique” and used problem-specific architectures to solve the problem with better accuracy than existing methods. This thesis presents a possible solution to the DNA-binding protein prediction problem that performs better than state-of-the-art approaches.
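The “stacking technique” mentioned above combines base predictors through a meta-learner. A minimal sketch of such a stacked ensemble is shown below, assuming sequence-derived feature vectors have already been computed; the base learners, meta-learner, and placeholder data are illustrative choices, not the architecture used in the thesis.

```python
# Minimal stacking sketch (illustrative only; the base learners, meta-learner,
# and feature matrix X are assumptions, not the thesis's actual architecture).
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# X: per-protein feature vectors derived from sequence (e.g., amino-acid
# composition or PSSM summaries); y: 1 = DNA-binding, 0 = non-binding.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))       # placeholder features
y = rng.integers(0, 2, size=200)     # placeholder labels

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # meta-learner combines base predictions
    cv=5,                                  # out-of-fold predictions avoid leakage
)
print(cross_val_score(stack, X, y, cv=5, scoring="accuracy").mean())
```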
|
322 |
Automatic Synthesis of VLSI Layout for Analog Continuous-time Filters. Robinson, David Lyle, 17 March 1995.
Automatic synthesis of digital VLSI layout has been available for many years. It has become a necessary part of the design industry as the window of time from conception to production shrinks with ever-increasing competition. However, automatic synthesis of analog VLSI layout remains rare. With digital circuits, there is often room for signal drift: a signal can drift within a range before reaching the threshold that triggers a change in logic state. The effect of parasitic capacitances, for the most part, degrades the timing margins of the signal but not its functionality; the logic functionality is protected by the inherent noise immunity of digital circuits. With analog circuits, however, there is little room for drift. Parasitic influence directly affects signal integrity and the functionality of the circuit. The underlying problem automatic VLSI layout programs face is how to minimize this influence. This thesis describes a software tool that was written to show that the minimization of parasitic influence is possible in the case of automatic layout of continuous-time filters using transconductance-capacitor methods.
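As a reference point for why parasitics matter in this setting, the sketch below uses the textbook relation for a first-order transconductance-capacitor (Gm-C) stage, whose pole sits at f = gm / (2πC), to show how added routing capacitance shifts the filter response; the component values are assumed for illustration and are not taken from the thesis.

```python
# Illustrative sketch (not the thesis tool): how layout parasitics shift the
# pole of a first-order Gm-C stage, using the textbook relation f = gm / (2*pi*C).
import math

def gm_c_pole_hz(gm_siemens: float, c_farads: float) -> float:
    """Pole frequency of a single Gm-C integrator stage."""
    return gm_siemens / (2.0 * math.pi * c_farads)

gm = 50e-6             # 50 uA/V transconductor (assumed value)
c_design = 2e-12       # 2 pF intended integrating capacitor
c_parasitic = 0.3e-12  # 0.3 pF of routing/junction parasitics added by layout

f_ideal = gm_c_pole_hz(gm, c_design)
f_actual = gm_c_pole_hz(gm, c_design + c_parasitic)
print(f"ideal pole: {f_ideal / 1e6:.2f} MHz")
print(f"with parasitics: {f_actual / 1e6:.2f} MHz "
      f"({100 * (f_actual / f_ideal - 1):.1f}% shift)")
```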
|
323 |
Developing a national frame of reference on student achievement by weighting student records from a state assessment. Tudor, Joshua, 01 May 2015.
A fundamental issue in educational measurement is what frame of reference to use when interpreting students’ performance on an assessment. One frame of reference that is often used to enhance interpretations of test scores is a normative one, which adds meaning by indicating the rank of an individual’s score within a distribution of test scores from a well-defined reference group. One of the most commonly used frames of reference on student achievement provided by test publishers of large-scale assessments is national norms, whereby students’ test scores are referenced to a distribution of scores of a nationally representative sample. A national probability sample can fail to fully represent the population because of student and school nonparticipation. In practice, this is remedied by weighting the sample so that it better represents the intended reference population.
The focus of this study was on weighting and determining the extent to which weighting grade 4 and grade 8 student records that are not fully representative of the nation can recover distributions of reading and math scores in a national probability sample. Data from a statewide testing program were used to create six grade 4 and grade 8 datasets, each varying in its degree of representativeness of the nation, as well as in the proximity of its reading and math distributions to those of a national sample. The six datasets created for each grade were separately weighted to different population totals in two different weighting conditions using four different bivariate stratification designs. The weighted distributions were then smoothed and compared to smoothed distributions of the national sample in terms of descriptive statistics, maximum absolute differences between the relative cumulative frequency distributions, and chi-square effect sizes. The impact of using percentile ranks developed from the state data was also investigated.
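The weighting described above is essentially post-stratification: each weighting cell receives a weight equal to its national population total divided by its sample count. A minimal sketch under assumed cell labels and totals (the actual study used bivariate cells such as median household income by school ethnic composition) is given below.

```python
# Post-stratification weighting sketch (hypothetical cells, counts, and totals;
# not the datasets or stratification designs used in the study).
import pandas as pd

# Each student record carries the stratification cell it falls into.
state = pd.DataFrame({
    "cell":  ["low_inc/high_min", "low_inc/low_min",
              "high_inc/high_min", "high_inc/low_min"] * 3,
    "score": [180, 200, 210, 230, 185, 205, 215, 235, 190, 210, 220, 240],
})

# Hypothetical national population totals for the same cells.
national_totals = {"low_inc/high_min": 900_000, "low_inc/low_min": 700_000,
                   "high_inc/high_min": 600_000, "high_inc/low_min": 800_000}

sample_counts = state["cell"].value_counts()
state["weight"] = state["cell"].map(lambda c: national_totals[c] / sample_counts[c])

# The weighted mean now reflects the national cell composition, not the state's.
weighted_mean = (state["score"] * state["weight"]).sum() / state["weight"].sum()
print(round(weighted_mean, 1))
```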
By and large, the smoothed distributions of the weighted datasets were able to recover the national distribution in each content area, grade, and weighting condition. Weighting the datasets to the nation was effective in making the state test score distributions more similar to the national distributions. Moreover, the stratification design that defined weighting cells by the joint distribution of median household income and ethnic composition of the school consistently produced desirable results for the six datasets used in each grade. Log-linear smoothing using a polynomial of degree 4 was effective in making the weighted distributions even more similar to those in the national sample. Investigation of the impact of using the percentile ranks derived from the state datasets revealed that, for the distributions most similar to the national distributions, classifying student performance by the raw scores associated with the same percentile rank in each dataset produced a high percentage of agreement. The utility of having a national frame of reference on student achievement, and the efficacy of estimating such a frame of reference from existing data, are also discussed.
|
324 |
Machine-learning based automated segmentation tool development for large-scale multicenter MRI data analysis. Kim, Eun Young, 01 December 2013.
Background: Volumetric analysis of brain structures from structural Magnetic Resonance (MR) images advances the understanding of the brain by providing means to study brain morphometric changes quantitatively across aging, development, and disease status. Due to the recent increased emphasis on large-scale multicenter brain MR study design, the demand for an automated brain MRI processing tool has increased as well. This dissertation describes an automatic segmentation framework for subcortical structures in brain MRI that is robust for a wide variety of MR data.
Method: The proposed segmentation framework, BRAINSCut, is an integration of robust data standardization techniques and machine-learning approaches. First, a robust multi-modal pre-processing tool for automated registration, bias correction, and tissue classification was implemented for large-scale heterogeneous multi-site longitudinal MR data analysis. The segmentation framework was then constructed to achieve robustness for large-scale data via the following comparative experiments: 1) find the best machine-learning algorithm among several available approaches in the field; 2) find an efficient intensity normalization technique for the proposed region-specific localized normalization with a choice of robust statistics; and 3) find high-quality features that best characterize the MR brain subcortical structures. Our tool is built upon 32 handpicked multi-modal multicenter MR images with manual traces of six subcortical structures (nucleus accumbens, caudate nucleus, globus pallidus, putamen, thalamus, and hippocampus) from three experts.
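A conceptual sketch of the voxel-wise classification idea follows: multi-modal intensities are robustly normalized, combined with a spatial prior into per-voxel features, and labeled by a random forest. This is an illustrative reconstruction with synthetic data, not the actual BRAINSCut implementation or its feature set.

```python
# Conceptual sketch of voxel-wise random-forest segmentation with robust,
# region-specific intensity normalization (synthetic data; not BRAINSCut itself).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def robust_normalize(x: np.ndarray) -> np.ndarray:
    """Center by the median and scale by the interquartile range."""
    q1, q3 = np.percentile(x, [25, 75])
    return (x - np.median(x)) / (q3 - q1 + 1e-9)

rng = np.random.default_rng(42)
n = 6000
t1_raw = rng.normal(100.0, 15.0, n)   # placeholder T1 intensities
t2_raw = rng.normal(80.0, 10.0, n)    # placeholder T2 intensities
prior = rng.uniform(0.0, 1.0, n)      # spatial probability-map prior

X = np.column_stack([robust_normalize(t1_raw), robust_normalize(t2_raw), prior])
# Placeholder "manual trace" labels; real labels come from expert tracings.
y = (prior + 0.2 * X[:, 0] + rng.normal(0, 0.3, n) > 0.8).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X[:5000], y[:5000])
pred = forest.predict(X[5000:])

# Dice overlap, the usual agreement measure against manual traces.
inter = np.sum((pred == 1) & (y[5000:] == 1))
dice = 2.0 * inter / (np.sum(pred == 1) + np.sum(y[5000:] == 1) + 1e-9)
print(f"Dice overlap on held-out voxels: {dice:.2f}")
```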
A fundamental task associated with brain MR image segmentation for research and clinical trials is the validation of segmentation accuracy. This dissertation evaluated the proposed segmentation framework in terms of validity and reliability. Three groups of data were employed for the various evaluation aspects: 1) traveling human phantom data for multicenter reliability, 2) a set of repeated scans for measurement stability across various disease statuses, and 3) large-scale data from a Huntington's disease (HD) study for software robustness as well as segmentation accuracy.
Result: Segmentation accuracy for the six subcortical structures was improved with 1) the bias-corrected inputs, 2) the two region-specific intensity normalization strategies, and 3) the random forest machine-learning algorithm with the selected feature-enhanced image. The analysis of traveling human phantom data showed no center-specific bias in volume measurements from BRAINSCut. The repeated-measure reliability of most structures also displayed no specific association with disease progression, except for the caudate nucleus in the group at high risk for HD. The constructed segmentation framework was successfully applied to multicenter MR data from the PREDICT-HD [133] study (< 10% failure rate over 3,000 scan sessions processed).
Conclusion: The random-forest-based segmentation method is effective and robust to large-scale multicenter data variation, especially with a proper choice of intensity normalization techniques. The benefits of proper normalization approaches are more apparent than those of the custom set of feature-enhanced images for the accuracy and robustness of the segmentation tool. BRAINSCut effectively produced subcortical volumetric measurements that are robust to center and disease status, with validity confirmed by human experts and a low failure rate on large-scale multicenter MR data. Sample size estimation, which is crucial for designing efficient clinical and research trials, is provided based on our experiments for the six subcortical structures.
|
325 |
Control of Large Stands of Phragmites australis in Great Salt Lake, Utah Wetlands. Cranney, Chad R., 01 May 2016.
Phragmites australis (hereafter Phragmites) often forms dense monocultures, which displace native plant communities and alter ecosystem functions and services. Managers tasked with controlling this plant need science-backed guidance on how to control Phragmites and restore native plant communities. This study took a large-scale approach, intended to better match the scale of actual restoration efforts, to compare two herbicides (glyphosate vs. imazapyr) and application timings (summer vs. fall). Five treatments were applied to 1.2 ha plots for three consecutive years: 1) summer glyphosate; 2) summer imazapyr; 3) fall glyphosate; 4) fall imazapyr; and 5) untreated control. Dead Phragmites was mowed following herbicide treatments in the first two years. Efficacy of treatments and the response of native plant communities were monitored for three years. We report that fall herbicide applications were superior to summer applications. No difference was found between the two herbicides in their ability to reduce Phragmites cover. Plant communities switched from emergent to open water communities and were limited by Phragmites litter and water depth. Although some plant communities showed a slow trajectory towards one of the reference sites, cover of important native emergent plants did not increase until year three and remained below 10%. These results suggest that fall is the best time to apply herbicides for effective large-scale control of Phragmites. Active restoration (e.g. seeding) may be needed to recover important native plant communities. Methods to reduce Phragmites litter after herbicide applications should be considered.
|
326 |
Scalable Energy-efficient Location-Aided Routing (SELAR) Protocol for Wireless Sensor Networks. Lukachan, George, 01 November 2005.
Large-scale wireless sensor networks consist of thousands of tiny, low-cost nodes with very limited energy, computing power, and communication capabilities. They have a myriad of possible applications: they can be used in hazardous and hostile environments to sense deadly gases and high temperatures, in personal area networks to monitor vital signs, and in military and civilian environments for intrusion detection and tracking, emergency operations, and more. In large-scale wireless sensor networks the protocols need to be scalable and energy-efficient. Further, new strategies are needed to address the well-known energy depletion problem faced by nodes close to the sink node. In this thesis the Scalable Energy-efficient Location-Aided Routing (SELAR) protocol for wireless sensor networks is proposed to solve the above-mentioned problems. In SELAR, nodes use location and energy information of their neighboring nodes to perform the routing function. Further, the sink node is moved during network operation to increase the network lifetime. By means of simulations, the SELAR protocol is evaluated and compared with two well-known protocols, LEACH (Low-Energy Adaptive Clustering Hierarchy) and MTE (Minimum Transmission Energy). The results indicate that in realistic scenarios, SELAR delivers up to 12 times more data packets to the base station than LEACH and up to 1.4 times more than MTE. The results also show that, for realistic scenarios, SELAR with a moving base station achieves up to 5 times more network lifetime than MTE and up to 27 times more than LEACH.
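The core routing decision, using neighbors' locations and residual energy, can be sketched as below; this is a simplified illustration of the general idea, and the exact SELAR forwarding rule in the thesis may differ.

```python
# Hedged sketch of location- and energy-aware next-hop selection: only neighbors
# that make geographic progress toward the sink are considered, and among those
# the node with the most residual energy is chosen.
import math
from dataclasses import dataclass

@dataclass
class Node:
    node_id: int
    x: float
    y: float
    energy_j: float  # residual energy in joules

def dist(a: Node, b: Node) -> float:
    return math.hypot(a.x - b.x, a.y - b.y)

def next_hop(current: Node, neighbors: list[Node], sink: Node) -> Node | None:
    """Return the best forwarding neighbor, or None if none is closer to the sink."""
    progress = [n for n in neighbors if dist(n, sink) < dist(current, sink)]
    if not progress:
        return None
    return max(progress, key=lambda n: n.energy_j)

sink = Node(0, 100.0, 100.0, float("inf"))
me = Node(1, 10.0, 10.0, 0.5)
nbrs = [Node(2, 20.0, 15.0, 0.9), Node(3, 25.0, 5.0, 0.2), Node(4, 5.0, 5.0, 1.0)]
print(next_hop(me, nbrs, sink).node_id)  # -> 2 (closer to sink, highest energy)
```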
|
327 |
Large Scale ETL Design, Optimization and Implementation Based On Spark and AWS Platform. Zhu, Di, January 2017.
Nowadays, the amount of data generated by users of an Internet product is increasing exponentially: for instance, clickstreams from web applications with millions of users, geospatial information from GIS-based Android and iPhone apps, or sensor data from cars and other electronic equipment. These sources can yield billions of records every day, from which valuable insights can be extracted, for instance for monitoring systems, fraud detection, user behavior analysis, and feature verification. Nevertheless, technical issues emerge accordingly: heterogeneity, sheer volume, and the miscellaneous requirements for using the data along different dimensions make the design of data pipelines, transformation, and persistence in a data warehouse much harder. Undeniably, there are traditional ways to build ETLs, from mainframes [1] and RDBMSs to MapReduce and Hive. Yet with the emergence and popularization of the Spark framework and AWS, this procedure can evolve into a more robust, efficient, less costly, and easier-to-implement architecture for collecting data, building dimensional models, and running analytics on massive data. Drawing on the advantage of working in a car transportation company where billions of user behavior events arrive every day, this thesis contributes an exploratory way of building and optimizing ETL pipelines based on AWS and Spark, and compares it with current mainstream data pipelines from different aspects.
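A minimal PySpark sketch of the kind of pipeline described, reading raw events from S3, deriving a simple daily fact table, and writing partitioned Parquet back to S3, is given below; the bucket names, schema, and columns are hypothetical, not those used in the thesis.

```python
# Minimal PySpark ETL sketch (hypothetical paths and columns).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("user-events-etl").getOrCreate()

# Extract: raw behavior events landed on S3 as JSON.
events = spark.read.json("s3a://example-raw-bucket/events/2017/*/*.json")

# Transform: clean, derive a date partition column, and aggregate into a fact table.
fact_daily = (
    events
    .filter(F.col("user_id").isNotNull())
    .withColumn("event_date", F.to_date(F.col("event_ts")))
    .groupBy("event_date", "user_id", "event_type")
    .agg(F.count("*").alias("event_count"))
)

# Load: write partitioned Parquet for the warehouse / downstream analytics.
(fact_daily.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3a://example-warehouse-bucket/fact_user_events_daily/"))
```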
|
328 |
Investigations into methods and analysis of computer aided design of VLSI circuits. Noonan, J. A. (John Anthony), January 1986 (PDF).
Includes bibliography.
|
329 |
An investigation of combined failure mechanisms in large scale open pit slopes. Franz, Juergen, Mining Engineering, Faculty of Engineering, UNSW, January 2009.
Failure mechanisms in large scale open pit slopes are more complex than can be captured by conventional slope design methods. Pit slope behaviour must be predicted accurately, because for very deep open pits a small change of slope angle can have serious technical and economic consequences. Failure of hard rock slopes often involves both failure along naturally existing weakness planes and failure of intact rock. Without an advanced understanding of combined rock slope failure mechanisms, the validity of commonly applied methods of large scale slope analysis is questionable. The problem was investigated by means of a toolbox approach, in which a wide range of slope stability analysis methods were used and compared to address specific problems arising during slope design optimisation of the Cadia Hill Open Pit, NSW. In particular, numerical modelling is an advanced tool to obtain insight into potential failure mechanisms and to assist the slope design process. The distinct element method was employed to simulate complex rock slope failure, including fracture extension, progressive step-path failure and brittle failure propagation, which were previously often considered unimportant or too difficult to model. A new, failure-scale-dependent concept for the categorisation of slope failures, with six categories ranging from 0 (stable) to 5 (overall slope failure), was suggested to assist risk-based slope design. Parametric slope modelling was conducted to determine the interrelationship between the proposed categories and critical slope/discontinuity parameters. Initiation and progression of complex slope failure were simulated and described, which resulted in an advanced understanding of combined slope failure mechanisms and of the important role of rock bridges in large scale slope stability. A graphical presentation of the suggested slope failure categories demonstrated their interrelationship with the varied slope/discontinuity parameters. Although large scale slope analyses will always involve data-limited systems, this investigation shows that comprehensive, conceptual modelling of slope failure mechanisms can deliver a significantly improved insight into slope behaviour, so that associated slope failure risks can be judged with more confidence. The consideration of combined slope failure mechanisms in the analysis of large scale open pit slopes is essential if slope behaviour is to be realistically modelled.
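For contrast with the combined mechanisms studied here, the sketch below computes the conventional limit-equilibrium factor of safety for simple planar sliding on a single dry discontinuity (FS = resisting forces / driving forces), the kind of single-mechanism check the thesis argues is insufficient on its own; the parameter values are illustrative only.

```python
# Reference sketch of a conventional limit-equilibrium check for dry planar
# sliding on a single discontinuity; values are illustrative, not from the thesis.
import math

def planar_fs(cohesion_kpa, friction_deg, weight_kn, plane_area_m2, dip_deg):
    """Factor of safety = resisting / driving forces on the sliding plane."""
    psi = math.radians(dip_deg)
    phi = math.radians(friction_deg)
    resisting = cohesion_kpa * plane_area_m2 + weight_kn * math.cos(psi) * math.tan(phi)
    driving = weight_kn * math.sin(psi)
    return resisting / driving

# Illustrative block: 50 kPa cohesion, 35 deg friction angle, 12,000 kN block
# weight, 150 m^2 sliding plane dipping at 45 degrees.
print(round(planar_fs(50.0, 35.0, 12_000.0, 150.0, 45.0), 2))
```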
|
330 |
"On stochastic modelling of very large scale integrated circuits : an investigation into the timing behaviour of microelectronic systems" / Gregory Raymond H. BishopBishop, Gregory Raymond H. January 1993 (has links)
Bibliography: leaves 302-320 / xiv, iii, 320 leaves : ill ; 30 cm. / Title page, contents and abstract only. The complete thesis in print form is available from the University Library. / Thesis (Ph.D.)--University of Adelaide, Faculty of Engineering, 1994?
|