31 |
Local and personalised models for prediction, classification and knowledge discovery on real world data modelling problems / Hwang, Yuan-Chun, January 2009
This thesis presents several novel methods that address real world data modelling issues through local and individualised modelling approaches. A set of real world data modelling issues, such as modelling evolving processes, defining unique problem subspaces, and identifying and dealing with noise, outliers, missing values, imbalanced data and irrelevant features, is reviewed and their impact on models is analysed. The thesis makes nine major contributions to information science: four generic modelling methods, three real world application systems that apply these methods, a comprehensive review of real world data modelling problems, and a data analysis and modelling software package. Four novel methods were developed and published in the course of this study: (1) DyNFIS, a Dynamic Neuro-Fuzzy Inference System; (2) MUFIS, a fuzzy inference system that uses multiple types of fuzzy rules; (3) an Integrated Temporal and Spatial Multi-Model System; and (4) a Personalised Regression Model. DyNFIS addresses the issue of unique problem subspaces by identifying them through clustering, creating a fuzzy inference system based on the clusters, and applying supervised learning to update both the antecedent and consequent parts of the fuzzy rules. This puts strong emphasis on the unique problem subspaces and allows easy-to-understand rules to be extracted from the model, adding knowledge to the problem. MUFIS takes DyNFIS a step further by integrating a mixture of different types of fuzzy rules in a single fuzzy inference system. In many real world problems, some problem subspaces were found to be more suitable for one type of fuzzy rule than others; by integrating multiple types of fuzzy rules, a better prediction can be made. The type of fuzzy rule assigned to each unique problem subspace also provides additional understanding of its characteristics.
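The clustering-plus-fuzzy-rules pipeline described for DyNFIS can be sketched as a toy zero-order fuzzy inference system: one Gaussian rule per cluster centre, predictions formed as a membership-weighted average, and a gradient update to the consequent part. The cluster widths, learning rate and update rule below are illustrative assumptions, not the thesis's actual DyNFIS implementation (which also updates the antecedents).

```python
import math

def gaussian_membership(x, center, width=1.0):
    # Firing strength of a rule whose antecedent is a Gaussian centred on a cluster.
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-d2 / (2.0 * width ** 2))

def fis_predict(x, centers, consequents):
    # Normalised membership-weighted average of the rule consequents.
    w = [gaussian_membership(x, c) for c in centers]
    total = sum(w) or 1e-12
    return sum(wi * qi for wi, qi in zip(w, consequents)) / total

def train_consequents(X, y, centers, epochs=200, lr=0.1):
    # Supervised (gradient) update of the consequent part of each rule.
    consequents = [0.0] * len(centers)
    for _ in range(epochs):
        for x, target in zip(X, y):
            w = [gaussian_membership(x, c) for c in centers]
            total = sum(w) or 1e-12
            err = fis_predict(x, centers, consequents) - target
            for j in range(len(consequents)):
                consequents[j] -= lr * err * w[j] / total
    return consequents
```

Each trained rule can be read off directly ("if x is near centre c, then y is about q"), which is the sense in which such models yield easy-to-understand rules.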
The Integrated Temporal and Spatial Multi-Model System takes a different approach, integrating two contrasting views of the problem for better results. The temporal model uses recent data and the spatial model uses historical data to make the prediction. By combining the two through a dynamic contribution adjustment function, the system provides stable yet accurate predictions on real world data modelling problems with intermittently changing patterns. The personalised regression model is designed for classification problems. Real world data modelling problems often involve noisy or irrelevant variables, and the number of input vectors in each class may be highly imbalanced; these issues make the definition of unique problem subspaces less accurate. The proposed method uses a model selection system based on an incremental feature selection method to select the best set of features. A global model is then created from this set of features and optimised using the training input vectors in the test input vector's vicinity. This approach focuses on the definition of the problem space and puts emphasis on the problem subspace in which the test input vector resides. The novel generic prediction methods listed above have been applied to three real world data modelling problems: 1. A renal function evaluation system, which achieved higher accuracy than all other existing methods while allowing easy-to-understand rules to be extracted from the model for future studies. 2. A milk volume prediction system for Fonterra, which achieved a 20% improvement over the method currently used by Fonterra. 3. A prognosis system for pregnancy outcome prediction (SCOPE), which achieved more stable and slightly better accuracy than traditional statistical methods. These solutions constitute a contribution to the area of applied information science.
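The "vicinity" idea behind the personalised model can be illustrated with a toy neighbourhood classifier: only training vectors near the test vector influence its label. The thesis actually optimises a feature-selected global model on those neighbours; the plain k-nearest-neighbour vote below is a simplified stand-in for that step, with Euclidean distance and k chosen arbitrarily.

```python
import math
from collections import Counter

def personalised_classify(x_test, X_train, y_train, k=5):
    # Classify using only the training vectors in the test vector's vicinity.
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    # Keep the k nearest training examples and take a majority vote.
    neighbours = sorted(zip(X_train, y_train), key=lambda p: dist(p[0], x_test))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]
```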
In addition to the above contributions, a data analysis software package, NeuCom, was developed primarily by the author prior to and during the PhD study to facilitate standard experiments and analysis on various case studies. It is a full-featured data analysis and modelling package that is freely available for non-commercial purposes (see Appendix A for more details). In summary, many real world problems consist of many smaller problems. It was found beneficial to acknowledge the existence of these sub-problems and to address them through local or personalised models. The rules extracted from the local models also made new knowledge available to researchers and allowed more in-depth study of the sub-problems in future research.
|
32 |
What is Relevant Mathematics? An exploration of two perspectives on relevant mathematics in the high school classroom / January 2012
abstract: Recently there has been an increase in the number of people calling for the incorporation of relevant mathematics in the mathematics classroom. Unfortunately, researchers define the term "relevant mathematics" differently, establishing several ideas of how relevancy can be incorporated into the classroom. The differences between mathematics education researchers' definitions of relevance, and between the ways they believe relevant mathematics should be implemented in the classroom, suggest that a similarly varied set of perspectives probably exists among teachers and students as well. This exploratory study focuses on how student and teacher perspectives on relevant mathematics in the classroom converge or diverge. Specifically, do teachers and students see the same lessons, materials, content, and approaches as relevant? A survey was conducted with mathematics teachers at a suburban high school and their algebra 1 and geometry students to provide a general picture of their views on relevant mathematics. An analysis of the findings revealed three major differences: the discrepancy between the frequency ratings of teachers and students; the differences between how teachers and students defined the term relevance, with the students' highest-rated definitions being the least represented among the teacher-generated questions; and the impact of differing attitudes towards mathematics on students' feelings about its relevance. / Dissertation/Thesis / M.A. Curriculum and Instruction 2012
|
33 |
Model selection in time series machine learning applications / Ferreira, E. (Eija), 01 September 2015
Abstract
Model selection is a necessary step for any practical modeling task. Since the true model behind a real-world process cannot be known, the goal of model selection is to find the best approximation among a set of candidate models.
In this thesis, we discuss model selection in the context of time series machine learning applications. We cover four steps of the commonly followed machine learning process: data preparation, algorithm choice, feature selection and validation. We consider how the characteristics and the amount of available data should guide the selection of algorithms, and how the data set at hand should be divided for model training, selection and validation to optimize the generalizability and future performance of the model. We also consider the special restrictions and requirements that need to be taken into account when applying regular machine learning algorithms to time series data. In particular, we aim to bring forth problems relating to model over-fitting and over-selection that might occur due to careless or uninformed application of model selection methods.
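One of the time series restrictions mentioned above, that validation data must not precede training data in time, can be illustrated with a forward-chaining split. This is a generic sketch of the principle, not the thesis's own validation procedure:

```python
def forward_chaining_splits(n_samples, n_splits=3):
    # Yield (train_idx, test_idx) pairs where training data always precedes
    # test data in time, avoiding the leakage a shuffled split would cause.
    fold = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        train = list(range(0, i * fold))
        test = list(range(i * fold, min((i + 1) * fold, n_samples)))
        yield train, test
```

Each successive fold grows the training window and tests on the block that immediately follows it, mimicking how the model would be used on future data.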
We present our results in three different time series machine learning application areas: resistance spot welding, exercise energy expenditure estimation and cognitive load modeling. Based on our findings in these studies, we draw general guidelines on what to consider when starting to solve a new machine learning problem, from the point of view of data characteristics, the amount of data, computational resources and the possible time series nature of the problem. We also discuss how the practical aspects and requirements set by the environment where the final model will be deployed affect the choice of algorithms. / Tiivistelmä
Model selection is an essential part of solving any practical modelling problem. Since the true model underlying the phenomenon being modelled cannot be known, the purpose of model selection is to choose, from a set of candidate models, the one closest to it.
This doctoral thesis discusses model selection in applications involving time series data through four commonly followed steps of the machine learning process: data preparation, algorithm choice, feature selection and validation. It examines how the characteristics and amount of the available data should be taken into account when choosing an algorithm, and how the data should be divided for model training, testing and validation so as to optimize the generalizability and future performance of the model. Special restrictions and requirements on applying standard machine learning methods to time series data are also discussed. A particular aim is to highlight problems of model over-fitting and over-selection that can result from careless or uninformed use of model selection methods.
The practical results of the thesis are based on applying machine learning methods to time series modelling in three research areas: spot welding, estimation of energy expenditure during physical exercise, and modelling of cognitive load. Building on these results, the thesis offers general guidelines for setting out to solve a new machine learning problem, particularly from the viewpoint of the characteristics and amount of data, computational resources and the possible time series nature of the problem. It also considers how practical considerations and restrictions imposed by the environment in which the final model will operate affect the choice of algorithm.
|
34 |
A Methodology of Dataset Generation for Secondary Use of Health Care Big Data / 保健医療ビックデータの二次利用におけるデータセット生成に関する方法論 / Iwao, Tomohide, 23 March 2020
Kyoto University / 0048 / New-system doctoral programme / Doctor of Informatics / Degree No. 甲第22575号 / 情博第712号 / 新制||情||122 (University Library) / Department of Social Informatics, Graduate School of Informatics, Kyoto University / Examining committee: Professor 黒田 知宏 (chief examiner), Professor 守屋 和幸, Professor 吉川 正俊 / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
|
35 |
ScaleMesh: A Scalable Dual-Radio Wireless Mesh Testbed / ElRakabawy, Sherif M., Frohn, Simon, Lindemann, Christoph, 17 December 2018
In this paper, we introduce and evaluate ScaleMesh, a scalable miniaturized dual-radio wireless mesh testbed based on IEEE 802.11b/g technology. ScaleMesh can emulate large-scale mesh networks within a miniaturized experimentation area by adaptively shrinking the transmission range of mesh nodes by means of variable signal attenuators. To this end, we derive a theoretical formula for approximating the attenuation level required for downscaling desired network topologies. We present a performance study in which we validate the feasibility of ScaleMesh for network emulation and protocol evaluation. We further conduct single-radio vs. dual-radio experiments in ScaleMesh, and show that dual-radio communication significantly improves network goodput. The median TCP goodput we observe in a typical random topology at 54 Mbit/s with dual-radio communication ranges between 1468 Kbit/s and 7448 Kbit/s, depending on the current network load.
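The paper's attenuation formula is not reproduced in this abstract, but a first-order estimate of the idea follows from the textbook log-distance path-loss model: shrinking the transmission range by a factor k requires roughly 10·n·log10(k) dB of extra attenuation, where n is the path-loss exponent (n = 2 in free space). The sketch below rests on that assumption and is not the formula derived in the paper:

```python
import math

def downscale_attenuation_db(scale_factor, path_loss_exponent=2.0):
    # Extra attenuation (dB) needed to shrink transmission range by
    # `scale_factor`, under a log-distance path-loss model: received power
    # falls as d**(-n), so reaching the old edge-of-range power at d/k
    # requires 10 * n * log10(k) dB of added loss.
    return 10.0 * path_loss_exponent * math.log10(scale_factor)
```

For example, downscaling a 100 m range to 10 m (k = 10) in free space would call for about 20 dB of attenuation per link end-to-end.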
|
36 |
Exploring and Expanding Through 'Real-World' Tasks: The Digital Practices of Generation Z Post-Secondary FSL Learners / Douglas, Shayna, 12 June 2023
This exploratory case study examined the digital practices and literacies of Generation Z language learners and explored how these practices could be better addressed in the language classroom through "Real-World" tasks. The study was conducted within the theoretical framework of the Sociocultural Approach to Language Learning and the pedagogical framework of the Real-World Task-Based Language Teaching approach, with an emphasis on the eLANG citizen project. The data were analyzed through qualitative thematic analysis of a corpus of learner final reports and two questionnaires, resulting in a detailed portrait of the learners' digital practices. The findings indicate that implementing real-world, task-based language learning projects that draw on Generation Z's pre-existing digital competencies can lead to improved language and digital literacy skills. Students reported enhancements in their oral expression, use of slang, and interaction with native speakers, as well as improvements in their understanding of hashtags, video planning and editing, and trend tracking. The students had multiple real-world, authentic interactions through the digital citizenship project, which enabled them to become more capable digital citizens in FSL by formulating their identities, observing established communities and language users, participating in these communities directly, and learning in informal, gameful ways. It is proposed that drawing on the digital practices of Generation Z learners for language learning not only enhances the authenticity and relevance of the activities but also helps to achieve pedagogical objectives. This prepares learners for a future in a technology-saturated world and for becoming effective members of society, and is the next relevant step in sociocultural language pedagogy.
|
37 |
Algorithms For Discovering Communities In Complex Networks / Balakrishnan, Hemant, 01 January 2006
It has been observed that real-world random networks like the WWW, the Internet, social networks, citation networks, etc., organize themselves into closely-knit groups that are locally dense and globally sparse. These closely-knit groups are termed communities. Nodes within a community are similar in some aspect. For example, in a WWW network, communities might consist of web pages that share similar content. Mining these communities facilitates a better understanding of their evolution and topology, and is of great theoretical and commercial significance. Community-related research has focused on two main problems: community discovery and community identification. Community discovery is the problem of extracting all the communities in a given network, whereas community identification is the problem of identifying the community to which a given set of nodes belongs. We make a comparative study of various existing community-discovery algorithms. We then propose a new algorithm based on bibliographic metrics, which addresses the drawbacks of existing approaches. Bibliographic metrics are used to study similarities between publications in a citation network. Our algorithm classifies nodes in the network based on the similarity of their neighborhoods. One of the drawbacks of current community-discovery algorithms is their computational complexity: they do not scale up to the enormous size of real-world networks. We propose a hash-table-based technique that computes the bibliometric similarity between nodes in O(mΔ) time, where m is the number of edges in the graph and Δ the largest degree. Next, we investigate different centrality metrics. Centrality metrics portray the importance of a node in the network. We propose an algorithm that utilizes the centrality metrics of the nodes to compute the importance of the edges in the network.
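The hash-table idea can be sketched as follows: every node of degree d contributes d(d-1)/2 increments to the shared-neighbour counts of its neighbour pairs, and since the sum of squared degrees is bounded by 2mΔ, the total work is O(mΔ). The raw common-neighbour count below is a simplified stand-in for the thesis's bibliographic metrics, which are built on such neighbourhood overlaps:

```python
from collections import defaultdict
from itertools import combinations

def common_neighbour_counts(adj):
    # adj: node -> list of neighbours (undirected graph).
    # For each node w, every pair of w's neighbours shares w as a common
    # neighbour, so one pass over all nodes fills the hash table of pair
    # overlaps in O(m * D) time (m edges, D the largest degree).
    counts = defaultdict(int)
    for w, neighbours in adj.items():
        for u, v in combinations(sorted(neighbours), 2):
            counts[(u, v)] += 1
    return dict(counts)
```

Normalising these counts by the pair's degrees would give a similarity in [0, 1], which is one common way to turn overlaps into a bibliometric-style metric.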
Removal of the edges in ascending order of their importance breaks the network into components, each of which represents a community. We compare the performance of the algorithm on synthetic networks with a known community structure using several centrality metrics. Performance was measured as the percentage of nodes that were correctly classified. As an illustration, we model the ucf.edu domain as a web graph and analyze the changes in its properties, such as the densification power law, edge density, degree distribution and diameter, over a five-year period. Our results show super-linear growth in the number of edges with time. We observe (and explain) that despite the increase in the average degree of the nodes, the edge density decreases with time.
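The edge-removal step can be sketched as below, with the centrality-derived edge scores taken as given (how they are computed from node centralities is the thesis's contribution and is not reproduced here). Edges are peeled off in ascending order of importance, as the abstract describes, until the desired number of components emerges:

```python
from collections import deque

def components(nodes, edges):
    # Connected components of an undirected graph, by BFS.
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps

def communities_by_edge_removal(nodes, scored_edges, n_communities):
    # scored_edges: (u, v, importance) triples. Keep the most important
    # edges and drop the least important one at a time until the graph
    # splits into the requested number of components.
    remaining = sorted(scored_edges, key=lambda e: e[2], reverse=True)
    while remaining:
        comps = components(nodes, [(u, v) for u, v, _ in remaining])
        if len(comps) >= n_communities:
            return comps
        remaining.pop()  # remove the edge with the lowest importance
    return components(nodes, [])
```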
|
38 |
Algorithms For Community Identification In Complex Networks / Vasudevan, Mahadevan, 01 January 2012
First and foremost, I would like to extend my deepest gratitude to my advisor, Professor Narsingh Deo, for his excellent guidance and encouragement, and also for introducing me to this wonderful science of complex networks. Without his support this dissertation would not have been possible. I would also like to thank the members of my research committee, Professors Charles Hughes, Ratan Guha, Mainak Chatterjee and Yue Zhao, for their advice and guidance during the entire process. I am indebted to the faculty and the staff of the Department of Electrical Engineering and Computer Science for providing me the resources and environment to perform this research. I am grateful to my colleagues in the Parallel and Quantum Computing Lab for the stimulating discussions and support. I would also like to thank Dr. Hemant Balakrishnan and Dr. Sanjeeb Nanda for their valuable suggestions and guidance. My heartfelt thanks to my parents, Vasudevan and Raji, who have always been supportive of my decisions and encouraged me with their best wishes. I would also like to thank my sister Gomathy for her words of care and affection during tough times. Special thanks to my friends in Orlando for being there when I needed them.
|
39 |
Modeling Driver Behavior and I-ADAS in Intersection Traversals / Kleinschmidt, Katelyn Anne, 20 December 2023
Intersection Advanced Driver Assistance Systems (I-ADAS) may prevent 25 to 93% of intersection crashes. The effectiveness of I-ADAS will be limited by drivers' pre-crash behavior and other environmental factors. This study characterized real-world intersection traversals to evaluate the effectiveness of I-ADAS while accounting for driver behavior in crash and near-crash scenarios, using two naturalistic driving datasets: the Second Strategic Highway Research Program (SHRP-2) and the Virginia Traffic Cameras for Advanced Safety Technologies (VT-CAST) 2020. A step-by-step approach was taken to create an algorithm that identifies three different intersection traversal trajectories: straight crossing path (SCP), left turn across path opposite direction (LTAP/OD), and left turn across path lateral direction (LTAP/LD). About 140,000 intersection traversals were characterized and used to train a driver behavior model. The median average speed across all encounter types was about 7.2 m/s. The driver behavior model was a Markov model with a multinomial regression that achieved an average accuracy of 90.5% across the three crash modes. The model used over 124,000 intersection encounters, including 301 crash and near-crash scenarios. I-ADAS effectiveness was evaluated with realistic driver behavior in simulations of intersection traversal scenarios based on the proposed US New Car Assessment Program I-ADAS test protocols. All near-crashes were avoided, and overall the driver with I-ADAS helped avoid more crashes. For SCP and LTAP scenarios, the number of collisions avoided in I-ADAS-only simulations increased as the sensor's field of view increased. 18% of crash scenarios were not avoided with I-ADAS and a driver. Among near-crash scenarios, where NHTSA expects no I-ADAS activation, driver input avoided I-ADAS activation in 58.5% of runs, compared with 0% in the I-ADAS-only simulations.
/ Master of Science / Intersection Advanced Driver Assistance Systems (I-ADAS) may prevent 25-93% of intersection crashes. I-ADAS can assist drivers in preventing or mitigating these crashes by issuing a collision warning or automatically applying the brakes. One way I-ADAS may assist in crash prevention is with automatic emergency braking (AEB), which applies the brakes without driver input if the vehicle detects that a crash is imminent. The United States New Car Assessment Program (US-NCAP) has proposed adding I-ADAS with AEB tests to its standard test matrix, covering three different scenarios. Each test has two crash-imminent configurations, where the vehicles are set up to collide if no deceleration occurs, and a near-miss configuration, where the vehicles are set up to barely miss each other. This study used intersection traversals from naturalistic driving data in the US to build a driver behavior model. The intersection traversals were characterized by their speed, acceleration, deceleration, and estimated time to collision. The driver behavior model was able to predict the longitudinal and lateral movements of the driver. The proposed US-NCAP test protocols were then simulated with varied sensor parameters, with one vehicle equipped with I-ADAS and a driver. The vehicle with I-ADAS and a driver was more successful at preventing a crash than a vehicle equipped with I-ADAS alone.
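A first-order Markov chain over discrete driver states can be estimated from observed sequences as sketched below. The state names here are invented for illustration, and the study's actual model additionally couples the chain with a multinomial regression over kinematic features; this shows only the transition-counting core:

```python
from collections import defaultdict

def estimate_transitions(sequences):
    # Maximum-likelihood transition probabilities for a first-order Markov
    # chain over discrete states (e.g. hypothetical driver states such as
    # 'accelerate', 'brake', 'coast'). Counts consecutive state pairs and
    # normalises each row of the transition table.
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    probs = {}
    for state, nxt in counts.items():
        total = sum(nxt.values())
        probs[state] = {t: c / total for t, c in nxt.items()}
    return probs
```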
|
40 |
Real-World Considerations for RFML Applications / Muller, Braeden Phillip Swanson, 20 December 2023
Radio Frequency Machine Learning (RFML) is the application of ML techniques to problems in the RF domain as an alternative to traditional digital signal processing (DSP) techniques. Notable among these are the tasks of specific emitter identification (SEI), determining the source identity of a received RF signal, and automated modulation classification (AMC), determining the modulation scheme of a received RF transmission. Both tasks have a number of algorithms that are effective on simulated data but struggle to generalize to data collected in the real world, partly due to the lack of available datasets upon which to train models and understand their limitations. This thesis covers the practical considerations for systems that create high-quality datasets for RFML tasks, how variances from real-world effects in these datasets affect RFML algorithm performance, and how well models developed from these datasets generalize and adapt across different receiver hardware platforms. Moreover, this thesis presents a proof-of-concept system for large-scale and efficient data generation, proven through the design and implementation of a custom platform capable of coordinating transmissions from nearly a hundred Software-Defined Radios (SDRs). This platform was used to rapidly perform experiments in RFML performance sensitivity analysis and in the successful transfer of trained SEI and AMC models between SDRs. / Master of Science / Radio Frequency Machine Learning (RFML) is the application of machine learning techniques to problems involving radio signals, as an alternative to traditional signal processing techniques. Notable among these are the tasks of specific emitter identification (SEI), determining the source identity of a received signal, and automated modulation classification (AMC), determining the data encoding format of a received RF transmission.
Both tasks have practical limitations related to the real-world collection of RF training data. This thesis presents a proof-of-concept for large-scale, efficient data generation and management, as proven through the design and construction of a custom platform capable of coordinating transmissions from nearly a hundred radios. This platform was used to rapidly perform experiments in both RFML performance sensitivity analysis and successful cross-radio transfer of trained behaviors.
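As a toy illustration of the AMC task, on ideal noise-free symbols (precisely the simulated setting the thesis argues does not generalize to captured RF data), BPSK and QPSK can be separated by second- and fourth-order moments of the symbol stream. This moment-based rule is a standard textbook feature, not the thesis's method:

```python
import cmath
import math
import random

def classify_psk(symbols):
    # Toy modulation classifier for unit-power, noise-free PSK symbols.
    n = len(symbols)
    m2 = abs(sum(s ** 2 for s in symbols)) / n
    m4 = abs(sum(s ** 4 for s in symbols)) / n
    if m2 > 0.5:
        return "BPSK"   # squaring collapses the two BPSK phases onto one point
    if m4 > 0.5:
        return "QPSK"   # the fourth power collapses the four QPSK phases
    return "unknown"

def psk_symbols(order, n, seed=0):
    # Random unit-power PSK symbols (pi/4 phase offset for QPSK).
    rng = random.Random(seed)
    offset = math.pi / order if order == 4 else 0.0
    return [cmath.exp(1j * (offset + 2 * math.pi * rng.randrange(order) / order))
            for _ in range(n)]
```

Under noise, fading, and hardware impairments these moments smear out, which is one concrete way the simulation-to-real gap described above manifests.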
|