321 |
The application of constraint rules to data-driven parsing
Jaf, Sardar January 2015
The process of determining the structural relationships between words in both natural and machine languages is known as parsing. Parsers are core components of many Natural Language Processing (NLP) applications, such as online tutoring applications, dialogue-based systems and textual entailment systems, and they have also been widely used in the development of machine languages. To explain how parsers work, we investigate and describe a number of widely used parsing algorithms. These algorithms have been applied in different contexts, such as dependency frameworks and phrase structure frameworks. We describe some of the fundamental aspects of each of these frameworks, which can be realised through grammar-driven or data-driven approaches. Grammar-driven approaches use a set of grammatical rules to determine the syntactic structures of sentences during parsing. Data-driven approaches use a set of parsed data to generate a parse model, which then guides the parser when it processes new sentences. A number of state-of-the-art parsers have been developed using these frameworks and approaches, and we briefly highlight some of them in this thesis. Three features are particularly important in the development of parsers: efficiency, accuracy, and robustness. Efficiency is concerned with using as little time and computing resources as possible when processing natural language text. Accuracy involves maximising the correctness of the analyses that a parser produces. Robustness is a measure of a parser’s ability to cope with grammatically complex sentences and to produce analyses for a large proportion of a set of sentences. In this thesis, we present a parser that can efficiently, accurately, and robustly parse a set of natural language sentences.
Additionally, the implementation of the parser presented here allows for trade-offs between different aspects of parsing performance. For example, some NLP applications may emphasise efficiency and robustness over accuracy, while others may require a greater focus on accuracy. In dialogue-based systems, it may be preferable to produce a correct grammatical analysis of a question, rather than to analyse its grammatical structure incorrectly or to quickly produce a grammatically incorrect answer. Alternatively, it may be desirable for document translation systems to translate a document quickly but less accurately, rather than slowly but highly accurately, because users can correct grammatically incorrect sentences manually if necessary. The parser presented here is based on data-driven approaches, but it allows constraint rules to be applied in order to improve its performance.
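The combination described above, a data-driven model that proposes parser actions and constraint rules that can veto them, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the greedy shift-reduce scheme, the hand-written `toy_score` function standing in for a learned parse model, and the two rules in `rules` are hypothetical and are not taken from the thesis.

```python
from typing import Callable, List, Tuple

Arc = Tuple[str, str]  # (head, dependent)

# Hypothetical part-of-speech tags for the toy sentence.
TAGS = {"the": "DET", "dog": "NOUN", "barks": "VERB"}


def toy_score(action: str, stack: List[str], buffer: List[str]) -> float:
    """Stand-in for a learned parse model; deliberately biased towards RIGHT."""
    return {"RIGHT": 2.0, "LEFT": 1.0, "SHIFT": 0.5}[action]


def allow_all(head: str, dep: str) -> bool:
    """No constraints: every arc proposed by the model is accepted."""
    return True


def rules(head: str, dep: str) -> bool:
    """Two illustrative constraint rules (assumptions, not from the thesis)."""
    if TAGS[head] == "DET":  # a determiner may not head another word
        return False
    if TAGS[head] == "NOUN" and TAGS[dep] == "VERB":  # a noun may not head a verb
        return False
    return True


def parse(
    words: List[str],
    score: Callable[[str, List[str], List[str]], float],
    allowed: Callable[[str, str], bool],
) -> List[Arc]:
    """Greedy shift-reduce dependency parsing with rule-based vetoes."""
    stack: List[str] = []
    buffer = list(words)
    arcs: List[Arc] = []
    while buffer or len(stack) > 1:
        candidates = []
        if buffer:
            candidates.append("SHIFT")
        if len(stack) >= 2:
            if allowed(stack[-1], stack[-2]):
                candidates.append("LEFT")   # top of stack heads the word below it
            if allowed(stack[-2], stack[-1]):
                candidates.append("RIGHT")  # word below heads the top of stack
        if not candidates:
            break  # every action is vetoed: return a partial analysis
        action = max(candidates, key=lambda a: score(a, stack, buffer))
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT":
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        else:
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs


sentence = ["the", "dog", "barks"]
unconstrained = parse(sentence, toy_score, allow_all)
constrained = parse(sentence, toy_score, rules)
print(unconstrained)
print(constrained)
```

In this toy setting the biased scorer alone attaches both words to the determiner, while the same scorer under the two rules produces the arcs dog → the and barks → dog. The sketch only shows the mechanism of filtering model-proposed actions; how such rules affect speed and accuracy on real data is the subject of the thesis itself.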
|
322 |
Data driven SEO / Data-driven SEO
Koutný, Jiří January 2011
The Search Engine Optimization (SEO) industry has recently undergone major changes. Many new analytics tools have come onto the market, finally enabling marketing consultants to measure and evaluate the results of their SEO work effectively. The theoretical part of this diploma thesis therefore aims to describe and compare selected SEO tools, including practical examples of their use. The thesis focuses on backlink databases (MajesticSEO, SEOmoz OpenSiteExplorer and Ahrefs) and keyword suggestion tools from Google (AdWords), Seznam (Sklik), Wordtracker and SEMRush. The final chapter provides an overview of tools and techniques for tracking search engine positions. The practical part describes the selection, preparation and processing of data obtained from the tools mentioned above. The data are used in a correlation analysis of Seznam.cz search engine results in relation to the best-known SEO factors. The results of the analysis will help marketing consultants to clarify which factors are the most important to focus on to obtain more traffic from search engines.
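A correlation analysis of this kind can be sketched as follows. The abstract does not say which coefficient was used; this sketch assumes Spearman's rank correlation, a common choice when one variable is a ranking position, and the five data points are invented purely for illustration.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up example: result position (1 = top) and number of backlinks.
positions = [1, 2, 3, 4, 5]
backlinks = [900, 700, 400, 350, 100]
rho = spearman(positions, backlinks)
print(round(rho, 3))
```

A coefficient near -1 in this toy data would mean that pages with more backlinks tend to appear at better (numerically lower) positions. Such a value describes association only and says nothing about whether the factor causes the ranking.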
|
323 |
A Retrospective-Longitudinal Examination of the Relationship between Apportionment of Seat Time in Community-College Algebra Courses and Student Academic Performance
Roig-Watnik, Steven M 06 December 2012
During the past decade, there has been a dramatic increase by postsecondary institutions in providing academic programs and course offerings in a multitude of formats and venues (Biemiller, 2009; Kucsera & Zimmaro, 2010; Lang, 2009; Mangan, 2008). Strategies pertaining to reapportionment of course-delivery seat time have been a major facet of these institutional initiatives; most notably, within many open-door 2-year colleges. Often, these enrollment-management decisions are driven by the desire to increase market-share, optimize the usage of finite facility capacity, and contain costs, especially during these economically turbulent times. So, while enrollments have surged to the point where nearly one in three 18-to-24 year-old U.S. undergraduates are community college students (Pew Research Center, 2009), graduation rates, on average, still remain distressingly low (Complete College America, 2011). Among the learning-theory constructs related to seat-time reapportionment efforts is the cognitive phenomenon commonly referred to as the spacing effect, the degree to which learning is enhanced by a series of shorter, separated sessions as opposed to fewer, more massed episodes.
This ex post facto study explored whether seat time in a postsecondary developmental-level algebra course is significantly related to: course success; course-enrollment persistence; and, longitudinally, the time to successfully complete a general-education-level mathematics course. Hierarchical logistic regression and discrete-time survival analysis were used to perform a multi-level, multivariable analysis of a student cohort (N = 3,284) enrolled at a large, multi-campus, urban community college. The subjects were retrospectively tracked over a 2-year longitudinal period. The study found that students in long seat-time classes tended to withdraw earlier and more often than did their peers in short seat-time classes (p < .05). Additionally, a model comprised of nine statistically significant covariates (all with p-values less than .01) was constructed. However, no longitudinal seat-time group differences were detected nor was there sufficient statistical evidence to conclude that seat time was predictive of developmental-level course success.
A principal aim of this study was to demonstrate—to educational leaders, researchers, and institutional-research/business-intelligence professionals—the advantages and computational practicability of survival analysis, an underused but more powerful way to investigate changes in students over time.
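The core quantities of discrete-time survival analysis can be shown with a small life-table sketch. It is a simplified illustration under stated assumptions: the six records are invented, the function name `life_table` is hypothetical, and the study itself fitted regression models with covariates rather than the plain estimates computed here.

```python
def life_table(records, n_periods):
    """Estimate hazard and survival per period.

    records: (last_period, event) pairs. event=True means the student
    withdrew in that period; event=False means the student was last
    observed in that period without withdrawing (censored).
    """
    rows = []
    surviving = 1.0
    for period in range(1, n_periods + 1):
        at_risk = sum(1 for last, _ in records if last >= period)
        events = sum(1 for last, event in records if last == period and event)
        hazard = events / at_risk if at_risk else 0.0
        surviving *= 1.0 - hazard  # probability of remaining enrolled so far
        rows.append((period, at_risk, events, hazard, surviving))
    return rows


# Invented cohort of six students followed over three periods.
records = [(1, True), (2, True), (2, False), (3, True), (3, False), (3, False)]
table = life_table(records, 3)
for period, at_risk, events, hazard, surviving in table:
    print(period, at_risk, events, round(hazard, 3), round(surviving, 3))
```

The hazard in each period is the share of students still at risk who withdraw in that period, and the survival estimate is the running product of one minus the hazard. Censored students count as at risk up to their last observed period, which is what lets the method use incomplete histories.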
|
324 |
A data-driven approach for Product-Service Systems design : Using data and simulation to understand the value of a new design concept
Chowdhery, Syed Azad January 2020
Global challenges such as increasingly competitive markets, low-cost competition, demands for shorter lead times, and high quality/value output are transforming companies' business models to focus beyond performance requirements. To meet these challenges, companies are highly concerned with customer-perceived value, that is, connecting the product with the customer in a better way and becoming more proactive in fulfilling customer needs via function-oriented business models and Product-Service Systems. In the literature, the conceptual phase is identified as the most critical phase of the product development process. Many authors have recognized the improvement of design in the conceptual phase as the means of delivering a successful product to the market. At the decision gate, where concepts are selected for further development, the design team needs knowledge and data about the long-term consequences of their early decisions, to see how changes in design propagate through the entire lifecycle of the product. The main goal of the thesis is to describe how the design of Product-Service Systems in the conceptual phase can be improved through the use of a data-driven approach. Such an approach provides an opportunity to enhance decision making and to provide better support in the early development phase. The study highlights how data are managed and used in current industrial settings and indicates the room for improvement in current practices. The thesis further provides guidelines for using data efficiently in modelling and simulation activities to increase design knowledge. As a result of this study, a data-driven approach emerged to support early design decisions. The thesis presents initial descriptive study findings from the empirical investigations, showing a model-based approach that creates awareness about the value of a new design concept, thus acting as a key enabler for using data in design.
This will create a link between the product's engineering characteristics and the high-level attributes of customer satisfaction and the provider’s long-term profitability. The preliminary results indicate that applying simulation models to frontload the early design stage creates awareness about how performance can lead to value creation, helping multidisciplinary teams to perform quick trade-off and what-if analyses on design configurations. The proposed framework shows how data from various sources are used through a chain of simulations to understand the entire product lifecycle. The proposed approach has the potential to improve the key performance indicators for Product-Service Systems development: lead time, design quality, cost and, most importantly, delivery of a value-added product to the customer.
|
325 |
Development of a process modelling methodology and condition monitoring platform for air-cooled condensers
Haffejee, Rashid Ahmed 05 August 2021
Air-cooled condensers (ACCs) are a type of dry-cooling technology that has seen an increase in implementation globally, particularly in the power generation industry, due to its low water consumption. Unfortunately, ACC performance is susceptible to changing ambient conditions, such as dry bulb temperatures, wind direction, and wind speeds. This can result in performance reduction under adverse ambient conditions, which leads to increased turbine back pressures and in turn, a decrease in generated electricity. Therefore, this creates a demand to monitor and predict ACC performance under changing ambient conditions. This study focuses on modelling a utility-scale ACC system at steady-state conditions applying a 1-D network modelling approach and using a component-level discretization approach. This approach allowed for each cell to be modelled individually, accounting for steam duct supply behaviour, and for off-design conditions to be investigated. The developed methodology was based on existing empirical correlations for condenser cells and adapted to model double-row dephlegmators. A utility-scale 64-cell ACC system based in South Africa was selected for this study. The thermofluid network model was validated using site data with agreement in results within 1%; however, due to a lack of site data, the model was not validated for off-design conditions. The thermofluid network model was also compared to the existing lumped approach and differences were observed due to the steam ducting distribution. The effect of increasing ambient air temperature from 25 °C to 35 °C was investigated, with a heat rejection rate decrease of 10.9 MW and a backpressure increase of 7.79 kPa across the temperature range. Condensers' heat rejection rate decreased with higher air temperatures, while dephlegmators' heat rejection rate increased due to the increased outlet vapour pressure and flow rates from condensers.
Off-design conditions were simulated, including hot air recirculation and wind effects. For wind effects, the developed model predicted a decrease in heat rejection rate of 1.7 MW for higher wind speeds, while the lumped approach predicted an increase of 4.9 MW. For practicality, a data-driven surrogate model was developed through machine learning techniques using data generated by the thermofluid network model. The surrogate model predicted system-level ACC performance indicators such as turbine backpressure and total heat rejection rate. Multilayer perceptron neural networks were developed in the form of a regression network and a binary classifier network. For the test sets, the regression network had an average relative error of 0.3%, while the binary classifier had a 99.85% classification accuracy. The surrogate model was validated against site data over a 3-week operating period, with 93.5% of backpressure predictions within 6% of site data backpressures. The surrogate model was deployed through a web-application prototype which included a forecasting tool to predict ACC performance based on a weather forecast.
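The two validation figures quoted above, an average relative error and the share of predictions within a tolerance of the measured values, can be computed as in the following sketch. The function names and the four backpressure values are assumptions made for illustration and are not data from the study.

```python
def relative_errors(predicted, measured):
    """Absolute relative error of each prediction against its measurement."""
    return [abs(p - m) / abs(m) for p, m in zip(predicted, measured)]


def mean_relative_error(predicted, measured):
    errors = relative_errors(predicted, measured)
    return sum(errors) / len(errors)


def share_within(predicted, measured, tolerance):
    """Fraction of predictions whose relative error is at most `tolerance`."""
    errors = relative_errors(predicted, measured)
    return sum(1 for e in errors if e <= tolerance) / len(errors)


# Invented turbine backpressures in kPa, for illustration only.
measured = [20.0, 25.0, 30.0, 40.0]
predicted = [20.5, 27.0, 29.4, 40.0]

print(mean_relative_error(predicted, measured))
print(share_within(predicted, measured, 0.06))
```

Reporting both numbers is useful because a low average error can hide a few large misses, while the within-tolerance share shows how often the surrogate stays inside an acceptable band.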
|
326 |
Simulations and data-based models for electrical conductivities of graphene nanolaminates
Rothe, Tom 13 August 2021
Graphene-based conductor materials (GCMs) consist of stacked and decoupled layers of graphene flakes and could potentially transfer graphene’s outstanding material properties, such as its exceptional electrical conductivity, to the macro scale, where alternatives to heavy and expensive metallic conductors are desperately needed. To reach super-metallic conductivity, however, a systematic optimization of the electrical conductivity with respect to the structural and physical input parameters is required. A new trend in the field of process and material optimization is the use of data-based models, which apply data science methods to quickly identify and abstract information and relationships from the available data. In this work, such data-based models for the conductivity of a real GCM thin-film sample are built on data generated with an improved and extended version of the network simulation approach by Rizzi et al. [1, 2, 3]. Appropriate methods for creating data-based models for GCMs are introduced and typical challenges during the modelling process are addressed, so that data-based models for other properties of GCMs can easily be created as soon as sufficient data is accessible. Combined with experimental measurements by Slawig et al. [4], the created data-based models allow for a coherent and comprehensive description of the thin-films’ electrical parameters across several length scales.
List of Figures
List of Tables
Symbol Directory
List of Abbreviations
1 Introduction
2 Simulation approaches for graphene-based conductor materials
2.1 Traditional simulation approaches for GCMs
2.1.1 Analytical model for GCMs
2.1.2 Finite element method simulations for GCMs
2.2 A network simulation approach for GCMs
2.2.1 Geometry generation
2.2.2 Electrical network creation
2.2.3 Contact and probe setting
2.2.4 Conductivity computation
2.2.5 Results obtained with the network simulation approach
2.3 An improved implementation for the network simulation
2.3.1 Rizzi’s implementation of the network simulation approach
2.3.2 A network simulation tool for parameter studies
2.3.3 Extending the network simulation approach for anisotropy investigations and multilayer flakes
3 Data-based material modelling
3.1 Introduction to data-based modelling
3.2 Data-based modelling in material science
3.3 Interpretability of data-based models
3.4 The data-based modelling process
3.4.1 Preliminary considerations
3.4.2 Data acquisition
3.4.3 Preprocessing the data
3.4.4 Partitioning the dataset
3.4.5 Training the model
3.4.6 Model evaluation
3.4.7 Real-world applications
3.5 Regression estimators
3.5.1 Mathematical introduction to regression
3.5.2 Regularization and ridge regression
3.5.3 Support Vector Regression
3.5.4 Introducing non-linearity through kernels
4 Data-based models for a real GCM thin-film
4.1 Experimental measurements
4.2 Simulation procedure
4.3 Data generation
4.4 Creating data-based models
4.4.1 Quadlinear interpolation as benchmark model
4.4.2 KR, KRR and SVR
4.4.3 Enlarging the dataset
4.4.4 KR, KRR and SVR on the enlarged training dataset
4.5 Application to the GCM sample
5 Conclusion and Outlook
5.1 Conclusion
5.2 Outlook
Acknowledgements
Statement of Authorship
|
327 |
Modelem řízený vývoj Spark úloh / Model Driven Development of Spark Tasks
Bútora, Matúš January 2019
The aim of the master's thesis is to describe the Apache Spark framework, its structure and the way Spark works. The next goal is to present the topics of Model-Driven Development and Model-Driven Architecture and to define their advantages, disadvantages and usage. However, the main part of the text is devoted to designing a model for creating tasks in the Apache Spark framework. The text describes an application that allows the user to create a graph based on the proposed modeling language. The final application allows the user to generate source code from the created model.
|
328 |
Data-driven Management Framework using National and Corporate Culture Analytics to foster Innovation Ambidexterity : A case study on a world leading telecom company
Isola, Chiara, Peddireddy, Divya January 2021
Background: In a highly competitive world, leaders of firms that depend heavily on innovation, such as telecom companies, must acquire data-driven managerial skills to systematically analyze datasets from multiple points of view to aid decision-making in the new context of Industry 4.0. Data mining can be performed on both tangible and intangible assets of Big Data sets, but systematic analytics performed on Small Data can function as a crucial refinement for such insights. In addition, they can be used to train algorithms during the supervised stage of machine learning, for example when treating psychometric datasets, which originate from human perceptions and behaviors. This applies to the exploitation of strategic information, for business purposes, from intangible reservoirs, such as human capital aspects. Ambidexterity is a leadership conduct that focuses primarily on human capital and encompasses the behaviors of exploration and exploitation of new ideas. It has historically been proven to be essential for innovation. However, leaders and companies often give only limited attention to the exploitation of human capital aspects through psychometrics embedded in a data-driven framework. For business models that consider innovation a matter to be pursued at every level of the organization, and not confined to one specific department such as R&D, this is a crucial element to investigate in order to foster innovation and retain a competitive edge. This research was performed in collaboration with a world-leading telecom company and was requested by its Innovation Leader.
Objectives: The first objective of the research is to provide a flexible conceptual model and standardized methodology, suitable for incumbent, cross-country companies that are highly dependent on innovation and intend to begin investigating how those aspects influence their business performance.
Second, the hypothesis testing of the conceptual model has the purpose of identifying the human capital aspects of national and corporate culture that show statistically significant and stronger cause-effect relationships towards enhancing innovation ambidexterity. Third, predictions in terms of the prevalence of explorative or exploitative innovative behaviors are aimed at indicating what the company could expect in terms of Innovation Ambidexterity under its current conditions. An automatable and replicable data-driven method for the company's decision makers is provided. It is also suitable for further integration within machine learning algorithms, or simply as a refinement of data mining insights; these aspects are among the possibilities for improvement. The objective of the thesis is to test the methodology on a relatively small sample to show the company executives and the Innovation Leader the potential of the approach and the value that these data can have for decision making. They can decide to develop the research further with larger samples at a later stage, inserting the analyses into an automatic periodic routine with dashboarding of the outcomes. During the post-survey interviews, awareness was also raised among management and executives about the potential of such an approach to obtain strategic business information unavailable until now. Please note that it was not the purpose of this study to provide a conceptual model that was specific to the human capital characteristics of one particular company. The purpose was instead to provide a data-driven framework and a conceptual model that could be used by any company in the telecom sector to approach the task and to find moderating or mediating factors. It will also allow companies in different sectors to refine the model based on their needs at a later stage, as a possibility for future improvement.
Methodology: A conceptual model, partially newly designed for this research, is introduced. It incorporates selected elements of national and corporate culture that appear to be crucial for innovation ambidexterity, according to an extensive literature review. The quantitative analysis is also extensive. A less extensive analysis would have left too much uncertainty in the findings, undermining the confidence of executives in considering the results as a basis for business actions. For these reasons, we recommend that researchers tackling the exploitation of intangible assets (such as human capital) perform an extensive set of analyses. Starting from the main dataset, the analyses of the methodology have been replicated on 5 sub-datasets based on the heterogeneity measured. The methodology includes CTA, PLS-SEM modeling on the outer model, PLS-SEM on the inner model including bootstrapping, MGA, FIMIX-PLS, IPMA, and blindfolding for the predictive relevance of the model, followed by POS and Weka predictions. Cause-effect relationships and mediating and moderating factors of national and internal culture have also been identified and indicated as part of the possible future personalization of the model to the specific company's human capital characteristics. The national culture attributes consist of power distance, uncertainty avoidance, collectivism, masculinity (unrelated to gender) and gender diversity. The corporate culture attributes are categorized into caring climate, creative instability, boundary spanning, decision making and strategic horizon. The methodology employs a bottom-up survey design to collect data through an online questionnaire across three company sites located in Sweden, Italy, and China. The software used was SmartPLS 3 for Structural Equation Modeling and Prediction-Oriented Segmentation, and Weka 3.8.5 for a machine learning algorithm (an artificial neural network), as a double check on the PLS-POS predictions.
Some qualitative interpretations and pre- and post-survey interviews were also added.
Results: Hypothesis testing and cross-comparisons are performed on groups such as employees, leaders, and the different geographical sites. During the evaluation of the results, special attention was paid to the parameters related to quality and statistical relevance, not only of the model tested on the six cohorts, but also of the individual national and corporate attributes that make it up. The results show that explorative behaviors predict innovation ambidexterity to a larger extent than purely exploitative ones, confirming the main hypothesis. POS-based predictions, verified by the Weka machine learning algorithm, have shown instead that the pursuit of innovation ambidexterity within the company is unbalanced towards exploitative behaviors. The study has provided PLS-SEM indications that company executives may wish to pursue explorative behaviors towards innovation, while the company's middle management is steering in the opposite direction, focusing on attributes more closely linked to efficiency and constant delivery. Consequently, what initially appeared to be a complex issue of employees' national culture interfering with corporate culture has instead been linked to a possible middle management issue related to two different business models, where one prevails over the other instead of the two cooperating to reach innovation ambidexterity. This is a valuable strategic input for the company executives. The quantitative methodology uncovered results and patterns that the Innovation Leader had so far only intuitively perceived, and it offered a counterintuitive interpretation of the causes. With regard to national culture: power distance increases exploitative behaviors; gender diversity increases explorative behaviors, while it decreases exploitative behaviors.
With regard to corporate culture: creative instability crucially increases explorative behaviors but decreases exploitative behaviors. Boundary spanning decreases exploitative behaviors.
Conclusions: The thesis answered the research question. It provided a scientific contribution, allowing a better understanding of how national and corporate cultures interact to generate explorative and exploitative behaviors and ultimately innovation ambidexterity. It provided a flexible conceptual model and a standardized, automatable data-driven methodology suitable for discovering insights from human capital aspects that influence innovation in a business, taking the analyses of human capital data performed by the firm “to the next level”.
Recommendations for future research: A recommendation is to apply the proposed conceptual model to compare larger samples with even less heterogeneity, according to the optimal data sample characteristics identified. This will also allow further personalization of the flexible and general conceptual model presented (which is so far suitable for the general telecommunication sector) to more specific characteristics of the company that is the object of analysis. In a completely automated framework, it is also recommended to consider the possibility of applying this data-driven decision-making approach to other companies or industrial domains. This means, for example, integrating the proposed methodology within a machine learning algorithm in its supervised stage. The algorithm can be trained using the current analyses as a refinement of insights provided by Big Data mining performed on sets related to innovation and collected within the firm's organizational or production systems. It is also important to clarify that, according to the authors of this study, the results of the data-driven framework can be compared among different companies.
However, collecting data from different companies through the same questionnaire should be avoided, because the quality of the results depends heavily on the homogeneity of the groups' mindsets and perceptions.
|
329 |
MACHINE LEARNING MODEL FOR ESTIMATION OF SYSTEM PROPERTIES DURING CYCLING OF COAL-FIRED STEAM GENERATOR
Abhishek Navarkar (8790188) 06 May 2020
The intermittent nature of renewable energy, variations in energy demand, and fluctuations in oil and gas prices have all contributed to variable demand for power generation from coal-burning power plants. The varying demand leads to load-follow and on/off operations referred to as cycling. Cycling causes transients of properties such as pressure and temperature within various components of the steam generation system. The transients can cause increased damage because of fatigue and creep-fatigue interactions, shortening the life of components. A data-driven model based on artificial neural networks (ANN) is developed for the first time to estimate properties of the steam generator components during cycling operations of a power plant. This approach utilizes data from the Coal Creek Station power plant located in North Dakota, USA, collected over 10 years with a 1-hour resolution. Cycling characteristics of the plant are identified using a time series of gross power. The ANN model estimates the component properties, for a given gross power profile and initial conditions, as they vary during cycling operations. As a representative example, the ANN estimates are presented for the superheater outlet pressure, reheater inlet temperature, and flue gas temperature at the air heater inlet. The changes in these variables as a function of the gross power over the time duration are compared with measurements to assess the predictive capability of the model. Mean square errors of 4.49E-04 for superheater outlet pressure, 1.62E-03 for reheater inlet temperature, and 4.14E-04 for flue gas temperature at the air heater inlet were observed.
|
330 |
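Fanfictions, linguística de corpus e aprendizagem direcionada por dados : tarefas de produção escrita com foco no uso autêntico de língua e atividades que visam à autonomia dos alunos de letras em analisar preposições / Fanfictions, corpus linguistics and data-driven learning: writing tasks focused on authentic language use and activities aimed at the autonomy of Language undergraduates in analysing prepositions
Garcia, William Danilo January 2020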
Advisor: Paula Tavares Pinto / Abstract: Although the relation between Corpus Linguistics and Language Teaching received attention even before the advent of computers, it intensified around the 1990s, when research on learner corpora and Data-Driven Learning came into focus. Given this convergence, this study aimed to compile four learner corpora based on the authentic use of the language, in order to develop data-driven teaching activities, built on the students' own data, that could promote among the students an autonomous profile of linguistic investigation (more precisely, of the prepositions with, in, on, at, for and to). Concerning the existing literature, we highlight the works of Prabhu (1987), Skehan (1996), Willis (1996), Nunan (2004) and Ellis (2006) on Task-Based Language Teaching, and Jenkins (2012) and Neves (2014) on fanfictions. In relation to Corpus Linguistics, this study is based on Sinclair (1991), Berber Sardinha (2000) and Viana (2011). Granger (1998, 2012, 2013) is referenced to define learner corpora, and Johns (1991, 1994), Berber Sardinha (2011) and Boulton (2010) to discuss Data-Driven Learning. The methodological approach involved collecting compositions from Language Teaching undergraduate students, who completed a writing task in which they had to write a fanfiction. These texts composed two initial learner corpora, which were analyzed with the AntConc tool (ANTHONY, 2018) with the purpose of observing the occurrence of prepositions in English and whether they were accurately ... / Master's degree
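The kind of concordance view that data-driven learning relies on can be approximated in a few lines. This is a minimal sketch under stated assumptions: it is not AntConc and does not reproduce its features, the function name `concordance` and the example sentence are invented, and the tokenizer is deliberately naive.

```python
import re

# The six prepositions named in the abstract.
PREPOSITIONS = {"with", "in", "on", "at", "for", "to"}


def concordance(text, targets=PREPOSITIONS, window=3):
    """Return (left context, keyword, right context) for each target word."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    lines = []
    for i, token in enumerate(tokens):
        if token in targets:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


# Invented learner sentence, for illustration only.
sample = "She waited at the station for an hour."
for left, keyword, right in concordance(sample):
    print(f"{left:>20} [{keyword}] {right}")
```

Aligning each hit with its left and right context lets a learner compare many uses of the same preposition at a glance. Note that a word-form match cannot tell the preposition "to" from the infinitive marker "to", so hits would still need manual checking or part-of-speech tagging.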
|