Global ETD Search

1	Meta-learning: strategies, implementations, and evaluations for algorithm selection / Köpf, Christian Rudolf. January 2006 Dissertation, University of Ulm, 2005. / Bibliography: pp. 227-248.
2	Mining frequent itemsets and order preserving submatrices from uncertain data Chui, Chun-kit, 崔俊傑 January 2007 Computer Science / Master / Master of Philosophy Algorithms Data mining.
3	Knowledge discovery from distributed aggregate data in data warehouses and statistical databases Páircéir, Rónán January 2002 No description available. 006 Data mining
4	Geo-demographic analysis in support of the United States Army Reserve (USAR) Unit Positioning and Quality Assessment Model (UPQUAM) Fair, Martin Lynn 06 1900 Manning United States Army Reserve (USAR) units is fundamentally different from manning Regular Army (RA) units. A soldier assigned to a USAR unit must live within 75 miles or a 90-minute commute of his Reserve Center (RC). This makes reserve unit positioning a key factor in the ability to recruit to fill the unit. This thesis automates, documents, reconciles, and assembles data on over 30,000 ZIP Codes, over 800 RCs, and over 260 Military Occupational Specialties (MOSs), drawing on and integrating over a dozen disparate databases. This effort produces a single data file with demographic, vocational, and economic data on every ZIP Code in America, along with the six-year results of its RA, USAR, and sister service recruit production, and MOS suitability for each of the 264 MOSs. Preliminary model development accounts for about 70% of the variation in recruit production by ZIP Code. This thesis also develops models for the top five MOSs to predict the maximum number of recruits obtained from a ZIP Code for that MOS. Examples illustrate that ZIP Codes vary in their ability to provide recruits with sufficient aptitude for technical fields. Two subsequent theses will use those results. One completes the MOS models. The second uses the models as constraints in an optimization model to position RCs. An initial version of the optimization model is developed in this thesis. Together, the three theses will provide a powerful tool for analysis of strategy-based optimal reserve force stationing. / Lieutenant Colonel, United States Army Database management Data mining
5	Measuring academic performance of students in Higher Education using data mining techniques Alsuwaiket, Mohammed January 2018 Educational Data Mining (EDM) is a developing discipline concerned with extending classical Data Mining (DM) methods and developing new methods for exploring the data that originate from educational systems. It aims to use those methods to achieve a logical understanding of students and of the educational environment they should have for better learning. These data are characterized by their large size and randomness, which can make it difficult for educators to extract knowledge from them. Additionally, knowledge extracted from data by counting the occurrence of certain events is not always reliable, since the counting process sometimes does not take into consideration other factors and parameters that could affect the extracted knowledge. Student attendance in Higher Education has always been dealt with in a classical way, i.e. educators rely on counting the occurrences of attendance or absence and build their knowledge about students, as well as modules, on this count. This method is neither credible nor does it necessarily provide a real indication of a student's performance. On the other hand, the choice of an effective student assessment method is an issue of interest in Higher Education. Various studies (Romero, et al., 2010) have shown that students tend to get higher marks when assessed through coursework-based assessment methods (which include either modules that are fully assessed through coursework or a mixture of coursework and examinations) than when assessed by examination alone. 
A large number of Educational Data Mining (EDM) studies have pre-processed data through the conventional Data Mining processes, including the data preparation process, but they use transcript data as it stands without looking at the weighting of examination and coursework results, which could affect prediction accuracy. This thesis explores the above problems and tries to formulate the extracted knowledge in a way that guarantees achieving accurate and credible results. Student attendance data, gathered from the educational system, were first cleaned in order to remove any randomness and noise, then various attributes were studied so as to highlight the most significant ones that affect the real attendance of students. The next step was to derive an equation that measures the Student Attendance's Credibility (SAC) considering the attributes chosen in the previous step. The reliability of the newly developed measure was then evaluated in order to examine its consistency. In terms of transcript data, this thesis proposes a different data preparation process through investigating more than 230,000 student records in order to prepare students' marks based on the assessment methods of enrolled modules. The data have been processed through different stages in order to extract a categorical factor through which students' module marks are refined during the data preparation process. The results of this work show that students' final marks should not be isolated from the nature of the enrolled module's assessment methods; rather they must be investigated thoroughly and considered during EDM's data pre-processing phases. More generally, it is concluded that Educational Data should not be prepared in the same way as other existing data, due to differences such as the sources of data, applications, and types of errors in them. 
Therefore, an attribute, the Coursework Assessment Ratio (CAR), is proposed in order to take the different modules' assessment methods into account when preparing student transcript data. The effect of CAR and SAC on the prediction process using data mining classification techniques such as Random Forest, Artificial Neural Networks and k-Nearest Neighbors has been investigated. The results were generated by applying the DM techniques to our data set and evaluated by measuring the statistical differences between the Classification Accuracy (CA) and Root Mean Square Error (RMSE) of all models. A comprehensive evaluation has been carried out for all results in the experiments to compare the results of all DM techniques, and it has been found that Random Forest (RF) has the highest CA and lowest RMSE. The importance of SAC and CAR in increasing the prediction accuracy has been proved in Chapter 5. Finally, the results have been compared with previous studies that predicted students' final marks based on students' marks at earlier stages of their study. The comparisons have taken into consideration similar data and attributes, first excluding average CAR and SAC and then including them, and comparing the prediction accuracy of the two cases. The aim of this comparison is to ensure that the new preparation process stage will positively affect the final results. Educational Data Mining (EDM)
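The two evaluation measures named in the abstract above, Classification Accuracy (CA) and Root Mean Square Error (RMSE), can be sketched as follows. This is a minimal illustration using the standard textbook definitions, not the thesis's own evaluation code; the function names and the sample labels and marks are hypothetical.

```python
from math import sqrt


def classification_accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error between true and predicted numeric values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical data: class labels for CA, numeric marks for RMSE.
true_labels = ["pass", "fail", "pass", "pass"]
pred_labels = ["pass", "pass", "pass", "fail"]
true_marks = [60, 70, 80]
pred_marks = [62, 70, 77]

ca = classification_accuracy(true_labels, pred_labels)
err = rmse(true_marks, pred_marks)
```

A higher CA and a lower RMSE both indicate a better model, which is the sense in which the abstract ranks Random Forest first.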
6	Automatic web resource compilation using data mining Escudeiro, Nuno Filipe Fonseca Vasconcelos January 2004 Master's thesis. Data Analysis and Decision Support Systems. Faculdade de Economia, Universidade do Porto. 2004 Web Data mining
7	Discovering and summarizing email conversations Zhou, Xiaodong 05 1900 With the ever-increasing popularity of email, it is now very common for a group of people to discuss specific issues, events or tasks by email. Those discussions can be viewed as conversations via email and are valuable to the user as a personal information repository. For instance, in the 10 minutes before a meeting, a user may want to quickly go through a previous email discussion that is about to be taken up in the meeting. In this case, rather than reading each individual email one by one, it is preferable to read a concise summary of the previous discussion with the major information summarized. In this thesis, we study the problem of discovering and summarizing email conversations. We believe that our work can greatly support users with their email folders. However, the characteristics of email conversations, e.g., lack of synchronization, conversational structure and informal writing style, make this task particularly challenging. In this thesis, we tackle this task by considering the following aspects: discovering emails in one conversation, capturing the conversation structure and summarizing the email conversation. We first study how to discover all emails belonging to one conversation. Specifically, we study the hidden email problem, which is important for email summarization and other applications but has not been studied before. We propose a framework to discover and regenerate hidden emails. The empirical evaluation shows that this framework is accurate and scalable to large folders. Second, we build a fragment quotation graph to capture email conversations. The hidden emails belonging to each conversation are also included in the corresponding graph. Based on the quotation graph, we develop a novel email conversation summarizer, ClueWordSummarizer. 
The comparison with a state-of-the-art email summarizer as well as with a popular multi-document summarizer shows that ClueWordSummarizer obtains a higher accuracy in most cases. Furthermore, to address the characteristics of email conversations, we study several ways to improve ClueWordSummarizer by considering more lexical features. The experiments show that many of those improvements can significantly increase the accuracy, especially those based on subjective words and phrases. email summarization data mining
8	AN EFFICIENT SET-BASED APPROACH TO MINING ASSOCIATION RULES Hsieh, Yu-Ming 28 July 2000 Discovery of association rules is an important problem in the area of data mining. Given a database of sales transactions, it is desirable to discover the important associations among items such that the presence of some items in a transaction will imply the presence of other items in the same transaction. Since mining association rules may require repeatedly scanning a large transaction database to find different association patterns, the amount of processing could be huge, and performance improvement is an essential concern. Within this problem, efficiently counting large itemsets is the major task, where a large itemset is a set of items appearing in a sufficient number of transactions. In this thesis, we propose efficient algorithms for mining association rules based on a high-level set-based approach. A set-based approach allows a clear expression of what needs to be done, as opposed to specifying exactly how the operations are carried out in a low-level approach, where a low-level approach means retrieving one tuple from the database at a time. The advantage of a set-based approach, like the SETM algorithm, is that it is simple and stable over the range of parameter values. However, the SETM algorithm proposed by Houtsma and Swami may generate too many invalid candidate itemsets. Therefore, in this thesis, we propose a set-based algorithm called SETM*, which provides the same advantages as the SETM algorithm while avoiding its disadvantages. In the SETM* algorithm, we reduce the size of the candidate database by modifying the way it is constructed, where a candidate database is a transaction database formed with candidate $k$-itemsets. Then, based on the new way to construct the candidate database in the SETM* algorithm, we propose the SETM-2K, SETM-MaxK and SETM-Lmax algorithms. 
In the SETM-2K algorithm, given a $k$, we efficiently construct $L_{k}$ based on $L_{w}$, where $w=2^{\lceil \log_{2}k \rceil - 1}$, instead of step by step. In the SETM-MaxK algorithm, we efficiently find the $L_{k}$ based on $L_{w}$, where $L_{k} \neq \emptyset$, $L_{k+1}=\emptyset$ and $w=2^{\lceil \log_{2}k \rceil - 1}$, instead of step by step. In the SETM-Lmax algorithm, we use a forward approach to find all maximal large itemsets from $L_{k}$, and the $k$-itemset is not included in the $k$-subsets of the $j$-itemset, except $k=MaxK$, where $1 \leq k < j \leq MaxK$, $L_{MaxK} \neq \emptyset$ and $L_{MaxK+1}=\emptyset$. We conduct several experiments using different synthetic relational databases. The simulation results show that the SETM* algorithm outperforms the SETM algorithm in terms of storage space or execution time for all relational database settings. Moreover, we show that the proposed SETM-2K and SETM-MaxK algorithms also require less time to achieve their goals than the SETM or SETM* algorithms. Furthermore, we also show that the proposed forward approach (SETM*-Lmax) to finding all maximal large itemsets requires less time than the backward approach proposed by Agrawal. association rule data mining
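To make the abstract's terms concrete, the sketch below counts large itemsets level by level and evaluates the starting index $w=2^{\lceil \log_{2}k \rceil - 1}$. It is a plain level-wise counting illustration under stated assumptions, not the SETM, SETM* or SETM-2K algorithms themselves. The function names, the `min_count` threshold and the sample transactions are hypothetical, and the formula is applied only for $k \geq 2$.

```python
from itertools import combinations


def count_large_itemsets(transactions, min_count):
    """Return {k: {itemset: count}} for itemsets found in >= min_count transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_count}

    levels = {}
    k = 1
    while current:
        levels[k] = current
        k += 1
        # Candidate k-itemsets: unions of two large (k-1)-itemsets of size k.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Count each candidate by scanning the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_count}
    return levels


def doubling_start(k):
    """w = 2^(ceil(log2 k) - 1), computed with integer arithmetic for k >= 2."""
    if k < 2:
        raise ValueError("k must be at least 2")
    # (k - 1).bit_length() equals ceil(log2 k) for every integer k >= 1.
    return 1 << ((k - 1).bit_length() - 1)


# Hypothetical sales transactions.
baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
levels = count_large_itemsets(baskets, min_count=3)
```

With these baskets every pair of items appears in three transactions, but the triple appears in only two, so the search stops at $k=2$. For the formula, `doubling_start(5)` and `doubling_start(8)` both give 4, meaning $L_{5}$ through $L_{8}$ would each be built from $L_{4}$ rather than from the immediately preceding level.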
9	Effective and efficient analysis of spatio-temporal data / Zhang, Zhongnan. January 2008 Thesis (Ph.D.)--University of Texas at Dallas, 2008. / Includes vita. Includes bibliographical references (leaves 106-114) Database management. Data mining.
10	Data mining medication administration incident data to identify opportunities for improving patient safety Gray, Michael David. Thomas, Robert Evans. January 2009 Dissertation (Ph.D.)--Auburn University, 2009. / Abstract. Vita. Includes bibliographic references. Medication errors. Data mining.
