201 |
A linear programming and sampling approach to the cutting-order problem / Hamilton, Evan D. 15 November 2000 (has links)
In the context of forest products, a cutting order is a list of dimension parts along
with demanded quantities. The cutting-order problem is to minimize the total cost of
filling the cutting order from a given lumber grade (or grades). Lumber of a given grade
is supplied to the production line in a random sequence, and each board is cut in a way
that maximizes the total value of dimension parts produced, based on a value (or price)
specified for each dimension part. Hence, the problem boils down to specifying suitable
dimension-part prices for each board to be cut.
The method we propose is adapted from Gilmore and Gomory's linear programming
approach to the cutting stock problem. The main differences are the use of a random
sample to construct the linear program and the use of prices rather than cutting patterns
to specify a solution. The primary result of this thesis is that the expected cost of
filling an order under the proposed method is approximately equal to the minimum possible
expected cost, in the sense that the ratio (expected cost divided by the minimum
expected cost) approaches one as the size of the order (e.g., in board feet) and the size of
the random sample grow large.
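
To make the pricing idea concrete, here is a minimal Python sketch (not the thesis implementation) of a Gilmore-Gomory-style LP built from a random sample of boards: each column records the parts yielded when one sample board is cut optimally, the LP covers the ordered quantities at minimum board cost, and the dual values of the demand constraints play the role of the dimension-part prices. All quantities, costs and pattern yields below are illustrative assumptions.

```python
# A minimal sketch, under invented data, of a Gilmore-Gomory-style LP:
# minimize total board cost subject to the sampled cutting patterns
# covering the ordered quantities; the duals of the demand constraints
# act as dimension-part prices.
import numpy as np
from scipy.optimize import linprog

demand = np.array([40.0, 25.0, 60.0])          # ordered quantity of each dimension part
# patterns[j, i] = parts of type i obtained when sample board j is cut optimally
patterns = np.array([[2, 0, 1],
                     [0, 3, 0],
                     [1, 1, 2],
                     [0, 0, 4]], dtype=float)
board_cost = np.array([1.0, 1.0, 1.2, 0.9])    # cost (e.g. board feet) of each sample board

# linprog solves min c @ x  s.t.  A_ub @ x <= b_ub;  we need patterns.T @ x >= demand
res = linprog(c=board_cost,
              A_ub=-patterns.T, b_ub=-demand,
              bounds=[(0, None)] * len(board_cost),
              method="highs")

part_prices = -res.ineqlin.marginals           # dual values of the demand constraints
print("fractional boards of each sampled pattern:", res.x)
print("implied dimension-part prices:", part_prices)
```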
A secondary result is a lower bound on the minimum possible expected cost. The
actual minimum is usually impractical to calculate, but the lower bound can be used in
computer simulations to provide an absolute standard against which to compare costs. It
applies only to independent sequences, whereas the convergence property above applies
to a large class of dependent sequences, called alpha-mixing sequences.
Experimental results (in the form of computer simulations) suggest that the proposed
method is capable of attaining nearly minimal expected costs in moderately large
orders. The main drawbacks are that the method is computationally expensive and of
questionable value in smaller orders. / Graduation date: 2001
|
202 |
Lock-free linked lists and skip lists / Fomitchev, Mikhail. January 2003 (has links)
Thesis (M.Sc.)--York University, 2003. Graduate Programme in Computer Science. / Typescript. Includes bibliographical references (leaves 224-226). Also available on the Internet. MODE OF ACCESS via web browser by entering the following URL: http://gateway.proquest.com/openurl?url%5Fver=Z39.88-2004&res%5Fdat=xri:pqdiss&rft%5Fval%5Ffmt=info:ofi/fmt:kev:mtx:dissertation&rft%5Fdat=xri:pqdiss:MQ99307
|
203 |
Data aggregation for capacity management / Lee, Yong Woo 30 September 2004 (has links)
This thesis presents a methodology for data aggregation for capacity management. It is assumed that a company manufactures a very large number of products and that every product is stored in the database with its standard units-per-hour figure and with attributes that uniquely specify it. The methodology aggregates products into families based on the standard units-per-hour and finds a subset of attributes that unambiguously identifies each family. Data reduction and classification are achieved using well-known multivariate statistical techniques such as cluster analysis, variable selection and discriminant analysis. The experimental results suggest that the proposed methodology achieves good data reduction.
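
As a concrete illustration of the pipeline sketched above, a minimal Python version might cluster products into families on the standard units-per-hour alone and then run variable selection with a discriminant model over the remaining attributes. Every value, column name and parameter below is an assumption made for illustration, not data from the thesis.

```python
# A hedged sketch of the aggregation steps: k-means on units-per-hour forms the
# product families, then sequential variable selection with linear discriminant
# analysis looks for a small attribute subset that still identifies each family.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector

products = pd.DataFrame({
    "units_per_hour": [120, 118, 45, 50, 200, 210, 48, 125],
    "width_mm":       [10, 10, 25, 25, 5, 5, 24, 11],
    "layers":         [2, 2, 4, 4, 1, 1, 4, 2],
    "coated":         [0, 0, 1, 1, 0, 0, 1, 0],
})

# Step 1: aggregate products into families based on the capacity figure alone.
families = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    products[["units_per_hour"]])

# Step 2: variable selection + discriminant analysis over the other attributes.
attrs = products.drop(columns="units_per_hour")
selector = SequentialFeatureSelector(
    LinearDiscriminantAnalysis(), n_features_to_select=2, cv=2).fit(attrs, families)
chosen = attrs.columns[selector.get_support()]

lda = LinearDiscriminantAnalysis().fit(attrs[chosen], families)
print("family label per product:", families)
print("attributes kept:", list(chosen))
print("discriminant accuracy on the toy data:", lda.score(attrs[chosen], families))
```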
|
204 |
Clustering Lab value working with medical data / Davari, Mahtab January 2007 (has links)
Data mining is a relatively new field of research whose objective is to acquire knowledge from large amounts of data. In medical and health care areas, due to regulations and to the availability of computers, a large amount of data is becoming available [27]. Practitioners are expected to use all of this data in their work, yet such a large amount of data cannot be processed by humans in a short time to make diagnoses, prognoses and treatment schedules. A major objective of this thesis is to evaluate data mining tools in medical and health care applications in order to develop a tool that can help make reasonably accurate decisions. The goal in this thesis is to find a pattern among patients who contracted pneumonia by clustering their lab values, which were recorded every day. This pattern can then be generalized to patients who have not been diagnosed with the disease but whose lab values show the same trend as those of pneumonia patients. For this work, 10 tables were extracted from a large database of a hospital in Jena. In the ICU (intensive care unit), the COPRA system, a patient management system, is used. All tables and data are stored in a German-language database.
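
Purely as an illustration of the clustering step, the following Python sketch reduces each patient's daily lab series to two simple trend features and groups the patients with k-means. The data is simulated and the feature choice is an assumption, not the procedure used on the Jena/COPRA tables.

```python
# A minimal sketch, on simulated data, of clustering patients by the trend of a
# daily lab value: each series is summarised by its mean level and slope, then
# k-means separates rising (pneumonia-like) from flat series.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
days = np.arange(7)
# 20 patients x 7 daily measurements of one lab value
lab_values = np.vstack([
    5 + 0.8 * days + rng.normal(0, 0.3, 7)   # rising trend
    if i < 10 else
    5 + rng.normal(0, 0.3, 7)                # flat trend
    for i in range(20)
])

# trend features per patient: mean level and slope of the daily series
slopes = np.polyfit(days, lab_values.T, deg=1)[0]
features = np.column_stack([lab_values.mean(axis=1), slopes])
features = StandardScaler().fit_transform(features)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print("cluster assignment per patient:", labels)
```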
|
205 |
Incident Data Analysis Using Data Mining Techniques / Veltman, Lisa M. 16 January 2010 (has links)
Several databases collect information on various types of incidents, yet most analyses
performed on them do not go beyond basic trend analysis or counting occurrences.
This research uses the more robust methods of data
mining and text mining to analyze the Hazardous Substances Emergency Events
Surveillance (HSEES) system data by identifying relationships among variables,
predicting the occurrence of injuries, and assessing the value added by the text data. The
benefits of performing a thorough analysis of past incidents include better understanding
of safety performance, better understanding of how to focus efforts to reduce incidents,
and a better understanding of how people are affected by these incidents.
The results of this research showed that visually exploring the data via bar graphs did not
yield any noticeable patterns. Clustering the data identified groupings of categories
across the variable inputs such as manufacturing events resulting from intentional acts
like system startup and shutdown, performing maintenance, and improper dumping.
Text mining the data allowed the events to be clustered and described further; however,
these clusters were not noticeably distinct, and the conclusions that could be drawn from
them were limited. Including the text comments in the overall analysis of HSEES data
greatly improved the predictive power of the models. Interpretation of the textual data's
contribution was limited; however, the qualitative conclusions drawn were similar to those
of the model without textual input. Although HSEES data is collected to describe the
effects that hazardous substance releases and threatened releases have on people, a
fairly good predictive model was still obtained from the few variables identified as
cause-related.
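
To illustrate how structured incident fields and free-text comments can be combined in one predictive model, here is a hedged Python sketch. The records, field names and model choice are invented and do not reflect the actual HSEES schema or the models built in this research.

```python
# A toy sketch: one-hot encoded incident attributes and TF-IDF features from the
# free-text comment field are joined into a single matrix and used to predict
# whether the event caused injuries.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

incidents = pd.DataFrame({
    "industry": ["manufacturing", "transport", "manufacturing", "agriculture"],
    "cause":    ["maintenance", "valve failure", "startup", "improper dumping"],
    "comment":  ["operator exposed during line flush",
                 "tanker leak on highway, no exposure",
                 "release during scheduled startup, two workers treated",
                 "pesticide dumped near field, bystander felt ill"],
    "injury":   [1, 0, 1, 1],
})

features = ColumnTransformer([
    ("cats", OneHotEncoder(handle_unknown="ignore"), ["industry", "cause"]),
    ("text", TfidfVectorizer(), "comment"),   # text mining of the comment field
])
model = Pipeline([("features", features),
                  ("clf", LogisticRegression(max_iter=1000))])
model.fit(incidents[["industry", "cause", "comment"]], incidents["injury"])
print(model.predict(incidents[["industry", "cause", "comment"]]))
```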
|
206 |
The development of a spatial-temporal data imputation technique for the applications of environmental monitoring / Huang, Ya-Chen 12 September 2006 (links)
In recent years, sustainable development has become one of the most important issues internationally. Many indicators related to sustainable development have been proposed and implemented, such as Island Taiwan and Urban Taiwan. However, the missing values that accompany environmental monitoring data posed serious problems when we conducted a study on building a sustainable development indicator for the marine environment. Data is the origin of summarized information such as indicators; given the poor data quality caused by missing values, the accuracy of results estimated from such a data set is in doubt. It is therefore important to apply suitable data pre-processing so that reliable information can be acquired by subsequent data analysis. Missing values in environmental monitoring data arise for several reasons, for example: breakdown of machines, ruined samples, forgotten recordings, mismatched records when merging data, and records lost during data processing. The patterns of missing data are also diverse: at a given sampling time, records at several sampling sites may be partially or completely missing; conversely, partial or complete time series may be missing at a single sampling site. The missing values of environmental monitoring data are therefore related to both the spatial and the temporal dimensions. Existing data imputation techniques have been developed for particular types of data, interpolating missing values from either geographic data distributions or time-series functions; approaches that accommodate both spatial and temporal information in one analysis are rarely seen. The current study integrates the related analysis procedures and develops a computing process that uses both the spatial and the temporal dimensions inherent in environmental monitoring data. Such a data imputation process can improve the accuracy of estimated missing values.
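
As a toy illustration of using both dimensions at once, the sketch below imputes one missing reading by blending a temporal interpolation within the station's own series with an inverse-distance-weighted spatial estimate from neighbouring stations at the same time step. The stations, coordinates, values and the equal-weight blend are all assumptions, not the procedure developed in the thesis.

```python
# A minimal sketch of spatio-temporal imputation on made-up monitoring data:
# temporal estimate = linear interpolation within the station's series,
# spatial estimate = inverse-distance weighting of other stations at the same
# time step, final value = a simple 50/50 blend of the two.
import numpy as np
import pandas as pd

# rows = monthly samples, columns = monitoring stations; NaN marks missing data
readings = pd.DataFrame(
    {"station_A": [7.1, 7.3, np.nan, 7.8, 8.0],
     "station_B": [6.9, 7.0, 7.2, 7.5, 7.7],
     "station_C": [7.4, 7.6, 7.9, 8.1, 8.3]})
coords = {"station_A": (0.0, 0.0), "station_B": (1.0, 0.0), "station_C": (0.0, 2.0)}

t, target = 2, "station_A"

# temporal estimate: interpolate the gap in the station's own time series
temporal = readings[target].interpolate(method="linear").iloc[t]

# spatial estimate: inverse-distance weighting of the other stations at time t
others = [s for s in readings.columns if s != target]
dists = np.array([np.hypot(coords[s][0] - coords[target][0],
                           coords[s][1] - coords[target][1]) for s in others])
spatial = np.average(readings.loc[t, others].to_numpy(dtype=float),
                     weights=1.0 / dists)

imputed = 0.5 * temporal + 0.5 * spatial   # simple spatio-temporal blend
print(f"temporal={temporal:.2f}, spatial={spatial:.2f}, imputed={imputed:.2f}")
```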
|
207 |
A Different Threshold Approach to Data Replication in Data Grids / Huang, Yen-Wei 21 January 2008 (has links)
Certain scientific application domains, such as High-Energy Physics or Earth Observation, are expected to produce several Petabytes (2^20 Gigabytes each) of data that is analyzed and evaluated by scientists all over the world. In the context of data grid technology, data replication is mostly used to reduce access latency and bandwidth consumption. In this thesis, we adopt the typical Data Grid architecture with three kinds of nodes: server, cache, and client nodes. A server node represents a main storage site, a client node represents a site where data access requests are generated, and a cache node represents an intermediate storage site. The access latency of the hierarchical storage system may be of the order of seconds up to hours. A static replication strategy can be used to reduce such long delays; however, it cannot adapt to changes in users' behavior. Therefore, dynamic data replication strategies are used in Data Grids. Three fundamental design issues in a dynamic replication strategy are: (1) when to create replicas, (2) which files to replicate, and (3) where to place the replicas. Two well-known replication strategies are Fast-Spread and Cascading, each of which works well for a different kind of access pattern: Fast-Spread works well for random access patterns, and Cascading works well for patterns with locality. However, with many different access patterns, using one strategy for one kind of pattern and another strategy for another may make the system too complex. Therefore, in this thesis, we propose a single strategy that works for any kind of access pattern: a Different Threshold (DT) approach to data replication in Data Grids, which can be dynamically adapted to several kinds of access patterns and can provide even better performance than the Cascading and Fast-Spread strategies. In our approach, different layers have different thresholds. First, we propose a static DT strategy in which the threshold at each layer is fixed; by carefully adjusting the differences between the thresholds T_i, where i denotes the i-th layer of the tree structure, we can provide better performance than the two well-known strategies above. Moreover, among a large number of data files there may exist some hot files, i.e., the files that are requested most often. To reduce the number of requests for hot files, we next propose the dynamic DT strategy, in which each data file has its own threshold. We let replication of hot files occur earlier than that of other files by decreasing the thresholds of hot files sooner than those of normal files. Our simulation results show that the response time of the static DT strategy is lower than that of the Cascading and Fast-Spread strategies, and that the dynamic DT strategy performs better than the static DT strategy.
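
The per-layer threshold idea can be illustrated with a small Python sketch: each cache layer i keeps a request counter per file and creates a replica once the counter reaches that layer's threshold T_i, while the dynamic variant lowers the thresholds of hot files so they replicate earlier. All threshold values, the hot-file rule and the class interface below are assumptions for illustration, not the thesis implementation.

```python
# A hedged sketch of a Different-Threshold replicator: per-(layer, node, file)
# request counters trigger replication at a cache node when the layer's
# threshold T_i is reached; hot files get a reduced threshold (dynamic DT).
from collections import defaultdict

class DTReplicator:
    def __init__(self, layer_thresholds, hot_limit=10, hot_factor=0.5):
        self.base = layer_thresholds            # T_i per layer (root is layer 0)
        self.hot_limit = hot_limit              # total requests marking a hot file
        self.hot_factor = hot_factor            # threshold reduction for hot files
        self.counts = defaultdict(int)          # (layer, node, file) -> requests seen
        self.total = defaultdict(int)           # file -> total requests (hotness)
        self.replicas = set()                   # (layer, node, file) holding a copy

    def threshold(self, layer, file_id):
        t = self.base[layer]
        if self.total[file_id] >= self.hot_limit:   # dynamic DT: hot file
            t = max(1, int(t * self.hot_factor))
        return t

    def request(self, path, file_id):
        """path = [(layer, node), ...] from the client upward toward the server."""
        self.total[file_id] += 1
        for layer, node in path:
            key = (layer, node, file_id)
            if key in self.replicas:
                return key                       # served from an existing replica
            self.counts[key] += 1
            if self.counts[key] >= self.threshold(layer, file_id):
                self.replicas.add(key)           # replicate at this cache node
        return None                              # served by the server node

# toy usage: a client under cache node "c1" (layer 2) below cache node "r1" (layer 1)
rep = DTReplicator(layer_thresholds={1: 8, 2: 4})
for _ in range(6):
    rep.request(path=[(2, "c1"), (1, "r1")], file_id="dataset-42")
print("replicas created:", rep.replicas)
```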
|
208 |
The Business Value of Data Warehouses: Opportunities, Pitfalls and Future Directions / Strand, Matthias January 2000 (has links)
Organisations have spent billions of dollars (USD) on investments in data warehouses. Many have succeeded, but many have also failed. These failures are considered to be mostly of an organisational nature rather than a technological one, as might have been expected. Because of these failures, organisations have problems deriving business value from their data warehouse investments. Obtaining business value from data warehouses is necessary, since the investment is of such a magnitude that it is clearly visible in the balance sheet. In order to investigate how the business value may be increased, we have conducted an extensive literature study aimed at identifying opportunities and future directions which may alleviate the problem of low return on investment. To balance the work, we have also identified pitfalls which may hinder organisations from deriving business value from their data warehouses.

Based on the literature survey, we have identified and motivated possible research areas which we consider relevant if organisations are to derive real business value from their data warehouses. These areas are:

* Integrating data warehouses in knowledge management.

* Data warehouses as a foundation for information data super stores.

* Using data warehouses to predict the need for business change.

* Aligning data warehouses and business processes.

As the areas are rather broad, we have also included examples of more specific research problems within each possible research area. Furthermore, we have given initial ideas regarding how to investigate those specific research problems.
|
209 |
Optimization strategies for data warehouse maintenance in distributed environments / Liu, Bin. January 2002 (has links)
Thesis (M.S.)--Worcester Polytechnic Institute. / Keywords: schema change; batch; concurrent; data warehouse maintenance; parallel. Includes bibliographical references (p. 72-74).
|
210 |
Component based user guidance in knowledge discovery and data mining / Engels, Robert. January 1999 (has links)
Thesis (doctoral)--Universität, Karlsruhe, 1999.
|