831. Performance Analysis of Relational Database over Distributed File Systems. Tsai, Ching-Tang, 08 July 2011
With the growth of the Internet, people use networks frequently, and many PC applications have moved to network-based environments, including text processing, calendars, photo management, and even application development. Google is a company providing web services; its popular services, such as its search engine and Gmail, attract users with short response times and large amounts of data storage, and it charges businesses to place advertisements. Facebook, another popular website, processes huge volumes of instant messages and social relationships between users. The power behind these services comes from a new technique: cloud computing.
Cloud computing delivers high-performance processing with short response times, and its kernel components are distributed data storage and distributed data processing. Hadoop is a well-known open-source framework for building a cloud distributed file system and performing distributed data analysis. Hadoop is suitable for batch applications and write-once-read-many applications, so only a few kinds of applications, such as pattern searching and log-file analysis, have been implemented over Hadoop so far. However, almost all database applications still use relational databases, and porting them to a cloud platform requires running a relational database over HDFS. We therefore evaluate FUSE-DFS, an interface that mounts HDFS into a system so that it can be used like a local filesystem. If FUSE-DFS performance can satisfy users' applications, it becomes easier to persuade people to port their applications to a cloud platform with minimal overhead.
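As a rough illustration of the kind of measurement this implies, the sketch below times sequential writes and reads through an ordinary filesystem path, so the same test can be run against a local directory and against an assumed FUSE-DFS mount point (the path /mnt/hdfs is hypothetical). It is a minimal benchmark sketch, not the thesis's actual test harness.

```python
import os
import time

def sequential_io_benchmark(directory, file_size_mb=64, block_size=1 << 20):
    """Time sequential write and read of one file under `directory`.

    Works unchanged on a local path or on a FUSE-DFS mount point,
    since FUSE-DFS exposes HDFS through the ordinary POSIX API.
    """
    path = os.path.join(directory, "bench.tmp")
    block = b"x" * block_size
    n_blocks = file_size_mb * (1 << 20) // block_size

    start = time.time()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
    write_secs = time.time() - start

    start = time.time()
    with open(path, "rb") as f:
        while f.read(block_size):
            pass
    read_secs = time.time() - start

    os.remove(path)
    return file_size_mb / write_secs, file_size_mb / read_secs  # MB/s

# Hypothetical comparison: local disk vs. an assumed FUSE-DFS mount of HDFS.
for target in ["/tmp", "/mnt/hdfs"]:  # /mnt/hdfs is an assumed mount point
    w, r = sequential_io_benchmark(target)
    print(f"{target}: write {w:.1f} MB/s, read {r:.1f} MB/s")
```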
832. An Edge-Based Algorithm for Spatial Query Processing in Real-Life Road Networks. Wu, Xu-Lun, 14 July 2011
Because wireless communication technologies, positioning technologies, and mobile computing are developing quickly, mobile services over large spatiotemporal databases are becoming practical and important. Mobile service users move only inside a spatial network, e.g., a road network, and they often issue K Nearest Neighbor (KNN) queries to obtain data objects reachable through the road network. The challenge for mobile services is how to efficiently deliver the data objects of interest to the corresponding mobile users; therefore, how to effectively model road networks, and how to index and query them, has become a popular topic. Lu et al. have proposed a road network model that captures real-life road networks better than previous models and, based on their model, an RNG (Road Network Grid) index for speeding up KNN queries on real-life road networks. The RNG index is a quad-tree, point-based structure. However, their model splits a double-track road at the points where U-turns are allowed, which does not capture real-life road networks accurately. Splitting the roads increases the number of points in the graph, which increases the number of graph partitioning steps and therefore the time needed to construct the index. The format of the RNG leaf nodes also increases search time, and query processing on the RNG index must visit the root repeatedly, which increases search time further. Therefore, in this thesis, we propose a network model that captures real-life road networks without splitting roads: we map the real-life road network into a graph directly. Based on this model, we propose an EBNA (Edge-Based Nine-Area tree) index structure that retrieves the edge information of interest quickly. The EBNA index is an edge-based structure that stores all edge information in the leaf nodes, so edge information can be obtained directly. Each edge entry has a pointer linking it to adjacent edges, and these links form a graph, so KNN query processing visits the root only once. Our simulation results show that constructing the EBNA index is faster than constructing the RNG index, and that KNN query processing using the EBNA index outperforms KNN query processing using the RNG index.
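To make the query itself concrete, the following sketch answers a KNN query on a road network by expanding Dijkstra's algorithm from the query point and collecting the first K data objects reached. This is a generic network-expansion KNN over an edge list, not the EBNA algorithm; the toy graph and names are illustrative.

```python
import heapq
from collections import defaultdict

def knn_on_road_network(edges, objects, source, k):
    """Find the k data objects nearest to `source` by network distance.

    edges:   iterable of (u, v, length) for an undirected road network
    objects: dict mapping node -> object id (data objects placed on nodes)
    """
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    dist = {source: 0.0}
    heap = [(0.0, source)]
    found = []
    while heap and len(found) < k:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        if node in objects:
            found.append((objects[node], d))
        for nbr, w in adj[node]:
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return found

# Toy example: nodes are intersections, weights are road lengths.
edges = [("a", "b", 2.0), ("b", "c", 2.0), ("a", "d", 5.0), ("c", "d", 1.0)]
objects = {"c": "restaurant-1", "d": "restaurant-2"}
print(knn_on_road_network(edges, objects, "a", k=2))
```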
833. NAAK-Tree: An Index for Querying Spatial Approximate Keywords. Liou, Yen-Guo, 11 July 2012
In recent years, geographic information system (GIS) databases have developed quickly and play a significant role in many applications. Many of these applications allow users to find objects using keywords and spatial information at the same time. Most research on spatial keyword queries considers only exact matches between the textual information in the database and in the query. Since users may not know how to spell a keyword exactly, they may issue queries with approximate keywords instead of exact ones; therefore, how to process approximate-keyword queries in a spatial database has become an important research topic. Alsubaiee et al. have proposed the Location-Based-Approximate-Keyword-tree (LBAK-tree), which augments a tree-based spatial index with approximate-string indexes such as a gram-based index. However, the LBAK-tree is an R*-tree-based index structure, and the nodes of an R*-tree must be split and reinserted when they become full. Because of this, the LBAK-tree cannot index the spatial attribute and the textual attribute at the same time: it stores the keywords in the nodes only after the R*-tree is already built. Being based on the R*-tree, it must search all the children of a node to insert a new item or answer a query. Moreover, after finding the needed keywords through the approximate index, it probes the nodes by checking the intersections of the similar-keyword sets with the keywords stored in the nodes. The higher the level of a node, the larger the number of keywords stored in it, so checking the intersections takes a long time; and the LBAK-tree checks all the intersections even if one of them is already an empty set. Therefore, in this thesis, we propose the Nine-Area-Approximate-Keyword-tree (NAAK-tree) index structure to process spatial approximate-keyword queries. We do not have to partition the space to construct the spatial index, and we do not have to reinsert children when splitting nodes, so we can handle the keywords at the same time. We use a spatial number to find the nodes that satisfy the spatial condition of the query, and we augment the NAAK-tree with signatures to speed up the textual condition: the union of the bit strings of the keywords in a node represents all the keywords in that node. We can therefore efficiently filter out nodes containing no keyword corresponding to the query by checking the signature just once, without checking all the keywords stored in the node. Based on our NAAK-tree, if one of the similar-keyword sets is empty, we do not check the remaining sets. Our simulation results show that the NAAK-tree is more efficient than the LBAK-tree both in building the index and in answering spatial approximate-keyword queries.
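The signature idea can be sketched as follows: each keyword hashes to a fixed-width bit string, a node's signature is the bitwise OR (union) of its keywords' bit strings, and a node can be pruned whenever the query keyword's bits are not all present. This is a generic superimposed-coding sketch under assumed hash and width choices, not the thesis's exact scheme.

```python
import hashlib

SIG_BITS = 64  # assumed signature width

def keyword_signature(word, bits_per_word=3):
    """Map a keyword to a bit string by setting a few hashed bit positions."""
    sig = 0
    for i in range(bits_per_word):
        h = hashlib.md5(f"{word}:{i}".encode()).digest()
        sig |= 1 << (int.from_bytes(h[:4], "big") % SIG_BITS)
    return sig

def node_signature(keywords):
    """A node's signature is the union (bitwise OR) of its keywords' signatures."""
    sig = 0
    for w in keywords:
        sig |= keyword_signature(w)
    return sig

def may_contain(node_sig, query_word):
    """False means the node definitely has no matching keyword (safe to prune);
    True means the node must still be verified (false positives are possible)."""
    q = keyword_signature(query_word)
    return node_sig & q == q

node = node_signature(["coffee", "cafe", "restaurant"])
print(may_contain(node, "cafe"))    # True: candidate node, verify its keywords
print(may_contain(node, "garage"))  # likely False: node pruned with one check
```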
834. The Beef Nutrient Database Improvement Project: Retail Cuts From the Chuck. West, Sarah, 16 January 2010
A total of 40 beef arm chucks were collected from three cities across the United States to study the proximate composition of their separable lean. Chucks were fabricated 5-7 d postmortem and later cooked and dissected, or dissected raw, into four separable components: separable lean, external fat, separable seam (intermuscular) fat, and connective tissue (considered inedible). Proximate analysis was conducted on the separable lean component of each dissected retail cut.

Dissection data showed that multi-muscle cuts had a numerically lower percentage of separable lean than retail cuts comprised of a single muscle. Proximate analysis showed that as the mean value for moisture decreased in a retail cut, the mean percentage of total fat increased. Least squares means of total fat percentage were reported for the retail cuts stratified by USDA quality grade (upper Choice, lower Choice, and Select); some retail cuts differed significantly in the total fat percentage of the separable lean across USDA quality grades. Cooking yields for the three methods utilized were numerically different: roasted cuts had the highest cooking yield (80.72%), followed by grilled cuts (76.58%) and braised cuts (66.13%). Differences in final endpoint temperature for each cut may account for the differences between cooking methods.
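Cooking yield here is the standard ratio of cooked weight to raw weight expressed as a percentage; a minimal sketch of the calculation, using the yields reported above with hypothetical raw weights:

```python
def cooking_yield(raw_weight_g, cooked_weight_g):
    """Cooking yield as the percentage of raw weight retained after cooking."""
    return 100.0 * cooked_weight_g / raw_weight_g

# Hypothetical 500 g raw cuts; the resulting yields match those reported above.
print(cooking_yield(500, 403.60))   # ~80.72% (roasted)
print(cooking_yield(500, 382.90))   # ~76.58% (grilled)
print(cooking_yield(500, 330.65))   # ~66.13% (braised)
```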
This study was designed to acquire data to update the National Nutrient Database for Standard Reference, as well as to provide nutritional information for cuts not presently in the database. It evaluated thirteen cooked cuts and twelve raw cuts in an effort to increase the number of retail cuts whose nutrient information can be searched in the National Database.
835. Consumers' Cognitive, Affective, and Behavioral Responses to an Invasion of Privacy: Essays on Understanding Consumers' Privacy Concerns. Srivastava, Mona, 15 May 2009
This dissertation focuses on the discrepancy between consumers’ attitudes towards privacy and their actual behavior. Although consumers increasingly protest against invasions of privacy, they routinely disclose more information than they intend to. Firms make sizeable investments in acquiring consumer information because it helps them build and enhance customer relationships. However, some of this information acquisition occurs at the expense of consumers’ privacy. Against this backdrop, understanding and being responsive to consumers’ privacy concerns is critical.

Essay 1 focuses on consumers’ thoughts and feelings underlying their intention to disclose or withhold information from firms. I use the Zaltman Metaphor Elicitation Technique (ZMET), a depth interviewing process that involves story-telling, sensory images, and vignettes based on psychodrama. The results reported are based on depth interviews of twenty consumers from a large city and a mid-sized town in the U.S.A.

Essay 2 focuses on consumers’ behavioral responses to an invasion of privacy from a social justice theory perspective. I use the Critical Incident Technique (CIT) in an online survey of 997 respondents to understand the thoughts and feelings about privacy that drive consumers’ behavioral responses to an actual or potential invasion of privacy. I identify the antecedents and outcomes of consumers’ information experience with firms. Additionally, I examine vividness effects to understand the extent to which consumer perceptions of the likely outcomes of firms acquiring and using information about them are influenced by media coverage of the issue.

Building on the findings of Essays 1 and 2, I develop a model and working hypotheses for further empirical analysis. By examining the negative (i.e., violation of privacy) as well as positive experiences of consumers, I identify how consumers’ attitudes towards firms acquiring and using information about them are focused on risks, whereas their behavior takes into account risks as well as rewards.

A better understanding of consumers’ privacy concerns can be valuable to firms in personalizing their data acquisition and use strategies, their customer communications, as well as their overall customer relationship management (CRM) strategy.
836. Advances in diapriid (Hymenoptera: Diapriidae) systematics, with contributions to cybertaxonomy and the analysis of rRNA sequence data. Yoder, Matthew Jon, 15 May 2009
Diapriids (Hymenoptera: Diapriidae) are small parasitic wasps. Though found throughout the world, they are relatively unknown. A framework for advancing diapriid systematics is developed by introducing a new web-based application/database capable of storing a broad range of systematic data, together with the first molecular phylogeny specifically focused on examining intrafamilial relationships. In addition to these efforts, a description of a new taxon is provided, and several advantages of digital description, including linking descriptions to an ontology of morphological terms, are highlighted. The functionality of the database is further illustrated in the production of a catalog of diapriid host associations. The hosts database currently holds over 450 association records, for over 500 named taxa (parasitoids and hosts), and over 180 references. Diapriids are found to be primarily endoparasitoids of Diptera, emerging from the host pupa.

Phylogenetic inference for a molecular dataset of 28S and 18S rRNA sequence data, derived from a diverse selection of diapriids, is accomplished with a new suite of tools developed for handling complex rRNA datasets. Several parsimony-based methodologies, including an alignment-free method of analyzing multiple sequences, are reviewed and applied using the new software tools. Diapriid phylogenetic relationships are shown to be broadly congruent with existing morphology-based classifications. Methods for analyzing typically excluded sequence data are shown to recover phylogenetic signal that would otherwise be lost, and the alignment-free method performed remarkably well in this regard. Empirically, phylogenetic approaches that incorporate structural data were not notably different from those that did not.
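Alignment-free sequence comparison is commonly done by comparing k-mer frequency profiles; the sketch below shows one generic variant (cosine distance between k-mer counts) purely to illustrate the idea, and is not the specific method or software developed in this work.

```python
from collections import Counter
from math import sqrt

def kmer_profile(seq, k=4):
    """Count all overlapping k-mers in a nucleotide sequence string."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def cosine_distance(p, q):
    """Alignment-free distance between two k-mer count profiles."""
    keys = set(p) | set(q)
    dot = sum(p[x] * q[x] for x in keys)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return 1.0 - dot / norm if norm else 1.0

# Toy rRNA fragments (U transcribed to T for a uniform alphabet).
a = kmer_profile("ACGUACGUACGGUAC".replace("U", "T"))
b = kmer_profile("ACGTACGTTACGGTA")
print(round(cosine_distance(a, b), 3))
```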
837. The Beef Nutrient Database Improvement Project: Retail Cuts From the Rib and Plate. May, Laura, December 2010
The purpose of this study was to collect and analyze retail cuts from the beef rib and plate that had been identified as needing nutrient composition updates in the United States Department of Agriculture’s (USDA) National Nutrient Database for Standard Reference (SR). Twenty beef carcasses were selected from three different regions of the United States, and the rib and plate were collected for shipment via refrigerated truck to the Rosenthal Meat Science and Technology Center. Each rib and plate was fabricated 14 to 21 d postmortem into the appropriate retail cuts to be used for this study. The cuts were dissected, either raw or cooked (braised, grilled, roasted), into four separable components: separable lean, seam fat, external fat, and refuse. Bone and heavy connective tissue were considered refuse. Percent total chemical fat, moisture, protein, and ash analyses were conducted on the separable lean component obtained from dissection.
Cooking yields were evaluated for each of the three cooking methods utilized in this study. Grilled cuts had the highest numerical yield, followed by roasted and braised cuts. Dissection data showed single-muscle cuts had a higher percentage of separable lean than retail cuts composed of multiple muscles. Boneless and lip-off retail cuts contained a higher percentage of separable lean when compared to their bone-in and lip-on counterparts. Finally, proximate analysis data showed that as retail cuts increased in percentage of total chemical fat, the percentage of moisture decreased. When percentage of total chemical fat was stratified by USDA quality grade, most cuts showed differences between the USDA Choice and Select quality grades.
This study was a collaborative project; therefore, the results and discussion of this thesis are based only on findings from Texas A&M University's data. The final project results will be published in the USDA’s National Nutrient Database SR.
838. Light Scattering Problem and its Application in Atmospheric Science. Meng, Zhaokai, December 2010
The light scattering problem and its application in atmospheric science are studied in this thesis. In the first part, the light scattering theory of single irregular particles is investigated. We first introduce the basic concepts of the light scattering problem; the T-matrix ansatz, as well as the null-field technique, is introduced in the following sections. Three geometries, the sphere, the cylinder, and the hexagonal column, are then defined, and the corresponding light scattering properties (i.e., the T-matrix and the Mueller matrix) of those models with arbitrary sizes are simulated via the T-matrix method.

To improve the efficiency of single-scattering calculations, we present a user-friendly database software package of the single-scattering properties of individual dust-like aerosol particles. The second part of this thesis describes this database in detail, and its application to radiative transfer calculations in a spectral region from the ultraviolet (UV) to the far-infrared (far-IR) is introduced as well. To expand the degree of morphological freedom of the commonly used spheroidal and spherical models, triaxial ellipsoids were assumed to be the overall shape of dust-like aerosol particles. The software package allows for the derivation of bulk optical properties for a given distribution of particle microphysical parameters (i.e., refractive index, size parameter, and two aspect ratios). The array-oriented single-scattering property data sets are stored in NetCDF format.
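Deriving a bulk property from such a database amounts to weighting the single-particle values by the assumed size distribution. A minimal sketch of that integration, using a lognormal size distribution and made-up extinction values standing in for database lookups:

```python
import numpy as np

def lognormal(r, r_eff=1.0, sigma=2.0):
    """Lognormal number size distribution n(r), a common aerosol assumption."""
    return np.exp(-0.5 * (np.log(r / r_eff) / np.log(sigma)) ** 2) / (
        r * np.log(sigma) * np.sqrt(2.0 * np.pi)
    )

# Radii (micrometers) and extinction cross sections as they might be read
# from the single-scattering database; the values here are placeholders.
r = np.logspace(-2, 1, 200)
c_ext = 2.0 * np.pi * r**2  # stand-in for tabulated C_ext(r)

# Bulk extinction coefficient: the integral of C_ext(r) n(r) dr over sizes.
n = lognormal(r)
beta_ext = np.trapz(c_ext * n, r)
print(f"bulk extinction coefficient: {beta_ext:.4f}")
```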
The third part of this thesis examines the applicability of the triaxial ellipsoidal dust model. In this part, the newly built database is employed: the precomputed optical properties of triaxial models are imported into a polarized adding-doubling radiative transfer (RT) model, and the radiative transfer properties of a well-defined atmospheric layer are simulated. Furthermore, several trial retrieval procedures are carried out based on a combination of intensity and polarization in the RT simulation results. The retrievals show high precision and indicate potential for further application in realistic studies.
839. Analysis of the HSEES Chemical Incident Database Using Data and Text Mining Methodologies. Mahdiyati, May 2011
Chemical incidents can be prevented or mitigated by improving safety performance and implementing the lessons learned from past incidents. Despite some limitations in the range of information they provide, chemical incident databases can be utilized as sources of lessons learned by evaluating the patterns and relationships that exist between the data variables. Much of the previous research focused on the causal factors of incidents; hence, this research analyzes chemical incidents from both the causal and the consequence elements of the incidents.
A subset of incident data reported to the Hazardous Substance Emergency Events Surveillance (HSEES) chemical incident database from 2002-2006 was analyzed using data mining and text mining methodologies, both performed with the aid of STATISTICA software. The analysis studied 12,737 chemical-process-related incidents and extracted free-text descriptions of incidents from 3,316 incident reports. The structured data was analyzed using data mining tools such as classification and regression trees, association rules, and cluster analysis. The unstructured (textual) data was transformed into structured data using text mining and subsequently analyzed further with data mining tools such as feature selection and cluster analysis.
The data mining analysis demonstrated that this technique can be used to estimate incident severity from the input variables of release quantity and distance between victims and the source of release. Using the subset of ammonia releases, the classification and regression tree produced 23 final nodes, each corresponding to a range of release quantity and of distance between victims and the source of release; for each node, the severity of injury was estimated from the average of the observed severity scores. The association rules identified conditional probabilities for incidents involving piping, chlorine, ammonia, and benzene of 0.19, 0.04, 0.12, and 0.04, respectively. Text mining was utilized successfully to generate elements of incidents that can be used in developing incident scenarios. The research also identified information gaps in the HSEES database that could be remedied to enhance future data analysis. The findings from data mining and text mining should then be used to modify or revise design, operation, emergency response planning, or other management strategies.
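As a schematic of the severity-estimation step, the sketch below fits a regression tree that maps release quantity and victim distance to a severity score. The data is synthetic and the feature relationships are illustrative; it is not the HSEES analysis itself, which used STATISTICA rather than scikit-learn.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data: severity falls with distance, rises with quantity.
n = 500
release_qty = rng.uniform(1, 1000, n)   # e.g., pounds released (hypothetical)
distance = rng.uniform(1, 500, n)       # feet from release to victims
severity = release_qty / (distance + 10) + rng.normal(0, 1, n)

X = np.column_stack([release_qty, distance])
tree = DecisionTreeRegressor(max_leaf_nodes=23)  # 23 final nodes, as above
tree.fit(X, severity)

# Each leaf corresponds to a (quantity range, distance range) cell whose
# prediction is the mean observed severity of the training cases in it.
print(tree.predict([[800.0, 20.0], [50.0, 400.0]]))
```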
840. Target Market Prediction for New Mobile Telecommunications Products and Services: A Data Mining Approach. Chung, Yung-jui, 11 August 2004
With the deregulation of mobile number portability (MNP) and the emergence of new technologies and services such as PHS and 3G, the mobile telecommunications industry in Taiwan has become more competitive than ever. Under such competition, customer churn and declining profits have become great concerns to mobile service providers. In response, most providers continuously develop and introduce new value-added products and services. Frequent value-added products and services might strengthen customer loyalty (i.e., decrease customer churn) and improve gross profits, but the corresponding marketing costs also increase dramatically.
To lower marketing costs and respond to the market quickly, marketing staff typically adopt a pilot test based on simple random sampling (SRS) or rely on marketing experts to define the potential target market for a new value-added product or service. The former approach requires a large number of respondents in the pilot test, while the latter is knowledge-intensive and may suffer from unavailability of knowledge due to turnover of experienced marketing experts.
In this thesis, we propose a novel approach for efficient and effective identification of the target market for a new product or service. Specifically, we take the target market of a new product or service to be that of the most similar existing product or service, where similarity can be defined based either on product/service attributes or on the similarity between the pilot test of the new product/service and the customer base of an existing product/service. Accordingly, we propose two target market prediction models for new products and services: a "customer-based target market prediction model" and a "product-attribute-based target market prediction model." Our empirical results show that the proposed prediction models are more effective in predicting potential customers for new products and services than traditional approaches.
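A minimal sketch of the product-attribute-based idea: represent each product or service as an attribute vector, find the existing product most similar to the new one by cosine similarity, and reuse its customer base as the predicted target market. The attributes, vectors, and service names below are hypothetical illustrations, not the thesis's actual feature set.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two attribute vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical attribute vectors: [price tier, data-centric, voice-centric, youth appeal]
existing = {
    "ringtone-pack": np.array([1.0, 0.2, 0.1, 0.9]),
    "mobile-news":   np.array([2.0, 0.9, 0.1, 0.3]),
    "voice-bundle":  np.array([3.0, 0.1, 1.0, 0.2]),
}
customer_bases = {
    "ringtone-pack": {"u1", "u2", "u3"},
    "mobile-news":   {"u4", "u5"},
    "voice-bundle":  {"u6"},
}

new_service = np.array([1.2, 0.3, 0.1, 0.8])  # a new youth-oriented service

# The most similar existing service donates its customer base as the target market.
best = max(existing, key=lambda name: cosine(existing[name], new_service))
print(f"most similar existing service: {best}")
print(f"predicted target market: {customer_bases[best]}")
```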