Global ETD Search

891	Text mining of online book reviews for non-trivial clustering of books and users Lin, Eric 14 August 2013 Indiana University-Purdue University Indianapolis (IUPUI) / The classification of consumable media by mining relevant text for their identifying features is a subjective process. Previous attempts to perform this type of feature mining have generally been limited in scope due to limited access to user data. Many of these studies used human domain knowledge to evaluate the accuracy of the features they extracted. In this thesis, we mine book review text to identify non-trivial features of a set of similar books. We compare books by looking for those that share characteristics, ultimately clustering the books in our data set. We use the same mining process to identify a corresponding set of characteristics in users. Finally, we evaluate the quality of our methods by examining the correlation between our similarity metric and user ratings. mining data analysis recommendation sentiment End-user computing Web usage mining Knowledge management Information behavior -- Research Cluster analysis -- Data processing System analysis -- Data processing Information retrieval -- Book reviews
892	Viejo Period Architecture in the Casas Grandes Region of Northern Mexico Jensen, Samuel J. 24 April 2023 The Casas Grandes region of northern Mexico is an understudied but important part of the culture area that has come to be known as the Northwest/Southwest (NW/SW). The studies that have been conducted in the Casas Grandes region have focused on the Medio Period (approximately 1200-1450 AD) and the large site of Paquimé. Only a small amount of research has been conducted on the preceding Viejo Period (approximately 700-1200 AD). In this thesis, I create a clearinghouse of published Viejo Period architectural features excavated in the Casas Grandes region. I also analyze those features to better understand the materials and technological choices used to construct them, and to evaluate the validity of the sub-regional zones that have begun to appear in the archaeological literature on this area. These analyses include a qualitative analysis of the excavated architectural features as well as statistical clustering methods, a Principal Components Analysis, and a Correspondence Analysis of the available architectural data. I ultimately propose revisions to the existing architectural typology for the Viejo Period and the abandonment of the concept of sub-regional zones within the Casas Grandes region. I also observe some emerging patterns in the architectural data and suggest that further research is needed to fully understand the distribution of architectural features throughout the region. casas grandes viejo period architecture chihuahua nw/sw northern mexico cluster analysis principal components analysis correspondence analysis pithouses Family, Life Course, and Society
893	A comparative study on a practical use case for image clustering based on common shareability and metadata Dackander, Erik January 2018 As the amount of data grows every year, structuring it effectively becomes an increasingly pressing problem. This thesis investigates and compares how four different clustering algorithms perform on a practical use case for images. The four algorithms are Affinity Propagation, BIRCH, Rectifying Self-Organizing Maps (RSOM), and Deep Embedded Clustering (DEC). The algorithms are given each image's metadata as well as its content, extracted using a pre-trained deep convolutional neural network. The results indicate that, although the outcomes vary considerably, Affinity Propagation and BIRCH show the most potential among the four algorithms. Furthermore, when metadata is available, it appears to improve the results of the algorithms that can handle the extreme values the metadata can introduce. For Affinity Propagation, the mean share score improves by 5.6 percentage points and the silhouette score by 0.044. For BIRCH, the mean share score improves by 1.9 percentage points and the silhouette score by 0.051. RSOM and DEC could not process the metadata.
clustering cluster analysis machine learning degoo image clustering comparative study Computer Sciences
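The silhouette score reported in the entry above measures how much closer each item is to its own cluster than to the nearest other cluster. A minimal stdlib sketch of the mean silhouette coefficient, assuming Euclidean distance on plain feature vectors; the thesis's CNN-derived image features and its "mean share score" are not reproduced here.

```python
import math
from typing import Hashable, Sequence


def silhouette_score(points: Sequence[Sequence[float]], labels: Sequence[Hashable]) -> float:
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b) over all points.

    a: mean distance from point i to the other members of its own cluster.
    b: smallest mean distance from point i to the members of any other cluster.
    Singleton clusters contribute s(i) = 0, as in the usual definition.
    """
    n = len(points)
    clusters = set(labels)
    if not 2 <= len(clusters) <= n - 1:
        raise ValueError("silhouette needs between 2 and n - 1 clusters")

    total = 0.0
    for i in range(n):
        # Accumulate distance sums and member counts per cluster, excluding i itself.
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j in range(n):
            if j != i:
                sums[labels[j]] += math.dist(points[i], points[j])
                counts[labels[j]] += 1

        own = labels[i]
        if counts[own] == 0:
            continue  # singleton cluster: s(i) = 0
        a = sums[own] / counts[own]
        b = min(sums[c] / counts[c] for c in clusters if c != own)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / n


# Two well-separated groups score close to 1; mixing them drives the score negative.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(round(silhouette_score(pts, [0, 0, 1, 1]), 3))  # → 0.9
print(round(silhouette_score(pts, [0, 1, 0, 1]), 3))  # → -0.448
```

An improvement of 0.044 or 0.051 on this scale, as reported for Affinity Propagation and BIRCH, is therefore a modest shift on a range from -1 to 1.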
894	Deinterleaving of radar pulses with batch processing to utilize parallelism Lind, Emma, Stahre, Mattias January 2020 The threat level in an environment (in this thesis, specifically for aircraft) can be determined by analyzing radar signals. This task is critical and must be solved quickly and with high accuracy. The received electromagnetic pulses have to be identified in order to classify a radar emitter. Usually, several emitters transmit radar pulses at the same time in an environment. These pulses need to be sorted into groups, where each group contains pulses from the same emitter. This thesis aims to find a fast and accurate way to sort the pulses in parallel. The selected approach analyzes batches of pulses in parallel to exploit a multi-threaded Central Processing Unit (CPU) or a Graphics Processing Unit (GPU). First, a suitable clustering algorithm had to be selected. Second, an optimal batch size had to be determined to achieve high clustering performance while rapidly processing the batches of pulses in parallel. A quantitative, experiment-based method was used to measure clustering performance, execution time, system response, and parallelism as a function of batch size for the selected clustering algorithm. The algorithm selected was Density-based Spatial Clustering of Applications with Noise (DBSCAN) because it does not require the number of clusters to be specified in advance, can find clusters of arbitrary shape in a data set, and has low time complexity.
The evaluation showed that parallel batch processing can be implemented while still achieving high clustering performance compared to a sequential implementation that used the maximum likelihood method. An optimal batch size, in terms of data points and cutoff time, is hard to determine because it depends heavily on the input data: a batch size that is optimal for clustering performance and system response on one data stream may not be optimal on another. One solution could be to determine optimal batch sizes in advance for different data streams and then adapt the batch size to the incoming stream. However, a high level of parallelism introduces an additional delay that depends on the difference between the time it takes to collect data points into a batch and the time it takes to process the batch, so the system will be slower to output its result for a given batch than a sequential system. For a time-critical system, a high level of parallelism might therefore be unsuitable, since it leads to slower response times.
Cluster analysis DBSCAN Parallelization Signal Separation Unsupervised learning Computer Sciences Signal Processing
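The entry above sorts pulses by running DBSCAN independently on batches that are processed in parallel. A minimal stdlib sketch of that idea, assuming each pulse has already been reduced to a small numeric feature tuple (the two-dimensional features in the example are hypothetical, not the thesis's pulse descriptors) and using a simple quadratic neighbor search. Cluster ids are local to each batch; matching clusters across batches and the cutoff-time logic discussed in the thesis are left out.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import List, Sequence, Tuple

Pulse = Tuple[float, ...]  # hypothetical feature vector for one pulse
NOISE = -1
UNVISITED = -2


def _neighbors(points: Sequence[Pulse], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Pulse], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        seeds = _neighbors(points, i, eps)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point of some cluster
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_seeds = _neighbors(points, j, eps)
            if len(j_seeds) >= min_pts:
                seeds.extend(j_seeds)  # j is also a core point: keep expanding
        cluster += 1
    return labels


def cluster_in_batches(pulses: Sequence[Pulse], batch_size: int, eps: float,
                       min_pts: int, workers: int = 4) -> List[List[int]]:
    """Split the pulse stream into fixed-size batches and cluster them concurrently."""
    batches = [pulses[k:k + batch_size] for k in range(0, len(pulses), batch_size)]
    # Threads keep the sketch simple; for CPU-bound pure-Python work a
    # ProcessPoolExecutor would be needed for a real speed-up in CPython.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(partial(dbscan, eps=eps, min_pts=min_pts), batches))


pulses = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (9.0, 0.0)]
print(dbscan(pulses, eps=0.5, min_pts=2))                 # → [0, 0, 0, 1, 1, 1, -1]
print(cluster_in_batches(pulses, 3, eps=0.5, min_pts=2))  # → [[0, 0, 0], [0, 0, 0], [-1]]
```

Because ids restart in every batch, a real deinterleaver would still need a step that links clusters belonging to the same emitter across batches.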
895	Concentric Layout, A New Scientific Data Layout For Matrix Data Set In Hadoop File System Cheng, Lu 01 January 2010 The volume of data generated by scientific simulations, sensors, monitors, and optical telescopes has grown dramatically. To analyze the raw data efficiently in both time and space, a data preprocessing step is needed to achieve better performance in the data analysis phase. Current research shows an increasing trend toward adopting the MapReduce framework for large-scale data processing. However, the data access patterns generally applied to scientific data sets are not directly supported by the current MapReduce framework. This gap between the requirements of analytics applications and the properties of the MapReduce framework motivates us to support these data access patterns in MapReduce. In our work, we studied the data access patterns in matrix files and proposed a new concentric data layout to facilitate matrix data access and analysis in the MapReduce framework. The concentric data layout preserves the dimensional property at the chunk level: unlike the continuous data layout adopted by default in the current Hadoop framework, it stores the data from the same sub-matrix in one chunk. This suits matrix operations such as computation. The data is preprocessed into the concentric layout beforehand, which optimizes subsequent runs of MapReduce applications. Experiments indicate that the concentric data layout improves overall performance, reducing execution time by 38% for a 16 GB file; it also mitigates the data overhead phenomenon and increases the effective data retrieval rate by 32% on average. Cluster analysis -- Data processing File organization (Computer science) MapReduce (Computer program) Electrical and Computer Engineering Electrical and Electronics Engineering
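The entry above contrasts Hadoop's default continuous layout with one that keeps each sub-matrix in a single chunk. The sketch below illustrates only that general idea, as a plain block (tiled) mapping from a matrix cell to a (chunk id, offset) pair; it is an assumption-level illustration, not the thesis's actual concentric layout, and it assumes the matrix dimensions are multiples of the block dimensions. It then counts how many chunks a sub-matrix read touches under each layout.

```python
from functools import partial
from typing import Callable, Set, Tuple

Locate = Callable[[int, int], Tuple[int, int]]  # (row, col) -> (chunk id, offset)


def row_major_chunk(row: int, col: int, n_cols: int, chunk_elems: int) -> Tuple[int, int]:
    """Continuous layout: cells are laid out row by row and cut into equal chunks."""
    return divmod(row * n_cols + col, chunk_elems)


def block_chunk(row: int, col: int, n_cols: int,
                block_rows: int, block_cols: int) -> Tuple[int, int]:
    """Sub-matrix layout: every block_rows x block_cols tile is stored as one chunk."""
    blocks_per_row = n_cols // block_cols  # assumes n_cols % block_cols == 0
    block_r, in_r = divmod(row, block_rows)
    block_c, in_c = divmod(col, block_cols)
    return block_r * blocks_per_row + block_c, in_r * block_cols + in_c


def chunks_touched(r0: int, r1: int, c0: int, c1: int, locate: Locate) -> Set[int]:
    """Chunk ids read when accessing rows r0..r1-1 and columns c0..c1-1."""
    return {locate(r, c)[0] for r in range(r0, r1) for c in range(c0, c1)}


# 16 x 16 matrix, 16 cells per chunk under both layouts.
row_major = partial(row_major_chunk, n_cols=16, chunk_elems=16)
tiled = partial(block_chunk, n_cols=16, block_rows=4, block_cols=4)
print(len(chunks_touched(0, 4, 0, 4, row_major)))  # → 4
print(len(chunks_touched(0, 4, 0, 4, tiled)))      # → 1
```

For a 4 x 4 read aligned to the tiles, the tiled layout touches one chunk instead of four; for reads that straddle tile boundaries the advantage shrinks.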
896	Feature Pruning For Action Recognition In Complex Environment Nagaraja, Adarsh 01 January 2011 Many action recognition research efforts use spatio-temporal interest point detectors for feature extraction. Although the extracted features provide useful information for recognizing actions, a significant number of them contain irrelevant motion and background clutter. In many cases, the extracted features are included as-is in the classification pipeline, and sophisticated noise removal techniques are then used to alleviate their effect on classification. We introduce a new action database, created from the Weizmann database, that reveals a significant weakness in systems based on popular cuboid descriptors. Experiments show that introducing complex backgrounds, stationary or dynamic, into the video causes a significant degradation in recognition performance. Moreover, this degradation cannot be fixed by fine-tuning the system or selecting better interest points. Instead, we show that the problem lies at the descriptor level and must be addressed by modifying the descriptors. Cluster analysis Computer vision Human activity recognition Pattern recognition systems Support vector machines Video recordings Electrical and Computer Engineering Electrical and Electronics Engineering
897	Detection of effective subgroups in a social group on the basis of SNA-methodology implementation : master's thesis Muravyov, A. A. January 2020 In this master's dissertation, four software tools are compared that support the social network analysis methodology (SNA methodology) and can be used to build effective teams. In terms of the SNA methodology, this task amounts to searching for subgroups within a social group. The best-known clustering algorithms are described, and the extent to which existing software tools support these algorithms is discussed. As a result, the most effective clustering algorithm and the most convenient software tool for solving this problem are identified. MASTER'S THESIS SNA-METHODOLOGY SOFTWARE TOOLS NETWORK APPROACH CLUSTER ANALYSIS TEAM
898	New Clustering and Feature Selection Procedures with Applications to Gene Microarray Data Xu, Yaomin January 2008 No description available. Statistics Bioinformatics coherence index data mining feature selection gene expression pathway gene profiling informative gene microarray data profile cluster analysis partitioning regulatory network statistical pattern recognition
899	Development of a Landslide Hazard Rating System for Selected Counties in Northeastern Ohio Dalqamouni, Ahmad Yousef 07 March 2011 No description available. Geology Geomorphology Geotechnology engineering geology landslides hazard rating system northeastern ohio traffic parameters liquidity index slope geometry cluster analysis glacial geology exponential scale high hazard potential quantitative system
900	Explication of Political User-Generated Content and Theorizing about Its Effects on Democracy with a Mix-of-Attributes Approach and Documenting Attribute Presence with a Quantitative Content Analysis Dylko, Ivan B. 25 July 2011 No description available. Communication Mass Communications Mass Media Political Science Political participation democracy user-generated content UGC Web 2.0 communication Internet media effects cluster analysis content analysis
