11

Segmentation of human ovarian follicles from ultrasound images acquired in vivo using geometric active contour models and a naïve Bayes classifier

Harrington, Na 14 September 2007
Ovarian follicles are spherical structures inside the ovaries that contain developing eggs. Monitoring the development of follicles is necessary in both gynecological medicine (ovarian disease diagnosis and infertility treatment) and veterinary medicine (determining when to introduce superstimulation in cattle, or dividing herds into different stages of the estrous cycle).

Ultrasound imaging provides a non-invasive method for monitoring follicles. However, manually detecting follicles in ovarian ultrasound images is time consuming and sensitive to the observer's experience. Existing (semi-)automatic follicle segmentation techniques show the power of automation, but are not widely used due to their limited success.

A new automated follicle segmentation method is introduced in this thesis. Human ovarian images acquired in vivo were smoothed using an adaptive neighbourhood median filter. Dark regions were initially segmented using geometric active contour models. Only some of these segmented dark regions were true follicles, so a naïve Bayes classifier was applied to determine whether each segmented dark region was a true follicle or not.

The Hausdorff distance between the contours of the automatically segmented regions and the gold standard was 2.43 ± 1.46 mm per follicle, and the average root mean square distance per follicle was 0.86 ± 0.49 mm. Both the average Hausdorff distance and the root mean square distance were larger than those reported for other follicle segmentation algorithms. The mean absolute distance between the contours of the automatically segmented regions and the gold standard was 0.75 ± 0.32 mm, which was below that reported for other follicle segmentation algorithms.

The overall follicle recognition rate was 33% to 35%, and the overall image misidentification rate was 23% to 33%. If only follicles with a diameter greater than or equal to 3 mm were considered, the follicle recognition rate increased to 60% to 63%, and the follicle misidentification rate increased slightly to 24% to 34%. The proposed follicle segmentation method proved accurate in detecting a large number of follicles with a diameter greater than or equal to 3 mm.
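The distances quoted above compare each automatically segmented contour with a manually traced gold standard. As a rough, hedged illustration of how such metrics can be computed (this is not the thesis code, and the millimetre-per-pixel calibration is an assumed parameter), the sketch below evaluates the Hausdorff, root-mean-square, and mean absolute distances between two contours given as point sets:

```python
import numpy as np
from scipy.spatial.distance import cdist, directed_hausdorff

def contour_distances(auto_pts, gold_pts, mm_per_pixel=0.15):
    """Distance metrics between two contours given as (N, 2) arrays of pixel
    coordinates. mm_per_pixel is an assumed calibration factor."""
    # Symmetric Hausdorff distance: worst-case disagreement between the contours.
    h = max(directed_hausdorff(auto_pts, gold_pts)[0],
            directed_hausdorff(gold_pts, auto_pts)[0])
    # For each point on the automatic contour, distance to the nearest
    # gold-standard point; used for the RMS and mean absolute distances.
    d = cdist(auto_pts, gold_pts).min(axis=1)
    rms = np.sqrt(np.mean(d ** 2))
    mad = np.mean(d)
    return h * mm_per_pixel, rms * mm_per_pixel, mad * mm_per_pixel

# Example with two synthetic, slightly offset circular contours
theta = np.linspace(0, 2 * np.pi, 200)
auto = np.c_[50 + 20 * np.cos(theta), 50 + 20 * np.sin(theta)]
gold = np.c_[52 + 21 * np.cos(theta), 49 + 20 * np.sin(theta)]
print(contour_distances(auto, gold))
```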
12

Spam filter for SMS-traffic

Fredborg, Johan January 2013
Communication through text messaging, SMS (Short Message Service), is nowadays a huge industry with billions of active users. Because of this huge user base, the medium has attracted many companies trying to market themselves through unsolicited messages, in the same way as was previously done through email. This is such a common phenomenon that SMS spam has become a plague in many countries. This report evaluates several established machine learning algorithms to see how well they can be applied to the problem of filtering unsolicited SMS messages. Each filter is mainly evaluated by analyzing its accuracy on stored message data. The report also discusses and compares hardware requirements against performance, measured by how many messages can be evaluated in a fixed amount of time. The results of the evaluation show that a decision tree filter is the best choice of the filters evaluated: it has the highest accuracy as well as a high enough message-processing rate to be practical. The decision tree filter found most suitable for the task in this environment has been implemented, and the accuracy of the new implementation is shown to be as high as that of the implementation used in the evaluation. Although the decision tree filter is the best of the filters evaluated, its accuracy turned out not to be high enough to meet the specified requirements. The results are nevertheless promising enough to motivate further work in this area, using improved methods on the best-performing algorithms.
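Because the report weighs accuracy against how many messages can be evaluated in a fixed amount of time, an evaluation harness needs to measure both. A minimal sketch under assumed data is shown below, using scikit-learn with a bag-of-words representation and a decision tree; load_sms_corpus is a hypothetical loader, not part of the report.

```python
import time
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# texts: list of SMS strings, labels: "spam"/"ham" -- assumed to be loaded
# from the stored message data mentioned in the report.
texts, labels = load_sms_corpus()   # hypothetical loader

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=0, stratify=labels)

vec = CountVectorizer(lowercase=True)
clf = DecisionTreeClassifier(random_state=0)
clf.fit(vec.fit_transform(X_train), y_train)

# Accuracy on held-out messages, plus throughput in messages per second.
start = time.perf_counter()
pred = clf.predict(vec.transform(X_test))
elapsed = time.perf_counter() - start

print("accuracy:", accuracy_score(y_test, pred))
print("messages/second:", len(X_test) / elapsed)
```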
14

Seleção de características para problemas de classificação de documentos

Hugo Wanderley Pinheiro, Roberto 31 January 2011
Document classification systems generally serve to make a document collection easier for users to access. Such systems can be used to detect spam; to recommend magazine news items, scientific articles, or products in an online store; and to refine searches and route them by subject. One of the greatest difficulties in document classification is its high dimensionality. The bag-of-words approach, used to extract features and obtain the vectors that represent the documents, generates tens of thousands of features. Vectors of this dimension demand a high computational cost and also contain irrelevant and redundant information. Feature selection techniques reduce the dimensionality of the representation, speeding up the system and making classification easier. However, the feature selection used in document classification problems requires a parameter m that defines how many features will be selected, and finding a good value for m is a complicated and costly procedure. The idea introduced in this work aims to remove the need for the parameter m and to guarantee that the selected features cover all documents in the training set. To achieve this goal, the proposed algorithm iterates over the documents in the training set and, for each document, chooses the most relevant feature. If the chosen feature has already been selected, it is ignored; otherwise, it is selected. In this way, the number of features is known at the end of the algorithm's execution, without the need to declare a value for m in advance. The proposed methods follow this initial idea with certain variations: introducing a parameter f to select several features per document; using local class information; and restricting which documents are used in the selection process. The new algorithms are compared with a classical method (Variable Ranking). In the experiments, three datasets and five feature evaluation functions were used. The results show that the proposed methods achieve better accuracy rates.
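The selection procedure is concrete enough to sketch directly: iterate over the training documents, take each document's most relevant feature, and keep it unless it has already been selected, so the final feature count emerges from the run instead of a preset m. The Python rendering below is an illustration under an assumed relevance score (e.g., chi-squared; the thesis evaluates five such functions, none of which is reproduced here):

```python
import numpy as np

def coverage_feature_selection(X, relevance):
    """X: (n_docs, n_features) document-term matrix (counts or binary).
    relevance: (n_features,) score per feature (e.g. chi-squared) -- the
    concrete scoring function is an assumption. Returns selected indices."""
    selected = []
    seen = set()
    for doc in X:
        present = np.flatnonzero(doc)                   # features occurring in this document
        if present.size == 0:
            continue
        best = present[np.argmax(relevance[present])]   # most relevant feature of the document
        if best not in seen:                            # ignore if already selected
            seen.add(best)
            selected.append(best)
    return np.array(selected)

# Toy example: 4 documents, 6 features
X = np.array([[1, 0, 1, 0, 0, 0],
              [0, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 0],
              [1, 0, 0, 0, 0, 1]])
relevance = np.array([0.9, 0.2, 0.5, 0.1, 0.7, 0.3])
print(coverage_feature_selection(X, relevance))  # -> [0 2 4]
```

The variations described above (a parameter f for several features per document, class-local scores, restricting which documents take part) would extend this same loop.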
15

Automation of support service using Natural Language Processing : Automation of errands tagging

Haglund, Kristoffer January 2020
In this paper, Natural Language Processing and classification algorithms were used to create a program that can automatically tag errands submitted to the support service of Fortnox (an IT company based in Växjö). Controlled experiments were conducted to find the classification algorithm and Bag-of-Words pre-processing steps best suited to this problem. All data were provided by Fortnox and manually labeled with tags to serve as training and test data. The final algorithm correctly predicted 69.15% of errands when using all of the original data. When the incorrectly predicted errands were examined, a pattern emerged: many errands had identical text attached to them. By removing the majority of these errands, the accuracy increased to 94.08%.
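The jump from 69.15% to 94.08% comes from removing errands whose attached text is identical. A small, hedged sketch of that clean-up step is shown below; the column names are assumptions, not Fortnox's actual schema.

```python
import pandas as pd

# errands: one row per support errand; "text" and "tag" are assumed column names.
errands = pd.DataFrame({
    "text": ["invoice missing", "invoice missing", "cannot log in", "payroll export fails"],
    "tag":  ["billing",         "billing",         "account",       "payroll"],
})

# Keep only the first occurrence of each identical text so that repeated
# boilerplate errands do not dominate training or leak into the test set.
deduplicated = errands.drop_duplicates(subset="text", keep="first")
print(len(errands), "->", len(deduplicated), "errands after deduplication")
```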
17

An implementation analysis of a machine learning algorithm on eye-tracking data in order to detect early signs of dementia

Lindberg, Jennifer, Siren, Henrik January 2020
This study aims to investigate whether it is possible to use a machine learning algorithm on eye-tracking data to detect early signs of Alzheimer's disease, a type of dementia. Early Alzheimer's is characterized by mild cognitive impairment, and patients with mild cognitive impairment fixate more when reading. The eye-tracking data were gathered in trials, conducted by specialist doctors at a hospital, in which 24 patients read a text. The data were pre-processed by extracting features such as fixations and the difficulty level of specific passages in the text. These features were then used in a naïve Bayes machine learning algorithm with leave-one-out cross-validation, under two separate conditions: using both fixation features and text-difficulty features, and using fixation features alone. The two conditions achieved the same result, an accuracy of 64%. The conclusion was therefore drawn that, even though the number of data samples (patients) was small, the machine learning algorithm could to some extent predict whether a patient was at an early stage of Alzheimer's disease based on eye-tracking data. Additionally, the implementation is further analyzed through a stakeholder analysis, a SWOT analysis, and from an innovation perspective.
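With only 24 patients, leave-one-out cross-validation trains on 23 patients and tests on the held-out one, repeated 24 times. A minimal sketch of that setup with a Gaussian naïve Bayes model follows; the feature matrix and labels are placeholders, not the study's data.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Placeholder data: 24 patients, e.g. [mean fixations, passage difficulty, ...]
X = rng.normal(size=(24, 3))
y = rng.integers(0, 2, size=24)   # 1 = early signs of Alzheimer's, 0 = control

# Each of the 24 folds holds out exactly one patient.
scores = cross_val_score(GaussianNB(), X, y, cv=LeaveOneOut())
print("accuracy:", scores.mean())   # the study reports 64% under both feature conditions
```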
18

Uma investigação empírica e comparativa da aplicação de RNAs ao problema de mineração de opiniões e análise de sentimentos

Moraes, Rodrigo de 26 March 2013
The area of Opinion Mining and Sentiment Analysis emerged from the need for automated processing of textual information about reviews posted on the web. Its main motivation is the constant growth in the volume of such information, driven by the technologies brought by Web 2.0, which makes it impossible to manually monitor and analyze reviews that are useful both to users who want to purchase new products and to companies identifying market demand. Currently, most studies in Opinion Mining and Sentiment Analysis that use data mining focus on developing techniques for better knowledge representation while relying on commonly applied classification techniques, without exploring other classifiers that perform well on other problems. This work therefore presents a comparative empirical investigation of applying the classical Artificial Neural Network (ANN) model, the multilayer perceptron, to the Opinion Mining and Sentiment Analysis problem. Review datasets are defined and textual knowledge representation techniques are applied to them, so that all classifiers receive the same unigram representation of the texts. From this representation, the classifiers Support Vector Machines (SVM), Naïve Bayes (NB), and ANN are applied in three data contexts: (i) balanced datasets, (ii) datasets with different imbalance ratios, and (iii) datasets to which random undersampling is applied to handle the imbalance. Investigating the unbalanced context, and contexts derived from it, is relevant because review datasets available on the web ordinarily contain more positive than negative opinions. The classifiers are evaluated with metrics for both classification performance and run time. The results in the balanced context indicate that the ANN significantly outperformed the other classifiers and, despite its large training cost, provided classification times significantly lower than those of the classifier whose results came closest to the ANN's. In the unbalanced context, the ANN proved sensitive to increased noise in the representation and to increasing imbalance, while the NB classifier stood out. With undersampling, the ANN was equivalent to the other classifiers, attaining competitive results; however, it may not be the most appropriate classifier for this context when training and classification times and its small accuracy advantage are taken into account.
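One concrete step in this setup is the random undersampling used to rebalance the review data before the classifiers are compared on the same unigram representation. The sketch below illustrates that step and a three-way comparison; load_reviews is a hypothetical loader, and the thesis additionally measured training and classification times, which this sketch omits.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

def random_undersample(texts, labels, seed=0):
    """Drop majority-class examples at random until both classes are the same size."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    majority, minority = (pos, neg) if len(pos) > len(neg) else (neg, pos)
    keep = np.concatenate([minority, rng.choice(majority, size=len(minority), replace=False)])
    keep = rng.permutation(keep)
    return [texts[i] for i in keep], labels[keep]

# texts / labels are assumed review data (1 = positive opinion, 0 = negative).
texts, labels = load_reviews()   # hypothetical loader
texts, labels = random_undersample(texts, labels)

X = CountVectorizer(binary=True).fit_transform(texts)   # unigram representation
for name, clf in [("NB", MultinomialNB()),
                  ("SVM", LinearSVC()),
                  ("ANN", MLPClassifier(hidden_layer_sizes=(50,), max_iter=500))]:
    print(name, cross_val_score(clf, X, labels, cv=5).mean())
```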
19

Categorization of Swedish e-mails using Supervised Machine Learning / Kategorisering av svenska e-postmeddelanden med användning av övervakad maskininlärning

Mann, Anna, Höft, Olivia January 2021
Society today is becoming more digitalized, and a common way of communicating is to send e-mails. The company Auranest currently has a filtering method for categorizing e-mails, but the method is a few years old. The filter identifies valuable e-mails for jobseekers, through which employers can make contact. The company wants to know whether the categorization can be performed with a different method and improved. This degree project aims to investigate whether the categorization can be performed with higher accuracy using machine learning. Three supervised machine learning algorithms, Naïve Bayes, Support Vector Machine (SVM), and Decision Tree, have been examined, and the algorithm with the best results has been compared with Auranest's existing filter. Accuracy, Precision, Recall, and F1 score were used to determine which machine learning algorithm achieved the best results, both among themselves and in comparison with Auranest's filter. The results showed that SVM achieved the best results on all metrics. The comparison between Auranest's existing filter and SVM showed that SVM performed better on all calculated metrics, with an accuracy of 99.5% for SVM versus 93.03% for Auranest's filter. Accuracy was the only metric on which the two were similar; for the other metrics there was a noticeable difference.
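The four metrics used in the comparison can be computed in one call per model. The toy illustration below assumes binary labels (valuable e-mail or not) and already-available predictions from a trained model and from the existing filter; none of the numbers are the study's.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def report(name, y_true, y_pred):
    # Precision, recall and F1 for the positive ("valuable e-mail") class.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", pos_label=1)
    print(f"{name}: acc={accuracy_score(y_true, y_pred):.3f} "
          f"prec={precision:.3f} rec={recall:.3f} f1={f1:.3f}")

# Toy ground truth and predictions (1 = valuable e-mail); real predictions
# would come from the trained SVM and from the existing filter.
y_true      = [1, 1, 0, 0, 1, 0, 1, 0]
svm_pred    = [1, 1, 0, 0, 1, 0, 1, 1]
filter_pred = [1, 0, 0, 0, 1, 1, 1, 0]

report("SVM", y_true, svm_pred)
report("existing filter", y_true, filter_pred)
```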
20

Ärendehantering genom maskininlärning

Bennheden, Daniel January 2023
This thesis investigates how artificial intelligence can be used to automatically categorize fault reports processed in a case management system, using machine learning and techniques such as text mining. The study is based on Design Science Research Methodology and Peffers' six steps of design methodology, which in addition to the design of an artifact cover requirements and evaluation. The machine learning models that were developed were trained on historical data from the case management system Infracontrol Online, using four types of algorithms: Naive Bayes, Support Vector Machine, Neural Network, and Random Forest. A web application was developed to demonstrate how one of the trained machine learning models works and can be used to categorize text. Regular users of the system then had the opportunity to test the performance of the model and evaluate how it works by marking where it categorizes text prompts correctly. The results show that it is possible to solve the task using machine learning. A crucial part of the development was the selection of data used to train the model. Different customers use the system in different ways, which made it advantageous to separate them and train models for different customers independently. Another source of inconsistent results is how organizations change their processes, and thus their case management, over time; this issue was addressed by limiting how far back in time the model retrieves data for training. These two strategies have the disadvantage that the amount of historical data available for training decreases, but the results do not show any clear disadvantage for the machine learning models trained on smaller data sets. They perform well, and tests show an acceptable level of accuracy for their predictions.
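Two of the data-selection decisions described above, training a separate model per customer and only using recent history, are straightforward to sketch. The snippet below is an assumed illustration: the column names, the 18-month window, and load_case_history are not from the thesis, and a multinomial naïve Bayes pipeline stands in for whichever of the four evaluated algorithms is chosen.

```python
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# cases: historical fault reports; column names ("customer", "created",
# "text", "category") are assumptions, not the Infracontrol Online schema.
cases = load_case_history()                      # hypothetical loader
cases["created"] = pd.to_datetime(cases["created"])

cutoff = pd.Timestamp.now() - pd.DateOffset(months=18)   # assumed training window
recent = cases[cases["created"] >= cutoff]

models = {}
for customer, group in recent.groupby("customer"):
    # One model per customer, since customers use the system differently.
    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    model.fit(group["text"], group["category"])
    models[customer] = model

# Categorize a new report for a given customer:
# models[some_customer].predict(["Trasig gatubelysning på Storgatan"])
```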
