Global ETD Search

81	A Content Boosted Collaborative Filtering Approach For Movie Recommendation Based On Local & Global Similarity And Missing Data Prediction Ozbal, Gozde 01 September 2009 Recently, it has become increasingly difficult for existing web-based systems to locate or retrieve relevant information, owing to the rapid growth of the World Wide Web (WWW) in both its information space and its number of users. In today's world, however, many systems and approaches guide users by recommending new items such as articles, news, books, music, and movies. Many traditional recommender systems nevertheless fail when the data used in the recommendation process is sparse: when the system contains too few items or users, it produces unsuccessful recommendations. This thesis presents ReMovender, a web-based movie recommendation system that uses a content-boosted collaborative filtering approach. ReMovender combines local/global similarity and missing data prediction techniques to handle the sparseness problem effectively. In addition, taking the content information of the movies into account during item similarity calculations yields more successful and realistic predictions.
82	Pattern Matching for Financial Time Series Data Liu, Ching-An 29 July 2008 In security markets, stock price movements are closely linked to market information. For example, the subprime mortgage crisis triggered a global financial crisis in 2007, and drops occurred in virtually every stock market in the world. After the Federal Reserve took several steps to address the crisis, the stock markets gradually stabilized. Traders' reactions to arriving information produce different patterns of stock price movements. Pattern matching is therefore an important subject in future movement prediction, rule discovery, and computer-aided diagnosis. In this research, we propose a pattern matching procedure to capture similar stock price movements of two listed companies during one day. First, a longest common subsequence search algorithm is used to sieve out the time intervals in which the two listed companies have the same integrated volatility levels and price rise/drop trends. Next, we transform the raw price data in the matched time periods into Bollinger Band Percent data and use the power spectrum to extract low-frequency components. Adjusted Pearson chi-squared tests are then performed to analyze the similarity of the price movement patterns in these periods. We first study the procedure by simulation and then apply it to an empirical analysis of high-frequency transaction data from the NYSE. Keywords: Bollinger Band Percent; pattern matching; high frequency transaction data; longest common subsequence; power spectrum; Pearson chi-squared test; stock price movement.
83	Nonparametric criteria for sparse contingency tables / Neparametriniai kriterijai retų įvykių dažnių lentelėms Samusenko, Pavel 18 February 2013 The dissertation addresses the problem of nonparametric testing for sparse contingency tables. Statistical inference problems caused by sparsity of contingency tables are widely discussed in the literature. Traditionally, the expected frequency (under the null hypothesis) is required to exceed 5 in almost all cells of the contingency table. If this condition is violated, the χ² approximations of goodness-of-fit statistics may be inaccurate, and the table is said to be sparse. Several techniques have been proposed to tackle the problem: exact tests, alternative approximations, parametric and nonparametric bootstrap, the Bayes approach, and other methods. However, all of them are either inapplicable or limited in nonparametric statistical inference for very sparse contingency tables. The dissertation shows that, for sparse categorical data, the likelihood ratio statistic and Pearson's χ² statistic may become noninformative: they no longer measure the goodness of fit of null hypotheses to data. Thus, they can be inconsistent even in cases where a simple consistent test exists. An improvement of the classical criteria for sparse contingency tables is proposed. It is achieved by grouping and smoothing sparse categorical data using a new sparse asymptotics model that relies on an (extended) empirical Bayes approach. Under general conditions, the consistency of the proposed criteria based on grouping is proved. The finite-sample behavior of the criteria is studied by Monte Carlo simulation. The dissertation consists of an introduction, four chapters, a list of references, general conclusions, and an appendix. The introduction sets out the importance of the scientific problem under study and describes the aims and tasks of the work, the research methods, the scientific novelty, the practical... [to full text] Keywords: Mathematics; Large number of rare events; Likelihood ratio statistic; Pearson's χ²; Nonparametric criteria.
84	Neparametriniai kriterijai retų įvykių dažnių lentelėms / Nonparametric criteria for sparse contingency tables Samusenko, Pavel 18 February 2013 The dissertation addresses the problem of nonparametric testing for sparse contingency tables. Statistical inference problems caused by sparsity of contingency tables are widely discussed in the literature. Traditionally, the expected frequency (under the null hypothesis) is required to exceed 5 in almost all cells of the contingency table. If this condition is violated, the χ² approximations of goodness-of-fit statistics may be inaccurate, and the table is said to be sparse. Several techniques have been proposed to tackle the problem: exact tests, alternative approximations, parametric and nonparametric bootstrap, the Bayes approach, and other methods. However, all of them are either inapplicable or limited in nonparametric statistical inference for very sparse contingency tables. The dissertation shows that, for sparse categorical data, the likelihood ratio statistic and Pearson's χ² statistic may become noninformative: they no longer measure the goodness of fit of null hypotheses to data. Thus, they can be inconsistent even in cases where a simple consistent test exists. An improvement of the classical criteria for sparse contingency tables is proposed. It is achieved by grouping and smoothing sparse categorical data using a new sparse asymptotics model that relies on an (extended) empirical Bayes approach. Under general conditions, the consistency of the proposed criteria based on grouping is proved. The finite-sample behavior of the criteria is studied by Monte Carlo simulation. The dissertation consists of an introduction, four chapters, a list of references, general conclusions, and an appendix. The introduction sets out the importance of the scientific problem under study and describes the aims and tasks of the work, the research methods, the scientific novelty, the practical... [to full text] Keywords: Mathematics; Large number of rare events; Likelihood ratio statistic; Pearson's χ² statistic; Nonparametric criteria.
85	Predicting Software Defectiveness by Mining Software Repositories Kasianenko, Stanislav January 2018 One of the important aims of the continuous software development process is to localize and remove all existing program bugs as fast as possible. This goal is closely related to software engineering and defectiveness estimation. Many big companies started to store source code in software repositories as the latter grew in popularity. These repositories usually include static source code as well as detailed data on defects in software units, which allows all the data to be analyzed without interrupting the programming process. The main problem of large, complex software is that it is impossible to control everything manually, while the price of an error can be very high. As a result, developers may miss defects at the testing stage, and maintenance costs may increase. The general research goal is to find a way of predicting future software defectiveness with high precision. Reducing maintenance and development costs will help reduce time-to-market and increase software quality. To address the problem of estimating residual defects, an approach was found to predict the residual defectiveness of software by means of machine learning. A regression decision tree was chosen as the primary machine learning algorithm, as a simple and reliable solution. Data for this tree is extracted from a static source code repository and divided into two parts: software metrics and defect data. Software metrics are formed from static code, and defect data is extracted from reported issues in the repository. The reported bugs are augmented with unreported bugs found in the "discussions" section of the repository and parsed by a natural language processor. Metrics were filtered by applying a correlation algorithm to remove those not related to defect data. The remaining metrics were weighted so that the most correlated combination could be used as a training set for the decision tree. As a result, the built decision tree model makes it possible to forecast defectiveness with an 89% chance for the particular product. The experiment was conducted using a GitHub repository of a Java project and predicted the number of possible bugs in a single file (Java class). It resulted in a method for predicting possible defectiveness from the static code of a single big (more than 1000 files) software version. Keywords: repository mining; software metric; correlation; defect; bug; natural language processing; Pearson coefficient; Breiman's decision tree; machine learning; Computer Sciences.
86	Adição de ruído durante o processo de treinamento de redes neurais MLP : Uma abordagem para o aprendizado a partir de bases de dados pequenas e desbalanceadas / Noise addition during the training of MLP neural networks: an approach for learning from small and imbalanced datasets SILVA, Icaman Botelho Viegas da 31 January 2011 Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Classifiers have been widely applied in the most diverse scientific and industrial fields, generally achieving good performance. However, when applied to problems in which the amount of data available for training is limited (small datasets), or when the data are imbalanced across classes (imbalanced datasets), most classifiers perform poorly. The generalization power of a classifier is reduced when small datasets are used during training, while in imbalanced datasets the classes with greater representation and lesser importance tend to be favored. Inherent to many real-world problems, small and imbalanced datasets are a limitation that learning algorithms must overcome to produce accurate and reliable classifiers. This work proposes an approach based on adding Gaussian noise during the training of a MultiLayer Perceptron (MLP) neural network in order to circumvent the limitations of small and/or imbalanced datasets, enabling the neural network to achieve high generalization power. The proposed methodology can be divided into two main stages. In the first, the correlation between the variables is studied: the correlation is evaluated with the Pearson correlation coefficient, and the variables are decorrelated with Principal Component Analysis (PCA). In the second, noise drawn from a Gaussian distribution is added to the input variables. To validate the proposed approach, three public datasets from Proben1, a well-known benchmark in the neural network community, were used. The experimental results indicate that the proposed approach achieves statistically better performance (95% confidence) than the conventional training method, especially when PCA is used to decorrelate the variables before noise is applied. Keywords: imbalanced datasets; small datasets; variable decorrelation; Pearson correlation; Principal Component Analysis; Gaussian noise; training with noise; MLP neural networks.
87	Parciální a podmíněné korelační koeficienty / Partial correlation coefficients and theirs extension Říha, Samuel January 2015 No description available.
88	Estudo da influência de eventos sobre a estrutura do mercado brasileiro de ações a partir de redes ponderadas por correlações de Pearson, Spearman e Kendall / Weighted networks from Pearson, Spearman and Kendall correlations to characterize the influence of events on the Brazilian stock market structure Letícia Aparecida Origuela 06 August 2018 This work analyzes the influence of an event on the Brazilian stock market using networks, and their minimum spanning trees, obtained from dependence measures based on the Pearson, Spearman, and Kendall correlations. The event considered was the news, on the evening of May 17, 2017, that the owner of the Brazilian company JBS, Joesley Batista, had recorded the then President of Brazil, Michel Temer, authorizing the purchase of the silence of a member of Congress. The day after the news, May 18, 2017, was defined as the event day. High-frequency data for 58 Ibovespa shares were collected from May 11 to 25, 2017. Changes in the stock networks were analyzed by comparing the periods before and after the event on two time scales: (1) daily networks: five trading sessions before the event, the day of the event, and five trading sessions after the event, with prices every 15 minutes; (2) grouped before and after the event: grouping the data from the 5 days before and the 5 days after the event. The study of the daily networks indicated a change of trend in their properties over the period containing the event. This suggested that analyzing the mean effect in the data grouped before and after the event could make the changes in the network structure more evident. The networks before and after the event showed significant changes in their metrics, which became more evident in the minimum spanning trees. After the event, the minimum spanning trees for grouped data had a smaller number of clusters for all types of correlation. The networks generated by the Kendall and Spearman correlations presented a larger number of clusters before and after the event. The weighted degree distributions after the event suggest a power-law decaying tail for all the correlations considered and indicate a higher probability of vertices with weighted degrees far from the mean weighted degree. The minimum spanning tree metrics generated by the Spearman correlation underwent the greatest variation, followed by those of Kendall and Pearson, and their values indicate that after the event the networks became more robust, that is, more rigid. The increased robustness of the networks after the event indicates higher market connectivity, making the market as a whole more susceptible to the impact of new events.
89	The relationship between perceived talent management practices, perceived organizational support (POS), perceived supervisor support (PSS) and intention to quit amongst Generation Y employees in the recruitment sector Du Plessis, Liesl 22 April 2013 Orientation: Perceived Talent Management Practices, Perceived Organizational Support and Perceived Supervisor Support are distinct but related constructs, and all of them appear to influence an employee's intention to quit an organization. Research purpose: The objective of this study was to investigate Generation Y's perception of an organization's talent management practices and to determine how it relates to their intention to quit the organization. In essence, the study aims to establish possible relationships among four constructs: Perceived Talent Management Practices, Perceived Organizational Support (POS), Perceived Supervisor Support (PSS) and Intention to Quit. The mediating/moderating characteristics of POS and PSS on the relationship between Perceived Talent Management Practices and Intention to Quit are also investigated. Motivation for the study: Talent is the new tipping point in corporate success. It has the potential to be the origin of an organisation's demise or the reason for its continued success. A concept that carries this much potential for both disaster and prosperity warrants some examination of how to protect it. Research design, approach and method: Four instruments (the HCI Assessment of Talent Practices (HCI), the Survey of Perceived Organizational Support (SPOS), the Survey of Perceived Supervisor Support and an Intention to Quit Scale) were administered to a convenience sample of 135 employees from a population of 450 employees working in three provinces in which the organization was operational. Pearson product-moment correlation analysis and multiple regression analysis were used to investigate the structure of the integrated conceptual model of Perceived Talent Management Practices, POS, PSS and Intention to Quit. Main findings: The findings of this study indicate a strong, practically significant positive correlation (r(df = 135; p < 0.001) = 0.724, large effect) between Perceived Organizational Support (POS) and Perceived Supervisor Support (PSS). A strong, practically significant positive relationship (r(df = 135; p < 0.001) = 0.640, large effect) was found between Perceived Organizational Support (POS) and the employee's perception of the organization's Talent Practices. The study confirmed a strong, practically significant negative relationship (r(df = 135; p < 0.001) = -0.569, large effect) between Perceived Organizational Support (POS) and the employee's Intention to Quit. A medium, practically significant negative relationship (r(df = 135; p < 0.001) = -0.436, medium effect) was established between Intention to Quit and Perceived Supervisor Support (PSS). The study found a medium, practically significant positive correlation (r(df = 135; p < 0.001) = 0.471, medium effect) between Perceived Supervisor Support (PSS) and the employee's perception of the organization's Talent Practices. The findings also establish a medium, practically significant negative relationship (r(df = 135; p < 0.001) = -0.477, medium effect) between employees' perception of the organization's Talent Practices and their intention to quit the organization. Multiple regression confirmed that neither POS nor PSS mediates or moderates the relationship between Perceived Talent Management Practices and Intention to Quit. Practical/managerial implications: Cappelli (2008) stated that paradigms only come undone when they "encounter problems that they cannot address. But before the old paradigm is overthrown, there must be an alternative, one that describes new developments better than the old one does" (Cappelli, 2008). This study provides evidence that management can use paradigm shifts as a talent retention strategy, in which creating a high perception of talent management practices results in a lower intent to leave the organization. Contribution: The findings of this study indicate a positive relationship between perceived talent management practices, POS and PSS. The study also established a positive relationship between POS and PSS. Negative relationships were confirmed between Intention to Quit and each of POS, PSS and Perceived Talent Management Practices. Dissertation (MCom), University of Pretoria, 2010. Human Resource Management. Keywords: Multiple regression; Industrial psychology; Generation Y; Recruitment industry; Intention to quit; Pearson product-moment correlation; Perceived Supervisor Support (PSS); Talent management practices; Perceived Organizational Support (POS); UCTD.
90	A generalized Neyman-Pearson lemma for hedge problems in incomplete markets Rudloff, Birgit 07 October 2005 Some financial problems, such as minimizing the shortfall risk when hedging in incomplete markets, lead to problems belonging to test theory. This paper considers a generalization of the Neyman-Pearson lemma. Using methods of convex duality, we deduce the structure of an optimal randomized test when testing a compound hypothesis against a simple alternative. We give necessary and sufficient optimality conditions for the problem. Keywords: hedging; Neyman-Pearson lemma; coherent risk measure; convex duality; hypothesis testing; risk measures; shortfall risk.
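
The longest-common-subsequence step described in entry 82 can be sketched as follows. This is a minimal Python illustration, not the thesis's procedure: the function name longest_common_subsequence and the U/D labels (standing in for per-interval price rise/drop trends) are assumptions made for the example, and the integrated volatility levels mentioned in the abstract are not modeled.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of sequences a and b as a list."""
    rows, cols = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back through the table to recover the matching symbols.
    result = []
    i, j = rows, cols
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return result[::-1]


# Hypothetical per-interval labels: U = price rise, D = price drop.
stock_a = list("UUDDUDU")
stock_b = list("UDUDDUU")
common = longest_common_subsequence(stock_a, stock_b)
print("".join(common))
```

The dynamic-programming table costs time and memory proportional to the product of the two sequence lengths, which is usually acceptable for one trading day of discretized intervals.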
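
Entry 88 builds correlation-weighted networks and their minimum spanning trees. A rough Python sketch of that pipeline for the Pearson case is given below. The distance transform d = sqrt(2(1 - r)) is a common choice in the correlation-network literature and is an assumption here, since the abstract does not state which transform was used; the tickers and return values are made up, and Kruskal's algorithm is only one of several ways to obtain the tree.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def minimum_spanning_tree(series):
    """Kruskal's algorithm on the complete graph of correlation distances.

    series maps a name to its list of returns. Returns (u, v, distance) edges.
    """
    names = sorted(series)
    edges = []
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            r = pearson(series[u], series[v])
            # Assumed transform: high correlation means short distance.
            # max() guards against tiny negative values from rounding.
            edges.append((math.sqrt(2.0 * max(0.0, 1.0 - r)), u, v))
    edges.sort()

    # Union-find structure to detect cycles while adding edges.
    parent = {name: name for name in names}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    for dist, u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.append((u, v, dist))
    return tree


# Made-up return series for three hypothetical tickers.
returns = {
    "AAA": [0.010, -0.020, 0.015, 0.000, -0.010],
    "BBB": [0.012, -0.018, 0.010, 0.002, -0.012],
    "CCC": [-0.010, 0.020, -0.012, 0.001, 0.009],
}
tree = minimum_spanning_tree(returns)
for u, v, dist in tree:
    print(u, v, round(dist, 3))
```

Replacing pearson with a rank-based measure (Spearman or Kendall) would give the other two network variants compared in the abstract, with the rest of the sketch unchanged.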

Search results