411

[en] A SUPERVISED LEARNING APPROACH TO PREDICT HOUSEHOLD AID DEMAND FOR RECURRENT CLIME-RELATED DISASTERS IN PERU / [pt] UMA ABORDAGEM DE APRENDIZADO SUPERVISIONADO PARA PREVER A DEMANDA DE AJUDA FAMILIAR PARA DESASTRES CLIMÁTICOS RECORRENTES NO PERU

RENATO JOSE QUILICHE ALTAMIRANO 21 November 2023
[pt] Esta dissertação apresenta uma abordagem baseada em dados para o problema de predição de desastres recorrentes em países em desenvolvimento. Métodos de aprendizado de máquina supervisionado são usados para treinar classificadores que visam prever se uma família seria afetada por ameaças climáticas recorrentes (um classificador é treinado para cada perigo natural). A abordagem desenvolvida é válida para perigos naturais recorrentes que afetam um país e permite que os gerentes de risco de desastres direcionem suas operações com mais conhecimento. Além disso, a avaliação preditiva permite que os gerentes entendam os impulsionadores dessas previsões, levando à formulação proativa de políticas e planejamento de operações para mitigar riscos e preparar comunidades para desastres recorrentes. A metodologia proposta foi aplicada ao estudo de caso do Peru, onde foram treinados classificadores para ondas de frio, inundações e deslizamentos de terra. No caso das ondas de frio, o classificador tem 73,82% de precisão. A pesquisa descobriu que famílias pobres em áreas rurais são vulneráveis a desastres relacionados a ondas de frio e precisam de intervenção humanitária proativa. Famílias vulneráveis têm infraestrutura urbana precária, incluindo trilhas, caminhos, postes de iluminação e redes de água e drenagem. O papel do seguro saúde, estado de saúde e educação é menor. Domicílios com membros doentes levam a maiores probabilidades de serem afetados por ondas de frio. Maior realização educacional do chefe da família está associada a uma menor probabilidade de ser afetado por ondas de frio. No caso das inundações, o classificador tem 82.57% de precisão. Certas condições urbanas podem tornar as famílias rurais mais suscetíveis a inundações, como acesso à água potável, postes de iluminação e redes de drenagem. Possuir um computador ou laptop diminui a probabilidade de ser afetado por inundações, enquanto possuir uma bicicleta e ser chefiado por indivíduos casados aumenta. 
Inundações são mais comuns em assentamentos urbanos menos desenvolvidos do que em famílias rurais isoladas. No caso dos deslizamentos de terra, o classificador tem 88.85% de precisão, e é segue uma lógica diferente do das inundações. A importância na previsão é mais uniformemente distribuída entre as características consideradas no aprendizado do classificador. Assim, o impacto de um recurso individual na previsão é pequeno. A riqueza a longo prazo parece ser mais crítica: a probabilidade de ser afetado por um deslizamento é menor para famílias com certos aparelhos e materiais domésticos de construção. Comunidades rurais são mais afetadas por deslizamentos, especialmente aquelas localizadas em altitudes mais elevadas e maiores distâncias das cidades e mercados. O impacto marginal médio da altitude é não linear. Os classificadores fornecem um método inteligente baseado em dados que economiza recursos garantindo precisão. Além disso, a pesquisa fornece diretrizes para abordar a eficiência na distribuição da ajuda, como formulações de localização da instalação e roteamento de veículos. Os resultados da pesquisa têm várias implicações gerenciais, então os autores convocam à ação gestores de risco de desastres e outros interessados relevantes. Desastres recorrentes desafiam toda a humanidade. / [en] This dissertation presents a data-driven approach to the problem of predicting recurrent disasters in developing countries. Supervised machine learning methods are used to train classifiers that aim to predict whether a household would be affected by recurrent climate threats (one classifier is trained for each natural hazard). The approach developed is valid for recurrent natural hazards affecting a country and allows disaster risk managers to target their operations with more knowledge. 
In addition, predictive assessment allows managers to understand the drivers of these predictions, leading to proactive policy formulation and operations planning to mitigate risks and prepare communities for recurring disasters. The proposed methodology was applied to the case study of Peru, where classifiers were trained for cold waves, floods, and landslides. In the case of cold waves, the classifier is 73.82% accurate. The research found that low-income families in rural areas are vulnerable to cold-wave-related disasters and need proactive humanitarian intervention. Vulnerable families have poor urban infrastructure, including footpaths, roads, lampposts, and water and drainage networks. The role of health insurance, health status, and education is minor. Households with sick members are more likely to be affected by cold waves. Higher educational attainment of the head of the household is associated with a lower probability of being affected by cold waves. In the case of flooding, the classifier is 82.57% accurate. Certain urban conditions, such as access to drinking water, lampposts, and drainage networks, can make rural households more susceptible to flooding. Owning a computer or laptop decreases the likelihood of being affected by flooding, while owning a bicycle and being headed by married individuals increase it. Flooding is more common in less developed urban settlements than among isolated rural households. In the case of landslides, the classifier is 88.85% accurate and follows a different logic from that of floods. Predictive importance is more evenly distributed among the features considered when training the classifier. Thus, the impact of an individual feature on the prediction is small. Long-term wealth appears to be more critical: the probability of being affected by a landslide is lower for families with specific appliances and household building materials. 
Rural communities are more affected by landslides, especially those located at higher altitudes and at greater distances from cities and markets. The average marginal impact of altitude is non-linear. The classifiers provide an intelligent data-driven method that saves resources while ensuring accuracy. In addition, the research provides guidelines for addressing efficiency in aid distribution, such as facility location formulations and vehicle routing. The research results have several managerial implications, so the authors call for action from disaster risk managers and other relevant stakeholders. Recurrent disasters challenge all of humanity.
412

Data-Driven Success in Infrastructure Megaprojects. : Leveraging Machine Learning and Expert Insights for Enhanced Prediction and Efficiency / Datadriven framgång inom infrastrukturmegaprojekt. : Utnyttja maskininlärning och expertkunskap för förbättrad prognostisering och effektivitet.

Nordmark, David E.G. January 2023
This Master's thesis uses random forest and leave-one-out cross-validation to predict the success of infrastructure megaprojects. The goal was to enhance the efficiency of the design and engineering phase in the infrastructure and construction industries. Because megaprojects are few in number and data sharing is limited, the lack of data poses significant challenges for applying artificial intelligence to the evaluation and prediction of megaprojects. This thesis explores how megaprojects can benefit from data collection and machine learning despite small sample sizes. The research focused on analyzing data from thirteen megaprojects and identifying the most influential data for machine learning analysis. The results show that incorporating expert data, representing critical success factors for megaprojects, significantly enhanced the accuracy of the predictive model. The superior performance of expert data over economic data, experience data, and documentation data demonstrates the significance of domain expertise. In addition, feature selection techniques and feature importance scores demonstrate the significance of the planning phase. In the planning phase, a small, devoted, and highly experienced team of project planners proved to be a crucial factor for project success. The thesis concludes that, for companies to maximize the utility of machine learning, they must identify their critical success factors and collect the corresponding data. / This master's thesis investigates the following research question: how can machine learning and insightful data analysis be used to increase efficiency in the planning and design phase of the infrastructure sector? This challenge is addressed by analyzing data from real megaprojects and applying advanced machine learning algorithms to predict project success and identify the success factors. 
Our research is particularly interested in megaprojects because of their complicated nature, unique characteristics, and enormous impact on society. These projects are rarely completed, which makes it difficult to obtain large amounts of real data. It is clear that AI has the potential to be an invaluable tool for understanding and managing the complexity of megaprojects, despite the problems we face. Artificial intelligence makes it possible to make decisions that are data-driven and better informed. The thesis succeeds in handling the major problem of the lack of data from megaprojects. The thesis is also motivated by this lack of data, which makes the research relevant to other fields characterized by small samples. The results show that the evaluation of megaprojects can be improved through smart use of specific data attributes. The thesis also inspires companies to start collecting important data so that they can use artificial intelligence and machine learning to their advantage.
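The leave-one-out cross-validation mentioned in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy data is invented, and a simple nearest-neighbour rule stands in for the random forest the thesis reports using; none of the names below come from the thesis.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, success label)


def nearest_neighbour_predict(train: List[Sample], x: Sequence[float]) -> int:
    """Stand-in model (NOT a random forest): label of the closest training point."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    closest = min(train, key=lambda sample: sq_dist(sample[0], x))
    return closest[1]


def leave_one_out_accuracy(
    data: List[Sample],
    predict: Callable[[List[Sample], Sequence[float]], int],
) -> float:
    """Hold out each sample once, fit on the rest, and average the hits."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]  # every sample except the i-th
        hits += int(predict(train, x) == y)
    return hits / len(data)


# Hypothetical toy data: two features per project, label 1 = successful.
toy_projects: List[Sample] = [
    ([0.90, 0.80], 1), ([0.80, 0.90], 1), ([0.85, 0.70], 1),
    ([0.20, 0.10], 0), ([0.10, 0.30], 0), ([0.15, 0.20], 0),
]
loo_accuracy = leave_one_out_accuracy(toy_projects, nearest_neighbour_predict)
```

With only a handful of samples, every fold trains on all but one project, which is why this scheme suits very small datasets such as the thirteen megaprojects described.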
413

[en] FORECASTING AMERICAN INDUSTRIAL PRODUCTION WITH HIGH DIMENSIONAL ENVIRONMENTS FROM FINANCIAL MARKETS, SENTIMENTS, EXPECTATIONS, AND ECONOMIC VARIABLES / [pt] PREVENDO A PRODUÇÃO INDUSTRIAL AMERICANA EM AMBIENTES DE ALTA DIMENSIONALIDADE, ATRAVÉS DE MERCADOS FINANCEIROS, SENTIMENTOS, EXPECTATIVAS E VARIÁVEIS ECONÔMICAS

EDUARDO OLIVEIRA MARINHO 20 February 2020
[pt] O presente trabalho traz 6 diferentes técnicas de previsão para a variação mensal do Índice da Produção Industrial americana em 3 ambientes diferentes totalizando 18 modelos. No primeiro ambiente foram usados como variáveis explicativas a própria defasagem da variação mensal do Índice da produção industrial e outras 55 variáveis de mercado e de expectativa tais quais retornos setoriais, prêmio de risco de mercado, volatilidade implícita, prêmio de taxa de juros (corporate e longo prazo), sentimento do consumidor e índice de incerteza. No segundo ambiente foi usado à data base do FRED com 130 variáveis econômicas como variáveis explicativas. No terceiro ambiente foram usadas as variáveis mais relevantes do ambiente 1 e do ambiente 2. Observa-se no trabalho uma melhora em prever o IP contra um modelo AR e algumas interpretações a respeito do comportamento da economia americana nos últimos 45 anos (importância de setores econômicos, períodos de incerteza, mudanças na resposta a prêmio de risco, volatilidade e taxa de juros). / [en] This thesis presents 6 different forecasting techniques for the monthly variation of the American Industrial Production Index in 3 different environments, totaling 18 models. In the first environment, the explanatory variables were the lags of the monthly variation of the industrial production index and 55 other market and expectation variables, such as sector returns, market risk premium, implied volatility, interest rate premiums (corporate and long-term), consumer sentiment, and an uncertainty index. In the second environment, the FRED database with 130 economic variables was used as the set of explanatory variables. In the third environment, the most relevant variables of environment 1 and environment 2 were used. 
The work finds an improvement in predicting IP relative to an AR model and offers some interpretations of the behavior of the American economy over the last 45 years (importance of sectors, periods of uncertainty, and changes in the response to risk premium, volatility, and interest rates).
414

Predicting Stock Price Direction for Asian Small Cap Stocks with Machine Learning Methods / Prediktering av Aktiekursriktningen för Asiatiska Småbolagsaktier med Maskininlärning

Abazari, Tina, Baghchesara, Sherwin January 2021
Portfolio managers have a great interest in detecting high-performing stocks early on. Detecting outperforming stocks has for long been of interest from a research as well as financial point of view. Quantitative methods to predict stock movements have been widely studied in diverse contexts, where some present promising results. The quantitative algorithms for such prediction models can be, to name a few, support vector machines, tree-based methods, and regression models, where each one can carry different predictive power. Most previous research focuses on indices such as S&P 500 or large-cap stocks, while small- and micro-cap stocks have been examined to a lesser extent. These types of stocks also commonly share the characteristic of high volatility, with prospects that can be difficult to assess. This study examines to which extent widely studied quantitative methods such as random forest, support vector machine, and logistic regression can produce accurate predictions of stock price directions on a quarterly and yearly basis. The problem is modeled as a binary classification task, where the aim is to predict whether a stock achieves a return above or below a benchmark index. The focus lies on Asian small- and micro-cap stocks. The study concludes that the random forest method for a binary yearly prediction produces the highest accuracy of 69.64%, where all three models produced higher accuracy than a binary quarterly prediction. Although the statistical power of the models can be ruled adequate, more extensive studies are desirable to examine whether other models or variables can increase the prediction accuracy for small- and micro-cap stocks. / Portföljförvaltare har ett stort intresse av att upptäcka högpresterande aktier tidigt. Detektering av högavkastande aktier har länge varit av stort intresse dels i forskningssyfte men också ur ett finansiellt perspektiv. 
Kvantitativa metoder för att förutsäga riktning av aktiepriset har studerats i stor utsträckning där vissa presenterar lovande resultat. De kvantitativa algoritmerna för sådana prediktionsmodeller kan vara, för att nämna ett fåtal, support vector machines, trädbaserade metoder och regressionsmodeller, där var och en kan bära olika prediktiv kraft. Majoriteten av tidigare studier fokuserar på index såsom S&P 500 eller storbolagsaktier, medan små- och mikrobolagsaktier har undersökts i mindre utsträckning. Dessa sistnämnda typer av aktier innehar ofta en hög volatilitet med framtidsutsikter som kan vara svåra att bedöma. Denna studie undersöker i vilken utsträckning väletablerade kvantitativa modeller såsom random forest, support vector machine och logistisk regression, kan ge korrekta förutsägelser av små- och mikrobolags aktiekursriktningar på kvartals- och årsbasis. I avhandlingen modelleras detta som ett binärt klassificeringsproblem, där avkastningen för varje aktie antingen är över eller under jämförelseindex. Fokuset ligger på asiatiska små-och mikrobolag. Studien drar slutsatsen att random forest för en binär årlig prediktion ger den högsta noggrannheten på 69,64 %, där samtliga tre modeller ger högre noggrannhet än en binär kvartalsprediktion. Även om modellerna bedöms vara statistiskt säkerställda, är det önskvärt med fler omfattande studier för att undersöka om andra modeller eller variabler kan öka noggrannheten i prediktionen för små- och mikrobolags aktiekursriktning.
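The binary framing described in this abstract (does a stock's return land above or below a benchmark index?) could be expressed along these lines. The returns, the benchmark value, and the choice to label ties as 0 are illustrative assumptions, not details taken from the thesis.

```python
from typing import List


def label_outperformance(stock_returns: List[float], benchmark_return: float) -> List[int]:
    """Label 1 if a stock beat the benchmark over the period, else 0.

    Assumption: a return exactly equal to the benchmark is labelled 0.
    """
    return [1 if r > benchmark_return else 0 for r in stock_returns]


def accuracy(predicted: List[int], actual: List[int]) -> float:
    """Share of predictions that match the actual labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical yearly returns for four stocks and one benchmark index.
yearly_returns = [0.12, -0.05, 0.30, 0.02]
labels = label_outperformance(yearly_returns, benchmark_return=0.04)

# Hypothetical model output compared against those labels.
example_accuracy = accuracy([1, 0, 0, 0], labels)
```

Any classifier (random forest, support vector machine, logistic regression) would then be trained to predict these labels from features available before the period starts.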
415

Predicting PV self-consumption in villas with machine learning

GALLI, FABIAN January 2021
In Sweden, there is a strong and growing interest in solar power. In recent years, photovoltaic (PV) system installations have increased dramatically, and a large share are distributed grid-connected PV systems, i.e. rooftop installations. Currently, the electricity export rate is significantly lower than the import rate, which has made the amount of self-consumed PV electricity a critical factor when assessing system profitability. Self-consumption (SC) is calculated using hourly or sub-hourly timesteps and is highly dependent on the solar patterns of the location of interest, the PV system configuration, and the building load. As these vary for all potential installations, it is difficult to make estimates without historical data on both load and local irradiance, which are often hard to acquire or not available. A method to predict SC using information commonly available at the planning phase is therefore preferred. There is a scarcity of documented SC data and only a few reports treating the subject of mapping or predicting SC. Therefore, this thesis investigates the possibility of using machine learning to create models able to predict SC from the following inputs: annual load, annual PV production, tilt angle and azimuth angle of the modules, and latitude. Using the programming language Python, seven models are created with regression techniques, trained on real load data and simulated PV data from the south of Sweden, and evaluated using the coefficient of determination (R²) and mean absolute error (MAE). The techniques are Linear Regression, Polynomial Regression, Ridge Regression, Lasso Regression, K-Nearest Neighbors (kNN), Random Forest, and Multi-Layer Perceptron (MLP); the only other SC prediction model found in the literature is evaluated as well. A parametric analysis of the models is conducted, removing one variable at a time to assess each model's dependence on each variable. 
The results are promising: five of the eight models achieve an R² value above 0.9 and can be considered good for predicting SC. The best performing model, Random Forest, has an R² of 0.985 and an MAE of 0.0148. The parametric analysis also shows that while more input data is helpful, using only annual load and PV production is sufficient to make good predictions. However, these results hold only for the southern region of Sweden and are not applicable to areas outside the latitudes or country tested. / I Sverige finns ett starkt och växande intresse för solenergi. De senaste åren har antalet solcellsanläggningar ökat dramatiskt och en stor del är distribuerade nätanslutna solcellssystem, dvs takinstallationer. För närvarande är elexportpriset betydligt lägre än importpriset, vilket har gjort mängden egenanvänd solel till en kritisk faktor vid bedömningen av systemets lönsamhet. Egenanvändning (EA) beräknas med tidssteg upp till en timmes längd och är i hög grad beroende av solstrålningsmönstret för platsen av intresse, PV-systemkonfigurationen och byggnadens energibehov. Eftersom detta varierar för alla potentiella installationer är det svårt att göra uppskattningar utan att ha historiska data om både energibehov och lokal solstrålning, vilket ofta inte är tillgängligt. En metod för att förutsäga EA med allmän tillgänglig information är därför att föredra.  Det finns en brist på dokumenterad EA-data och endast ett fåtal rapporter som behandlar kartläggning och prediktion av EA. I denna uppsats undersöks möjligheten att använda maskininlärning för att skapa modeller som kan förutsäga EA. De variabler som ingår är årlig energiförbrukning, årlig solcellsproduktion, lutningsvinkel och azimutvinkel för modulerna och latitud. Med programmeringsspråket Python skapas sju modeller med hjälp av olika regressionstekniker, där energiförbruknings- och simulerad solelproduktionsdata från södra Sverige används. 
Modellerna utvärderas med hjälp av determinationskoefficienten (R2) och mean absolute error (MAE). Teknikerna som används är linjär regression, polynomregression, Ridge regression, Lasso regression, K-nearest neighbor regression, Random Forest regression, Multi-Layer Perceptron regression. En additionell linjär regressions-modell skapas även med samma metodik som används i en tidigare publicerad rapport. En parametrisk analys av modellerna genomförs, där en variabel exkluderas åt gången för att bedöma modellens beroende av varje enskild variabel.  Resultaten är mycket lovande, där fem av de åtta undersökta modeller uppnår ett R2-värde över 0,9. Den bästa modellen, Random Forest, har ett R2 på 0,985 och ett MAE på 0,0148. Den parametriska analysen visar också att även om ingångsdata är till hjälp, är det tillräckligt att använda årlig energiförbrukning och årlig solcellsproduktion för att göra bra förutsägelser. Det måste dock påpekas att modellprestandan endast är tillförlitlig för södra Sverige, från var beräkningsdata är hämtad, och inte tillämplig för områden utanför de valda latituderna eller land.
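For readers unfamiliar with the quantities in this abstract, the sketch below shows one common way to compute self-consumption from matched hourly load and PV series (the share of PV production that is used on site), together with the two evaluation metrics the abstract names. The hourly values are invented, and the thesis may define or compute these quantities differently.

```python
from typing import List


def self_consumption(load_kwh: List[float], pv_kwh: List[float]) -> float:
    """Share of PV production consumed on site, from matched timesteps."""
    if len(load_kwh) != len(pv_kwh):
        raise ValueError("load and PV series must have the same length")
    total_pv = sum(pv_kwh)
    if total_pv == 0:
        raise ValueError("no PV production in the series")
    # In each timestep, at most the smaller of load and production is used on site.
    used_on_site = sum(min(load, pv) for load, pv in zip(load_kwh, pv_kwh))
    return used_on_site / total_pv


def mean_absolute_error(y_true: List[float], y_pred: List[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: List[float], y_pred: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical four-hour example (kWh per hour).
sc_example = self_consumption(load_kwh=[1.0, 0.5, 2.0, 1.0], pv_kwh=[0.0, 1.0, 3.0, 0.0])
```

A regression model as described in the abstract would predict this SC value from annual aggregates, and MAE and R² would then compare predicted against calculated SC across buildings.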
416

PV self-consumption: Regression models and data visualization

Tóth, Martos January 2022
In Sweden, the installed capacity of residential PV systems is increasing every year. Because there is no feed-in tariff scheme, the techno-economic optimization of PV systems is based mainly on self-consumption. Calculating this parameter requires hourly building loads and hourly PV generation, data that cannot easily be obtained from households. A predictive model based on already available data is therefore preferred. Existing machine learning models may be suitable and have been tested, but the literature on this topic is fairly limited. The machine learning models use a dataset that includes real measurement data of building loads, simulated PV generation data, and the self-consumption calculated from these two inputs. The simulation of PV generation can be based on a Typical Meteorological Year (TMY) weather file or on measured weather data. The TMY file can be generated more quickly and easily, but it is matched to the building load only spatially, while the measured data is matched both temporally and spatially. This thesis investigates whether using a TMY file has any major impact on the performance of the regression models by comparing it with a model based on the measured weather file. In this model, the buildings are single-family houses in southern Sweden. Different building types can have different load profiles, which can affect the performance of the model. Because of the different load profiles, the effect of using a TMY file may be more significant. This thesis therefore also examines the impact of TMY file usage for multifamily houses and compares the two building types by the performance of the machine learning models. PV and battery prices are decreasing from year to year. The subsidies in Sweden offer a significant tax credit on battery investments with PV systems, which can make batteries profitable. 
Lastly, this thesis evaluates the performance of the machine learning models after adding a battery to the system, for both TMY and measured data. In addition, the optimal system is predicted based on the self-consumption, PV generation and battery size. The models have high accuracy: the random forest model achieves an R² above 0.9 in all cases. The results confirm that using the TMY file leads to only marginal errors and that it can be used for training the models. The battery model has promising results, with an R² above 0.9 for four models: random forest, k-NN, MLP and polynomial. The prediction of the optimal system also has promising results for the polynomial model, with an 18% error in predicted payback time compared to the reference. / I Sverige ökar den installerade kapaciteten för solcellsanläggningarna för bostäder varje år. Bristen på inmatningssystem gör att den tekniska ekonomiska optimeringen av solcellssystemen huvudsakligen bygger på egen konsumtion. Beräkningen av denna parameter omfattar byggnadsbelastningar per timme och PV-generering per timme. Dessa uppgifter kan inte lätt erhållas från hushållen. En prediktiv modell baserad på redan tillgängliga data skulle vara att föredra och behövas i detta fall. De redan tillgängliga maskininlärningsmodellerna kan vara lämpliga och redan testade men mängden litteratur i detta ämne är ganska låg. Maskininlärningsmodellerna använder en datauppsättning som inkluderar verkliga mätdata från byggnader och simulerad PV-genereringsdata och den beräknade egenförbrukningsdata baserad på dessa två indata. Simuleringen av PV-generering kan baseras på väderfilen Typical Meteorological Year (TMY) eller på uppmätta väderdata. TMY-filen kan genereras snabbare och enklare, men den anpassas endast rumsligt till byggnadsbelastningen, medan uppmätta data är temporärt och rumsligt. 
Denna avhandling undersöker om användningen av TMY-fil leder till någon större påverkan på prestandan genom att jämföra den med den uppmätta väderfilsmodellen. I denna modell är byggnaderna småhus från södra Sverige. De olika byggnadstyperna kan ha olika belastningsprofiler vilket kan påverka modellens prestanda. På grund av dessa olika belastningsprofiler kan effekten av att använda TMY-fil ha mer betydande inverkan. Den här avhandlingen jämför också effekten av TMY-filanvändningen i fallet med flerfamiljshus och jämför också de två byggnadstyperna efter prestanda för maskininlärningsmodellerna. PV- och batteripriserna minskar från år till år. Subventionerna i Sverige ger en betydande skattelättnad på batteriinvesteringar med solcellssystem. Detta kan göra batterierna lönsamma. Slutligen utvärderar denna avhandling prestandan för maskininlärningsmodellerna efter att ha lagt till batteriet i systemet för både TMY och uppmätta data. Det optimala systemet förutsägs också baserat på egen förbrukning, årlig byggnadsbelastning, årlig PV-generering och batteristorlek. Modellerna har hög noggrannhet, den slumpmässiga skogsmodellen är över 0,9 R2 för alla fall. Resultaten bekräftar att användningen av TMY-filen endast leder till marginella fel, och den kan användas för träning av modellerna. Batterimodellen har lovande resultat med över 0,9 R2 för fyra modeller: random skog, k-NN, MLP och polynom. Förutsägelsen av den optimala systemmodellen har också lovande resultat för polynommodellen med 18 % fel i förutspådd återbetalningstid jämfört med referensen.
417

Exploring Integration of Predictive Maintenance using Anomaly Detection : Enhancing Productivity in Manufacturing / Utforska integration av prediktivt underhåll med hjälp av avvikelsedetektering : Förbättra produktiviteten inom tillverkning

Bülund, Malin January 2024
In the manufacturing industry, predictive maintenance (PdM) stands out by leveraging data analytics and IoT technologies to predict machine failures, offering a significant advancement over traditional reactive and scheduled maintenance practices. The aim of this thesis was to examine how anomaly detection algorithms could be utilized to anticipate potential breakdowns in manufacturing operations, while also investigating the feasibility and potential benefits of integrating PdM strategies into a production line. The methodology of this project consisted of a literature review, application of machine learning (ML) algorithms, and interviews. Firstly, the literature review provided a foundation for exploring the benefits of PdM and its impact on production line productivity, thereby shaping the development of interview questions. Secondly, ML algorithms were employed to analyze data and predict equipment failures. The algorithms used in this project were: Isolation Forest (IF), Local Outlier Factor (LOF), Logistic Regression (LR), One-Class Support Vector Machine (OC-SVM) and Random Forest (RF). Lastly, interviews with production line personnel provided qualitative insights into the current maintenance practices and perceptions of PdM. The findings from this project underscore the efficacy of the IF model in identifying potential equipment failures, emphasizing its key role in improving future PdM strategies to enhance maintenance schedules and boost operational efficiency. Insights gained from both literature and interviews underscore the transformative potential of PdM in refining maintenance strategies, enhancing operational efficiency, and minimizing unplanned downtime. More broadly, the successful implementation of these technologies is expected to revolutionize manufacturing processes, driving towards more sustainable and efficient industrial operations. 
/ I tillverkningsindustrin utmärker sig prediktivt underhåll (PdM) genom att använda dataanalys och IoT-teknologier för att förutse maskinfel, vilket erbjuder ett betydande framsteg jämfört med traditionella reaktiva och schemalagda underhållsstrategier. Syftet med denna avhandling var att undersöka hur algoritmer för avvikelsedetektering kunde användas för att förutse potentiella haverier i tillverkningsoperationer, samtidigt som genomförbarheten och de potentiella fördelarna med att integrera PdM-strategier i en produktionslinje undersöktes. Metodologin för detta projekt bestod av en litteraturöversikt, tillämpning av maskininlärningsalgoritmer (ML) och genomförande av intervjuer. Först och främst gav litteraturöversikten en grundläggande bas för att utforska fördelarna med PdM och dess inverkan på produktionslinjens produktivitet, vilket därmed påverkade utformningen av intervjufrågorna. För det andra användes ML-algoritmer för att analysera data och förutsäga utrustningsfel. Algoritmerna som användes i detta projekt var: Isolation Forest (IF), Local Outlier Factor (LOF), Logistic Regression (LR), One-Class Support Vector Machine (OCSVM) och Random Forest (RF). Slutligen gav intervjuer med produktionslinjepersonal kvalitativa insikter i de nuvarande underhållsstrategierna och uppfattningarna om PdM.Resultaten från detta projekt understryker effektiviteten hos IF-modellen för att identifiera potentiella utrustningsfel, vilket betonar dess centrala roll i att förbättra framtida PdM-strategier för att förbättra underhållsscheman och öka den operativa effektiviteten. Insikter vunna från både litteratur och intervjuer understryker PdM:s transformativa potential att finslipa underhållsstrategier, öka operativ effektivitet och minimera oplanerade driftstopp. Mer generellt förväntas den framgångsrika implementeringen av dessa teknologier revolutionera tillverkningsprocesser och driva mot mer hållbara och effektiva industriella operationer.
418

Apprentissage supervisé de données déséquilibrées par forêt aléatoire / Supervised learning of imbalanced datasets using random forest

Thomas, Julien 12 February 2009
La problématique des jeux de données déséquilibrées en apprentissage supervisé est apparue relativement récemment, dès lors que le data mining est devenu une technologie amplement utilisée dans l'industrie. Le but de nos travaux est d'adapter différents éléments de l'apprentissage supervisé à cette problématique. Nous cherchons également à répondre aux exigences spécifiques de performances souvent liées aux problèmes de données déséquilibrées. Ce besoin se retrouve dans notre application principale, la mise au point d'un logiciel d'aide à la détection des cancers du sein.Pour cela, nous proposons de nouvelles méthodes modifiant trois différentes étapes d'un processus d'apprentissage. Tout d'abord au niveau de l'échantillonnage, nous proposons lors de l'utilisation d'un bagging, de remplacer le bootstrap classique par un échantillonnage dirigé. Nos techniques FUNSS et LARSS utilisent des propriétés de voisinage pour la sélection des individus. Ensuite au niveau de l'espace de représentation, notre contribution consiste en une méthode de construction de variables adaptées aux jeux de données déséquilibrées. Cette méthode, l'algorithme FuFeFa, est basée sur la découverte de règles d'association prédictives. Enfin, lors de l'étape d'agrégation des classifieurs de base d'un bagging, nous proposons d'optimiser le vote à la majorité en le pondérant. Pour ce faire nous avons mis en place une nouvelle mesure quantitative d'évaluation des performances d'un modèle, PRAGMA, qui permet la prise en considération de besoins spécifiques de l'utilisateur vis-à-vis des taux de rappel et de précision de chaque classe. / The problem of imbalanced datasets in supervised learning has emerged relatively recently, since the data mining has become a technology widely used in industry. 
Computer-assisted medical diagnosis, fraud detection, and the detection of abnormal phenomena or of specific elements in satellite imagery are examples of industrial applications based on supervised learning from imbalanced datasets. The goal of our work is to adapt different elements of the supervised learning process to this problem. We also seek to meet the specific performance requirements often associated with imbalanced data problems, such as a high recall rate for the minority class. This need is reflected in our main application, the development of software to help radiologists detect breast cancer. To this end, we propose new methods that modify three different stages of a learning process. First, at the sampling stage, we propose replacing the classic bootstrap used in bagging with guided sampling. Our techniques, FUNSS and LARSS, use neighbourhood properties to select instances. Second, for the representation space, our contribution is a feature construction method adapted to imbalanced datasets. This method, the FuFeFa algorithm, is based on the discovery of predictive association rules. Finally, at the stage where the base classifiers of a bagging ensemble are aggregated, we propose optimizing the majority vote by weighting it. To do so, we introduce a new quantitative measure of model performance, PRAGMA, which takes into account the user's specific needs regarding the recall and precision rates of each class.
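
As a rough illustration of the weighted majority vote mentioned in the abstract above, the following Python sketch sums a weight per base classifier and returns the label with the largest total. The abstract does not define how PRAGMA scores are computed, so the weights here are hypothetical placeholders and the function name `weighted_majority_vote` is an assumption of this sketch, not the thesis's implementation.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label whose supporting classifiers carry the most weight.

    predictions: one predicted label per base classifier
    weights: one non-negative weight per base classifier
             (hypothetical stand-ins for a per-model quality score)
    """
    if len(predictions) != len(weights):
        raise ValueError("one weight per prediction is required")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Ties are resolved in favour of the label that was seen first.
    return max(totals, key=totals.get)


# Three base classifiers vote on one sample (made-up labels and weights).
votes = ["benign", "benign", "malignant"]
unweighted = weighted_majority_vote(votes, [1.0, 1.0, 1.0])
weighted = weighted_majority_vote(votes, [0.2, 0.3, 0.9])
```

With equal weights the call reduces to an ordinary majority vote. With the second set of weights, the single highly weighted classifier outvotes the other two, which is the kind of shift toward the minority class that a recall-oriented weighting could produce.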
419

Classification of Carpiodes Using Fourier Descriptors: A Content Based Image Retrieval Approach

Trahan, Patrick 06 August 2009
Taxonomic classification has always been important to the study of any biological system. At the current rate of classification, many biological species will go unclassified and be lost forever. The current state of computer technology makes image storage and retrieval possible on a global level, and as a result computer-aided taxonomy is now possible. Content-based image retrieval techniques use visual features of the image for classification. By exploiting image content and computer technology, researchers are shrinking the gap between taxonomic classification and species destruction. This content-based study uses the Fourier descriptors of fifteen known landmark features on three Carpiodes species: C. carpio, C. velifer, and C. cyprinus. The classification analysis involves both unsupervised and supervised machine learning algorithms. The Fourier descriptors of the fifteen landmarks provide strong classification power on image data. Feature reduction analysis indicates that the feature set can be reduced, which is useful for improving the generalization power of the classifier.
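
The abstract above does not say how the Fourier descriptors are computed, so the following Python sketch uses one common textbook formulation as an assumption: each landmark (x, y) is treated as a complex number, a discrete Fourier transform is taken over the ordered landmarks, the zeroth coefficient is dropped to remove translation, and the remaining magnitudes are divided by the magnitude of the first harmonic to remove scale. The function name `fourier_descriptors` and the sample points are illustrative only.

```python
import cmath
import math


def fourier_descriptors(points):
    """Magnitude Fourier descriptors of an ordered list of (x, y) points."""
    n = len(points)
    z = [complex(x, y) for x, y in points]
    # Plain O(n^2) discrete Fourier transform; fine for a handful of landmarks.
    coeffs = [
        sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    scale = abs(coeffs[1])
    if scale == 0:
        raise ValueError("degenerate shape: first harmonic is zero")
    # coeffs[0] carries the position, so it is skipped; dividing by
    # |coeffs[1]| removes size; taking magnitudes removes rotation.
    return [abs(c) / scale for c in coeffs[1:]]


# A unit square and a copy that is rotated 90 degrees, scaled by 3 and shifted.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
moved = [(5 - 3 * y, 7 + 3 * x) for x, y in square]
fd_square = fourier_descriptors(square)
fd_moved = fourier_descriptors(moved)
```

Under these assumptions the two shapes produce the same descriptor vector up to floating-point error, which is why such descriptors are attractive features for comparing specimens photographed at different positions and scales.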
420

Data-Driven Predictions of Heating Energy Savings in Residential Buildings

Lindblom, Ellen, Almquist, Isabelle January 2019
Along with the increasing use of intermittent electricity sources, such as wind and solar, comes a growing demand for user flexibility. This has paved the way for a new market of services that provide electricity customers with energy-saving solutions. These include a variety of techniques, ranging from sophisticated control of the customers’ home equipment to information on how to adjust their consumption behavior in order to save energy. This master's thesis contributes to this field by investigating an additional incentive: predictions of future energy savings related to indoor temperature. Five different machine learning models were tuned and used to predict monthly heating energy consumption for a given set of homes. Model tuning and performance evaluation were performed using 10-fold cross-validation. The best-performing model was then used to predict how much heating energy each individual household could save by decreasing its indoor temperature by 1°C during the heating season. The highest prediction accuracy (about 78%) was achieved with support vector regression (SVR), closely followed by neural networks (NN). The simpler regression models that were implemented are, however, not far behind. According to the SVR model, the average household is expected to lower its heating energy consumption by approximately 3% if the indoor temperature is decreased by 1°C.
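
The evaluation procedure in the abstract above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's pipeline: a one-feature least-squares line stands in for SVR and the other models, the data are made up and exactly linear, and the helper names `k_fold_indices` and `fit_line` are hypothetical. It shows how 10-fold cross-validation holds out each fold once, and how a fitted model can be queried with the indoor temperature lowered by 1°C to estimate a saving.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs that split range(n_samples) into k folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (stand-in for the thesis models)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - b * mean_x, b


# Made-up data: indoor temperature (deg C) vs. monthly heating energy (kWh).
temps = [19.0 + 0.1 * i for i in range(30)]
energy = [400.0 + 30.0 * t for t in temps]  # exactly linear, illustration only

# 10-fold cross-validation: every sample is predicted once by a model
# that did not see it during fitting.
errors = []
for train, test in k_fold_indices(len(temps), k=10):
    a, b = fit_line([temps[i] for i in train], [energy[i] for i in train])
    errors.extend(abs(a + b * temps[i] - energy[i]) for i in test)
mean_abs_error = sum(errors) / len(errors)

# Counterfactual query: same household, indoor temperature lowered by 1 deg C.
a, b = fit_line(temps, energy)
baseline = a + b * 21.0
saving_fraction = (baseline - (a + b * 20.0)) / baseline
```

In practice the samples would usually be shuffled before being split into folds, and the stand-in line would be replaced by the tuned model. The split-fit-score loop and the counterfactual query keep the same shape.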
