About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
141

Fighting Unstructured Data with Formatting Methods : Navigating Crisis Communication: The Role of CAP in Effective Information Dissemination

Spridzans, Alfreds January 2024 (has links)
This study investigates the format of crisis communication by analysing a news archive dataset from Krisinformation.se, a Swedish website dedicated to sharing information about crises. The primary goal is to assess the dataset's structure and its efficacy in meeting the criteria of the Common Alerting Protocol (CAP), an internationally recognised format for emergency alerts. The study uses quantitative text analysis and data preprocessing tools such as Python and Power Query to identify inconsistencies in the current dataset format; these anomalies limit the dataset's usefulness for extensive research and effective crisis communication. To address these issues, the study constructs two new datasets with enhanced column structures that rectify the identified problems. These refined datasets aim to improve the clarity and accessibility of information about crisis events, providing valuable insights into the nature and frequency of these incidents. The research also offers practical recommendations for optimising the dataset format to align better with CAP standards, enhancing the overall effectiveness of crisis communication on the platform. The findings highlight the critical role of structured, standardised data formats in crisis communication, particularly in the context of increasing climate-related hazards and other emergencies. By improving the dataset format, the study contributes to more efficient data analysis and better preparedness for future crises. The insights gained from this research are intended to assist other analysts and researchers in conducting more robust studies, ultimately aiding the development of more resilient and responsive crisis communication strategies.
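As a rough illustration of the structural check such a study performs, the sketch below compares an exported news archive against a handful of CAP 1.2 fields. This is a minimal pandas sketch; the file name and column names are assumptions made for illustration, not Krisinformation.se's actual schema.

```python
# Hypothetical check of a news-archive export against a subset of CAP 1.2
# fields. File and column names are assumed, not the platform's real schema.
import pandas as pd

CAP_FIELDS = ["identifier", "sent", "event", "urgency", "severity", "certainty", "area"]

df = pd.read_csv("krisinformation_archive.csv")  # assumed export file

# Which CAP-aligned columns are missing from the export altogether?
print("Absent columns:", [c for c in CAP_FIELDS if c not in df.columns])

# For the columns that do exist, how complete are they?
for col in [c for c in CAP_FIELDS if c in df.columns]:
    print(f"{col}: {df[col].isna().mean():.1%} missing values")

# Timestamps often arrive in mixed formats; coercion failures flag anomalies.
if "sent" in df.columns:
    parsed = pd.to_datetime(df["sent"], errors="coerce", utc=True)
    print("Unparseable timestamps:", int(parsed.isna().sum()))
```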
142

Predicting Customer Churn in a Subscription-Based E-Commerce Platform Using Machine Learning Techniques

Aljifri, Ahmed January 2024 (has links)
This study investigates the performance of Logistic Regression, k-Nearest Neighbors (KNN), and Random Forest algorithms in predicting customer churn on an e-commerce platform. These algorithms were chosen for the unique characteristics of the dataset and the distinct perspective and value each algorithm provides. Iterative model examinations were conducted, encompassing preprocessing techniques, feature engineering, and rigorous evaluations. Logistic Regression showed moderate predictive capability but lagged in accurately identifying potential churners because it assumes linearity between the log odds and the predictors. KNN emerged as the most accurate classifier, achieving the highest sensitivity and specificity (98.22% and 96.35%, respectively) and outperforming the other models. Random Forest (sensitivity 91.75%, specificity 95.83%) excelled in specificity but lagged slightly in sensitivity. Feature importance analysis highlighted "Tenure" as the most impactful variable for churn prediction. The preprocessing techniques differed in performance across models, emphasizing the importance of tailored preprocessing. The findings underscore the significance of continuous model refinement and optimization in addressing complex business challenges such as customer churn, and they serve as a foundation for businesses to implement targeted retention strategies, mitigate customer attrition, and promote growth on e-commerce platforms.
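A minimal sketch of the three-model comparison, using scikit-learn; the file name, target column, and hyperparameters are assumptions rather than the thesis's exact setup. Sensitivity and specificity are read off the confusion matrix, as in the reported results.

```python
# Illustrative comparison of the three classifiers on an assumed churn dataset.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

df = pd.read_csv("ecommerce_churn.csv")            # assumed file
X, y = df.drop(columns=["Churn"]), df["Churn"]     # assumed target column
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

scaler = StandardScaler().fit(X_tr)                # KNN is scale-sensitive
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=42),
}
for name, model in models.items():
    model.fit(X_tr_s, y_tr)
    tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te_s)).ravel()
    print(f"{name}: sensitivity={tp / (tp + fn):.2%}, specificity={tn / (tn + fp):.2%}")
```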
143

Client data management in banking

Žiupsnys, Giedrius 09 July 2011 (has links)
This work analyses regularities in the historical credit data of bank clients. First, the bank's data repositories are examined in order to understand the banking data as well as possible. Bank data samples covering credit repayment history are then used to assess clients' insolvency risk. This is done by adapting algorithms and software for the analysis, which begins with information processing and preparation. Various classification algorithms are then applied to build models that partition the data as accurately as possible and identify insolvent clients. To estimate the number of days a client is late in repaying a loan, regression algorithms are employed and forecasting models are constructed. Based on these investigations, the work presents data mart models and an information flow diagram, and identifies the classification and prediction models and algorithms that best fit the given data samples.
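The two modelling tasks, insolvency classification and days-late regression, could be sketched as follows. The data file, feature set, and the choice of random forests are illustrative assumptions; the thesis evaluates several algorithms rather than a single one.

```python
# Sketch of the paired classification and regression tasks on assumed data.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

credit = pd.read_csv("credit_history.csv")                   # assumed extract
X = credit.drop(columns=["insolvent", "days_late"])          # assumed targets

# Task 1: classify clients as solvent or insolvent.
clf = RandomForestClassifier(random_state=0)
print("classification accuracy:",
      cross_val_score(clf, X, credit["insolvent"], cv=5).mean())

# Task 2: predict how many days a repayment will be late.
reg = RandomForestRegressor(random_state=0)
print("regression R^2:",
      cross_val_score(reg, X, credit["days_late"], cv=5).mean())
```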
144

Applications of Adaptive Antennas in Third-Generation Mobile Communications Systems

Lau, Buon Kiong January 2002 (has links)
Adaptive antenna systems (AAS's) were traditionally of interest only in radar and sonar applications. However, since the onset of the explosive growth in demand for wireless communications in the 1990s, researchers have given increasing attention to the use of AAS technology to overcome practical challenges in providing the service. The main benefit of the technology lies in its ability to exploit the spatial domain, on top of the temporal and frequency domains, to improve transceiver performance. This thesis presents a unified study of two classes of preprocessing techniques for uniform circular arrays (UCA's). UCA's are of interest because of their natural ability to provide full azimuth (i.e. 360°) coverage, as found in typical scenarios for sensor array applications such as radar, sonar and wireless communications. The two classes of preprocessing techniques studied are the Davies transformation and the interpolated array transformations. These techniques yield a mathematically more convenient form for the array steering vector, the Vandermonde form, via a linear transformation. The Vandermonde form is useful for applications such as direction-of-arrival (DOA) estimation, optimum or minimum variance distortionless response (MVDR) beamforming in correlated signal environments, and beampattern synthesis. A novel interpolated array transformation is proposed to overcome limitations in the existing interpolated array transformations. A disadvantage of the two classes of preprocessing techniques for UCA's with omnidirectional elements is the lack of robustness of the transformed array steering vector to array imperfections under certain conditions. To mitigate the robustness problem, optimisation problems are formulated to modify the transformation matrices. Suitable optimisation techniques are then applied to obtain more robust transformations. The improved transformations are shown to increase robustness, but at the cost of larger transformation errors; the benefits of the robustification procedure are most apparent in DOA estimation. In addition to the algorithm-level studies, the thesis also investigates the use of AAS technology in two different third-generation (3G) mobile communications systems: Enhanced Data rates for Global Evolution (EDGE) and Wideband Code Division Multiple Access (WCDMA). EDGE, or more generally the GSM/EDGE Radio Access Network (GERAN), is the evolution of the widely successful GSM system to provide 3G mobile services in the existing radio spectrum. It builds on the TDMA technology of GSM and relies on improved coding and higher-order modulation schemes to provide packet-based services at high data rates. WCDMA, on the other hand, is based on CDMA technology and is specially designed and streamlined for 3G mobile services. For WCDMA, a single-user approach to DOA estimation is proposed which utilises the user spreading code and the pulse-shaped chip waveform; the proposed approach is shown to produce promising performance improvements. The studies with EDGE concern the evaluation of a simple AAS at the system and link levels. Results from the system- and link-level simulations are presented to demonstrate the effectiveness of AAS technology in the new mobile communications system.
Finally, it is noted that the WCDMA and EDGE link level simulations employ the newly developed COST259 directional channel model, which is capable of producing accurate channel realisations of macrocell environments for the evaluation of AAS's.
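As a hedged illustration of the beamforming side of this work, the sketch below builds an ideal UCA steering vector and the standard MVDR weights w = R^-1 a / (a^H R^-1 a). The array geometry and signal scenario are invented for the example; the Davies and interpolated array transformations studied in the thesis are not reproduced here.

```python
# MVDR beamforming on an ideal 8-element uniform circular array (UCA).
# Geometry and the simulated scenario are illustrative, not from the thesis.
import numpy as np

N, radius_wl = 8, 0.5                  # elements, radius in wavelengths

def uca_steering(az):
    """Azimuth-only steering vector for an ideal UCA of omni elements."""
    n = np.arange(N)
    return np.exp(1j * 2 * np.pi * radius_wl * np.cos(az - 2 * np.pi * n / N))

# Sample covariance from a source at 60 degrees plus noise (200 snapshots).
rng = np.random.default_rng(0)
sig = uca_steering(np.deg2rad(60))[:, None] * rng.standard_normal(200)
noise = 0.1 * (rng.standard_normal((N, 200)) + 1j * rng.standard_normal((N, 200)))
snapshots = sig + noise
R = snapshots @ snapshots.conj().T / 200

# MVDR weights: w = R^-1 a / (a^H R^-1 a), distortionless toward 60 degrees.
a = uca_steering(np.deg2rad(60))
Rinv_a = np.linalg.solve(R, a)
w = Rinv_a / (a.conj() @ Rinv_a)
print("gain toward look direction:", abs(w.conj() @ a))      # ~1 by design
```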
145

Impact of preprocessing on predictive models : An experimental analysis of time series from district heating

Andersson, Linda, Laurila, Alex, Lindström, Johannes January 2021 (has links)
Heat accounts for the largest energy need in households and other buildings in society, and various techniques are used to reduce the amount of energy required, saving both money and the environment. One approach to this problem is through informatics, where machine learning is used to analyse and predict heating needs. This study uses machine learning to forecast future district heating energy consumption from a district heating company's historical data together with exogenous variables in the form of weather data from the Swedish Meteorological and Hydrological Institute (SMHI). The study is written in Swedish and explores the effects of preprocessing on prediction models that use time series data to forecast future data points. The preprocessing steps examined are normalisation, interpolation, handling of numeric outliers and missing values, datetime feature engineering, seasonality, feature selection, and cross-validation. The machine learning model used is the Multilayer Perceptron, a subcategory of artificial neural network. The research question focuses on the effects of preprocessing and feature selection on predictive model performance across different datasets and combinations of preprocessing methods. The models were divided into three datasets by date range: 2009, 2007–2011, and 2007–2017, with the combinations consisting of preprocessing steps combined in an iterative process. Percentage increases in R² values for these ranges reached 47.45% for one year, 9.97% for five years, and 32.44% for 11 years. Broadly, the results confirm and reinforce existing theory that preprocessing can improve prediction models. A number of smaller observations about the effects of individual preprocessing methods are identified and discussed in the study, such as the detrimental effect of datetime feature engineering on models trained with a small number of iterations.
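A minimal sketch of one preprocessing-plus-MLP combination of the kind the study iterates over is given below; the file name, column names, and network settings are assumptions, and the study itself evaluates many more preprocessing permutations.

```python
# One assumed preprocessing combination feeding an MLP regressor.
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.neural_network import MLPRegressor

df = pd.read_csv("district_heating.csv", parse_dates=["timestamp"])  # assumed
df = df.set_index("timestamp").interpolate()     # interpolate missing values

# Datetime feature engineering: expose daily and yearly seasonality.
df["hour"], df["month"] = df.index.hour, df.index.month

X = df[["outdoor_temp", "hour", "month"]]        # assumed exogenous features
y = df["heat_load"]                              # assumed target

model = make_pipeline(
    MinMaxScaler(),                              # normalisation step
    MLPRegressor(hidden_layer_sizes=(64,), max_iter=500, random_state=1),
)
# Time-series-aware cross-validation instead of shuffled folds.
scores = cross_val_score(model, X, y, cv=TimeSeriesSplit(n_splits=5), scoring="r2")
print("mean R^2:", scores.mean())
```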
146

Machine Learning Optimization of KPI Prediction

Haris, Daniel January 2018 (has links)
This thesis aims to optimize machine learning algorithms for predicting KPI metrics for an organization. The organization uses machine learning to predict whether projects will meet the planned deadlines of the last phase of the development process. The work focuses on the analysis of prediction models and sets the goal of selecting new candidate models for the prediction system. We have implemented a system that automatically selects the best feature variables for learning. Trained models were evaluated with several performance metrics, and the best candidates were chosen for prediction. The candidate models achieved higher accuracy, which means that the prediction system provides more reliable responses. We also suggest further improvements that could increase the accuracy of the forecast.
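One way to realise the automatic feature selection and candidate-model comparison described above is sketched below; the data file, the binary KPI target, and the candidate models are placeholders rather than the organization's actual components.

```python
# Assumed pipeline: automatic feature selection, then candidate comparison.
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

projects = pd.read_csv("projects.csv")            # assumed project history
X = projects.drop(columns=["met_deadline"])       # assumed binary KPI target
y = projects["met_deadline"]

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "gradient boosting": GradientBoostingClassifier(),
}
for name, clf in candidates.items():
    # Selection sits inside the pipeline so it is refit on each CV fold.
    pipe = make_pipeline(SelectKBest(f_classif, k=10), clf)
    score = cross_val_score(pipe, X, y, cv=5, scoring="f1").mean()
    print(f"{name}: mean F1 = {score:.3f}")
```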
147

Data Mining

Slezák, Milan January 2011 (has links)
The thesis provides an introduction to data mining, which focuses on finding hidden correlations in data. Interest in this area dates back to the 1960s. Data analysis was first used in marketing, but it later expanded into other areas, and some of its possibilities are still unexploited. A methodology is used to structure the data mining process, offering a concise guide to building a data mining procedure. Data mining analysis encompasses a wide range of algorithms for data transformation. Growing interest in data mining means that the number of data mining software packages is increasing. This thesis contains an overview of some of these programs, together with examples and an assessment.
148

Prediction of Time Series Using Statistical Methods

Beluský, Ondrej January 2011 (has links)
Many companies consider it essential to obtain forecasts of time series of uncertain variables that influence their decisions and actions. Marketing, for example, involves a number of decisions that depend on a reliable forecast. Forecasts are based directly or indirectly on information derived from historical data. These data may contain different patterns, such as a trend, a horizontal pattern, or a cyclical or seasonal pattern. Most methods are based on recognising these patterns and projecting them into the future to create a forecast. Other approaches, such as neural networks, are black boxes that rely on learning.
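The pattern recognition described here can be illustrated with a classical decomposition on a synthetic monthly series, shown below; seasonal_decompose from statsmodels separates the trend, seasonal, and residual components that such methods project forward. This is a generic illustration, not code from the thesis.

```python
# Decomposing a synthetic monthly series into trend, seasonal and residual parts.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

idx = pd.date_range("2010-01-01", periods=48, freq="MS")
t = np.arange(48)
sales = pd.Series(
    100 + 2 * t                                   # upward trend
    + 10 * np.sin(2 * np.pi * t / 12)             # yearly seasonal pattern
    + np.random.default_rng(0).normal(0, 3, 48),  # noise
    index=idx,
)

parts = seasonal_decompose(sales, model="additive", period=12)
# parts.trend, parts.seasonal and parts.resid hold the separated patterns;
# projecting trend plus season forward yields a simple pattern-based forecast.
print(parts.seasonal.head(12))
```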
149

Scene Analysis Based on 2D Images

Hejtmánek, Martin Unknown Date (has links)
This thesis deals with object surface analysis in a simple scene represented by a two-dimensional raster image. It summarises the most common methods used within this branch of information technology and explains their advantages and drawbacks. It introduces the design of a surface profile analysis algorithm based on lighting analysis, using knowledge and experience from previous work. It contains a detailed description of the implemented algorithm and discusses the experimental results. It also brings up options for possible enhancements of the proposed algorithm.
150

PDF document search within a very large database

Wang, Lizhong January 2017 (has links)
A digital search engine, which takes a search request from a user and returns a result responding to that request, is indispensable for modern users accustomed to surfing the Internet. At the same time, the PDF document format is accepted by more and more people and has become widely used owing to its convenience and effectiveness. It follows that the traditional library has already started to be replaced by the digital one. Combining these two factors, a document-based search engine that can query a digital document database with an input file is urgently needed. This thesis is a software development project that aims to design and implement a prototype of such a search engine and to propose potential optimization methods for Loredge. The research can be divided into two main categories: prototype development and optimization analysis. It involves an analytical study of sample documents provided by Loredge and a multi-perspective performance analysis. The prototype comprises reading, preprocessing, and similarity measurement. The reading part reads in a PDF file using the imported Java library Apache PDFBox. The preprocessing part processes the document as it is read and generates a document fingerprint. The similarity measurement is the final stage, which measures the similarity between the input fingerprint and all the document fingerprints in the database. The optimization analysis balances resource consumption in terms of response time, accuracy, and memory use. According to the performance analysis, the shorter the document fingerprint, the better the search program performs. Moreover, a permanent feature database and a similarity-based filtration mechanism are proposed to further optimize the program. This project lays a solid foundation for further study of document-based search engines by providing a feasible prototype and sufficient relevant experimental data. The study concludes that subsequent work should focus mainly on improving the effectiveness of database access, which involves data entry labeling and search algorithm optimization.
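The prototype itself is implemented in Java with Apache PDFBox; purely as a language-neutral illustration of the fingerprint-and-compare idea, the Python sketch below uses hashing vectorization as a stand-in for whatever fingerprint the Loredge prototype actually computes.

```python
# Fixed-size text fingerprints compared by cosine similarity (illustrative).
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Shorter fingerprints trade accuracy for speed and memory, mirroring the
# study's observation about fingerprint length.
vectorizer = HashingVectorizer(n_features=2**10, norm="l2")

database_texts = ["text of first stored document", "text of second stored document"]
db_fingerprints = vectorizer.transform(database_texts)   # precomputable store

query = vectorizer.transform(["extracted text of the input PDF"])
scores = cosine_similarity(query, db_fingerprints).ravel()
print("best match:", scores.argmax(), "score:", round(float(scores.max()), 3))
```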
