
Load Classification with Machine Learning : Classifying Loads in a Distribution Grid

Kristensson, Jonathan January 2019 (has links)
This thesis explores the use of machine learning to classify loads in a distribution grid based on the daily consumption behaviour of roughly 1600 loads spread throughout the areas Bromma, Hässelby and Vällingby in Stockholm, Sweden. Two common unsupervised learning methods were used: K-means clustering and hierarchical agglomerative clustering (HAC), whose performance was analysed with different input data sets and parameters. K-means and HAC proved difficult to compare directly, and it was also difficult to find a suitable number of clusters K with the input data used. This issue was resolved by evaluating the clustering outcome with a custom loss function, MSE-tot, which compares the created clusters with the subsequent assignment of new data. MSE-tot indicates that K-means is more suitable than HAC in this particular clustering setup. To investigate how the obtained clusters could be used in practice, two K-means clustering models were also used to perform cluster-specific peak load predictions. These predictions were made using unitless load profiles created from the mean properties of each cluster and dimensioned using load-specific parameters. The developed models had a mean relative error of approximately 8-19 % per load, depending on the prediction method and which of the two clustering models was used. This result is quite promising, especially since deviations above 20 % were not uncommon in previous work. The models gave poor predictions for some clusters, however, which indicates that they may not be suitable for all kinds of load data in their current form. One suggestion for further improving the predictions is to add more explanatory variables, for example temperature dependence. The results of the developed models were also compared to the conventionally used Velander's formula, which makes predictions based on each load's facility type and annual electricity consumption. Velander's formula generally performed worse than the developed methods, with a mean relative error of 40-43 % per load. One likely reason for this is that the database used had poor facility-label quality, which is essential for obtaining correct constants in Velander's formula.
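A minimal sketch of this clustering setup is given below, assuming normalized 24-hour load profiles; the synthetic data, the stand-in for MSE-tot, and the unspecified Velander constants are illustrative assumptions, not the thesis's actual implementation.

```python
# Sketch: cluster daily load profiles with K-means and score each K by how
# well held-out profiles fit the created clusters (a stand-in for MSE-tot).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
profiles = rng.random((1600, 24))                 # hypothetical daily profiles, one row per load
profiles /= profiles.sum(axis=1, keepdims=True)   # make each profile unitless

train, test = profiles[:1200], profiles[1200:]
for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(train)
    # Assumed stand-in for MSE-tot: mean squared distance from subsequently
    # assigned (held-out) profiles to their nearest centroid.
    mse_tot = np.mean(km.transform(test).min(axis=1) ** 2)
    print(k, round(mse_tot, 5))

def velander_peak(annual_kwh, k1, k2):
    """Velander's formula: estimated peak load from annual energy use.

    The constants k1 and k2 depend on facility type, which is why poor
    facility labels in the database hurt this method.
    """
    return k1 * annual_kwh + k2 * annual_kwh ** 0.5
```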

Algoritmy pro detekci anomálií v datech z klinických studií a zdravotnických registrů / Algorithms for anomaly detection in data from clinical trials and health registries

Bondarenko, Maxim January 2018 (has links)
This master's thesis deals with anomaly detection in data from clinical trials and medical registries. The purpose of the work is to review the literature on data quality in clinical trials and to design an algorithm, based on machine learning methods, for detecting anomalous records in real clinical data from current or completed clinical trials and medical registries. The practical part describes the implemented detection algorithm, which consists of several parts: importing data from the information system, preprocessing and transforming the imported records (whose variables have different data types) into numerical vectors, applying well-known statistical methods to detect outliers, and evaluating the quality and accuracy of the algorithm. The algorithm outputs a vector of the parameters containing anomalies, which is intended to make the data manager's work easier. It is designed to extend the palette of information system (CLADE-IS) functions with automatic monitoring of data quality through the detection of anomalous records.
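The detection pipeline described above could look roughly like the sketch below; the toy records, column names, and the IQR outlier rule are illustrative assumptions, not the actual CLADE-IS implementation.

```python
# Sketch: mixed-type clinical records -> numerical vectors -> statistical
# outlier flags for the data manager to review.
import pandas as pd

records = pd.DataFrame({                  # hypothetical registry extract
    "age":       [34, 36, 35, 240, 33],   # 240 looks like an entry error
    "sex":       ["F", "M", "F", "F", "M"],
    "weight_kg": [70, 82, 68, 75, 690],   # 690 looks like an entry error
})

numeric = pd.get_dummies(records).astype(float)   # encode categoricals numerically
q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
iqr = q3 - q1
outside = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)
flagged = outside.any(axis=1)             # a record is anomalous if any variable is
print(records[flagged])                   # rows with age 240 and weight 690 are flagged
```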

Business Intelligence påverkan på beslutsprocesser : En undersökning av BI-systemens påverkan på beslutsprocesser och förändring av beslutsunderlaget hos en organisation / The impact of Business Intelligence on decision processes: A study of BI systems' effect on decision processes and the change of decision support in an organization

Berhane, Aron, Nabeel, Mohamad January 2020 (has links)
Business Intelligence (BI) systems are now well embedded in the daily work of managers in organizations. These systems have a significant impact on the management of big data and assist managers in making decisions. The purpose of this study is to investigate how BI systems affect decision-making processes and the change of decision support in organizations and their activities, by examining three aspects of business intelligence: data quality, data analysis, and the human factor. The study's approach consists of a literature review and interviews with organizations that have implemented a BI system in their operations. The results show that data quality does not have a direct impact on the success of BI use, but it remains an essential aspect of information management for decision-making. Data analysis tools offer a variety of methods to help a decision-maker create a decision basis for different types of decisions. BI systems affect the decision-making process by making organizations think more systematically when making decisions, rather than deciding based on intuition.

Faktorer som påverkar ett rättvist beslutsfattande : En undersökning av begränsningar och möjligheter inom datainsamling för maskininlärning / Factors affecting fair decision-making: An investigation of limitations and opportunities in data collection for machine learning

Westerberg, Erik January 2023 (has links)
Artificial intelligence (AI) is widely acknowledged to have a transformative impact on various industries. However, this technology is not without its limitations. One such limitation is the potential reinforcement of human biases within machine learning systems; after all, these systems rely on data generated by humans. To address this issue, the European Union (EU) is implementing regulations governing the development of AI systems, not only to promote ethical decision-making but also to curb market oligopolies. Achieving fair decision-making relies on high-quality data: a model's performance is thus synonymous with high-quality data, encompassing breadth, accurate annotation, and relevance. Previous research highlights the lack of processes and methods guiding the effort to ensure high-quality training data. In response, this study aims to investigate the limitations and opportunities associated with claims of data quality within the domain of data collection research. To achieve this, a research question is posed: What factors constrain and enable the creation of a high-quality dataset in the context of AI fairness? The study employs semi-structured interviews with industry experts, allowing them to describe their personal experiences and the challenges they have encountered. The study reveals multiple factors that restrict the ability to create a high-quality dataset and, ultimately, a fair decision-making system. It also identifies several opportunities in relation to high-quality data that methods from the research landscape provide.

INVESTIGATING THE IMPACT OF LEAN SIX SIGMA PRINCIPLES ON ESTABLISHING AND MAINTAINING DATA GOVERNANCE SYSTEMS IN SMES: AN EXPLORATORY STUDY USING GROUNDED THEORY AND ISM APPROACH

Manal Alduraibi (15265348) 29 April 2023 (has links)
Data governance and data privacy are critical aspects of organizational management across all organizational scales, but this research focused specifically on their significance in Small and Medium Enterprises (SMEs). While maintaining these systems is paramount in all organizations, SMEs face greater challenges because of their limited resources. These challenges include potential errors such as data leaks, use of corrupted data, or insufficient data, as well as difficulty in identifying clear roles and responsibilities for data handling. To address these challenges, this research investigated the impact of using Lean Six Sigma (LSS) tools and practices to overcome the anticipated gaps and challenges in SMEs. The qualitative methodology is a grounded theory design, chosen because of the limited understanding of the best LSS practices for achieving data governance and data privacy in SMEs and of how LSS can improve the adoption of data governance with respect to privacy in SMEs. Data were collected using semi-structured interview questions that were reviewed by an expert panel and pilot tested. The sampling method included purposive, snowball, and theoretical sampling, resulting in 20 participants being selected for interviews. Open, axial, and selective coding were performed, resulting in the development of a grounded theory. The data were imported into NVivo, a qualitative analysis software program, to compare responses, categorize them into themes and groups, and develop a conceptual framework for data governance and data privacy. An iterative data collection and analysis approach was used to ensure that all aspects were considered. The grounded theory yielded the themes used to generate a theory from the participants' descriptions of LSS, SMEs, data governance, and data privacy. Finally, the Interpretive Structural Modeling (ISM) technique was applied to identify the relationships between the concepts and factors that resulted from the grounded theory; it helps arrange the factors into levels and draw the relationships in a flowchart, providing valuable insights to the researcher.
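As a concrete illustration of the ISM step, the sketch below performs the standard reachability-matrix closure and level partitioning; the four factors and the influence matrix are invented for illustration and are not the study's actual findings.

```python
# Sketch: Interpretive Structural Modeling (ISM) level partitioning.
import numpy as np

factors = ["LSS training", "Management support", "Data roles", "Data privacy"]
A = np.array([[1, 0, 1, 1],    # hypothetical matrix: A[i, j] = 1 if factor i
              [1, 1, 1, 1],    # influences factor j (diagonal = 1 by convention)
              [0, 0, 1, 1],
              [0, 0, 0, 1]])

# Transitive closure (Warshall's algorithm) -> final reachability matrix.
R = A.copy()
for k in range(len(R)):
    R |= R[:, [k]] & R[[k], :]

# A factor belongs to the current level when its reachability set (within
# the remaining factors) is contained in its antecedent set.
levels, remaining = [], set(range(len(factors)))
while remaining:
    level = {i for i in remaining
             if {j for j in remaining if R[i, j]}
             <= {j for j in remaining if R[j, i]}}
    levels.append(sorted(factors[i] for i in level))
    remaining -= level
print(levels)  # level 1 is the top of the ISM hierarchy
```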

Master Data Management-studie om nästa entitet och leverantör för Scania / Master Data Management study about the next entity and supplier for Scania

Oldelius, David, Pham, Douglas January 2018 (has links)
Large enterprises have different departments, and the information from them must be managed. Master Data Management (MDM) is an information management system for handling information from different sources. An MDM implementation proceeds one entity at a time. The task of this work is to recommend the next entity to include in the MDM implementation at Scania, as well as which provider fits the implementation. The recommendation of an entity is prepared from material provided by Scania and interviews with Scania employees. The recommendation of a provider is prepared from material from the providers and interviews with them. The recommended entity is product as individual, because the information in that area needs improved management and the entity is close to the core business. Orchestra Networks is the recommended provider because they are a leader among MDM providers, they are specialized in the area, and they are strong in product information.

The importance of supplier information quality in purchasing of transport services / Betydelsen av leverantörers informationskvalitet vid inköp av transporttjänster

GORDOS, PYGMALION-ALEXANDROS, BULOVAS, JONAS January 2018 (has links)
An important prerequisite for successful supply chain integration is the ability to convert data into information, combined with structured storing and sharing processes. The purpose of this master's thesis is to investigate the potential relation between supplier data quality and the performance of purchasing of transport services. The thesis provides evidence of the need to emphasize supplier data quality throughout the supplier selection process. A supplier data quality assessment framework consisting of four dimensions (ease of manipulation, accessibility, accuracy, and completeness) is developed as the core product of this research project. The weights of these dimensions were assigned specifically for the case company, Cramo, to determine a quality score for a selected sample of carriers. A coefficient k1, representing the ratio of transport expenditure to sales, was introduced to facilitate the identification of a relation between supplier data quality and transport expenditure. Business units served by transport companies with higher-quality data displayed a lower k1, consequently paying less for transport services relative to their revenue than business units served by carriers with a lower data quality score. The framework developed is adaptable: dimensions and metrics can be added or excluded according to situational factors and case peculiarities. Applying the supplier data quality assessment framework allows for a more objective and streamlined supplier selection, and it stresses the overall costs experienced during the period of cooperation. The finding regarding the importance of supplier data quality in the purchasing of transport services can nonetheless be generalized to other cases where companies strive to make better-informed strategic decisions.
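A sketch of how the weighted four-dimension quality score and the coefficient k1 could be computed is shown below; the carrier scores, weights, and financial figures are invented for illustration, not Cramo's actual data.

```python
# Sketch: score carriers on the four data-quality dimensions and relate the
# score to k1 = transport expenditure / sales for the units they serve.
DIMENSIONS = ("ease of manipulation", "accessibility", "accuracy", "completeness")
WEIGHTS = (0.20, 0.20, 0.35, 0.25)        # assumed case-specific weighting, sums to 1

carriers = {                              # hypothetical 0-1 scores per dimension
    "Carrier A": (0.90, 0.80, 0.95, 0.85),
    "Carrier B": (0.50, 0.60, 0.70, 0.55),
}
business_units = {                        # hypothetical (carrier, transport cost, sales)
    "Unit North": ("Carrier A", 120_000, 2_400_000),
    "Unit South": ("Carrier B", 150_000, 2_100_000),
}

def quality_score(scores):
    """Weighted sum over the four assessment dimensions."""
    return sum(s * w for s, w in zip(scores, WEIGHTS))

for unit, (carrier, cost, sales) in business_units.items():
    k1 = cost / sales                     # transport expenditure over sales
    print(f"{unit}: served by {carrier} "
          f"(quality {quality_score(carriers[carrier]):.2f}), k1 = {k1:.3f}")
```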

Data Quality Assurance Begins Before Data Collection and Never Ends: What Marketing Researchers Absolutely Need to Remember

Moore, Zachary, Harrison, Dana E., Hair, Joe 01 November 2021 (has links)
Data quality has become an area of increasing concern in marketing research. Methods of collecting data, types of data analyzed, and data analytics techniques have changed substantially in recent years. It is important, therefore, to examine the current state of marketing research, and particularly self-administered questionnaires. This paper provides researchers with important advice and rules of thumb for crafting high-quality research in light of the contemporary changes occurring in modern marketing data collection practices. This is accomplished through a proposed six-step research design process that ensures data quality, and ultimately research integrity, are established and maintained throughout the research process, from the earliest conceptualization and design phases, through data collection, to the reporting of results. The paper provides a framework which, if followed, will result in fewer headaches for researchers and more robust results for decision makers.

Bridging Language & Data : Optimizing Text-to-SQL Generation in Large Language Models / Från ord till SQL : Optimering av text-till-SQL-generering i stora språkmodeller

Wretblad, Niklas, Gordh Riseby, Fredrik January 2024 (has links)
Text-to-SQL, which involves translating natural language into Structured Query Language (SQL), is crucial for enabling broad access to structured databases without expert knowledge. However, designing models for such tasks is challenging due to numerous factors, including the presence of "noise," such as ambiguous questions and syntactical errors. This thesis provides an in-depth analysis of the distribution and types of noise in the widely used BIRD-Bench benchmark and of the impact of noise on models. While BIRD-Bench was created to model dirty and noisy database values, it was not created to contain noise and errors in the questions and gold queries. A manual evaluation found that noise in questions and gold queries is highly prevalent in the financial domain of the dataset, and a further analysis of the other domains indicates the presence of noise in other parts as well. The presence of incorrect gold SQL queries, which then generate incorrect gold answers, has a significant impact on the benchmark's reliability. Surprisingly, when evaluating models on corrected SQL queries, zero-shot baselines surpassed the performance of state-of-the-art prompting methods. The thesis then introduces the concept of classifying noise in natural language questions, aiming to prevent noisy questions from entering text-to-SQL models and to annotate noise in existing datasets. Experiments using GPT-3.5 and GPT-4 on a manually annotated dataset demonstrated the viability of this approach, with classifiers achieving up to 0.81 recall and 80 % accuracy. Additionally, the thesis explored the use of LLMs for automatically correcting faulty SQL queries; this showed a 100 % success rate for specific query corrections, highlighting the potential of LLMs for improving dataset quality. We conclude that informative noise labels and reliable benchmarks are crucial to developing new text-to-SQL methods that can handle varying types of noise.
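A hedged sketch of such a noise classifier is shown below, using the OpenAI chat completions API as a front end to a text-to-SQL pipeline; the label set, prompt wording, and model choice are assumptions for illustration, not the thesis's exact setup.

```python
# Sketch: label a natural-language question as clean or noisy before it
# reaches the text-to-SQL model (or to annotate an existing dataset).
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

LABELS = ("CLEAN", "AMBIGUOUS", "SYNTACTIC_ERROR")  # assumed label set

def classify_noise(question: str) -> str:
    """Return one of LABELS for a database question."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Classify the user's database question as CLEAN, "
                        "AMBIGUOUS, or SYNTACTIC_ERROR. Reply with one label."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content.strip()

print(classify_noise("Whats the avrage loan amout per client in Prague??"))
```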
