11 |
Persistence and Node Failure Recovery in Strongly Consistent Key-Value Datastore. Ehsan ul Haque, Muhammad, January 2012
Consistency preservation of replicated data is a critical concern for strongly consistent distributed databases. Further, in the fail-recovery model, each process also needs to deal with the management of stable storage and with amnesia [1]. CATS is a key/value datastore that combines Distributed Hash Table (DHT)-like scalability and self-organization with atomic consistency of the replicated items. However, being an in-memory data store that provides consistency and partition tolerance (CP), it becomes permanently unavailable when a majority of its nodes fail. The goals of this thesis were twofold: (i) to implement disk-persistent storage in CATS, which allows the records and state of the nodes to be persisted on disk, and (ii) to design a node failure-recovery algorithm for CATS that enables the system to run under a fail-recovery model without violating consistency. For disk-persistent storage, two existing key/value databases, LevelDB [2] and Berkeley DB [3], are used. LevelDB is an implementation of log-structured merge trees [4], whereas Berkeley DB is an implementation of log-structured B+ trees [5]. Both have been used as the underlying local storage for nodes, and the throughput and latency of the system with each are discussed. A technique to improve performance by allowing concurrent operations on the nodes is also discussed. The node failure-recovery algorithm is designed to allow nodes to crash and then recover without violating consistency, and to reinstate availability once a majority of nodes recover. The recovery algorithm is based on persisting the state variables of the Paxos [6] acceptor and proposer and the consistent group memberships. For fault tolerance and recovery, processes also need to copy records from the replication group. This becomes problematic when the number of records and the amount of data are huge.
For this problem, a technique for transferring key/value records in bulk is also described, and its effect on the latency and throughput of the system is discussed.
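The recovery algorithm above rests on persisting Paxos state before acting on it. As a minimal sketch of why that ordering matters, assuming a single-decree acceptor and a local JSON file standing in for the node's LevelDB or Berkeley DB store (the names `AcceptorState` and `DurableAcceptor` are hypothetical, not CATS's API), the acceptor makes each promise and accepted value durable before replying, so a node that crashes and restarts cannot break a promise it made earlier:

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass
class AcceptorState:
    promised: int = -1                  # highest ballot this acceptor has promised
    accepted_ballot: int = -1           # ballot of the most recently accepted value
    accepted_value: Optional[str] = None


class DurableAcceptor:
    """Hypothetical single-decree Paxos acceptor that persists state before every reply.

    A JSON file stands in for the node's local key/value store (an assumption).
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self.state = self._load()  # restore state written before a crash, if any

    def _load(self) -> AcceptorState:
        if not os.path.exists(self.path):
            return AcceptorState()
        with open(self.path) as f:
            return AcceptorState(**json.load(f))

    def _persist(self) -> None:
        # Write a temporary file, fsync it, then atomically replace the old state,
        # so a crash mid-write never leaves a half-written state file behind.
        directory = os.path.dirname(self.path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(asdict(self.state), f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)

    def on_prepare(self, ballot: int) -> Tuple[bool, int, Optional[str]]:
        """Phase 1b: promise to ignore lower ballots and report any accepted value."""
        if ballot <= self.state.promised:
            return False, self.state.accepted_ballot, self.state.accepted_value
        self.state.promised = ballot
        self._persist()  # the promise must be durable before it is sent
        return True, self.state.accepted_ballot, self.state.accepted_value

    def on_accept(self, ballot: int, value: str) -> bool:
        """Phase 2b: accept unless a higher ballot has already been promised."""
        if ballot < self.state.promised:
            return False
        self.state.promised = ballot
        self.state.accepted_ballot = ballot
        self.state.accepted_value = value
        self._persist()  # the accepted value must survive a crash
        return True
```

Constructing a new `DurableAcceptor` on the same path simulates a restart: it reloads the last durable promise and accepted value. A production version would also fsync the containing directory after the rename and batch writes; both are left out of this sketch.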
|
12 |
POWER IN THE CLICK OF THE BEHOLDER: THE INFLUENCE OF ELECTRONIC NEGATIVE WORD-OF-MOUTH ON BRAND MANAGEMENT. De Laine, Kimberleigh, 0009-0000-9722-0701, January 2023
Ever since the creation of Web 2.0, there has been a seismic shift in how businesses
advertise and promote their brands. Social media has birthed a new platform for people
and organizations to interact with each other to pass information and opinions or accounts
of experiences with products or services. As more consumers gravitate towards social
media, firms are leveraging this trend to engage and forge relationships with
individuals, which in most cases positively influences consumers’ purchase decisions.
However, when some customers are dissatisfied with services or products, they engage in
social media negative word-of-mouth (NWOM), which can affect a brand’s reputation,
the consumer’s purchase intention, and ultimately the firm’s bottom line. In the first
study, 118 undergraduate students were surveyed, and empirical evidence was found to
support mediating effects of brand reputation on the relationship between social media
and purchase intention and moderating effects of brand engagement on the relationship
between social media NWOM and brand reputation. In the second study, scenarios were
presented to undergraduate students to investigate the impact of social media NWOM on
small/local businesses vs. large chain businesses, the difficulty of recovery for small/local
businesses, the correlation between NWOM and switching behavior after a product/service failure,
and a firm’s responses after a product/service failure. The third study replicated the
findings from study two using a more diverse, non-student sample. It also
explored why trust and recovery levels differ between large chain and
small/local businesses. Results indicated that small businesses suffered more from the
product/service failure but showed a larger surge in trust than large chain businesses.
Keywords: Negative word-of-mouth, social media, brand engagement, business failure
recovery, brand trust, switching behavior / Business Administration/Marketing
|
13 |
Robust Service Management for Smart Home Environments: A Hybrid Approach. 張惟誠 (Chang, Wei Chen), unknown date
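A smart home environment is a typical distributed system in which most smart services consist of one or more nodes; for example, a smart air-conditioning service needs an air conditioner, a temperature sensor, and a decision-logic node. However, if any one of a service's nodes fails, the whole service stops working properly. Because most residents of a home are users without technical skills, an ideal smart home service should, even when nodes fail, automatically detect and clear faults as far as possible within a short time, so that the service is not interrupted. The main goal of this study is to propose a robust service management system for smart homes based on a novel hybrid architecture that combines the features of centralized and decentralized failure detection mechanisms. The system can detect node failures within a short time and then either recover nodes that failed because of software faults or find standby nodes, so that the service can continue to run. / Smart home systems are different from traditional computing systems. In a smart home system, a service is composed of several service nodes. For example, a smart air-conditioning service needs a temperature sensor, application logic, and an air conditioner. A service fails if any one of its constituent nodes fails. However, unexpected failures are undesirable for mission-critical services such as healthcare or surveillance. Moreover, a smart home lacks professional system administrators, and users are generally unable to repair a service when it fails. Consequently, in a smart home system, failed services have to be diagnosed and recovered automatically. This paper describes a hybrid failure detection and recovery method for smart home environments. Experiments show that the proposed architecture is able to enhance the overall availability of a smart home system in a short time.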
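To make the hybrid idea concrete, here is a minimal sketch under our own assumptions, not the thesis's design: centralized detection is modeled as a heartbeat timeout, decentralized detection as a peer reporting a suspect node directly, and recovery first restarts the node (assuming a software fault) and only then fails over to a standby node with the same role. `Node`, `HybridMonitor`, and the restart-then-failover policy are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Node:
    name: str
    role: str                  # e.g. "sensor", "logic", "actuator"
    last_heartbeat: float = 0.0
    restarts: int = 0


class HybridMonitor:
    """Hypothetical monitor with two detection paths: a central heartbeat timeout
    and failure reports sent directly by peer nodes."""

    def __init__(self, timeout: float, max_restarts: int = 1) -> None:
        self.timeout = timeout
        self.max_restarts = max_restarts
        self.active: Dict[str, Node] = {}          # role -> node currently serving it
        self.standby: Dict[str, List[Node]] = {}   # role -> idle spare nodes
        self.suspected: Set[str] = set()           # node names reported failed by peers

    def add(self, node: Node, standby: bool = False) -> None:
        if standby:
            self.standby.setdefault(node.role, []).append(node)
        else:
            self.active[node.role] = node

    def heartbeat(self, name: str, now: float) -> None:
        for node in self.active.values():
            if node.name == name:
                node.last_heartbeat = now

    def report(self, suspect: str) -> None:
        # Decentralized path: a peer that cannot reach `suspect` reports it at once,
        # without waiting for the central timeout to expire.
        self.suspected.add(suspect)

    def check(self, now: float) -> List[str]:
        """Detect failed nodes and return the recovery actions taken."""
        actions: List[str] = []
        for role, node in list(self.active.items()):
            timed_out = now - node.last_heartbeat > self.timeout
            if not (timed_out or node.name in self.suspected):
                continue
            self.suspected.discard(node.name)
            if node.restarts < self.max_restarts:
                # First assume a software fault and restart the node in place.
                node.restarts += 1
                node.last_heartbeat = now  # fresh detection window after the restart
                actions.append(f"restart {node.name}")
            elif self.standby.get(role):
                # Restarting did not help: promote a standby node with the same role.
                spare = self.standby[role].pop(0)
                spare.last_heartbeat = now
                self.active[role] = spare
                actions.append(f"failover {role}: {node.name} -> {spare.name}")
            else:
                actions.append(f"no standby for {role}")
        return actions
```

The peer-report path lets a failure be handled before the timeout would fire, while the timeout still catches nodes whose peers have not noticed anything; this is one plausible way to combine the two detection styles.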
|
14 |
Reliability and security of vector routing protocols. Li, Yan, doctor of computer science, 01 June 2011
As the Internet becomes the ubiquitous infrastructure for various applications, demands on the reliability, availability, and security of Internet routing protocols are becoming more stringent. Unfortunately, failures are still common in the daily operation of a network. Even a short service disruption can seriously degrade the quality of real-time applications such as VoIP and video on demand. Moreover, critical business and government applications require routing protocols to be robust against malicious attacks, such as denial-of-service attacks. This dissertation proposes three techniques to address some reliability and security concerns in intra-domain (distance vector) routing protocols and inter-domain (path vector) routing protocols.
The first technique addresses the problem of service disruption that arises from sudden link failures in distance vector routing protocols. We consider two types of link failures: single link failures and shared risk link group failures. For single link failures, we propose an IP fast reroute mechanism that reroutes packets around the failed links. This fast reroute mechanism is the first that requires neither complete knowledge of the network topology nor changes to the original routing protocol. The mechanism proactively computes a set of relay nodes that can be used to tunnel the rerouted packets immediately after a link or node failure is detected. It includes an algorithm by which a node automatically identifies itself as a candidate relay node for a reroute link and notifies the source node of the reroute link of its candidacy; the source node can then decide whether the candidate relay node is valid. The mechanism also includes an algorithm to suppress redundant notification messages. We then extend our IP fast reroute mechanism for single link failures to accommodate shared risk link group failures by introducing one additional bit of information. Through simulations, I show that the proposed mechanisms succeed in rerouting around failed links nearly 100% of the time, with reroute paths comparable in length to the re-converged shortest paths.
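The relay-node idea can be illustrated with a small sketch. Unlike the dissertation's mechanism, which does not need complete topology knowledge (nodes identify themselves as candidates), this sketch assumes a global view of an unweighted graph purely for illustration. It also assumes a common way of stating the requirement: when link (s, d) fails, s tunnels packets to a relay r, and r forwards them to d under normal routing, so both pre-failure routes s to r and r to d must avoid the failed link. `routing_tree`, `route`, and `relay_nodes` are hypothetical names.

```python
from collections import deque
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]  # undirected adjacency lists, unit link costs


def routing_tree(g: Graph, dst: str) -> Dict[str, Optional[str]]:
    """Next hop toward `dst` for every node, as converged hop-count routing would pick.

    BFS from the destination; ties are broken by visiting neighbors in sorted order.
    """
    next_hop: Dict[str, Optional[str]] = {dst: None}
    queue = deque([dst])
    while queue:
        u = queue.popleft()
        for v in sorted(g[u]):
            if v not in next_hop:
                next_hop[v] = u
                queue.append(v)
    return next_hop


def route(g: Graph, src: str, dst: str) -> Optional[List[str]]:
    """Hop-by-hop path a packet follows from src to dst under normal routing."""
    tree = routing_tree(g, dst)
    if src not in tree:
        return None
    path = [src]
    while path[-1] != dst:
        path.append(tree[path[-1]])
    return path


def uses_link(path: List[str], a: str, b: str) -> bool:
    return any({x, y} == {a, b} for x, y in zip(path, path[1:]))


def relay_nodes(g: Graph, s: str, d: str) -> List[str]:
    """Candidate relays r for failed link (s, d): both tunnel legs must avoid the link."""
    candidates = []
    for r in sorted(g):
        if r in (s, d):
            continue
        to_relay, from_relay = route(g, s, r), route(g, r, d)
        if (to_relay and from_relay
                and not uses_link(to_relay, s, d)
                and not uses_link(from_relay, s, d)):
            candidates.append(r)
    return candidates
```

In a graph where s and d are joined directly and also via a-b, and e hangs off d, nodes a and b qualify as relays, while e does not, because the normal route from s to e crosses the failed link itself.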
The second technique addresses the problem that arises from allowing any node to route data packets to any other node in the network, which consequently allows any adversary node to launch DoS attacks against other nodes in the network. To solve this problem, we propose a blocking option that allows a node u to block a specified set of nodes and prevent each of them from sending or forwarding packets to node u. The blocking option is intended to discard violating packets near the adversary nodes that generated them rather than near their ultimate destinations. We then discuss unintentionally blocked nodes, called blind nodes, and extend the routing protocols to allow each node to communicate with its blind nodes via special nodes called joint nodes. Finally, I show through extensive simulation that the average number of blind nodes is close to zero when the average number of blocked nodes is small.
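How blind nodes arise can be sketched as follows, under an assumption of ours rather than the dissertation's precise definition: a blind node is an unblocked node whose normal route to u passes through a blocked node, which refuses to forward packets destined to u. Routing is again modeled as hop-count shortest paths with deterministic tie-breaking; joint nodes are not modeled, and `blind_nodes` is a hypothetical name.

```python
from collections import deque
from typing import Dict, List, Optional, Set

Graph = Dict[str, List[str]]  # undirected adjacency lists, unit link costs


def routing_tree(g: Graph, dst: str) -> Dict[str, Optional[str]]:
    """Next hop toward dst for each node (BFS hop-count routing, sorted tie-breaking)."""
    next_hop: Dict[str, Optional[str]] = {dst: None}
    queue = deque([dst])
    while queue:
        x = queue.popleft()
        for v in sorted(g[x]):
            if v not in next_hop:
                next_hop[v] = x
                queue.append(v)
    return next_hop


def blind_nodes(g: Graph, u: str, blocked: Set[str]) -> Set[str]:
    """Unblocked nodes whose normal route to u passes through a blocked node.

    Blocked nodes neither send nor forward packets destined to u, so these nodes
    are cut off from u even though u never blocked them.
    """
    next_hop = routing_tree(g, u)
    blind: Set[str] = set()
    for v in next_hop:
        if v == u or v in blocked:
            continue
        hop = next_hop[v]
        while hop != u:
            if hop in blocked:
                blind.add(v)
                break
            hop = next_hop[hop]
    return blind
```

On a chain x - y - u with a second neighbor z of u, blocking y makes x blind, while z is unaffected; with few blocked nodes, few routes cross one, which is consistent in spirit with the simulation result above.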
The third technique addresses the problem that arises when a set of malicious ASes in the Internet collude to hijack an IP prefix from its legitimate owner in BGP. (Note that none of the previous proposals for protecting BGP against IP prefix hijacking is effective when malicious ASes can collude.) To solve this problem, we propose an extension of BGP in which each AS listed in an advertised route supplies a certified full list of all its peers. Then I present an optimization in which each AS in an advertised route supplies only a balanced peer list, which is much smaller than its full peer list. Using real Internet topology data, I demonstrate that both the average and the largest balanced peer lists are 92% smaller than the corresponding full peer lists. Furthermore, to handle the dynamics of the Internet topology, we propose algorithms for issuing certificates that reflect the latest changes in the Internet topology graph.
Although the results in this dissertation are presented in the context of distance vector and path vector routing protocols, many of these results can be extended to link state routing protocols as well.
|
15 |
Failure detection and recovery for the PEWS-AM graph reduction machine. Lima, José Sueney de, 28 February 2014
Universidade Federal do Rio Grande do Norte / Web services are software units that allow access to one or more resources, supporting the deployment of business processes on the Web. They expose well-defined interfaces over standard Web protocols, making communication possible between entities implemented on different platforms. Because of these features, Web services can be integrated into service compositions to form more robust, loosely coupled applications. Web services are subject to failures, undesired situations that may compromise the business process partially or completely. Failures can occur both in the design and in the execution of compositions. As a result, it is essential to create mechanisms that make the execution of service compositions more robust and that handle failures. Specifically, we propose support for failure recovery in service compositions described in the PEWS language and executed on PEWS-AM, a graph reduction machine. To support failure recovery on PEWS-AM, we extended the PEWS language specification and adapted the graph translation and reduction rules of this machine. These contributions were made both in the abstract machine model and at the implementation level. / Web services are software units that allow access to one or more resources, supporting the deployment of business processes on the Web. They allow interaction through well-defined interfaces, using standard Web protocols, making communication possible between entities implemented on different types of platforms. Because of these characteristics, Web services can be integrated through compositions to form more robust applications with loose coupling between services. Web services are subject to failures, undesired situations that can compromise the business process partially or even completely. Failures can occur both in the design of compositions and in their execution. Therefore, it is essential to create mechanisms that make the execution of service compositions more robust and that handle failures. 
Specifically, we propose support for failure recovery in service compositions described in the PEWS language and executed on PEWS-AM, an implementation built on the notion of graphs. To support failure recovery in PEWS-AM, we extended the PEWS specifications and adapted the graph translation and reduction rules of this machine. These contributions were made both in the abstract machine model and at the implementation level.
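The abstract does not spell out the new PEWS operators or reduction rules, so the following only illustrates the general idea under our own assumptions: a composition is a term, reduction rewrites it one step at a time, and a hypothetical `Recover(body, handler)` construct reduces to its handler when its body fails. It uses tree rewriting rather than PEWS-AM's graph reduction and covers only sequencing; all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Union


class ServiceFailure(Exception):
    """Raised by a (simulated) Web service call that fails."""


@dataclass(frozen=True)
class Call:
    service: str


@dataclass(frozen=True)
class Seq:
    first: "Term"
    second: "Term"


@dataclass(frozen=True)
class Recover:
    body: "Term"      # composition to try first
    handler: "Term"   # composition to run if the body fails (hypothetical operator)


@dataclass(frozen=True)
class Done:
    results: Tuple[object, ...]


@dataclass(frozen=True)
class Fail:
    service: str


Term = Union[Call, Seq, Recover, Done, Fail]
Services = Dict[str, Callable[[], object]]


def is_value(t: Term) -> bool:
    return isinstance(t, (Done, Fail))


def step(t: Term, services: Services) -> Term:
    """Perform one reduction step on a term that is not yet Done or Fail."""
    if isinstance(t, Call):
        try:
            return Done((services[t.service](),))
        except ServiceFailure:
            return Fail(t.service)
    if isinstance(t, Seq):
        if not is_value(t.first):
            return Seq(step(t.first, services), t.second)
        if isinstance(t.first, Fail):
            return t.first  # a failure aborts the rest of the sequence
        if not is_value(t.second):
            return Seq(t.first, step(t.second, services))
        if isinstance(t.second, Fail):
            return t.second
        return Done(t.first.results + t.second.results)
    if isinstance(t, Recover):
        if not is_value(t.body):
            return Recover(step(t.body, services), t.handler)
        return t.handler if isinstance(t.body, Fail) else t.body
    raise ValueError(f"already a value: {t!r}")


def reduce_term(t: Term, services: Services) -> Term:
    """Rewrite the composition step by step until it is Done or Fail."""
    while not is_value(t):
        t = step(t, services)
    return t
```

With a failing `book` service and a working `book_alt`, `Seq(Call("quote"), Recover(Call("book"), Call("book_alt")))` reduces to a successful result, whereas the same sequence without `Recover` reduces to `Fail("book")`.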
|
16 |
USING RUNTIME INFORMATION AND MAINTENANCE KNOWLEDGE TO ASSIST FAILURE DIAGNOSIS, DETECTION AND RECOVERY. THIAGO PINHEIRO DE ARAUJO, 16 January 2017
[pt] Even software systems developed with rigorous quality control can present failures during their life cycle. When a failure is observed in the production environment, maintainers are responsible for producing the diagnosis and removing the corresponding defect. However, for a critical service this may take too long; therefore, if possible, the failure signature should be used to generate an automatic recovery mechanism capable of detecting and handling similar future occurrences until the defect can be removed. In this thesis, the recovery activity consists of restoring the system to a correct state that allows execution to continue safely, even if with limited functionality. To be effective, the diagnosis and recovery tasks require detailed information about the failed execution. Failures that occur during the testing phase in a controlled environment can be debugged by inserting new instrumentation and re-executing the routine that contains the defect, which makes unexpected behavior easier to study. However, failures that occur in the production environment offer only information limited to the specific situation in which they occur, and they are unpredictable. To mitigate this, information must be collected systematically in order to detect, diagnose for recovery and, eventually, diagnose to remove the circumstance that caused the failure. Moreover, there is a trade-off between the information inserted as instrumentation and system performance: logging techniques generally have a low performance impact but do not provide enough information about the execution; tracing techniques, on the other hand, can record precise and detailed information but are impractical for a production environment. This thesis proposes a hybrid approach for recording and extracting information during system execution. The proposed solution is based on event logging, where events are enriched with contextual properties about the current state of the execution at the moment the event is recorded. 
Using this context-enriched event log, a diagnosis technique and a tool were developed that allow events to be filtered based on the maintainer's perspective of interest. In addition, an approach was developed that uses these enriched events to detect failures automatically with a view to recovery. The proposed solutions were evaluated through measurements and studies conducted on deployed systems, based on failures that actually occurred while the software was being used in a production context. / [en] Even software systems developed with strict quality control may experience failures during their lifetime. When a failure is observed in a production environment, the maintainer is responsible for diagnosing the cause and eventually removing it. However, for a critical service this may take too long; hence, if possible, the failure signature should be identified in order to generate a recovery mechanism that automatically detects and handles future occurrences until a proper correction can be made. In this thesis, recovery consists of restoring a correct context that allows dependable execution, even if the causing fault is still unknown. To be effective, diagnosis and recovery require detailed information about the failed execution. Failures that occur during the test phase, in a controlled environment, allow adding specific code instrumentation and usually can be replicated, making it easier to study the unexpected behavior. However, failures that occur in the production environment are limited to the information present at the first occurrence of the failure. Because runtime failures are by nature unexpected, runtime data must be gathered systematically to allow detecting, diagnosing for the purpose of recovery, and eventually diagnosing for the purpose of removing the causing fault. 
There is thus a trade-off between the detail of information inserted as instrumentation and system performance: standard logging techniques usually have a low performance impact but carry insufficient information about the execution, while tracing techniques can record precise and detailed information but are impractical for a production environment. This thesis proposes a novel hybrid approach for recording and extracting a system's runtime information.
The solution is based on event logs, where events are enriched with contextual properties about the current state of the execution at the moment the event is recorded. Using these enriched log events, a diagnosis technique and a tool have been developed that allow events to be filtered based on the maintainer's perspective of interest. Furthermore, an approach has been developed that uses these enriched events to detect and diagnose failures with a view to recovery. The proposed solutions were evaluated through measurements and studies conducted on deployed systems, based on failures that actually occurred while the software was being used in a production context.
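A minimal sketch of the enriched-event idea, assuming the ambient execution context is carried in a `contextvars.ContextVar` and that a failure signature is a plain predicate over events paired with a recovery action. `ContextScope`, `EnrichedLog`, `on_failure`, and `filter` are hypothetical names, not the thesis's tool.

```python
import contextvars
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

# Ambient execution context; each ContextScope layers extra properties on top of it.
_CONTEXT = contextvars.ContextVar("execution_context", default={})


@dataclass
class Event:
    name: str
    context: Dict[str, Any]


class ContextScope:
    """Adds contextual properties (request id, tenant, module, ...) for enclosed code."""

    def __init__(self, **props: Any) -> None:
        self.props = props

    def __enter__(self) -> "ContextScope":
        self._token = _CONTEXT.set({**_CONTEXT.get(), **self.props})
        return self

    def __exit__(self, *exc: object) -> bool:
        _CONTEXT.reset(self._token)
        return False  # never swallow exceptions


Signature = Callable[[Event], bool]


class EnrichedLog:
    def __init__(self) -> None:
        self.events: List[Event] = []
        self._handlers: List[Tuple[Signature, Callable[[Event], None]]] = []

    def on_failure(self, signature: Signature, recover: Callable[[Event], None]) -> None:
        """Register a known failure signature together with its recovery action."""
        self._handlers.append((signature, recover))

    def record(self, name: str, **fields: Any) -> Event:
        # Snapshot the current context into the event at the moment it is recorded.
        event = Event(name, {**_CONTEXT.get(), **fields})
        self.events.append(event)
        for signature, recover in self._handlers:
            if signature(event):
                recover(event)
        return event

    def filter(self, **criteria: Any) -> List[Event]:
        """Select events whose context matches every given property."""
        return [e for e in self.events
                if all(e.context.get(k) == v for k, v in criteria.items())]
```

Because each event carries the context that was active when it was logged, a maintainer can later select, say, all events for one request or tenant, and the same context lets a registered signature match a recurring failure precisely enough to trigger its recovery action automatically.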
|
17 |
Recovery of Cycling Endurance Failure in Ferroelectric FETs by Self-Heating. Mulaosmanovic, Halid, Breyer, Evelyn T., Mikolajick, Thomas, Slesazeck, Stefan, 26 November 2021
This letter investigates the impact of self-heating on the post-cycling functionality of a scaled hafnium oxide-based ferroelectric field-effect transistor (FeFET). The full recovery of FeFET switching properties and data retention after the cycling endurance failure is reported. This is achieved by damage annealing through localized heating, which is intentionally induced by a large current flow through the drain (source)-body p-n junctions. The results highlight that the local thermal treatments could be exploited to extend the cycling endurance of FeFETs.
|
18 |
Information technology incidents in the present information society: viewpoints of service providers, users, and the mass media. Tervo, H. (Heli), 30 November 2011
Abstract
Our society relies increasingly on information technology (IT). In such a society, it is important that we, as citizens, trust and are satisfied with services utilizing IT. Unfortunately, IT problems in the use of services are part of our daily lives and, as such, are frequently reported by the mass media. While the information systems (IS) field has studied system and service acceptance, use, threats, and failures, we have found no studies that examine how these IT failures affect the system usage after a failure.
This dissertation addresses this gap in research by studying users’ intentions after service degradation related to IT problems and by providing a broader view of IT-based incidents in an information society from the viewpoints of the mass media and service providers. To do this, a newspaper survey was first conducted to gauge the public perception of IT-based problems. Second, qualitative interviews were conducted to understand service providers’ viewpoints on IT problems. Third, users’ attitudes and reactions to service degradation were studied through interviews.
The main results reveal that most of the IT problems visible to society are the same ones that system and service providers perceive to be the most problematic. Our results suggest that, after service degradation, users are eager to use the service again if they receive relevant information. Compensation alone will not satisfy users when the incident creates unpredictability and uncertainty for them. If the system provider does not inform users directly after the incident, they readily rely on the mass media. Information and knowledge thus play a significant role in incidents. However, we found a different type of user reaction for two service types. First, telecommunications and computers seemed to be special cases, with more tolerance of problems in general. Second, tolerance was low for vital services, for example services related to children, health, and safety. Nevertheless, the interviews showed that for both types of services, real-time and accurate information was influential, often more so than any other failure-recovery activity. The results of this study provide new views of IT-based incidents in an information society, as well as insights that can help service providers recover better from service degradation. / [fi] Abstract
Our society is built on information technology (IT). In such an information society, citizens should be able to trust the services they use. However, service users encounter errors in IT-based systems daily, and the media often reports on these errors. The literature in the field has studied, for example, the acceptance, use, threats, and disruptions of systems and services. However, no literature was found on how errors in IT services affect the use of systems.
In this respect, this study seeks to complement the literature by examining users' thinking and intentions after service degradation, and it also outlines a broader picture of IT problems in the information society, particularly from the perspective of the media and service providers. To obtain the public image of the problems, newspaper reports on IT problems were studied first. Next, the views of service providers and system vendors on IT problems were examined through interviews. Finally, the opinions and reactions of service users during and after a disruption were studied through interviews.
The main results show that society's most visible IT problems are the same ones that system vendors and service providers also struggle with. In addition, according to the results, users return to a service more readily if they receive relevant information about the situation. Material compensation alone is not enough when an IT disruption creates unpredictability and uncertainty. If the service provider does not communicate when a disruption occurs, users readily rely on what the mass media offers. Information and knowledge play a significant role in problem situations. However, the study found two service types in which user behavior was different. First, with telecommunications and computers, users were tolerant when errors occurred. Second, vital services, for example those related to health, children, and safety, were services in which errors were hardly tolerated at all. Even in these service situations, however, communication played a significant role, often an even more significant one than other measures taken to repair the service. The results of the study offer new perspectives on IT-based disruptions in the information society and insights for service providers on recovering from disruptions.
|