Title: | Study of failures in a distributed environment: new coordinated and communication-induced checkpointing protocols | Document type: | manuscript text | Authors: | Fatima Zahra Abdelhafidi, Author; Mohamed Bachir Yagoubi, Thesis supervisor | Publisher: | Laghouat: Université Amar Telidji - Département d'informatique | Year of publication: | 2017 | Extent: | 168 p. | Format: | 27 cm | Languages: | English | Categories: | THESES: 10 computer science
| Keywords: | Distributed Systems; Fault Tolerance; Coordinated Checkpointing; Transitive Dependency; Popular Process; RDT; NZC; Communication-induced checkpointing (CIC) | Abstract: | Distributed systems have gained wide popularity thanks to features such as reliability, scalability, and their diversity of uses across many application areas. However, such systems are susceptible to failures, so ensuring their dependability is a major challenge. Fault tolerance by checkpointing is used to cope with this problem: it enables a system, in the event of a failure, to resume its execution from a previously saved state. Three main approaches have been proposed in the literature, namely uncoordinated, coordinated, and communication-induced checkpointing (CIC). In this thesis, we are interested in the latter two approaches (coordinated and CIC), which share the property of domino-effect freedom and differ in how they synchronize and take checkpoints. As a first contribution, we propose an extension of index-based CIC protocols with a rollback-recovery capability. The chosen protocols are based on either classical or lazy indexing strategies. The proposed extension allows us to provide a complete checkpoint/restart solution. Additionally, it helps us to study the impact of indexing strategies, skipping techniques, failure rate, and timestamping functions on the advancement or retrogression of the recovery line in case of failures. Moreover, such a study lets us calculate the overhead incurred and the bounds on rollback propagation, which in turn makes it easier to invoke garbage collection during normal execution rather than during recovery. The second contribution consists of two new RDT (Rollback Dependency Trackability) protocols called CSFDAS (Constant Size FDAS) and CSRDTPartner (Constant Size RDT-Partner), which aim at reducing the control information overhead. 
The proposed protocols piggyback a constant amount of control information on messages while maintaining a timestamp vector at each process. A simulation study shows that the new protocols perform well compared with the RDT protocols proposed in the literature. In the last contribution, we propose a Fast Non-Blocking coordinated checkpointing protocol for distributed systems, with the aim of minimizing the number of requests and mutable checkpoints while reducing the checkpointing latency. Our protocol relies on two mechanisms: piggybacking dependency information on computation and reply messages, and popular processes. We also present a simulation study that compares our protocol to the CSNB protocol (Cao and Singhal Non-Blocking) and the CSB protocol (Cao and Singhal Blocking). | Thesis note: | Doctoral thesis in computer science |
Study of failures in a distributed environment: new coordinated and communication-induced checkpointing protocols [manuscript text] / Fatima Zahra Abdelhafidi, Author; Mohamed Bachir Yagoubi, Thesis supervisor. - Laghouat: Université Amar Telidji - Département d'informatique, 2017. - 168 p.; 27 cm. Languages: English. Categories: THESES: 10 computer science