Background: GitLab.com, the paid hosting site for Git version control, went offline because of a database incident. After the site came back up, six hours of data turned out to be completely unrecoverable.
Reference: the official statement
Below is an excerpt from the official statement concerning backup/restore.
Problems Encountered
LVM snapshots are by default only taken once every 24 hours. Team-member-1 happened to run one manually about 6 hours prior to the outage because he was working on load balancing for the database.
Regular backups seem to also only be taken once per 24 hours, though team-member-1 has not yet been able to figure out where they are stored. According to team-member-2 these don’t appear to be working, producing files only a few bytes in size.
Team-member-3: It looks like pg_dump may be failing because PostgreSQL 9.2 binaries are being run instead of 9.6 binaries. This happens because omnibus only uses Pg 9.6 if data/PG_VERSION is set to 9.6, but on workers this file does not exist. As a result it defaults to 9.2, failing silently. No SQL dumps were made as a result. Fog gem may have cleaned out older backups.
Disk snapshots in Azure are enabled for the NFS server, but not for the DB servers.
The synchronisation process removes webhooks once it has synchronised data to staging. Unless we can pull these from a regular backup from the past 24 hours they will be lost.
The replication procedure is super fragile, prone to error, relies on a handful of random shell scripts, and is badly documented
Our backups to S3 apparently don’t work either: the bucket is empty
So in other words, out of 5 backup/replication techniques deployed none are working reliably or set up in the first place. We ended up restoring a 6-hour-old backup.
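The pg_dump failure above deserves a second look: the dump ran under the wrong PostgreSQL major version, failed silently, and left files only a few bytes long that nobody noticed. Below is a minimal sketch of the kind of post-backup sanity check that would have caught this. It assumes a 9.6 server; the backup path and the 1 MB size threshold are purely illustrative, not taken from GitLab's actual setup.

```python
#!/usr/bin/env python3
"""Sketch only: verify that a pg_dump backup was produced by the expected
PostgreSQL major version and is not suspiciously small."""

import os
import subprocess
import sys

EXPECTED_MAJOR = "9.6"             # assumed target server version
MIN_SIZE_BYTES = 1 * 1024 * 1024   # a real dump should never be just a few bytes

def pg_dump_major_version() -> str:
    # `pg_dump --version` prints e.g. "pg_dump (PostgreSQL) 9.6.1"
    out = subprocess.run(["pg_dump", "--version"],
                         capture_output=True, text=True, check=True).stdout
    return ".".join(out.strip().split()[-1].split(".")[:2])

def check_backup(path: str) -> None:
    version = pg_dump_major_version()
    if version != EXPECTED_MAJOR:
        sys.exit(f"pg_dump is version {version}, expected {EXPECTED_MAJOR}; "
                 "do not trust this backup")
    size = os.path.getsize(path)
    if size < MIN_SIZE_BYTES:
        sys.exit(f"backup {path} is only {size} bytes; the dump almost certainly failed")
    print(f"backup {path} looks sane: pg_dump {version}, {size} bytes")

if __name__ == "__main__":
    # Hypothetical default path; point it at wherever your backup job writes dumps.
    check_backup(sys.argv[1] if len(sys.argv) > 1 else "/var/opt/backups/latest.dump")
```

The point is not the specific numbers but the behaviour: a backup that cannot be validated should fail loudly instead of quietly producing an empty file.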
Even a site of GitLab's caliber, with 5 backup mechanisms in place, found that when trouble hit, none of the backups were any use, and 6 hours of data were lost.

Take a look at your own company's backup setup. Is it time to make some adjustments? Or, as I keep suggesting: if you never rehearse your restores, you will end up looking like a joke when something goes wrong.
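To make that "rehearse it" advice concrete, here is a minimal sketch of a scheduled restore drill: restore the newest dump into a throwaway database and run one trivial query, so a dead backup is discovered long before you actually need it. The dump directory, scratch database name, and the `users` table are assumptions for illustration, not anything from the GitLab write-up.

```python
#!/usr/bin/env python3
"""Sketch of a periodic restore drill for pg_dump backups."""

import glob
import os
import subprocess

DUMP_DIR = "/var/opt/backups"   # hypothetical location of the pg_dump output
SCRATCH_DB = "restore_drill"    # throwaway database recreated on every run

def latest_dump() -> str:
    dumps = sorted(glob.glob(os.path.join(DUMP_DIR, "*.dump")), key=os.path.getmtime)
    if not dumps:
        raise SystemExit("no dumps found at all: the backup job itself is broken")
    return dumps[-1]

def run(cmd):
    subprocess.run(cmd, check=True)   # fail loudly; a drill that hides errors is useless

def drill() -> None:
    dump = latest_dump()
    run(["dropdb", "--if-exists", SCRATCH_DB])
    run(["createdb", SCRATCH_DB])
    run(["pg_restore", "--dbname", SCRATCH_DB, dump])
    # Sanity query against an assumed `users` table; adapt it to whatever
    # data your application cannot live without.
    run(["psql", "--dbname", SCRATCH_DB, "-c", "SELECT count(*) FROM users;"])
    print(f"restore drill passed for {dump}")

if __name__ == "__main__":
    drill()
```

Run something like this from cron or your CI system and page someone when it fails; a backup you have never restored is only a hope, not a backup.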