Subject:
What does Gluster do when two file changes come in at the same time on the same file?
Detail:
The propagation issue is only a concern for Gluster volumes configured with "Replica" copies (data mirrored between two or more Gluster nodes). This is because of real-world delays in network and physical disk transfers between the units.
Gluster tracks file versions on replica copies via unique file checksums and timestamps to prevent corruption of the replica copies.
If two network clients connected to two different Gluster nodes at the same time and each modified its unit's replica copy of the same file, Gluster would usually choose the copy with the newest checksum and timestamp.
If the file is somehow modified at exactly the same epoch on both nodes, there would be a split-brain issue between the Gluster nodes: the file checksums would differ and there would be no way to authoritatively choose which copy to use.
In that scenario, manual intervention is needed: the user has to compare the files and choose which version should be kept and which should be removed or copied elsewhere.
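The decision described above can be sketched in a few lines of Python. This is only an illustration of the reasoning in this article, not Gluster's internal code: the ReplicaCopy type, its fields, and the pick_winner function are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReplicaCopy:
    """One node's copy of a file (hypothetical model, not a Gluster structure)."""
    node: str        # name of the Gluster node holding this copy
    checksum: str    # checksum of the file contents on that node
    mtime: float     # last modification time, in epoch seconds


def pick_winner(a: ReplicaCopy, b: ReplicaCopy) -> Optional[ReplicaCopy]:
    """Return the copy to keep, or None if the copies are in split-brain."""
    if a.checksum == b.checksum:
        return a  # contents are identical, so there is no conflict
    if a.mtime > b.mtime:
        return a  # a was modified more recently
    if b.mtime > a.mtime:
        return b  # b was modified more recently
    # Different contents with the same timestamp: no authoritative choice,
    # so a person has to compare the files and decide.
    return None
```

For example, two copies with different checksums and timestamps 100 and 200 resolve to the copy stamped 200, while two copies with different checksums and the same timestamp return None, which corresponds to the manual-intervention case.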
If you are using Gluster in a way where this is likely to happen regularly, and you are mostly looking for the HA features of Gluster rather than performance scale-out, we suggest having your network clients connect to only one Gluster node and keeping the other replica node purely for failover purposes.
On distributed Gluster Volumes where replication is not at play, there is only one file to create, lock, modify or read.
Gluster has the following article discussing the Distributed mechanics:
http://gluster.org/community/documentation/index.php/GlusterFS_Concepts
File locking on Gluster volumes is instant and prevents another unit from holding a lock on the same file.
This is true regardless of whether the volume is configured as Distributed (scale-out), Replica (redundancy), or Distributed Replica (scale-out + redundancy).
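As a rough sketch of how a client application might use file locking before modifying a shared file, the Python example below takes an exclusive advisory lock with the standard fcntl module. It assumes the Gluster volume is mounted on the client like a regular POSIX filesystem; the function names are hypothetical, and advisory locks only protect against other programs that also request the lock.

```python
import fcntl


def try_exclusive_lock(path):
    """Try to lock the file at path; return the open file, or None if it is locked elsewhere."""
    handle = open(path, "a+")  # opened for writing, which an exclusive lock requires
    try:
        # LOCK_NB makes the call fail immediately instead of waiting for the lock.
        fcntl.lockf(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle


def release_lock(handle):
    """Release the lock taken by try_exclusive_lock and close the file."""
    fcntl.lockf(handle, fcntl.LOCK_UN)
    handle.close()
```

A client would call try_exclusive_lock on a file under its mount point (for example a path such as /mnt/glustervol/report.txt, which is an assumed location), write its changes only if a handle is returned, and then call release_lock.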