Evaluate Mechanisms for Capture, Update, and Retrieval of Catalog Entries
AWS S3 ETag
Every S3 object has an associated entity tag (ETag), which can be used to compare files and objects.
The ETag may or may not be an MD5 digest of the object data. If the object was uploaded with a single PUT operation and is not encrypted with customer-provided keys (SSE-C) or KMS keys (SSE-KMS), the resulting ETag is the MD5 hex digest of the object data. If an object is created by either the Multipart Upload or Part Copy operation, the ETag is not an MD5 digest of the object data, regardless of the method of encryption.
For multipart uploads, the ETag is built in two steps: the binary MD5 digest of each part is computed, the digests are concatenated in part order, and the MD5 hex digest of that concatenation is taken. A dash and the number of parts are then appended.
For example, the ETag of a two-part object may look something like this: d41d8cd98f00b204e9800998ecf8427e-2
This can be represented as: hexmd5(md5(part1) + md5(part2))-2
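The formula above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name multipart_etag is hypothetical, and the parts are assumed to be supplied as in-memory byte strings in upload order.

```python
import hashlib


def multipart_etag(parts):
    """Return a multipart-style ETag for a sequence of byte-string parts.

    Hypothetical helper: `parts` is assumed to hold the exact bytes of each
    uploaded part, in part order.
    """
    # Binary (not hex) MD5 digest of every part.
    digests = [hashlib.md5(part).digest() for part in parts]
    # MD5 hex digest of the concatenated binary digests.
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    # Append a dash and the number of parts.
    return f"{combined}-{len(digests)}"


if __name__ == "__main__":
    print(multipart_etag([b"first part", b"second part"]))
```

Note that the inner digests are concatenated in binary form; concatenating their hex strings instead would give a different result.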
Many S3 clients store a pre-calculated MD5 checksum of the object for use in comparison and sync operations. This is time-consuming and essentially redundant, because the existing ETag can be used for the comparison, resulting in quicker upload and sync operations.
Clients should instead implement a method to compute an ETag for local file comparison.
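One possible shape for such a method is sketched below in Python. The function name etag_matches and its parameters are hypothetical. The sketch assumes the object is not encrypted with SSE-C or SSE-KMS, and that part_size equals the part size the uploader actually used (the 8 MiB default here is only an assumption); if the part size differs, the computed value will not match even for identical data.

```python
import hashlib


def etag_matches(path, remote_etag, part_size=8 * 1024 * 1024):
    """Return True if the local file at `path` matches `remote_etag`.

    Hypothetical helper. Assumes `part_size` is the part size used for the
    upload and that the ETag was not produced under SSE-C or SSE-KMS.
    """
    # S3 returns the ETag wrapped in double quotes; strip them if present.
    remote = remote_etag.strip('"')

    whole = hashlib.md5()   # MD5 of the entire file (single PUT case)
    part_digests = []       # binary MD5 of each part (multipart case)
    with open(path, "rb") as f:
        # Read the file one part at a time so large files are not loaded whole.
        for chunk in iter(lambda: f.read(part_size), b""):
            whole.update(chunk)
            part_digests.append(hashlib.md5(chunk).digest())

    if "-" in remote:
        # Multipart form: MD5 of the concatenated part digests, dash, part count.
        combined = hashlib.md5(b"".join(part_digests)).hexdigest()
        local = f"{combined}-{len(part_digests)}"
    else:
        # Single PUT form: plain MD5 hex digest of the data.
        local = whole.hexdigest()
    return local == remote
```

A sync client could call etag_matches(local_path, etag_from_listing) and skip the upload when it returns True, falling back to a transfer when the result is False or the part size is unknown.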