
Understanding Cloud Storage: GCS

By Randolph Kahle · Nov 9, 2021 · 4 min read
There are many options for storing data in Google Cloud. This article explores one of them: Google Cloud Storage (GCS), Google’s object storage service.

**Mission**

The purpose of Google Cloud Storage is to persist and retrieve binary data as whole entities. If such an entity resides on your computer, you would call it a file, but when stored in GCS it’s called an object. Unlike files on your computer, GCS objects are immutable: once written, their content cannot be modified in place.

GCS objects can be deleted or replaced, but not changed. This characteristic can be leveraged in many cloud-scale architecture designs.

GCS objects are also opaque. GCS does not understand the internal structure of objects because they are simply binary data sequences. GCS uses associated metadata to keep track of aspects of the object, such as the creation date and time, the MIME type and the size of the object.

GCS objects are durable. Google stores objects using a variety of techniques to provide at least eleven 9s (99.999999999%) of annual durability within a region. (To protect against loss from location-based disasters, natural or man-made, you may choose to store data in multiple regions or even in multiple clouds.) Eleven 9s means that one billion objects stored for 100 years would be expected to lose about one object.
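The arithmetic behind the one-billion-objects figure can be checked with a short sketch. It assumes a constant annual loss probability per object equal to one minus the quoted durability; the function name is purely illustrative.

```python
from fractions import Fraction

# Annual loss probability implied by eleven 9s of durability:
# 1 - 0.99999999999 = 1e-11 (kept exact with Fraction).
ANNUAL_LOSS_PROBABILITY = Fraction(1, 10**11)


def expected_losses(object_count: int, years: int) -> Fraction:
    """Expected number of lost objects under a constant annual loss rate.

    This is a simplified model for illustration, not Google's own method.
    """
    return object_count * years * ANNUAL_LOSS_PROBABILITY


if __name__ == "__main__":
    # One billion objects kept for 100 years.
    print(float(expected_losses(10**9, 100)))
```

Under this model the result is an expectation, not a guarantee: the actual number of losses could be zero or more than one.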

GCS objects are available. Being available means GCS can deliver an object upon request. GCS offers various SLA options for monthly availability: 99.95%, 99.9% or 99.0%. In terms of time, this is roughly 22 minutes, 44 minutes or 7 hours 18 minutes of non-availability per month.
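The conversion from an availability percentage to time can be sketched as follows. The only assumption is the length of a month: an average month of 365.25 / 12 days (730.5 hours) is used here, which reproduces the figures above after rounding to the nearest minute.

```python
# Average month length in hours (assumption: 365.25 days / 12 months).
HOURS_PER_MONTH = 365.25 * 24 / 12


def monthly_downtime_minutes(availability_percent: float) -> float:
    """Minutes of unavailability per month permitted by a given SLA."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * HOURS_PER_MONTH * 60


for sla in (99.95, 99.9, 99.0):
    total_minutes = round(monthly_downtime_minutes(sla))
    hours, minutes = divmod(total_minutes, 60)
    print(f"{sla}% -> {hours} h {minutes} min")
```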

GCS objects are stored in buckets within a GCP project. Buckets provide storage context for the objects, including identity, the hosting project — which is associated with security and a billing account — geographic location, various policies and more.

A GCS bucket has a globally unique identifier expressed as a URI. Google uses the URI scheme gs: for GCS buckets and objects.

The URI syntax is: gs://<bucket_name>/<object_name>

For example, gs://doit-intl-storage/logo.png is a globally unique URI.
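A gs:// URI like the one above splits into a bucket name and an object name. The following is a minimal sketch of that split using only the standard library; `GcsUri` and `parse_gcs_uri` are hypothetical names, not part of any Google client library.

```python
from typing import NamedTuple


class GcsUri(NamedTuple):
    """Bucket and object name extracted from a gs:// URI."""
    bucket: str
    object_name: str


def parse_gcs_uri(uri: str) -> GcsUri:
    """Split a gs:// URI into its bucket and object name."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    # Only the first "/" separates bucket from object; object names
    # may themselves contain "/" characters.
    bucket, _, object_name = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return GcsUri(bucket, object_name)


if __name__ == "__main__":
    print(parse_gcs_uri("gs://doit-intl-storage/logo.png"))
```

A URI that names only a bucket, such as gs://doit-intl-storage, yields an empty object name in this sketch.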

Architectural Role

GCS plays an important role in the architecture designs of many cloud systems. In general, GCS is the least expensive option for storing information. However, you will need to pay attention to latency, bandwidth and traditional architectural constraints to maintain cost-efficiency.

Keep in mind that GCS is a service. It is accessed via a REST API, which is wrapped by client libraries for many programming languages, a CLI and many third-party tools. GCS is not a block storage system. It can’t be used directly as a file system store for a virtual machine or containers.
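To make the "service behind a REST API" point concrete, here is a small sketch that builds the URL for downloading an object's bytes. The URL layout follows the GCS JSON API as I understand it (verify against the current documentation before relying on it); `object_media_url` is a hypothetical helper, and the authorization header that non-public objects require is not shown.

```python
from urllib.parse import quote

# Root of the GCS JSON API (assumed from the public documentation).
API_ROOT = "https://storage.googleapis.com/storage/v1"


def object_media_url(bucket: str, object_name: str) -> str:
    """Build the URL that returns an object's content.

    The object name is percent-encoded in full, including any "/",
    because it occupies a single path segment in the URL.
    """
    encoded_bucket = quote(bucket, safe="")
    encoded_object = quote(object_name, safe="")
    return f"{API_ROOT}/b/{encoded_bucket}/o/{encoded_object}?alt=media"


if __name__ == "__main__":
    print(object_media_url("doit-intl-storage", "logo.png"))
```

In practice most code uses a client library or the CLI rather than building URLs by hand, but every one of those tools ends up issuing HTTP requests of this kind.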

Costs

There are several factors that impact the cost of using Google Cloud Storage. A few key ones to look out for include:

  • Size of an object
  • Duration of storage
  • Characteristics of the bucket: location, redundancy and class of storage
  • Activity, such as writing and reading an object and its metadata
  • Bytes streamed across one or more networks when the object is transferred

You can think of the costs in two ways. The first is how much it costs to store an object if nothing is done with it — once it is written, it is neither read nor deleted. This is the pure storage cost. The second way is activity-based costs such as reading, transferring, deleting, querying, etc.

Let’s talk numbers to understand the situation. For pure storage costs, let’s use the example of a single object of 1 GiB stored for one year. (Note that 1 GiB = 1,073,741,824 bytes = 1024³ B = 2³⁰ B.)

The most expensive standard storage is in Brazil, at $0.42 for that year, while the cost can go down to $0.24 in many places. Standard storage is best for frequently used objects. For inactive objects, the costs for archive class storage are dramatically lower: $0.036 in Brazil and $0.0144 in other places.
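The pure storage cost scales linearly with size and duration, which the sketch below illustrates using the per-GiB annual figures quoted above. The dictionary keys and the function name are illustrative, and real prices change over time, so treat the numbers as a snapshot from this article.

```python
# USD per GiB per year, taken from the figures quoted in the text.
ANNUAL_PRICE_PER_GIB = {
    ("standard", "brazil"): 0.42,
    ("standard", "low-cost region"): 0.24,
    ("archive", "brazil"): 0.036,
    ("archive", "low-cost region"): 0.0144,
}


def pure_storage_cost(size_gib: float, years: float,
                      storage_class: str, location: str) -> float:
    """Cost of storing data that is written once and never touched again."""
    return size_gib * years * ANNUAL_PRICE_PER_GIB[(storage_class, location)]


if __name__ == "__main__":
    # Example: 1 TiB (1,024 GiB) kept for one year in a low-cost region.
    for storage_class in ("standard", "archive"):
        cost = pure_storage_cost(1024, 1, storage_class, "low-cost region")
        print(f"{storage_class}: ${cost:.2f}")
```

This covers only the storage component; activity charges such as reads, writes and early deletion come on top.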

Activity costs can surprise new users. There are charges to write to storage, read from storage, list objects and delete objects prematurely (if an object is stored in a long-term storage context). These charges can influence the engineering and even architecture of a cloud-scale system.

For example, I was consulting for a company that manages satellite images. Their on-premises system stored many small image tiles. When they moved this to GCS, they found that their design, which read many small tiles to assemble one large image, led to extremely high costs because of the read activity.

They redesigned their system to store fewer, larger images and break them apart in memory after reading them. This is counter-intuitive, but it is a GCS reality.
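A rough sketch shows why the tile design was expensive. Every number here is hypothetical: the tiles per image, the number of images served and the price per 10,000 read operations are placeholders chosen for illustration, not the client's figures or current GCS pricing.

```python
def read_operation_cost(read_count: int, price_per_10k_reads: float) -> float:
    """Charge for read operations alone, ignoring storage and network costs."""
    return read_count / 10_000 * price_per_10k_reads


# Hypothetical workload and price, for illustration only.
TILES_PER_IMAGE = 1_024
IMAGES_SERVED = 1_000_000
PRICE_PER_10K_READS = 0.004  # USD, placeholder value

# Original design: one read per tile. Redesign: one read per large image.
tiled_cost = read_operation_cost(TILES_PER_IMAGE * IMAGES_SERVED, PRICE_PER_10K_READS)
whole_image_cost = read_operation_cost(IMAGES_SERVED, PRICE_PER_10K_READS)

print(f"tiled reads:       ${tiled_cost:,.2f}")
print(f"whole-image reads: ${whole_image_cost:,.2f}")
```

The bytes transferred are the same in both designs; only the number of billable operations changes, and it changes by a factor equal to the number of tiles per image.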

Security

I’ll explore the broader topic of cloud security in a separate article. GCS security is managed by IAM. In the past, there were some serious security leaks because it was too easy to accidentally specify public access to a storage bucket. Google has changed that and there are many warnings in place now.
