The rapid data growth in manufacturing and similar industries has created vast pools of data from which organizations can draw to refine their processes, manage risk, and drive product innovation. Key business decisions are increasingly being made based on these big data insights. As a result, having ready access to retained data so it can be quickly and easily analyzed is becoming a higher business priority.
When it comes to data storage, many organizations think only in terms of ‘hot’ and ‘cold’ tiers – ‘hot’ for data that must be easily accessible and ‘cold’ for data that is less frequently used. In this framework, hot data is typically kept on expensive SAN or NAS storage (flash or high-performance disk), and cold data is stored offsite on tape or in a public cloud. However, object storage provides a ‘warm’ data tier that offers quick access while being far more scalable and cost-effective than traditional SAN and NAS primary storage. In addition, object storage avoids the latency of bringing data back from offsite tape or cloud storage.
Having this warm data tier is particularly beneficial because it means organizations can afford to keep more data where it is readily accessible for analysis, which is key to gaining more actionable insights for improving operational and/or financial performance.
Keeping more data warm and easier to access
The standard approach to data tiering – progressively moving data that needs to be searched or accessed less often down to cheaper ‘colder’ tiers – is becoming increasingly impractical as data volumes continue to grow. This method typically involved buying a new batch of storage equipment at each IT refresh, moving all the data down to the next-cheapest tier, and then retiring the oldest batch of equipment.
The problem with this method lies in the sheer volume of data now being produced in many industries. The approach evolved in a period when there was simply much less data to store, but continually buying new, expensive local storage is neither practical nor cost-effective today. Though data growth affects every industry, it is particularly troublesome in industries such as manufacturing, where growth can easily reach multiple petabytes annually. Rolls Royce, for example, can produce half a terabyte of machine data from the production of just one fan blade, with annual production therefore generating more than three petabytes of data for a single component. It’s easy to see how data volumes like this can push costs to unacceptable levels.
Object storage provides massive scalability at low cost. The technology is designed for limitless growth through a horizontal, scale-out approach that allows organizations to increase capacity by simply adding nodes whenever and wherever needed. Moreover, because object storage uses a single, global namespace, this scaling can be done across multiple geographically distributed sites but managed centrally.
By incorporating object storage, organizations can better capitalize on the business advantages offered by big data, as they can store more data in a way that is easily accessible.
Another key benefit of object storage is its advanced metadata features. In contrast to traditional block and file storage’s very limited metadata capabilities, object storage enables users to add rich metadata tags, making it much easier to organize, identify and retrieve data.
For example, a traditional machine log file would likely have only a limited amount of metadata associated with it, such as the creation date, owner, location and size. In comparison, a machine log object can carry “user defined” metadata that provides additional searchable identifiers on the data object. This could include details of the originating machine – such as the machine name and IP address – along with any other information the machine can parse and pass to the S3 API PUT command.
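To make this concrete, here is a minimal sketch of how user-defined metadata rides along with an S3 PUT. The bucket name, object key, and metadata fields below are illustrative, not taken from any real deployment; the mapping itself – user-defined metadata keys become `x-amz-meta-` prefixed headers stored with the object – is standard S3 API behavior.

```python
# Sketch: attaching user-defined metadata to a machine-log object via the
# S3 API. All names and values here are hypothetical examples.

def to_s3_headers(metadata):
    """Map user-defined metadata keys to the x-amz-meta- headers the
    S3 API stores alongside the object (keys are lowercased)."""
    return {f"x-amz-meta-{key.lower()}": value
            for key, value in metadata.items()}

# Example metadata a production machine might attach to its log upload
log_metadata = {
    "machine-name": "press-line-07",
    "machine-ip": "10.4.2.17",
    "component": "fan-blade",
}

headers = to_s3_headers(log_metadata)
# e.g. headers["x-amz-meta-machine-name"] == "press-line-07"

# The equivalent PUT with boto3 (requires credentials and a real bucket):
# import boto3
# s3 = boto3.client("s3")
# s3.put_object(Bucket="machine-logs",
#               Key="2024/press-line-07.log",
#               Body=open("press-line-07.log", "rb"),
#               Metadata=log_metadata)
```

Because these tags are stored with the object itself, downstream tools can search or filter logs by machine, site, or component without opening the files.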
Object storage also employs the S3 API, which has become the de facto standard of cloud storage, making it ideal for integrating on-premises private cloud environments with the public cloud. This hybrid cloud capability is especially important as organizations increasingly realize that the public cloud is no magic cure-all when it comes to getting the most out of machine data.
In the case of storing large data sets in the public cloud, data access and egress fees—along with network charges for connectivity to the cloud—can be quite unpredictable and costly. Public cloud performance (i.e., data transfer time to and from the cloud) is also highly variable, as it depends on the available WAN bandwidth and the cloud provider’s overall workload at a given time. When large volumes of data are involved, the latency in accessing data from a public cloud can be significant, thereby negatively impacting business operations.
On-premises object storage offers the scalability and flexibility of the public cloud without these drawbacks or the need to rely on a third party for data security. In addition, on-premises object storage aligns with the concept of ‘data gravity,’ whereby large volumes of data become harder to move from where they’re created, making it more efficient and cost-effective to store the data locally and bring compute to the data.
Unlocking the full value of machine data
Retained data is continuing to provide more and more valuable insights for businesses in every sector. But the value of machine data goes far beyond studying the behavior of machines in the short term. Machine data contains a wealth of valuable information about an organization and its processes and can provide a basis for long-term business innovation. Object storage allows organizations to retain more data by reducing the cost of keeping it readily accessible and provides other key advantages over traditional storage systems, making it ideal for uncovering these hidden gems among the noise.
Neil Stobart, VP Systems Engineering, Cloudian