Go to the Data Storage Finder
1. Storage Type
Active: Used for data that will be frequently accessed for processing through network mounting to a local device or through some type of network transfer protocol.
Nearline: Used for data that will be infrequently accessed for processing. Access is usually through some type of network transfer protocol.
Archival: Used for data that is not modified, with an emphasis on preservation. Access is usually through some type of network transfer protocol.
Repository: Used for data that is not modified, with an emphasis on sharing. Access is usually through some type of network transfer protocol.
2. Access
These selections are generalized scenarios and are meant to be flexible. We tried to account for access types most frequently requested for research data at Penn State.
3. Classification
Low (Level 1): Unauthorized access, use, disclosure, or loss is likely to have low or no risk to individuals, groups, or the University. These adverse effects may, but are unlikely to, include limited reputational, psychological, social, or financial harm. Low Risk Information may include some non-public data. Examples include:
- Data made freely available by public sources
- Published data
- Educational data
- Initial and intermediate Research Data
Moderate (Level 2): Unauthorized access, use, disclosure, or loss is likely to have adverse effects for individuals, groups, or the University, but will not have a significant impact on the University. These adverse effects could include, but are not limited to, social, psychological, reputational, financial, or legal harm. Examples include:
- Non-PII student records
- Personnel records
High (Level 3): Unauthorized access, use, disclosure, or loss is likely to have significant and severe adverse effects for individuals, groups, or the University. These adverse effects could include, but are not limited to, social, psychological, reputational, financial, or legal harm. Compliance requirements are not as strict as for Restricted Information. Examples include:
- Personally Identifiable Information (PII) as defined in Privacy Policy AD53
- Health Insurance Portability and Accountability Act (HIPAA) data
Restricted (Level 4): Access and use are strictly controlled and restricted by laws, regulations, or contracts. Unauthorized access, use, disclosure, or loss will have significant legal consequences, including civil and criminal penalties, loss of funding, inability to continue current research, and inability to obtain future funding or partnerships. Examples include:
- Payment Card Industry Data Security Standard (PCI-DSS) Data
- Data subject to Federal Information Security Management Act (FISMA) moderate or high standards
4. Sharing
Some tools are designed with collaboration and sharing in mind; others may have limitations (e.g., sharing only within the Penn State community). Penn State Health/College of Medicine storage options require PSH/CoM IDs.
5. Data Protection
Backups, Snapshots, and Replication offer different types of protection against loss of data access.
Backups and snapshots protect against corruption or unintended user deletion. Previous versions can be kept with options for local or remote storage depending on your needs.
Replication is an availability option that protects against equipment failure and minimizes downtime. Replication doesn't protect against errors or accidents and cannot be used to restore deleted data.
All services can be set up to have backups, snapshots, and replication. Services where that protection cannot be configured natively (i.e., those that require a secondary service to provide it) will drop out of the results when these items are selected.
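The difference between snapshots and replication described above can be illustrated with a small toy model. This is only a sketch: the `Volume` class and `replicate` function are hypothetical names invented for illustration and do not correspond to any Penn State service or storage API.

```python
import copy


class Volume:
    """Toy in-memory volume (hypothetical): maps file names to contents."""

    def __init__(self):
        self.files = {}
        self.snapshots = []  # point-in-time copies kept on the same system

    def snapshot(self):
        # Record the current state so it can be restored later.
        self.snapshots.append(copy.deepcopy(self.files))

    def restore(self, index=-1):
        # Roll back to a previously recorded state (latest by default).
        self.files = copy.deepcopy(self.snapshots[index])


def replicate(primary, secondary):
    """Mirror the primary's current state onto the secondary, deletions included."""
    secondary.files = copy.deepcopy(primary.files)


primary, replica = Volume(), Volume()
primary.files["results.csv"] = "v1"
primary.snapshot()
replicate(primary, replica)

del primary.files["results.csv"]  # accidental deletion by a user
replicate(primary, replica)       # the replica mirrors the deletion too

replica_has_file = "results.csv" in replica.files
primary.restore()                 # the snapshot brings the file back
restored = primary.files.get("results.csv")
```

In this model the replica keeps the data available if the primary fails, but it cannot undo the deletion; only the snapshot can.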
TERM | DEFINITION |
Low Velocity Data | Data which undergoes less than a 20% daily change rate. |
High Velocity Data | Data which undergoes a 20% daily change rate or more. |
Storage Types: | |
Active | Used for data that will be frequently accessed for processing. |
Nearline | Used for data that will be infrequently accessed for processing. |
Archival | Used for data that is not modified, with an emphasis on preservation. |
Repository | Used for data that is not modified, with an emphasis on sharing. |
Block | Commonly used by databases and other high-performance applications. Files are stored across one or more blocks and each block is addressable. |
File | Commonly used when organization, data protection, and sharing are important. Each file has an address. |
Object | Commonly used for large data sets. Each file is stored as an object with a unique identifier. |
Database | An organized collection of data usually accessed through a database management system. |
Transfers | Sending data from one storage system to another. |
Cost Definitions: | |
$ | Up to $500 annual cost to store a Terabyte of data |
$$ | $500-$1000 annual cost to store a Terabyte of data |
$$$ | Greater than $1000 annual cost to store a Terabyte of data |
Data Classification/Types Definitions: | |
Low (Level 1) | Unauthorized access, use, disclosure, or loss is likely to have low or no risk to individuals, groups, or the University. These adverse effects may, but are unlikely to, include limited reputational, psychological, social, or financial harm. Low Risk Information may include some non-public data. |
Moderate (Level 2) | Unauthorized access, use, disclosure, or loss is likely to have adverse effects for individuals, groups, or the University, but will not have a significant impact on the University. These adverse effects could include, but are not limited to, social, psychological, reputational, financial, or legal harm. |
High (Level 3) | Unauthorized access, use, disclosure, or loss is likely to have significant and severe adverse effects for individuals, groups, or the University. These adverse effects could include, but are not limited to, social, psychological, reputational, financial, or legal harm. Compliance requirements are not as strict as for Restricted Information. |
Restricted (Level 4) | Unauthorized access, use, disclosure, or loss will have significant legal consequences, including civil and criminal penalties, loss of funding, inability to continue current research, and inability to obtain future funding or partnerships. |
Durability Definitions: | |
Backup | A copy of physical or virtual files or databases stored in a secondary location for preservation in case of equipment failure or other catastrophe. |
Point-in-time Snapshots | A view of a filesystem or block device as it existed at a specific time, used for data protection and disaster recovery. Snapshots are usually located on the same primary system, so they do not offer the same protection as backups, but they may allow self-service recovery to a previous state, usually after human error. |
File Versioning | Creation of multiple copies of files or objects, allowing storage, tracking, and retrieval of specific versions as files change. |
Availability Definitions: | |
Replication | A property or option in a storage service that replicates changes from a primary system to a secondary system. Often used for keeping data accessible in the event of a primary failure or scheduled maintenance. |
Synchronization | A user-driven and configured process that copies/synchronizes changes to data between one system and another. Often used for accessing cloud data offline. |
Durability and Availability Ranking Definitions: | |
High | This feature is included in the service by default; you would have to opt out to disable it. |
Medium | This feature is a configurable option or could be provided by combining the service with other services (e.g., Red Cloud + EZBackup). |
Low | There are no options for this feature with this service/product. |
Technical Complexity Definitions: | |
High | Configuration and management require advanced IT knowledge and service may require command-line interaction for use or knowledge of a programming language for advanced usage. |
Medium | Initial configuration may require advanced IT knowledge, but the typical/average user can manage the service once initial configuration has been completed. |
Low | Typical/average end user can configure and manage the service without additional local IT support. |
Capacity Definition | Here, we include three considerations for capacity: individual file size limits, total storage limits, and limits on the total number of files in a folder. |
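The data velocity and cost definitions in the table above are simple threshold rules, so they can be sketched as two small functions. This is an illustrative sketch, not part of the Data Storage Finder itself: the function names are hypothetical, and the handling of the exact boundary values ($500 falls in "$", $1000 falls in "$$") is an assumption, since the table's ranges overlap at those points.

```python
HIGH_VELOCITY_PERCENT = 20  # threshold from the table: 20% daily change rate


def velocity(changed_bytes_per_day: int, total_bytes: int) -> str:
    """Classify data as low or high velocity from its daily change rate."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Compare with integer arithmetic to avoid floating-point rounding at 20%.
    if changed_bytes_per_day * 100 >= HIGH_VELOCITY_PERCENT * total_bytes:
        return "High Velocity Data"
    return "Low Velocity Data"


def cost_tier(annual_cost_per_tb: float) -> str:
    """Map an annual cost per terabyte (in dollars) to the $, $$, $$$ symbols."""
    if annual_cost_per_tb < 0:
        raise ValueError("cost must be non-negative")
    if annual_cost_per_tb <= 500:    # assumption: exactly $500 counts as "$"
        return "$"
    if annual_cost_per_tb <= 1000:   # assumption: exactly $1000 counts as "$$"
        return "$$"
    return "$$$"
```

For example, a 1,000 GB dataset in which 150 GB changes per day has a 15% daily change rate and would be classified as Low Velocity Data.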