The people, science and technology behind discovery

Best Practices for Handling the Data Explosion

By Natalie Green

First there were megabytes, then gigabytes, and now terabytes. But have you ever heard of a petabyte? Just one petabyte is the data equivalent of more than 200,000 single-layer DVDs, and petabytes are coming to an exploration office near you.

As the quantity of exploration data increases exponentially – one estimate has it doubling every 12 months – companies that fail to efficiently organize, preserve and centralize their data, and to collaborate on it, will experience delays and missed opportunities.

Based on a combination of wider industry best practices and our own experiences helping companies effectively manage exploration data assets, Geosoft has come up with five best practices to make exploration data management more efficient:

1. Clear the Clutter

One exploration group we worked with was using more than a hundred data products. That number is not sustainable. Identify which applications and formats are essential for your business and cull the rest. Here’s what you should consider keeping:

  • Raw data may be worth preserving, but keep it in a compressed file that is opened only when needed.
  • Processed data may include corrected and levelled line or point data, such as a Geosoft database (GDB) file, for the user who doesn’t want, or doesn’t have the time, to start from scratch with raw data. Making processed data accessible to the non-specialist encourages collaboration and takes the pressure off the specialist to handle every data request.
  • Interpreted data could be a grid for a quick reference to the nature of the data, or a carefully gridded and processed product.
  • Imagery is helpful for geoscientists and/or GIS users. Several companies provide their geoscientists with grids and other data in image formats for use in GIS or other applications.
  • Metadata provides descriptive information. It can be used to categorize and organize data, query or search for datasets that meet certain criteria, or provide historical context for the origin, purpose, or validity of a dataset.
  • Survey outlines act as a reference and have a number of general applications in GIS and other areas. For example, one organization publishes the outlines for its GIS users, while another uses the outlines to reference project reports.

2. Find Your Coordinates

It is not uncommon to hear about a target plotted on the wrong datum, or a drill hole that missed its target because it was sited in the wrong location: a costly mistake. All data is associated with a place on Earth – either on the surface or below it – and GPS has been an invaluable tool for recording accurate geographic coordinates in the field. Even so, errors still occur in the management of coordinate systems.

Geoscientists need both spatial information that tells them precisely where the data originated and physical information that tells them where it is stored.
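
One way to guard against datum mix-ups is to make the coordinate system travel with every point and to refuse to combine points that disagree. The following is a minimal sketch of that idea, not a Geosoft tool: the `Location` record and the `require_same_crs` helper are hypothetical names, and the EPSG codes (EPSG:4326 for WGS 84, EPSG:4267 for NAD27) are used only as example identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """A point that always carries its coordinate system (hypothetical layout)."""
    x: float
    y: float
    crs: str  # coordinate system identifier, e.g. "EPSG:4326" for WGS 84


def require_same_crs(locations):
    """Return the shared coordinate system, or raise if the points disagree."""
    systems = {loc.crs for loc in locations}
    if len(systems) != 1:
        # Covers both mixed systems and an empty list of points.
        raise ValueError(f"mixed or missing coordinate systems: {sorted(systems)}")
    return systems.pop()


# Example: a collar recorded in NAD27 must not be plotted with WGS 84 targets.
target = Location(-79.38, 43.65, "EPSG:4326")
collar = Location(-79.38, 43.65, "EPSG:4267")
```

Calling `require_same_crs([target, collar])` raises a `ValueError` here, which is the point: the mismatch is caught before anything is drilled or plotted. Actually converting between datums would need a projection library and is outside this sketch.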

3. Classify

Use a classification scheme to organize the data so that it is easier to discover. For example:

Continent > Country > State/Region > Project Name > Data Type

When classifying data, it is important to be consistent, add clarifying detail where needed (e.g. the year the data was collected), and avoid burying data in too many levels: six is plenty, ten is too many.
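
The hierarchy and the depth guideline above can be enforced in a few lines. This is a minimal sketch under stated assumptions: `classification_path` and `MAX_LEVELS` are hypothetical names, the " > " separator simply mirrors the example above, and the sample values are invented for illustration.

```python
MAX_LEVELS = 6  # rule of thumb from the text: six is plenty, ten is too many


def classification_path(*levels, year=None):
    """Join hierarchy levels into one path, optionally tagging the last with a year."""
    cleaned = [level.strip() for level in levels if level and level.strip()]
    if not cleaned:
        raise ValueError("at least one classification level is required")
    if len(cleaned) > MAX_LEVELS:
        raise ValueError(f"{len(cleaned)} levels is too deep; keep it to {MAX_LEVELS}")
    if year is not None:
        # Clarifying detail: the year the data was collected.
        cleaned[-1] = f"{cleaned[-1]} ({year})"
    return " > ".join(cleaned)


# Invented example values, following Continent > Country > State/Region > Project Name > Data Type.
path = classification_path(
    "North America", "Canada", "Ontario", "Project X", "Magnetics", year=2009
)
```

Here `path` is `"North America > Canada > Ontario > Project X > Magnetics (2009)"`. Building every path through one function is what keeps the scheme consistent across contributors.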

4. Add Descriptors

Metadata is essential for exploration information management. The metadata for a digital photo, for example, would include both data that is captured automatically (e.g. the date, aperture, camera and GPS location) and data filled in manually (e.g. tags and people). Geosoft recommends a standard set of minimum required metadata fields for every dataset, and provides tools and workflows to capture these in industry-standard formats.
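
A minimum-required-fields rule is easy to check automatically before a dataset is accepted. The sketch below assumes a metadata record is a plain dictionary; the field names in `REQUIRED_FIELDS` are hypothetical placeholders, not Geosoft's recommended set, and `missing_metadata` is an illustrative helper.

```python
# Hypothetical minimum set; substitute your organization's own required fields.
REQUIRED_FIELDS = ("title", "data_type", "coordinate_system", "collected_on", "source")


def missing_metadata(record):
    """Return the required fields that are absent or blank in a metadata dict."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(record.get(field) or "").strip()
    ]


# Invented example record with two gaps: a blank date and no source.
record = {
    "title": "Project X airborne magnetics",
    "data_type": "Magnetics",
    "coordinate_system": "EPSG:4326",
    "collected_on": " ",
}
gaps = missing_metadata(record)
```

For this record `gaps` is `["collected_on", "source"]`, so the dataset would be sent back to the contributor rather than catalogued with holes in its description.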

5. Make it Flow

Workflows move the data through the system from collection to collaboration and ensure that all users know where to find existing data and where to put new data. The workflow should be relatively painless, allowing spatial and metadata searching. Ideally, one person, a data manager or steward, is responsible for the data and the data contribution processes.

The flow typically follows this cycle, with data continually being added to the flow:

  1. Find and create the data sources, including geophysics, geology, GIS and metadata
  2. Submit the data to the queue as packages
  3. Run Quality Control and Assurance (QC & QA) with the DAP repository
  4. Catalog the data using dataset and metadata indexes
  5. Publish to the Intranet and/or Internet for multi-user searching and extraction
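
The cycle above can be pictured as a small state machine in which each data package moves forward one stage at a time. This is only an illustrative sketch: the `Stage` names and the `advance` function are hypothetical, and sending a package that fails quality control back to the start is an assumption made for the example, not a description of how the DAP repository behaves.

```python
from enum import Enum


class Stage(Enum):
    SOURCED = 1     # step 1: data sources found or created
    SUBMITTED = 2   # step 2: package placed in the queue
    CHECKED = 3     # step 3: QC and QA completed
    CATALOGUED = 4  # step 4: dataset and metadata indexed
    PUBLISHED = 5   # step 5: available for searching and extraction


def advance(stage, qc_passed=True):
    """Return the next stage; a submitted package that fails QC returns to SOURCED."""
    if stage is Stage.PUBLISHED:
        raise ValueError("package is already published")
    if stage is Stage.SUBMITTED and not qc_passed:
        return Stage.SOURCED  # assumption: failed packages are reworked and resubmitted
    return Stage(stage.value + 1)


# Walk one package through the whole cycle.
stage = Stage.SOURCED
history = [stage]
while stage is not Stage.PUBLISHED:
    stage = advance(stage)
    history.append(stage)
```

Keeping the stages explicit gives the data manager or steward a single place to see where every package sits and which ones are stuck.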

Data preparation services

Managing new data and transferring existing data assets into new solutions requires effort. The Geosoft Professional Services group has created tools to help exploration companies automate some of these procedures, such as the capture of coordinate system details.

Often these are custom tools that specifically meet the needs of one organization. For example, we created a specialized tool that outlined survey boundaries for one group that wanted to integrate their GIS and exploration solutions. Other tools, such as a configurable metadata editor, meet a more general need.

You can learn more about Geosoft Information Management Solutions online.