OpenEI:Projects/Datasets Improvements

Purpose

  • Evaluate the current user experience (for uploading, browsing, and rating datasets)
  • Track recommendations for changes and improvements to Datasets
  • Identify high-priority changes (big and small)

Discussions on Feb 22, 2011

The OpenEI team met again to discuss Datasets, covering both long-term goals and functionality and near-term metadata and UI modifications.

Longer-term Changes

  • We may want a wiki-based metadata environment
    • OpenEI community members will likely be able to improve metadata (especially for datasets accessed via a public site)
    • Could help keep dataset information "up to date" and relevant, especially if the original contributor becomes less involved in OpenEI
    • We may consider an approval process (by an OpenEI administrator) for changes made to metadata, to ensure quality
  • Evaluate ways to cluster or group datasets
    • Connect different editions of a dataset (e.g. 2003 annual outlook and 2005 annual outlook)
    • Connect datasets from the same organization

Near-term Modifications to Metadata

We want to make sure OpenEI users have access to all relevant metadata submitted with each dataset, and we want that metadata to be easy to find and read. Each of the three open government sites we looked at presented slightly different metadata, but all of them included many of the same fields.

Comparison of metadata presented on OpenEI, data.gov, data.gov.uk, and data.govt.nz, as of Mar-1-2011

  • Changes to UI when adding metadata
    • Make Optional Metadata title more prominent
    • Add file at the beginning
    • More clearly denote files that have been modified by the contributor and are not in the original format
  • Changes to UI when viewing metadata
    • Re-think names of metadata fields (e.g. source vs agency name)
    • Create headers or other organizational structure
    • Show more metadata fields
    • Include Dataset Titles when viewing multiple datasets with the same tag

Suggestions from Jan 13, 2011

During a meeting with several OpenEI team members, Woodjr provided most of the following suggestions.

Specific Changes to Metadata Fields (part of 'upload data' experience)

Some of the suggestions are specific to existing metadata fields and would affect how the user adds metadata as well as how OpenEI could handle datasets.

Fields as Links (rather than strings)

In as many fields as possible, enter information as links, rather than strings. Example: Source Name (provide URL for NREL rather than just listing 'NREL').

Multiple Fields

Allow users to decide whether to add multiple fields to a particular metadata element (e.g. Source Name). An example we like: currently, users may add an unlimited number of files to a single dataset entry. An example of a metadata element we'd like to change: 'original data quality requirements'.

Data Upload Links

Instead of a single field called "alternative access", provide a field for the unique URL that exactly matches each file added (in addition to the 'description' field for each file). In the future, this will assist us with tracking if/when datasets change or are updated.
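A minimal sketch of how a per-file URL might be modeled, in Python. The class and field names (`DatasetFile`, `DatasetEntry`, `source_url`) are assumptions made for illustration and are not OpenEI's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetFile:
    """One uploaded file plus the URL it came from (hypothetical schema)."""
    filename: str
    description: str
    source_url: str  # the unique URL that exactly matches this file


@dataclass
class DatasetEntry:
    """A dataset entry holding any number of files (hypothetical schema)."""
    title: str
    files: List[DatasetFile] = field(default_factory=list)

    def files_missing_url(self) -> List[str]:
        """Return the filenames that have no source URL recorded."""
        return [f.filename for f in self.files if not f.source_url.strip()]


entry = DatasetEntry(title="Example dataset")
entry.files.append(
    DatasetFile("data.csv", "Raw table", "http://example.org/data.csv")
)
entry.files.append(DatasetFile("notes.txt", "Contributor notes", ""))
print(entry.files_missing_url())  # prints ['notes.txt']
```

Keeping the URL on each file record, rather than in one shared text field, is what would make later change tracking possible on a per-file basis.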

Remove 'Order' for Multiple Fields

It isn't clear what 'order' refers to when there is the option to add multiple fields for a single metadata parameter; the fact that the 'order' options include the value -1 makes it even more confusing.

Specific Recommendations for Improving/Expanding Metadata Fields

Flexible Properties/Metadata Fields

Allow users to create additional metadata fields that they want to include with their dataset(s).

Wizard Flows for Common File Types

This could allow OpenEI to pre-populate certain metadata fields based on the uploaded file. A great example is shape files, which generally have some standard associated metadata. If we implement this, uploading the file(s) should be the first step in the process (or close to the first step).
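As a rough illustration of the pre-population idea, the Python sketch below inspects the extensions of the uploaded files and suggests values when they look like a shapefile set. The function name and the suggested field names (`file_format`, `data_type`, `projection_file_included`) are hypothetical; only the shapefile component extensions are standard.

```python
from pathlib import PurePath

# A shapefile is a set of sidecar files: .shp, .shx and .dbf are required,
# while .prj (coordinate system definition) is optional.
SHAPEFILE_REQUIRED = {".shp", ".shx", ".dbf"}


def suggest_metadata(filenames):
    """Suggest metadata values from file extensions (hypothetical fields)."""
    extensions = {PurePath(name).suffix.lower() for name in filenames}
    suggestions = {}
    if SHAPEFILE_REQUIRED <= extensions:
        suggestions["file_format"] = "ESRI Shapefile"
        suggestions["data_type"] = "geospatial"
        suggestions["projection_file_included"] = ".prj" in extensions
    return suggestions


uploaded = ["wind.shp", "wind.shx", "wind.dbf", "wind.prj"]
print(suggest_metadata(uploaded))
```

Because the suggestions depend only on the uploaded files, this kind of check can run as soon as the upload step finishes, which is why the upload would need to come first in the flow.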

New Aspects of Datasets

Duplication/Relationship Detection

There are different ways to approach this (requiring different levels of sophistication); it may involve looking for matching URLs (of either Source Name, or the specific dataset). This would allow us to identify if the same file has been added twice, if an updated file has been added, or if a similar file has been added.
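One of the simpler approaches, matching URLs, could be sketched as follows in Python. This is an assumption-laden illustration rather than a design: the function names, the normalization rules (ignore scheme, `www.` prefix, query string, and trailing slash), and the three labels are all hypothetical.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to host + path so trivially different forms compare equal."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def classify(new_url: str, existing_urls: list) -> str:
    """Label a new upload relative to URLs already registered (hypothetical labels)."""
    new_norm = normalize_url(new_url)
    new_host = new_norm.split("/", 1)[0]
    normalized = [normalize_url(u) for u in existing_urls]
    if new_norm in normalized:
        return "possible duplicate"  # same dataset URL
    if any(n.split("/", 1)[0] == new_host for n in normalized):
        return "same source"  # same host, different file
    return "new"


existing = ["http://www.example.org/data/outlook2003.csv"]
print(classify("https://example.org/data/outlook2003.csv/", existing))
print(classify("http://example.org/data/outlook2005.csv", existing))
```

A "same source" match is only a hint that two entries may be related (for example, different editions); detecting an updated file at an unchanged URL would need something beyond URL matching, such as comparing file contents.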

Wiki-Style Metadata

Maybe we should open up metadata to edits/additions by all registered users. What is the long-term benefit of keeping metadata 'closed'?

New Dataset 'Type': Translation (?)

When an original dataset that has been uploaded to OpenEI is translated into RDF by the OpenEI team (or any other third party), the dataset should be identified as such. We may want to show that there is a different source/author for the RDF version; an analogy would be a classic text translated into another language, where the translator's name is carried forward. Alternatively, we could just use the original source of the data and denote the transformed dataset as "RDF-version" or "RDF-edition"; the analogy here is that most best-seller-type novels retain the original author's name when translated into another language and are simply marked with a label such as French-edition or German-edition, as shown in this example at Amazon.

Relationships/Similarities

This might be an opportunity to use OpenCalais. We might consider something like Amazon's approach: "Others who looked at this dataset also downloaded this dataset..."
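The Amazon-style suggestion could be approximated with simple co-occurrence counting, sketched below in Python. This does not use OpenCalais, and it assumes that per-session download logs are available, which the notes above do not confirm; the function name and the log format are hypothetical.

```python
from collections import Counter


def also_downloaded(target, sessions, top_n=3):
    """Rank datasets most often downloaded in the same session as `target`."""
    counts = Counter()
    for downloaded in sessions:
        items = set(downloaded)  # count each dataset once per session
        if target in items:
            counts.update(items - {target})
    return [name for name, _ in counts.most_common(top_n)]


# Each inner list is one user's session (hypothetical log format).
sessions = [
    ["outlook-2003", "outlook-2005"],
    ["outlook-2003", "outlook-2005", "wind-map"],
    ["outlook-2005", "wind-map"],
]
print(also_downloaded("outlook-2003", sessions))
```

The same counting would work with page views in place of downloads, depending on which signal turns out to be available.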

Errors/Issues (uploading datasets)

There are some small issues that arise when uploading datasets.

  1. Can't actually upload a file using Chrome
  2. Can't upload .pdf files
  3. Can't have too many URLs listed in the "Alternative Access" field (limit is 1,000 characters)