Metadata Stores in Australia: August 2012

Monday, 27 August 2012

Project Governance - roles and responsibilities

To date, in the Metadata Stores Program, 11 universities have established Project Steering Committees or Project Management Groups. Some of these groups meet monthly, some quarterly. Some are formally constituted with Terms of Reference (TOR). Some have ANDS representatives on them, and some will continue after the ANDS project completion date.

For those of you still working towards some form of Project Governance, the following outline of the various roles and responsibilities is intended to help.

Ideal structure

Roles and responsibilities

Responsible for the Steering Committee
- Sets terms of reference
- Provides direction and guidance
- Sets priorities
- Sets funding level
- Sets timeline
- Sets deliverables
- Appoints chairperson
- Provides resources
- Promotes the project work to VC colleagues and subordinates

Steering Committee or Project Board

- Provides reports, advice and feedback to DVCR and other senior members of the university
- Owns terms of reference
- Monitors expenditure of funds
- Reviews management of project risk
- Ensures reports are regularly submitted to higher bodies and DVCR
- Offers professional advice and support
- Promotes the project work to colleagues and subordinates
- Recruits subordinates to assist project work
- Supports Project Manager

- Provides resources
- Provides support
- Provides feedback
- Provides guidance, direction and assistance (including direction on how to respond to events or constraints that are outside the control of the project)
- Reviews project progress
- Reviews each completed phase and approves progress to the next phase

Key behaviours

Generally, the Committee or Board manages by exception. All members of the Board are given equal opportunity to participate in Board-level decision-making, and feedback from all members is actively solicited. Project Board members do not regularly delegate attendance at meetings.

Steering Committee/Project Board roles

Chair / Executive

The Chair is ultimately responsible for the project, supported by the other Board members. The Chair ensures that the project stays focused on achieving its objectives and on producing the deliverables and outcomes that will achieve the projected benefits. The Project Chair will:

- Be the decision-maker with overall authority for implementing the Project Plan
- Ensure that there is a coherent project organisational structure and a logical set of plans to deliver the project deliverables
- Ensure that risks are being tracked and mitigated as effectively as possible

Other members

Represent the interests of their departments to all researchers who will be involved in the project.

Have authority to resolve conflicts over project requirements and priorities.

Ensure that appropriate quality control procedures are used, so that the project meets both ANDS's requirements and the university's own.

Project Manager

- Responsible to the Steering Committee
- Promotes the project work to colleagues and subordinates
- Responsible for the day-to-day work of the project
- Works with colleagues to deliver project outputs

Friday, 24 August 2012

D5: ARC and NHMRC registries as ‘sources of truth’.

ANDS staff have been busy with a swag of questions relating to the dimensions of Deliverables. Rather than post these all at once, it's probably more useful to keep them separate for further reference.

This question from La Trobe University is about Activity records...

…just looking at the acceptance criteria that you have put up for MS required deliverable D5, particularly in the light of how grant registry activity records are now being handled for ARC and NHMRC, I am hoping to clarify what is required under D5. As ANDS is now facilitating the direct migration of Activity records derived from ARC and NHMRC registry data into RDA (activity records to which our collection records presumably will link where appropriate), what direct dealings with these Activity records will our metadata stores be expected to have? Presumably we won't be originating any Activity records for ARC or NHMRC grants for harvest by RDA.

Should we instead be populating our metadata stores with the same activity records that RDA will be acquiring directly from the grant registries (and effectively just duplicating locally what RDA is already doing nationally)?

Should we be acquiring those records via RDA (to ensure alignment with RDA, inclusion of RDA data elements, etc.)?

I'm curious about what sort of role these records would be expected to have in our metadata store when we have not created them nor, presumably, added anything to them locally, and where we are not responsible for ensuring their alignment with any ‘sources of truth’. It puts them into a different category altogether from the party records and collection records that we will be storing, where we do have responsibility for their creation (including alignment with ‘sources of truth’) and integration into NLA or RDA. In this case we simply seem to be taking on responsibility for integrating these records from RDA into our local metadata store rather than the other way around as applies to our other types of records.

I'm curious as to what kind of outcome ANDS is looking for out of this process, and what functions the local versions of these records are supposed to be supporting, especially as ANDS itself has now taken responsibility for aligning activity records relating to ARC or NHMRC grants with their ‘sources of truth’.

Regarding grants that are not from either ARC or NHMRC, I assume that we should be creating these records locally and providing them for harvest by RDA as activity records and then post-harvest integrating additional elements added in by RDA to ensure that we retain alignment with the RDA versions of those records (much as we would do with party records harvested by the NLA).

Answer:

At the time that the Metadata Stores Program was initiated, it was anticipated that we (ANDS) would be able to facilitate the development of a service from ARC and NHMRC that would provide an automated content feed for Activity records. ANDS coverage of ARC/NHMRC Grant Activities is currently 2000-2010 (we can also make records for 2011 on request). These records were intended to act as placeholders until the automated service was in place. Regrettably, the development of such a service is taking longer than we had anticipated. In fact, such a service is unlikely to exist before the Metadata Stores Program finishes.

Consequently, we need to be fairly pragmatic about this deliverable. This means that you are correct in thinking that you should continue to acquire those records via RDA and that it will be ANDS that aligns the Activity records relating to ARC or NHMRC grants with 'sources of truth' (for now). However, we would hope that you would have the ability to enhance your Activity records by updating them to include more accurate and current project descriptions that reflect the evolution of what a project aims to achieve and how it is being conducted.

Regarding grants that are not from either ARC or NHMRC, yes, the same principles apply: we expect you to create rich records locally and provide them for harvest by RDA as Activity records.

D5 Acceptance: Integration with the ANDS records derived from the Grants Registries will be demonstrated by inspection (by CLO) of nominated examples of Activity records. An expected date should be included in the Project Plan.

Thursday, 16 August 2012

Testing your NLA/Trove Party infrastructure

These instructions have just come in from the National Library of Australia (NLA). They should be of interest to those wanting to get moving on the NLA/Trove Deliverable...

Before the NLA can harvest records from a Contributor's website you will need to complete steps 1-4 in the following sequence:

Note. This may be completed within a few days, but may take a couple of weeks, depending on the NLA's workload and resources.

A Contributor should have records to contribute to the NLA Party Infrastructure
A Contributor can make these records available for harvest via OAI-PMH in a schema we can process (RIF-CS or EAC-CPF)
A Contributor provides details to Trove of the following:
* Contributor name:
* Contributor ISIL:
* Contact person:
* Schema used for records:
* Base URL for harvesting by NLA via OAI-PMH:
Note. Press for links to Training and detailed instructions on ISIL codes.
When this information is received, Trove submits a request to the NLA's IT Section to set up the institution as a Contributor in Trove and in the Trove Identities Manager (TIM). (Note. It can take a week or more for this to happen.)
When this has been done, the Trove team will set up the NLA TEST Harvester to do a Test harvest of the records. These records are checked for content and, if changes are required, the Trove team will communicate with the nominated contact person. If Trove needs to make any alterations to the transformation steps for the records, the Trove team will have to submit a request to NLA's IT support.
When any issues are resolved, Trove will do a full Test harvest and pass the records through the auto-matching process. Records that pass the auto-matching rules will be loaded to Trove TEST; records that fail will be loaded to the unmatched record queue in the TIM Beta system.
The Contributor will then be given access to the TIM system, where they can view their unmatched records and either manually match them against existing names or create new records from them. This access requires the Contributor to be a registered Trove user, signed up to both the Trove production service [http://trove.nla.gov.au/] and the Trove Test system [http://trove-test.nla.gov.au/] using the same user name, password and email address.
If there are no issues with the records in TIM Beta and Trove Test, the Trove Team will then complete the work and do a harvest into Trove and TIM. The records will then be publicly available with their NLA party identifier.
Trove will then set up a schedule to do automatic incremental harvest at a time and frequency that suits the Contributor.
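For anyone setting up the OAI-PMH side of this, here is a minimal Python sketch of the kind of requests a harvester sends to the Base URL you give Trove. It builds OAI-PMH 2.0 ListRecords URLs and follows resumption tokens; the `from` argument is what lets a scheduled incremental harvest pick up only recently changed records. The base URL and the `rif` metadataPrefix are assumptions for illustration only (ask your own provider's ListMetadataFormats verb which prefixes it actually supports), and the NLA's harvester is of course its own software, not this code.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str,
                     from_date: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH 2.0 ListRecords request against a Contributor's base URL."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument: only verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            # 'from' limits the response to records changed on or after this date,
            # which is how an incremental harvest avoids re-fetching everything.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def next_resumption_token(response_xml: str) -> Optional[str]:
    """Return the resumptionToken in a ListRecords response, or None when the list is complete."""
    root = ET.fromstring(response_xml)
    token = root.find(".//oai:resumptionToken", OAI_NS)
    # An absent or empty resumptionToken element signals the end of the list.
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


if __name__ == "__main__":
    # Hypothetical repository URL; "rif" as the RIF-CS prefix is an assumption.
    print(list_records_url("https://repository.example.edu.au/oai", "rif",
                           from_date="2012-08-01"))
```

Opening your Base URL with `?verb=Identify` appended in a browser is also a quick way to confirm the endpoint responds before you send the details to Trove.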

A party record is allocated an NLA Party Identifier when it passes the auto-matching rules in the identity service processing, and is then displayed in Trove's People and Organisation zone. Records that do not pass the auto-matching rules are loaded to the Trove Identities Manager (TIM), where manual review is required. When an unmatched record in TIM is matched to an existing identity or is used to create a new identity, it acquires an NLA identity and is displayed in Trove.

Note: TIM is the system for managing unmatched records, not a system for contributors to add their records.
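To make the matched/unmatched split concrete, here is a small Python sketch of the routing described above. Everything in it is hypothetical: `PartyRecord`, `route_harvested` and `resolve_in_tim` are illustrative names, and `auto_match` is only a stand-in for the NLA's auto-matching rules, whose actual logic is not described here. The point is the flow: a match yields an NLA identity and a Trove display, while no match means a TIM queue entry that a person resolves later.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class PartyRecord:
    local_key: str
    name: str
    nla_id: Optional[str] = None  # set once the record acquires an NLA identity


def route_harvested(records: List[PartyRecord],
                    auto_match: Callable[[PartyRecord], Optional[str]]
                    ) -> Tuple[List[PartyRecord], List[PartyRecord]]:
    """Split a harvest into records loaded to Trove and records queued in TIM.

    auto_match is a hypothetical stand-in for the NLA's auto-matching rules:
    it returns an NLA party identifier on a match, otherwise None.
    """
    loaded_to_trove: List[PartyRecord] = []
    tim_queue: List[PartyRecord] = []
    for record in records:
        nla_id = auto_match(record)
        if nla_id is not None:
            record.nla_id = nla_id
            loaded_to_trove.append(record)
        else:
            tim_queue.append(record)
    return loaded_to_trove, tim_queue


def resolve_in_tim(record: PartyRecord, nla_id: str) -> PartyRecord:
    """Manual TIM step: match to an existing identity or create a new one.

    Either way the record ends up carrying an NLA identity and can be shown in Trove.
    """
    record.nla_id = nla_id
    return record
```

A toy `auto_match` such as a dictionary lookup on the name is enough to exercise the flow; the real rules will be considerably more careful than exact name matching.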

Testing:

NLA will do the initial testing to make sure the records are being harvested, transformed, matched properly through the auto-matching rules and loaded to Trove.

When the records are in Trove Test and TIM Beta, the Trove Team will let the contributor know, and the contributor can then check that their records are displaying the correct information in Trove.

Questions are welcome...

Friday, 10 August 2012

Challenges for Developers at eResearch Australasia conference 2012

This year's eResearch Australasia conference in Sydney is a great opportunity for Developers to get together and talk about the exciting work you are doing.

ANDS, NeCTAR and RDSI are encouraging Developers (and designers) to have FUN with activities that will help inspire new ideas.

See this announcement about the Developers Lounge for what is in store:

David F. Flanders (ANDS) is very keen for you to participate in the kick-off hardware hackfest at Sydney University on Sunday the 28th of October.

For more information, you can contact him at [david.flanders@ands.org.au].

Tuesday, 7 August 2012

Storage-coupled data capture?

Peter Sefton (UWS) has written up some notes from the Metadata Stores round table in Sydney on 27 July 2012.

He is keen to get some feedback about the architectures that projects are using, particularly in the ReDBox community.

Peter's post provides:

1. A quick survey of architectural approaches to connecting Data Capture applications to Research Data Catalogues and central repositories.

2. A short discussion of requirements for a Data Capture application that would run on a file store and allow researchers to create useful collections of data from any file-based data.

If you support building or adapting a storage-coupled data capture app as part of your Metadata Stores project, please go to his UWS blog and comment.

Note. By way of an answer to a line from Peter's post ('...feed data to what ANDS calls Metadata Stores'): the origin of the term 'Metadata Stores' is shrouded in the mists of time. However, my understanding is that it was intended to point towards infrastructure rather than the Institutional Repository (IR) space.

Friday, 3 August 2012

Metadata Stores Community News #6

Metadata Stores - fortnightly Q & A: Is there any interest in having a fortnightly Q & A (teleconference or videoconference) where project issues can be discussed? FAQs are currently being assembled for your comment.

Acceptance Criteria for required Deliverables (thanks to Vicki Picasso and Alan Glixman for prompting discussion about these criteria).

Bi-directional links in Research Data Australia (RDA) to external related objects: Your input?

The ability to opt in or opt out (in RDA) of bidirectional links to external objects was developed from a rationale where a data source was considered to represent one authority of information. If a link inferred from one authority is displayed within information obtained from another, there is a risk of misattribution. Currently, the default for automatically enabling two-way links in your own data source is 'on', and the default for automatically enabling two-way links for external data sources is 'off'. ANDS is considering changing the opt-in/opt-out function so that bidirectional links for related Parties (internal and external) are always generated. We ask for your comment on either removing the opt-in/opt-out function, or making the default choice 'opt-in' but retaining the ability to 'opt-out'. Comments please.
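To help frame comments, here is a Python sketch of the display decision as I read it. It assumes the settings belong to the data source whose record would display the inferred (reverse) link, and that "always generated for related Parties" means the Party record always shows the reverse link. `LinkSettings`, `show_reverse_link` and `always_for_parties` are hypothetical names for illustration; this is not RDA's actual code.

```python
from dataclasses import dataclass


@dataclass
class LinkSettings:
    # Current defaults as described above (assumed to be per data source).
    internal_two_way: bool = True   # reverse links between records in your own data source
    external_two_way: bool = False  # reverse links asserted by records in other data sources


def show_reverse_link(displaying_source: str, asserting_source: str,
                      displaying_class: str, settings: LinkSettings,
                      always_for_parties: bool = False) -> bool:
    """Should a record in displaying_source show a link asserted by a record in asserting_source?

    settings are those of displaying_source. always_for_parties models the
    proposed change: Party records always show reverse links, internal or external.
    """
    if always_for_parties and displaying_class == "party":
        return True
    if displaying_source == asserting_source:
        return settings.internal_two_way
    return settings.external_two_way
```

Under the current defaults, a Party in one university's data source would not show a link asserted by another university's collection; under the proposed change it would, while external links on other record classes would still follow the per-source setting.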

Sharing Crosswalks, Mappings and XSLTs

This is a forum for sharing and discussing semantic mappings and crosswalks in any state of development. Many ANDS partners find that schema mapping is a difficult step in completing their ANDS projects, and it is helpful to see how other people have approached the task. If similar schemas are being used, existing crosswalks may also provide a head start in the development process, if the original creator is willing to allow such use.

If you are posting a crosswalk, please include the following information either in your post or within the crosswalk itself: names and versions (or profile or element set details) of the source and target schemas, a URL if the crosswalk is available publicly, and any restrictions on how the crosswalk can be used.

Featured question: Is it mandatory or optional to provide Party records to Research Data Australia (RDA)? (thanks to Hoylen Sue - UQ)

At least one party record needs to be related to each collection described in Research Data Australia. Collection records connect to Party records either by using the local party record key or (the preferred option) by including an NLA party identifier in their related object element. NLA identifiers are entered in the RIF-CS identifier element of local party records (type="AU-ANL:PEAU"), and as a key in the RIF-CS related object element of activity, collection and service records.

Since the preferred model is to send party records directly to NLA Trove (instead of via RDA), sending collection records to RDA with the NLA Party Identifier embedded in them will be sufficient. RDA will then source the Party record from NLA Trove.

See link: Best practice for creating party records
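As a rough illustration of where the NLA identifier goes in the featured answer, here is a Python sketch using the standard library's ElementTree to build a local party record carrying the identifier (type="AU-ANL:PEAU") and a collection record whose relatedObject key is that identifier. The element names follow my understanding of RIF-CS 1.3; the group, keys, originating source, identifier value and the "isManagedBy" relation type are placeholders, so check them against the RIF-CS documentation and your own data. A real record also needs names, descriptions and the other elements the ANDS quality requirements call for.

```python
import xml.etree.ElementTree as ET

RIF_NS = "http://ands.org.au/standards/rif-cs/registryObjects"
ET.register_namespace("", RIF_NS)  # serialise RIF-CS as the default namespace


def _el(parent, tag, text=None, **attrs):
    """Append a namespace-qualified RIF-CS element to parent."""
    el = ET.SubElement(parent, f"{{{RIF_NS}}}{tag}", attrs)
    if text is not None:
        el.text = text
    return el


def _registry_object(group, key, source):
    """Create the registryObjects wrapper with one registryObject inside it."""
    root = ET.Element(f"{{{RIF_NS}}}registryObjects")
    ro = _el(root, "registryObject", group=group)
    _el(ro, "key", key)
    _el(ro, "originatingSource", source)
    return root, ro


def party_with_nla_id(group, key, source, nla_id, family, given):
    """Local party record carrying its NLA identifier."""
    root, ro = _registry_object(group, key, source)
    party = _el(ro, "party", type="person")
    _el(party, "identifier", nla_id, type="AU-ANL:PEAU")
    name = _el(party, "name", type="primary")
    _el(name, "namePart", family, type="family")
    _el(name, "namePart", given, type="given")
    return root


def collection_linked_to_nla(group, key, source, nla_id, relation="isManagedBy"):
    """Collection record whose relatedObject key is the NLA party identifier."""
    root, ro = _registry_object(group, key, source)
    coll = _el(ro, "collection", type="dataset")
    rel = _el(coll, "relatedObject")
    _el(rel, "key", nla_id)
    _el(rel, "relation", type=relation)
    return root


if __name__ == "__main__":
    nla = "http://nla.gov.au/nla.party-0000000"  # placeholder identifier
    record = collection_linked_to_nla("Example University", "example.edu.au/collection/1",
                                      "http://example.edu.au", nla)
    print(ET.tostring(record, encoding="unicode"))
```

Under the preferred model, only the collection record would need to go to RDA; the party record shape is shown because the same identifier also belongs in your local party records.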

------------------------------
Best wishes
Simon

Metadata Stores: Acceptance criteria for required deliverables

Principles:

ANDS requires a Project Plan early in the project, in order to finalise project scope and choice of software, and to confirm appropriate resourcing and planning.
If a project is embarking on metadata stores without existing infrastructure, then ANDS will not require delivery of the complete metadata store, or of the deliverables that depend on it, until the end of the project.
If a project is embarking on metadata stores with existing infrastructure, then ANDS expects that there will be a feed of records supplied to RDA (see Deliverable #1) by the middle of the project.
Remaining mandated deliverables are likely to have multiple dependencies on other software and organisational units, so ANDS will not require delivery until the end of the project.
ANDS encourages projects to schedule deliverables earlier than agreed in the project description where possible.
Because there is likely to be a lengthy period between the Project Plan and other deliverables, ANDS requires regular and frequent progress reporting - every three months. Reporting will be lightweight, just a couple of pages, but ANDS needs to monitor progress closely, given that these are infrastructure projects with a large number of dependencies.
For consistency, ANDS is maintaining the same payment ratio across all projects: 25% in each payment period.

D1	A working feed of records describing Collections and associated Activities, Parties and Services to Research Data Australia, in the current version of RIF-CS (1.3), demonstrated to meet the quality requirements for RIF-CS records as set by ANDS. Acceptance: If the project is using an existing technology*, then this feed is expected around the middle of the project. The feed will be confirmed by an inspection (by CLO) of a sample of nominated records in Research Data Australia.* If the technology is bespoke, then an expected date should be included in the Project Plan. Existing technology means that there is already a working Metadata Store. CLO = Client Liaison Officer (ANDS).
D2	A feed of collections from at least three distinct Faculties (or equivalent organisational units) within the institution to Research Data Australia. Acceptance: This spread across Faculties is intended to support an institution-wide approach. The feed can be automated or manual. The 3, or more, Faculties (or equivalent) will be confirmed by an inspection (by CLO) of a sample of nominated records in Research Data Australia. An expected date should be included in the Project Plan.
D3	Demonstrated alignment of metadata records about Parties with an institutional name authority (HR or Library), with the authoritative form of the name sourced external to the metadata store, and with new researcher descriptions added to the metadata through regular updates from the name authority. Acceptance: This interface between one or more nominated sources of party record details and the Metadata Store is expected to be confirmed by a statement of achievement by the Project Manager. Specifically, a written statement by the project partner will confirm that metadata records about parties are aligned with an institutional name authority (HR or Library), that the authoritative form of the name is sourced external to the metadata store, and that new researcher descriptions are added to the metadata through regular updates from the name authority. An expected date should be included in the Project Plan.
D4	Demonstrated alignment of metadata records about Parties with the ARDC Party Infrastructure Project, with researcher descriptions contributed to the NLA, and with People Australia identifiers for researchers recorded against researchers. Acceptance: Alignment with the NLA Party Infrastructure will be demonstrated by an inspection (by CLO) of examples of records using NLA Identifiers in Research Data Australia, as nominated by Project Manager. An expected date should be included in the Project Plan.
D5	Demonstrated alignment of metadata records about Activities with institutional and external sources of truth (Research Office, ARC and NHMRC grant registries), with the authoritative description of the Activity sourced external to the metadata store, and with new research projects added to the metadata through regular updates from the sources of truth. Acceptance: Integration with the ANDS records derived from the Grants Registries will be demonstrated by inspection (by CLO) of nominated examples of Activity records. An expected date should be included in the Project Plan. See updated criteria for this Deliverable.
D6	Demonstrated workflow for registering new Collections in the university; this can be automated or semi-automated (notification-based). Acceptance: This will be demonstrated by a written description (with schematic) of the workflow that includes some form of alert or notification that a new collection has been, is being, or is about to be, created. An expected date should be included in the Project Plan.
D7	A software system to realise deliverables D1–D6 (and D8, D13–D14 if applicable), with robust storage and management of metadata. Acceptance: ANDS does not intend to assess software code already assessed or approved, e.g. ReDBox, VIVO. A detailed diagram(s) of the working metadata store with associated use-case(s) will be used to assess the deployment by the ANDS Technical Assessment Group and will be shared with ANDS partners (note - please ensure that these diagrams are licensed as CC BY).
D8-D13	Optional Deliverables Acceptance: ANDS expects each institution to choose at least one optional deliverable. Progress of chosen optional Deliverables is expected to be included in the ANDS Progress Report Template.
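For D3, here is a minimal Python sketch of what one "regular update from the name authority" might do, assuming the HR or Library feed can be reduced to a mapping from a staff identifier to the authoritative form of the name. `Party`, `align_with_name_authority` and the staff-ID keys are hypothetical; real feeds will carry more fields, and you may not want to overwrite everything automatically.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Party:
    staff_id: str
    name: str


def align_with_name_authority(local: Dict[str, Party],
                              authority: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """One scheduled update run in which the authority's form of the name always wins.

    local maps staff IDs to party records held in the metadata store; authority
    maps staff IDs to the authoritative name (e.g. from an HR or Library feed).
    Returns (created, renamed) lists of staff IDs.
    """
    created: List[str] = []
    renamed: List[str] = []
    for staff_id, authoritative_name in authority.items():
        party = local.get(staff_id)
        if party is None:
            # Researcher new to the authority: add a description to the store.
            local[staff_id] = Party(staff_id, authoritative_name)
            created.append(staff_id)
        elif party.name != authoritative_name:
            # Local form differs from the authority: overwrite with the external form.
            party.name = authoritative_name
            renamed.append(staff_id)
    return created, renamed
```

Running it twice against the same feed changes nothing the second time, which is a useful property for a scheduled job. The returned lists could also feed the kind of statement of achievement the D3 acceptance criteria ask for.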
