Monday 11 February 2013

NLA/Trove - unexpected behaviours: Wed 13 Feb 10:00am (Melb Time)

Recent experiences with the NLA/Trove test environment have resulted in a working list of unexpected behaviours. Most need some form of explanation.

We'll be talking through this list on Wednesday 13 February 2013 at 10:00am (NSW, Vic, Tas), 9:30am (SA), 9:00am (Qld), 8:30am (NT) and 7:00am (WA) via GoTo Meeting. Log-in details are below (places limited to 26).

N.B. This is a working list - please contact Simon Pockley about any amendments or additions you would like to make.

  1. Testing environment

    TIM Beta is the system that NLA use to check the accuracy of records before they are loaded into production. It isn’t a development system. There are limitations to the testing environment.

    a. The test NLA records cannot be deleted, so tests cannot be rerun and new people need to be created and used each time.
    b. NLA ID in TIM Beta resolves to Trove Production rather than Trove Test. This can prompt concern or confusion that valid NLA IDs might be incorrectly ascribed to real researchers and then published on the web.

    Status - no action. The testing environment in TIM Beta does not affect the production environment.

  2. Access to TIM

    a. Access to TIM is on an individual basis without a generic link to the institution.
    Status - work-around. Establish a convention for managed usernames and passwords, for example, “ANDS.tim.matching01”, “ANDS.tim.matching02”, etc. for multiple users. A single sign-on could be established, remembering that multiple users cannot log in to TIM with the same username and password simultaneously. Generic usernames and email addresses will allow usernames to be reassigned and the password reset.
    b. Having to create individual IDs for Trove and TIM (and making these the same for the test instances of each) is cumbersome compared with AAF access to Research Data Australia.

    Status – No action. Sign-up to Trove is open to any user of Trove, not just those with Australian Access Federation (AAF) credentials. One username and password is used for both Trove and TIM (production and test).
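
    The naming convention suggested in the work-around for 2a can be generated rather than typed by hand. The following is a minimal Python sketch, assuming the pattern is institution, "tim", a role label and a two-digit sequence number. The function name and its parameters are illustrative only and are not part of TIM.

```python
def managed_usernames(institution: str, role: str, count: int) -> list:
    """Return managed usernames such as 'ANDS.tim.matching01' for slots 1..count."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # Two-digit, zero-padded slot numbers keep the names sortable.
    return [f"{institution}.tim.{role}{slot:02d}" for slot in range(1, count + 1)]


if __name__ == "__main__":
    for name in managed_usernames("ANDS", "matching", 3):
        print(name)
```

    Each generated name would still need to be registered individually, because TIM does not allow simultaneous log-ins under one username.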

  3. Transparency of record status

    There’s no indication whether records have been auto-matched or gone into TIM for manual matching.

    Status - work-around. Checking which records are in TIM waiting for matching is a manual process.

  4. Identifiers lose their type

    Identifiers lose the type indicated in the contributor party record, e.g. "scopus". Any identifiers, and the key, supplied in a contributor party record to Trove become type="contributor’s ISIL code" in the Trove party record. Even NLA-supplied party identifiers are retyped in this way.
    Status – low impact - no action

  5. Single element names not harvested

    RIF-CS schema guidelines state that <namePartType> is optional. Names are presented differently because GeoNetwork uses ISO 19115 and/or ANZLIC as its native schema, which allows free-text entry of names in a single element. RIF-CS party records of type person that originate in either ANZLIC or ISO 19115 will not be harvested by Trove because they lack <namePartType>. Note that this issue affects party records of type person only. Group records without <namePartType> will be ingested by Trove.
    Status – low impact - no action
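
    Records affected by item 5 can be found locally before a feed is offered for harvest. The following is a minimal Python sketch, assuming that <namePartType> corresponds to the type attribute on <namePart> and that the feed uses the RIF-CS registryObjects namespace shown in the code. The function name and the sample keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed RIF-CS namespace; adjust if the feed declares a different one.
RIFCS = "http://ands.org.au/standards/rif-cs/registryObjects"
NS = {"rif": RIFCS}


def person_keys_missing_name_part_type(rifcs_xml: str) -> list:
    """Return keys of person party records that have an untyped namePart."""
    root = ET.fromstring(rifcs_xml)
    flagged = []
    for obj in root.iter("{%s}registryObject" % RIFCS):
        key = obj.findtext("rif:key", default="", namespaces=NS)
        for party in obj.findall("rif:party", NS):
            if party.get("type") != "person":
                continue  # group records are ingested regardless
            parts = party.findall("rif:name/rif:namePart", NS)
            if any(not part.get("type") for part in parts):
                flagged.append(key)
                break  # report each registry object once
    return flagged


SAMPLE = """
<registryObjects xmlns="http://ands.org.au/standards/rif-cs/registryObjects">
  <registryObject group="Example">
    <key>example/person-1</key>
    <party type="person">
      <name type="primary">
        <namePart type="family">Smith</namePart>
        <namePart type="given">Jane</namePart>
      </name>
    </party>
  </registryObject>
  <registryObject group="Example">
    <key>example/person-2</key>
    <party type="person">
      <name type="primary"><namePart>Jane Smith</namePart></name>
    </party>
  </registryObject>
  <registryObject group="Example">
    <key>example/group-1</key>
    <party type="group">
      <name type="primary"><namePart>Example Research Group</namePart></name>
    </party>
  </registryObject>
</registryObjects>
"""

if __name__ == "__main__":
    print(person_keys_missing_name_part_type(SAMPLE))
```

    In the sample, only the second person record stores the whole name in a single untyped element, so it is the only one reported. The group record is ignored.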

  6. NLA converts keys to RDA URLs

    Not all institutions maintain, or even create, public profile pages, yet these pages are used as source URLs in a Trove record.

    Status - work-around. Where there are no source URLs for staff profiles, use an optional function in Trove. If the option is set to ‘on’, the entityID (source URL) for a university's records in Trove is transformed to an ANDS URL: Trove uses a predefined URL structure to convert keys to a link to RDA rather than to the researcher’s web page on the university web site. If set to ‘off’, Trove will use the source URL provided.
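
    The effect of the on/off option can be sketched as follows. This is an illustration only: the predefined URL structure that Trove uses is not given here, so the template below is a made-up placeholder on example.org, and the function and parameter names are hypothetical.

```python
from urllib.parse import quote

# Placeholder only; NOT the real structure Trove uses to link to RDA.
HYPOTHETICAL_RDA_TEMPLATE = "https://rda.example.org/view?key={key}"


def resolve_source_url(record_key: str, supplied_source_url: str,
                       convert_to_rda: bool,
                       template: str = HYPOTHETICAL_RDA_TEMPLATE) -> str:
    """Return the URL a party record would link to under each option setting."""
    if convert_to_rda:
        # Option 'on': build the link from the record key, ignoring the supplied URL.
        return template.format(key=quote(record_key, safe=""))
    # Option 'off': keep whatever source URL the contributor provided.
    return supplied_source_url


if __name__ == "__main__":
    key = "example.edu/party/1"
    supplied = "https://www.example.edu/staff/1"
    print(resolve_source_url(key, supplied, convert_to_rda=True))
    print(resolve_source_url(key, supplied, convert_to_rda=False))
```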

    For ReDBox/Mint users:

    Check that the source URL resolves to a local party record, as might be expected, and not to Mint, which requires a logon. Apply the work-around indicated above, as required.

  7. Vanishing records

    As reported by UQ:  A feed of party records was harvested but some of the records neither appeared in Trove nor in TIM - they vanished without a trace. The harvest operation did not fail, since other party records from the same harvest of the feed appeared in TIM or were automatically matched and appeared in Trove.
    Status - behaviour yet to be replicated

  8. Broken links when Researchers leave an institution (tombstone records)

    There is a need for policy development regarding researchers who move or even die. NLA has no control over contributors’ source links. Some universities use links to the researcher's local profile as the source URL. However, these links may not be maintained after the researcher leaves the university, so in the long run there will be many broken links in Trove.

    NLA comment: There is no need for a University to delete the record for a researcher when they move away from that institution. The NLA party identity is a container record and so can contain records from different universities for the same researcher. This is where some descriptive information would be valuable in the record e.g. “Left University of ZX in 2011”.

    Status – low impact - no action

  9. Inconsistent matching algorithms

    Testing suggests that Trove's automatic matching algorithms are behaving unexpectedly:

    Issue first raised by Hoylen Sue at UQ - as follows

    Situation #1: party records are harvested for the first time. That is, Trove does not contain any party records with these RIF-CS keys.

    Situation #2: these unchanged party records are re-harvested. That is, Trove already has existing party records with these same RIF-CS keys. In this situation UQ has found that the two party records for Watson and Drebber pass "Matching Rules Part A" and replace the existing party records, which is the correct behaviour. But the party record for Holmes does not match and goes into TIM Beta, which is not correct. It should have been automatically matched just like the other two. Neither we nor NLA have figured out why Holmes is treated differently.

    NLA comment:

    Paul has checked this problem as much as he can and when he sends the record to TIM manually it matches. Essentially the way he's doing it is exactly the way the harvester would send the records to TIM so we have no idea why the records aren't matching when we harvest them. Paul says we need more data to work with and he's reluctant to use your data as then you won't be able to use it. Perhaps if we had lots more test records to work with we might be able to see a pattern. Sorry I don't have a solution or a reason for the records not matching.

    Status - behaviour yet to be replicated

  10. Complex matching rules

    There is an assumption that if local party records contain the NLA identifier as an identifier, then matching will occur automatically, but this is not necessarily the case. The matching rules are complex and deliberately conservative. Records which fail Part A of the matching rules then go to Part B where the NLA identifier is just one of the match points.

    Rules for matching party records: https://wiki.nla.gov.au/download/attachments/24379936/ARDCPIPMatchingRulesSpec-+Ver+2.0++20+Jan+2012.doc?version=1&modificationDate=1333064788000

    From the matching rules:

    The purpose of Part A: to decide if two records are actually the same record
    An incoming record is checked to determine if it has the same contributor ISIL and record ID as a record in Trove.

    If it does, a sanity check is applied. If the incoming record passes the sanity check, it is automatically matched and overlays (replaces) the matching record in Trove. If the incoming record fails the matching rules in Part A then matching rules in Part B are applied.

    The purpose of Part B: to decide if a record should be part of an existing identity.

    The rules specified in Part B are applied to an incoming record that has failed the matching rules specified in Part A.

    An incoming record is checked to determine if it has an NLA persistent identifier. If it does, a sanity check is applied and must be passed for a match to occur.

    Incoming records that fail the matching rules specified in Part A and B are passed by the identity service to the unmatched record queue for human review for matching or new record creation in TIM
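
    The order in which the rules quoted above are applied can be sketched in Python. This is a simplified reading and not NLA's implementation: the sanity check is passed in as a function because its details are not given here, Part B is reduced to the NLA identifier match point only, and the class, function and outcome names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class PartyRecord:
    isil: str                     # contributor ISIL code
    record_id: str                # contributor's record ID
    nla_id: Optional[str] = None  # NLA persistent identifier, if any


def route_incoming(incoming: PartyRecord,
                   trove_records: Iterable[PartyRecord],
                   sanity_check: Callable[[PartyRecord, PartyRecord], bool]) -> str:
    """Return 'overlay', 'add-to-identity' or 'manual-review'."""
    trove_records = list(trove_records)

    # Part A: is this actually the same record (same ISIL and record ID)?
    same_record = next((r for r in trove_records
                        if r.isil == incoming.isil
                        and r.record_id == incoming.record_id), None)
    if same_record is not None and sanity_check(incoming, same_record):
        return "overlay"  # replaces the matching record in Trove

    # Part B: should the record join an existing identity?
    if incoming.nla_id is not None:
        for candidate in trove_records:
            if candidate.nla_id == incoming.nla_id and sanity_check(incoming, candidate):
                return "add-to-identity"

    # Failed Part A and Part B: unmatched record queue, reviewed in TIM.
    return "manual-review"


if __name__ == "__main__":
    existing = [PartyRecord("AU-EXAMPLE", "rec-1", "nla.party-0001")]
    reharvested = PartyRecord("AU-EXAMPLE", "rec-1")
    print(route_incoming(reharvested, existing, lambda a, b: True))
```

    Note that a record carrying a valid NLA identifier still ends up in manual review whenever the sanity check fails.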

    Example:

    A university creates a record for “Davies, Peter” containing the NLA party identifier for the existing Trove record for “Davies, Peter Eric” (the Trove record was contributed by Libraries Australia, ISIL code AU-ANL:PEAU).

    Result: Part A matching rules apply, but neither the ISIL code nor the record ID matches, so Part B rules are applied.

    In the Part B matching rules the system looks in the incoming record for the NLA’s ISIL (AU-ANL:PEAU) and the NLA party identifier, recorded in control/sources/source/objectXMLWrap.

    If the university added the NLA party identifier for “Davies, Peter Eric” and the NLA agency code to the incoming record, the system would run the sanity check. It would match on Davies, on Peter, but it wouldn’t find Eric in the incoming record and so the name would fail the matching rules. Because there is always the risk that a mistake could be made when the NLA party identifier is manually entered into another organisation’s record, the system has to be certain it’s the same person so it will always check all name parts and all name parts have to match.

    Result: In this example, if the names didn’t automatically match then the record failed the sanity checks at some point.
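
    The name comparison in this example can be sketched in Python. It is an illustration only: it treats "all name parts have to match" as requiring both records to hold the same set of name parts, and the case-insensitive comparison and the function names are assumptions.

```python
def name_parts(display_name: str) -> set:
    """Split 'Davies, Peter Eric' into a set of lower-cased name parts."""
    return {part.casefold() for part in display_name.replace(",", " ").split()}


def names_pass_sanity_check(incoming_name: str, trove_name: str) -> bool:
    """Pass only if every name part appears in both records."""
    return name_parts(incoming_name) == name_parts(trove_name)


def unmatched_parts(incoming_name: str, trove_name: str) -> set:
    """Name parts present in one record but not in the other."""
    return name_parts(incoming_name) ^ name_parts(trove_name)


if __name__ == "__main__":
    incoming, existing = "Davies, Peter", "Davies, Peter Eric"
    print(names_pass_sanity_check(incoming, existing))
    print(unmatched_parts(incoming, existing))
```

    For "Davies, Peter" against "Davies, Peter Eric" the check fails because "Eric" has no counterpart in the incoming record, so the record would go to manual matching despite carrying the correct NLA party identifier.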

    Status – investigate. The example above explains why records containing an NLA party identifier will not necessarily match even though they have the same NLA identifier.

  11. RIF-CS 1.4 mapping to EAC-CPF (Encoded archival context for corporate bodies, persons and families)

    There is currently no separate overarching application profile document for NLA use of EAC-CPF. The NLA maintains mappings from MARC and RIF but these are both in flux at present as the NLA makes changes to support RDA and RIF-CS 1.4. The mapping for RIF-CS 1.4 is finished but NLA’s IT has yet to finish the style sheet that converts the records.

    Status – in progress

  12. Name sequence or name order

    The order in which names are entered is important because Trove has its own way of processing first and last names. It is best to enter names in natural order, i.e. first name: Jane, last name: Smith.

    Status – work-around
---------------------------------------------------
1. Please join my meeting.
https://www4.gotomeeting.com/join/902515079

2. Use your microphone and speakers (VoIP) - a headset is recommended.

Or, call in using your telephone.

Dial +61 2 8355 1031
Access Code: 902-515-079
Audio PIN: Shown after joining the meeting

Meeting ID: 902-515-079
