Just over two weeks ago I asked on JISC’s “JISC-REPOSITORIES” list what features users of repositories felt were missing from their software solutions. I got twelve responses, which is probably a fairly small percentage of the people who “read” JISC-REPOSITORIES, so you’ll have to consider for yourself whether the results are relevant or not. As an aside, only one reply came from someone you wouldn’t describe as a repository manager. This got me thinking that it would be good to ask the same questions of academic users – the scholars whose work we’re trying to help communicate – to see what differences there are. Trouble is, how do you reach these people?
Anyway, that is beside the point. You’re here to hear all about the replies, right? It would be fair to say there was a Gold, Silver and Bronze, and Gold was way ahead of the competition!
The Wishlist Winners:
- Statistics and reporting
- Better item/metadata management
- Automatic generation of bibliography pages/CVs
Let’s expand on those:
Statistics and Reporting
Far and away the most popular request was for better support for and handling of statistics. Several respondents gave examples of the types of queries they’d like to be able to ask of the repository:
“Give me all my papers deposited in the last 6 months”
“Give me all my papers published in the last 6 months”
“How many people have followed the DOI to the publisher full-text?”
“What are the top 10 items being used in our repository?”
“Give me all the items in the repository funded under grant number x1773 last year”
“Give me a count of all the published items in the repository available as full-text”
etc.
It is clear that people are really after being able to pull out whatever they want from the system. A library system I once worked with gave users a simplified SQL form to produce reports – a bit like Microsoft Access’ query wizard – and that could be a useful feature for repository vendors to consider.
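To make a few of those example queries concrete, here is a minimal sketch in Python using an in-memory SQLite database. The `items` table and its columns (`depositor`, `deposited`, `grant_no`, `has_fulltext`) are hypothetical, invented purely for illustration; they are not the schema of EPrints, DSpace or any other repository platform, and "six months" is approximated as 183 days.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical schema: real repository platforms store this differently.
SCHEMA = """
CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    title TEXT,
    depositor TEXT,
    deposited TEXT,       -- ISO date of deposit
    grant_no TEXT,
    has_fulltext INTEGER  -- 1 if a full-text file is attached
)
"""

def build_demo_db(today):
    """Create an in-memory database holding a few made-up records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    rows = [
        (1, "Paper A", "jsmith", (today - timedelta(days=30)).isoformat(), "x1773", 1),
        (2, "Paper B", "jsmith", (today - timedelta(days=400)).isoformat(), None, 0),
        (3, "Paper C", "akhan", (today - timedelta(days=10)).isoformat(), "x1773", 1),
    ]
    conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn

def recent_deposits(conn, depositor, today, days=183):
    """'Give me all my papers deposited in the last 6 months' (about 183 days)."""
    cutoff = (today - timedelta(days=days)).isoformat()
    cur = conn.execute(
        "SELECT title FROM items WHERE depositor = ? AND deposited >= ? "
        "ORDER BY deposited",
        (depositor, cutoff),
    )
    return [title for (title,) in cur]

def items_for_grant(conn, grant_no):
    """'Give me all the items funded under grant number ...'."""
    cur = conn.execute(
        "SELECT title FROM items WHERE grant_no = ? ORDER BY id", (grant_no,)
    )
    return [title for (title,) in cur]

def fulltext_count(conn):
    """'Give me a count of all the items available as full-text'."""
    cur = conn.execute("SELECT COUNT(*) FROM items WHERE has_fulltext = 1")
    return cur.fetchone()[0]
```

A query-wizard style form would presumably sit on top of functions like these, with the user choosing the field, the comparison and the date range rather than writing the SQL.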
Various outputs were requested – or implied – so of course the reports need to be available as RSS, as email (and users should be able to subscribe to and automate the running of the reports they set up), and as Web pages.
The other point identified (and it isn’t a new one) was that it is difficult to compare statistics from different software packages and that some kind of standard would be useful to enable repository users to compare like with like.
One respondent mentioned ECS@Southampton’s IRStats project and that it would be nice to have it included “out of the box” for Eprints. There seems to be some potential there, but IRStats appears to be an after-the-event log file processor and may or may not meet the needs outlined above – on-the-fly reporting by email, for example – not yet anyway. It isn’t clear to me how IRStats helps to smooth differences in stats reporting across software solutions either – I’d welcome comments from people with more experience on that.
Better item/metadata management
This is the ability to move items around – from School to School for example – or to change metadata fields across multiple records/items – which may or may not constitute a single “eprint”. That repository software still requires users to delete an item and then resubmit it must, I’d imagine, seem antiquated to most repository software users, who can drag and drop files on their desktop PCs with ease. Repository and preservation people, I suspect, consider the limitation on editing an item once deposited something of a feature and you can see their point, but if repository managers find the interfaces frustrating, imagine what the academics will think! Perhaps there is scope here to separate “repository for users” and “repository for preservation” – I’ve seen architecture diagrams like that knocking around…
Automatic generation of bibliography pages/CVs
A specialised form of reporting really, but it was high in the number of requests. People requested the ability to embed the output in scholars’ Web pages – this could be “on the fly” or with a scheduled job. Closely related to this was the ability to export references to papers in any format, in order to comply with a subject’s citation style preference or with an institutional reporting template/CV for the academics.
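As a rough illustration of what “any format” plus “embed in a Web page” might involve, here is a minimal Python sketch. The two style templates and the record fields (`authors`, `year`, `title`, `venue`) are assumptions made up for this example; real citation styles are far more detailed, and no repository platform is claimed to work this way.

```python
from html import escape

# Hypothetical citation templates; real styles are far more detailed.
STYLES = {
    "author-year": "{authors} ({year}). {title}. {venue}.",
    "numeric": "{authors}, \"{title}\", {venue}, {year}.",
}

def format_bibliography(records, style="author-year"):
    """Return one formatted citation per record, newest first."""
    template = STYLES[style]
    ordered = sorted(records, key=lambda r: (-r["year"], r["title"]))
    return [template.format(**r) for r in ordered]

def as_html_list(citations):
    """Wrap citations in an HTML list suitable for embedding in a Web page."""
    items = "\n".join(f"  <li>{escape(c)}</li>" for c in citations)
    return f"<ul>\n{items}\n</ul>"
```

The same list of records could be rendered “on the fly” for each page request or written to a static file by a scheduled job; only the caller changes, not the formatting code.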
The Rest of the Wishlist:
A number of other suggestions were made and here they are in no particular order:
- Browse Filtering – for example, display all depositors at this Institution only
- Desktop Repository/Personal Research Manager – a desktop application to help researchers manage their work, with a “deposit” button built-in – I think this is a *great* idea!
- Automatic Coversheet Generation – with ability to design the coversheet and include arbitrary metadata from the database
- CRIS/IR integration and “Joined-Up” thinking
- Linkage to subject/funder repositories – the ability to push an IR deposit to a 3rd party or make it easy to manage their harvesting
- Ability to manage multiple affiliations for depositors – would require IDs for said authors too I guess. OpenID anyone?
- Granular permissions – the ability to add groups and assign rights to those groups. For example, these authors may edit the following documents only
- Related to granular permissions: Full customisation of submission forms by group, user ID, etc. – presumably this is possible with Open Source offerings provided you’re willing to get your hands dirty. The customisation needs to be manageable outside of the HTML.
- Import and Export of “packages” – where an “eprint” consists of multiple files and needs moving around. OAI-ORE?
- Greater flexibility in defining metadata fields, etc. (Hosted services are more restrictive here)
- Support for the storage and delivery of (very) large collections of data and multimedia items
- Shopping basket: the ability to collect a number of items from a repository (or, indeed, many) and then click “get them” and receive a package of all the papers requested.
- A more sophisticated content model: currently you have “Record 1->* File”, it’d be nice to have a set bit in the middle to be able to group files… SWAP?
- Making the links of authors live – so clicking will return all that author’s papers in the repository
- Some way of recording where work was done – i.e. did the scholar produce this paper whilst employed here?
- Item versions – the ability to explicitly and simply link one item to another and type the version relationship
- Better embargo management
- Better/easier/standardised OAI-PMH output to include all the community required fields – like grant numbers – an Application Profile maybe? What could we call it?
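On the OAI-PMH and harvesting wishes in the list above: a harvester asks for a particular metadata format through the `metadataPrefix` argument of a request, so a richer application profile would in principle be requested the same way as plain Dublin Core. The sketch below only builds a `ListRecords` request URL in Python. The base URL is a placeholder, and any prefix other than `oai_dc` would be whatever name the community agreed on, so treat those as assumptions.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, until_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # Optional selective-harvesting arguments defined by OAI-PMH.
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"

# Placeholder endpoint, not a real repository.
url = list_records_url("https://repository.example.org/oai",
                       from_date="2008-01-01")
```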
What next?
There are some interesting feature requests here – some of which are closely related – better reporting, bibliographies and “live” author names, for example. All of them sound like useful things and on first read sound like they should be simple to implement, and it’s easy to think “why are these not *just there* already?” or “surely they’ll just take ten minutes to fix?”. Maybe some of them are simple, but some of them are like icebergs, where the simple bit above the water hides a vast complex interaction below that needs to happen inside the software system. To use another metaphor, changing software can be a bit like trying to alter the pattern on the front of a knitted jumper. You look at the circle and say “surely this can be made into a star?” but to do so you have to unpick most of the wool and start over…
That said, I’m hopeful that RSP will be able to explore some of these issues further and produce some step-by-step guides – but I’ll need to reflect on the results more before I can say what these will be…
Thanks again to JISC-REPOSITORIES list members for their input!
Of the first category (reporting & stats), the following are examples of simple repository searches that you should already be able to do:
(a) “Give me all my papers deposited in the last 6 months”
(b) “Give me all my papers published in the last 6 months”
(c) “Give me all the items in the repository funded under grant number x1773 last year”
(d) “Give me a count of all the published items in the repository available as full-text”
This one is only possible if we actually make repositories track accesses to external websites (possible, but a bit machiavellian, like Google)
(e) “How many people have followed the DOI to the publisher full-text?”
and this one is genuinely a stats question.
(f) “What are the top 10 items being used in our repository?”
Its answer turns out to be less interesting than you might think. Particularly because it changes so infrequently!
The section “Better Item/Metadata Management” mentions “the ability to move items around – from School to School for example”. Presumably this is only a problem in DSpace? (Is it still a problem? I thought it had been fixed in later versions of the software.) Other platforms don’t have this problem because they don’t have such rigid demarcations into “communities” – it’s all done by views on the metadata.
The ability “to change metadata fields across multiple records/items” is something that was added in EPrints 3.1 with the Batch Editor. However it’s not as powerful as needed for doing RAE/REF-quality metadata management, so we’re working on an EPrints-Excel-EPrints roundtrip to allow spreadsheet-based editing for this kind of hard-core activity.
The issue of automatically-generated bibliography pages & CVs is really important to just about every researcher, research group and department, so that’s well established in EPrints, especially the ability for institutional portals to import repository-generated bibliography listings into a corporately controlled environment. See http://www.ecs.soton.ac.uk/people/nrj/publications as an example – the data is generated by the repository but listed with summary information from other databases (teaching responsibilities etc). The format for displaying bibliographies is determined by a bibliography style language, just like other bibliography software.
Of course, “the rest” of the wishlist is where it all starts to get very interesting – it’ll be a while before all of those wishes get satisfied. Still, it’s good to see that the answers to the majority of people’s requests (on this survey at least) are closer at hand.
Thanks for your useful comments Les!
I deliberately didn’t ask what software people were using, but Eprints was mentioned in the context of the “top 3” along with DSpace, so I don’t think things are quite there yet.
I suspect (and one respondent indicated this) that some of the things are “implementable” (and perhaps not easily) rather than “out of the box” features. It is also possible that there is a gap between what people think they can do with their software and what it does do – a question of documentation perhaps?
Finally, re: the reporting. It’s not searches – within a Web interface – that people are after, I think, but formattable reports, perhaps as CSV files – which is where going out to Excel might be handy – to include in management reports.
Take your point about the top 10!
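For what it’s worth, the “formattable report” idea above need not be complicated. Here is a minimal Python sketch that turns report rows into CSV text that Excel can open. The row contents and column names are made up for illustration, and where the rows come from (a database query, a log file processor) is left open.

```python
import csv
import io

def report_to_csv(rows, columns):
    """Render a list of dict rows as CSV text, keeping only the named columns."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=columns,
        extrasaction="ignore",  # silently drop fields not wanted in the report
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Made-up report rows; "internal_id" is deliberately left out of the report.
rows = [
    {"title": "Paper A", "downloads": 120, "internal_id": 17},
    {"title": "Paper B, revised", "downloads": 45, "internal_id": 18},
]
report = report_to_csv(rows, ["title", "downloads"])
```

The resulting text could be saved to a file, attached to a scheduled email or served from a Web page, which covers most of the output routes people asked for.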
Another item on the wish list.
With any subset of items in a repository, I would like to be able to assign ‘ownership’ to another author/administrator.
So, if I import metadata from EndNote for the entire research output of a particular research group, I can assign ownership to an individual within that group, who can enrich the metadata and add the papers themselves. This could also be useful in delegating work.
I currently do this via SQL queries.
David Kane,
Waterford. I.T. Library,
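The kind of SQL-based ownership reassignment David describes might look something like the Python sketch below. It is not his actual query: the `items` table with `owner` and `research_group` columns is a hypothetical schema, and a real repository would need its own table and column names, plus whatever re-indexing the platform requires after a direct database change.

```python
import sqlite3

def items_in_group(conn, research_group):
    """Return the ids of all items belonging to one (hypothetical) research group."""
    cur = conn.execute(
        "SELECT id FROM items WHERE research_group = ? ORDER BY id",
        (research_group,),
    )
    return [item_id for (item_id,) in cur]

def reassign_owner(conn, item_ids, new_owner):
    """Set the owner of the given items in one transaction; return rows changed."""
    if not item_ids:
        return 0
    placeholders = ",".join("?" for _ in item_ids)
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            f"UPDATE items SET owner = ? WHERE id IN ({placeholders})",
            [new_owner, *item_ids],
        )
    return cur.rowcount
```

Selecting the subset first and updating second keeps the two steps separate, so the same reassignment could be applied to any list of item ids, however it was produced.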
“It is also possible that there is a gap between what people think they can do with their software and what it does do” – sounds like an excellent reason to contact the friendly Repository Support Project staff!
Usage statistics are potent evidence for promoting the value of IRs to the different user groups and to policy makers. I have been using the data from the stats packages already in place in some IRs, but the number of IRs providing this information is very small. So as an OA-warrior I would urge all IRs to implement the stats packages asap.
IRs are wonderful resources for the economically weak countries where I operate and presenting usage figures from researchers in these countries helps show the importance of these resources to development agencies and funders.
We also need evidence/stories showing how usage of IR resources has had a real impact on someone’s research.
So it’s good stats came top of the list.
Interesting… it is good to see a few of the things wanted by end users (you know, those people who created the materials for repositories in the first place):
- multiple version management
- multiple affiliation management
- end user input tools
- ability to link Grants etc from multiple sources and grantee and participation roles (researchers do this all the time, they are NOT mapped one to one to single organisations!)
I’d like to add
- easier Z39.50 download of subset collections (goes a long way towards some of the otherwise SQL style of additions required, especially if fully structured EndNote Connect files are crafted well).
On a broader front
As Peter says, why not actually involve the researchers themselves! We don’t bite, we just want solid feedback and a genuine role in the process. JISC and OI haven’t exactly been models of such approaches, so it takes a really persistent, motivated and determined researcher to keep on plugging away to try to be taken seriously (so we often have to build our own complete repositories, and sometimes the whole software and data architecture as well, just to be able to get any usable repository services; I’m on my fourth right now)…
Even a slight shift in approach towards seriously involving us will draw us in easily – we can’t wait!
“sounds like an excellent reason to contact the friendly Repository Support Project staff!” – Exactly!