Had a couple of interesting questions from people contacting RSP these last few days. I wanted to record my replies and so this post and the next are just that. The first question was about repositories -vs- “web-based storage”:
The question was put to me as this:
“an enquiry regarding the advantages of using repository software over other web based storage solutions”
There is a lot in that, and in discussing it here at UKOLN I got different views on the question. One thought was that the problem was with terminology – do we use the word “repository” or “web-based storage”? I’m not sure that is what you are asking, but if it is, then I think you have to call the repository whatever you and your users feel most comfortable with. “Repository” is a terrible, ugly, hugely overloaded term and not one likely to make sense to anyone! The second thought suggested you were asking about the differences between the two, and a third suggested that the query was in fact “why do we not just develop our own in-house repository software?”. All these questions are linked…
The answer, I think, is right back at the start and can be summarised as “it depends on what you want to use it for”. That is to say, what are the user requirements that the system is intended to fulfill?
In the RSP Briefing Paper “Repositories, Content Management Systems and Portals”, some of these requirements are given:
* Open Access – the ability to make available online the material in the repository and control access including time-based release, as required.
* Accessible to search engines – pretty much goes without saying these days!
* Persistent identifiers – should be self-explanatory. You could call this ongoing access – a commitment from the institution to make the items in the software system available until the end of the Internet or the world – whichever comes first.
* Bibliographic metadata – a set of data associated with an item of content that records useful information such as Author, Title, etc. – in a controlled fashion – enabling search by surname, “give me a list of papers I’ve contributed to since 1986”, etc.
(I’d add one more here, which I’d suggest is even more important and useful than bibliographic metadata:
* Management/structure metadata – this includes things like version – e.g. is this the latest version and, if not, where can I get that? Also “how many times has this paper been downloaded/cited/viewed?” and other statistical data.)
* Export/import – of content from Web pages, CVs, RAE data, research systems, etc.
* Metadata harvesting – some services are trying to make use of the stored metadata to provide more sophisticated search/browse services. At present, to achieve this they use the OAI Protocol for Metadata Harvesting (OAI-PMH). It is expected that the software system will support this protocol if it is to meet the requirements of research output management software.
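To make the metadata harvesting requirement a little more concrete, here is a minimal sketch of what an OAI-PMH harvester does: build a `ListRecords` request and pull the Dublin Core title and creator out of the response. The base URL, the record identifier and the sample response are all made up for illustration, and the sketch parses a canned response rather than calling a live repository.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # optional: only records changed since this date
    return base_url + "?" + urlencode(params)


# Hypothetical response, trimmed to a single record for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:123</identifier>
        <datestamp>2008-05-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Paper</dc:title>
          <dc:creator>Smith, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Extract identifier, titles and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter("{%s}record" % OAI_NS):
        identifier = rec.findtext("{%s}header/{%s}identifier" % (OAI_NS, OAI_NS))
        titles = [e.text for e in rec.iter("{%s}title" % DC_NS)]
        creators = [e.text for e in rec.iter("{%s}creator" % DC_NS)]
        records.append(
            {"identifier": identifier, "titles": titles, "creators": creators}
        )
    return records


if __name__ == "__main__":
    # repository.example.org is a placeholder, not a real service.
    print(list_records_url("http://repository.example.org/oai"))
    for record in parse_records(SAMPLE_RESPONSE):
        print(record)
```

A real harvester would also need to follow resumption tokens for long result lists and handle error responses, which this sketch leaves out.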
This set of requirements has not been plucked from the air – it is a result of the “repository movement” that has over the last eight years investigated what might be required to support what a user needs from any research output storage, dissemination and management system (“ROSDMS” is less catchy than “repository” and “repository” less expensive than “publisher”). For further details of this investigation, some use cases and other materials are on the Repositories Research Team wiki:
http://www.ukoln.ac.uk/repositories/digirep/index/All_the_Scenarios_and_Use_Cases_Submitted
Having outlined those “one size fits all” requirements, it is important to say that of course every institution is different and everyone has different requirements.
While I would be cautious of starting from nothing and beginning the user requirements analysis again, it is important to allow users and developers to feel ownership of any system built to achieve repository-like functions and some kind of analysis might be a useful way to achieve this. What is the institution trying to achieve by setting up a place to store, disseminate and manage its research outputs and how does that differ or ultimately align with the received wisdom on what such a system needs to support?
Another way of specifying repository-like requirements was presented at a meeting I was at a week ago where the key features were given as:
* Storage/Versioning
* Indexing
* Retrieval
* Access Control/Rights Management
* Cool URIs
None of that screams “specialist system” to me and so you are right to ask if web-based storage can do just as well. I guess the key thing missing from that list is “management” and there is a real worry that without institutional management of research outputs there will very quickly be a mess of departmental Web sites, just like the bad old days of an institution’s Web presence. The concept of a “Repository” implies, but doesn’t necessarily give, organisational control and a structured approach to looking after research outputs. That concept could be met by “web-based storage” provided it allowed for that structured approach to management of material.
So, the answer isn’t simple and at one level all a “Repository” is, is a “web-based storage system”. I’ve outlined some of the specialist requirements that a “web-based storage system” has been shown by the HE community to need to support. It is up to the institution to mould/map those requirements to the specifics of the institutional need to come up with its own set of requirements.
Then you consider software: compare your requirements against each candidate and consider the cost/benefit. Does repository-software-X meet our needs? (Probably!) Does “web-based-storage-solution-X” meet our needs? What about S3, Sharepoint, Alfresco? A DIY solution?
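If it helps to see that comparison spelled out, here is a rough sketch of checking candidates against a requirements list. The requirement names are my own shorthand for the bullets in the Briefing Paper, and the capability sets for the two candidates are invented placeholders, not statements about any real product.

```python
# Shorthand labels for the requirements discussed above (names are assumptions).
REQUIREMENTS = {
    "open_access",
    "search_engine_access",
    "persistent_identifiers",
    "bibliographic_metadata",
    "import_export",
    "oai_pmh",
}

# Hypothetical capability sets; fill these in from your own evaluation.
CANDIDATES = {
    "repository-software-X": set(REQUIREMENTS),
    "web-based-storage-solution-X": {"open_access", "search_engine_access"},
}


def missing_requirements(required, supported):
    """Return the requirements a candidate does not cover, sorted by name."""
    return sorted(set(required) - set(supported))


def gap_report(required, candidates):
    """Map each candidate name to its list of unmet requirements."""
    return {
        name: missing_requirements(required, supported)
        for name, supported in candidates.items()
    }


if __name__ == "__main__":
    for name, gaps in gap_report(REQUIREMENTS, CANDIDATES).items():
        print(name, "is missing:", gaps if gaps else "nothing")
```

The gaps are only half the picture: each one still has to be weighed against the cost of filling it, whether by configuration, local development or a different choice of software.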
Perhaps it helps to try not to start by thinking of the “Repository” (call it what you like – PLEASE don’t call it “Repository”!) as a piece of software. Better to think of it as a service to the institution. After that, think about the technology that fulfills the needs of that service. At one level we call that technology “repository” but we could just as easily call it “web-based storage system” and, whatever we call it, it could still be CDSware behind the scenes.