Preserving blogs – an interview with the Blog Forever Project

As blogs become a more important place to host scholarly work, in whatever form, the problem of longevity becomes more pressing. How do we capture this activity in the long term? Is there a use for blog posts beyond the moment of their creation, and is the context in which they were published important? The BlogForever project is looking into these issues and attempting to come up with a solution. The result: a weblog digital repository designed to store, preserve, disseminate and aggregate blogs.

Blogging for Historians has interviewed Patricia Sleeman, who has been working on the project on behalf of one of its partners, the University of London Computer Centre (ULCC). BlogForever is an EU-funded project with a long list of partners, all intent on developing a robust digital preservation, management and dissemination facility for blogs. Here's what she had to say.

First, could you tell us a bit about the people behind Blog Forever?

BlogForever is a collaborative EU-funded project led by the Computer Science Department of the Aristotle University of Thessaloniki (AUTH). The other partners include CERN, whose repository system is being adapted for the management of blogs. The other university partners include TU Berlin, the University of London Computer Centre, the University of Warwick and the University of Glasgow. Private enterprise is represented by Mokono and Cyberwatcher. The project brings together a unique combination of archival, development and entrepreneurial skills.

What do you hope to achieve with the Blog Forever project?

Its key objective is to develop robust management and dissemination facilities for weblogs. These facilities will be able to capture the dynamic and continuously evolving nature of weblogs, their network and social structure, and the exchange of concepts and ideas that they foster: information that current web archiving methods and solutions miss. As a result, the project also aims to support the preservation of blogs.

The project is largely concerned with the preservation and management of blogs, which is generally neglected by those setting up and running blogs. Why do you think bloggers should concern themselves with it?

Blogs reflect the diversity of lives, interests and activities throughout the world, and present opinions from perspectives that would very often not otherwise be heard. One example is blogging from a war zone, where people can report anonymously on the situation and provide an 'unofficial' viewpoint. Another example is where researchers are required to blog about their work and provide insights into the day-to-day findings of their project. Blogs are constantly changing, and they can disappear altogether unless they are captured.

Blogs are ephemeral (the average web page lifetime is below 100 days) and, because they are so volatile, cannot be considered a reliable, long-term source of information. By preserving blogs, we give their content integrity, permanence and credibility, making it a first-line source of information that will be discoverable, referenceable and relatable in the future.

In this project, what requirements must blogs meet to be included? Do they need to meet certain criteria to be considered authentic, useful and valuable in terms of the aims of BlogForever?

BlogForever will not carry out any selection; it is up to the client who acquires the product to make their own selection based on their own criteria.

The Blog Forever website suggests that part of the project is concerned with the study of weblog semantics and ‘the social importance of weblogs’.  How do you think the project will achieve these aims and do you plan on publishing research results yourselves?

In addition to the repository itself, there has been a wealth of material published about research into this area. BlogForever is addressing the study of blog semantics by modelling all aspects of blog structure, elements, and interconnections, creating a generic blog data model. The blog data model also encompasses significant properties of blogs and their inter-blog relationships – creating a solid foundation for further theoretical and practical work.

This work enables us to devise novel ways to perform data extraction, preservation and dissemination of blog content. The outcomes of this work are both research results, in the form of public reports (deliverables), and an open-source platform that anyone can access.

As for the second part of your question, about the social importance of blogs, we believe that by addressing blog preservation we elevate the importance of blogs as a contemporary source of information.

In your view, what makes for a good blog?

This is an interesting question, as I imagine the idea of a well-formed blog is not often considered in the rather anarchic world of the internet. I think a good blog is one that is well structured (the default structure that has emerged seems to be pages or posts with comments). It should be well described, so that readers get a sense of it from the start, and it should be intuitively structured, so that they can navigate it easily. Beyond that, design and the like are a matter of taste. Of course, a good blog should also be backed up regularly.

If people are interested in the project where should they go to find out more?

Please have a look at

When will the new platform be ready for public use?

It will be ready for release by October 2013.

Would you like to share any other thoughts?

Working collaboratively on a project such as BlogForever has shown me that anything is possible as long as we all share a common understanding. Communication and shared understanding are crucial when working on large multidisciplinary projects like BlogForever. It can also lead to beautiful friendships and future collaboration.

This interview was conducted with Patricia Sleeman (ULCC) by e-mail in July 2013 about the BlogForever project. BlogForever is funded by the European Commission under the Framework Programme 7 (FP7) ICT Programme and involves the following partners: Aristotle University of Thessaloniki; European Organization for Nuclear Research (CERN); University of Glasgow; University of Warwick; University of London Computer Centre (ULCC); Technische Universität Berlin; CyberWatcher; Software Research and Development and Consultancy Ltd.; Tero Ltd.; Mokono; Phaistos Networks S.A.; Altec Research S.A.