Blog - News

Named Entity Recognition for Placenames in Welsh Language Texts

Collections / Digitisation / News / Research - Posted 07-03-2023

Using Wikidata to structure Welsh placename data

The text we read when we view a web page, a blog or a journal article is full of rich and valuable information. Our brains are very good at processing and making sense of words in the context in which they are presented. We can tell when a word is a placename because we understand the sentence around it, and are expecting to see a place name. Also, we often already know the name of the place and could describe it in further detail from memory.

If computers could understand text as we do then they could be super useful in helping us find and understand information better. Technology such as Named Entity Recognition (NER), where machines are trained to recognise things like people, places and organizations by analyzing a whole text, is increasingly being used to turn plain text into a structured network of ‘things’, and this means machines can make a more complex analysis of text, much as we do.

As part of our ongoing Welsh Place Names project, which is funded by the Welsh Government, we were keen to explore how these new technologies and methodologies might be applied to Welsh language texts and to our own collections. With millions of pages of journals, newspapers and books already digitised, how might this technology help us improve our services for better research, discovery and interpretation?

Named Entity Recognition

The Dictionary of Welsh Biography was chosen for this experiment, as a (fairly) manageable corpus of about 5000 articles, packed with information about people and places. Most placenames have actually already been tagged as such in the mark-up for each page, which gives us a good benchmark for NER models to aim for, and a big corpus of place names for further analysis.

Identifying which words are placenames is the first step in this process. Those names then need to be reconciled against a database of names, which can give us access to a deeper, multilingual understanding of the place.

English language NER tools struggle to identify places in Welsh text for a number of reasons. Firstly, they are not trained to handle the grammatical mutations present in the Welsh language. For example, ‘Tregaron’ is the name of a town in both English and Welsh. However, if the text reads ‘yn Nhregaron’, an English model will not recognise the name because of the mutation (treiglo) of the first letter. Secondly, many placenames are different in Welsh (e.g. Cardiff is Caerdydd), so models trained on English text simply won’t have the word in their vocabulary. Several English models were tested and many either didn’t recognise names or assumed they were names of people.

We therefore experimented with ‘Cymrie’, part of the Welsh Government funded Welsh Natural Language Toolkit.

 

Extracting named entities from digital text using ‘Cymrie’

This was able to extract a number of Welsh placenames, including many with mutations. The text of 5 articles was analyzed in detail. On average the tool was able to extract approximately 67% of placenames. Of those place names identified, only 2% were not in fact places.
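The two figures above correspond to what are usually called recall (the share of real placenames that were found) and precision (the share of extracted names that really were places). As a minimal sketch, with counts that are hypothetical and chosen only to mirror the reported rates, they can be computed like this:

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for an entity-extraction run."""
    extracted = true_positives + false_positives   # everything the tool tagged
    actual = true_positives + false_negatives      # every real placename in the text
    precision = true_positives / extracted if extracted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts (not the project's real tallies): 98 correct
# extractions, 2 spurious ones, and 48 placenames that were missed.
precision, recall = precision_recall(98, 2, 48)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A tool with high precision but moderate recall, as here, rarely mislabels a word as a place but still misses around a third of the names.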

Some of the placenames it was unable to recognise were tagged as people or organizations, though this was at a lower rate than the English language model.

Reconciling the Data

Knowing what words are names of people or places is useful only to a point, because we still know nothing more than ‘it’s a place’. For the data to be really useful we need access to more information about each place, such as its name in other languages, its location on a map and the county, country or continent it is part of. We can then apply a unique identifier to each place and they become unique data entities.

To do this we need to take our long list of place names and attempt to reconcile them against a database which holds more information about them. In our case we are using Wikidata, which is home to one of the largest corpora of Welsh place names available. Wikidata is free for anyone to reuse and is structured as linked data.

The Dictionary of Welsh Biography contains around 80,000 instances of place names. Due to the practicalities of working with such a large dataset, I opted to work with the first 46,000 tagged places.

The tags in the Welsh Biography code often contained more than just the placename. They commonly included a Grid reference, the type of place (city, village etc) and the relation to that place being discussed in the article.

Obviously, having all this information to hand makes the reconciliation process far more likely to succeed. As NER technology improves, it should be able to infer much of this information by understanding the wider context in which the place name appears. For now, we must accept that without this additional information the process would have a far lower success rate.

Using OpenRefine’s reconciliation tool we were able to compare our list of placenames to Wikidata. The software’s algorithm looks for similarities in spelling but also considers the likelihood of a match based on the popularity of its content. By transforming the grid references from our data into coordinates we were also able to instruct OpenRefine to score matches based on their proximity. Places with matching names that lay less than a kilometre apart were mostly matched automatically. Our data on the type of place was also used to help the software make a judgement.
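To illustrate the idea of combining spelling similarity with proximity, here is a rough sketch. It is not OpenRefine’s actual algorithm: the equal weighting, the `match_score` function and the one-kilometre cut-off parameter are all assumptions made for the example.

```python
import math
from difflib import SequenceMatcher


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def match_score(name, coords, candidate_name, candidate_coords, max_km=1.0):
    """Score a candidate from 0 to 1 using name similarity and distance."""
    similarity = SequenceMatcher(
        None, name.lower(), candidate_name.lower()).ratio()
    distance = haversine_km(*coords, *candidate_coords)
    # Proximity falls linearly from 1 (same spot) to 0 (max_km or further).
    proximity = max(0.0, 1.0 - distance / max_km)
    return 0.5 * similarity + 0.5 * proximity  # assumed equal weighting
```

With this sketch, two records with the same name and the same coordinates score 1.0, while a namesake on the other side of the country scores only 0.5 and would be left for human review.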

In order to give the reconciliation process the best chance of success, some initial cleaning was done to remove mutations from the text. Much of this could be done using a series of transformations such as:

  • Nghaer – Caer
  • Nhre – Tre

Others require knowledge of the language and human input in order to avoid the corruption of other names. For example ‘Lan’ cannot be automatically changed to ‘Llan’ without corrupting other names such as ‘Lanishan’.
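The cleaning step described above could be sketched as follows. The two safe rules are the ones listed; the split into "safe" and "review" rules, and the `demutate` function itself, are assumptions made for illustration rather than the process actually used.

```python
# Prefix rewrites that are safe to apply automatically (from the list above).
SAFE_PREFIXES = {"Nghaer": "Caer", "Nhre": "Tre"}

# Prefix rewrites that could corrupt other names, so they are only flagged.
REVIEW_PREFIXES = {"Lan": "Llan"}


def demutate(name):
    """Return (normalised_name, needs_review) for a single placename."""
    for mutated, radical in SAFE_PREFIXES.items():
        if name.startswith(mutated):
            return radical + name[len(mutated):], False
    for mutated in REVIEW_PREFIXES:
        if name.startswith(mutated):
            # Leave the name untouched and ask a human to decide.
            return name, True
    return name, False


for example in ["Nhregaron", "Nghaerdydd", "Lanishan", "Aberystwyth"]:
    print(example, "->", demutate(example))
```

Keeping the risky rules in a separate table means the automatic pass never changes a name it cannot be sure about.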

Other issues included the use of English language names in the Welsh text:

  • New England (Lloegr Newydd)
  • Bristol (Bryste)
  • Saint Brides (Sant y Brid)

There were also a number of placenames which had suggested matches, but had a high chance of also being the name of a property. For example:

  • Trawscoed (house, estate and community)
  • Cilgwyn (village in Powys, Gwynedd, Carmarthenshire AND a gentry house)
  • Ty-coch (area near Swansea and common house name)

Short of reading each article in order to make a decision, there is currently no way to match such places with any certainty. However, such a manual process could easily be gamified as a crowd-sourcing task. Undertaking such tasks would also create training data for improving NER in the future.

 

Reconciling the data to Wikidata using OpenRefine

The result was an initial match of 25,000 names, to which a further 2000 were quickly added following a human review of high-scoring match suggestions. These matches include 2208 unique place names. Beyond this, an increasing amount of time would be required to match entries manually.

 

Matching placenames to unique identifiers allows us to examine the frequency of specific places in the text with greater accuracy

Utilizing the enriched data

Now that we have aligned our placenames to Wikidata entries for those places, we have access to a wealth of additional information. This extra information can be summarized in several categories:

  • Persistent ID – Being able to assign a unique Qid to each placename means we can treat each one as a unique entity, even if there are examples of multiple places with the same name.
  • External IDs – Wikidata collects persistent IDs from other institutions which hold information about the subject. This helps align and enrich data across multiple datasets.
  • Contextual information – This includes links to Wikipedia articles, openly licensed images and references to other authoritative works.
  • Structured Data – Wikidata contains a linked, structured ontology about its items, so places are linked to their administrative hierarchy and to every other item in the dataset with a statement about that place.

 

This allows us to better understand the connections between people and place. In the example below a computer is able to understand that two people are connected to several common places through reference to these places in their Welsh Biography articles. The colour and thickness of the connecting strands also indicate the frequency of these references within each article.
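As a rough sketch of how such connections can be derived, the snippet below counts the places two articles have in common. The mention lists are hypothetical, and plain names are used for readability where real data would use Wikidata Qids.

```python
from collections import Counter


def shared_places(mentions_a, mentions_b):
    """Map each place mentioned in both articles to its two mention counts."""
    counts_a, counts_b = Counter(mentions_a), Counter(mentions_b)
    common = counts_a.keys() & counts_b.keys()
    return {place: (counts_a[place], counts_b[place]) for place in common}


# Hypothetical place mentions extracted from two biography articles.
person_a = ["Tregaron", "Aberystwyth", "Aberystwyth", "Caerdydd"]
person_b = ["Aberystwyth", "Bangor", "Tregaron", "Tregaron", "Tregaron"]

print(shared_places(person_a, person_b))
```

The pair of counts for each shared place is the kind of value that could drive the thickness of a connecting strand in a visualisation.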

 

 

When this approach is scaled up to the whole corpus we can see a hugely complex web of interconnections between people and places.

 

 

And since we now have access to coordinates for all our places, we can visualize these connections on a map. Below we see visualisations for an individual and for the whole collection using people’s birthplace as a starting point, connected to all other places mentioned in their articles.

 

 

Using the contextual information in place name tags we can make more granular queries, such as links between the place of birth and places of education mentioned in their articles. This highlights clear correlations to major centres of learning and further demonstrates the research potential of the data.

 

Conclusions

In conclusion, existing technology can accurately identify around 60-70% of Welsh place names in digital text. Training more advanced A.I. algorithms using larger place name vocabularies and a bigger corpus of training data may help to increase this percentage even further. Undertaking this process at scale would allow for further research and reconciliation work to take place and would also help to improve search and discovery functionality, but it does not identify unique places, only the instance of a place name.

In order to create notable benefits, the data must be reconciled against a database with data about specific places. With many duplications in place names in Wales and around the world this step is vital in creating connections to the correct places. It would seem that we don’t yet have the technology to automate this, in any language, with a high level of certainty. Several examples of pipelines being developed in order to identify entities in text and reconcile directly against Wikidata or other large datasets do exist, including a project by a colleague here at the National Library (link). However, they have faced the same kind of challenges.

Where additional supporting data already exists, as in our Dictionary of Welsh Biography example, it is possible to automate this to some degree, but there is still a significant margin for error without human input.

Whilst accurate and complete identification of entities from a text is not yet possible, these processes offer value, as a standalone activity or as part of a multidisciplinary approach, as a way of improving understanding of a text and improving search and discovery services for users.

Importantly, undertaking this work on Welsh language texts is only possible with the continued development, adaptation and improvement of new technologies, and the availability of Open Access data sources such as Wikidata and OpenStreetMap, as well as large corpora of Welsh language text for training machine learning algorithms.

 

Jason Evans, Open Data Manager


A Welsh Placename Revival?

#LoveMaps / News / Rants and Raves / Research - Posted 25-11-2022

A Democratic Digital Infrastructure for Welsh Place-Names

Eryri (Snowdonia) National Park recently announced that they would no longer be using the English names for Yr Wyddfa (Snowdon) or Eryri (Snowdonia) as part of a wider policy of adopting and safeguarding Welsh place names across the park.

 

This move has been broadly welcomed, especially here in Wales, and campaigners will be hoping that this bold move will empower others to follow suit. There is already talk of the Welsh football team adopting the use of ‘Cymru’ in both languages after the impending World Cup.

 

Such moves not only help to safeguard the Welsh language but to celebrate it, and to encourage visitors to engage with it.

 

The decision by Eryri National Park also raises some interesting questions. Will the rest of the tourism sector follow their lead? Or will they continue to use the English name? And what about education, the media and the government? We shall see.

 

As custodians of knowledge, the National Library of Wales is naturally invested in archiving official records, but when it comes to Welsh place names, these official records don’t always reflect popular culture and practice and are certainly slow to react to changing public expectations. 

 

In terms of mapping, many official maps only use the English versions of Welsh place names. For example, despite a concerted effort in recent years, Ordnance Survey still lacks a lot of Welsh language data. In an effort to support the growing demand for Welsh language mapping and data the National Library is working on developing free and open data and mapping solutions for Welsh placenames. With funding from Welsh Government and in partnership with Mapio Cymru and Menter Iaith Môn we are engaging with crowd-sourced, community-governed data sets, Wikidata and Open Street Map, to help develop a Welsh language mapping solution. We’ve used our technical expertise to help align these two sources of Welsh place name data, and worked with Welsh Government open data and the Welsh Language Commission to increase the richness and diversity of the data. 

 

 

These data sets also allow the community to decide on the form of placenames. On Wikipedia and Wikidata names are changed or adopted by an open democratic process – already there is a lively discussion on English Wikipedia about changing the title of the article on Yr Wyddfa. The data sets also offer flexibility: a name can have many variants, including multiple ‘official’ names, and different names can be noted for different time periods. Consumers of the data then have a choice of what data they want to present on their map. Recently both the BBC and Welsh Government have used this open data to serve Welsh language maps to the public.
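To make the idea of name variants more concrete, here is a minimal sketch of how a consumer of the data might model and select them. The `NameVariant` structure and `names_for` function are hypothetical, not the schema used by Wikidata or OpenStreetMap, and the `official` flags in the sample are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameVariant:
    text: str
    language: str                     # language code, e.g. "cy" or "en"
    official: bool = False
    start_year: Optional[int] = None  # None means no known start
    end_year: Optional[int] = None    # None means still in use


def names_for(variants, language, year=None):
    """Return variants in one language, valid in `year`, official names first."""
    selected = []
    for variant in variants:
        if variant.language != language:
            continue
        if year is not None:
            if variant.start_year is not None and year < variant.start_year:
                continue
            if variant.end_year is not None and year > variant.end_year:
                continue
        selected.append(variant)
    # False sorts before True, so official names come first.
    return sorted(selected, key=lambda variant: not variant.official)


# Illustrative sample only.
mountain = [
    NameVariant("Snowdon", "en"),
    NameVariant("Yr Wyddfa", "cy", official=True),
]
print([v.text for v in names_for(mountain, "cy")])
```

A map maker could then choose to show only Welsh names, only official names, or the names in use at a particular date.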

 

 

We have also used this rich open data to ensure that Welsh Wicipedia has basic articles about (almost) all towns and villages in Wales. We recently created over 800 of these and are working with volunteers to enrich Welsh language content about our places. One of our volunteers has created dozens of articles about historic streets and buildings in Wrexham and we are planning an editing event in partnership with the Welsh Place-Name Society to further improve Wikipedia content about Welsh placenames, their history and their meaning. If you are interested in taking part, you can find out more here.

 

 

We will also be working with Menter Iaith Môn to teach school children how to add information about their community to Wikipedia in Welsh, and to collect sound bites of children pronouncing their local placenames. These too will be made freely available on Wiki.

 

This project allows us to do more than simply archive and give access to records. This is about engaging with the public and supporting the development of digital infrastructure for Welsh placenames. This will also enable us to think about how we present our collections in the context of place and time. Another output of our work this year will be a prototype map for viewing our collections in both English and Welsh, which we hope will be a positive step towards the development of a truly bilingual search and discovery solution, with the flexibility to adapt quickly to positive change, like the recent renaming of Yr Wyddfa (Snowdon).

 

Jason Evans, Open Data Manager

 


Wikipedia Translate-a-thon

Events / News / News and Events / Research - Posted 11-10-2019

To celebrate Libraries Week the National Library hosted a Welsh language Translate-a-thon for students at Aberystwyth University hoping to pursue a career as translators. The goal was to translate existing English Wikipedia articles about famous writers into Welsh. The event was part of a wider WiciLlên project, funded by the Welsh Government and aimed at improving online access to Welsh language information and data about literature and the Welsh bibliography.

The National Library of Wales’ National Wikimedian helps the library support and contribute to Wikipedia. The Welsh language Wikipedia has been the focus of this work since collaboration began in 2015. The Library and its main funder, the Welsh Government, have recognised the importance of this hub of Welsh language knowledge in building a sustainable and thriving future for the Welsh language – Welsh Wicipedia is already the most viewed Welsh website and now has over 100,000 articles. However, there is still lots of work to do in order to give access to ‘all knowledge’ in Welsh.

The Library has been working with the Professional Translation Studies course at Aberystwyth University for several years, building on the idea that using Wicipedia’s content translation tool for perfecting translation means students can actively contribute to the improvement of freely available Welsh language content whilst studying, giving real value to their assignments.

Course leader Mandi Morse says: “We are delighted to be able to take advantage of the Wikipedia platform while teaching the postgraduate Professional Translation Studies course. It gives our students great experiences as they develop their translation skills, giving them the opportunity to practice translating all kinds of subjects and contexts. Wikipedia is certainly extremely useful and enriches our provision”

12 students attended the event at the National Library, and 9 new articles were created, in many cases making information about these people available in Welsh for the first time. New articles include German novelist Gerhart Hauptmann, who won the Nobel Prize for Literature in 1912, and English children’s author Joan Aiken. A full list of the articles created is available on Wikimedia.

We hope to facilitate similar events in the future in order to support the improvement of Welsh language content online and to encourage Welsh Universities to think about how they can do the same.

Jason Evans

National Wikimedian


The Dictionary of Welsh Biography

Collections / News / Story of Wales - Posted 13-09-2019

This post is a part of the Story of Wales series, which looks at different aspects of Welsh history, and how today’s Wales remembers, and shapes it. Subscribe to the blog on the right to ensure you don’t miss any posts.

Developing an interactive timeline

Wales is a small but proud nation, a nation which has contributed more than its fair share of reformers, inventors and innovators to society. From Aneurin Bevan’s NHS to Edward George Bowen’s development of Radar, Wales’ contribution to technology and civilisation as a whole should not be underestimated. And let’s not forget, Wales too has entertained us with sporting greats, actors like Richard Burton and a plenitude of musical talent.

The Dictionary of Welsh Biography has for many years recorded the lives of our most celebrated people, so that we never forget their contribution to Wales and the world. Since 2004 all these biographies have been available bilingually on the Dictionary of Welsh Biography website, and it’s regularly updated with new entries – over 5000 and counting.

In recent years, in a bid to make this resource as open and accessible as possible, the National Library has been sharing the data behind the website openly to Wikidata – a lesser known sister of the one and only Wikipedia, designed for sharing information as data, rather than prose, freely and openly with the world. Like Wikipedia anyone can edit and improve the data in Wikidata and we now have a rich resource of data about our 5000 VIPs. Wikidata lets us plot birthplaces on a map, it lets us connect data about people’s education with data for the schools and universities they attended, and we can see which other institutions hold relevant records, like portraits or archives.

Our volunteer team has also been busy using the Dictionary of Welsh Biography to create Wikipedia articles for the people, so that we effectively have two versions of every article – one a peer reviewed and carefully managed historical record, and the other, a community managed, constantly evolving article which anyone can contribute to and reuse freely.

Following the launch of a new website for the Dictionary of Welsh Biography last year, we secured funding to work with developers to add a new and exciting feature. Using the enriched data from Wikidata, and thousands of digital images from the library collections, we are developing an interactive timeline which will allow users to explore all 5000 people in the dictionary chronologically. Click on a person on the timeline and you will be able to see the relevant Dictionary of Welsh Biography entry and the Wikipedia article.

What’s more, the timeline will allow users to filter the records based on where they were born, where they were educated, their occupation and more. And these filters can be used in combination, so if you only want to see the footballers born in Aberdare, that’s fine! The Library has also carefully curated a timeline of important events in Welsh history which can be overlaid on the timeline to give more context to the lives of these people.
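Combining filters in this way amounts to keeping only the records that satisfy every chosen criterion. A minimal sketch, assuming hypothetical records and field names rather than the timeline’s real data model:

```python
def filter_people(people, **criteria):
    """Keep only the records that match every given field value."""
    return [
        person for person in people
        if all(person.get(field) == value for field, value in criteria.items())
    ]


# Hypothetical records; the names and field names are invented for the example.
people = [
    {"name": "Person A", "occupation": "footballer", "birthplace": "Aberdare"},
    {"name": "Person B", "occupation": "footballer", "birthplace": "Bangor"},
    {"name": "Person C", "occupation": "poet", "birthplace": "Aberdare"},
]

matches = filter_people(people, occupation="footballer", birthplace="Aberdare")
print([person["name"] for person in matches])
```

Each extra criterion narrows the result further, and calling the function with no criteria returns every record.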

This level of interaction and customisation will help bring the Dictionary of Welsh Biography to life. It will be easier than ever before to search and discover the lives of our most important citizens – the people who helped shape the story of Wales.

The timeline should be live later this year.

Jason Evans

National Wikimedian


The WiciLlên project

Collections / Events / News / News and Events / Research - Posted 10-09-2019

Sharing data and information about Welsh literature with the world

The National Library of Wales, working in partnership with Menter Iaith Môn for a second time, has secured a grant from the Welsh Government for the WiciLlên project, an ambitious project focused on openly sharing information about Welsh literature on the Wikimedia projects.

The project will consist of two main strands. Firstly the National Library will begin sharing a huge dataset of all books of Welsh interest ever published in Wales. This dataset contains information about nearly half a million books, their authors and publishers.

As part of the WiciLlên project the first 50,000 of those records will be enriched and shared as linked open data on Wikidata. The data will be searchable and reusable in dozens of languages, including Welsh. This will improve access to this important dataset, help improve citations on Wikipedia and provide opportunities for developers and researchers wishing to re-use the data.

The second strand of the project will focus on improving content on the Welsh Wikipedia. The National Library will deliver a Hackathon event and a series of Wikipedia editathons, whilst Menter Môn’s Wikipedian in Residence will deliver events for school children of different ages.

Nia Wyn Thomas, who heads Menter Iaith Môn said: “It’s a privilege, as always, to work with Wikimedia UK and the National Library to enrich open content in Welsh through the skilled hands of Anglesey’s children. Over the period of the collaboration, we are proud of the work that has been achieved, and the impact of the work around developing children’s digital competency through the medium of Welsh, be it their first, or second language. The influence of the work on the development of the Welsh language is also great, in a field where the language is not always seen as progressive”

The project has already started and will run until March 2020.

Jason Evans

National Wikimedian


Treasured Manuscript collection gets the Wikidata Treatment

Collections / Digitisation / News / Research - Posted 11-06-2019

Converting metadata to linked open data.

The Peniarth Manuscripts form one of the most important collections held by the National Library of Wales. Its 560 manuscripts date from the 11th Century onward and contain some of the most important and iconic Welsh literary works in existence, including stories from the Mabinogion, the Book of Taliesin and the earliest copies of the ancient law books of Wales. In 2010 the collection was included in the UNESCO UK Memory of the World Register, further underlining its importance as a national treasure.

The collection has of course been catalogued and digitisation of the entire collection is currently underway. So now seemed like a good time to explore the potential of linked data in order to better understand and explore the makeup of the collection.

At the National Library of Wales we have now converted collection metadata to Wikidata for a number of collections including paintings and printed material. This has led to an enrichment of data and easy access to tools for querying and visualizing the collections. Creating Wikidata for each of the Peniarth manuscripts would result in similar advantages, but first the existing metadata would have to be cleaned and refined before being mapped to entities within Wikidata. Some mappings were easy, for example metadata tags for parchment and paper were easily matched to the relevant Wikidata entities. Dates and measurements simply needed formatting in a particular way in order to add them to Wikidata, and the QuickStatements (QS) upload tool contains detailed instructions on how to do this.
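As an illustration of the kind of formatting involved, the sketch below builds tab-separated lines in what I understand to be the QuickStatements version 1 text format. The helper functions are hypothetical, the item ID is a placeholder, and the property and unit IDs used (P571 for inception, P2048 for height, Q174728 for centimetre) should be checked against Wikidata and the QuickStatements help page before use.

```python
def qs_year(year):
    """Format a year as a QuickStatements time value with year precision (/9)."""
    return f"+{year:04d}-00-00T00:00:00Z/9"


def qs_quantity(amount, unit_qid):
    """Format a quantity with a unit, e.g. 25 and 'Q174728' become '25U174728'."""
    return f"{amount}U{unit_qid.lstrip('Q')}"


def qs_line(item, prop, value):
    """Join one statement into a tab-separated QuickStatements line."""
    return "\t".join([item, prop, value])


# Placeholder item ID and invented values, for illustration only.
item = "Q00000000"
print(qs_line(item, "P571", qs_year(1250)))
print(qs_line(item, "P2048", qs_quantity(25, "Q174728")))
```

Generating the lines from a spreadsheet in this way keeps the formatting consistent across a whole batch.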

Much of the data already existed in set data fields, making mappings fairly straightforward. However, the metadata for many manuscripts also included a text based description of the item, which in many cases included additional information such as the names of scribes and people whose works are represented within the manuscript (authors). Extracting this data was more difficult. By filtering searches for specific sentence structures and/or certain keywords it was possible to semi-automate the extraction of this data, but it also required manual checking to avoid mistakes. Once the names, works, subjects and genres were extracted they then had to be matched to Wikidata items. If these did not yet have a Wikidata item, one was created whenever possible using data from other sources.
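A semi-automated extraction of this kind might look something like the sketch below. The trigger phrases and sample descriptions are assumptions made for the example, not the patterns or catalogue text actually used, and the results would still need manual checking.

```python
import re

# Hypothetical trigger phrases followed by a run of capitalised words.
SCRIBE_PATTERN = re.compile(
    r"\b(?:in the hand of|written by|transcribed by)\s+"
    r"((?:[A-Z][\w.'-]*\s?)+)"
)


def extract_scribes(description):
    """Return candidate scribe names found in a free-text description."""
    candidates = SCRIBE_PATTERN.findall(description)
    # Trim trailing spaces and punctuation picked up by the pattern.
    return [name.strip(" .,;") for name in candidates]


# Invented descriptions, for illustration only.
print(extract_scribes("A volume of poetry in the hand of John Jones, with later notes."))
print(extract_scribes("Pedigrees written by Robert Vaughan and others."))
print(extract_scribes("A sixteenth-century prayer book."))
```

A pattern like this will miss names containing lower-case particles such as ‘ap’, which is one reason a human review step remains necessary.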

The ontology for describing manuscripts on Wikidata is still being tweaked, so in order to properly separate and describe both the scribe/copyist of a work and the authors of works included in a manuscript it was necessary to create a new property on Wikidata, which can now be used to describe the scribe, calligrapher or copyist of a manuscript work.

Once the data was prepared in a spreadsheet it was uploaded to Wikidata in stages using the QuickStatements tool. We also uploaded sample images of the 100 or so manuscripts which have already been digitised to Wikimedia Commons. Since the implementation of structured data on Commons, any upload which links to the relevant item on Wikidata now pulls in much of the relevant descriptive data automatically, meaning there is a lot less work involved in preparing a batch upload of images than in days gone by. Since the National Library uses IIIF technology to display its digital assets, we also included persistent IDs to our image viewer and links to IIIF manifests in our Wikidata upload.

Once the data is uploaded it can immediately be queried and explored using the Wikidata SPARQL Query Service. This tool has a suite of visualisation options, but there are a number of other useful visualisation tools which can be used in conjunction with a SPARQL query without the need for any coding knowledge, such as the Wikidata Visualisation suite and RAWGraphs.

In many cases it is technically possible to retrieve the same data from standard metadata as from the linked data – it’s just that we don’t have the tools to easily do so. With the linked data, for example, we can easily list manuscripts from smallest to largest, or oldest to youngest, or explore the relationship between the size of a manuscript and the date it was created.

Interestingly, this query clearly shows a trend of increasing size in the manuscripts over time and it also seems to point to a trend towards producing manuscripts of similar sizes at different periods in time.
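For readers who want to try something similar, here is a rough sketch of how a size and date query could be assembled. It is not the query used for the visualisation: the function is hypothetical, the collection ID must be supplied by the reader, and the properties (P195 collection, P571 inception, P2048 height) are my assumption about how the items are modelled.

```python
def manuscripts_by_size_query(collection_qid):
    """Build a SPARQL query listing a collection's items from smallest to largest."""
    if not (collection_qid.startswith("Q") and collection_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {collection_qid!r}")
    return (
        "SELECT ?item ?itemLabel ?inception ?height WHERE {\n"
        f"  ?item wdt:P195 wd:{collection_qid} ;\n"   # held in this collection
        "        wdt:P571 ?inception ;\n"             # date of creation
        "        wdt:P2048 ?height .\n"               # height of the item
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "cy,en". }\n'
        "}\n"
        "ORDER BY ?height"
    )


# "Q123" is a placeholder, not the real ID of the Peniarth collection.
print(manuscripts_by_size_query("Q123"))
```

The printed text can be pasted into the Wikidata Query Service; changing the last line to `ORDER BY ?inception` would list the items from oldest to youngest instead.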

We can also easily analyze data about the language of the works in the collection. It’s worth remembering that many works contain texts in more than one language, but we know that 43% of items contain Welsh language text whilst 33% contain English and 19% contain Latin.
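Because one manuscript can contain several languages, each percentage is worked out against the total number of items, so the figures need not sum to 100%. A small sketch with invented data shows the calculation:

```python
from collections import Counter


def language_shares(items):
    """Return the percentage of items containing each language.

    `items` holds one collection of language codes per manuscript.
    """
    items = list(items)
    counts = Counter(lang for langs in items for lang in set(langs))
    return {lang: 100 * count / len(items) for lang, count in counts.items()}


# Invented sample: four manuscripts, two of them multilingual.
sample = [{"cy"}, {"cy", "la"}, {"en"}, {"en", "cy"}]
print(language_shares(sample))
```

In this sample the shares are 75% Welsh, 50% English and 25% Latin, which add up to more than 100% because two items are counted under two languages.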

See the Sparql Query

Whilst this is definitely useful, the extra information extracted from text descriptions in the metadata begins to enrich and add further value to the data, allowing us to perform new queries on the data. For example, we can attempt to break down the collection by genre and main subject for the first time. This of course is only as accurate as the original data, and in some cases the variety of content within a single manuscript makes it impossible to apply an overarching content type, but in terms of research and discoverability, the data certainly provides new insight. For example, we can identify all manuscripts which contain correspondence, and then see who the main subjects of that correspondence are, and because Wikidata is linked data we could then access biographical data about those people.

See the full visualization on Wikimedia Commons. See the Sparql Query

Many of the manuscripts in the Peniarth collection include copies or partial copies of other notable works; in fact, some of the manuscripts are copies of other manuscripts in the same collection. Using Wikidata’s ‘Exemplar of’ property it was possible to connect the manuscripts to data items for the works they contained. Again, I suspect the original metadata does not identify all the works included in the manuscripts, so the results of any query will not be exhaustive, but they will represent all of the current data in our catalogue.

See the full visualization on Wikimedia Commons. See the Sparql Query

We can see from the visualisation that no fewer than 22 manuscripts contain text from the codification of Welsh Law by Hywel Dda, 21 manuscripts are copies of other manuscripts in the collection and 12 are exemplars of various printed books.

Using the newly created Scribe property on Wikidata we have been able to link data for each manuscript to the data items for every scribe mentioned in the metadata. Three scribes stand out as the most prolific, with their handwriting appearing in dozens of manuscripts. Two of the three, Robert Vaughan of Hengwrt and W.W.E. Wynne of Peniarth, once owned much of the collection and did much to annotate and copy the texts. The third, John Jones, was a well known collector and scribe, and is credited with copying many texts which might otherwise have been lost forever. By exploring which scribes contributed to which manuscripts we can identify connections between otherwise unconnected individuals.

See the Sparql Query

Finally, it’s important to underline the fact that Wikidata doesn’t just allow us to explore individual collections in new ways, it acts as a hub, joining collections together in an ever expanding web of cultural heritage data. We have added a lot of data for people in the Dictionary of Welsh Biography for example, and a simple query now allows us to identify all of those who contributed to the Peniarth collection.

In the same way, we can link to collections in other institutions, many of which are also beginning to add their collections to Wikidata. Oxford University is one such institution, and this means that manuscripts of Welsh interest at Jesus College, like the Book of the Anchorite of Llanddewi Brefi and the Red Book of Hergest, are now connected through linked data to the copies of those manuscripts in the Peniarth Collection.

Run the live query on the Wikidata Query Service

As more and more collections are added to this huge linked open network we will increasingly be able to reconcile, explore and make sense of our combined cultural heritage, which for hundreds of years has existed in closed silos. By applying new technology and open licensing, cultural institutions can now breathe new life into old data, and reach a wider audience than ever before.

Jason Evans

National Wikimedian

Wikimedia and the culture sector – A new impact study.

Collections / Events / News / Research - Posted 16-05-2019

In December 2017 the Library’s National Wikimedian began work on a Welsh Government-funded project to improve the quality of information about people related to Wales on the Welsh-language Wikipedia.

The entire project was planned using Europeana’s new Impact Playbook with the aim of exploring and documenting the changes, or impacts, to different stakeholder groups of delivering a range of Wikimedia-based activities focused around collections at The National Library of Wales.

A full impact report for the project has now been published.

The Impact Playbook works by creating clearly defined change pathways with measurable outcomes (or changes) during the planning process, ensuring that a wide range of outcomes and desired impacts can be assessed and measured at the end of the project.

This is the first time a project focused on Wikipedia-based activities has been assessed in this way, so this was a great opportunity to explore and document the impact of working with Wikimedia in the culture sector.

The project focused around the release of 4,862 Welsh portraits to Wikimedia Commons, with an emphasis on improving access to Welsh language content and providing opportunities for the public to engage through the medium of Welsh.

Bilingual Wikidata was created for each portrait. This data was used to help create nearly 1,500 new Welsh Wikipedia articles, utilizing 25% of the images. The images generated 1.6 million page views in 55 languages in the space of a month, greatly increasing access to information about Welsh people.

Working with Menter Iaith Môn, a series of events was held at schools, highlighting how Wikipedia-based learning can contribute positively to schools’ targets for the Welsh language and digital literacy.

A ‘hackathon’ event demonstrated the value of open data to the creative industries in Wales and a number of use cases were documented.

The project demonstrates how working with Wikimedia can help cultural heritage institutions build and support new communities and achieve outcomes which align with their core values whilst increasing access to, and use of, their digital collections.

Jason Evans

National Wikimedian

From Planting Crops to Planting Trees: Telling the Story of the Forest

Collections / Digitisation / Discover Sound / News / Screen and Sound - Posted

For the last 14 weeks, as part of the Unlocking Our Sound Heritage project, the Library has welcomed 10 students from the MA Archive Administration and MSc Digital Curation courses at Aberystwyth University’s Department of Information Studies to work on one of our sound collections. We would like to thank the students for all their hard work and contribution towards the project, and Crystal Guevara for writing this blog post about their time spent with us.

Timber, forest fires, road building, and World War II stories are just some of the subjects that are covered in a collection made up of 167 MiniDiscs, each containing interviews recorded from people who worked for or around the Forestry Commission.

As part of the Unlocking Our Sound Heritage project, the National Library of Wales is working on preserving and making available sound recordings that tell the story of Wales through oral histories. Dr. Sarah Higgins, professor at Aberystwyth University, arranged for ten students in the post-graduate Archive Administration course to help the Library work on a project called the Story of the Forest.

I was one of the ten students who got to work on the project, and I found that my appreciation for the people who had started the work for this story grew from admiration to urgency so that more people could hear and learn from the experiences of the forest workers who transformed the landscape of rural Wales.

The majority of these recordings are in Welsh, the remainder being in English. To place you somewhere in the timeline of history, we’re talking about mid-twentieth-century Wales. Due to a high demand for timber, the Forestry Commission bought slate quarries and farms to transform those areas into plots for forestry farming. Naturally, this meant an adjustment in lifestyle, and some people adjusted well to it while others longed for the way that things used to be. The people who were a part of these changes weren’t interviewed until 2002 and 2003, by a team of interviewers who were motivated to get on record the rich details of this time of transition and change.

Because the interviews were recorded on MiniDiscs, they needed to be rescued from becoming completely inaccessible, since so much of the technology around MiniDiscs has already become obsolete. So, our job as archive students was to digitise the recordings on the MiniDiscs, catalogue all of the interviews, transcribe them, and then put together an online exhibit to showcase some of these interviews along with old photographs provided by the interviewees. To get all of this done we got to work with some of the Library staff. They gave us guidance on what to do and we in turn strategized the timetable and roles and responsibilities.

Everyone on the team got to perform unique tasks, and we sought to rotate everyone through all the necessary jobs so that each of us got a chance to try different things. Each task required a different learning process and each one was vital to make these stories publicly available.

During the digitising I was able to appreciate having technology that allowed us to continue preserving these stories. While transcribing, I got to hear first-hand the core of what we were doing. Listening to the interviews was insightful and eye-opening. They contain stories about forestry policy, road building, nursery work, farm life, and other topics like Land Army Girls, Prisoners of War, and life post-World War II. Then, while cataloguing, we strove to do things meticulously but efficiently, to create usable information that would help future users navigate through the collection.

To become more connected with the project and feel the real human connection with the interviewees and their stories, we organized a trip to Corris. Corris is one of the places mentioned often in the oral histories and only a 40-minute drive from Aberystwyth. While we were there, we could see for ourselves the different types of trees in their separate sections, covering the hills. We took pictures of our visit to include in the online exhibit and add our own perspective to continue telling the story of the forest.

It was a great journey from beginning to end. As we are only aspiring archivists at the moment, we relied heavily on the knowledge of all the library staff helping us work the technology and understand the metadata standards. Alison Smith, Berian Elias, Rhodri Shore, Gruffydd Jones, and Elena Gruffudd were especially helpful. That in and of itself was a lesson in how to help and educate people who are learning to use archives.

To see these oral histories start off in a cardboard box and now find them searchable on the British Library catalogue brought a great sense of accomplishment for the entire team.

Seventeen of these stories are now available to listen to online on the People’s Collection Wales website, along with more detailed stories about the specific process of cataloguing, digitising, transcribing, and work on the exhibit.

Crystal Guevara

MA Archives Administration

International Conscientious Objectors’ Day

Collections / News / Screen and Sound - Posted 15-05-2019

As International Conscientious Objectors’ Day is taking place across the globe today, 15th May, the National Library of Wales’ Screen and Sound Archive would like to draw your attention to a short film available on the BFI Player entitled Defending This Country Only Means Attacking Another. The title was taken from a Peace Pledge Union [PPU] placard seen in the film, which was shot by Mr J. Fred Phillips, a cinema operator in Brynmawr from 1923 to 1958. He was also captain of the Monmouthshire Golf Club, Abergavenny, and husband of Pollto Williams, a finalist in several national ladies’ golf championships at Llandrindod. Other placards indicate the PPU beliefs: Mass Murder is No Defence of Liberty and Peace is Indivisible – We Seek Peace on Earth, Goodwill to ALL MEN. Hand-crafted placards that have a drawing of a blood splatter on them accompanied by the words Munitions from Ebbw Vale suggest that this could be a protest against the opening or operation of such a factory in the area. The Society of Friends (Quakers, pacifists) had set up projects for the unemployed (e.g. boot and furniture making; see also the film Eastern Valley on the BFI Player) in Brynmawr and area during the 1930s, but many of the unemployed found work in munitions factories in Ebbw Vale during WWII. Or, given that the PPU undertook a Carlisle to London peace campaign in 1938, could this footage show a campaign visit to Ebbw Vale?

Click here to view the film: Defending This Country Only Means Attacking Another (1938)

The Peace Pledge Union [PPU] was initiated in 1934 by Canon Dick Sheppard, who had been an Army Chaplain during WWI. He wrote a letter to the newspapers asking men (as women were already active in the peace movement) to sign a pledge if they were sickened by what looked like the stirrings of another war: ‘I renounce war, and I will never support or sanction another.’ He was overwhelmed by the response. The movement included women from 1936. Today, the PPU is the provider of the white poppies worn on Remembrance Day. Such poppies were first worn, at the instigation of the Co-operative Women’s Guild, on Armistice Day, 1933 (Armistice Day became Remembrance Day after the Second World War). Many of the women had lost loved ones during WWI and despaired at ongoing preparations for further war. It was also felt that remembrance should include all the non-military victims of war. The pledge today is as follows: ‘War is a crime against humanity. I renounce war, and am therefore determined not to support any kind of war. I am also determined to work for the removal of all causes of war.’

Mary Moylett,

Cataloguer (Film) Screen and Sound Archive

Launch of the new UK web archive

Events / News - Posted 08-04-2019

Last month marked 30 years since the invention of the World Wide Web. Fortunately, the National Library of Wales and its partners have been archiving Welsh websites and preserving this history for generations to come. As a result, at the end of last year, the new UK Web Archive website was officially launched. This new site is a response to changes made to Legal Deposit legislation following the passing of the Legal Deposit Libraries (Non-print Works) Regulations in 2013, meaning Legal Deposit now encompasses electronic and online material such as websites, blogs, e-magazines and materials on CD-ROM.

The purpose of the UK Web Archive is to collect, preserve and give permanent access to key UK websites for future generations. The 2013 Regulations presented the Legal Deposit Libraries with a huge challenge, as one of the requirements is to archive the whole UK Web Space. As with previous UK Legal Deposit Acts, primarily dealing with print material, legal deposit of online material extends only to items published in the UK.

Furthermore, due to the 2013 Regulations, the scope of our collecting substantially increased. For instance, the UK Web Archive collects many millions of websites and billions of individual “assets” (HTML pages, images, PDFs, videos etc.). Since 2017, the UK Web Archive has collected approximately 500TB of data. At least once a year, the British Library performs an automated “crawl” under the terms of the Non-Print Legal Deposit Regulations 2013 to capture as many UK websites as we can identify. This will result in further substantial increases in the huge amount of data that we now collect.

The National Library of Wales and our Legal Deposit Library partners, led by the British Library, had been archiving websites from 2003 to 2013, but this was a permission-based model. In order for us to archive a website we needed prior permission from the site owner. Because of the new Regulations, we no longer need permission to archive a site if it is published in the UK.

As for access, the site is viewable from here. However, under the Non-Print Legal Deposit Regulations 2013, access to much of the archived content is restricted to a UK Legal Deposit library reading room. Therefore, you will see a ‘viewable only on Library premises’ notice alongside many descriptions of archived websites, directing you to one of the UK Legal Deposit Libraries for access.

The UK Web Archive’s aim is to provide ‘open access’ to as many of these sites as possible; therefore we are still contacting owners of websites requesting permission for us to open up access to archived versions of their websites. For instance, we have had an arrangement for a number of years with the Welsh Government allowing us to provide open access to their growing list of websites.

Of course, the UK Web Archive will continue to expand and develop over the coming months and years. The UK Web Archive is one of many initiatives undertaken to successfully respond to the new Regulations and the challenge that the digital black hole presented to us as Libraries. Now that the site is live, we hope to increase interaction with our users. For instance, a feature of the site is Special Collections, and if you would like to see content included in one of our special collections or provide general feedback on the UK Web Archive then please get in touch. We will be very happy to hear from you.

Aled Betts,

Acquisitions Librarian

About this blog

A blog about the work and collections of the National Library of Wales.

Due to the more personal nature of blogs it is the Library's policy to publish postings in the original language only. An equal number of blog posts are published in both Welsh and English, but they are not the same postings. For a translation of the blog readers may wish to try facilities such as Google Translate.