New Archive of 632 Greek Texts with OCR
#1
Quote:Bruce Robertson at Mount Allison University has performed high-quality optical character recognition on over 600 volumes of ancient Greek in collaboration with Federico Boschetti of the CNR, Pisa. Page images with corresponding OCR output and freely downloadable archives of all stages of processing are available at the project website: http://heml.mta.ca/lace 

Complete or near-complete series include the volumes in Commentaria in Aristotelem Graeca, Cramer's Catena Graecorum Patrum, and Meyer's Critical and Exegetical Handbook series of biblical commentaries.

The collection also comprises many collections of fragments, papyri, etc., such as Kock, T. Comicorum Atticorum Fragmenta 3, Koerte's fragments of Metrodorus of Lampsacus and Diels' original Die Fragmente der Vorsokratiker.

There are important Canadian elements to this effort: it was made possible through a specially allocated High Performance Computing environment from Compute Canada; and most of the excellent page images needed to start this process were produced at the University of Toronto through a collaboration with archive.org.

The quality of the OCR is varied, but they have photos as well as the scanned text, and where else can you read about the storied EQTA eNI hEBAS, or Dio's gripping account of the Bellum Piςaticum? For our martial members, they have a Greek Polyaenus, the Poetae Lyrici Graeci for Tyrtaeus and Alcaeus, and many historians.
Nullis in verba

I have not checked this forum frequently since 2013, but I hope that these old posts have some value. I now have a blog on books, swords, and the curious things humans do with them.
#2
Alcman*

Yeah, good catch this. I saw it advertised the other day and just had a play around with it. I don't know, it's bloody impressive as a feat of technology and, as they say, the rights management is much freer than TLG and Perseus, but... so many of these texts are so out of date as to be near unusable. I really couldn't recommend anything they've got in lyric or drama, since both editing and papyrus finds have drastically changed the game.

I love some of the ancillary books they've put up though, like Evangelinus Sophocles' lexicon... It might be outdated here and there, but it's well organised and a saviour for students, and omg they've got a fully OCR'd Zosimus on there.

This tool has some great potential. Thanks Sean.
Jass
#3
Hmmm ... had a look at a Loeb Arrian and it was just gibberish. :dizzy:
posted by Duncan B Campbell
https://ninth-legion.blogspot.com/
#4
Quote:Alcman*
It is worse than that, since I was thinking of Alcaeus not Alcman! I wonder if they chose some books as test material for their OCR software. Yet even a bad edition is better than nothing, and not everyone has convenient access to a university library or the TLG. Their editions include the apparatus criticus, and that is valuable.

Quote:Hmmm ... had a look at a Loeb Arrian and it was just gibberish. :dizzy:
They seem to do better on the Teubner font with its four-stroke kappas and lunate sigmas. The errors on those texts seem to occur at a rate of one every few lines, which is rare enough that one can correct by hand.

The FAQ says that they will do more proofreading once their access to a data centre expires.