Interdisciplinary Teaching and the 1 or 2 Rule

From Sharon J. Harris, PhD Candidate in the English Department and HASTAC Scholar.

Interdisciplinary work has a kind of X-factor. Perhaps because it inherently crosses or defies boundaries, we tend to view interdisciplinary work as innovative and insightful. But for all its buzz, we risk underestimating the demands of such work. As the academic year ends, I have been reflecting on the process of designing and teaching a new course this semester that made use of several different forms of media. I’ve taught two courses now that blend literature and music, and each time I reach a point in the semester, usually an otherwise nondescript Wednesday, when I am gobsmacked at how much work it takes just to prepare class materials.


Besides the lesson planning, grading, reading, note-taking, typing, copying, printing, and student-email-responding I expect to do in the workload of teaching a course, I’m also listening, vetting music tracks and YouTube videos, reading three different kinds of background material (because of three different disciplines), scanning readings (because no single textbook has all the material I need), uploading files, figuring out delivery of electronic materials, writing playlist listening guides, researching technology platforms, and on and on. Don’t get me wrong: I’m not saying interdisciplinary courses aren’t worth it. They’ve been some of my most rewarding teaching experiences. But I do think that we, as teachers, need to approach the demands of designing interdisciplinary courses with our eyes wide open.

Maybe this is just an old-fashioned question of form vs. content. The content is exciting, full of new connections. But the form is new too. Are we ready for it? When we push and straddle the boundaries of various disciplines, we must be accountable not only to the content of more than one field of study but also to the form that information takes. For example, I have assigned my students to create annotated playlists. This project helps them to read music closely, using many of the same skills of literary analysis but applied to a different medium. I have to think through the best format for completing and turning in this kind of work. To do the assignment, the students need to learn to use music platforms that can create playlists and then share them. They also need to annotate those playlists and find a way to turn all of it in together. The software Scalar advertises that “anything can do anything to anything,” meaning that any media format can comment on and annotate any other. I have also considered SoundCloud because it allows users to place a comment at a designated timestamp in the music. Scalar, however, works best with long-form text-based work and has a somewhat steep learning curve. And the drawbacks to SoundCloud include limits on how much music a user with a free membership can post, and a comment format that does not lend itself to longer analyses.

The pros and cons that I weigh in deciding what form my students’ playlists should take lead me to the 1 or 2 Rule. I have adopted this rule for myself to manage the labor involved in interdisciplinary courses. The Rule is: pick 1 or 2 new aspects to add to or change about your teaching each semester. As you decide what to add or change, consider the following questions:

  • What is new about this semester? Do you already have big changes happening in your personal life?
  • Have you taught this course before? Do you have a textbook/are the materials already gathered, or will you be creating them from scratch?
  • How many other courses are you teaching?
  • How often will assignments and/or lectures and classes require you to curate multidisciplinary material?
  • Do you need to learn a new technology?
  • Will your students need to learn a new technology?

In my case, with the annotated playlist assignment, I was developing a new course, which already took quite a bit of work. So I decided to have my students create their playlists on Spotify or YouTube, technologies they most likely knew (though I have also learned not to assume that my students know even what seem to be the most common technologies!), and email their annotations typed in Microsoft Word. This system isn’t as compact or seamless as the other technologies might have been, but it had the benefit of being familiar and easy to communicate as they took on the challenge of thinking and writing about a new discipline.

As I develop my courses more fully, I will be in a better position to expand the forms and formats of my interdisciplinary content. This means that it may take a few years to develop these courses in the way that I would like, but it also helps keep me from burning out so that I can continue to create them. It turns out that learning new technologies takes time, both for me and for my students. We shouldn’t be surprised though: These new technologies are new forms of literacy, and becoming literate takes time. Literacy is worth it though. Twenty-first-century literacy can build skills and knowledge with new content and new forms through interdisciplinary teaching.

TEI and XML Markup for Absolute Beginners

You have likely already come across the term TEI in digital humanities circles. Perhaps you have (like me) downloaded a TEI edition of a text before, but have not known how to read it or what to do with it. You may have (also like me) even deployed the term in casual conversation without completely understanding what TEI is or what it does.

This post aims to provide you with a concise and basic introduction to TEI and XML markup. I hope to provide definitions for all technical terms and to point you to useful resources to reference either when using TEI-encoded texts or when you eventually create your own. I have also included some step-by-step instructions on how to create an XML markup of a text in the last section of this post.

  1. What is TEI?

TEI is an acronym for the Text Encoding Initiative. The “text” here is clear enough for the time being. But what about “encoding”? All text that we view on a website is encoded with information that affects the way the text is displayed to us, or the way in which a computer sorts and interprets it. There are many different languages in which one can encode text.

Because there was no single agreed-upon format or standard for encoding text, a group of humanists, social scientists and linguists launched the TEI in 1987 to standardize the way in which we encode text. It has since become a hugely popular standard for rendering text digitally and is still widely used today.

  2. What is XML?

Recent versions of the TEI standards (of which there have been several) make use of XML: Extensible Markup Language. XML is a text-based markup metalanguage similar to HTML (in fact, both are markup languages, hence the “ML” at the end of each acronym) that, like the TEI standards, has undergone several changes and updates over the years.

XML documents contain information and meta-information expressed through tags. The tags are similar to those used in HTML, if you are familiar with it. Below is a brief example of XML markup:

<p>the quick brown fox</p>

<!--this is a test-->

<p>jumps over the lazy dog</p>

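The items enclosed in angle brackets (“< >”) above are tags. In this case, the tag being used is “<p>”, which marks a paragraph. Every element must be both opened and closed, or your document will not be well-formed and will not parse. “<p>” is the opening tag and “</p>” is the closing tag; every closing tag is written the same way, with a slash before the element name. (An element with no content can instead be written as a single self-closing tag, such as TEI’s line-break tag “<lb/>”.) The text between the opening and closing tags is what will be encoded.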

To insert comments into your XML document that will not have any bearing on how your markup is processed, follow the format found on the middle line of the example above, i.e. <!-- YOUR TEXT HERE -->
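If you would like to test these rules on your own machine, the following is a minimal sketch in Python (my choice of language here; nothing in TEI requires it) using the standard library’s xml.etree.ElementTree module. Because an XML document may have only one root element, it wraps the example above in a hypothetical <div>. It shows that the comment is simply ignored and that deleting a single closing tag makes the markup fail to parse. Note that this checks only that the markup is well-formed, not that it follows the TEI standards.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# The example has two top-level <p> elements, but an XML document may have
# only one root, so we wrap it in a hypothetical <div> element.
example = """<div>
<p>the quick brown fox</p>
<!--this is a test-->
<p>jumps over the lazy dog</p>
</div>"""

# The same markup with the first closing </p> tag deleted.
broken = example.replace("</p>", "", 1)

root = ET.fromstring(example)
# The default parser drops comments, so only the two <p> elements remain.
paragraphs = [p.text for p in root.findall("p")]
print(paragraphs)               # ['the quick brown fox', 'jumps over the lazy dog']
print(is_well_formed(example))  # True
print(is_well_formed(broken))   # False
```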

  3. Why know or use TEI and XML?

Before we delve into the actual process of creating a simple XML markup of a text, you might already be wondering precisely how learning a bit of XML and TEI will be beneficial to your scholarship.

There are a few ways in which we can envision the uses of a working knowledge of XML and TEI. First and foremost among these is the possibility of you creating and disseminating your own TEI editions of texts – perhaps a transcription of a manuscript few others have seen, or handwritten notes you have uncovered in your archival research. In this case, you can let your more tech-savvy colleagues add your editions to their corpus – to query and explore the documents in any way they see fit.

As humanities scholars in the twenty-first century, we often find ourselves asking broad and provocative questions about our disciplines and our work. One particularly captivating question has always been about the nature of what we call “the text.” The digitization of texts has given this line of questioning renewed energy, and a basic understanding of the encoding process would also arm you with the vocabulary to take part in this conversation.

A final use of TEI and XML would involve you querying documents created by others. There is an enormous number of TEI editions of texts freely available online, and the more comfortable you are with code, the more you can do with them. One deceptively simple way of viewing the data in an XML file is turning it into a spreadsheet using Microsoft Excel. This is especially useful if you wish to add data from a TEI edition into a database for your thesis or dissertation. To do this in Excel, select the “Data” tab on the top utility bar and choose “From Other Sources” before selecting your XML file (the exact menu names vary between Excel versions).
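If you prefer a scripted route to Excel’s import, the sketch below pulls every <name> and <placeName> element out of a TEI file and writes them as spreadsheet-ready CSV rows, using only Python’s standard library. The sample document is a made-up stand-in for a downloaded edition, and the choice of elements is an assumption you would adjust to your own project. TEI P5 elements carry the namespace http://www.tei-c.org/ns/1.0, so the code has to match on namespaced tag names.

```python
import csv
import io
import xml.etree.ElementTree as ET

# TEI P5 elements live in this namespace; ElementTree writes tags as "{ns}name".
TEI_NS = "{http://www.tei-c.org/ns/1.0}"
WANTED = {TEI_NS + "name", TEI_NS + "placeName"}

# A made-up TEI fragment standing in for a downloaded edition.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p><name>Ada</name> walked from <placeName>London</placeName>
       to <placeName>Oxford</placeName>.</p>
  </body></text>
</TEI>"""


def tei_to_rows(markup: str) -> list[list[str]]:
    """Collect one [element, text] row per wanted element, in document order."""
    root = ET.fromstring(markup)
    rows = [["element", "text"]]
    for el in root.iter():
        if el.tag in WANTED:
            rows.append([el.tag[len(TEI_NS):], el.text or ""])
    return rows


def rows_to_csv(rows: list[list[str]]) -> str:
    """Serialize rows as CSV text that Excel or any spreadsheet can open."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


rows = tei_to_rows(sample)
print(rows_to_csv(rows))
```

To keep the result, write the string to a file with a .csv extension and open it in your spreadsheet program.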

  4. Finally, actually doing TEI and XML.

In this section, I will quickly walk you through the process of creating a simple XML document in accordance with the standards of TEI.

-What you will need

A plain-text editor is the most important tool you are going to need. Atom is a wonderful free editor that works well on Macs, and Sublime Text 2 is another good option. Once you save your file with an .xml extension, these editors should display your tags in color, making the markup process easier and making it more noticeable when you have forgotten to close a tag. You will also need a plain-text copy of the text you are working with.

-Tags

We have already gone over the basics of tags above. Another key issue here is deciding which tags to use, i.e. which tags are useful for the goals of your project or edition. These tags and their correct formatting can be found online (see http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-tag.html for a list of element tags), but a cursory list of common ones is given below:

<p>; <body>; <l>; <name>; <placeName>; <abbr>; <head>; <quote>; <date>; <time>

Once you have tagged all the elements you wish to tag in your text (making sure to close all of your tags!), you need to add a TEI header to ensure that your document adheres to the TEI standards. Examples of TEI headers can be found here: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/HD.html
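For orientation, here is a rough Python sketch of where the header sits: a TEI P5 document has a <TEI> root containing a <teiHeader> followed by a <text>, and a minimal header’s <fileDesc> includes a <titleStmt>, a <publicationStmt>, and a <sourceDesc>. The helper name wrap_in_tei and all of the sample values are hypothetical. The sketch only checks that the result is well-formed, so you should still run it through a TEI validator.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A minimal TEI P5 skeleton: <TEI> root, then <teiHeader> and <text>.
TEMPLATE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>{title}</title></titleStmt>
      <publicationStmt><p>{publication}</p></publicationStmt>
      <sourceDesc><p>{source}</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
{body}
    </body>
  </text>
</TEI>"""


def wrap_in_tei(body: str, title: str, publication: str, source: str) -> str:
    """Hypothetical helper: wrap tagged body markup in a minimal TEI skeleton.

    Header values are escaped as plain text; `body` is inserted as markup.
    Raises ET.ParseError if the result is not well-formed XML.
    """
    doc = TEMPLATE.format(
        title=escape(title),
        publication=escape(publication),
        source=escape(source),
        body=body,
    )
    ET.fromstring(doc)  # well-formedness check only, not TEI validation
    return doc


body = "<p>the quick brown fox</p>\n<p>jumps over the lazy dog</p>"
doc = wrap_in_tei(
    body,
    title="A Sample Edition",
    publication="Unpublished draft.",
    source="Transcribed from a hypothetical printed source.",
)
root = ET.fromstring(doc)
title = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", TEI_NS)
paragraphs = root.findall("tei:text/tei:body/tei:p", TEI_NS)
print(title.text)       # A Sample Edition
print(len(paragraphs))  # 2
```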

-Get Validated!

A useful resource for checking your code can be found here: http://teibyexample.org/xquery/TBEvalidator.xq

This validator will let you know whether your code follows the TEI standards, and if not, where your mistakes can be found.

  5. Further Reading and Resources

There are lots of online resources for TEI and XML, but here are a few of my personal favorites:

http://teibyexample.org/

http://www.tei-c.org/index.xml

https://wiki.dlib.indiana.edu/display/ETDC/TEI+Tutorial

ftp://ftp.uic.edu/pub/tei/teiteach/slides/louslide.html

 Alexander Profaci is an outgoing MA student in Medieval Studies at Fordham University. He will be entering the PhD program in History at Johns Hopkins University in the fall. Follow him on Twitter @icaforp.

Textual Analysis: The “Text” Part

Another marvelously instructive post from HASTAC scholar Tobias Hrynick, Department of History, Fordham University. Here he outlines some systematic tips and guidelines for creating a digitized version of a text. 

The digital analysis of text constitutes the core of the digital humanities. It is here that Roberto Busa and his team began, and though subsequent scholars have expanded the field somewhat, exploring the possibilities of digital platforms for applying geographic analysis or presenting scholarship to wider audiences, humanists’ interest in text has ensured that a healthy trunk has grown directly up from the root, alongside the later branches.

Every such project, however, must begin with the creation of a machine-readable text to which digital analytical tools can be applied. This process is generally more tedious than difficult, but it is nevertheless fundamental to digital scholarship, and a degree of nuance can be applied to it. What follows is a basic introduction to some of the appropriate techniques, highlighting useful tools, including some (such as AntConc and Juxta) which also have powerful and user-friendly analytic functionality.

Acquiring Text

The appropriate way of acquiring a machine-readable text file (generally a .txt file, or some format which can be easily converted to .txt), and the difficulty involved in doing so, varies according to several factors. Often, digital versions of the text will already exist, so long as the text is old enough that the copyright has expired, or new enough that it was published digitally. Google Books, Project Gutenberg, and Archive.org all maintain substantial databases of free digital material. These texts, however, are all prone to errors – Google Books and Archive.org texts are generally created with a process of scanning and automated processing that is likely to produce just as many errors as performing this process yourself.

Such automated processing is called Optical Character Recognition (OCR). If you are working from a print book, it requires a great deal of labor-intensive scanning, though a purpose-built book scanner with a V-shaped cradle will speed the work considerably, and a pair of headphones will do a great deal to make the process more bearable.


Once you have .pdf or other image files of all the relevant text pages, these files can be processed by one of a number of OCR software packages. Unfortunately, while freeware OCR software does exist, most of the best software is paid. Adobe Acrobat (not to be confused with the freely available Adobe Reader) is the most common, but another program, ABBYY FineReader, deserves special mention for its additional flexibility, particularly with more complicated page layouts, and for offering a free trial version.

As a quick glance through the .html version of any Archive.org book will confirm, the outcome of an OCRing process is far from a clean copy. If a clean copy is required, you will need to expend considerable effort editing the text.


The other option is simply to retype a given text from its print or image form into a text editor. Both Apple and Windows machines come with native text editors, but if you are typing at length, you might prefer a product like Atom or Notepad++. Neither of these programs provides any crucial additional functionality, but both offer tabbed displays, which are useful for editing multiple files in parallel; line numbers, which are helpful for quickly referencing sections of text; and a range of display options, which can make looking at the screen for long periods of time more pleasant. Alternatively, you can type out the text in a word processor and then copy and paste it into a plain-text editor.

Assuming there is no satisfactory digital version of your text already available, the choice between scanning and OCRing on the one hand and manually retyping on the other should be made with the following factors in mind:

  1. How long is your text?

This is important for two reasons. First, the longer a text is, the more the time advantage of OCR comes into play. Second, the longer a text is, the more acceptable individual errors within it become, which can sometimes make the time-consuming process of editing OCRed text by hand less critical. Digital textual analysis is best at making the kind of broad arguments about a text in which a large sample size can insulate against the effects of particular errors.

The problem with this argument in favor of OCR is that it assumes the errors produced will be essentially random. When OCR systems make mistakes, however, they are likely to make the same mistake over and over again. Confusions among the letters i, n, m, and r, and various combinations thereof, are particularly common, and such errors are likely to cascade across the whole text file. A human typist might make more errors over the course of a text, especially a long text in a clear type-face, but a human’s errors are likely to be more random, and a large sample size can more easily render random errors irrelevant.

That said, OCR should still generally be favored for longer texts. While automated errors can skew your results more severely than human ones, they are also more amenable to automated correction, as will be discussed in the next section.
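To make the point about automated correction concrete, here is a rough Python sketch of one way to exploit systematic errors: because OCR tends to repeat the same letter confusions, you can substitute each known confusion into an unrecognized word and accept the result if it appears in a word list. The confusion table and the tiny lexicon below are hypothetical placeholders, loosely based on the i/n/m/r confusions mentioned above. The sketch tries only one substitution per word, and automatic replacements still deserve a human check, since a real but unintended word can slip through.

```python
# Hypothetical confusion table (misread -> intended), built from the letters
# named above: i, n, m and r. A real table would come from your own file.
CONFUSIONS = [("rn", "m"), ("m", "rn"), ("in", "m"), ("m", "in"), ("r", "n"), ("n", "r")]


def candidates(word: str):
    """Yield every variant of `word` with one confusion substituted."""
    for wrong, right in CONFUSIONS:
        start = word.find(wrong)
        while start != -1:
            yield word[:start] + right + word[start + len(wrong):]
            start = word.find(wrong, start + 1)


def correct(word: str, lexicon: set[str]) -> str:
    """Return `word` if known, else the first known variant, else `word` unchanged."""
    if word in lexicon:
        return word
    for variant in candidates(word):
        if variant in lexicon:
            return variant
    return word


lexicon = {"the", "monk", "time", "modern", "read"}  # hypothetical word list
ocr_words = ["the", "rnonk", "read", "tirne", "modern", "qwz"]
print([correct(w, lexicon) for w in ocr_words])
# ['the', 'monk', 'read', 'time', 'modern', 'qwz']
```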

  2. What is the quality of your print or image version?

Several features of a text which might cause a human reader to stumble only momentarily will cripple an OCR system’s ability to produce good text. Some such problems include:

  • A heavily worn type-face.
  • An unusual type-face (such as Fraktur).
  • Thin pages, with ink showing through from the opposite side.

If your text or image has any of these features, you can always try OCRing to check the extent of the problem, but it is wise to prepare yourself for disappointment and typing.

  3. How do you want to analyze the text?

Different kinds of study demand different textual qualities. Would you like to know how many times the definite article occurs relative to the indefinite article in the works of different writers? You probably don’t need a terribly high-quality file to make such a study feasible. Do you want to create a topic model (a study of which words tend to occur together)? Accuracy is important, but a fair number of mistakes might be acceptable if you have a longer text. Do you intend to make a digital critical edition highlighting differences between successive printings of nineteenth-century novels? You will require scrupulous accuracy. None of these possibilities totally precludes OCRing, especially for longer texts, but if you choose to OCR, expect a great deal of post-processing, and if the text is relatively short, you might be better served by simply retyping it.
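As an illustration of the first, least demanding kind of study, here is a minimal Python sketch that computes the ratio of definite to indefinite articles in a text. The function name and sample sentence are hypothetical, and the crude tokenizer (runs of letters, lowercased) is an assumption. The point is that a count like this changes very little when a handful of scattered words are misread, as long as the text is long.

```python
import re
from collections import Counter


def article_ratio(text: str) -> float:
    """Occurrences of 'the' divided by occurrences of 'a' plus 'an'."""
    # Crude tokenizer: lowercase the text, then take runs of letters as words.
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    indefinite = counts["a"] + counts["an"]
    return counts["the"] / indefinite if indefinite else float("inf")


sample = "The cat saw a dog and an owl near the barn."
print(article_ratio(sample))  # 1.0
```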

Cleaning Text

Once you have produced a digital text, either manually or automatically, there are several steps you can take to help reduce any errors you may have inadvertently introduced. Ultimately, there is no substitute for an experienced proofreader reading the text slowly and out loud. A few automated approaches, however, can help to limit the labor for this proofreader or, if the required text quality is not high, eliminate the necessity altogether.

  1. Quick and Dirty: Quickly correcting the most prominent mistakes in an OCRed text file.

One good way of correcting some of the most blatant errors which may have been introduced, particularly the recurring errors which are common in the OCRing process, is to use concordancing software, i.e. software which generates a list of all the words which occur in a text. One such program is AntConc, which is available as a free download and also contains a package of useful visualization tools.


Once you have produced a text file, you can load it using AntConc, and click on the tab labeled Word List. This will produce a list of all the words occurring in the text, listed in order of frequency. Read through this list, noting down any non-words, or words whose presence in the text would be particularly surprising. Once you have noted down all the obvious and likely mistakes, you can correct them using the Find and Find and Replace tools on your preferred text editor.
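The same two steps can be scripted if you are comfortable with a little code. The Python sketch below is a rough stand-in for AntConc, not a description of how AntConc works: it builds a frequency-ordered word list for you to read through, then applies a table of corrections you have noted down as whole-word replacements. The sample text and the correction table are hypothetical.

```python
import re
from collections import Counter


def word_list(text: str) -> list[tuple[str, int]]:
    """Lowercased words with counts, most frequent first (ties alphabetical)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def apply_corrections(text: str, fixes: dict[str, str]) -> str:
    """Replace each listed error, as a whole word, with its correction.

    Matching is case-sensitive, like a plain Find and Replace.
    """
    for wrong, right in fixes.items():
        pattern = rf"\b{re.escape(wrong)}\b"
        text = re.sub(pattern, lambda _match: right, text)
    return text


ocr_text = "The rnonk read the book. The rnonk slept."
for word, count in word_list(ocr_text):
    print(count, word)
# After reading the list, note the errors by hand and fix them:
print(apply_corrections(ocr_text, {"rnonk": "monk"}))
```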

This method of correction is far from fool-proof. Some subtle substitutions of one plausible word for another will likely remain. This is, however, a good way of quickly eliminating the most glaring errors from your text file.

A similar effect can be achieved using the spell-check and grammar-check functions of a word processor, but there are several reasons the concordance method is generally preferable. First, reading through the list of words present in the text will tend to draw your attention to words which are real, but unlikely to be accurate readings in the context of the text, which spelling and grammar checks would overlook. Second, a concordancer will present all the variants of a given word which occur in the text – through alternate spelling, use of synonyms, or varying grammatical forms (singular vs. plural, past vs. future) – which might be significant for your analysis.

  2. Slow and Clean: Cross-Collating multiple Text Files

A more extreme way of correcting digitized text is to produce multiple versions and to collate them together. Because different OCR versions frequently repeat the same errors, comparing two OCR versions is of limited use (although if you have access to more than one OCR program, it might be worth trying). It is of greater potential use to compare two hand-typed versions, or a hand-typed version and an OCRed version, which are much less likely to contain identical errors.

Cross-comparison of two documents can be accomplished even with the document comparison and merging tools in Microsoft Word. A somewhat more sophisticated tool for the same task is Juxta. This is an online platform (also available as a downloadable application) designed primarily to help produce editions from multiple varying manuscripts or editions, but it is just as effective at highlighting errors which were introduced in the process of digitization.


This process is a relatively thorough way of identifying errors in digitized text, and it can even identify variations that might escape the attention of human proofreaders. The major weakness of the technique, however, is that it requires you to go through the effort of producing multiple different versions, ideally including one human-typed version. If you need a scrupulously corrected digital text, however, it is a powerful tool in your belt, and if multiple digital versions of your text have already been produced, it is an excellent way of using them in concert with one another. Another strength of the Juxta platform is that you can collate many different versions of the same text at once.
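Juxta does this through a visual interface. For a rough idea of what collation involves under the hood, here is a small Python sketch that uses the standard library’s difflib module instead (a swapped-in technique, not Juxta’s own algorithm) to compare two versions word by word and report each point where they differ. Both sample versions are made up; in practice you would read in your OCRed and hand-typed files.

```python
import difflib


def word_differences(a: str, b: str) -> list[tuple[str, str, str]]:
    """Return (operation, words in a, words in b) for each point of difference."""
    words_a, words_b = a.split(), b.split()
    matcher = difflib.SequenceMatcher(None, words_a, words_b)
    diffs = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # op is "replace", "delete", or "insert"
            diffs.append((op, " ".join(words_a[i1:i2]), " ".join(words_b[j1:j2])))
    return diffs


ocr = "The rnonk read the old book by candle light."
typed = "The monk read the old book by candlelight."
for diff in word_differences(ocr, typed):
    print(diff)
```

Not every difference reported this way is an error in the OCR version; the output is a list for you to review, and you still decide which reading is correct.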

Conclusion

Once you have a digitized and cleaned version of the text in which you are interested, a world of possibilities opens up. At a minimum, you should be able to use computer search functions to quickly locate relevant sections within the text, while at maximum you might choose to perform complex statistical analysis, using a coding language like R or Python.
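As a small taste of the search end of that range, here is a minimal Python sketch of a keyword-in-context (KWIC) search, the display concordancers use to show each occurrence of a word with its surrounding text. The function name, the whitespace tokenizing, and the number of words shown on either side are illustrative assumptions.

```python
import re


def kwic(text: str, keyword: str, width: int = 3) -> list[str]:
    """Keyword-in-context lines: each hit with `width` words on either side."""
    words = text.split()
    lines = []
    for i, word in enumerate(words):
        # Ignore punctuation and case when comparing to the keyword.
        if re.sub(r"\W", "", word).lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines


text = "The monk read the old book. Later the monk slept by the fire."
for line in kwic(text, "monk", width=2):
    print(line)
# The [monk] read the
# Later the [monk] slept by
```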

A good way to start exploring some of the possibilities of digital textual analysis is to go back and consider some of AntConc’s tools other than its ability to concordance a text. AntConc can be used to visualize occurrences of a word or phrase throughout a text, and to identify words which frequently occur together. Another useful tool for beginners interested in text analysis is Voyant, which creates topic models – visualizations of words which frequently occur together in the text, which can help to highlight key topics.
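Counting which words occur together can likewise be sketched in a few lines of Python. The example below tallies the words that fall within a fixed window around each occurrence of a chosen word. It counts raw co-occurrences only and makes no attempt at the statistical measures that dedicated tools may use to rank collocates; the names and sample text are hypothetical.

```python
import re
from collections import Counter


def collocates(text: str, node: str, span: int = 2) -> Counter:
    """Count words within `span` words on either side of each `node` occurrence."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, word in enumerate(words):
        if word == node:
            counts.update(words[max(0, i - span):i])  # words to the left
            counts.update(words[i + 1:i + 1 + span])  # words to the right
    return counts


sample = "The old monk read. The old monk slept. A young monk read."
print(sorted(collocates(sample, "monk", span=1).items()))
# [('old', 2), ('read', 2), ('slept', 1), ('young', 1)]
```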

Exciting Spring Events!

After a hiatus last semester, the Fordham Graduate Student Digital Humanities Group is back with a bang.  We’ve got a great list of events coming up, and two series going on.

FGSDH Events
Rose Hill Campus, 2pm-3pm
February 4: Debates in the Digital Humanities
February 25: Digital Pedagogy
March 25: Building and Maintaining an Online Profile
April 18: Wikipedia Edit-A-Thon

Topics in Digital Mapping Events
Lincoln Center Campus, 3-5pm Workshops, 2-3pm Meet&Greet
February 11: Thinking about Time with Maps: Timelines/Palladio
March 4: Georectifying/MapWarper
April 15: Intro to CartoDB

Tomorrow (Dec. 4), 12:30-2:00pm, Dealy 115 – Talk & Discussion led by Kristen Mapes on Digital Humanities Class

Please join us tomorrow, Dec. 4, from 12:30-2:00pm in Dealy 115. Kristen Mapes will speak about taking “Digital Humanities” as a graduate-level course at the Pratt Institute.

Topics to be discussed: What topics are covered? How are they addressed? What is the value of taking a DH-specific class rather than simply incorporating DH into pre-existing classes?

This will be an informal conversation about Digital Humanities as a course topic and  the graduate student perspective on learning about DH in a formal way. Come to hear and discuss (and eat cookies) tomorrow at 12:30 in Dealy 115!

See you there!

NYC Digital Humanities Inaugural Event, Saturday, 9/25


The NYCDH Inaugural Event took place last Saturday at the Humanities Initiative at New York University.  Many attendees faithfully live-tweeted it at #nycdh, including a significant Fordham contingent: @kmapesy, @ecornell1, @mickimcgee, @diyclassics and @FordhamGSDH!

The two morning sessions on Building NYCDH were led by Lynne and Ray Siemens, two visiting professors from the University of Victoria, currently at NYU.  They discussed the process of building and running a digital humanities center, and the importance of dialogue, discussion and re-discussion, and interdisciplinary and inter-departmental (or inter-institutional!) work for the success of any DH project.  I can’t summarize their talks better than the working notes, so let me just say my biggest takeaway was that we may fail to conclusively define the digital humanities — and that’s okay, as long as we keep talking about it and trying to re-define it.

A summary of lightning talks on a variety of topics can also be found in the working notes: the range of projects was fascinating, and a wonderful reminder of how lucky we are to be in a city like New York.

After the morning’s traditional conference presentations, the afternoon was an unconference. It was the first time I’d been to an unconference; I’d heard a lot about them but had never attended one. As it turns out, my unfamiliarity with the format ended up giving me a bit of a surprise!

During lunch, we wrote topics of interest on a whiteboard, and after lunch, we voted on which topics the group most wanted to discuss. I was excited that other people wanted to talk about “metadata and DH project sustainability,” and it made it into the final four sessions. Then I found out I’d be leading it! Fortunately, it was in the second time slot, so I had a little bit of time to prepare. I have to admit, though, that the first unconference session, on pedagogy and DH, drew me in pretty fast. Hearing the ways in which different people use DH tools in their classes, or even teach entire classes on the digital humanities, was fascinating, especially since I’m TA’ing this semester and will be teaching my own classes next year.

The session on metadata was a small one, which isn’t all that surprising: not everyone is excited to talk about cataloging, project hosting and formatting our projects with the future in mind.  But we had a good variety of people in the room, library school students and academics, those with years of experience with DH and with technology and programming and those who were just coming to the field.

We ended up talking not only about metadata and its importance (why create something, if no one can find it?) and the persistence of projects, but about the role of digital humanities more broadly in the world of scholarship.  Questions of citation and of numbers of authors credited for a project came up, and the observation was made that the sciences seem to handle multiple-authorship more gracefully than the humanities.  We also discussed the question of the tension between open access and traditional scholarly publishing, and whether the digital humanities have any obligation to be open access, especially when they draw on open access sources.

The conference’s closing remarks included a list of recommended resources, which are listed in the conference notes (linked above). At 5:30, we retired to the Swift Hibernian Lounge, just around the corner.

I would encourage anyone in the NYC area to join NYCDH.org and be part of the process of creating the NYC DH community!  As a newly-formed group, the options for where it might go are still very flexible, and it promises to help draw together expertise and opportunities in really beneficial ways.

–Alisa Beer