Maarten Marx, University of Amsterdam
- Digging into Data project (2013 call)
- Canada (U Toronto), the Netherlands (U Amsterdam), UK (History of Parliament Trust et al.)
What do we want to preserve?
- Annotate every word/phrase with:
  - context of the speech
  - speaker's name
  - speaker's party (at the time of the speech)
  - speaker's role/function (at the time of the speech)
- Example: De Marx-oriëntatie
- Non-trivial task for OCRed data:
  - text order is often broken
  - speakers and speaker changes can be hard to recognize due to OCR errors
- Cover as long a time span as possible.
- Deal with many input formats: transform all of them into a common output format and schema
- Time span covered: NL: 200 years; UK: 80 years; Canada: ?? (10-200)
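The "party at the time of speech" requirement means party and role cannot be stored once per speaker; they must be looked up by date. Below is a minimal sketch, assuming each speaker's party and role history is available as dated intervals. The names `Interval`, `value_at`, `annotate_speech` and the flat token dictionary are hypothetical illustrations, not the project's actual schema, and whitespace splitting stands in for real tokenization.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Interval:
    """Hypothetical record: `value` (a party or a role) held from `start` to `end`, inclusive.
    end=None means the interval is still ongoing."""
    value: str
    start: date
    end: Optional[date] = None

    def covers(self, day: date) -> bool:
        return self.start <= day and (self.end is None or day <= self.end)


def value_at(history: List[Interval], day: date) -> Optional[str]:
    """Return the value valid on `day`, or None if the history has no interval covering it."""
    for interval in history:
        if interval.covers(day):
            return interval.value
    return None


def annotate_speech(text: str, speaker: str, speech_date: date, debate_id: str,
                    parties: Dict[str, List[Interval]],
                    roles: Dict[str, List[Interval]]) -> List[dict]:
    """Attach the same speech-level context to every token of one speech.
    Every output record has the same keys, whatever the input format was."""
    party = value_at(parties.get(speaker, []), speech_date)
    role = value_at(roles.get(speaker, []), speech_date)
    return [
        {"token": tok, "debate": debate_id, "speaker": speaker,
         "party": party, "role": role, "date": speech_date.isoformat()}
        for tok in text.split()  # naive whitespace tokenization (assumption)
    ]


# Usage with made-up data: a speaker who switched parties in 1998.
parties = {"Jansen": [Interval("Party A", date(1990, 1, 1), date(1998, 5, 5)),
                      Interval("Party B", date(1998, 5, 6))]}
roles = {"Jansen": [Interval("MP", date(1990, 1, 1))]}

tokens = annotate_speech("Ik ben het eens", "Jansen", date(2001, 3, 14),
                         "debate-001", parties, roles)
print(tokens[0]["party"], tokens[0]["role"])  # Party B MP
```

Resolving party and role once per speech, rather than per token, keeps the lookup cheap while still giving every word the full context.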
Recognize internal and external entities that can be linked to external databases.
We link to our own internal representation of these entities, which in turn links to the Linked Open Data (LOD) cloud.
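The two-step linking (mention to internal entity, internal entity to LOD) can be sketched as follows. This is an illustration under assumptions: `Entity`, `EntityRegistry`, the alias-based matching and the `example.org` URIs are all hypothetical; only `owl:sameAs` is a real, standard predicate. Real entity recognition on OCRed text would need fuzzier matching than exact normalized aliases.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Entity:
    """Hypothetical internal record for a member or party.
    `same_as` holds links to external LOD resources."""
    internal_id: str
    label: str
    same_as: List[str] = field(default_factory=list)


class EntityRegistry:
    def __init__(self) -> None:
        self._entities: Dict[str, Entity] = {}
        self._aliases: Dict[str, str] = {}  # normalized surface form -> internal id

    @staticmethod
    def _normalize(name: str) -> str:
        # Lowercase and collapse whitespace; a stand-in for real name matching.
        return " ".join(name.lower().split())

    def add(self, entity: Entity, aliases: List[str]) -> None:
        self._entities[entity.internal_id] = entity
        for alias in [entity.label, *aliases]:
            self._aliases[self._normalize(alias)] = entity.internal_id

    def link(self, mention: str) -> Optional[Entity]:
        """Step 1: resolve a recognized mention to the internal entity, or None if unknown."""
        internal_id = self._aliases.get(self._normalize(mention))
        return self._entities.get(internal_id) if internal_id else None

    def to_triples(self) -> Iterator[Tuple[str, str, str]]:
        """Step 2: emit triples linking internal IDs onward to LOD resources."""
        for entity in self._entities.values():
            for uri in entity.same_as:
                yield (entity.internal_id, "owl:sameAs", uri)


# Usage with made-up identifiers.
registry = EntityRegistry()
registry.add(Entity("member-0001", "Jan Jansen", ["http://example.org/lod/jan-jansen"]),
             aliases=["J. Jansen", "de heer Jansen"])
print(registry.link("de heer  JANSEN").internal_id)  # member-0001
```

Linking documents only to the internal IDs means a change in an external LOD source requires updating a single `same_as` list, not every annotated proceeding.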
All software works on any set of documents in the prescribed data format.
Why? Main application
- Give historians and political scientists the possibility to do diachronic comparative research.
- Both quantitative (saliency) and qualitative (search/browse/explore)
- Profiling politicians and parties
- DiLiPaD: "immigration" research
Who says what about immigration in national legislative debates, and when and why do they say it? Presumably, immigration has defenders as well as detractors. Has the framing of immigration changed over time?
- Lijphart project (with Voerman and Wijfjes): did pillarization (verzuiling) really occur?
- DiLiPaD and Talk of Europe share common goals
- Linking proceedings increases their value
Two-year, half-time postdoc position on the DiLiPaD project, based in Amsterdam. Collaboration with U Toronto, King's College and the UK Parliament. Possibility of working at the Dutch Parliament/Royal Library.