Abstract

We investigate the cost-effectiveness of special-purpose crawled corpora versus more focused corpora for automatic terminology extraction (ATE). Our focus is on medical terminology on heart failure in two languages: English, for which we have more web and specialized resources at our disposal, and the less-resourced Dutch. We show that, although term density is higher in the dedicated corpora for both languages, the potential for term extraction is higher in the crawled corpora. Furthermore, in a set of experiments in which we evaluate both types of corpora while keeping size constant, we observe that more Gold Standard (GS) terms are covered by the “noisy” crawled corpus than by a dedicated corpus of the same size.


