SURE

Sunderland Repository records the research produced by the University of Sunderland including practice-based research and theses.

Text Mining Legal Documents for Clause Extraction

Vidler, Tony, McGarry, Kenneth and Baglee, David (2023) Text Mining Legal Documents for Clause Extraction. In: The 19th International Conference on Data Science (ICDATA'23), 24-27 Jul 2023, Las Vegas, USA.

Item Type: Conference or Workshop Item (Paper)

Abstract

Natural Language Processing (NLP) solutions for legal contracts have been the preserve of large law firms and other industries (e.g., investment banks), especially those with substantial resources: both the volume and range of legal documents and the manpower to label the training data. The findings suggest that it is possible to use a smaller volume of training contracts and still generate results that are within an acceptable range. Our results show that a pre-trained language model trained on just 120 contracts can generate results that are within 10% of the same model trained on 3.3 times the volume. In conclusion, smaller law firms could benefit from machine learning NLP solutions for clause extraction.

PDF: CSCE23-vidler v4.pdf - Accepted Version (716kB)

More Information

Uncontrolled Keywords: NLP, Text Mining, Legal Clauses, Deep Learning, BERT.
Depositing User: Kenneth McGarry

Identifiers

Item ID: 16508
URI: http://sure.sunderland.ac.uk/id/eprint/16508
Official URL: https://icdatascience.org/

Users with ORCIDS

ORCID for Kenneth McGarry: orcid.org/0000-0002-9329-9835
ORCID for David Baglee: orcid.org/0000-0002-7335-5609

Catalogue record

Date Deposited: 21 Aug 2023 10:18
Last Modified: 14 Sep 2023 15:02

Contributors

Author: Kenneth McGarry
Author: David Baglee
Author: Tony Vidler

University Divisions

Faculty of Technology > School of Computer Science

Subjects

Computing > Data Science
Computing > Artificial Intelligence
