A tool to build labeled datasets, machine learning models, and new applications that can connect people online to high-quality legal help.
How do we proactively spot people’s problems that could be solved through legal help, even if they do not know that the problems are legal or how to present them as a lawyer would? Our project focuses on the large numbers of people expressing legal needs online: on Internet forums, search engines, and social media, and in chats and emails with librarians, courts, and non-profits.
On forums like Reddit, search engines like Google, and social media platforms like Facebook, people write up their stories of life problems either to seek help or just to express themselves. Similarly, people come to the websites of courts and legal aid groups and use email addresses, feedback forms, and live chat windows to tell their stories, in the hope that someone can help them with their problems.
We see this as a huge opportunity to connect people promptly with legal resources that are jurisdiction-correct, issue-specific, and immediately actionable. The goal is to spot people’s issues and trends through their online writing, and to use them both to provide direct services and to build a more general understanding of ‘legal health’ needs, trends, and clusters (as in a public health or digital epidemiology approach).
Learned Hands is a game to crowd-label people’s stories with a standardized list of legal issue codes. It’s a mobile-friendly web application that you’re welcome to come and play, and playing can earn pro bono credit. Stanford Legal Design Lab and Suffolk LIT Lab built Learned Hands, starting in autumn 2018, with the support of the Pew Charitable Trusts.
In the game, players read stories from Reddit and other datasets where people have described their life problems. They vote on whether certain types of legal issues might be present. The game uses gamification to engage players, rewarding them with points and badges for their work. Lawyers and law students can also earn pro bono credit for their gameplay.
Players read each story and then label whether certain legal issues are present: is there a Family Law issue? A Housing Law issue? A Money and Consumer issue? After classifying the stories at a high level, players label more specific issues: is there a Divorce issue present? Child custody? Domestic violence?
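The two-step labeling flow described above can be pictured as a small two-level taxonomy. The following is only a sketch: the category and issue names come from the examples in the text, but the dictionary layout and the `follow_up_questions` helper are assumptions for illustration, not the game's actual data model.

```python
# Hypothetical two-level taxonomy. The names come from the examples in the
# text; the structure and the empty lists are placeholders, not the real codes.
TAXONOMY = {
    "Family": ["Divorce", "Child custody", "Domestic violence"],
    "Housing": [],             # specific issues not listed in the text
    "Money and Consumer": [],  # specific issues not listed in the text
}


def follow_up_questions(high_level_votes):
    """Given yes/no votes on high-level categories, return the specific
    issues a player would be asked about next."""
    questions = []
    for category, present in high_level_votes.items():
        if present:
            questions.extend(TAXONOMY.get(category, []))
    return questions


votes = {"Family": True, "Housing": False, "Money and Consumer": False}
print(follow_up_questions(votes))
```

Here a "yes" on the Family category leads to follow-up questions about Divorce, Child custody, and Domestic violence, while categories voted "no" add nothing.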
The game builds a labeled dataset for AI R&D and legal needs research
Each player’s vote contributes to training machine learning models to spot issues automatically. As a consensus emerges about whether an issue is present, the game produces a ‘labeled dataset’ of which issues appear in which problem descriptions.
The labeled dataset is a key component in building AI models that can predict whether a certain legal issue is present in a person’s story. Gradually, as the models learn from the labels, they can automatically spot both high-level and specific legal issues.
In addition to the labeled dataset, the models help us do basic research into key questions around access to justice: what are people’s key legal needs, how are they clustered or patterned together, and how are they expressed? We are using latent topic modeling of the Reddit stories to identify organic patterns in people’s problems and in how they phrase them. This allows us to build a better taxonomy of legal issues and to develop an alternative way to understand legal needs (aside from surveys or legal aid experts’ analysis of their own organizational data, which have been the predominant protocols so far).
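One simple way to turn individual votes into a consensus label is sketched below. The text does not say which rule the game uses, so the minimum vote count and the agreement threshold here are purely illustrative assumptions, as is the function name `consensus_label`.

```python
def consensus_label(votes, min_votes=5, threshold=0.8):
    """Return True or False once enough players agree, else None.

    votes: list of booleans, True meaning "this issue is present".
    min_votes and threshold are assumed values, not the game's settings.
    """
    if len(votes) < min_votes:
        return None  # too few votes to call a consensus
    yes = sum(votes)
    no = len(votes) - yes
    if yes / len(votes) >= threshold:
        return True   # strong agreement that the issue is present
    if no / len(votes) >= threshold:
        return False  # strong agreement that the issue is absent
    return None       # players disagree; keep collecting votes


print(consensus_label([True, True, True, True, True]))
print(consensus_label([True, False, True, False, True]))
```

Stories that return `None` would stay in the queue for more votes; only stories with a settled label would enter the labeled dataset.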
Moving towards predictive diagnosis online and during intakes
Using the labeled datasets from Learned Hands gameplay, our team is building NLP models that can classify legal issues present in people’s Reddit (or Reddit-like) stories.
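To make the idea concrete, here is a minimal sketch of a text classifier trained on labeled stories. It swaps in a small bag-of-words naive Bayes model written with the standard library only; the text does not say which NLP models the team uses. The class name and the four training sentences are invented toy placeholders, not data from Learned Hands, and the sketch assigns a single label per story where the real task allows several.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesIssueSpotter:
    """Toy multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of stories
        self.vocab = set()

    def fit(self, stories, labels):
        for text, label in zip(stories, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples standing in for consensus-labeled stories.
stories = [
    "My landlord is trying to evict me and will not return my deposit",
    "The landlord refuses to fix the heating in my apartment",
    "My husband and I are separating and disagree about custody of our kids",
    "I want a divorce but my spouse will not sign the papers",
]
labels = ["Housing", "Housing", "Family", "Family"]

model = NaiveBayesIssueSpotter().fit(stories, labels)
print(model.predict("my landlord wants to evict me from my apartment"))
print(model.predict("I want a divorce and custody of my kids"))
```

A production model would need far more data, multi-label output, and careful evaluation; the sketch only shows how labels from gameplay become training signal.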
These predictive models can then be used in the case management systems of legal aid groups, service providers, or medical clinics; in live chats; on social media; and in other places where people’s problem stories are captured. They can scan text and suggest to the person or the service provider that the person might need a particular type of legal help.
Bots using the models can suggest what legal issue a person might have, and connect them with resources — like lists of local help pages, service providers, and legal terminology and explainers.
The near future of more datasets and matching labeled stories with labeled resources
With more datasets beyond Reddit, we can have a more diverse population represented as the AI develops. This means having datasets from law libraries’ chat lines, legal aid groups’ intake emails and chats, and other places where people are talking about possible legal problems. With more datasets, we will be able to train our models to respond to more than the phrasings and problems of the Reddit forum’s population.
We can also scale the project up by using it to label the online help resources that courts and legal aid groups provide. We are labeling the resources on these websites with the same labels we apply to people’s problems. In that way, our application can produce a model that works like a Rosetta Stone: it can match people’s questions with the online resources that can serve them.
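Because stories and resources share one label vocabulary, matching can be as simple as counting shared labels. The sketch below assumes exactly that. The resource titles, the dictionary fields, and the `match_resources` function are hypothetical placeholders, and a real matcher would also need to account for jurisdiction.

```python
def match_resources(story_labels, resources, top_n=3):
    """Rank resources by how many labels they share with the story.

    story_labels: set of issue labels spotted in a person's story.
    resources: list of dicts with "title" and "labels" keys (assumed shape).
    """
    scored = []
    for resource in resources:
        overlap = len(set(story_labels) & set(resource["labels"]))
        if overlap:  # skip resources with no label in common
            scored.append((overlap, resource["title"]))
    # Most shared labels first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_n]]


# Hypothetical labeled resources, for illustration only.
resources = [
    {"title": "Divorce self-help guide", "labels": {"Family", "Divorce"}},
    {"title": "Custody forms overview", "labels": {"Family", "Child custody"}},
    {"title": "Tenant rights page", "labels": {"Housing"}},
]

print(match_resources({"Family", "Divorce"}, resources))
```

In this example the divorce guide ranks first because it shares both labels with the story, the custody page ranks second with one shared label, and the housing page is left out.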
We are inspired by Nora al-Haider’s Divorce Bot on Reddit, and are exploring building similar interventions that can direct people on forums and social media to higher-quality, local legal help.
Replicability + Building an Access to Justice/AI Ecosystem
The Learned Hands game platform can be used by others who want to produce labeled datasets and create AI models and bots for spotting justice issues in other languages and contexts. If you are interested in adapting it to have content in another language or format labeled for AI purposes, please write to us!
By making a legal help dataset publicly available, Learned Hands can encourage more people building chatbots and online tools to use standard labels and codes in how they mark up their tools. In the future, these tools could link to each other more easily and hand people off from one tool to another.