We are building out National Subject Matter Index version 2, as a master-list of the legal issues that people face in the US today — to use as a data standard for AI, applications, and website markup.
Our work builds on NSMI version 1, a 2000s-era list of legal issue codes that legal aid groups made with sponsorship by LSC, mainly to track their own project work and billing/grants/finances. It combines this taxonomy with other lists of issues from legal aid groups and legal help website admins.
Our goal is to improve the National Subject Matter Index, taking it from an expert-centered taxonomy to a more user-centered taxonomy that can better link people’s phrasings of problems with experts’ phrasings. Our research examines whether different rephrasings of these taxonomy terms, different hierarchies and groupings of these terms, and additional terms can improve the usability of this taxonomy for spotting legal issues and providing information.
You can see our NSMI version 2 here, shown at the high-level categories only, to review our current working draft.
Our work takes this compiled list of taxonomy terms and refines it into an NSMIv2 that ideally can be used to:
- Consistently label legal content and people’s stories with the legal issue that is present (like on Learned Hands, where we need standardized labels for categorizing the legal issues in Reddit stories)
- Have understandable, non-jargon terms for these issues (so that non-experts, like law students or paralegals, can understand what the term means)
- Link a term to multiple parents, to recognize that the same issue may be categorized within multiple legal families of issues
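The multiple-parents requirement means the taxonomy is a directed acyclic graph rather than a strict tree. A minimal sketch of one way to store that is below. The `Term` and `Taxonomy` classes, the codes, and the labels are all hypothetical, invented for illustration; they are not taken from NSMIv2 itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """One taxonomy term; `parents` may hold more than one parent code."""
    code: str
    label: str
    parents: set[str] = field(default_factory=set)


class Taxonomy:
    """Stores terms keyed by code (illustrative structure, not the NSMIv2 format)."""

    def __init__(self) -> None:
        self.terms: dict[str, Term] = {}

    def add(self, code: str, label: str, parents: tuple[str, ...] = ()) -> None:
        # Rejecting duplicates and unknown parents keeps the graph acyclic:
        # a term can only point at terms that were added before it.
        if code in self.terms:
            raise ValueError(f"duplicate code: {code}")
        missing = [p for p in parents if p not in self.terms]
        if missing:
            raise KeyError(f"unknown parent code(s): {missing}")
        self.terms[code] = Term(code, label, set(parents))

    def ancestors(self, code: str) -> set[str]:
        """Return every code reachable by following parent links upward."""
        seen: set[str] = set()
        stack = list(self.terms[code].parents)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.terms[current].parents)
        return seen

    def children(self, code: str) -> list[str]:
        """Return the codes that list `code` as a direct parent."""
        return sorted(t.code for t in self.terms.values() if code in t.parents)


# Hypothetical example: one issue that belongs to two legal families.
taxonomy = Taxonomy()
taxonomy.add("HOUSING", "Housing")
taxonomy.add("FAMILY", "Family")
taxonomy.add(
    "DV-LEASE",
    "Ending a lease early because of domestic violence",
    parents=("HOUSING", "FAMILY"),
)
```

With this structure, the same term shows up when browsing either family: `taxonomy.children("HOUSING")` and `taxonomy.children("FAMILY")` both include `"DV-LEASE"`.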
Are you a lawyer, law librarian, or information scientist who wants to help us build and review this NSMI v2 taxonomy? We would love your support: fill in this form to let us know your interest.
We have been working since 2018 to take the original National Subject Matter Index and update it to be ready for more machine learning purposes, and to include more user-centered phrasings and arrangements of issues.
Our essential steps in creating and vetting this revised National Subject Matter Index v2 are as follows:
- Clean Up the Draft Taxonomy: Flesh out and clean up the large list of terms that we have compiled from multiple groups, by de-duplicating terms, rewriting all the child terms that are vague or jargony, and identifying terms that belong under more than one parent
- Expert Review Sessions: We check our hierarchies and terminologies with legal experts, to hear if they have concerns about the topics’ organization or level in the hierarchy, or see other issues that need to be present. This is done through phone or video calls with experts, on specific branches of the taxonomy
- Topic Model Review: We will see what the topic modeling of Reddit posts (and posts from other online communities) pulls out as clusters of issues. Based on this, we will see if our taxonomy covers all the issues that are showing up on Reddit. This is like a large, digital focus group to see what clusters of issues are present.
- Comparison to other Taxonomies: We will compare our proposed taxonomy categories to those of other legal aid groups, including the site maps of statewide legal help portals (that also present taxonomies of common issues for people to find resources) and other applications that have created taxonomies of legal issues.
- Review through running expert/novice checks on Learned Hands: We will work with Metin and the Learned Hands interface to set up controlled tests in which we present the same posts and term-labels to different cohorts of users (experts and novices). We will see if they use the same terms consistently, and if they are confused or unsure about how to apply the labels.
- Mapping Code #s between Taxonomies: Each term in NSMIv1 has a numeric code. We made new codes in NSMIv2. So that people who have used NSMIv1 can coordinate with the new taxonomy (if they like), we need to map the v1 codes to the v2 codes.
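The code-mapping step can be pictured as a crosswalk table between the two versions. The sketch below assumes that one v1 code may map to several v2 codes and vice versa, which seems likely when a taxonomy is regrouped but is our assumption here. Every code shown is a made-up placeholder, not a real NSMIv1 or NSMIv2 code, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical crosswalk rows as (v1_code, v2_code) pairs.
# All codes are placeholders invented for illustration.
CROSSWALK_ROWS = [
    ("1001", "V2-A"),
    ("1002", "V2-A"),  # two v1 codes merged into one v2 term
    ("1003", "V2-B"),
    ("1003", "V2-C"),  # one v1 code split across two v2 terms
]


def build_crosswalk(rows):
    """Build lookup tables in both directions from (v1, v2) pairs."""
    v1_to_v2 = defaultdict(set)
    v2_to_v1 = defaultdict(set)
    for v1_code, v2_code in rows:
        v1_to_v2[v1_code].add(v2_code)
        v2_to_v1[v2_code].add(v1_code)
    # Convert to plain dicts so a lookup of a missing code raises KeyError
    # instead of silently creating an empty entry.
    return dict(v1_to_v2), dict(v2_to_v1)


def unmapped(v1_codes, v1_to_v2):
    """List the v1 codes that have no v2 counterpart yet."""
    return sorted(code for code in v1_codes if code not in v1_to_v2)


v1_to_v2, v2_to_v1 = build_crosswalk(CROSSWALK_ROWS)
still_to_map = unmapped(["1001", "1003", "1004"], v1_to_v2)
```

Keeping the mapping as sets in both directions makes merges and splits explicit, and the `unmapped` check gives a simple to-do list of v1 codes that still need a v2 home.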
You can read more about our process in an article at Legal Design and Innovation, “Every legal problem that exists: the legal help taxonomy for machine learning”.