Artificial Intelligence for Learning the Law

Generative AI for Academic Support in Law Schools and Universities

Michael D. Murray[*]

Table of Contents

Introduction

I. Introduction to the use of AI Tools for Learning and Academic Support: A Review of the Literature

II. Objectives and Thesis of the Study

III. Methodology

A. Part I – Comparative Evaluation of Claude 3.5 Sonnet, Copilot, Gemini 1.5 Pro, GPT-4o, and Lexis+ AI for Academic Support

1. Prompt Engineering Methods

2. Use of Counterfactuals

3. Designing the Tasks for Explanation, Elaboration, Simplification, Summarization, and Assessment

B. Part II – Use of a Generative AI System to Improve a Learner’s Performance on an Objective Examination in Courses the Learner Has Not Yet Taken

1. Selection of Researchers

2. Design of the Experiment

IV. Results

A. Summary of Comparative Evaluations and Conclusions from Part I of the Experiments

1. Researcher Ratings of GAI Systems Based on Task Performance

2. Comparative Evaluations and Conclusions

B. Summary of Comparative Evaluations and Conclusions from Part II of the Experiments

Conclusion

Introduction

The integration of artificial intelligence (AI) and generative artificial intelligence (GAI) in education has garnered significant attention in recent years, transforming traditional learning environments into dynamic, adaptive, and personalized experiences. AI tools, particularly Intelligent Tutoring Systems (ITS),[1] adaptive learning platforms,[2] and AI-augmented tutoring solutions,[3] have shown promise in enhancing student engagement, improving learning outcomes, and providing tailored academic support. The current leading models of verbal GAI—Anthropic’s Claude 3.5 Sonnet,[4] Microsoft’s Copilot (formerly known as Bing Chat),[5] OpenAI’s GPT-4o Omni,[6] Google’s Gemini 1.5 Pro,[7] and LexisNexis’ Lexis+ AI[8]—have opened up new horizons for ITS and adaptive learning.

This paper reports research conducted from December 2022 to August 2024, in particular Part I of the experiments, conducted from May 20 to July 12, 2024, and Part II, conducted from August 15 to 27, 2024, on the use of GAI in legal education and academic support. This study was a cross-sectional, qualitative evaluation of the GAI systems at a fixed point in time and at each system’s level of development at that time. Although the topic of this study is learning the law, the results and the overall approach to using an AI as a personalized learning tutor can be applied to many graduate and undergraduate programs in universities and to other levels of education. This paper reports the Part I experiments and their qualitative and comparative findings comparing the performance of the leading public-facing, general-purpose transformer-neural network-large language model (T-NN-LLM) systems—Claude 3.5 Sonnet, Copilot, Gemini 1.5 Pro, and GPT-4o Omni—against a law-specific T-NN-LLM system with a curated legal dataset, Lexis+ AI. The study reveals which systems performed best as personalized, self-guided, one-on-one law tutors. It also reports the Part II experiments using one T-NN-LLM GAI system, Claude 3.5 Sonnet, as a personalized one-on-one tutor to improve a novice learner’s performance on objective examinations in subjects the learner has never studied.

Introduction to the use of AI Tools for Learning and Academic Support: A Review of the Literature

The advancements in tutoring represented by GAI systems have increased the pace of adoption of AI technologies to the point that GAI tools can play a significant role in academic support in law schools and universities.[9] GAI tools can help a student learn and understand material better, more deeply, and notably faster than traditional means of reading, rereading, notetaking, and outlining.[10] AI can explain, elaborate on, and summarize course material. It can write and administer formative assessments, and, if desired, it can write self-guided summative evaluations and grade them. AI can translate material into and from foreign languages with a fidelity to context, usage, and nuances of meaning not previously seen in machine learning or neural network translation services. AI also can visualize material using the tools of visual GAI, literally painting pictures of the subjects and situations in the material; these visuals can help students overcome literacy barriers both in the language of the communication and in the students’ own native languages. This literature review examines the current state and future directions of AI in education, drawing on insights from recent studies and systematic reviews.

ITS represent a pivotal application of AI in education. These systems leverage machine learning algorithms, data mining, and clustering techniques to analyze student behaviors and provide personalized learning experiences.[11] A systematic review by Xie et al. of the literature from 2007 to 2017[12] highlights the dual approach of AI integration in ITS: non-intrusive data extraction for learning behavior analysis and performance prediction through supervised learning models.[13] The reviews by Lin et al. and Xie et al. underscore the potential of AI-embedded ITS to adapt to students’ real-time learning statuses, offering timely interventions and support.[14]

The term “academic support” is typically used to discuss alleviating or overcoming the problems of struggling and failing students, but AI’s strengths are far more broadly applicable. Adaptive learning systems, which tailor instructional content based on real-time assessments of student performance, are a prominent application of AI in education. OATutor, an adaptive tutoring tool developed at Berkeley, exemplifies this approach by integrating GAI to create and deliver customized content and hints.[15] By leveraging AI-generated hints, OATutor can provide students with immediate, contextually relevant support, reducing the reliance on manually created content and facilitating faster content deployment.[16] The tool’s success in subjects like chemistry and mathematics demonstrates its potential to accelerate the development process and enhance learning outcomes.[17] Similarly, the AI tutor Julian at Walden University represents a significant advancement in dynamic, unscripted AI tutoring. Developed in collaboration with Google, Julian can ingest course materials, generate questions, and assess student responses autonomously.[18] This capability allows it to provide personalized tutoring sessions and continuously improve its performance based on student interactions.[19] Julian’s integration into various courses, including early childhood education and sociology, highlights its versatility and potential for broader application across different academic disciplines.[20] A study by Dahri et al.[21] investigated the acceptance of AI-based academic support in Malaysian and Pakistani higher education institutions and its impact on students’ performance. 
The study found that cultural and contextual factors significantly influence the acceptance and effectiveness of AI tools in educational settings.[22] The research emphasizes the need for localized adaptations of AI technologies to meet the specific needs of diverse student populations, thereby enhancing their engagement and academic performance.[23]

GAI has opened doors for our thinking and understanding of pedagogy that many had never expected to be opened, at least not in the average adult person’s lifetime. Virtually every subject and teaching practice in law school, graduate, and undergraduate teaching stands to be affected by AI. Tammy Pettinato Oltz has discussed the implications of ChatGPT for legal education, highlighting its potential to assist law students in understanding complex legal concepts and preparing for exams.[24] Javaid et al.’s study has shown that by providing instant feedback and clarifications, ChatGPT can support students in their studies and improve their academic performance.[25]

The nature and form of our assessments and evaluations, the level of rigor required in our own teaching and in our students’ learning, and the very conception of what it means to know something well and produce work evincing that level of knowledge all may need to be examined and reexamined in light of current AI abilities.[26] But one thing that might not be altered is our basic learning objectives for courses.[27] We want students to achieve an appropriate level of mastery of subjects and topics that will enable them to perform as a properly educated person in their field.[28] To foster the development of pedagogically potent and ethically sound AI-integrated learning landscapes, it is pivotal to critically explore and qualitatively assess AI’s use as a transformative tool capable of amplifying student self-efficacy and learning motivation while avoiding overdependence on the AI system and superficial learning habits that erode students’ social and critical thinking skills.[29]

In legal education at the law school, graduate, or undergraduate level, Oltz’s study of ChatGPT[30] is joined by a comprehensive survey of legal LLMs by Lai, Gan, et al.,[31] and by the study of GPT-4 conducted by Choi, Monahan, and Schwarcz.[32] Choi et al. trained sixty University of Minnesota law students to use AI, then had them complete four legal tasks—“drafting a complaint, a contract, a section of an employee handbook, and a client memo”—either using GPT-4 or without the technology, and assessed the results.[33] The investigators found that GPT-4 “consistently induced large declines in the amount of time taken to complete tasks,” enhancing students’ speed by as much as 32%.[34] But while GPT-4 did not harm the quality of the test subjects’ work, the study found that students’ use of GPT-4 only slightly and inconsistently improved the quality of the students’ legal analysis and writing.[35] Most relevant to this study, Choi et al. concluded that GPT-4 was most advantageous to “lower-skilled” and lower-performing students,[36] indicating that GAI can help level the playing field and improve the performance of disadvantaged students.

Commentary about GAI’s effect on legal education and the practice of law discusses the efficiencies and advantages the technology presents for legal information communication to students, practitioners, and clients alike.[37] Beyond supporting student learning and academic success, AI can be a democratizing force for learners in the legal academy because it can empower students to begin drafting, composing, writing, drawing, or painting images at a level that their own life experiences and education have not prepared them for or enabled them to participate in.[38] AI can empower students to perform creative, artistic, or literary activities related to legal education and law practice at a high level, catching them up to where other classmates would start.[39] First-generation college-goers, graduate students, and traditionally underrepresented students can use the collective knowledge of a large language model to bring themselves to a higher starting point in the process of gaining admission to and finding success in legal education and, ultimately, in the practice of law.[40]

Objectives and Thesis of the Study

The study reported here was designed to test the hypothesis that cutting-edge, general-purpose, public-facing GAI tools—Claude 3.5 Sonnet, Copilot, Gemini 1.5 Pro, and GPT-4o Omni—can be effective tools to assist novices in learning the law. The researchers tested the four leading systems against a law-centric GAI platform, Lexis+ AI, whose dataset is curated to only contain bona fide primary and secondary legal authorities—the very databases used in the Lexis legal search engine product. Part I of the experiments followed a set of five tasks; three of the tasks were scripted with specific prompts and two were open-ended so as to use multiple methods of prompt engineering, especially complexity-based prompting, generated knowledge prompting, maieutic (Socratic) prompting, tree of thought prompting, and chain of thought prompting. Part II of the experiments tested the ability of a general purpose, public-facing GAI system, Claude 3.5 Sonnet, to act as a personalized one-on-one tutor to improve the performance of novice learners on objective examinations in two law subjects that the learners have never studied. 

The current text-based GAI models—Anthropic’s Claude, Microsoft’s Copilot, Google’s Gemini 1.5 Pro, and OpenAI’s GPT-4o Omni—can replicate traditional methods of tutoring and academic support by responding to a learner’s questions and turning back to ask the learner questions and assess the learner’s progress, all in a tirelessly patient manner. The leading GAIs can translate between languages and also translate material downward to a more simplified level of discussion, all to promote the learner’s knowledge and understanding. Several of the leading GAIs—Claude, Copilot, Gemini, and GPT-4o—are multimodal, meaning that a user can upload an image, such as a photo, drawing, painting, chart, or diagram, and ask the system to use its computer vision to provide information or answer questions about the uploaded file. All of the systems can receive uploaded text, and Claude and GPT-4o can receive an uploaded PDF and answer questions about the document. By these means, a GAI can explain, elaborate on, and summarize course material; it can interpret, translate, visualize, or reorder parts of the material. AI can evaluate and correct the grammar, spelling, syntax, and style of a piece of student writing, a task that campus writing centers often avoid for pedagogical reasons or simply for logistical and resource-driven reasons.

AI has become a master translator, easily taking communications in one language and converting them into many others, while monitoring the grammar, spelling, syntax, and style of the translated work for fidelity of usage in the target language. At the farther reaches, Copilot, Gemini, and GPT-4o can communicate with illiterate and less-than-fully-literate students through the language of images (i.e., visual communication) by generating visuals to illustrate, depict, diagram, or graph a concept. In sum, GAIs can deliver the gold-standard level of one-on-one, personalized attention for tutoring and academic support.

Naturally, with this amount of power placed in the hands of novice students wielding GAI tools, there is a commensurate responsibility for institutions to teach the proper uses of GAIs professionally, equitably, and ethically, both to guarantee that the outputs of AI will be beneficial to students and to preserve the learner’s autonomy.[41] Current textual GAI systems rely on large language models that were trained on huge amounts of data, and the model builders did not evaluate their source material for truth or bias, fairness or hatefulness, focusing instead on gathering as much data as possible for the AI to work with.[42] Volume of material, not truth, justice, equity, and inclusion, was the operating criterion for building large language models. At the same time, AI has the capacity to collect and process personal and biometric data, again without reflection, because AI does not think. This is an important part of the user experience of these models, and one that needs to be communicated to users who would turn to the AI for truth and correction on a wide range of deeply important topics.[43]

The outcomes of Part I and Part II of the experiments support the primary thesis: all four of the cutting edge, general purpose, public facing GAI tools can assist learners at a novice level of expertise by instructing, tutoring, and assessing the learning of law, and one of the cutting edge general purpose systems, Claude 3.5 Sonnet, proved that it can improve the performance of novice learners on objective examinations in two law subjects that the learners have never studied.

Methodology

Part I – Comparative Evaluation of Claude 3.5 Sonnet, Copilot, Gemini 1.5 Pro, GPT-4o, and Lexis+ AI for Academic Support

Part I of the study was divided into five experiment tasks. Three of the five tasks (Tasks 1, 2, and 3, listed below) had a scripted prompt sequence; the last two tasks (Tasks 4 and 5) allowed for user construction and variation of the tasks using tree of thought prompting, chain of thought prompting, and generated knowledge prompting:

Task 1: Introducing a new subject with GAIs

Task 2: Getting help from GAIs to understand one or more topics of a subject

Task 3: Getting GAIs to help you learn a difficult topic

Task 4: Using specific prompt engineering techniques with GAIs (maieutic, tree of thought, and chain of thought prompting) to build greater understanding of a new subject 

Task 5: Getting the GAI to quiz you and assess your understanding of a new subject 

The experiments were run by relatively novice learners in the law: rising second-year students who had completed the first year of law school. The researchers acted as the learners calling on the GAIs for assistance and support. Each researcher was assigned a first-year subject that all of them had taken and passed (Civil Procedure, Contracts, Criminal Law, or Torts) and an upper-division subject that most second-year students take in law school but none of the four researchers had yet taken (Administrative Law, Business Associations/Corporations, Evidence, or Trusts and Estates).

The study did not involve priming the GAI systems with legal information. All of the systems tested—Claude 3.5 Sonnet, Copilot, Gemini 1.5 Pro, GPT-4o Omni, and Lexis+ AI—were used in their publicly released versions as of May 20 through early July 2024. In Tasks 4 and 5, the learners uploaded or entered answers or short essays for evaluation and assessment by the GAI. Some of these answers and essays were student generated and others were generated by a different GAI system from the GAI being tested in that round of the experiments.

The prompting techniques used in the experiments were: complexity-based prompting; generated knowledge prompting; chain of thought prompting; tree of thought prompting; and maieutic (Socratic) prompting. All of these techniques are discussed in section 1 below.

Prompt Engineering Methods

Prompt engineering is the process of crafting the words to input to the GAI to direct and describe what you want the GAI to do and the output you want from the GAI. The originators who popularized the term “prompt engineering” no doubt wanted to use a STEM-centric word—engineering—to make this process sound “sciencey” and technical or even mathematical. At the very least, it was intended to convey that the process is important. But the process, however important, does not have to be technical or mathematical, and it much more closely tracks proper drafting techniques for rules and instructions than it does actual computer coding. After all, a prompt is made of language, a string of words in the form of instructions or directions.

Prompt engineering is important because GAI systems do not think or act autonomously.[44] “Artificial intelligence can’t actually ‘think’ but it can understand what you tell it. The problem is, when you don’t know what you want it has to guess. The more a GAI has to guess, the more randomized and ‘creative’ it will get and the more difficult it is to get the results you’re looking for.”[45]

The techniques used in the study were:

  • Complexity-based prompting: The primary technique used in all of the experiments was complexity-based prompting,[46] which tailors the complexity of prompts to the learner’s progress so as to ensure that the GAI will give responses that will provide optimal assistance and still be challenging at each stage of the interaction and development of the learner’s knowledge. This method of prompting—or perhaps better labeled as “engagement” with the GAI—seems particularly well suited to assisting learning. At higher levels of learning, complexity-based prompting allows users to probe the GAI with highly complex prompts with multiple reasoning steps, very much in line with chain of thought prompting techniques described below. But GAIs also can accommodate a novice learner who can engage the GAI with open-ended questions of any level of complexity—the simpler the better—and when confronted by a GAI response that is not fully understood, the learner should ask a series of prompts to break down the AI’s comprehensive responses into smaller, more manageable parts. This can be done by asking the AI to explain each part of the response in simpler terms, or by asking follow-up questions to clarify specific points. The learner can also ask the AI to provide examples, analogies, or visual aids to help the learner understand the concepts better. So, while the initial response of the AI might be too complex or too simple, it serves as a starting point for further exploration and clarification. The learner can interact with the AI to tailor the learning process to their own pace and level of understanding thus achieving a highly personalized individual learning plan. This scaffolding approach ensures that the learner is neither overwhelmed nor under-challenged at any stage.
  • Generated knowledge prompting: The generated knowledge technique[47] leverages AI capabilities to generate new prompts based on a learner’s past responses. The technique often takes two stages: at the first stage (knowledge generation), the learner asks the GAI to provide general background information about a subject. The GAI’s responses kick off the iterations of prompts to build a body of information on the larger subject and then to iterate prompts that explore specific topics or subtopics of the subject more deeply. The second stage (knowledge integration) is to use the generated knowledge to address a specific problem or answer a specific question or to produce a specific kind of output (e.g., a research report, a chart or table of data analysis, or other outputs). In the second stage, the user might upload to the GAI additional information about the context and facts of the problem, or more specialized information about the subject matter. As with complexity-based prompting, in the course of an AI providing information to a learner, the learner can request a different modality, for example, to switch from explanation to assessment of the learner’s comprehension of the material, so that the AI can redirect attention and delivery of information to the problem area. If the GAI has generated a significant amount of information on contract law but the learner still struggles with the topic of damages in contract law, the learner can prompt the AI to ask the learner questions about damages, critique the answers, and ask follow-up questions that build upon the concept until the learner’s responses demonstrate understanding. For example,[48]

Learner’s Prompt: Can you ask me questions to help me understand damages in contract law?

AI Question: “Can you explain what ‘damages’ in contract law means?”

Learner’s Response: “I think it’s some kind of compensation when a contract is breached, but I’m not sure about the details.”

AI’s Response: “That’s a good start! Damages in contract law indeed refer to a sum of money awarded to a party who has suffered loss or injury due to a breach of contract. There are different types of damages, such as compensatory, punitive, nominal, and liquidated damages.”

AI Question: “Can you explain the difference between compensatory and punitive damages?”

Learner’s Response: “I’m not sure, but I think compensatory damages are meant to compensate for the loss, while punitive damages are meant to punish the party who breached the contract.”

AI’s Response: “You’re on the right track! Compensatory damages are indeed intended to compensate the injured party for the loss they suffered as a result of the breach. They are designed to put the injured party in the position they would have been in if the contract had been performed as agreed. On the other hand, punitive damages, which are also known as exemplary damages, are awarded in addition to compensatory damages in cases where the breach of contract was particularly egregious or malicious. They are intended to punish the breaching party and deter similar behavior in the future.”

The generated knowledge prompting technique has the potential to create a dynamic and highly personalized approach to learning, taking into account not only flexibility in the level and complexity of the information being generated but also an on-the-spot formative assessment of the learner’s progress in mastering the topic. For a more knowledgeable user, generated knowledge prompting also provides the opportunity to check the GAI’s answers for accuracy and reliability throughout the interaction.
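The two-stage flow of generated knowledge prompting can be sketched in a few lines of code. The `ask` function below is a hypothetical stand-in for any chat-completion API; it is stubbed here with canned responses so that only the control flow—background generation feeding into problem-specific integration—is on display. This is a minimal sketch, not an implementation of any particular GAI system’s interface.

```python
# Sketch of two-stage generated knowledge prompting.
# `ask` is a hypothetical stand-in for a real chat-completion API call;
# it is stubbed with canned responses so only the control flow matters.

def ask(prompt: str) -> str:
    if "general background" in prompt:
        # Stage 1 reply: background knowledge about the subject.
        return ("Contract damages include compensatory, punitive, "
                "nominal, and liquidated damages.")
    # Stage 2 reply: the background applied to a specific problem.
    return ("Applying the generated background: compensatory damages "
            "would put the buyer in the position of full performance.")

def generated_knowledge(subject: str, problem: str) -> dict:
    # Stage 1 (knowledge generation): build background on the subject.
    background = ask(f"Provide general background information about {subject}.")
    # Stage 2 (knowledge integration): fold the background into the problem.
    answer = ask(f"Using this background: {background}\nNow answer: {problem}")
    return {"background": background, "answer": answer}

result = generated_knowledge(
    "damages in contract law",
    "A seller breaches a sales contract; what damages may the buyer recover?",
)
print(result["answer"])
```

In a real session, the learner would iterate on the stage-one prompts until a satisfactory body of background information accumulates before pivoting to the integration stage.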

In addition to the two primary techniques described above, the study employed several additional techniques that have been developed to try to coax the most useful behavior and outputs from GAI systems:

  • Chain of Thought prompting: This technique involves creating a sequence of prompts that build upon each other to attempt to guide the AI through a logical cognitive process. In effect, each prompt requests the AI to fill in a link in a chain of thoughts that build upon the previous prompts in a linear progression.[49] When conducted in a single session with the AI, this technique guides the AI to move step-by-step through the analysis or explanation of the material without having to consider the entire problem or subject area all at once.[50] For example, in a writing project, the prompts might ask the AI to develop an outline, then an abstract, then an introduction, and finally the actual parts or sections of the writing, with the human able to give feedback and assessment along the way. Chain of thought prompting allows the end user a window into the AI’s “reasoning” and problem-solving approach that the user can observe and potentially correct.[51] With a more knowledgeable user, chain of thought prompting also provides the opportunity for the user to check the GAI’s answers for accuracy and reliability throughout the interaction. The technique is used to improve the performance of large language models on complex reasoning tasks by generating a chain of thought as a series of intermediate steps, and because of the attention mechanisms of transformer model LLMs, it keeps the AI aligned with the task without skipping steps or jumping to more “creative” or fanciful—or incorrect—conclusions.[52] Reportedly, just by prompting the AI to “think step-by-step,” the learner can assist the AI in correctly working out the answer, but the learner can go further and actually design the steps they want the AI to take through the problem.[53]
  • Tree of Thought Prompting: The Tree of Thought (ToT) prompting technique[54] adopts a tree structure to the design of a series of prompts where a central idea or concept or a single problem to solve forms the trunk and the branches represent further paths of inquiry into related topics or alternative methods of reasoning to reach the answer to the central problem.[55] Unlike prompting methods in the chain of thought family that rely on a linear sequence of instructions, the ToT method organizes prompts in a hierarchical manner, akin to a tree structure, for problem-solving.[56] In the law, rule tree diagrams visualize the steps that branch from the central legal question to the limbs and branches of sub-rules that must be answered, and often an answer to one question affects the answer to a sub-rule question further out on the branch.[57] For example, an initial inquiry into copyright fair use as the “trunk” question requires branching out into inquiries (“limbs”) of the category of use (e.g., research, education, comment and criticism), the purpose and character of the use, the nature of the original, the amount taken of the original, and the effect on the market for the original, and the limb of purpose and character of the use requires further inquiry into branches representing the commerciality of the use and the degree to which such use is transformative, which further branches to questions regarding whether the use changes the function and purpose of the use and the content, meaning and expression of the use compared to the original. All of this is necessary before the “trunk” question of fair use can be answered.[58] The ToT technique allows the user to guide the AI in a multi-dimensional exploration of a topic.[59]
  • Maieutic prompting: Just as the Socratic method is used to scrutinize a person’s beliefs by way of questioning to determine their internal consistency and their coherence with other beliefs, the Maieutic prompting technique uses a recursive process of questioning of the GAI to induce a tree of explanations while checking for internal consistency and coherence.[60] The Maieutic technique interrogates the AI to give answers and further explanations of answers in a form of intellectual cross-examination. The user employing the method is called upon to be both interrogator and student, as following the chain of questions and the answers given guides both the AI and the student in self-discovery of knowledge by encouraging both the user and the AI to think, analyze, explain, and justify conclusions.[61] The user continuously challenges the AI to go further in its explanation and can open new branches of inquiry at any time or cut off those that reveal inconsistent, incoherent, or incorrect reasoning and analysis.[62] The method is flexible because it could start with a series of questions to draw out the knowledge from the AI in a maieutic and knowledge-generation fashion—inducing the AI to “give birth” to the information it already knows—and then pivot to cross-examination (the elenchus form) to challenge assumptions in the answers or refute hypotheses, or it could switch to a more dialectic form where the user and the AI debate the topic.[63] There also is “reverse Maieutic prompting” wherein a user could request the AI to turn the form around and Socratically interrogate the user on his or her understanding of a topic.[64]
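The fair use rule tree described above lends itself to a simple data-structure sketch of tree of thought prompting. In the sketch below, each node of a simplified (and deliberately incomplete) rendering of the fair use factors holds a question and its sub-branches, and a walk over the tree emits one prompt per branch, resolving the deepest sub-rules before returning to the “trunk” question. The tree contents and prompt wording are illustrative assumptions, not a statement of the doctrine or of any system’s API.

```python
# Sketch of tree of thought prompting over a simplified fair use rule tree.
# Each node is (question, [sub-branches]); the walk emits one prompt per
# branch, children before their parent, so sub-rules are addressed before
# the trunk question is answered.

fair_use_tree = (
    "Is the use a fair use?",
    [
        ("What is the purpose and character of the use?",
         [("Is the use commercial?", []),
          ("Is the use transformative in function, purpose, or expression?", [])]),
        ("What is the nature of the original work?", []),
        ("How much of the original was taken?", []),
        ("What is the effect on the market for the original?", []),
    ],
)

def branch_prompts(node, depth=0):
    """Yield (depth, prompt) pairs, children before their parent."""
    question, children = node
    for child in children:
        yield from branch_prompts(child, depth + 1)
    yield depth, f"Considering the answers so far, address: {question}"

prompts = list(branch_prompts(fair_use_tree))
for depth, p in prompts:
    print("  " * depth + p)
```

The depth-first ordering mirrors the rule-tree logic in the text: an answer on an outer branch (e.g., transformativeness) is in hand before the limb it hangs from (purpose and character) is addressed, and the trunk question comes last.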

Other prompting techniques or strategies that were considered but ultimately not directly employed:

  • Least-to-most prompting: This technique employs a graded approach to assistance, starting with the least amount of help and gradually increasing support if the learner struggles. For instance, in learning a new skill like pottery, the learner might first be prompted to shape the clay independently. If they struggle, they might receive verbal instructions, followed by a demonstration if needed. This ensures that learners are given the opportunity to try independently before receiving help. This technique appears to be subsumed within the complexity-based prompting model and the generated knowledge model.
  • Self-refine prompting: This technique encourages learners to refine their understanding over time through reflection and revision. For instance, in a writing context, after composing an essay, the learner might be prompted to review and revise their work, focusing on areas like argument structure, clarity, grammar, and use of evidence. This iterative process attempts to foster critical thinking and self-improvement. This technique appears to require a more leisurely and extended time for reflection and recursion in the writing process. While these are very worthy ends, the technique is not conducive to repeated scripted prompting on multiple GAI platforms for evaluative and comparative analysis.
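The linear structure of the chain of thought family, in contrast to the tree and maieutic forms, can be sketched as a loop over staged prompts, each carrying the previous answers forward. The `ask` function is again a hypothetical stand-in for any chat-completion API, stubbed so the chaining of each step’s output into the next prompt is visible; the writing-project steps follow the outline-abstract-introduction-sections example given earlier.

```python
# Sketch of chain of thought prompting for a staged writing project.
# `ask` is a hypothetical stand-in for a real chat API, stubbed so the
# chaining of each step's output into the next prompt is visible.

def ask(prompt: str) -> str:
    # Echo the most recent request so the chain's growth can be inspected.
    return f"[draft responding to: {prompt.splitlines()[-1]}]"

steps = [
    "Develop an outline for an essay on contract damages.",
    "Write an abstract consistent with the outline.",
    "Write an introduction consistent with the abstract.",
    "Draft the body sections following the introduction.",
]

def chain_of_thought(steps):
    context = []  # accumulated links in the chain
    for step in steps:
        # Each prompt carries the full chain so far, then the new request,
        # so the AI never considers the whole project at once.
        prompt = "\n".join(context + [step])
        context.append(ask(prompt))
    return context

chain = chain_of_thought(steps)
print(len(chain), "links produced")
```

Between any two links, a human reviewer can inspect the latest draft and revise the next step’s instruction, which is the feedback-along-the-way feature the text describes.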

Use of Counterfactuals

Tasks 4 and 5 below, the unscripted tasks that worked through several prompt engineering and assessment techniques, also used “counterfactuals” in the prompting sequence. Counterfactuals, when used to test GAI systems, are hypothetical scenarios or parts of prompts and queries that contradict known facts or reality. These scenarios are deliberately crafted to include false or misleading information to evaluate the AI’s ability to recognize and correct inaccuracies in inputs.[65] The purpose of testing a GAI system’s ability to handle counterfactuals is to assess its robustness and reliability in discerning truth from falsehoods.[66] This is particularly important in educational and professional contexts, such as tutoring in law, where the accuracy of information is critical. By testing how well an AI system identifies and responds to counterfactuals, researchers can determine its effectiveness in maintaining the integrity of information, providing accurate feedback, and ensuring that users are not misled by incorrect data. This ability is a key indicator of an AI system’s utility in real-world applications where factual precision is paramount.[67]
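As an illustration of how such a probe might be scripted, the sketch below embeds a deliberately false premise in a prompt and checks, crudely by substring matching, whether the response states the true fact. The `ask_model` stub and its canned response are hypothetical; a real evaluation would use a live GAI system and human review of the responses.

```python
def counterfactual_probe(ask_model, true_fact, false_premise, question):
    """Test whether a model corrects a deliberately false premise.

    The prompt embeds the false premise; we then check (by substring)
    whether the response mentions the true fact, signalling a correction
    rather than acceptance of the falsehood.
    """
    prompt = f"{false_premise} Given that, {question}"
    response = ask_model(prompt)
    corrected = true_fact.lower() in response.lower()
    return {"prompt": prompt, "response": response, "corrected": corrected}


# Deterministic stub: a model that happens to correct this premise.
def stub_model(prompt):
    return ("Actually, copyright protection lasts for the life of the "
            "author plus 70 years, not 10 years as the premise states.")


result = counterfactual_probe(
    stub_model,
    true_fact="life of the author plus 70 years",
    false_premise="Copyright in the U.S. lasts only 10 years.",
    question="should an author bother registering a copyright?",
)
print(result["corrected"])  # True for this stub
```

Substring matching is a deliberately crude scoring rule; the study itself relied on the researchers' own evaluation of whether each system caught the planted falsehood.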

Designing the Tasks for Explanation, Elaboration, Simplification, Summarization, and Assessment 

The tasks of Part I of the study were:

Task 1: Introducing a new subject

Task 2: Getting help to understand one or more topics of a subject

Task 3: Getting help to learn a difficult topic

Task 4: Using specific prompt engineering techniques (maieutic, tree of thought, and chain of thought prompting) to build greater understanding of a new subject 

Task 5: Getting the GAI to quiz you and assess your understanding of a new subject 

Thus, the tasks followed a path of explanation, elaboration, simplification, summarization, and assessment.[68]

The most basic practice of academic support is to sit down with a student and work to try to bridge a gap in understanding or advance the student’s knowledge and skills development. The methods that will be most advantageous in this task are, of course, dependent on the particular student and the particular problem in comprehension or skills development the student is having. But the best and most effective coaching and tutoring practices often gravitate around the need for the tutor to explain, elaborate, simplify, or summarize material in a new and advantageous manner, and then assess the student’s learning of the subject matter.

A true strength of GAI in learning is that it allows an individualized learning experience for each student. GAI can customize the experience by taking into account the individual’s pace, abilities, interests, and learning styles and delivering content suited for each individual. It can design and refine the material and receive immediate feedback from the student on how and where to adjust the lessons and assessments.[69]

For law students, most semesters of law study involve learning new legal subjects. The process generally is most difficult in the first semesters of law school, but each new subject presents challenges of its own. Practitioners also face this challenge when their practice or their clients’ matters move them into new and unfamiliar areas. Judges and law clerks often must learn new and developing areas of law brought to them by the cases filed in their courts. The leading GAIs—GPT-4o Omni, Claude 3.5 Sonnet, Copilot, and Gemini 1.5 Pro—are well suited to assist all of these learners, novice law students and expert lawyers and judges alike.

Anyone using the current large language model text-based GAI systems will justifiably be amazed at the AI’s ability to discuss a wide range of topics and respond to requests to explain, elaborate, simplify, or summarize the material. Several of the GAI systems tested—Copilot, Gemini, and GPT-4o—can search the internet and assimilate and synthesize a response from any information that is not hidden behind password- or paywall-protected proprietary databases. Unlike prior search engines, which could be quite clever in gathering a laundry list of sources on a topic, a GAI goes beyond search to assimilate, synthesize, and explain the answer in impressively drafted language. Not only is information communicated, but a GAI’s response can also stimulate thinking, change or reframe the learner’s perspective, and ignite curiosity. If the AI’s initial response is well over the head of the user, the AI can and happily will simplify it to a more appropriate level of discourse. The chatbot model of textual GAIs is prepared to engage in a conversation on the topic, so it will suggest follow-up questions for the user to consider if the user’s curiosity or imagination is flagging.

The last task in the study is called “AI assisted Self-Assessment,” which is not intended to be an oxymoron but more of a realization that the learner can take steps in working with an AI tool to get the AI to engage in assessments. GAIs are well-equipped to write multiple choice questions and administer and grade quizzes. AI can draft short answer questions and evaluate the user’s short answers. With a little prompting and priming (giving the AI a form or sample to learn from in single-shot or few-shot priming), an AI can draft, administer, and evaluate the answers to essay questions, too.

There is a general caveat that applies across the board to many uses of GAI: AI is worse at dealing with questions, issues, and research topics in niche or discrete areas of study and in the most current, novel, cutting-edge topics, because large language model training depends on assimilating and synthesizing a massive number of examples of writing on a topic in order for the AI to write intelligently on it. In addition, if the topic is too new, many of the most-used GAI platforms have a data cutoff date (e.g., at one time GPT-4 and ChatGPT’s data cutoff date was September 2021, but by the time of this study’s experiments in May-August 2024 the data cutoff was April 2023), so they will not be able to form answers on topics or material occurring after that date without using a web browsing function.[70]

This shortcoming of GAI also ties into another significant topic for concern: hallucinations, which more recently are called “confabulations.”[71] Having too few examples of writing or commentary on a narrow or novel topic may cause the AI to compensate by filling in information that in reality isn’t there—i.e., “hallucinating” or “confabulating” the information.[72] The nature of these experiments was to get grounded in the law of a general, national legal subject regarding which there would be plenty of information from the internet to draw from in training and inference. Therefore, the prompts outlined in Tasks 1-4 below made no particular effort to try to avoid hallucinations or confabulations.

Prompt Script for Task 1: Getting the GAI to help you get grounded in an unfamiliar subject

Prompt 1: I have to take a law school course in [NAME]. I don’t know anything about [NAME] and I am worried that I will have trouble figuring out what this course is all about. Can you help me?

Follow up prompts: Can you give me some kind of overview of what this [course or subject] [NAME] is all about?

What is the purpose of this area of law? 

What are the big picture issues? 

What is the public policy behind this area of law?

Prompt 2: Can you produce an outline of the major topics that are in this course?

Follow-up prompts:  Do you have access to outlines of the [NAME] course?

Do you have information about the table of contents for a [NAME] course text?

Prompt Script for Task 2 – Getting help understanding one or more topics of a subject

Prompt 1: Can you tell me more about the [TOPIC NAME] topic?

Follow-up prompts: That sounds pretty complicated. Can you break this down for me into a bullet point summary?

Sorry, I am still a little confused. Can you simplify your explanation at a high school reading level?

[Next, the researchers were to summarize some of the information they received and submit it to the GAI for evaluation using Prompt 2 below]

Prompt 2: I have written the following summary of this [TOPIC NAME]. Can you review it and tell me if I understand the [TOPIC NAME] or am I missing something? [Submit your summary]

[Researchers were to follow up on things the GAI corrected or added in their summary]

Prompt Script for Task 3 – Getting the GAI to help you learn a difficult topic

Prompt 1: Can you elaborate on the [HARD TOPIC NAME] topic? 

Follow-up prompts: What does the [HARD TOPIC NAME] cover? 

What is the meaning of these [HARD TOPIC NAME] requirements? [List certain requirements, elements, factors, etc.]

That sounds very complicated. Can you break this down for me into a bullet point summary?

[Researchers were to ask follow up questions as necessary.]

Sorry, I am still confused. Can you simplify your explanation to a high school reading level?

[Next, researchers were to summarize some of the information they received and submit it to the GAI.]

I have written the following summary of this [HARD TOPIC NAME]. Can you review it and tell me if I understand [HARD TOPIC NAME] or am I missing something? [SUBMIT YOUR SUMMARY]

[Researchers were to follow up on things the GAI corrected or added to their summary]

Prompt Instructions[73] for Task 4 – Using Prompt Engineering Techniques to learn a new subject

Prompt 1: Craft a set of Chain-of-Thought or Tree-of-Thought prompts that explore [HARD TOPIC NAME]. As before, check the GAI’s responses to make sure it is not going off track in its discussion.

Chain-of-thought prompting asks the GAI to “go step by step” through a complex analysis. If you don’t know the possible steps in which to break down the analysis, the GAI can help you. Ask it: [Prompt] “What are the steps required to qualify for asylum in the United States?” . . . [Follow up prompt] “Okay, let’s look at the first step . . .” 

The tree-of-thought method is designed for the analysis of a legal issue where the law has a treelike structure with the main trunk being the ultimate issue and the branches being the elements or factors whose analysis informs the outcome of the main trunk issue. First ask the GAI to break the topic down in a hierarchical structure. If the GAI is unable to produce a structure, then ask for a decision tree or thought map of the analysis of the structure. When the GAI is able to populate the tree structure, ask the GAI to go down each branch and subbranch with you where you can evaluate and learn from its analysis.
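The traversal described above can be sketched as a depth-first walk over a hierarchical issue tree. The fair use tree below is hand-built for illustration; in the method described here, the GAI itself would be asked to generate the structure before the user walks its branches, and `ask_model` is a hypothetical stand-in for any chat-model call.

```python
# Illustrative issue tree: the trunk is the ultimate issue, the branches
# are the elements or factors that inform its outcome.
issue_tree = {
    "Is the use a copyright fair use? (17 U.S.C. 107)": {
        "Purpose and character of the use": {
            "Commercial vs. non-commercial": {},
            "Transformative use": {},
        },
        "Nature of the copyrighted work": {},
        "Amount and substantiality used": {},
        "Effect on the market": {},
    }
}


def walk_branches(tree, ask_model, depth=0, log=None):
    """Go down each branch and sub-branch, querying the model on each node."""
    if log is None:
        log = []
    for node, children in tree.items():
        log.append(("  " * depth) + node)
        ask_model(f"Explain the role of '{node}' in the analysis.")
        walk_branches(children, ask_model, depth + 1, log)
    return log


trace = walk_branches(issue_tree, ask_model=lambda p: "explanation")
print(len(trace))  # one entry per node: 7
```

In an interactive session the user would pause at each node to evaluate the GAI's explanation before descending to the next sub-branch.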

Prompt 2: Engage in user side Maieutic Prompting (Socratic Questioning) to question the GAI on [HARD TOPIC NAME]. Ask a question, evaluate the response, and ask another question drawing on the GAI’s response to move the conversation forward, deeper, or in a new direction.

Prompt Instructions for Task 5 – Getting the GAI to quiz you and assess your understanding of a new subject

Prompt 1: Thank you for all of your help. I think I am getting a better handle on [TOPIC NAME]. Can you quiz me on it? Please write me [some number, maybe 10] multiple choice questions on [TOPIC NAME]. Present them one at a time, let me answer each one, and then tell me if I am correct and explain the correct answer.

[Note that the last sentence of the suggested prompt is important because most AIs will simply launch into question-writing mode and spit out a whole series of questions without giving the user a chance to answer. The researchers were asked to follow up with questions to the GAI if they were confused by the “correct” answers or the GAI’s explanations. If the GAI was making mistakes or repeating the same questions over and over, the researchers were to try to get it to self-correct.]
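The one-question-at-a-time pattern that the prompt enforces can be sketched as a loop in which each question is posed, answered, and evaluated before the next is requested. All names here (`ask_model`, `get_student_answer`) are hypothetical stand-ins, and the stubs are deterministic so the loop’s structure is visible.

```python
def run_quiz(ask_model, topic, n_questions, get_student_answer):
    """Administer a quiz one question at a time.

    ask_model: callable(prompt) -> str, a stand-in for any chat API.
    get_student_answer: callable(question) -> str, collecting the
    learner's response (a callable here so the loop is testable).
    """
    results = []
    for i in range(1, n_questions + 1):
        question = ask_model(
            f"Write multiple-choice question {i} on {topic}. "
            "Do not reveal the answer yet."
        )
        answer = get_student_answer(question)
        feedback = ask_model(
            f"The student answered '{answer}' to: {question}. "
            "Say whether it is correct and explain why."
        )
        results.append({"question": question, "answer": answer,
                        "feedback": feedback})
    return results


# Deterministic stubs for illustration.
log = run_quiz(lambda p: f"[model: {p[:20]}...]",
               topic="fair use",
               n_questions=3,
               get_student_answer=lambda q: "B")
print(len(log))  # 3 question/answer/feedback rounds
```

The loop mirrors the prompt’s instruction that the GAI present one question, wait for an answer, and explain the correct answer before moving on.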

Prompt 2:  Can you ask me some short answer or fill in the blank questions on [TOPIC NAME]?

[This is the Maieutic part. As described in the section above on Maieutic prompting (Socratic questioning), the researchers were to launch the GAI by telling it to start questioning them on a general or specific topic. They were to ask the GAI to pose its questions one by one, giving them a chance to answer, and to request that the GAI evaluate each answer before asking the next question (otherwise it may have just kept spitting out questions with no comment or feedback on the answers). A possible variation was for the researchers to tell the GAI to ask each next question in a manner that follows from the answer to the last question.]

Prompt 3: This was very helpful! Can you give me some short essay questions and then I will write an answer and ask you to evaluate my answer, okay?

[The researchers supplied their own answers or used generated answers from a GAI system different from the one being tested.] 

Prompt 4: I have collected some practice exam essay questions. Can I feed you the prompts for these essays, and then I will submit my answers to these essay questions and ask you to evaluate my answers, okay?

[As before, the researchers were to submit the answers and then ask the GAI to evaluate them.]

Part II – Use of a Generative AI System to Improve a Learner’s Performance on an Objective Examination in Courses the Learner has not yet Taken

Part II of the study tested whether a general-purpose public-facing AI system, Claude 3.5 Sonnet, could improve learners’ results on an objective examination in two law courses that the learner had not yet taken in law school. 

Selection of Researchers

Each of the seven volunteer researchers was a rising 2L student who had completed the first year of law school but had not begun the upper-division coursework of the second year. In particular, none of the researchers had taken any courses involving copyright, trademark, or intellectual property law. Six of the researchers were not highly experienced in the use of GAI systems, and none of them had used Claude 3.5 Sonnet before. Other than the Round 2 instructions, none of these six researchers had received training or instruction in the use of GAI systems, and these six had not served as researchers in the Part I experiments reported here. One researcher (#3775) had been involved in the Part I experiments and so was familiar with the use of GAI systems, including Claude 3.5 Sonnet. However, as noted in the results below, #3775’s scores and improvements were very similar to those of the other Part II researchers.

Design of the Experiment

There were three rounds of this experiment. In Round 1, the student researchers took an objective multiple-choice “pre-test” of twenty-five questions on copyright law and twenty-five questions on trademark law before taking any course of study in copyright or trademark. In Round 2, the researchers used a general-purpose, public-facing AI system, Claude 3.5 Sonnet, to study copyright and trademark law in one-on-one tutoring sessions. All researchers worked alone with the GAI system but followed a guide to using AI systems for academic support (see the Round 2 Guided Instructions Instrument below). In this way, the students used the guide to study with Claude 3.5 Sonnet as an AI tutor. In Round 3, the student researchers took an objective multiple-choice “post-test” of twenty-five questions on copyright law and twenty-five questions on trademark law, this time after having worked to learn copyright and trademark from the AI tutor. The post-test was similar to the pre-test from Round 1 and tested the same subject areas within copyright and trademark law, but no question was repeated between the Round 1 and Round 3 examinations.
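The improvement measured by this design is simply the percentage-point change between a researcher’s pre-test and post-test scores on a twenty-five-question exam. The sketch below shows that arithmetic with hypothetical scores; the numbers are illustrative, not the study’s data.

```python
def score_change(pre_correct, post_correct, n_questions=25):
    """Percentage-point change from pre-test to post-test on one subject.

    Scores here are illustrative, not the study's actual data.
    """
    pre = 100 * pre_correct / n_questions
    post = 100 * post_correct / n_questions
    return round(post - pre, 1)


# Hypothetical example: 12/25 correct before tutoring, 19/25 after.
print(score_change(12, 19))  # 28.0 percentage points
```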

The subjects tested were as follows:

Copyright Law:

• copyrightable subject matter

• originality and creativity requirements

• idea-expression distinction or dichotomy

• unprotectable ideas under 17 U.S.C. 102

• merger and scènes à faire doctrines

• copyright fair use exception under 17 U.S.C. 107

• purpose and character of the use

• commercial vs. non-commercial uses

• the transformative test

Trademark law:

• What is a trademark, and what is it used for

• The “use in commerce” requirement

• Consumer confusion

• Likelihood of confusion test

• False designation of origin

• Trademark dilution

• Trademark fair use

• Nominative fair use

• Classic fair use[74]

The guide for the researchers to follow in Round 2 in working with the GAI system, Claude 3.5 Sonnet, as a tutor was as follows:

Round 2 Instrument – Guided Instructions for use of the AI as a Tutor

The general objective of this project is to test how well a GAI system can improve a novice learner’s performance on an objective test of law subjects. To get one-on-one assistance in learning a new law subject from a GAI, you will engage the AI in an extended conversation, prompting it with questions or requests and receiving its responses. Eventually, you will reach a point where you would like the GAI to ask you questions and evaluate your responses.

Below are some suggested approaches to prompting for this task. Note that prompting is flexible and there is nothing sacred about the specific wording of these prompts. Change them if they don’t sound right to you. The important thing is to settle in for a conversation with the AI, not to take one giant swing for the fences with a single prompt. If you are confused at any point—and most likely you will be—tell the GAI that you are confused and ask it to explain the topic again, simplify it, or explain it at a high school learning level or even a 6th grade learning level.

The suggested prompts below cover some key topics of copyright law that you should address before ending your session:

• copyrightable subject matter

• originality and creativity requirements

• idea-expression distinction or dichotomy

• unprotectable ideas under 17 U.S.C. 102

• merger and scènes à faire doctrines

• copyright fair use exception under 17 U.S.C. 107

• purpose and character of the use

• commercial vs. non-commercial uses

• the transformative test

And below are some key topics for trademark law:

• What is a trademark, and what is it used for

• The “use in commerce” requirement

• Consumer confusion

• Likelihood of confusion test

• False designation of origin

• Trademark dilution

• Trademark fair use

• Nominative fair use

• Classic fair use

These topics will be the source of questions on Round 3’s Post-test. But you are free to explore other topics going deeper or farther than discussed here. Suggested approach to prompts:

Part 1 – Introducing a new subject – Copyright Law

Prompt 1: I have to learn a new subject in law called copyright. I don’t know anything about Copyright. Can you help me figure out what this course is all about?

Follow up prompts: Can you give me some kind of overview of what Copyright is all about? What is the purpose of this area of law? 

What are the big picture issues that come up regularly in copyright law?

What is the public policy behind this area of law?

Prompt 2: Can you produce an outline of the major topics that are covered in copyright such as you might find in the table of contents of a copyright course text?

[Make sure you get an actual outline that resembles a table of contents for a copyright course text. If not, follow one or more of the follow-up prompts here until you get a good outline or table of contents.]

Follow-up prompts: Do you have access to outlines of the copyright course? Can you show me a table of contents for a copyright course text?

Part 2 – Getting help understanding one or more topics of a subject

Once you have gotten your footing in the new subject, you should follow up and request more information on the topics provided below and any other topics within the subject area that interest you. The following prompts will move you and the GAI in that direction. If you are staying on the same page with the GAI, feel free to skip some of the steps asking the AI to simplify the information. But if the AI’s answers are not understandable, you should ask the AI to clarify or simplify the information.

Suggested approach to prompts:

Prompt 1: Can you tell me more about the “copyrightable subject matter” topic? What makes something copyrightable?

Follow-up prompts for clarification or simplification: That sounds pretty complicated. Can you break this down for me into a bullet point summary?

Sorry, I am still a little confused. Can you simplify your explanation at a high school reading level?

[Next, summarize some of the information you have received and submit it to the AI for evaluation using Prompt 2 below. This should not be overly long—100 words or fewer should be enough.]

Prompt 2: I have written the following summary of copyrightable subject matter. Can you review it and tell me if I understand the concept of copyrightable subject matter or am I missing something? [Submit your summary]

[Follow up on things the AI corrected or added to in your summary]

Prompt 3: Can you tell me more about the “originality and creativity” requirements? What do original and creative mean in this context?

Follow-up prompts for clarification or simplification: That sounds pretty complicated. Can you break this down for me into a bullet point summary?

Sorry, I am still a little confused. Can you simplify your explanation at a high school reading level?

[Next, summarize some of the information you have received and submit it to the AI for evaluation using Prompt 2 below. This should not be overly long—100 words or fewer should be enough.]

Prompt 4: I have written the following summary of the originality and creativity requirements. Can you review it and tell me if I understand the concepts of originality and creativity or am I missing something? [Submit your summary]

[Follow up on things the GAI corrected or added to in your summary]

Prompt 5: Can you tell me more about the “idea-expression” distinction or dichotomy? What does this idea vs. expression concept mean?

Follow-up prompts for clarification or simplification: That sounds pretty complicated. Can you break this down for me into a bullet point summary?

Sorry, I am still a little confused. Can you simplify your explanation at a high school reading level?

[Next, summarize some of the information you have received and submit it to the AI for evaluation using Prompt 2 below. This should not be overly long—100 words or fewer should be enough.]

Prompt 6: I have written the following summary of the idea vs. expression concept. Can you review it and tell me if I understand this concept or am I missing something? [Submit your summary]

[Follow up on things the AI corrected or added to in your summary]

Prompt 7: Can you tell me more about the concept of unprotectable ideas under 17 U.S.C. 102? What does this concept mean?

Follow-up prompts for clarification or simplification: That sounds pretty complicated. Can you break this down for me into a bullet point summary?

Sorry, I am still a little confused. Can you simplify your explanation at a high school reading level?

[Next, summarize some of the information you have received and submit it to the AI for evaluation using Prompt 2 below. This should not be overly long—100 words or fewer should be enough.]

Prompt 8: I have written the following summary of the concept of unprotectable ideas under 17 U.S.C. 102. Can you review it and tell me if I understand this concept or am I missing something? [Submit your summary]

[Follow up on things the AI corrected or added to in your summary]

Prompt 9: Can you tell me about the merger and scènes à faire doctrines? What do these doctrines do? What is the difference between the two doctrines?

Follow-up prompts for clarification or simplification: That sounds pretty complicated. Can you break this down for me into a bullet point summary?

Sorry, I am still a little confused. Can you simplify your explanation at a high school reading level?

[Next, summarize some of the information you have received and submit it to the AI for evaluation using Prompt 2 below. This should not be overly long—100 words or fewer should be enough.]

Prompt 10: I have written the following summary of the merger and scènes à faire doctrines. Can you review it and tell me if I understand these doctrines or am I missing something? [Submit your summary]

[Follow up on things the AI corrected or added to in your summary]

Part 3 – Getting the generative AI to help you learn a difficult topic

Fair use is the most complicated topic in copyright law, but the prompts below should give you a good grounding in the fair use requirements. As before, the suggested prompts express confusion and ask for clarification or simplification. You can skip these and move on if you feel you have gained a good understanding of the topic.

Suggested approach to prompts:

Prompt 1: Can you elaborate on the copyright fair use topic? 

Follow-up prompts: What does the fair use exception under 17 U.S.C. 107 cover? 

What purposes of fair use are listed in section 107?

Prompt 2: What is the meaning of the purpose and character of the use factor?

Follow-up prompts: That sounds very complicated. Can you break this down for me into a bullet point summary?

[Ask follow up questions as necessary such as the suggested prompts below.]

Sorry, I am still confused. Can you simplify your explanation to a high school reading level?

Prompt 3: Can you elaborate on the commercial vs. non-commercial use topic? 

Prompt 4: Can you explain the transformative test of copyright fair use law?

Follow-up prompts: What makes a use of an original work transformative? Does it matter what function and purpose the new work serves compared to the function and purpose of the original work?

[Next, summarize some of the information yourself and submit it to the AI. This topic is more complicated so you might need 300 words or more here.]

Prompt 5: I have written the following summary of the copyright fair use test, including the purposes listed in section 107, the factor of the purpose and character of the use, and the transformative test. Can you review it and tell me if I understand copyright fair use or am I missing something? [Submit your summary]

[Follow up on things the GAI corrected or added to in your summary]

Part 4 – Getting the generative AI to quiz you and assess your understanding of copyright law

Prompt 1: Thank you for all of your help. I think I am getting a better handle on copyright law. Can you quiz me on it? Please write me [some number, maybe 20] multiple choice questions on copyright law. Present them one at a time, let me answer each one, and then tell me if I am correct and explain the correct answer.

[The last sentence of the prompt here is important because most AIs will simply launch into question writing mode and spit out a whole series of questions without giving you a chance to answer. If this prompting goes well, you could repeat this exercise a few times until you are well tested in the new area. If you are confused by the “correct” answers or the AI’s explanation of the answers, please follow up with questions to the AI. If the AI is making mistakes or repeating the same questions over and over, try to get it to self-correct.]

Prompt 2: Can you ask me some short answer or fill in the blank questions on copyright law?

[Here, you will launch the AI by telling it to start questioning you on a general or specific topic. Ask the AI to give you the questions one by one giving you a chance to answer and requesting the AI to evaluate your answer before asking the next question (otherwise it may just keep spitting out questions with no comment or feedback on your answers). Try to keep up with the AI’s questions. If the AI starts slipping up or repeating itself, feel free to start over with a new session. AIs can do a long session of questioning but sometimes they seem to get fatigued or run out of ideas and start to ask the same questions over and over again.]

Part 1 – Introducing a new subject – Trademark Law

Prompt 1: I have to learn a new subject in law called trademark. I don’t know anything about trademark. Can you help me figure out what this course is all about?

Follow up prompts: Can you give me some kind of overview of what trademark is all about? What is the purpose of this area of law? 

What are the big picture issues that come up regularly in trademark law?

What is the public policy behind this area of law?

Prompt 2: Can you produce an outline of the major topics that are covered in trademark such as you might find in the table of contents of a trademark course text?

[Make sure you get an actual outline that resembles a table of contents for a trademark course text. If not, follow one or more of the follow-up prompts here until you get a good outline or table of contents.]

Follow-up prompts: Do you have access to outlines of the trademark course? Can you show me a table of contents for a trademark course text?

Part 2 – Getting help understanding one or more topics of a subject

Once you have gotten your footing in the new subject, you should follow up and request more information on the topics provided below and any other topics within the subject area that interest you. The following prompts will move you and the GAI in that direction. If you are staying on the same page with the GAI, feel free to skip some of the steps asking the AI to simplify the information, but if the AI’s answers are not understandable, you should ask the AI to clarify or simplify the information. Suggested approach to prompts:

Prompt 1: Can you tell me more about the “use in commerce” or commercial use requirement in trademark law? What does it mean to use a mark in commerce?

Follow-up prompts for clarification or simplification: That sounds pretty complicated. Can you break this down for me into a bullet point summary?

Sorry, I am still a little confused. Can you simplify your explanation at a high school reading level?

[Next, summarize some of the information you have received and submit it to the AI for evaluation using Prompt 2 below. This should not be overly long—100 words or fewer should be enough.]

Prompt 2: I have written the following summary of the use in commerce requirement. Can you review it and tell me if I understand the concept of the use of a mark in commerce or am I missing something? [Submit your summary]

[Follow up on things the GAI corrected or added to in your summary]

Prompt 3: Can you tell me more about the concept of consumer confusion in trademark law? What does consumer confusion mean?

Follow-up prompts for clarification or simplification: That sounds pretty complicated. Can you break this down for me into a bullet point summary?

Sorry, I am still a little confused. Can you simplify your explanation at a high school reading level?

[Next, summarize some of the information you have received and submit it to the AI for evaluation using Prompt 2 below. This should not be overly long—100 words or fewer should be enough.]

Prompt 4: I have written the following summary of the consumer confusion requirement. Can you review it and tell me if I understand the concept correctly or am I missing something? [Submit your summary]

[Follow up on things the GAI corrected or added to in your summary]

Prompt 5: Can you tell me more about the test used to determine a “likelihood of confusion”? 

Follow-up prompts for clarification or simplification: That sounds pretty complicated. Can you break this down for me into a bullet point summary?

Sorry, I am still a little confused. Can you simplify your explanation at a high school reading level?

[Next, summarize some of the information you have received and submit it to the AI for evaluation using Prompt 2 below. This should not be overly long—100 words or fewer should be enough.]

Prompt 6: I have written the following summary of the “likelihood of confusion” test. Can you review it and tell me if I understand this concept of “likelihood of confusion” or am I missing something? [Submit your summary]

[Follow up on things the AI corrected or added to in your summary]

Prompt 7: Can you tell me more about the concept of false designation of origin? What does this concept mean?

Follow-up prompts for clarification or simplification: That sounds pretty complicated. Can you break this down for me into a bullet point summary?

Sorry, I am still a little confused. Can you simplify your explanation at a high school reading level?

[Next, summarize some of the information you have received and submit it to the AI for evaluation using Prompt 8 below. This should not be overly long—100 words or fewer should be enough.]

Prompt 8: I have written the following summary of the concept of false designation of origin. Can you review it and tell me if I understand this concept or am I missing something? [Submit your summary]

[Follow up on things the AI corrected or added to in your summary]

Prompt 9: Can you tell me about the concept of trademark dilution? What is trademark dilution, and how does it differ from trademark infringement?

Follow-up prompts for clarification or simplification: That sounds pretty complicated. Can you break this down for me into a bullet point summary?

Sorry, I am still a little confused. Can you simplify your explanation at a high school reading level?

[Next, summarize some of the information you have received and submit it to the AI for evaluation using Prompt 10 below. This should not be overly long—100 words or fewer should be enough.]

Prompt 10: I have written the following summary of the concept of trademark dilution. Can you review it and tell me if I understand this concept or am I missing something? [Submit your summary]

[Follow up on things the AI corrected or added to in your summary]

Part 3 – Getting the generative AI to help you learn a difficult topic

Fair use is a complicated topic under trademark law, particularly because it is so different from fair use in copyright law. However, the prompts below should give you a good grounding in the trademark fair use requirements. As before, prompts are suggested for expressing your confusion and asking for clarification or simplification; you can skip these and move on if you feel you have gained a good understanding of the topic. Suggested approach to prompts:

Prompt 1: Can you elaborate on the trademark fair use topic? 

Follow-up prompts: What purposes are there for trademark fair uses?

Prompt 2: What is nominative fair use in trademark law? 

Follow-up prompts: That sounds very complicated. Can you break this down for me into a bullet point summary?

[Ask follow up questions as necessary such as the suggested prompts below.]

Sorry, I am still confused. Can you simplify your explanation to a high school reading level?

Prompt 3: Can you elaborate on classic fair use in trademark law? How does the class of products affect the fair use determination? 

Prompt 4: I have written the following summary of the trademark fair uses, including the nominative fair use and the classic fair use. Can you review it and tell me if I understand the trademark fair uses or am I missing something? [Submit your summary]

[Follow up on things the GAI corrected or added to in your summary]

Part 4 – Getting the Generative AI to quiz you and assess your understanding of trademark law

Suggested approach to prompts:

Prompt 1: Thank you for all of your help. I think I am getting a better handle on trademark law. Can you quiz me on it? Please write me [some number, maybe 20] multiple choice questions on trademark law. Present them one at a time, let me answer each one, and then tell me if I am correct and explain the correct answer.

[The last sentence of the prompt here is important because most AIs will simply launch into question writing mode and spit out a whole series of questions without giving you a chance to answer. If this prompting goes well, you could repeat this exercise a few times until you are well tested in the new area. If you are confused by the “correct” answers or the AI’s explanation of the answers, please follow up with questions to the AI. If the AI is making mistakes or repeating the same questions over and over, try to get it to self-correct.]

Prompt 2: Can you ask me some short answer or fill in the blank questions on trademark law?

[Here, you will launch the AI by telling it to start questioning you on a general or specific topic. Ask the AI to give you the questions one by one, giving you a chance to answer, and request that the AI evaluate your answer before asking the next question (otherwise it may just keep spitting out questions with no comment or feedback on your answers). Try to keep up with the AI’s questions. If the AI starts slipping up or repeating itself, feel free to start over with a new session. AIs can sustain a long session of questioning, but sometimes they seem to get fatigued or run out of ideas and start to ask the same questions over and over again.]

Results

Summary of Comparative Evaluations and Conclusions from Part I of the Experiments[75]

Researcher ratings of GAI systems based on task performance

Task 1: Getting the GAI to help you get grounded in a law school subject

Researcher 1: The formatting of GPT-4o was by far the best, the adaptability of the AI to conversation and outlining was phenomenal. However, it was a close match with Claude—the biggest downside to Claude was the lockout period I experienced which cut me off during Task 3 but allowed me to pick back up (as well as the lack of citations, which comes up again in later tasks described below). Still—as far as an overview of a subject or topic—GPT-4o was my winner. Copilot and Gemini were close on the leaders’ heels for this round, with Lexis+ AI falling far behind. Overall Ratings: GPT-4o: Best; Claude: 2d Best; Copilot: 3d Best; Gemini: 3d Best; Lexis+AI: Worst.

Researcher 2: I think for an overview of a new course of study, Copilot or Gemini are your best bets for the introduction, however when you need to get more in depth on specific topics, you should switch to GPT-4o or Claude 3.5 Sonnet. Overall, I think that GPT is my preference for this task, being a solid combination of general, specific, and good at outlining. I think that Gemini may be the best at laying out an outline and table of contents, but only by a small margin over GPT. I wouldn’t recommend Lexis+ AI for any of the prompts in this section. Overall Ratings: GPT-4o: Best; Claude: 2d Best; Copilot: 3d Best; Gemini: 2d Best; Lexis+AI: Worst.

Researcher 3: It seems pretty safe to say Lexis+ AI was the weakest link here, although I believe this is largely rooted in what was being asked of Lexis itself. Because Task 1 is simply about getting one’s feet wet in an area of law, and Lexis is restricted by the sources it could draw on, it isn’t exactly shocking that it struggled with some of the preliminary questions on producing outlines and tables of contents. Copilot and Gemini struggled a bit more with differentiating preliminary questions regarding the basics of the two courses I tested them on (public policy, big picture issues, purpose of that area of law) as they got repetitive and overlapped many of the major ideas. Claude and GPT-4o did a better job differentiating, getting more in depth in each response and keeping the ideas separate. This may just speak more to the ability of the programs to respond to somewhat vague prompt engineering. Responses on access to textbooks and outlines were mixed: Claude, Gemini, and Lexis were all unable to find the table of contents to the textbook I specified. Copilot and GPT-4o could. Claude and 4-o gave the most comprehensive explanations, providing vocabulary words and tips for a successful time in law school. Overall Ratings: Claude: Best; GPT-4o: 2d Best; Copilot: 3d Best; Gemini: 3d Best; Lexis+AI: Worst.

Researcher 4: GPT-4o was the best overall in this section. It continues to baffle me that Lexis+AI was the worst (even overall). GPT-4o gave the best answers with the most organization. It was very useful and always followed directions. It listened to the questions being asked and always provided a response that was coherent and substantive. I appreciate the quality of GPT-4o and the details it gave in its answers without bogging them down and making them very confusing. Overall Ratings: GPT-4o: Best; Claude: 2d Best; Copilot: 3d Best; Gemini: 3d Best; Lexis+AI: Worst.

Task 2: Getting help understanding one or more topics of a law school subject

Researcher 1: Gemini excelled at this task because it went above and beyond, producing credible, relevant sources for the answer. GPT-4o, Copilot, and Claude were all about equal with each other (with the exception that Claude would not provide any sources). Lexis did not do well here as it could not simplify the concepts nor provide important caselaw. Overall Ratings: Gemini: Best; GPT-4o: 2d Best; Copilot: 2d Best; Claude: 2d Best; Lexis+AI: Worst.

Researcher 2: Overall, I think that Claude probably did the best of all of the systems for this task, if for no other reason than it beat out GPT on its ability to explain how my summaries could be improved. While Copilot and Gemini are decent at giving a general overview, they are not particularly good at doing a deeper dive or handling any sort of case law. Lexis can explain complex concepts, but it will be at a higher level with more jargon and it isn’t particularly good at summarizing information or evaluating your responses. Overall Ratings: Claude: Best; GPT-4o: 2d Best; Copilot: 3d Best; Gemini: 3d Best; Lexis+AI: Worst.

Researcher 3: When handed a summary to evaluate, Copilot and Gemini both understood the prompt but were lackluster in the true “analysis” department. In both systems, the AI pointed out what I got right (often restating the points, and reformatting into summary into bullet points and headings). Neither provided anything substantial to add or change to the summary—Gemini’s only suggestion being to consider in my response anything else I may learn in a T&E class. Claude and 4-o both provided more detailed explanations on what I could add to a summary: specific rules and rights that a student may want to incorporate. Lexis, unsurprisingly, struggled and wound up providing templates to attorney-client letters when I asked it to evaluate my summary. Again—a point where it seems a student would have to be asking questions related to a specific source stored in Lexis’ database to get an accurate response, rather than asking broad prompts without something specific in mind. As such, a student looking for a more thorough evaluation of a summary (that is—one that adds specific pointers) would probably benefit from using 4-o and Claude. Copilot and Gemini both did decent jobs providing examples unprompted and reducing explanations to lower levels—so a student looking for less detail and more simplification may find those two programs helpful. 4-o proved to be a preferred program for citing to caselaw. Overall Ratings: GPT-4o: Best; Claude: 2d Best; Copilot: 3d Best; Gemini: 3d Best; Lexis+AI: Worst.

Researcher 4: GPT-4o once again takes the trophy in this section. It was the most organized and easiest to follow. It gave great examples and did an excellent job at breaking down the information into a high school reading level (I would say it took it down to even maybe a fifth grade reading level). It did a great job at following the bullet point list and it gave very concise yet detailed and organized answers to the questions asked. It seemed very intelligent, and the answers were accurate. Overall Ratings: GPT-4o: Best; Claude: 2d Best; Copilot: 3d Best; Gemini: 3d Best; Lexis+AI: Worst.

Task 3: Getting the AI to help you on the hardest subtopic of a law school course

Researcher 1: Claude would have been a clear winner, but its inability (or unwillingness) to provide sources is a huge drawback. One of the most important aspects of learning with AI currently is being able to cross-reference or check its sources. If Claude will not provide those sources at all, I’m not sure how comfortable I would be using the AI to learn or research a legal topic for school. So, I dropped Claude to 3d best here. Gemini was the winner for this round because it fully acknowledged the proceedings concerning Chevron, which concluded with the U.S. Supreme Court overturning Chevron after this round of the experiments was completed. Gemini was closely followed by GPT-4o, which included important cases and sources on torts and proximate cause (Wagon Mound, Palsgraf). Copilot fell well below the mark in this task for me due to its usage of non-credible sources and failure to mention the Chevron proceedings. Lexis+ AI brings up the rear in this section due to its inability to interact with the user beyond basic legal questioning. Overall Ratings: Gemini: Best; GPT-4o: 2d Best; Claude: 3d Best; Copilot: Worst; Lexis+AI: Worst.

Researcher 2: Overall, GPT and Claude take the cake on this one, too, with neither standing out above the other. I like GPT for the depth of information on a topic, but Claude for the ease of understanding. Copilot was average across the board while Gemini was above average in helping to learn Business Organizations but below average on Civ Pro. Lexis continues to be on its own island of superior case law that it doesn’t really know what to do with. Claude doesn’t want to take a strong stance on the issue of judicial precedent in prompt 12 which is similar to GPT’s habit of giving the politician’s answer to this prompt. I accidentally presented my counterfactual to GPT for Civ Pro in this section and GPT caught that there was a mistake with my understanding and immediately noted as much in the first line of its response. Overall Ratings: GPT-4o: Best; Claude: Best; Gemini: 2d Best; Copilot: 3d Best; Lexis+AI: Worst.

Researcher 3: Perhaps one of Lexis’ greater advantages is how accessible and transparent it makes the sources—not just providing the sources but attaching them to the part of the response that it supports. I would imagine a student writing a piece or looking for specific sources would find this advantageous, assuming that Lexis was on point and not veering off course. That being said, they should approach with caution as Lexis doesn’t seem to give much deference to authority. In comparison, Gemini hadn’t cited to any restatements, UCC provisions, or case law in its analysis. Claude couldn’t point to caselaw, but could to UCC provisions/restatements. 4-o could point to caselaw. All helpful considerations depending on a law students needs (do they have a specific source in mind or very specific prompt that they hope to get sources from? Lexis could help. Looking for a source but not sure exactly what they’re looking for? Claude or 4-o. Needing general responses, or examples without much interest in specific source? Gemini and Copilot.). Similar trends in summary evaluations presented themselves here. Copilot and Gemini again did not add real pointers to the summaries provided, and when they did the tips were generic. 4-o and Claude specifically noted areas for improvement, 4-o even rewriting templates of what a corrected summary could look like. Lexis was able to point to nuanced themes that my summary could benefit from recognizing in trusts and estates law— suggestive that it may have the capacity to evaluate with time, but that I wouldn’t yet trust it to be consistent. Overall Ratings: GPT-4o: Best; Claude: 2d Best; Copilot: 3d Best; Gemini: 3d Best; Lexis+AI: Worst.

Researcher 4: GPT-4o was the best in my opinion for this task. It is so well organized. I really like working with GPT-4o. it follows the directions to a T, and it truly seems like it understands what you are asking it to do. It never seems to get confused. I really enjoy the examples of case law it provides to support its answers, and when fact checked, they are accurate. It provides the most examples out of any platform, and it has not fumbled or given any off the wall answers like Lexis+AI. Overall Ratings: GPT-4o: Best; Claude: 2d Best; Copilot: 3d Best; Gemini: 3d Best; Lexis+AI: Worst.

Task 4: Using Prompt Engineering Techniques

Researcher 1: All of the systems ranked about the same as far as decision trees and chain of thought prompts went. Claude did the best in this section—easily catching obvious incorrect information, as well as not so obvious information/misconceptions. GPT-4o caught the obvious incorrect counterfactual information but failed to catch the small misconceptions. Gemini did notably miss overt incorrect information—even doubling down on its own misconception. Both Copilot and Lexis+ AI struggled with cases, providing incorrect or inaccurate information during our questioning. However, Lexis+ AI did catch the incorrect counterfactual information and corrected my error. Copilot seemed to go along with my errors, and it seemed to believe that the two sections of the APA could actually both be categorized as “informal rule-making,” which I disagree with. Overall Ratings: Claude: Best; GPT-4o: 2d Best; Gemini: 3d Best; Lexis+AI: 3d Best; Copilot: Worst.

Researcher 2: Overall, I was surprised to see Gemini rise through the ranks and to see Claude kind of fall off for this. Claude was by no means bad, but its presentation hurt it greatly. Copilot was not much of a standout here, but again, Lexis+ AI found its way to the back of the pack. Part of the issue here is how individualized the question line is for each, so it’s hard to compare them all properly in many ways. On counterfactual information, Claude caught my misidentification of FRCP 12 and properly clarified it immediately. For Business Organizations, it apologized for my misunderstanding and then corrected my counterfactual. It did well on both accounts. For Copilot, I stated that FRCP 12 dealt with service of process and Copilot didn’t correct me and instead ran with it. For Business Organizations, I mislabeled the FTC as the “federal transmission company” and Copilot didn’t acknowledge the mistake but did state the proper title in boldface print. Copilot didn’t really catch these counterfactuals at all. For Gemini, it caught my counterfactual FRCP rule misidentification and immediately addressed it. For the Business Organizations one, it was less clear on whether it could actually identify the issue. It said I was correct, but then said that my understanding was incorrect, so it caught it, but it wasn’t immediately clear that I was wrong. With Lexis+ AI’s counterfactual, it identified my understanding of Civ Pro to be wrong and didn’t correct my mislabeling of the case, but it did go on to cite the correct case. It totally missed my counterfactual misnaming of the FTC and just ran with it. Overall Ratings: Gemini: Best; Claude: 2d Best; GPT-4o: 2d Best; Copilot: 3d Best; Lexis+AI: Worst.

Researcher 3: The programs that were able to catch my counterfactuals were Claude and Gemini in one round (T&E). Copilot, Lexis, and (surprisingly) GPT-4-o all brushed over my counterfactuals, not directly calling them out. Gemini took my counterfactual and ran with it—directly restating the error and continuing on with its analysis. This comes with the admission that I cloaked the counterfactuals within bigger prompts, so the errors themselves were a bit more incidental to the questions I was asking. Nevertheless, it could speak to how these programs would respond to a small error made by a student—who may want to be particularly cautious using Gemini. Claude emerged as the most impressive in task 4: it was thorough and detailed (providing specific legal routes one could take, rather than simply suggesting “challenging the trust”) and maintained strong attention mechanisms in how I requested it format the responses. Overall Ratings: Claude: Best; GPT-4o: 2d Best; Gemini: 3d Best; Copilot: Worst; Lexis+AI: Worst.

Researcher 4: GPT-4o once again takes the cake here. It is the only platform that provided direct rules of evidence to explain and help support its responses. It also continues to give every single answer in a clear and concise manner that is easy to read and understand with titles, subtitles, and headings. I did not truly understand the tree of thought process until GPT-4o produced its tree of thought example, and then it made sense and was very easy to follow and understand. I actually enjoyed it. GPT-4o, Gemini, and Claude all correctly identified my counterfactual questions (I did not ask Lexis+AI or Copilot counterfactual questions, though), but GPT-4o did the best at providing an explanation and resolution to the flaws in my question that were easy to understand and comprehend. Overall Ratings: GPT-4o: Best; Claude: 2d Best; Gemini: 2d Best; Lexis+AI: 3d Best; Copilot: 3d Best.

Task 5: Getting the AI to quiz you and assess your understanding of topics in a law school course

Researcher 1: Gemini excelled at Task 5 because it was able to provide clear, concise feedback on answers with the added bonus of commenting on the organization of the answers. Gemini also crafted the best multiple choice questions—creating scenarios which would be the most similar to what a student would see on a law school exam. GPT-4o did provide adequate feedback, but it was nothing special (similarly to Copilot). Claude falls a bit short due to the lack of sources, but Lexis+ AI falls very far behind the rest. Overall, I would classify Lexis+ AI’s Task 5 performance as a complete failure. It either wouldn’t create questions for me, or it would latch on to random parts of my query and fixate on them for no apparent reason. Overall Ratings: Gemini: Best; GPT-4o: 2d Best; Copilot: 3d Best; Claude: 3d Best; Lexis+AI: Worst.

Researcher 2: While Lexis has been the outsider this whole time, its struggle with this task was on a whole different level. As for the others, I don’t know that there was much difference on the quizzing side. All of them provided multiple choice and fill in the blank questions. GPT and Claude will give you more of a challenge, but they are all serviceable aside from Lexis. For essays though, I think that Claude and GPT do the best at evaluating answers. Claude, I think, edges out GPT by offering better examples and advice on how to improve your writing and the substance of the answer. Copilot and Gemini aren’t bad in this regard though, and it’s possible that something like Gemini would have access to more material, being able to search the web. Overall Ratings: Claude: Best; GPT-4o: 2d Best; Gemini: 2d Best; Copilot: 3d Best; Lexis+AI: Worst.

Researcher 3: Similar to the themes I noted in tasks 2 and 3, Claude and GPT-4-o were consistently the best programs for evaluating summaries and explanations to short answer responses. They gave the most detailed pointers I could use to adjust my explanations, and 4-o was even able to point out things I implicitly touched on but should explicitly bring to light. Multiple choice questions seemed to be the easiest prompt for the programs to work with as a whole, though there were still some troubles: Lexis answered its own questions before allowing me to answer; Copilot asked the same question three times in a row; and some of the programs, like Lexis, asked questions where a disproportionate number of the responses were the same letter. What did impress me was Gemini’s ability to ask hypothetical fact patterns in the form of multiple choice questions (a bit more similar to what you’d see on a law school exam as opposed to just “which element is missing” or “what is the name of this type of trust”). Using this as a template to prompt more reliable systems (Claude, 4-o) would probably produce some impressive multiple choice questions. Results on fill-in-the-blanks were mixed. Claude, 4-o, and Gemini all could ask the fill-in-the-blanks (though there were some flags with Gemini that I pointed out under my analysis—as such, I’d probably be careful to triple check when using Gemini). Lexis and Copilot dropped the ball on fill-in-the-blanks, getting confused. Overall Ratings: Claude: Best; GPT-4o: 2d Best; Gemini: 3d Best; Copilot: Worst; Lexis+AI: Worst.

Researcher 4: GPT-4o, Claude, and Gemini all did well here. They all gave me good multiple-choice questions, fill in the blank questions, and essay questions without hesitation. They followed the directions and allowed me to provide my answers before assessing the response and giving me an explanation on the right/wrong answers. They all gave the questions to me one by one, and they were decent questions (they all mirrored each other). GPT-4o, Claude, and Gemini all corrected the operative legal fact exception in Copilot’s answer to one of the essay questions, and none of the others did. Claude’s attention to detail in catching a typo in my essay response was unreal and something I had not seen until that point. Overall, all three of these platforms were really good, but GPT-4o did the best for all tasks. Overall Ratings: GPT-4o: Best; Claude: 2d Best; Gemini: 2d Best; Lexis+AI: 3d Best; Copilot: 3d Best.

Comparative Evaluations and Conclusions 

This study was a cross-sectional qualitative evaluation of GAI systems at a particular point in time, reflecting the level of development of each system at that point. Each system evaluated is still developing and is expected to improve greatly in the near future. The fact that Lexis+ AI, the only legal-centric system in the study with a curated dataset of bona fide legal sources, did so poorly is almost certainly a product of the current state of integration between Lexis+ AI’s LLM architecture and the curated legal databases from which its inferences and answers were drawn; in most instances, it did not produce results that were as helpful to novice learners as those of the other AI systems tested.

Another important takeaway is that although there were clear leaders—GPT-4o and Claude—and a clear last-place finisher—Lexis+ AI—the other AI systems in the study, Copilot and Gemini, did not consistently fall short; the researchers’ comments make clear that there was noticeable parity in the overall helpfulness of the systems. Two performed better, and one was consistently worse, but the others were still very helpful for learning new legal concepts and subjects.

Overall Performance: GPT-4o and Claude consistently performed well across all five tasks, often being rated either the best or second best by multiple researchers. Their success can be attributed to several key factors:

1. Depth and Clarity of Responses: Both GPT-4o and Claude excelled in providing detailed and clear responses. Researchers highlighted that these systems were adept at explaining complex legal concepts, giving comprehensive feedback, and identifying nuanced details in their queries.

2. Handling of Counterfactuals: Both systems showed a strong ability to catch and correct counterfactual information, which is crucial for learning and understanding legal topics accurately. GPT-4o, in particular, was praised for its attention to detail and ability to provide clear explanations when identifying and correcting errors.

3. Quality of Questions and Feedback: In the context of quizzing and assessing understanding, GPT-4o and Claude stood out for their ability to craft relevant and challenging multiple-choice and fill-in-the-blank questions. They also provided substantive feedback on essay responses, with Claude often noted for its attention to detail and thoroughness in evaluating answers.

4. Prompt Engineering Techniques: Claude showed a strong capability in applying prompt engineering techniques, effectively using decision trees and chain-of-thought prompts to catch both obvious and subtle misconceptions. GPT-4o also performed well, especially in providing organized and easy-to-follow responses.

5. User Experience: Both systems were praised for their user-friendly interfaces and the structured way they presented information. This made them particularly useful for law students who need clear and organized guidance to understand and retain complex material.

Overall Performance: In this study, Lexis+AI consistently received the lowest ratings across all tasks, with researchers pointing out several critical shortcomings that hindered its effectiveness as a learning tool.

1. Lack of Flexibility and Adaptability: Researchers noted that Lexis+AI struggled to adapt to broad and varied prompts. It often fixated on specific parts of queries and failed to provide comprehensive responses. This rigidity made it less effective in helping students understand complex legal topics and answer broader questions.

2. Poor Handling of Counterfactuals: Lexis+AI frequently failed to identify and correct counterfactual information. This is a significant drawback for a legal learning tool, or any learning tool, as the ability to spot and correct inaccuracies is crucial for communicating a sound and accurate understanding of the law.

3. Inadequate Question Generation: In tasks involving the creation of quiz questions, Lexis+AI fell short. Researchers reported that it either failed to generate questions or produced questions that were not relevant or well-constructed. This limited its usefulness in assessing and reinforcing students’ understanding of legal concepts.

4. Lack of Sources and References: A common theme in the feedback was Lexis+AI’s inability to provide credible sources and references. Although drawing from curated databases, Lexis+ AI gravitates to trial level cases and unpublished cases at an alarming rate, both of which are considered low-quality authority or non-authority in legal analysis. In the context of legal education, the ability to cross-reference information with reliable sources is essential. Lexis+AI’s failure to do this made it less trustworthy and effective compared to GPT-4o and Claude.

5. User Experience and Interface: Researchers found Lexis+AI’s interface and overall user experience to be lacking. It often provided confusing or irrelevant information, which made it frustrating to use and less effective as a learning tool.

This study is not the only one to note these shortcomings of Lexis+ AI. A Stanford Center for Human-Centered Artificial Intelligence (HAI) study pointed out Lexis+ AI’s propensity for misinformation, hallucinations, or confabulations.[76]

GPT-4o and Claude were rated as the better systems overall due to their ability to provide detailed, clear, and accurate responses, effectively handle counterfactuals, generate high-quality questions, and offer a user-friendly experience. In contrast, Lexis+AI consistently underperformed due to its lack of flexibility, poor handling of counterfactuals, inadequate question generation, absence of credible sources, and subpar user experience. These differences highlight the strengths of GPT-4o and Claude in supporting legal education and the areas where Lexis+AI needs significant improvement.

Summary of Comparative Evaluations and Conclusions from Part II of the Experiments

For each of the seven researchers, the use of a GAI system as a tutor improved performance on the post-test exams compared to the pre-test exams. In only one instance did a researcher’s pre-test score on the trademark exam (24/25) exceed their post-test result (22/25). For all other researchers and all other exams, the researchers’ scores improved following their study and tutoring with the GAI system. The results are reported in tabular and graphical format below.

Researcher | Copyright Pre-Test[77] | Copyright Post-Test | Trademark Pre-Test | Trademark Post-Test
0921 | 10 | 17 | 15 | 17
5854 | 14 | 23 | 19 | 23
3775 | 14 | 23 | 19 | 23
6502 | 19 | 22 | 24 | 22[78]
5560 | 9 | 20 | 15 | 19
6461 | 20 | 22 | 17 | 19
0231 | 12 | 22 | 17 | 21

The results graphically represented are as follows:

The percentage of increase (and in one instance, decrease) from the pre-test score to the post-test score for each researcher is as follows:

Represented graphically, the percent increases are:

The average increase in copyright scores pre-test to post-test and in trademark scores pre-test to post-test is as follows:

Thus, although researchers’ scores improved on average from pre-test to post-test in both subjects, the average increase for copyright was much greater than the average increase for trademark.
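The arithmetic behind these averages can be checked directly against the tabulated scores. The following is a minimal sketch in Python; the variable names and data layout are illustrative only and are not part of the study instruments.

```python
# Pre- and post-test scores (out of 25) for each researcher,
# taken from the table of results reported above.
scores = {
    "0921": {"copyright": (10, 17), "trademark": (15, 17)},
    "5854": {"copyright": (14, 23), "trademark": (19, 23)},
    "3775": {"copyright": (14, 23), "trademark": (19, 23)},
    "6502": {"copyright": (19, 22), "trademark": (24, 22)},
    "5560": {"copyright": (9, 20), "trademark": (15, 19)},
    "6461": {"copyright": (20, 22), "trademark": (17, 19)},
    "0231": {"copyright": (12, 22), "trademark": (17, 21)},
}

def pct_change(pre, post):
    """Percent increase (or decrease) from pre-test to post-test."""
    return (post - pre) / pre * 100

for subject in ("copyright", "trademark"):
    changes = [pct_change(*r[subject]) for r in scores.values()]
    average = sum(changes) / len(changes)
    print(f"{subject}: average change {average:+.1f}%")

# prints:
# copyright: average change +61.4%
# trademark: average change +15.6%
```

On these figures, the average copyright increase (roughly 61 percent) is about four times the average trademark increase (roughly 16 percent), consistent with the observation above.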

Conclusion

The outcomes of Part I and Part II of the experiments support the primary thesis of this article: all four of the cutting-edge, general-purpose, public-facing GAI tools can assist learners at a novice level of expertise by instructing, tutoring, and assessing the learning of law, and the cutting-edge, general-purpose system tested in Part II, Claude 3.5 Sonnet, proved that it can improve the performance of novice learners on objective examinations in two law subjects that the learners had never studied.

The current text-based GAI models in the form of Anthropic’s Claude, Microsoft’s Copilot, Google’s Gemini 1.5 Pro, and OpenAI’s GPT-4o Omni successfully replicate traditional methods of tutoring and academic support by responding to a learner’s questions, asking the learner questions in turn, and assessing the learner’s progress, all in a tirelessly patient manner. The leading GAIs can translate between languages and can also translate material down to a simpler level of discussion, all to promote the learner’s knowledge and understanding.[79] Several of the leading GAIs—Claude, Copilot, Gemini, and GPT-4o—are multimodal, meaning a user can upload an image, such as a photo, drawing, painting, chart, or diagram, and ask the system to use its computer vision to provide information or answer questions about the uploaded file. All of the systems can receive uploaded text, and Claude and GPT-4o can receive an uploaded PDF and answer questions about the document. By these means, a GAI can explain, elaborate on, and summarize course material, and it can interpret, translate, visualize, or reorder parts of the material. A GAI can also evaluate and correct the grammar, spelling, syntax, and style of a piece of student writing, a task that campus writing centers often avoid for pedagogical reasons or simply for logistical and resource-driven reasons.

In Part I of the study, GPT-4o and Claude were rated as the better systems overall due to their ability to provide detailed, clear, and accurate responses, effectively handle counterfactuals, generate high-quality questions, and offer a user-friendly experience. In contrast, Lexis+ AI consistently underperformed due to its lack of flexibility, poor handling of counterfactuals, inadequate question generation, absence of credible sources, and subpar user experience. These differences highlight the strengths of GPT-4o and Claude in supporting legal education and the areas where Lexis+ AI needs significant improvement.

In Part II of the study, the use of a GAI system, Claude 3.5 Sonnet, as a tutor improved the performance of each of the seven novice learners in the study on post-test examinations compared to the learners’ pre-test examination results on subjects the learners had never before studied.

The most basic practice of academic support is to sit down with a student and work to bridge a gap in understanding or to advance the student’s knowledge and skills development. The methods that will be most advantageous in this task depend, of course, on the particular student and the particular problem in comprehension or skills development the student is having. But the best and most effective coaching and tutoring practices often center on the tutor’s ability to explain, elaborate, simplify, or summarize material in a new and advantageous manner, and then to assess the student’s learning of the subject matter.

A true strength of GAI in learning is that it allows an individualized learning experience for each student. GAI can customize the experience by taking into account the individual’s pace, abilities, interests, and learning styles and delivering content suited for each individual. It can design and refine the material and receive immediate feedback from the student on how and where to adjust the lessons and assessments. 

Anyone using the current large language model text-based GAI systems may rightly be amazed at the AI’s ability to talk about a wide range of topics and to respond to requests to explain, elaborate, simplify, or summarize the material. Several of the GAI systems tested—Copilot, Gemini, and GPT-4o—can search the internet for information and assimilate and synthesize a response on any information that is not hidden behind password- or paywall-protected proprietary databases.[80] Unlike prior search engines, which could be quite clever in gathering a laundry list of sources on a topic, a GAI goes beyond the search engines to assimilate, synthesize, and explain the answer on the topic in impressively drafted language.[81] Not only is information communicated, but a GAI’s response can also stimulate thinking, change or reframe the learner’s perspective, and ignite curiosity. If the AI’s initial response is well over the head of the user, the AI can and happily will simplify it to a more appropriate level of discourse. The chatbot model of textual GAIs is prepared to engage in a conversation on the topic, so it will suggest follow-up questions for the user to think about if the user’s curiosity or imagination is flagging. Therefore, the leading GAIs—GPT-4o Omni, Claude 3.5 Sonnet, Copilot, and Gemini 1.5 Pro—are well suited to assist all learners—novice law students and expert lawyers and judges alike—in learning and eventually mastering new topics of study.

  •  Michael D. Murray is the Spears Gilbert Associate Professor of Law and the principal investigator of the Artificial Intelligence and the Law Project at the University of Kentucky, J. David Rosenberg College of Law, which was funded by an OVPR Grant. Professor Murray teaches law and undergraduate courses at the Rosenberg College of Law and the Lewis Honors College at the University of Kentucky. He also is the chair of the Academic Status Committee at the Rosenberg College of Law. Professor Murray thanks his staff of researchers in 2024: Toby Giganti, Michael Kern, Bridget Lienau, and Riliegh Owens, and in 2023: Qwenton Briggs, Graydon McGrannahan, and Emily Prince. He also thanks the volunteers who participated in Part II of the study reported here: Ben Bertram, Matt Chaney, Alex Heaton, Nick Jackson, Bridget Lienau, Katie Nipper, and Luke Price. Professor Murray thanks Dean Paul Salamanca, Associate Dean for Research Josh Douglas, and the Rosenberg College of Law faculty research fund for their ongoing support of his research. This research was supported by two OVPR CURATE Grants from the University of Kentucky.
  1. Intelligent tutoring systems are educational applications of artificial intelligence and machine learning technologies. Intelligent tutoring systems are designed to interact directly with students and perform many of the instructional functions usually reserved for teachers or tutors. The systems have been used to teach students in such diverse domains as law, language, mathematics, medicine, physics, and reading comprehension. How Effective Are Intelligent Tutoring Systems?, Am. Psych. Ass’n (Jun. 30, 2015), https://www.apa.org/pubs/highlights/spotlight/issue-37; Muralidhar Kurni et al., Intelligent Tutoring Systems, in A Beginner’s Guide to Introduce Artificial Intelligence in Teaching and Learning 24–44 (2023). 
  2. Adaptive learning platforms are software systems that use artificial intelligence or data mining to create personalized learning experiences for each learner. They can track and adjust to the learner’s progress, engagement, performance, and level of understanding in real time. See Xiaoman Wang et al., The Efficacy of Artificial Intelligence-Enabled Adaptive Learning Systems From 2010 to 2022 on Learner Outcomes: A Meta-Analysis, J. of Educ. Computing Rsch., May 15, 2024, https://journals.sagepub.com/doi/10.1177/07356331241240459; Tumaini Kabudi et al., AI-enabled adaptive learning systems: A systematic mapping of the literature, in 2 Computers and Education: Artificial Intelligence 100017 (2021), https://www.sciencedirect.com/science/article/pii/S2666920X21000114; Serhat Kurt, Adaptive Learning: What is It, What are its Benefits and How Does it Work?, Educ. Tech (Sep. 12, 2021), https://educationaltechnology.net/adaptive-learning-what-is-it-what-are-its-benefits-and-how-does-it-work/. 
  3. AI-based tutoring systems leverage artificial intelligence to overcome the availability, cost, and location obstacles students face with obtaining human tutoring. AI-based systems facilitate affordable, on-demand, personalized tutoring opportunities for low-income students, students of color, and other underserved students, who may have previously been deterred from more traditional in-person or online tutoring programs. Chris Sadler, The Future of AI Tutoring in Higher Ed, New Am. (Apr. 4, 2023), https://www.newamerica.org/oti/briefs/the-future-of-ai-tutoring-in-higher-ed/. 
  4. The current model of Claude available without a paid subscription is Claude 3.5 Sonnet. See Introducing Claude 3.5 Sonnet, Anthropic (Jun. 20, 2024), https://www.anthropic.com/news/claude-3-5-sonnet. 
  5. See Maria Diaz, How to use Copilot (formerly called Bing Chat), ZD Net (Aug. 27, 2024), https://www.zdnet.com/article/how-to-use-copilot/ (discussing how Copilot is incorporated into the Bing browser and Office 365 applications. Copilot incorporates OpenAI’s GPT-4 large language model, an image generator powered by OpenAI’s DALL-E 3, and its native Bing search making it a fully-featured multimodal chatbot). 
  6. Crabtree, What is OpenAI’s GPT-4o? Launch Date, How it Works, Use Cases & More, Data Camp (Sept. 29, 2024), https://www.datacamp.com/blog/what-is-gpt-4o (discussing how the OpenAI state-of-the-art large language model is GPT-4o which replaced ChatGPT-4 Turbo and how GPT-4o incorporates DALL-E 3 image generator and can access Microsoft’s Bing for web searching, thus making it a fully featured multimodal chatbot). 
  7. Mat Velloso & Josh Woodward, Gemini 1.5 Pro updates, 1.5 Flash debut and 2 new Gemma models, The Keyword (May 14, 2024), https://blog.google/technology/developers/gemini-gemma-developer-updates-may-2024/. The current (as of May 2024) edition of Google’s chatbot, Gemini, is powered by the Gemini 1.5 Pro model. 
  8. Experience a New Era in AI Legal Research Tools, LexisNexis, https://www.lexisnexis.com/en-us/products/lexis-plus.page (last accessed June 22, 2024). 
  9. Nitin Sharma, Generative AI in Education: Know Meaning, Benefits & Challenges, Hurixdigital (June 12, 2024), https://www.hurix.com/generative-ai-in-education-know-meaning-benefits-challenges/. 
  10. See Pratik Rupareliya, Generative AI-Powered Personalized & Adaptive E-Learning Platforms, Intuz (Jan. 15, 2025), https://www.intuz.com/blog/generative-ai-powered-elearning. 
  11. Chien-Chang Lin et al., Artificial intelligence in intelligent tutoring systems toward sustainable education: a systematic review, 10 Smart Learn. Environ., no. 41, 2023, at 2–3,12–13. 
  12. See Haoran Xie et al., Trends and development in technology-enhanced adaptive/personalized learning: A systematic review of journal publications from 2007 to 2017, Computs. & Educ. (Oct. 1, 2019), https://doi.org/10.1016/j.compedu.2019.103599 (“this study discusses the trends and issues in the area of technology-enhanced adaptive/personalized learning by reviewing research studies in the recent decade.”). 
  13. Muhammad Afzaal et al., Automatic and Intelligent Recommendations to Support Students’ Self-Regulation, in 2021 International Conference on Advanced Learning Technologies 336, 336 (2021). 
  14. See Lin et al., supra note 11; Xie et al., supra note 12. 
  15. Leveraging AI to improve adaptive tutoring systems, Berkeley Sch. of Educ., https://bse.berkeley.edu/leveraging-ai-improve-adaptive-tutoring-systems (last accessed Jul. 12, 2024). 
  16. See id. 
  17. See id. 
  18. Andy Viano, The Future of AI Tutors in Higher Education, Ed Tech (Sep. 23, 2022), https://edtechmagazine.com/higher/article/2022/09/future-ai-tutors-higher-education; Steven Tom & David Leichner, Steven Tom Of Adtalem On The Future Of Artificial Intelligence, Medium (Apr. 30, 2023), https://medium.com/authority-magazine/steven-tom-of-adtalem-on-the-future-of-artificial-intelligence-bbd69e8f82ae. 
  19. See Viano, supra note 18. 
  20. See Viano, supra note 18; Tom & Leichner supra note 18. 
  21. See Nisar Ahmed Dahri et al., Investigating AI-based academic support acceptance and its impact on students’ performance in Malaysian and Pakistani higher education institutions, Educ. & Info. Techs. 18695, 18695 (2024), https://doi.org/10.1007/s10639-024-12599-x. 
  22. See id. at 18732. 
  23. See id. at 18734–35. 
  24. See Tammy Pettinato Oltz, ChatGPT, Professor of Law, SSRN (Feb. 6, 2023), https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4347630 (“While its performance on teaching-related tasks was not as impressive as its performance on service-related tasks, ChatGPT’s responses to the teaching-related tasks still showed some potential for reducing faculty workload. Perhaps unsurprisingly, given the higher level of specialized knowledge required for good teaching, ChatGPT’s drafts would likely require more significant faculty intervention to make them workable. That said, in two out of three cases, the drafts still provided time-saving jumping-off points, and this indicates that ChatGPT could be used to push ahead the typical starting point for some key teaching tasks.”); see also Rabee Al-Qasem et al., Towards the Exploitation of LLM-Based Chatbot for Providing Legal Support to Palestinian Cooperatives, arXiv (June 9, 2023), https://arxiv.org/abs/2306.05827 (reporting an overall accuracy rate of 82% for an LLM-based chatbot when asked legal questions about Palestinian cooperatives). 
  25. See, e.g., Mohd Javaid, et al., Unlocking the opportunities through ChatGPT Tool towards ameliorating the education system, 3 BenchCouncil Transactions on Benchmarks, Standards & Evaluations 100115 (June 2023). 
  26. Lynsey Meakin, AI and assessment: Rethinking assessment strategies and supporting students in appropriate use of AI, Impact (May 13, 2024), https://my.chartered.college/impact_article/ai-and-assessment-rethinking-assessment-strategies-and-supporting-students-in-appropriate-use-of-ai/. 
  27. Ramiz Zekaj, AI Language Models as Educational Allies-Enhancing Instructional Support in Higher Education, 22 Int’l J. Learning, Teaching & Educ. Research 120, 120–34 (2023). 
  28. Artificial Intelligence in the Classroom, Chapman U. AI Hub, https://www.chapman.edu/ai/artificial-intelligence-in-the-classroom.aspx (last accessed Jul. 12, 2024). 
  29. Reza Hadi Mogavi et al., ChatGPT in education: A blessing or a curse? A qualitative study exploring early adopters’ utilization and perceptions, 2 Computs. in Hum. Behav.: Artificial Hums. 100027 (2024). 
  30. Oltz, supra note 24. 
  31. Jinqi Lai et al., Large Language Models in Law: A Survey, arXiv (Nov. 26, 2023), https://ar5iv.labs.arxiv.org/html/2312.03718. 
  32. Jonathan H. Choi et al., Lawyering in the Age of Artificial Intelligence, 109 Minn. L. Rev. 147 (2024); see also Jonathan H. Choi & Daniel Schwarcz, AI Assistance in Legal Analysis: An Empirical Study, 73 J. Legal Ed. 384 (2025).  
  33. Choi et al., supra note 32, at 5. 
  34. Choi et al., supra note 32, at 6, 25. 
  35. Choi et al., supra note 32, at 17. 
  36. Choi et al., supra note 32, at 6, 17, 55. 
  37. AI and the Possibilities for the Legal Profession — and Legal Education, Yale L. Sch. Mag. (May 3, 2023), https://law.yale.edu/yls-today/news/ai-and-possibilities-legal-profession-and-legal-education; Mark Haddad, Generative AI and the Small Law Firm: Leveling the Playing Field, Thomson Reuters (Oct. 19, 2023), https://www.thomsonreuters.com/en-us/posts/legal/generative-ai-small-law-level-field/. 
  38. See Megan Ma, How AI Can Reshape Legal Education + Lawyer Training, Artificial Law. (May 20, 2024), https://www.artificiallawyer.com/2024/05/20/how-ai-can-reshape-legal-education-lawyer-training/ (“And in any case, students have always cheated; in a way, AI might even help level the playing field. ‘AI puts kids who don’t have an Uncle Alito to call for help with their take-home on an equal footing with those who do,’ says DeLaurentis.”); Fabio Boehl, Empowering Creativity Through AI: A Journey Towards Equality and Inclusivity, Medium (May 2, 2023), https://medium.com/@fabiorex/empowering-creativity-through-ai-a-journey-towards-equality-and-inclusivity-4a71ebe1ba03; Hayden Ernst, Artificial: A Study on the use of Artificial Intelligence in Art, DigitalCommons@UNO (May 2023), https://digitalcommons.unomaha.edu/cgi/viewcontent.cgi?article=1260&context=university_honors_program. 
  39. Vince Beiser, AI & the Law…& what it means for legal education & lawyers, Geo. L. Mag. (Jan. 2, 2024), https://www.law.georgetown.edu/news/ai-the-law-what-it-means-for-legal-education-lawyers/; Andres Fortino, Embracing Creativity: How AI Can Enhance the Creative Process, Emerging Techs. Collaborative Blog (Nov. 2, 2023), https://www.sps.nyu.edu/homepage/emerging-technologies-collaborative/blog/2023/embracing-creativity-how-ai-can-enhance-the-creative-process.html.  
  40. See Ma, supra note 38 (“And in any case, students have always cheated; in a way, AI might even help level the playing field. ‘AI puts kids who don’t have an Uncle Alito to call for help with their take-home on an equal footing with those who do,’ says DeLaurentis.”); Jeff Neal, The legal profession in 2024: AI, Harv. L. Today (Feb. 14, 2024), https://hls.harvard.edu/today/harvard-law-expert-explains-how-ai-may-transform-the-legal-profession-in-2024/. 
  41. See generally Recommendation on the Ethics of Artificial Intelligence, UNESCO (Sep. 26, 2024), https://www.unesco.org/en/articles/recommendation-ethics-artificial-intelligence; Arto Laitinen & Otto Sahlgren, AI Systems and Respect for Human Autonomy, Frontiers in A.I. (Oct. 25, 2021), https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2021.705164/full.  
  42. See generally Emilio Ferrara, Should ChatGPT be Biased? Challenges and Risks of Bias in Large Language Models, arXiv (Nov. 13, 2023), https://ar5iv.labs.arxiv.org/html/2304.03738; Zhao et al., GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models, arXiv (Dec. 11, 2023), https://ar5iv.labs.arxiv.org/html/2312.06315v1. 
  43. See UNESCO, supra note 41; Laitinen & Sahlgren, supra note 41; Ferrara, supra note 42; Zhao et al., supra note 42. 
  44. See, e.g., John Edwards, Can AI Ever Become Capable of Original Thought?, Info. Week (Oct. 30, 2023), https://www.informationweek.com/machine-learning-ai/can-ai-ever-become-capable-of-original-thought- 
  45. See What is Prompt Engineering, AWS Mach. Learning, https://aws.amazon.com/what-is/prompt-engineering/ (last visited Jan. 18, 2024) (“Clearly define the desired response in your prompt to avoid misinterpretation by the AI. For instance, if you are asking for a novel summary, clearly state that you are looking for a summary, not a detailed analysis. This helps the AI to focus only on your request and provide a response that aligns with your objective.”). 
  46. See generally Yao Fu et al., Complexity-Based Prompting for Multi-Step Reasoning, arXiv (Jan. 30, 2023), https://arxiv.org/pdf/2210.00720.pdf. 
  47. See generally Jiacheng Liu et al., Generated Knowledge Prompting for Commonsense Reasoning, arXiv (Sep. 28, 2022), https://arxiv.org/abs/2110.08387; Dan Cleary, Generated Knowledge Prompting, PromptHub (Jul. 12, 2024), https://www.prompthub.us/blog/generated-knowledge-prompting; Sander Schulhoff, Generated Knowledge, Learn Prompting (Jul. 6, 2024), https://learnprompting.org/docs/intermediate/generated_knowledge; Sunil Ramlochan, Master Prompting Techniques: Knowledge Generation Prompting, Prompt Eng’g & AI Inst. (Apr. 26, 2023), https://promptengineering.org/knowledge-generation-prompting/. 
  48. This example was provided by BingChat. BingChat Session 1-23-24 Generated Knowledge Prompting (copy on file with author). 
  49. See generally Jason Wei et al., Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, arXiv (Oct. 31, 2022), https://arxiv.org/pdf/2201.11903.pdf. 
  50. Maithili Badhan, Comprehensive Guide to Chain-of-Thought Prompting, Mercity, https://www.mercity.ai/blog-post/guide-to-chain-of-thought-prompting (last visited Jan. 19, 2024). 
  51. Id. 
  52. Sunil Ramlochan, Master Prompting Concepts: Chain of Thought Prompting, Prompt Eng’g Inst. (Apr. 26, 2023), https://promptengineering.org/master-prompting-concepts-chain-of-thought-prompting/. 
  53. Chain-of-Thought Prompting, Prompt Eng’g Guide, https://www.promptingguide.ai/techniques/cot (last visited Jan. 19, 2024). “Think step by step” prompt is classified as a form of Automatic Chain-of-Thought (Auto-CoT) prompting; see also Rick Hightower, Prompt Engineering: Getting What You Want From ChatGPT Using CoT and Synthetic Prompts, Linkedin (Jul. 26, 2023), https://www.linkedin.com/pulse/prompt-engineering-getting-what-you-want-from-chatgpt-rick-hightower/. 
  54. See generally Shunyu Yao et al., Tree of Thoughts: Deliberate Problem Solving with Large Language Models, arXiv (Dec. 3, 2023), https://arxiv.org/pdf/2305.10601.pdf; Jieyi Long, Large Language Model Guided Tree-of-Thought, arXiv (May 15, 2023), https://arxiv.org/pdf/2305.08291.pdf. 
  55. See Banghao Chen et al., Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review, arXiv at 10–11 (Oct. 27, 2023), https://arxiv.org/pdf/2310.14735.pdf. 
  56. Id. 
  57. Vern Walker, Rule Tree Diagrams, LLT Lab (Oct. 19, 2015), https://www.lltlab.org/rule-tree-diagrams/. 
  58. E.g., 17 U.S.C. § 107; Fair Use, Colum. Copyright Advisory Off. (Jan. 19, 2024), https://copyright.columbia.edu/basics/fair-use.html. (describing how courts evaluate, apply, and balance factors when determining whether a use is or is not a fair use). 
  59. Yao et al., supra note 54. 
  60. See generally Jaehun Jung et al., Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations, arXiv (Oct. 24, 2022), https://arxiv.org/pdf/2205.11822.pdf. The term “Maieutic” relates to the name used by Socrates to describe his method—midwifery—by which Socrates meant the recursive process of questioning and requestioning was intended to allow the questionee to give birth to a more fully formed, logical, and consistent understanding of a concept or position. See Maieutic, Merriam-Webster Dictionary, https://www.merriam-webster.com/dictionary/maieutic (last visited Jan. 21, 2024). 
  61. See Edward Y. Chang, Prompting Large Language Models With the Socratic Method, arXiv 2–3, 5, 7 (Mar. 16, 2023), https://arxiv.org/pdf/2303.08769.pdf. 
  62. See id. at 1–3, 9. BingChat created the following sequence of Maieutic questioning as an example: If we start with the question: “Why do leaves change color in the fall?” The AI model might initially respond with: The color change in fall leaves is due to the breakdown of chlorophyll, which reveals other pigments in the leaves. We might then ask the model to explain the answer further: What causes the breakdown of chlorophyll? The AI might respond: The breakdown of chlorophyll is triggered by the shorter daylight hours and cooler temperatures of autumn. We might then ask: What are the other pigments that become visible? And the AI might respond: The other pigments that become visible when chlorophyll breaks down are carotenoids, which produce yellow and orange hues, and anthocyanins, which produce red and purple hues. Michael D. Murray, BingChat Session, Maieutic Prompting (Jan. 21, 2024) (copy on file with author). 
  63. See Chang, supra note 61, at 3–5. 
  64. See, e.g., Ryszard Szopa, The Socratic Method of Large Language Models, Medium (Apr. 3, 2023), https://szopa.medium.com/the-socratic-method-of-large-language-models-24e43bd774b3. 
  65. See generally Sam Baron, Explainable AI and Causal Understanding: Counterfactual Approaches Considered, 33 Minds & Machines 347 (2023); Nishtha Madaan et al., Generate Your Counterfactuals: Towards Controlled Counterfactual Generation for Text, in Proceedings of the AAAI Conference on Artificial Intelligence (May 18, 2021). 
  66. See Bobby Zarkov, Counterfactual Explanations: The What-Ifs of AI Decision Making, KPMG Switz., https://kpmg.com/ch/en/insights/technology/artificial-intelligence-counterfactual-explanation.html (last visited Feb. 18, 2025) (“Counterfactuals help to identify which aspects of the input data are most influential in the model’s decisions, aiding in model debugging, fairness analysis and improving model performance.”). 
  67. See id. 
  68. See, e.g., Pradeep Kumar Misra, Strategies for Learning, in Learning and Teaching for Teachers 59 (Oct. 19, 2021); Paul Black & Dylan Wiliam, Assessment and Classroom Learning, 5 Assessment in Educ: Principles, Pol’y & Prac. 7 (1998), https://doi.org/10.1080/0969595980050102; Kate Jones, A Summary of Evidence Based Formative Assessment Strategies, Evid. Based Educ. (Nov. 25, 2022), https://evidencebased.education/a-summary-of-evidence-based-formative-assessment-strategies/; C.M. Reigeluth, The elaboration theory: Guidance for scope and sequence decisions, in 2 Instructional Design Theories and Models: A New Paradigm of Instructional Theory 425 (1999). 
  69. Rupareliya, supra note 10. 
  70. Microsoft’s Copilot is designed to use the Bing browser to search the web in the regular course of performing its inference. Gemini 1.5 Pro will also routinely consult the web using a Google search function. OpenAI’s GPT-4o will search the web if specifically requested (prompted) to do so. Claude and Lexis+ AI will not search the web by design; they intentionally have no native web-browsing function. See AI That Searches the Internet – An Up-to-Date List, ChatLabs (Apr. 17, 2024), https://writingmate.ai/blog/ai-that-searches-the-internet-an-up-to-date-list. 
  71. See Understanding Confabulations in AI: Causes, Prevention, and Detection, Synterex (July 26, 2024), https://synterex.com/understanding-confabulations-in-ai-causes-prevention-and-detection/; Negar Maleki et al., AI Hallucinations: A Misnomer Worth Clarifying, 2024 IEEE Conference on Artificial Intelligence (2024), https://ieeecai.org/2024/wp-content/pdfs/540900a127/540900a127.pdf, DOI 10.1109/CAI59869.2024.00033; Varun Magesh et al., Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools (June 6, 2024), https://dho.stanford.edu/wp-content/uploads/Legal_RAG_Hallucinations.pdf. 
  72. See John Timmer, Researchers describe how to tell if ChatGPT is confabulating, Ars Technica (Jun. 20, 2024, 2:32 PM), https://arstechnica.com/ai/2024/06/researchers-describe-how-to-tell-if-chatgpt-is-confabulating/. 
  73. As noted earlier, Tasks 4 and 5 did not have scripts. 
  74. The exam instruments used for Part II Rounds 1 and 3 can be found in Michael D. Murray, Artificial Intelligence for Learning the Law: Generative AI and Academic Support—Report of Experiments (Sep. 4, 2024), https://ssrn.com/abstract=4946680. 
  75. Michael D. Murray, Artificial Intelligence for Learning the Law: Generative AI and Academic Support—Report of Experiments (Sep. 4, 2024), https://ssrn.com/abstract=4946680 (breaking down the results and findings of the individual tasks of the experiments in detail). 
  76. Magesh et al., supra note 71 (pointing out the same and even greater shortcomings in Westlaw with AI, and in GPT-4, the predecessor to GPT-4o Omni); see also id. at 16. 
  77. All results are out of 25. 
  78. As noted above, this was the only post-test score that was lower than the pre-test score on a subject. 
  79. Heejung An et al., Considering a Generative AI Tool for Translation: Using a Chatbot for Communication in Families’ Home Languages, Naeyc (Dec. 12, 2024), https://www.naeyc.org/resources/blog/considering-ai-for-translation. 
  80. See From Search Engines to Answer Engines, N.C. Bar Ass’n (Nov. 4, 2024), https://www.ncbar.org/2024/11/04/from-search-engines-to-answer-engines/. 
  81. Catherine Reach, The Difference Between Search Engines, Generative AI, and Automation Tools, N.C. Bar Ass’n (Oct. 24, 2023), https://www.ncbar.org/2023/10/24/the-difference-between-search-engines-generative-ai-and-automation-tools/.