
Commit

update readme
haoxizhong committed Oct 8, 2018
1 parent 4632475 commit e3cc76e
Showing 2 changed files with 120 additions and 40 deletions.
45 changes: 5 additions & 40 deletions README.md
@@ -1,13 +1,12 @@
# "中国法研杯"司法人工智能挑战赛数据说明

English version can be found [here](https://github.com/thunlp/CAIL/blob/master/README_en.md).

## 1. Introduction

Legal intelligence aims to give machines the ability to read and understand legal texts and to analyze cases quantitatively, completing tasks with real application value such as charge prediction, relevant-article recommendation, and prison-term prediction, and it is expected to help judges, lawyers, and other practitioners reach legal judgments more efficiently. In recent years, artificial intelligence techniques represented by deep learning and natural language processing have achieved major breakthroughs; they have also begun to show promise in legal intelligence and have drawn wide attention from both academia and industry.

To promote the development of technologies related to legal intelligence, under the guidance of the Information Center of the Supreme People's Court and the Youth Development Department of the Central Committee of the Communist Youth League, the China Justice Big Data Institute, the Chinese Information Processing Society of China, and the Youth League Committee of CETC Systems, together with Tsinghua University, Peking University, and the Institute of Software of the Chinese Academy of Sciences, jointly organize the 2018 China AI and Law Challenge ([CAIL2018](http://180.76.238.177)). The challenge provides a large collection of criminal legal documents as its dataset, aiming to offer researchers a platform for academic exchange, to advance the application of language understanding and artificial intelligence technologies in the legal domain, and to promote the development of legal artificial intelligence. A technical exchange and award ceremony will be held after each year's competition. Researchers and developers from academia and industry are warmly invited to participate!

##### Important notice
For certain reasons, logging in via the domain name is temporarily unavailable; please use http://180.76.238.177 to access the site for the competition. We apologize for any inconvenience.
To promote the development of technologies related to legal intelligence, under the guidance of the Information Center of the Supreme People's Court and the Youth Development Department of the Central Committee of the Communist Youth League, the China Justice Big Data Institute, the Chinese Information Processing Society of China, and the Youth League Committee of CETC Systems, together with Tsinghua University, Peking University, and the Institute of Software of the Chinese Academy of Sciences, jointly organize the 2018 China AI and Law Challenge ([CAIL2018](http://cail.cipsc.org.cn/)). The challenge provides a large collection of criminal legal documents as its dataset, aiming to offer researchers a platform for academic exchange, to advance the application of language understanding and artificial intelligence technologies in the legal domain, and to promote the development of legal artificial intelligence. A technical exchange and award ceremony will be held after each year's competition. Researchers and developers from academia and industry are warmly invited to participate!


## 2. Task Description
@@ -26,9 +25,7 @@

The dataset contains `2.68 million criminal legal documents` in total, covering [202 charges](meta/accu.txt) and [183 law articles](meta/law.txt); the terms of penalty include **0-25 years of imprisonment, life imprisonment, and the death penalty**.

We will release two datasets, CAIL2018-Small and CAIL2018-Large, in succession. CAIL2018-Small contains 196,000 document samples and is released directly on the website, with 150,000 for training, 16,000 for validation, and 30,000 for testing. This portion of the data is available for [registered download](http://cail.cipsc.org.cn) so that participants can train and test in the early stage.

Two to three weeks after the competition starts (watch the competition news for the exact date), we will release the CAIL2018-Large dataset, containing 1.5 million document samples, to qualified teams via online download. Finally, the remaining 900,000 documents will serve as the first-stage test data, CAIL2018-Large-test.
We will release two datasets, CAIL2018-Small and CAIL2018-Large, in succession. CAIL2018-Small contains 196,000 document samples and is released directly on the website, with 150,000 for training, 16,000 for validation, and 30,000 for testing. This portion of the data is available for [registered download](http://cail.cipsc.org.cn) so that participants can train and test in the early stage. The CAIL2018-Large dataset contains 1.5 million document samples. The remaining 900,000 documents will serve as the first-stage test data, CAIL2018-Large-test.

#### 2.2.1 Fields and Their Meanings
The data are stored in JSON format: each line is one record, and each record is a dictionary.
@@ -53,7 +50,7 @@
{
"relevant_articles": [234],
"accusation": ["故意伤害"],
"criminals": ["段某"],
"criminals": ["胡某"],
"term_of_imprisonment":
{
"death_penalty": false,
@@ -116,35 +113,3 @@
### 0. Is there a platform for participants to communicate with each other?

Participant QQ group: 237633234.

### 1. Does the evaluation require uploading code?

Uploading code is not mandatory, and we do not keep any files uploaded by participants.

Uploaded models and files should follow the [code submission guide](https://github.com/thunlp/CAIL2018); they only need to meet the requirements and run through evaluation normally.

### 2. What is the environment of the test machine after uploading, e.g. which Python packages are available, whether there is a GPU, and the versions of the software and third-party libraries? May participants request additional third-party dependencies?

We mainly support a Python 3.5 environment with a GPU (Tesla P100, 16 GB). If you need C++ (server version 5.4.0) or Java (server version 1.8.0_171), see the [code submission guide](https://github.com/thunlp/CAIL2018) for how to compile and upload.

The current Python 3.5 environment is listed in the [Python environment list](https://github.com/thunlp/CAIL2018#现有python3.5系统环境).

If you need other packages, please contact the competition administrators to have them installed.

### 3. Is there a time limit on evaluation after uploading code?

In principle there is no time limit on evaluation, but if a participant's program produces no output for 4 hours, we will terminate it manually and contact the participant.

### 4. Is there a size limit on uploaded code and other files, e.g. a locally trained model may be very large?

Code and models uploaded through the website are limited to 1 GB; a typical model should not exceed 1 GB.
If you need to upload larger models or code, please contact the competition administrators.

### 5. Is external data allowed in the competition?

If you want to use external data, you may include it when uploading your files. Note that network access is disabled in the evaluation environment.

### 6. Is manually constructed knowledge data allowed?

We welcome participants to tackle the competition tasks in any way they like and therefore place no restrictions on the methods used.

115 changes: 115 additions & 0 deletions README_en.md
@@ -0,0 +1,115 @@
# Chinese AI and Law Challenge Competition

## 1. Introduction

Legal intelligence aims to give machines the ability to judge a case by predicting its charges, relevant law articles, and term of penalty. Such applications can help judges and lawyers work more efficiently. In recent years, advances in deep learning and natural language processing have begun to benefit the field of law, and the combination of AI and law has attracted wide attention.

To promote the development of technologies related to legal intelligence, under the guidance of the Information Center of the Supreme People's Court and the Youth Development Department of the Central Committee of the Communist Youth League, the China Justice Big Data Institute, the Chinese Information Processing Society of China, and the Youth League Committee of CETC Systems, together with Tsinghua University, Peking University, and the Institute of Software of the Chinese Academy of Sciences, jointly organize the 2018 Chinese AI and Law Challenge Competition ([CAIL2018](http://cail.cipsc.org.cn/)). The competition aims to provide researchers with a platform for academic exchange, to promote language understanding and the application of artificial intelligence technology in the legal field, and to advance the development of legal artificial intelligence. A technical exchange and award ceremony will be held after each year's competition. Researchers and developers from academia and industry are invited to take part in this challenge.


## 2. Task Specification

### 2.1 Introduction

* Task 1 (Charge Prediction): predict the charges from the fact description in a legal document;
* Task 2 (Relevant Articles Prediction): predict the relevant law articles from the fact description in a legal document;
* Task 3 (Term of Penalty Prediction): predict the term of penalty from the fact description in a legal document.

Contestants may take part in one or more of the tasks. We will provide awards for every task to encourage contestants to participate in as many tasks as possible.

### 2.2 Data

The competition data come from the legal documents of criminal cases published on [Chinese Judgment Online](http://wenshu.court.gov.cn/). Every document contains the fact description and the details of the case, as well as the relevant law articles, the charges of the defendant, and the term of penalty.

The dataset contains `2,680,000 criminal legal documents` in total, covering [202 charges](meta/accu.txt) and [183 law articles](meta/law.txt); the term of penalty ranges from **0 to 25 years, plus life imprisonment and the death penalty**.

Our dataset has two parts, CAIL2018-Small and CAIL2018-Large. CAIL2018-Small contains about 196,000 documents in total: about 150,000 for training, 16,000 for validation, and 30,000 for testing. CAIL2018-Large contains 1,500,000 documents for training and another 900,000 documents for testing. You can download the dataset from the [website](http://cail.cipsc.org.cn).

#### 2.2.1 The Fields of Data
The data are stored in JSON format: every line contains one document, represented as a dictionary. The fields are as follows:

* **fact**: the fact description of the case.
* **meta**: the label information, which contains:
    * **criminals**: the defendant of the case (there is exactly one defendant per case).
    * **punish\_of\_money**: the fine, in RMB.
    * **accusation**: the charges against the defendant.
    * **relevant\_articles**: the law articles relevant to the case.
    * **term\_of\_imprisonment**: the term of imprisonment of the defendant, with three sub-fields:
        * **death\_penalty**: whether the defendant is sentenced to the death penalty.
        * **life\_imprisonment**: whether the defendant is sentenced to life imprisonment.
        * **imprisonment**: the length of the term of imprisonment, in months.

Here is an example of the data:

```
{
"fact": "2015年11月5日上午,被告人胡某在平湖市乍浦镇的嘉兴市多凌金牛制衣有限公司车间内,与被害人孙某因工作琐事发生口角,后被告人胡某用木制坐垫打伤被害人孙某左腹部。经平湖公安司法鉴定中心鉴定:孙某的左腹部损伤已达重伤二级。",
"meta":
{
"relevant_articles": [234],
"accusation": ["故意伤害"],
"criminals": ["胡某"],
"term_of_imprisonment":
{
"death_penalty": false,
"imprisonment": 12,
"life_imprisonment": false
}
}
}
```
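
To make the format concrete, here is a minimal sketch of how the JSON-lines data could be read in Python. The file name `data_train.json` is only an illustrative assumption; use the path of your downloaded file.

```
import json

def load_documents(path):
    """Read a CAIL-style JSON-lines file: one JSON object per line."""
    documents = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                documents.append(json.loads(line))
    return documents

docs = load_documents("data_train.json")      # assumed file name
first = docs[0]
print(first["fact"][:50])                     # fact description (input text)
print(first["meta"]["accusation"])            # charges      -> labels for Task 1
print(first["meta"]["relevant_articles"])     # law articles -> labels for Task 2
print(first["meta"]["term_of_imprisonment"])  # penalty info -> labels for Task 3
```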

### 2.3 Evaluation Methods

We have provided the scoring program for contestants; the details of the evaluation methods, environment, and model submission can be found [here](https://github.com/thunlp/CAIL2018).

The full score of every task is 100. The details of the evaluation methods are as follows.

#### 2.3.1 Tasks 1 and 2

Task 1 (Charge Prediction) and Task 2 (Relevant Articles Prediction) use micro-F1 and macro-F1 as the evaluation measures. The calculation follows:

![f1](pic/f1.png)

And the final score should be:

![score1](pic/score_1.png)
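
For reference only, the two F1 measures over multi-label predictions can be approximated with scikit-learn as in the sketch below. The binarization of the label sets and the final averaging into a 0-100 score are assumptions for illustration; the authoritative computation is the formula in the images and the official judger linked above.

```
from sklearn.metrics import f1_score
from sklearn.preprocessing import MultiLabelBinarizer

# Toy ground-truth and predicted label sets (e.g. charges for Task 1
# or article numbers for Task 2).
y_true = [["故意伤害"], ["盗窃", "故意伤害"], ["盗窃"]]
y_pred = [["故意伤害"], ["盗窃"], ["诈骗"]]

mlb = MultiLabelBinarizer()
mlb.fit(y_true + y_pred)
Y_true = mlb.transform(y_true)
Y_pred = mlb.transform(y_pred)

micro_f1 = f1_score(Y_true, Y_pred, average="micro", zero_division=0)
macro_f1 = f1_score(Y_true, Y_pred, average="macro", zero_division=0)

# Assumed aggregation: the mean of the two F1 values scaled to 100.
# Check pic/score_1.png and the official judger for the exact formula.
print(micro_f1, macro_f1, 100 * (micro_f1 + macro_f1) / 2)
```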

#### 2.3.2 Task 3

For Task 3 (Term of Penalty Prediction), suppose the predicted result is `lp` and the standard answer is `la`; then let

![v](pic/v.png)

```
If v ≤ 0.2, then score = 1;
if 0.2 < v ≤ 0.4, then score = 0.8;
...
and so on.
```
**Special case**

If the standard answer is the death penalty or life imprisonment, the prediction `lp` must equal `-2` or `-1` respectively; otherwise no score is given. The details can be found [here](https://github.com/thunlp/CAIL2018/blob/a258c1dae88e8fc576529e6dcb012a430da00b95/judger/judger.py#L90).
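
The rule above elides the intermediate thresholds ("and so on"). A hedged sketch of the per-sample score, assuming the brackets continue in steps of 0.2 down to zero and following the special case just described, could look like the following; the official rule lives in the judger linked above.

```
def task3_sample_score(answer, lp, v=None):
    """Per-sample Task 3 score (a sketch, not the official judger).

    `answer` is the ground-truth `term_of_imprisonment` dict from the data.
    `lp` is the prediction, where -2 is assumed to encode the death penalty
    and -1 life imprisonment, as in the special case above.
    `v` is the discrepancy defined in pic/v.png, computed by the caller;
    it is only needed for ordinary prison terms.
    """
    # Special case: death penalty / life imprisonment must be predicted exactly.
    if answer["death_penalty"]:
        return 1.0 if lp == -2 else 0.0
    if answer["life_imprisonment"]:
        return 1.0 if lp == -1 else 0.0

    # Ordinary terms: bucket the discrepancy v. Only the first two brackets
    # are spelled out in the text; the remaining 0.2-wide steps are an
    # assumption; see judger.py for the authoritative rule.
    if v <= 0.2:
        return 1.0
    if v <= 0.4:
        return 0.8
    if v <= 0.6:
        return 0.6
    if v <= 0.8:
        return 0.4
    if v <= 1.0:
        return 0.2
    return 0.0
```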

Finally, the score of task 3 should be:

![score3](pic/score_3.png)

#### 2.3.3 Total score

The total score should be:

![score_all](pic/score_all.png)


### 2.4 Baseline

We have provided a baseline system for all three tasks ([LibSVM](https://github.com/thunlp/CAIL2018/tree/master/baseline)).
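
For orientation only, a minimal charge-prediction baseline in a similar spirit is sketched below. This is not the repository's baseline: it substitutes scikit-learn's TF-IDF vectorizer (with character n-grams, to avoid needing a Chinese word segmenter) and LinearSVC for LibSVM, predicts a single charge per document, and assumes the file names `data_train.json` and `data_valid.json`.

```
import json

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def read_facts_and_charges(path):
    """Return (fact texts, first charge of each document)."""
    facts, charges = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            doc = json.loads(line)
            facts.append(doc["fact"])
            charges.append(doc["meta"]["accusation"][0])  # keep only the first charge
    return facts, charges

train_facts, train_charges = read_facts_and_charges("data_train.json")  # assumed name
valid_facts, valid_charges = read_facts_and_charges("data_valid.json")  # assumed name

model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 3), max_features=50000),
    LinearSVC(),
)
model.fit(train_facts, train_charges)
print("validation accuracy:", model.score(valid_facts, valid_charges))
```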

## FAQ

### 0. Is there a platform for contestants to communicate with others?

QQ group number: 237633234.
