Over more than a year of work, we compiled a dataset of 49493 prescriptions covering 2284 herbs and 2509 symptoms, combining computer-aided and manual steps such as data screening, cleaning, and format standardization. The prescriptions are drawn from 1459 ancient medical books, including "Sheng Ji Zong Lu", "Sheng Hui", and "Pu Ji Fang".
Tip: For more information about the datasets, open the datasets section of the menu bar.
Over more than a year, we processed 84463 prescriptions using computer-aided and manual methods. A comprehensive analysis of these prescriptions revealed many problems, such as incorrect or incomplete herb entries, unidentified herbs, and herbs without doses. We screened out the prescriptions affected by these problems.
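The screening step can be illustrated with a minimal sketch. It assumes each prescription is a list of herb entries with a name and an optional dose. The `HerbEntry` type, the `KNOWN_HERBS` vocabulary, the `is_usable` function, and the sample records are all hypothetical and are not taken from the actual processing pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HerbEntry:
    name: str
    dose: Optional[str]  # None when the source text gives no dose


# Hypothetical vocabulary of recognized herb names (illustrative only).
KNOWN_HERBS = {"gancao", "renshen", "huangqi"}


def is_usable(herbs: list[HerbEntry]) -> bool:
    """Return True only if the herb list is non-empty and every herb
    is recognized and has a dose."""
    if not herbs:
        return False  # incomplete prescription: no herbs listed
    for herb in herbs:
        if herb.name not in KNOWN_HERBS:
            return False  # unidentified herb
        if not herb.dose:
            return False  # herb without a dose
    return True


# Made-up sample records, one per kind of problem.
prescriptions = {
    "rx1": [HerbEntry("gancao", "3 qian"), HerbEntry("renshen", "1 liang")],
    "rx2": [HerbEntry("gancao", None)],
    "rx3": [HerbEntry("unknown-herb", "2 qian")],
    "rx4": [],
}

retained = {key: herbs for key, herbs in prescriptions.items() if is_usable(herbs)}
print(sorted(retained))
```

In this toy example only `rx1` passes all three checks. The real screening also involved manual review, which a rule-based filter like this cannot replace.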
After several rounds of screening, we retained 49493 prescriptions as the main dataset for experimental study. These prescriptions contain a total of 2284 distinct herbs. All herbs in the dataset can be viewed in the selection box below.
You can browse the herbs using the drop-down boxes. The first drop-down box selects the first letter of the herb's pinyin name, and the second drop-down box lists the corresponding herbs.
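A two-level index of this kind can be sketched as follows, assuming the herb names are already available as pinyin strings. The `group_by_initial` function and the five sample names are illustrative only and do not describe how the page itself is implemented.

```python
from collections import defaultdict

# A few herb names in pinyin, used purely as sample input.
herb_pinyin = ["gancao", "gouqizi", "huangqi", "renshen", "danggui"]


def group_by_initial(names):
    """Map each uppercase first letter to the sorted herbs starting with it."""
    groups = defaultdict(list)
    for name in sorted(names):
        groups[name[0].upper()].append(name)
    return dict(groups)


index = group_by_initial(herb_pinyin)
print(sorted(index))  # letters for the first drop-down box
print(index["G"])     # herbs for the second box after choosing "G"
```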
Next, we manually labeled the indication descriptions of the 49493 prescriptions. The original indication descriptions are unstructured, so they cannot be used directly to build models or conduct research.
Through manual processing, we converted the unstructured indication descriptions into symptom labels. You can view the processing results in the example below.
During labeling, we found labels with similar meanings but different wording. We represent each such group with a single unified label. In the end, we obtained 2509 distinct symptoms and diseases.
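Label unification of this kind can be sketched as a lookup from variant wordings to one canonical label. The `CANONICAL` mapping, the `unify` function, and the English sample labels below are hypothetical. The actual unification was done manually, and its mapping is not reproduced here.

```python
# Hypothetical mapping from variant wordings to a canonical label.
CANONICAL = {
    "head pain": "headache",
    "pain in the head": "headache",
    "sleeplessness": "insomnia",
}


def unify(label: str) -> str:
    """Normalize case and whitespace, then map known variants to their canonical label."""
    key = label.strip().lower()
    return CANONICAL.get(key, key)


raw_labels = ["Headache", "head pain", "sleeplessness", "insomnia", "cough"]
unified = sorted({unify(label) for label in raw_labels})
print(unified)
```

Here five raw labels collapse into three unified ones. The same kind of reduction, applied by hand across the whole dataset, yields the final label set.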
Copyright © 2022 College of Intelligence and Computing, Tianjin University All rights reserved.