AnyBURL

This is the home of the rule learner AnyBURL (Anytime Bottom Up Rule Learning). AnyBURL has been designed for the use case of knowledge base completion; however, it can also be applied to any other use case where rules are helpful. You can use it to (i) learn rules, (ii) apply them to create candidate rankings, and (iii) evaluate the created rankings.

The latest version of AnyBURL (available since March 2020) is based on reinforcement learning. Upgrade to this version in case you are still using the old one. It is faster and generates better results.

Since December 2020 there exists a faster and probably better alternative for applying the rules learned by AnyBURL, called SAFRAN. See here for more details.

Alternative approaches to knowledge base completion, which currently dominate the research field in terms of the number of publications, embed a given graph into a low-dimensional vector space. If you want to compare AnyBURL to these approaches, we recommend using LibKGE.

Results

These are the results of the new multithreaded AnyBURL version in comparison to the previous AnyBURL version described in the IJCAI paper when learning for 1000 seconds (~17 minutes). The IJCAI results have been computed on a laptop; the new results have been computed on a compute server with 24 Intel(R) Xeon(R) E5-2630 v2 @ 2.60GHz cores, using 22 of them.

Dataset	IJCAI-19 hits@1	IJCAI-19 hits@10	AnyBURL-RE hits@1	AnyBURL-RE hits@10
WN18	93.9	95.6	94.8	96.2
WN18RR	44.6	55.5	45.7	57.7
FB15k	80.4	89.0	81.4	89.4
FB15-237	23.0	47.9	27.3	52.2
YAGO03-10	42.9	63.9	49.2	68.9

IMPORTANT NOTE: To reproduce the results reported in the table, you have to add a specific parameter setting to the configuration used for learning the rules (not to the configuration used for predicting / applying the rules): the parameter REWRITE_REFLEXIV = true. The impact of this setting is described in the third paragraph of Section 4.4 of the AnyBURL reinforcement paper. Due to a mistake, the default setting of this parameter is false; it should always be set to true manually unless there is a specific reason not to.
Furthermore, the results achieved for WN18 and WN18RR are based on an additional line in the learning configuration. For these datasets it is possible to increase the maximal length of cyclic rules from three (the default) to five, which yields a small additional improvement. To do so, add the line MAX_LENGTH_CYCLIC = 5.
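Taken together, the two settings above correspond to the following lines in the configuration used for learning (the MAX_LENGTH_CYCLIC line only for WN18 and WN18RR):

```
REWRITE_REFLEXIV = true
MAX_LENGTH_CYCLIC = 5
```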

Download (and build) AnyBURL

The current version of AnyBURL uses reinforcement learning to learn rules of different length sampled from different path types in parallel. In each time span AnyBURL computes how well the sampled rules allow it to reconstruct the training set, assigning more computational resources to those path profiles that score best. For more details we refer to the reinforcement learning paper listed under Publications.

AnyBURL is packaged as jar file and requires no external resources. You can download the jar file here.

If you have problems running the jar due to, e.g., a Java version conflict, you can build AnyBURL.jar on your own. If you want (or need) to do this, continue as follows; otherwise skip the following lines. Download the source code and unzip it. Compile the code and create the jar as follows. First create a folder build, then compile with the following command.

javac de/unima/ki/anyburl/*.java -d build
Then package it into a jar:

jar cfv AnyBURL-RE.jar -C build .
Note the dot (.) at the end of the line; it is required. Afterwards you can delete the build folder.

Datasets

You can use AnyBURL on any dataset that comes as a set of triples. The supported format is rather simple and should look like this. The separator can be a blank or a tab.

anne loves bob
bob marriedTo charly
bob hasGender male
...
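As an illustration (not part of AnyBURL), a minimal Python sketch that reads such a triple file, accepting blanks or tabs as separator:

```python
def read_triples(path):
    """Read a triple file in the format shown above:
    one 'head relation tail' triple per line, separated by blanks or tabs."""
    triples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()  # split() without arguments handles both blanks and tabs
            if len(parts) == 3:
                triples.append(tuple(parts))
    return triples
```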
So far it has been tested on the well-known datasets FB15k, FB15-237, WN18, WN18RR and YAGO03-10. We have zipped FB15k, FB15-237, and WN18 into one file. Please download and unzip it. YAGO03-10 and WN18RR are available at the ConvE webpage.

Run AnyBURL

AnyBURL can be used (i) to learn rules and (ii) to apply the learned rules to solve prediction tasks. These are two distinct processes that have to be started independently.

Learning

Download and open the file config-learn.properties and modify the line that points to the training file, choosing the dataset you want to apply AnyBURL to. Create the output folder rules, then run AnyBURL with this command.

java -cp AnyBURL-RE.jar de.unima.ki.anyburl.LearnReinforced config-learn.properties

If you have been using the previous version of AnyBURL, you might have noticed that the only difference is related to the change from Learn to LearnReinforced. Everything else stays the same.

It will create three files alpha-10, alpha-50, and alpha-100 in the rules folder. These files contain the rules learned after 10, 50, and 100 seconds. While AnyBURL is running you can see how many rules have been found so far and how the saturation rate for cyclic and acyclic rules changes over time. Note that everything should also work fine when setting the maximal heap size to only 3G.

Learning Parameters

You can change the following parameters to modify the standard learning behaviour of AnyBURL. Any changes have to be made by adding (or changing) a line in the config-learn.properties file.

Policy: POLICY = 2 is the default setting. Possible values are 1 (greedy policy) and 2 (weighted policy). Experiments do not show a significant difference, even though the weighted variant is probably more robust in the sense that it is less negatively affected by the peculiarities of unusual datasets.
Reward: REWARD = 5 is the default setting. Possible values are 1 (correct predictions), 3 (correct predictions weighted by confidence with laplace smoothing), 5 (correct predictions weighted by confidence with laplace smoothing divided by (rule length-1)^2).
Epsilon: EPSILON = 0.1 is the default setting, which randomly allocates a core with probability 0.1. You can change this to any value from 0.0 (no random allocation) to 1.0 (= random policy).
Thresholds for learning rules can be set like this: THRESHOLD_CORRECT_PREDICTIONS = 2 and/or THRESHOLD_CONFIDENCE = 0.0001. The values shown here are the default values.
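Putting the parameters from this section together, a config-learn.properties could look as follows. The path and snapshot keys (PATH_TRAINING, PATH_OUTPUT, SNAPSHOTS_AT) are assumptions based on the referenced example file; adjust them to your setup.

```
# path and snapshot keys assumed from the example config file
PATH_TRAINING = data/FB15-237/train.txt
PATH_OUTPUT   = rules/alpha
SNAPSHOTS_AT  = 10,50,100

# parameters described above (default values shown, except REWRITE_REFLEXIV)
POLICY = 2
REWARD = 5
EPSILON = 0.1
THRESHOLD_CORRECT_PREDICTIONS = 2
THRESHOLD_CONFIDENCE = 0.0001
REWRITE_REFLEXIV = true
```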

Predicting

Download and open the file config-apply.properties and modify it according to your needs (if required). Create the output folder predictions, then run AnyBURL with this command. Note that you have to specify the rules that have been learned previously.

java -cp AnyBURL-RE.jar de.unima.ki.anyburl.Apply config-apply.properties

If you have been using the previous version of AnyBURL, you might have noticed that nothing has changed with respect to the method call.

This will create two files alpha-10 and alpha-100 in the predictions folder. Each contains the top-k rankings for the completion tasks. These are already filtered rankings (this is the only reason why the validation and test sets must be specified in the apply config file).

Prediction Parameters

You can change the following parameters to modify the standard prediction (= rule application) behaviour of AnyBURL.

A kind of Laplace smoothing is applied via the parameter UNSEEN_NEGATIVE_EXAMPLES = 5 (default value), which assumes that 5 unseen negative examples are added to the examples that have been observed.
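As a sketch of what this parameter does (illustration with a hypothetical helper, assuming the standard Laplace-style correction): a rule's confidence is its number of correct predictions divided by all its predictions plus the assumed unseen negative examples, so small samples never reach confidence 1.0.

```python
def smoothed_confidence(correct, predicted, unseen_negatives=5):
    """Confidence of a rule with Laplace-style smoothing: correct predictions
    divided by (all observed predictions + assumed unseen negative examples)."""
    return correct / (predicted + unseen_negatives)

# A rule that made 10 predictions, all correct, gets a damped confidence
# of 10 / (10 + 5) ~ 0.667 instead of 1.0.
print(smoothed_confidence(10, 10))
```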

Evaluating Results

To evaluate these results, use this command after modifying config-eval.properties (if required). The evaluation result is printed to standard output.

java -cp AnyBURL-RE.jar de.unima.ki.anyburl.Eval config-eval.properties

If you follow the whole workflow using the referenced config-files, the evaluation program should print results similar to the following output:

...
-----
10 0.1997 0.3965
50 0.2197 0.4337
100 0.2299 0.4517
The first column refers to the time used for learning, the second column shows the hits@1 score, and the third column the hits@10 score.
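If you want to post-process such output in a script, the three columns can be read back with a small Python sketch (illustration only, not part of AnyBURL):

```python
def parse_eval_line(line):
    """Parse one result line printed by the Eval program into
    (learning time in seconds, hits@1, hits@10)."""
    time_s, hits1, hits10 = line.split()
    return int(time_s), float(hits1), float(hits10)

# The sample output lines from above:
for line in ["10 0.1997 0.3965", "50 0.2197 0.4337", "100 0.2299 0.4517"]:
    print(parse_eval_line(line))
```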

The evaluation command line interface is only intended for demonstration purposes; it is not meant to be used in a large-scale experimental setting.

Extensions

SAFRAN (Scalable and fast non-redundant rule application) is a framework for fast inference of groundings and aggregation of logical rules on large heterogeneous knowledge graphs. It requires a rule set learned by AnyBURL as input, which is used to make predictions for the standard KBC task. This means that it can be used as an alternative to the rule application method built into AnyBURL. In most cases it is significantly faster and slightly better in terms of hits@k and MRR.

Publications

Christian Meilicke, Melisachew Wudage Chekol, Manuel Fink, and Heiner Stuckenschmidt: Reinforced Anytime Bottom-Up Rule Learning for Knowledge Graph Completion. 2020. Link to the paper published via arxiv.org.
 
Christian Meilicke, Melisachew Wudage Chekol, Daniel Ruffinelli and Heiner Stuckenschmidt: Anytime Bottom-Up Rule Learning for Knowledge Graph Completion. IJCAI 2019. Link to the authors' version.
 
Christian Meilicke, Manuel Fink, Yanjie Wang, Daniel Ruffinelli, Rainer Gemulla and Heiner Stuckenschmidt: Fine-grained evaluation of rule- and embedding-based systems for knowledge graph completion. ISWC 2018. Link to the paper.
The third paper is not about AnyBURL but about a simple rule-based baseline called RuleN and its comparison against some state-of-the-art embedding methods. The good results of RuleN motivated us to develop AnyBURL.

Previous and Special Versions

2020-08 A version (based on the current RE version) that allows converting the specific AnyBURL prediction format to a list of triples is available here.
2019-05 The AnyBURL version, which is the multi-threaded extension of the IJCAI version, is available here.
IJCAI-19 The AnyBURL version, which has been used in the IJCAI-2019 submission, is available here.
RuleN is available here. It is a predecessor of AnyBURL, with a rather restrictive language bias. Its good performance has been the reason for the development of AnyBURL.

Contact

If you have any questions, feel free to write a mail at any time. We are also interested in hearing about applications where AnyBURL was useful.

Christian Meilicke [christian AT informatik.uni-mannheim.de], University Mannheim, Data and Web Science Group
License

AnyBURL is available under the 3-clause BSD license, sometimes referred to as the modified BSD license:

Copyright (c) University Mannheim, Data and Web Science Group

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of the University nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Colophon

Wikipedia: "A burl [...] is a tree growth in which the grain has grown in a deformed manner. It is commonly found in the form of a rounded outgrowth on a tree trunk or branch that is filled with small knots from dormant buds." If you cut it you get what is shown in the background. The small knots, which are also called burls, can be associated with constants and the regularities that are associated with the forms and structures that surround them.
