Title: On the Gini-impurity Preservation For Privacy Random Forests

Abstract: Random forests have been one of the most successful ensemble algorithms in machine learning. Various techniques have been used to preserve the privacy of random forests, such as anonymization, differential privacy and homomorphic encryption. This work takes one step towards data encryption that incorporates crucial ingredients of the learning algorithm. Specifically, we develop a new encryption that preserves the Gini impurity of data, which plays an important role in the construction of random forests. The basic idea is to modify the structure of the binary search tree to store several examples in each node, and to encrypt data features by incorporating label and order information. Theoretically, we prove that our scheme preserves the minimum Gini impurity in ciphertexts without decryption, and we present a security guarantee for the encryption. For random forests, we encrypt data features with our Gini-impurity-preserving scheme, and encrypt data labels with the homomorphic encryption scheme CKKS owing to their importance and privacy. Finally, we present extensive empirical studies that validate the effectiveness, efficiency and security of the proposed method.

* These authors contributed equally.

This work takes one step towards data encryption by incorporating some crucial ingredients of the learning algorithm, and the main contributions can be summarized as follows:
• We present a new encryption that preserves data's Gini impurity. The basic idea is to modify the structure of binary search trees to maintain several samples on each node, and to encrypt data features by incorporating label and order information. Our scheme also changes the data frequencies, which is beneficial for data security.
• Theoretically, we prove that the minimum Gini impurity, which plays an important role in the construction of random forests, is preserved in ciphertexts without decryption. Our scheme is also secure against Gini-impurity-preserving chosen plaintext attack.
• We focus on privacy random forests in the popular client-server protocol, using our Gini-impurity-preserving encryption for data features and the homomorphic encryption CKKS for data labels. Our encrypted decision tree has smaller communication and computational complexities, as shown in Table 1.
• Extensive experiments show that our encrypted random forests perform significantly better than prior privacy random forests based on encryption, anonymization and differential privacy, and are comparable to original (plaintext) random forests without encryption. Our encrypted random forests strike a good balance between computational cost and data security.

The rest of this work is organized as follows: Section 2 introduces related work. Section 3 presents an encryption for data's Gini impurity. Section 4 proposes the encrypted random forests. Section 5 conducts extensive experiments. Section 6 concludes with future work.

Homomorphic Encryption (HE) is a cryptosystem that allows operations on encrypted data without access to the secret key [40]. Mathematical operations such as addition and multiplication can be performed on encrypted data without revealing sensitive information. Given an encryption function E(·) and a decryption function D(·), an HE scheme provides two operators ⊕ and ⊗ such that, for every pair of plaintexts $x_1$ and $x_2$,

$$D\big(E(x_1) \oplus E(x_2)\big) = x_1 + x_2 \quad\text{and}\quad D\big(E(x_1) \otimes E(x_2)\big) = x_1 \times x_2,$$

where + and × denote standard addition and multiplication, respectively. Various HE schemes have been developed over the past years, e.g., ElGamal [67], Paillier [68] and CKKS [42]. Relevant techniques have been successfully applied to machine learning tasks such as regression [69, 70], neural networks [71-75] and collaborative filtering [76]. Generally, HE schemes incur high computational costs, and one main challenge is to maintain a good trade-off among security, effectiveness and computational cost in real applications.
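To make the ⊕ operator concrete, the following toy sketch implements the Paillier scheme [68], which is additively homomorphic: multiplying two ciphertexts modulo n² yields an encryption of the sum of the plaintexts, and raising a ciphertext to a public integer k yields an encryption of k times the plaintext. Paillier does not provide a ciphertext-ciphertext ⊗, which is one reason a scheme such as CKKS [42] is needed for the products in the encrypted random forests. The tiny primes, the common choice g = n + 1 and the function names are illustrative assumptions; this is not a secure implementation.

```python
import math
import random

def keygen(p: int, q: int):
    """Toy Paillier key generation from two distinct primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return n, (lam, mu)

def encrypt(n: int, m: int, rng: random.Random) -> int:
    """E(m) = g^m * r^n mod n^2 with g = n + 1 and random r coprime to n."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def decrypt(n: int, sk, c: int) -> int:
    """D(c) = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n

def add(n: int, c1: int, c2: int) -> int:
    """Homomorphic addition (the ⊕ operator): D(add(E(x1), E(x2))) = x1 + x2 mod n."""
    return c1 * c2 % (n * n)

def mul_plain(n: int, c: int, k: int) -> int:
    """Multiply the hidden plaintext by a public integer k: D = k * x mod n."""
    return pow(c, k, n * n)

rng = random.Random(0)
n, sk = keygen(293, 433)
c = add(n, encrypt(n, 17, rng), encrypt(n, 25, rng))
print(decrypt(n, sk, c))  # 42
```

The server holding only `c` never sees 17, 25 or 42; only the key holder can decrypt the sum.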

Section: Introduction
Since the pioneering work [1], random forests have been one of the most successful ensemble algorithms [2][3][4], with diverse applications such as ecology [5], computational biology [6], object recognition [7], remote sensing [8] and computer vision [9]. The basic idea is to construct a large number of random trees individually and make a prediction by averaging their predictions. Numerous variants of random forests have been developed to improve performance under different settings [10][11][12][13][14][15][16][17][18][19][20][21][22], along with theoretical understanding of the success of random forests [21, 23-27]. The splitting criterion, such as Gini impurity or information gain, is one of the most important ingredients in the construction of random forests [1,28].
Various techniques have been adopted to preserve the privacy of random forests, especially for sensitive tasks such as medical diagnosis and financial prediction. For example, differential privacy [29] has been successfully applied to preserve the privacy of random forests [30,31] and decision trees [32][33][34] by adding noise perturbations. Another relevant approach is secure multi-party computation for random forests and decision trees [35][36][37][38][39], where privacy is preserved by joint computation of multiple parties over their respective data inputs without leakage.
Homomorphic encryption [40][41][42][43] has been one of the most important cryptosystems in privacy-preserving computing [44][45][46][47]. Based on such schemes, various algorithms have been developed to train privacy random forests and decision trees [48][49][50][51][52], while some other methods only consider inference without training due to computational costs [53][54][55][56][57][58]. In addition, LeFevre et al. [59] applied anonymization [60] to random forests by grouping similar attributes, so that specific individuals can hardly be identified.

Table 1: Comparison of communication and computational complexities for different privacy-preserving decision trees. Here, n is the number of examples in the training data, and τ is the cardinality of the label space. Let h and κ be the height and number of leaves of the decision tree (h < κ), respectively. Denote by ȷ the average number of possible splitting features and positions in the construction of decision trees, and p is the number of clients for secure multi-party computation. '-' indicates methods that focus only on inference without training.
Figure 1: A simple illustration of our encryption: each plaintext is encrypted into a ciphertext vector $(c_i, e_{i,j})$.
Here, random numbers $c_1 < c_2 < \cdots < c_s$ are introduced to preserve the Gini impurity for random forests, and a homomorphic encryption scheme is used for $e_{i,j} = \mathrm{Enc}(k_{pub}, j)$ in Eqn. (5), which is helpful for decryption.
Secure Multi-Party Computation (SMC) [77] is another cryptographic technique for jointly computing a function over multiple private inputs while keeping them confidential. It has been used in machine learning to protect private data, e.g., for neural networks [78][79][80], k-means clustering [81][82][83], and random forests and decision trees [35][36][37][38][39]. Differential privacy preserves individual privacy by making statistically inconsequential changes to the data [84], and relevant techniques have been used in neural networks [85][86][87], random forests [30,31] and decision trees [32][33][34].
We introduce some notation used in this work. Write $[\tau] = \{1, 2, \ldots, \tau\}$ for an integer $\tau \ge 2$. Let $\mathcal{X} \subset \mathbb{R}^d$ and $\mathcal{Y} = [\tau]$ denote the feature and label spaces, respectively. A training sample is given by $S_n = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$. Let $|A|$ be the cardinality of a set $A$, and let $\bar{\cdot}$ denote the corresponding encrypted value (e.g., $\bar a$ for $a$). Let $\mathcal{N}(\mu, \sigma^2)$ be the normal distribution with mean $\mu$ and variance $\sigma^2$.

Section: An Encryption for Gini Impurity
This section presents the first encryption that preserves the minimum Gini impurity over encrypted data without decryption. For simplicity, we detail the encryption of a one-dimensional feature by incorporating label information; the other dimensions are handled in the same way.

Section: Theoretical Analysis for Gini Impurity
Let $A = \{(a_1, y_1), \ldots, (a_n, y_n)\}$ be a dataset with labels $y_i \in [\tau]$, and define the Gini value as

$$\mathrm{Gini}(A) = 1 - \sum_{y \in [\tau]} p_y^2,$$

where $p_y$ denotes the proportion of label $y$ in $A$. Let $A^l_a = \{(a_i, y_i) \in A : a_i \le a\}$ and $A^r_a = \{(a_i, y_i) \in A : a_i > a\}$ be the left and right subsets of $A$ w.r.t. a splitting point $a$, respectively. We define the Gini impurity w.r.t. dataset $A$ and splitting point $a$ as

$$I_G(A, a) = w_l \cdot \mathrm{Gini}(A^l_a) + w_r \cdot \mathrm{Gini}(A^r_a), \tag{1}$$

where $w_l = |A^l_a|/n$ and $w_r = |A^r_a|/n$. Let $I^*_G(A)$ be the minimum Gini impurity of dataset $A$, i.e.,

$$I^*_G(A) = \min_{a \in \mathbb{R}} I_G(A, a). \tag{2}$$
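Eqns. (1)-(2) can be evaluated directly on plaintexts. The Python sketch below is an illustration under our own naming, not the authors' code; it uses exact rational arithmetic so that impurities can be compared for equality. It relies on the fact that $I_G(A, a)$ is piecewise constant in $a$ and only changes when $a$ crosses an observed feature value.

```python
from collections import Counter
from fractions import Fraction

def gini(labels) -> Fraction:
    """Gini(A) = 1 - sum_y p_y^2, computed exactly; Gini of an empty set is 0."""
    m = len(labels)
    if m == 0:
        return Fraction(0)
    return 1 - sum(Fraction(c, m) ** 2 for c in Counter(labels).values())

def gini_impurity(data, a) -> Fraction:
    """I_G(A, a) from Eqn. (1): left = {a_i <= a}, right = {a_i > a}."""
    left = [y for x, y in data if x <= a]
    right = [y for x, y in data if x > a]
    n = len(data)
    return Fraction(len(left), n) * gini(left) + Fraction(len(right), n) * gini(right)

def min_gini_impurity(data) -> Fraction:
    """I*_G(A) from Eqn. (2). It suffices to try every observed value as the
    splitting point, plus one point below all of them (empty left subset)."""
    values = sorted({x for x, _ in data})
    candidates = [values[0] - 1] + values
    return min(gini_impurity(data, a) for a in candidates)

data = [(1.0, 0), (2.0, 0), (3.0, 1)]
print(min_gini_impurity(data))  # 0  (the split at a = 2.0 gives two pure subsets)
```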
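The minimum Gini impurity plays a crucial role in node splitting during the construction of random forests. We re-sort dataset $A$ in non-decreasing order of $a_1, a_2, \ldots, a_n$ as follows:

$$A = \{(a_{\langle 1\rangle}, y_{\langle 1\rangle}), (a_{\langle 2\rangle}, y_{\langle 2\rangle}), \ldots, (a_{\langle n\rangle}, y_{\langle n\rangle})\}, \tag{3}$$

where $a_{\langle 1\rangle} \le a_{\langle 2\rangle} \le \cdots \le a_{\langle n\rangle}$, and $y_{\langle 1\rangle}, y_{\langle 2\rangle}, \ldots, y_{\langle n\rangle}$ denote the corresponding labels. By incorporating label information, we partition $A$ into datasets $I_1, I_2, \ldots, I_s$ as follows:

$$\begin{aligned}
I_1 &= \{(a_{\langle 1\rangle}, y_{\langle 1\rangle}), \ldots, (a_{\langle k_1\rangle}, y_{\langle k_1\rangle})\},\\
I_2 &= \{(a_{\langle k_1+1\rangle}, y_{\langle k_1+1\rangle}), \ldots, (a_{\langle k_1+k_2\rangle}, y_{\langle k_1+k_2\rangle})\},\\
&\;\;\vdots\\
I_s &= \{(a_{\langle k_1+\cdots+k_{s-1}+1\rangle}, y_{\langle k_1+\cdots+k_{s-1}+1\rangle}), \ldots, (a_{\langle n\rangle}, y_{\langle n\rangle})\}.
\end{aligned} \tag{4}$$

Here, all samples in one dataset $I_j$ share an identical label, i.e., $y_{\langle i\rangle} = y_{\langle i'\rangle}$ for every $(a_{\langle i\rangle}, y_{\langle i\rangle}) \in I_j$ and $(a_{\langle i'\rangle}, y_{\langle i'\rangle}) \in I_j$, and any two adjacent datasets have different labels.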

Algorithm 1: The Gini-impurity-preserving encryption. Input: dataset $A = \{(a_1, y_1), \ldots, (a_n, y_n)\}$. Output: binary search tree $BT$ and ciphertexts $\{\bar a_1, \ldots, \bar a_n\}$. Initialize: tree $BT = \emptyset$ with its $\mathrm{cipher}_1 = c_{max}/2$, where $c_{max} = 2^{\lambda \log_2 n}$; for i = 1 …

We consider two important factors in the encryption: i) preservation of the minimum Gini impurity $I^*_G(A)$ over the encrypted data, and ii) a cryptosystem for encoding and decoding data. Based on these considerations, we introduce the following encryption for every example $(a_{\langle i\rangle}, y_{\langle i\rangle}) \in I_j$:

$$\bar a_{\langle i\rangle} = \big(\bar a_{\langle i\rangle,1}, \bar a_{\langle i\rangle,2}\big) = \begin{cases} \big(c_1, \mathrm{Enc}(k_{pub}, i)\big) & \text{for } j = 1,\\ \big(c_j, \mathrm{Enc}(k_{pub}, i - k_1 - \cdots - k_{j-1})\big) & \text{for } 2 \le j \le s. \end{cases} \tag{5}$$

Here, $c_1, c_2, \ldots, c_s$ are random numbers with $c_1 < c_2 < \cdots < c_s$, which aim to preserve the minimum Gini impurity. We use the homomorphic encryption scheme CKKS with a public key $k_{pub}$ for $\bar a_{\langle i\rangle,2} = \mathrm{Enc}(k_{pub}, i - k_1 - \cdots - k_{j-1})$ in Eqn. (5), which is useful for decryption. Figure 1 presents a simple illustration of our encryption, and the detailed decryption is given in Appendix A.
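A minimal Python sketch of Eqns. (3)-(5) for a dataset that is fully available in advance is given below. The function names, the integer range used to draw $c_1 < \cdots < c_s$, and the identity placeholder `enc` (standing in for CKKS $\mathrm{Enc}(k_{pub}, \cdot)$) are assumptions for illustration; for unordered or streaming data, Section 3.2 instead assigns these values through a binary search tree.

```python
import random

def label_runs(data):
    """Eqns. (3)-(4): sort by feature value, then cut the sorted sequence into
    maximal runs I_1, ..., I_s of consecutive samples sharing one label."""
    runs = []
    for a, y in sorted(data, key=lambda s: s[0]):
        if runs and runs[-1][-1][1] == y:
            runs[-1].append((a, y))
        else:
            runs.append([(a, y)])
    return runs

def encrypt_runs(runs, enc, c_max, rng):
    """Eqn. (5): every sample of run I_j gets first component c_j, where
    c_1 < ... < c_s are distinct random integers in [1, c_max), and second
    component enc(position of the sample inside I_j), counted from 1."""
    cs = sorted(rng.sample(range(1, c_max), len(runs)))
    out = []
    for c_j, run in zip(cs, runs):
        for pos, (_, y) in enumerate(run, start=1):
            out.append(((c_j, enc(pos)), y))
    return out

# `enc` is a hypothetical stand-in for CKKS Enc(k_pub, .); identity here.
data = [(3.2, 1), (0.5, 0), (1.7, 0), (2.4, 1), (4.0, 0)]
runs = label_runs(data)
for (c1, c2), y in encrypt_runs(runs, enc=lambda m: m, c_max=10**6, rng=random.Random(0)):
    print(c1, c2, y)
```

Samples in the same run share one first component, so the first components reveal only the order of the runs, not the individual feature values.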
We now present our main theorem.

Theorem 1. For the re-sorted dataset $A$ in Eqn. (3) and the corresponding encrypted dataset $A' = \{(\bar a_{\langle 1\rangle,1}, y_{\langle 1\rangle}), \ldots, (\bar a_{\langle n\rangle,1}, y_{\langle n\rangle})\}$ obtained from Eqns. (4)-(5), we have $I^*_G(A) = I^*_G(A')$.

This theorem shows that our encryption preserves the minimum Gini impurity over encrypted data. The detailed proof is presented in Appendix B: it first proves the piecewise monotonicity of $I_G(A, a)$ w.r.t. the splitting point $a$, then identifies the optimal splitting point on the plaintexts, as well as the corresponding point on the encrypted data.
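Theorem 1 can be sanity-checked numerically. The sketch below computes $I^*_G$ exactly with rational arithmetic on random datasets with distinct feature values (the proof in Appendix B makes the same assumption) and compares it with $I^*_G$ computed on the first ciphertext components. The helper names and sampling ranges are illustrative assumptions, and a finite random test is of course not a proof.

```python
import random
from collections import Counter
from fractions import Fraction

def gini(labels):
    m = len(labels)
    if m == 0:
        return Fraction(0)
    return 1 - sum(Fraction(c, m) ** 2 for c in Counter(labels).values())

def min_gini_impurity(data):
    """I*_G(A): try every threshold between consecutive distinct sorted values,
    plus the trivial split that keeps all samples on one side."""
    ordered = sorted(data, key=lambda s: s[0])
    labels = [y for _, y in ordered]
    n = len(labels)
    best = gini(labels)
    for k in range(1, n):
        if ordered[k - 1][0] == ordered[k][0]:
            continue  # no threshold a can separate equal feature values
        imp = Fraction(k, n) * gini(labels[:k]) + Fraction(n - k, n) * gini(labels[k:])
        best = min(best, imp)
    return best

def first_ciphertexts(data, rng):
    """Map each feature to c_j of its label run I_j (first component of Eqn. (5))."""
    ordered = sorted(data, key=lambda s: s[0])
    run_of = [0]
    for prev, cur in zip(ordered, ordered[1:]):
        run_of.append(run_of[-1] + (cur[1] != prev[1]))  # new run when the label changes
    cs = sorted(rng.sample(range(1, 10**6), run_of[-1] + 1))
    return [(cs[r], y) for r, (_, y) in zip(run_of, ordered)]

rng = random.Random(0)
for _ in range(200):
    xs = rng.sample(range(1000), rng.randint(2, 30))  # distinct plaintext features
    data = [(x, rng.randint(1, 3)) for x in xs]       # labels in [tau] with tau = 3
    assert min_gini_impurity(data) == min_gini_impurity(first_ciphertexts(data, rng))
print("Theorem 1 held on all random trials")
```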

Section: Binary Search Tree for Encryption
We now present a new binary search tree to encrypt $a_1, \ldots, a_n$ dynamically, especially for an unordered dataset $A = \{(a_1, y_1), \ldots, (a_n, y_n)\}$ or when examples $(a_i, y_i)$ arrive as streaming data. We begin with an alternative structure for the binary search tree that maintains several samples on a node according to Eqns. (4)-(5), rather than only one sample as in previous work [88,89]. Our new structure is given by Struct Tree {Plaintext samples; Ciphertext cipher_1, cipher_2; Tree left, right}.
The field samples stores one or multiple samples from $A$; cipher_1 and cipher_2 are the first and second ciphertexts in Eqn. (5); and left and right denote the left and right children of the current node, respectively.
We initialize an empty tree $BT = \emptyset$, set its $\mathrm{cipher}_1 = c_{max}/2$ with $c_{max} = 2^{\lambda \log_2 n}$, and then construct the binary search tree iteratively. We maintain an interval $[t_{min}, t_{max}]$ in each iteration so as to keep the order of the ciphertexts $c_1, c_2, \ldots, c_s$ in Eqn. (5). During the i-th iteration, we receive a sample $(a_i, y_i)$ and take the following two steps:
Step-I: Search a node for sample $(a_i, y_i)$ in binary search tree $BT$. Let t be a node pointer initialized to the root of $BT$. We search a path downward in $BT$ by comparing with $a_i$, and the search terminates when t is a leaf node or an empty node.
For an internal node t, the search continues to its left child and updates $t_{max} = t.\mathrm{cipher}_1$ if $t.\mathrm{left} \neq \emptyset$ and $a_i < \max\{a_j : (a_j, y_j) \in t.\mathrm{left.samples}\}$; the search continues to its right child and updates $t_{min} = t.\mathrm{cipher}_1$ if $t.\mathrm{right} \neq \emptyset$ and $a_i > \min\{a_j : (a_j, y_j) \in t.\mathrm{right.samples}\}$; otherwise, the search terminates. This procedure can be easily implemented with a while loop.
Two special cases must be considered after the above search. We update $t = t.\mathrm{left}$ if

$$t.\mathrm{left} \neq \emptyset,\quad a_i < \min\{a_j : (a_j, y_j) \in t.\mathrm{samples}\}\quad\text{and}\quad y_i = y_j \text{ for all } (a_j, y_j) \in t.\mathrm{left.samples}. \tag{6}$$

Similarly, we update $t = t.\mathrm{right}$ if

$$t.\mathrm{right} \neq \emptyset,\quad a_i > \max\{a_j : (a_j, y_j) \in t.\mathrm{samples}\}\quad\text{and}\quad y_i = y_j \text{ for all } (a_j, y_j) \in t.\mathrm{right.samples}. \tag{7}$$

Step-II: Update the binary search tree $BT$. After Step-I, we have found a node t for sample $(a_i, y_i)$ and the corresponding interval $[t_{min}, t_{max}]$. We directly append $(a_i, y_i)$ to t.samples if $y_i = y_j$ for every $(a_j, y_j) \in t.\mathrm{samples}$; otherwise, the node t must be split according to $a_i$.
We initialize an empty node l with $l.\mathrm{samples} = \{(a_j, y_j) \in t.\mathrm{samples} : a_j < a_i\}$; it suffices to consider the case $l.\mathrm{samples} \neq \emptyset$. If $t.\mathrm{left} \neq \emptyset$, then we set l.cipher
and update t.left = l. Here, ξ is a random number sampled from $\mathcal{N}(0, 1)$, and ξ may be resampled multiple times until the conditions in Eqns. (8)-(9) hold, respectively.
and update t.right = r. Algorithm 2 details the splitting of node t.
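The node search of Step-I, including the special cases (6)-(7), can be sketched in Python as follows. The `Tree` dataclass mirrors Struct Tree, with `cipher2` kept as an unused placeholder for the CKKS ciphertext, and the initial interval $[0, c_{max}]$ follows Appendix A.3. Two details are our assumptions rather than statements of the text: the search returns immediately on an empty tree, and the special cases update $[t_{min}, t_{max}]$ in the same way as the main loop. Step-II (splitting via Eqns. (8)-(9)) is not reproduced, and the hand-built tree is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Sample = Tuple[float, int]  # (feature value a, label y)

@dataclass
class Tree:
    """Mirrors Struct Tree {Plaintext samples; Ciphertext cipher_1, cipher_2; Tree left, right}."""
    samples: List[Sample]
    cipher1: float
    cipher2: object = None  # placeholder for the CKKS ciphertext (unused here)
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None

def values(node: Tree) -> List[float]:
    return [a for a, _ in node.samples]

def search(root: Optional[Tree], a: float, y: int, c_max: float):
    """Step-I: return (t, t_min, t_max) for a new sample (a, y)."""
    t, t_min, t_max = root, 0.0, c_max
    if t is None:  # empty tree: nothing to search
        return t, t_min, t_max
    while True:
        if t.left is not None and a < max(values(t.left)):
            t_max, t = t.cipher1, t.left
        elif t.right is not None and a > min(values(t.right)):
            t_min, t = t.cipher1, t.right
        else:
            break  # leaf reached, or neither child qualifies
    # Special case (6): a precedes node t and the left child holds label y only.
    if (t.left is not None and a < min(values(t))
            and all(lab == y for _, lab in t.left.samples)):
        t_max, t = t.cipher1, t.left  # interval update is our assumption
    # Special case (7): a follows node t and the right child holds label y only.
    elif (t.right is not None and a > max(values(t))
            and all(lab == y for _, lab in t.right.samples)):
        t_min, t = t.cipher1, t.right  # interval update is our assumption
    return t, t_min, t_max

left = Tree([(1.0, 1), (2.0, 1)], cipher1=25.0)
right = Tree([(9.0, 1)], cipher1=75.0)
root = Tree([(5.0, 0), (6.0, 0)], cipher1=50.0, left=left, right=right)
node, lo, hi = search(root, 3.0, 1, c_max=100.0)  # special case (6) moves to `left`
```

In the usage line, 3.0 fails both loop conditions at the root, but it precedes the root's samples and matches the left child's label, so case (6) sends it to `left` with interval [0, 50].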
Algorithm 1 presents an overview of our Gini-impurity-preserving encryption, and the decryption is given in Appendix A. Our scheme not only preserves the minimum Gini impurity but also changes data frequencies to prevent decryption by frequency analysis, which is beneficial for encryption [90]. Our scheme takes O(n log n) computational complexity on average, since searching and updating a node in each iteration take O(log n) and O(1) computational complexity, respectively. Finally, the average and worst-case space complexities of our encryption are O(log n) and O(n), respectively.

Section: Security Analysis
For the ciphertext vector $\bar a = (\bar a_1, \bar a_2)$ in Eqn. (5), it suffices to discuss the first ciphertext $\bar a_1$, since the security of $\bar a_2$ has been analyzed for the homomorphic encryption CKKS [42]. Following semantic security against chosen plaintext attacks [89, 91], we define a security game $\mathrm{Game}_{\mathrm{GIPCPA}}$:
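• An adversary chooses two sequences of distinct plaintexts $\{a^0_1, \ldots, a^0_n\}$ and $\{a^1_1, \ldots, a^1_n\}$ and sends them to the challenger;
• The challenger selects $b \in \{0, 1\}$ uniformly at random, encrypts $\{a^b_1, \ldots, a^b_n\}$ by Eqns. (4) and (5), and sends the ciphertexts to the adversary;
• The adversary outputs a guess of b, i.e., which sequence was selected for encryption.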
We then introduce security against Gini-impurity-preserving chosen plaintext attack as follows. Definition 2. A scheme is said to be indistinguishable under Gini-impurity-preserving chosen plaintext attack if the adversary $\mathcal{A}$ in $\mathrm{Game}_{\mathrm{GIPCPA}}$ guesses b correctly with probability at most negligibly greater than 1/2, that is,

$$\Pr\big[\mathcal{A}(\mathrm{Game}_{\mathrm{GIPCPA}}) = b\big] \le \frac{1}{2} + \mathrm{negl}(\lambda),$$

where $\mathrm{negl}(\lambda)$ is a negligible function of the security parameter λ.
The following theorem shows that the ciphertexts of our encrypted plaintext sequences are indistinguishable. Theorem 3. Our scheme for the first ciphertexts $\bar a_{1,1}, \bar a_{2,1}, \ldots, \bar a_{n,1}$ in Section 3.2 is secure against Gini-impurity-preserving chosen plaintext attack.
The detailed proof is presented in Appendix C, and the basic idea is inspired by [88]. We proceed by induction on n to show that the data point $(a^b_{i+1}, y_{i+1})$ affects the constructed binary search trees with the same probability for b = 0 and b = 1, and hence the ciphertexts of the data points $(a^b_{i+1}, y_{i+1})$ also follow the same distribution, i.e.,

$$P\big(\bar a^0_1, \ldots, \bar a^0_{i+1} \mid a^0_1, \ldots, a^0_{i+1}\big) = P\big(\bar a^1_1, \ldots, \bar a^1_{i+1} \mid a^1_1, \ldots, a^1_{i+1}\big).$$

Section: Encrypted Random Forests
For encrypted random forests, we follow the popular client-server protocols [51,65,66,88]. A client encrypts the training and testing data and transfers the encrypted data to an honest-but-curious server.
The server trains random forests on the encrypted data with the aid of the client, and finally returns predictions on the encrypted testing data.

Section: Encryption for training and testing datasets
Recall the training data $S_n = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ with $x_i = (x_{i,1}, \ldots, x_{i,d})$. The client constructs d binary search trees $BT_1, BT_2, \ldots, BT_d$ according to Algorithm 1 over the different feature dimensions and labels in $S_n$, where $BT_j$ is used to encrypt the features $\{x_{1,j}, \ldots, x_{n,j}\}$ for $j \in [d]$.

We use the homomorphic encryption CKKS [42] to encrypt the training labels $y_1, \ldots, y_n$. Each label $y_i$ is one-hot encoded as a vector of size τ, and we encrypt the vector by CKKS with a public key $k_{pub}$. The ciphertext $\bar y_i = [\bar y_{i,1}, \ldots, \bar y_{i,\tau}]$ is given by

$$\bar y_{i,j} = \begin{cases} \mathrm{Enc}(k_{pub}, 1) & \text{for } j = y_i,\\ \mathrm{Enc}(k_{pub}, 0) & \text{otherwise.} \end{cases}$$

We obtain the final encrypted training data $\bar S_n = \{(\bar x_1, \bar y_1), \ldots, (\bar x_n, \bar y_n)\}$. Let $\tilde S_{n'} = \{\tilde x_1, \ldots, \tilde x_{n'}\}$ be the testing data with instances $\tilde x_i = (\tilde x_{i,1}, \ldots, \tilde x_{i,d})$. For every plaintext $\tilde x_{i,j}$ with $i \in [n']$ and $j \in [d]$, we search a node t in the binary search tree $BT_j$, similarly to the node search (Step-I) in Section 3.2, and obtain its ciphertext $\bar{\tilde x}_{i,j} = [t.\mathrm{cipher}, \mathrm{Enc}(k_{pub}, i)]$. We thus have the encrypted testing data $\bar{\tilde S}_{n'} = \{\bar{\tilde x}_1, \ldots, \bar{\tilde x}_{n'}\}$.

Section: Construction on encrypted random forests
Encrypted random forests consist of individual decision trees $DT_1, \ldots, DT_m$, where each tree $DT_i$ is constructed as follows. We first draw a bootstrap sample $\bar S'_n$ from $\bar S_n$ and initialize $DT_i$ with a single node containing $\bar S'_n$. We repeat the following procedure recursively for each leaf node, until the number of training samples in the leaf node is smaller than α, or all instances in the leaf node have the same label:
• Select a k-subset B from d available features randomly without replacement;
• Find the best splitting feature in B and the best splitting position by Gini impurity computed from the encrypted data;
• Split the current node into left and right children via the best splitting position and feature.
This construction is essentially the same as that of original random forests [1], except that we require a different way to find the best splitting feature and position based on Gini impurity computed from the encrypted data.
Let t be the current leaf node for further splitting, with encrypted training data $\bar S^t_n \subseteq \bar S_n$, and let $s_1, \ldots, s_\jmath$ denote all possible splitting features and positions within the feature subset B for $\bar S^t_n$. Here, the feature and position can be derived from the corresponding index $i \in [\jmath]$ and subset B.
For each $i \in [\jmath]$, the split $s_i$ divides $\bar S^t_n$ into

$$\bar S^t_{n_l^i} = \{(\bar x^l_1, \bar y^l_1), \ldots, (\bar x^l_{n_l}, \bar y^l_{n_l})\} \quad\text{and}\quad \bar S^t_{n_r^i} = \{(\bar x^r_1, \bar y^r_1), \ldots, (\bar x^r_{n_r}, \bar y^r_{n_r})\}.$$

From Eqn. (1), we have the Gini impurity

$$\bar I_G(\bar S^t_n, s_i) = \frac{n_l}{n_l + n_r} \otimes \bar I_G(\bar S^t_{n_l^i}) \oplus \frac{n_r}{n_l + n_r} \otimes \bar I_G(\bar S^t_{n_r^i}), \tag{12}$$

where $\bar I_G(\bar S^t_{n_l^i}) = 1 \ominus \bar p_l \odot \bar p_l$ and $\bar I_G(\bar S^t_{n_r^i}) = 1 \ominus \bar p_r \odot \bar p_r$, with

$$\bar p_l = (1/n_l) \otimes (\bar y^l_1 \oplus \cdots \oplus \bar y^l_{n_l}) \quad\text{and}\quad \bar p_r = (1/n_r) \otimes (\bar y^r_1 \oplus \cdots \oplus \bar y^r_{n_r}).$$
Here, ⊗, ⊙, ⊕ and ⊖ denote the CKKS homomorphic element-wise multiplication, dot product, addition and subtraction, respectively, as in [42].
The server sends the ciphertexts $\{\bar I_G(\bar S^t_n, s_i)\}_{i=1}^{\jmath}$ to the client, which decrypts them with the secret key $k_{sec}$ to obtain the plaintexts $\{\mathrm{Dec}(k_{sec}, \bar I_G(\bar S^t_n, s_i))\}_{i=1}^{\jmath}$. If all instances in $\bar S^t_n$ have the same label, then $\mathrm{Dec}(k_{sec}, \bar I_G(\bar S^t_n, s_i)) = 0$ for each $i \in [\jmath]$, and we set $i^* = -1$; otherwise, we set

$$i^* \in \arg\min_{i \in [\jmath]} \mathrm{Dec}\big(k_{sec}, \bar I_G(\bar S^t_n, s_i)\big). \tag{13}$$
The client sends index i * to the server for further splitting. Algorithm 3 presents the detailed descriptions on finding the best splitting feature and position.
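The arithmetic of Eqns. (12)-(13) can be traced with the following Python sketch. `MockCt` is a hypothetical plaintext stand-in for a CKKS vector ciphertext: it performs no encryption and only counts multiplicative levels, so the sketch checks the data flow and the multiplicative depth of Eqn. (12) (three levels: scaling by $1/n_l$, the dot product, and scaling by $n_l/(n_l+n_r)$) rather than security. Candidate splits are modeled as prefixes of the samples sorted by the order-preserving first ciphertext; the stopping rule $i^* = -1$ is not modeled, and real CKKS decryptions are approximate, so the client's argmin compares approximate values.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MockCt:
    """Plaintext stand-in for a CKKS vector ciphertext (NO encryption).
    `depth` counts consumed multiplicative levels, as in leveled HE."""
    v: List[float]
    depth: int = 0

    def add(self, other: "MockCt") -> "MockCt":   # ⊕
        return MockCt([x + y for x, y in zip(self.v, other.v)], max(self.depth, other.depth))

    def scale(self, k: float) -> "MockCt":        # ⊗ by a public scalar
        return MockCt([k * x for x in self.v], self.depth + 1)

    def dot(self, other: "MockCt") -> "MockCt":   # ⊙, result in slot 0
        s = sum(x * y for x, y in zip(self.v, other.v))
        return MockCt([s], max(self.depth, other.depth) + 1)

    def one_minus(self) -> "MockCt":              # 1 ⊖ (.)
        return MockCt([1.0 - x for x in self.v], self.depth)

def one_hot(y: int, tau: int) -> MockCt:
    """One-hot label vector; labels are taken from [tau] = {1, ..., tau}."""
    return MockCt([1.0 if j == y else 0.0 for j in range(1, tau + 1)])

def side_gini(cts: List[MockCt]) -> MockCt:
    """1 ⊖ p ⊙ p with p = (1/m) ⊗ (y_1 ⊕ ... ⊕ y_m)."""
    total = cts[0]
    for c in cts[1:]:
        total = total.add(c)
    p = total.scale(1.0 / len(cts))
    return p.dot(p).one_minus()

def split_impurity(left: List[MockCt], right: List[MockCt]) -> MockCt:
    """Server side, Eqn. (12); n_l and n_r are known to the server."""
    n_l, n_r = len(left), len(right)
    return side_gini(left).scale(n_l / (n_l + n_r)).add(
        side_gini(right).scale(n_r / (n_l + n_r)))

def client_argmin(decrypted: List[float]) -> int:
    """Client side, Eqn. (13), returning a 0-based index."""
    return min(range(len(decrypted)), key=decrypted.__getitem__)

labels = [1, 1, 2, 2, 2, 1]  # labels in the order of the encrypted feature
cts = [one_hot(y, tau=2) for y in labels]
imps = [split_impurity(cts[:k], cts[k:]) for k in range(1, len(cts))]
best = client_argmin([c.v[0] for c in imps])  # "decryption" = reading slot 0
print(best + 1, [c.depth for c in imps])
```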
For an encrypted decision tree with κ leaf nodes, the client requires O(κ) computational complexity, since it performs a constant number of basic operations for each node. The server takes O(κȷτn) computational complexity for Eqn. (12), where ȷ is the average number of possible splitting features and positions, and τ and n are the numbers of labels and training examples, respectively.
Our method takes O(h) communication rounds with O(κȷ) communication bandwidth to train an encrypted decision tree of height h. This is because we grow the tree in breadth-first order, aggregating all nodes at the same depth and sending them to the client in a single message.
We do not require bootstrapping for homomorphic encryption, since Eqn. (12) needs only a multiplicative depth of 3 and the splitting feature and position for each node are computed independently. This differs from previous encrypted decision trees [51,62], which may incur expensive computational costs for bootstrapping [40,92].

Section: Prediction on encrypted testing dataset


Section: Experiment
We conduct experiments on 20 datasets, as summarized in Table 2. Most datasets have been well studied in previous work on random forests. In addition to the original (plaintext) random forests [1], we compare with six state-of-the-art privacy-preserving random forests proposed in recent years.
• AnonyRFs: random forests based on anonymization with a top-down greedy search [59];
• DiffPrivRFs: random forests based on differential privacy [93];
• PPD-ERTs: extremely randomized trees from distributed structured data [64];
• PivotRFs: random forests based on a hybrid of threshold partially homomorphic encryption and secure multiparty computation techniques [62];
• MulPRFs: random forests based on the secure multiparty computation [94];
• HEldpRFs: random forests with fully homomorphic encryption and low-degree polynomial approximations [51].
For all random forests, we train 100 individual decision trees and randomly select ⌊√d⌋ candidate features during node splitting. For our encrypted random forests, we set α = 10 for datasets with fewer than 20,000 examples and α = 100 otherwise, following [95]. For multi-class datasets, we use the one-vs-all method for MulPRFs, since it is limited to binary classification. Other parameters are set according to their respective references, and more details can be found in Appendix D.

Section: Experimental comparisons
The performance is evaluated by five trials of 5-fold cross validation, and the final prediction accuracies are obtained by averaging over these 25 runs, as summarized in Table 3. Our encrypted random forests achieve performance comparable to the original random forests [1] on plaintexts, which supports Theorem 1 on the preservation of the minimum Gini impurity in the construction of random forests. Our encrypted random forests are also comparable to MulPRFs whenever MulPRFs obtain results within 10^6 seconds (about 11.6 days), since MulPRFs are essentially the original random forests, only implemented with secure multi-party computation.
As can be seen from Table 3, our random forests perform significantly better than AnonyRFs and DiffPrivRFs: the win/tie/loss counts show that our random forests win most of the time and never lose. This is because AnonyRFs combine features by anonymization, while DiffPrivRFs add perturbations to features via differential privacy; both therefore lose information in the privacy-preserving process. Our random forests also perform better than PivotRFs, since PivotRFs have to limit the depth of trees due to the heavy computation of HE and the communication of secure multi-party computation.
Our random forests also outperform PPD-ERTs and HEldpRFs whenever these methods obtain results within 10^6 seconds, since PPD-ERTs adopt completely random splitting rather than selecting the minimum Gini impurity, while HEldpRFs apply homomorphic encryption to features and employ low-degree polynomial approximations. Both approaches modify the structure of the original random forests.

Section: Running time
All experiments are implemented in C++ and run on Ubuntu with 256 GB of main memory (AMD Ryzen Threadripper 3970X). We compare the training time of our encrypted random forests with that of the other methods, and the average CPU time (in seconds) is shown in Figure 2.
As expected, original random forests take the least running time on raw datasets without privacy preservation. Our encrypted random forests take more running time than AnonyRFs and DiffPrivRFs, which are essentially the original random forests with simple modifications or perturbations of features; in return, our encrypted random forests achieve better performance and higher security.
Our encrypted random forests take less running time than PPD-ERTs, PivotRFs, MulPRFs and HEldpRFs, in particular on large or high-dimensional datasets, where these methods obtain no results even after 10^6 seconds (about 11.6 days). This is because PPD-ERTs, PivotRFs and MulPRFs require expensive communication for multi-party computation, while PivotRFs and HEldpRFs incur heavy computational costs from the HE scheme.

Section: Security analysis
We present a security analysis for the first ciphertext $\bar a_1$ in the ciphertext vector $\bar a = (\bar a_1, \bar a_2)$; the security of the second ciphertext $\bar a_2$ is ensured by the HE scheme. We compare with four state-of-the-art privacy-preserving schemes: differential privacy [93], anonymization [59], the order-preserving scheme [96] and the HE scheme [42]. Here, we present results on six datasets for one randomly selected feature; the trends are similar on other dimensions and datasets. More results can be found in Appendix D.
Figure 3 shows the comparison results, where we use bitwise leakage matrices to measure security as in [97]: the redder the area, the higher the security. As expected, the HE scheme presents the highest security, yet with heavy computational costs; for example, no results are obtained for datasets with more than 3000 examples even after 10^6 seconds. Our scheme presents higher security than the other three schemes, since those schemes simply apply perturbation or compression, or preserve the entire order information, regardless of the learning ingredients. In comparison, our scheme strikes a good balance between security and computational cost.

Section: Conclusion
This work takes one step towards data encryption based on crucial ingredients of the learning algorithm. We present a new encryption that preserves data's Gini impurity, which plays a crucial role in the construction of random forests. For random forests, we encrypt data features with our Gini-impurity-preserving scheme and encrypt data labels with the homomorphic encryption scheme CKKS. We validate the effectiveness, efficiency and security of the proposed method both theoretically and empirically. An interesting direction for future work is to exploit other learning ingredients, such as information gain, for data encryption.

Section: Algorithm 4 Decryption


Section: A Detailed Decryption for Our Encryption Method
A.1 Decryption for Our Encryption in Section 3.1
We decrypt the ciphertext $\bar a_i = (\bar a_{i,1}, \bar a_{i,2})$ in Eqn. (5) by the following steps:
• Find the partition $I_j$ according to $\bar a_{i,1}$;
• Decrypt the ciphertext $\bar a_{i,2}$ with the CKKS secret key $k_{sec}$ to obtain the index $\tau = \mathrm{Dec}(k_{sec}, \bar a_{i,2})$ in partition $I_j$;
• Obtain the plaintext $a_i$ as the τ-th sample in partition $I_j$.
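The three decryption steps can be sketched in Python, paired with a compact version of the static encryption of Eqns. (3)-(5) so that the round trip can be tested. The secret dictionary from $c_j$ to the plaintext partition $I_j$, the identity placeholder for CKKS Enc/Dec, and all names are illustrative assumptions; positions are counted from 1 as in Eqn. (5), and since CKKS decryption is approximate, the recovered index is rounded.

```python
import random

def encrypt(data, rng, c_max=10**6):
    """Eqns. (3)-(5) with an identity placeholder for CKKS Enc(k_pub, .).
    Returns the ciphertexts (in sorted order) and the client's secret map
    from each c_j to its plaintext partition I_j."""
    runs = []
    for a, y in sorted(data, key=lambda s: s[0]):
        if runs and runs[-1][-1][1] == y:
            runs[-1].append((a, y))
        else:
            runs.append([(a, y)])
    cs = sorted(rng.sample(range(1, c_max), len(runs)))
    secret = dict(zip(cs, runs))
    cts = [(c, pos) for c, run in zip(cs, runs) for pos in range(1, len(run) + 1)]
    return cts, secret

def decrypt(ct, secret, dec=lambda x: x):
    """Appendix A.1: find I_j from the first component, recover the index
    tau = Dec(k_sec, second component), and return the tau-th sample of I_j."""
    c, enc_pos = ct
    partition = secret[c]         # step 1: partition I_j
    tau = round(dec(enc_pos))     # step 2: placeholder for (approximate) CKKS decryption
    return partition[tau - 1][0]  # step 3: tau-th sample, 1-based

data = [(3.2, 1), (0.5, 0), (1.7, 0), (2.4, 1), (4.0, 0)]
cts, secret = encrypt(data, random.Random(0))
print([decrypt(ct, secret) for ct in cts])  # [0.5, 1.7, 2.4, 3.2, 4.0]
```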
A.2 Decryption for Our Encryption of Binary Search Tree in Section 3.2
We decrypt a ciphertext $\bar a_i = (\bar a_{i,1}, \bar a_{i,2})$ based on the binary search tree $BT$ (in Section 3.2) and the CKKS secret key $k_{sec}$ in the following two steps; Algorithm 4 presents the details of decryption:
• Let t be a node pointer initialized to the root of $BT$. We search a path downward in $BT$ by comparing with $\bar a_{i,1}$: the search continues to the left child (t = t.left) if $\bar a_{i,1} < t.\mathrm{cipher}_1$, and to the right child (t = t.right) if $\bar a_{i,1} > t.\mathrm{cipher}_1$, until $\bar a_{i,1} = t.\mathrm{cipher}_1$.
• Decrypt the ciphertext $\bar a_{i,2}$ with the CKKS secret key $k_{sec}$ to obtain the index $\tau = \mathrm{Dec}(k_{sec}, \bar a_{i,2})$ in t.samples, and use τ to obtain the plaintext $a_i = t.\mathrm{samples}[\tau]$.
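The two steps above amount to an ordinary binary-search descent on $\mathrm{cipher}_1$ followed by an index lookup. The sketch below assumes that t.samples is stored in insertion order with 1-based indices (Appendix A.3 sets $t.\mathrm{cipher}_2 = \mathrm{Enc}(k_{pub}, |t.\mathrm{samples}|)$ right after appending), uses an identity placeholder for CKKS decryption, and raises an error if no node matches; the hand-built tree and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Tree:
    """Minimal node: plaintext samples, first ciphertext cipher1, children."""
    samples: List[Tuple[float, int]]
    cipher1: float
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None

def decrypt(root: Optional[Tree], ct, dec=lambda x: x) -> float:
    """Appendix A.2: descend by comparing the first ciphertext with cipher1,
    then read the sample at index tau = Dec(k_sec, second ciphertext)."""
    c1, c2 = ct
    t = root
    while t is not None and c1 != t.cipher1:
        t = t.left if c1 < t.cipher1 else t.right
    if t is None:
        raise KeyError(f"no node with cipher1 = {c1}")
    tau = round(dec(c2))          # CKKS decryption is approximate; round to an index
    return t.samples[tau - 1][0]  # t.samples[tau] with 1-based tau

# A hand-built tree whose cipher1 values respect the binary-search order.
left = Tree([(1.0, 1), (2.0, 1)], cipher1=25.0)
right = Tree([(9.0, 1)], cipher1=75.0)
root = Tree([(5.0, 0), (6.0, 0)], cipher1=50.0, left=left, right=right)
print(decrypt(root, (25.0, 2)))  # 2.0
```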

Section: A.3 Formal Definition of Our Gini-impurity-preserving Encryption
We present a formal definition of our Gini-impurity-preserving encryption as follows:
• $S \leftarrow \mathrm{KeyGen}(t_{max})$: Generate the secret state S by initializing the binary search tree $BT = \emptyset$ and a security parameter $c_{max}$, which is a random number with $c_{max} > n$. We maintain an interval $[t_{min}, t_{max}]$ in each secret state S, with $t_{min} = 0$ and $t_{max} = c_{max}$ initially, so as to keep the order of the ciphertexts $c_1, c_2, \ldots, c_s$ in Eqn. (5). In this way, the ciphertexts are random numbers that preserve the semi-order of the plaintexts, and identical plaintexts may receive different ciphertexts.
• $S', \bar a_i \leftarrow \mathrm{Encrypt}(S, a_i)$: Encrypt $a_i$ and update the secret state to $S'$ upon receiving a sample $(a_i, y_i)$ as follows:
- Search a node for sample $(a_i, y_i)$ in binary search tree $BT$ as shown in Algorithm 1. Let t be a node pointer initialized to the root of $BT$. We search a path downward in $BT$ by comparing with $a_i$. The search terminates when t is a leaf or an empty node.
- Update the binary search tree $BT$. We directly append the example $(a_i, y_i)$ to t.samples if $y_i = y_j$ for every $(a_j, y_j) \in t.\mathrm{samples}$; otherwise, the node t must be split according to $a_i$. Algorithm 2 details the splitting of node t.
- Compute the ciphertext $\bar a_i$ and update the state from S to $S'$. Append example $(a_i, y_i)$ to t.samples and update $t.\mathrm{cipher}_2 = \mathrm{Enc}(k_{pub}, |t.\mathrm{samples}|)$. We then compute the ciphertext $\bar a_i = (t.\mathrm{cipher}_1, t.\mathrm{cipher}_2)$ and update the state from S to $S'$ through our $BT$.
• $a_i \leftarrow \mathrm{Decrypt}(S', \bar a_i)$: Recover the plaintext $a_i$ from the ciphertext $\bar a_i$ based on the state $S'$ with binary search tree $BT$ and the CKKS secret key $k_{sec}$ as follows:
- Let t be a node pointer initialized to the root of $BT$. We then search a path downward in $BT$ by comparing with $\bar a_{i,1}$: the search continues to the left child (t = t.left) if $\bar a_{i,1} < t.\mathrm{cipher}_1$, and to the right child (t = t.right) if $\bar a_{i,1} > t.\mathrm{cipher}_1$, until $\bar a_{i,1} = t.\mathrm{cipher}_1$.
- Decrypt the ciphertext $\bar a_{i,2}$ with the CKKS secret key $k_{sec}$ to obtain the index $\tau = \mathrm{Dec}(k_{sec}, \bar a_{i,2})$ in t.samples. Then we use τ to obtain the plaintext $a_i = t.\mathrm{samples}[\tau]$.

Section: B Proof of Theorem 1
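Lemma 4. For dataset $A = \{(a_1, y_1), \cdots, (a_n, y_n)\}$, let $I_1, I_2, \cdots, I_s$ be the corresponding partitions as defined by Eqn. (4). There exists a splitting point $a^*$ such that $I_G(A, a^*) = I^*_G(A)$ and
$$a^* \in \bigcup_{i\in[s-1]} \left\{ \frac{\max\{a_k : (a_k, y_k) \in I_i\} + \min\{a_k : (a_k, y_k) \in I_{i+1}\}}{2} \right\},$$
where $I_G(A, a^*)$ and $I^*_G(A)$ are defined by Eqns. (1) and (2), respectively.

Proof. Without loss of generality, we assume that $a_1, a_2, \cdots, a_n$ are distinct. Our goal is to find an optimal splitting point $a^* \in \arg\min_{a\in\mathbb{R}} \{I_G(A, a)\}$, and we begin with some notation used in our proof. For every label $j \in [\tau]$, we denote by
$$\nu_j = |\{i \in [n] : y_i = j\}|$$
the number of instances with label $j$ in dataset $A$. Let $a$ be a splitting point, which splits $A$ into the left and right datasets $A^l_a$ and $A^r_a$, that is,
$$A^l_a = \{(a_i, y_i) : a_i \le a, (a_i, y_i) \in A\}, \qquad A^r_a = \{(a_i, y_i) : a_i > a, (a_i, y_i) \in A\}.$$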
For any given $a \in \mathbb{R}$ and $j \in [\tau]$, we further denote by
$$\nu^l_j = |\{i \in [n] : y_i = j, a_i \le a\}|$$
the number of instances with label $j$ in the subset $A^l_a$. It follows that
$$I_G(A, a) = w_l - w_l \sum_{j\in[\tau]} \frac{(\nu^l_j)^2}{|A^l_a|^2} + w_r - w_r \sum_{j\in[\tau]} \frac{(\nu_j - \nu^l_j)^2}{(n - |A^l_a|)^2},$$
where $w_l = |A^l_a|/n$ and $w_r = 1 - w_l$. In the following, we explore the monotonicity of the function $I_G(A, a)$ when
$$\frac{\max\{a_k : (a_k, y_k) \in I_{i-1}\} + \min\{a_k : (a_k, y_k) \in I_i\}}{2} \le a \le \frac{\max\{a_k : (a_k, y_k) \in I_i\} + \min\{a_k : (a_k, y_k) \in I_{i+1}\}}{2}$$
for $i = 2, 3, \cdots, s-1$.
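For reference, the split criterion $I_G(A, a)$ defined above can be evaluated directly from its definition. The Python sketch below (the function name `gini_split` is our own) computes $w_l\big(1 - \sum_j (\nu^l_j/|A^l_a|)^2\big) + w_r\big(1 - \sum_j ((\nu_j - \nu^l_j)/|A^r_a|)^2\big)$ and assumes that an empty side contributes zero, matching $w_l = 0$ or $w_r = 0$.

```python
from collections import Counter
from typing import List, Tuple

Sample = Tuple[float, int]  # (a_i, y_i)


def gini_split(A: List[Sample], a: float) -> float:
    """I_G(A, a): weighted Gini impurity of the split {a_i <= a} / {a_i > a}."""
    n = len(A)
    total = 0.0
    for part in ([y for x, y in A if x <= a], [y for x, y in A if x > a]):
        if part:                                              # an empty side has weight 0
            w = len(part) / n                                 # w_l or w_r
            sq = sum(c * c for c in Counter(part).values())   # sum_j (label count)^2
            total += w - w * sq / len(part) ** 2
    return total


A = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]
print(gini_split(A, 2.5))   # 0.0 (both sides are pure)
print(gini_split(A, 1.5))   # approximately 1/3
```

Here $a = 2.5$ separates the two labels perfectly, while $a = 1.5$ leaves one label-0 sample on the right, giving $0.75 \cdot (1 - 5/9) = 1/3$.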
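It is easy to observe that $\nu_j$ and $\nu^l_j$ remain constant in this interval except for $\nu^l_{j^*}$, where $j^*$ denotes the label of the instances in $I_i$. Since $|A^l_a|$ grows together with $\nu^l_{j^*}$, we also have $\partial w_l / \partial \nu^l_{j^*} = -\partial w_r / \partial \nu^l_{j^*} = 1/n$. It remains to discuss the variable $\nu^l_{j^*}$, and we have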
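$$\begin{aligned}
n^2 \frac{\partial I_G(A, a)}{\partial \nu^l_{j^*}}
&= \frac{1}{n} \sum_{j\in[\tau]} \frac{(\nu^l_j)^2}{(w_l)^2} - \frac{2\nu^l_{j^*}}{w_l} - \frac{1}{n} \sum_{j\in[\tau]} \frac{(\nu_j - \nu^l_j)^2}{(w_r)^2} + \frac{2(\nu_{j^*} - \nu^l_{j^*})}{w_r} \\
&= \frac{1}{n} \sum_{j\in[\tau]} \left[ \left(\frac{\nu^l_j}{w_l}\right)^2 - \left(\frac{\nu_j - \nu^l_j}{w_r}\right)^2 \right] + 2\left( \frac{\nu_{j^*} - \nu^l_{j^*}}{w_r} - \frac{\nu^l_{j^*}}{w_l} \right) \\
&= \frac{1}{n} \sum_{j\in[\tau], j\ne j^*} \left[ \left(\frac{\nu^l_j}{w_l}\right)^2 - \left(\frac{\nu_j - \nu^l_j}{w_r}\right)^2 \right] + \frac{1}{n} \left[ \left(\frac{\nu^l_{j^*}}{w_l}\right)^2 - \left(\frac{\nu_{j^*} - \nu^l_{j^*}}{w_r}\right)^2 \right] + 2\left( \frac{\nu_{j^*} - \nu^l_{j^*}}{w_r} - \frac{\nu^l_{j^*}}{w_l} \right) \\
&= \frac{1}{n} \sum_{j\in[\tau], j\ne j^*} \left[ \frac{(\nu^l_j)^2}{(w_l)^2} - \frac{(\nu_j - \nu^l_j)^2}{(w_r)^2} \right] + \left( \frac{\nu_{j^*} - \nu^l_{j^*}}{w_r} - \frac{\nu^l_{j^*}}{w_l} \right) \left( 2 - \frac{\nu_{j^*} - \nu^l_{j^*}}{n w_r} - \frac{\nu^l_{j^*}}{n w_l} \right).
\end{aligned}$$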
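Since $\nu^l_j \le |A^l_a| = n w_l$ and $\nu_j - \nu^l_j \le n - |A^l_a| = n w_r$, it is easy to observe that
$$0 \le \frac{\nu_j - \nu^l_j}{w_r} \le n \quad \text{and} \quad 0 \le \frac{\nu^l_j}{w_l} \le n \quad \text{for each } j \in [\tau]. \tag{14}$$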
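It suffices to consider the following two cases:
• We first consider the case
$$\sum_{j\in[\tau], j\ne j^*} \left[ \left(\frac{\nu^l_j}{w_l}\right)^2 - \left(\frac{\nu_j - \nu^l_j}{w_r}\right)^2 \right] \ge 0,$$
which yields
$$\begin{aligned}
0 &\le \sum_{j\in[\tau], j\ne j^*} \left[ \left(\frac{\nu^l_j}{w_l}\right)^2 - \left(\frac{\nu_j - \nu^l_j}{w_r}\right)^2 \right] = \sum_{j\in[\tau], j\ne j^*} \left(\frac{\nu^l_j}{w_l} + \frac{\nu_j - \nu^l_j}{w_r}\right)\left(\frac{\nu^l_j}{w_l} - \frac{\nu_j - \nu^l_j}{w_r}\right) \\
&\le \sum_{j\in[\tau], j\ne j^*} 2n \left(\frac{\nu^l_j}{w_l} - \frac{\nu_j - \nu^l_j}{w_r}\right) = 2n \sum_{j\in[\tau], j\ne j^*} \left(\frac{\nu^l_j}{w_l} - \frac{\nu_j - \nu^l_j}{w_r}\right).
\end{aligned}$$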
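We thus have
$$n - \sum_{j\in[\tau], j\ne j^*} \frac{\nu_j - \nu^l_j}{w_r} \ge n - \sum_{j\in[\tau], j\ne j^*} \frac{\nu^l_j}{w_l}. \tag{15}$$
Since $\sum_{j\in[\tau]} \nu^l_j / w_l = |A^l_a| / w_l = n$ and $\sum_{j\in[\tau]} (\nu_j - \nu^l_j) / w_r = (n - |A^l_a|) / w_r = n$, the two sides of Eqn. (15) equal $(\nu_{j^*} - \nu^l_{j^*})/w_r$ and $\nu^l_{j^*}/w_l$, respectively, so it holds that
$$\frac{\nu_{j^*} - \nu^l_{j^*}}{w_r} \ge \frac{\nu^l_{j^*}}{w_l}. \tag{16}$$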
Combining Eqns. (14)-(16) with the last expression for $n^2 \partial I_G(A, a) / \partial \nu^l_{j^*}$, we have
$$\frac{\partial I_G(A, a)}{\partial \nu^l_{j^*}} \ge 0,$$
which proves that $I_G(A, a)$ is increasing.
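• We now consider the second case
$$\sum_{j\in[\tau], j\ne j^*} \left[ \left(\frac{\nu^l_j}{w_l}\right)^2 - \left(\frac{\nu_j - \nu^l_j}{w_r}\right)^2 \right] < 0,$$
which yields
$$\begin{aligned}
\sum_{j\in[\tau]} \left[ \left(\frac{\nu^l_j}{w_l}\right)^2 - \left(\frac{\nu_j - \nu^l_j}{w_r}\right)^2 \right]
&< \left(\frac{\nu^l_{j^*}}{w_l}\right)^2 - \left(\frac{\nu_{j^*} - \nu^l_{j^*}}{w_r}\right)^2 \\
&= \left(\frac{\nu^l_{j^*}}{w_l} + \frac{\nu_{j^*} - \nu^l_{j^*}}{w_r}\right)\left(\frac{\nu^l_{j^*}}{w_l} - \frac{\nu_{j^*} - \nu^l_{j^*}}{w_r}\right) \\
&< 2n \left(\frac{\nu^l_{j^*}}{w_l} - \frac{\nu_{j^*} - \nu^l_{j^*}}{w_r}\right).
\end{aligned}$$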
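We then have
$$n^2 \frac{\partial I_G(A, a)}{\partial \nu^l_{j^*}} = \frac{1}{n} \sum_{j\in[\tau]} \left[ \left(\frac{\nu^l_j}{w_l}\right)^2 - \left(\frac{\nu_j - \nu^l_j}{w_r}\right)^2 \right] + 2\left(\frac{\nu_{j^*} - \nu^l_{j^*}}{w_r} - \frac{\nu^l_{j^*}}{w_l}\right) < 2\left(\frac{\nu^l_{j^*}}{w_l} - \frac{\nu_{j^*} - \nu^l_{j^*}}{w_r}\right) + 2\left(\frac{\nu_{j^*} - \nu^l_{j^*}}{w_r} - \frac{\nu^l_{j^*}}{w_l}\right) = 0,$$
which proves that $I_G(A, a)$ is decreasing.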
In summary, we have proved the piecewise monotonicity of $I_G(A, a)$ for
$$\frac{\max\{a_k : (a_k, y_k) \in I_{i-1}\} + \min\{a_k : (a_k, y_k) \in I_i\}}{2} \le a \le \frac{\max\{a_k : (a_k, y_k) \in I_i\} + \min\{a_k : (a_k, y_k) \in I_{i+1}\}}{2}, \quad i = 2, 3, \cdots, s-1.$$
Moreover, the monotonicity of $I_G(A, a)$ follows easily from $\nu^l_j = 0$ ($j \ne j^*$) when
$$a \in \left(-\infty, \frac{\max\{a_k : (a_k, y_k) \in I_1\} + \min\{a_k : (a_k, y_k) \in I_2\}}{2}\right],$$
and from $\nu_j - \nu^l_j = 0$ ($j \ne j^*$) when
$$a \in \left[\frac{\max\{a_k : (a_k, y_k) \in I_{s-1}\} + \min\{a_k : (a_k, y_k) \in I_s\}}{2}, +\infty\right).$$
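It is not necessary to consider a splitting point $a^* > \max\{a_k : (a_k, y_k) \in I_s\}$ with $|A^r_a| = 0$, or $a^* < \min\{a_k : (a_k, y_k) \in I_1\}$ with $|A^l_a| = 0$, since neither splits the dataset $A$. Because $I_G(A, a)$ is monotone on each of the intervals above, its minimum over each interval is attained at an endpoint, i.e., at a midpoint between consecutive partitions $I_i$ and $I_{i+1}$. This completes the proof.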
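Lemma 4 can also be checked numerically. The sketch below assumes that the partitions $I_1, \cdots, I_s$ of Eqn. (4) are the maximal runs of equal labels in ascending order of $a_k$ (consistent with the proof, where every instance in $I_i$ has label $j^*$); `label_runs`, `min_over_run_boundaries` and `min_over_all_midpoints` are hypothetical helpers. On random data, it compares the minimum of $I_G(A, a)$ over the run-boundary midpoints with the minimum over all midpoints between consecutive values.

```python
import random
from collections import Counter
from typing import List, Tuple

Sample = Tuple[float, int]


def gini_split(A: List[Sample], a: float) -> float:
    """Weighted Gini impurity I_G(A, a) of the split {x <= a} / {x > a}."""
    n = len(A)
    total = 0.0
    for part in ([y for x, y in A if x <= a], [y for x, y in A if x > a]):
        if part:
            w = len(part) / n
            total += w - w * sum(c * c for c in Counter(part).values()) / len(part) ** 2
    return total


def label_runs(A: List[Sample]) -> List[List[Sample]]:
    """Maximal runs of equal labels in ascending order (assumed form of I_1..I_s)."""
    A = sorted(A)
    runs = [[A[0]]]
    for x, y in A[1:]:
        if y == runs[-1][-1][1]:
            runs[-1].append((x, y))
        else:
            runs.append([(x, y)])
    return runs


def min_over_run_boundaries(A: List[Sample]) -> float:
    runs = label_runs(A)
    mids = [(runs[i][-1][0] + runs[i + 1][0][0]) / 2 for i in range(len(runs) - 1)]
    return min(gini_split(A, a) for a in mids)


def min_over_all_midpoints(A: List[Sample]) -> float:
    xs = sorted(x for x, _ in A)
    mids = [(xs[i] + xs[i + 1]) / 2 for i in range(len(xs) - 1)]
    return min(gini_split(A, a) for a in mids)


rng = random.Random(0)
for _ in range(200):
    A = [(float(x), rng.randrange(3)) for x in rng.sample(range(1000), 15)]
    if len(label_runs(A)) < 2:
        continue                      # a single run admits no boundary split
    assert abs(min_over_run_boundaries(A) - min_over_all_midpoints(A)) < 1e-12
print("boundary minimum matches the exhaustive minimum")
```

Restricting the search to run boundaries can shrink the candidate set considerably when labels come in long runs, which is the property the encryption later exploits.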

Section: Proof of Theorem 1
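According to Lemma 4, we can find an optimal splitting point $a^*$ such that
$$a^* \in \bigcup_{i\in[s-1]} \left\{ \frac{\max\{a_k : (a_k, y_k) \in I_i\} + \min\{a_k : (a_k, y_k) \in I_{i+1}\}}{2} \right\}.$$
It is easy to observe that, for $i \in [s-1]$,
$$I_G\left(A, \frac{\max\{a_k : (a_k, y_k) \in I_i\} + \min\{a_k : (a_k, y_k) \in I_{i+1}\}}{2}\right) = I_G\left(\hat{A}, \frac{c_i + c_{i+1}}{2}\right),$$
where $c_i$ is the identical ciphertext for the elements in $I_i$: since $c_1 < c_2 < \cdots < c_s$, both splitting points send exactly $I_1, \cdots, I_i$ to the left and $I_{i+1}, \cdots, I_s$ to the right. This completes the proof.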
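The equality in this proof can be illustrated in the same way. As an assumption, the sketch below replaces the first-dimension ciphertexts by arbitrary integers $c_1 < c_2 < \cdots < c_s$, one per run $I_i$; it does not reproduce the paper's Eqn. (5), and `label_runs` and `encrypt_runs` are hypothetical helpers. It checks that each plaintext boundary midpoint and the corresponding ciphertext midpoint $(c_i + c_{i+1})/2$ give the same Gini impurity.

```python
import random
from collections import Counter
from typing import List, Tuple

Sample = Tuple[float, int]


def gini_split(A: List[Sample], a: float) -> float:
    """Weighted Gini impurity I_G(A, a) of the split {x <= a} / {x > a}."""
    n = len(A)
    total = 0.0
    for part in ([y for x, y in A if x <= a], [y for x, y in A if x > a]):
        if part:
            w = len(part) / n
            total += w - w * sum(c * c for c in Counter(part).values()) / len(part) ** 2
    return total


def label_runs(A: List[Sample]) -> List[List[Sample]]:
    """Maximal runs of equal labels in ascending order (assumed I_1..I_s)."""
    A = sorted(A)
    runs = [[A[0]]]
    for x, y in A[1:]:
        if y == runs[-1][-1][1]:
            runs[-1].append((x, y))
        else:
            runs.append([(x, y)])
    return runs


def encrypt_runs(A: List[Sample], rng: random.Random):
    """Hypothetical stand-in: every sample of run I_i gets the same value c_i."""
    runs = label_runs(A)
    cs = sorted(rng.sample(range(10**6), len(runs)))   # c_1 < ... < c_s
    A_hat = [(float(c), y) for c, run in zip(cs, runs) for _, y in run]
    return runs, cs, A_hat


rng = random.Random(1)
A = [(float(x), rng.randrange(3)) for x in rng.sample(range(1000), 20)]
runs, cs, A_hat = encrypt_runs(A, rng)
for i in range(len(runs) - 1):
    plain_mid = (runs[i][-1][0] + runs[i + 1][0][0]) / 2
    cipher_mid = (cs[i] + cs[i + 1]) / 2
    assert abs(gini_split(A, plain_mid) - gini_split(A_hat, cipher_mid)) < 1e-12
print(len(runs), "runs; all boundary impurities agree")
```

Only the order of the $c_i$ matters in this sketch, which mirrors why the minimum Gini impurity can be computed on ciphertexts without decryption.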
Based on Theorem 1, our encryption with binary search trees (Algorithm 1) also preserves the minimum Gini impurity over encrypted data, as stated in the following theorem:
Proof. Our binary search tree $BT$ (Algorithm 1) maintains several samples on each node. For each node $t$, we have $t.\mathrm{left}.\mathrm{cipher}_1 < t.\mathrm{cipher}_1 < t.\mathrm{right}.\mathrm{cipher}_1$. Hence, an inorder traversal of the tree $BT$ built by Algorithm 1 yields a monotonically increasing sequence $I_1, I_2, \cdots, I_s$, where each $I_i$ ($i \in [s]$) contains several samples as follows:
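$$\begin{aligned}
I_1 &= \{(a_{\langle 1\rangle}, y_{\langle 1\rangle}), \cdots, (a_{\langle k_1\rangle}, y_{\langle k_1\rangle})\}, \\
I_2 &= \{(a_{\langle k_1+1\rangle}, y_{\langle k_1+1\rangle}), \cdots, (a_{\langle k_2\rangle}, y_{\langle k_2\rangle})\}, \\
&\ \ \vdots \\
I_s &= \{(a_{\langle k_{s-1}+1\rangle}, y_{\langle k_{s-1}+1\rangle}), \cdots, (a_{\langle n\rangle}, y_{\langle n\rangle})\},
\end{aligned} \tag{17}$$
where $a_{\langle i'\rangle} < a_{\langle j'\rangle}$ for $(a_{\langle i'\rangle}, y_{\langle i'\rangle}) \in I_i$, $(a_{\langle j'\rangle}, y_{\langle j'\rangle}) \in I_j$ and $i < j$.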
For each $I_j$ ($j \in [s]$), if all of its samples share one label, i.e., $y_{\langle i\rangle} = y_{\langle i'\rangle}$ for every $(a_{\langle i\rangle}, y_{\langle i\rangle}), (a_{\langle i'\rangle}, y_{\langle i'\rangle}) \in I_j$, then we have
$$I^*_G(A) = I^*_G(\hat{A})$$
from Theorem 1.
On the other hand, if all samples in $I_j$ ($j \in [s]$) share the same value, i.e., $a_{\langle i\rangle} = a_{\langle i'\rangle}$ for every $(a_{\langle i\rangle}, y_{\langle i\rangle}), (a_{\langle i'\rangle}, y_{\langle i'\rangle}) \in I_j$, then this splitting value is preserved without changing the minimum Gini impurity of random forests. Hence, we also have
$$I^*_G(A) = I^*_G(\hat{A}),$$
and this completes the proof.
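The inorder traversal used in this proof can be sketched as follows. `Node` and `inorder_groups` are hypothetical names, and each node is assumed to store its shared ciphertext $t.\mathrm{cipher}_1$ together with its plaintext samples, with $t.\mathrm{left}.\mathrm{cipher}_1 < t.\mathrm{cipher}_1 < t.\mathrm{right}.\mathrm{cipher}_1$.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

Sample = Tuple[float, int]  # plaintext (a, y)


@dataclass
class Node:
    cipher1: float                                       # shared first-dimension ciphertext
    samples: List[Sample] = field(default_factory=list)  # plaintexts mapped to cipher1
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def inorder_groups(t: Optional[Node]) -> Iterator[Tuple[float, List[Sample]]]:
    """Yield (cipher1, samples) for each node in increasing order of cipher1."""
    if t is None:
        return
    yield from inorder_groups(t.left)
    yield t.cipher1, t.samples
    yield from inorder_groups(t.right)


root = Node(50.0, [(5.0, 1), (5.5, 1)],
            left=Node(25.0, [(1.0, 0), (2.0, 0)]),
            right=Node(75.0, [(9.0, 0)]))
groups = list(inorder_groups(root))
print([c for c, _ in groups])                  # [25.0, 50.0, 75.0]
print([a for _, s in groups for a, _ in s])    # [1.0, 2.0, 5.0, 5.5, 9.0]
```

Each yielded node plays the role of one group $I_i$ in Eqn. (17), and the groups come out in increasing order of both ciphertext and plaintext.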

Section: C Proof of Theorem 3
Given two sequences of distinct plaintexts
$$A^0 = \{a^0_1, a^0_2, \cdots, a^0_n\} \quad \text{and} \quad A^1 = \{a^1_1, a^1_2, \cdots, a^1_n\},$$
their corresponding labels are randomly set as follows:
• Sort $A^b = \{a^b_{\langle 1\rangle}, a^b_{\langle 2\rangle}, \cdots, a^b_{\langle n\rangle}\}$ in ascending order, i.e., $a^b_{\langle 1\rangle} < a^b_{\langle 2\rangle} < \cdots < a^b_{\langle n\rangle}$ for $b \in \{0, 1\}$.
• Set the corresponding labels $\{y_{\langle 1\rangle}, y_{\langle 2\rangle}, \cdots, y_{\langle n\rangle}\}$
For the splitting point $a^b_i$, the left and right subsets are
$$A^{l,b}_{a^b_i} = \{(a^b_1, y_1), (a^b_2, y_2), \cdots, (a^b_i, y_i)\} \quad \text{and} \quad A^{r,b}_{a^b_i} = \{(a^b_{i+1}, y_{i+1}), (a^b_{i+2}, y_{i+2}), \cdots, (a^b_n, y_n)\}.$$
Then, the Gini impurity of dataset $A^b$ and splitting point $a^b_i$ is given by
$$I_G(A^b, a^b_i) = \frac{i}{n} - \frac{i}{n} \sum_{j\in[\tau]} \frac{(\nu^{l,b}_j)^2}{i^2} + \frac{n-i}{n} - \frac{n-i}{n} \sum_{j\in[\tau]} \frac{(\nu^{r,b}_j)^2}{(n-i)^2}.$$
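The subsets $A^{l,0}_{a^0_i}$ and $A^{l,1}_{a^1_i}$ share the same labels $\{y_1, y_2, \cdots, y_i\}$, and thus
$$\sum_{j\in[\tau]} (\nu^{l,0}_j)^2 = \sum_{j\in[\tau]} (\nu^{l,1}_j)^2.$$
Similarly, we have $\sum_{j\in[\tau]} (\nu^{r,0}_j)^2 = \sum_{j\in[\tau]} (\nu^{r,1}_j)^2$, so $I_G(A^0, a^0_i) = I_G(A^1, a^1_i)$ for every $i$, and this completes the proof.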
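The key step of this argument, that $I_G(A^b, a^b_i)$ depends only on the label sequence and not on the feature values, can be checked with a short Python sketch; `gini_profile` is a hypothetical helper that attaches the labels in ascending order of the values and evaluates the split at each $a^b_{\langle i\rangle}$.

```python
import random
from collections import Counter
from typing import List, Tuple

Sample = Tuple[float, int]


def gini_split(A: List[Sample], a: float) -> float:
    """Weighted Gini impurity I_G(A, a) of the split {x <= a} / {x > a}."""
    n = len(A)
    total = 0.0
    for part in ([y for x, y in A if x <= a], [y for x, y in A if x > a]):
        if part:
            w = len(part) / n
            total += w - w * sum(c * c for c in Counter(part).values()) / len(part) ** 2
    return total


def gini_profile(values: List[float], labels: List[int]) -> List[float]:
    """I_G(A^b, a^b_<i>) for i = 1..n, labels attached in ascending order of values."""
    pairs = list(zip(sorted(values), labels))
    return [gini_split(pairs, x) for x, _ in pairs]


rng = random.Random(2)
n = 12
labels = [rng.randrange(3) for _ in range(n)]
A0 = [float(v) for v in rng.sample(range(1000), n)]   # distinct plaintexts
A1 = [float(v) for v in rng.sample(range(1000), n)]
g0, g1 = gini_profile(A0, labels), gini_profile(A1, labels)
assert all(abs(u - v) < 1e-12 for u, v in zip(g0, g1))
print("identical Gini profiles for A^0 and A^1")
```

Since both sequences yield the same profile, the Gini impurity alone gives the adversary no way to tell $A^0$ from $A^1$.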
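We can show that an adversary cannot distinguish the ciphertexts of $\{(a^0_{\langle 1\rangle}, y_{\langle 1\rangle}), \cdots, (a^0_{\langle n\rangle}, y_{\langle n\rangle})\}$ from those of $\{(a^1_{\langle 1\rangle}, y_{\langle 1\rangle}), \cdots, (a^1_{\langle n\rangle}, y_{\langle n\rangle})\}$ in a probabilistic sense, i.e.,
$$\Pr\left[\hat{a}^0_{\langle 1\rangle}, \cdots, \hat{a}^0_{\langle n\rangle} \,\middle|\, (a^0_{\langle 1\rangle}, y_{\langle 1\rangle}), \cdots, (a^0_{\langle n\rangle}, y_{\langle n\rangle})\right] = \Pr\left[\hat{a}^1_{\langle 1\rangle}, \cdots, \hat{a}^1_{\langle n\rangle} \,\middle|\, (a^1_{\langle 1\rangle}, y_{\langle 1\rangle}), \cdots, (a^1_{\langle n\rangle}, y_{\langle n\rangle})\right]. \tag{18}$$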
max_features: ⌊√d⌋ | ⌊√d⌋ | ⌊√d⌋ | ⌊√d⌋ | ⌊√d⌋ | ⌊√d⌋ | ⌊√d⌋ | ⌊√d⌋
differential privacy level ϵ: - | - | - | - | - | - | 1 | -
anonymization parameter k: - | - | - | - | - | 10 | - | -
multi-party size p: 2 | 2 | 2 | 2 | 2 | - | - | -
max_bin: - | - | - | 16 | - | - | - | -
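and
$$\begin{aligned}
&\Pr\left[\hat{a}^1_{\langle 1\rangle}, \cdots, \hat{a}^1_{\langle i\rangle}, \hat{a}^1_{\langle i+1\rangle} \,\middle|\, (a^1_{\langle 1\rangle}, y_{\langle 1\rangle}), \cdots, (a^1_{\langle i\rangle}, y_{\langle i\rangle}), (a^1_{\langle i+1\rangle}, y_{\langle i+1\rangle})\right] \\
&= \Pr\left[\hat{a}^1_{\langle 1\rangle}, \cdots, \hat{a}^1_{\langle i\rangle} \,\middle|\, (a^1_{\langle 1\rangle}, y_{\langle 1\rangle}), \cdots, (a^1_{\langle i\rangle}, y_{\langle i\rangle})\right] \\
&\quad \times \Pr\left[l^1.\mathrm{cipher}_1 \,\middle|\, (a^1_{\langle 1\rangle}, y_{\langle 1\rangle}), \cdots, (a^1_{\langle i\rangle}, y_{\langle i\rangle}), (a^1_{\langle i+1\rangle}, y_{\langle i+1\rangle})\right] \\
&\quad \times \Pr\left[r^1.\mathrm{cipher}_1 \,\middle|\, (a^1_{\langle 1\rangle}, y_{\langle 1\rangle}), \cdots, (a^1_{\langle i\rangle}, y_{\langle i\rangle}), (a^1_{\langle i+1\rangle}, y_{\langle i+1\rangle})\right].
\end{aligned}$$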
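It follows that
$$\begin{aligned}
&\Pr\left[\hat{a}^0_{\langle 1\rangle}, \cdots, \hat{a}^0_{\langle i\rangle}, \hat{a}^0_{\langle i+1\rangle} \,\middle|\, (a^0_{\langle 1\rangle}, y_{\langle 1\rangle}), \cdots, (a^0_{\langle i\rangle}, y_{\langle i\rangle}), (a^0_{\langle i+1\rangle}, y_{\langle i+1\rangle})\right] \\
&= \Pr\left[\hat{a}^1_{\langle 1\rangle}, \cdots, \hat{a}^1_{\langle i\rangle}, \hat{a}^1_{\langle i+1\rangle} \,\middle|\, (a^1_{\langle 1\rangle}, y_{\langle 1\rangle}), \cdots, (a^1_{\langle i\rangle}, y_{\langle i\rangle}), (a^1_{\langle i+1\rangle}, y_{\langle i+1\rangle})\right].
\end{aligned}$$
This completes the proof.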

Section: D Experimental Details

Section: Experimental Settings
We now present details of the methods compared in this work.
• Original RFs: The original plaintext random forests [1] implemented by sklearn;
• PPD-ERTs: The extremely randomized trees algorithm for learning from distributed horizontally partitioned data [64];
• PivotRFs: A private and efficient solution for tree-based models in a vertical federated learning setting [62], based on a hybrid of threshold partially homomorphic encryption and secure multiparty computation techniques;
• MulPRFs: The original random forests [1] with the secure multiparty computation library MP-SPDZ [94], based on the sh2 protocol to support semi-honest two-party computation;
• AnonyRFs: Random forests based on the anonymization library Mondrian, a top-down greedy data anonymization algorithm for relational datasets [59];
• DiffPrivRFs: Random forests based on the differential privacy library Diffprivlib [93].
Tables 4 and 5 summarize the hyperparameter settings in our experiments. Except for the parameters 'n_estimators' and 'α' in leaf splitting, the other parameters are set according to their respective references. We set the security parameter λ > 6.4 according to the privacy-preserving requirements in [89].
wdbc | 1 | ×10^-3 | ×10^-3 | ×10^-3 | ×2 | ×10 | ×25 | ×400
cancer | 1 | ×10^-3 | ×10^-3 | ×10^-3 | ×1.5 | ×10 | ×20 | ×300
breast | 1 | ×10^-3 | ×10^-3 | ×10^-3 | ×2 | ×13 | ×30 | ×10^3
german | 1 | ×10^-3 | ×10^-3 | ×10^-3 | ×2 | ×18 | ×40 | ×3000
diabetes | 1 | ×10^-3 | ×10^-3 | ×10^-3 | ×2 | ×15 | ×25 | ×850
adver | 1 | ×10^-3 | ×10^-3 | ×10^-3 | NA | ×475 | NA | NA
bibtex | 1 | ×10^-3 | ×10^-3 | ×10^-3 | NA | ×328 | NA | NA
phpB0 | 1 | ×10^-3 | ×10^-3 | ×10^-3 | NA | NA | NA | NA
pendigits | 1 | ×10^-4 | ×10^-4 | ×10^-4 | ×2 | ×25 | NA | NA
phish | 1 | ×10^-3 | ×10^-3 | ×10^-3 | ×1 | ×139 | ×848 | NA
ailerons | 1 | ×10^-4 | ×10^-4 | ×10^-4 | ×1 | ×31 | ×40 | NA
house | 1 | ×10^-4 | ×10^-4 | ×10^-4 | ×1 | ×31 | ×38 | NA
a9a | 1 | ×10^-3 | ×10^-3 | ×10^-3 | ×1 | ×453 | ×762 | NA
amazon | 1 | ×10^-4 | ×10^-4 | ×10^-4 | ×1 | ×51 | ×31 | NA
bank | 1 | ×10^-4 | ×10^-4 | ×10^-4 | ×1.5 | ×149 | ×220 | NA
adult | 1 | ×10^-3 | ×10^-3 | ×10^-3 | ×2 | ×211 | ×276 | NA
mnist | 1 | ×10^-4 | ×10^-4 | ×10^-4 | NA | NA | NA | NA
miniboone | 1 | ×10^-4 | ×10^-4 | ×10^-4 | ×2 | ×35 | NA | NA
runwalk | 1 | ×10^-4 | ×10^-4 | ×10^-4 | ×2 | ×35 | NA | NA
covtype | 1 | ×10^-3 | ×10^-3 | ×10^-3 | ×1 | NA | NA | NA

Section: Running Time
We compare the prediction time (in seconds) of different methods in Figure 4. As we can see, our encrypted random forests take running time comparable to that of the original random forests, AnonyRFs and DiffPrivRFs, since our Gini-impurity-preserving encryption only requires $O(h)$ time without other additional operations, where $h$ denotes the height of the binary search tree $BT$.
Furthermore, our encrypted random forests are considerably more efficient than MulPRFs, PPD-ERTs, PivotRFs and HEldpRFs, whose training time reaches $10^6$ seconds (about 11.6 days). This is because MulPRFs, PivotRFs and PPD-ERTs incur expensive communication costs for multi-party computation, while HEldpRFs incurs heavy computation costs from the HE scheme. We also present the orders-of-magnitude improvements in training and prediction time in Table 6 and Table 7, respectively.

Section: Security
For each of fourteen datasets, we analyze the security of a randomly selected attribute; the other attributes show a similar trend. We compare our Gini-impurity-preserving scheme with four other privacy-protection methods: differential privacy [93], anonymization [59], the order-preserving scheme [96] and the HE scheme [42]. The results are depicted in Figure 5.
Inspired by [97], we take the bitwise leakage matrix as our metric. We first scale and discretize the feature space into 7-bit integers within $[0, 2^7 - 1]$, and then sample 200 representative samples from each dataset to evaluate the security of the feature space. The primary objective in this experiment is to protect as many bits of the plaintexts as possible. This quantitative assessment is visualized as a color map: the x-axis represents the individual bits (1 through 7), while the y-axis indicates the rank order of the 200 sampled data points.
The color gradient from white to red represents the degree of security, with white corresponding to minimal security and red to maximal security. The security degree is normalized to $[0, 1]$ for consistency. For instance, a security degree of 0 (white) indicates no security, while a security degree of 1 (red) indicates the highest level of security.
As expected, the HE scheme provides the highest security, yet at a heavy computational cost. For example, datasets with more than 3000 samples yielded no results under the HE scheme, even with an extended runtime of $10^6$ seconds. We also observe that our scheme achieves higher security than the other three schemes, namely differential privacy [93], anonymization [59] and the order-preserving scheme [96], since those schemes rely merely on data perturbation, compression, or preservation of order information. In comparison, our scheme strikes a good balance between security and computational cost.
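The discretization step of this experiment might look as follows. This is a sketch under stated assumptions, not the experimental code: features are min-max scaled and rounded to $m = 7$-bit integers in $[0, 2^7 - 1]$, bit 1 is taken to be the most significant bit, and `discretize` and `bits` are hypothetical names.

```python
from typing import List


def discretize(values: List[float], m: int = 7) -> List[int]:
    """Min-max scale values and round them to integers in [0, 2**m - 1]."""
    lo, hi = min(values), max(values)
    top = 2 ** m - 1
    if hi == lo:
        return [0] * len(values)          # constant feature: map everything to 0
    # Python's round() uses round-half-to-even
    return [round((v - lo) / (hi - lo) * top) for v in values]


def bits(x: int, m: int = 7) -> List[int]:
    """Bits 1..m of x, with bit 1 the most significant."""
    return [(x >> (m - j)) & 1 for j in range(1, m + 1)]


codes = discretize([0.0, 0.5, 1.0])
print(codes)            # [0, 64, 127]
print(bits(codes[1]))   # [1, 0, 0, 0, 0, 0, 0]
```

Each row of the color map then corresponds to one discretized sample, and each column to one entry of `bits`.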

Section: E Proof of Bitwise Leakage
In this section, we present a comprehensive evaluation of the security properties of our Gini-impurity-preserving method, fully homomorphic encryption, the anonymization technique, and differential privacy. The security analysis is conducted in the feature space using the bitwise leakage matrix proposed by [97].
We focus on a discrete and finite feature space of fixed size, as in [97]. The feature space is defined as $X = [0, 2^m - 1]$, i.e., each feature has $m$ bits and ranges from 0 to $2^m - 1$. Let $D$ be the true distribution over the feature space, and let the dataset $S = \{a_1, \ldots, a_n\}$ be sampled independently and identically from $D$.
The adversary $\mathcal{A}$ possesses two types of knowledge for recovering plaintexts:
• Auxiliary knowledge about a distribution $D'$ over the feature space $X$ [98], which provides additional information to the adversary.
• Ciphertexts $\hat{S}$ corresponding to $S$, which represent a snapshot of the encrypted data store, as described in Fuller et al. [99].
We re-sort dataset $S$ in non-decreasing order, i.e., $S = \{a_{\langle 1\rangle}, a_{\langle 2\rangle}, \cdots, a_{\langle n\rangle}\}$ with $a_{\langle 1\rangle} \le a_{\langle 2\rangle} \le \cdots \le a_{\langle n\rangle}$. Let $S_{\langle i\rangle}$ be the $i$-th sample in $S$, and let $\hat{S}_{\langle i\rangle}$ denote its ciphertext. The entry $L(i, j)$ of the bitwise leakage matrix is the probability that an adversary correctly guesses the $j$-th bit of the plaintext $S_{\langle i\rangle}$. It measures the information security of the ciphertext $\hat{S}_{\langle i\rangle}$, in the sense that a lower value of $L(i, j)$ signifies a higher degree of security; the bitwise information security of $\hat{S}_{\langle i\rangle}$ can thus be quantified as $1 - L(i, j)$, which gives a quantitative assessment of the security of an encryption scheme. Specifically, we investigate the correlation among the elements of $L(i, j)$, plaintexts, ciphertexts and secret keys, and explore the impact of different encryption parameters on the structure and behavior of $L(i, j)$. Our analysis reveals that the leakage pattern of $L(i, j)$ depends heavily on the specific encryption scheme, so the encryption scheme should be designed and selected carefully to minimize the risk of information leakage.
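Since $L(i, j)$ is a probability, it can also be estimated empirically when an attack is simulated repeatedly. The following sketch is an illustrative Monte Carlo estimator, not the analytical derivation in the theorems below: given the true $m$-bit plaintexts and several rounds of adversarial guesses, `leakage_matrix` (a hypothetical name) returns the fraction of rounds in which bit $j$ of $S_{\langle i\rangle}$ was guessed correctly, and `security_matrix` returns $1 - L(i, j)$; bit 1 is assumed to be the most significant bit.

```python
from typing import List


def leakage_matrix(plain: List[int], guesses: List[List[int]], m: int = 7) -> List[List[float]]:
    """Estimate L(i, j) as the fraction of rounds whose guess matches bit j of plain[i].

    plain:   true m-bit plaintexts S_<1>, ..., S_<n>
    guesses: one list of n guessed plaintexts per simulated attack round
    Column j - 1 holds bit j, counted from the most significant bit.
    """
    n = len(plain)
    hits = [[0] * m for _ in range(n)]
    for round_guess in guesses:
        for i in range(n):
            for j in range(m):
                shift = m - 1 - j
                guessed_bit = (round_guess[i] >> shift) & 1
                true_bit = (plain[i] >> shift) & 1
                if guessed_bit == true_bit:
                    hits[i][j] += 1
    rounds = len(guesses)
    return [[h / rounds for h in row] for row in hits]


def security_matrix(L: List[List[float]]) -> List[List[float]]:
    """Bitwise security 1 - L(i, j), in [0, 1]."""
    return [[1.0 - v for v in row] for row in L]


# One plaintext (5 = 0b0000101) attacked twice, guessing 5 and then 4 (= 0b0000100):
L = leakage_matrix([5], [[5], [4]])
print(L)                   # [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.5]]
print(security_matrix(L))  # [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5]]
```

A scheme that leaks nothing about a bit drives the corresponding entry toward the guessing baseline of 0.5 for uniformly distributed bits, while a fully exposed bit gives $L(i, j) = 1$.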
We now present the analysis of the bitwise leakage matrix $L(i, j)$ for our encryption method.
Theorem 7. For our Gini-impurity-preserving encryption and plaintexts $S$, we have
$$L(i, j) = \frac{\sum_{q\in[i,\, n-k+i]} \mathbb{I}\left(S_{\langle q\rangle}[j] = S_{\langle i\rangle}[j]\right)}{n-k+1} \times \sum_{s\in S^j_{b_{\langle i\rangle}[j]}} \Pr_D\left(S_{\langle i\rangle} = s\right) + \text{small constant}.$$
Proof. Our Gini-impurity-preserving encryption maps the multiple plaintexts in $I_{i'}$ ($i' \in [k]$) to the identical first-dimension ciphertext $c_{i'}$, as shown in Eqn. (5). Hence, the $i'$-th ciphertext $c_{i'}$ corresponds to multiple plaintexts, and the adversary will guess the true plaintext $S_{\langle i\rangle}$ of ciphertext Let $b_{\langle i\rangle}[j]$ be the adversary's guess for $S_{\langle i\rangle}$ the adversary guesses $b_{\langle i\rangle}[j]$ as the value corresponding to the maximum probability of the $j$-th bit in group $K_q$ as follows: This completes the proof.

Section: Acknowledgements
The authors want to thank the reviewers for their helpful comments and suggestions. This research was supported by National Key R&D Program of China (2021ZD0112802), NSFC (61921006, 62376119), CAAI-Huawei MindSpore Open Fund, and Fundamental Research Funds for the Central Universities (2023300246). W. Gao is the corresponding author of this paper.

Section: 
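We prove Eqn. (18) by induction on $n$. For $n = 1$, we have $\hat{a}^0_{\langle 1\rangle} = c^0_{\max}/2$ and $\hat{a}^1_{\langle 1\rangle} = c^1_{\max}/2$ with $c^0_{\max} = c^1_{\max} = 2^{\lambda \log_2 n}$, according to the initialization in Algorithm 1. It follows that
$$\Pr\left[\hat{a}^0_{\langle 1\rangle} \,\middle|\, (a^0_{\langle 1\rangle}, y_{\langle 1\rangle})\right] = \Pr\left[\hat{a}^0_{\langle 1\rangle}\right] = \Pr\left[\hat{a}^1_{\langle 1\rangle}\right] = \Pr\left[\hat{a}^1_{\langle 1\rangle} \,\middle|\, (a^1_{\langle 1\rangle}, y_{\langle 1\rangle})\right].$$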
We assume that Eqn. (18) holds for $n = i$ ($i \ge 1$), that is,
Let us consider the case $n = i + 1$, in which we add the sample $(a^b_{\langle i+1\rangle}, y_{\langle i+1\rangle})$ to the binary search tree $BT^b$ (Algorithm 1). It suffices to consider the following two cases:
• If no node needs to be split for the sample $(a^b_{\langle i+1\rangle}, y_{\langle i+1\rangle})$ in Algorithm 1, then we have
⟨i+1⟩ for $b = 0$ and $b = 1$, along with the same labels $\{y_1, \cdots, y_{i+1}\}$. Hence, we obtain the same ciphertext for $a^0_{\langle i+1\rangle}$ and $a^1_{\langle i+1\rangle}$, i.e., $t^0.\mathrm{cipher}_1 = t^1.\mathrm{cipher}_1$. It follows that
. By the induction hypothesis in Eqn. (19), we have
⟨i⟩ , y ⟨i⟩ ), (a 1 ⟨i+1⟩ , y ⟨i+1⟩ ) .
• If the node must be split for $(a^b_{\langle i+1\rangle}, y_{\langle i+1\rangle})$ in Algorithm 1, we let $t^0$ and $t^1$ be the corresponding splitting nodes. We first initialize the empty nodes $l^b$ and $r^b$, and update the ciphertexts $l^b.\mathrm{cipher}_1$ and $r^b.\mathrm{cipher}_1$ by Eqns. (8)-(11), respectively. Notice that the random number $\xi$ in Eqns. (8)-(11) is sampled from $\mathcal{N}(0, 1)$, and thus $l^0.\mathrm{cipher}_1$ and $l^1.\mathrm{cipher}_1$ are sampled from the same distribution. We have
⟨i⟩ , y ⟨i⟩ ), (a 1 ⟨i+1⟩ , y ⟨i+1⟩ ) . Similarly, $r^0.\mathrm{cipher}_1$ and $r^1.\mathrm{cipher}_1$ are sampled from the same distribution, and we have
⟨i⟩ , y ⟨i⟩ ), (a 1 ⟨i+1⟩ , y ⟨i+1⟩ ) . For the $(i+1)$-th iteration in Algorithm 1, we have
We now provide a similar analysis of the bitwise leakage matrix $L$ for $\epsilon$-local differential privacy. Theorem 9. For $\epsilon$-local differential privacy, we have
where $S$ and $S'$ denote the plaintexts and ciphertexts, respectively, and
Proof. We consider $\epsilon$-local differential privacy, which adds noise to each individual value. If the adversary attempts to infer the original plaintext $S_{\langle i\rangle}$, it relies on the $\epsilon$-differentially private perturbed data $S'_{\langle i\rangle}$ and the auxiliary knowledge distribution $D'$. The adversary guesses the $j$-th bit $S_{\langle i\rangle}[j]$ of the $i$-th plaintext as follows:
for $b_{\langle i\rangle}[j] = 0$ and $S'_{\langle i\rangle}[j] = 0$; randomly select from $\{0, 1\}$ otherwise.

It follows that
This completes the proof.
To better understand the security of the k-anonymous algorithm, we analyze the bitwise leakage matrix $L$, which represents the amount of information leakage that occurs when the original data $X$ is compressed into $t$ partitions $K_1, K_2, \cdots, K_t$ by the k-anonymous algorithm as follows:
The bitwise leakage matrix $L$ quantifies how much information about an individual can be inferred from the corresponding partitions. By analyzing this matrix, we can determine the level of privacy maintained by the k-anonymous algorithm and identify potential vulnerabilities that an adversary could exploit. We now give the analysis of the bitwise leakage matrix $L$ for the k-anonymous algorithm.
Theorem 10. For the k-anonymous algorithm and the plaintexts $S$, we have
where $S_{\langle i\rangle}[j]$ denotes the $j$-th bit of $S_{\langle i\rangle}$ with $S_{\langle i\rangle} \in K_q$ ($q \in [t]$) and $b_{\langle i\rangle}$
Proof. k-anonymity is a privacy-preserving technique that aims to protect the identities of individuals in a dataset. It groups together individuals with similar attributes and pools their data into a larger group, making it difficult for an adversary to identify any specific individual in the group. k-anonymity ensures that each group contains at least $k$ individuals with the same attribute values, which further enhances data security.
When the original data $S_{\langle i\rangle}$ is pooled in the group $K_q$ ($q \in [t]$), the adversary attempts to guess the $j$-th bit of the plaintext $S_{\langle i\rangle}$ using the auxiliary knowledge distribution $D'$ and $K_q$. To achieve this,


References:
[b0] L Breiman (2001). Random forests. Machine Learning
[b1] G Biau; E Scornet (2016). A random forest guided tour. Test
[b2] L Mentch; S Zhou (2020). Randomization as regularization: A degrees of freedom explanation for random forest success. Journal of Machine Learning Research
[b3] M Fernández-Delgado; E Cernadas; S Barro; D Amorim (2014). Do we need hundreds of classifiers to solve real world classification problems? Journal of Machine Learning Research
[b4] D Cutler; T Edwards; K Beard; A Cutler; K Hess; J Gibson; J Lawler (2007). Random forests for classification in ecology. Ecology
[b5] Y Qi (2012). Random forest for bioinformatics. Springer
[b6] J Shotton; A Fitzgibbon; M Cook; T Sharp; M Finocchio; R Moore; A Kipman; A Blake (2013). Real-time human pose recognition in parts from single depth images. Communications of the ACM
[b7] M Belgiu; L Dragut (2016). Random forest in remote sensing: A review of applications and future directions. ISPRS Journal of Photogrammetry and Remote Sensing
[b8] A Criminisi; J Shotton (2013). Decision forests for computer vision and medical image analysis. Springer Science & Business Media
[b9] S Basu; K Kumbier; J Brown; B Yu (2018). Iterative random forests to discover predictive and stable high-order interactions. Proceedings of the National Academy of Sciences
[b10] M Denil; D Matheson; N de Freitas (2013). Consistency of online random forests. 
[b11] G Louppe; L Wehenkel; A Sutera; P Geurts (2013). Understanding variable importances in forests of randomized trees. Advances in Neural Information Processing Systems
[b12] P Geurts; D Ernst; L Wehenkel (2006). Extremely randomized trees. Machine Learning
[b13] B Lakshminarayanan; D Roy; Y Teh (2014). Mondrian forests: Efficient online random forests. Advances in Neural Information Processing Systems
[b14] X Li; Y Wang; S Basu; K Kumbier; B Yu (2019). A debiased mdi feature importance measure for random forests. Advances in Neural Information Processing Systems
[b15] Y Lin; Y Jeon (2006). Random forests and adaptive nearest neighbors. Journal of the American Statistical Association
[b16] B Menze; B Kelm; D Splitthoff; U Koethe; F Hamprecht (2011). On oblique random forests. 
[b17] J Rodriguez; L Kuncheva; C Alonso (2006). Rotation forest: A new classifier ensemble method. IEEE Transactions on Pattern Analysis and Machine Intelligence
[b18] S Wager; T Hastie; B Efron (2014). Confidence intervals for random forests: The jackknife and the infinitesimal jackknife. Journal of Machine Learning Research
[b19] Z.-H Zhou; J Feng (2019). Deep forest. National Science Review
[b20] W Gao; F Xu; Z.-H Zhou (2022). Towards convergence rate analysis of random forests for classification. Artificial Intelligence
[b21] J.-Q Guo; M.-Z Teng; W Gao; Z.-H Zhou (2022). Fast provably robust decision trees and boosting. 
[b22] G Biau; L Devroye; G Lugosi (2008). Consistency of random forests and other averaging classifiers. Journal of Machine Learning Research
[b23] G Biau (2012). Analysis of a random forests model. Journal of Machine Learning Research
[b24] E Scornet; G Biau; J Vert (2015). Consistency of random forests. Annals of Statistics
[b25] J Mourtada; S Gaïffas; E Scornet (2017). Universal consistency and minimax rates for online mondrian forests. Advances in Neural Information Processing Systems
[b26] C Tang; D Garreau; U von Luxburg (2018). When do random forests fail. 
[b27] Z.-H Zhou (2012). Ensemble Methods: Foundations and Algorithms. CRC Press
[b28] C Dwork (2006). Differential privacy. 
[b29] A Patil; S Singh (2014). Differential private random forest. 
[b30] X Li; B Qin; Y.-Y Luo; D Zheng (2022). A differential privacy budget allocation algorithm based on out-of-bag estimation in random forest. Mathematics
[b31] A Blum; C Dwork; F Mcsherry; K Nissim (2005). Practical privacy: The sulq framework. 
[b32] A Friedman; A Schuster (2010). Data mining with differential privacy. 
[b33] S Fletcher; M Islam (2019). Decision tree classification with differential privacy: A survey. ACM Computing Surveys
[b34] S Samet; A Miri (2008). Privacy preserving ID3 using gini index over horizontally partitioned data. 
[b35] J Vaidya; C Clifton; M Kantarcioglu; A Patterson (2008). Privacy-preserving decision trees over vertically partitioned data. ACM Transactions on Knowledge Discovery from Data
[b36] S de Hoogh; B Schoenmakers; P Chen (2014). Practical secure decision tree learning in a teletreatment application. 
[b37] M Joye; F Salehi (2018). Private yet efficient decision tree evaluation. 
[b38] Y Li; Z Jiang; L Yao; X Wang; S Yiu; Z Huang (2019). Outsourced privacy-preserving C4.5 decision tree algorithm over horizontally and vertically partitioned dataset among multiple parties. Cluster Computing
[b39] C Gentry (2009). Fully homomorphic encryption using ideal lattices. 
[b40] L Ducas; D Micciancio (2015). Fhew: Bootstrapping homomorphic encryption in less than a second. 
[b41] J Cheon; A Kim; M Kim; Y Song (2017). Homomorphic encryption for arithmetic of approximate numbers. 
[b42] I Chillotti; N Gama; M Georgieva; M Izabachène (2020). Tfhe: Fast fully homomorphic encryption over the torus. Journal of Cryptology
[b43] R Gilad-Bachrach; N Dowlin; K Laine; K Lauter; M Naehrig; J Wernsing (2016). Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy. 
[b44] Q Lou; W Lu; C Hong; L Jiang (2020). Falcon: Fast spectral inference on encrypted data. Advances in Neural Information Processing Systems
[b45] Z Ghodsi; N Jha; B Reagen; S Garg (2021). Circa: Stochastic ReLUs for private deep learning. Advances in Neural Information Processing Systems
[b46] X Li; R Dowsley; M De Cock (2021). Privacy-preserving feature selection with secure multiparty computation. 
[b47] A Aloufi; P Hu; H Wong; S Chow (2019). Blindfolded evaluation of random forests with multi-key homomorphic encryption. IEEE Transactions on Dependable and Secure Computing
[b48] J Li; X Kuang; S Lin; X Ma; Y Tang (2020). Privacy preservation for machine learning training and classification based on homomorphic encryption schemes. Information Sciences
[b49] L Pulido-Gaytan; A Tchernykh; J Cortés-Mendoza; M Babenko; G Radchenko (2021). A survey on privacy-preserving machine learning with fully homomorphic encryption. 
[b50] A Akavia; M Leibovich; Y Resheff; R Ron; M Shahar; M Vald (2022). Privacy-preserving decision trees training and prediction. ACM Transactions on Privacy and Security
[b51] K Cong; D Das; J Park; H Pereira (2022). Sortinghat: Efficient private decision tree evaluation via homomorphic encryption and transciphering. 
[b52] J Brickell; D Porter; V Shmatikov; E Witchel (2007). Privacy-preserving remote diagnostics. 
[b53] M Barni; P Failla; V Kolesnikov; R Lazzeretti; A Sadeghi; T Schneider (2009). Secure evaluation of private linear branching programs with medical applications. 
[b54] R Tai; J Ma; Y Zhao; S Chow (2017). Privacy-preserving decision trees evaluation via linear functions. 
[b55] A Tueno; F Kerschbaum; S Katzenbeisser (2019). Private evaluation of decision trees using sublinear cost. 
[b56] Á Kiss; M Naderpour; J Liu; N Asokan; T Schneider (2019). Sok: Modular and efficient private decision tree evaluation. 
[b57] K Sarpatwar; N Ratha; K Nandakumar; K Shanmugam; J Rayfield; S Pankanti; R Vaculín (2020). Privacy enhanced decision tree inference. 
[b58] K LeFevre; D DeWitt; R Ramakrishnan (2006). Mondrian multidimensional k-anonymity. 
[b59] L Sweeney (2002). k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-based Systems
[b60] W Du; Z Zhan (2002). Building decision tree classifier on private data. 
[b61] Y Wu; S Cai; X Xiao; G Chen; B Ooi (2008). Privacy preserving vertical federated learning for tree-based models. 
[b62] K Hamada; D Ikarashi; R Kikuchi; K Chida (2021). Efficient decision tree training with new data structure for secure multi-party computation. 
[b63] A Aminifar; F Rabbi; K Pun; Y Lamo (2021). Privacy preserving distributed extremely randomized trees. 
[b64] M De Cock; R Dowsley; C Horst; R Katti; A Nascimento; W Poon; S Truex (2017). Efficient and private scoring of decision trees, support vector machines and logistic regression models based on pre-computation. IEEE Transactions on Dependable and Secure Computing
[b65] A Tueno; Y Boev; F Kerschbaum (2020). Non-interactive private decision tree evaluation. 
[b66] T ElGamal (1985). A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Transactions on Information Theory
[b67] P Paillier (1999). Public-key cryptosystems based on composite degree residuosity classes. 
[b68] K Han; S Hong; J Cheon; D Park (2019). Logistic regression on homomorphic encrypted data at scale. 
[b69] P Fenner; E Pyzer-Knapp (2020). Privacy-preserving Gaussian process regression -A modular approach to the application of homomorphic encryption. 
[b70] A Sanyal; M Kusner; A Gascón; V Kanade (2018). TAPAS: Tricks to accelerate (encrypted) prediction as a service. 
[b71] Q Lou; B Feng; G Fox; L Jiang (2020). Glyph: Fast and accurately training deep neural networks on encrypted data. 
[b72] Q Lou; L Jiang (2021). HEMET: A homomorphic-encryption-friendly privacy-preserving mobile neural network architecture. 
[b73] S Pentyala; R Dowsley; M De Cock (2021). Privacy-preserving video classification with convolutional neural networks. 
[b74] E Lee; J Lee; J Lee; Y Kim; Y Kim; J No; W Choi (2022). Low-complexity deep convolutional neural networks on fully homomorphic encryption using multiplexed parallel convolutions. 
[b75] J Wang; Q Tang; A Arriaga; P Ryan (2019). Novel collaborative filtering recommender friendly to privacy protection. 
[b76] A C Yao (1982). Protocols for secure computations. 
[b77] S Sayyad (2020). Privacy preserving deep learning using secure multiparty computation. 
[b78] R Shokri; V Shmatikov (2015). Privacy preserving deep learning. 
[b79] K Bonawitz; V Ivanov; B Kreuter; A Marcedone; H B Mcmahan; S Patel; D Ramage; A Segal; K Seth (). Practical secure aggregation for privacy-preserving machine learning. 
[b80] J Vaidya; C Clifton (2003). Privacy-preserving k-means clustering over vertically partitioned data. 
[b81] X Yi; Y C Zhang (2013). Equally contributory privacy preserving k-means clustering over vertically partitioned data. Information Systems
[b82] Y Fan; J Bai; X Lei; W Lin; Q Hu; G Wu; J Guo; G Tan (2021). PPMCK: Privacy-preserving multi-party computing for k-means clustering. Parallel and Distributed Computing
[b83] C Dwork (2008). Differential privacy: A survey of results. 
[b84] M Abadi; A Chu; I Goodfellow; H B Mcmahan; I Mironov; K Talwar; L Zhang (2016). Deep learning with differential privacy. 
[b85] T Ha; T K Dang; T T Dang; T A Truong; M T Nguyen (2019). Differential privacy in deep learning: An overview. 
[b86] B Ghazi; N Golowich; R Kumar; P Manurangsi; C Y Zhang (2021). Deep learning with label differential privacy. Advances in Neural Information Processing Systems
[b87] R Popa; F Li; N Zeldovich (2013). An ideal-security protocol for order-preserving encoding. 
[b88] F Kerschbaum (2015). Frequency hiding order preserving encryption. 
[b89] M Islam; M Kuzu; M Kantarcioglu (2012). Access pattern disclosure on searchable encryption: Ramification, attack and mitigation. 
[b90] S Goldwasser; S Micali (1982). Probabilistic encryption & how to play mental poker keeping secret all partial information. 
[b91] I Chillotti; N Gama; M Georgieva; M Izabachene (2016). Faster fully homomorphic encryption: Bootstrapping in less than 0.1 seconds. 
[b92] N Holohan; S Braghin; P Mac Aonghusa; K Levacher (2019). Diffprivlib: The IBM differential privacy library. 
[b93] M Keller (2020). MP-SPDZ: A versatile framework for multi-party computation. 
[b94] P Probst; M Wright; A Boulesteix (2019). Hyperparameters and tuning strategies for random forest. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
[b95] R Agrawal; J Kiernan; R Srikant; Y Xu (2004). Order preserving encryption for numeric data. 
[b96] C Roy; B Ding; S Jha; W Liu; J Zhou (2022). Strengthening order preserving encryption with differential privacy. 
[b97] M Tschantz; S Sen; A Datta (2020). Sok: Differential privacy as a causal property. 
[b98] B Fuller; M Varia; A Yerukhimovich; E Shen; A Hamlin; V Gadepally; R Shay; J Mitchell; R Cunningham (2017). Sok: Cryptographically protected database search. 

Figures:
Figure fig_0: 
Type: figure
Caption: … and sends them to a challenger;
• The challenger flips an unbiased coin $b \in \{0, 1\}$ to select $\{a^b_1, \ldots, a^b_n\}$, and randomly sets their corresponding labels $\{y^b_1, \ldots, y^b_n\}$ with each $y^b_i$ drawn independently and uniformly over $[\tau]$. The challenger encrypts $\{a^b_1, \ldots, a^b_n\}$ by Eqns. (
Data: 

Figure fig_1: 1
Type: figure
Caption: After obtaining decision trees $DT_1, \ldots, DT_m$, we predict the label $\hat{\tilde y}_i = DT_1(\tilde x_i) \oplus \cdots \oplus DT_m(\tilde x_i)$ for each test instance $\tilde x_i \in \tilde S_{n'}$. The server sends the ciphertexts $\{\hat{\tilde y}_1, \ldots, \hat{\tilde y}_{n'}\}$ to the client, which decrypts them and obtains the final plaintext label $\tilde y_i = \arg\max_{j\in[\tau]} \{\mathrm{Dec}(k_{\mathrm{sec}}, \hat{\tilde y}_{i,j})\}$. During prediction, the server requires $O(h)$ computation, since it searches from the root to a leaf node of the tree. The client requires $O(1)$ communication rounds and $O(1)$ bandwidth to transfer the test data and the predicted ciphertexts, without further interaction.
Data: 
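The prediction step above can be sketched in plain Python with encryption mocked as the identity, so that ⊕ becomes element-wise addition and Dec is a no-op. The assumption that each tree returns a τ-dimensional vote vector for the reached leaf, and the names `aggregate_votes` and `client_label`, are hypothetical and not taken from the paper; labels are 0-based here instead of $[\tau]$.

```python
from typing import Callable, List, Sequence


def aggregate_votes(tree_outputs: Sequence[Sequence[float]]) -> List[float]:
    """Server side: DT_1(x) (+) ... (+) DT_m(x), with (+) mocked as element-wise addition."""
    return [sum(col) for col in zip(*tree_outputs)]


def client_label(summed: Sequence[float], dec: Callable[[float], float] = lambda c: c) -> int:
    """Client side: argmax_j Dec(y_{i,j}); dec defaults to the identity (no real decryption)."""
    plain = [dec(c) for c in summed]
    return max(range(len(plain)), key=lambda j: plain[j])


votes = [[0, 1, 0], [1, 0, 0], [0, 1, 0]]    # three hypothetical trees, tau = 3
print(client_label(aggregate_votes(votes)))  # 1
```

Taking the argmax on the client only needs the relative order of the decrypted sums, so small approximation errors from an approximate scheme such as CKKS would not change the label unless two classes are nearly tied.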

Figure fig_2: 2
Type: figure
Caption: Figure 2: Comparisons of training running time on different random forests. Note that the y-axis is in log scale, and full black columns indicate that no result was obtained after running for 10^6 seconds (about 11.6 days).
Data: 

Figure fig_3: 3
Type: figure
Caption: Figure 3: Security comparisons for different schemes: the redder the area, the higher the security.
Data: 

Figure fig_4: 
Type: figure
Caption: … the corresponding partitions as defined by Eqn. (4). There exists a splitting point $a^*$ such that $I_G(A, a^*) = I^*_G(A)$ and $a^* \in \bigcup_{i\in[s-1]} \left\{\max\{a_k : (a_k, y_k) \in I_i\}/2 + \min\{a_k : (a_k, y_k) \in I_{i+1}\}/2\right\}$, where $I_G(A, a^*)$ and $I^*_G(A)$ are defined by Eqns. (1) and (2), respectively.
Data: 

Figure fig_5: 5
Type: figure
Caption: Theorem 5. We have $I^*_G(A) = I^*_G(\hat A)$ for the re-sorted dataset $A$ of Eqn. (3) and the corresponding encrypted dataset $\hat A = \{(\hat a_{\langle 1\rangle,1}, y_{\langle 1\rangle}), \ldots, (\hat a_{\langle n\rangle,1}, y_{\langle n\rangle})\}$ from Algorithm 1.
Data: 

Figure fig_6: 6
Type: figure
Caption: Then, we have Lemma 6. … randomly and independently from a uniform distribution on $[\tau]$. For $a^b_1 < a^b_2 < \cdots < a^b_n$ with $b \in \{0, 1\}$, the two sequences $A^0 = \{(a^0_1, y_1), (a^0_2, y_2), \ldots, (a^0_n, y_n)\}$ and $A^1 = \{(a^1_1, y_1), (a^1_2, y_2), \ldots, (a^1_n, y_n)\}$ have the same Gini impurity. Proof. Let $a^b_i$ be a splitting point for $b \in \{0, 1\}$ and $i \in [n]$, and we split $A^b$ into left and right datasets $A$ …
Data: 

Figure fig_7: 
Type: figure
Caption: For $j \in [\tau]$, denote by $\nu^{l,b}_j$ and $\nu^{r,b}_j$ the numbers of examples with label $j$ in the subsets $A^{l,b}_{a^b_i}$ and $A^{r,b}_{a^b_i}$, respectively.
Data: 

Figure fig_8: 4
Type: figure
Caption: Figure 4: Comparisons of the prediction running time on different random forests. Note that the y-axis is in log scale, and full black columns indicate that no result was obtained after 10^6 seconds of training (about 11.6 days).
Data: 

Figure fig_9: 
Type: figure
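Caption: Let $S_{\langle i\rangle}[j]$ be the $j$-th bit of $S_{\langle i\rangle}$ with $i \in [n]$ and $j \in [m]$. Then, we denote by $b_{\langle i\rangle}[j]$ the adversary's guess for $S_{\langle i\rangle}[j]$ through the auxiliary knowledge distribution $D'$ as follows:
$b_{\langle i\rangle}[j] = \arg\max_{b\in\{0,1\}} \Pr_{D'}\left[S_{\langle i\rangle}[j] = b\right] = \begin{cases} 0 & \text{for } E_{D'}[S_{\langle i\rangle}[j]] \le 1/2, \\ 1 & \text{for } E_{D'}[S_{\langle i\rangle}[j]] > 1/2, \end{cases}$ for $i \in [n]$ and $j \in [m]$.
The adversary aims to correctly guess the plaintext bit $S_{\langle i\rangle}[j]$ using the auxiliary knowledge $D'$. Let $L$ be an $n \times m$ matrix with $L(i, j) = \Pr\left[S_{\langle i\rangle}[j] = b_{\langle i\rangle}[j] \mid D, D'\right]$ for $i \in [n]$ and $j \in [m]$. Similarly to [97], we have $\Pr_D\left[S_{\langle i\rangle}[j] = 0\right] = \sum_{s\in S^j_0} \Pr_D(S_{\langle i\rangle} = s)$ for $i \in [n]$ and $j \in [m]$, and $\Pr_D\left[S_{\langle i\rangle}[j] = 1\right] = 1 - \Pr_D\left[S_{\langle i\rangle}[j] = 0\right]$, where $s[j]$ denotes the $j$-th bit of $s$ and $S^j_0 = \{s \mid s \in S \text{ and } s[j] = 0\}$. It follows that $L(i, j) = \Pr\left[S_{\langle i\rangle}[j] = b_{\langle i\rangle}[j] \mid D, D'\right] = \sum_{s\in S^j_{b_{\langle i\rangle}[j]}} \Pr_D(S_{\langle i\rangle} = s)$.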
Data: 
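One entry of the bitwise leakage matrix described above can be computed directly from two probability tables. This is a sketch under stated assumptions: plaintexts are $m$-bit integers, bit $j = 1$ is the most significant bit (the text does not fix a bit order), and `pmf_true` / `pmf_aux` are hypothetical dictionaries holding $\Pr_D(S_{\langle i\rangle} = s)$ and $\Pr_{D'}(S_{\langle i\rangle} = s)$ for a fixed rank $i$.

```python
from typing import Dict


def bit(s: int, j: int, m: int) -> int:
    """j-th bit of the m-bit integer s, counting from the most significant bit (j = 1)."""
    return (s >> (m - j)) & 1


def leakage_entry(pmf_true: Dict[int, float], pmf_aux: Dict[int, float], j: int, m: int) -> float:
    """L(i, j) for a fixed rank i.

    The adversary guesses b = 1 iff E_{D'}[S<i>[j]] > 1/2 (computed from pmf_aux);
    L(i, j) is then the true probability mass, under pmf_true, of plaintexts whose
    j-th bit equals that guess.
    """
    expected = sum(p for s, p in pmf_aux.items() if bit(s, j, m) == 1)  # E_{D'}[S<i>[j]]
    guess = 1 if expected > 0.5 else 0                                  # b<i>[j]
    return sum(p for s, p in pmf_true.items() if bit(s, j, m) == guess)


pmf = {0b101: 0.5, 0b100: 0.3, 0b011: 0.2}      # hypothetical distribution of S<i>, m = 3
print([round(leakage_entry(pmf, pmf, j, 3), 4) for j in (1, 2, 3)])  # [0.8, 0.8, 0.7]
```

When the auxiliary distribution matches the true one, each entry is at least 1/2; a misleading $D'$ can push the adversary's success below 1/2, which is what the matrix is meant to expose.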

Figure fig_10: 5
Type: figure
Caption: Figure 5: Comparisons of the security degree of the feature space through the bitwise leakage matrix.
Data: 

Figure fig_11: 28
Type: figure
Caption: … $[j]$, we have $b_{\langle i\rangle}[j] = \arg\max_{b\in\{0,1\}} \Pr_{D'}\left[S_{\langle i\rangle}[j] = b\right] = \begin{cases} 0 & \text{for } E_{D'}[S_{\langle i\rangle}[j]] \le 1/2, \\ 1 & \text{for } E_{D'}[S_{\langle i\rangle}[j]] > 1/2. \end{cases}$ The probability that the adversary correctly identifies the $j$-th bit of the plaintext $S_{\langle i\rangle}$ is $L(i, j) = P_{i,j} \sum_{s\in S^j_{b_{\langle i\rangle}[j]}} \Pr_D(S_{\langle i\rangle} = s) + \text{small constant}$, and we complete the proof from Lemma 8.
Lemma 8 (Roy et al. [97]). Let $D$ be the input distribution and let $S = \{a_1, \ldots, a_n\}$ denote the dataset with each data point sampled i.i.d. from $D$. Then
$\Pr_D(S_{\langle i\rangle} = a') = \sum_{j=n-i+1}^{n} \binom{n}{j} (\Pr_D(a < a'))^{n-j} (\Pr_D(a = a'))^{j}$ for $\Pr_D(a > a') = 0$,
$\Pr_D(S_{\langle i\rangle} = a') = \sum_{j=i}^{n} \binom{n}{j} (\Pr_D(a = a'))^{j} (\Pr_D(a > a'))^{n-j}$ for $\Pr_D(a < a') = 0$;
otherwise, $\Pr_D(S_{\langle i\rangle} = a') = \sum_{j=1}^{n} \sum_{k=\max\{1, i-j+1\}}^{\min\{i, n-j+1\}} \binom{n}{k-1,\ j,\ n-k-j+1} \Delta_{k-1,j,n-k-j+1}$, where $\Delta_{k-1,j,n-k-j+1} = (\Pr_D(a < a'))^{k-1} (\Pr_D(a = a'))^{j} (\Pr_D(a > a'))^{n-k-j+1}$.
Data: 
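The three cases of Lemma 8 can be checked numerically against brute-force enumeration over a small discrete support. In this sketch, $S_{\langle i\rangle}$ is taken to be the $i$-th smallest of the $n$ draws (1-based), the multinomial coefficient is written as $\binom{n}{k-1}\binom{n-k+1}{j}$, and the distribution `pmf` is a hypothetical example.

```python
from itertools import product
from math import comb


def order_stat_pmf(i: int, n: int, p_lt: float, p_eq: float, p_gt: float) -> float:
    """Pr_D(S<i> = a') from Lemma 8, with p_lt, p_eq, p_gt = Pr_D(a < a'), Pr_D(a = a'), Pr_D(a > a')."""
    if p_gt == 0:    # a' is the largest value of the support
        return sum(comb(n, j) * p_lt ** (n - j) * p_eq ** j for j in range(n - i + 1, n + 1))
    if p_lt == 0:    # a' is the smallest value of the support
        return sum(comb(n, j) * p_eq ** j * p_gt ** (n - j) for j in range(i, n + 1))
    total = 0.0
    for j in range(1, n + 1):                                      # j draws equal to a'
        for k in range(max(1, i - j + 1), min(i, n - j + 1) + 1):  # k - 1 draws below a'
            coeff = comb(n, k - 1) * comb(n - k + 1, j)            # multinomial (n; k-1, j, n-k-j+1)
            total += coeff * p_lt ** (k - 1) * p_eq ** j * p_gt ** (n - k - j + 1)
    return total


def brute_force(i: int, n: int, pmf: dict, target) -> float:
    """Exact Pr(S<i> = target) by enumerating every n-tuple of draws."""
    prob = 0.0
    for draw in product(pmf, repeat=n):
        if sorted(draw)[i - 1] == target:
            weight = 1.0
            for v in draw:
                weight *= pmf[v]
            prob += weight
    return prob


pmf = {1: 0.2, 2: 0.5, 3: 0.3}    # hypothetical input distribution D
n = 4
for target in pmf:
    p_lt = sum(p for v, p in pmf.items() if v < target)
    p_gt = sum(p for v, p in pmf.items() if v > target)
    for i in range(1, n + 1):
        assert abs(order_stat_pmf(i, n, p_lt, pmf[target], p_gt) - brute_force(i, n, pmf, target)) < 1e-12
```

The smallest value (here 1) exercises the second case, the largest (here 3) the first case, and the middle value the general double sum.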

Figure fig_12: 
Type: figure
Caption: [j] = b] Pr D ′ (x)
Data: 

Figure tab_1: 
Type: table
Caption: …, n do
  %% Step-I: Search a node for sample (a_i, y_i) in binary search tree BT
  Set t = root of BT, t_min = 0, t_max = c_max and index = 1
  while t is an internal node and index == 1 do
    index = 0
    if t.left ≠ ∅ and a_i < max{a_j : (a_j, y_j) ∈ t.left.samples} then
      t = t.left, t_max = t.cipher_1, index = 1
    else if t.right ≠ ∅ and a_i > min{a_j : (a_j, y_j) ∈ t.right.samples} then
      t = t.right, t_min = t.cipher_1, index = 1
    end if
  end while
  Update t = t.left if Eqn. (6) is true, and update t = t.right if Eqn. (7) is true
  %% Step-II: Update the binary search tree BT
  if y …
  Append example (a_i, y_i) into t.samples and update t.cipher_2 = Enc(k_pub, |t.samples|)
  Encrypt â_i = (t.cipher_1, t.cipher_2)
end for
Data: 

Figure tab_2: 
Type: table
Caption: Algorithm 2 Splitting a node for encryption
Input: example (a_i, y_i), node t of binary search tree BT, and interval [t_min, t_max]
Output: updated node t
Data: Initialize an empty node l with l.samples = {(a_j, y_j) ∈ t.samples : a_j < a_i}
if l.samples ≠ ∅ then
  if t.left ≠ ∅ then
    Set l.cipher_1 according to Eqn. (8), and update l.left = t.left, t.left = l
  else
    Set l.cipher_1 according to Eqn. (9), and update t.left = l
  end if
end if
Initialize an empty node r with r.samples = {(a_j, y_j) ∈ t.samples : a_j > a_i}
if r.samples ≠ ∅ then
  if t.right ≠ ∅ then
    Set r.cipher_1 according to Eqn. (10), and update r.right = t.right, t.right = r
  else
    Set r.cipher_1 according to Eqn. (11), and update t.right = r
  end if
end if
Update t.samples = t.samples \ l.samples \ r.samples to keep the increasing order of ciphertexts c_1, c_2, …, c_s in Eqn.

Figure tab_3: 
Type: table
Caption: … If t.left ≠ ∅, then we set l.cipher_1 = (t.left.cipher_1 + t.cipher_1)/2 + ξ s.t. t.left.cipher_1 < l.cipher_1 < t.cipher_1, (8) and update l.left = t.left, t.left = l; otherwise, we set l.cipher_1 = (t_min + t.cipher_1)/2 + ξ s.t. l.cipher_1 ∈ (t_min, t.cipher_1), (9)
Data: 
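Eqns. (8) to (11) all pick a new $\mathrm{cipher}_1$ strictly inside an open interval, as a midpoint plus a perturbation $\xi$. The distribution of $\xi$ is not specified in this part of the text, so the sketch below assumes integer ciphertexts and a uniform integer offset clipped to keep the strict bounds; `fresh_cipher` and `spread` are hypothetical names.

```python
import random


def fresh_cipher(lo: int, hi: int, rng: random.Random, spread: float = 0.25) -> int:
    """Pick an integer cipher_1 strictly inside (lo, hi): the midpoint plus a random offset xi.

    Assumption: xi is uniform on [-half, half] with half = spread * (hi - lo) / 2, and the
    result is clipped to [lo + 1, hi - 1] so the strict inequalities of Eqns. (8)-(11) hold.
    """
    if hi - lo < 2:
        raise ValueError("no integer strictly between lo and hi; the gap is exhausted")
    mid = (lo + hi) // 2
    half = int(spread * (hi - lo) / 2)
    xi = rng.randint(-half, half)
    return min(max(mid + xi, lo + 1), hi - 1)


rng = random.Random(1)
c_max = 2 ** 20                          # hypothetical ciphertext range
root = c_max // 2                        # root cipher_1 = c_max / 2, as in Algorithm 1
left = fresh_cipher(0, root, rng)        # Eqn. (9) with t_min = 0
between = fresh_cipher(left, root, rng)  # Eqn. (8): between t.left.cipher_1 and t.cipher_1
right = fresh_cipher(root, c_max, rng)   # Eqn. (11) with t_max = c_max
assert 0 < left < between < root < right < c_max
```

Repeated insertions halve the available gap each time, which is why the ciphertext range has to be large relative to $n$; the sketch simply raises an error when no integer remains.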

Figure tab_4: 
Type: table
Caption: We make a similar update for the right child of node t: initialize an empty node r with r.samples = {(a_j, y_j) ∈ t.samples : a_j > a_i}, and consider r.samples ≠ ∅. If t.right ≠ ∅, then we set r.cipher_1 = (t.cipher_1 + t.right.cipher_1)/2 + ξ s.t. t.cipher_1 < r.cipher_1 < t.right.cipher_1, (10) and update r.right = t.right, t.right = r; otherwise, we set r.cipher_1 = (t.cipher_1 + t_max)/2 + ξ s.t. r.cipher_1 ∈ (t.cipher_1, t_max), (11)
Data: Algorithm 3 Finding the best splitting feature and position
Input: encrypted dataset $\hat S^t_n$, available splitting features and positions $\{s_i\}_{i=1}^{\jmath}$, and secret key $k_{\mathrm{sec}}$
Output: index $i^*$
%% Server:
for $i \in [\jmath]$ do
  Calculate the Gini impurity $I_G(\hat S^t_n, s_i)$ from Eqn. (12) w.r.t. splitting feature and position $s_i$
end for
Send ciphertexts $\{I_G(\hat S^t_n, s_i)\}_{i\in[\jmath]}$ to the client
%% Client:
Get the decrypted values $\{\mathrm{Dec}(k_{\mathrm{sec}}, I_G(\hat S^t_n, s_i))\}_{i\in[\jmath]}$
Set $i^* = -1$ if $\mathrm{Dec}(k_{\mathrm{sec}}, I_G(\hat S^t_n, s_i)) = 0$ for every $i \in [\jmath]$; otherwise, set $i^*$ by Eqn. (13)
Send $i^*$ to the server
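The client's decision rule in Algorithm 3 can be sketched on already-decrypted impurity values. Because CKKS decryption is approximate, an exact test "= 0" would rarely fire in practice, so this sketch assumes a hypothetical tolerance `tol`; indices are 0-based, and `client_select` is an illustrative name.

```python
from typing import Sequence


def client_select(decrypted: Sequence[float], tol: float = 1e-6) -> int:
    """Client step of Algorithm 3, applied to already-decrypted impurities.

    Returns -1 if every candidate impurity is zero up to tol (the node is pure);
    otherwise returns an index in the argmin of Eqn. (13). Indices are 0-based here.
    """
    if all(abs(g) <= tol for g in decrypted):
        return -1
    return min(range(len(decrypted)), key=lambda i: decrypted[i])


print(client_select([0.30, 0.12, 0.25]))   # 1
print(client_select([0.0, 2e-9, -1e-9]))   # -1
```

Ties are broken in favour of the first minimal candidate, which is one valid choice from the argmin set of Eqn. (13).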

Figure tab_5: 2
Type: table
Caption: Datasets
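Data:
| Datasets | #Inst | #Feat | Datasets | #Inst | #Feat | Datasets | #Inst | #Feat | Datasets | #Inst | #Feat |
|---|---|---|---|---|---|---|---|---|---|---|---|
| wdbc | 569 | 30 | adver | 3,279 | 1,558 | ailerons | 13,750 | 41 | adult | 48,842 | 14 |
| cancer | 569 | 31 | bibtex | 7,396 | 1,836 | house | 22,784 | 16 | mnist | 70,000 | 780 |
| breast | 699 | 9 | phpB0 | 7,797 | 617 | a9a | 32,563 | 123 | miniboone | 72,998 | 51 |
| diabetes | 768 | 8 | pendigits | 10,992 | 16 | amazon | 32,769 | 9 | runwalk | 88,588 | 6 |
| german | 1,000 | 24 | phish | 11,055 | 30 | bank | 45,211 | 17 | covtype | 581,012 | 54 |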

Figure tab_7: 3
Type: table
Caption: Comparisons of prediction accuracies (mean±std). •/◦ indicates that our encrypted random forests are significantly better/worse than the compared random forests (pairwise t-tests at 95% significance level). 'NA' means that no results were obtained after running for 10^6 seconds (about 11.6 days).
Data:
| Dataset | Our encrypted RFs | Original RFs | AnonyRFs | DiffPrivRFs | PPD-ERTs | PivotRFs | MulPRFs | HEldpRFs |
|---|---|---|---|---|---|---|---|---|
| wdbc | .9525±.0141 | .9617±.0018 | .9091±.0205• | .8998±.0024• | .9222±.0037• | .9609±.0101 | .9510±.0114 | .9195±.0029• |
| cancer | .9766±.0082 | .9824±.0143 | .9271±.0016• | .9034±.0578• | .9600±.0022• | .9510±.0130• | .9656±.0102 | .9823±.0024 |
| breast | .9855±.0012 | .9881±.0011 | .9657±.0021• | .9271±.0515• | .9678±.0129• | .9806±.0086 | .9769±.0107 | .9275±.0023• |
| german | .7939±.0124 | .8033±.0205 | .7300±.0214• | .7400±.0141• | .7610±.0168• | .7533±.0122• | .7823±.0154 | .7043±.0027• |
| diabetes | .7641±.0093 | .7677±.0309 | .7193±.0023• | .7328±.0124• | .7448±.0193 | .7419±.0061• | .7611±.0035 | .7478±.0193• |
| adver | .9851±.0011 | .9888±.0014 | .9278±.0018• | .9390±.0051• | NA | .9664±.0043• | NA | NA |
| bibtex | .7907±.0054 | .7749±.0027• | .7425±.0009• | .7200±.0130• | NA | .7461±.0193• | NA | NA |
| phpB0 | .9380±.0024 | .9585±.0043◦ | .8641±.0009• | .8920±.0031• | NA | NA | NA | NA |
| pendigits | .9917±.0024 | .9906±.0016 | .9072±.0104• | .9154±.0126• | .9639±.0048• | .9070±.0130• | NA | NA |
| phish | .9798±.0026 | .9716±.0018 | .9032±.0014• | .9318±.0089• | .9555±.0125• | .9454±.0067• | .9401±.0102• | NA |
| ailerons | .8795±.0027 | .8819±.0015 | .8104±.0105• | .8322±.0091• | .8589±.0043• | .8571±.0082• | .8766±.0025 | NA |
| house | .8794±.0007 | .8913±.0039◦ | .8255±.0011• | .8475±.0025• | .8541±.0149• | .8508±.0016• | .8742±.0023 | NA |
| a9a | .8321±.0011 | .8303±.0012 | .8046±.0027• | .7909±.0084• | .8345±.0144 | .8314±.0071 | .8051±.0102• | NA |
| amazon | .9491±.0109 | .9478±.0060 | .9193±.0024• | .9104±.0035• | .9221±.0024• | .9401±.0128 | .9400±.0032 | NA |
| bank | .8992±.0118 | .9029±.0104 | .8499±.0089• | .8517±.0064• | .8940±.0147 | .8940±.0091 | .8827±.0108 | NA |
| adult | .8663±.0019 | .8691±.0018 | .8206±.0032• | .8355±.0053• | .8452±.0106• | .8243±.0076• | .8594±.0103 | NA |
| mnist | .9674±.0105 | .9763±.0101 | .9362±.0006• | .9059±.0157• | NA | NA | NA | NA |
| miniboone | .9497±.0018 | .9518±.0013 | .8977±.0101• | .9111±.0104• | .9301±.00021• | .9501±.0011 | NA | NA |
| runwalk | .9784±.0014 | .9798±.0032 | .9523±.0024• | .9401±.0040• | .9572±.0074• | .9511±.0071• | NA | NA |
| covtype | .9787±.0042 | .9650±.0104• | .9112±.0015• | .9407±.0018• | .9569±.0134• | NA | NA | NA |
| win/tie/loss | | 2/16/2 | 20/0/0 | 20/0/0 | 17/3/0 | 14/6/0 | 10/10/0 | 19/1/0 |

Figure tab_8: 
Type: table
Caption: Input: tree node t of BT, ciphertext â_i. Output: plaintext a_i
while â_{i,1} ≠ t.cipher_1 do
  if â_{i,1} > t.cipher_1 then
    t = t.right
  else if â_{i,1} < t.cipher_1 then
    t = t.left
  end if
end while
Return a_i = t.samples[Dec(k_sec, â_{i,2})]
Data: 
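The decryption procedure above walks the binary search tree by the first ciphertext component and then uses the decrypted second component as a position inside the node. In this sketch, `Node`, `decrypt`, and the identity stand-in for $\mathrm{Dec}(k_{\mathrm{sec}}, \cdot)$ are hypothetical; it assumes that positions are 1-based, as in Eqn. (5) and in the update $t.\mathrm{cipher}_2 = \mathrm{Enc}(k_{\mathrm{pub}}, |t.\mathrm{samples}|)$ of Algorithm 1.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    cipher1: int                                        # first ciphertext component shared by the node
    samples: List[float] = field(default_factory=list)  # plaintexts kept by the client, in append order
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def decrypt(root: Node, ct: Tuple[int, int], dec: Callable[[int], int]) -> float:
    """Follow BT by comparing ct[0] with cipher1, then index the node's samples.

    Assumption: dec(ct[1]) is a 1-based position, so 1 is subtracted for Python lists.
    """
    c1, c2 = ct
    t = root
    while t is not None and c1 != t.cipher1:
        t = t.right if c1 > t.cipher1 else t.left
    if t is None:
        raise KeyError("no node of BT carries this first component")
    return t.samples[dec(c2) - 1]


root = Node(500, [10.0, 12.0], left=Node(250, [3.0]), right=Node(750, [40.0, 41.0]))
identity = lambda c: c                     # stand-in for Dec(k_sec, .)
print(decrypt(root, (750, 2), identity))   # 41.0
```

Only the client holds the tree and the secret key, so this lookup runs entirely on the client side; the loop costs $O(h)$ comparisons for a tree of height $h$.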

Figure tab_9: 4
Type: table
Caption: Hyperparameter settings for tree ensemble models in experiments. '-' means that the parameter does not exist in the corresponding method, and 'max_bin' denotes the maximum number of splitting points of each feature.
Data:
| Parameter | Our Work | PPD-ERTs | HEldpRFs | PivotRFs | MulPRFs | AnonyRFs | DiffPrivRFs | Original RFs |
|---|---|---|---|---|---|---|---|---|
| max_depth | None | None | 5 | 4 | None | None | None | None |
| n_estimators | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |

Figure tab_10: 5
Type: table
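Caption: Setting of the hyperparameter α, the minimum number of samples required to split a leaf, in experiments.
Data:
| Parameter | wdbc | cancer | breast | diabetes | german | adver | bibtex | phpB0 | pendigits | phish |
|---|---|---|---|---|---|---|---|---|---|---|
| α | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 |

| Parameter | ailerons | house | a9a | amazon | bank | adult | mnist | miniboone | runwalk | covtype |
|---|---|---|---|---|---|---|---|---|---|---|
| α | 10 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |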

Figure tab_11: 6
Type: table
Caption: The orders-of-magnitude improvement compared with other approaches in Figure 2. 'NA' means that no results were obtained after running for 10^6 seconds (about 11.6 days).
Data: Dataset | Our encrypted RFs | Original RFs | AnonyRFs | DiffPrivRFs | PPD-ERTs | PivotRFs | MulPRFs | HEldpRFs

Figure tab_12: 7
Type: table
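Caption: The orders-of-magnitude improvement compared with other approaches in Figure 4. 'NA' means that no results were obtained after running for 10^6 seconds (about 11.6 days).
Data:
| Dataset | Our encrypted RFs | Original RFs | AnonyRFs | DiffPrivRFs | PPD-ERTs | PivotRFs | MulPRFs | HEldpRFs |
|---|---|---|---|---|---|---|---|---|
| wdbc | 1 | ×3 | ×3 | ×3 | ×38 | ×1,220 | ×93 | ×4,000 |
| cancer | 1 | ×28 | ×29 | ×25 | ×360 | ×11,052 | ×851 | ×41,911 |
| breast | 1 | ×25 | ×31 | ×27 | ×308 | ×11,631 | ×776 | ×44,736 |
| german | 1 | ×4 | ×5 | ×4 | ×115 | ×2,615 | ×421 | ×9,615 |
| diabetes | 1 | ×42 | ×35 | ×41 | ×411 | ×18,142 | ×1,642 | ×64,285 |
| adver | 1 | ×3 | ×4 | ×10 | NA | ×3,821 | NA | NA |
| bibtex | 1 | ×1 | ×1 | ×1 | NA | ×1,528 | NA | NA |
| phpB0 | 1 | ×4 | ×4 | ×4 | NA | NA | NA | NA |
| pendigits | 1 | ×6 | ×6 | ×10 | ×2384 | ×18,947 | NA | NA |
| phish | 1 | ×5 | ×8 | ×6 | ×1,966 | ×1,7619 | ×2,604 | NA |
| ailerons | 1 | ×6 | ×9 | ×8 | ×1,581 | ×30,200 | ×3,600 | NA |
| house | 1 | ×6 | ×9 | ×8 | ×1,581 | ×30,400 | ×3,600 | NA |
| a9a | 1 | ×6 | ×10 | ×8 | ×5,482 | ×27,000 | ×3,250 | NA |
| amazon | 1 | ×10 | ×12 | ×12 | ×2,208 | ×54,500 | ×7,500 | NA |
| bank | 1 | ×14 | ×18 | ×20 | ×5,637 | ×75,500 | ×10,000 | NA |
| adult | 1 | ×4 | ×5 | ×4 | ×1,967 | ×22,054 | ×2,876 | NA |
| mnist | 1 | ×2 | ×3 | ×2 | NA | NA | NA | NA |
| miniboone | 1 | ×6 | ×9 | ×9 | ×1,800 | ×75,000 | NA | NA |
| runwalk | 1 | ×12 | ×18 | ×26 | ×2,413 | ×84,000 | NA | NA |
| covtype | 1 | ×7 | ×10 | ×8 | ×2,943 | NA | NA | NA |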


Formulas:
Formula formula_0: $\mathrm{Gini}(A) = 1 - \sum_{y\in[\tau]} p_y^2$,

Formula formula_1: $I_G(A, a) = w_l \cdot \mathrm{Gini}(A^l_a) + w_r \cdot \mathrm{Gini}(A^r_a)$, (1)

Formula formula_2: $w_l = |A^l_a|/n$ and $w_r = |A^r_a|/n$. Let $I^*_G(A)$ be the minimum Gini impurity of dataset $A$, i.e., $I^*_G(A) = \min_{a\in\mathbb{R}} \{I_G(A, a)\}$. (2)
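A minimal plaintext sketch of Eqns. (1) and (2), assuming numeric features and hashable labels; the function names are illustrative. Since $I_G(A, a)$ only changes when $a$ crosses a feature value, the minimum over $a \in \mathbb{R}$ can be taken over midpoints of consecutive distinct values plus one threshold below all values (the "no split" case).

```python
from collections import Counter


def gini(labels):
    """Gini(A) = 1 - sum_y p_y^2 over the labels of a (sub)dataset; 0 for an empty one."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity(data, a):
    """I_G(A, a) of Eqn. (1): weighted Gini of the left (x <= a) and right (x > a) parts."""
    left = [y for x, y in data if x <= a]
    right = [y for x, y in data if x > a]
    n = len(data)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def min_impurity(data):
    """I*_G(A) of Eqn. (2), scanning every threshold where I_G(A, a) can change."""
    xs = sorted({x for x, _ in data})
    candidates = [xs[0] - 1.0] + [(u + v) / 2 for u, v in zip(xs, xs[1:])]
    return min(split_impurity(data, a) for a in candidates)


A = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 0)]
print(round(min_impurity(A), 4))   # 0.2667, reached at a = 2.5
```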

Formula formula_3: $A = \{(a_{\langle 1\rangle}, y_{\langle 1\rangle}), (a_{\langle 2\rangle}, y_{\langle 2\rangle}), \ldots, (a_{\langle n\rangle}, y_{\langle n\rangle})\}$, (3)

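Formula formula_4: $I_1 = \{(a_{\langle 1\rangle}, y_{\langle 1\rangle}), \ldots, (a_{\langle k_1\rangle}, y_{\langle k_1\rangle})\}$, $I_2 = \{(a_{\langle k_1+1\rangle}, y_{\langle k_1+1\rangle}), \ldots, (a_{\langle k_1+k_2\rangle}, y_{\langle k_1+k_2\rangle})\}$,

Formula formula_5: $\ldots$, $I_s = \{(a_{\langle k_1+k_2+\cdots+k_{s-1}+1\rangle}, y_{\langle k_1+k_2+\cdots+k_{s-1}+1\rangle}), \ldots, (a_{\langle n\rangle}, y_{\langle n\rangle})\}$. (4)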
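The partition of Eqn. (4) and the boundary result stated for it (an optimal split lies at a midpoint between consecutive partitions) can be checked numerically. This sketch assumes that $I_1, \ldots, I_s$ are the maximal runs of consecutive sorted examples sharing one label (our reading of the derivation, where only class $j^*$ changes inside a partition) and that feature values are distinct; all helper names are hypothetical.

```python
import random
from collections import Counter
from itertools import groupby


def gini(labels):
    n = len(labels)
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity(data, a):
    left = [y for x, y in data if x <= a]
    right = [y for x, y in data if x > a]
    n = len(data)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def label_runs(data):
    """Assumed reading of Eqn. (4): sort by feature value, then cut into maximal
    runs I_1, ..., I_s of consecutive examples that share the same label."""
    return [list(run) for _, run in groupby(sorted(data), key=lambda ex: ex[1])]


def boundary_thresholds(runs):
    """Midpoints max(I_i)/2 + min(I_{i+1})/2 between consecutive runs."""
    return [max(x for x, _ in runs[i]) / 2 + min(x for x, _ in runs[i + 1]) / 2
            for i in range(len(runs) - 1)]


def check_boundary_lemma(data):
    """Best split over all midpoints equals best split over run boundaries."""
    xs = sorted(x for x, _ in data)
    all_mids = [(u + v) / 2 for u, v in zip(xs, xs[1:])]
    runs = label_runs(data)
    if len(runs) < 2:            # a single label: every split has impurity 0
        return True
    best_all = min(split_impurity(data, a) for a in all_mids)
    best_boundary = min(split_impurity(data, a) for a in boundary_thresholds(runs))
    return abs(best_all - best_boundary) < 1e-12


rng = random.Random(0)
for _ in range(200):
    n = rng.randint(2, 12)
    data = [(float(x), rng.randrange(3)) for x in rng.sample(range(100), n)]  # distinct features
    assert check_boundary_lemma(data)
```

The random trials only illustrate the claim; they are not a proof, and ties in feature values (not covered here) would need the runs to be defined on distinct values.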

Formula formula_6: Input: dataset $A = \{(a_1, y_1), \ldots, (a_n, y_n)\}$. Output: binary search tree $BT$ and ciphertexts $\{\hat a_1, \ldots, \hat a_n\}$. Initialize: tree $BT = \emptyset$ with its $\mathrm{cipher}_1 = c_{\max}/2$, where $c_{\max} = 2^{\lambda \log_2 n}$; for $i = 1$

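Formula formula_7: $\hat a_{\langle i\rangle} = (\hat a_{\langle i\rangle,1}, \hat a_{\langle i\rangle,2}) = \begin{cases} (c_1, \mathrm{Enc}(k_{\mathrm{pub}}, i)) & \text{for } j = 1, \\ (c_j, \mathrm{Enc}(k_{\mathrm{pub}}, i - k_1 - \cdots - k_{j-1})) & \text{for } 2 \le j \le s. \end{cases}$ (5)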
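The effect of Eqn. (5) on the minimum Gini impurity (the claim of Theorem 5) can be illustrated on a static dataset. In this plaintext mock, $\mathrm{Enc}(k_{\mathrm{pub}}, \cdot)$ is replaced by the identity, the increasing values $c_j$ are drawn at random, and the partitions are assumed to be label runs; it does not reproduce the online tree construction of Algorithm 1, and `encode` is a hypothetical name.

```python
import random
from collections import Counter
from itertools import groupby


def gini(labels):
    n = len(labels)
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def min_impurity(data):
    """I*_G of a dataset, scanning midpoints between consecutive distinct feature values."""
    xs = sorted({x for x, _ in data})
    cands = [xs[0] - 1] + [(u + v) / 2 for u, v in zip(xs, xs[1:])]
    n = len(data)
    best = float("inf")
    for a in cands:
        left = [y for x, y in data if x <= a]
        right = [y for x, y in data if x > a]
        best = min(best, len(left) / n * gini(left) + len(right) / n * gini(right))
    return best


def encode(data, rng):
    """Plaintext mock of Eqn. (5): every example of run I_j becomes (c_j, position in I_j),
    with c_1 < c_2 < ... < c_s; Enc(k_pub, .) is replaced by the identity."""
    runs = [list(r) for _, r in groupby(sorted(data), key=lambda ex: ex[1])]
    out, c = [], 0
    for run in runs:
        c += rng.randint(1, 1000)                      # hypothetical strictly increasing c_j
        for pos, (_, y) in enumerate(run, start=1):
            out.append(((c, pos), y))
    return out


rng = random.Random(0)
data = [(float(x), rng.randrange(3)) for x in rng.sample(range(100), 20)]
enc = encode(data, rng)
first_components = [(c1, y) for (c1, _), y in enc]    # the dataset A' of Eqns. (4)-(5)
print(abs(min_impurity(first_components) - min_impurity(data)) < 1e-12)   # True
```

All examples in one partition share the same first component, so a split on the ciphertexts can only fall between partitions; by the boundary lemma, nothing is lost compared with splitting the plaintexts.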

Formula formula_8: $\hat a_{\langle i\rangle,2} = \mathrm{Enc}(k_{\mathrm{pub}}, i - k_1 - \cdots - k_{j-1})$

Formula formula_9: $A' = \{(\hat a_{\langle 1\rangle,1}, y_{\langle 1\rangle}), \ldots, (\hat a_{\langle n\rangle,1}, y_{\langle n\rangle})\}$ from Eqns. (4) and (5).

Formula formula_12: $\{a^0_1, \ldots, a^0_n\}$ and $\{a^1_1, \ldots, a^1_n\}$,

Formula formula_13: $\Pr[\mathcal{A}(\mathrm{Game}_{\mathrm{GIPCPA}}) = b] < 1/2 + \text{small constant}$.

Formula formula_14: $P(\hat a^0_1, \ldots, \hat a^0_{i+1} \mid a^0_1, \ldots, a^0_{i+1}) = P(\hat a^1_1, \ldots, \hat a^1_{i+1} \mid a^1_1, \ldots, a^1_{i+1})$.

Formula formula_15: $S_n = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ with $x_i = (x_{i,1}, \ldots, x_{i,d})$. The client constructs $d$ binary search trees $BT_1, BT_2, \ldots, BT_d$ according to Algorithm 1, one per feature dimension, using the labels in $S_n$, where $BT_j$ is used to encrypt the features $\{x_{1,j}, \ldots, x_{n,j}\}$ for $j \in [d]$.

Formula formula_16: $\hat y_i = [\hat y_{i,1}, \ldots, \hat y_{i,\tau}]$ is given by $\hat y_{i,j} = \mathrm{Enc}(k_{\mathrm{pub}}, 1)$ for $j = y_i$, and $\hat y_{i,j} = \mathrm{Enc}(k_{\mathrm{pub}}, 0)$ otherwise.

Formula formula_17: $\hat S_n = \{(\hat x_1, \hat y_1), \ldots, (\hat x_n, \hat y_n)\}$. Let $\tilde S_{n'} = \{\tilde x_1, \ldots, \tilde x_{n'}\}$ be a test set with instances $\tilde x_i = (\tilde x_{i,1}, \ldots, \tilde x_{i,d})$. For every plaintext $\tilde x_{i,j}$ with $i \in [n']$ and $j \in [d]$,

Formula formula_18: $\tilde S_{n'} = \{\tilde x_1, \ldots, \tilde x_{n'}\}$.

Formula formula_19: For each i ∈ [ȷ],

Formula formula_20: $\hat S^t_{n_l,i} = \{(\hat x^l_1, \hat y^l_1), \ldots, (\hat x^l_{n_l}, \hat y^l_{n_l})\}$ and $\hat S^t_{n_r,i} = \{(\hat x^r_1, \hat y^r_1), \ldots, (\hat x^r_{n_r}, \hat y^r_{n_r})\}$. From Eqn.

Formula formula_21: $I_G(\hat S^t_n, s_i) = \frac{n_l}{n_l + n_r} \otimes I_G(\hat S^t_{n_l,i}) \oplus \frac{n_r}{n_l + n_r} \otimes I_G(\hat S^t_{n_r,i})$, (12)

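Formula formula_22: $I_G(\hat S^t_{n_l,i}) = 1 \ominus \hat p_l \odot \hat p_l$ and $I_G(\hat S^t_{n_r,i}) = 1 \ominus \hat p_r \odot \hat p_r$, with $\hat p_l = (1/n_l) \otimes (\hat y^l_1 \oplus \cdots \oplus \hat y^l_{n_l})$ and $\hat p_r = (1/n_r) \otimes (\hat y^r_1 \oplus \cdots \oplus \hat y^r_{n_r})$.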
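The arithmetic behind Eqns. (12) and (16) and the node impurities above can be checked in plaintext: averaging one-hot label vectors gives the class proportions, and one minus their inner product is the Gini impurity. This sketch mocks every homomorphic operation by its plaintext counterpart (⊕ as addition, ⊗ as scaling, ⊙ as inner product) and ignores CKKS noise and rescaling; labels are 0-based and the names are illustrative.

```python
from typing import List, Sequence


def one_hot(y: int, tau: int) -> List[float]:
    """Eqn. (16) with Enc(k_pub, .) mocked as the identity; labels are 0-based here."""
    return [1.0 if j == y else 0.0 for j in range(tau)]


def node_gini(vectors: Sequence[Sequence[float]]) -> float:
    """Plaintext analogue of the node impurity: p = (1/n)(y_1 + ... + y_n), impurity = 1 - <p, p>."""
    n = len(vectors)
    p = [sum(col) / n for col in zip(*vectors)]
    return 1.0 - sum(pj * pj for pj in p)


def split_gini(left, right) -> float:
    """Plaintext analogue of Eqn. (12): weights n_l/(n_l+n_r) and n_r/(n_l+n_r)."""
    n_l, n_r = len(left), len(right)
    return n_l / (n_l + n_r) * node_gini(left) + n_r / (n_l + n_r) * node_gini(right)


tau = 3
left = [one_hot(y, tau) for y in [0, 0, 1]]
right = [one_hot(y, tau) for y in [2, 2]]
print(round(split_gini(left, right), 4))   # 0.2667, i.e. 3/5 * 4/9
```

Note that the weights $n_l/(n_l+n_r)$ and $1/n_l$ are plaintext scalars known to the server, so only the label vectors need to be encrypted; the inner product $\hat p \odot \hat p$ is the one ciphertext-ciphertext multiplication per node.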

Formula formula_23: $i^* \in \arg\min_{i\in[\jmath]} \mathrm{Dec}(k_{\mathrm{sec}}, I_G(\hat S^t_n, s_i))$. (13)

Formula formula_24: For dataset $A = \{(a_1, y_1), \ldots, (a_n, y_n)\}$, let $I_1, I_2, \ldots, I_s$ be

Formula formula_25: $\nu_j = |\{i \in [n] : y_i = j\}|$,

Formula formula_26: $A^l_a = \{(a_i, y_i) : a_i \le a,\ (a_i, y_i) \in A\}$, $A^r_a = \{(a_i, y_i) : a_i > a,\ (a_i, y_i) \in A\}$.

Formula formula_27: $\nu^l_j = |\{i \in [n] : y_i = j,\ a_i \le a\}|$,

Formula formula_28: $I_G(A, a) = w_l - w_l \sum_{j\in[\tau]} \frac{(\nu^l_j)^2}{|A^l_a|^2} + w_r - w_r \sum_{j\in[\tau]} \frac{(\nu_j - \nu^l_j)^2}{(n - |A^l_a|)^2}$,

Formula formula_29: $I_G(A, a)$ when $\max\{a_k : (a_k, y_k) \in I_{i-1}\}/2 + \min\{a_k : (a_k, y_k) \in I_i\}/2 \le a \le \max\{a_k : (a_k, y_k) \in I_i\}/2 + \min\{a_k : (a_k, y_k) \in I_{i+1}\}/2$, for $i = 2, 3, \ldots, s-1$.

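Formula formula_30: $\begin{aligned}
n^2 \frac{\partial I_G(A, a)}{\partial \nu^l_{j^*}} &= \frac{1}{n}\sum_{j\in[\tau]} \frac{(\nu^l_j)^2}{(w_l)^2} - \frac{2\nu^l_{j^*}}{w_l} - \frac{1}{n}\sum_{j\in[\tau]} \frac{(\nu_j - \nu^l_j)^2}{(w_r)^2} + \frac{2(\nu_{j^*} - \nu^l_{j^*})}{w_r} \\
&= \frac{1}{n}\sum_{j\in[\tau]} \left[\left(\frac{\nu^l_j}{w_l}\right)^2 - \left(\frac{\nu_j - \nu^l_j}{w_r}\right)^2\right] + 2\left(\frac{\nu_{j^*} - \nu^l_{j^*}}{w_r} - \frac{\nu^l_{j^*}}{w_l}\right) \\
&= \frac{1}{n}\sum_{j\in[\tau], j\ne j^*} \left[\left(\frac{\nu^l_j}{w_l}\right)^2 - \left(\frac{\nu_j - \nu^l_j}{w_r}\right)^2\right] + \frac{1}{n}\left[\left(\frac{\nu^l_{j^*}}{w_l}\right)^2 - \left(\frac{\nu_{j^*} - \nu^l_{j^*}}{w_r}\right)^2\right] + 2\left(\frac{\nu_{j^*} - \nu^l_{j^*}}{w_r} - \frac{\nu^l_{j^*}}{w_l}\right) \\
&= \frac{1}{n}\sum_{j\in[\tau], j\ne j^*} \left[\frac{(\nu^l_j)^2}{(w_l)^2} - \frac{(\nu_j - \nu^l_j)^2}{(w_r)^2}\right] + \left(\frac{\nu_{j^*} - \nu^l_{j^*}}{w_r} - \frac{\nu^l_{j^*}}{w_l}\right)\left(2 - \frac{\nu_{j^*} - \nu^l_{j^*}}{n w_r} - \frac{\nu^l_{j^*}}{n w_l}\right).
\end{aligned}$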
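The closed form above can be sanity-checked against a central finite difference of $I_G(A, a)$, treating the left counts $\nu^l_j$ as continuous (as the derivative does) and keeping the class totals $\nu_j$ fixed, so that $w_l = \sum_j \nu^l_j / n$ moves with $\nu^l_{j^*}$. The counts below are hypothetical, and the check is numerical, not a proof.

```python
def impurity(nu_l, nu):
    """I_G(A, a) written with class counts, as in the expression for I_G above."""
    n = sum(nu)
    n_l = sum(nu_l)
    n_r = n - n_l
    w_l, w_r = n_l / n, n_r / n
    left = sum(v * v for v in nu_l) / n_l ** 2
    right = sum((t - v) ** 2 for t, v in zip(nu, nu_l)) / n_r ** 2
    return w_l - w_l * left + w_r - w_r * right


def closed_form(nu_l, nu, js):
    """n^2 * dI_G / d(nu^l_{j*}) in the final factored form of the derivation."""
    n = sum(nu)
    w_l = sum(nu_l) / n
    w_r = 1.0 - w_l
    a = nu_l[js] / w_l                  # nu^l_{j*} / w_l
    b = (nu[js] - nu_l[js]) / w_r       # (nu_{j*} - nu^l_{j*}) / w_r
    rest = sum((nu_l[j] / w_l) ** 2 - ((nu[j] - nu_l[j]) / w_r) ** 2
               for j in range(len(nu)) if j != js) / n
    return rest + (b - a) * (2 - b / n - a / n)


def finite_difference(nu_l, nu, js, h=1e-5):
    """n^2 times a central difference of impurity() in the coordinate nu^l_{j*}."""
    n = sum(nu)
    up, down = list(nu_l), list(nu_l)
    up[js] += h
    down[js] -= h
    return n ** 2 * (impurity(up, nu) - impurity(down, nu)) / (2 * h)


nu = [5.0, 7.0, 4.0]      # hypothetical class totals nu_j
nu_l = [2.0, 3.0, 1.0]    # hypothetical left counts nu^l_j
for js in range(3):
    assert abs(closed_form(nu_l, nu, js) - finite_difference(nu_l, nu, js)) < 1e-4
```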

Formula formula_31: $0 \le \frac{\nu_j - \nu^l_j}{w_r} \le n$ and $0 \le \frac{\nu^l_j}{w_l} \le n$ for each $j \in [\tau]$. (14)

Formula formula_33: $\sum_{j\in[\tau], j\ne j^*} \left[\left(\frac{\nu^l_j}{w_l}\right)^2 - \left(\frac{\nu_j - \nu^l_j}{w_r}\right)^2\right] \ge 0$,

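Formula formula_34: $0 \le \sum_{j\in[\tau], j\ne j^*} \left[\left(\frac{\nu^l_j}{w_l}\right)^2 - \left(\frac{\nu_j - \nu^l_j}{w_r}\right)^2\right] = \sum_{j\in[\tau], j\ne j^*} \left(\frac{\nu^l_j}{w_l} + \frac{\nu_j - \nu^l_j}{w_r}\right)\left(\frac{\nu^l_j}{w_l} - \frac{\nu_j - \nu^l_j}{w_r}\right) \le \sum_{j\in[\tau], j\ne j^*} 2n\left(\frac{\nu^l_j}{w_l} - \frac{\nu_j - \nu^l_j}{w_r}\right) = 2n \sum_{j\in[\tau], j\ne j^*} \left(\frac{\nu^l_j}{w_l} - \frac{\nu_j - \nu^l_j}{w_r}\right)$.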

Formula formula_35: $n - \sum_{j\in[\tau], j\ne j^*} \frac{\nu_j - \nu^l_j}{w_r} \ge n - \sum_{j\in[\tau], j\ne j^*} \frac{\nu^l_j}{w_l}$, (15)

Formula formula_36: $\frac{\nu_{j^*} - \nu^l_{j^*}}{w_r} \ge \frac{\nu^l_{j^*}}{w_l}$. (16)

Formula formula_38: $\frac{\partial I_G(A, a)}{\partial \nu^l_{j^*}} \ge 0$,

Formula formula_39: $\sum_{j\in[\tau], j\ne j^*} \left[\left(\frac{\nu^l_j}{w_l}\right)^2 - \left(\frac{\nu_j - \nu^l_j}{w_r}\right)^2\right] < 0$,

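Formula formula_40: $\sum_{j\in[\tau]} \left[\left(\frac{\nu^l_j}{w_l}\right)^2 - \left(\frac{\nu_j - \nu^l_j}{w_r}\right)^2\right] < \left(\frac{\nu^l_{j^*}}{w_l}\right)^2 - \left(\frac{\nu_{j^*} - \nu^l_{j^*}}{w_r}\right)^2 = \left(\frac{\nu^l_{j^*}}{w_l} + \frac{\nu_{j^*} - \nu^l_{j^*}}{w_r}\right)\left(\frac{\nu^l_{j^*}}{w_l} - \frac{\nu_{j^*} - \nu^l_{j^*}}{w_r}\right) < 2n\left(\frac{\nu^l_{j^*}}{w_l} - \frac{\nu_{j^*} - \nu^l_{j^*}}{w_r}\right)$.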

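Formula formula_41: $n^2 \frac{\partial I_G(A, a)}{\partial \nu^l_{j^*}} = \frac{1}{n}\sum_{j\in[\tau]} \left[\left(\frac{\nu^l_j}{w_l}\right)^2 - \left(\frac{\nu_j - \nu^l_j}{w_r}\right)^2\right] + 2\left(\frac{\nu_{j^*} - \nu^l_{j^*}}{w_r} - \frac{\nu^l_{j^*}}{w_l}\right) < 2\left(\frac{\nu^l_{j^*}}{w_l} - \frac{\nu_{j^*} - \nu^l_{j^*}}{w_r}\right) + 2\left(\frac{\nu_{j^*} - \nu^l_{j^*}}{w_r} - \frac{\nu^l_{j^*}}{w_l}\right) = 0$,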

Formula formula_42: $\max\{a_k : (a_k, y_k) \in I_{i-1}\}/2 + \min\{a_k : (a_k, y_k) \in I_i\}/2 \le a \le \max\{a_k : (a_k, y_k) \in I_i\}/2 + \min\{a_k : (a_k, y_k) \in I_{i+1}\}/2$, with $i = 2, 3, \ldots, s-1$.

Formula formula_43: $I_G(A, a)$ from $\nu^l_j = 0$ ($j \ne j^*$) when $a \in \left(-\infty, \left(\max\{a_k : (a_k, y_k) \in I_1\} + \min\{a_k : (a_k, y_k) \in I_2\}\right)/2\right]$;

Formula formula_44: … $j \ne j^*$) when $a \in \left[\left(\max\{a_k : (a_k, y_k) \in I_{s-1}\} + \min\{a_k : (a_k, y_k) \in I_s\}\right)/2, +\infty\right)$.

Formula formula_45: $a^* \in \bigcup_{i\in[s-1]} \left\{\frac{\max\{a_k : (a_k, y_k) \in I_i\} + \min\{a_k : (a_k, y_k) \in I_{i+1}\}}{2}\right\}$.

Formula formula_46: $I_G\left(A, \frac{\max\{a_k : (a_k, y_k) \in I_i\} + \min\{a_k : (a_k, y_k) \in I_{i+1}\}}{2}\right) = I_G\left(\hat A, \frac{c_i + c_{i+1}}{2}\right)$,

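Formula formula_47: $I_1 = \{(a_{\langle 1\rangle}, y_{\langle 1\rangle}), \ldots, (a_{\langle k_1\rangle}, y_{\langle k_1\rangle})\}$, $I_2 = \{(a_{\langle k_1+1\rangle}, y_{\langle k_1+1\rangle}), \ldots, (a_{\langle k_2\rangle}, y_{\langle k_2\rangle})\}$, $\ldots$, $I_s = \{(a_{\langle k_{s-1}+1\rangle}, y_{\langle k_{s-1}+1\rangle}), \ldots, (a_{\langle n\rangle}, y_{\langle n\rangle})\}$, (17)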

Formula formula_48: $I^*_G(A) = I^*_G(\hat A)$ from Theorem 1.

Formula formula_49: $I^*_G(A) = I^*_G(\hat A)$

Formula formula_50: $A^0 = \{a^0_1, a^0_2, \ldots, a^0_n\}$ and $A^1 = \{a^1_1, a^1_2, \ldots, a^1_n\}$

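Formula formula_51:
• Sort $A^b = \{a^b_{\langle 1\rangle}, a^b_{\langle 2\rangle}, \ldots, a^b_{\langle n\rangle}\}$ in ascending order, i.e., $a^b_{\langle 1\rangle} < a^b_{\langle 2\rangle} < \cdots < a^b_{\langle n\rangle}$ for $b \in \{0, 1\}$.
• Set the corresponding labels $\{y_{\langle 1\rangle}, y_{\langle 2\rangle}, \ldots, y_{\langle n\rangle}\}$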

Formula formula_52: $A^{l,b}_{a^b_i} = \{(a^b_1, y_1), (a^b_2, y_2), \ldots, (a^b_i, y_i)\}$ and $A^{r,b}_{a^b_i} = \{(a^b_{i+1}, y_{i+1}), (a^b_{i+2}, y_{i+2}), \ldots, (a^b_n, y_n)\}$.

Formula formula_53: $I_G(A^b, a^b_i) = \frac{i}{n} - \frac{i}{n}\sum_{j\in[\tau]} \frac{(\nu^{l,b}_j)^2}{i^2} + \frac{n-i}{n} - \frac{n-i}{n}\sum_{j\in[\tau]} \frac{(\nu^{r,b}_j)^2}{(n-i)^2}$.

Formula formula_54: $\sum_{j\in[\tau]} (\nu^{l,0}_j)^2 = \sum_{j\in[\tau]} (\nu^{l,1}_j)^2$.

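Formula formula_55: $\Pr\left(\hat a^0_{\langle 1\rangle}, \ldots, \hat a^0_{\langle n\rangle} \mid (a^0_{\langle 1\rangle}, y_{\langle 1\rangle}), \ldots, (a^0_{\langle n\rangle}, y_{\langle n\rangle})\right) = \Pr\left(\hat a^1_{\langle 1\rangle}, \ldots, \hat a^1_{\langle n\rangle} \mid (a^1_{\langle 1\rangle}, y_{\langle 1\rangle}), \ldots, (a^1_{\langle n\rangle}, y_{\langle n\rangle})\right)$. (18)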

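Formula formula_57:
| Parameter | Our Work | PPD-ERTs | HEldpRFs | PivotRFs | MulPRFs | AnonyRFs | DiffPrivRFs | Original RFs |
|---|---|---|---|---|---|---|---|---|
| max_features | ⌊√d⌋ | ⌊√d⌋ | ⌊√d⌋ | ⌊√d⌋ | ⌊√d⌋ | ⌊√d⌋ | ⌊√d⌋ | ⌊√d⌋ |
| differential privacy level ϵ | - | - | - | - | - | - | 1 | - |
| anonymization parameter k | - | - | - | - | - | 10 | - | - |
| multi-party size p | 2 | 2 | 2 | 2 | 2 | - | - | - |
| max_bin | - | - | - | 16 | - | - | - | - |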

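Formula formula_58: $\begin{aligned}
&\Pr\left(\hat a^1_{\langle 1\rangle}, \ldots, \hat a^1_{\langle i\rangle}, \hat a^1_{\langle i+1\rangle} \mid (a^1_{\langle 1\rangle}, y_{\langle 1\rangle}), \ldots, (a^1_{\langle i\rangle}, y_{\langle i\rangle}), (a^1_{\langle i+1\rangle}, y_{\langle i+1\rangle})\right) \\
&= \Pr\left(\hat a^1_{\langle 1\rangle}, \ldots, \hat a^1_{\langle i\rangle} \mid (a^1_{\langle 1\rangle}, y_{\langle 1\rangle}), \ldots, (a^1_{\langle i\rangle}, y_{\langle i\rangle})\right) \\
&\quad \times \Pr\left(l^1.\mathrm{cipher}_1 \mid (a^1_{\langle 1\rangle}, y_{\langle 1\rangle}), \ldots, (a^1_{\langle i\rangle}, y_{\langle i\rangle}), (a^1_{\langle i+1\rangle}, y_{\langle i+1\rangle})\right) \\
&\quad \times \Pr\left(r^1.\mathrm{cipher}_1 \mid (a^1_{\langle 1\rangle}, y_{\langle 1\rangle}), \ldots, (a^1_{\langle i\rangle}, y_{\langle i\rangle}), (a^1_{\langle i+1\rangle}, y_{\langle i+1\rangle})\right).
\end{aligned}$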

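Formula formula_59: $\Pr\left(\hat a^0_{\langle 1\rangle}, \ldots, \hat a^0_{\langle i\rangle}, \hat a^0_{\langle i+1\rangle} \mid (a^0_{\langle 1\rangle}, y_{\langle 1\rangle}), \ldots, (a^0_{\langle i\rangle}, y_{\langle i\rangle}), (a^0_{\langle i+1\rangle}, y_{\langle i+1\rangle})\right) = \Pr\left(\hat a^1_{\langle 1\rangle}, \ldots, \hat a^1_{\langle i\rangle}, \hat a^1_{\langle i+1\rangle} \mid (a^1_{\langle 1\rangle}, y_{\langle 1\rangle}), \ldots, (a^1_{\langle i\rangle}, y_{\langle i\rangle}), (a^1_{\langle i+1\rangle}, y_{\langle i+1\rangle})\right)$.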

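Formula formula_60:
| Dataset | Our encrypted RFs | Original RFs | AnonyRFs | DiffPrivRFs | PPD-ERTs | PivotRFs | MulPRFs | HEldpRFs |
|---|---|---|---|---|---|---|---|---|
| wdbc | 1 | ×10^-3 | ×10^-3 | ×10^-3 | ×2 | ×10 | ×25 | ×400 |
| cancer | 1 | ×10^-3 | ×10^-3 | ×10^-3 | ×1.5 | ×10 | ×20 | ×300 |
| breast | 1 | ×10^-3 | ×10^-3 | ×10^-3 | ×2 | ×13 | ×30 | ×10^3 |
| german | 1 | ×10^-3 | ×10^-3 | ×10^-3 | ×2 | ×18 | ×40 | ×3000 |
| diabetes | 1 | ×10^-3 | ×10^-3 | ×10^-3 | ×2 | ×15 | ×25 | ×850 |
| adver | 1 | ×10^-3 | ×10^-3 | ×10^-3 | NA | ×475 | NA | NA |
| bibtex | 1 | ×10^-3 | ×10^-3 | ×10^-3 | NA | ×328 | NA | NA |
| phpB0 | 1 | ×10^-3 | ×10^-3 | ×10^-3 | NA | NA | NA | NA |
| pendigits | 1 | ×10^-4 | ×10^-4 | ×10^-4 | ×2 | ×25 | NA | NA |
| phish | 1 | ×10^-3 | ×10^-3 | ×10^-3 | ×1 | ×139 | ×848 | NA |
| ailerons | 1 | ×10^-4 | ×10^-4 | ×10^-4 | ×1 | ×31 | ×40 | NA |
| house | 1 | ×10^-4 | ×10^-4 | ×10^-4 | ×1 | ×31 | ×38 | NA |
| a9a | 1 | ×10^-3 | ×10^-3 | ×10^-3 | ×1 | ×453 | ×762 | NA |
| amazon | 1 | ×10^-4 | ×10^-4 | ×10^-4 | ×1 | ×51 | ×31 | NA |
| bank | 1 | ×10^-4 | ×10^-4 | ×10^-4 | ×1.5 | ×149 | ×220 | NA |
| adult | 1 | ×10^-3 | ×10^-3 | ×10^-3 | ×2 | ×211 | ×276 | NA |
| mnist | 1 | ×10^-4 | ×10^-4 | ×10^-4 | NA | NA | NA | NA |
| miniboone | 1 | ×10^-4 | ×10^-4 | ×10^-4 | ×2 | ×35 | NA | NA |
| runwalk | 1 | ×10^-4 | ×10^-4 | ×10^-4 | ×2 | ×35 | NA | NA |
| covtype | 1 | ×10^-3 | ×10^-3 | ×10^-3 | ×1 | NA | NA | NA |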

Formula formula_61: $L(i, j) = \sum_{q\in[i, n-k+i]} \frac{\mathbb{I}\left(S_{\langle q\rangle}[j] = S_{\langle i\rangle}[j]\right)}{n-k+1} \times \sum_{s\in S^j_{b_{\langle i\rangle}[j]}}$

