Efficient Learning of Domain-invariant Image Representations
Judy Hoffman, Erik Rodner, Jeff Donahue, Trevor Darrell, Kate Saenko
15 Jan 2013 | arXiv | ICLR 2013 Conference Track | 7 Comments
We present an algorithm that learns representations which explicitly compensate for domain mismatch and which can be efficiently realized as linear classifiers. Specifically, we form a linear transformation that maps features from the target (test) domain to the source (training) domain as part of training the classifier. We optimize both the transformation and classifier parameters jointly, and introduce an efficient cost function based on misclassification loss. Our method combines several features previously unavailable in a single algorithm: multi-class adaptation through representation learning, ability to map across heterogeneous feature spaces, and scalability to large datasets. We present experiments on several image datasets that demonstrate improved accuracy and computational advantages compared to previous approaches.
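As a rough illustration of the joint optimization described above, the sketch below learns a hyperplane theta and a target-to-source map W together by minimizing hinge loss on source points and on mapped target points. This is a toy binary-classification sketch with invented data, dimensions, and a plain subgradient solver; it is not the paper's actual multi-class formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative only): d_s-dim source features, d_t-dim target
# features, binary labels in {-1, +1} determined by the first coordinate.
d_s, d_t, n_s, n_t = 5, 3, 40, 10
Xs = rng.normal(size=(n_s, d_s)); ys = np.where(Xs[:, 0] > 0, 1.0, -1.0)
Xt = rng.normal(size=(n_t, d_t)); yt = np.where(Xt[:, 0] > 0, 1.0, -1.0)

theta = np.zeros(d_s)                    # source-domain hyperplane
W = rng.normal(0, 0.1, size=(d_s, d_t))  # maps target features into source space

def loss(theta, W):
    """Regularized hinge loss on source points and on mapped target points."""
    m_s = np.maximum(0, 1 - ys * (Xs @ theta))
    m_t = np.maximum(0, 1 - yt * ((Xt @ W.T) @ theta))
    return m_s.sum() + m_t.sum() + 0.5 * theta @ theta + 0.5 * (W * W).sum()

initial = loss(theta, W)
lr = 0.01
for _ in range(200):                     # joint subgradient descent on (theta, W)
    a_s = (1 - ys * (Xs @ theta)) > 0    # source points violating the margin
    a_t = (1 - yt * ((Xt @ W.T) @ theta)) > 0  # mapped target points violating it
    g_theta = theta - (ys[a_s, None] * Xs[a_s]).sum(0) \
                    - (yt[a_t, None] * (Xt[a_t] @ W.T)).sum(0)
    g_W = W - np.outer(theta, (yt[a_t, None] * Xt[a_t]).sum(0))
    theta -= lr * g_theta
    W -= lr * g_W

print(loss(theta, W) < initial)          # the joint objective decreased
```

Because W appears inside the same hinge loss as theta, gradient steps on W push mapped target points to the correct side of the source hyperplane, which is the sense in which the transformation and classifier are optimized jointly.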
Workflow history:
- 15 Jan 2013: Judith Fanny Hoffman fulfilled the ICLR 2013 call for conference papers (to: ICLR 2013 Conference Track).
- 15 Jan 2013: Judith Fanny Hoffman requested endorsement for oral presentation.
- 05 Feb 2013: Judith Fanny Hoffman revealed the document.
- 05 Feb 2013 (due 01 Mar 2013): Rob Fergus requested reviews from Anonymous 9aa4, Anonymous feb2, and Anonymous 36a3.
- 27 Mar 2013: ICLR 2013 Conference Track revealed and fulfilled the endorsement for oral presentation.

7 Comments

Anonymous 9aa4 01 Mar 2013
This paper focuses on multi-task learning across domains, where both the data-generating distribution and the output labels can change between source and target domains. It presents an SVM-based model which jointly learns 1) affine hyperplanes that separate the classes in a common domain consisting of the source and the target projected to the source; and 2) a linear transformation mapping points from the target domain into the source domain.

Positive points:
1) The method is dead simple and seems technically sound. To the best of my knowledge it is novel, but I am not as familiar with the SVM literature; I am hoping that another reviewer comes from the SVM community and can better assess its novelty.
2) The paper is well written and understandable.
3) The experiments seem thorough: several datasets and tasks are considered, and the model is compared to various baselines. The model is shown to outperform contemporary domain adaptation methods, to generalize to novel categories at test time (which many other methods cannot do), and to scale to large datasets.

Negative points:
I have one major criticism: the paper does not seem really focused on representation learning; it is more a paper about a method for multi-task learning across domains which learns a (shallow, linear) mapping from the target to the source. I agree that it is a representation, but there is no real analysis of or focus on the representation itself, e.g. what is being captured by it. The method is totally valid, but I get the sense that this is a paper that would fit well with CVPR or ICCV (i.e. a good vision paper) whose title says "representation learning", and where a few sentences highlight the "representation" being learned, but neither the method nor the paper's focus is really on learning interesting representations. On one hand I question its suitability for ICLR and its appeal to this community (compared to CVPR/ICCV, etc.); on the other hand, I think it is great to encourage diversity in the papers/authors at the conference, and having a more "visiony"-feeling paper is not a bad thing.

Comments:
Can you state up front what is meant by the asymmetry of the transform (e.g. when it is first mentioned)? Later in the paper it becomes clear that this has to do with the source and target having different feature dimensions, but it was not obvious to me at the beginning. Just before Eq. (4) and (5) it says that "we begin by rewriting Eq 1-3 with soft constraints (slack)", but where are the slack variables in Eq. (4)?
Judy Hoffman, Erik Rodner, Jeff Donahue, Trevor Darrell, Kate Saenko 11 Mar 2013
Please see the comment below (from March 3rd). We have updated the paper to incorporate your comments.
Judy Hoffman, Erik Rodner, Jeff Donahue, Trevor Darrell, Kate Saenko 03 Mar 2013
Thank you for your feedback. We argue that the task of adapting representations across domains is common to all representation learning challenges, including those based on deep architectures, metric learning methods, and max-margin transform learning. Our insight into this problem is to use the source classifier to inform the representation learned for the target data. Specifically, we jointly learn a source domain classifier and a representation for the target domain, such that the target points can be well classified in the source domain. We present a specific algorithm using an SVM classifier and test on visual domains; however, the principles of our method are applicable both to a range of methods for learning and classification (beyond SVM) and to a range of applications (beyond vision).

In addition, thank you for the comments in your review. We will clarify what is meant by an asymmetric transform and modify the wording around equations (4-5) to reflect the math shown, which has soft constraints and no slack variables.
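For readers following this exchange, the standard equivalence being referred to, written here in generic binary-SVM notation rather than the paper's exact Eqs. (1)-(5), is that the slack-variable form and the soft-constraint (hinge-loss) form are the same problem:

```latex
% Constrained form with explicit slack variables \xi_i:
\min_{w,\,\xi}\ \tfrac{1}{2}\|w\|^2 + C\sum_i \xi_i
\quad \text{s.t.}\quad y_i\, w^\top x_i \ge 1 - \xi_i,\ \ \xi_i \ge 0 .

% At the optimum \xi_i = \max(0,\, 1 - y_i\, w^\top x_i), so this equals
% the unconstrained hinge-loss form, with no explicit slack variables:
\min_{w}\ \tfrac{1}{2}\|w\|^2 + C\sum_i \max\bigl(0,\ 1 - y_i\, w^\top x_i\bigr).
```

Substituting the optimal slacks into the constrained objective yields the unconstrained one, which is why soft constraints can appear without any explicit slack variables.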
Anonymous feb2 04 Mar 2013
This paper proposes to make domain adaptation and multi-task learning easier by jointly learning the task-specific max-margin classifiers and a linear mapping from a new target space to the source space; the loss function encourages the mapped features to lie on the correct side of the hyperplanes of the max-margin classifiers learned for each task. Experiments show that the mapping performs as well as or better than existing domain adaptation methods, but can scale to larger problems where many earlier approaches are too costly.

Overall the paper is clear, well-crafted, and the context and previous work are well presented. The idea is appealing in its simplicity, and works well.

Pros: the idea is intuitive and well justified; it is appealing that the method is flexible and can tackle cases where labels are missing for some categories. The paper is clear and well written. The experimental results are convincing enough; while they do not outperform the state of the art (results are within the standard error of previously published performance), the authors' argument that their method is better suited to cases where the domains are more different seems reasonable and backed by their experimental results.

Cons: this method would work only in cases where a simple general linear transformation of features would do a good job of placing them in a favorable space. The method also gives a privileged role to the source space, while methods that map features to a common latent space have more symmetry; the authors argue that it is hard to guess the optimal dimension of the latent space, but their method simply constrains it to the size of the source space, so there is no guarantee that this is any more optimal.
Judy Hoffman, Erik Rodner, Jeff Donahue, Trevor Darrell, Kate Saenko 10 Mar 2013
Thank you for your review. In this paper we present a method that learns an asymmetric linear mapping between the source and target feature spaces. In general, the feature transformation learning can be kernelized (the optimization framework can be formulated as a standard QP). However, for this work we focus on the linear case because of its scalability to a large number of data points. We show that, using the linear framework, we perform as well as or better than other methods which learn a non-linear mapping.

We learn a transformation between the target and source points which can be expressed by the matrix W in our paper. In this paper, we use this matrix to compute the dot product in the source domain between theta_k and the transformed target points (Wx^t_i). However, if we think of W (an asymmetric matrix) as being decomposed as W = A'B, then the dot product function can be interpreted as theta_k'A'Bx^t_i, i.e., as the dot product in some common latent space between source points transformed by A and target points transformed by B. We propose learning the matrix W rather than A and B directly so that we do not have to specify the dimension of the latent space.
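The decomposition argument above can be checked numerically. This toy snippet (dimensions and matrices invented purely for illustration) verifies that scoring with W = A'B in the source space is identical to a dot product in the implicit latent space:

```python
import numpy as np

rng = np.random.default_rng(0)
d_s, d_t, d_lat = 6, 4, 3          # source, target, latent dims (hypothetical)

A = rng.normal(size=(d_lat, d_s))  # would map source points to the latent space
B = rng.normal(size=(d_lat, d_t))  # would map target points to the latent space
W = A.T @ B                        # the single matrix the method learns directly

theta_k = rng.normal(size=d_s)     # a source-domain hyperplane
x_t = rng.normal(size=d_t)         # a target point

# theta_k' (W x_t) == (A theta_k)' (B x_t): scoring a mapped target point in
# the source space equals a dot product between the two latent-space images.
s_source = theta_k @ (W @ x_t)
s_latent = (A @ theta_k) @ (B @ x_t)
print(np.isclose(s_source, s_latent))   # True
```

Any such W has rank at most min(d_s, d_t, d_lat), so learning W directly leaves the effective latent dimension free rather than fixing it in advance.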
Anonymous 36a3 08 Mar 2013
The paper presents a new method for learning domain-invariant image representations. The proposed approach simultaneously learns a linear mapping of the target features into the source domain and the parameters of a multi-class linear SVM classifier. Experimental evaluations show that the proposed approach performs similarly to or better than previous art, and the new algorithm presents computational advantages with respect to previous approaches.

The paper is well written and clearly presented. It addresses an interesting problem that has received attention in recent years. The proposed method is considerably simpler than competing approaches with similar (or better) performance (in the setting of the reported experiments). The method is not very novel but manages to improve on some drawbacks of previous approaches.

Pros:
- The proposed framework is fairly simple, and the provided implementation details make it easy to reproduce.
- An experimental evaluation is presented, comparing the proposed method with several competing approaches. The amount of empirical evidence seems sufficient to back up the claims.

Cons:
- Given that the method is general, I think it would have been very good to include an example with more distinct source and target feature spaces (e.g. text categorization), or, even better, different modalities.

Comments: In the work [15], the authors propose a metric that measures the adaptability between a pair of source and target domains; in a setting where several possible source domains are available, it selects the best one. How could this be considered in your setting? In the first experimental setting (the standard domain adaptation problem), I understand that the idea of the experiment is to show how the labeled data in the source domain can help to better classify the data in the target domain. It is not clear to me how the SVM of the target domain, SVM_t, is trained. Is this done only with the limited set of labeled data in the target domain? What is the case for SVM_s? Looking at the last experimental setting, I suppose that SVM_s (trained using source training data) also includes the transformed data from the target domain; otherwise, I don't understand how the performance can increase with the number of labeled target examples.
Judy Hoffman, Erik Rodner, Jeff Donahue, Trevor Darrell, Kate Saenko 10 Mar 2013
Thank you for your feedback. We would like to start by clarifying a few points from your comments section. First, in our first experiment (the standard domain adaptation setting), SVM_t is the classifier learned by training with only the limited available data from the target domain. So, for example, when we're looking at the shift from amazon to webcam (a->w), we have a lot of training data from amazon and a very small amount from the webcam dataset; SVM_t for this example would be an SVM trained on just the small amount of data from webcam. Note that in the new-category experimental setting it is not possible to train SVM_t, because some categories have no labeled examples in the target. Second, for our last experiment, SVM_s does not (and should not) change as the number of points in the target is increased. SVM_s is an SVM classifier trained using only source data. In the figure it is represented by the dotted cyan line, which remains constant (at around 42%) as the number of labeled target examples grows.

As a third point, if we did have a metric to determine the adaptability of a (source, target) domain pair, then we could simply choose to use the source data which is most adaptable to our target data. However, [15] provides a metric to determine a "distance" between the source and target subspaces, not necessarily an adaptability metric. The two might be correlated, depending on the adaptation algorithm you use: if a (source, target) pair are "close", you might assume they are easily adaptable. But with our method we learn a transformation between the two spaces, so it is possible for a (source, target) pair to initially be very different according to the metric from [15], yet be very adaptable. For example, in [15] the metric said that Caltech was most similar to Amazon, followed by Webcam, followed by Dslr. However, if you look at Table 1, you see that we obtained higher accuracy when adapting from dslr->caltech than from webcam->caltech. So even though webcam was initially more similar to caltech than dslr was, we find that dslr is more "adaptable" to caltech.

Finally, the idea of using more distinct domains or even different modalities is very interesting to us and is something we are considering for future work. We do feel that the experiments we present justify our claims that our algorithm performs comparably to or better than state-of-the-art techniques while being applicable to a larger variety of possible adaptation scenarios.
ICLR 2013 Conference Track 27 Mar 2013
Endorsed for oral presentation: Efficient Learning of Domain-invariant Image Representations