{
  "metadata": {
    "forum_id": "BkzeUiRcY7",
    "review_id": "SyggSiwZp7",
    "rebuttal_id": "S1lrcUBMA7",
    "title": "M^3RL: Mind-aware Multi-agent Management Reinforcement Learning",
    "reviewer": "AnonReviewer3",
    "rating": 7,
    "conference": "ICLR2019",
    "permalink": "https://openreview.net/forum?id=BkzeUiRcY7&noteId=S1lrcUBMA7",
    "annotator": "anno2"
  },
  "review_sentences": [
    {
      "review_id": "SyggSiwZp7",
      "sentence_index": 0,
      "text": "Summary:",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyggSiwZp7",
      "sentence_index": 1,
      "text": "This paper proposes a way to train a manager agent which would manage a bunch of worker agents to achieve a high-level goal.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyggSiwZp7",
      "sentence_index": 2,
      "text": "Each worker has its own set of skills and preferences and the manager tries to assign sub-tasks to these agents along with bonuses such that the agents can even perform tasks that are not preferred by them.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyggSiwZp7",
      "sentence_index": 3,
      "text": "Authors achieve this by training a manager which tracks the skills and preferences of the agents on the fly.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyggSiwZp7",
      "sentence_index": 4,
      "text": "Authors have done an extensive analysis of the proposed approach in two simple domains: resource collection and crafting.",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyggSiwZp7",
      "sentence_index": 5,
      "text": "Major comments:",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyggSiwZp7",
      "sentence_index": 6,
      "text": "This paper focuses on multi-agent settings with self-interested agents.",
      "suffix": "",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_summary",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyggSiwZp7",
      "sentence_index": 7,
      "text": "The problem formulation and the solution are novel enough.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_originality",
      "polarity": "pol_positive"
    },
    {
      "review_id": "SyggSiwZp7",
      "sentence_index": 8,
      "text": "Experiments are on toy domains with very few goals and sub-task dependencies.",
      "suffix": "",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_substance",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SyggSiwZp7",
      "sentence_index": 9,
      "text": "However, authors have done a good job in doing an extensive analysis of the proposed approach.",
      "suffix": "\n\n",
      "review_action": "arg_evaluative",
      "fine_review_action": "none",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_positive"
    },
    {
      "review_id": "SyggSiwZp7",
      "sentence_index": 10,
      "text": "1.\tCan you comment about the scalability of the proposed solution when the number of possible subtasks increases? When the sub-task dependency graph size increases?",
      "suffix": "\n\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_explanation",
      "aspect": "asp_soundness-correctness",
      "polarity": "none"
    },
    {
      "review_id": "SyggSiwZp7",
      "sentence_index": 11,
      "text": "2.\tWhat is the reason for using rule-based agents in all the experiments? It would have been more useful if all the analysis are done with RL agents rather than rule-based agents.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_explanation",
      "aspect": "asp_soundness-correctness",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SyggSiwZp7",
      "sentence_index": 12,
      "text": "It would also make the paper stronger.",
      "suffix": "\n\n",
      "review_action": "arg_fact",
      "fine_review_action": "none",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyggSiwZp7",
      "sentence_index": 13,
      "text": "3.\tAre the authors willing to release the code? Overall the model looks complicated and the appendix is not sufficient to reproduce the results in the paper.",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_clarification",
      "aspect": "asp_replicability",
      "polarity": "pol_negative"
    },
    {
      "review_id": "SyggSiwZp7",
      "sentence_index": 14,
      "text": "I would increase my rating if the authors are willing to release the code to reproduce all the results reported in the paper.",
      "suffix": "\n\n\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_edit",
      "aspect": "asp_replicability",
      "polarity": "none"
    },
    {
      "review_id": "SyggSiwZp7",
      "sentence_index": 15,
      "text": "Minor comments:",
      "suffix": "\n\n",
      "review_action": "arg_structuring",
      "fine_review_action": "arg-structuring_heading",
      "aspect": "none",
      "polarity": "none"
    },
    {
      "review_id": "SyggSiwZp7",
      "sentence_index": 16,
      "text": "1.\tPage 3, line 9: \u201ctypical\u201d -> \u201ctypically\u201d",
      "suffix": "\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_typo",
      "aspect": "asp_clarity",
      "polarity": "none"
    },
    {
      "review_id": "SyggSiwZp7",
      "sentence_index": 17,
      "text": "2.\tPage 3, \u201cintention\u201d section: \u201cBased on the its reward ..\u201d Check grammar.",
      "suffix": "\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_typo",
      "aspect": "asp_clarity",
      "polarity": "none"
    },
    {
      "review_id": "SyggSiwZp7",
      "sentence_index": 18,
      "text": "3.\tPage 5, last line: \u201cthe total quantitative is 10\u201d check grammar.",
      "suffix": "\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_typo",
      "aspect": "asp_clarity",
      "polarity": "none"
    },
    {
      "review_id": "SyggSiwZp7",
      "sentence_index": 19,
      "text": "4.\tPage 8, conclusions, second line: \u201cnad\u201d -> \u201cand\u201d",
      "suffix": "\n",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_typo",
      "aspect": "asp_clarity",
      "polarity": "none"
    },
    {
      "review_id": "SyggSiwZp7",
      "sentence_index": 20,
      "text": "5.\tPage 8, conclusions, 4th line: \u201ccombing\u201d -> \u201ccombine\u201d",
      "suffix": "",
      "review_action": "arg_request",
      "fine_review_action": "arg-request_typo",
      "aspect": "asp_clarity",
      "polarity": "none"
    }
  ],
  "rebuttal_sentences": [
    {
      "review_id": "SyggSiwZp7",
      "rebuttal_id": "S1lrcUBMA7",
      "sentence_index": 0,
      "text": "Thank you for your reviews and comments.",
      "suffix": "",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_social",
      "alignment": [
        "context_global",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SyggSiwZp7",
      "rebuttal_id": "S1lrcUBMA7",
      "sentence_index": 1,
      "text": "We address your questions as follows.",
      "suffix": "\n\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_in-rebuttal",
        null
      ],
      "details": {}
    },
    {
      "review_id": "SyggSiwZp7",
      "rebuttal_id": "S1lrcUBMA7",
      "sentence_index": 2,
      "text": "1. Scalability of the proposed solution",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          10
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyggSiwZp7",
      "rebuttal_id": "S1lrcUBMA7",
      "sentence_index": 3,
      "text": "From our current results, you may see that our approach has a decent scalability -- even though we doubled the subtasks and also introduced additional dependency in Crafting compared to Resource Collection, it does not need much more episodes for converging to optimal policies, where our agent-wise exploration plays an important role.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          10
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyggSiwZp7",
      "rebuttal_id": "S1lrcUBMA7",
      "sentence_index": 4,
    "text": "Generally speaking, deploying more present workers coupled with our agent-wise exploration should significantly improve the learning efficiency and overcome the challenges introduced from more subtasks or a larger dependency graph.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          10
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyggSiwZp7",
      "rebuttal_id": "S1lrcUBMA7",
      "sentence_index": 5,
      "text": "In addition, the computational complexity is linear in terms of the number of agents, so our approach is also scalable when there are more agents.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          10
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyggSiwZp7",
      "rebuttal_id": "S1lrcUBMA7",
      "sentence_index": 6,
      "text": "2. What is the reason for using rule-based agents in all the experiments?",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          11
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyggSiwZp7",
      "rebuttal_id": "S1lrcUBMA7",
      "sentence_index": 7,
      "text": "We have actually used RL agents as well (Appendix C.3), and it showed that our approach also works when workers are RL agents.",
      "suffix": "",
      "rebuttal_stance": "dispute",
      "rebuttal_action": "rebuttal_refute-question",
      "alignment": [
        "context_sentences",
        [
          11
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyggSiwZp7",
      "rebuttal_id": "S1lrcUBMA7",
      "sentence_index": 8,
      "text": "In the main results, we focus on rule-based agents because it is computationally demanding to train a large population of RL agents, and our focus was not about the worker policies but rather how the manager assesses the workers\u2019 mental states and encourages an optimal collaboration accordingly.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          11
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyggSiwZp7",
      "rebuttal_id": "S1lrcUBMA7",
      "sentence_index": 9,
      "text": "In this paper, using a cheap rule-based implementation with randomness has demonstrated the effect of different components of our approach.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          11
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyggSiwZp7",
      "rebuttal_id": "S1lrcUBMA7",
      "sentence_index": 10,
      "text": "3. Are the authors willing to release the code?",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          13
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyggSiwZp7",
      "rebuttal_id": "S1lrcUBMA7",
      "sentence_index": 11,
      "text": "Yes, we do plan to open source our implementation.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          13
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyggSiwZp7",
      "rebuttal_id": "S1lrcUBMA7",
      "sentence_index": 12,
      "text": "Specifically, the game environment and the worker agents were implemented in Python and it runs at a speed of more than 300 steps per second.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          13
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyggSiwZp7",
      "rebuttal_id": "S1lrcUBMA7",
      "sentence_index": 13,
      "text": "We used PyTorch as the framework for implementing all the network modules.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          13
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyggSiwZp7",
      "rebuttal_id": "S1lrcUBMA7",
      "sentence_index": 14,
      "text": "Typically it took < 10 hours to get a converged result by our approach on a single Nvidia Tesla V100 GPU.",
      "suffix": "\n\n",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_answer",
      "alignment": [
        "context_sentences",
        [
          13
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyggSiwZp7",
      "rebuttal_id": "S1lrcUBMA7",
      "sentence_index": 15,
      "text": "4. Typos",
      "suffix": "\n",
      "rebuttal_stance": "nonarg",
      "rebuttal_action": "rebuttal_structuring",
      "alignment": [
        "context_sentences",
        [
          16,
          17,
          18,
          19,
          20
        ]
      ],
      "details": {}
    },
    {
      "review_id": "SyggSiwZp7",
      "rebuttal_id": "S1lrcUBMA7",
      "sentence_index": 16,
      "text": "Thanks for pointing out these typos. We will fix them in the next revision.",
      "suffix": "",
      "rebuttal_stance": "concur",
      "rebuttal_action": "rebuttal_by-cr",
      "alignment": [
        "context_sentences",
        [
          16,
          17,
          18,
          19,
          20
        ]
      ],
      "details": {
        "manuscript_change": true
      }
    }
  ]
}