{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Run a model with h5 data as input\n",
    "\n",
    "FuxiCTR version: v1.0\n",
    "\n",
    "This tutorial presents how to run a model with h5 data as input."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "FuxiCTR supports both csv and h5 data as input. After running a model with csv dataset config, the h5 data will be generated at the `model_root` path. One can reuse the produced h5 data for other experiments flexibly. We demonstrate this with the data `taobao_tiny_h5` as follows. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# demo/demo_config/dataset_config.yaml\n",
    "taobao_tiny_h5:\n",
    "    data_root: ../data/\n",
    "    data_format: h5\n",
    "    train_data: ../data/taobao_tiny_h5/train.h5\n",
    "    valid_data: ../data/taobao_tiny_h5/valid.h5\n",
    "    test_data: ../data/taobao_tiny_h5/test.h5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that each h5 dataset contains a `feature_map.json` file, which saves the feature specifications required for data loading and model training. Take the following feature_map.json as an example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# data/taobao_tiny_h5/feature_map.json\n",
    "{\n",
    "    \"dataset_id\": \"taobao_tiny_h5\",\n",
    "    \"num_fields\": 14,\n",
    "    \"num_features\": 476,\n",
    "    \"input_length\": 14,\n",
    "    \"feature_specs\": {\n",
    "        \"userid\": {\n",
    "            \"source\": \"\",\n",
    "            \"type\": \"categorical\",\n",
    "            \"min_categr_count\": 1,\n",
    "            \"vocab_size\": 25,\n",
    "            \"index\": 0\n",
    "        },\n",
    "        \"adgroup_id\": {\n",
    "            \"source\": \"\",\n",
    "            \"type\": \"categorical\",\n",
    "            \"min_categr_count\": 1,\n",
    "            \"vocab_size\": 100,\n",
    "            \"index\": 1\n",
    "        },\n",
    "        \"pid\": {\n",
    "            \"source\": \"\",\n",
    "            \"type\": \"categorical\",\n",
    "            \"min_categr_count\": 1,\n",
    "            \"vocab_size\": 3,\n",
    "            \"index\": 2\n",
    "        },\n",
    "        \"cate_id\": {\n",
    "            \"source\": \"\",\n",
    "            \"type\": \"categorical\",\n",
    "            \"min_categr_count\": 1,\n",
    "            \"vocab_size\": 48,\n",
    "            \"index\": 3\n",
    "        },\n",
    "        \"campaign_id\": {\n",
    "            \"source\": \"\",\n",
    "            \"type\": \"categorical\",\n",
    "            \"min_categr_count\": 1,\n",
    "            \"vocab_size\": 98,\n",
    "            \"index\": 4\n",
    "        },\n",
    "        \"customer\": {\n",
    "            \"source\": \"\",\n",
    "            \"type\": \"categorical\",\n",
    "            \"min_categr_count\": 1,\n",
    "            \"vocab_size\": 97,\n",
    "            \"index\": 5\n",
    "        },\n",
    "        \"brand\": {\n",
    "            \"source\": \"\",\n",
    "            \"type\": \"categorical\",\n",
    "            \"min_categr_count\": 1,\n",
    "            \"vocab_size\": 66,\n",
    "            \"index\": 6\n",
    "        },\n",
    "        \"cms_segid\": {\n",
    "            \"source\": \"\",\n",
    "            \"type\": \"categorical\",\n",
    "            \"min_categr_count\": 1,\n",
    "            \"vocab_size\": 10,\n",
    "            \"index\": 7\n",
    "        },\n",
    "        \"cms_group_id\": {\n",
    "            \"source\": \"\",\n",
    "            \"type\": \"categorical\",\n",
    "            \"min_categr_count\": 1,\n",
    "            \"vocab_size\": 10,\n",
    "            \"index\": 8\n",
    "        },\n",
    "        \"final_gender_code\": {\n",
    "            \"source\": \"\",\n",
    "            \"type\": \"categorical\",\n",
    "            \"min_categr_count\": 1,\n",
    "            \"vocab_size\": 3,\n",
    "            \"index\": 9\n",
    "        },\n",
    "        \"age_level\": {\n",
    "            \"source\": \"\",\n",
    "            \"type\": \"categorical\",\n",
    "            \"min_categr_count\": 1,\n",
    "            \"vocab_size\": 6,\n",
    "            \"index\": 10\n",
    "        },\n",
    "        \"pvalue_level\": {\n",
    "            \"source\": \"\",\n",
    "            \"type\": \"categorical\",\n",
    "            \"min_categr_count\": 1,\n",
    "            \"vocab_size\": 3,\n",
    "            \"index\": 11\n",
    "        },\n",
    "        \"shopping_level\": {\n",
    "            \"source\": \"\",\n",
    "            \"type\": \"categorical\",\n",
    "            \"min_categr_count\": 1,\n",
    "            \"vocab_size\": 4,\n",
    "            \"index\": 12\n",
    "        },\n",
    "        \"occupation\": {\n",
    "            \"source\": \"\",\n",
    "            \"type\": \"categorical\",\n",
    "            \"min_categr_count\": 1,\n",
    "            \"vocab_size\": 3,\n",
    "            \"index\": 13\n",
    "        }\n",
    "    }\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Given the h5 dataset as input, we provide a demo to run DeepFM in `DeepFM_with_h5.py`. The core code is as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load the params\n",
    "config_dir = 'demo_config'\n",
    "experiment_id = 'DeepFM_test_h5'\n",
    "params = load_config(config_dir, experiment_id)\n",
    "\n",
    "# Set up feature encoder and load from feature_map.json\n",
    "feature_encoder = FeatureEncoder(**params)\n",
    "feature_encoder.feature_map.load(feature_encoder.json_file)\n",
    "\n",
    "# Build train/validation/test data generators\n",
    "train_gen, valid_gen, test_gen = data_generator(feature_encoder,\n",
    "                                                train_data=params['train_data'],\n",
    "                                                valid_data=params['valid_data'],\n",
    "                                                test_data=params['test_data'],\n",
    "                                                batch_size=params['batch_size'],\n",
    "                                                shuffle=params['shuffle'],\n",
    "                                                use_hdf5=True)\n",
    "# Build a DeepFM model\n",
    "model = DeepFM(feature_encoder.feature_map, **params)\n",
    "model.fit_generator(train_gen, validation_data=valid_gen, epochs=params['epochs'],\n",
    "                    verbose=params['verbose'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The full code is available in `demo/DeepFM_with_h5.py`. You can run the demo as shown below. In addition, if you would like to change the setting of a feature field, you can modify the corresponding values in feature_map.json."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cd demo\n",
    "python DeepFM_with_h5.py"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
