{"code": "###Instruction: What is an API that can be used to classify sports activities in videos?\n###Output: {'domain': 'Video Classification', 'api_call': 'model = torch.hub.load('facebookresearch/pytorchvideo', 'slow_r50', pretrained=True)', 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained 3D ResNet model from PyTorch for video classification, suitable for classifying sports activities in videos.', 'code': 'import torch\nmodel = torch.hub.load('facebookresearch/pytorchvideo', 'slow_r50', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='facebookresearch/pytorchvideo', model='slow_r50', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Video Classification", "framework": "PyTorch", "functionality": "3D ResNet", "api_name": "slow_r50", "api_call": "torch.hub.load(repo_or_dir='facebookresearch/pytorchvideo', model='slow_r50', pretrained=True)", "api_arguments": {"pretrained": "True"}, "python_environment_requirements": ["torch", "json", "urllib", "pytorchvideo", "torchvision", "torchaudio", "torchtext", "torcharrow", "TorchData", "TorchRec", "TorchServe", "PyTorch on XLA Devices"], "example_code": ["import torch", "model = torch.hub.load('facebookresearch/pytorchvideo', 'slow_r50', pretrained=True)", "device = 'cpu'", "model = model.eval()", "model = model.to(device)"], "performance": {"dataset": "Kinetics 400", "accuracy": {"top_1": 74.58, "top_5": 91.63}, "Flops (G)": 54.52, "Params (M)": 32.45}, "description": "The 3D ResNet model is a Resnet-style video classification network pretrained on the Kinetics 400 dataset. It is based on the architecture from the paper 'SlowFast Networks for Video Recognition' by Christoph Feichtenhofer et al."}}
{"code": "###Instruction: Identify an API capable of converting spoken language in a recording to text.\n###Output: {'domain': 'Speech-To-Text', 'api_call': \"model = torch.hub.load('snakers4/silero-models', 'silero_stt', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the Silero Models for Speech-To-Text from PyTorch Hub with pre-trained weights to convert spoken language in a recording to text.', 'code': 'import torch\ndevice = torch.device('cpu')\nmodel, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language='en', device=device)'}", "api_call": "torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Speech-To-Text", "api_name": "Silero Models", "api_call": "torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', pretrained=True)", "api_arguments": {"repo_or_dir": "snakers4/silero-models", "model": "silero_stt", "language": ["en", "de", "es"], "device": "device"}, "python_environment_requirements": ["pip install -q torchaudio omegaconf soundfile"], "example_code": ["import torch", "import zipfile", "import torchaudio", "from glob import glob", "device = torch.device('cpu')", "model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language='en', device=device)", "(read_batch, split_into_batches, read_audio, prepare_model_input) = utils", "torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst ='speech_orig.wav', progress=True)", "test_files = glob('speech_orig.wav')", "batches = split_into_batches(test_files, batch_size=10)", "input = prepare_model_input(read_batch(batches[0]), device=device)", "output = model(input)", "for example in output:", " print(decoder(example.cpu()))"], "performance": {"dataset": "Open-STT", "accuracy": "See the wiki for quality and 
performance benchmarks"}, "description": "Silero Speech-To-Text models provide enterprise grade STT in a compact form-factor for several commonly spoken languages. The models are robust to a variety of dialects, codecs, domains, noises, and lower sampling rates. They consume a normalized audio in the form of samples and output frames with token probabilities. A decoder utility is provided for simplicity."}}
{"code": "###Instruction: To analyze street photos, I need to segment different objects like pedestrians, vehicles, and buildings from a given image. Provide an API able to perform semantic segmentation on images.\n###Output: {'domain': 'Semantic Segmentation', 'api_call': 'model = torch.hub.load(repo_or_dir=\"pytorch/vision\", model=\"deeplabv3_resnet50\", pretrained=True)', 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained DeepLabV3 model with a ResNet-50 backbone from PyTorch Hub to perform semantic segmentation on images of streets and other scenes.', 'code': 'import torch\nmodel = torch.hub.load(\"pytorch/vision\", \"deeplabv3_resnet50\", pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='deeplabv3_resnet50', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Semantic Segmentation", "framework": "PyTorch", "functionality": "DeepLabV3", "api_name": "torch.hub.load", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='deeplabv3_resnet50', pretrained=True)", "api_arguments": {"pretrained": "True"}, "python_environment_requirements": {"torch": "latest", "torchvision": "latest", "PIL": "latest", "matplotlib": "latest"}, "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'deeplabv3_resnet50', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "input_image = input_image.convert('RGB')", "preprocess = transforms.Compose([", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)['out'][0]", "output_predictions = output.argmax(0)"], "performance": {"dataset": "COCO val2017", "accuracy": 
[{"model": "deeplabv3_resnet50", "Mean IOU": 66.4, "Global Pixelwise Accuracy": 92.4}]}, "description": "DeepLabV3 models with ResNet-50, ResNet-101 and MobileNet-V3 backbones for semantic segmentation. The pre-trained models have been trained on a subset of COCO train2017, on the 20 categories that are present in the Pascal VOC dataset."}}
{"code": "###Instruction: To implement a lightweight object detection, I'm looking for a pre-trained model API that can detect and classify objects within an image in real-time.\n###Output: {'domain': 'Object Detection', 'api_call': \"model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': \"Load the lightweight YOLOv5s model from PyTorch Hub, which is optimized for real-time object detection and classification.\", 'code': 'import torch\nmodel = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='ultralytics/yolov5', model='yolov5s', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Object Detection", "framework": "PyTorch", "functionality": "YOLOv5", "api_name": "torch.hub.load", "api_call": "torch.hub.load(repo_or_dir='ultralytics/yolov5', model='yolov5s', pretrained=True)", "api_arguments": ["'ultralytics/yolov5'", "'yolov5s'", "pretrained=True"], "python_environment_requirements": "Python>=3.8, PyTorch>=1.7", "example_code": ["import torch", "model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)", "imgs = ['https://ultralytics.com/images/zidane.jpg']", "results = model(imgs)", "results.print()", "results.save()", "results.xyxy[0]", "results.pandas().xyxy[0]"], "performance": {"dataset": "COCO", "accuracy": {"YOLOv5s6": {"mAPval0.5:0.95": 43.3, "mAPtest0.5:0.95": 43.3, "mAPval0.5": 61.9}, "YOLOv5m6": {"mAPval0.5:0.95": 50.5, "mAPtest0.5:0.95": 50.5, "mAPval0.5": 68.7}, "YOLOv5l6": {"mAPval0.5:0.95": 53.4, "mAPtest0.5:0.95": 53.4, "mAPval0.5": 71.1}, "YOLOv5x6": {"mAPval0.5:0.95": 54.4, "mAPtest0.5:0.95": 54.4, "mAPval0.5": 72.0}, "YOLOv5x6 TTA": {"mAPval0.5:0.95": 55.0, "mAPtest0.5:0.95": 55.0, "mAPval0.5": 72.0}}}, "description": "YOLOv5 is a family of compound-scaled object detection models trained on the COCO dataset, and includes simple functionality for Test Time Augmentation (TTA), model ensembling, 
hyperparameter evolution, and export to ONNX, CoreML and TFLite."}}
{"code": "###Instruction: I need an image classification API that can handle millions of public images with thousands of hashtags. Please recommend one.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x8d_wsl', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ResNext-101 32x8d WSL model from PyTorch Hub, which is pre-trained on 940 million public images with 1.5K hashtags and provides state-of-the-art accuracy.', 'code': 'import torch\nmodel = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x8d_wsl', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='facebookresearch/WSL-Images', model='resnext101_32x8d_wsl', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "ResNext WSL", "api_call": "torch.hub.load(repo_or_dir='facebookresearch/WSL-Images', model='resnext101_32x8d_wsl', pretrained=True)", "api_arguments": [{"name": "resnext101_32x8d_wsl", "type": "str", "description": "ResNeXt-101 32x8d WSL model"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x8d_wsl')", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "preprocess = transforms.Compose([", " transforms.Resize(256),", " transforms.CenterCrop(224),", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)", "print(output[0])", "print(torch.nn.functional.softmax(output[0], dim=0))"], "performance": 
{"dataset": "ImageNet", "accuracy": {"ResNeXt-101 32x8d": {"Top-1 Acc.": "82.2", "Top-5 Acc.": "96.4"}}}, "description": "The provided ResNeXt models are pre-trained in weakly-supervised fashion on 940 million public images with 1.5K hashtags matching with 1000 ImageNet1K synsets, followed by fine-tuning on ImageNet1K dataset. The models significantly improve the training accuracy on ImageNet compared to training from scratch. They achieve state-of-the-art accuracy of 85.4% on ImageNet with the ResNext-101 32x48d model."}}
{"code": "###Instruction: Developers of a Virtual Reality event want to create a realistic digital crowd. Can you suggest a pretrained model to generate faces of celebrities?\n###Output: {'domain': 'Image Generation', 'api_call': \"model = torch.hub.load('facebookresearch/pytorch_GAN_zoo:hub', 'PGAN', model_name='celebAHQ-512', pretrained=True, useGPU=use_gpu)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained Progressive Growing of GANs (PGAN) model from PyTorch Hub to generate high-resolution, realistic images of celebrity faces.', 'code': 'import torch\nuse_gpu = True if torch.cuda.is_available() else False\nmodel = torch.hub.load('facebookresearch/pytorch_GAN_zoo:hub', 'PGAN', model_name='celebAHQ-512', pretrained=True, useGPU=use_gpu)'}", "api_call": "torch.hub.load(repo_or_dir='facebookresearch/pytorch_GAN_zoo:hub', model='PGAN', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Generative Adversarial Networks (GANs)", "api_name": "PGAN", "api_call": "torch.hub.load(repo_or_dir='facebookresearch/pytorch_GAN_zoo:hub', model='PGAN', pretrained=True)", "api_arguments": {"repo_or_dir": "facebookresearch/pytorch_GAN_zoo:hub", "model": "PGAN", "model_name": "celebAHQ-512", "pretrained": "True", "useGPU": "use_gpu"}, "python_environment_requirements": "Python 3", "example_code": {"import": "import torch", "use_gpu": "use_gpu = True if torch.cuda.is_available() else False", "load_model": "model = torch.hub.load('facebookresearch/pytorch_GAN_zoo:hub', 'PGAN', model_name='celebAHQ-512', pretrained=True, useGPU=use_gpu)", "build_noise_data": "noise, _ = model.buildNoiseData(num_images)", "test": "generated_images = model.test(noise)", "plot_images": {"import_matplotlib": "import matplotlib.pyplot as plt", "import_torchvision": "import torchvision", "make_grid": "grid = torchvision.utils.make_grid(generated_images.clamp(min=-1, max=1), scale_each=True, normalize=True)", "imshow": 
"plt.imshow(grid.permute(1, 2, 0).cpu().numpy())", "show": "plt.show()"}}, "performance": {"dataset": "celebA", "accuracy": "High-quality celebrity faces"}, "description": "Progressive Growing of GANs (PGAN) is a method for generating high-resolution images using generative adversarial networks. The model is trained progressively, starting with low-resolution images and gradually increasing the resolution until the desired output is achieved. This implementation is based on the paper by Tero Karras et al., 'Progressive Growing of GANs for Improved Quality, Stability, and Variation'."}}
{"code": "###Instruction: I need an API to classify images from a dataset with a high accuracy rate. Provide an appropriate API and the performance on the ImageNet dataset.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('PingoLH/Pytorch-HarDNet', 'hardnet68ds', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained Harmonic DenseNet (HarDNet) model from PyTorch Hub for efficient image classification.', 'code': 'import torch\nmodel = torch.hub.load('PingoLH/Pytorch-HarDNet', 'hardnet68ds', pretrained=True)', 'ImageNet_performance': {'hardnet68ds': {'Top-1 error': 25.71, 'Top-5 error': 8.13}}}", "api_call": "torch.hub.load(repo_or_dir='PingoLH/Pytorch-HarDNet', model='hardnet68ds', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "HarDNet", "api_call": "torch.hub.load(repo_or_dir='PingoLH/Pytorch-HarDNet', model='hardnet68ds', pretrained=True)", "api_arguments": [{"name": "hardnet68ds", "type": "str", "description": "HarDNet-68DS model"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('PingoLH/Pytorch-HarDNet', 'hardnet68ds', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"hardnet68ds": {"Top-1 error": 25.71, "Top-5 error": 8.13}}}, "description": "Harmonic DenseNet (HarDNet) is a low memory traffic CNN model, which is fast and efficient. The basic concept is to minimize both computational cost and memory access cost at the same time, such that the HarDNet models are 35% faster than ResNet running on GPU comparing to models with the same accuracy (except the two DS models that were designed for comparing with MobileNet)."}}
{"code": "###Instruction: A tourism website is building a feature to categorize photos into classes of landmarks. Recommend a machine learning API that will take an image and output which class the image falls into.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('PingoLH/Pytorch-HarDNet', 'hardnet68', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained HarDNet model from PyTorch Hub, which can be fine-tuned for landmark classification given an image of a landmark.', 'code': 'import torch\nmodel = torch.hub.load('PingoLH/Pytorch-HarDNet', 'hardnet68', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='PingoLH/Pytorch-HarDNet', model='hardnet68', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "HarDNet", "api_call": "torch.hub.load(repo_or_dir='PingoLH/Pytorch-HarDNet', model='hardnet68', pretrained=True)", "api_arguments": [{"name": "hardnet68", "type": "str", "description": "HarDNet-68 model"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('PingoLH/Pytorch-HarDNet', 'hardnet68', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"hardnet68": {"Top-1 error": 23.52, "Top-5 error": 6.99}}}, "description": "Harmonic DenseNet (HarDNet) is a low memory traffic CNN model, which is fast and efficient. The basic concept is to minimize both computational cost and memory access cost at the same time, such that the HarDNet models are 35% faster than ResNet running on GPU comparing to models with the same accuracy (except the two DS models that were designed for comparing with MobileNet)."}}
{"code": "###Instruction: A photographer at National Geographic is finding photos for the monthly magazine cover. They need a model to classify a picture of a cheetah running in the wild from other images.\n###Output: {'domain': 'Image Classification', 'api_call': \"torch.hub.load(repo_or_dir='pytorch/vision', model='resnet34', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the ResNet34 model in PyTorch Hub for image classification to distinguish a cheetah running in the wild from other images.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'resnet34', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet34', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Deep Residual Networks", "api_name": "ResNet", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet34', pretrained=True)", "api_arguments": [{"name": "pretrained", "type": "bool", "default": "False", "description": "If True, returns a model pre-trained on ImageNet"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet34', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"model": "resnet34", "top-1_error": 26.7, "top-5_error": 8.58}}, "description": "ResNet models are deep residual networks pre-trained on ImageNet. They were proposed in the paper 'Deep Residual Learning for Image Recognition'. Available model variants include ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152."}}
{"code": "###Instruction: DXmart needs to build a product image classification system for their e-commerce site. Provide an API that can classify product images.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'resnet152', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ResNet model from PyTorch Hub and use it for product image classification. This deep residual network is pretrained on ImageNet and can be fine-tuned to classify e-commerce product images.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'resnet152', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet152', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Deep Residual Networks", "api_name": "ResNet", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet152', pretrained=True)", "api_arguments": [{"name": "pretrained", "type": "bool", "default": "False", "description": "If True, returns a model pre-trained on ImageNet"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet152', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"model": "resnet152", "top-1_error": 21.69, "top-5_error": 5.94}}, "description": "ResNet models are deep residual networks pre-trained on ImageNet. They were proposed in the paper 'Deep Residual Learning for Image Recognition'. Available model variants include ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152."}}
{"code": "###Instruction: Identify an API to perform efficient animal classification from user provided images without sacrificing model accuracy for a biodiversity conservation project.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_s', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained SNNMLP Small model from PyTorch Hub for efficient image classification. This model can be fine-tuned for animal classification while maintaining high accuracy and low computational requirements.', 'code': 'import torch\nmodel = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_s', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='huawei-noah/Efficient-AI-Backbones', model='snnmlp_s', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "SNNMLP", "api_call": "torch.hub.load(repo_or_dir='huawei-noah/Efficient-AI-Backbones', model='snnmlp_s', pretrained=True)", "api_arguments": [{"name": "snnmlp_s", "type": "str", "description": "SNNMLP Small model"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_s', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "preprocess = transforms.Compose([", " transforms.Resize(256),", " transforms.CenterCrop(224),", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)", 
"print(torch.nn.functional.softmax(output[0], dim=0))"], "performance": {"dataset": "ImageNet", "accuracy": {"model": "SNNMLP Small", "top-1": 83.3}}, "description": "SNNMLP incorporates the mechanism of LIF neurons into the MLP models, to achieve better accuracy without extra FLOPs. We propose a full-precision LIF operation to communicate between patches, including horizontal LIF and vertical LIF in different directions. We also propose to use group LIF to extract better local features. With LIF modules, our SNNMLP model achieves 81.9%, 83.3% and 83.6% top-1 accuracy on ImageNet dataset with only 4.4G, 8.5G and 15.2G FLOPs, respectively."}}
{"code": "###Instruction: Recommend an API to build an Image Classifier that would better classify images with minimal computational resources.\n###Output: {'domain': 'Image Classification', 'api_call': 'model = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_b', pretrained=True)', 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained SNNMLP model from PyTorch Hub, which uses LIF neurons to achieve better classification accuracy without extra computational resources.', 'code': 'import torch\nmodel = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_b', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='huawei-noah/Efficient-AI-Backbones', model='snnmlp_b', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "SNNMLP", "api_call": "torch.hub.load(repo_or_dir='huawei-noah/Efficient-AI-Backbones', model='snnmlp_b', pretrained=True)", "api_arguments": [{"name": "snnmlp_b", "type": "str", "description": "SNNMLP Base model"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_b', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "preprocess = transforms.Compose([", " transforms.Resize(256),", " transforms.CenterCrop(224),", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)", "print(torch.nn.functional.softmax(output[0], dim=0))"], "performance": {"dataset": "ImageNet", "accuracy": {"model": "SNNMLP Base", "top-1": 
83.59}}, "description": "SNNMLP incorporates the mechanism of LIF neurons into the MLP models, to achieve better accuracy without extra FLOPs. We propose a full-precision LIF operation to communicate between patches, including horizontal LIF and vertical LIF in different directions. We also propose to use group LIF to extract better local features. With LIF modules, our SNNMLP model achieves 81.9%, 83.3% and 83.6% top-1 accuracy on ImageNet dataset with only 4.4G, 8.5G and 15.2G FLOPs, respectively."}}
{"code": "###Instruction: I need to recognize dogs and cats from images. What API should I use to perform this task?\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'vgg11', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained VGG11 model from PyTorch Hub, which can be used for image recognition tasks such as classifying dogs and cats from images.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'vgg11', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg11', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Recognition", "api_name": "vgg-nets", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg11', pretrained=True)", "api_arguments": [{"name": "vgg11", "type": "str", "description": "VGG11 model"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'vgg11', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"vgg11": {"Top-1 error": 30.98, "Top-5 error": 11.37}}}, "description": "vgg-nets are award-winning ConvNets from the 2014 Imagenet ILSVRC challenge. They are used for large-scale image recognition tasks. The available models are vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, and vgg19_bn."}}
{"code": "###Instruction: I need a suitable PyTorch API that can classify a wide range of images. Please provide me with instructions on how to load the pretrained model.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'vgg13_bn', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained VGG13 model with batch normalization from PyTorch Hub, which is suitable for a wide range of image classification tasks.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'vgg13_bn', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg13_bn', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Recognition", "api_name": "vgg-nets", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg13_bn', pretrained=True)", "api_arguments": [{"name": "vgg13_bn", "type": "str", "description": "VGG13 model with batch normalization"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'vgg13_bn', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"vgg13_bn": {"Top-1 error": 28.45, "Top-5 error": 9.63}}}, "description": "vgg-nets are award-winning ConvNets from the 2014 Imagenet ILSVRC challenge. They are used for large-scale image recognition tasks. The available models are vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, and vgg19_bn."}}
{"code": "###Instruction: I need to build an image classifier to identify objects in a photo. Suggest a suitable model that I can use for this purpose.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'vgg16', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Use the pre-trained VGG16 model from PyTorch Hub for image recognition and object identification in photos.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'vgg16', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg16', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Recognition", "api_name": "vgg-nets", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg16', pretrained=True)", "api_arguments": [{"name": "vgg16", "type": "str", "description": "VGG16 model"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'vgg16', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"vgg16": {"Top-1 error": 28.41, "Top-5 error": 9.62}}}, "description": "vgg-nets are award-winning ConvNets from the 2014 Imagenet ILSVRC challenge. They are used for large-scale image recognition tasks. The available models are vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, and vgg19_bn."}}
{"code": "###Instruction: A developer is building a mobile app to identify objects using the mobile camera. Suggest an API to classify object types given an image.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', model='mealv2_resnest50_380x380', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained MEAL V2 model for image classification from PyTorch Hub to classify objects in an image taken from a mobile camera.', 'code': \"import torch\nmodel = torch.hub.load('szq0214/MEAL-V2','meal_v2', 'mealv2_resnest50_380x380', pretrained=True)\"}", "api_call": "torch.hub.load(repo_or_dir='szq0214/MEAL-V2', 'meal_v2', model='mealv2_resnest50_380x380', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "MEAL_V2", "api_call": "torch.hub.load(repo_or_dir='szq0214/MEAL-V2', 'meal_v2', model='mealv2_resnest50_380x380', pretrained=True)", "api_arguments": {"model_name": "mealv2_resnest50_380x380"}, "python_environment_requirements": "!pip install timm", "example_code": "import torch\nfrom PIL import Image\nfrom torchvision import transforms\n\nmodel = torch.hub.load('szq0214/MEAL-V2','meal_v2', 'mealv2_resnest50_cutmix', pretrained=True)\nmodel.eval()\n\ninput_image = Image.open('dog.jpg')\npreprocess = transforms.Compose([\n transforms.Resize(256),\n transforms.CenterCrop(224),\n transforms.ToTensor(),\n transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n])\ninput_tensor = preprocess(input_image)\ninput_batch = input_tensor.unsqueeze(0)\n\nif torch.cuda.is_available():\n input_batch = input_batch.to('cuda')\n model.to('cuda')\n\nwith torch.no_grad():\n output = model(input_batch)\nprobabilities = torch.nn.functional.softmax(output[0], dim=0)\nprint(probabilities)", "performance": [{"dataset": "ImageNet", "accuracy": {"model": "MEAL-V2 w/ ResNet50", 
"resolution": "380", "parameters": "25.6M", "top1": "81.72", "top5": "95.81"}}], "description": "MEAL V2 models are from the MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks paper. The method is based on ensemble knowledge distillation via discriminators, and it achieves state-of-the-art results without using common tricks such as architecture modification, outside training data, autoaug/randaug, cosine learning rate, mixup/cutmix training, or label smoothing."}}
{"code": "###Instruction: A wildlife organization is looking to classify photos taken on their CCTV cameras into 100 different animal species. Suggest an API to achieve this task.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', model='mealv2_mobilenetv3_small_100', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained MEAL V2 with MobileNet V3-Small 1.0 model from PyTorch Hub, which can be fine-tuned to classify images captured by CCTV cameras into 100 different animal species.', 'code': \"import torch\nmodel = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', model='mealv2_mobilenetv3_small_100', pretrained=True)\"}", "api_call": "torch.hub.load(repo_or_dir='szq0214/MEAL-V2', 'meal_v2', model='mealv2_mobilenetv3_small_100', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "MEAL_V2", "api_call": "torch.hub.load(repo_or_dir='szq0214/MEAL-V2', 'meal_v2', model='mealv2_mobilenetv3_small_100', pretrained=True)", "api_arguments": {"model_name": "mealv2_mobilenetv3_small_100"}, "python_environment_requirements": "!pip install timm", "example_code": "import torch\nfrom PIL import Image\nfrom torchvision import transforms\n\nmodel = torch.hub.load('szq0214/MEAL-V2','meal_v2', 'mealv2_resnest50_cutmix', pretrained=True)\nmodel.eval()\n\ninput_image = Image.open('dog.jpg')\npreprocess = transforms.Compose([\n transforms.Resize(256),\n transforms.CenterCrop(224),\n transforms.ToTensor(),\n transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n])\ninput_tensor = preprocess(input_image)\ninput_batch = input_tensor.unsqueeze(0)\n\nif torch.cuda.is_available():\n input_batch = input_batch.to('cuda')\n model.to('cuda')\n\nwith torch.no_grad():\n output = model(input_batch)\nprobabilities = torch.nn.functional.softmax(output[0], dim=0)\nprint(probabilities)", 
"performance": [{"dataset": "ImageNet", "accuracy": {"model": "MEAL-V2 w/ MobileNet V3-Small 1.0", "resolution": "224", "parameters": "2.54M", "top1": "69.65", "top5": "88.71"}}], "description": "MEAL V2 models are from the MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks paper. The method is based on ensemble knowledge distillation via discriminators, and it achieves state-of-the-art results without using common tricks such as architecture modification, outside training data, autoaug/randaug, cosine learning rate, mixup/cutmix training, or label smoothing."}}
{"code": "###Instruction: A self-driving car company is developing an autonomous vehicle that requires detecting objects, drivable area segmentation, and lane detection in real-time. Suggest an appropriate API for this.\n###Output: {'domain': 'Object Detection', 'api_call': \"model = torch.hub.load('hustvl/yolop', 'yolop', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained YOLOP model from PyTorch Hub that can perform object detection, drivable area segmentation, and lane detection in real-time for autonomous driving scenarios.', 'code': \"import torch\nmodel = torch.hub.load('hustvl/yolop', 'yolop', pretrained=True)\"}", "api_call": "torch.hub.load(repo_or_dir='hustvl/yolop', model='yolop', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Object Detection", "framework": "PyTorch", "functionality": "Object Detection, Drivable Area Segmentation, Lane Detection", "api_name": "YOLOP", "api_call": "torch.hub.load(repo_or_dir='hustvl/yolop', model='yolop', pretrained=True)", "api_arguments": "pretrained", "python_environment_requirements": "pip install -qr https://github.com/hustvl/YOLOP/blob/main/requirements.txt", "example_code": "import torch\nmodel = torch.hub.load('hustvl/yolop', 'yolop', pretrained=True)\nimg = torch.randn(1,3,640,640)\ndet_out, da_seg_out,ll_seg_out = model(img)", "performance": {"dataset": "BDD100K", "accuracy": {"Object Detection": {"Recall(%)": 89.2, "mAP50(%)": 76.5, "Speed(fps)": 41}, "Drivable Area Segmentation": {"mIOU(%)": 91.5, "Speed(fps)": 41}, "Lane Detection": {"mIOU(%)": 70.5, "IOU(%)": 26.2}}}, "description": "YOLOP is an efficient multi-task network that can jointly handle three crucial tasks in autonomous driving: object detection, drivable area segmentation and lane detection. And it is also the first to reach real-time on embedded devices while maintaining state-of-the-art level performance on the BDD100K dataset."}}
{"code": "###Instruction: I want an ML library that can determine the object distances in a photo without inputting more than one photo.\n###Output: {'domain': 'Depth Estimation', 'api_call': \"model = torch.hub.load('intel-isl/MiDaS', 'MiDaS_small', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': \"Load the pretrained MiDaS model from PyTorch Hub, which can compute relative inverse depth from a single image, giving an estimate of object distances.\", 'code': \"import torch\nmodel = torch.hub.load('intel-isl/MiDaS', 'MiDaS_small', pretrained=True)\"}", "api_call": "torch.hub.load(repo_or_dir='intel-isl/MiDaS', model='MiDaS_small', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Computing relative depth from a single image", "api_name": "MiDaS", "api_call": "torch.hub.load(repo_or_dir='intel-isl/MiDaS', model='MiDaS_small', pretrained=True)", "api_arguments": {"repo_or_dir": "intel-isl/MiDaS", "model": "model_type"}, "python_environment_requirements": "pip install timm", "example_code": ["import cv2", "import torch", "import urllib.request", "import matplotlib.pyplot as plt", "url, filename = ('https://github.com/pytorch/hub/raw/master/images/dog.jpg', 'dog.jpg')", "urllib.request.urlretrieve(url, filename)", "model_type = 'DPT_Large'", "midas = torch.hub.load('intel-isl/MiDaS', 'MiDaS_small')", "device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')", "midas.to(device)", "midas.eval()", "midas_transforms = torch.hub.load('intel-isl/MiDaS', 'transforms')", "if model_type == 'DPT_Large' or model_type == 'DPT_Hybrid':", " transform = midas_transforms.dpt_transform", "else:", " transform = midas_transforms.small_transform", "img = cv2.imread(filename)", "img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)", "input_batch = transform(img).to(device)", "with torch.no_grad():", " prediction = midas(input_batch)", "prediction = torch.nn.functional.interpolate(", " 
prediction.unsqueeze(1),", " size=img.shape[:2],", " mode='bicubic',", " align_corners=False,", ").squeeze()", "output = prediction.cpu().numpy()", "plt.imshow(output)", "plt.show()"], "performance": {"dataset": "10 distinct datasets", "accuracy": "Multi-objective optimization"}, "description": "MiDaS computes relative inverse depth from a single image. The repository provides multiple models that cover different use cases ranging from a small, high-speed model to a very large model that provides the highest accuracy. The models have been trained on 10 distinct datasets using multi-objective optimization to ensure high quality on a wide range of inputs."}}
{"code": "###Instruction: I would like a simple method to turn spoken user commands into text, which AI API would you recommend?\n###Output: {'domain': 'Speech-To-Text', 'api_call': 'model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', pretrained=True)', 'api_provider': 'PyTorch', 'explanation': 'Utilize the pretrained Silero Speech-To-Text models, which are designed to be robust and support multiple languages. These models can convert spoken user commands into text efficiently.', 'code': 'import torch\nmodel, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Speech-To-Text", "api_name": "Silero Models", "api_call": "torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', pretrained=True)", "api_arguments": {"repo_or_dir": "snakers4/silero-models", "model": "silero_stt", "language": ["en", "de", "es"], "device": "device"}, "python_environment_requirements": ["pip install -q torchaudio omegaconf soundfile"], "example_code": ["import torch", "import zipfile", "import torchaudio", "from glob import glob", "device = torch.device('cpu')", "model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language='en', device=device)", "(read_batch, split_into_batches, read_audio, prepare_model_input) = utils", "torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst ='speech_orig.wav', progress=True)", "test_files = glob('speech_orig.wav')", "batches = split_into_batches(test_files, batch_size=10)", "input = prepare_model_input(read_batch(batches[0]), device=device)", "output = model(input)", "for example in output:", " print(decoder(example.cpu()))"], "performance": 
{"dataset": "Open-STT", "accuracy": "See the wiki for quality and performance benchmarks"}, "description": "Silero Speech-To-Text models provide enterprise grade STT in a compact form-factor for several commonly spoken languages. The models are robust to a variety of dialects, codecs, domains, noises, and lower sampling rates. They consume a normalized audio in the form of samples and output frames with token probabilities. A decoder utility is provided for simplicity."}}
{"code": "###Instruction: Write me an API to use as a pretrained model for classifying images into categories.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'densenet161', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained DenseNet-161 model from PyTorch Hub to classify images into different categories.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'densenet161', pretrained=True)\nmodel.eval()'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='densenet161', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Dense Convolutional Network", "api_name": "Densenet", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='densenet161', pretrained=True)", "api_arguments": [{"name": "densenet161", "type": "str", "description": "Densenet-161 model"}], "python_environment_requirements": {"torch": "latest", "torchvision": "latest"}, "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'densenet161', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"densenet161": {"Top-1 error": 22.35, "Top-5 error": 6.2}}}, "description": "Dense Convolutional Network (DenseNet) connects each layer to every other layer in a feed-forward fashion. It alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and substantially reduces the number of parameters."}}
{"code": "###Instruction: A company wants to segment objects in the images for its e-commerce website. Give an API that can segment objects in images.\n###Output: {'domain': 'Semantic Segmentation', 'api_call': \"model = torch.hub.load('pytorch/vision', 'fcn_resnet50', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained FCN-ResNet50 model from PyTorch Hub, which is capable of semantic segmentation to separate objects in images.', 'code': 'import torch\nmodel = torch.hub.load(\"pytorch/vision\", \"fcn_resnet50\", pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='fcn_resnet50', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Semantic Segmentation", "framework": "PyTorch", "functionality": "Fully-Convolutional Network", "api_name": "fcn_resnet50", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='fcn_resnet50', pretrained=True)", "api_arguments": [{"name": "pretrained", "type": "boolean", "description": "If True, returns a model pre-trained on COCO train2017"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "matplotlib"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'fcn_resnet50', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "input_image = input_image.convert(\"RGB\")", "preprocess = transforms.Compose([", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)['out'][0]", "output_predictions = output.argmax(0)"], "performance": {"dataset": "COCO val2017", "accuracy": {"fcn_resnet50": {"Mean IOU": 60.5, "Global Pixelwise Accuracy": 
91.4}}}, "description": "FCN-ResNet is a Fully-Convolutional Network model using a ResNet-50 or a ResNet-101 backbone. The pre-trained models have been trained on a subset of COCO train2017, on the 20 categories that are present in the Pascal VOC dataset."}}
{"code": "###Instruction: I'm working on a medical app and I want to classify images of skin lesions. Show me an API that can classify images with high efficiency and accuracy.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('huawei-noah/ghostnet', 'ghostnet_1x', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained GhostNet model from PyTorch Hub, which demonstrates high efficiency and accuracy for image classification tasks. This model can be used for classifying skin lesions.', 'code': 'import torch\nmodel = torch.hub.load('huawei-noah/ghostnet', 'ghostnet_1x', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='huawei-noah/ghostnet', model='ghostnet_1x', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Efficient networks by generating more features from cheap operations", "api_name": "GhostNet", "api_call": "torch.hub.load(repo_or_dir='huawei-noah/ghostnet', model='ghostnet_1x', pretrained=True)", "api_arguments": ["pretrained"], "python_environment_requirements": ["torch", "torchvision", "PIL"], "example_code": ["import torch", "model = torch.hub.load('huawei-noah/ghostnet', 'ghostnet_1x', pretrained=True)", "model.eval()", "input_image = Image.open(filename)", "preprocess = transforms.Compose([", " transforms.Resize(256),", " transforms.CenterCrop(224),", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)", "probabilities = torch.nn.functional.softmax(output[0], dim=0)", "print(probabilities)"], "performance": {"dataset": "ImageNet", "accuracy": {"Top-1 acc": "73.98", "Top-5 acc": "91.46"}}, "description": "The 
GhostNet architecture is based on a Ghost module structure which generates more features from cheap operations. Based on a set of intrinsic feature maps, a series of cheap operations are applied to generate many ghost feature maps that could fully reveal information underlying intrinsic features. Experiments conducted on benchmarks demonstrate the superiority of GhostNet in terms of speed and accuracy tradeoff."}}
{"code": "###Instruction: What is an API that can classify an image of a dog into its specific breed from a list of 120 unique breeds?\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('XingangPan/IBN-Net', 'resnext101_ibn_a', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained IBN-Net ResNeXt-101 model from PyTorch Hub and fine-tune it for dog breed classification with 120 unique breeds.', 'code': 'import torch\nmodel = torch.hub.load('XingangPan/IBN-Net', 'resnext101_ibn_a', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='XingangPan/IBN-Net', model='resnext101_ibn_a', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "IBN-Net", "api_name": "torch.hub.load", "api_call": "torch.hub.load(repo_or_dir='XingangPan/IBN-Net', model='resnext101_ibn_a', pretrained=True)", "api_arguments": [{"name": "resnext101_ibn_a", "type": "str", "description": "ResNeXt-101-IBN-a model"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('XingangPan/IBN-Net', 'resnext101_ibn_a', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"resnext101_ibn_a": {"Top-1 acc": 79.12, "Top-5 acc": 94.58}}}, "description": "IBN-Net is a CNN model with domain/appearance invariance. Motivated by style transfer works, IBN-Net carefully unifies instance normalization and batch normalization in a single deep network. It provides a simple way to increase both modeling and generalization capacities without adding model complexity. IBN-Net is especially suitable for cross domain or person/vehicle re-identification tasks."}}
{"code": "###Instruction: Can you give me an API that can classify food dishes in restaurant menus using image classification?\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('zhanghang1989/ResNeSt', 'resnest101', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ResNeSt101 model from PyTorch Hub that can be fine-tuned for food dish classification in restaurant menus.', 'code': 'import torch\nmodel = torch.hub.load('zhanghang1989/ResNeSt', 'resnest101', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='zhanghang1989/ResNeSt', model='resnest101', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "ResNeSt", "api_call": "torch.hub.load(repo_or_dir='zhanghang1989/ResNeSt', model='resnest101', pretrained=True)", "api_arguments": "resnest101", "python_environment_requirements": {"torch": "1.0.0", "torchvision": "0.2.2"}, "example_code": ["import torch", "model = torch.hub.load('zhanghang1989/ResNeSt', 'resnest101', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"ResNeSt-101": 82.83}}, "description": "ResNeSt models are from the ResNeSt: Split-Attention Networks paper. They are a new ResNet variant that enables attention across feature-map groups. By stacking Split-Attention blocks ResNet-style, ResNeSt models outperform other networks with similar model complexities, and also help downstream tasks including object detection, instance segmentation, and semantic segmentation."}}
{"code": "###Instruction: For my mobile app, I need an efficient and light-weight model that can classify animals, plants, landmarks, etc. in an image fed via the device's camera. Suggest an API.\n###Output: {'domain': 'Classification', 'api_call': 'model = torch.hub.load('mit-han-lab/ProxylessNAS', 'proxylessnas_cpu', pretrained=True)', 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ProxylessNAS model optimized for mobile devices from PyTorch Hub for efficient image classification.', 'code': 'import torch\nmodel = torch.hub.load('mit-han-lab/ProxylessNAS', 'proxylessnas_cpu', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/ProxylessNAS', model='proxylessnas_cpu', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "ProxylessNAS", "api_name": "mit-han-lab/ProxylessNAS", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/ProxylessNAS', model='proxylessnas_cpu', pretrained=True)", "api_arguments": [{"name": "proxylessnas_cpu", "type": "str", "description": "ProxylessNAS optimized for CPU"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "target_platform = 'proxyless_cpu'", "model = torch.hub.load('mit-han-lab/ProxylessNAS', target_platform, pretrained=True)", "model.eval()"], "performance": {"dataset": [{"model_structure": "proxylessnas_cpu", "accuracy": 75.3}]}, "description": "ProxylessNAS models are from the ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware paper. They specialize CNN architectures for different hardware platforms, offering free yet significant performance boost on all three platforms (CPU, GPU, and Mobile) with similar accuracy."}}
{"code": "###Instruction: For a wildlife photography website, suggest an API that can classify the animal species in a given photo.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'resnet34', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': \"Load the pretrained ResNet34 model from PyTorch Hub, which can be fine-tuned for classifying animal species in wildlife photos.\", 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'resnet34', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet34', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Deep Residual Networks", "api_name": "ResNet", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet34', pretrained=True)", "api_arguments": [{"name": "pretrained", "type": "bool", "default": "False", "description": "If True, returns a model pre-trained on ImageNet"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet34', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"model": "resnet34", "top-1_error": 26.7, "top-5_error": 8.58}}, "description": "ResNet models are deep residual networks pre-trained on ImageNet. They were proposed in the paper 'Deep Residual Learning for Image Recognition'. Available model variants include ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152."}}
{"code": "###Instruction: Please suggest an API that can detect and count the number of birds in an image.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'resnet101', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ResNet101 model from PyTorch Hub, which can be fine-tuned for bird detection and counting in a given image.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'resnet101', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet101', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Deep Residual Networks", "api_name": "ResNet", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet101', pretrained=True)", "api_arguments": [{"name": "pretrained", "type": "bool", "default": "False", "description": "If True, returns a model pre-trained on ImageNet"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet101', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"model": "resnet101", "top-1_error": 22.63, "top-5_error": 6.44}}, "description": "ResNet models are deep residual networks pre-trained on ImageNet. They were proposed in the paper 'Deep Residual Learning for Image Recognition'. Available model variants include ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152."}}
{"code": "###Instruction: Identify an API that can classify images and works with spiking neural networks.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_t', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained SNNMLP Tiny model from PyTorch Hub, which utilizes spiking neural networks (SNNs) for image classification.', 'code': 'import torch\nmodel = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_t', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='huawei-noah/Efficient-AI-Backbones', model='snnmlp_t', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "SNNMLP", "api_call": "torch.hub.load(repo_or_dir='huawei-noah/Efficient-AI-Backbones', model='snnmlp_t', pretrained=True)", "api_arguments": [{"name": "snnmlp_t", "type": "str", "description": "SNNMLP Tiny model"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_t', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "preprocess = transforms.Compose([", " transforms.Resize(256),", " transforms.CenterCrop(224),", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)", "print(torch.nn.functional.softmax(output[0], dim=0))"], "performance": {"dataset": "ImageNet", "accuracy": {"model": "SNNMLP Tiny", "top-1": 81.88}}, "description": "SNNMLP incorporates the mechanism of 
LIF neurons into the MLP models, to achieve better accuracy without extra FLOPs. We propose a full-precision LIF operation to communicate between patches, including horizontal LIF and vertical LIF in different directions. We also propose to use group LIF to extract better local features. With LIF modules, our SNNMLP model achieves 81.9%, 83.3% and 83.6% top-1 accuracy on the ImageNet dataset with only 4.4G, 8.5G and 15.2G FLOPs, respectively."}}
{"code": "###Instruction: What is an efficient API that can be used to categorize images and has a much lighter model with fewer parameters than AlexNet?\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'squeezenet1_1', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained SqueezeNet 1.1 model from PyTorch Hub for efficient image classification. It achieves AlexNet-level accuracy but has 50x fewer parameters.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'squeezenet1_1', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='squeezenet1_1', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "SqueezeNet", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='squeezenet1_1', pretrained=True)", "api_arguments": {"version": "v0.10.0", "model": ["squeezenet1_1"], "pretrained": "True"}, "python_environment_requirements": {"torch": ">=1.9.0", "torchvision": ">=0.10.0"}, "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'squeezenet1_1', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "preprocess = transforms.Compose([", " transforms.Resize(256),", " transforms.CenterCrop(224),", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)", "probabilities = torch.nn.functional.softmax(output[0], dim=0)", "print(probabilities)"], "performance": {"dataset": "ImageNet", "accuracy": {"squeezenet1_1": {"Top-1 error": 41.81, "Top-5 
error": 19.38}}}, "description": "SqueezeNet is an image classification model that achieves AlexNet-level accuracy with 50x fewer parameters. It has two versions: squeezenet1_0 and squeezenet1_1, with squeezenet1_1 having 2.4x less computation and slightly fewer parameters than squeezenet1_0, without sacrificing accuracy."}}
{"code": "###Instruction: Find me an API which will help identifying animals in a given image.\n###Output: {'domain': 'Image Recognition', 'api_call': \"model = torch.hub.load('pytorch/vision', 'vgg11_bn', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained VGG11 model with batch normalization from PyTorch Hub for image recognition, which can be fine-tuned to identify animals in an image.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'vgg11_bn', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg11_bn', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Recognition", "api_name": "vgg-nets", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg11_bn', pretrained=True)", "api_arguments": [{"name": "vgg11_bn", "type": "str", "description": "VGG11 model with batch normalization"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'vgg11_bn', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"vgg11_bn": {"Top-1 error": 26.7, "Top-5 error": 8.58}}}, "description": "vgg-nets are award-winning ConvNets from the 2014 Imagenet ILSVRC challenge. They are used for large-scale image recognition tasks. The available models are vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, and vgg19_bn."}}
{"code": "###Instruction: My company is building a chatbot for a car dealership and we need a machine learning model that can classify cars from images. Can you suggest one?\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'vgg13', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load a pretrained PyTorch VGG13 model, which can be fine-tuned to classify car models from images.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'vgg13', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg13', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Recognition", "api_name": "vgg-nets", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg13', pretrained=True)", "api_arguments": [{"name": "vgg13", "type": "str", "description": "VGG13 model"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'vgg13', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"vgg13": {"Top-1 error": 30.07, "Top-5 error": 10.75}}}, "description": "vgg-nets are award-winning ConvNets from the 2014 Imagenet ILSVRC challenge. They are used for large-scale image recognition tasks. The available models are vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, and vgg19_bn."}}
{"code": "###Instruction: A wildlife conservationist wants to classify animals in their natural habitat with a high accuracy. Recommend an API that can assist in this task.\n###Output: {'domain': 'Classification', 'api_call': 'model = torch.hub.load('pytorch/vision', 'wide_resnet101_2', pretrained=True)', 'api_provider': 'PyTorch', 'explanation': 'Use the pretrained Wide ResNet-101-2 model in PyTorch Hub for classifying animals with high accuracy.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'wide_resnet101_2', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='wide_resnet101_2', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Wide Residual Networks", "api_name": "wide_resnet101_2", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='wide_resnet101_2', pretrained=True)", "api_arguments": "pretrained", "python_environment_requirements": "torch, torchvision", "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'wide_resnet101_2', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"wide_resnet101_2": {"Top-1 error": 21.16, "Top-5 error": 5.72}}}, "description": "Wide Residual networks simply have increased number of channels compared to ResNet. Otherwise the architecture is the same. Deeper ImageNet models with bottleneck block have increased number of channels in the inner 3x3 convolution. The wide_resnet50_2 and wide_resnet101_2 models were trained in FP16 with mixed precision training using SGD with warm restarts. Checkpoints have weights in half precision (except batch norm) for smaller size, and can be used in FP32 models too."}}
{"code": "###Instruction: A software engineer working at a computer vision company is looking for a model that can classify images efficiently on NVIDIA GPUs. Provide an API recommendation.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_gpunet', pretrained=True, model_type=model_type, model_math=precision)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained GPUNet model from NVIDIA, designed and optimized for NVIDIA GPUs and TensorRT performance. It can classify images into different categories.', 'code': 'import torch\nmodel_type = \"GPUNet-0\"\nprecision = \"fp32\"\nmodel = torch.hub.load(\"NVIDIA/DeepLearningExamples:torchhub\", \"nvidia_gpunet\", pretrained=True, model_type=model_type, model_math=precision)'}\n", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_gpunet', pretrained=True, model_type=model_type, model_math=precision)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "GPUNet Networks", "api_name": "torch.hub.load", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_gpunet', pretrained=True, model_type=model_type, model_math=precision)", "api_arguments": {"repository": "NVIDIA/DeepLearningExamples:torchhub", "model": "nvidia_gpunet", "pretrained": "True", "model_type": "GPUNet-0", "model_math": "fp32"}, "python_environment_requirements": ["torch", "validators", "matplotlib", "timm==0.5.4"], "example_code": ["import torch", "model_type = 'GPUNet-0'", "precision = 'fp32'", "gpunet = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_gpunet', pretrained=True, model_type=model_type, model_math=precision)", "device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')", "gpunet.to(device)", "gpunet.eval()"], "performance": {"dataset": "IMAGENET", "description": "GPUNet demonstrates 
state-of-the-art inference performance up to 2x faster than EfficientNet-X and FBNet-V3."}, "description": "GPUNet is a family of Convolutional Neural Networks designed by NVIDIA using novel Neural Architecture Search (NAS) methods. They are optimized for NVIDIA GPU and TensorRT performance. GPUNet models are pretrained on the IMAGENET dataset and are capable of classifying images into different categories. The models are provided by the NVIDIA Deep Learning Examples library."}}
{"code": "###Instruction: Recommend an API to translate an English ebook to French.\n###Output: {'domain': 'Translation', 'api_call': 'model = torch.hub.load('pytorch/fairseq', 'transformer.wmt14.en-fr', tokenizer='moses', bpe='subword_nmt')', 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained Transformer (NMT) model from PyTorch Hub that supports English-French translation.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/fairseq', 'transformer.wmt14.en-fr', tokenizer='moses', bpe='subword_nmt')'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/fairseq', model='transformer.wmt14.en-fr', tokenizer='moses', bpe='subword_nmt')", "provider": "PyTorch", "api_data": {"domain": "Translation", "framework": "PyTorch", "functionality": "Neural Machine Translation", "api_name": "Transformer (NMT)", "api_call": "torch.hub.load(repo_or_dir='pytorch/fairseq', model='transformer.wmt14.en-fr', tokenizer='moses', bpe='subword_nmt')", "api_arguments": ["model_name", "tokenizer", "bpe", "beam", "sampling", "sampling_topk"], "python_environment_requirements": ["bitarray", "fastBPE", "hydra-core", "omegaconf", "regex", "requests", "sacremoses", "subword_nmt"], "example_code": "import torch\n\nen2fr = torch.hub.load('pytorch/fairseq', 'transformer.wmt14.en-fr', tokenizer='moses', bpe='subword_nmt')\n\nen2fr.cuda()\n\nfr = en2fr.translate('Hello world!', beam=5)\nassert fr == 'Bonjour \u00e0 tous !'", "performance": {"dataset": [{"name": "WMT'14", "accuracy": "Not provided"}, {"name": "WMT'18", "accuracy": "Not provided"}, {"name": "WMT'19", "accuracy": "Not provided"}]}, "description": "Transformer (NMT) is a powerful sequence-to-sequence modeling architecture that produces state-of-the-art neural machine translation systems. It is based on the paper 'Attention Is All You Need' and has been improved using techniques such as large-scale semi-supervised training, back-translation, and noisy-channel reranking. It supports English-French and English-German translation as well as round-trip translation for paraphrasing."}}
{"code": "###Instruction: In an attempt to streamline content moderation, Facebook is implementing an AI-enabled tool to identify potentially inappropriate images. Suggest an API that can recognize objects within an image.\n###Output: {'domain': 'Object Detection', 'api_call': \"model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained SSD (Single Shot MultiBox Detector) model from PyTorch Hub to detect and recognize objects within images, which can help in content moderation.', 'code': \"import torch\nmodel = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd', pretrained=True)\"}", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_ssd', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Object Detection", "framework": "PyTorch", "functionality": "Single Shot MultiBox Detector", "api_name": "SSD", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_ssd', pretrained=True)", "api_arguments": ["model_name"], "python_environment_requirements": ["numpy", "scipy", "scikit-image", "matplotlib"], "example_code": "import torch\n\nssd_model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd')\nutils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd_processing_utils')\n\nssd_model.to('cuda')\nssd_model.eval()\n\ninputs = [utils.prepare_input(uri) for uri in uris]\ntensor = utils.prepare_tensor(inputs)\n\nwith torch.no_grad():\n detections_batch = ssd_model(tensor)\n\nresults_per_input = utils.decode_results(detections_batch)\nbest_results_per_input = [utils.pick_best(results, 0.40) for results in results_per_input]", "performance": {"dataset": {"name": "COCO", "accuracy": "Not provided"}}, "description": "The SSD (Single Shot MultiBox Detector) model is an object detection model based on the paper 'SSD: Single Shot MultiBox 
Detector'. It uses a deep neural network for detecting objects in images. This implementation replaces the obsolete VGG model backbone with the more modern ResNet-50 model. The SSD model is trained on the COCO dataset and can be used to detect objects in images with high accuracy and efficiency."}}
{"code": "###Instruction: The weatherman needs an AI which could read out the daily weather information. Tell me an API that generates spoken weather information from a written weather forecast.\n###Output: {'domain': 'Text-to-Speech', 'api_call': 'model = torch.hub.load(\"NVIDIA/DeepLearningExamples:torchhub\", \"nvidia_tacotron2\", model_math=\"fp16\")', 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained Tacotron 2 model from PyTorch Hub, which can convert written text, such as weather forecasts, into spoken speech.', 'code': 'import torch\nmodel = torch.hub.load(\"NVIDIA/DeepLearningExamples:torchhub\", \"nvidia_tacotron2\", model_math=\"fp16\")'}\n", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_tacotron2', model_math='fp16')", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Text-to-Speech", "api_name": "Tacotron 2", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_tacotron2', model_math='fp16')", "api_arguments": {"model_math": "fp16"}, "python_environment_requirements": ["numpy", "scipy", "librosa", "unidecode", "inflect", "libsndfile1"], "example_code": ["import torch", "tacotron2 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tacotron2', model_math='fp16')", "tacotron2 = tacotron2.to('cuda')", "tacotron2.eval()", "waveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_waveglow', model_math='fp16')", "waveglow = waveglow.remove_weightnorm(waveglow)", "waveglow = waveglow.to('cuda')", "waveglow.eval()", "text = 'Hello world, I missed you so much.'", "utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tts_utils')", "sequences, lengths = utils.prepare_input_sequence([text])", "with torch.no_grad():", " mel, _, _ = tacotron2.infer(sequences, lengths)", " audio = waveglow.infer(mel)", "audio_numpy = audio[0].data.cpu().numpy()", "rate = 22050"], "performance": {"dataset": "LJ Speech", "accuracy": "Not specified"}, "description": "The Tacotron 2 model generates mel spectrograms from input text using an encoder-decoder architecture, and it is designed for 
generating natural-sounding speech from raw transcripts without any additional prosody information. This implementation uses Dropout instead of Zoneout to regularize the LSTM layers. The WaveGlow model (also available via torch.hub) is a flow-based model that consumes the mel spectrograms to generate speech."}}
{"code": "###Instruction: A developer needs to classify images using a model that does not require additional tricks for high accuracy. Recommend an API with a high top-1 accuracy without using any tricks.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_mobilenetv3_small_075', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained MEAL V2 model for image classification from PyTorch Hub. It achieves high accuracy without tricks like architecture modification, extra training data, or mixup/cutmix training.', 'code': \"import torch\nmodel = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_mobilenetv3_small_075', pretrained=True)\"}", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_mobilenetv3_small_075', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "MEAL_V2", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_mobilenetv3_small_075', pretrained=True)", "api_arguments": {"model_name": "mealv2_mobilenetv3_small_075"}, "python_environment_requirements": "!pip install timm", "example_code": "import torch\nfrom PIL import Image\nfrom torchvision import transforms\n\nmodel = torch.hub.load('szq0214/MEAL-V2','meal_v2', 'mealv2_mobilenetv3_small_075', pretrained=True)\nmodel.eval()\n\ninput_image = Image.open('dog.jpg')\npreprocess = transforms.Compose([\n transforms.Resize(256),\n transforms.CenterCrop(224),\n transforms.ToTensor(),\n transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n])\ninput_tensor = preprocess(input_image)\ninput_batch = input_tensor.unsqueeze(0)\n\nif torch.cuda.is_available():\n input_batch = input_batch.to('cuda')\n model.to('cuda')\n\nwith torch.no_grad():\n output = model(input_batch)\nprobabilities = 
torch.nn.functional.softmax(output[0], dim=0)\nprint(probabilities)", "performance": [{"dataset": "ImageNet", "accuracy": {"model": "MEAL-V2 w/ MobileNet V3-Small 0.75", "resolution": "224", "parameters": "2.04M", "top1": "67.60", "top5": "87.23"}}], "description": "MEAL V2 models are from the MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks paper. The method is based on ensemble knowledge distillation via discriminators, and it achieves state-of-the-art results without using common tricks such as architecture modification, outside training data, autoaug/randaug, cosine learning rate, mixup/cutmix training, or label smoothing."}}
{"code": "###Instruction: I need an API that can help me identify the type of a cucumber. It should be able to tell me whether it's pickling, slicing, or burpless cucumber.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_efficientnet_b0', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained MEAL V2 model with EfficientNet-B0 architecture from PyTorch Hub. Fine-tune the model for identifying the type of a cucumber as pickling, slicing or burpless.', 'code': \"import torch\nmodel = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_efficientnet_b0', pretrained=True)\"}", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_efficientnet_b0', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "MEAL_V2", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_efficientnet_b0', pretrained=True)", "api_arguments": {"model_name": "mealv2_efficientnet_b0"}, "python_environment_requirements": "!pip install timm", "example_code": "import torch\nfrom PIL import Image\nfrom torchvision import transforms\n\nmodel = torch.hub.load('szq0214/MEAL-V2','meal_v2', 'mealv2_efficientnet_b0', pretrained=True)\nmodel.eval()\n\ninput_image = Image.open('dog.jpg')\npreprocess = transforms.Compose([\n transforms.Resize(256),\n transforms.CenterCrop(224),\n transforms.ToTensor(),\n transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n])\ninput_tensor = preprocess(input_image)\ninput_batch = input_tensor.unsqueeze(0)\n\nif torch.cuda.is_available():\n input_batch = input_batch.to('cuda')\n model.to('cuda')\n\nwith torch.no_grad():\n output = model(input_batch)\nprobabilities = torch.nn.functional.softmax(output[0], dim=0)\nprint(probabilities)", "performance": [{"dataset": "ImageNet", 
"accuracy": {"model": "MEAL-V2 w/ EfficientNet-B0", "resolution": "224", "parameters": "5.29M", "top1": "78.29", "top5": "93.95"}}], "description": "MEAL V2 models are from the MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks paper. The method is based on ensemble knowledge distillation via discriminators, and it achieves state-of-the-art results without using common tricks such as architecture modification, outside training data, autoaug/randaug, cosine learning rate, mixup/cutmix training, or label smoothing."}}
{"code": "###Instruction: I need to develop a self-driving car which can simultaneously recognize objects, drivable areas, and lanes. Recommend me an API to handle these tasks.\n###Output: {'domain': 'Autonomous Driving', 'api_call': \"model = torch.hub.load('hustvl/yolop', 'yolop', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Use the pretrained YOLOP model from PyTorch Hub, which efficiently handles object detection, drivable area segmentation, and lane detection tasks for autonomous driving applications.', 'code': \"import torch\nmodel = torch.hub.load('hustvl/yolop', 'yolop', pretrained=True)\"}", "api_call": "torch.hub.load(repo_or_dir='hustvl/yolop', model='yolop', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Object Detection", "framework": "PyTorch", "functionality": "Object Detection, Drivable Area Segmentation, Lane Detection", "api_name": "YOLOP", "api_call": "torch.hub.load(repo_or_dir='hustvl/yolop', model='yolop', pretrained=True)", "api_arguments": "pretrained", "python_environment_requirements": "pip install -qr https://github.com/hustvl/YOLOP/blob/main/requirements.txt", "example_code": "import torch\nmodel = torch.hub.load('hustvl/yolop', 'yolop', pretrained=True)\nimg = torch.randn(1,3,640,640)\ndet_out, da_seg_out,ll_seg_out = model(img)", "performance": {"dataset": "BDD100K", "accuracy": {"Object Detection": {"Recall(%)": 89.2, "mAP50(%)": 76.5, "Speed(fps)": 41}, "Drivable Area Segmentation": {"mIOU(%)": 91.5, "Speed(fps)": 41}, "Lane Detection": {"mIOU(%)": 70.5, "IOU(%)": 26.2}}}, "description": "YOLOP is an efficient multi-task network that can jointly handle three crucial tasks in autonomous driving: object detection, drivable area segmentation and lane detection. And it is also the first to reach real-time on embedded devices while maintaining state-of-the-art level performance on the BDD100K dataset."}}
{"code": "###Instruction: I'd like to detect voice activity in an audio file. What API can help me perform this task?\n###Output: {'domain': 'Voice Activity Detection', 'api_call': \"model = torch.hub.load('snakers4/silero-vad', 'silero_vad', force_reload=True)\", 'api_provider': 'PyTorch', 'explanation': 'Use Silero Voice Activity Detector API to detect voice activity in an audio file. This is a pretrained enterprise-grade VAD optimized for performance on 1 CPU thread and quantized.', 'code': 'import torch\nmodel, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad', model='silero_vad', force_reload=True)'}", "api_call": "torch.hub.load(repo_or_dir='snakers4/silero-vad', model='silero_vad', force_reload=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Voice Activity Detection", "api_name": "Silero Voice Activity Detector", "api_call": "torch.hub.load(repo_or_dir='snakers4/silero-vad', model='silero_vad', force_reload=True)", "api_arguments": {"repo_or_dir": "snakers4/silero-vad", "model": "silero_vad", "force_reload": "True"}, "python_environment_requirements": {"torchaudio": "pip install -q torchaudio"}, "example_code": {"import": ["import torch", "torch.set_num_threads(1)", "from IPython.display import Audio", "from pprint import pprint"], "download_example": "torch.hub.download_url_to_file('https://models.silero.ai/vad_models/en.wav', 'en_example.wav')", "load_model": "model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad', model='silero_vad', force_reload=True)", "load_utils": "(get_speech_timestamps, _, read_audio, _) = utils", "set_sampling_rate": "sampling_rate = 16000", "read_audio": "wav = read_audio('en_example.wav', sampling_rate=sampling_rate)", "get_speech_timestamps": "speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=sampling_rate)", "print_speech_timestamps": "pprint(speech_timestamps)"}, "performance": {"dataset": "", "accuracy": ""}, "description": 
"Silero VAD is a pre-trained enterprise-grade Voice Activity Detector (VAD) that aims to provide a high-quality and modern alternative to the WebRTC Voice Activity Detector. The model is optimized for performance on 1 CPU thread and is quantized."}}
{"code": "###Instruction: We wish to create an app to make coloring books from images. Recommend an API to extract the regions that should be colored.\n###Output: {'domain': 'Semantic Segmentation', 'api_call': \"model = torch.hub.load('pytorch/vision', 'deeplabv3_mobilenet_v3_large', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained DeepLabV3 model with MobileNet-v3 backbone from PyTorch Hub to extract regions for coloring in images.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'deeplabv3_mobilenet_v3_large', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='deeplabv3_mobilenet_v3_large', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Semantic Segmentation", "framework": "PyTorch", "functionality": "DeepLabV3", "api_name": "torch.hub.load", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='deeplabv3_mobilenet_v3_large', pretrained=True)", "api_arguments": {"pretrained": "True"}, "python_environment_requirements": {"torch": "latest", "torchvision": "latest", "PIL": "latest", "matplotlib": "latest"}, "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'deeplabv3_mobilenet_v3_large', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "input_image = input_image.convert('RGB')", "preprocess = transforms.Compose([", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)['out'][0]", "output_predictions = output.argmax(0)"], "performance": {"dataset": "COCO val2017", "accuracy": [{"model": "deeplabv3_mobilenet_v3_large", "Mean IOU": 60.3, "Global 
Pixelwise Accuracy": 91.4}]}, "description": "DeepLabV3 models with ResNet-50, ResNet-101 and MobileNet-V3 backbones for semantic segmentation. The pre-trained models have been trained on a subset of COCO train2017, on the 20 categories that are present in the Pascal VOC dataset."}}
{"code": "###Instruction: Imagine you were given a set of images and you need to tell what objects are on the pictures. Indicate an API that can classify the objects in the images.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_se_resnext101_32x4d', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained SE-ResNeXt101-32x4d model from PyTorch Hub for image classification of objects in the given images.', 'code': 'import torch\nmodel = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_se_resnext101_32x4d', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_se_resnext101_32x4d', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "SE-ResNeXt101", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_se_resnext101_32x4d', pretrained=True)", "api_arguments": "N/A", "python_environment_requirements": "validators, matplotlib", "example_code": "import torch\nfrom PIL import Image\nimport torchvision.transforms as transforms\nimport numpy as np\nimport json\nimport requests\nimport matplotlib.pyplot as plt\nimport warnings\nwarnings.filterwarnings('ignore')\n%matplotlib inline\ndevice = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')\nprint(f'Using {device} for inference')\nresneXt = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_se_resnext101_32x4d')\nutils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_convnets_processing_utils')\nresneXt.eval().to(device)\nuris = 
['http://images.cocodataset.org/test-stuff2017/000000024309.jpg','http://images.cocodataset.org/test-stuff2017/000000028117.jpg','http://images.cocodataset.org/test-stuff2017/000000006149.jpg','http://images.cocodataset.org/test-stuff2017/000000004954.jpg']\nbatch = torch.cat([utils.prepare_input_from_uri(uri) for uri in uris]).to(device)\nwith torch.no_grad():\n output = torch.nn.functional.softmax(resneXt(batch), dim=1)\nresults = utils.pick_n_best(predictions=output, n=5)\nfor uri, result in zip(uris, results):\n img = Image.open(requests.get(uri, stream=True).raw)\n img.thumbnail((256,256), Image.LANCZOS)\n plt.imshow(img)\n plt.show()\n print(result)", "performance": {"dataset": "IMAGENET", "accuracy": "N/A"}, "description": "The SE-ResNeXt101-32x4d is a ResNeXt101-32x4d model with added Squeeze-and-Excitation module. This model is trained with mixed precision using Tensor Cores on Volta, Turing, and the NVIDIA Ampere GPU architectures, which allows researchers to get results 3x faster than training without Tensor Cores while experiencing the benefits of mixed precision training."}}
{"code": "###Instruction: My friend recommended the Densenet-201 model to classify images. Find an API that I can use for this model.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'densenet201', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pre-trained Densenet-201 model from PyTorch Hub, which is suitable for image classification tasks.', 'code': 'import torch\nmodel = torch.hub.load(\"pytorch/vision\", \"densenet201\", pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='densenet201', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Dense Convolutional Network", "api_name": "Densenet", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='densenet201', pretrained=True)", "api_arguments": [{"name": "densenet201", "type": "str", "description": "Densenet-201 model"}], "python_environment_requirements": {"torch": "latest", "torchvision": "latest"}, "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'densenet201', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"densenet201": {"Top-1 error": 22.8, "Top-5 error": 6.43}}}, "description": "Dense Convolutional Network (DenseNet) connects each layer to every other layer in a feed-forward fashion. It alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and substantially reduces the number of parameters."}}
{"code": "###Instruction: Provide me with an API that can segment objects within an image into separate categories.\n###Output: {'domain': 'Semantic Segmentation', 'api_call': 'model = torch.hub.load('pytorch/vision', 'fcn_resnet50', pretrained=True)', 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained FCN-ResNet50, a Fully-Convolutional Network, from PyTorch Hub to segment objects within an image into separate categories.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'fcn_resnet50', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='fcn_resnet50', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Semantic Segmentation", "framework": "PyTorch", "functionality": "Fully-Convolutional Network", "api_name": "fcn_resnet50", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='fcn_resnet50', pretrained=True)", "api_arguments": [{"name": "pretrained", "type": "boolean", "description": "If True, returns a model pre-trained on COCO train2017"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "matplotlib"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'fcn_resnet50', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "input_image = input_image.convert(\"RGB\")", "preprocess = transforms.Compose([", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)['out'][0]", "output_predictions = output.argmax(0)"], "performance": {"dataset": "COCO val2017", "accuracy": {"fcn_resnet50": {"Mean IOU": 60.5, "Global Pixelwise Accuracy": 91.4}}}, "description": 
"FCN-ResNet is a Fully-Convolutional Network model using a ResNet-50 or a ResNet-101 backbone. The pre-trained models have been trained on a subset of COCO train2017, on the 20 categories that are present in the Pascal VOC dataset."}}
{"code": "###Instruction: Looking for a fast and efficient image classification API to suit my low-end device. What would you recommend?\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('PingoLH/Pytorch-HarDNet', 'hardnet85', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained HarDNet-85 model from PyTorch Hub, which is a fast and efficient low memory traffic CNN suitable for low-end devices.', 'code': 'import torch\nmodel = torch.hub.load('PingoLH/Pytorch-HarDNet', 'hardnet85', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='PingoLH/Pytorch-HarDNet', model='hardnet85', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "HarDNet", "api_call": "torch.hub.load(repo_or_dir='PingoLH/Pytorch-HarDNet', model='hardnet85', pretrained=True)", "api_arguments": [{"name": "hardnet85", "type": "str", "description": "HarDNet-85 model"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('PingoLH/Pytorch-HarDNet', 'hardnet85', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"hardnet85": {"Top-1 error": 21.96, "Top-5 error": 6.11}}}, "description": "Harmonic DenseNet (HarDNet) is a low memory traffic CNN model, which is fast and efficient. The basic concept is to minimize both computational cost and memory access cost at the same time, such that the HarDNet models are 35% faster than ResNet running on GPU comparing to models with the same accuracy (except the two DS models that were designed for comparing with MobileNet)."}}
{"code": "###Instruction: I need a model that can help identify which domain an image belongs to, such as artistic style or natural scenery. Recommend me an API that can do this.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('XingangPan/IBN-Net', 'resnext101_ibn_a', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ResNeXt-101-IBN-a model from PyTorch Hub, which is designed for domain/appearance invariance and can be used to identify image domains, such as artistic styles or natural scenery.', 'code': 'import torch\nmodel = torch.hub.load('XingangPan/IBN-Net', 'resnext101_ibn_a', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='XingangPan/IBN-Net', model='resnext101_ibn_a', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "IBN-Net", "api_name": "torch.hub.load", "api_call": "torch.hub.load(repo_or_dir='XingangPan/IBN-Net', model='resnext101_ibn_a', pretrained=True)", "api_arguments": [{"name": "resnext101_ibn_a", "type": "str", "description": "ResNeXt-101-IBN-a model"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('XingangPan/IBN-Net', 'resnext101_ibn_a', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"resnext101_ibn_a": {"Top-1 acc": 79.12, "Top-5 acc": 94.58}}}, "description": "IBN-Net is a CNN model with domain/appearance invariance. Motivated by style transfer works, IBN-Net carefully unifies instance normalization and batch normalization in a single deep network. It provides a simple way to increase both modeling and generalization capacities without adding model complexity. IBN-Net is especially suitable for cross domain or person/vehicle re-identification tasks."}}
{"code": "###Instruction: I want to know which dog breed a given image belongs to. Tell me an API that is capable of identifying dog breeds.\n\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('zhanghang1989/ResNeSt', 'resnest50', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': \"Load the pretrained ResNeSt model from PyTorch Hub, which can be fine-tuned for dog breed classification given an input image of a dog.\", 'code': 'import torch\nmodel = torch.hub.load('zhanghang1989/ResNeSt', 'resnest50', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='zhanghang1989/ResNeSt', model='resnest50', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "ResNeSt", "api_call": "torch.hub.load(repo_or_dir='zhanghang1989/ResNeSt', model='resnest50', pretrained=True)", "api_arguments": "resnest50", "python_environment_requirements": {"torch": "1.0.0", "torchvision": "0.2.2"}, "example_code": ["import torch", "model = torch.hub.load('zhanghang1989/ResNeSt', 'resnest50', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"ResNeSt-50": 81.03}}, "description": "ResNeSt models are from the ResNeSt: Split-Attention Networks paper. They are a new ResNet variant that enables attention across feature-map groups. By stacking Split-Attention blocks ResNet-style, ResNeSt models outperform other networks with similar model complexities, and also help downstream tasks including object detection, instance segmentation, and semantic segmentation."}}
{"code": "###Instruction: I need to classify images into various categories based on their content. Can you suggest an API that can do this?\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('zhanghang1989/ResNeSt', 'resnest200', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': \"Load the pretrained ResNeSt model from PyTorch Hub to classify images into various categories based on their content.\", 'code': 'import torch\nmodel = torch.hub.load(\"zhanghang1989/ResNeSt\", \"resnest200\", pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='zhanghang1989/ResNeSt', model='resnest200', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "ResNeSt", "api_call": "torch.hub.load(repo_or_dir='zhanghang1989/ResNeSt', model='resnest200', pretrained=True)", "api_arguments": "resnest200", "python_environment_requirements": {"torch": "1.0.0", "torchvision": "0.2.2"}, "example_code": ["import torch", "model = torch.hub.load('zhanghang1989/ResNeSt', 'resnest200', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"ResNeSt-200": 83.84}}, "description": "ResNeSt models are from the ResNeSt: Split-Attention Networks paper. They are a new ResNet variant that enables attention across feature-map groups. By stacking Split-Attention blocks ResNet-style, ResNeSt models outperform other networks with similar model complexities, and also help downstream tasks including object detection, instance segmentation, and semantic segmentation."}}
{"code": "###Instruction: Recommend an API to automatically fine-tune a neural network's architecture for optimal performance on a specific graphics processing unit (GPU) platform.\n###Output: {'domain': 'Neural Architecture Search', 'api_call': \"model = torch.hub.load('mit-han-lab/ProxylessNAS', 'proxyless_gpu', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Use the ProxylessNAS model from PyTorch Hub, optimized for GPU platforms. This model specializes Convolutional Neural Network (CNN) architectures for GPU platforms, offering improved performance and accuracy.', 'code': 'import torch\nmodel = torch.hub.load('mit-han-lab/ProxylessNAS', 'proxyless_gpu', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/ProxylessNAS', model='proxyless_gpu', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "ProxylessNAS", "api_name": "mit-han-lab/ProxylessNAS", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/ProxylessNAS', model='proxyless_gpu', pretrained=True)", "api_arguments": [{"name": "proxyless_gpu", "type": "str", "description": "ProxylessNAS optimized for GPU"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "target_platform = 'proxyless_gpu'", "model = torch.hub.load('mit-han-lab/ProxylessNAS', target_platform, pretrained=True)", "model.eval()"], "performance": {"dataset": [{"model_structure": "proxylessnas_gpu", "accuracy": 75.1}]}, "description": "ProxylessNAS models are from the ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware paper. They specialize CNN architectures for different hardware platforms, offering free yet significant performance boost on all three platforms (CPU, GPU, and Mobile) with similar accuracy."}}
{"code": "###Instruction: A software engineer is trying to determine if an image contains a dog, cat or a horse. Identify an API that could be fine-tuned to achieve the objective.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'resnet152', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': \"Load the pretrained ResNet152 model from PyTorch Hub, which can be fine-tuned for a specific classification task, such as identifying dogs, cats, and horses in an image.\", 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'resnet152', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet152', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Deep Residual Networks", "api_name": "ResNet", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet152', pretrained=True)", "api_arguments": [{"name": "pretrained", "type": "bool", "default": "False", "description": "If True, returns a model pre-trained on ImageNet"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet152', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"model": "resnet152", "top-1_error": 21.69, "top-5_error": 5.94}}, "description": "ResNet models are deep residual networks pre-trained on ImageNet. They were proposed in the paper 'Deep Residual Learning for Image Recognition'. Available model variants include ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152."}}
{"code": "###Instruction: Can you suggest me an AI model that can classify images with 50x fewer parameters than AlexNet and better performance on a robotics project I'm working on?\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'squeezenet1_1', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained SqueezeNet 1.1 model from PyTorch Hub, which has 50x fewer parameters than AlexNet and better performance for image classification tasks.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'squeezenet1_1', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='squeezenet1_1', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "SqueezeNet", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='squeezenet1_1', pretrained=True)", "api_arguments": {"version": "v0.10.0", "model": ["squeezenet1_1"], "pretrained": "True"}, "python_environment_requirements": {"torch": ">=1.9.0", "torchvision": ">=0.10.0"}, "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'squeezenet1_1', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "preprocess = transforms.Compose([", " transforms.Resize(256),", " transforms.CenterCrop(224),", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)", "probabilities = torch.nn.functional.softmax(output[0], dim=0)", "print(probabilities)"], "performance": {"dataset": "ImageNet", "accuracy": 
{"squeezenet1_1": {"Top-1 error": 41.81, "Top-5 error": 19.38}}}, "description": "SqueezeNet is an image classification model that achieves AlexNet-level accuracy with 50x fewer parameters. It has two versions: squeezenet1_0 and squeezenet1_1, with squeezenet1_1 having 2.4x less computation and slightly fewer parameters than squeezenet1_0, without sacrificing accuracy."}}
{"code": "###Instruction: Recommend a way to recognize decorative and architectural elements in architectural design images using a pre-trained network.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'vgg11', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Utilize the pre-trained VGG11 network to classify architectural design images and recognize decorative and architectural elements.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'vgg11', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg11', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Recognition", "api_name": "vgg-nets", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg11', pretrained=True)", "api_arguments": [{"name": "vgg11", "type": "str", "description": "VGG11 model"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'vgg11', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"vgg11": {"Top-1 error": 30.98, "Top-5 error": 11.37}}}, "description": "vgg-nets are award-winning ConvNets from the 2014 Imagenet ILSVRC challenge. They are used for large-scale image recognition tasks. The available models are vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, and vgg19_bn."}}
{"code": "###Instruction: Can you suggest an API that can automatically classify images for me?\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'vgg19_bn', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained VGG19 with batch normalization model from PyTorch Hub, which can classify images into multiple categories.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'vgg19_bn', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg19_bn', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Recognition", "api_name": "vgg-nets", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg19_bn', pretrained=True)", "api_arguments": [{"name": "vgg19_bn", "type": "str", "description": "VGG19 model with batch normalization"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'vgg19_bn', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"vgg19_bn": {"Top-1 error": 25.76, "Top-5 error": 8.15}}}, "description": "vgg-nets are award-winning ConvNets from the 2014 Imagenet ILSVRC challenge. They are used for large-scale image recognition tasks. The available models are vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, and vgg19_bn."}}
{"code": "###Instruction: Suggest an API for classifying dog breeds given an image of a dog.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'wide_resnet50_2', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained Wide ResNet50_2 model from PyTorch Hub, which can be fine-tuned for dog breed classification given a dog image.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'wide_resnet50_2', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='wide_resnet50_2', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Wide Residual Networks", "api_name": "wide_resnet50_2", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='wide_resnet50_2', pretrained=True)", "api_arguments": "pretrained", "python_environment_requirements": "torch, torchvision", "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'wide_resnet50_2', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"wide_resnet50_2": {"Top-1 error": 21.49, "Top-5 error": 5.91}}}, "description": "Wide Residual networks simply have increased number of channels compared to ResNet. Otherwise the architecture is the same. Deeper ImageNet models with bottleneck block have increased number of channels in the inner 3x3 convolution. The wide_resnet50_2 and wide_resnet101_2 models were trained in FP16 with mixed precision training using SGD with warm restarts. Checkpoints have weights in half precision (except batch norm) for smaller size, and can be used in FP32 models too."}}
{"code": "###Instruction: Suggest an API designed for NVIDIA GPU and TensorRT performance optimization to classify images into different categories.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_gpunet', pretrained=True, model_type=model_type, model_math=precision)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained GPUNet model from NVIDIA Deep Learning Examples library on PyTorch Hub, optimized for NVIDIA GPU and TensorRT performance, to classify images into different categories.', 'code': 'import torch\nmodel_type = \"GPUNet-0\"\nprecision = \"fp32\"\nmodel = torch.hub.load(\"NVIDIA/DeepLearningExamples:torchhub\", \"nvidia_gpunet\", \n                        pretrained=True, model_type=model_type, model_math=precision)'}\n", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_gpunet', pretrained=True, model_type=model_type, model_math=precision)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "GPUNet Networks", "api_name": "torch.hub.load", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_gpunet', pretrained=True, model_type=model_type, model_math=precision)", "api_arguments": {"repository": "NVIDIA/DeepLearningExamples:torchhub", "model": "nvidia_gpunet", "pretrained": "True", "model_type": "GPUNet-0", "model_math": "fp32"}, "python_environment_requirements": ["torch", "validators", "matplotlib", "timm==0.5.4"], "example_code": ["import torch", "model_type = 'GPUNet-0'", "precision = 'fp32'", "gpunet = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_gpunet', pretrained=True, model_type=model_type, model_math=precision)", "device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')", "gpunet.to(device)", "gpunet.eval()"], "performance": {"dataset": "IMAGENET", "description": "GPUNet 
demonstrates state-of-the-art inference performance up to 2x faster than EfficientNet-X and FBNet-V3."}, "description": "GPUNet is a family of Convolutional Neural Networks designed by NVIDIA using novel Neural Architecture Search (NAS) methods. They are optimized for NVIDIA GPU and TensorRT performance. GPUNet models are pretrained on the IMAGENET dataset and are capable of classifying images into different categories. The models are provided by the NVIDIA Deep Learning Examples library."}}
{"code": "###Instruction: Translate the given English text to French using machine learning API.\n###Input: {\"text\": \"I like playing basketball.\"}\n###Output: {\"result\": \"J'aime jouer au basketball.\"}", "api_call": "torch.hub.load(repo_or_dir='pytorch/fairseq', model='transformer.wmt14.en-fr', tokenizer='moses', bpe='subword_nmt')", "provider": "PyTorch", "api_data": {"domain": "Machine Translation", "framework": "PyTorch", "functionality": "Neural Machine Translation", "api_name": "Transformer (NMT)", "api_call": "torch.hub.load(repo_or_dir='pytorch/fairseq', model='transformer.wmt14.en-fr', tokenizer='moses', bpe='subword_nmt')", "api_arguments": ["model_name", "tokenizer", "bpe", "beam", "sampling", "sampling_topk"], "python_environment_requirements": ["bitarray", "fastBPE", "hydra-core", "omegaconf", "regex", "requests", "sacremoses", "subword_nmt"], "example_code": "import torch\n\nen2fr = torch.hub.load('pytorch/fairseq', 'transformer.wmt14.en-fr', tokenizer='moses', bpe='subword_nmt')\n\nen2fr.cuda()\n\nfr = en2fr.translate('Hello world!', beam=5)\nassert fr == 'Bonjour \u00e0 tous !'", "performance": {"dataset": [{"name": "WMT'14", "accuracy": "Not provided"}, {"name": "WMT'18", "accuracy": "Not provided"}, {"name": "WMT'19", "accuracy": "Not provided"}]}, "description": "Transformer (NMT) is a powerful sequence-to-sequence modeling architecture that produces state-of-the-art neural machine translation systems. It is based on the paper 'Attention Is All You Need' and has been improved using techniques such as large-scale semi-supervised training, back-translation, and noisy-channel reranking. It supports English-French and English-German translation as well as round-trip translation for paraphrasing."}}
{"code": "###Instruction: Recommend an API to identify the breed of a dog from a picture input.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_resnest50_380x380', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained MEAL V2 model from PyTorch Hub, which can be fine-tuned for dog breed identification tasks.', 'code': \"import torch\nmodel = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_resnest50_380x380', pretrained=True)\"}", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_resnest50_380x380', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "MEAL_V2", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_resnest50_380x380', pretrained=True)", "api_arguments": {"model_name": "mealv2_resnest50_380x380"}, "python_environment_requirements": "!pip install timm", "example_code": "import torch\nfrom PIL import Image\nfrom torchvision import transforms\n\nmodel = torch.hub.load('szq0214/MEAL-V2','meal_v2', 'mealv2_resnest50_cutmix', pretrained=True)\nmodel.eval()\n\ninput_image = Image.open('dog.jpg')\npreprocess = transforms.Compose([\n transforms.Resize(256),\n transforms.CenterCrop(224),\n transforms.ToTensor(),\n transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n])\ninput_tensor = preprocess(input_image)\ninput_batch = input_tensor.unsqueeze(0)\n\nif torch.cuda.is_available():\n input_batch = input_batch.to('cuda')\n model.to('cuda')\n\nwith torch.no_grad():\n output = model(input_batch)\nprobabilities = torch.nn.functional.softmax(output[0], dim=0)\nprint(probabilities)", "performance": [{"dataset": "ImageNet", "accuracy": {"model": "MEAL-V2 w/ ResNet50", "resolution": "380", "parameters": "25.6M", "top1": "81.72", "top5": "95.81"}}], 
"description": "MEAL V2 models are from the MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks paper. The method is based on ensemble knowledge distillation via discriminators, and it achieves state-of-the-art results without using common tricks such as architecture modification, outside training data, autoaug/randaug, cosine learning rate, mixup/cutmix training, or label smoothing."}}
{"code": "###Instruction: I want to build an image classifier to boost the accuracy of the Vanilla Resnet-50 model on ImageNet data without using any data augmentation tricks. What API should I use?\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_efficientnet_b0', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained MEAL V2 model from PyTorch Hub, which boosts the accuracy of the Vanilla ResNet-50 model without using data augmentation tricks or architecture modifications.', 'code': 'import torch\nmodel = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_efficientnet_b0', pretrained=True)'}", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_efficientnet_b0', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "MEAL_V2", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_efficientnet_b0', pretrained=True)", "api_arguments": {"model_name": "mealv2_efficientnet_b0"}, "python_environment_requirements": "!pip install timm", "example_code": "import torch\nfrom PIL import Image\nfrom torchvision import transforms\n\nmodel = torch.hub.load('szq0214/MEAL-V2','meal_v2', 'mealv2_resnest50_cutmix', pretrained=True)\nmodel.eval()\n\ninput_image = Image.open('dog.jpg')\npreprocess = transforms.Compose([\n transforms.Resize(256),\n transforms.CenterCrop(224),\n transforms.ToTensor(),\n transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n])\ninput_tensor = preprocess(input_image)\ninput_batch = input_tensor.unsqueeze(0)\n\nif torch.cuda.is_available():\n input_batch = input_batch.to('cuda')\n model.to('cuda')\n\nwith torch.no_grad():\n output = model(input_batch)\nprobabilities = torch.nn.functional.softmax(output[0], dim=0)\nprint(probabilities)", "performance": 
[{"dataset": "ImageNet", "accuracy": {"model": "MEAL-V2 w/ EfficientNet-B0", "resolution": "224", "parameters": "5.29M", "top1": "78.29", "top5": "93.95"}}], "description": "MEAL V2 models are from the MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks paper. The method is based on ensemble knowledge distillation via discriminators, and it achieves state-of-the-art results without using common tricks such as architecture modification, outside training data, autoaug/randaug, cosine learning rate, mixup/cutmix training, or label smoothing."}}
{"code": "###Instruction: Create a 3D reconstruction of a scene from only one image.\n###Output: {'domain': 'Image Depth Estimation', 'api_call':\"model = torch.hub.load('intel-isl/MiDaS', 'DPT_Hybrid', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': \"Load the pretrained MiDaS model from PyTorch Hub to compute the relative depth of a scene from a single image, which can be used to create a 3D reconstruction.\", 'code': 'import torch\nmodel = torch.hub.load(\\'intel-isl/MiDaS\\', \\'DPT_Hybrid\\', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='intel-isl/MiDaS', model='DPT_Hybrid', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Computing relative depth from a single image", "api_name": "MiDaS", "api_call": "torch.hub.load(repo_or_dir='intel-isl/MiDaS', model='DPT_Hybrid', pretrained=True)", "api_arguments": {"repo_or_dir": "intel-isl/MiDaS", "model": "model_type"}, "python_environment_requirements": "pip install timm", "example_code": ["import cv2", "import torch", "import urllib.request", "import matplotlib.pyplot as plt", "url, filename = ('https://github.com/pytorch/hub/raw/master/images/dog.jpg', 'dog.jpg')", "urllib.request.urlretrieve(url, filename)", "model_type = 'DPT_Hybrid'", "midas = torch.hub.load('intel-isl/MiDaS', model_type)", "device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')", "midas.to(device)", "midas.eval()", "midas_transforms = torch.hub.load('intel-isl/MiDaS', 'transforms')", "if model_type == 'DPT_Large' or model_type == 'DPT_Hybrid':", " transform = midas_transforms.dpt_transform", "else:", " transform = midas_transforms.small_transform", "img = cv2.imread(filename)", "img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)", "input_batch = transform(img).to(device)", "with torch.no_grad():", " prediction = midas(input_batch)", "prediction = torch.nn.functional.interpolate(", " prediction.unsqueeze(1),", " 
size=img.shape[:2],", " mode='bicubic',", " align_corners=False,", ").squeeze()", "output = prediction.cpu().numpy()", "plt.imshow(output)", "plt.show()"], "performance": {"dataset": "10 distinct datasets", "accuracy": "Multi-objective optimization"}, "description": "MiDaS computes relative inverse depth from a single image. The repository provides multiple models that cover different use cases ranging from a small, high-speed model to a very large model that provide the highest accuracy. The models have been trained on 10 distinct datasets using multi-objective optimization to ensure high quality on a wide range of inputs."}}
{"code": "###Instruction: A video editor is developing a software that will allow users to mute specific instruments in a song. Provide an API that can separate audio into multiple tracks.\n###Output: {'domain': 'Audio Separation', 'api_call': \"model = torch.hub.load('sigsep/open-unmix-pytorch', 'umxhq', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained Open-Unmix model for music source separation from PyTorch Hub, which can separate pop music into vocals, drums, bass, and other instruments.', 'code': 'import torch\nmodel = torch.hub.load('sigsep/open-unmix-pytorch', 'umxhq', pretrained=True)'} \n", "api_call": "torch.hub.load(repo_or_dir='sigsep/open-unmix-pytorch', model='umxhq', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Audio Separation", "framework": "PyTorch", "functionality": "Music Source Separation", "api_name": "Open-Unmix", "api_call": "torch.hub.load(repo_or_dir='sigsep/open-unmix-pytorch', model='umxhq', pretrained=True)", "api_arguments": ["model_name"], "python_environment_requirements": ["PyTorch >=1.6.0", "torchaudio"], "example_code": ["import torch", "separator = torch.hub.load('sigsep/open-unmix-pytorch', 'umxhq')", "audio = torch.rand((1, 2, 100000))", "original_sample_rate = separator.sample_rate", "estimates = separator(audio)"], "performance": {"dataset": "MUSDB18", "accuracy": "N/A"}, "description": "Open-Unmix provides ready-to-use models that allow users to separate pop music into four stems: vocals, drums, bass and the remaining other instruments. The models were pre-trained on the freely available MUSDB18 dataset."}}
{"code": "###Instruction: I am working on a project where I need to convert a text document into an audio file. Can you suggest an API for text-to-speech conversion?\n###Output: {'domain': 'Text-To-Speech', 'api_call': \"model = torch.hub.load('snakers4/silero-models', 'silero_tts', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the Silero Text-To-Speech model from PyTorch Hub with pretrained weights for converting text into naturally sounding speech in multiple languages.', 'code': \"import torch\nlanguage = 'en'\nspeaker = 'lj_16khz'\ndevice = torch.device('cpu')\nmodel, symbols, sample_rate, example_text, apply_tts = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_tts', language=language, speaker=speaker)\"}", "api_call": "torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_tts', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Text-To-Speech", "framework": "PyTorch", "functionality": "Text-To-Speech", "api_name": "Silero Text-To-Speech Models", "api_call": "torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_tts', pretrained=True)", "api_arguments": {"repo_or_dir": "snakers4/silero-models", "model": "silero_tts", "language": "language", "speaker": "speaker"}, "python_environment_requirements": ["pip install -q torchaudio omegaconf"], "example_code": "import torch\nlanguage = 'en'\nspeaker = 'lj_16khz'\ndevice = torch.device('cpu')\nmodel, symbols, sample_rate, example_text, apply_tts = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_tts', language=language, speaker=speaker)\nmodel = model.to(device)\naudio = apply_tts(texts=[example_text], model=model, sample_rate=sample_rate, symbols=symbols, device=device)", "performance": {"dataset": [{"language": "Russian", "speakers": 6}, {"language": "English", "speakers": 1}, {"language": "German", "speakers": 1}, {"language": "Spanish", "speakers": 1}, {"language": "French", "speakers": 1}], "accuracy": "High 
throughput on slow hardware. Decent performance on one CPU thread"}, "description": "Silero Text-To-Speech models provide enterprise grade TTS in a compact form-factor for several commonly spoken languages. They offer one-line usage, naturally sounding speech, no GPU or training required, minimalism and lack of dependencies, a library of voices in many languages, support for 16kHz and 8kHz out of the box."}}
{"code": "###Instruction: Suggest an API for identifying objects in a picture taken at a city park.\n###Output: {'domain': 'Semantic Segmentation', 'api_call': \"model = torch.hub.load('pytorch/vision', 'deeplabv3_resnet50', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained DeepLabV3 model with a ResNet-50 backbone from PyTorch Hub, which can be used for semantic segmentation to identify and label objects in a city park picture.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'deeplabv3_resnet50', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='deeplabv3_resnet50', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Semantic Segmentation", "framework": "PyTorch", "functionality": "DeepLabV3", "api_name": "torch.hub.load", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='deeplabv3_resnet50', pretrained=True)", "api_arguments": {"pretrained": "True"}, "python_environment_requirements": {"torch": "latest", "torchvision": "latest", "PIL": "latest", "matplotlib": "latest"}, "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'deeplabv3_resnet50', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "input_image = input_image.convert('RGB')", "preprocess = transforms.Compose([", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)['out'][0]", "output_predictions = output.argmax(0)"], "performance": {"dataset": "COCO val2017", "accuracy": [{"model": "deeplabv3_resnet50", "Mean IOU": 66.4, "Global Pixelwise Accuracy": 92.4}]}, "description": "DeepLabV3 
models with ResNet-50, ResNet-101 and MobileNet-V3 backbones for semantic segmentation. The pre-trained models have been trained on a subset of COCO train2017, on the 20 categories that are present in the Pascal VOC dataset."}}
{"code": "###Instruction: I have an image and I need to detect the different objects in that image. Give me an API that can do this task.\n###Output: {'domain': 'Semantic Segmentation', 'api_call': \"model = torch.hub.load('pytorch/vision', 'deeplabv3_resnet101', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained DeepLabV3 model with ResNet-101 backbone from PyTorch Hub, which can be used for semantic segmentation to detect different objects within an image.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'deeplabv3_resnet101', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='deeplabv3_resnet101', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Semantic Segmentation", "framework": "PyTorch", "functionality": "DeepLabV3", "api_name": "torch.hub.load", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='deeplabv3_resnet101', pretrained=True)", "api_arguments": {"pretrained": "True"}, "python_environment_requirements": {"torch": "latest", "torchvision": "latest", "PIL": "latest", "matplotlib": "latest"}, "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'deeplabv3_resnet101', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "input_image = input_image.convert('RGB')", "preprocess = transforms.Compose([", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)['out'][0]", "output_predictions = output.argmax(0)"], "performance": {"dataset": "COCO val2017", "accuracy": [{"model": "deeplabv3_resnet101", "Mean IOU": 67.4, "Global Pixelwise Accuracy": 
92.4}]}, "description": "DeepLabV3 models with ResNet-50, ResNet-101 and MobileNet-V3 backbones for semantic segmentation. The pre-trained models have been trained on a subset of COCO train2017, on the 20 categories that are present in the Pascal VOC dataset."}}
{"code": "###Instruction: I want to create a new collection of clothing designs. Recommend an API that can generate unique images of clothing items.\n###Output: {'domain': 'Image Generation', 'api_call': \"model = torch.hub.load('facebookresearch/pytorch_GAN_zoo:hub', 'DCGAN', pretrained=True, useGPU=use_gpu)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained Deep Convolutional Generative Adversarial Networks (DCGAN) model from PyTorch Hub, which can generate unique low-resolution images of clothing items.', 'code': 'import torch\nuse_gpu = True if torch.cuda.is_available() else False\nmodel = torch.hub.load('facebookresearch/pytorch_GAN_zoo:hub', 'DCGAN', pretrained=True, useGPU=use_gpu)'}\n", "api_call": "torch.hub.load(repo_or_dir='facebookresearch/pytorch_GAN_zoo:hub', model='DCGAN', pretrained=True, useGPU=use_gpu)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Generative Adversarial Networks", "api_name": "DCGAN", "api_call": "torch.hub.load(repo_or_dir='facebookresearch/pytorch_GAN_zoo:hub', model='DCGAN', pretrained=True, useGPU=use_gpu)", "api_arguments": {"pretrained": "True", "useGPU": "use_gpu"}, "python_environment_requirements": "Python 3", "example_code": {"import": ["import torch", "import matplotlib.pyplot as plt", "import torchvision"], "use_gpu": "use_gpu = True if torch.cuda.is_available() else False", "load_model": "model = torch.hub.load('facebookresearch/pytorch_GAN_zoo:hub', 'DCGAN', pretrained=True, useGPU=use_gpu)", "build_noise_data": "noise, _ = model.buildNoiseData(num_images)", "generate_images": "with torch.no_grad(): generated_images = model.test(noise)", "plot_images": ["plt.imshow(torchvision.utils.make_grid(generated_images).permute(1, 2, 0).cpu().numpy())", "plt.show()"]}, "performance": {"dataset": "FashionGen", "accuracy": "N/A"}, "description": "DCGAN is a model designed in 2015 by Radford et al. in the paper 'Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks'. It is a GAN architecture that is both very simple and efficient for low-resolution image generation (up to 64x64)."}}
{"code": "###Instruction: I'm working on an image classification project where I need to identify the contents of an image. Can you suggest an API for that?\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_se_resnext101_32x4d', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained SE-ResNeXt101-32x4d model from PyTorch Hub for image classification tasks to identify the contents of an image.', 'code': \"import torch\nmodel = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_se_resnext101_32x4d', pretrained=True)\"}", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_se_resnext101_32x4d', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "SE-ResNeXt101", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_se_resnext101_32x4d', pretrained=True)", "api_arguments": "N/A", "python_environment_requirements": "validators, matplotlib", "example_code": "import torch\nfrom PIL import Image\nimport torchvision.transforms as transforms\nimport numpy as np\nimport json\nimport requests\nimport matplotlib.pyplot as plt\nimport warnings\nwarnings.filterwarnings('ignore')\n%matplotlib inline\ndevice = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')\nprint(f'Using {device} for inference')\nresneXt = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_se_resnext101_32x4d')\nutils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_convnets_processing_utils')\nresneXt.eval().to(device)\nuris = 
['http://images.cocodataset.org/test-stuff2017/000000024309.jpg','http://images.cocodataset.org/test-stuff2017/000000028117.jpg','http://images.cocodataset.org/test-stuff2017/000000006149.jpg','http://images.cocodataset.org/test-stuff2017/000000004954.jpg']\nbatch = torch.cat([utils.prepare_input_from_uri(uri) for uri in uris]).to(device)\nwith torch.no_grad():\n output = torch.nn.functional.softmax(resneXt(batch), dim=1)\nresults = utils.pick_n_best(predictions=output, n=5)\nfor uri, result in zip(uris, results):\n img = Image.open(requests.get(uri, stream=True).raw)\n img.thumbnail((256,256), Image.LANCZOS)\n plt.imshow(img)\n plt.show()\n print(result)", "performance": {"dataset": "IMAGENET", "accuracy": "N/A"}, "description": "The SE-ResNeXt101-32x4d is a ResNeXt101-32x4d model with an added Squeeze-and-Excitation module. This model is trained with mixed precision using Tensor Cores on Volta, Turing, and the NVIDIA Ampere GPU architectures, which allows researchers to get results 3x faster than training without Tensor Cores while experiencing the benefits of mixed precision training."}}
{"code": "###Instruction: List an API that will allow me to input text that will be transformed into an audio file.\n###Output: {'domain': 'Text-to-Speech', 'api_call': \"model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_waveglow', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Use the WaveGlow API provided by NVIDIA for PyTorch, which can convert text into natural-sounding speech.', 'code': \"import torch\nwaveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_waveglow', pretrained=True)\"}", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_waveglow', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Text-to-Speech", "framework": "PyTorch", "functionality": "Speech Synthesis", "api_name": "WaveGlow", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_waveglow', pretrained=True)", "api_arguments": {"repo_or_dir": "NVIDIA/DeepLearningExamples:torchhub", "model": "nvidia_waveglow", "model_math": "fp32"}, "python_environment_requirements": ["numpy", "scipy", "librosa", "unidecode", "inflect", "libsndfile1"], "example_code": {"load_waveglow_model": "waveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_waveglow', model_math='fp32')", "prepare_waveglow_model": ["waveglow = waveglow.remove_weightnorm(waveglow)", "waveglow = waveglow.to('cuda')", "waveglow.eval()"], "load_tacotron2_model": "tacotron2 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tacotron2', model_math='fp32')", "prepare_tacotron2_model": ["tacotron2 = tacotron2.to('cuda')", "tacotron2.eval()"], "synthesize_speech": ["text = \"hello world, I missed you so much\"", "utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tts_utils')", "sequences, lengths = utils.prepare_input_sequence([text])", "with torch.no_grad():", " mel, _, _ = tacotron2.infer(sequences, lengths)", " audio = 
waveglow.infer(mel)", "audio_numpy = audio[0].data.cpu().numpy()", "rate = 22050"], "save_audio": "write(\"audio.wav\", rate, audio_numpy)", "play_audio": "Audio(audio_numpy, rate=rate)"}, "performance": {"dataset": "LJ Speech", "accuracy": null}, "description": "The Tacotron 2 and WaveGlow model form a text-to-speech system that enables users to synthesize natural-sounding speech from raw transcripts without any additional prosody information. The Tacotron 2 model produces mel spectrograms from input text using encoder-decoder architecture. WaveGlow is a flow-based model that consumes the mel spectrograms to generate speech."}}
{"code": "###Instruction: Find a model that is optimal for the task of person re-identification from a set of images.\n###Output: {'domain': 'Classification', 'api_call': 'model = torch.hub.load('XingangPan/IBN-Net', 'resnet50_ibn_a', pretrained=True)', 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained IBN-Net ResNet50 model from PyTorch Hub, which is well-suited for person re-identification tasks.', 'code': 'import torch\nmodel = torch.hub.load('XingangPan/IBN-Net', 'resnet50_ibn_a', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='XingangPan/IBN-Net', model='resnet50_ibn_a', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "IBN-Net", "api_name": "torch.hub.load", "api_call": "torch.hub.load(repo_or_dir='XingangPan/IBN-Net', model='resnet50_ibn_a', pretrained=True)", "api_arguments": [{"name": "resnet50_ibn_a", "type": "str", "description": "ResNet-50-IBN-a model"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('XingangPan/IBN-Net', 'resnet50_ibn_a', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"resnet50_ibn_a": {"Top-1 acc": 77.46, "Top-5 acc": 93.68}}}, "description": "IBN-Net is a CNN model with domain/appearance invariance. Motivated by style transfer works, IBN-Net carefully unifies instance normalization and batch normalization in a single deep network. It provides a simple way to increase both modeling and generalization capacities without adding model complexity. IBN-Net is especially suitable for cross domain or person/vehicle re-identification tasks."}}
{"code": "###Instruction: Query an API that carries out vehicle or person re-identification tasks accurately.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('XingangPan/IBN-Net', 'resnet101_ibn_a', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ResNet-101-IBN-a model from PyTorch Hub, a CNN model designed with domain/appearance invariance for person/vehicle re-identification tasks.', 'code': 'import torch\nmodel = torch.hub.load('XingangPan/IBN-Net', 'resnet101_ibn_a', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='XingangPan/IBN-Net', model='resnet101_ibn_a', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "IBN-Net", "api_name": "torch.hub.load", "api_call": "torch.hub.load(repo_or_dir='XingangPan/IBN-Net', model='resnet101_ibn_a', pretrained=True)", "api_arguments": [{"name": "resnet101_ibn_a", "type": "str", "description": "ResNet-101-IBN-a model"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('XingangPan/IBN-Net', 'resnet101_ibn_a', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"resnet101_ibn_a": {"Top-1 acc": 78.61, "Top-5 acc": 94.41}}}, "description": "IBN-Net is a CNN model with domain/appearance invariance. Motivated by style transfer works, IBN-Net carefully unifies instance normalization and batch normalization in a single deep network. It provides a simple way to increase both modeling and generalization capacities without adding model complexity. IBN-Net is especially suitable for cross domain or person/vehicle re-identification tasks."}}
{"code": "###Instruction: I need an image classification model that can classify objects in images with high accuracy. Suggest me an API.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'resnet101', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Use the pretrained ResNet101 model from PyTorch Hub for image classification, as it offers high accuracy on the ImageNet dataset.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'resnet101', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet101', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Deep Residual Networks", "api_name": "ResNet", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet101', pretrained=True)", "api_arguments": [{"name": "pretrained", "type": "bool", "default": "False", "description": "If True, returns a model pre-trained on ImageNet"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet101', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"model": "resnet101", "top-1_error": 22.63, "top-5_error": 6.44}}, "description": "ResNet models are deep residual networks pre-trained on ImageNet. They were proposed in the paper 'Deep Residual Learning for Image Recognition'. Available model variants include ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152."}}
{"code": "###Instruction: Help me find a way to classify different species of birds given an image from the Internet.\n###Output: {'domain': 'Image Classification', 'api_call': 'model = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_b', pretrained=True)', 'api_provider': 'PyTorch', 'explanation': 'Load the pre-trained SNNMLP model from PyTorch Hub to classify birds from an input image.', 'code': 'import torch\nmodel = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_b', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='huawei-noah/Efficient-AI-Backbones', model='snnmlp_b', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "SNNMLP", "api_call": "torch.hub.load(repo_or_dir='huawei-noah/Efficient-AI-Backbones', model='snnmlp_b', pretrained=True)", "api_arguments": [{"name": "snnmlp_b", "type": "str", "description": "SNNMLP Base model"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_b', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "preprocess = transforms.Compose([", " transforms.Resize(256),", " transforms.CenterCrop(224),", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)", "print(torch.nn.functional.softmax(output[0], dim=0))"], "performance": {"dataset": "ImageNet", "accuracy": {"model": "SNNMLP Base", "top-1": 85.59}}, "description": "SNNMLP incorporates the mechanism of LIF neurons into the MLP 
models, to achieve better accuracy without extra FLOPs. We propose a full-precision LIF operation to communicate between patches, including horizontal LIF and vertical LIF in different directions. We also propose to use group LIF to extract better local features. With LIF modules, our SNNMLP model achieves 81.9%, 83.3% and 83.6% top-1 accuracy on ImageNet dataset with only 4.4G, 8.5G and 15.2G FLOPs, respectively."}}
{"code": "###Instruction: Your pet store is building a new image classifier for the different types of pets. Tell me which API can identify the breeds given pet images.\n###Output: {'domain': 'Classification', 'api_call': 'model = torch.hub.load('pytorch/vision', 'vgg11', pretrained=True)', 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained VGG11 model for image classification from PyTorch Hub to identify breeds in pet images.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'vgg11', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg11', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Recognition", "api_name": "vgg-nets", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg11', pretrained=True)", "api_arguments": [{"name": "vgg11", "type": "str", "description": "VGG11 model"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'vgg11', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"vgg11": {"Top-1 error": 30.98, "Top-5 error": 11.37}}}, "description": "vgg-nets are award-winning ConvNets from the 2014 Imagenet ILSVRC challenge. They are used for large-scale image recognition tasks. The available models are vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, and vgg19_bn."}}
{"code": "###Instruction: I want to recognize objects in an image. Can you find me an API that can do this?\n###Output: {'domain': 'Image Recognition', 'api_call': \"model = torch.hub.load('pytorch/vision', 'vgg11_bn', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained VGG11 model with batch normalization from PyTorch Hub for object recognition in images.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'vgg11_bn', pretrained=True)\nmodel.eval()'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg11_bn', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Recognition", "api_name": "vgg-nets", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg11_bn', pretrained=True)", "api_arguments": [{"name": "vgg11_bn", "type": "str", "description": "VGG11 model with batch normalization"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'vgg11_bn', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"vgg11_bn": {"Top-1 error": 26.7, "Top-5 error": 8.58}}}, "description": "vgg-nets are award-winning ConvNets from the 2014 Imagenet ILSVRC challenge. They are used for large-scale image recognition tasks. The available models are vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, and vgg19_bn."}}
{"code": "###Instruction: I'm a photographer and I need to classify images according to their category. Write the code to use a machine learning API to achieve that.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'wide_resnet50_2', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained Wide ResNet 50-2 model from PyTorch Hub to classify images into categories using advanced computer vision techniques.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'wide_resnet50_2', pretrained=True)\nmodel.eval()'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='wide_resnet50_2', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Wide Residual Networks", "api_name": "wide_resnet50_2", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='wide_resnet50_2', pretrained=True)", "api_arguments": "pretrained", "python_environment_requirements": "torch, torchvision", "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'wide_resnet50_2', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"wide_resnet50_2": {"Top-1 error": 21.49, "Top-5 error": 5.91}}}, "description": "Wide Residual networks simply have increased number of channels compared to ResNet. Otherwise the architecture is the same. Deeper ImageNet models with bottleneck block have increased number of channels in the inner 3x3 convolution. The wide_resnet50_2 and wide_resnet101_2 models were trained in FP16 with mixed precision training using SGD with warm restarts. Checkpoints have weights in half precision (except batch norm) for smaller size, and can be used in FP32 models too."}}
{"code": "###Instruction: I want to classify images accurately without latency. Help me find an API to do that.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'wide_resnet101_2', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained Wide ResNet101_2 model from PyTorch Hub for efficient and accurate image classification.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'wide_resnet101_2', pretrained=True)\nmodel.eval()'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='wide_resnet101_2', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Wide Residual Networks", "api_name": "wide_resnet101_2", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='wide_resnet101_2', pretrained=True)", "api_arguments": "pretrained", "python_environment_requirements": "torch, torchvision", "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'wide_resnet101_2', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"wide_resnet101_2": {"Top-1 error": 21.16, "Top-5 error": 5.72}}}, "description": "Wide Residual networks simply have increased number of channels compared to ResNet. Otherwise the architecture is the same. Deeper ImageNet models with bottleneck block have increased number of channels in the inner 3x3 convolution. The wide_resnet50_2 and wide_resnet101_2 models were trained in FP16 with mixed precision training using SGD with warm restarts. Checkpoints have weights in half precision (except batch norm) for smaller size, and can be used in FP32 models too."}}
{"code": "###Instruction: Imagine I am an app developer and need to build an Instagram-like app that can classify users' images for easy searching later on. Please suggest a pre-trained AI API that can help me in my endeavors.\n###Output: {'domain': 'Image Classification', 'api_call': 'model = torch.hub.load('mit-han-lab/once-for-all', 'ofa_supernet_mbv3_w10', pretrained=True)', 'api_provider': 'PyTorch', 'explanation': 'Load the Once-for-all (OFA) Networks from MIT Han Lab with pretrained weights for image classification, ideal for an Instagram-like app that requires fast image categorization.', 'code': 'import torch\nmodel = torch.hub.load('mit-han-lab/once-for-all', 'ofa_supernet_mbv3_w10', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/once-for-all', model='ofa_supernet_mbv3_w10', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Once-for-all (OFA) Networks", "api_name": "torch.hub.load", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/once-for-all', model='ofa_supernet_mbv3_w10', pretrained=True)", "api_arguments": {"repository": "mit-han-lab/once-for-all", "model": "ofa_supernet_mbv3_w10", "pretrained": "True"}, "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "super_net_name = 'ofa_supernet_mbv3_w10'", "super_net = torch.hub.load('mit-han-lab/once-for-all', super_net_name, pretrained=True).eval()"], "performance": {"description": "OFA networks outperform state-of-the-art NAS methods (up to 4.0% ImageNet top1 accuracy improvement over MobileNetV3, or same accuracy but 1.5x faster than MobileNetV3, 2.6x faster than EfficientNet w.r.t. measured latency) while reducing GPU hours and CO2 emissions by many orders of magnitude."}, "description": "Once-for-all (OFA) networks are a family of neural networks designed by MIT Han Lab. They decouple training and search, achieving efficient inference across various edge devices and resource constraints. OFA networks are pretrained on the IMAGENET dataset and are capable of classifying images into different categories."}}
{"code": "###Instruction: A retailer would like to better categorize images of products on their website. Provide a model API that can perform image classification.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_resnest50_cutmix', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained MEAL_V2 model from PyTorch Hub to classify images of products with high accuracy.', 'code': \"import torch\nmodel = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_resnest50_cutmix', pretrained=True)\"}", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_resnest50_cutmix', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "MEAL_V2", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_resnest50_cutmix', pretrained=True)", "api_arguments": {"model_name": "mealv2_resnest50_cutmix"}, "python_environment_requirements": "!pip install timm", "example_code": "import torch\nfrom PIL import Image\nfrom torchvision import transforms\n\nmodel = torch.hub.load('szq0214/MEAL-V2','meal_v2', 'mealv2_resnest50_cutmix', pretrained=True)\nmodel.eval()\n\ninput_image = Image.open('dog.jpg')\npreprocess = transforms.Compose([\n transforms.Resize(256),\n transforms.CenterCrop(224),\n transforms.ToTensor(),\n transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n])\ninput_tensor = preprocess(input_image)\ninput_batch = input_tensor.unsqueeze(0)\n\nif torch.cuda.is_available():\n input_batch = input_batch.to('cuda')\n model.to('cuda')\n\nwith torch.no_grad():\n output = model(input_batch)\nprobabilities = torch.nn.functional.softmax(output[0], dim=0)\nprint(probabilities)", "performance": [{"dataset": "ImageNet", "accuracy": {"model": "MEAL-V2 + CutMix w/ ResNet50", "resolution": "224", "parameters": 
"25.6M", "top1": "80.98", "top5": "95.35"}}], "description": "MEAL V2 models are from the MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks paper. The method is based on ensemble knowledge distillation via discriminators, and it achieves state-of-the-art results without using common tricks such as architecture modification, outside training data, autoaug/randaug, cosine learning rate, mixup/cutmix training, or label smoothing."}}
{"code": "###Instruction: Tesla wants to improve the back camera of their cars, and they are seeking an API for jointly handling object detection, drivable area segmentation, and lane detection. Provide a suitable API for this task.\n###Output: {'domain': 'Object Detection', 'api_call': \"model = torch.hub.load('hustvl/yolop', 'yolop', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained YOLOP model from PyTorch Hub, which can efficiently handle three essential tasks for autonomous driving: object detection, drivable area segmentation, and lane detection, while maintaining real-time performance on embedded devices.', 'code': \"import torch\nmodel = torch.hub.load('hustvl/yolop', 'yolop', pretrained=True)\"}", "api_call": "torch.hub.load(repo_or_dir='hustvl/yolop', model='yolop', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Object Detection", "framework": "PyTorch", "functionality": "Object Detection, Drivable Area Segmentation, Lane Detection", "api_name": "YOLOP", "api_call": "torch.hub.load(repo_or_dir='hustvl/yolop', model='yolop', pretrained=True)", "api_arguments": "pretrained", "python_environment_requirements": "pip install -qr https://github.com/hustvl/YOLOP/blob/main/requirements.txt", "example_code": "import torch\nmodel = torch.hub.load('hustvl/yolop', 'yolop', pretrained=True)\nimg = torch.randn(1,3,640,640)\ndet_out, da_seg_out,ll_seg_out = model(img)", "performance": {"dataset": "BDD100K", "accuracy": {"Object Detection": {"Recall(%)": 89.2, "mAP50(%)": 76.5, "Speed(fps)": 41}, "Drivable Area Segmentation": {"mIOU(%)": 91.5, "Speed(fps)": 41}, "Lane Detection": {"mIOU(%)": 70.5, "IOU(%)": 26.2}}}, "description": "YOLOP is an efficient multi-task network that can jointly handle three crucial tasks in autonomous driving: object detection, drivable area segmentation and lane detection. 
And it is also the first to reach real-time on embedded devices while maintaining state-of-the-art level performance on the BDD100K dataset."}}
{"code": "###Instruction: I need a Python library for calculating relative depth from a single image. What do you suggest?\n###Output: {'domain': 'Depth Estimation', 'api_call': \"model = torch.hub.load('intel-isl/MiDaS', 'MiDaS_small', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': \"Use the pretrained MiDaS model from PyTorch Hub, which can estimate relative depth from a single image across multiple use cases, from small high-speed models to very large, high accuracy models.\", 'code': 'import torch\nmodel = torch.hub.load('intel-isl/MiDaS', 'MiDaS_small', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='intel-isl/MiDaS', model='MiDaS_small', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Computing relative depth from a single image", "api_name": "MiDaS", "api_call": "torch.hub.load(repo_or_dir='intel-isl/MiDaS', model='MiDaS_small', pretrained=True)", "api_arguments": {"repo_or_dir": "intel-isl/MiDaS", "model": "model_type"}, "python_environment_requirements": "pip install timm", "example_code": ["import cv2", "import torch", "import urllib.request", "import matplotlib.pyplot as plt", "url, filename = ('https://github.com/pytorch/hub/raw/master/images/dog.jpg', 'dog.jpg')", "urllib.request.urlretrieve(url, filename)", "model_type = 'MiDaS_small'", "midas = torch.hub.load('intel-isl/MiDaS', model_type)", "device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')", "midas.to(device)", "midas.eval()", "midas_transforms = torch.hub.load('intel-isl/MiDaS', 'transforms')", "if model_type == 'DPT_Large' or model_type == 'DPT_Hybrid':", " transform = midas_transforms.dpt_transform", "else:", " transform = midas_transforms.small_transform", "img = cv2.imread(filename)", "img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)", "input_batch = transform(img).to(device)", "with torch.no_grad():", " prediction = midas(input_batch)", "prediction = torch.nn.functional.interpolate(", " prediction.unsqueeze(1),", " size=img.shape[:2],", " mode='bicubic',", " align_corners=False,", ").squeeze()", "output = prediction.cpu().numpy()", "plt.imshow(output)", "plt.show()"], "performance": {"dataset": "10 distinct datasets", "accuracy": "Multi-objective optimization"}, "description": "MiDaS computes relative inverse depth from a single image. The repository provides multiple models that cover different use cases, ranging from a small, high-speed model to a very large model that provides the highest accuracy. The models have been trained on 10 distinct datasets using multi-objective optimization to ensure high quality on a wide range of inputs."}}
{"code": "###Instruction: Tell me an API that I can use to classify images into different categories using a pre-trained ResNet50 model.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_resnet50', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pre-trained ResNet50 v1.5 model from PyTorch Hub, which is designed for image classification tasks.', 'code': 'import torch\nresnet50 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_resnet50', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_resnet50', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "ResNet50", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_resnet50', pretrained=True)", "api_arguments": {"pretrained": "True"}, "python_environment_requirements": ["pip install validators matplotlib"], "example_code": ["import torch", "from PIL import Image", "import torchvision.transforms as transforms", "import numpy as np", "import json", "import requests", "import matplotlib.pyplot as plt", "import warnings", "warnings.filterwarnings('ignore')", "%matplotlib inline", "device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')", "print(f'Using {device} for inference')", "resnet50 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_resnet50', pretrained=True)", "utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_convnets_processing_utils')", "resnet50.eval().to(device)", "uris = [...]", "batch = torch.cat([utils.prepare_input_from_uri(uri) for uri in uris]).to(device)", "with torch.no_grad():", " output = torch.nn.functional.softmax(resnet50(batch), dim=1)", "results = utils.pick_n_best(predictions=output, 
n=5)", "for uri, result in zip(uris, results):", " img = Image.open(requests.get(uri, stream=True).raw)", " img.thumbnail((256,256), Image.ANTIALIAS)", " plt.imshow(img)", " plt.show()", " print(result)"], "performance": {"dataset": "ImageNet", "accuracy": "~0.5% top1 improvement over ResNet50 v1"}, "description": "The ResNet50 v1.5 model is a modified version of the original ResNet50 v1 model. The difference between v1 and v1.5 is that, in the bottleneck blocks that require downsampling, v1 has stride = 2 in the first 1x1 convolution, whereas v1.5 has stride = 2 in the 3x3 convolution. This difference makes ResNet50 v1.5 slightly more accurate (~0.5% top1) than v1, but comes with a small performance drawback (~5% imgs/sec). The model is initialized as described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. This model is trained with mixed precision using Tensor Cores on Volta, Turing, and the NVIDIA Ampere GPU architectures."}}
{"code": "###Instruction: I am developing an app for bird species classification. Suggest me an API that can identify bird species in images.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'densenet121', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained DenseNet-121 model from PyTorch Hub, which can be fine-tuned for bird species classification.', 'code': \"import torch\nmodel = torch.hub.load('pytorch/vision', 'densenet121', pretrained=True)\"}", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='densenet121', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Dense Convolutional Network", "api_name": "Densenet", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='densenet121', pretrained=True)", "api_arguments": [{"name": "densenet121", "type": "str", "description": "Densenet-121 model"}], "python_environment_requirements": {"torch": "latest", "torchvision": "latest"}, "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'densenet121', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"densenet121": {"Top-1 error": 25.35, "Top-5 error": 7.83}}}, "description": "Dense Convolutional Network (DenseNet) connects each layer to every other layer in a feed-forward fashion. It alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and substantially reduces the number of parameters."}}
{"code": "###Instruction: I need to analyze aerial images of agricultural fields to identify specific crop types. Can you suggest an API for classifying the crops in the images?\n###Output: {'domain': 'Semantic Segmentation', 'api_call': 'model = torch.hub.load(\"pytorch/vision\", \"fcn_resnet50\", pretrained=True)', 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained FCN-Resnet50 model from PyTorch Hub, which can be fine-tuned for agricultural crop type segmentation in aerial images.', 'code': 'import torch\nmodel = torch.hub.load(\"pytorch/vision\", \"fcn_resnet50\", pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='fcn_resnet50', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Semantic Segmentation", "framework": "PyTorch", "functionality": "Fully-Convolutional Network", "api_name": "fcn_resnet50", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='fcn_resnet50', pretrained=True)", "api_arguments": [{"name": "pretrained", "type": "boolean", "description": "If True, returns a model pre-trained on COCO train2017"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "matplotlib"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'fcn_resnet50', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "input_image = input_image.convert(\"RGB\")", "preprocess = transforms.Compose([", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)['out'][0]", "output_predictions = output.argmax(0)"], "performance": {"dataset": "COCO val2017", "accuracy": {"fcn_resnet50": {"Mean IOU": 
60.5, "Global Pixelwise Accuracy": 91.4}}}, "description": "FCN-ResNet is a Fully-Convolutional Network model using a ResNet-50 or a ResNet-101 backbone. The pre-trained models have been trained on a subset of COCO train2017, on the 20 categories that are present in the Pascal VOC dataset."}}
{"code": "###Instruction: Identify an API that can help me classify various objects in a given image efficiently and quickly.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('PingoLH/Pytorch-HarDNet', 'hardnet85', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained HarDNet-85 model from PyTorch Hub, which is a fast and efficient image classification model optimized for object classification in images.', 'code': 'import torch\nmodel = torch.hub.load('PingoLH/Pytorch-HarDNet', 'hardnet85', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='PingoLH/Pytorch-HarDNet', model='hardnet85', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "HarDNet", "api_call": "torch.hub.load(repo_or_dir='PingoLH/Pytorch-HarDNet', model='hardnet85', pretrained=True)", "api_arguments": [{"name": "hardnet85", "type": "str", "description": "HarDNet-85 model"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('PingoLH/Pytorch-HarDNet', 'hardnet85', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"hardnet85": {"Top-1 error": 21.96, "Top-5 error": 6.11}}}, "description": "Harmonic DenseNet (HarDNet) is a low memory traffic CNN model, which is fast and efficient. The basic concept is to minimize both computational cost and memory access cost at the same time, such that the HarDNet models are 35% faster than ResNet running on GPU comparing to models with the same accuracy (except the two DS models that were designed for comparing with MobileNet)."}}
{"code": "###Instruction: Find an API that allows me to classify pictures of animals with high accuracy.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('XingangPan/IBN-Net', 'resnext101_ibn_a', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ResNeXt-101-IBN-a model from PyTorch Hub for image classification. This model can be fine-tuned for classifying pictures of animals with high accuracy.', 'code': 'import torch\nmodel = torch.hub.load('XingangPan/IBN-Net', 'resnext101_ibn_a', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='XingangPan/IBN-Net', model='resnext101_ibn_a', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "IBN-Net", "api_name": "torch.hub.load", "api_call": "torch.hub.load(repo_or_dir='XingangPan/IBN-Net', model='resnext101_ibn_a', pretrained=True)", "api_arguments": [{"name": "resnext101_ibn_a", "type": "str", "description": "ResNeXt-101-IBN-a model"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('XingangPan/IBN-Net', 'resnext101_ibn_a', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"resnext101_ibn_a": {"Top-1 acc": 79.12, "Top-5 acc": 94.58}}}, "description": "IBN-Net is a CNN model with domain/appearance invariance. Motivated by style transfer works, IBN-Net carefully unifies instance normalization and batch normalization in a single deep network. It provides a simple way to increase both modeling and generalization capacities without adding model complexity. IBN-Net is especially suitable for cross domain or person/vehicle re-identification tasks."}}
{"code": "###Instruction: An AI engineer is searching for an API capable of image classification. Please provide an SDK that uses a pre-trained model for image recognition tasks.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'resnet152', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ResNet152 model from PyTorch Hub for image classification tasks. This model is trained on ImageNet and can be used for various image recognition tasks.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'resnet152', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet152', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Deep Residual Networks", "api_name": "ResNet", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet152', pretrained=True)", "api_arguments": [{"name": "pretrained", "type": "bool", "default": "False", "description": "If True, returns a model pre-trained on ImageNet"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet152', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"model": "resnet152", "top-1_error": 21.69, "top-5_error": 5.94}}, "description": "ResNet models are deep residual networks pre-trained on ImageNet. They were proposed in the paper 'Deep Residual Learning for Image Recognition'. Available model variants include ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152."}}
{"code": "###Instruction: Tell me an API that can predict the breed of a dog through its image.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'squeezenet1_1', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained SqueezeNet model from PyTorch Hub, which can be fine-tuned for predicting the breed of a dog from its image.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'squeezenet1_1', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='squeezenet1_1', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "SqueezeNet", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='squeezenet1_1', pretrained=True)", "api_arguments": {"version": "v0.10.0", "model": ["squeezenet1_1"], "pretrained": "True"}, "python_environment_requirements": {"torch": ">=1.9.0", "torchvision": ">=0.10.0"}, "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'squeezenet1_1', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "preprocess = transforms.Compose([", " transforms.Resize(256),", " transforms.CenterCrop(224),", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)", "probabilities = torch.nn.functional.softmax(output[0], dim=0)", "print(probabilities)"], "performance": {"dataset": "ImageNet", "accuracy": {"squeezenet1_1": {"Top-1 error": 41.81, "Top-5 error": 19.38}}}, "description": "SqueezeNet is an image classification model that achieves 
AlexNet-level accuracy with 50x fewer parameters. It has two versions: squeezenet1_0 and squeezenet1_1, with squeezenet1_1 having 2.4x less computation and slightly fewer parameters than squeezenet1_0, without sacrificing accuracy."}}
{"code": "###Instruction: A wildlife researcher wants to identify different bird species from a picture. Suggest a deep learning model that can help them achieve this.\n###Output: {'domain': 'Image Recognition', 'api_call': \"model = torch.hub.load('pytorch/vision', 'vgg11_bn', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the VGG11_bn model with batch normalization from PyTorch Hub, which can be fine-tuned for bird species identification.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'vgg11_bn', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg11_bn', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Recognition", "api_name": "vgg-nets", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg11_bn', pretrained=True)", "api_arguments": [{"name": "vgg11_bn", "type": "str", "description": "VGG11 model with batch normalization"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'vgg11_bn', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"vgg11_bn": {"Top-1 error": 26.7, "Top-5 error": 8.58}}}, "description": "vgg-nets are award-winning ConvNets from the 2014 Imagenet ILSVRC challenge. They are used for large-scale image recognition tasks. The available models are vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, and vgg19_bn."}}
{"code": "###Instruction: What type of model is best for recognizing multiple objects in images? \n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'vgg11', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'VGG11, part of VGG-Nets, is a ConvNet model optimized for large-scale image recognition tasks and capable of recognizing multiple objects in images.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'vgg11', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg11', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Recognition", "api_name": "vgg-nets", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg11', pretrained=True)", "api_arguments": [{"name": "vgg11", "type": "str", "description": "VGG11 model"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'vgg11', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"vgg11": {"Top-1 error": 30.98, "Top-5 error": 11.37}}}, "description": "vgg-nets are award-winning ConvNets from the 2014 Imagenet ILSVRC challenge. They are used for large-scale image recognition tasks. The available models are vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, and vgg19_bn."}}
{"code": "###Instruction: Find the species of an animal in a given photo using an API.\n###Input: \"zebra.jpg\"\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'wide_resnet50_2', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained wide_resnet50_2 model from PyTorch to classify the species of the animal in the given photo.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'wide_resnet50_2', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='wide_resnet50_2', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Wide Residual Networks", "api_name": "wide_resnet50_2", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='wide_resnet50_2', pretrained=True)", "api_arguments": "pretrained", "python_environment_requirements": "torch, torchvision", "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'wide_resnet50_2', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"wide_resnet50_2": {"Top-1 error": 21.49, "Top-5 error": 5.91}}}, "description": "Wide Residual networks simply have increased number of channels compared to ResNet. Otherwise the architecture is the same. Deeper ImageNet models with bottleneck block have increased number of channels in the inner 3x3 convolution. The wide_resnet50_2 and wide_resnet101_2 models were trained in FP16 with mixed precision training using SGD with warm restarts. Checkpoints have weights in half precision (except batch norm) for smaller size, and can be used in FP32 models too."}}
{"code": "###Instruction: I need to classify images on different edge devices with various resource constraints. Suggest an API suitable for this task.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('mit-han-lab/once-for-all', 'ofa_supernet_mbv3_w10', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Use the pretrained Once-for-All (OFA) network from PyTorch Hub which is designed to handle efficient inference across various edge devices and resource constraints.', 'code': 'import torch\nmodel = torch.hub.load('mit-han-lab/once-for-all', 'ofa_supernet_mbv3_w10', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/once-for-all', model='ofa_supernet_mbv3_w10', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Once-for-all (OFA) Networks", "api_name": "torch.hub.load", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/once-for-all', model='ofa_supernet_mbv3_w10', pretrained=True)", "api_arguments": {"repository": "mit-han-lab/once-for-all", "model": "ofa_supernet_mbv3_w10", "pretrained": "True"}, "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "super_net_name = 'ofa_supernet_mbv3_w10'", "super_net = torch.hub.load('mit-han-lab/once-for-all', super_net_name, pretrained=True).eval()"], "performance": {"description": "OFA networks outperform state-of-the-art NAS methods (up to 4.0% ImageNet top1 accuracy improvement over MobileNetV3, or same accuracy but 1.5x faster than MobileNetV3, 2.6x faster than EfficientNet w.r.t measured latency) while reducing many orders of magnitude GPU hours and CO2 emission."}, "description": "Once-for-all (OFA) networks are a family of neural networks designed by MIT Han Lab. They decouple training and search, achieving efficient inference across various edge devices and resource constraints. 
OFA networks are pretrained on the IMAGENET dataset and are capable of classifying images into different categories."}}
{"code": "###Instruction: Provide an API for converting text to speech, since the marketing team needs to generate realistic voices for a series of advertisements.\n###Output: {'domain': 'Text-to-Speech', 'api_call': \"model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tacotron2', model_math='fp16')\", 'api_provider': 'PyTorch', 'explanation': 'Load the Tacotron 2 model from NVIDIA via PyTorch Hub for text-to-speech conversion to generate natural-sounding speech from text.', 'code': 'import torch\ntacotron2 = torch.hub.load(\"NVIDIA/DeepLearningExamples:torchhub\", \"nvidia_tacotron2\", model_math=\"fp16\")'}\n", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_tacotron2', model_math='fp16')", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Text-to-Speech", "api_name": "Tacotron 2", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_tacotron2', model_math='fp16')", "api_arguments": {"model_math": "fp16"}, "python_environment_requirements": ["numpy", "scipy", "librosa", "unidecode", "inflect", "libsndfile1"], "example_code": ["import torch", "tacotron2 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tacotron2', model_math='fp16')", "tacotron2 = tacotron2.to('cuda')", "tacotron2.eval()", "text = 'Hello world, I missed you so much.'", "utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tts_utils')", "sequences, lengths = utils.prepare_input_sequence([text])", "with torch.no_grad():", " mel, _, _ = tacotron2.infer(sequences, lengths)", " audio = waveglow.infer(mel)", "audio_numpy = audio[0].data.cpu().numpy()", "rate = 22050"], "performance": {"dataset": "LJ Speech", "accuracy": "Not specified"}, "description": "The Tacotron 2 model generates mel spectrograms from input text using an encoder-decoder architecture, and it is designed for generating natural-sounding speech 
from raw transcripts without any additional prosody information. This implementation uses Dropout instead of Zoneout to regularize the LSTM layers. The WaveGlow model (also available via torch.hub) is a flow-based model that consumes the mel spectrograms to generate speech."}}
{"code": "###Instruction: I need an API that helps classify images with the highest accuracy. Tell me an API that can achieve this.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', model='mealv2_mobilenetv3_small_075', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'MEAL V2 is a state-of-the-art image classification model trained using ensemble knowledge distillation. Load the pretrained MEAL V2 model with MobileNet V3-Small 0.75 architecture from PyTorch Hub for high accuracy image classification.', 'code': 'import torch\nmodel = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', model='mealv2_mobilenetv3_small_075', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='szq0214/MEAL-V2', 'meal_v2', model='mealv2_mobilenetv3_small_075', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "MEAL_V2", "api_call": "torch.hub.load(repo_or_dir='szq0214/MEAL-V2', 'meal_v2', model='mealv2_mobilenetv3_small_075', pretrained=True)", "api_arguments": {"model_name": "mealv2_mobilenetv3_small_075"}, "python_environment_requirements": "!pip install timm", "example_code": "import torch\nfrom PIL import Image\nfrom torchvision import transforms\n\nmodel = torch.hub.load('szq0214/MEAL-V2','meal_v2', 'mealv2_resnest50_cutmix', pretrained=True)\nmodel.eval()\n\ninput_image = Image.open('dog.jpg')\npreprocess = transforms.Compose([\n transforms.Resize(256),\n transforms.CenterCrop(224),\n transforms.ToTensor(),\n transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n])\ninput_tensor = preprocess(input_image)\ninput_batch = input_tensor.unsqueeze(0)\n\nif torch.cuda.is_available():\n input_batch = input_batch.to('cuda')\n model.to('cuda')\n\nwith torch.no_grad():\n output = model(input_batch)\nprobabilities = torch.nn.functional.softmax(output[0], 
dim=0)\nprint(probabilities)", "performance": [{"dataset": "ImageNet", "accuracy": {"model": "MEAL-V2 w/ MobileNet V3-Small 0.75", "resolution": "224", "parameters": "2.04M", "top1": "67.60", "top5": "87.23"}}], "description": "MEAL V2 models are from the MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks paper. The method is based on ensemble knowledge distillation via discriminators, and it achieves state-of-the-art results without using common tricks such as architecture modification, outside training data, autoaug/randaug, cosine learning rate, mixup/cutmix training, or label smoothing."}}
{"code": "###Instruction: Pinterest wants to build a system that can categorize images uploaded by users. What API should they use for this task?\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('szq0214/MEAL-V2','meal_v2', 'mealv2_efficientnet_b0', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained MEAL V2 model for image classification from PyTorch Hub to help categorize images uploaded by users.', 'code': \"import torch\\nimport torchvision.transforms as transforms\\nfrom PIL import Image\\n\\nmodel = torch.hub.load('szq0214/MEAL-V2','meal_v2', 'mealv2_efficientnet_b0', pretrained=True)\\nmodel.eval()\\n\\ninput_image = Image.open('input.jpg')\\npreprocess = transforms.Compose([transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(), transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),])\\ninput_tensor = preprocess(input_image)\\ninput_batch = input_tensor.unsqueeze(0)\\n\\nif torch.cuda.is_available():\\n input_batch = input_batch.to('cuda')\\n model.to('cuda')\\n\\nwith torch.no_grad():\\n output = model(input_batch)\\nprobabilities = torch.nn.functional.softmax(output[0], dim=0)\"}", "api_call": "torch.hub.load(repo_or_dir='szq0214/MEAL-V2', 'meal_v2', model='mealv2_efficientnet_b0', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "MEAL_V2", "api_call": "torch.hub.load(repo_or_dir='szq0214/MEAL-V2', 'meal_v2', model='mealv2_efficientnet_b0', pretrained=True)", "api_arguments": {"model_name": "mealv2_efficientnet_b0"}, "python_environment_requirements": "!pip install timm", "example_code": "import torch\nfrom PIL import Image\nfrom torchvision import transforms\n\nmodel = torch.hub.load('szq0214/MEAL-V2','meal_v2', 'mealv2_resnest50_cutmix', pretrained=True)\nmodel.eval()\n\ninput_image = Image.open('dog.jpg')\npreprocess = 
transforms.Compose([\n transforms.Resize(256),\n transforms.CenterCrop(224),\n transforms.ToTensor(),\n transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n])\ninput_tensor = preprocess(input_image)\ninput_batch = input_tensor.unsqueeze(0)\n\nif torch.cuda.is_available():\n input_batch = input_batch.to('cuda')\n model.to('cuda')\n\nwith torch.no_grad():\n output = model(input_batch)\nprobabilities = torch.nn.functional.softmax(output[0], dim=0)\nprint(probabilities)", "performance": [{"dataset": "ImageNet", "accuracy": {"model": "MEAL-V2 w/ EfficientNet-B0", "resolution": "224", "parameters": "5.29M", "top1": "78.29", "top5": "93.95"}}], "description": "MEAL V2 models are from the MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks paper. The method is based on ensemble knowledge distillation via discriminators, and it achieves state-of-the-art results without using common tricks such as architecture modification, outside training data, autoaug/randaug, cosine learning rate, mixup/cutmix training, or label smoothing."}}
{"code": "###Instruction: Recommend me an API that can compute a depth map from a single input image.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('intel-isl/MiDaS', 'DPT_Hybrid', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the MiDaS model for computing relative depth from a single image in PyTorch. It provides several different models for various use cases, ranging from high-speed to high-accuracy.', 'code': \"import torch\nmodel = torch.hub.load('intel-isl/MiDaS', 'DPT_Hybrid', pretrained=True)\"}", "api_call": "torch.hub.load(repo_or_dir='intel-isl/MiDaS', model='DPT_Hybrid', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Computing relative depth from a single image", "api_name": "MiDaS", "api_call": "torch.hub.load(repo_or_dir='intel-isl/MiDaS', model='DPT_Hybrid', pretrained=True)", "api_arguments": {"repo_or_dir": "intel-isl/MiDaS", "model": "model_type"}, "python_environment_requirements": "pip install timm", "example_code": ["import cv2", "import torch", "import urllib.request", "import matplotlib.pyplot as plt", "url, filename = ('https://github.com/pytorch/hub/raw/master/images/dog.jpg', 'dog.jpg')", "urllib.request.urlretrieve(url, filename)", "model_type = 'DPT_Large'", "midas = torch.hub.load('intel-isl/MiDaS', 'DPT_Hybrid')", "device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')", "midas.to(device)", "midas.eval()", "midas_transforms = torch.hub.load('intel-isl/MiDaS', 'transforms')", "if model_type == 'DPT_Large' or model_type == 'DPT_Hybrid':", " transform = midas_transforms.dpt_transform", "else:", " transform = midas_transforms.small_transform", "img = cv2.imread(filename)", "img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)", "input_batch = transform(img).to(device)", "with torch.no_grad():", " prediction = midas(input_batch)", "prediction = torch.nn.functional.interpolate(", " 
prediction.unsqueeze(1),", " size=img.shape[:2],", " mode='bicubic',", " align_corners=False,", ").squeeze()", "output = prediction.cpu().numpy()", "plt.imshow(output)", "plt.show()"], "performance": {"dataset": "10 distinct datasets", "accuracy": "Multi-objective optimization"}, "description": "MiDaS computes relative inverse depth from a single image. The repository provides multiple models that cover different use cases ranging from a small, high-speed model to a very large model that provides the highest accuracy. The models have been trained on 10 distinct datasets using multi-objective optimization to ensure high quality on a wide range of inputs."}}
{"code": "###Instruction: I am working on a project that involves bird image identification. Can you recommend an API that can classify bird species from images?\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('nicolalandro/ntsnet-cub200', 'ntsnet', pretrained=True, **{'topN': 6, 'device':'cpu', 'num_classes': 200})\", 'api_provider': 'PyTorch', 'explanation': 'Load the NTS-Net pretrained model for bird species classification from PyTorch Hub, which is specifically trained on the CUB200-2011 dataset.', 'code': 'import torch\nmodel = torch.hub.load('nicolalandro/ntsnet-cub200', 'ntsnet', pretrained=True, **{'topN': 6, 'device':'cpu', 'num_classes': 200})'}\n", "api_call": "torch.hub.load(repo_or_dir='nicolalandro/ntsnet-cub200', model='ntsnet', pretrained=True, **{'topN': 6, 'device':'cpu', 'num_classes': 200})", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Fine-grained image classifier", "api_name": "ntsnet", "api_call": "torch.hub.load(repo_or_dir='nicolalandro/ntsnet-cub200', model='ntsnet', pretrained=True, **{'topN': 6, 'device':'cpu', 'num_classes': 200})", "api_arguments": {"pretrained": "True", "topN": "6", "device": "cpu", "num_classes": "200"}, "python_environment_requirements": ["torch", "torchvision", "PIL"], "example_code": "from torchvision import transforms\nimport torch\nimport urllib\nfrom PIL import Image\n\ntransform_test = transforms.Compose([\n transforms.Resize((600, 600), Image.BILINEAR),\n transforms.CenterCrop((448, 448)),\n transforms.ToTensor(),\n transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),\n])\n\nmodel = torch.hub.load('nicolalandro/ntsnet-cub200', 'ntsnet', pretrained=True, **{'topN': 6, 'device':'cpu', 'num_classes': 200})\nmodel.eval()\n\nurl = 'https://raw.githubusercontent.com/nicolalandro/ntsnet-cub200/master/images/nts-net.png'\nimg = Image.open(urllib.request.urlopen(url))\nscaled_img = 
transform_test(img)\ntorch_images = scaled_img.unsqueeze(0)\n\nwith torch.no_grad():\n top_n_coordinates, concat_out, raw_logits, concat_logits, part_logits, top_n_index, top_n_prob = model(torch_images)\n\n_, predict = torch.max(concat_logits, 1)\npred_id = predict.item()\nprint('bird class:', model.bird_classes[pred_id])", "performance": {"dataset": "CUB200 2011", "accuracy": "Not provided"}, "description": "This is an NTS-Net model pretrained on the CUB200 2011 dataset, which is a fine-grained dataset of bird species."}}
{"code": "###Instruction: Suggest an object detection API that is suitable for implementing real-time applications like a security camera.\n###Output: {'domain': 'Object Detection', 'api_call': \"model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the YOLOv5 model from PyTorch Hub, which is well-suited for real-time object detection applications like security camera systems.', 'code': 'import torch\nmodel = torch.hub.load(\"ultralytics/yolov5\", \"yolov5s\", pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='ultralytics/yolov5', model='yolov5s', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Object Detection", "framework": "PyTorch", "functionality": "YOLOv5", "api_name": "torch.hub.load", "api_call": "torch.hub.load(repo_or_dir='ultralytics/yolov5', model='yolov5s', pretrained=True)", "api_arguments": ["'ultralytics/yolov5'", "'yolov5s'", "pretrained=True"], "python_environment_requirements": "Python>=3.8, PyTorch>=1.7", "example_code": ["import torch", "model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)", "imgs = ['https://ultralytics.com/images/zidane.jpg']", "results = model(imgs)", "results.print()", "results.save()", "results.xyxy[0]", "results.pandas().xyxy[0]"], "performance": {"dataset": "COCO", "accuracy": {"YOLOv5s6": {"mAPval0.5:0.95": 43.3, "mAPtest0.5:0.95": 43.3, "mAPval0.5": 61.9}, "YOLOv5m6": {"mAPval0.5:0.95": 50.5, "mAPtest0.5:0.95": 50.5, "mAPval0.5": 68.7}, "YOLOv5l6": {"mAPval0.5:0.95": 53.4, "mAPtest0.5:0.95": 53.4, "mAPval0.5": 71.1}, "YOLOv5x6": {"mAPval0.5:0.95": 54.4, "mAPtest0.5:0.95": 54.4, "mAPval0.5": 72.0}, "YOLOv5x6 TTA": {"mAPval0.5:0.95": 55.0, "mAPtest0.5:0.95": 55.0, "mAPval0.5": 72.0}}}, "description": "YOLOv5 is a family of compound-scaled object detection models trained on the COCO dataset, and includes simple functionality for Test Time Augmentation (TTA), model ensembling, hyperparameter evolution, and 
export to ONNX, CoreML and TFLite."}}
{"code": "###Instruction: A mobile application needs a machine learning model for object classification from various user images. Suggest an appropriate API for this task. \n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'densenet169', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained DenseNet-169 model for image classification from PyTorch Hub to classify objects in user images.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'densenet169', pretrained=True)\nmodel.eval()'}", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='densenet169', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Dense Convolutional Network", "api_name": "Densenet", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='densenet169', pretrained=True)", "api_arguments": [{"name": "densenet169", "type": "str", "description": "Densenet-169 model"}], "python_environment_requirements": {"torch": "latest", "torchvision": "latest"}, "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'densenet169', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"densenet169": {"Top-1 error": 24.0, "Top-5 error": 7.0}}}, "description": "Dense Convolutional Network (DenseNet) connects each layer to every other layer in a feed-forward fashion. It alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and substantially reduces the number of parameters."}}
{"code": "###Instruction: I have a dataset with labeled images of clothing items from several fashion brands, and I want to classify them by brand. Which API can help me perform a classification task?\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('XingangPan/IBN-Net', 'resnet101_ibn_a', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ResNet-101-IBN-a model from PyTorch Hub. This model is suitable for cross-domain image classification tasks, perfect for classifying clothing items from various fashion brands.', 'code': 'import torch\nmodel = torch.hub.load('XingangPan/IBN-Net', 'resnet101_ibn_a', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='XingangPan/IBN-Net', model='resnet101_ibn_a', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "IBN-Net", "api_name": "torch.hub.load", "api_call": "torch.hub.load(repo_or_dir='XingangPan/IBN-Net', model='resnet101_ibn_a', pretrained=True)", "api_arguments": [{"name": "resnet101_ibn_a", "type": "str", "description": "ResNet-101-IBN-a model"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('XingangPan/IBN-Net', 'resnet101_ibn_a', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"resnet101_ibn_a": {"Top-1 acc": 78.61, "Top-5 acc": 94.41}}}, "description": "IBN-Net is a CNN model with domain/appearance invariance. Motivated by style transfer works, IBN-Net carefully unifies instance normalization and batch normalization in a single deep network. It provides a simple way to increase both modeling and generalization capacities without adding model complexity. IBN-Net is especially suitable for cross domain or person/vehicle re-identification tasks."}}
{"code": "###Instruction: Retrieve an API capable of re-identifying vehicles across different cameras by using appearance invariance.\n###Output: {'domain': 'Vehicle Re-identification', 'api_call': \"model = torch.hub.load('XingangPan/IBN-Net', 'resnext101_ibn_a', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Use the pretrained IBN-Net (Instance-Batch Normalization Network) from PyTorch Hub to perform vehicle re-identification across different cameras using appearance invariance.', 'code': 'import torch\nmodel = torch.hub.load('XingangPan/IBN-Net', 'resnext101_ibn_a', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='XingangPan/IBN-Net', model='resnext101_ibn_a', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "IBN-Net", "api_name": "torch.hub.load", "api_call": "torch.hub.load(repo_or_dir='XingangPan/IBN-Net', model='resnext101_ibn_a', pretrained=True)", "api_arguments": [{"name": "resnext101_ibn_a", "type": "str", "description": "ResNeXt-101-IBN-a model"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('XingangPan/IBN-Net', 'resnext101_ibn_a', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"resnext101_ibn_a": {"Top-1 acc": 79.12, "Top-5 acc": 94.58}}}, "description": "IBN-Net is a CNN model with domain/appearance invariance. Motivated by style transfer works, IBN-Net carefully unifies instance normalization and batch normalization in a single deep network. It provides a simple way to increase both modeling and generalization capacities without adding model complexity. IBN-Net is especially suitable for cross domain or person/vehicle re-identification tasks."}}
{"code": "###Instruction: I want to classify some images using a state-of-the-art model. Can you provide me an API to help in this task?\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('zhanghang1989/ResNeSt', 'resnest50', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ResNeSt-50 model from PyTorch Hub, a state-of-the-art image classification model that can help classify images effectively.', 'code': 'import torch\nmodel = torch.hub.load('zhanghang1989/ResNeSt', 'resnest50', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='zhanghang1989/ResNeSt', model='resnest50', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "ResNeSt", "api_call": "torch.hub.load(repo_or_dir='zhanghang1989/ResNeSt', model='resnest50', pretrained=True)", "api_arguments": "resnest50", "python_environment_requirements": {"torch": "1.0.0", "torchvision": "0.2.2"}, "example_code": ["import torch", "model = torch.hub.load('zhanghang1989/ResNeSt', 'resnest50', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"ResNeSt-50": 81.03}}, "description": "ResNeSt models are from the ResNeSt: Split-Attention Networks paper. They are a new ResNet variant that enables attention across feature-map groups. By stacking Split-Attention blocks ResNet-style, ResNeSt models outperform other networks with similar model complexities, and also help downstream tasks including object detection, instance segmentation, and semantic segmentation."}}
{"code": "###Instruction: Show me an API that can efficiently classify images on mobile platforms.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('mit-han-lab/ProxylessNAS', 'proxylessnas_mobile', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the ProxylessNAS mobile optimized model with pretrained weights from PyTorch Hub for efficient image classification on mobile platforms.', 'code': 'import torch\nmodel = torch.hub.load('mit-han-lab/ProxylessNAS', 'proxylessnas_mobile', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/ProxylessNAS', model='proxylessnas_mobile', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "ProxylessNAS", "api_name": "mit-han-lab/ProxylessNAS", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/ProxylessNAS', model='proxylessnas_mobile', pretrained=True)", "api_arguments": [{"name": "proxylessnas_mobile", "type": "str", "description": "ProxylessNAS optimized for Mobile"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "target_platform = 'proxyless_mobile'", "model = torch.hub.load('mit-han-lab/ProxylessNAS', target_platform, pretrained=True)", "model.eval()"], "performance": {"dataset": [{"model_structure": "proxylessnas_mobile", "accuracy": 74.6}]}, "description": "ProxylessNAS models are from the ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware paper. They specialize CNN architectures for different hardware platforms, offering free yet significant performance boost on all three platforms (CPU, GPU, and Mobile) with similar accuracy."}}
{"code": "###Instruction: We are developing an app that can guess the type of a picture. We need it to work on most platforms with almost the same efficiency. Give me an API that can do it.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('mit-han-lab/ProxylessNAS', 'proxylessnas_gpu', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ProxylessNAS model from PyTorch Hub, which is optimized for various hardware platforms and can perform image classification with similar efficiency.', 'code': 'import torch\nmodel = torch.hub.load('mit-han-lab/ProxylessNAS', 'proxylessnas_gpu', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/ProxylessNAS', model='proxylessnas_gpu', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "ProxylessNAS", "api_name": "mit-han-lab/ProxylessNAS", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/ProxylessNAS', model='proxylessnas_gpu', pretrained=True)", "api_arguments": [{"name": "proxylessnas_gpu", "type": "str", "description": "ProxylessNAS optimized for GPU"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "target_platform = 'proxyless_gpu'", "model = torch.hub.load('mit-han-lab/ProxylessNAS', target_platform, pretrained=True)", "model.eval()"], "performance": {"dataset": [{"model_structure": "proxylessnas_gpu", "accuracy": 75.1}]}, "description": "ProxylessNAS models are from the ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware paper. They specialize CNN architectures for different hardware platforms, offering free yet significant performance boost on all three platforms (CPU, GPU, and Mobile) with similar accuracy."}}
{"code": "###Instruction: A company wants to develop a photo sharing app like Instagram. Recommend an API to recognize objects in the photos uploaded by users.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ResNet model for object recognition and classification from PyTorch Hub, which can be used to recognize objects in photos uploaded by users.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet18', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Deep Residual Networks", "api_name": "ResNet", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet18', pretrained=True)", "api_arguments": [{"name": "pretrained", "type": "bool", "default": "False", "description": "If True, returns a model pre-trained on ImageNet"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"model": "resnet18", "top-1_error": 30.24, "top-5_error": 10.92}}, "description": "ResNet models are deep residual networks pre-trained on ImageNet. They were proposed in the paper 'Deep Residual Learning for Image Recognition'. Available model variants include ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152."}}
{"code": "###Instruction: Google Photos wants to create a way to classify images uploaded by users into different categories. Recommend an API for this purpose.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load(repo_or_dir='pytorch/vision', model='resnet50', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ResNet50 model from PyTorch Hub to classify images into different categories.', 'code': 'import torch\nmodel = torch.hub.load(\"pytorch/vision\", \"resnet50\", pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet50', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Deep Residual Networks", "api_name": "ResNet", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet50', pretrained=True)", "api_arguments": [{"name": "pretrained", "type": "bool", "default": "False", "description": "If True, returns a model pre-trained on ImageNet"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"model": "resnet50", "top-1_error": 23.85, "top-5_error": 7.13}}, "description": "ResNet models are deep residual networks pre-trained on ImageNet. They were proposed in the paper 'Deep Residual Learning for Image Recognition'. Available model variants include ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152."}}
{"code": "###Instruction: Help me build a bird detection system. Recommend me an API that I can adapt for bird classification from photographs. \n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_b', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained SNNMLP model from PyTorch Hub, which can be fine-tuned for bird detection and classification from photographs.', 'code': 'import torch\nmodel = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_b', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='huawei-noah/Efficient-AI-Backbones', model='snnmlp_b', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "SNNMLP", "api_call": "torch.hub.load(repo_or_dir='huawei-noah/Efficient-AI-Backbones', model='snnmlp_b', pretrained=True)", "api_arguments": [{"name": "snnmlp_b", "type": "str", "description": "SNNMLP Base model"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_b', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "preprocess = transforms.Compose([", " transforms.Resize(256),", " transforms.CenterCrop(224),", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)", "print(torch.nn.functional.softmax(output[0], dim=0))"], "performance": {"dataset": "ImageNet", "accuracy": {"model": "SNNMLP Base", "top-1": 85.59}}, "description": 
"SNNMLP incorporates the mechanism of LIF neurons into the MLP models, to achieve better accuracy without extra FLOPs. We propose a full-precision LIF operation to communicate between patches, including horizontal LIF and vertical LIF in different directions. We also propose to use group LIF to extract better local features. With LIF modules, our SNNMLP model achieves 81.9%, 83.3% and 83.6% top-1 accuracy on ImageNet dataset with only 4.4G, 8.5G and 15.2G FLOPs, respectively."}}
{"code": "###Instruction: I have an image with animals in it; I need to know the species. Can you suggest an image recognition API that can identify the species of animals in the given image?\n###Output: {'domain': 'Image Recognition', 'api_call': 'model = torch.hub.load('pytorch/vision', 'vgg11_bn', pretrained=True)', 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained VGG11 model with batch normalization from PyTorch Hub, which can be used for large-scale image recognition tasks like identifying the species of animals in a given image.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'vgg11_bn', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg11_bn', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Recognition", "api_name": "vgg-nets", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg11_bn', pretrained=True)", "api_arguments": [{"name": "vgg11_bn", "type": "str", "description": "VGG11 model with batch normalization"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'vgg11_bn', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"vgg11_bn": {"Top-1 error": 26.7, "Top-5 error": 8.58}}}, "description": "vgg-nets are award-winning ConvNets from the 2014 Imagenet ILSVRC challenge. They are used for large-scale image recognition tasks. The available models are vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, and vgg19_bn."}}
{"code": "###Instruction: I want to create an AI tool that automates recognizing objects in an image. Recommend an API that can do this.\n###Output: {'domain': 'Image Recognition', 'api_call': \"model = torch.hub.load('pytorch/vision', 'vgg13_bn', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained VGG13 model with batch normalization from PyTorch Hub to automate the recognition of objects in an image.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'vgg13_bn', pretrained=True)\nmodel.eval()'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg13_bn', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Recognition", "api_name": "vgg-nets", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg13_bn', pretrained=True)", "api_arguments": [{"name": "vgg13_bn", "type": "str", "description": "VGG13 model with batch normalization"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'vgg13_bn', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"vgg13_bn": {"Top-1 error": 28.45, "Top-5 error": 9.63}}}, "description": "vgg-nets are award-winning ConvNets from the 2014 Imagenet ILSVRC challenge. They are used for large-scale image recognition tasks. The available models are vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, and vgg19_bn."}}
{"code": "###Instruction: Is there any API that can identify plants from an image I provide?\n###Output: {'domain': 'Plant Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'vgg16', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained VGG16 model from PyTorch Hub, which can be fine-tuned for plant identification using a dataset of plant images.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'vgg16', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg16', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Recognition", "api_name": "vgg-nets", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg16', pretrained=True)", "api_arguments": [{"name": "vgg16", "type": "str", "description": "VGG16 model"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'vgg16', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"vgg16": {"Top-1 error": 28.41, "Top-5 error": 9.62}}}, "description": "vgg-nets are award-winning ConvNets from the 2014 Imagenet ILSVRC challenge. They are used for large-scale image recognition tasks. The available models are vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, and vgg19_bn."}}
{"code": "###Instruction: A mobile app developer needs an image classification API that can be used on a range of mobile devices without the need to adjust the model size. Recommend an API that fits this purpose.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('mit-han-lab/once-for-all', 'ofa_supernet_mbv3_w10', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained Once-for-all (OFA) model for image classification from PyTorch Hub. The OFA network can efficiently adapt to different edge devices and resource constraints without adjusting the model size.', 'code': 'import torch\nmodel = torch.hub.load('mit-han-lab/once-for-all', 'ofa_supernet_mbv3_w10', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/once-for-all', model='ofa_supernet_mbv3_w10', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Once-for-all (OFA) Networks", "api_name": "torch.hub.load", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/once-for-all', model='ofa_supernet_mbv3_w10', pretrained=True)", "api_arguments": {"repository": "mit-han-lab/once-for-all", "model": "ofa_supernet_mbv3_w10", "pretrained": "True"}, "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "super_net_name = 'ofa_supernet_mbv3_w10'", "super_net = torch.hub.load('mit-han-lab/once-for-all', super_net_name, pretrained=True).eval()"], "performance": {"description": "OFA networks outperform state-of-the-art NAS methods (up to 4.0% ImageNet top1 accuracy improvement over MobileNetV3, or same accuracy but 1.5x faster than MobileNetV3, 2.6x faster than EfficientNet w.r.t measured latency) while reducing many orders of magnitude GPU hours and CO2 emission."}, "description": "Once-for-all (OFA) networks are a family of neural networks designed by MIT Han Lab. 
They decouple training and search, achieving efficient inference across various edge devices and resource constraints. OFA networks are pretrained on the IMAGENET dataset and are capable of classifying images into different categories."}}
{"code": "###Instruction: I'm building an image classification app to classify animals. Tell me an API that can classify an input image into a specific category.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_efficientnet_b0', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained MEAL V2 model with EfficientNet-B0 architecture from PyTorch Hub for image classification, which can help classify input images into specific categories, such as animals.', 'code': \"import torch\nmodel = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_efficientnet_b0', pretrained=True)\"}", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_efficientnet_b0', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "MEAL_V2", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_efficientnet_b0', pretrained=True)", "api_arguments": {"model_name": "mealv2_efficientnet_b0"}, "python_environment_requirements": "pip install timm", "example_code": "import torch\nfrom PIL import Image\nfrom torchvision import transforms\n\nmodel = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_efficientnet_b0', pretrained=True)\nmodel.eval()\n\ninput_image = Image.open('dog.jpg')\npreprocess = transforms.Compose([\n transforms.Resize(256),\n transforms.CenterCrop(224),\n transforms.ToTensor(),\n transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n])\ninput_tensor = preprocess(input_image)\ninput_batch = input_tensor.unsqueeze(0)\n\nif torch.cuda.is_available():\n input_batch = input_batch.to('cuda')\n model.to('cuda')\n\nwith torch.no_grad():\n output = model(input_batch)\nprobabilities = torch.nn.functional.softmax(output[0], dim=0)\nprint(probabilities)", "performance": [{"dataset": "ImageNet", 
"accuracy": {"model": "MEAL-V2 w/ EfficientNet-B0", "resolution": "224", "parameters": "5.29M", "top1": "78.29", "top5": "93.95"}}], "description": "MEAL V2 models are from the MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks paper. The method is based on ensemble knowledge distillation via discriminators, and it achieves state-of-the-art results without using common tricks such as architecture modification, outside training data, autoaug/randaug, cosine learning rate, mixup/cutmix training, or label smoothing."}}
{"code": "###Instruction: I want to create a 3D visualization of a room using only a single image. How can I estimate the depth of the objects in the room from that image?\n###Output: {'domain': 'Depth Estimation', 'api_call': \"model = torch.hub.load('intel-isl/MiDaS', 'MiDaS_small', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained MiDaS model from PyTorch Hub, which is designed to estimate relative inverse depth from a single image. Use the MiDaS_small model for a more efficient estimation.', 'code': 'import torch\nmodel = torch.hub.load(\"intel-isl/MiDaS\", \"MiDaS_small\", pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='intel-isl/MiDaS', model='MiDaS_small', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Computing relative depth from a single image", "api_name": "MiDaS", "api_call": "torch.hub.load(repo_or_dir='intel-isl/MiDaS', model='MiDaS_small', pretrained=True)", "api_arguments": {"repo_or_dir": "intel-isl/MiDaS", "model": "model_type"}, "python_environment_requirements": "pip install timm", "example_code": ["import cv2", "import torch", "import urllib.request", "import matplotlib.pyplot as plt", "url, filename = ('https://github.com/pytorch/hub/raw/master/images/dog.jpg', 'dog.jpg')", "urllib.request.urlretrieve(url, filename)", "model_type = 'MiDaS_small'", "midas = torch.hub.load('intel-isl/MiDaS', model_type)", "device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')", "midas.to(device)", "midas.eval()", "midas_transforms = torch.hub.load('intel-isl/MiDaS', 'transforms')", "if model_type == 'DPT_Large' or model_type == 'DPT_Hybrid':", " transform = midas_transforms.dpt_transform", "else:", " transform = midas_transforms.small_transform", "img = cv2.imread(filename)", "img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)", "input_batch = transform(img).to(device)", "with torch.no_grad():", " prediction = 
midas(input_batch)", "prediction = torch.nn.functional.interpolate(", " prediction.unsqueeze(1),", " size=img.shape[:2],", " mode='bicubic',", " align_corners=False,", ").squeeze()", "output = prediction.cpu().numpy()", "plt.imshow(output)", "plt.show()"], "performance": {"dataset": "10 distinct datasets", "accuracy": "Multi-objective optimization"}, "description": "MiDaS computes relative inverse depth from a single image. The repository provides multiple models that cover different use cases, ranging from a small, high-speed model to a very large model that provides the highest accuracy. The models have been trained on 10 distinct datasets using multi-objective optimization to ensure high quality on a wide range of inputs."}}
{"code": "###Instruction: Give me an API that can predict the category of an object given its image.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x32d_wsl', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ResNext WSL model for image classification from PyTorch Hub, which can categorize objects given their images.', 'code': 'import torch\nmodel = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x32d_wsl', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='facebookresearch/WSL-Images', model='resnext101_32x32d_wsl', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "ResNext WSL", "api_call": "torch.hub.load(repo_or_dir='facebookresearch/WSL-Images', model='resnext101_32x32d_wsl', pretrained=True)", "api_arguments": [{"name": "resnext101_32x32d_wsl", "type": "str", "description": "ResNeXt-101 32x32d WSL model"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x32d_wsl')", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "preprocess = transforms.Compose([", " transforms.Resize(256),", " transforms.CenterCrop(224),", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)", "print(output[0])", "print(torch.nn.functional.softmax(output[0], dim=0))"], "performance": {"dataset": "ImageNet", "accuracy": {"ResNeXt-101 32x32d": {"Top-1 Acc.": "85.1", "Top-5 
Acc.": "97.5"}}}, "description": "The provided ResNeXt models are pre-trained in a weakly-supervised fashion on 940 million public images with 1.5K hashtags matching 1000 ImageNet1K synsets, followed by fine-tuning on the ImageNet1K dataset. The models significantly improve the training accuracy on ImageNet compared to training from scratch. They achieve state-of-the-art accuracy of 85.4% on ImageNet with the ResNext-101 32x48d model."}}
{"code": "###Instruction: Can you provide a GAN API that can generate high-quality 64x64 images for an apparel ecommerce company?\n###Output: {'domain': 'Image Generation', 'api_call': \"model = torch.hub.load('facebookresearch/pytorch_GAN_zoo:hub', 'DCGAN', pretrained=True, useGPU=use_gpu)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained DCGAN model from PyTorch Hub for generating high-quality 64x64 images, suitable for an apparel ecommerce company.', 'code': 'import torch\nuse_gpu = True if torch.cuda.is_available() else False\nmodel = torch.hub.load('facebookresearch/pytorch_GAN_zoo:hub', 'DCGAN', pretrained=True, useGPU=use_gpu)'}", "api_call": "torch.hub.load(repo_or_dir='facebookresearch/pytorch_GAN_zoo:hub', model='DCGAN', pretrained=True, useGPU=use_gpu)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Generative Adversarial Networks", "api_name": "DCGAN", "api_call": "torch.hub.load(repo_or_dir='facebookresearch/pytorch_GAN_zoo:hub', model='DCGAN', pretrained=True, useGPU=use_gpu)", "api_arguments": {"pretrained": "True", "useGPU": "use_gpu"}, "python_environment_requirements": "Python 3", "example_code": {"import": ["import torch", "import matplotlib.pyplot as plt", "import torchvision"], "use_gpu": "use_gpu = True if torch.cuda.is_available() else False", "load_model": "model = torch.hub.load('facebookresearch/pytorch_GAN_zoo:hub', 'DCGAN', pretrained=True, useGPU=use_gpu)", "build_noise_data": "noise, _ = model.buildNoiseData(num_images)", "generate_images": "with torch.no_grad(): generated_images = model.test(noise)", "plot_images": ["plt.imshow(torchvision.utils.make_grid(generated_images).permute(1, 2, 0).cpu().numpy())", "plt.show()"]}, "performance": {"dataset": "FashionGen", "accuracy": "N/A"}, "description": "DCGAN is a model designed in 2015 by Radford et. al. in the paper Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. 
It is a simple and efficient GAN architecture for low-resolution image generation (up to 64x64)."}}
{"code": "###Instruction: I am a city planner responsible for managing different areas of the city. Recommend an API that can segment roads, parks and buildings from a satellite image.\n###Output: {'domain': 'Semantic Segmentation', 'api_call': \"model = torch.hub.load('pytorch/vision', 'fcn_resnet101', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained Fully-Convolutional Network (FCN) ResNet-101 model from PyTorch Hub, which can semantically segment different areas within a satellite image such as roads, parks, and buildings.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'fcn_resnet101', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='fcn_resnet101', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Semantic Segmentation", "framework": "PyTorch", "functionality": "Fully-Convolutional Network", "api_name": "fcn_resnet101", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='fcn_resnet101', pretrained=True)", "api_arguments": [{"name": "pretrained", "type": "boolean", "description": "If True, returns a model pre-trained on COCO train2017"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "matplotlib"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'fcn_resnet101', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "input_image = input_image.convert(\"RGB\")", "preprocess = transforms.Compose([", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)['out'][0]", "output_predictions = output.argmax(0)"], "performance": 
{"dataset": "COCO val2017", "accuracy": {"fcn_resnet101": {"Mean IOU": 63.7, "Global Pixelwise Accuracy": 91.9}}}, "description": "FCN-ResNet is a Fully-Convolutional Network model using a ResNet-50 or a ResNet-101 backbone. The pre-trained models have been trained on a subset of COCO train2017, on the 20 categories that are present in the Pascal VOC dataset."}}
{"code": "###Instruction: Recommend an API that can be used for bird species recognition using pictures taken by a wildlife photographer.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load(repo_or_dir='zhanghang1989/ResNeSt', model='resnest101', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ResNeSt-101 model from PyTorch Hub, which can be fine-tuned for bird species recognition using images taken by a wildlife photographer.', 'code': 'import torch\nmodel = torch.hub.load('zhanghang1989/ResNeSt', 'resnest101', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='zhanghang1989/ResNeSt', model='resnest101', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "ResNeSt", "api_call": "torch.hub.load(repo_or_dir='zhanghang1989/ResNeSt', model='resnest101', pretrained=True)", "api_arguments": "resnest101", "python_environment_requirements": {"torch": "1.0.0", "torchvision": "0.2.2"}, "example_code": ["import torch", "model = torch.hub.load('zhanghang1989/ResNeSt', 'resnest101', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"ResNeSt-101": 82.83}}, "description": "ResNeSt models are from the ResNeSt: Split-Attention Networks paper. They are a new ResNet variant that enables attention across feature-map groups. By stacking Split-Attention blocks ResNet-style, ResNeSt models outperform other networks with similar model complexities, and also help downstream tasks including object detection, instance segmentation, and semantic segmentation."}}
{"code": "###Instruction: I am starting a startup that recommends clothing to users based on images of their outfits. What is a good API for this?\n###Output: {'domain': 'Image Classification', 'api_call': 'model = torch.hub.load('pytorch/vision', 'resnext101_32x4d', pretrained=True)', 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ResNext model from PyTorch Hub which can be fine-tuned for clothing recommendations based on outfit images.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'resnext101_32x4d', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnext101_32x4d', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "ResNext", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnext101_32x4d', pretrained=True)", "api_arguments": {"pretrained": "True"}, "python_environment_requirements": ["torch", "torchvision", "PIL"], "example_code": ["import torch", "from PIL import Image", "from torchvision import transforms", "model = torch.hub.load('pytorch/vision:v0.10.0', 'resnext101_32x4d', pretrained=True)", "model.eval()", "input_image = Image.open('dog.jpg')", "preprocess = transforms.Compose([", " transforms.Resize(256),", " transforms.CenterCrop(224),", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)", "probabilities = torch.nn.functional.softmax(output[0], dim=0)"], "performance": {"dataset": "ImageNet", "resnext101_32x8d": {"top-1": 20.69, "top-5": 5.47}}, "description": "ResNext is a next-generation ResNet architecture for image classification. 
It is more efficient and accurate than the original ResNet. This implementation includes two versions of the model, resnext50_32x4d and resnext101_32x8d, with 50 and 101 layers respectively."}}
{"code": "###Instruction: Generate an API that performs image classification using a small model with low computational requirements.\n###Output: {'domain': 'Image Classification', 'api_call': \"torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_s', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained SNNMLP Small model from PyTorch Hub to perform image classification with low computational requirements.', 'code': 'import torch\nmodel = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_s', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='huawei-noah/Efficient-AI-Backbones', model='snnmlp_s', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "SNNMLP", "api_call": "torch.hub.load(repo_or_dir='huawei-noah/Efficient-AI-Backbones', model='snnmlp_s', pretrained=True)", "api_arguments": [{"name": "snnmlp_s", "type": "str", "description": "SNNMLP Small model"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_s', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "preprocess = transforms.Compose([", " transforms.Resize(256),", " transforms.CenterCrop(224),", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)", "print(torch.nn.functional.softmax(output[0], dim=0))"], "performance": {"dataset": "ImageNet", "accuracy": {"model": "SNNMLP Small", "top-1": 83.3}}, "description": "SNNMLP incorporates 
the mechanism of LIF neurons into the MLP models, to achieve better accuracy without extra FLOPs. We propose a full-precision LIF operation to communicate between patches, including horizontal LIF and vertical LIF in different directions. We also propose to use group LIF to extract better local features. With LIF modules, our SNNMLP model achieves 81.9%, 83.3% and 83.6% top-1 accuracy on ImageNet dataset with only 4.4G, 8.5G and 15.2G FLOPs, respectively."}}
{"code": "###Instruction: I need an efficient AI-based classifier to identify products on grocery store shelves. Suggest an appropriate API to implement this.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_b', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained SNNMLP model from PyTorch Hub, which is an efficient AI-based classifier for image recognition and can be fine-tuned to classify products on grocery store shelves.', 'code': 'import torch\nmodel = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_b', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='huawei-noah/Efficient-AI-Backbones', model='snnmlp_b', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "SNNMLP", "api_call": "torch.hub.load(repo_or_dir='huawei-noah/Efficient-AI-Backbones', model='snnmlp_b', pretrained=True)", "api_arguments": [{"name": "snnmlp_b", "type": "str", "description": "SNNMLP Base model"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_b', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "preprocess = transforms.Compose([", " transforms.Resize(256),", " transforms.CenterCrop(224),", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)", "print(torch.nn.functional.softmax(output[0], dim=0))"], "performance": {"dataset": "ImageNet", 
"accuracy": {"model": "SNNMLP Base", "top-1": 85.59}}, "description": "SNNMLP incorporates the mechanism of LIF neurons into the MLP models, to achieve better accuracy without extra FLOPs. We propose a full-precision LIF operation to communicate between patches, including horizontal LIF and vertical LIF in different directions. We also propose to use group LIF to extract better local features. With LIF modules, our SNNMLP model achieves 81.9%, 83.3% and 83.6% top-1 accuracy on ImageNet dataset with only 4.4G, 8.5G and 15.2G FLOPs, respectively."}}
{"code": "###Instruction: I want to perform image classification for optimizing the storage space of a database. Provide an API that enables this while maintaining accuracy.\n###Output: {'domain': 'Image Classification', 'api_call': 'model = torch.hub.load('pytorch/vision', 'squeezenet1_1', pretrained=True)', 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained SqueezeNet model from PyTorch Hub for efficient image classification. SqueezeNet provides high accuracy with significantly fewer parameters, making it suitable for optimizing storage space.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'squeezenet1_1', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='squeezenet1_1', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "SqueezeNet", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='squeezenet1_1', pretrained=True)", "api_arguments": {"version": "v0.10.0", "model": ["squeezenet1_1"], "pretrained": "True"}, "python_environment_requirements": {"torch": ">=1.9.0", "torchvision": ">=0.10.0"}, "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'squeezenet1_1', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "preprocess = transforms.Compose([", " transforms.Resize(256),", " transforms.CenterCrop(224),", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)", "probabilities = torch.nn.functional.softmax(output[0], dim=0)", "print(probabilities)"], "performance": 
{"dataset": "ImageNet", "accuracy": {"squeezenet1_1": {"Top-1 error": 41.81, "Top-5 error": 19.38}}}, "description": "SqueezeNet is an image classification model that achieves AlexNet-level accuracy with 50x fewer parameters. It has two versions: squeezenet1_0 and squeezenet1_1, with squeezenet1_1 having 2.4x less computation and slightly fewer parameters than squeezenet1_0, without sacrificing accuracy."}}
{"code": "###Instruction: I am a content writer for Marvel Studios and I am trying to categorize certain images of the characters based on their similarity. Recommend an API that can classify an image of a Marvel character.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'shufflenet_v2_x1_0', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ShuffleNet v2 model from PyTorch Hub for image classification. This model can be fine-tuned for Marvel character classification.', 'code': 'import torch\nmodel = torch.hub.load(\"pytorch/vision\", \"shufflenet_v2_x1_0\", pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='shufflenet_v2_x1_0', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "ShuffleNet v2", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='shufflenet_v2_x1_0', pretrained=True)", "api_arguments": {"pretrained": "True"}, "python_environment_requirements": {"torch": "torch", "torchvision": "torchvision", "PIL": "Image", "urllib": "urllib"}, "example_code": {"import_libraries": ["import torch", "from PIL import Image", "from torchvision import transforms", "import urllib"], "load_model": ["model = torch.hub.load('pytorch/vision:v0.10.0', 'shufflenet_v2_x1_0', pretrained=True)", "model.eval()"], "load_image": ["url, filename = ('https://github.com/pytorch/hub/raw/master/images/dog.jpg', 'dog.jpg')", "try: urllib.URLopener().retrieve(url, filename)", "except: urllib.request.urlretrieve(url, filename)", "input_image = Image.open(filename)"], "preprocess_image": ["preprocess = transforms.Compose([", " transforms.Resize(256),", " transforms.CenterCrop(224),", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = 
input_tensor.unsqueeze(0)"], "run_inference": ["if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)"], "get_probabilities": ["probabilities = torch.nn.functional.softmax(output[0], dim=0)"], "top_categories": ["top5_prob, top5_catid = torch.topk(probabilities, 5)", "for i in range(top5_prob.size(0)):", " print(categories[top5_catid[i]], top5_prob[i].item())"]}, "performance": {"dataset": "Imagenet", "accuracy": {"top-1_error": 30.64, "top-5_error": 11.68}}, "description": "ShuffleNet V2 is an efficient ConvNet optimized for speed and memory, pre-trained on Imagenet. It is designed based on practical guidelines for efficient network design, including speed and accuracy tradeoff."}}
{"code": "###Instruction: A digital artist needs an API that can recognize and classify images containing multiple objects. Which API would you suggest?\n###Output: {'domain': 'Image Recognition', 'api_call': \"model = torch.hub.load('pytorch/vision', 'vgg19_bn', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained VGG19 model with batch normalization from PyTorch Hub for image recognition and classification tasks involving multiple objects.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'vgg19_bn', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg19_bn', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Recognition", "api_name": "vgg-nets", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg19_bn', pretrained=True)", "api_arguments": [{"name": "vgg19_bn", "type": "str", "description": "VGG19 model with batch normalization"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'vgg19_bn', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"vgg19_bn": {"Top-1 error": 25.76, "Top-5 error": 8.15}}}, "description": "vgg-nets are award-winning ConvNets from the 2014 Imagenet ILSVRC challenge. They are used for large-scale image recognition tasks. The available models are vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, and vgg19_bn."}}
{"code": "###Instruction: Suggest an API for a wildlife conservation organization that could help them identify animals from images captured by their research cameras.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('facebookresearch/semi-supervised-ImageNet1K-models', 'resnet18_swsl', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': \"Load the pretrained ResNet-18 model from 'Semi-supervised and semi-weakly supervised ImageNet Models' by Facebook Research, which can be fine-tuned for classifying animals in images captured by research cameras.\", 'code': 'import torch\nmodel = torch.hub.load(\"facebookresearch/semi-supervised-ImageNet1K-models\", \"resnet18_swsl\", pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='facebookresearch/semi-supervised-ImageNet1K-models', model='resnet18_swsl', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Semi-supervised and semi-weakly supervised ImageNet Models", "api_name": "torch.hub.load", "api_call": "torch.hub.load(repo_or_dir='facebookresearch/semi-supervised-ImageNet1K-models', model='resnet18_swsl', pretrained=True)", "api_arguments": {"repository": "facebookresearch/semi-supervised-ImageNet1K-models", "model": "resnet18_swsl", "pretrained": "True"}, "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('facebookresearch/semi-supervised-ImageNet1K-models', 'resnet18_swsl', pretrained=True)", "model.eval()"], "performance": {"description": "Semi-supervised and semi-weakly supervised ImageNet models achieve state-of-the-art accuracy of 81.2% on ImageNet for the widely used/adopted ResNet-50 model architecture."}, "description": "Semi-supervised and semi-weakly supervised ImageNet Models are introduced in the 'Billion scale semi-supervised learning for image classification' paper. 
These models are pretrained on a subset of the unlabeled YFCC100M public image dataset and fine-tuned with the ImageNet1K training dataset. They are capable of classifying images into different categories and are provided by the Facebook Research library."}}
{"code": "###Instruction: What would be a suitable API for an application that classifies images of autonomous driving from different devices and should be efficient in terms of size?\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_efficientnet_b0', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained NVIDIA EfficientNet-B0 model from PyTorch Hub, which offers state-of-the-art image classification performance with a smaller and faster architecture suitable for autonomous driving applications.', 'code': \"import torch\nmodel = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_efficientnet_b0', pretrained=True)\"}", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_efficientnet_b0', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "EfficientNet", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_efficientnet_b0', pretrained=True)", "api_arguments": ["model_name", "pretrained"], "python_environment_requirements": ["validators", "matplotlib"], "example_code": "import torch\n\nefficientnet = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_efficientnet_b0', pretrained=True)\nutils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_convnets_processing_utils')\n\nefficientnet.eval().to(device)\n\nbatch = torch.cat([utils.prepare_input_from_uri(uri) for uri in uris]).to(device)\n\nwith torch.no_grad():\n output = torch.nn.functional.softmax(efficientnet(batch), dim=1)\n \nresults = utils.pick_n_best(predictions=output, n=5)", "performance": {"dataset": {"name": "IMAGENET", "accuracy": "Not provided"}}, "description": "EfficientNet is a family of image classification models that achieve state-of-the-art accuracy while being smaller 
and faster. The models are trained with mixed precision using Tensor Cores on the NVIDIA Volta and Ampere GPU architectures. The EfficientNet models include EfficientNet-B0, EfficientNet-B4, EfficientNet-WideSE-B0, and EfficientNet-WideSE-B4. The WideSE models use wider Squeeze-and-Excitation layers than the original EfficientNet models, resulting in slightly better accuracy."}}
{"code": "###Instruction: I am a developer at Audible and I am looking for an API that can convert text to speech, find something suitable.\n###Output: {'domain': 'Text-to-Speech', 'api_call': \"model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tacotron2', model_math='fp16')\", 'api_provider': 'PyTorch', 'explanation': 'Use the NVIDIA Tacotron 2 model from PyTorch Hub, which converts text to mel spectrograms for generating natural-sounding speech.', 'code': 'import torch\nmodel = torch.hub.load(\"NVIDIA/DeepLearningExamples:torchhub\", \"nvidia_tacotron2\", model_math=\"fp16\")'}", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_tacotron2', model_math='fp16')", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Text-to-Speech", "api_name": "Tacotron 2", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_tacotron2', model_math='fp16')", "api_arguments": {"model_math": "fp16"}, "python_environment_requirements": ["numpy", "scipy", "librosa", "unidecode", "inflect", "libsndfile1"], "example_code": ["import torch", "tacotron2 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tacotron2', model_math='fp16')", "tacotron2 = tacotron2.to('cuda')", "tacotron2.eval()", "text = 'Hello world, I missed you so much.'", "utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tts_utils')", "sequences, lengths = utils.prepare_input_sequence([text])", "with torch.no_grad():", " mel, _, _ = tacotron2.infer(sequences, lengths)", " audio = waveglow.infer(mel)", "audio_numpy = audio[0].data.cpu().numpy()", "rate = 22050"], "performance": {"dataset": "LJ Speech", "accuracy": "Not specified"}, "description": "The Tacotron 2 model generates mel spectrograms from input text using an encoder-decoder architecture, and it is designed for generating natural-sounding speech from raw transcripts without 
any additional prosody information. This implementation uses Dropout instead of Zoneout to regularize the LSTM layers. The WaveGlow model (also available via torch.hub) is a flow-based model that consumes the mel spectrograms to generate speech."}}
{"code": "###Instruction: You are tasked to parse images in a storage platform to classify a set of new products. Suggest me an API that can help you do this classification task.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_resnest50_380x380', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained MEAL V2 model from PyTorch Hub for image classification to classify images of new products.', 'code': \"import torch\nmodel = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_resnest50_380x380', pretrained=True)\"}", "api_call": "torch.hub.load(repo_or_dir='szq0214/MEAL-V2', 'meal_v2', model='mealv2_resnest50_380x380', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "MEAL_V2", "api_call": "torch.hub.load(repo_or_dir='szq0214/MEAL-V2', 'meal_v2', model='mealv2_resnest50_380x380', pretrained=True)", "api_arguments": {"model_name": "mealv2_resnest50_380x380"}, "python_environment_requirements": "!pip install timm", "example_code": "import torch\nfrom PIL import Image\nfrom torchvision import transforms\n\nmodel = torch.hub.load('szq0214/MEAL-V2','meal_v2', 'mealv2_resnest50_cutmix', pretrained=True)\nmodel.eval()\n\ninput_image = Image.open('dog.jpg')\npreprocess = transforms.Compose([\n transforms.Resize(256),\n transforms.CenterCrop(224),\n transforms.ToTensor(),\n transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n])\ninput_tensor = preprocess(input_image)\ninput_batch = input_tensor.unsqueeze(0)\n\nif torch.cuda.is_available():\n input_batch = input_batch.to('cuda')\n model.to('cuda')\n\nwith torch.no_grad():\n output = model(input_batch)\nprobabilities = torch.nn.functional.softmax(output[0], dim=0)\nprint(probabilities)", "performance": [{"dataset": "ImageNet", "accuracy": {"model": "MEAL-V2 w/ ResNet50", "resolution": "380", 
"parameters": "25.6M", "top1": "81.72", "top5": "95.81"}}], "description": "MEAL V2 models are from the MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks paper. The method is based on ensemble knowledge distillation via discriminators, and it achieves state-of-the-art results without using common tricks such as architecture modification, outside training data, autoaug/randaug, cosine learning rate, mixup/cutmix training, or label smoothing."}}
{"code": "###Instruction: I am building an app to identify poisonous and non-poisonous mushrooms by taking a picture of it. Suggest an API to help me classify the pictures taken.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', model='mealv2_mobilenetv3_small_075', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained MEAL_V2 model from PyTorch Hub, which can be fine-tuned for classifying mushrooms into poisonous and non-poisonous categories using the picture taken.', 'code': \"import torch\nmodel = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', model='mealv2_mobilenetv3_small_075', pretrained=True)\"}", "api_call": "torch.hub.load(repo_or_dir='szq0214/MEAL-V2', 'meal_v2', model='mealv2_mobilenetv3_small_075', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "MEAL_V2", "api_call": "torch.hub.load(repo_or_dir='szq0214/MEAL-V2', 'meal_v2', model='mealv2_mobilenetv3_small_075', pretrained=True)", "api_arguments": {"model_name": "mealv2_mobilenetv3_small_075"}, "python_environment_requirements": "!pip install timm", "example_code": "import torch\nfrom PIL import Image\nfrom torchvision import transforms\n\nmodel = torch.hub.load('szq0214/MEAL-V2','meal_v2', 'mealv2_resnest50_cutmix', pretrained=True)\nmodel.eval()\n\ninput_image = Image.open('dog.jpg')\npreprocess = transforms.Compose([\n transforms.Resize(256),\n transforms.CenterCrop(224),\n transforms.ToTensor(),\n transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n])\ninput_tensor = preprocess(input_image)\ninput_batch = input_tensor.unsqueeze(0)\n\nif torch.cuda.is_available():\n input_batch = input_batch.to('cuda')\n model.to('cuda')\n\nwith torch.no_grad():\n output = model(input_batch)\nprobabilities = torch.nn.functional.softmax(output[0], dim=0)\nprint(probabilities)", "performance": 
[{"dataset": "ImageNet", "accuracy": {"model": "MEAL-V2 w/ MobileNet V3-Small 0.75", "resolution": "224", "parameters": "2.04M", "top1": "67.60", "top5": "87.23"}}], "description": "MEAL V2 models are from the MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks paper. The method is based on ensemble knowledge distillation via discriminators, and it achieves state-of-the-art results without using common tricks such as architecture modification, outside training data, autoaug/randaug, cosine learning rate, mixup/cutmix training, or label smoothing."}}
{"code": "###Instruction: Can you provide me an API for classifying a video content based on the actions performed in it?\n###Output: {'domain': 'Video Classification', 'api_call': \"model = torch.hub.load('facebookresearch/pytorchvideo', 'slow_r50', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained 3D ResNet model (slow_r50) from PyTorch Hub for video classification based on the actions performed in the video.', 'code': 'import torch\nmodel = torch.hub.load('facebookresearch/pytorchvideo', 'slow_r50', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='facebookresearch/pytorchvideo', model='slow_r50', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Video Classification", "framework": "PyTorch", "functionality": "3D ResNet", "api_name": "slow_r50", "api_call": "torch.hub.load(repo_or_dir='facebookresearch/pytorchvideo', model='slow_r50', pretrained=True)", "api_arguments": {"pretrained": "True"}, "python_environment_requirements": ["torch", "json", "urllib", "pytorchvideo", "torchvision", "torchaudio", "torchtext", "torcharrow", "TorchData", "TorchRec", "TorchServe", "PyTorch on XLA Devices"], "example_code": ["import torch", "model = torch.hub.load('facebookresearch/pytorchvideo', 'slow_r50', pretrained=True)", "device = 'cpu'", "model = model.eval()", "model = model.to(device)"], "performance": {"dataset": "Kinetics 400", "accuracy": {"top_1": 74.58, "top_5": 91.63}, "Flops (G)": 54.52, "Params (M)": 32.45}, "description": "The 3D ResNet model is a Resnet-style video classification network pretrained on the Kinetics 400 dataset. It is based on the architecture from the paper 'SlowFast Networks for Video Recognition' by Christoph Feichtenhofer et al."}}
{"code": "###Instruction: A startup called \"DriveMe\" is building a vehicular safety app and wants to detect traffic objects, segment drivable areas, and detect lanes in real-time. Suggest an API to help them achieve their goal.\n###Output: {'domain': 'Object Detection', 'api_call': \"model = torch.hub.load('datvuthanh/hybridnets', 'hybridnets', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained HybridNets model from PyTorch Hub for multi-task vehicular perception, including traffic object detection, drivable area segmentation, and lane detection in real-time.', 'code': \"import torch\nmodel = torch.hub.load('datvuthanh/hybridnets', 'hybridnets', pretrained=True)\"}", "api_call": "torch.hub.load(repo_or_dir='datvuthanh/hybridnets', model='hybridnets', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Object Detection", "framework": "PyTorch", "functionality": "Traffic Object Detection, Drivable Area Segmentation, Lane Detection", "api_name": "HybridNets", "api_call": "torch.hub.load(repo_or_dir='datvuthanh/hybridnets', model='hybridnets', pretrained=True)", "api_arguments": "pretrained", "python_environment_requirements": "Python>=3.7, PyTorch>=1.10", "example_code": "import torch\nmodel = torch.hub.load('datvuthanh/hybridnets', 'hybridnets', pretrained=True)\nimg = torch.randn(1,3,640,384)\nfeatures, regression, classification, anchors, segmentation = model(img)", "performance": {"dataset": [{"name": "BDD100K", "accuracy": {"Traffic Object Detection": {"Recall (%)": 92.8, "mAP@0.5 (%)": 77.3}, "Drivable Area Segmentation": {"Drivable mIoU (%)": 90.5}, "Lane Line Detection": {"Accuracy (%)": 85.4, "Lane Line IoU (%)": 31.6}}}]}, "description": "HybridNets is an end2end perception network for multi-tasks. Our work focused on traffic object detection, drivable area segmentation and lane detection. HybridNets can run real-time on embedded systems, and obtains SOTA Object Detection, Lane Detection on BDD100K Dataset."}}
{"code": "###Instruction: Identify an API which detects voice activity in an audio file and share the code to load it.\n###Output: {'domain': 'Voice Activity Detection', 'api_call': \"model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad', model='silero_vad', force_reload=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load Silero Voice Activity Detector model from PyTorch Hub to detect voice activity in an audio file.', 'code': \"import torch\nmodel, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad', model='silero_vad', force_reload=True)\"}", "api_call": "torch.hub.load(repo_or_dir='snakers4/silero-vad', model='silero_vad', force_reload=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Voice Activity Detection", "api_name": "Silero Voice Activity Detector", "api_call": "torch.hub.load(repo_or_dir='snakers4/silero-vad', model='silero_vad', force_reload=True)", "api_arguments": {"repo_or_dir": "snakers4/silero-vad", "model": "silero_vad", "force_reload": "True"}, "python_environment_requirements": {"torchaudio": "pip install -q torchaudio"}, "example_code": {"import": ["import torch", "torch.set_num_threads(1)", "from IPython.display import Audio", "from pprint import pprint"], "download_example": "torch.hub.download_url_to_file('https://models.silero.ai/vad_models/en.wav', 'en_example.wav')", "load_model": "model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad', model='silero_vad', force_reload=True)", "load_utils": "(get_speech_timestamps, _, read_audio, _) = utils", "set_sampling_rate": "sampling_rate = 16000", "read_audio": "wav = read_audio('en_example.wav', sampling_rate=sampling_rate)", "get_speech_timestamps": "speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=sampling_rate)", "print_speech_timestamps": "pprint(speech_timestamps)"}, "performance": {"dataset": "", "accuracy": ""}, "description": "Silero VAD is a pre-trained enterprise-grade Voice 
Activity Detector (VAD) that aims to provide a high-quality and modern alternative to the WebRTC Voice Activity Detector. The model is optimized for performance on 1 CPU thread and is quantized."}}
{"code": "###Instruction: Help me identify various objects in an image. Suggest an API for performing image classification.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x8d_wsl', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ResNext WSL (weakly-supervised learning) model from PyTorch Hub to classify various objects within an image.', 'code': 'import torch\nmodel = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x8d_wsl', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='facebookresearch/WSL-Images', model='resnext101_32x8d_wsl', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "ResNext WSL", "api_call": "torch.hub.load(repo_or_dir='facebookresearch/WSL-Images', model='resnext101_32x8d_wsl', pretrained=True)", "api_arguments": [{"name": "resnext101_32x8d_wsl", "type": "str", "description": "ResNeXt-101 32x8d WSL model"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x8d_wsl')", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "preprocess = transforms.Compose([", " transforms.Resize(256),", " transforms.CenterCrop(224),", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)", "print(output[0])", "print(torch.nn.functional.softmax(output[0], dim=0))"], "performance": {"dataset": "ImageNet", "accuracy": {"ResNeXt-101 32x8d": {"Top-1 Acc.": 
"82.2", "Top-5 Acc.": "96.4"}}}, "description": "The provided ResNeXt models are pre-trained in weakly-supervised fashion on 940 million public images with 1.5K hashtags matching with 1000 ImageNet1K synsets, followed by fine-tuning on ImageNet1K dataset. The models significantly improve the training accuracy on ImageNet compared to training from scratch. They achieve state-of-the-art accuracy of 85.4% on ImageNet with the ResNext-101 32x48d model."}}
{"code": "###Instruction: A marketing company needs an API to classify images into animals and assign them different categories. Which API would you recommend them?\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_resnet50', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Use the ResNet50 v1.5 model from PyTorch Hub, a powerful image classification model that can classify images into different categories, including animal classes.', 'code': 'import torch\nmodel = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_resnet50', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_resnet50', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "ResNet50", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_resnet50', pretrained=True)", "api_arguments": {"pretrained": "True"}, "python_environment_requirements": ["pip install validators matplotlib"], "example_code": ["import torch", "from PIL import Image", "import torchvision.transforms as transforms", "import numpy as np", "import json", "import requests", "import matplotlib.pyplot as plt", "import warnings", "warnings.filterwarnings('ignore')", "%matplotlib inline", "device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')", "print(f'Using {device} for inference')", "resnet50 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_resnet50', pretrained=True)", "utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_convnets_processing_utils')", "resnet50.eval().to(device)", "uris = [...]", "batch = torch.cat([utils.prepare_input_from_uri(uri) for uri in uris]).to(device)", "with torch.no_grad():", " output = torch.nn.functional.softmax(resnet50(batch), 
dim=1)", "results = utils.pick_n_best(predictions=output, n=5)", "for uri, result in zip(uris, results):", " img = Image.open(requests.get(uri, stream=True).raw)", " img.thumbnail((256,256), Image.LANCZOS)", " plt.imshow(img)", " plt.show()", " print(result)"], "performance": {"dataset": "ImageNet", "accuracy": "~0.5% top1 improvement over ResNet50 v1"}, "description": "The ResNet50 v1.5 model is a modified version of the original ResNet50 v1 model. The difference between v1 and v1.5 is that, in the bottleneck blocks that require downsampling, v1 has stride = 2 in the first 1x1 convolution, whereas v1.5 has stride = 2 in the 3x3 convolution. This difference makes ResNet50 v1.5 slightly more accurate (~0.5% top1) than v1, but comes with a small performance drawback (~5% imgs/sec). The model is initialized as described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. This model is trained with mixed precision using Tensor Cores on Volta, Turing, and the NVIDIA Ampere GPU architectures."}}
{"code": "###Instruction: Recommend an API for a mobile app that can identify fruits from images taken by the users.\n###Output: {'domain': 'Classification', 'api_call': 'model = torch.hub.load('pytorch/vision', 'alexnet', pretrained=True)', 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained AlexNet model from PyTorch Hub, which can be fine-tuned for fruit classification using images taken by users.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'alexnet', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='alexnet', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "AlexNet", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='alexnet', pretrained=True)", "api_arguments": {"pretrained": "True"}, "python_environment_requirements": {"torch": ">=1.9.0", "torchvision": ">=0.10.0"}, "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'alexnet', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"top-1_error": 43.45, "top-5_error": 20.91}}, "description": "AlexNet is a deep convolutional neural network that achieved a top-5 error of 15.3% in the 2012 ImageNet Large Scale Visual Recognition Challenge. The main contribution of the original paper was the depth of the model, which was computationally expensive but made feasible through the use of GPUs during training. The pretrained AlexNet model in PyTorch can be used for image classification tasks."}}
{"code": "###Instruction: A city is planning to survey the land for urban development. Provide me with an API that can identify buildings and roads from an aerial photo.\n###Output: {'domain': 'Semantic Segmentation', 'api_call': 'model = torch.hub.load(\"pytorch/vision\", \"fcn_resnet101\", pretrained=True)', 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained FCN-ResNet101 model from PyTorch Hub, which can be fine-tuned for aerial photo segmentation to identify buildings and roads.', 'code': 'import torch\nmodel = torch.hub.load(\"pytorch/vision\", \"fcn_resnet101\", pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='fcn_resnet101', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Semantic Segmentation", "framework": "PyTorch", "functionality": "Fully-Convolutional Network", "api_name": "fcn_resnet101", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='fcn_resnet101', pretrained=True)", "api_arguments": [{"name": "pretrained", "type": "boolean", "description": "If True, returns a model pre-trained on COCO train2017"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "matplotlib"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'fcn_resnet101', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "input_image = input_image.convert(\"RGB\")", "preprocess = transforms.Compose([", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)['out'][0]", "output_predictions = output.argmax(0)"], "performance": {"dataset": "COCO val2017", "accuracy": {"fcn_resnet101": 
{"Mean IOU": 63.7, "Global Pixelwise Accuracy": 91.9}}}, "description": "FCN-ResNet is a Fully-Convolutional Network model using a ResNet-50 or a ResNet-101 backbone. The pre-trained models have been trained on a subset of COCO train2017, on the 20 categories that are present in the Pascal VOC dataset."}}
{"code": "###Instruction: I need an efficient model for classifying animals in images taken by wildlife cameras. Suggest me an API for this purpose.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('PingoLH/Pytorch-HarDNet', 'hardnet39ds', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained HarDNet-39DS model from PyTorch Hub, which is an efficient and fast model suitable for classifying animals in images taken by wildlife cameras.', 'code': 'import torch\nmodel = torch.hub.load('PingoLH/Pytorch-HarDNet', 'hardnet39ds', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='PingoLH/Pytorch-HarDNet', model='hardnet39ds', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "HarDNet", "api_call": "torch.hub.load(repo_or_dir='PingoLH/Pytorch-HarDNet', model='hardnet39ds', pretrained=True)", "api_arguments": [{"name": "hardnet39ds", "type": "str", "description": "HarDNet-39DS model"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('PingoLH/Pytorch-HarDNet', 'hardnet39ds', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"hardnet39ds": {"Top-1 error": 27.92, "Top-5 error": 9.57}}}, "description": "Harmonic DenseNet (HarDNet) is a low memory traffic CNN model, which is fast and efficient. The basic concept is to minimize both computational cost and memory access cost at the same time, such that the HarDNet models are 35% faster than ResNet running on GPU comparing to models with the same accuracy (except the two DS models that were designed for comparing with MobileNet)."}}
{"code": "###Instruction: The company is creating a neural network model that can run efficiently on different hardware platforms. Tell me an API that specializes CNNs for different hardware.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('mit-han-lab/ProxylessNAS', platform_type, pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ProxylessNAS model from PyTorch Hub that specializes CNN architectures for different hardware platforms such as CPU, GPU, and Mobile, while maintaining similar accuracy.', 'code': 'import torch\nplatform_type = \"proxylessnas_cpu\"  # replace with proxylessnas_gpu or proxylessnas_mobile as needed\nmodel = torch.hub.load('mit-han-lab/ProxylessNAS', platform_type, pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/ProxylessNAS', model='proxylessnas_cpu', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "ProxylessNAS", "api_name": "mit-han-lab/ProxylessNAS", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/ProxylessNAS', model='proxylessnas_cpu', pretrained=True)", "api_arguments": [{"name": "proxylessnas_cpu", "type": "str", "description": "ProxylessNAS optimized for CPU"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "target_platform = 'proxyless_cpu'", "model = torch.hub.load('mit-han-lab/ProxylessNAS', target_platform, pretrained=True)", "model.eval()"], "performance": {"dataset": [{"model_structure": "proxylessnas_cpu", "accuracy": 75.3}]}, "description": "ProxylessNAS models are from the ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware paper. They specialize CNN architectures for different hardware platforms, offering free yet significant performance boost on all three platforms (CPU, GPU, and Mobile) with similar accuracy."}}
{"code": "###Instruction: Farlando Corp has an application that runs on their customers' GPUs, and they want a neural network that is optimized on GPU performance. Recommend an API that they can use for image classification.\n###Output: {'domain': 'Classification', 'api_call': 'model = torch.hub.load('mit-han-lab/ProxylessNAS', 'proxylessnas_gpu', pretrained=True)', 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ProxylessNAS optimized for GPU from PyTorch Hub, which can be used for image classification tasks.', 'code': 'import torch\nmodel = torch.hub.load('mit-han-lab/ProxylessNAS', 'proxylessnas_gpu', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/ProxylessNAS', model='proxylessnas_gpu', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "ProxylessNAS", "api_name": "mit-han-lab/ProxylessNAS", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/ProxylessNAS', model='proxylessnas_gpu', pretrained=True)", "api_arguments": [{"name": "proxylessnas_gpu", "type": "str", "description": "ProxylessNAS optimized for GPU"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "target_platform = 'proxyless_gpu'", "model = torch.hub.load('mit-han-lab/ProxylessNAS', target_platform, pretrained=True)", "model.eval()"], "performance": {"dataset": [{"model_structure": "proxylessnas_gpu", "accuracy": 75.1}]}, "description": "ProxylessNAS models are from the ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware paper. They specialize CNN architectures for different hardware platforms, offering free yet significant performance boost on all three platforms (CPU, GPU, and Mobile) with similar accuracy."}}
{"code": "###Instruction: I need an efficient model for image classification with good accuracy. Provide me with an API that uses LIF neurons.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_s', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained SNNMLP model from PyTorch Hub, which utilizes LIF neurons for an efficient image classification with good accuracy.', 'code': 'import torch\nmodel = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_s', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='huawei-noah/Efficient-AI-Backbones', model='snnmlp_s', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "SNNMLP", "api_call": "torch.hub.load(repo_or_dir='huawei-noah/Efficient-AI-Backbones', model='snnmlp_s', pretrained=True)", "api_arguments": [{"name": "snnmlp_s", "type": "str", "description": "SNNMLP Small model"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_s', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "preprocess = transforms.Compose([", " transforms.Resize(256),", " transforms.CenterCrop(224),", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)", "print(torch.nn.functional.softmax(output[0], dim=0))"], "performance": {"dataset": "ImageNet", "accuracy": {"model": "SNNMLP Small", "top-1": 83.3}}, "description": 
"SNNMLP incorporates the mechanism of LIF neurons into the MLP models, to achieve better accuracy without extra FLOPs. We propose a full-precision LIF operation to communicate between patches, including horizontal LIF and vertical LIF in different directions. We also propose to use group LIF to extract better local features. With LIF modules, our SNNMLP model achieves 81.9%, 83.3% and 83.6% top-1 accuracy on ImageNet dataset with only 4.4G, 8.5G and 15.2G FLOPs, respectively."}}
{"code": "###Instruction: As a market research analyst, I want to find a tool to classify different product types using their images.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'vgg11', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pre-trained VGG11 model from PyTorch Hub for image recognition tasks, which can be fine-tuned for product type classification.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'vgg11', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg11', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Recognition", "api_name": "vgg-nets", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg11', pretrained=True)", "api_arguments": [{"name": "vgg11", "type": "str", "description": "VGG11 model"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'vgg11', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"vgg11": {"Top-1 error": 30.98, "Top-5 error": 11.37}}}, "description": "vgg-nets are award-winning ConvNets from the 2014 Imagenet ILSVRC challenge. They are used for large-scale image recognition tasks. The available models are vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, and vgg19_bn."}}
{"code": "###Instruction: A media company that works with image recognition is trying to identify an object in an image. Recommend an API that specializes in image recognition.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'vgg11_bn', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained VGG11 model with batch normalization from PyTorch Hub, specifically designed for image recognition tasks.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'vgg11_bn', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg11_bn', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Recognition", "api_name": "vgg-nets", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg11_bn', pretrained=True)", "api_arguments": [{"name": "vgg11_bn", "type": "str", "description": "VGG11 model with batch normalization"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'vgg11_bn', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"vgg11_bn": {"Top-1 error": 26.7, "Top-5 error": 8.58}}}, "description": "vgg-nets are award-winning ConvNets from the 2014 Imagenet ILSVRC challenge. They are used for large-scale image recognition tasks. The available models are vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, and vgg19_bn."}}
{"code": "###Instruction: Inform me of an API that can help identify famous landmarks from images.\n###Output: {'domain': 'Image Recognition', 'api_call': \"model = torch.hub.load('pytorch/vision', 'vgg13_bn', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': \"Load the pretrained VGG13 model with batch normalization from PyTorch Hub, which can be fine-tuned for famous landmark identification from images.\", 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'vgg13_bn', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg13_bn', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Recognition", "api_name": "vgg-nets", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg13_bn', pretrained=True)", "api_arguments": [{"name": "vgg13_bn", "type": "str", "description": "VGG13 model with batch normalization"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'vgg13_bn', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"vgg13_bn": {"Top-1 error": 28.45, "Top-5 error": 9.63}}}, "description": "vgg-nets are award-winning ConvNets from the 2014 Imagenet ILSVRC challenge. They are used for large-scale image recognition tasks. The available models are vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, and vgg19_bn."}}
{"code": "###Instruction: I am working on an image classification project where accuracy is important, and I need a pretrained model that has a lower error rate when classifying images. What model might work for me?\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'wide_resnet50_2', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the Wide ResNet-50-2 pretrained model from PyTorch Hub, which offers lower top-1 and top-5 error rates for improved image classification accuracy.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'wide_resnet50_2', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='wide_resnet50_2', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Wide Residual Networks", "api_name": "wide_resnet50_2", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='wide_resnet50_2', pretrained=True)", "api_arguments": "pretrained", "python_environment_requirements": "torch, torchvision", "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'wide_resnet50_2', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"wide_resnet50_2": {"Top-1 error": 21.49, "Top-5 error": 5.91}}}, "description": "Wide Residual networks simply have increased number of channels compared to ResNet. Otherwise the architecture is the same. Deeper ImageNet models with bottleneck block have increased number of channels in the inner 3x3 convolution. The wide_resnet50_2 and wide_resnet101_2 models were trained in FP16 with mixed precision training using SGD with warm restarts. Checkpoints have weights in half precision (except batch norm) for smaller size, and can be used in FP32 models too."}}
{"code": "###Instruction: The New York Times wants to classify some information about Jim Henson. Recommend an API to analyze and classify the text.\n###Output: {'domain': 'Natural Language Processing', 'api_call': 'model = torch.hub.load(\\'huggingface/pytorch-transformers\\', \\'model\\', \\'bert-base-cased\\')', 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained BERT model for NLP analysis and classification from PyTorch Hub to process and classify the text.', 'code': 'import torch\ntokenizer = torch.hub.load(\\'huggingface/pytorch-transformers\\', \\'tokenizer\\', \\'bert-base-cased\\')\nmodel = torch.hub.load(\\'huggingface/pytorch-transformers\\', \\'model\\', \\'bert-base-cased\\')'}\n", "api_call": "torch.hub.load('huggingface/pytorch-transformers', 'model', 'bert-base-cased')", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Natural Language Processing", "api_name": "PyTorch-Transformers", "api_call": "torch.hub.load('huggingface/pytorch-transformers', 'model', 'bert-base-cased')", "api_arguments": ["pretrained_model_or_path", "output_attention", "output_hidden_states", "config", "from_tf"], "python_environment_requirements": ["tqdm", "boto3", "requests", "regex", "sentencepiece", "sacremoses"], "example_code": "import torch\ntokenizer = torch.hub.load('huggingface/pytorch-transformers', 'tokenizer', 'bert-base-cased')\n\ntext_1 = \"Jim Henson was a puppeteer\"\ntext_2 = \"Who was Jim Henson ?\"\n\nindexed_tokens = tokenizer.encode(text_1, text_2, add_special_tokens=True)\n\n# Segment ids: 0 up to and including the first [SEP], 1 afterwards\nsep_index = indexed_tokens.index(tokenizer.sep_token_id)\nsegments_ids = [0] * (sep_index + 1) + [1] * (len(indexed_tokens) - sep_index - 1)\ntokens_tensor = torch.tensor([indexed_tokens])\nsegments_tensors = torch.tensor([segments_ids])\n\nmodel = torch.hub.load('huggingface/pytorch-transformers', 'model', 'bert-base-cased')\n\nwith torch.no_grad():\n outputs = model(tokens_tensor, token_type_ids=segments_tensors)\n encoded_layers = outputs[0]", "performance": {"dataset": [{"name": "MRPC", "accuracy": "Not provided"}]}, "description": "PyTorch-Transformers is a library of state-of-the-art pre-trained models for Natural Language Processing (NLP) including BERT, GPT, GPT-2, Transformer-XL, XLNet, XLM, 
RoBERTa, and DistilBERT. The library provides functionality for tokenization, configuration, and various model architectures for different tasks such as causal language modeling, sequence classification, question answering, and masked language modeling."}}
{"code": "###Instruction: Recommend a pretrained API that classifies animals from a photo.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_resnest50_cutmix', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained MEAL V2 model with ResNet50 architecture from PyTorch Hub, which can be used for image classification, including animal classification.', 'code': 'import torch\nmodel = torch.hub.load('szq0214/MEAL-V2','meal_v2', 'mealv2_resnest50_cutmix', pretrained=True)'}\n", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_resnest50_cutmix', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "MEAL_V2", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_resnest50_cutmix', pretrained=True)", "api_arguments": {"model_name": "mealv2_resnest50_cutmix"}, "python_environment_requirements": "!pip install timm", "example_code": "import torch\nfrom PIL import Image\nfrom torchvision import transforms\n\nmodel = torch.hub.load('szq0214/MEAL-V2','meal_v2', 'mealv2_resnest50_cutmix', pretrained=True)\nmodel.eval()\n\ninput_image = Image.open('dog.jpg')\npreprocess = transforms.Compose([\n transforms.Resize(256),\n transforms.CenterCrop(224),\n transforms.ToTensor(),\n transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n])\ninput_tensor = preprocess(input_image)\ninput_batch = input_tensor.unsqueeze(0)\n\nif torch.cuda.is_available():\n input_batch = input_batch.to('cuda')\n model.to('cuda')\n\nwith torch.no_grad():\n output = model(input_batch)\nprobabilities = torch.nn.functional.softmax(output[0], dim=0)\nprint(probabilities)", "performance": [{"dataset": "ImageNet", "accuracy": {"model": "MEAL-V2 + CutMix w/ ResNet50", "resolution": "224", "parameters": 
"25.6M", "top1": "80.98", "top5": "95.35"}}], "description": "MEAL V2 models are from the MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks paper. The method is based on ensemble knowledge distillation via discriminators, and it achieves state-of-the-art results without using common tricks such as architecture modification, outside training data, autoaug/randaug, cosine learning rate, mixup/cutmix training, or label smoothing."}}
{"code": "###Instruction: I have a picture of my dog and I want to classify its breed. Provide me an API to do this.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_mobilenetv3_small_075', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained MEAL V2 model for image classification from PyTorch Hub to classify the breed of your dog.', 'code': 'import torch\nmodel = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_mobilenetv3_small_075', pretrained=True)'}\n", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_mobilenetv3_small_075', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "MEAL_V2", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_mobilenetv3_small_075', pretrained=True)", "api_arguments": {"model_name": "mealv2_mobilenetv3_small_075"}, "python_environment_requirements": "!pip install timm", "example_code": "import torch\nfrom PIL import Image\nfrom torchvision import transforms\n\nmodel = torch.hub.load('szq0214/MEAL-V2','meal_v2', 'mealv2_mobilenetv3_small_075', pretrained=True)\nmodel.eval()\n\ninput_image = Image.open('dog.jpg')\npreprocess = transforms.Compose([\n transforms.Resize(256),\n transforms.CenterCrop(224),\n transforms.ToTensor(),\n transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n])\ninput_tensor = preprocess(input_image)\ninput_batch = input_tensor.unsqueeze(0)\n\nif torch.cuda.is_available():\n input_batch = input_batch.to('cuda')\n model.to('cuda')\n\nwith torch.no_grad():\n output = model(input_batch)\nprobabilities = torch.nn.functional.softmax(output[0], dim=0)\nprint(probabilities)", "performance": [{"dataset": "ImageNet", "accuracy": {"model": "MEAL-V2 w/ MobileNet V3-Small 0.75", "resolution": "224", "parameters": 
"2.04M", "top1": "67.60", "top5": "87.23"}}], "description": "MEAL V2 models are from the MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks paper. The method is based on ensemble knowledge distillation via discriminators, and it achieves state-of-the-art results without using common tricks such as architecture modification, outside training data, autoaug/randaug, cosine learning rate, mixup/cutmix training, or label smoothing."}}
{"code": "###Instruction: A developer at Pinterest wants to automatically categorize uploaded images based on their content. Provide an API suggestion that can help with this task.\n###Output: {'domain': 'Image Classification', 'api_call': 'model = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_efficientnet_b0', pretrained=True)', 'api_provider': 'PyTorch', 'explanation': 'Use the pretrained MEAL V2 model with EfficientNet-B0 architecture from PyTorch Hub for image categorization based on the content.', 'code': 'import torch\nmodel = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_efficientnet_b0', pretrained=True)'}", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_efficientnet_b0', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "MEAL_V2", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_efficientnet_b0', pretrained=True)", "api_arguments": {"model_name": "mealv2_efficientnet_b0"}, "python_environment_requirements": "!pip install timm", "example_code": "import torch\nfrom PIL import Image\nfrom torchvision import transforms\n\nmodel = torch.hub.load('szq0214/MEAL-V2','meal_v2', 'mealv2_efficientnet_b0', pretrained=True)\nmodel.eval()\n\ninput_image = Image.open('dog.jpg')\npreprocess = transforms.Compose([\n transforms.Resize(256),\n transforms.CenterCrop(224),\n transforms.ToTensor(),\n transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n])\ninput_tensor = preprocess(input_image)\ninput_batch = input_tensor.unsqueeze(0)\n\nif torch.cuda.is_available():\n input_batch = input_batch.to('cuda')\n model.to('cuda')\n\nwith torch.no_grad():\n output = model(input_batch)\nprobabilities = torch.nn.functional.softmax(output[0], dim=0)\nprint(probabilities)", "performance": [{"dataset": "ImageNet", "accuracy": {"model": "MEAL-V2 w/ 
EfficientNet-B0", "resolution": "224", "parameters": "5.29M", "top1": "78.29", "top5": "93.95"}}], "description": "MEAL V2 models are from the MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks paper. The method is based on ensemble knowledge distillation via discriminators, and it achieves state-of-the-art results without using common tricks such as architecture modification, outside training data, autoaug/randaug, cosine learning rate, mixup/cutmix training, or label smoothing."}}
{"code": "###Instruction: A startup is working on a computer vision application supporting autonomous drones. Can you provide an API that can compute the relative depth of an object in a given image?\n###Output: {'domain': 'Depth Estimation', 'api_call': \"model = torch.hub.load('intel-isl/MiDaS', 'MiDaS_small', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained MiDaS model from PyTorch Hub, which is capable of computing the relative depth of objects within a single input image.', 'code': 'import torch\nmodel = torch.hub.load('intel-isl/MiDaS', 'MiDaS_small', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='intel-isl/MiDaS', model='MiDaS_small', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Computing relative depth from a single image", "api_name": "MiDaS", "api_call": "torch.hub.load(repo_or_dir='intel-isl/MiDaS', model='MiDaS_small', pretrained=True)", "api_arguments": {"repo_or_dir": "intel-isl/MiDaS", "model": "model_type"}, "python_environment_requirements": "pip install timm", "example_code": ["import cv2", "import torch", "import urllib.request", "import matplotlib.pyplot as plt", "url, filename = ('https://github.com/pytorch/hub/raw/master/images/dog.jpg', 'dog.jpg')", "urllib.request.urlretrieve(url, filename)", "model_type = 'MiDaS_small'", "midas = torch.hub.load('intel-isl/MiDaS', model_type)", "device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')", "midas.to(device)", "midas.eval()", "midas_transforms = torch.hub.load('intel-isl/MiDaS', 'transforms')", "if model_type == 'DPT_Large' or model_type == 'DPT_Hybrid':", " transform = midas_transforms.dpt_transform", "else:", " transform = midas_transforms.small_transform", "img = cv2.imread(filename)", "img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)", "input_batch = transform(img).to(device)", "with torch.no_grad():", " prediction = midas(input_batch)", 
"prediction = torch.nn.functional.interpolate(", " prediction.unsqueeze(1),", " size=img.shape[:2],", " mode='bicubic',", " align_corners=False,", ").squeeze()", "output = prediction.cpu().numpy()", "plt.imshow(output)", "plt.show()"], "performance": {"dataset": "10 distinct datasets", "accuracy": "Multi-objective optimization"}, "description": "MiDaS computes relative inverse depth from a single image. The repository provides multiple models that cover different use cases, ranging from a small, high-speed model to a very large model that provides the highest accuracy. The models have been trained on 10 distinct datasets using multi-objective optimization to ensure high quality on a wide range of inputs."}}
{"code": "###Instruction: Imagine you are trying to build podcast transcription for people with hearing impairments. Get an API to transcribe a sample podcast from Spotify.\n###Output: {'domain': 'Speech-To-Text', 'api_call': \"model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language='en', device=device)\", 'api_provider': 'PyTorch', 'explanation': \"Load Silero's pretrained Speech-To-Text model from PyTorch Hub for transcribing a podcast from Spotify in English.\", 'code': 'import torch\ndevice = torch.device(\"cpu\")\nmodel, decoder, utils = torch.hub.load(repo_or_dir=\"snakers4/silero-models\", model=\"silero_stt\", language=\"en\", device=device)'}\n", "api_call": "torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Speech-To-Text", "api_name": "Silero Models", "api_call": "torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', pretrained=True)", "api_arguments": {"repo_or_dir": "snakers4/silero-models", "model": "silero_stt", "language": ["en", "de", "es"], "device": "device"}, "python_environment_requirements": ["pip install -q torchaudio omegaconf soundfile"], "example_code": ["import torch", "import zipfile", "import torchaudio", "from glob import glob", "device = torch.device('cpu')", "model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language='en', device=device)", "(read_batch, split_into_batches, read_audio, prepare_model_input) = utils", "torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst ='speech_orig.wav', progress=True)", "test_files = glob('speech_orig.wav')", "batches = split_into_batches(test_files, batch_size=10)", "input = prepare_model_input(read_batch(batches[0]), device=device)", "output = model(input)", "for example in output:", " print(decoder(example.cpu()))"], 
"performance": {"dataset": "Open-STT", "accuracy": "See the wiki for quality and performance benchmarks"}, "description": "Silero Speech-To-Text models provide enterprise-grade STT in a compact form factor for several commonly spoken languages. The models are robust to a variety of dialects, codecs, domains, noises, and lower sampling rates. They consume normalized audio in the form of samples and output frames with token probabilities. A decoder utility is provided for simplicity."}}
{"code": "###Instruction: A tourist is planning to take a picture of a beautiful scene but wants to separate the people from the background. Recommend an API to help do this.\n###Output: {'domain': 'Semantic Segmentation', 'api_call': \"model = torch.hub.load('pytorch/vision', 'deeplabv3_resnet101', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Use the pretrained DeepLabV3 model with a ResNet-101 backbone for semantic segmentation to separate people from the background in an image.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'deeplabv3_resnet101', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='deeplabv3_resnet101', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Semantic Segmentation", "framework": "PyTorch", "functionality": "DeepLabV3", "api_name": "torch.hub.load", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='deeplabv3_resnet101', pretrained=True)", "api_arguments": {"pretrained": "True"}, "python_environment_requirements": {"torch": "latest", "torchvision": "latest", "PIL": "latest", "matplotlib": "latest"}, "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'deeplabv3_resnet101', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "input_image = input_image.convert('RGB')", "preprocess = transforms.Compose([", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)['out'][0]", "output_predictions = output.argmax(0)"], "performance": {"dataset": "COCO val2017", "accuracy": [{"model": "deeplabv3_resnet101", "Mean IOU": 67.4, "Global Pixelwise 
Accuracy": 92.4}]}, "description": "DeepLabV3 models with ResNet-50, ResNet-101 and MobileNet-V3 backbones for semantic segmentation. The pre-trained models have been trained on a subset of COCO train2017, on the 20 categories that are present in the Pascal VOC dataset."}}
{"code": "###Instruction: I took a photo and I want to detect all the objects in the image. Provide me with an API to do this.\n###Output: {'domain': 'Object Detection', 'api_call': \"model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained YOLOv5 model from PyTorch Hub to detect objects in an image.', 'code': 'import torch\nmodel = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='ultralytics/yolov5', model='yolov5s', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Object Detection", "framework": "PyTorch", "functionality": "YOLOv5", "api_name": "torch.hub.load", "api_call": "torch.hub.load(repo_or_dir='ultralytics/yolov5', model='yolov5s', pretrained=True)", "api_arguments": ["'ultralytics/yolov5'", "'yolov5s'", "pretrained=True"], "python_environment_requirements": "Python>=3.8, PyTorch>=1.7", "example_code": ["import torch", "model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)", "imgs = ['https://ultralytics.com/images/zidane.jpg']", "results = model(imgs)", "results.print()", "results.save()", "results.xyxy[0]", "results.pandas().xyxy[0]"], "performance": {"dataset": "COCO", "accuracy": {"YOLOv5s6": {"mAPval0.5:0.95": 43.3, "mAPtest0.5:0.95": 43.3, "mAPval0.5": 61.9}, "YOLOv5m6": {"mAPval0.5:0.95": 50.5, "mAPtest0.5:0.95": 50.5, "mAPval0.5": 68.7}, "YOLOv5l6": {"mAPval0.5:0.95": 53.4, "mAPtest0.5:0.95": 53.4, "mAPval0.5": 71.1}, "YOLOv5x6": {"mAPval0.5:0.95": 54.4, "mAPtest0.5:0.95": 54.4, "mAPval0.5": 72.0}, "YOLOv5x6 TTA": {"mAPval0.5:0.95": 55.0, "mAPtest0.5:0.95": 55.0, "mAPval0.5": 72.0}}}, "description": "YOLOv5 is a family of compound-scaled object detection models trained on the COCO dataset, and includes simple functionality for Test Time Augmentation (TTA), model ensembling, hyperparameter evolution, and export to ONNX, CoreML and TFLite."}}
{"code": "###Instruction: Find an API that can generate new images of various clothing styles in 64x64 resolution using Generative Adversarial Networks.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('facebookresearch/pytorch_GAN_zoo:hub', 'DCGAN', pretrained=True, useGPU=use_gpu)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained DCGAN model from PyTorch Hub, which can generate new images of various clothing styles in 64x64 resolution using Generative Adversarial Networks.', 'code': \"import torch\nuse_gpu = True if torch.cuda.is_available() else False\nmodel = torch.hub.load('facebookresearch/pytorch_GAN_zoo:hub', 'DCGAN', pretrained=True, useGPU=use_gpu)\"}", "api_call": "torch.hub.load(repo_or_dir='facebookresearch/pytorch_GAN_zoo:hub', model='DCGAN', pretrained=True, useGPU=use_gpu)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Generative Adversarial Networks", "api_name": "DCGAN", "api_call": "torch.hub.load(repo_or_dir='facebookresearch/pytorch_GAN_zoo:hub', model='DCGAN', pretrained=True, useGPU=use_gpu)", "api_arguments": {"pretrained": "True", "useGPU": "use_gpu"}, "python_environment_requirements": "Python 3", "example_code": {"import": ["import torch", "import matplotlib.pyplot as plt", "import torchvision"], "use_gpu": "use_gpu = True if torch.cuda.is_available() else False", "load_model": "model = torch.hub.load('facebookresearch/pytorch_GAN_zoo:hub', 'DCGAN', pretrained=True, useGPU=use_gpu)", "build_noise_data": "noise, _ = model.buildNoiseData(num_images)", "generate_images": "with torch.no_grad(): generated_images = model.test(noise)", "plot_images": ["plt.imshow(torchvision.utils.make_grid(generated_images).permute(1, 2, 0).cpu().numpy())", "plt.show()"]}, "performance": {"dataset": "FashionGen", "accuracy": "N/A"}, "description": "DCGAN is a model designed in 2015 by Radford et. al. 
in the paper Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. It is a GAN architecture that is both very simple and efficient for low-resolution image generation (up to 64x64)."}}
{"code": "###Instruction: I am trying to classify an image to find its category. Please give me an API that can identify the content of an image.\n\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'alexnet', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained AlexNet model from PyTorch Hub, which is designed for image classification tasks.', 'code': 'import torch\nmodel = torch.hub.load(\"pytorch/vision\", \"alexnet\", pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='alexnet', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "AlexNet", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='alexnet', pretrained=True)", "api_arguments": {"pretrained": "True"}, "python_environment_requirements": {"torch": ">=1.9.0", "torchvision": ">=0.10.0"}, "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'alexnet', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"top-1_error": 43.45, "top-5_error": 20.91}}, "description": "AlexNet is a deep convolutional neural network that achieved a top-5 error of 15.3% in the 2012 ImageNet Large Scale Visual Recognition Challenge. The main contribution of the original paper was the depth of the model, which was computationally expensive but made feasible through the use of GPUs during training. The pretrained AlexNet model in PyTorch can be used for image classification tasks."}}
{"code": "###Instruction: I would like to convert text to natural sounding speech using Deep Learning. Can you provide me with an API to achieve this?\n###Output: {'domain': 'Text-to-Speech', 'api_call': \"model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_waveglow', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained WaveGlow model from NVIDIA PyTorch Hub to convert text to natural-sounding speech using a speech synthesis system.', 'code': 'import torch\nwaveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_waveglow', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_waveglow', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Text-to-Speech", "framework": "PyTorch", "functionality": "Speech Synthesis", "api_name": "WaveGlow", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_waveglow', pretrained=True)", "api_arguments": {"repo_or_dir": "NVIDIA/DeepLearningExamples:torchhub", "model": "nvidia_waveglow", "model_math": "fp32"}, "python_environment_requirements": ["numpy", "scipy", "librosa", "unidecode", "inflect", "libsndfile1"], "example_code": {"load_waveglow_model": "waveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_waveglow', model_math='fp32')", "prepare_waveglow_model": ["waveglow = waveglow.remove_weightnorm(waveglow)", "waveglow = waveglow.to('cuda')", "waveglow.eval()"], "load_tacotron2_model": "tacotron2 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tacotron2', model_math='fp32')", "prepare_tacotron2_model": ["tacotron2 = tacotron2.to('cuda')", "tacotron2.eval()"], "synthesize_speech": ["text = \"hello world, I missed you so much\"", "utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tts_utils')", "sequences, lengths = utils.prepare_input_sequence([text])", "with torch.no_grad():", " mel, _, _ = 
tacotron2.infer(sequences, lengths)", " audio = waveglow.infer(mel)", "audio_numpy = audio[0].data.cpu().numpy()", "rate = 22050"], "save_audio": "write(\"audio.wav\", rate, audio_numpy)", "play_audio": "Audio(audio_numpy, rate=rate)"}, "performance": {"dataset": "LJ Speech", "accuracy": null}, "description": "The Tacotron 2 and WaveGlow models form a text-to-speech system that enables users to synthesize natural-sounding speech from raw transcripts without any additional prosody information. The Tacotron 2 model produces mel spectrograms from input text using an encoder-decoder architecture. WaveGlow is a flow-based model that consumes the mel spectrograms to generate speech."}}
{"code": "###Instruction: Design a system to diagnose diseases from X-Ray images. Recommend an appropriate API for classifying diseases in the X-Ray images.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'densenet169', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained Densenet-169 model from PyTorch Hub, which can be fine-tuned for diagnosing diseases from X-Ray images.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'densenet169', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='densenet169', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Dense Convolutional Network", "api_name": "Densenet", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='densenet169', pretrained=True)", "api_arguments": [{"name": "densenet169", "type": "str", "description": "Densenet-169 model"}], "python_environment_requirements": {"torch": "latest", "torchvision": "latest"}, "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'densenet169', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"densenet169": {"Top-1 error": 24.0, "Top-5 error": 7.0}}}, "description": "Dense Convolutional Network (DenseNet) connects each layer to every other layer in a feed-forward fashion. It alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and substantially reduces the number of parameters."}}
{"code": "###Instruction: A smartphone company is developing an app that can classify objects in a picture. Provide an API that can achieve this task.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('PingoLH/Pytorch-HarDNet', 'hardnet39ds', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained HarDNet model for image classification from PyTorch Hub to efficiently classify objects in images.', 'code': \"import torch\nmodel = torch.hub.load('PingoLH/Pytorch-HarDNet', 'hardnet39ds', pretrained=True)\"}", "api_call": "torch.hub.load(repo_or_dir='PingoLH/Pytorch-HarDNet', model='hardnet39ds', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "HarDNet", "api_call": "torch.hub.load(repo_or_dir='PingoLH/Pytorch-HarDNet', model='hardnet39ds', pretrained=True)", "api_arguments": [{"name": "hardnet39ds", "type": "str", "description": "HarDNet-39DS model"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('PingoLH/Pytorch-HarDNet', 'hardnet39ds', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"hardnet39ds": {"Top-1 error": 27.92, "Top-5 error": 9.57}}}, "description": "Harmonic DenseNet (HarDNet) is a low memory traffic CNN model, which is fast and efficient. The basic concept is to minimize both computational cost and memory access cost at the same time, such that the HarDNet models are 35% faster than ResNet running on GPU compared to models with the same accuracy (except the two DS models that were designed for comparing with MobileNet)."}}
{"code": "###Instruction: I want to create an app that recognizes items from pictures taken by users. Can you recommend any machine learning API for this purpose?\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('huawei-noah/ghostnet', 'ghostnet_1x', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained GhostNet model that performs efficient image classification by generating more features from cheap operations.', 'code': 'import torch\nmodel = torch.hub.load('huawei-noah/ghostnet', 'ghostnet_1x', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='huawei-noah/ghostnet', model='ghostnet_1x', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Efficient networks by generating more features from cheap operations", "api_name": "GhostNet", "api_call": "torch.hub.load(repo_or_dir='huawei-noah/ghostnet', model='ghostnet_1x', pretrained=True)", "api_arguments": ["pretrained"], "python_environment_requirements": ["torch", "torchvision", "PIL"], "example_code": ["import torch", "from PIL import Image", "from torchvision import transforms", "model = torch.hub.load('huawei-noah/ghostnet', 'ghostnet_1x', pretrained=True)", "model.eval()", "input_image = Image.open(filename)", "preprocess = transforms.Compose([", " transforms.Resize(256),", " transforms.CenterCrop(224),", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)", "probabilities = torch.nn.functional.softmax(output[0], dim=0)", "print(probabilities)"], "performance": {"dataset": "ImageNet", "accuracy": {"Top-1 acc": "73.98", "Top-5 acc": "91.46"}}, "description": "The GhostNet architecture is based on a Ghost module structure which 
generates more features from cheap operations. Based on a set of intrinsic feature maps, a series of cheap operations are applied to generate many ghost feature maps that could fully reveal information underlying intrinsic features. Experiments conducted on benchmarks demonstrate the superiority of GhostNet in terms of speed and accuracy tradeoff."}}
{"code": "###Instruction: Recommend an API that can be used for image classification tasks on a dataset of images.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('PingoLH/Pytorch-HarDNet', 'hardnet85', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained HarDNet-85 model from PyTorch Hub for image classification, which is a low memory traffic CNN model designed to be fast, efficient, and accurate.', 'code': 'import torch\nmodel = torch.hub.load('PingoLH/Pytorch-HarDNet', 'hardnet85', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='PingoLH/Pytorch-HarDNet', model='hardnet85', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "HarDNet", "api_call": "torch.hub.load(repo_or_dir='PingoLH/Pytorch-HarDNet', model='hardnet85', pretrained=True)", "api_arguments": [{"name": "hardnet85", "type": "str", "description": "HarDNet-85 model"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('PingoLH/Pytorch-HarDNet', 'hardnet85', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"hardnet85": {"Top-1 error": 21.96, "Top-5 error": 6.11}}}, "description": "Harmonic DenseNet (HarDNet) is a low memory traffic CNN model, which is fast and efficient. The basic concept is to minimize both computational cost and memory access cost at the same time, such that the HarDNet models are 35% faster than ResNet running on GPU compared to models with the same accuracy (except the two DS models that were designed for comparing with MobileNet)."}}
{"code": "###Instruction: Find out an API that can identify 102 different types of flowers from an image.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('mit-han-lab/ProxylessNAS', 'proxylessnas_mobile', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ProxylessNAS Mobile model from PyTorch Hub, which can be fine-tuned for 102 types of flower identification from an image.', 'code': 'import torch\nmodel = torch.hub.load('mit-han-lab/ProxylessNAS', 'proxylessnas_mobile', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/ProxylessNAS', model='proxylessnas_mobile', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "ProxylessNAS", "api_name": "mit-han-lab/ProxylessNAS", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/ProxylessNAS', model='proxylessnas_mobile', pretrained=True)", "api_arguments": [{"name": "proxylessnas_mobile", "type": "str", "description": "ProxylessNAS optimized for Mobile"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "target_platform = 'proxyless_mobile'", "model = torch.hub.load('mit-han-lab/ProxylessNAS', target_platform, pretrained=True)", "model.eval()"], "performance": {"dataset": [{"model_structure": "proxylessnas_mobile", "accuracy": 74.6}]}, "description": "ProxylessNAS models are from the ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware paper. They specialize CNN architectures for different hardware platforms, offering free yet significant performance boost on all three platforms (CPU, GPU, and Mobile) with similar accuracy."}}
{"code": "###Instruction: Can you recommend an API for image classification which is efficient in terms of computational resources and has decent accuracy?\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'mobilenet_v2', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained MobileNet v2 model from PyTorch Hub for efficient and accurate image classification.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'mobilenet_v2', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='mobilenet_v2', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "MobileNet v2", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='mobilenet_v2', pretrained=True)", "api_arguments": {"pretrained": "True"}, "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision', 'mobilenet_v2', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "preprocess = transforms.Compose([", " transforms.Resize(256),", " transforms.CenterCrop(224),", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)", "probabilities = torch.nn.functional.softmax(output[0], dim=0)"], "performance": {"dataset": "ImageNet", "accuracy": {"top-1_error": 28.12, "top-5_error": 9.71}}, "description": "The MobileNet v2 architecture is based on an inverted residual structure where the input and output of the residual block are thin 
bottleneck layers, in contrast to traditional residual models, which use expanded representations in the input. MobileNet v2 uses lightweight depthwise convolutions to filter features in the intermediate expansion layer. Additionally, non-linearities in the narrow layers were removed in order to maintain representational power."}}
{"code": "###Instruction: A photography service needs a fast algorithm to recognize objects in their images from the ImageNet dataset out of the box. What API should they use?\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': \"Load the pretrained ResNet18 model from PyTorch Hub, which offers fast object recognition using the ImageNet dataset out of the box.\", 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet18', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Deep Residual Networks", "api_name": "ResNet", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet18', pretrained=True)", "api_arguments": [{"name": "pretrained", "type": "bool", "default": "False", "description": "If True, returns a model pre-trained on ImageNet"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"model": "resnet18", "top-1_error": 30.24, "top-5_error": 10.92}}, "description": "ResNet models are deep residual networks pre-trained on ImageNet. They were proposed in the paper 'Deep Residual Learning for Image Recognition'. Available model variants include ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152."}}
{"code": "###Instruction: Can you suggest an API for classifying images in my dataset using a model with spiking neural networks?\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_t', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained SNNMLP (Spiking Neural Network MLP) Tiny model from PyTorch Hub for image classification. It incorporates the mechanism of LIF neurons to achieve better accuracy without extra FLOPs.', 'code': 'import torch\nmodel = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_t', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='huawei-noah/Efficient-AI-Backbones', model='snnmlp_t', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "SNNMLP", "api_call": "torch.hub.load(repo_or_dir='huawei-noah/Efficient-AI-Backbones', model='snnmlp_t', pretrained=True)", "api_arguments": [{"name": "snnmlp_t", "type": "str", "description": "SNNMLP Tiny model"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_t', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "preprocess = transforms.Compose([", " transforms.Resize(256),", " transforms.CenterCrop(224),", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)", "print(torch.nn.functional.softmax(output[0], dim=0))"], "performance": {"dataset": "ImageNet", 
"accuracy": {"model": "SNNMLP Tiny", "top-1": 81.88}}, "description": "SNNMLP incorporates the mechanism of LIF neurons into the MLP models, to achieve better accuracy without extra FLOPs. We propose a full-precision LIF operation to communicate between patches, including horizontal LIF and vertical LIF in different directions. We also propose to use group LIF to extract better local features. With LIF modules, our SNNMLP model achieves 81.9%, 83.3% and 83.6% top-1 accuracy on ImageNet dataset with only 4.4G, 8.5G and 15.2G FLOPs, respectively."}}
{"code": "###Instruction: I am trying to recognize objects in an image using a popular image classification model. Which model should I use?\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'vgg11', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Use the VGG11 model, a popular image classification model, available from PyTorch Hub with pre-trained weights for recognizing objects in images.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'vgg11', pretrained=True)\nmodel.eval()'}", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg11', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Recognition", "api_name": "vgg-nets", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg11', pretrained=True)", "api_arguments": [{"name": "vgg11", "type": "str", "description": "VGG11 model"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'vgg11', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"vgg11": {"Top-1 error": 30.98, "Top-5 error": 11.37}}}, "description": "vgg-nets are award-winning ConvNets from the 2014 Imagenet ILSVRC challenge. They are used for large-scale image recognition tasks. The available models are vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, and vgg19_bn."}}
{"code": "###Instruction: I want to create an app to recognize objects in images. Which API is suitable for this task?\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'vgg19', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Use the pretrained VGG19 model from PyTorch Hub for image recognition to recognize objects in images.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'vgg19', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg19', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Recognition", "api_name": "vgg-nets", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg19', pretrained=True)", "api_arguments": [{"name": "vgg19", "type": "str", "description": "VGG19 model"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'vgg19', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"vgg19": {"Top-1 error": 27.62, "Top-5 error": 9.12}}}, "description": "vgg-nets are award-winning ConvNets from the 2014 Imagenet ILSVRC challenge. They are used for large-scale image recognition tasks. The available models are vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, and vgg19_bn."}}
{"code": "###Instruction: Air Traffic Control needs an image classifier to identify if an image contains an aircraft or not. Suggest an API that would be suitable for this task.\n###Output: {'domain': 'Classification', 'api_call': 'model = torch.hub.load(\"pytorch/vision\", \"vgg19_bn\", pretrained=True)', 'api_provider': 'PyTorch', 'explanation': 'Load the VGG19 model with batch normalization from PyTorch Hub for image recognition tasks, which can be used to classify if an image contains an aircraft.', 'code': 'import torch\nmodel = torch.hub.load(\"pytorch/vision\", \"vgg19_bn\", pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg19_bn', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Recognition", "api_name": "vgg-nets", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='vgg19_bn', pretrained=True)", "api_arguments": [{"name": "vgg19_bn", "type": "str", "description": "VGG19 model with batch normalization"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'vgg19_bn', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"vgg19_bn": {"Top-1 error": 25.76, "Top-5 error": 8.15}}}, "description": "vgg-nets are award-winning ConvNets from the 2014 Imagenet ILSVRC challenge. They are used for large-scale image recognition tasks. The available models are vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, and vgg19_bn."}}
{"code": "###Instruction: A smart fridge wants to identify food items from images taken from its camera. Provide an API to identify the food items.\n###Output: {'domain': 'Food Image Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'wide_resnet50_2', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained Wide ResNet50 model from PyTorch Hub, which can be fine-tuned for food item classification from camera images.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'wide_resnet50_2', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='wide_resnet50_2', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Wide Residual Networks", "api_name": "wide_resnet50_2", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='wide_resnet50_2', pretrained=True)", "api_arguments": "pretrained", "python_environment_requirements": "torch, torchvision", "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'wide_resnet50_2', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"wide_resnet50_2": {"Top-1 error": 21.49, "Top-5 error": 5.91}}}, "description": "Wide Residual networks simply have increased number of channels compared to ResNet. Otherwise the architecture is the same. Deeper ImageNet models with bottleneck block have increased number of channels in the inner 3x3 convolution. The wide_resnet50_2 and wide_resnet101_2 models were trained in FP16 with mixed precision training using SGD with warm restarts. Checkpoints have weights in half precision (except batch norm) for smaller size, and can be used in FP32 models too."}}
{"code": "###Instruction: I want to count how many people are present in a room using an image. Tell me an API that can do this task.\n###Output: {'domain': 'Object Detection', 'api_call': \"model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': \"Load the pretrained NVIDIA SSD model from PyTorch Hub, which can be used to detect and count the number of people present in a room using an input image.\", 'code': \"import torch\nmodel = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd', pretrained=True)\"}", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_ssd', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Object Detection", "framework": "PyTorch", "functionality": "Single Shot MultiBox Detector", "api_name": "SSD", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_ssd', pretrained=True)", "api_arguments": ["model_name"], "python_environment_requirements": ["numpy", "scipy", "scikit-image", "matplotlib"], "example_code": "import torch\n\nssd_model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd')\nutils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd_processing_utils')\n\nssd_model.to('cuda')\nssd_model.eval()\n\ninputs = [utils.prepare_input(uri) for uri in uris]\ntensor = utils.prepare_tensor(inputs)\n\nwith torch.no_grad():\n detections_batch = ssd_model(tensor)\n\nresults_per_input = utils.decode_results(detections_batch)\nbest_results_per_input = [utils.pick_best(results, 0.40) for results in results_per_input]", "performance": {"dataset": {"name": "COCO", "accuracy": "Not provided"}}, "description": "The SSD (Single Shot MultiBox Detector) model is an object detection model based on the paper 'SSD: Single Shot MultiBox Detector'. It uses a deep neural network for detecting objects in images. 
This implementation replaces the obsolete VGG model backbone with the more modern ResNet-50 model. The SSD model is trained on the COCO dataset and can be used to detect objects in images with high accuracy and efficiency."}}
{"code": "###Instruction: I am developing a website that can predict the content of an image based on its URL. What API would you recommend with a code example?\n\n###Input: {\"image_url\": \"https://example.com/image.jpg\"}\n\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_efficientnet_b0', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained EfficientNet model from PyTorch Hub for image classification of the given URL.', 'code': \"import torch\nefficientnet = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_efficientnet_b0', pretrained=True)\nutils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_convnets_processing_utils')\n\nefficientnet.eval()\n\nimage_url = 'https://example.com/image.jpg'\nbatch = utils.prepare_input_from_uri(image_url).unsqueeze(0)\n\nwith torch.no_grad():\n    output = torch.nn.functional.softmax(efficientnet(batch), dim=1)\n\nresults = utils.pick_n_best(predictions=output, n=5)\", 'predicted_labels': ['<label1>', '<label2>', '<label3>', '<label4>', '<label5>']}", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_efficientnet_b0', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "EfficientNet", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_efficientnet_b0', pretrained=True)", "api_arguments": ["model_name", "pretrained"], "python_environment_requirements": ["validators", "matplotlib"], "example_code": "import torch\n\nefficientnet = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_efficientnet_b0', pretrained=True)\nutils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_convnets_processing_utils')\n\nefficientnet.eval().to(device)\n\nbatch = 
torch.cat([utils.prepare_input_from_uri(uri) for uri in uris]).to(device)\n\nwith torch.no_grad():\n output = torch.nn.functional.softmax(efficientnet(batch), dim=1)\n \nresults = utils.pick_n_best(predictions=output, n=5)", "performance": {"dataset": {"name": "IMAGENET", "accuracy": "Not provided"}}, "description": "EfficientNet is a family of image classification models that achieve state-of-the-art accuracy while being smaller and faster. The models are trained with mixed precision using Tensor Cores on the NVIDIA Volta and Ampere GPU architectures. The EfficientNet models include EfficientNet-B0, EfficientNet-B4, EfficientNet-WideSE-B0, and EfficientNet-WideSE-B4. The WideSE models use wider Squeeze-and-Excitation layers than the original EfficientNet models, resulting in slightly better accuracy."}}
{"code": "###Instruction: A wildlife photographer wants to classify animals in images taken during a safari. Provide me with an API that can help classify these animals.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_resnest50_380x380', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained MEAL V2 ResNet50 model from PyTorch Hub for image classification. This model provides high accuracy and can be used to classify animals in wildlife images.', 'code': \"import torch\nmodel = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_resnest50_380x380', pretrained=True)\"}", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_resnest50_380x380', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "MEAL_V2", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_resnest50_380x380', pretrained=True)", "api_arguments": {"model_name": "mealv2_resnest50_380x380"}, "python_environment_requirements": "!pip install timm", "example_code": "import torch\nfrom PIL import Image\nfrom torchvision import transforms\n\nmodel = torch.hub.load('szq0214/MEAL-V2','meal_v2', 'mealv2_resnest50_cutmix', pretrained=True)\nmodel.eval()\n\ninput_image = Image.open('dog.jpg')\npreprocess = transforms.Compose([\n transforms.Resize(256),\n transforms.CenterCrop(224),\n transforms.ToTensor(),\n transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n])\ninput_tensor = preprocess(input_image)\ninput_batch = input_tensor.unsqueeze(0)\n\nif torch.cuda.is_available():\n input_batch = input_batch.to('cuda')\n model.to('cuda')\n\nwith torch.no_grad():\n output = model(input_batch)\nprobabilities = torch.nn.functional.softmax(output[0], dim=0)\nprint(probabilities)", "performance": [{"dataset": "ImageNet", "accuracy": 
{"model": "MEAL-V2 w/ ResNet50", "resolution": "380", "parameters": "25.6M", "top1": "81.72", "top5": "95.81"}}], "description": "MEAL V2 models are from the MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks paper. The method is based on ensemble knowledge distillation via discriminators, and it achieves state-of-the-art results without using common tricks such as architecture modification, outside training data, autoaug/randaug, cosine learning rate, mixup/cutmix training, or label smoothing."}}
{"code": "###Instruction: I want to use my camera app to identify objects that I point it to. What API would you recommend?\n###Output: {'domain': 'Classification', 'api_call': 'model = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_mobilenetv3_small_100', pretrained=True)', 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained MEAL-V2 model with MobileNet V3-Small 1.0 for image classification from PyTorch Hub, which can identify objects in your camera app.', 'code': \"import torch\nmodel = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_mobilenetv3_small_100', pretrained=True)\"}", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_mobilenetv3_small_100', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "MEAL_V2", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_mobilenetv3_small_100', pretrained=True)", "api_arguments": {"model_name": "mealv2_mobilenetv3_small_100"}, "python_environment_requirements": "!pip install timm", "example_code": "import torch\nfrom PIL import Image\nfrom torchvision import transforms\n\nmodel = torch.hub.load('szq0214/MEAL-V2','meal_v2', 'mealv2_resnest50_cutmix', pretrained=True)\nmodel.eval()\n\ninput_image = Image.open('dog.jpg')\npreprocess = transforms.Compose([\n transforms.Resize(256),\n transforms.CenterCrop(224),\n transforms.ToTensor(),\n transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n])\ninput_tensor = preprocess(input_image)\ninput_batch = input_tensor.unsqueeze(0)\n\nif torch.cuda.is_available():\n input_batch = input_batch.to('cuda')\n model.to('cuda')\n\nwith torch.no_grad():\n output = model(input_batch)\nprobabilities = torch.nn.functional.softmax(output[0], dim=0)\nprint(probabilities)", "performance": [{"dataset": "ImageNet", "accuracy": {"model": "MEAL-V2 w/ MobileNet V3-Small 1.0", 
"resolution": "224", "parameters": "2.54M", "top1": "69.65", "top5": "88.71"}}], "description": "MEAL V2 models are from the MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks paper. The method is based on ensemble knowledge distillation via discriminators, and it achieves state-of-the-art results without using common tricks such as architecture modification, outside training data, autoaug/randaug, cosine learning rate, mixup/cutmix training, or label smoothing."}}
{"code": "###Instruction: I am building an image classification model and want to achieve a high accuracy. Which API should I use?\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_mobilenet_v3_large_100', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Use the pretrained MEAL V2 model for image classification from PyTorch Hub to achieve high accuracy without relying on common tricks.', 'code': \"import torch\nmodel = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_mobilenet_v3_large_100', pretrained=True)\"}", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_mobilenet_v3_large_100', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "MEAL_V2", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_mobilenet_v3_large_100', pretrained=True)", "api_arguments": {"model_name": "mealv2_mobilenet_v3_large_100"}, "python_environment_requirements": "!pip install timm", "example_code": "import torch\nfrom PIL import Image\nfrom torchvision import transforms\n\nmodel = torch.hub.load('szq0214/MEAL-V2','meal_v2', 'mealv2_resnest50_cutmix', pretrained=True)\nmodel.eval()\n\ninput_image = Image.open('dog.jpg')\npreprocess = transforms.Compose([\n transforms.Resize(256),\n transforms.CenterCrop(224),\n transforms.ToTensor(),\n transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n])\ninput_tensor = preprocess(input_image)\ninput_batch = input_tensor.unsqueeze(0)\n\nif torch.cuda.is_available():\n input_batch = input_batch.to('cuda')\n model.to('cuda')\n\nwith torch.no_grad():\n output = model(input_batch)\nprobabilities = torch.nn.functional.softmax(output[0], dim=0)\nprint(probabilities)", "performance": [{"dataset": "ImageNet", "accuracy": {"model": "MEAL-V2 w/ MobileNet V3-Large 
1.0", "resolution": "224", "parameters": "5.48M", "top1": "76.92", "top5": "93.32"}}], "description": "MEAL V2 models are from the MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks paper. The method is based on ensemble knowledge distillation via discriminators, and it achieves state-of-the-art results without using common tricks such as architecture modification, outside training data, autoaug/randaug, cosine learning rate, mixup/cutmix training, or label smoothing."}}
{"code": "###Instruction: A photographer at a film studio wants to find the relative depth from a single image. Recommend an API that can compute relative depth from an input image.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('intel-isl/MiDaS', 'DPT_Large', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained MiDaS model from PyTorch Hub, which computes relative inverse depth from a single input image, allowing the photographer to analyze depth information.', 'code': 'import torch\nmodel = torch.hub.load('intel-isl/MiDaS', 'DPT_Large', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='intel-isl/MiDaS', model='DPT_Large', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Computing relative depth from a single image", "api_name": "MiDaS", "api_call": "torch.hub.load(repo_or_dir='intel-isl/MiDaS', model='DPT_Large', pretrained=True)", "api_arguments": {"repo_or_dir": "intel-isl/MiDaS", "model": "model_type"}, "python_environment_requirements": "pip install timm", "example_code": ["import cv2", "import torch", "import urllib.request", "import matplotlib.pyplot as plt", "url, filename = ('https://github.com/pytorch/hub/raw/master/images/dog.jpg', 'dog.jpg')", "urllib.request.urlretrieve(url, filename)", "model_type = 'DPT_Large'", "midas = torch.hub.load('intel-isl/MiDaS', 'DPT_Large')", "device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')", "midas.to(device)", "midas.eval()", "midas_transforms = torch.hub.load('intel-isl/MiDaS', 'transforms')", "if model_type == 'DPT_Large' or model_type == 'DPT_Hybrid':", " transform = midas_transforms.dpt_transform", "else:", " transform = midas_transforms.small_transform", "img = cv2.imread(filename)", "img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)", "input_batch = transform(img).to(device)", "with torch.no_grad():", " prediction = midas(input_batch)", 
"prediction = torch.nn.functional.interpolate(", " prediction.unsqueeze(1),", " size=img.shape[:2],", " mode='bicubic',", " align_corners=False,", ").squeeze()", "output = prediction.cpu().numpy()", "plt.imshow(output)", "plt.show()"], "performance": {"dataset": "10 distinct datasets", "accuracy": "Multi-objective optimization"}, "description": "MiDaS computes relative inverse depth from a single image. The repository provides multiple models that cover different use cases ranging from a small, high-speed model to a very large model that provides the highest accuracy. The models have been trained on 10 distinct datasets using multi-objective optimization to ensure high quality on a wide range of inputs."}}
{"code": "###Instruction: A bird watching society is developing an app that can identify birds in a picture. Provide a suitable API that can be used for classifying birds from images.\n###Output: {'domain': 'Image Classification', 'api_call': 'model = torch.hub.load('nicolalandro/ntsnet-cub200', 'ntsnet', pretrained=True, **{'topN': 6, 'device':'cpu', 'num_classes': 200})', 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained NTS-Net model from PyTorch Hub specifically designed for fine-grained classification of bird species using the CUB200 2011 dataset.', 'code': 'import torch\nmodel = torch.hub.load('nicolalandro/ntsnet-cub200', 'ntsnet', pretrained=True, **{'topN': 6, 'device':'cpu', 'num_classes': 200})'}\n", "api_call": "torch.hub.load(repo_or_dir='nicolalandro/ntsnet-cub200', model='ntsnet', pretrained=True, **{'topN': 6, 'device':'cpu', 'num_classes': 200})", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Fine-grained image classifier", "api_name": "ntsnet", "api_call": "torch.hub.load(repo_or_dir='nicolalandro/ntsnet-cub200', model='ntsnet', pretrained=True, **{'topN': 6, 'device':'cpu', 'num_classes': 200})", "api_arguments": {"pretrained": "True", "topN": "6", "device": "cpu", "num_classes": "200"}, "python_environment_requirements": ["torch", "torchvision", "PIL"], "example_code": "from torchvision import transforms\nimport torch\nimport urllib\nfrom PIL import Image\n\ntransform_test = transforms.Compose([\n transforms.Resize((600, 600), Image.BILINEAR),\n transforms.CenterCrop((448, 448)),\n transforms.ToTensor(),\n transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),\n])\n\nmodel = torch.hub.load('nicolalandro/ntsnet-cub200', 'ntsnet', pretrained=True, **{'topN': 6, 'device':'cpu', 'num_classes': 200})\nmodel.eval()\n\nurl = 'https://raw.githubusercontent.com/nicolalandro/ntsnet-cub200/master/images/nts-net.png'\nimg = 
Image.open(urllib.request.urlopen(url))\nscaled_img = transform_test(img)\ntorch_images = scaled_img.unsqueeze(0)\n\nwith torch.no_grad():\n top_n_coordinates, concat_out, raw_logits, concat_logits, part_logits, top_n_index, top_n_prob = model(torch_images)\n\n_, predict = torch.max(concat_logits, 1)\npred_id = predict.item()\nprint('bird class:', model.bird_classes[pred_id])", "performance": {"dataset": "CUB200 2011", "accuracy": "Not provided"}, "description": "This is an NTS-Net pretrained on the CUB200 2011 dataset, a fine-grained dataset of bird species."}}
{"code": "###Instruction: Provide an API recommendation for a call center which wants to convert customer voice calls into text.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('snakers4/silero-models', 'silero_stt', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained Silero Models Speech-to-Text model from PyTorch Hub for converting customer voice calls into text. It supports English, German, and Spanish languages.', 'code': 'import torch\nmodel = torch.hub.load(\"snakers4/silero-models\", \"silero_stt\", pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Speech-To-Text", "api_name": "Silero Models", "api_call": "torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', pretrained=True)", "api_arguments": {"repo_or_dir": "snakers4/silero-models", "model": "silero_stt", "language": ["en", "de", "es"], "device": "device"}, "python_environment_requirements": ["pip install -q torchaudio omegaconf soundfile"], "example_code": ["import torch", "import zipfile", "import torchaudio", "from glob import glob", "device = torch.device('cpu')", "model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language='en', device=device)", "(read_batch, split_into_batches, read_audio, prepare_model_input) = utils", "torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst ='speech_orig.wav', progress=True)", "test_files = glob('speech_orig.wav')", "batches = split_into_batches(test_files, batch_size=10)", "input = prepare_model_input(read_batch(batches[0]), device=device)", "output = model(input)", "for example in output:", " print(decoder(example.cpu()))"], "performance": {"dataset": "Open-STT", "accuracy": "See the wiki for quality and 
performance benchmarks"}, "description": "Silero Speech-To-Text models provide enterprise grade STT in a compact form-factor for several commonly spoken languages. The models are robust to a variety of dialects, codecs, domains, noises, and lower sampling rates. They consume a normalized audio in the form of samples and output frames with token probabilities. A decoder utility is provided for simplicity."}}
{"code": "###Instruction: Provide me with an API that can tackle city-scape segmentation in autonomous driving application.\n###Output: {'domain': 'Semantic Segmentation', 'api_call': \"model = torch.hub.load('pytorch/vision', 'deeplabv3_resnet101', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pre-trained DeepLabV3 model with a ResNet-101 backbone from PyTorch Hub for semantic segmentation tasks, suitable for city-scape segmentation in autonomous driving applications.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'deeplabv3_resnet101', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='deeplabv3_resnet101', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Semantic Segmentation", "framework": "PyTorch", "functionality": "DeepLabV3", "api_name": "torch.hub.load", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='deeplabv3_resnet101', pretrained=True)", "api_arguments": {"pretrained": "True"}, "python_environment_requirements": {"torch": "latest", "torchvision": "latest", "PIL": "latest", "matplotlib": "latest"}, "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'deeplabv3_resnet101', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "input_image = input_image.convert('RGB')", "preprocess = transforms.Compose([", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)['out'][0]", "output_predictions = output.argmax(0)"], "performance": {"dataset": "COCO val2017", "accuracy": [{"model": "deeplabv3_resnet101", "Mean IOU": 67.4, "Global Pixelwise 
Accuracy": 92.4}]}, "description": "DeepLabV3 models with ResNet-50, ResNet-101 and MobileNet-V3 backbones for semantic segmentation. The pre-trained models have been trained on a subset of COCO train2017, on the 20 categories that are present in the Pascal VOC dataset."}}
{"code": "###Instruction: I need an API to extract features from a collection of photographs taken at the 2022 Olympics.\n###Output: {'domain': 'Feature Extraction', 'api_call': \"model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x8d_wsl', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ResNext-101 32x8d WSL model from PyTorch Hub to extract features from a collection of photographs taken at the 2022 Olympics.', 'code': 'import torch\nmodel = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x8d_wsl', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='facebookresearch/WSL-Images', model='resnext101_32x8d_wsl', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "ResNext WSL", "api_call": "torch.hub.load(repo_or_dir='facebookresearch/WSL-Images', model='resnext101_32x8d_wsl', pretrained=True)", "api_arguments": [{"name": "resnext101_32x8d_wsl", "type": "str", "description": "ResNeXt-101 32x8d WSL model"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x8d_wsl')", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "preprocess = transforms.Compose([", " transforms.Resize(256),", " transforms.CenterCrop(224),", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)", "print(output[0])", "print(torch.nn.functional.softmax(output[0], dim=0))"], "performance": {"dataset": "ImageNet", "accuracy": {"ResNeXt-101 32x8d": 
{"Top-1 Acc.": "82.2", "Top-5 Acc.": "96.4"}}}, "description": "The provided ResNeXt models are pre-trained in weakly-supervised fashion on 940 million public images with 1.5K hashtags matching with 1000 ImageNet1K synsets, followed by fine-tuning on ImageNet1K dataset. The models significantly improve the training accuracy on ImageNet compared to training from scratch. They achieve state-of-the-art accuracy of 85.4% on ImageNet with the ResNext-101 32x48d model."}}
{"code": "###Instruction: An E-commerce manager wants to develop an image classification system for their products. They need a powerful pre-trained model as a starting point. Recommend an API for this purpose.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x32d_wsl', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ResNext WSL model from PyTorch Hub, which is suitable as a starting point for e-commerce product image classification.', 'code': 'import torch\nmodel = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x32d_wsl', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='facebookresearch/WSL-Images', model='resnext101_32x32d_wsl', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "ResNext WSL", "api_call": "torch.hub.load(repo_or_dir='facebookresearch/WSL-Images', model='resnext101_32x32d_wsl', pretrained=True)", "api_arguments": [{"name": "resnext101_32x32d_wsl", "type": "str", "description": "ResNeXt-101 32x32d WSL model"}], "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x32d_wsl')", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "preprocess = transforms.Compose([", " transforms.Resize(256),", " transforms.CenterCrop(224),", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)", "print(output[0])", "print(torch.nn.functional.softmax(output[0], 
dim=0))"], "performance": {"dataset": "ImageNet", "accuracy": {"ResNeXt-101 32x32d": {"Top-1 Acc.": "85.1", "Top-5 Acc.": "97.5"}}}, "description": "The provided ResNeXt models are pre-trained in weakly-supervised fashion on 940 million public images with 1.5K hashtags matching with 1000 ImageNet1K synsets, followed by fine-tuning on ImageNet1K dataset. The models significantly improve the training accuracy on ImageNet compared to training from scratch. They achieve state-of-the-art accuracy of 85.4% on ImageNet with the ResNext-101 32x48d model."}}
{"code": "###Instruction: I need an API to classify images with known objects. Suggest a suitable model that can do this.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'densenet169', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained Densenet-169 model from PyTorch Hub for image classification of known objects.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'densenet169', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='densenet169', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Dense Convolutional Network", "api_name": "Densenet", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='densenet169', pretrained=True)", "api_arguments": [{"name": "densenet169", "type": "str", "description": "Densenet-169 model"}], "python_environment_requirements": {"torch": "latest", "torchvision": "latest"}, "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'densenet169', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"densenet169": {"Top-1 error": 24.0, "Top-5 error": 7.0}}}, "description": "Dense Convolutional Network (DenseNet) connects each layer to every other layer in a feed-forward fashion. It alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and substantially reduces the number of parameters."}}
{"code": "###Instruction: A delivery company wants to recognize if a package is damaged during shipment. Propose an API that can classify images into damaged and undamaged packages.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('XingangPan/IBN-Net', 'resnet101_ibn_a', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ResNet-101-IBN-a model from IBN-Net using PyTorch Hub, which can be fine-tuned for classifying images of packages into damaged and undamaged categories.', 'code': 'import torch\nmodel = torch.hub.load(\"XingangPan/IBN-Net\", \"resnet101_ibn_a\", pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='XingangPan/IBN-Net', model='resnet101_ibn_a', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "IBN-Net", "api_name": "torch.hub.load", "api_call": "torch.hub.load(repo_or_dir='XingangPan/IBN-Net', model='resnet101_ibn_a', pretrained=True)", "api_arguments": [{"name": "resnet101_ibn_a", "type": "str", "description": "ResNet-101-IBN-a model"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('XingangPan/IBN-Net', 'resnet101_ibn_a', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"resnet101_ibn_a": {"Top-1 acc": 78.61, "Top-5 acc": 94.41}}}, "description": "IBN-Net is a CNN model with domain/appearance invariance. Motivated by style transfer works, IBN-Net carefully unifies instance normalization and batch normalization in a single deep network. It provides a simple way to increase both modeling and generalization capacities without adding model complexity. IBN-Net is especially suitable for cross domain or person/vehicle re-identification tasks."}}
{"code": "###Instruction: An image recognition app needs to identify objects from the images it captures. Suggest an API which is optimized for GPUs.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('mit-han-lab/ProxylessNAS', 'proxylessnas_gpu', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ProxylessNAS model optimized for GPUs from PyTorch Hub for object recognition in captured images.', 'code': 'import torch\nmodel = torch.hub.load('mit-han-lab/ProxylessNAS', 'proxylessnas_gpu', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/ProxylessNAS', model='proxylessnas_gpu', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "ProxylessNAS", "api_name": "mit-han-lab/ProxylessNAS", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/ProxylessNAS', model='proxylessnas_gpu', pretrained=True)", "api_arguments": [{"name": "proxylessnas_gpu", "type": "str", "description": "ProxylessNAS optimized for GPU"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "target_platform = 'proxyless_gpu'", "model = torch.hub.load('mit-han-lab/ProxylessNAS', target_platform, pretrained=True)", "model.eval()"], "performance": {"dataset": [{"model_structure": "proxylessnas_gpu", "accuracy": 75.1}]}, "description": "ProxylessNAS models are from the ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware paper. They specialize CNN architectures for different hardware platforms, offering free yet significant performance boost on all three platforms (CPU, GPU, and Mobile) with similar accuracy."}}
{"code": "###Instruction: Show me an API that provides easy to use neural networks for classifying different types of wildlife on mobile platforms.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('mit-han-lab/ProxylessNAS', 'proxylessnas_mobile', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': \"Load ProxylessNAS model specialized for mobile platforms from PyTorch Hub, which can be fine-tuned for classifying different types of wildlife.\", 'code': 'import torch\nmodel = torch.hub.load('mit-han-lab/ProxylessNAS', 'proxylessnas_mobile', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/ProxylessNAS', model='proxylessnas_mobile', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "ProxylessNAS", "api_name": "mit-han-lab/ProxylessNAS", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/ProxylessNAS', model='proxylessnas_mobile', pretrained=True)", "api_arguments": [{"name": "proxylessnas_mobile", "type": "str", "description": "ProxylessNAS optimized for Mobile"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "target_platform = 'proxyless_mobile'", "model = torch.hub.load('mit-han-lab/ProxylessNAS', target_platform, pretrained=True)", "model.eval()"], "performance": {"dataset": [{"model_structure": "proxylessnas_mobile", "accuracy": 74.6}]}, "description": "ProxylessNAS models are from the ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware paper. They specialize CNN architectures for different hardware platforms, offering free yet significant performance boost on all three platforms (CPU, GPU, and Mobile) with similar accuracy."}}
{"code": "###Instruction: Recommend an API for identifying defective parts in a manufacturing assembly line based on images taken by an inspection system.\n###Output: {'domain': 'Classification', 'api_call': \"model = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': \"Load the pretrained ResNet18 model from PyTorch Hub, which can be fine-tuned for identifying defective parts in a manufacturing assembly line based on images.\", 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet18', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Deep Residual Networks", "api_name": "ResNet", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet18', pretrained=True)", "api_arguments": [{"name": "pretrained", "type": "bool", "default": "False", "description": "If True, returns a model pre-trained on ImageNet"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"model": "resnet18", "top-1_error": 30.24, "top-5_error": 10.92}}, "description": "ResNet models are deep residual networks pre-trained on ImageNet. They were proposed in the paper 'Deep Residual Learning for Image Recognition'. Available model variants include ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152."}}
{"code": "###Instruction: Identify an image classification API that can be used to determine if an object is a car, a bike, or a pedestrian.\n###Output: {'domain': 'Image Classification', 'api_call': 'model = torch.hub.load('pytorch/vision', 'resnet34', pretrained=True)', 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained ResNet34 model from PyTorch Hub, which can be fine-tuned to classify objects as cars, bikes, or pedestrians.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'resnet34', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet34', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Deep Residual Networks", "api_name": "ResNet", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='resnet34', pretrained=True)", "api_arguments": [{"name": "pretrained", "type": "bool", "default": "False", "description": "If True, returns a model pre-trained on ImageNet"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet34', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"model": "resnet34", "top-1_error": 26.7, "top-5_error": 8.58}}, "description": "ResNet models are deep residual networks pre-trained on ImageNet. They were proposed in the paper 'Deep Residual Learning for Image Recognition'. Available model variants include ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152."}}
{"code": "###Instruction: I need an API to classify images efficiently without sacrificing too much accuracy. Can you provide me with one?\n###Output: {'domain': 'Image Classification', 'api_call': 'model = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_t', pretrained=True)', 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained SNNMLP Tiny model from PyTorch Hub, which incorporates LIF neurons for efficient image classification without sacrificing accuracy.', 'code': 'import torch\nmodel = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_t', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='huawei-noah/Efficient-AI-Backbones', model='snnmlp_t', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "SNNMLP", "api_call": "torch.hub.load(repo_or_dir='huawei-noah/Efficient-AI-Backbones', model='snnmlp_t', pretrained=True)", "api_arguments": [{"name": "snnmlp_t", "type": "str", "description": "SNNMLP Tiny model"}], "python_environment_requirements": ["torch", "torchvision", "PIL", "urllib"], "example_code": ["import torch", "model = torch.hub.load('huawei-noah/Efficient-AI-Backbones', 'snnmlp_t', pretrained=True)", "model.eval()", "from PIL import Image", "from torchvision import transforms", "input_image = Image.open(filename)", "preprocess = transforms.Compose([", " transforms.Resize(256),", " transforms.CenterCrop(224),", " transforms.ToTensor(),", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),", "])", "input_tensor = preprocess(input_image)", "input_batch = input_tensor.unsqueeze(0)", "if torch.cuda.is_available():", " input_batch = input_batch.to('cuda')", " model.to('cuda')", "with torch.no_grad():", " output = model(input_batch)", "print(torch.nn.functional.softmax(output[0], dim=0))"], "performance": {"dataset": "ImageNet", "accuracy": {"model": "SNNMLP Tiny", "top-1": 81.88}}, 
"description": "SNNMLP incorporates the mechanism of LIF neurons into the MLP models, to achieve better accuracy without extra FLOPs. We propose a full-precision LIF operation to communicate between patches, including horizontal LIF and vertical LIF in different directions. We also propose to use group LIF to extract better local features. With LIF modules, our SNNMLP model achieves 81.9%, 83.3% and 83.6% top-1 accuracy on ImageNet dataset with only 4.4G, 8.5G and 15.2G FLOPs, respectively."}}
{"code": "###Instruction: To save the environment, a student wants to evaluate how green his school's area is. Tell me an AI API which can classify the images of plants in his environment and tell the name of the plants.\n###Output: {'domain': 'Plant Classification', 'api_call': 'model = torch.hub.load('pytorch/vision', 'wide_resnet101_2', pretrained=True)', 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained Wide ResNet101 model from PyTorch Hub, which can be fine-tuned for plant classification and identify the names of the plants in the given environment using image classification.', 'code': 'import torch\nmodel = torch.hub.load('pytorch/vision', 'wide_resnet101_2', pretrained=True)'}", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='wide_resnet101_2', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Wide Residual Networks", "api_name": "wide_resnet101_2", "api_call": "torch.hub.load(repo_or_dir='pytorch/vision', model='wide_resnet101_2', pretrained=True)", "api_arguments": "pretrained", "python_environment_requirements": "torch, torchvision", "example_code": ["import torch", "model = torch.hub.load('pytorch/vision:v0.10.0', 'wide_resnet101_2', pretrained=True)", "model.eval()"], "performance": {"dataset": "ImageNet", "accuracy": {"wide_resnet101_2": {"Top-1 error": 21.16, "Top-5 error": 5.72}}}, "description": "Wide Residual networks simply have increased number of channels compared to ResNet. Otherwise the architecture is the same. Deeper ImageNet models with bottleneck block have increased number of channels in the inner 3x3 convolution. The wide_resnet50_2 and wide_resnet101_2 models were trained in FP16 with mixed precision training using SGD with warm restarts. Checkpoints have weights in half precision (except batch norm) for smaller size, and can be used in FP32 models too."}}
{"code": "###Instruction: I need an efficient API to classify images on multiple edge devices with different resource constraints. Suggest one for me.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('mit-han-lab/once-for-all', 'ofa_supernet_mbv3_w10', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained Once-for-all (OFA) networks from PyTorch Hub, which are designed for efficient image classification across various edge devices with different resource constraints.', 'code': 'import torch\nmodel = torch.hub.load('mit-han-lab/once-for-all', 'ofa_supernet_mbv3_w10', pretrained=True)'}\n", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/once-for-all', model='ofa_supernet_mbv3_w10', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Once-for-all (OFA) Networks", "api_name": "torch.hub.load", "api_call": "torch.hub.load(repo_or_dir='mit-han-lab/once-for-all', model='ofa_supernet_mbv3_w10', pretrained=True)", "api_arguments": {"repository": "mit-han-lab/once-for-all", "model": "ofa_supernet_mbv3_w10", "pretrained": "True"}, "python_environment_requirements": ["torch", "torchvision"], "example_code": ["import torch", "super_net_name = 'ofa_supernet_mbv3_w10'", "super_net = torch.hub.load('mit-han-lab/once-for-all', super_net_name, pretrained=True).eval()"], "performance": {"description": "OFA networks outperform state-of-the-art NAS methods (up to 4.0% ImageNet top1 accuracy improvement over MobileNetV3, or same accuracy but 1.5x faster than MobileNetV3, 2.6x faster than EfficientNet w.r.t measured latency) while reducing many orders of magnitude GPU hours and CO2 emission."}, "description": "Once-for-all (OFA) networks are a family of neural networks designed by MIT Han Lab. They decouple training and search, achieving efficient inference across various edge devices and resource constraints. 
OFA networks are pretrained on the IMAGENET dataset and are capable of classifying images into different categories."}}
{"code": "###Instruction: I want my app to be able to read aloud the text for audiobooks. Can you suggest me an API for converting text to speech?\n###Output: {'domain': 'Text-to-Speech', 'api_call': \"model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tacotron2', model_math='fp16')\", 'api_provider': 'PyTorch', 'explanation': 'Load the NVIDIA Tacotron 2 model from PyTorch Hub, which generates mel spectrograms from input text, to convert text to natural-sounding speech. Pair it with the WaveGlow model for final speech generation.', 'code': 'import torch\nmodel = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tacotron2', model_math=\"fp16\")'}", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_tacotron2', model_math='fp16')", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Text-to-Speech", "api_name": "Tacotron 2", "api_call": "torch.hub.load(repo_or_dir='NVIDIA/DeepLearningExamples:torchhub', model='nvidia_tacotron2', model_math='fp16')", "api_arguments": {"model_math": "fp16"}, "python_environment_requirements": ["numpy", "scipy", "librosa", "unidecode", "inflect", "libsndfile1"], "example_code": ["import torch", "tacotron2 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tacotron2', model_math='fp16')", "tacotron2 = tacotron2.to('cuda')", "tacotron2.eval()", "text = 'Hello world, I missed you so much.'", "utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tts_utils')", "sequences, lengths = utils.prepare_input_sequence([text])", "with torch.no_grad():", " mel, _, _ = tacotron2.infer(sequences, lengths)", " audio = waveglow.infer(mel)", "audio_numpy = audio[0].data.cpu().numpy()", "rate = 22050"], "performance": {"dataset": "LJ Speech", "accuracy": "Not specified"}, "description": "The Tacotron 2 model generates mel spectrograms from input text using an encoder-decoder architecture, and it is 
designed for generating natural-sounding speech from raw transcripts without any additional prosody information. This implementation uses Dropout instead of Zoneout to regularize the LSTM layers. The WaveGlow model (also available via torch.hub) is a flow-based model that consumes the mel spectrograms to generate speech."}}
{"code": "###Instruction: An app wants to identify dog breeds from images taken by users. Recommend an API that can classify the dog breed given a photo of a dog.\n###Output: {'domain': 'Image Classification', 'api_call': \"model = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_efficientnet_b0', pretrained=True)\", 'api_provider': 'PyTorch', 'explanation': 'Load the pretrained MEAL V2 EfficientNet-B0 model from PyTorch Hub, which can be fine-tuned for dog breed classification given a photo of a dog.', 'code': \"import torch\nmodel = torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_efficientnet_b0', pretrained=True)\"}", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_efficientnet_b0', pretrained=True)", "provider": "PyTorch", "api_data": {"domain": "Classification", "framework": "PyTorch", "functionality": "Image Classification", "api_name": "MEAL_V2", "api_call": "torch.hub.load('szq0214/MEAL-V2', 'meal_v2', 'mealv2_efficientnet_b0', pretrained=True)", "api_arguments": {"model_name": "mealv2_efficientnet_b0"}, "python_environment_requirements": "!pip install timm", "example_code": "import torch\nfrom PIL import Image\nfrom torchvision import transforms\n\nmodel = torch.hub.load('szq0214/MEAL-V2','meal_v2', 'mealv2_resnest50_cutmix', pretrained=True)\nmodel.eval()\n\ninput_image = Image.open('dog.jpg')\npreprocess = transforms.Compose([\n transforms.Resize(256),\n transforms.CenterCrop(224),\n transforms.ToTensor(),\n transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n])\ninput_tensor = preprocess(input_image)\ninput_batch = input_tensor.unsqueeze(0)\n\nif torch.cuda.is_available():\n input_batch = input_batch.to('cuda')\n model.to('cuda')\n\nwith torch.no_grad():\n output = model(input_batch)\nprobabilities = torch.nn.functional.softmax(output[0], dim=0)\nprint(probabilities)", "performance": [{"dataset": "ImageNet", "accuracy": {"model": "MEAL-V2 w/ EfficientNet-B0", 
"resolution": "224", "parameters": "5.29M", "top1": "78.29", "top5": "93.95"}}], "description": "MEAL V2 models are from the MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks paper. The method is based on ensemble knowledge distillation via discriminators, and it achieves state-of-the-art results without using common tricks such as architecture modification, outside training data, autoaug/randaug, cosine learning rate, mixup/cutmix training, or label smoothing."}}