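Using Cloud Services

    Cloud platforms provide powerful hardware and infrastructure for training and deploying deep learning models. Select a cloud platform below to get started with PyTorch.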


    Using PyTorch with AWS

    To get the full experience of what PyTorch has to offer, you need a machine with at least one dedicated NVIDIA GPU. Because it is not always practical to own a machine with these specifications, cloud-based solutions let you test and use PyTorch’s full feature set.

    AWS provides both:

    • Deep Learning AMIs: dedicated, pre-built machine learning instances, complete with PyTorch
    • Deep Learning Base AMI: bare Linux and Windows instances for you to do a custom install of PyTorch.

    Quick Start on Deep Learning AMI

    If you want to get started with a Linux AWS instance that has PyTorch already installed and that you can log in to from the command line, this step-by-step guide will help you do that.

    1. Sign into your AWS console. If you do not have an AWS account, see the primer below.
    2. Click on Launch a virtual machine.
    3. Select Deep Learning AMI (Ubuntu).

      This gives you an instance with a pre-defined version of PyTorch already installed. If you want a bare AWS instance on which you install PyTorch yourself, choose the Deep Learning Base AMI (Ubuntu) instead; it provides the hardware without PyTorch preinstalled.

    4. Choose a GPU compute p3.2xlarge instance type.

      You can try PyTorch on any of the available instance types, even the free tier, but for best performance a GPU compute or compute-optimized instance is recommended. Other options include the compute-optimized c5 series (e.g., c5.2xlarge) or the general-purpose t2 or t3 series (e.g., t2.2xlarge). Note that on an instance without a GPU, PyTorch runs in CPU-only mode, and operations may take much longer.

    5. Click on Review and Launch.
    6. Review the instance information and click Launch.
    7. Create a new key pair if you do not already have one. Pick a name and download it locally via the Download Key Pair button.
    8. Now click on Launch Instances. You now have a live instance to use for PyTorch. If you click on View Instances, you will see your running instance.
    9. Take note of the Public DNS, as you will use it to ssh into your instance from the command line.
    10. Open a command-line prompt.
    11. Ensure that your key-pair has the proper permissions, or you will not be able to log in. Type chmod 400 path/to/downloaded/key-pair.pem.
    12. Type ssh -i path/to/downloaded/key-pair.pem ubuntu@<Public DNS that you noted above>. e.g., ssh -i ~/Downloads/aws-quick-start.pem ubuntu@ec2-55-181-112-129.us-west-2.compute.amazonaws.com. If asked to continue connection, type yes.
    13. You should now see a prompt similar to ubuntu@ip-100-30-20-95. If so, you are now connected to your instance.
    14. Verify that PyTorch is installed by running the verification steps below.

      If you chose the Deep Learning Base AMI (Ubuntu) instead of the Deep Learning AMI (Ubuntu), then you will need to install PyTorch. Follow the Linux getting started instructions in order to install it.
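    Steps 11 and 12 can also be scripted. The sketch below is a hypothetical helper pair (`secure_key` and `ssh_command` are illustrative names, not part of any AWS tool): it applies the same owner-read-only permission as `chmod 400` and builds the same `ssh -i ... ubuntu@<Public DNS>` argument list. It assumes Linux or macOS; on Windows, `os.chmod` only toggles the read-only flag.

```python
import os
import shlex
import stat


def secure_key(key_path: str) -> None:
    """Equivalent of `chmod 400 key_path`: owner may read, nobody else has access."""
    os.chmod(os.path.expanduser(key_path), stat.S_IRUSR)


def ssh_command(key_path: str, public_dns: str, user: str = "ubuntu") -> list[str]:
    """Build the argument list for `ssh -i key_path user@public_dns`."""
    return ["ssh", "-i", os.path.expanduser(key_path), f"{user}@{public_dns}"]


def printable(cmd: list[str]) -> str:
    """Render the argument list as a shell-safe string you can paste into a terminal."""
    return shlex.join(cmd)
```

    For example, `secure_key("~/Downloads/aws-quick-start.pem")` followed by `subprocess.run(ssh_command("~/Downloads/aws-quick-start.pem", "<Public DNS>"))` would reproduce steps 11 and 12. Returning a list rather than a single string avoids shell-quoting problems with paths that contain spaces.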

    Quick Start Verification

    To ensure that PyTorch was installed correctly, we can verify the installation by running sample PyTorch code. Here we will construct a randomly initialized tensor.

    import torch
    x = torch.rand(5, 3)
    print(x)
    

    The output should be something similar to:

    tensor([[0.3380, 0.3845, 0.3217],
            [0.8337, 0.9050, 0.2650],
            [0.2979, 0.7141, 0.9069],
            [0.1449, 0.1132, 0.1375],
            [0.4675, 0.3947, 0.1426]])
    

    Additionally, to check whether your GPU driver and CUDA are enabled and accessible by PyTorch, run the following commands, which print True if the CUDA driver is enabled and False otherwise:

    import torch
    print(torch.cuda.is_available())
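    In practice, scripts usually turn this check into a device choice so the same code runs on a GPU instance or falls back to the slower CPU-only mode. A minimal sketch (tensor sizes are arbitrary):

```python
import torch

# Use the GPU when PyTorch can see a CUDA device; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Create the tensors directly on the chosen device and run a small matrix multiply.
a = torch.rand(5, 3, device=device)
b = torch.rand(3, 4, device=device)
c = a @ b
print(c.shape)  # torch.Size([5, 4])
```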
    

    AWS Primer

    Generally, you will use Amazon Elastic Compute Cloud (EC2) to spin up your instances. Amazon offers various instance types, each configured for specific use cases. For PyTorch, the accelerated computing instances that feature GPUs or custom AI/ML accelerators are highly recommended, as they are tailored to the high compute needs of machine learning.

    In order to use AWS, you need to set up an AWS account, if you do not have one already. You will create a username (your email address), password and an AWS account name (since you can create multiple AWS accounts for different purposes). You will also provide contact and billing information. The billing information is important because while AWS does provide what they call “free-tier” instances, to use PyTorch you will want more powerful, paid instances.

    Once you are logged in, you will be brought to your AWS console. You can even learn more about AWS through a set of simple tutorials.

    AWS Inferentia-based instances

    AWS Inferentia is a custom chip built by AWS to provide high-performance, low-cost machine learning inference in the cloud. Amazon EC2 Inf1 instances feature up to 16 AWS Inferentia chips, second-generation Intel Xeon Scalable processors, and up to 100 Gbps networking for high-throughput, lowest-cost inference in the cloud. You can use Inf1 instances with Amazon SageMaker for a fully managed workflow, or directly use the AWS Neuron SDK, which is integrated with PyTorch.

    GPU-based instances

    Amazon EC2 P4d instances deliver the highest performance for machine learning training on AWS. They are powered by NVIDIA A100 Tensor Core GPUs and feature 400 Gbps instance networking, a first in the cloud. P4d instances are deployed in hyperscale clusters called EC2 UltraClusters, which comprise more than 4,000 NVIDIA A100 GPUs, petabit-scale non-blocking networking, and scalable low-latency storage with FSx for Lustre. Each EC2 UltraCluster provides supercomputer-class performance for the most complex multi-node ML training tasks.
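    From PyTorch's side, multi-node training on clusters like these typically goes through `torch.distributed`. The sketch below is only an illustration under simplifying assumptions: a single process (`world_size=1`), the CPU `gloo` backend, and a file-based rendezvous so it runs on any machine. On real GPU clusters you would typically launch one process per GPU (for example with `torchrun`) and use the `nccl` backend; `single_process_all_reduce` is a hypothetical name.

```python
import os
import tempfile

import torch
import torch.distributed as dist


def single_process_all_reduce() -> torch.Tensor:
    # A file-based rendezvous avoids having to pick a free TCP port for this demo.
    init_file = os.path.join(tempfile.mkdtemp(), "dist_init")
    # "gloo" works on CPU; multi-GPU nodes normally use "nccl".
    dist.init_process_group(
        backend="gloo",
        init_method=f"file://{init_file}",
        rank=0,
        world_size=1,
    )
    try:
        t = torch.ones(3)
        # Sums the tensor across all ranks in place; with one rank it is unchanged.
        dist.all_reduce(t, op=dist.ReduceOp.SUM)
        return t
    finally:
        dist.destroy_process_group()
```

    With N ranks each contributing `torch.ones(3)`, the same call would leave every rank holding a tensor of N's; gradient synchronization in `DistributedDataParallel` relies on this kind of collective.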

    For ML inference, AWS Inferentia-based Inf1 instances provide the lowest cost inference in the cloud. Additionally, Amazon EC2 G4dn instances featuring NVIDIA T4 GPUs are optimized for GPU-based machine learning inference and small scale training that leverage NVIDIA libraries.
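    Whichever instance type you pick for inference, the PyTorch side looks roughly like the sketch below: switch the model to evaluation mode and run it under `torch.inference_mode()` to skip autograd bookkeeping. The tiny model here is a placeholder assumption; substitute your trained network.

```python
import torch
import torch.nn as nn

# A tiny stand-in model; replace with your trained network.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device).eval()  # eval() puts dropout/batch-norm layers in inference behavior

batch = torch.rand(16, 4, device=device)
with torch.inference_mode():  # disables gradient tracking for faster, leaner inference
    logits = model(batch)
    preds = logits.argmax(dim=1)
print(preds.shape)  # torch.Size([16])
```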

    Creating and Launching an Instance

    Once you have decided on your instance type, you need to create, optionally configure, and launch your instance. You can connect to your instance from a web browser or a command-line interface. AWS documentation provides instance launch guides for various platforms.
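    Launching can also be done programmatically with boto3, the AWS SDK for Python, whose EC2 client exposes `run_instances`. The sketch below assumes boto3 is installed and AWS credentials are configured; `build_launch_params` and `launch_instance` are hypothetical helper names, the region default is an assumption, and the AMI ID in the usage note is a placeholder you would replace with the current Deep Learning AMI ID for your region.

```python
def build_launch_params(ami_id: str, instance_type: str, key_name: str) -> dict:
    """Keyword arguments for the EC2 `run_instances` call, launching exactly one instance."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "KeyName": key_name,
        "MinCount": 1,
        "MaxCount": 1,
    }


def launch_instance(params: dict, region: str = "us-west-2") -> str:
    """Launch the instance and return its ID. Requires boto3 and AWS credentials."""
    import boto3  # imported here so build_launch_params works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**params)
    return response["Instances"][0]["InstanceId"]
```

    For example, `launch_instance(build_launch_params("ami-<your Deep Learning AMI ID>", "p3.2xlarge", "aws-quick-start"))` would mirror the console steps in the quick start. Keeping the parameter dictionary separate makes it easy to review what will be launched (and billed) before calling AWS.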

    Amazon SageMaker

    Amazon SageMaker is a fully managed service that allows developers and data scientists to build, train, and deploy machine learning models.

    See AWS documentation to learn how to configure Amazon SageMaker with PyTorch.
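    As a rough illustration of what runs inside a managed training job, here is a sketch of a PyTorch entry-point script. It assumes SageMaker's script-mode convention of exposing the model output directory through the `SM_MODEL_DIR` environment variable (normally `/opt/ml/model`), whose contents SageMaker uploads when the job finishes; it falls back to a local `./model` folder elsewhere. The toy model, random data, and `train_and_save` name are placeholders.

```python
import os

import torch
import torch.nn as nn


def train_and_save() -> str:
    # SageMaker sets SM_MODEL_DIR; use a local folder when running outside SageMaker.
    model_dir = os.environ.get("SM_MODEL_DIR", "./model")
    os.makedirs(model_dir, exist_ok=True)

    model = nn.Linear(4, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.rand(32, 4), torch.rand(32, 1)
    for _ in range(5):  # a few toy training steps on random data
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()

    # Save only the weights; the serving side rebuilds the model and loads them.
    path = os.path.join(model_dir, "model.pth")
    torch.save(model.state_dict(), path)
    return path


if __name__ == "__main__":
    print(train_and_save())
```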

    Pre-Built AMIs

    AWS provides instances (called AWS Deep Learning AMIs) pre-built with a modern version of PyTorch. The available AMIs are:

    • Ubuntu
    • Amazon Linux
    • Windows 2016

    Amazon has written a good blog post on getting started with pre-built AMI.

    Installing PyTorch From Scratch

    You may prefer to start with a bare instance to install PyTorch. Once you have connected to your instance, setting up PyTorch is the same as setting up locally for your operating system of choice.
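    After a from-scratch install, a quick report of the PyTorch build and the GPUs it can see helps confirm that you installed a CUDA-enabled build on a GPU instance. A minimal sketch (`environment_report` is a hypothetical name):

```python
import torch


def environment_report() -> dict:
    """Summarize the PyTorch build and the GPUs it can see."""
    cuda_ok = torch.cuda.is_available()
    gpu_count = torch.cuda.device_count() if cuda_ok else 0
    return {
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": cuda_ok,
        "gpu_count": gpu_count,
        "gpu_names": [torch.cuda.get_device_name(i) for i in range(gpu_count)],
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

    If `cuda_build` is None on a GPU machine, the installed wheel is CPU-only and PyTorch should be reinstalled with a CUDA-enabled build.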

    Using PyTorch with Google Cloud

    To get the full experience of what PyTorch has to offer, you need a machine with at least one dedicated NVIDIA GPU. Because it is not always practical to own a machine with these specifications, cloud-based solutions let you test and use PyTorch’s full feature set.

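    Google Cloud provides both pre-configured virtual machines with PyTorch already installed and bare virtual machines on which you can do a custom install of PyTorch.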

    Google Cloud Primer

    To use Google Cloud, you need a Google account if you do not have one already. You will create a username (typically an @gmail.com email address) and password. Afterwards, you will be able to try Google Cloud. You will also provide contact and billing information. The billing information is initially used to verify that you are a real person; after your trial, you can choose to upgrade to a paid account.

    Once you are logged in, you will be brought to your Google Cloud console. You can even learn more about Google Cloud through a set of simple tutorials.

    Cloud Deep Learning VM Image

    Google Cloud provides pre-configured virtual machines that require no setup to help you build your deep learning projects. Cloud Deep Learning VM Image is a set of Debian-based virtual machines that allow you to build and run PyTorch-based machine learning applications.

    GPU-based Virtual Machines

    For custom virtual machines, you will generally want to use GPU-enabled Compute Engine virtual machine instances to build with PyTorch. Google offers various virtual machine types and pricing options, with both Linux and Windows, all of which can be configured for specific use cases. For PyTorch, GPU-enabled virtual machines are highly recommended, as they are tailored to the high compute needs of machine learning.

    The cost of your virtual machine is directly correlated with the number of GPUs it contains. A virtual machine with a single NVIDIA Tesla P100, for example, is suitable for many use cases.

    Deep Learning Containers

    Google Cloud also offers pre-configured and optimized Deep Learning Containers. They provide a consistent environment across Google Cloud services, making it easy to scale in the cloud or shift from on-premises. You have the flexibility to deploy on Google Kubernetes Engine (GKE), AI Platform, Cloud Run, Compute Engine, Kubernetes, and Docker Swarm.

    Installing PyTorch From Scratch

    You may prefer to start with a bare instance to install PyTorch. Once you have connected to your instance, setting up PyTorch is the same as setting up locally for your operating system of choice.

    Using PyTorch with Azure

    To get the full experience of what PyTorch has to offer, you need a machine with at least one dedicated NVIDIA GPU. Because it is not always practical to own a machine with these specifications, cloud-based solutions let you test and use PyTorch’s full feature set.

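    Azure provides a managed machine learning service, pre-configured data science virtual machines, and GPU-based virtual machines on which you can install PyTorch yourself.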

    Azure Primer

    In order to use Azure, you need to set up an Azure account, if you do not have one already. You will use a Microsoft-recognized email address and password. You will also verify your identity by providing contact and billing information. The billing information is necessary because while Azure does provide free usage credits and free services, you may need or want higher-end services as well.

    Once you are logged in, you will be brought to your Azure portal. You can even learn more about Azure through a set of simple video tutorials.

    Azure Machine Learning Service

    The Azure Machine Learning service is a cloud-based service you can use to accelerate your end-to-end machine learning workflows, from training to production. Azure Machine Learning allows you to easily move from training PyTorch models on your local machine to scaling out to the cloud. Using Azure ML’s CLI or Python SDK, you can leverage the service’s advanced functionality for distributed training, hyperparameter tuning, run history tracking, and production-scale model deployments.

    See the documentation to learn how to use PyTorch with Azure Machine Learning.

    Pre-Configured Data Science Virtual Machines

    Azure provides pre-configured data science and machine learning virtual machines. PyTorch is available on many of these; for example, see the documentation on how to set up an Azure virtual machine on Ubuntu Linux.

    GPU-based Virtual Machines

    Microsoft offers various virtual machine types and pricing options, with both Linux and Windows, each configured for specific use cases. For PyTorch, GPU-optimized virtual machines are highly recommended, as they are tailored to the high compute needs of machine learning.

    The cost of your virtual machine is directly correlated with the number of GPUs it contains. The NC6 virtual machine, for example, is one of the smallest and cheapest GPU virtual machines and is suitable for many use cases.

    Installing PyTorch From Scratch

    You may prefer to start with a bare virtual machine to install PyTorch. Once you have connected to your virtual machine, setting up PyTorch is the same as setting up locally for your operating system of choice.