RuntimeError: CUDA error: device-side assert triggered

CUDA kernel errors might be asynchronously reported at some other API call,
so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93:
operator(): block: [0,0,0], thread: [70,0,0]
Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.

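The message itself suggests CUDA_LAUNCH_BLOCKING=1. This setting makes kernel launches synchronous, so the traceback points at the operation that actually failed, at the cost of slower execution. The simplest way to use it is to prefix the command, for example CUDA_LAUNCH_BLOCKING=1 python your_script.py, where the script name is a placeholder. The sketch below sets it from inside a Python script instead. The helper name is hypothetical, and the variable must be set before torch initializes CUDA.

```python
import os


def enable_sync_cuda_errors() -> None:
    """Hypothetical helper: make CUDA kernel launches synchronous for debugging.

    The variable is read when CUDA is initialized, so call this before the
    first CUDA operation; calling it before `import torch` is the safest choice.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


enable_sync_cuda_errors()
# import torch  # import torch (and touch CUDA) only after the variable is set
```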

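The data is out of bounds. When the labels are fed in, some label in my dataset is negative or greater than or equal to the number of classes; valid labels run from 0 to num_classes - 1. Put plainly, if the model is set up for 4 classes (labels 0 to 3) but the dataset contains more than 4 classes, this error is raised; fewer than 4 classes does not raise it.

Solution: the number of classes preset in class_map differs from the number and kinds of labels actually present in the dataset. Make them agree by changing either class_map or the dataset.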
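You can catch this before the GPU does with a cheap scan of the labels on the CPU against the class count the model was built with. The following is a minimal sketch that assumes integer class labels. The function name find_out_of_range_labels is hypothetical.

```python
from collections import Counter


def find_out_of_range_labels(labels, num_classes):
    """Return {label: count} for every label outside 0..num_classes-1.

    `labels` is any iterable of ints (e.g. a list of dataset targets).
    An empty result means every label is a valid class index.
    """
    return dict(Counter(y for y in labels if not 0 <= y < num_classes))


num_classes = 4  # the class count the model was built with
labels = [0, 1, 3, 4, 2, -1, 4]
bad = find_out_of_range_labels(labels, num_classes)
if bad:
    print(f"labels outside [0, {num_classes - 1}]: {bad}")
    # prints: labels outside [0, 3]: {4: 2, -1: 1}
```

For a PyTorch label tensor, pass labels.flatten().tolist(). If class_map is a dict from class name to index (an assumption about its layout), set num_classes = len(class_map) instead of hard-coding the value. This keeps the two from drifting apart.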

device-side assert triggered, CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [0,0,0], thread: [28,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.

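The label indices are wrong, i.e. a label id is larger than the defined labels allow. The number of tag classes used when defining the CRF layer differed from the number of classes in the YAML file. Either fix the label file or change the number of classes the layer is defined with.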
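One way to keep the CRF layer and the config in sync is to derive the tag count from the label file. You can then check the YAML value against it before building the layer. This sketch rests on a few assumptions: the tag file holds one tag per line, and ids follow file order. load_tag2id and check_num_tags are hypothetical helpers. YAML parsing and the CRF constructor itself are left out.

```python
def load_tag2id(lines):
    """Hypothetical loader: one tag per line, blank lines ignored, ids in file order."""
    tags = [line.strip() for line in lines if line.strip()]
    if len(set(tags)) != len(tags):
        raise ValueError("duplicate tags in tag file")
    return {tag: i for i, tag in enumerate(tags)}


def check_num_tags(config_num_tags, tag2id):
    """Fail early if the class count from the YAML config and the tag file disagree."""
    if config_num_tags != len(tag2id):
        raise ValueError(
            f"config says {config_num_tags} tags, tag file has {len(tag2id)}; "
            "fix the YAML or the tag file before building the CRF layer"
        )


# In practice `lines` would come from open(tag_file_path); a list stands in here.
tag2id = load_tag2id(["O", "B-PER", "I-PER", "B-LOC", "I-LOC", ""])
check_num_tags(5, tag2id)  # passes: both sides agree on 5 tags
num_tags = len(tag2id)     # pass this to the CRF layer instead of a hard-coded number
```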

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Cause:

a = str(trues_cls.detach().numpy()[0])  # wrong: trues_cls lives on the GPU

NumPy cannot read a CUDA tensor; the tensor must first be copied to a CPU tensor.

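Solution:

When converting a CUDA tensor to NumPy, first move it to the CPU with .cpu(), then call .numpy().

a = str(trues_cls.detach().cpu().numpy()[0])  # correct
# detach(): returns a new tensor detached from the autograd graph (no gradient tracking)
# cpu(): copies the tensor from GPU memory to CPU memory
# numpy(): converts the CPU tensor to a NumPy array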

RuntimeError: CUDA error: invalid device ordinal

CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

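The GPU index passed to the program is wrong. It must lie between 0 and the number of visible GPUs minus 1; try a different GPU number.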
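Valid GPU indices run from 0 to torch.cuda.device_count() - 1, and that count only includes the GPUs visible through CUDA_VISIBLE_DEVICES. The hypothetical helper below turns a bad index into a readable error instead of "invalid device ordinal".

```python
def checked_cuda_device(index, device_count):
    """Return 'cuda:<index>' or raise a readable error for an invalid ordinal.

    `device_count` would normally come from torch.cuda.device_count().
    """
    if device_count == 0:
        raise RuntimeError("no CUDA device is visible to this process")
    if not 0 <= index < device_count:
        raise ValueError(
            f"GPU index {index} is invalid: valid indices are 0..{device_count - 1}"
        )
    return f"cuda:{index}"
```

In a training script it would be used as torch.device(checked_cuda_device(1, torch.cuda.device_count())).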

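Training results differ across devices with identical parameters

Cause: the randomness of the dropout layer. When the input to the dropout layer is larger than 57346, results on different GPUs differ from position 57347 onward, because different GPUs sample differently after position 57346.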

Solution: manually build a dropout layer implemented with randn; this gives consistent results across machines.

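Background: the built-in dropout samples its mask from a Bernoulli distribution through torch tensor ops, so the result depends on the CUDA implementation. A randn-based mask depends only on the random numbers drawn.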
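The randn approach can be sketched as inverted dropout whose keep mask comes from standard-normal samples. An element is kept when its sample is at least the p-quantile of N(0, 1), which happens with probability 1 - p. Kept values are then scaled by 1 / (1 - p). The pure-Python version below only illustrates the technique with a seeded random.Random. It is not the author's layer, and randn_dropout is a hypothetical name.

```python
import random
from statistics import NormalDist


def randn_dropout(values, p, seed, training=True):
    """Inverted dropout with a mask built from standard-normal draws.

    A value is kept when its N(0, 1) sample is >= the p-quantile of N(0, 1),
    which happens with probability 1 - p. Kept values are scaled by 1 / (1 - p)
    so the expected output equals the input. The mask depends only on `seed`.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(values)
    threshold = NormalDist().inv_cdf(p)  # P(z >= threshold) = 1 - p
    rng = random.Random(seed)            # seeded, independent of any GPU
    scale = 1.0 / (1.0 - p)
    return [v * scale if rng.gauss(0.0, 1.0) >= threshold else 0.0 for v in values]
```

A PyTorch version could follow the same steps under an assumed design. It would draw the noise with torch.randn(x.shape, generator=g) from a CPU torch.Generator seeded with g.manual_seed(seed). It would then build the mask by comparing against the same threshold and move the mask to the GPU with .to(x.device). Drawing on the CPU is an assumption intended to keep the mask independent of the GPU model.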

The server has two GPUs, but programs can only run on one of them

Cause: an export in the environment left only one GPU visible (GPU:0), while the program uses GPU 1.

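Run export CUDA_VISIBLE_DEVICES=0,1 (no spaces around the = sign in the shell) so that both GPUs are visible and you can switch between them freely. The variable must be set before the program initializes CUDA.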
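CUDA_VISIBLE_DEVICES also renumbers the GPUs. Inside the process, the listed cards become cuda:0, cuda:1 and so on, in the order given. The sketch below reports which GPUs a process can see. It assumes the variable holds plain integer ids; the value may also contain GPU UUIDs, which this hypothetical visible_gpu_ids helper does not handle.

```python
import os


def visible_gpu_ids(env=None):
    """Return the GPU ids listed in CUDA_VISIBLE_DEVICES (integer ids only).

    Returns None when the variable is unset (every GPU is visible) and []
    when it is set to an empty string (no GPU is visible). Inside the
    process, logical device cuda:i corresponds to ids[i].
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return [int(tok) for tok in value.split(",") if tok.strip()]


# With export CUDA_VISIBLE_DEVICES=0 only GPU 0 is visible, so cuda:1 fails.
print(visible_gpu_ids({"CUDA_VISIBLE_DEVICES": "0"}))    # [0]
print(visible_gpu_ids({"CUDA_VISIBLE_DEVICES": "0,1"}))  # [0, 1]
```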