Dragon Arrow written by Tatsuya Nakaji, all rights reserved

Divide data into train and validation in PyTorch

Mar 01, 2020




Folder


The data folder is structured as shown below.

This is just an example, taken from an animal image classifier.

root/
 ├ train/
 │ ├ horse/
 │ │  ├ 8537.png
 │ │  └ ...
 │ └ butterfly/
 │    ├ 2857.png
 │    └ ...
 └ test/
   ├ horse/
   │  ├ 8536.png
   │  └ ...
   └ butterfly/
      ├ 2856.png
      └ ...
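
As a rough illustration of how this layout turns into labels: torchvision's ImageFolder treats each subfolder as one class and numbers the classes in sorted order of the folder names. The sketch below imitates that mapping with the standard library only. The temporary directory and the helper name `folder_class_to_idx` are made up for this example and are not part of torchvision.

```python
import os
import tempfile


def folder_class_to_idx(root):
    """Map each subfolder of `root` to an index, in sorted name order.

    Hypothetical helper that imitates ImageFolder's class_to_idx.
    """
    classes = sorted(e.name for e in os.scandir(root) if e.is_dir())
    return {name: i for i, name in enumerate(classes)}


# Build a throwaway copy of the train/ layout (empty class folders).
with tempfile.TemporaryDirectory() as root:
    for name in ("horse", "butterfly"):
        os.makedirs(os.path.join(root, "train", name))
    mapping = folder_class_to_idx(os.path.join(root, "train"))

print(mapping)  # {'butterfly': 0, 'horse': 1}
```

So with the folders above, butterfly images would get target 0 and horse images target 1.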


How to divide the data


Split the data into training (0.8) and validation (0.2) sets, stratified by target class so that both sets keep the same class proportions.


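# load libraries
import numpy as np
import torch
import torchvision
from sklearn.model_selection import train_test_split
from torch.utils.data import SubsetRandomSampler
from torchvision import datasets, transforms

# transform: convert to tensor, then scale each RGB channel to [-1, 1]
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# ImageFolder: class labels are inferred from the subfolder names
trainset = datasets.ImageFolder(root='./train',
                                transform=transform)

# target array (one class index per image)
targets = trainset.targets

# stratified split of the sample indices for validation
train_idx, valid_idx = train_test_split(
    np.arange(len(targets)),
    test_size=0.2,
    shuffle=True,
    stratify=targets)

# samplers that draw only from the given indices
train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)

# both loaders read from the same dataset but see disjoint samples
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, sampler=train_sampler, num_workers=2)
validloader = torch.utils.data.DataLoader(trainset, batch_size=4, sampler=valid_sampler, num_workers=2)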
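
To make the idea behind stratify=targets more concrete, here is a minimal standard-library sketch of a stratified split: group the sample indices by class, shuffle each group, and cut the same fraction from every group. The helper `stratified_split` and the toy labels are assumptions for illustration only. This is not how scikit-learn implements it, just the same idea in a few lines.

```python
import random
from collections import defaultdict


def stratified_split(targets, test_size=0.2, seed=0):
    """Return (train_idx, valid_idx) with roughly equal class ratios.

    Hypothetical helper for illustration; not scikit-learn's algorithm.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(targets):
        by_class[label].append(idx)

    train_part, valid_part = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction from every class.
        n_valid = round(len(indices) * test_size)
        valid_part.extend(indices[:n_valid])
        train_part.extend(indices[n_valid:])
    return train_part, valid_part


# Toy labels: 10 images of class 0 and 5 images of class 1.
toy_targets = [0] * 10 + [1] * 5
toy_train, toy_valid = stratified_split(toy_targets)
print(len(toy_train), len(toy_valid))  # 12 3
```

Each class contributes 20% of its own samples to the validation side, so a rare class cannot end up missing from either set by chance.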


Now you have training and validation loaders built from a stratified split!
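
If you want to check that the split really is stratified, one simple approach is to compare the class fractions on both sides. The sketch below assumes a hypothetical helper `class_fractions` and uses small made-up lists in place of the real targets and index arrays.

```python
from collections import Counter


def class_fractions(targets, indices):
    """Fraction of each class among the samples picked by `indices`."""
    counts = Counter(targets[i] for i in indices)
    total = sum(counts.values())
    return {label: n / total for label, n in sorted(counts.items())}


# Toy stand-ins for trainset.targets, train_idx and valid_idx.
demo_targets = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
demo_train_idx = [0, 1, 2, 3, 4, 5, 8, 9, 10]
demo_valid_idx = [6, 7, 11]

train_frac = class_fractions(demo_targets, demo_train_idx)
valid_frac = class_fractions(demo_targets, demo_valid_idx)
print(train_frac)
print(valid_frac)
```

With the real data you would pass trainset.targets together with train_idx and valid_idx. The two dictionaries should then be close to each other, up to rounding.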