A Coding Guide to Demonstrating Targeted Data Poisoning Attacks in Deep Learning with Label Flipping on CIFAR-10 in PyTorch

In this tutorial, we demonstrate a practical data poisoning attack by manipulating labels in the CIFAR-10 dataset and observing its impact on model behavior. We build clean and poisoned training pipelines in parallel, using a ResNet-style convolutional network to ensure stable, comparable learning dynamics. By flipping the labels of a subset of target-class samples to a malicious class during training, we show how subtle corruption in the data pipeline can propagate into systematic misclassification at inference time. Check out the FULL CODES here.
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, Dataset
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report
CONFIG = {
    "batch_size": 128,
    "epochs": 10,
    "lr": 0.001,
    "target_class": 1,
    "malicious_label": 9,
    "poison_ratio": 0.4,
    "device": torch.device("cuda" if torch.cuda.is_available() else "cpu"),
}
torch.manual_seed(42)
np.random.seed(42)
We define all global configuration parameters in one place. We ensure reproducibility by fixing the random seed across PyTorch and NumPy. We also explicitly select the computing device so that the tutorial runs on both CPU and GPU. Check out the FULL CODES here.
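As an optional sanity check (a minimal sketch of our own, assuming the CONFIG dictionary above), we can print the selected device before training so there are no surprises later:

# Optional: confirm which device the tutorial will use.
print("Using device:", CONFIG["device"])
print("CUDA available:", torch.cuda.is_available())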
class PoisonedCIFAR10(Dataset):
    def __init__(self, original_dataset, target_class, malicious_label, ratio, is_train=True):
        self.dataset = original_dataset
        self.targets = np.array(original_dataset.targets)
        self.is_train = is_train
        if is_train and ratio > 0:
            # Flip the labels of a random subset of the target class to the malicious label.
            indices = np.where(self.targets == target_class)[0]
            n_poison = int(len(indices) * ratio)
            poison_indices = np.random.choice(indices, n_poison, replace=False)
            self.targets[poison_indices] = malicious_label

    def __getitem__(self, index):
        img, _ = self.dataset[index]
        return img, self.targets[index]

    def __len__(self):
        return len(self.dataset)
We develop a custom dataset wrapper that enables controlled label flipping during training. We relabel an adjustable fraction of target-class samples to the malicious class while keeping the test data intact. We preserve the original image data so that only label integrity is compromised. Check out the FULL CODES here.
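Before training on poisoned data, it can help to verify that the wrapper flipped the expected number of labels. The helper below is our own addition (not part of the original walkthrough) and can be called once the datasets are built later in the tutorial:

def count_flipped_labels(poisoned_ds, target_class):
    # Compare the wrapper's (possibly flipped) labels against the originals.
    original = np.array(poisoned_ds.dataset.targets)
    return int(np.sum((original == target_class) & (poisoned_ds.targets != target_class)))

# Example (after poison_ds is created below):
# count_flipped_labels(poison_ds, CONFIG["target_class"])  # expected 2000, i.e. 40% of the 5,000 target-class images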
def get_model():
    # Adapt ResNet-18 to 32x32 CIFAR-10 inputs: smaller first conv, no initial max-pool.
    model = torchvision.models.resnet18(num_classes=10)
    model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
    model.maxpool = nn.Identity()
    return model.to(CONFIG["device"])

def train_and_evaluate(train_loader, description):
    print(f"Training: {description}")
    model = get_model()
    optimizer = optim.Adam(model.parameters(), lr=CONFIG["lr"])
    criterion = nn.CrossEntropyLoss()
    for _ in range(CONFIG["epochs"]):
        model.train()
        for images, labels in train_loader:
            images = images.to(CONFIG["device"])
            labels = labels.to(CONFIG["device"])
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
    return model
We define a lightweight ResNet-based model adapted for CIFAR-10 and implement a complete training loop. We train the network with standard cross-entropy loss and the Adam optimizer to ensure stable convergence. We keep the training logic identical for clean and poisoned data to isolate the impact of label poisoning. Check out the FULL CODES here.
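If you want to monitor learning between epochs, a small evaluation helper such as the sketch below (our addition, assuming the CONFIG dictionary above) reports overall top-1 accuracy for any model and data loader:

def evaluate_accuracy(model, loader):
    # Overall top-1 accuracy of `model` on the samples yielded by `loader`.
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            images = images.to(CONFIG["device"])
            labels = labels.to(CONFIG["device"])
            correct += (model(images).argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    return correct / total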
def get_predictions(model, loader):
    model.eval()
    preds, labels_all = [], []
    with torch.no_grad():
        for images, labels in loader:
            images = images.to(CONFIG["device"])
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            preds.extend(predicted.cpu().numpy())
            labels_all.extend(labels.numpy())
    return np.array(preds), np.array(labels_all)

def plot_results(clean_preds, clean_labels, poisoned_preds, poisoned_labels, classes):
    fig, ax = plt.subplots(1, 2, figsize=(16, 6))
    for i, (preds, labels, title) in enumerate([
        (clean_preds, clean_labels, "Clean Model Confusion Matrix"),
        (poisoned_preds, poisoned_labels, "Poisoned Model Confusion Matrix")
    ]):
        cm = confusion_matrix(labels, preds)
        sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", ax=ax[i],
                    xticklabels=classes, yticklabels=classes)
        ax[i].set_title(title)
    plt.tight_layout()
    plt.show()
We run inference on the test set and collect predictions for quantitative analysis. We compute confusion matrices to visualize class-wise behavior for both the clean and poisoned models. We use this visual inspection to highlight the targeted misclassification patterns introduced by the attack. Check out the FULL CODES here.
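Beyond confusion matrices, a poisoning-specific metric worth reporting is the attack success rate: the fraction of true target-class test images that the model predicts as the malicious label. The sketch below is our addition and assumes the prediction arrays returned by get_predictions:

def attack_success_rate(preds, labels, target_class, malicious_label):
    # Fraction of true target-class samples predicted as the malicious label.
    mask = labels == target_class
    return float(np.mean(preds[mask] == malicious_label)) if mask.any() else 0.0

# Example usage (after predictions are collected below):
# attack_success_rate(p_preds, p_true, CONFIG["target_class"], CONFIG["malicious_label"])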
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                         (0.2023, 0.1994, 0.2010))
])
base_train = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
base_test = torchvision.datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)
clean_ds = PoisonedCIFAR10(base_train, CONFIG["target_class"], CONFIG["malicious_label"], ratio=0)
poison_ds = PoisonedCIFAR10(base_train, CONFIG["target_class"], CONFIG["malicious_label"], ratio=CONFIG["poison_ratio"])
clean_loader = DataLoader(clean_ds, batch_size=CONFIG["batch_size"], shuffle=True)
poison_loader = DataLoader(poison_ds, batch_size=CONFIG["batch_size"], shuffle=True)
test_loader = DataLoader(base_test, batch_size=CONFIG["batch_size"], shuffle=False)
clean_model = train_and_evaluate(clean_loader, "Clean Training")
poisoned_model = train_and_evaluate(poison_loader, "Poisoned Training")
c_preds, c_true = get_predictions(clean_model, test_loader)
p_preds, p_true = get_predictions(poisoned_model, test_loader)
classes = base_test.classes
plot_results(c_preds, c_true, p_preds, p_true, classes)
target = CONFIG["target_class"]
print(classification_report(c_true, c_preds, labels=[target], target_names=[classes[target]]))
print(classification_report(p_true, p_preds, labels=[target], target_names=[classes[target]]))
We prepare the CIFAR-10 dataset, build clean and poisoned data loaders, and run both training pipelines end to end. We evaluate both trained models on the same held-out test set to ensure a fair comparison. We conclude the analysis by reporting per-class metrics for the target class, quantifying the effect of poisoning on it.
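To make the comparison concrete, a short follow-up sketch (our addition, reusing the prediction arrays computed above) prints how often each model keeps the target class correct versus flipping it to the malicious label; exact numbers will vary from run to run:

malicious = CONFIG["malicious_label"]
for name, preds, true in [("clean", c_preds, c_true), ("poisoned", p_preds, p_true)]:
    mask = true == target
    acc = float(np.mean(preds[mask] == target))
    flipped = float(np.mean(preds[mask] == malicious))
    print(f"{name} model | target-class accuracy: {acc:.3f} | predicted as malicious label: {flipped:.3f}")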
In conclusion, we have seen how label poisoning degrades the performance of a specific class without destroying overall accuracy. We analyzed this behavior using confusion matrices and per-class classification reports, revealing the targeted failure modes introduced by the attack. This experiment reinforces the importance of data validation, visualization, and monitoring in real-world machine learning applications, especially in security-critical domains.