Capsule Networks interesting papers 3

5. Capsule Network Performance on Complex Data

Investigate the use of Capsule Networks on other datasets such as CIFAR-10

CNNs' key disadvantages, and the reasons behind them:

  • false positives: CNNs classify the test data as the object while disregarding the features' relative spatial orientation to each other.
  • false negatives: the lack of rotational invariance in CNNs can cause the network to incorrectly assign the object another label

Regularization and Overfitting:

  • CNN: dropout, and reducing the number of features
  • CapsuleNet: reconstruction autoencoder
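As a concrete picture of how the reconstruction autoencoder acts as a regularizer: in the original CapsNet paper the reconstruction (squared-error) loss is added with a very small weight (0.0005), so it nudges the capsules to retain instantiation details without dominating the classification loss. A minimal numpy sketch with illustrative numbers:

```python
import numpy as np

def total_loss(margin_loss, image, reconstruction, recon_weight=0.0005):
    """Classification (margin) loss plus a down-weighted reconstruction
    term; the tiny weight keeps the autoencoder branch acting as a
    regularizer rather than taking over training."""
    recon_loss = np.sum((image - reconstruction) ** 2)
    return margin_loss + recon_weight * recon_loss

# toy 2x2 "image" and an imperfect reconstruction of it
img = np.array([[1.0, 0.0], [0.0, 1.0]])
rec = np.array([[0.9, 0.1], [0.1, 0.9]])
loss = total_loss(margin_loss=0.5, image=img, reconstruction=rec)
```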


  • To explore a variety of model modifications
    • Stacking more capsule layers (the original architecture was designed specifically for MNIST)
    • Increasing the number of primary capsules
    • Ensemble averaging
    • Modifying the scaling of the reconstruction loss
    • Increasing the number of convolutional layers before the capsule layer
    • Customizing the activation function
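For reference when experimenting with the last item: the activation the original CapsNet applies to capsule vectors is the "squash" non-linearity, which any customized activation would replace. A minimal numpy sketch:

```python
import numpy as np

def squash(s, eps=1e-8):
    """The original CapsNet activation: preserves a vector's direction
    while mapping its length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm) / np.sqrt(sq_norm + eps)
    return scale * s

v = squash(np.array([3.0, 4.0]))   # input length 5 -> output length 25/26
```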


  • Most of the modifications proved to be inferior to the baseline
    • Learning curves for the results are given
  • Reconstruction:
    • Reconstructions are blurry and lack distinct features for each class
      • Reason: CIFAR-10 is much more complex than MNIST
      • Reconstructing 3D scenes is much more complex

Review of the paper:

  • The modification methodology can be learned from
  • The idea of matrix capsules using EM routing.
    • As the viewpoint changes, the pose matrix changes in such a way that the votes from different capsules still agree, thereby allowing a capsule network with pose matrices to be robust to viewpoint changes. The inclusion of such a pose matrix is an interesting and promising direction for future research, as it seems to address capsule networks' shortcomings on complex data.
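A toy numpy illustration of why the votes persist, with made-up 2×2 matrices, and assuming a viewpoint change acts as the same left-multiplication on every part's pose: each part's vote is its pose times a learned part-whole transform, so a common viewpoint transform changes every vote identically and the agreement between votes survives.

```python
import numpy as np

# learned part->whole transforms for two parts (made-up values)
W1 = np.eye(2)
W2 = np.array([[0.0, 1.0], [-1.0, 0.0]])

# poses of the two parts in the original viewpoint, chosen so that
# both votes for the whole object coincide
P1 = np.array([[0.0, -1.0], [1.0, 0.0]])
P2 = P1 @ np.linalg.inv(W2)

# each vote = pose @ part-whole transform
vote1, vote2 = P1 @ W1, P2 @ W2          # identical -> strong agreement

# a viewpoint change acts as the same left-multiplication on every pose
T = np.array([[0.0, -1.0], [1.0, 0.0]])  # e.g. a 90-degree rotation
new_vote1, new_vote2 = (T @ P1) @ W1, (T @ P2) @ W2
# both new votes equal T @ old_vote, so they still agree
```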



Powerful CNN structures

It has been a long time since I focused on studying CNN architectures. Students like me usually do not have enough resources to develop big and complex architectures. Some new powerful architectures are as follows:

1. Densely Connected Convolutional Networks

Recent work usually improves networks in two directions:

  • Deeper, like VGG, ResNet
  • Wider, like Inception

Advantages of DenseNet:

  1. alleviates the vanishing-gradient problem
  2. makes the most of the features (feature reuse)
  3. strengthens feature propagation
  4. reduces the number of parameters
  5. not especially deep or wide


  • Like the Res-Block in ResNet, DenseNet has the Dense-Block
  • In each block, each layer takes all preceding feature-maps as input
  • The DenseNet contains several blocks
  • Within each block the feature-map size stays the same
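The dense-connectivity pattern in the bullets above can be sketched in a few lines of numpy; `conv_stub` is a hypothetical stand-in for the real BN-ReLU-Conv layer, kept linear and unlearned just to show the shapes:

```python
import numpy as np

def conv_stub(x, growth_rate=4):
    """Hypothetical stand-in for a BN-ReLU-Conv layer: maps any number
    of input channels to `growth_rate` new channels, keeping H and W."""
    w = np.ones((growth_rate, x.shape[0])) / x.shape[0]  # fixed, not learned
    return np.tensordot(w, x, axes=([1], [0]))

def dense_block(x, num_layers=3):
    features = [x]                                # all feature maps so far
    for _ in range(num_layers):
        inp = np.concatenate(features, axis=0)    # concat along channels
        features.append(conv_stub(inp))           # each layer sees everything
    return np.concatenate(features, axis=0)

x = np.random.rand(8, 16, 16)                     # (channels, H, W)
out = dense_block(x)
# channels grow by growth_rate per layer: 8 + 3*4 = 20; H and W unchanged
```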


Capsule Networks interesting papers 2

4. CapsuleGAN: Generative Adversarial Capsule Network

The first work to incorporate a CapsuleNet into a GAN

Target: incorporate a CapsuleNet into the GAN framework and outperform the convolutional GAN, building a model called CapsuleGAN

Drawbacks of GAN:

  • Mode Collapse: a commonly encountered failure case for GANs where the generator learns to produce samples with extremely low variety
  • inadequate mode coverage

Previous inspiration:

  • Deep Convolutional GANs: DCGANs
  • Generative Recurrent Adversarial Networks: GRANs


  • Change the discriminator from a CNN to a Capsule Net
  • Sequence for the training loop:
    • Set the discriminator (a Capsule Network) to trainable
    • Train the discriminator on the real dataset (e.g. MNIST, as positive samples) and the generated dataset (low-quality outputs from the generator, as negative samples)
    • Set the discriminator to untrainable
    • Use backpropagation to train the CNN-based generator
  • The final loss is the Capsule Loss
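The loop above can be sketched as a minimal training skeleton; `Stub` here is a hypothetical placeholder that only records which phase ran and whether the model was set trainable, not a real network:

```python
# A minimal sketch of the alternating CapsuleGAN schedule; both models
# are stubs that log their training phases.
log = []

class Stub:
    def __init__(self, name):
        self.name = name
        self.trainable = True
    def train_step(self, what):
        log.append((self.name, what, self.trainable))

discriminator = Stub("capsule_D")   # the discriminator is a Capsule Net
generator = Stub("cnn_G")           # the generator stays a CNN

for step in range(2):               # two illustrative iterations
    discriminator.trainable = True
    # D sees real images as positives and generated images as negatives
    discriminator.train_step("real+fake")
    discriminator.trainable = False
    # with D frozen, backprop through D updates only G
    generator.train_step("fool_D")
```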


  • Qualitatively: CapsuleGAN generates images that are more similar to real images and more diverse than those generated by a convolutional GAN, leading to better semi-supervised classification performance on the test dataset.
    • CapsuleGAN produces diverse results, mitigating mode collapse
    • The results are cleaner
  • Quantitatively: evaluated with the GAM (Generative Adversarial Metric); in general:
    • Samples generated by CapsuleGAN can fool the GAN
    • Samples generated by the GAN cannot fool CapsuleGAN
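For concreteness, GAM (from the GRAN paper) battles two GANs by swapping samples between their discriminators and comparing error-rate ratios: roughly, M1 wins when its discriminator is fooled less on the opponent's fakes (r_sample < 1) while both discriminators are comparable on real test data (r_test ≈ 1). A sketch with made-up error rates matching the note's finding:

```python
def gam_ratios(err_d1_on_g2, err_d2_on_g1, err_d1_test, err_d2_test):
    """GAM 'battle' ratios between two GANs, M1 and M2.

    err_di_on_gj: how often discriminator i is fooled by generator j's
    samples; err_di_test: discriminator i's error rate on real test data.
    """
    r_sample = err_d1_on_g2 / err_d2_on_g1
    r_test = err_d1_test / err_d2_test
    return r_sample, r_test

# made-up error rates matching the note's finding, with M1 = CapsuleGAN:
# its discriminator is rarely fooled by the GAN's samples (0.05), while
# the GAN's discriminator is often fooled by CapsuleGAN's samples (0.40)
r_sample, r_test = gam_ratios(0.05, 0.40, 0.10, 0.10)
# r_sample < 1 and r_test ≈ 1, so M1 (CapsuleGAN) wins the battle
```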

Future work: more architectures can be constructed

Review of the paper:

  • Evaluating both qualitatively and quantitatively is useful and can be learned from
  • The approach to constructing new structures is inspiring and can be learned from
  • Helps in understanding some new GAN structures
  • Written in an elegant way



GAN tutorial

Recently, I have been focusing on research into Capsule Networks, and there is an interesting paper that combines capsule networks and GANs into CapsuleGAN, so this time let's briefly review the content of basic GANs

1. The best GAN material I have found:

2. The related code

3. How to Train a GAN? Tips and tricks to make GANs work

Capsule Networks interesting papers 1

1. The recognition of rice images by UAV based on capsule network

Utilization of a Capsule Net in rice identification

Why not use a CNN: many overlapping images, and training samples are limited

Target: Rice identification

Steps and tricks:

  1. A UAV captures rice images; only around 500 images, 300 for training and 200 for testing
  2. Image preprocessing
    1. histogram equalization, and conversion to grayscale
    2. SLIC superpixel method, to obtain the segmented image
  3. Parallel Capsule Net
    • apply two Capsule Nets in parallel
    • the inputs are the grayscale image and the SLIC superpixel image
    • merge the features after the primary capsule layer
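A shape-level sketch of the merge step, with hypothetical capsule counts and dimensions: each branch produces its own set of primary capsules, and merging simply pools the two sets before routing:

```python
import numpy as np

# Hypothetical shapes: each branch's primary-capsule stage outputs
# (num_capsules, capsule_dim); merging stacks the two capsule sets.
gray_caps = np.random.rand(32, 8)    # from the grayscale branch
slic_caps = np.random.rand(32, 8)    # from the SLIC superpixel branch

merged = np.concatenate([gray_caps, slic_caps], axis=0)
# routing then sees 64 primary capsules drawn from both views
```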

Compared with a CNN: in the case of overlapping images, the Capsule Net is better

Review of the paper:

  • Actually not a good piece of work
    • No detailed experimental results are given; I suspect some cheating in the paper
    • The structural details have some problems
    • The discussion is very rough; I cannot really trust the authors
  • The parallel structure can be learned from

2. Brain Tumor Type Classification via Capsule Networks

Utilization of Capsule net in medical images

Why not use a CNN:

  • CNNs typically require large amounts of training data
  • CNNs cannot properly handle input transformations.


  • Adopt and incorporate CapsNets for the problem of brain tumor classification to design an improved architecture which maximizes the accuracy of the classification problem at hand;
  • Investigate the over-fitting problem of CapsNets based on a real set of MRI images;
  • Explore whether or not CapsNets are capable of providing better fit for the whole brain images or just the segmented tumor;
  • Develop a visualization paradigm for the output of the CapsNet to better explain the learned features.


  • Magnetic Resonance Imaging (MRI), 3064 images in total, 233 tumor-positive samples
  • Two types: whole-brain images and segmented tumors


  • Designed a CapsNet; a very interesting investigation of optimising the original structure into a new one
  • To avoid overfitting, early stopping is used
  • The loss has two parts, the CapsNet loss and the decoder loss. The former measures the misclassification error. The latter relates to the reconstruction part and is calculated as the squared error between the input and the reconstructed image; it contributes to the total loss with a smaller weight.
  • Training targets the final (total) loss, but the results show it is dominated by the CapsNet loss
  • Grayscale image input
  • Interprets the capsules
  • With only 10 epochs, the Capsule Net is better than the CNN
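For reference, the CapsNet loss mentioned above is usually the margin loss from the original CapsNet paper (m+ = 0.9, m− = 0.1, λ = 0.5), computed over the lengths of the output capsules; a minimal numpy version:

```python
import numpy as np

def margin_loss(lengths, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Per-class margin loss on output-capsule lengths (the CapsNet loss).

    Present classes are pushed above m_pos; absent classes are pushed
    below m_neg, with the absent term down-weighted by lam.
    """
    present = targets * np.maximum(0.0, m_pos - lengths) ** 2
    absent = lam * (1 - targets) * np.maximum(0.0, lengths - m_neg) ** 2
    return np.sum(present + absent)

# two classes: class 0 present with a confident capsule, class 1 absent
lengths = np.array([0.95, 0.05])
targets = np.array([1.0, 0.0])
loss = margin_loss(lengths, targets)   # both capsules clear their margins
```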

Review of the paper:

  • A very promising paper, written and organised in an elegant style
  • The double loss can be learned from
  • The method of designing a Capsule Net can be learned from
  • As with visualization and understanding of CNNs, interpreting the capsules is also very interesting for capsule networks.

3. Capsules for Object Segmentation

The first time capsule networks are used in the segmentation field


  • Construct a Capsule Networks for segmentation


  • The structure contains only a few parameters compared with U-Net
  • Uses k-fold cross-validation
  • Slightly outperforms U-Net
  • Changes the original algorithm and structure of the Capsule Net, adding locally-constrained routing and deconvolutional capsules

Other points introduced in the paper:

  • Each capsule vector encodes: spatial orientation, magnitude/prevalence, and other attributes of the extracted feature
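That split can be made explicit: the vector's length is read as the feature's presence/prevalence, and its direction carries the remaining attributes such as spatial orientation. A tiny numpy example:

```python
import numpy as np

capsule = np.array([0.3, -0.4])        # a toy 2-D capsule vector

magnitude = np.linalg.norm(capsule)    # presence/prevalence, here 0.5
orientation = capsule / magnitude      # unit direction: the attributes
```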

Review of the paper:

The structure is learned from U-Net for segmentation, but unfortunately it seems to also use pooling, which would be the key drawback of this paper

  • Summarises the state-of-the-art methods in segmentation; can be learned from
  • The method and the illustrations of the paper can be learned from
  • Can be utilized on large images, while other capsule works only handle very small image sizes such as 32 x 32 or 28 x 28. Meanwhile, it has drawbacks such as using pooling
  • Learns from U-Net, in both skip connections and deconvolution
  • The paper is a little bit hard to understand