Application of Transfer Learning Tuning to a Convolutional Neural Network for Image Classification in the Analysis of Collisions in High Energy Physics
- General info
- Technologies
- Original repos
- Requirements
- How to download project
- How to get json files
- How to Create Images
- How to tune a Transfer Learning Model
- References and Useful Links
## General info
This project is based on this paper: https://arxiv.org/pdf/1708.07034.pdf
The LHC and CMS programme records high-energy collisions and shares the data on the Worldwide LHC Computing Grid, which provides a platform for physicists in 42 countries. Using CMS open data, this research applies convolutional neural networks to classify tt + jets, W + jets and Drell-Yan processes. We compare the performance of five well-known CNN models and test transfer learning for particle classification.
Project process:
JSON files -> Image Dataset -> Augmented Dataset (optional) -> Transfer Learning Model -> classification results
## Technologies
- Google Colab
- Jupyter notebooks
## Original repos
- For image creation: https://github.com/CeliaFernandez/Image-Creation
- For json file creation: https://github.com/laramaktub/json-collisions
- For augmentation: https://github.com/mdbloice/Augmentor
## Requirements
All required modules can be found in the requirements.txt file.
## How to download project
git clone https://github.com/jzyee/cms_image_classification
## How to get json files
- After creating the json files, transfer them
- from folder: json-cms/AnalysisFW/python/outputjsons
- to folder: json_files
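The transfer step above can be scripted with the Python standard library; a minimal sketch (the folder names come from the steps above, while the helper name itself is illustrative):

```python
from pathlib import Path
import shutil

def transfer_json_files(src_dir, dest_dir):
    """Move every .json file from src_dir into dest_dir (created if missing)."""
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for f in src.glob("*.json"):
        shutil.move(str(f), str(dest / f.name))
        moved.append(f.name)
    return moved

# Paths taken from the steps above:
# transfer_json_files("json-cms/AnalysisFW/python/outputjsons", "json_files")
```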
## How to create images
This will create the images in the CreatedImages folder.
- Augmentation (optional): not advised if you plan to use the free GPU provided by Google Colab, as you will need a stronger GPU to keep training times on the larger dataset reasonable. This step will create the augmented images in the CreatedImages/output folder.
## How to tune a Transfer Learning Model
In this part, we use Google Colaboratory to run the notebooks, as it provides GPU accelerators that speed up the training process. You will need a Google Drive account to upload your dataset to.
- Upload the CreatedImages folder to your google drive
- Rename the folders DYjets Images, TTjets Images and Wjets Images to 0, 1 and 2 respectively.
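A minimal sketch of this renaming step, assuming the three class folders sit directly inside the uploaded CreatedImages directory (the helper name is illustrative):

```python
from pathlib import Path

# Map each class folder to the numeric label the notebooks expect.
# Folder names are taken from the step above.
LABELS = {"DYjets Images": "0", "TTjets Images": "1", "Wjets Images": "2"}

def relabel_folders(root):
    """Rename the class folders under root to their numeric labels."""
    root = Path(root)
    for old_name, new_name in LABELS.items():
        old = root / old_name
        if old.is_dir():
            old.rename(root / new_name)

# relabel_folders("CreatedImages")  # run once on the uploaded dataset
```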
- Download a model you would like to use, or all the .ipynb files in this folder; the filenames indicate the model and optimizer used (e.g. "vgg19_quark_classification_RMSprop.ipynb" uses the VGG19 model with RMSprop as its optimizer).
An example of how to download an .ipynb file (use the raw file URL; running wget on the github.com blob page downloads the HTML page instead of the notebook):
wget https://raw.githubusercontent.com/jzyee/cms_image_classification/master/Training_Model/incepV3_quark_classification.ipynb
- Open the link https://colab.research.google.com, create a new Python 3 notebook, and upload the .ipynb files via "File -> Upload notebook" in the menu at the top left of the page. You should then see the code loaded in Google Colaboratory.
- To use the GPU accelerator, change the setting via "Runtime -> Change runtime type -> Hardware accelerator -> GPU -> SAVE". If loaded successfully, the third cell will print 'Found GPU at: /device:GPU:0'.
- To rerun the code, the images are needed and, as the code assumes, should be uploaded to your Google Drive. The fifth cell grants access to your Google Drive by mounting the drive. You will need to change the image paths to match where you have stored the images on your drive.
Each notebook is essentially made up of 2 main parts:
- notebook setup
- model
For notebook setup:
Run all the cells under this header; they install the necessary packages and create your dataset. You will need to change the working directory and the image path to where you have stored your images in your Google Drive account; you will find these cells under the heading "data prep".
An example of setting the current working directory in Colab to 'drive/My Drive/lhc_durham':
%cd drive/'My Drive/lhc_durham'
An example of setting the image path in relation to the working directory (drive/My Drive/lhc_durham/filtered_images):
img_folder = '/filtered_images'
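As a sanity check before training, you can verify that the numeric class folders are visible at the resolved image path; a hypothetical helper, assuming the working-directory and image-path conventions shown above:

```python
import os

def dataset_ready(working_dir, img_folder):
    """Return True if the class folders 0, 1 and 2 exist at the image path.

    img_folder is expected to start with '/', matching the examples above,
    so concatenation yields e.g. 'drive/My Drive/lhc_durham/filtered_images'.
    """
    root = working_dir + img_folder
    return all(os.path.isdir(os.path.join(root, label)) for label in ("0", "1", "2"))
```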
Model section:
Each model section is broken up into 4 parts:
- entirely frozen
- few layers unfrozen
- many layers unfrozen
- entirely unfrozen
You can choose to run one section to obtain a model tuned with a particular frozen-unfrozen proportion, or run all the cells in the notebook to compare the model's performance across the different frozen-unfrozen proportions. This technique of tuning the model is inspired by the graph above.
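The frozen-unfrozen proportions can be thought of as choosing a boundary index in the base model's layer list; a minimal pure-Python sketch of that idea (the notebooks apply it through their deep-learning framework, and the fractions in the comment are illustrative, not the notebooks' exact values):

```python
def unfreeze_boundary(n_layers, unfrozen_fraction):
    """Return the layer index from which layers should become trainable.

    unfrozen_fraction = 0.0 -> entirely frozen, 1.0 -> entirely unfrozen.
    In a Keras-style notebook the boundary would then be applied as:
        for layer in base_model.layers[:boundary]: layer.trainable = False
        for layer in base_model.layers[boundary:]: layer.trainable = True
    """
    if not 0.0 <= unfrozen_fraction <= 1.0:
        raise ValueError("unfrozen_fraction must be in [0, 1]")
    return n_layers - round(n_layers * unfrozen_fraction)

# Illustrative mapping of the four parts of each model section:
# entirely frozen -> 0.0, few unfrozen -> ~0.1,
# many unfrozen -> ~0.5, entirely unfrozen -> 1.0
```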
If you run all the cells in the notebook, the trained models will be saved along with their fitting histories, so that you can load a model at a later time and run predictions again without retraining.
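The fitting history is a plain dict of metric lists, so it can be persisted with the standard library; a minimal sketch (saving the model itself would use the framework's own save call, e.g. model.save(...) in Keras; the helper and file names here are illustrative):

```python
import json
from pathlib import Path

def save_history(history_dict, path):
    """Persist a fitting-history dict (e.g. a Keras History.history) as JSON."""
    Path(path).write_text(json.dumps(history_dict))

def load_history(path):
    """Reload a previously saved fitting history."""
    return json.loads(Path(path).read_text())

# Example: save_history(history.history, "vgg19_frozen_history.json")
```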
Congratulations, you have carried out transfer learning model tuning!
## References and Useful Links
- The European Organization for Nuclear Research (2011). CMS detector design. Available at: http://cms.web.cern.ch/news/cms-detector-design (Accessed: November 2011).
- The European Organization for Nuclear Research (2011). How CMS detects particles. Available at: http://cms.web.cern.ch/news/how-cms-detects-particles (Accessed: November 2011).
- CERN openlab (2017). White Paper: Future Challenges in Scientific Research. Available at: http://cds.cern.ch/record/2301895/files/Whitepaper_brochure_ONLINE.pdf (Accessed: September 2017).
- Towards Data Science (2018). Transfer learning from pre-trained models. Available at: https://towardsdatascience.com/transfer-learning-from-pre-trained-models-f2393f124751