Tips and tricks for successfully completing the ATLAS IML tutorial

Steps required to complete the ATLAS IML workshop tutorial

The tutorial can be found here: https://indico.cern.ch/event/668017/contributions/2947042/

Problem

Fetch the IPython notebooks to a storage area that you can use on the IC cluster. Currently this is your home area.

Solution



  1. create your working directory -
    • mkdir ~/ML-example
  2. go to your working directory -
    • cd ~/ML-example
  3. fetch the notebooks from GitHub -
    • git clone https://github.com/stwunsch/iml_tensorflow_keras_workshop

Problem

The first time you run a notebook from the tutorial you will be asked to set a kernel.

Solution



  • A pop-up screen will ask you to set a kernel.
  • Select the kernel "Python  conda env:4.2.0" (it should come up as the default).
  • Select Set Kernel.
  • The kernel will be set and the pop-up screen will disappear.

Note - this is a Python 3 kernel, not a Python 2 kernel.
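
To confirm which Python version the selected kernel runs, a cell like the following can be executed in any notebook. This is a minimal sketch using only the standard library; the helper name kernel_major_version is illustrative and not part of the tutorial.

```python
import sys


def kernel_major_version():
    """Return the major version of the Python interpreter behind this kernel."""
    return sys.version_info.major


# Show the full version string and warn if the kernel is not Python 3.
print(sys.version)
if kernel_major_version() < 3:
    print("Warning: this is a Python 2 kernel; select the Python 3 conda kernel.")
```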

Problem

All notebooks report that they are not trusted.

Solution



  • The first time you open a notebook it will not be trusted, and this is shown in your web browser.

  • Select the Not Trusted button and the Jupyter trust screen will appear.

  • Select the red Trust button and the notebook will become trusted.

  • Note - you can currently run untrusted notebooks.
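
As an alternative to clicking through the trust screen for each notebook, Jupyter provides a "jupyter trust" command. The sketch below builds one such command per notebook. It assumes that the jupyter command is on your path and that the notebooks were cloned under ~/ML-example as described at the top of this page. The helper names are hypothetical, and this approach has not been verified on the IC cluster.

```python
import subprocess
from pathlib import Path


def find_notebooks(root):
    """Return all .ipynb files under root, skipping Jupyter checkpoint copies."""
    return sorted(
        p for p in Path(root).expanduser().rglob("*.ipynb")
        if ".ipynb_checkpoints" not in p.parts
    )


def trust_commands(root):
    """Build one 'jupyter trust' command (as an argument list) per notebook."""
    return [["jupyter", "trust", str(nb)] for nb in find_notebooks(root)]


def trust_all(root):
    """Run 'jupyter trust' on every notebook found under root."""
    for cmd in trust_commands(root):
        subprocess.run(cmd, check=True)


# Example call (assumed clone location; uncomment to run):
# trust_all("~/ML-example/iml_tensorflow_keras_workshop")
```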

Problem

You would like to monitor the GPU activity of your notebook.

Solution



  1. determine which node you are running on -
    1. go to one of the IC submit hosts, icsubmit01.sdcc.bnl.gov or icsubmit02.sdcc.bnl.gov
    2. determine whether you have a running notebook in the SLURM batch system -
      • squeue -u <your username>
      • look for the NODELIST column in the output

        squeue example
        -bash-4.2$ squeue -u benjamin
                     JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                    200461      long spawner- benjamin  R       8:46      1 ichost142
        
  2. add the node name to the Grafana monitoring URL and go to the node monitoring page
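
The node lookup in step 1 can also be scripted. The following is a minimal sketch that parses squeue output in the column layout of the example above and returns the node names of running jobs. The function names are hypothetical. The parser assumes that NODELIST(REASON) is the last column, that running jobs have ST equal to R, and that job names contain no spaces.

```python
import subprocess


def running_nodes(squeue_output):
    """Extract the node names of running (ST == 'R') jobs from squeue text."""
    lines = [ln for ln in squeue_output.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    st_col = header.index("ST")  # raises ValueError if the layout differs
    nodes = []
    for line in lines[1:]:
        fields = line.split()
        # Skip rows that do not have one field per header column.
        if len(fields) < len(header):
            continue
        if fields[st_col] == "R":
            nodes.append(fields[-1])
    return nodes


def my_running_nodes(username):
    """Run 'squeue -u <username>' on a submit host and parse its output."""
    result = subprocess.run(
        ["squeue", "-u", username],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return running_nodes(result.stdout)


# The example output from this page, used as a self-contained check.
example = """\
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            200461      long spawner- benjamin  R       8:46      1 ichost142
"""
print(running_nodes(example))  # ['ichost142']
```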

Problem

The notebook iml_tensorflow_keras_workshop/keras/mnist_train.ipynb requires a data file from Amazon, but the IC compute nodes do not have outbound connectivity.

Solution



  1. from one of the IC submit hosts (icsubmit01 or icsubmit02) create the keras dataset directory -
    • mkdir -pv ~/.keras/datasets
  2. go to this newly created directory -
    • cd ~/.keras/datasets
  3. use wget to fetch the required input file for the notebook -
    • wget https://s3.amazonaws.com/img-datasets/mnist.npz
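
Keras looks for cached datasets under ~/.keras/datasets by default, which is why the file is placed there. As a quick check before starting the notebook, the sketch below reports whether the file is in place. The helper names are hypothetical, and the home parameter exists only so that the check can be pointed at another directory.

```python
from pathlib import Path


def keras_dataset_path(filename, home=None):
    """Return the default Keras cache location for a dataset file."""
    base = Path(home) if home is not None else Path.home()
    return base / ".keras" / "datasets" / filename


def is_cached(filename, home=None):
    """True if the dataset file exists in the Keras cache and is not empty."""
    path = keras_dataset_path(filename, home)
    return path.is_file() and path.stat().st_size > 0


print("mnist.npz cached:", is_cached("mnist.npz"))
```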

Problem

The notebook iml_tensorflow_keras_workshop/keras/HIGGS_train.ipynb requires a data file from the internet, but the IC compute nodes do not have outbound connectivity.

Solution



  1. from one of the IC submit hosts (icsubmit01 or icsubmit02) go to the directory with the notebook -
    • cd ~/ML-example/iml_tensorflow_keras_workshop/keras/
  2. use wget to fetch the required input file for the notebook (this is a large file, 1.2 GB) -
    • wget http://mlphysics.ics.uci.edu/data/higgs/HIGGS.h5
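
Because the download is large, it is worth confirming that it finished before opening the notebook. This sketch reports the file size and compares it with a lower bound. The 1.0 GB threshold is an assumption derived from the approximate 1.2 GB figure above, not an exact size or checksum, and the function names are illustrative.

```python
import os


def size_in_gb(path):
    """Return the size of a file in GB (1 GB = 1e9 bytes)."""
    return os.path.getsize(path) / 1e9


def looks_complete(path, min_gb=1.0):
    """True if the file exists and is at least min_gb in size (assumed bound)."""
    return os.path.isfile(path) and size_in_gb(path) >= min_gb


# Location used in the steps above.
higgs = os.path.expanduser(
    "~/ML-example/iml_tensorflow_keras_workshop/keras/HIGGS.h5"
)
print("HIGGS.h5 looks complete:", looks_complete(higgs))
```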

Problem

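The IPython notebooks were written for a Python 2 kernel but now run in Python 3. Integer division changed between Python 2 and Python 3: in Python 3, "/" returns a float even when both operands are integers, and "//" is needed to get an integer result.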

Solution



  1. change the code in notebook ML-example/iml_tensorflow_keras_workshop/keras/custom_loss_metric_callback.ipynb
    • change "/2" to "//2" (note the double slash) in all integer division steps
  2. change the code in notebook ML-example/iml_tensorflow_keras_workshop/keras/fit_generator.ipynb
    • change "/2" to "//2" (note the double slash) in all integer division steps
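
The effect of the change is easy to see in a short example. In Python 3, "/" always returns a float, while "//" performs floor division and returns an integer for integer operands. A float cannot be used as a slice bound or list index, which is the kind of failure an unmodified "/2" can trigger. The variable names below are illustrative and are not taken from the notebooks.

```python
samples = list(range(10))

half_float = len(samples) / 2   # true division: a float in Python 3
half_int = len(samples) // 2    # floor division: an int

print(type(half_float).__name__, type(half_int).__name__)  # float int

# An integer works as a slice bound.
first_half = samples[:half_int]
print(first_half)

# A float does not: Python 3 raises TypeError for float slice bounds.
try:
    samples[:half_float]
except TypeError as err:
    print("TypeError:", err)
```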