Tips and tricks for successfully completing the ATLAS IML tutorial
Steps required to complete the ATLAS IML workshop tutorial
The tutorial can be found here: https://indico.cern.ch/event/668017/contributions/2947042/
Problem
Fetch the IPython notebooks to a storage area that you can use on the IC cluster. Currently this is your home area.
Solution
- create your working directory -
- mkdir ~/ML-example
- go to your working directory -
- cd ~/ML-example
- fetch the notebooks from github -
- git clone https://github.com/stwunsch/iml_tensorflow_keras_workshop
Problem
The first time you run a notebook from the tutorial you will be asked to set a kernel.
Solution
- When you first run a notebook from this tutorial, a pop-up screen asks you to set a kernel -
- Select the kernel - "Python conda env:4.2.0" (it should come up as default)
- Click Set Kernel
- The kernel will be set and the pop-up screen will disappear.
Note - this is a Python 3 kernel, not a Python 2 kernel
Problem
All notebooks report that they are not trusted.
Solution
- The first time you open a notebook it will not be trusted, and this is shown in your web browser
- Select the Not Trusted button and the Jupyter trust screen will appear
- Select the red Trust button and the notebook will become trusted
- Note that untrusted notebooks can currently still be run
Problem
You would like to monitor the activity of the GPUs used by your notebook
Solution
- determine what node you are running on -
- go to one of the IC submit hosts icsubmit01.sdcc.bnl.gov or icsubmit02.sdcc.bnl.gov
- determine whether you have a running notebook in the SLURM batch system -
- squeue -u <your username>
- look for NODELIST in the output -
squeue example -
-bash-4.2$ squeue -u benjamin
JOBID  PARTITION  NAME      USER      ST  TIME  NODES  NODELIST(REASON)
200461 long       spawner-  benjamin  R   8:46  1      ichost142
- add the node name to the Grafana monitoring URL and go to the node monitoring page
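If you want to pick the node name out of the squeue output programmatically, the lookup can be sketched in Python as below. This is only a sketch: the function name running_nodes is made up for illustration, and it assumes the default squeue column layout shown in the example above, with no spaces inside any field.

```python
def running_nodes(squeue_output):
    """Return the NODELIST(REASON) entry of every running job in squeue output."""
    # Split each non-empty line into whitespace-separated fields.
    table = [line.split() for line in squeue_output.splitlines() if line.strip()]
    if not table:
        return []
    header, rows = table[0], table[1:]
    state_col = header.index("ST")
    node_col = header.index("NODELIST(REASON)")
    # Keep only jobs in state R (running); other rows carry a reason, not a node.
    return [
        row[node_col]
        for row in rows
        if len(row) > node_col and row[state_col] == "R"
    ]


# Sample text copied from the squeue example above.
sample = """\
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
200461 long spawner- benjamin R 8:46 1 ichost142
"""

print(running_nodes(sample))
```

On a submit host the input text could come from subprocess.run(["squeue", "-u", username], capture_output=True, text=True).stdout, where username is your own account name.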
Problem
The notebook iml_tensorflow_keras_workshop/keras/mnist_train.ipynb requires a data file from Amazon S3, but the IC compute nodes do not have outbound connectivity
Solution
- from one of the IC submit hosts (icsubmit01 or icsubmit02) create the keras dataset directory -
- mkdir -pv ~/.keras/datasets
- go to this newly created directory -
- cd ~/.keras/datasets
- use wget to fetch the required input file for the notebook -
- wget https://s3.amazonaws.com/img-datasets/mnist.npz
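Before starting the notebook on a compute node, it may be worth confirming that the file landed where Keras is expected to look for it. The sketch below only builds the path used in the steps above and checks that the file exists; the helper name keras_dataset_path is hypothetical, and the assumption that Keras reads its cached datasets from ~/.keras/datasets is taken from the steps above.

```python
from pathlib import Path


def keras_dataset_path(filename, home=None):
    """Build the path of a cached dataset file under <home>/.keras/datasets."""
    base = Path(home) if home is not None else Path.home()
    return base / ".keras" / "datasets" / filename


path = keras_dataset_path("mnist.npz")
# Report whether the download step above has already been done.
print(path, "found" if path.is_file() else "missing")
```

If the file is reported as missing, repeat the wget step from a submit host, since the compute nodes cannot download it themselves.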
Problem
The notebook iml_tensorflow_keras_workshop/keras/HIGGS_train.ipynb requires a data file from the internet, but the IC compute nodes do not have outbound connectivity
Solution
- from one of the IC submit hosts (icsubmit01 or icsubmit02) go to the directory with the notebook -
- cd ~/ML-example/iml_tensorflow_keras_workshop/keras/
- use wget to fetch the required input file for the notebook (this is a big file, 1.2 GB) -
- wget http://mlphysics.ics.uci.edu/data/higgs/HIGGS.h5
Problem
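The IPython notebooks were written for a Python 2 kernel but now run under Python 3. Integer division changed between Python 2 and Python 3: in Python 3, "/" between two integers returns a float, whereas in Python 2 it returned an integer.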
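The difference can be seen in a few lines of Python 3. The variable names and values below are made up for illustration and are not taken from the notebooks.

```python
# Python 3 semantics: "/" is true division, "//" is floor division.
n_samples = 7  # hypothetical value

half_true = n_samples / 2    # float result
half_floor = n_samples // 2  # int result, what Python 2 gave for 7 / 2

data = list(range(10))

# Slice indices must be integers, so the Python 2 style expression fails.
try:
    data[: len(data) / 2]
    slice_failed = False
except TypeError:
    slice_failed = True

# The floor-division form works under Python 3.
first_half = data[: len(data) // 2]

print(half_true, half_floor, slice_failed, first_half)
```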
Solution
- change the code in notebook ML-example/iml_tensorflow_keras_workshop/keras/custom_loss_metric_callback.ipynb -
- change "/2" to "//2" in every integer division step (note the double /)
- change the code in notebook ML-example/iml_tensorflow_keras_workshop/keras/fit_generator.ipynb -
- change "/2" to "//2" in every integer division step (note the double /)
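To locate the lines that need this change, a notebook file can be scanned for "/2" in its code cells. The following is a rough sketch, not part of the tutorial: the function name find_division_by_two is hypothetical, it relies on the standard notebook JSON layout (a "cells" list whose entries have "cell_type" and "source"), and its regular expression is a simple filter whose hits should still be reviewed by hand.

```python
import json
import re

# A single "/" (not part of "//") followed by a literal 2 that is not the
# start of a longer number such as 20 or 2.0.
SINGLE_SLASH_TWO = re.compile(r"(?<!/)/\s*2(?![\w.])")


def find_division_by_two(notebook_json):
    """Return (cell index, line number, line) for code lines containing "/2"."""
    notebook = json.loads(notebook_json)
    hits = []
    for cell_index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The source is stored either as one string or as a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        for line_number, line in enumerate(text.splitlines(), start=1):
            if SINGLE_SLASH_TWO.search(line):
                hits.append((cell_index, line_number, line.strip()))
    return hits


# Tiny made-up notebook used only to demonstrate the function.
sample_notebook = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["a/2 in prose is ignored"]},
        {"cell_type": "code",
         "source": ["n = len(x)/2\n", "m = len(x)//2\n", "k = y / 2\n"]},
    ],
    "nbformat": 4,
})

for hit in find_division_by_two(sample_notebook):
    print(hit)
```

For a real notebook, pass the file contents instead, for example open("fit_generator.ipynb").read() from within the keras directory.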