Running the PSII Pipeline for Photosynthetic Potential Data

This pipeline uses data transformers to extract chlorophyll fluorescence data from image files. Before starting, switch to the alpha branch with git checkout alpha.

Pipeline Overview

The PSII analytical pipeline currently uses six programs:

Program | Function | Input | Output
cleanmetadata | Cleans gantry-generated metadata | metadata.json | metadata_cleaned.json
bin2tif | Converts bin compressed files to geotiff | image.bin | image.tif
resizetif | Resizes original geotiffs to correct | image.tif | resized_image.tif
plotclip | Clips geotiffs to the plot | resized_image.tif, shapefile.geojson | plot.tif
psii_segmentation | Segments images given a validated set of thresholds | plot.tif | segment.csv
psii_fluorescence_aggregation | Aggregates segmentation data for each image and calculates F0, Fm, Fv, and Fv/Fm | segment.csv, multitresh.json | fluorescence_agg.csv
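
The data flow in the table can be traced with a short sketch. It only prints each stage with its input and output file names as listed above and runs none of the programs; print_stages is a hypothetical helper name, not part of the pipeline.

```shell
# print_stages: hypothetical helper for illustration only.
# Each here-doc line is "program|input|output", copied from the table above.
print_stages() {
  while IFS='|' read -r program input output; do
    printf '%s: %s -> %s\n' "$program" "$input" "$output"
  done <<'EOF'
cleanmetadata|metadata.json|metadata_cleaned.json
bin2tif|image.bin|image.tif
resizetif|image.tif|resized_image.tif
plotclip|resized_image.tif, shapefile.geojson|plot.tif
psii_segmentation|plot.tif|segment.csv
psii_fluorescence_aggregation|segment.csv, multitresh.json|fluorescence_agg.csv
EOF
}

print_stages
```

Reading the output top to bottom shows that each stage consumes the file produced by the one before it.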

Running the Pipeline

Note

At this point, we assume that the interactive “foreman” and “worker” nodes are already set up and running, and that the pipelines have been cloned from GitHub. If this is not the case, complete that setup before continuing.

Retrieve data

Navigate to your PSII directory, download the data from the CyVerse DataStore with iRODS commands, and extract the archive:

cd /<personal_folder>/PhytoOracle/PSII
iget -rKVP /iplant/home/shared/phytooracle/season_10_lettuce_yr_2020/level_0/ps2Top/<ps2Top-date.tar>
tar -xvf <ps2Top-date.tar>
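
Large transfers are sometimes interrupted, so it can help to confirm that the archive is readable before extracting it. The following is a minimal sketch, assuming the tarball is in the current directory; check_and_extract is a hypothetical helper name and is not part of the PhytoOracle scripts.

```shell
# check_and_extract: hypothetical helper, not part of the PhytoOracle scripts.
# It lists the archive first (tar -tf) so a truncated or corrupt download
# fails before any files are written, then extracts with tar -xvf as above.
check_and_extract() {
  archive="$1"
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi
  if ! tar -tf "$archive" > /dev/null 2>&1; then
    echo "archive is unreadable or incomplete: $archive" >&2
    return 1
  fi
  tar -xvf "$archive"
}

# Usage (substitute the file name you downloaded):
#   check_and_extract <ps2Top-date.tar>
```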

Edit scripts

  • process_one_set.sh, process_one_set2.sh

    Find your current working directory using the command pwd. Open process_one_set.sh and paste the output from pwd into line 15. It should look something like this:

    HPC_PATH="/xdisk/group_folder/personal_folder/PhytoOracle/PSII/"
    

    Set your .simg folder path in line 16.

    SIMG_PATH="/xdisk/group_folder/personal_folder/PhytoOracle/singularity_images/"
    
  • entrypoint.sh, entrypoint-2.sh

    In lines 7 and 11, specify the location of CCTools:

    /home/<u_num>/<username>/cctools-<version>-x86_64-centos7/bin/jx2json
    

    and

    /home/<u_num>/<username>/cctools-<version>-x86_64-centos7/bin/makeflow
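
Before running the pipeline, the CCTools paths entered above can be checked with a short sketch like the one below. It assumes both binaries live in the same bin directory, as in the example paths; check_cctools is a hypothetical helper name.

```shell
# check_cctools: hypothetical helper for illustration only.
# Pass the CCTools bin directory referenced in entrypoint.sh; the function
# confirms that jx2json and makeflow both exist there and are executable.
check_cctools() {
  bin_dir="$1"
  status=0
  for tool in jx2json makeflow; do
    if [ -x "$bin_dir/$tool" ]; then
      echo "ok: $bin_dir/$tool"
    else
      echo "missing or not executable: $bin_dir/$tool" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage (substitute your own CCTools location):
#   check_cctools /home/<u_num>/<username>/cctools-<version>-x86_64-centos7/bin
```

A non-zero return status means at least one of the two paths needs correcting in entrypoint.sh or entrypoint-2.sh.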
    

Run pipeline

Begin processing using:

./run.sh <folder_to_process>

Note

This may print a notice containing a “FATAL” error. The message appears while the pipeline waits for a connection to DockerHub, which can take some time, and does not by itself mean the run has failed. If there is a real problem, the pipeline usually fails quickly.

If the pipeline fails, check that the HPC_PATH value in process_one_set.sh ends with a “/”. A missing trailing slash is one of the most common errors; the slash is needed to connect the program scripts to the HPC.
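
The trailing “/” check can be scripted. This is a minimal sketch, assuming the path is assigned in the form HPC_PATH="..." as in the example earlier; both function names are hypothetical helpers, not part of the pipeline.

```shell
# ensure_trailing_slash: hypothetical helper that prints the given path,
# appending "/" only when the path does not already end with one.
ensure_trailing_slash() {
  case "$1" in
    */) printf '%s\n' "$1" ;;
    *)  printf '%s/\n' "$1" ;;
  esac
}

# path_var_ends_with_slash VAR FILE: hypothetical helper that succeeds when
# FILE contains an assignment of the form VAR="..." whose value ends in "/".
path_var_ends_with_slash() {
  grep -Eq "^[[:space:]]*$1=\"[^\"]*/\"" "$2"
}

# Usage:
#   ensure_trailing_slash "$(pwd)"
#   path_var_ends_with_slash HPC_PATH process_one_set.sh || echo "add a trailing /"
```

The first helper is convenient when pasting the output of pwd, which does not end with a slash.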