Stopping sbatch exit after transferring to controller
I am attempting to automate calling a script. To do this, I have a top-level script that submits my analysis script with sbatch. However, once the analysis script is submitted using sbatch, the top-level script terminates. Is there a way to prevent this? Thank you in advance for any help.
Due to a limit on the maximum number of jobs I can have submitted, using dependencies is not an option. Using the --wait flag is also not an option, because each sbatch call would then block until its job finishes before the remaining sbatch calls are made, so only a portion of the desired runs would be loaded onto the controller at any one time. Also, I do not have sudo privileges on the cluster I work on. I have searched the internet to no avail.
# Count of my currently queued/running jobs
jobnumber=$(squeue -h -r -u <username> -o '%i' | wc -l)

samplelist=(1 2 3 4)
splitlist=(xaa xab xac xad xae xaf xag xah xai xaj)

### Currently the script is closing after the first sbatch call, due to an exit signal once the job successfully transfers to the controller
for sample in "${samplelist[@]}"; do
    for xlist in "${splitlist[@]}"; do
        arrsize=$(wc -l < "$sample/Ctg5PrBed/$xlist")   # line count only, without the filename appended
        sbatch ~/Promoters/scripts/bash/Paraclu.sh 2 "$xlist" "$sample" "$arrsize"
        # Throttle: pause while more than 100 of my jobs are queued or running
        while [ "$jobnumber" -gt 100 ]; do
            sleep 3h
            jobnumber=$(squeue -h -r -u aboyd003 -o '%i' | wc -l)
        done
    done
done
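For comparison, here is a sketch of the same loop with the queue check moved before each submission, so the throttle applies to every sbatch call rather than only kicking in after one has been made (the 100-job threshold, username placeholder, and paths are taken from the snippet above; the 10-minute poll interval is an assumption):

for sample in "${samplelist[@]}"; do
    for xlist in "${splitlist[@]}"; do
        # Wait here until my queued/running job count drops below the limit
        while [ "$(squeue -h -r -u <username> -o '%i' | wc -l)" -gt 100 ]; do
            sleep 10m
        done
        arrsize=$(wc -l < "$sample/Ctg5PrBed/$xlist")
        sbatch ~/Promoters/scripts/bash/Paraclu.sh 2 "$xlist" "$sample" "$arrsize"
    done
done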
See also questions close to this topic
-
Terraform v0.13.3: Error: Failed getting task definition ClientException: Unable to describe task definition. "wordpress"
My Terraform code for the task definition resource looks like this:
resource "aws_ecs_task_definition" "wordpress" { family = "wordpress" container_definitions = <<DEFINITION
[
When I run the terraform plan command for my ECS cluster, it gives the error below:
Error: Failed getting task definition ClientException: Unable to describe task definition. "wordpress"
Do you have any insights on how to resolve this? Thanks.
-
Why does the Failover Cluster Manager GUI not reflect our PowerShell input?
We are running the following commands in this order:
Stop-ClusterResource resource-name -Cluster cluster-name
This is successful and reports the resource as stopped.
Start-ClusterResource resource-name -Cluster cluster-name
This is successful and reports the resource as online; however, the FCM GUI looks like the picture below, despite the Get-ClusterResource command reporting it as Online.
(Picture: discrepancy between CLI response and GUI feedback.)
When stopping and starting a resource via the FCM GUI it works as intended.
-
Reliable messaging crate for Rust
What reliable message delivery crates are available for Rust that are similar to JGroups?
Should support:
- cluster creation and deletion across LAN and WAN
- joining and leaving of clusters
- detection and removal of crashed nodes
- view change notification
- point to multipoint reliable message delivery
- point to point reliable message delivery
I searched on crates.io without any success.
-
How to prepare a code for execution on a cluster so that it takes one parameter from a .txt file at a time?
I am preparing some C++ code to be run on a cluster managed by SLURM. The cluster takes one compiled file, a.out, and executes it on 500 different nodes via a job array. Each copy of the program needs to read one input parameter, say double parameter. My idea was to prepare a .txt file holding one value of parameter per line. What is the best strategy for implementing the reading of these parameter values?
- a.out reads the value from the first line and immediately deletes it. If this is the right strategy, how do I ensure that two copies of a.out are not doing the same thing at the same time?
- a.out reads the value from the n-th line. How do I let each copy of a.out know which n it is working with?
Is there a better implementation strategy than the two above? If so, how would I do it? Is C++ fstream the way to go, or should I try something completely different?
Thank you for any ideas. I would appreciate it if you also left some very simple code showing what a.out should look like.
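A minimal sketch of the second strategy at the submission-script level, assuming the values live in a file called params.txt and that a.out accepts the parameter as its first command-line argument (both names are illustrative); each array task uses its SLURM_ARRAY_TASK_ID as the line number n:

#!/bin/bash
#SBATCH --job-name=param_sweep
#SBATCH --array=1-500
# Pull the n-th line of the parameter file, where n is this task's array index
parameter=$(sed -n "${SLURM_ARRAY_TASK_ID}p" params.txt)
# Hand the value to the program as an argument, so no copy of a.out has to
# modify the shared file and no locking is needed
./a.out "$parameter"

Inside the C++ code the value can then be read from argv[1] (for example with std::stod), which avoids concurrent reads and deletes on the shared file.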
-
-bash: all: command not found
I work in SLURM, and I added this wrong path to my .bashrc:
export path=/home/zabbas/wwatch3/exe/ww3_shel
I forgot to add the $, and now I get these errors:
-bash: ls: command not found
-bash: vim: command not found
I can access all directories, but I can't run anything:
-bash: srun: command not found
-bash: squeue: command not found
-bash: sbatch: command not found
-bash: sinfo: command not found
How can I solve this problem and correct my .bashrc?
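One way to recover, sketched below, is to restore a working PATH (upper case) in the current session and then fix the .bashrc line so it keeps the existing PATH; the system directories listed here, and the use of the exe directory rather than the ww3_shel file, are assumptions:

# Temporary repair for the current shell so ls, vim, sbatch, etc. are found again
export PATH=/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
# Corrected .bashrc line: prepend the new directory and keep the existing PATH
export PATH=/home/zabbas/wwatch3/exe:$PATH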
-
Why is it not possible to run wget with the background option in a SLURM script?
I used this script for downloading files. Without -b, wget downloads the files one by one. With -b, I can download files in the background and also simultaneously. Unfortunately, the script doesn't work in SLURM; it only works without -b in SLURM.
Script for downloading files
#!/bin/bash
mkdir data
cd data
for i in 11 08 15 26 ; do
    wget -cbq ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR116/0${i}/SRR116802${i}/SRR116802${i}_1.fastq.gz
    wget -cbq ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR116/0${i}/SRR116802${i}/SRR116802${i}_2.fastq.gz
done
cd ..
Slurm Script
#!/bin/bash
#SBATCH --job-name=mytestjob     # create a short name for your job
#SBATCH --nodes=2                # node count
#SBATCH --ntasks=2               # total number of tasks across all nodes
#SBATCH --cpus-per-task=2        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G is default)
#SBATCH --time=10:01:00          # total run time limit (HH:MM:SS)
#SBATCH --array=1-2              # job array with index values 1, 2

# Execution
bash download.sh
On the terminal:
sbatch slurmsript.sh
(It doesn't work; no job id is printed.)
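For reference, a commonly suggested workaround is to background the wget processes with the shell's & and then wait for them inside the batch script, since a batch job typically ends (and its remaining processes are cleaned up) as soon as the script itself exits, which is what wget's -b daemon mode causes. A minimal sketch reusing the URLs from the download script above:

#!/bin/bash
mkdir -p data
cd data
for i in 11 08 15 26 ; do
    # Use the shell's & instead of wget's -b daemon mode so the job can wait for the downloads
    wget -cq ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR116/0${i}/SRR116802${i}/SRR116802${i}_1.fastq.gz &
    wget -cq ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR116/0${i}/SRR116802${i}/SRR116802${i}_2.fastq.gz &
done
wait    # keep the batch job alive until all background downloads finish
cd ..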
-
Chain multiple SLURM jobs with dependency
In a previous question I asked how to queue a job B to start after job A, which is done with
sbatch --dependency=after:123456:+5 jobB.slurm
where 123456 is the id of job A, and :+5 denotes that it will start five minutes after job A.
I now need to do this for several jobs: job B should depend on job A, job C on B, and job D on C. sbatch jobA.slurm will return Submitted batch job 123456, and I will need to pass that job id to the dependency call for every job but the first. As I am using a busy cluster, I can't rely on the job ids incrementing by one, as someone might queue a job in between.
As such, I want to write a script that takes the job scripts (*.slurm) I want to run as arguments, e.g.
./run_jobs.sh jobA.slurm jobB.slurm jobC.slurm jobD.slurm
The script should then run, for each job script passed to it:
sbatch jobA.slurm                               # Submitted batch job 123456
sbatch --dependency=after:123456:+5 jobB.slurm  # Submitted batch job 123457
sbatch --dependency=after:123457:+5 jobC.slurm  # Submitted batch job 123458
sbatch --dependency=after:123458:+5 jobD.slurm  # Submitted batch job 123459
What is an optimal way to do this with bash?
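A minimal sketch of such a run_jobs.sh, assuming a Slurm version whose sbatch supports the --parsable flag (it prints just the job id, so no parsing of the "Submitted batch job ..." text is needed):

#!/bin/bash
# Usage: ./run_jobs.sh jobA.slurm jobB.slurm jobC.slurm jobD.slurm
# Submit the first script without a dependency and remember its job id
prev_id=$(sbatch --parsable "$1")
echo "Submitted $1 as job $prev_id"
shift
# Chain each remaining script to start five minutes after the previous one
for job_script in "$@"; do
    prev_id=$(sbatch --parsable --dependency=after:${prev_id}:+5 "$job_script")
    echo "Submitted $job_script as job $prev_id"
done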
-
How to properly call a perl script from within an SBATCH script for SLURM submissions? Perl script is executed but no output is obtained
I received a Perl script that is apparently called from an SBATCH script submitted as a job to a computer cluster managed by SLURM. The script is old and I am not yet very familiar with Perl. The Perl script is used as a wrapper to call an executable with mpiexec_mpt. Whenever I do sbatch sbatch_submission, the Perl script is executed on the compute node, but I don't obtain any output or any sign that the system() calls ran (or I do, but I don't know where it went). I know Perl is executed by SBATCH because I got an error that it couldn't find a module, so I manually pointed Perl to the library path using the -l flag as shown below. But after that I don't get any output. The SBATCH script and the Perl script are below:
SBATCH SCRIPT
#!/bin/bash
#SBATCH --job-name=job_submission
#SBATCH --output=output_perl.run
#SBATCH --error=error_perl.run
#SBATCH -n 2                 # request 2 cores
#SBATCH --constraint=intel

# Load Needed Modules:
module load mpt

# Set-up environment for perl:

# Running perl script:
echo "Calling simple hello_world.c with perl (sbatch)"

perl input_perl.pl 1> perl_in.stdout 2> perl_in.stderr   # edit after suggestions
echo "Done with perl script (sbatch)"
PERL INPUT
1 #!/usr/bin/perl -w
2 use strict;
3 use warnings;
4 use diagnostics;
5 use List::MoreUtils qw(indexes);   ## edit after suggestions
6 system("echo this is your hostname:");
7 system("hostname");
8 system("mpiexec_mpt -np 2 hello_world");
9 print "Done executing hello world! from within perl script!\n"
OUTPUT FROM STDERR
Can't locate List/MoreUtils.pm in @INC (@INC contains: /usr/lib64/perl5/vendor_perl/List /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at input_perl.pl line 5.
Aside from the output above, the output files perl.output and output_perl.run are empty.
I suspect I am missing something regarding the applicability of the system() method in Perl, as well as how to tell Perl where to send its output when working with SLURM. I have also tried generating a .txt file with the Perl script, but when I run it with SBATCH the .txt file is not generated. I have no issues running the Perl script without the SBATCH wrapper, e.g. perl perl_input.pl.
Additional info: the hello_world executable is written in C, and I have tested it independently and it runs. It is a simple MPI program that lists ranks and size. I don't think that's the issue, though.
Run independently and locally, both the Perl script and the C program work; it's when I use SBATCH that the issues arise.
I would appreciate it if you could give me any useful info or point me in the right direction to figure this out. Thanks!
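For illustration, a stripped-down variant of the SBATCH script above without the extra 1>/2> redirection, so whatever the Perl script and its system() calls print lands in the #SBATCH --output and --error files rather than in perl_in.stdout/perl_in.stderr in the submission directory; the PERL5LIB line is an assumption for locating List::MoreUtils and may not match your setup:

#!/bin/bash
#SBATCH --job-name=job_submission
#SBATCH --output=output_perl.run
#SBATCH --error=error_perl.run
#SBATCH -n 2
#SBATCH --constraint=intel

module load mpt
# Assumption: point Perl at a locally installed List::MoreUtils (adjust the path)
export PERL5LIB=$HOME/perl5/lib/perl5:$PERL5LIB

echo "Calling simple hello_world.c with perl (sbatch)"
# No redirection here: stdout/stderr from perl and its system() calls
# now go to output_perl.run / error_perl.run
perl input_perl.pl
echo "Done with perl script (sbatch)"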
-
Split a large gz file into smaller ones filtering and distributing content
I have a gzip file of size 81G; uncompressed, it is 254G. I want to implement a bash script that takes the gzip file and splits it on the basis of the first column. The first column has values ranging between 1 and 10. I want to split the file into 10 subfiles, whereby all rows whose first-column value is 1 go into one file, all rows whose first-column value is 2 go into a second file, and so on. While doing that, I don't want to put column 3 and column 5 in the new subfiles. The file is tab-separated. For example:
col_1  col_2.  col_3.   col_4.  col_5.   col_6
1.     7464    sam.     NY.     0.738.   28.9
1.     81932.  Dave.    NW.     0.163.   91.9
2.     162.    Peter.   SD.     0.7293.  673.1
3.     7193.   Ooni     GH.     0.746.   6391
3.     6139.   Jess.    GHD.    0.8364.  81937
3.     7291.   Yeldish  HD.     0.173.   1973
The file above would result in three different gzipped subfiles, with col_3 and col_5 removed from each of them. What I did was:
#!/bin/bash
#SBATCH --partition normal
#SBATCH --mem-per-cpu 500G
#SBATCH --time 12:00:00
#SBATCH -c 1

awk -F, '{print > $1".csv.gz"}' file.csv.gz
But this is not producing the desired result. I also don't know how to remove col_3 and col_5 from the new subfiles. As I said, the gzip file is 81G, so I am looking for an efficient solution. Insights will be appreciated.
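One possible approach, sketched below, is to stream-decompress the file once and let awk drop columns 3 and 5 while routing each record to a per-value gzip pipe keyed on column 1; the part_*.tsv.gz output names are illustrative, and a tab separator is assumed as stated in the question:

zcat file.csv.gz | awk -F'\t' -v OFS='\t' '
NR > 1 {                                  # skip the header row
    out = ""
    # rebuild the record without columns 3 and 5
    for (i = 1; i <= NF; i++) {
        if (i == 3 || i == 5) continue
        out = (out == "" ? $i : out OFS $i)
    }
    # stream each group straight into its own gzip process, keyed on column 1
    print out | ("gzip > part_" $1 ".tsv.gz")
}'

Since only about ten distinct values appear in column 1, awk keeps at most ten gzip pipes open at once, and the 254G of uncompressed data never has to be written to disk.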