CDAAC Data Reprocessing Instructions

1) Log into Fox first as cosmicops:
-> ssh cosmicops@fox
****************************************************************************************************************
2) Get the date range for this reprocessing job.
To find out what data has already been reprocessed, go to the CDAAC webpage:
http://cdaac-www.cosmic.ucar.edu/cdaac/products.html
Under "Data Access" / "post-processed", select the mission you would like to reprocess (e.g. GRACE).
This will show you the date range of the data already available.
Back in your terminal on Fox, run the findDoy.pl script to find the day-of-year range:
-> findDoy.pl 2017
Reprocessing ranges should always start the day before the first day of the month and end one day prior to the last day of the month.
This is because fiducial data must be available for one day after the last day of the job, and sometimes those data are not yet available.
Example: to reprocess June 2017 (month 06, days 2017.152-181), the date range for this job will be 2017.151-180.
****************************************************************************************************************
3) Once you have your date range, run data inventories to see whether all the fiducial data needed for this job is on disk for this date range + one day.
-> inventory.pl 2013.151-181 fid --filegroup=lvl0

[maggie@fox ~]$ inventory.pl 2013.151-181 fid --filegroup=lvl0
bitArc codDcb satCrx ecmwfp iSond1 igsOrb igsPol igsClk ncpgfs rsond1 codOrb codPol codClk ecmAnl eraInt igsSnx
2013.151: 24 1 1 2 6553 1 1 1 4 1 0 0 0 4 4 1
2013.152: 24 1 1 2 6453 1 1 1 4 1 0 0 0 4 4 1
2013.153: 24 1 1 2 6346 1 1 1 4 1 0 0 0 4 4 1
2013.154: 24 1 1 2 6589 1 1 1 4 1 0 0 0 4 4 1
2013.155: 24 1 1 2 6679 1 1 1 4 1 0 0 0 4 4 1
2013.156: 24 1 1 2 6657 1 1 1 4 1 0 0 0 4 4 1
2013.157: 24 1 1 2 6325 1 1 1 4 1 0 0 0 4 4 1
2013.158: 24 1 1 2 6060 1 1 1 4 1 0 0 0 4 4 1
2013.159: 24 1 1 2 6413 1 1 1 4 1 0 0 0 4 4 1
2013.160: 24 1 1 2 6379 1 1 1 4 1 0 0 0 4 4 1
2013.161: 24 1 1 2 6727 1 1 1 4 1 0 0 0 4 4 1
2013.162: 24 1 1 2 6639 1 1 1 4 1 0 0 0 4 4 1
2013.163: 24 1 1 2 6684 1 1 1 4 1 0 0 0 4 4 1
2013.164: 24 1 1 2 7131 1 1 1 4 1 0 0 0 4 4 1
2013.165: 24 1 1 2 7033 1 1 1 4 1 0 0 0 4 4 1
2013.166: 24 1 1 2 5653 1 1 1 4 1 0 0 0 4 4 1
2013.167: 24 1 1 2 5630 1 1 1 4 1 0 0 0 4 4 1
2013.168: 24 1 1 2 6207 1 1 1 4 1 0 0 0 4 4 1
2013.169: 24 1 1 2 6238 1 1 1 4 1 0 0 0 4 4 1
2013.170: 24 1 1 2 6794 1 1 1 4 1 0 0 0 4 4 1
2013.171: 24 1 1 2 7115 1 1 1 4 1 0 0 0 4 4 1
2013.172: 24 1 1 2 7099 1 1 1 4 1 0 0 0 4 4 1
2013.173: 24 1 1 2 6825 1 1 1 4 1 0 0 0 4 4 1
2013.174: 24 1 1 2 7182 1 1 1 4 1 0 0 0 4 4 1
2013.175: 24 1 1 2 7206 1 1 1 4 1 0 0 0 4 4 1
2013.176: 24 1 1 2 7127 1 1 1 4 1 0 0 0 4 4 1
2013.177: 24 1 1 2 7087 1 1 1 4 1 0 0 0 4 4 1
2013.178: 24 1 1 2 6800 1 1 1 4 1 0 0 0 4 4 1
2013.179: 24 1 1 2 6901 1 1 1 4 1 0 0 0 4 4 1
2013.180: 24 1 1 2 6890 1 1 1 4 1 0 0 0 4 4 1
2013.181: 24 1 0 2 6794 1 1 1 4 1 0 0 0 4 4 1
----------------------------------------------------------
744 31 30 62 206216 31 31 31 124 31 0 0 0 124 124 31
Grand total: 207610 Files

The above is an example where all fiducial data is present.
(It is OK for the file types codOrb, codPol, and codClk to have a count of 0; these were used for earlier date ranges, before the igsOrb, igsPol, and igsClk file types became available.)
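If the date range is long, scanning the table by eye for missing data is error-prone. The one-liner below is a quick way to flag zero counts. It is only a sketch: the column positions it skips (12-14, i.e. codOrb, codPol, codClk in the example header above) should be re-checked against the header that inventory.pl actually prints for your run, and any day it flags still needs a human judgment call (in the example above it would flag satCrx on the extra day 2013.181).
-> inventory.pl 2013.151-181 fid --filegroup=lvl0 | \
   awk '$1 ~ /^[0-9]+\.[0-9]+:$/ {                   # per-day rows only
          for (i = 2; i <= NF; i++)
              if ($i == 0 && (i < 12 || i > 14))      # skip codOrb/codPol/codClk columns
                  print $1, "column", i, "has a zero count"
        }'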
If you see other file types (e.g. ecmAnl, eraInt) with a count of 0, please let Teresa know (vanhove@ucar.edu).
****************************************************************************************************************
4) Once the fiducial data is available, fetch level 0 data for the mission you would like to reprocess, for this date range + one day.
At the moment we are doing reprocessing for the following missions:
cosmic - COSMIC
metopa - MetOp-A
metopb - MetOp-B
tsx - TerraSAR-X
grace - GRACE
a) First check the inventory; the data may already be available (the example here is for mission GRACE):
-> inventory.pl 2013.151-181 grace --filegroup=lvl0
b) If there is no data, then fetch it:
For mission GRACE (run as an individual user, e.g. maggie):
-> FetchRT.pm grcLv0 grace --daterange=2013.151-181 --nonotify --verbose
For mission MetOp-A - get it from HPSS:
-> unarchiveHsiTar.pl 2013.151-181 metopa --prefix=mtpLv0
For mission MetOp-B - get it from HPSS:
-> unarchiveHsiTar.pl 2013.151-181 metopb --prefix=mtpLv0
For mission COSMIC - get it from HPSS:
-> unarchiveHsiTar.pl 2013.151-181 cosmic --prefix=cosLv0
For mission TSX - there is a cron job on Fox that fetches the data:
40 */2 * * * /ops/tools/bin/getTsx.pl tsx 2>&1
To see if the data is available, do:
-> inventory.pl 2013.151-181 tsx --filegroup=lvl0
Verify that the mission level 0 data is available. Here is an example for mission GRACE:
-> inventory.pl 2013.151-181 grace --filegroup=lvl0
grcLv0
2013.151: 15
2013.152: 19
2013.153: 15
2013.154: 18
2013.155: 16
2013.156: 15
2013.157: 12
2013.158: 15
2013.159: 15
2013.160: 18
2013.161: 15
2013.162: 16
2013.163: 16
2013.164: 16
2013.165: 15
2013.166: 16
2013.167: 17
2013.168: 16
2013.169: 18
2013.170: 15
2013.171: 16
2013.172: 15
2013.173: 17
2013.174: 17
2013.175: 15
2013.176: 19
2013.177: 16
2013.178: 15
2013.179: 17
2013.180: 15
2013.181: 17
----------------------------------------------------------
497
Grand total: 497 Files
****************************************************************************************************************
5) After all level 0 data is on disk, check that the processing daemons are running:
-> daemons.pl --verify
The output should look like the following, with "Found" at the beginning of each line. If you see "Not found", advise the CDAAC team.
Found localhost(21621): /ops/tools/postgres/bin/postgres -D /ops/tools/postgres/data -i
Found localhost(00897): /ops/tools/apache/bin/httpd -k start
Found localhost(00911): /ops/tools/apache/bin/httpd -k start
Found localhost(00914): /ops/tools/apache/bin/httpd -k start
Found localhost(00924): /ops/tools/apache/bin/httpd -k start
Found localhost(00939): /ops/tools/apache/bin/httpd -k start
Found localhost(19399): /ops/tools/apache/bin/httpd -k start
Found localhost(19673): /ops/tools/apache/bin/httpd -k start
Found localhost(21636): queued.pl foo
Found localhost(10573): filed.pl foo
Found localhost(09774): master.pl fidrt
Found localhost(32002): rdNTRIP_BNC.pl fidrt startup
Found fox1(26059): slave.pl cosmic io --post --noclusterdb
Found fox1(26094): slave.pl cosmictest io --post --noclusterdb
Found fox1(26112): slave.pl grace io --post --noclusterdb
Found fox1(26186): slave.pl grace io --post --noclusterdb
Found fox1(26133): slave.pl metopa io --post --noclusterdb
Found fox1(26209): slave.pl metopa io --post --noclusterdb
Found fox1(26167): slave.pl metopb io --post --noclusterdb
Found fox1(26226): slave.pl metopb io --post --noclusterdb
Found fox1(26201): slave.pl tsx io --post --noclusterdb
Found fox1(26243): slave.pl tsx io --post --noclusterdb
Found fox2(13474): slave.pl fidrt io
Found fox2(18553): slave.pl fidrt io
Found fox3(03482): slave.pl fidrt io
Found fox3(03559): slave.pl fidrt serial
Found fox3(14153): slave.pl fidrt serial
Found fox4(06348): slave.pl cosmic io --post --noclusterdb
Found fox4(06387): slave.pl cosmic io --post --noclusterdb
Found fox4(06384): slave.pl cosmictest io --post --noclusterdb
Found fox4(06427): slave.pl cosmictest io --post --noclusterdb
Found fox4(06421): slave.pl grace io --post --noclusterdb
Found fox4(06462): slave.pl grace io --post --noclusterdb
Found fox4(06458): slave.pl metopa io --post --noclusterdb
Found fox4(06498): slave.pl metopa io --post --noclusterdb
Found fox4(06496): slave.pl metopb io --post --noclusterdb
Found fox4(06534): slave.pl metopb io --post --noclusterdb
Found fox4(06532): slave.pl tsx io --post --noclusterdb
Found fox4(06553): slave.pl tsx io --post --noclusterdb
Found fox5(23839): slave.pl cosmic io --post --noclusterdb
Found fox5(23845): slave.pl cosmic io --post --noclusterdb
Found fox5(23879): slave.pl cosmictest io --post --noclusterdb
Found fox5(23882): slave.pl cosmictest io --post --noclusterdb
Found fox5(23913): slave.pl grace io --post --noclusterdb
Found fox5(23919): slave.pl grace io --post --noclusterdb
Found fox5(23953): slave.pl metopa io --post --noclusterdb
Found fox5(23956): slave.pl metopa io --post --noclusterdb
Found fox5(23989): slave.pl metopb io --post --noclusterdb
Found fox5(23991): slave.pl metopb io --post --noclusterdb
Found fox5(24025): slave.pl tsx io --post --noclusterdb
Found fox5(24029): slave.pl tsx io --post --noclusterdb
Found fox6(00781): slave.pl cosmic io --post --noclusterdb
Found fox6(00782): slave.pl cosmic io --post --noclusterdb
Found fox6(00823): slave.pl cosmictest io --post --noclusterdb
Found fox6(00824): slave.pl cosmictest io --post --noclusterdb
Found fox6(00859): slave.pl grace io --post --noclusterdb
Found fox6(00865): slave.pl grace io --post --noclusterdb
Found fox6(00899): slave.pl metopa io --post --noclusterdb
Found fox6(00902): slave.pl metopa io --post --noclusterdb
Found fox6(00935): slave.pl metopb io --post --noclusterdb
Found fox6(00936): slave.pl metopb io --post --noclusterdb
Found fox6(00971): slave.pl tsx io --post --noclusterdb
Found fox6(00975): slave.pl tsx io --post --noclusterdb
****************************************************************************************************************
6) Then check that no other reprocessing jobs are running at the same time:
-> ps -elf | grep -i postProcess
If the output is only the grep process itself, as below, you can start your post-processing job:
0 S maggie 25906 19802 0 80 0 - 25809 pipe_w 09:41 pts/32 00:00:00 grep -i postProcess
If you see lines for a running postProcess.pl job, wait until those jobs finish.
****************************************************************************************************************
7) Start the job (date range 2013.151-180).
First go to the mission's output directory, e.g. for GRACE:
cd /pub/grace/output
-> nohup postProcess.pl 2013.151-180 grace --log --cluster &
****************************************************************************************************************
8) Check on the job.
The output files will be in the /pub/<mission>/output directory.
To see the output log for this job:
-> tail -f /pub/grace/output/postProcess_2013.151-180_0.out
To list the output files for mission GRACE:
-> ls -ltr /pub/grace/output/postProcess*
****************************************************************************************************************
9) Once the job is done, verify that reprocessing has been successful.
a) Check the high-level inventory for this date range:
-> inventory.pl 2013.151-180 grace --filegroup=highlevel
If you see some days with 0 files, look in the /pub/grace/output/YYYY.DDD directories to see what the problem was.
b) Check the stats on the web.
Go to the website on Fox: http://fox.cosmic.ucar.edu:8080/cdaac/postProcess.html
Click on a mission (e.g. "GRACE PP results"), find the job's start day, and click on that link.
The page will show stats, failures, and inventories for this reprocessing job.
****************************************************************************************************************
10) In case of failures, here are some commands to clean up the data and then reprocess again.
To see a list of processing steps for this mission:
-> postProcess.pl 1991.111 grace --list
To clean data for this mission (this will clean files and database entries from step 1 onwards):
-> cleanup.pl 2013.151-180 grace --files --db --steps=1-
To restart from a given step (e.g. step 29):
-> postProcess.pl 2013.151-180 grace --steps=29- --log --cluster &
****************************************************************************************************************
11) Data Archiving
Once verified, reprocessed data should be archived and then migrated to both CDAAC-WWW (web server) and Lynx (research server).
Archive the latest reprocessed data from Fox, for filegroup=archive:
ssh fox
kinit
archiveHsiTar.pl 2013.151-180 grace --filegroup=archive --symlinks
Note: the --symlinks option was added because we keep the GRACE data on the nexus, so it is symlinked to the grace mission.
****************************************************************************************************************
12) Data Publishing
Once verified, reprocessed data should be migrated to both CDAAC-WWW (web server) and Lynx (research server).
----------------------------------------------------------------------------------------------
Manx - Research server:
Notes: We are currently not pushing to manx. Scientists running test missions can grab the data from cdaac-www. That way we avoid having data on manx that they may not be interested in.
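As an optional extra check on top of the verification above, you can count how many slave processes are running for the mission you are about to reprocess. This is only a convenience sketch; the expected count depends on how many Fox nodes run slaves for that mission (8 grace slaves in the example output above).
-> daemons.pl --verify | grep -c "slave.pl grace"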
However, if it is needed:
First push from Fox:
-> pushFiles.pl 2013.151-180 grace manx --filegroup=research --fast
We also need to ssh to manx to undump the dbdump files (this populates the database with the latest results). To do this:
ssh manx
-> dailyDbUnDump.pl 2013.151-180 grace
exit
----------------------------------------------------------------------------------------------
CDAAC-WWW - Web server:
New way (when the data is stored only as daily archived tarballs in /pub/archive): you will get the data from the HPSS archive as daily tarballs. The following command will untar the archive, create daily tarballs, populate the database, and extract the files needed for the Post Process web stats:
ssh cdaac-www
kinit
umask 002
/ops/tools/bin/unarchiveHsiTar.pl 2012.305-366 tsx --filegroup=public --dailytar --dbdumps --webreports
Please note: for the dbdumps to be undumped, you will need a user in the database on cdaac-www (createuser). Please email cosmicops@ucar.edu if you have questions about this process.
Note from Teresa: I haven't had any luck with the unarchiveHsiTar.pl script on cdaac-www for the last several months. Because the old CPU will be replaced (the hardware is already here), I have just been working around it. On Fox I do
hsi get /COSMIC/CDAAC/metopa/archive_2017.152-181.tar
for the monthly archive htar ball just created. Then I scp this htar file to cdaac-www. Then (this can be done as user cosmicops, since no hsi is involved) I run
unarchiveLocalTar.pl archive_2017.182-212.tar metopa public --dbdumps --webreports
Once the daily tarballs are on cdaac-www, I remove the monthly tarball from Fox and from cdaac-www.
This is clunky, but my suspicion is that cdaac-www is just so slow with its handshaking that the authentications to mass store time out. I looked at all the documentation I could find for htar and couldn't find any way to tell it to allow a longer period before timing out. If the problem persists when the new machine is up and running, Matt and Doug will have to sort out the issue and fix it.
****************************************************************************************************************
Any time a postProcess job has to be redone, run cleanup.pl to make sure the database (especially) is not confused by double counts, and to remove any old files. The --steps=1- option is critical; it makes sure the cleanup does not remove the grcLv0 files.
cleanup.pl 2017.213-243 grace --steps=1- --files --db
If you had to kill a job that was hanging, also always do:
daemons.pl --stop --regexp=grace ; sleep 3; daemons.pl --start
****************************************************************************************************************
For any questions concerning this process, please send an email to the COSMIC operations group at cosmicops@ucar.edu.
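****************************************************************************************************************
Quick reference: the commands from steps 2-11 above, condensed into one sequence for a hypothetical GRACE job covering June 2017 (days 2017.152-181, so the job range is 2017.151-180). This is only a sketch of the normal path described above; check each step's output before moving on, and substitute your own mission and dates.
# On Fox, as cosmicops (condensed sketch; adjust mission and dates per job)
findDoy.pl 2017                                                          # step 2: find the day-of-year range
inventory.pl 2017.151-181 fid --filegroup=lvl0                           # step 3: fiducial data for range + one day
inventory.pl 2017.151-181 grace --filegroup=lvl0                         # step 4a: is level 0 already on disk?
FetchRT.pm grcLv0 grace --daterange=2017.151-181 --nonotify --verbose    # step 4b: fetch (run as an individual user)
daemons.pl --verify                                                      # step 5: daemons running?
ps -elf | grep -i postProcess                                            # step 6: no other reprocessing job running?
cd /pub/grace/output                                                     # step 7: start the job
nohup postProcess.pl 2017.151-180 grace --log --cluster &
tail -f /pub/grace/output/postProcess_2017.151-180_0.out                 # step 8: watch the log
inventory.pl 2017.151-180 grace --filegroup=highlevel                    # step 9a: verify the results
kinit                                                                    # step 11: archive to HPSS (after verification)
archiveHsiTar.pl 2017.151-180 grace --filegroup=archive --symlinks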