The information here is derived from debug dumps taken during commissioning shifts. The dump command is:
`./tfc_test.py debug >& dump_file_name`
For details about using the TFC Python code, see the commissioning page. Problems are much simpler to diagnose if the TFC is in FORCED_WRITE mode. This is done by (N.B. Currently, all TFCs are initialized into FORCED_WRITE mode.)
Shifters, please do send me the dumps in these circumstances. The descriptions are here to help diagnose whether a hang is caused by a "known" problem or by something unusual.
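The dump command above uses `>&`, which (in csh, and also in bash) sends both stdout and stderr to the file. For shifters who script the capture, here is a minimal Python sketch of the same redirection. It assumes `tfc_test.py` is executable in the current directory, and the timestamped filename is a hypothetical convention, not one the TFC tools define.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def capture_debug_dump(cmd, out_dir=".", prefix="tfc_debug"):
    """Run `cmd`, writing combined stdout and stderr to a timestamped file.

    Mirrors `cmd >& file`. The `<prefix>_<YYYYmmdd_HHMMSS>.txt` name is a
    hypothetical convention, not one the TFC tools define.
    Returns (path_to_dump, exit_code).
    """
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = Path(out_dir) / f"{prefix}_{stamp}.txt"
    with path.open("w") as dump:
        # stderr=STDOUT merges stderr into the same file, like `>&`.
        result = subprocess.run(cmd, stdout=dump, stderr=subprocess.STDOUT)
    return path, result.returncode
```

Usage, from the directory containing `tfc_test.py`: `capture_debug_dump(["./tfc_test.py", "debug"], prefix="tfc50")`. Passing the command as a list avoids depending on which shell interprets `>&`.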
| Date | State | Diagnosis/Symptoms | Debug File |
|---|---|---|---|
| 3/25/03 | SENDWAIT/ L1CTT | PCI-A states indicate a bus request/grant, but no address cycle started. Assume this is a combination of setup time and the use of lm_adrackn to indicate the address cycle. Increase setup to 2 cycles and use lm_tsr[?] to indicate the address cycle, thereby increasing hold time. | debug-tfc0.txt |
| 3/25/03 | SENDWAIT/ L1CTT | Same as previous | debug-tfc1.txt |
| 3/27/03 | SENDWAIT/ L1CTT | Same as previous | tfc75_0debug.txt |
| 3/27/03 | SENDWAIT/ L1CTT | Same as previous | tfc75_0debug2.txt |
| 3/27/03 | SENDWAIT/ L1CTT | Same as previous | tfc75_0debug4.txt |
| 3/27/03 | SENDWAIT/ L1CTT | Same as previous | tfc75_1debug.txt |
| 3/27/03 | SENDWAIT/ L1CTT | Same as previous | tfc75_1debug2.txt |
| 3/27/03 | SENDWAIT/ L1CTT | Same as previous | tfc75_1debug4.txt |
| 4/12/03 16:51 | FITWAIT | DSP A2 is apparently still fitting, and the EventWriter is waiting for the handshake from the DSP. All fits scheduled. | 0412_fitwait.txt |
| 4/12/03 16:54 | FITWAIT | Same as previous (DSP A1) | 0412_fitwait_2.txt |
| 4/12/03 18:10 | FITWAIT | Same as previous (DSP A0) | fitwait_3.txt |
| 4/12/03 18:13 | FITWAIT | Same as previous (DSP A0) | fitwait_4.txt |
| 4/12/03 08:32 | STCREAD/ STCHD/ PROCWAIT | The state combination indicates that either the TFC PCI firmware didn't get the start transfer, there was no STC trailer word, or the stc_reader state machine missed the PCI disconnect. | stc_read.txt |
| 4/12/03 08:54 | STCREAD | PCI-1/2 are apparently hung. This can happen if the TFC misses a PCI disconnect, because the TFC will then never relinquish the IRDYn line. True cause unknown. | stc_read2.txt |
| 4/15/03 | FRCREAD/ CTTDAT | Inspecting the L2 FIFO data in the debug dump indicates invalid-format PCI data, probably missing the 1st CTT header word. The hang is caused by a mismatch between the supposed CTT object count and the number of words in the PCI transfer. | frc0x70debug.txt |
| 4/16/03 15:39 | STCREAD/ STCHD/ PROCWAIT | The state combination indicates that either the TFC PCI firmware didn't get the start transfer, there was no STC trailer word, or the stc_reader state machine missed the PCI disconnect. | tfc_debug_stcread0.txt |
| 4/16/03 16:04 | STCREAD/ STCHD/ PROCWAIT | Same as previous | tfc_debug_stcread1.txt |
| 4/16/03 17:42 | STCREAD/ STCHD/ PROCWAIT | Same as previous | tfc_debug_stcread2.txt |
| 4/16/03 17:52 | STCREAD/ STCHD/ PROCWAIT | Same as previous | tfc_debug_stcread3.txt |
| 4/16/03 18:26 | STCREAD | PCI-1/2 are apparently hung. This can happen if the TFC misses a PCI disconnect, because the TFC will then never relinquish the IRDYn line. True cause unknown. (See below for cases with more debug information available.) | tfc_debug_stcread_final.txt |
| 4/16/03 18:30 | FRCREAD | PCI is hung. No additional information available. (Probably the same problem as seen below, where more debug information is available.) | tfc_debug_final_pcihang.txt |
| 4/17/03 | FRCREAD | Lost CTT header problem: inspecting the L2 FIFO data in the debug dump indicates invalid-format PCI data. The hang is probably caused by a mismatch between the CTT object count and the number of words in the PCI transfer. | tfc_hung_bus_frcread.0 |
| 4/17/03 12:43 | FRCREAD | Lost CTT header problem: same as previous (same L2 FIFO data) | tfc_hung_bus_frcread.1 |
| 4/17/03 13:15 | FRCREAD | Lost CTT header problem: same as previous (same L2 FIFO data) | tfc_hung_bus_frcread.2 |
| 4/17/03 13:28 | STCREAD/ STCHD/ XFERDATA | ??? Cannot distinguish between a number of possibilities, including: (a) missing STC trailer and TFC missed PCI disconnect, (b) TFC missed trailer and PCI disconnect, or (c) TFC missed PCI disconnect. There are probably others as well. | tfc_hung_bus_frcread.3 |
| 4/17/03 15:04 | STCREAD/ STCHD/ XFERDATA | Same as preceding entry | tfc_hung_bus_frcread.4 |
| 4/17/03 15:11 | FRCREAD | Lost CTT header problem: missing 1st word of CTT header. Byte 0 of the 2nd word is interpreted as word count = 0xb5. | tfc_hung_bus_frcread.5 |
| 4/17/03 15:25 | STCREAD/ STCHD/ XFERDATA | Same as preceding STCREAD entry | tfc_hung_bus_frcread.6 |
| 4/17/03 15:39 | FRCREAD | Counter in CTT header word 0 doesn't match the number of words transferred. PCI data looks unlike real CTT data. | tfc_hung_bus_frcread.7 |
| 4/25/03 10:59 | STCREAD/ STCDAT/ CHANDONE | Lost CTT header problem: states indicate the PCI transfer completed properly, but stc_reader did not see either STC trailer simultaneously with end-of-PCI. | tfc_stcread.txt |
| 5/06/03 12:34 | Illegal operating mode (3) | Unknown cause. Hopefully it was just a hiccup. | crate70hung.txt |
| 5/08/03 17:32 | STCREAD/ STCHD/ PROCWAIT | Insufficient information to diagnose (need operating mode = 11) | crate70tfc0-debug1-ctt_tv-stc_real.txt |
| 5/08/03 19:48 | STCREAD/ STCEND/ XFERDATA | c0c0 problem: the dump indicates that the L2 FIFO was filled and the last STC channel has bad data. The L2 FIFO has many apparent STC starts without STC ends. The lm_tsr indicates that the last PCI transfer did stop. | crate70tfc0-debug2-ctt_tv-stc_real.txt |
| 5/08/03 20:35 | STCREAD/ STCDAT/ XFERDATA | PCI bus hang: PCI-1 and PCI-2 seem to be hung. The dump indicates that the L2 FIFO was filled and the last STC channel has no end-of-data. The lm_tsr indicates that the last PCI transfer never tried to stop. | crate70tfc0-debug4-ctt_tv-stc_real.txt |
| 5/12/03 | FRCREAD/ CTTDAT/ CHANDONE | Lost CTT header problem, crate 75 | TFC0_stt5_dump.txt |
| 5/12/03 | FRCREAD/ CTTDAT/ CHANDONE | Lost CTT header problem, crate 75 | TFC1_stt5_dump.txt |
| 5/12/03 | FRCREAD/ CTTDAT/ CHANDONE | Lost CTT header problem, crate 70 | Crate70_tfc0_frcread_hang.txt |
| 5/12/03 | STCREAD/ STCDAT/ XFERDATA | PCI bus hang, crate 70. The debug dump shows no STC trailer, but the L2 FIFO is full and the transfer is still active (including the valid data flag from the datataker). The channel readout order was changed, and the problem stayed with the physical channel. | Crate70_inputs_swapped.txt |
| 5/16/03 | STCREAD/ STCDAT/ XFERDATA | PCI bus hang: error generated repeatedly at Stony Brook. The logic analyzer shows no end of PCI transfer for the 1st STC channel (a good transfer is included for comparison). The run conditions are described in detail below. | tfchang_sbtest_non_stop.txt |
| 5/19/03 | FRCREAD/ CTTH1/ CHANDONE | PCI transfer terminated via disconnect with data after only 2 words. The FRC reader properly hung because of the logically incomplete event. | TFC0stt5.txt |
| 5/19/03 | STCREAD | Dump file incomplete. No diagnosis possible. | TFC1stt5.txt |
| 5/20/03 23:30 | STCREAD/ STCDAT/ CHANDONE | c0c0 problem | Crate71_3cards_1.txt |
| 5/20/03 23:46 | STCREAD/ STCBLK/ CHANDONE | Transfer apparently either not requested or finished with no words transferred | Crate71_3cards_2.txt |
| 6/17/03 | STCREAD/ STCPROC/ CHANDONE | c0c0 problem | stcread40.txt |
| 6/17/03 | STCREAD/ STCPROC/ CHANDONE | c0c0 problem | stcread40_b.txt |
| 6/17/03 | STCREAD/ STCPROC/ CHANDONE | c0c0 problem | stcread41.txt |
| 6/17/03 | STCREAD/ STCPROC/ CHANDONE | c0c0 problem | stcread41_b.txt |
| 6/18/03 3:19 | FRCREAD | TFC in FRCREAD with no data present on inputs. | tfc00_18June13.19am.txt / tfc01_18June13.19am.txt |
| 6/18/03 11:15 | ??? | Is there actually a problem here? | tfc01_18June11.15am.txt |
| 6/18/03 14:43 | STCREAD | STC channel missing input data | tfc20_18June14.43pm.txt |
| 6/18/03 11:17 | ??? | Is there actually a problem here? | tfc40_18June11.17am.txt / tfc41_18June11.17am.txt |
| 6/18/03 11:59 | STCREAD/ STCHD/ CHANDONE | c0c0 problem | tfc40_18June11.59am.txt / tfc41_18June11.59am.txt |
| 6/19/03 14:35 | FRCREAD | TFC in FRCREAD with no data present on inputs. | tfc00_19June14.35pm.txt / tfc01_19June14.35pm.txt |
| 6/19/03 14:39 | FRCREAD | Lost CTT header problem, crate 70 | tfc00_19June14.39pm.txt / tfc01_19June14.39pm.txt |
| 6/19/03 11:23 | STCREAD and PCI-1/2 hang | TFC20/21 apparently in STCREAD, and PCI-1/2 are hung | tfc20_bushang_19June11.23am.txt / tfc21_bushang_19June11.23am.txt / tfc20_bushang_19June11.30am.txt / tfc21_bushang_19June11.30am.txt |
| 6/19/03 14:30 | FRCREAD | TFC in FRCREAD with no data present on inputs. | tfc40_19June14.30pm.txt / tfc41_19June14.30pm.txt |
| 6/19/03 11:12 | STCREAD | STC channel missing input data | tfc50_19June11.12am.txt / tfc51_19June11.12am.txt |
| 6/19/03 11:27 | STCREAD | STC channel missing input data | tfc50_19June11.27am.txt |
| 6/19/03 11:37 | STCREAD | STC channel missing input data | tfc50_19June11.37am.txt |
| 6/19/03 14:34 | FRCREAD | TFC in FRCREAD with no data present on inputs. | tfc51_19June14.34pm.txt |
| 11/03/03 | FRCREAD/ CTTDAT/ CHANDONE | Lost CTT header problem, crate 70 | tfc50.debug.txt |
| 11/03/03 | FRCREAD/ CTTDAT/ CHANDONE | Lost CTT header problem, crate 70 | tfc51.debug.txt |
| 11/05/03 | FRCREAD/ CTTDAT/ CHANDONE | Lost CTT header problem, crate 70 | tfc20debug_183860.txt / tfc21debug_183860.txt |
| 11/15/03 | STCREAD (2) | Both TFCs in crate 70 are in STCREAD. No debug dump; TFC mode = 15. All LTB channels have different numbers of events, and none are in overflow. Did the STCs decay away? No further diagnosis/speculation possible. | tfc00.txt / tfc01.txt |
| 11/16/03 | DSP hang, TFC7x | DSP B0 in ODPM state, and DSL1 in LOAD state. Probable cause is that DSP B0 hung. Never seen before. | tfc1_stat_73_5.log |
| 11/17/03 19:28 | PCI-3 hang, TFC51 (2) | PCI-C bus hang. Status only, no debug dump. No further diagnosis possible. | tfc51_status.txt |
| 11/17/03 21:08 | PCI-1/2 hang, TFC41 (2) | TFC41: probable PCI-B hang. The included debug dump shows no obvious problem in the TFC. | tfc41_debug.txt |
| 11/17/03 21:41 | PCI-1/2 hang, TFC41 (2) | TFC41: probable PCI-B hang. The included debug dump shows no obvious problem in the TFC. | tfc41_debug.txt |
| 11/17/03 | PCI-1/2 hang(?), TFC30 (2) | TFC30: debug only, no status. Apparently an STCREAD problem. No further diagnosis possible. **POSSIBLY CONSISTENT WITH PCI-1/2 HANG** | tfc30_debug.txt |
| 11/18/03 17:26 | DSP hang? | DSP B0 in ODPM state, and DSL1 in LOAD state. Probable cause is that DSP B0 hung. Never seen before. | tfc31_hang_11_18.txt |
| 11/18/03 21:26 | PCI-1/2 hang, TFC40 (2) | TFC40 apparently in STCREAD, and PCI-1/2 are hung | tfc40.debug.txt / tfc40.stt4.txt / tfc41.stt4.txt |
| 11/18/03 22:22 | PCI-3 hang, TFC41 (2) | PCI-C bus hang in TFC41 slot. The debug dumps clearly show a bus hang, but no apparent problem with the TFC L3-related state machines. | tfc41.debug.txt / tfc41.stt4.txt |
| 11/18/03 23:00 | PCI-3 hang 5x, L3 readout phase problem (2) | Both TFCs in crate 75 have the same L3 readout hang. For both, the last word transferred was FIT(last). Further diagnosis impossible. | tfc51_kb.txt / tfc51_kb_2.txt |
| 11/18/03 | STCREAD, PCI-1/2 hang? | TFC in STCREAD state. The buses may be OK, but the status could also have been taken after an SCL_INIT. Mode = 15, no further diagnosis possible. | tfc41_debug_kevin.txt |
| 11/19/03 | PCI-3 hang TFC51, L3 readout phase problem (2) | TFC #1 in crate 75 has 2 L3 readout hangs. For both, the last word transferred was FIT(last). | tfc51_pci_kb.txt / tfc51_pci_kb_2.txt |
| 12/02/03 | FRCREAD (TFCs 20 & 21) | Both TFCs in the crate enter FRCREAD on the 1st event in. Internal states are identical, including the input data. The FRC LRB shows 16 events pending, and the STC LRBs show 8 events pending. This is not understood, but as it happened to both TFCs, it is likely an input problem. Note also that the following entry shows different problems occurring in crate 0x73 at the same time. | tfc20_frcread_6_07pm_dec2.txt / tfc21_frcread_6_07pm_dec2.txt |
| 12/02/03 | STCREAD (TFCs 30 & 31) | Both TFCs in crate 0x73 enter STCREAD soon after init(?). In all cases, at least one of the STC input channels reports no events. Presumably this indicates an upstream problem causing some STC channels to send no data. | tfc30_stcread_5_56pm_dec2.txt / tfc30_stcread_6_07pm_dec2.txt / tfc30_stcread_6_07pm_dec2_b.txt / tfc31_stcread_5_56pm_dec2.txt / tfc31_stcread_5_56pm_dec2_b.txt / tfc31_stcread_6_07pm_dec2.txt / tfc31_stcread_6_07pm_dec2_b.txt |
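The "Lost CTT header" entries above share one mechanism: the count in CTT header word 0 does not match the number of words actually transferred, so a reader waiting for the claimed number of words never finishes. The Python sketch below illustrates the comparison an offline inspection of a dump's L2 FIFO data could make. The field layout is an assumption for illustration only, not the documented CTT format: 32-bit words, a two-word header, and a payload word count in byte 0 (taken as the least significant byte) of header word 0. Dropping the 1st header word reproduces the kind of misreading in the 4/17/03 15:11 entry, where byte 0 of the 2nd word is read as word count 0xb5; the example words are made up.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical layout (assumption, not taken from the TFC firmware spec):
HEADER_LEN = 2     # CTT header words preceding the payload
COUNT_MASK = 0xFF  # word count sits in byte 0 (LSB) of header word 0


@dataclass
class CountCheck:
    claimed: Optional[int]  # count read from byte 0 of header word 0
    received: int           # payload words actually present after the header
    ok: bool


def check_ctt_word_count(words: Sequence[int]) -> CountCheck:
    """Compare the header's claimed payload word count with what arrived."""
    if len(words) < HEADER_LEN:
        # Not even a full header: nothing sensible to compare against.
        return CountCheck(claimed=None, received=0, ok=False)
    claimed = words[0] & COUNT_MASK
    received = len(words) - HEADER_LEN
    return CountCheck(claimed=claimed, received=received, ok=claimed == received)


# Made-up transfer: header (count=3, then a word whose byte 0 is 0xB5),
# followed by three payload words.
good = [0x00000003, 0x000000B5, 0x11, 0x22, 0x33]
lost_header = good[1:]  # 1st header word lost in transit

print(check_ctt_word_count(good))         # CountCheck(claimed=3, received=3, ok=True)
print(check_ctt_word_count(lost_header))  # CountCheck(claimed=181, received=2, ok=False)
```

A reader that blocks until `claimed` words arrive would wait indefinitely in the second case, which matches the FRCREAD hangs described above.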
The "State" cell background colors indicate whether or not the problem has been addressed and/or whether enabling error recovery would prevent the hang.
| Color | Meaning |
|---|---|
| Light Green | TFC firmware bug found and fixed |
| Dark Green | Hardware fault found, or non-TFC bug found and fixed |
| Pink | Missing input data |
| Yellow | Error recovery will allow processing to continue, though perhaps with nonsense data (check the error flags). The problem is believed to be input data errors. |
| Orange | Error recovery will address the symptom. True cause unknown. |
| Red | Unknown source |
| default | No diagnosis yet |
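For anyone tallying these entries in a script, the legend can be encoded as a small lookup. This is a hypothetical Python encoding (the enum, property, and method names are not part of any TFC software); the two properties only restate the legend: the greens mean fixed, and yellow and orange mean error recovery lets processing continue.

```python
from enum import Enum


class HangStatus(Enum):
    """Hypothetical encoding of the State-cell color legend."""

    LIGHT_GREEN = "TFC firmware bug found and fixed"
    DARK_GREEN = "Hardware fault found, or non-TFC bug found and fixed"
    PINK = "Missing input data"
    YELLOW = "Error recovery allows processing to continue; check the error flags"
    ORANGE = "Error recovery addresses the symptom; true cause unknown"
    RED = "Unknown source"
    DEFAULT = "No diagnosis yet"

    @property
    def fixed(self) -> bool:
        # Both greens mean a bug or fault was found and fixed.
        return self in (HangStatus.LIGHT_GREEN, HangStatus.DARK_GREEN)

    @property
    def recovery_helps(self) -> bool:
        # Yellow and orange: enabling error recovery prevents the hang.
        return self in (HangStatus.YELLOW, HangStatus.ORANGE)

    @classmethod
    def from_color(cls, color: str) -> "HangStatus":
        """Map a legend color name such as 'Light Green' to its member."""
        return cls[color.strip().upper().replace(" ", "_")]
```

For example, `HangStatus.from_color("Orange").recovery_helps` is `True`, while `HangStatus.from_color("Red").recovery_helps` is `False`.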