Proto  3.2
Proto_Timer.H
Go to the documentation of this file.
1 
2 #ifndef _PR_TIMER_H_
3 #define _PR_TIMER_H_
4 
5 #include <cstdlib>
6 #include <vector>
7 #include <string>
8 #include <iostream>
9 #include <fstream>
10 #include <list>
11 #include <cstdio>
12 #include <cstring>
13 #include <sys/time.h>
14 #include <chrono>
15 
16 inline unsigned long long int PR_ticks()
17 {
18 
19  using namespace std::chrono;
20  return steady_clock::now().time_since_epoch().count();
21 
22 }
23 #define PR_TICKS
24 
25 using std::string;
26 
27 /** TraceTimer class is a self-tracing code instrumentation system
28 
29  TraceTimer class is a self-tracing code instrumentation system
30  for Chombo (or any other package really). The user interface is specified
31  by a small set of macros. The usage model is that you just leave these
32  timers in the code, for good. Initially, your application will have 'main'
33  and a few hewavy functions instrumented, and the lower level Chombo library
34  instrumentation. As your tool or application matures, it
35  will garner a larger set of instrumentation giving clear views of your
36  code performance. After a routine has been
37  cleverly and lovingly optimized, you leave in the timers, to spot when
38  some later bug fix or *improvement* undoes your previous labors.
39 
40  \note
41  You should never need to use or interact with the the classes TraceTimer or
42  AutoStart. Use the macros. They call the right functions and classes for you.
43 
44  The first macro is what people will use the most:
45  \code
46  PR_TIME("label");
47  \endcode
48 
49  This is the simplest interface for timers. you place this macro call in a function
50  you wish to be timed. It handles making the timer, calling 'start' when you
51  enter the function, and calling 'stop' when you leave the function. A good
52  idea is to use a 'label' specific enough to be unambiguous without being
53  overwhelming. for instance:
54 
55  \code
56  void AMRLevelPolytropicGas::define(AMRLevel* a_coarserLevelPtr,
57  const ProblemDomain& a_problemDomain,
58  int a_level,
59  int a_refRatio)
60  {
61  PR_TIME("AMRLevelPolytropicGas::define");
62  .
63  .
64  }
65  \endcode
66 
67  In this case, we have a class with many constructors and define functions that
68  all funnel into a single general function. We can just call this 'define' and
69  not worry about naming/instrumenting all the different overloaded instances. If
70  you slip up and use the same label twice, that is not a real problem, the two
71  locations will be timed and tracked properly (even if one is a sibling or parent
72  of the other). The only place it will make things a little harder is in the output
73  where you might have the same name show up and look confusing.
74  <br><br>
75  In serial, you will see a file called <em>time.table</em> (in parallel, you will get a
76  <em>time.table.n</em> (where n is the rank number) files). If
77  you want fewer files, you can do
78  setenv PR_OUTPUT_INTERVAL nproc
79  and it will only output every nproc processors time.table.n files
80  (where n%nproc == 0). I won't go into this file
81  format. It is kind of gprof-ish, with what I consider improvements. The real
82  benefit here is profiling that understands our Chombo context, a smaller information
83  set to observe, and the fact that, so far in my testing, the timers have negligible
84  impact on the run time or memory use of the code.
85  <br><br>
86  By default, Chombo compiles in the instructions for the timers wherever the macros
87  appear. If the compiler macro <b>PR_NTIMER</b> is defined, then all the PR_TIME* macros
88  evaluate to empty expressions at compile time.
89 
90  \par So, you put some PR_TIME calls in your code and ran it, and nothing happened:
91  Chombo looks for the environment variable <b>PR_TIMER</b>. If it is set to anything (even
92  if it is set to 'false' or 'no' or whatever) then the timers will be active and
93  reporting will happen. If this environment variable is not set, then all the timers
94  check a bool and return after doing nothing.
95  \par
96  One point of interest with using the environment variable: In parallel jobs using
97  mpich, only processor 0 inherits the environment variables from the shell where
98  you invoke 'mpirun', the rest read your .cshrc (.bashrc, etc.) file to get their
99  environment. To time all your processes, you need to make sure the <b>PR_TIMER</b>
100  environment variable gets to all your processes.
101 
102 
103 
104  \par Auto hierarchy:
105  The timers automatically figure out their parent/child relationships. They
106  also can be placed in template code. This has some consequences. First,
107  if you have a low level function instrumented that has no timers near it in
108  the code call stack, you will see it show up as a child of a high level timer.
109  the root timer "main" will catch all orphaned timers. So, even though you
110  might make no call to, say, 'exchange' in your 'main' function, you might
111  very well call a function, that calls a function, that calls 'exchange'. Since
112  no code in between was instrumented, this exchange is accounted for at 'main'.
113  This might look strange, but it should prove very powerful. An expensive orphan
114  is exactly where you should consider some more timers, or reconsidering code
115  design.
116 
117  \par
118  For performance reasons, child timers have only one parent. As a consequence
119  each PR_TIME("label") label can show up at multiple places in your output. Each
120  instance has it's own timer. So, each path through the call graph that arrives
121  at a low-level function has a unique lineage, with it's own counter and time.
122  Thus, I can instrument LevelData::copyTo once, but copyTo can appear in many
123  places in the time.table file.
124 
125 
126  The next level up in complexity is the set of *four* macros for when you want
127  sub-function resolution in your timers. For instance, in a really huge function
128  that you have not figured out how to re-factor, or built with lots of bad cut n paste
129  code 're-use'.
130  \code
131  PR_TIMERS("parent");
132  PR_TIMER("child1", t1);
133  PR_TIMER("child2", t2);
134  PR_START(t1);
135  PR_STOP(t1);
136  PR_START(t2);
137  PR_STOP(t2);
138  PR_START(t1);
139  PR_STOP(t1);
140  \endcode
141 
142  PR_TIMERS has the same semantic as PR_TIME, except that you can declare an
143  arbitrary number of children after it in the same function scope. The
144  children here do not autostart and autostop, you have to tell them where to
145  start and stop timing. The children can themselves be parents for timers
146  in called functions, of course. The children obey a set of mutual exclusions. The
147  following generate run time errors:
148  - double start called
149  - double stop called
150  - start called when another child is also started
151  - you leave the function with a child not stopped
152 
153  the following will generate compile time errors:
154  - more than one PR_TIME macro in a function
155  - invoking PR_TIMER("child", t) without having first invoked PR_TIMERS
156  - re-using the timer handle ie. PR_TIMER("bobby", t1); PR_TIMER("sally", t1)
157  - mixing PR_TIME macro with PR_TIMER
158  - mixing PR_TIME macro with PR_TIMERS
159 
160  You do not have to put any calls in your main routine to activate the clocks
161  or generate a report at completion, this is handled with static iniitalization
162  and an atexit function.
163  <br><br>
164  There is a larger argument of manual instrumentation being counter to good development.
165  Profiling the code is supposed to tell you where to expend your optimization effort.
166  Manual instrumentation opens the door to people wasting time *assuming* what parts of the
167  code are going to take up lots of time and instrumenting them, before seeing any real
168  performance data. Good judgement is needed. We have a body of knowledge about Chombo
169  that will inform us about a good minimal first set of functions to instrument.
170 */
171 namespace Proto
172 {
173 #ifndef PR_TURN_OFF_TIMERS
174  inline double TimerGetTimeStampWC()
175  {
176 
177  struct timeval tv; // Values from call to gettimeofday
178  struct timezone tz;
179  gettimeofday(&tv, &tz);
180  return((double)tv.tv_sec + 0.000001 * (double)tv.tv_usec);
181 
182  }
183 
184  class TraceTimer
185  {
186  public:
187  virtual ~TraceTimer()
188  {
189  for(unsigned int i=0; i<m_children.size(); ++i)
190  {
191  delete m_children[i];
192  }
193  }
194 
195  TraceTimer()
196  {
197  m_filename = string("/dev/null");
198  }
199  inline void start(char* mutex);
200  inline unsigned long long int stop(char* mutex);
201  inline void report(bool a_closeAfter=false);
202  inline void reset();
203 
204 
205  inline void leafStart();
206  inline void leafStop();
207 
208  unsigned long long int time() const
209  {
210  return m_accumulated_WCtime;
211  }
212 
213  int rank() const
214  {
215  return m_rank;
216  }
217  long long int count() const
218  {
219  return m_count;
220  }
221 
222  void prune();
223  bool isPruned() const {return m_pruned;}
224 
225 
226  ///need to set the filename so it is not /dev/null
227  void setTimerFileName(string a_filename)
228  {
229 
230  std::vector<TraceTimer*>* troots = getRootTimerPtr();
231  (*troots)[0]->m_filename = a_filename;
232  }
233 
234  inline TraceTimer* getTimer(const char* name); // don't use
235  inline const std::vector<TraceTimer*>& children() const ;//don't use.
236 
237  inline void PruneTimersParentChildPercent(double percent);
238  inline void sampleMemUsage() ;
239 
240  inline void addFlops(long long int flops) {m_flops+=flops;}
241 
242 
243  const char* m_name;
244  mutable int m_rank;
245  int tid = 0;
246  int m_thread_id;
247  long long int m_flops;
248  long long int m_count;
249 //#pragma omp atomic
250 
251  static std::vector<TraceTimer*>* getRootTimerPtr()
252  {
253  static std::vector<TraceTimer*>* retval = new std::vector<TraceTimer*>();
254  static bool initialized = false;
255  if(!initialized)
256  {
257  std::vector<TraceTimer*>* current = getCurrentTimerPtr();
258  //this is the code from initialize
259  const char* rootName = "root";
260  TraceTimer* rootTimer = new TraceTimer(rootName, NULL, 0);
261  rootTimer->m_thread_id = 0;
262  char mutex = 0;
263 
264  retval->resize(1);
265  (*retval)[0] = rootTimer;
266 
267  (*current).resize(1);
268  (*current)[0]=rootTimer;
269 
270  (*retval)[0]->tid = 0;
271  rootTimer->start(&mutex);
272  rootTimer->zeroTime = TimerGetTimeStampWC();
273  rootTimer->zeroTicks = PR_ticks();
274  }
275  initialized = true;
276  return retval;
277  }
278 
279 
280  static std::vector<TraceTimer*>* getCurrentTimerPtr()
281  {
282  static std::vector<TraceTimer*>* retval = new std::vector<TraceTimer*>(1);
283  return retval;
284  }
285 
286 
287  //oh the evil crap we have to do to avoid static initialization
288  static void staticReport()
289  {
290  std::vector<TraceTimer*>* vecptr = getRootTimerPtr();
291  if(vecptr->size() > 0)
292  {
293  (*vecptr)[0]->report();
294  }
295  }
296 
297  //oh the evil crap we have to do to avoid static initialization
298  static void staticReset()
299  {
300  std::vector<TraceTimer*>* vecptr = getRootTimerPtr();
301  if(vecptr->size() > 0)
302  {
303  (*vecptr)[0]->reset();
304  }
305  }
306 
307  //oh the evil crap we have to do to avoid static initialization
308  static void staticPruneTimersParentChildPercent(double percent)
309  {
310  std::vector<TraceTimer*>* vecptr = getRootTimerPtr();
311  if(vecptr->size() > 0)
312  {
313  (*vecptr)[0]->PruneTimersParentChildPercent(percent);
314  }
315  }
316 
317  //oh the evil crap we have to do to avoid static initialization
318  static void staticSetTimerFileName(string a_filename)
319  {
320  std::vector<TraceTimer*>* vecptr = getRootTimerPtr();
321  if(vecptr->size() > 0)
322  {
323  (*vecptr)[0]->setTimerFileName(a_filename);
324  }
325  }
326 
327  //more evil crap
328  static int getTID()
329  {
330  int retval = -1;
331  std::vector<TraceTimer*>* vecptr = getRootTimerPtr();
332  if(vecptr->size() > 0)
333  {
334  retval = (*vecptr)[0]->tid;
335  }
336  return retval;
337  }
338 
339  //and all for the lack of a perfect reasonable language feature
340  static TraceTimer* staticGetTimer(const char* name)
341  {
342  TraceTimer* retval = NULL;
343  std::vector<TraceTimer*>* vecptr = getRootTimerPtr();
344  if(vecptr->size() > 0)
345  {
346  retval = (*vecptr)[0]->getTimer(name);
347  }
348  return retval;
349  }
350 
351  private:
352  string m_filename; //only meaningful at root
353 
354  TraceTimer(const char* a_name, TraceTimer* parent, int thread_id)
355  //:m_pruned(false), m_parent(parent), m_name(a_name), m_count(0),
356  // m_accumulated_WCtime(0),m_last_WCtime_stamp(0), m_thread_id(thread_id),
357  // m_flops(0)
358  {
359  m_pruned = false;
360  m_parent = parent;
361  m_name = a_name;
362  m_count = 0;
363  m_accumulated_WCtime = 0;
364  m_last_WCtime_stamp = 0;
365  m_thread_id = thread_id;
366  m_flops = 0;
367  }
368 
369 
370 
371  bool m_pruned;
372  TraceTimer* m_parent;
373  std::vector<TraceTimer*> m_children;
374 
375  unsigned long long int m_accumulated_WCtime;
376  unsigned long long int m_last_WCtime_stamp;
377 
378  double zeroTime = 0;
379  unsigned long long int zeroTicks = 0;
380 
381 
382 
383  void reportTree(FILE* out, const TraceTimer& node, int depth);
384  const TraceTimer* activeChild() const;
385 
386  int m_depth;
387  inline void macroTest();
388  inline void macroTest2();
389 
390  inline void currentize() const ;
391 
392  inline void reportFullTree(FILE* out, const TraceTimer& timer,
393  unsigned long long int totalTime, int depth, double secondspertick);
394  inline void reportOneTree(FILE* out, const TraceTimer& timer, double secondspertick);
395 
396 
397  inline void reset(TraceTimer& timer);
398  inline void PruneTimersParentChildPercent(double threshold, TraceTimer* parent);
399  inline void sumFlops(TraceTimer& timer);
400  };
401 
402 
403  class AutoStartLeaf
404  {
405  public:
406  AutoStartLeaf(TraceTimer* a_timer):m_timer(a_timer) {if(a_timer)a_timer->leafStart();}
407  ~AutoStartLeaf(){if(m_timer)m_timer->leafStop();}
408  bool active();
409  private:
410  TraceTimer* m_timer;
411  };
412 
413  class AutoStart
414  {
415  public:
416  AutoStart(TraceTimer* a_timer, char* mutex, char* childMutex)
417  :m_mutex(mutex), m_childMutex(childMutex),m_timer(a_timer)
418  {if(a_timer)a_timer->start(mutex);}
419  AutoStart(TraceTimer* a_timer, char* mutex)
420  :m_mutex(mutex), m_childMutex((char*)(getOKPtr()) ),m_timer(a_timer)
421  {
422  if(a_timer)a_timer->start(mutex);
423  }
424  inline ~AutoStart();
425  bool active();
426  private:
427  static char* getOKPtr()
428  {
429  static bool init = false;
430  static char* retval = new char[1024];
431  if(!init)
432  {
433  sprintf(retval, "0");
434  }
435  init = true;
436  return retval;
437  }
438  char* m_mutex;
439  char* m_childMutex;
440  TraceTimer* m_timer;
441  static const char ok = 0;
442 
443  };
444 
445  inline
446  AutoStart::~AutoStart()
447  {
448 # ifndef NDEBUG
449  if(*m_childMutex == 1)
450  {
451  std::cerr<<" stop called in timer unexpectedly";
452  abort();
453  }
454 # endif
455  if(m_timer)m_timer->stop(m_mutex);
456  }
457 
458 
459  inline void TraceTimer::leafStart()
460  {
461  if(m_pruned) return;
462  ++m_count;
463  m_last_WCtime_stamp = PR_ticks();
464  }
465 
466  inline void TraceTimer::leafStop()
467  {
468  if(m_pruned) return;
469  m_accumulated_WCtime += PR_ticks() - m_last_WCtime_stamp;
470  m_last_WCtime_stamp=0;
471  }
472 #endif
473 
474 #ifdef PR_TURN_OFF_TIMERS
475 
476 #define PR_TIMER(name, tpointer)
477 #define PR_TIME(name)
478 #define PR_FLOPS(flops)
479 #define PR_TIMELEAF(name)
480 #define PR_TIMERS(name)
481 #define PR_START(tpointer)
482 #define PR_STOP(tpointer)
483 #define PR_TIMER_REPORT()
484 #define PR_TIMER_RESET()
485 #define PR_TIMER_PRUNE(threshold)
486 #define PR_TIMER_SETFILE(filename)
487 
488 #else
489 
490 #define PR_TIMER(name, tpointer) \
491  const char* TimerTag_##tpointer = name ; \
492  ::Proto::TraceTimer* tpointer = NULL ; \
493  if(::Proto::TraceTimer::getTID()==0) \
494  { \
495  tpointer = ::Proto::TraceTimer::staticGetTimer(TimerTag_##tpointer) ; \
496  }
497 
498 #define PR_TIME(name) \
499  const char* TimerTagA = name ; \
500  char PR_TimermutexA = 0; \
501  ::Proto::TraceTimer* PR_tpointer = NULL; \
502  if(::Proto::TraceTimer::getTID()==0) \
503  { \
504  PR_tpointer = ::Proto::TraceTimer::staticGetTimer(TimerTagA); \
505  } \
506  ::Proto::AutoStart autostart(PR_tpointer, &PR_TimermutexA)
507 
508 #define PR_FLOPS(flops) \
509  if(PR_tpointer)PR_tpointer->addFlops(flops);
510 
511 #define PR_TIMELEAF(name) \
512  const char* TimerTagA = name ; \
513  ::Proto::TraceTimer* PR_tpointer = NULL; \
514  if(::Proto::TraceTimer::getTID()==0) \
515  { \
516  PR_tpointer = ::Proto::TraceTimer::staticGetTimer(TimerTagA); \
517  } \
518  ::Proto::AutoStartLeaf autostart(PR_tpointer)
519 
520 #define PR_TIMERS(name) \
521  const char* TimerTagA = name ; \
522  char PR_TimermutexA = 0; \
523  char PR_Timermutex = 0; \
524  ::Proto::TraceTimer* PR_tpointer = NULL; \
525  if(::Proto::TraceTimer::getTID()==0) \
526  { \
527  PR_tpointer = ::Proto::TraceTimer::staticGetTimer(TimerTagA); \
528  } \
529  ::Proto::AutoStart autostart(PR_tpointer, &PR_TimermutexA, &PR_Timermutex)
530 
531 
532 #define PR_START(tpointer) \
533  if(::Proto::TraceTimer::getTID()==0) \
534  { \
535  tpointer->start(&PR_Timermutex); \
536 }
537 
538 #define PR_STOP(tpointer) \
539  if(::Proto::TraceTimer::getTID()==0) \
540  { \
541  tpointer->stop(&PR_Timermutex); \
542  }
543  //#define PR_STOPV(tpointer, val ) val = tpointer->stop(&PR_Timermutex)
544 
545 #define PR_TIMER_REPORT() ::Proto::TraceTimer::staticReport()
546 
547 #define PR_TIMER_RESET() ::Proto::TraceTimer::staticReset()
548 
549 #define PR_TIMER_PRUNE(threshold) ::Proto::TraceTimer::staticPruneTimersParentChildPercent(threshold)
550 
551 #define PR_TIMER_SETFILE(filename) ::Proto::TraceTimer::staticSetTimerFileName(filename);
552 #endif
553 }//namespace proto
554 
555 #endif // PR_TIMER_H
unsigned long long int PR_ticks()
Definition: Proto_Timer.H:16
Definition: Proto_Array.H:17