Proto
|
Data Structures | |
class | AutoStart |
class | AutoStartLeaf |
class | Box |
An interval in DIM dimensional space. More... | |
class | BoxData |
Multidimensional Rectangular Array. More... | |
struct | boxdataIndexer |
class | BoxIterator |
Iterator for Boxes. More... | |
class | BufferEntry |
class | CInterval |
Component-Space Interval. More... | |
class | Copier |
Abstract Generic Parallel Copier. More... | |
class | CopierIterator |
class | DataIndex |
class | DataIterator |
class | DBLInternals |
class | describeInfo |
class | DisjointBoxLayout |
Disjoint Box Layout. More... | |
struct | emptyIndexer |
struct | getMemType |
struct | getMemType< BoxData< T, C, MEMTYPE, D, E > > |
struct | getMemType< Var< T, C, MEMTYPE, D, E > > |
class | HDF5Handler |
struct | indexer |
struct | indexer_i |
struct | indexer_p |
class | InterpStencil |
Interpolation Stencil. More... | |
class | LazyStencil |
An Unevaluated Stencil Operation. More... | |
class | LevelBoxData |
Level Box Data. More... | |
class | LevelCopier |
Level Copier. More... | |
class | LevelCopierOp |
Level Copier Operator. More... | |
class | LevelExchangeCopier |
Exchange Copier. More... | |
class | MayDay |
Error-reporting Functions. More... | |
class | memInfo |
class | MotionItem |
class | MotionItemSorter |
class | NeighborIterator |
struct | opKernel |
class | Point |
Integer Valued Vector. More... | |
class | ProblemDomain |
Represents a rectangular domain over which a problem can be defined, including periodic images. More... | |
class | Reduction |
class | RK4 |
Generic Explicit RK4 Algorithm. More... | |
struct | scalarIndexer |
class | Shift |
Stencil Shift. More... | |
class | Side |
Encapsulation of high or low side. More... | |
class | SideIterator |
Iterator for low and high side. More... | |
struct | sIndexer3 |
class | Stencil |
A Linear Stencil Operation. More... | |
struct | structIndexer |
class | traceInfo |
class | TraceTimer |
class | Var |
Pointwise Variable. More... | |
Functions | |
std::ostream & | operator<< (std::ostream &a_os, const Box &a_box) |
OStream Operator. | |
std::ostream & | operator<< (std::ostream &a_os, const CInterval &a_int) |
CInterval IOStream Operator. | |
template<typename T , unsigned int C, MemType MEMTYPE, unsigned char D, unsigned char E> | |
BoxData< T, 1, MEMTYPE > | slice (const BoxData< T, C, MEMTYPE, D, E > &a_src, unsigned int a_c, unsigned int a_d, unsigned int a_e) |
Slice Arbitrary Component (Non-Const) More... | |
std::ostream & | operator<< (std::ostream &os, const DisjointBoxLayout &a_dbl) |
Stream output for DBL. | |
std::ostream & | pout () |
Use this in place of std::cout for program output. More... | |
void | setPoutBaseName (const std::string &a_Name) |
Changes the base part of the filename for pout() files. More... | |
const std::string & | poutFileName () |
Accesses the filename for the local pout() file. More... | |
template<unsigned int P> | |
int | ipow (int M) |
Template Based Integer Exponentiation. | |
std::ostream & | operator<< (std::ostream &os, const ProblemDomain &a_pd) |
Stream output for ProblemDomain. | |
int | procID () |
local process ID More... | |
unsigned int | numProc () |
number of parallel processes More... | |
void | barrier () |
all ranks wait here to sync-up More... | |
double | fineCoef (Point p, Point s, double h, int refRatio) |
const char * | WriteBoxData (BoxData< double > &a_array, double dx=1.0) |
Write Scalar BoxData. More... | |
void | WriteBoxData (const char *a_filename, BoxData< double > &a_array, double dx=1.0) |
Write Scalar BoxData with Name. More... | |
template<class T , unsigned int C, MemType MEMTYPE, unsigned char D, unsigned char E> | |
void | WriteBoxData (const char *a_filename, BoxData< T, C, MEMTYPE, D, E > &a_array, const char *const a_varnames[C *D *E], const double *a_corner, const double &a_h) |
Write Non-Scalar BoxData with Name. More... | |
template<class T , unsigned int C, MemType MEMTYPE, unsigned char D, unsigned char E> | |
void | WriteBoxData (FILE *a_fp, BoxData< T, C, MEMTYPE, D, E > &a_array, const char *const a_varnames[C *D *E], const double *a_corner, const double &a_h) |
Write Non-Scalar BoxData to Existing File. More... | |
Alias and Slice Operators | |
The alias and slice operations facilitate BoxData operations while avoiding unnecessary copies. See the sample code below for an explanation of the syntax.
Example
    Box srcBox = Box::Cube(4);
    BoxData<double, 1, 2, 3> Src(srcBox);
    Src.setVal(17);
    // Alias is identical to Src and points to the same data. Changing Alias will change Src.
    auto Alias = alias(Src);
    // shiftedAlias points to the same buffer as Src, but the domain is shifted by (1,...,1);
    // (e.g. shiftedAlias[Point::Ones()] == Src[Point::Zeros()] will return true.)
    auto shiftedAlias = alias(Src, Point::Ones());
    //shiftedAlias points to the same data, but the associated domain
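The aliasing behaviour described above (shared storage, shifted index domain) can be illustrated without Proto by a minimal one-dimensional sketch. The names `View1D`, `aliasView`, and `aliasDemo` are hypothetical and are not part of Proto; the sketch only mimics the idea that an alias shares a buffer with its source while reinterpreting the indices.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical 1D stand-in for BoxData (not part of Proto): an index
// interval [lo, lo + size) mapped onto a shared buffer.
struct View1D {
    std::shared_ptr<std::vector<double>> data; // storage shared between aliases
    int lo;                                     // lowest valid index of this view

    // Allocate fresh storage for a_size values starting at index a_lo.
    View1D(int a_lo, std::size_t a_size)
        : data(std::make_shared<std::vector<double>>(a_size, 0.0)), lo(a_lo) {}

    // Wrap existing storage with a new lowest index.
    View1D(std::shared_ptr<std::vector<double>> a_data, int a_lo)
        : data(std::move(a_data)), lo(a_lo) {}

    // Map index i of this view onto the shared buffer.
    double& operator[](int i) { return (*data)[static_cast<std::size_t>(i - lo)]; }
};

// Return a view on the same buffer whose domain is shifted by a_shift.
inline View1D aliasView(View1D& a_src, int a_shift = 0) {
    return View1D(a_src.data, a_src.lo + a_shift);
}

// Write through a shifted alias and read the value back through the source.
inline double aliasDemo() {
    View1D src(0, 4);
    View1D shifted = aliasView(src, 1);
    shifted[1] = 17.0; // index 1 of the alias addresses index 0 of src
    return src[0];
}
```

No data is copied in `aliasView`; only the shared pointer and the lowest index are stored, which is the property the alias operators are meant to provide.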
template<class T , unsigned int C = 1, unsigned char D = 1, unsigned char E = 1, MemType MEMTYPE = MEMTYPE_DEFAULT> | |
BoxData< T, C, MEMTYPE, D, E > | alias (BoxData< T, C, MEMTYPE, D, E > &a_original, const Point &shift=Point::Zeros()) |
Alias (Non-Const) More... | |
template<class T , unsigned int C = 1, unsigned char D = 1, unsigned char E = 1, MemType MEMTYPE = MEMTYPE_DEFAULT> | |
const BoxData< T, C, MEMTYPE, D, E > | alias (const BoxData< T, C, MEMTYPE, D, E > &a_original, const Point &shift=Point::Zeros()) |
Alias (Const) More... | |
template<typename T , unsigned int C, MemType MEMTYPE = MEMTYPE_DEFAULT, unsigned char D = 1, unsigned char E = 1> | |
BoxData< T, 1, MEMTYPE, 1, 1 > | slice (const BoxData< T, C, MEMTYPE, D, E > &a_src, unsigned int a_c, unsigned int a_d=0, unsigned int a_e=0) |
Slice Arbitrary Component (Non-Const) More... | |
template<typename T , unsigned int C, unsigned char CC, MemType MEMTYPE = MEMTYPE_DEFAULT> | |
BoxData< T, CC, MEMTYPE, 1, 1 > | slice (const BoxData< T, C, MEMTYPE, 1, 1 > &a_src, unsigned int a_nstart) |
Slice Arbitrary Component Range (Non-Const) More... | |
Pointwise Operators | |
The suite of forall functions facilitates writing functions that operate pointwise on BoxData. To this end, the user must write a function with one of the following structures:
    PROTO_KERNEL_START
    void F_temp(Var<T,C,MEMTYPE,D,E>&, Args...)
    { ... }
    PROTO_KERNEL_END(F_temp, F)
    // OR
    PROTO_KERNEL_START
    void F_p_temp(Point&, Var<T,C,MEMTYPE,D,E>&, Args...)
    { ... }
    PROTO_KERNEL_END(F_p_temp, F_p)
Refer to the following code snippet for some sample valid forall input functions:
    // Valid function inputs to forall may be STATIC members of classes:
    namespace Operator {
        // Pointwise function with no point dependence
        PROTO_KERNEL_START // necessary for use with GPU devices
        static void foo_temp(Var<double, 3, 2>& arg_0,
                             double arg_1,     // plain-old-data can be passed by value
                             Var<bool>& arg_2) // any number of Var objects with different types / structures can be passed by reference
        {
            // if arg_2 == true at this Point...
            if (arg_2(0))
            {
                arg_0(1,1) = arg_1;  // Access the (1,1,0) component at each point and set it to arg_1
            }
            else
            {
                arg_0(1,1) = -arg_1; // Access the (1,1,0) component at each point and set it to -arg_1
            }
        }
        PROTO_KERNEL_END(foo_temp, foo)

        // Pointwise function with point dependence
        PROTO_KERNEL_START
        static void foo_p_temp(Point& a_p, // If the function depends on the point of evaluation, the Point must be the first argument
                               Var<double, 3, 2>& arg_0,
                               Var<bool>& arg_1)
        {
            if (arg_1(0))
            {
                for (int ii = 0; ii < DIM; ii++)
                {
                    arg_0(1,1) += a_p[ii]; // Increment the (1,1,0) component of arg_0 by each component of this Point
                }
            }
        }
        PROTO_KERNEL_END(foo_p_temp, foo_p)
    }

    // globally defined functions are also valid:
    PROTO_KERNEL_START
    void bar_temp(Var<double>& arg_0, int arg_1)
    {
        arg_0(0) = arg_1;
    }
    PROTO_KERNEL_END(bar_temp, bar)

    // globally defined functions are also valid:
    PROTO_KERNEL_START
    void bar_p_temp(Point& a_p, Var<double>& arg_0, int arg_1)
    {
        arg_0(0) = a_p[0]*arg_1;
    }
    PROTO_KERNEL_END(bar_p_temp, bar_p)
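The calling pattern behind the forall family can be sketched without Proto as a loop that invokes a user kernel once per element, with the index passed first in the point-dependent variant. Everything below is a hypothetical stand-in operating on a `std::vector<double>`: `forallInPlaceSketch`, `forallInPlaceSketch_p`, `barSketch`, and `barSketch_p` are illustrative names and do not reproduce Proto's implementation or its GPU support.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical analogue of an in-place pointwise operator:
// calls a_f(value, a_arg) once for every element of a_data.
template <typename Func>
void forallInPlaceSketch(Func a_f, std::vector<double>& a_data, int a_arg) {
    for (std::size_t i = 0; i < a_data.size(); ++i) {
        a_f(a_data[i], a_arg);
    }
}

// Hypothetical point-dependent analogue: the index of evaluation is
// passed to the kernel as its first argument.
template <typename Func>
void forallInPlaceSketch_p(Func a_f, std::vector<double>& a_data, int a_arg) {
    for (std::size_t i = 0; i < a_data.size(); ++i) {
        a_f(static_cast<int>(i), a_data[i], a_arg);
    }
}

// Kernel without point dependence (modelled on bar_temp above).
inline void barSketch(double& a_value, int a_arg) {
    a_value = a_arg;
}

// Kernel with point dependence (modelled on bar_p_temp above).
inline void barSketch_p(int a_index, double& a_value, int a_arg) {
    a_value = a_index * a_arg;
}
```

A call such as `forallInPlaceSketch_p(barSketch_p, values, 2)` overwrites each element with twice its index, which mirrors how the point argument is used in `bar_p_temp`.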
template<typename T , unsigned int C = 1, unsigned char D = 1, unsigned char E = 1, MemType MEMTYPE = MEMTYPE_DEFAULT, typename Func , typename... Srcs> | |
BoxData< T, C, MEMTYPE, D, E > | forall (const Func &a_F, Srcs &&... a_srcs) |
Pointwise Operator. More... | |
template<typename T , unsigned int C = 1, unsigned char D = 1, unsigned char E = 1, MemType MEMTYPE = MEMTYPE_DEFAULT, typename Func , typename... Srcs> | |
BoxData< T, C, MEMTYPE, D, E > | forallOp (unsigned long long int a_num_flops_point, const char *a_timername, const Func &a_F, Srcs &&... a_srcs) |
same idea, but with flop counts and a timer name | |
template<typename T , unsigned int C = 1, unsigned char D = 1, unsigned char E = 1, MemType MEMTYPE = MEMTYPE_DEFAULT, typename Func , typename... Srcs> | |
BoxData< T, C, MEMTYPE, D, E > | forall (const Func &a_F, Box a_box, Srcs &&... a_srcs) |
Pointwise Operator: Overload with Box Argument. More... | |
template<typename T , unsigned int C = 1, unsigned char D = 1, unsigned char E = 1, MemType MEMTYPE = MEMTYPE_DEFAULT, typename Func , typename... Srcs> | |
BoxData< T, C, MEMTYPE, D, E > | forallOp (unsigned long long int a_num_flops_point, const char *a_timername, const Func &a_F, Box a_box, Srcs &&... a_srcs) |
same idea, but with flop counts and a timer name | |
template<typename T , unsigned int C = 1, unsigned char D = 1, unsigned char E = 1, MemType MEMTYPE = MEMTYPE_DEFAULT, typename Func , typename... Srcs> | |
BoxData< T, C, MEMTYPE, D, E > | forall_p (const Func &a_F, Srcs &&... a_srcs) |
Pointwise Operator with Point Dependence. More... | |
template<typename T , unsigned int C = 1, unsigned char D = 1, unsigned char E = 1, MemType MEMTYPE = MEMTYPE_DEFAULT, typename Func , typename... Srcs> | |
BoxData< T, C, MEMTYPE, D, E > | forallOp_p (unsigned long long int a_num_flops_point, const char *a_timername, const Func &a_F, Srcs &&... a_srcs) |
same idea, but with flop counts and a timer name | |
template<typename T , unsigned int C = 1, unsigned char D = 1, unsigned char E = 1, MemType MEMTYPE = MEMTYPE_DEFAULT, typename Func , typename... Srcs> | |
BoxData< T, C, MEMTYPE, D, E > | forall_p (const Func &a_F, Box a_box, Srcs &&... a_srcs) |
Pointwise Operator with Point Dependence: Overload with const Box Argument. More... | |
template<typename T , unsigned int C = 1, unsigned char D = 1, unsigned char E = 1, MemType MEMTYPE = MEMTYPE_DEFAULT, typename Func , typename... Srcs> | |
BoxData< T, C, MEMTYPE, D, E > | forallOp_p (unsigned long long int a_num_flops_point, const char *a_timername, const Func &a_F, Box a_box, Srcs &&... a_srcs) |
same idea, but with flop counts and a timer name | |
template<typename Func , typename... Srcs> | |
void | forallInPlace (const Func &a_F, Srcs &&... a_srcs) |
In-Place Pointwise Operator. More... | |
template<typename Func , typename... Srcs> | |
void | forallInPlaceOp (unsigned long long int a_num_flops_point, const char *a_timername, const Func &a_F, Srcs &&... a_srcs) |
same idea, but with flop counts and a timer name | |
template<typename Func , typename... Srcs> | |
void | forallInPlace (const Func &a_F, Box a_box, Srcs &&... a_srcs) |
In-Place Pointwise Operator on Prescribed Box. More... | |
template<typename Func , typename... Srcs> | |
void | forallInPlaceOp (unsigned long long int a_num_flops_point, const char *a_timername, const Func &a_F, Box a_box, Srcs &&... a_srcs) |
same idea, but with flop counts and a timer name | |
template<typename Func , typename... Srcs> | |
void | forallInPlace_p (const Func &a_F, Srcs &&... a_srcs) |
In-Place Pointwise Operator with Point Dependence. More... | |
template<typename Func , typename... Srcs> | |
void | forallInPlaceOp_p (unsigned long long int a_num_flops_point, const char *a_timername, const Func &a_F, Srcs &&... a_srcs) |
same idea, but with flop counts and a timer name | |
template<typename Func , typename... Srcs> | |
void | forallInPlace_p (const Func &a_F, Box a_box, Srcs &&... a_srcs) |
In-Place Pointwise Operator with Point Dependence and Prescribed Box. More... | |
template<typename Func , typename... Srcs> | |
void | forallInPlaceOp_p (unsigned long long int a_num_flops_point, const char *a_timername, const Func &a_F, Box a_box, Srcs &&... a_srcs) |
same idea, but with flop counts and a timer name | |
External Operators | |
std::ostream & | operator<< (std::ostream &a_os, const Point &a_pt) |
Stream Operator. | |
Point | operator* (int a_scale, Point a_pt) |
Premultiplication by scalar. | |
Point | operator- (Point a_pt) |
Unary Negation. | |
Non-Member Functions | |
template<typename T > | |
Stencil< T > | operator* (T a_coef, Shift a_shift) |
Coefficient Shift Product "Constructor". More... | |
template<typename T > | |
Stencil< T > | operator* (T a_coef, const Stencil< T > a_stencil) |
Scalar Multiplication of Stencil Coefficients. More... | |
template<typename T , unsigned int C, MemType MEMTYPE, unsigned char D, unsigned char E> | |
BoxData< T, C, MEMTYPE, D, E > & | operator|= (BoxData< T, C, MEMTYPE, D, E > &a_dest, LazyStencil< T, C, MEMTYPE, D, E > &&a_op) |
Application by Replacement. More... | |
template<class T , unsigned int C, MemType MEMTYPE, unsigned char D, unsigned char E> | |
BoxData< T, C, MEMTYPE, D, E > & | operator+= (BoxData< T, C, MEMTYPE, D, E > &a_dest, LazyStencil< T, C, MEMTYPE, D, E > &&a_op) |
Application by Increment. More... | |
TraceTimer class is a self-tracing code instrumentation system
TraceTimer class is a self-tracing code instrumentation system for Chombo (or any other package really). The user interface is specified by a small set of macros. The usage model is that you just leave these timers in the code, for good. Initially, your application will have 'main' and a few heavy functions instrumented, and the lower level Chombo library instrumentation. As your tool or application matures, it will garner a larger set of instrumentation giving clear views of your code performance. After a routine has been cleverly and lovingly optimized, you leave in the timers, to spot when some later bug fix or improvement undoes your previous labors.
The first macro is what people will use the most:
This is the simplest interface for timers. You place this macro call in a function you wish to be timed. It handles making the timer, calling 'start' when you enter the function, and calling 'stop' when you leave the function. A good idea is to use a 'label' specific enough to be unambiguous without being overwhelming. For instance:
In this case, we have a class with many constructors and define functions that all funnel into a single general function. We can just call this 'define' and not worry about naming/instrumenting all the different overloaded instances. If you slip up and use the same label twice, that is not a real problem: the two locations will be timed and tracked properly (even if one is a sibling or parent of the other). The only place it will make things a little harder is in the output, where the same name might show up more than once and look confusing.
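The start-on-entry, stop-on-exit behaviour described above is commonly obtained with a scoped (RAII) object. The following is a minimal sketch of that general technique, not the TraceTimer implementation: `ScopedTimerSketch`, `timerTableSketch`, `SKETCH_TIME`, and the label `"MyClass::define"` are all hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Hypothetical per-label record: number of calls and accumulated seconds.
struct TimerRecordSketch {
    long count = 0;
    double seconds = 0.0;
};

// Hypothetical global table keyed by label. Reusing a label simply
// accumulates into the same record.
inline std::map<std::string, TimerRecordSketch>& timerTableSketch() {
    static std::map<std::string, TimerRecordSketch> table;
    return table;
}

// Starts timing on construction and records the elapsed time on destruction.
class ScopedTimerSketch {
public:
    explicit ScopedTimerSketch(const std::string& a_label)
        : m_label(a_label), m_start(std::chrono::steady_clock::now()) {}

    ~ScopedTimerSketch() {
        std::chrono::duration<double> dt =
            std::chrono::steady_clock::now() - m_start;
        TimerRecordSketch& rec = timerTableSketch()[m_label];
        rec.count += 1;
        rec.seconds += dt.count();
    }

    ScopedTimerSketch(const ScopedTimerSketch&) = delete;
    ScopedTimerSketch& operator=(const ScopedTimerSketch&) = delete;

private:
    std::string m_label;
    std::chrono::steady_clock::time_point m_start;
};

// Hypothetical macro: one line at the top of a function times the whole call.
#define SKETCH_TIME(label) ScopedTimerSketch sketchScopedTimer_(label)

// Stand-in for the single general 'define' function that all overloads call.
inline void defineSketch() {
    SKETCH_TIME("MyClass::define");
    // ... work to be timed ...
}
```

Because the destructor runs on every exit path, early returns and exceptions are timed without extra code in the instrumented function.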
In serial, you will see a file called time.table (in parallel, you will get time.table.n files, where n is the rank number). If you want fewer files, you can do setenv PR_OUTPUT_INTERVAL nproc and it will only output every nproc processors time.table.n files (where n % nproc == 0). I won't go into this file format. It is kind of gprof-ish, with what I consider improvements. The real benefit here is profiling that understands our Chombo context, a smaller information set to observe, and the fact that, so far in my testing, the timers have negligible impact on the run time or memory use of the code.
By default, Chombo compiles in the instructions for the timers wherever the macros appear. If the compiler macro PR_NTIMER is defined, then all the PR_TIME* macros evaluate to empty expressions at compile time.
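Compiling instrumentation out with a preprocessor switch can be sketched as follows. This is an illustration of the general pattern only: `SKETCH_NTIMER`, `SKETCH_TIME_POINT`, `sketchTimerCalls`, and `instrumentedSum` are hypothetical names standing in for PR_NTIMER and the PR_TIME* macros, whose actual definitions are not shown in this page.

```cpp
#include <cassert>

// Hypothetical counter that records how often an instrumented point was hit.
inline int& sketchTimerCalls() {
    static int calls = 0;
    return calls;
}

// If SKETCH_NTIMER is defined before this point, the macro expands to
// nothing and the instrumentation disappears at compile time.
#ifdef SKETCH_NTIMER
#define SKETCH_TIME_POINT(label)
#else
#define SKETCH_TIME_POINT(label) (static_cast<void>(label), ++sketchTimerCalls())
#endif

// The instrumented function computes the same result either way.
inline int instrumentedSum(int a_n) {
    SKETCH_TIME_POINT("instrumentedSum");
    int sum = 0;
    for (int i = 1; i <= a_n; ++i) {
        sum += i;
    }
    return sum;
}
```

With the switch defined, the statement at the top of `instrumentedSum` reduces to a lone semicolon, so the macro can stay in the source permanently.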
The next level up in complexity is the set of four macros for when you want sub-function resolution in your timers. For instance, in a really huge function that you have not figured out how to re-factor, or built with lots of bad cut n paste code 're-use'.
PR_TIMERS has the same semantics as PR_TIME, except that you can declare an arbitrary number of children after it in the same function scope. The children here do not autostart and autostop; you have to tell them where to start and stop timing. The children can themselves be parents for timers in called functions, of course. The children obey a set of mutual exclusions. The following generate run time errors:
the following will generate compile time errors:
You do not have to put any calls in your main routine to activate the clocks or generate a report at completion; this is handled with static initialization and an atexit function.
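A report that is produced at program exit without any call from main can be sketched with `std::atexit`. The sketch below registers the report lazily through a function-local static on first use, which is an assumption made for brevity and may differ from how the timers are actually initialized; `sketchEventCount`, `sketchReport`, `sketchEnsureReportRegistered`, and `sketchRecordEvent` are hypothetical names.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical counter standing in for the collected timer data.
inline int& sketchEventCount() {
    static int count = 0;
    return count;
}

// Report routine run automatically when the program exits normally.
inline void sketchReport() {
    std::printf("events recorded: %d\n", sketchEventCount());
}

// Registers the report exactly once. std::atexit returns 0 on success.
inline bool sketchEnsureReportRegistered() {
    static const bool registered = (std::atexit(sketchReport) == 0);
    return registered;
}

// Instrumentation entry point: the first call also arranges the exit report.
inline void sketchRecordEvent() {
    sketchEnsureReportRegistered();
    ++sketchEventCount();
}
```

User code only calls `sketchRecordEvent`; the report is printed after main returns, with no explicit start-up or shutdown call.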
There is a larger argument of manual instrumentation being counter to good development. Profiling the code is supposed to tell you where to expend your optimization effort. Manual instrumentation opens the door to people wasting time assuming what parts of the code are going to take up lots of time and instrumenting them, before seeing any real performance data. Good judgement is needed. We have a body of knowledge about Chombo that will inform us about a good minimal first set of functions to instrument.
|
inline |
all ranks wait here to sync-up
All MPI ranks wait here to sync-up. Calls MPI_Barrier(comm). This is a no-op in the non-MPI/serial case.
|
inline |
number of parallel processes
Returns the number of parallel processes running. Always returns at least 1.
|
inline |
Use this in place of std::cout for program output.
the stream that all output except error msgs should use
Replaces std::cout in most of the Chombo code. In serial this just returns std::cout. In parallel, this creates a separate file for each proc called <basename>.n where n is the procID and <basename> defaults to "pout" but can be set by calling setPoutBaseName(). Output is then directed to these files. This keeps the output from different processors from getting all jumbled up. If you want fewer files, you can do setenv CH_OUTPUT_INTERVAL nproc and it will only output every nproc processors pout.n files (where n % nproc == 0).
In serial this is the standard output, in parallel it is a different file on each proc (see setPoutBaseName()).
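The output-interval rule can be sketched as a small predicate. This assumes the condition reads "rank modulo interval equals zero", which is an interpretation of the description above; `sketchIntervalFromEnv` and `sketchWritesOutput` are hypothetical helpers, and the fallback to an interval of 1 when the variable is unset or invalid is also an assumption.

```cpp
#include <cassert>
#include <cstdlib>

// Read a positive interval from the named environment variable.
// Assumption: an unset or non-positive value means "every rank writes".
inline int sketchIntervalFromEnv(const char* a_name) {
    const char* text = std::getenv(a_name);
    if (text == nullptr) {
        return 1;
    }
    long value = std::strtol(text, nullptr, 10);
    return value > 0 ? static_cast<int>(value) : 1;
}

// Decide whether the given rank writes its own output file.
inline bool sketchWritesOutput(int a_rank, int a_interval) {
    if (a_interval <= 1) {
        return true;
    }
    return a_rank % a_interval == 0;
}
```

With an interval of 4, ranks 0, 4, 8, and so on would write files, and all other ranks would stay silent.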
|
inline |
Accesses the filename for the local pout() file.
return the current filename as used by pout()
Returns the name used for the local pout() file. In parallel this is "<pout_basename>.<procID>", where <pout_basename> defaults to "pout" and can be modified by calling setPoutBaseName(), and <procID> is the local proc number. In serial, this always returns the string "cout". It is an error (exit code 111) to call this in parallel before MPI_Init().
in serial, just return the string "cout"; abort if MPI is not initialized.
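The naming rule for the local output file can be written out as a short function. This is a sketch of the rule as documented above, not the library code: `sketchPoutFileName` is a hypothetical helper, and the serial/parallel distinction is passed in as a plain flag instead of being detected from MPI.

```cpp
#include <cassert>
#include <string>

// Build the name of the local output file from the documented rule:
// "<basename>.<procID>" in parallel, the literal string "cout" in serial.
inline std::string sketchPoutFileName(const std::string& a_baseName,
                                      int a_procID,
                                      bool a_parallel) {
    if (!a_parallel) {
        return "cout";
    }
    return a_baseName + "." + std::to_string(a_procID);
}
```

For example, rank 3 with the default base name would use the file name `pout.3`.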
|
inline |
|
inline |
Changes the base part of the filename for pout() files.
Set the base name for the parallel output files used by pout().
When in parallel, changes the base name of the pout() files. If pout() has already been called, it closes the current output file and opens a new one (unless the name is the same, in which case it does nothing). In serial, ignores the argument and does nothing.
If the file has already been used and this is a different name, close the current file and open a new one.