.. _ftrace: Ftrace (function tracing) ######################### This section describes how to generate a function call graph for user Trusted Applications using ``ftrace``. The name comes from the Linux framework which has a similar purpose, but the OP-TEE ``ftrace`` is very much specific. A call graph logs all the calls to instrumented functions and contains timing information. It is therefore a valuable tool to troubleshoot performance problems or to optimize the code in general. The configuration option ``CFG_FTRACE_SUPPORT=y`` enables OP-TEE to collect function graph information from Trusted Applications running in user mode and compiled with ``-pg``. Once collected, the function graph data is sent to ``tee-supplicant`` via RPC, so they can be saved to disk, later processed and displayed using helper scripts (``ftrace_format.py`` and ``symbolize.py`` which can be found in ``optee_os/scripts``). Another configuration option ``CFG_SYSCALL_FTRACE=y`` in addition to ``CFG_FTRACE_SUPPORT=y`` enables OP-TEE to collect function graph information for syscalls as well while running in kernel mode on behalf of Trusted Applications. Note that a small number of kernel functions cannot be traced; they have the ``__noprof`` attribute in the source code. A third configuration option ``CFG_ULIBS_MCOUNT=y`` enables tracing of user space libraries contained in ``optee_os`` and used by TAs (such as ``libutee`` and ``libutils``). Usage ***** - Build OP-TEE OS with ``CFG_FTRACE_SUPPORT=y`` and optionally ``CFG_ULIBS_MCOUNT=y`` and ``CFG_SYSCALL_FTRACE=y``. - Build user TAs with ``-pg``, for instance enable ``CFG_TA_MCOUNT=y`` to instrument the whole TA. Also, in case user wants to set ``-pg`` for a particular file, following should go in corresponding sub.mk: ``cflags--y+=-pg``. Note that instrumented TAs have a larger ``.bss`` section. The memory overhead depends on ``CFG_FTRACE_BUF_SIZE`` macro which can be configured specific to user TAs using config: ``CFG_FTRACE_BUF_SIZE=4096`` (default value: 2048, refer to the TA linker script for details: ``ta/arch/$(ARCH)/ta.ld.S``). - Run the application normally. When the current session exits or there is any abort during TA execution, ``tee-supplicant`` will write function graph data to ``/tmp/ftrace-.out``. If the file already exists, a number is appended, such as: ``ftrace-.1.out``. - Run helper scripts called ``ftrace_format.py`` to translate the function graph binary data into human readable text and ``symbolize.py`` to convert function addresses into function names: ``optee_os/scripts/ftrace_format.py ftrace-.out | optee_os/scripts/symbolize.py -d .elf -d tee.elf`` - Refer to `commit 5c2c0fb31efb`_ for a full usage example on QEMU. Typical output ************** .. code-block:: none TEE load address @ 0x5ab04000 Function graph for TA: cb3e5ba0-adf1-11e0-998b-0002a5d5c51b @ 80085000 | 1 | __ta_entry() { | 2 | __utee_entry() { 43.840 us | 3 | ta_header_get_session() 7.216 us | 3 | tahead_get_trace_level() 14.480 us | 3 | trace_set_level() | 3 | malloc_add_pool() { | 4 | raw_malloc_add_pool() { 46.032 us | 5 | bpool() | 5 | raw_realloc() { 166.256 us | 6 | bget() 23.056 us | 6 | raw_malloc_return_hook() 267.952 us | 5 | } 398.720 us | 4 | } 426.992 us | 3 | } | 3 | TEE_GetPropertyAsU32() { 23.600 us | 4 | is_propset_pseudo_handle() | 4 | __utee_check_instring_annotation() { 26.416 us | 5 | strlen() | 5 | check_access() { | 6 | TEE_CheckMemoryAccessRights() { | 7 | _utee_check_access_rights() { | 8 | syscall_check_access_rights() { | 9 | ts_get_current_session() { 4.304 us | 10 | ts_get_current_session_may_fail() 10.976 us | 9 | } | 9 | to_user_ta_ctx() { 2.496 us | 10 | is_user_ta_ctx() 8.096 us | 9 | } | 9 | vm_check_access_rights() { | 10 | vm_buf_is_inside_um_private() { | 11 | core_is_buffer_inside() { ... The duration (function's time of execution) is displayed on the closing bracket line of a function or on the same line in case the function is the leaf one. In other words, duration is displayed whenever an instrumented function returns. It comprises the time spent executing the function and any of its callees. The Counter-timer Physical Count register (CNTPCT) and the Counter-timer Frequency register (CNTFRQ) are used to compute durations. Time spent servicing foreign interrupts is subtracted. The second column is the stack depth for the current function. It helps visually match function entries and exit. .. _commit 5c2c0fb31efb: https://github.com/OP-TEE/optee_os/commit/5c2c0fb31efb