[linux-yocto] [PATCH 13/17] opencsd: This is a Backport of the Following

Daniel Dragomir daniel.dragomir at windriver.com
Tue May 16 11:39:05 PDT 2017


From: John Jacques <john.jacques at intel.com>

repository: https://github.com/Linaro/OpenCSD.git
branch: perf-opencsd-4.9
tip:
commit e38cbdb2928a36c5a ("cs-etm: Update to perf cs-etm decoder for OpenCSD v0.5")

Signed-off-by: John Jacques <john.jacques at intel.com>
---
 Documentation/trace/coresight.txt                  |  138 +-
 drivers/hwtracing/coresight/coresight-etm-perf.c   |   31 +-
 drivers/hwtracing/coresight/coresight-etm.h        |    5 +
 .../hwtracing/coresight/coresight-etm3x-sysfs.c    |   12 +-
 drivers/hwtracing/coresight/coresight-priv.h       |    4 +-
 drivers/hwtracing/coresight/coresight-stm.c        |    9 +-
 drivers/hwtracing/coresight/coresight-tmc-etf.c    |   48 +-
 drivers/hwtracing/coresight/coresight-tmc-etr.c    |  286 +++-
 drivers/hwtracing/coresight/coresight-tmc.h        |    2 +-
 drivers/hwtracing/coresight/coresight.c            |   62 +-
 tools/perf/Makefile.config                         |   18 +
 tools/perf/Makefile.perf                           |    3 +
 tools/perf/arch/arm/util/cs-etm.c                  |    2 -
 tools/perf/builtin-script.c                        |    3 +-
 tools/perf/scripts/python/cs-trace-disasm.py       |  134 ++
 tools/perf/scripts/python/cs-trace-ranges.py       |   44 +
 tools/perf/util/Build                              |    2 +
 tools/perf/util/auxtrace.c                         |    2 +
 tools/perf/util/cs-etm-decoder/Build               |    6 +
 .../perf/util/cs-etm-decoder/cs-etm-decoder-stub.c |   99 ++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c    |  527 +++++++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h    |  117 ++
 tools/perf/util/cs-etm.c                           | 1501 ++++++++++++++++++++
 tools/perf/util/cs-etm.h                           |   10 +
 tools/perf/util/machine.c                          |   46 +-
 .../util/scripting-engines/trace-event-python.c    |    2 +
 tools/perf/util/symbol-minimal.c                   |    3 +-
 27 files changed, 3001 insertions(+), 115 deletions(-)
 create mode 100644 tools/perf/scripts/python/cs-trace-disasm.py
 create mode 100644 tools/perf/scripts/python/cs-trace-ranges.py
 create mode 100644 tools/perf/util/cs-etm-decoder/Build
 create mode 100644 tools/perf/util/cs-etm-decoder/cs-etm-decoder-stub.c
 create mode 100644 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
 create mode 100644 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
 create mode 100644 tools/perf/util/cs-etm.c

diff --git a/Documentation/trace/coresight.txt b/Documentation/trace/coresight.txt
index a33c88c..a2e7ccb 100644
--- a/Documentation/trace/coresight.txt
+++ b/Documentation/trace/coresight.txt
@@ -20,13 +20,13 @@ Components are generally categorised as source, link and sinks and are
 
 "Sources" generate a compressed stream representing the processor instruction
 path based on tracing scenarios as configured by users.  From there the stream
-flows through the coresight system (via ATB bus) using links that are connecting
-the emanating source to a sink(s).  Sinks serve as endpoints to the coresight
+flows through the Coresight system (via ATB bus) using links that are connecting
+the emanating source to a sink(s).  Sinks serve as endpoints to the Coresight
 implementation, either storing the compressed stream in a memory buffer or
 creating an interface to the outside world where data can be transferred to a
-host without fear of filling up the onboard coresight memory buffer.
+host without fear of filling up the onboard Coresight memory buffer.
 
-At typical coresight system would look like this:
+A typical Coresight system would look like this:
 
   *****************************************************************
  **************************** AMBA AXI  ****************************===||
@@ -83,8 +83,8 @@ While on target configuration of the components is done via the APB bus,
 all trace data are carried out-of-band on the ATB bus.  The CTM provides
 a way to aggregate and distribute signals between CoreSight components.
 
-The coresight framework provides a central point to represent, configure and
-manage coresight devices on a platform.  This first implementation centers on
+The Coresight framework provides a central point to represent, configure and
+manage Coresight devices on a platform.  This first implementation centers on
 the basic tracing functionality, enabling components such ETM/PTM, funnel,
 replicator, TMC, TPIU and ETB.  Future work will enable more
 intricate IP blocks such as STM and CTI.
@@ -129,11 +129,11 @@ expected to be added as the solution matures.
 Framework and implementation
 ----------------------------
 
-The coresight framework provides a central point to represent, configure and
-manage coresight devices on a platform.  Any coresight compliant device can
+The Coresight framework provides a central point to represent, configure and
+manage Coresight devices on a platform.  Any Coresight compliant device can
 register with the framework for as long as they use the right APIs:
 
-struct coresight_device *coresight_register(struct coresight_desc *desc);
+struct coresight_device *coresight_register(struct coresight_desc *desc);
 void coresight_unregister(struct coresight_device *csdev);
 
 The registering function is taking a "struct coresight_device *csdev" and
@@ -193,10 +193,120 @@ the information carried in "THIS_MODULE".
 How to use the tracer modules
 -----------------------------
 
-Before trace collection can start, a coresight sink needs to be identify.
-There is no limit on the amount of sinks (nor sources) that can be enabled at
-any given moment.  As a generic operation, all device pertaining to the sink
-class will have an "active" entry in sysfs:
+There are two ways to use the Coresight framework: 1) using the perf cmd line
+tool and 2) interacting directly with the Coresight devices using the sysFS
+interface.  The latter will slowly be phased out as more functionality becomes
+available from the perf cmd line tool, but for the time being both are still
+supported.  The following sections provide details on using both methods.
+
+1) Using perf framework:
+
+Coresight tracers like ETM and PTM are represented using the Perf framework's
+Performance Monitoring Unit (PMU).  As such the perf framework takes charge of
+controlling when tracing happens based on when the process(es) of interest are
+scheduled.  When configured in a system, Coresight PMUs will be listed when
+queried by the perf command line tool:
+
+linaro at linaro-nano:~$ ./perf list pmu
+
+List of pre-defined events (to be used in -e):
+
+  cs_etm//                                           [Kernel PMU event]
+
+linaro at linaro-nano:~$
+
+Regardless of the number of ETM/PTM IP blocks in a system (usually equal to the
+number of processor cores), the "cs_etm" PMU will be listed only once.
+
+Before a trace can be configured and started a Coresight sink needs to be
+selected using the sysFS method (see below).  This is only temporary until
+sink selection can be made from the command line tool.
+
+linaro at linaro-nano:~$ ls /sys/bus/coresight/devices
+20010000.etb  20030000.tpiu  20040000.funnel  2201c000.ptm
+2201d000.ptm  2203c000.etm  2203d000.etm  2203e000.etm  replicator
+
+linaro at linaro-nano:~$ echo 1 > /sys/bus/coresight/devices/20010000.etb/enable_sink
+
+Once a sink has been selected configuring a Coresight PMU works the same way as
+any other PMU.  As such tracing can happen for a single CPU, a group of CPUs,
+per thread or a combination of those:
+
+linaro at linaro-nano:~$ perf record -e cs_etm// --per-thread <command>
+
+linaro at linaro-nano:~$ perf record -C 0,2-3 -e cs_etm// <command>
+
+Tracing limited to user and kernel space can also be used to narrow the amount
+of collected traces:
+
+linaro at linaro-nano:~$ perf record -e cs_etm//u --per-thread <command>
+
+linaro at linaro-nano:~$ perf record -C 0,2-3 -e cs_etm//k <command>
+
+As of this writing two ETM/PTM specific options are available: cycle
+accurate and timestamp (please refer to the Embedded Trace Macrocell reference
+manual for details on these options).  By default both are disabled, but using
+the "cycacc" and "timestamp" mnemonics within the double '/' will see those
+options configured for the upcoming trace run:
+
+linaro at linaro-nano:~$ perf record -e cs_etm/cycacc/ --per-thread <command>
+
+linaro at linaro-nano:~$ perf record -C 0,2-3 -e cs_etm/cycacc,timestamp/ <command>
+
+The Coresight PMUs can be configured to work in "full trace" or "snapshot" mode.
+In full trace mode trace acquisition is enabled from beginning to end with trace
+data being recorded continuously:
+
+linaro at linaro-nano:~$ perf record -e cs_etm// dd if=/dev/random of=./test.txt bs=1k count=1000
+
+Since this can lead to a significant amount of data and because some devices are
+limited in disk space snapshot mode can be used instead.  In snapshot mode
+traces are still collected in the ring buffer but not communicated to user
+space.  The ring buffer is allowed to wrap around, providing the latest
+information before an event of interest happens.  Significant events are
+communicated by sending a USR2 signal to the user space command line tool.
+From there the tool will stop trace collection and harvest data from the ring
+buffer before re-enabling traces.  Snapshot mode can be invoked using '-S' when
+launching a trace collection:
+
+linaro at linaro-nano:~$ perf record -S -e cs_etm// dd if=/dev/random of=./test.txt bs=1k count=1000
+
+Trace data collected during trace runs ends up in the "perf.data" file.  Trace
+configuration information necessary for trace decoding is also embedded in the
+"perf.data" file.  Two new headers, 'PERF_RECORD_AUXTRACE_INFO' and
+'PERF_RECORD_AUXTRACE' have been added to the list of event types in order to
+find out where the different sections start.
+
+It is worth noting that a set of metadata information exists for each tracer
+that participated in a trace run.  As such if 5 processors have been engaged,
+5 sets of metadata will be found in the perf.data file.  This is to ensure that
+tracer decompression tools have all the information they need in order to
+process the trace data.
+
+Metadata information is collected directly from the ETM/PTM management registers
+using the sysFS interface.  Since there is no way for the perf command line
+tool to associate a CPU with a tracer, a symbolic link has been created between
+the cs_etm sysFS event directory and each Coresight tracer:
+
+linaro at linaro-nano:~$ ls /sys/bus/event_source/devices/cs_etm
+cpu0  cpu1  cpu2  cpu3  cpu4  format  perf_event_mux_interval_ms
+power  subsystem  type  uevent
+
+linaro at linaro-nano:~$ ls /sys/bus/event_source/devices/cs_etm/cpu0/mgmt/
+etmccer  etmccr  etmcr  etmidr  etmscr  etmtecr1  etmtecr2
+etmteevr  etmtraceidr  etmtssvr
+
+2) Using the sysFS interface:
+
+Most, if not all, configuration registers are made available to users via the
+sysFS interface.  Until all Coresight ETM drivers have been converted to perf,
+it will also be possible to start and stop traces from sysFS.
+
+As with the perf method described above, a Coresight sink needs to be identified
+before trace collection can commence.  Using the sysFS method _only_, there is
+no limit on the number of sinks (or sources) that can be enabled at
+any given moment.  As a generic operation, all devices pertaining to the sink
+class will have an "enable_sink" entry in sysfs:
 
 root:/sys/bus/coresight/devices# ls
 replicator  20030000.tpiu    2201c000.ptm  2203c000.etm  2203e000.etm
@@ -246,7 +356,7 @@ The file cstrace.bin can be decompressed using "ptm2human", DS-5 or Trace32.
 
 Following is a DS-5 output of an experimental loop that increments a variable up
 to a certain value.  The example is simple and yet provides a glimpse of the
-wealth of possibilities that coresight provides.
+wealth of possibilities that Coresight provides.
 
 Info                                    Tracing enabled
 Instruction     106378866       0x8026B53C      E52DE004        false   PUSH     {lr}
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index 2cd7c71..1774196 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -202,6 +202,21 @@ static void *etm_setup_aux(int event_cpu, void **pages,
 	if (!event_data)
 		return NULL;
 
+	/*
+	 * In theory nothing prevents tracers in a trace session from being
+	 * associated with different sinks, nor having a sink per tracer.  But
+	 * until we have HW with this kind of topology we need to assume tracers
+	 * in a trace session are using the same sink.  Therefore go through
+	 * the coresight bus and pick the first enabled sink.
+	 *
+	 * When operated from sysFS, users are responsible for enabling the sink
+	 * while from perf, the perf tools will do it based on the choice made
+	 * on the cmd line.  As such the "enable_sink" flag in sysFS is reset.
+	 */
+	sink = coresight_get_enabled_sink(true);
+	if (!sink)
+		goto err;
+
 	INIT_WORK(&event_data->work, free_event_data);
 
 	mask = &event_data->mask;
@@ -219,25 +234,11 @@ static void *etm_setup_aux(int event_cpu, void **pages,
 		 * list of devices from source to sink that can be
 		 * referenced later when the path is actually needed.
 		 */
-		event_data->path[cpu] = coresight_build_path(csdev);
+		event_data->path[cpu] = coresight_build_path(csdev, sink);
 		if (IS_ERR(event_data->path[cpu]))
 			goto err;
 	}
 
-	/*
-	 * In theory nothing prevent tracers in a trace session from being
-	 * associated with different sinks, nor having a sink per tracer.  But
-	 * until we have HW with this kind of topology and a way to convey
-	 * sink assignement from the perf cmd line we need to assume tracers
-	 * in a trace session are using the same sink.  Therefore pick the sink
-	 * found at the end of the first available path.
-	 */
-	cpu = cpumask_first(mask);
-	/* Grab the sink at the end of the path */
-	sink = coresight_get_sink(event_data->path[cpu]);
-	if (!sink)
-		goto err;
-
 	if (!sink_ops(sink)->alloc_buffer)
 		goto err;
 
diff --git a/drivers/hwtracing/coresight/coresight-etm.h b/drivers/hwtracing/coresight/coresight-etm.h
index 4a18ee4..ad063d7 100644
--- a/drivers/hwtracing/coresight/coresight-etm.h
+++ b/drivers/hwtracing/coresight/coresight-etm.h
@@ -89,11 +89,13 @@
 /* ETMCR - 0x00 */
 #define ETMCR_PWD_DWN		BIT(0)
 #define ETMCR_STALL_MODE	BIT(7)
+#define ETMCR_BRANCH_BROADCAST	BIT(8)
 #define ETMCR_ETM_PRG		BIT(10)
 #define ETMCR_ETM_EN		BIT(11)
 #define ETMCR_CYC_ACC		BIT(12)
 #define ETMCR_CTXID_SIZE	(BIT(14)|BIT(15))
 #define ETMCR_TIMESTAMP_EN	BIT(28)
+#define ETMCR_RETURN_STACK	BIT(29)
 /* ETMCCR - 0x04 */
 #define ETMCCR_FIFOFULL		BIT(23)
 /* ETMPDCR - 0x310 */
@@ -110,8 +112,11 @@
 #define ETM_MODE_STALL		BIT(2)
 #define ETM_MODE_TIMESTAMP	BIT(3)
 #define ETM_MODE_CTXID		BIT(4)
+#define ETM_MODE_BBROAD		BIT(5)
+#define ETM_MODE_RET_STACK	BIT(6)
 #define ETM_MODE_ALL		(ETM_MODE_EXCLUDE | ETM_MODE_CYCACC | \
 				 ETM_MODE_STALL | ETM_MODE_TIMESTAMP | \
+				 ETM_MODE_BBROAD | ETM_MODE_RET_STACK | \
 				 ETM_MODE_CTXID | ETM_MODE_EXCL_KERN | \
 				 ETM_MODE_EXCL_USER)
 
diff --git a/drivers/hwtracing/coresight/coresight-etm3x-sysfs.c b/drivers/hwtracing/coresight/coresight-etm3x-sysfs.c
index e9b0719..ca98ad1 100644
--- a/drivers/hwtracing/coresight/coresight-etm3x-sysfs.c
+++ b/drivers/hwtracing/coresight/coresight-etm3x-sysfs.c
@@ -146,7 +146,7 @@ static ssize_t mode_store(struct device *dev,
 			goto err_unlock;
 		}
 		config->ctrl |= ETMCR_STALL_MODE;
-	 } else
+	} else
 		config->ctrl &= ~ETMCR_STALL_MODE;
 
 	if (config->mode & ETM_MODE_TIMESTAMP) {
@@ -164,6 +164,16 @@ static ssize_t mode_store(struct device *dev,
 	else
 		config->ctrl &= ~ETMCR_CTXID_SIZE;
 
+	if (config->mode & ETM_MODE_BBROAD)
+		config->ctrl |= ETMCR_BRANCH_BROADCAST;
+	else
+		config->ctrl &= ~ETMCR_BRANCH_BROADCAST;
+
+	if (config->mode & ETM_MODE_RET_STACK)
+		config->ctrl |= ETMCR_RETURN_STACK;
+	else
+		config->ctrl &= ~ETMCR_RETURN_STACK;
+
 	if (config->mode & (ETM_MODE_EXCL_KERN | ETM_MODE_EXCL_USER))
 		etm_config_trace_mode(config);
 
diff --git a/drivers/hwtracing/coresight/coresight-priv.h b/drivers/hwtracing/coresight/coresight-priv.h
index 196a14b..ef9d8e9 100644
--- a/drivers/hwtracing/coresight/coresight-priv.h
+++ b/drivers/hwtracing/coresight/coresight-priv.h
@@ -111,7 +111,9 @@ static inline void CS_UNLOCK(void __iomem *addr)
 void coresight_disable_path(struct list_head *path);
 int coresight_enable_path(struct list_head *path, u32 mode);
 struct coresight_device *coresight_get_sink(struct list_head *path);
-struct list_head *coresight_build_path(struct coresight_device *csdev);
+struct coresight_device *coresight_get_enabled_sink(bool reset);
+struct list_head *coresight_build_path(struct coresight_device *csdev,
+				       struct coresight_device *sink);
 void coresight_release_path(struct list_head *path);
 
 #ifdef CONFIG_CORESIGHT_SOURCE_ETM3X
diff --git a/drivers/hwtracing/coresight/coresight-stm.c b/drivers/hwtracing/coresight/coresight-stm.c
index 8e79056..2d16260 100644
--- a/drivers/hwtracing/coresight/coresight-stm.c
+++ b/drivers/hwtracing/coresight/coresight-stm.c
@@ -419,10 +419,10 @@ static ssize_t stm_generic_packet(struct stm_data *stm_data,
 						   struct stm_drvdata, stm);
 
 	if (!(drvdata && local_read(&drvdata->mode)))
-		return 0;
+		return -EACCES;
 
 	if (channel >= drvdata->numsp)
-		return 0;
+		return -EINVAL;
 
 	ch_addr = (unsigned long)stm_channel_addr(drvdata, channel);
 
@@ -920,6 +920,11 @@ static struct amba_id stm_ids[] = {
 		.mask   = 0x0003ffff,
 		.data	= "STM32",
 	},
+	{
+		.id	= 0x0003b963,
+		.mask	= 0x0003ffff,
+		.data	= "STM500",
+	},
 	{ 0, 0},
 };
 
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index d6941ea..1549436 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -70,7 +70,7 @@ static void tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
 	 * When operating in sysFS mode the content of the buffer needs to be
 	 * read before the TMC is disabled.
 	 */
-	if (local_read(&drvdata->mode) == CS_MODE_SYSFS)
+	if (drvdata->mode == CS_MODE_SYSFS)
 		tmc_etb_dump_hw(drvdata);
 	tmc_disable_hw(drvdata);
 
@@ -103,19 +103,14 @@ static void tmc_etf_disable_hw(struct tmc_drvdata *drvdata)
 	CS_LOCK(drvdata->base);
 }
 
-static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev, u32 mode)
+static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev)
 {
 	int ret = 0;
 	bool used = false;
 	char *buf = NULL;
-	long val;
 	unsigned long flags;
 	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
-	 /* This shouldn't be happening */
-	if (WARN_ON(mode != CS_MODE_SYSFS))
-		return -EINVAL;
-
 	/*
 	 * If we don't have a buffer release the lock and allocate memory.
 	 * Otherwise keep the lock and move along.
@@ -138,13 +133,12 @@ static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev, u32 mode)
 		goto out;
 	}
 
-	val = local_xchg(&drvdata->mode, mode);
 	/*
 	 * In sysFS mode we can have multiple writers per sink.  Since this
 	 * sink is already enabled no memory is needed and the HW need not be
 	 * touched.
 	 */
-	if (val == CS_MODE_SYSFS)
+	if (drvdata->mode == CS_MODE_SYSFS)
 		goto out;
 
 	/*
@@ -163,6 +157,7 @@ static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev, u32 mode)
 		drvdata->buf = buf;
 	}
 
+	drvdata->mode = CS_MODE_SYSFS;
 	tmc_etb_enable_hw(drvdata);
 out:
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -177,34 +172,29 @@ static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev, u32 mode)
 	return ret;
 }
 
-static int tmc_enable_etf_sink_perf(struct coresight_device *csdev, u32 mode)
+static int tmc_enable_etf_sink_perf(struct coresight_device *csdev)
 {
 	int ret = 0;
-	long val;
 	unsigned long flags;
 	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
-	 /* This shouldn't be happening */
-	if (WARN_ON(mode != CS_MODE_PERF))
-		return -EINVAL;
-
 	spin_lock_irqsave(&drvdata->spinlock, flags);
 	if (drvdata->reading) {
 		ret = -EINVAL;
 		goto out;
 	}
 
-	val = local_xchg(&drvdata->mode, mode);
 	/*
 	 * In Perf mode there can be only one writer per sink.  There
 	 * is also no need to continue if the ETB/ETR is already operated
 	 * from sysFS.
 	 */
-	if (val != CS_MODE_DISABLED) {
+	if (drvdata->mode != CS_MODE_DISABLED) {
 		ret = -EINVAL;
 		goto out;
 	}
 
+	drvdata->mode = CS_MODE_PERF;
 	tmc_etb_enable_hw(drvdata);
 out:
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -216,9 +206,9 @@ static int tmc_enable_etf_sink(struct coresight_device *csdev, u32 mode)
 {
 	switch (mode) {
 	case CS_MODE_SYSFS:
-		return tmc_enable_etf_sink_sysfs(csdev, mode);
+		return tmc_enable_etf_sink_sysfs(csdev);
 	case CS_MODE_PERF:
-		return tmc_enable_etf_sink_perf(csdev, mode);
+		return tmc_enable_etf_sink_perf(csdev);
 	}
 
 	/* We shouldn't be here */
@@ -227,7 +217,6 @@ static int tmc_enable_etf_sink(struct coresight_device *csdev, u32 mode)
 
 static void tmc_disable_etf_sink(struct coresight_device *csdev)
 {
-	long val;
 	unsigned long flags;
 	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
@@ -237,10 +226,11 @@ static void tmc_disable_etf_sink(struct coresight_device *csdev)
 		return;
 	}
 
-	val = local_xchg(&drvdata->mode, CS_MODE_DISABLED);
 	/* Disable the TMC only if it needs to */
-	if (val != CS_MODE_DISABLED)
+	if (drvdata->mode != CS_MODE_DISABLED) {
 		tmc_etb_disable_hw(drvdata);
+		drvdata->mode = CS_MODE_DISABLED;
+	}
 
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
@@ -260,7 +250,7 @@ static int tmc_enable_etf_link(struct coresight_device *csdev,
 	}
 
 	tmc_etf_enable_hw(drvdata);
-	local_set(&drvdata->mode, CS_MODE_SYSFS);
+	drvdata->mode = CS_MODE_SYSFS;
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
 	dev_info(drvdata->dev, "TMC-ETF enabled\n");
@@ -280,7 +270,7 @@ static void tmc_disable_etf_link(struct coresight_device *csdev,
 	}
 
 	tmc_etf_disable_hw(drvdata);
-	local_set(&drvdata->mode, CS_MODE_DISABLED);
+	drvdata->mode = CS_MODE_DISABLED;
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
 	dev_info(drvdata->dev, "TMC disabled\n");
@@ -383,7 +373,7 @@ static void tmc_update_etf_buffer(struct coresight_device *csdev,
 		return;
 
 	/* This shouldn't happen */
-	if (WARN_ON_ONCE(local_read(&drvdata->mode) != CS_MODE_PERF))
+	if (WARN_ON_ONCE(drvdata->mode != CS_MODE_PERF))
 		return;
 
 	CS_UNLOCK(drvdata->base);
@@ -504,7 +494,6 @@ const struct coresight_ops tmc_etf_cs_ops = {
 
 int tmc_read_prepare_etb(struct tmc_drvdata *drvdata)
 {
-	long val;
 	enum tmc_mode mode;
 	int ret = 0;
 	unsigned long flags;
@@ -528,9 +517,8 @@ int tmc_read_prepare_etb(struct tmc_drvdata *drvdata)
 		goto out;
 	}
 
-	val = local_read(&drvdata->mode);
 	/* Don't interfere if operated from Perf */
-	if (val == CS_MODE_PERF) {
+	if (drvdata->mode == CS_MODE_PERF) {
 		ret = -EINVAL;
 		goto out;
 	}
@@ -542,7 +530,7 @@ int tmc_read_prepare_etb(struct tmc_drvdata *drvdata)
 	}
 
 	/* Disable the TMC if need be */
-	if (val == CS_MODE_SYSFS)
+	if (drvdata->mode == CS_MODE_SYSFS)
 		tmc_etb_disable_hw(drvdata);
 
 	drvdata->reading = true;
@@ -573,7 +561,7 @@ int tmc_read_unprepare_etb(struct tmc_drvdata *drvdata)
 	}
 
 	/* Re-enable the TMC if need be */
-	if (local_read(&drvdata->mode) == CS_MODE_SYSFS) {
+	if (drvdata->mode == CS_MODE_SYSFS) {
 		/*
 		 * The trace run will continue with the same allocated trace
 		 * buffer. As such zero-out the buffer so that we don't end
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 886ea83..2db4857 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -15,11 +15,30 @@
  * this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/circ_buf.h>
 #include <linux/coresight.h>
 #include <linux/dma-mapping.h>
+#include <linux/slab.h>
+
 #include "coresight-priv.h"
 #include "coresight-tmc.h"
 
+/**
+ * struct cs_etr_buffers - keep track of a recording session's specifics
+ * @tmc:	generic portion of the TMC buffers
+ * @paddr:	the physical address of a DMA'able contiguous memory area
+ * @vaddr:	the virtual address associated to @paddr
+ * @size:	how much memory we have, starting at @paddr
+ * @dev:	the device @vaddr has been tied to
+ */
+struct cs_etr_buffers {
+	struct cs_buffers	tmc;
+	dma_addr_t		paddr;
+	void __iomem		*vaddr;
+	u32			size;
+	struct device		*dev;
+};
+
 static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 {
 	u32 axictl;
@@ -86,26 +105,22 @@ static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
 	 * When operating in sysFS mode the content of the buffer needs to be
 	 * read before the TMC is disabled.
 	 */
-	if (local_read(&drvdata->mode) == CS_MODE_SYSFS)
+	if (drvdata->mode == CS_MODE_SYSFS)
 		tmc_etr_dump_hw(drvdata);
 	tmc_disable_hw(drvdata);
 
 	CS_LOCK(drvdata->base);
 }
 
-static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev, u32 mode)
+static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
 {
 	int ret = 0;
 	bool used = false;
-	long val;
 	unsigned long flags;
 	void __iomem *vaddr = NULL;
 	dma_addr_t paddr;
 	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
-	 /* This shouldn't be happening */
-	if (WARN_ON(mode != CS_MODE_SYSFS))
-		return -EINVAL;
 
 	/*
 	 * If we don't have a buffer release the lock and allocate memory.
@@ -134,13 +149,12 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev, u32 mode)
 		goto out;
 	}
 
-	val = local_xchg(&drvdata->mode, mode);
 	/*
 	 * In sysFS mode we can have multiple writers per sink.  Since this
 	 * sink is already enabled no memory is needed and the HW need not be
 	 * touched.
 	 */
-	if (val == CS_MODE_SYSFS)
+	if (drvdata->mode == CS_MODE_SYSFS)
 		goto out;
 
 	/*
@@ -155,8 +169,7 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev, u32 mode)
 		drvdata->buf = drvdata->vaddr;
 	}
 
-	memset(drvdata->vaddr, 0, drvdata->size);
-
+	drvdata->mode = CS_MODE_SYSFS;
 	tmc_etr_enable_hw(drvdata);
 out:
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -171,34 +184,29 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev, u32 mode)
 	return ret;
 }
 
-static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, u32 mode)
+static int tmc_enable_etr_sink_perf(struct coresight_device *csdev)
 {
 	int ret = 0;
-	long val;
 	unsigned long flags;
 	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
-	 /* This shouldn't be happening */
-	if (WARN_ON(mode != CS_MODE_PERF))
-		return -EINVAL;
-
 	spin_lock_irqsave(&drvdata->spinlock, flags);
 	if (drvdata->reading) {
 		ret = -EINVAL;
 		goto out;
 	}
 
-	val = local_xchg(&drvdata->mode, mode);
 	/*
 	 * In Perf mode there can be only one writer per sink.  There
 	 * is also no need to continue if the ETR is already operated
 	 * from sysFS.
 	 */
-	if (val != CS_MODE_DISABLED) {
+	if (drvdata->mode != CS_MODE_DISABLED) {
 		ret = -EINVAL;
 		goto out;
 	}
 
+	drvdata->mode = CS_MODE_PERF;
 	tmc_etr_enable_hw(drvdata);
 out:
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -210,9 +218,9 @@ static int tmc_enable_etr_sink(struct coresight_device *csdev, u32 mode)
 {
 	switch (mode) {
 	case CS_MODE_SYSFS:
-		return tmc_enable_etr_sink_sysfs(csdev, mode);
+		return tmc_enable_etr_sink_sysfs(csdev);
 	case CS_MODE_PERF:
-		return tmc_enable_etr_sink_perf(csdev, mode);
+		return tmc_enable_etr_sink_perf(csdev);
 	}
 
 	/* We shouldn't be here */
@@ -221,7 +229,6 @@ static int tmc_enable_etr_sink(struct coresight_device *csdev, u32 mode)
 
 static void tmc_disable_etr_sink(struct coresight_device *csdev)
 {
-	long val;
 	unsigned long flags;
 	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
@@ -231,19 +238,244 @@ static void tmc_disable_etr_sink(struct coresight_device *csdev)
 		return;
 	}
 
-	val = local_xchg(&drvdata->mode, CS_MODE_DISABLED);
 	/* Disable the TMC only if it needs to */
-	if (val != CS_MODE_DISABLED)
+	if (drvdata->mode != CS_MODE_DISABLED) {
 		tmc_etr_disable_hw(drvdata);
+		drvdata->mode = CS_MODE_DISABLED;
+	}
 
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
 	dev_info(drvdata->dev, "TMC-ETR disabled\n");
 }
 
+static void *tmc_alloc_etr_buffer(struct coresight_device *csdev, int cpu,
+				  void **pages, int nr_pages, bool overwrite)
+{
+	int node;
+	struct cs_etr_buffers *buf;
+	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+
+	if (cpu == -1)
+		cpu = smp_processor_id();
+	node = cpu_to_node(cpu);
+
+	/* Allocate memory structure for interaction with Perf */
+	buf = kzalloc_node(sizeof(struct cs_etr_buffers), GFP_KERNEL, node);
+	if (!buf)
+		return NULL;
+
+	buf->dev = drvdata->dev;
+	buf->size = drvdata->size;
+	buf->vaddr = dma_alloc_coherent(buf->dev, buf->size,
+					&buf->paddr, GFP_KERNEL);
+	if (!buf->vaddr) {
+		kfree(buf);
+		return NULL;
+	}
+
+	buf->tmc.snapshot = overwrite;
+	buf->tmc.nr_pages = nr_pages;
+	buf->tmc.data_pages = pages;
+
+	return buf;
+}
+
+static void tmc_free_etr_buffer(void *config)
+{
+	struct cs_etr_buffers *buf = config;
+
+	dma_free_coherent(buf->dev, buf->size, buf->vaddr, buf->paddr);
+	kfree(buf);
+}
+
+static int tmc_set_etr_buffer(struct coresight_device *csdev,
+			      struct perf_output_handle *handle,
+			      void *sink_config)
+{
+	int ret = 0;
+	unsigned long head;
+	struct cs_etr_buffers *buf = sink_config;
+	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+
+	/* wrap head around to the amount of space we have */
+	head = handle->head & ((buf->tmc.nr_pages << PAGE_SHIFT) - 1);
+
+	/* find the page to write to */
+	buf->tmc.cur = head / PAGE_SIZE;
+
+	/* and offset within that page */
+	buf->tmc.offset = head % PAGE_SIZE;
+
+	local_set(&buf->tmc.data_size, 0);
+
+	/* Tell the HW where to put the trace data */
+	drvdata->vaddr = buf->vaddr;
+	drvdata->paddr = buf->paddr;
+	memset(drvdata->vaddr, 0, drvdata->size);
+
+	return ret;
+}
+
+static unsigned long tmc_reset_etr_buffer(struct coresight_device *csdev,
+					  struct perf_output_handle *handle,
+					  void *sink_config, bool *lost)
+{
+	long size = 0;
+	struct cs_etr_buffers *buf = sink_config;
+	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+
+	if (buf) {
+		/*
+		 * In snapshot mode ->data_size holds the new address of the
+		 * ring buffer's head.  The size itself is the whole address
+		 * range since we want the latest information.
+		 */
+		if (buf->tmc.snapshot) {
+			size = buf->tmc.nr_pages << PAGE_SHIFT;
+			handle->head = local_xchg(&buf->tmc.data_size, size);
+		}
+
+		/*
+		 * Tell the tracer PMU how much we got in this run and if
+		 * something went wrong along the way.  Nobody else can use
+		 * this cs_etr_buffers instance until we are done.  As such
+		 * resetting parameters here and squaring off with the ring
+		 * buffer API in the tracer PMU is fine.
+		 */
+		*lost = !!local_xchg(&buf->tmc.lost, 0);
+		size = local_xchg(&buf->tmc.data_size, 0);
+	}
+
+	/* Get ready for another run */
+	drvdata->vaddr = NULL;
+	drvdata->paddr = 0;
+
+	return size;
+}
+
+static void tmc_update_etr_buffer(struct coresight_device *csdev,
+				  struct perf_output_handle *handle,
+				  void *sink_config)
+{
+	int i, cur;
+	u32 *buf_ptr;
+	u32 read_ptr, write_ptr;
+	u32 status, to_read;
+	unsigned long offset;
+	struct cs_buffers *buf = sink_config;
+	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+
+	if (!buf)
+		return;
+
+	/* This shouldn't happen */
+	if (WARN_ON_ONCE(drvdata->mode != CS_MODE_PERF))
+		return;
+
+	CS_UNLOCK(drvdata->base);
+
+	tmc_flush_and_stop(drvdata);
+
+	read_ptr = readl_relaxed(drvdata->base + TMC_RRP);
+	write_ptr = readl_relaxed(drvdata->base + TMC_RWP);
+
+	/*
+	 * Get a hold of the status register and see if a wrap around
+	 * has occurred.  If so adjust things accordingly.
+	 */
+	status = readl_relaxed(drvdata->base + TMC_STS);
+	if (status & TMC_STS_FULL) {
+		local_inc(&buf->lost);
+		to_read = drvdata->size;
+	} else {
+		to_read = CIRC_CNT(write_ptr, read_ptr, drvdata->size);
+	}
+
+	/*
+	 * The TMC RAM buffer may be bigger than the space available in the
+	 * perf ring buffer (handle->size).  If so advance the RRP so that we
+	 * get the latest trace data.
+	 */
+	if (to_read > handle->size) {
+		u32 buffer_start, mask = 0;
+
+		/* Read buffer start address in system memory */
+		buffer_start = readl_relaxed(drvdata->base + TMC_DBALO);
+
+		/*
+		 * The value written to RRP must be byte-address aligned to
+		 * the width of the trace memory databus _and_ to a frame
+		 * boundary (16 byte), whichever is the biggest. For example,
+		 * for 32-bit, 64-bit and 128-bit wide trace memory, the four
+		 * LSBs must be 0s. For 256-bit wide trace memory, the five
+		 * LSBs must be 0s.
+		 */
+		switch (drvdata->memwidth) {
+		case TMC_MEM_INTF_WIDTH_32BITS:
+		case TMC_MEM_INTF_WIDTH_64BITS:
+		case TMC_MEM_INTF_WIDTH_128BITS:
+			mask = GENMASK(31, 4);
+			break;
+		case TMC_MEM_INTF_WIDTH_256BITS:
+			mask = GENMASK(31, 5);
+			break;
+		}
+
+		/*
+		 * Make sure the new size is aligned in accordance with the
+		 * requirement explained above.
+		 */
+		to_read = handle->size & mask;
+		/* Move the RAM read pointer up */
+		read_ptr = (write_ptr + drvdata->size) - to_read;
+		/* Make sure we are still within our limits */
+		if (read_ptr > (buffer_start + (drvdata->size - 1)))
+			read_ptr -= drvdata->size;
+		/* Tell the HW */
+		writel_relaxed(read_ptr, drvdata->base + TMC_RRP);
+		local_inc(&buf->lost);
+	}
+
+	cur = buf->cur;
+	offset = buf->offset;
+
+	/* Read the trace data, one 32-bit word at a time */
+	for (i = 0; i < to_read; i += 4) {
+		buf_ptr = buf->data_pages[cur] + offset;
+		*buf_ptr = readl_relaxed(drvdata->base + TMC_RRD);
+
+		offset += 4;
+		if (offset >= PAGE_SIZE) {
+			offset = 0;
+			cur++;
+			/* wrap around at the end of the buffer */
+			cur &= buf->nr_pages - 1;
+		}
+	}
+
+	/*
+	 * In snapshot mode all we have to do is communicate to
+	 * perf_aux_output_end() the address of the current head.  In full
+	 * trace mode the same function expects a size to move rb->aux_head
+	 * forward.
+	 */
+	if (buf->snapshot)
+		local_set(&buf->data_size, (cur * PAGE_SIZE) + offset);
+	else
+		local_add(to_read, &buf->data_size);
+
+	CS_LOCK(drvdata->base);
+}
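The trimming done by tmc_update_etr_buffer() when the TMC RAM holds more than the perf ring buffer can take is easy to check by hand. Below is a user-space sketch of the two steps, trimming the read size to an RRP-legal alignment and moving the read pointer up; `align_bits` is an illustrative parameter standing in for the memwidth-dependent mask, not a kernel symbol:

```c
#include <assert.h>
#include <stdint.h>

/* Clear the low align_bits bits so the value is legal to write to RRP. */
static uint32_t trim_to_read(uint32_t handle_size, unsigned int align_bits)
{
	return handle_size & (~0U << align_bits);
}

/*
 * Move the read pointer up so only the newest to_read bytes are copied,
 * wrapping back if it runs past the end of the buffer's address range.
 */
static uint32_t move_read_ptr(uint32_t write_ptr, uint32_t buf_start,
			      uint32_t buf_size, uint32_t to_read)
{
	uint32_t read_ptr = (write_ptr + buf_size) - to_read;

	if (read_ptr > buf_start + (buf_size - 1))
		read_ptr -= buf_size;
	return read_ptr;
}
```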
+
 static const struct coresight_ops_sink tmc_etr_sink_ops = {
 	.enable		= tmc_enable_etr_sink,
 	.disable	= tmc_disable_etr_sink,
+	.alloc_buffer	= tmc_alloc_etr_buffer,
+	.free_buffer	= tmc_free_etr_buffer,
+	.set_buffer	= tmc_set_etr_buffer,
+	.reset_buffer	= tmc_reset_etr_buffer,
+	.update_buffer	= tmc_update_etr_buffer,
 };
 
 const struct coresight_ops tmc_etr_cs_ops = {
@@ -253,7 +485,6 @@ const struct coresight_ops tmc_etr_cs_ops = {
 int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
 {
 	int ret = 0;
-	long val;
 	unsigned long flags;
 
 	/* config types are set a boot time and never change */
@@ -266,9 +497,8 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
 		goto out;
 	}
 
-	val = local_read(&drvdata->mode);
 	/* Don't interfere if operated from Perf */
-	if (val == CS_MODE_PERF) {
+	if (drvdata->mode == CS_MODE_PERF) {
 		ret = -EINVAL;
 		goto out;
 	}
@@ -280,7 +510,7 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
 	}
 
 	/* Disable the TMC if need be */
-	if (val == CS_MODE_SYSFS)
+	if (drvdata->mode == CS_MODE_SYSFS)
 		tmc_etr_disable_hw(drvdata);
 
 	drvdata->reading = true;
@@ -303,7 +533,7 @@ int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
 	spin_lock_irqsave(&drvdata->spinlock, flags);
 
 	/* RE-enable the TMC if need be */
-	if (local_read(&drvdata->mode) == CS_MODE_SYSFS) {
+	if (drvdata->mode == CS_MODE_SYSFS) {
 		/*
 		 * The trace run will continue with the same allocated trace
 		 * buffer. The trace buffer is cleared in tmc_etr_enable_hw(),
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
index 44b3ae3..51c0185 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.h
+++ b/drivers/hwtracing/coresight/coresight-tmc.h
@@ -117,7 +117,7 @@ struct tmc_drvdata {
 	void __iomem		*vaddr;
 	u32			size;
 	u32			len;
-	local_t			mode;
+	u32			mode;
 	enum tmc_config_type	config_type;
 	enum tmc_mem_intf_width	memwidth;
 	u32			trigger_cntr;
diff --git a/drivers/hwtracing/coresight/coresight.c b/drivers/hwtracing/coresight/coresight.c
index 7bf00a0..40ede64 100644
--- a/drivers/hwtracing/coresight/coresight.c
+++ b/drivers/hwtracing/coresight/coresight.c
@@ -368,6 +368,40 @@ struct coresight_device *coresight_get_sink(struct list_head *path)
 	return csdev;
 }
 
+static int coresight_enabled_sink(struct device *dev, void *data)
+{
+	bool *reset = data;
+	struct coresight_device *csdev = to_coresight_device(dev);
+
+	if ((csdev->type == CORESIGHT_DEV_TYPE_SINK ||
+	     csdev->type == CORESIGHT_DEV_TYPE_LINKSINK) &&
+	     csdev->activated) {
+		/*
+		 * Now that we have a handle on the sink for this session,
+		 * disable the sysFS "enable_sink" flag so that possible
+		 * concurrent perf sessions that wish to use another sink don't
+		 * trip on it.  Doing so has no ramification for the current
+		 * session.
+		 */
+		if (*reset)
+			csdev->activated = false;
+
+		return 1;
+	}
+
+	return 0;
+}
+
+struct coresight_device *coresight_get_enabled_sink(bool reset)
+{
+	struct device *dev = NULL;
+
+	dev = bus_find_device(&coresight_bustype, NULL, &reset,
+			      coresight_enabled_sink);
+
+	return dev ? to_coresight_device(dev) : NULL;
+}
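The bus scan above returns the first activated sink and, when asked, hands it over by clearing the sysfs flag. A self-contained model of that selection over a plain array (the enum and struct here are illustrative, not the coresight types):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum dev_type { DEV_SOURCE, DEV_LINK, DEV_SINK };

struct dev {
	enum dev_type type;
	bool activated;		/* set via the sysfs "enable_sink" knob */
};

/*
 * Return the first activated sink; clearing the flag when reset is set
 * mirrors handing the sink to a perf session so that concurrent
 * sessions don't trip on it.
 */
static struct dev *get_enabled_sink(struct dev *devs, size_t n, bool reset)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (devs[i].type == DEV_SINK && devs[i].activated) {
			if (reset)
				devs[i].activated = false;
			return &devs[i];
		}
	}
	return NULL;
}
```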
+
 /**
  * _coresight_build_path - recursively build a path from a @csdev to a sink.
  * @csdev:	The device to start from.
@@ -380,6 +414,7 @@ struct coresight_device *coresight_get_sink(struct list_head *path)
  * last one.
  */
 static int _coresight_build_path(struct coresight_device *csdev,
+				 struct coresight_device *sink,
 				 struct list_head *path)
 {
 	int i;
@@ -387,15 +422,15 @@ static int _coresight_build_path(struct coresight_device *csdev,
 	struct coresight_node *node;
 
 	/* An activated sink has been found.  Enqueue the element */
-	if ((csdev->type == CORESIGHT_DEV_TYPE_SINK ||
-	     csdev->type == CORESIGHT_DEV_TYPE_LINKSINK) && csdev->activated)
+	if (csdev == sink)
 		goto out;
 
 	/* Not a sink - recursively explore each port found on this element */
 	for (i = 0; i < csdev->nr_outport; i++) {
 		struct coresight_device *child_dev = csdev->conns[i].child_dev;
 
-		if (child_dev && _coresight_build_path(child_dev, path) == 0) {
+		if (child_dev &&
+		    _coresight_build_path(child_dev, sink, path) == 0) {
 			found = true;
 			break;
 		}
@@ -422,18 +457,22 @@ static int _coresight_build_path(struct coresight_device *csdev,
 	return 0;
 }
 
-struct list_head *coresight_build_path(struct coresight_device *csdev)
+struct list_head *coresight_build_path(struct coresight_device *source,
+				       struct coresight_device *sink)
 {
 	struct list_head *path;
 	int rc;
 
+	if (!sink)
+		return ERR_PTR(-EINVAL);
+
 	path = kzalloc(sizeof(struct list_head), GFP_KERNEL);
 	if (!path)
 		return ERR_PTR(-ENOMEM);
 
 	INIT_LIST_HEAD(path);
 
-	rc = _coresight_build_path(csdev, path);
+	rc = _coresight_build_path(source, sink, path);
 	if (rc) {
 		kfree(path);
 		return ERR_PTR(rc);
@@ -497,6 +536,7 @@ static int coresight_validate_source(struct coresight_device *csdev,
 int coresight_enable(struct coresight_device *csdev)
 {
 	int cpu, ret = 0;
+	struct coresight_device *sink;
 	struct list_head *path;
 
 	mutex_lock(&coresight_mutex);
@@ -508,7 +548,17 @@ int coresight_enable(struct coresight_device *csdev)
 	if (csdev->enable)
 		goto out;
 
-	path = coresight_build_path(csdev);
+	/*
+	 * Search for a valid sink for this session but don't reset the
+	 * "enable_sink" flag in sysFS.  Users get to do that explicitly.
+	 */
+	sink = coresight_get_enabled_sink(false);
+	if (!sink) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	path = coresight_build_path(csdev, sink);
 	if (IS_ERR(path)) {
 		pr_err("building path(s) failed\n");
 		ret = PTR_ERR(path);
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index 4ec127b..fe465b5 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -533,6 +533,24 @@ endif
 grep-libs  = $(filter -l%,$(1))
 strip-libs = $(filter-out -l%,$(1))
 
+ifdef CSTRACE_PATH
+  ifeq (${IS_64_BIT}, 1)
+    CSTRACE_LNX = linux64
+  else
+    CSTRACE_LNX = linux
+  endif
+  ifeq (${DEBUG}, 1)
+    LIBCSTRACE = -lcstraced_c_api -lcstraced
+    CSTRACE_LIB_PATH = $(CSTRACE_PATH)/lib/$(CSTRACE_LNX)/dbg
+  else
+    LIBCSTRACE = -lcstraced_c_api -lcstraced
+    CSTRACE_LIB_PATH = $(CSTRACE_PATH)/lib/$(CSTRACE_LNX)/rel
+  endif
+  $(call detected,CSTRACE)
+  $(call detected_var,CSTRACE_PATH)
+  EXTLIBS += -L$(CSTRACE_LIB_PATH) $(LIBCSTRACE) -lstdc++
+endif
+
 ifdef NO_LIBPERL
   CFLAGS += -DNO_LIBPERL
 else
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index a10f064..2810a64 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -86,6 +86,9 @@ include ../scripts/utilities.mak
 #
 # Define FEATURES_DUMP to provide features detection dump file
 # and bypass the feature detection
+#
+# Define NO_CSTRACE if you do not want CoreSight trace decoding support
+#
 
 # As per kernel Makefile, avoid funny character set dependencies
 unexport LC_ALL
diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index 47d584d..dfea6b6 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -575,8 +575,6 @@ static FILE *cs_device__open_file(const char *name)
 	snprintf(path, PATH_MAX,
 		 "%s" CS_BUS_DEVICE_PATH "%s", sysfs, name);
 
-	printf("path: %s\n", path);
-
 	if (stat(path, &st) < 0)
 		return NULL;
 
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 7228d14..667f8c2 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -109,7 +109,8 @@ static struct {
 
 		.fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
 			      PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
-			      PERF_OUTPUT_EVNAME | PERF_OUTPUT_IP |
+			      PERF_OUTPUT_EVNAME | PERF_OUTPUT_ADDR |
+			      PERF_OUTPUT_IP |
 			      PERF_OUTPUT_SYM | PERF_OUTPUT_DSO |
 			      PERF_OUTPUT_PERIOD,
 
diff --git a/tools/perf/scripts/python/cs-trace-disasm.py b/tools/perf/scripts/python/cs-trace-disasm.py
new file mode 100644
index 0000000..c370e26
--- /dev/null
+++ b/tools/perf/scripts/python/cs-trace-disasm.py
@@ -0,0 +1,134 @@
+# perf script event handlers, generated by perf script -g python
+# Licensed under the terms of the GNU GPL License version 2
+
+# The common_* event handler fields are the most useful fields common to
+# all events.  They don't necessarily correspond to the 'common_*' fields
+# in the format files.  Those fields not available as handler params can
+# be retrieved using Python functions of the form common_*(context).
+# See the perf-trace-python Documentation for the list of available functions.
+
+import os
+import sys
+
+sys.path.append(os.environ['PERF_EXEC_PATH'] + \
+                '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
+
+from perf_trace_context import *
+from subprocess import *
+from Core import *
+import re;
+
+from optparse import OptionParser
+
+#
+# Add options to specify vmlinux file and the objdump executable
+#
+parser = OptionParser()
+parser.add_option("-k", "--vmlinux", dest="vmlinux_name",
+                  help="path to vmlinux file")
+parser.add_option("-d", "--objdump", dest="objdump_name",
+                  help="name of objdump executable (in path)")
+(options, args) = parser.parse_args()
+
+if (options.objdump_name == None):
+        sys.exit("No objdump executable specified - use -d or --objdump option")
+
+# initialize global dicts and regular expression
+
+build_ids = dict();
+mmaps = dict();
+disasm_cache = dict();
+disasm_re = re.compile("^\s*([0-9a-fA-F]+):")
+
+cache_size = 16*1024
+
+def trace_begin():
+        cmd_output = check_output(["perf", "buildid-list"]).split('\n');
+        bid_re = re.compile("([a-fA-f0-9]+)[ \t]([^ \n]+)")
+        for line in cmd_output:
+                m = bid_re.search(line)
+                if (m != None) :
+                        if (m.group(2) == "[kernel.kallsyms]") :
+                                append = "/kallsyms"
+                                dirname = "/" + m.group(2)
+                        elif (m.group(2) == "[vdso]") :
+                                append = "/vdso"
+                                dirname = "/" + m.group(2)
+                        else:
+                                append = "/elf"
+                                dirname = m.group(2)
+
+                        build_ids[m.group(2)] =  \
+                        os.environ['PERF_BUILDID_DIR'] +  \
+                        dirname + "/" + m.group(1) + append;
+
+        if ((options.vmlinux_name != None) and ("[kernel.kallsyms]" in build_ids)):
+                build_ids['[kernel.kallsyms]'] = options.vmlinux_name;
+        elif ('[kernel.kallsyms]' in build_ids):
+                del build_ids['[kernel.kallsyms]']
+
+        mmap_re = re.compile("PERF_RECORD_MMAP2 -?[0-9]+/[0-9]+: \[(0x[0-9a-fA-F]+).*:\s.*\s(\S*)")
+        cmd_output= check_output("perf script --show-mmap-events | fgrep PERF_RECORD_MMAP2",shell=True).split('\n')
+        for line in cmd_output:
+                m = mmap_re.search(line)
+                if (m != None) :
+                        mmaps[m.group(2)] = int(m.group(1),0)
+
+
+
+def trace_end():
+        pass
+
+def process_event(t):
+        global cache_size
+        global options
+
+        sample = t['sample']
+        dso = t['dso']
+
+        # don't let the cache get too big, but don't bother with a fancy replacement policy
+        # just clear it when it hits max size
+
+        if (len(disasm_cache) > cache_size):
+                disasm_cache.clear();
+
+        cpu = format(sample['cpu'], "d");
+        addr_range = format(sample['ip'],"x")  + ":" + format(sample['addr'],"x");
+
+        try:
+                disasm_output = disasm_cache[addr_range];
+        except:
+                try:
+                        fname = build_ids[dso];
+                except KeyError:
+                        if (dso == '[kernel.kallsyms]'):
+                                return;
+                        fname = dso;
+
+                if (dso in mmaps):
+                        offset = mmaps[dso];
+                        disasm = [options.objdump_name,"-d","-z", "--adjust-vma="+format(offset,"#x"),"--start-address="+format(sample['ip'],"#x"),"--stop-address="+format(sample['addr'],"#x"), fname]
+                else:
+                        offset = 0
+                        disasm = [options.objdump_name,"-d","-z", "--start-address="+format(sample['ip'],"#x"),"--stop-address="+format(sample['addr'],"#x"),fname]
+                disasm_output = check_output(disasm).split('\n')
+                disasm_cache[addr_range] = disasm_output;
+
+        print "FILE: %s\tCPU: %s" % (dso, cpu);
+        for line in disasm_output:
+                m = disasm_re.search(line)
+                if (m != None) :
+                        try:
+                                print "\t",line
+                        except:
+                                exit(1);
+                else:
+                        continue;
+
+def trace_unhandled(event_name, context, event_fields_dict):
+        print ' '.join(['%s=%s'%(k,str(v))for k,v in sorted(event_fields_dict.items())])
+
+def print_header(event_name, cpu, secs, nsecs, pid, comm):
+        print "print_header"
+        print "%-20s %5u %05u.%09u %8u %-20s " % \
+                (event_name, cpu, secs, nsecs, pid, comm),
diff --git a/tools/perf/scripts/python/cs-trace-ranges.py b/tools/perf/scripts/python/cs-trace-ranges.py
new file mode 100644
index 0000000..c8edacb
--- /dev/null
+++ b/tools/perf/scripts/python/cs-trace-ranges.py
@@ -0,0 +1,44 @@
+#
+# Copyright(C) 2016 Linaro Limited. All rights reserved.
+# Author: Tor Jeremiassen <tor.jeremiassen at linaro.org>
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License version 2 as published by
+# the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful, but WITHOUT
+# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+# more details.
+#
+# You should have received a copy of the GNU General Public License along with
+# this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os
+import sys
+
+sys.path.append(os.environ['PERF_EXEC_PATH'] + \
+                '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
+
+from perf_trace_context import *
+
+def trace_begin():
+        pass;
+
+def trace_end():
+        pass
+
+def process_event(t):
+
+        sample = t['sample']
+
+        print "range:",format(sample['ip'],"x"),"-",format(sample['addr'],"x")
+
+def trace_unhandled(event_name, context, event_fields_dict):
+        print ' '.join(['%s=%s'%(k,str(v))for k,v in sorted(event_fields_dict.items())])
+
+def print_header(event_name, cpu, secs, nsecs, pid, comm):
+        print "print_header"
+        print "%-20s %5u %05u.%09u %8u %-20s " % \
+                (event_name, cpu, secs, nsecs, pid, comm),
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 1dc67ef..2bbb725 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -80,6 +80,8 @@ libperf-$(CONFIG_AUXTRACE) += auxtrace.o
 libperf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
 libperf-$(CONFIG_AUXTRACE) += intel-pt.o
 libperf-$(CONFIG_AUXTRACE) += intel-bts.o
+libperf-$(CONFIG_AUXTRACE) += cs-etm.o
+libperf-$(CONFIG_AUXTRACE) += cs-etm-decoder/
 libperf-y += parse-branch-options.o
 libperf-y += parse-regs-options.o
 libperf-y += term.o
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index c5a6e0b..4dbd500 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -58,6 +58,7 @@
 
 #include "intel-pt.h"
 #include "intel-bts.h"
+#include "cs-etm.h"
 
 int auxtrace_mmap__mmap(struct auxtrace_mmap *mm,
 			struct auxtrace_mmap_params *mp,
@@ -902,6 +903,7 @@ int perf_event__process_auxtrace_info(struct perf_tool *tool __maybe_unused,
 	case PERF_AUXTRACE_INTEL_BTS:
 		return intel_bts_process_auxtrace_info(event, session);
 	case PERF_AUXTRACE_CS_ETM:
+		return cs_etm__process_auxtrace_info(event, session);
 	case PERF_AUXTRACE_UNKNOWN:
 	default:
 		return -EINVAL;
diff --git a/tools/perf/util/cs-etm-decoder/Build b/tools/perf/util/cs-etm-decoder/Build
new file mode 100644
index 0000000..e097599
--- /dev/null
+++ b/tools/perf/util/cs-etm-decoder/Build
@@ -0,0 +1,6 @@
+ifeq ($(CSTRACE_PATH),)
+libperf-$(CONFIG_AUXTRACE) += cs-etm-decoder-stub.o
+else
+CFLAGS_cs-etm-decoder.o += -I$(CSTRACE_PATH)/include
+libperf-$(CONFIG_AUXTRACE) += cs-etm-decoder.o
+endif
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder-stub.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder-stub.c
new file mode 100644
index 0000000..d2ebbd2
--- /dev/null
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder-stub.c
@@ -0,0 +1,99 @@
+/*
+ *
+ * Copyright(C) 2015 Linaro Limited. All rights reserved.
+ * Author: Tor Jeremiassen <tor.jeremiassen at linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
+ * Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <stdlib.h>
+
+#include "cs-etm-decoder.h"
+#include "../util.h"
+
+
+struct cs_etm_decoder {
+	void *state;
+	int dummy;
+};
+
+int cs_etm_decoder__flush(struct cs_etm_decoder *decoder)
+{
+	(void) decoder;
+	return -1;
+}
+
+int cs_etm_decoder__add_bin_file(struct cs_etm_decoder *decoder,
+				uint64_t offset,
+				uint64_t address,
+				uint64_t len,
+				const char *fname)
+{
+	(void) decoder;
+	(void) offset;
+	(void) address;
+	(void) len;
+	(void) fname;
+	return -1;
+}
+
+const struct cs_etm_state *cs_etm_decoder__process_data_block(
+				struct cs_etm_decoder *decoder,
+				uint64_t indx,
+				const uint8_t *buf,
+				size_t len,
+				size_t *consumed)
+{
+	(void) decoder;
+	(void) indx;
+	(void) buf;
+	(void) len;
+	(void) consumed;
+	return NULL;
+}
+
+int cs_etm_decoder__add_mem_access_cb(struct cs_etm_decoder *decoder,
+				uint64_t address,
+				uint64_t len,
+				cs_etm_mem_cb_type cb_func)
+{
+	(void) decoder;
+	(void) address;
+	(void) len;
+	(void) cb_func;
+	return -1;
+}
+
+int cs_etm_decoder__get_packet(struct cs_etm_decoder *decoder,
+				struct cs_etm_packet *packet)
+{
+	(void) decoder;
+	(void) packet;
+	return -1;
+}
+
+struct cs_etm_decoder *cs_etm_decoder__new(uint32_t num_cpu,
+				struct cs_etm_decoder_params *d_params,
+				struct cs_etm_trace_params t_params[])
+{
+	(void) num_cpu;
+	(void) d_params;
+	(void) t_params;
+	return NULL;
+}
+
+void cs_etm_decoder__free(struct cs_etm_decoder *decoder)
+{
+	(void) decoder;
+	return;
+}
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
new file mode 100644
index 0000000..ee2e02f
--- /dev/null
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -0,0 +1,527 @@
+/*
+ *
+ * Copyright(C) 2015 Linaro Limited. All rights reserved.
+ * Author: Tor Jeremiassen <tor.jeremiassen at linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
+ * Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/err.h>
+#include <stdlib.h>
+
+#include "../cs-etm.h"
+#include "cs-etm-decoder.h"
+#include "../util.h"
+#include "../util/intlist.h"
+
+#include "c_api/opencsd_c_api.h"
+#include "ocsd_if_types.h"
+#include "etmv4/trc_pkt_types_etmv4.h"
+
+#define MAX_BUFFER 1024
+
+struct cs_etm_decoder {
+	struct cs_etm_state	state;
+	dcd_tree_handle_t	dcd_tree;
+	void (*packet_printer)(const char *);
+	cs_etm_mem_cb_type	mem_access;
+	ocsd_datapath_resp_t	prev_return;
+	size_t			prev_processed;
+	bool			trace_on;
+	bool			discontinuity;
+	struct cs_etm_packet	packet_buffer[MAX_BUFFER];
+	uint32_t		packet_count;
+	uint32_t		head;
+	uint32_t		tail;
+	uint32_t		end_tail;
+};
+
+static uint32_t cs_etm_decoder__mem_access(const void *context,
+					   const ocsd_vaddr_t address,
+					   const ocsd_mem_space_acc_t mem_space,
+					   const uint32_t req_size,
+					   uint8_t *buffer)
+{
+	struct cs_etm_decoder *decoder = (struct cs_etm_decoder *) context;
+	(void) mem_space;
+
+	return decoder->mem_access(decoder->state.data, address, req_size, buffer);
+}
+
+static int cs_etm_decoder__gen_etmv4_config(struct cs_etm_trace_params *params,
+					    ocsd_etmv4_cfg *config)
+{
+	config->reg_configr = params->reg_configr;
+	config->reg_traceidr = params->reg_traceidr;
+	config->reg_idr0 = params->reg_idr0;
+	config->reg_idr1 = params->reg_idr1;
+	config->reg_idr2 = params->reg_idr2;
+	config->reg_idr8 = params->reg_idr8;
+
+	config->reg_idr9 = 0;
+	config->reg_idr10 = 0;
+	config->reg_idr11 = 0;
+	config->reg_idr12 = 0;
+	config->reg_idr13 = 0;
+	config->arch_ver = ARCH_V8;
+	config->core_prof = profile_CortexA;
+
+	return 0;
+}
+
+static int cs_etm_decoder__flush_packet(struct cs_etm_decoder *decoder)
+{
+	int err = 0;
+
+	if (decoder == NULL)
+		return -1;
+
+	if (decoder->packet_count >= 31)
+		return -1;
+
+	if (decoder->tail != decoder->end_tail) {
+		decoder->tail = (decoder->tail + 1) & (MAX_BUFFER - 1);
+		decoder->packet_count++;
+	}
+
+	return err;
+}
+
+int cs_etm_decoder__flush(struct cs_etm_decoder *decoder)
+{
+	return cs_etm_decoder__flush_packet(decoder);
+}
+
+static int cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
+					 const ocsd_generic_trace_elem *elem,
+					 const uint8_t trace_chan_id,
+					 enum cs_etm_sample_type sample_type)
+{
+	int err = 0;
+	uint32_t et = 0;
+	struct int_node *inode = NULL;
+
+	if (decoder == NULL)
+		return -1;
+
+	if (decoder->packet_count >= 31)
+		return -1;
+
+	err = cs_etm_decoder__flush_packet(decoder);
+
+	if (err)
+		return err;
+
+	et = decoder->end_tail;
+	/* Search the RB tree for the cpu associated with this traceID */
+	inode = intlist__find(traceid_list, trace_chan_id);
+	if (!inode)
+		return -EINVAL;
+
+	decoder->packet_buffer[et].sample_type = sample_type;
+	decoder->packet_buffer[et].start_addr = elem->st_addr;
+	decoder->packet_buffer[et].end_addr   = elem->en_addr;
+	decoder->packet_buffer[et].exc	= false;
+	decoder->packet_buffer[et].exc_ret    = false;
+	decoder->packet_buffer[et].cpu	= *((int *)inode->priv);
+
+	et = (et + 1) & (MAX_BUFFER - 1);
+
+	decoder->end_tail = et;
+
+	return err;
+}
+
+static int cs_etm_decoder__mark_exception(struct cs_etm_decoder *decoder)
+{
+	int err = 0;
+
+	if (decoder == NULL)
+		return -1;
+
+	decoder->packet_buffer[decoder->end_tail].exc = true;
+
+	return err;
+}
+
+static int cs_etm_decoder__mark_exception_return(struct cs_etm_decoder *decoder)
+{
+	int err = 0;
+
+	if (decoder == NULL)
+		return -1;
+
+	decoder->packet_buffer[decoder->end_tail].exc_ret = true;
+
+	return err;
+}
+
+static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
+			const void *context,
+			const ocsd_trc_index_t indx,
+			const uint8_t trace_chan_id,
+			const ocsd_generic_trace_elem *elem)
+{
+	ocsd_datapath_resp_t resp = OCSD_RESP_CONT;
+	struct cs_etm_decoder *decoder = (struct cs_etm_decoder *) context;
+
+	(void) indx;
+	(void) trace_chan_id;
+
+	switch (elem->elem_type) {
+	case OCSD_GEN_TRC_ELEM_UNKNOWN:
+		break;
+	case OCSD_GEN_TRC_ELEM_NO_SYNC:
+		decoder->trace_on = false;
+		break;
+	case OCSD_GEN_TRC_ELEM_TRACE_ON:
+		decoder->trace_on = true;
+		break;
+	case OCSD_GEN_TRC_ELEM_INSTR_RANGE:
+		cs_etm_decoder__buffer_packet(decoder, elem,
+					     trace_chan_id, CS_ETM_RANGE);
+		resp = OCSD_RESP_WAIT;
+		break;
+	case OCSD_GEN_TRC_ELEM_EXCEPTION:
+		cs_etm_decoder__mark_exception(decoder);
+		break;
+	case OCSD_GEN_TRC_ELEM_EXCEPTION_RET:
+		cs_etm_decoder__mark_exception_return(decoder);
+		break;
+	case OCSD_GEN_TRC_ELEM_PE_CONTEXT:
+	case OCSD_GEN_TRC_ELEM_EO_TRACE:
+	case OCSD_GEN_TRC_ELEM_ADDR_NACC:
+	case OCSD_GEN_TRC_ELEM_TIMESTAMP:
+	case OCSD_GEN_TRC_ELEM_CYCLE_COUNT:
+	case OCSD_GEN_TRC_ELEM_ADDR_UNKNOWN:
+	case OCSD_GEN_TRC_ELEM_EVENT:
+	case OCSD_GEN_TRC_ELEM_SWTRACE:
+	case OCSD_GEN_TRC_ELEM_CUSTOM:
+	default:
+		break;
+	}
+
+	decoder->state.err = 0;
+
+	return resp;
+}
+
+static ocsd_datapath_resp_t cs_etm_decoder__etmv4i_packet_printer(
+	const void *context,
+	const ocsd_datapath_op_t op,
+	const ocsd_trc_index_t indx,
+	const ocsd_etmv4_i_pkt *pkt)
+{
+	const size_t PACKET_STR_LEN = 1024;
+	ocsd_datapath_resp_t ret = OCSD_RESP_CONT;
+	char packet_str[PACKET_STR_LEN];
+	size_t offset;
+	struct cs_etm_decoder *decoder = (struct cs_etm_decoder *) context;
+
+	sprintf(packet_str, "%ld: ", (long int) indx);
+	offset = strlen(packet_str);
+
+	switch (op) {
+	case OCSD_OP_DATA:
+		if (ocsd_pkt_str(OCSD_PROTOCOL_ETMV4I,
+				  (void *)pkt,
+				  packet_str+offset,
+				  PACKET_STR_LEN-offset) != OCSD_OK)
+			ret = OCSD_RESP_FATAL_INVALID_PARAM;
+		break;
+	case OCSD_OP_EOT:
+		sprintf(packet_str, "**** END OF TRACE ****\n");
+		break;
+	case OCSD_OP_FLUSH:
+	case OCSD_OP_RESET:
+	default:
+		break;
+	}
+
+	decoder->packet_printer(packet_str);
+
+	return ret;
+}
+
+static int cs_etm_decoder__create_etmv4i_packet_printer(struct cs_etm_decoder_params *d_params,
+							struct cs_etm_trace_params *t_params,
+							struct cs_etm_decoder *decoder)
+{
+	ocsd_etmv4_cfg trace_config;
+	int ret = 0;
+	unsigned char CSID; /* CSID extracted from the config data */
+
+	if (d_params->packet_printer == NULL)
+		return -1;
+
+	ret = cs_etm_decoder__gen_etmv4_config(t_params, &trace_config);
+
+	if (ret != 0)
+		return -1;
+
+	decoder->packet_printer = d_params->packet_printer;
+
+	ret = ocsd_dt_create_decoder(decoder->dcd_tree,
+				     OCSD_BUILTIN_DCD_ETMV4I,
+				     OCSD_CREATE_FLG_PACKET_PROC,
+				     (void *)&trace_config,
+				     &CSID);
+
+	if (ret != 0)
+		return -1;
+
+	ret = ocsd_dt_attach_packet_callback(decoder->dcd_tree,
+					  CSID,
+					  OCSD_C_API_CB_PKT_SINK,
+					  cs_etm_decoder__etmv4i_packet_printer,
+					  decoder);
+	return ret;
+}
+
+static int cs_etm_decoder__create_etmv4i_packet_decoder(struct cs_etm_decoder_params *d_params,
+							struct cs_etm_trace_params *t_params,
+							struct cs_etm_decoder *decoder)
+{
+	ocsd_etmv4_cfg trace_config;
+	int ret = 0;
+	unsigned char CSID; /* CSID extracted from the config data */
+
+	decoder->packet_printer = d_params->packet_printer;
+
+	ret = cs_etm_decoder__gen_etmv4_config(t_params, &trace_config);
+
+	if (ret != 0)
+		return -1;
+
+	ret = ocsd_dt_create_decoder(decoder->dcd_tree,
+				     OCSD_BUILTIN_DCD_ETMV4I,
+				     OCSD_CREATE_FLG_FULL_DECODER,
+				     (void *)&trace_config,
+				     &CSID);
+
+	if (ret != 0)
+		return -1;
+
+	ret = ocsd_dt_set_gen_elem_outfn(decoder->dcd_tree,
+					cs_etm_decoder__gen_trace_elem_printer, decoder);
+	return ret;
+}
+
+int cs_etm_decoder__add_mem_access_cb(struct cs_etm_decoder *decoder,
+				      uint64_t address,
+				      uint64_t len,
+				      cs_etm_mem_cb_type cb_func)
+{
+	int err;
+
+	decoder->mem_access = cb_func;
+	err = ocsd_dt_add_callback_mem_acc(decoder->dcd_tree,
+					   address,
+					   address+len-1,
+					   OCSD_MEM_SPACE_ANY,
+					   cs_etm_decoder__mem_access,
+					   decoder);
+	return err;
+}
+
+
+int cs_etm_decoder__add_bin_file(struct cs_etm_decoder *decoder,
+				 uint64_t offset,
+				 uint64_t address,
+				 uint64_t len,
+				 const char *fname)
+{
+	int err = 0;
+	ocsd_file_mem_region_t region;
+
+	(void) len;
+	if (NULL == decoder)
+		return -1;
+
+	if (NULL == decoder->dcd_tree)
+		return -1;
+
+	region.file_offset = offset;
+	region.start_address = address;
+	region.region_size = len;
+	err = ocsd_dt_add_binfile_region_mem_acc(decoder->dcd_tree,
+					   &region,
+					   1,
+					   OCSD_MEM_SPACE_ANY,
+					   fname);
+
+	return err;
+}
+
+const struct cs_etm_state *cs_etm_decoder__process_data_block(struct cs_etm_decoder *decoder,
+					uint64_t indx,
+					const uint8_t *buf,
+					size_t len,
+					size_t *consumed)
+{
+	int ret = 0;
+	ocsd_datapath_resp_t dp_ret = decoder->prev_return;
+	size_t processed = 0;
+
+	if (decoder->packet_count > 0) {
+		decoder->state.err = ret;
+		*consumed = processed;
+		return &(decoder->state);
+	}
+
+	while ((processed < len) && (0 == ret)) {
+
+		if (OCSD_DATA_RESP_IS_CONT(dp_ret)) {
+			uint32_t count;
+			dp_ret = ocsd_dt_process_data(decoder->dcd_tree,
+							OCSD_OP_DATA,
+							indx+processed,
+							len - processed,
+							&buf[processed],
+							&count);
+			processed += count;
+
+		} else if (OCSD_DATA_RESP_IS_WAIT(dp_ret)) {
+			dp_ret = ocsd_dt_process_data(decoder->dcd_tree,
+							OCSD_OP_FLUSH,
+							0,
+							0,
+							NULL,
+							NULL);
+			break;
+		} else
+			ret = -1;
+	}
+	if (OCSD_DATA_RESP_IS_WAIT(dp_ret)) {
+		if (OCSD_DATA_RESP_IS_CONT(decoder->prev_return)) {
+			decoder->prev_processed = processed;
+		}
+		processed = 0;
+	} else if (OCSD_DATA_RESP_IS_WAIT(decoder->prev_return)) {
+		processed = decoder->prev_processed;
+		decoder->prev_processed = 0;
+	}
+	*consumed = processed;
+	decoder->prev_return = dp_ret;
+	decoder->state.err = ret;
+	return &(decoder->state);
+}
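For reference, the CONT/WAIT handshake driven by `ocsd_dt_process_data()` above can be modeled without the OpenCSD library: while the decoder answers CONT, keep feeding bytes; on WAIT, issue a flush and stop processing the block. The toy decoder below is an assumption of this sketch (it stands in for the real datapath call), not part of OpenCSD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for ocsd_dt_process_data(): consumes bytes and asks to
 * WAIT once a preset number of bytes has been seen. Illustrative only. */
enum resp { RESP_CONT, RESP_WAIT };

struct toy_decoder {
	size_t seen;     /* total bytes consumed so far */
	size_t wait_at;  /* byte count at which the decoder requests WAIT */
	int flushed;     /* set when a flush operation is issued */
};

static enum resp toy_process(struct toy_decoder *d, const uint8_t *buf,
			     size_t len, uint32_t *count)
{
	(void)buf;
	*count = (uint32_t)len;
	d->seen += len;
	return d->seen >= d->wait_at ? RESP_WAIT : RESP_CONT;
}

/* Mirrors the loop in cs_etm_decoder__process_data_block(): feed data
 * while the response is CONT, flush and stop on WAIT. Returns the number
 * of bytes consumed from this block. */
static size_t process_block(struct toy_decoder *d, const uint8_t *buf,
			    size_t len)
{
	size_t processed = 0;
	enum resp r = RESP_CONT;

	while (processed < len) {
		uint32_t count = 0;

		if (r == RESP_CONT) {
			/* feed one byte at a time to exercise the loop */
			r = toy_process(d, &buf[processed], 1, &count);
			processed += count;
		} else {
			/* WAIT: flush, let the consumer drain packets */
			d->flushed = 1;
			break;
		}
	}
	return processed;
}
```

The real code additionally remembers `prev_return` across calls so a block that ended in WAIT reports zero bytes consumed until the pending packets are drained.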
+
+int cs_etm_decoder__get_packet(struct cs_etm_decoder *decoder,
+			       struct cs_etm_packet *packet)
+{
+	if (decoder->packet_count == 0)
+		return -1;
+
+	if (packet == NULL)
+		return -1;
+
+	*packet = decoder->packet_buffer[decoder->head];
+
+	decoder->head = (decoder->head + 1) & (MAX_BUFFER - 1);
+
+	decoder->packet_count--;
+
+	return 0;
+}
+
+static void cs_etm_decoder__clear_buffer(struct cs_etm_decoder *decoder)
+{
+	unsigned i;
+
+	decoder->head = 0;
+	decoder->tail = 0;
+	decoder->end_tail = 0;
+	decoder->packet_count = 0;
+	for (i = 0; i < MAX_BUFFER; i++) {
+		decoder->packet_buffer[i].start_addr = 0xdeadbeefdeadbeefUL;
+		decoder->packet_buffer[i].end_addr   = 0xdeadbeefdeadbeefUL;
+		decoder->packet_buffer[i].exc	= false;
+		decoder->packet_buffer[i].exc_ret    = false;
+		decoder->packet_buffer[i].cpu	= INT_MIN;
+	}
+}
+
+struct cs_etm_decoder *cs_etm_decoder__new(uint32_t num_cpu,
+					   struct cs_etm_decoder_params *d_params,
+					   struct cs_etm_trace_params t_params[])
+{
+	struct cs_etm_decoder *decoder;
+	ocsd_dcd_tree_src_t format;
+	uint32_t flags;
+	int ret;
+	size_t i;
+
+	if ((t_params == NULL) || (d_params == NULL))
+		return NULL;
+
+	decoder = zalloc(sizeof(struct cs_etm_decoder));
+
+	if (decoder == NULL)
+		return NULL;
+
+	decoder->state.data = d_params->data;
+	decoder->prev_return = OCSD_RESP_CONT;
+	cs_etm_decoder__clear_buffer(decoder);
+	format = (d_params->formatted ? OCSD_TRC_SRC_FRAME_FORMATTED :
+					 OCSD_TRC_SRC_SINGLE);
+	flags = 0;
+	flags |= (d_params->fsyncs ? OCSD_DFRMTR_HAS_FSYNCS : 0);
+	flags |= (d_params->hsyncs ? OCSD_DFRMTR_HAS_HSYNCS : 0);
+	flags |= (d_params->frame_aligned ? OCSD_DFRMTR_FRAME_MEM_ALIGN : 0);
+
+	/* Create decode tree for the data source */
+	decoder->dcd_tree = ocsd_create_dcd_tree(format, flags);
+
+	if (decoder->dcd_tree == NULL)
+		goto err_free_decoder;
+
+	for (i = 0; i < num_cpu; ++i) {
+		switch (t_params[i].protocol) {
+		case CS_ETM_PROTO_ETMV4i:
+			if (d_params->operation == CS_ETM_OPERATION_PRINT)
+				ret = cs_etm_decoder__create_etmv4i_packet_printer(d_params, &t_params[i], decoder);
+			else if (d_params->operation == CS_ETM_OPERATION_DECODE)
+				ret = cs_etm_decoder__create_etmv4i_packet_decoder(d_params, &t_params[i], decoder);
+			else
+				ret = -CS_ETM_ERR_PARAM;
+			if (ret != 0)
+				goto err_free_decoder_tree;
+			break;
+		default:
+			goto err_free_decoder_tree;
+			break;
+		}
+	}
+
+	return decoder;
+
+err_free_decoder_tree:
+	ocsd_destroy_dcd_tree(decoder->dcd_tree);
+err_free_decoder:
+	free(decoder);
+	return NULL;
+}
+
+void cs_etm_decoder__free(struct cs_etm_decoder *decoder)
+{
+	if (decoder == NULL)
+		return;
+
+	ocsd_destroy_dcd_tree(decoder->dcd_tree);
+	decoder->dcd_tree = NULL;
+
+	free(decoder);
+}
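The packet queue consumed by `cs_etm_decoder__get_packet()` above is a fixed-size ring buffer whose indices wrap with `(index + 1) & (MAX_BUFFER - 1)`, which requires MAX_BUFFER to be a power of two. A self-contained sketch of that scheme (the value 32 and the helper names are assumptions of this example, not taken from the patch):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* MAX_BUFFER must be a power of two so the AND mask wraps correctly. */
#define MAX_BUFFER 32

struct ring {
	int buf[MAX_BUFFER];
	unsigned head;   /* next slot to read */
	unsigned tail;   /* next slot to write */
	unsigned count;  /* packets currently queued */
};

static int ring_put(struct ring *r, int v)
{
	if (r->count == MAX_BUFFER)
		return -1;                        /* full */
	r->buf[r->tail] = v;
	r->tail = (r->tail + 1) & (MAX_BUFFER - 1);
	r->count++;
	return 0;
}

static int ring_get(struct ring *r, int *v)
{
	if (r->count == 0)
		return -1;                        /* empty, as in get_packet */
	*v = r->buf[r->head];
	r->head = (r->head + 1) & (MAX_BUFFER - 1);
	r->count--;
	return 0;
}
```

As in the decoder, an empty queue returns -1 rather than blocking; the caller treats that as "no packet yet", not a hard error.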
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
new file mode 100644
index 0000000..7e9db4c
--- /dev/null
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
@@ -0,0 +1,117 @@
+/*
+ * Copyright(C) 2015 Linaro Limited. All rights reserved.
+ * Author: Tor Jeremiassen <tor.jeremiassen at linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
+ * Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef INCLUDE__CS_ETM_DECODER_H__
+#define INCLUDE__CS_ETM_DECODER_H__
+
+#include <linux/types.h>
+#include <stdio.h>
+
+struct cs_etm_decoder;
+
+struct cs_etm_buffer {
+	const unsigned char *buf;
+	size_t  len;
+	uint64_t offset;
+	//bool    consecutive;
+	uint64_t	ref_timestamp;
+	//uint64_t	trace_nr;
+};
+
+enum cs_etm_sample_type {
+	CS_ETM_RANGE      = 1 << 0,
+};
+
+struct cs_etm_state {
+	int err;
+	void *data;
+	unsigned isa;
+	uint64_t start;
+	uint64_t end;
+	uint64_t timestamp;
+};
+
+struct cs_etm_packet {
+	enum cs_etm_sample_type sample_type;
+	uint64_t start_addr;
+	uint64_t end_addr;
+	bool     exc;
+	bool     exc_ret;
+	int cpu;
+};
+
+
+struct cs_etm_queue;
+typedef uint32_t (*cs_etm_mem_cb_type)(struct cs_etm_queue *, uint64_t, size_t, uint8_t *);
+
+struct cs_etm_trace_params {
+	void *etmv4i_packet_handler;
+	uint32_t reg_idr0;
+	uint32_t reg_idr1;
+	uint32_t reg_idr2;
+	uint32_t reg_idr8;
+	uint32_t reg_configr;
+	uint32_t reg_traceidr;
+	int  protocol;
+};
+
+struct cs_etm_decoder_params {
+	int  operation;
+	void (*packet_printer)(const char *);
+	cs_etm_mem_cb_type  mem_acc_cb;
+	bool formatted;
+	bool fsyncs;
+	bool hsyncs;
+	bool frame_aligned;
+	void *data;
+};
+
+enum {
+	CS_ETM_PROTO_ETMV3 = 1,
+	CS_ETM_PROTO_ETMV4i,
+	CS_ETM_PROTO_ETMV4d,
+};
+
+enum {
+	CS_ETM_OPERATION_PRINT = 1,
+	CS_ETM_OPERATION_DECODE,
+};
+
+enum {
+	CS_ETM_ERR_NOMEM = 1,
+	CS_ETM_ERR_NODATA,
+	CS_ETM_ERR_PARAM,
+};
+
+
+struct cs_etm_decoder *cs_etm_decoder__new(uint32_t num_cpu, struct cs_etm_decoder_params *, struct cs_etm_trace_params []);
+
+int cs_etm_decoder__add_mem_access_cb(struct cs_etm_decoder *, uint64_t, uint64_t, cs_etm_mem_cb_type);
+
+int cs_etm_decoder__flush(struct cs_etm_decoder *);
+void cs_etm_decoder__free(struct cs_etm_decoder *);
+int cs_etm_decoder__get_packet(struct cs_etm_decoder *, struct cs_etm_packet *);
+
+int cs_etm_decoder__add_bin_file(struct cs_etm_decoder *, uint64_t, uint64_t, uint64_t, const char *);
+
+const struct cs_etm_state *cs_etm_decoder__process_data_block(struct cs_etm_decoder *,
+				       uint64_t,
+				       const uint8_t *,
+				       size_t,
+				       size_t *);
+
+#endif /* INCLUDE__CS_ETM_DECODER_H__ */
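A `cs_etm_packet` with `sample_type == CS_ETM_RANGE` describes an executed address range; later in this patch (`cs_etm__synth_instruction_sample()`), the sample period is derived as `(end_addr - start_addr) >> 2`, i.e. one count per 4-byte A64 instruction. A minimal sketch of that conversion, assuming fixed 4-byte instructions (the struct and helper names here are illustrative, mirroring but not copying the header):

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the address-range fields of cs_etm_packet. */
struct packet {
	uint64_t start_addr;
	uint64_t end_addr;
};

/* Instruction count for an A64 range: every instruction is 4 bytes,
 * so shifting the byte length right by 2 yields the count. */
static uint64_t range_period(const struct packet *p)
{
	return (p->end_addr - p->start_addr) >> 2;
}
```

Note this holds only for fixed-width A64; a Thumb (T32) range would need a real instruction count from the decoder instead.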
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
new file mode 100644
index 0000000..91d6a8a
--- /dev/null
+++ b/tools/perf/util/cs-etm.c
@@ -0,0 +1,1501 @@
+/*
+ * Copyright(C) 2016 Linaro Limited. All rights reserved.
+ * Author: Tor Jeremiassen <tor.jeremiassen at linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/err.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include <linux/log2.h>
+
+#include "perf.h"
+#include "thread_map.h"
+#include "thread.h"
+#include "thread-stack.h"
+#include "callchain.h"
+#include "auxtrace.h"
+#include "evlist.h"
+#include "machine.h"
+#include "util.h"
+#include "util/intlist.h"
+#include "color.h"
+#include "cs-etm.h"
+#include "cs-etm-decoder/cs-etm-decoder.h"
+#include "debug.h"
+
+#include <stdlib.h>
+
+#define KiB(x) ((x) * 1024)
+#define MiB(x) ((x) * 1024 * 1024)
+#define MAX_TIMESTAMP (~0ULL)
+
+struct cs_etm_auxtrace {
+	struct auxtrace		auxtrace;
+	struct auxtrace_queues	queues;
+	struct auxtrace_heap	heap;
+	u64			**metadata;
+	u32			auxtrace_type;
+	struct perf_session	*session;
+	struct machine		*machine;
+	struct perf_evsel	*switch_evsel;
+	struct thread		*unknown_thread;
+	uint32_t		num_cpu;
+	bool			timeless_decoding;
+	bool			sampling_mode;
+	bool			snapshot_mode;
+	bool			data_queued;
+	bool			sync_switch;
+	bool			synth_needs_swap;
+	int			have_sched_switch;
+
+	bool			sample_instructions;
+	u64			instructions_sample_type;
+	u64			instructions_sample_period;
+	u64			instructions_id;
+	struct itrace_synth_opts synth_opts;
+	unsigned		pmu_type;
+};
+
+struct cs_etm_queue {
+	struct cs_etm_auxtrace	*etm;
+	unsigned		queue_nr;
+	struct auxtrace_buffer	*buffer;
+	const struct		cs_etm_state *state;
+	struct ip_callchain	*chain;
+	union perf_event	*event_buf;
+	bool			on_heap;
+	bool			step_through_buffers;
+	bool			use_buffer_pid_tid;
+	pid_t			pid, tid;
+	int			cpu;
+	struct thread		*thread;
+	u64			time;
+	u64			timestamp;
+	bool			stop;
+	struct cs_etm_decoder	*decoder;
+	u64			offset;
+	bool			eot;
+	bool			kernel_mapped;
+};
+
+static int cs_etm__get_trace(struct cs_etm_buffer *buff, struct cs_etm_queue *etmq);
+static int cs_etm__update_queues(struct cs_etm_auxtrace *);
+static int cs_etm__process_queues(struct cs_etm_auxtrace *, u64);
+static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *, pid_t, u64);
+static uint32_t cs_etm__mem_access(struct cs_etm_queue *, uint64_t, size_t, uint8_t *);
+
+static void cs_etm__packet_dump(const char *pkt_string)
+{
+	const char *color = PERF_COLOR_BLUE;
+
+	color_fprintf(stdout, color, "  %s\n", pkt_string);
+	fflush(stdout);
+}
+
+static void cs_etm__dump_event(struct cs_etm_auxtrace *etm,
+			      struct auxtrace_buffer *buffer)
+{
+	const char *color = PERF_COLOR_BLUE;
+	struct cs_etm_decoder_params d_params;
+	struct cs_etm_trace_params *t_params;
+	struct cs_etm_decoder *decoder;
+	size_t buffer_used = 0;
+	size_t i;
+
+	fprintf(stdout, "\n");
+	color_fprintf(stdout, color,
+		     ". ... CoreSight ETM Trace data: size %zu bytes\n",
+		     buffer->size);
+
+	t_params = zalloc(sizeof(struct cs_etm_trace_params) * etm->num_cpu);
+	if (t_params == NULL)
+		return;
+
+	for (i = 0; i < etm->num_cpu; ++i) {
+		t_params[i].protocol = CS_ETM_PROTO_ETMV4i;
+		t_params[i].reg_idr0 = etm->metadata[i][CS_ETMV4_TRCIDR0];
+		t_params[i].reg_idr1 = etm->metadata[i][CS_ETMV4_TRCIDR1];
+		t_params[i].reg_idr2 = etm->metadata[i][CS_ETMV4_TRCIDR2];
+		t_params[i].reg_idr8 = etm->metadata[i][CS_ETMV4_TRCIDR8];
+		t_params[i].reg_configr = etm->metadata[i][CS_ETMV4_TRCCONFIGR];
+		t_params[i].reg_traceidr = etm->metadata[i][CS_ETMV4_TRCTRACEIDR];
+	}
+	d_params.packet_printer = cs_etm__packet_dump;
+	d_params.operation = CS_ETM_OPERATION_PRINT;
+	d_params.formatted = true;
+	d_params.fsyncs = false;
+	d_params.hsyncs = false;
+	d_params.frame_aligned = true;
+
+	decoder = cs_etm_decoder__new(etm->num_cpu, &d_params, t_params);
+
+	zfree(&t_params);
+
+	if (decoder == NULL)
+		return;
+	do {
+		size_t consumed;
+
+		cs_etm_decoder__process_data_block(decoder, buffer->offset,
+					&(((uint8_t *)buffer->data)[buffer_used]),
+					buffer->size - buffer_used, &consumed);
+		buffer_used += consumed;
+	} while (buffer_used < buffer->size);
+	cs_etm_decoder__free(decoder);
+}
+
+static int cs_etm__flush_events(struct perf_session *session, struct perf_tool *tool)
+{
+	struct cs_etm_auxtrace *etm = container_of(session->auxtrace,
+						   struct cs_etm_auxtrace,
+						   auxtrace);
+
+	int ret;
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events)
+		return -EINVAL;
+
+	ret = cs_etm__update_queues(etm);
+
+	if (ret < 0)
+		return ret;
+
+	if (etm->timeless_decoding)
+		return cs_etm__process_timeless_queues(etm, -1, MAX_TIMESTAMP - 1);
+
+	return cs_etm__process_queues(etm, MAX_TIMESTAMP);
+}
+
+static void cs_etm__set_pid_tid_cpu(struct cs_etm_auxtrace *etm,
+				    struct auxtrace_queue *queue)
+{
+	struct cs_etm_queue *etmq = queue->priv;
+
+	if ((queue->tid == -1) || (etm->have_sched_switch)) {
+		etmq->tid = machine__get_current_tid(etm->machine, etmq->cpu);
+		thread__zput(etmq->thread);
+	}
+
+	if ((!etmq->thread) && (etmq->tid != -1))
+		etmq->thread = machine__find_thread(etm->machine, -1, etmq->tid);
+
+	if (etmq->thread) {
+		etmq->pid = etmq->thread->pid_;
+		if (queue->cpu == -1)
+			etmq->cpu = etmq->thread->cpu;
+	}
+}
+
+static void cs_etm__free_queue(void *priv)
+{
+	struct cs_etm_queue *etmq = priv;
+
+	if (!etmq)
+		return;
+
+	thread__zput(etmq->thread);
+	cs_etm_decoder__free(etmq->decoder);
+	zfree(&etmq->event_buf);
+	zfree(&etmq->chain);
+	free(etmq);
+}
+
+static void cs_etm__free_events(struct perf_session *session)
+{
+	struct cs_etm_auxtrace *aux = container_of(session->auxtrace,
+						   struct cs_etm_auxtrace,
+						   auxtrace);
+
+	struct auxtrace_queues *queues = &(aux->queues);
+
+	unsigned i;
+
+	for (i = 0; i < queues->nr_queues; ++i) {
+		cs_etm__free_queue(queues->queue_array[i].priv);
+		queues->queue_array[i].priv = 0;
+	}
+
+	auxtrace_queues__free(queues);
+}
+
+static void cs_etm__free(struct perf_session *session)
+{
+
+	size_t i;
+	struct int_node *inode, *tmp;
+	struct cs_etm_auxtrace *aux = container_of(session->auxtrace,
+						   struct cs_etm_auxtrace,
+						   auxtrace);
+	auxtrace_heap__free(&aux->heap);
+	cs_etm__free_events(session);
+	session->auxtrace = NULL;
+
+	/* First remove all traceID/CPU# nodes from the RB tree */
+	intlist__for_each_entry_safe(inode, tmp, traceid_list)
+		intlist__remove(traceid_list, inode);
+	/* Then the RB tree itself */
+	intlist__delete(traceid_list);
+
+	//thread__delete(aux->unknown_thread);
+	for (i = 0; i < aux->num_cpu; ++i)
+		zfree(&aux->metadata[i]);
+	zfree(&aux->metadata);
+	free(aux);
+}
+
+static void cs_etm__use_buffer_pid_tid(struct cs_etm_queue *etmq,
+				      struct auxtrace_queue *queue,
+				      struct auxtrace_buffer *buffer)
+{
+	if ((queue->cpu == -1) && (buffer->cpu != -1))
+		etmq->cpu = buffer->cpu;
+
+	etmq->pid = buffer->pid;
+	etmq->tid = buffer->tid;
+
+	thread__zput(etmq->thread);
+
+	if (etmq->tid != -1) {
+		if (etmq->pid != -1) {
+			etmq->thread = machine__findnew_thread(etmq->etm->machine,
+							       etmq->pid,
+							       etmq->tid);
+		} else {
+			etmq->thread = machine__findnew_thread(etmq->etm->machine,
+							       -1,
+							       etmq->tid);
+		}
+	}
+}
+
+
+static int cs_etm__get_trace(struct cs_etm_buffer *buff, struct cs_etm_queue *etmq)
+{
+	struct auxtrace_buffer *aux_buffer = etmq->buffer;
+	struct auxtrace_buffer *old_buffer = aux_buffer;
+	struct auxtrace_queue *queue;
+
+	if (etmq->stop) {
+		buff->len = 0;
+		return 0;
+	}
+
+	queue = &etmq->etm->queues.queue_array[etmq->queue_nr];
+
+	aux_buffer = auxtrace_buffer__next(queue, aux_buffer);
+
+	if (!aux_buffer) {
+		if (old_buffer)
+			auxtrace_buffer__drop_data(old_buffer);
+		buff->len = 0;
+		return 0;
+	}
+
+	etmq->buffer = aux_buffer;
+
+	if (!aux_buffer->data) {
+		int fd = perf_data_file__fd(etmq->etm->session->file);
+
+		aux_buffer->data = auxtrace_buffer__get_data(aux_buffer, fd);
+		if (!aux_buffer->data)
+			return -ENOMEM;
+	}
+
+	if (old_buffer)
+		auxtrace_buffer__drop_data(old_buffer);
+
+	if (aux_buffer->use_data) {
+		buff->offset = aux_buffer->offset;
+		buff->len = aux_buffer->use_size;
+		buff->buf = aux_buffer->use_data;
+	} else {
+		buff->offset = aux_buffer->offset;
+		buff->len = aux_buffer->size;
+		buff->buf = aux_buffer->data;
+	}
+	/*
+	 * buff->offset = 0;
+	 * buff->len = sizeof(cstrace);
+	 * buff->buf = cstrace;
+	*/
+
+	buff->ref_timestamp = aux_buffer->reference;
+
+	if (etmq->use_buffer_pid_tid &&
+	    ((etmq->pid != aux_buffer->pid) ||
+	     (etmq->tid != aux_buffer->tid))) {
+		cs_etm__use_buffer_pid_tid(etmq, queue, aux_buffer);
+	}
+
+	if (etmq->step_through_buffers)
+		etmq->stop = true;
+
+	return buff->len;
+}
+
+static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm,
+					       unsigned int queue_nr)
+{
+	struct cs_etm_decoder_params d_params;
+	struct cs_etm_trace_params   *t_params;
+	struct cs_etm_queue *etmq;
+	size_t i;
+
+	etmq = zalloc(sizeof(struct cs_etm_queue));
+	if (!etmq)
+		return NULL;
+
+	if (etm->synth_opts.callchain) {
+		size_t sz = sizeof(struct ip_callchain);
+
+		sz += etm->synth_opts.callchain_sz * sizeof(u64);
+		etmq->chain = zalloc(sz);
+		if (!etmq->chain)
+			goto out_free;
+	} else {
+		etmq->chain = NULL;
+	}
+
+	etmq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
+	if (!etmq->event_buf)
+		goto out_free;
+
+	etmq->etm = etm;
+	etmq->queue_nr = queue_nr;
+	etmq->pid = -1;
+	etmq->tid = -1;
+	etmq->cpu = -1;
+	etmq->stop = false;
+	etmq->kernel_mapped = false;
+
+	t_params = zalloc(sizeof(struct cs_etm_trace_params) * etm->num_cpu);
+	if (t_params == NULL)
+		goto out_free;
+
+	for (i = 0; i < etm->num_cpu; ++i) {
+		t_params[i].reg_idr0 = etm->metadata[i][CS_ETMV4_TRCIDR0];
+		t_params[i].reg_idr1 = etm->metadata[i][CS_ETMV4_TRCIDR1];
+		t_params[i].reg_idr2 = etm->metadata[i][CS_ETMV4_TRCIDR2];
+		t_params[i].reg_idr8 = etm->metadata[i][CS_ETMV4_TRCIDR8];
+		t_params[i].reg_configr = etm->metadata[i][CS_ETMV4_TRCCONFIGR];
+		t_params[i].reg_traceidr = etm->metadata[i][CS_ETMV4_TRCTRACEIDR];
+		t_params[i].protocol = CS_ETM_PROTO_ETMV4i;
+	}
+	d_params.packet_printer = cs_etm__packet_dump;
+	d_params.operation = CS_ETM_OPERATION_DECODE;
+	d_params.formatted = true;
+	d_params.fsyncs = false;
+	d_params.hsyncs = false;
+	d_params.frame_aligned = true;
+	d_params.data = etmq;
+
+	etmq->decoder = cs_etm_decoder__new(etm->num_cpu, &d_params, t_params);
+
+	zfree(&t_params);
+
+	if (!etmq->decoder)
+		goto out_free;
+
+	etmq->offset = 0;
+	etmq->eot = false;
+
+	return etmq;
+
+out_free:
+	zfree(&etmq->event_buf);
+	zfree(&etmq->chain);
+	free(etmq);
+	return NULL;
+}
+
+static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
+			      struct auxtrace_queue *queue,
+			      unsigned int queue_nr)
+{
+	struct cs_etm_queue *etmq = queue->priv;
+
+	if (list_empty(&(queue->head)))
+		return 0;
+
+	if (etmq == NULL) {
+		etmq = cs_etm__alloc_queue(etm, queue_nr);
+
+		if (etmq == NULL)
+			return -ENOMEM;
+
+		queue->priv = etmq;
+
+		if (queue->cpu != -1)
+			etmq->cpu = queue->cpu;
+
+		etmq->tid = queue->tid;
+
+		if (etm->sampling_mode) {
+			if (etm->timeless_decoding)
+				etmq->step_through_buffers = true;
+			if (etm->timeless_decoding || !etm->have_sched_switch)
+				etmq->use_buffer_pid_tid = true;
+		}
+	}
+
+	if (!etmq->on_heap &&
+	    (!etm->sync_switch)) {
+		const struct cs_etm_state *state;
+		int ret = 0;
+
+		if (etm->timeless_decoding)
+			return ret;
+
+		//cs_etm__log("queue %u getting timestamp\n", queue_nr);
+		//cs_etm__log("queue %u decoding cpu %d pid %d tid %d\n",
+			   //queue_nr, etmq->cpu, etmq->pid, etmq->tid);
+		(void) state;
+		return ret;
+		/*
+		while (1) {
+			state = cs_etm_decoder__decode(etmq->decoder);
+			if (state->err) {
+				if (state->err == CS_ETM_ERR_NODATA) {
+					//cs_etm__log("queue %u has no timestamp\n",
+						   //queue_nr);
+					return 0;
+				}
+				continue;
+			}
+			if (state->timestamp)
+				break;
+		}
+
+		etmq->timestamp = state->timestamp;
+		//cs_etm__log("queue %u timestamp 0x%"PRIx64 "\n",
+			   //queue_nr, etmq->timestamp);
+		etmq->state = state;
+		etmq->have_sample = true;
+		//cs_etm__sample_flags(etmq);
+		ret = auxtrace_heap__add(&etm->heap, queue_nr, etmq->timestamp);
+		if (ret)
+			return ret;
+		etmq->on_heap = true;
+		*/
+	}
+
+	return 0;
+}
+
+
+static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
+{
+	unsigned int i;
+	int ret;
+
+	for (i = 0; i < etm->queues.nr_queues; i++) {
+		ret = cs_etm__setup_queue(etm, &(etm->queues.queue_array[i]), i);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+#if 0
+struct cs_etm_cache_entry {
+	struct auxtrace_cache_entry	entry;
+	uint64_t			icount;
+	uint64_t			bcount;
+};
+
+static size_t cs_etm__cache_divisor(void)
+{
+	static size_t d = 64;
+
+	return d;
+}
+
+static size_t cs_etm__cache_size(struct dso *dso,
+				struct machine *machine)
+{
+	off_t size;
+
+	size = dso__data_size(dso, machine);
+	size /= cs_etm__cache_divisor();
+
+	if (size < 1000)
+		return 10;
+
+	if (size > (1 << 21))
+		return 21;
+
+	return 32 - __builtin_clz(size);
+}
+
+static struct auxtrace_cache *cs_etm__cache(struct dso *dso,
+					   struct machine *machine)
+{
+	struct auxtrace_cache *c;
+	size_t bits;
+
+	if (dso->auxtrace_cache)
+		return dso->auxtrace_cache;
+
+	bits = cs_etm__cache_size(dso, machine);
+
+	c = auxtrace_cache__new(bits, sizeof(struct cs_etm_cache_entry), 200);
+
+	dso->auxtrace_cache = c;
+
+	return c;
+}
+
+static int cs_etm__cache_add(struct dso *dso, struct machine *machine,
+			    uint64_t offset, uint64_t icount, uint64_t bcount)
+{
+	struct auxtrace_cache *c = cs_etm__cache(dso, machine);
+	struct cs_etm_cache_entry *e;
+	int err;
+
+	if (!c)
+		return -ENOMEM;
+
+	e = auxtrace_cache__alloc_entry(c);
+	if (!e)
+		return -ENOMEM;
+
+	e->icount = icount;
+	e->bcount = bcount;
+
+	err = auxtrace_cache__add(c, offset, &e->entry);
+
+	if (err)
+		auxtrace_cache__free_entry(c, e);
+
+	return err;
+}
+
+static struct cs_etm_cache_entry *cs_etm__cache_lookup(struct dso *dso,
+						      struct machine *machine,
+						      uint64_t offset)
+{
+	struct auxtrace_cache *c = cs_etm__cache(dso, machine);
+
+	if (!c)
+		return NULL;
+
+	return auxtrace_cache__lookup(dso->auxtrace_cache, offset);
+}
+#endif
+
+static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
+					    struct cs_etm_packet *packet)
+{
+	int ret = 0;
+	struct cs_etm_auxtrace *etm = etmq->etm;
+	union perf_event *event = etmq->event_buf;
+	struct perf_sample sample = {.ip = 0,};
+	uint64_t start_addr = packet->start_addr;
+	uint64_t end_addr = packet->end_addr;
+
+	event->sample.header.type = PERF_RECORD_SAMPLE;
+	event->sample.header.misc = PERF_RECORD_MISC_USER;
+	event->sample.header.size = sizeof(struct perf_event_header);
+
+
+	sample.ip = start_addr;
+	sample.pid = etmq->pid;
+	sample.tid = etmq->tid;
+	sample.addr = end_addr;
+	sample.id = etmq->etm->instructions_id;
+	sample.stream_id = etmq->etm->instructions_id;
+	sample.period = (end_addr - start_addr) >> 2;
+	sample.cpu = packet->cpu;
+	sample.flags = 0; // etmq->flags;
+	sample.insn_len = 1; // etmq->insn_len;
+	sample.cpumode = event->header.misc;
+
+	//etmq->last_insn_cnt = etmq->state->tot_insn_cnt;
+
+#if 0
+	{
+		struct   addr_location al;
+		uint64_t offset;
+		struct   thread *thread;
+		struct   machine *machine = etmq->etm->machine;
+		uint8_t  cpumode;
+		struct   cs_etm_cache_entry *e;
+		uint8_t  buf[256];
+		size_t   bufsz;
+
+		thread = etmq->thread;
+
+		if (!thread)
+			thread = etmq->etm->unknown_thread;
+
+		if (start_addr > 0xffffffc000000000UL)
+			cpumode = PERF_RECORD_MISC_KERNEL;
+		else
+			cpumode = PERF_RECORD_MISC_USER;
+
+		thread__find_addr_map(thread, cpumode, MAP__FUNCTION, start_addr, &al);
+		if (!al.map || !al.map->dso)
+			goto endTest;
+		if (al.map->dso->data.status == DSO_DATA_STATUS_ERROR &&
+		    dso__data_status_seen(al.map->dso, DSO_DATA_STATUS_SEEN_ITRACE))
+			goto endTest;
+
+		offset = al.map->map_ip(al.map, start_addr);
+
+
+		e = cs_etm__cache_lookup(al.map->dso, machine, offset);
+
+		if (e)
+			(void) e;
+		else {
+			int len;
+			map__load(al.map, machine->symbol_filter);
+
+			bufsz = sizeof(buf);
+			len = dso__data_read_offset(al.map->dso, machine,
+						    offset, buf, bufsz);
+
+			if (len <= 0)
+				goto endTest;
+
+			cs_etm__cache_add(al.map->dso, machine, offset, (end_addr - start_addr) >> 2, end_addr - start_addr);
+
+		}
+endTest:
+		(void) offset;
+	}
+#endif
+
+	ret = perf_session__deliver_synth_event(etm->session, event, &sample);
+
+	if (ret)
+		pr_err("CS ETM Trace: failed to deliver instruction event, error %d\n", ret);
+
+	return ret;
+}
+
+struct cs_etm_synth {
+	struct perf_tool dummy_tool;
+	struct perf_session *session;
+};
+
+
+static int cs_etm__event_synth(struct perf_tool *tool,
+			      union perf_event *event,
+			      struct perf_sample *sample,
+			      struct machine *machine)
+{
+	struct cs_etm_synth *cs_etm_synth =
+		      container_of(tool, struct cs_etm_synth, dummy_tool);
+
+	(void) sample;
+	(void) machine;
+
+	return perf_session__deliver_synth_event(cs_etm_synth->session, event, NULL);
+
+}
+
+
+static int cs_etm__synth_event(struct perf_session *session,
+			      struct perf_event_attr *attr, u64 id)
+{
+	struct cs_etm_synth cs_etm_synth;
+
+	memset(&cs_etm_synth, 0, sizeof(struct cs_etm_synth));
+	cs_etm_synth.session = session;
+
+	return perf_event__synthesize_attr(&cs_etm_synth.dummy_tool, attr, 1,
+					   &id, cs_etm__event_synth);
+}
+
+static int cs_etm__synth_events(struct cs_etm_auxtrace *etm,
+				struct perf_session *session)
+{
+	struct perf_evlist *evlist = session->evlist;
+	struct perf_evsel *evsel;
+	struct perf_event_attr attr;
+	bool found = false;
+	u64 id;
+	int err;
+
+	evlist__for_each_entry(evlist, evsel) {
+
+		if (evsel->attr.type == etm->pmu_type) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		pr_debug("There are no selected events with CoreSight trace data\n");
+		return 0;
+	}
+
+	memset(&attr, 0, sizeof(struct perf_event_attr));
+	attr.size = sizeof(struct perf_event_attr);
+	attr.type = PERF_TYPE_HARDWARE;
+	attr.sample_type = evsel->attr.sample_type & PERF_SAMPLE_MASK;
+	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
+			    PERF_SAMPLE_PERIOD;
+	if (etm->timeless_decoding)
+		attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
+	else
+		attr.sample_type |= PERF_SAMPLE_TIME;
+
+	attr.exclude_user = evsel->attr.exclude_user;
+	attr.exclude_kernel = evsel->attr.exclude_kernel;
+	attr.exclude_hv = evsel->attr.exclude_hv;
+	attr.exclude_host = evsel->attr.exclude_host;
+	attr.exclude_guest = evsel->attr.exclude_guest;
+	attr.sample_id_all = evsel->attr.sample_id_all;
+	attr.read_format = evsel->attr.read_format;
+
+	id = evsel->id[0] + 1000000000;
+
+	if (!id)
+		id = 1;
+
+	if (etm->synth_opts.instructions) {
+		attr.config = PERF_COUNT_HW_INSTRUCTIONS;
+		attr.sample_period = etm->synth_opts.period;
+		etm->instructions_sample_period = attr.sample_period;
+		err = cs_etm__synth_event(session, &attr, id);
+
+		if (err) {
+			pr_err("%s: failed to synthesize 'instructions' event type\n",
+			       __func__);
+			return err;
+		}
+		etm->sample_instructions = true;
+		etm->instructions_sample_type = attr.sample_type;
+		etm->instructions_id = id;
+		id += 1;
+	}
+
+	etm->synth_needs_swap = evsel->needs_swap;
+	return 0;
+}
+
+static int cs_etm__sample(struct cs_etm_queue *etmq)
+{
+	//const struct cs_etm_state *state = etmq->state;
+	struct cs_etm_packet packet;
+	//struct cs_etm_auxtrace *etm = etmq->etm;
+	int err;
+
+	err = cs_etm_decoder__get_packet(etmq->decoder, &packet);
+	/* a -1 return means the packet queue is empty; not a real error */
+
+	if (!err && packet.sample_type & CS_ETM_RANGE) {
+		err = cs_etm__synth_instruction_sample(etmq, &packet);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+static int cs_etm__run_decoder(struct cs_etm_queue *etmq, u64 *timestamp)
+{
+	struct cs_etm_buffer buffer;
+	size_t buffer_used;
+	int err = 0;
+
+	/* Go through each buffer in the queue and decode them one by one */
+more:
+	buffer_used = 0;
+	memset(&buffer, 0, sizeof(buffer));
+	err = cs_etm__get_trace(&buffer, etmq);
+	if (err <= 0)
+		return err;
+
+	do {
+		size_t processed = 0;
+		etmq->state = cs_etm_decoder__process_data_block(etmq->decoder,
+					etmq->offset,
+					&buffer.buf[buffer_used],
+					buffer.len-buffer_used,
+					&processed);
+		err = etmq->state->err;
+		etmq->offset += processed;
+		buffer_used += processed;
+		if (!err)
+			cs_etm__sample(etmq);
+	} while (!etmq->eot && (buffer.len > buffer_used));
+
+	goto more;
+
+	(void) timestamp;
+
+	return err;
+}
+
+static int cs_etm__update_queues(struct cs_etm_auxtrace *etm)
+{
+	if (etm->queues.new_data) {
+		etm->queues.new_data = false;
+		return cs_etm__setup_queues(etm);
+	}
+
+	return 0;
+}
+
+static int cs_etm__process_queues(struct cs_etm_auxtrace *etm, u64 timestamp)
+{
+	unsigned int queue_nr;
+	u64 ts;
+	int ret;
+
+	while (1) {
+		struct auxtrace_queue *queue;
+		struct cs_etm_queue *etmq;
+
+		if (!etm->heap.heap_cnt)
+			return 0;
+
+		if (etm->heap.heap_array[0].ordinal >= timestamp)
+			return 0;
+
+		queue_nr = etm->heap.heap_array[0].queue_nr;
+		queue = &etm->queues.queue_array[queue_nr];
+		etmq = queue->priv;
+
+		//cs_etm__log("queue %u processing 0x%" PRIx64 " to 0x%" PRIx64 "\n",
+			   //queue_nr, etm->heap.heap_array[0].ordinal,
+			   //timestamp);
+
+		auxtrace_heap__pop(&etm->heap);
+
+		if (etm->heap.heap_cnt) {
+			ts = etm->heap.heap_array[0].ordinal + 1;
+			if (ts > timestamp)
+				ts = timestamp;
+		} else {
+			ts = timestamp;
+		}
+
+		cs_etm__set_pid_tid_cpu(etm, queue);
+
+		ret = cs_etm__run_decoder(etmq, &ts);
+
+		if (ret < 0) {
+			auxtrace_heap__add(&etm->heap, queue_nr, ts);
+			return ret;
+		}
+
+		if (!ret) {
+			ret = auxtrace_heap__add(&etm->heap, queue_nr, ts);
+			if (ret < 0)
+				return ret;
+		} else {
+			etmq->on_heap = false;
+		}
+	}
+	return 0;
+}
+
+static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
+					  pid_t tid,
+					  u64 time_)
+{
+	struct auxtrace_queues *queues = &etm->queues;
+	unsigned int i;
+	u64 ts = 0;
+
+	for (i = 0; i < queues->nr_queues; ++i) {
+		struct auxtrace_queue *queue = &(etm->queues.queue_array[i]);
+		struct cs_etm_queue *etmq = queue->priv;
+
+		if (etmq && ((tid == -1) || (etmq->tid == tid))) {
+			etmq->time = time_;
+			cs_etm__set_pid_tid_cpu(etm, queue);
+			cs_etm__run_decoder(etmq, &ts);
+
+		}
+	}
+	return 0;
+}
+
+static struct cs_etm_queue *cs_etm__cpu_to_etmq(struct cs_etm_auxtrace *etm,
+						int cpu)
+{
+	unsigned q, j;
+
+	if (etm->queues.nr_queues == 0)
+		return NULL;
+
+	if (cpu < 0)
+		q = 0;
+	else if ((unsigned) cpu >= etm->queues.nr_queues)
+		q = etm->queues.nr_queues - 1;
+	else
+		q = cpu;
+
+	if (etm->queues.queue_array[q].cpu == cpu)
+		return etm->queues.queue_array[q].priv;
+
+	for (j = 0; q > 0; j++) {
+		if (etm->queues.queue_array[--q].cpu == cpu)
+			return etm->queues.queue_array[q].priv;
+	}
+
+	for (; j < etm->queues.nr_queues; j++) {
+		if (etm->queues.queue_array[j].cpu == cpu)
+			return etm->queues.queue_array[j].priv;
+
+	}
+
+	return NULL;
+}
+
+static uint32_t cs_etm__mem_access(struct cs_etm_queue *etmq, uint64_t address, size_t size, uint8_t *buffer)
+{
+	struct   addr_location al;
+	uint64_t offset;
+	struct   thread *thread;
+	struct   machine *machine;
+	uint8_t  cpumode;
+	int len;
+
+	if (etmq == NULL)
+		return 0;	/* callback returns a byte count; 0 signals failure */
+
+	machine = etmq->etm->machine;
+	thread = etmq->thread;
+	if (address > 0xffffffc000000000UL)
+		cpumode = PERF_RECORD_MISC_KERNEL;
+	else
+		cpumode = PERF_RECORD_MISC_USER;
+
+	thread__find_addr_map(thread, cpumode, MAP__FUNCTION, address, &al);
+
+	if (!al.map || !al.map->dso)
+		return 0;
+
+	if (al.map->dso->data.status == DSO_DATA_STATUS_ERROR &&
+	    dso__data_status_seen(al.map->dso, DSO_DATA_STATUS_SEEN_ITRACE))
+		return 0;
+
+	offset = al.map->map_ip(al.map, address);
+
+	map__load(al.map);
+
+	len = dso__data_read_offset(al.map->dso, machine,
+				    offset, buffer, size);
+
+	if (len <= 0)
+		return 0;
+
+	return len;
+}
+
+static bool check_need_swap(int file_endian)
+{
+	const int data = 1;
+	u8 *check = (u8 *)&data;
+	int host_endian;
+
+	if (check[0] == 1)
+		host_endian = ELFDATA2LSB;
+	else
+		host_endian = ELFDATA2MSB;
+
+	return host_endian != file_endian;
+}
+
+static int cs_etm__read_elf_info(const char *fname, uint64_t *foffset, uint64_t *fstart, uint64_t *fsize)
+{
+	FILE *fp;
+	u8 e_ident[EI_NIDENT];
+	int ret = -1;
+	bool need_swap = false;
+	size_t buf_size;
+	void *buf;
+	int i;
+
+	fp = fopen(fname, "r");
+	if (fp == NULL)
+		return -1;
+
+	if (fread(e_ident, sizeof(e_ident), 1, fp) != 1)
+		goto out;
+
+	if (memcmp(e_ident, ELFMAG, SELFMAG) ||
+	    e_ident[EI_VERSION] != EV_CURRENT)
+		goto out;
+
+	need_swap = check_need_swap(e_ident[EI_DATA]);
+
+	/* rewind and re-read the full ELF header below */
+	fseek(fp, 0, SEEK_SET);
+
+	if (e_ident[EI_CLASS] == ELFCLASS32) {
+		Elf32_Ehdr ehdr;
+		Elf32_Phdr *phdr;
+
+		if (fread(&ehdr, sizeof(ehdr), 1, fp) != 1)
+			goto out;
+
+		if (need_swap) {
+			ehdr.e_phoff = bswap_32(ehdr.e_phoff);
+			ehdr.e_phentsize = bswap_16(ehdr.e_phentsize);
+			ehdr.e_phnum = bswap_16(ehdr.e_phnum);
+		}
+
+		buf_size = ehdr.e_phentsize * ehdr.e_phnum;
+		buf = malloc(buf_size);
+		if (buf == NULL)
+			goto out;
+
+		fseek(fp, ehdr.e_phoff, SEEK_SET);
+		if (fread(buf, buf_size, 1, fp) != 1)
+			goto out_free;
+
+		for (i = 0, phdr = buf; i < ehdr.e_phnum; i++, phdr++) {
+
+			if (need_swap) {
+				phdr->p_type = bswap_32(phdr->p_type);
+				phdr->p_offset = bswap_32(phdr->p_offset);
+				phdr->p_filesz = bswap_32(phdr->p_filesz);
+			}
+
+			if (phdr->p_type != PT_LOAD)
+				continue;
+
+			*foffset = phdr->p_offset;
+			*fstart = phdr->p_vaddr;
+			*fsize = phdr->p_filesz;
+			ret = 0;
+			break;
+		}
+	} else {
+		Elf64_Ehdr ehdr;
+		Elf64_Phdr *phdr;
+
+		if (fread(&ehdr, sizeof(ehdr), 1, fp) != 1)
+			goto out;
+
+		if (need_swap) {
+			ehdr.e_phoff = bswap_64(ehdr.e_phoff);
+			ehdr.e_phentsize = bswap_16(ehdr.e_phentsize);
+			ehdr.e_phnum = bswap_16(ehdr.e_phnum);
+		}
+
+		buf_size = ehdr.e_phentsize * ehdr.e_phnum;
+		buf = malloc(buf_size);
+		if (buf == NULL)
+			goto out;
+
+		fseek(fp, ehdr.e_phoff, SEEK_SET);
+		if (fread(buf, buf_size, 1, fp) != 1)
+			goto out_free;
+
+		for (i = 0, phdr = buf; i < ehdr.e_phnum; i++, phdr++) {
+
+			if (need_swap) {
+				phdr->p_type = bswap_32(phdr->p_type);
+				phdr->p_offset = bswap_64(phdr->p_offset);
+				phdr->p_filesz = bswap_64(phdr->p_filesz);
+			}
+
+			if (phdr->p_type != PT_LOAD)
+				continue;
+
+			*foffset = phdr->p_offset;
+			*fstart = phdr->p_vaddr;
+			*fsize = phdr->p_filesz;
+			ret = 0;
+			break;
+		}
+	}
+out_free:
+	free(buf);
+out:
+	fclose(fp);
+	return ret;
+}
+
+static int cs_etm__process_event(struct perf_session *session,
+				union perf_event *event,
+				struct perf_sample *sample,
+				struct perf_tool *tool)
+{
+	struct cs_etm_auxtrace *etm = container_of(session->auxtrace,
+						   struct cs_etm_auxtrace,
+						   auxtrace);
+
+	u64 timestamp;
+	int err = 0;
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events) {
+		pr_err("CoreSight ETM Trace requires ordered events\n");
+		return -EINVAL;
+	}
+
+	if (sample->time && (sample->time != (u64)-1))
+		timestamp = sample->time;
+	else
+		timestamp = 0;
+
+	if (timestamp || etm->timeless_decoding) {
+		err = cs_etm__update_queues(etm);
+		if (err)
+			return err;
+
+	}
+
+	if (event->header.type == PERF_RECORD_MMAP2) {
+		struct dso *dso;
+		int cpu;
+		struct cs_etm_queue *etmq;
+
+		cpu = sample->cpu;
+
+		etmq = cs_etm__cpu_to_etmq(etm, cpu);
+
+		if (!etmq)
+			return -1;
+
+		dso = dsos__find(&etm->machine->dsos,
+				 event->mmap2.filename, false);
+		if (dso != NULL) {
+			err = cs_etm_decoder__add_mem_access_cb(
+			    etmq->decoder,
+			    event->mmap2.start,
+			    event->mmap2.len,
+			    cs_etm__mem_access);
+		}
+
+		if (symbol_conf.vmlinux_name && !etmq->kernel_mapped) {
+			uint64_t foffset;
+			uint64_t fstart;
+			uint64_t fsize;
+
+			err = cs_etm__read_elf_info(symbol_conf.vmlinux_name,
+						      &foffset, &fstart, &fsize);
+
+			if (!err) {
+				cs_etm_decoder__add_bin_file(
+					etmq->decoder,
+					foffset,
+					fstart,
+					fsize,
+					symbol_conf.vmlinux_name);
+
+				etmq->kernel_mapped = true;
+			}
+		}
+
+	}
+
+	if (etm->timeless_decoding) {
+		if (event->header.type == PERF_RECORD_EXIT) {
+			err = cs_etm__process_timeless_queues(etm,
+							     event->fork.tid,
+							     sample->time);
+		}
+	} else if (timestamp) {
+		err = cs_etm__process_queues(etm, timestamp);
+	}
+
+	return err;
+}
+
+static int cs_etm__process_auxtrace_event(struct perf_session *session,
+				  union perf_event *event,
+				  struct perf_tool *tool __maybe_unused)
+{
+	struct cs_etm_auxtrace *etm = container_of(session->auxtrace,
+						   struct cs_etm_auxtrace,
+						   auxtrace);
+
+	if (!etm->data_queued) {
+		struct auxtrace_buffer *buffer;
+		off_t  data_offset;
+		int fd = perf_data_file__fd(session->file);
+		bool is_pipe = perf_data_file__is_pipe(session->file);
+		int err;
+
+		if (is_pipe)
+			data_offset = 0;
+		else {
+			data_offset = lseek(fd, 0, SEEK_CUR);
+			if (data_offset == -1)
+				return -errno;
+		}
+
+		err = auxtrace_queues__add_event(&etm->queues,
+						 session,
+						 event,
+						 data_offset,
+						 &buffer);
+		if (err)
+			return err;
+
+		if (dump_trace)
+			if (auxtrace_buffer__get_data(buffer, fd)) {
+				cs_etm__dump_event(etm, buffer);
+				auxtrace_buffer__put_data(buffer);
+			}
+	}
+
+	return 0;
+}
+
+static const char * const cs_etm_global_header_fmts[] = {
+  [CS_HEADER_VERSION_0]	= "   Header version		%"PRIx64"\n",
+  [CS_PMU_TYPE_CPUS]	= "   PMU type/num cpus		%"PRIx64"\n",
+  [CS_ETM_SNAPSHOT]	= "   Snapshot			%"PRIx64"\n",
+};
+
+static const char * const cs_etm_priv_fmts[] = {
+  [CS_ETM_MAGIC]	= "   Magic number		%"PRIx64"\n",
+  [CS_ETM_CPU]		= "   CPU			%"PRIx64"\n",
+  [CS_ETM_ETMCR]	= "   ETMCR			%"PRIx64"\n",
+  [CS_ETM_ETMTRACEIDR]	= "   ETMTRACEIDR		%"PRIx64"\n",
+  [CS_ETM_ETMCCER]	= "   ETMCCER			%"PRIx64"\n",
+  [CS_ETM_ETMIDR]	= "   ETMIDR			%"PRIx64"\n",
+};
+
+static const char * const cs_etmv4_priv_fmts[] = {
+  [CS_ETM_MAGIC]		= "   Magic number		%"PRIx64"\n",
+  [CS_ETM_CPU]			= "   CPU			%"PRIx64"\n",
+  [CS_ETMV4_TRCCONFIGR]		= "   TRCCONFIGR		%"PRIx64"\n",
+  [CS_ETMV4_TRCTRACEIDR]	= "   TRCTRACEIDR		%"PRIx64"\n",
+  [CS_ETMV4_TRCIDR0]		= "   TRCIDR0			%"PRIx64"\n",
+  [CS_ETMV4_TRCIDR1]		= "   TRCIDR1			%"PRIx64"\n",
+  [CS_ETMV4_TRCIDR2]		= "   TRCIDR2			%"PRIx64"\n",
+  [CS_ETMV4_TRCIDR8]		= "   TRCIDR8			%"PRIx64"\n",
+  [CS_ETMV4_TRCAUTHSTATUS]	= "   TRCAUTHSTATUS		%"PRIx64"\n",
+};
+
+static void cs_etm__print_auxtrace_info(u64 *val, size_t num)
+{
+	unsigned i, j, cpu;
+
+	for (i = 0, cpu = 0; cpu < num; ++cpu) {
+		if (val[i] == __perf_cs_etmv3_magic)
+			for (j = 0; j < CS_ETM_PRIV_MAX; ++j, ++i)
+				fprintf(stdout, cs_etm_priv_fmts[j], val[i]);
+		else if (val[i] == __perf_cs_etmv4_magic)
+			for (j = 0; j < CS_ETMV4_PRIV_MAX; ++j, ++i)
+				fprintf(stdout, cs_etmv4_priv_fmts[j], val[i]);
+		else
+			/* unknown magic number, stop dumping */
+			return;
+	}
+}
+
+int cs_etm__process_auxtrace_info(union perf_event *event,
+				 struct perf_session *session)
+{
+	struct auxtrace_info_event *auxtrace_info = &(event->auxtrace_info);
+	size_t event_header_size = sizeof(struct perf_event_header);
+	size_t info_header_size;
+	size_t total_size = auxtrace_info->header.size;
+	size_t priv_size = 0;
+	size_t num_cpu;
+	struct cs_etm_auxtrace *etm = NULL;
+	int err = 0, idx = -1;
+	u64 *ptr;
+	u64 *hdr = NULL;
+	u64 **metadata = NULL;
+	size_t i, j, k;
+	unsigned pmu_type;
+	struct int_node *inode;
+
+	/*
+	 * sizeof(auxtrace_info_event::type) +
+	 * sizeof(auxtrace_info_event::reserved) == 8
+	 */
+	info_header_size = 8;
+
+	if (total_size < (event_header_size + info_header_size))
+		return -EINVAL;
+
+	priv_size = total_size - event_header_size - info_header_size;
+
+	/* First the global part */
+
+	ptr = (u64 *) auxtrace_info->priv;
+	if (ptr[0] == 0) {
+		hdr = zalloc(sizeof(u64) * CS_HEADER_VERSION_0_MAX);
+		if (hdr == NULL)
+			return -ENOMEM;
+		for (i = 0; i < CS_HEADER_VERSION_0_MAX; ++i)
+			hdr[i] = ptr[i];
+		num_cpu = hdr[CS_PMU_TYPE_CPUS] & 0xffffffff;
+		pmu_type = (unsigned) ((hdr[CS_PMU_TYPE_CPUS] >> 32) & 0xffffffff);
+	} else
+		return -EINVAL;
+
+	/*
+	 * Create an RB tree for traceID-CPU# tuple.  Since the conversion
+	 * has to be made for each packet that gets decoded, optimizing
+	 * access in anything other than a sequential array is worth doing.
+	 */
+	traceid_list = intlist__new(NULL);
+	if (!traceid_list)
+		return -ENOMEM;
+
+	metadata = zalloc(sizeof(u64 *) * num_cpu);
+	if (!metadata) {
+		err = -ENOMEM;
+		goto err_free_traceid_list;
+	}
+
+	for (j = 0; j < num_cpu; ++j) {
+		if (ptr[i] == __perf_cs_etmv3_magic) {
+			metadata[j] = zalloc(sizeof(u64) * CS_ETM_PRIV_MAX);
+			if (metadata[j] == NULL) {
+				err = -ENOMEM;
+				goto err_free_metadata;
+			}
+			for (k = 0; k < CS_ETM_PRIV_MAX; k++)
+				metadata[j][k] = ptr[i+k];
+
+			/* The traceID is our handle */
+			idx = metadata[j][CS_ETM_ETMIDR];
+			i += CS_ETM_PRIV_MAX;
+		} else if (ptr[i] == __perf_cs_etmv4_magic) {
+			metadata[j] = zalloc(sizeof(u64) * CS_ETMV4_PRIV_MAX);
+			if (metadata[j] == NULL) {
+				err = -ENOMEM;
+				goto err_free_metadata;
+			}
+			for (k = 0; k < CS_ETMV4_PRIV_MAX; k++)
+				metadata[j][k] = ptr[i+k];
+
+			/* The traceID is our handle */
+			idx = metadata[j][CS_ETMV4_TRCTRACEIDR];
+			i += CS_ETMV4_PRIV_MAX;
+		}
+
+		/* Get an RB node for this CPU */
+		inode = intlist__findnew(traceid_list, idx);
+
+		/* Something went wrong, no need to continue */
+		if (!inode) {
+			err = -ENOMEM;
+			goto err_free_metadata;
+		}
+
+		/*
+		 * The node for that CPU should not have been taken already.
+		 * Back out if that's the case.
+		 */
+		if (inode->priv) {
+			err = -EINVAL;
+			goto err_free_metadata;
+		}
+
+		/* All good, associate the traceID with the CPU# */
+		inode->priv = &metadata[j][CS_ETM_CPU];
+
+	}
+
+	if (i * 8 != priv_size) {
+		err = -EINVAL;
+		goto err_free_metadata;
+	}
+
+	if (dump_trace)
+		cs_etm__print_auxtrace_info(auxtrace_info->priv, num_cpu);
+
+	etm = zalloc(sizeof(struct cs_etm_auxtrace));
+	if (!etm) {
+		err = -ENOMEM;
+		goto err_free_metadata;
+	}
+
+	etm->num_cpu = num_cpu;
+	etm->pmu_type = pmu_type;
+	etm->snapshot_mode = (hdr[CS_ETM_SNAPSHOT] != 0);
+
+	err = auxtrace_queues__init(&etm->queues);
+	if (err)
+		goto err_free;
+
+	etm->unknown_thread = thread__new(999999999, 999999999);
+	if (etm->unknown_thread == NULL) {
+		err = -ENOMEM;
+		goto err_free_queues;
+	}
+	err = thread__set_comm(etm->unknown_thread, "unknown", 0);
+	if (err)
+		goto err_delete_thread;
+
+	if (thread__init_map_groups(etm->unknown_thread,
+				    &session->machines.host)) {
+		err = -ENOMEM;
+		goto err_delete_thread;
+	}
+
+	etm->timeless_decoding = true;
+	etm->sampling_mode = false;
+	etm->metadata = metadata;
+	etm->session = session;
+	etm->machine = &session->machines.host;
+	etm->auxtrace_type = auxtrace_info->type;
+
+	etm->auxtrace.process_event = cs_etm__process_event;
+	etm->auxtrace.process_auxtrace_event = cs_etm__process_auxtrace_event;
+	etm->auxtrace.flush_events = cs_etm__flush_events;
+	etm->auxtrace.free_events  = cs_etm__free_events;
+	etm->auxtrace.free	 = cs_etm__free;
+	session->auxtrace = &(etm->auxtrace);
+
+	if (dump_trace)
+		return 0;
+
+	if (session->itrace_synth_opts && session->itrace_synth_opts->set)
+		etm->synth_opts = *session->itrace_synth_opts;
+	else
+		itrace_synth_opts__set_default(&etm->synth_opts);
+	etm->synth_opts.branches = false;
+	etm->synth_opts.callchain = false;
+	etm->synth_opts.calls = false;
+	etm->synth_opts.returns = false;
+
+	err = cs_etm__synth_events(etm, session);
+	if (err)
+		goto err_delete_thread;
+
+	err = auxtrace_queues__process_index(&etm->queues, session);
+	if (err)
+		goto err_delete_thread;
+
+	etm->data_queued = etm->queues.populated;
+
+	return 0;
+
+err_delete_thread:
+	thread__delete(etm->unknown_thread);
+err_free_queues:
+	auxtrace_queues__free(&etm->queues);
+	session->auxtrace = NULL;
+err_free:
+	free(etm);
+err_free_metadata:
+	/* No need to check @metadata[j], free(NULL) is supported */
+	for (j = 0; j < num_cpu; ++j)
+		free(metadata[j]);
+	free(metadata);
+err_free_traceid_list:
+	intlist__delete(traceid_list);
+
+	return err;
+}
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h
index 3cc6bc3..32400ac 100644
--- a/tools/perf/util/cs-etm.h
+++ b/tools/perf/util/cs-etm.h
@@ -18,6 +18,10 @@
 #ifndef INCLUDE__UTIL_PERF_CS_ETM_H__
 #define INCLUDE__UTIL_PERF_CS_ETM_H__
 
+#include "util/event.h"
+#include "util/intlist.h"
+#include "util/session.h"
+
 /* Versionning header in case things need tro change in the future.  That way
  * decoding of old snapshot is still possible.
  */
@@ -61,6 +65,9 @@ enum {
 	CS_ETMV4_PRIV_MAX,
 };
 
+/* RB tree for quick conversion between traceID and CPUs */
+struct intlist *traceid_list;
+
 #define KiB(x) ((x) * 1024)
 #define MiB(x) ((x) * 1024 * 1024)
 
@@ -71,4 +78,7 @@ static const u64 __perf_cs_etmv4_magic   = 0x4040404040404040ULL;
 #define CS_ETMV3_PRIV_SIZE (CS_ETM_PRIV_MAX * sizeof(u64))
 #define CS_ETMV4_PRIV_SIZE (CS_ETMV4_PRIV_MAX * sizeof(u64))
 
+int cs_etm__process_auxtrace_info(union perf_event *event,
+				  struct perf_session *session);
+
 #endif
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index df85b9e..bcec333 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1,3 +1,4 @@
+#include "build-id.h"
 #include "callchain.h"
 #include "debug.h"
 #include "event.h"
@@ -483,7 +484,8 @@ int machine__process_comm_event(struct machine *machine, union perf_event *event
 }
 
 int machine__process_lost_event(struct machine *machine __maybe_unused,
-				union perf_event *event, struct perf_sample *sample __maybe_unused)
+				union perf_event *event,
+				struct perf_sample *sample __maybe_unused)
 {
 	dump_printf(": id:%" PRIu64 ": lost:%" PRIu64 "\n",
 		    event->lost.id, event->lost.lost);
@@ -491,7 +493,8 @@ int machine__process_lost_event(struct machine *machine __maybe_unused,
 }
 
 int machine__process_lost_samples_event(struct machine *machine __maybe_unused,
-					union perf_event *event, struct perf_sample *sample)
+					union perf_event *event,
+					struct perf_sample *sample)
 {
 	dump_printf(": id:%" PRIu64 ": lost samples :%" PRIu64 "\n",
 		    sample->id, event->lost_samples.lost);
@@ -711,8 +714,16 @@ static struct dso *machine__get_kernel(struct machine *machine)
 						 DSO_TYPE_GUEST_KERNEL);
 	}
 
-	if (kernel != NULL && (!kernel->has_build_id))
-		dso__read_running_kernel_build_id(kernel, machine);
+	if (kernel != NULL && (!kernel->has_build_id)) {
+		if (symbol_conf.vmlinux_name != NULL) {
+			filename__read_build_id(symbol_conf.vmlinux_name,
+						kernel->build_id,
+						sizeof(kernel->build_id));
+			kernel->has_build_id = 1;
+		} else {
+			dso__read_running_kernel_build_id(kernel, machine);
+		}
+	}
 
 	return kernel;
 }
@@ -726,8 +737,19 @@ static void machine__get_kallsyms_filename(struct machine *machine, char *buf,
 {
 	if (machine__is_default_guest(machine))
 		scnprintf(buf, bufsz, "%s", symbol_conf.default_guest_kallsyms);
-	else
-		scnprintf(buf, bufsz, "%s/proc/kallsyms", machine->root_dir);
+	else {
+		if (symbol_conf.vmlinux_name != NULL) {
+			unsigned char build_id[BUILD_ID_SIZE];
+			char build_id_hex[SBUILD_ID_SIZE];
+			filename__read_build_id(symbol_conf.vmlinux_name,
+						build_id,
+						sizeof(build_id));
+			build_id__sprintf(build_id, sizeof(build_id), build_id_hex);
+			build_id_cache__linkname((char *)build_id_hex, buf, bufsz);
+		} else {
+			scnprintf(buf, bufsz, "%s/proc/kallsyms", machine->root_dir);
+		}
+	}
 }
 
 const char *ref_reloc_sym_names[] = {"_text", "_stext", NULL};
@@ -736,7 +758,7 @@ const char *ref_reloc_sym_names[] = {"_text", "_stext", NULL};
  * Returns the name of the start symbol in *symbol_name. Pass in NULL as
  * symbol_name if it's not that important.
  */
-static u64 machine__get_running_kernel_start(struct machine *machine,
+static u64 machine__get_kallsyms_kernel_start(struct machine *machine,
 					     const char **symbol_name)
 {
 	char filename[PATH_MAX];
@@ -764,7 +786,7 @@ static u64 machine__get_running_kernel_start(struct machine *machine,
 int __machine__create_kernel_maps(struct machine *machine, struct dso *kernel)
 {
 	enum map_type type;
-	u64 start = machine__get_running_kernel_start(machine, NULL);
+	u64 start = machine__get_kallsyms_kernel_start(machine, NULL);
 
 	/* In case of renewal the kernel map, destroy previous one */
 	machine__destroy_kernel_maps(machine);
@@ -1126,10 +1148,10 @@ int machine__create_kernel_maps(struct machine *machine)
 {
 	struct dso *kernel = machine__get_kernel(machine);
 	const char *name;
-	u64 addr;
+	u64 addr = machine__get_kallsyms_kernel_start(machine, &name);
 	int ret;
 
-	if (kernel == NULL)
+	if (!addr || kernel == NULL)
 		return -1;
 
 	ret = __machine__create_kernel_maps(machine, kernel);
@@ -1151,7 +1173,7 @@ int machine__create_kernel_maps(struct machine *machine)
 	 */
 	map_groups__fixup_end(&machine->kmaps);
 
-	addr = machine__get_running_kernel_start(machine, &name);
+	addr = machine__get_kallsyms_kernel_start(machine, &name);
 	if (!addr) {
 	} else if (maps__set_kallsyms_ref_reloc_sym(machine->vmlinux_maps, name, addr)) {
 		machine__destroy_kernel_maps(machine);
@@ -1901,7 +1923,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 		ip = chain->ips[j];
 
 		if (ip < PERF_CONTEXT_MAX)
-                       ++nr_entries;
+		       ++nr_entries;
 
 		err = add_callchain_ip(thread, cursor, parent, root_al, &cpumode, ip);
 
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index fdbbf04..7698291 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -833,6 +833,8 @@ static void python_process_general_event(struct perf_sample *sample,
 			PyInt_FromLong(sample->cpu));
 	pydict_set_item_string_decref(dict_sample, "ip",
 			PyLong_FromUnsignedLongLong(sample->ip));
+	pydict_set_item_string_decref(dict_sample, "addr",
+			PyLong_FromUnsignedLongLong(sample->addr));
 	pydict_set_item_string_decref(dict_sample, "time",
 			PyLong_FromUnsignedLongLong(sample->time));
 	pydict_set_item_string_decref(dict_sample, "period",
diff --git a/tools/perf/util/symbol-minimal.c b/tools/perf/util/symbol-minimal.c
index 11cdde9..c094091 100644
--- a/tools/perf/util/symbol-minimal.c
+++ b/tools/perf/util/symbol-minimal.c
@@ -342,9 +342,8 @@ int dso__load_sym(struct dso *dso, struct map *map __maybe_unused,
 	if (ret >= 0)
 		dso->is_64_bit = ret;
 
-	if (filename__read_build_id(ss->name, build_id, BUILD_ID_SIZE) > 0) {
+	if ((!dso->has_build_id) && (filename__read_build_id(ss->name, build_id, BUILD_ID_SIZE) > 0))
 		dso__set_build_id(dso, build_id);
-	}
 	return 0;
 }
 
-- 
2.7.4


