# OpenTelemetry

Requirements:

- docker-compose

## Prepare setup

In a folder, create a `docker-compose.yaml` and an `otel-collector-config.yml` file as shown below.

```yaml title="docker-compose.yaml"
version: '3'
services:
  # Jaeger
  jaeger:
    image: jaegertracing/all-in-one:1.61.0
    ports:
      - '16686:16686'
      - '4317'

  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.109.0
    command: ['--config=/etc/otel-collector-config.yml']
    volumes:
      - ./otel-collector-config.yml:/etc/otel-collector-config.yml
    ports:
      - '1888:1888' # pprof extension
      - '13133:13133' # health_check extension
      - '55679:55679' # zpages extension
      - '4318:4318' # OTLP HTTP
      - '4317:4317' # OTLP GRPC
      - '9123:9123' # Prometheus exporter
    depends_on:
      - jaeger
```

```yaml title="otel-collector-config.yml"
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  logging:
  prometheus:
    endpoint: '0.0.0.0:9123'

processors:
  batch:
  spanmetrics:
    metrics_exporter: prometheus
    latency_histogram_buckets: [10ms, 100ms, 250ms, 1s, 30s, 1m, 5m]
    dimensions:
      - name: http.method
      - name: http.status_code
      - name: http.host
    dimensions_cache_size: 1000
    aggregation_temporality: 'AGGREGATION_TEMPORALITY_CUMULATIVE'

extensions:
  health_check:
  pprof:
  zpages:

service:
  extensions: [pprof, zpages, health_check]
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/jaeger, logging]
      processors: [spanmetrics, batch]

    metrics:
      receivers: [otlp]
      exporters: [prometheus]
```

Start the setup with this command, run inside the folder containing the files created in the earlier steps:

```
docker-compose up
```

This command starts an [OpenTelemetry Collector](https://github.com/open-telemetry/opentelemetry-collector-contrib) and an instance of [Jaeger](https://www.jaegertracing.io/).

Jaeger is now reachable at [http://localhost:16686](http://localhost:16686).

## Run Renovate with OpenTelemetry

To start Renovate with OpenTelemetry enabled, run the following command after pointing it to your `config.js` config file:

```
docker run \
  --rm \
  -e OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 \
  -v "/path/to/your/config.js:/usr/src/app/config.js" \
  renovate/renovate:latest
```

You should now see `trace_id` and `span_id` fields in the logs.

```
INFO: Repository finished (repository=org/example)
      "durationMs": 5574,
      "trace_id": "f9a4c33852333fc2a0fbdc163100c987",
      "span_id": "4ac1323eeaee
```

### Traces

Now open Jaeger at [http://localhost:16686](http://localhost:16686).

You should be able to pick `renovate` in the `Service` field.

![service picker](../assets/images/opentelemetry_pick_service.png)

Select `Find Traces` to search for all Renovate traces, then select one of the found traces to open the trace view.

![pick trace](../assets/images/opentelemetry_choose_trace.png)

You should now see the full trace view, which shows each HTTP request and the internal spans.

![trace view](../assets/images/opentelemetry_trace_viewer.png)

### Metrics

In addition to the received traces, some metrics are calculated.
This is achieved using the [spanmetricsprocessor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/spanmetricsprocessor).
The setup implemented above produces the following metrics, which are exposed at [http://localhost:9123/metrics](http://localhost:9123/metrics):

```
# HELP calls_total
# TYPE calls_total counter

### Example of internal spans
calls_total{operation="renovate repository",service_name="renovate",span_kind="SPAN_KIND_INTERNAL",status_code="STATUS_CODE_UNSET"} 3
calls_total{operation="run",service_name="renovate",span_kind="SPAN_KIND_INTERNAL",status_code="STATUS_CODE_UNSET"} 1

### Example of http calls from Renovate to external services
calls_total{http_host="api.github.com:443",http_method="POST",http_status_code="200",operation="HTTPS POST",service_name="renovate",span_kind="SPAN_KIND_CLIENT",status_code="STATUS_CODE_UNSET"} 9

...

# HELP latency
# TYPE latency histogram

### Example of internal spans
latency_bucket{operation="renovate repository",service_name="renovate",span_kind="SPAN_KIND_INTERNAL",status_code="STATUS_CODE_UNSET",le="0.1"} 0
...
latency_bucket{operation="renovate repository",service_name="renovate",span_kind="SPAN_KIND_INTERNAL",status_code="STATUS_CODE_UNSET",le="9.223372036854775e+12"} 3
latency_bucket{operation="renovate repository",service_name="renovate",span_kind="SPAN_KIND_INTERNAL",status_code="STATUS_CODE_UNSET",le="+Inf"} 3
latency_sum{operation="renovate repository",service_name="renovate",span_kind="SPAN_KIND_INTERNAL",status_code="STATUS_CODE_UNSET"} 30947.4689
latency_count{operation="renovate repository",service_name="renovate",span_kind="SPAN_KIND_INTERNAL",status_code="STATUS_CODE_UNSET"} 3

...

### Example of http calls from Renovate to external services
latency_bucket{http_host="api.github.com:443",http_method="POST",http_status_code="200",operation="HTTPS POST",service_name="renovate",span_kind="SPAN_KIND_CLIENT",status_code="STATUS_CODE_UNSET",le="0.1"} 0
...
latency_bucket{http_host="api.github.com:443",http_method="POST",http_status_code="200",operation="HTTPS POST",service_name="renovate",span_kind="SPAN_KIND_CLIENT",status_code="STATUS_CODE_UNSET",le="250"} 3
latency_bucket{http_host="api.github.com:443",http_method="POST",http_status_code="200",operation="HTTPS POST",service_name="renovate",span_kind="SPAN_KIND_CLIENT",status_code="STATUS_CODE_UNSET",le="9.223372036854775e+12"} 9
latency_bucket{http_host="api.github.com:443",http_method="POST",http_status_code="200",operation="HTTPS POST",service_name="renovate",span_kind="SPAN_KIND_CLIENT",status_code="STATUS_CODE_UNSET",le="+Inf"} 9
latency_sum{http_host="api.github.com:443",http_method="POST",http_status_code="200",operation="HTTPS POST",service_name="renovate",span_kind="SPAN_KIND_CLIENT",status_code="STATUS_CODE_UNSET"} 2306.1385999999998
latency_count{http_host="api.github.com:443",http_method="POST",http_status_code="200",operation="HTTPS POST",service_name="renovate",span_kind="SPAN_KIND_CLIENT",status_code="STATUS_CODE_UNSET"} 9
```

The [spanmetricsprocessor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/spanmetricsprocessor) creates two sets of metrics.

#### Calls metric

First, there are the `calls_total` metrics, which show how often specific trace spans have been observed.

For example:
`calls_total{operation="renovate repository",service_name="renovate",span_kind="SPAN_KIND_INTERNAL",status_code="STATUS_CODE_UNSET"} 3` signals that 3 repositories have been renovated.
`calls_total{operation="run",service_name="renovate",span_kind="SPAN_KIND_INTERNAL",status_code="STATUS_CODE_UNSET"} 1` represents how often Renovate has been run.

If we combine these using the Prometheus Query Language (PromQL), we can calculate the average number of repositories handled per Renovate run:

```
calls_total{operation="renovate repository",service_name="renovate"} / calls_total{operation="run",service_name="renovate"}
```
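
Because `calls_total` is a counter and Prometheus matches vectors on all labels by default, a dashboard query would typically rate the counters over a time window and ignore the differing `operation` label. A minimal sketch, assuming a Prometheus instance scrapes the collector's Prometheus exporter on port `9123` (such a Prometheus server is not part of the compose file above):

```
# Average number of repositories per Renovate run over the last hour (sketch)
increase(calls_total{operation="renovate repository",service_name="renovate"}[1h])
  / ignoring(operation)
increase(calls_total{operation="run",service_name="renovate"}[1h])
```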

This metric is also generated for spans created by HTTP calls:

```
calls_total{http_host="registry.terraform.io:443",http_method="GET",http_status_code="200",operation="HTTPS GET",service_name="renovate",span_kind="SPAN_KIND_CLIENT",status_code="STATUS_CODE_UNSET"} 5
```
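
The `http_host` and `http_status_code` dimensions make these counters useful for spotting failing requests to external services. A sketch, again assuming a Prometheus server scrapes the exported metrics (the label names follow the spanmetrics output shown above):

```
# Non-2xx responses per external host over the last hour (sketch)
sum by (http_host) (
  increase(calls_total{service_name="renovate",span_kind="SPAN_KIND_CLIENT",http_status_code!~"2.."}[1h])
)
```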

#### Latency buckets

The second class of metrics exposed are the latency-focused buckets, which allow you to create [heatmaps](https://grafana.com/docs/grafana/latest/basics/intro-histograms/#heatmaps).
A request is counted in a bucket if its duration is less than or equal to the bucket boundary (`le`): `request_duration <= le`.

As an example, if we receive a request which needs `1.533s` to complete, we get the following metrics:

```
latency_bucket{http_host="api.github.com:443",le="0.1"} 0
latency_bucket{http_host="api.github.com:443",le="1"} 0
latency_bucket{http_host="api.github.com:443",le="2"} 1
latency_bucket{http_host="api.github.com:443",le="6"} 1
latency_bucket{http_host="api.github.com:443",le="10"} 1
latency_bucket{http_host="api.github.com:443",le="100"} 1
latency_bucket{http_host="api.github.com:443",le="250"} 1
latency_bucket{http_host="api.github.com:443",le="9.223372036854775e+12"} 1
latency_bucket{http_host="api.github.com:443",le="+Inf"} 1
latency_sum{http_host="api.github.com:443"} 1.533
latency_count{http_host="api.github.com:443"} 1
```

Now suppose we receive another request, which this time takes `10s` to complete:

```
latency_bucket{http_host="api.github.com:443",le="0.1"} 0
latency_bucket{http_host="api.github.com:443",le="1"} 0
latency_bucket{http_host="api.github.com:443",le="2"} 1
latency_bucket{http_host="api.github.com:443",le="6"} 1
latency_bucket{http_host="api.github.com:443",le="10"} 2
latency_bucket{http_host="api.github.com:443",le="100"} 2
latency_bucket{http_host="api.github.com:443",le="250"} 2
latency_bucket{http_host="api.github.com:443",le="9.223372036854775e+12"} 2
latency_bucket{http_host="api.github.com:443",le="+Inf"} 2
latency_sum{http_host="api.github.com:443"} 11.533
latency_count{http_host="api.github.com:443"} 2
```
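
These cumulative buckets are what Grafana heatmaps are built from, and they also let you derive latency percentiles in PromQL. A rough sketch, again assuming a Prometheus instance scrapes the collector's Prometheus exporter (metric and label names follow the spanmetrics output shown earlier):

```
# Approximate 95th-percentile latency of Renovate's outgoing HTTP calls,
# per host, over the last 15 minutes (sketch)
histogram_quantile(
  0.95,
  sum by (le, http_host) (
    rate(latency_bucket{service_name="renovate",span_kind="SPAN_KIND_CLIENT"}[15m])
  )
)
```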

More about how these histogram metrics work can be found on the Prometheus page about [metric types](https://prometheus.io/docs/concepts/metric_types/#histogram).