Skip to content

Commit

Permalink
AppTP Health Monitor (duckduckgo#1608)
Browse files Browse the repository at this point in the history
Task/Issue URL: https://app.asana.com/0/488551667048375/1201527834957243/f

### Description
Adds an automated health monitor for app TP, only for `internal` builds currently. The monitor will start and stop with the `VPNService` lifecycle. 

`Internal` builds will be able to view additional diagnostics in the diagnostics activity, as well as having access to a report button.

#### Bad health alert
When the health monitor detects a problem, and it doesn't seem to be clearing, it will throw up a notification alerting that the health might be degraded. Clicking on that notification will show the diagnostics screen, where a health report can be captured and sent for analysis.

The rules for what constitutes bad health will need to evolve as we learn more, as they are somewhat arbitrary right now. See  `HealthClassifier` for these rules.

#### Health metrics
When it's running, it will monitor metrics, such as:

- current memory usage
- number of socket exceptions
- number of exceptions writing to the TUN
- tracer packets (see below)

#### Tracer packets
When the monitor is running, it will frequently drop in a special _tracer_ `Packet` and measure the length of time it takes for that packet to move through the system. The number of successful/failed tracers will be used to trigger a bad health alert if they start to slow down, or stop reaching their final destination.

### Steps to test this PR

#### General 
- [ ] Test it doesn't automatically run when not an `internal` build

#### Testing `internal` builds

Suggested logcat filter: `AppTp Health|Cleaning up old health metrics|Sending health report|Pixel sent: m_atp_breakage_report|tracer`

- [x] Ensure app monitor does **not** start when the VPN service isn't running
- [x] Ensure monitor **does** start when the VPN service starts
- [x] Launch `DiagnosticsActivity` (available from main AppTP activity->overflow). Verify health stats are updating frequently.
- [x] Click the `Report` button, then hit back. Verify no report sent. (nothing will show with suggested logcat filter)
- [x] Click the `Report` button, then hit `Submit` leaving options untouched. Verify JSON contains system health report and user report (which shows empty notes and "undetermined" health since user didn't specify current health)
- [x] Click the `Report` button Choose "Everything seems fine" and submit. Verify JSON contains system health report and user report (which shows "good" health)
- [x] Click the `Report` button. Choose "Something's not working" and submit. Verify JSON contains system health report and user report (which shows "bad" health)
- [x] Click the `Report` button. Add some notes and submit. Verify JSON contains system health report and user report (which shows "notes" populated)


#### Triggering bad health

Test bad health by adding some pretend bad health metrics.

E.g., in `TunPacketWriter`, line 72, add `healthMetricCounter.onTunWriteIOException()` (this will record an exception every time there is a tracer packet, ie, every ~5 seconds)

- [x] Verify the diagnostics screen shows the red sad face to indicate it's detecting bad health
- [x] Verify the notification doesn't show yet. This won't show until there's been a prolonged bad health event, lasting across multiple samples. 
- [x] Wait until it's detected as a prolonged event, and verify the notification is triggered
  - One sample taken every ~30s, so it might take 1 or 2 mins to be detected as prolonged. 
  - If you're impatient, tweak the constants in `AppTPHealthMonitor` for `MONITORING_FREQUENCY_MS` and how many samples to wait when instantiating each of the `HealthRule`s (e.g., passing `samplesToWaitBeforeAlerting = 2` in constructor)
- [x] Verify clicking on the notification launches the diagnostics screen

Co-authored-by: Aitor Viana <[email protected]>
  • Loading branch information
CDRussell and aitorvs authored Dec 18, 2021
1 parent 59c8578 commit 26204dc
Show file tree
Hide file tree
Showing 36 changed files with 2,903 additions and 392 deletions.
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -56,3 +56,5 @@ app/internal/

build-cache
report

.DS_Store
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
/*
* Copyright (c) 2021 DuckDuckGo
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package com.duckduckgo.mobile.android.vpn.health

interface AppHealthMonitor {
fun startMonitoring()
fun stopMonitoring()
}
9 changes: 5 additions & 4 deletions vpn-internal/src/main/AndroidManifest.xml
Original file line number Diff line number Diff line change
@@ -1,14 +1,15 @@
<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
package="com.duckduckgo.vpn.internal">
package="com.duckduckgo.vpn.internal">

<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
<application>
<activity android:name=".feature.rules.ExceptionRulesDebugActivity">
</activity>
<activity android:name=".feature.rules.ExceptionRulesDebugActivity" />
<activity
android:name=".feature.VpnInternalSettingsActivity"
android:process=":vpn"
android:label="@string/vpnDevSettingsTitle"/>
android:label="@string/vpnDevSettingsTitle" />

</application>

</manifest>
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
/*
* Copyright (c) 2021 DuckDuckGo
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package com.duckduckgo.vpn.internal.feature.health

import com.duckduckgo.di.scopes.AppScope
import com.duckduckgo.mobile.android.vpn.health.AppHealthMonitor
import com.duckduckgo.mobile.android.vpn.service.VpnServiceCallbacks
import com.duckduckgo.mobile.android.vpn.service.VpnStopReason
import com.squareup.anvil.annotations.ContributesMultibinding
import kotlinx.coroutines.CoroutineScope
import javax.inject.Inject

@ContributesMultibinding(AppScope::class)
class InternalAppHealthMonitor @Inject constructor(private val appHealthMonitor: AppHealthMonitor) :
VpnServiceCallbacks {

override fun onVpnStarted(coroutineScope: CoroutineScope) {
appHealthMonitor.startMonitoring()
}

override fun onVpnStopped(coroutineScope: CoroutineScope, vpnStopReason: VpnStopReason) {
appHealthMonitor.stopMonitoring()
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
/*
* Copyright (c) 2020 DuckDuckGo
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package com.duckduckgo.mobile.android.vpn.di

import com.duckduckgo.di.scopes.ActivityScope
import com.duckduckgo.di.scopes.AppScope
import com.squareup.anvil.annotations.ContributesTo
import com.squareup.anvil.annotations.MergeSubcomponent
import dagger.Binds
import dagger.Module
import dagger.SingleInstanceIn
import dagger.Subcomponent
import dagger.android.AndroidInjector
import dagger.multibindings.ClassKey
import dagger.multibindings.IntoMap
import dummy.ui.VpnDiagnosticsGetUserHealthReportActivity

@SingleInstanceIn(ActivityScope::class)
@MergeSubcomponent(
scope = ActivityScope::class
)
interface VpnDiagnosticsGetUserHealthReportActivityComponent : AndroidInjector<VpnDiagnosticsGetUserHealthReportActivity> {
@Subcomponent.Factory
interface Factory : AndroidInjector.Factory<VpnDiagnosticsGetUserHealthReportActivity>
}

@ContributesTo(AppScope::class)
interface VpnDiagnosticsGetUserHealthReportActivityComponentProvider {
fun vpnDiagnosticsGetUserHealthReportActivityComponentFactory(): VpnDiagnosticsGetUserHealthReportActivityComponent.Factory
}

@Module
@ContributesTo(AppScope::class)
abstract class VpnDiagnosticsGetUserHealthReportActivityComponentBindingModule {
@Binds
@IntoMap
@ClassKey(VpnDiagnosticsGetUserHealthReportActivity::class)
abstract fun VpnDiagnosticsGetUserHealthReportActivityComponent.Factory.bind(): AndroidInjector.Factory<*>
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
{
"formatVersion": 1,
"database": {
"version": 1,
"identityHash": "2d7bca577a738d92227a6f464799f868",
"entities": [
{
"tableName": "SimpleEvent",
"createSql": "CREATE TABLE IF NOT EXISTS `${TABLE_NAME}` (`id` INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, `type` TEXT NOT NULL, `timestamp` INTEGER NOT NULL)",
"fields": [
{
"fieldPath": "id",
"columnName": "id",
"affinity": "INTEGER",
"notNull": true
},
{
"fieldPath": "type",
"columnName": "type",
"affinity": "TEXT",
"notNull": true
},
{
"fieldPath": "timestamp",
"columnName": "timestamp",
"affinity": "INTEGER",
"notNull": true
}
],
"primaryKey": {
"columnNames": [
"id"
],
"autoGenerate": true
},
"indices": [],
"foreignKeys": []
}
],
"views": [],
"setupQueries": [
"CREATE TABLE IF NOT EXISTS room_master_table (id INTEGER PRIMARY KEY,identity_hash TEXT)",
"INSERT OR REPLACE INTO room_master_table (id,identity_hash) VALUES(42, '2d7bca577a738d92227a6f464799f868')"
]
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
/*
* Copyright (c) 2021 DuckDuckGo
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package com.duckduckgo.mobile.android.vpn.health

import androidx.room.Dao
import androidx.room.Database
import androidx.room.Entity
import androidx.room.Insert
import androidx.room.PrimaryKey
import androidx.room.Query
import androidx.room.RoomDatabase

@Database(
version = 1,
entities =
[
SimpleEvent::class,
],
)
abstract class HealthStatsDatabase : RoomDatabase() {
abstract fun healthStatDao(): HealthStatDao
}

abstract class HealthStat(open val timestamp: Long = 0)

@Entity
data class SimpleEvent(
@PrimaryKey(autoGenerate = true) val id: Long = 0,
val type: String,
override val timestamp: Long
) : HealthStat(timestamp) {

@Suppress("FunctionName")
companion object {
fun build(type: String) = SimpleEvent(type = type, timestamp = System.currentTimeMillis())

fun TUN_READ() = build("TUN_READ")
fun ADD_TO_DEVICE_TO_NETWORK_QUEUE() = build("ADD_TO_DEVICE_TO_NETWORK_QUEUE")
fun REMOVE_FROM_DEVICE_TO_NETWORK_QUEUE() = build("REMOVE_FROM_DEVICE_TO_NETWORK_QUEUE")
fun SOCKET_CHANNEL_READ_EXCEPTION() = build("SOCKET_CHANNEL_READ_EXCEPTION")
fun SOCKET_CHANNEL_WRITE_EXCEPTION() = build("SOCKET_CHANNEL_WRITE_EXCEPTION")
fun SOCKET_CHANNEL_CONNECT_EXCEPTION() = build("SOCKET_CHANNEL_CONNECT_EXCEPTION")
fun TUN_WRITE_IO_EXCEPTION() = build("TUN_WRITE_IO_EXCEPTION")
}
}

@Dao
interface HealthStatDao {

companion object {
const val MAX_NUMBER_HEALTH_METRICS_TO_RETAIN = 50_000
}

@Insert
fun insertEvent(event: SimpleEvent)

@Query("SELECT count(*) FROM SimpleEvent WHERE timestamp >= :timestamp AND type=:type")
fun eventCount(type: String, timestamp: Long): Long

@Query(
"DELETE FROM SimpleEvent WHERE id IN (SELECT id FROM SimpleEvent ORDER BY timestamp DESC LIMIT -1 OFFSET :maxNumberToKeep)",
)
fun purgeOldMetrics(maxNumberToKeep:Int = MAX_NUMBER_HEALTH_METRICS_TO_RETAIN)
}
Loading

0 comments on commit 26204dc

Please sign in to comment.