mozilla/missioncontrol-v2

Language: R

An alternate view of crash and stability

missioncontrol-v2

Installation instructions

This code is designed to be easy to install and to produce repeatable results: if the underlying data doesn't change, the output should not change either.

This code should work either inside GCP or on a local computer. In either case, we recommend a powerful machine with at least 4 cores and a decent amount (8 GB or more) of memory. If using Docker, you will want to increase the resources available to containers if you haven't already: the defaults are likely to be insufficient.

You will also want a GCP service account with permission to read from the datasets in fx-data-shared-prod and write access to a cloud storage bucket. Typically one would do this using a sandbox project.

After setting up the service account, download its credentials into a file called gcloud.json in the root of your checkout.
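A quick sanity check on the key file can catch a misplaced or wrong download before a long run. The sketch below is not part of the repository; check_credentials is a hypothetical helper name, and it only relies on the fact that service account key files carry a "type" field set to "service_account".

```shell
# Hypothetical helper (not part of this repository): sanity-check a
# service account key file before starting a run.
check_credentials() {
  key_file="${1:-gcloud.json}"
  # The file must exist and be non-empty.
  if [ ! -s "$key_file" ]; then
    echo "missing or empty: $key_file" >&2
    return 1
  fi
  # Service account key files carry a "type" field set to "service_account".
  if ! grep -q '"type"[[:space:]]*:[[:space:]]*"service_account"' "$key_file"; then
    echo "not a service account key: $key_file" >&2
    return 1
  fi
  echo "ok: $key_file"
}
```

Calling check_credentials with no argument from the root of the checkout inspects ./gcloud.json.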

Development instructions

The ETL pipeline is based on running a number of scripts in succession, performing the following operations:

  • Download the latest crash and usage data for a recent set of versions, and upload the results to a temporary table in BigQuery.
  • Build a statistical model from the data downloaded above, together with previously seen historical data.
  • Generate an R Markdown report from the output of the model and upload it to Google Cloud Storage.

Option 1: Use the Docker container

This is the most deterministic approach and closest to what we are using in production, though it is likely to be slower on non-Linux hosts. These instructions assume that you have Docker and a basic set of developer tools installed on your machine.

First, build the container:

make build

Then, create a shell session inside it:

make shell

Skip to the next section to run the code.

Option 2: Use Conda

This should run on the bare metal of your machine, and should be much faster on Mac. These instructions assume you have either conda or miniconda installed, as well as the Google Cloud SDK.

From the root of your checkout, creating and activating a conda environment is a two-step process:

conda env create -n mc2 -f environment.yml
conda activate mc2

Running

Once you have a shell (either in the Docker container or in an activated conda environment), set some environment variables corresponding to your GCP settings:

export GOOGLE_APPLICATION_CREDENTIALS=$PWD/gcloud.json
export RAW_OUTPUT_TABLE=missioncontrol_v2_test_raw
export MODEL_OUTPUT_TABLE=missioncontrol_v2_test_model
export GCP_PROJECT_ID=my-gcp-project-id
export GCS_OUTPUT_PREFIX=gs://my-cloud-storage-bucket

The RAW_OUTPUT_TABLE and MODEL_OUTPUT_TABLE settings name the BigQuery tables that hold temporary data written during the run.
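Because a run can take a while, it may help to confirm that all five settings are present before starting. The following is a minimal sketch, not something shipped with the repository; check_env is a hypothetical helper name, and the variable list is simply the five exports shown above.

```shell
# Hypothetical helper (not part of this repository): report which of the
# settings listed above are unset or empty.
check_env() {
  missing=""
  for name in GOOGLE_APPLICATION_CREDENTIALS RAW_OUTPUT_TABLE \
              MODEL_OUTPUT_TABLE GCP_PROJECT_ID GCS_OUTPUT_PREFIX; do
    # Indirect lookup of the variable named by $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "unset:$missing" >&2
    return 1
  fi
}
```

A typical use would be to run check_env first and start the model only if it succeeds.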

Then run the model:

./complete.runner.sh

If you are running on an underpowered machine, or just want results more quickly, you can enable "simple" mode, which (as the name implies) speeds up model generation significantly by using a simplified statistical model:

SIMPLE=1 ./complete.runner.sh
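If the same checkout is used on machines of different sizes, the choice of mode can be tied to the core count. This is only a sketch under stated assumptions: run_with_mode is a hypothetical wrapper, and the threshold of 4 cores is taken from the hardware recommendation earlier in this document rather than from anything the runner script checks itself.

```shell
# Hypothetical wrapper (not part of this repository): run a command with
# SIMPLE=1 when the machine has fewer than the recommended 4 cores.
# Usage: run_with_mode <core-count> <command> [args...]
run_with_mode() {
  cores="$1"
  shift
  if [ "$cores" -lt 4 ]; then
    # SIMPLE=1 is set only for this one command.
    SIMPLE=1 "$@"
  else
    # Leave SIMPLE as it is in the caller's environment.
    "$@"
  fi
}
```

On Linux and macOS the core count can be read with getconf _NPROCESSORS_ONLN, so a call might look like run_with_mode "$(getconf _NPROCESSORS_ONLN)" ./complete.runner.sh.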

Gotchas

If you run the data-pulling code shortly after a new release and did not pull data on the preceding days, those days' data could be missing for the previous major release versions.

To avoid this problem, you can copy the BigQuery table used in production (moz-fx-data-derived-datasets.analysis.missioncontrol_v2_raw_data) to your own GCP project.
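One way to make that copy is the bq cp command from the Google Cloud SDK. The sketch below only prints the command so it can be reviewed first. copy_command is a hypothetical helper, and the destination project, dataset, and table are placeholders you would supply; this document does not say which destination the pipeline expects.

```shell
# Production table named above, written in bq's project:dataset.table form.
SRC_TABLE="moz-fx-data-derived-datasets:analysis.missioncontrol_v2_raw_data"

# Hypothetical helper (not part of this repository): print the bq command
# that would copy the production table to a destination of your choosing.
# Usage: copy_command <project> <dataset> <table>
copy_command() {
  project="$1"
  dataset="$2"   # assumption: a dataset that already exists in your project
  table="$3"
  echo "bq cp $SRC_TABLE $project:$dataset.$table"
}
```

Run the printed command once it looks right. The service account needs read access to the source table and write access to the destination dataset.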
