mozilla/missioncontrol-v2
An alternate view of crash and stability
Installation instructions
This code is designed to be easy to install and to produce repeatable results: if the underlying data does not change, the output should not change either.
This code should work either inside GCP or on a local computer. In either case, we recommend a powerful machine with at least 4 cores and 8 GB or more of memory. If using Docker, increase the resources available to containers if you haven't already: the defaults are likely to be insufficient.
You will also want a GCP service account with permission to read from the datasets in fx-data-shared-prod and write access to a cloud storage bucket. Typically one would do this using a sandbox project.
After setting up the service account, download its credentials into a file called gcloud.json in the root of your checkout.
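A quick sanity check on the credentials file can catch a misplaced or wrong download before a long run starts. The following is a minimal sketch, not part of the repository: the function name check_credentials is hypothetical, and the check relies only on the fact that service account keys downloaded from GCP carry a "type": "service_account" field.

```shell
# Hypothetical helper (not part of this repository): verify that the
# credentials file exists, is non-empty, and looks like a service account key.
check_credentials() {
  creds_file="${1:-gcloud.json}"
  if [ ! -s "$creds_file" ]; then
    echo "missing or empty credentials file: $creds_file" >&2
    return 1
  fi
  # Service account key files contain a "type": "service_account" entry.
  if ! grep -q '"service_account"' "$creds_file"; then
    echo "$creds_file does not look like a service account key" >&2
    return 1
  fi
  echo "credentials file looks OK: $creds_file"
}
```

From the root of the checkout, running check_credentials with no argument checks ./gcloud.json.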
Development instructions
The ETL pipeline is based on running a number of scripts in succession, performing the following operations:
- Download the latest crash and usage data for a recent set of versions, and upload the results to a temporary table in BigQuery.
- Build a statistical model from the data downloaded above, together with historical data from earlier runs.
- Generate an Rmarkdown-based report from the output of the model and upload it to Google Cloud Storage.
Option 1: Use the Docker container
This is the most deterministic approach and closest to what we are using in production, though it is likely to be slower on non-Linux hosts. These instructions assume that you have Docker and a basic set of developer tools installed on your machine.
First, build the container:
make build
Then, create a shell session inside it:
make shell
Skip to the next section to run the code.
Option 2: Use Conda
This should run on the bare metal of your machine, and should be much faster on Mac. These instructions assume you have either conda or miniconda installed, as well as the Google Cloud SDK.
From the root of your checkout, creating and activating a conda environment is a two-step process:
conda env create -n mc2 -f environment.yml
conda activate mc2
Running
Once you have a shell (either in the Docker container or in the activated conda environment), set some environment variables corresponding to your GCP settings:
export GOOGLE_APPLICATION_CREDENTIALS=$PWD/gcloud.json
export RAW_OUTPUT_TABLE=missioncontrol_v2_test_raw
export MODEL_OUTPUT_TABLE=missioncontrol_v2_test_model
export GCP_PROJECT_ID=my-gcp-project-id
export GCS_OUTPUT_PREFIX=gs://my-cloud-storage-bucket
The RAW_OUTPUT_TABLE and MODEL_OUTPUT_TABLE settings specify the GCP table names for temporary data written during the run.
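Because a run can take a long time, it may be worth confirming that all five variables are set before starting. The sketch below is a hypothetical pre-flight check, not something the repository provides: the function name check_env is invented for illustration, and the gs:// test is an assumption based on the example value shown above.

```shell
# Hypothetical pre-flight check (not part of this repository).
# Returns 0 when every variable listed above is set, 1 otherwise.
check_env() {
  missing=0
  for name in GOOGLE_APPLICATION_CREDENTIALS RAW_OUTPUT_TABLE \
              MODEL_OUTPUT_TABLE GCP_PROJECT_ID GCS_OUTPUT_PREFIX; do
    # Indirect lookup of the variable called $name; empty if unset.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "unset: $name" >&2
      missing=1
    fi
  done
  # Assumption: the output prefix is a Cloud Storage URL, as in the example.
  case "${GCS_OUTPUT_PREFIX:-}" in
    "") ;;       # already reported as unset above
    gs://*) ;;   # looks like a bucket URL
    *) echo "GCS_OUTPUT_PREFIX should start with gs://" >&2; missing=1 ;;
  esac
  return "$missing"
}
```

One way to use it is to chain it in front of the run command, so that the run only starts when check_env succeeds.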
Then run the model:
./complete.runner.sh
If you are running on an underpowered machine, or just want results more quickly, you can enable "simple" mode, which (as the name implies) speeds up model generation significantly by using a simplified statistical model:
SIMPLE=1 ./complete.runner.sh
Gotchas
If you run the data-pulling code shortly after a new release and did not pull data on the preceding days, the data for those days may be missing for the previous major release versions.
To avoid this problem, you can copy the BigQuery table used in production (moz-fx-data-derived-datasets.analysis.missioncontrol_v2_raw_data) to your own GCP project.
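One way to make that copy is with the bq command-line tool from the Google Cloud SDK, which provides a cp command for tables. The following is a sketch under stated assumptions: the function name copy_raw_table and the DRY_RUN switch are hypothetical, the destination is whatever table you pass in (the README does not say which dataset or table name to use), and your account needs read access to the production table.

```shell
# Hypothetical helper (not part of this repository): copy the production
# raw-data table into a table you own, using `bq cp SOURCE DESTINATION`.
copy_raw_table() {
  # Destination in bq's project:dataset.table form; chosen by the caller.
  dest="$1"
  src="moz-fx-data-derived-datasets:analysis.missioncontrol_v2_raw_data"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    echo "bq cp $src $dest"
  else
    bq cp "$src" "$dest"
  fi
}
```

Running it with DRY_RUN=1 first prints the exact command, which makes it easy to confirm the destination before any data is copied.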