Technical Documentation

This documentation, together with DEEP’s open-source codebase, provides complete transparency, allowing anyone to review, audit, and understand how the DEEP platform operates.

Introduction

The problem

Manual data entry was not made for 21st-century crises. Over the past 20 years, more than 7,300 natural disasters were recorded worldwide, affecting more than 4 billion people and generating $2.97 trillion in economic losses (UNDRR). The colossal effort to manually collect, track, translate, store, verify, and share critical data in the wake of these disasters has created overwhelming, and at times insurmountable, challenges for humanitarian responders, jeopardizing accurate decision-making during a crisis and hindering the global ability to prevent and abate future disasters. That is the problem DEEP was born to solve.

The solution

Remove collaboration barriers, improve response time, and make stronger, better-informed decisions. DEEP is an AI-powered platform that centralizes, accelerates, and strengthens inter-agency response to humanitarian crises at national and field levels. The free, open-source tool was developed by field responders in the wake of the devastating 2015 Nepal earthquakes and has since become a go-to resource for leading global humanitarian organizations, including UNHCR, UNICEF, UN OCHA, and the IFRC.

Today, DEEP hosts the largest analysis framework repository in the international humanitarian sector, with more than 85,000 carefully annotated response documents and more than 3,000 expert users worldwide. Since its inception, DEEP has informed more than 1,800 international humanitarian projects whose scopes are estimated to impact more than 98 million people, including USAID’s response to COVID-19, ACAPS’s response to the Rohingya crisis, and UNHCR and IFRC’s response to the Venezuela migration crisis. More than two-thirds of individuals in areas where DEEP is used earn less than US$5 per day.

Getting started

General

What’s the stack like? We’re glad you asked. DEEP’s server is powered by Django and PostgreSQL, and we use React for most front-end tasks. The whole kit and caboodle is wrapped up in Docker, so you can easily deploy DEEP on your local machine to begin developing.

The information below will help you get started on building DEEP locally.

Dependencies

Clone Deeper Repo

git clone https://github.com/the-deep/deeper.git deep-project-root
Go to Deeper project root
cd deep-project-root
Clone client, server, and DEEPL service
git clone https://github.com/the-deep/server.git server

git clone https://github.com/the-deep/client.git client
git clone --branch=feature/only-ary https://github.com/the-deep/client.git ./ary-only-client

git clone https://github.com/the-deep/deepl-deep-integration.git deepl-service
Set up client
cd client
Building
cd ..                      # back to deep-project-root
Copy .env-sample as .env
cp .env-sample .env
docker-compose pull
docker-compose build
Useful commands for running Docker
  • Starting docker containers
docker-compose up            # add -d to run in detached mode
  • Running django migrations
docker-compose exec web ./manage.py migrate
  • Viewing logs (for detached mode)
docker-compose logs -f             # view logs (-f follows the log output)
docker-compose logs -f web         # view logs for the web container
docker-compose logs -f worker      # view logs for the worker container
  • Running commands
docker-compose exec web <command>    # Run commands inside web container
docker-compose exec web bash         # Get into web container's bash
[Note: web is the service name defined in docker-compose.yml]
Useful plugins for debugging React
  • React Developer Tools
  • Redux DevTools
Adding dependencies [web]
  • Get into the web container's bash
docker-compose exec web bash
  • Adding server dependencies [Python]
    In the server directory, add the package to the pyproject.toml file
Run poetry lock --no-update
In the deeper directory, rebuild the images:
docker compose build
Adding dependencies [client]
  • Get into the client container's bash
docker-compose exec client bash
  • Adding client dependencies [JS]
cd /code/
yarn add <dependency>       # Installs the dependency and updates package.json and yarn.lock
Running tests locally
  • Python/Django tests (run from the deeper directory)
docker-compose exec web pytest  # Run all tests with a fresh database
docker-compose exec web pytest --reuse-db --last-failed -vv  # Re-run the last failed tests, reusing the existing database
docker-compose exec web pytest apps/user/tests/test_schemas.py::TestUserSchema::test_user_last_active  # Run a specific test
  • JS/React tests
docker-compose exec client bash
Inside the client container:
cd /code/
yarn test                   # Starts Jest in watch mode: press a to run all tests, o to run only tests related to changed files
yarn test --coverage        # Also generate a coverage report

Installation

Clone Deeper Repo

To clone the deeper repository, use the following command:
git clone https://github.com/the-deep/deeper.git deep-project-root

Go to Deeper Project Root

Navigate to the deeper project root directory:
cd deep-project-root

Clone Client, Server, and Other Repos

To clone the server repository, use the following command:
git clone https://github.com/the-deep/server.git server
To clone the client repository, use the following command:
git clone https://github.com/the-deep/client.git client
To clone the client repository (ARY branch), use the following command:
git clone --branch=feature/only-ary https://github.com/the-deep/client.git ./ary-only-client
To clone the DEEPL service repository, use the following command:
git clone https://github.com/the-deep/deepl-deep-integration.git deepl-service
Modify DEEPL Service Dockerfile (for M1 Mac)
If you are using a Mac with an M1 chip, make the following modification to the deepl-service/Dockerfile file:
Change the FROM statement to specify the amd64 platform:
FROM --platform=linux/amd64 python:3.10-slim-buster
Disable AirPlay Receiver (for Mac)
If you are on a Mac, disable ‘AirPlay Receiver’ in ‘System Preferences > Sharing’ (on newer macOS versions, ‘System Settings > General > AirDrop & Handoff’) to make port 5000 available.
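If docker-compose up still complains that port 5000 is taken, a quick check can confirm whether something is listening there. The sketch below uses only Python's standard library; the helper name is hypothetical and not part of DEEP's tooling, and it only probes 127.0.0.1.

```python
import socket


def is_port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper, not part of the DEEP repositories.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

Calling is_port_in_use(5000) before starting the containers returns True while AirPlay Receiver, or any other process, still holds the port.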
Start Servers with Docker Compose
To start the servers using Docker Compose, follow these steps:
1. Make sure you have the latest versions of Docker and Docker Compose installed.
2. Build the Docker containers:
docker-compose build
3. Start the servers:
docker-compose up
Useful Commands
  • To run migrations, execute the migrate command inside the web container:
docker-compose exec web ./manage.py migrate
  • To run tests, execute pytest inside the web container:
docker-compose exec web pytest  # Run all tests with a fresh database
docker-compose exec web pytest --reuse-db --last-failed -vv  # Re-run the last failed tests, reusing the existing database
docker-compose exec web pytest apps/user/tests/test_schemas.py::TestUserSchema::test_user_last_active  # Run a specific test
  • To add a new package, follow these steps:
1. In the server directory, add the package to the pyproject.toml file, then run
poetry lock --no-update
This updates poetry.lock.
2. In the deeper directory, rebuild and restart the containers:
docker compose up --build

AWS Copilot Deployment

First, deploy the custom CloudFormation (CFN) macros:
aws cloudformation deploy --capabilities CAPABILITY_NAMED_IAM --template-file ./aws/cfn-macros.yml --stack-name deep-custom-macros

Create Client Stack

Get the hosted zone ID
aws route53 list-hosted-zones-by-name --dns-name thedeep.io | jq -r '.HostedZones[0].Id' | cut -d '/' -f 3
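The jq pipeline above keeps the first zone in the response and strips the /hostedzone/ prefix from its Id. Because list-hosted-zones-by-name returns zones in name order starting at the requested name, the first entry is not guaranteed to be thedeep.io if that zone is missing. A minimal Python equivalent that also checks the zone name is sketched below; the function name is ours, and it assumes the CLI output is JSON (the default, or pass --output json).

```python
import json


def hosted_zone_id(list_output: str, domain: str) -> str:
    """Return the bare hosted zone ID for `domain` from the JSON printed by
    `aws route53 list-hosted-zones-by-name --dns-name <domain>`.

    Hypothetical helper, not part of the DEEP repositories.
    """
    data = json.loads(list_output)
    # Route 53 stores zone names fully qualified, with a trailing dot.
    wanted = domain.rstrip(".") + "."
    for zone in data.get("HostedZones", []):
        if zone.get("Name") == wanted:
            # Ids look like "/hostedzone/<ID>"; keep the last path segment,
            # which is what `cut -d '/' -f 3` does in the shell version.
            return zone["Id"].split("/")[-1]
    raise LookupError(f"no hosted zone named {wanted!r} in the output")
```

Save the CLI output to a file and pass its contents together with "thedeep.io"; the LookupError makes a missing zone fail loudly instead of silently picking a neighbouring zone.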

For staging (replace HostedZoneId with a valid value)

aws cloudformation deploy --capabilities CAPABILITY_NAMED_IAM --template-file ./aws/cfn-client.yml --stack-name deep-staging-client --tags app=deep env=staging --parameter-overrides Env=staging HostedZoneId=XXXXXXXXXXXXXXXXXXXXX
SES Setup
Add the domain of the address used for EMAIL_FROM to SES and verify it.
Dockerhub authentication
We need Docker Hub authentication to pull base images.

To set it up, make sure the following SSM parameters are created (they are used in copilot/buildspec.yml):
aws ssm put-parameter --name /copilot/global/DOCKERHUB_USERNAME --value <USERNAME> --type SecureString --overwrite
aws ssm put-parameter --name /copilot/global/DOCKERHUB_TOKEN --value <TOKEN> --type SecureString --overwrite
Backup account info
aws ssm put-parameter --name /copilot/global/DEEP_BACKUP_ACCOUNT_ID --value <ACCOUNT-ID> --type String --overwrite
Init
Set up the app with the domain thedeep.io
copilot app init deep --domain thedeep.io
Set up staging first
copilot env init --name staging --profile {profile} --default-config
Set up each service
* copilot svc init --name web
* copilot svc init --name worker
* copilot svc init --name export-worker
Secrets
* Load secrets (Sample: secrets-sample.yml)
* copilot secret init --cli-input-yaml secrets.yml
Deploy (Staging)
copilot svc deploy --name web --env staging
Exec into the server
copilot svc exec --name web --env staging
– Inside container –

Initial collectstatic & migrations
* ./manage.py collectstatic --no-input
* ./manage.py migrate  # Or migrate data manually.
Before deploying worker and export-worker, the template currently needs to be changed manually.
* copilot svc deploy --name worker --env staging
* copilot svc deploy --name export-worker --env staging
Old domain to new domain redirect
For staging
aws cloudformation deploy \
--capabilities CAPABILITY_NAMED_IAM \
--template-file ./aws/cfn-domain-redirect.yml \
--stack-name deep-alpha-to-staging-redirect \
--parameter-overrides \
   Env=staging \
   HostedZoneId=XXXXXXXXXXXXXXXXXXXXX \
--tags \
   app=deep \
   env=staging
For prod
aws cloudformation deploy \
--capabilities CAPABILITY_NAMED_IAM \
--template-file ./aws/cfn-domain-redirect.yml \
--stack-name deep-beta-to-prod-redirect \
--parameter-overrides \
   Env=prod \
   HostedZoneId=XXXXXXXXXXXXXXXXXXXXX \
--tags \
   app=deep \
   env=prod

The custom CFN macros (the deep-custom-macros stack deployed at the start of this section) are used later for Copilot addons.

Alpha deployment

Generate self-signed certificate for nginx

# Create a directory (shared with the nginx container)
mkdir nginx-certs
cd nginx-certs
# Generate the certificate
openssl req -x509 -nodes -newkey rsa:2048 -keyout server.key -out server.crt

Create .env file

cp .env-alpha-sample .env
# Modify as needed, and fill in any values that are left empty
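To catch keys whose value is still empty before starting the alpha stack, the copied file can be scanned. A minimal sketch, assuming .env uses simple KEY=VALUE lines with # comments; the helper name and the sample keys in the usage are hypothetical.

```python
def empty_env_keys(env_text: str) -> list[str]:
    """Return the keys in a KEY=VALUE .env file whose value is empty.

    Hypothetical helper, not part of the DEEP repositories.
    """
    empty = []
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        # Treat KEY=, KEY="" and KEY='' as empty.
        if value.strip().strip("'\"") == "":
            empty.append(key.strip())
    return empty
```

For example, reading .env and passing its text to empty_env_keys lists every key that still needs a value.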

Start

docker-compose -f docker-compose-alpha.yml up -d
docker-compose -f docker-compose-alpha.yml logs -f

API Reference

DEEP uses both GraphQL and REST. Most of the platform uses GraphQL, but some parts still use REST, which we are also migrating away from.

REST Endpoints

REST API Endpoints documentation

GraphQL Endpoints

GraphQL Endpoints documentation

REST API

Thorough documentation of the API itself is available at /api-docs/.

Authentication

For communication between the core DEEP client and the backend, we use session-based authentication. Most external clients currently use basic auth.
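With basic auth, each request carries an Authorization header containing the base64-encoded username:password pair. A minimal Python sketch using only the standard library; the URL, credentials, and helper name are illustrative assumptions, and basic credentials should only travel over HTTPS outside local development.

```python
import base64
import urllib.request


def basic_auth_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request with an HTTP Basic Authorization header.

    Hypothetical helper; credentials and URL are placeholders.
    """
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Accept", "application/json")
    return req
```

The request can then be sent with urllib.request.urlopen and the body decoded with json.load.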

Response Formats

On success (200 or 201 responses), the body of the response contains the requested resource.

On error, the HTTP status code indicates the type of error, and the response body contains the internal error code and an error message.

The response also includes an errors JSON object with one key-value pair for each field error in the request, plus a nonFieldErrors list for errors that are not tied to a specific field.
{
    "timestamp": "2017-09-24T06:49:59.699010Z",
    "errorCode": 400,
    "errors": {
        "username": "This field may not be blank.",
        "password": "This field may not be blank.",
        "nonFieldErrors": [
            "You do not have permission to modify this resource."
        ]
    }
}
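A client can separate field-level messages from non-field errors when displaying them. The sketch below assumes the error body has the shape of the example above (errorCode, plus an errors object whose nonFieldErrors key holds a list); the function name is hypothetical.

```python
import json
from typing import Any


def summarize_errors(body: str) -> dict[str, Any]:
    """Split an error body into its code, field errors and non-field errors.

    Hypothetical helper; assumes the camelCase error shape shown above.
    """
    payload = json.loads(body)
    errors = dict(payload.get("errors", {}))
    # nonFieldErrors is a list; every other key maps a field to its message.
    non_field = errors.pop("nonFieldErrors", [])
    return {
        "error_code": payload.get("errorCode"),
        "field_errors": errors,
        "non_field_errors": list(non_field),
    }
```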
Pagination and filtering

If an API returns a list of results, it is possible to query only a subset of those results using query parameters.

You can use the limit and offset query parameters to indicate the number of results to return as well as the initial index from which to return the results.

The order of the results can differ between APIs. However, if the returned resource has modifiedAt or createdAt fields, and unless the API explicitly defines another order, the results are usually ordered first by modifiedAt and then by createdAt.

The list API response always contains the count and results fields, where count is the total number of items available (ignoring limit and offset) and results is the list of items returned for this query. The response can also contain next and previous fields with the URLs of the next and previous pages of the same size (null when there is no such page).

Example request:
GET /api/v1/leads/?offset=0&limit=1
Example response:
{
   "count": 2,
   "next": "http://localhost:8000/api/v1/leads/?limit=1&offset=1",
   "previous": null,
   "results": [
       {
           "id": 1,
           "createdAt": "2017-09-29T12:23:18.009158Z",
           "modifiedAt": "2017-09-29T12:23:18.016450Z",
           "createdBy": 1,
           "modifiedBy": 1,
           "title": "Test",
           "source": "Test source",
           "confidentiality": "unprotected",
           "status": "pending",
           "publishedOn": null,
           "text": "This is a test lead and is a cool one.",
           "url": "",
           "website": "",
           "attachment": null,
           "project": 4,
           "assignee": [
               1
           ]
       }
   ]
}
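Because every list response carries results and a next URL, a client can read all items by following next until it is null. A minimal Python sketch; fetch is deliberately left abstract (any function that performs a GET and returns the decoded JSON), and both helper names are illustrative rather than part of DEEP.

```python
from typing import Any, Callable, Iterator, Optional
from urllib.parse import urlencode


def first_page_url(base: str, limit: int = 50, **filters: Any) -> str:
    """Build the first page URL, e.g. /api/v1/leads/?limit=50&offset=0&project=2."""
    return f"{base}?{urlencode({'limit': limit, 'offset': 0, **filters})}"


def iter_results(url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every item of a paginated list API by following `next` links.

    `fetch` is assumed to GET a URL and return the decoded JSON body.
    """
    next_url: Optional[str] = url
    while next_url:
        page = fetch(next_url)
        yield from page["results"]
        next_url = page.get("next")  # null (None) on the last page
```

Following next rather than computing offsets yourself keeps the client correct even if an API changes its default page size.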
Many APIs also accept further query parameters to filter the results. For example, you can filter sources (leads) by project using:
GET /api/v1/leads/?project=2
The API documentation at /api/v1/docs/ also lists filters available for each API.

Ordering

To order the results by a particular field, use the ordering query parameter. Results are sorted in ascending order by default; prefix the field with a minus sign (-) for descending order.
GET /api/v1/leads/?ordering=title
GET /api/v1/leads/?ordering=-title
Camel Case vs Snake Case

JSON requests and responses use camel case by default, and JSON requests in snake case are also supported. However, filtering and ordering parameters must be in snake case, because they correspond directly to SQL column names, which by convention are in snake case.
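A client that works with the camelCase names from responses (such as modifiedAt) therefore has to convert them before building filter or ordering parameters. A minimal sketch, assuming field names follow the simple lowerCamelCase pattern seen in the examples; the helper names are hypothetical, and which fields a given API accepts still has to be checked in its documentation.

```python
import re
from typing import Any, Optional
from urllib.parse import urlencode

# Zero-width match before every uppercase letter except at the start.
_CAMEL_BOUNDARY = re.compile(r"(?<!^)(?=[A-Z])")


def to_snake_case(name: str) -> str:
    """Convert a camelCase field name (createdAt) to snake_case (created_at)."""
    return _CAMEL_BOUNDARY.sub("_", name).lower()


def list_query(ordering: Optional[str] = None, **filters: Any) -> str:
    """Build a query string with snake_case filter and ordering names.

    A leading "-" on `ordering` (descending order) is preserved.
    """
    params = {to_snake_case(key): value for key, value in filters.items()}
    if ordering:
        prefix = "-" if ordering.startswith("-") else ""
        params["ordering"] = prefix + to_snake_case(ordering.lstrip("-"))
    return urlencode(params)
```

For example, list_query(ordering="-modifiedAt", project=2) produces a query string that can be appended to /api/v1/leads/?.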

HTTP Status Codes

Successful Requests:
  • 201 : When a new resource is created. Normally for POST requests only.
  • 200 : For any other successful requests.
Client Errors:
  • 400 : Bad request: the JSON request doesn’t contain valid fields
  • 401 : Unauthorized: needs a logged-in user
  • 403 : Forbidden: user does not have permission for the requested resource
  • 404 : Resource is not found in the database
  • 405 : Method not allowed: the HTTP method is not supported by this endpoint
Server Errors:
  • 500 : See the internal error code below for the actual error
Other codes, such as 502 and 504, may be raised by nginx, the WSGI server, or DNS servers; the web application itself is not responsible for them.

Internal Error Codes

For most types of errors, such as forbidden, unauthorized, and not found, the internal error code is the same as the HTTP status code. For server errors, every error except the predefined ones listed below has internal error code 500 by default.
  • 401 : User is not authenticated. Access token is required in the authorization header.
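Putting the two layers together, a client can report both the HTTP status and the internal errorCode. A minimal sketch; the status descriptions paraphrase the lists above, the helper name is hypothetical, and treating unlisted codes such as 502 or 504 as coming from the infrastructure rather than the application is an assumption based on the note above.

```python
from typing import Any, Mapping

# Meanings paraphrased from the HTTP status code list above.
DOCUMENTED_STATUS = {
    400: "Bad request",
    401: "Unauthorized",
    403: "Forbidden",
    404: "Resource not found",
    405: "Method not allowed",
    500: "Server error",
}


def describe_error(status: int, body: Mapping[str, Any]) -> str:
    """Summarize an error from the HTTP status and the body's internal errorCode."""
    # The internal code usually equals the HTTP status, so fall back to it.
    code = body.get("errorCode", status)
    meaning = DOCUMENTED_STATUS.get(status)
    if meaning is None:
        # Assumption: unlisted codes (e.g. 502, 504) come from nginx, WSGI or DNS.
        meaning = "not listed for the DEEP API (possibly raised by nginx, WSGI or DNS)"
    return f"HTTP {status} (internal code {code}): {meaning}"
```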

Testing

Backend

Tests are written using Django and Django REST framework test classes. Test files are stored in the tests directory inside each app and the utils module.
▾ docs/
  mixin_backend.md
▾ server/
  ▾ apps/
     ▾ users/
        ▾ tests/
           __init__.py
           test_apis.py
The following is an example for testing django-rest API:

backend/project/tests/test_apis.py


from rest_framework.test import APITestCase
from user.tests.test_apis import AuthMixin
from project.models import Project


class ProjectMixin():
    """
    Project related methods
    """
    def create_or_get_project(self):
        """
        Create new or return recent projects
        """
        project = Project.objects.first()
        # ...
        return project


class ProjectApiTest(AuthMixin, ProjectMixin, APITestCase):
    """
    Project Api Test
    """
    def setUp(self):
        pass

    def test_create_project(self):
        pass
The following is an example for testing utils:

backend/project/tests/test_apis.py

from django.test import TestCase
from utils.extractors import (
    PdfExtractor, DocxExtractor, PptxExtractor
)


class ExtractorTest(TestCase):
    """
    Import Test
    Pdf, Pptx and docx
    Note: Html test is in WebDocument Test
    """
    def setUp(self):
        pass

    def extract(self, extractor, path):
        pass

    def test_docx(self):
        """
        Test Docx import
        """
        pass
References:
  • Writing Django tests
  • Writing API tests
  • Test Mixin

FrontEnd


Tests are written using Enzyme and Jest. Test files are stored in a __tests__ directory inside the same directory as the component or logic under test.

The following is an example of how to test if a component renders properly.
// components/Table/__tests__/index.js

import React from 'react';
import { shallow } from 'enzyme';
import Table from '../index';

// Describe a test suite: a group of related tests
describe('<Table />', () => {
   // Initial setup (synchronous)
   const tableData = [
       { a: 'b', c: 'd' },
       { a: 'e', c: 'f' },
   ];

   const tableHeaders = [
       { a: '1', c: '2' },
   ];

   const wrapper = shallow(
       <Table
           data={tableData}
           headers={tableHeaders}
       />,
   );

   // Test if it renders
   it('renders properly', () => {
       expect(wrapper.length).toEqual(1);
   });
   // More tests
   // ...
});
If the initial setup is asynchronous, one may use beforeEach or beforeAll functions, both of which can return a promise object.

To test redux-connected components, one can use redux-mock-store:
import React from 'react';
import { Provider } from 'react-redux';
import configureStore from 'redux-mock-store';
import { shallow } from 'enzyme';
import Table from '../index';

// Placeholder state for illustration; use the state your component reads
const initialState = { someProp: 'someValue' };

describe('<Table />', () => {
    const mockStore = configureStore();
    const store = mockStore(initialState);
    const wrapper = shallow(<Provider store={store}><Table /></Provider>);
    it('renders properly', () => {
        expect(wrapper.length).toEqual(1);
        expect(wrapper.prop('someProp')).toEqual(initialState.someProp);
    });
});
More examples using redux: Writing tests.

For event-based behavioral testing, Enzyme’s simulate method can be used as a helper:
wrapper.find('button').simulate('click');
expect(wrapper.find('.no-of-clicks').text()).toBe('1');

Contributing

How to Contribute

Thank you for wanting to contribute to this project! We look forward to working with you. Here is a basic set of guidelines for contributing to our project.
  • Please check out the issues in the deeper repository or in the individual repository you want to contribute to. You can also create a new issue describing what you want to work on; please follow these guidelines when creating it. You will need to include a link to the issue when creating a pull request (PR).
  • Fork the relevant repository.
  • Make the necessary changes, taking into account the coding guidelines in the individual repositories. General guidelines include using “git rebase” to organize commits and following these rules for commit messages:
      • Separate the subject from the body with a blank line
      • Limit the subject line to 50 characters
      • Capitalize the subject line
      • Do not end the subject line with a period
      • Use the imperative mood in the subject line
  • After your work is complete, open a pull request from your fork to the corresponding the-deep repository. Describe what you did in as much detail as possible, and link to the issue in the description.
  • Our development team will review the pull request and merge it if it is satisfactory. They may also ask for explanations or modifications.
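The commit-message rules above are easy to check mechanically, for example from a local Git commit-msg hook (Git passes that hook the path of the file holding the proposed message). A minimal sketch with a hypothetical function name; the imperative-mood rule is not checked because a simple script cannot detect it reliably.

```python
def commit_message_problems(message: str) -> list[str]:
    """Return the commit-message rules that `message` breaks.

    Hypothetical helper; imperative mood is intentionally not checked.
    """
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["missing subject line"]
    subject = lines[0]
    problems = []
    if len(subject) > 50:
        problems.append(f"subject is {len(subject)} characters (limit 50)")
    if not subject[0].isupper():
        problems.append("subject is not capitalized")
    if subject.endswith("."):
        problems.append("subject ends with a period")
    # The second line, if present, must be blank to separate subject and body.
    if len(lines) > 1 and lines[1].strip():
        problems.append("no blank line between subject and body")
    return problems
```

An empty list means the message passes every rule the function checks.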

Contributing to the Backend

Python Coding Guidelines
  • Follow PEP 8.
  • Use 4 spaces, never tabs.
  • Multiple Imports
Avoid this
from .serializers import ProjectSerializer, ProjectMembershipSerializer
Do this
from .serializers import (
    ProjectSerializer, ProjectMembershipSerializer
)

FAQ

How do I get a Python shell (with Django initialized)?
docker-compose up -d
docker-compose exec web ./manage.py shell

Contributing to the FrontEnd

React
  • setState may be applied asynchronously. If you need an action to run after the state has been updated, pass a callback as its second argument.
  • Use immutable objects most of the time. Instead of mutating an object, use immutable-helpers.
  • If a re-render is expected after a value changes, keep the value in this.state; otherwise, don’t keep it in this.state.
  • The Redux store holds global state.
  • If possible, don’t instantiate objects and functions in the render method, and avoid writing complex logic there.
  • When setting a new state on a component, pass only the attributes that need to change; setState merges them into the existing state.
Internal Libraries
  • Use RestRequest for all REST api calls.
  • Use Form to validate form data.
DEEP React Best Practices
  • Most likely, you will never need jQuery.
  • In JSX, if an element has more than one attribute, put each attribute on its own line, sorted alphabetically.
  • For imports, separate absolute and relative imports with a blank line. Sort all default imports alphabetically.
  • Write propTypes and defaultProps for React classes at the top of the file, right after the imports, with their attributes sorted alphabetically.
  • Prefer decorators for Higher-Order Components.
  • Always use selectors to access data from the Redux store. Selectors can also perform additional calculations, and those calculations are cached.
  • Always use action creators to dispatch actions to the Redux store, and always use action types to define an action creator.

Change Log

DEEP is an open-source, community-driven web application to intelligently collect, tag, analyze, and export secondary data. This repository contains the code for DEEP 2.0, a large full-stack rewrite of DEEP 1.0.

DEEP’s brain is powered by DEEPL, a suite of tools that provides NLP recommendations to the platform. Here is its GitHub repository.

More information regarding our changelog can be found on the wiki.

Documentation

Introduction

DEEP’s documentation is built on top of Sphinx and uses a theme provided by Read the Docs. We accept contributions to the documentation of the DEEP project too.

Contributions to Documentation

Contributions to DEEP’s documentation must follow the contribution guidelines, just like any other code contribution. DEEP’s documentation is generated as a static site using Sphinx; during deployment, a pre-deployment pipeline builds the docs the same way. To build the docs locally, follow the steps below.

Steps to generate DEEP docs locally

1. Navigate to the documentation folder:
cd docs/
2. Install sphinx and supporting packages:
pip install -r requirements.txt
3. Generate static documentation locally:
make html
4. View the generated docs by opening the index file in your browser, at the following path: <path-to-project>/docs/_build/html/index.html

Useful References

  • sphinx-autobuild is a tool that automatically rebuilds the documentation every time a change is detected in the docs/ folder
  • rst Cheatsheet for a handy reference on reStructuredText, the markup language used by Sphinx
GitHub Repository
DEEP is an innovative, open-source resource designed to empower organizations with crucial insights during times of crisis. We leverage analysis and artificial intelligence to support data-informed decision-making. The DEEP platform is built on open-source technology and hosted on GitHub, the world's leading platform for collaborative software development. Its open-source codebase provides complete transparency, allowing anyone to review, audit, and understand how the DEEP platform operates. The DEEP platform is freely accessible to anyone, anywhere, removing barriers to critical data and insights that can save lives during crises.