
### Methodological extensions

The design of MOSAIKS naturally provides two additional useful properties: suitability for fusing features with data from other sensors, and the ability to attribute image-scale predictions to sub-image regions.

Available satellites exhibit a diversity of properties (e.g., wavelength, timing of sampling) that can be used to improve SIML predictions33. While most SIML approaches, including the above analysis, use a single sensor, the design of MOSAIKS allows seamless integration of data from additional satellites because the regression step is linear in the features. To demonstrate this, we include nighttime lights as a second data source in the analysis of survey data from Rwanda, Haiti, and Nepal discussed above (Supplementary Note 3.1). The approach mirrors that of the hybrid MOSAIKS-ResNet18 model discussed previously in that features extracted from the nighttime lights data are simply concatenated with those from MOSAIKS prior to the regression step. In all 36 tasks, predictions either improved or were unchanged when nighttime imagery was added to daytime imagery in the model (average $\Delta R^2 = 0.03$). This approach naturally optimizes how data from all sensors are used without requiring that users possess expertise on each technology.
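The concatenation step described above can be illustrated with a minimal sketch. Everything here is synthetic and hypothetical: the arrays `X_day` and `X_night` stand in for daytime and nighttime-lights features, and the helper names `ridge_fit` and `r2` are our own, not part of the published code.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def r2(y, y_hat):
    """Coefficient of determination."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
n = 400
X_day = rng.normal(size=(n, 20))    # stand-in for daytime image features
X_night = rng.normal(size=(n, 5))   # stand-in for nighttime-lights features
# Synthetic label that depends on both sensors.
y = X_day[:, 0] + 0.5 * X_night[:, 0] + 0.1 * rng.normal(size=n)

# Fusion is a column-wise concatenation before the regression step.
X_fused = np.concatenate([X_day, X_night], axis=1)

train, test = slice(0, 300), slice(300, n)
w_d, b_d = ridge_fit(X_day[train], y[train])
w_f, b_f = ridge_fit(X_fused[train], y[train])
r2_day = r2(y[test], X_day[test] @ w_d + b_d)
r2_fused = r2(y[test], X_fused[test] @ w_f + b_f)
```

Because the regression is linear in the features, adding a sensor only widens the design matrix; the fitting procedure itself is unchanged.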

Many use cases would benefit from SIML predictions at finer resolution than is available in training data33,34. Here we show that MOSAIKS can estimate the relative contribution of sub-regions within an image to overall image-level labels, even though only aggregated image-level labels are used in training (see Fig. 4c and Supplementary Fig. 12). Such label super-resolution prediction follows from the functional form of the featurization and linear regression steps in MOSAIKS, allowing it to be analytically derived for labels that represent nearly linear combinations of ground-level conditions (Supplementary Note 2.9 and Supplementary Fig. 11). We numerically assess label super-resolution predictions of MOSAIKS for the forest cover task, since raw label data are available at much finer resolution than our image labels. Provided only a single label per image, MOSAIKS recovers substantial within-image signal when predicting forest cover in 4 to 1024 sub-labels per label (within-image $R^2$ = 0.54–0.32; see Supplementary Fig. 13 for a plot of performance against number of sub-labels and Supplementary Note 2.9 for methodological details).

## Discussion

We develop a new approach to SIML that achieves practical generalization across tasks while exhibiting performance that is competitive with deep-learning models optimized for a single task. Crucial to planet-scale analyses, MOSAIKS requires orders of magnitude less computation time to solve a new task than CNN-based approaches and it allows 1km-by-1km image data to be compressed ~6–500 times before storage/transmission (see Methods). Such compression is a deterministic operation that could theoretically be implemented in satellite hardware. We hope these computational gains, paired with the relative simplicity of using MOSAIKS, will democratize access to global-scale SIML technology and accelerate its application to solving pressing global challenges. We hypothesize that there exist hundreds of variables observable from orbit whose application could improve human well-being if measurements were made accessible.

While we have shown that in many cases MOSAIKS is a faster and simpler alternative to existing deep-learning methods, there remain contexts in which custom-designed SIML pipelines will continue to play a key role in research and decision-making, such as where resources are plentiful and performance is paramount. Existing ground-based surveys will also remain important. In both cases we expect MOSAIKS can complement these systems, especially in resource-constrained settings. For example, MOSAIKS can provide fast assessments to guide slower SIML systems or extend the range and resolution of ground-based surveys.

As real-world policy actions increasingly depend on SIML predictions, it is crucial to understand the accuracy, precision and sensitivity of these measurements. The low cost and high speed of retraining MOSAIKS enables unprecedented stress tests that can support robust SIML-based decision systems. Here, we tested the sensitivity of MOSAIKS to model parameters, number of training points, and degree of spatial extrapolation, and expect that many more tests can be developed and implemented to analyze model performance and prediction accuracies in context. To aid systematic bench-marking and comparison of SIML architectures, the labels and features used in this study are made publicly available; to our knowledge this represents the largest multi-label benchmark dataset for SIML regression tasks. The high performance of RCF, a relatively simple featurization, suggests that developing and bench-marking other unsupervised SIML methods across tasks at scale may be a rich area for future research.

By distilling SIML to a pipeline with simple and mathematically interpretable components, MOSAIKS facilitates development of methodologies for additional SIML use cases and enhanced performance. For example, the ability of MOSAIKS to achieve label super-resolution is easily derived analytically (Supplementary Note 2.9). Furthermore, while we have focused here on tri-band daytime imagery, we showed that MOSAIKS can seamlessly integrate data from multiple sensors through simple concatenation, extracting useful information from each source to maximize performance. We conjecture that integrating new diverse data, from both satellite and non-satellite sources, may substantially increase the predictive accuracy of MOSAIKS for tasks not entirely resolved by daytime imagery alone; such integration using deep-learning models is an active area of research35.

We hope that MOSAIKS lays the foundation for the future development of an accessible and democratized system of global information sharing, where, over time, imagery from all available global sensors is continuously encoded as features and appended to a single table of data, which is distributed and used planet-wide. As a step in this direction, we make a global cross-section of features publicly available. Such a unified global system may enhance our collective ability to observe and understand the world, a necessary condition for tackling pressing global challenges.

## Methods

### Overview

Here we provide additional information on our implementation of MOSAIKS and experimental procedures, as well as a description of the theoretical foundation underlying MOSAIKS. Full details on the methodology behind MOSAIKS can be found throughout Supplementary Note 2.

### Implementation of MOSAIKS

We begin with a set of images ${\left\{{\mathbf{I}}_{\ell }\right\}}_{\ell =1}^{N}$, each of which is centered at a location indexed by $\ell \in \{1, \ldots, N\}$. MOSAIKS generates task-agnostic feature vectors $\mathbf{x}(\mathbf{I})$ for each satellite image $\mathbf{I}$ by convolving an $M \times M \times S$ "patch", $\mathbf{P}_k$, across the entire image. $M$ is the width and height of the patch in units of pixels and $S$ is the number of spectral bands. In each step of the convolution, the inner product of the patch and an $M \times M \times S$ sub-image region is taken, and a ReLU activation function with bias $b_k = 1$ is applied. Each patch is a randomly sampled sub-image from the set of training images ${\left\{{\mathbf{I}}_{\ell }\right\}}_{\ell =1}^{N}$ (Supplementary Fig. 5). In our main analysis, we use patches of width and height $M = 3$ (Supplementary Fig. 6) and $S = 3$ bands (red, green, and blue). To create a single summary metric for the image-patch pair, the resulting activation values are then averaged across the entire image, generating the $k$th feature $x_k(\mathbf{I})$, derived from patch $\mathbf{P}_k$. The dimension of the resulting feature space is equal to $K$, the number of patches used, and in all of our main analyses we employ $K = 8{,}192$ (i.e., $2^{13}$). Both images and patches are whitened according to a standard image preprocessing procedure before convolution (Supplementary Note 2.3).
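The featurization just described can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the published implementation: the function names `sample_patches` and `featurize` are hypothetical, whitening is omitted, the bias is assumed to be added to the inner product before the ReLU, and the toy images are random arrays.

```python
import numpy as np

def sample_patches(images, K, M, rng):
    """Draw K random M x M x S patches from an image stack of shape (N, H, W, S)."""
    N, H, W, S = images.shape
    patches = np.empty((K, M, M, S))
    for k in range(K):
        n = rng.integers(N)              # which image
        i = rng.integers(H - M + 1)      # top-left corner of the patch
        j = rng.integers(W - M + 1)
        patches[k] = images[n, i:i + M, j:j + M, :]
    return patches

def featurize(image, patches, bias=1.0):
    """Return K features: the average over positions of ReLU(<patch, sub-image> + bias)."""
    M = patches.shape[1]
    # Every M x M sub-image; shape (H-M+1, W-M+1, S, M, M).
    windows = np.lib.stride_tricks.sliding_window_view(image, (M, M), axis=(0, 1))
    windows = windows.transpose(0, 1, 3, 4, 2)           # -> (..., M, M, S)
    # Inner product of each sub-image with each patch; shape (H-M+1, W-M+1, K).
    inner = np.tensordot(windows, patches, axes=([2, 3, 4], [1, 2, 3]))
    activation = np.maximum(inner + bias, 0.0)           # ReLU (bias sign is an assumption)
    return activation.mean(axis=(0, 1))                  # average pool over the image

rng = np.random.default_rng(0)
images = rng.normal(size=(4, 16, 16, 3))   # toy stand-in for satellite images
patches = sample_patches(images, K=8, M=3, rng=rng)
x = featurize(images[0], patches)          # one feature vector of length K
```

The patches are drawn without reference to any labels, so the same feature vector can be reused for every downstream task.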

In practice, this one-time featurization can be centrally computed and the features $x_k(\mathbf{I})$ then distributed to users in tabular form. A user need only (i) obtain and link the subset of these features that match spatially with their own labels and then (ii) solve linear regressions of the labels on the features to learn nonlinear mappings from the original image pixel values to the labels (the nonlinearity of the mapping between image pixels and labels stems from the nonlinearity of the ReLU activation function). We show strong performance across seven different tasks using ridge regression to fit the relationship between labels $y$ and features $x_k(\mathbf{I})$ in this second step, although future work may demonstrate that other fitting procedures yield similar or better results for particular tasks.
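The two user-side steps, linking and regression, might look like the following sketch. The feature table, cell identifiers, and labels are synthetic, and the variable names are hypothetical; the ridge penalty `lam` is an arbitrary illustrative value, whereas in practice it would be chosen by cross-validation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical shared feature table: one row of K features per grid-cell id.
cell_ids = np.arange(1000)
features = rng.normal(size=(1000, 16))

# A user's labels cover only some cells (ids and values are synthetic).
label_ids = rng.choice(cell_ids, size=200, replace=False)
true_w = rng.normal(size=16)
labels = features[label_ids] @ true_w + 0.05 * rng.normal(size=200)

# (i) Link: look up the feature row whose cell id matches each label.
row_of = {int(cid): r for r, cid in enumerate(cell_ids)}
X = features[[row_of[int(c)] for c in label_ids]]

# (ii) Solve ridge regression: w = (X'X + lam * I)^{-1} X'y.
lam = 1e-2
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ labels)

# Predictions are available for every cell in the table, labeled or not.
y_hat_all = features @ w
```

No imagery is touched at this stage; the user works only with a table of numbers and a linear solver.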

Implementation of this one-time unsupervised featurization takes about the same time to compute as a single forward pass of a CNN. With K = 8,192 features, featurization results in a roughly 6 to 1 compression of stored and transmitted imagery data in the cases we study. Notably, storage and computational cost can be traded off with performance by using more or fewer features from each image (Fig. 3b). Since features are random, there is no natural value for K that is specifically preferable.
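One way to reproduce the roughly 6 to 1 figure is a back-of-the-envelope byte count for the K = 8,192 (2^13) features of the main analysis. The storage formats below are assumptions, not stated in the text: 8-bit pixel values and 32-bit floating-point features, applied to the 256 × 256 × 3 image size quoted later in the Methods.

```python
def compression_ratio(K, height=256, width=256, bands=3,
                      bytes_per_pixel_value=1,   # assumption: 8-bit imagery
                      bytes_per_feature=4):      # assumption: 32-bit floats
    """Ratio of raw image bytes to feature-vector bytes."""
    image_bytes = height * width * bands * bytes_per_pixel_value
    feature_bytes = K * bytes_per_feature
    return image_bytes / feature_bytes

ratio_main = compression_ratio(8192)   # 196,608 bytes / 32,768 bytes = 6.0
ratio_small = compression_ratio(100)   # about 490 with only 100 features
```

Under these assumptions, keeping fewer features pushes the ratio toward the upper end of the range reported in the Discussion, at the cost of predictive performance.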

Tasks were selected based on diversity and data availability, with the goal of evaluating the generalizability of MOSAIKS (Supplementary Note 1.1). Results for all tasks evaluated are reported in the paper. We align image and label data by projecting imagery and label information onto a ~1 km × 1 km grid, which was designed to ensure zero spatial overlap between observations (Supplementary Notes 2.1 and 2.2).

Images are obtained from the Google Static Maps API (Supplementary Note 1.2)36, and labels for the seven tasks are obtained from refs. 2,31,37,38,39,40,41. Details on data are described in Supplementary Table 1 and Supplementary Note 1.

### US experiments

From this grid we sample 20,000 hold-out test cells and 80,000 training and validation cells from within the continental US (Supplementary Note 2.4). To span meaningful variation in all seven tasks, we generate two of these 100,000-sample data sets according to different sampling methods. First, we sample uniformly at random across space for forest cover, elevation, and population density, tasks that exhibit rich variation across the US. Second, we sample via a population-weighted scheme for nighttime lights, income, road length, and housing price, tasks for which meaningful variation lies within populated areas of the US. Some sample sizes are slightly reduced due to missing label data (N = 91,377 for income, 80,420 for housing price, and 67,968 for population density). We model labels whose distribution is approximately log-normal using a log transformation (Supplementary Note 2.5 and Supplementary Table 3).

Because fitting a linear model is computationally cheap, relative to many other SIML approaches, it is feasible to conduct numerous sensitivity tests of predictive skill. We present cross-validation results from a random sample, while also systematically evaluating the behavior of the model with respect to: (a) geographic distance between training and testing samples, i.e., spatial cross-validation, (b) the dimension K of the feature space, and (c) the size N of the training set (Fig. 3, Supplementary Notes 2.7 and 2.8). We represent uncertainty in each sensitivity test by showing variance in predictive performance across different training and validation sets. We also benchmark model performance and computational expense against an 18-layer variant of the ResNet Architecture, a common deep network architecture that has been used in satellite-based learning tasks42, trained end-to-end for each task and a transfer learning approach24 utilizing an unsupervised featurization based on the last hidden layer of a 152-layer ResNet variant pre-trained on natural imagery and applied using ridge regression (Supplementary Notes 3.1 and 3.2).
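As an illustration of test (a), one simple way to separate training and test samples geographically is a checkerboard of square blocks, where widening the blocks increases the typical distance between a test point and its nearest training point. The scheme and the function name `checkerboard_split` are assumptions made for this sketch; the exact procedure used is described in the Supplementary Notes.

```python
import numpy as np

def checkerboard_split(lon, lat, block_deg):
    """Boolean mask: True for points in 'training' squares of a checkerboard."""
    col = np.floor(lon / block_deg).astype(int)
    row = np.floor(lat / block_deg).astype(int)
    return (col + row) % 2 == 0

rng = np.random.default_rng(2)
# Synthetic sample locations spanning roughly the continental US.
lon = rng.uniform(-125, -66, size=1000)
lat = rng.uniform(25, 49, size=1000)

is_train = checkerboard_split(lon, lat, block_deg=4.0)
train_idx, test_idx = np.where(is_train)[0], np.where(~is_train)[0]
```

Repeating the fit for several values of `block_deg` traces out how predictive skill decays with spatial extrapolation, which is cheap when each fit is a linear regression.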

### Global experiment

To demonstrate performance at scale, we apply the same approach used within the data-rich US context to global imagery and labels. We employ a target sample of N = 1,000,000, which drops to a realized sample of N = 423,476 due to missing imagery and label data outside the US (Fig. 4). We generate predictions for all tasks with labels that are available globally (forest cover, elevation, population density, and nighttime lights) (Supplementary Note 2.10).

### Label super-resolution experiment

Predictions at label super-resolution (i.e., higher resolution than that of the labels used to train the model), shown in Fig. 4c, are generated for forest cover and population density by multiplying the trained ridge regression weights by the un-pooled feature values for each sub-image and applying a Gaussian filter to smooth the resulting predictions (Supplementary Note 2.9). Additional examples of label super-resolution performance are shown in Supplementary Fig. 12. We quantitatively assess label super-resolution performance (Supplementary Fig. 13) using forest cover, as raw forest cover data are available at substantially finer resolution than our common ~1 km × 1 km grid. Performance is evaluated by computing the fraction of variance ($R^2$) within each image that is captured by MOSAIKS, across the entire sample.
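The core of this procedure is the linearity of the prediction in the pooled features. The sketch below uses synthetic numbers and assumes the image is divided into equally sized sub-images, so that image-level features are the plain average of sub-image features; the Gaussian smoothing step is omitted, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
K, G = 32, 4                        # K features, G x G grid of sub-images
unpooled = rng.random((G, G, K))    # un-pooled feature values per sub-image
w = rng.normal(size=K)              # stand-in for trained ridge weights
intercept = 0.2                     # stand-in for the fitted intercept

# Image-level features are the average of the sub-image features.
pooled = unpooled.mean(axis=(0, 1))
y_image = pooled @ w + intercept

# Applying the same weights to each sub-image gives a G x G prediction map.
y_sub = unpooled @ w + intercept

# By linearity, the map averages back to the image-level prediction.
consistent = bool(np.isclose(y_sub.mean(), y_image))
```

The model is trained only on image-level labels, yet the same weights yield a within-image map whose average is consistent with the image-level prediction.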

### Theoretical foundations

MOSAIKS is motivated by the goal of enabling generalizable and skillful SIML predictions. It achieves this by embedding images in a basis that is both descriptive (i.e., models trained using this single basis achieve high skill across diverse labels) and efficient (i.e., such skill is achieved using a relatively low-dimensional basis). The approach for this embedding relies on the theory of random kitchen sinks16, a method for feature generation that enables the linear approximation of arbitrary well-behaved functions. This is akin to the use of polynomial features or discrete Fourier transforms for function approximation generally, such as functions of one dimension. When users apply these features in linear regression, they identify linear weightings of these basis vectors important for predicting a specific set of labels. With inputs of high dimension, such as the satellite images we consider, it has been shown experimentally17,18,19 and theoretically43 that a randomly selected subspace of the basis often performs as well as the entire basis for prediction problems.

### Convolutional random kitchen sinks

Random kitchen sinks approximate arbitrary functions by creating a finite series of features generated by passing the input variables $z$ through a set of $K$ nonlinear functions $g(z; \Theta_k)$, each parameterized by a draw of a random vector $\Theta$. The realized vectors $\Theta_k$ are drawn independently from a pre-specified distribution for each of the $k = 1, \ldots, K$ features. Given an expressive enough function $g$ and infinite $K$, such a featurization would be a universal function approximator43. In our case, such a function $g$ would encode interactions between all subsets of pixels in an image. Unfortunately, for an image of size 256 × 256 × 3, there are $2^{256 \times 256 \times 3}$ such subsets. Therefore, the fully-expressive approach is inefficient in generating predictive skill with reasonably concise $K$ because each feature encodes more pixel interactions than are empirically useful.
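A one-dimensional toy example may help fix ideas. It uses random cosine features, one common choice of $g$ in the random kitchen sinks literature; this choice, the target function, the frequency scale, and the penalty are all assumptions made for illustration and are unrelated to the convolutional features used for imagery.

```python
import numpy as np

rng = np.random.default_rng(4)

def rks_features(z, omega, phase):
    """g(z; Theta_k) = cos(omega_k * z + phase_k), one column per random draw."""
    return np.cos(np.outer(z, omega) + phase)

def target(z):
    """A smooth nonlinear function to approximate."""
    return np.sin(2 * z) * np.exp(-0.1 * z ** 2)

K = 200
omega = rng.normal(scale=3.0, size=K)        # random frequencies (part of Theta_k)
phase = rng.uniform(0, 2 * np.pi, size=K)    # random phases (part of Theta_k)

z_train = rng.uniform(-3, 3, size=500)
Phi = rks_features(z_train, omega, phase)

# Only the linear weights are learned; the random draws stay fixed.
lam = 1e-3
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(K), Phi.T @ target(z_train))

z_test = np.linspace(-2.5, 2.5, 50)
err = np.max(np.abs(rks_features(z_test, omega, phase) @ w - target(z_test)))
```

The nonlinearity lives entirely in the fixed random features, so fitting reduces to a linear solve, the same division of labor that MOSAIKS exploits.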

To adapt random kitchen sinks for satellite imagery, we use convolutional random features, making the simplifying assumption that most information contained within satellite imagery is represented in local image structure. Random convolutional features have been shown to provide good predictive performance across a variety of tasks from predicting DNA binding sites17 and solar flares19 to clustering photographs18 (kitchen sinks have also been used in a non-convolutional approach to classify individual pixels of hyper-spectral satellite data44). Applied to satellite images, random convolutional features reduce the number of effective parameters in the function by considering only local spatial relationships between pixels. This results in a highly expressive, yet computationally tractable, model for prediction.

Specifically, we create each $\Theta_k$ by extracting a small sub-image patch from a randomly selected image within our image set, as described above. These patches are selected independently, and in advance, of any of the label data. The convolution of each patch across the satellite image being featurized captures information from the entire ${\mathbb{R}}^{256 \times 256 \times 3}$ image using only $3M^2$ free parameters for each $k$. Creating and subsequently averaging over the activation map (after a ReLU nonlinearity) defines our instantiation of the kitchen sinks function $g(z; \Theta_k)$ as $g(\mathbf{I}; \mathbf{P}_k, b_k) = x_k(\mathbf{I})$, where $b_k$ is a scalar bias term. Our choice of this functional form is guided by both the structural properties of satellite imagery and the nature of common SIML prediction tasks, and it is validated by the performance demonstrated across tasks.
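Written out, one reading of this definition is the following, where the notation is ours: $\Omega$ denotes the set of positions at which the patch is applied, $\mathbf{I}_{i,j}$ is the $M \times M \times S$ sub-image at position $(i,j)$, and the bias is assumed to enter additively before the ReLU.

$$
x_k(\mathbf{I}) \;=\; \frac{1}{|\Omega|} \sum_{(i,j) \in \Omega} \max\!\left(0,\; \langle \mathbf{P}_k, \mathbf{I}_{i,j} \rangle + b_k \right)
$$

With $M = 3$ and three spectral bands, each patch has $3M^2 = 27$ entries, which is the per-feature parameter count quoted above.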

### Relevant structural properties of satellite imagery and SIML tasks

Three particular properties provide the motivation for our choice of a convolution and average-pool mapping to define g.

First, we hypothesize that convolutions of small patches will be sufficient to capture nearly all of the relevant spatial information encoded in images because objects of interest (e.g., a car or a tree) tend to be contained in a small sub-region of the image. This is particularly true in satellite imagery, which has a much lower spatial resolution than most natural imagery (Supplementary Fig. 6).

Second, we expect a single layer of convolutions to perform well because satellite images are taken from a constant perspective (from above the subject) at a constant distance and are (often) orthorectified to remove the effects of image perspective and terrain. Together, these characteristics mean that a given object will tend to appear the same when captured in different images. This allows for MOSAIKS’s relatively simple, translation invariant featurization scheme to achieve high performance, and avoids the need for more complex architectures designed to provide robustness to variation in object size and orientation.

Third, we average-pool the convolution outputs because most labels for the types of problems we study can be approximately decomposed into a sum of sub-image characteristics. For example, forest cover is measured by the percent of total image area covered in forest, which can equivalently be measured by averaging the percent forest cover across sub-regions of the image. Labels that are strictly averages, totals, or counts of sub-image values (such as forest cover, road length, population density, elevation, and night lights) will all exhibit this decomposition. While this is not strictly true of all SIML tasks, for example income and average housing price, we demonstrate that MOSAIKS still recovers strong predictive skill on these tasks. This suggests that some components of the observed variance in these labels may still be decomposable in this way, likely because they are well-approximated by functions of sums of observable objects.

The full MOSAIKS platform, encompassing both featurization and linear prediction, bears similarity to a few related approaches. Namely, it can be interpreted as a computationally feasible approximation of kernel ridge regression for a fully convolutional kernel or, alternatively, as a two-layer CNN with an incredibly wide hidden layer generated with untrained filters. A discussion of these interpretations and how they can help to understand MOSAIKS’s predictive skill can be found in Supplementary Note 2.3.

## Code availability

The code used in this analysis is provided in the GitHub repository available at https://github.com/Global-Policy-Lab/mosaiks-paper and additionally at https://doi.org/10.24433/CO.8021636.v2. The latter is part of the Code Ocean capsule, additionally containing data and computing environment (see Data Availability). On GitHub, release "v1.0" corresponds to the state of the codebase at the time of publication. See the repository's README for more detailed information.

## References

1. Union of Concerned Scientists, UCS Satellite Database (2019).
2. Hansen, M. C. High-resolution global maps of 21st-century forest cover change. Science 342, 850 (2013).
3. Inglada, J. Operational high resolution land cover map production at the country scale using satellite image time series. Remote Sens. 9, 95 (2017).
4. Jean, N. Combining satellite imagery and machine learning to predict poverty. Science 353, 790 (2016).
5. Robinson, C., Hohman, F. & Dilkina, B. A Deep Learning Approach for Population Estimation from Satellite Imagery. Proceedings of the 1st ACM SIGSPATIAL Workshop on Geospatial Humanities - GeoHumanities 2017 (ACM Press, New York, New York, USA, 2017), pp. 47–54.
6. Yu, L. Meta-discoveries from a synthesis of satellite-based land-cover mapping research. Int. J. Remote Sens. 35, 4573 (2014).
7. Haack, B. & Ryerson, R. Improving remote sensing research and education in developing countries: approaches and recommendations. Int. J. Appl. Earth Observation Geoinf. 45, 77 (2016).
8. Ball, J. E., Anderson, D. T. & Chan, C. S. A comprehensive survey of deep learning in remote sensing: theories, tools and challenges for the community. J. of Appl. Remote Sens. 11, 042609 (2017).
9. Romero, A., Gatta, C. & Camps-Valls, G. Unsupervised deep feature extraction for remote sensing image classification. IEEE Trans. Geosci. Remote Sens. 54, 1349 (2016).
10. Cheriyadat, A. M. Unsupervised feature learning for aerial scene classification. IEEE Trans. Geosci. Remote Sens. 52, 439 (2014).
11. Penatti, O. A. B., Nogueira, K. & dos Santos, J. A. Do deep features generalize from everyday objects to remote sensing and aerial scenes domains? Proc. of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 44–51 (2015).
12. Jean, N. et al. Tile2Vec: Unsupervised Representation Learning for Spatially Distributed Data. Proceedings of the AAAI Conference on Artificial Intelligence 33, pp. 3967–3974 (2019).
13. Head, A., Manguin, M., Tran, N. & Blumenstock, J. E. Can Human Development be Measured with Satellite Imagery? ICTD, pp. 1–8 (2017).
14. Zhu, L. et al. A review: Remote sensing sensors. Multi-purposeful application of geospatial data, pp. 19–42 (2018).
15. Littlepage, J. DigitalGlobe moves to the cloud with AWS Snowmobile.
16. Rahimi, A. & Recht, B. Weighted sums of random kitchen sinks: replacing minimization with randomization in learning. Adv. neural Inf. Process.… 1, 1313 (2008).
17. Morrow, A. et al. Convolutional Kitchen Sinks for Transcription Factor Binding Site Prediction. arXiv preprint (2017).
18. Coates, A. & Ng, A. Y. Neural networks: tricks of the trade. (Springer, 2012).
19. Jonas, E., Bobra, M., Shankar, V., Todd Hoeksema, J. & Recht, B. Flare prediction using photospheric and coronal image data. Sol. Phys. 293, 1 (2018).
20. Blumenstock, J. Don’t forget people in the use of big data for development. Nature 561 (2018).
21. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778 (2016).
22. Li, Y., Zhang, H., Xue, X., Jiang, Y. & Shen, Q. Deep learning for remote sensing image classification: a survey. Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. 8, 1 (2018).
23. Gu, Y., Wang, Y. & Li, Y. A survey on deep learning-driven remote sensing image scene understanding: Scene classification, scene retrieval and scene-guided object detection. Applied Sciences 9 (2019).
24. Pan, S. J. & Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22, 1345 (2010).
25. Xie, M., Jean, N., Burke, M., Lobell, D. & Ermon, S. Transfer learning from deep features for remote sensing and poverty mapping. Thirtieth AAAI Conference on Artificial Intelligence (2016).
26. Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. International Conference on Learning Representations (2015).
27. Athey, S. Beyond prediction: using big data for policy problems. Science 355, 483 (2017).
28. Reed, F. et al. Gridded Population Maps Informed by Different Built Settlement Products. Data 3, 33 (2018).
29. Bedi, T., Coudouel, A. & Simler, K., eds. More than a pretty picture: using poverty maps to design better policies and interventions (The World Bank, Washington, DC, 2007).
30. De Sherbinin, A. M., Yetman, G., MacManus, K. & Vinay, S. Improved mapping of human population and settlements through integration of remote sensing and socioeconomic data. AGUFM 2017, IN51H (2017).
31. U.S. Census Bureau, 2015 American Community Survey 5-Year Estimates, Table B19013.
32. U.S. Census Bureau, Budget Estimates, Fiscal Year 2021 (2021).
33. Tsagkatakis, G. et al. Survey of deep-learning approaches for remote sensing observation enhancement. Sensors (Switz.) 19, 1 (2019).
34. Malkin, K. et al. Label super-resolution networks. International Conference on Learning Representations (2018).
35. Hong, D. More diverse means better: multimodal deep learning meets remote sensing imagery classification. IEEE Trans. Geosci. Remote Sens. 0196, 1 (2020).
36.
37. Amazon Web Services, Terrain Tiles (2018).
38. Center for International Earth Science Information Network (CIESIN), Gridded Population of the World, Version 4 (2016).
39. NOAA National Centers for Environmental Information, Version 1 VIIRS Day/Night Band Nighttime Lights (2019).
40. U.S. Census Bureau, TIGER/Line Geodatabases (2016).
41. Zillow, ZTRAX: Zillow Transaction and Assessor Dataset (2017).
42. Perez, A. et al. Poverty prediction with public Landsat 7 satellite imagery and machine learning. NIPS 2017 Workshop on Machine Learning for the Developing World (2017).
43. Rahimi, A. & Recht, B. Uniform approximation of functions with random bases. 46th Annual Allerton Conference on Communication, Control, and Computing, IEEE, pp. 555–561 (2008).
44. Pérez-Suay, A. Randomized kernels for large scale Earth observation applications. Remote Sens. Environ. 202, 54 (2017).

## Acknowledgements

We thank Patrick Baylis, Joshua Blumenstock, Jennifer Burney, Hannah Druckenmiller, Jonathan Kadish, Alyssa Morrow, James Rising, Geoffrey Schiebinger, Adam Storeygard and participants in seminars at UC Berkeley, University of Chicago, Harvard, American Geophysical Union, the World Bank, the United Nations Development Program & Environment Program, Planet Inc., The Potsdam Institute for Climate Impact Research, the National Bureau of Economic Research, and The Workshop in Environmental Economics and Data Science for helpful comments and suggestions. We acknowledge funding from the NSF Graduate Research Fellowship Program (Grant DGE 1752814), the US Environmental Protection Agency Science To Achieve Results Fellowship Program (Grant FP91780401), the NSF Research Traineeship Program Data Science for the 21st Century, the Harvard Center for the Environment, the Harvard Data Science Initiative, the Sloan Foundation, and a gift from the Rhodium Group. The authors declare no conflicts of interest.

## Author information

### Contributions

E.R., J.P., T.C., I.B., V.S., B.R. and S.H. formulated the research idea and designed the overall analysis structure. V.S. collected imagery data and designed and implemented the featurization. J.P., T.C., I.B. and M.I. collected label data. E.R., J.P., T.C., I.B., V.S. and M.I. developed and carried out experimental procedures. E.R., J.P., T.C., I.B., V.S., M.I., B.R. and S.H. analyzed and interpreted the output of the experiments. E.R., J.P., T.C., I.B. and S.H. wrote the paper with contributions from V.S. and B.R.

### Corresponding author

Correspondence to Solomon Hsiang.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

Peer review information Nature Communications thanks Grigorios Tsagkatakis and the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.