ARTÍCULOS

Crop yield estimation using satellite images: comparison of linear and non-linear models

Estimación del rendimiento de cultivos usando imágenes satelitales: comparación de modelos lineales y no lineales

 

Sayago, S. and Bocco, M.

Facultad de Ciencias Agropecuarias, Universidad Nacional de Córdoba. CC 509 (5000) Córdoba, Argentina.
Correspondence to: silvina.sayago@gmail.com

Fecha de recepción: 06/10/2017; fecha de aceptación: 11/06/2018

 


SUMMARY

Development of models for crop yield prediction using remote sensing allows accurate, reliable and timely estimations over large areas. Particularly, this information is necessary to ensure the adequacy of a nation's food supply as well as to aid policy makers and farmers. In Argentina, soybean (Glycine max (L.) Merr.) and corn (Zea mays L.) are the most important crops. The goal of this research was to develop and evaluate linear and non-linear models to estimate crop yield from satellite data. Particularly, we proposed and applied those models to obtain soybean and corn yield in the central region of Córdoba (Argentina) using Landsat and SPOT images. The models were designed taking into account all or some bands included in the images from one or both satellites. Results showed that models provided a good fit when all images are used, being superior the accuracy obtained by neural networks (NN). For soybean, the best estimation presented a coefficient of determination equal to 0.90 with NN and 0.82 with multiple linear regression models, and for corn 0.92 and 0.88, respectively. This study concludes that Landsat and SPOT images can be effectively used to predict, in early to mid-season crop growth stages, corn and soybean yield.

Keywords: Neural networks; Multiple linear regression; Soybean; Corn; Modelling.

RESUMEN

El desarrollo de modelos para predecir rendimiento de los cultivos mediante teledetección permite realizar estimaciones precisas y oportunas en grandes áreas. En particular, este es un insumo necesario para prever tanto la provisión de alimentos, como para contribuir con formuladores de políticas y agricultores. En Argentina, los cultivos más importantes son soja (Glycine max (L.) Merr.) y maíz (Zea mays L.). El objetivo de este trabajo fue desarrollar modelos lineales y no lineales para estimar rendimiento de los cultivos a partir de datos satelitales. En particular se propuso y se aplicaron esos modelos en la región central de Córdoba (Argentina), usando imágenes Landsat y SPOT. Los modelos se construyeron considerando todas o algunas bandas incluidas en las imágenes de uno o ambos satélites. Los resultados mostraron un buen ajuste cuando se utilizaron todas las imágenes, siendo superior la precisión obtenida por las redes neuronales (NN). Para soja, la mejor estimación presentó un coeficiente de determinación de 0,90 con NN y 0,82 para los modelos de regresión múltiple y para maíz 0,92 y 0,88 respectivamente. En conclusión, las imágenes Landsat y SPOT pueden usarse para predecir en etapas tempranas y a mitad de la temporada de crecimiento, el rendimiento de maíz y soja.

Palabras clave: Redes neuronales; Regresión lineal múltiple; Soja; Maíz; Modelos.


 

INTRODUCTION

National crop area estimates are typically obtained using ground data. Accurate and timely assessment of crop yield is an essential process to ensure the adequacy of food supply. It provides policy makers, governmental agencies and farmers the necessary information to better manage harvest, storage, import/export, transportation and marketing activities (Craig and Atkinson, 2013; Gusso, Ducati, Veronez, Arvor and Gonzaga da Silveira Jr, 2013)

In the past, estimates of crop yield were done, in general, from the expertise of farmers or, as claimed by Geipel, Link and Claupein, (2014) "better estimations can be drawn from destructive sampling procedures in representative areas". Later, approaches using simple mathematical or statistical relationships were built with agronomic and meteorological data (Dadhwal and Ray, 2000), crop growth models (Thorp, De Jonge, Kaleita, Batchelor and Paz, 2008); some of those models were also applied in Argentina (Grassini etal. 2013; Milera and Crotti 2014).

In general, the processes and properties that regulate the crop yield vary in space and time. Therefore, the use of remote sensing techniques is an effective tool for assessing and monitoring crop yield because satellite data provide a spatial and periodic, comprehensive view of the real crop state (Geipel et al., 2014). The information provided by remote sensing allows obtaining field data at a lower cost compared to other methods, the coverage of large areas and the possibility of periodic repetitions (Bocco, Sayago and Willington, 2014). Several types of images can be used to monitor the agricultural surface. In cultivated areas where plots are small, due to its spatial resolution, data from Landsat and SPOT are very suitable; however, the temporal resolution of these satellites and the possibility of clouds during their passes are sometimes a constraint (Kuenzer, Dech and Wagner, 2015). In particular, Gusso et al. (2013) evaluated the Coupled Model performance, which is entirely based on MODIS images, to estimate soybean production prior to the crop harvest in Rio Grande do Sul (Brasil). For Argentina, Holzman, Rivas and Piccolo (2014) evaluated the capacity in estimating regional yield of soybean with linear and quadratic functions using MODIS.

Neural network (NN) methodology is an alternative modeling and simulation tool, which is specially designed for nonlinear systems and they do not require prior assumption about the statistical behavior or any specific relationship between variables (Bocco et al., 2014). Kaul, Hill and Walthall (2005) investigated the NN models performances in predicting corn (Zea mays L.) and soybean (Glycine max (L.) Merr.) yields for typical climatic conditions and Dai, Huo and Wang (2011) simulated sunflower (Helianthus annuus L.) yield using NN and multi-linear regression models. Green, Salas, Martinez and Erskine et al. (2007) presented a Spatial Analysis Neural Network (SANN) and Multiple Linear Regression (MLR) algorithms for analyzing grain yield for wheat (Triticum aestivum L.) in Colorado (USA). Alvarez (2009) proposed a regional analysis for wheat yield in the Argentine Pampas, using surface regression and NN methodologies.

Extensive agriculture is the main crop cultivation system used in productive land in the central area of Argentina. Soybean is the most important crop, taking into account the economic yield obtained by farmers and the sown area (20,480,000 ha in 2015-2016), followed by corn, with more than 6,904,000 ha in the same period. In particular, the province of Córdoba is the second largest producer of soybean and corn in Argentina, with approximately 27% and 28% of the sowed hectares, respectively (Sayago, Ovando, and Bocco, 2017; Ministerio de Agroindustria, 2017).

The goal of this work is to build linear and non-linear models to estimate crop yield, using Landsat 8 and SPOT 5 images and validate them with soybean and corn yield data from the central region of Córdoba (Argentina).

MATERIAL AND METHODS

Study area

The study area is located in the central plains of Córdoba (Argentina), in the sub-region known as "Pampa Alta" (31° to 32° S; 63° to 64° W, approximately). As Bocco et al. (2014) state, its productive characteristics are very representative of the central region of Argentina (Figure 1). It presents a slightly undulating relief of hills developed on loessic material of silt loam texture with a slight slope to the east; soils in this area are classified as Entic and Typic Haplustoll characterized by Rollán and Bachmeier (2014). The climate in the study area is classified as dry sub-humid and the average annual rainfall is approximately 800 mm, concentrated in summer (Sayago et al., 2017).


Figure 1. Composition of study area and main crops in Córdoba (Argentina): a) Soybean crop, b) SPOT image, and c) Corn plant

Soybean is sown by direct seeding with maturity groups 3 and 4 of transgenic varieties resistant to glyphosate, with spacing between rows of 0.52 m, without fertilizer application. On the other hand, corn is sown by direct seeding also, with a distance between rows of 0.53 m and fertilized with urea (Bocco et al., 2014; Ferreyra, 2016) (Figure 1). Both crops were sown between October 15 and November 15 and harvested between April 20 and May 30.

Ground data

Yield prediction modeling was carried out using data collected from 26 plots with soybean (14) and corn (12), whose area varied between 15 and 40 ha. They were visited throughout the growing season (October - June). Immediately before harvesting (Figure 2), four samples were collected from each plot (within a circular area of 0.25 m2). The samples were dried to constant weight in an oven, and the threshing was performed manually. To calculate the average yield of each plot the weight of the samples obtained was adjusted to 13% moisture. In order to validate the yields obtained from the field samples, all values were also compared with average data reported by the farm owners for each plot; both yields were equivalent.


Figure 2. Spot image (March 22). Bands 1, 2 and 3. Studied plots highlighted, corn (yellow) and soybean (white). Crop photographs before harvesting.

Yield values obtained were in a range between 2,800 and 4,900 kg/ha for soybean plots and varied between 8,000 to 10,500 kg/ha for corn plots. As for the latter, there were, two plots with low yield (4,500 and 6,700 kg/ha, respectively), which were included because they allowed validating the robustness of the models, in presence of a significant variability.

Satellite data

This study examined four images from Landsat 8 and SPOT 5.

Landsat 8: two images corresponding to December 28 (I1) and January 13 (I2), with seven bands (b1 to b7). These bands correspond to the Operational Land Imager (OLI) which together with the Thermal Infrared Sensor (TIRS) are the two sensors of Landsat 8. Both images have the same spatial (30 m), temporal (16 days) and radiometric resolutions. It was not possible to obtain images in February because the days were cloudy when images of the area were recorded by the satellite.

SPOT 5: two images corresponding to March 12 (I3) and March 22 (I4). These images have three bands (XS1, XS2, XS3) with 10 m and the SWIR band with 20 m of spatial resolution (resampled by the provider to 10 m).

All of them were obtained in clear sky days. Their specifications are described in Table 1.

Table 1.Satellite and wavelength of each band, for images corresponding to study area

In order to construct the models, digital numbers were converted to surface level reflectance with the method of dark object subtraction (Chavez, 1998). This correction and the rest of the image processing: cut of the study area, layer stacking, overlay regions of interest (ROI), were carried out using the software ENVI 4.6.1. ROIs covered the pixels with valid information without considering the borders. This was done because differences in the phenological stage of crops can occur and/or disturbed pixels from neighboring coverage could be included (Figure 3).


Figure 3. Photograph of a soybean plot with phenological differences between edge and inside.

A subset of all the images was created to cover the study area, in order to make Landsat 8 and SPOT 5 spatial resolutions comparative. For this, the pixel size from Landsat 8 was adjusted to SPOT 5 (each Landsat 8 pixel was divided into nine parts with the same attribute value).

The total number of data (pixels that conformed the ROIs) were 48,783 (soybean: 19,821 and corn: 28,962) and were randomized before being input into the models. Half of the reflectance data were used for the learning process, this is to obtain the models coefficients and weights, and the other half was used for the validation process, with Multiple Linear Regression and NN methodologies.

In this work, we did not consider the algebraic relations between image bands (vegetation indices) as input for the models. On the one hand, as Ovando et al. (2016) and long before Doraiswamy et al. (2004) stated, the relationship between yield and Normalized Difference Vegetation Index (NDVI) may not be adequate in extreme meteorological conditions. This is because the difference between very good or average crop conditions can be masked by NDVI saturation. On the other hand, Li, Liang, Wang and Qin (2007) have already worked with these methodologies including this vegetation index; however, these authors used MODIS images with 1 km resolution.

Models

a- Multiple Linear Regression models were proposed to estimate crop yield:

where the regression variables are y = surface reflectance of i band of SPOT/Landsat satellite and model constants are a and b (this last includes the error term).

b- Neural networks were used to estimate crop yield: they are a structure of neurons joined by nodes that transmit information from one neuron to another, which produces a result by means of mathematical functions (Hilera González and Martínez Hernando, 2000).

In this work, multilayer perceptron neural networks (Figure 4) were designed with an input layer whose number of neurons was equal to the number of bands considered in each model; a hidden layer was designed with the same number of neurons than the respective input layer. The output layer was built with only one neuron that indicates the calculated crop yield.


Figure 4. Scheme of the neural network of multilayer perceptron type.

The general steps that describe the training algorithm were executed according to Bocco et al. (2014). This process was repeated 3,000 times or until the mean square error was lower than a desired value (in these models equal to 0.01).

Table 2 summarizes all variables used, for each Multiple Linear Regression (MLRi) and Neural Network (NNi) models. They were designed taking into account all or some bands included in the images from one or both satellites.

Table 2.Regression and Neural Network models: variables and images used

Validation models and statistical analysis

A set of independent data (50% of total pixels, with known yields) was used for the validations of models. The errors were estimated using standard statistics: coefficient of determination (R2) and root mean square error (RMSE); also scatter plots between measured and estimated values were considered. The equation for RMSE is given by:

where n is the number of data, p is the number of parameters to be estimated, SSE is the sum of squared error.

RESULTS AND DISCUSSION

All regression and neural network models developed to estimate yield provided a good fit with measured yield. Table 3 presents the R2 and RMSE values, which show highly significant relationships between predicted and observed yield, for soybean.

Table 3.Statistic values for soybean yield estimation: Neural Network and Multiple Linear Regression models

Neural networks present the best fits when they are compared with MLR models, for soybean crop. Using all images and their bands, NN1 and MLR1 are the models that best estimate yield (Table 3). Models NN2 and MLR2, which did not include the SWIR (SPOT 5) and b6 - b7 bands (Landsat 8), also presented high values of coefficient of determination. It should be noted that if only early dates in the crop calendar are considered, i.e. only Landsat 8 images are used, with NN models, yield can be predicted with good accuracy (R2 = 0.88). Estimates with multiple linear regression models present a lower coefficient of determination and RMSE increases about 50% (MLR4 and MLR5). Models that use the two bands included in the NDVI definition, and only dates corresponding to late phenological crop stages (SPOT images), present good fit with neural network (NN7). This result confirms what Bauer (1975) stated, that the infrared reflectance decreased and the red one increased with maturity after the crop had reached their maximum vegetative growth.

The results obtained are comparable to Gusso et al. (2013), who evaluated the Enhanced Vegetation Index, which is part of MOD13Q1-V005 product, to estimate soybean production in Rio Grande do Sul (Brasil). Results obtained using linear least squares regression analysis estimated production, at municipality and state level, with R2 = 0.91 and R2 = 0.82 respectively. Prasad, Chai, Singh and Kafatos (2006) improved yield estimation considering NDVI (from AVHRR-NOAA), soil moisture, surface temperature and rainfall. With these four inputs, at Iowa state level, they obtained a linear regression model with R2=0.86. Doraiswamy, Akhmedov, Beard, Stern and Mueller (2007) developed a 5-year multi-regression algorithm and showed, at state level, coefficients of determination of 0.88 (Iowa) and 0.44 (Illinois) for soybean.

Among the models that estimate corn yield, the best fit is also reached with NN methodology, using images from one or two satellites (Table 4).

Table 4.Statistic values for corn yield estimation: Neural Network and Multiple Linear Regression models

Using NN and exclusively images from one satellite, models that estimate corn yield with high precision included as input the bands that define the NDVI (NN12 and NN14). Using NN models that estimate corn yield, Li et al. (2007) informed for MLR models, R2 values between 0.34 and 0.86 with RMSE values ranged from 1,290 to 2,500 kg/ ha and for NN models the R2 values ranged from 0.53 to 0.94 with RMSE between 770 and 1,900 kg/ ha. The RMSE values obtained by these authors significantly exceed those obtained in this work.

All NN models presented estimations with better fit values than MLR models, when estimates were analysed at plot scale. It can be observed, in Figure 5, that for low or high yields, in both crops, the models present the greater over or underestimations respectively.


Figure 5.Soybean and corn yield, measured and estimated values for the best models, at plot level.

Table 5 shows all coefficients for the best MLR models (two for soybean and two for corn, respectively). As previously stated the results of the work confirm the relevance of this methodology, although the coefficients are valid only for the study area.

Table 5.Regression coefficients for the best MLR models*

Mixed models, which included all image data and both crops, showed very good results, their statistic values are presented in Table 6.

Table 6. Statistic values for mixed models that estimate yield

The mixed models, which can be used without previously discriminating the crop, achieved an accuracy as good as NN1 and NN8 models (Figure 6). This figure also shows that MLR methodology underestimates the high yields and overestimates yields lesser than 3,500 kg/ha significantly.


Figure 6. Measured vs. estimated yields for NN1, NN8, MLR1, MLR8 (soybean or corn crop) and NN-MLR mixed models at plot level.

CONCLUSIONS

This study shows that Landsat 8 and SPOT 5 images can be used to predict corn and soybean yield in early to mid season crop growth stages. Multiple linear regression and neural network models were generated to estimate crop yield using, all or some, spectral bands of one or both satellites.

Both methodologies were efficient in capturing the relationship between crop yield and spectral values, although neural network models have higher precision than the multiple linear regression ones. However, the latter technique is a possible alternative due to its easy implementation.

If the crop type present in the plots is not known, the mixed models that include the presence of soybean and corn can accurately describe the crop yield.

The possibility of combining satellite images with climatologic or soil data to improve the performance of yield estimation, is the next step to be explored.

ACKNOWLEDGEMENTS

The authors express their gratitude to Secyt-UNC for the financial support for this study (Res. 113/17). Thanks to G. Ovando for his comments and contributions. Thanks to E. Willington, M. M. Nolasco, J. P Clemente, D. Faraoni and G. Selva. for the collaboration in data collection.

REFERENCES

1. Alvarez, R. (2009). Predicting average regional yield and production of wheat in the Argentine pampas by an artificial neural network approach. European Journal of Agronomy, 30(2), 70-77.

2. Bauer, M. E. (1975). The role of remote sensing in determining the distribution and yield of crops. Advances In Agronomy, 27, 271-304.

3. Bocco, M., Sayago, S. and Willington, E. (2014). Neural Network and Crop Residue Index Multiband models for estimating crop residue cover from Landsat TM and ETM+ images. International Journal of Remote Sensing, 35(10), 3651-3663.

4. Chavez Jr, P. (1989). Radiometric Calibration of Landsat Thematic Mapper Multispectral Images. Photogrammetric Engineering and Remote Sensing, 55, 1285-1294.

5. Craig, M. and Atkinson, D. (2013). A literature review of crop area estimation. Retrieved from The Food and Agriculture Organization of the United Nations website: http://www.fao.org/fileadmin/templates/ess/documents/meetings_and_workshops/GS_SAC_2013/Improving_methods_for_crops_estimates/Crop_Area_Estimation_Lit_review.pdf.

6. Dadhwal, V. and Ray, S. (2000). Crop assessment using remote sensing - Part II: crop condition and yield assessment. Indian Journal of Agricultural Economics, 55(2), 55-67.

7. Dai, X., Huo, Z. and Wang, H. (2011). Simulation for response of crop yield to soil moisture and salinity with artificial neural network. Field Crops Research, 121(3), 441-449.

8. Doraiswamy, P. C., Hatfield, J. L., Jackson, T. J., Akhmedov, B., Prueger, J. and Stern, A. (2004). Crop condition and yield simulations using Landsat and MODIS. Remote sensing of Environment, 92(4), 548559.

9. Doraiswamy, P. C., Akhmedov, B., Beard, L., Stern, A. and Mueller, R. (2007). Operational prediction of crop yields using MODIS data and products. International archives of photogrammetry, remote sensing and spatial information, Commission Working Group VIII WG VIII/10. Sciences Special Publications. European Commision DG JRC-Institute for the Protection and Security of the Citizen. Ispra, Italy.

10. Ferreyra, L. (2016). Rendimiento y comportamiento agronómico de híbridos de maíz en INTA Manfredi, campaña 2015/2016. Retrieved September 05, 2017 from Cartilla Digital Manfredi https://inta.gob.ar/sites/default/files/inta_rendimiento_y_comportamiento_agronomico_de_hibridos_de_maiz_en_inta_eea_manfredi_campana_2016_16.pdf.

11. Geipel, J., Link, J. and Claupein, W. (2014). Combined spectral and spatial modeling of corn yield based on aerial images and crop surface models acquired with an unmanned aircraft system. Remote Sensing, 6(11), 10335-10355.

12. Grassini, P., Monzón, J., Aramburu, F., Mercau, J., Jobbágy, E., Taboada, M. and Hall, A. (2013). Desarrollando un Atlas de Brechas de Rendimiento con relevancia local y global. Retrieved from Revista Técnica de la Asociación Argentina de Productores en Siembra Directa website: http://gea.unsl.edu.ar/pdfs/Grassini_et_al_GYGA_in_Argentina_AAPRESID_Maize_Ext_Pub_2013.pdf.

13. Green, T., Salas, J., Martinez, A. and Erskine, R. (2007). Relating crop yield to topographic attributes using spatial analysis neural networks and regression. Geoderma, 139(1-2), 23-37.

14. Gusso, A., Ducati, J., Veronez, M., Arvor, D. and Gonzaga da Silveira Jr., L. (2013). Spectral model for soybean yield estimate using MODIS/EVI data. International Journal of Geosciences, 4, 1233-1241.

15. Hilera González, J. and Martínez Hernando, V. (2000). Redes neuronales artificíales: fundamentos modelos y aplicaciones. Madrid: Ra-Ma.

16. Holzman, M. E., Rivas, R. and Piccolo, M. C. (2014). Estimating soil moisture and the relationship with crop yield using surface temperature and vegetation index. International Journal of Applied Earth Observation and Geoinformation, 28, 181-192.

17. Kaul, M., Hill, R. and Walthall, C. (2005). Artificial Neural networks for corn and soybean yield prediction. Agricultural Systems, 85(1), 1-18.

18. Kuenzer, C., Dech, S. and Wagner, W. (2015). Remote sensing time series: revealing land surface dynamics. Switzerland: Springer International Publishing.

19. Li, A., Liang, S., Wang, A. and Qin, J. (2007). Estimating crop yield from multi-temporal satellite data using multivariate regression and neural network techniques. Photogrammetric Engineering and Remote Sensing, 73(10), 1149-1157.

20. Milera, S. and Crotti, C. (2014). Implementación del modelo Cropsyst para la simulación del rendimiento del cultivo del maíz en una región de Argentina. Retrieved September 06, 2017 from Centro de Investigación, Observación y Monitoreo Ambiental website: http://hum.unne.edu.ar/revistas/geoweb/Geo4/archivos/milera2.pdf.

21. Ministerio de Agroindustria. (2017). Estimaciones Agrícolas: Series Históricas. Presidencia de la Nación. Retrieved September 14, 2017 from Ministerio de Agroindustria website: http://datosestimaciones.magyp.gob.ar/reportes.php?reporte=Estimaciones.

22. Ovando, G., de la Casa, A., Bressanini, L., Martinez, J., Miranda, C., Miretti, D., Melano, F. and Moglia, J. P. (2016). Metodología para estimar cual es la fecha óptima de adquisición de imágenes Landsat para estimar el rendimiento en un lote de soja en el marco de la agricultura de precisión. Actas del 8° Congreso de AgroInformática, Buenos Aires, Argentina.

23. Prasad, A., Chai, L., Singh, R. and Kafatos, M. (2006). Crop yield estimation model for Iowa using remote sensing and surface parameters. International Journal of Applied Earth Observation and Geoinformation, 8(1), 26-33.

24. Rollán, A. A. and Bachmeier, O. A. (2014). Compactación y retención hídrica en Haplustoles de la provincia de Córdoba (Argentina) bajo siembra directa. AGRISCIENTIA, 3Í(1), 1-10.

25. Sayago, S., Ovando, G. and Bocco, M. (2017). Landsat images and crop model for evaluating water stress of rainfed soybean. Remote Sensing of Environment, 198, 30-39.

26. Thorp, K., De Jonge, K., Kaleita, A., Batchelor, W. and Paz, J. (2008). Methodology for the use of DSSAT models for precision agriculture decision support. Computer and Electronics in Agriculture, 64(2), 276-285.