|
| 1 | +if __name__ == "__main__": |
| 2 | + |
| 3 | + ''' |
| 4 | + Create a Spark program to read the house data from in/RealEstate.csv, |
| 5 | + group by location, aggregate the average price per SQ Ft and sort by average price per SQ Ft. |
| 6 | + |
| 7 | + The houses dataset contains a collection of recent real estate listings in San Luis Obispo county and |
| 8 | + around it. |
| 9 | + |
| 10 | + The dataset contains the following fields: |
| 11 | + 1. MLS: Multiple listing service number for the house (unique ID). |
| 12 | + 2. Location: city/town where the house is located. Most locations are in San Luis Obispo county and |
| 13 | + northern Santa Barbara county (Santa Maria-Orcutt, Lompoc, Guadalupe, Los Alamos), but there are |
| 14 | + some out-of-area locations as well. |
| 15 | + 3. Price: the most recent listing price of the house (in dollars). |
| 16 | + 4. Bedrooms: number of bedrooms. |
| 17 | + 5. Bathrooms: number of bathrooms. |
| 18 | + 6. Size: size of the house in square feet. |
| 19 | + 7. Price/SQ.ft: price of the house per square foot. |
| 20 | + 8. Status: type of sale. Three types are represented in the dataset: Short Sale, Foreclosure and Regular. |
| 21 | + |
| 22 | + Each field is comma separated. |
| 23 | + |
| 24 | + Sample output: |
| 25 | + |
| 26 | + +----------------+-----------------+ |
| 27 | + | Location| avg(Price SQ Ft)| |
| 28 | + +----------------+-----------------+ |
| 29 | + | Oceano| 95.0| |
| 30 | + | Bradley| 206.0| |
| 31 | + | San Luis Obispo| 359.0| |
| 32 | + | Santa Ynez| 491.4| |
| 33 | + | Cayucos| 887.0| |
| 34 | + |................|.................| |
| 35 | + |................|.................| |
| 36 | + |................|.................| |
| 37 | + ''' |
| 38 | + |
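The docstring asks for a group-by on Location, an average of the price per square foot, and a sort on that average. As a way to make the expected semantics concrete, here is a minimal plain-Python sketch of the same aggregation. It is not the Spark solution the exercise asks for; in PySpark the same steps would typically be expressed with `groupBy`, `avg` and `orderBy` on a DataFrame. The sample rows, the `SAMPLE_CSV` name, and the `average_price_per_sqft` helper are hypothetical, and the `Price SQ Ft` column name is assumed from the sample output header.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows (values made up for illustration). Column names
# are assumed from the field list and the sample output header in the docstring.
SAMPLE_CSV = """MLS,Location,Price,Bedrooms,Bathrooms,Size,Price SQ Ft,Status
1,Oceano,190000,2,1,2000,95.0,Short Sale
2,Bradley,412000,3,2,2000,206.0,Regular
3,Oceano,95000,1,1,1000,95.0,Foreclosure
4,Bradley,300000,3,2,1500,200.0,Regular
"""


def average_price_per_sqft(csv_file):
    """Return (location, average price per sq ft) pairs, lowest average first."""
    totals = defaultdict(lambda: [0.0, 0])  # location -> [running sum, row count]
    for row in csv.DictReader(csv_file):
        entry = totals[row["Location"]]
        entry[0] += float(row["Price SQ Ft"])
        entry[1] += 1
    # Turn each (sum, count) into an average, then sort ascending by average.
    averages = [(loc, total / count) for loc, (total, count) in totals.items()]
    return sorted(averages, key=lambda pair: pair[1])


result = average_price_per_sqft(io.StringIO(SAMPLE_CSV))
for location, avg in result:
    print(f"|{location:>16}|{avg:>17}|")
```

Running the same function against the real `in/RealEstate.csv` (opened with `open(path, newline="")`) gives a reference result to compare a Spark job's output against, assuming the file has a header row with these column names.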