NCTU 2017 Computer Architecture Final Project - Part 1

Part 1: Implement convolution, ReLU, and max pooling in convLayerGPU() with CUDA, store your result in outGPU, and use the NVIDIA Visual Profiler (NVVP) to analyze your code.
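As a point of reference for the three operations, here is a minimal CPU sketch in C++. It is not the base program's convLayerCPU(). The `LayerShape` struct, the function names, the `int` element type, the memory layout, stride 1 with no padding, and the 2x2 pooling window are all assumptions made for illustration; take the real sizes and layout from the provided code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layer shape; the real sizes come from the base program.
struct LayerShape {
    int channels;  // number of input feature maps
    int inSize;    // input height == width
    int filters;   // number of output feature maps
    int kSize;     // filter height == width
};

// Convolution (assumed stride 1, no padding) followed by ReLU.
// input:   [channels][inSize][inSize]
// weights: [filters][channels][kSize][kSize]
// returns: [filters][outSize][outSize], where outSize = inSize - kSize + 1
std::vector<int> convRelu(const LayerShape& s,
                          const std::vector<int>& input,
                          const std::vector<int>& weights) {
    const int outSize = s.inSize - s.kSize + 1;
    assert(outSize > 0);
    assert(input.size() == static_cast<std::size_t>(s.channels) * s.inSize * s.inSize);
    assert(weights.size() ==
           static_cast<std::size_t>(s.filters) * s.channels * s.kSize * s.kSize);

    std::vector<int> out(static_cast<std::size_t>(s.filters) * outSize * outSize);
    for (int f = 0; f < s.filters; ++f)
        for (int y = 0; y < outSize; ++y)
            for (int x = 0; x < outSize; ++x) {
                int sum = 0;
                for (int c = 0; c < s.channels; ++c)
                    for (int ky = 0; ky < s.kSize; ++ky)
                        for (int kx = 0; kx < s.kSize; ++kx) {
                            const std::size_t in =
                                (static_cast<std::size_t>(c) * s.inSize + y + ky) * s.inSize + x + kx;
                            const std::size_t w =
                                ((static_cast<std::size_t>(f) * s.channels + c) * s.kSize + ky) * s.kSize + kx;
                            sum += input[in] * weights[w];
                        }
                // ReLU: clamp negative sums to zero.
                out[(static_cast<std::size_t>(f) * outSize + y) * outSize + x] = std::max(sum, 0);
            }
    return out;
}

// 2x2 max pooling with stride 2 over [maps][size][size]; size must be even.
std::vector<int> maxPool2x2(int maps, int size, const std::vector<int>& in) {
    assert(size > 0 && size % 2 == 0);
    assert(in.size() == static_cast<std::size_t>(maps) * size * size);
    const int outSize = size / 2;
    std::vector<int> out(static_cast<std::size_t>(maps) * outSize * outSize);
    for (int m = 0; m < maps; ++m)
        for (int y = 0; y < outSize; ++y)
            for (int x = 0; x < outSize; ++x) {
                const int* p = &in[(static_cast<std::size_t>(m) * size + 2 * y) * size + 2 * x];
                out[(static_cast<std::size_t>(m) * outSize + y) * outSize + x] =
                    std::max(std::max(p[0], p[1]), std::max(p[size], p[size + 1]));
            }
    return out;
}
```

In a GPU version, the three outer loops are the natural candidates to map onto blocks and threads, since each output element is computed independently.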

Download

At the command line, type:

git clone https://github.com/s0606757/CA2017FP-Part1.git

Three subdirectories

./data

This directory contains the input data for the base program:

  • ./data/filter.txt - stores the filter values
  • ./data/neuron.txt - stores the input neuron values

./device

The program in this directory prints the device information.

Usage

cd ./device
make
make run

./example

There are two examples (InnerProduct and MatrixMultiplication) in this directory.

Usage

cd ./example/InnerProduct/
make
make run

or

cd ./example/MatrixMultiplication/
make
make run

Usage of the base program

make
make run

Evaluation

We will compare the execution times of the CPU and GPU versions and compute the speedup as

Speedup = convLayerCPU_execTime / convLayerGPU_execTime
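The formula above can be sketched in C++ as follows. The helper names `timeMs` and `speedup` are hypothetical, and the base program may measure time differently. One caveat when timing the GPU side with a host clock: kernel launches are asynchronous, so the timed call has to wait for the kernel to finish (for example with cudaDeviceSynchronize()) before the clock is stopped.

```cpp
#include <cassert>
#include <chrono>

// Time a single call of fn in milliseconds using a monotonic clock.
// (Hypothetical helper, not part of the base program.)
template <typename Fn>
double timeMs(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Speedup = convLayerCPU_execTime / convLayerGPU_execTime
double speedup(double cpuMs, double gpuMs) {
    assert(gpuMs > 0.0);
    return cpuMs / gpuMs;
}
```

A speedup above 1 means the GPU version is faster, which is what the Completeness criterion asks for.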

Grading Policy

(A) Completeness (35%)
    Your result (convLayerGPU) must be correct, i.e. pass the check (10%)
    Your design (convLayerGPU) must be faster than convLayerCPU() (20%)
    Use the NVIDIA Visual Profiler to help you improve performance (the TA will check this in your report) (5%)
(B) Report (35%)
    Describe your implementation algorithm and explain your results (15%)
    Discuss which optimizations you tried and whether each made performance better or worse (10%)
    Show how you used NVVP to find and solve performance issues (5%)
    Feedback on this part (5%)
(C) Performance Rank (30%)
    We will rank your CUDA kernels by performance (execution time) on the GTX K20C
    The fastest one will get 30% and the slowest one will get 1%
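The correctness requirement in (A) comes down to the GPU output matching the CPU output element by element. A minimal sketch of such a check is shown below; the function name `firstMismatch`, the `outCPU` parameter name, and the `int` element type are assumptions, and the check built into the base program may work differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the first element where the two outputs differ,
// or -1 if they are identical. A length mismatch counts as a difference
// at the end of the shorter buffer. (Hypothetical helper.)
std::ptrdiff_t firstMismatch(const std::vector<int>& outCPU,
                             const std::vector<int>& outGPU) {
    const std::size_t n = std::min(outCPU.size(), outGPU.size());
    for (std::size_t i = 0; i < n; ++i)
        if (outCPU[i] != outGPU[i])
            return static_cast<std::ptrdiff_t>(i);
    if (outCPU.size() != outGPU.size())
        return static_cast<std::ptrdiff_t>(n);
    return -1;
}
```

Reporting the index of the first mismatch, rather than only pass or fail, makes it easier to see which output element a faulty kernel indexing scheme got wrong.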

Other Rules

  • This is team work, with 1 to 3 people per team
      - Register here before the deadline.
  • Account list
  • Compress your code and report into one zip file and upload it to E3.
  • Name your report: LeaderID_Report_FP1.pdf
  • Name your package: LeaderID_FP1.zip
  • Each team only needs to upload one package to E3.
  • Make sure the TA can compile and run your code with “make” and “make run” on the provided server.
  • Using any CUDA library is forbidden in this project!!!
  • LATE SUBMISSIONS ARE NOT ACCEPTED!!!
  • Due date: 2017/10/26 (Thu) 23:50
